Fundamentals of Exploratory Data Analysis
Working on an analysis project can be super time-consuming when you have no clue where to start. It’s easy to feel overwhelmed by the sheer amount of data and the uncertainty about where to begin. This guide will help you get started with your own exploratory data analysis (EDA).
In this article, you will learn the fundamentals of exploratory data analysis, from its types and methods to the steps involved and the tools you can use.
By the end, you will have a good idea of how to start your own analysis project.
What exactly is exploratory data analysis?
If you search online, you will see that many sources define it as an analysis approach that helps you find patterns and trends in your data.
I like to describe it as the step where you can think outside the box and play around with your data. Here, you get an overview of your data set without needing to do a deep dive. For instance, you will learn whether your data set is skewed towards a particular demographic group, social media platform, or product.
Types of Exploratory Data Analysis
There are 3 main types of exploratory data analysis.
1. Univariate
As the name suggests, this is where you look at only 1 variable (aka column or dimension) at a time. This helps you understand the nature of the variable (e.g. whether it is categorical or continuous), check whether the data is skewed, and identify any outliers. This is the simplest type of exploratory data analysis.
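To make the univariate check concrete, here is a minimal sketch using only Python's standard library. The sample values and the univariate_summary helper are hypothetical, and the 1.5 * IQR rule is one common convention for flagging outliers, not the only one.

```python
import statistics

# Hypothetical sample: order quantities from one column of a data set.
quantities = [2, 3, 3, 4, 5, 5, 5, 6, 7, 40]

def univariate_summary(values):
    """Summarise one numeric variable and flag outliers with the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # first and third quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "outliers": [v for v in values if v < low or v > high],
    }

summary = univariate_summary(quantities)
print(summary)
```

In this toy column, the mean (8) sits well above the median (5.0) because of the single value of 40, which the IQR rule flags as an outlier. A gap like that between mean and median is a quick hint that a variable is skewed.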
2. Bivariate
This is where you look at 2 variables together to understand the relationship between them, for instance, whether they are independent or correlated.
Common examples are:
- Comparing a variable against time to see whether it changes over time.
- Comparing 2 categorical variables to see whether they are associated.
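Both bivariate examples above can be sketched with Python's standard library. The records, column meanings, and platform names are hypothetical; in Pandas, pd.crosstab builds the same kind of contingency table.

```python
from collections import Counter, defaultdict

# Hypothetical records: (month, platform, customer_segment, sales)
records = [
    ("2024-01", "Instagram", "new", 120),
    ("2024-01", "TikTok", "returning", 80),
    ("2024-02", "Instagram", "returning", 150),
    ("2024-02", "TikTok", "new", 95),
    ("2024-03", "Instagram", "new", 170),
    ("2024-03", "TikTok", "new", 110),
]

# Time vs. a variable: total sales per month, in chronological order.
sales_by_month = defaultdict(int)
for month, _, _, sales in records:
    sales_by_month[month] += sales
trend = sorted(sales_by_month.items())
for month, total in trend:
    print(month, total)

# Two categorical variables: count every (platform, segment) combination.
crosstab = Counter((platform, segment) for _, platform, segment, _ in records)
for (platform, segment), count in sorted(crosstab.items()):
    print(f"{platform:<10} {segment:<10} {count}")
```

Here monthly sales rise from 200 to 245 to 280, and the cross-tabulation counts how often each platform and customer-segment combination occurs.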
3. Multivariate
The last type is multivariate analysis, where you look at 3 or more variables at a time. You can view it as a more advanced form of bivariate analysis: with more variables involved, the relationships between them become more complicated, so you will need to dig deeper to identify patterns and outliers.
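A minimal multivariate sketch, assuming a hypothetical data set with platform, age group, and spend columns. Grouping by two variables at once can reveal a pattern that a bivariate view hides.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (platform, age_group, spend), i.e. three variables at once.
rows = [
    ("Instagram", "18-24", 30), ("Instagram", "18-24", 34),
    ("Instagram", "35-44", 80), ("TikTok", "18-24", 40),
    ("TikTok", "18-24", 44), ("TikTok", "35-44", 60),
]

# Average spend for every (platform, age_group) combination.
groups = defaultdict(list)
for platform, age_group, spend in rows:
    groups[(platform, age_group)].append(spend)
avg_spend = {key: mean(values) for key, values in groups.items()}

for (platform, age_group), avg in sorted(avg_spend.items()):
    print(f"{platform:<10} {age_group:<6} {avg:.1f}")
```

In this toy data, both platforms average a spend of 48 when you ignore age group, but splitting by age group shows TikTok ahead for 18-24 and Instagram ahead for 35-44.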
Exploratory Data Analysis Methods
You can explore your dataset with 2 main methods.
1. Graphical Method
The first method is to explore data through visual representations such as graphs and charts. Common examples are line graphs, box plots, pie charts, word clouds, bar charts, and scatter plots.
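Charts are normally drawn with a library such as Matplotlib or Seaborn. To keep this sketch dependency-free, it prints a rough text version of a horizontal bar chart for a hypothetical categorical column instead; treat it as an illustration of the idea rather than a plotting recipe.

```python
from collections import Counter

def text_bar_chart(values, width=30):
    """Return a horizontal bar chart of category counts drawn with '#' characters."""
    counts = Counter(values)
    largest = max(counts.values())
    lines = []
    for label, count in counts.most_common():  # most frequent category first
        bar = "#" * round(count / largest * width)
        lines.append(f"{label:<10} {bar} {count}")
    return "\n".join(lines)

# Hypothetical categorical column: the platform each customer came from.
platforms = ["Instagram"] * 6 + ["TikTok"] * 3 + ["Facebook"] * 1
print(text_bar_chart(platforms))
```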
2. Non-graphical Method
The second method uses statistical techniques instead of visuals. Examples of metrics used are the mean, median, mode, maximum, minimum, standard deviation, root mean square (RMS), skewness, and kurtosis.
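The non-graphical metrics listed above can be computed directly. This is a sketch under stated assumptions: it uses population (moment-based) formulas, reports excess kurtosis (kurtosis minus 3, so a normal distribution scores about 0), and uses a hypothetical sample. Libraries such as Pandas may apply small-sample corrections, so their skewness and kurtosis values can differ slightly.

```python
import math
import statistics

def describe(values):
    """Common non-graphical summary statistics for a numeric column.

    Assumes the values are not all identical (standard deviation > 0).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": min(values),
        "max": max(values),
        "std": sd,
        "rms": math.sqrt(sum(v * v for v in values) / n),
        # Moment-based skewness and excess kurtosis.
        "skewness": sum((v - mean) ** 3 for v in values) / (n * sd ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in values) / (n * sd ** 4) - 3,
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
summary = describe(sample)
print(summary)
```

For this sample the mean is 5, the standard deviation is 2.0, and the positive skewness (about 0.66) reflects the longer tail towards the larger values.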
Exploratory Data Analysis Steps
These are the 3 main steps to explore data sets.
Step 1: Understand the meaning of each column + Univariate
The first step in exploring data is to understand what your variables are, as indicated by the column headers.
You need to have a good understanding of the following:
- What each column represents and how it is defined. Make sure no column names are repeated.
- Whether the variable is categorical or continuous. As a rule of thumb, calculations such as sums and averages only make sense for continuous variables. For instance, compare product ID and quantity: both are numbers, but product ID is categorical (qualitative) while quantity is continuous (quantitative).
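Checking for repeated column names is easy to automate. Below is a minimal sketch with Python's csv module, assuming a hypothetical CSV export in which one header appears twice. Whether a column is categorical or continuous still has to come from your understanding of the data, so those labels are recorded by hand.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export; note that "quantity" appears twice in the header.
raw = io.StringIO(
    "product_id,quantity,price,quantity\n"
    "1001,2,9.99,3\n"
    "1002,5,4.50,1\n"
)

header = next(csv.reader(raw))
duplicates = [name for name, count in Counter(header).items() if count > 1]
print("Repeated column names:", duplicates)

# Column types decided by hand: product_id is numeric but categorical,
# so averaging it is meaningless, while quantity and price can be summed.
column_types = {
    "product_id": "categorical",
    "quantity": "continuous",
    "price": "continuous",
}
print(column_types)
```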
Once you have sorted out these 2 points, perform a quick univariate analysis of each variable to check for skewed data and outliers.
Step 2: Find columns that have a relationship to each other + Bivariate
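Next, identify columns you think may be related to each other and conduct a bivariate analysis for every pair. Some tools make this fast: for example, Pandas' DataFrame.corr() computes pairwise correlations between columns in one call, and Seaborn's pairplot() draws a scatter plot for every pair of variables.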
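Here is a dependency-free sketch of running a bivariate check on every pair of columns. The column names and values are hypothetical, and Pearson's r only captures linear relationships, so a value near 0 does not prove two variables are unrelated.

```python
import itertools
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical numeric columns from one data set.
columns = {
    "ad_spend": [10, 20, 30, 40, 50],
    "clicks": [12, 25, 29, 45, 52],
    "returns": [5, 3, 4, 5, 3],
}

# Bivariate check on every combination of 2 columns.
for a, b in itertools.combinations(columns, 2):
    print(f"{a} vs {b}: r = {pearson(columns[a], columns[b]):.2f}")
```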
Step 3: Deep dive into interesting findings
After steps 1 and 2, you should have a good feel for your data set. This is when you can go wild and test out whatever you want.
Usually, at this stage, you will already have hypotheses that you want to test or questions that you want to answer. This is where you dive deep into the data to check those theories.
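One way to check a hypothesis during the deep dive is a permutation test, a technique chosen here for illustration rather than one this guide prescribes. The sketch assumes a hypothetical question (does average spend differ between two platforms?) and made-up numbers. It shuffles the group labels many times and counts how often a difference at least as large as the observed one appears by chance.

```python
import random

def _mean(xs):
    return sum(xs) / len(xs)

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between two groups."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(_mean(pooled[:size_a]) - _mean(pooled[size_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations  # approximate p-value

# Hypothetical question: does average spend differ between platforms A and B?
platform_a = [52, 61, 58, 70, 66, 59]
platform_b = [41, 48, 45, 50, 39, 47]
print(f"p ~ {permutation_test(platform_a, platform_b):.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone; a large one means the data cannot distinguish it from chance.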
While this step is super fun, set a time limit for it so it doesn’t get out of hand and you still meet the deadline of your analysis project.
Tools
Exploratory Data Analysis (EDA) can be performed using a variety of tools. Python with libraries like Pandas, Matplotlib, and Seaborn is popular for data manipulation and visualization. R offers powerful statistical analysis through packages like ggplot2 and dplyr. Excel is useful for smaller datasets and basic visualizations, while Tableau and Power BI provide interactive and user-friendly dashboards for exploring data. For structured data, SQL is essential for querying and preparing data for analysis.
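Since SQL is often where exploration starts, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical orders table. The GROUP BY query itself uses only standard aggregate functions, so it should carry over to most SQL databases.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (platform TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Instagram", 2), ("Instagram", 5), ("TikTok", 1), ("TikTok", 3), ("TikTok", 4)],
)

# A typical EDA query: row counts and basic statistics per category.
query = """
    SELECT platform, COUNT(*) AS n_orders, AVG(quantity) AS avg_qty,
           MIN(quantity) AS min_qty, MAX(quantity) AS max_qty
    FROM orders
    GROUP BY platform
    ORDER BY platform
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```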
Conclusion
This article is designed to give you a strong foundation for conducting your own exploratory data analysis. EDA is a crucial skill for every data analyst, and I believe this guide will be valuable for anyone looking for a step-by-step approach to performing it.