EDA according to the National Institute of Standards and Technology.

  1. Review the history of EDA.
  2. Understand the purpose of EDA.
  3. Give examples of EDA extracted from actual data science projects.
  4. Address labelling and tidying graphs.

History of EDA

Understanding the purpose and benefits of EDA

  1. Uncover underlying structure.
  2. Extract important variables.
  3. Detect outliers and anomalies.
  4. Test underlying assumptions.
  5. Develop parsimonious models.
  6. Determine optimal factor settings.
  7. Maximize insight into a data set.

Examples of EDA in use.

Uncover underlying structure.

This decomposition from a stock market analysis shows the seasonality, which can ultimately be removed.
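To make the idea concrete, here is a minimal sketch of a classical additive decomposition written with pandas only. It is not the code behind the original chart: the price series is synthetic, the function name decompose_additive is made up for this example, and the 5-day trading week is an assumed seasonal period (the sketch also assumes an odd period so the rolling mean is truly centered). Libraries such as statsmodels offer a ready-made seasonal_decompose that does the same job with more options.

```python
import numpy as np
import pandas as pd


def decompose_additive(series: pd.Series, period: int) -> pd.DataFrame:
    """Split a series into trend + seasonal + residual (additive model).

    Hypothetical helper for illustration; assumes an odd `period`.
    """
    # Trend: centered rolling mean over one full seasonal cycle.
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend

    # Seasonal: mean detrended value at each position in the cycle,
    # shifted so the seasonal effects sum to zero.
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=series.index)

    resid = series - trend - seasonal
    return pd.DataFrame(
        {"observed": series, "trend": trend, "seasonal": seasonal, "resid": resid}
    )


# Synthetic "stock price": upward drift + weekday pattern + noise (assumed data).
rng = np.random.default_rng(0)
n = 100
dates = pd.bdate_range("2021-01-04", periods=n)  # business days, starting on a Monday
weekly_pattern = np.array([1.0, 0.5, 0.0, -0.5, -1.0])
prices = pd.Series(
    100 + 0.1 * np.arange(n) + np.tile(weekly_pattern, n // 5) + rng.normal(0, 0.1, n),
    index=dates,
)

parts = decompose_additive(prices, period=5)

# Removing the seasonality is then a simple subtraction.
deseasonalized = parts["observed"] - parts["seasonal"]
```

Calling parts.plot(subplots=True) would draw the four components as stacked panels, which is the usual way a decomposition is presented.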

Extract important variables.

Detect outliers and anomalies.

This set of histograms came from an initial .hist() call that allowed us to look at the distribution of each column.
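As a rough illustration of that first look, the sketch below calls .hist() on a small pandas DataFrame. The data is randomly generated and the column names (price, volume, rating) are placeholders, not the columns from the original project; one extreme value is planted on purpose so it shows up in the plot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for a real dataset (assumed columns, made-up values).
rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "price": rng.normal(50, 5, 500),
        "volume": rng.lognormal(3, 1, 500),
        "rating": rng.integers(1, 6, 500),
    }
)
df.loc[0, "price"] = 500.0  # planted outlier, for demonstration only

# One histogram per numeric column, arranged in a grid.
axes = df.hist(bins=30, figsize=(10, 6))
plt.tight_layout()
# In a notebook the figure renders automatically; in a script, call plt.show().
```

In the resulting grid, the price panel is squeezed against the left edge by the single value near 500, which is exactly the kind of anomaly this quick pass is meant to surface. A follow-up df.describe() gives the same story in numbers.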

Test underlying assumptions.

Develop parsimonious models.

Determine optimal factor settings.

Maximize insight into a data set.

Addressing labels, color, and general tidiness.

Just a normally distributed millennial with a left skew. https://github.com/ozbunae

Andrew Ozbun
