This is a replica of a self-portrait from when I was trying to understand Root Mean Squared Error

I just remember sitting in data science boot camp and having it drilled into our heads to check the Mean Squared Error (MSE), the Root Mean Squared Error (RMSE), and the R squared (R2) whenever we were handling regression modeling. When the subject was first introduced, there was extensive explanation of the material. A couple of months after that, I have to be honest, I would sometimes just calculate these numbers to check accuracy without having the deepest understanding of what each term truly means. Pun intended.

After spending time creating even a simple regression model, it…
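For readers who want the three metrics side by side, here is a minimal sketch of how MSE, RMSE, and R2 relate to each other. The function name `regression_metrics` and the toy numbers are assumptions made for illustration, not something taken from the original article.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and R squared for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    residuals = y_true - y_pred
    # MSE: the average of the squared residuals.
    mse = np.mean(residuals ** 2)
    # RMSE: the square root of MSE, back in the units of the target.
    rmse = np.sqrt(mse)
    # R squared: 1 minus (unexplained variation / total variation).
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical toy data, made up for this example.
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0])
print(metrics)
```

RMSE is often the easiest of the three to read because it is in the same units as the target, while R2 is unitless and compares the model against simply predicting the mean.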

Creating visualizations for your data is essential. In another post, I take an in-depth look at EDA according to the National Institute of Standards and Technology, which can be found here.

After talking about the importance of EDA, it becomes a syntactical issue. In this article I plan to walk through different techniques and tricks for customizing plots in Python.

As you read, keep referring back to this table I created for myself in an intro to Stats class. Actually my professor created it as a list, but I made a table out of it and keep it…

In a recent project on Broadway Grosses I used machine learning to predict when a Broadway show would close based on features like the previous week’s grosses. The idea is that when we look at this graph we see a visible decline in gross, with the red marking the end of the production’s life. We did this in a data set that had 5 years’ worth of Broadway grosses, marking the last 6 weeks of every show that had closed with a 1 and everything else with a 0, which makes this a binary classification problem.
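The labeling step described above could be sketched roughly as follows with pandas. The column names (`show`, `week`, `gross`), the helper `label_closing_weeks`, and the toy rows are all hypothetical; the actual project's schema and code are not shown in this excerpt.

```python
import pandas as pd


def label_closing_weeks(df, closed_shows, n_weeks=6):
    """Mark the last n_weeks of every closed show with 1, everything else 0."""
    df = df.sort_values(["show", "week"]).copy()
    # Count down within each show: the final recorded week gets 0.
    weeks_from_end = df.groupby("show").cumcount(ascending=False)
    is_closed = df["show"].isin(closed_shows)
    df["closing"] = ((weeks_from_end < n_weeks) & is_closed).astype(int)
    return df


# Hypothetical data: show "A" has closed after 8 weeks, show "B" is still running.
grosses = pd.DataFrame({
    "show": ["A"] * 8 + ["B"] * 3,
    "week": list(range(1, 9)) + list(range(1, 4)),
    "gross": [900, 880, 850, 800, 720, 650, 560, 480, 700, 710, 705],
})
labeled = label_closing_weeks(grosses, closed_shows={"A"})
print(labeled)
```

Shows that are still running keep a 0 in every row, since their most recent weeks are not known to be their last.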

Support Vector Machine

Support Vector Machine was one…


The confusion matrix is a quintessential part of our work as data scientists. It is our bread and butter, a way of visualizing the performance of our model. Tackling this remains relatively simple for two classes, but as our matrix balloons, calculations can become muddy.

Technical terminology reviewed in this article includes true positives, true negatives, false positives, and false negatives, which in turn yield the true and false positive rates as well as the true and false negative rates. We can also evaluate metrics like accuracy, precision, recall, and F1 score.

We are going to hit three main…
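As a quick illustration of the terms listed above, here is a minimal sketch for the two-class case in plain Python. The function name `binary_confusion_metrics` and the example labels are assumptions made for this sketch, not the article's own code.

```python
def binary_confusion_metrics(y_true, y_pred):
    """Count TP, TN, FP, FN for 0/1 labels and derive the common metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also the true positive rate
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
        "true_negative_rate": ratio(tn, tn + fp),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Hypothetical labels, made up for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_confusion_metrics(y_true, y_pred)
print(scores)
```

With more than two classes, the same counts are usually computed once per class (one class versus the rest), which is where the bookkeeping starts to get muddy.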

Exploratory Data Analysis, or EDA for short, establishes the initial relationships between variables and features. After the data has been cleaned thoroughly, it is the first proper insight into what the data will tell us. It is also how we give closure to a project. Many times it provides a small arsenal of visual representations that strip the data of its pretentious jargon and make it friendly to stakeholders. This blog post sets out to:

  1. Cover the history of EDA.
  2. Understand the purpose of EDA.
  3. Give examples of EDA extracted from actual data science…

Andrew Ozbun

Just a normally distributed millennial with a left skew.
