Plots for EDA


A plot is a graphical technique for representing a data set. Graphs are used to represent relationships between variables and to display information in an intuitive manner. Different types of graphs represent different types of relationships between variables.

In this article we will cover the following popular plots, with their interpretation and Python code snippets:

  1. Line Plots
  2. Scatter Plots
  3. Pair Plots
  4. Histograms
  5. Box Plots
  6. Violin Plots

Note: The implementation of some plots is library dependent. Covering all mathematical aspects of these plots is beyond the scope of this article.

Line Plot:

This is one of the simplest plots to understand, yet it is very powerful when it comes to interpretation. It is a plot of two variables, X and Y, in which a line is drawn connecting the given coordinates (x_i, y_i) in order.

Python Implementation
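
The original snippet for this section is not available, so here is a minimal sketch using matplotlib. The sample data (a sine curve), the labels and the variable names are assumptions chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data: 50 evenly spaced x values and y = sin(x)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

fig, ax = plt.subplots()
# plot() joins consecutive (x_i, y_i) points with straight line segments
(line,) = ax.plot(x, y, marker="o", markersize=3, label="sin(x)")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Line plot")
ax.legend()
```

Call plt.show() to display the figure, or fig.savefig("line_plot.png") to write it to a file.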


Scatter plots:

A scatter plot, also called a scatterplot, scatter graph, or scatter chart, is a type of plot that uses X and Y coordinates on a 2D plane to display points. It is a well-known technique for studying how one variable depends on another.

Python Implementation
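
A minimal sketch with matplotlib, assuming synthetic data in which y depends linearly on x plus some noise. The data and variable names are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: y is roughly 2 * x with Gaussian noise added
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

fig, ax = plt.subplots()
# Each (x_i, y_i) pair is drawn as one point; no line connects them
points = ax.scatter(x, y, s=15, alpha=0.7)
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Scatter plot")
```

If the points fall close to a straight line, as they should here, that suggests a strong linear relationship between the two variables.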

Pair Plots:

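This is an extension of scatter plots and histograms (with their PDF representation). It is primarily used when the data has more than two dimensions and we want to study the behavior of every variable against every other variable. Each off-diagonal cell is a scatter plot of one pair of variables, and each diagonal cell shows the histogram or density of a single variable.

Python Implementation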
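
One possible sketch uses scatter_matrix from pandas.plotting; this is a choice made here, not necessarily the library the original snippet used. The three-column DataFrame is synthetic and its column names are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Hypothetical 3-dimensional data set: "b" is correlated with "a", "c" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=150)
df = pd.DataFrame({
    "a": a,
    "b": 0.8 * a + rng.normal(scale=0.3, size=150),
    "c": rng.uniform(size=150),
})

# Off-diagonal cells: scatter plot of each pair; diagonal cells: histogram per variable
axes = scatter_matrix(df, diagonal="hist", figsize=(6, 6), alpha=0.6)
```

If seaborn is installed, sns.pairplot(df) produces a similar grid.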


Histogram:

A histogram is one of the best ways to show how the values of a single variable are distributed; more precisely, it approximates the probability distribution of the data. Plotting a histogram depends on 'bins', i.e. dividing the entire range of values into a series of intervals. The height of each bar is then determined by the number of values that fall inside the range of its bin.

Python Implementation
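
The binning described above can be sketched with matplotlib's hist(). The normally distributed sample and the choice of 20 bins are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample: 1000 draws from a standard normal distribution
rng = np.random.default_rng(0)
values = rng.normal(loc=0.0, scale=1.0, size=1000)

fig, ax = plt.subplots()
# The range [min, max] is split into 20 equal-width bins;
# counts[i] is the number of values falling inside bin i
counts, bin_edges, patches = ax.hist(values, bins=20, edgecolor="black")
ax.set_xlabel("Value")
ax.set_ylabel("Count")
ax.set_title("Histogram")
```

Passing density=True to hist() rescales the bars so that their total area is 1, which makes the plot read as an estimate of the probability density.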

Box Plot:

Box plots are somewhat more complex: they represent data according to its quartiles. Lines extending vertically from the box (the whiskers) indicate the variability outside the upper and lower quartiles. The spacing between the different parts of the box indicates the spread of the data.

Python Implementation
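
A minimal sketch with matplotlib's boxplot(), assuming three synthetic groups with different means. The group labels are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: three groups of 100 values with means 0, 1 and 2
rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1.0, size=100) for m in (0, 1, 2)]

fig, ax = plt.subplots()
# Each box spans the first to third quartile, with a line at the median;
# by default the whiskers extend up to 1.5 * IQR beyond the box
result = ax.boxplot(groups)
ax.set_xticklabels(["A", "B", "C"])
ax.set_ylabel("Value")
ax.set_title("Box plot")
```

Points drawn beyond the whiskers are the values matplotlib treats as outliers under this default rule.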

Violin Plots:

A violin plot is an extension of the box plot in which a kernel density plot is drawn together with the box plot. A violin plot is more informative than a plain box plot: a box plot only shows summary statistics such as the median and interquartile range, whereas the violin plot shows the full distribution of the data.

Python Implementation
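
A minimal sketch with matplotlib's violinplot(). The data is synthetic, and the second group is deliberately bimodal to show a shape that a box plot alone would hide. The group labels are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: one unimodal group and one bimodal group
rng = np.random.default_rng(0)
unimodal = rng.normal(loc=0.0, scale=1.0, size=200)
bimodal = np.concatenate([
    rng.normal(loc=-2.0, scale=0.5, size=100),
    rng.normal(loc=2.0, scale=0.5, size=100),
])

fig, ax = plt.subplots()
# Each violin body is a kernel density estimate mirrored around a vertical axis
parts = ax.violinplot([unimodal, bimodal], showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(["unimodal", "bimodal"])
ax.set_ylabel("Value")
ax.set_title("Violin plot")
```

The bimodal violin shows two bulges, which a box plot of the same data would not reveal.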

EDA (exploratory data analysis) is a must-do step before building any machine learning model. All of the plots above are very useful for exploring the data before designing a machine learning algorithm.

About the author


I write blogs about Machine Learning and data science

By abhinavsinghml
