Introduction to the Central Limit Theorem


One of the most beautiful theorems in all of statistics.

In this article we will work through the Central Limit Theorem (CLT).

The theorem states: “given a sufficiently large sample size from a population with a finite level of variance, the mean of all samples from the same population will be approximately equal to the mean of the population”. It also states that the sample means will approximately follow a Gaussian distribution.

Now let’s understand, step by step, what the Central Limit Theorem says, based on the statement above.

Before diving into the discussion, you need to understand the following terms.

Population: In statistics, the population is the total set of observations that can be made. In most cases it is finite but far too large to measure in full.
Sample: A subset of the population, small enough to actually collect and measure.

Consider the problem of finding the mean of some quantity, say height, over all the people in the universe. In this case it would be impossible to calculate the mean directly.

This is where the power of the CLT comes in.

Instead of finding the mean of the whole population, find the mean of the sample means in the following manner.

Take ‘m’ random samples, each of size ‘n’, and find the mean of each sample, \mu_{s}. After this, take the mean of all the sample means, which we will call the final mean \mu_{f}. If you have any doubts, read the statement once again carefully.

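Now the beauty of the theorem is that the mean of all the sample means, \mu_{f}, will be approximately equal to the population mean \mu, i.e. \mu_{f} \approx \mu. This is the first part of the Central Limit Theorem as stated above (strictly speaking, this convergence of averages is the law of large numbers; the CLT proper describes how the sample means are distributed).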
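
The procedure above can be sketched in a few lines of Python. This is only an illustrative simulation: the exponential population, its size, the seed and the helper name mean_of_sample_means are all assumptions made for the example, not part of the theorem.

```python
import random
import statistics

def mean_of_sample_means(population, m, n, rng):
    """Draw m random samples of size n; return (list of sample means, mu_f)."""
    sample_means = [
        statistics.fmean(rng.choices(population, k=n))  # sampling with replacement
        for _ in range(m)
    ]
    return sample_means, statistics.fmean(sample_means)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population: 100,000 exponential values with mean about 2
population = [rng.expovariate(0.5) for _ in range(100_000)]
mu = statistics.fmean(population)

sample_means, mu_f = mean_of_sample_means(population, m=1000, n=50, rng=rng)
print(f"population mean mu = {mu:.3f}")
print(f"mean of means mu_f = {mu_f:.3f}")
```

The two printed values should agree closely, even though only 1000 samples of 50 points each were used and the population itself is far from Gaussian.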

The additional benefit of the CLT is that the sample means \mu_{s} will approximately follow a Gaussian distribution with mean \mu and variance \frac{\sigma^{2}}{n} as n \rightarrow \infty, where \mu is the mean of the population and \sigma^{2} is the variance of the population.
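
A quick numerical check of this claim, as a sketch under stated assumptions: the population is modelled as uniform on [0, 1] (so \mu = 0.5 and \sigma^{2} = 1/12), and the values of n, m and the seed are arbitrary choices for the example.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
n, m = 40, 5000

# Assumed population model: uniform on [0, 1], so mu = 0.5 and sigma^2 = 1/12
mu, sigma2 = 0.5, 1 / 12

# m sample means, each computed from n independent draws
sample_means = [
    statistics.fmean(rng.random() for _ in range(n)) for _ in range(m)
]

observed_var = statistics.pvariance(sample_means)
predicted_var = sigma2 / n

# For a Gaussian, about 68% of values lie within one standard deviation of the mean
std_err = predicted_var ** 0.5
within_one = sum(abs(x - mu) <= std_err for x in sample_means) / m

print(f"observed variance of sample means:  {observed_var:.5f}")
print(f"predicted sigma^2 / n:              {predicted_var:.5f}")
print(f"fraction within one standard error: {within_one:.3f}")
```

If the sample means are close to Gaussian with variance \frac{\sigma^{2}}{n}, the first two numbers should nearly match and the last should be near 0.68.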

This makes the Central Limit Theorem very powerful in situations where collecting every point in the population is impossible.

Factors on which the CLT depends:

  1. Size of each sample, i.e. ‘n’: the larger the n, the closer \mu_{f} will be to \mu.
  2. Number of samples, i.e. ‘m’: the larger the m, the closer \mu_{f} will be to \mu.
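
The effect of both factors can be seen in a small experiment. The sketch below assumes an exponential population with true mean \mu = 1; the helper names mu_f and typical_error, the grid of (m, n) values and the number of repeats are illustrative choices.

```python
import random
import statistics

def mu_f(m, n, rng):
    """Mean of m sample means, each from n exponential draws (true mean 1)."""
    return statistics.fmean(
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(m)
    )

def typical_error(m, n, rng, repeats=100):
    """Root-mean-square error of mu_f around the true mean, over several repeats."""
    squared = ((mu_f(m, n, rng) - 1.0) ** 2 for _ in range(repeats))
    return statistics.fmean(squared) ** 0.5

rng = random.Random(2)  # fixed seed so the run is repeatable
errors = {
    (m, n): typical_error(m, n, rng)
    for m, n in [(10, 10), (10, 100), (100, 10), (100, 100)]
}
for (m, n), err in errors.items():
    print(f"m = {m:>3}, n = {n:>3}: typical error of mu_f ~ {err:.4f}")
```

Increasing either m or n shrinks the typical error of \mu_{f}, and increasing both shrinks it further.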

In spite of all this power, the Central Limit Theorem has its limitations too:

  1. The CLT only applies to populations whose distribution has a finite mean and variance, i.e. it cannot be applied to a distribution with infinite mean or variance, e.g. a Pareto distribution with shape parameter \alpha \le 2 (its variance is infinite, and for \alpha \le 1 its mean is infinite as well).
  2. The quality of the approximation depends heavily on the sizes of ‘m’ and ‘n’.
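
The first limitation can be illustrated with Python's built-in Pareto generator. This is a rough sketch, not a proof: the shape values 3.0 and 0.9, the sizes m and n, and the helper name sample_means are assumptions chosen for the example.

```python
import random
import statistics

def sample_means(alpha, m, n, rng):
    """Means of m samples of size n from a Pareto distribution with shape alpha."""
    return [
        statistics.fmean(rng.paretovariate(alpha) for _ in range(n))
        for _ in range(m)
    ]

rng = random.Random(3)  # fixed seed so the run is repeatable
m, n = 1000, 100

# alpha = 3.0: finite mean (1.5) and finite variance, so the CLT applies
light = sample_means(3.0, m, n, rng)
# alpha = 0.9: infinite mean and variance, so the CLT does not apply
heavy = sample_means(0.9, m, n, rng)

light_range = max(light) - min(light)
heavy_range = max(heavy) - min(heavy)
print(f"alpha = 3.0: sample means span a range of {light_range:.2f}")
print(f"alpha = 0.9: sample means span a range of {heavy_range:.2f}")
```

With \alpha = 3 the sample means cluster tightly around 1.5, while with \alpha = 0.9 a few extreme draws dominate and the sample means never settle around a single value.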

The Central Limit Theorem is one of the most beautiful theorems I have ever learnt. It is so simple, yet so powerful when it comes to finding insights in given data. We will be covering many such interesting topics in Machine Learning, so stay connected.

<Happy Machine Learning>


About the author


I write blogs about Machine Learning and data science

By abhinavsinghml
