A Gentle Introduction to Probability Distributions – Part 1


Let’s talk about probability first

Before jumping straight to distributions, let’s first refresh some basics of probability. In simple terms, a probability score gives us the likelihood that an event will occur. With that definition, and assuming all outcomes are equally likely, we can easily write a mathematical formulation of probability:

p(x) = \frac{\text{Number of outcomes in which the event occurs}}{\text{Total number of possible outcomes}}

There are many deeper concepts in probability, such as the total probability theorem and conditional probability, but for understanding this article the above information is sufficient.
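As a small illustration, the ratio above can be sketched in Python. The helper name `probability` and the die example are assumptions made for this sketch, and it only applies when every outcome is equally likely.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Classical probability: favourable outcomes / total outcomes.

    `event` is a function that returns True for outcomes that count.
    Assumes every outcome in `sample_space` is equally likely.
    """
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


# Hypothetical example: a fair six-sided die.
die = [1, 2, 3, 4, 5, 6]
p_even = probability(lambda face: face % 2 == 0, die)
p_six = probability(lambda face: face == 6, die)

print(p_even)  # 1/2
print(p_six)   # 1/6
```

Using `Fraction` keeps the result exact, which makes the "count divided by total" idea easier to see than a rounded float.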

Another important concept is the random variable: a variable whose possible values are the outcomes of a random phenomenon. More precisely, a random variable assigns a value to each outcome of an experiment, so that we can talk about the probability of it taking a certain value.

Let’s understand this with the help of an example. Let X be a random variable that denotes the outcome of a coin toss (“the experiment”). Then X can take two values, heads or tails, and for a fair coin P(X = heads) = 0.5 and P(X = tails) = 0.5.
This is an example of a discrete probability distribution.
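To see the coin toss in action, here is a minimal simulation sketch. Encoding heads as 1 and tails as 0 is an assumption made for illustration, as are the seed and the number of tosses.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Assumed encoding for this sketch: X = 1 for heads, X = 0 for tails.
n_tosses = 10_000
tosses = [random.choice([0, 1]) for _ in range(n_tosses)]

# Empirical probability of each value of X: count / total.
pmf = {value: tosses.count(value) / n_tosses for value in (0, 1)}

print("P(X = tails) ~", pmf[0])
print("P(X = heads) ~", pmf[1])
```

With enough tosses both estimates should settle close to 0.5, which is the probability we wrote down for a fair coin.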

Probability distributions are of two types :

  1. Continuous Probability distribution
  2. Discrete Probability distribution

Note: We will cover continuous probability distributions in this article and discrete ones in the next article.

A probability distribution describes the behavior of a random variable, i.e. how probability is spread over the set of all possible outcomes of the random phenomenon. In a continuous distribution, the random variable does not take isolated discrete values; instead it can take any value within a range, and we talk about the probability of it falling within an interval.

Some definitions that must be understood before further discussion:

PDF: The probability density function of a continuous random variable specifies the probability of the random variable falling within a particular range of values. That probability is the area under the PDF over that range.

PMF: The probability mass function gives the probability that a discrete random variable takes exactly a given value.

CDF: The cumulative distribution function of a continuous random variable gives the probability that the variable is less than or equal to a given value. It can be seen as the cumulative sum (integral) of the PDF up to that point.

Mean: The mean describes the central tendency of a collection. Put simply, it is the sum of a collection of numbers divided by how many numbers are in the collection.

Variance: The variance tells us how spread out the items in a collection are around the mean. It is the square of the standard deviation.

Skewness: A measure of the asymmetry of a probability distribution.

Kurtosis: A measure of the “peakedness” or “tailedness” of a distribution.
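The last four definitions can be computed directly from a collection of numbers. The sketch below is one common convention, assumed here for illustration: it uses population formulas (dividing by n) and reports excess kurtosis, i.e. kurtosis minus 3. The function name `describe` and the sample data are hypothetical.

```python
import math


def describe(values):
    """Return (mean, variance, skewness, excess kurtosis) of a collection.

    Uses population formulas (divide by n). Assumes the values are not
    all identical, otherwise the standard deviation would be zero.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    # Skewness and kurtosis are averages of standardized 3rd and 4th powers.
    skewness = sum(((v - mean) / std) ** 3 for v in values) / n
    excess_kurtosis = sum(((v - mean) / std) ** 4 for v in values) / n - 3
    return mean, variance, skewness, excess_kurtosis


data = [1, 2, 3, 4, 5]  # a small symmetric collection
mean, variance, skewness, excess_kurtosis = describe(data)
print(mean, variance, skewness, excess_kurtosis)
```

Because the data is symmetric around 3, the skewness comes out as zero, and the negative excess kurtosis reflects that the values are spread evenly rather than piled up around the mean.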

In this article we will cover some of the most popular continuous distributions.


Gaussian Distribution

If you have taken any statistics class, chances are you are already familiar with this distribution. It is one of the most fundamental distributions in statistics, and many random phenomena in nature follow it. Before going into further details, let’s first understand it graphically.

Figure: PDF of the Gaussian distribution. Source: Wikipedia

The distribution is parameterized by \mu and \sigma^{2} and written as N(\mu,\sigma^{2}), where \mu is the mean of the distribution and \sigma^{2} is the variance, which controls the width of the distribution. As we can clearly see, the distribution is tallest at the mean; this is because most of the values lie close to the mean and very few lie at the extremes.

p(x) = \frac{1}{\sqrt{2\pi \sigma^{2}}} \cdot e^{-\frac{(x-\mu)^{2}}{2 \sigma^{2}}}
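To connect the PDF with the CDF defined earlier, here is a minimal sketch that evaluates the Gaussian density (with the negative exponent, as in the standard form) and builds the CDF by numerically integrating it. The function names, the trapezoidal rule, and the choice to start integrating ten standard deviations below the mean are all assumptions of this sketch; the closed form using `math.erf` is included only as a check.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def normal_cdf_numeric(x, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate the CDF by integrating the PDF (trapezoidal rule)."""
    lower = mu - 10.0 * sigma  # the tail left of this point is negligible
    if x <= lower:
        return 0.0
    width = (x - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(x, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * width, mu, sigma)
    return total * width


def normal_cdf_exact(x, mu=0.0, sigma=1.0):
    """Closed-form CDF via the error function, used here as a reference."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


print(normal_pdf(0.0))          # height of the standard normal at its mean
print(normal_cdf_numeric(1.0))  # area under the PDF up to x = 1
print(normal_cdf_exact(1.0))    # should agree closely with the line above
```

The two CDF values should match to several decimal places, which is the "cumulative addition of the PDF" idea made concrete.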

Properties of Gaussian Distribution

Mean: \mu
Variance: \sigma^{2}
Skewness: 0
Kurtosis: 0 (this is the excess kurtosis; the raw kurtosis is 3)
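The observation that most values lie close to the mean can be checked by drawing samples. This is a rough sketch: the mean of 10, standard deviation of 2, seed, and sample size are arbitrary values chosen for illustration.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

mu, sigma = 10.0, 2.0  # arbitrary illustrative parameters
samples = [random.gauss(mu, sigma) for _ in range(100_000)]

# Fraction of samples within one and three standard deviations of the mean.
within_one = sum(abs(s - mu) <= sigma for s in samples) / len(samples)
within_three = sum(abs(s - mu) <= 3 * sigma for s in samples) / len(samples)

print("within 1 standard deviation: ", within_one)
print("within 3 standard deviations:", within_three)
```

For a Gaussian, roughly 68% of values fall within one standard deviation of the mean and more than 99% within three, so the printed fractions should land near those figures.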

Log Normal Distribution

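The shape of this distribution looks somewhat similar to the normal distribution, but the similarity ends there. A random variable follows a log-normal distribution when its logarithm follows a normal distribution, so it only takes positive values, and the information it describes is completely different. Let’s look at the PDF of the distribution graphically before moving ahead.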

Figure: PDF of the log-normal distribution. Source: Wikipedia

If you look carefully at the plot, you can clearly see that most of the values lie at the start of the distribution and very few at the end. This is a very important property of the log-normal, as many random phenomena tend to follow it. The log-normal is one of the most commonly used distributions for data from internet-related applications.

PDF of Log-Normal

For x > 0, where \mu and \sigma^{2} are the mean and variance of ln(x):

p(x) = \frac{1}{x\sqrt{2\pi \sigma^{2}}} \cdot e^{-\frac{(ln(x)-\mu)^{2}}{2 \sigma^{2}}}

Mean: e^{\mu + \frac{\sigma^{2}}{2}}
Variance: e^{2\mu + \sigma^{2}} \cdot [e^{\sigma^{2}} - 1]
Skewness: (e^{\sigma^{2}} + 2) \cdot \sqrt{e^{\sigma^{2}} - 1}
Kurtosis (excess): e^{4\sigma^{2}} + 2 e^{3\sigma^{2}} + 3 e^{2\sigma^{2}} - 6
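The closed-form expressions above are easy to evaluate and to compare against samples. In this sketch the function name `lognormal_moments` and the parameter values are assumptions for illustration; `random.lognormvariate` comes from Python’s standard library and takes the mean and standard deviation of the underlying normal distribution.

```python
import math
import random


def lognormal_moments(mu, sigma):
    """Mean, variance, skewness and excess kurtosis from the formulas above."""
    s2 = sigma ** 2
    mean = math.exp(mu + s2 / 2.0)
    variance = math.exp(2.0 * mu + s2) * (math.exp(s2) - 1.0)
    skewness = (math.exp(s2) + 2.0) * math.sqrt(math.exp(s2) - 1.0)
    excess_kurtosis = (math.exp(4.0 * s2) + 2.0 * math.exp(3.0 * s2)
                       + 3.0 * math.exp(2.0 * s2) - 6.0)
    return mean, variance, skewness, excess_kurtosis


mu, sigma = 0.0, 0.5  # arbitrary illustrative parameters
theory_mean, theory_var, theory_skew, theory_kurt = lognormal_moments(mu, sigma)

# Compare the first two moments against a large random sample.
random.seed(1)
samples = [random.lognormvariate(mu, sigma) for _ in range(200_000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)

print("mean:    ", theory_mean, "vs sample", sample_mean)
print("variance:", theory_var, "vs sample", sample_var)
print("skewness:", theory_skew, " excess kurtosis:", theory_kurt)
```

The positive skewness is the numerical counterpart of the long right tail: most values sit near the start of the distribution, with a few large ones pulling the mean to the right.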

Popular Applications of Log-Normal Distribution

  1. The length of comments posted in Internet discussion forums follows a log-normal distribution.
  2. Users’ dwell time on online articles (jokes, news etc.) follows a log-normal distribution.

Power law distribution

Before jumping into the power law distribution, here is a quick recap of what a power law is: it “describes a relationship between two quantities, where a relative change in one quantity results in a proportional relative change” in the other quantity. The power law distribution, as the name suggests, follows a power law. The graphical representation of the PDF of the power law is given below.

Figure: PDF of the power law distribution. Source: Wikipedia

Keep in mind that this is not an exact plot of a power law distribution. In practice, one of the most frequently used graphical methods for identifying power-law probability distributions from random samples is the Q-Q plot. The PDF of the power law, defined for x \geq x_{min} with \alpha > 1, is given by the following equation.

p(x) = \frac{\alpha - 1}{x_{min}} \cdot \left( \frac{x}{x_{min}} \right)^{-\alpha}
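One way to get a feel for this density is to draw samples from it. The sketch below uses inverse transform sampling, which is a technique chosen for this illustration rather than something specific to this article: integrating the PDF above gives the CDF F(x) = 1 - (x / x_min)^{-(\alpha - 1)}, and solving F(x) = u for x gives the sampling formula. The function name and parameter values are assumptions.

```python
import random


def sample_power_law(alpha, x_min, rng):
    """Draw one sample via inverse transform sampling (requires alpha > 1)."""
    u = rng.random()  # uniform in [0, 1), so 1 - u is in (0, 1]
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))


rng = random.Random(7)  # fixed seed so the run is repeatable
alpha, x_min = 2.5, 1.0  # arbitrary illustrative parameters
samples = [sample_power_law(alpha, x_min, rng) for _ in range(100_000)]

# Check the tail: P(X > t) should equal (t / x_min) ** -(alpha - 1).
threshold = 2.0 * x_min
empirical_tail = sum(s > threshold for s in samples) / len(samples)
theoretical_tail = (threshold / x_min) ** (-(alpha - 1.0))

print("empirical   P(X > 2 x_min):", empirical_tail)
print("theoretical P(X > 2 x_min):", theoretical_tail)
print("largest sample:", max(samples))
```

The largest sample is usually far bigger than the typical one, which is the heavy tail that makes power-law data behave so differently from Gaussian data.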

The mean and variance of a power law distribution are not always finite: the mean is finite only when \alpha > 2 and the variance only when \alpha > 3; otherwise they are infinite. A full explanation is beyond the scope of this article.
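For readers who want to see where the infinite mean comes from, here is a short worked integral based on the PDF above. It is a sketch of the standard calculation, assuming x_{min} > 0 and \alpha > 1.

```latex
E[X] = \int_{x_{min}}^{\infty} x \, p(x) \, dx
     = (\alpha - 1) \, x_{min}^{\alpha - 1} \int_{x_{min}}^{\infty} x^{1 - \alpha} \, dx

% For \alpha > 2 the integral converges:
\int_{x_{min}}^{\infty} x^{1 - \alpha} \, dx = \frac{x_{min}^{2 - \alpha}}{\alpha - 2}
\quad \Rightarrow \quad
E[X] = \frac{\alpha - 1}{\alpha - 2} \, x_{min}

% For 1 < \alpha \leq 2 the integral diverges, so the mean is infinite.
```

The same argument applied to E[X^{2}] needs \alpha > 3 to converge, which is why the variance is infinite for smaller values of \alpha.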

Application of Power Law Distribution

  1. Incomes of people are power-law distributed.
  2. Sizes of organizations.
  3. Magnitudes of earthquakes.

These are some of the most popular distributions used in the field of Machine Learning, but to be frank, understanding some of them requires a bit of math study. We will be covering discrete distributions in the next article, so stay tuned.

“Happy Machine Learning”

About the author


I write blogs about Machine Learning and data science

By abhinavsinghml
