Importance of randomness in machine learning

Random numbers are an important part of machine learning. Many models start from randomly chosen values and only then learn to predict an output from the input data. To understand how a model's coefficients behave, you need to understand why randomness matters and how to generate it.

After reading this article you will know:

  • Why randomness is important.
  • How it is used in machine learning.
  • How to generate it.

Random numbers do not favor any particular value or ordering, which makes them useful when we want a system to be robust and free of systematic bias. It is extremely difficult to eliminate human bias from an algorithm, but using random numbers makes the algorithm less prone to bias in the data.

If you are unfamiliar with the term, a bias can be thought of as a force that pulls the algorithm away from its actual goal, or that lets it reach the goal only from a certain direction instead of exploring all possible paths. Exploring more paths is what improves the algorithm's ability to generalize.

How does randomness help?

Consider a neural network whose coefficients are all initialized to zero instead of to random numbers. Every neuron in a layer then computes the same output and receives the same update during training, so the neurons never become different from one another and the network loses most of the expressive power of a non-linear model. This is just one of many scenarios where random numbers play an important role.
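The effect of zero initialization can be sketched with a tiny two-layer network. This is only an illustration under assumptions that are not in the article: the network shape, the sigmoid activation, the squared-error loss, the learning rate and the helper name train_hidden_weights are all made up for the demo.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_hidden_weights(W1, W2, X, y, steps=50, lr=0.1):
    """Hypothetical helper: run plain gradient descent on a tiny
    2-layer network and return the trained input-to-hidden weights."""
    W1, W2 = W1.copy(), W2.copy()
    for _ in range(steps):
        h = sigmoid(X @ W1)          # hidden activations, shape (n, hidden)
        pred = h @ W2                # linear output, shape (n, 1)
        err = (pred - y) / len(X)    # gradient of half the mean squared error
        grad_W2 = h.T @ err
        grad_h = err @ W2.T
        grad_W1 = X.T @ (grad_h * h * (1.0 - h))
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1

rng = np.random.default_rng(0)       # toy data, seed chosen arbitrarily
X = rng.normal(size=(20, 3))
y = rng.normal(size=(20, 1))
hidden = 4

W1_zero = train_hidden_weights(np.zeros((3, hidden)), np.zeros((hidden, 1)), X, y)
W1_rand = train_hidden_weights(rng.normal(size=(3, hidden)),
                               rng.normal(size=(hidden, 1)), X, y)

# Each column of W1 holds the incoming weights of one hidden unit.
zero_units_identical = bool(np.allclose(W1_zero, W1_zero[:, :1]))
rand_units_identical = bool(np.allclose(W1_rand, W1_rand[:, :1]))
print("zero init, hidden units identical:", zero_units_identical)
print("random init, hidden units identical:", rand_units_identical)
```

With zero initialization every hidden unit ends up with the same weights, so four units do the work of one. Random initialization breaks that symmetry from the first step.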

Another use of randomness is shuffling the data with a random shuffler before splitting it into training and test sets. Shuffling is extremely important because data that is arranged in some order, for example sorted by one feature, makes the algorithm perform poorly.
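A small sketch of why ordered data is a problem, using a made-up dataset of ten samples sorted by label. The dataset, the split helper and the seed value 42 are assumptions for illustration only.

```python
import random

# Hypothetical toy dataset: (sample id, label), sorted by label.
samples = [(i, 0) for i in range(5)] + [(i, 1) for i in range(5, 10)]

def split(data, test_size=3):
    """Hold out the last `test_size` items as the test set."""
    return data[:-test_size], data[-test_size:]

# Without shuffling, the test set contains only one class.
_, test_sorted = split(samples)
labels_sorted = {label for _, label in test_sorted}
print("labels in unshuffled test set:", labels_sorted)

# Shuffle a copy with a seeded generator, then split.
shuffled = samples[:]
random.Random(42).shuffle(shuffled)
train_shuffled, test_shuffled = split(shuffled)
print("shuffled test set:", test_shuffled)
```

The unshuffled split tests the model only on label 1. After shuffling, which samples land in the test set no longer depends on how the data happened to be ordered.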

How to inject randomness into our model?

Injecting randomness into a model, whether as random initialization of weights or as a random shuffle, is done with a pseudorandom number generator.

A random number generator can be thought of as a function that produces numbers. Generating true random numbers is beyond the scope of this article and is rarely needed in machine learning. Instead we use pseudorandom numbers: numbers that look random but are generated by a deterministic process.
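To make "deterministic process" concrete, here is a minimal sketch of one classic kind of pseudorandom generator, a linear congruential generator. It is a toy written for this illustration (the class name TinyLCG is made up) and it is not the algorithm that Python's random module uses, which is the Mersenne Twister.

```python
class TinyLCG:
    """Toy linear congruential generator, for illustration only."""

    def __init__(self, seed):
        self.state = seed % 2**32

    def next_float(self):
        # Each new state is computed from the previous one with fixed
        # arithmetic. The constants are a widely published parameter set.
        self.state = (1664525 * self.state + 1013904223) % 2**32
        return self.state / 2**32    # scale to the range [0, 1)

gen_a = TinyLCG(seed=123)
gen_b = TinyLCG(seed=123)
seq_a = [gen_a.next_float() for _ in range(5)]
seq_b = [gen_b.next_float() for _ in range(5)]

print(seq_a)
print("same seed, same sequence:", seq_a == seq_b)
```

Nothing in the class is random. The numbers only look random because each state is scrambled by the multiplication and the modulo.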

Randomness can also be generated by sampling from a probability distribution; more on this later.

A pseudorandom number generator is usually a function that you call after providing a seed, the value from which it starts generating its sequence. If the caller does not provide a seed, the generator typically seeds itself from the current time or from another changing source of data. Although the numbers are produced one after another by a deterministic process, so the same seed always yields the same sequence, the sequence shows no obvious pattern and is hard to guess without knowing the seed.
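The role of the seed can be checked with Python's built-in random module. The helper name draw and the seed values are arbitrary choices for this sketch.

```python
import random

def draw(seed=None, n=5):
    """Return n integers from 1 to 100 from a generator started at `seed`."""
    rng = random.Random(seed)   # seed=None lets Python pick its own seed
    return [rng.randint(1, 100) for _ in range(n)]

first = draw(seed=7)
second = draw(seed=7)
unseeded = draw()               # usually different on every run

print(first)
print("same seed gives same sequence:", first == second)
print(unseeded)
```

Fixing the seed is what makes an experiment repeatable: running the script again reproduces exactly the same "random" numbers.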

Generating random numbers in Python

Python has a built-in module named random that is used to generate random numbers. See the example below.

import random

# randint includes both endpoints, so this returns an integer from 1 to 100
print("A random integer between 1-100:", random.randint(1, 100))

This code prints a random integer in the range 1-100, with both ends included.

If you work in machine learning, you have probably used NumPy, one of the most popular numerical computation libraries. It contains a module named random that is responsible for generating random numbers. You may wonder why to use NumPy when Python already has a random number generator. The answer is that NumPy is highly optimized for almost all numerical computation, whether it is matrix multiplication or broadcasting operations, and it can generate whole arrays of random numbers in a single call. Generating random numbers with NumPy is a straightforward process.

import numpy as np

# the upper bound is exclusive, so this returns an integer from 0 to 100
print('A random integer between 0-100', np.random.randint(0, 101))

This generates a random integer between 0 and 100.

import numpy as np

# size=1 returns an array holding one floating-point sample
print('A random number from a normal distribution with mean 0 and std dev 1', np.random.normal(loc=0, scale=1, size=1))

This generates a random floating-point number from a normal distribution with mean 0 and standard deviation 1.
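The two uses discussed earlier, random weight initialization and random shuffling, can both be sketched with NumPy's seeded Generator object. The layer shape, the scale of 0.01 and the seed are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily

# Random initialization: small normally distributed weights for a 3x4 layer.
weights = rng.normal(loc=0.0, scale=0.01, size=(3, 4))

# Random shuffle: a random ordering of 10 row indices.
order = rng.permutation(10)

# Ten random integers from 0 to 100 (the upper bound is exclusive).
ints = rng.integers(0, 101, size=10)

print(weights)
print(order)
print(ints)
```

Passing a size lets NumPy fill a whole array in one call, which is the main practical advantage over calling Python's random module in a loop.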

Random numbers are extremely important when working on machine learning problems because they help improve the generalization ability of the model. We will be covering many more interesting machine learning topics, so stay tuned.
