Probability Theory in ML

Random Experiment:

In machine learning, we mostly deal with uncertain events. A random experiment is an experiment whose outcome is not known with certainty; that is, its outcome cannot be predicted in advance.

Sample Space:

The sample space is the universal set of all possible outcomes of an experiment. It is usually denoted by the letter “S”, and its individual outcomes are called elementary events. The sample space can be finite or infinite. A few random experiments and their sample spaces are listed below:

Experiment: Outcome of a college application

Sample Space = S = {admitted, not admitted}

Experiment: Predicting customer churn at an individual customer level

Sample Space = S = {Churn, No Churn}

Experiment: Television Rating Point (TRP) for a television program

Sample Space = S = {X | X ∈ R, 0 ≤ X ≤ 100}; that is, X is a real number that can take any value between 0% and 100%.
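The sample spaces above can be written down directly in Python. This is only a minimal sketch: the names admission_space, churn_space, and in_trp_space are illustrative and do not come from any library. The finite sample spaces are modeled as sets, while the TRP sample space is infinite and cannot be enumerated, so it is modeled as a membership test instead.

```python
# Finite sample spaces: each set lists every elementary event.
admission_space = {"admitted", "not admitted"}
churn_space = {"Churn", "No Churn"}


def in_trp_space(x: float) -> bool:
    """Return True if x is a valid TRP value, i.e. a real number in [0, 100]."""
    return isinstance(x, (int, float)) and 0 <= x <= 100


print(len(churn_space))      # 2
print(in_trp_space(42.5))    # True
print(in_trp_space(120))     # False
```

A set works for the first two experiments because every outcome can be listed; for the TRP experiment, only a rule describing which values belong to S can be written.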


Event:

An event (E) is a subset of the sample space, and probability is usually calculated with respect to an event. Examples of events include:

  • 1. The proportion of orders cancelled at an e-commerce portal exceeding 10%.
  • 2. The proportion of fraudulent credit card transactions exceeding 1%.
  • 3. The life of a piece of capital equipment being less than one year.
  • 4. The number of warranty claims being less than 10 for a vehicle manufacturer with a fleet of 2000 vehicles under warranty.
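To make the idea of an event as a subset concrete, the warranty-claims example can be sketched in Python. The per-vehicle claim probability (CLAIM_PROBABILITY) and the assumption that vehicles claim independently are hypothetical choices made for illustration only; they are not given in the text. The function names are likewise illustrative.

```python
import random

FLEET_SIZE = 2000           # from the example: vehicles under warranty
CLAIM_PROBABILITY = 0.005   # hypothetical per-vehicle claim probability

# Sample space: every possible number of warranty claims, 0 through 2000.
sample_space = set(range(FLEET_SIZE + 1))

# Event E: fewer than 10 claims. An event is a subset of the sample space.
event = {k for k in sample_space if k < 10}


def simulate_claims(rng: random.Random) -> int:
    """Run the random experiment once and return the number of claims."""
    return sum(rng.random() < CLAIM_PROBABILITY for _ in range(FLEET_SIZE))


def estimate_probability(trials: int = 1000, seed: int = 0) -> float:
    """Estimate P(E) as the fraction of trials whose outcome falls in E."""
    rng = random.Random(seed)
    hits = sum(simulate_claims(rng) in event for _ in range(trials))
    return hits / trials


print(event <= sample_space)   # True: E is a subset of S
print(estimate_probability())  # an estimate of P(E) under the assumptions above
```

The estimate depends entirely on the assumed claim probability; with real warranty data, that value would be replaced by an observed claim rate.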

Random Variables

Random variables play an important role in describing, measuring, and analyzing uncertain events such as customer churn, employee attrition, and demand for a product. A random variable is a function that maps every outcome in the sample space to a real number. A random variable is classified as discrete or continuous depending on the values it can take.

If the random variable X can assume only a finite or countably infinite set of values, then it is called a discrete random variable. Examples of discrete random variables are as follows:

  • 1. Credit rating
  • 2. Number of orders received at an e-commerce retailer, which can be countably infinite
  • 3. Customer churn
  • 4. Fraud (the random variable takes binary values: (a) fraudulent transaction and (b) genuine transaction)
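The definition of a random variable as a function from outcomes to real numbers can be illustrated with the customer churn example. This is a sketch under an assumed encoding: mapping "Churn" to 1 and "No Churn" to 0 is a common convention, not something fixed by the text, and churn_indicator is a hypothetical name.

```python
# Sample space for the churn experiment.
churn_space = {"Churn", "No Churn"}


def churn_indicator(outcome: str) -> int:
    """Random variable X: map a churn outcome to 1 (Churn) or 0 (No Churn)."""
    if outcome not in churn_space:
        raise ValueError(f"{outcome!r} is not in the sample space")
    return 1 if outcome == "Churn" else 0


# The set of values X can take is finite, so X is a discrete random variable.
values = {churn_indicator(outcome) for outcome in churn_space}

print(churn_indicator("Churn"))     # 1
print(churn_indicator("No Churn"))  # 0
print(sorted(values))               # [0, 1]
```

The fraud example works the same way, with the two outcomes "fraudulent transaction" and "genuine transaction" mapped to two numbers.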

About the Author

Silan Software is one of India's leading providers of offline and online training in Java, Python, AI (machine learning, deep learning), data science, software development, and many other emerging technologies.
