Probability: The Basics

Probability is the branch of mathematics that helps us measure and understand uncertainty. It answers questions like: "How likely is it to rain tomorrow?" or "What are the chances of getting heads when I flip a coin?"

I. Key Terms in Probability

Trial:

Experiment:

Outcome:

Sample Space (Ω):

➛ Each outcome is an element of the sample space: $\text{Outcome} \in \Omega$

Random Experiment:

Event:

Probability of an Event (P(E)):

Complementary Events

Continuous Probability

Why is Probability Important?

How are statistics and probability connected?

  • Statistics helps us interpret the results of testing machine learning models and make better decisions.
  • Probability provides the foundation for understanding statistics and its tools.


II. The Rules (Axioms) of Probability

These form the foundation for calculating and understanding probabilities.
[Figure: set-1.png. Image source: https://fiveable.me]

These are the basic rules that all probabilities must follow:
  1. Non-Negativity:
    • The probability of any event E is always between 0 and 1 (inclusive).
    • There are no events that can have negative probability: $0 \le P(E) \le 1$
  2. Total Probability:
    • The probability of the entire sample space Ω, which includes all possible outcomes, is always 1.
    • This means some outcome in the sample space is guaranteed to happen: $P(\Omega) = 1$
  3. Additivity:
    • If two events cannot happen at the same time (mutually exclusive), then P(A or B)=P(A)+P(B)
    • Example: When flipping a coin, getting heads and getting tails are mutually exclusive. P(H or T)=1
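The three rules can be checked mechanically on a small finite example. The sketch below assumes a fair six-sided die as the distribution; `prob` and `satisfies_axioms` are helper names made up for this note. Exact `Fraction` arithmetic is used so that "sums to exactly 1" is a meaningful test.

```python
from fractions import Fraction

# Hypothetical example distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event, dist):
    """P(E): add up the probabilities of the outcomes that make up event E."""
    return sum((dist[o] for o in event), Fraction(0))

def satisfies_axioms(dist):
    """Check non-negativity of every outcome and that P(sample space) = 1."""
    non_negative = all(p >= 0 for p in dist.values())
    total_is_one = sum(dist.values()) == 1
    return non_negative and total_is_one

A = {6}      # roll a 6
B = {2, 3}   # roll a 2 or 3, which cannot happen together with A

print(satisfies_axioms(die))                              # True
# Additivity, checked for this one pair of mutually exclusive events
print(prob(A | B, die) == prob(A, die) + prob(B, die))    # True
print(prob(A | B, die))                                   # 1/2
```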

III. Approaches to Probability

There are three main ways to think about probability:

  1. Axiomatic Probability (based on rules/axioms)
  2. Frequentist Probability (based on experiments and data)
  3. Classical Probability (based on equally likely outcomes)

★ Axiomatic Probability

★ Frequentist Probability

★ Classical Probability

| Aspect | Frequentist Probability | Classical Probability |
|---|---|---|
| Definition | Probability is defined as the long-run relative frequency of an event occurring in repeated trials. | Probability is based on the assumption that all outcomes in the sample space are equally likely. |
| Interpretation | Probability is an empirical measure derived from repeated experiments. | Probability is derived theoretically based on symmetry and logical reasoning. |
| Assumption | Assumes an infinite number of trials for probabilities to converge. | Assumes all possible outcomes have the same likelihood. |
| Calculation | Estimated from observed frequencies over repeated trials. | Computed using the formula: $P(A) = \frac{\text{favorable outcomes}}{\text{total outcomes}}$ |
| Example | If a coin is flipped 1,000 times and lands on heads 520 times, the probability of heads is estimated as 520/1000 = 0.52. | The probability of getting heads in a fair coin toss is 1/2, since both outcomes are equally likely. |
| Alignment with Axioms | Follows the axioms of probability but interprets probability through empirical observations. | Fully adheres to axiomatic probability, assuming equal likelihood of all outcomes. |
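A quick simulation makes the contrast in the table concrete: the classical value comes from counting, while the frequentist value is an estimate that should drift toward it as the number of trials grows. This is a sketch that assumes Python's `random` module is a good enough stand-in for a fair coin; the function names are invented for illustration.

```python
import random
from fractions import Fraction

def classical_probability(favorable, sample_space):
    """Classical approach: |favorable| / |sample space| (valid only for equally likely outcomes)."""
    return Fraction(len(favorable), len(sample_space))

def frequentist_estimate(trial, event, n_trials, rng):
    """Frequentist approach: relative frequency of `event` over n_trials repetitions of `trial`."""
    hits = sum(1 for _ in range(n_trials) if trial(rng) in event)
    return hits / n_trials

def flip_fair_coin(rng):
    # Simulated fair coin: assumes rng.choice picks each side with equal chance.
    return rng.choice(["H", "T"])

coin = {"H", "T"}
rng = random.Random(42)  # fixed seed so repeated runs give the same estimates

print("classical:", classical_probability({"H"}, coin))  # classical: 1/2
for n in (10, 1_000, 100_000):
    print(f"frequentist, {n} flips:", frequentist_estimate(flip_fair_coin, {"H"}, n, rng))
```

With only 10 flips the estimate can be far from 0.5; with 100,000 it is typically within a fraction of a percent.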

IV. Counting Principles: How Many Ways?

To find the probability of an event, we first need to count the number of outcomes. Counting plays a key role in determining the size of the sample space and calculating probabilities. By knowing how many ways an event can happen, we can better understand its likelihood.

There are two fundamental rules for counting:

1. The Addition Rule (for "OR")

2. The Multiplication Rule (for "AND")
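In their usual form, the addition rule says that if a choice can be made from one of two non-overlapping groups of sizes m and n, there are m + n ways; the multiplication rule says that if one step can happen in m ways and a second step in n ways, the pair can happen in m × n ways. A minimal Python check by enumeration, using a hypothetical dessert menu for "OR" and a die roll plus a coin flip for "AND":

```python
from itertools import product

# Addition rule ("OR"): choose ONE dessert from either of two disjoint groups.
cakes = {"chocolate", "vanilla", "lemon"}          # hypothetical menu items
pies = {"apple", "cherry", "pecan", "pumpkin"}
assert cakes.isdisjoint(pies)                      # the rule requires non-overlapping groups
desserts = cakes | pies
print(len(desserts), len(cakes) + len(pies))       # 7 7

# Multiplication rule ("AND"): roll a die AND flip a coin.
die = range(1, 7)
coin = ["H", "T"]
outcomes = list(product(die, coin))                # every (face, side) pair
print(len(outcomes), len(die) * len(coin))         # 12 12
```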

I. Types of Counting

Based on the multiplication rule, we can handle more complex scenarios. The key questions to ask are: "Does order matter?" and "Can I reuse items?"

★ Exponents (Counting with Replacement)
★ Factorials (Counting without Replacement)
★ Permutations (Order Matters, Without Replacement)
★ Combinations (Order Does NOT Matter, Without Replacement)
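The four counting formulas can be compared side by side with Python's `math` module (`math.perm` and `math.comb` need Python 3.8 or newer), and each result can be confirmed by listing the arrangements with `itertools`. The five items and the choice k = 3 are arbitrary values picked for illustration.

```python
import math
from itertools import combinations, permutations, product

n, k = 5, 3
items = "ABCDE"  # 5 distinct items (hypothetical)

with_replacement = n ** k          # exponents: k picks, reuse allowed, order matters
arrangements = math.factorial(n)   # factorials: order all n items, no reuse
ordered = math.perm(n, k)          # permutations: n! / (n - k)!
unordered = math.comb(n, k)        # combinations: n! / (k! (n - k)!)

# Brute-force check: list every possibility and count it.
assert with_replacement == len(list(product(items, repeat=k)))
assert arrangements == len(list(permutations(items)))
assert ordered == len(list(permutations(items, k)))
assert unordered == len(list(combinations(items, k)))

print(with_replacement, arrangements, ordered, unordered)  # 125 120 60 10
```

Going from permutations to combinations divides by k! because each unordered group of k items appears k! times among the ordered picks.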

II. The Inclusion-Exclusion Principle

What if events are NOT mutually exclusive? If we simply add their probabilities, we double-count the overlap. The Inclusion-Exclusion Principle corrects this: to find the probability of the union of overlapping events, it subtracts the over-counted intersection.

$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$, that is, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Example:
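In a class of 30 students, 15 play soccer, 10 play basketball, and 5 play both. What is the probability that a randomly chosen student plays soccer or basketball?
With S = {plays soccer} and B = {plays basketball}: $P(S \cup B) = \frac{15}{30} + \frac{10}{30} - \frac{5}{30} = \frac{20}{30} = \frac{2}{3} \approx 0.67$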
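The principle can also be checked by brute force: build the two groups as sets, count the union directly, and compare with the formula. The student ID numbers below are hypothetical, chosen only so the group sizes match the example (15 soccer, 10 basketball, 5 both, 30 in total).

```python
from fractions import Fraction

# Hypothetical student IDs arranged to match the example's counts.
students = set(range(30))
soccer = set(range(0, 15))        # 15 students
basketball = set(range(10, 20))   # 10 students; IDs 10-14 play both

def p(event):
    """Probability when every student is equally likely to be chosen."""
    return Fraction(len(event), len(students))

direct = p(soccer | basketball)                               # count the union itself
formula = p(soccer) + p(basketball) - p(soccer & basketball)  # inclusion-exclusion
naive = p(soccer) + p(basketball)                             # double-counts the 5 overlapping students

print(direct, formula, naive)  # 2/3 2/3 5/6
```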


V. Advanced Probability Concepts

1. Independent Events:

2. Mutually Exclusive Events:

3. Comparison Example:

Let A = {roll a 6}, B = {roll a 2 or 3} on a die.

Case study

Simple case: Rolling a fair die, let event A = {roll a 6} and let B = {roll a 2 or 3}.

  • $P(A) = \frac{1}{6}$
  • $P(B) = \frac{1}{3}$
  • $P(A \text{ and } B) = P(A \cap B) = 0$
  • $P(A \text{ or } B) = P(A \cup B) = \frac{1}{2}$

Question: Are the events independent, mutually exclusive, or neither?

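  1. As per the definition of independence, $P(A \cap B) = P(A) \cdot P(B)$.
  • $P(A) \cdot P(B) = \frac{1}{6} \cdot \frac{1}{3} = \frac{1}{18} \neq 0 = P(A \cap B)$
  • Events A and B are NOT independent.
  2. As per the definition of mutually exclusive events, $P(A \cap B) = 0$, so the inclusion-exclusion formula reduces to $P(A \cup B) = P(A) + P(B)$.
  • $P(A) + P(B) - P(A \cap B) = \frac{1}{6} + \frac{1}{3} - 0 = \frac{1}{2} = P(A \cup B)$
  • Since $P(A \cap B) = 0$, the events are mutually exclusive.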
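Both tests from the case study can be run on the die's outcomes directly. This sketch treats events as Python sets over the sample space {1, ..., 6}; `P` is a helper defined here for equally likely outcomes.

```python
from fractions import Fraction

omega = set(range(1, 7))  # fair die: six equally likely outcomes
A = {6}                   # roll a 6
B = {2, 3}                # roll a 2 or 3

def P(event):
    """Classical probability of an event (a subset of omega)."""
    return Fraction(len(event), len(omega))

mutually_exclusive = P(A & B) == 0
independent = P(A & B) == P(A) * P(B)

print(P(A), P(B), P(A & B), P(A | B))   # 1/6 1/3 0 1/2
print(mutually_exclusive, independent)  # True False
```

The result generalizes: if two events are mutually exclusive and both have positive probability, then $P(A) \cdot P(B) > 0 = P(A \cap B)$, so they can never be independent.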

VI. Conditional Probability

Conditional probability is the chance that something happens, given that something else has already happened.
Formula:

$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$

Example:
If 30% of students play football, and 10% play both football and basketball, then the probability that a student plays basketball given that they play football is $P(B \mid F) = \frac{0.10}{0.30} \approx 0.33$.
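The formula translates into a one-line function. This is a minimal sketch; `conditional` is a name chosen for this note, and it refuses to divide when $P(B) = 0$ because the conditional probability is undefined there. Fractions keep the football example exact.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b

p_f = Fraction(30, 100)        # P(F): plays football
p_b_and_f = Fraction(10, 100)  # P(B ∩ F): plays both

print(conditional(p_b_and_f, p_f))  # 1/3
```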

Note

$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$

This identity is the basis of Bayes' Theorem (and of the Naive Bayes classifier built on it).
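Dividing both sides of the identity $P(B \mid A)\,P(A) = P(A \mid B)\,P(B)$ by $P(B)$, assuming $P(B) > 0$, gives Bayes' theorem in a single step:

$$
P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$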

Bayes' Theorem:
This is a special formula for "flipping" conditional probabilities:
$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$


VII. Partition and The Law of Total Probability

Sometimes, it's hard to calculate the probability of an event directly. The Law of Total Probability gives us a clever way to find it by breaking the problem into smaller, easier pieces.

★ What is a Partition?

A partition is a way of dividing the entire sample space into several events that are:

  1. Mutually Exclusive: None of the events overlap.
  2. Exhaustive: Together, the events cover all possible outcomes.

Think of it like slicing a pizza. Each slice is a separate piece (mutually exclusive), and all the slices together make up the whole pizza (exhaustive).
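The two conditions can be checked directly on sets. In this sketch `is_partition` is a hypothetical helper: it tests that no two pieces overlap (mutually exclusive) and that together they cover the whole sample space (exhaustive), using a die roll as the sample space.

```python
from itertools import combinations

def is_partition(blocks, omega):
    """True if the blocks are pairwise disjoint and their union is omega."""
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    exhaustive = set().union(*blocks) == omega
    return pairwise_disjoint and exhaustive

omega = set(range(1, 7))  # outcomes of a die roll

print(is_partition([{1, 2}, {3, 4}, {5, 6}], omega))   # True
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], omega))  # False: 3 is in both pieces
print(is_partition([{1, 2}, {3, 4}], omega))           # False: 5 and 6 are missing
```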

★ The Law of Total Probability

This law states that you can find the probability of an event (let's call it A) by considering how it behaves within each piece of the partition.

$P(A) = P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \dots + P(A \mid B_n)\,P(B_n) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$

In simple terms: The total probability of A is a weighted average of its conditional probabilities, where the weights are the probabilities of each event in the partition.
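As a weighted average, the law is a short sum over the partition. The sketch below reuses the numbers from the "Positive Test" example in Section VIII (partition {Disease, Healthy}); `total_probability` is a name invented here, and it only checks that the weights sum to 1, so confirming that the events really form a partition is left to the caller.

```python
import math

def total_probability(likelihoods, priors):
    """P(A) = sum over i of P(A|B_i) * P(B_i) for a partition B_1, ..., B_n.

    Only the weights are validated (one per event, summing to 1);
    mutual exclusivity and exhaustiveness are assumed, not checked.
    """
    if len(likelihoods) != len(priors):
        raise ValueError("need one likelihood per partition event")
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("partition probabilities must sum to 1")
    return sum(lik * p for lik, p in zip(likelihoods, priors))

priors = [0.01, 0.99]       # P(D), P(H)
likelihoods = [0.99, 0.02]  # P(T+|D), P(T+|H)
print(round(total_probability(likelihoods, priors), 4))  # 0.0297
```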

Example 1: Choosing a Marble

Imagine two bags of marbles.

You flip a fair coin to decide which bag to draw from. What is the probability you pick a red marble?

So, the overall probability of picking a red marble is 45%.

Example 2: Student Study Habits

At a school, students are either "Full-Time" or "Part-Time".

We also know:

What is the probability that a randomly chosen student studies on the weekend?

So, there is a 62% chance that any given student studies on the weekend. This law is a key ingredient in Bayes' Theorem.


VIII. Bayes' Theorem (Putting It All Together)

Bayes' theorem helps us update our beliefs when we get new information. It connects conditional probabilities in a powerful way.

Formula:

$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$

Key Terms:

  1. Prior Probability (P(A)):
    • What you believed about A before seeing new evidence B.
    • Example: Suppose 5% of the population has a certain disease. Before testing, your prior belief about someone having the disease is 5%.
  2. Likelihood (P(B|A)):
    • How likely is the evidence B if your belief A is true?
    • Example: A medical test correctly identifies the disease 99% of the time (i.e., if someone has the disease, the test will be positive 99% of the time).
  3. Posterior Probability (P(A|B)):
    • What you should believe about A after seeing the evidence B.
  4. Marginal Likelihood (P(B)):
    • The overall probability of observing the evidence B. This is often calculated using the Law of Total Probability!

Example: The "Positive Test" Problem Revisited

Suppose 1% of people have a certain disease. A test for it is 99% accurate for sick people, but has a 2% false positive rate (healthy people who test positive). If you test positive, what is the chance you actually have the disease?

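Step 1: Define Events
  • D: the person has the disease
  • H: the person is healthy
  • T+: the person tests positive

Step 2: List the Probabilities
  • Prior: $P(D) = 0.01$, so $P(H) = 1 - 0.01 = 0.99$
  • Likelihood (sick people who test positive): $P(T^+ \mid D) = 0.99$
  • False positive rate: $P(T^+ \mid H) = 0.02$

Step 3: Calculate P(T+) using the Law of Total Probability
The sample space is partitioned into "Disease" and "Healthy".
$P(T^+) = P(T^+ \mid D)\,P(D) + P(T^+ \mid H)\,P(H) = 0.99 \times 0.01 + 0.02 \times 0.99 = 0.0099 + 0.0198 = 0.0297$

Step 4: Apply Bayes' Theorem
We want to find P(D|T+), the probability of having the disease given a positive test.
$P(D \mid T^+) = \frac{P(T^+ \mid D)\,P(D)}{P(T^+)} = \frac{0.0099}{0.0297} = \frac{1}{3} \approx 0.333$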

So, even with a positive test, the chance you have the disease is only about 33.3%! This surprising result happens because the disease is rare.
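The four steps fit in one small function for a two-event partition {A, not A}. This is a sketch under that two-event assumption, with `bayes_posterior` as an illustrative name; the evidence term is computed with the Law of Total Probability exactly as in Step 3, and Fractions keep the answer exact.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, likelihood_if_not):
    """P(A|B) for the partition {A, not A}.

    prior             = P(A)
    likelihood        = P(B|A)
    likelihood_if_not = P(B|not A)
    """
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)  # P(B), law of total probability
    return likelihood * prior / evidence

# Positive-test example: P(D) = 1%, P(T+|D) = 99%, P(T+|H) = 2%
posterior = bayes_posterior(Fraction(1, 100), Fraction(99, 100), Fraction(2, 100))
print(posterior, float(posterior))  # 1/3 0.3333333333333333

# Same test, but a disease that affects 10% of people
print(bayes_posterior(Fraction(1, 10), Fraction(99, 100), Fraction(2, 100)))  # 11/13
```

Raising the prior from 1% to 10% lifts the posterior from 1/3 to 11/13 (about 85%), which shows how strongly the rarity of the disease drives the surprising result.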

Example 2: The Defective Bolt Problem

In a factory, three machines, M1, M2, and M3, produce 2000, 2500, and 4000 bolts daily, respectively. Their defect rates are:

A bolt is drawn randomly from a day's production and found to be defective. What is the probability it was produced by machine M2?

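Step 1: Define Events
Let M1, M2, and M3 be the events that the bolt was produced by machine M1, M2, or M3, and let D be the event that the bolt is defective.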

We want to find P(M2|D).

Step 2: List the Probabilities
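First, let's find the prior probabilities of a bolt coming from each machine. The total daily production is 2000 + 2500 + 4000 = 8500 bolts, so:
  • $P(M_1) = \frac{2000}{8500} \approx 0.235$
  • $P(M_2) = \frac{2500}{8500} \approx 0.294$
  • $P(M_3) = \frac{4000}{8500} \approx 0.471$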

Next, list the likelihoods (the defect rates for each machine).

Step 3: Calculate P(D) using the Law of Total Probability
The events M1, M2, and M3 form a partition of the sample space.

The overall probability of picking a defective bolt is about 3.06%.

Step 4: Apply Bayes' Theorem
Now we can calculate the posterior probability, P(M2|D).

Conclusion:
The probability that the defective bolt came from machine M2 is approximately 38.5%.

Why is Bayes' Theorem Important?


IX. Key Takeaways