📊 Statistics: From Basics to Advanced

Welcome to your comprehensive guide to Statistics! This document combines bedrock concepts with advanced formulas to help you understand not just how to calculate, but why these measures matter.


🟢 Level 1: The Basics (Central Tendency)

These measures help us find the "center" or "typical" value in a dataset.

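| Concept | Definition | Formula | Example (Data: 3, 5, 5, 7, 10, 12, 14) |
|---|---|---|---|
| Mean ($\mu$) | The average (balance point). | $\frac{\sum x_i}{n}$ | $\frac{56}{7} = 8$ |
| Median | The middle value of the sorted data (average the two middle values when $n$ is even). | Sort & find the middle | 7 |
| Mode | The most frequent value. | Count occurrences | 5 (appears twice) |
| Midrange | Halfway between the smallest and largest values. | $\frac{\text{Max} + \text{Min}}{2}$ | $\frac{14 + 3}{2} = 8.5$ |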
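To double-check the table, here is a minimal sketch using Python's built-in `statistics` module on the same example data. The module has no midrange function, so that one is computed by hand.

```python
import statistics

# Example data from the table
data = [3, 5, 5, 7, 10, 12, 14]

mean = statistics.mean(data)            # sum of values / count = 56 / 7
median = statistics.median(data)        # middle value of the sorted data
mode = statistics.mode(data)            # most frequent value
midrange = (max(data) + min(data)) / 2  # halfway between the extremes

print(f"mean={mean}, median={median}, mode={mode}, midrange={midrange}")
```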

🟡 Level 2: Measuring the "Spread" (Dispersion)

Knowing the center isn't enough; we need to know if the data is tightly packed or widely scattered.

1. Range & IQR
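The range is simply Max − Min, while the interquartile range (IQR) is Q3 − Q1, the width of the middle 50% of the data. A minimal sketch on the Level 1 example data, assuming the quartile convention used by Python's `statistics.quantiles` (its default `method="exclusive"`):

```python
import statistics

data = [3, 5, 5, 7, 10, 12, 14]  # Level 1 example data

data_range = max(data) - min(data)  # Max - Min = 14 - 3

# quantiles(n=4) returns the three cut points [Q1, Q2, Q3].
# The default method="exclusive" is one of several quartile conventions.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

print(f"range={data_range}, Q1={q1}, Q3={q3}, IQR={iqr}")
```

Textbooks and tools disagree on how to compute quartiles: for this data the exclusive method gives Q1 = 5 and Q3 = 12 (IQR = 7), while `method="inclusive"` gives Q1 = 5 and Q3 = 11 (IQR = 6). Because the IQR ignores the outer quarters of the data, it is much less sensitive to outliers than the range.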

2. Standard Deviation (σ) & Variance (σ²)

Standard deviation is the "gold standard" measure of the spread of a data distribution. The more spread out a data distribution is, the greater its standard deviation.

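| Type | Formula | When to use? |
|---|---|---|
| Population | $\sigma^2 = \frac{\sum (x_i - \mu)^2}{n}$ | When you have data for the entire group. |
| Sample | $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ | When you only have data for part of the group (a sample) and use it to estimate the variance of the whole group. |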

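Note: For samples we divide by n − 1 instead of n (Bessel's Correction). Deviations measured from the sample mean x̄ tend to be smaller than deviations from the true mean μ, so dividing by n would systematically underestimate the population variance; dividing by n − 1 makes s² an unbiased estimate of σ².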
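A minimal sketch comparing the two formulas on the Level 1 example data. The helper names `population_variance` and `sample_variance` are illustrative; the results are cross-checked against the standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n − 1).

```python
import math
import statistics

data = [3, 5, 5, 7, 10, 12, 14]  # Level 1 example data

def population_variance(xs):
    """sigma^2: mean squared deviation from the mean, dividing by n."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """s^2: squared deviations from the sample mean, divided by n - 1 (Bessel's correction)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

pop_var = population_variance(data)  # 100 / 7
samp_var = sample_variance(data)     # 100 / 6

print(f"population variance = {pop_var:.4f}, std dev = {math.sqrt(pop_var):.4f}")
print(f"sample variance     = {samp_var:.4f}, std dev = {math.sqrt(samp_var):.4f}")

# Cross-check against the standard library
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
```

The sample version is always a little larger than the population version for the same numbers, and the gap shrinks as n grows.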


πŸ”΅ Level 3: Position & The Normal Distribution

How does one specific data point compare to the rest?

1. Z-Score (Standardization)

The Z-score tells you exactly how many standard deviations a point lies above (positive Z) or below (negative Z) the mean.

$$Z = \frac{\text{data point} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$$
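A minimal sketch that standardizes the Level 1 example data, treating those seven values as the whole population (so σ comes from `statistics.pstdev`). The helper name `z_score` is just illustrative.

```python
import statistics

data = [3, 5, 5, 7, 10, 12, 14]  # Level 1 example data
mu = statistics.fmean(data)       # population mean, 8.0
sigma = statistics.pstdev(data)   # population standard deviation, about 3.78

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z_scores = [z_score(x, mu, sigma) for x in data]
for x, z in zip(data, z_scores):
    print(f"x = {x:>2}  z = {z:+.2f}")
```

Standardized values always average to zero, which makes Z-scores from different datasets directly comparable.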

2. The Empirical Rule (68-95-99.7)

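In a Normal Distribution, approximately:

- 68% of values fall within 1 standard deviation of the mean (μ ± σ).
- 95% fall within 2 standard deviations (μ ± 2σ).
- 99.7% fall within 3 standard deviations (μ ± 3σ).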
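The 68-95-99.7 figures are rounded areas under the normal curve. A minimal sketch that recovers them from the standard normal CDF using Python's `statistics.NormalDist`; the helper `share_within` is an illustrative name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k}σ: {share_within(k):.4f}")
```

This prints about 0.6827, 0.9545 and 0.9973, i.e. the 68%, 95% and 99.7% of the rule.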


πŸ”΄ Level 4: Advanced Relationships (Bivariate & Inference)

Moving from one variable (x) to two (x and y).

1. Covariance & Correlation (r)

For in-depth coverage, refer to Residuals.
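A minimal sketch of the covariance and correlation formulas from the cheat sheet (population versions, dividing by n). The paired data is hypothetical, made up purely for illustration (hours studied vs. exam score), and the helper names are illustrative.

```python
import math

# Hypothetical paired data for illustration: hours studied (x) and exam score (y)
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: average product of paired deviations (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)

def population_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def correlation(xs, ys):
    """Pearson's r: covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / (population_sd(xs) * population_sd(ys))

print(f"Cov_xy = {covariance(x, y):.2f}, r = {correlation(x, y):.4f}")
```

Python 3.10+ also ships `statistics.covariance` and `statistics.correlation`. Note that `statistics.covariance` divides by n − 1 (sample covariance), whereas the cheat-sheet formula divides by n; the correlation is the same either way because that factor cancels.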

2. Standard Error (SE)

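When we take samples, the means of those samples will vary from one sample to the next. The SE is the standard deviation of those sample means: it tells us how much we expect a sample mean to "wiggle" around the true mean.

$$SE = \frac{\sigma}{\sqrt{n}}$$

Here σ is the population standard deviation and n is the sample size; when σ is unknown, the sample standard deviation s is used in its place.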
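A minimal simulation sketch, assuming a hypothetical normal population with mean 50 and standard deviation 10: it draws 10,000 samples of size n = 25 and compares the actual spread of their means with σ/√n = 2. The seed and population parameters are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the simulation is reproducible

# Hypothetical population: normal with mean 50 and standard deviation 10
POP_MEAN, POP_SD = 50.0, 10.0
SAMPLE_SIZE = 25
NUM_SAMPLES = 10_000

# Draw many samples and record each sample's mean
sample_means = [
    statistics.fmean(rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

predicted_se = POP_SD / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n) = 10 / 5 = 2.0
observed_se = statistics.stdev(sample_means)    # how much the sample means actually "wiggle"

print(f"predicted SE = {predicted_se:.3f}, observed spread of sample means = {observed_se:.3f}")
```

Quadrupling `SAMPLE_SIZE` should roughly halve the observed spread, which is the √n in the denominator at work.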

3. T-Score vs. Z-Score (Confidence Intervals)

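Use these to estimate a range where the "true" population mean likely lives: $\bar{x} \pm (\text{critical value}) \times SE$. Use a z critical value (about 1.96 for 95% confidence) when σ is known; use a t critical value with n − 1 degrees of freedom when σ has to be estimated by s, which matters most for small samples.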
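A minimal sketch of both intervals, treating the Level 1 data as a sample of n = 7 with σ unknown. Python's standard library has no t-distribution, so the t critical value (about 2.447 for 6 degrees of freedom) is hard-coded here as an assumption taken from a standard t-table; with SciPy installed, `scipy.stats.t.ppf(0.975, df=6)` returns the same value.

```python
import math
import statistics
from statistics import NormalDist

data = [3, 5, 5, 7, 10, 12, 14]  # Level 1 example data, treated as a sample
n = len(data)
xbar = statistics.fmean(data)
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # estimated standard error of the mean

# z critical value for 95% confidence (two-sided): about 1.96
z_crit = NormalDist().inv_cdf(0.975)

# t critical value for 95% confidence with n - 1 = 6 degrees of freedom,
# copied from a t-table (assumption); it is only valid for this n.
T_CRIT_DF6 = 2.447

z_interval = (xbar - z_crit * se, xbar + z_crit * se)
t_interval = (xbar - T_CRIT_DF6 * se, xbar + T_CRIT_DF6 * se)

print(f"z-based 95% CI: ({z_interval[0]:.2f}, {z_interval[1]:.2f})")
print(f"t-based 95% CI: ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
```

The t-based interval comes out wider because the t distribution has heavier tails, accounting for the extra uncertainty of estimating σ from only seven values. With σ unknown and n this small, the t interval is the appropriate one; the z version is shown only for comparison.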


πŸ“‹ Quick Formula Cheat Sheet

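| Measure | Formula | What it measures |
|---|---|---|
| Mean Absolute Deviation (MAD) | $MAD = \frac{\sum_{i=1}^{n} \lvert x_i - \mu \rvert}{n}$ | Average distance from the mean (no squaring). |
| Coefficient of Variation ($C_v$) | $\frac{\sigma}{\mu}$ | Relative spread: how large the standard deviation is compared with the mean. |
| Standard Error (SE) | $\frac{\sigma}{\sqrt{n}}$ | The precision of your sample mean. |
| Z-Score | $\frac{x - \mu}{\sigma}$ | Distance from the mean in units of standard deviation. |
| Covariance ($Cov_{xy}$) | $\frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{n}$ | Direction of joint movement: when X goes up, does Y tend to go up too? |
| Correlation ($r$) | $\frac{Cov_{xy}}{\sigma_x \cdot \sigma_y}$ | Strength and direction of the linear relationship, from -1 to 1. |
| R-Squared ($R^2$) | See R-Square | |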
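A minimal sketch computing MAD and the coefficient of variation from the cheat sheet, using the Level 1 example data as a population. CV is only meaningful when the mean is positive (for example, for measurements on a ratio scale).

```python
import statistics

data = [3, 5, 5, 7, 10, 12, 14]  # Level 1 example data
mu = statistics.fmean(data)

# Mean Absolute Deviation: average |x_i - mu|, no squaring
mad = sum(abs(x - mu) for x in data) / len(data)

# Coefficient of Variation: population standard deviation relative to the mean
cv = statistics.pstdev(data) / mu

print(f"MAD = {mad:.3f}, CV = {cv:.1%}")
```

Because MAD does not square the deviations, it gives large deviations less weight than the standard deviation does, so it never exceeds σ.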