Understanding the ANOVA Test Results

How comparing “between” and “within” variation tells us whether group means truly differ

Introduction

Imagine five classrooms, each trying a different study technique. At the end of the term, every student takes the same exam. You look at the average score in each class and wonder: Are these teaching methods really different, or are the average differences just random wiggles?

That question is exactly what ANOVA (Analysis of Variance) answers. ANOVA is a way to compare three or more group means by asking a simple-but-powerful question: Is the variation between group averages big compared to the natural variation within each group?

If “between” variation dwarfs “within” variation, at least one group mean is likely different. If not, the differences you see are probably just noise.

What Does ANOVA Test?

The F-statistic and p-value come from an Analysis of Variance (ANOVA) test, which compares the means of multiple groups to determine whether at least one group mean differs significantly from the others.
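Stated formally, with $\mu_i$ the (unknown) population mean of group $i$ and $k$ groups, the two hypotheses are:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad H_1: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$$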

How ANOVA Works in Feature Selection: A Flowchart

START: Select a numerical feature (e.g., 'study_hours') and a categorical target (e.g., 'result')

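I: Group Data: split the feature's values into one group per target class

II: Calculate Variances: between-group and within-group

III: Compute F-Statistic: the ratio of between-group to within-group variance

IV: Determine Statistical Significance: get the p-value from the F-distribution

V: Rank and Select Features (repeat steps I to IV for all numerical features, then keep the highest-scoring ones)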
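The flowchart can be sketched in Python, with scipy.stats.f_oneway doing steps II to IV for each feature. This is a minimal sketch rather than a production selector: the helper name rank_features_by_anova, the column names, and the toy data are all hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway


def rank_features_by_anova(features, target):
    """Hypothetical helper: rank numerical features by one-way ANOVA F against a categorical target."""
    target = np.asarray(target)
    classes = np.unique(target)
    scores = []
    for name, values in features.items():
        values = np.asarray(values, dtype=float)
        # Step I: split this feature's values into one group per target class
        samples = [values[target == c] for c in classes]
        # Steps II-IV: f_oneway computes the variances, the F-statistic and its p-value
        anova = f_oneway(*samples)
        scores.append((name, float(anova.statistic), float(anova.pvalue)))
    # Step V: the largest F-statistic ranks first
    return sorted(scores, key=lambda row: row[1], reverse=True)


# Hypothetical toy data: two numerical features and a pass/fail target
features = {
    "study_hours": [2, 3, 1, 8, 9, 7, 4, 8],
    "shoe_size": [42, 38, 40, 41, 39, 43, 40, 38],
}
result = ["fail", "fail", "fail", "pass", "pass", "pass", "fail", "pass"]

for name, F, p in rank_features_by_anova(features, result):
    print(f"{name:<12} F = {F:8.3f}  p = {p:.4f}")
```

With this toy data, study_hours separates pass from fail clearly and ranks first, while shoe_size barely differs between the classes and gets an F close to zero. scikit-learn's sklearn.feature_selection.f_classif computes the same per-feature ANOVA F-value if you prefer a library routine.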


I had always wondered about the math behind ANOVA, which is used for feature selection when we have a continuous (numerical) feature and a categorical target variable. So here I attempt to show that math using a single worked example.

The sample data for this example comes from the scipy documentation for f_oneway, which performs a one-way ANOVA test.

[Figure: group means, grand mean, and between- and within-group variation for the five locations (anova_1.webp)]
In the graph above, each group mean is shown as a black dash. The blue dashed line marks the grand mean (the mean of all observations). The red arrows from the grand mean to each group mean show the between-group differences; if the null hypothesis is true (all means equal), those arrows should be tiny. The green dashed lines show the spread within each group.

ANOVA asks: is the red (between-group) variation meaningfully bigger than the green (within-group) variation? If yes, the groups likely have different means.

F-statistic Formula

$$F = \frac{\text{Between-group variance}}{\text{Within-group variance}} = \frac{MS_{between}}{MS_{within}}$$

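Here each mean square (MS) is a sum of squares divided by its degrees of freedom. The steps below compute each piece for the example data.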

★ Step 1: General Stats

  1. Number of Groups (k) and Total Observations (N):
    • k=5 (Tillamook, Newport, Petersburg, Magadan, Tvarminne)
    • N=39 (Total number of observations)
    • $X_{ij}$ is the j-th observation in group i
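A plain-Python sketch of Step 1, storing the five location lists from the scipy example in a dictionary (the name groups and the dictionary layout are illustrative choices, not part of the scipy example):

```python
# Sample data from the scipy.stats.f_oneway example, one list per location
groups = {
    "Tillamook": [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836],
    "Newport": [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725],
    "Petersburg": [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105],
    "Magadan": [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689],
    "Tvarminne": [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045],
}

k = len(groups)  # number of groups
N = sum(len(values) for values in groups.values())  # total number of observations
n = {name: len(values) for name, values in groups.items()}  # group sizes n_i

print(k, N)
print(n)
```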

★ Step 2: Compute Mean

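i. First, we calculate the mean $\bar{X}_i$ of each group (location), where group $i$ has $n_i$ observations:
$$\bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}$$
This gives Tillamook 0.0802, Newport 0.0748, Petersburg 0.10344, Magadan 0.0780 and Tvarminne 0.0957.
ii. Grand Mean ($\bar{X}_G$), the mean of all N observations:
$$\bar{X}_G = \frac{\sum_i \sum_j X_{ij}}{N} = \frac{3.3228}{39} = 0.0852$$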
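The same means can be computed in plain Python. This is a sketch that reuses the location lists from the scipy example; the variable names are illustrative.

```python
# Sample data from the scipy.stats.f_oneway example, one list per location
groups = {
    "Tillamook": [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836],
    "Newport": [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725],
    "Petersburg": [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105],
    "Magadan": [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689],
    "Tvarminne": [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045],
}

# Group mean: average of the observations inside each group
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# Grand mean: average of all N observations pooled together
all_values = [x for v in groups.values() for x in v]
grand_mean = sum(all_values) / len(all_values)

for name, mean in group_means.items():
    print(f"{name:<11} {mean:.5f}")
print(f"Grand mean  {grand_mean:.5f}")
```

Because the groups have different sizes, the grand mean (about 0.0852) is not the plain average of the five group means (about 0.0864). Larger groups pull it harder.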

★ Step 3: Compute Sum of Squares

i. Sum of Squares Between Groups ($SS_{between}$)
$$SS_{between} = \sum_{i=1}^{k} n_i(\bar{X}_i - \bar{X}_G)^2$$
$$= 10(0.0802-0.0852)^2 + 8(0.0748-0.0852)^2 + 7(0.10344-0.0852)^2 + 8(0.0780-0.0852)^2 + 6(0.0957-0.0852)^2$$
$$= 0.00025 + 0.00087 + 0.00233 + 0.00041 + 0.00066 = 0.00452$$

where $n_i$ is the number of observations in group $i$ and $\bar{X}_i$ is the mean of group $i$.
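The same sum in plain Python, printing each group's contribution. This is a sketch that reuses the location lists from the scipy example.

```python
# Sample data from the scipy.stats.f_oneway example, one list per location
groups = {
    "Tillamook": [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836],
    "Newport": [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725],
    "Petersburg": [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105],
    "Magadan": [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689],
    "Tvarminne": [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045],
}

all_values = [x for v in groups.values() for x in v]
grand_mean = sum(all_values) / len(all_values)

# SS_between: for each group, n_i * (group mean - grand mean)^2, summed over groups
ss_between = 0.0
for name, values in groups.items():
    n_i = len(values)
    mean_i = sum(values) / n_i
    contribution = n_i * (mean_i - grand_mean) ** 2
    ss_between += contribution
    print(f"{name:<11} {contribution:.5f}")

print(f"SS_between = {ss_between:.5f}")
```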

ii. Sum of Squares Within Groups ($SS_{within}$)

This measures the "noise" or spread inside each individual group: the sum of squared differences between each observation and its own group's mean.

$$SS_{within} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2 = 0.00539$$
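A plain-Python sketch of the within-group sum, again using the location lists from the scipy example:

```python
# Sample data from the scipy.stats.f_oneway example, one list per location
groups = {
    "Tillamook": [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836],
    "Newport": [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725],
    "Petersburg": [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105],
    "Magadan": [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689],
    "Tvarminne": [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045],
}

# SS_within: squared distance of every observation from its OWN group mean
ss_within = 0.0
for name, values in groups.items():
    mean_i = sum(values) / len(values)
    group_ss = sum((x - mean_i) ** 2 for x in values)
    ss_within += group_ss
    print(f"{name:<11} {group_ss:.5f}")

print(f"SS_within = {ss_within:.5f}")
```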

iii. Total Sum of Squares ($SS_{total}$)
$$SS_{total} = SS_{between} + SS_{within} = 0.00452 + 0.00539 = 0.00991$$
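The split can be checked numerically. Computing $SS_{total}$ directly, as every observation's squared distance from the grand mean, gives the same number as adding the two parts. A sketch on the scipy example data:

```python
# Sample data from the scipy.stats.f_oneway example, one list per location
groups = {
    "Tillamook": [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836],
    "Newport": [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725],
    "Petersburg": [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105],
    "Magadan": [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689],
    "Tvarminne": [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045],
}


def mean(values):
    return sum(values) / len(values)


all_values = [x for v in groups.values() for x in v]
grand_mean = mean(all_values)

# Direct definition: every observation's squared distance from the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# The two parts: between-group and within-group sums of squares
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

print(f"SS_total (direct)      = {ss_total:.5f}")
print(f"SS_between + SS_within = {ss_between + ss_within:.5f}")
```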

★ Step 4: Compute Mean Squares

Degrees of Freedom
  1. $df_{between} = k - 1 = 5 - 1 = 4$ (number of groups minus one)

  2. $df_{within} = N - k = 39 - 5 = 34$ (total observations minus number of groups)

Mean Squares
  1. Between-group variance ($MS_{between}$): measures variability between the group means
$$MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{0.00452}{4} = 0.00113$$
  2. Within-group variance ($MS_{within}$): measures variability within each group
$$MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{0.00539}{34} = 0.000159$$
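Degrees of freedom and mean squares in plain Python. This is a sketch on the scipy example data, with illustrative variable names.

```python
# Sample data from the scipy.stats.f_oneway example, one list per location
groups = {
    "Tillamook": [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836],
    "Newport": [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725],
    "Petersburg": [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105],
    "Magadan": [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689],
    "Tvarminne": [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045],
}

k = len(groups)
N = sum(len(v) for v in groups.values())
all_values = [x for v in groups.values() for x in v]
grand_mean = sum(all_values) / N

ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)

# Degrees of freedom: k - 1 between groups, N - k within groups
df_between = k - 1
df_within = N - k

# Mean square = sum of squares / its degrees of freedom
ms_between = ss_between / df_between
ms_within = ss_within / df_within

print(f"df_between = {df_between}, MS_between = {ms_between:.5f}")
print(f"df_within  = {df_within}, MS_within  = {ms_within:.6f}")
```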

★ Step 5: Compute the F-Statistic

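The F-statistic is calculated as:

$$F = \frac{\text{Between-group variance}}{\text{Within-group variance}} = \frac{MS_{between}}{MS_{within}} = \frac{0.00113}{0.000159} = 7.12101947164245$$

The ratio above is taken over the unrounded mean squares; dividing the rounded values shown gives about 7.11.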
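Putting Steps 1 to 5 together, a small from-scratch function can reproduce the F-statistic without rounding any intermediate values. The function name one_way_anova_f is hypothetical. This is a teaching sketch; scipy's f_oneway is the tool to use in practice.

```python
def one_way_anova_f(*samples):
    """Hypothetical helper: return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(samples)                                   # number of groups
    N = sum(len(s) for s in samples)                   # total observations
    grand_mean = sum(sum(s) for s in samples) / N
    means = [sum(s) / len(s) for s in samples]         # group means
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum(sum((x - m) ** 2 for x in s) for s, m in zip(samples, means))
    df_between, df_within = k - 1, N - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return F, df_between, df_within


# Sample data from the scipy.stats.f_oneway example
tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836]
newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725]
petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689]
tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]

F, df_b, df_w = one_way_anova_f(tillamook, newport, petersburg, magadan, tvarminne)
print(f"F = {F:.4f} with ({df_b}, {df_w}) degrees of freedom")
```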

★ Step 6: (Final) p-value calculation

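Once we have the F-statistic, the p-value is the probability of seeing a value at least this large under an F-distribution with $df_{between}$ and $df_{within}$ degrees of freedom:

$$p = P\left(F_{4,\,34} \ge 7.121\right)$$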
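In code, this upper-tail probability is the survival function of the F-distribution. Here is a minimal sketch with scipy.stats.f.sf, which plugs in the F-statistic that f_oneway reports for this data:

```python
from scipy.stats import f

F = 7.121019471642445        # F-statistic reported by f_oneway for this data
df_between, df_within = 4, 34

# Survival function sf(x) = P(X >= x): the area to the right of F
p_value = f.sf(F, df_between, df_within)
print(f"p = {p_value:.5f}")
```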

Final Interpretation

  • In our case, F = 7.121 and p = 0.00028.
  • Since p < 0.05, we reject the null hypothesis: at least one location's mean differs significantly from the others. ANOVA alone does not tell us which one.
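An equivalent way to reach the same decision is to compare F with the critical value of the F-distribution at α = 0.05: reject when F exceeds it. Here is a minimal sketch with scipy.stats.f.ppf, which hard-codes the F and p values from the f_oneway output:

```python
from scipy.stats import f

alpha = 0.05
F_obs = 7.121019471642445          # from the f_oneway output
p_value = 0.00028122423145345525   # from the f_oneway output
df_between, df_within = 4, 34

# Critical value: the F with an area of alpha to its right
F_crit = f.ppf(1 - alpha, df_between, df_within)

reject_by_p = p_value < alpha      # rule 1: small p-value
reject_by_F = F_obs > F_crit       # rule 2: F beyond the critical value
print(f"F_crit = {F_crit:.3f}, reject by p: {reject_by_p}, reject by F: {reject_by_F}")
```

Both rules always agree, because the p-value falls below α exactly when F lands beyond the critical value.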

Interpreting Common Scenarios

Python Code

from scipy.stats import f_oneway

# Sample data from the scipy.stats.f_oneway documentation: one list per location
tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836]
newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725]
petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689]
tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]

# One-way ANOVA across the five locations: returns the F-statistic and its p-value
result = f_oneway(tillamook, newport, petersburg, magadan, tvarminne)
print(result)

Output

F_onewayResult(statistic=np.float64(7.121019471642445), pvalue=np.float64(0.00028122423145345525))