Data Science P-Value Interpretation

The p-value is a crucial concept in hypothesis testing used in data science, statistics, and machine learning. This tutorial explains what a p-value is, how to interpret it, and how to calculate it using Python.

1. What Is a P-Value?

A p-value is the probability of obtaining test results at least as extreme as the observed data, assuming that the null hypothesis ( $H_0$ ) is true.

It helps judge whether the observed result is consistent with random variation under $H_0$ or is statistically significant.
A smaller p-value indicates stronger evidence against the null hypothesis.

Example Meaning of P-Value:

If $p = 0.05$ , there is a 5% chance of seeing results at least as extreme as those observed when $H_0$ is true.
If $p = 0.01$ , there is a 1% chance of seeing results at least this extreme from random variation alone under $H_0$ .
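
To make the definition concrete, here is a minimal sketch that computes an exact one-sided p-value for a made-up coin-flip experiment (60 heads in 100 flips, with $H_0$ : the coin is fair). The experiment, the function name binomial_upper_tail, and its parameters are illustrative assumptions and not part of the delivery-time example used later.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    This is the one-sided p-value for observing k successes in n trials
    when the null hypothesis says the success probability is p.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical experiment: 60 heads in 100 flips, H0: the coin is fair.
p_value = binomial_upper_tail(n=100, k=60)
print(f"P(at least 60 heads | fair coin) = {p_value:.4f}")
```

The result is a little under 0.03: if the coin really were fair, fewer than 3 in 100 repetitions of the experiment would show 60 or more heads. It is a statement about the data under $H_0$ , not about the probability that the coin is fair.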

2. Hypothesis Testing & P-Value

In hypothesis testing, we set up two hypotheses:

Null Hypothesis ( $H_0$ ): Assumes there is no effect or no difference.
Alternative Hypothesis ( $H_a$ or $H_1$ ): Assumes there is an effect or a difference.

Then, we calculate the p-value to determine whether to reject $H_0$ or fail to reject $H_0$ .

3. How to Interpret the P-Value?

P-Value	Interpretation
$p > 0.05$	Fail to reject $H_0$ (Not statistically significant)
$p \leq 0.05$	Reject $H_0$ (Statistically significant)
$p \leq 0.01$	Strong evidence to reject $H_0$ (Highly significant)
$p \leq 0.001$	Very strong evidence to reject $H_0$ (Extremely significant)

Example Interpretation:

$p = 0.08$ → Fail to reject $H_0$ (No strong evidence to support $H_a$ ).
$p = 0.03$ → Reject $H_0$ (Significant evidence supporting $H_a$ ).
$p = 0.0005$ → Reject $H_0$ (Very strong evidence for $H_a$ ).
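
The interpretation rules can be sketched as a small helper. The function name interpret_p_value is hypothetical, and the cutoffs 0.05, 0.01, and 0.001 are the conventional ones assumed here; they are a convention, not a law, and some fields use different thresholds.

```python
def interpret_p_value(p: float) -> str:
    """Map a p-value to a verbal label using conventional cutoffs."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    # Check the strictest threshold first so each p-value gets one label.
    if p <= 0.001:
        return "Very strong evidence to reject H0 (extremely significant)"
    if p <= 0.01:
        return "Strong evidence to reject H0 (highly significant)"
    if p <= 0.05:
        return "Reject H0 (statistically significant)"
    return "Fail to reject H0 (not statistically significant)"


# The three example p-values from the list above.
for p in (0.08, 0.03, 0.0005):
    print(f"p = {p}: {interpret_p_value(p)}")
```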

4. Example: P-Value in Action

Scenario:

A company claims that the average delivery time of its service is at most 30 minutes. A researcher collects a sample of 10 delivery times and finds an average of 29.2 minutes.

We conduct a hypothesis test:

$H_0$ : The mean delivery time is 30 minutes ( $\mu = 30$ )
$H_a$ : The mean delivery time is less than 30 minutes ( $\mu < 30$ )

We will perform a one-sample t-test using Python.
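
For reference, the one-sample t statistic is conventionally defined as below. The symbols are introduced here for illustration: $\bar{x}$ is the sample mean, $s$ the sample standard deviation (computed with $n - 1$ in the denominator), $n$ the sample size, and $\mu_0 = 30$ the mean assumed under $H_0$ .

$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{degrees of freedom} = n - 1 $$

For the left-tailed alternative used here, the p-value is $P(T_{n-1} \leq t)$ , where $T_{n-1}$ follows a t-distribution with $n - 1$ degrees of freedom. A sample mean far below $\mu_0$ gives a large negative $t$ and therefore a small p-value.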

5. P-Value Calculation in Python

import numpy as np
from scipy import stats

# Sample data
data = np.array([28, 32, 29, 31, 27, 30, 28, 29, 30, 28])

# Null hypothesis: Mean delivery time is 30 minutes
mu_0 = 30

# Perform a one-sample t-test
t_stat, p_value = stats.ttest_1samp(data, mu_0)

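# Since we are testing if mean is "less than" 30, use one-tailed p-value.
# Halving the two-tailed p-value is valid only when t_stat is negative
# (the sample mean falls below mu_0); otherwise the left tail is 1 - p/2.
p_value_one_tailed = p_value / 2 if t_stat < 0 else 1 - p_value / 2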

# Print results
print(f"T-Statistic: {t_stat:.4f}")
print(f"P-Value (One-Tailed): {p_value_one_tailed:.4f}")

# Decision
alpha = 0.05  # 5% significance level
if p_value_one_tailed < alpha:
    print("Reject the null hypothesis: The average delivery time is significantly less than 30 minutes.")
else:
    print("Fail to reject the null hypothesis: No significant evidence that delivery time is less than 30 minutes.")

6. Explanation of the Code

We use stats.ttest_1samp(data, mu_0) to perform a one-sample t-test.
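We divide the two-tailed p-value by 2 to get the one-tailed p-value, because we are testing for “less than”. This shortcut is valid only when the t-statistic is negative, as it is here. With SciPy 1.6 or later, passing alternative="less" to stats.ttest_1samp returns the one-tailed p-value directly.
If the one-tailed p-value is below $\alpha = 0.05$ , we reject $H_0$ (significant result).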
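
To see what stats.ttest_1samp computes, the t-statistic can be reproduced by hand. The sketch below uses only the standard library and the same sample as the code above; the variable names are illustrative. It covers the test statistic only, since the p-value still needs the t-distribution from SciPy.

```python
import math
import statistics

data = [28, 32, 29, 31, 27, 30, 28, 29, 30, 28]  # same sample as above
mu_0 = 30                                        # mean assumed under H0

n = len(data)
sample_mean = statistics.mean(data)
sample_sd = statistics.stdev(data)               # uses n - 1 in the denominator
standard_error = sample_sd / math.sqrt(n)

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample_mean - mu_0) / standard_error

print(f"mean = {sample_mean:.2f}, sd = {sample_sd:.4f}, t = {t_manual:.4f}")
```

The value of t_manual should agree with the t_stat returned by stats.ttest_1samp, which is a quick check that the data and the hypothesized mean were entered correctly.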

7. Sample Output

T-Statistic: -1.6330
P-Value (One-Tailed): 0.0685
Fail to reject the null hypothesis: No significant evidence that delivery time is less than 30 minutes.

Since $p \approx 0.0685 > 0.05$ , we fail to reject $H_0$ , meaning we do not have enough evidence that the mean delivery time is less than 30 minutes.

8. Common Mistakes with P-Values

P-Value is NOT the Probability That $H_0$ is True
- It only measures how likely data at least as extreme as ours would be if $H_0$ were true.
- A small p-value does not prove that $H_0$ is false, only that the observed data would be unlikely if $H_0$ were true.
P-Value Does NOT Measure Effect Size
- A small p-value does not tell you how large the effect is.
- Use confidence intervals or Cohen’s d to measure effect size.
P-Value Can Be Influenced by Sample Size
- Large sample sizes can produce small p-values even if the effect is not practically significant.
- Small sample sizes may not give a small p-value even if there is a real effect.
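
The last two points can be illustrated with a small calculation. As a simplifying assumption, the sketch below uses a two-sided z-test with a known standard deviation instead of the t-test from the example above; the observed difference of 0.1, the value of sigma, and the function name two_sided_z_p_value are all hypothetical.

```python
import math


def two_sided_z_p_value(mean_diff: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a z-test for an observed mean difference."""
    z = mean_diff / (sigma / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


mean_diff, sigma = 0.1, 1.0   # hypothetical: a tiny, fixed observed effect
cohens_d = mean_diff / sigma  # effect size, which does not depend on n

results = {
    n: two_sided_z_p_value(mean_diff, sigma, n) for n in (25, 100, 400, 10_000)
}
for n, p in results.items():
    print(f"n = {n:>6}: Cohen's d = {cohens_d:.2f}, p = {p:.3g}")
```

The effect size stays at 0.1 throughout, yet the p-value moves from clearly non-significant at n = 25 to below 0.05 at n = 400 and to a vanishingly small value at n = 10,000. The p-value reflects how much data there is as well as how large the effect is.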

9. Summary

The p-value tells us how likely results at least as extreme as ours would be if the null hypothesis were true.
If $p \leq 0.05$ , we reject $H_0$ (statistically significant result).
If $p > 0.05$ , we fail to reject $H_0$ (not significant).
P-values do not measure the probability that $H_0$ is true or the size of an effect.