Statistics Complete Cheatsheet
Descriptive stats, probability, distributions, hypothesis testing, and regression — complete statistics reference.
01 Descriptive Statistics
Mean
x-bar=sum(x)/n
Median
Middle value when sorted
Mode
Most frequent
IQR
Q3-Q1
Resistant to outliers.
Variance
s^2=sum((x - x-bar)^2)/(n-1)
Sample variance.
Std dev
s=sqrt(variance)
68-95-99.7 rule: for a normal distribution, about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 std devs of the mean.
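The 68-95-99.7 rule can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `share_within` is an illustrative assumption, not part of any library.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Probability that a normal value lies within k std devs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Coverage for 1, 2, and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(k):.4f}")
```

The three printed shares should come out close to 0.68, 0.95, and 0.997, which is where the rule gets its name.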
| Symbol | Meaning |
|---|---|
| x-bar | Sample mean |
| mu | Population mean |
| s | Sample std dev |
| sigma | Population std dev |
| n | Sample size |
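As a rough illustration of the measures above, the following sketch computes them with Python's standard-library `statistics` module. The sample `data` is made up for the example, and quartile conventions differ between tools, so the IQR may not match other software exactly.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]
n = len(data)

mean = statistics.mean(data)      # x-bar = sum(x)/n
median = statistics.median(data)  # middle value when sorted
mode = statistics.mode(data)      # most frequent value

# Sample variance written out from the formula, dividing by n-1.
manual_variance = sum((x - mean) ** 2 for x in data) / (n - 1)
variance = statistics.variance(data)  # same quantity via the library
std_dev = statistics.stdev(data)      # s = sqrt(variance)

# Quartiles using the module's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(mean, median, mode, variance, round(std_dev, 4), iqr)
```

Note that `statistics.variance` and `statistics.stdev` are the sample versions (n-1); `pvariance` and `pstdev` are the population versions.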
02 Probability
P(A)
Ranges from 0 to 1. P(certain)=1, P(impossible)=0.
Complement
P(A')=1-P(A)
Addition
P(AuB)=P(A)+P(B)-P(AnB)
Multiplication
P(AnB)=P(A|B)*P(B)
Independent
P(AnB)=P(A)*P(B)
Conditional
P(A|B)=P(AnB)/P(B)
Probability examples
# Deck of 52 cards
P(Ace) = 4/52 = 1/13
P(Red or Ace) = 26/52 + 4/52 - 2/52 = 28/52
# Independent
P(Head AND Head) = 0.5 * 0.5 = 0.25
# Conditional
P(2nd Ace | 1st was Ace) = 3/51
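The card examples can be reproduced by counting outcomes in an enumerated deck. This is a minimal sketch using exact fractions; the names `deck`, `prob`, `is_ace`, and `is_red` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
RED = {"hearts", "diamonds"}

# Every (rank, suit) pair: 52 equally likely cards.
deck = list(product(RANKS, SUITS))


def prob(event) -> Fraction:
    """P(event) = favourable outcomes / total outcomes."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_ace(card) -> bool:
    return card[0] == "A"


def is_red(card) -> bool:
    return card[1] in RED


p_ace = prob(is_ace)
p_red = prob(is_red)
p_red_and_ace = prob(lambda c: is_red(c) and is_ace(c))
p_red_or_ace = prob(lambda c: is_red(c) or is_ace(c))

# Addition rule: P(AuB) = P(A) + P(B) - P(AnB)
assert p_red_or_ace == p_red + p_ace - p_red_and_ace

# Multiplication rule for two aces drawn without replacement:
# P(1st Ace and 2nd Ace) = P(2nd Ace | 1st Ace) * P(1st Ace)
p_second_ace_given_first = Fraction(3, 51)
p_two_aces = p_second_ace_given_first * p_ace

print(p_ace, p_red_or_ace, p_two_aces)
```

Using `Fraction` keeps the results exact, so 28/52 is reported in lowest terms as 7/13.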
03 Distributions
| Distribution | Mean | Variance | Use when |
|---|---|---|---|
| Binomial(n,p) | np | np(1-p) | Fixed n trials, constant p |
| Poisson(lambda) | lambda | lambda | Count of rare events in a fixed interval |
| Normal(mu,sigma^2) | mu | sigma^2 | Continuous, bell-shaped |
| Uniform(a,b) | (a+b)/2 | (b-a)^2/12 | Continuous, all values in [a,b] equally likely |
Binomial
P(X=k) = C(n,k) * p^k * (1-p)^(n-k)
Example: 10 flips, p=0.5, P(X=6)
= C(10,6) * 0.5^6 * 0.5^4
= 210 * 0.015625 * 0.0625
= 0.205
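The binomial formula translates directly into code. The sketch below uses `math.comb` from the standard library and also checks the mean np and variance np(1-p) from the table by summing over all outcomes; `binomial_pmf` is an illustrative name.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X=k) = C(n,k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5

# The worked example: probability of exactly 6 heads in 10 fair flips.
p_six = binomial_pmf(6, n, p)

# Probabilities over all possible k sum to 1.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)

# Mean and variance computed from the distribution itself.
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print(round(p_six, 3), total, mean, variance)
```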
04 Hypothesis Testing
H0
Null hypothesis: no effect. We look for evidence against it.
H1
Alternative hypothesis: our claim.
p-value
Probability of a result at least as extreme as the one observed, assuming H0 is true. p < alpha: reject H0.
alpha
Significance level, usually 0.05.
Type I error
Reject H0 when true (false positive). Rate=alpha.
Type II error
Fail to reject H0 when false (false negative). Rate=beta.
Test steps
1. State H0 and H1
2. Set alpha (0.05)
3. Choose test statistic
4. Calculate: t = (x-bar - mu0)/(s/sqrt(n))
5. Find p-value
6. p < alpha: reject H0; otherwise fail to reject H0
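A one-sample t-test along these lines might look like the sketch below. The sample values and `mu0 = 50` are made up for illustration. Because the standard library has no t-distribution, the sketch compares |t| with the two-sided critical value for 9 degrees of freedom at alpha = 0.05 (2.262, from a standard t-table) instead of computing the p-value directly; the two decision rules agree.

```python
import statistics
from math import sqrt

# Hypothetical sample of n = 10 measurements.
sample = [52, 48, 55, 51, 49, 53, 54, 50, 56, 52]

# Step 1: H0: mu = mu0, H1: mu != mu0 (two-sided).
mu0 = 50

# Step 2: significance level.
alpha = 0.05

# Steps 3-4: one-sample t statistic, t = (x-bar - mu0)/(s/sqrt(n)).
n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample std dev (n-1)
t_stat = (x_bar - mu0) / (s / sqrt(n))

# Steps 5-6: compare with the critical value instead of a p-value.
# 2.262 is the two-sided critical t for df = n - 1 = 9 at alpha = 0.05.
t_critical = 2.262
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With a statistics library that provides the t-distribution, step 5 would compute the p-value from `t_stat` and `n - 1` degrees of freedom and compare it with `alpha`.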
❓ Quiz
What does p<0.05 mean in hypothesis testing?
With alpha=0.05, p<alpha means the result is statistically significant: reject the null hypothesis H0.
05 Regression
Pearson r
-1 to 1
Rule of thumb: |r|>0.7 is strong, |r|<0.3 is weak.
R-squared
0 to 1
Proportion of variance in y explained by the model.
Regression line
y-hat=a+bx
b=slope, a=intercept.
Slope
b=r*(sy/sx)
Intercept
a=y-bar - b*x-bar
Point (x-bar,y-bar) on line.
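The regression formulas above can be worked through on a small dataset. In this sketch the `x` and `y` values are made up, and `predict` is an illustrative helper. For simple linear regression with one predictor, R-squared equals r squared, which the sketch relies on.

```python
import statistics

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)
s_x = statistics.stdev(x)  # sample std dev of x
s_y = statistics.stdev(y)  # sample std dev of y

# Pearson r = sample covariance / (s_x * s_y).
covariance = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
r = covariance / (s_x * s_y)

# Slope and intercept from the formulas above.
b = r * (s_y / s_x)    # b = r*(sy/sx)
a = y_bar - b * x_bar  # a = y-bar - b*x-bar

# For one predictor, R-squared is the square of Pearson r.
r_squared = r**2


def predict(x_new: float) -> float:
    """Fitted value y-hat = a + b*x."""
    return a + b * x_new


print(f"r = {r:.4f}, y-hat = {a:.2f} + {b:.2f}x, R^2 = {r_squared:.2f}")
```

Evaluating `predict(x_bar)` returns `y_bar` (up to floating-point rounding), which is the "point (x-bar, y-bar) on line" property.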
⚠️
Correlation does NOT imply causation! Even r=1 does not prove one variable causes the other.