**Statistics Tutorial: Important Statistics Formulas**

**Population mean = μ = ( Σ X_{i} ) / N**

**Population standard deviation = σ = √[ Σ ( X_{i} - μ )^{2} / N ]**

**Population variance = σ^{2} = Σ ( X_{i} - μ )^{2} / N**

**Variance of population proportion = σ_{P}^{2} = PQ / n**

**Standardized score = Z = (X - μ) / σ**

**Population correlation coefficient = ρ = [ 1 / N ] * Σ { [ (X_{i} - μ_{X}) / σ_{X} ] * [ (Y_{i} - μ_{Y}) / σ_{Y} ] }**

**Statistics**

**Unless otherwise noted, these formulas assume simple
random sampling.**

**Sample mean = x̄ = ( Σ x_{i} ) / n**

**Sample standard deviation = s = √[ Σ ( x_{i} - x̄ )^{2} / ( n - 1 ) ]**

**Sample variance = s^{2} = Σ ( x_{i} - x̄ )^{2} / ( n - 1 )**

**Variance of sample proportion = s_{p}^{2} = pq / (n - 1)**

**Pooled sample proportion = p = (p_{1} * n_{1} + p_{2} * n_{2}) / (n_{1} + n_{2})**

**Pooled sample standard deviation = s_{p} = √{ [ (n_{1} - 1) * s_{1}^{2} + (n_{2} - 1) * s_{2}^{2} ] / (n_{1} + n_{2} - 2) }**

**Sample correlation coefficient = r = [ 1 / (n - 1) ] * Σ { [ (x_{i} - x̄) / s_{x} ] * [ (y_{i} - ȳ) / s_{y} ] }**
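As a rough illustration of the sample formulas, the sketch below computes a sample mean, the sample and population standard deviations, and the sample correlation coefficient with Python's standard `statistics` module. The data values are hypothetical and were chosen only to keep the arithmetic small.

```python
import math
import statistics

# Hypothetical paired observations, for illustration only.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
n = len(x)

x_bar = statistics.mean(x)       # sample mean
s_x = statistics.stdev(x)        # sample standard deviation (divides by n - 1)
sigma_x = statistics.pstdev(x)   # population standard deviation (divides by N)

y_bar = statistics.mean(y)
s_y = statistics.stdev(y)

# r = [1 / (n - 1)] * sum of the products of standardized scores
r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)

print(f"mean={x_bar}, s={s_x:.4f}, sigma={sigma_x:.4f}, r={r:.4f}")
```

Note that `stdev` and `pstdev` differ only in the divisor, which is why the sample value is always the larger of the two for the same data.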

**Simple Linear Regression**

**Simple linear regression line: ŷ = b_{0} + b_{1}x**

**Regression coefficient = b_{1} = Σ [ (x_{i} - x̄) (y_{i} - ȳ) ] / Σ [ (x_{i} - x̄)^{2} ]**

**Regression intercept = b_{0} = ȳ - b_{1} * x̄**

**Regression coefficient = b_{1} = r * (s_{y} / s_{x})**

**Standard error of regression slope = s_{b1} = √[ Σ(y_{i} - ŷ_{i})^{2} / (n - 2) ] / √[ Σ(x_{i} - x̄)^{2} ]**
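The regression formulas can be checked by hand on a small data set. The following is a minimal sketch, assuming five hypothetical (x, y) pairs; it computes the slope from the sums of squares, the intercept from the means, the slope again from the correlation coefficient, and the standard error of the slope.

```python
import math

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # slope from sums of squares
b0 = y_bar - b1 * x_bar        # intercept

# Same slope via b1 = r * (s_y / s_x)
r = sxy / math.sqrt(sxx * syy)
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
b1_from_r = r * (s_y / s_x)

# Standard error of the slope: residual standard deviation over sqrt(Sxx)
residual_ss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(residual_ss / (n - 2)) / math.sqrt(sxx)

print(f"b0={b0:.3f}, b1={b1:.3f}, se_b1={se_b1:.4f}")
```

Both routes to the slope agree because r, s_{x}, and s_{y} are all built from the same three sums.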

**Counting**

**n factorial: n! = n * (n - 1) * (n - 2) * . . . * 3 * 2 * 1. By convention, 0! = 1.**

**Permutations of n things, taken r at a time: _{n}P_{r} = n! / (n - r)!**

**Combinations of n things, taken r at a time: _{n}C_{r} = n! / [ r! (n - r)! ] = _{n}P_{r} / r!**
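A short sketch of the counting formulas, assuming n = 5 and r = 3 as example values. It builds the counts from factorials and compares them with `math.perm` and `math.comb`, which are available in Python 3.8 and later.

```python
import math

n, r = 5, 3  # example values, chosen for illustration

# Permutations: n! / (n - r)!
permutations = math.factorial(n) // math.factorial(n - r)

# Combinations: permutations divided by r!
combinations = permutations // math.factorial(r)

# The standard library offers the same counts directly.
assert permutations == math.perm(n, r)
assert combinations == math.comb(n, r)

print(permutations, combinations)
```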

**Probability**

**Rule of addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)**

**Rule of multiplication: P(A ∩ B) = P(A) P(B|A)**

**Rule of subtraction: P(A') = 1 - P(A)**
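The three rules can be verified by counting outcomes. This sketch assumes a single roll of a fair die, with A = "the roll is even" and B = "the roll is greater than 3"; both events are hypothetical examples, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed example)
A = {2, 4, 6}                        # event: roll is even
B = {4, 5, 6}                        # event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

p_union = prob(A) + prob(B) - prob(A & B)      # rule of addition
p_b_given_a = Fraction(len(A & B), len(A))     # P(B|A), counted within A
p_intersection = prob(A) * p_b_given_a         # rule of multiplication
p_not_a = 1 - prob(A)                          # rule of subtraction

print(p_union, p_intersection, p_not_a)
```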

**Random Variables**

**In the following formulas, X and Y are random variables, and a and b are constants.**

**Expected value of X = E(X) = μ_{x} = Σ [ x_{i} * P(x_{i}) ]**

**Variance of X = Var(X) = σ^{2} = Σ [ x_{i} - E(X) ]^{2} * P(x_{i}) = Σ [ x_{i} - μ_{x} ]^{2} * P(x_{i})**

**Normal random variable = z-score = z = (X - μ) / σ**

**Chi-square statistic = Χ^{2} = [ ( n - 1 ) * s^{2} ] / σ^{2}**

**f statistic = f = [ s_{1}^{2}/σ_{1}^{2} ] / [ s_{2}^{2}/σ_{2}^{2} ]**

**Expected value of sum of random variables = E(X + Y) = E(X) + E(Y)**

**Expected value of difference between
random variables = E(X - Y) = E(X) - E(Y)**

**Variance of the sum of independent random variables = Var(X + Y) =
Var(X) + Var(Y)**

**Variance of the difference between independent random variables = Var(X - Y) = Var(X) + Var(Y)**
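To make the expectation and variance rules concrete, the sketch below assumes two independent fair dice, X and Y. It computes E(X) and Var(X) from the probability table, then builds the distributions of X + Y and X - Y by enumeration. For independent variables, the variance of both the sum and the difference equals Var(X) + Var(Y). The helper names are illustrative, not from the source.

```python
import operator
from fractions import Fraction
from itertools import product

def expected_value(pmf):
    """E(X) = sum of x * P(x) over a {value: probability} table."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - E(X))^2 * P(x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def combine(pmf_x, pmf_y, op):
    """Distribution of op(X, Y) for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair die (assumed)

total = combine(die, die, operator.add)       # X + Y
difference = combine(die, die, operator.sub)  # X - Y

print(expected_value(die), variance(die))
print(variance(total), variance(difference))
```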

**Sampling Distributions**

**Mean of sampling distribution of the mean = μ_{x̄} = μ**

**Mean of sampling distribution of the proportion = μ_{p} = P**

**Standard deviation of proportion = σ_{p} = √[ P * (1 - P) / n ] = √( PQ / n )**

**Standard deviation of the mean = σ_{x̄} = σ / √(n)**

**Standard deviation of difference of sample means = σ_{d} = √[ (σ_{1}^{2} / n_{1}) + (σ_{2}^{2} / n_{2}) ]**

**Standard deviation of difference of sample proportions = σ_{d} = √{ [ P_{1}(1 - P_{1}) / n_{1} ] + [ P_{2}(1 - P_{2}) / n_{2} ] }**

**Standard Error**

**Standard error of proportion = SE_{p} = s_{p} = √[ p * (1 - p) / n ] = √( pq / n )**

**Standard error of difference for proportions = SE_{p} = s_{p} = √{ p * ( 1 - p ) * [ (1 / n_{1}) + (1 / n_{2}) ] }**

**Standard error of the mean = SE_{x̄} = s_{x̄} = s / √(n)**

**Standard error of difference of sample means = SE_{d} = s_{d} = √[ (s_{1}^{2} / n_{1}) + (s_{2}^{2} / n_{2}) ]**

**Standard error of difference of paired sample means = SE_{d} = s_{d} = √[ Σ(d_{i} - d̄)^{2} / (n - 1) ] / √(n)**

**Pooled sample standard error = s_{pooled} = √{ [ (n_{1} - 1) * s_{1}^{2} + (n_{2} - 1) * s_{2}^{2} ] / (n_{1} + n_{2} - 2) }**

**Standard error of difference of sample proportions = s_{d} = √{ [ p_{1}(1 - p_{1}) / n_{1} ] + [ p_{2}(1 - p_{2}) / n_{2} ] }**
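Several of the standard error formulas translate directly into small functions. The sketch below is one possible arrangement; the function names and the input values are hypothetical.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_diff_means(s1, n1, s2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_paired(differences):
    """Standard error of the mean of paired differences d_i."""
    n = len(differences)
    d_bar = sum(differences) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in differences) / (n - 1))
    return s_d / math.sqrt(n)

# Example inputs, chosen for illustration only.
print(se_proportion(0.5, 100))
print(se_mean(10.0, 25))
print(se_diff_means(3.0, 9, 4.0, 16))
print(se_paired([1.0, 3.0, 2.0, 2.0]))
```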

**Discrete Probability Distributions**

**Binomial formula: P(X = x) = b( x; n, P ) = _{n}C_{x} * P^{x} * (1 - P)^{n - x} = _{n}C_{x} * P^{x} * Q^{n - x}**

**Mean of binomial distribution = μ_{x} = n * P**

**Variance of binomial distribution = σ_{x}^{2} = n * P * ( 1 - P )**
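As a numerical check on the binomial formulas, the sketch below evaluates the probability of every outcome for n = 10 trials with P = 0.3 (example values) and confirms that the mean and variance of that table match n * P and n * P * (1 - P).

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for x successes in n independent trials with success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3  # example parameters, for illustration only
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(f"total probability={sum(pmf):.6f}, mean={mean:.4f}, variance={variance:.4f}")
```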

**Negative binomial formula: P(X = x) = b*( x; r, P ) = _{x-1}C_{r-1} * P^{r} * (1 - P)^{x - r}**

**Mean of negative binomial distribution = μ_{x} = r / P, where x counts the trials needed for r successes (the mean number of failures is rQ / P)**

**Variance of negative binomial distribution = σ_{x}^{2} = r * Q / P^{2}**

**Geometric formula: P(X = x) = g( x; P ) = P * Q^{x - 1}**

**Mean of geometric distribution = μ_{x} = 1 / P, where x is the trial on which the first success occurs (the mean number of failures is Q / P)**

**Variance of geometric distribution = σ _{x}^{2} = Q / P^{2}**

**Hypergeometric formula: P(X = x) = h( x; N, n, k ) = [ _{k}C_{x} ] [ _{N-k}C_{n-x} ] / [ _{N}C_{n} ]**

**Mean of hypergeometric distribution = μ_{x} = n * k / N**

**Variance of hypergeometric distribution = σ_{x}^{2} = n * k * ( N - k ) * ( N - n ) / [ N^{2} * ( N - 1 ) ]**
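The hypergeometric formulas can be checked exactly with rational arithmetic. This sketch assumes a population of N = 20 items containing k = 8 successes and a sample of n = 5 drawn without replacement; the numbers are illustrative.

```python
import math
from fractions import Fraction

def hypergeometric_pmf(x, N, n, k):
    """P(X = x): x successes in a sample of n drawn from N items, k of them successes."""
    return Fraction(math.comb(k, x) * math.comb(N - k, n - x), math.comb(N, n))

N, n, k = 20, 5, 8  # example parameters, for illustration only
support = range(max(0, n - (N - k)), min(n, k) + 1)
pmf = {x: hypergeometric_pmf(x, N, n, k) for x in support}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, variance)
```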

**Poisson formula: P( x; μ ) = (e^{-μ}) (μ^{x}) / x!**

**Mean of Poisson distribution = μ_{x} = μ**

**Variance of Poisson distribution = σ_{x}^{2} = μ**

**Multinomial formula: P = [ n! / ( n_{1}! * n_{2}! * . . . * n_{k}! ) ] * ( p_{1}^{n_{1}} * p_{2}^{n_{2}} * . . . * p_{k}^{n_{k}} )**

**Linear Transformations**

**For the following formulas, assume that Y is a linear
transformation of the random variable X, defined by the equation: Y = aX + b.**

**Mean of a linear transformation = E(Y) = Ȳ = aX̄ + b.**

**Variance of a linear transformation = Var(Y) = a^{2} * Var(X).**
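A quick numerical check of the linear transformation rules, assuming a = 3, b = 2, and four hypothetical values of X treated as a complete population: the mean of Y equals a times the mean of X plus b, and the variance of Y equals a squared times the variance of X. The constant b shifts every value equally, so it has no effect on the variance.

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical values of X
a, b = 3.0, 2.0            # example constants

y = [a * xi + b for xi in x]   # Y = aX + b

mean_x = statistics.mean(x)
mean_y = statistics.mean(y)
var_x = statistics.pvariance(x)   # population variance of X
var_y = statistics.pvariance(y)   # population variance of Y

print(mean_y, a * mean_x + b)
print(var_y, a ** 2 * var_x)
```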

**Standardized score = z = (x - μ_{x}) / σ_{x}.**

**t-score = t = (x̄ - μ_{x}) / [ s / √(n) ].**

**Estimation**

**Confidence interval: Sample statistic ± Critical value * Standard error of statistic**

**Margin of error = (Critical value) *
(Standard deviation of statistic)**

**Margin of error = (Critical value) *
(Standard error of statistic)**
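The sketch below works through a confidence interval for a mean, under the assumption that the sample is large enough for a normal critical value to be reasonable. The summary statistics are hypothetical; the critical value comes from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist

# Hypothetical sample summary, for illustration only.
x_bar, s, n = 50.0, 10.0, 100
confidence = 0.95

# Two-sided critical value from the standard normal distribution.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

standard_error = s / math.sqrt(n)
margin_of_error = z_star * standard_error
interval = (x_bar - margin_of_error, x_bar + margin_of_error)

print(f"z*={z_star:.4f}, ME={margin_of_error:.4f}, "
      f"CI=({interval[0]:.2f}, {interval[1]:.2f})")
```

For small samples a t critical value would normally replace the normal one; that case is not covered by this sketch.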

**Hypothesis Testing**

**Standardized test statistic = (Statistic
- Parameter) / (Standard deviation of statistic)**

**One-sample z-test for proportions: z-score = z = (p - P_{0}) / √( P_{0} * Q_{0} / n ), where Q_{0} = 1 - P_{0}**

**Two-sample z-test for proportions: z-score = z = [ (p_{1} - p_{2}) - d ] / SE**

**One-sample t-test for means: t-score = t = (x̄ - μ) / SE**

**Two-sample t-test for means: t-score = t = [ (x̄_{1} - x̄_{2}) - d ] / SE**

**Matched-sample t-test for means: t-score = t = [ (x̄_{1} - x̄_{2}) - D ] / SE = (d̄ - D) / SE**

**Chi-square test statistic = Χ ^{2} = Σ[ (Observed - Expected)^{2} / Expected ]**
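Two of the test statistics are simple enough to compute by hand. The sketch below, which uses hypothetical counts and summary statistics, evaluates a chi-square statistic for observed versus expected frequencies and a one-sample t-score for a mean. Converting either statistic to a P-value needs a distribution table or a statistics library and is left out.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def one_sample_t(x_bar, mu, s, n):
    """t = (x_bar - mu) / SE, with SE = s / sqrt(n)."""
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical inputs, for illustration only.
chi2 = chi_square_statistic([18, 22, 20], [20, 20, 20])
t = one_sample_t(x_bar=52.0, mu=50.0, s=10.0, n=25)

print(f"chi-square={chi2:.2f}, t={t:.2f}")
```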

**Degrees of Freedom**

**The correct formula for degrees of freedom (DF) depends on the situation (the nature of the test statistic, the number of samples, underlying assumptions, etc.).**

**One-sample t-test: DF = n - 1**

**Two-sample t-test: DF = (s _{1}^{2}/n_{1} + s_{2}^{2}/n_{2})^{2} / { [ (s_{1}^{2} / n_{1})^{2} / (n_{1} - 1) ] + [ (s_{2}^{2} / n_{2})^{2} / (n_{2} - 1) ] }**

**Two-sample t-test, pooled standard error: DF = n _{1} + n_{2} - 2**

**Simple linear regression, test slope: DF = n - 2**

**Chi-square goodness of fit test: DF = k - 1**

**Chi-square test for homogeneity: DF = (r - 1) * (c - 1)**

**Chi-square test for independence: DF = (r - 1) * (c - 1)**
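The unpooled two-sample formula is the one most prone to arithmetic slips, so a small sketch may help. The function names and sample values below are hypothetical; the pooled and chi-square cases are included for comparison.

```python
def two_sample_df(s1, n1, s2, n2):
    """Degrees of freedom for the two-sample t-test without pooling."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def pooled_df(n1, n2):
    """Degrees of freedom for the two-sample t-test with pooled standard error."""
    return n1 + n2 - 2

def chi_square_table_df(rows, columns):
    """Degrees of freedom for a chi-square test on an r-by-c table."""
    return (rows - 1) * (columns - 1)

# Example inputs, for illustration only.
print(two_sample_df(3.0, 9, 4.0, 16))
print(pooled_df(9, 16))
print(chi_square_table_df(3, 4))
```

The unpooled result is usually not a whole number and never exceeds the pooled value of n_{1} + n_{2} - 2.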

**Sample Size**

**Below, the first two formulas find the smallest sample sizes required to achieve a fixed margin of error (ME), using simple random sampling. The third formula allocates the sample to strata, based on a proportionate design. The fourth formula, Neyman allocation, uses stratified sampling to minimize variance, given a fixed sample size. The last formula, optimum allocation, uses stratified sampling to minimize variance, given a fixed budget.**

**Mean (simple random sampling): n = { z^{2} * σ^{2} * [ N / (N - 1) ] } / { ME^{2} + [ z^{2} * σ^{2} / (N - 1) ] }**

**Proportion (simple random sampling): n = [ ( z^{2} * p * q ) + ME^{2} ] / [ ME^{2} + z^{2} * p * q / N ]**

**Proportionate stratified sampling: n_{h} = ( N_{h} / N ) * n**

**Neyman allocation (stratified sampling): n_{h} = n * ( N_{h} * σ_{h} ) / [ Σ ( N_{i} * σ_{i} ) ]**

**Optimum allocation (stratified sampling): n_{h} = n * [ ( N_{h} * σ_{h} ) / √( c_{h} ) ] / [ Σ ( N_{i} * σ_{i} ) / √( c_{i} ) ]**
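As a rough worked example of the sample size formulas, the sketch below computes the sample size for a mean under simple random sampling and then splits a fixed sample across two strata by proportionate and Neyman allocation. All inputs (population size, standard deviations, margin of error, stratum sizes) are hypothetical, and the function names are illustrative. Rounding the sample size up is a choice made here so that the margin of error is not exceeded.

```python
import math

def sample_size_mean(z, sigma, N, me):
    """Smallest n for a mean under simple random sampling, rounded up."""
    numerator = z ** 2 * sigma ** 2 * (N / (N - 1))
    denominator = me ** 2 + z ** 2 * sigma ** 2 / (N - 1)
    return math.ceil(numerator / denominator)

def proportionate_allocation(n, stratum_sizes):
    """n_h = (N_h / N) * n for each stratum."""
    N = sum(stratum_sizes)
    return [(N_h / N) * n for N_h in stratum_sizes]

def neyman_allocation(n, stratum_sizes, stratum_sds):
    """n_h = n * (N_h * sigma_h) / sum(N_i * sigma_i) for each stratum."""
    weights = [N_h * sd for N_h, sd in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Example inputs, for illustration only.
n_required = sample_size_mean(z=1.96, sigma=10.0, N=1000, me=2.0)
by_size = proportionate_allocation(60, [100, 300])
by_neyman = neyman_allocation(60, [100, 300], [3.0, 1.0])

print(n_required, by_size, by_neyman)
```

In this example the smaller stratum is three times as variable, so Neyman allocation gives it as many observations as the larger stratum, while proportionate allocation gives it only a quarter of the sample.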