Probability and Statistics Fundamentals
Probability Basics
Basic Concepts
Sample Space (Ω): The set of all possible outcomes
Event (E): A subset of the sample space
Probability Axioms:
- P(E) ≥ 0, for any event E
- P(Ω) = 1
- Additivity: if A and B are mutually exclusive (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B)
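As a quick check of the axioms, here is a minimal sketch on a finite sample space. The fair six-sided die, the `prob` helper, and the event names are illustrative assumptions, not part of the notes above.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) when outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

even = {2, 4, 6}
one = {1}

p_omega = prob(omega)           # P(Ω) = 1
p_union = prob(even | one)      # the two events are disjoint
p_sum = prob(even) + prob(one)  # additivity: should equal p_union
```

Using `Fraction` keeps the arithmetic exact, so `p_union == p_sum` can be compared without rounding concerns.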
Conditional Probability
P(A|B) = P(A ∩ B) / P(B), where P(B) > 0
Bayes’ Theorem: P(A|B) = P(B|A) × P(A) / P(B), where P(B) > 0
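A small worked example of Bayes' theorem, as a sketch. The diagnostic-test numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are made up for illustration, and P(B) is obtained from the law of total probability.

```python
def bayes_posterior(prior, likelihood, evidence):
    """P(A|B) = P(B|A) * P(A) / P(B); requires evidence = P(B) > 0."""
    return likelihood * prior / evidence

# Hypothetical inputs, chosen only for illustration.
p_disease = 0.01            # P(A): prior probability of disease
p_pos_given_disease = 0.95  # P(B|A): test is positive given disease
p_pos_given_healthy = 0.05  # P(B|not A): false-positive rate

# P(B) via the law of total probability.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

posterior = bayes_posterior(p_disease, p_pos_given_disease, p_pos)
print(f"P(disease | positive) = {posterior:.3f}")
```

With these assumed inputs the posterior is about 0.16, far below the 95% sensitivity, because the prior is small.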
Common Distributions
Discrete Distributions:
| Distribution | Probability Mass Function | Expectation | Variance |
|---|---|---|---|
| Bernoulli | P(X=k) = p^k(1-p)^(1-k), k ∈ {0, 1} | p | p(1-p) |
| Binomial | P(X=k) = C(n,k)p^k(1-p)^(n-k), k = 0, 1, ..., n | np | np(1-p) |
| Poisson | P(X=k) = λ^k e^(-λ) / k!, k = 0, 1, 2, ... | λ | λ |
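The table entries can be checked numerically. The sketch below codes the three PMFs directly from the formulas and verifies the binomial expectation and variance for one assumed parameter choice (n = 10, p = 0.3); the function names are illustrative.

```python
import math

def bernoulli_pmf(k, p):
    """P(X=k) for k in {0, 1}."""
    return p**k * (1 - p)**(1 - k)

def binomial_pmf(k, n, p):
    """P(X=k) for k = 0..n."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X=k) for k = 0, 1, 2, ..."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Assumed parameters for the check.
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))              # should be n*p
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf)) # should be n*p*(1-p)
```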
Continuous Distributions:
| Distribution | Probability Density Function | Expectation | Variance |
|---|---|---|---|
| Uniform | f(x) = 1/(b-a), a ≤ x ≤ b | (a+b)/2 | (b-a)²/12 |
| Normal | f(x) = (1/√(2πσ²))e^(-(x-μ)²/(2σ²)) | μ | σ² |
| Exponential | f(x) = λe^(-λx), x ≥ 0 | 1/λ | 1/λ² |
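A rough numerical check of two rows of the table, as a sketch. The midpoint-rule integrator, the truncation of the integral at x = 40, and the parameter values are assumptions made for the example; `statistics.NormalDist` is used only as a reference for the normal density.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density, written exactly as in the table."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def exponential_pdf(x, lam):
    """Exponential density; zero for negative x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def midpoint_integral(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

lam = 2.0
# E[X] = integral of x * f(x); the tail beyond 40 is negligible for lam = 2.
exp_mean = midpoint_integral(lambda x: x * exponential_pdf(x, lam), 0.0, 40.0)

# Difference between the hand-written density and the standard-library one.
pdf_gap = abs(normal_pdf(1.0, 0.0, 2.0) - NormalDist(0.0, 2.0).pdf(1.0))
```

Here `exp_mean` should come out close to 1/λ = 0.5.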
Descriptive Statistics
Measures of Central Tendency
Mean: μ = Σxᵢ / n
Median: The middle value after sorting (the average of the two middle values when n is even)
Mode: The value with the highest frequency
Measures of Dispersion
Variance (population): σ² = Σ(xᵢ - μ)² / n; the sample variance divides by n - 1 instead: s² = Σ(xᵢ - x̄)² / (n - 1)
Standard Deviation: σ = √σ²
Range: R = x_max - x_min
Interquartile Range: IQR = Q3 - Q1
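These measures map directly onto Python's `statistics` module. The data set below is made up for illustration. Quartile conventions differ between textbooks and tools, so the IQR here reflects the default method of `statistics.quantiles` and may not match other software exactly.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)       # averages the two middle values (n is even)
mode = statistics.mode(data)

variance = statistics.pvariance(data)  # population variance (divides by n)
std_dev = statistics.pstdev(data)      # population standard deviation
value_range = max(data) - min(data)

q1, q2, q3 = statistics.quantiles(data, n=4)  # default quartile method
iqr = q3 - q1
```

For the sample versions (dividing by n - 1), `statistics.variance` and `statistics.stdev` are the counterparts.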
Distribution Shape
Skewness: Measures the asymmetry of a distribution
- Skewness > 0: Right-skewed (positively skewed)
- Skewness < 0: Left-skewed (negatively skewed)
- Skewness = 0: No skew; every symmetric distribution has skewness 0
Kurtosis: Measures how heavy the tails of a distribution are (often loosely described as peakedness)
- Kurtosis > 3: Leptokurtic (heavy tails)
- Kurtosis < 3: Platykurtic (light tails)
- Kurtosis = 3: Mesokurtic (same as the normal distribution)
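Both shape measures can be computed from central moments. The sketch below assumes the population (moment-based) definitions, skewness = m₃ / m₂^1.5 and kurtosis = m₄ / m₂², which is the convention under which the normal distribution has kurtosis 3. Other estimators with bias corrections exist and give slightly different values. The data sets are made up.

```python
def central_moment(data, k):
    """k-th central moment, dividing by n (population convention)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: m3 / m2^1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment-based (non-excess) kurtosis: m4 / m2^2."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

skew_symmetric = skewness(symmetric)   # 0 for symmetric data
skew_right = skewness(right_tailed)    # positive: long right tail
kurt_symmetric = kurtosis(symmetric)   # below 3: platykurtic
```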
Inferential Statistics
Central Limit Theorem
Regardless of the shape of the population distribution, provided it has finite mean μ and finite variance σ², when the sample size n is sufficiently large the sampling distribution of the sample mean is approximately normal.
X̄ ~ N(μ, σ²/n) (approximately, for large n)
Standardization: Z = (X̄ - μ) / (σ/√n) ~ N(0,1)
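A small simulation illustrates the theorem. This is a sketch under assumed settings: the population is Exponential(1) (so μ = 1 and σ² = 1), the sample size is 50, and 5000 samples are drawn with a fixed seed for reproducibility.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 50                  # sample size (assumed)
trials = 5000           # number of samples drawn (assumed)

# Each entry is the mean of n draws from a skewed Exponential(1) population.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)     # expected near mu = 1
var_of_means = statistics.pvariance(sample_means)  # expected near sigma^2/n = 0.02

# Standardize with Z = (mean - mu) / (sigma / sqrt(n)) and count |Z| <= 1.96.
z_scores = [(m - 1.0) / (1.0 / math.sqrt(n)) for m in sample_means]
share_within = sum(abs(z) <= 1.96 for z in z_scores) / trials
```

If the normal approximation is reasonable, `share_within` should be close to 0.95 even though the population itself is far from normal.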
Confidence Interval
Confidence Interval for Population Mean (σ known):
CI = X̄ ± z_(α/2) × (σ/√n)
Common Confidence Levels:
- 90% → z = 1.645
- 95% → z = 1.96
- 99% → z = 2.576
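The interval formula can be sketched in a few lines. The function name and the example inputs (X̄ = 50, σ = 10, n = 100) are assumptions for illustration; the critical value comes from the standard normal inverse CDF in `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """CI for the population mean when sigma is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_(alpha/2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs.
low, high = z_confidence_interval(50.0, 10.0, 100, level=0.95)
print(f"95% CI: ({low:.2f}, {high:.2f})")

# Critical values for the common confidence levels.
z_values = {level: NormalDist().inv_cdf(1 - (1 - level) / 2)
            for level in (0.90, 0.95, 0.99)}
```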
Hypothesis Testing
Basic Steps:
- Establish hypotheses: H₀ (null hypothesis), H₁ (alternative hypothesis)
- Choose significance level α (typically 0.05)
- Calculate the test statistic
- Calculate the p-value
- Make a decision: reject H₀ if p < α; otherwise fail to reject H₀
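The steps above can be sketched for a two-sided Z-test of H₀: μ = μ₀ against H₁: μ ≠ μ₀. The function name and the inputs (X̄ = 52, μ₀ = 50, σ = 10, n = 100) are made up for illustration, and σ is assumed known.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject) for H0: mu = mu0 vs H1: mu != mu0."""
    # Test statistic.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value under the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision rule: reject H0 if p < alpha.
    return z, p_value, p_value < alpha

# Hypothetical inputs.
z, p_value, reject = two_sided_z_test(52.0, 50.0, 10.0, 100)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

With these assumed inputs z is 2.0 and the p-value is roughly 0.0455, so H₀ is rejected at α = 0.05 but would not be at α = 0.01.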
Common Tests:
| Test | Use Case | Statistic |
|---|---|---|
| Z-test | Mean test, σ known or large sample | Z = (X̄ - μ₀) / (σ/√n) |
| t-test | Mean test, σ unknown (typically small sample) | t = (X̄ - μ₀) / (s/√n) |
| Chi-square test | Goodness of fit/Independence | χ² = Σ(O-E)²/E |
| F-test | Variance comparison | F = s₁²/s₂² |
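The statistics in the table are simple to compute by hand. The sketch below covers only the test statistics, not the p-values, since the t, χ², and F distributions are not in Python's standard library. The data (five measurements, 120 die rolls) are made up, and s denotes the sample standard deviation (n - 1 denominator).

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic: (mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation
    return (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def f_statistic(sample1, sample2):
    """Ratio of the two sample variances, s1^2 / s2^2."""
    return statistics.variance(sample1) / statistics.variance(sample2)

# Hypothetical data: 120 die rolls, 20 expected per face if the die is fair.
rolls = [18, 22, 20, 25, 15, 20]
chi2 = chi_square_statistic(rolls, [20] * 6)

# Hypothetical measurements tested against mu0 = 5.
t = t_statistic([5.1, 4.9, 5.6, 5.8, 6.0], 5.0)
```

Each statistic is then compared against the relevant distribution (t with n - 1 degrees of freedom, χ² with 5 degrees of freedom for the six-face goodness-of-fit case).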
Correlation and Regression
Correlation Coefficient:
r = Σ(xᵢ-x̄)(yᵢ-ȳ) / √[Σ(xᵢ-x̄)² × Σ(yᵢ-ȳ)²]
-1 ≤ r ≤ 1
The larger |r|, the stronger the linear correlation
Simple Linear Regression:
y = β₀ + β₁x + ε
β₁ = Σ(xᵢ-x̄)(yᵢ-ȳ) / Σ(xᵢ-x̄)²
β₀ = ȳ - β₁x̄
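Both the correlation coefficient and the least-squares estimates follow directly from the sums in the formulas. A minimal sketch on made-up data, with illustrative function names:

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r from the formula above."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
beta0, beta1 = fit_line(xs, ys)
print(f"r = {r:.4f}, y = {beta0:.1f} + {beta1:.1f}x")
```

For this data the fitted line is y = 2.2 + 0.6x with r ≈ 0.77. On Python 3.10 and later, `statistics.correlation` and `statistics.linear_regression` provide the same quantities.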