Mar 18, 2024 • statistics

Probability and Statistics Fundamentals

Core concepts of probability and statistics for data science

#Statistics #Probability #Mathematics

Probability Basics

Basic Concepts

Sample Space (Ω): The set of all possible outcomes

Event (E): A subset of the sample space

Probability Axioms:

P(E) ≥ 0, for any event E
P(Ω) = 1
Additivity: if A and B are mutually exclusive (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B)
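
The three axioms can be checked on a small finite sample space. The sketch below assumes a fair six-sided die with equally likely outcomes; the names OMEGA and prob are illustrative only.

```python
from fractions import Fraction

# Assumed example: sample space for one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    event = frozenset(event)
    if not event <= OMEGA:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(OMEGA))

even = {2, 4, 6}
one = {1}

print(prob(even) >= 0)                             # non-negativity
print(prob(OMEGA) == 1)                            # P(Ω) = 1
print(prob(even | one) == prob(even) + prob(one))  # additivity for disjoint events
```

Exact fractions are used so the additivity check is not affected by floating-point rounding.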

Conditional Probability

P(A|B) = P(A ∩ B) / P(B),  where P(B) > 0
Bayes’ Theorem:
P(A|B) = P(B|A) × P(A) / P(B)
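
A short numeric sketch of Bayes' theorem, using the law of total probability to obtain P(B). The scenario (a diagnostic test) and every number in it are hypothetical, chosen only for illustration.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) from P(A), P(B|A), and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence

# Hypothetical inputs: 1% prevalence, 95% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(posterior, 3))  # about 0.161
```

Even with a fairly accurate test, the posterior stays low here because the prior P(A) is small.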

Common Distributions

Discrete Distributions:

Distribution	Probability Mass Function	Expectation	Variance
Bernoulli	P(X=k) = p^k(1-p)^(1-k), k ∈ {0, 1}	p	p(1-p)
Binomial	P(X=k) = C(n,k)p^k(1-p)^(n-k), k = 0, 1, …, n	np	np(1-p)
Poisson	P(X=k) = λ^k e^(-λ) / k!, k = 0, 1, 2, …	λ	λ
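
The expectation and variance columns can be verified numerically from the mass functions. This is a minimal sketch with illustrative parameter values; the Poisson sum is truncated at k = 100, which is an approximation.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam**k * math.exp(-lam) / math.factorial(k)

def mean_and_variance(pmf_values):
    """pmf_values: list of (k, P(X=k)) pairs."""
    mean = sum(k * pk for k, pk in pmf_values)
    variance = sum((k - mean) ** 2 * pk for k, pk in pmf_values)
    return mean, variance

n, p = 10, 0.3
binom = [(k, binomial_pmf(k, n, p)) for k in range(n + 1)]
print(mean_and_variance(binom))    # close to (np, np(1-p)) = (3.0, 2.1)

lam = 4.0
poisson = [(k, poisson_pmf(k, lam)) for k in range(101)]  # truncated support
print(mean_and_variance(poisson))  # close to (λ, λ) = (4.0, 4.0)
```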

Continuous Distributions:

Distribution	Probability Density Function	Expectation	Variance
Uniform	f(x) = 1/(b-a), a ≤ x ≤ b	(a+b)/2	(b-a)²/12
Normal	f(x) = (1/√(2πσ²))e^(-(x-μ)²/(2σ²))	μ	σ²
Exponential	f(x) = λe^(-λx), x ≥ 0	1/λ	1/λ²
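
As a rough check on the density formulas, the sketch below compares the normal PDF with Python's statistics.NormalDist and approximates the exponential mean by numerical integration. The helper integrate, the midpoint rule, and the cutoff at x = 50 are choices made for this example.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def exponential_pdf(x, lam):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

# The hand-written normal density should agree with the standard library.
print(normal_pdf(1.0, 0.0, 2.0), NormalDist(0.0, 2.0).pdf(1.0))

lam = 2.0
total = integrate(lambda x: exponential_pdf(x, lam), 0.0, 50.0)
mean = integrate(lambda x: x * exponential_pdf(x, lam), 0.0, 50.0)
print(total, mean)  # close to 1 and 1/λ = 0.5
```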

Descriptive Statistics

Measures of Central Tendency

Mean: μ = Σxᵢ / n
Median: The middle value after sorting (the average of the two middle values when n is even)
Mode: The value with the highest frequency

Measures of Dispersion

Variance: σ² = Σ(xᵢ - μ)² / n (population form; the sample variance s² divides by n - 1 instead)
Standard Deviation: σ = √σ²
Range: R = x_max - x_min
Interquartile Range: IQR = Q3 - Q1
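
The measures above map directly onto Python's statistics module. The data set is made up for illustration, and the helper describe is a hypothetical name. Note that quartile conventions differ between tools, so the IQR from statistics.quantiles (default "exclusive" method) may not match other software exactly.

```python
import statistics

def describe(data):
    """Summarize a data set using the population formulas (divide by n)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # population standard deviation
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(data)
print(summary["mean"], summary["median"], summary["mode"])  # 5.0 4.5 4
print(summary["variance"], summary["std_dev"])              # variance 4, standard deviation 2.0
```

For the sample variance (dividing by n - 1), statistics.variance and statistics.stdev are the counterparts.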

Distribution Shape

Skewness: Measures the asymmetry of a distribution

Skewness > 0: Right-skewed (positively skewed)
Skewness < 0: Left-skewed (negatively skewed)
Skewness = 0: No skew (true of every symmetric distribution)

Kurtosis: Measures how heavy the tails of a distribution are (often loosely described as peakedness)

Kurtosis > 3: Leptokurtic (heavy tails)
Kurtosis < 3: Platykurtic (light tails)
Kurtosis = 3: Mesokurtic (same as the normal distribution)
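
Both shape measures are standardized central moments. A minimal sketch, assuming the population (divide by n) moment definitions and the non-excess kurtosis convention used above, where the normal distribution scores 3:

```python
def central_moment(data, order):
    """k-th central moment, dividing by n."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    # Third central moment divided by the cube of the standard deviation.
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    # Fourth central moment divided by the squared variance (normal = 3).
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(skewness(symmetric))     # 0.0 for symmetric data
print(skewness(right_skewed))  # positive: long right tail
print(kurtosis(symmetric))     # below 3: lighter tails than the normal
```

Libraries often report excess kurtosis (kurtosis minus 3), so results may differ by that constant.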

Inferential Statistics

Central Limit Theorem

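For independent observations drawn from a population with mean μ and finite variance σ², the sampling distribution of the sample mean is approximately normal when the sample size n is sufficiently large, regardless of the shape of the population distribution.

X̄ ~ N(μ, σ²/n) (approximately)
Standardization: Z = (X̄ - μ) / (σ/√n) ~ N(0,1) (approximately)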
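
The theorem can be illustrated by simulation. The sketch below assumes an exponential population with rate 1 (so μ = 1 and σ² = 1), which is strongly skewed; the sample size, trial count, and seed are arbitrary choices.

```python
import random
import statistics

def simulate_sample_means(n, trials, rate=1.0, seed=0):
    """Draw `trials` samples of size n from Exponential(rate); return their means."""
    rng = random.Random(seed)  # local generator so the run is reproducible
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(trials)
    ]

n = 50
means = simulate_sample_means(n=n, trials=5000)

# CLT prediction: mean of X̄ is μ = 1, variance of X̄ is σ²/n = 0.02.
print(statistics.fmean(means))
print(statistics.pvariance(means))
```

Plotting a histogram of means would show a bell shape even though the underlying population is far from normal.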

Confidence Interval

Confidence Interval for Population Mean (σ known):

CI = X̄ ± z_(α/2) × (σ/√n)
Common Confidence Levels:
90% → z = 1.645
95% → z = 1.96
99% → z = 2.576
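
The interval formula can be sketched with statistics.NormalDist, which supplies the critical value z_(α/2) through its inverse CDF. The function name and the sample numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the population mean when σ is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_(α/2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Critical values for the common confidence levels listed above.
for level in (0.90, 0.95, 0.99):
    print(level, round(NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0), 3))

# Hypothetical sample: X̄ = 100, σ = 15, n = 36.
low, high = z_confidence_interval(100.0, 15.0, 36)
print(round(low, 2), round(high, 2))  # about 95.1 and 104.9
```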

Hypothesis Testing

Basic Steps:

Establish hypotheses: H₀ (null hypothesis), H₁ (alternative hypothesis)
Choose significance level α (typically 0.05)
Calculate the test statistic
Calculate the p-value
Make a decision: reject H₀ if p < α; otherwise, fail to reject H₀
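
The steps above can be sketched for a two-sided one-sample Z-test with known σ. The function name and the input values are hypothetical; H₀ is μ = μ₀ and H₁ is μ ≠ μ₀.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample Z-test; returns (z, p_value, reject_h0)."""
    # Test statistic
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value: probability of a statistic at least this extreme under H₀
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Decision rule
    return z, p_value, p_value < alpha

# Hypothetical data: X̄ = 103, μ₀ = 100, σ = 15, n = 100.
z, p_value, reject = z_test(103.0, 100.0, 15.0, 100)
print(z, round(p_value, 4), reject)  # z = 2.0, p about 0.0455, reject H₀ at α = 0.05
```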

Common Tests:

Test	Use Case	Statistic
Z-test	Mean test, σ known or large sample	Z = (X̄ - μ₀) / (σ/√n)
t-test	Mean test, σ unknown (typically small sample)	t = (X̄ - μ₀) / (s/√n)
Chi-square test	Goodness of fit/Independence	χ² = Σ(O-E)²/E
F-test	Variance comparison	F = s₁²/s₂²
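
Two of the statistics in the table, computed by hand as a sketch. Only the test statistics are calculated here; turning them into p-values requires the t and chi-square distributions, which the Python standard library does not provide. The data values are made up.

```python
import math
import statistics

def t_statistic(data, mu0):
    """One-sample t statistic using the sample standard deviation s (n - 1 divisor)."""
    n = len(data)
    s = statistics.stdev(data)
    return (statistics.fmean(data) - mu0) / (s / math.sqrt(n))

def chi_square_statistic(observed, expected):
    """Sum of (O - E)² / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(t_statistic([1, 2, 3, 4, 5], mu0=2.0))               # about 1.414
print(chi_square_statistic([10, 20, 30], [20, 20, 20]))    # 10.0
```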

Correlation and Regression

Correlation Coefficient:

r = Σ(xᵢ-x̄)(yᵢ-ȳ) / √[Σ(xᵢ-x̄)² × Σ(yᵢ-ȳ)²]
-1 ≤ r ≤ 1
The larger |r|, the stronger the linear relationship between x and y
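
A direct translation of the correlation formula, as a minimal sketch. The function name pearson_r and the data are illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # perfectly linear, increasing: r = 1
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # perfectly linear, decreasing: r = -1
```

The function raises ZeroDivisionError if either sequence is constant, since r is undefined in that case.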

Simple Linear Regression:

y = β₀ + β₁x + ε
Least-squares estimates of the coefficients:
β̂₁ = Σ(xᵢ-x̄)(yᵢ-ȳ) / Σ(xᵢ-x̄)²
β̂₀ = ȳ - β̂₁x̄
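
The slope and intercept formulas can be sketched as follows. The function name fit_line and the data points are hypothetical; the data lie exactly on y = 1 + 2x so the result is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for simple linear regression."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                    # estimate of β₁
    intercept = y_bar - slope * x_bar    # estimate of β₀
    return intercept, slope

intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```

With noisy data the fitted line will not pass through every point; the residuals yᵢ - (intercept + slope·xᵢ) then estimate the error term ε.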