PAC Learning (Probably Approximately Correct)
🔹 What is PAC Learning?
PAC stands for Probably Approximately Correct. It ensures that a learning algorithm produces a model that is approximately correct with high probability.
🔹 ε (Epsilon) – Error Tolerance
It represents the maximum acceptable error.
Example: ε = 0.05 means 5% error allowed (5 mistakes out of 100 emails).
🔹 δ (Delta) – Confidence
It represents the probability that the guarantee fails; 1 − δ is our confidence in the model.
Example: δ = 0.01 means 99% confidence that the error will be ≤ 5%.
🔹 Real-Life Example (Spam Detection)
Suppose you are building a spam classifier:
- ε = 0.05 → 5% error allowed
- δ = 0.01 → 99% confidence
This means your classifier has 99% confidence that the error is less than or equal to 5%.
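The ε/δ guarantee can be illustrated with a quick Monte Carlo sketch. The setup is hypothetical: assume a fixed classifier whose true error rate is 3% (below ε), and count how often the observed error on a fresh batch of emails exceeds ε = 5%.

```python
import random

# A hedged sketch of "probably approximately correct".
# Assume (hypothetically) a fixed classifier whose true error rate is 3%.
TRUE_ERROR = 0.03   # true error of the classifier (below epsilon)
EPSILON = 0.05      # error tolerance
M = 1000            # emails evaluated per trial
TRIALS = 2000       # independent trials

random.seed(0)
failures = 0
for _ in range(TRIALS):
    # Each email is misclassified independently with probability TRUE_ERROR.
    mistakes = sum(random.random() < TRUE_ERROR for _ in range(M))
    if mistakes / M > EPSILON:  # observed error exceeded the tolerance
        failures += 1

# The failure rate plays the role of delta: only very rarely does the
# observed error exceed epsilon.
print(f"trials with error > {EPSILON}: {failures} of {TRIALS}")
```

The fraction of failing trials stays far below δ = 0.01 here, which is the "probably" part of the guarantee.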
🧠 Memory Trick
ε = error you allow
1-δ = confidence about error
🔹 Hypothesis Space (H)
Hypothesis space is the set of all possible models (rules) that a learning algorithm can choose from.
Example:
- If email contains "free" → Spam
- If email contains "offer" → Spam
- If email length > 100 → Spam
- Always Not Spam
All these possible rules together form the hypothesis space H.
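A minimal sketch of this hypothesis space in Python (the rules mirror the list above; the example email is hypothetical). Each hypothesis is a function mapping an email string to True (spam) or False (not spam):

```python
# Each hypothesis maps an email (a string) to True = spam, False = not spam.
H = [
    lambda email: "free" in email.lower(),   # rule 1: contains "free"
    lambda email: "offer" in email.lower(),  # rule 2: contains "offer"
    lambda email: len(email) > 100,          # rule 3: long email
    lambda email: False,                     # rule 4: always Not Spam
]

email = "Claim your FREE prize now"          # hypothetical example
predictions = [h(email) for h in H]
print(predictions)  # → [True, False, False, False]
```

Learning, in this picture, is just picking the hypothesis in H that makes the fewest mistakes on the training emails.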
Why Does the Hypothesis Space Matter?
- Large hypothesis space → more choices → harder to learn
- Small hypothesis space → easier to select the best model
- More hypotheses require more training data
🔹 Sample Complexity Formula
Realizable case (some hypothesis in H fits the data perfectly):
m ≥ (1/ε) [ ln|H| + ln(1/δ) ]
Agnostic (Hoeffding) form, used when no hypothesis in H is guaranteed to be perfect:
m ≥ (1/(2ε²)) ln(2|H|/δ)
Where:
m = number of training samples
|H| = size of hypothesis space
ε = error tolerance
δ = confidence parameter
✅ Use Case 1: How Much Data is Needed?
Problem:
You want to train a classifier with the following requirements:
- ε = 0.05 (5% error allowed)
- δ = 0.01 (99% confidence)
- |H| = 1000 (number of possible hypotheses)
Formula Used:
m ≥ (1/ε) [ ln|H| + ln(1/δ) ]
Substitute Values:
m ≥ (1 / 0.05) [ ln(1000) + ln(1 / 0.01) ]
m ≥ 20 [ ln(1000) + ln(100) ]
ln(1000) ≈ 6.91, ln(100) ≈ 4.61
m ≥ 20 × (6.91 + 4.61)
m ≥ 20 × 11.51 ≈ 230.3
Final Answer:
You need at least 231 training samples (emails), rounding 230.3 up, to ensure the classifier achieves ≤ 5% error with 99% confidence.
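The calculation above can be checked in code (a sketch; the function names are mine). With unrounded logarithms the realizable bound comes out at about 230.3, so rounding up gives 231 samples; the agnostic (Hoeffding) form demands far more for the same ε, δ, and |H|:

```python
import math

def m_realizable(eps, delta, h_size):
    """m >= (1/eps) * (ln|H| + ln(1/delta)) -- consistent-learner bound."""
    return math.ceil((1 / eps) * (math.log(h_size) + math.log(1 / delta)))

def m_agnostic(eps, delta, h_size):
    """m >= (1/(2*eps^2)) * ln(2|H|/delta) -- Hoeffding-based bound."""
    return math.ceil((1 / (2 * eps**2)) * math.log(2 * h_size / delta))

print(m_realizable(0.05, 0.01, 1000))  # → 231 (20 × 11.51..., rounded up)
print(m_agnostic(0.05, 0.01, 1000))    # → 2442, a much larger requirement
```

The gap between the two bounds shows why the realizable assumption (a perfect hypothesis exists in H) is so valuable: it changes the ε-dependence from 1/ε² to 1/ε.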
🔹 VC Dimension (Model Capacity)
VC Dimension measures the capacity or complexity of a model — how well it can fit different patterns.
Simple Idea:
It is the maximum number of points that a model can shatter, i.e., classify correctly under every possible labeling of those points.
Example (a line in 2D):
- A line can shatter 3 points in general position → all 2³ = 8 labelings achievable
- No set of 4 points can be shattered (the XOR arrangement fails) → VC = 3
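This can be checked by brute force (a rough sketch: the grid search over candidate lines is my own approximation, not an exact separability test). For each candidate half-plane, record which labeling it produces, then count how many of the 2ⁿ possible labelings were realized:

```python
import math

def achievable_labelings(points, angles=360, offsets=81):
    """Approximate, by grid search over lines sign(w·x - t), the set of
    labelings a half-plane can realize on the given points."""
    found = set()
    for a in range(angles):
        theta = 2 * math.pi * a / angles
        w = (math.cos(theta), math.sin(theta))
        for k in range(offsets):
            t = -2 + 4 * k / (offsets - 1)   # offsets sweep [-2, 2]
            found.add(tuple(w[0] * x + w[1] * y - t > 0 for x, y in points))
    return found

three = [(0, 0), (1, 0), (0, 1)]             # 3 points in general position
four = [(0, 0), (1, 1), (0, 1), (1, 0)]      # unit square (XOR corners)
print(len(achievable_labelings(three)))      # 8 = 2^3 → shattered
print(len(achievable_labelings(four)))       # < 16   → not shattered
```

The two "diagonal" labelings of the square (the XOR splits) are exactly the ones no line can realize, which is why a linear classifier in 2D has VC dimension 3.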
Why Important?
- Higher VC → more complex model
- Needs more data
- Risk of overfitting
🔹 Inductive Bias
Inductive bias is the set of assumptions a learning algorithm makes to generalize beyond the training data.
Examples:
- Linear classifiers assume the decision boundary is a line.
- Decision trees assume data can be split hierarchically by features.
- k-NN assumes nearby points have similar labels.
Key Idea: Without inductive bias, learning is impossible — because infinitely many hypotheses could fit the training data.
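The k-NN assumption can be made concrete with a tiny 1-nearest-neighbour sketch (the single feature and its values are hypothetical):

```python
# 1-nearest-neighbour on one feature: the inductive bias is that points
# close together in feature space share a label.
def nn_predict(train, query):
    # train: list of (feature, label); return the label of the nearest point.
    return min(train, key=lambda p: abs(p[0] - query))[1]

train = [(0.9, "spam"), (1.1, "spam"), (8.0, "not spam")]
print(nn_predict(train, 1.5))   # → spam (nearest neighbour is 1.1)
print(nn_predict(train, 7.0))   # → not spam (nearest neighbour is 8.0)
```

Nothing in the training data forces these predictions at 1.5 or 7.0; the "nearby points share labels" assumption is what fills the gap, which is exactly the role of inductive bias.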
🔗 Relation Between PAC Learning and Inductive Bias
- PAC requires inductive bias: To achieve PAC guarantees, the learner must restrict itself to a hypothesis space (H). This restriction is the inductive bias.
- Bias defines hypothesis space size: Stronger bias → smaller hypothesis space → less data needed. Weaker bias → larger hypothesis space → more data required.
- Bias vs Generalization: PAC learning explains when a given inductive bias leads to good generalization. Too weak a bias → huge H → impractical data requirement. Too strong a bias → may miss the true concept.
- VC Dimension Connection: Inductive bias determines the VC dimension of the hypothesis space. PAC learning uses VC dimension to estimate how much data is required for reliable learning.
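The "stronger bias → less data" claim can be made concrete with the realizable-case formula from the sample-complexity section (a sketch; the two |H| values are hypothetical):

```python
import math

def m_realizable(eps, delta, h_size):
    """m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((1 / eps) * (math.log(h_size) + math.log(1 / delta)))

# Same eps = 0.05 and delta = 0.01; only the bias (size of H) changes.
print(m_realizable(0.05, 0.01, 10))      # strong bias, |H| = 10        → 139
print(m_realizable(0.05, 0.01, 10**6))   # weak bias,   |H| = 1,000,000 → 369
```

Note that the bound grows only logarithmically in |H|: a hypothesis space 100,000× larger needs only about 2.7× more samples here.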
🔹 Conclusion
Think of PAC learning as a contract:
If you give me enough data (based on ε, δ, |H|, or VC dimension),
Then I promise that, with high probability, my model will be approximately correct on unseen data.
✔ In simple terms:
More data → Better guarantee of accuracy and confidence.