Thursday, 16 April 2026

PAC Learning Explained

PAC Learning (Probably Approximately Correct)

🔹 What is PAC Learning?

PAC stands for Probably Approximately Correct. A PAC learning algorithm is guaranteed, with high probability (the "probably"), to output a model whose error is small (the "approximately correct").

🔹 ε (Epsilon) – Error Tolerance

It represents the maximum acceptable error.

Example: ε = 0.05 means 5% error allowed (5 mistakes out of 100 emails).

🔹 δ (Delta) – Confidence

It is the probability that the guarantee fails; 1 − δ is how confident we are in the model.

Example: δ = 0.01 means 99% confidence that the error will be ≤ 5%.

🔹 Real-Life Example (Spam Detection)

Suppose you are building a spam classifier:

  • ε = 0.05 → 5% error allowed
  • δ = 0.01 → 99% confidence

This means your classifier has 99% confidence that the error is less than or equal to 5%.

🧠 Memory Trick

ε = error you allow
1 − δ = confidence that the error stays within ε

🔹 Hypothesis Space (H)

Hypothesis space is the set of all possible models (rules) that a learning algorithm can choose from.

Example:

  • If email contains "free" → Spam
  • If email contains "offer" → Spam
  • If email length > 100 → Spam
  • Always Not Spam

All these possible rules together form the hypothesis space H.
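These rules are concrete enough to write down and search directly. A minimal sketch in Python (the email representation and the tiny labeled set are assumptions made just for illustration):

```python
# The four candidate rules from the list above, written as functions.
H = {
    "contains 'free'":  lambda e: "free" in e["text"],
    "contains 'offer'": lambda e: "offer" in e["text"],
    "length > 100":     lambda e: e["length"] > 100,
    "always not spam":  lambda e: False,
}

# A tiny labeled training set (hypothetical data): (email, is_spam).
emails = [
    ({"text": "free offer inside", "length": 120}, True),
    ({"text": "lunch tomorrow?",   "length": 30},  False),
    ({"text": "free tickets",      "length": 40},  True),
    ({"text": "meeting notes",     "length": 150}, False),
]

# Learning = picking the hypothesis in H with the fewest training mistakes.
errors = {name: sum(h(e) != label for e, label in emails)
          for name, h in H.items()}
best = min(errors, key=errors.get)
print(best, errors[best])   # -> contains 'free' 0
```

Here "learning" is just a search over H, which is why the size of H matters: the larger the space, the more data it takes to tell the good rules from the lucky ones.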

Why the Hypothesis Space Matters

  • Large hypothesis space → more choices → harder to learn
  • Small hypothesis space → easier to select the best model
  • More hypotheses require more training data

🔹 Sample Complexity Formula

m ≥ (1/ε) [ ln|H| + ln(1/δ) ]

Agnostic (Hoeffding-based) form, for when no hypothesis fits the data perfectly:

m ≥ (1/2ε²) ln(2|H|/δ)

Where:
m = number of training samples
|H| = size of hypothesis space
ε = error tolerance
δ = confidence parameter
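Both bounds are straightforward to evaluate in code. A small sketch (the function names are mine, not standard):

```python
import math

def m_consistent(eps, delta, H_size):
    """Samples needed for a learner that finds a hypothesis consistent
    with the training data, over a finite hypothesis space H."""
    return math.ceil((1 / eps) * (math.log(H_size) + math.log(1 / delta)))

def m_agnostic(eps, delta, H_size):
    """Hoeffding-based bound for when no hypothesis may fit perfectly."""
    return math.ceil((1 / (2 * eps**2)) * math.log(2 * H_size / delta))

print(m_consistent(0.05, 0.01, 1000))   # 231
print(m_agnostic(0.05, 0.01, 1000))     # 2442
```

Note how much more data the agnostic case demands: the 1/ε factor becomes 1/(2ε²), which for ε = 0.05 grows from 20 to 200.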

✅ Use Case 1: How Much Data is Needed?

Problem:

You want to train a classifier with the following requirements:

  • ε = 0.05 (5% error allowed)
  • δ = 0.01 (99% confidence)
  • |H| = 1000 (number of possible hypotheses)

Formula Used:

m ≥ (1/ε) [ ln|H| + ln(1/δ) ]

Substitute Values:

m ≥ (1 / 0.05) [ ln(1000) + ln(1 / 0.01) ]

m ≥ 20 [ ln(1000) + ln(100) ]

ln(1000) ≈ 6.91,    ln(100) ≈ 4.61

m ≥ 20 × (6.91 + 4.61)

m ≥ 20 × 11.52 ≈ 230.4

Final Answer:

Rounding 230.4 up to the next whole sample, you need at least 231 training samples (emails) to ensure the classifier achieves ≤ 5% error with 99% confidence.
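The hand calculation rounds the logarithms; the exact value is easy to check:

```python
import math

# Same substitution as above, with exact logarithms.
m = (1 / 0.05) * (math.log(1000) + math.log(1 / 0.01))
print(round(m, 1))    # 230.3
print(math.ceil(m))   # 231 samples needed
```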

🔹 VC Dimension (Model Capacity)

VC Dimension measures the capacity or complexity of a model — how well it can fit different patterns.

Simple Idea:

It is the largest number of points that the model class can shatter, i.e. classify correctly under every possible labeling of those points.

Example:

  • A line can shatter 3 points (in general position): all 8 labelings are achievable
  • No line can shatter 4 points → so VC = 3
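Shattering can be checked exhaustively for a class even simpler than lines: 1-D threshold classifiers h_t(x) = [x ≥ t], used here purely as an illustration. They can shatter any single point but never two (the labeling "left point spam, right point not" is unreachable), so their VC dimension is 1:

```python
from itertools import product

def threshold_predictions(points, t):
    # Labels assigned by the threshold rule h_t(x) = [x >= t].
    return tuple(x >= t for x in points)

def shatters(points):
    """Can threshold classifiers realize every labeling of these points?"""
    xs = sorted(points)
    # Candidate thresholds: below, between, and above the points.
    cands = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    achievable = {threshold_predictions(points, t) for t in cands}
    return achievable == set(product([False, True], repeat=len(points)))

print(shatters([0.0]))        # True  -> any 1 point is shattered
print(shatters([0.0, 1.0]))   # False -> (True, False) is unreachable
```

The same exhaustive check applied to lines and point sets of size 3 and 4 would confirm VC = 3 for 2-D linear classifiers, just with more bookkeeping.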

Why Is It Important?

  • Higher VC → more complex model
  • Needs more data
  • Risk of overfitting

🔹 Inductive Bias

Inductive bias is the set of assumptions a learning algorithm makes to generalize beyond the training data.

Examples:

  • Linear classifiers assume the decision boundary is a line.
  • Decision trees assume data can be split hierarchically by features.
  • k-NN assumes nearby points have similar labels.

Key Idea: Without inductive bias, learning is impossible — because infinitely many hypotheses could fit the training data.
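A tiny sketch of this idea: two hypotheses (both invented for illustration) that fit the same training data perfectly, yet predict differently on unseen inputs. Only the inductive bias decides between them:

```python
# Training data that both hypotheses explain perfectly.
train = [(0, 0), (1, 1), (2, 2)]

h_linear = lambda x: x                       # bias: "the target is linear"
h_lookup = lambda x: dict(train).get(x, 0)   # bias: "unseen inputs map to 0"

# Both are consistent with every training example...
assert all(h_linear(x) == y and h_lookup(x) == y for x, y in train)

# ...but they disagree on unseen data.
print(h_linear(5), h_lookup(5))   # -> 5 0
```

The training data alone cannot distinguish them; without an assumption about which kind of hypothesis to prefer, there is no principled prediction at x = 5.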

🔗 Relation Between PAC Learning and Inductive Bias

  • PAC requires inductive bias: To achieve PAC guarantees, the learner must restrict itself to a hypothesis space (H). This restriction is the inductive bias.
  • Bias defines hypothesis space size: Stronger bias → smaller hypothesis space → less data needed.
    Weaker bias → larger hypothesis space → more data required.
  • Bias vs Generalization: PAC learning explains when a given inductive bias leads to good generalization. Too weak bias → huge H → impractical data requirement.
    Too strong bias → may miss the true concept.
  • VC Dimension Connection: Inductive bias determines the VC dimension of the hypothesis space. PAC learning uses VC dimension to estimate how much data is required for reliable learning.
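For infinite hypothesis spaces, |H| is unusable and the VC dimension takes its place in the sample bound. One classical VC-based bound (the sum form given in Mitchell's Machine Learning textbook, Ch. 7) can be evaluated directly; a sketch:

```python
import math

def m_vc(eps, delta, d):
    # Classical PAC sample bound in terms of VC dimension d.
    return math.ceil((1 / eps) * (4 * math.log2(2 / delta)
                                  + 8 * d * math.log2(13 / eps)))

# Doubling the VC dimension roughly doubles the data requirement.
print(m_vc(0.05, 0.01, 3))   # 4463
print(m_vc(0.05, 0.01, 6))   # 8313
```

The linear dependence on d makes the earlier point concrete: a weaker bias (richer hypothesis space, higher VC dimension) directly translates into needing more data.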

🔹 Conclusion

Think of PAC learning as a contract:

If you give me enough data (based on ε, δ, |H|, or VC dimension),

Then I promise that, with high probability, my model will be approximately correct on unseen data.

✔ In simple terms:

More data → Better guarantee of accuracy and confidence.
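The contract can even be checked empirically: repeat a learn-then-test loop many times and count how often the learned rule's true error exceeds ε. A minimal sketch, assuming a toy distribution over five words in which "free" truly marks spam (every detail of the setup is an assumption for this sketch):

```python
import random

random.seed(0)

# Toy world: an email is a set of words; the TRUE rule is
# "spam iff it contains 'free'".
WORDS = ["free", "offer", "hello", "meeting", "lunch"]

def draw_email():
    return {w for w in WORDS if random.random() < 0.3}

def true_label(email):
    return "free" in email

# Hypothesis space H: one keyword rule per word, plus "always not spam".
HYPOTHESES = [lambda e, w=w: w in e for w in WORDS] + [lambda e: False]

def learn(sample):
    # Pick the hypothesis with the fewest mistakes on the training sample.
    return min(HYPOTHESES,
               key=lambda h: sum(h(e) != true_label(e) for e in sample))

def true_error(h, n_test=2000):
    test = [draw_email() for _ in range(n_test)]
    return sum(h(e) != true_label(e) for e in test) / n_test

eps, delta, m, trials = 0.05, 0.01, 100, 200
failures = sum(true_error(learn([draw_email() for _ in range(m)])) > eps
               for _ in range(trials))
print(f"runs with error > {eps}: {failures}/{trials}")   # -> 0/200
```

With m = 100 samples the learner recovers the true rule in every trial, so the failure rate stays at 0, comfortably within δ = 0.01, just as the sample-complexity bound predicts.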
