S3-SA3-0395
What is the Goodness of Fit Test?
Grade Level:
Class 9
AI/ML, Data Science, Physics, Economics, Cryptography, Computer Science, Engineering
Definition
What is it?
The Goodness of Fit Test is a statistical test that helps us check how well our observed data matches an expected pattern or distribution. It tells us if the differences between what we see and what we expect are just by chance, or if there's a real reason for them.
Simple Example
Quick Example
Imagine a large batch of laddoos that is supposed to be half motichoor and half boondi, so in a sample of 100 you expect 50 of each. You pick out 100 laddoos at random and find 48 motichoor and 52 boondi. The Goodness of Fit Test helps you decide if this small difference (48 vs 50) is just random variation, or if the initial expectation of 50-50 was actually wrong.
Worked Example
Step-by-Step
Let's say a coin is supposed to be fair, meaning you expect 50% heads and 50% tails. You flip it 100 times and get 60 heads and 40 tails. Is the coin fair?
Step 1: State the expected outcomes. For 100 flips, expected heads = 100 * 0.50 = 50. Expected tails = 100 * 0.50 = 50.
Step 2: Note the observed outcomes. Observed heads = 60. Observed tails = 40.
Step 3: Calculate the difference between observed and expected for each outcome, square it, and divide by expected. For heads: (60 - 50)^2 / 50 = 10^2 / 50 = 100 / 50 = 2. For tails: (40 - 50)^2 / 50 = (-10)^2 / 50 = 100 / 50 = 2.
Step 4: Sum these values to get the 'Chi-Square' (χ^2) test statistic. χ^2 = 2 + 2 = 4.
Step 5: Compare the calculated χ^2 value (4) with a critical value from a Chi-Square distribution table (which you'd learn about in higher classes). If the calculated value is higher than the critical value for a chosen 'significance level', we say the coin is likely not fair. With two outcomes there is 1 degree of freedom, and the table's critical value at the 5% significance level is 3.84. Since 4 > 3.84, we conclude the coin is probably not fair.
Answer: The Chi-Square test statistic is 4. Because 4 is larger than the critical value of 3.84, there is evidence that the coin is not fair.
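The five steps above can be sketched in a few lines of Python. This is only a minimal sketch: the function name `chi_square_statistic` is made up for illustration, and the critical value 3.84 is simply the one quoted in Step 5.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all outcomes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


flips = 100
observed = [60, 40]                    # heads, tails (Step 2)
expected = [flips * 0.5, flips * 0.5]  # fair coin: 50 and 50 (Step 1)

chi_sq = chi_square_statistic(observed, expected)  # Steps 3 and 4
critical_value = 3.84  # value quoted in Step 5, taken as given here

print("Chi-square statistic:", chi_sq)
print("Larger than critical value?", chi_sq > critical_value)
```

Running it gives a statistic of 4.0, which matches Step 4, and confirms that it is larger than 3.84.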
Why It Matters
Understanding Goodness of Fit is useful in fields like AI/ML, where engineers use it to check whether the outcomes their models predict match what is actually observed. Data scientists use it to see if data fits an assumed pattern or distribution. In economics, it helps verify if market data aligns with theoretical predictions, making it a practical tool for many real-world careers.
Common Mistakes
MISTAKE: Assuming a small difference between observed and expected automatically means the fit is bad. | CORRECTION: A small difference might just be due to random chance. The Goodness of Fit Test provides a statistical way to decide if the difference is significant or not.
MISTAKE: Not having a clear 'expected' distribution to compare against. | CORRECTION: You must always have a specific hypothesis or theoretical distribution (e.g., 'equal probability,' 'normal distribution') that you are testing your observed data against.
MISTAKE: Using the test when the sample size is very small. | CORRECTION: The Goodness of Fit Test, especially the Chi-Square version, works best with sufficiently large sample sizes. If expected frequencies are too low (e.g., less than 5), the test results might not be reliable.
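The last correction can be turned into a quick check before running the test. The sketch below is an illustration only: the helper name `low_expected_counts` and the 12-roll die example are hypothetical, and the threshold of 5 is the rule of thumb mentioned above.

```python
def low_expected_counts(expected, minimum=5):
    """Return the expected counts that fall below the chosen minimum."""
    return [e for e in expected if e < minimum]


# Hypothetical example: a fair six-sided die rolled only 12 times.
rolls = 12
expected = [rolls / 6] * 6  # 2.0 for every face

too_small = low_expected_counts(expected)
if too_small:
    print("Warning: expected counts below 5:", too_small)
else:
    print("All expected counts are at least 5.")
```

With only 12 rolls every expected count is 2, so the check warns that the Chi-Square result may not be reliable.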
Practice Questions
Try It Yourself
QUESTION: You expect students to choose three clubs (Chess, Drama, Science) equally. Out of 90 students, 25 choose Chess, 35 choose Drama, and 30 choose Science. What is the expected number of students for each club? | ANSWER: Expected Chess = 30, Expected Drama = 30, Expected Science = 30.
QUESTION: A spinner has 4 colours: Red, Blue, Green, Yellow. You expect it to land on each colour equally. You spin it 80 times and get: Red=15, Blue=25, Green=20, Yellow=20. Calculate the Chi-Square contribution for the 'Red' outcome. | ANSWER: Expected Red = 80/4 = 20. Contribution = (15 - 20)^2 / 20 = (-5)^2 / 20 = 25 / 20 = 1.25.
QUESTION: A die is rolled 60 times. You observe the following counts: 1 (8 times), 2 (12 times), 3 (7 times), 4 (15 times), 5 (10 times), 6 (8 times). If the die is fair, what is the expected count for each number? Calculate the total Chi-Square test statistic. | ANSWER: Expected count for each number = 60/6 = 10. Chi-Square = ((8-10)^2/10) + ((12-10)^2/10) + ((7-10)^2/10) + ((15-10)^2/10) + ((10-10)^2/10) + ((8-10)^2/10) = (4/10) + (4/10) + (9/10) + (25/10) + (0/10) + (4/10) = 0.4 + 0.4 + 0.9 + 2.5 + 0 + 0.4 = 4.6.
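After working the three questions by hand, you could check your answers with a short script. This is a sketch under simple assumptions: the helper names `equal_expected` and `chi_square_statistic` are invented for this example, and every category is assumed to be equally likely, as the questions state.

```python
def equal_expected(total, categories):
    """Expected count per category when every category is equally likely."""
    return [total / categories] * categories


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Question 1: three clubs, 90 students
club_expected = equal_expected(90, 3)

# Question 2: contribution of 'Red' on the spinner (observed 15 out of 80 spins)
red_expected = 80 / 4
red_contribution = (15 - red_expected) ** 2 / red_expected

# Question 3: rolled 60 times, six equally likely faces
die_observed = [8, 12, 7, 15, 10, 8]
die_expected = equal_expected(sum(die_observed), len(die_observed))
die_chi_sq = chi_square_statistic(die_observed, die_expected)

print("Expected per club:", club_expected)
print("Red contribution:", red_contribution)
print("Chi-square for question 3:", round(die_chi_sq, 2))
```

The result for question 3 is rounded before printing because adding decimal fractions on a computer can leave a tiny rounding error.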
MCQ
Quick Quiz
What does the Goodness of Fit Test primarily help us determine?
A. If two different groups of data have the same average.
B. How well observed data matches an expected pattern or distribution.
C. If there is a relationship between two different variables.
D. The exact future value of a stock market share.
The Correct Answer Is:
B
The Goodness of Fit Test specifically checks how closely our actual observations fit a theoretical or expected distribution. Options A and C relate to other statistical tests, and D is about prediction, not fitting data patterns.
Real World Connection
In the Real World
Imagine a meteorologist at the India Meteorological Department (IMD) predicting monsoon rainfall patterns. They might use a Goodness of Fit Test to see if the actual rainfall data collected over a season matches the distribution predicted from historical data or climate models. This helps them refine their models for more accurate future forecasts, which matters for agriculture and disaster preparedness across India.
Key Vocabulary
Key Terms
OBSERVED DATA: The actual information or counts you collect from an experiment or survey.
EXPECTED DATA: The theoretical counts or frequencies you would anticipate based on a hypothesis or known distribution.
CHI-SQUARE (χ^2) TEST STATISTIC: A single number calculated in the test that summarizes the difference between observed and expected data.
DISTRIBUTION: The way data is spread out or arranged, showing how often different values occur.
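The Chi-Square test statistic defined above can be written as one compact formula. The symbols are notation introduced here for convenience: O_i is the observed count for outcome i, E_i is the expected count for outcome i, and k is the number of possible outcomes.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

For the coin in the worked example, k = 2 and the sum is (60 - 50)^2 / 50 + (40 - 50)^2 / 50 = 2 + 2 = 4.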
What's Next
What to Learn Next
Once you understand Goodness of Fit, you can explore other statistical tests like the Chi-Square Test for Independence. This next step will teach you how to check if two different factors or variables are related to each other, building on your ability to analyze data patterns.


