S7-SA3-0146
What is the Chi-Square Test? An Introduction
Grade Level:
Class 12
AI/ML, Physics, Biotechnology, FinTech, EVs, Space Technology, Climate Science, Blockchain, Medicine, Engineering, Law, Economics
Definition
What is it?
The Chi-Square (pronounced 'kai-square') Test is a statistical tool used to check if there's a significant difference between what we observe (actual results) and what we expect to see (predicted results). It helps us decide if differences in data are just by chance or if there's a real pattern or relationship.
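In symbols, the comparison between observed and expected counts is usually written as the sum below. The index i and the number of categories k are notation introduced here for illustration; O and E are the observed and expected counts for each category.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```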
Simple Example
Quick Example
Imagine you expect 50 boys and 50 girls to join a school sports club, but you observe 60 boys and 40 girls. The Chi-Square Test helps you figure out if this difference (60 vs 50 for boys, 40 vs 50 for girls) is just a random fluctuation or if there's a real reason why more boys joined.
Worked Example
Step-by-Step
Let's say a chai shop expects to sell 100 cups of masala chai and 50 cups of ginger chai in an hour. They actually sell 120 masala chai and 30 ginger chai. Is this difference significant?
Step 1: Write down Observed (O) and Expected (E) values.
Masala Chai: O = 120, E = 100
Ginger Chai: O = 30, E = 50
---
Step 2: Calculate (O - E) for each category.
Masala Chai: (120 - 100) = 20
Ginger Chai: (30 - 50) = -20
---
Step 3: Calculate (O - E)^2 for each category.
Masala Chai: (20)^2 = 400
Ginger Chai: (-20)^2 = 400
---
Step 4: Calculate (O - E)^2 / E for each category.
Masala Chai: 400 / 100 = 4
Ginger Chai: 400 / 50 = 8
---
Step 5: Sum these values to get the Chi-Square statistic.
Chi-Square = 4 + 8 = 12
---
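Answer: The Chi-Square value is 12. A larger value means a bigger gap between observed and expected counts. With two categories there is 1 degree of freedom, and the standard critical value at the 5% significance level is 3.841. Since 12 is greater than 3.841, the difference is statistically significant.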
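The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name chi_square_statistic is made up for this example, and the numbers are the chai shop counts from the worked example.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Chai shop data from the worked example, in the order: masala chai, ginger chai
observed = [120, 30]
expected = [100, 50]

chi_square = chi_square_statistic(observed, expected)
print(chi_square)  # 12.0
```

Each term in the sum matches Step 4 (400 / 100 and 400 / 50), and the total matches Step 5.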
Why It Matters
Understanding Chi-Square is useful in AI/ML for checking whether a categorical feature is related to the label being predicted, in medicine for comparing outcome counts in clinical trials, and in economics for analysing survey responses in market research. Data scientists use it to make sense of information and help businesses and scientists make better decisions, impacting everything from new medicines to better phone apps.
Common Mistakes
MISTAKE: Using Chi-Square for comparing averages or means. | CORRECTION: Chi-Square is for comparing frequencies or counts (how many times something happens), not for average values.
MISTAKE: Not having enough data points (small sample size). | CORRECTION: The Chi-Square test works best with larger sample sizes. If expected counts are too small (e.g., less than 5), the test might not be accurate.
MISTAKE: Confusing the Chi-Square value itself with a 'yes' or 'no' answer. | CORRECTION: The Chi-Square value needs to be compared to a critical value from a Chi-Square distribution table (based on degrees of freedom and significance level) to determine if the difference is statistically significant.
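The comparison with a critical value can be sketched as follows for the simplest case of two categories (1 degree of freedom). The sketch assumes a 5% significance level, for which the standard table value is 3.841. The function name p_value_one_df is hypothetical, and the erfc shortcut is valid only for 1 degree of freedom; other cases need a Chi-Square table or a statistics library.

```python
import math


def p_value_one_df(chi_square):
    """Upper-tail probability of a Chi-Square statistic with 1 degree of freedom."""
    # With 1 degree of freedom the statistic is the square of a standard
    # normal variable, so the tail probability equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


# Standard table value for 1 degree of freedom at the 5% significance level
CRITICAL_VALUE_5_PERCENT = 3.841

chi_square = 12  # value from the chai shop worked example
p_value = p_value_one_df(chi_square)
is_significant = chi_square > CRITICAL_VALUE_5_PERCENT

print(is_significant)   # True
print(p_value < 0.05)   # True
```

Both checks agree: a statistic above the critical value is the same as a p-value below the chosen significance level.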
Practice Questions
Try It Yourself
QUESTION: A mobile game company expects 70% of players to choose Team A and 30% to choose Team B. Out of 100 new players, 80 chose Team A and 20 chose Team B. Calculate the (O - E) for Team A. | ANSWER: 10
QUESTION: Using the data from Q1, calculate the Chi-Square value for the game company's data. | ANSWER: Step 1: Expected Team A = 70, Observed Team A = 80. Expected Team B = 30, Observed Team B = 20. Step 2: (O-E) Team A = 10, (O-E) Team B = -10. Step 3: (O-E)^2 Team A = 100, (O-E)^2 Team B = 100. Step 4: (O-E)^2/E Team A = 100/70 = 1.43 (approx), (O-E)^2/E Team B = 100/30 = 3.33 (approx). Step 5: Chi-Square = 1.43 + 3.33 = 4.76 (approx).
QUESTION: A survey asked 200 students their favourite fruit. 100 chose Mango, 60 chose Apple, 40 chose Banana. If we expected an equal distribution (meaning 1/3 for each), calculate the Chi-Square value. | ANSWER: Step 1: Observed: Mango=100, Apple=60, Banana=40. Expected: Mango=200/3=66.67, Apple=66.67, Banana=66.67. Step 2: (O-E) Mango=33.33, Apple=-6.67, Banana=-26.67. Step 3: (O-E)^2 Mango=1110.89, Apple=44.49, Banana=711.29. Step 4: (O-E)^2/E Mango=1110.89/66.67=16.66, Apple=44.49/66.67=0.67, Banana=711.29/66.67=10.67. Step 5: Chi-Square = 16.66 + 0.67 + 10.67 = 28.00 (approx).
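The answers to the last two questions can be checked with a short Python sketch. The helper names expected_counts and chi_square_statistic are made up for this example; the data are the counts given in the questions.

```python
def expected_counts(total, proportions):
    """Turn expected proportions into expected counts for a given sample size."""
    return [total * p for p in proportions]


def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Game company: 100 players, expected 70% Team A and 30% Team B
game_observed = [80, 20]
game_expected = expected_counts(100, [0.7, 0.3])
game_chi_square = chi_square_statistic(game_observed, game_expected)

# Fruit survey: 200 students, expected 1/3 each for Mango, Apple, Banana
fruit_observed = [100, 60, 40]
fruit_expected = expected_counts(200, [1 / 3, 1 / 3, 1 / 3])
fruit_chi_square = chi_square_statistic(fruit_observed, fruit_expected)

print(round(game_chi_square, 2))   # 4.76
print(round(fruit_chi_square, 2))  # 28.0
```

Working from proportions avoids rounding the expected counts (such as 66.67) by hand before squaring.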
MCQ
Quick Quiz
What kind of data is the Chi-Square Test primarily used to analyze?
A. Averages of numerical data
B. Frequencies or counts of categorical data
C. Relationships between two continuous variables
D. Changes in data over time
The Correct Answer Is:
B
The Chi-Square test is designed to compare observed frequencies (counts) with expected frequencies in categories, making option B correct. It is not for averages (A), continuous variable relationships (C), or time-series data (D).
Real World Connection
In the Real World
Imagine a food delivery app like Swiggy or Zomato. They might expect customers in Delhi to order a certain percentage of North Indian, South Indian, and Continental food. If actual orders differ greatly, they can use the Chi-Square test to see if this difference is significant, helping them decide if they need to change their restaurant partnerships or marketing strategies in that city.
Key Vocabulary
Key Terms
OBSERVED FREQUENCY: The actual number of times an event occurred in an experiment or survey.
EXPECTED FREQUENCY: The number of times an event is predicted to occur based on a hypothesis or theory.
STATISTICAL SIGNIFICANCE: A result is statistically significant if it is unlikely to have occurred by chance.
CATEGORICAL DATA: Data that can be divided into groups or categories (e.g., types of fruit, yes/no answers).
What's Next
What to Learn Next
Great job understanding the basics of Chi-Square! Next, you can learn about 'Degrees of Freedom' and 'P-value'. These concepts will help you interpret the Chi-Square value you calculate and truly understand if your observed differences are significant or just random.


