This is an introduction to two of the most widely used statistical tests, the t-test and the Chi-square test, and to when each one should be used. They are often among the first statistical tests we run on our data during exploratory data analysis.

The t-test is used to determine whether there is a statistically significant difference between the means of two groups. It produces a P value when you are conducting hypothesis testing. You generally use a t-test when your null hypothesis is about the mean of a numeric variable. Let's look at an example.
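For readers who want to see what the test actually computes, here is the statistic of the standard two-sample (Student's) t-test. It assumes the two groups have roughly equal variances; Welch's variant drops that assumption and uses a different denominator and degrees of freedom.

```latex
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad
\nu = n_1 + n_2 - 2
```

Here \bar{x}_i, s_i^2 and n_i are the sample mean, sample variance and size of group i. The P value is the probability, assuming the null hypothesis is true, of a t value at least as extreme as the one observed, read from a t distribution with \nu degrees of freedom.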

Take drug testing: you might want to show that a drug raises a certain blood test score by a significant amount.

In that case, your hypotheses are about the blood test score and those values are numeric, so a t-test is appropriate.

Your null hypothesis would be that the drug has no effect or, in more formal terms, that the mean blood test result of people treated with the drug is the same as the mean blood test result of people who were not treated with the drug.

Your alternative hypothesis would be that the drug does have an effect: the mean blood test result of people treated with the drug is higher than the mean blood test result of those who did not receive treatment.
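To make the drug example concrete, here is a minimal sketch of a one-sided, pooled-variance two-sample t-test written with only the Python standard library. The blood test scores are hypothetical numbers invented for illustration, and the helper names (`t_pdf`, `t_upper_tail`, `one_sided_t_test`) are our own. Since the standard library has no t distribution, the P value is obtained by numerically integrating the t density with Simpson's rule; this is an illustration of the idea, not a replacement for a statistics library.

```python
import math
import statistics


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T > t): 0.5 minus the area under the pdf on [0, t] (Simpson's rule)."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)  # the t distribution is symmetric
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def one_sided_t_test(treated: list[float], control: list[float]) -> tuple[float, int, float]:
    """Pooled-variance two-sample t-test of H1: mean(treated) > mean(control)."""
    n1, n2 = len(treated), len(control)
    df = n1 + n2 - 2
    # statistics.variance uses the n - 1 (sample) denominator
    pooled_var = ((n1 - 1) * statistics.variance(treated)
                  + (n2 - 1) * statistics.variance(control)) / df
    diff = statistics.mean(treated) - statistics.mean(control)
    t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, t_upper_tail(t, df)


# Hypothetical blood test scores, invented for illustration only.
treated = [5.8, 6.1, 6.4, 5.9, 6.3, 6.0]
control = [5.5, 5.7, 5.4, 5.9, 5.6, 5.8]

t, df, p = one_sided_t_test(treated, control)
print(f"t = {t:.3f}, df = {df}, one-sided P = {p:.4f}")
```

The test is one-sided because the alternative hypothesis specifies a direction (higher). In practice you would usually call `scipy.stats.ttest_ind(treated, control, alternative="greater")`, which runs the same pooled-variance test by default (`equal_var=True`); passing `equal_var=False` gives Welch's t-test instead.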

Sometimes, however, the values we need to test are categorical rather than numeric. In those cases we can't use the mean, because the values aren't numbers. Instead, we are trying to see whether two categorical variables are associated with each other.

The Chi-square test produces a P value in these situations. Chi is a Greek letter, written χ, which is why the test is also written as the χ² test.


You generally use a Chi-square test when your null hypothesis is about categorical variables.

Let's look at an example. Suppose we have a data set about students at a university. It records whether each student attended a public or private school, and whether they earned no honors, honors, or high honors in college.


We might want to know whether there’s an association between the type of school that each student attended and their honors performance in college. These are both categorical variables, so a Chi-square test is appropriate.

The null hypothesis, in this case, is that there is no association between the two variables, or in other words, that the type of school a student attended and their honors performance in college are independent.
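Independence gives a concrete prediction to compare the data against. Under the null hypothesis, each cell's expected count E_{ij} is its row total R_i times its column total C_j divided by the grand total N, and the Chi-square statistic measures how far the observed counts O_{ij} fall from those expected counts:

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (r - 1)(c - 1)
```

For the school example, r = 2 (public, private) and c = 3 (no honors, honors, high honors), so df = 2. The P value is the upper-tail probability of the observed \chi^2 under a Chi-square distribution with those degrees of freedom.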

The alternative hypothesis is that the two variables are associated, that is, dependent on each other in some way: there is a relationship between the type of school a student attended and their honors performance in college.
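Here is a minimal sketch of a Pearson Chi-square test of independence for the school example, using only the Python standard library. The counts in the table are hypothetical, invented for illustration (the article's original data set is not reproduced here), and the helper names are our own. The P value uses the closed-form Chi-square tail probability that exists for integer degrees of freedom.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X > x) for a Chi-square distribution with integer df (closed form)."""
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum over i < df/2 of y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(y)) plus a finite half-integer series
    return math.erfc(math.sqrt(y)) + math.exp(-y) * sum(
        y ** (i + 0.5) / math.gamma(i + 1.5) for i in range((df - 1) // 2)
    )


def chi_square_independence(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson Chi-square test of independence on an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if the variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_upper_tail(chi2, df)


# Hypothetical counts, invented for illustration only.
# Rows: public, private; columns: no honors, honors, high honors.
table = [
    [60, 30, 10],
    [40, 35, 25],
]

chi2, df, p = chi_square_independence(table)
print(f"chi2 = {chi2:.3f}, df = {df}, P = {p:.4f}")
```

In practice, `scipy.stats.chi2_contingency(table)` computes the statistic, P value, degrees of freedom and expected counts in one call; note that for 2×2 tables it applies Yates' continuity correction by default. The Chi-square approximation is also less reliable when expected counts are small (a common rule of thumb is at least 5 per cell).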

In this article we won't go into the mathematical details, but as users of data we should know what these basic tests are and when to use them.

In my Data Science course, I go into detail with real-life examples of how these tests are done and interpreted. I simplify important statistical concepts for Data Science enthusiasts.

If you have any questions or would like a consultation, get in touch with me at connect@decodingdatascience.com.
