# Hypothesis Testing

## What is Hypothesis Testing?

A hypothesis is a tentative statement about the relationship between the variables under investigation. Hypothesis testing is the procedure used to assess whether the evidence supports a formulated hypothesis, and it follows established scientific and statistical methods. Because it is often impractical to study an entire population, researchers select a sample and apply statistical methods to draw conclusions about the larger population. Hypothesis testing is therefore a form of inferential statistics: it lets us draw inferences about a large population from experiments or research conducted on a sample. In this article, we’ll discuss the types of hypothesis tests, the steps involved in hypothesis testing, and the types of errors that may occur while performing it.

## Types of Hypothesis Tests

Hypothesis tests are broadly divided into two categories: parametric tests and non-parametric tests.

### 1. Parametric Tests

Parametric tests are used when the data can be assumed to follow a known distribution, usually the normal distribution, which is described by parameters such as its mean and variance. Student’s t-test and the analysis of variance (ANOVA) are examples of parametric tests.

### 2. Non-Parametric Tests

Non-parametric tests are used when the data do not follow a normal distribution, or when no particular distribution can be assumed. The Wilcoxon signed-rank test, the Kruskal-Wallis test, and the Mann-Whitney U test are examples of non-parametric tests.

Choosing the appropriate statistical test is a crucial step in hypothesis testing. Depending on the number of samples to be compared, there are two cases: ‘one sample’ and ‘two samples’. In the one-sample case, a single sample is compared with a given value; in the two-sample case, two or more samples are compared with each other. Tests in the two-sample case can examine either differences or correlations between the samples. In both cases, the samples can be paired (dependent samples) or unpaired (independent samples). The following images represent the statistical test that one should conduct based on these criteria.
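The selection criteria above can be summarised as a small lookup. The sketch below is a hypothetical helper (`choose_test` is not from any library) that encodes common textbook rules of thumb for tests of differences only, not correlations. It assumes the researcher has already judged whether the data are approximately normal, and it names two standard tests not discussed in this article (repeated-measures ANOVA and the Friedman test) for paired designs with more than two groups.

```python
def choose_test(n_groups, paired, normal):
    """Suggest a common test of differences; a rule of thumb, not a substitute for judgement.

    n_groups: 1 compares one sample with a fixed value; 2 or more compares samples.
    paired:   True for dependent samples (ignored when n_groups == 1).
    normal:   True if the data can be treated as approximately normal.
    """
    if n_groups == 1:
        return "one-sample t-test" if normal else "Wilcoxon signed-rank test"
    if n_groups == 2:
        if paired:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        return "independent two-sample t-test" if normal else "Mann-Whitney U test"
    if n_groups > 2:
        if paired:
            return "repeated-measures ANOVA" if normal else "Friedman test"
        return "one-way ANOVA" if normal else "Kruskal-Wallis test"
    raise ValueError("n_groups must be at least 1")


# Example: two independent groups whose data are clearly not normal.
print(choose_test(2, paired=False, normal=False))
```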

## Steps in Hypothesis Testing

Hypothesis testing generally consists of the following steps:

### Step 1: State the Null Hypothesis (H0) and the Alternative Hypothesis (Ha)

The researcher first formulates the research hypothesis, which is then restated as a null hypothesis and an alternative hypothesis so that it can be tested mathematically. The alternative hypothesis predicts a relationship between the variables, while the null hypothesis states that no such relationship exists. The main differences between the null hypothesis and the alternative hypothesis are as follows.

Differences between the Null Hypothesis and the Alternative Hypothesis:

• The alternative hypothesis states that some statistical relationship exists among the variables under investigation, while the null hypothesis is its opposite: it states that there is no association or relationship between the variables.
• The researcher seeks evidence for the relationship stated in the alternative hypothesis, while the null hypothesis is the statement the researcher tries to reject.
• If the null hypothesis is true, any discrepancy between the hypothesis and the observed data is due to chance alone; if the alternative hypothesis is true, the discrepancy reflects a real effect and is not due to chance alone.
• The null hypothesis denotes zero effect of the other variables on the variable of interest, while the alternative hypothesis denotes some nonzero effect.
• The null hypothesis is represented by the symbol H0, and the alternative hypothesis by H1 or Ha.
• Null hypothesis example: There is no relationship between watching television and violence among adults, i.e., H0: ρ = 0, where ρ is the population correlation between the two variables.
• Alternative hypothesis example: There is a relationship between watching television and violence among adults, i.e., Ha: ρ ≠ 0.

The main purpose of hypothesis testing is to check whether there is enough evidence to reject the null hypothesis. Rejecting the null hypothesis is usually taken as support for the alternative hypothesis, but one should keep in mind that this does not hold in every case. The researcher begins the test by assuming that the null hypothesis is true. For example, in a court of law, the accused is treated as innocent until proven guilty; the trial then tests this presumption of innocence against the evidence to decide whether the person is innocent or guilty. Similarly, in hypothesis testing, the null hypothesis is assumed to be true, and the data are used to judge whether it should be rejected.

Let us understand this through an example. Suppose the null hypothesis is ‘the average time children of a particular nation spend watching television is five hours per week.’ To test it, we take a sample of, say, 50 children from that nation and record the time (in hours) each child spends watching television. We then calculate the sample mean and compare it with the hypothesized population mean of five hours. We test the null hypothesis because we suspect it may be wrong, so we also state the competing possibility, i.e., the alternative hypothesis, which in this case would be ‘the average time children of this nation spend watching television is not five hours per week (it is either less or more than five hours).’
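A minimal sketch of this one-sample comparison in Python, using only the standard library. It assumes the sample is large enough (n = 50 here) for the standard normal distribution to approximate the t distribution; with small samples a proper one-sample t-test is needed. The television hours are simulated, hypothetical values, not real data, and `one_sample_z_test` is an illustrative name.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def one_sample_z_test(sample, mu0):
    """Two-sided large-sample test of H0: population mean == mu0.

    Returns (z, p). Uses the normal approximation, which is reasonable
    only for fairly large samples.
    """
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)   # estimated SD of the sample mean
    z = (mean(sample) - mu0) / standard_error       # test statistic
    p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value
    return z, p


# Hypothetical data: weekly TV hours for 50 simulated children.
rng = random.Random(0)
hours = [rng.gauss(5.6, 1.5) for _ in range(50)]

z, p = one_sample_z_test(hours, mu0=5.0)
print(f"sample mean = {mean(hours):.2f} h, z = {z:.2f}, p = {p:.4f}")
```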

### Step 2: Collect Data Based on the Research Hypothesis

Collecting appropriate research data and sampling effectively are crucial for the validity of a hypothesis test. If the data collected are not representative of the target population (the population under investigation), it is difficult to draw valid statistical inferences. For example, suppose the researcher wants to analyse the association between IQ level and gender. For an accurate analysis, the sample should ideally contain an equal number of males and females, and the researcher should account for other factors, such as economic class and access to opportunities, that may influence the dependent variable (IQ level). The researcher may also use census data from the countries and regions under investigation to strengthen the research.
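One simple way to obtain equal numbers of males and females is to sample each group separately (stratified sampling). The sketch below assumes the sampling frame is a list of dictionaries with a gender field; `balanced_sample` and the field names are hypothetical, and balancing on gender alone does not control for the other factors mentioned above, such as economic class.

```python
import random
from collections import defaultdict


def balanced_sample(records, group_key, per_group, seed=None):
    """Draw the same number of records at random from every group (stratified sampling)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    sample = []
    for label, members in groups.items():
        if len(members) < per_group:
            raise ValueError(f"group {label!r} has only {len(members)} records")
        sample.extend(rng.sample(members, per_group))   # sampling without replacement
    return sample


# Hypothetical sampling frame: 300 people, unevenly split by gender.
frame = [{"id": i, "gender": "F" if i % 3 else "M"} for i in range(300)]
chosen = balanced_sample(frame, "gender", per_group=50, seed=7)
print(sum(r["gender"] == "M" for r in chosen), sum(r["gender"] == "F" for r in chosen))
```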

### Step 3: Set the Decision Criteria

The criterion for rejecting or retaining the null hypothesis is set by stating the significance level of the test before the data are analysed. The significance level (α) is the judgement criterion the researcher uses to make decisions about the null hypothesis. Consider the courtroom analogy again. The jury does not decide directly whether the person is innocent or guilty; instead, it examines all the evidence. Each piece of evidence may favour the person’s innocence or point towards guilt, and the jury decides whether the evidence is strong enough to prove guilt. Similarly, researchers conduct a study, and the data obtained are either consistent with the null hypothesis (the null hypothesis is retained) or inconsistent with it (the null hypothesis is rejected). The significance level is the threshold for how unlikely the observed data must be, assuming the null hypothesis is true, before the null hypothesis is rejected. In behavioural research, the significance level is usually set at 5 per cent.
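When the test statistic follows a standard normal distribution, the significance level translates directly into a critical value: H0 is rejected when |z| exceeds it. A small sketch using Python's standard `statistics.NormalDist`; the helper name `critical_z` is illustrative, and a t-test would use the t distribution instead.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, two_sided=True):
    """Smallest |z| that leads to rejecting H0 at level alpha (normal statistic assumed)."""
    tail = alpha / 2 if two_sided else alpha   # two-sided tests split alpha between both tails
    return NormalDist().inv_cdf(1 - tail)


for alpha in (0.05, 0.01):
    print(alpha, round(critical_z(alpha), 3), round(critical_z(alpha, two_sided=False), 3))
```

For α = 0.05 this gives about 1.96 for a two-tailed test and about 1.645 for a one-tailed test; tightening α to 0.01 raises both thresholds.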

### Step 4: Performing the Statistical Tests

Most statistical tests used in hypothesis testing compare the ‘between-group variance’ (how much the group means differ from one another) with the ‘within-group variance’ (how spread out the data are within each group). If the between-group variance is large relative to the within-group variance, so that the groups overlap little or not at all, the p-value will be small. This means that a difference this large would be unlikely to arise by chance alone if the null hypothesis were true. On the other hand, if the within-group variance is high and the between-group variance is low, the test will produce a high p-value, meaning that the observed differences among the groups could easily have arisen by chance. The statistical test the researcher chooses depends on the type of data collected in the research.
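To make the between-group versus within-group comparison concrete, here is a sketch of the one-way ANOVA F statistic (the ratio of the two variance estimates). To stay within the standard library, the p-value is obtained by a permutation test (randomly relabelling the observations) rather than from the F distribution; that substitution, the helper names, and the toy groups are assumptions for illustration.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of random relabellings whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:                      # cut the shuffled pool into groups
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)         # +1 counts the observed labelling itself


separated = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]    # little overlap between groups
overlapping = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]      # heavy overlap between groups
for groups in (separated, overlapping):
    print(f"F = {f_statistic(groups):.3f}, p = {permutation_p_value(groups):.4f}")
```

The well-separated groups give a large F and a small p-value, while the overlapping groups give a small F and a large p-value, matching the description above.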

For example, suppose the researcher is studying the difference in IQ level between males and females. If the hypothesis is directional (for example, that males score higher than females), the researcher conducts a one-tailed t-test; if the question is only whether the two groups differ, a two-tailed t-test is appropriate. The test provides an estimate of the difference in average IQ between the two groups. The p-value is the probability of observing a difference at least this large if the null hypothesis of no difference were true.
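The one-tailed versus two-tailed choice only changes how the p-value is read from the test statistic. The sketch below computes Welch's two-sample t statistic (a t-test variant that does not assume equal variances, chosen here as an assumption) and, to stay within the standard library, approximates the p-value with the standard normal distribution, which is only reasonable for large samples. A library routine such as SciPy's `scipy.stats.ttest_ind` with `equal_var=False` and its `alternative` argument would use the exact t distribution instead. The IQ scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for mean(a) - mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def p_value_normal_approx(t, alternative="two-sided"):
    """p-value treating t as standard normal (assumption: large samples)."""
    z = NormalDist()
    if alternative == "two-sided":     # Ha: mean(a) != mean(b)
        return 2 * (1 - z.cdf(abs(t)))
    if alternative == "greater":       # Ha: mean(a) > mean(b), one-tailed
        return 1 - z.cdf(t)
    if alternative == "less":          # Ha: mean(a) < mean(b), one-tailed
        return z.cdf(t)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Hypothetical IQ scores for two groups.
group_a = [102, 98, 110, 105, 99, 101, 107, 96]
group_b = [100, 95, 97, 103, 94, 99, 98, 96]
t, df = welch_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df:.1f}")
print("two-tailed p ~", round(p_value_normal_approx(t), 4))
print("one-tailed p ~", round(p_value_normal_approx(t, "greater"), 4))
```

For a positive t, the one-tailed p-value in the predicted direction is half the two-tailed one, which is why the direction must be fixed before looking at the data.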

### Step 5: Decision Making

Now the researcher has to decide, on the basis of the results of the statistical test, whether to reject the null hypothesis. The decision is made by comparing the p-value from the test with the significance level chosen in advance (as mentioned in step 3), which is usually 0.05: the null hypothesis is rejected when results at least as extreme as those observed would occur less than 5 per cent of the time if the null hypothesis were true. Sometimes researchers choose a significance level of 0.01, i.e., 1 per cent, to reduce the risk of a Type I error (explained later in this article). The p-value is the probability of obtaining a sample outcome at least as extreme as the one observed, assuming the null hypothesis is true; being a probability, it lies between 0 and 1 and cannot be negative.

To sum up, the researcher calculates the p-value; the smaller it is, the stronger the evidence against the null hypothesis. The p-value is compared with the significance level α: if the p-value is less than or equal to α, the null hypothesis is rejected in favour of the alternative hypothesis, and the result is called statistically significant. The researcher reaches significance (the null hypothesis is rejected) when the p-value is less than or equal to 0.05, and fails to reach significance (the null hypothesis is retained) when the p-value is greater than 0.05.
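The decision rule fits in a few lines. In this sketch, `decide` and `two_sided_p_from_z` are illustrative names, and the p-value is assumed to come from a test whose statistic is approximately standard normal.

```python
from statistics import NormalDist


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Reject H0 when p <= alpha; otherwise retain (fail to reject) H0."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(two_sided_p_from_z(2.5)))               # z = 2.5 gives p of about 0.012
print(decide(two_sided_p_from_z(2.5), alpha=0.01))   # the stricter level retains H0
```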

### Step 6: Presenting the Findings of the Study

Presenting the results of the whole study is the final step of hypothesis testing. The results of the hypothesis test are reported in the results section of the research paper and interpreted in the discussion section. The researcher needs to summarize the hypothesis test conducted and the results obtained, for example, the estimated difference between the group means and the associated p-value. The researcher also needs to state whether the results support the initial hypothesis, i.e., whether the null hypothesis is retained or rejected.

The following is an example of formally presenting hypothesis testing results:

In a comparison of the mean IQ levels of males and females, the average difference was found to be 90, with an associated p-value of 0.0002. Hence, the null hypothesis is rejected, i.e., the mean IQ levels of males and females differ significantly.

## Types of Errors in Decision Making

In hypothesis testing, two types of errors may occur: the Type I error and the Type II error.

### Type I Error

In a Type I error, the null hypothesis is rejected even though it is true. The probability of committing a Type I error is the significance level, represented by the symbol ‘α’ (alpha).

### Type II Error

In a Type II error, the null hypothesis is retained even though it is false in reality. The probability of committing a Type II error is represented by the symbol β (beta). The probability of not committing a Type II error, 1 − β, is known as the power of the test. Let us look at these two errors more closely.
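The meaning of α, β and power can be checked by simulation. This sketch is an illustrative assumption, not a procedure from the text: it repeatedly draws samples of n = 50 from a normal population, applies a two-sided large-sample z-test of H0: µ = 0 at α = 0.05, and counts how often H0 is rejected. When the true mean is 0 (H0 true), the rejection rate estimates the Type I error rate and should come out close to α; when the true mean is 0.5 (H0 false), it estimates the power 1 − β.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def rejects_h0(sample, mu0, alpha):
    """Two-sided large-sample z-test of H0: mean == mu0; True if H0 is rejected."""
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p <= alpha


def rejection_rate(true_mean, mu0=0.0, sd=1.0, n=50, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated studies in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        rejections += rejects_h0(sample, mu0, alpha)
    return rejections / trials


type_i_rate = rejection_rate(true_mean=0.0)   # H0 true: estimates alpha
power = rejection_rate(true_mean=0.5)         # H0 false: estimates 1 - beta
beta = 1 - power                              # estimated Type II error rate
print(f"Type I error rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, beta ~ {beta:.3f}")
```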

As mentioned in step 5 of hypothesis testing, the researcher has to decide whether the null hypothesis holds. Because the study is conducted on a sample rather than on the entire population, there is a possibility that the researcher reaches the wrong conclusion. There are four possible decision outcomes:

• The decision that the null hypothesis is retained is correct.
• The decision that the null hypothesis is retained is incorrect.
• The decision that the null hypothesis is rejected is correct.
• The decision that the null hypothesis is rejected is incorrect.

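The following table represents the four decision-making outcomes of hypothesis testing.

| Decision | H0 is true | H0 is false |
|---|---|---|
| Retain H0 | Correct decision (probability 1 − α) | Type II error (probability β) |
| Reject H0 | Type I error (probability α) | Correct decision (power, probability 1 − β) |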

Retaining the Null Hypothesis:

If the researcher decides to retain the null hypothesis, the decision can be either correct or incorrect. Retaining a true null hypothesis is the correct decision, known as a null finding or null result. It means that the data are consistent with what the researcher assumed at the beginning of the test, i.e., with the value stated in the null hypothesis. However, retaining a false null hypothesis is a wrong decision, known as a Type II error. Whenever the null hypothesis is retained, there is some possibility that a Type II error has occurred. In committing this error, the researcher keeps holding previous beliefs that are false in reality; a common practical cause is low statistical power, for example a sample too small to detect a real effect.

Rejecting the Null Hypothesis:

The researcher can also be either correct or incorrect when deciding to reject the null hypothesis. Rejecting a null hypothesis that is actually true is an incorrect decision, known as a Type I error. As with the Type II error, there is always some probability of committing a Type I error; in doing so, the researcher rejects a previous belief that is actually true. Because the null hypothesis is assumed to be true at the start of the test, the probability of a Type I error can be controlled by choosing the significance level: the significance level the researcher sets is the maximum probability of committing a Type I error. It is generally set at .05 and compared with the p-value. As discussed earlier in this article, the null hypothesis is rejected if the p-value is at or below .05; otherwise, it is retained.

Rejecting a false null hypothesis is the correct decision. The probability of making this decision is the power of the test (1 − β), and it is the outcome the researcher aims for, since the researcher tests the null hypothesis only because he/she suspects that it is false.

## Methods of Hypothesis Testing

### 1. Frequentist Hypothesis Testing

• Frequentist hypothesis testing, the traditional method, draws conclusions from the currently available data alone and interprets probability as the long-run frequency of outcomes over repeated sampling. Null hypothesis significance testing is the most popular form of the frequentist approach. In this method, the researcher formulates two hypotheses, the null hypothesis and the alternative hypothesis, and tests them using the current data. Frequentist hypothesis testing has been the most widely used method since it was developed in the early twentieth century by Ronald Fisher, Jerzy Neyman, and Egon Pearson.

### 2. Bayesian Hypothesis Testing

• Bayesian hypothesis testing is an alternative approach that has become increasingly popular. It evaluates a hypothesis using both previously available information, expressed as a prior probability, and the current data. In other words, in the Bayesian method, the researcher updates the prior probability with the current data to obtain the posterior probability of the hypothesis. An important component of Bayesian hypothesis testing is the ‘Bayes factor’: the ratio of how likely the observed data are under the alternative hypothesis to how likely they are under the null hypothesis, both of which are formulated at the beginning of the test.
• An important feature of Bayesian testing is that it helps in judging whether the findings of a study are reliable. For example, a study conducted by a group of researchers claimed that cell phone towers were the cause of leukaemia in twenty people living in the same locality. Another study contradicted this, finding no association between leukaemia and cell phone towers and stating that such a cluster of cases in one locality could also occur by chance. Ignoring the possibility that the cluster arose by chance may therefore have been one reason why the first study failed. In traditional hypothesis testing, the strength of the evidence is usually judged by the p-value (as discussed earlier in this article): if the p-value is less than or equal to 0.05, the results are considered statistically significant. This traditional approach is the non-Bayesian (frequentist) method. It defines probability objectively, as how often a particular outcome would occur if the experiment were repeated again and again, whereas the Bayesian method views probability subjectively, as a degree of belief, and takes into account the beliefs of the person conducting the study.
• The non-Bayesian method interprets its results in terms of hypothetical repeated sampling, whereas the Bayesian method does not rely on repeated sampling. The first step of hypothesis testing, i.e., ‘stating the hypothesis,’ marks the main difference between the two: the non-Bayesian approach states and evaluates hypotheses using only the current data, while the Bayesian approach also incorporates prior probabilities and reports posterior probabilities.
• The advantages of the Bayesian method are that it incorporates previous knowledge about the data and takes the researcher’s beliefs into account explicitly when analysing the results. Its disadvantages are that using prior knowledge in the current study can be hard to justify, because the choice of prior is partly subjective, and that the calculations are often more difficult than in the non-Bayesian method.
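As a minimal illustration of a Bayes factor, consider a coin-tossing example, which is an assumption for illustration rather than a study from this article: H0 says the probability of heads is exactly 0.5, and Ha says it could be any value between 0 and 1, with a uniform prior. For k heads in n tosses, the probability of the data under this Ha (averaged over the prior) is 1/(n + 1), so BF10 has a closed form; multiplying BF10 by the prior odds gives the posterior odds. The function names are hypothetical.

```python
from math import comb


def bayes_factor_10(k, n):
    """BF10 for k successes in n trials: Ha (p ~ Uniform(0, 1)) versus H0 (p = 0.5)."""
    likelihood_h0 = comb(n, k) * 0.5 ** n   # P(data | H0)
    likelihood_ha = 1 / (n + 1)             # P(data | Ha), averaged over the uniform prior
    return likelihood_ha / likelihood_h0


def posterior_prob_ha(bf10, prior_ha=0.5):
    """Posterior probability of Ha: posterior odds = BF10 * prior odds."""
    prior_odds = prior_ha / (1 - prior_ha)
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1 + posterior_odds)


for k in (5, 9, 10):
    bf = bayes_factor_10(k, 10)
    print(f"{k}/10 heads: BF10 = {bf:.2f}, P(Ha | data) = {posterior_prob_ha(bf):.3f}")
```

Five heads in ten tosses gives BF10 below 1, i.e., the data favour H0, while nine or ten heads give BF10 well above 1. Unlike a p-value, the Bayes factor can express evidence in favour of the null hypothesis as well as against it.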