Inferential Statistics & Hypothesis Testing


 What is Inferential Statistics?

Statistical inference is the process of using data analysis to infer properties of an underlying probability distribution. This is done by inferring properties of a population, conducting a hypothesis test, and deriving estimates based on the conditions of that test. On this page I will go over specific hypothesis tests and their test statistics. An example chart below shows what I will cover.



Before we get into these complex tests, we must go over some basic properties and terminology that make up these tests so we can better understand them.

Estimation & Confidence Interval (C.I)

Estimation is the process of obtaining information about a parameter by using a statistic. An estimator is a statistical method used to calculate an estimate based on observable data. A good estimator gives estimates that are both accurate and precise. Accuracy is measured in terms of bias. Numerically, bias is the distance between the mean of the sampling distribution and the population mean. Precision is measured in terms of standard error.
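To make bias and standard error concrete, here is a minimal Python sketch (not part of the original example; the population mean, standard deviation, and sample size are made up) that simulates the sampling distribution of the sample mean:

```python
# Minimal sketch: bias and standard error of an estimator (the sample mean), by simulation.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 50, 10, 25          # hypothetical population mean, std. dev., and sample size

# Draw many samples and compute the sample mean of each one.
sample_means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

bias = sample_means.mean() - mu       # distance between the estimator's mean and the true mean
std_error = sample_means.std(ddof=1)  # spread of the sampling distribution (precision)

print(f"bias ≈ {bias:.3f}")                 # close to 0: the sample mean is unbiased
print(f"standard error ≈ {std_error:.3f}")  # close to sigma / sqrt(n) = 2.0
```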

Two types of estimates exist: point estimates and interval estimates. A point estimate is a single value estimate for a parameter. An interval estimate is a range of values that is likely to contain the parameter being estimated. Combined with a probability statement, an interval estimate is called a confidence interval. The percentage of the time that the confidence interval contains the parameter is called the confidence level, which is denoted by 𝑐.

A confidence interval is accurate if the confidence interval contains the true population parameter. A confidence interval's precision refers to the width of the confidence interval.

A confidence interval is constructed by looking at the sample statistic and margin of error. A margin of error, denoted by 𝑚, is the range of values above and below the point estimate. Numerically,

𝑚 = (critical value)(standard error)
where the critical value, which depends on 𝑐 and the underlying distribution of the statistic, is the number of standard errors to be added to the point estimate. Thus,

estimate ± 𝑚 = estimate ± (critical value)(standard error)
The resulting interval is referred to as the 𝑐(100)% confidence interval. That is, when the same estimator is used across repeated samples, the true value of the population parameter will be inside the 𝑐(100)% confidence interval 𝑐(100)% of the time.
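As a quick illustration of where the usual critical values come from, here is a minimal Python sketch (an illustration using scipy, not part of the original page): the critical value for a 𝑐(100)% interval is the (1 + 𝑐)/2 quantile of the standard normal distribution.

```python
# Minimal sketch: z critical values for the most common confidence levels.
from scipy.stats import norm

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence -> critical value z* = {norm.ppf((1 + c) / 2):.3f}")
# 90% -> 1.645, 95% -> 1.960, 99% -> 2.576
```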


Confidence Interval for a Population Mean Example:

Question: Suppose the mean final grade for a High School Geometry course is estimated. The population standard deviation is 4. The grades for a randomly selected Geometry class with 9 people are: 76, 80, 82, 83, 83, 85, 85, 87 and 88. Find the 95% confidence interval for the mean final grade.

Our sample mean is calculated by:

x̄ = (76 + 80 + 82 + 83 + 83 + 85 + 85 + 87 + 88) / 9 = 749 / 9 ≈ 83.222

Needed critical values to conduct most confidence intervals:

  • 90% confidence level: critical value 𝑧* = 1.645
  • 95% confidence level: critical value 𝑧* = 1.960
  • 99% confidence level: critical value 𝑧* = 2.576

The margin of error and the 95% C.I. are:

𝑚 = (critical value)(standard error) = 1.960 (4 / √9) ≈ 2.613

x̄ ± 𝑚 = 83.222 ± 2.613 = [80.609, 85.835]
To conclude:

The population mean final grade is generally unknown. The 95% C.I. means that, if intervals were constructed this way from repeated samples, 95% of them would contain the true population mean final grade; here the calculated interval is [80.609, 85.835].
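For readers who want to check the arithmetic, here is a minimal Python sketch (using scipy; an illustration, not the original calculation) that recomputes the interval:

```python
# Minimal sketch: recompute the 95% confidence interval for the geometry grades.
import numpy as np
from scipy.stats import norm

grades = np.array([76, 80, 82, 83, 83, 85, 85, 87, 88])
sigma = 4                                   # known population standard deviation
c = 0.95

x_bar = grades.mean()                       # ≈ 83.222
z_star = norm.ppf((1 + c) / 2)              # ≈ 1.960
m = z_star * sigma / np.sqrt(len(grades))   # margin of error ≈ 2.613

print(f"95% CI: [{x_bar - m:.3f}, {x_bar + m:.3f}]")  # ≈ [80.609, 85.835], up to rounding
```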


Hypothesis Tests

In statistics, a hypothesis is a statement that makes a claim about the parameters of one or more populations. Hypothesis testing is the formal process by which a hypothesis is retained or rejected. Hypothesis testing compares two competing hypotheses about a population, the null hypothesis and the alternative hypothesis.

A null hypothesis, denoted 𝐻0, is a statement assumed to be true unless sufficient data indicates otherwise. Typically, a null hypothesis is a statement of equality between the true value of the population parameter and the hypothesized value or a statement of no difference between the parameters of two populations. Ex: The statement "The average salary of the residents of San Francisco is not different than the average salary of the residents of Austin" or "The average salary of the residents of San Francisco is the same as the average salary of the residents of Austin" is a null hypothesis.

In contrast, an alternative hypothesis, denoted 𝐻𝑎, is a statement that contradicts 𝐻0. Typically, an alternative hypothesis asserts that the true value of the population parameter is not the same as the hypothesized value or that the parameters for two populations are different. Ex: The alternative hypothesis corresponding to the null hypothesis above is "The average salary of the residents of San Francisco is different from the average salary of the residents of Austin, Texas."

An alternative hypothesis may be left-tailed, right-tailed, or two-tailed depending on the nature of the difference from the null hypothesis.

  • A left-tailed alternative hypothesis asserts that the value of a parameter is less than the value asserted in the null hypothesis.
  • A right-tailed alternative hypothesis asserts that the value of a parameter is greater than the value asserted in the null hypothesis. 
  • A two-tailed alternative hypothesis asserts that the value of a parameter is not equal to, that is, either less than or greater than the value asserted in the null hypothesis.


How to understand p-value:

In hypothesis testing, the probability of obtaining a result that is as extreme or more extreme than the data if the null hypothesis were true is known as the 𝑝-value. The 𝑝-value of a result is determined from the test statistic. If the 𝑝-value is less than a specified significance level, denoted by 𝛼, then two possibilities exist.

  • The null hypothesis is true and the observed data is relatively unusual, with a sample statistic that is extreme simply due to chance.
  • The null hypothesis is false and the alternative hypothesis provides a more reasonable explanation for the population parameter.

In most fields, 𝛼 = 0.05 is the most commonly used significance level for hypothesis testing. Thus, the probability that a result with an extreme deviation from the null hypothesis is due to chance must be 5% or less for the result to be considered statistically significant.
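As a small illustration (with a made-up test statistic), the 𝑝-value for a z-based test is just a tail area of the standard normal distribution:

```python
# Minimal sketch: p-values for left-, right-, and two-tailed tests from a z statistic.
from scipy.stats import norm

z = -1.11                      # hypothetical test statistic
p_left = norm.cdf(z)           # P(Z <= z), left-tailed
p_right = norm.sf(z)           # P(Z >= z), right-tailed
p_two = 2 * norm.sf(abs(z))    # two-tailed: both extremes count

print(p_left, p_right, p_two)  # ≈ 0.133, 0.867, 0.267
```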


Type 1 and 2 Errors:

A type I error is the incorrect rejection of a true null hypothesis, and a type II error is the failure to reject a false null hypothesis. In other words, a type I error is a false positive and a type II error is a false negative.



How to summarize a hypothesis test:

1. Specify the null hypothesis 𝐻0 and the alternative hypothesis 𝐻𝑎.

2. Specify the significance level 𝛼.

3. Collect the data.

4. Calculate the test statistic and the corresponding 𝑝-value.

5. Compare the 𝑝-value with the significance level. 

6. Determine whether to reject or fail to reject 𝐻0.


One-Sample hypothesis test for population mean:

Example:

A popular electronics website wants to determine whether a smartphone has a 7.8-hour battery life as claimed by the manufacturer, in response to user complaints of poor battery life. The website sampled 10 smartphones with a mean battery life of 7.6 hours. The population standard deviation of the battery life is 𝜎 = 0.57 hours. Does sufficient evidence exist that the battery life of the smartphone is actually lower than the manufacturer's claim at a significance level of 𝛼 = 0.05?

Creation of our test:

𝐻0: 𝜇 = 7.8 (the mean battery life is as claimed)

𝐻𝑎: 𝜇 < 7.8 (the mean battery life is lower than claimed, a left-tailed test)

𝑧 = (7.6 − 7.8) / (0.57 / √10) ≈ −1.11

𝑝-value = 𝑃(𝑍 ≤ −1.11) ≈ 0.133
Since the 𝑝-value is greater than the significance level 𝛼 = 0.05, insufficient evidence exists to support the hypothesis that the mean battery life of the smartphone is less than the manufacturer's claim.

Although the mean battery life of the 10 sampled smartphones is less than the manufacturer's claim, the lower mean could have occurred due to chance. If the manufacturer's claim of a 7.8-hour mean battery life were true, the probability of observing a sample mean battery life of at most 7.6 hours would be 13.3%, which is well above the 5% significance level required to reject the claim. Thus, the lower sample mean can most likely be attributed to chance.
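A minimal Python sketch of the same calculation (scipy is my choice here; the original numbers may have been produced differently):

```python
# Minimal sketch: one-sample z-test for the battery-life example (sigma known).
import numpy as np
from scipy.stats import norm

mu_0, x_bar, sigma, n = 7.8, 7.6, 0.57, 10

z = (x_bar - mu_0) / (sigma / np.sqrt(n))   # ≈ -1.11
p_value = norm.cdf(z)                       # left-tailed: P(Z <= z) ≈ 0.133

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
alpha = 0.05
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```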


One-Sample hypothesis test for population proportions:

Example:

The human sex ratio is the ratio of the number of males to the number of females within a certain age group. According to a 2002 study on sex ratios, the expected ratio of males to females is 106 to 100, or a proportion of 0.515 males. Because of cultural norms and national health policies, some nations may have a much higher or much lower sex ratio. In a random sample of 189 people, 85 people are males. Does sufficient evidence exist that the sex ratio of males to females in the population is different than expected at the 𝛼 = 0.05 significance level?

Creation of our test:

𝐻0: 𝑝 = 0.515

𝐻𝑎: 𝑝 ≠ 0.515 (a two-tailed test)

Sample proportion: 𝑝̂ = 85 / 189 ≈ 0.450

𝑧 = (0.450 − 0.515) / √(0.515(1 − 0.515) / 189) ≈ −1.79

𝑝-value = 2𝑃(𝑍 ≤ −1.79) ≈ 0.073
Since the 𝑝-value is greater than the significance level 𝛼 = 0.05, insufficient evidence exists to support the claim that the sex ratio in the population from which the sample is drawn is different than the expected sex ratio of 0.515.

The 𝑝-value of a two-tailed test is twice that of a one-tailed test. Since the question is framed as a difference from the expected proportion, the area above 𝑧 = 1.79 is also included in the 𝑝-value. Note that had the question stated that the sex ratio of males to females is lower instead of different, then sufficient evidence would exist to conclude that the sex ratio of males to females is lower than the expected ratio of 0.515, because the 𝑝-value for a one-tailed test is 0.037.
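A minimal Python sketch of this proportion test (again using scipy as an illustration):

```python
# Minimal sketch: one-sample z-test for a population proportion (sex-ratio example).
import numpy as np
from scipy.stats import norm

p0, x, n = 0.515, 85, 189
p_hat = x / n                                  # ≈ 0.450

z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)  # ≈ -1.79, up to rounding
p_two_tailed = 2 * norm.sf(abs(z))             # ≈ 0.073
p_one_tailed = norm.cdf(z)                     # ≈ 0.037, up to rounding, for a left-tailed alternative

print(f"z = {z:.2f}, two-tailed p = {p_two_tailed:.3f}, one-tailed p = {p_one_tailed:.3f}")
```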


Analysis of Variance (ANOVA)


Analysis of variance (ANOVA) controls for the errors associated with comparing multiple population means. ANOVA determines whether a statistically significant difference exists among the means of three or more populations. Equivalently, ANOVA tests for an association between a categorical predictor variable and a response variable.


One-Way ANOVA:

The one-way ANOVA hypothesis test follows the same process as previously discussed tests. The null hypothesis for a one-way ANOVA is that all of the group means are equal. Caution should be exercised when stating the alternative hypothesis because the negation of the null hypothesis does not say that all group means are unequal. Instead, the alternative hypothesis should state that at least two groups with unequal means exist. The rest of the hypothesis test involves finding the 𝐹-statistic and the 𝑝-value to make a decision based on a significance level.


Example:

A teacher believes that the exams created for the class vary in difficulty because of the differences in mean exam scores. Does sufficient evidence exist at the 𝛼 = 0.01 level to support the teacher's belief that the exam scores have different means? Using the ANOVA table I coded below:


Our Test:


The 𝑝-value that corresponds to 𝐹 = 3.857 is 𝑃(𝐹 ≥ 3.857) = 0.0103. Since the 𝑝-value is greater than the significance level (0.0103 > 0.01), the null hypothesis is not rejected. That is, at the 𝛼 = 0.01 significance level, insufficient statistical evidence exists to support the claim that the mean exam scores are different.

The alternative hypothesis for an 𝐹 test is that at least two of the means are unequal. Although the means of all exam scores could all be statistically different, the 𝐹 test cannot determine which of the exams have statistically different means.
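As an illustration of how such an 𝐹 test can be run, here is a minimal Python sketch using scipy. The exam-score arrays are hypothetical stand-ins, since the original ExamScores data is not reproduced here, so the output will not match the 𝐹 = 3.857 above:

```python
# Minimal sketch of a one-way ANOVA with scipy; the data below is made up.
from scipy.stats import f_oneway

exam1 = [78, 85, 92, 88, 75, 90, 84, 79]
exam2 = [72, 80, 86, 83, 70, 85, 78, 74]
exam3 = [68, 75, 83, 80, 66, 82, 74, 71]

f_stat, p_value = f_oneway(exam1, exam2, exam3)
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}")

alpha = 0.01
print("reject H0" if p_value < alpha else "fail to reject H0")
```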


Post-Hoc:

If the null hypothesis is not rejected, then no further work is necessary. However, if the null hypothesis is rejected, further analysis is required because the 𝐹 -test does not determine which groups have different means. Post-hoc analysis determines which groups have different means, which group has the highest or lowest mean, and other relationships between the groups.


Example:

Suppose the ExamScores dataset is tested at a significance level of 𝛼 = 0.05. Because the null hypothesis that all of the means are the same is now rejected, a post-hoc test should be performed to determine which means differ. Which two groups have statistically different means? Below is the output of Tukey's HSD test.


The output above shows that the confidence interval for the comparison between the means of Exam1 and Exam3 does not contain 0. Thus, sufficient statistical evidence exists to support the claim that the means of Exam1 and Exam3 are different.
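For reference, a Tukey HSD table like the one described above can be produced with statsmodels. This is a minimal sketch using the same hypothetical stand-in data as the ANOVA sketch, not the original ExamScores output:

```python
# Minimal sketch of a Tukey HSD post-hoc test with statsmodels; the data is made up.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

exam1 = [78, 85, 92, 88, 75, 90, 84, 79]
exam2 = [72, 80, 86, 83, 70, 85, 78, 74]
exam3 = [68, 75, 83, 80, 66, 82, 74, 71]

scores = np.concatenate([exam1, exam2, exam3])
groups = ["Exam1"] * 8 + ["Exam2"] * 8 + ["Exam3"] * 8

result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result)  # a pair whose confidence interval excludes 0 has significantly different means
```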



Chi-Squared explanation and examples coming soon.



Thank you,



Jose Hernandez