# Stats-Questions-2-1

Suppose you were hired to determine if college students could tell the difference in taste between Sprite and Sierra Mist. You conducted a study by randomly sampling 90 students from a large lecture on the Oregon State University campus. Each student was individually brought to a tasting room where three cups with lids were placed in front of them. Two of the cups contained one of the brands of soda (say Sprite) and the other cup contained the other brand of soda (Sierra Mist). Students did not know which brand was in which cup. Using a straw that was placed through the lid of each cup, each student was to taste the soda in each cup and report to you which cup they thought contained the different brand.

Suppose 52 of the 90 students were able to correctly identify the cup that contained the different brand of soda. Is that evidence that college students could tell the difference in taste between Sprite and Sierra Mist?

To answer this question of interest, answer the following questions.

What is the random variable in this problem? Does the random variable have a binomial distribution? Explain.

Here the random variable is the number of students, out of the 90 sampled, who correctly identify the cup that contains the different brand of soda. There is a fixed number of trials (n = 90), each trial has only two outcomes (the student either identifies the cup correctly or does not), the probability of a correct identification can be considered equal for every student, and each student's answer is independent of the others. All the conditions of the binomial distribution are therefore fulfilled, so the random variable has a binomial distribution.

What does p represent in the context of this study?

In this problem p represents the population proportion: the proportion of all college students who can correctly identify the cup that contains the different brand of soda.

Calculate the sample proportion, $\hat{p}$, of people in the study who could identify the cup that contained the different brand of soda. Show work.

The sample proportion here is

$$\hat{p} = \frac{\text{number of students who identified the cup correctly}}{\text{number of students studied}} = \frac{52}{90} = 0.57778$$

i] State the null and alternative hypotheses in statistical notation AND in words!!

Here we want to test whether students can correctly identify the cup that contains the different brand. If students were only guessing, we take the probability of a correct answer to be 0.5, so about half of them would answer correctly. We therefore test whether the population proportion is significantly larger than 0.5:

$H_0: p \le 0.5$ against $H_a: p > 0.5$.

In words, the null hypothesis says that at most half of college students can identify the cup containing the different brand, and the alternative hypothesis says that more than half can.

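ii] Assume that the observations are independent of each other. Which hypothesis test is the appropriate one to use in this situation? Why?

Here the sample is large: n = 90, so both $n p_0 = 45$ and $n(1 - p_0) = 45$ are at least 10. The sampling distribution of the sample proportion is therefore approximately normal, and a one-sample z-test for a proportion is appropriate.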

iii] Perform the appropriate hypothesis test. Report the test-statistic and p-value.

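Here,

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n}}} = \frac{\dfrac{52}{90} - 0.5}{\sqrt{\dfrac{0.5(1 - 0.5)}{90}}} = 1.4757$$

$$\text{p-value} = P(Z > 1.4757) = 0.07$$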
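The hand calculation above can be checked with a short script. This is only a sketch using the Python standard library; the function name `one_sample_z_test` is a hypothetical helper introduced for illustration, and the null value 0.5 is the one used in this solution.

```python
import math

def one_sample_z_test(successes, n, p0):
    """Upper-tailed one-sample z-test for a proportion (hypothetical helper)."""
    p_hat = successes / n
    # Standard error computed under the null hypothesis (uses p0, not p_hat)
    se_null = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se_null
    # P(Z > z) for a standard normal, via the complementary error function
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value

# Values from the study: 52 correct out of 90, null value 0.5 as in the text
p_hat, z, p_value = one_sample_z_test(52, 90, 0.5)
print(f"p_hat = {p_hat:.5f}, z = {z:.4f}, p-value = {p_value:.3f}")
```

Passing a different `p0` would show how sensitive the test statistic is to the chosen null value.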

iv] Based on the p-value, answer the question of interest in a complete sentence in the context of the problem.

Because the p-value (0.07) is larger than 0.05, we fail to reject the null hypothesis at the 5% significance level. At that level, the study does not provide sufficient evidence that college students can tell the difference in taste between Sprite and Sierra Mist.

f. Calculate the standard deviation for the one-sample z-methods

i] By hand, calculate the standard deviation of the sample proportion ($\sigma_{\hat{p}}$) used to perform the hypothesis test. Show work.

The standard deviation used for the test is the standard error of the sample proportion, computed with the hypothesized proportion:

$$\sigma_{\hat{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}} = \sqrt{\frac{0.5 \times 0.5}{90}} = 0.0527$$

ii] By hand, calculate the standard deviation of the sample proportion used in the calculation of the confidence interval. Show work.

The standard deviation used for the confidence interval is

$$\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{\dfrac{52}{90}\left(1 - \dfrac{52}{90}\right)}{90}} = 0.0521$$
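To see the two standard deviations side by side, here is a minimal sketch in plain Python. The variable names `se_test` and `se_ci` are illustrative choices, not notation from the assignment.

```python
import math

n = 90
p0 = 0.5          # hypothesized proportion used in the hypothesis test
p_hat = 52 / n    # sample proportion from the study

# Standard error for the hypothesis test: built from the null value p0
se_test = math.sqrt(p0 * (1 - p0) / n)

# Standard error for the confidence interval: built from the sample proportion
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"test SE = {se_test:.4f}, confidence-interval SE = {se_ci:.4f}")
```

The two values are close here because the sample proportion is not far from 0.5, where $p(1 - p)$ is at its maximum.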

iii] Why does the formula for the standard error when doing a confidence interval use the sample proportion instead of the hypothesized value of the population proportion?

When we calculate a confidence interval, we make no assumption about the value of the population proportion, so there is no hypothesized value to plug into the formula. The sample proportion is our best available estimate of the population proportion, and the interval built around it contains the population proportion with a stated level of confidence.

That is why the formula for the standard error when doing a confidence interval uses the sample proportion instead of the hypothesized value of the population proportion.
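As an illustration of how the sample proportion enters the interval, the sketch below computes a confidence interval for p. The problem does not state a confidence level, so the 95% level here is an assumption, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 90
p_hat = 52 / n
confidence = 0.95  # assumed level; the problem does not specify one

# Critical value z* such that the central area under the normal curve is 95%
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Standard error based on the sample proportion (no hypothesized value needed)
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)

margin = z_star * se_ci
lower, upper = p_hat - margin, p_hat + margin
print(f"{confidence:.0%} CI for p: ({lower:.4f}, {upper:.4f})")
```

Nothing in this calculation refers to a null value, which is the point made in the answer above.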