Statistics Project, Statistics Homework

Your Name

Instructor

Subject

Date of Submission

Question 11.) a.) > project = read.table("c:/xyz/data.csv", sep=",", header=TRUE)

> attach(project)

b.) > hist(physician, col="blue")

c.) > sd(physician)

d.) [1] 1591.87

e.) Regression

(X, Y) ~ N(mx, my, sx^2, sy^2, r), where r is the correlation (not the covariance), mx and my are the respective means of X and Y, and sx^2 and sy^2 are the respective variances.
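For reference, under this bivariate normal model the regression of Y on X is a straight line. Written with the same symbols, the conditional mean and variance are as follows (a standard textbook result, added here for context rather than taken from the original answer):

```latex
E[Y \mid X = x] = m_y + r \, \frac{s_y}{s_x} \, (x - m_x),
\qquad
\operatorname{Var}(Y \mid X = x) = s_y^2 \, (1 - r^2)
```

The slope r * sy / sx is the quantity a simple linear regression of Y on X estimates.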

Question 36. a and b) mvrnorm(n = 1, mu, Sigma, tol = 1e-6, empirical = FALSE, EISPACK = FALSE)

Sigma: a positive-definite symmetric matrix specifying the covariance matrix of the variables.

n: the number of samples required.

mu: a vector giving the means of the variables.

tol: tolerance (relative to largest variance) for numerical lack of positive-definiteness in Sigma.

empirical: logical. If true, mu and Sigma specify the empirical, not population, mean and covariance matrix.

EISPACK: logical. Set to true to reproduce results from MASS versions prior to 3.1-21.

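library(MASS)  # mvrnorm is provided by the MASS package

Sigma <- matrix(c(1,1,1,1),2,2)

Sigma

# mu needs one entry per row of Sigma, so its length is 2 here
var(mvrnorm(n=500, rep(0, 2), Sigma))

var(mvrnorm(n=500, rep(0, 2), Sigma, empirical = TRUE))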

mvrnorm(500, rep(0, 2), Sigma)
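As a minimal sketch of what the empirical argument changes, the following compares the sample covariance of two draws with the covariance matrix that was requested. The covariance values, the seed, and the names Sigma_demo, x_pop, x_emp, err_pop and err_emp are illustrative assumptions, not part of the assignment.

```r
library(MASS)  # provides mvrnorm

set.seed(1)  # arbitrary seed so the run is repeatable

# Hypothetical 2 x 2 covariance matrix (assumed for illustration)
Sigma_demo <- matrix(c(1, 0.5, 0.5, 1), 2, 2)

# empirical = FALSE: Sigma_demo is the population covariance,
# so the sample covariance only approximates it
x_pop <- mvrnorm(n = 500, mu = rep(0, 2), Sigma = Sigma_demo)

# empirical = TRUE: the sample mean and covariance are forced to match
x_emp <- mvrnorm(n = 500, mu = rep(0, 2), Sigma = Sigma_demo,
                 empirical = TRUE)

# Largest absolute difference between sample and requested covariance
err_pop <- max(abs(var(x_pop) - Sigma_demo))
err_emp <- max(abs(var(x_emp) - Sigma_demo))

print(err_pop)  # sampling error: small but not zero
print(err_emp)  # zero up to rounding error
```

With empirical = TRUE the draw is no longer an ordinary random sample, so it is mainly useful when a data set with exactly specified moments is wanted.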

1.) No. These estimates are not valid because they do not take into account the total number of voters in the survey.

2.)

a.) Cluster sampling is a sampling technique in which the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. In this case, the clusters selected are a few clinics in the Chicago area, rather than all the clinics in Chicago. All observations in the selected clinics are included in the sample. This design may be the result of limited funds for the survey, limited time, or other practical constraints.

First Stage Sampling (FSU): Chicago

Second Stage Sampling (SSU): Selected clinics

Once the data from the questionnaire have been compiled, the households that have access to a gun are counted, as are those that do not. The number of households with access to a gun is divided by the total number of questionnaire responses, and the result is multiplied by 100% to give the percentage of households that have access to a gun.

The standard error of the proportion of children whose household has access to a gun is estimated from the average proportion of households that have access to a gun.
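The calculation described above can be sketched as follows. The clinic counts are invented numbers used only for illustration. The standard error uses the usual ratio estimator for a one-stage cluster sample, ignoring the finite population correction, which is an assumption about the intended method rather than something stated in the question.

```r
# Hypothetical data: one entry per sampled clinic (values are invented)
households <- c(40, 55, 32, 61, 48)  # households surveyed at each clinic
with_gun   <- c(10, 12,  9, 20, 11)  # of those, households with gun access

n_clinics <- length(households)

# Proportion of households with access to a gun, pooled over clinics
p_hat   <- sum(with_gun) / sum(households)
percent <- 100 * p_hat

# Ratio-estimator standard error for cluster sampling:
# how far each clinic's count is from what p_hat predicts for its size
deviation <- with_gun - p_hat * households
se_p_hat  <- sqrt(sum(deviation^2) / (n_clinics - 1) / n_clinics) /
  mean(households)

print(p_hat)
print(percent)
print(se_p_hat)
```

Because whole clinics are sampled, the variation that matters is between clinics, which is why the standard error is built from clinic-level deviations instead of the simple random sampling formula sqrt(p(1 - p)/n).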

b.) The sampled population is the parents who attend the selected clinics in Chicago. This sampling procedure does not result in a representative sample of households with children, for the following reasons:

More testing is required

It is not as accurate as a simple random sample, especially if the sample size is the same

This is two-stage cluster sampling.

4a.) Cluster sampling is a sampling technique in which the entire population is divided into groups, or clusters, and a random sample of these clusters is selected. All observations in the selected clusters are included in the sample. Jacoby and Handlin chose 26 journals from a list of 1285 scholarly journals. Cluster sampling is typically used when the researcher cannot get a complete list of the items to be studied but can get a complete list of the groups, or clusters, that contain them. It is also used when a random sample would produce a list of subjects so widely scattered that surveying them would be far too expensive, as would examining all 1285 scholarly journals.

4b.) > Data = read.table("c:/xyz/jay.csv", sep=",", header=TRUE)

> attach(Data)

> sum(nonprob)

[1] 137

> sum(Data)

[1] 288

> sum(prob)

[1] 3

> sum(numemp)

[1] 148

> mean(nonprob)

[1] 5.269231

4c) Proportion that used non-probability method = 137/288

> sum(nonprob)/sum(Data)

[1] 0.4756944

> sd(nonprob)

[1] 10.09775
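The figures above can be checked from the reported totals alone. This sketch uses only the sums printed in the output and the 26 journals mentioned in 4a. It assumes that nonprob, prob and numemp are the only columns in Data, which is inferred from 137 + 3 + 148 = 288. The variable names are illustrative.

```r
# Totals copied from the R output above
total_nonprob <- 137
total_prob    <- 3
total_numemp  <- 148
n_journals    <- 26  # journals sampled by Jacoby and Handlin

grand_total  <- total_nonprob + total_prob + total_numemp  # sum(Data)
prop_nonprob <- total_nonprob / grand_total   # sum(nonprob)/sum(Data)
mean_nonprob <- total_nonprob / n_journals    # mean(nonprob)

print(grand_total)             # 288
print(round(prop_nonprob, 7))  # 0.4756944
print(round(mean_nonprob, 6))  # 5.269231
```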

From the above results, it seems that experts have confidence in non-probability sampling. This is shown by the number of uses of the non-probability method (137), which is overwhelmingly greater than the number of uses of the probability method (3).
