# Statistics Project, Sampling

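11. Population: $N = 828$ claims

a.) SRS of $n = 85$ claims, each with 215 fields. Let $y_i$ be the number of errors in a claim and $f_i$ the number of sampled claims with $y_i$ errors:

| Claims $f_i$ | Errors $y_i$ | $f_i y_i$ |
|---|---|---|
| 1 | 4 | 4 |
| 1 | 3 | 3 |
| 4 | 2 | 8 |
| 22 | 1 | 22 |
| 57 | 0 | 0 |

We estimate the population mean $\bar{Y}$ with the sample mean $\bar{y}$.

$n = \sum f_i = 85$, $\sum f_i y_i = 37$, $\sum f_i y_i^2 = 63$

$\bar{y} = \sum f_i y_i / \sum f_i = 37/85 \approx 0.435$ errors per claim

The standard error of $\bar{y}$ is

$$\delta_{\bar{y}} = \left[\widehat{\operatorname{var}}(\bar{y})\right]^{1/2} = \left[\frac{S^2}{n}(1 - f)\right]^{1/2}, \qquad f = \frac{n}{N} = \frac{85}{828}$$

$$S^2 = \frac{1}{n-1}\sum f_i\,(y_i - \bar{y})^2 = \frac{1}{n-1}\left[\sum f_i y_i^2 - \frac{\left(\sum f_i y_i\right)^2}{n}\right]$$

$S^2 = \frac{1}{84}\left(63 - \frac{37^2}{85}\right) \approx 0.558$

Estimated $\delta_{\bar{y}} = \left[\frac{0.558}{85}\left(1 - \frac{85}{828}\right)\right]^{1/2} \approx 0.0768$

b.) Estimated total: $\hat{Y} = N\bar{y} = 828 \times 37/85 \approx 360.4$ errors

Standard error of $\hat{Y}$: $N\delta_{\bar{y}} = 828 \times 0.0768 \approx 63.6$

c.) Sample: $85 \times 215 = 18{,}275$ fields

Population: $828 \times 215 = 178{,}020$ fields

Number of errors in sample = 37

$\bar{y} = 37/85 \approx 0.435$ errors per claim, or $37/18{,}275 \approx 0.0020$ errors per field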
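The frequency-table arithmetic for parts (a) and (b) can be checked with a short R sketch. It assumes the table reads 1, 1, 4, 22, and 57 claims with 4, 3, 2, 1, and 0 errors respectively; the variable names are illustrative and not part of the original solution.

```r
# Frequency table: claims[i] sampled claims had errors[i] errors each (assumed reading).
errors <- c(4, 3, 2, 1, 0)
claims <- c(1, 1, 4, 22, 57)
N <- 828                      # claims in the population
n <- sum(claims)              # sample size, 85

# Sample mean and sample variance from grouped data.
sum_fy  <- sum(claims * errors)
sum_fy2 <- sum(claims * errors^2)
ybar <- sum_fy / n
s2   <- (sum_fy2 - sum_fy^2 / n) / (n - 1)

# Standard error with the finite population correction (1 - n/N).
se_ybar <- sqrt(s2 / n * (1 - n / N))

# Estimated population total and its standard error.
total    <- N * ybar
se_total <- N * se_ybar

round(c(ybar = ybar, s2 = s2, se_ybar = se_ybar,
        total = total, se_total = se_total), 4)
```

Working from the raw frequencies rather than hand-copied sums makes it easy to confirm that the sample size is the sum of the claim counts.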

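14.

a.)

| School | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Female students | 729 | 457 | 511 | 800 |
| Smokers | 10 | 5 | 6 | 27 |

$\sum f_i = 46$, $\sum f_i y_i = 1495$, $\bar{y} = 1495/46 = 32.5$

$$S^2 = \frac{1}{n-1}\left[\sum f_i y_i^2 - \frac{\left(\sum f_i y_i\right)^2}{\sum f_i}\right] = \frac{1}{99}\left[52525 - \frac{1495^2}{46}\right] = 39.7727273$$

b.) The $100(1-\alpha)\%$ confidence interval for the population mean is:

$$\bar{y} \pm z_{\alpha/2}\,\frac{S}{\sqrt{n}}\left[\frac{N-n}{N}\right]^{1/2}$$

$\sum f_i = 46$, $\sum f_i y_i = 1495$, $\sum f_i y_i^2 = 52525$, $N = 2550$, $n = 100$, $S = \sqrt{39.7727273} = 6.30656224$.

$$32.5 \pm z_{0.025}\,\frac{6.30656224}{\sqrt{100}}\left[\frac{2550-100}{2550}\right]^{1/2} = 32.5 \pm 1.96 \times 0.6307 \times 0.9802 = 32.5 \pm 1.21$$

The interval is therefore $(31.29,\ 33.71)$.

c.) The $100(1-\alpha)\%$ confidence interval for the population total is:

$$N\bar{y} \pm z_{\alpha/2}\,\frac{NS}{\sqrt{n}}\left[\frac{N-n}{N}\right]^{1/2}$$

$$2550 \times 32.5 \pm 1.96 \times \frac{2550 \times 6.30656224}{\sqrt{100}}\left[\frac{2550-100}{2550}\right]^{1/2} = 82{,}875 \pm 3{,}089.6$$

The interval is therefore $(79{,}785.4,\ 85{,}964.6)$.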
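The two interval formulas can be evaluated directly in R. This is a minimal sketch that takes the summary values $\bar{y} = 32.5$, $S^2 = 39.7727273$, $N = 2550$, and $n = 100$ as given in the text; it uses `qnorm(0.975)` rather than the rounded 1.96, so the limits may differ slightly in the third decimal place.

```r
# Summary statistics taken as given from the worked solution.
ybar <- 32.5
s2   <- 39.7727273
N    <- 2550
n    <- 100

z  <- qnorm(0.975)                          # two-sided 95% critical value
se <- sqrt(s2 / n) * sqrt((N - n) / N)      # SE of the mean with fpc

# Confidence interval for the mean: lower and upper limits.
ci_mean <- ybar + c(-1, 1) * z * se

# Confidence interval for the total: scale both limits by N.
ci_total <- N * ci_mean

round(ci_mean, 3)
round(ci_total, 1)
```

Reporting both limits, rather than only the upper one, makes the width of the interval visible at a glance.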

16.

> school1 <- read.table("c:/xyz/data.csv", sep = ",", header = TRUE)

> school1

returnf

1 1

2 1

3 1

4 0

5 1

6 9

7 1

8 1

9 0

10 0

11 1

12 0

13 0

14 1

15 0

16 1

17 0

18 0

19 1

20 0

21 0

22 9

23 0

24 0

25 0

26 1

27 0

28 0

29 0

30 1

31 1

32 0

33 1

34 1

35 0

36 0

37 1

38 1

39 1

40 1

a.) > sum(school1)

[1] 37

Percentage of parents who returned the forms: 37/78 × 100 = 47.44%
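The listing for `school1` contains two entries equal to 9, and `sum()` adds each of them as 9. The sketch below re-enters the 40 listed values and shows how the count changes under the assumption, which the source does not state, that 9 is a code for a missing response rather than a returned form.

```r
# The 40 values of the `returnf` column as printed in the listing.
returnf <- c(1, 1, 1, 0, 1, 9, 1, 1, 0, 0,
             1, 0, 0, 1, 0, 1, 0, 0, 1, 0,
             0, 9, 0, 0, 0, 1, 0, 0, 0, 1,
             1, 0, 1, 1, 0, 0, 1, 1, 1, 1)

raw_sum <- sum(returnf)                  # counts each 9 as nine forms

# ASSUMPTION: 9 marks a missing response; recode it to NA before counting.
recoded <- returnf
recoded[recoded == 9] <- NA
n_returned <- sum(recoded, na.rm = TRUE) # number of forms coded 1
n_missing  <- sum(is.na(recoded))        # number of entries coded 9

c(raw_sum = raw_sum, n_returned = n_returned, n_missing = n_missing)
```

If the 9s do have another meaning in the data file, the raw sum is the right figure and the recoding step should be dropped.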

> sum(read.table("c:/xyz/school2.csv", sep = ",", header = TRUE))

[1] 37

Percentage of parents who returned the forms: 37/238 × 100 = 15.55%

> sum(read.table("c:/xyz/school3.csv", sep = ",", header = TRUE))

[1] 31

Percentage of parents who returned the forms: 31/261 × 100 = 11.88%

> sum(read.table("c:/xyz/school4.csv", sep = ",", header = TRUE))

[1] 18

Percentage of parents who returned the forms: 18/174 × 100 = 10.34%

> sum(read.table("c:/xyz/school5.csv", sep = ",", header = TRUE))

[1] 48

Percentage of parents who returned the forms: 48/236 × 100 = 20.34%

> sum(read.table("c:/xyz/school6.csv", sep = ",", header = TRUE))

[1] 22

Percentage of parents who returned the forms: 22/188 × 100 = 11.70%

> sum(read.table("c:/xyz/school7.csv", sep = ",", header = TRUE))

[1] 24

Percentage of parents who returned the forms: 24/113 × 100 = 21.24%

> sum(read.table("c:/xyz/school8.csv", sep = ",", header = TRUE))

[1] 84

Percentage of parents who returned the forms: 84/170 × 100 = 49.41%

> sum(read.table("c:/xyz/school9.csv", sep = ",", header = TRUE))

[1] 50

Percentage of parents who returned the forms: 50/296 × 100 = 16.89%

> sum(read.table("c:/xyz/school10.csv", sep = ",", header = TRUE))

[1] 43

Percentage of parents who returned the forms: 43/207 × 100 = 20.77%

c.) > sum(read.table("c:/xyz/consent.csv", sep = ",", header = TRUE))

[1] 339

Percentage of parents who returned the forms: 339/9962 × 100 = 3.40%

0.95 × 339 = 322
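The ten per-school percentages in part (a) can be reproduced in one vectorised step. This sketch hard-codes the counts and denominators quoted in the text instead of reading the CSV files, whose contents are not available here; the vector names are illustrative.

```r
# Forms returned and number of children per school, as quoted in part (a).
returned <- c(37, 37, 31, 18, 48, 22, 24, 84, 50, 43)
enrolled <- c(78, 238, 261, 174, 236, 188, 113, 170, 296, 207)

# Percentage of forms returned in each school, rounded to two decimals.
pct <- round(100 * returned / enrolled, 2)

data.frame(school = 1:10, returned, enrolled, pct)
```

Computing all ten rates from two vectors avoids repeating the same division by hand for every school.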

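b.)

The procedure is as follows:

• The weights $w_i$ are the inverses of the selection probabilities $\psi_i$, that is, $w_i = 1/\psi_i$.

• The weighted estimator of the population total is $\hat{t}_\psi = \sum w_i t_i$.

• We calculate the estimate $\hat{t}_\psi$ in each case.

Sample $n = 18{,}275$, population $N = 178{,}020$:

$$\operatorname{Var}(\hat{Y}) = \frac{N^2 S^2}{n}\cdot\frac{N-n}{N}$$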
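The variance formula for an estimated total under simple random sampling is easy to wrap in a small helper. The function name `var_total_srs` and the toy inputs in the example call are hypothetical and chosen only so the result can be checked by hand.

```r
# Estimated variance of N * ybar under SRS without replacement:
#   N^2 * (s2 / n) * (N - n) / N
var_total_srs <- function(s2, n, N) {
  stopifnot(n > 0, N >= n, s2 >= 0)
  N^2 * s2 / n * (N - n) / N
}

# Toy check: s2 = 4, n = 10, N = 100 gives 100^2 * 0.4 * 0.9.
toy <- var_total_srs(s2 = 4, n = 10, N = 100)

# Finite population correction for the field-level sample sizes in the text.
fpc <- (178020 - 18275) / 178020
```

The last factor is the finite population correction; it approaches 1 when the sample is a small fraction of the population.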

9.)

Procedure

- Suppose the number of draws $n$ is greater than 1 and we sample with replacement.
- This implies $\pi_i = 1 - (1 - \psi_i)^n$.
- The probability that unit $i$ is selected on the first draw is the same as the probability that it is selected on any other draw.
- Sampling with replacement gives us $n$ independent estimates of the population total, one for each unit in the sample.
- We average these $n$ estimates.
- The estimated variance is the sample variance of the $n$ estimates divided by $n$.
- $N = 52$ units (states in the USA).
- $M_i$ is the size of unit $i$ ($i = 1$ to $52$).
- The sizes $M_i$ sum to 3142.
- We want a sample of 10 states.
- In this case $\psi_i = M_i/3142$.
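The averaging steps listed above can be sketched as a small R function. The name `pps_wr_estimate` is hypothetical; it takes the observed totals of the drawn units and their draw probabilities $\psi_i$, and it needs at least two draws so that a sample variance exists.

```r
# With-replacement unequal-probability estimator of a population total.
# totals: observed total for each drawn unit (repeat a unit if drawn twice)
# psi:    draw probability of each drawn unit
pps_wr_estimate <- function(totals, psi) {
  stopifnot(length(totals) == length(psi), length(totals) >= 2, all(psi > 0))
  u <- totals / psi                 # one estimate of the total per draw
  n <- length(u)
  list(total = mean(u),             # average of the n estimates
       se    = sqrt(var(u) / n))    # variance of the estimates divided by n
}

# Toy check: estimates 10/0.5 = 20 and 10/0.25 = 40, so mean 30 and SE 10.
toy <- pps_wr_estimate(totals = c(10, 10), psi = c(0.5, 0.25))

# Illustration with two units from the size table (sizes 75 and 8, psi = size/3142).
est <- pps_wr_estimate(totals = c(2394253, 3279116), psi = c(75, 8) / 3142)
```

With ten draws, as the problem asks, the same call would simply receive ten totals and ten probabilities.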

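| Unit | Size | Cumulative size | Y = population |
|---|---|---|---|
| 1 | 67 | 67 | 4137511 |
| 2 | 25 | 92 | 587766 |
| 3 | 15 | 107 | 3832368 |
| 4 | 75 | 182 | 2394253 |
| 5 | 58 | 240 | 30895356 |
| 6 | 63 | 303 | 3464675 |
| 7 | 8 | 311 | 3279116 |
| 8 | 3 | 314 | 690884 |
| 9 | 1 | 315 | 585221 |
| 10 | 67 | 382 | 13482716 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 52 | 159 | 3142 | 464736 |

Select a random number $R$ between 1 and $T_N = 3142$ (with $N = 52$) by using a random number table.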

If $T_{i-1} < R \le T_i$, then the $i$th unit is selected, which happens with probability $X_i/T_N = X_i/3142$, $i = 1, 2, \ldots, 52$, where $X_i$ is the size of unit $i$ and $T_i$ is its cumulative size.

Repeat the procedure 10 times to get a sample of size 10.

1. First draw: Draw a random number between 1 and 3142. Suppose it is 167. Since $T_3 = 107 < 167 \le T_4 = 182$, unit 4 is selected and $Y_4 = 2394253$ enters the sample.

2. Second draw: Draw a random number between 1 and 3142. Suppose it is 308. Since $T_6 = 303 < 308 \le T_7 = 311$, unit 7 is selected and $Y_7 = 3279116$ enters the sample, and so on.

This procedure is repeated until a sample of the required size is obtained.
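The cumulative-size rule can be sketched in R. Only the ten units listed in the table are entered here, so the sketch draws from 1 to 382 rather than from the full range of 3142; the function name `select_unit` and the seed are illustrative.

```r
# Sizes of the first ten units in the table (the full frame has 52 units).
size     <- c(67, 25, 15, 75, 58, 63, 8, 3, 1, 67)
cum_size <- cumsum(size)

# Return the first unit whose cumulative size is at least r,
# i.e. the unit i with T_{i-1} < r <= T_i.
select_unit <- function(r, cum) {
  vapply(r, function(x) which(x <= cum)[1], integer(1))
}

# The two draws used in the worked example select units 4 and 7.
example_units <- select_unit(c(167, 308), cum_size)

# Ten draws with replacement over the listed units only.
set.seed(1)
draws         <- sample.int(max(cum_size), size = 10, replace = TRUE)
sampled_units <- select_unit(draws, cum_size)
```

Because a unit's share of the cumulative range equals its size, larger units are selected proportionally more often, and the same unit can appear more than once.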

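10.)

| Unit | Size | Cumulative size | Y = population |
|---|---|---|---|
| 1 | 67 | 67 | 4137511 |
| 2 | 25 | 92 | 587766 |
| 3 | 15 | 107 | 3832368 |
| 4 | 75 | 182 | 2394253 |
| 5 | 58 | 240 | 30895356 |
| 6 | 63 | 303 | 3464675 |
| 7 | 8 | 311 | 3279116 |
| 8 | 3 | 314 | 690884 |
| 9 | 1 | 315 | 585221 |
| 10 | 67 | 382 | 13482716 |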
