Intro to Statistical Inference: Estimation

Professor Friedman's Statistics Course by H & L Friedman is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Statistical inference involves:
  Estimation
  Hypothesis Testing

Both activities use sample statistics (for example, X̄) to make inferences about a population parameter (μ).

Estimation

Why don't we just use a single number (a point estimate) like, say, X̄ to estimate a population parameter, μ? The problem with using a single point (or value) is that it will very probably be wrong. In fact, with a continuous random variable, the probability that the variable is equal to a particular value is zero. So, P(X̄ = μ) = 0. This is why we use an interval estimator.

We can examine the probability that the interval includes the population parameter.

Confidence Interval Estimators

How wide should the interval be? That depends upon how much confidence you want in the estimate. For instance, say you wanted a confidence interval estimator for the mean income of a college graduate:

You might have        That the mean income is between
100% confidence       $0 and $∞
95% confidence        $35,000 and $41,000
90% confidence        $36,000 and $40,000
80% confidence        $37,500 and $38,500
0% confidence         $38,000 (a point estimate)

The wider the interval, the greater the confidence you will have in it as containing the true population parameter μ.

Confidence Interval Estimators

To construct a confidence interval estimator of μ, we use:

    X̄ ± Z σ/√n        (1 − α) confidence

where we get Z from the Z table. When we don't know σ we should really be using a different table (future lectures will cover this) but, often, if n is large (say n ≥ 30), we may use s instead since we assume that it is close to the value of σ.

Confidence Interval Estimators

To be more precise, the α is split in half since we are constructing a two-sided confidence interval. However, for the sake of simplicity, we call the z-value Z rather than Z_{α/2}.

[Figure: standard normal curve with area α/2 in each tail, beyond −Z_{α/2} and +Z_{α/2}.]
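As a minimal sketch of the formula above, the snippet below looks up Z from the standard normal distribution instead of a printed Z table and builds the interval X̄ ± Z σ/√n. The function names `z_value` and `confidence_interval` are illustrative and not part of the lecture; Python's standard-library `statistics.NormalDist` stands in for the Z table. Note how α is split in half: for 95% confidence, Z is the point with 97.5% of the area below it.

```python
from math import sqrt
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value: alpha is split equally between both tails."""
    alpha = 1.0 - confidence
    # inv_cdf returns the point with the given area to its left,
    # so we ask for 1 - alpha/2 (e.g. 0.975 for 95% confidence).
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def confidence_interval(sample_mean, sigma, n, confidence):
    """Return (lower, upper) for sample_mean +/- Z * sigma / sqrt(n)."""
    half_width = z_value(confidence) * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


if __name__ == "__main__":
    for level in (0.99, 0.95, 0.90):
        print(f"{level:.0%} confidence: Z = {z_value(level):.3f}")
```

The computed values should agree with a Z table to within rounding. A confidence level of exactly 1.0 is not accepted by `inv_cdf`, which matches the idea that Z is infinite at 100% confidence.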

Question

You work for a company that makes smart TVs, and your boss asks you to determine with certainty the exact life of a smart TV. She tells you to take a random sample of 100 TVs. What is the exact life of a smart TV made by this company?

Sample Evidence:
  n = 100
  X̄ = 11.50 years
  s = 2.50 years

Answer: Take 1

Since your boss has asked for 100% confidence, the only answer you can accurately provide is: −∞ to +∞ years.

After you are fired, perhaps you can get your job back by explaining to your boss that statisticians cannot work with 100% confidence if they are working with data from a sample. If you want 100% confidence, you must take a census. With a sample, you can never be absolutely certain as to the value of the population parameter.

This is exactly what statistical inference is: using sample statistics to draw conclusions (e.g., estimates) about population parameters.

The Better Answer

  n = 100
  X̄ = 11.50 years
  s = 2.50 years

At 95% confidence:
  11.50 ± 1.96 × (2.50/√100)
  11.50 ± 1.96 × (0.25)

  11.50 ± 0.49

The 95% CIE is: 11.01 years to 11.99 years

[Note: Ideally we should be using σ, but since n is large we assume that s is close to the true population standard deviation.]

The Better Answer: Interpretation

We are 95% confident that the interval from 11.01 years to 11.99 years contains the true population parameter, μ.

Another way to put this is: in about 95 out of 100 samples, intervals constructed by the same procedure (same n and same α) would contain the population mean.

Remember that the population parameter (μ) is fixed; it is not a random variable. It is the interval that varies from sample to sample. Thus, it is incorrect to say that there is a 95% chance that the population mean will fall in this interval.

EXAMPLE: Life of a Refrigerator

The sample:
  n = 100
  X̄ = 18 years
  s = 4 years

Construct a confidence interval estimator (CIE) of the true population mean life (μ), at each of the following levels of confidence:
  (a) 100%
  (b) 99%
  (c) 95%
  (d) 90%
  (e) 68%

EXAMPLE: Life of a Refrigerator

Again, in this example, we should ideally be using σ, but since n is large we assume that s is close to the true population standard deviation. It should be noted that s² is an unbiased estimator of σ²: E(s²) = σ².
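The unbiasedness claim E(s²) = σ² can be checked with a small simulation. The sketch below is illustrative only: the normal population (μ = 18, σ = 4), the sample size of 10, the number of repetitions, and the seed are all assumptions chosen for the demonstration, not values from the lecture. It relies on `statistics.variance`, which divides by n − 1.

```python
import random
from statistics import fmean, variance

# Assumed population for the demonstration (not from the lecture).
MU, SIGMA = 18.0, 4.0


def average_sample_variance(n, repetitions, seed=0):
    """Average s^2 over many random samples of size n."""
    rng = random.Random(seed)
    variances = []
    for _ in range(repetitions):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        # statistics.variance uses the n - 1 denominator (sample variance).
        variances.append(variance(sample))
    return fmean(variances)


if __name__ == "__main__":
    avg = average_sample_variance(n=10, repetitions=10_000)
    print(f"sigma^2 = {SIGMA ** 2:.2f}, average s^2 = {avg:.2f}")
```

Any single s² can land well above or below σ² = 16; it is the long-run average over repeated samples that should settle near 16.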

EXAMPLE: Life of a Refrigerator

(a) 100% Confidence [α = 0, Z = ∞]
  100% CIE: −∞ years to +∞ years

(b) 99% Confidence
  α = .01, Z = 2.575 (from Z table)
  18 ± 2.575 (4/√100)
  18 ± 1.03
  99% CIE: 16.97 years to 19.03 years

(c) 95% Confidence
  α = .05, Z = 1.96 (from Z table)

  18 ± 1.96 (4/√100)
  18 ± 0.78
  95% CIE: 17.22 years to 18.78 years

EXAMPLE: Life of a Refrigerator

(d) 90% Confidence
  α = .10, Z = 1.645 (from Z table)
  18 ± 1.645 (4/√100)
  18 ± 0.66
  90% CIE: 17.34 years to 18.66 years

(e) 68% Confidence
  α = .32, Z = 1.0 (from Z table)
  18 ± 1.0 (4/√100)
  18 ± 0.4
  68% CIE: 17.60 years to 18.40 years
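The arithmetic in parts (b) through (e) can be reproduced with a short script. This is a sketch, not part of the lecture: it hard-codes the Z values quoted from the Z table in the example, and the helper name `cie` is made up for illustration. Part (a) is left out because Z is infinite at 100% confidence.

```python
from math import sqrt

# Z values as quoted from the Z table in the example.
Z_TABLE = {0.99: 2.575, 0.95: 1.96, 0.90: 1.645, 0.68: 1.0}


def cie(sample_mean, s, n, confidence):
    """Confidence interval estimator: sample_mean +/- Z * s / sqrt(n)."""
    half_width = Z_TABLE[confidence] * s / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


if __name__ == "__main__":
    # Refrigerator sample: n = 100, mean = 18 years, s = 4 years.
    for level in Z_TABLE:
        low, high = cie(18, 4, 100, level)
        print(f"{level:.0%} CIE: {low:.2f} years to {high:.2f} years")
```

The same helper covers the smart TV question: `cie(11.50, 2.50, 100, 0.95)` gives roughly 11.01 to 11.99 years.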

Balancing Confidence and Width in a CIE

How can we keep the same level of confidence and still construct a narrower CIE? Let's look at the formula one more time:

    X̄ ± Z σ/√n

The sample mean is in the center.
The more confidence you want, the higher the value of Z and the larger the half-width of the interval.
The larger the sample size, the smaller the half-width, since we divide by √n.

So, what can we do? If you want a narrower interval, take a larger sample. What about a smaller standard deviation? Of course, this depends on the variability of the population. However, a more efficient sampling procedure (e.g., stratification) may help. That topic is for a more advanced statistics course.

Key Points

Once you are working with a sample, not the entire population, you cannot be 100% certain of population parameters. If you need to know the value of a parameter with certainty, take a census.

The more confidence you want to have in the estimator, the larger the interval is going to be.

Traditionally, statisticians work with 95% confidence. However, you should be able to use the Z-table to construct a CIE at any level of confidence.

More Homework for you

Do the rest of the problems in the lecture notes.
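To see the trade-off from the key points in numbers, the sketch below computes the half-width Z·s/√n for a few confidence levels and sample sizes. The standard deviation of 4 years is borrowed from the refrigerator example; the sample sizes, the function name `half_width`, and the use of `statistics.NormalDist` in place of the Z table are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def half_width(s, n, confidence):
    """Half-width of a two-sided interval: Z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * s / sqrt(n)


if __name__ == "__main__":
    s = 4.0  # standard deviation from the refrigerator example
    for n in (100, 400, 1600):
        widths = ", ".join(
            f"{level:.0%}: +/-{half_width(s, n, level):.2f}"
            for level in (0.90, 0.95, 0.99)
        )
        print(f"n = {n:>4}  {widths}")
```

Because the half-width shrinks with √n, quadrupling the sample size only halves it, while raising the confidence level at a fixed n always widens the interval.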