# INFERENTIAL STATISTICS - University of Minnesota Duluth

- Samples are only estimates of the population
- Sample statistics will be slightly off from the true values of the population's parameters
  - Sampling error: the difference between a sample statistic and a population parameter
- Probability theory permits us to estimate the accuracy or representativeness of the sample

## The Catch-22 of Inferential Statistics

- When we collect a sample, we know nothing about the population's distribution of scores
  - We can calculate the mean ($\bar{X}$) and standard deviation ($s$) of our sample, but $\mu$ and $\sigma$ are unknown
  - The shape of the population distribution (normal or skewed?) is also unknown
- Probability theory allows us to answer: what is the likelihood that a given sample statistic accurately represents a population parameter?

Example: number of serious crimes committed in the year prior to prison, for inmates entering the prison system

- Population (N = thousands): $\mu$ = ???
- Sample (N = 150): $\bar{X}$ = 9.6

## Sampling Distribution (a.k.a. Distribution of Sample Outcomes)

- Outcomes = proportions, means, etc.
- From repeated random sampling, a mathematical description of all possible sampling event outcomes, and the probability of each one
- Permits us to make the link between sample and population
  - What is the probability that a sample finding reflects the population?
  - Is something that is true of a sample statistic likely to be true of a population parameter?

## Relationship between Sample, Sampling Distribution & Population

[Diagram: population, sampling distribution (distribution of sample means, proportions, or other outcomes), sample]

## Sampling Distribution: Characteristics

- Central tendency
  - Sample means will cluster around the population mean
  - Since samples are random, the sample means should be distributed equally on either side of the population mean
  - The mean of the sampling distribution is always equal to the population mean
- Shape: normal distribution
  - Central Limit Theorem: regardless of the shape of a raw score distribution (sample or population) of an interval-ratio variable, the sampling distribution will be approximately normal, as long as the sample size is at least 100
- Dispersion: standard error (SE)
  - Measures the spread of sampling error that occurs when a population is sampled repeatedly
  - Same thing as the standard deviation of the sampling distribution

- The standard error tells how much error, on average, should exist between the sample mean and the population mean
- Formula: $\sigma_{\bar{X}} = \sigma / \sqrt{N}$
- However, because $\sigma$ usually isn't known, $s$ (the sample standard deviation) is used to estimate the population standard deviation

## Sampling Distribution: Standard Error

- Law of Large Numbers: the larger the sample size (N), the more probable it is that the sample mean will be close to the population mean
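The standard error formula and the law of large numbers can be checked with a small simulation. The sketch below is only an illustration: the skewed population is made up (an exponential distribution with mean 10, not data from these slides), the helper name `simulate_sample_means` is hypothetical, and samples are drawn with replacement as an approximation to sampling from a very large population.

```python
import random
import statistics


def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` random samples of size n and return each sample's mean."""
    rng = random.Random(seed)
    # choices() samples with replacement; with a large population this is a
    # reasonable stand-in for repeated simple random sampling.
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(draws)]


# Hypothetical right-skewed population (assumption: exponential, mean 10).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(50_000)]
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (25, 100, 400):
    means = simulate_sample_means(population, n, draws=2_000, seed=n)
    mean_of_means = statistics.fmean(means)  # center of the sampling distribution
    empirical_se = statistics.stdev(means)   # spread of the sample means
    theoretical_se = sigma / n ** 0.5        # sigma / sqrt(N)
    results[n] = (mean_of_means, empirical_se, theoretical_se)
    print(f"N={n}: mean of sample means={mean_of_means:.2f} (mu={mu:.2f}), "
          f"SE simulated={empirical_se:.3f}, sigma/sqrt(N)={theoretical_se:.3f}")
```

Under these assumptions, the mean of the sample means should sit close to the population mean at every sample size, and the simulated spread should track $\sigma / \sqrt{N}$, shrinking as N grows.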

- In other words: a big sample works better (should give a more accurate estimate of the population) than a small one
- Makes sense if you study the formula for standard error

[Demo: Sampling Distribution Applet]

## 1. Estimation

[Diagram: Statistical Methods divide into Descriptive Statistics and Inferential Statistics; Inferential Statistics divide into Estimation and Hypothesis Testing]

## Introduction to Estimation

- Estimation procedures
  - Purpose: to estimate population parameters from sample statistics
  - Using the sampling distribution to infer from a sample to the population
  - Most commonly used for polling data
  - 2 components:
    - Point estimate
    - Confidence intervals

## Estimation

- Point estimate: value of a sample statistic used to estimate a population parameter
- Confidence interval: a range of values around the point estimate

[Diagram: confidence interval running from .546 (lower confidence limit) to .614 (upper confidence limit), with the point estimate .58 at the center]

## Example

CNN Poll (CNN.com; Feb 20, 2009): Slight majority thinks stimulus package will improve economy

The White House's economic stimulus plan isn't a surefire winner with the American public, but a majority does think the recovery plan will help. According to a new poll, fifty-three percent said the plan will improve economic conditions, while 44 percent said it won't stimulate the economy. On an individual level, there was less hope for improvement. According to the poll, 67 percent said it would not help them personally.

The poll was conducted Wednesday and Thursday (Feb 18-19, 2009), with 1,046 people questioned by telephone. The survey's sampling error is plus or minus 3 percentage points.

## Estimation

- Point estimates (another way of saying sample statistics)
- Confidence interval, a.k.a. margin of error
  - Indicates that over the long run, 95 percent of the time, the true population value will fall within +/- 3 percentage points of the point estimate
- Point estimates and confidence intervals should be reported together
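The "over the long run, 95 percent of the time" reading can be illustrated by simulating many polls. This is a sketch under stated assumptions: the true population proportion `TRUE_P = 0.55` is invented for the demonstration (the real value is unknown, which is the whole point of estimation), while the sample size and the +/- 3 point margin are taken from the poll write-up.

```python
import random

TRUE_P = 0.55    # assumption: a made-up "true" population proportion
N = 1046         # sample size reported for the poll
MARGIN = 0.03    # reported sampling error, as a proportion
POLLS = 2_000    # number of simulated polls

rng = random.Random(2009)
covered = 0
for _ in range(POLLS):
    # One simulated poll: N independent respondents, each agreeing with
    # probability TRUE_P.
    agree = sum(rng.random() < TRUE_P for _ in range(N))
    p_hat = agree / N  # point estimate from this poll
    # Does the interval p_hat +/- MARGIN contain the true value?
    if p_hat - MARGIN <= TRUE_P <= p_hat + MARGIN:
        covered += 1

coverage = covered / POLLS
print(f"Share of simulated polls whose interval contains the true value: {coverage:.3f}")
```

Under these assumptions the printed share should come out close to 0.95. Any single poll's interval either contains the true value or it does not; the 95 percent describes the procedure across repeated samples.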


## Procedure for Constructing an Interval Estimate

1. Pick a confidence level
   - Confidence level: probability that the unknown population parameter falls within the interval
   - Alpha ($\alpha$): the probability that the parameter is NOT within the interval
     - $\alpha$ is the odds of making an error
   - Confidence level = 1 - $\alpha$
   - Conventionally, confidence level values are almost always 95% or 99%
2. Divide the probability of error equally into the upper and lower tails of the distribution (2.5% error in each tail with a 95% confidence level), then find the corresponding Z score
   - [Figure: normal curve with the middle 0.95 of the area between Z = -1.96 and Z = +1.96, and .025 in each tail]
3. Construct the confidence interval
   - Proportions (like the poll example):
     - Sample point estimate (convert % to a proportion): "Fifty-three percent said the plan will improve economic conditions" becomes 0.53
     - Sample size (N) = 1,046
     - Formula 7.3 in Healey; numerator = (your proportion)(1 - proportion)
     - 95% confidence level (replicating results from the article)
     - 99% confidence level: intervals widen as the level of confidence increases

## Example 1: Estimate for the economic recovery poll

- p = .53 (53% think it will help)
- Z = 1.96 (95% confidence interval)
- N = 1046 (sample size)

What happens when we:

- Recalculate for N = 10,000?
- Set N back to the original and recalculate for p = .90?
- Go back to the original, but change the confidence level to 99%?
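The three steps and the "what happens when" questions can be worked through in a few lines of Python. This is a minimal sketch, not the textbook's own notation: it assumes the interval has the form p +/- Z * sqrt(p(1 - p) / N), which matches the numerator described above, and the function names `z_for_confidence` and `proportion_ci` are hypothetical. The Z score is computed from the standard normal distribution rather than looked up, so it is 1.95996... instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_for_confidence(level):
    """Steps 1-2: split alpha across both tails and find the Z score."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_ci(p, n, level=0.95):
    """Step 3: return (lower limit, upper limit, margin) for a sample proportion.

    Assumes the interval p +/- Z * sqrt(p * (1 - p) / n).
    """
    z = z_for_confidence(level)
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin, margin


# The original poll plus the three recalculations posed above.
scenarios = {
    "original (p=.53, N=1046, 95%)": (0.53, 1046, 0.95),
    "N = 10,000": (0.53, 10_000, 0.95),
    "p = .90": (0.90, 1046, 0.95),
    "99% confidence": (0.53, 1046, 0.99),
}
for label, (p, n, level) in scenarios.items():
    lower, upper, margin = proportion_ci(p, n, level)
    print(f"{label}: {lower:.3f} to {upper:.3f} (+/- {margin:.3f})")
```

With the original values, the margin comes out at about 0.03, in line with the "plus or minus 3 percentage points" the article reports. A larger N or a proportion further from .50 narrows the interval, and moving to 99% confidence widens it.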