INTERMEDIATE STATISTICS

Chapter 1   A Review of Basic Statistical Concepts

The record of a month’s roulette playing at Monte Carlo can afford us material for discussing the foundations of knowledge.
- Karl Pearson

I know too well that these arguments from probabilities are imposters, and unless great caution is observed in the use of them, they are apt to be deceptive.
- Plato (in Phaedo)

Introduction

It is hard to find two quotations from famous thinkers that reflect more divergent views of probability and statistics. The eminent statistician Karl Pearson (the guy who invented the correlation coefficient) was so enthralled with probability and statistics that he seems to have believed that understanding probability and statistics is a cornerstone of human understanding. Pearson argued that statistical methods can offer us deep insights into the nature of reality. The famous Greek philosopher Plato also had quite a bit to say about the nature of reality. In contrast to Pearson, though, Plato was skeptical of the “fuzzy logic” of probabilities and central tendencies. From Plato’s viewpoint, we should only trust what we can know with absolute certainty. Plato probably preferred deduction (e.g., If B then C) to induction (In my experience, bees seem to like flowers).

Even Plato seemed to agree, though, that if we observe “great caution,” arguments from probabilities may be pretty useful. In contrast, some modern nonstatisticians might agree with what the first author’s father, Bill Pelham, used to say about statistics and probability theory: “Figures can’t lie, but liars sure can figure.” His hunch, and his fear, was that “you can prove anything with statistics.” To put this a little differently, a surprising number of thoughtful, intelligent students are thumbs-down on statistics. In fact, some students only take statistics because they have to (e.g., to graduate with a major in psychology, to earn a second or third PhD). If you fall into this category, our dream for you is that you enjoy this book so much that you will someday talk about the next time that you get to take (or teach) a statistics class.

One purpose of this first chapter, then, is to convince you that Karl Pearson’s rosy view of statistics is closer to the truth than is Bill Pelham’s jaded view. It is possible, though, that you fully agree with Pearson, but you just don’t like memorizing all those formulas Pearson and company came up with. In that case, the purpose of this chapter is to serve as a quick refresher course that will make the rest of this book more useful. In either event, no part of this book requires you to memorize a lot of complex statistical formulas. Instead, the approach emphasized here is heavily conceptual rather than heavily computational. The approach emphasized here is also hands-on. If you can count on your fingers, you can count your blessings because you are fully capable of doing at least some of the important calculations that lie at the very heart of statistics. The hands-on approach of this book emphasizes logic over rote calculation, capitalizes on your knowledge of everyday events, and attempts to pique your innate curiosity with realistic research problems that can best be solved by understanding statistics.
If you know whether there is any connection between rain and umbrellas, if you love or hate weather forecasters, and if you find games of chance interesting, we hope that you enjoy at least some of the demonstrations and data analysis activities that are contained in this book.

Before we jump into a detailed discussion of statistics, however, we would like to briefly remind you that (a) statistics is a branch of mathematics and (b) statistics is its own very precise language. This is very fitting because we can trace numbers and, ultimately, statistics back to the beginning of human language and thus to the beginning of human written history. To appreciate fully the power and elegance of statistics, we need to go back to the ancient Middle East.

How Numbers and Language Revolutionized Human History

About 5,000 years ago, once human beings had begun to master agriculture, live in large city states, and make deals with one another, an unknown Sumerian trader or traders invented the cuneiform writing system to keep track of economic transactions. Because we live in a world surrounded by numbers and written language, it is difficult for us

to appreciate how ingenious it was for someone to realize that writing things down solves a myriad of social and economic problems. When Basam and Gabor got into their semimonthly fistfight about whether Gabor owed Basam five more or six more geese to pay for a newly weaned goat, our pet theory is that it was an exasperated neighbor who finally got sick of all the fighting and thus proposed the cuneiform writing system. The cuneiform system involved making marks with a stylus in wet clay that was then dried and fired as a permanent record of economic transactions. This system initially focused almost exclusively on who had traded what with whom and, most important, in what quantity. Thus, some Sumerian traders made the impressive leap of impressing important things in clay. This early cuneiform writing system was about as sophisticated as the scribbles of your 4-year-old niece, but it quickly caught on because it was way better than spoken language alone.

For example, it apparently wasn’t too long before the great-great-great-grandchild of that original irate neighbor got a fantastically brilliant idea. Instead of drawing a stylized duck, duck, duck, duck to represent four ducks, this person realized that four-ness itself (like two-ness and thirty-seven-ness) was a concept. He or she thus created abstract characters for numbers that saved ancient Sumerians a lot of clay. We won’t insult you by belaboring how much easier it is to write and verify the cuneiform version of “17 goats” than to write “goat, goat, goat, goat, goat, goat, goat, goat, goat, goat, goat, goat, goat, goat, goat, goat . . .” oh yeah “. . .
goat,” but we can summarize a few thousand years of human technological and scientific development by reminding you that incredibly useful concepts such as zero, fractions, π (pi), and logarithms, which make possible great things such as penicillin, the Sistine Chapel, and iPhones, would have never come about were it not for the development of abstract numbers and language.

It is probably a bit more fascinating to textbook authors than to textbook readers to recount in great detail what happened over the course of the next 5,000 years, but suffice it to say that written language, numbers, and mathematics revolutionized (and sometimes limited) human scientific and technological development. For example, one of the biggest ruts that brilliant human beings ever got stuck in has to do with numbers. If you have ever given much thought to Roman numerals, it may have dawned on you that they are an inefficient pain in the butt. Who thought it was a great idea to represent 1,000 as M while representing 18 as XVIII? And why the big emphasis on five (V, that is) in a base-10 number system? The short answer to these questions is that whoever formalized Roman numbers got a little too obsessed with counting on his or her fingers and never fully got over it. For example, we hope it’s obvious that the Roman numerals I and II are stand-ins for human fingers. It is probably less obvious

that the Roman V (“5”) is a stand-in for the “V” that is made by your thumb and first finger when you hold up a single hand and tilt it outward a bit (sort of the way you would to give someone a “high five”). If you do this with both of your hands and move your thumbs together until they cross in front of you, you’ll see that the X in Roman numerals is, essentially, V V. Once you’re done making shadow puppets, we’d like to tell you that, as it turns out, there are some major drawbacks to Roman numbers because the Roman system does not perfectly preserve place (the way we write numbers in the ones column, the tens column, the hundreds column, etc.).

If you try to do subtraction, long division, or any other procedure that requires “carrying” in Roman numerals, you quickly run into serious problems, problems that, according to at least some scholars, sharply limited the development of mathematics and perhaps technology in ancient Rome. We can certainly say with great confidence that, labels for popes and Super Bowls notwithstanding, there is a good reason why Roman numerals have fallen by the wayside in favor of the nearly universal use of the familiar Arabic base-10 numbers. In our familiar system of representing numbers, a 5-digit number can never be smaller than a 1-digit number because a numeral’s position is even more important than its shape. A bank in New Zealand (NZ) got a painful reminder of this fact in May 2009 when it accidentally deposited 10,000,000.00 (yes, ten million) NZ dollars rather than 10,000.00 (ten thousand) NZ dollars in the account of a couple who had applied for an overdraft.
The couple quickly fled the country with the money (all three extra zeros of it).¹ To everyone but the unscrupulous couple, this mistake may seem tragic, but we can assure you that bank errors of this kind would be more common, rather than less common, if we still had to rely on Roman numerals.

If you are wondering how we got from ancient Sumer to modern New Zealand (or why), the main point of this foray into numbers is that life as we know and love it depends heavily on numbers, mathematics, and even statistics. In fact, we would argue that to an ever increasing degree in the modern world, sophisticated thinking requires us to be able to understand statistics. If you have ever read the influential book Freakonomics, you know that the authors of this book created quite a stir by using statistical analysis (often multiple regression) to make some very interesting points about human behavior. (Do real estate agents work as hard for you as they claim? Do sumo wrestlers always try to win? Does cracking down on crime in conventional ways reduce it? The respective answers appear to be no, no, and no, by the way.) So statistics are important. It is impossible to be a sophisticated, knowledgeable modern person without having at least a passing knowledge of modern statistical methods. Barack Obama appears to have appreciated this fact prior to his election in 2008 when he

assembled a dream team of behavioral economists to help him get elected, and then to tackle the economic meltdown. This dream team relied not on classical economic models of what people ought to do but on empirical studies of what people actually do under different conditions. For example, based heavily on the work of psychologist Robert Cialdini, the team knew that one of the best ways to get people to vote on election day is to remind them that many, many other people plan to vote (Can you say “baaa”?).²

So if you want a cushy job advising some future president, or a more secure retirement, you would be wise to increase your knowledge of statistics. As it turns out, however, there are two distinct branches of statistics, and people usually learn about the first branch before they learn about the second. The first branch is descriptive statistics, and the second branch is inferential statistics.

Descriptive Statistics

Statistics are a set of mathematical procedures for summarizing and interpreting observations. These observations are typically numerical or categorical facts about specific people or things, and they are usually referred to as data. The most fundamental branch of statistics is descriptive statistics, that is, statistics used to summarize or describe a set of observations.

The branch of statistics used to interpret or draw inferences about a set of observations is fittingly referred to as inferential statistics. Inferential statistics are discussed in the second part of this chapter. Another way of distinguishing descriptive and inferential statistics is that descriptive statistics are the easy ones. Almost all the members of modern, industrialized societies are familiar with at least some descriptive statistics. Descriptive statistics include things such as means, medians, modes, and percentages, and they are everywhere.
You can scarcely pick up a newspaper or listen to a newscast without being exposed to heavy doses of descriptive statistics. You might hear that LeBron James made 78% of his free throws in 2008–2009 or that the Atlanta Braves have won 95% of their games this season when they were leading after the eighth inning (and 100% of their games when they outscored their opponents). Alternatively, you might hear the results of a shocking new medical study showing that, as people age, women’s brains shrink 67% less than men’s brains do. You might hear a meteorologist report that the average high temperature for the past 7 days has been over 100 °F. The reason that descriptive statistics are so widely used is that they are so useful. They take what could be an extremely large and cumbersome set of observations and boil them down to one or two highly representative numbers.
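
To make the idea of boiling observations down to a few numbers concrete, here is a minimal sketch in Python using only the standard library. The temperatures and free-throw outcomes are invented purely for illustration (they are not real weather or NBA data), and the helper name `percentage` is our own, not something from this chapter.

```python
from statistics import mean, median, mode

# Hypothetical high temperatures (degrees F) for the past 7 days.
# These values are made up for illustration only.
daily_highs_f = [101, 103, 99, 104, 102, 103, 100]

# Hypothetical free-throw record: 1 = made, 0 = missed (also made up).
free_throws = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]


def percentage(outcomes):
    """Return the share of successes (1s) in a list of 0/1 outcomes, as a percentage."""
    return 100 * sum(outcomes) / len(outcomes)


# Each entry condenses a whole list of observations into one number.
summary = {
    "mean_high": mean(daily_highs_f),      # arithmetic average
    "median_high": median(daily_highs_f),  # middle value after sorting
    "mode_high": mode(daily_highs_f),      # most frequent value
    "free_throw_pct": percentage(free_throws),
}

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Seven temperatures and ten shots are easy enough to read one at a time, but the same four summary numbers would describe seven thousand temperatures or a whole season of shots just as compactly.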

In fact, we’re convinced that if we had to live in a world without descriptive statistics, much of our existence would be reduced to a hellish nightmare. Imagine a sportscaster trying to tell us exactly how well LeBron James has been scoring this season without using any descriptive statistics. Instead of simply telling us that James is averaging nearly 30 points per game, the sportscaster might begin by saying, “Well, he made his first shot of the season but missed his next two. He then made the next shot, the next, and the next, while missing the one after that.” That’s about as efficient as “goat, goat, goat, goat. . . .” By the time the announcer had documented all of the shots James took this season (without even mentioning last season), the game we hoped to watch would be over, and we wouldne