# Categorical Data Analysis

Categorical data arise whenever counts (as opposed to measurements) are made. Subjects (sample items) are classified as belonging to one of a set of categories, and the numbers in each category (the frequencies) are recorded.

Example (Eye colours): eye colours of males visiting an optician, in four categories.

| Colour | Frequency observed |
|---|---|
| A | 89 |
| B | 66 |
| C | 60 |
| D | 85 |

Example (Tonsils): relationship between nasal carrier status for Streptococcus pyogenes and size of tonsils among 1398 children aged 0-15 years.

| | Normal | Enlarged | Much enlarged | Total |
|---|---|---|---|---|
| Carriers | 19 | 29 | 24 | 72 |
| Noncarriers | 497 | 560 | 269 | 1326 |
| Total | 516 | 589 | 293 | 1398 |

Example (Prussian cavalry deaths): numbers of cavalry soldiers killed by horse kicks in each of 14 units of the Prussian army over a 20-year period (1875-1894).

| Number killed | 0 | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|---|
| Frequency observed | 144 | 91 | 32 | 11 | 2 | 0 | 280 |
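As a rough sketch of how a two-way table such as the tonsils example can be handled in R, the counts can be entered as a matrix and passed to `chisq.test`, which tests for association between the row and column classifications. The object names (`tonsils`, `res`, `expected`, `manual`) are illustrative choices, not taken from the original slides. The manual calculation sums (Observed - Expected)^2 / Expected over the six cells and should agree with the statistic that R reports.

```r
# Tonsils example: carrier status (rows) by tonsil size (columns).
tonsils <- matrix(
  c(19, 29, 24,
    497, 560, 269),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Status  = c("Carriers", "Noncarriers"),
    Tonsils = c("Normal", "Enlarged", "Much enlarged")
  )
)

# Built-in chi-squared test of association for a contingency table.
res <- chisq.test(tonsils)
print(res)

# The same statistic by hand: expected counts under independence are
# (row total * column total) / grand total.
expected <- outer(rowSums(tonsils), colSums(tonsils)) / sum(tonsils)
manual <- sum((tonsils - expected)^2 / expected)
print(manual)
```

The degrees of freedom for a 2 x 3 table are (2 - 1) x (3 - 1) = 2.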

Often we wish to decide whether the categorical variables follow some well-known distribution. A chi-squared test provides a method of testing the hypothesis that a data set follows a particular distribution. It works by summing, over all categories, the quantity

$$\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

The chi-squared test in the R program is fairly limited: it copes well with testing whether there is a significant relationship between nasal carrier status for Streptococcus pyogenes and size of tonsils among 1398 children aged 0-15 years (as in the second example), but gives us a problem with the other two.

Consider now data from the Standard & Poor's 500, an index of 500 of the largest, most actively traded stocks on the New York Stock Exchange. These data are available in R as sp500.R from the module website.

Technique: to look at any one of the variables in a data frame such as `sp500`, the \$ sign is helpful.
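A minimal sketch of the `$` technique follows. The real sp500 data are not reproduced here, so the data frame below is a hypothetical stand-in with invented values; only the column name `adjclose` comes from the text, and the `date` column is an assumption for illustration.

```r
# Hypothetical stand-in for the sp500 data frame (values are invented).
sp500 <- data.frame(
  date     = as.Date("2010-01-04") + 0:4,
  adjclose = c(100, 101.5, 100.8, 102.2, 103)
)

# List the variables held in the data frame.
names(sp500)

# Extract a single variable as a vector with the $ sign.
prices <- sp500$adjclose
print(prices)

# Double-bracket indexing by name returns the same vector.
identical(prices, sp500[["adjclose"]])
```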

We are interested in the change in returns from day to day. We suspect that the day-to-day changes in the log of the adjusted closing price may follow a normal distribution. These are placed in an R vector by using the command `d = diff(log(sp500$adjclose))`.

The `chisq.test` function that is pre-defined in R is not powerful enough to test the values of d to see if they conform to a normal distribution, so a program is written instead. We wish to test whether a normal distribution with the same mean and standard deviation as d looks similar to the histogram of d. Calculate, for example, the approximate expected number of values between -0.04 and -0.02 from this normal distribution. This can be repeated, and made more sophisticated with more than 4 comparisons, by writing a program. The one considered has 100 comparisons.
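The program itself is not reproduced in the text, so the following is only a sketch of one way such a calculation might look, not the author's code. It assumes that `d` holds the changes in log price; here a simulated vector stands in for the real data. The function name `normal_gof`, the equal-width cells, the open-ended outer cells and the degrees-of-freedom rule (cells minus one, minus two estimated parameters) are all assumptions made for illustration.

```r
# Simulated stand-in for d = diff(log(sp500$adjclose)); not the real data.
set.seed(1)
d <- rnorm(2000, mean = 0.0003, sd = 0.012)

# Approximate expected number of values between -0.04 and -0.02 under a
# normal distribution with the same mean and standard deviation as d.
expected_one <- length(d) *
  (pnorm(-0.02, mean(d), sd(d)) - pnorm(-0.04, mean(d), sd(d)))
print(expected_one)

# Repeat the comparison over nbins cells and sum (O - E)^2 / E.
normal_gof <- function(x, nbins = 100) {
  m <- mean(x)
  s <- sd(x)
  n <- length(x)
  # Equal-width cut points across the data; the outer cells are open-ended
  # so that the expected counts add up to n.
  inner <- seq(min(x), max(x), length.out = nbins + 1)
  breaks <- c(-Inf, inner[-c(1, nbins + 1)], Inf)
  observed <- as.vector(table(cut(x, breaks)))
  expected <- n * diff(pnorm(breaks, m, s))
  statistic <- sum((observed - expected)^2 / expected)
  # Assumed rule: cells - 1 - number of estimated parameters (mean and sd).
  df <- nbins - 1 - 2
  list(statistic = statistic, df = df,
       p.value = pchisq(statistic, df, lower.tail = FALSE),
       observed = observed, expected = expected)
}

fit <- normal_gof(d, nbins = 100)
print(fit[c("statistic", "df", "p.value")])
```

With as many as 100 cells, some expected counts in the tails are likely to be small, which weakens the chi-squared approximation; merging sparse cells is a common remedy.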
