Evaluating the Role of Bias

Definition of Bias
Bias is a systematic error that results in an incorrect (invalid) estimate of the measure of association.
A. Can create a spurious association when there really is none (bias away from the null)
B. Can mask an association when there really is one (bias towards the null)
C. Bias is primarily introduced by the investigator or study participants
D. Bias does not mean that the investigator is prejudiced.
E. Bias can arise in all study types: experimental, cohort, case-control
F. Bias occurs in the design and conduct of a study. It can be evaluated, but not fixed, in the analysis phase.
G. The two main types of bias are selection bias and observation bias.

Direction of Bias: Towards the Null
Can you think of an example of when this would happen?

Direction of Bias: Towards the Null
Bias towards the null also occurs for RRs below 1.0.

Direction of Bias: Away from the Null
Can you think of an example of when this would happen?

Selection Bias
A. Results from the procedures used to select subjects into a study, which lead to a result different from what would have been obtained from the entire population targeted for study.
B. Most likely to occur in case-control or retrospective cohort studies, because exposure and outcome have already occurred at the time of study selection.
C. Selection bias can also occur in prospective cohort and experimental studies through differential loss to follow-up, because this affects which subjects are selected for analysis.
Can you think of an example?

Selection Bias in a Case-Control Study
A. Occurs when controls or cases are more (or less) likely to be included in the study if they have been exposed.
B. Result: The relationship between exposure and disease observed among study participants differs from the relationship between exposure and disease among individuals who would have been eligible but were not included. The odds ratio from a study that suffers from selection bias will incorrectly represent the relationship between exposure and disease in the overall study population.

Selection Bias in a Case-Control Study
Question: Do Pap smears prevent cervical cancer?
Cases were diagnosed at a city hospital. Controls were randomly sampled from households in the same city by canvassing the neighborhood on foot. Here is the true relationship:

Cervical Cancer
                          Cases   Controls
Had Pap smear               100        150
Did not have Pap smear      150        100
Total                       250        250

What is the prevalence of cervical cancer in the population? What measure of association do you use here? Calculate it.

Selection Bias in a Case-Control Study
Question: Do Pap smears prevent cervical cancer?
Cases were diagnosed at a city hospital. Controls were randomly sampled from households in the same city by canvassing the neighborhood on foot. Here is the true relationship:

Cervical Cancer
                          Cases   Controls
Had Pap smear               100        150
Did not have Pap smear      150        100
Total                       250        250

OR = (100/150) / (150/100) = (100 x 100) / (150 x 150) = 0.44

There is a 56% lower odds of cervical cancer among women who had Pap smears as compared to women who did not. The odds ratio for cervical cancer comparing women who had a Pap smear to those who did not was 0.44.

Selection Bias in a Case-Control Study
Recall: Cases came from the hospital, and controls came from the neighborhood around the hospital. Now for the bias: only controls who were at home when the researchers came around to recruit for the study were actually included. Women at home were less likely to work and less likely to have regular checkups and Pap smears. Therefore, being included in the study as a control is not independent of the exposure. The resulting data are as follows:

Selection Bias in a Case-Control Study
Cervical Cancer
                          Cases   Controls
Had Pap smear               100        100
Did not have Pap smear      150        150
Total                       250        250

Calculate the odds ratio. Interpret the odds ratio.

Selection Bias in a Case-Control Study
Cervical Cancer
                          Cases   Controls
Had Pap smear               100        100
Did not have Pap smear      150        150
Total                       250        250

OR = (100/150) / (100/150) = (100 x 150) / (150 x 100) = 1.0

There is no association between Pap smears and the risk of cervical cancer: the odds of cervical cancer among women who had Pap smears were the same as among women who did not.

Selection Bias in a Case-Control Study
Ramifications of using women who were at home during the day as controls: these women were not representative of the whole study population that produced the cases. They did not accurately represent the distribution of exposure in the study population that produced the cases, and so they gave a biased estimate of the association.
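As a quick numerical check, the two odds ratios above can be reproduced with a short sketch (illustrative Python; the helper `odds_ratio` is my own name, not part of the lecture):

```python
# Illustrative sketch: odds ratios for the true and the selection-biased
# Pap smear tables above.

def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed cases/controls; c, d = unexposed cases/controls."""
    return (a * d) / (b * c)

# True relationship: Pap smears look protective
true_or = odds_ratio(100, 150, 150, 100)

# Biased study: stay-at-home controls under-represent the exposure,
# so the protective association is masked (bias towards the null)
biased_or = odds_ratio(100, 100, 150, 150)

print(round(true_or, 2), round(biased_or, 2))
```

Note how under-sampling exposed controls inflates the OR from 0.44 toward the null value of 1.0.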

Selection Bias in a Cohort Study
Occurs when selection of exposed and unexposed subjects is not independent of the outcome. This can only occur in a retrospective cohort study. Why? (In a prospective cohort, the outcome has not yet occurred at enrollment, so selection cannot depend on it.)
Example: a retrospective study of an occupational exposure and a disease in a factory setting. Exposed and unexposed groups are enrolled on the basis of prior employment records. The records are old, so some records are missing. If exposed people without the disease are more likely to have their records lost, the association between exposure and disease will be overestimated.

Selection Bias in a Cohort Study
Here is the true relationship if all records were available:

             Diseased   Non-Diseased   Total
Exposed            50            950    1000
Unexposed          50            950    1000

What is the prevalence of disease? What measure of association do you use? Calculate it.

Selection Bias in a Cohort Study
Here is the true relationship if all records were available:

             Diseased   Non-Diseased   Total
Exposed            50            950    1000
Unexposed          50            950    1000

RR = (50/1000) / (50/1000) = 1.00

Selection Bias in a Cohort Study
200 records were lost, all among exposed subjects who did not get the disease:

             Diseased   Non-Diseased   Total
Exposed            50            750     800
Unexposed          50            950    1000

Calculate the risk ratio. Interpret the risk ratio.
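The risk ratios before and after the record loss can be sketched as follows (illustrative Python; `risk_ratio` is a hypothetical helper, not part of the lecture):

```python
# Illustrative sketch: risk ratio before and after differential record loss.

def risk_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """RR = risk in the exposed / risk in the unexposed."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)

# All records available: risks are equal in both groups
true_rr = risk_ratio(50, 1000, 50, 1000)

# 200 exposed, non-diseased records lost: exposed denominator shrinks,
# so the exposed risk is inflated (bias away from the null)
biased_rr = risk_ratio(50, 800, 50, 1000)

print(round(true_rr, 2), round(biased_rr, 2))
```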

Selection Bias in a Cohort Study
200 records were lost, all among exposed subjects who did not get the disease:

             Diseased   Non-Diseased   Total
Exposed            50            750     800
Unexposed          50            950    1000

RR = (50/800) / (50/1000) = 1.25

Exposed individuals had a 25% greater risk of getting the disease than unexposed individuals. If more records were lost in this category (exposed subjects who did not get the disease), the bias would be even greater.

Selection Bias: What are the solutions?
Little or nothing can be done to fix this bias once it has occurred. You need to avoid it when you design and conduct the study, for example by using the same criteria for selecting cases and controls, obtaining all relevant subject records, obtaining high participation rates, and taking into account diagnostic and referral patterns of disease.

Observation Bias (Information Bias)
An error that arises from systematic differences in the way information on exposure or disease is obtained from the study groups. Results in participants who are incorrectly classified as exposed or unexposed, or as diseased or not diseased. Occurs after the subjects have entered the study.

Several types of observation bias: recall bias, interviewer bias, and differential and non-differential misclassification.

Observation Bias: Recall Bias
People with disease remember or report exposures differently (more or less accurately) than those without disease. Can result in an over- or under-estimate of the measure of association (see following slide).
Solutions: use controls who are themselves sick; use standardized questionnaires that obtain complete information; mask subjects to the study hypothesis.

Classical Recall Bias: Cases Underreport Exposure

TRUTH
             Case   Control
Exposed        40        20
Unexposed      60        80
Total         100       100
Odds ratio: 2.7

OBSERVED STUDY DATA
             Case   Control
Exposed        30        20
Unexposed      70        80
Total         100       100
Odds ratio: 1.7

Truth: the odds of disease are 2.7 times higher in the exposed when compared to the unexposed.

Observation Bias: Interviewer Bias
A systematic difference in soliciting, recording, or interpreting information. Can occur whenever exposure information is sought when the outcome is known (as in a case-control study), or when outcome information is sought when the exposure is known (as in a cohort study).
Solutions: mask interviewers to the study hypothesis and to the disease or exposure status of subjects; use standardized questionnaires or standardized methods of outcome (or exposure) ascertainment.

Observation Bias: Misclassification
Subjects' exposure or disease status is erroneously classified. Two types of misclassification: non-differential and differential.

Non-differential Misclassification
Inaccuracies with respect to disease classification are independent of exposure. Can you think of an example?

Or, inaccuracies with respect to exposure are independent of disease. Can you think of an example? Non-differential misclassification will bias towards the null if the exposure has two categories, because it makes the groups more similar.

When interpreting study results, ask yourself these questions:
Given the conditions of the study, could bias have occurred?
Is bias actually present?
Are the consequences of the bias large enough to distort the measure of association in an important way?
Which direction is the distortion: towards the null or away from the null?

From file:///I:/0/BMTRY%20736%202015/Pai_Lecture7_Information%20bias.pdf

Analysis of Case-Control Studies
EXAMPLE: Case-control study of spontaneous abortion and prior induced abortion (OUTCOME = spontaneous abortion; EXPOSURE = prior induced abortion)

Prior Induced Abortion

                        Yes     No
Case (Spon. Ab.)         42    107
Control (Livebirth)     247    825

Analysis of Case-Control Studies
Odds of being a case among the exposed = 42/247 (a/b)
Odds of being a case among the unexposed = 107/825 (c/d)
Odds ratio = (a/b) / (c/d) = (42/247) / (107/825) = 1.3

Women with a history of induced abortion had a 30% higher odds of having a spontaneous abortion compared to women who never had an induced abortion.

Question

Suppose that a case-control study was conducted among men in the US to find out whether a mother's use of hormones during pregnancy influenced her son's risk of developing testicular cancer later in life. Investigators selected 500 cases who were hospitalized for testicular cancer and 1000 controls. The study found that the mothers of 90 cases and 50 controls had used hormones during pregnancy.

Question
Create a two-by-two table. Can you calculate the risk of developing testicular cancer from these data?

Calculate the odds ratio. Interpret the odds ratio. What is the purpose of the control group in a case-control study? Should you choose incident or prevalent cases for a case-control study? Does it matter?

EVALUATING THE ROLE OF RANDOM ERROR

Hypothesis Testing
The assumption made about the result before you start the test is the null hypothesis (H0): RR = 1, OR = 1, RD = 0. You are assuming that H0 is true, NOT some alternative hypothesis (HA).
Definition of P value: given that H0 is true, the P value is the probability of seeing the observed result, or results more extreme, by chance alone.

Hypothesis Testing
The P value ranges from 0 to 1. The particular statistical test that is used depends on the type of study, the type of measurement, etc.

Statistical Conventions
p <= .05 is an arbitrary cutoff for statistical significance.
If p <= .05, we say the results are unlikely to be due to chance, and we reject H0 in favor of HA.
If p > .05, we say that chance is a likely explanation for the finding, and we do not reject H0.
However, you can never exclude chance, no matter how small the P value; nor can you establish chance as the explanation, no matter how large the P value.

More on the P value
P values reflect two things: the magnitude of the association and the sample size (sampling variability).
It is possible to have a huge sample in which even a trivial risk increase or decrease is statistically significant.
It is possible to have a small sample in which even a large risk increase or decrease is not statistically significant.
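The dependence of the P value on sample size can be illustrated with a short sketch (illustrative Python using a normal approximation for log RR; the helper name and the example counts are mine, not from the slides). The same risk ratio of 2.0 is tested at two sample sizes:

```python
import math

# Illustrative sketch: the same RR (2.0) tested in a small and a large sample,
# using a two-sided Wald test on the log risk ratio.

def rr_pvalue(a, n1, c, n0):
    """Return (RR, two-sided P) for exposed risk a/n1 vs. unexposed risk c/n0."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)   # SE of log(RR)
    z = abs(math.log(rr)) / se
    return rr, math.erfc(z / math.sqrt(2))    # normal-tail two-sided P

rr_small, p_small = rr_pvalue(10, 50, 5, 50)      # n = 100:  not significant
rr_large, p_large = rr_pvalue(100, 500, 50, 500)  # n = 1000: highly significant

print(rr_small, p_small, rr_large, p_large)
```

Both analyses estimate RR = 2.0, yet only the larger sample crosses the conventional p <= .05 threshold, which is exactly the point made above.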

Confidence Intervals
Another approach to quantifying sampling variability is the confidence interval. The actual measure of association given by the data is the point estimate. The point estimate has variability that can be expressed mathematically, just as a mean has a variance and standard deviation. Given sampling variability, it is important to indicate the precision of the point estimate, i.e., to give some indication of sampling variability. This is indicated by the confidence interval.

Confidence Intervals
Confidence interval: the range within which the true magnitude of effect lies with a stated probability, or a certain degree of assurance (usually 95%).
Strict statistical definition: if you did the study 100 times and got 100 point estimates and 100 CIs, then in 95 of the 100 results the true measure of association would lie within the given interval; in 5 instances it would not.

Confidence Intervals
The width of the confidence interval indicates the amount of sampling variability in the data.
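A CI for a risk ratio can be sketched as follows (illustrative Python using the usual normal approximation on the log scale; the helper name and counts are mine, not from the slides):

```python
import math

# Illustrative sketch: CI for a risk ratio, computed as exp(log RR +/- z * SE).

def rr_confint(a, n1, c, n0, z=1.96):
    """Return (RR, lower, upper) for exposed risk a/n1 vs. unexposed risk c/n0."""
    rr = (a / n1) / (c / n0)
    se = math.sqrt(1/a - 1/n1 + 1/c - 1/n0)   # SE of log(RR)
    return (rr,
            math.exp(math.log(rr) - z * se),
            math.exp(math.log(rr) + z * se))

rr, lo95, hi95 = rr_confint(100, 500, 50, 500)            # 95% CI (z = 1.96)
_, lo99, hi99 = rr_confint(100, 500, 50, 500, z=2.576)    # 99% CI (z = 2.576)

print(rr, lo95, hi95, lo99, hi99)
```

Raising the "certainty factor" from 95% to 99% replaces z = 1.96 with z = 2.576, so the interval widens; more assurance costs precision.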

The width is determined by the variability in the data and an arbitrary "certainty factor" (usually 95%, but you can choose any percentage you want). If you choose 99%, do the confidence intervals narrow or widen?
The P value tells you the extent to which the null hypothesis is compatible with the data. The CI tells you much more: the range of hypotheses that are compatible with the data.

Practice Exercise
Practice exercise for interpreting P values and confidence intervals. Five studies were conducted on the same exposure-disease relationship.

Assume that there is no bias or confounding in these studies. The following results were seen (on the next slide).

Practice Exercise

Study   Sample Size   Relative Risk   P value   95% CI
A               100             2.0      .10    0.8 - 4.2
B               500             2.0      .06    0.9 - 3.3
C              1000             3.5      .02    2.6 - 4.5
D              2000             3.0      .015   2.2 - 3.5
E              2500             3.2      .001   2.8 - 3.6

Practice Exercise
Interpret each study result. Include interpretations of the relative risk, P value, and confidence interval. What is the relationship between the sample size and the width of the confidence interval? What is the relationship between the sample size and the P value? Which gives more information: the P value or the confidence interval?

Practice Exercise
Is there a relationship between the sample size and the relative risk? Are the five study results consistent on the basis of statistical significance? Are the five study results consistent on the basis of the point estimates and confidence intervals? Which is misleading: the P value or the confidence interval?
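To explore the sample-size question numerically, the CI widths from the practice table can be tabulated (an illustrative Python sketch, not the exercise answers):

```python
# Illustrative sketch: CI width vs. sample size for the five practice studies.

studies = {  # study: (sample size, 95% CI from the table above)
    "A": (100,  (0.8, 4.2)),
    "B": (500,  (0.9, 3.3)),
    "C": (1000, (2.6, 4.5)),
    "D": (2000, (2.2, 3.5)),
    "E": (2500, (2.8, 3.6)),
}

# Width of each CI; widths shrink steadily as the sample size grows.
widths = {s: round(hi - lo, 1) for s, (n, (lo, hi)) in studies.items()}
print(widths)
```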