Statistics for Beginners

An informal survey of our members showed that many would like to improve their understanding of statistics. This is the first part of a three-part article.

Evaluating Medical and Scientific Studies/Papers

Understanding scientific or medical studies, research projects, and papers requires knowledge of a number of fundamental statistical concepts. The first concerns study design.

Types of studies

A case control study is a retrospective study of individuals who have already developed a certain condition. These individuals (the cases) are compared to control individuals who have not developed it. As an example, let us imagine that we want to study the relationship between smoking and lung cancer. We would first identify individuals who have developed lung cancer. We would then compare them to control individuals. We could then evaluate whether smoking frequencies differ between the two groups. The major shortcomings of this type of study are that it is retrospective, that recall bias could exist, and that unintentional or unknown bias can be introduced in both the selection of cases and the selection of controls. Ideally the cases and controls are as similar as possible; however, unknown but confounding differences could exist.

In a cohort study, groups of individuals are identified and followed over time. Continuing our example, we could identify individuals who smoke and those who do not, and follow them over time to see whether the incidence of lung cancer differs between the two cohorts. Shortcomings of this type of study are its longitudinal nature (it may take years or decades for an individual to develop a disease or reach an endpoint) and confounding influences.

The randomized clinical trial (controlled clinical trial) is usually considered the gold standard for studies. Randomized clinical trials are prospective studies in which individuals are randomly assigned to one of at least two groups and followed longitudinally over time.
If the group assignment is concealed from both the study participants and the researchers, it is called a double-blinded randomized trial. An example would be a study which randomly compares two drugs for the treatment of salmonellosis. Individuals can be assigned to receive either drug A or drug B (one of the drugs may even be a placebo). For true blinding, the drug preparations should look the same, taste the same, and be administered on the same dosing schedule. There should be no way to distinguish between the two groups of study participants. The major advantage of this type of study is that the random nature of group assignment should eliminate unsuspected (or suspected) confounding influences.

Case control and cohort studies are also referred to as observational studies because no intervention is attempted.

Once you understand study design, the next areas to understand involve study assignment, assessment, analysis, interpretation and extrapolation. We have already seen that bias can be introduced in the assignment of individuals to one group or another (especially in observational studies).

The assessment should involve an appropriate measurement (one that is relevant to the study). It should also be accurate and precise. Recording bias can make a measurement inaccurate. A precise (reproducible) measurement can also be made with an inaccurate test. For instance, if one is evaluating the incidence of gastritis related to new medicines by upper GI series as opposed to endoscopy, measurements may be very precise, but they are probably not very accurate.

Assessments should also be complete. For instance, individuals may drop out of a study. If those individuals are not included in the final analysis (and they all died because of the new drug), the new drug would not be truly evaluated.

Analysis of a study rests to a large extent on testing the statistical significance of differences between groups. This occurs in five steps.

1.
The first step is to state the hypothesis. This should be done before collecting the data. A common pitfall of studies is to first collect the data and then to analyze the data for comparisons that reach statistical significance. Such a fishing expedition may uncover real differences, but it can also uncover differences that arise by chance. For instance, if one were to evaluate a hundred variables that are totally unrelated, one would expect by chance alone that about 5 of those 100, or 1 in 20, would show a statistically significant difference at the 5% level (even though no such difference exists).

2. The second step is to formulate the null hypothesis. In this step, the investigators assume that no true difference exists between the study group and the control group. This is known as the null hypothesis.

3. The third step is to decide the cut-off value for statistical significance. Usually, this is a cut-off value of 5%. This means that a difference between the two groups is called statistically significant only if a difference at least that large would arise less than 5% of the time by chance alone when no true difference exists.

4. It is only in the fourth step that the data are actually collected.

5. The fifth step applies statistical significance tests. In this step, the investigators determine the probability that a difference at least as large as the one observed between the study and control groups would occur if no true difference existed in the larger population from which both groups were selected. This probability is known as the P value. In other words, one is calculating the probability that data like those observed would occur if the null hypothesis of no difference were true.

There are a number of statistical tests which can be chosen to analyze the data. Assuming that the correct statistical tests are chosen, one then attempts to reject the null hypothesis. If a difference (usually a P value equal to or less than 0.05, i.e.
a 5% or smaller chance that such a result would arise by chance alone if the null hypothesis were true) is detected, then one can reject the null hypothesis. The smaller the P value, the stronger the evidence against the null hypothesis. For instance, a P value of <0.001 means that there is a <1:1,000 chance that a result at least this extreme would arise by chance alone if no true difference existed.

Differences are usually expressed by comparing means or medians. Both are measures of the center of a distribution. For a symmetrical distribution, the mean is a reliable measure of the center or average. The mean can be reported as the arithmetic mean. For instance, if we measure the antibody levels in five patients and obtain values of 2, 3, 4, 5, and 20, the arithmetic mean is the sum divided by the number of values: (2 + 3 + 4 + 5 + 20) / 5 = 34 / 5 = 6.8.

If the data are not symmetrically distributed, they can be reported as a geometric mean. This lessens the likelihood that an "outlier" will skew data analysis. For instance, in the above example, the individual with the value of 20 pulls the arithmetic mean upward because that value is weighted equally with the others. The geometric mean is calculated by multiplying the n values together and taking the nth root of the product: the fifth root of 2 x 3 x 4 x 5 x 20 = 2,400 is approximately 4.7.

Standard deviation expresses the spread of individual observations around the mean. The standard deviation is the square root of the variance. Variance is a measure of the spread or variability of quantitative measurements. The standard error of the mean indicates the degree of uncertainty in estimating the mean from a sample. It is calculated by dividing the standard deviation by the square root of n (with n representing the number of values measured).
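The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch using the five antibody values from the example; the variable names are illustrative, and it assumes the sample form of the variance (dividing by n - 1), since the text does not say which form is intended.

```python
import math
import statistics

# Antibody levels from the worked example in the text.
values = [2, 3, 4, 5, 20]
n = len(values)

# Arithmetic mean: sum of the values divided by how many there are.
arithmetic_mean = sum(values) / n

# Geometric mean: nth root of the product of the values.
geometric_mean = math.prod(values) ** (1 / n)

# Assumption: sample variance (divides by n - 1), not population variance.
sample_variance = statistics.variance(values)

# Standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

# Standard error of the mean: standard deviation divided by sqrt(n).
standard_error = sample_sd / math.sqrt(n)

print(f"arithmetic mean: {arithmetic_mean:.1f}")
print(f"geometric mean:  {geometric_mean:.1f}")
print(f"variance:        {sample_variance:.1f}")
print(f"std deviation:   {sample_sd:.2f}")
print(f"standard error:  {standard_error:.2f}")
```

Note how the single value of 20 moves the arithmetic mean much more than the geometric mean, and how it inflates the standard deviation.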
The median is the value which divides the data in half; 50% of the observations have values lower than the median, and 50% have values greater than the median. The median is also referred to as the 50th percentile. For symmetrically distributed data, the mean +/- one standard deviation is usually reported; for non-symmetrical data (non-parametric data, see below), the median and the 25th and 75th percentiles are usually reported. Range refers to the interval from the minimum to the maximum value in a set of quantitative measurements. For instance, in our example the arithmetic mean would be 6.8, the geometric mean would be 4.7, the median would be 4, and the range would be 2 through 20.

Ed is Director of Tropical and Geographic Medicine, and Director of the Traveler's Advice and Immunization Center, Division of Infectious Diseases, Massachusetts General Hospital, in Boston. This article is based on a course in statistics that he gave at the Intensive Review Course in Clinical Tropical Medicine and Travelers' Health in Toronto. The course was sponsored by the American Society of Travel Medicine and Hygiene in cooperation with the American Committee on Clinical Tropical Health and Travelers' Health.