Statistics for Beginners (part II)

By Ed Ryan MD DTM&H

(Part I appeared in the September/October issue of NewsShare)

This article will not review in detail which statistical test should be chosen for data analysis; however, it will make some broad statements about data types and how to approach data analysis. Data can be either categorical or continuous. An example of continuous data is blood pressure, where a range of values can be observed. Categorical data are data that are not continuous in nature, and they can be further subdivided into nominal data and ordinal data. Nominal data are data for which there is a name but which cannot be ordered from high to low or large to small. Eye color or race would be an example of nominal data. Ordinal data are data that can be placed in an order but that are not continuous in nature. Level of dehydration (none, mild, moderate, severe) would be an example of such data. At analysis, continuous data can be broken down into categories (but not vice versa). For instance, if one knows the serum bicarbonate level for all patients in a study, during analysis the values can be compared continuously (serum bicarbonate in group A: 13.2 mmol/L and in group B: 22.0 mmol/L), or one can subdivide the continuous data into categorical values (for instance, the proportion of individuals with a bicarbonate value less than 15 mmol/L).
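As a minimal sketch of turning continuous data into a categorical summary, the snippet below computes the proportion of patients with a bicarbonate value under 15 mmol/L. The list of values and the function name `proportion_below` are made up for illustration; they do not come from any study.

```python
# Hypothetical serum bicarbonate values in mmol/L (illustrative only).
bicarbonate = [13.2, 22.0, 14.1, 18.5, 12.9, 24.3, 15.0, 16.7]


def proportion_below(values, cutoff):
    """Return the fraction of values strictly below the cutoff."""
    return sum(1 for v in values if v < cutoff) / len(values)


low_fraction = proportion_below(bicarbonate, 15.0)
print(f"{low_fraction:.1%} of patients have bicarbonate < 15 mmol/L")
```

Note that the step cannot be reversed: once only the proportion is recorded, the individual values are lost.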

In choosing a statistical test, one should understand whether the data to be analyzed are parametric (symmetric) or non-parametric in distribution. Parametric data are also referred to as normally distributed data. Normally distributed data follow a bell-shaped curve; a classic example is I.Q. measured across a population. One can also imagine data that are not bell shaped in distribution. A rough rule of thumb for deciding whether data should be treated as parametric or non-parametric is to compare the standard deviation of the data to the mean of the data. If the standard deviation is greater than 50% of the mean, one should analyze the data as non-parametric data. Categorical data can be analyzed by the chi-square test (using m x 2 tables if more than two groups are being analyzed). Continuous data that are normally distributed can be analyzed by parametric tests, including the t test for comparing means and Pearson's correlation. Non-parametric methods include the Mann-Whitney U test, the Kruskal-Wallis test and the Wilcoxon matched-pairs signed-rank test, among others. The ANOVA test (analysis of variance) can be used to compare more than two groups that are normally distributed. The Mann-Whitney U test is used for comparing two groups that are not normally distributed, and the Kruskal-Wallis test can be used for comparing more than two groups that are not normally distributed.
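The rule of thumb and the choice of test described above can be sketched in a few lines of Python. This is only an illustration of the heuristic as stated in the text, not a formal test of normality; the function names and the two small data sets are hypothetical.

```python
from statistics import mean, stdev


def looks_non_parametric(values):
    """Rule of thumb from the text: SD greater than 50% of the mean."""
    return stdev(values) > 0.5 * abs(mean(values))


def suggest_test(data_type, non_parametric=False, n_groups=2):
    """Map data type, distribution and number of groups to a test name."""
    if data_type == "categorical":
        return "chi-square test"
    if non_parametric:
        return "Mann-Whitney U test" if n_groups == 2 else "Kruskal-Wallis test"
    return "t test" if n_groups == 2 else "ANOVA"


# Hypothetical data: blood pressures cluster tightly, lengths of stay are skewed.
systolic_bp = [118, 120, 122, 125, 130]
hospital_days = [1, 1, 2, 2, 3, 20]

for name, data in [("systolic_bp", systolic_bp), ("hospital_days", hospital_days)]:
    skewed = looks_non_parametric(data)
    print(name, "->", suggest_test("continuous", non_parametric=skewed))
```

The lookup only covers comparisons between independent groups; paired data and correlation would need additional branches.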

The three most common errors of statistical significance testing are:

  1. A failure to state the hypothesis before conducting the study.
  2. A failure to interpret the results of statistical significance correctly by not considering type I error.
  3. A failure to interpret the results of statistical significance correctly by not considering type II error.

Type I error (the probability of which is known as the alpha level) is the rejection of the null hypothesis when it is in fact true. By convention, studies have usually accepted a 5% chance of incorrectly rejecting the null hypothesis as the threshold for statistical significance; however, a significant result is a statistical statement that findings this extreme would arise by chance less than 5% of the time if the null hypothesis were true; it is not 100% proof that the null hypothesis can be rejected. Type II error refers to the fact that a failure to reject the null hypothesis does not necessarily mean that no true difference exists between the compared groups in the larger population. It may be that the study was of insufficient size to detect a difference, or that individuals were followed for an insufficient amount of time for true differences between the groups to become apparent. Most well designed studies aim for a type II error rate between 10 and 20%. The statistical power of a study is one minus the type II error rate. Therefore, most studies aim for 80-90% power (an 80-90% chance of rejecting the null hypothesis if a true difference of the expected size exists).
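To illustrate how study size affects power, here is a rough sketch that approximates the power of a two-sided comparison of two proportions using the normal approximation. The formula is a common textbook approximation and is an assumption here, not something given in this article; the function name and the example proportions (3% versus 0.3%) are chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power to detect p1 vs p2 with equal group sizes.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return NormalDist().cdf(abs(p1 - p2) / se - z_crit)


for n in (100, 1000):
    power = approximate_power(0.03, 0.003, n)
    print(f"n per group = {n}: power ~ {power:.2f}, type II error ~ {1 - power:.2f}")
```

With these assumed proportions the smaller study falls well short of the usual 80% target while the larger one exceeds it, which is the sense in which a study can be "of insufficient size".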

Once an association is recognized in a study, the next question is "how strong is that association?" An association can be referred to as a risk factor (however, one must remember that an association does not necessarily mean a cause and effect relationship).

The relative risk is the probability of an outcome if a risk factor/association is present divided by the probability of the outcome if the risk factor/association is absent. For instance, let's imagine a study in which we have followed a thousand individuals who smoke and a thousand individuals who don't smoke and we have measured the incidence of lung cancer in both groups. If 30 individuals who smoke develop lung cancer in our study time period, and three individuals who don't smoke develop lung cancer in the study time period, the probability of developing lung cancer if the risk factor is present is 30 divided by 1,000 (or 0.03). The probability of developing lung cancer if one does not smoke is 3 divided by 1,000 (or 0.003). The relative risk is, therefore, 0.03 divided by 0.003 or 10. A relative risk of 10 implies that individuals who smoke are ten times more likely to develop lung cancer than individuals who don't smoke. A relative risk can be calculated for cohort studies.
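The smoking example can be reproduced with a short calculation. The function and parameter names below are illustrative only.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# 30 of 1,000 smokers and 3 of 1,000 non-smokers develop lung cancer.
rr = relative_risk(30, 1000, 3, 1000)
print(f"relative risk = {rr:.1f}")  # prints "relative risk = 10.0"
```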

An approximation of the relative risk for case control studies is an odds ratio. In case control studies, the number of individuals who have and do not have a disease does not necessarily reflect the natural frequency of that disease in the general population. In a case control study, it is the researcher who determines how many study patients are being evaluated and how many control patients are being evaluated, and so a true disease frequency in the population as a whole cannot be established. To understand the difference between a risk and an odds ratio (which is an approximation of the risk), one should think of the probability (or risk) of drawing an ace from a deck of cards (4 from 52, or 1 in 13). The odds of drawing an ace, on the other hand, will be the number of times an ace will be drawn divided by the number of times it will not be drawn: 4 to 48, or 1 to 12.

An odds ratio is, therefore, the odds of developing an outcome if an association is present, divided by the odds of an outcome if the association is absent.
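A brief sketch of the difference between probability and odds, and of the odds ratio, follows. The helper names are hypothetical. The last lines apply the odds ratio to the smoking cohort figures used earlier in this article, purely to show how close it comes to the relative risk when the outcome is uncommon.

```python
def probability(events, non_events):
    """Events divided by all trials."""
    return events / (events + non_events)


def odds(events, non_events):
    """Events divided by non-events."""
    return events / non_events


def odds_ratio(exposed_cases, exposed_non_cases, unexposed_cases, unexposed_non_cases):
    """Odds of the outcome with the association present divided by odds with it absent."""
    return odds(exposed_cases, exposed_non_cases) / odds(unexposed_cases, unexposed_non_cases)


# Drawing an ace: 4 aces and 48 other cards.
print(f"probability = {probability(4, 48):.4f} (1 in 13)")
print(f"odds        = {odds(4, 48):.4f} (1 to 12)")

# Smoking example: 30 cases / 970 non-cases versus 3 cases / 997 non-cases.
print(f"odds ratio  = {odds_ratio(30, 970, 3, 997):.2f}")
```

The odds ratio here comes out slightly above the relative risk of 10; the two move further apart as the outcome becomes more common.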

When one quantifies an association between two groups as a relative risk [RR] or odds ratio [OR], the resulting value is termed a point estimate of the strength of the association. Confidence intervals are a way of combining information about the strength of an association with information about the effects of chance in obtaining the observed results. A 95% confidence interval (CI) is most commonly used. So an association will be reported as an OR or RR with a 95% CI.
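As an illustration of reporting a point estimate with its interval, the sketch below computes a 95% CI for a relative risk using a widely used large-sample approximation on the log scale. That method is an assumption made for this example; the article does not specify how the interval is calculated, and statistical packages may use other methods. The function name is hypothetical, and the counts are the smoking figures used earlier.

```python
from math import exp, log, sqrt


def relative_risk_with_ci(exposed_cases, exposed_total,
                          unexposed_cases, unexposed_total, z=1.96):
    """Return (RR, lower, upper) using a log-scale normal approximation.

    z = 1.96 corresponds to a 95% confidence interval.
    """
    rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
    se_log_rr = sqrt(1 / exposed_cases - 1 / exposed_total
                     + 1 / unexposed_cases - 1 / unexposed_total)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = relative_risk_with_ci(30, 1000, 3, 1000)
print(f"RR = {rr:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

An interval that excludes 1 suggests the association is unlikely to be due to chance alone; a wide interval, as here with only three cases in the unexposed group, signals an imprecise estimate.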

The next step in evaluating studies or research papers is interpreting the data. One must first decide whether statistically significant results are clinically important. If one, for instance, finds in a very large study that a mean PSA level of 10.5 is associated with high grade prostatic carcinoma, while a mean PSA value of 10.4 is associated with low grade prostatic carcinoma, that difference may be statistically significant; however, it is not clinically useful to a clinician holding a specific PSA value in his or her hand. One must also evaluate the strength of an association as judged by the size of the relative risk. One would also evaluate the consistency of that association. Ideally one would also evaluate the biological plausibility of a given finding. One would also hope that, if applicable, there would be a dose response relationship. This would allow an observation of whether various levels of exposure to the risk factor were associated with a change in frequency of disease in a consistent fashion.

The final stage of analyzing a study is extrapolation. One can extrapolate to an individual or to a group. For instance, based on a study with a relative risk or odds ratio of 10 for tobacco use and lung cancer, if an individual smokes, he/she is ten times more likely to develop lung cancer than if he/she did not smoke. One can also speak of an attributable risk percentage. The advantage of this concept is that it allows one to think of the percentage of the risk of developing disease that may be eliminated among those whose risk factor is removed. Attributable risk percentage can be thought of as:

[(relative risk - 1) / relative risk] x 100%

An example: let's imagine that smoking is associated with a relative risk of 1.5 of developing lung cancer. This may not seem like a large risk of developing lung cancer; however, (1.5 - 1) / 1.5 x 100% equals 33%. This means that the attributable risk percentage for smoking is 33% (which translates into the fact that, by not smoking, an individual may decrease his or her risk of developing lung cancer by 33%).
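The attributable risk percentage, (RR - 1) / RR x 100%, is simple to compute. The function name below is illustrative only.

```python
def attributable_risk_percent(relative_risk):
    """Percentage of risk among the exposed that is attributable to the exposure."""
    return (relative_risk - 1) / relative_risk * 100


for rr in (1.5, 10):
    print(f"RR = {rr}: attributable risk = {attributable_risk_percent(rr):.0f}%")
```

A relative risk of 1.5 gives about 33%, while the relative risk of 10 from the earlier smoking example gives 90%.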

Before we turn away from study analysis, there are two additional types of studies that we should mention: database research studies (also called non-concurrent cohort studies, also called outcomes research) and meta-analyses. The former are an extension of chart reviews, and have grown out of the ability to access and analyze large database files with computer programs. One could use a computer based filing system, for instance, to identify (in 1999) individuals who in 1990 had a given procedure. One could then identify appropriate control subjects and follow those two groups for outcome longitudinally through the computer.

A meta-analysis is a way of analyzing information from many single investigations for the purposes of reaching conclusions or addressing questions that could not be addressed due to the size of the single investigations. The most useful types of meta-analyses are hypothesis driven. Relevant studies must be carefully identified and included. Drawbacks include the likelihood that not all relevant studies have been identified, as well as confounding influences relating to differences in the quality of the available data that are included. In addition, meta-analyses often do not include unpublished studies that failed to document an association, so bias can exist.


Content © ISTM.