Volume 6, July 1, 2006
of Hypotheses in the Scientific Literature
Norman A. Desbiens, M.D.*
* Department of Medicine, University
of Tennessee, College
of Medicine – Chattanooga Unit
The statement of a hypothesis is considered a crucial component of properly
conducted research. (U.S. EPA, 2005; A.A.A.S., 1990) Once a hypothesis
has been conceived, a researcher can then design a study that attempts to
support or refute it and determine the level of precision needed to do so,
after which the number of study units required can be determined. (Moher 1994,
122) Without a clear statement of the hypothesis, there is a risk that
investigators will search for associations first and then discuss their “statistically
significant” findings later—a process that has a high probability
of inflating associations and leading to irreproducible results. (Cui 2002,
347) These are major issues, since validity and reproducibility of findings are
undeniable hallmarks of the scientific method.
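How the required precision fixes the number of study units can be illustrated for the simplest case, estimating a single proportion. The sketch below assumes the normal-approximation formula n = z²p(1 − p)/E², where p is the anticipated proportion and E is the desired half-width of the confidence interval. The function name and the example values (p = 0.20, E = 0.05) are illustrative assumptions, not the calculation used in any cited study.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_expected: float, margin: float, confidence: float = 0.95) -> int:
    """Units needed so that a normal-approximation confidence interval for a
    proportion has a half-width no larger than `margin` (hypothetical helper)."""
    if not 0 < p_expected < 1 or margin <= 0:
        raise ValueError("p_expected must lie in (0, 1) and margin must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = z ** 2 * p_expected * (1 - p_expected) / margin ** 2
    return math.ceil(n)  # round up: a fractional unit cannot be sampled


# Illustrative values only: anticipated proportion 0.20, desired precision +/- 0.05.
print(sample_size_for_proportion(0.20, 0.05))  # 246
```

Because E enters the denominator squared, halving the margin quadruples the required sample (before rounding up), which is why the precision has to be settled before the sample size can be.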
It has
recently been demonstrated that the explicit statement of hypotheses is uncommon in
the scientific medical literature. (Desbiens 2004, 319) An exception is in the
reporting of randomized controlled trials, where hypotheses are prevalent because
prominent medical journals require sample size calculations, as recommended
by several organizations including the Consolidated Standards of Reporting
Trials (CONSORT) group. (Altman et al. 2001, 663)
Based on a
previous study, experience with the literature, and the fact that guidelines
for the reporting of the general scientific literature are less prevalent than
for the scientific medical literature, we hypothesized that the explicit
statement of hypotheses in the general scientific literature would occur in
fewer than one fifth of articles. We did not expect journal, length of
report or field of science to affect our findings.
We reviewed all articles published in Science and
Nature in 2004. These general
scientific journals are among the most cited and have among the highest impact
factors (in 2001, Science 23.33, Nature 27.96) in the scientific
literature. (Garfield 1994) We concentrated on the Research Articles, Brevia,
and Reports sections of Science and
the Brief Communications, Articles and Letters sections of Nature. For purposes of analysis, we considered articles in the
Articles and Research Articles sections to be full reports and those in other
sections brief reports. Since each journal uses a different and fine-grained
subject classification system (for example, Science
uses over 30 categories), each article was assigned to one of five broad
subject headings: physics or material science, biology, astronomy or
geosciences, human biology and chemistry.
An assistant searched the electronic text of each article for the root
“hypoth” and its inflections. Each sentence containing this stem was
read to determine its meaning. If it referred to the study hypothesis,
it was considered to be an explicit statement. A 5 percent random sample of
each abstracted field that required interpretation was independently reviewed
by the author, and percent agreement on the abstracted data was calculated.
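The electronic search for the stem can be sketched as follows. This is a minimal illustration, assuming plain-text article bodies and a naive sentence splitter; the function names are hypothetical, and the code only flags candidate sentences. Deciding whether a flagged sentence refers to the study hypothesis still requires a human reader, as described above. A second helper draws the 5 percent simple random sample for independent re-review.

```python
import random
import re

# Any word beginning with the stem "hypoth" (hypothesis, hypotheses,
# hypothesized, hypothetical, ...), matched case-insensitively.
STEM_PATTERN = re.compile(r"\bhypoth\w*", re.IGNORECASE)

# Naive sentence splitter (an assumption): split after ., ! or ? plus whitespace.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def sentences_with_stem(text: str) -> list[str]:
    """Return every sentence that contains a word starting with 'hypoth'."""
    sentences = SENTENCE_SPLIT.split(text.strip())
    return [s for s in sentences if STEM_PATTERN.search(s)]


def reliability_sample(items: list, fraction: float = 0.05, seed: int = 0) -> list:
    """Simple random sample without replacement for independent re-abstraction."""
    k = max(1, round(len(items) * fraction))
    return random.Random(seed).sample(items, k)


text = ("We tested the hypothesis that X causes Y. Samples were collected. "
        "These results are consistent with earlier hypotheses.")
for sentence in sentences_with_stem(text):
    print(sentence)
```

Both the first and the third sentence are flagged; a reader would still exclude the third if it appeared only after the results were reported.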
We hypothesized that fewer than 15 percent of articles would use
the root “hypoth.” This number would allow us to test reproducibly
the association between the presence or absence of an explicit hypothesis, as the
dependent variable in a logistic regression, and journal, scientific subject
and type of article (brief vs. full report) as independent variables (6 degrees of
freedom). (Harrell 2001, 60-61) The c-index was used to indicate the predictive
ability of the model (0.5 = random, 1 = perfect prediction). Binomial
confidence intervals were calculated for percentages. Variables with p-values
of less than .05 were considered statistically significant.
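Two quantities in this paragraph can be made concrete. First, a commonly cited rule of thumb for logistic regression asks for roughly 10 to 20 outcome events, counting the rarer outcome, per model degree of freedom; using the counts reported in the Results (174 explicit hypotheses among 1799 articles) and the 6 degrees of freedom above gives 29. Second, for a binary outcome the c-index equals the proportion of all event/non-event pairs in which the event has the higher predicted probability, with ties counted as one half. The sketch below is a brute-force illustration with hypothetical function names and toy scores; it is not the software used in the study.

```python
from itertools import product


def events_per_df(events: int, non_events: int, model_df: int) -> float:
    """Limiting sample size (the rarer outcome) per model degree of freedom."""
    return min(events, non_events) / model_df


def c_index(y_true: list[int], y_score: list[float]) -> float:
    """Concordance index for a binary outcome.

    Proportion of all (event, non-event) pairs in which the event has the
    higher predicted score; tied scores count one half.
    0.5 = random ordering, 1.0 = perfect discrimination.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e, n in product(events, non_events):  # brute force over all pairs
        if e > n:
            score += 1.0
        elif e == n:
            score += 0.5
    return score / (len(events) * len(non_events))


# Counts reported in the Results: 174 of 1799 articles had an explicit hypothesis.
print(events_per_df(174, 1799 - 174, 6))  # 29.0
# Toy predicted probabilities (hypothetical), not study data.
print(c_index([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.4]))  # 0.875
```

The pairwise loop is quadratic, which is adequate for a sample of this size (about 280,000 pairs); rank-based formulas give the same value faster.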
For each article we determined whether there was an explicit
statement of a hypothesis. In so doing, we excluded hypotheses that were
mentioned after results were reported (such as “…these findings are
consistent with the hypothesis that…”). We next estimated the frequency of
implicit hypotheses. Hypotheses can be implicit if they are stated
using a synonym for hypothesis or if a study is done to substantiate or
corroborate a hypothesis that has been stated in a previous article. To
identify implicit statements, we looked up the dictionary and thesaurus
entries for the word “hypothesis” in the Merriam-Webster
Dictionary and repeated this process for each possible synonym until we had gone
through 29 cycles and no new synonyms were identified. Candidate words or stems
indicating possible implicit hypotheses are listed in the Appendix. A 5 percent
random sample of the articles was then scanned for each synonym. If a synonym
was found, the sentence containing the candidate synonym was read in order to
determine its meaning. If it referred to the study hypothesis, it was
considered to be an implicit statement.
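The iterative synonym search amounts to computing the closure of a word under repeated thesaurus lookups: every newly found word is looked up in turn until no lookup produces anything new. The sketch below uses a breadth-first queue over a small, made-up thesaurus. The dictionary contents are hypothetical (they are not the Merriam-Webster entries or the Appendix list), and counting one lookup as one “cycle” is an assumption about how the 29 cycles were counted.

```python
from collections import deque

# Hypothetical toy thesaurus: word -> listed synonyms (NOT real dictionary data).
TOY_THESAURUS: dict[str, list[str]] = {
    "hypothesis": ["theory", "premise", "supposition"],
    "theory": ["hypothesis", "conjecture"],
    "premise": ["assumption"],
    "supposition": ["assumption", "conjecture"],
    "conjecture": ["guess"],
}


def synonym_closure(seed: str, thesaurus: dict[str, list[str]]) -> tuple[set[str], int]:
    """Look up synonyms of each newly found word until nothing new appears.

    Returns the set of words reached (including the seed) and the number of
    lookups performed, each lookup counted as one cycle (an assumption).
    """
    found = {seed}
    to_look_up = deque([seed])
    cycles = 0
    while to_look_up:
        word = to_look_up.popleft()
        cycles += 1
        for synonym in thesaurus.get(word, []):
            if synonym not in found:  # only genuinely new words are queued
                found.add(synonym)
                to_look_up.append(synonym)
    return found, cycles


words, cycles = synonym_closure("hypothesis", TOY_THESAURUS)
print(sorted(words), cycles)
```

The search terminates because each word is queued at most once, so the number of lookups can never exceed the number of distinct words reachable from the seed.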
Seventeen hundred ninety-nine articles
were found. Of these, 963 (54%) were in Nature
and 836 (46%) were in Science. The
distribution of articles by field of study was as follows: biology 52%,
astronomy or geosciences 17%, physics or material science 14%, human studies
13% and chemistry 5%; 11% were full reports.
In the random sample re-abstracted by an
independent abstractor, agreement with the primary abstractor was 100%
(95% confidence interval: 96%, 100%) on length of report, 97% (91%, 99%) on
the presence of an explicit hypothesis, 91% (83%, 95%) on the presence of an implied
hypothesis and 88% (79%, 93%) on subject category (most of the
disagreements were in studies where it was hard to determine whether molecules
or cells were human or animal).
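The paper does not name the method used for its binomial confidence intervals. As one common choice, the Wilson score interval is sketched below (the function name is hypothetical). Assuming a re-abstraction sample of about 90 articles (5 percent of 1799), 90 agreements out of 90 give a lower bound of about 96 percent, close to the interval reported for length of report, although this does not establish which method was actually used.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion successes / n."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Assumed re-abstraction sample of 90 articles with complete agreement.
low, high = wilson_interval(90, 90)
print(f"{low:.3f}, {high:.3f}")  # 0.959, 1.000
```

Unlike the simple normal-approximation interval, the Wilson interval does not collapse to zero width when agreement is 100 percent, which is why it is often preferred for proportions near 0 or 1.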
The stem “hypoth” was used in
534 of 1799 articles (30%). An explicit statement of a hypothesis was present
in 174 articles (10%). From the sample, we calculated the ratio of implicit to
explicit hypotheses to be 2.1:1. In the logistic regression model (c-index =
.72), report length (brief vs. full) was not associated with the presence of explicit hypotheses,
but journal and scientific field were. Compared to physics or material science
articles, general biology articles (adjusted odds ratio 9.2; 95% confidence interval 3.4,
25.3) and human biology articles (10.4; 3.6, 30.3) had much greater odds of containing
explicit hypotheses. Compared to articles in Science, those in Nature
were less likely to contain explicit hypotheses (0.38; 0.27, 0.54).
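Adjusted odds ratios from a logistic regression are exponentiated coefficients, OR = exp(β). Assuming the usual Wald interval on the log-odds scale, the 95% limits are exp(β ± 1.96·SE). The sketch below (hypothetical function names) converts in both directions. Applied to the reported interval for general biology (3.4, 25.3), it recovers a point estimate of about 9.3, close to the reported 9.2 given that the published limits are rounded.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def odds_ratio_ci(beta: float, se: float, z: float = Z_95) -> tuple[float, float, float]:
    """Odds ratio and Wald interval from a logistic coefficient:
    exp(beta), exp(beta - z*se), exp(beta + z*se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(lower: float, upper: float, z: float = Z_95) -> tuple[float, float]:
    """Invert a reported Wald interval for an odds ratio back to (beta, se).

    On the log scale the interval is symmetric about beta, so beta is the
    midpoint of the log limits and se is their half-distance divided by z.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)


# Reported: general biology vs physics or material science, 9.2 (3.4, 25.3).
beta, se = coefficient_from_ci(3.4, 25.3)
print(round(math.exp(beta), 1))  # 9.3
```

The same inversion applied to the Nature vs. Science interval (0.27, 0.54) returns about 0.38, matching the reported estimate.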
This study indicates that researchers infrequently state hypotheses when they report
science. Our findings are broadly consistent with a recent estimate of a prevalence of
hypotheses of about 25 percent in the wildlife research literature. (Guthery et
al 2004, 1325) To the student of scientific methodology, this low frequency
might seem paradoxical given the central role of a hypothesis in the conduct of
science. (Gauch 2003, 406-409) A notable exception is seen in the reporting of
randomized controlled medical trials: although hypotheses are explicitly reported
in only about half of them, the requirement of many journals to include
sample size calculations in the reporting of these trials means that
hypotheses can be determined in 4 out of 5 of these studies. (Desbiens 2004,
319) Why don’t scientists report their hypotheses more frequently? Some scientists
may perceive that the first step in science is observation from nature and that
inductive reasoning from these observations leads to hypotheses, which are then
tested deductively against further observations. (Goodman 1999, 673; Savitz 2001, 975)
This conceptualization could lead one to conclude that both observational studies
that do not test hypotheses and studies that test hypotheses are suitable for
publication in the medical literature and that less is methodologically
required of the former type of study. The International Conference on
Harmonisation (USFDA, 1998) hints at this rationalization when it states that
“Like all clinical trials, these exploratory studies should have clear
and precise objectives. However, in contrast to confirmatory trials, their
objectives may not always lead to simple tests of predefined hypotheses. In
addition, exploratory trials may sometimes require a more flexible approach to
design so that changes can be made in response to accumulating results. Their
analysis may entail data exploration.” In contradistinction to this view,
it has been strongly argued that even observation from nature must be based on
hypotheses that drive the investigation. (Gauch 2003, 406-409) For example, why
was an archeologist digging in a particular cave in Indonesia in the first place? Why
was an astronomer looking at a particular star cluster? Why did a geologist
study a particular rock formation?
The lack of a statement of a hypothesis at the beginning of a study calls into
question how a scientist was able to determine the appropriate sample size for
analyses. Even in observational studies, registering a hypothesis keeps one
honest in the reporting of the chronology of research and helps avoid hindsight
and other biases that plague the human intellect. (Tversky and Kahneman 1974,
1124) The potential for this type of problem is suggested in our study by
articles that did not mention a hypothesis initially but later stated that their
findings were consistent with a hypothesis.
The folklore of science is replete with examples of serendipity
(Fleming’s discovery of penicillin, Roentgen’s discovery of x-rays,
etc.). However, basing scientific methodology on anecdotes is at best poor
science and at worst folly. These anecdotes represent a numerator without a
denominator and do not give an inkling of the relative frequency of
serendipitous findings compared to those guided by hypotheses. If something is
found completely by serendipity, then the “utmost good faith” (Senn
1997, 404) and objectivity of scientists require that they state the entire
circumstances that surrounded their finding in their reports. These types of
findings will require reproduction and refinement under the annealing fire of a
hypothesis, as was done by Chain and Florey to further study penicillin.
(Bowler and Morus 2005, 452) A sorites of hypotheses from general to more
specific is crucial to the development of a scientific field.
It may be that scientists have
inadequately studied epistemology and the scientific method. They have learned
to write manuscripts in their fields from their mentors and the literature, but
have they been adequately grounded in the general principles of the scientific
method before beginning their specialized pursuits? (Guthery et al 2004, 6-13)
The American Association for the Advancement of Science states that science
must be taught as one of the liberal arts, “which it unquestionably
is,” but the study of
epistemology is not required for most science majors.
It may also be that the hypothesis
underlying a report in the scientific literature is implicitly understood by
scientists in the field and that an explicit statement is unnecessary. However,
given the detail required in science and the vagaries of human communication
this seems unlikely. Objectivity and explicitness are hallmarks of the
scientific method. Our study suggests that there may be twice as many
implicitly- as explicitly-stated hypotheses, yet observers agreed less often on
implicit than on explicit hypotheses (91% vs. 97%).
The present study has several
limitations. One is that only two journals were selected for review. However,
they are among the most widely read and influential journals in science. It is
doubtful that other non-medical scientific journals would have a higher
frequency of statement of hypotheses. Another limitation is that the
abstraction of implicit hypotheses is perforce subjective. We tried to decrease
error by random sampling, using synonyms and re-abstracting, but we may yet
have underestimated the use of implicit hypotheses. This limitation further
strengthens the case for reporting hypotheses explicitly.
How could the present situation be
improved? The medical scientific literature has demonstrated that structural
requirements for journal acceptance can dramatically improve the statement of
hypotheses. (Desbiens 2004, 319) The present study demonstrates that explicit
hypotheses, albeit at a low rate, are much more frequently stated in general
and human biology studies than in physics or material science studies. It may be that the
requirements for the statement of hypotheses in the scientific medical
literature carry over to the general scientific literature.
In addition to the above
considerations, there is still much disagreement in the scientific community
about the classification of hypotheses and about whether some hypotheses are too
trivial to state, as well as confusion between statistical and research hypotheses.
(Guthery et al 2004, 1325) It may also be that the lack of a clear statement of
hypotheses contributes to the balkanization of scientific disciplines. Pending
further discussions about the classification and role of hypotheses in the
reporting of science, general scientific journals could require authors to
state their hypotheses or otherwise label their papers as descriptive, and then
print these descriptors in a prominent location, perhaps after the
paper’s title. This requirement might be more difficult for general
scientific journals than for medical journals. Since many general research articles
report a line of research that addresses many hypotheses (a recent example
reports more than nine separate studies in one paper), multiple hypotheses may
need to be reported. (Cowen and Lindquist 2005, 2185) In addition, the
reporting of the general scientific literature has been much more free-wheeling
than that of the general medical literature. Most medical journals require the
organization of articles into introduction, methods, results and discussion
sections. Reporting in most general science journals is often a mélange
of this content. Because of this history, bringing more structure into the
reporting of general science may be more daunting.
In conclusion, this study indicates that authors in major scientific journals infrequently state
hypotheses. The conduct and reporting of science would be improved by the
explicit statement of a hypothesis or the lack of one in every scientific
article. A further refinement could be the dated registering of a hypothesis in
an electronic repository when a project is formalized, as is being considered
for clinical trials. (Antes 2004, 321) Scientific journals could influence this
salutary change by requiring the statement of a hypothesis or the lack of one
as a condition for publication.
The author would like to thank Mr.
Benjamin Moffitt for his assistance in abstracting the articles and Ms. Jenny
Post for her work on the early conceptualization of the study.
Altman, D.G., K.F. Schulz, D. Moher, et al. CONSORT GROUP (Consolidated Standards of Reporting Trials). 2001. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Annals of Internal Medicine 134:663-694.
American Association for the Advancement of Science. 1990. The Liberal Art of Science: Agenda for Action, section xiii. Washington, D.C.
Antes, G. 2004. Registering clinical trials is necessary for ethical, scientific and economic reasons. Bulletin of the World Health Organization 82:321.
Bowler, P.J., and I.R. Morus. 2005. Making modern science. Chicago: The University of Chicago Press.
Cowen, L.E., and S. Lindquist. 2005. Hsp90 potentiates the rapid evolution of new traits: drug resistance in diverse fungi. Science
Cui, L., H.M. Hung, S.J. Wang, et al. 2002. Issues related to subgroup analysis in clinical trials. Journal of Biopharmaceutical Statistics 12:347-358.
Desbiens, N.A. 2004. The presence of hypotheses in the medical literature. American Journal of Medical Science 328:319-322.
Garfield, E. “The Impact Factor,” Thomson Scientific, n.d. http://scientific.thomson.com/knowtrend/essays/journalcitationreports/impactfactor (October 18, 2005)
Gauch, H.G. 2003. Scientific Method in Practice.
Goodman, L. 1999. Hypothesis-limited research. Genome
Guthery, F.S., Jeffrey J. Lusk, and Markus J. Peterson. 2004. Hypotheses in Wildlife Science 32:1325-1332.
Harrell, Frank E. 2001. Regression modeling strategies. 1st ed. New York: Springer.
Moher, D., C.S. Dulberg, and G.A. Wells. 1994. Statistical power, sample size, and their reporting in randomized controlled trials. Journal of the American Medical Association 272:122-124.
Savitz, D.A. 2001. Prior specification of hypotheses: cause or just a correlate of informative studies? International Journal of Epidemiology 30:975-958.
Senn, S. 1997. Statistical issues in drug development. New York: John Wiley & Sons.
Tversky, A., and D. Kahneman. 1990. Readings in uncertain reasoning. San Francisco, CA: Morgan Kaufmann Publishers Inc.
U.S. Department of Health and Human Services. Food and Drug Administration. Center for Drug Evaluation and Research (CDER). Center for Biologics Evaluation and Research (CBER). September 1998. Guidance for Industry. E9 Statistical Principles for Clinical Trials.
“What is the scientific method?” U.S. Environmental Protection Agency, n.d., http://www.epa.gov/maia/html/scientific.html (October 31, 2005)
Correspondence and reprint requests
Norman A. Desbiens, M.D.
Department of Medicine, University
of Tennessee College of Medicine
Tel: (423) 778-2998
Fax: (423) 778-2611
Table 1. Characteristics of articles (n = 1799; values are percentages)
Physics or material science: 14
Astronomy or geosciences: 17
Use of the stem “hypoth*”: 30
Explicit statement of hypothesis: 10
Appendix. Candidate words or stems (*) used
to suggest implicit hypotheses