What Makes a Good Hypothesis?
Not every research paper needs hypotheses, but when you have them, they had better be good. “Hypotheses are dead simple so writing them cannot be too difficult”, I thought for many years before starting to work on research designs that rely on hypotheses and realizing how wrong I had been.
In this post, I summarize the lessons I have learned about the important role of hypotheses in a research paper. I should note that this is not a post about (null) hypothesis testing in statistics; also, my views are undoubtedly shaped by my publishing in a particular academic field, that is, Management Information Systems (but then, yours are shaped by your field, too).
The Purpose of Hypotheses
To begin with, the purpose of hypotheses in a research paper is to define specific tests that can be unambiguously answered yes or no based on empirical analysis. Well, nothing is ever completely unambiguous so let’s say relatively unambiguously. For instance, let us look at a hypothesis from a paper that I am currently working on:
H1: Code correctness increases programming solution quality.
The hypothesis consists of two constructs, ‘code correctness’ and ‘programming solution quality’, and a proposed positive relationship from the former to the latter. By measuring the constructs and using, for instance, some statistical technique, we can test whether the proposed relationship holds or not. However, having been trained as a qualitative researcher, it took me many years to understand the point of testing such an utterly simplistic statement, because it always depends.
Of course it depends! But the point is to design the empirical study around the hypotheses so that everything the relationship depends on has been accounted for as much as possible. In other words, in a hypothesis-driven study, a successful research design enacts a laboratory, whether imaginary or real, in which it is possible to say something interesting by testing simple hypotheses.
H1 above may also sound too obvious to be even worth testing. This is probably because the hypothesis is obvious apart from how exactly one should measure the two constructs. Correct code (in this case, defined as program code that executes without errors) should obviously increase the quality of a programming solution. Yet, in the paper, the hypothesis serves as a baseline check that the overall research model and our data align with what we know about the studied phenomenon in general. As such, the point is not so much to find supporting evidence for H1 but to fail to falsify it, in the spirit of Karl Popper.
Yet, not every simplistic statement makes a good hypothesis…
Hypotheses Are Not Empirical Speculations
Based on my experience as a reviewer and journal editor, a common mistake is to write hypotheses as empirical speculations, that is, as more or less educated guesses about what the data might show. This may perhaps make a paper look more ‘scientific’ to an uninformed reader, but such hypotheses are usually nothing but unnecessary decorations. If your analysis does nothing more than describe what is there in the data, you can do that without cluttering the description with hypotheses.
By contrast, Figure 1 illustrates the idea that useful hypotheses function as the pivotal element between data analysis and theory in a research paper. By testing the hypotheses, we should learn something theoretically interesting about the studied phenomenon. This means that hypotheses must both derive from theory and be functional elements in the empirical analysis, which is easier said than done. They entail pinning down a theoretical dilemma in an empirically operational manner, which makes the hypotheses the crux of the analysis despite often appearing as simple one-line statements.
Hypotheses Are Short Written Statements that Connect Theory with Evidence
Good hypotheses are short statements that are crafted with extreme care. One should not think of them as abstract ideas but take them literally as written statements. Every word matters and should be chosen to serve the critical link between theoretical ideas and the analysis of empirical data in the best possible way. I typically find myself writing and rewriting every hypothesis several times to find the best possible wording.
A common mistake in writing hypotheses is merging several empirical tests into one hypothesis. If the hypothesis cannot be settled with a single, specific empirical test, it should be broken down into several hypotheses. Also, one should not include boundary conditions in the hypotheses, for this quickly makes the hypotheses unwieldy, and you can never include all the conditions anyway.
Concluding Remarks
I hope this post offers some useful reflections from my journey to understanding the role of hypotheses in a research paper. These can be summarized in the following three statements:
- A good hypothesis defines a single empirical test.
- A good hypothesis links empirical data to theory.
- A good hypothesis is relatively short.
Finally, I stress that the reflections in this post are based on my own gradual learning from working with hypotheses in research papers. As such, they are not particularly rigorously argued, and there is a lot more to hypotheses than what has been said here. For instance, there is the important question of whether study hypotheses must be fixed before collecting and analyzing data, or whether the hypotheses can be adjusted along with the analysis. People have different views about this that are rooted in the nature of knowledge in different academic disciplines. When I asked a friend who is trained in statistics to comment on a draft version of this post, she told me that “a statistician would kill you over even suggesting that the hypothesis can be adjusted along the analysis”. Clearly, this would be a worthy topic for another post!