# P-hacking

This week’s post is going to get a little mathematical, but don’t stop reading!

I will discuss a phenomenon known as P-hacking. But before I get to what P-hacking is, I first need to explain how scientific publishing works.

When researchers want to publish, they first submit their work to a journal. The journal sends it to other scientists, whose identities are not revealed to the submitting authors, for peer review. The reviewers assess the validity of the research, its contribution to the field, and its novelty. In other words, they judge whether the data are new and interesting and whether they can add to scientific knowledge.

The problem, however, is that data which do not show a significant difference or effect are rarely published.

For example, let’s say that researchers wanted to know whether eating oranges causes cancer. After performing their experiments, they find no difference between their control and experimental groups. This is a useful result, because it suggests that we can eat oranges without worrying about getting cancer. But because there was no effect, it will not be published (or it will be extremely difficult to get it published).

This is where a focus on publishing only positive (statistically significant) results leads to bias, and this is where P-hacking comes into play. To justify the time and money spent on their research, and to obtain further funding so they can continue to work, scientists may keep adjusting their analysis until a non-significant result becomes significant. This does not necessarily mean that they deliberately falsify results. Rather, the data may be analysed using different statistical tests, the sample size may be increased or reduced, or the hypothesis may be changed to fit the data.
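To see why adding samples and re-testing until the result becomes significant is a problem, here is a minimal Python sketch using only the standard library. It assumes a hypothetical experiment in which there is truly no effect: every measurement is drawn from a normal distribution with mean 0 and a known standard deviation of 1, and significance is judged with a simple two-sided z-test at the 5% level. The function names, the 100-observation maximum and the check after every 10 observations are illustrative choices, not taken from any real study.

```python
import random
from statistics import NormalDist

ALPHA = 0.05                  # conventional 5% significance threshold
STANDARD_NORMAL = NormalDist()


def z_test_p_value(values):
    """Two-sided one-sample z-test of 'the true mean is 0'.

    Simplifying assumption: the population standard deviation is known to be 1.
    """
    n = len(values)
    z = (sum(values) / n) * n ** 0.5
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def simulate(n_experiments=2000, max_n=100, look_every=10, seed=1):
    """Return (fixed_rate, peeking_rate): the share of experiments declared
    'significant' even though there is truly no effect."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_experiments):
        # The null hypothesis is true: every observation has mean 0.
        data = [rng.gauss(0, 1) for _ in range(max_n)]

        # Honest analysis: a single test at the planned sample size.
        if z_test_p_value(data) < ALPHA:
            fixed_hits += 1

        # P-hacked analysis: test after every batch of observations and
        # stop as soon as any test gives p < 0.05.
        if any(z_test_p_value(data[:n]) < ALPHA
               for n in range(look_every, max_n + 1, look_every)):
            peeking_hits += 1
    return fixed_hits / n_experiments, peeking_hits / n_experiments


fixed_rate, peeking_rate = simulate()
print(f"single test at n = 100:     {fixed_rate:.1%} false positives")
print(f"test after every 10 values: {peeking_rate:.1%} false positives")
```

Testing once at the planned sample size should produce false positives in roughly 5% of experiments, as the threshold intends. Testing after every batch and stopping at the first P-value below 0.05 flags several times as many experiments as significant, even though no effect exists in any of them.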

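So what is a P-value? A P-value is the probability of observing an effect at least as large as the one seen in the data if the null hypothesis (that there is no effect) were true. A small P-value means the data would be surprising if there were really no effect. By convention, a result is called statistically significant when its P-value falls below an arbitrary threshold, usually 5% (0.05) or 1% (0.01).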
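To make the definition concrete, here is a minimal Python sketch of an exact P-value for a hypothetical coin-flipping experiment. The null hypothesis is that the coin is fair, and the P-value is the probability, assuming that null hypothesis, of getting a number of heads at least as far from half the flips as the number actually observed. The function name and the example numbers are illustrative assumptions.

```python
from math import comb


def binomial_p_value(heads, flips):
    """Two-sided exact P-value for the null hypothesis 'the coin is fair'.

    Adds up the probability of every possible outcome that is at least as far
    from flips / 2 as the observed number of heads, assuming a fair coin
    (each of the 2**flips sequences is equally likely).
    """
    expected = flips / 2
    observed_distance = abs(heads - expected)
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme_outcomes / 2 ** flips


for heads in (50, 60, 61):
    p = binomial_p_value(heads, 100)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{heads}/100 heads: P = {p:.3f} ({verdict} at the 5% level)")
```

With 100 flips, 60 heads gives a P-value of roughly 0.057, just above the 5% threshold, while 61 heads gives roughly 0.035, just below it. A single extra head moves the result across the line, which shows how small changes to the data or the analysis can turn "not significant" into "significant".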

Several questions arise from the occurrence of P-hacking. Why do scientists do it? Can we trust the data? And, how do we stop it?

A meta-analysis (a study that combines and analyses the results of many studies) by Head et al. (2015) found that, although P-hacking is probably common, its effect appears weak relative to the real effect sizes being measured, so it does not seem to drastically alter the overall conclusions. P-hacking does not change the underlying data; it changes the conclusion drawn about the null hypothesis.

As for why scientists do it, P-hacking arises from the nature of scientific publishing, but that is a bigger problem for another day…

Finally, the general consensus is that better study design, planning and execution; incorporating and acknowledging biological variability; and better education on how to analyse and interpret data will help reduce the incidence of P-hacking.

### Sources:

Head et al. (2015). The Extent and Consequences of P-Hacking in Science. *PLoS Biology* 13(3): e1002106.

http://dx.doi.org/10.1371/journal.pbio.1002106