Thinking outside the cylinder: on the use of clinical trial registries in evidence synthesis communities

Clinical trials take a long time to be published, if they are at all. And when they are published, most of them are either missing critical information or have changed the way they describe the outcomes to suit the results they found (or wanted). Despite these problems, the vast majority of the new methods and technologies that we build in biomedical informatics to improve evidence synthesis remain focused on published articles and bibliographic databases.

Simply: no matter how accurately we can convince a machine to screen the abstracts of published articles for us, we are still bound by what doesn’t appear in those articles.

We might assume that published articles are the best source of clinical evidence we have. But there is an obvious alternative. Clinical trial registries are mature systems that have been around for more than a decade, their use is mandated by law and policy for many countries and organisations, and they tell us more completely and more quickly what research should be available. With new requirements for results reporting coming into effect, more and more trials have structured summary results available (last time I checked, it was 7.8 million participants from more than 20,000 trials, and that makes up more than 20% of all completed trials).

The reality is that not all trials are registered, not all registered trials are linked to their results, and not all results are posted in clinical trial registries. And having some results available in a database doesn’t immediately solve the problems of publication bias, outcome reporting bias, spin, and failure to use results to help design replication studies.

Aside: people working on Registered Reports have been looking at this for much less time (since 2013). Some of the more zealous young open science types decided this was a wild innovation that will variously “eliminate” or “fix” publication bias and all manner of other issues in scientific research. It won’t. Not immediately. But those of us who understand the history of prospective registration from clinical trial research can help to explain why it is much more complicated than it seems, and can help the rest of science catch up on the biases and unintended consequences that will appear as they further implement prospective registration in practice.

In a systematic review of the processes used to link clinical trial registries to published articles, we found that the proportion of trials that had registrations was about the same as the proportion of registrations that had publications (when they were checked properly, not the incorrect number of 8.7 million patients you might have heard about). Depending on whether you are an optimist or a pessimist, you can say that what is available in clinical trial registries is just as good/bad as what is available in bibliographic databases.

Beyond that, the semi-structured results that are available in clinical trial registries are growing rapidly (by volume and proportion). The results data (a) help us to avoid some of the biases that appear in published research; (b) can appear earlier; (c) can be used to reconcile published results; and (d) as it turns out, make it much easier to convince a machine to screen for trial registrations that meet a certain set of inclusion criteria.
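To make point (d) a little more concrete, here is a minimal sketch of why structured fields make screening easier. It is not how any real registry or screening tool works: the `Registration` record, its field names, and the example identifiers are all hypothetical, and the matching is a deliberately naive substring check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Registration:
    # Hypothetical, simplified view of a registry record (not a real schema).
    registry_id: str
    conditions: List[str]
    interventions: List[str]
    has_posted_results: bool


def meets_criteria(reg, condition, intervention, require_results=True):
    """Naive inclusion check using case-insensitive substring matching."""
    condition_match = any(condition.lower() in c.lower() for c in reg.conditions)
    intervention_match = any(
        intervention.lower() in i.lower() for i in reg.interventions
    )
    results_ok = reg.has_posted_results or not require_results
    return condition_match and intervention_match and results_ok


# Made-up records, for illustration only.
registrations = [
    Registration("REG-0001", ["Influenza"], ["Oseltamivir"], True),
    Registration("REG-0002", ["Influenza"], ["Placebo"], False),
    Registration("REG-0003", ["Asthma"], ["Oseltamivir"], False),
]

included = [
    r.registry_id
    for r in registrations
    if meets_criteria(r, "influenza", "oseltamivir")
]
print(included)
```

With a published abstract, each of those fields would first have to be inferred from free text; here the screening step reduces to a filter over fields that already exist.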

I suspect that the assumption that clinical trial registries are less useful than the published literature is a big part of the reason why nearly all of the machine learning and other natural language processing research in the area is still stuck on published articles. But that is a bad assumption.

Back in 2012, we wrote in Science Translational Medicine about how to build an open community of researchers to build and grow a repository of structured clinical trial results data. The following is based on that list but with nearly 6 years of hindsight:

  • Make it accessible: And not just in the sense of being open, but by providing tools to make it easier to access results data; tools that support data extraction, searching, screening, synthesis. We already have lots of tools in this space that were developed to work with bibliographic databases, and many could easily be modified to work with results data from clinical trial registries.
  • Make all tools available to everyone: The reason why open source software communities work so well is that by sharing, people build tools on top of other tools on top of methods. It was an idea of sharing that was born of necessity back when computers were scarce, slow, and people had to make the most of their time with them. Tools for searching, screening, cleaning, extracting, and synthesising should be made available to everyone via simple user interfaces.
  • Let people clean and manage data together: Have a self-correcting mechanism that allows people to update and fix problems with the metadata representing trials and links to articles and systematic reviews, even if the trial is not their own. And share those links, because there’s nothing worse than duplicating data cleaning efforts. If the Wikipedia/Reddit models don’t work, there are plenty of others.
  • Help people allocate limited resources: If we really want to reduce the amount of time it takes to identify unsafe or ineffective interventions, we need to support the methods and tools that help the whole community identify the questions that are most in need of answers, and decide together how best to answer them, rather than competing to chase bad incentives like money and career progression. Methods for spotting questions with answers that may be out of date should become software tools that anyone can use.
  • Make it ethical and transparent: There are situations where data should not be shared, especially when we start thinking about including individual participant data in addition to summary data. There are also risks that people may use the available data to tell stories that are simply not true. Risks related to ethics, privacy, and biases need to be addressed and tools need to be validated carefully to help people avoid mistakes whenever possible.

We are already starting to do some of this work in my team. But there are so many opportunities for people in biomedical informatics to think beyond the bibliographic databases, and to develop new methods that will transform the way we do evidence synthesis. My suggestion: start with the dot points above. Take a risk. Build the things that will diminish the bad incentives and support the good incentives.

So you’ve found a competing interest disclosure. Now what?

Published research varies across a spectrum that at one end is simply marketing masquerading as genuine inquiry. Actors in lab coats. To counter this problem, every time research is published in a journal, the authors are expected to declare anything that might have affected their impartiality in that work. Unfortunately, we very rarely do anything with those disclosures. It is as if, once a potential competing interest has been disclosed, either its effects on the research will magically disappear or readers will somehow be able to account for its influence on the conclusions.

Let’s say you are reading a news story about a clinical trial that shows multivitamins will help your children grow up to be smarter, taller, healthier, and stronger. Seems too good to be true? You ask: “Is there a chance that the research has been distorted to make it look better than it really is?” So you try to find a link to the actual article so that you can check to see who the researchers are… and evaluate the quality of the research to determine its validity in context.

It’s actually much harder to do than it sounds, because for a substantial proportion of published articles, authors have failed to disclose things that would be defined as a potential competing interest by any standard definition. And in most cases, the competing interest disclosures are hidden behind paywalls, so you won’t be able to check the disclosures unless you have a subscription (or pay to “rent” the article, or use Sci-Hub to access it for free).

Then you ask: “What should I actually do if I encounter a competing interest disclosure?” At the moment, you have three options: (a) you could ignore the disclosure and take the research at face value; (b) you could throw the research out because its findings may be compromised; or (c) you could apply the best tools we have available for measuring the risk of bias, even though we know they won’t always catch the nuanced ways in which research designs and reporting can be distorted.

Or more simply:  ¯\_(ツ)_/¯

So while we know that competing interests cause the kinds of biases that can lead to widespread harm, they also introduce substantial uncertainty into biomedical research because we simply don’t know if we can safely use research findings to inform our decision-making regardless of whether the authors have disclosed their potential competing interests or not.

But I think a solution to the problem is on its way. The first and most important step in that solution is to bring competing interests disclosures out into the open in ways that we can actually use. We need them to be made public, structured in a proper taxonomy instead of a bunch of free text descriptions, and accessible – in ways that both humans and machines can interpret and make use of.
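As a rough illustration of what “structured” could mean here, the sketch below stores each disclosure as a record with a category drawn from a fixed list instead of a free-text sentence. Everything in it is an assumption made for the example: the `InterestType` categories, the `Disclosure` fields, and the names and years are invented and are not the taxonomy or schema of any existing registry.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class InterestType(Enum):
    # Hypothetical categories, for illustration only.
    RESEARCH_FUNDING = "research_funding"
    CONSULTING_FEES = "consulting_fees"
    EQUITY = "equity"
    EMPLOYMENT = "employment"


@dataclass
class Disclosure:
    researcher_id: str
    interest_type: InterestType
    source: str
    start_year: int
    end_year: Optional[int]  # None means the interest is ongoing


def to_json(disclosure):
    """Serialise a disclosure so that other software can read it."""
    record = asdict(disclosure)
    record["interest_type"] = disclosure.interest_type.value
    return json.dumps(record, sort_keys=True)


def active_in(disclosures, year):
    """Return the disclosures that were active during a given year."""
    return [
        d
        for d in disclosures
        if d.start_year <= year and (d.end_year is None or year <= d.end_year)
    ]


# Made-up timeline for one made-up researcher.
timeline = [
    Disclosure("researcher-001", InterestType.CONSULTING_FEES, "Example Pharma", 2010, 2012),
    Disclosure("researcher-001", InterestType.RESEARCH_FUNDING, "Example Foundation", 2011, None),
]

print(to_json(timeline[0]))
print([d.source for d in active_in(timeline, 2013)])
```

Once disclosures look like this, a question such as “which interests were active when this trial was published?” becomes a simple query that both a person and a program can run, which is the whole point of moving away from free text.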

That is why we think a comprehensive public registry of competing interest disclosures for all researchers is a good idea. People have been working on similar ideas for a while, and we have put many of these things in the review we published in the new journal, Research Integrity and Peer Review.

  • If you are interested, be sure to check out the IOM report (check the review), and read between the lines to try and understand why there might have been some disagreements about the best ways to deal with competing interests. This work eventually led to Convey, which has some similar goals but is not necessarily aimed at being comprehensive, and seems to be progressing nicely towards a different kind of goal from the updates on the webpage.
  • One of the things we didn’t include in the review, because I hadn’t seen it until too late, is this browser plugin, which uses information from PMC to display funding and competing interests statements alongside the abstracts in PubMed. You could always click on the PMC link and scroll down to try and find them, but this is a neat idea.
  • It turns out that the idea for creating a standard list of competing interests and maintaining them in a public space was proposed as early as 2007, by Gordon Rubenfeld, in a letter to The Lancet. Maybe it is finally the right time to do this properly.
  • If you are a researcher and you like what you see in the first issue of the Research Integrity and Peer Review journal, then please consider publishing your research there. Besides the very interesting and high-profile editorial board, you might even have your manuscript handled by me as an associate editor.

Of course there are more things that will need to be done once we have a much more comprehensive, consistent, and accessible way to disclose competing interests in research, but those come down to treating competing interests just like any other confounding variable. We are currently working on both methods and policies that will help us populate the registry in a longitudinal fashion (i.e. a timeline for every researcher who has published anything in biomedical research), and keep it up to date. We are also working on ways to take existing free text disclosures and classify them according to how much of an influence they have had over research findings in the past, and on a scale that has been virtually impossible until very recently.

Also, check out a more precise description of the idea in this related opinion piece published in Nature today. As usual, I will add to this post and link to any responses, media, comments, interesting tweets, etc. below as I spot them online.

Media collection about conflicts of interest in systematic reviews of neuraminidase inhibitors

As usual, I’m keeping a record of major stories in the media related to a recently published paper. I will continue to update this post to reflect the media response to our article in the Annals of Internal Medicine.

When I checked last (1 May 2015), the Altmetric score was 112. Here’s the low-tech way to check that…


Do people outside of universities want to read peer-reviewed journal articles?

I asked a question on Twitter about whether or not people actually tried to read the peer-reviewed journal articles (not just the media releases), and if they encountered paywalls when they tried. This is what happened:

In case you don’t want to read through the whole conversation, it turns out that every person who answered the question said that they have in the past tried to access peer-reviewed journal articles, and that they have been stopped by paywalls. Some said it happened all the time.

There is very little evidence to show the prevalence of access and blocked access by the “interested public” for peer-reviewed journal articles. Some people seem to assume that only other scientists (or whatever) would be interested in their work, or that everything the “public” need to know is contained in a media release or abstract.

I think the results tell us a lot about the consumption of information by the wider community, the importance of scientific communication, the problem with the myth that only scientists want to read scientific articles, and the great need for free and universal access to all published research.

So far, I’ve been collecting whatever evidence I can get my hands on to relate to this question, especially in medicine, and I’ll add these pieces one by one below, just in case you are interested.

  1. Open access articles are downloaded and viewed more often than other articles, even when they do not confer a citation advantage. This is seen as evidence that people not participating in publishing are accessing the information. Davis, P.M., Open access, readership, citations: a randomized controlled trial of scientific journal publishing. The FASEB Journal, 2011. 25(7): p. 2129-2134.
  2. A Pew Internet Report found that one in four people hit a paywall when searching for health information online. Perhaps more importantly, it found that 58% of all people have looked for health information online (and in a country where only 81% use the Internet).
  3. From a UNESCO report on the development and promotion of open access: “First, it is known that [people outside of academia] use the literature where it is openly available to them. For example, the usage data for PubMed Central (the NIH’s large collection of biomedical literature) show that of the 420,000 unique users per day of the 2 million items in that database, 25% are from universities, 17% from companies, 40% from ‘citizens’ and the rest from ‘Government and others’.” Swan A. Policy guidelines for the development and promotion of open access, United Nations Educational, Scientific and Cultural Organization, 2012, Paris, France. (Page 30). Of course, people accessing PubMed Central from domestic IP addresses might often be academics working late at night at home without a VPN (like I am doing now).

About fifty people responded to my question on Twitter. I realise that my audience is probably biased towards the highly-educated, informed, younger, and information-savvy, but I think there are clear and obvious groups of people outside of universities who would be interested in reading published research. These people include doctors, engineers and developers, parents of sick children, politicians and policy-makers, practitioners across a range of disciplines, museum curators, artists, and basically everyone with an interest in the world around them. That this aspect of open access hasn’t been the feature of many surveys or studies seems bizarre to me.

Perhaps most importantly, I think we need to know a lot more about just how often people outside of academia want to access published research, and if problems with access are stopping them from doing so.

Surely the impetus to move towards universal and open access to published research would grow if more academics realised that actually *everyone* wants access to the complicated equations, to the raw data and numbers, and to the authors’ own words about the breadth and limits of the research that they have undertaken.

Introducing evidence surveillance as a research stream

I’ve taken a little while to get this post done because I’ve been waiting for my recently-published article to go from online-first to being citeable with volume and page numbers.

Last year, I was asked to write an editorial on the topic of industry influence on clinical evidence for the Journal of Epidemiology & Community Health, presumably after I published a few articles on the topic in early 2012. It’s an area of evidence-based medicine that is very close to my heart, so I jumped at the offer.

It took quite a bit of time to find a way to set out the entire breadth of the evidence process – from the design of clinical trials all the way through to the uptake of synthesised evidence in practice. In the intervening period, I won an NHMRC grant to explore patterns of evidence and risks of bias in much more detail, and the theme of evidence surveillance as an entire stream of research started to emerge.

Together with Florence Bourgeois and Enrico Coiera, I reviewed nearly the whole process of evidence production, reporting and synthesis, identifying nearly all the ways in which large pharmaceutical companies can affect the direction of clinical evidence.

It’s a huge problem because industry influence can lead to the widespread use of unsafe and ineffective drugs, as well as the more subtle problems associated with ‘selling sickness’. Even if 90% of the drugs taken from development to manufacture and marketing are safe, useful and improve health around the world, there’s still that 10% that in hindsight should never have been approved in the first place.

My aim is to find them, and to do so faster than has been possible in the past. It’s what we’ve started to call evidence surveillance around here (thanks Guy Tsafnat), and that’s also what we proposed in the last section of the article.

Note: If you can’t access the full article via the JECH website, you can always have a look at the pre-print article available here on this website. It’s nearly exactly the same as the final version.

Dealing with industry’s influence on clinical evidence

I co-wrote a piece for The Conversation about a new article that was published in the Cochrane Database of Systematic Reviews, written by Andreas Lundh and other luminaries from the research area. The authors showed that industry-sponsored clinical trials more often report positive outcomes and fewer harmful side effects.

The most interesting result from the article was that the biases that make industry-funded clinical trials more likely to produce positive results could not be accounted for using the standard tools that measure bias. This is amazing because it gives us a strong hint that industry is an independent source of heterogeneity in the systematic reviews that include them.

Too bad it’s the 12th of the 12th 2012 and the world is about to end. We won’t have time to sort it out.

(Feature image from AAP Image/Joe Castro via The Conversation – click the link)