Why the inequalities in our information diets matter

img_2089

Part II: A new NHMRC Project for measuring the impact of social and news media on health behaviours

As promised after Part I – and now that I am back from the burnt orange colours of the United States to the purple jacarandas of Sydney – another update. But first, a quotation from one of the books in a series that inspired most of the work I do. It starts with the Emperor of the Galaxy talking to a research academic:

‘I am given to understand that you believe it is possible to predict the future.’

Seldon suddenly felt weary. There was going to be this misinterpretation constantly. Perhaps he should not have presented his paper. He said, “Not quite, actually. What I have done is much more limited than that…”

We were lucky enough to be awarded a new 3-year NHMRC Project. With co-investigator Julie Leask and a team of excellent associate investigators from Sydney and Boston, I will be developing a pipeline of interconnected methods capable of linking and tracking exposure to news media at unprecedented scales; estimating differences in information diets by geography and demographics; correcting for biases in each of the data streams we will mine; and connecting all of that information to real health outcomes. In the first instance, those outcomes will be related to vaccines… but we will be extending these methods to a range of other health outcomes as soon as we can.

screen-shot-2016-11-12-at-10-45-37-pm
Around 260 million exposures to HPV vaccine tweets mapped at the county level in the United States.

Our major aims for the project are the following:

  • To determine whether measures of exposure to news and social media can be used to explain geographical differences in health behaviours (especially vaccines).
  • To determine the proportion of our information diets made up of evidence-based information, and characterise the quality of evidence in the news and social media that makes up our information diets.

If you are looking for an opportunity as a postdoctoral research fellow, and this kind of research sounds like fun to you, please get in touch. You will need to be very comfortable with developing machine learning methods for big messy datasets (GPUs and neural networks are a bonus), and have published innovative methods and released your code open source. And if you are considering a PhD, want a generous scholarship, and you can design a project that might align with these ideas (no matter whether your disciplinary background is computer science, medicine, epidemiology, statistics, psychology, sociology, or just about anything else), send me an email with your bio and ideas.

Why the inequalities in our information diets matter

Part I: Visiting the United States in November 2016

As I write this, I am on a train travelling from Boston to New York City, at a time when people in the United States are still coming to terms with what they thought they knew about their country. The trip has been rushed because I need to be back in Sydney to catch up on work, which means I didn’t have time to see all the excellent people on the east coast. Here is a picture of approximately where I am right now.

img_2104
Between Boston and New York City, about an hour before New Haven.

Without the benefit of hindsight, it feels like the disintegration of knowledge continues to accelerate into the post-truth era; that societies around the world are struggling to hold onto their values across a populace that outsources its opinions and beliefs to some nebulous idea of whoever is both influential and accessible. While information overload is not at all a new idea, we do not yet entirely understand how it may actually restrict rather than open up what we hear, what we click on, and what we soak up into our worldview; or how that can reinforce our echo chambers to create polarisation and conflict.

While it is not something we usually like to admit, the information to which we are exposed – our information diet – is an excellent predictor of our attitudes and opinions. This is not necessarily because our information diets causally affect what we believe and how we act, but because we surround ourselves with people who agree with us and rarely seek out opinions that are contrary to our own. And even when different opinions are put right there in front of us, we disregard them.

For researchers who like to observe the world while pretending not to be part of it, right now looks less like a post-truth era and more like a golden era for understanding the impact of news media and social structure on human decision-making and behaviour.

There has never been a better time to take advantage of massive streams of data about information search and exposure to measure their impact on opinion, attitudes, decisions, and behaviours. In my team, we already use social connections on Twitter to train machine learning classifiers that can predict whether you are likely to have negative opinions about vaccines. We are now close to finalising work showing that a population’s (relevant) information diet can explain the variance in vaccine coverage of individual states in the United States.
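
As a rough illustration of the general approach (a minimal sketch with invented accounts and labels, not our actual pipeline or data), social connections can be turned into classifier features by representing each user as a binary vector of the accounts they follow:

```python
# Minimal sketch with invented data: predicting an opinion label from follow relationships.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training users: which accounts they follow, plus a known opinion label.
follows = [
    {"@health_dept": 1, "@news_outlet": 1},
    {"@sceptic_blog": 1, "@wellness_guru": 1},
    {"@health_dept": 1, "@sceptic_blog": 1},
    {"@wellness_guru": 1},
]
labels = [0, 1, 0, 1]  # 1 = likely to express negative opinions about vaccines (hypothetical)

# Turn each user's follow list into a sparse binary feature vector.
vectoriser = DictVectorizer()
X = vectoriser.fit_transform(follows)

model = LogisticRegression()
model.fit(X, labels)

# Predict for a new user based only on who they follow.
new_user = vectoriser.transform([{"@sceptic_blog": 1, "@news_outlet": 1}])
print(model.predict_proba(new_user))
```

The real classifiers are trained on much larger networks than this toy example, but the idea is the same: who you follow carries a lot of signal about the opinions you are likely to express.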

img_2087
More of Boston, walking from Back Bay to Boston Children’s Hospital.

Part II: Some news

While I am still looking at the bright orange colours of the United States, and before I head back to the purple jacarandas in Australia, I will stop here. But I will share some related news soon.


Five tips for controlling the evidence base of your clinical intervention

You might remember me from such articles as “Even Systematic Reviews are Pretty Easy to Manipulate” and “I Can Predict the Conclusion of Your Review Even Without Reading It”. If you have been around for a while, you may even remember me from “Industry-based Researchers get More Love Because They are Better Connected”.

In a web browser near you, see the newest installment: More Money from Industry, More Favourable Reviews

With colleagues from here in Sydney and over in Boston, I recently published the latest in a string of related research on financial competing interests in neuraminidase inhibitor research. This is probably the last paper we will do on this topic for a while (though we do have two more manuscripts related to financial competing interests on the way soon). In this one we looked at non-systematic reviews of neuraminidase inhibitor evidence and compared the number of non-systematic reviews and the proportions of favourable conclusions between authors who had financial competing interests and authors who did not. You will not be at all surprised to learn that authors who had relevant financial competing interests and wrote non-systematic reviews about the topic ended up writing more of them, were more likely to conclude favourably in the reviews, and also wrote more of other kinds of papers in the same area.

So after looking in way too much detail at the creation and translation of evidence in this area, I thought it would be a good time to write down a few tips on how anyone with deep pockets can control an evidence base and get away with it (for a while). Here are some hints on the easiest and fastest ways to control the research consensus for a clinical intervention, even when it isn’t as effective or safe as it should be.

Step 1. Design studies that will produce the conclusions you want.

Step 2. When publishing trial reports, leave out the outcomes that don’t look good; or just don’t publish them.

Step 3. When publishing reviews, just select whatever evidence suits the conclusion you like best and ignore everything else.

Step 4. If the data fail to illustrate the picture you want, you can report them as is and just write down a different conclusion anyway.

Step 5. Use the credibility of reputable and prolific academic researchers by paying them to run trials, write reviews, and talk to the media.

Step 6. Profit.

Two important caveats. First, I am not claiming that any of these things have been done deliberately for neuraminidase inhibitors or any of the interventions described above – I am describing these processes in general, based on multiple sources, and in a flippant way. It may well have happened for some or many clinical interventions in the past, but that is not what we claim here or anywhere else. Second, I am not anti-industry; I am anti-waste and anti-harm.

And everyone should share the blame. There are researchers from inside industry, outside industry with industry funding, and completely divorced from all industry activity who have each been responsible for the kinds of waste and harm we read about after the damage has been done.

No matter what kind of intervention you work on, poorly-designed or redundant studies waste money and time, and can put participants at risk for no reason. Failing to publish the results of trials in full is just as bad, and producing piles of rubbish reviews that selectively cite whatever evidence helps prove your preconceived version of the truth is about as bad as trying to convince people that a caffeine colon cleanse cures cancer.

When I find time, I will continue to add in related links to specific papers (for now, mostly just those from my team and my collaborators) for each of these areas. There are hundreds of other relevant articles that have been written by lots of other smart people but for now I am just listing a selection of my own as well as some of my favourite examples for each category.

So you’ve found a competing interest disclosure. Now what?

Published research varies across a spectrum; at one end, it is simply marketing masquerading as genuine inquiry. Actors in lab coats. To counter this problem, every time research is published in a journal, the authors are expected to declare anything that might have affected their impartiality in that work. Unfortunately, we very rarely do anything with those disclosures. It is as if, by disclosing a potential competing interest, any effects on the research are supposed to magically disappear, or readers are somehow supposed to magically account for its influence on the conclusions.

Let’s say you are reading a news story about a clinical trial that shows multivitamins will help your children grow up to be smarter, taller, healthier, and stronger. Seems too good to be true? You ask: “Is there a chance that the research has been distorted to make it look better than it really is?” So you try to find a link to the actual article so that you can check who the researchers are… and evaluate the quality of the research to determine its validity in context.

It’s actually much harder to do than it sounds, because for a substantial proportion of published articles, authors have failed to disclose things that would be defined as a potential competing interest by any standard definition. And in most cases, the competing interest disclosures are hidden behind paywalls, so you won’t be able to check the disclosures unless you have a subscription (or pay to “rent” the article, or use Sci-Hub to access it for free).

Then you ask: “What should I actually do if I encounter a competing interest disclosure?” At the moment, you have the following options: (a) ignore the disclosure and take the research at face value; (b) throw the research out because the findings may be compromised; or (c) apply the best tools we have available for measuring the risk of bias, even though we know they won’t always catch the nuanced ways in which research designs and reporting can be distorted.

Or more simply:  ¯\_(ツ)_/¯

So while we know that competing interests cause the kinds of biases that can lead to widespread harm, they also introduce substantial uncertainty into biomedical research: whether or not the authors have disclosed their potential competing interests, we simply don’t know if we can safely use the research findings to inform our decision-making.

But I think a solution to the problem is on its way. The first and most important step in that solution is to bring competing interests disclosures out into the open in ways that we can actually use. We need them to be made public, structured in a proper taxonomy instead of a bunch of free text descriptions, and accessible – in ways that both humans and machines can interpret and make use of.
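
To make the “structured, not free text” point concrete, here is one possible shape for a machine-readable disclosure record. This is purely a sketch; the field names and categories are illustrative guesses on my part, not a proposed standard.

```python
# Illustrative sketch only: one possible shape for a structured, machine-readable
# disclosure record. Field names and categories are hypothetical, not a standard.
from dataclasses import dataclass, asdict
from datetime import date
from typing import Optional
import json

@dataclass
class DisclosureRecord:
    researcher_orcid: str       # stable identifier for the researcher
    entity: str                 # organisation the relationship is with
    relationship: str           # e.g. "research_funding", "consultancy", "employment"
    start: date
    end: Optional[date]         # None if the relationship is ongoing
    source: str                 # where the disclosure was made

record = DisclosureRecord(
    researcher_orcid="0000-0000-0000-0000",
    entity="ExamplePharma Ltd",
    relationship="consultancy",
    start=date(2014, 1, 1),
    end=date(2015, 6, 30),
    source="journal article disclosure statement",
)

# Both humans and machines can read this; free-text statements need parsing first.
print(json.dumps(asdict(record), default=str, indent=2))
```

A registry is, in essence, a large collection of records like this one, kept up to date and queryable by anyone.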

That is why we think a comprehensive public registry of competing interest disclosures for all researchers is a good idea. People have been working on similar ideas for a while, and we have covered many of them in the review we published in the new journal, Research Integrity and Peer Review.

  • If you are interested, be sure to check out the IOM report (see our review for the reference), and read between the lines to try and understand why there might have been some disagreements about the best ways to deal with competing interests. This work eventually led to Convey, which has some similar goals but is not necessarily aimed at being comprehensive, and which, judging from the updates on its webpage, seems to be progressing nicely towards a slightly different kind of goal.
  • One of the things we didn’t include in the review, because I hadn’t seen it until it was too late, is this browser plugin, which uses information from PMC to display funding and competing interests statements alongside the abstracts in PubMed. You could always click on the PMC link and scroll down to try and find them yourself, but this is a neat idea.
  • It turns out that the idea for creating a standard list of competing interests and maintaining them in a public space was proposed as early as 2007, by Gordon Rubenfeld, in a letter to The Lancet. Maybe it is finally the right time to do this properly.
  • If you are a researcher and you like what you see in the first issue of the Research Integrity and Peer Review journal, then please consider publishing your research there. Besides the very interesting and high-profile editorial board, you might even have your manuscript handled by me as an associate editor.

Of course there are more things that will need to be done once we have a much more comprehensive, consistent, and accessible way to disclose competing interests in research, but those come down to treating competing interests just like any other confounding variable. We are currently working on both methods and policies that will help us populate the registry longitudinally (i.e. a timeline for every researcher who has published anything in biomedical research) and keep it up to date. We are also working on ways to take existing free text disclosures and classify them according to how much of an influence they have had over research findings in the past, at a scale that was virtually impossible until very recently.
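
For the free-text classification step, a standard text-classification pipeline is one plausible starting point. The example below is a toy sketch with invented statements and a made-up three-category labelling; it is not our model or our taxonomy.

```python
# Toy sketch with invented examples: mapping free-text disclosure statements
# onto coarse, hypothetical categories.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

statements = [
    "The authors declare no competing interests.",
    "AB has received consultancy fees from ExamplePharma.",
    "This study was funded by an unrestricted grant from ExamplePharma.",
    "The authors have no conflicts of interest to disclose.",
]
labels = ["none", "consultancy", "industry_funding", "none"]

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(statements, labels)

print(classifier.predict(["CD reports personal fees from ExamplePharma outside this work."]))
```

Estimating how much influence a disclosed relationship has had on past findings is a much harder problem than labelling the statements, but getting the statements into structured form is the step that makes the rest possible at scale.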

Also, check out a more precise description of the idea in this related opinion piece published in Nature today. As usual, I will add to this post and link to any responses, media, comments, interesting tweets, etc. below as I spot them online.

Twitter users with anti-vaccine opinions are relatively easy to spot if we can measure their misinformation exposure

So… I have been systematically collecting tweets about human papillomavirus (HPV) vaccines since October 2013. We now have over two hundred thousand tweets that included keywords related to HPV vaccines, and the first of two pieces of research we have undertaken using these data has just been published in the Journal of Medical Internet Research. It covers six months of data: 83,551 tweets from 30,621 users connected to each other through 957,865 social connections. The study question is a relatively simple one – we wanted to find out how many people are tweeting “anti-vaccine” opinions about HPV vaccines, the diversity of their concerns, and how misinformation exposure is distributed throughout the Twitter communities.

What we found was in some ways surprising – around 24% of the tweets about HPV vaccines were classified as “negative” (more on this later). To me, this seems like a very large proportion given that only around 2% of adults actually refuse vaccinations for their children. In other ways, I’m less surprised, given how many people hold other unusual beliefs, and the number of surveys suggesting that 20% to 30% of adults believe that vaccines cause autism.

Looking at how people follow each other within the group of 30,621 users, we found that around 29% of the users who tweeted about HPV vaccines were exposed mostly to these “negative” tweets – that is, more than half of the tweets they saw were negative – because of who they follow.
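
In very simplified terms (invented toy data below, not the study network), the exposure calculation works like this: a user’s information diet is the set of classified tweets posted by the accounts they follow, and a user counts as mostly exposed to negative opinions when more than half of that set is negative.

```python
# Toy sketch of the exposure calculation; users, follows, and labels are invented.

# Classified tweets posted by each user: 1 = negative, 0 = neutral/positive.
tweet_labels = {
    "alice": [0, 0],
    "bob": [1, 1, 1],
    "carol": [0],
    "dave": [1],
}

# Who each user follows within the study group.
follows = {
    "alice": ["carol"],
    "bob": ["dave"],
    "carol": ["alice", "bob"],
    "dave": ["bob"],
}

def mostly_negative_exposure(user):
    """True if more than half of the tweets this user is exposed to are negative."""
    exposed = [label for followee in follows[user] for label in tweet_labels[followee]]
    return bool(exposed) and sum(exposed) / len(exposed) > 0.5

exposed_users = [u for u in follows if mostly_negative_exposure(u)]
print(f"{len(exposed_users)} of {len(follows)} users were mostly exposed to negative tweets")
```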

To classify the tweets as either “negative” or “neutral/positive”, we used supervised machine learning classifiers that are slightly different from the usual kind, which examines the sentiment of a tweet using only information about its text. I’ll be talking about these machine learning classifiers at the MEDINFO conference in Sao Paulo this August.

What we really wanted to know was how many Twitter users were being exposed to this negative kind of information – usually anecdotes about harm, conspiracy theories, complete fabrications, or some strange amalgamation of all of them – whether these users mostly grouped together, and how far their information reached across communities that might be making decisions about HPV vaccines for themselves or their children.

exposure_follower_network
A network of 30,621 Twitter users who posted tweets about HPV vaccines in a six month period. Users in orange were exposed mostly to negative opinions. Circles are users, larger ones have more followers within this group of users. Users more closely connected are generally positioned closer to each other in the picture.

We also wanted to know a bit more about the reach of the actual science and clinical evidence that is being published in the area. As researchers, we know that there are now studies showing that the HPV vaccine is safe and that there is early evidence of effectiveness in the prevention of cervical cancer, but we don’t really know who might be “exposed” to that kind of information.

Perhaps unsurprisingly, the people producing the science of HPV vaccines were located pretty much as far away as they could possibly be from the people exposed mostly to negative opinions. Most of the tweets linked directly to peer-reviewed articles came from the people in the very top left section of the network illustration above.

The main contribution of our study was to determine how much more likely a user who had previously been exposed to negative opinions was to then tweet a negative opinion. The answer was: “a lot more likely”.
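
The comparison behind “a lot more likely” is essentially a relative risk: the proportion of mostly-negatively-exposed users who went on to tweet a negative opinion, divided by the same proportion among everyone else. The counts below are invented purely to show the arithmetic; they are not the study’s numbers.

```python
# Relative risk with invented counts (for illustration only; not the study's numbers).
exposed_negative, exposed_total = 120, 300      # users mostly exposed to negative tweets
unexposed_negative, unexposed_total = 40, 700   # everyone else

risk_exposed = exposed_negative / exposed_total        # 0.40
risk_unexposed = unexposed_negative / unexposed_total  # ~0.057

relative_risk = risk_exposed / risk_unexposed
print(f"relative risk = {relative_risk:.1f}")          # 7.0 with these made-up counts
```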

But to address the reasons why users’ opinions were relatively easy to predict given the information they had been exposed to in the past, we have to do a lot more work…

It could be that the opinions were “contagious” and spread through the community. It might also be that people end up forming “homophilous” connections with other users who express the same negative opinions about HPV vaccines. The much more likely explanation is that people who share opinions about all kinds of other things besides HPV vaccines (like guns, religion, politics, conspiracies, organic vegetables, crystals, and magical healing water) are more likely to be connected to each other, and their opinions about HPV vaccines are due to the breadth of misinformation that spreads to them from influential news organisations, celebrities, friends, and magical water practitioners.

It is important that we are careful to explain that the study only demonstrates an association between what people were exposed to in the past and the direction of the opinions they expressed afterwards. It does not show causation, and it does not tell us how those people came to believe what they do.

The study does tell us something important about how we might be able to estimate the risk of poor vaccination decision-making within particular communities in space and time. One of the things we would like to be able to do is to examine how misinformation exposure is distributed geographically in a couple of countries (the US and Australia – because those are the places we know best), as a way of helping public health organisations better understand who might be vaccine anxious (or at risk of becoming vaccine anxious), and the specific concerns they might have. Because remember, only 2% of adults are conscientiously refusing to vaccinate their children, but an awful lot more might be forming their opinions based on the misinformation that spreads through the communities they inhabit.

How to predict the conclusion of a review without even reading it…

Short version: We published a new article in the Journal of Clinical Epidemiology all about selective citation in reviews of neuraminidase inhibitors – like Tamiflu and Relenza.

Lots of reviews get written about drugs (especially the ones that get prescribed often), and the drugs used to treat and prevent influenza are no exception. There are more reviews written than there are randomised controlled trials, and I think it is hard to justify why doctors and their patients would need so many different interpretations of the same evidence. When too many reviews are written in this way, I like to call it “flooding”.

The reason why so many reviews get written probably has something to do with a problem that has been written about many times over by people much more eloquent than I am: marketing disguised as clinical evidence.

We recently undertook some research to try and understand how authors of reviews (narrative and systematic) manage to come up with conclusions that appear to be diametrically opposed. For the neuraminidase inhibitors (e.g. Tamiflu, Relenza), conclusions varied from recommending early use in anyone who looks unwell, or massive global stockpiling for preventative use, to questioning the very use of the drugs in clinical practice and raising safety concerns. We hypothesised that one of the ways these differences could manifest in reviews was through something called selective citation bias.

Selective citation bias happens when authors of reviews are allowed to pick and choose what they cite in order to present the evidence in ways that fit their predetermined views. And of course, we often associate this problem with conflicts of interest. This has in the past led to drugs being presented as safe and effective (repeatedly) when they simply aren’t.

By the way, here’s a picture of approximately where I am right now while I’m writing this quick update. I’m on a train between Boston and NYC in the United States, passing through a place called New Haven.

train

To test our hypothesis about selective citation bias, we did something quite new and unusual with the citation patterns among the reviews of neuraminidase inhibitors. We looked at 152 different reviews published since 2005, as well as the 10,086 citations in their reference lists pointing at 4,574 unique articles. Two members of the team (Diana and Joel) graded the reviews as favourable or otherwise, and when they both agreed that a review presented the evidence favourably, we put it in the favourable pile. The majority of reviews (61%) ended up in this group.

We then did two things. First, we undertook a statistical analysis to see if we could find individual articles that were by themselves much more likely to be cited by favourable reviews. Second, we constructed a set of classifiers using supervised machine learning algorithms to see how well we could predict which reviews were favourable by looking only at their reference lists.
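
A stripped-down version of the second step might look something like the sketch below (toy reviews and citations; I am also glossing over the model selection and validation details): each review becomes a binary vector over the set of citable articles, and a supervised classifier is trained to predict the favourable/not-favourable grade.

```python
# Toy sketch: predicting a review's conclusion from its reference list alone.
# Reviews and citations here are invented; the real analysis covered 152 reviews
# citing 4,574 unique articles.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.ensemble import RandomForestClassifier

reference_lists = [
    {"trial_A", "trial_B", "editorial_C"},
    {"trial_A", "resistance_study_D"},
    {"trial_B", "editorial_C"},
    {"resistance_study_D", "safety_review_E"},
]
favourable = [1, 0, 1, 0]  # 1 = both raters graded the review as favourable

# One binary feature per citable article: cited (1) or not cited (0).
binarizer = MultiLabelBinarizer()
X = binarizer.fit_transform(reference_lists)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, favourable)

# In-sample accuracy on the toy data, analogous in spirit to the figure reported below.
print("in-sample accuracy:", model.score(X, favourable))
```

The real analysis would of course need proper validation; the in-sample number here is only meant to mirror the shape of the result reported in the next paragraph.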

What we found was relatively surprising – we could predict a favourable conclusion with an accuracy of 96.5% (in sample) only using the reference lists, and without actually looking at the text of the review at all.

A further examination of the articles that were most useful (in combination) for predicting the conclusions of the reviews suggested that the not-favourable pile tended to cite studies about viral resistance much more often than their favourable counterparts.

What we expected to find, but didn’t, was that industry-funded studies would be over-represented in favourable reviews. To me, the lack of a finding here means that the method we devised was probably better at finding what was “missing” from the reference lists of the majority than at finding what was over-represented in them. The maths on this makes sense too: because favourable reviews made up 61% of the sample, an article cited mostly by favourable reviews looks much like the base rate and carries little signal, whereas an article cited mainly by the minority of not-favourable reviews stands out.

So we think that applying machine learning to the metadata from published reviews could be useful for editors handling new narrative reviews. More importantly, when faced with multiple reviews that clearly disagree with each other, these methods could help identify what is missing from each, in order to restore some balance in how the primary clinical evidence is represented in reviews and guidelines.