Lies, Damn Lies, and Pharma Social Media Statistics

Lies, Damn Lies, and Statistics

Dose of Digital Mini White Paper

Somewhere along the way, I noticed that I had developed a really low tolerance for the abuse of legitimate research and statistics in an effort to garner a great headline. When I see these headlines, I’m usually among the first to dig into the “research” and figure out the real story. What’s missing? What methods were used? Do the actual conclusions match the headlines? Not every headline purported to be backed by statistics makes sense, even if it appears to on the surface. A favorite of mine is the chart at the top of this post (courtesy of the P.A.P. Blog), which appears to show that increasing the number of lemons imported to the US from Mexico reduces highway fatalities.

Healthcare social media is a pretty hot topic of course, so getting anything out about this topic is sure to get some attention. This fact makes me even more suspicious when I see a headline pop up about some new research finding that’s published in major media outlets. Of course, where I see these things come up first is on Twitter, where there’s an echo chamber effect of people sharing the same story back and forth with hashtags that make it simple to follow along. (Pro Tip: stay tuned to the end of this post for a sneak preview of some unpublished Pew Research data about healthcare social media use.)

One research study that came up recently, and that I still see pop up from time to time, came from the National Research Corporation. They released the results of a survey via press release with the enticing title: “1 in 5 Americans Use Social Media for Health Care Information.” Here are the first two paragraphs of the press release:

“LINCOLN, Neb. – February 28, 2011 – One in five Americans use social media websites as a source of health care information, according to National Research Corp.’s Ticker survey, the largest, most up-to-date poll on consumer health care opinions and behaviors.

Facebook topped the list of available websites, with 94 percent of respondents indicating they’ve used the popular social network to gather information on their health care. 32 percent used YouTube, a video sharing site. Twitter, an emerging micro-blog site for B2C communication, landed in third with only 18 percent of respondents – tying with MySpace. FourSquare, a location-based website, garnered only 2 percent response.”

I have a lot of problems with this bit of research and I’m going to outline them not to attack this particular study, but as a lesson to everyone about how to do these things the right way and what I believe is the wrong way. There are several issues that I see over and over again with data like this, so I’ll list them here so that you are a bit more alert to potential problems with the research you come across each day.

  • Issue 1: “Rounding Up”: The headline of this release is “one in five Americans”, but the actual number from their data is 15.8%. One in five is 20% and makes for a nice round number and sounds much better than “one in 6.25 Americans” or “4 in 25 Americans” (if you don’t like quarter people), which is what you get if you round 15.8% to 16%. For those who don’t like to do the math, “1 in 5” is 25% higher than “1 in 6.25,” which is pretty significant.
  • Issue 2: “Transparency of Data”: You’re probably asking, “How is it, Jonathan, that you know the actual number and I don’t?” Well, I asked the company for the actual results so I could review them for myself and they sent me this spreadsheet. That’s the only place you can find the real numbers. Without it, you’d have to get all your information from a press release. This is issue number two for me. If you know that you’re releasing a somewhat provocative piece of data, why not show the full information? Why only make a press release available? If you do this, you shouldn’t expect anyone with even a slight academic bent to place any confidence in the results.
  • Issue 3: “Proper Context”: Paragraph (actually sentence) number one of the release gives us the one in five number and mentions how big the survey was. Paragraph two leads with: “Facebook topped the list of available websites, with 94 percent of respondents indicating they’ve used the popular social network to gather information on their health care.” What’s missing here is a very important qualifier, which if included would make the sentence read like this (I added the bold): “Facebook topped the list of available websites, with 94 percent of respondents who said they use social media as a source of healthcare information indicating they’ve used the popular social network to gather information on their health care.” There’s a big difference between these two sentences. One seems to imply that 94% of Americans use Facebook as a source of healthcare information. The other implies that 94% of the 1 in 5 people who use social media as a source of healthcare information select Facebook as their top source. So, that’s actually not 94% of people, but rather 18.8% of Americans (94% of 20%). And, as we’ve already seen, the real number isn’t 20% but 15.8%, so it should really be about 14.9% of Americans (94% of 15.8%) who indicated Facebook was their top source.

The problem with reporting data like this is that people quickly take bits out of context. This really isn’t National Research’s fault, but I think being very explicit about sound bites of data is critical when you release them so as to minimize any chance of pieces being taken out of context.
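The arithmetic behind Issues 1 and 3 is easy to check for yourself. Here is a minimal Python sketch that uses only the figures quoted above (20%, 15.8% and 94%); the function and variable names are mine and purely illustrative, not anything from National Research. Note that if you work from the unrounded 15.8% instead of 16%, the headline overstates the result by slightly more than the 25% I quoted.

```python
def relative_overstatement(headline: float, actual: float) -> float:
    """How far the headline figure exceeds the actual one, as a fraction of the actual."""
    return headline / actual - 1


def share_of_everyone(share_of_subgroup: float, subgroup_share: float) -> float:
    """Convert 'X% of a subgroup' into a share of the whole sample."""
    return share_of_subgroup * subgroup_share


HEADLINE = 0.20   # "one in five" from the press release headline
ACTUAL = 0.158    # 15.8% from the spreadsheet
FACEBOOK = 0.94   # 94% of those who use social media for health information

print(f"Headline overstates the result by {relative_overstatement(HEADLINE, ACTUAL):.1%}")
print(f"Facebook share implied by the headline: {share_of_everyone(FACEBOOK, HEADLINE):.1%}")
print(f"Facebook share using the actual figure: {share_of_everyone(FACEBOOK, ACTUAL):.1%}")
```

The point of keeping the two inputs to share_of_everyone separate is that the qualifier (“of those who use social media for health information”) stays visible in the calculation instead of getting lost in a sound bite.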

  • Issue 4: “Mixing Data and Opinion”: Researchers are expected to provide some interpretation of their data when they report it. It’s the “Discussion” section of any good research write-up. However, a critical point here is that what was found in the study and what is interpretation of the data need to be clearly separated: objective versus subjective. That is, I need to know when something was stated by respondents and when something is the opinion of the author. Case in point from the National Research press release: “Americans think highly of the usability of social media but are tempered in crowning it the premiere source of health care information when considering all options.” This is “supported” by two other pieces of data from the survey:

“When asked social media’s influence, 1 in 4 respondents said it was “very likely” or “likely” to impact their future health care decisions.”

“When asked their level of trust in social media, 32 percent said “very high” or “high”, only 7.5 percent said “very low”.”

I’m not sure how these two bits of data support either “Americans think highly of the usability of social media” or “crowning it the premiere source of health care information when considering all options.” What the statement seems to imply is that, first, Americans think highly of the usability of social media (i.e., it’s easy to use) and, second, that they are in some way considering it as a potential substitute for other sources of healthcare information. This would be fine except for the fact that there are no survey questions related to either point. Again, how do I know this and you don’t? I asked for the survey questions (see Issue 2 again), which you can find here. There are no questions about the “usability” of social media nor any questions about how social media sources rank against any other sources when it comes to healthcare. I don’t mind if someone makes this leap, but temper it with a clear statement indicating that it’s opinion instead of implying that it comes from the actual results.

  • Issue 5: “Definitions”: One of my biggest issues with this particular study is the use of the term “healthcare information,” as in “One in five Americans use social media websites as a source of health care information.” What does this mean? This could range anywhere from doing in-depth, fact-based, serious research on a medical condition to mentioning in a status update that you hit your funny bone. When you leave it up to respondents to define a term like this (as National Research did), expect people to interpret what this means in their own way. They asked: “Do you use social media (e.g. Facebook, Twitter, MySpace) as a source of healthcare information?” My response would be: “It depends on what you mean by healthcare information,” but that wasn’t an option. This is a critical point because not only do respondents interpret what this means, but so do those reading the results. Again, one person reads “as a source of healthcare information” as serious research and another reads it as throwaway comments on a friend’s funny update.
  • Issue 6: “Getting the Details Right”: I first found out about National Research’s survey when I read about it on CNN’s “The Chart” blog. It’s a pretty reliable source of good information and is written by people that I respect. However, they got this story wrong. CNN reported this in their review of the survey: “In the survey of nearly 23,000 people in the United States, 41% said they use social media as a source of health care information. For nearly all of them – 94% – Facebook was their site of choice, with YouTube coming in a distant second at 32% [emphasis added].” The problem is that 41% isn’t the actual number. As we’ve already seen, the actual number is 15.8% use social media as a source of healthcare information. I would have even accepted one in five, as National Research put in their press release. How did CNN come up with 41%?

CNN and National Research share the blame on this one. If you take a look at the detailed spreadsheet with the full findings of the survey, you’ll see how this happened. Here’s the piece of the spreadsheet in question:

What you see here, at a glance, is exactly what CNN reported. 40.8% “use social media as a source of healthcare information.” However, upon closer inspection, what you see is that those percentages shouldn’t be percentages at all, because they actually represent ages. That is, among those who said they use social media as a source of healthcare information, the average age of respondents was 40.8. For those who don’t use social media this way, the average age was 47.8. I’ve clarified this with National Research and they confirmed that this is indeed what these rows are supposed to communicate.

So, regarding getting the details right, CNN should have been more careful in how they reviewed the results, but National Research has an obligation to report their findings in a manner that is clear and without errors. CNN still has the wrong (much higher) data in their article, which makes the findings of the survey seem even more sensational.

  • BONUS…Issue 7: “Correlation versus Causation”: This issue wasn’t something that I observed with National Research’s work, but it is something I see all the time and, in fact, represents the most dangerous studies, so I’m going to include it here. The issue is when authors of surveys report a connection between results as causal when it is only a correlation. It’s a fine, but absolutely critical point. Causation means that one thing caused another to happen. For example, I dumped a bucket of water on you, which caused you to get wet. Correlation means that two variables are somehow related. For example, people who wear bathing suits are wet.

The causation part always makes sense to people, but the correlation bit is a bit trickier. Basically, the idea is that two pieces of information or variables tend to move in the same direction (positive correlation) or in the exact opposite (negative correlation). The other alternative, of course, is when there is no correlation and the two variables move independent of one another. So, using my “people who wear bathing suits are wet” example, this only says that people who wear bathing suits also tend to get wet (e.g., because they go in a pool). This isn’t always the case though, so the correlation might not be 100% because you don’t always get wet when you wear a bathing suit (i.e., you don’t go in the water). What this statement doesn’t say, however, is that wearing a bathing suit causes you to get wet. In addition, being wet doesn’t mean that you have on a bathing suit. The two pieces of information are related, but one doesn’t lead to another.

So what? you’re asking. The issue is when we present correlation as causation. This is best shown with an example. One thing I see all the time is companies trying to figure out how much a Facebook “fan” (now “Like”) is worth. One of the studies that bothers me the most (and that got a massive amount of industry press) was a finding from Syncapse that showed a Fan was worth $136.38 (PDF of full study). The way it was reported all over was that for every fan you get, it’s worth an additional $136.38 in revenue to the company. That is, the very fact of being a fan caused people to spend $136.38 more on the brand than they ordinarily would. That means that, if you were a marketer, you could spend $136.37 to acquire a fan and still come out ahead. That’s a big marketing insight. Hmmm…

Sound suspicious to you? It did to me, but I thought I was the only one since no one else seemed to want to point out the somewhat obvious flaw. Being a fan doesn’t cause you to spend more on the brand. You spend more on the brand than the average person because you like the brand. Because you like the brand, you become a fan. If you never use the brand or never heard of it, you’re probably not going to become a fan. So, your fans are made up almost entirely of people who spend more on your brand than average people…that’s why they’re called fans after all. Being a fan on Facebook doesn’t cause you to spend more. However, there is a correlation between being a fan on Facebook and how much you spend on the brand. They are related in some way (which is pretty obvious in this case).
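If you want to see how this plays out, here is a toy simulation in Python. To be clear, it is a sketch under made-up assumptions and has nothing to do with Syncapse’s actual data or method: every number in it (the 0 to 1 “affinity” score, the $200 scale, the noise) is hypothetical. In this model, how much someone likes the brand drives both their spending and their chance of becoming a fan, and becoming a fan adds exactly nothing to spending.

```python
import random
from statistics import mean


def simulate(n: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Return (average spend of fans, average spend of non-fans) in a toy model."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    fan_spend: list[float] = []
    other_spend: list[float] = []
    for _ in range(n):
        affinity = rng.random()                    # hypothetical: how much this person likes the brand
        spend = 200 * affinity + rng.gauss(0, 10)  # spending depends on affinity only
        is_fan = rng.random() < affinity           # liking the brand makes becoming a fan more likely
        # Being a fan never feeds back into spend: there is no causal effect in this model.
        (fan_spend if is_fan else other_spend).append(spend)
    return mean(fan_spend), mean(other_spend)


fans, others = simulate()
print(f"Average spend of fans:     ${fans:.2f}")
print(f"Average spend of non-fans: ${others:.2f}")
print(f"Naive 'value of a fan':    ${fans - others:.2f}")
```

Fans come out spending noticeably more than non-fans, even though the model gives fan status no effect at all. Subtracting one average from the other and calling it the “value of a fan” is the correlation-as-causation mistake in miniature.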

Causation versus correlation. Don’t get these mixed up. Don’t accept any study that seems to mix these up.

Those are my issues with a lot of the research I see come out these days. Watch for it if you’re reading research and if you’re publishing any. I will be.

So, what’s the actual number? How many people actually use social media for healthcare information? I looked to Pew Research and Susannah Fox for this since I’ve been through nearly every single one of their studies and dissected each and consistently find the highest standards of research practices. I asked Susannah about her views on the National Research study and she had this to say: “It worries me that people are so eager to promote a sensational headline. The goal of research should be to help people make good decisions based on sound data. Facebook is not a dominant source for health information. Not even close.” I’m obviously in agreement.

Susannah was able to share with me some new data from Pew that isn’t published yet, but will be made public later this month. She gave me some data from the 2010 version of the study, “The Social Life of Health Information.” You can access the 2008 version of the information here. So, here’s your sneak preview and the information that was shared with me for this post: “62% of internet users are on social network sites and, of those, just 15% get health information on the sites.”

What’s ironic about this is that it actually sounds fairly close to the National Research findings. It turns out to be close…maybe…but it does highlight another subtle, yet important difference between the two studies. Pew asked people first if they use the internet and then what percentage use social network sites (answer: 62%). National Research says: “1 in 5 Americans Use Social Media for Health Care Information.” However, they don’t account for Americans that don’t even use the Internet and those that use the Internet, but not social media sites. It doesn’t appear that they asked either question. However, it is an online survey, so I’ll assume that 100% of their sample uses the Internet, but 100% of Americans don’t. The next question should be: “Of those that use the Internet, what percentage use social media sites?” If they use social media sites, then you can ask “do you use social media sites for healthcare information?” If you don’t ask these questions, then you can’t claim “1 in 5 Americans” (which implies of ALL Americans) because what you actually are showing is “1 in 5 Americans who use the Internet and social media sites use social media for healthcare information.” Again, an important thing to pay attention to, especially if you plan to compare the results of different studies…which you should never do.
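The denominator question can be made concrete with a few lines of Python. This is just a sketch: the 62% and 15% come from the Pew quote above, while the share of all adults who use the internet is not given anywhere in this post, so the values in the loop are hypothetical placeholders and not real data.

```python
def share_of_parent(*conditional_shares: float) -> float:
    """Multiply nested 'of those' percentages to get a share of the outermost group."""
    result = 1.0
    for share in conditional_shares:
        result *= share
    return result


SOCIAL_AMONG_INTERNET_USERS = 0.62  # Pew: internet users who are on social network sites
HEALTH_AMONG_SOCIAL_USERS = 0.15    # Pew: of those, the share who get health information there

health_among_internet_users = share_of_parent(
    SOCIAL_AMONG_INTERNET_USERS, HEALTH_AMONG_SOCIAL_USERS
)
print(f"Share of internet users: {health_among_internet_users:.1%}")

# Hypothetical placeholders only: the post does not say what share of adults are online.
for internet_share in (0.70, 0.80):
    overall = share_of_parent(internet_share, health_among_internet_users)
    print(f"If {internet_share:.0%} of adults were online: {overall:.1%} of all adults")
```

Each “of those” step shrinks the number, so 15% of social network users works out to roughly 9% of internet users and less again of all adults. That is why a figure measured against one denominator can’t be set side by side with a figure measured against another.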

So, before you go out and quote the latest survey, dig in a bit deeper. Dig in even further if you’re planning on doing something different based on the results. There are lots of well-intentioned studies that end up being misleading and, unfortunately, a bunch of intentionally misleading ones as well. I don’t think National Research’s study fits into the latter category, but it is a good illustration of the former. When you see a study like this in the future and aren’t quite sure about whether you should believe the results, feel free to share it with me on Twitter or contact me. I love seeing them and looking for what’s good and bad.

23 thoughts on “Lies, Damn Lies, and Pharma Social Media Statistics”

  1. “Facts are stubborn things, but statistics are more pliable.” – Mark Twain.

    Very nice post Jonathan. Just more proof that just because you read it, don’t make it true. (Except your post, of course!)

    1. I’ve got to write that quote down for future use. I suspect I’ll be needing it.

  2. Entertaining and insightful piece on use of statistics to support value of social media to industry.

    1. Thanks…it’s tough to do “statistics” and “entertaining” at the same time. If I managed to do this, then my work is done…for now.

  3. Thanks for this article Jonathan. One of the reasons that I read your blog regularly is because it is evident that you do a lot of research prior to posting. I think many of us are aware that some information out there is misleading or inaccurate, but we don’t always have the time to do the research. We appreciate that you’re doing some of it for us 🙂

    1. I’m sort of a research and information nerd/snob, so I can’t really do it any other way. Thanks for being a regular reader and supporter.

  4. Jess seilheimer

    Acceptable and warranted rant. Cheers to proper research trial/survey design & methodology (including proper samples sizes & inclusion criteria for worthiness of survey subjects.) Double cheers for industry folks & journalists who know how to interpret data and appropriately report on them.

    1. Thanks, Jess. I know you share my healthy skepticism for things like this, so I was figuring you’d appreciate this post.

  5. Agree, great and thoughtful post Jonathan. Thx for doing all this digging and sharing. Good stuff!

  6. Gilles Frydman and Jamie Heywood would love to hear that their sites are top of mind for you, if not for the general U.S. population. But it’s simply not possible that respondents to our survey were thinking of ACOR & PatientsLikeMe.

    Here are the questions that led to our estimates:

    Q6ab: Do you use the internet, at least occasionally? Do you send or receive email, at least occasionally? [Asked of all adults]

    WEB A/B Act87: Do you ever use the internet to use a social networking site like MySpace, Facebook or [Asked of all internet users]

    Thinking specifically about what you have done on social networking sites like Facebook and MySpace, have you ever used these sites to get health information? [Asked only of those who answered “yes” to Act87]

  7. In January, the big news was that 71.2% of the US population is “on” Facebook. Last week, the big news was that a little over 50% of the US population is “on” Facebook. Naturally, it’s not clear what either of these studies means by “on.” And I suppose it also depends on what your definition of “is” is. (Sorry — could not let that one pass!)

    1. Hi Angelique,

      If you’re interested, my colleagues here at Pew Internet have unpacked that topic a bit. Here’s the section of our site where we publish all our reports, presentations, etc. related to social networking:

  8. Joe, your post is a wonderful example (along with the above post by Jonathan) of how a blogger can set right a story that a mainstream publication (like the Washington Post or CNN) gets wrong. I’m glad you wrote it, I’m sorry you have had so many occasions to update it!

  9. Dianne

    How did the Pew research wind up defining health info?? Do you happen to know?

    1. Hi Dianne,

      I direct the Pew Research Center’s health & technology research. We publish our questionnaires and topline data along with our reports, for free, on our site. The most recent data point about social network site use has not yet been published — I provided it to Jonathan early — but I can point you to some relevant, earlier studies.

      Here is a direct link to the 2009 report which contains our published data related to social network site use and health information:

      The Social Life of Health Information

      Click on “Explore Survey Questions” to see the topline results.

      The September 2010 survey has served as the basis for a few reports already published on our site, including this one, which goes into detail about how we ask respondents about health information:

      Health Topics

      Again, just click on the links beneath “Explore Survey Questions” to read how we phrase the questions.

      Our upcoming study will include the complete topline for the September 2010 survey so everyone can see the flow of the questions.

      Please let me know if you have other questions. You can find me on Twitter – @SusannahFox – or email me at sfox at pewinternet dot org.

  10. Abtutt

    This just reinforces one of my favorite sayings…Statistics are like a bikini, what they reveal is nice to look at, but what gets concealed is vital.

    1. Absolutely will be using that quote at some point.

  11. Great article on a personal sore point. One that I would add is “selection bias”. This is particularly common in white papers based on a particular website audience via online surveys…

    I mention this in one of my posts, it’s short… I hope adding a link is ok:

    1. Great point. Definitely one worth adding. So many studies have been doomed by this whether intentional or unintentional.

  12. Kstones

    Excellent post. I would posit that it is equally important for researchers to avoid terms like “just,” as a finding is contextual. To say “just” 15% use Facebook to find healthcare information implies it is small or unimportant. The key is the trajectory and context within larger cultural trends. Just a nit, but words are powerful.

    1. I hadn’t considered this point. Thanks for pointing it out. Something I’ll definitely keep in mind. Those little words do add a bit of editorial to the results and takes away some of the objectivity.

  13. Howard Steinberg

    Jonathan – You nailed it! I started a quick Google query in hopes of putting some teeth behind my belief that social media is WAY overrated as a source or influence for health decisions and there you were at top position and you delivered. But I’m hanging at the edge of the cliff – what’s the real number? Using Pew: can’t be more than 9% (62% x 15%) of internet users. Is my logic right?

    1. Thanks. Glad the post helped.

      I’d go with the Pew number: “62% of internet users are on social network sites and, of those, just 15% get health information on the sites.” Multiplying the 62% and 15% isn’t necessary because it’s not a distinct sub-group. 15% of people who use the social media sites on the internet get health information from these sites. If you want to know the percentage of Americans (instead of internet and social media users), you’d have to dig into Pew’s numbers a bit more. Same thing if you wanted users of the internet who get health information from social media sites. This would be less than 15%, as you’d be looking at a smaller number from a bigger potential pool (all internet users versus internet and social media users).
