Somewhere along the way, I noticed that I had developed a really low tolerance for the abuse of legitimate research and statistics in pursuit of a great headline. When I see these headlines, I’m usually among the first to dig into the “research” and figure out the real story. What’s missing? What methods were used? Do the actual conclusions match the headlines? A favorite of mine is the chart at the top of this post (courtesy of the P.A.P. Blog). Not every headline purporting to be backed by statistics makes sense, even if it appears to on the surface. This one appears to show that increasing the number of lemons imported to the US from Mexico reduces highway fatalities.
Healthcare social media is a pretty hot topic of course, so getting anything out about this topic is sure to get some attention. This fact makes me even more suspicious when I see a headline pop up about some new research finding published in major media outlets. Of course, where I see these things come up first is on Twitter, where there’s an echo chamber effect of people sharing the same story back and forth, with hashtags that make it simple to follow along. (Pro Tip: stay tuned to the end of this post for a sneak preview of some unpublished Pew Research data about healthcare social media use).
One research study that came up recently, and that I still see pop up from time to time, came from the National Research Corporation. They released the results of a survey via press release with the enticing title: “1 in 5 Americans Use Social Media for Health Care Information.” Here are the first two paragraphs of the press release:
“LINCOLN, Neb. – February 28, 2011 – One in five Americans use social media websites as a source of health care information, according to National Research Corp.’s Ticker survey, the largest, most up-to-date poll on consumer health care opinions and behaviors.
Facebook topped the list of available websites, with 94 percent of respondents indicating they’ve used the popular social network to gather information on their health care. 32 percent used YouTube, a video sharing site. Twitter, an emerging micro-blog site for B2C communication, landed in third with only 18 percent of respondents – tying with MySpace. FourSquare, a location-based website, garnered only 2 percent response.”
I have a lot of problems with this bit of research, and I’m going to outline them not to attack this particular study, but as a lesson to everyone about how to do these things the right way and what I believe is the wrong way. There are several issues that I see over and over again with data like this, so I’ll list them here so that you are a bit more alert to potential problems with the research you come across each day.
- Issue 1: “Rounding Up”: The headline of this release is “one in five Americans”, but the actual number from their data is 15.8%. One in five is 20% and makes for a nice round number and sounds much better than “one in 6.25 Americans” or “4 in 25 Americans” (if you don’t like quarter people). For those who don’t like to do the math, “1 in 5” is 25% higher than “1 in 6.25,” which is pretty significant.
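To make the rounding concrete, here’s a quick back-of-envelope check (the 15.8% comes from the spreadsheet discussed below; the “1 in 6.25” is the post’s rounding of that figure to 16%):

```python
# How much does "1 in 5" (20%) inflate the claim versus "1 in 6.25" (16%)?
headline = 1 / 5        # the press release's "1 in 5" = 20%
rounded_actual = 1 / 6.25   # 16%, i.e. the actual 15.8% rounded

# Relative inflation of the headline over the (rounded) actual figure
inflation = (headline - rounded_actual) / rounded_actual
print(f"headline is {inflation:.0%} higher than the actual figure")

# Against the unrounded 15.8%, the gap is even a touch larger:
print(f"vs. 15.8%: {(headline - 0.158) / 0.158:.0%} higher")
```

Same point either way: the “nice round number” quietly adds a quarter to the finding.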
- Issue 2: “Transparency of Data”: You’re probably asking, “how is it, Jonathan, that you know the actual number and I don’t?” Well, I asked the company for the actual results so I could review them for myself and they sent me this spreadsheet. That’s the only place you can find the real numbers. Without it, you’d have to get all your information from a press release. This is issue number two for me. If you know that you’re releasing a somewhat provocative piece of data, why not show the full information? Why only make a press release available? If you do this, you shouldn’t expect anyone with even a slight academic bent to place any confidence in the results.
- Issue 3: “Proper Context”: Paragraph (actually sentence) number one of the release gives us the one in five number and mentions how big the survey was. Paragraph two leads with: “Facebook topped the list of available websites, with 94 percent of respondents indicating they’ve used the popular social network to gather information on their health care.” What’s missing here is a very important qualifier, which if included would make the sentence read like this (I added the bold): “Facebook topped the list of available websites, with 94 percent of respondents **who said they use social media as a source of healthcare information** indicating they’ve used the popular social network to gather information on their health care.” There’s a big difference between these two sentences. One seems to imply that 94% of Americans use Facebook as a source of healthcare information. The other implies that 94% of the 1 in 5 people who use social media as a source of healthcare information select Facebook as their top source. So, that’s actually not 94% of people, but rather 18.8% of Americans (94% of 20%) and, as we’ve already seen, it should really be about 14.9% of Americans who indicated Facebook was their top source, since the real number isn’t 20% but 15.8%.
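The arithmetic for turning that conditional 94% back into a share of all respondents is just one multiplication (using the 15.8% from the spreadsheet):

```python
# "94% of social-media health-info users use Facebook" is a conditional
# figure. The unconditional share of ALL respondents is the product:
p_social = 0.158            # use social media for healthcare info
p_fb_given_social = 0.94    # of those, use Facebook for it

p_fb_overall = p_fb_given_social * p_social
print(f"{p_fb_overall:.1%} of respondents, not 94%")
```

Dropping the qualifier is how a roughly-15% finding gets read as a 94% finding.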
The problem with reporting data like this is that people quickly take bits out of context. This really isn’t National Research’s fault, but I think being very explicit about sound bites of data is critical when you release them so as to minimize any chance of pieces being taken out of context.
- Issue 4: “Mixing Data and Opinion”: Researchers are expected to provide some interpretation of their data when they report it. It’s the “Discussion” section of any good research write-up. However, a critical point here is that what was found in the study and what is interpretation of the data need to be clearly separated: objective versus subjective. That is, I need to know when something was stated by respondents and when something is the opinion of the author. Case in point from the National Research press release: “Americans think highly of the usability of social media but are tempered in crowning it the premiere source of health care information when considering all options.” This is “supported” by two other pieces of data from the survey:
“When asked social media’s influence, 1 in 4 respondents said it was “very likely” or “likely” to impact their future health care decisions.”
“When asked their level of trust in social media, 32 percent said “very high” or “high”, only 7.5 percent said “very low”.”
I’m not sure how these two bits of data support either “Americans think highly of the usability of social media” or “crowning it the premiere source of health care information when considering all options.” What the statement seems to imply is that, first, Americans think highly of the usability of social media (i.e., it’s easy to use) and, second, that they are in some way considering it as a potential substitute for other sources of healthcare information. This would be fine except for the fact that there are no survey questions related to either point. Again, how do I know this and you don’t? I asked for the survey questions (see Issue 2 again), which you can find here. There are no questions about the “usability” of social media nor any questions about how social media sources rank against any other sources when it comes to healthcare. I don’t mind if someone makes this leap, but temper it with a clear statement indicating that it’s opinion instead of implying that it comes from the actual results.
- Issue 5: “Definitions”: One of my biggest issues with this particular study is the use of the term “healthcare information,” as in “One in five Americans use social media websites as a source of health care information.” What does this mean? This could range from doing in-depth, fact-based, serious research on a medical condition to mentioning in a status update that you hit your funny bone. When you leave it up to respondents to define a term like this (as National Research did), expect people to interpret what it means in their own way. They asked: “Do you use social media (e.g. Facebook, Twitter, MySpace) as a source of healthcare information?” My response would be: “It depends on what you mean by healthcare information,” but that wasn’t an option. This is a critical point because not only do respondents interpret what this means, but so do those reading the results. Again, one person reads “as a source of healthcare information” as serious research and another reads it as throwaway comments on a friend’s funny update.
- Issue 6: “Getting the Details Right”: I first found out about National Research’s survey when I read about it on CNN’s “The Chart” blog. It’s a pretty reliable source of good information and is written by people that I respect. However, they got this story wrong. CNN reported this in their review of the survey: “In the survey of nearly 23,000 people in the United States, 41% said they use social media as a source of health care information. For nearly all of them – 94% – Facebook was their site of choice, with YouTube coming in a distant second at 32% [emphasis added].” The problem is that 41% isn’t the actual number. As we’ve already seen, the actual number is 15.8% use social media as a source of healthcare information. I would have even accepted one in five, as National Research put in their press release. How did CNN come up with 41%?
CNN and National Research share the blame on this one. If you take a look at the detailed spreadsheet with the full findings of the survey, you’ll see how this happened. Here’s the piece of the spreadsheet in question:
What you see here, at a glance, is exactly what CNN reported. 40.8% “use social media as a source of healthcare information.” However, upon closer inspection, what you see is that those percentages shouldn’t be percentages at all, because they actually represent ages. That is, among those who said they use social media as a source of healthcare information, the average age of respondents was 40.8. For those who don’t use social media this way, the average age was 47.8. I’ve clarified this with National Research and they confirmed that this is indeed what these rows are supposed to communicate.
So, regarding getting the details right, CNN should have been more careful in how they reviewed the results, but National Research has an obligation to report their findings in a manner that is clear and without errors. CNN still has the wrong (much higher) data in their article, which makes the findings of the survey seem even more sensational.
- BONUS…Issue 7: “Correlation versus Causation”: This issue wasn’t something that I observed with National Research’s work, but it is something I see all the time and, in fact, represents the most dangerous studies, so I’m going to include it here. The issue is when authors of surveys report a connection between results as causal versus correlated. It’s a finer, but absolutely critical point. Causation means that one thing caused another to happen. For example, I dumped a bucket of water on you, which caused you to get wet. Correlation means that two variables are somehow related. For example, people who wear bathing suits are wet.
The causation part always makes sense to people, but the correlation part is a bit trickier. Basically, the idea is that two pieces of information or variables tend to move in the same direction (positive correlation) or in the exact opposite (negative correlation). The other alternative, of course, is when there is no correlation and the two variables move independently of one another. So, using my “people who wear bathing suits are wet” example, this only says that people who wear bathing suits also tend to get wet (e.g., because they go in a pool). This isn’t always the case though, so the correlation might not be 100% because you don’t always get wet when you wear a bathing suit (i.e., you don’t go in the water). What this statement doesn’t say, however, is that wearing a bathing suit causes you to get wet. In addition, being wet doesn’t mean that you have on a bathing suit. The two pieces of information are related, but one doesn’t lead to the other.
So what?, you’re asking. The issue is when we present correlation as causation. This is best shown with an example. One thing I see all the time is companies trying to figure out how much a Facebook “fan” (now “Like”) is worth. One of the studies that bothers me the most (and that got a massive amount of industry press) was a finding from Syncapse that showed a Fan was worth $136.38 (PDF of full study). The way it was reported all over was that for every fan you get, it’s worth an additional $136.38 in revenue to the company. That is, the very fact of being a fan caused people to spend $136.38 more on the brand than they ordinarily would. That means you could spend $136.37 to acquire a fan and you’d still come out ahead as a marketer. That’s a big marketing insight. Hmmm…
Sound suspicious to you? It did to me, but I thought I was the only one since no one else seemed to want to point out the flaw and somewhat obvious observation. Being a fan doesn’t cause you to spend more on the brand. You spend more on the brand than the average person because you like the brand. Because you like the brand, you become a fan. If you never use the brand or never heard of it, you’re probably not going to become a fan. So, your fans are made up almost entirely of people who spend more on your brand than average people…that’s why they’re called fans after all. Being a fan on Facebook doesn’t cause you to spend more. However, there is a correlation between being a fan on Facebook and how much you spend on the brand. They are related in some way (which is pretty obvious in this case).
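This selection effect is easy to demonstrate with a toy simulation. Note this is a hypothetical model I’m making up to illustrate the point, not anything from the Syncapse study: “affinity” for the brand drives both spending and fandom, and fandom has zero causal effect on spend.

```python
import random
random.seed(0)

# Toy model: brand affinity drives BOTH spending and becoming a fan.
# Being a fan has NO causal effect on spend in this model.
people = []
for _ in range(10_000):
    affinity = random.random()       # 0 = indifferent, 1 = loves the brand
    spend = 50 + 200 * affinity      # spend depends ONLY on affinity
    is_fan = affinity > 0.8          # only enthusiasts bother to become fans
    people.append((spend, is_fan))

fan_spend = [s for s, f in people if f]
non_fan_spend = [s for s, f in people if not f]
gap = sum(fan_spend) / len(fan_spend) - sum(non_fan_spend) / len(non_fan_spend)
print(f"fans out-spend non-fans by ${gap:.2f}, with zero causal effect")
```

Fans out-spend non-fans by a wide margin even though fandom causes nothing here; divide that gap by the fan count and you’ve “found” a dollar value per fan that pure selection produced.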
Causation versus correlation. Don’t get these mixed up. Don’t accept any study that seems to mix these up.
Those are my issues with a lot of the research I see come out these days. Watch for it if you’re reading research and if you’re publishing any. I will be.
So, what’s the actual number? How many people actually use social media for healthcare information? I looked to Pew Research and Susannah Fox for this since I’ve been through nearly every single one of their studies, dissected each, and consistently find the highest standards of research practice. I asked Susannah about her views on the National Research study and she had this to say: “It worries me that people are so eager to promote a sensational headline. The goal of research should be to help people make good decisions based on sound data. Facebook is not a dominant source for health information. Not even close.” I’m obviously in agreement.
Susannah was able to share with me some new data from Pew that isn’t published yet, but will be made public later this month. She gave me some data from the 2010 version of the study, “The Social Life of Health Information.” You can access the 2008 version of the information here. So, here’s your sneak preview and the information that was shared with me for this post: “62% of internet users are on social network sites and, of those, just 15% get health information on the sites.”
What’s ironic about this is that it actually sounds fairly close to the National Research findings. It turns out to be close…maybe…but it does highlight another subtle, yet important difference between the two studies. Pew asked people first if they use the internet and then what percentage use social network sites (answer: 62%). National Research says: “1 in 5 Americans Use Social Media for Health Care Information.” However, they don’t account for Americans that don’t even use the Internet, or those that use the Internet but not social media sites. It doesn’t appear that they asked either question. It is an online survey, so I’ll assume that 100% of their sample uses the Internet, but 100% of Americans don’t. The next question should be: “Of those that use the Internet, what percentage use social media sites?” If they use social media sites, then you can ask “do you use social media sites for healthcare information?” If you don’t ask these questions, then you can’t claim “1 in 5 Americans” (which implies ALL Americans) because what you are actually showing is “1 in 5 Americans who use the Internet and social media sites use social media for healthcare information.” Again, an important thing to pay attention to, especially if you plan to compare the results of different studies…which you should never do.
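Chaining Pew’s conditional figures down to a common base makes the difference obvious. Multiplying the two shares Pew reports gives the fraction of all internet users, which is the kind of base the National Research headline skips over:

```python
# Pew reports two conditional figures; chain them to a share of ALL
# internet users (the base National Research's "1 in 5 Americans" skips).
p_sns_given_internet = 0.62   # internet users who are on social network sites
p_health_given_sns = 0.15     # of those, get health info on the sites

p_health_given_internet = p_sns_given_internet * p_health_given_sns
print(f"{p_health_given_internet:.1%} of internet users")
```

That works out to roughly 9.3% of internet users, and a smaller share still of all Americans once non-internet-users are counted, which is a long way from “1 in 5 Americans.”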
So, before you go out and quote the latest survey, dig in a bit deeper. Dig in even further if you’re planning on doing something different based on the results. There are lots of well-intentioned studies that end up being misleading and, unfortunately, a bunch of intentionally misleading ones as well. I don’t think National Research’s study fits into the latter category, but it is a good illustration of the former. When you see a study like this in the future and aren’t quite sure about whether you should believe the results, feel free to share it with me on Twitter or contact me. I love seeing them and looking for what’s good and bad.