For pharmaceutical brands in today’s market, search engine optimization is a critical component of every web initiative. Research by comScore (2010) suggests that over 80% of all Internet sessions start at a search engine, and when we say “search engine optimization,” we generally mean “optimization for Google.” Google searches account for over 80% of search traffic to our client sites, while the second most popular engine, Yahoo, drives less than 10% of that volume.
A major goal for any website project, and the purpose of SEO, is to get the site listed in the top few results on the first page of Google’s results for the most common search terms: the keywords that HCPs are typing into the search engine related to the site’s content. Unfortunately, many sites that have aggressively pursued that goal, or worked with unscrupulous SEO vendors, have recently been severely penalized by Google.
The Google PageRank Algorithm
Your site’s position in the results is based on Google’s proprietary measure, called “PageRank.” The algorithm by which the search giant calculates PageRank is a closely held secret, but in broad terms, they have divulged that it is composed of two major factors: on-site content (the relevance of the content on your site to the term being searched) and off-site links (the number of sites containing links to your site, the text content of those hyperlinks, and the authority of the sites containing those links).
Over the last 14 years, Google has made (sometimes daily) tweaks to their PageRank algorithm that reflect the evolving nature of the Internet. During 2007, they increased the weight placed on “fresh” content, particularly blog and news results linking to your site. Over 2009 and 2010, social reviews (Yelp, FourSquare, and so forth) became significantly more important for many sites, particularly for ranking in the local (Google Places) results. Recently, Google began giving these updates code names; “Panda” was the designation for a series of updates over 2011/12. In April 2012, Google lashed out in a major update (originally called the “Over Optimization Penalty,” then the “Webspam Algorithm Update,” and finally “Penguin”) that specifically targeted “overly optimized” sites.
Google and the SEO industry have spent much of the last 10 years locked in a pitched battle. SEO industry experts work very hard to figure out how to push sites up the search giant’s results and sell those services to a variety of clients; then Google changes its algorithm to outwit those methods, the SEO experts go back to the drawing board, and their previously optimized sites are penalized if not in alignment with Google’s new rules.
In general, these battles have been fought in those two key areas contributing to PageRank—on-site content and off-site links:
1) On-site keyword padding. SEO firms have often recommended using desirable keywords in multiple places on a page; URL, title, keywords, description, headlines, and links are typical. Where the page content is useful to users searching on those terms, this behavior is rewarded. However, Google is increasingly wary of sites that repeat large chunks of copy on multiple pages, or multiple sites that contain duplicate content. Loading every possible search term into your site’s “keywords” metatag, or hiding large chunks of invisible text on the page (text colored to match the page background), were two early techniques that Google has specifically targeted in previous algorithm adjustments.
2) Automated link generation. Since the first revelation of the importance of inbound links in building PageRank, several large players in the SEO space have developed sophisticated backlink generation programs. These have ranged from buying ads on popular sites and inserting links into those ad spaces (does anyone remember SearchKing?), to building domains that are largely link farms, to distributing press releases that included site links… all attempting to “trick” Google into believing that your site is regarded as useful by a large number of sites.
In 2010, SEO was a $16.6 billion industry.
The “Penguin” Update
Engineer Matt Cutts has been Google’s SEO guru for a number of years. In announcing the Penguin update, Cutts said: “The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines…. We see all sorts of webspam techniques every day, from keyword stuffing to link schemes that attempt to propel sites higher in rankings…. While we can’t divulge specifics, because we don’t want to give people a way to game our search results and worsen the experience for users, our advice for webmasters is to focus on creating high quality sites that create a good user experience and employ white hat SEO methods instead of engaging in aggressive webspam tactics.”
The only real way to tell if your site’s ranking was affected by the update is to review your analytics and see if your traffic from Google took a big hit around April 24, 2012. If it did, you need to do some work to get back on Google’s “good” list.
According to Glenn Gabe at G-Squared Interactive’s blog, you need to find and remove problematic inbound links, particularly:
- Paid text links using exact anchor text
- Comment spam
- Guest posts on questionable sites
- Article marketing sites
Open Site Explorer will quickly confirm if your site has a link problem.
Unless your SEO team has invested in dodgy links, it’s far more likely that Google’s targeting of duplicate content will be a problem for pharmaceutical sites for reasons we will examine in a bit. The official Google position is here, or for those who prefer video, here’s Matt Cutts’ take on the problem:
“In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”
Multiple versions of each page (for example web and print versions), content syndication, multiple URLs, and so forth may be a problem, but it will be on a case-by-case basis, not a systematic issue. By contrast, there should be universal concern among pharma marketers around Google’s guideline on boilerplate repetition: “Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.”
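Applied to a typical site footer, that guideline suggests a pattern like the following sketch (the company name and `/legal` URL are illustrative, not taken from any real site):

```html
<!-- Brief summary in the footer of every page... -->
<footer>
  <p>&copy; 2012 Example Pharma, Inc.
     <a href="/legal">Full legal and copyright information</a></p>
</footer>

<!-- ...with the lengthy legal copy living on the single /legal page
     it links to, so the boilerplate is not repeated site-wide. -->
```

The same principle — a short pointer on every page, the full text in one place — is exactly what becomes difficult when the repeated copy is a mandated ISI rather than a copyright notice.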
Duplicate Content and Pharma
The problem is that almost every pharma company’s legal review team insists on placing “Fair Balance” copy – usually the full Important Safety Information – on every page of their sites, which seems to be in violation of this guideline.
While Google doesn’t automatically see duplicate content as grounds for penalty (only “if it is used in a deceptive way or to manipulate search results”), the concern is that Penguin is totally algorithmic – there’s no human intervention, and consequently there’s potential for the predominance of ISI content to be misinterpreted as an attempt to spam. Since the initial rollout of Penguin on 4/24, there have already been two “clarifying” updates, and with every update there’s potential for our industry’s sites to be reevaluated negatively by the search giant.
Fixing the Problem
So what’s the solution?
Google’s Webmaster Guidelines provide a number of recommendations for managing duplicate content at the page level. You can tell Google to ignore specific pages using meta tags or a robots.txt file, and you can tell Google which version of a page you prefer it to index by placing canonical tags on the non-preferred versions. But there’s no “official” way of excluding just a portion of a page from the search engine’s crawlers.
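As a sketch of those page-level controls (the paths and domain here are placeholders, not from any real site):

```html
<!-- In the <head> of a page you want kept out of the index entirely: -->
<meta name="robots" content="noindex, nofollow">

<!-- In the <head> of a duplicate version (e.g. a print version),
     pointing Google at the preferred URL to index: -->
<link rel="canonical" href="http://www.example.com/prescribing-information">
```

The robots.txt equivalent of the first tag is a `Disallow: /print/` line under `User-agent: *`, which blocks crawling of everything in that directory. Note that all of these operate on whole pages — none can carve out a single block of copy, which is precisely the ISI problem.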
This problem is not new. Google’s overall guidelines have not changed significantly in the last 4 years, and forward-thinking pharma site developers have adopted a number of tactics to minimize the potential impact of pervasive ISIs. These solutions include:
1) Converting the ISI on all interior pages of a site to an image. While this solves the problem of duplicate content in Google – which cannot read text inside images – with substantial ISIs the image can considerably slow down the page load, and in any event an image cannot resize dynamically (for example, to display on a smartphone browser, or when the user is working on a small computer screen).
2) Using search-engine-specific exclusion codes. Within Google’s Search Appliance — a version of Google that is available for internal site search — there exists a set of tags to disable the indexing of content and links using <!--googleoff: all-->Material to be ignored<!--googleon: all--> comment tags. The issue is that it’s not clear whether these tags apply to the larger search engine; current thinking is that the main Google robots will continue to crawl content tagged in this fashion. Another option might be the “robots-nocontent” tag introduced in 2007 by Yahoo!, which provided similar functionality via a class value that could be added to any element. Unfortunately, the tag was not widely adopted, and since Yahoo has been using the Bing engine since 2009, it may not even work for the engine that introduced it.
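For reference, both mechanisms look like this in markup (the ISI copy is a placeholder), with the caveat above that neither is documented to work in Google’s main web index:

```html
<!-- Google Search Appliance exclusion comments -->
<!--googleoff: all-->
<p>Important Safety Information repeated on every page of the site.</p>
<!--googleon: all-->

<!-- Yahoo's robots-nocontent class (of doubtful effect since the
     move to the Bing engine) -->
<div class="robots-nocontent">
  <p>Important Safety Information repeated on every page of the site.</p>
</div>
```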
3) Loading the ISI via AJAX. Because Google’s crawlers have historically not executed JavaScript, content pulled into the page dynamically after load has been effectively invisible to the index. Google has signaled that its crawler is getting better at indexing AJAX-driven content, however, so this approach may not be future-proof.

4) Putting the ISI in an iFrame. The “brute force” approach is to create a separate HTML page that contains only the ISI, add the page-level “noindex” metatags and exclusions in robots.txt for that page, then load this page into every page in an iFrame (except the index page, which should contain the ISI as part of the page content). Using CSS to control the size of the iFrame gives some flexibility in display at different screen resolutions and prevents scrollbars from appearing (for most MLR teams among our clients, it’s critical that the ISI not be visually distinguished from the other page content). In the event that Google’s new crawler does start to index AJAX-driven content, this will be our fallback approach.
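A minimal sketch of the iFrame approach (file names and copy are illustrative):

```html
<!-- isi.html: a standalone page containing only the ISI,
     excluded from the index via its own meta tag -->
<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex, nofollow">
  <title>Important Safety Information</title>
</head>
<body>
  <p>Full Important Safety Information copy goes here.</p>
</body>
</html>

<!-- On every interior page, the ISI is pulled in via an iFrame,
     sized with CSS so it blends into the surrounding content: -->
<iframe src="/isi.html" class="isi-frame"
        scrolling="no" frameborder="0"></iframe>
```

A matching `Disallow: /isi.html` entry in robots.txt completes the exclusion, while the index page keeps the ISI inline so searchers landing there still see it as indexed content.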
If you *have* been hit by Penguin, and you think it’s a mistake (or you have remedied the issues you identify with your site), you can submit a reconsideration request here.
The Best Approach to SEO
It’s more critical than ever that you do not attempt to “trick” Google. Since its inception, the company has been unusually open about what it expects from site owners, publishing Webmaster Guidelines and discussing algorithm changes on its blog:
“Webmasters who spend their energies upholding the spirit of the basic principles will provide a much better user experience and subsequently enjoy better ranking than those who spend their time looking for loopholes they can exploit.”
“Provide high-quality content on your pages, especially your homepage… If your pages contain useful information, their content will attract many visitors and entice webmasters to link to your site.”
Follow this advice, and SEO will largely take care of itself.