Will Penguin Freeze Out Pharma?

For pharmaceutical brands in today’s market, search engine optimization is a critical component of every web initiative. Research by ComScore (2010) suggests that over 80% of all Internet sessions start at a search engine. When we say “search engine optimization,” we generally mean “optimization for Google”: Google searches account for over 80% of search traffic to our client sites, while the second most popular engine, Yahoo, drives less than 10% of that volume.

A major goal for any website project, and the purpose of SEO, is to get the site listed in the top few results on the first page of Google’s results for the most common search terms: the keywords that healthcare professionals (HCPs) type into the search engine when looking for content like the site’s. Unfortunately, many sites that have aggressively pursued that goal – or worked with unscrupulous SEO vendors – have recently been severely penalized by Google.

The Google PageRank Algorithm

Your site’s position in the results is based on Google’s proprietary measure, called “PageRank.” The algorithm by which the search giant calculates PageRank is a closely held secret, but in broad terms, Google has divulged that it depends on two major factors: on-site content (the relevance of the content on your site to the term being searched) and off-site links (the number of sites containing links to your site, the text content of those hyperlinks, and the authority of the sites containing those links).

Over the last 14 years, Google has made (sometimes daily) tweaks to its PageRank algorithm that reflect the evolving nature of the Internet. During 2007, it increased the importance of “fresh” content, placing more weight on blog and news results linking to your site. Over 2009 and 2010, social reviews – Yelp, FourSquare, and so forth – became significantly more important for many sites, particularly for ranking in the local (Google Places) results. Recently, Google began giving these updates code names: “Panda” was the designation for a series of updates over 2011/12. In April 2012, Google lashed out in a major update (originally called the “Over Optimization Penalty”, then the “Webspam Algorithm Update”, and finally “Penguin”) that specifically targeted “overly optimized” sites.

Google and the SEO industry have spent much of the last 10 years locked in a pitched battle. SEO experts work hard to figure out how to push sites up the search giant’s results and sell those services to a variety of clients; Google then changes its algorithm to outwit those methods. The SEO experts go back to the drawing board, while their previously optimized sites are penalized if they are not in alignment with Google’s new rules.

In general, these battles have been fought in those two key areas contributing to PageRank—on-site content and off-site links:

1)     On-site keyword padding. SEO firms have often recommended using desirable keywords in multiple places on a page – URL, title, keywords, description, headlines and links are typical. Where the page content is useful to users searching on those terms, this behavior is rewarded. However, Google is increasingly wary of sites that repeat large chunks of copy on multiple pages, or multiple sites that contain duplicate content. Loading every possible search term into your site’s “keyword” metatag, or putting large chunks of invisible text (text color matching page color) were two early techniques that Google has specifically targeted in previous algorithm adjustments.

2)     Automated link generation. Since the first revelation of the importance of links to your site in building PageRank, several large players in the SEO space have developed sophisticated backlink generation programs. These have ranged from buying ads on popular sites and inserting links into those ad spaces (does anyone remember SearchKing?), to building domains that are largely link farms, to distributing PR that included site links… all attempting to “trick” Google into believing that your site is regarded as useful by a large number of sites.

In 2010, this was a $16.6 billion industry.

The “Penguin” Update

Engineer Matt Cutts has been Google’s SEO guru for a number of years. In announcing the Penguin update, Cutts said: “The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines…. We see all sorts of webspam techniques every day, from keyword stuffing to link schemes that attempt to propel sites higher in rankings…. While we can’t divulge specifics, because we don’t want to give people a way to game our search results and worsen the experience for users, our advice for webmasters is to focus on creating high quality sites that create a good user experience and employ white hat SEO methods instead of engaging in aggressive webspam tactics.”

The only real way to tell if your site’s ranking was affected by the update is to review your analytics and see if your traffic from Google took a big hit around April 24. If it did, you need to do some work to get back on Google’s “good” list.
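As a rough illustration of that analytics check, the sketch below compares average daily visits from Google before and after a cutoff date. The data shape, the function names, and the sample numbers are all made up for the example; in practice you would export the daily figures from your own analytics tool.

```javascript
// Hypothetical input shape: [{ date: 'YYYY-MM-DD', visits: number }, ...]
// where "visits" is the number of daily visits referred by Google.

function averageVisits(rows) {
  if (rows.length === 0) return 0;
  const total = rows.reduce((sum, row) => sum + row.visits, 0);
  return total / rows.length;
}

// Compare the average before the cutoff with the average on or after it.
// ISO-formatted dates sort correctly as plain strings.
function trafficChange(rows, cutoffDate) {
  const before = rows.filter((row) => row.date < cutoffDate);
  const after = rows.filter((row) => row.date >= cutoffDate);
  const avgBefore = averageVisits(before);
  const avgAfter = averageVisits(after);
  // Relative change: -0.5 means traffic halved after the cutoff.
  const change = avgBefore === 0 ? 0 : (avgAfter - avgBefore) / avgBefore;
  return { avgBefore, avgAfter, change };
}

// Made-up sample data around the April 24 rollout.
const sample = [
  { date: '2012-04-21', visits: 400 },
  { date: '2012-04-22', visits: 420 },
  { date: '2012-04-23', visits: 380 },
  { date: '2012-04-24', visits: 210 },
  { date: '2012-04-25', visits: 190 },
  { date: '2012-04-26', visits: 200 },
];

const result = trafficChange(sample, '2012-04-24');
console.log(result.change); // -0.5
```

A real comparison should cover several weeks on each side of the date and compare like weekdays, since many sites see regular weekly swings that have nothing to do with Google.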

According to Glenn Gabe, writing on the G-Squared Interactive blog, you need to find and disable problematic inbound links, particularly:

  • Paid text links using exact anchor text
  • Comment spam
  • Guest posts on questionable sites
  • Article marketing sites

Open Site Explorer will quickly confirm if your site has a link problem.

Unless your SEO team has invested in dodgy links, it’s far more likely that Google’s targeting of duplicate content will be a problem for pharmaceutical sites for reasons we will examine in a bit. The official Google position is here, or for those who prefer video, here’s Matt Cutts’ take on the problem:

“In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”

Multiple versions of each page (for example web and print versions), content syndication, multiple URLs, and so forth may be a problem, but it will be on a case-by-case basis, not a systematic issue. By contrast, there should be universal concern among pharma marketers around Google’s guideline on boilerplate repetition: “Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.”

Duplicate Content and Pharma

The problem is that almost every pharma company’s legal review team insists on placing “Fair Balance” copy – usually the full Important Safety Information – on every page of their sites, which seems to be in violation of this guideline.

While Google doesn’t automatically see duplicate content as grounds for penalty (only “if it is used in a deceptive way or to manipulate search results”), the concern is that Penguin is totally algorithmic – there’s no human intervention, and consequently there’s potential for the predominance of ISI content to be misinterpreted as an attempt to spam. Since the initial rollout of Penguin on 4/24, there have already been two “clarifying” updates, and with every update there’s potential for our industry’s sites to be reevaluated negatively by the search giant.
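To see how a long ISI can dominate the text of otherwise unique pages, here is a small sketch that scores two pages by the share of distinct words they have in common (Jaccard similarity). This is only an illustrative measure chosen for the example; Google has not published how it detects duplicate content, and the placeholder “ISI” below is dummy text, not real safety information.

```javascript
// Split text into a set of distinct lowercase words.
function wordSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Jaccard similarity: shared distinct words / all distinct words.
function jaccardSimilarity(textA, textB) {
  const setA = wordSet(textA);
  const setB = wordSet(textB);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  for (const word of setA) {
    if (setB.has(word)) shared += 1;
  }
  return shared / (setA.size + setB.size - shared);
}

// Dummy 12-word stand-in for an ISI that appears on every page.
const isi =
  'important safety information placeholder alpha beta gamma delta epsilon zeta eta theta';

// Two pages whose own body copy shares no words at all.
const bodyA = 'dosing overview';
const bodyB = 'patient support';

console.log(jaccardSimilarity(bodyA, bodyB)); // 0
console.log(jaccardSimilarity(bodyA + ' ' + isi, bodyB + ' ' + isi)); // 0.75
```

Even in this toy case, two pages with entirely different body copy look 75% alike once the shared block is appended, which is the kind of signal an automated check could misread.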

Fixing the Problem

So what’s the solution?

Google’s Webmaster Guidelines provide a number of recommendations to minimize the impact of page-level content. You can tell Google to ignore specific pages using meta tags or a robots.txt file, and you can tell Google which version of a page you prefer them to include in the index using canonical tags on the non-preferred versions. But there’s no “official” way of excluding just a portion of a page from the search engine’s crawlers.
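For reference, the page-level mechanisms mentioned above look roughly like the output of the sketch below. The helper names and the example.com URLs are hypothetical; the generated markup uses the standard robots meta tag, the rel="canonical" link element, and robots.txt Disallow rules.

```javascript
// Minimal escaping for a value placed inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Goes in the <head> of a page that should stay out of the index.
function noindexMetaTag() {
  return '<meta name="robots" content="noindex">';
}

// Goes in the <head> of each non-preferred version of a page
// (for example, a print version), pointing at the preferred URL.
function canonicalLinkTag(preferredUrl) {
  return '<link rel="canonical" href="' + escapeAttr(preferredUrl) + '">';
}

// Builds the text of a robots.txt file that asks all crawlers
// not to fetch the given paths.
function robotsTxt(disallowedPaths) {
  const lines = ['User-agent: *'];
  for (const path of disallowedPaths) {
    lines.push('Disallow: ' + path);
  }
  return lines.join('\n') + '\n';
}

console.log(noindexMetaTag());
console.log(canonicalLinkTag('https://www.example.com/dosing'));
console.log(robotsTxt(['/print/']));
```

All three operate on whole pages or URLs, which is why none of them solves the problem of a block of text repeated inside otherwise unique pages.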

This problem is not new. Google’s overall guidelines have not changed significantly in the last 4 years, and forward-thinking pharma site developers have adopted a number of tactics to minimize the potential impact of pervasive ISIs. These solutions include:

1)     Converting the ISI on all interior pages of a site to an image. While this solves the problem of duplicate content in Google – which cannot read text inside images – with substantial ISIs the image can considerably slow down the page load, and in any event an image cannot resize dynamically (for example, to display on a smartphone browser, or when the user is working on a small computer screen).

2)     Using search-engine specific exclusion codes. Within Google’s Search Appliance – a version of Google that is available for internal site search – there exists a set of tags to disable the indexing of content and links: <!--googleoff: all-->Material to be ignored<!--googleon: all-->. The issue is that it’s not clear whether these tags apply to the larger search engine; current thinking is that the main Google robots will continue to crawl content tagged in this fashion. Another option might be the “robots-nocontent” class introduced in 2007 by Yahoo!, which provided similar functionality through a class value that could be added to any element. Unfortunately, it was not widely adopted, and since Yahoo’s search results have been powered by the Bing engine under an agreement announced in 2009, it may not even work for the engine that introduced it.

3)     Dynamically loading the ISI text asynchronously using AJAX. This is the solution we’ve adopted for many sites. In essence, we allow the page to load completely, and then asynchronously replace an ISI placeholder with the actual text of the ISI using JavaScript. Because search engine web crawlers ignore most JavaScript, only the placeholder text is indexed, but a visitor using a conventional browser (including on a smartphone) sees the full ISI on every page. The full ISI should be included in the markup of the site’s home page, to ensure that it is indexed, and this technique used on all interior pages. There’s currently some debate in the development community around whether the latest iteration of the Google spider will render and index AJAX-driven sites, and we are watching our site results with interest.

4)     Putting the ISI in an iFrame. The “brute force” approach is to create a separate HTML page that contains only the ISI, add the page-level “noindex” metatag and an exclusion in robots.txt for that page, then load this page in an iFrame on every page (except the index page, which should contain the ISI as part of the page content). Using CSS to control the size of the iFrame gives some flexibility in display on different screen resolutions, and prevents scrollbars from appearing (for most MLR teams among our clients, it’s critical that the ISI not be distinguished from the other page content). In the event that Google’s new crawler does start to index AJAX-driven content, this will be our fallback approach.

Our Recommendation

We’ve developed a JavaScript class (available for download here, courtesy of Dino Gravato) that uses the jQuery library and some custom code to accomplish the third option. Feel free to use it as you like. Simply place your ISI into the /js/sisicontent/ directory as a file called isicontent.txt and it will replace the <div id="isi"> in the HTML at runtime. All necessary JavaScript files have been included (you may omit jquery.js if that library is already in use on the pages of your site). The code has been tested on all browsers with market share over 5% as of May 2012 (including Mobile Safari, Chrome, Firefox, and Internet Explorer 6+), but is supplied without warranty or any guarantee of success.
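The downloadable class is not reproduced here, but the general idea can be sketched as follows. This is a minimal illustration, not the code in the download: it uses the browser’s standard fetch API rather than jQuery, the function names are made up for the example, and only the file path and the “isi” element id are taken from the description above.

```javascript
// Path and element id as described in the article; everything else is hypothetical.
const ISI_URL = '/js/sisicontent/isicontent.txt';
const ISI_ELEMENT_ID = 'isi';

// Put the ISI text into the placeholder element.
// textContent is used so the file is treated as plain text, not markup.
function insertIsi(container, isiText) {
  container.textContent = isiText;
  return container;
}

// Find the placeholder, fetch the ISI file, and fill the placeholder in.
// "doc" and "fetchFn" are passed in so the function can be exercised
// outside a browser with simple stand-ins.
function loadIsi(doc, fetchFn) {
  const container = doc.getElementById(ISI_ELEMENT_ID);
  if (!container) return Promise.resolve(null);
  return fetchFn(ISI_URL)
    .then((response) => {
      if (!response.ok) throw new Error('Could not load ISI: ' + response.status);
      return response.text();
    })
    .then((text) => insertIsi(container, text));
}

// In a browser, run once the page markup is available.
if (typeof document !== 'undefined' && typeof fetch === 'function') {
  const run = () =>
    loadIsi(document, (url) => fetch(url)).catch((err) => console.error(err));
  if (document.readyState === 'loading') {
    document.addEventListener('DOMContentLoaded', run);
  } else {
    run();
  }
}
```

If the request fails, the placeholder text stays on the page, so the placeholder itself should at least point visitors to where the full ISI can be read.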

Next Steps

If you *have* been hit by Penguin, and you think it’s a mistake (or you have remedied the issues you identified with your site), you can submit a reconsideration request here.

The Best Approach to SEO

It’s more critical than ever that you do not attempt to “trick” Google. Since their inception, they’ve been completely transparent, publishing Webmaster Guidelines as part of their blog.

“Webmasters who spend their energies upholding the spirit of the basic principles will provide a much better user experience and subsequently enjoy better ranking than those who spend their time looking for loopholes they can exploit.”

“Provide high-quality content on your pages, especially your homepage… If your pages contain useful information, their content will attract many visitors and entice webmasters to link to your site.”

Follow this advice, and SEO will largely take care of itself.



3 Responses to “Will Penguin Freeze Out Pharma?”

  1. Markus Hartmann June 15, 2012 at 3:17 pm #

    I am not sure if Penguin is about duplicate content. What we learned is that it is more about low-quality content, which results in bad bounce rates.
    The four most important factors about Penguin are, in my opinion:
    - Aggressive use of exact-match anchor text
    - Overuse of exact-match domains
    - Low-quality article marketing & blog spam
    - Keyword stuffing in backlinks
    As a solution for this ISI problem I would recommend significantly more unique content all over the website, with good on-page keyword targeting.
    We are a German-based performance marketing agency specializing in SEO for pharmaceutical companies. Find us online at http://www.xeomed.de. Thanks for this article.

    • David Cherry June 18, 2012 at 12:56 pm #

      I agree that the focus of Penguin to date is aggressive penalization of sites with backlink stuffing and overly optimized links, but the duplicate content penalty is in Google’s guidelines as a “spam” trigger, and is a specific problem for Pharma sites that have not otherwise violated the Google SEO guidelines. The systemic ISI requirement on many Pharma sites is creating a situation where automated tools such as the Page Similarity Checker register a 75-80% similarity in page content between two pages of the same site, even with completely unique content in the body (excluding ISI) of each page, and this is what I’m advocating that Pharma agencies should proactively address.
