
Unique content has been a Google watchword since the early days of the search engine in the 1990s, and lately the drive has been for content that is both unique AND high quality.
Content scraping (the process of using software to trawl webpages, scraping up quality content to be ‘spun’ out on various blogs) ignores both of these Google directives, but it is an annoyingly effective, labour-free and creativity-free black hat method of building links to target websites and giving them high PageRank (PR). It is the kind of thing that gives good SEOs a bad name.
Google is getting better at spotting scraped content, and use of the canonical tag, for the most part, protects the site that is scraped. There are times when this is not enough, and for some webmasters the suppression caused by automatic duplicate content penalties is hard to shift, especially in competitive markets where Google is loath to point out distinct problems on a website or to take strong action following a reconsideration request. The worst-case scenario happens when a website with good quality content doesn’t have a canonical tag, for whatever reason, and the site carrying the duplicate content does use these tags; in that event Google’s index will often decide the scraper is the original and the genuine site is the copy, and penalise it accordingly.
The solution to this problem, other than submitting a reconsideration request and hoping for action, is to apply canonical tags, change the content, and maybe even go so far as to move to a new domain and redirect. All in all, a hard penalty for a webmaster whose only crime was to write excellent copy.
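As a first step, it helps to check whether a page declares a canonical URL at all, that is, a `<link rel="canonical" href="...">` element in its head. The following is a minimal sketch in Python using only the standard library; the names `CanonicalFinder` and `find_canonical` and the example.com address are made up for illustration, and it only inspects the HTML it is given rather than anything Google actually sees.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the page, or None if absent."""
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical


# Hypothetical page used purely as an example.
page = """<html><head>
<title>Original article</title>
<link rel="canonical" href="https://www.example.com/original-article/">
</head><body>...</body></html>"""

print(find_canonical(page))
```

A result of `None` means the page has no canonical tag, which is the situation described above where a scraper that does use the tag may end up being treated as the original.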
Up until recently I had always believed that content scraping was malicious and that all sites containing scraped content were part of the chain of black hat optimisation and deserved everything they got when I reported them to Google for copying my or my clients’ content.
However, an email from the webmaster of one such site made me think otherwise. I have nothing but the email to go on (and, hopefully, some action following my response), but if the person who responded to my fairly aggressive contact was on the level, then much of the duplicate content on the web, and many of the spammy domains out there, could actually be hijacked parked domains rather than domains set up for the purpose of receiving stolen content.
In this case the site seems to have been built and then ignored. It was a membership site that gave members access to write blogs, but its sign-up and blog entry pages had no anti-robot security, such as a captcha system. From what I could see, a large number of spam accounts had been set up on the site and then the blog machine went into full speed, generating in excess of 10,000 scraped blogs linking back to a number of websites on various subjects: websites that looked to be of mediocre quality but were selling popular products like Ugg Boots.
So what is to be done? Well, the advice I gave to that webmaster was to take control. Add security, then lock all the accounts until each user manually resets their password; with the new security in place, bots can’t reopen their accounts. The process continues with the deletion of any blogs and accounts not reset within 2 weeks. After this the webmaster is free to try to boost their business properly or to take down the parked website.
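The clean-up steps above could be sketched roughly as follows. This is only an illustration in Python with an in-memory dictionary of accounts; the `Account` class, the function names and the `passed_captcha` flag are all assumptions made for the example, and a real membership site would do the same work against its own database and captcha system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Grace period from the advice above: accounts not reset within 2 weeks go.
RESET_WINDOW = timedelta(weeks=2)


@dataclass
class Account:
    username: str
    blog_posts: List[str] = field(default_factory=list)
    locked: bool = False
    reset_at: Optional[datetime] = None


def lock_all(accounts: Dict[str, Account], now: datetime) -> datetime:
    """Lock every account and return the deadline for manual resets."""
    for account in accounts.values():
        account.locked = True
        account.reset_at = None
    return now + RESET_WINDOW


def manual_reset(account: Account, passed_captcha: bool, now: datetime) -> bool:
    """Unlock an account only if the reset got through the new security."""
    if not passed_captcha:
        return False
    account.locked = False
    account.reset_at = now
    return True


def purge_unreset(accounts: Dict[str, Account], deadline: datetime,
                  now: datetime) -> List[str]:
    """Once the deadline has passed, delete accounts that are still locked.

    Blog posts are stored on the account here, so they are deleted with it.
    """
    if now < deadline:
        return []
    removed = [name for name, account in accounts.items() if account.locked]
    for name in removed:
        del accounts[name]
    return removed
```

The order matters: security goes in first, so that the lock cannot simply be undone by the same bots that created the accounts, and nothing is deleted until genuine members have had the full two weeks to come back.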
This is just one website, but if everyone followed this advice, and if all parked websites were removed, the scrapers would very quickly have a much reduced playing field in which to ply their trade, and may in fact have to resort to buying their own domains, reducing their profits and making it easier for Google to trace and lock down the results of their nefarious (word of the week!) actions.
So if you know a domain that is hosting scraped content, don’t just try to wiggle out of the penalty; contact the webmaster directly and see if you can get some action to reduce the risk of duplicate content and make the internet a better place!