Search engines such as Google and Bing use robots (also called web crawlers) to classify and catalogue the data on websites. Site owners can tell these robots which pages they may or may not crawl. This is managed through the robots.txt file, which well-behaved robots read before accessing the rest of the website. What a search engine can and cannot see will ultimately affect your Search Engine Optimisation.
So What is Robots.txt?
Originating in 1994, robots.txt, also known as the Robots Exclusion Protocol (REP), regulates what a web crawler may access. The text file is placed at the top level of the website's hierarchy.
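As a rough illustration, here is a small robots.txt file and a sketch of how a well-behaved crawler might read it, using Python's standard urllib.robotparser module. The file contents, the example.com addresses and the "ExampleBot" name are all assumptions made for the example, and the file text is supplied as a string rather than downloaded.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical file contents; the paths are illustrative only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def robots_url(site_url: str) -> str:
    """Return the conventional top-level location of robots.txt for a site."""
    return urljoin(site_url, "/robots.txt")


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Where a crawler would look for the file before visiting any other page.
print(robots_url("https://www.example.com/blog/post"))

# A disallowed area and an unrestricted page.
print(parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html"))  # False
print(parser.can_fetch("ExampleBot", "https://www.example.com/blog/post"))  # True
```

A real crawler would download the file from the address returned by robots_url before parsing it; that step is left out here to keep the sketch self-contained.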
This file holds the instructions that search engines follow when crawling. It tells the search engine which areas it may access and which it may not, thus regulating its actions. A website can deny a robot access to the whole site or only to specific areas; the instructions differ slightly in each case:
1. User-agent: *
   Disallow: /
The * applies the rule to all web crawlers, and the / denies them access to every directory.
2. User-agent: *
   Disallow: /example/
This example shows all robots are denied access to one directory, /example/. To block a second directory, add another Disallow line.
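3. User-agent: *
   Disallow: /no-google/
As written, this rule denies all robots access to the /no-google/ directory. To deny one specific crawler access to every directory on the site, name that crawler on the User-agent line (for example, Googlebot) and use Disallow: /.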
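The rule sets above can be checked with a short sketch, again using Python's standard urllib.robotparser. The first three rule strings mirror the numbered examples; the fourth, which names Googlebot, is an assumed variant added to illustrate blocking a single crawler. The bot names and example.com addresses are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The three numbered examples, one directive per line.
BLOCK_ALL = "User-agent: *\nDisallow: /"
BLOCK_ONE_DIR = "User-agent: *\nDisallow: /example/"
BLOCK_NO_GOOGLE_DIR = "User-agent: *\nDisallow: /no-google/"

# Assumed variant: only one named crawler is shut out of the whole site.
BLOCK_ONE_BOT = "User-agent: Googlebot\nDisallow: /"

HOME = "https://www.example.com/"
PAGE = "https://www.example.com/example/page.html"

print(can_crawl(BLOCK_ALL, "AnyBot", HOME))         # False: everything is blocked
print(can_crawl(BLOCK_ONE_DIR, "AnyBot", PAGE))     # False: inside /example/
print(can_crawl(BLOCK_ONE_DIR, "AnyBot", HOME))     # True: outside /example/
print(can_crawl(BLOCK_ONE_BOT, "Googlebot", HOME))  # False: the named crawler
print(can_crawl(BLOCK_ONE_BOT, "OtherBot", HOME))   # True: everyone else
```

Note that these checks only describe what a compliant crawler would do; nothing in the file itself enforces the result.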
What To Remember When Using Robots.txt
- Unfortunately, the protocol is only advisory, so web crawlers are not necessarily kept out of your website. Less honourable crawlers may even treat your disallowed areas as directions and aim straight for them!
- Robots.txt is public information that everyone can access. This means you cannot hide the areas that you have asked crawlers not to visit.
- Google and Bing support the pattern-matching characters * and $ in paths: * matches any sequence of characters and $ marks the end of a URL.
- The file is case sensitive, so be diligent: the file name must be lowercase (robots.txt), as a capitalised file name will not be recognised, and paths must match the case of your URLs. Spaces are also not accepted.
- You cannot list several paths in one Disallow rule - use a separate line for each.
- Think carefully about what you want to block, as it will have an impact on your Search Engine Optimisation - do not hide what could be vital to your rankings.
- Any sub-domain your website may have will need its own robots.txt.
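A few of the points above can be sketched in Python. The pattern matcher below is a simplified illustration of how * and $ behave, not the actual matcher used by Google or Bing; it is written by hand because the standard urllib.robotparser module only does plain prefix matching. The function names and the example.com addresses are assumptions made for the example.

```python
import re
from urllib.parse import urlsplit


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regular expression.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the URL. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the pattern matches from the start of the path."""
    return pattern_to_regex(pattern).match(path) is not None


def robots_url_for(page_url: str) -> str:
    """Each host, including each sub-domain, has its own robots.txt."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(path_matches("/*.pdf$", "/files/report.pdf"))             # True
print(path_matches("/*.pdf$", "/files/report.pdf?download=1"))  # False: '$' requires the end
print(path_matches("/example/", "/Example/"))                   # False: case matters
print(robots_url_for("https://blog.example.com/post/1"))
print(robots_url_for("https://www.example.com/about"))
```

The last two calls return different addresses, which is why a rule placed on the main site does nothing for a sub-domain.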
For further information on how SEO Junkies can help improve your search engine optimisation, contact us today!
We offer a wealth of knowledge and experience, backed by a proven track record of results in the SERPs (Search Engine Results Pages), that can help you improve your campaign's rankings.