2007
How Important Is Robots.txt For Google?
Posted by John Cow in The Net, Traffic Tips
Just how important is your blog’s robots.txt file if you want the world to find you and your blog in the search engines?
The robots.txt file is a file on your site that tells search engine spiders where they may and may not go. It is not a wall but a permission system, which means that you cannot force “bad” bots to obey it. Bad bots are the ones that crawl all over your site but offer you no value in return.
The real strength of the robots.txt file is that the majority of search engines respect it, which helps ensure that your site gets spidered and indexed properly. That means the pages you want found can be found, and the pages you want left alone are skipped by well-behaved crawlers.
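To make the permission idea concrete, here is a minimal sketch of how a polite bot could ask before crawling, using Python’s standard urllib.robotparser module. The rules, the bot name ExampleBot and the example.com URLs are made up for illustration; a bad bot would simply skip this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not taken from any real site.
RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /feed/
"""


def may_fetch(rules: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# A disallowed admin page and an ordinary post (both URLs are invented).
print(may_fetch(RULES, "ExampleBot", "http://www.example.com/wp-admin/options.php"))  # False
print(may_fetch(RULES, "ExampleBot", "http://www.example.com/2007/some-post/"))  # True
```

Nothing stops a crawler from fetching the admin page anyway; the check only matters to bots that choose to run it.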
We do not want to go into a long lesson on this, as there are loads of resources on the topic that explain it much better than we can. What we will share with you, however, is that you want to use one, and you want to upload it to the root directory on your server, in the same place as your index page.
You can see the robots.txt we use at http://www.johncow.com/robots.txt
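Crawlers look for the file at the root of the host, so its address can be worked out from any page URL on the site. A small sketch in Python; robots_url is a hypothetical helper and the post path below is invented:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical post URL on this blog.
print(robots_url("http://www.johncow.com/2007/some-post/?p=1"))
# http://www.johncow.com/robots.txt
```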
Out of curiosity we’ve been snooping around a little to see how others do it. Surprisingly enough, we found that there seem to be two approaches that are total opposites. We’re comparing four well-known blogs here.

Shoemoney.com / PR6 / Alexa 2,988 - We know Jeremy is pretty tech savvy, and he is probably the one who knows the most about how this works. Then again, he might not give a poop about it and just let it be.
Here’s his robots.txt:
- User-agent: Googlebot
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /wp-admin/
Disallow: /feed/
Disallow: /archives/
Disallow: /sitemap.xml
Disallow: /index.php
Disallow: /*?
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: */feed/
Disallow: */trackback/
Disallow: /page/
Disallow: /tag/
Disallow: /category/
User-agent: Googlebot-Image
Disallow: /wp-includes/
User-agent: Mediapartners-Google*
Disallow:
User-agent: ia_archiver
Disallow: /
User-agent: duggmirror
Disallow: /
User-Agent: Googlebot
Disallow: /link.php
Disallow: /gallery2
Disallow: /gallery2/
Disallow: /category/
Disallow: /page/
Disallow: /pages/
Disallow: /feed/
Disallow: /feed
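Several of Jeremy’s rules, such as /*? and /*.php$, use the * and $ wildcards. These are an extension that Googlebot understands, not something every crawler is guaranteed to honor; as far as we know, Python’s urllib.robotparser only does plain prefix matching. The sketch below is our own simplified reading of how such a pattern could be checked against a path. rule_matches is a hypothetical helper, not Google’s implementation, and the test paths are invented.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    '*' stands for any run of characters and a trailing '$' pins the
    pattern to the end of the path; everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start, which gives prefix semantics.
    return re.match(regex, path) is not None


print(rule_matches("/*?", "/index.html?page=2"))    # True: has a query string
print(rule_matches("/*.php$", "/link.php"))         # True: ends in .php
print(rule_matches("/*.php$", "/link.php?id=3"))    # False: does not end in .php
print(rule_matches("*/feed/", "/2007/post/feed/"))  # True: feed of a single post
```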
JohnChow.com / PR4 / Alexa 3,071 - Mr Chow has been around the block and we’re assuming he’s quite tech savvy too. Why else would he run a site called TheTechZone for over 8 years? His robots.txt is quite similar to Jeremy’s:
- sitemap: http://www.johnchow.com/sitemap.xml
User-agent: *
Disallow: /cgi-bin/
Disallow: /go/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /author/
Disallow: /page/
Disallow: /category/
Disallow: /wp-images/
Disallow: /images/
Disallow: /backup/
Disallow: /banners/
Disallow: /archives/
Disallow: /trackback/
Disallow: /feed/
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Mediapartners-Google
Allow: /
User-agent: duggmirror
Disallow: /
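The file above gives different crawlers different rules. As a rough sketch of how that plays out, the snippet below feeds a shortened excerpt of it to Python’s urllib.robotparser and asks on behalf of three bots. SomeOtherBot and the /category/seo/ path are invented for the example, and real crawlers may resolve the groups somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the rules listed above.
EXCERPT = """\
User-agent: *
Disallow: /category/
Disallow: /feed/
User-agent: Mediapartners-Google
Allow: /
User-agent: duggmirror
Disallow: /
"""

parser = RobotFileParser()
parser.parse(EXCERPT.splitlines())

# Hypothetical category page; only the host comes from the post.
url = "http://www.johnchow.com/category/seo/"
for bot in ("SomeOtherBot", "Mediapartners-Google", "duggmirror"):
    # A bot with its own group uses that group; others fall back to '*'.
    print(bot, parser.can_fetch(bot, url))
```

With this parser, the AdSense crawler (Mediapartners-Google) is let in, while the unnamed bot and duggmirror are turned away from the category page.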
Problogger.net / PR6 / Alexa 2,600 - The Problogger seems to take a totally different approach. The site is part of B5 Media, an organization that makes money by running blogs, so we’re pretty sure its team of professionals has plenty of technical SEO know-how. A copy of Darren’s robots.txt:
- User-agent: *
Disallow:
That’s right. The Problogger doesn’t hold back any secrets from the search engines of this world. Anyone is allowed to crawl through all of Darren’s content.
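The difference between an empty Disallow and a Disallow with a single slash is easy to miss, yet the two mean the opposite. A quick sketch with Python’s urllib.robotparser; ExampleBot and the page path are hypothetical:

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, url: str) -> bool:
    """Check url against rules for a made-up crawler called ExampleBot."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("ExampleBot", url)


ALLOW_ALL = "User-agent: *\nDisallow:"    # empty value: nothing is off limits
BLOCK_ALL = "User-agent: *\nDisallow: /"  # single slash: everything is off limits

page = "http://www.problogger.net/some/page/"  # invented path
print(allowed(ALLOW_ALL, page))  # True
print(allowed(BLOCK_ALL, page))  # False
```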
MattCutts.com / PR7 / Alexa 5,059 - Matt has been working for Google for nearly eight years now and is currently head of Google’s webspam team. Surely it’s safe to assume that Matt knows what he’s doing. Like Problogger, Matt withholds almost nothing from the crawlers, just a files/ folder:
- User-agent: *
Disallow: /files/
Even though files/ won’t be indexed, Matt has put an index.html saying ‘Sorry’ in place to keep nosy cows like us out of there. After all, a robots.txt file is available for anyone to see. Put two and two together and you can try to have a peek at the contents of a directory that’s specified in there.
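Because the file is public, listing the paths a site would rather keep crawlers out of takes only a few lines. A minimal sketch, using Matt’s two-line file as input; disallowed_paths is a hypothetical helper, and it ignores which user-agent group a rule belongs to.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Collect every non-empty Disallow value found in robots_txt."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


MATT = "User-agent: *\nDisallow: /files/"
print(disallowed_paths(MATT))  # ['/files/']
```

This is why robots.txt is a poor place to hide anything sensitive: the file itself points at the directory.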
As you can see, there seem to be two trains of thought on the subject.