Robots.txt file: The good, the bad, the White House

February 25th, 2009

After being in web design for so long, it still amazes us that there are web-savvy developers (and designers) who don’t realise the importance of a robots.txt file. Whilst it isn’t the be-all and end-all of a web design strategy, a clean, correctly formatted robots file can help search engine robots, affectionately known as bots, to weed through the information on your site more quickly and, more importantly, to stay out of the places you don’t want them looking: for example, your ‘/cgi-bin’ folder, or that folder where you store all your secret CIA-level documentation, ‘/c14-files’, naturally : )

The gist behind a robots.txt file is simple. It is the working part of what is known as the Robots Exclusion Protocol, or robots.txt protocol: a standard, nothing-special text file with a set format, used to keep (willing) web spiders and other web robots from accessing all or part of a website which would otherwise be open to the general public.
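To make the format a little more concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the `www.example.com` domain and the `is_allowed` helper are all made up for illustration; only the two folder names come from the example above. It shows how a well-behaved bot might read the rules and decide whether a path is off limits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The two disallowed folders are the
# examples mentioned in the post; everything else is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /c14-files/
"""

# Parse the rules from a string rather than fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, agent="ExampleBot"):
    """Return True if the rules let `agent` fetch `path` (hypothetical helper)."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/about.html", "/cgi-bin/script.pl", "/c14-files/secret.pdf"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Bear in mind that this only describes what a willing bot does. Nothing in the file stops a robot that chooses to ignore it, so it is no substitute for real access control on that secret folder.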