The purpose of this file is to tell search engine robots, or crawlers,
which parts of my website they are allowed to access. At this point many
of you might wonder why you should put a robots.txt file in the root
directory of your site when you want spiders to crawl the whole site
anyway, which is their normal job. But wait! I have an answer for you.
When a spider requests a page that is not available on your website, the
normal result is a 404 error, and this is a known fact. Here is where
robots.txt comes into action: it is a well-known file name for search
engine spiders, and they will look for that file to check whether any
barrier is set on the site for them. If no robots.txt file has been
created, the request ends in a 404 error page. The error will appear to
the spiders, and they may report it as a broken link.
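To make the "root directory" point concrete, here is a small Python sketch of how the robots.txt address can be derived from any page address on a site. The function name robots_txt_url and the example.com address are only illustrative assumptions; real crawlers have their own implementations.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url.

    Hypothetical helper for illustration: it keeps the scheme and host
    of the page and swaps the path for /robots.txt at the site root.
    """
    parts = urlsplit(page_url)
    # Drop the page's own path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("https://www.example.com/blog/post.html?id=3"))
```

Whatever page a spider starts from, it ends up asking for the same file at the top of the site, which is why the file has to live in the root directory and not in a subfolder.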
1) Here's a basic "robots.txt":
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
This is interesting: here we declare that crawlers in general should not crawl any part of our site, EXCEPT for Google, which is allowed to crawl the entire site apart from /cgi-bin/ and /privatedir/. So the rules of specificity apply, not inheritance.
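If you want to check the specificity rule yourself, here is a minimal sketch using the urllib.robotparser module from Python's standard library. It assumes the rules described above are written out with a separate Googlebot group; the RULES text, the bot name SomeOtherBot and the sample paths are illustrative assumptions, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: block everyone by default, but give Googlebot its own,
# more specific group that only blocks two directories.
RULES = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

if __name__ == "__main__":
    checks = [
        ("Googlebot", "/index.html"),
        ("Googlebot", "/cgi-bin/script.cgi"),
        ("Googlebot", "/privatedir/notes.txt"),
        ("SomeOtherBot", "/index.html"),
    ]
    for agent, path in checks:
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:13} {path:24} {verdict}")
```

With these rules, Googlebot is matched by its own group and never falls back to the general "Disallow: /" line, while any crawler without a group of its own gets the general rule. That is the specificity behaviour in action.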