The purpose of this file is to tell search engine robots, or crawlers, which parts of my website they are allowed to access. At this point many of you might wonder why you should bother putting a robots.txt file in the root directory of your site at all, when you want spiders to crawl the whole website anyway and crawling is simply their job. Well, here is my reply: when a spider requests a page on your website that does not exist, the normal result is a 404 error, and that is a well-known fact. This is where robots.txt comes into play. The file name is well known to search engine spiders, and they will look for it to check whether any barrier has been set up for them on the site. If no robots.txt file has been created, that request ends in a 404 error page. The error is returned to the spiders, and they may log it as a broken link.
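Even a minimal file that blocks nothing is enough to avoid that 404. As a sketch, a robots.txt that places no restrictions at all could look like this (an empty Disallow value means nothing is off limits):
User-agent: *
Disallow: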
1) Here's a basic "robots.txt":
User-agent: *
Disallow: /
Here we declare that no crawler of any kind may crawl any part of our site: the asterisk matches every user agent, and "Disallow: /" covers everything from the root down.
2) Here's one with an exception for a specific crawler:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
This is interesting: here we declare that crawlers in general should not crawl any part of our site, EXCEPT for Google, which is allowed to crawl the entire site apart from /cgi-bin/ and /privatedir/. So the rules of specificity apply, not inheritance: Googlebot follows only its own, more specific group and ignores the general one.
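If you want to double-check how these rules are interpreted, a quick sketch with Python's standard-library robots.txt parser (urllib.robotparser) does the job. The sample URLs and the bot name "SomeOtherBot" below are just illustrative assumptions:

from urllib.robotparser import RobotFileParser

# The same rules as in example 2 above.
robots_txt = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/
Disallow: /privatedir/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Googlebot falls under its own, more specific group, so only
# /cgi-bin/ and /privatedir/ are off limits to it.
print(rp.can_fetch("Googlebot", "/index.html"))            # True
print(rp.can_fetch("Googlebot", "/privatedir/file.html"))  # False

# Every other crawler falls back to the "*" group, which blocks the whole site.
print(rp.can_fetch("SomeOtherBot", "/index.html"))         # False

Running it prints True, False, False, which matches the "specificity, not inheritance" behaviour described above.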