You may want to prohibit crawling or indexing for many reasons. Sometimes this is
done on just a few pages or documents within certain portions of a site, and other times
it is done across the entire site. Here are some typical scenarios.
New sites
Say you’ve just purchased your domain name. Unless you already changed the default
DNS server assignments, chances are that when you type in your domain name, you
get to a domain parking page served by your domain registrar. It can be somewhat
annoying to see the registrar’s advertisements plastered all over your domain while
your domain’s link juice (if any) passes, at least temporarily, to the registrar’s sites.
Most people in this situation will put up an “Under Construction” page or something
similar. If that is the case, you really do not want search engines to index this page. So,
in your index.html (or equivalent) file, add the following robots meta tag:
<meta name="robots" content="noindex">
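To confirm that the tag actually made it into the page, you can scan the HTML for it. The following is a minimal sketch that uses only Python's standard library; the class name RobotsMetaParser, the helper has_noindex, and the sample page are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html_text):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" in parser.directives


# Hypothetical "Under Construction" page used only for this example.
page = (
    '<html><head><meta name="robots" content="noindex"></head>'
    "<body>Under Construction</body></html>"
)
print(has_noindex(page))
```

The check ignores case and tolerates multiple comma-separated directives, such as content="noindex, nofollow".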
The suggested practice is to have a “Coming Soon” page outlining what your site will
be all about. This will at least give your visitors some idea of what to expect from
your site in the near future. If for some reason you want to block crawling of your entire
site, you can simply create a robots.txt file in the root web folder:
User-agent: *
Disallow: /
The star character (*) matches all web spiders. The slash character (/) is a path
prefix that matches everything after the base URL or domain name, including the
default document (such as index.html).
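One way to sanity-check these two rules before deploying them is to feed them to Python's standard urllib.robotparser module. This is only a sketch: the helper is_blocked, the user agent name ExampleBot, and the example.com URLs are assumptions made for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the text, one rule per list item.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]


def is_blocked(robots_lines, url, user_agent="ExampleBot"):
    """Return True if the given rules forbid user_agent from fetching url."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    return not rules.can_fetch(user_agent, url)


# Hypothetical URLs: the site root, the default document, and a deeper page.
for url in (
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://www.example.com/products/item.html",
):
    print(url, "blocked" if is_blocked(BLOCK_ALL, url) else "allowed")
```

Because Disallow values are matched as path prefixes, the single slash covers every path on the site, so each URL in the loop is reported as blocked.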