Millions of people use Yahoo! to find information, and having
your site in Yahoo! Search or the Yahoo! Directory can mean more sales, more
conversations with people you wouldn't have met otherwise, and more hits for
your web site. However, letting Yahoo! know that your site exists can be a bit
confusing. There's a distinction between Yahoo! Search (http://search.yahoo.com) and
the Yahoo! Directory (http://dir.yahoo.com), and the process for submitting your
site to each is a bit different.
If other sites on the Web link to your site, chances are good that Yahoo! has
already added your site to its index. An index is simply another name for the
total list of sites that Yahoo! is watching. Yahoo! Search relies on a crawler
called Slurp to find new sites and keep current sites up-to-date. If a site
that's currently in Yahoo!'s index has linked to your site, the crawler has
probably already visited your site and automatically added it to Yahoo!'s index.
You can see if Yahoo! is already indexing your site by
searching for it with the url: keyword.
Browse to http://search.yahoo.com and
enter a query like this:
url:http://insert your site
While Yahoo! Search tries to include as many sites as possible
in its index, the Yahoo! Directory is more like an exclusive club, where sites
have to be approved by Yahoo! Editors. Because Yahoo! wants to maintain a highly
useful directory, the steps for inclusion are a bit more involved.
To see if your site is already listed in the Yahoo! Directory,
browse to http://dir.yahoo.com and search for the title of your site. If
you don't see your site among the results, you can suggest your site to the
Yahoo! Directory.
The first thing you need to determine about your site is
whether it's commercial or noncommercial, because you'll need to pay $299 to
submit a commercial site. According to Yahoo!, "If your site sells something,
promote[s] goods and services, or represents a company that sells products
and/or services," your site is commercial and should
be listed somewhere in the Business and Economy category within the directory.
If your site is purely personal, informational, or not-for-profit, your site is
noncommercial. A banner ad or text ad on your site doesn't necessarily make your
site commercial; if you have such an ad, it'll be up to the Yahoo! Editors to
decide whether your site is commercial.
Yahoo! RSS
The Publisher's Guide contains a wealth of information about
RSS, tools for generating "Add to My Yahoo!" buttons, and a form for submitting
your RSS feed for indexing by Yahoo!.
As you update your RSS feed, you can notify My Yahoo! that
you've done so by pinging the service at this URL:
http://api.my.yahoo.com/rss/ping?u=insert your feed's URL
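Because the feed address travels as the value of the u parameter, it generally needs to be percent-encoded before it is appended to the ping URL. The sketch below builds such a URL with Python's standard library. The function name build_ping_url and the example feed address are illustrative assumptions, and the code only constructs the URL; it does not contact the service.

```python
from urllib.parse import urlencode

# Ping endpoint as given in the text above.
PING_ENDPOINT = "http://api.my.yahoo.com/rss/ping"


def build_ping_url(feed_url: str) -> str:
    """Return the ping URL for feed_url, percent-encoding the feed address."""
    # urlencode escapes characters such as ':', '/', '?' and '&' so the feed
    # URL survives intact as the value of the 'u' parameter.
    return PING_ENDPOINT + "?" + urlencode({"u": feed_url})


if __name__ == "__main__":
    # Hypothetical feed address, used only for illustration.
    print(build_ping_url("http://example.com/index.xml"))
```

Encoding matters most when the feed URL has its own query string, since an unescaped & would otherwise be read as the start of a second parameter of the ping request.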
Imagine you have a directory on your server called /private and you'd like to
keep any pages or files in it out of Yahoo! Search results. Apache includes many
ways to set up authentication, but a straightforward method involves a .htaccess
file. The .htaccess file tells Apache how to configure a particular directory,
and you can add a .htaccess file to the /private directory that turns on basic
HTTP authentication and names a password file with the AuthUserFile directive.
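A minimal .htaccess for basic authentication, consistent with the AuthUserFile directive and the /your/path/to/ placeholder discussed here, might look like the following sketch. The realm name "Private" and the password file name .htpasswd are illustrative assumptions, not values prescribed by Yahoo! or Apache.

```apache
# Use HTTP basic authentication for this directory.
AuthType Basic
# Realm name shown in the browser's login prompt ("Private" is an assumption).
AuthName "Private"
# Password file; replace /your/path/to/ with a directory outside the web root.
AuthUserFile /your/path/to/.htpasswd
# Allow any user listed in the password file.
Require valid-user
```

For these directives to take effect, the server configuration has to permit authentication overrides in .htaccess files (AllowOverride AuthConfig) for that directory.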
Note that AuthUserFile points to a file that contains
the username and password of the authenticated user, and you'll need to change
/your/path/to/ to a real directory on your server that's not
accessible via the Web. The next step is to create that password file by
running the htpasswd tool from a command prompt.
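A typical htpasswd invocation for this step might look like the sketch below. The account name username is a placeholder, and the path reuses the /your/path/to/ placeholder from the AuthUserFile setting.

```shell
# -c creates the password file; "username" is a placeholder account name.
# The tool then prompts for that user's password and asks you to confirm it.
htpasswd -c /your/path/to/.htpasswd username
```

The -c flag creates a new file and overwrites an existing one, so leave it off when adding further users to the same file.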
This creates the proper .htpasswd file for that user and puts in place all of
the pieces for basic HTTP authentication.
robots.txt Exclusions
If server authentication seems like overkill and you'd rather
make your directory or files available to everyone except Slurp, you can do so
with a robots.txt file, which indicates how you'd
like robots to behave at your site. Well-behaved bots (such as Slurp) check for
robots.txt before indexing anything, to make sure
they're acting as the site owner wants them to.
With robots.txt, you can tell
Slurp that you'd like it to exclude certain directories or files from its crawl.
For example, if you'd like Slurp to skip a directory called /private, save the
following lines to a file called robots.txt:
User-agent: Slurp
Disallow: /private/
You can also tell Slurp to skip specific files:
User-agent: Slurp
Disallow: /Private.doc
Disallow: /Private.html
Once you've listed all of the files and directories you'd like
to hide, add robots.txt to the root directory of
your web site, so it has a URL like this:
http://example.com/robots.txt
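Before uploading the file, you can check that the rules behave as intended. The sketch below feeds the example rules to the robots.txt parser in Python's standard library and asks whether a given crawler may fetch a given URL. The helper name is_allowed, the sample URLs, and the OtherBot agent are illustrative assumptions, and Python's parser is only a stand-in, so Slurp's own matching may differ in details.

```python
from urllib.robotparser import RobotFileParser

# Rules from the examples above, combined into one robots.txt body.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
Disallow: /Private.doc
Disallow: /Private.html
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs and crawler names, used only for illustration.
    checks = [
        ("Slurp", "http://example.com/private/notes.html"),
        ("Slurp", "http://example.com/index.html"),
        ("OtherBot", "http://example.com/private/notes.html"),
    ]
    for agent, url in checks:
        print(agent, url, is_allowed(agent, url))
```

Because the rules name only Slurp, other crawlers are unaffected by them, which is why the last check reports that the URL is still allowed.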