Omenao Bytes - A Guide to Online Advertisements: Yahoo Slurp

You are right!!

Yahoo! Slurp is Yahoo's web-indexing robot. Like Googlebot, it discovers web documents and builds a large searchable index. It follows only href links on a page (note: src links are not followed by Slurp), so Yahoo learns about a newly created page when it finds a link to it on pages that are already in its index. Slurp can find you more easily if you are listed in the ODP (dmoz.org) or the Yahoo! Directory. Being crawled by Yahoo! Slurp does not mean your page will appear in the search index: all crawled documents are cross-checked at the next index cycle.
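
To make the href versus src distinction concrete, here is a minimal Python sketch of link discovery. It is only an illustration and not Slurp's actual logic: the class name HrefCollector, the function followable_links, and the choice to look only at anchor tags are assumptions made for this example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href targets from <a> tags and ignore src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Assumption for this sketch: only anchor hrefs are followable.
        # src attributes (images, scripts, frames) are never collected.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def followable_links(html):
    """Return the href links a crawler of this kind would queue."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.links


sample = (
    '<a href="/new-page.html">New page</a>'
    '<img src="/logo.png">'
    '<a href="/about.html">About</a>'
)
print(followable_links(sample))
```

The image's src value never shows up in the result, which mirrors the point above: a page reachable only through a src attribute would not be discovered this way.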

Some meta tags that restrict what Yahoo Slurp does with a web page are as follows:

<META NAME="robots" CONTENT="noindex">
Yahoo Slurp will crawl the page but will not index it.

<META NAME="robots" CONTENT="nofollow">
Yahoo will not crawl any links present on this web page.

<META NAME="robots" CONTENT="noarchive">
Yahoo will not keep a backup archive (cache) of the web page. (The cached copy is helpful for users if the web page is offline.)
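
If you want to check which of these directives a page actually declares, a short Python sketch like the one below can help. It is a rough illustration only: the names RobotsMetaParser and robots_directives are made up for this example, and the comma-separated handling of CONTENT is an assumption about how the values are combined.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gather the directives listed in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names, so META/NAME work too.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # Assumption: several directives may be comma separated in one tag.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="robots" CONTENT="noindex, nofollow"></head></html>'
found = robots_directives(page)
print("index allowed:", "noindex" not in found)
print("follow links:", "nofollow" not in found)
print("keep cache:", "noarchive" not in found)
```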

Yahoo Slurp also obeys robots.txt guidelines, which you can use to prevent it from crawling a website or specific directories.

For example (use "*" in place of Slurp to address all robots; Slurp is case sensitive):
User-agent: Slurp
Disallow: /cgi-bin/

Note: Even if you have disallowed a web page, Yahoo may still keep its URL as a thin document, with no content, when other public web pages link to it.
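
Before publishing a rule like the one above, you can try it out locally. The sketch below uses Python's standard urllib.robotparser module; the host www.example.com, the sample paths, and the helper name slurp_may_fetch are placeholders chosen for this example, and Python's parser may not match Slurp's behaviour in every detail.

```python
from urllib.robotparser import RobotFileParser

# The same rule as in the example above.
rules = """\
User-agent: Slurp
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def slurp_may_fetch(path):
    """Check a path against the rules for the Slurp user agent."""
    # www.example.com is a placeholder host for this sketch.
    return parser.can_fetch("Slurp", "http://www.example.com" + path)


print(slurp_may_fetch("/cgi-bin/form.pl"))  # False: inside the disallowed directory
print(slurp_may_fetch("/index.html"))       # True: no rule matches this path
```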

Yahoo Slurp makes requests to your web server according to its IP addresses, so make sure your host server can handle these requests. Slurp can consume the most bandwidth when your web server has multiple IP addresses.

You can reduce Yahoo Slurp activity by using robot exclusion rules to disallow unimportant documents. If your site is dynamic, duplicate URLs also increase crawling activity, so use Dynamic URL Control in Yahoo! Site Explorer. Another useful robots.txt directive is crawl rate control (Crawl-delay), with which you can decrease or increase the frequency of Slurp crawling.

For example:
User-agent: Slurp
Crawl-delay: 0.5
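
To confirm what Crawl-delay value a robots.txt file declares for a given robot, a small hand-rolled reader is enough. The Python sketch below is an assumption-laden illustration: the function name crawl_delay_for is invented for this example, the grouping rules are simplified, and the value is read as a plain float so that fractional settings such as 0.5 are kept.

```python
def crawl_delay_for(robots_txt, agent):
    """Return the Crawl-delay declared for `agent` as a float, or None."""
    current_agents = []   # user agents named by the group being read
    in_rules = False      # True once the group has at least one rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rule lines starts a new group.
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
        else:
            in_rules = True
            if field == "crawl-delay" and agent.lower() in current_agents:
                try:
                    return float(value)
                except ValueError:
                    return None   # value is not a number
    return None


robots = """\
User-agent: Slurp
Crawl-delay: 0.5
"""
print(crawl_delay_for(robots, "Slurp"))
print(crawl_delay_for(robots, "Googlebot"))
```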

So Stay Tuned!

Omenao

Thursday, January 28, 2010

Yahoo Slurp - Web Crawlers
