The Sitemaps protocol allows a webmaster to inform search engines about URLs on a website that are available for crawling. A Sitemap is an XML file that lists the URLs for a site. It allows webmasters to include additional information about each URL: when it was last updated, how often it changes, and how important it is in relation to other URLs in the site. This allows search engines to crawl the site more intelligently. Sitemaps are a URL inclusion protocol and complement robots.txt, a URL exclusion protocol.
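As a rough illustration of the per-URL information described above, the following Python sketch builds a small Sitemap with the standard library. The element names (`urlset`, `url`, `loc`, `lastmod`, `changefreq`, `priority`) and the namespace come from the sitemaps.org protocol; the `example.com` URLs, the dates, and the helper name `build_sitemap` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return a Sitemap document as a string (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is required; the other three fields are optional hints.
        for field in ("loc", "lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")

# Made-up example entries for an imaginary site.
entries = [
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "daily", "priority": "1.0"},
    {"loc": "https://www.example.com/archive", "changefreq": "yearly",
     "priority": "0.3"},
]

sitemap_xml = build_sitemap(entries)
print(sitemap_xml)
```

A real Sitemap file would normally also begin with an XML declaration and be saved as UTF-8; those details are left out to keep the sketch short.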
Sitemaps are particularly beneficial on websites where:
- some areas of the website are not available through the browsable interface, or
- webmasters use rich Ajax, Silverlight, or Flash content that is not normally processed by search engines.
Search Engine Indexing
Sitemaps supplement and do not replace the existing crawl-based mechanisms that search engines already use to discover URLs. Using this protocol does not guarantee that web pages will be included in search indexes, nor does it influence the way that pages are ranked in search results. Specific examples are provided below.
- Google – Webmaster Support on Sitemaps: “Google doesn’t guarantee that we’ll crawl or index all of your URLs. However, we use the data in your Sitemap to learn about your site’s structure, which will allow us to improve our crawler schedule and do a better job crawling your site in the future. In most cases, webmasters will benefit from Sitemap submission, and in no case will you be penalized for it.”
- Bing – Bing supports the standard sitemaps.org protocol described in this article.
- Yahoo – After the search deal between Yahoo and Bing took effect, Yahoo Site Explorer was merged into Bing Webmaster Tools.
Google introduced Google Sitemaps so that web developers could publish lists of links from across their sites. The basic premise is that some sites have a large number of dynamic pages that can only be reached through forms and user entries. Sitemap files contain the URLs of these pages so that web crawlers can find them. Bing, Google, Yahoo and Ask now jointly support the Sitemaps protocol.
Since Bing, Yahoo, Ask, and Google use the same protocol, a single Sitemap gives the four biggest search engines the same up-to-date page information. Sitemaps do not guarantee that all links will be crawled, and being crawled does not guarantee indexing. However, a Sitemap is still the best insurance for getting a search engine to learn about your entire site. Google Webmaster Tools allows a website owner to upload a Sitemap for Google to crawl; alternatively, the owner can point crawlers to the Sitemap from the robots.txt file.
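The robots.txt route is a single `Sitemap:` line giving the full URL of the Sitemap, as documented by sitemaps.org. The sketch below uses Python's standard `urllib.robotparser` (the `site_maps()` method requires Python 3.8 or later) to show how a crawler might read that line alongside the exclusion rules. The robots.txt content and the `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one exclusion rule plus a Sitemap reference.
robots_txt = """\
User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Inclusion side: Sitemap URLs advertised in robots.txt (None if absent).
sitemaps = parser.site_maps()
print(sitemaps)

# Exclusion side: whether a generic crawler may fetch a given URL.
blocked = not parser.can_fetch("*", "https://www.example.com/private/report")
print(blocked)
```

This shows the two protocols side by side in one file: the `Disallow` rule excludes URLs, while the `Sitemap` line tells crawlers where to find the list of URLs to include.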
XML Sitemaps have replaced the older method of “submitting to search engines” by filling out a form on the search engine’s submission page. Now web developers submit a Sitemap directly, or wait for search engines to find it.
XML (Extensible Markup Language) is much stricter than HTML. Parsers do not tolerate errors, so the syntax must be exact. It is advisable to check a Sitemap with an XML syntax validator, such as the free one at www.validator.w3.org
There are automated XML Sitemap generators available (both as software and as web applications) for more complex sites.
More information on the field definitions and other Sitemap options is available at www.sitemaps.org (Sitemaps.org: Google, Inc., Yahoo, Inc., and Microsoft Corporation).
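A quick local check for this kind of strictness can be sketched with Python's standard XML parser. It only tests whether a document is well-formed XML, not whether it conforms to the Sitemap schema; the function name `is_well_formed` and the sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML (hypothetical helper).

    This checks well-formedness only; it does not validate the
    document against the Sitemap schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

good = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "</urlset>"
)
# The <url> element is never closed, which an XML parser rejects
# even though a lenient HTML parser might recover from it.
bad = "<urlset><url><loc>https://www.example.com/</loc></urlset>"

print(is_well_formed(good), is_well_formed(bad))
```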