Saturday, 21 January 2012

Blogger robots.txt


Crawling and indexing of your site by search engines is something every site owner is happy to see. Frequent visits by search engine bots are a sign of a healthy blog or website. But this is not always what you want: there may be certain pages you would rather not have crawled or indexed. These pages may be duplicate copies of an original page, label pages in Blogger, images, JavaScript, or even multimedia. How do you tell the crawler or bot to stop indexing these pages? The solution is robots.txt.


What is robots.txt?
The robots.txt is not an HTML file, but a simple text file containing a set of instructions understood by crawlers and search engine bots. Search engine crawlers are also called robots, hence the name robots.txt. The purpose of robots.txt is not to stop search engines from crawling your website altogether; rather, it holds instructions for the robots to obey. When we need certain pages of the website to be skipped by search engine robots, we put those instructions in the robots.txt file.

Location of robots.txt
Where should you place the robots.txt file on your website? This is a common and important question. The robots.txt file is placed in the root directory of the site, like this: http://websitename.com/robots.txt
For blogger: http://blogname.blogspot.com/robots.txt

Blogger by Google generates the robots.txt file automatically, so you need not worry about creating or adding it. Just open http://yourblogname.blogspot.com/robots.txt to view your robots.txt file contents.

The other way to view your robots.txt is to go to Google Webmaster Tools. Log in there and follow the numbered steps in the picture below:


How to generate a fresh copy of blogger robots.txt for your blogspot?
This can be done the same way: log in to Google Webmaster Tools and follow the numbered steps in the picture:



In the above picture, you can add the rules you would like to apply and then create the robots.txt file as desired.
The robots.txt file structure for Blogger will look like:





User-agent: Mediapartners-Google
Disallow: 

User-agent: *
Disallow: /search

Sitemap: http://blogname.blogspot.com/feeds/posts/default?



User-agent specifies which search engine bot the rules that follow apply to.
Disallow lists the files, URLs, or URL patterns to be skipped from indexing. By default, Blogger's /search pages (label and search result pages) are skipped by the crawlers.
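For example, with the default Disallow: /search rule in place, label and search result pages like these (the blog name here is just a placeholder) are blocked, while ordinary post URLs remain crawlable:

```
http://blogname.blogspot.com/search/label/SEO      (blocked)
http://blogname.blogspot.com/search?q=robots       (blocked)
http://blogname.blogspot.com/2012/01/a-post.html   (crawlable)
```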


Similarly, the Allow option can be used to explicitly permit indexing of certain pages.
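To see how a well-behaved crawler interprets the Blogger rules above, here is a minimal sketch using Python's standard-library robots.txt parser. The blog URL is a placeholder; the rules are the default Blogger ones shown earlier.

```python
# Sketch: interpret the default Blogger robots.txt rules with the
# standard-library parser. The blog name below is hypothetical.
from urllib import robotparser

rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Ordinary crawlers must skip /search pages but may fetch posts.
print(rp.can_fetch("*", "http://blogname.blogspot.com/search/label/SEO"))   # False
print(rp.can_fetch("*", "http://blogname.blogspot.com/2012/01/a-post.html"))  # True

# Mediapartners-Google (the AdSense bot) has an empty Disallow,
# which means it is allowed to fetch everything.
print(rp.can_fetch("Mediapartners-Google",
                   "http://blogname.blogspot.com/search/label/SEO"))  # True
```

This illustrates why the default file has two groups: an empty Disallow opens the whole blog to the AdSense bot, while every other crawler is kept out of the /search pages only.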

To check the pages already indexed by Google, use the search operator site:yourblogname.blogspot.com on Google.com

To check the validity of your robots.txt, run it through the Motoricerca robots.txt validator.


Hope you found this article useful. Thanks for reading!
