Learn all about the robots.txt file and the wildcard operators used in it. This guide covers everything you need to create your own custom robots.txt file.
What is Robots.txt File?
The robots.txt file is used by webmasters to give instructions about their website to web robots (crawlers). This is also known as the Robots Exclusion Protocol. With it, you can tell web robots and crawlers how to crawl your website.
When a web robot or crawler visits a website, it first checks robots.txt for permission to crawl.
The first question that comes to mind, and one frequently asked by interviewers, is:
“What does User-agent: * specify?”
This user-agent line specifies that the section applies to every web robot. The asterisk (*) is a wildcard operator that matches any sequence of characters. An empty “Disallow:” directive tells robots they may crawl all pages of the website.
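Put together, a minimal robots.txt that applies to all robots and allows full crawling looks like this:

```
User-agent: *
Disallow:
```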
Also sometimes many webmasters confuse between different robots.txt file formats. Let’s take a look at different formats.
1. Allow All Web Robots and Crawlers
There are two equivalent ways to write this rule. Both give the same instruction to all robots: allow indexing of everything on the website.
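As a sketch, either of these two files allows everything (note that Allow is supported by major crawlers such as Googlebot and Bingbot, though it was not part of the original 1994 standard):

```
# Option 1: an empty Disallow blocks nothing
User-agent: *
Disallow:

# Option 2: explicitly allow the whole site
User-agent: *
Allow: /
```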
2. Block All Web Robots or Crawlers
If we insert a forward slash after Disallow (Disallow: /), it means we are disallowing indexing of everything on the website for all web crawlers.
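The complete file for blocking every crawler from the entire site would look like this:

```
# Block all robots from all pages
User-agent: *
Disallow: /
```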
3. Block Particular Folder From Crawling
To disallow indexing of a specific folder on a website, use this format:
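For example, assuming a hypothetical folder named /private/ that you want to keep out of crawling:

```
# Block all robots from the /private/ folder only
User-agent: *
Disallow: /private/
```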
4. Block Particular Web Robot Or Crawler From Crawling Your Website
To exclude a single web robot from crawling your website, use this format:
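Here the user-agent names the specific robot instead of using the wildcard. Assuming a hypothetical crawler called BadBot:

```
# Block only the robot named BadBot; all other robots are unaffected
User-agent: BadBot
Disallow: /
```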
5. Block URLs Containing a Question Mark From Crawling
If you want to block all website URLs that contain a question mark (“?”), use this format:
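The wildcard matches any characters before the question mark, so this rule catches query-string URLs anywhere on the site:

```
# Block any URL that contains a "?" (e.g. /search?q=shoes)
User-agent: *
Disallow: /*?
```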
6. Block Specific URLs From Crawling
The special character $ can be used to match the end of a URL.
Example – If I want to block indexing of URLs ending in .asp, then I will use this format:
User-agent: *
Disallow: /*.asp$
7. Exclude Unwanted Pages From Crawling
If you run an e-commerce website, you can exclude checkout and payment-information pages from crawling.
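As a sketch, assuming the checkout and payment pages live under hypothetical paths /checkout/ and /payment-info/, the rules could look like this:

```
# Keep crawlers out of transactional pages (paths are illustrative)
User-agent: *
Disallow: /checkout/
Disallow: /payment-info/
```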
Whether your website runs on WordPress or any other CMS, you can easily exclude specific pages or folders from indexing, either for all web robots or for a particular one, using the robots.txt file.
You can also block indexing with a meta robots noindex tag. This tag tells web robots that they may visit the web page but should not add it to search results. The meta robots tag is often considered a better option than blocking with robots.txt, because it blocks only indexing: crawlers can still crawl the page, so any links on it that pass link juice can still be followed. To get more clarification, you can read here (https://moz.com/learn/seo/robotstxt).
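As an illustration, a noindex tag placed in a page's head section might look like this (the follow value is optional and explicitly permits crawlers to follow the page's links):

```
<meta name="robots" content="noindex, follow">
```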
Follow #Scoolico on social media channels to get the latest online marketing tips and SEO best practices.