What is a Robots.txt File? | An Overview for SEO

Are you wondering what the robots.txt file is? Well, in this post I’m going to tell you about it.

The robots.txt file is a plain text file that actually lives on your website, for example at https://shoutech.com/robots.txt. It is also known as the robots exclusion protocol or standard, and it tells web robots (most often search engine crawlers) which pages on your site they may crawl. So if you want to see whether your website has a robots.txt file, navigate to your domain, append /robots.txt, and it should be right there.

Now, the robots.txt file is really important because Google specifically recommends that you have one. And if Google and other crawlers out there can’t find it, in some cases they won’t crawl your website at all, or at least that’s what they say. So you definitely want to have a robots.txt file, and there are some things you need to know about it.

The importance of a robots.txt file

At its most basic, the robots.txt file allows you to block the entire website, block portions of the website, or allow the whole site to be crawled and indexed.

So that’s basically what it does: it’s a way to allow your site into Google (and other search engines as well) or keep it out. But it gets a little tricky, because what the robots.txt file actually tells Google is “don’t crawl the content on this page.” That gets a little goofy, right? Because sometimes Google will still discover those URLs, or there will be a lot of links pointing at those pages, and Google understands that they are highly authoritative pages.

And so they’ll still index those pages, meaning the site will still show up inside Google, but where the meta description would normally appear in the result there will be a little note saying that no description is available because of robots.txt. So if you ever see a result saying the description can’t be shown because of robots.txt, you know you have a robots.txt block, and that’s something you need to fix. The robots.txt file also supports wildcard patterns, a limited form of regular expressions, that let you block whole portions of the site.
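
For example, major crawlers such as Googlebot and Bingbot understand the * wildcard (match any sequence of characters) and the $ anchor (match the end of a URL), though not full regular expressions. Here is a minimal sketch; the paths below are just placeholders, not a recommendation for your site:

User-agent: *
# block any URL that ends in .pdf
Disallow: /*.pdf$
# block any URL that contains a query string
Disallow: /*?
# block everything under /private/
Disallow: /private/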

So in some cases, you might see a hundred entries in somebody’s robots.txt file that they are trying to block. The reason they want to do that is that if too many of these third-party widgets and bots come in and try to crawl the site, it can slow the site down, slow the server down, and cause server errors and all kinds of other issues. Or maybe you just want to block somebody from scraping content from your website or from analyzing specific changes that you make on your site.
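
As a rough sketch of that kind of blocking, the rules below turn away one specific crawler by its user-agent name while letting everyone else in. The bot name ExampleScraperBot is a made-up placeholder, and keep in mind that only well-behaved bots honor these rules, so this will not stop a determined scraper:

# turn away one specific bot entirely
User-agent: ExampleScraperBot
Disallow: /

# allow every other crawler
User-agent: *
Disallow: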

How to create a robots.txt file

If you haven’t found a robots.txt file on your website, I advise you to make one as soon as possible. Here is how:

  • Create a new text file and save it as “robots.txt” – you can use Notepad on Windows PCs or TextEdit on Macs, then use “Save As” to save it as a plain text file.
  • Now upload this file to the root directory of your website – the top-level folder, which makes the file appear directly after your domain name.
  • If you use subdomains, you’ll need to create a robots.txt file for each subdomain.

What to include in your robots.txt file

Allow indexing of everything:

User-agent: *
Disallow:
or
User-agent: *
Allow: /

Disallow indexing of everything:

User-agent: *
Disallow: /

Disallow indexing of a specific folder:

User-agent: *
Disallow: /folder/

Disallow Bingbot from indexing a folder, except for one file in that folder, which is allowed:

User-agent: Bingbot
Disallow: /folder1/
Allow: /folder1/myfile.html

You can use several lines of instructions to allow or disallow specific URLs and to add multiple Sitemaps. If you do not disallow a URL, search engine bots assume that they have permission to crawl it.
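
As an illustration, a single file can combine per-path rules with more than one Sitemap line. The folder names and the second sitemap URL below are placeholders for the sake of the example:

User-agent: *
# keep crawlers out of the checkout flow
Disallow: /checkout/
# but let them crawl the blog
Allow: /blog/

Sitemap: https://shoutech.com/sitemap.xml
Sitemap: https://shoutech.com/news-sitemap.xml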

A WordPress-Specific Robots.txt File

When it comes to creating a robots.txt file for WordPress, there are three main standard directories in every WP install. They are:

  • wp-content
  • wp-admin
  • wp-includes

Here is an example robots.txt file for WordPress:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Allow: /wp-content/uploads/
Sitemap: https://shoutech.com/sitemap.xml

A Word of Caution / Checking for Errors

It is also important to keep in mind that not all spiders/crawlers follow the standard robots.txt protocol. Malicious users and spammy bots will read your robots.txt file looking for sensitive areas such as private sections, data folders, and admin zones. While listing these paths in robots.txt is fine 99% of the time, to take an additional step you could leave them out of your robots.txt file entirely and simply hide them via a meta robots tag.

So, if you want to be extra safe, block your sensitive pages using a meta robots tag in the page’s head section:

<html>
<head>
<title>…</title>
<meta name="robots" content="noindex, nofollow">
</head>

Conclusion

The goal of optimizing your robots.txt file is to prevent search engines from crawling pages that are not meant to be publicly available – for example, pages in your plugins folder (/wp-content/plugins/) or your WordPress admin folder (/wp-admin/).

By setting up your robots.txt correctly, you are not only boosting your own SEO, you are also helping your visitors.

If search engine bots can spend their crawl budget wisely, they will organize and display your content in the best way in the SERPs, which means your website will be more visible and rank higher.

It does not take much effort to set up your robots.txt file. It’s mostly a one-time setup, and you can make small changes to it as needed.
