Adding crawling exclusions

Exclusions help prevent certain pages or sections of a website from being crawled. For example, the website you want to crawl might have thousands of pages in a catalog that are irrelevant to your current project. Or perhaps the sitemap you're creating must not include certain pages. In these and similar cases, exclusions are useful.

To create exclusion rules for our crawler, enter keywords or directories in the input field. For directories, begin the input with a /. If the URL being crawled matches an exclusion rule (a keyword appearing anywhere in the URL, or a directory that exactly matches the URL's path), that URL is excluded from the results.

For example, using the keyword blog would exclude the following pages:

  • www.example.com/blog
  • www.example.com/about/blog
  • www.example.com/about/blog-post
  • www.example.com/blogger-list

However, using the directory /blog would exclude only:

  • www.example.com/blog

It would not exclude:

  • www.example.com/about/blog, because its directory is /about/blog, which differs from the set exclusion /blog.
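
If it helps to reason about the two rule types, here is a minimal Python sketch of the matching behavior described above. The function name is_excluded, the URL parsing, and the handling of trailing slashes are illustrative assumptions, not the crawler's actual implementation, which may treat edge cases (such as pages beneath an excluded directory) differently.

```python
from urllib.parse import urlparse

def is_excluded(url: str, rules: list[str]) -> bool:
    """Return True if the URL matches any exclusion rule.

    Rules starting with "/" are treated as directory rules and must match
    the URL path exactly; any other rule is treated as a keyword and matches
    if it appears anywhere in the URL path. (Illustrative sketch only.)
    """
    # Prefix scheme-less URLs so urlparse separates host from path.
    path = urlparse(url if "://" in url else "//" + url).path
    for rule in rules:
        if rule.startswith("/"):
            # Directory rule: exact match on the path.
            if path.rstrip("/") == rule.rstrip("/"):
                return True
        elif rule in path:
            # Keyword rule: substring match anywhere in the path.
            return True
    return False

# Keyword rule "blog" excludes any URL containing it:
assert is_excluded("www.example.com/about/blog-post", ["blog"])
assert is_excluded("www.example.com/blogger-list", ["blog"])

# Directory rule "/blog" excludes only the exact path:
assert is_excluded("www.example.com/blog", ["/blog"])
assert not is_excluded("www.example.com/about/blog", ["/blog"])
```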