Adding crawling exclusions
Exclusions help prevent certain pages or sections of a website from being crawled. For example, the website you want to crawl might have thousands of pages in a catalog that are irrelevant to your current project. Or perhaps the sitemap you're creating must not include certain pages. In these and similar cases, exclusions are useful.
To create exclusion rules for our crawler, enter keywords or directories in the input field. For directories, begin the input with a `/`. If the URL being crawled matches an exclusion rule (either a directory or a keyword), that URL will be excluded from the results.
For example, using the keyword `blog` would exclude the following pages:
- www.example.com/blog
- www.example.com/about/blog
- www.example.com/about/blog-post
- www.example.com/blogger-list
However, using the directory `/blog` would exclude only:
- www.example.com/blog
It would not exclude:
- www.example.com/about/blog, because its directory is `/about/blog`, which differs from the set exclusion `/blog`.
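The matching behavior above can be summarized in a few lines of code. The sketch below is a minimal illustration of the logic as implied by the examples, not the crawler's actual implementation: keyword rules exclude any URL that contains the keyword, while directory rules (those starting with `/`) exclude only URLs whose path matches the directory exactly. Whether subpaths of an excluded directory are also excluded is not covered by the examples, so it is left out here.

```python
from urllib.parse import urlparse

def is_excluded(url: str, exclusions: list[str]) -> bool:
    """Return True if the URL matches any exclusion rule.

    Assumed semantics, based on the examples above: rules starting with "/"
    are directories and must match the URL path exactly; all other rules are
    keywords and exclude the URL if they appear anywhere in it.
    """
    path = urlparse(url).path.rstrip("/") or "/"
    for rule in exclusions:
        if rule.startswith("/"):
            # Directory rule: exact match on the path only.
            if path == rule.rstrip("/"):
                return True
        else:
            # Keyword rule: substring match anywhere in the URL.
            if rule in url:
                return True
    return False

# The keyword "blog" excludes any URL containing it...
assert is_excluded("https://www.example.com/about/blog-post", ["blog"])
assert is_excluded("https://www.example.com/blogger-list", ["blog"])

# ...while the directory "/blog" excludes only that exact path.
assert is_excluded("https://www.example.com/blog", ["/blog"])
assert not is_excluded("https://www.example.com/about/blog", ["/blog"])
```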