Understanding Crawled URLs

To see which URLs we visited during Application Scanning, look for a “Crawled URLs” finding in your scanning results.

By selecting “view finding”, you get information under “Details” about how many URLs we visited during scanning and how many were saved for security testing.

Because a web application may have a number of similar web pages that are vulnerable to the same attacks, we look for unique web pages. We use a heuristic page filtering mechanism: content we deem unique is saved for security testing, while similar content is filtered out as duplicate, as illustrated in the sketch below.
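
The exact filtering heuristic is internal to the scanner, but the general idea can be sketched in a few lines of Python: reduce each page to a structural fingerprint and treat pages with an already-seen fingerprint as duplicates. The tag-sequence fingerprint below is an illustrative assumption, not the mechanism we actually use.

    import hashlib
    import re

    def page_fingerprint(html: str) -> str:
        # Reduce the page to its HTML tag sequence, so pages that differ
        # only in text content (e.g. two product detail pages) collapse
        # to the same fingerprint. (Illustrative rule, not our real one.)
        tags = re.findall(r"<\s*/?\s*([a-zA-Z][a-zA-Z0-9]*)", html)
        return hashlib.sha256(" ".join(tags).encode("utf-8")).hexdigest()

    seen: set[str] = set()

    def is_unique(html: str) -> bool:
        fp = page_fingerprint(html)
        if fp in seen:
            return False  # filtered out as duplicate
        seen.add(fp)
        return True       # saved for security testing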

Under “References”, you can download a CSV file containing a list of all visited URLs along with the following information (a short example of exploring the file follows the list):

  • the HTTP method used to access the URL

  • the HTTP response status code and content length

  • the referrer, which is the URL that pointed us to this URL

  • the source of the URL, which can be:

    • “seed”, if the URL was specified via Application Scanning settings, was a potential starting point (such as /admin/wp-content), was picked up from an index (such as sitemap.xml), or was found in an external source (such as the Internet Archive), or

    • “crawling”, if the URL was visited by navigating the web application

  • the status, indicating whether we saved the content for security testing, or filtered it out as duplicate
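
If you want to summarize the download quickly, a few lines of Python using the standard csv module will do. The file name and the column headers used below (“method”, “url”, “source”, “status”) are assumptions for illustration; check the header row of the CSV you actually downloaded.

    import csv
    from collections import Counter

    # Column names below are assumed for illustration; adjust them to
    # match the header row of your downloaded file.
    with open("crawled-urls.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    print("URLs visited:", len(rows))
    print("By source:", Counter(row["source"] for row in rows))
    print("By status:", Counter(row["status"] for row in rows))

    # List the URLs whose content was saved for security testing.
    for row in rows:
        if row["status"] == "saved":
            print(row["method"], row["url"])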

FAQ

Q: Why do some of the URLs appear in the CSV file multiple times?

A: You may see the same URL reappear in the file a few times; this means the scan visited the URL multiple times, for example because links from several pages point to the same URL. While most of this repetition is filtered out as duplicate, in some cases we may decide to save multiple occurrences of the same URL for security testing. This can be due to

  • getting a different HTTP response code the second time we visit the URL, or

  • using a different HTTP method on the URL, or