Feature Request Summary:

The configuration options for Federated Search crawlers should include a way to target the HTML element that contains the content to be indexed.

Description/Use Cases:

I want the crawler to index only the main content of each external page, not shared elements such as the header and navigation.

Business impact of limitation or missing feature:

Currently, the crawler appears to index only the first ten thousand characters of text found on each external page. In our case, the page header and navigation alone contain more than ten thousand characters, so the crawl produces 170 pages with exactly the same content.

This makes the crawl feature unusable for us, and we'll need to build our own integration using the API in order to use Federated Search at all.
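
For illustration, here is a rough sketch of the extraction step such an integration would need, and of what the requested option might do inside the crawler. It assumes the main content sits in a `<main>` element and that the ten-thousand-character limit applies to plain text. The names `ElementTextExtractor`, `extract_text`, and `MAX_CHARS` are hypothetical and are not part of any Federated Search API.

```python
from html.parser import HTMLParser

# Assumption: the limit the crawler appears to apply (observed, not documented).
MAX_CHARS = 10_000


class ElementTextExtractor(HTMLParser):
    """Collects text that appears inside a given element, e.g. <main>."""

    def __init__(self, target_tag):
        super().__init__(convert_charrefs=True)
        self.target_tag = target_tag
        self.depth = 0    # how many target elements are currently open
        self.chunks = []  # text fragments found inside the target element

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text(html, target_tag="main", limit=MAX_CHARS):
    """Return whitespace-normalized text inside target_tag, cut to limit characters."""
    parser = ElementTextExtractor(target_tag)
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return text[:limit]


if __name__ == "__main__":
    # A page whose navigation alone exceeds the limit, as in our case.
    page = (
        "<html><body><nav>" + "Menu item " * 2000 + "</nav>"
        "<main><h1>Title</h1><p>Body text.</p></main></body></html>"
    )
    print(extract_text(page))                 # text taken from <main> only
    print(extract_text(page, "body")[:40])    # whole page: navigation text comes first
```

Targeting by tag name keeps the sketch within the standard library. A real option would more likely accept a CSS selector or an element id, which would need a fuller HTML library.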