Add-on AI agents - Advanced

You can use a web crawler to import content into your advanced AI agent. This gives your AI agent the ability to create AI-generated answers to customer questions based on information in external websites.

This article gives you some best practices for using a web crawler to import content for an advanced AI agent.

This article contains the following topics:

  • Use the web crawler on the right type of sites
  • Limit reimports to a reasonable frequency
  • Keep the overall number of knowledge sources low
  • Check the import summary
  • Start small and test

Related articles:

  • Troubleshooting issues with web crawler imports for advanced AI agents
  • Managing imported knowledge sources for advanced AI agents

Use the web crawler on the right type of sites

The web crawler is best suited for websites that function as help centers or product description pages. For e-commerce pages, we recommend building an integration that can retrieve relevant product information and adding that information to a dialogue or procedure.

We recommend using a Zendesk help center as your primary knowledge source. Websites can have any format, including dynamic elements and JavaScript, which makes them much harder to ingest predictably. The web crawler has powerful configuration options, but they take setup and practice to use well. Zendesk help centers are, by nature, simpler and more predictable in format, which leads to better results. Imports are also generally faster from a Zendesk help center.

Only publicly accessible websites can be crawled. If a website requires authentication, the web crawler can't access it.
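Before you set up a crawl, you can check whether a page is reachable without signing in by requesting it with no cookies or credentials. The following Python sketch is a rough pre-check and not part of the web crawler itself. It assumes that a 2xx response means the page is public and that 401 or 403 means authentication is required. Some sites instead redirect to a login page that returns 200, so the sketch also reports the final URL after redirects for you to compare.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status code to a rough accessibility verdict (an assumption, not a rule)."""
    if 200 <= status < 300:
        return "public"
    if status in (401, 403):
        return "requires authentication"
    return "unreachable or error"


def check_public(url: str, timeout: float = 10.0) -> tuple[str, str]:
    """Fetch url with no cookies or credentials and return (verdict, final_url).

    urllib follows redirects automatically, so compare final_url with url:
    a redirect to a login page can still end in a 200 response.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "public-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status), response.geturl()
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return classify_status(err.code), url
    except urllib.error.URLError:  # DNS failures, refused connections, timeouts
        return "unreachable or error", url


if __name__ == "__main__":
    import sys

    # Usage: python check_public.py https://example.com/help
    for target in sys.argv[1:]:
        verdict, final_url = check_public(target)
        print(f"{target}: {verdict} (final URL: {final_url})")
```

If the verdict is anything other than "public", or the final URL points to a sign-in page, the web crawler most likely can't access that content either.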

Limit reimports to a reasonable frequency

Imports aren't a real-time web search. The AI agent doesn't search live data in a help center, file, or website. Rather, the information is imported into the AI agent on a one-time or recurring basis. The AI agent uses this imported information when generating its replies.

Daily imports aren’t recommended unless the knowledge source is updated very frequently. For most organizations, a weekly or monthly cadence is fine. You can always reimport manually if changes need to be reflected before the next scheduled reimport.

Keep the overall number of knowledge sources low

You can add multiple knowledge sources to a single AI agent, including multiple web crawls. However, we recommend keeping the total number of knowledge sources small. In some cases, a large number of sources can reduce accuracy and increase latency.

Check the import summary

If you have a successful crawl but encounter other issues (for example, the AI agent’s answers are incomplete or poor), you can review the import summary to check whether all expected URLs and content were imported. This is the first and best way to understand what’s been imported and what to troubleshoot after import.
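When you have many pages, comparing the import summary against the URLs you expected can be tedious by hand. The Python sketch below assumes you have copied both lists yourself (for example, from your sitemap and from the import summary); it doesn't call any Zendesk API. It treats a trailing slash as insignificant, which is an assumption that may not hold for every site.

```python
def normalize(url: str) -> str:
    """Trim whitespace and one trailing slash so trivially different URLs match (an assumption)."""
    url = url.strip()
    return url[:-1] if url.endswith("/") else url


def compare_urls(expected: list[str], imported: list[str]) -> dict[str, list[str]]:
    """Report expected URLs missing from the import and imported URLs you didn't expect."""
    expected_set = {normalize(u) for u in expected if u.strip()}
    imported_set = {normalize(u) for u in imported if u.strip()}
    return {
        "missing": sorted(expected_set - imported_set),
        "unexpected": sorted(imported_set - expected_set),
    }


if __name__ == "__main__":
    # Hypothetical URL lists for illustration.
    expected = ["https://example.com/help/a", "https://example.com/help/b/"]
    imported = ["https://example.com/help/b", "https://example.com/help/c"]
    report = compare_urls(expected, imported)
    print("Missing:", report["missing"])
    print("Unexpected:", report["unexpected"])
```

Missing URLs are a good place to start troubleshooting, while unexpected URLs can indicate that the crawl reached further than you intended.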

Start small and test

If you want to check whether content was crawled correctly and you have pages that follow a specific pattern, the fastest approach is to restrict your crawl to one or two examples of those pages. You can use a Start URL of one target page and a Max crawling depth of zero. Alternatively, you can set Max pages to crawl to a low number that can be processed quickly.
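To see why these settings keep a test crawl small, consider how a generic breadth-first crawler applies a depth limit and a page limit. The Python sketch below is an illustration only, not the web crawler's actual implementation. It assumes that crawling depth counts link hops from the Start URL and that the page limit caps the total number of pages collected. The site structure is a hypothetical in-memory map of page links.

```python
from collections import deque


def crawl_order(links: dict[str, list[str]], start_url: str,
                max_depth: int, max_pages: int) -> list[str]:
    """Return the pages a breadth-first crawler would collect under both limits."""
    if max_pages < 1:
        return []
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, link hops from start_url)
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # don't follow links beyond the depth limit
        for link in links.get(url, []):
            if link not in seen and len(visited) < max_pages:
                seen.add(link)
                visited.append(link)
                queue.append((link, depth + 1))
    return visited


# Hypothetical site: a help home page linking to two articles, one of which links further.
SITE = {
    "https://example.com/help": ["https://example.com/help/a", "https://example.com/help/b"],
    "https://example.com/help/a": ["https://example.com/help/a/1"],
    "https://example.com/help/b": [],
}

if __name__ == "__main__":
    print(crawl_order(SITE, "https://example.com/help", max_depth=0, max_pages=100))
    print(crawl_order(SITE, "https://example.com/help", max_depth=5, max_pages=2))
```

With a depth of zero, only the Start URL is collected. With a generous depth but a low page limit, the crawl stops as soon as the limit is reached. Either way, the test import finishes quickly and leaves only a few pages to review.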
