Scenario in which we would like to use the Help Center Search Crawler, including feedback on limitations


  • Gorka Cardona-Lauridsen
    Zendesk Product Manager

    Josef Prandstetter Thank you for your thorough feedback and questions!

    Would it be possible to crawl websites that are restricted by an authentication mechanism such as username/password? (As described here, not in phase 1.)

    It is not on the immediate roadmap for the crawler. Instead, I suggest you build middleware that can authenticate to an API for the service that hosts your content and then ingest it using Zendesk's Federated Search External Content REST API. The crawler is intended to be a no-code alternative, but it comes with some tradeoffs, and this is one of them, at least for now.
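    A minimal sketch of the ingestion half of such middleware, in Python. It assumes the External Content records endpoint is `/api/v2/guide/external_content/records` and that a record takes the fields shown below; both are our reading of the public API docs and should be checked against Zendesk's API reference. Logging in to the protected source site is not shown. `build_record_payload` and `build_create_request` are hypothetical helpers, and `source_id`/`type_id` are assumed to refer to a source and type created beforehand.

```python
import base64
import json
import urllib.request

# Assumption: path of Zendesk's Federated Search external content records API.
RECORDS_PATH = "/api/v2/guide/external_content/records"


def build_record_payload(external_id, title, body, url, locale,
                         source_id, type_id, user_segment_id=None):
    """Return the JSON-serialisable body for creating one external record.

    Field names are assumptions based on the public API docs.
    """
    return {
        "record": {
            "external_id": external_id,  # your own stable ID for the page
            "title": title,
            "body": body,
            "url": url,
            "locale": locale,
            "source_id": source_id,
            "type_id": type_id,
            # Assumed: None means the record is visible to everyone.
            "user_segment_id": user_segment_id,
        }
    }


def build_create_request(subdomain, email, api_token, payload):
    """Prepare (but do not send) a POST request using API-token basic auth."""
    credentials = f"{email}/token:{api_token}".encode("utf-8")
    auth = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=f"https://{subdomain}.zendesk.com{RECORDS_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {auth}",
        },
        method="POST",
    )
```

    The middleware would fetch each page with its own credentials, build a payload per page, and send it with `urllib.request.urlopen(request)`. Keeping request construction separate from sending makes the mapping easy to test without touching the live API.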

    Will it be possible to exclude certain pages? Several articles in our product web help duplicate information from Zendesk Help Center articles, and that content should mainly live inside the Help Center.
    It would be great to have an exclude list.

    We plan to make it possible to exclude pages and paths through the crawler setup UI, but for now the workaround I can suggest is to generate a custom sitemap for each crawler you want to set up, containing only the pages you want it to crawl. This is described in more detail here, under step 1.
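    A minimal sketch of that workaround in Python: drop every URL under an excluded path prefix and write the rest as a standard sitemaps.org `<urlset>`. The helper names `filter_urls` and `build_sitemap`, and the example URLs, are hypothetical; the input list would come from your existing sitemap or CMS export.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def filter_urls(urls, exclude_prefixes):
    """Keep only URLs that do not start with any excluded prefix."""
    return [u for u in urls if not any(u.startswith(p) for p in exclude_prefixes)]


def build_sitemap(urls):
    """Serialise URLs as a minimal sitemaps.org <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u  # ElementTree escapes &, <, >
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

    Publishing one such file per crawler keeps the exclude list in version control until the setup UI supports exclusions directly.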

    You also left a comment with your last questions:

    How will you handle updates like:

    • indexed content gets updated
    • indexed/crawled site gets deleted
    • new site needs to be indexed

    Will this always be a full update or just a diff?
    Is it possible to automate the crawler within Zendesk Guide Admin interface or via API?
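    For a self-implemented crawler, the three cases above can be handled as a diff rather than a full re-upload by comparing content fingerprints. A minimal sketch in Python, assuming you store a map of `external_id` to a SHA-256 fingerprint of each body already indexed; `fingerprint` and `plan_sync` are hypothetical helpers, and this says nothing about how Zendesk's own crawler schedules updates.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Stable content hash used to detect changed pages."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def plan_sync(indexed: dict, crawled: dict):
    """Diff the index against a fresh crawl.

    indexed: {external_id: fingerprint} for records already in the index
    crawled: {external_id: body} for pages found in this crawl
    Returns sorted lists of IDs to create, update, and delete.
    """
    to_create, to_update = [], []
    for ext_id, body in crawled.items():
        if ext_id not in indexed:
            to_create.append(ext_id)          # new site/page
        elif indexed[ext_id] != fingerprint(body):
            to_update.append(ext_id)          # content changed
    # Anything indexed but no longer crawled was deleted at the source.
    to_delete = [ext_id for ext_id in indexed if ext_id not in crawled]
    return sorted(to_create), sorted(to_update), sorted(to_delete)
```

    Unchanged pages produce no API calls at all, which also helps stay within rate limits.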

    Were they all sufficiently answered by the article, or is there anything you would like me to answer in more detail?

  • Josef Prandstetter

    Gorka Cardona-Lauridsen:
    Thank you, everything has been answered.
    Our priority concern, which is still open, is the API limits. Is there a chance that these will be adjusted? Could we discuss this during the EAP? We assume these limits are relevant not only for our self-implemented crawler but also for the upcoming native Zendesk implementation.

  • Gorka Cardona-Lauridsen
    Zendesk Product Manager

    Josef Prandstetter

    Our priority concern, which is open, is the API limits - is there a chance that these will be adjusted?

    We are open to adjusting them and are currently gathering feedback on what is needed. What are your needs in terms of:

    • Character length of the record (page) body
    • Number of sources
    • Number of types
    • Number of records (pages)

    I also have a few other questions:

    1. Regarding the length of the body, would it work for you if we indexed the first 10,000 characters of a page instead of not indexing the page at all?
      The issue with very long pages is that they heavily affect search latency and can cause queries to fail.

    2. In the data model you have in mind for your use case, would each source represent content hosted on a different domain, or would you have content from different parts of the same domain assigned to different sources?

    3. Would you need different content from the same domain, in the same language, to be surfaced in different help centers (if you use multiple HCs within the same Guide account)?
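    Question 1 proposes indexing only the first 10,000 characters of a long page. A minimal sketch of what such a cap could look like on the ingesting side, in Python; `truncate_body` is a hypothetical helper, and backing off to the last space so a word is not cut in half is our own choice, not something Zendesk has described.

```python
MAX_BODY_CHARS = 10_000  # the cap proposed in question 1; the real limit may differ


def truncate_body(body: str, limit: int = MAX_BODY_CHARS) -> str:
    """Return at most `limit` characters, preferring to cut at a space."""
    if len(body) <= limit:
        return body
    cut = body[:limit]
    # Back off to the last space so a word is not split, unless that
    # would throw away more than half of the allowed text.
    last_space = cut.rfind(" ")
    if last_space > limit // 2:
        cut = cut[:last_space]
    return cut
```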


  • Josef Prandstetter

    Sorry for the delay; it took a little longer than expected to get the correct data from our various technical writer teams:

    • External record size / character length of the record (page) body:
      We did an analysis, and here is the result:
      Some of our articles exceed 10,000 characters, but only a small number exceed 20,000 characters.
      We can identify the articles exceeding 20,000 characters and would publish a new company policy requiring these articles to be revised, because such long articles do not make sense.

    • External sources
      We currently have 254 user manuals, counting each language version of a manual separately.
      We expect to have slightly more than 300 user manuals by the end of the year.
      We count each language version separately because each has its own URL, e.g. WebOffice:
      As you can see, the only difference between these URLs is the language code, but each has its own table of contents (= sitexml).
      Some of our user manuals are available in up to 4 languages (English, German, Italian and French), most only in 2 languages (English and German), and a few only in English.

      Some more examples:
    • As you can see from our examples above, our user manuals are currently hosted on different domains, but we plan to consolidate them on a single documentation hub.
    • We expect to have different content (= products) from the same domain in several languages.
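    The figures above can be reproduced from a crawl with a short script. A minimal sketch in Python: `size_buckets` counts page bodies against the 10,000- and 20,000-character thresholds discussed in this thread, and `count_sources` counts one source per manual and language. The URL layout (a language-code path segment such as `/en/` or `/de/`), the helper names, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical: language codes as they appear in the manual URLs.
LANGUAGE_CODES = {"en", "de", "it", "fr"}  # English, German, Italian, French


def size_buckets(lengths, soft=10_000, hard=20_000):
    """Count bodies at or under `soft`, over `soft` only, and over `hard`."""
    counts = {"ok": 0, "over_soft": 0, "over_hard": 0}
    for n in lengths:
        if n > hard:
            counts["over_hard"] += 1
        elif n > soft:
            counts["over_soft"] += 1
        else:
            counts["ok"] += 1
    return counts


def split_language(url):
    """Return (manual_key, language) by removing the language path segment."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    lang = next((s for s in segments if s in LANGUAGE_CODES), None)
    manual_segments = [s for s in segments if s != lang]
    return parsed.netloc + "/" + "/".join(manual_segments), lang


def count_sources(urls):
    """Return (number of manuals, number of manual/language sources)."""
    keys = {split_language(u) for u in urls}
    manuals = {manual for manual, _ in keys}
    return len(manuals), len(keys)
```

    Counting the manual/language pairs this way gives the per-language source count described above, while the manual count shows how many sources a one-source-per-manual model would need instead.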

    I hope these figures give you a good idea of how we want to use the Federated Search capability.

    Best regards!

