To set up the search crawler
- In Guide, click the Settings icon in the sidebar, then click Search settings.
- Under Crawlers, click Manage.
- Click Add Crawler.
- In Name this crawler, configure the following:
- Name - Enter the name that you want to assign to the crawler. This is an internal name that identifies your search crawler on the crawler management list.
- Owner - Enter the name of the Guide admin user responsible for crawler maintenance and troubleshooting. By default, the crawler owner is the user creating the crawler; however, you can change this to any Guide admin.
Crawler owners receive email notifications both when the crawler runs successfully and when errors occur, such as problems with domain verification, sitemap processing, or page crawling. See Troubleshooting the Search Crawler.
- In Add the website you want to crawl, verify ownership of your domain by configuring the following:
- Website URL - Enter the URL of the website that you want to crawl.
- Domain ownership verification - Click Copy to copy the HTML tag to your clipboard, then paste the tag into the <head> section of the HTML code of your site's non-authenticated home page. You can do this after you complete the crawler setup, and you can always find the verification tag on the edit crawler page. See Managing search crawlers.
Note: Do not remove the tag once it is in place, as the crawler needs to complete successful domain verification each time it runs.
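As an illustration, here is a hedged sketch of where the verification tag sits in the page. The meta tag name and token value below are placeholders; use the exact tag copied from your crawler setup page.

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Example Home Page</title>
    <!-- Placeholder verification tag: replace with the exact tag copied
         during crawler setup. Leave it in place permanently, since the
         crawler re-verifies the domain each time it runs. -->
    <meta name="zd-site-verification" content="example-token-123">
  </head>
  <body>
    <!-- page content -->
  </body>
</html>
```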
- In Add a sitemap, in Sitemap URL, enter the URL for the sitemap you want the crawler to use when crawling your site. The sitemap must follow the sitemaps XML protocol and contain a list of all pages within the site that you want to crawl.
The sitemap can be the standard sitemap containing all the pages of the site, or it can be a dedicated sitemap that lists only the pages that you want the crawler to crawl. All sitemaps must be hosted on the domain that the crawler is configured to crawl. You can set up multiple crawlers on the same site, each using a different sitemap that defines the pages you want the search crawler to crawl.
Note: The search crawler does not support sitemap indexes. A sitemap is a file that lists the URLs of each page you want to index; a sitemap index is a file that lists the URLs of individual sitemaps.
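For illustration, a minimal dedicated sitemap following the sitemaps XML protocol might look like this (the example.com URLs are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A flat list of page URLs; sitemap indexes are not supported. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/docs/getting-started</loc>
  </url>
  <url>
    <loc>https://www.example.com/docs/release-notes</loc>
  </url>
</urlset>
```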
- In Add filters to help people find this content, configure the source and type filters used to filter search results by your end users. Source refers to the origin of the external content, such as a forum, issue tracker, or learning management system. Type refers to the kind of content, such as blog post, tech note, or bug report.
- Source - Click the arrow, then select a source from the list or select + Create new source to add a name that describes where this content lives.
- Type - Click the arrow, then select a type from the list or select + Create new type to add a name that describes what kind of content this is.
Note: To edit or delete sources and types created during search crawler setup, see Managing search filters.
- Click Finish.
The search crawler is created and pending. Within 24 hours, the crawler verifies ownership of the domain and then fetches and parses the specified sitemap. Once sitemap processing succeeds, the crawler begins to crawl the pages and index their content. If the crawler fails either during domain verification or while processing the sitemap, the crawler owner receives an email notification with troubleshooting tips to help resolve the issue, and the crawler tries again in 24 hours. See Troubleshooting the search crawler.
Note: Zendesk/External-Content is the user agent for the search crawler. To prevent the crawler from failing because a firewall blocks its requests, add Zendesk/External-Content to your allowlist.
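If your site also gates crawlers through robots.txt, one way to help ensure the search crawler is not blocked is to allow its user agent explicitly. A sketch:

```
# robots.txt (sketch): allow Zendesk's search crawler by user agent
User-agent: Zendesk/External-Content
Allow: /
```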
- Set up your help center theme for federated search.
For external content to show up in search results in your help center search, you must have a theme that supports federated search results. To do this, use the latest version of the Copenhagen theme or replace the old {{help_center_filters}} and {{filters}} helpers with the new {{source_filters}} and {{type_filters}} helpers (see the Help Center templating cookbook).
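As a before/after sketch of the helper swap in a search results template (the surrounding markup and exact helper usage are covered in the Help Center templating cookbook):

```
{{!-- Before: legacy helpers --}}
{{help_center_filters}}
{{filters}}

{{!-- After: federated search helpers --}}
{{source_filters}}
{{type_filters}}
```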
- Select the content that you want to include and exclude in your help center search results. See Including external content in your help center search results.
- If desired, configure Knowledge results to include external content in Knowledge searches. See Configuring the context panel in the Zendesk Agent Workspace.
15 Comments
well-documented content 👍
Hello, do you know if the search crawler can be used to index content in a Jira Confluence site? Many thanks!
Hello Julien,
Yes! The crawler can be used to index content from your Confluence site. If you are running into issues setting it up, please reach out to us directly for support.
Hey JULIEN SERVILLAT, did you manage to get this set up and working?
@... we were told this wasn't possible, so am getting conflicting information...
Hello Matt Farrington-Smith, yes, we managed to set up the crawler with Confluence. Indexing of the articles works; we are just finalizing the upgrade of the theme to include the correct placeholders to return the federated search results.
Hello, after a search, is the crawled content presented embedded, or is it just a link that opens the external content in a new window or tab?
I ask because I am wondering if it makes sense to host HTML content in an S3 bucket without a custom domain. If the crawler embeds the content, not having a custom domain doesn't matter; it's different if the result opens externally.
I suspect not using a custom domain may create problems with things like the domain-verification rule.
I hope my question makes sense!
When users perform a search, relevant external content discovered by the crawler is ranked and presented on the search results page, where users can filter the results and click the links to view the external content in another browser tab.
For more information, see About Zendesk Federated Search.
Hi,
Do we have a guide on deploying search crawlers on MS SharePoint?
Hello, is the crawled external content's visibility set to Everyone? Is there a way to control the visibility settings of the crawled external content?
We'd like to limit the crawled external content to agents and admins for one of our use cases, but based on what's available in the settings, I assume that's not possible.
I'm having difficulties getting external content to show up for everyone in search results. I have enabled it and it is verified that the crawler is working. Am I missing something?
When configuring the crawler, if you designate a site while including a subdomain, will the crawler cover the entire domain or just the subdomain?
Is the crawler's verification tag persistent across crawlers or is a new one generated for each crawler? For instance, if I start configuring a crawler, copy the tag, but don't save the crawler because I don't have the sitemap URL, will that tag be the same when I go back to finish creating the crawler?
This is visible to Everyone, and it is not possible to restrict it the same way as user segments for Help Center articles.
Hi mfg,
Every domain that you designate for the crawler will have a different verification tag. The same goes if you create another crawler for the same domain that has already been verified.
Has anyone attempted this with Madcap/Flare? We're currently testing, and we're unable to verify the domain in our POC for a <companyname>.mcoutput.com domain. I'm guessing it's because we don't own the Madcap domain of mcoutput.com, am I right?
We are using an e-commerce platform that has its site index at storename/xmlsitemap.php.
It looks like this requires it to be .xml? Google's crawler has no issue with our sitemap. What can I do?
Hi.
I am trying to set up a search crawler for our help desk. I have embedded the meta tag in the header of the target webpage, and the URLs for the site and for the sitemap are both correct, but it still gives me a warning saying that the domain cannot be verified and that it cannot find the sitemap. What can I do?