
Troubleshooting the search crawler




Elizabeth Williams

Zendesk Documentation Team

Edited Jun 21, 2024



14 comments

What do we do if we have the Locale not Detected error, but our pages have clear headers with lang="en" in them?



Susan R., we experienced a similar issue. The cause was that "en" isn't an officially supported language code: in Zendesk, the language can be either en-US or en-GB (see Zendesk language support by product). After we changed the value in our documentation from "en" to "en-US", the crawler could index the content and the "Locale not detected" error no longer appeared. Hope this helps resolve your issue too!
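
As a rough illustration of the fix described above, the sketch below reads the lang attribute from a page's <html> tag and flags values that are not in an allow-list. The names LangFinder, check_lang, and SUPPORTED_LOCALES are hypothetical, and the allow-list contains only the two English codes mentioned in this comment; it is an assumption, not Zendesk's full list of supported locales.

```python
from html.parser import HTMLParser

# Assumption: only the two English locales named in the comment above.
# Extend this set from Zendesk's language support documentation.
SUPPORTED_LOCALES = {"en-us", "en-gb"}


class LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> tag it sees."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def check_lang(page_html):
    """Return (lang, ok); ok is True when lang is in SUPPORTED_LOCALES."""
    finder = LangFinder()
    finder.feed(page_html)
    lang = finder.lang
    return lang, bool(lang) and lang.lower() in SUPPORTED_LOCALES
```

Running check_lang over the pages listed in a sitemap before starting a crawl may surface bare "en" values early.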



We are experiencing an issue where the domain name is correct and set up correctly (it matches the URLs being crawled), but we still get an "Invalid url domain" error. Has anyone else run into this problem?

 


Dane

Zendesk Engineering

Hi Jon, 

I noticed that one of your admins has submitted a ticket about this same concern. Please keep track of that ticket for the resolution.
 
Cheers!



We have set up a crawler, but the sitemap also contains URLs that lead to images. Is there a way to exclude certain pages or file types from the results?


Viktor Osetrov

Zendesk Customer Care

Hello Lars,
 
Could you clarify what you mean: do you want to filter your help center search results or Google search results?

To exclude certain pages or file types from Google search results, you can add the following tag to those pages:
<meta name="robots" content="noindex" />
I believe an alternative is to generate and upload sitemaps directly in Google Search Central by following Google's instructions, which list all the steps.
 
Hope it helps



@... I have a crawler connected to our company's website. The sitemap.xml file also contains URLs that point to images. Those images then appear in Guide search, but they are irrelevant there, so I do not want them included in the search results. We could stop indexing them altogether, but we want them to appear in Google Search results. That is why I am asking whether it is possible to filter the crawler's results somehow.


Viktor Osetrov

Zendesk Customer Care

Hello Lars,

Thanks for the clarification. Zendesk currently does not provide any functionality to selectively exclude parts of the content from its search results.
The only possible solutions are:

1. Add a robots.txt file to your server:
You could add a robots.txt file to your website which tells web crawlers to ignore certain pages. The robots.txt file gives instructions to web robots about which pages on your site to crawl. You could set this up to prevent the Zendesk crawler from crawling the URLs of the images. Here is an example of what that could look like:
```
User-agent: zendesk
Disallow: /images/
```
In this example, the Zendesk crawler is disallowed from crawling any URL whose path starts with "/images/". Replace "/images/" with the appropriate path for your website's structure.
 
Please note that if you place the rule under "User-agent: *" instead, it would also prevent other search engines from crawling those image URLs.
 
2. Create Separate Sitemaps:
Another way would be to separate your sitemap.xml into two: one for Zendesk that does not include the image URLs and another for Google that includes the image URLs. This way, you can control what URLs each search engine gets to see and index.
 
For Zendesk, use a sitemap without the URLs directing to the images and for Google use the sitemap with the URLs to the images.
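
Option 2 above could be automated along these lines. This is a minimal sketch, assuming a standard sitemaps.org sitemap and assuming that image URLs can be recognized by their file extension; the function name without_images and the IMAGE_SUFFIXES list are hypothetical and should be adapted to your site.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: image URLs end in one of these extensions.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def without_images(sitemap_xml):
    """Return the sitemap as a string with image <url> entries removed."""
    # Keep the default namespace unprefixed in the serialized output.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy because entries are removed inside the loop.
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        path = urlparse(loc).path.lower()
        if path.endswith(IMAGE_SUFFIXES):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

The filtered output would be published as the sitemap given to the Zendesk crawler, while the original file stays in place for Google.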

Apologies for the limitations. Hope it helps. 



I keep getting "Indexing failed" errors that say "Invalid url domain" in the error report, even though the link works correctly, the sitemap works, and the site verification works. I don't understand what is failing, and this article doesn't cover these errors.

 


Viktor Osetrov

Zendesk Customer Care

Hello Alex,

Regarding the "Indexing failed" errors that say "Invalid url domain", could you please check the following points:
1. Ensure that your domain is set up correctly.
2. Check your robots.txt file. For an example of the format, see "https://www.google.com/robots.txt".
3. Make sure that you are using links of the form "http://www.example.com".
4. Check your DNS settings.
5. Please note that sandboxes have their own limitations.
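
Points 2 and 3 of the checklist above can be approximated with a local pre-flight check before contacting support. The sketch below is only an illustration: the function name preflight is hypothetical, the "zendesk" user-agent token is taken from the robots.txt example earlier in this thread and may not match the crawler's real user agent, and it is an assumption that a host mismatch is what triggers "Invalid url domain".

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def preflight(urls, expected_host, robots_txt, agent="zendesk"):
    """Return (url, problem) pairs; an empty list means nothing was flagged."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    problems = []
    for url in urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            # Covers links written without "http://" or "https://".
            problems.append((url, "missing or unsupported scheme"))
        elif (parts.hostname or "").lower() != expected_host.lower():
            # Assumption: the crawler expects every URL on the configured host.
            problems.append((url, "host differs from configured domain"))
        elif not robots.can_fetch(agent, url):
            problems.append((url, "blocked by robots.txt"))
    return problems
```

Note that "example.com" and "www.example.com" count as different hosts in this check, which is a common source of mismatches.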

Hope it helps



Hello,

Within the past week or so, we've started receiving "Body can't be blank" errors. These are for two typedoc pages where the content of the body is supplied dynamically. The crawler was previously able to index these two pages and is successfully indexing other similar pages. We haven't made any changes to these pages recently. How can I correct this issue?


Hiedi Kysther

Zendesk Customer Care

Hey Stacie Loving

Thanks for bringing this to our attention. I see you have already created a Support ticket for this issue. That is the right move, as it may need further review and investigation. Please keep an eye out for our team's updates on your ticket.

Thanks! And, have a great day! 



Does the <meta name="zd-site-verification" content="crawler-verification-token"> always have to be first after <head> or can it be anywhere between <head> </head>?

 


Dainne Kiara Lucena-Laxamana

Zendesk Customer Care

Hi Peter Rittau 

It can be anywhere between <head> </head>
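
To confirm that a page carries the tag somewhere inside <head>, a small check like the following may help. It is a sketch only: the names VerificationFinder and verification_token are hypothetical, and it does not reproduce how Zendesk itself verifies the site.

```python
from html.parser import HTMLParser


class VerificationFinder(HTMLParser):
    """Finds the zd-site-verification meta tag, but only inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.token = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head and self.token is None:
            attributes = dict(attrs)
            if attributes.get("name") == "zd-site-verification":
                self.token = attributes.get("content")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def verification_token(page_html):
    """Return the token from the meta tag in <head>, or None if absent."""
    finder = VerificationFinder()
    finder.feed(page_html)
    return finder.token
```

The position of the tag within <head> does not affect the result, which matches the answer above.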


