Troubleshooting the search crawler


10 Comments

  • Susan Russell

    What do we do if we have the Locale not Detected error, but our pages have clear headers with lang="en" in them?

    0
  • Global Support: Rein

    Susan Russell, we experienced a similar issue, and the cause was that "en" isn't an officially supported language code. In Zendesk, the English language code can be either en-US or en-GB. See Zendesk language support by product. After we changed the code in our documentation from "en" to "en-US", the crawler could index the content and no longer reported the "Locale not detected" error. Hope this helps resolve your issue too!
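
As an illustration of the fix described above, here is a minimal sketch for checking a page's lang attribute before recrawling. It assumes the only accepted English codes are the two named in this comment (en-US and en-GB); the class and function names are hypothetical, and the comparison is case-sensitive, which may be stricter than what the crawler actually does.

```python
from html.parser import HTMLParser

# Assumption: only the two English codes named in the comment above.
SUPPORTED_ENGLISH_CODES = {"en-US", "en-GB"}


class HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag it sees."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def find_page_lang(html_text):
    """Return the lang value of the <html> tag, or None if it is absent."""
    parser = HtmlLangParser()
    parser.feed(html_text)
    return parser.lang


def is_listed_code(lang):
    """True if lang is one of the codes assumed to be supported."""
    return lang in SUPPORTED_ENGLISH_CODES
```

For a page that starts with `<html lang="en">`, `find_page_lang` returns "en" and `is_listed_code` reports that it is not in the assumed list, which matches the situation in the question.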

    0
  • Jon Bolden

    We are experiencing an issue where the domain name is correct and set up correctly (it matches the URLs being crawled), but we still get an "Invalid url domain" error. Has anyone else run into this problem?

     

    0
  • Dane
    Zendesk Engineering
    Hi Jon, 

    I noticed that one of your admins has submitted a ticket about this same concern. Please follow that ticket for the resolution.
     
    Cheers!
    0
  • Lars Schweikardt

    We have set up a crawler, but the sitemap also contains URLs that lead to images. Is there a way to exclude certain pages or file types from the results?

    1
  • Viktor Osetrov
    Zendesk Customer Care
    Hello Lars,
     
    Could you clarify what you mean? Do you want to filter the help center search results or the Google search results?

    To exclude certain pages or file types from the Google search results, you can add the following tag to those pages:
    <meta name="robots" content="noindex" />
    I believe an alternative is to generate your sitemaps and upload them directly in Google Search Central, following Google's instructions for submitting sitemaps.
     
    Hope it helps
    0
  • Lars Schweikardt

    @... I have a crawler that is connected to our company's website. The sitemap.xml file also contains URLs that point to images. Those images then show up in the Guide search, but they are irrelevant there, so I do not want them in the search results. We could stop indexing them altogether, but we want them to appear in Google search results. This is why I am asking whether it is possible to filter the results of the crawler somehow.

    0
  • Viktor Osetrov
    Zendesk Customer Care
    Hello Lars,

    Thanks for your clarifications. Zendesk currently does not provide any functionality to selectively exclude parts of the content from its search results.
    The only possible workarounds are:

    1. Add a robots.txt file to your server:
    A robots.txt file tells web crawlers which pages on your site they may crawl. You could set one up to prevent the Zendesk crawler from crawling the image URLs. Here is an example of what that could look like:
    ```
    User-agent: zendesk
    Disallow: /images/
    ```
    This example disallows the Zendesk crawler from crawling any URL whose path begins with "/images/". Replace "/images/" with the appropriate path for your website's structure.
     
    Please note that a rule group applies only to the crawlers named in its User-agent line. If you used "User-agent: *" instead, the rule would also prevent other search engines from crawling those image URLs.
     
    2. Create Separate Sitemaps:
    Another way would be to separate your sitemap.xml into two: one for Zendesk that does not include the image URLs and another for Google that includes the image URLs. This way, you can control what URLs each search engine gets to see and index.
     
    In short, point the Zendesk crawler at the sitemap without the image URLs, and give Google the sitemap that includes them.
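
The second workaround could be sketched as follows: derive an image-free sitemap from the existing one. This is only a sketch under stated assumptions. It assumes the sitemap uses the standard sitemaps.org namespace and that image URLs can be recognized by their file extension; the function name and the extension list are hypothetical and should be adapted to your site.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: image URLs are identified by these extensions.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def strip_image_urls(sitemap_xml):
    """Return the sitemap XML with every <url> entry pointing to an image removed."""
    # Keep the sitemap namespace as the default namespace in the output.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy of the list because entries are removed while looping.
    for url in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url.find(f"{{{SITEMAP_NS}}}loc")
        if loc is None or not loc.text:
            continue
        # Compare only the path, so query strings do not hide the extension.
        path = urlparse(loc.text.strip()).path.lower()
        if path.endswith(IMAGE_EXTENSIONS):
            root.remove(url)
    return ET.tostring(root, encoding="unicode")
```

The output could be saved under a separate file name and used as the sitemap for the Zendesk crawler, while the original sitemap.xml stays in place for Google.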

    Apologies for the limitations. Hope it helps. 
    0
  • Alex Duffey

    I keep getting "Indexing failed" errors that say "Invalid url domain" in the error report. But the link works correctly, the sitemap works, and the site verification works. I don't understand what is failing, and this article doesn't cover this error.

     

    0
  • Viktor Osetrov
    Zendesk Customer Care
    Hello Alex,

    Regarding the "Indexing failed" errors that say "Invalid url domain", could you please check the following points:
    1. Ensure that your domain is set up correctly.
    2. Check your robots.txt file, which sits at the root of the domain (for example, "https://www.google.com/robots.txt").
    3. Make sure that you are using full URLs that include the scheme, such as "http://www.example.com".
    4. Check your DNS settings.
    5. Please note that sandboxes have their own limitations.
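
Points 2 and 3 of the checklist above can be tested locally with a short sketch like the one below. It is not a description of how the Zendesk crawler validates URLs. It assumes that a URL is acceptable when it uses http or https and its host equals the configured domain exactly, and it reuses the user-agent name "zendesk" from the robots.txt example earlier in this thread, which has not been verified. Both function names are hypothetical.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def url_matches_domain(url, configured_domain):
    """True if url is an http(s) URL whose host equals configured_domain.

    Assumption: the host must match exactly, so "example.com" and
    "www.example.com" are treated as different domains.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    return parts.hostname.lower() == configured_domain.lower()


def is_blocked_by_robots(robots_lines, user_agent, url):
    """True if the given robots.txt lines disallow url for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(user_agent, url)
```

For example, `url_matches_domain("www.example.com/page", "www.example.com")` is false because the scheme is missing, which is the kind of mismatch point 3 is meant to catch.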

    Hope it helps
    0
