SEMRUSH crawl returns 403 errors on Zendesk links



Posted Dec 01, 2021

We use SEMRUSH to crawl and audit our website, but for some reason the crawl fails on links that point to Zendesk. Links such as https://help.tokyotreat.com/hc/en-us, which work fine when opened in a browser, fail on SEMRUSH crawls.

Is there any fix for this?

Thanks.


22 comments


Eric Nelson

Zendesk Developer Advocacy

Hey Roberto Reale, this is actually related to a bug that we're tracking. Do you mind if I move this into a ticket?

0


Same problem here. How do we fix it?

1


Hello Eric Nelson! Of course, please go ahead and if possible let me know when there is a fix available. We currently have a ton of errors just for that.

Thanks!

1


Eric Nelson - Hi there. Has there been any progress on this? I believe we are experiencing the same issue. 

1


Hi,

 

Did you get a fix for this?

 

Thanks

Wes

1


We have the exact issue.

Our Zendesk HC page also gets a 403 error in Semrush, but the page loads fine when accessed in a browser.

Please let us know if there is a fix.

We got the following info from Semrush too:

Thank you for contacting Semrush! 

We apologize for the delay as we have been experiencing high request volumes. 

We are getting a 403 status error based on their domain blocking our bots at this time. If our bots are blocked it will cause them to time out. This does not directly mean there is an error with the page, thusly why you may be seeing a 200 code on your end. 

Additional information on whitelisting can be found within the Site Audit section of our knowledge base: 
https://www.semrush.com/kb/681-site-audit-troubleshooting. To recap, how the domain reads or blocks our bot is why we specifically are returning a 403 status code at this time. 

Please let us know if you have any additional questions or requests! 
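One way to see the difference Semrush describes is to request the same page with a browser-like and a crawler-like User-Agent and compare the status codes. The following is a minimal sketch using only Python's standard library. The User-Agent strings and the function names are illustrative assumptions, and edge-layer bot detection usually looks at more than the User-Agent header, so a 200 here does not prove that a real crawler would get through.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings (assumptions, not official values).
USER_AGENTS = {
    "browser": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "crawler": "Mozilla/5.0 (compatible; SemrushBot/7~bl; "
               "+http://www.semrush.com/bot.html)",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code for a GET sent with the given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is on the exception.
        return err.code


def describe(statuses):
    """Summarise a {'browser': code, 'crawler': code} mapping in one line."""
    browser, crawler = statuses["browser"], statuses["crawler"]
    if browser == 200 and crawler == 403:
        return "crawler-only 403: likely bot blocking at the edge"
    if browser == crawler:
        return f"same status for both: {browser}"
    return f"different statuses: browser={browser}, crawler={crawler}"


def compare(url):
    """Fetch the URL once per User-Agent and return the statuses and a summary."""
    statuses = {name: fetch_status(url, ua) for name, ua in USER_AGENTS.items()}
    return statuses, describe(statuses)
```

Calling compare("https://help.tokyotreat.com/hc/en-us") would run the check against the link from the original post.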

2


Hello, we are having this exact same issue too. Can I get this created into a ticket as well?

2


Confirming this issue as well. It is not specific to Semrush, however: our Google Search Console report also shows it, which impacts our SEO. 

2


We're getting the same error. Any progress on a fix? I see at least from this thread that it was first reported on December 1, 2021 and today is May 20, 2022. 

1


Same issue here, any updates on the fix?

Thanks

2


Hi Eric Nelson, I wanted to also request an update here. This bug is preventing us from properly evaluating our site's health. Thanks in advance!

1



Greg Katechis

Zendesk Developer Advocacy

Hi all, I want to jump in here to give an update and some more context for what is happening here. 

At Zendesk, we use an edge layer in our network infrastructure to ensure that the requests that are being made are not malicious or otherwise potentially damaging. One of the many ways that we handle this is by determining the likelihood that the request is coming from a bot. If the determination is made that it's likely to be a bot, one option for us is to present a captcha. If it's an actual user, they can complete the captcha and successfully return the resource. If it isn't, or if the captcha fails, we will return a 403.
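The flow described above can be sketched roughly as follows. This only illustrates the behaviour as described in this post; it is not Zendesk's or Cloudflare's actual implementation, and the score, the threshold, and the function name are all hypothetical.

```python
from typing import Optional

# Hypothetical cutoff above which a request is treated as "likely a bot".
BOT_SCORE_THRESHOLD = 0.5


def edge_response(bot_likelihood: float, captcha_passed: Optional[bool] = None) -> int:
    """Return the HTTP status for a request, following the flow in the post.

    bot_likelihood: hypothetical score between 0 and 1.
    captcha_passed: True if a captcha was solved, False if it failed,
                    None if it was never completed (e.g. by a crawler).
    """
    if bot_likelihood < BOT_SCORE_THRESHOLD:
        return 200  # not considered a bot: serve the resource
    if captcha_passed:
        return 200  # likely-bot request, but a user completed the captcha
    return 403      # captcha failed or was never completed
```

Under these assumptions a crawler that cannot complete the captcha, for example edge_response(0.9), ends up with a 403 even though the same page loads fine for a person in a browser.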

With Semrush, what I described here is very likely what is happening. Our devs are almost finished with the documentation that we'll be sharing that will explain what I'll share with you now, but before that's ready to go, I can give you the general overview.

In a situation like this, there are two paths that can be taken, one from our side and one from the bot provider's side, in this case Semrush. Any actions that we take on our side have the potential to make our overall infrastructure less safe, so we first have to have the bot provider take action. That first step will be for them to submit an application to become a verified bot with Cloudflare, and they can do that by following the instructions here. If you provide that link to them, they should be able to pass it to the right people on their side who can fill it out. If their application is approved, the issue should be resolved!

If that is not successful, there will be additional steps that Semrush will need to take, which we will have outlined in that article once it is available. It will ask them to consider which endpoints they're using when crawling, and will include some recommendations for specific APIs depending on their use case. Since that's not quite ready yet and the application will be the next step for Semrush, I'll wait for the article to be published before I go into that in any more detail.

I know that it's difficult to be told to go from one company to the other, so I apologize for making that your first step. The good news is that if that is successful, not only will this work for all of you, it will also help ensure that Zendesk's network is as safe as it can be.

As soon as that article is published, I'll make sure to drop an update here. If anyone receives word back from Semrush that I can help with, let me know and I'll be glad to assist!

0


Greg, your explanation is helpful and makes sense as a way to help protect against malicious attacks, but pushing the outreach to SEMrush (or other major bots) onto customers seems like a weak response from the Zendesk product team. There are only a few handfuls of major legitimate bots that crawl sites, and these can be easily found. As an example, here is a list from 2017 that includes SEMrushbot: https://www.imperva.com/blog/most-active-good-bots/

While not all edge cases can be anticipated, it should be part of Zendesk's product strategy to make sure that its product works well with other major systems and tools used by customers. There are many more tools used by smaller sites/companies, but SEMrush would be on the list for many of the medium-to-larger size Zendesk customers. As part of product discovery, I might suggest the Zendesk product manager read What is Enterprise SEO? for a list of a few others in addition to the list shared above.

It is great that the Zendesk engineering team has identified that Zendesk is causing errors in the SEO/marketing tools used by a large percentage of its customers. However, the Zendesk product managers should be doing the work to make sure that Zendesk is playing well with other tools and not pushing that work to customers. Given the prominence of SEMrush in the market, it is equally likely that the Zendesk product/engineering team has missed something in their protection implementation with Cloudflare. Zendesk product managers should easily be able to find contact information for their counterparts at SEMrush, ahrefs, Cloudflare, etc. to make sure they fully understand the situation with other major platforms and to find a resolution.

3



Ryan McGrew

Zendesk Product Manager

Hey All,

I just wanted to say that we've been working with Cloudflare on the Semrush issues you have all reported. The Cloudflare team has been working to adjust their bot management functionality to allow the Semrush bot through. There have already been a handful of changes rolled out and a final update will be going out on Monday that should alleviate all the remaining issues we're seeing. If you're still encountering issues after end of business on Monday, please let us know and we'll continue to work with Cloudflare on this. 

Thanks!

0


Hey hey!

Sorry for digging up this old thread, but this issue seems to have resurfaced for ahrefs as of March 27th. That is, everything worked fine on March 26th, and on the 27th our audit led to a score of 1 because all pages responded with 403 errors. I reached out to ahrefs first, and they believe it to be an issue with the site rejecting the crawler. The bots they specifically said should work are AhrefsBot and the AhrefsSiteAudit bot. What can we do to resolve this?

3


Hello guys, any updates on this? We are also getting tons of 403 errors from Semrush. Thanks!

0



Gorka Cardona-Lauridsen

Zendesk Product Manager

Koen Doodeman, Henrique Vilela: thanks for reporting this. We are working on finding the issue. Are you still experiencing it?

0


No we are not - it seems it fixed itself :)

0


So, Eric Nelson, Gorka Cardona-Lauridsen, we're in somewhat the same spot. We offer services for clients that involve crawling their website content, with full consent from the clients.

Yet Cloudflare blocks our crawler just the same. And since we're not crawling just any site, only on demand, we don't meet Cloudflare's Good Bot requirement of at least 1000 requests/day on their network. 

So, how can we solve this? As mentioned, the client is in full consent and all we want is to perform a normal operation. How?

Thanks - Martin

0


I have what I believe is a similar and related issue to the one described here. The short version: we are getting 403 errors when our web crawler attempts to index our Zendesk Guide site articles. More information:

We use Lucidworks Fusion to index our website and provide search results to our customers. We are moving our support documentation to Zendesk, but we also want users searching on our website to discover our Zendesk documentation. 

Fusion allows us to add multiple datasources/websites to a collection to be searched together. Great! Tests with other websites showed this would work. Initially I did get this to work with our Zendesk site on 4/18. Fusion indexed it after some initial difficulties. I changed one setting on 4/19 and then the crawl failed. I changed it back and it failed again. I've gone through Fusion's settings all weekend, trying many different settings to no avail. 

Then I thought to come here and search on what would cause Zendesk to return 403 errors. This post seems very closely related. What is really confusing is having it work once, and then having it fail. Is it possible Zendesk/Cloudflare has blocked our web crawler? Is there a way to authorize our web crawler? Discovery of our documentation via searching our website is critical to the support of our customers. I've submitted a ticket to Zendesk support, but thought I'd also post here. 

0



Shawna James

Community Product Feedback Specialist

Hello everyone,
 
Thank you for taking the time to provide us with your feedback here. I want to note that this has been logged for our PM team to review. For others who may be interested in this feature request, please add your support by upvoting this post and/or adding your use case to the comments below. Please note that if you have a new feedback request that is not related here, please go ahead and create a new feedback post so we can log your requests separately. Thank you again!

0


Hi Gorka Cardona-Lauridsen, Shawna James - it looks like Ahrefs have updated the IP ranges for their crawl bot (see article: https://help.ahrefs.com/en/articles/78658-what-is-the-list-of-your-ip-ranges). Because of this, we are getting 403 errors like those reported by the OP. From what I've read in this thread, your team should be able to resolve this?
Can I request that these are updated to allow the Ahrefs crawl bot to access Zendesk pages?
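For anyone who wants to check whether a blocked request came from one of a crawler's published ranges, here is a minimal sketch using Python's standard ipaddress module. The CIDR blocks below are documentation placeholders (RFC 5737), not Ahrefs' real ranges, which are listed in the article linked above; the function and variable names are made up for illustration.

```python
import ipaddress

# Placeholder CIDR blocks (RFC 5737 documentation ranges), NOT real crawler ranges.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def in_ranges(ip, cidrs):
    """Return True if the IP address falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


# Example: an address from a 403 log line, checked against the placeholder list.
print(in_ranges("192.0.2.10", PLACEHOLDER_RANGES))
print(in_ranges("203.0.113.5", PLACEHOLDER_RANGES))
```

If the addresses receiving 403s are inside the newly published ranges but outside whatever list the edge layer allows, that would be consistent with the allowlist being out of date.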

1

