SEMRUSH crawl returns 403 errors on Zendesk links
We use SEMRUSH to crawl and audit our website, but for some reason the crawl fails on links that point to Zendesk. Links such as https://help.tokyotreat.com/hc/en-us, which work fine when opened in a browser, fail in SEMRUSH crawls.
Is there any fix for this?
Hey Roberto Reale, this is actually related to a bug that we're tracking. Do you mind if I move this into a ticket?
Same problem here. How do we fix it?
Hello Eric Nelson! Of course, please go ahead, and if possible let me know when a fix is available. We currently have a ton of errors just from this.
Eric Nelson - Hi there. Has there been any progress on this? I believe we are experiencing the same issue.
Did you get a fix for this?
We have the exact same issue.
Our Zendesk Help Center page also gets a 403 error in Semrush, but the page loads fine when accessed directly.
Please let us know if there is a fix.
We got the following info from Semrush too:
Thank you for contacting Semrush!
We apologize for the delay as we have been experiencing high request volumes.
We are getting a 403 status error because their domain is blocking our bots at this time. If our bots are blocked, it will cause them to time out. This does not necessarily mean there is an error with the page, which is why you may be seeing a 200 code on your end.
Additional information on whitelisting can be found in the Site Audit section of our knowledge base: https://www.semrush.com/kb/681-site-audit-troubleshooting.
To recap, the way the domain reads or blocks our bot is why we are specifically receiving a 403 status code at this time.
Please let us know if you have any additional questions or requests!
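For anyone who wants to check this on their own Help Center, here is a minimal sketch (not an official Zendesk or Semrush tool) that requests the same URL twice, once with a browser-style User-Agent and once with a crawler-style one, and compares the status codes. The User-Agent strings and the `fetch_status`/`diagnose` helpers are hypothetical illustrations. Bot management of the kind described in this thread may also look at the source IP and other signals, so a 200 from your own machine does not prove the real crawler gets through.

```python
import urllib.error
import urllib.request

# Illustrative User-Agent strings (assumptions); real crawlers may send different ones.
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
CRAWLER_UA = "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request sent with the given User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the status code is still available.
        return err.code


def diagnose(browser_status: int, crawler_status: int) -> str:
    """Label the pattern discussed in this thread: 200 for a browser, 403 for a crawler."""
    if browser_status == 200 and crawler_status == 403:
        return "blocked-for-crawler"
    if browser_status == crawler_status:
        return "same-response"
    return "different-response"
```

Usage would be along the lines of `diagnose(fetch_status(url, BROWSER_UA), fetch_status(url, CRAWLER_UA))` with `url` set to your Help Center address; a result of `"blocked-for-crawler"` matches the symptom Semrush describes above.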
Hello, we are having this exact same issue too. Could this be turned into a ticket for us as well?
Confirming this issue as well. This is not specific to Semrush, however: our Google Search Console report also flags these errors, which has an impact on SEO.
We're getting the same error. Any progress on a fix? From this thread, I can see it was first reported on December 1, 2021, and today is May 20, 2022.
Same issue here, any updates on the fix?
Hi Eric Nelson, I wanted to also request an update here. This bug is preventing us from properly evaluating our site's health. Thanks in advance!
Hi all, I want to jump in here to give an update and some more context for what is happening here.
At Zendesk, we use an edge layer in our network infrastructure to ensure that incoming requests are not malicious or otherwise potentially damaging. One of the many ways we handle this is by estimating the likelihood that a request is coming from a bot. If we determine that it's likely a bot, one option is to present a captcha. If it's an actual user, they can complete the captcha and the resource is returned successfully. If it isn't a human, or if the captcha fails, we return a 403.
With Semrush, what I described here is very likely what is happening. Our developers are almost finished with documentation that explains this in detail; until it's ready, I can give you a general overview.
In a situation like this, there are two paths that can be taken: one on our side and one on the bot provider's side, in this case Semrush. Any action we take on our side has the potential to make our overall infrastructure less safe, so we first need the bot provider to take action. The first step is for them to submit an application to become a verified bot with Cloudflare, which they can do by following the instructions here. If you provide that link to them, they should be able to pass it to the right people on their side to fill it out. If their application is approved, the issue should be resolved!
If that is not successful, there are additional steps Semrush will need to take, which we will outline in that article once it is available. It will ask them to consider which endpoints they're using when crawling and will include recommendations for specific APIs depending on their use case. Since the article isn't quite ready yet and the application is the next step for Semrush, I'll wait for it to be published before going into more detail.
I know that it's difficult to be told to go from one company to the other, so I apologize for making that your first step. The good news is that if that is successful, not only will this work for all of you, it will also help ensure that Zendesk's network is as safe as it can be.
As soon as that article is published, I'll make sure to drop an update here. If anyone receives word back from Semrush that I can help with, let me know and I'll be glad to assist!
Greg, your explanation is helpful and makes sense as a way to protect against malicious attacks, but pushing the outreach to SEMrush (or other major bot operators) onto customers seems like a weak response from the Zendesk product team. There are only a handful of major legitimate bots that crawl sites, and they are easy to find. As an example, here is a list from 2017 that includes SEMrushBot: https://www.imperva.com/blog/most-active-good-bots/
While not all edge cases can be anticipated, it should be part of Zendesk's product strategy to make sure its product works well with the other major systems and tools its customers use. Smaller sites and companies use many other tools, but SEMrush would be on the list for many medium-to-large Zendesk customers. As part of product discovery, I might suggest the Zendesk product manager read "What is Enterprise SEO?" for a few more tools in addition to the list shared above.
It is great that the Zendesk engineering team has identified that Zendesk is causing errors in the SEO/marketing tools used by a large percentage of its customers. However, Zendesk product managers should be doing the work to make sure Zendesk plays well with other tools, rather than pushing that work onto customers. Given SEMrush's prominence in the market, it is just as likely that the Zendesk product/engineering team has missed something in its protection setup with Cloudflare. Zendesk product managers should easily be able to find contact information for their counterparts at SEMrush, ahrefs, Cloudflare, etc., to fully understand the situation with other major platforms and find a resolution.
I just wanted to say that we've been working with Cloudflare on the Semrush issues you have all reported. The Cloudflare team has been working to adjust their bot management functionality to allow the Semrush bot through. There have already been a handful of changes rolled out and a final update will be going out on Monday that should alleviate all the remaining issues we're seeing. If you're still encountering issues after end of business on Monday, please let us know and we'll continue to work with Cloudflare on this.
Sorry for digging up this old thread, but this issue seems to have resurfaced for ahrefs as of March 27th. Everything worked fine on March 26th, but on the 27th our audit dropped to a score of 1 because all pages responded with 403 errors. I reached out to ahrefs first, and they believe the site is rejecting their crawler. The bots they specifically pointed to, which should be allowed through, are AhrefsBot and the AhrefsSiteAudit bot. What can we do to resolve this?