Pagination - Duplicate records returned, some records omitted
The pagination challenge continues, but I've made some good progress!
At a high level, I'm using the following pattern to page through all of my organization's external records:
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records
- Extend array with "records" JSON objects
- Check "meta" > "has_more" - if true, prepare another request:
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records?page%5Bafter%5D=AFTER_CURSOR (where AFTER_CURSOR is MTA > MjA > MzA for my current record set)
- Extend array with "records" JSON objects for each result
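For anyone following along, the loop described above can be sketched roughly as below. This is a minimal sketch, not a confirmed client: `fetch_page` is a hypothetical callable that performs the authenticated GET and returns the decoded JSON, and `after_cursor` is an assumed name for the field under "meta" that carries the next cursor.

```python
from typing import Callable, List, Optional
from urllib.parse import urlencode

BASE_URL = "https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records"


def build_url(cursor: Optional[str]) -> str:
    """Return the list URL, adding page[after] when a cursor is given."""
    if cursor is None:
        return BASE_URL
    # urlencode percent-encodes the brackets, giving page%5Bafter%5D=...
    return BASE_URL + "?" + urlencode({"page[after]": cursor})


def fetch_all_records(fetch_page: Callable[[Optional[str]], dict],
                      max_pages: int = 1000) -> List[dict]:
    """Collect "records" from every page until meta.has_more is false.

    fetch_page is hypothetical: it takes a cursor (None for the first
    request) and returns the decoded JSON body of the response.
    """
    records: List[dict] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a recycling cursor cannot loop forever
        page = fetch_page(cursor)
        records.extend(page.get("records", []))
        meta = page.get("meta", {})
        if not meta.get("has_more"):
            break
        cursor = meta.get("after_cursor")  # assumed field name
    return records
```

Keeping the HTTP call behind `fetch_page` makes it easy to replay saved responses while debugging the duplicates.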
What I've found is that the paginated request for the final page (cursor MzA) consistently includes duplicate results from previous requests.
These 3 IDs are consistently duplicated between the initial list request against external_content/records and paginated request with cursor MzA:
- 01EEVATKF2H7C3Z8C7AT2C1JV4
- 01EGGZJ1EFNWHCJFA97ZX11NYK
- 01EGH1ZDGPTGFE8GZ49P3HN85Y
These 2 IDs are consistently duplicated between MjA and MzA after cursor requests:
- 01EEVATKC2DWGSH9RSVQCNFA0R
- 01EGGZJ1HSV7HH505RFKJBJZ8B
As of this writing, a request after cursor MzA returns only 5 results, so every result for this cursor position is a duplicate, with no apparent order or pattern.
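A check like the one I did by hand could be sketched as follows. It assumes each record object carries its ID under an `id` key (an assumption on my part), and that the records from each request have been saved under a label such as the cursor used.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_ids(pages: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Map each record ID seen more than once to the page labels it appeared on.

    pages maps a label (for example "initial" or a cursor value) to the
    list of record objects returned for that request.
    """
    seen: Dict[str, List[str]] = defaultdict(list)
    for label, records in pages.items():
        for record in records:
            seen[record["id"]].append(label)  # "id" is an assumed field name
    return {rid: labels for rid, labels in seen.items() if len(labels) > 1}
```

The result lists every ID that came back more than once, along with the requests that returned it.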
Additionally, some records aren't fetched at all, which leads to failures to create articles due to the external_id already existing (an example is external_id f84ee0e6d232876d1f1880c63f57d697 - record ID unknown because I can't seem to fetch this record via API). This external record's link: https://skytapguidetest.zendesk.com/hc/en-us/search/click?data=BAh7CjoHaWRJIh8wMUVFVkFURlpCS000MEY1MFlaQVRNM0g1OAY6BkVUOgl0eXBlSSIcZXh0ZXJuYWxfY29udGVudF9yZWNvcmQGOwZUOgh1cmxJIjVodHRwczovL2hlbHAuc2t5dGFwLmNvbS9NYWludGVuYW5jZV9XaW5kb3dzLmh0bWwGOwZUOg5zZWFyY2hfaWRJIikwMDgyYTY5My1iMmFkLTRlNDItYmQ4Ni1jN2E2Yjg1YmNmNjAGOwZGOglyYW5raQY%3D--ddf91ba1dc8f48a448a2e608ff1b67f5d7b1d2ec
To troubleshoot, I've tried walking backward through paginated requests by specifying the before_cursor value in page[after] calls. The results are interesting (I can walk backward in a loop forever; eventually the cursor position recycles), but this doesn't seem to return the full set of external records, and it couldn't really be used to populate a full record fetch without a lot of work. My goal with this approach was to see whether starting at a different cursor offset would produce different results. So far that hasn't been the case.
I appreciate your review of this issue, and any help you can offer!
Thank you,
Marcus
-
Official comment
Hi Marcus,
Thanks for the feedback and the detailed bug report. We'll look into it, but I can't promise a quick response as we're a bit short-staffed at present. I hope this issue isn't blocking you.
Best regards,
Ronan
-
I did end up finding the record with external_id f84ee0e6d232876d1f1880c63f57d697: 01EEVATFZBKM40F50YZATM3H58
It appears when using after cursor: MjQ
I found this after cursor with the following steps (convoluted, I know):
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records
- Capture before cursor: MQ
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records?page%5Bafter%5D=MQ
- Capture before cursor: Mg
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records?page%5Bafter%5D=Mg
- Capture before cursor: Mw
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records?page%5Bafter%5D=Mw
- Capture before cursor: NA
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records?page%5Bafter%5D=NA
- Capture after cursor: MTQ
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records?page%5Bafter%5D=MTQ
- Capture after cursor: MjQ
- GET https://skytapguidetest.zendesk.com/api/v2/guide/external_content/records?page%5Bafter%5D=MjQ
- Record 01EEVATFZBKM40F50YZATM3H58 appears in response
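A rough sketch of automating this kind of hunt is below. It only follows after cursors, so it does not reproduce the before-cursor steps listed above, and it may miss records that only show up on that path. `fetch_page` is a hypothetical callable that returns the decoded JSON for a cursor, and `after_cursor` is an assumed field name under "meta".

```python
from typing import Callable, Optional, Tuple


def find_by_external_id(fetch_page: Callable[[Optional[str]], dict],
                        external_id: str,
                        max_pages: int = 1000) -> Optional[Tuple[Optional[str], dict]]:
    """Return (cursor, record) for the first record matching external_id.

    The cursor is None when the record is on the first page. Returns None
    when the record is not found before the pages run out or a cursor repeats.
    """
    cursor: Optional[str] = None
    visited = set()
    for _ in range(max_pages):
        if cursor in visited:  # cursor recycled: stop instead of looping forever
            break
        visited.add(cursor)
        page = fetch_page(cursor)
        for record in page.get("records", []):
            if record.get("external_id") == external_id:
                return cursor, record
        meta = page.get("meta", {})
        if not meta.get("has_more"):
            break
        cursor = meta.get("after_cursor")  # assumed field name
    return None
```

Tracking visited cursors matters here because, as noted earlier, the cursor position eventually recycles.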
-
Ah, and an explanation of the use case/design, in case it's helpful for understanding why I'm trying to do this and could inform advice on a "better way".
We have ~500 articles in our help documentation repository. We would like to create one external record per article. Our help articles are included in a sitemap.xml file, which the automation I'm working on reads from as a "source of truth" regarding articles that exist in documentation. The possible interactions we need to account for:
- An article's title or description is updated, but not the URL (the external_id is based on URL) - in this case we need to issue a PUT request to update the existing external record for this article
- A new article at a new URL is created and added to the site map - in this case we will issue a POST request to create a new external record for this article
- An existing article's URL changes in our documentation site map - in this case we will create a new external record (see point 2), and issue a DELETE call against the old, now orphaned, external record
In order to perform analysis against existing external records, we need to be able to generate a full list of external records, and compare those against our documentation site map.
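The comparison could look something like the sketch below. Several things here are assumptions for illustration: deriving external_id as an MD5 hex digest of the URL, the `id`, `title`, and `description` field names, and the shape of the sitemap data (a dict keyed by URL).

```python
import hashlib
from typing import Dict, List, Sequence


def external_id_for(url: str) -> str:
    """Assumed derivation: MD5 hex digest of the article URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def plan_sync(sitemap: Dict[str, dict],
              existing: List[dict],
              fields: Sequence[str] = ("title", "description")) -> dict:
    """Split work into records to create (POST), update (PUT), and delete (DELETE).

    sitemap maps article URL -> metadata; existing is the full record list.
    """
    # Keying by external_id also collapses duplicates returned by pagination.
    existing_by_ext = {r["external_id"]: r for r in existing}
    wanted = {external_id_for(url): dict(meta, url=url) for url, meta in sitemap.items()}

    create, update = [], []
    for ext_id, article in wanted.items():
        current = existing_by_ext.get(ext_id)
        if current is None:
            create.append(article)  # new URL: POST
        elif any(current.get(f) != article.get(f) for f in fields):
            update.append((current["id"], article))  # same URL, changed text: PUT
    # Records whose URL is no longer in the sitemap are orphaned: DELETE
    delete = [r["id"] for ext_id, r in existing_by_ext.items() if ext_id not in wanted]
    return {"create": create, "update": update, "delete": delete}
```

A changed URL falls out naturally as one create plus one delete, which matches the third case above. The plan is only as good as the record list, which is why the missing and duplicated records matter.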
Once finalized, I expect we will set this automation to run either once weekly or once monthly (frequency of change is low in our documentation). Our current in-development external record count is fewer than 40 (4 paginated GET requests at 10 results per page). The initial article import will be heavy in POST calls to create new articles (~500). Subsequent runs will be mainly GET-heavy to list all articles (500 external articles, 10 per page, 50 GET requests), with a low frequency of PUT, POST, and DELETE calls to keep documentation and external records in sync.
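The request counts above are just a ceiling division of record count by page size, assuming 10 results per page as observed:

```python
def pages_needed(total_records: int, per_page: int = 10) -> int:
    """Number of list requests needed to page through total_records."""
    return -(-total_records // per_page)  # integer ceiling division


print(pages_needed(500))  # 500 records at 10 per page
print(pages_needed(35))   # a partially filled last page still costs a request
```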
Hopefully this info is helpful.