How to Export Zendesk Guide Data as CSV
Here is a script that exports Zendesk Guide Article information as a CSV:
This tool uses the Zendesk Help Center API(https://developer.zendesk.com/rest_api/docs/help_center/introduction) to pull Guide Information and populate it in a csv file. This will output values such as Article Id, Article Title, and Vote Count.
The output looks like:
Here is a nifty video overview and instructional:
Let me know if you have any questions!
Scott! This is awesome. And your video is super helpful.
I'm going to try out your script. Thanks for much for sharing this!
This is great and super useful!
I ran pip3 install requests. That seemed to work ok.
Also, I wanted to check my subdomain name. How can I do that?
When I run the script I get this error message:
File "Zendesk_Guide_Article_Export.py", line 1
SyntaxError: Non-ASCII character '\xff' in file Zendesk_Guide_Article_Export.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
We need a way to do the following:
- extract all article links owned by a particular author or
- extract all Draft articles owned by a particular author
As I am not a programmer, do you have ideas on how I can do this?
I just downloaded Python and am going thru the documentation.
Thank you for the kind words.
I am glad to hear that you were able to successfully get the requests library using pip. This suggests that python is installed correctly, and requests is now installed on your machine.
I want to give you a big thanks for providing feedback so quickly, I imagine your issue is one that will arise on many occasions.
To address your questions:
- Zendesk has a good article on locating your subdomain here: https://support.zendesk.com/hc/en-us/articles/221682747-Where-can-I-find-my-Zendesk-subdomain-
- My main suspicion with your error is that the file was saved incorrectly. Sometimes adding the “.py” at the end of the file isn’t enough to save it correctly. What program did you use to copy and paste the script? I have had this issue with Microsoft Word before. One solution is to download the full repository from GitHub (as described in the video I posted above), and move Zendesk_Guide_Article_Export.py to where you want to run it from (i.e. your Desktop). Let me know if this works
- You asked about extracting all article links owned by a particular author. Once you get my script working, I strongly suggest you do this from a spreadsheet (such as in Google Sheets or Excel). If you “Sort By” to organize the data, sort the sheet by author id. It will list all articles, article URLs, etc. by author id. This will work similarly in most spreadsheet software, but here is some information if using Google Sheets (which is free): https://support.google.com/docs/answer/3540681?co=GENIE.Platform%3DDesktop&hl=en
- I am not certain how to connect the author id to an authors name, however (our team is small, so I just dive back into Guide and can link author id with a name). We can brainstorm on this one if you have a giant team.
- Lastly, you asked about extracting all Draft articles owned by a particular author. Draft can be exported as TRUE/FALSE as written in the API documentation (true means the article is currently a draft).
- If that value is exported, I would suggest doing the organization from a spreadsheet (such as in Google Sheets or Excel). If this sounds reasonable, then I can add the code to export this value and update you when it is done.
Hi We figured it out. RUN SUCCESSFULLY!!!! When a file is downloaded to Mac, it is changed to a different format. So, we had to install "The Unarchiver" from the Mac App Store to open the zip file (rar). I had to also use IDLE to open/edit the script.
I added in the draft field to the output
I have a list of the agent ids (we have over 100+) and their names so I can map within the spreadsheet. Once I extract the Agent Ids and name mapping, I can create an Excel formula to add a new column with the Agent name to each row. Then, I can filter by that agent to get their individual report.
Thank you so much, Scott. This is very, very helpful.
This is great Mary!
I am so happy to see that you got it to work, and also thank you for sharing information about the "The Unarchiver".
Did your code work? (Nice work on adding stuff!)
One comment is that you may want to change int(draft) to bool(draft). If it runs either way, then it is probably fine either way.
I updated the code in the repository to add a "draft" row as well.
Yes. The code ran fine and produced the draft column. Then I added a new column to represent the Author Name. In Agent UI, I exported the list of Agent ids and names. Then, pasted into the spreadsheet (as a table). I used the Vlookup function in Excel to add the correct Author name next to the article they authored. Then, when done I added the ss to Google sheets so Agents can run their own reports at the end of month (I will update once per month)
great work on the script and the video. Super helpful! I was just curious how I can add the timestamp (updated_at) to the script and kept getting one error after another. I would appreciate some pointers on how to achieve this as adding this
didn't work for me.
It also looks like somebody else had the same question on the YT video.
Thanks for any input on this.
I just added the "updated_at" code in the GitHub repository.
In short I added the "updated_at" as a string type. Hopefully that will do it for you! I had trouble storing it as a different data type.
It looks like the YouTube commenter was able to do it! It might be worth asking how Apathia did it in case you need a different data type.
Keep me posted!
Thanks, Scott! It worked like a charm. Really appreciate your assistance.
Is it also possible to extract the body of an written article? We need a .CSV file so our translation agency can make the translations into foreign languages.
Please let me know if you (or any of you) have any tips on this!
Thanks in advance,
Sorry for the delay on a response. It looks like it is possible to get the body of the article outputted as a CSV. See the 'body' key of this page https://developer.zendesk.com/rest_api/docs/help_center/articles/.
Do you want just the body of the information? Or all of the information in addition (i.e. article id, title, etc.)?
Either way, I think it is possible. If you are familiar with Python, it should be pretty straight forward to edit this script: https://github.com/scotthavard92/Zendesk-Help-Center-Export-Scripts/blob/master/Article_Information_Export/Zendesk_Guide_Article_Export.py.
You will essentially want to do what Mary (above in the comments) did for 'draft', but do it for 'body' instead. something like:
for article in data['articles']: body = article['body'] decode_10 = unicodedata.normalize('NFKD', body) output_10.append(decode_10)
You will also need to add the info to the write rows section at the bottom and the header appending at the top.
I am happy to help you figure it out! let me know.
Hi Scott great work here.. I was wondering if there is a possibility to tweak the script so that it returns Category ID or title of Category for an article as well? (like you did with Section Id)
Thanks for your reply. I have a meeting with our Lead Developer next week to see what works for us.
I will let you know the outcome of that meeting.
Hello Scott, how are you? I'm trying to run the script anyway, but it returns the same error reported by Mary Paez, "Enter your Zendesk Subdomain (not full URL, but something such as your company name): topisuporte
Traceback (most recent call last):
File "Zendesk_Guide_Article_Export.py", line 8, in <module>
subdomain = input("Enter your Zendesk Subdomain (not full URL, but something such as your company name): ")
File "<string>", line 1, in <module>
NameError: name 'topisuporte' is not defined" Could it be my language? en I tried to change the line but without success, If I get the line of code: url = 'https: //' + subdomain + '. Zendesk.com/api/v2/help_center/en/articles.json?sort_by=title&sort_order=asc' To change to my subdomain and use in the browser, the data is returned successfully. Would you have any tips to give? Thank you very much,
Hello Scott, I made the settings using python 3.7 and managed to go from the step I mentioned in the previous comment, but at the end of the export, I returned the error: Number of articles: 333 Traceback (most recent call last): File "Zendesk_Guide_Article_Export.py", line 93, in <module> writer.writerows ([output_2]) File "C: \ Python 3.7.2 \ lib \ encodings \ cp1252.py", line 19, in encode return codecs.charmap_encode (input, self.errors, encoding_table)  UnicodeEncodeError: 'charmap' codec can not encode character '\ u0327' in position 31: character maps to <undefined> Do you know what it can be and how to solve it?
Thanks for reaching out. So I looked up the error and it looks like a cedilla(ç) is breaking it. Out of curiosity, what language is your data in?
Maybe try replacing the 'NFKC' in lines 47, 51 and 75 with either 'NFD' or 'NFC' and see if that solves the problem? This upcoming week I can dive in more and try to fix it. This is my first guess though, so I would give the above a shot if you are in a hurry.
Here is documentation explaining the character decoding that my script does:
You can see that there is some commentary on how it tries to interpret characters, including the cedilla.
Let me know if this works!!! Good luck, and keep me up to date!
Unfortunately Section ID doesn't easily export from the Zendesk API endpoint that I use :(.
I can do a little digging to see if I can't find another solution.
Thank you for your reply.. I tried tweaking your code but the Zendesk Api does not provide this key for articles. So right now i'm just using two different python scripts to get me that. It's basically your script but for sections. It would be great to somehow run both commands in one script and get one file with all the informations.
Thank you for your reply!
Scott, I have no words to thank for your help, The export went smoothly, As for your question, my language is en Sorry for my ignorance, I do not understand programming, Can you tell me the API of the creation date field? I saw that in the export it returns me the update, we have some goals for the agents based on article creation. Another question I have is about the order that is set up in excel, In order to be able to measure and adjust the metrics, I'm having to use a transpose in excel to invert the rows by columns, could I already handle this in the script?
I also need to export the article body, but when I added the code to do so, I ran into the same problem that Diogenez experienced (my help center articles are in English):
File "Zendesk_Guide_Article_Export.py", line 109, in <module>
File "C:\Users\karen.d.snyder\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 19, in encode
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 37400: character maps to <undefined>
I tried replacing ‘NFKD’ with ‘NFKC’, 'NFD', and 'NFC', but the same error occurred. I am not a Python programmer, so I am not familiar with Unicode conversion. The article at https://support.zendesk.com/hc/en-us/community/posts/360004174147-Exporting-help-center-data-to-CSV provides a manual method of exporting articles to CSV by using the API, saving the result as a JSON file, and then using the website https://json-csv.com to convert the JSON to CSV. I did perform this procedure successfully, but it is manual. There was no mention of any Unicode conversion, but perhaps it is handled by the website?
Hi, Karen, how are you? I used Scott's tip and it worked perfectly, If you want to test your script: "Maybe try replacing the 'NFKC' in lines 47, 51 and 75 with either 'NFD' or 'NFC' and see if that solves the problem"
Hi Diogenes, I tried with 'NFKC', 'NFD', 'NFC', and 'NFKD', and each time got the same error. I guess I have some articles with the '\u200b' character, and maybe none of your articles have it. Then I had the idea of just stripping that character out. It is a zero-width space, so it is not needed in the final output (https://www.fileformat.info/info/unicode/char/200B/index.htm). I modified the code that gets the article body to replace the character with an empty string:
for article in data['articles']:
body = article['body']
decode_10 = unicodedata.normalize('NFKD', body)
decode_10_nozwsp = decode_10.replace(u'\u200b', '')
After modifying the code to remove the '\u200b', the script worked and I was able to export the data with the article body!
I did notice that my exported CSV file has an empty row after each row, and the numbers such as article IDs appear as 3.60023E+11 instead of 360022658132. I wonder if this is because I'm working on a PC, and in the video he is working on a Mac? These are of course minor issues that can be fixed manually.
Would it be possible to use these scripts to output Community posts?
Karen D Snyder, that great that worked, in my file is also unconfigured, are you using Excel? just format the column to number without decimal places that will keep the correct code
@Jeff Callahan, getting posts would be a different API endpoint, and posts don't have all the same properties that articles do. I think it would not be a difficult modification of the script, to change the API endpoint and change some of the properties being exported. Information on the post API and properties is at https://developer.zendesk.com/rest_api/docs/help_center/posts/. I guess you would need to change this:
url = 'https://' + subdomain +'.zendesk.com/api/v2/help_center/en-us/articles.json?sort_by=title&sort_order=asc'
to something like this:
url = 'https://' + subdomain +'.zendesk.com/api/v2/help_center/community/posts.json'
(adding sorting if you didn't want the default order of sorted by created_at)
And you would need to remove the code that gets article properties that don't exist in posts (section_id, draft), then if you want the post details, add code to get the details property, and you'd probably want to change some of the column names.
@Diogenez, yes, I'm using Excel, so I formatted the columns to numbers with no decimal places. I just thought it was interesting that in the YouTube video that showed this export being done, his resulting CSV file didn't need to have the formatting changed, and didn't have empty rows.
Hi Scott...Thanks for publishing this. I was able to successfully run the python script and pull articles from our instance. However, it only found 106 articles and I have 372 published articles. I am an admin within the ZD environment and I verified that it was pulling articles that I wasn't an author of, but not sure what I am missing to pull the other articles. I appreciate your assistance!
Hopefully, @... will be able to come back and respond to your request. In the meantime, this article on Creating and Editing Articles might have some helpful information.
The only solution for a list of IDs is Scott's script. I had to send this post and Scott's links to someone else in my org, because I don't have API access. But I had to get all the Section IDs (which actually would work, with a "Remove Duplicates" in the output file, which is now on Google Sheets) to integrate Pendo with Zendesk on our product.
Pendo is becoming really big. To allow for easier integration, Zendesk really should offer an export of article and section data, natively. For your Feature Request list.
Sorry for the delayed response.
I am assuming aren't successfully exporting your "internal" articles, or those not viewable to everyone, when you run the script. Is that true?
If you haven't tried already, give the Oauth script a shot:
When I run the script not using Oauth it doesn't return any of our "internal" articles.
When I do use the Oauth script it dumps everything other than archived articles.
Let us know if that helps solve it!
Please sign in to leave a comment.