All public-facing product documentation at Zendesk is published in branded help centers. Though most other teams at Zendesk create content directly in the help center, the Docs team creates and maintains the product documentation offline in DITA source files. DITA, which stands for Darwin Information Typing Architecture, is an XML-based data model for authoring and publishing content.
This article covers the following topics:

- Why DITA?
- How we publish to the help center
- How we manage files
- How we publish localized articles

Why DITA?
DITA is an industry-standard for creating and maintaining large documentation sets. Jacquie Samuels on techwhirl.com describes the problem it tries to solve as follows:
Writing content in Word, email, PowerPoint, WordPress, HTML, InDesign, FrameMaker, or any other format is equivalent to writing on stone tablets. Your content is essentially stuck in that format and dumb as a rock. Dumb content can't be easily reused or repurposed, and that's inefficient and costly.

DITA is a way of writing and storing your content so you can manage it like an asset. It leverages XML (eXtensible Markup Language) to make your content intelligent, versatile, manageable, and portable.
For example, content that is in DITA can be published to (and fully branded for) PDF, HTML, RTF, PowerPoint, and mobile, all without ever copying and pasting anything between files.
(Source: What Is DITA? on TechWhirl)
Apart from separating content from format, the other benefits of DITA for the Zendesk Docs team are as follows:
- Forces us to be disciplined about content structure. A DITA file is XML. If the structure is invalid, the tool won't let us do anything with it.
- Allows us to move content around easily. We just drag a topic node from one place to another in the structure.
- Allows us to reuse content by importing chunks of content in multiple articles.
- We don't often publish PDFs but when we do, we use the DITA source files.
The DITA authoring tool we use is Oxygen XML Author. In addition to its robust authoring environment, we rely on a host of other features, including file search, file diff, change tracking, and HTML transformations. Other DITA authoring tools include FrameMaker, Arbortext, and XMetaL, to mention a few.
How we publish to the help center
We use Author to create or update content in DITA source files. When we're ready to publish (usually at the same time a product feature is released or updated), we transform the DITA to HTML, then manually paste the source HTML into the article code editor in Guide. What the process lacks in elegance it makes up in simplicity.
Occasionally, we need to update many articles in a short amount of time. For example, when Zendesk rebranded the Outbound product to Connect, all the Outbound docs had to be updated at 7 a.m. Pacific time on a specific date. In the days leading up to the change, the writer updated the DITA source files. We then did a batch transformation of all the DITA files and pushed them to the help center using the Zendesk API. Publishing them took only a few minutes.
The batch-publishing tool we used is open source on GitHub for anybody to use. See https://github.com/chucknado/zpu. For instructions on how to use it, see the readme at https://github.com/chucknado/zpu/blob/master/README.md.
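The push step uses the Help Center Update Translation endpoint. Here's a minimal sketch in Python, assuming the `requests` package; the function names are ours, not taken from the zpu tool:

```python
def translation_url(subdomain, article_id, locale="en-us"):
    """Build the Help Center Update Translation endpoint URL."""
    return (f"https://{subdomain}.zendesk.com/api/v2/help_center"
            f"/articles/{article_id}/translations/{locale}.json")

def push_article_body(subdomain, article_id, html, auth):
    """PUT transformed HTML into an existing article's English translation."""
    import requests  # third-party; pip install requests
    resp = requests.put(translation_url(subdomain, article_id),
                        json={"translation": {"body": html}}, auth=auth)
    resp.raise_for_status()
    return resp.json()["translation"]
```

A batch run would simply loop over the transformed HTML files and call `push_article_body` once per article.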
How we manage files
We store the DITA files in a Google Team Drive, which syncs the files automatically to each writer's computer. The writers always have the latest versions at their fingertips. When one writer saves changes to an article, the changes are immediately propagated to the other writers' computers through Team Drive sync. Team Drive also retains previous versions of changed files for the last 30 days. Again, the goal is to keep the process simple.
We also store images in the Google Team Drive but upload them to an Amazon S3 file server to make them available publicly. All the images in our articles are downloaded to your browser from S3, not from the help center. The Amazon S3 service makes managing the images simpler.
How we publish localized articles
The default language of our help centers is English. We also publish the product docs in German, Spanish, French, Japanese, Korean, and Brazilian Portuguese.
When a localization handoff is due, we use the help center API to download selected English articles from the help centers and write them to HTML files. We use the Amazon API to download article images from our Amazon S3 bucket. We package the files and hand them off to our localization vendor. After the vendor returns the translations, we upload the articles and images with the help center and Amazon APIs.
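The step of writing downloaded articles to HTML files for the handoff can be sketched like this. The field names follow the Articles API; the wrapper document format is an assumption, not the actual ZLO output:

```python
from pathlib import Path

def article_to_html(article):
    """Wrap an article's title and body (as returned by the Articles API)
    in a minimal HTML document for the localization handoff."""
    return (f"<html>\n<head><title>{article['title']}</title></head>\n"
            f"<body>\n{article['body']}\n</body>\n</html>\n")

def save_for_handoff(article, out_dir):
    """Write one article to <article id>.html in the handoff folder."""
    path = Path(out_dir) / f"{article['id']}.html"
    path.write_text(article_to_html(article), encoding="utf-8")
    return path
```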
The API client we use to manage handoff files is called ZLO (Zendesk localization tools), which was created internally by the Docs team. The ZLO client is open source and available on Github at https://github.com/chucknado/zlo. You can read more about it in the documentation on Github.
Comments
Hi Charles,
Thanks for all that information. I have successfully completed the following:
- Published my DITA topics to HTML in Oxygen Author
- Batch created Zendesk Guide articles from these using the files you shared on GitHub (Batch-publish transformed DITA files to Help Center)
- Batch updated the same files
However, I am still at a loss as to the best approach for managing images. My DITA source is managed in SVN, and the image paths are relative to this. I see you suggest using tools like Amazon S3 and Cyberduck for managing Zendesk Guide images, but am unsure of the process. Are these images protected behind authentication, or are they public?
At this point I can't even successfully call the Create Attachment API as I get a 500 response code for no apparent reason.
Can you advise?
Thanks in advance,
Ann
Hi Ann,
We don't use any image attachments in our articles.
We upload the images separately to a folder on Amazon S3 using the Cyberduck FTP client. Amazon S3 acts as a web server, so all the images in our S3 folder are automatically published to the web.
All the files on S3 get their own URLs. Our URLs all share the same path:
https://zen-marketing-documentation.s3.amazonaws.com/docs/en/image1.png
https://zen-marketing-documentation.s3.amazonaws.com/docs/en/image2.png
...
We use these URLs directly in the DITA source files for the articles. Example:
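For example, an image reference in a topic is a standard DITA `image` element pointing at the S3 URL (the file name here is illustrative):

```xml
<image href="https://zen-marketing-documentation.s3.amazonaws.com/docs/en/image1.png"/>
```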
Oxygen Author displays the images in the article in the WYSIWYG view.
From there, it's just a matter of transforming the DITA to HTML and publishing the HTML.
We recently started automating how we manage images using the S3 API. We use a Python library called Boto 3, which gives you access to the S3 API. For example, uploading an image file with Boto looks like this:
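A minimal sketch of such an upload, assuming the bucket and key prefix visible in the URLs above; the helper names are ours, not from the original script:

```python
BUCKET = "zen-marketing-documentation"  # bucket name taken from the URLs above
PREFIX = "docs/en/"

def s3_url(filename):
    """Public URL the image will have after upload."""
    return f"https://{BUCKET}.s3.amazonaws.com/{PREFIX}{filename}"

def upload_image(local_path, filename):
    """Upload a local image and make it publicly readable."""
    import boto3  # deferred so the URL helper works without the AWS SDK installed
    s3 = boto3.client("s3")
    s3.upload_file(local_path, BUCKET, PREFIX + filename,
                   ExtraArgs={"ACL": "public-read"})
    return s3_url(filename)
```

The `ACL: public-read` extra argument is what makes each uploaded object publicly downloadable.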
For more info, see the following resources:
Thanks Charles,
I have registered with Amazon AWS and am using a trial S3 service account. How do you manage authorization to your images in S3?
Mine are not showing in Oxygen because of lack of authorization (403 error).
Thanks again,
Ann
We use a preference in Cyberduck to automatically set the correct permissions when we upload images:
You can also change the permissions of the files already on S3:
Thanks.
Interesting!
Doesn't the Zendesk team use the Guide editor internally?
Hi Toru,
Most teams at Zendesk (Advocacy, HR, IT, Facilities, etc) have internal Help Centers and use the Guide editor to write content. As mentioned in the article above, the DITA process the Docs team uses for public-facing docs has a steep learning curve and isn't for everybody. Professional tech writers are generally familiar with it.
Thanks.
Thank you!
I hope Zendesk Guide focuses on the Docs team's use case, since public documentation can be maintained by the Docs team. Zendesk Guide doesn't yet do enough to help internal teams.
I like Zendesk. I hope Zendesk Guide will help you and us write and maintain articles without any alternative solutions.
This is a fascinating and extremely helpful article. Our company's documentation is DITA-based, and we are now looking to move away from PDF documentation. I would like to know: which DITA repository system do you use? We currently use DITAToo.
My company is looking at migrating a bunch of DITA-based HTML to a Zendesk Guide. I read this article and was able to use the batch-publish tool successfully for a few topics that I had already created in Guide. However, we'd like to be able to use the tool to create (as opposed to update) articles in bulk from HTML. The readme.md file mentions mods that need to happen to the tool to create articles. Can you elaborate on what mods need to be done?
Thanks!
Hi William,
You have to use the Create Article endpoint and specify more properties for each article, such as the article's initial author and section, as well as its access permissions. Here's a quick script that'll upload a collection of HTML files to HC:
https://gist.github.com/chucknado/40ff923e6a2eef6ad529bf32cd12640f
Hi, Chuck.
Thanks for passing along the script. Finally getting around to playing with it.
I'm getting a 404 error when I try to run it. I know it's parsing the files correctly. I believe the API token is good. I know it could be many different things, but if you could point me in a likely direction for this 404 error, I'd be grateful.
Cheers,
Will
Hi William-
The 404 (Not Found) error indicates that a resource doesn't exist: it most likely hasn't been created yet, or is being referred to incorrectly. If you have a more explicit example of the error, it might be possible to assist, but there are numerous possibilities.
I found this article after finding one of the members on your doc team on LinkedIn and asking how this section of your support site was published: https://support.zendesk.com/hc/en-us/categories/200201796-Zendesk-updates
I want to confirm: Is this the process you use for publishing items in the Announcements, Release notes, and Service notifications sections? If so, I'd like to understand the pre-DITA process better. Specifically, and without revealing anything proprietary, how do you gather, organize, and format these articles? (I see each section seems to have its own content/topic patterns.) And how much of the process is automated? Was that automation easy, or did it take a significant amount of developer time?
Thanks for any guidance you can provide.
Chuck
Hi Chuck,
Ephemeral content like release notes and announcements doesn't follow the same process. This content is produced by program or product managers directly, not the Docs team. It's all a manual process. The Docs team does an editorial review of the announcements in HC before they go public.
Charles
For images (and other cross references), you might want to consider using @keyref instead of @href, then manage the paths to the images in a single key definition file. That way, when the path changes from your local system to some server path, you only have to update the keydef file instead of updating the paths on each of your topics.
Scott
Thanks, Scott. For those who want to learn more, see Indirect key-based addressing on www.oxygenxml.com.
In our case, we use production URLs in all DITA topics. If a linked article doesn't exist on HC yet, we create a placeholder article in Help Center to get its URL. For images, we know what an image URL on S3 will be if we know the image file name. Writers can also upload their images to S3 while an article is still in development to view the images in the article in Oxygen Author.
This model is not for everyone. Most of the members of the Docs team at Zendesk came from large software companies with more traditional production models. But in a SaaS industry like ours, the lines between contexts are getting blurred.
Hi Charles Nadeau
I'm trying out your create articles script and running into the same error as William Kellett, above.
- Failed to create article with error 404: {"error":"RecordNotFound","description":"Not found"}
The article does not yet exist in our HC, and I've double-checked that all of the variables are correct. Everything seems right (checked by printing values to the screen) right up until the attempt to post the article in HC.
Any suggestions?
Hi Shay,
I assume you're referring to the gist at https://gist.github.com/chucknado/40ff923e6a2eef6ad529bf32cd12640f?
The error could mean the API didn't find any of the following records in your account: author, section, user_segment, or permission_group.
Double-check to make sure the values of the following variables in the script are all valid (I'm using '12345' as a placeholder):
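The variable block in the gist presumably looks something like this (the names are inferred from the record types listed above):

```python
# Placeholder values; replace each with a real id from your account.
author_id = 12345            # an existing user in your account
section_id = 12345           # the destination section
user_segment_id = 12345      # who can view the articles
permission_group_id = 12345  # who can manage the articles
```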
For example, you must specify the user id of an existing user in your account as the author_id. The script won't create the user.
Hi Charles,
Thank you for your reply. Yes, I meant the gist.
Yesterday I revalidated all the variables, and discovered that I had the wrong permission_group_id. Once I changed that, the script worked perfectly.
Thanks again!
Hello, Charles!
Could you please show proper examples of the handoff.json and localized_content.json files? I am referring to your ZLO project on GitHub.
I've found only the example of _custom_loader.json in the documentation.
Thank you in advance.
Hi Maria,
I pushed example files to the docs folder in the repo. See example-handoffs.json and example-localized_content.json.
Thanks.
Charles
Thank you, Charles. It really helped me a lot!
Your project on GitHub is a brilliant and powerful tool for any knowledge base localization routine. I am so grateful for sharing it with us.
Hi Charles,
The described workflow requires a lot of manual work, and pasting HTML manually undermines the whole advantage of using DITA.
Using the API looks like a solution here. However, when you're using DITA and you do a batch transform of the articles into HTML, there is no way to automatically produce an association between the HTML file and the section ID where the article should belong. A workaround would be to create a reference file, as you mentioned, in either yml or csv format, then parse it and upload the article. But this approach has a different issue: when you add more articles to the section, there is no way to identify which articles need to be updated and which need to be uploaded.
Therefore, it would be good to have an API smart enough to understand that if there is already an article in the section with the same name, it should be updated; if not, uploaded.
Problem #2: Crosslinking.
Some of the articles have links to other articles in the guide. Directly uploading the HTML files generated by DITA breaks these links.
I would like to hear how Zendesk solved this issue.
Theoretically to ensure that links are not broken it's possible to use API to store new Article IDs, then using a programmatic way to go through all of the articles, parse HTML, update links, and then update articles using the API again.
Thank you!
Hi Valerie, thanks for the thoughtful comments.
Problem #1: Publishing.
> when you add more articles to the section there is no way to identify which articles need to be updated and which need to be uploaded.
You could maintain a registry of articles that includes both the DITA file name and corresponding HC article id of each article. Before publishing, the script could check the registry. If the article is in the registry, then it was published before and the script knows to use the Update Article endpoint using the recorded article id. If it's not in the registry, then the script knows to use the Create Article endpoint instead. Because the endpoint returns the id of the new article, the script can then add the article to the registry so it knows to make an update request next time. The registry could be something as simple as a json or yml file.
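A minimal sketch of that registry check, assuming a JSON file mapping DITA file names to article ids (this is not an actual Docs team script):

```python
import json
from pathlib import Path

REGISTRY = Path("registry.json")  # assumed format: {"file.dita": article_id, ...}

def load_registry():
    """Load the publish registry, or start an empty one."""
    return json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}

def plan_request(registry, dita_name):
    """Decide which endpoint a DITA file needs."""
    if dita_name in registry:
        return ("update", registry[dita_name])  # PUT Update Article request
    return ("create", None)                     # POST Create Article request

def record_article(registry, dita_name, article_id):
    """After a create request, remember the new article's id for next time."""
    registry[dita_name] = article_id
    REGISTRY.write_text(json.dumps(registry, indent=2))
```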
Problem #2: Crosslinking.
> Some of the articles have links to other articles in the guide. Direct upload these HTMLs generated by DITA does break these links.
If I understand the question, you're referring to using DITA file names and resource ids in xrefs to other articles? Something like:
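For instance, an xref by DITA file name and topic ID (the names here are hypothetical):

```xml
<xref href="setting_up.dita#setting_up/enabling_the_feature"/>
```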
Our team uses Help Center URLs in xrefs to other articles, bypassing the problem. We do use DITA resource IDs for anchor links, and these carry over in the transformation without breaking.
As you mention, you could also programmatically go through all the articles, parse HTML, update links, and then update articles using the API again.
We use this method to update links and images in article translations when they come back from the translators. All the URLs in the translations still point to English articles. So we maintain a JSON inventory of all localized articles and images. It was created initially with the Translations API, but now we update it on an ongoing basis as we get translations back. The script uses the inventory to parse the translations and replace the href and src values.
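That replacement pass can be sketched as follows. This is a simplified version; the inventory format (a map from English URLs to localized URLs) is an assumption:

```python
def localize_urls(html, url_map):
    """Replace English article and image URLs in translated HTML with
    their localized equivalents, per the inventory in url_map."""
    for en_url, loc_url in url_map.items():
        html = html.replace(en_url, loc_url)
    return html
```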
Hi Charles!
Thank you for your answers. I still have questions:
> Before publishing, the script could check the registry.
Do you have an example of which fields in DITA you use for these purposes? And, by any chance, a script that checks them and rebuilds the index file?
Thanks!
Hi Valerie, unfortunately we don't have a script for the use case I'm describing. Our team publishes articles manually.
To track which Help Center article a DITA file corresponds to, we use the `props` attribute in the top-level `topic` element of the DITA file to store the article id of the Help Center article. The writers manually add the id to the element when they publish a new article. Example:
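For example, with a made-up topic id and article id:

```xml
<topic id="enabling_the_feature" props="115004117408">
  ...
</topic>
```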
Hope this helps.