Workflow: How to normalize Zendesk ticket data for AI models

Raw Zendesk ticket data can be difficult to digest for AI models. Tickets often include HTML, signatures, quoted email threads, inconsistent fields, and untrusted user input.

Some of the most commonly used cases where Zendesk ticket data can be used with AI models are:

AI-based triage and routing
Agent-facing summaries
Sentiment and topic detection
QA scoring systems
Conversational search
For use with schema-aware GPT assistants

This workflow explains how to normalize (clean, standardize, and structure) Zendesk ticket data so you can use it reliably with AI models.

The workflow includes the steps below:

Step 1: Extract ticket data
Step 2: Clean HTML
Step 3: Remove signatures and disclaimers
Step 4: Mask sensitive information
Step 5: Normalize ticket fields
Step 6: Choose primary text for AI inputs
Step 7: Apply prompt-safe formatting
Step 8: Truncate and validate data

Disclaimer: This article is provided for instructional purposes only. Zendesk does not support or guarantee the code. Post any issues you have in the comments section or try searching for a solution online.

Step 1: Extract ticket data

Retrieve ticket data from the Zendesk Support API.

Create a .env file.

In your project directory, create a file named .env and add your Zendesk account details:

ZENDESK_SUBDOMAIN=yoursubdomain
ZENDESK_EMAIL=you@company.com
ZENDESK_API_TOKEN=your_api_token

Add the ticket fetch function (app.js)

The following code builds the Zendesk API base URL from your subdomain, creates an HTTP Basic Authorization header using your API token, and defines a fetchTicket() function that retrieves a ticket (including comment events):

const ZENDESK_BASE_URL = `https://${process.env.ZENDESK_SUBDOMAIN}.zendesk.com/api/v2`;

const authHeader = Buffer.from(
  `${process.env.ZENDESK_EMAIL}/token:${process.env.ZENDESK_API_TOKEN}`
).toString("base64");

async function fetchTicket(ticketId) {
  const response = await fetch(
    `${ZENDESK_BASE_URL}/tickets/${ticketId}.json?include=comment_events`,
    {
      method: "GET",
      headers: {
        "Authorization": `Basic ${authHeader}`,
        "Content-Type": "application/json"
      }
    }
  );

  if (!response.ok) {
    throw new Error(`Zendesk API error: ${response.status}`);
  }

  return response.json();
}

Next, extract data from the response. At a minimum, you’ll want:

Ticket metadata (ID, subject, priority, group, etc.)
Ticket description
Public and internal comments
Tags
Custom fields

Comments should include html_body and/or plain-text fallback (body, plain_body).

Note: For bulk ticket data exports, use the Incremental Exports API. For this tutorial, we use the Ticketing API.

Step 2: Clean HTML

Zendesk ticket data often contains HTML markup such as <p>, <br>, and links. Most LLM workflows work best with plain text, so in this step you convert HTML to readable text by stripping tags and normalizing formatting.

function htmlToText(html) {
  if (!html) return "";

  const parser = new DOMParser();
  const doc = parser.parseFromString(html, "text/html");

  // Remove scripts and styles
  doc.querySelectorAll("script, style").forEach(el => el.remove());

  // Convert <br> to newlines
  doc.querySelectorAll("br").forEach(br => br.replaceWith("\n"));

  return normalizeWhitespace(doc.body.textContent || "");
}

function normalizeWhitespace(text) {
  return text
    .replace(/\r\n/g, "\n")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}

Step 3: Remove signatures and disclaimers

When you process email ticket data, you often see signatures and disclaimers that add bulk. If you don’t need them, remove them.

function stripSignatures(text) {
  const lines = text.split("\n");

  const signatureMarkers = [
    /^--$/,
    /^sent from my/i,
    /^best regards/i,
    /^thanks[,!]?$/
  ];

  for (let i = 0; i < lines.length; i++) {
    if (signatureMarkers.some(re => re.test(lines[i].trim()))) {
      return normalizeWhitespace(lines.slice(0, i).join("\n"));
    }
  }

  return normalizeWhitespace(text);
}

Step 4: Mask sensitive information

Some tickets include sensitive information. Before you send data to an AI model, remove or mask sensitive data.

function maskPII(text) {
  return text
    // Email addresses
    .replace(
      /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/gi,
      "[REDACTED_EMAIL]"
    )
    // Phone numbers
    .replace(
      /\b(\+?\d{1,3}[\s-]?)?\(?\d{3}\)?[\s-]?\d{3}[\s-]?\d{4}\b/g,
      "[REDACTED_PHONE]"
    )
    // Token-like strings
    .replace(/\b[a-f0-9]{32,}\b/gi, "[REDACTED_TOKEN]");
}

Step 5: Normalize ticket fields

If you use custom ticket fields, define them. This ensures readability and prevents extra API requests each time you process ticket data for your AI model.

const TICKET_SCHEMA = {
  customTicketField1: null,
  customTicketField2: null,
  customTicketField3: null,
  customTicketField4: []
};

Map custom ticket field IDs to the schema.

const CUSTOM_FIELD_MAP = {
  111: "customerTier",
  222: "platformsAffected"
};

function normalizeFields(ticket) {
  const fields = { ...TICKET_SCHEMA };

  for (const field of ticket.custom_fields || []) {
    const key = CUSTOM_FIELD_MAP[field.id];
    if (!key) continue;

    fields[key] = normalizeValue(field.value);
  }

  return fields;
}

function normalizeValue(value) {
  if (Array.isArray(value)) {
    return value.map(v => v.toLowerCase()).sort();
  }

  if (typeof value === "string") {
    const v = value.toLowerCase();
    return v;
  }

  return value;
}

Step 6: Choose primary text for AI inputs

Define which parts of the ticket to extract. In this code sample, we focus on ticket comments.

function getPrimaryText(ticket, comments) {
  const newestPublic = [...comments]
    .reverse()
    .find(c => c.public !== false);

  const raw = newestPublic?.html_body || "";

  let text = htmlToText(raw);
  text = stripSignatures(text);
  text = maskPII(text);

  return text;
}

Step 7: Apply prompt-safe formatting

Use prompt-safe format. This ensures that the model treats user content as data, not instructions.

function safePromptBlock(userText) {
  return `
<<<BEGIN_USER_MESSAGE>>>
${userText}
<<<END_USER_MESSAGE>>>
`.trim();
}

Step 8: Truncate and validate data

In the final step, truncate and validate the normalized data.

function truncate(text, maxChars = 8000) {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + "\n...[TRUNCATED]";
}

Also validate that required fields exist, values match expectations, and text is present.

After you complete these checks, you have AI-friendly, normalized ticket data that’s ready for your prompts.