Great question. Yes—AI can auto-categorize and tag support tickets for small teams, reliably, without a big IT lift. Here’s how to do it so the results are measurable and the rollout is low-risk.
The problem: Support inboxes mix billing, bugs, how-to questions, and urgent outages. Humans triage inconsistently, reporting gets noisy, and time-to-first-response drifts.
Why it matters: Clean, consistent tags power faster routing, accurate dashboards, and smarter staffing. For small teams, shaving 2–5 minutes of triage per ticket is material.
What I’ve seen work: Keep the category list short (6–10), use a two-pass approach (rules then AI), set confidence thresholds, and let AI tag 70–85% of tickets with high precision while humans review the rest.
- Do: Cap top-level categories at 10. Add tags for nuance.
- Do: Define each category in one sentence plus 2–3 examples.
- Do: Use a confidence threshold (e.g., 0.70) and auto-route only when above it.
- Do: Add a few keyword “guardrails” (e.g., refund, outage) before AI classification.
- Do: Review 50 tickets weekly to refine the taxonomy.
- Don’t: Start with 30+ categories. You’ll tank accuracy and trust.
- Don’t: Let AI guess when uncertain; send those tickets to “General triage” instead.
- Don’t: Mix bug reports and feature requests under one bucket.
What you’ll need:
- A helpdesk or inbox (e.g., Zendesk, Help Scout, Intercom, Freshdesk, Front, or Gmail).
- An automation layer (native triggers/webhooks or a connector like Zapier/Make).
- An LLM endpoint (e.g., GPT-4 class model). Use subject + first ~500 characters of body.
- 200–500 recent tickets exported for testing.
- A draft taxonomy (6–10 categories, 10–25 tags).
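The draft taxonomy is easier to keep honest if you store it as data rather than prose. A minimal sketch in Python, using categories from the list below; the exact structure and field names are illustrative, not a requirement of any particular helpdesk:

```python
# Draft taxonomy as data: one-sentence definition plus examples per category.
# Only a few categories shown; extend to your full 6-10.
TAXONOMY = {
    "Billing": {
        "definition": "Charges, refunds, invoices, or subscription changes.",
        "examples": ["I was billed twice for May", "Please send last month's invoice"],
    },
    "Login/Access": {
        "definition": "Problems signing in, password resets, or account lockouts.",
        "examples": ["I can't log in", "Reset my password"],
    },
    "Urgent/Outage": {
        "definition": "Service-wide failures needing immediate attention.",
        "examples": ["The site is down for everyone"],
    },
    "General": {
        "definition": "Fallback when no other category fits confidently.",
        "examples": [],
    },
}

def validate_taxonomy(taxonomy: dict) -> None:
    """Enforce the rules above: capped list, every category defined."""
    assert len(taxonomy) <= 10, "Cap top-level categories at 10"
    for name, spec in taxonomy.items():
        assert spec["definition"], f"{name} needs a one-sentence definition"

validate_taxonomy(TAXONOMY)
```

Keeping it as data means the same dict can feed both your shared doc and the classifier prompt, so the two never drift apart.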
Step-by-step:
- Define taxonomy: 6–10 categories such as Billing, Login/Access, Bug Report, Feature Request, Shipping/Delivery, How-To/Usage, Account Changes, Urgent/Outage.
- Write category rules: One-sentence definition + 2 examples per category. Keep a shared doc.
- Collect samples: 25 tickets per category. Note the correct category and tags.
- Set rules first: If subject/body contains strong keywords (e.g., “refund,” “cancel,” “can’t log in”), apply those tags immediately and skip AI.
- Build the classifier prompt (below). Require JSON, include your categories, and ask for a confidence score and reason.
- Offline test: Run 100–200 historical tickets through the prompt. Target ≥85% precision on auto-routed tickets at confidence ≥0.70.
- Wire automation: On new ticket created → apply keyword guardrails → call AI → if confidence ≥0.70, set category/tags and route; else set “General triage.”
- Human-in-the-loop: Add an “AI-suggested” note so agents can accept/edit. Log edits to improve prompts.
- Iterate monthly: Merge low-volume categories; promote common tags.
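Wired together, the two-pass flow above fits in a few lines. A sketch only: `classify_with_llm` stands in for your real LLM call (here it is stubbed so the routing logic runs on its own), and the guardrail keywords are examples from the steps above:

```python
# Pass 1 keywords checked before any AI call ("Set rules first" step).
GUARDRAILS = {
    "refund": ("Billing", ["refund"]),
    "can't log in": ("Login/Access", ["password reset"]),
    "outage": ("Urgent/Outage", ["outage"]),
}

CONFIDENCE_THRESHOLD = 0.70  # auto-route only at or above this

def classify_with_llm(subject: str, body: str) -> dict:
    """Placeholder for the real LLM call using the prompt below.
    Must return the JSON object the prompt specifies."""
    # Stubbed low-confidence response for illustration only.
    return {"category": "General", "tags": [], "urgency": "normal",
            "confidence": 0.40, "reason": "stub"}

def triage(subject: str, body: str) -> dict:
    text = f"{subject} {body}".lower()
    # Pass 1: keyword guardrails skip the AI entirely.
    for keyword, (category, tags) in GUARDRAILS.items():
        if keyword in text:
            return {"category": category, "tags": tags, "source": "rules"}
    # Pass 2: AI classification, gated by confidence.
    result = classify_with_llm(subject, body[:500])
    if result.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD:
        return {"category": result["category"], "tags": result["tags"],
                "source": "ai"}
    # Below threshold: never let the AI guess.
    return {"category": "General", "tags": [], "source": "fallback"}
```

For example, `triage("Refund for double charge", "Billed twice in May")` routes to Billing via the rules pass without spending an AI call, while an ambiguous ticket with the stub's 0.40 confidence lands in General.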
Copy-paste prompt (robust baseline):
“You are a support ticket classifier for a small business. Categorize and tag the ticket strictly using the allowed values. Output valid JSON only, no prose.
Allowed categories: [Billing, Login/Access, Bug Report, Feature Request, Shipping/Delivery, How-To/Usage, Account Changes, Urgent/Outage, General].
Allowed tags (examples, use zero or more): [refund, invoice, subscription, password reset, account lockout, two-factor, crash, error-500, slow-performance, integration, shipping-delay, tracking, return, exchange, workflow, onboarding, downgrade, upgrade, outage].
Rules: If not confident, choose General. Prefer specific categories over General. Consider both subject and body. Return a confidence 0.00–1.00 and a 1–2 sentence reason.
Respond with JSON: {category: string, tags: string[], urgency: one of [low, normal, high], confidence: number, reason: string}
Ticket subject: [paste subject]
Ticket body: [paste first 500 characters of body]”
Worked example:
- Input: Subject: “Refund for double charge.” Body: “I was billed twice for May. Please reverse one charge. Order #48392.”
- Expected JSON: {"category":"Billing","tags":["refund","invoice"],"urgency":"normal","confidence":0.86,"reason":"Billing dispute with explicit refund request"}
- Automation: Apply tags, route to Billing queue, attach macro with refund steps.
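Before your automation trusts a response like that, validate it; models occasionally return malformed JSON or out-of-vocabulary categories. A sketch, with the allowed lists taken from the prompt above and the helper name my own:

```python
import json

ALLOWED_CATEGORIES = {
    "Billing", "Login/Access", "Bug Report", "Feature Request",
    "Shipping/Delivery", "How-To/Usage", "Account Changes",
    "Urgent/Outage", "General",
}
ALLOWED_URGENCY = {"low", "normal", "high"}

def parse_classification(raw: str) -> dict:
    """Parse and sanity-check the model's JSON; fall back to General
    on anything malformed or out-of-vocabulary."""
    fallback = {"category": "General", "tags": [], "urgency": "normal",
                "confidence": 0.0, "reason": "invalid model output"}
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return fallback
    if data.get("category") not in ALLOWED_CATEGORIES:
        return fallback
    if data.get("urgency") not in ALLOWED_URGENCY:
        data["urgency"] = "normal"
    # Clamp confidence into [0, 1] so the routing threshold behaves.
    data["confidence"] = min(max(float(data.get("confidence", 0.0)), 0.0), 1.0)
    return data

raw = ('{"category":"Billing","tags":["refund","invoice"],"urgency":"normal",'
       '"confidence":0.86,"reason":"Billing dispute with explicit refund request"}')
result = parse_classification(raw)
```

A broken response falls through to General, which keeps the "never let AI guess" rule intact even when the model misbehaves.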
Metrics to track:
- Auto-triage rate: % of tickets auto-tagged and routed (target 60–80%).
- Precision on auto-routed: % correct among auto-routed (target ≥85%).
- Manual correction rate: % of AI tags edited by agents (target ≤15%).
- Time-to-first-response: Aim for 15–30% faster within 30 days.
- SLA breach rate: Especially for Urgent/Outage (target: 30% reduction).
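If you log agents' accept/edit actions (the human-in-the-loop step), the first three metrics fall out of a simple count. A sketch over hypothetical log records, where `agent_edited` means an agent changed the AI's category or tags:

```python
def triage_metrics(tickets: list) -> dict:
    """Each record: {"auto_routed": bool, "agent_edited": bool}."""
    total = len(tickets)
    auto = [t for t in tickets if t["auto_routed"]]
    correct = [t for t in auto if not t["agent_edited"]]
    edited = [t for t in tickets if t["agent_edited"]]
    return {
        "auto_triage_rate": len(auto) / total,            # target 60-80%
        "precision_on_auto_routed":                        # target >= 85%
            len(correct) / len(auto) if auto else 0.0,
        "manual_correction_rate": len(edited) / total,     # target <= 15%
    }

# Example week: 100 tickets, 75 auto-routed, 5 of those corrected.
log = (
    [{"auto_routed": True, "agent_edited": False}] * 70
    + [{"auto_routed": True, "agent_edited": True}] * 5
    + [{"auto_routed": False, "agent_edited": True}] * 10
    + [{"auto_routed": False, "agent_edited": False}] * 15
)
m = triage_metrics(log)
```

Run this weekly against your helpdesk export and the trend line tells you whether prompt or taxonomy changes are actually helping.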
Common mistakes and quick fixes:
- Too many categories: Merge into 6–10; move nuance to tags.
- Letting AI guess: Enforce confidence threshold and General fallback.
- No keyword guardrails: Add a short dictionary for refunds, outages, password resets.
- Unlabeled test data: Label 200 tickets first; otherwise you can’t measure precision.
- Ignoring multilingual tickets: Detect the language; translate to English for classification; store the original text.
1-week action plan:
- Day 1: Draft taxonomy (8 categories, 20 tags). Write category definitions + examples.
- Day 2: Export 300 tickets. Manually label 150 for ground truth.
- Day 3: Implement keyword guardrails (refund, reset, outage, shipping).
- Day 4: Plug in the prompt above. Test on 150 labeled tickets. Tune wording to lift precision.
- Day 5: Go live with confidence ≥0.70. Auto-route Billing, Login/Access, and Bug Report; send the rest to General.
- Day 6: Review 50 live tickets. Adjust tags and guardrails.
- Day 7: Baseline metrics. Set weekly targets for auto-triage rate and precision.
Expectation: Within 2 weeks, you should see ~60–75% of tickets auto-tagged and routed with ≥85% precision, and a noticeable drop in time-to-first-response.
Your move.
—Aaron
