XSELL Technologies — spaCy · Transformers · AWS SageMaker · MLOps
93% model accuracy · $500K new business · 7TB text corpus
The Problem
XSELL Technologies builds AI systems that coach contact center agents in real time. At the core of that product is the ability to understand customer conversations at scale — classifying intent, extracting entities, summarizing call context, and surfacing actionable signals from a continuous stream of call transcripts and chat logs.
The challenge was achieving scale and quality at the same time. The text corpus spanned roughly 7TB of raw customer interaction data (call transcripts, live chats, and CRM notes) with high linguistic variability across industries, products, and agent communication styles. Before LLMs, no off-the-shelf solution could handle this volume with the accuracy required for real-time agent coaching. Everything had to be built, trained, and productionized from first principles.
The Solution
We built a production NLP platform using spaCy Transformer models fine-tuned on domain-specific conversation data. The pipeline handled the full NLP task suite: intent classification, named entity recognition, sentiment analysis, and abstractive summarization of call segments.
Preprocessing was a significant engineering effort in itself — normalizing transcripts across formats, handling speaker diarization artifacts, cleaning ASR (automatic speech recognition) noise, and building a labeling pipeline to generate training data from the raw corpus. The final models achieved 93% accuracy on held-out test sets and were deployed to AWS SageMaker with a real-time inference endpoint serving the coaching UI.
Alongside the models, we built a full MLOps pipeline on SageMaker — automated retraining triggers, model versioning, A/B evaluation, and a monitoring layer that tracked prediction drift and data quality in production.
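One common way to track prediction drift is to compare the distribution of predicted labels in production against a baseline window. The sketch below uses the Population Stability Index (PSI) for that comparison. The metric choice, the function names, and the intent labels are assumptions for illustration; the monitoring layer described above may have used different statistics.

```python
import math
from collections import Counter


def label_distribution(labels, classes, eps=1e-6):
    """Share of each class in `labels`, floored at `eps` to avoid log(0)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {c: max(counts.get(c, 0) / total, eps) for c in classes}


def population_stability_index(baseline, live, classes):
    """PSI = sum over classes of (live - baseline) * ln(live / baseline)."""
    p = label_distribution(baseline, classes)
    q = label_distribution(live, classes)
    return sum((q[c] - p[c]) * math.log(q[c] / p[c]) for c in classes)


# Hypothetical intent labels, used only to exercise the function.
classes = ["billing", "cancel", "upgrade"]
baseline = ["billing"] * 50 + ["cancel"] * 30 + ["upgrade"] * 20
live = ["billing"] * 20 + ["cancel"] * 60 + ["upgrade"] * 20

psi = population_stability_index(baseline, live, classes)
# A widely used rule of thumb (not taken from this project) treats
# PSI above 0.25 as a shift large enough to investigate.
needs_review = psi > 0.25
```

A check like this could feed an automated retraining trigger, since it needs only predicted labels and no ground truth.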
Key Engineering Challenges
Raw call transcripts from ASR systems carry significant noise — misrecognized words, speaker attribution errors, inconsistent formatting across vendors, and filler content that degrades model quality. Building a robust preprocessing pipeline that could normalize this corpus at scale, while preserving semantically meaningful signal, required extensive experimentation with cleaning heuristics and quality filters before any model training could begin.
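As a minimal sketch of the kind of cleaning heuristics described above, the snippet below normalizes speaker labels, strips filler words, collapses whitespace, and drops turns that are too short to carry signal. The regular expressions, the role mapping, and the `min_tokens` threshold are illustrative assumptions, not the rules used in the actual pipeline.

```python
import re

# Illustrative patterns only; real vendor formats vary widely.
FILLERS = re.compile(r"\b(?:um+|uh+|erm+|you know)\b[,.]?\s*", re.IGNORECASE)
SPEAKER = re.compile(
    r"^\s*(agent|rep|customer|caller|speaker\s*\d+)\s*[:\-]\s*", re.IGNORECASE
)
ROLE_MAP = {"agent": "AGENT", "rep": "AGENT",
            "customer": "CUSTOMER", "caller": "CUSTOMER"}


def clean_turn(line):
    """Return (role, text) with a normalized role and fillers removed."""
    match = SPEAKER.match(line)
    if match:
        role = ROLE_MAP.get(match.group(1).lower(), "UNKNOWN")
        text = line[match.end():]
    else:
        role, text = "UNKNOWN", line
    text = FILLERS.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    return role, text


def clean_transcript(raw, min_tokens=3):
    """Clean each non-empty line and keep turns with at least `min_tokens` words."""
    turns = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        role, text = clean_turn(line)
        if len(text.split()) >= min_tokens:
            turns.append((role, text))
    return turns


raw = (
    "Agent: Um, thanks for calling,   how can I help?\n"
    "customer - uh I want to cancel my plan\n"
    "Caller: ok\n"
)
turns = clean_transcript(raw)
```

The length filter is deliberately crude. It stands in for the quality filters mentioned above, which would need tuning against labeled samples so that short but meaningful turns are not lost.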
General-purpose BERT variants underperformed on contact center language, which has high domain specificity — product names, agent scripts, compliance language. We implemented a two-stage fine-tuning approach: continued pretraining on the unlabeled domain corpus before task-specific fine-tuning. This significantly improved accuracy without requiring proportionally more labeled data.
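The first stage, continued pretraining, needs no labels because the training targets come from the text itself. The source does not name the pretraining objective; the sketch below assumes masked language modeling, the standard objective for BERT-style models, and shows how unlabeled tokens become (input, label) pairs using BERT's 80/10/10 masking scheme. The function name, vocabulary, and sample sentence are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Build masked-LM inputs and labels from a list of tokens.

    Each token is selected with probability `mask_prob`. A selected token
    becomes [MASK] 80% of the time, a random vocabulary token 10% of the
    time, and stays unchanged 10% of the time. Labels hold the original
    token at selected positions and None elsewhere.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)  # position is ignored by the loss
    return inputs, labels


# Hypothetical domain sentence and vocabulary.
tokens = "i would like to cancel my premium protection plan".split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, seed=0)
```

After this stage, the second stage fine-tunes the same weights on the smaller labeled set for each task, which is why the labeled-data requirement does not grow in proportion to the corpus.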
The coaching product required model predictions fast enough to be useful mid-call, not only in post-call analysis. Serving Transformer-based models at acceptable latency under concurrent load on SageMaker required model quantization, batching optimization, and endpoint auto-scaling configuration to hold p95 latency within SLA during peak traffic.
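To make the p95 target concrete, the sketch below computes a nearest-rank percentile over request latencies and checks it against a threshold. The helper names, the sample latencies, and the 250 ms threshold are assumptions for illustration; the actual SLA values are not stated in the source.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sla(latencies_ms, sla_ms, pct=95):
    """True when the pct-th percentile latency is within the SLA threshold."""
    return percentile(latencies_ms, pct) <= sla_ms


# Hypothetical latencies in milliseconds from one measurement window.
window = [120, 80, 95, 300, 110]
p95 = percentile(window, 95)
ok = meets_sla(window, sla_ms=250)
```

Tracking a high percentile instead of the mean matters here because a handful of slow responses during peak load is exactly what an agent would notice mid-call.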