Real-Time AI Sales Coaching for Telemarketing Agents
Project: Real-Time Sales Coaching
Tech Stack: Node.js, TypeScript, React, WebSocket, STT (multi-provider), LLM coaching engine, SFTP deployment
Category: Real-Time AI, Voice Intelligence, Sales Technology
Background
This is a real-time AI-powered sales coaching system built for JustDial's telemarketing team. It listens to live phone calls and delivers instant AI-generated coaching insights to agent dashboards — while the call is still happening.
How It Works
PBX/Dialer (Audio) → WebSocket → coaching-core (Node.js)
                                        ↓
                                 Dual-channel STT
                           (agent + customer separately)
                                        ↓
                                LLM Coaching Engine
                                        ↓
                          coaching-front (React Dashboard)
- Audio ingestion: Raw PCM audio from PBX via WebSocket (two channels: agent mic + customer)
- Speech-to-text: Each channel piped to STT separately — preserves speaker identity
- Coaching: LLM analyzes conversation in real time and generates coaching hints
- Dashboard: Agent sees live suggestions without the customer hearing anything
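The ingestion and speech-to-text steps can be sketched in TypeScript as a small router that tags each PCM frame with its speaker and hands it to that speaker's STT session. This is a minimal illustration, not the project's actual code: `Speaker`, `AudioFrame`, `SttSink`, and `ChannelRouter` are hypothetical names, and the sinks stand in for whichever provider-specific streaming STT client is in use.

```typescript
// Hypothetical names throughout; the real coaching-core types are not shown in this write-up.
type Speaker = "agent" | "customer";

interface AudioFrame {
  speaker: Speaker;     // which PBX channel the frame arrived on
  pcm: Uint8Array;      // raw PCM bytes, passed through untouched
  receivedAtMs: number; // arrival time, useful later for latency tracking
}

// A sink stands in for one streaming STT session (provider-specific in practice).
type SttSink = (frame: AudioFrame) => void;

class ChannelRouter {
  private readonly sinks: Record<Speaker, SttSink>;

  constructor(sinks: Record<Speaker, SttSink>) {
    this.sinks = sinks;
  }

  // Called once per binary WebSocket message. The caller is assumed to know
  // the speaker because each channel arrives separately from the PBX.
  handleFrame(speaker: Speaker, pcm: Uint8Array, nowMs: number = Date.now()): void {
    if (pcm.length === 0) return; // skip empty frames
    this.sinks[speaker]({ speaker, pcm, receivedAtMs: nowMs });
  }
}

// Usage with in-memory sinks in place of real STT sessions.
const agentFrames: AudioFrame[] = [];
const customerFrames: AudioFrame[] = [];
const router = new ChannelRouter({
  agent: (frame) => { agentFrames.push(frame); },
  customer: (frame) => { customerFrames.push(frame); },
});

router.handleFrame("agent", new Uint8Array([1, 2, 3, 4]), 1000);
router.handleFrame("customer", new Uint8Array([5, 6]), 1010);
```

Because the speaker label is attached at ingestion, nothing downstream has to guess who is talking.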
Challenge 1: Dual-Channel Audio Processing
Problem: Standard telephony setups mix agent and customer audio into a single stream, making speaker diarization complex. The system needed to identify who said what accurately.
Solution: The PBX was configured to deliver two separate WebSocket streams — one per channel. This eliminates diarization ambiguity and allows the LLM to reason clearly about agent vs. customer intent.
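To illustrate why separate channels remove the diarization step, here is a hedged TypeScript sketch that interleaves two per-channel transcripts into one speaker-labeled conversation for the LLM. The `Segment` shape (a start offset plus text), the function names, and the sample dialogue are all assumptions made up for this example; real STT providers return richer results.

```typescript
type Speaker = "agent" | "customer";

// Assumed minimal shape of a finalized STT result for one channel.
interface Segment {
  startMs: number; // offset from call start
  text: string;
}

interface Turn extends Segment {
  speaker: Speaker;
}

// The speaker label comes from the channel itself, so no diarization model is needed.
function mergeChannels(agent: Segment[], customer: Segment[]): Turn[] {
  const turns: Turn[] = [
    ...agent.map((s) => ({ ...s, speaker: "agent" as const })),
    ...customer.map((s) => ({ ...s, speaker: "customer" as const })),
  ];
  // Order by start time so the LLM sees the conversation chronologically.
  return turns.sort((a, b) => a.startMs - b.startMs);
}

// Render the turns as plain text, one labeled line per turn.
function formatForLlm(turns: Turn[]): string {
  return turns.map((t) => `${t.speaker.toUpperCase()}: ${t.text}`).join("\n");
}

// Invented sample data, for illustration only.
const agentSegments: Segment[] = [
  { startMs: 0, text: "Hello, am I speaking with the business owner?" },
  { startMs: 4200, text: "I am calling about your listing." },
];
const customerSegments: Segment[] = [
  { startMs: 2100, text: "Yes, who is this?" },
];

const transcript = formatForLlm(mergeChannels(agentSegments, customerSegments));
```

With a single mixed stream, the `speaker` field would have to be inferred by a diarization model, and every misattributed turn would mislead the coaching prompt.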
Challenge 2: Remote Deployment via SFTP
Problem: Development happened locally, but the code ran on a remote production server (192.168.40.170), and copying files over by hand was error-prone.
Solution: The VS Code SFTP extension was used with a .vscode/sftp.json configuration:
{
  "name": "SSO",
  "host": "192.168.40.170",
  "protocol": "sftp",
  "port": 22,
  "username": "...",
  "remotePath": "/var/www/app"
}
This enabled one-click sync of code changes to the production server.
Challenge 3: Understanding the Existing Architecture
Problem: A new developer needed to quickly understand a complex dual-service architecture (backend + frontend, real-time audio + LLM) without documentation.
Solution: Claude analyzed the entire codebase and produced a full architectural breakdown covering: audio flow, WebSocket protocols, LLM integration points, and frontend state management — used as the onboarding document.
Key Learnings
- Capturing a separate audio channel per speaker is far more reliable than post-hoc diarization
- Real-time coaching must stay under 500 ms of latency, or agents lose trust in the system
- VS Code SFTP extension is a practical tool for rapid iteration on remote servers
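One way to keep the 500 ms target honest is to timestamp audio on arrival and measure the gap to hint delivery. The TypeScript sketch below assumes latency is measured from the arrival of the triggering audio frame to the moment the hint is pushed to the dashboard; the write-up does not define the measurement points. `HintTiming`, `isWithinBudget`, and `percentile` are hypothetical helpers, and the sample latencies are invented.

```typescript
const LATENCY_BUDGET_MS = 500; // the threshold named in the learnings above

interface HintTiming {
  audioReceivedAtMs: number; // arrival of the audio frame that triggered the hint
  hintDeliveredAtMs: number; // when the hint was pushed to the dashboard
}

function latencyMs(t: HintTiming): number {
  return t.hintDeliveredAtMs - t.audioReceivedAtMs;
}

// True when the hint arrived inside the budget. What to do with late hints
// (drop, downgrade, log) is a product decision not covered here.
function isWithinBudget(t: HintTiming, budgetMs: number = LATENCY_BUDGET_MS): boolean {
  return latencyMs(t) < budgetMs;
}

// Nearest-rank percentile, for watching tail latency rather than the average.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no latency samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Invented sample latencies in milliseconds, for illustration only.
const samples = [120, 180, 240, 310, 650];
const p50 = percentile(samples, 50);
const p95 = percentile(samples, 95);
const tailOverBudget = p95 >= LATENCY_BUDGET_MS;
```

Tracking a high percentile matters here because a few slow hints are exactly the ones agents notice.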
Session Date: February 2026 | Environment: Production PBX integration
