Case Study No. 002
Convoice AI: voice agents built around real business calls.
I co-founded Convoice and led software engineering. We were building AI voice agents for real business calls: connect a phone number, upload knowledge, review transcripts and call history, and run the system from a dashboard that operators could actually use.
[Figure: Convoice workspace (Live), with panels for Knowledge (websites, PDFs, call notes), Calls (history and routing), and Conversation Review. Caption: AI voice agents for business calls.]
01 The Problem
The problem was never “can we make a voice AI demo.” It was much more operational than that. Businesses were missing calls, working through staffing gaps, and stuck with rigid IVR trees or scripted bots that fell apart as soon as the conversation moved off the happy path.
- Customer-facing calls were expensive to staff and easy to miss.
- Existing IVR-style systems were brittle and poor at complex queries.
- Setup had to stay simple enough for non-technical operators to actually use.
What We Built
Convoice was more specific than “voice AI for business.” We built a business dashboard for creating and operating voice agents, not just a prompt wrapped in a phone call.
Build And Launch
The goal was to let someone create, customize, and deploy a voice agent in a few clicks, including phone-number setup and demo-booking-style flows.
Knowledge Ingestion
We made the system knowledge-backed: files, websites, recordings, and other business context could all feed the bot.
Calls And Review
The app included concrete operational surfaces for calls, conversations, recordings, transcriptions, summaries, and outbound-call management.
Operations
Analytics, voice settings, reservations, scheduling, and team settings were part of the product surface rather than afterthoughts.
Operating Target
~2s
We cared a lot about getting response latency down to around two seconds, because trust breaks fast when a voice system hesitates.
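One way to reason about a target like this is to treat it as a budget split across the stages of a single conversational turn. The sketch below is illustrative only: the stage names and per-stage millisecond figures are hypothetical placeholders, not measured Convoice numbers. Only the two-second target comes from the text above.

```python
# Target from the case study: roughly two seconds per response.
TARGET_MS = 2000

# Hypothetical per-stage estimates in milliseconds (assumptions for illustration,
# not measurements from the real system).
STAGE_BUDGET_MS = {
    "speech_to_text": 400,
    "retrieval": 200,
    "language_model": 800,
    "text_to_speech": 400,
}


def check_budget(stages: dict[str, int], target_ms: int) -> dict:
    """Sum the stage estimates and report headroom against the target."""
    total = sum(stages.values())
    slowest = max(stages, key=stages.get)  # stage with the largest estimate
    return {
        "total_ms": total,
        "headroom_ms": target_ms - total,
        "slowest_stage": slowest,
        "within_target": total <= target_ms,
    }


report = check_budget(STAGE_BUDGET_MS, TARGET_MS)
print(report)
```

With these placeholder figures the stages sum to 1800 ms, leaving 200 ms of headroom. The useful part of the exercise is seeing which stage dominates the budget, not the specific numbers.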
Cost Target
~$0.10/min
One of the key constraints was keeping the system cheap enough to run, with a target in the neighborhood of ten cents per minute.
Scale Target
0 to ~200
We were also thinking about scale from the beginning, with a target of handling roughly 200 concurrent calls without turning the product into an enterprise-only science project.
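Taken together, the cost and scale targets imply a simple piece of arithmetic about peak spend. The sketch below uses only the two stated targets (about ten cents per minute, about 200 concurrent calls). The three-minute call length is a hypothetical example value, and amounts are kept in integer cents to avoid floating-point rounding.

```python
COST_PER_MINUTE_CENTS = 10    # target from the text: ~$0.10 per minute
PEAK_CONCURRENT_CALLS = 200   # target from the text: ~200 concurrent calls


def call_cost_cents(minutes: int, cost_per_minute_cents: int = COST_PER_MINUTE_CENTS) -> int:
    """Cost of one call of the given length, in cents."""
    return minutes * cost_per_minute_cents


def peak_spend_per_hour_cents(concurrent_calls: int, cost_per_minute_cents: int) -> int:
    """Spend for one hour with every concurrent slot busy the whole time."""
    return concurrent_calls * cost_per_minute_cents * 60


# Hypothetical three-minute call: 3 * 10 = 30 cents.
example_call = call_cost_cents(3)

# Full load for an hour: 200 * 10 * 60 = 120000 cents, i.e. $1,200.
peak_hour = peak_spend_per_hour_cents(PEAK_CONCURRENT_CALLS, COST_PER_MINUTE_CENTS)

print(f"3-minute call: ${example_call / 100:.2f}")
print(f"Peak hour at full concurrency: ${peak_hour / 100:,.2f}")
```

At the stated targets, a fully loaded hour costs about $1,200 to run, which is why the per-minute figure mattered as much as the concurrency figure.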
Technical Shape
On the product side, the frontend was a real operating surface: knowledge management, calls, conversations, analytics, voice settings, reservations, and workspace admin all had to coexist in one system that still felt understandable.
On the engineering side, I split responsibilities across services instead of hiding everything behind one vague backend. There was dedicated telephony infrastructure, a synchronous query path, and a broader FastAPI backend handling file management and other non-serverless work alongside Lambda-based preprocessing and upload flows.
The stack was AWS-heavy and fairly pragmatic: DynamoDB, S3-oriented preprocessing, Pinecone-backed retrieval, and telephony, speech, and language-model calls wired through providers like Twilio, OpenAI, Google Cloud Text-to-Speech, Azure Speech, Deepgram, and ElevenLabs. The interesting part was not any one component by itself. It was getting latency, configurability, and operating cost to work together in something people could actually run.
The product claim stayed attached to phone calls, reservations, and customer support workflows instead of drifting into generic AI theater.
Performance mattered because slow turn-taking breaks trust much faster in voice than it does in text.
The interesting part was aligning product claims, infrastructure cost, and the actual interface people had to operate.