Case Study No. 002

Convoice AI: voice agents built around real business calls.

I co-founded Convoice and led software engineering. We were building AI voice agents for real business calls: connect a phone number, upload knowledge, review transcripts and call history, and run the system from a dashboard that operators could actually use.

[Product mockup: the Convoice workspace, shown live. Tags: Inbound / Outbound, Knowledge + Analytics. Panels: Knowledge (websites, PDFs, call notes), Calls (history and routing), and Conversation Review, with tabs for Voices, Summaries, and Reservations.]

Sample exchange from Conversation Review:

Caller: can you move Friday's table for four to 7:30?
Bot: updated, confirmed, and logged for follow-up.

AI voice agents for business calls

Convoice

01 The Problem

The problem was never “can we make a voice AI demo.” It was much more operational than that. Businesses were missing calls, working through staffing gaps, and stuck with rigid IVR trees or scripted bots that fell apart as soon as the conversation moved off the happy path.

  • Customer-facing calls were expensive to staff and easy to miss.
  • Existing IVR-style systems were brittle and poor at complex queries.
  • Setup had to stay simple enough for non-technical operators to actually use.

What We Built

Convoice was more specific than “voice AI for business.” We built a business dashboard for creating and operating voice agents, not just a prompt wrapped in a phone call.

Build And Launch

The goal was to let someone create, customize, and deploy a voice agent in a few clicks, including phone-number setup and demo-booking-style flows.

Knowledge Ingestion

We made the system knowledge-backed: files, websites, recordings, and other business context could all feed the bot.

Calls And Review

The app included concrete operational surfaces for calls, conversations, recordings, transcriptions, summaries, and outbound-call management.

Operations

Analytics, voice settings, reservations, scheduling, and team settings were part of the product surface rather than afterthoughts.

Operating Target

~2s

We cared a lot about getting response latency down to around two seconds per turn because trust breaks fast when a voice system hesitates.
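One way to reason about a target like this is as a per-turn budget split across pipeline stages. The sketch below is illustrative only: the stage names and millisecond figures are hypothetical placeholders, not measurements from Convoice, and `check_budget` is a helper invented for this example.

```python
from dataclasses import dataclass

TARGET_MS = 2000  # the ~2s per-turn target described above


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: int  # hypothetical placeholder, not a measured value


def check_budget(stages, target_ms=TARGET_MS):
    """Return (total_ms, headroom_ms, slowest_stage_name) for one turn."""
    total = sum(s.latency_ms for s in stages)
    slowest = max(stages, key=lambda s: s.latency_ms)
    return total, target_ms - total, slowest.name


# Assumed stages for a speech-to-text -> retrieval -> LLM -> text-to-speech turn.
turn = [
    Stage("speech_to_text", 300),
    Stage("retrieval", 150),
    Stage("llm_response", 900),
    Stage("text_to_speech", 400),
]

total, headroom, slowest = check_budget(turn)
print(f"total={total}ms headroom={headroom}ms slowest={slowest}")
```

Framing latency this way makes it obvious which stage to attack first when a turn runs over budget.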

Cost Target

~$0.10 / min

One of the key constraints was keeping the system cheap enough to run, with a target in the neighborhood of ten cents per minute.
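A per-minute cost target can be checked with simple arithmetic over the per-minute cost of each component. This is a minimal sketch under stated assumptions: every price below is an invented placeholder chosen for illustration, not an actual provider rate or a Convoice figure.

```python
TARGET_USD_PER_MIN = 0.10  # the ~ten-cents-per-minute target described above


def cost_per_minute(component_costs):
    """Sum per-minute component costs (USD), rounded to 4 decimal places."""
    return round(sum(component_costs.values()), 4)


def within_target(component_costs, target=TARGET_USD_PER_MIN):
    """True if the blended per-minute cost is at or under the target."""
    return cost_per_minute(component_costs) <= target


# Hypothetical placeholder prices in USD per call minute (NOT real rates).
hypothetical_costs = {
    "telephony": 0.015,
    "speech_to_text": 0.010,
    "llm": 0.030,
    "text_to_speech": 0.030,
}

print(cost_per_minute(hypothetical_costs), within_target(hypothetical_costs))
```

Keeping the breakdown per component shows which provider choice moves the total most when the target is missed.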

Scale Target

0 to ~200

We were also thinking about scale from the beginning, with a target of handling roughly 200 concurrent calls without turning the product into an enterprise-only science project.
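One simple way to keep load bounded at a planned capacity is an admission gate in front of the call handler. The sketch below uses Python's `asyncio.Semaphore`; it is an illustration of the idea, not a description of how Convoice handled concurrency, and `CallGate` and `fake_call` are hypothetical names.

```python
import asyncio

MAX_CONCURRENT_CALLS = 200  # the ~200-call target described above


class CallGate:
    """Caps in-flight calls; extra calls wait until a slot frees up."""

    def __init__(self, limit=MAX_CONCURRENT_CALLS):
        self._sem = asyncio.Semaphore(limit)
        self.active = 0
        self.peak = 0  # highest number of simultaneous calls observed

    async def run(self, handler, *args):
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await handler(*args)
            finally:
                self.active -= 1


async def fake_call(call_id):
    # Stand-in for a real call session (hypothetical).
    await asyncio.sleep(0.01)
    return call_id


async def main(n_calls=250, limit=MAX_CONCURRENT_CALLS):
    gate = CallGate(limit)
    results = await asyncio.gather(*(gate.run(fake_call, i) for i in range(n_calls)))
    return len(results), gate.peak


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With 250 simulated calls and a limit of 200, all calls complete but no more than 200 are in flight at once; a production system would more likely reject or queue overflow explicitly rather than wait silently.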

Technical Shape

On the product side, the frontend was a real operating surface: knowledge management, calls, conversations, analytics, voice settings, reservations, and workspace admin all had to coexist in one system that still felt understandable.

On the engineering side, I split responsibilities across services instead of hiding everything behind one vague backend. There was dedicated telephony infrastructure, a synchronous query path, and a broader FastAPI backend handling file management and other non-serverless work alongside Lambda-based preprocessing and upload flows.

The stack was AWS-heavy and fairly pragmatic: DynamoDB, S3-oriented preprocessing, Pinecone-backed retrieval, and a telephony, language-model, and speech layer wired through providers like Twilio, OpenAI, Google Cloud Text-to-Speech, Azure Speech, Deepgram, and ElevenLabs. No single component was the hard part. The hard part was getting latency, configurability, and operating cost to work together in something people could actually run.

The product claim stayed attached to phone calls, reservations, and customer support workflows instead of drifting into generic AI theater.

Performance mattered because slow turn-taking breaks trust much faster in voice than it does in text.

The interesting part was aligning product claims, infrastructure cost, and the actual interface people had to operate.