Voice is making a comeback – and it sounds very different this time. Once considered a legacy channel, voice is now being reinvented by artificial intelligence (AI), reshaping how retailers connect with customers in ways that are more intuitive, more human, and far more intelligent.
“Voice is experiencing a real resurgence,” says Christopher Connolly, director, solutions engineering, APAC, at Twilio. “It’s not just about picking up the phone anymore – it’s about orchestrating intelligent, multi-channel conversations that build loyalty and trust.”
For more than 17 years, Twilio has been at the forefront of voice technology, starting with a radical idea: Make the world’s most complex telecommunications infrastructure as simple as writing a few lines of code. Today, the same principle is powering the next generation of AI-driven voice experiences, enabling retailers to leverage automation, data, and generative AI to transform their customer service and engagement.
The new voice of retail
Conversational commerce has evolved beyond any single medium. Customers might start by speaking into a device, then continue via text, or follow up through video. “It’s media-agnostic,” Connolly says. “You might begin a voice conversation, switch to chat, and the context follows you. That seamless experience has only become possible because the technology – both speech and AI – has improved dramatically.”
What’s different today is not just that voice recognition is better, but that the machine on the other end can actually converse. Thanks to large language models and advanced speech synthesis, AI systems now respond with nuance and empathy, making interactions feel human rather than robotic. For short exchanges – checking an order status or finding a store location – many consumers wouldn’t know whether they were speaking to a human or a well-trained AI.
Even when a person is involved, AI plays an invisible role behind the scenes. Speech analytics can assess sentiment, identify key topics, or flag moments of frustration – providing real-time, actionable insights that improve both service quality and customer experience.
Twilio’s voice evolution
When Twilio launched in 2008, telecommunications was still evolving toward the seamless connectivity we know today. Networks and standards varied across countries, meaning that global businesses often required custom engineering to send messages at scale.
Twilio changed that by introducing APIs that abstracted away the complexity – letting any developer send an SMS or make a call with a few lines of code. This “radical simplification” democratised communications and gave startups the ability to easily communicate with their customers.
That foundation enabled Twilio to pioneer Elastic SIP Trunking – the ability to make and receive calls over the internet, with automatic scaling of capacity. Businesses no longer had to pre-purchase phone circuits for unpredictable demand.
The same philosophy drives Twilio’s modern AI voice platform. Developers can now access best-in-class speech recognition, natural voice synthesis, and transcription through a single API – choosing providers like ElevenLabs or Deepgram with a configuration tweak. “We take care of the complexity, the latency, the scaling,” Connolly says. “All the developer has to think about is the experience.”
Orchestrating intelligent conversations
In retail, voice and AI are converging to create what Connolly calls “orchestrated experiences” – interactions that flow naturally between channels while feeling consistent and personal.
Orchestration means ensuring all the systems behind a customer interaction – from CRM and order management to messaging and call centres – work in harmony. “It’s like conducting an orchestra,” Connolly explains. “Every system plays its part, but the goal is a unified, frictionless experience.”
A well-orchestrated system can decide, for example, when to escalate from a chatbot to a call from a human agent, or when to follow up a browsing session with a personalised SMS offer. It can also predict which customers prefer which channel – voice, text, or social – and adjust accordingly.
Retailers are already using orchestration to tailor engagement based on purchase value and intent. For everyday items, automation handles most interactions; for higher-value purchases, such as electronics or luxury fashion, a personal call might seal the deal. “It’s about knowing when to bring in the human touch,” says Connolly. “And those are the moments when trust and loyalty are built.”
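To make the idea concrete, the kind of rule an orchestration layer might apply can be sketched in a few lines of Python. This is a minimal illustration only: the `Interaction` fields, the thresholds, and the action names are all hypothetical, and nothing here is a Twilio API or a description of how any retailer actually implements it.

```python
from dataclasses import dataclass

# All names and thresholds below are illustrative assumptions, not Twilio APIs.
HIGH_VALUE_THRESHOLD = 500.0   # assumed cart value at which a personal call is worthwhile
FRUSTRATION_THRESHOLD = 0.7    # assumed sentiment score above which a human should step in


@dataclass
class Interaction:
    cart_value: float        # value of the items the customer is considering
    frustration: float       # 0.0 (calm) to 1.0 (very frustrated), e.g. from speech analytics
    preferred_channel: str   # "voice", "text", or "social"


def next_step(interaction: Interaction) -> str:
    """Pick the next action for a customer interaction."""
    # A frustrated customer gets a human, whatever the purchase value.
    if interaction.frustration >= FRUSTRATION_THRESHOLD:
        return "escalate_to_human_agent"
    # Higher-value purchases earn a personal call.
    if interaction.cart_value >= HIGH_VALUE_THRESHOLD:
        return "offer_personal_call"
    # Everyday items stay automated, on the channel the customer prefers.
    return f"automated_reply_via_{interaction.preferred_channel}"


if __name__ == "__main__":
    print(next_step(Interaction(cart_value=50.0, frustration=0.1, preferred_channel="text")))
    print(next_step(Interaction(cart_value=1200.0, frustration=0.2, preferred_channel="voice")))
```

In practice the inputs would come from live systems such as the CRM, order management, and speech analytics, and the thresholds would be tuned per retailer rather than hard-coded.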
Generative AI and the shopping conversation
The promise of AI voice in retail is not just reactive service – it’s proactive, conversational commerce. Imagine asking a virtual assistant, “Show me red dresses under $200,” and having it respond in a natural voice, display options, suggest accessories, and check local stock availability.
The interaction can continue after purchase: “Where’s my order?” prompts a real-time delivery update. “Can I return it in-store?” brings a quick confirmation. The entire lifecycle – from discovery to purchase and support – can now occur seamlessly through voice, chat, or any channel the customer prefers.
Behind the scenes, Twilio’s APIs link AI models, inventory databases, and communication tools. Retailers don’t need to build AI engines from scratch – they simply orchestrate existing systems into one cohesive conversation.
The end of “creepy”
For years, retailers worried about over-personalisation – knowing too much about a customer’s behaviour felt invasive. But that’s changing fast, driven by younger consumers.
“Gen Z has completely flipped the script,” Connolly notes. “They expect you to know who they are and what they were just doing on your site. The old ‘creepiness factor’ is gone – now it’s about convenience.”
This shift is redefining trust. When an airline recognises a customer by their mobile number and skips the verification steps, it’s no longer seen as intrusive – it’s expected. “If I’ve called about a delayed order, just give me the update,” Connolly says. “Don’t make me verify my name three times. Reducing friction builds loyalty.”
What’s next: Richer, smarter, more human
Connolly points to rich communication services (RCS) – enhanced messaging that embeds video, carousels, and interactive elements directly into native phone apps – as one of the next breakthroughs in AI-driven customer engagement.
“In North America, we’re seeing retailers get incredible returns from these messages,” he says. “It’s not just text – it’s visual, it’s interactive, and it’s backed by AI. That’s what’s coming to Australian devices next year.”
For all the advances in AI and automation, Connolly emphasises that the goal isn’t to replace human interaction – it’s to enhance it. “Voice is still the most natural way for humans to communicate,” he says.
“AI just helps us make those conversations smarter, faster, and more personal.”