Scaleway • published Dec 12, 2023

Empower, or how to enrich customer communication with conversational AI - guest post by Ringover


Deploy • Nicolas Jje • 12/12/23 • 6 min read

The year 2023 has been a turning point in the democratization of artificial intelligence, especially when it comes to the use of generative AI among consumers and businesses.

AI is no longer limited to companies' technical departments. It is now the focus of everyone's attention, including top management across industries, as witnessed by the buzz around the ai-PULSE event organized by Scaleway last month.

As a French scale-up specializing in voice services, Ringover was delighted to be one of the innovative exhibitors at ai-PULSE, notably because we were able to tell the story of Empower, the conversational analysis solution we launched last spring.

Empower offers transcription and mood analysis functionalities which enhance understanding of end-customer needs and optimize managerial and operational processes.

So, how did Ringover build this conversational AI tool?

1. The decision to integrate AI

We began developing Empower in 2022, when we decided to incorporate artificial intelligence into our cloud telephony solution through an advanced transcription tool, powered by technology developed in-house.

To develop it, we first began a data collection phase.

We used this dataset to feed our neural networks using PyTorch tools.

Then, of course, the data had to be sorted, organized, and cleaned to guarantee a high-quality result, before being annotated. Annotation consists of marking out the terrain for the model by labeling the data. It ensures that data tokenization (segmentation into smaller sequences) and normalization run smoothly. This initial pre-processing phase makes a major contribution to facilitating model learning.
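To make the pre-processing step more concrete, here is a minimal sketch of text normalization followed by tokenization. It assumes simple word-level tokens and uses only the Python standard library; the function names `normalize` and `tokenize` are illustrative and do not come from Ringover's pipeline, which is not described in detail here.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, apply Unicode NFKC normalization, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace anything that is not a word character, whitespace,
    # or an apostrophe with a space (this drops punctuation).
    text = re.sub(r"[^\w\s']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split normalized text into word-level tokens (an assumed granularity)."""
    return normalize(text).split()


print(tokenize("Hello,   this is RINGOVER's support line!"))
```

Production systems usually rely on subword tokenizers rather than whitespace splitting, but the principle is the same: consistent normalization first, segmentation second.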

We then defined the neural architecture of our model, a structure in which each artificial neuron plays a key role in understanding the input and generating the output. Next, we calculated the loss function, i.e. the margin of error between model predictions and actual values.

Finally, we began the back-propagation process, which propagates this error back through the network and adjusts the model's internal parameters (its weights) to reduce the margin of error as much as possible. The smaller the loss, the better the model's predictions are likely to be, and the better its performance.
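The loop of computing a loss and adjusting parameters can be illustrated with a deliberately tiny example. The sketch below is an assumption-laden toy, not Ringover's model: it fits a single weight `w` with a mean squared error loss and a hand-written gradient, in plain Python. In a framework such as PyTorch, the gradient would be obtained automatically by back-propagation instead of being derived by hand.

```python
# Toy data generated by y = 2x; the model y_hat = w * x has to recover w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mse_loss(w: float) -> float:
    """Mean squared error between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w: float) -> float:
    """Derivative of the loss with respect to w (derived by hand here)."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(w: float = 0.0, lr: float = 0.05, steps: int = 50) -> tuple[float, list[float]]:
    """Run plain gradient descent and record the loss before each update."""
    losses = []
    for _ in range(steps):
        losses.append(mse_loss(w))
        w -= lr * mse_grad(w)  # step against the gradient to reduce the loss
    return w, losses


w, losses = train()
print(f"learned w = {w:.4f}, first loss = {losses[0]:.4f}, last loss = {losses[-1]:.6f}")
```

The learning rate `lr` and the number of steps are arbitrary values chosen so that this toy problem converges quickly.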

After a few months, we extended the capabilities of our transcription tool with some of the features described above to make it a full-fledged conversational analysis solution.

This first version took three months to develop, and we launched it internally in February 2023. A handful of Ringover customers participated in the testing phase.

The technical team took into account the feedback from all the testers to fine-tune the learning manually, i.e. to adapt it to the tasks we had defined for our model and, ultimately, to improve contextual understanding for each transcript.

Two months later, at the end of April, we officially launched Empower. We then changed the software architecture to microservices. In addition to simplifying the development of the application, this switch had two objectives:

to easily allow us to make the improvements suggested by testers

to make the software more stable, reliable, and scalable.

Since then, we've continued development, applying a few patches and continually adding new features.

2. Model optimization

Contextual understanding is one of the major challenges of conversational AI solutions. Having pre-trained models can save time, but in any case, the fine-tuning phase is essential to a high-quality solution.

To this end, we completed our dataset and injected a total of no fewer than 2,000 hours of audio conversations to improve the quality of ASR (Automatic Speech Recognition). The other challenge common to all transcription engines, which we had to overcome, concerns proper nouns.

The "name game" can give the best artificial intelligence models a hard time. And for good reason: systems are not always able to break down names and identify their structure, not to mention special features such as surnames containing special characters. To limit these side effects, we are developing two areas of improvement:

The first consists of injecting better quality (internal) training data with the right dosage to avoid output degradation.

The second is to use user names to reduce the gap between model predictions and actual values.
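One possible reading of the second point is a post-processing step that compares transcribed words against names already known to the system, such as contact names, and restores their spelling. The sketch below illustrates that idea with fuzzy matching from the Python standard library. It is an assumption made for illustration, not a description of how Empower works; `correct_names`, `known_names`, and the 0.8 similarity cutoff are all hypothetical.

```python
import difflib


def correct_names(transcript: str, known_names: list[str], cutoff: float = 0.8) -> str:
    """Replace words that closely match a known name with that name's spelling."""
    # Map lowercase forms back to the original spelling of each known name.
    lookup = {name.lower(): name for name in known_names}
    corrected = []
    for word in transcript.split():
        core = word.strip(".,!?;:")  # ignore surrounding punctuation when matching
        match = difflib.get_close_matches(core.lower(), list(lookup), n=1, cutoff=cutoff)
        if core and match:
            word = word.replace(core, lookup[match[0]])
        corrected.append(word)
    return " ".join(corrected)


print(correct_names("I spoke with Mr. Lefevre yesterday.", ["Lefèvre", "Nguyen"]))
```

A high cutoff keeps ordinary words from being rewritten as names, at the cost of missing names that were transcribed very differently from their real spelling.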

3. The technical foundations behind the other functionalities

The conversational analysis solution doesn't just use AI for transcription. The tool is also capable of performing other analytical and generative tasks, such as mood analysis, call summaries and translation.

Sentiment analysis for mood analysis

Mood analysis is possible thanks to sentiment analysis performed through text. For the time being, we are using a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, because it offers better performance. However, we are already working on a model of our own, developed by our teams, with the aim of processing the speech signal in addition to the text signal. This will further improve the accuracy of mood analysis.

OpenAI for call summaries

We use in-house technology to extract highlights from the transcript text. These highlights can be accessed from Empower's "Recap" section.

Next, OpenAI comes into play. Our team issues a customized prompt to ChatGPT through the OpenAI API, so that it can generate a summary from the highlights extracted beforehand.

Example prompt:

Summarize the text below, highlighting the most important points in the form of a bulleted list.

Text: """

{text input}

"""
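As a rough sketch of how such a prompt might be assembled before being sent to the API, the snippet below fills the template with a list of extracted highlights. The names `PROMPT_TEMPLATE`, `build_summary_prompt`, and `text_input` are hypothetical, the sample highlights are invented for illustration, and the API call itself is deliberately left out.

```python
# Template mirroring the example prompt; {text_input} is a hypothetical placeholder name.
PROMPT_TEMPLATE = (
    "Summarize the text below, highlighting the most important points "
    "in the form of a bulleted list.\n\n"
    'Text: """\n{text_input}\n"""'
)


def build_summary_prompt(highlights: list[str]) -> str:
    """Join the non-empty highlights and insert them into the prompt template."""
    text_input = "\n".join(h.strip() for h in highlights if h.strip())
    return PROMPT_TEMPLATE.format(text_input=text_input)


# Invented sample highlights, for illustration only.
prompt = build_summary_prompt([
    "Customer asks about upgrading the plan.",
    "Agent promises a quote by Friday.",
])
print(prompt)
```

Wrapping the input in triple quotes, as the example prompt does, helps the model distinguish the instruction from the text to summarize.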

DeepL for translation

All Empower functions are available in three languages: French, Spanish, and English. Summaries, transcripts, and highlights can be translated at the click of a button, which is very useful in multilingual environments. The DeepL API supports this function.

4. Why PyTorch and not another framework?

The choice of PyTorch as a framework was a natural one. The various team members had already worked with this user-friendly framework. Its Pythonic syntax facilitates experimentation and debugging, giving our developers great flexibility and freedom in building models.

What's more, research into PyTorch is very active, so we benefit from regular updates. These are the reasons why we chose…
