This article is part one in a month-long series aimed at learning and exercising the RingCentral APIs in Python as part of their new Game Changers challenge. Feel free to follow along, leave a comment, or even participate in the challenge yourself!
Yesterday, we talked a great deal about the MonkeyLearn API and how it can be used for topic extraction or sentiment analysis of text-based messages. Unfortunately, this text limitation means we can’t directly leverage MonkeyLearn to categorize and filter incoming voice messages. We have to convert the audio into text first!
Once upon a time, I used Rev.com to translate some personal documents from English into another language so I could file them with the government of another country. It was fast, easy, and relatively inexpensive to schedule the translation job on their platform. Recently, the team has expanded their services and now provides audio-to-text “translation” through the Rev.ai platform.
Once again, the very first step is to create a free account on Rev.ai. According to the platform’s pricing, the first 5 hours of transcription are free and no credit card is required to sign up, so let’s get started. Once you sign up, the very first thing you can do is upload a file to try a direct transcription.
As I want a better idea of how the system works, let’s give it a try using the audio from a recent YouTube video I posted about CoderCruise. It’s only a 3-minute video, and the extracted audio file is about 3 MB in size. It’s a great quick test, though I truly hope my customers never leave voice messages this long. Uploading the file takes you immediately to the Recent Jobs screen, where you can see the job is processing. After a few minutes, the job completes and the transcript is available in either text or JSON format.
It’s not perfect, but it’s a close enough speech-to-text solution that we can then pipe it through our AI classifier and gain some insight on the contents without needing to listen to the audio itself.
Wiring the API
Just like before, we’ll use Python for our API interactions. The first step to using the Rev.ai API is to install its Python SDK:
pip install rev_ai
Once the SDK is installed, we can submit a job programmatically, just like we did through the web console [1]:

    from rev_ai import apiclient

    access_token = 'your_access_token'

    # Create client with your access token
    client = apiclient.RevAiAPIClient(access_token)

    # Submit a local audio file for transcription
    file_job = client.submit_job_local_file(
        filename="/home/ericmann/codercruise.m4a",
        metadata="CoderCruise 2019 testimonial",
        skip_diarization=False)
The output of this operation is, again, a job handle with an in-progress status. We can use the API to query the job status, polling until it completes, or supply a callback URL so the API will tell us when it’s done [2].
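The polling option mentioned above can be sketched roughly as follows. This is a minimal sketch, not the SDK's own helper: it assumes the client exposes `get_job_details(job_id)` and that the returned job carries a `status` whose name is `IN_PROGRESS` while the job is still running. The function name `wait_for_job` and its parameters are made up for illustration.

```python
import time


def wait_for_job(client, job_id, poll_seconds=5.0, timeout_seconds=600.0,
                 sleep=time.sleep):
    """Poll until the job leaves the in-progress state; return the final status name.

    Assumption: client.get_job_details(job_id) returns an object with a
    `status` attribute that is either an enum member or a plain string.
    """
    waited = 0.0
    while True:
        job = client.get_job_details(job_id)
        # Normalize enum members (via .name) and plain strings to upper case.
        status = getattr(job.status, "name", str(job.status)).upper()
        if status != "IN_PROGRESS":
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} still in progress after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the client and job handle from the submission step, a call like `wait_for_job(client, file_job.id)` would block until the job finishes or fails. The `sleep` parameter exists only so the loop can be exercised without real delays.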
For the moment, it’s enough to refresh the Recent Jobs web console and see the new job queued up for processing. Once it’s done, we can retrieve either the text or JSON transcript of the audio with another API call:
    from rev_ai import apiclient

    access_token = 'your_access_token'
    job_id = 'your_job_id'

    # Create client with your access token
    client = apiclient.RevAiAPIClient(access_token)

    # Get transcript as text
    transcript_text = client.get_transcript_text(job_id)
    print(transcript_text)
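The JSON form of the transcript is handy when we want to know who said what. Below is a small sketch that flattens such a transcript into one line per speaker turn. It assumes the JSON has a top-level `monologues` list whose entries hold a `speaker` and a list of `elements` with `value` strings; the helper name `transcript_to_lines` and the sample data are invented for illustration.

```python
def transcript_to_lines(transcript):
    """Flatten a JSON transcript (as a dict) into 'Speaker N: text' lines.

    Assumption: the dict looks like
    {"monologues": [{"speaker": 0, "elements": [{"value": "..."}, ...]}, ...]}
    """
    lines = []
    for monologue in transcript.get("monologues", []):
        # Words and punctuation both carry their text in "value".
        text = "".join(el.get("value", "") for el in monologue.get("elements", []))
        text = text.strip()
        if text:
            lines.append(f"Speaker {monologue.get('speaker', '?')}: {text}")
    return lines


# Made-up sample in the assumed shape, not real API output.
sample = {
    "monologues": [
        {"speaker": 0, "elements": [
            {"type": "text", "value": "Hello"},
            {"type": "punct", "value": ", "},
            {"type": "text", "value": "this"},
            {"type": "punct", "value": " "},
            {"type": "text", "value": "is"},
            {"type": "punct", "value": " "},
            {"type": "text", "value": "Eric"},
            {"type": "punct", "value": "."},
        ]},
        {"speaker": 1, "elements": [
            {"type": "text", "value": "Thanks"},
            {"type": "punct", "value": "!"},
        ]},
    ]
}

for line in transcript_to_lines(sample):
    print(line)
```

In practice the dict would come from the SDK (for example its JSON transcript call) rather than from a hand-written sample.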
Now the various building blocks of our virtual voicemail assistant are starting to take shape. The assistant can receive voice messages from customers. We can convert those messages to text. An AI platform can automatically categorize those messages for us. The next step in the process will be to act on those categorizations as they come in!
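To make the hand-off between these building blocks concrete, here is a rough sketch of the glue that could sit behind a callback URL. Everything in it is an assumption for illustration: the payload shape (`{"job": {"id": ..., "status": ...}}`), the status string `"transcribed"`, and the two injected callables `fetch_transcript` and `classify`, which stand in for the Rev.ai transcript call and the MonkeyLearn classifier.

```python
def handle_transcription_callback(payload, fetch_transcript, classify):
    """Turn a 'job finished' notification into a classification result.

    payload          -- dict parsed from the callback request body (assumed shape)
    fetch_transcript -- callable taking a job id and returning transcript text
    classify         -- callable taking text and returning a category
    Returns the classification, or None if the job did not finish successfully.
    """
    job = payload.get("job", {})
    if job.get("status") != "transcribed" or "id" not in job:
        # Failed or malformed notifications are ignored in this sketch.
        return None
    text = fetch_transcript(job["id"])
    return classify(text)
```

Passing the two services in as plain callables keeps the glue independent of either SDK, so each piece can be swapped or stubbed out separately.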
[1] There is a capability to submit hosted files directly. I’m not standing up a web server to host this audio for now, but moving forward we can have Rev.ai download our voice messages directly from RingCentral and avoid storing the raw audio anywhere else.
[2] Our full voicemail assistant will leverage a callback URL so we can directly link the transcript output from Rev.ai to our classifier with MonkeyLearn.