How to Evaluate the Performance of a Speech-to-Text System

Neon light speech bubble with the word ‘hello’ inside
Photo by Adam Solomon on Unsplash

Speech-to-Text (STT) is the task of converting audio to words. An automatic STT system will often return not just its best guess as to the words spoken, but also additional information like timestamps and confidence measures. Judging how well an STT system performs means measuring the mistakes it makes. Comparing the output of automatic transcription to a reference transcription reveals three kinds of mistakes:

  1. They mistake (or substitute) one word for another
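As a rough illustration of how such mistakes can be counted, here is a minimal sketch of word error rate (WER), the metric commonly used for this comparison. It is not taken from the article: the function names `count_errors` and `word_error_rate` are hypothetical, and the lowercasing and whitespace tokenisation are simplifying assumptions. WER conventionally counts substitutions together with deleted and inserted words, then divides by the number of words in the reference.

```python
def count_errors(reference, hypothesis):
    """Count (substitutions, deletions, insertions) in a minimum-edit alignment.

    Assumption: words are compared after lowercasing and splitting on
    whitespace; real evaluations usually apply more careful normalisation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # best[i][j] = (total_errors, subs, dels, ins) for ref[:i] against hyp[:j]
    best = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    best[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        best[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        best[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            total, subs, dels, ins = best[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, subs, dels, ins)  # correct word
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = best[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = best[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            # Tuples compare on total_errors first, so min() keeps the
            # cheapest alignment.
            best[i][j] = min(diagonal, deletion, insertion)

    _, subs, dels, ins = best[len(ref)][len(hyp)]
    return subs, dels, ins


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcription must contain at least one word")
    return sum(count_errors(reference, hypothesis)) / n_ref
```

For example, scoring the hypothesis "what is a weather to day" against the reference "what is the weather today" gives two substitutions and one insertion, so the WER is 3 / 5 = 0.6. Because insertions are counted, WER can exceed 1.0 for very poor output.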

Beyond blue, our sky takes on many spectacular hues

White clouds against a blue sky
“sky” by LorenzWorkz is licensed under CC BY 2.0

The colours of the sky are caused by a complex interplay between our sun and our planet’s atmosphere. Artists have long painted the sky in all its guises, and storytellers have often taken inspiration from it. Ancient Greeks like Plato and Aristotle were the first to write their theories about colour and its relation to meteorological phenomena. Yet, it took scientists many centuries to unravel the science behind the colours we see in the sky.


High-quality data is key for building useful machine learning models

Prague Library, with plenty of data stored inside its books — Image by izoca from Pixabay

Machine learning models learn their behaviour from data. So, finding the right data is a big part of the work to build machine learning into your products.

Exactly how much data you need depends on what you’re doing and your starting point. There are techniques like transfer learning to reduce the amount of data you need. Or, for some tasks, pre-trained models are available. Still, if you want to build something more than a proof-of-concept, you’ll eventually need data of your own to do so.

That data has to be representative of the machine learning task, and its collection is…

Words carry meaning, but there’s much more to spoken language

Photo #WOCinTechChat

Since the launch of Alexa, Siri, and Google Assistant, we’re all becoming much more used to talking to our devices. Beyond these virtual assistants, voice technology and conversational AI have increased in popularity over the last decade and are used in many applications.

One use of Natural Language Processing (NLP) technology is to analyse and gain insight from the written transcripts of audio, whether from voice assistants or from other scenarios like meetings, interviews, call centres, lectures or TV shows. Yet when we speak, things are more complicated than a simple text transcription suggests. …

Using Python libraries to visualise audio

Photo credit: #WOCinTechChat

There are several Python libraries available that make it very easy to view waveforms in different ways. In this post, I’ll go through some of the ways to get started.

To begin, import the key libraries:

import matplotlib.pyplot as plt
import numpy as np
import pysptk
from scipy.io import wavfile

For the audio, I used Audacity to record the short phrase “What’s today’s weather?” as a wave file. I’ll use scipy to read in that wave file, and matplotlib to visualise it.

fs, x = wavfile.read("recording.wav")  # placeholder filename; the original path is not shown
y = np.linspace(0,len(x)/float(fs), len(x))
ya = np.max(np.absolute(x))
plt.plot(y, x, color="#004225")
plt.xlabel("Time (seconds)")
plt.ylim(-ya, ya)
plt.xlim(0, y[-1])…
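The snippet above is cut short, so here is a self-contained sketch of the same idea that runs without a recording. It is an illustration under stated assumptions rather than the article's own code: `make_tone`, `time_axis` and `plot_waveform` are hypothetical helper names, and a synthesised sine tone stands in for the recorded phrase. The time axis uses `np.arange(len(x)) / fs`, which places sample n at exactly n / fs seconds.

```python
import numpy as np


def make_tone(fs=16000, duration=0.5, freq=440.0):
    """Synthesise a sine tone as 16-bit samples (a stand-in for a real recording)."""
    t = np.arange(int(fs * duration)) / float(fs)
    return (0.5 * 32767 * np.sin(2 * np.pi * freq * t)).astype(np.int16)


def time_axis(x, fs):
    """Return the time in seconds of each sample: sample n occurs at n / fs."""
    return np.arange(len(x)) / float(fs)


def plot_waveform(x, fs, path="waveform.png"):
    """Plot amplitude against time and save the figure to `path`."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    t = time_axis(x, fs)
    # Convert to float first so the absolute value cannot overflow int16.
    peak = np.max(np.abs(x.astype(float)))

    fig, ax = plt.subplots()
    ax.plot(t, x, color="#004225")
    ax.set_xlabel("Time (seconds)")
    ax.set_ylim(-peak, peak)  # symmetric limits keep the zero line centred
    ax.set_xlim(0, t[-1])
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_waveform(make_tone(), 16000)` writes the plot to `waveform.png`. To plot a real recording instead, pass the sample rate and samples returned by `scipy.io.wavfile.read`.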

We can’t avoid bringing our own perspective to the products we build

Writing computer code at a desk
Writing computer code — Image by Free-Photos from Pixabay

The majority of folks who build technology don’t intend to be biased. Yet we all have our own unique perspective on the world, and we can’t help but bring that into our work. We make decisions based on our views and our experiences. Those decisions may each seem small in isolation, but they accumulate. And, as a result, technology often reflects the views of those who build it.

Here are a few of the places I’ve seen where bias creeps into technology.

The datasets we construct

With the recent success of machine learning (ML) and AI algorithms, data is becoming increasingly important. ML algorithms…

During my time working at Amazon Alexa, I designed and ran an Alexa Skills workshop to encourage teenagers to take up coding. The workshop can be adapted to all levels and ages (our youngest participant was 12), and all you need is a computer with a browser and an internet connection. When you finish, you’ll have an Alexa skill that can tell you facts about famous folk along with some of their quotes.

NB: A more in-depth instruction booklet is available here, from a collaboration with Feminist Internet, which taught undergraduate students to build a skill about famous feminists.



Companies are investing millions of pounds to develop Artificial Intelligence (AI) technology. Many people use that AI technology daily to make their lives easier. But search Google for images of “Artificial Intelligence”, and you’ll be faced with a sea of glowing, bright blue, connected brains. The imagery used to illustrate AI is a far cry from the more mundane reality of how AI looks to its users, where it powers services like navigation, voice assistants and face recognition. The disconnect makes it hard to grasp the reality of AI.

Artificial Intelligence (AI) is a fast-growing field, and the 2018 AI Index report illustrates just how fast. It reports that published research papers in AI have increased 7x since 1996, university enrolment on AI courses has increased 5x since 2012, investment in AI startups in the US has increased 113% since 2012, and mentions of AI and machine learning (ML) in the earnings calls of tech companies have increased more than 100x since 2012. These statistics show that AI is growing not just in academia: the technology is also being rapidly adopted by businesses and commercialised.

In the past few years, machine learning (ML) has become commercially successful and AI has become firmly established as a field. With that success, more attention is being paid to the gender gap in AI. Compared to the general population, men are overrepresented in technology. While this has been the case for several decades, the opposite was true in the early days of computing, when programming was considered a woman’s job.

Diversity has been shown to lead to good business outcomes like improved revenue. …

Catherine Breslin

Machine Learning scientist & consultant :: voice and language tech :: powered by coffee ::
