What does a Machine Learning Scientist do in industry?

Building models and much more

Catherine Breslin
5 min read · Feb 20, 2022

Machine learning scientist is a role in high demand, and companies that want to build ML technology are having trouble filling all their vacancies. Yet in-depth descriptions of what the job entails are hard to come by. Candidates new to industry don’t always know what they can offer with their ML skills, and businesses don’t know what to expect of their new hires.

As with all new types of role, there’s not a clear definition that everyone will recognise. Job titles vary from place to place, while job adverts are aimed at selling the role to potential candidates and so often focus on the most exciting parts of the job. At the core, a machine learning scientist (sometimes called an ML Engineer or Researcher) builds on machine learning research to create products. As a scientist, they are the expert on how machine learning works and what its strengths, limitations & biases are. This knowledge can be brought to bear in tackling business problems.

But what does an ML Scientist actually do as part of their job, and how can they make an impact at a company?

Problem definition & company strategy

The first place a scientist can have influence is in the very early stages of a product’s life: defining what their company can reasonably build with machine learning, and what problems should be tackled another way. Some problems are well suited to ML and some aren’t. Some ML models might be useful but only if they’re integrated into a user interface that explicitly works around their limitations. Some problems shouldn’t be tackled at all! It’s often not obvious to non-experts what ought to be tackled with ML, and this is where ML Scientists can advise. Their expertise is invaluable in narrowing down options, figuring out what the company should build, identifying how to execute, and from that translating company goals into ML goals.

Data

Once there’s an idea about what the company will build, the first step is usually finding the right data to get started. In academia, data often isn’t a problem because there are publicly available datasets and shared benchmarks. For a commercial problem though, it’s a stroke of luck if a public dataset is both well matched to your task and has a commercial license. Acquiring data, storing it, labelling and cleaning are all necessary steps for any ML scientist before being able to build a model.

In the longer term, data from product users is critical for ongoing monitoring and improvement. Working with other teams to set up the right mechanisms for ongoing collection of data is crucial. ML expertise is key here because the data and models are so tightly coupled.

Modelling

Probably the one part of the job that everyone can agree on is that ML scientists should build and evaluate ML models. This involves trying out different model architectures, tuning them, taking inspiration from published research, and generally experimenting to find the best setup for their company’s specific task.

Evaluation is a key part of building ML models. Standard ML metrics like error rate are one part of evaluation, but are often a few steps removed from any business impact. A particular classifier for customer support may have a very respectable 93% accuracy, but that’s not necessarily impactful if it can only recognise 20% of the things that customers are asking about. Working with other teams to define relevant business metrics can better quantify the impact of machine learning.
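
To make the gap between a model metric and business impact concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the 93% accuracy is measured only on the queries the classifier was built to handle, that those queries make up 20% of what customers ask, and that anything out of scope is never handled correctly. The function name `resolved_fraction` is hypothetical.

```python
def resolved_fraction(accuracy_in_scope: float, coverage: float) -> float:
    """Fraction of all customer queries that end up handled correctly.

    Assumption (for illustration only): queries outside the classifier's
    scope are never handled correctly, so they contribute nothing.
    """
    if not (0.0 <= accuracy_in_scope <= 1.0 and 0.0 <= coverage <= 1.0):
        raise ValueError("accuracy and coverage must be between 0 and 1")
    return accuracy_in_scope * coverage


if __name__ == "__main__":
    # 93% accurate on the 20% of queries the model can recognise.
    overall = resolved_fraction(accuracy_in_scope=0.93, coverage=0.20)
    print(f"Queries handled correctly overall: {overall:.1%}")
```

Under these assumptions fewer than one in five customer queries is handled correctly, which is the kind of number a business metric would surface and a headline accuracy figure would hide.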

Once models are performing well and evaluations show that they can benefit your product, ML scientists should work to deploy them. Unless the company is very small, a software engineering team will usually maintain the actual runtime systems which use the deployed ML models.

Keep on top of research

Productionising ML research requires ML scientists to keep one eye on the research world and relevant publications. Scientists can identify trends in research, learn about new models which are likely to benefit the company, and know about new datasets & tools. This knowledge lets ML scientists have an informed view about how the company’s technology should evolve over time.

Project planning

With knowledge about how to build ML models, scientists have input into planning. Estimates for how long things will take, how likely they are to succeed, the risks, and what impact they’ll have are really important inputs for making company-wide decisions. It’s unlikely that other folks within the company will have the same in-depth knowledge about how to effectively plan and execute ML work, and so it’s crucial that ML scientists contribute here.

Software engineering

ML science requires a good level of software engineering skill. In many companies, software engineering takes more of scientists’ time than anticipated.

First, scientists must work with software engineering teams to ensure that their models are effectively integrated into the company’s products and properly monitored. Success here often requires in-depth discussions and good technical knowledge of how all the teams operate. Sometimes compromises have to be made, and ML scientists should be comfortable with understanding those compromises from both the ML and the software engineering perspective.

Second, ML scientists have to maintain their own infrastructure for building and evaluating their models & pipelines. As teams get large and company products mature, this often becomes a substantial piece of work. Ensuring that experiments and model building are reproducible, automated as much as possible, and able to be built on by others in the team requires a solid software foundation for all but the simplest of ML models.
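
As one small illustration of what a reproducible experiment can look like, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for the example: the names `config_fingerprint` and `run_experiment`, the config keys, and the toy "training" step, which just averages seeded random numbers in place of a real model.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Short, stable hash of a config that does not depend on key order."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_experiment(config: dict) -> dict:
    """Toy stand-in for training: the score depends only on the config."""
    # A local, seeded RNG avoids hidden global state between runs.
    rng = random.Random(config["seed"])
    n = config["n_trials"]
    score = sum(rng.random() for _ in range(n)) / n
    # Store the config next to the result so others can rerun it exactly.
    return {"config_id": config_fingerprint(config), "config": config, "score": score}


if __name__ == "__main__":
    cfg = {"seed": 13, "n_trials": 100, "model": "baseline"}
    first = run_experiment(cfg)
    second = run_experiment(cfg)
    print(first["config_id"], first["score"] == second["score"])
```

The point of the sketch is the habit rather than the code: every result is tied to the exact configuration and seed that produced it, so a teammate can rerun the experiment and build on it.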

Maintenance

ML products are no different from other software products in the number of bugs reported after a product is launched! Bug fixing and tracking down issues with deployed models can take time from ML scientists, who are usually the ones best able to understand and debug issues with the models they’ve built.

Communication

Machine learning and AI have been around a while, but their widespread adoption is relatively new. Many people in non-ML roles don’t yet intuitively understand the nuances of machine learning. For this reason, ML scientists often have to act as evangelists for their team, translating their work to be widely understandable, and carefully explaining its impact.

On top of this, ML scientists have to document and communicate their work in detail so that other ML scientists can build off their findings. That might be outside the company, as research papers, or internal documentation for others on their team. Wherever it gets documented, it can take time for scientists to understand a set of experiments, pull out their insight and craft a narrative.

This post covers a broad range of things that ML scientists do and where they can bring value to their companies. It’s worth noting, though, that not all of these are necessarily done by all ML scientists. The bigger the team & company, and the more mature the product, the more specialised the people on the ML team can be.

Let me know how this ties in with your experience of ML and if you think there’s anything missing!

Catherine Breslin

Machine Learning scientist & consultant :: voice and language tech :: powered by coffee :: www.catherinebreslin.co.uk