    A Guide to Machine Learning Model Development and Production

    on November 1, 2021

    Machine learning is a subset of artificial intelligence (AI) that uses algorithms to learn from trends, data sets and certain behaviors. This process involves the development of machine learning models that can answer questions, predict future outcomes and solve organizational problems.

    What Are Machine Learning Models?

    A machine learning model is an algorithm that represents the patterns in a data set. While it does have similarities to a software app, those similarities quickly fade when you look at how an ML model is made and how it performs. An app is made by a team of software developers, who write instructions on what the app should do when a user interacts with it. Once the software is ready to be released, it will perform exactly as it was designed.

    Models, on the other hand, are designed to operate on their own, with no interaction from the user once they start running. A model is entirely dependent on the quality and structure of the data it receives: if the data changes, the model may no longer work as expected. The model contains an algorithm that was trained using an initial set of data, and it represents the relationship between that data and an outcome. You can then use the model to predict a probable outcome when you have a new set of data.

    As an example, an ML model can be trained to analyze data about how people like or dislike a series of photos. It can then be provided with a new set of photos to predict how people will react to each of them. As another example, an ML model can be provided with data on stock market trends and, when it’s exposed to current stock data, make predictions on whether share prices will go up or down. 

    Why Are Machine Learning Models Used?

    Machine learning models are used to analyze data more efficiently than a human can. There are typically two scenarios where an ML model is used: when the analysis process is tedious or when a large volume of data makes it difficult for a human to identify patterns.

    The first case usually arises when you have to perform the same analysis repeatedly using different data, like a large retailer determining the prices of new products or a credit card company screening new applications. An ML model can help to automate the decision-making process.

    In the second case, ML models are useful when you simply have too much data for a human to analyze efficiently without some help, like weather forecasting or predicting hospital bed usage rates during an epidemic. Of course, each industry has its own use cases for machine learning models, some of which include:

    • Financial Services: Models are used to identify insights in data that can reveal investment opportunities, identify borrowers with high-risk profiles, and recognize and prevent fraud.
    • Health and Life Sciences: ML models are used to analyze data to identify trends and to improve diagnosis and treatment.
    • Retail: Retailers use ML to recommend items based on previous purchases, to specify what ads should be placed in front of specific groups of prospects and to predict how customers will react to ads.

    How Does Modeling Work?

    Before you can create a machine learning model, you will need to precisely define the problem you want it to solve and acquire the data you need to solve that problem. If your problem statement, for example, is to predict a company’s sales for the next quarter, you will need data on previous sales. With this information in hand, you can determine which algorithm is most likely to answer your question. 

    In machine learning, there are essentially two families of techniques you can use to analyze the data. If the answer you are looking for is already recorded in the data, as a known outcome for each past example, you can use supervised learning, which learns how the dependent variable (the outcome) relates to the independent variables (the inputs). If the answers are not in your data, then you will likely want to use an unsupervised learning technique, which looks for structure in the data without relying on known outcomes.

    So, if you want to recommend additional items for a shopper to add to an online shopping cart, you would likely use unsupervised learning. If you want to determine if a loan applicant is likely to default on their payments, a supervised learning technique would be the better choice. Some problems can be approached with algorithms based on either learning technique, such as identifying identity theft from financial transactions or determining what the housing market will look like next year.
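
    The difference between the two families can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loan and shopping-basket data are invented, and `predict_default` and `recommend` are hypothetical helper names. The supervised half copies the known outcome of the most similar past applicant (a 1-nearest-neighbor rule), while the unsupervised half has no outcomes at all and simply counts which items appear together.

```python
from collections import Counter

# Supervised: every training row carries a known outcome (the label).
# Hypothetical loan data: (income in $1000s, debt ratio) -> defaulted?
labeled_loans = [
    ((30, 0.9), True),
    ((35, 0.8), True),
    ((80, 0.2), False),
    ((95, 0.3), False),
]

def predict_default(applicant):
    """1-nearest-neighbor: copy the label of the most similar past applicant."""
    def distance(row):
        features, _ = row
        # Divide income by 100 so both features contribute on a similar scale.
        return ((features[0] - applicant[0]) / 100) ** 2 + (features[1] - applicant[1]) ** 2
    _, label = min(labeled_loans, key=distance)
    return label

# Unsupervised: no labels, only observed baskets; structure is inferred.
baskets = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"stove", "fuel"},
    {"tent", "stove", "fuel"},
]

def recommend(item):
    """Suggest the item that most often appears alongside `item`."""
    together = Counter()
    for basket in baskets:
        if item in basket:
            together.update(basket - {item})
    return together.most_common(1)[0][0] if together else None
```

    Calling `predict_default((32, 0.85))` looks up the closest labeled applicant, whereas `recommend("fuel")` relies only on co-occurrence counts.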

    Machine Learning Algorithms

    Choosing the right algorithm to use for your model is seldom an easy choice. In fact, data science teams will usually try several algorithms before deciding which is the best candidate to be tested. While the number of available algorithms in any given library is huge, some of the most common types include:

    Regression

    There are two primary forms of regression: linear and logistic. Linear regression models a linear relationship between a dependent variable and one or more independent variables, and it answers questions like “how much?” or “how many?” Logistic regression estimates the probability that an observation belongs to a category, so it answers questions like “yes or no?”
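
    As a rough sketch of both forms, the code below fits a straight line by ordinary least squares and defines the logistic (sigmoid) function, which is what turns a linear score into a probability in logistic regression. The ad-spend figures and the names `fit_line` and `sigmoid` are assumptions made for illustration; in practice a library would handle the fitting.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sigmoid(z):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: ad spend (in $1000s) versus units sold.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

slope, intercept = fit_line(ad_spend, units_sold)
forecast = slope * 6 + intercept  # "how many?" for a spend of 6
```

    A logistic model would pass a similar linear score through `sigmoid` and treat the result as the probability of a "yes" outcome.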

    Cluster Analysis

    This is an example of an unsupervised learning technique that arranges data in groups or clusters. It looks for similarities between data points so that similar points are grouped together, while each group is as dissimilar to the others as possible.
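
    One widely used clustering method is k-means. The sketch below is a bare-bones version for 2-D points, written under the assumption that the caller supplies the initial centroids; the function name `kmeans` and the sample points are illustrative only.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points.

    `centroids` is the caller's initial guess for the cluster centers.
    """
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, groups = kmeans(points, centroids=[(0, 0), (10, 10)])
```

    With these well-separated sample points, the three points near the origin end up in one group and the three near (8, 8) in the other.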

    Decision Trees

    This type of algorithm uses a tree representation of possible decisions and their outcomes, which an organization can follow branch by branch to determine the best path forward for solving certain problems.
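
    To make the tree representation concrete, here is a small hand-built example. The features, thresholds and outcomes are invented for illustration, and `decide` is a hypothetical helper; a real decision tree algorithm would learn the splits from training data rather than have them written by hand.

```python
# A hand-built tree for illustration; a training algorithm would normally
# learn the features and thresholds from data.
tree = {
    "feature": "credit_score", "threshold": 650,
    "below": {"leaf": "decline"},
    "at_or_above": {
        "feature": "debt_ratio", "threshold": 0.4,
        "below": {"leaf": "approve"},
        "at_or_above": {"leaf": "manual review"},
    },
}

def decide(node, applicant):
    """Walk from the root to a leaf, following one branch per question."""
    while "leaf" not in node:
        branch = "below" if applicant[node["feature"]] < node["threshold"] else "at_or_above"
        node = node[branch]
    return node["leaf"]
```

    Each internal node asks one question about the applicant, and each leaf holds a final decision.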

     

    The Machine Learning Model Process

    Because machine learning is a branch of data science, it should be no surprise that developing a machine learning model follows the data science lifecycle: manage, develop, deploy and monitor.

    Model Management

    This is where the need for a new ML model is determined, and its problem statement is defined. In most cases, model management includes not just members of the data science team but also management and members of the organization’s business unit. 

    Here, roles for each subsequent step are assigned, and the specific steps required are determined. These include, for example, who will develop the model, who will deploy it, and what the model’s key performance indicators will be when it’s ready to be deployed in a user environment. 

    Model-driven organizations always revisit the model management phase to evaluate the successes and failures of the process, to determine additional use cases for a successful model and to improve their process for the next project.

    Model Development

    During this stage, the data requirements are determined. If the data is not readily available, it must be acquired. Then, the data set is cleaned by removing unnecessary and duplicate variables and ensuring that tags, names and labels are all standardized.

    Then, the available data is split into training and testing data. A method should be determined for testing each algorithm and hyperparameter in a systematic and measurable way, such as a cross-validation technique. 
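
    A minimal sketch of the split and of k-fold cross-validation, using only the Python standard library, might look like the following. The function names, the 80/20 split and the fixed seed are assumptions chosen for the example, not a prescribed setup.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out the last fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def k_fold(rows, k=5):
    """Yield (training, validation) pairs; each row is validated exactly once."""
    for i in range(k):
        validation = rows[i::k]
        training = [row for j, row in enumerate(rows) if j % k != i]
        yield training, validation

data = list(range(20))           # stand-in for 20 cleaned records
train, test = train_test_split(data)
folds = list(k_fold(train, k=4))
```

    Every candidate algorithm and hyperparameter setting can then be scored on the same folds, which keeps the comparison systematic, while the held-out test rows stay untouched until the final check.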

    With the problem statement defined and the data cleaned, the model can now be developed. The data science team will select algorithms to try and define the hyperparameters the model must use, like what the regression penalty should be in the face of an overabundance of variables. 

    Once each model has been trained and assessed, the best performing algorithms are then tested on a fresh data set. The results of the tests will determine which model should be used or if additional modifications to the models are needed. 
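
    The tune-then-test loop described above can be sketched with a deliberately tiny model: one-feature ridge regression, where the penalty is the hyperparameter. All numbers and helper names below are hypothetical. Each candidate penalty is scored on validation data, and only the winner is checked against the fresh test set.

```python
def fit_ridge(xs, ys, penalty):
    """One-feature ridge regression through the origin: y = w * x.

    The penalty shrinks w toward zero; penalty=0 is ordinary least squares.
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def mean_squared_error(w, xs, ys):
    """Average squared gap between predictions w * x and observed y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_penalty(train, validation, candidates):
    """Try each candidate penalty and keep the one with the lowest validation error."""
    scored = []
    for penalty in candidates:
        w = fit_ridge(*train, penalty)
        scored.append((mean_squared_error(w, *validation), penalty, w))
    return min(scored)

# Hypothetical (xs, ys) pairs for each stage.
train = ([1, 2, 3], [2.2, 3.9, 6.1])
validation = ([4, 5], [8, 10])
fresh_test = ([6], [12])

validation_error, best_penalty, best_w = select_penalty(train, validation, [0.0, 0.1, 1.0])
test_error = mean_squared_error(best_w, *fresh_test)
```

    If `test_error` turned out much worse than `validation_error`, that would be a signal to revisit the candidates rather than deploy.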

    Model Deployment

    At this stage, the data scientists can hand off the model to engineers and developers who will deploy the model in its working environment. The engineers deploy the model as an API endpoint, or develop an app so that the end-user can access it in a user-friendly way.
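
    As one hedged illustration of an API endpoint, the sketch below wraps a stand-in model in a small HTTP service using Python's built-in `http.server` module. The `/predict` path, the `ad_spend` field and the made-up coefficients are all assumptions; a production deployment would typically add authentication, logging and a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in for a trained model; the coefficients here are made up."""
    return 2.0 * features["ad_spend"] + 10.0

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length))
            body = json.dumps({"prediction": predict(features)}).encode()
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or a missing/invalid field.
            self.send_error(400, "expected a JSON object with ad_spend")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Blocks until interrupted; call this from a script, not on import."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

    After calling `serve()`, a client could POST a JSON body such as `{"ad_spend": 3}` to `/predict` on the local machine and receive the prediction as JSON.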

    Only a well-trained and successfully tested model should ever go to deployment. A data science team may need to work with dozens, or hundreds, of different models and parameters to get one that will work reliably. 

    Model Monitoring

    Like any growing child learning to interact with the world around it, all machine learning models need to be monitored after they have been deployed. Each time the model receives new data, it will make predictions accordingly. However, if the incoming data changes, the model may produce inaccurate or biased outputs, because it was not trained to handle the new data. When this happens, the model needs to be updated or replaced with a newly trained version.
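
    A very simple way to watch for changing input data is to compare incoming values against the training data. The sketch below measures how far the mean of a feature has shifted, in units of the training standard deviation. The function names, the sample values and the threshold of 2.0 are assumptions; real monitoring systems generally use more robust statistical tests.

```python
import statistics

def drift_score(training_values, incoming_values):
    """How many training standard deviations the incoming mean has shifted."""
    baseline_mean = statistics.mean(training_values)
    baseline_std = statistics.stdev(training_values)
    return abs(statistics.mean(incoming_values) - baseline_mean) / baseline_std

def needs_review(training_values, incoming_values, threshold=2.0):
    """Flag the model for retraining review when the shift exceeds the threshold."""
    return drift_score(training_values, incoming_values) > threshold

# Hypothetical feature values seen during training and in production.
training_ages = [48, 50, 52, 49, 51]
stable_batch = [50, 51, 49]
shifted_batch = [60, 62, 61]
```

    Here `needs_review(training_ages, stable_batch)` stays quiet, while `needs_review(training_ages, shifted_batch)` raises the flag that the model may need to be retrained.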

    Organizations rolling out ML models at scale use automated systems, like Domino Model Ops, to monitor all of their models simultaneously. An integral part of Domino’s Enterprise MLOps Platform, Domino Model Ops can be scheduled to run automated tests and to alert the data science team immediately should a problem arise.

    Machine Learning Models in Action

    Today, machine learning models surround us. Business leaders use them to improve profitability, manage resources and order inventory. A growing number of vehicles use them to assist driving and make traveling safer. Many are behind the scenes of our online interactions, nearly invisible, helping us make shopping decisions and enjoy better videos, or helping to diagnose our health problems and prescribe treatments.

    For the model-driven companies using a systematic process rooted in data science procedures, costs continue to go down while ROI increases at scale. In February 2021, Lockheed Martin announced that it had realized over $20 million in annual value by scaling its AI/ML solutions using the Domino Data Science Platform. This is by no means an anomaly or outlier. In fact, a recent Forrester study has determined that companies can realize a 542% return on investment by using Domino’s Data Science Platform. 

     

    Model Development With Domino Data Lab

    Today, you no longer need to staff your data science team with PhDs to develop sustainable and profitable ML models. However, you do need the knowledge, tools, libraries and resources that are fundamental to building ML models at scale in an enterprise environment. 

    To find out why a growing number of Fortune 500 companies are adopting Domino’s Data Science Platform, watch a demo of it in action. You can also start exploring its features for yourself with a free trial.

     

    David Weedmark is a published author who has worked as a project manager, software developer and as a network security consultant.
