Machine learning: Accelerating your model deployment

10 February, 2021

Mark McQuade


Machine learning challenges

One of the most difficult aspects of machine learning is operationalizing the models you develop so that they generate accurate insights quickly enough to serve your business needs. You've probably encountered some of the most prominent hurdles:

Inefficient coordination in lifecycle management between operations teams and ML engineers. According to Gartner, 60% of models don’t make it to production due to this disconnect.
A high degree of model sprawl, a complex situation in which multiple models run simultaneously across different environments, with different datasets and hyperparameters. Keeping track of all these models and their associated artifacts can be challenging.
Models may be developed quickly, but deploying them can take months, which delays time to value. Many organizations lack defined frameworks for data preparation, model training, deployment and monitoring, as well as strong governance and security controls.
The DevOps model for application development doesn't translate directly to ML models. Its standardized, linear approach breaks down because models must be retrained with fresh datasets throughout their lifecycle as data ages and becomes less representative.

The ML model lifecycle is fairly complex. It starts with data ingestion, transformation and validation, so that the data fits the needs of the initiative. A model is then developed and validated, followed by training. Depending on how long development takes, you may need to repeat training as the model moves across development, testing and deployment environments. After training, the model is put into production, where it begins serving business objectives. During this stage, the model's performance is logged and monitored to ensure it remains fit for purpose.
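To make the cycle concrete, here is a minimal Python sketch of the lifecycle as a small state machine. The stage names, the next_stage and run_lifecycle functions, and the 0.8 metric threshold are illustrative assumptions, not part of any product: a production model whose monitored metric falls below the threshold is sent back to ingestion so it can be retrained on fresh data.

```python
from enum import Enum, auto


class Stage(Enum):
    INGEST = auto()   # data ingestion, transformation and validation
    DEVELOP = auto()  # model development and validation
    TRAIN = auto()    # training (may be repeated per environment)
    DEPLOY = auto()   # model put into production
    MONITOR = auto()  # performance logged and monitored


# Forward path through the lifecycle (illustrative).
NEXT = {
    Stage.INGEST: Stage.DEVELOP,
    Stage.DEVELOP: Stage.TRAIN,
    Stage.TRAIN: Stage.DEPLOY,
    Stage.DEPLOY: Stage.MONITOR,
}


def next_stage(stage, metric=None, threshold=0.8):
    """Return the stage after `stage`.

    In MONITOR, a metric below `threshold` sends the model back to INGEST
    for retraining on fresh data; otherwise it stays in MONITOR.
    """
    if stage is Stage.MONITOR:
        if metric is not None and metric < threshold:
            return Stage.INGEST
        return Stage.MONITOR
    return NEXT[stage]


def run_lifecycle(monitored_metrics, threshold=0.8):
    """Walk the lifecycle, consuming one monitored metric per MONITOR step."""
    pending = list(monitored_metrics)
    stage = Stage.INGEST
    history = []
    while True:
        history.append(stage)
        if stage is Stage.MONITOR:
            if not pending:
                return history
            stage = next_stage(stage, pending.pop(0), threshold)
        else:
            stage = next_stage(stage)
```

For example, run_lifecycle([0.9, 0.7]) keeps the model in production after the healthy 0.9 reading, then walks through a full retraining cycle after the degraded 0.7 reading.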

Rapidly build models with Amazon SageMaker

Among the tools available to help you accelerate this process is Amazon SageMaker. This ML platform from Amazon Web Services (AWS) offers a comprehensive set of capabilities for rapidly developing, training and running your ML models in the cloud or at the edge. The Amazon SageMaker stack comes packaged with models for AI services such as computer vision, speech and recommendation engines, as well as ML services that help you deploy deep learning models. It also supports leading ML frameworks, interfaces and infrastructure options.

But employing the right toolsets is only half the story. Significant improvements in ML model deployment come only when you also improve the efficiency of lifecycle management across the teams that work on those models. Different teams prefer different tools and frameworks, which can introduce lag throughout a model's lifecycle. An open, modular solution that is agnostic to platform, tooling and ML framework can be easily tailored and integrated into proven AWS solutions, while letting your teams keep using the tools they are comfortable with.

That's where the Rackspace Technology Model Factory Framework comes in: it provides a CI/CD pipeline for your models that makes them easier to deploy and track.

Let’s take a closer look at exactly how it improves efficiency and speed across model development, deployment, monitoring and governance, to accelerate getting ML models into production.

End-to-end ML blueprint

During development, ML models are handed off from data science teams to operations teams. As noted earlier, without standardization, differences in the tools and frameworks each team prefers can introduce significant lag.

The Rackspace Technology Model Factory Framework provides a model lifecycle management solution in the form of a modular architectural pattern, built using open source tools that are platform, tooling and framework agnostic. It is designed to improve the collaboration between your data scientists and operations teams so they can rapidly develop models, automate packaging and deploy to multiple environments.

The framework allows integration with AWS services and industry-standard automation tools such as Jenkins, Airflow and Kubeflow. It supports a variety of frameworks such as TensorFlow, scikit-learn, Spark ML, spaCy, and PyTorch, and it can be deployed into different hosting platforms such as Kubernetes or Amazon SageMaker.

Benefits of the Rackspace Technology Model Factory Framework

The Rackspace Technology Model Factory Framework delivers large gains in efficiency, cutting the ML lifecycle from an average of 15 or more steps to as few as five. By using a single source of truth for management, it also automates handoffs between teams and simplifies maintenance and troubleshooting.

For data scientists, the Model Factory Framework makes code standardized and reproducible across environments, and it enables tracking of experiments and training runs. It can also cut compute costs by up to 60% through scripted use of spot instances for training. For operations teams, the framework offers built-in tools for diagnostics, performance monitoring and model drift mitigation, along with a model registry that tracks model versions over time. Overall, this helps your organization shorten model deployment time and reduce effort, accelerating time to business insights and ROI.
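The article does not describe how the framework's model registry is implemented. As a rough illustration of what "tracking model versions over time" means, here is a minimal in-memory sketch in Python; the ModelRegistry and ModelVersion classes, their methods, and the example S3-style URIs are hypothetical, and a production registry would persist its entries rather than keep them in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    artifact_uri: str
    metrics: dict
    created_at: datetime


class ModelRegistry:
    """Hypothetical in-memory model registry; a real one would persist entries."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion, oldest first

    def register(self, name, artifact_uri, metrics):
        """Record a new version of `name` and return it; versions start at 1."""
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name=name,
            version=len(versions) + 1,
            artifact_uri=artifact_uri,
            metrics=dict(metrics),
            created_at=datetime.now(timezone.utc),
        )
        versions.append(entry)
        return entry

    def get(self, name, version=None):
        """Return a specific version, or the latest one if `version` is None."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def history(self, name):
        """Return every registered version of `name`, oldest first."""
        return list(self._versions.get(name, []))
```

Registering "churn" twice, with artifacts at s3://example-bucket/churn/v1 and s3://example-bucket/churn/v2, yields versions 1 and 2; get("churn") returns the latest, while get("churn", 1) still retrieves the earlier one for comparison or rollback.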

Solution overview – from development and deployment, to monitoring and governance

The Model Factory Framework uses a curated set of notebook templates and proprietary domain-specific languages to simplify onboarding, reproducibility across environments, experiment tracking, hyperparameter tuning, and consistent, domain-agnostic packaging of models and code.

Once a model is packaged, the framework can execute the end-to-end pipeline, which runs the preprocessing, feature engineering and training jobs, logs the generated metrics and artifacts, and deploys the model to multiple environments.
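The framework's own pipeline definitions are proprietary, so the following is only a toy Python sketch of the same flow: preprocess, engineer features, train, log metrics and artifacts, then deploy one model to several environments. The step functions, the "a"/"b"/"ratio" columns, and the deploy callback are all hypothetical placeholders.

```python
from typing import Callable, Dict, List


def preprocess(rows: List[dict]) -> List[dict]:
    # Drop records that contain any missing value.
    return [r for r in rows if None not in r.values()]


def engineer_features(rows: List[dict]) -> List[dict]:
    # Add a derived "ratio" feature (assumes column "b" is never zero).
    return [{**r, "ratio": r["a"] / r["b"]} for r in rows]


def train(rows: List[dict]):
    # Toy "model": the mean of the derived feature, plus a training metric.
    mean_ratio = sum(r["ratio"] for r in rows) / len(rows)
    return {"mean_ratio": mean_ratio}, {"n_train": len(rows)}


def run_pipeline(raw_rows: List[dict], environments: List[str],
                 deploy: Callable[[str, dict], str]) -> Dict[str, object]:
    """Run preprocessing, feature engineering and training, log metrics and
    artifacts, then deploy the same model to each environment in order."""
    run_log = {"metrics": {}, "artifacts": {}, "deployments": []}
    rows = preprocess(raw_rows)
    run_log["metrics"]["n_after_preprocess"] = len(rows)
    model, train_metrics = train(engineer_features(rows))
    run_log["metrics"].update(train_metrics)
    run_log["artifacts"]["model"] = model
    for env in environments:
        run_log["deployments"].append(deploy(env, model))
    return run_log
```

Calling run_pipeline(raw, ["dev", "test", "prod"], deploy=lambda env, model: f"{env}:ok") runs every step once and promotes the identical model artifact through each environment, which is the property that keeps deployments reproducible.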

Development: The Model Factory Framework supports multiple avenues of development. Users can develop locally, use integrated development environments (IDEs) connected to a notebook server, or use SageMaker notebooks. They can also automate environment deployment with AWS tooling such as AWS CodeStar.
Deployment: The same model code supports multiple platform backends, so models can be deployed to Amazon SageMaker, Amazon EMR, Amazon ECS and Amazon EKS. Revision histories, including artifacts and notebooks, are tracked, and real-time, batch and streaming inference pipelines are supported.
Monitoring: Model requests and responses are logged and monitored for detailed analysis, which makes it possible to detect and address model and data drift.
Governance: Data and model artifacts are clearly separated, and access to feature stores, models and associated pipeline artifacts can be controlled with AWS IAM and bucket policies. The framework also supports rule-based access control through Amazon Cognito, traceability with Data Version Control (DVC), and auditing and accounting through extensive tagging.
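The Monitoring item does not say which drift statistic the framework applies. As one common option, here is a sketch of the Population Stability Index (PSI), which compares the distribution of a feature in a baseline sample (for example, training data) with the same feature in values logged from production requests. The psi and drifted functions, the 10 quantile bins, and the 0.2 alert threshold (a widely used rule of thumb) are assumptions for illustration; the framework may use a different method entirely.

```python
import math
from bisect import bisect_right


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample (e.g. training
    data) and a live sample (e.g. values logged from production requests)."""
    ordered = sorted(expected)
    # Interior bin edges at the baseline's quantiles, so each baseline bin
    # holds roughly the same share of values.
    edges = [ordered[len(ordered) * i // bins] for i in range(1, bins)]

    def shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        return [c / len(values) for c in counts]

    e_shares, a_shares = shares(expected), shares(actual)
    # Each term is non-negative; eps avoids log(0) for empty bins.
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(e_shares, a_shares))


def drifted(expected, actual, threshold=0.2):
    """Flag drift when PSI exceeds `threshold` (0.2 is a common rule of thumb)."""
    return psi(expected, actual) > threshold
```

Identical samples give a PSI of zero, while a live sample concentrated in one region of the baseline's range scores well above 0.2, which could be used to trigger investigation or retraining.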

Using a combination of proven accelerators, AWS native tools and the Model Factory Framework, companies can substantially accelerate and automate model development, reducing lag and effort while improving time to insights and ROI.

If your organization is interested in using the Model Factory Framework to simplify and accelerate your ML use cases, visit our AI and ML pages for more information, including customer stories, details of supported platforms and other helpful resources.
