Machine learning: Accelerating your model deployment
Mark McQuade
Business models rely on data to drive decisions and make projections for future growth and performance. Traditionally, business analytics has been reactive — guiding decisions in response to past performance. But today’s leading companies are turning to machine learning (ML) and AI to harness their data for predictive analytics. This shift, however, comes with significant challenges.
According to IDC, almost 30% of AI and ML initiatives fail. The primary culprits are poor-quality data, limited in-house experience and difficult operationalization. ML initiatives also take considerable time to maintain: because data quality degrades over time, models must be retrained repeatedly with fresh data throughout the development cycle.
Let’s explore the challenges presented when developing ML models and how the Rackspace Technology Model Factory Framework simplifies and accelerates the process — so you can overcome these challenges.
Machine learning challenges
Among the most difficult aspects of machine learning is operationalizing developed ML models so that they accurately and rapidly generate insights to serve your business needs. You’ve probably experienced some of the most prominent hurdles, such as:
- Inefficient coordination in lifecycle management between operations teams and ML engineers. According to Gartner, 60% of models don’t make it to production due to this disconnect.
- A high degree of model sprawl, a situation in which multiple models run simultaneously across different environments, with different datasets and hyperparameters. Keeping track of all these models and their associated datasets and hyperparameters can be challenging.
- Models may be developed quickly, but the process of deployment can often take months — limiting time to value. Organizations lack defined frameworks for data preparation, model training, deployment and monitoring, along with strong governance and security controls.
- The DevOps model for application development doesn’t work with ML models. The standardized linear approach is made redundant by the need for retraining across a model lifecycle with fresh datasets, as data ages and becomes less usable.
The ML model lifecycle is fairly complex, starting with data ingestion, transformation and validation so that it fits the needs of the initiative. A model is then developed and validated, followed by training. Depending on the length of development time, you may need to repeatedly perform training as a model moves across development, testing and deployment environments. After training, the model is then set into production, where it begins serving business objectives. Through this stage, the model’s performance is logged and monitored to ensure suitability.
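The stages described above can be sketched as a chain of small functions that pass a shared context along. This is a minimal illustration only, not the Model Factory Framework's implementation: the stage names, the `RunContext` class and the toy least-squares "model" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RunContext:
    """State handed from one lifecycle stage to the next (hypothetical)."""
    rows: List[Dict] = field(default_factory=list)
    model: Optional[Dict] = None
    deployed: bool = False
    metrics: Dict = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def ingest(ctx: RunContext) -> RunContext:
    # Stand-in for a real data source; one row is deliberately incomplete.
    ctx.rows = [
        {"x": 1.0, "y": 2.1},
        {"x": 2.0, "y": 3.9},
        {"x": None, "y": 5.0},
        {"x": 3.0, "y": 6.2},
    ]
    return ctx


def transform(ctx: RunContext) -> RunContext:
    # Drop rows with missing values.
    ctx.rows = [r for r in ctx.rows if r["x"] is not None and r["y"] is not None]
    return ctx


def validate(ctx: RunContext) -> RunContext:
    # Refuse to continue if too little clean data survived.
    if len(ctx.rows) < 2:
        raise ValueError("not enough clean rows to train on")
    return ctx


def train(ctx: RunContext) -> RunContext:
    # Toy model: least-squares fit of y = slope * x (line through the origin).
    num = sum(r["x"] * r["y"] for r in ctx.rows)
    den = sum(r["x"] ** 2 for r in ctx.rows)
    ctx.model = {"slope": num / den}
    return ctx


def deploy(ctx: RunContext) -> RunContext:
    # Stand-in for pushing the model to a serving environment.
    ctx.deployed = True
    return ctx


def monitor(ctx: RunContext) -> RunContext:
    # Log a simple performance metric: mean absolute error on the known rows.
    slope = ctx.model["slope"]
    errors = [abs(r["y"] - slope * r["x"]) for r in ctx.rows]
    ctx.metrics["mean_abs_error"] = sum(errors) / len(errors)
    return ctx


STAGES: List[Callable[[RunContext], RunContext]] = [
    ingest, transform, validate, train, deploy, monitor,
]


def run_pipeline(stages: List[Callable[[RunContext], RunContext]]) -> RunContext:
    """Run each stage in order and record which stages completed."""
    ctx = RunContext()
    for stage in stages:
        ctx = stage(ctx)
        ctx.log.append(stage.__name__)
    return ctx
```

Calling `run_pipeline(STAGES)` returns a context whose `log` lists the completed stages in order. Retraining with fresh data amounts to running the same stage list again, which is why a repeatable, scripted pipeline matters more for ML than a one-time linear release process.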
Rapidly build models with Amazon SageMaker
Among the available tools to help you accelerate this process is Amazon SageMaker. This ML platform from Amazon Web Services (AWS) offers a comprehensive set of capabilities for rapidly developing, training and running your ML models in the cloud or at the edge. The Amazon SageMaker stack comes packaged with models for AI services such as computer vision, speech and recommendation engine capabilities, as well as models for ML services that help you deploy deep learning capabilities. It also supports leading ML frameworks, interfaces and infrastructure options.
But employing the right toolsets is only half the story. Significant improvements in ML model deployment can only be achieved when you also consider improving the efficiency of lifecycle management across the teams that work on them. Different teams across organizations prefer different sets of tooling and frameworks, which can introduce lag through a model lifecycle. An open and modular solution — agnostic of the platform, tooling or ML framework — allows for easy tailoring and integration into proven AWS solutions. A solution such as this will allow your teams to use the tools they are comfortable with.
That’s where the Rackspace Technology Model Factory Framework comes in, by providing a CI/CD pipeline for your models that makes them easier to deploy and track.
Let’s take a closer look at exactly how it improves efficiency and speed across model development, deployment, monitoring and governance, to accelerate getting ML models into production.
End-to-end ML blueprint
During development, ML models flow from data science teams to operations teams. As previously noted, when these teams prefer different tools and frameworks, the lack of standardization can introduce a large amount of lag.
The Rackspace Technology Model Factory Framework provides a model lifecycle management solution in the form of a modular architectural pattern, built using open source tools that are platform, tooling and framework agnostic. It is designed to improve the collaboration between your data scientists and operations teams so they can rapidly develop models, automate packaging and deploy to multiple environments.
The framework allows integration with AWS services and industry-standard automation tools such as Jenkins, Airflow and Kubeflow. It supports a variety of frameworks such as TensorFlow, scikit-learn, Spark ML, spaCy, and PyTorch, and it can be deployed into different hosting platforms such as Kubernetes or Amazon SageMaker.
Benefits of the Rackspace Technology Model Factory Framework
The Rackspace Technology Model Factory Framework affords large gains in efficiency, cutting the ML lifecycle from an average of 15 or more steps to as few as five. Employing a single source of truth for management, it also automates the handoff process across teams and simplifies maintenance and troubleshooting.
For data scientists, the Model Factory Framework makes code standardized and reproducible across environments, and it enables experiment and training tracking. It can also yield compute cost savings of up to 60% through scripted access to spot instance training. For operations teams, the framework offers built-in tools for diagnostics, performance monitoring and model drift mitigation. It also offers a model registry to track model versions over time. Overall, this helps your organization shorten model deployment time and reduce effort, accelerating time to business insights and ROI.
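To make the spot instance savings figure concrete, the arithmetic can be sketched as follows. The helper name and the example durations are hypothetical; the formula follows the way Amazon SageMaker reports managed spot training savings, as one minus the ratio of billable seconds to total training seconds.

```python
def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Percentage saved when only part of the training time is billed.

    Hypothetical helper: savings = (training - billable) / training * 100.
    """
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    if not 0 <= billable_seconds <= training_seconds:
        raise ValueError("billable_seconds must be between 0 and training_seconds")
    return 100 * (training_seconds - billable_seconds) / training_seconds


# Illustrative numbers only: a 1,000-second job billed for 400 seconds.
example = spot_savings_percent(training_seconds=1000, billable_seconds=400)
```

With these illustrative numbers the saving works out to 60%, the upper bound quoted above. Actual savings depend on spot capacity and interruptions, so a real job may land well below that.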
Solution overview – from development and deployment, to monitoring and governance
The Model Factory Framework employs a curated list of Notebook templates and proprietary domain-specific languages, simplifying onboarding, reproduction across environments, tracking experiments, tuning hyperparameters and consistently packaging models and code agnostic to the domain.
Once packaged, the framework can execute the end-to-end pipeline which will run the pre-processing, feature engineering and training jobs, log generated metrics and artifacts, and deploy the model across multiple environments.
- Development: The Model Factory Framework supports multiple avenues of development. Users can either develop locally, integrate with Notebooks Server using Integrated Development Environments (IDEs) or use SageMaker Notebooks. They may even utilize automated environment deployment using AWS tooling such as AWS CodeStar.
- Deployment: Multiple platform backends are supported for the same model code, and models can be deployed to Amazon SageMaker, Amazon EMR, Amazon ECS and Amazon EKS. Revision histories are tracked, including artifacts and notebooks, and real-time, batch and streaming inference pipelines are supported.
- Monitoring: Model requests and responses are monitored for detailed analysis, which makes it possible to address model and data drift.
- Governance: Data and model artifacts are clearly separated and access can be controlled using AWS IAM and bucket policies that control model feature stores, models and associated pipeline artifacts. The framework also supports rule-based access control through Amazon Cognito, traceability with Data Version Control, and auditing and accounting through extensive tagging.
By combining proven accelerators, AWS native tools and the Model Factory Framework, companies can significantly accelerate and automate model development, reducing lag and effort while improving time to insights and ROI.
If your organization is interested in utilizing the Model Factory Framework to simplify and accelerate your ML use cases, visit our AI and ML pages for further info, including customer stories, details of supported platforms and other helpful resources.