The SageMaker Saga

Many data scientists develop, train, and deploy ML models within a hosted environment. Unfortunately, such environments often lack the ability to scale resources up or down as their models require.

This is where AWS SageMaker comes into the picture! It solves the problem by letting developers build and train models and get them into production faster, with minimal effort and at low cost.

But first... what is AWS, you ask?

Amazon Web Services (AWS) is a widely adopted, comprehensive on-demand cloud platform from, you guessed it... Amazon, offering over 200 fully featured services from data centres around the world. AWS services can be used to build, monitor, and deploy almost any type of application in the cloud, helping millions of people and businesses, including fast-growing start-ups, leading government agencies, and large enterprises, to lower costs, innovate faster, and become more agile. With a massive global infrastructure and the flexibility to choose only the services you need, AWS lets you iterate and experiment quickly and focus on innovation rather than infrastructure. It is language and OS agnostic, and it provides a secure, scalable, reliable, and low-cost infrastructure platform that powers hundreds of thousands of businesses in over 190 countries. Today AWS has a large and dynamic community of customers and partners from virtually every industry and of every size.

Welcome, AWS SageMaker

Launched in 2017, Amazon SageMaker is a fully managed, cloud-based machine learning platform. It decouples your development, training, and deployment environments, so you can scale each one separately while optimising your spend and time. SageMaker includes modules that data scientists and developers can use together or independently to build, train, and deploy ML models at any scale. It aims to make machine learning accessible to everyday developers and scientists, even those with little prior ML experience. Developers across the world are adopting SageMaker in various ways: some for the end-to-end flow, others just to scale up training jobs.

Why AWS SageMaker: The Advantages

AWS SageMaker comes with a pool of advantages, some of which are listed below:

It improves the productivity of a machine learning project
It helps create and manage compute instances in the least amount of time
It reduces the cost of building machine learning models by up to 70%
It automatically creates, trains, and deploys models with complete visibility by inspecting raw data
It reduces the time required for data labelling tasks
It stores all machine learning components in one place
It trains models faster and is highly scalable
It maintains uptime: processes keep running without interruption
It maintains high data security

A big umbrella over all the ML services, SageMaker tries to provide one single place for all your machine learning and data science workflows. It covers every step involved, from provisioning cloud resources and importing data, through cleaning and labelling the data (including manual labelling) and training models, to automating and deploying models in production.

AWS SageMaker Demo in 10 minutes

Looking for a quick start on the SageMaker console? Check out this video on YouTube:

Sagemaker in 11 minutes by Anuj Syal

Exploring the Full Potential: SageMaker’s Features and Capabilities

Source: aws.amazon.com/sagemaker

Prepare

Even if you don't have a labelled dataset, AWS SageMaker lets you bring in human annotators, for example through Amazon Mechanical Turk, to label your dataset correctly. The service for this is Amazon SageMaker Ground Truth, a fully managed data labelling service that helps you build the right training dataset. You can start labelling your data in minutes through the SageMaker Ground Truth console, using custom or built-in data labelling workflows.

Build

AWS SageMaker makes it easy to build ML models and get them ready for training. It provides what you need to connect quickly to your training data and helps you select and optimise the best algorithm and framework for your application. Amazon SageMaker includes hosted Jupyter notebooks that make it easy to explore and visualise your training data stored on Amazon S3. You can either connect directly to data in S3, or use AWS Glue to move data from Amazon DynamoDB, Amazon RDS, and Amazon Redshift into S3 for analysis in your notebook. To simplify algorithm selection, AWS SageMaker includes the 10 most frequently used ML algorithms, pre-installed and optimised, which according to AWS deliver up to 10 times the performance you would find running these algorithms anywhere else. SageMaker also comes pre-configured to run Apache MXNet and TensorFlow, two of the most widely used open-source frameworks. You also have the option of using your own framework.

Train

The next essential feature of AWS SageMaker is model training. Training primarily involves an algorithm, and the choice of algorithm depends on several factors; to get you started faster, AWS SageMaker provides built-in algorithms as well. The other key requirement for training a machine learning model is compute resources: the size of the training dataset and the desired speed of results determine how much you need. The next important step, and a notable point in any Amazon ML vs. SageMaker comparison, is evaluation. Once training completes, you evaluate the model to test the accuracy of its inferences. You can send inference requests to the model using the AWS SDK for Python (Boto) or the high-level Python library in SageMaker, and a Jupyter notebook is a convenient place to run both training and evaluation.
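To make the ingredients of a training job concrete (an algorithm, input data, and compute resources), here is a minimal sketch that only assembles the request as a plain dictionary. The field names follow the arguments of the `create_training_job` call in the AWS SDK for Python as I understand them; the job name, image URI, role ARN, bucket paths, and instance type are placeholder assumptions, and nothing here contacts AWS.

```python
import json


def build_training_job_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge",
                               instance_count=1, max_runtime_s=3600):
    """Assemble a dict shaped like the arguments of create_training_job."""
    return {
        "TrainingJobName": job_name,
        # The algorithm: a container image holding the training code.
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # The data: one input channel read from an S3 prefix.
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # The compute: sized from dataset size and how fast results are needed.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }


# All values below are hypothetical placeholders, not real resources.
request = build_training_job_request(
    job_name="demo-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-algo:latest",
    role_arn="arn:aws:iam::123456789012:role/DemoSageMakerRole",
    train_s3_uri="s3://my-demo-bucket/train/",
    output_s3_uri="s3://my-demo-bucket/output/",
)
print(json.dumps(request, indent=2))

# With boto3 installed and credentials configured, the request would be
# submitted roughly like this (not executed here):
#   boto3.client("sagemaker").create_training_job(**request)
```

Scaling a job up or down then comes down to changing `instance_type` and `instance_count` in the request, without touching the algorithm or the data.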

Deploy

Once your model is trained and tuned, AWS SageMaker makes it easy to deploy it in production so you can start generating predictions on new data (a process called inference). To deliver both high performance and high availability, SageMaker deploys your model on an auto-scaling cluster of Amazon EC2 instances spread across multiple availability zones. AWS SageMaker also comes with built-in A/B testing capabilities to help you test your model and experiment with different versions to achieve the best results. In short, AWS SageMaker takes away the heavy lifting of ML, so you can build, train, and deploy machine learning models easily and efficiently.
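As a rough illustration of the A/B testing idea, the sketch below builds an endpoint configuration with two model versions and works out what share of traffic each would receive. The field names follow the `create_endpoint_config` call in the AWS SDK for Python as I understand them, where traffic is split in proportion to each variant's weight; the config name, model names, instance type, and weights are hypothetical, and nothing here contacts AWS.

```python
def build_endpoint_config(config_name, variants, instance_type="ml.m5.large"):
    """variants: list of (variant_name, model_name, weight) tuples."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
                "InitialVariantWeight": weight,
            }
            for variant_name, model_name, weight in variants
        ],
    }


def traffic_shares(config):
    """Fraction of requests per variant: its weight over the sum of weights."""
    variants = config["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total
            for v in variants}


# Hypothetical setup: the current model keeps most of the traffic while
# a new candidate version receives a small slice for comparison.
config = build_endpoint_config(
    "churn-ab-test",
    [("model-a", "churn-v1", 9.0), ("model-b", "churn-v2", 1.0)],
)
shares = traffic_shares(config)
print(shares)  # {'model-a': 0.9, 'model-b': 0.1}
```

Starting the candidate on a small weight limits the impact if it performs worse; the weights can be shifted once the live results look good.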

Validating a Model with SageMaker

You have several options for evaluating your model, using either historical or live data:

Offline Testing: Historical data is used to send requests to the model through a Jupyter notebook in Amazon SageMaker for evaluation.

Online Testing with Live Data: Multiple models are deployed behind an Amazon SageMaker endpoint, which directs a share of live traffic to each model for validation.

Validating Using a "Holdout Set": A part of the data, called a "holdout set", is set aside. The model is trained on the remaining input data and then evaluated on the holdout set, which shows how well it generalises to data it has not seen.

K-fold Validation: The input data is split into k parts of roughly equal size, called folds. In each of k rounds, one fold serves as the validation data for testing the model, and the remaining k−1 folds are used as training data. The results of the k rounds are then averaged to give the final evaluation.
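The two data-splitting strategies can be sketched in a few lines of plain Python. This is only an illustration of the idea on a toy list of ten rows: the function names are made up for this example, and in practice you would more likely reach for a library helper than write your own.

```python
import random


def holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle the rows and set a fraction aside as the holdout set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    # Train on everything except the holdout rows; evaluate on the holdout.
    return shuffled[n_holdout:], shuffled[:n_holdout]


def k_fold_splits(rows, k=5):
    """Yield (training, validation) pairs, one per fold."""
    folds = [rows[i::k] for i in range(k)]  # k roughly equal parts
    for i in range(k):
        validation = folds[i]
        # The other k-1 folds together form the training data.
        training = [row for j, fold in enumerate(folds) if j != i
                    for row in fold]
        yield training, validation


data = list(range(10))

train, holdout = holdout_split(data)
print(len(train), len(holdout))  # 8 2

for training, validation in k_fold_splits(data, k=5):
    # Each round trains on 8 rows and validates on the remaining 2.
    print(len(training), len(validation))
```

With k-fold validation every row is used for validation exactly once, which makes the averaged score less dependent on one lucky or unlucky split than a single holdout set.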

Sneak Peek: AWS SageMaker Studio and Architectural View

SageMaker Studio is a fully integrated development environment for machine learning, where building, training, and deploying models can all be done under one roof.

Amazon SageMaker Notebooks: Used for easily creating and sharing Jupyter notebooks.
Amazon SageMaker Experiments: Used for tracking, organizing, comparing, and evaluating different ML experiments.
Amazon SageMaker Debugger: As the name suggests, it is used for debugging and analysing complex training issues, and it sends alert notifications when errors occur.
Amazon SageMaker Model Monitor: This is used to detect quality deviations for deployed ML models.
Amazon SageMaker Autopilot: It is used to build ML models automatically with full visibility and control.

Final Words: Conclusion

Machine learning is the future of application development, and AWS SageMaker is all set to revolutionise the world of computing. The sheer productivity of machine learning applications will create new prospects for the adoption of ML services such as SageMaker.

Anuj Syal's Blog