AWS SageMaker for absolute beginners

Making Machine Learning model building & deployment in live production a seamless experience for both Software Engineers and Data Scientists


You might be familiar with Amazon Web Services (AWS), one of the most popular cloud services from Amazon, which provides cloud computing infrastructure in the form of modular, scalable building blocks. Compute instances, storage, databases, network transport layers, security, and traffic monitoring are all building blocks at your disposal that can be combined to create and deploy virtually any type of application in the cloud. The scalable nature of AWS cloud instances makes it easy to run applications and services in a live production setting. Machine Learning and Artificial Intelligence applications are a natural next step for making good use of AWS. Luckily, Amazon addresses this use case directly: AWS SageMaker comes configured with the relevant AWS resources so that it streamlines the build, train, test, and deploy process for ML & AI.

This post is a gentle introduction to AWS SageMaker, organized into the following sections, with no prior AWS training or certification required.

1. Introduction to SageMaker & Issues to solve

2. Features of AWS SageMaker

3. Case Study: Building a Movie Recommendation Service in a live production setting

4. Pros and Cons of AWS SageMaker


So what is the use case for Amazon SageMaker and what issues can it help us to address?

As you probably know, a machine learning workflow today involves many complex steps: data preparation, data cleaning, feature engineering, choosing and building the model, setting up the learning environment, training, tuning and debugging the model, managing model versions, deploying the model, monitoring its performance, validating the results, and eventually scaling for the production environment. Developers (Software Engineers and Data Scientists) typically stitch together a collection of different tools to cover this workflow, and it is not easy to get a seamless development experience from experimentation to live production.


What if all of these steps could be streamlined and managed in a single service, a one-stop shop for everything you need to take an ML model from data experimentation to live production? This is exactly what Amazon SageMaker has to offer: a simpler, faster, and well-integrated platform that maximizes efficiency for model building and deployment.

Sounds like black magic, huh? Not convinced yet? Let’s take a deeper dive into the features Amazon SageMaker has to offer and how we can best use them in our development.

Amazon SageMaker Studio (Main IDE Platform)


Amazon SageMaker Studio is a web-based IDE for building, training, and deploying models. It is the go-to place that houses all the services and capabilities mentioned above. These services, or building blocks, are designed to work with each other across the entire workflow and help you build machine learning applications that are sophisticated and highly scalable. Even better, the services are highly modular, so each building block can easily be swapped out for your own API at any time.


And of course, within the AWS framework, collaboration across teams becomes easy thanks to AWS Identity and Access Management (IAM). Studio also provides many built-in functionalities for quick data experimentation and baseline model building. It doubles as a dashboard, giving you full visibility into your data and models without writing a single line of code, just clicks and drags. This greatly increases productivity for model building and deployment.

Amazon SageMaker Ground Truth (Data Preparation)


With Amazon S3 integrated into SageMaker, your training, validation, and testing data, indeed your entire data lake, can be stored in the cloud alongside your models, making them readily available to the cloud training instances at all times. Even better, the SageMaker Ground Truth service provides annotation capabilities that include automatic labeling through its own machine learning models and a well-integrated workflow for human labelers to turn raw data into annotated data. This tool gives us scalable and cost-effective data preparation and input-data quality assurance.


In addition, a batch-processing framework is built into SageMaker to make distributed processing of data across clusters easy. Amazon SageMaker also provides an Apache Spark library, in both Python and Scala, that you can use to train models in Amazon SageMaker from org.apache.spark.sql.DataFrame data frames in your Spark clusters. The choice of containers is also flexible: you can either use SageMaker’s containers or bring your own.
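
For instance, a minimal PySpark sketch of this kind of distributed pre-processing might look like the following; the bucket paths, column names, and the exact S3 URI scheme (s3:// vs. s3a://) are assumptions that depend on your cluster setup, not details from this post.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sagemaker-preprocessing").getOrCreate()

# Read the raw event logs from S3 into a distributed DataFrame.
events = spark.read.csv("s3://my-datalake/raw/events/", header=True, inferSchema=True)

# Keep only rating events and the columns the recommender needs.
ratings = (
    events
    .filter(events.event_type == "rate")
    .select("user_id", "movie_id", "rating")
)

# Write the cleaned dataset back to S3 as Parquet for the training job to consume.
ratings.write.mode("overwrite").parquet("s3://my-datalake/processed/ratings/")
```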

Amazon SageMaker Notebook Instances (Build & Train)


With SageMaker, you get access to a range of fully managed elastic instances, from ml.t2.medium up to ml.p3.16xlarge, including GPU and FPGA instances. Depending on the computation needs, we can dynamically scale the resources up or down to match our requirements. A notebook instance, specifically, is a compute instance that runs a Python-based Jupyter Notebook. With all the libraries and dependencies pre-configured, the notebook instance is always ready to go. Amazon SageMaker manages the creation of the instance and its related resources, so developers can focus on preparing and processing data, writing code to train models, deploying models to Amazon SageMaker hosting, and testing or validating the models, all without leaving SageMaker Studio.
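
As a rough sketch of what launching a managed training job from a notebook can look like with the SageMaker Python SDK (the execution-role ARN, the train.py script, the S3 path, and the framework_version value are all placeholders you would replace with your own):

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical execution role

# SageMaker provisions the instance, runs train.py on it, saves the model to S3,
# and tears the instance down when the job finishes.
estimator = SKLearn(
    entry_point="train.py",        # hypothetical training script in the notebook's directory
    role=role,
    instance_type="ml.m5.xlarge",  # scale up or down to match the workload
    instance_count=1,
    framework_version="1.2-1",     # assumed; pick a version available in your region
    sagemaker_session=session,
)

# The channel name "train" maps to /opt/ml/input/data/train inside the container.
estimator.fit({"train": "s3://my-datalake/processed/ratings/"})
```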

Amazon SageMaker Marketplace for ML (Model options)

When it comes to actually training a model, the first thing we can do is shop around in SageMaker’s “Marketplace for ML”. You might find a built-in model that solves your learning objective exactly; if it does, you are just a few clicks away from deploying it. If the built-in algorithms do not solve the problem, there is a wide variety of built-in frameworks to choose from, and these still provide a good baseline even if you end up building your own model.


In this step, Amazon also provides a service called SageMaker Autopilot that interprets the data automatically and tries out an entire bank of built-in models to find the best-performing one. This feature allows developers to build and deploy models without writing a single line of code.
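
For reference, a hedged sketch of kicking off an Autopilot job through the SageMaker Python SDK’s AutoML class might look like this; the role ARN, S3 paths, and target column are hypothetical, and the exact constructor arguments may differ between SDK versions:

```python
from sagemaker.automl.automl import AutoML

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical execution role

# Autopilot explores preprocessing + algorithm combinations and ranks the candidates.
automl = AutoML(
    role=role,
    target_attribute_name="rating",                    # the column Autopilot should learn to predict
    output_path="s3://my-datalake/autopilot-output/",  # hypothetical output location
    max_candidates=50,                                 # cap the number of candidate pipelines
)

automl.fit(inputs="s3://my-datalake/processed/ratings/ratings.csv", wait=False)
```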

Amazon SageMaker Automatic Model Tuning (Tuning and Tracking)

SageMaker provides additional features that let us organize, track, and compare training experiments and fine-tune hyperparameters automatically. We can easily track parameters and metrics at scale across different experiments and visualize them for comparison. It also automates hyperparameter tuning, which eliminates tedious manual work and turns model tuning into a fast, fully orchestrated iteration. During training, we can also watch for problems such as exploding gradients or a loss that stops decreasing.


This gives us good insight into model quality early in training, as opposed to waiting a long time for the model to finish training only to realize it is not ideal.
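
To make the automatic tuning concrete, here is a sketch using the SageMaker Python SDK’s HyperparameterTuner; the estimator is the one from the earlier training sketch, and the hyperparameter names, metric name, and regex are illustrative assumptions that must match what your training script actually emits:

```python
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# Search space for the hyperparameters we want SageMaker to explore
# (names are illustrative and must match what train.py reads).
hyperparameter_ranges = {
    "n_factors": IntegerParameter(20, 200),
    "lr_all": ContinuousParameter(0.001, 0.02),
}

tuner = HyperparameterTuner(
    estimator=estimator,                      # the estimator from the earlier training sketch
    objective_metric_name="validation:rmse",
    objective_type="Minimize",
    metric_definitions=[
        # The regex must match a line the training script prints, e.g. "validation-rmse: 0.91".
        {"Name": "validation:rmse", "Regex": "validation-rmse: ([0-9\\.]+)"}
    ],
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=2,
)

tuner.fit({"train": "s3://my-datalake/processed/ratings/"})
```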

Amazon SageMaker CloudWatch (Deploy & Manage)

Deploying the model is fairly straightforward on SageMaker. Once you are done building your model, Amazon SageMaker can provide a plain HTTPS real-time endpoint for it: grab the model from S3, post data to the endpoint, and you get predictions back. Alternatively, you can run predictions on data stored in S3 and read the results back from S3. SageMaker is flexible here; you can also take the model from S3 and deploy it anywhere you want at this point.
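
In SDK terms, deployment can be as short as the sketch below; the instance type and the payload format are assumptions that depend on your model’s serving code:

```python
# Stand the trained model up behind a managed HTTPS endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)

# Real-time inference: send a payload, get a prediction back.
# The (user_id, movie_id) format here is an assumption about the serving code.
result = predictor.predict([[42, 318]])
print(result)

# Delete the endpoint when done so it stops accruing charges.
predictor.delete_endpoint()
```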


But to take advantage of SageMaker’s model monitoring features, we can connect our endpoint to SageMaker and let it do the rest of the heavy lifting: automatically collecting data from the endpoint, monitoring model performance against a baseline, measuring it against a list of built-in rules, visualizing the data, and everything else you need to maintain the model, all surfaced in AWS CloudWatch.
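
A minimal sketch of wiring this up with the SDK’s Model Monitor classes is shown below, assuming hypothetical S3 paths and the same execution role as before; the capture config would be passed as data_capture_config when deploying the endpoint:

```python
from sagemaker.model_monitor import DataCaptureConfig, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical execution role

# Capture a sample of the requests/responses hitting the endpoint;
# pass this object as data_capture_config= when calling deploy().
data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=20,
    destination_s3_uri="s3://my-datalake/monitoring/captured/",
)

# Compute baseline statistics and constraints that live traffic will be compared against.
monitor = DefaultModelMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

monitor.suggest_baseline(
    baseline_dataset="s3://my-datalake/processed/ratings/baseline.csv",  # hypothetical baseline file
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-datalake/monitoring/baseline/",
)
```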

With all these salient features that Amazon SageMaker provides, let’s take a deeper dive with a case study where we use it to build a movie recommendation service.


Put yourself in the shoes of a software engineer at a video-streaming platform whose goal is to build a Netflix-like streaming business with a massive catalog of movies. Say the service has one million users and about 27,000 movies. We have access to a real-time event stream (from Apache Kafka) of server logs from the streaming site. The stream of logs includes which user watched which movie, along with ratings for those movies. Can we build and deploy a recommendation system using Amazon SageMaker?



With that in mind, let’s dive in and have Amazon SageMaker perform the rest of the heavy lifting. When you first start SageMaker Studio, you are presented with three options: Build & Train, Deploy and Monitor, and Build Models Automatically.


In “Build Models Automatically”, the Autopilot feature, a bank of models is tried out for you automatically on your designated dataset. By just specifying the input and output data locations, the target data attribute, and the learning paradigm, the experiment starts and finds the best model by trying everything in the bank.


Once configured, the experiment starts figuring out the best candidate model for the data by going through the following steps.


Isn’t that nice? Learning was done with a few clicks, but Autopilot has limitations. It is suitable for simple classification, logistic regression, and linear regression jobs, but as you can see, the number of knobs in the Autopilot settings is limited: it is not flexible in what we can ask it to learn, and it gives us little insight into which features actually drive the learning results. In particular, if we are building a recommendation system, we need our model to be interpretable so a human can develop intuition about it easily. Relying on Autopilot too blindly can therefore produce undesired results.


Amazon SageMaker Build & Train

And of course, we also want full control from time to time. For example, while consuming the event logs, we may also query external databases for more insightful metadata about the movies and users and perform additional feature engineering and data analysis before learning. With our own data-collection pipeline up and running, all the collected data lands in the data lake stored in our Amazon S3 bucket, ready for learning. So let’s start by creating a Notebook Instance on Amazon SageMaker:


There is a wide variety of pre-loaded kernels to choose from. Amazon SageMaker notebook instances come with multiple environments already installed, containing Jupyter kernels and Python packages such as scikit-learn, pandas, NumPy, TensorFlow, and MXNet. It is great to have notebooks that are plug-and-play and ready to go with all the libraries pre-configured on these kernels. And if a library you need is not pre-configured, installing the missing dependency is pretty straightforward as well.
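
For example, a notebook cell along the following lines pulls in the one library we need that is not pre-installed and loads the ratings data; the S3 path and column layout are assumptions, and reading s3:// paths directly with pandas requires the s3fs package to be available:

```python
# Install the one library we need that is not pre-installed on the kernel.
!pip install scikit-surprise

import pandas as pd

# Load the ratings collected by the event pipeline (path and columns are hypothetical).
ratings = pd.read_csv("s3://my-datalake/processed/ratings/ratings.csv")
ratings.head()
```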


The following is the collaborative-filtering model we built using the Surprise library (scikit-surprise), a scikit-style package for recommender systems.
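
A representative sketch of such a model, using Surprise’s SVD matrix-factorization algorithm, is shown below; the column names and hyperparameter values are illustrative rather than the exact ones used in the original notebook:

```python
from surprise import SVD, Dataset, Reader
from surprise.model_selection import cross_validate

# Ratings are on a 1-5 scale; Surprise expects the columns in (user, item, rating) order.
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[["user_id", "movie_id", "rating"]], reader)

# Matrix-factorization model (SVD), a standard collaborative-filtering baseline.
algo = SVD(n_factors=100, lr_all=0.005, reg_all=0.02)

# 5-fold cross-validation on RMSE/MAE gives a quick estimate of recommendation quality.
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)

# Fit on the full dataset and predict a single (user, movie) rating.
trainset = data.build_full_trainset()
algo.fit(trainset)
print(algo.predict(uid=42, iid=318).est)
```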

Amazon SageMaker Model Deployment

To deploy your trained model, you need to create a Docker container image and push it to Amazon ECR (Elastic Container Registry), which makes it easy for DevOps engineers to store, manage, and deploy container images in the production workflow. After the model is created and linked to its ECR image, deploying it to an endpoint is only a few clicks away.
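
As a sketch, registering and deploying a custom-container model through the SDK can look like the following; the ECR image URI, model artifact path, and role ARN are placeholders:

```python
from sagemaker.model import Model

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical execution role

# The image must already be pushed to ECR and implement the "serve" contract SageMaker expects.
model = Model(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/movie-recommender:latest",  # hypothetical
    model_data="s3://my-datalake/models/recommender/model.tar.gz",                      # trained artifact in S3
    role=role,
)

# Create the endpoint from the registered model.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```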


And there you have it: a quick overview of an end-to-end machine learning application, from data collection to model deployment, on Amazon SageMaker.

With this case study behind us, let’s wrap up by summarizing the strengths and limitations of Amazon SageMaker.

Advantages

  • Good cloud infrastructure support with AWS.
  • Amazon SageMaker has an expansive set of built-in models and algorithms ready for use, and developers can train them on elastic instances with compute resources configured to their needs.
  • With the Autopilot experimentation tool, you will be pleasantly surprised by the learning results from time to time, and you might not need to train a model yourself at all.
  • A Jupyter notebook instance is conveniently built in for data collection, feature engineering, visualization, and training and testing the model.
  • Although a lot of functionality is bootstrapped into SageMaker’s service, it also provides great extensibility and flexibility for you to make your own adaptations.
  • With the help of Docker, you can customize training and inference with frameworks other than those provided by SageMaker. We can use our own Docker image to customize the environment, the training phase, the API service, or other frameworks, as long as we follow the “train” and “serve” requirements (which can be a limitation, as we will see later).
  • We have an amazingly well-integrated cloud storage service, S3, for storing data lakes and models. It makes SageMaker a one-stop shop for the entire machine-learning live-production workflow.
  • Monitoring the model is easy, as SageMaker surfaces model stats through CloudWatch.

Limitations

  • Every minute spent developing, building, training, and debugging on Amazon SageMaker costs money as long as the instances are running. If you are a slow programmer, you probably will not get the most bang for your buck out of it.
  • It is hard to maintain good source-control practices on Amazon SageMaker. Git is awkward to use on a notebook instance, so if you are working on a team, it can be difficult to branch off work and keep the source-control discipline of a traditional software development team.
  • You have limited root access on Amazon SageMaker instances. Imagine opening a terminal window and not being able to locate files or run many of your usual commands. How weird it is to be stranded that way 😉
  • There is no way around deploying your model without a Docker container. You don’t need to create your own Docker image if the model is one of the algorithms from the AWS Marketplace examples, but if you are building your own model on a third-party library, it takes a lot of work to set up your own Dockerfile and link it to AWS ECR. And to set things up properly, you should not even train inside the Jupyter notebook instance, but rather point the job at a training instance so that all training jobs can be managed more easily. This is fine if you have a lot of models to train, but the setup overhead is fairly high if you only have a few.
  • I didn’t find another way to change the entry point with an Estimator the way we can with the built-in models. The built-ins let you write a single bundled script that both trains and serves the model, but with our own Docker image this becomes two scripts (train and serve) that have to be baked into the image, and we didn’t find a way to make them tweakable from Jupyter. That means each image is built for a certain kind of environment, and you can only try to tweak it through hyperparameters.
