Predicting Shipping Cost: A Comparison Between Azure AutoML and Jupyter Notebook Developed Code

Introduction

Azure AutoML, part of the Azure Machine Learning service, has become a powerful tool because it empowers data scientists and developers to build, deploy, and manage high-quality models faster and with confidence. AzureML accelerates time to value with industry-leading MLOps (machine learning operations), open-source interoperability, and integrated tools, and it aims to innovate on a secure, trusted platform designed for responsible machine learning (ML).

How Does It Work?

Based on Microsoft's definition, Azure AutoML "empowers professional and nonprofessional data scientists to build machine learning models rapidly. It automates time-consuming and iterative tasks of machine learning model development using breakthrough research and accelerates time to market" [link].

This ML tool has a wide range of capabilities, and the major ones are listed below.

AutoML solutions can target various stages of the machine-learning process. These are the different steps that can be automated [link]:


During training, Azure Machine Learning creates a number of pipelines in parallel that try different algorithms and parameters for you. The service iterates through ML algorithms paired with feature selections, where each iteration produces a model with a training score. The better the score for the metric you want to optimize for, the better the model is considered to "fit" your data. It will stop once it hits the exit criteria defined in the experiment.
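Conceptually, this automated search resembles the hand-rolled loop sketched below, which scores a few candidate regressors and keeps the best one. It is a simplified illustration, not Azure's actual implementation, and it uses placeholder data and models:

from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Placeholder data standing in for the shipping dataset.
X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)

candidates = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Score each candidate; AutoML automates this loop, plus featurization and hyperparameter sweeps.
scores = {
    name: cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(f"Best model: {best_name} (MAE = {-scores[best_name]:.2f})")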

Any Need for Data Science Jobs in the Future?

Altogether, AzureML improves productivity within the studio, the development experience that supports all ML tasks for building, training, and deploying models. It integrates with Jupyter Notebooks and offers built-in support for popular open-source frameworks and libraries. One of the major factors is how quickly it creates accurate models with automated ML, using feature engineering and hyperparameter-sweeping capabilities. AzureML also provides a debugger, profiler, and model explanations to improve performance as you train, and deep Visual Studio Code integration lets you go from local to cloud training seamlessly and autoscale with powerful cloud-based CPU and GPU clusters [link].
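For example, an autoscaling compute cluster can be provisioned from Python with the Azure ML SDK (v1). This is a minimal sketch; the workspace config file, cluster name, and VM size are assumptions:

from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()  # assumes a config.json downloaded from the Azure ML workspace

# Autoscaling CPU cluster: scales down to zero nodes when idle, up to four nodes under load.
compute_config = AmlCompute.provisioning_configuration(
    vm_size="STANDARD_DS3_V2",
    min_nodes=0,
    max_nodes=4,
)
cluster = ComputeTarget.create(ws, "cpu-cluster", compute_config)
cluster.wait_for_completion(show_output=True)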

Experiment: Defining A Supply Chain Shipping Price Prediction

It can be difficult to navigate the logistics when it comes to buying art. The challenges include, but are not limited to, the following:

Though many companies have made shipping consumer goods a relatively quick and painless procedure, the same rules do not always apply when shipping paintings or transporting antiques and collectibles.


Data Description

Set Up the Scene in Azure AutoML

We import the above dataset into Azure AutoML to solve a regression problem. At the same time, we develop our own notebook to solve the same problem. In AutoML we set the primary metric to normalized mean absolute error; however, all metrics are available in the model results upon completion.
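As a sketch of the data-loading step, the same CSV can be read into the notebook with pandas and registered as a tabular dataset for the AutoML run. The file path and dataset name below are assumptions:

import pandas as pd
from azureml.core import Workspace, Dataset

# Notebook side: load the file directly with pandas (hypothetical file name).
df = pd.read_csv("shipping_data.csv")
print(df.shape, df.columns.tolist())

# AutoML side: upload the same file to the workspace's default datastore and register it.
ws = Workspace.from_config()
datastore = ws.get_default_datastore()
datastore.upload_files(["shipping_data.csv"], target_path="shipping/", overwrite=True)
tab_ds = Dataset.Tabular.from_delimited_files(path=(datastore, "shipping/shipping_data.csv"))
tab_ds.register(workspace=ws, name="shipping_cost_train")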

We restrict the allowed models to XGBoostRegressor, LightGBM, and RandomForest, since these models have performed well on other price-prediction problems.

The training time was capped at two hours, matching the runtime of the Jupyter notebook we ran. However, AutoML could perform better if given more time, and the hand-developed notebook could likewise take longer through a grid search of the parameters.

Validation is set to auto, although we could choose between k-fold and Monte Carlo cross-validation (here the choice is made by AutoML).
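As a rough sketch, the same configuration can be expressed with the Azure ML Python SDK (v1). The registered dataset, target column, and compute cluster names below are assumptions rather than values from the original experiment:

from azureml.core import Workspace, Experiment, Dataset
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()
train_ds = Dataset.get_by_name(ws, "shipping_cost_train")  # hypothetical registered dataset

automl_config = AutoMLConfig(
    task="regression",
    primary_metric="normalized_mean_absolute_error",
    training_data=train_ds,
    label_column_name="Shipping Cost",  # assumed target column name
    allowed_models=["XGBoostRegressor", "LightGBM", "RandomForest"],
    experiment_timeout_hours=2,
    compute_target="cpu-cluster",  # hypothetical cluster name
    # No validation settings supplied, so AutoML picks the validation strategy automatically.
)

run = Experiment(ws, "shipping-cost-automl").submit(automl_config)
run.wait_for_completion(show_output=True)
best_run, fitted_model = run.get_output()  # best pipeline and its fitted model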

The residuals and validation plots are shown below, and we can see that the model does not perform very well. However, we should consider that the dataset is not optimized and needs extensive feature engineering to get a good result.
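A residual plot of this kind can also be reproduced in the notebook with a few lines of matplotlib; the arrays below are placeholders standing in for the hold-out targets and the model's predictions:

import numpy as np
import matplotlib.pyplot as plt

# Placeholder values; in the real notebook these come from the test split and the fitted model.
y_true = np.array([12.5, 40.0, 7.3, 95.0, 23.1])
y_pred = np.array([15.0, 35.2, 9.8, 80.4, 25.0])

residuals = y_true - y_pred
plt.scatter(y_pred, residuals, alpha=0.6)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted shipping cost")
plt.ylabel("Residual (actual - predicted)")
plt.title("Residuals vs. predicted values")
plt.show()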

Let's compare the results with a hand-designed Jupyter notebook project and see which one performs comparatively better.
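As a minimal sketch of the notebook side, a scikit-learn grid search over a random forest (one of the three model families above) might look like the following; the file name, column names, and parameter grid are assumptions:

import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical file and column names; replace with the actual shipping dataset.
df = pd.read_csv("shipping_data.csv")
X = pd.get_dummies(df.drop(columns=["Shipping Cost"]))  # simple one-hot encoding
y = df["Shipping Cost"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {
    "n_estimators": [200, 500],
    "max_depth": [None, 10, 20],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X_train, y_train)

y_pred = search.best_estimator_.predict(X_test)
print("Best parameters:", search.best_params_)
print("Test MAE:", mean_absolute_error(y_test, y_pred))

An analogous grid search can be set up for the XGBoost and LightGBM regressors, with the best test metric then compared against the AutoML result.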

Conclusions

By comparing the results, we can see that the hand-developed Jupyter notebook (with human supervision) performs better at modeling, hyperparameter tuning, and pipeline construction. However, as expected, Azure AutoML offers a much simpler path from data ingestion to modeling and pipeline production. The lower performance of Azure AutoML is negligible compared with all the other benefits that Azure services and the studio provide for us and, on a broader scale, for enterprises.

Building and testing a model is fairly simple in Azure AutoML. However, to get the best results, a data scientist, or a team of data scientists, with deep fundamental and statistical knowledge of machine learning should supervise Azure AutoML; combined with other Azure capabilities such as the Designer, pipelines, Azure Cognitive Services (ACS), and Azure Kubernetes Service (AKS), the model produced through Azure can deliver significant benefits and savings for businesses. We should keep in mind that it still requires knowledge of the characteristics of machine learning algorithms, and it certainly will not develop new ones automatically. But it does provide an environment where machine learning can be used effectively without low-level implementation of algorithmic knowledge. So, it can be concluded that AzureML will not end the future of data science; rather, it will improve the working process with its advanced features.