# Training ML models with Azure ML SDK
These notebook tutorials cover the various scenarios for training machine learning and deep learning models with Azure Machine Learning.
## Sample notebooks
- **01.train-hyperparameter-tune-deploy-with-pytorch**
  Train, hyperparameter tune, and deploy a PyTorch image classification model that distinguishes bees vs. ants using transfer learning. Azure ML concepts covered:
  - Create a remote compute target (Batch AI cluster)
  - Upload training data using `Datastore`
  - Run a single-node `PyTorch` training job
  - Hyperparameter tune the model with HyperDrive
  - Find and register the best model
  - Deploy the model to ACI
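The compute, data-upload, and single-node training steps can be sketched with the v1 Azure ML Python SDK of that era. This is a minimal sketch, not the notebook's exact code: the cluster name, VM size, directory and script names are placeholders, and parameter names varied across early SDK releases.

```python
from azureml.core import Experiment, Workspace
from azureml.core.compute import BatchAiCompute, ComputeTarget
from azureml.train.dnn import PyTorch

ws = Workspace.from_config()  # assumes a config.json for your workspace

# Provision a remote Batch AI GPU cluster (name and VM size are placeholders)
compute_config = BatchAiCompute.provisioning_configuration(
    vm_size='STANDARD_NC6', cluster_min_nodes=0, cluster_max_nodes=4)
compute_target = ComputeTarget.create(ws, 'gpucluster', compute_config)
compute_target.wait_for_completion(show_output=True)

# Upload the training data to the workspace's default datastore
ds = ws.get_default_datastore()
ds.upload(src_dir='./data', target_path='hymenoptera_data', overwrite=True)

# Single-node PyTorch training job; the datastore is mounted on the cluster
estimator = PyTorch(source_directory='./pytorch-train',      # placeholder dir
                    compute_target=compute_target,
                    entry_script='train.py',                 # placeholder script
                    script_params={'--data-dir': ds.as_mount()},
                    use_gpu=True)

run = Experiment(ws, 'pytorch-transfer-learning').submit(estimator)
run.wait_for_completion(show_output=True)
```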
- **02.distributed-pytorch-with-horovod**
  Train a PyTorch model on the MNIST dataset using distributed training with Horovod. Azure ML concepts covered:
  - Create a remote compute target (Batch AI cluster)
  - Run a two-node distributed `PyTorch` training job using Horovod
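A hedged sketch of what the distributed submission might look like with the v1 `PyTorch` estimator; `compute_target` and `experiment` are assumed to exist already (as in the first notebook), and paths are placeholders.

```python
from azureml.train.dnn import PyTorch

# Two-node Horovod job: Azure ML launches the script under MPI on each node
estimator = PyTorch(source_directory='./pytorch-distr-hvd',   # placeholder dir
                    compute_target=compute_target,            # existing cluster
                    entry_script='pytorch_horovod_mnist.py',  # placeholder script
                    node_count=2,                 # two-node job
                    process_count_per_node=1,     # one Horovod worker per node
                    distributed_backend='mpi',    # Horovod runs over MPI
                    use_gpu=True)

run = experiment.submit(estimator)
```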
- **03.train-hyperparameter-tune-deploy-with-tensorflow**
  Train, hyperparameter tune, and deploy a TensorFlow model on the MNIST dataset. Azure ML concepts covered:
  - Create a remote compute target (Batch AI cluster)
  - Upload training data using `Datastore`
  - Run a single-node `TensorFlow` training job
  - Leverage features of the `Run` object
  - Download the trained model
  - Hyperparameter tune the model with HyperDrive
  - Find and register the best model
  - Deploy the model to ACI
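The tune-register-deploy portion of this flow can be sketched roughly as follows, assuming a `TensorFlow` estimator and an `experiment` already configured; the metric name, search space, and service name are placeholders, and the HyperDrive/deployment APIs changed names across early SDK versions.

```python
from azureml.train.hyperdrive import (HyperDriveConfig, RandomParameterSampling,
                                      BanditPolicy, PrimaryMetricGoal,
                                      uniform, choice)

# Random search over two hyperparameters (names are placeholders)
sampling = RandomParameterSampling({
    '--learning-rate': uniform(1e-4, 1e-1),
    '--batch-size': choice(32, 64, 128),
})

hd_config = HyperDriveConfig(
    estimator=estimator,                       # the TensorFlow estimator above
    hyperparameter_sampling=sampling,
    policy=BanditPolicy(slack_factor=0.1, evaluation_interval=2),
    primary_metric_name='validation_acc',      # must match a logged metric
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4)

hd_run = experiment.submit(hd_config)
hd_run.wait_for_completion(show_output=True)

# Find and register the best model, then deploy it to ACI
best_run = hd_run.get_best_run_by_primary_metric()
model = best_run.register_model(model_name='tf-mnist',
                                model_path='outputs/model')

from azureml.core.webservice import AciWebservice
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
```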
- **04.distributed-tensorflow-with-horovod**
  Train a TensorFlow word2vec model using distributed training with Horovod. Azure ML concepts covered:
  - Create a remote compute target (Batch AI cluster)
  - Upload training data using `Datastore`
  - Run a two-node distributed `TensorFlow` training job using Horovod
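The same Horovod pattern applies with the v1 `TensorFlow` estimator; a minimal sketch, with `compute_target`, `experiment`, and the paths assumed as placeholders.

```python
from azureml.train.dnn import TensorFlow

estimator = TensorFlow(source_directory='./tf-distr-hvd',       # placeholder dir
                       compute_target=compute_target,
                       entry_script='tf_horovod_word2vec.py',   # placeholder script
                       node_count=2,
                       process_count_per_node=1,
                       distributed_backend='mpi',   # Horovod over MPI
                       use_gpu=True)

run = experiment.submit(estimator)
```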
- **05.distributed-tensorflow-with-parameter-server**
  Train a TensorFlow model on the MNIST dataset using native distributed TensorFlow (parameter server). Azure ML concepts covered:
  - Create a remote compute target (Batch AI cluster)
  - Run a distributed `TensorFlow` training job with two workers and one parameter server
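For native distributed TensorFlow, the v1 estimator exposed worker and parameter-server counts directly; a sketch under the same assumptions as above (existing `compute_target` and `experiment`, placeholder paths, parameter names may differ by SDK version).

```python
from azureml.train.dnn import TensorFlow

# Two workers plus one parameter server, using TensorFlow's native
# parameter-server distribution rather than MPI/Horovod
estimator = TensorFlow(source_directory='./tf-distr-ps',       # placeholder dir
                       compute_target=compute_target,
                       entry_script='tf_mnist_replica.py',     # placeholder script
                       worker_count=2,
                       parameter_server_count=1,
                       distributed_backend='ps',
                       use_gpu=True)

run = experiment.submit(estimator)
```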
- **06.distributed-cntk-with-custom-docker**
  Train a CNTK model on the MNIST dataset using the Azure ML base `Estimator` with a custom Docker image and distributed training. Azure ML concepts covered:
  - Create a remote compute target (Batch AI cluster)
  - Upload training data using `Datastore`
  - Run a base `Estimator` training job using a custom Docker image from Docker Hub
  - Run a two-node distributed CNTK training job via MPI using the base `Estimator`
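For frameworks without a dedicated estimator, the generic base `Estimator` could be pointed at a custom Docker Hub image; a hedged sketch (the image name is a placeholder, and the custom-image parameter was renamed across early SDK releases).

```python
from azureml.train.estimator import Estimator

# Base Estimator with a CNTK-capable custom image from Docker Hub,
# distributed across two nodes via MPI
estimator = Estimator(source_directory='./cntk-distr',        # placeholder dir
                      compute_target=compute_target,          # existing cluster
                      entry_script='cntk_distr_mnist.py',     # placeholder script
                      node_count=2,
                      process_count_per_node=1,
                      distributed_backend='mpi',
                      custom_docker_image='someuser/cntk-gpu:latest',  # placeholder image
                      use_gpu=True)

run = experiment.submit(estimator)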
- **07.tensorboard**
  Train a TensorFlow MNIST model locally, on a DSVM, and on Batch AI, and view the logs live in TensorBoard. Azure ML concepts covered:
  - Run the training job locally with Azure ML and run TensorBoard locally; start (and stop) an Azure ML `Tensorboard` object to stream and view the logs
  - Run the training job on a remote DSVM and stream the logs to TensorBoard
  - Run the training job on a remote Batch AI cluster and stream the logs to TensorBoard
  - Start a `Tensorboard` instance that displays the logs from all three runs above in one view
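Streaming logs works the same way regardless of where the run executes; a minimal sketch using the `azureml-tensorboard` package, assuming one or more submitted `Run` objects.

```python
from azureml.tensorboard import Tensorboard

# Pass one or several runs; their logs are merged into one TensorBoard view
tb = Tensorboard([local_run, dsvm_run, batchai_run])  # placeholder run handles

tb.start()   # streams the runs' logs and prints a local TensorBoard URL
# ... browse the dashboard while the jobs run ...
tb.stop()    # shut the local TensorBoard instance down when done
```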
- **08.export-run-history-to-tensorboard**
  - Start an Azure ML `Experiment` and log metrics to `Run` history
  - Export the `Run` history logs to TensorBoard logs
  - View the logs in TensorBoard
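The export step can be sketched with the `azureml-tensorboard` package; this assumes a completed `run` whose metrics were logged via `run.log(...)`, and the log directory name is a placeholder.

```python
from azureml.tensorboard.export import export_to_tensorboard

# Convert the metrics stored in Run history into TensorBoard event files
logdir = 'exported-tb-logs'          # placeholder output directory
export_to_tensorboard(run, logdir)

# Then point TensorBoard at the exported directory, e.g.:
#   tensorboard --logdir exported-tb-logs
```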