diff --git a/training/readme.md b/training/readme.md
new file mode 100644
index 00000000..bc7dea64
--- /dev/null
+++ b/training/readme.md
@@ -0,0 +1,52 @@
+# Training ML models with Azure ML SDK
+These notebook tutorials cover various scenarios for training machine learning and deep learning models with Azure Machine Learning.
+
+## Sample notebooks
+- [01.train-hyperparameter-tune-deploy-with-pytorch](https://github.com/Azure/MachineLearningNotebooks/tree/master/training/01.train-hyperparameter-tune-deploy-with-pytorch)
+Train, hyperparameter tune, and deploy a PyTorch image classification model that distinguishes bees vs. ants using transfer learning. Azure ML concepts covered:
+    - Create a remote compute target (Batch AI cluster)
+    - Upload training data using `Datastore`
+    - Run a single-node `PyTorch` training job
+    - Tune model hyperparameters with HyperDrive
+    - Find and register the best model
+    - Deploy model to ACI
+- [02.distributed-pytorch-with-horovod](https://github.com/Azure/MachineLearningNotebooks/tree/master/training/02.distributed-pytorch-with-horovod)
+Train a PyTorch model on the MNIST dataset using distributed training with Horovod. Azure ML concepts covered:
+    - Create a remote compute target (Batch AI cluster)
+    - Run a two-node distributed `PyTorch` training job using Horovod
+- [03.train-hyperparameter-tune-deploy-with-tensorflow](https://github.com/Azure/MachineLearningNotebooks/tree/master/training/03.train-hyperparameter-tune-deploy-with-tensorflow)
+Train, hyperparameter tune, and deploy a TensorFlow model on the MNIST dataset. Azure ML concepts covered:
+    - Create a remote compute target (Batch AI cluster)
+    - Upload training data using `Datastore`
+    - Run a single-node `TensorFlow` training job
+    - Leverage features of the `Run` object
+    - Download the trained model
+    - Tune model hyperparameters with HyperDrive
+    - Find and register the best model
+    - Deploy model to ACI
+- [04.distributed-tensorflow-with-horovod](https://github.com/Azure/MachineLearningNotebooks/tree/master/training/04.distributed-tensorflow-with-horovod)
+Train a TensorFlow word2vec model using distributed training with Horovod. Azure ML concepts covered:
+    - Create a remote compute target (Batch AI cluster)
+    - Upload training data using `Datastore`
+    - Run a two-node distributed `TensorFlow` training job using Horovod
+- [05.distributed-tensorflow-with-parameter-server](https://github.com/Azure/MachineLearningNotebooks/tree/master/training/05.distributed-tensorflow-with-parameter-server)
+Train a TensorFlow model on the MNIST dataset using native distributed TensorFlow (parameter server). Azure ML concepts covered:
+    - Create a remote compute target (Batch AI cluster)
+    - Run a distributed `TensorFlow` training job with two workers and one parameter server
+- [06.distributed-cntk-with-custom-docker](https://github.com/Azure/MachineLearningNotebooks/tree/master/training/06.distributed-cntk-with-custom-docker)
+Train a CNTK model on the MNIST dataset using the Azure ML base `Estimator` with a custom Docker image and distributed training. Azure ML concepts covered:
+    - Create a remote compute target (Batch AI cluster)
+    - Upload training data using `Datastore`
+    - Run a base `Estimator` training job using a custom Docker image from Docker Hub
+    - Run a two-node distributed CNTK training job via MPI using the base `Estimator`
+
+- [07.tensorboard](https://github.com/Azure/MachineLearningNotebooks/tree/master/training/07.tensorboard)
+Train a TensorFlow MNIST model locally, on a DSVM, and on Batch AI, and view the logs live in TensorBoard. Azure ML concepts covered:
+    - Run the training job locally with Azure ML and run TensorBoard locally. Start (and stop) an Azure ML `Tensorboard` object to stream and view the logs
+    - Run the training job on a remote DSVM and stream the logs to TensorBoard
+    - Run the training job on a remote Batch AI cluster and stream the logs to TensorBoard
+    - Start a `Tensorboard` instance that displays the logs from all three runs above in a single view
+- [08.export-run-history-to-tensorboard](https://github.com/Azure/MachineLearningNotebooks/tree/master/training/08.export-run-history-to-tensorboard)
+    - Start an Azure ML `Experiment` and log metrics to `Run` history
+    - Export the `Run` history logs to TensorBoard logs
+    - View the logs in TensorBoard