version 1.0.17

This commit is contained in:
Roope Astala
2019-02-25 16:12:02 -05:00
parent c082b72b71
commit b7b5576b15
52 changed files with 4538 additions and 3306 deletions

View File

@@ -102,3 +102,5 @@ pip install azureml-sdk[explain]
# install the core SDK and experimental components # install the core SDK and experimental components
pip install azureml-sdk[contrib] pip install azureml-sdk[contrib]
``` ```
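After installing any of these extras, a quick sanity check (a minimal sketch, not part of the original README) is to import the SDK and print its version:

```python
# Minimal post-install check: confirm the active environment sees the SDK
import azureml.core
print("Azure ML SDK version:", azureml.core.VERSION)
```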

View File

@@ -1,9 +1,6 @@
# Azure Machine Learning service example notebooks # Azure Machine Learning service example notebooks
This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.
which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK
allows you the choice of using local or cloud compute resources, while managing
and maintaining the complete data science workflow from the cloud.
![Azure ML workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/overview-what-is-azure-ml/aml.png) ![Azure ML workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/overview-what-is-azure-ml/aml.png)
@@ -18,16 +15,17 @@ You should always run the [Configuration](./configuration.ipynb) notebook first
If you want to... If you want to...
* ...try out and explore Azure ML, start with image classification tutorials [part 1 training](./tutorials/img-classification-part1-training.ipynb) and [part 2 deployment](./tutorials/img-classification-part2-deploy.ipynb). * ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/img-classification-part2-deploy.ipynb).
* ...prepare your data and do automated machine learning, start with regression tutorials: [Part 1 (Data Prep)](./tutorials/regression-part1-data-prep.ipynb) and [Part 2 (Automated ML)](./tutorials/regression-part2-automated-ml.ipynb).
* ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb). * ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb). * ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
* ...deploy model as realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb). * ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
* ...deploy models as batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb). * ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-mpi-batch-prediction.ipynb).
* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb). * ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb).
## Tutorials ## Tutorials
The [Tutorials](./tutorials) folder contains notebooks for the tutorials described in the [Azure Machine Learning documentation](https://aka.ms/aml-docs) The [Tutorials](./tutorials) folder contains notebooks for the tutorials described in the [Azure Machine Learning documentation](https://aka.ms/aml-docs).
## How to use Azure ML ## How to use Azure ML
@@ -45,9 +43,8 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
## Documentation ## Documentation
* Quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/). * Quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
* [Python SDK reference](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py)
* [Python SDK reference]( https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py) * Azure ML Data Prep SDK [overview](https://aka.ms/data-prep-sdk), [Python SDK reference](https://aka.ms/aml-data-prep-apiref), and [tutorials and how-tos](https://aka.ms/aml-data-prep-notebooks).
--- ---
@@ -56,4 +53,4 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
Visit the following repos to see projects contributed by Azure ML users: Visit the following repos to see projects contributed by Azure ML users:
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT) - [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion) - [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)

View File

@@ -96,7 +96,7 @@
"source": [ "source": [
"import azureml.core\n", "import azureml.core\n",
"\n", "\n",
"print(\"This notebook was created using version 1.0.15 of the Azure ML SDK\")\n", "print(\"This notebook was created using version 1.0.17 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
] ]
}, },

View File

@@ -5,8 +5,8 @@ Learn how to use Azure Machine Learning services for experimentation and model m
As a pre-requisite, run the [configuration notebook](../configuration.ipynb) first to set up your Azure ML Workspace. Then, run the notebooks in the following recommended order. As a pre-requisite, run the [configuration notebook](../configuration.ipynb) first to set up your Azure ML Workspace. Then, run the notebooks in the following recommended order.
* [train-within-notebook](./training/train-within-notebook): Train a model while tracking run history, and learn how to deploy the model as a web service to Azure Container Instance. * [train-within-notebook](./training/train-within-notebook): Train a model while tracking run history, and learn how to deploy the model as a web service to Azure Container Instance.
* [train-on-local](./training/train-on-local): Learn how to submit a run and use Azure ML managed run configuration. * [train-on-local](./training/train-on-local): Learn how to submit a run to local computer and use Azure ML managed run configuration.
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node managed compute cluster as a remote compute target for CPU or GPU based training. * [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
* [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs. * [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.
* [logging-api](./training/logging-api): Learn about the details of logging metrics to run history. * [logging-api](./training/logging-api): Learn about the details of logging metrics to run history.
* [register-model-create-image-deploy-service](./deployment/register-model-create-image-deploy-service): Learn about the details of model management. * [register-model-create-image-deploy-service](./deployment/register-model-create-image-deploy-service): Learn about the details of model management.

View File

@@ -229,6 +229,9 @@ If a sample notebook fails with an error that property, method or library does n
1) Check that you have selected the correct kernel in the Jupyter notebook. The kernel is displayed in the top right of the notebook page. It can be changed using the `Kernel | Change Kernel` menu option. For Azure Notebooks, it should be `Python 3.6`. For local conda environments, it should be the conda environment name that you specified in automl_setup. The default is azure_automl. Note that the kernel is saved as part of the notebook. So, if you switch to a new conda environment, you will have to select the new kernel in the notebook. 1) Check that you have selected the correct kernel in the Jupyter notebook. The kernel is displayed in the top right of the notebook page. It can be changed using the `Kernel | Change Kernel` menu option. For Azure Notebooks, it should be `Python 3.6`. For local conda environments, it should be the conda environment name that you specified in automl_setup. The default is azure_automl. Note that the kernel is saved as part of the notebook. So, if you switch to a new conda environment, you will have to select the new kernel in the notebook.
2) Check that the notebook is for the SDK version that you are using. You can check the SDK version by executing `azureml.core.VERSION` in a Jupyter notebook cell. You can download previous versions of the sample notebooks from GitHub by clicking the `Branch` button, selecting the `Tags` tab and then selecting the version. 2) Check that the notebook is for the SDK version that you are using. You can check the SDK version by executing `azureml.core.VERSION` in a Jupyter notebook cell. You can download previous versions of the sample notebooks from GitHub by clicking the `Branch` button, selecting the `Tags` tab and then selecting the version.
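A quick in-notebook check that covers both points above (a small sketch in plain Python, assuming only that the SDK is installed in the kernel's environment):

```python
# Run in a notebook cell: confirm which Python the kernel uses and which SDK version it sees
import sys
import azureml.core

print("Python:", sys.version.split()[0])
print("Azure ML SDK:", azureml.core.VERSION)
```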
## Numpy import fails on Windows
Some Windows environments see an error loading numpy with the latest Python version, 3.6.8. If you see this issue, try Python version 3.6.7 instead.
## Remote run: DsvmCompute.create fails ## Remote run: DsvmCompute.create fails
There are several reasons why DsvmCompute.create can fail. The reason is usually given in the error message, but you may have to look near the end of the message for the details. Some common reasons are: There are several reasons why DsvmCompute.create can fail. The reason is usually given in the error message, but you may have to look near the end of the message for the details. Some common reasons are:
1) `Compute name is invalid, it should start with a letter, be between 2 and 16 character, and only include letters (a-zA-Z), numbers (0-9) and \'-\'.` Note that underscore is not allowed in the name. 1) `Compute name is invalid, it should start with a letter, be between 2 and 16 character, and only include letters (a-zA-Z), numbers (0-9) and \'-\'.` Note that underscore is not allowed in the name.

View File

@@ -2,7 +2,7 @@ name: azure_automl
dependencies: dependencies:
# The python interpreter version. # The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later. # Currently Azure ML only supports 3.5.2 and later.
- python=3.6 - python>=3.5.2,<3.6.8
- nb_conda - nb_conda
- matplotlib==2.1.0 - matplotlib==2.1.0
- numpy>=1.11.0,<1.15.0 - numpy>=1.11.0,<1.15.0
@@ -12,6 +12,7 @@ dependencies:
- scikit-learn>=0.18.0,<=0.19.1 - scikit-learn>=0.18.0,<=0.19.1
- pandas>=0.22.0,<0.23.0 - pandas>=0.22.0,<0.23.0
- tensorflow>=1.12.0 - tensorflow>=1.12.0
- py-xgboost<=0.80
- pip: - pip:
# Required packages for AzureML execution, history, and data preparation. # Required packages for AzureML execution, history, and data preparation.
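The tightened pins can be verified from inside the created environment; the sketch below only prints versions and assumes the interpreter, numpy, and xgboost packages were installed from the pins above:

```python
# Assumed post-setup check that the azure_automl environment honors the pins above
import sys
import numpy
import xgboost

assert (3, 5, 2) <= sys.version_info[:3] < (3, 6, 8), sys.version
print("numpy", numpy.__version__, "| xgboost", xgboost.__version__)
```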

View File

@@ -2,7 +2,7 @@ name: azure_automl
dependencies: dependencies:
# The python interpreter version. # The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later. # Currently Azure ML only supports 3.5.2 and later.
- python=3.6 - python>=3.5.2,<3.6.8
- nb_conda - nb_conda
- matplotlib==2.1.0 - matplotlib==2.1.0
- numpy>=1.15.3 - numpy>=1.15.3
@@ -12,6 +12,7 @@ dependencies:
- scikit-learn>=0.18.0,<=0.19.1 - scikit-learn>=0.18.0,<=0.19.1
- pandas>=0.22.0,<0.23.0 - pandas>=0.22.0,<0.23.0
- tensorflow>=1.12.0 - tensorflow>=1.12.0
- py-xgboost<=0.80
- pip: - pip:
# Required packages for AzureML execution, history, and data preparation. # Required packages for AzureML execution, history, and data preparation.

View File

@@ -84,9 +84,9 @@
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"\n", "\n",
"# choose a name for experiment\n", "# choose a name for experiment\n",
"experiment_name = 'automl-local-classification'\n", "experiment_name = 'automl-classification-deployment'\n",
"# project folder\n", "# project folder\n",
"project_folder = './sample_projects/automl-local-classification'\n", "project_folder = './sample_projects/automl-classification-deployment'\n",
"\n", "\n",
"experiment=Experiment(ws, experiment_name)\n", "experiment=Experiment(ws, experiment_name)\n",
"\n", "\n",
@@ -103,23 +103,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -289,8 +272,6 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"experiment_name = 'automl-local-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)" "ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)"
] ]

View File

@@ -100,23 +100,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -81,8 +81,8 @@
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"\n", "\n",
"# Choose a name for the experiment and specify the project folder.\n", "# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-classification'\n", "experiment_name = 'automl-classification'\n",
"project_folder = './sample_projects/automl-local-classification'\n", "project_folder = './sample_projects/automl-classification'\n",
"\n", "\n",
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
@@ -99,23 +99,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -49,23 +49,6 @@
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros." "Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -49,23 +49,6 @@
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros." "Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more linux distros."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -70,23 +70,6 @@
"ws = Workspace.from_config()" "ws = Workspace.from_config()"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -147,8 +147,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Data Splitting\n", "For demonstration purposes, we extract sales time-series for just a few of the stores:"
"For the purposes of demonstration and later forecast evaluation, we now split the data into a training and a testing set. The test set will contain the final 20 weeks of observed sales for each time-series."
] ]
}, },
{ {
@@ -157,19 +156,37 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"ntest_periods = 20\n", "use_stores = [2, 5, 8]\n",
"data_subset = data[data.Store.isin(use_stores)]\n",
"nseries = data_subset.groupby(grain_column_names).ngroups\n",
"print('Data subset contains {0} individual time-series.'.format(nseries))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Splitting\n",
"We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the grain columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n_test_periods = 20\n",
"\n", "\n",
"def split_last_n_by_grain(df, n):\n", "def split_last_n_by_grain(df, n):\n",
" \"\"\"\n", " \"\"\"Group df by grain and split on last n rows for each group.\"\"\"\n",
" Group df by grain and split on last n rows for each group\n",
" \"\"\"\n",
" df_grouped = (df.sort_values(time_column_name) # Sort by ascending time\n", " df_grouped = (df.sort_values(time_column_name) # Sort by ascending time\n",
" .groupby(grain_column_names, group_keys=False))\n", " .groupby(grain_column_names, group_keys=False))\n",
" df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n", " df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n",
" df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n", " df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n",
" return df_head, df_tail\n", " return df_head, df_tail\n",
"\n", "\n",
"X_train, X_test = split_last_n_by_grain(data, ntest_periods)" "X_train, X_test = split_last_n_by_grain(data_subset, n_test_periods)"
] ]
}, },
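For intuition, here is a minimal, self-contained sketch of how the last-n-per-grain split behaves on toy data; the column names mirror the notebook's `time_column_name` and `grain_column_names` setup, and the toy frame itself is an assumption, not the OJ dataset:

```python
# Toy illustration (assumed column names, synthetic data) of splitting the last n rows per series
import pandas as pd

time_column_name = 'WeekStarting'
grain_column_names = ['Store', 'Brand']

toy = pd.DataFrame({
    'WeekStarting': list(pd.date_range('2024-01-07', periods=4, freq='W')) * 2,
    'Store': [2] * 4 + [5] * 4,
    'Brand': ['dominicks'] * 8,
    'Quantity': range(8),
})

def split_last_n_by_grain(df, n):
    """Group df by grain and split on last n rows for each group."""
    df_grouped = (df.sort_values(time_column_name)
                    .groupby(grain_column_names, group_keys=False))
    return (df_grouped.apply(lambda dfg: dfg.iloc[:-n]),
            df_grouped.apply(lambda dfg: dfg.iloc[-n:]))

head, tail = split_last_n_by_grain(toy, 1)
print(tail)  # one final week per (Store, Brand) series
```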
{ {
@@ -187,24 +204,7 @@
"\n", "\n",
"AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n", "AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n",
"\n", "\n",
"You are almost ready to start an AutoML training job. We will first need to create a validation set from the existing training set (i.e. for hyper-parameter tuning): " "You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nvalidation_periods = 20\n",
"X_train, X_validate = split_last_n_by_grain(X_train, nvalidation_periods)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also need to separate the target column from the rest of the DataFrame: "
] ]
}, },
{ {
@@ -214,8 +214,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"target_column_name = 'Quantity'\n", "target_column_name = 'Quantity'\n",
"y_train = X_train.pop(target_column_name).values\n", "y_train = X_train.pop(target_column_name).values"
"y_validate = X_validate.pop(target_column_name).values "
] ]
}, },
{ {
@@ -224,22 +223,31 @@
"source": [ "source": [
"## Train\n", "## Train\n",
"\n", "\n",
"The AutoMLConfig object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, and the training and validation data. \n", "The AutoMLConfig object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, the training data, and cross-validation parameters. \n",
"\n", "\n",
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time and the grain column names. A time column is required for forecasting, while the grain is optional. If a grain is not given, the forecaster assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak. \n", "For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
"\n",
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up-to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organizaion that needs to estimate the next month of sales would set the horizon accordingly. \n",
"\n",
"Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *X_valid* and *y_valid* parameters of AutoMLConfig.\n",
"\n",
"Here is a summary of AutoMLConfig parameters used for training the OJ model:\n",
"\n", "\n",
"|Property|Description|\n", "|Property|Description|\n",
"|-|-|\n", "|-|-|\n",
"|**task**|forecasting|\n", "|**task**|forecasting|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n", "|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
"|**iterations**|Number of iterations. In each iteration, Auto ML trains a specific pipeline on the given data|\n", "|**iterations**|Number of iterations. In each iteration, Auto ML trains a specific pipeline on the given data|\n",
"|**X**|Training matrix of features, shape = [n_training_samples, n_features]|\n", "|**X**|Training matrix of features as a pandas DataFrame, shape = [n_training_samples, n_features]|\n",
"|**y**|Target values, shape = [n_training_samples, ]|\n", "|**y**|Target values as a numpy.ndarray, shape = [n_training_samples, ]|\n",
"|**X_valid**|Validation matrix of features, shape = [n_validation_samples, n_features]|\n", "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection|\n",
"|**y_valid**|Target values for validation, shape = [n_validation_samples, ]\n",
"|**enable_ensembling**|Allow AutoML to create ensembles of the best performing models\n", "|**enable_ensembling**|Allow AutoML to create ensembles of the best performing models\n",
"|**debug_log**|Log file path for writing debugging information\n", "|**debug_log**|Log file path for writing debugging information\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. " "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
"|**time_column_name**|Name of the datetime column in the input data|\n",
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|"
] ]
}, },
{ {
@@ -248,10 +256,11 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"automl_settings = {\n", "time_series_settings = {\n",
" 'time_column_name': time_column_name,\n", " 'time_column_name': time_column_name,\n",
" 'grain_column_names': grain_column_names,\n", " 'grain_column_names': grain_column_names,\n",
" 'drop_column_names': ['logQuantity']\n", " 'drop_column_names': ['logQuantity'],\n",
" 'max_horizon': n_test_periods\n",
"}\n", "}\n",
"\n", "\n",
"automl_config = AutoMLConfig(task='forecasting',\n", "automl_config = AutoMLConfig(task='forecasting',\n",
@@ -260,12 +269,11 @@
" iterations=10,\n", " iterations=10,\n",
" X=X_train,\n", " X=X_train,\n",
" y=y_train,\n", " y=y_train,\n",
" X_valid=X_validate,\n", " n_cross_validations=5,\n",
" y_valid=y_validate,\n",
" enable_ensembling=False,\n", " enable_ensembling=False,\n",
" path=project_folder,\n", " path=project_folder,\n",
" verbosity=logging.INFO,\n", " verbosity=logging.INFO,\n",
" **automl_settings)" " **time_series_settings)"
] ]
}, },
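The usual follow-up, not shown in this hunk, is to submit the configuration and pull out the best fitted pipeline; `experiment.submit` and `get_output` are standard AutoML SDK calls, but treat this as a sketch rather than part of the notebook diff:

```python
# Sketch of the typical next step after building automl_config (not part of this diff)
local_run = experiment.submit(automl_config, show_output=True)

# Once all iterations finish, retrieve the best run and its fitted forecasting pipeline
best_run, fitted_model = local_run.get_output()
print(best_run.id)
```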
{ {

View File

@@ -102,23 +102,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -74,9 +74,9 @@
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"\n", "\n",
"# choose a name for experiment\n", "# choose a name for experiment\n",
"experiment_name = 'automl-local-classification'\n", "experiment_name = 'automl-model-explanation'\n",
"# project folder\n", "# project folder\n",
"project_folder = './sample_projects/automl-local-classification-model-explanation'\n", "project_folder = './sample_projects/automl-model-explanation'\n",
"\n", "\n",
"experiment=Experiment(ws, experiment_name)\n", "experiment=Experiment(ws, experiment_name)\n",
"\n", "\n",
@@ -93,23 +93,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -96,23 +96,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -104,23 +104,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -130,7 +113,7 @@
"1. Create a Linux DSVM in Azure, following these [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor (not CentOS). Make sure that disk space is available under `/tmp` because AutoML creates files under `/tmp/azureml_run`s. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4GB per core.\n", "1. Create a Linux DSVM in Azure, following these [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor (not CentOS). Make sure that disk space is available under `/tmp` because AutoML creates files under `/tmp/azureml_run`s. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4GB per core.\n",
"2. Enter the IP address, user name and password below.\n", "2. Enter the IP address, user name and password below.\n",
"\n", "\n",
"**Note:** By default, SSH runs on port 22 and you don't need to change the port number below. If you've configured SSH to use a different port, change `dsvm_ssh_port` accordinglyaddress. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on changing SSH ports for security reasons." "**Note:** By default, SSH runs on port 22 and you don't need to change the port number below. If you've configured SSH to use a different port, change `dsvm_ssh_port` accordinglyaddress. [Read more](https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/detailed-troubleshoot-ssh-connection) on changing SSH ports for security reasons."
] ]
}, },
{ {

View File

@@ -67,6 +67,7 @@
"source": [ "source": [
"import logging\n", "import logging\n",
"import os\n", "import os\n",
"import csv\n",
"\n", "\n",
"from matplotlib import pyplot as plt\n", "from matplotlib import pyplot as plt\n",
"import numpy as np\n", "import numpy as np\n",
@@ -89,7 +90,7 @@
"\n", "\n",
"# Choose a name for the run history container in the workspace.\n", "# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-amlcompute'\n", "experiment_name = 'automl-remote-amlcompute'\n",
"project_folder = './sample_projects/automl-remote-amlcompute'\n", "project_folder = './project'\n",
"\n", "\n",
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
@@ -106,23 +107,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -171,6 +155,51 @@
" # For a more detailed view of current AmlCompute status, use get_status()." " # For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions, you need to make the data accessible from the remote compute.\n",
"This can be done by uploading the data to DataStore.\n",
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_train = datasets.load_digits()\n",
"\n",
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)\n",
" \n",
"pd.DataFrame(data_train.data).to_csv(\"data/X_train.tsv\", index=False, header=False, quoting=csv.QUOTE_ALL, sep=\"\\t\")\n",
"pd.DataFrame(data_train.target).to_csv(\"data/y_train.tsv\", index=False, header=False, sep=\"\\t\")\n",
"\n",
"ds = ws.get_default_datastore()\n",
"ds.upload(src_dir='./data', target_path='bai_data', overwrite=True, show_progress=True)\n",
"\n",
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='bai_data', \n",
" path_on_compute='/tmp/azureml_runs',\n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=False)"
]
},
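Before relying on the uploaded copy, a quick local check that the TSVs written above round-trip as expected (a sketch; the expected shapes come from scikit-learn's `load_digits`):

```python
# Optional local sanity check of the TSVs written above before the datastore upload
import pandas as pd

X_check = pd.read_csv("data/X_train.tsv", sep="\t", header=None, quotechar='"')
y_check = pd.read_csv("data/y_train.tsv", sep="\t", header=None)
print(X_check.shape, y_check.shape)  # load_digits gives 1797 rows, 64 feature columns, 1 label column
```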
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -188,29 +217,13 @@
"conda_run_config.environment.docker.enabled = True\n", "conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", "conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n", "\n",
"# set the data reference of the run coonfiguration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n", "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd" "conda_run_config.environment.python.conda_dependencies = cd"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"In this example, the `get_data()` function returns data using scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -219,17 +232,13 @@
"source": [ "source": [
"%%writefile $project_folder/get_data.py\n", "%%writefile $project_folder/get_data.py\n",
"\n", "\n",
"from sklearn import datasets\n", "import pandas as pd\n",
"from scipy import sparse\n",
"import numpy as np\n",
"\n", "\n",
"def get_data():\n", "def get_data():\n",
" \n", " X_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" digits = datasets.load_digits()\n", " y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" X_train = digits.data\n",
" y_train = digits.target\n",
"\n", "\n",
" return { \"X\" : X_train, \"y\" : y_train }" " return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
] ]
}, },
{ {

View File

@@ -99,23 +99,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -123,7 +106,7 @@
"### Create a Remote Linux DSVM\n", "### Create a Remote Linux DSVM\n",
"Note: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n", "Note: If creation fails with a message about Marketplace purchase eligibilty, go to portal.azure.com, start creating DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating VM.\n",
"\n", "\n",
"**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this." "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you can switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/detailed-troubleshoot-ssh-connection) on this."
] ]
}, },
{ {

View File

@@ -68,6 +68,7 @@
"import logging\n", "import logging\n",
"import os\n", "import os\n",
"import time\n", "import time\n",
"import csv\n",
"\n", "\n",
"from matplotlib import pyplot as plt\n", "from matplotlib import pyplot as plt\n",
"import numpy as np\n", "import numpy as np\n",
@@ -90,7 +91,7 @@
"\n", "\n",
"# Choose a name for the run history container in the workspace.\n", "# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-dsvm'\n", "experiment_name = 'automl-remote-dsvm'\n",
"project_folder = './sample_projects/automl-remote-dsvm'\n", "project_folder = './project'\n",
"\n", "\n",
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
@@ -107,23 +108,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -153,6 +137,44 @@
" time.sleep(90) # Wait for ssh to be accessible" " time.sleep(90) # Wait for ssh to be accessible"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions, you need to make the data accessible from the remote compute.\n",
"This can be done by uploading the data to DataStore.\n",
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_train = datasets.load_digits()\n",
"\n",
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)\n",
" \n",
"pd.DataFrame(data_train.data).to_csv(\"data/X_train.tsv\", index=False, header=False, quoting=csv.QUOTE_ALL, sep=\"\\t\")\n",
"pd.DataFrame(data_train.target).to_csv(\"data/y_train.tsv\", index=False, header=False, sep=\"\\t\")\n",
"\n",
"ds = ws.get_default_datastore()\n",
"ds.upload(src_dir='./data', target_path='re_data', overwrite=True, show_progress=True)\n",
"\n",
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='re_data', \n",
" path_on_compute='/tmp/azureml_runs',\n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=False)"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -168,29 +190,13 @@
"# Set compute target to the Linux DSVM\n", "# Set compute target to the Linux DSVM\n",
"conda_run_config.target = dsvm_compute\n", "conda_run_config.target = dsvm_compute\n",
"\n", "\n",
"# set the data reference of the run coonfiguration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n", "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd" "conda_run_config.environment.python.conda_dependencies = cd"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"In this example, the `get_data()` function returns data using scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -199,17 +205,13 @@
"source": [ "source": [
"%%writefile $project_folder/get_data.py\n", "%%writefile $project_folder/get_data.py\n",
"\n", "\n",
"from sklearn import datasets\n", "import pandas as pd\n",
"from scipy import sparse\n",
"import numpy as np\n",
"\n", "\n",
"def get_data():\n", "def get_data():\n",
" \n", " X_train = pd.read_csv(\"/tmp/azureml_runs/re_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" digits = datasets.load_digits()\n", " y_train = pd.read_csv(\"/tmp/azureml_runs/re_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" X_train = digits.data[100:,:]\n",
" y_train = digits.target[100:]\n",
"\n", "\n",
" return { \"X\" : X_train, \"y\" : y_train }" " return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
] ]
}, },
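To show how the pieces above fit together, here is a hedged sketch of submitting a remote AutoML run that consumes `get_data.py` through the run configuration; the `data_script` and `run_configuration` parameter names follow the SDK documentation of this era and are assumptions here, not content from this diff:

```python
# Assumed wiring of the remote run: data_script points at the get_data.py written above,
# and conda_run_config already targets the DSVM and carries the data reference.
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task='classification',
                             primary_metric='AUC_weighted',
                             iterations=10,
                             data_script=project_folder + "/get_data.py",
                             run_configuration=conda_run_config,
                             path=project_folder)

remote_run = experiment.submit(automl_config, show_output=False)
```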
{ {

View File

@@ -75,7 +75,7 @@
"experiment_name = 'non_sample_weight_experiment'\n", "experiment_name = 'non_sample_weight_experiment'\n",
"sample_weight_experiment_name = 'sample_weight_experiment'\n", "sample_weight_experiment_name = 'sample_weight_experiment'\n",
"\n", "\n",
"project_folder = './sample_projects/automl-local-classification'\n", "project_folder = './sample_projects/sample_weight'\n",
"\n", "\n",
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"sample_weight_experiment=Experiment(ws, sample_weight_experiment_name)\n", "sample_weight_experiment=Experiment(ws, sample_weight_experiment_name)\n",
@@ -93,23 +93,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -79,9 +79,9 @@
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"\n", "\n",
"# choose a name for the experiment\n", "# choose a name for the experiment\n",
"experiment_name = 'automl-local-missing-data'\n", "experiment_name = 'sparse-data-train-test-split'\n",
"# project folder\n", "# project folder\n",
"project_folder = './sample_projects/automl-local-missing-data'\n", "project_folder = './sample_projects/sparse-data-train-test-split'\n",
"\n", "\n",
"experiment = Experiment(ws, experiment_name)\n", "experiment = Experiment(ws, experiment_name)\n",
"\n", "\n",
@@ -98,23 +98,6 @@
"outputDf.T" "outputDf.T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -88,23 +88,6 @@
"pd.DataFrame(data = output, index = ['']).T" "pd.DataFrame(data = output, index = ['']).T"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},

View File

@@ -11,13 +11,6 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![04ACI](files/tables/image2.JPG)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -60,14 +53,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# import the Workspace class and check the azureml SDK version\n", "# Set auth to be used by workspace related APIs.\n",
"from azureml.core import Workspace\n", "# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"\n", "# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"ws = Workspace.from_config(auth = auth)\n", "auth = None"
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
] ]
}, },
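Since the cell above only sets `auth = None`, here is a hedged sketch of the service-principal alternative mentioned in its comment; the constructor parameter names follow the linked ServicePrincipalAuthentication documentation, and the tenant/application IDs and secret are placeholders:

```python
# Assumed usage of ServicePrincipalAuthentication for automation/CI-CD scenarios;
# the placeholder IDs and secret must come from your own Azure AD app registration.
from azureml.core.authentication import ServicePrincipalAuthentication

auth = ServicePrincipalAuthentication(tenant_id="<tenant-id>",
                                      service_principal_id="<application-id>",
                                      service_principal_password="<client-secret>")
```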
{ {
@@ -79,7 +68,7 @@
"# import the Workspace class and check the azureml SDK version\n", "# import the Workspace class and check the azureml SDK version\n",
"from azureml.core import Workspace\n", "from azureml.core import Workspace\n",
"\n", "\n",
"ws = Workspace.from_config()\n", "ws = Workspace.from_config(auth = auth)\n",
"print('Workspace name: ' + ws.name, \n", "print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n", " 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n", " 'Subscription id: ' + ws.subscription_id, \n",
@@ -350,9 +339,6 @@
"authors": [ "authors": [
{ {
"name": "pasha" "name": "pasha"
},
{
"name": "wamartin"
} }
], ],
"kernelspec": { "kernelspec": {
@@ -370,9 +356,9 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.7.0" "version": "3.6.6"
}, },
"name": "03.Build_model_runHistory", "name": "build-model-run-history-03",
"notebookId": 3836944406456339 "notebookId": 3836944406456339
}, },
"nbformat": 4, "nbformat": 4,

View File

@@ -20,13 +20,6 @@
"Please Register Azure Container Instance(ACI) using Azure Portal: https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-supported-services#portal in your subscription before using the SDK to deploy your ML model to ACI." "Please Register Azure Container Instance(ACI) using Azure Portal: https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-supported-services#portal in your subscription before using the SDK to deploy your ML model to ACI."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![04ACI](files/tables/image3.JPG)"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -45,15 +38,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core import Workspace\n", "# Set auth to be used by workspace related APIs.\n",
"\n", "# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"#'''\n", "# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"ws = Workspace.from_config(auth = auth)\n", "auth = None"
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')\n",
"#'''"
] ]
}, },
{ {
@@ -63,18 +51,12 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core import Workspace\n", "from azureml.core import Workspace\n",
"import azureml.core\n",
"\n", "\n",
"# Check core SDK version number\n", "ws = Workspace.from_config(auth = auth)\n",
"print(\"SDK version:\", azureml.core.VERSION)\n",
"\n",
"#'''\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n", "print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n", " 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n", " 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')\n", " 'Resource group: ' + ws.resource_group, sep = '\\n')"
"#'''"
] ]
}, },
{ {
@@ -293,24 +275,14 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"#comment to not delete the web service\n", "#comment to not delete the web service\n",
"#myservice.delete()" "myservice.delete()"
] ]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
} }
], ],
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "pasha" "name": "pasha"
},
{
"name": "wamartin"
} }
], ],
"kernelspec": { "kernelspec": {
@@ -328,9 +300,9 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.7.0" "version": "3.6.6"
}, },
"name": "04.DeploytoACI", "name": "deploy-to-aci-04",
"notebookId": 3836944406456376 "notebookId": 3836944406456376
}, },
"nbformat": 4, "nbformat": 4,

View File

@@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
"\n",
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook uses image from ACI notebook for deploying to AKS."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set auth to be used by workspace related APIs.\n",
"# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"auth = None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config(auth = auth)\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List images by ws\n",
"\n",
"from azureml.core.image import ContainerImage\n",
"for i in ContainerImage.list(workspace = ws):\n",
" print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import Image\n",
"myimage = Image(workspace=ws, name=\"aciws\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#create AKS compute\n",
"#it may take 20-25 minutes to create a new cluster\n",
"\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Use the default configuration (can also provide parameters to customize)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'ps-aks-demo2' \n",
"\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)\n",
"\n",
"aks_target.wait_for_completion(show_output = True)\n",
"\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"help( Webservice.deploy_from_image)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"#Set the web service configuration (using default here with app insights)\n",
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)\n",
"\n",
"#unique service name\n",
"service_name ='ps-aks-service'\n",
"\n",
"# Webservice creation using single command, there is a variant to use image directly as well.\n",
"aks_service = Webservice.deploy_from_image(\n",
" workspace=ws, \n",
" name=service_name,\n",
" deployment_config = aks_config,\n",
" image = myimage,\n",
" deployment_target = aks_target\n",
" )\n",
"\n",
"aks_service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.deployment_status"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#for using the Web HTTP API \n",
"print(aks_service.scoring_uri)\n",
"print(aks_service.get_keys())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"#get the some sample data\n",
"test_data_path = \"AdultCensusIncomeTest\"\n",
"test = spark.read.parquet(test_data_path).limit(5)\n",
"\n",
"test_json = json.dumps(test.toJSON().collect())\n",
"\n",
"print(test_json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#using data defined above predict if income is >50K (1) or <=50K (0)\n",
"aks_service.run(input_data=test_json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#comment to not delete the web service\n",
"aks_service.delete()\n",
"#image.delete()\n",
"#model.delete()\n",
"aks_target.delete() "
]
}
],
"metadata": {
"authors": [
{
"name": "pasha"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"name": "deploy-to-aks-existingimage-05",
"notebookId": 1030695628045968
},
"nbformat": 4,
"nbformat_minor": 1
}
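
For reference, a sketch of how the scoring URI and keys printed above could be exercised from any HTTP client outside the SDK. It assumes the `aks_service` object and the `test_json` payload from the notebook above; the payload shape ultimately depends on the scoring script baked into the image:

```python
import requests

# Key-based auth is the default for AKS web services.
primary_key, secondary_key = aks_service.get_keys()

headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + primary_key,
}

response = requests.post(aks_service.scoring_uri, data=test_json, headers=headers)
print(response.status_code)
print(response.json())
```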

View File

@@ -11,13 +11,6 @@
"Licensed under the MIT License." "Licensed under the MIT License."
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![04ACI](files/tables/image1.JPG)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -42,7 +35,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# Download AdultCensusIncome.csv from Azure CDN. This file has 32,561 rows.\n", "# Download AdultCensusIncome.csv from Azure CDN. This file has 32,561 rows.\n",
"basedataurl = \"https://amldockerdatasets.azureedge.net\"\n", "dataurl = \"https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv\"\n",
"datafile = \"AdultCensusIncome.csv\"\n", "datafile = \"AdultCensusIncome.csv\"\n",
"datafile_dbfs = os.path.join(\"/dbfs\", datafile)\n", "datafile_dbfs = os.path.join(\"/dbfs\", datafile)\n",
"\n", "\n",
@@ -50,7 +43,7 @@
" print(\"found {} at {}\".format(datafile, datafile_dbfs))\n", " print(\"found {} at {}\".format(datafile, datafile_dbfs))\n",
"else:\n", "else:\n",
" print(\"downloading {} to {}\".format(datafile, datafile_dbfs))\n", " print(\"downloading {} to {}\".format(datafile, datafile_dbfs))\n",
" urllib.request.urlretrieve(os.path.join(basedataurl, datafile), datafile_dbfs)" " urllib.request.urlretrieve(dataurl, datafile_dbfs)"
] ]
}, },
{ {
@@ -152,9 +145,6 @@
"authors": [ "authors": [
{ {
"name": "pasha" "name": "pasha"
},
{
"name": "wamartin"
} }
], ],
"kernelspec": { "kernelspec": {
@@ -172,9 +162,9 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.7.0" "version": "3.6.6"
}, },
"name": "02.Ingest_data", "name": "ingest-data-02",
"notebookId": 3836944406456362 "notebookId": 3836944406456362
}, },
"nbformat": 4, "nbformat": 4,

View File

@@ -35,13 +35,6 @@
"print(\"SDK version:\", azureml.core.VERSION)" "print(\"SDK version:\", azureml.core.VERSION)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![04ACI](files/tables/image2b.JPG)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -67,6 +60,18 @@
"# workspace_region = \"<your-resource group-region>\"" "# workspace_region = \"<your-resource group-region>\""
] ]
}, },
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set auth to be used by workspace related APIs.\n",
"# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"auth = None"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -82,6 +87,7 @@
" subscription_id = subscription_id,\n", " subscription_id = subscription_id,\n",
" resource_group = resource_group, \n", " resource_group = resource_group, \n",
" location = workspace_region,\n", " location = workspace_region,\n",
" auth = auth,\n",
" exist_ok=True)" " exist_ok=True)"
] ]
}, },
@@ -103,12 +109,13 @@
"source": [ "source": [
"ws = Workspace(workspace_name = workspace_name,\n", "ws = Workspace(workspace_name = workspace_name,\n",
" subscription_id = subscription_id,\n", " subscription_id = subscription_id,\n",
" resource_group = resource_group)\n", " resource_group = resource_group,\n",
" auth = auth)\n",
"\n", "\n",
"# persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n", "# persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
"ws.write_config()\n", "ws.write_config()\n",
"##if you need to give a different path/filename please use this\n", "#if you need to give a different path/filename please use this\n",
"##write_config(path=\"/databricks/driver/aml_config/\",file_name=<alias_conf.cfg>)" "#write_config(path=\"/databricks/driver/aml_config/\",file_name=<alias_conf.cfg>)"
] ]
}, },
{ {
@@ -129,29 +136,19 @@
"# import the Workspace class and check the azureml SDK version\n", "# import the Workspace class and check the azureml SDK version\n",
"from azureml.core import Workspace\n", "from azureml.core import Workspace\n",
"\n", "\n",
"ws = Workspace.from_config()\n", "ws = Workspace.from_config(auth = auth)\n",
"#ws = Workspace.from_config(<full path>)\n", "#ws = Workspace.from_config(<full path>)\n",
"print('Workspace name: ' + ws.name, \n", "print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n", " 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n", " 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')" " 'Resource group: ' + ws.resource_group, sep = '\\n')"
] ]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
} }
], ],
"metadata": { "metadata": {
"authors": [ "authors": [
{ {
"name": "pasha" "name": "pasha"
},
{
"name": "wamartin"
} }
], ],
"kernelspec": { "kernelspec": {
@@ -169,10 +166,10 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.7.0" "version": "3.6.6"
}, },
"name": "01.Installation_and_Configuration", "name": "installation-and-configuration-01",
"notebookId": 3836944406456490 "notebookId": 3688394266452835
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 1 "nbformat_minor": 1

View File

@@ -12,8 +12,8 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Azure Machine Learning Pipeline with DataTransferStep\n", "# Azure Machine Learning Pipeline with DataTranferStep\n",
"This notebook is used to demonstrate the use of DataTransferStep in Azure Machine Learning Pipeline.\n", "This notebook is used to demonstrate the use of DataTranferStep in Azure Machine Learning Pipeline.\n",
"\n", "\n",
"In certain cases, you will need to transfer data from one data location to another. For example, your data may be in Files storage and you may want to move it to Blob storage. Or, if your data is in an ADLS account and you want to make it available in the Blob storage. The built-in **DataTransferStep** class helps you transfer data in these situations.\n", "In certain cases, you will need to transfer data from one data location to another. For example, your data may be in Files storage and you may want to move it to Blob storage. Or, if your data is in an ADLS account and you want to make it available in the Blob storage. The built-in **DataTransferStep** class helps you transfer data in these situations.\n",
"\n", "\n",
@@ -466,4 +466,4 @@
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 2 "nbformat_minor": 2
} }
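
A rough sketch of the DataTransferStep described above, moving a file from an ADLS datastore to the workspace blob store. The datastore and compute names are placeholders, and the step requires an attached Azure Data Factory compute; the full notebook shows the real setup:

```python
from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import DataTransferStep

ws = Workspace.from_config()

adls_store = Datastore(ws, "my_adls_datastore")   # assumed, already registered
blob_store = Datastore(ws, "workspaceblobstore")

src = DataReference(datastore=adls_store,
                    data_reference_name="transfer_input",
                    path_on_datastore="raw/input.csv")
dst = DataReference(datastore=blob_store,
                    data_reference_name="transfer_output",
                    path_on_datastore="staged/input.csv")

transfer_step = DataTransferStep(
    name="copy adls to blob",
    source_data_reference=src,
    destination_data_reference=dst,
    compute_target=ws.compute_targets["adf-compute"])  # attached DataFactoryCompute

pipeline = Pipeline(workspace=ws, steps=[transfer_step])
pipeline.validate()
```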

View File

@@ -67,8 +67,7 @@
"source": [ "source": [
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n", "Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n",
"\n", "\n",
"If you don't have a config.json file, please go through the configuration Notebook located here:\n", "If you don't have a config.json file, please go through the configuration Notebook located [here](https://github.com/Azure/MachineLearningNotebooks). \n",
"https://github.com/Azure/MachineLearningNotebooks. \n",
"\n", "\n",
"This sets you up with a working config file that has information on your workspace, subscription id, etc. " "This sets you up with a working config file that has information on your workspace, subscription id, etc. "
] ]
@@ -80,7 +79,11 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" "\n",
"print('Workspace Name: ' + ws.name, \n",
" 'Azure Region: ' + ws.location, \n",
" 'Subscription Id: ' + ws.subscription_id, \n",
" 'Resource Group: ' + ws.resource_group, sep = '\\n')"
] ]
}, },
{ {
@@ -114,7 +117,8 @@
" batch_compute = BatchCompute(ws, batch_compute_name)\n", " batch_compute = BatchCompute(ws, batch_compute_name)\n",
"except ComputeTargetException:\n", "except ComputeTargetException:\n",
" print('Attaching Batch compute...')\n", " print('Attaching Batch compute...')\n",
" provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, account_name=batch_account_name)\n", " provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, \n",
" account_name=batch_account_name)\n",
" batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n", " batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n",
" batch_compute.wait_for_completion()\n", " batch_compute.wait_for_completion()\n",
" print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n", " print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n",
@@ -127,7 +131,19 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Setup DataStore" "## Setup Datastore"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Setting up the Blob storage associated with the workspace. \n",
"The following call retrieves the Azure Blob Store associated with your workspace. \n",
"Note that workspaceblobstore is **the name of this store and CANNOT BE CHANGED and must be used as is**. \n",
" \n",
"If you want to register another Datastore, please follow the instructions from here:\n",
"https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data#register-a-datastore"
] ]
}, },
{ {
@@ -136,11 +152,12 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# Blob storage associated with the workspace\n", "datastore = Datastore(ws, \"workspaceblobstore\")\n",
"# The following call GETS the Azure Blob Store associated with your workspace.\n", "\n",
"# Note that workspaceblobstore is **the name of this store and CANNOT BE CHANGED and must be used as is** \n", "print('Datastore details:')\n",
"default_blob_store = Datastore(ws, \"workspaceblobstore\")\n", "print('Datastore Account Name: ' + datastore.account_name)\n",
"print(\"Blobstore name: {}\".format(def_blob_store.name))" "print('Datastore Workspace Name: ' + datastore.workspace.name)\n",
"print('Datastore Container Name: ' + datastore.container_name)"
] ]
}, },
{ {
@@ -154,7 +171,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"For this example we will upload a file in the provided DataStore. These are some helper methods to achieve that." "For this example we will upload a file in the provided Datastore. These are some helper methods to achieve that."
] ]
}, },
{ {
@@ -171,16 +188,16 @@
" return temp_dir\n", " return temp_dir\n",
"\n", "\n",
"\n", "\n",
"def upload_file_to_datastore(datastore, path, content):\n", "def upload_file_to_datastore(datastore, file_name, content):\n",
" dir = create_local_file(content=content, file_name=\"temp.file\")\n", " dir = create_local_file(content=content, file_name=file_name)\n",
" datastore.upload(src_dir=dir, target_path=path, overwrite=True, show_progress=True)" " datastore.upload(src_dir=dir, overwrite=True, show_progress=True)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Here we associate the input DataReference with an existing file in the provided DataStore. Feel free to upload the file of your choice manually or use the *upload_testdata* method. " "Here we associate the input DataReference with an existing file in the provided Datastore. Feel free to upload the file of your choice manually or use the *upload_file_to_datastore* method. "
] ]
}, },
{ {
@@ -189,14 +206,14 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"testdata_path=\"testdata.txt\"\n", "file_name=\"input.txt\"\n",
"\n", "\n",
"upload_file_to_datastore(datastore=default_blob_store, \n", "upload_file_to_datastore(datastore=datastore, \n",
" path=testdata_path, \n", " file_name=file_name, \n",
" content=\"This is the content of the file\")\n", " content=\"this is the content of the file\")\n",
"\n", "\n",
"testdata = DataReference(datastore=default_blob_store, \n", "testdata = DataReference(datastore=datastore, \n",
" path_on_datastore=testdata_path, \n", " path_on_datastore=file_name, \n",
" data_reference_name=\"input\")\n", " data_reference_name=\"input\")\n",
"\n", "\n",
"outputdata = PipelineData(name=\"output\", datastore=datastore)" "outputdata = PipelineData(name=\"output\", datastore=datastore)"
@@ -224,7 +241,7 @@
"source": [ "source": [
"binaries_folder = \"azurebatch/job_binaries\"\n", "binaries_folder = \"azurebatch/job_binaries\"\n",
"if not os.path.isdir(binaries_folder):\n", "if not os.path.isdir(binaries_folder):\n",
" os.mkdir(project_folder)\n", " os.mkdir(binaries_folder)\n",
"\n", "\n",
"file_name=\"azurebatch.cmd\"\n", "file_name=\"azurebatch.cmd\"\n",
"with open(path.join(binaries_folder, file_name), 'w') as f:\n", "with open(path.join(binaries_folder, file_name), 'w') as f:\n",

View File

@@ -29,7 +29,8 @@
"import os\n", "import os\n",
"import shutil\n", "import shutil\n",
"import urllib\n", "import urllib\n",
"from azureml.core import Experiment\n", "import azureml.core\n",
"from azureml.core import Workspace, Experiment\n",
"from azureml.core.datastore import Datastore\n", "from azureml.core.datastore import Datastore\n",
"from azureml.core.compute import ComputeTarget, AmlCompute\n", "from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.exceptions import ComputeTargetException\n", "from azureml.exceptions import ComputeTargetException\n",
@@ -109,7 +110,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Upload MNIST dataset to blob datastore \n", "## Upload MNIST dataset to blob datastore \n",
"A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can either be backed by an Azure Blob Storage or and Azure File Share (ADLS will be supported in the future). In the next step, we will use Azure Blob Storage and upload the training and test set into the Azure Blob datastore, which we will then later be mount on a Batch AI cluster for training." "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. In the next step, we will use Azure Blob Storage and upload the training and test set into the Azure Blob datastore, which we will then later be mount on a Batch AI cluster for training."
] ]
}, },
{ {
@@ -118,7 +119,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"ds = Datastore(workspace=ws, name=\"MyBlobDatastore\")\n", "ds = ws.get_default_datastore()\n",
"ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)"
] ]
}, },
@@ -129,12 +130,12 @@
"## Retrieve or create a Azure Machine Learning compute\n", "## Retrieve or create a Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n", "Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
"\n", "\n",
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. We will create an Azure Machine Learning Compute containing **STANDARD_D2_V2 CPU VMs**. This process is broken down into the following steps:\n", "If we could not find the compute with the given name in the previous cell, then we will create a new compute here. This process is broken down into the following steps:\n",
"\n", "\n",
"1. Create the configuration\n", "1. Create the configuration\n",
"2. Create the Azure Machine Learning compute\n", "2. Create the Azure Machine Learning compute\n",
"\n", "\n",
"**This process will take about 3 minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n" "**This process will take a few minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
] ]
}, },
{ {
@@ -143,7 +144,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"cluster_name = \"aml-compute\"\n", "cluster_name = \"gpucluster\"\n",
"\n", "\n",
"try:\n", "try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
@@ -320,7 +321,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Build the experiment" "### Run the pipeline"
] ]
}, },
{ {
@@ -329,31 +330,15 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"pipeline = Pipeline(workspace=ws, steps=[hd_step])" "pipeline = Pipeline(workspace=ws, steps=[hd_step])\n",
"pipeline_run = Experiment(ws, 'Hyperdrive_Test').submit(pipeline)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Submit the experiment " "### Monitor using widget"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run = Experiment(ws, 'Hyperdrive_Test').submit(pipeline)\n",
"pipeline_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View Run Details"
] ]
}, },
{ {
@@ -365,6 +350,22 @@
"from azureml.widgets import RunDetails\n", "from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()" "RunDetails(pipeline_run).show()"
] ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait for the completion of this Pipeline run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run.wait_for_completion()"
]
} }
], ],
"metadata": { "metadata": {

View File

@@ -204,7 +204,8 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create a schedule for the pipeline" "### Create a schedule for the pipeline using a recurrence\n",
"This schedule will run on a specified recurrence interval."
] ]
}, },
{ {
@@ -345,7 +346,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Change reccurence of the schedule" "### Change recurrence of the schedule"
] ]
}, },
{ {
@@ -366,13 +367,58 @@
" wait_for_provisioning=True,\n", " wait_for_provisioning=True,\n",
" recurrence=recurrence)\n", " recurrence=recurrence)\n",
"\n", "\n",
"fetched_schedule = Schedule.get_schedule(ws, fetched_schedule.id)\n", "fetched_schedule = Schedule.get(ws, fetched_schedule.id)\n",
"\n", "\n",
"print(\"Updated schedule:\", fetched_schedule.id, \n", "print(\"Updated schedule:\", fetched_schedule.id, \n",
" \"\\nNew name:\", fetched_schedule.name,\n", " \"\\nNew name:\", fetched_schedule.name,\n",
" \"\\nNew frequency:\", fetched_schedule.recurrence.frequency,\n", " \"\\nNew frequency:\", fetched_schedule.recurrence.frequency,\n",
" \"\\nNew status:\", fetched_schedule.status)" " \"\\nNew status:\", fetched_schedule.status)"
] ]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a schedule for the pipeline using a Datastore\n",
"This schedule will run when additions or modifications are made to Blobs in the Datastore container.\n",
"Note: Only Blob Datastores are supported."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.datastore import Datastore\n",
"\n",
"datastore = Datastore(workspace=ws, name=\"workspaceblobstore\")\n",
"\n",
"schedule = Schedule.create(workspace=ws, name=\"My_Schedule\",\n",
" pipeline_id=pub_pipeline_id, \n",
" experiment_name='Schedule_Run',\n",
" datastore=datastore,\n",
" wait_for_provisioning=True,\n",
" description=\"Schedule Run\")\n",
"\n",
"# You may want to make sure that the schedule is provisioned properly\n",
"# before making any further changes to the schedule\n",
"\n",
"print(\"Created schedule with id: {}\".format(schedule.id))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set the wait_for_provisioning flag to False if you do not want to wait \n",
"# for the call to provision the schedule in the backend.\n",
"schedule.disable(wait_for_provisioning=True)\n",
"schedule = Schedule.get(ws, schedule_id)\n",
"print(\"Disabled schedule {}. New status is: {}\".format(schedule.id, schedule.status))"
]
} }
], ],
"metadata": { "metadata": {

View File

@@ -168,7 +168,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Data Connections with Inputs and Outputs\n", "## Data Connections with Inputs and Outputs\n",
"The DatabricksStep supports Azure Blob and ADLS for inputs and outputs. You also will need to define a [Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html) scope to enable authentication to external data sources such as Blob and ADLS from Databricks.\n", "The DatabricksStep supports Azure Bloband ADLS for inputs and outputs. You also will need to define a [Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html) scope to enable authentication to external data sources such as Blob and ADLS from Databricks.\n",
"\n", "\n",
"- Databricks documentation on [Azure Blob](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html)\n", "- Databricks documentation on [Azure Blob](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html)\n",
"- Databricks documentation on [ADLS](https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html)\n", "- Databricks documentation on [ADLS](https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html)\n",
@@ -397,7 +397,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"### 1. Running the demo notebook already added to the Databricks workspace\n", "### 1. Running the demo notebook already added to the Databricks workspace\n",
"Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable notebook_path when you run the code cell below:" "Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable\u00c2\u00a0notebook_path\u00c2\u00a0when you run the code cell below:"
] ]
}, },
{ {
@@ -436,7 +436,6 @@
"source": [ "source": [
"steps = [dbNbStep]\n", "steps = [dbNbStep]\n",
"pipeline = Pipeline(workspace=ws, steps=steps)\n", "pipeline = Pipeline(workspace=ws, steps=steps)\n",
"pipeline.validate()\n",
"pipeline_run = Experiment(ws, 'DB_Notebook_demo').submit(pipeline)\n", "pipeline_run = Experiment(ws, 'DB_Notebook_demo').submit(pipeline)\n",
"pipeline_run.wait_for_completion()" "pipeline_run.wait_for_completion()"
] ]
@@ -706,4 +705,4 @@
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 2 "nbformat_minor": 2
} }
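
For context on the DatabricksStep discussed above, a heavily simplified sketch of a notebook-based step. The compute name, notebook path, and cluster id are all placeholders, and the parameters shown are only a subset of what the class accepts:

```python
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import DatabricksStep

ws = Workspace.from_config()

# Assumes an Azure Databricks workspace already attached as "mydbcompute".
databricks_compute = ws.compute_targets["mydbcompute"]

dbNbStep = DatabricksStep(
    name="DBNotebookInAML",
    notebook_path="/Users/someone@example.com/demo_notebook",  # placeholder
    run_name="DB_Notebook_demo",
    existing_cluster_id="<existing-cluster-id>",               # or let the step create a cluster
    compute_target=databricks_compute,
    allow_reuse=True)

pipeline = Pipeline(workspace=ws, steps=[dbNbStep])
pipeline_run = Experiment(ws, "DB_Notebook_demo").submit(pipeline)
pipeline_run.wait_for_completion()
```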

View File

@@ -120,7 +120,7 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Python Scripts\n", "# Python Scripts\n",
"We use an edited version of `neural_style_mpi.py` (original is [here](https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style_mpi.py)). Scripts to split and stitch the video are thin wrappers to calls to `ffmpeg`. \n", "We use an edited version of `neural_style_mpi.py` (original is [here](https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style.py)). Scripts to split and stitch the video are thin wrappers to calls to `ffmpeg`. \n",
"\n", "\n",
"We install `ffmpeg` through conda dependencies." "We install `ffmpeg` through conda dependencies."
] ]
@@ -201,6 +201,13 @@
" )" " )"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sample video **organutan.mp4** is stored at a publicly shared datastore. We are registering the datastore below. If you want to take a look at the original video, click here. (https://pipelinedata.blob.core.windows.net/sample-videos/orangutan.mp4)"
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -208,8 +215,8 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# datastore for input video\n", "# datastore for input video\n",
"account_name = \"happypathspublic\"\n", "account_name = \"pipelinedata\"\n",
"video_ds = Datastore.register_azure_blob_container(ws, \"videos\", \"videos\",\n", "video_ds = Datastore.register_azure_blob_container(ws, \"videos\", \"sample-videos\",\n",
" account_name=account_name, overwrite=True)\n", " account_name=account_name, overwrite=True)\n",
"\n", "\n",
"# datastore for models\n", "# datastore for models\n",
@@ -238,9 +245,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"video_name=os.getenv(\"STYLE_TRANSFER_VIDEO_NAME\", \"orangutan.mp4\") \n",
"orangutan_video = DataReference(datastore=video_ds,\n", "orangutan_video = DataReference(datastore=video_ds,\n",
" data_reference_name=\"video\",\n", " data_reference_name=\"video\",\n",
" path_on_datastore=\"orangutan.mp4\", mode=\"download\")" " path_on_datastore=video_name, mode=\"download\")"
] ]
}, },
{ {
@@ -542,7 +550,7 @@
"response = requests.post(rest_endpoint, \n", "response = requests.post(rest_endpoint, \n",
" headers=aad_token,\n", " headers=aad_token,\n",
" json={\"ExperimentName\": \"style_transfer\",\n", " json={\"ExperimentName\": \"style_transfer\",\n",
" \"ParameterAssignments\": {\"style\": \"udnie\", \"nodecount\": 4}}) \n", " \"ParameterAssignments\": {\"style\": \"udnie\", \"nodecount\": 3}}) \n",
"run_id = response.json()[\"Id\"]\n", "run_id = response.json()[\"Id\"]\n",
"\n", "\n",
"published_pipeline_run_udnie = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", "published_pipeline_run_udnie = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n",

View File

@@ -209,8 +209,8 @@
"\n", "\n",
"svc_pr = ServicePrincipalAuthentication(\n", "svc_pr = ServicePrincipalAuthentication(\n",
" tenant_id=\"my-tenant-id\",\n", " tenant_id=\"my-tenant-id\",\n",
" username=\"my-application-id\",\n", " service_principal_id=\"my-application-id\",\n",
" password=svc_pr_password)\n", " service_principal_password=svc_pr_password)\n",
"\n", "\n",
"\n", "\n",
"ws = Workspace(\n", "ws = Workspace(\n",

View File

@@ -4,13 +4,15 @@ These examples show you:
1. [How to use the Estimator pattern in Azure ML](how-to-use-estimator) 1. [How to use the Estimator pattern in Azure ML](how-to-use-estimator)
2. [Train using TensorFlow Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-tensorflow) 2. [Train using TensorFlow Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-tensorflow)
3. [Train using Keras and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-keras) 3. [Train using Pytorch Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-pytorch)
4. [Train using Pytorch Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-pytorch) 4. [Train using Keras and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-keras)
5. [Distributed training using TensorFlow and Parameter Server](distributed-tensorflow-with-parameter-server) 5. [Train using Chainer Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-chainer)
6. [Distributed training using TensorFlow and Horovod](distributed-tensorflow-with-horovod) 6. [Distributed training using TensorFlow and Parameter Server](distributed-tensorflow-with-parameter-server)
7. [Distributed training using Pytorch and Horovod](distributed-pytorch-with-horovod) 7. [Distributed training using TensorFlow and Horovod](distributed-tensorflow-with-horovod)
8. [Distributed training using CNTK and custom Docker image](distributed-cntk-with-custom-docker) 8. [Distributed training using Pytorch and Horovod](distributed-pytorch-with-horovod)
9. [Export run history records to Tensorboard](export-run-history-to-tensorboard) 9. [Distributed training using CNTK and custom Docker image](distributed-cntk-with-custom-docker)
10. [Use TensorBoard to monitor training execution](tensorboard) 10. [Distributed training using Chainer](distributed-chainer)
11. [Export run history records to Tensorboard](export-run-history-to-tensorboard)
12. [Use TensorBoard to monitor training execution](tensorboard)
Learn more about how to use `Estimator` class to [train deep neural networks with Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models). Learn more about how to use `Estimator` class to [train deep neural networks with Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models).

View File

@@ -0,0 +1,153 @@
import argparse
import chainer
import chainer.cuda
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.training import extensions
import chainermn
import chainermn.datasets
import chainermn.functions
chainer.disable_experimental_feature_warning = True
class MLP0SubA(chainer.Chain):
def __init__(self, comm, n_out):
super(MLP0SubA, self).__init__(
l1=L.Linear(784, n_out))
def __call__(self, x):
return F.relu(self.l1(x))
class MLP0SubB(chainer.Chain):
def __init__(self, comm):
super(MLP0SubB, self).__init__()
def __call__(self, y):
return y
class MLP0(chainermn.MultiNodeChainList):
# Model on worker 0.
def __init__(self, comm, n_out):
super(MLP0, self).__init__(comm=comm)
self.add_link(MLP0SubA(comm, n_out), rank_in=None, rank_out=1)
self.add_link(MLP0SubB(comm), rank_in=1, rank_out=None)
class MLP1Sub(chainer.Chain):
def __init__(self, n_units, n_out):
super(MLP1Sub, self).__init__(
l2=L.Linear(None, n_units),
l3=L.Linear(None, n_out))
def __call__(self, h0):
h1 = F.relu(self.l2(h0))
return self.l3(h1)
class MLP1(chainermn.MultiNodeChainList):
# Model on worker 1.
def __init__(self, comm, n_units, n_out):
super(MLP1, self).__init__(comm=comm)
self.add_link(MLP1Sub(n_units, n_out), rank_in=0, rank_out=0)
def main():
parser = argparse.ArgumentParser(
description='ChainerMN example: pipelined neural network')
parser.add_argument('--batchsize', '-b', type=int, default=100,
help='Number of images in each mini-batch')
parser.add_argument('--epoch', '-e', type=int, default=20,
help='Number of sweeps over the dataset to train')
parser.add_argument('--gpu', '-g', action='store_true',
help='Use GPU')
parser.add_argument('--out', '-o', default='result',
help='Directory to output the result')
parser.add_argument('--unit', '-u', type=int, default=1000,
help='Number of units')
args = parser.parse_args()
# Prepare ChainerMN communicator.
if args.gpu:
comm = chainermn.create_communicator('hierarchical')
data_axis, model_axis = comm.rank % 2, comm.rank // 2
data_comm = comm.split(data_axis, comm.rank)
model_comm = comm.split(model_axis, comm.rank)
device = comm.intra_rank
else:
comm = chainermn.create_communicator('naive')
data_axis, model_axis = comm.rank % 2, comm.rank // 2
data_comm = comm.split(data_axis, comm.rank)
model_comm = comm.split(model_axis, comm.rank)
device = -1
if model_comm.size != 2:
raise ValueError(
            'This example can only be executed on an even number '
'of processes.')
if comm.rank == 0:
print('==========================================')
if args.gpu:
print('Using GPUs')
print('Num unit: {}'.format(args.unit))
print('Num Minibatch-size: {}'.format(args.batchsize))
print('Num epoch: {}'.format(args.epoch))
print('==========================================')
if data_axis == 0:
model = L.Classifier(MLP0(model_comm, args.unit))
elif data_axis == 1:
model = MLP1(model_comm, args.unit, 10)
if device >= 0:
chainer.cuda.get_device_from_id(device).use()
model.to_gpu()
optimizer = chainermn.create_multi_node_optimizer(
chainer.optimizers.Adam(), data_comm)
optimizer.setup(model)
# Original dataset on worker 0 and 1.
# Datasets of worker 0 and 1 are split and distributed to all workers.
if model_axis == 0:
train, test = chainer.datasets.get_mnist()
if data_axis == 1:
train = chainermn.datasets.create_empty_dataset(train)
test = chainermn.datasets.create_empty_dataset(test)
else:
train, test = None, None
train = chainermn.scatter_dataset(train, data_comm, shuffle=True)
test = chainermn.scatter_dataset(test, data_comm, shuffle=True)
train_iter = chainer.iterators.SerialIterator(
train, args.batchsize, shuffle=False)
test_iter = chainer.iterators.SerialIterator(
test, args.batchsize, repeat=False, shuffle=False)
updater = training.StandardUpdater(train_iter, optimizer, device=device)
trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out)
evaluator = extensions.Evaluator(test_iter, model, device=device)
evaluator = chainermn.create_multi_node_evaluator(evaluator, data_comm)
trainer.extend(evaluator)
    # Some display and output extensions are necessary only for worker 0.
if comm.rank == 0:
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(
['epoch', 'main/loss', 'validation/main/loss',
'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
trainer.extend(extensions.ProgressBar())
trainer.run()
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,315 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Distributed Chainer\n",
"In this tutorial, you will run a Chainer training example on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using ChainerMN distributed training across a GPU cluster."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"\n",
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n",
"# use get_status() to get a detailed status for the current AmlCompute. \n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model on the remote compute\n",
"Now that we have the AmlCompute ready to go, let's run our distributed training job."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a project directory\n",
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"project_folder = './chainer-distr'\n",
"os.makedirs(project_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare training script\n",
"Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `train_mnist.py`. In practice, you should be able to take any custom Chainer training script as is and run it with Azure ML without having to modify your code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once your script is ready, copy the training script `train_mnist.py` into the project directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"shutil.copy('train_mnist.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an experiment\n",
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed Chainer tutorial. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment_name = 'chainer-distr'\n",
"experiment = Experiment(ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Chainer estimator\n",
"The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.dnn import Chainer\n",
"\n",
"estimator = Chainer(source_directory=project_folder,\n",
" compute_target=compute_target,\n",
" entry_script='train_mnist.py',\n",
" node_count=2,\n",
" process_count_per_node=1,\n",
" distributed_backend='mpi',\n",
" use_gpu=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit job\n",
"Run your experiment by submitting your estimator object. Note that this call is asynchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = experiment.submit(estimator)\n",
"print(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor your run\n",
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"msauthor": "minxia"
},
"nbformat": 4,
"nbformat_minor": 2
}
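
The estimator description above mentions the `pip_packages`/`conda_packages` hooks for extra dependencies. A sketch of a variant of the same estimator that also passes command-line arguments to `train_mnist.py`; matplotlib is only an illustrative extra package, and `project_folder`, `compute_target`, and `experiment` are the objects created earlier in the notebook:

```python
from azureml.train.dnn import Chainer

script_params = {
    '--epoch': 10,        # arguments accepted by train_mnist.py
    '--batchsize': 128,
}

estimator = Chainer(source_directory=project_folder,
                    compute_target=compute_target,
                    entry_script='train_mnist.py',
                    script_params=script_params,
                    node_count=2,
                    process_count_per_node=1,
                    distributed_backend='mpi',
                    use_gpu=True,
                    pip_packages=['matplotlib'])

run = experiment.submit(estimator)
```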

View File

@@ -0,0 +1,125 @@
# Official ChainerMN example taken from
# https://github.com/chainer/chainer/blob/master/examples/chainermn/mnist/train_mnist.py
from __future__ import print_function
import argparse
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.training import extensions
import chainermn
class MLP(chainer.Chain):
def __init__(self, n_units, n_out):
super(MLP, self).__init__(
# the size of the inputs to each layer will be inferred
l1=L.Linear(784, n_units), # n_in -> n_units
l2=L.Linear(n_units, n_units), # n_units -> n_units
l3=L.Linear(n_units, n_out), # n_units -> n_out
)
def __call__(self, x):
h1 = F.relu(self.l1(x))
h2 = F.relu(self.l2(h1))
return self.l3(h2)
def main():
parser = argparse.ArgumentParser(description='ChainerMN example: MNIST')
parser.add_argument('--batchsize', '-b', type=int, default=100,
help='Number of images in each mini-batch')
parser.add_argument('--communicator', type=str,
default='non_cuda_aware', help='Type of communicator')
parser.add_argument('--epoch', '-e', type=int, default=20,
help='Number of sweeps over the dataset to train')
parser.add_argument('--gpu', '-g', default=True,
help='Use GPU')
parser.add_argument('--out', '-o', default='result',
help='Directory to output the result')
parser.add_argument('--resume', '-r', default='',
help='Resume the training from snapshot')
parser.add_argument('--unit', '-u', type=int, default=1000,
help='Number of units')
args = parser.parse_args()
# Prepare ChainerMN communicator.
if args.gpu:
if args.communicator == 'naive':
print("Error: 'naive' communicator does not support GPU.\n")
exit(-1)
comm = chainermn.create_communicator(args.communicator)
device = comm.intra_rank
else:
if args.communicator != 'naive':
print('Warning: using naive communicator '
'because only naive supports CPU-only execution')
comm = chainermn.create_communicator('naive')
device = -1
if comm.rank == 0:
print('==========================================')
print('Num process (COMM_WORLD): {}'.format(comm.size))
if args.gpu:
print('Using GPUs')
print('Using {} communicator'.format(args.communicator))
print('Num unit: {}'.format(args.unit))
print('Num Minibatch-size: {}'.format(args.batchsize))
print('Num epoch: {}'.format(args.epoch))
print('==========================================')
model = L.Classifier(MLP(args.unit, 10))
if device >= 0:
chainer.cuda.get_device_from_id(device).use()
model.to_gpu()
# Create a multi node optimizer from a standard Chainer optimizer.
optimizer = chainermn.create_multi_node_optimizer(
chainer.optimizers.Adam(), comm)
optimizer.setup(model)
# Split and distribute the dataset. Only worker 0 loads the whole dataset.
# Datasets of worker 0 are evenly split and distributed to all workers.
if comm.rank == 0:
train, test = chainer.datasets.get_mnist()
else:
train, test = None, None
train = chainermn.scatter_dataset(train, comm, shuffle=True)
test = chainermn.scatter_dataset(test, comm, shuffle=True)
train_iter = chainer.iterators.SerialIterator(train, args.batchsize)
test_iter = chainer.iterators.SerialIterator(test, args.batchsize,
repeat=False, shuffle=False)
updater = training.StandardUpdater(train_iter, optimizer, device=device)
trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out)
# Create a multi node evaluator from a standard Chainer evaluator.
evaluator = extensions.Evaluator(test_iter, model, device=device)
evaluator = chainermn.create_multi_node_evaluator(evaluator, comm)
trainer.extend(evaluator)
# Some display and output extensions are necessary only for one worker.
# (Otherwise, there would just be repeated outputs.)
if comm.rank == 0:
trainer.extend(extensions.dump_graph('main/loss'))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(
['epoch', 'main/loss', 'validation/main/loss',
'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
trainer.extend(extensions.ProgressBar())
if args.resume:
chainer.serializers.load_npz(args.resume, trainer)
trainer.run()
if __name__ == '__main__':
main()

View File

@@ -56,7 +56,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"!pip install azureml-contrib-tensorboard" "!pip install azureml-tensorboard"
] ]
}, },
{ {
@@ -166,7 +166,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# Export Run History to Tensorboard logs\n", "# Export Run History to Tensorboard logs\n",
"from azureml.contrib.tensorboard.export import export_to_tensorboard\n", "from azureml.tensorboard.export import export_to_tensorboard\n",
"import os\n", "import os\n",
"\n", "\n",
"logdir = 'exportedTBlogs'\n", "logdir = 'exportedTBlogs'\n",
@@ -208,7 +208,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.contrib.tensorboard import Tensorboard\n", "from azureml.tensorboard import Tensorboard\n",
"\n", "\n",
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
"tb = Tensorboard([], local_root=logdir, port=6006)\n", "tb = Tensorboard([], local_root=logdir, port=6006)\n",

View File

@@ -57,7 +57,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"!pip install azureml-contrib-tensorboard" "!pip install azureml-tensorboard"
] ]
}, },
{ {
@@ -239,7 +239,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.contrib.tensorboard import Tensorboard\n", "from azureml.tensorboard import Tensorboard\n",
"\n", "\n",
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
"tb = Tensorboard([run])\n", "tb = Tensorboard([run])\n",
@@ -293,7 +293,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import RemoteCompute\n", "from azureml.core.compute import ComputeTarget, RemoteCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n", "from azureml.core.compute_target import ComputeTargetException\n",
"\n", "\n",
"username = os.getenv('AZUREML_DSVM_USERNAME', default='<my_username>')\n", "username = os.getenv('AZUREML_DSVM_USERNAME', default='<my_username>')\n",
@@ -305,12 +305,11 @@
" attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n", " attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n",
" print('found existing:', attached_dsvm_compute.name)\n", " print('found existing:', attached_dsvm_compute.name)\n",
"except ComputeTargetException:\n", "except ComputeTargetException:\n",
" attached_dsvm_compute = RemoteCompute.attach(workspace=ws,\n", " config = RemoteCompute.attach_configuration(username=username,\n",
" name=compute_target_name,\n", " address=address,\n",
" username=username,\n", " ssh_port=22,\n",
" address=address,\n", " private_key_file='./.ssh/id_rsa')\n",
" ssh_port=22,\n", " attached_dsvm_compute = ComputeTarget.attach(ws, compute_target_name, config)\n",
" private_key_file='./.ssh/id_rsa')\n",
" \n", " \n",
" attached_dsvm_compute.wait_for_completion(show_output=True)" " attached_dsvm_compute.wait_for_completion(show_output=True)"
] ]
@@ -407,10 +406,13 @@
"# choose a name for your cluster\n", "# choose a name for your cluster\n",
"cluster_name = \"cpucluster\"\n", "cluster_name = \"cpucluster\"\n",
"\n", "\n",
"try:\n", "cts = ws.compute_targets\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", "found = False\n",
" print('Found existing compute target.')\n", "if cluster_name in cts and cts[cluster_name].type == 'AmlCompute':\n",
"except ComputeTargetException:\n", " found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[cluster_name]\n",
"if not found:\n",
" print('Creating a new compute target...')\n", " print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n",
" max_nodes=4)\n", " max_nodes=4)\n",
@@ -418,10 +420,10 @@
" # create the cluster\n", " # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n", "\n",
"compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=20)\n", "compute_target.wait_for_completion(show_output=True, min_node_count=None)\n",
"\n", "\n",
"# use get_status() to get a detailed status for the current cluster. \n", "# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())" "# print(compute_target.get_status().serialize())"
] ]
}, },
{ {
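The two hunks above replace the deprecated single-call `RemoteCompute.attach(...)` with a configuration-then-attach pattern and swap the try/except cluster lookup for a `ws.compute_targets` check. A condensed sketch of the new attach flow, assuming `ws` is an existing `Workspace`; the username, address, and key path are placeholders rather than values from this repository:

```python
# Sketch of the attach pattern introduced in the hunk above.
from azureml.core.compute import ComputeTarget, RemoteCompute
from azureml.core.compute_target import ComputeTargetException

compute_target_name = "attached-dsvm"

try:
    attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)
    print("found existing:", attached_dsvm_compute.name)
except ComputeTargetException:
    config = RemoteCompute.attach_configuration(username="<my_username>",
                                                address="<ip-or-fqdn>",
                                                ssh_port=22,
                                                private_key_file="./.ssh/id_rsa")
    attached_dsvm_compute = ComputeTarget.attach(ws, compute_target_name, config)
    attached_dsvm_compute.wait_for_completion(show_output=True)
```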

View File

@@ -0,0 +1,136 @@
import argparse
import numpy as np
import chainer
from chainer import backend
from chainer import backends
from chainer.backends import cuda
from chainer import Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
from chainer.dataset import concat_examples
from chainer.backends.cuda import to_cpu
from azureml.core.run import Run
run = Run.get_context()
class MyNetwork(Chain):

    def __init__(self, n_mid_units=100, n_out=10):
        super(MyNetwork, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_mid_units)
            self.l2 = L.Linear(n_mid_units, n_mid_units)
            self.l3 = L.Linear(n_mid_units, n_out)

    def forward(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)


def main():
    parser = argparse.ArgumentParser(description='Chainer example: MNIST')
    parser.add_argument('--batchsize', '-b', type=int, default=100,
                        help='Number of images in each mini-batch')
    parser.add_argument('--epochs', '-e', type=int, default=20,
                        help='Number of sweeps over the dataset to train')
    parser.add_argument('--output_dir', '-o', default='./outputs',
                        help='Directory to output the result')
    parser.add_argument('--gpu_id', '-g', type=int, default=0,
                        help='ID of the GPU to be used. Set to -1 if you use CPU')
    args = parser.parse_args()

    # Download the MNIST data if you haven't downloaded it yet
    train, test = datasets.mnist.get_mnist(withlabel=True, ndim=1)

    gpu_id = args.gpu_id
    batchsize = args.batchsize
    epochs = args.epochs

    run.log('Batch size', np.int(batchsize))
    run.log('Epochs', np.int(epochs))

    train_iter = iterators.SerialIterator(train, batchsize)
    test_iter = iterators.SerialIterator(test, batchsize,
                                         repeat=False, shuffle=False)

    model = MyNetwork()

    if gpu_id >= 0:
        # Make the specified GPU current
        chainer.backends.cuda.get_device_from_id(gpu_id).use()
        model.to_gpu()  # Copy the model to the GPU

    # Choose an optimizer algorithm
    optimizer = optimizers.MomentumSGD(lr=0.01, momentum=0.9)

    # Give the optimizer a reference to the model so that it
    # can locate the model's parameters.
    optimizer.setup(model)

    while train_iter.epoch < epochs:
        # ---------- One iteration of the training loop ----------
        train_batch = train_iter.next()
        image_train, target_train = concat_examples(train_batch, gpu_id)

        # Calculate the prediction of the network
        prediction_train = model(image_train)

        # Calculate the loss with softmax_cross_entropy
        loss = F.softmax_cross_entropy(prediction_train, target_train)

        # Calculate the gradients in the network
        model.cleargrads()
        loss.backward()

        # Update all the trainable parameters
        optimizer.update()
        # --------------------- until here ---------------------

        # Check the validation accuracy of prediction after every epoch
        if train_iter.is_new_epoch:  # If this iteration is the final iteration of the current epoch
            # Display the training loss
            print('epoch:{:02d} train_loss:{:.04f} '.format(
                train_iter.epoch, float(to_cpu(loss.array))), end='')

            test_losses = []
            test_accuracies = []
            while True:
                test_batch = test_iter.next()
                image_test, target_test = concat_examples(test_batch, gpu_id)

                # Forward the test data
                prediction_test = model(image_test)

                # Calculate the loss
                loss_test = F.softmax_cross_entropy(prediction_test, target_test)
                test_losses.append(to_cpu(loss_test.array))

                # Calculate the accuracy
                accuracy = F.accuracy(prediction_test, target_test)
                accuracy.to_cpu()
                test_accuracies.append(accuracy.array)

                if test_iter.is_new_epoch:
                    test_iter.epoch = 0
                    test_iter.current_position = 0
                    test_iter.is_new_epoch = False
                    test_iter._pushed_position = None
                    break

            val_accuracy = np.mean(test_accuracies)
            print('val_loss:{:.04f} val_accuracy:{:.04f}'.format(
                np.mean(test_losses), val_accuracy))
            run.log("Accuracy", np.float(val_accuracy))


if __name__ == '__main__':
    main()
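Before submitting this script to a remote target, it can be smoke-tested locally on CPU. A hypothetical one-epoch invocation (the argument values are illustrative; `chainer` and the `azureml-sdk` are assumed to be installed locally):

```python
# Hypothetical local smoke test of chainer_mnist.py before remote submission.
import subprocess
import sys

subprocess.run([sys.executable, "chainer_mnist.py",
                "--batchsize", "64",
                "--epochs", "1",
                "--gpu_id", "-1"],   # -1 selects CPU, per the script's help text
               check=True)
```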

View File

@@ -0,0 +1,134 @@
import argparse
import numpy as np
import chainer
from chainer import backend
from chainer import backends
from chainer.backends import cuda
from chainer import Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
from chainer.dataset import concat_examples
from chainer.backends.cuda import to_cpu
from azureml.core.run import Run
run = Run.get_context()
class MyNetwork(Chain):

    def __init__(self, n_mid_units=100, n_out=10):
        super(MyNetwork, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_mid_units)
            self.l2 = L.Linear(n_mid_units, n_mid_units)
            self.l3 = L.Linear(n_mid_units, n_out)

    def forward(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)


def main():
    parser = argparse.ArgumentParser(description='Chainer example: MNIST')
    parser.add_argument('--batchsize', '-b', type=int, default=100,
                        help='Number of images in each mini-batch')
    parser.add_argument('--epochs', '-e', type=int, default=20,
                        help='Number of sweeps over the dataset to train')
    parser.add_argument('--output_dir', '-o', default='./outputs',
                        help='Directory to output the result')
    args = parser.parse_args()

    # Download the MNIST data if you haven't downloaded it yet
    train, test = datasets.mnist.get_mnist(withlabel=True, ndim=1)

    batchsize = args.batchsize
    epochs = args.epochs

    run.log('Batch size', np.int(batchsize))
    run.log('Epochs', np.int(epochs))

    train_iter = iterators.SerialIterator(train, batchsize)
    test_iter = iterators.SerialIterator(test, batchsize,
                                         repeat=False, shuffle=False)

    model = MyNetwork()

    gpu_id = -1  # Set to -1 if you use CPU
    if gpu_id >= 0:
        # Make a specified GPU current
        chainer.backends.cuda.get_device_from_id(0).use()
        model.to_gpu()  # Copy the model to the GPU

    # Choose an optimizer algorithm
    optimizer = optimizers.MomentumSGD(lr=0.01, momentum=0.9)

    # Give the optimizer a reference to the model so that it
    # can locate the model's parameters.
    optimizer.setup(model)

    while train_iter.epoch < epochs:
        # ---------- One iteration of the training loop ----------
        train_batch = train_iter.next()
        image_train, target_train = concat_examples(train_batch, gpu_id)

        # Calculate the prediction of the network
        prediction_train = model(image_train)

        # Calculate the loss with softmax_cross_entropy
        loss = F.softmax_cross_entropy(prediction_train, target_train)

        # Calculate the gradients in the network
        model.cleargrads()
        loss.backward()

        # Update all the trainable parameters
        optimizer.update()
        # --------------------- until here ---------------------

        # Check the validation accuracy of prediction after every epoch
        if train_iter.is_new_epoch:  # If this iteration is the final iteration of the current epoch
            # Display the training loss
            print('epoch:{:02d} train_loss:{:.04f} '.format(
                train_iter.epoch, float(to_cpu(loss.array))), end='')

            test_losses = []
            test_accuracies = []
            while True:
                test_batch = test_iter.next()
                image_test, target_test = concat_examples(test_batch, gpu_id)

                # Forward the test data
                prediction_test = model(image_test)

                # Calculate the loss
                loss_test = F.softmax_cross_entropy(prediction_test, target_test)
                test_losses.append(to_cpu(loss_test.array))

                # Calculate the accuracy
                accuracy = F.accuracy(prediction_test, target_test)
                accuracy.to_cpu()
                test_accuracies.append(accuracy.array)

                if test_iter.is_new_epoch:
                    test_iter.epoch = 0
                    test_iter.current_position = 0
                    test_iter.is_new_epoch = False
                    test_iter._pushed_position = None
                    break

            val_accuracy = np.mean(test_accuracies)
            print('val_loss:{:.04f} val_accuracy:{:.04f}'.format(
                np.mean(test_losses), val_accuracy))
            run.log("Accuracy", np.float(val_accuracy))


if __name__ == '__main__':
    main()
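This CPU-only variant can be submitted the same way as the GPU one. A sketch, assuming the `project_folder`, `experiment`, and a cluster named `cpucluster` from elsewhere in this commit already exist:

```python
# Sketch only: submit the CPU variant above with the Chainer estimator.
# cpucluster, project_folder, and experiment are assumed to exist already.
from azureml.train.dnn import Chainer

estimator_cpu = Chainer(source_directory=project_folder,
                        script_params={'--epochs': 5,
                                       '--batchsize': 128,
                                       '--output_dir': './outputs'},
                        compute_target=ws.compute_targets['cpucluster'],
                        entry_script='chainer_mnist.py',
                        use_gpu=False)

run = experiment.submit(estimator_cpu)
```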

View File

@@ -0,0 +1,425 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and hyperparameter tune with Chainer\n",
"\n",
"In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a Convolutional Neural Network (CNN) on a single-node GPU with Chainer to perform handwritten digit recognition on the popular MNIST dataset. We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.workspace import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model on the remote compute\n",
"Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a project directory\n",
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"project_folder = './chainer-mnist'\n",
"os.makedirs(project_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare training script\n",
"Now you will need to create your training script. In this tutorial, the training script is already provided for you at `chainer_mnist.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n",
"\n",
"However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script. \n",
"\n",
"In `chainer_mnist.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML `Run` object within the script:\n",
"```Python\n",
"from azureml.core.run import Run\n",
"run = Run.get_context()\n",
"```\n",
"Further within `chainer_mnist.py`, we log the batchsize and epochs parameters, and the highest accuracy the model achieves:\n",
"```Python\n",
"run.log('Batch size', np.int(args.batchsize))\n",
"run.log('Epochs', np.int(args.epochs))\n",
"\n",
"run.log('Accuracy', np.float(val_accuracy))\n",
"```\n",
"These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once your script is ready, copy the training script `chainer_mnist.py` into your project directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import shutil\n",
"\n",
"shutil.copy('chainer_mnist.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create an experiment\n",
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Chainer tutorial. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment_name = 'chainer-mnist'\n",
"experiment = Experiment(ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a Chainer estimator\n",
"The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs. The following code will define a single-node Chainer job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.dnn import Chainer\n",
"\n",
"script_params = {\n",
" '--epochs': 10,\n",
" '--batchsize': 128,\n",
" '--output_dir': './outputs'\n",
"}\n",
"\n",
"estimator = Chainer(source_directory=project_folder, \n",
" script_params=script_params,\n",
" compute_target=compute_target,\n",
" pip_packages=['numpy', 'pytest'],\n",
" entry_script='chainer_mnist.py',\n",
" use_gpu=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. To leverage the Azure VM's GPU for training, we set `use_gpu=True`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit job\n",
"Run your experiment by submitting your estimator object. Note that this call is asynchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = experiment.submit(estimator)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor your run\n",
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# to get more details of your run\n",
"print(run.get_details())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tune model hyperparameters\n",
"Now that we've seen how to do a simple Chainer training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Start a hyperparameter sweep\n",
"First, we will define the hyperparameter space to sweep over. Let's tune the batch size and epochs parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, accuracy.\n",
"\n",
"Then, we specify the early termination policy to use to early terminate poorly performing runs. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `Accuracy` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `3` epochs (`delay_evaluation=3`).\n",
"Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.hyperdrive.runconfig import HyperDriveRunConfig\n",
"from azureml.train.hyperdrive.sampling import RandomParameterSampling\n",
"from azureml.train.hyperdrive.policy import BanditPolicy\n",
"from azureml.train.hyperdrive.run import PrimaryMetricGoal\n",
"from azureml.train.hyperdrive.parameter_expressions import choice\n",
" \n",
"\n",
"param_sampling = RandomParameterSampling( {\n",
" \"--batchsize\": choice(128, 256),\n",
" \"--epochs\": choice(5, 10, 20, 40)\n",
" }\n",
")\n",
"\n",
"hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n",
" hyperparameter_sampling=param_sampling, \n",
" primary_metric_name='Accuracy',\n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
" max_total_runs=8,\n",
" max_concurrent_runs=4)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, lauch the hyperparameter tuning job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# start the HyperDrive run\n",
"hyperdrive_run = experiment.submit(hyperdrive_run_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor HyperDrive runs\n",
"You can monitor the progress of the runs with the following Jupyter widget. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"RunDetails(hyperdrive_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
}
],
"metadata": {
"authors": [
{
"name": "minxia"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"msauthor": "minxia"
},
"nbformat": 4,
"nbformat_minor": 2
}
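The notebook above imports `BanditPolicy` but never passes a policy to `HyperDriveRunConfig`, and it stops at submitting the sweep. A hedged sketch of both follow-ups: the `slack_factor` value below is an assumption, and `estimator`, `param_sampling`, and `experiment` are the objects defined in the notebook.

```python
# Sketch under assumptions: wire in the early termination policy the notebook
# describes, then retrieve the best run once the sweep finishes.
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.runconfig import HyperDriveRunConfig
from azureml.train.hyperdrive.run import PrimaryMetricGoal

early_termination_policy = BanditPolicy(slack_factor=0.1,   # illustrative value
                                        evaluation_interval=1,
                                        delay_evaluation=3)

hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,
                                            hyperparameter_sampling=param_sampling,
                                            policy=early_termination_policy,
                                            primary_metric_name='Accuracy',
                                            primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                            max_total_runs=8,
                                            max_concurrent_runs=4)

hyperdrive_run = experiment.submit(hyperdrive_run_config)
hyperdrive_run.wait_for_completion(show_output=True)

# Inspect the best configuration found by the sweep.
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print('Best run id:', best_run.id)
print(best_run.get_metrics())
```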

View File

@@ -28,6 +28,8 @@ parser.add_argument('--first-layer-neurons', type=int, dest='n_hidden_1', defaul
help='# of neurons in the first layer') help='# of neurons in the first layer')
parser.add_argument('--second-layer-neurons', type=int, dest='n_hidden_2', default=100, parser.add_argument('--second-layer-neurons', type=int, dest='n_hidden_2', default=100,
help='# of neurons in the second layer') help='# of neurons in the second layer')
parser.add_argument('--learning-rate', type=float, dest='learning_rate', default=0.001, help='learning rate')
args = parser.parse_args() args = parser.parse_args()
data_folder = args.data_folder data_folder = args.data_folder
@@ -46,9 +48,9 @@ n_inputs = 28 * 28
n_h1 = args.n_hidden_1 n_h1 = args.n_hidden_1
n_h2 = args.n_hidden_2 n_h2 = args.n_hidden_2
n_outputs = 10 n_outputs = 10
n_epochs = 20 n_epochs = 20
batch_size = args.batch_size batch_size = args.batch_size
learning_rate = args.learning_rate
y_train = one_hot_encode(y_train, n_outputs) y_train = one_hot_encode(y_train, n_outputs)
y_test = one_hot_encode(y_test, n_outputs) y_test = one_hot_encode(y_test, n_outputs)
@@ -56,9 +58,9 @@ print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n')
# Build a simple MLP model # Build a simple MLP model
model = Sequential() model = Sequential()
# input layer # first hidden layer
model.add(Dense(n_h1, activation='relu', input_shape=(n_inputs,))) model.add(Dense(n_h1, activation='relu', input_shape=(n_inputs,)))
# hidden layer # second hidden layer
model.add(Dense(n_h2, activation='relu')) model.add(Dense(n_h2, activation='relu'))
# output layer # output layer
model.add(Dense(n_outputs, activation='softmax')) model.add(Dense(n_outputs, activation='softmax'))
@@ -66,7 +68,7 @@ model.add(Dense(n_outputs, activation='softmax'))
model.summary() model.summary()
model.compile(loss='categorical_crossentropy', model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(), optimizer=RMSprop(lr=learning_rate),
metrics=['accuracy']) metrics=['accuracy'])
# start an Azure ML run # start an Azure ML run

View File

@@ -217,7 +217,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"props = run.upload_file(name='myfile_in_the_cloud.txt', path_or_stream='./myfile.txt')\n", "props = run.upload_file(name='outputs/myfile_in_the_cloud.txt', path_or_stream='./myfile.txt')\n",
"props.serialize()" "props.serialize()"
] ]
}, },
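With the uploaded file now placed under `outputs/`, it can be listed and pulled back down from the run. A short sketch; the local target filename is arbitrary:

```python
# Sketch: verify and retrieve the artifact uploaded above.
print(run.get_file_names())  # should now include 'outputs/myfile_in_the_cloud.txt'

run.download_file(name='outputs/myfile_in_the_cloud.txt',
                  output_file_path='./myfile_downloaded.txt')
```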

View File

@@ -81,7 +81,7 @@
"from azureml.core import Experiment, Workspace\n", "from azureml.core import Experiment, Workspace\n",
"\n", "\n",
"# Check core SDK version number\n", "# Check core SDK version number\n",
"print(\"This notebook was created using version 1.0.15 of the Azure ML SDK\")\n", "print(\"This notebook was created using version 1.0.2 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")\n", "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")\n",
"print(\"\")\n", "print(\"\")\n",
"\n", "\n",
@@ -138,7 +138,6 @@
"* We use `start_logging` to create a new run in this experiment\n", "* We use `start_logging` to create a new run in this experiment\n",
"* We use `run.log()` to record a parameter, alpha, and an accuracy measure - the Mean Squared Error (MSE) to the run. We will be able to review and compare these measures in the Azure Portal at a later time.\n", "* We use `run.log()` to record a parameter, alpha, and an accuracy measure - the Mean Squared Error (MSE) to the run. We will be able to review and compare these measures in the Azure Portal at a later time.\n",
"* We store the resulting model in the **outputs** directory, which is automatically captured by AML when the run is complete.\n", "* We store the resulting model in the **outputs** directory, which is automatically captured by AML when the run is complete.\n",
"* We use `run.take_snapshot()` to capture *this* notebook so we can reproduce this experiment at a later time.\n",
"* We use `run.complete()` to indicate that the run is over and results can be captured and finalized" "* We use `run.complete()` to indicate that the run is over and results can be captured and finalized"
] ]
}, },
@@ -173,9 +172,6 @@
"# Save the model to the outputs directory for capture\n", "# Save the model to the outputs directory for capture\n",
"joblib.dump(value=regression_model, filename='outputs/model.pkl')\n", "joblib.dump(value=regression_model, filename='outputs/model.pkl')\n",
"\n", "\n",
"# Take a snapshot of the directory containing this notebook\n",
"run.take_snapshot('./')\n",
"\n",
"# Complete the run\n", "# Complete the run\n",
"run.complete()" "run.complete()"
] ]
@@ -238,10 +234,7 @@
" run.log(name=\"mse\", value=mse)\n", " run.log(name=\"mse\", value=mse)\n",
"\n", "\n",
" # Save the model to the outputs directory for capture\n", " # Save the model to the outputs directory for capture\n",
" joblib.dump(value=regression_model, filename='outputs/model.pkl')\n", " joblib.dump(value=regression_model, filename='outputs/model.pkl')\n"
" \n",
" # Capture this notebook with the run\n",
" run.take_snapshot('./')\n"
] ]
}, },
{ {
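With the `take_snapshot()` calls removed above, the interactive-run pattern this notebook demonstrates reduces to start, log, and complete. A condensed sketch, assuming `experiment` is an existing `Experiment` and the metric values are placeholders:

```python
# Condensed sketch of the remaining interactive-run flow; values are placeholders.
run = experiment.start_logging()
run.log('alpha', 0.03)
run.log('mse', 0.415)
# ...files written to ./outputs are captured automatically when the run completes
run.complete()
```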

View File

@@ -94,7 +94,7 @@
"source": [ "source": [
"# load workspace configuration from the config.json file in the current folder.\n", "# load workspace configuration from the config.json file in the current folder.\n",
"ws = Workspace.from_config()\n", "ws = Workspace.from_config()\n",
"print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\\t')" "print(ws.name, ws.location, ws.resource_group, ws.location, sep='\\t')"
] ]
}, },
{ {
@@ -205,7 +205,7 @@
"import urllib.request\n", "import urllib.request\n",
"\n", "\n",
"data_folder = os.path.join(os.getcwd(), 'data')\n", "data_folder = os.path.join(os.getcwd(), 'data')\n",
"os.makedirs(data_folder, exist_ok = True)\n", "os.makedirs(data_folder, exist_ok=True)\n",
"\n", "\n",
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images.gz'))\n", "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images.gz'))\n",
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels.gz'))\n", "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels.gz'))\n",
@@ -304,7 +304,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"import os\n", "import os\n",
"script_folder = os.path.join(os.getcwd(), \"sklearn-mnist\")\n", "script_folder = os.path.join(os.getcwd(), \"sklearn-mnist\")\n",
"os.makedirs(script_folder, exist_ok=True)" "os.makedirs(script_folder, exist_ok=True)"
] ]
}, },
@@ -341,7 +341,7 @@
"parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n", "parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n",
"args = parser.parse_args()\n", "args = parser.parse_args()\n",
"\n", "\n",
"data_folder = os.path.join(args.data_folder, 'mnist')\n", "data_folder = args.data_folder\n",
"print('Data folder:', data_folder)\n", "print('Data folder:', data_folder)\n",
"\n", "\n",
"# load train and test set into numpy arrays\n", "# load train and test set into numpy arrays\n",
@@ -426,7 +426,7 @@
"* Parameters required from the training script \n", "* Parameters required from the training script \n",
"* Python packages needed for training\n", "* Python packages needed for training\n",
"\n", "\n",
"In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.as_mount()`)." "In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.path('mnist').as_mount()`)."
] ]
}, },
{ {
@@ -442,8 +442,8 @@
"from azureml.train.estimator import Estimator\n", "from azureml.train.estimator import Estimator\n",
"\n", "\n",
"script_params = {\n", "script_params = {\n",
" '--data-folder': ds.as_mount(),\n", " '--data-folder': ds.path('mnist').as_mount(),\n",
" '--regularization': 0.8\n", " '--regularization': 0.05\n",
"}\n", "}\n",
"\n", "\n",
"est = Estimator(source_directory=script_folder,\n", "est = Estimator(source_directory=script_folder,\n",
@@ -453,13 +453,29 @@
" conda_packages=['scikit-learn'])" " conda_packages=['scikit-learn'])"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is what the mounting point looks like:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(ds.path('mnist').as_mount())"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Submit the job to the cluster\n", "### Submit the job to the cluster\n",
"\n", "\n",
"Run the experiment by submitting the estimator object." "Run the experiment by submitting the estimator object. And you can navigate to Azure portal to monitor the run."
] ]
}, },
{ {
@@ -486,17 +502,17 @@
"\n", "\n",
"## Monitor a remote run\n", "## Monitor a remote run\n",
"\n", "\n",
"In total, the first run takes **approximately 10 minutes**. But for subsequent runs, as long as the script dependencies don't change, the same image is reused and hence the container start up time is much faster.\n", "In total, the first run takes **approximately 10 minutes**. But for subsequent runs, as long as the dependencies (`conda_packages` parameter in the above estimator constructor) don't change, the same image is reused and hence the container start up time is much faster.\n",
"\n", "\n",
"Here is what's happening while you wait:\n", "Here is what's happening while you wait:\n",
"\n", "\n",
"- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about 5 minutes**. \n", "- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is built and stored in the ACR (Azure Container Registry) associated with your workspace. Image creation and uploading takes **about 5 minutes**. \n",
"\n", "\n",
" This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.\n", " This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.\n",
"\n", "\n",
"- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\n", "- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\n",
"\n", "\n",
"- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n", "- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the files in the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n",
"\n", "\n",
"- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n", "- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n",
"\n", "\n",
@@ -526,7 +542,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"If you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." "By the way, if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
] ]
}, },
{ {
@@ -535,7 +551,7 @@
"source": [ "source": [
"### Get log results upon completion\n", "### Get log results upon completion\n",
"\n", "\n",
"Model training and monitoring happen in the background. Wait until the model has completed training before running more code. Use `wait_for_completion` to show when the model training is complete." "Model training happens in the background. You can use `wait_for_completion` to block and wait until the model has completed training before running more code. "
] ]
}, },
{ {
@@ -550,7 +566,8 @@
}, },
"outputs": [], "outputs": [],
"source": [ "source": [
"run.wait_for_completion(show_output=False) # specify True for a verbose log" "# specify show_output to True for a verbose log\n",
"run.wait_for_completion(show_output=False) "
] ]
}, },
{ {
@@ -559,7 +576,7 @@
"source": [ "source": [
"### Display run results\n", "### Display run results\n",
"\n", "\n",
"You now have a model trained on a remote cluster. Retrieve the accuracy of the model:" "You now have a model trained on a remote cluster. Retrieve all the metrics logged during the run, including the accuracy of the model:"
] ]
}, },
{ {
@@ -620,7 +637,7 @@
"source": [ "source": [
"# register model \n", "# register model \n",
"model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')\n", "model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')\n",
"print(model.name, model.id, model.version, sep = '\\t')" "print(model.name, model.id, model.version, sep='\\t')"
] ]
}, },
{ {
@@ -663,9 +680,9 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.6.2" "version": "3.6.8"
}, },
"msauthor": "sgilley" "msauthor": "haining"
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 2 "nbformat_minor": 2