mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00
version 1.0.17
@@ -102,3 +102,5 @@ pip install azureml-sdk[explain]
# install the core SDK and experimental components
pip install azureml-sdk[contrib]
```
17
README.md
@@ -1,9 +1,6 @@
# Azure Machine Learning service example notebooks

This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK
which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK
allows you the choice of using local or cloud compute resources, while managing
and maintaining the complete data science workflow from the cloud.
This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud.
@@ -18,16 +15,17 @@ You should always run the [Configuration](./configuration.ipynb) notebook first
If you want to...

* ...try out and explore Azure ML, start with image classification tutorials [part 1 training](./tutorials/img-classification-part1-training.ipynb) and [part 2 deployment](./tutorials/img-classification-part2-deploy.ipynb).
* ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/img-classification-part2-deploy.ipynb).
* ...prepare your data and do automated machine learning, start with regression tutorials: [Part 1 (Data Prep)](./tutorials/regression-part1-data-prep.ipynb) and [Part 2 (Automated ML)](./tutorials/regression-part2-automated-ml.ipynb).
* ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
* ...deploy model as realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
* ...deploy models as batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb).
* ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-mpi-batch-prediction.ipynb).
* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb).

## Tutorials

The [Tutorials](./tutorials) folder contains notebooks for the tutorials described in the [Azure Machine Learning documentation](https://aka.ms/aml-docs)
The [Tutorials](./tutorials) folder contains notebooks for the tutorials described in the [Azure Machine Learning documentation](https://aka.ms/aml-docs).

## How to use Azure ML
@@ -45,9 +43,8 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
## Documentation

* Quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
* [Python SDK reference](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py)
* Azure ML Data Prep SDK [overview](https://aka.ms/data-prep-sdk), [Python SDK reference](https://aka.ms/aml-data-prep-apiref), and [tutorials and how-tos](https://aka.ms/aml-data-prep-notebooks).

---
@@ -96,7 +96,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.0.15 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.0.17 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -5,8 +5,8 @@ Learn how to use Azure Machine Learning services for experimentation and model m
As a prerequisite, run the [configuration notebook](../configuration.ipynb) first to set up your Azure ML Workspace. Then, run the notebooks in the following recommended order.

* [train-within-notebook](./training/train-within-notebook): Train a model while tracking run history, and learn how to deploy the model as a web service to Azure Container Instance.
* [train-on-local](./training/train-on-local): Learn how to submit a run and use Azure ML managed run configuration.
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node managed compute cluster as a remote compute target for CPU or GPU based training.
* [train-on-local](./training/train-on-local): Learn how to submit a run to the local computer and use Azure ML managed run configuration.
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
* [train-on-remote-vm](./training/train-on-remote-vm): Use a Data Science Virtual Machine as a target for remote runs.
* [logging-api](./training/logging-api): Learn about the details of logging metrics to run history.
* [register-model-create-image-deploy-service](./deployment/register-model-create-image-deploy-service): Learn about the details of model management.
@@ -229,6 +229,9 @@ If a sample notebook fails with an error that property, method or library does n
1) Check that you have selected the correct kernel in Jupyter Notebook. The kernel is displayed in the top right of the notebook page. It can be changed using the `Kernel | Change Kernel` menu option. For Azure Notebooks, it should be `Python 3.6`. For local conda environments, it should be the conda environment name that you specified in automl_setup. The default is azure_automl. Note that the kernel is saved as part of the notebook, so if you switch to a new conda environment, you will have to select the new kernel in the notebook.
2) Check that the notebook is for the SDK version that you are using. You can check the SDK version by executing `azureml.core.VERSION` in a Jupyter notebook cell. You can download a previous version of the sample notebooks from GitHub by clicking the `Branch` button, selecting the `Tags` tab, and then selecting the version.
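Step 2 can be scripted. The following is a minimal sketch, not part of the Azure ML SDK: `version_tuple` and `notebook_matches_sdk` are hypothetical helpers, and the sketch assumes the notebook's target version is known as a string such as `"1.0.17"` (the version the configuration notebook in this commit reports). Only the first three numeric components are compared, so a trailing post-release component is ignored.

```python
import re


def version_tuple(version, parts=3):
    """Return the first `parts` numeric components of a version string, e.g. '1.0.17' -> (1, 0, 17)."""
    return tuple(int(number) for number in re.findall(r"\d+", version)[:parts])


def notebook_matches_sdk(notebook_version, installed_version):
    """True when the installed SDK has the same major.minor.patch as the notebook was written for."""
    return version_tuple(notebook_version) == version_tuple(installed_version)


# In a notebook cell, pass azureml.core.VERSION as the installed version.
print(notebook_matches_sdk("1.0.17", "1.0.17"))
print(notebook_matches_sdk("1.0.17", "1.0.15"))
```

If the check fails, switch to the matching tag of the sample notebooks as described in step 2.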
## Numpy import fails on Windows
Some Windows environments see an error loading numpy with the latest Python version 3.6.8. If you see this issue, try with Python version 3.6.7.
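The `azure_automl` environment file in this commit pins `python>=3.5.2,<3.6.8`, which excludes 3.6.8 and later. A minimal sketch of the same range check at run time; `python_in_supported_range` is a hypothetical helper, and the bounds are copied from that pin rather than from any official support matrix.

```python
import sys

# Bounds copied from the conda pin: python>=3.5.2,<3.6.8
MIN_PYTHON = (3, 5, 2)
MAX_PYTHON_EXCLUSIVE = (3, 6, 8)


def python_in_supported_range(version_info=None):
    """Return True if the interpreter version is >= 3.5.2 and < 3.6.8."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:3])
    return MIN_PYTHON <= current < MAX_PYTHON_EXCLUSIVE


print(python_in_supported_range((3, 6, 7)))
print(python_in_supported_range((3, 6, 8)))
```

Called without arguments, it checks the interpreter running the notebook.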
## Remote run: DsvmCompute.create fails
There are several reasons why DsvmCompute.create can fail. The reason is usually in the error message, but you have to look at the end of the error message for the detailed reason. Some common reasons are:
1) `Compute name is invalid, it should start with a letter, be between 2 and 16 character, and only include letters (a-zA-Z), numbers (0-9) and \'-\'.` Note that underscores are not allowed in the name.
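The naming rule in the error message above can be checked before calling `DsvmCompute.create`. A minimal sketch using a regular expression transcribed from that message; `is_valid_compute_name` is a hypothetical helper, and the service may enforce further constraints that the message does not list.

```python
import re

# Transcribed from the error message: starts with a letter, 2 to 16 characters in total,
# only letters (a-zA-Z), digits (0-9) and '-'. Underscores are rejected.
_COMPUTE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{1,15}")


def is_valid_compute_name(name):
    """Return True if `name` satisfies the DSVM compute naming rule."""
    return _COMPUTE_NAME.fullmatch(name) is not None


for candidate in ["mydsvm", "my-dsvm-01", "my_dsvm", "1dsvm", "a"]:
    print(candidate, is_valid_compute_name(candidate))
```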
@@ -2,7 +2,7 @@ name: azure_automl
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- python=3.6
- python>=3.5.2,<3.6.8
- nb_conda
- matplotlib==2.1.0
- numpy>=1.11.0,<1.15.0
@@ -12,6 +12,7 @@ dependencies:
- scikit-learn>=0.18.0,<=0.19.1
- pandas>=0.22.0,<0.23.0
- tensorflow>=1.12.0
- py-xgboost<=0.80

- pip:
# Required packages for AzureML execution, history, and data preparation.
@@ -2,7 +2,7 @@ name: azure_automl
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- python=3.6
- python>=3.5.2,<3.6.8
- nb_conda
- matplotlib==2.1.0
- numpy>=1.15.3
@@ -12,6 +12,7 @@ dependencies:
- scikit-learn>=0.18.0,<=0.19.1
- pandas>=0.22.0,<0.23.0
- tensorflow>=1.12.0
- py-xgboost<=0.80

- pip:
# Required packages for AzureML execution, history, and data preparation.
@@ -84,9 +84,9 @@
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-local-classification'\n",
"experiment_name = 'automl-classification-deployment'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-classification'\n",
"project_folder = './sample_projects/automl-classification-deployment'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
@@ -103,23 +103,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -289,8 +272,6 @@
"metadata": {},
"outputs": [],
"source": [
"experiment_name = 'automl-local-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)"
]
@@ -100,23 +100,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -81,8 +81,8 @@
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment and specify the project folder.\n",
"experiment_name = 'automl-local-classification'\n",
"project_folder = './sample_projects/automl-local-classification'\n",
"experiment_name = 'automl-classification'\n",
"project_folder = './sample_projects/automl-classification'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -99,23 +99,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -49,23 +49,6 @@
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more Linux distros."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -49,23 +49,6 @@
"Currently, Data Prep only supports __Ubuntu 16__ and __Red Hat Enterprise Linux 7__. We are working on supporting more Linux distros."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -70,23 +70,6 @@
"ws = Workspace.from_config()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -147,8 +147,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Splitting\n",
"For the purposes of demonstration and later forecast evaluation, we now split the data into a training and a testing set. The test set will contain the final 20 weeks of observed sales for each time-series."
"For demonstration purposes, we extract sales time-series for just a few of the stores:"
]
},
{
@@ -157,19 +156,37 @@
"metadata": {},
"outputs": [],
"source": [
"ntest_periods = 20\n",
"use_stores = [2, 5, 8]\n",
"data_subset = data[data.Store.isin(use_stores)]\n",
"nseries = data_subset.groupby(grain_column_names).ngroups\n",
"print('Data subset contains {0} individual time-series.'.format(nseries))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data Splitting\n",
"We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the grain columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n_test_periods = 20\n",
"\n",
"def split_last_n_by_grain(df, n):\n",
" \"\"\"\n",
" Group df by grain and split on last n rows for each group\n",
" \"\"\"\n",
" \"\"\"Group df by grain and split on last n rows for each group.\"\"\"\n",
" df_grouped = (df.sort_values(time_column_name) # Sort by ascending time\n",
" .groupby(grain_column_names, group_keys=False))\n",
" df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n",
" df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n",
" return df_head, df_tail\n",
"\n",
"X_train, X_test = split_last_n_by_grain(data, ntest_periods)"
"X_train, X_test = split_last_n_by_grain(data_subset, n_test_periods)"
]
},
{
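The grouped split in the cell above relies on pandas `sort_values`, `groupby`, and `iloc`. To see the logic without pandas, here is a stdlib-only sketch of the same idea on a list of dict rows: sort by time, group by the grain values, and move the last `n` rows of each series into the test set. `split_last_n_rows` is hypothetical, and the keys `week` and `Store` in the usage stand in for `time_column_name` and `grain_column_names`. One behavioral difference: with `n=0` this version returns an empty tail, whereas `iloc[:-0]` in the pandas version returns an empty head.

```python
from collections import defaultdict


def split_last_n_rows(rows, n, time_key, grain_keys):
    """Split rows into (head, tail), where tail holds the last n rows of every series.

    A series is identified by the values of `grain_keys`; rows are ordered by `time_key`.
    """
    series = defaultdict(list)
    for row in sorted(rows, key=lambda r: r[time_key]):  # sort by ascending time
        series[tuple(row[key] for key in grain_keys)].append(row)
    head, tail = [], []
    for group in series.values():
        cut = max(len(group) - n, 0)  # a series shorter than n goes entirely to the tail
        head.extend(group[:cut])
        tail.extend(group[cut:])
    return head, tail


# Two hypothetical stores with five weekly observations each.
rows = [{"Store": store, "week": week, "Quantity": 10 * store + week}
        for store in (2, 5) for week in range(1, 6)]
train, test = split_last_n_rows(rows, n=2, time_key="week", grain_keys=["Store"])
print(len(train), len(test))
```

Each store contributes its last two weeks to the test set, so the split is stratified by series as the markdown cell describes.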
@@ -187,24 +204,7 @@
"\n",
"AutoML will currently train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series.\n",
"\n",
"You are almost ready to start an AutoML training job. We will first need to create a validation set from the existing training set (i.e. for hyper-parameter tuning): "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"nvalidation_periods = 20\n",
"X_train, X_validate = split_last_n_by_grain(X_train, nvalidation_periods)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also need to separate the target column from the rest of the DataFrame: "
"You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
]
},
{
@@ -214,8 +214,7 @@
"outputs": [],
"source": [
"target_column_name = 'Quantity'\n",
"y_train = X_train.pop(target_column_name).values\n",
"y_validate = X_validate.pop(target_column_name).values "
"y_train = X_train.pop(target_column_name).values"
]
},
{
@@ -224,22 +223,31 @@
"source": [
"## Train\n",
"\n",
"The AutoMLConfig object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, and the training and validation data. \n",
"The AutoMLConfig object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, the training data, and cross-validation parameters. \n",
"\n",
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time and the grain column names. A time column is required for forecasting, while the grain is optional. If a grain is not given, the forecaster assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak. \n",
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If a grain is not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
"\n",
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning organization that needs to estimate the next month of sales would set the horizon accordingly. \n",
"\n",
"Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select the best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *X_valid* and *y_valid* parameters of AutoMLConfig.\n",
"\n",
"Here is a summary of AutoMLConfig parameters used for training the OJ model:\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|forecasting|\n",
"|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
"|**iterations**|Number of iterations. In each iteration, AutoML trains a specific pipeline on the given data|\n",
"|**X**|Training matrix of features, shape = [n_training_samples, n_features]|\n",
"|**y**|Target values, shape = [n_training_samples, ]|\n",
"|**X_valid**|Validation matrix of features, shape = [n_validation_samples, n_features]|\n",
"|**y_valid**|Target values for validation, shape = [n_validation_samples, ]\n",
"|**X**|Training matrix of features as a pandas DataFrame, shape = [n_training_samples, n_features]|\n",
"|**y**|Target values as a numpy.ndarray, shape = [n_training_samples, ]|\n",
"|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection|\n",
"|**enable_ensembling**|Allow AutoML to create ensembles of the best performing models\n",
"|**debug_log**|Log file path for writing debugging information\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
"|**time_column_name**|Name of the datetime column in the input data|\n",
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|"
]
},
{
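Rolling origin evaluation, as described in the linked article, trains on all observations before a forecast origin, evaluates on the observations that follow it, and then moves the origin forward. The following is a generic sketch for a single series, not AutoML's internal implementation: how AutoML positions its origins and sizes its validation windows is not stated here, so `rolling_origin_folds`, the `step` parameter, and the choice to end the last window at the final observation are all assumptions.

```python
def rolling_origin_folds(n_obs, n_folds, horizon, step=1):
    """Return (train_indices, test_indices) pairs for rolling-origin evaluation of one series.

    Each fold trains on every observation before its origin and tests on the next `horizon`
    observations; successive origins move forward by `step`, and the last test window ends
    at the final observation.
    """
    folds = []
    for i in range(n_folds):
        origin = n_obs - horizon - (n_folds - 1 - i) * step
        if origin <= 0:
            raise ValueError("series too short for the requested folds and horizon")
        folds.append((list(range(origin)), list(range(origin, origin + horizon))))
    return folds


for train_idx, test_idx in rolling_origin_folds(n_obs=10, n_folds=3, horizon=2):
    print(len(train_idx), test_idx)
```

Unlike K-Fold, every training index precedes every test index within a fold, so no fold is fit on observations from its own future.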
@@ -248,10 +256,11 @@
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
"time_series_settings = {\n",
" 'time_column_name': time_column_name,\n",
" 'grain_column_names': grain_column_names,\n",
" 'drop_column_names': ['logQuantity']\n",
" 'drop_column_names': ['logQuantity'],\n",
" 'max_horizon': n_test_periods\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting',\n",
@@ -260,12 +269,11 @@
" iterations=10,\n",
" X=X_train,\n",
" y=y_train,\n",
" X_valid=X_validate,\n",
" y_valid=y_validate,\n",
" n_cross_validations=5,\n",
" enable_ensembling=False,\n",
" path=project_folder,\n",
" verbosity=logging.INFO,\n",
" **automl_settings)"
" **time_series_settings)"
]
},
{
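To make the horizon arithmetic concrete: with weekly data and `max_horizon` set to `n_test_periods = 20`, a trained model covers the 20 weeks after the latest training date of each series. A minimal sketch with the standard `datetime` module; `forecast_dates` and the starting date are hypothetical, and the weekly frequency is passed explicitly here as an assumption.

```python
from datetime import date, timedelta


def forecast_dates(last_training_date, max_horizon, frequency=timedelta(weeks=1)):
    """List the period dates covered by a forecast of length max_horizon, one per period."""
    return [last_training_date + frequency * step for step in range(1, max_horizon + 1)]


n_test_periods = 20
dates = forecast_dates(date(2019, 1, 7), n_test_periods)  # hypothetical last training week
print(dates[0], dates[-1], len(dates))
```

A demand planner needing one month of weekly forecasts would instead pass a horizon of about 4 or 5.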
@@ -102,23 +102,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -74,9 +74,9 @@
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-local-classification'\n",
"experiment_name = 'automl-model-explanation'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-classification-model-explanation'\n",
"project_folder = './sample_projects/automl-model-explanation'\n",
"\n",
"experiment=Experiment(ws, experiment_name)\n",
"\n",
@@ -93,23 +93,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -96,23 +96,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -104,23 +104,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -130,7 +113,7 @@
"1. Create a Linux DSVM in Azure, following these [quick instructions](https://docs.microsoft.com/en-us/azure/machine-learning/desktop-workbench/how-to-create-dsvm-hdi). Make sure you use the Ubuntu flavor (not CentOS). Make sure that disk space is available under `/tmp` because AutoML creates files under `/tmp/azureml_runs`. The DSVM should have more cores than the number of parallel runs that you plan to enable. It should also have at least 4GB per core.\n",
"2. Enter the IP address, user name and password below.\n",
"\n",
"**Note:** By default, SSH runs on port 22 and you don't need to change the port number below. If you've configured SSH to use a different port, change `dsvm_ssh_port` accordingly. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on changing SSH ports for security reasons."
"**Note:** By default, SSH runs on port 22 and you don't need to change the port number below. If you've configured SSH to use a different port, change `dsvm_ssh_port` accordingly. [Read more](https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/detailed-troubleshoot-ssh-connection) on changing SSH ports for security reasons."
]
},
{
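The sizing guidance in step 1 reduces to two inequalities: more cores than the planned number of parallel runs, and at least 4 GB of memory per core. A minimal sketch; `dsvm_meets_guidance` is a hypothetical helper, and the core and memory figures in the usage are illustrative, not specific Azure VM sizes.

```python
def dsvm_meets_guidance(cores, memory_gb, parallel_runs, min_gb_per_core=4):
    """Check the DSVM guidance: more cores than parallel runs and at least 4 GB of memory per core."""
    enough_cores = cores > parallel_runs
    enough_memory = memory_gb >= min_gb_per_core * cores
    return enough_cores and enough_memory


print(dsvm_meets_guidance(cores=8, memory_gb=32, parallel_runs=4))
print(dsvm_meets_guidance(cores=4, memory_gb=16, parallel_runs=4))
print(dsvm_meets_guidance(cores=8, memory_gb=28, parallel_runs=4))
```

The second call fails because the core count must strictly exceed the number of parallel runs; the third fails on memory.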
@@ -67,6 +67,7 @@
"source": [
"import logging\n",
"import os\n",
"import csv\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
@@ -89,7 +90,7 @@
"\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-amlcompute'\n",
"project_folder = './sample_projects/automl-remote-amlcompute'\n",
"project_folder = './project'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -106,23 +107,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -171,6 +155,51 @@
" # For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions, you need to make the data accessible from the remote compute.\n",
"This can be done by uploading the data to DataStore.\n",
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_train = datasets.load_digits()\n",
"\n",
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)\n",
" \n",
"pd.DataFrame(data_train.data).to_csv(\"data/X_train.tsv\", index=False, header=False, quoting=csv.QUOTE_ALL, sep=\"\\t\")\n",
"pd.DataFrame(data_train.target).to_csv(\"data/y_train.tsv\", index=False, header=False, sep=\"\\t\")\n",
"\n",
"ds = ws.get_default_datastore()\n",
"ds.upload(src_dir='./data', target_path='bai_data', overwrite=True, show_progress=True)\n",
"\n",
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='bai_data', \n",
" path_on_compute='/tmp/azureml_runs',\n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -188,29 +217,13 @@
"conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n",
"# set the data reference of the run configuration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"In this example, the `get_data()` function returns data using scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -219,17 +232,13 @@
"source": [
"%%writefile $project_folder/get_data.py\n",
"\n",
"from sklearn import datasets\n",
"from scipy import sparse\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def get_data():\n",
" X_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
"\n",
" digits = datasets.load_digits()\n",
" X_train = digits.data\n",
" y_train = digits.target\n",
"\n",
" return { \"X\" : X_train, \"y\" : y_train }"
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
]
},
{

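In the remote AmlCompute and DSVM notebooks, the upload cell writes `X_train.tsv` with every field quoted (`csv.QUOTE_ALL`) and `y_train.tsv` unquoted, and the new `get_data()` reads them back from `/tmp/azureml_runs/bai_data/` (or `re_data/`), which is `path_on_compute` joined with `path_on_datastore`. The sketch below mimics that round trip with the standard `csv` module instead of pandas, in a temporary directory standing in for the compute target; the toy values are not the digits data, and the directory layout is inferred from the hard-coded paths in `get_data()`.

```python
import csv
import os
import tempfile


def write_tsv(path, rows, quote_all):
    """Write header-less rows as tab-separated values, optionally quoting every field."""
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter="\t", quoting=quoting).writerows(rows)


def read_tsv(path):
    """Read a header-less TSV into lists of floats; quotechar='"' strips the quotes."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t", quotechar='"')
        return [[float(value) for value in row] for row in reader]


# Stand-in for path_on_compute ('/tmp/azureml_runs') joined with path_on_datastore ('bai_data').
compute_root = tempfile.mkdtemp()
data_dir = os.path.join(compute_root, "bai_data")
os.makedirs(data_dir)

X = [[0.0, 1.0, 16.0], [3.0, 0.0, 7.0]]  # toy features
y = [[0], [1]]                            # one label per row, single column

write_tsv(os.path.join(data_dir, "X_train.tsv"), X, quote_all=True)
write_tsv(os.path.join(data_dir, "y_train.tsv"), y, quote_all=False)

X_back = read_tsv(os.path.join(data_dir, "X_train.tsv"))
# y comes back as a one-column table; taking column 0 mirrors y_train[0].values.
y_back = [int(row[0]) for row in read_tsv(os.path.join(data_dir, "y_train.tsv"))]
print(X_back, y_back)
```

Because `X_train.tsv` is written with `QUOTE_ALL`, the reader must be told the quote character; otherwise the values would arrive as strings such as `"0.0"` with the quotes still attached.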
@@ -99,23 +99,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -123,7 +106,7 @@
"### Create a Remote Linux DSVM\n",
"Note: If creation fails with a message about Marketplace purchase eligibility, go to portal.azure.com, start creating a DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating the VM.\n",
"\n",
"**Note**: By default, SSH runs on port 22 and you don't need to specify it. If, for security reasons, you switch to a different port (such as 5022), append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this."
"**Note**: By default, SSH runs on port 22 and you don't need to specify it. If, for security reasons, you switch to a different port (such as 5022), append the port number to the address. [Read more](https://docs.microsoft.com/en-us/azure/virtual-machines/troubleshooting/detailed-troubleshoot-ssh-connection) on this."
]
},
{

@@ -68,6 +68,7 @@
"import logging\n",
"import os\n",
"import time\n",
"import csv\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
@@ -90,7 +91,7 @@
"\n",
"# Choose a name for the run history container in the workspace.\n",
"experiment_name = 'automl-remote-dsvm'\n",
"project_folder = './sample_projects/automl-remote-dsvm'\n",
"project_folder = './project'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -107,23 +108,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -153,6 +137,44 @@
" time.sleep(90) # Wait for ssh to be accessible"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions, you need to make the data accessible from the remote compute.\n",
"This can be done by uploading the data to a datastore.\n",
"In this example, we upload scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data_train = datasets.load_digits()\n",
"\n",
"if not os.path.isdir('data'):\n",
" os.mkdir('data')\n",
" \n",
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)\n",
" \n",
"pd.DataFrame(data_train.data).to_csv(\"data/X_train.tsv\", index=False, header=False, quoting=csv.QUOTE_ALL, sep=\"\\t\")\n",
"pd.DataFrame(data_train.target).to_csv(\"data/y_train.tsv\", index=False, header=False, sep=\"\\t\")\n",
"\n",
"ds = ws.get_default_datastore()\n",
"ds.upload(src_dir='./data', target_path='re_data', overwrite=True, show_progress=True)\n",
"\n",
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name, \n",
" path_on_datastore='re_data', \n",
" path_on_compute='/tmp/azureml_runs',\n",
" mode='download', # download files from datastore to compute target\n",
" overwrite=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -168,29 +190,13 @@
"# Set compute target to the Linux DSVM\n",
"conda_run_config.target = dsvm_compute\n",
"\n",
"# set the data reference of the run configuration\n",
"conda_run_config.data_references = {ds.name: dr}\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n",
"In this example, the `get_data()` function returns data using scikit-learn's [load_digits](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not os.path.exists(project_folder):\n",
" os.makedirs(project_folder)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -199,17 +205,13 @@
"source": [
"%%writefile $project_folder/get_data.py\n",
"\n",
"from sklearn import datasets\n",
"from scipy import sparse\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def get_data():\n",
" X_train = pd.read_csv(\"/tmp/azureml_runs/re_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" y_train = pd.read_csv(\"/tmp/azureml_runs/re_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
"\n",
" digits = datasets.load_digits()\n",
" X_train = digits.data[100:,:]\n",
" y_train = digits.target[100:]\n",
"\n",
" return { \"X\" : X_train, \"y\" : y_train }"
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
]
},
{

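The DSVM notebook waits a fixed `time.sleep(90)` for SSH to become reachable. An alternative, sketched here with only the standard library, is to poll the SSH port until a TCP connection succeeds. `wait_for_port` is a hypothetical helper; a successful connect only shows that something is listening on the port (not that SSH login will work), and the 90-second default simply mirrors the notebook's sleep.

```python
import socket
import time


def wait_for_port(host, port, timeout=90.0, interval=5.0):
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

In the notebook this could replace the fixed sleep with something like `wait_for_port(dsvm_address, 22)`, where `dsvm_address` is a placeholder for the VM's host name; the call returns as soon as the port answers instead of always waiting the full 90 seconds.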
@@ -75,7 +75,7 @@
"experiment_name = 'non_sample_weight_experiment'\n",
"sample_weight_experiment_name = 'sample_weight_experiment'\n",
"\n",
"project_folder = './sample_projects/automl-local-classification'\n",
"project_folder = './sample_projects/sample_weight'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"sample_weight_experiment=Experiment(ws, sample_weight_experiment_name)\n",
@@ -93,23 +93,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},

@@ -79,9 +79,9 @@
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for the experiment\n",
"experiment_name = 'automl-local-missing-data'\n",
"experiment_name = 'sparse-data-train-test-split'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-local-missing-data'\n",
"project_folder = './sample_projects/sparse-data-train-test-split'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
@@ -98,23 +98,6 @@
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},

@@ -88,23 +88,6 @@
"pd.DataFrame(data = output, index = ['']).T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"set_diagnostics_collection(send_diagnostics = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},

@@ -11,13 +11,6 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -60,14 +53,10 @@
"metadata": {},
"outputs": [],
"source": [
"# import the Workspace class and check the azureml SDK version\n",
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config(auth = auth)\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
"# Set auth to be used by workspace related APIs.\n",
"# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"auth = None"
]
},
{
@@ -79,7 +68,7 @@
"# import the Workspace class and check the azureml SDK version\n",
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"ws = Workspace.from_config(auth = auth)\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
@@ -350,9 +339,6 @@
"authors": [
{
"name": "pasha"
},
{
"name": "wamartin"
}
],
"kernelspec": {
@@ -370,9 +356,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
"version": "3.6.6"
},
"name": "03.Build_model_runHistory",
"name": "build-model-run-history-03",
"notebookId": 3836944406456339
},
"nbformat": 4,

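Several notebooks in this commit add an `auth = None` cell and pass `auth` to `Workspace.from_config`, `Workspace.create` and `Workspace(...)`, while the Databricks test cells build `ServicePrincipalAuthentication(tenant, username, password)` from widget values. A minimal sketch of choosing between the two in a CI job, assuming hypothetical environment variable names `AML_TENANT_ID`, `AML_SP_ID` and `AML_SP_PASSWORD`; the Azure ML calls are left as comments so the snippet runs without the SDK installed.

```python
import os

# Hypothetical environment variable names; use whatever your CI system provides.
SP_ENV_VARS = ("AML_TENANT_ID", "AML_SP_ID", "AML_SP_PASSWORD")


def service_principal_args(environ=os.environ):
    """Return (tenant_id, service_principal_id, password) if all are set, else None.

    None means: keep auth = None and let the SDK use its default (interactive) login.
    """
    values = tuple(environ.get(name) for name in SP_ENV_VARS)
    if all(values):
        return values
    return None


creds = service_principal_args()
# With azureml-core installed (not executed here):
# from azureml.core.authentication import ServicePrincipalAuthentication
# auth = ServicePrincipalAuthentication(*creds) if creds else None
# ws = Workspace.from_config(auth = auth)
```

Requiring all three values keeps a half-configured environment from silently producing a broken service principal object; it falls back to the notebooks' `auth = None` behaviour instead.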
@@ -20,13 +20,6 @@
"Please register the Azure Container Instance (ACI) resource provider in your subscription using the Azure portal (https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-supported-services#portal) before using the SDK to deploy your ML model to ACI."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -45,15 +38,10 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"#'''\n",
"ws = Workspace.from_config(auth = auth)\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')\n",
"#'''"
"# Set auth to be used by workspace related APIs.\n",
"# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"auth = None"
]
},
{
@@ -63,18 +51,12 @@
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"import azureml.core\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)\n",
"\n",
"#'''\n",
"ws = Workspace.from_config()\n",
"ws = Workspace.from_config(auth = auth)\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')\n",
"#'''"
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
@@ -293,24 +275,14 @@
"outputs": [],
"source": [
"#comment to not delete the web service\n",
"#myservice.delete()"
"myservice.delete()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "pasha"
},
{
"name": "wamartin"
}
],
"kernelspec": {
@@ -328,9 +300,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
"version": "3.6.6"
},
"name": "04.DeploytoACI",
"name": "deploy-to-aci-04",
"notebookId": 3836944406456376
},
"nbformat": 4,

@@ -0,0 +1,236 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Azure ML & Azure Databricks notebooks by Parashar Shah.\n",
"\n",
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook uses the image from the ACI notebook for deploying to AKS."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set auth to be used by workspace related APIs.\n",
"# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"auth = None"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config(auth = auth)\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# List images by ws\n",
"\n",
"from azureml.core.image import ContainerImage\n",
"for i in ContainerImage.list(workspace = ws):\n",
" print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import Image\n",
"myimage = Image(workspace=ws, name=\"aciws\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#create AKS compute\n",
"#it may take 20-25 minutes to create a new cluster\n",
"\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Use the default configuration (can also provide parameters to customize)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'ps-aks-demo2' \n",
"\n",
"# Create the cluster\n",
"aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)\n",
"\n",
"aks_target.wait_for_completion(show_output = True)\n",
"\n",
"print(aks_target.provisioning_state)\n",
"print(aks_target.provisioning_errors)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice\n",
"help(Webservice.deploy_from_image)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.core.image import ContainerImage\n",
"\n",
"#Set the web service configuration (using default here with app insights)\n",
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)\n",
"\n",
"#unique service name\n",
"service_name ='ps-aks-service'\n",
"\n",
"# Webservice creation using single command, there is a variant to use image directly as well.\n",
"aks_service = Webservice.deploy_from_image(\n",
" workspace=ws, \n",
" name=service_name,\n",
" deployment_config = aks_config,\n",
" image = myimage,\n",
" deployment_target = aks_target\n",
" )\n",
"\n",
"aks_service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aks_service.deployment_status"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#for using the Web HTTP API \n",
"print(aks_service.scoring_uri)\n",
"print(aks_service.get_keys())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# get some sample data\n",
"test_data_path = \"AdultCensusIncomeTest\"\n",
"test = spark.read.parquet(test_data_path).limit(5)\n",
"\n",
"test_json = json.dumps(test.toJSON().collect())\n",
"\n",
"print(test_json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#using data defined above predict if income is >50K (1) or <=50K (0)\n",
"aks_service.run(input_data=test_json)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# comment out the delete calls below to keep the web service and AKS cluster\n",
"aks_service.delete()\n",
"#image.delete()\n",
"#model.delete()\n",
"aks_target.delete() "
]
}
],
"metadata": {
"authors": [
{
"name": "pasha"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
},
"name": "deploy-to-aks-existingimage-05",
"notebookId": 1030695628045968
},
"nbformat": 4,
"nbformat_minor": 1
}
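
The AKS notebook serializes its test rows with `json.dumps(test.toJSON().collect())`. Spark's `DataFrame.toJSON()` yields one JSON string per row, so the payload is a JSON array whose elements are themselves JSON strings, i.e. encoded twice. The sketch below reproduces that shape with the standard library and shows the matching two-step decode; the column names are made up, and whether the scoring script inside the `aciws` image decodes its input this way is not shown in this diff.

```python
import json

# Stand-in for test.toJSON().collect(): one JSON *string* per row (hypothetical columns).
rows_as_json = [
    json.dumps({"age": 39, "hours_per_week": 40}),
    json.dumps({"age": 50, "hours_per_week": 13}),
]

# What the notebook sends: a JSON array of JSON strings.
test_json = json.dumps(rows_as_json)


def decode_payload(raw):
    """Undo both encoding layers: first the outer array, then each row string."""
    return [json.loads(row) for row in json.loads(raw)]


records = decode_payload(test_json)
print(records[0]["age"])  # prints 39
```

A scoring script that calls `json.loads` only once would receive a list of strings rather than a list of records, which is a common source of confusing errors with this payload shape.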
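
The cell that prints `aks_service.scoring_uri` and `aks_service.get_keys()` is there for calling the service over plain HTTP instead of `aks_service.run()`. A sketch of such a request with `urllib`, assuming key-based authentication sent as `Authorization: Bearer <key>` with a JSON body, and assuming `get_keys()` returns the primary key first; the URI and key below are placeholders, and the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload):
    """Build (but do not send) a POST request to a key-authenticated scoring endpoint.

    Assumes the service expects 'Authorization: Bearer <key>' and a JSON string body.
    """
    return urllib.request.Request(
        scoring_uri,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Placeholders; in the notebook these would come from aks_service.scoring_uri
# and aks_service.get_keys()[0].
sample_payload = json.dumps([json.dumps({"age": 39})])
req = build_scoring_request("http://example.invalid/score", "<primary-key>", sample_payload)
# response = urllib.request.urlopen(req)  # not executed in this sketch
```

Building the request separately from sending it makes the headers and body easy to inspect before pointing the code at a live endpoint.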
@@ -11,13 +11,6 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -42,7 +35,7 @@
"outputs": [],
"source": [
"# Download AdultCensusIncome.csv from Azure CDN. This file has 32,561 rows.\n",
"basedataurl = \"https://amldockerdatasets.azureedge.net\"\n",
"dataurl = \"https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv\"\n",
"datafile = \"AdultCensusIncome.csv\"\n",
"datafile_dbfs = os.path.join(\"/dbfs\", datafile)\n",
"\n",
@@ -50,7 +43,7 @@
" print(\"found {} at {}\".format(datafile, datafile_dbfs))\n",
"else:\n",
" print(\"downloading {} to {}\".format(datafile, datafile_dbfs))\n",
" urllib.request.urlretrieve(os.path.join(basedataurl, datafile), datafile_dbfs)"
" urllib.request.urlretrieve(dataurl, datafile_dbfs)"
]
},
{
@@ -152,9 +145,6 @@
"authors": [
{
"name": "pasha"
},
{
"name": "wamartin"
}
],
"kernelspec": {
@@ -172,9 +162,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
"version": "3.6.6"
},
"name": "02.Ingest_data",
"name": "ingest-data-02",
"notebookId": 3836944406456362
},
"nbformat": 4,

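The ingest-data notebook downloads `AdultCensusIncome.csv` only when it is not already present, and this commit replaces `os.path.join(basedataurl, datafile)` with the full `dataurl`: `os.path.join` happens to produce a valid URL on Linux but would insert `\` on Windows. A sketch of the same download-if-missing pattern, with `urllib.parse.urljoin` for the URL and the fetch function injectable so the logic can be exercised offline; `ensure_local_copy` and `fake_fetch` are hypothetical names, and the demo writes a placeholder file instead of downloading.

```python
import os
import tempfile
import urllib.parse
import urllib.request


def ensure_local_copy(url, local_path, fetch=urllib.request.urlretrieve):
    """Download url to local_path unless a file already exists there.

    fetch defaults to urllib.request.urlretrieve(url, filename); it is a parameter
    only so the caching logic can be tested without network access.
    Returns True if a download happened, False if the existing file was reused.
    """
    if os.path.isfile(local_path):
        print("found {} at {}".format(os.path.basename(local_path), local_path))
        return False
    print("downloading {} to {}".format(url, local_path))
    fetch(url, local_path)
    return True


# Join with URL semantics rather than OS path semantics.
base_url = "https://amldockerdatasets.azureedge.net/"
data_url = urllib.parse.urljoin(base_url, "AdultCensusIncome.csv")

# Offline demo: a stand-in fetcher writes a placeholder file instead of downloading.
fetched = []


def fake_fetch(url, path):
    fetched.append(url)
    with open(path, "w") as f:
        f.write("placeholder\n")


local_file = os.path.join(tempfile.mkdtemp(), "AdultCensusIncome.csv")
first = ensure_local_copy(data_url, local_file, fetch=fake_fetch)   # downloads
second = ensure_local_copy(data_url, local_file, fetch=fake_fetch)  # reuses the file
```

On Databricks the local path would be the `/dbfs/...` location used in the notebook, so the cached copy survives cluster restarts.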
@@ -35,13 +35,6 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -67,6 +60,18 @@
"# workspace_region = \"<your-resource group-region>\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set auth to be used by workspace related APIs.\n",
"# For automation or CI/CD ServicePrincipalAuthentication can be used.\n",
"# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n",
"auth = None"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -82,6 +87,7 @@
" subscription_id = subscription_id,\n",
" resource_group = resource_group, \n",
" location = workspace_region,\n",
" auth = auth,\n",
" exist_ok=True)"
]
},
@@ -103,12 +109,13 @@
"source": [
"ws = Workspace(workspace_name = workspace_name,\n",
" subscription_id = subscription_id,\n",
" resource_group = resource_group)\n",
" resource_group = resource_group,\n",
" auth = auth)\n",
"\n",
"# persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
"ws.write_config()\n",
"##if you need to give a different path/filename please use this\n",
"##write_config(path=\"/databricks/driver/aml_config/\",file_name=<alias_conf.cfg>)"
"#if you need to give a different path/filename please use this\n",
"#write_config(path=\"/databricks/driver/aml_config/\",file_name=<alias_conf.cfg>)"
]
},
{
@@ -129,29 +136,19 @@
"# import the Workspace class and check the azureml SDK version\n",
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"ws = Workspace.from_config(auth = auth)\n",
"#ws = Workspace.from_config(<full path>)\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "pasha"
},
{
"name": "wamartin"
}
],
"kernelspec": {
@@ -169,10 +166,10 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
"version": "3.6.6"
},
"name": "01.Installation_and_Configuration",
"notebookId": 3836944406456490
"name": "installation-and-configuration-01",
"notebookId": 3688394266452835
},
"nbformat": 4,
"nbformat_minor": 1

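In the installation notebook, `ws.write_config()` persists the subscription id, resource group name and workspace name to `aml_config/config.json`, the commented line shows that `path` and `file_name` can be overridden, and `Workspace.from_config()` later reads the file back. The sketch imitates that round trip with plain JSON, assuming the file holds the keys `subscription_id`, `resource_group` and `workspace_name`; the helper names are hypothetical, and the exact layout the SDK writes may differ between versions, so the SDK calls remain the way to do this in practice.

```python
import json
import os
import tempfile


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           path="aml_config", file_name="config.json"):
    """Write a config.json-style file, imitating what ws.write_config() persists."""
    os.makedirs(path, exist_ok=True)
    full_path = os.path.join(path, file_name)
    with open(full_path, "w") as f:
        json.dump({"subscription_id": subscription_id,
                   "resource_group": resource_group,
                   "workspace_name": workspace_name}, f, indent=4)
    return full_path


def read_workspace_config(full_path):
    """Load the three identifiers back, as Workspace.from_config() needs them."""
    with open(full_path) as f:
        config = json.load(f)
    return config["subscription_id"], config["resource_group"], config["workspace_name"]


# Demo in a temporary directory with placeholder values.
demo_dir = os.path.join(tempfile.mkdtemp(), "aml_config")
cfg_path = write_workspace_config("<subscription-id>", "<resource-group>", "<workspace-name>",
                                  path=demo_dir)
print(read_workspace_config(cfg_path))
```

Writing the identifiers once and reading them everywhere else is what lets the later notebooks call `Workspace.from_config(auth = auth)` without repeating the subscription details.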
@@ -83,11 +83,10 @@
"metadata": {},
"outputs": [],
"source": [
"##PUBLISHONLY\n",
"#subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
"#resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
"#workspace_name = \"<workspace to be created>\" #your workspace name\n",
"#workspace_region = \"<azureregion>\" #your region"
"subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
"resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
"workspace_name = \"<workspace to be created>\" #your workspace name\n",
"workspace_region = \"<azureregion>\" #your region"
]
},
{
@@ -113,35 +112,6 @@
"metadata": {},
"outputs": [],
"source": [
"##TESTONLY\n",
"# import auth creds from notebook parameters\n",
"tenant = dbutils.widgets.get('tenant_id')\n",
"username = dbutils.widgets.get('service_principal_id')\n",
"password = dbutils.widgets.get('service_principal_password')\n",
"\n",
"auth = azureml.core.authentication.ServicePrincipalAuthentication(tenant, username, password)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"##TESTONLY\n",
"subscription_id = dbutils.widgets.get('subscription_id')\n",
"resource_group = dbutils.widgets.get('resource_group')\n",
"workspace_name = dbutils.widgets.get('workspace_name')\n",
"workspace_region = dbutils.widgets.get('workspace_region')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"##TESTONLY\n",
"# Import the Workspace class and check the Azure ML SDK version.\n",
"from azureml.core import Workspace\n",
"\n",
@@ -149,29 +119,10 @@
|
||||
" subscription_id = subscription_id,\n",
|
||||
" resource_group = resource_group, \n",
|
||||
" location = workspace_region, \n",
|
||||
" auth = auth,\n",
|
||||
" exist_ok=True)\n",
|
||||
"ws.get_details()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##PUBLISHONLY\n",
|
||||
"## Import the Workspace class and check the Azure ML SDK version.\n",
|
||||
"#from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"#ws = Workspace.create(name = workspace_name,\n",
|
||||
"# subscription_id = subscription_id,\n",
|
||||
"# resource_group = resource_group, \n",
|
||||
"# location = workspace_region, \n",
|
||||
"# exist_ok=True)\n",
|
||||
"#ws.get_details()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -186,35 +137,16 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##TESTONLY\n",
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"ws = Workspace(workspace_name = workspace_name,\n",
|
||||
" subscription_id = subscription_id,\n",
|
||||
" resource_group = resource_group,\n",
|
||||
" auth = auth)\n",
|
||||
" resource_group = resource_group)\n",
|
||||
"\n",
|
||||
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
||||
"ws.write_config()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##PUBLISHONLY\n",
|
||||
"#from azureml.core import Workspace\n",
|
||||
"#\n",
|
||||
"#ws = Workspace(workspace_name = workspace_name,\n",
|
||||
"# subscription_id = subscription_id,\n",
|
||||
"# resource_group = resource_group)\n",
|
||||
"#\n",
|
||||
"## Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
||||
"#ws.write_config()"
|
||||
]
|
||||
},
|
||||
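The `write_config()` call in the cells above persists the subscription id, resource group name and workspace name so that later notebooks can rebuild the workspace with `Workspace.from_config()`. The stdlib-only sketch below illustrates that round-trip. It is not the SDK's implementation: the helper names `save_workspace_config` and `load_workspace_config` are hypothetical, and the three key names are assumed to match the SDK's `config.json`.

```python
import json
import os

# Keys assumed to be present in the SDK-written config.json.
REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def save_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Hypothetical stand-in for ws.write_config(): write config.json into `directory`."""
    os.makedirs(directory, exist_ok=True)
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path = os.path.join(directory, "config.json")
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config_path


def load_workspace_config(config_path):
    """Hypothetical stand-in for the file lookup done by Workspace.from_config()."""
    with open(config_path) as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise KeyError("config.json is missing: {}".format(sorted(missing)))
    return config
```

Keeping only identifiers (and no secrets) in this file is what makes it reasonable to persist next to the notebooks.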
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
|
||||
@@ -99,11 +99,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##PUBLISHONLY\n",
|
||||
"#subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
|
||||
"#resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
|
||||
"#workspace_name = \"<workspace to be created>\" #your workspace name\n",
|
||||
"#workspace_region = \"<azureregion>\" #your region"
|
||||
"subscription_id = \"<Your SubscriptionId>\" #you should be owner or contributor\n",
|
||||
"resource_group = \"<Resource group - new or existing>\" #you should be owner or contributor\n",
|
||||
"workspace_name = \"<workspace to be created>\" #your workspace name\n",
|
||||
"workspace_region = \"<azureregion>\" #your region"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -129,35 +128,6 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##TESTONLY\n",
|
||||
"# import auth creds from notebook parameters\n",
|
||||
"tenant = dbutils.widgets.get('tenant_id')\n",
|
||||
"username = dbutils.widgets.get('service_principal_id')\n",
|
||||
"password = dbutils.widgets.get('service_principal_password')\n",
|
||||
"\n",
|
||||
"auth = azureml.core.authentication.ServicePrincipalAuthentication(tenant, username, password)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##TESTONLY\n",
|
||||
"subscription_id = dbutils.widgets.get('subscription_id')\n",
|
||||
"resource_group = dbutils.widgets.get('resource_group')\n",
|
||||
"workspace_name = dbutils.widgets.get('workspace_name')\n",
|
||||
"workspace_region = dbutils.widgets.get('workspace_region')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##TESTONLY\n",
|
||||
"# Import the Workspace class and check the Azure ML SDK version.\n",
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
@@ -165,29 +135,10 @@
|
||||
" subscription_id = subscription_id,\n",
|
||||
" resource_group = resource_group, \n",
|
||||
" location = workspace_region, \n",
|
||||
" auth = auth,\n",
|
||||
" exist_ok=True)\n",
|
||||
"ws.get_details()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##PUBLISHONLY\n",
|
||||
"## Import the Workspace class and check the Azure ML SDK version.\n",
|
||||
"#from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"#ws = Workspace.create(name = workspace_name,\n",
|
||||
"# subscription_id = subscription_id,\n",
|
||||
"# resource_group = resource_group, \n",
|
||||
"# location = workspace_region, \n",
|
||||
"# exist_ok=True)\n",
|
||||
"#ws.get_details()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -202,35 +153,15 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##TESTONLY\n",
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"ws = Workspace(workspace_name = workspace_name,\n",
|
||||
" subscription_id = subscription_id,\n",
|
||||
" resource_group = resource_group,\n",
|
||||
" auth = auth)\n",
|
||||
" resource_group = resource_group)\n",
|
||||
"\n",
|
||||
"# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
||||
"ws.write_config()\n",
|
||||
"#write_config(path=\"/databricks/driver/aml_config/\",file_name=<alias_conf.cfg>)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"##PUBLISHONLY\n",
|
||||
"#from azureml.core import Workspace\n",
|
||||
"#\n",
|
||||
"#ws = Workspace(workspace_name = workspace_name,\n",
|
||||
"# subscription_id = subscription_id,\n",
|
||||
"# resource_group = resource_group)\n",
|
||||
"#\n",
|
||||
"## Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n",
|
||||
"#ws.write_config()\n",
|
||||
"#write_config(path=\"/databricks/driver/aml_config/\",file_name=<alias_conf.cfg>)"
|
||||
"#ws.write_config(path=\"/databricks/driver/aml_config/\", file_name=<alias_conf.cfg>)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -12,8 +12,8 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Azure Machine Learning Pipeline with DataTransferStep\n",
|
||||
"This notebook is used to demonstrate the use of DataTransferStep in Azure Machine Learning Pipeline.\n",
|
||||
"# Azure Machine Learning Pipeline with DataTransferStep\n",
|
||||
"This notebook demonstrates how to use DataTransferStep in an Azure Machine Learning Pipeline.\n",
|
||||
"\n",
|
||||
"In certain cases, you will need to transfer data from one data location to another. For example, your data may be in Files storage and you may want to move it to Blob storage, or it may be in an ADLS account and you want to make it available in Blob storage. The built-in **DataTransferStep** class helps you transfer data in these situations.\n",
|
||||
"\n",
|
||||
|
||||
@@ -67,8 +67,7 @@
|
||||
"source": [
|
||||
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n",
|
||||
"\n",
|
||||
"If you don't have a config.json file, please go through the configuration Notebook located here:\n",
|
||||
"https://github.com/Azure/MachineLearningNotebooks. \n",
|
||||
"If you don't have a config.json file, please go through the configuration Notebook located [here](https://github.com/Azure/MachineLearningNotebooks). \n",
|
||||
"\n",
|
||||
"This sets you up with a working config file that has information on your workspace, subscription id, etc. "
|
||||
]
|
||||
@@ -80,7 +79,11 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||
"\n",
|
||||
"print('Workspace Name: ' + ws.name, \n",
|
||||
" 'Azure Region: ' + ws.location, \n",
|
||||
" 'Subscription Id: ' + ws.subscription_id, \n",
|
||||
" 'Resource Group: ' + ws.resource_group, sep = '\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -114,7 +117,8 @@
|
||||
" batch_compute = BatchCompute(ws, batch_compute_name)\n",
|
||||
"except ComputeTargetException:\n",
|
||||
" print('Attaching Batch compute...')\n",
|
||||
" provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, account_name=batch_account_name)\n",
|
||||
" provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, \n",
|
||||
" account_name=batch_account_name)\n",
|
||||
" batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n",
|
||||
" batch_compute.wait_for_completion()\n",
|
||||
" print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n",
|
||||
@@ -127,7 +131,19 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Setup DataStore"
|
||||
"## Setup Datastore"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Setting up the Blob storage associated with the workspace. \n",
|
||||
"The following call retrieves the Azure Blob Store associated with your workspace. \n",
|
||||
"Note that workspaceblobstore is **the name of this store and CANNOT BE CHANGED and must be used as is**. \n",
|
||||
" \n",
|
||||
"If you want to register another Datastore, please follow the instructions from here:\n",
|
||||
"https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data#register-a-datastore"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -136,11 +152,12 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Blob storage associated with the workspace\n",
|
||||
"# The following call GETS the Azure Blob Store associated with your workspace.\n",
|
||||
"# Note that workspaceblobstore is **the name of this store and CANNOT BE CHANGED and must be used as is** \n",
|
||||
"default_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
|
||||
"print(\"Blobstore name: {}\".format(def_blob_store.name))"
|
||||
"datastore = Datastore(ws, \"workspaceblobstore\")\n",
|
||||
"\n",
|
||||
"print('Datastore details:')\n",
|
||||
"print('Datastore Account Name: ' + datastore.account_name)\n",
|
||||
"print('Datastore Workspace Name: ' + datastore.workspace.name)\n",
|
||||
"print('Datastore Container Name: ' + datastore.container_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -154,7 +171,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For this example we will upload a file in the provided DataStore. These are some helper methods to achieve that."
|
||||
"For this example, we will upload a file to the provided Datastore. The following helper methods do that."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -171,16 +188,16 @@
|
||||
" return temp_dir\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def upload_file_to_datastore(datastore, path, content):\n",
|
||||
" dir = create_local_file(content=content, file_name=\"temp.file\")\n",
|
||||
" datastore.upload(src_dir=dir, target_path=path, overwrite=True, show_progress=True)"
|
||||
"def upload_file_to_datastore(datastore, file_name, content):\n",
|
||||
" dir = create_local_file(content=content, file_name=file_name)\n",
|
||||
" datastore.upload(src_dir=dir, overwrite=True, show_progress=True)"
|
||||
]
|
||||
},
|
||||
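Only the last line of `create_local_file` (`return temp_dir`) is visible in this hunk. A minimal sketch of what such a helper plausibly does, assuming it writes `content` to `file_name` inside a fresh temporary directory and returns that directory, which `upload_file_to_datastore` then passes to `datastore.upload` as `src_dir`:

```python
import os
import tempfile


def create_local_file(content, file_name):
    # Assumed behaviour: create a new, empty temporary directory ...
    temp_dir = tempfile.mkdtemp()
    # ... write the content into a single file inside it ...
    with open(os.path.join(temp_dir, file_name), "w") as f:
        f.write(content)
    # ... and return the directory so the whole directory can be uploaded.
    return temp_dir
```

Using a fresh directory per call keeps unrelated files out of the upload. With `target_path` omitted, as in the updated helper, the file presumably lands at the container root, which matches `path_on_datastore=file_name` in the `DataReference` that follows.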
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here we associate the input DataReference with an existing file in the provided DataStore. Feel free to upload the file of your choice manually or use the *upload_testdata* method. "
|
||||
"Here we associate the input DataReference with an existing file in the provided Datastore. Feel free to upload the file of your choice manually or use the *upload_file_to_datastore* method. "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -189,14 +206,14 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"testdata_path=\"testdata.txt\"\n",
|
||||
"file_name=\"input.txt\"\n",
|
||||
"\n",
|
||||
"upload_file_to_datastore(datastore=default_blob_store, \n",
|
||||
" path=testdata_path, \n",
|
||||
" content=\"This is the content of the file\")\n",
|
||||
"upload_file_to_datastore(datastore=datastore, \n",
|
||||
" file_name=file_name, \n",
|
||||
" content=\"this is the content of the file\")\n",
|
||||
"\n",
|
||||
"testdata = DataReference(datastore=default_blob_store, \n",
|
||||
" path_on_datastore=testdata_path, \n",
|
||||
"testdata = DataReference(datastore=datastore, \n",
|
||||
" path_on_datastore=file_name, \n",
|
||||
" data_reference_name=\"input\")\n",
|
||||
"\n",
|
||||
"outputdata = PipelineData(name=\"output\", datastore=datastore)"
|
||||
@@ -224,7 +241,7 @@
|
||||
"source": [
|
||||
"binaries_folder = \"azurebatch/job_binaries\"\n",
|
||||
"if not os.path.isdir(binaries_folder):\n",
|
||||
" os.mkdir(project_folder)\n",
|
||||
" os.mkdir(binaries_folder)\n",
|
||||
"\n",
|
||||
"file_name=\"azurebatch.cmd\"\n",
|
||||
"with open(path.join(binaries_folder, file_name), 'w') as f:\n",
|
||||
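This hunk replaces `os.mkdir(project_folder)`, which referenced an undefined name, with `os.mkdir(binaries_folder)`. Note that `os.mkdir` creates only the last path component, so it fails if `azurebatch/` does not exist yet; the cell also relies on `path` having been imported elsewhere (for example `from os import path`). A more defensive sketch uses `os.makedirs(..., exist_ok=True)`. The helper name `write_batch_binary` is hypothetical, and the script text passed to it is up to you, since the diff does not show what the notebook writes into `azurebatch.cmd`.

```python
import os


def write_batch_binary(binaries_folder, file_name, script_text):
    # makedirs creates intermediate directories (e.g. "azurebatch/") as needed and
    # does not fail if the folder already exists, unlike a bare os.mkdir.
    os.makedirs(binaries_folder, exist_ok=True)
    script_path = os.path.join(binaries_folder, file_name)
    with open(script_path, "w") as f:
        f.write(script_text)
    return script_path
```

For example, `write_batch_binary("azurebatch/job_binaries", "azurebatch.cmd", script_text)` can be re-run safely when the notebook cell is executed more than once.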
|
||||
@@ -29,7 +29,8 @@
|
||||
"import os\n",
|
||||
"import shutil\n",
|
||||
"import urllib\n",
|
||||
"from azureml.core import Experiment\n",
|
||||
"import azureml.core\n",
|
||||
"from azureml.core import Workspace, Experiment\n",
|
||||
"from azureml.core.datastore import Datastore\n",
|
||||
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||
"from azureml.exceptions import ComputeTargetException\n",
|
||||
@@ -109,7 +110,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Upload MNIST dataset to blob datastore \n",
|
||||
"A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can either be backed by an Azure Blob Storage or and Azure File Share (ADLS will be supported in the future). In the next step, we will use Azure Blob Storage and upload the training and test set into the Azure Blob datastore, which we will then later be mount on a Batch AI cluster for training."
|
||||
"A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored and then made accessible to a Run, either by mounting it on or copying it to the compute target. In the next step, we will use Azure Blob Storage and upload the training and test sets into the Azure Blob datastore, which we will later mount on a Batch AI cluster for training."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -118,7 +119,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ds = Datastore(workspace=ws, name=\"MyBlobDatastore\")\n",
|
||||
"ds = ws.get_default_datastore()\n",
|
||||
"ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)"
|
||||
]
|
||||
},
|
||||
@@ -129,12 +130,12 @@
|
||||
"## Retrieve or create an Azure Machine Learning compute\n",
|
||||
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
|
||||
"\n",
|
||||
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. We will create an Azure Machine Learning Compute containing **STANDARD_D2_V2 CPU VMs**. This process is broken down into the following steps:\n",
|
||||
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. This process is broken down into the following steps:\n",
|
||||
"\n",
|
||||
"1. Create the configuration\n",
|
||||
"2. Create the Azure Machine Learning compute\n",
|
||||
"\n",
|
||||
"**This process will take about 3 minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
|
||||
"**This process will take a few minutes and provides only sparse output. Please make sure to wait until the call returns before moving to the next cell.**\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -143,7 +144,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"cluster_name = \"aml-compute\"\n",
|
||||
"cluster_name = \"gpucluster\"\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
||||
@@ -320,7 +321,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Build the experiment"
|
||||
"### Run the pipeline"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -329,31 +330,15 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pipeline = Pipeline(workspace=ws, steps=[hd_step])"
|
||||
"pipeline = Pipeline(workspace=ws, steps=[hd_step])\n",
|
||||
"pipeline_run = Experiment(ws, 'Hyperdrive_Test').submit(pipeline)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Submit the experiment "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pipeline_run = Experiment(ws, 'Hyperdrive_Test').submit(pipeline)\n",
|
||||
"pipeline_run.wait_for_completion()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### View Run Details"
|
||||
"### Monitor using widget"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -365,6 +350,22 @@
|
||||
"from azureml.widgets import RunDetails\n",
|
||||
"RunDetails(pipeline_run).show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Wait for the completion of this Pipeline run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"pipeline_run.wait_for_completion()"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -204,7 +204,8 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create a schedule for the pipeline"
|
||||
"### Create a schedule for the pipeline using a recurrence\n",
|
||||
"This schedule will run on a specified recurrence interval."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -345,7 +346,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Change reccurence of the schedule"
|
||||
"### Change recurrence of the schedule"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -366,13 +367,58 @@
|
||||
" wait_for_provisioning=True,\n",
|
||||
" recurrence=recurrence)\n",
|
||||
"\n",
|
||||
"fetched_schedule = Schedule.get_schedule(ws, fetched_schedule.id)\n",
|
||||
"fetched_schedule = Schedule.get(ws, fetched_schedule.id)\n",
|
||||
"\n",
|
||||
"print(\"Updated schedule:\", fetched_schedule.id, \n",
|
||||
" \"\\nNew name:\", fetched_schedule.name,\n",
|
||||
" \"\\nNew frequency:\", fetched_schedule.recurrence.frequency,\n",
|
||||
" \"\\nNew status:\", fetched_schedule.status)"
|
||||
]
|
||||
},
|
||||
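A recurrence-based schedule fires on a fixed cadence: a frequency unit plus an interval. Purely to illustrate that idea, the stdlib sketch below lists the next few fire times for a given start time, unit and interval. It is not how the Azure ML scheduler computes runs; the function name `next_fire_times` and the set of supported units are assumptions of this sketch.

```python
from datetime import timedelta

# Assumed mapping from frequency unit to a fixed-length step. Calendar units with
# variable length (such as months) are deliberately left out of this sketch.
_STEP = {
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
    "Week": timedelta(weeks=1),
}


def next_fire_times(start, frequency, interval, after, count):
    """Return the first `count` fire times strictly later than `after`."""
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    step = _STEP[frequency] * interval
    if after < start:
        first = start
    else:
        # Whole steps already elapsed since `start`, plus one more.
        elapsed = (after - start) // step + 1
        first = start + elapsed * step
    return [first + i * step for i in range(count)]
```

For instance, with `frequency="Hour"` and `interval=2`, a schedule starting at midnight and queried at 03:00 would next fire at 04:00, 06:00 and 08:00.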
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create a schedule for the pipeline using a Datastore\n",
|
||||
"This schedule will run when additions or modifications are made to Blobs in the Datastore container.\n",
|
||||
"Note: Only Blob Datastores are supported."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.datastore import Datastore\n",
|
||||
"\n",
|
||||
"datastore = Datastore(workspace=ws, name=\"workspaceblobstore\")\n",
|
||||
"\n",
|
||||
"schedule = Schedule.create(workspace=ws, name=\"My_Schedule\",\n",
|
||||
" pipeline_id=pub_pipeline_id, \n",
|
||||
" experiment_name='Schedule_Run',\n",
|
||||
" datastore=datastore,\n",
|
||||
" wait_for_provisioning=True,\n",
|
||||
" description=\"Schedule Run\")\n",
|
||||
"\n",
|
||||
"# You may want to make sure that the schedule is provisioned properly\n",
|
||||
"# before making any further changes to the schedule\n",
|
||||
"\n",
|
||||
"print(\"Created schedule with id: {}\".format(schedule.id))"
|
||||
]
|
||||
},
|
||||
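The Datastore-based schedule above triggers a run when blobs in the container are added or modified. Conceptually, that amounts to comparing two listings of the container. The stdlib sketch below does exactly that, treating each listing as a mapping from blob name to a version marker (for example an ETag or a last-modified timestamp). It is an illustration only: how Azure ML actually detects changes is not shown in this diff, and `detect_blob_changes` and `should_trigger` are hypothetical names.

```python
def detect_blob_changes(previous, current):
    """Compare two {blob_name: version} snapshots of a container.

    Returns (added, modified) as sorted lists of blob names. Deleted blobs are
    ignored, because the schedule description only mentions additions and
    modifications.
    """
    added = sorted(name for name in current if name not in previous)
    modified = sorted(
        name for name in current
        if name in previous and current[name] != previous[name]
    )
    return added, modified


def should_trigger(previous, current):
    # Submit a pipeline run if anything was added or changed.
    added, modified = detect_blob_changes(previous, current)
    return bool(added or modified)
```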
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Set the wait_for_provisioning flag to False if you do not want to wait \n",
|
||||
"# for the call to provision the schedule in the backend.\n",
|
||||
"schedule.disable(wait_for_provisioning=True)\n",
|
||||
"schedule = Schedule.get(ws, schedule.id)\n",
|
||||
"print(\"Disabled schedule {}. New status is: {}\".format(schedule.id, schedule.status))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
|
||||
@@ -397,7 +397,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 1. Running the demo notebook already added to the Databricks workspace\n",
|
||||
"Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable notebook_path when you run the code cell below:"
|
||||
"Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable notebook_path when you run the code cell below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -436,7 +436,6 @@
|
||||
"source": [
|
||||
"steps = [dbNbStep]\n",
|
||||
"pipeline = Pipeline(workspace=ws, steps=steps)\n",
|
||||
"pipeline.validate()\n",
|
||||
"pipeline_run = Experiment(ws, 'DB_Notebook_demo').submit(pipeline)\n",
|
||||
"pipeline_run.wait_for_completion()"
|
||||
]
|
||||
|
||||
@@ -120,7 +120,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Python Scripts\n",
|
||||
"We use an edited version of `neural_style_mpi.py` (original is [here](https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style_mpi.py)). Scripts to split and stitch the video are thin wrappers to calls to `ffmpeg`. \n",
|
||||
"We use an edited version of `neural_style_mpi.py` (original is [here](https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style.py)). Scripts to split and stitch the video are thin wrappers to calls to `ffmpeg`. \n",
|
||||
"\n",
|
||||
"We install `ffmpeg` through conda dependencies."
|
||||
]
|
||||
@@ -201,6 +201,13 @@
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The sample video **orangutan.mp4** is stored in a publicly shared datastore, which we register below. To view the original video, open [this link](https://pipelinedata.blob.core.windows.net/sample-videos/orangutan.mp4)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
@@ -208,8 +215,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# datastore for input video\n",
|
||||
"account_name = \"happypathspublic\"\n",
|
||||
"video_ds = Datastore.register_azure_blob_container(ws, \"videos\", \"videos\",\n",
|
||||
"account_name = \"pipelinedata\"\n",
|
||||
"video_ds = Datastore.register_azure_blob_container(ws, \"videos\", \"sample-videos\",\n",
|
||||
" account_name=account_name, overwrite=True)\n",
|
||||
"\n",
|
||||
"# datastore for models\n",
|
||||
@@ -238,9 +245,10 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"video_name=os.getenv(\"STYLE_TRANSFER_VIDEO_NAME\", \"orangutan.mp4\") \n",
|
||||
"orangutan_video = DataReference(datastore=video_ds,\n",
|
||||
" data_reference_name=\"video\",\n",
|
||||
" path_on_datastore=\"orangutan.mp4\", mode=\"download\")"
|
||||
" path_on_datastore=video_name, mode=\"download\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -542,7 +550,7 @@
|
||||
"response = requests.post(rest_endpoint, \n",
|
||||
" headers=aad_token,\n",
|
||||
" json={\"ExperimentName\": \"style_transfer\",\n",
|
||||
" \"ParameterAssignments\": {\"style\": \"udnie\", \"nodecount\": 4}}) \n",
|
||||
" \"ParameterAssignments\": {\"style\": \"udnie\", \"nodecount\": 3}}) \n",
|
||||
"run_id = response.json()[\"Id\"]\n",
|
||||
"\n",
|
||||
"published_pipeline_run_udnie = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n",
|
||||
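The hunk above submits a published pipeline through its REST endpoint with `requests.post(rest_endpoint, headers=aad_token, json=...)`, overriding the `style` and `nodecount` pipeline parameters. To make the shape of that request explicit without depending on `requests`, the sketch below builds an equivalent POST with the standard library but does not send it. It assumes `aad_token` is a header dictionary such as `{"Authorization": "Bearer <token>"}`; the endpoint URL and token are placeholders, and `build_pipeline_submit_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_pipeline_submit_request(rest_endpoint, auth_header, experiment_name, parameters):
    """Build (but do not send) a POST equivalent to the requests.post call above."""
    body = {"ExperimentName": experiment_name, "ParameterAssignments": parameters}
    headers = dict(auth_header)
    # requests sets this automatically when json= is used; urllib needs it explicitly.
    headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        rest_endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_pipeline_submit_request(
    "https://example.invalid/pipelines/<published-pipeline-id>",  # placeholder endpoint
    {"Authorization": "Bearer <token>"},                          # placeholder token
    "style_transfer",
    {"style": "udnie", "nodecount": 3},
)
```

Sending this request would return a JSON body from which, as in the notebook, the run `Id` can be read.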
|
||||
@@ -209,8 +209,8 @@
|
||||
"\n",
|
||||
"svc_pr = ServicePrincipalAuthentication(\n",
|
||||
" tenant_id=\"my-tenant-id\",\n",
|
||||
" username=\"my-application-id\",\n",
|
||||
" password=svc_pr_password)\n",
|
||||
" service_principal_id=\"my-application-id\",\n",
|
||||
" service_principal_password=svc_pr_password)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"ws = Workspace(\n",
|
||||
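After this change, `ServicePrincipalAuthentication` is called with `tenant_id`, `service_principal_id` and `service_principal_password` instead of `username` and `password`. Rather than hard-coding these values, a notebook can read them from environment variables. The helper below is a stdlib-only sketch that gathers them into a keyword dictionary for `ServicePrincipalAuthentication(**kwargs)`; the helper name and the environment variable names are hypothetical.

```python
import os

# Hypothetical environment variable names; adapt them to your own setup.
_ENV_VARS = {
    "tenant_id": "AML_TENANT_ID",
    "service_principal_id": "AML_PRINCIPAL_ID",
    "service_principal_password": "AML_PRINCIPAL_PASSWORD",
}


def service_principal_kwargs(environ=None):
    """Collect service principal settings, failing loudly if any are missing."""
    environ = os.environ if environ is None else environ
    missing = [var for var in _ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise EnvironmentError("missing environment variables: " + ", ".join(missing))
    return {kwarg: environ[var] for kwarg, var in _ENV_VARS.items()}
```

Usage would then look like `svc_pr = ServicePrincipalAuthentication(**service_principal_kwargs())`, which keeps the password out of the notebook source.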
|
||||
@@ -4,13 +4,15 @@ These examples show you:
|
||||
|
||||
1. [How to use the Estimator pattern in Azure ML](how-to-use-estimator)
|
||||
2. [Train using TensorFlow Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-tensorflow)
|
||||
3. [Train using Keras and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-keras)
|
||||
4. [Train using Pytorch Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-pytorch)
|
||||
5. [Distributed training using TensorFlow and Parameter Server](distributed-tensorflow-with-parameter-server)
|
||||
6. [Distributed training using TensorFlow and Horovod](distributed-tensorflow-with-horovod)
|
||||
7. [Distributed training using Pytorch and Horovod](distributed-pytorch-with-horovod)
|
||||
8. [Distributed training using CNTK and custom Docker image](distributed-cntk-with-custom-docker)
|
||||
9. [Export run history records to Tensorboard](export-run-history-to-tensorboard)
|
||||
10. [Use TensorBoard to monitor training execution](tensorboard)
|
||||
3. [Train using PyTorch Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-pytorch)
|
||||
4. [Train using Keras and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-keras)
|
||||
5. [Train using Chainer Estimator and tune hyperparameters using Hyperdrive](train-hyperparameter-tune-deploy-with-chainer)
|
||||
6. [Distributed training using TensorFlow and Parameter Server](distributed-tensorflow-with-parameter-server)
|
||||
7. [Distributed training using TensorFlow and Horovod](distributed-tensorflow-with-horovod)
|
||||
8. [Distributed training using PyTorch and Horovod](distributed-pytorch-with-horovod)
|
||||
9. [Distributed training using CNTK and custom Docker image](distributed-cntk-with-custom-docker)
|
||||
10. [Distributed training using Chainer](distributed-chainer)
|
||||
11. [Export run history records to TensorBoard](export-run-history-to-tensorboard)
|
||||
12. [Use TensorBoard to monitor training execution](tensorboard)
|
||||
|
||||
Learn more about how to use `Estimator` class to [train deep neural networks with Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models).
|
||||
|
||||
@@ -0,0 +1,153 @@
|
||||
|
||||
import argparse
|
||||
|
||||
import chainer
|
||||
import chainer.cuda
|
||||
import chainer.functions as F
|
||||
import chainer.links as L
|
||||
from chainer import training
|
||||
from chainer.training import extensions
|
||||
|
||||
import chainermn
|
||||
import chainermn.datasets
|
||||
import chainermn.functions
|
||||
|
||||
|
||||
chainer.disable_experimental_feature_warning = True
|
||||
|
||||
|
||||
class MLP0SubA(chainer.Chain):
|
||||
def __init__(self, comm, n_out):
|
||||
super(MLP0SubA, self).__init__(
|
||||
l1=L.Linear(784, n_out))
|
||||
|
||||
def __call__(self, x):
|
||||
return F.relu(self.l1(x))
|
||||
|
||||
|
||||
class MLP0SubB(chainer.Chain):
|
||||
def __init__(self, comm):
|
||||
super(MLP0SubB, self).__init__()
|
||||
|
||||
def __call__(self, y):
|
||||
return y
|
||||
|
||||
|
||||
class MLP0(chainermn.MultiNodeChainList):
|
||||
# Model on worker 0.
|
||||
def __init__(self, comm, n_out):
|
||||
super(MLP0, self).__init__(comm=comm)
|
||||
self.add_link(MLP0SubA(comm, n_out), rank_in=None, rank_out=1)
|
||||
self.add_link(MLP0SubB(comm), rank_in=1, rank_out=None)
|
||||
|
||||
|
||||
class MLP1Sub(chainer.Chain):
|
||||
def __init__(self, n_units, n_out):
|
||||
super(MLP1Sub, self).__init__(
|
||||
l2=L.Linear(None, n_units),
|
||||
l3=L.Linear(None, n_out))
|
||||
|
||||
def __call__(self, h0):
|
||||
h1 = F.relu(self.l2(h0))
|
||||
return self.l3(h1)
|
||||
|
||||
|
||||
class MLP1(chainermn.MultiNodeChainList):
|
||||
# Model on worker 1.
|
||||
def __init__(self, comm, n_units, n_out):
|
||||
super(MLP1, self).__init__(comm=comm)
|
||||
self.add_link(MLP1Sub(n_units, n_out), rank_in=0, rank_out=0)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description='ChainerMN example: pipelined neural network')
|
||||
parser.add_argument('--batchsize', '-b', type=int, default=100,
|
||||
help='Number of images in each mini-batch')
|
||||
parser.add_argument('--epoch', '-e', type=int, default=20,
|
||||
help='Number of sweeps over the dataset to train')
|
||||
parser.add_argument('--gpu', '-g', action='store_true',
|
||||
help='Use GPU')
|
||||
parser.add_argument('--out', '-o', default='result',
|
||||
help='Directory to output the result')
|
||||
parser.add_argument('--unit', '-u', type=int, default=1000,
|
||||
help='Number of units')
|
||||
args = parser.parse_args()
|
||||
|
||||
# Prepare ChainerMN communicator.
|
||||
if args.gpu:
|
||||
comm = chainermn.create_communicator('hierarchical')
|
||||
data_axis, model_axis = comm.rank % 2, comm.rank // 2
|
||||
data_comm = comm.split(data_axis, comm.rank)
|
||||
model_comm = comm.split(model_axis, comm.rank)
|
||||
device = comm.intra_rank
|
||||
else:
|
||||
comm = chainermn.create_communicator('naive')
|
||||
data_axis, model_axis = comm.rank % 2, comm.rank // 2
|
||||
data_comm = comm.split(data_axis, comm.rank)
|
||||
model_comm = comm.split(model_axis, comm.rank)
|
||||
device = -1
|
||||
|
||||
if model_comm.size != 2:
|
||||
raise ValueError(
|
||||
'This example can only be executed on an even number '
|
||||
'of processes.')
|
||||
|
||||
if comm.rank == 0:
|
||||
print('==========================================')
|
||||
if args.gpu:
|
||||
print('Using GPUs')
|
||||
print('Num unit: {}'.format(args.unit))
|
||||
print('Num Minibatch-size: {}'.format(args.batchsize))
|
||||
print('Num epoch: {}'.format(args.epoch))
|
||||
print('==========================================')
|
||||
|
||||
if data_axis == 0:
|
||||
model = L.Classifier(MLP0(model_comm, args.unit))
|
||||
elif data_axis == 1:
|
||||
model = MLP1(model_comm, args.unit, 10)
|
||||
|
||||
if device >= 0:
|
||||
chainer.cuda.get_device_from_id(device).use()
|
||||
model.to_gpu()
|
||||
|
||||
optimizer = chainermn.create_multi_node_optimizer(
|
||||
chainer.optimizers.Adam(), data_comm)
|
||||
optimizer.setup(model)
|
||||
|
||||
# Original dataset on worker 0 and 1.
|
||||
# Datasets of worker 0 and 1 are split and distributed to all workers.
|
||||
if model_axis == 0:
|
||||
train, test = chainer.datasets.get_mnist()
|
||||
if data_axis == 1:
|
||||
train = chainermn.datasets.create_empty_dataset(train)
|
||||
test = chainermn.datasets.create_empty_dataset(test)
|
||||
else:
|
||||
train, test = None, None
|
||||
train = chainermn.scatter_dataset(train, data_comm, shuffle=True)
|
||||
test = chainermn.scatter_dataset(test, data_comm, shuffle=True)
|
||||
|
||||
train_iter = chainer.iterators.SerialIterator(
|
||||
train, args.batchsize, shuffle=False)
|
||||
test_iter = chainer.iterators.SerialIterator(
|
||||
test, args.batchsize, repeat=False, shuffle=False)
|
||||
|
||||
updater = training.StandardUpdater(train_iter, optimizer, device=device)
|
||||
trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out)
|
||||
evaluator = extensions.Evaluator(test_iter, model, device=device)
|
||||
evaluator = chainermn.create_multi_node_evaluator(evaluator, data_comm)
|
||||
trainer.extend(evaluator)
|
||||
|
||||
# Some display and output extentions are necessary only for worker 0.
|
||||
if comm.rank == 0:
|
||||
trainer.extend(extensions.LogReport())
|
||||
trainer.extend(extensions.PrintReport(
|
||||
['epoch', 'main/loss', 'validation/main/loss',
|
||||
'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
|
||||
trainer.extend(extensions.ProgressBar())
|
||||
|
||||
trainer.run()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
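The rank bookkeeping in the pipelined example above can be checked without MPI. The sketch below is a pure-Python simulation, assuming `comm.split(color, key)` follows MPI's `Comm_Split` semantics (ranks sharing `color` form one group, ordered by `key`); `split_groups` is a hypothetical helper, not part of ChainerMN.

```python
from collections import defaultdict


def split_groups(world_size):
    """Simulate the script's two comm.split calls (no MPI involved).

    data_axis = rank % 2 selects which half of the model a rank runs;
    model_axis = rank // 2 pairs consecutive ranks into one model replica.
    """
    data_groups = defaultdict(list)   # ranks sharing a data_axis -> one data_comm
    model_groups = defaultdict(list)  # ranks sharing a model_axis -> one model_comm
    for rank in range(world_size):
        data_axis, model_axis = rank % 2, rank // 2
        data_groups[data_axis].append(rank)
        model_groups[model_axis].append(rank)
    return dict(data_groups), dict(model_groups)


data_groups, model_groups = split_groups(4)
print(data_groups)   # {0: [0, 2], 1: [1, 3]}
print(model_groups)  # {0: [0, 1], 1: [2, 3]}

# With an odd world size the last model group holds a single rank,
# which is the case the script's `model_comm.size != 2` check rejects.
_, odd_model_groups = split_groups(5)
print(odd_model_groups[2])  # [4]
```

Ranks 0 and 1 form the first model replica and hold the original dataset, which matches the script's comment that the dataset starts on workers 0 and 1.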
@@ -0,0 +1,315 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Distributed Chainer\n",
    "In this tutorial, you will run a Chainer training example on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using ChainerMN distributed training across a GPU cluster."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check core SDK version number\n",
    "import azureml.core\n",
    "\n",
    "print(\"SDK version:\", azureml.core.VERSION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Diagnostics\n",
    "Opt-in diagnostics for better experience, quality, and security of future releases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "Diagnostics"
    ]
   },
   "outputs": [],
   "source": [
    "from azureml.telemetry import set_diagnostics_collection\n",
    "\n",
    "set_diagnostics_collection(send_diagnostics=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize workspace\n",
    "\n",
    "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.workspace import Workspace\n",
    "\n",
    "ws = Workspace.from_config()\n",
    "print('Workspace name: ' + ws.name, \n",
    "      'Azure region: ' + ws.location, \n",
    "      'Subscription id: ' + ws.subscription_id, \n",
    "      'Resource group: ' + ws.resource_group, sep = '\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create or attach existing AmlCompute\n",
    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the code below creates a `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
    "\n",
    "**Creation of AmlCompute takes approximately 5 minutes.** If an AmlCompute cluster with that name already exists in your workspace, this code skips the creation process.\n",
    "\n",
    "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.core.compute_target import ComputeTargetException\n",
    "\n",
    "# choose a name for your cluster\n",
    "cluster_name = \"gpucluster\"\n",
    "\n",
    "try:\n",
    "    compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
    "    print('Found existing compute target.')\n",
    "except ComputeTargetException:\n",
    "    print('Creating a new compute target...')\n",
    "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
    "                                                           max_nodes=4)\n",
    "\n",
    "    # create the cluster\n",
    "    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
    "\n",
    "    compute_target.wait_for_completion(show_output=True)\n",
    "\n",
    "# use get_status() to get a detailed status for the current AmlCompute. \n",
    "print(compute_target.get_status().serialize())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train model on the remote compute\n",
    "Now that we have the AmlCompute ready to go, let's run our distributed training job."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a project directory\n",
    "Create a directory containing all the code from your local machine that the remote resource needs access to. This includes the training script and any additional files your training script depends on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "project_folder = './chainer-distr'\n",
    "os.makedirs(project_folder, exist_ok=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare training script\n",
    "Next, you need a training script. In this tutorial, the script for distributed training on MNIST is already provided for you at `train_mnist.py`. In practice, you should be able to run an existing ChainerMN-based training script with Azure ML without modifying your code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once your script is ready, copy the training script `train_mnist.py` into the project directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import shutil\n",
    "\n",
    "shutil.copy('train_mnist.py', project_folder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create an experiment\n",
    "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed Chainer tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core import Experiment\n",
    "\n",
    "experiment_name = 'chainer-distr'\n",
    "experiment = Experiment(ws, name=experiment_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create a Chainer estimator\n",
    "The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.train.dnn import Chainer\n",
    "\n",
    "estimator = Chainer(source_directory=project_folder,\n",
    "                    compute_target=compute_target,\n",
    "                    entry_script='train_mnist.py',\n",
    "                    node_count=2,\n",
    "                    process_count_per_node=1,\n",
    "                    distributed_backend='mpi',\n",
    "                    use_gpu=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above code specifies that we will run our training script on `2` nodes, with one worker per node, for a total of two MPI processes. To execute a distributed run using MPI, you must provide the argument `distributed_backend='mpi'`. With these settings, the estimator installs Chainer and its dependencies for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit job\n",
    "Run your experiment by submitting your estimator object. Note that this call is asynchronous."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run = experiment.submit(estimator)\n",
    "print(run)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Monitor your run\n",
    "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. Any metrics that your training script logs to the Azure ML run (for example with `run.log()`) are plotted automatically."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.widgets import RunDetails\n",
    "\n",
    "RunDetails(run).show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run.wait_for_completion(show_output=True)"
   ]
  }
 ],
 "metadata": {
  "authors": [
   {
    "name": "minxia"
   }
  ],
  "kernelspec": {
   "display_name": "Python 3.6",
   "language": "python",
   "name": "python36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  },
  "msauthor": "minxia"
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
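With `node_count=2` and `process_count_per_node=1`, the estimator in the notebook above launches 2 × 1 = 2 MPI processes, and `train_mnist.py` uses `chainermn.scatter_dataset` to split MNIST's 60,000 training images among them. The sketch below only shows the arithmetic for a simple contiguous, near-equal split; `shard_sizes` is a hypothetical helper, and ChainerMN's own `scatter_dataset` may balance shards differently (for example, by giving every worker exactly the same length).

```python
def shard_sizes(n_samples, node_count, process_count_per_node):
    """Illustrative near-equal split of n_samples across all MPI processes."""
    world_size = node_count * process_count_per_node
    base, extra = divmod(n_samples, world_size)
    # The first `extra` ranks each receive one additional sample.
    return [base + (1 if rank < extra else 0) for rank in range(world_size)]


sizes = shard_sizes(60000, node_count=2, process_count_per_node=1)
print(sizes)                   # [30000, 30000]

batchsize = 100                # the script's --batchsize default
print(sizes[0] // batchsize)   # iterations per epoch on each worker: 300
print(batchsize * len(sizes))  # samples per synchronized step across workers: 200
```

Each worker iterates over its own shard with the local `--batchsize`, and the multi-node optimizer combines gradients across workers, so one synchronized update covers `batchsize × world_size` samples.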
@@ -0,0 +1,125 @@
# Official ChainerMN example taken from
# https://github.com/chainer/chainer/blob/master/examples/chainermn/mnist/train_mnist.py

from __future__ import print_function

import argparse

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.training import extensions

import chainermn


class MLP(chainer.Chain):

    def __init__(self, n_units, n_out):
        super(MLP, self).__init__(
            # the input size is given explicitly: 784 = 28 x 28 MNIST pixels
            l1=L.Linear(784, n_units),  # n_in -> n_units
            l2=L.Linear(n_units, n_units),  # n_units -> n_units
            l3=L.Linear(n_units, n_out),  # n_units -> n_out
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)


def main():
    parser = argparse.ArgumentParser(description='ChainerMN example: MNIST')
    parser.add_argument('--batchsize', '-b', type=int, default=100,
                        help='Number of images in each mini-batch')
    parser.add_argument('--communicator', type=str,
                        default='non_cuda_aware', help='Type of communicator')
    parser.add_argument('--epoch', '-e', type=int, default=20,
                        help='Number of sweeps over the dataset to train')
    # Defaults to True. Any non-empty value passed to --gpu on the command
    # line (including the string 'False') is also truthy.
    parser.add_argument('--gpu', '-g', default=True,
                        help='Use GPU')
    parser.add_argument('--out', '-o', default='result',
                        help='Directory to output the result')
    parser.add_argument('--resume', '-r', default='',
                        help='Resume the training from snapshot')
    parser.add_argument('--unit', '-u', type=int, default=1000,
                        help='Number of units')
    args = parser.parse_args()

    # Prepare ChainerMN communicator.
    if args.gpu:
        if args.communicator == 'naive':
            print("Error: 'naive' communicator does not support GPU.\n")
            exit(-1)
        comm = chainermn.create_communicator(args.communicator)
        device = comm.intra_rank
    else:
        if args.communicator != 'naive':
            print('Warning: using naive communicator '
                  'because only naive supports CPU-only execution')
        comm = chainermn.create_communicator('naive')
        device = -1

    if comm.rank == 0:
        print('==========================================')
        print('Num process (COMM_WORLD): {}'.format(comm.size))
        if args.gpu:
            print('Using GPUs')
        print('Using {} communicator'.format(args.communicator))
        print('Num unit: {}'.format(args.unit))
        print('Num Minibatch-size: {}'.format(args.batchsize))
        print('Num epoch: {}'.format(args.epoch))
        print('==========================================')

    model = L.Classifier(MLP(args.unit, 10))
    if device >= 0:
        chainer.cuda.get_device_from_id(device).use()
        model.to_gpu()

    # Create a multi node optimizer from a standard Chainer optimizer.
    optimizer = chainermn.create_multi_node_optimizer(
        chainer.optimizers.Adam(), comm)
    optimizer.setup(model)

    # Split and distribute the dataset. Only worker 0 loads the whole dataset.
    # Datasets of worker 0 are evenly split and distributed to all workers.
    if comm.rank == 0:
        train, test = chainer.datasets.get_mnist()
    else:
        train, test = None, None
    train = chainermn.scatter_dataset(train, comm, shuffle=True)
    test = chainermn.scatter_dataset(test, comm, shuffle=True)

    train_iter = chainer.iterators.SerialIterator(train, args.batchsize)
    test_iter = chainer.iterators.SerialIterator(test, args.batchsize,
                                                 repeat=False, shuffle=False)

    updater = training.StandardUpdater(train_iter, optimizer, device=device)
    trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out)

    # Create a multi node evaluator from a standard Chainer evaluator.
    evaluator = extensions.Evaluator(test_iter, model, device=device)
    evaluator = chainermn.create_multi_node_evaluator(evaluator, comm)
    trainer.extend(evaluator)

    # Some display and output extensions are necessary only for one worker.
    # (Otherwise, there would just be repeated outputs.)
    if comm.rank == 0:
        trainer.extend(extensions.dump_graph('main/loss'))
        trainer.extend(extensions.LogReport())
        trainer.extend(extensions.PrintReport(
            ['epoch', 'main/loss', 'validation/main/loss',
             'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
        trainer.extend(extensions.ProgressBar())

    if args.resume:
        chainer.serializers.load_npz(args.resume, trainer)

    trainer.run()


if __name__ == '__main__':
    main()
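The communicator setup in `main()` of the ChainerMN script above follows two rules: the `naive` communicator is rejected for GPU runs, and CPU-only runs always fall back to `naive` with device `-1`. A minimal pure-Python restatement of that logic, handy for testing it in isolation; `choose_communicator` is hypothetical, raises instead of calling `exit(-1)`, and takes `intra_rank` as a parameter where the script reads `comm.intra_rank`.

```python
def choose_communicator(use_gpu, requested, intra_rank=0):
    """Return (communicator_name, device) using the same rules as main().

    Hypothetical helper: no MPI is involved, so `intra_rank` stands in for
    comm.intra_rank (the process's index among the processes on its node).
    """
    if use_gpu:
        if requested == 'naive':
            # The script prints an error and exits in this case.
            raise ValueError("'naive' communicator does not support GPU")
        # Each process uses the GPU whose ID equals its intra-node rank.
        return requested, intra_rank
    # CPU-only execution: only 'naive' is supported; device -1 means CPU.
    return 'naive', -1


print(choose_communicator(True, 'non_cuda_aware'))              # ('non_cuda_aware', 0)
print(choose_communicator(True, 'hierarchical', intra_rank=1))  # ('hierarchical', 1)
print(choose_communicator(False, 'hierarchical'))               # ('naive', -1)
```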
@@ -56,7 +56,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install azureml-contrib-tensorboard"
"!pip install azureml-tensorboard"
]
},
{
@@ -166,7 +166,7 @@
"outputs": [],
"source": [
"# Export Run History to Tensorboard logs\n",
"from azureml.contrib.tensorboard.export import export_to_tensorboard\n",
"from azureml.tensorboard.export import export_to_tensorboard\n",
"import os\n",
"\n",
"logdir = 'exportedTBlogs'\n",
@@ -208,7 +208,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.tensorboard import Tensorboard\n",
"from azureml.tensorboard import Tensorboard\n",
"\n",
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
"tb = Tensorboard([], local_root=logdir, port=6006)\n",
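These hunks appear to move the TensorBoard integration from the `azureml-contrib-tensorboard` package (`azureml.contrib.tensorboard`) to `azureml-tensorboard` (`azureml.tensorboard`). If a notebook has to run against either SDK version, one option is to try the new import path and fall back to the old one. A small sketch using only `importlib`; `import_first` is a hypothetical helper, and the two module names in the commented usage are taken from the hunks above.

```python
import importlib


def import_first(*module_names):
    """Import and return the first module in `module_names` that is installed."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append('{}: {}'.format(name, exc))
    raise ImportError('none of the modules could be imported: ' + '; '.join(errors))


# Hypothetical usage: prefer the new location, fall back to the contrib one.
# tb_module = import_first('azureml.tensorboard', 'azureml.contrib.tensorboard')
# Tensorboard = tb_module.Tensorboard
```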
@@ -57,7 +57,7 @@
"metadata": {},
"outputs": [],
"source": [
"!pip install azureml-contrib-tensorboard"
"!pip install azureml-tensorboard"
]
},
{
@@ -239,7 +239,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.tensorboard import Tensorboard\n",
"from azureml.tensorboard import Tensorboard\n",
"\n",
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
"tb = Tensorboard([run])\n",
@@ -293,7 +293,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import RemoteCompute\n",
"from azureml.core.compute import ComputeTarget, RemoteCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"username = os.getenv('AZUREML_DSVM_USERNAME', default='<my_username>')\n",
@@ -305,12 +305,11 @@
" attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n",
" print('found existing:', attached_dsvm_compute.name)\n",
"except ComputeTargetException:\n",
" attached_dsvm_compute = RemoteCompute.attach(workspace=ws,\n",
" name=compute_target_name,\n",
" username=username,\n",
" config = RemoteCompute.attach_configuration(username=username,\n",
" address=address,\n",
" ssh_port=22,\n",
" private_key_file='./.ssh/id_rsa')\n",
" attached_dsvm_compute = ComputeTarget.attach(ws, compute_target_name, config)\n",
" \n",
" attached_dsvm_compute.wait_for_completion(show_output=True)"
]
@@ -407,10 +406,13 @@
"# choose a name for your cluster\n",
"cluster_name = \"cpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
"cts = ws.compute_targets\n",
"found = False\n",
"if cluster_name in cts and cts[cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" compute_target = cts[cluster_name]\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n",
" max_nodes=4)\n",
@@ -418,10 +420,10 @@
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
"compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=20)\n",
"compute_target.wait_for_completion(show_output=True, min_node_count=None)\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())"
"# print(compute_target.get_status().serialize())"
]
},
{
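In the `cpucluster` hunks above, the lookup appears to change from catching `ComputeTargetException` to checking `ws.compute_targets` (a mapping from target name to compute target) and comparing the target's `type` with `'AmlCompute'`. The sketch below restates that check as a plain function so it can be exercised without a workspace; `find_compute` and `FakeTarget` are hypothetical stand-ins, and only the `.type` attribute mirrors the diff.

```python
from collections import namedtuple

# Hypothetical stand-in for a compute target object; only `.type` matters here.
FakeTarget = namedtuple('FakeTarget', ['name', 'type'])


def find_compute(compute_targets, name, expected_type='AmlCompute'):
    """Return the target called `name` if it exists with `expected_type`, else None.

    `compute_targets` is a name -> target mapping, as ws.compute_targets is used above.
    """
    target = compute_targets.get(name)
    if target is not None and target.type == expected_type:
        return target
    return None


targets = {'cpucluster': FakeTarget('cpucluster', 'AmlCompute'),
           'mydsvm': FakeTarget('mydsvm', 'OtherType')}  # hypothetical values
print(find_compute(targets, 'cpucluster'))  # FakeTarget(name='cpucluster', type='AmlCompute')
print(find_compute(targets, 'mydsvm'))      # None: the name exists with another type
print(find_compute(targets, 'missing'))     # None: no target with that name
```

A `None` result corresponds to the `if not found:` branch, where the notebook provisions a new cluster.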
@@ -0,0 +1,136 @@
import argparse

import numpy as np

import chainer
from chainer import backend
from chainer import backends
from chainer.backends import cuda
from chainer import Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
from chainer.dataset import concat_examples
from chainer.backends.cuda import to_cpu

from azureml.core.run import Run
run = Run.get_context()


class MyNetwork(Chain):

    def __init__(self, n_mid_units=100, n_out=10):
        super(MyNetwork, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_mid_units)
            self.l2 = L.Linear(n_mid_units, n_mid_units)
            self.l3 = L.Linear(n_mid_units, n_out)

    def forward(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)


def main():
    parser = argparse.ArgumentParser(description='Chainer example: MNIST')
    parser.add_argument('--batchsize', '-b', type=int, default=100,
                        help='Number of images in each mini-batch')
    parser.add_argument('--epochs', '-e', type=int, default=20,
                        help='Number of sweeps over the dataset to train')
    parser.add_argument('--output_dir', '-o', default='./outputs',
                        help='Directory to output the result')
    # type=int is needed so that `gpu_id >= 0` works when the flag is passed
    parser.add_argument('--gpu_id', '-g', type=int, default=0,
                        help='ID of the GPU to be used. Set to -1 if you use CPU')
    args = parser.parse_args()

    # Download the MNIST data if you haven't downloaded it yet
    train, test = datasets.mnist.get_mnist(withlabel=True, ndim=1)

    gpu_id = args.gpu_id
    batchsize = args.batchsize
    epochs = args.epochs
    run.log('Batch size', int(batchsize))
    run.log('Epochs', int(epochs))

    train_iter = iterators.SerialIterator(train, batchsize)
    test_iter = iterators.SerialIterator(test, batchsize,
                                         repeat=False, shuffle=False)

    model = MyNetwork()

    if gpu_id >= 0:
        # Make the specified GPU current
        chainer.backends.cuda.get_device_from_id(gpu_id).use()
        model.to_gpu()  # Copy the model to the GPU

    # Choose an optimizer algorithm
    optimizer = optimizers.MomentumSGD(lr=0.01, momentum=0.9)

    # Give the optimizer a reference to the model so that it
    # can locate the model's parameters.
    optimizer.setup(model)

    while train_iter.epoch < epochs:
        # ---------- One iteration of the training loop ----------
        train_batch = train_iter.next()
        image_train, target_train = concat_examples(train_batch, gpu_id)

        # Calculate the prediction of the network
        prediction_train = model(image_train)

        # Calculate the loss with softmax_cross_entropy
        loss = F.softmax_cross_entropy(prediction_train, target_train)

        # Calculate the gradients in the network
        model.cleargrads()
        loss.backward()

        # Update all the trainable parameters
        optimizer.update()
        # --------------------- until here ---------------------

        # Check the validation accuracy of prediction after every epoch
        if train_iter.is_new_epoch:  # If this iteration is the final iteration of the current epoch

            # Display the training loss
            print('epoch:{:02d} train_loss:{:.04f} '.format(
                train_iter.epoch, float(to_cpu(loss.array))), end='')

            test_losses = []
            test_accuracies = []
            while True:
                test_batch = test_iter.next()
                image_test, target_test = concat_examples(test_batch, gpu_id)

                # Forward the test data
                prediction_test = model(image_test)

                # Calculate the loss
                loss_test = F.softmax_cross_entropy(prediction_test, target_test)
                test_losses.append(to_cpu(loss_test.array))

                # Calculate the accuracy
                accuracy = F.accuracy(prediction_test, target_test)
                accuracy.to_cpu()
                test_accuracies.append(accuracy.array)

                if test_iter.is_new_epoch:
                    test_iter.epoch = 0
                    test_iter.current_position = 0
                    test_iter.is_new_epoch = False
                    test_iter._pushed_position = None
                    break

            val_accuracy = np.mean(test_accuracies)
            print('val_loss:{:.04f} val_accuracy:{:.04f}'.format(
                np.mean(test_losses), val_accuracy))

            run.log("Accuracy", float(val_accuracy))


if __name__ == '__main__':
    main()
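For the `if gpu_id >= 0:` check in the GPU script to work, `--gpu_id` has to be parsed as an integer with `type=int`. Without it, the default `0` still works (argparse leaves non-string defaults untouched), but any value that arrives on the command line stays a `str`, and comparing a `str` with an `int` raises `TypeError` in Python 3. A minimal stdlib-only demonstration; `gpu_requested` is a hypothetical helper that mimics the script's check.

```python
import argparse

# Without type=int, a value passed on the command line stays a str.
untyped = argparse.ArgumentParser()
untyped.add_argument('--gpu_id', '-g', default=0)

# With type=int, argparse converts the command-line string to an int.
typed = argparse.ArgumentParser()
typed.add_argument('--gpu_id', '-g', type=int, default=0)


def gpu_requested(parser, argv):
    """Mimic the script's `if gpu_id >= 0:` check for a given argument list."""
    return parser.parse_args(argv).gpu_id >= 0


print(repr(untyped.parse_args(['--gpu_id', '1']).gpu_id))  # '1'
print(gpu_requested(typed, ['--gpu_id', '1']))             # True
print(gpu_requested(typed, ['--gpu_id=-1']))               # False
try:
    gpu_requested(untyped, ['--gpu_id', '1'])
except TypeError as exc:
    # Python 3 refuses to order a str against an int.
    print('TypeError:', exc)
```

The `--gpu_id=-1` form is used for the negative value so it cannot be mistaken for a separate option.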
@@ -0,0 +1,134 @@
import argparse

import numpy as np

import chainer
from chainer import backend
from chainer import backends
from chainer.backends import cuda
from chainer import Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
from chainer.dataset import concat_examples
from chainer.backends.cuda import to_cpu

from azureml.core.run import Run
run = Run.get_context()


class MyNetwork(Chain):

    def __init__(self, n_mid_units=100, n_out=10):
        super(MyNetwork, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_mid_units)
            self.l2 = L.Linear(n_mid_units, n_mid_units)
            self.l3 = L.Linear(n_mid_units, n_out)

    def forward(self, x):
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)


def main():
    parser = argparse.ArgumentParser(description='Chainer example: MNIST')
    parser.add_argument('--batchsize', '-b', type=int, default=100,
                        help='Number of images in each mini-batch')
    parser.add_argument('--epochs', '-e', type=int, default=20,
                        help='Number of sweeps over the dataset to train')
    parser.add_argument('--output_dir', '-o', default='./outputs',
                        help='Directory to output the result')
    args = parser.parse_args()

    # Download the MNIST data if you haven't downloaded it yet
    train, test = datasets.mnist.get_mnist(withlabel=True, ndim=1)

    batchsize = args.batchsize
    epochs = args.epochs
    run.log('Batch size', int(batchsize))
    run.log('Epochs', int(epochs))

    train_iter = iterators.SerialIterator(train, batchsize)
    test_iter = iterators.SerialIterator(test, batchsize,
                                         repeat=False, shuffle=False)

    model = MyNetwork()

    gpu_id = -1  # Set to -1 if you use CPU
    if gpu_id >= 0:
        # Make the specified GPU current
        chainer.backends.cuda.get_device_from_id(gpu_id).use()
        model.to_gpu()  # Copy the model to the GPU

    # Choose an optimizer algorithm
    optimizer = optimizers.MomentumSGD(lr=0.01, momentum=0.9)

    # Give the optimizer a reference to the model so that it
    # can locate the model's parameters.
    optimizer.setup(model)

    while train_iter.epoch < epochs:
        # ---------- One iteration of the training loop ----------
        train_batch = train_iter.next()
        image_train, target_train = concat_examples(train_batch, gpu_id)

        # Calculate the prediction of the network
        prediction_train = model(image_train)

        # Calculate the loss with softmax_cross_entropy
        loss = F.softmax_cross_entropy(prediction_train, target_train)

        # Calculate the gradients in the network
        model.cleargrads()
        loss.backward()

        # Update all the trainable parameters
        optimizer.update()
        # --------------------- until here ---------------------

        # Check the validation accuracy of prediction after every epoch
        if train_iter.is_new_epoch:  # If this iteration is the final iteration of the current epoch

            # Display the training loss
            print('epoch:{:02d} train_loss:{:.04f} '.format(
                train_iter.epoch, float(to_cpu(loss.array))), end='')

            test_losses = []
            test_accuracies = []
            while True:
                test_batch = test_iter.next()
                image_test, target_test = concat_examples(test_batch, gpu_id)

                # Forward the test data
                prediction_test = model(image_test)

                # Calculate the loss
                loss_test = F.softmax_cross_entropy(prediction_test, target_test)
                test_losses.append(to_cpu(loss_test.array))

                # Calculate the accuracy
                accuracy = F.accuracy(prediction_test, target_test)
                accuracy.to_cpu()
                test_accuracies.append(accuracy.array)

                if test_iter.is_new_epoch:
                    test_iter.epoch = 0
                    test_iter.current_position = 0
                    test_iter.is_new_epoch = False
                    test_iter._pushed_position = None
                    break

            val_accuracy = np.mean(test_accuracies)
            print('val_loss:{:.04f} val_accuracy:{:.04f}'.format(
                np.mean(test_losses), val_accuracy))

            run.log("Accuracy", float(val_accuracy))


if __name__ == '__main__':
    main()
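Both MNIST scripts above report `val_accuracy` as `np.mean(test_accuracies)`, an unweighted mean of per-batch accuracies (the validation loss is averaged the same way). That equals the true test accuracy only when every batch has the same size, which holds for MNIST's 10,000 test images at the default `--batchsize` of 100. The plain-Python sketch below uses hypothetical per-batch accuracies to show how the two averages diverge when the final batch is short.

```python
def mean_of_batch_means(batch_accuracies):
    """Unweighted mean over batches, as np.mean(test_accuracies) computes."""
    return sum(batch_accuracies) / len(batch_accuracies)


def weighted_accuracy(batch_accuracies, batch_sizes):
    """True accuracy: weight each batch accuracy by its number of samples."""
    correct = sum(acc * n for acc, n in zip(batch_accuracies, batch_sizes))
    return correct / sum(batch_sizes)


# Hypothetical accuracies for batchsize=300 on 10,000 test images:
# 33 full batches of 300 followed by one final batch of 100.
accs = [0.9] * 33 + [1.0]
sizes = [300] * 33 + [100]
print(round(mean_of_batch_means(accs), 4))       # 0.9029
print(round(weighted_accuracy(accs, sizes), 4))  # 0.901
```

The unweighted mean gives the short last batch the same influence as a full one, so it overstates accuracy here.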
@@ -0,0 +1,425 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright (c) Microsoft Corporation. All rights reserved. \n",
    "\n",
    "Licensed under the MIT License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train and hyperparameter tune with Chainer\n",
    "\n",
    "In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a fully connected neural network on a single GPU node with Chainer to perform handwritten digit recognition on the popular MNIST dataset. We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check core SDK version number\n",
    "import azureml.core\n",
    "\n",
    "print(\"SDK version:\", azureml.core.VERSION)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Diagnostics\n",
    "Opt-in diagnostics for better experience, quality, and security of future releases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "Diagnostics"
    ]
   },
   "outputs": [],
   "source": [
    "from azureml.telemetry import set_diagnostics_collection\n",
    "\n",
    "set_diagnostics_collection(send_diagnostics=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialize workspace\n",
    "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from azureml.core.workspace import Workspace\n",
    "\n",
    "ws = Workspace.from_config()\n",
    "print('Workspace name: ' + ws.name, \n",
    "      'Azure region: ' + ws.location, \n",
    "      'Subscription id: ' + ws.subscription_id, \n",
|
||||
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Create or Attach existing AmlCompute\n",
|
||||
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n",
|
||||
"\n",
|
||||
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
|
||||
"\n",
|
||||
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||
"\n",
|
||||
"# choose a name for your cluster\n",
|
||||
"cluster_name = \"gpucluster\"\n",
|
||||
"\n",
|
||||
"try:\n",
|
||||
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
||||
" print('Found existing compute target.')\n",
|
||||
"except ComputeTargetException:\n",
|
||||
" print('Creating a new compute target...')\n",
|
||||
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
|
||||
" max_nodes=4)\n",
|
||||
"\n",
|
||||
" # create the cluster\n",
|
||||
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
|
||||
"\n",
|
||||
" compute_target.wait_for_completion(show_output=True)\n",
|
||||
"\n",
|
||||
"# use get_status() to get a detailed status for the current cluster. \n",
|
||||
"print(compute_target.get_status().serialize())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Train model on the remote compute\n",
|
||||
"Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create a project directory\n",
|
||||
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"\n",
|
||||
"project_folder = './chainer-mnist'\n",
|
||||
"os.makedirs(project_folder, exist_ok=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prepare training script\n",
|
||||
"Now you will need to create your training script. In this tutorial, the training script is already provided for you at `chainer_mnist.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n",
|
||||
"\n",
|
||||
"However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script. \n",
|
||||
"\n",
|
||||
"In `chainer_mnist.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML `Run` object within the script:\n",
|
||||
"```Python\n",
|
||||
"from azureml.core.run import Run\n",
|
||||
"run = Run.get_context()\n",
|
||||
"```\n",
|
||||
"Further down in `chainer_mnist.py`, we log the batch size and epochs parameters, as well as the validation accuracy the model achieves after each epoch:\n",
|
||||
"```Python\n",
|
||||
"run.log('Batch size', int(args.batchsize))\n",
|
||||
"run.log('Epochs', int(args.epochs))\n",
|
||||
"\n",
|
||||
"run.log('Accuracy', float(val_accuracy))\n",
|
||||
"```\n",
|
||||
"These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Once your script is ready, copy the training script `chainer_mnist.py` into your project directory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import shutil\n",
|
||||
"\n",
|
||||
"shutil.copy('chainer_mnist.py', project_folder)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create an experiment\n",
|
||||
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Chainer tutorial. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core import Experiment\n",
|
||||
"\n",
|
||||
"experiment_name = 'chainer-mnist'\n",
|
||||
"experiment = Experiment(ws, name=experiment_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create a Chainer estimator\n",
|
||||
"The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs. The following code will define a single-node Chainer job."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.train.dnn import Chainer\n",
|
||||
"\n",
|
||||
"script_params = {\n",
|
||||
" '--epochs': 10,\n",
|
||||
" '--batchsize': 128,\n",
|
||||
" '--output_dir': './outputs'\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"estimator = Chainer(source_directory=project_folder, \n",
|
||||
" script_params=script_params,\n",
|
||||
" compute_target=compute_target,\n",
|
||||
" pip_packages=['numpy', 'pytest'],\n",
|
||||
" entry_script='chainer_mnist.py',\n",
|
||||
" use_gpu=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. To leverage the Azure VM's GPU for training, we set `use_gpu=True`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Submit job\n",
|
||||
"Run your experiment by submitting your estimator object. Note that this call is asynchronous."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run = experiment.submit(estimator)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Monitor your run\n",
|
||||
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.widgets import RunDetails\n",
|
||||
"\n",
|
||||
"RunDetails(run).show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# to get more details of your run\n",
|
||||
"print(run.get_details())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tune model hyperparameters\n",
|
||||
"Now that we've seen how to do a simple Chainer training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Start a hyperparameter sweep\n",
|
||||
"First, we will define the hyperparameter space to sweep over. Let's tune the batch size and epochs parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, accuracy.\n",
|
||||
"\n",
|
||||
"Then, we specify an early termination policy to stop poorly performing runs early. Here we use the `BanditPolicy`, which terminates any run whose primary metric does not fall within the slack factor of the best-performing run. In this tutorial, we apply this policy every epoch (since we report our `Accuracy` metric every epoch and `evaluation_interval=1`). Notice that we delay the first policy evaluation until after the first `3` epochs (`delay_evaluation=3`).\n",
|
||||
"See [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on `BanditPolicy` and the other available policies."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.train.hyperdrive.runconfig import HyperDriveRunConfig\n",
|
||||
"from azureml.train.hyperdrive.sampling import RandomParameterSampling\n",
|
||||
"from azureml.train.hyperdrive.policy import BanditPolicy\n",
|
||||
"from azureml.train.hyperdrive.run import PrimaryMetricGoal\n",
|
||||
"from azureml.train.hyperdrive.parameter_expressions import choice\n",
|
||||
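"\n",
"# Early termination policy described above. NOTE: slack_factor=0.1 is an assumed value; the text does not specify one.\n",
"early_termination_policy = BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=3)\n",
"\n",
"param_sampling = RandomParameterSampling( {\n",
" \"--batchsize\": choice(128, 256),\n",
" \"--epochs\": choice(5, 10, 20, 40)\n",
" }\n",
")\n",
"\n",
"hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n",
" hyperparameter_sampling=param_sampling, \n",
" policy=early_termination_policy,\n",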
|
||||
" primary_metric_name='Accuracy',\n",
|
||||
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
|
||||
" max_total_runs=8,\n",
|
||||
" max_concurrent_runs=4)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Finally, launch the hyperparameter tuning job."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# start the HyperDrive run\n",
|
||||
"hyperdrive_run = experiment.submit(hyperdrive_run_config)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Monitor HyperDrive runs\n",
|
||||
"You can monitor the progress of the runs with the following Jupyter widget. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"RunDetails(hyperdrive_run).show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"hyperdrive_run.wait_for_completion(show_output=True)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "minxia"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.6"
|
||||
},
|
||||
"msauthor": "minxia"
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
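The Chainer notebook's search space is small: 2 batch sizes × 4 epoch counts gives 8 combinations, the same number as `max_total_runs=8`. A grid sweep (`GridParameterSampling` in `azureml.train.hyperdrive`, which supports only `choice` expressions) would enumerate every combination exhaustively. The sketch below shows what that enumeration looks like using `itertools.product`. It copies the notebook's flags and values but does not model how HyperDrive's random sampler picks configurations.

```python
import itertools

# Discrete search space copied from the notebook's RandomParameterSampling cell.
search_space = {
    "--batchsize": [128, 256],
    "--epochs": [5, 10, 20, 40],
}


def enumerate_grid(space):
    """Return every combination of the discrete choices as a list of dicts."""
    flags = list(space)
    return [dict(zip(flags, combo)) for combo in itertools.product(*(space[f] for f in flags))]


configs = enumerate_grid(search_space)
print(len(configs), "configurations")
for cfg in configs:
    print(cfg)
```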
|
||||
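The notebook describes `BanditPolicy`'s slack rule only in words. The Azure ML hyperparameter tuning documentation describes the rule for a maximization goal as follows. A run is terminated when its metric at an evaluation interval falls below `best / (1 + slack_factor)`, or below `best - slack_amount` when an absolute slack is used. Here `best` is the metric of the best-performing run. The sketch below is a plain-Python illustration of that rule, not HyperDrive's implementation. It ignores `evaluation_interval` and `delay_evaluation`, and the run names and accuracies are made up.

```python
def bandit_threshold(best_metric, slack_factor=None, slack_amount=None):
    """Lowest metric a run may report and survive, for a metric being maximized."""
    if (slack_factor is None) == (slack_amount is None):
        raise ValueError("specify exactly one of slack_factor or slack_amount")
    if slack_factor is not None:
        return best_metric / (1.0 + slack_factor)
    return best_metric - slack_amount


def runs_to_terminate(metrics_at_interval, **slack):
    """Return the ids of runs whose metric falls below the Bandit threshold."""
    best = max(metrics_at_interval.values())
    threshold = bandit_threshold(best, **slack)
    return sorted(run_id for run_id, value in metrics_at_interval.items() if value < threshold)


# Hypothetical accuracies reported by four runs at the same evaluation interval.
accuracies = {"run_1": 0.80, "run_2": 0.70, "run_3": 0.65, "run_4": 0.78}

print(bandit_threshold(0.80, slack_factor=0.2))
print(runs_to_terminate(accuracies, slack_factor=0.2))
print(runs_to_terminate(accuracies, slack_amount=0.05))
```

A relative `slack_factor` scales with the best metric, while `slack_amount` is a fixed gap. For accuracies near 1.0 the two behave similarly, but they diverge for metrics on other scales.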
@@ -28,6 +28,8 @@ parser.add_argument('--first-layer-neurons', type=int, dest='n_hidden_1', defaul
|
||||
help='# of neurons in the first layer')
|
||||
parser.add_argument('--second-layer-neurons', type=int, dest='n_hidden_2', default=100,
|
||||
help='# of neurons in the second layer')
|
||||
parser.add_argument('--learning-rate', type=float, dest='learning_rate', default=0.001, help='learning rate')
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
data_folder = args.data_folder
|
||||
@@ -46,9 +48,9 @@ n_inputs = 28 * 28
|
||||
n_h1 = args.n_hidden_1
|
||||
n_h2 = args.n_hidden_2
|
||||
n_outputs = 10
|
||||
|
||||
n_epochs = 20
|
||||
batch_size = args.batch_size
|
||||
learning_rate = args.learning_rate
|
||||
|
||||
y_train = one_hot_encode(y_train, n_outputs)
|
||||
y_test = one_hot_encode(y_test, n_outputs)
|
||||
@@ -56,9 +58,9 @@ print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n')
|
||||
|
||||
# Build a simple MLP model
|
||||
model = Sequential()
|
||||
# input layer
|
||||
# first hidden layer
|
||||
model.add(Dense(n_h1, activation='relu', input_shape=(n_inputs,)))
|
||||
# hidden layer
|
||||
# second hidden layer
|
||||
model.add(Dense(n_h2, activation='relu'))
|
||||
# output layer
|
||||
model.add(Dense(n_outputs, activation='softmax'))
|
||||
@@ -66,7 +68,7 @@ model.add(Dense(n_outputs, activation='softmax'))
|
||||
model.summary()
|
||||
|
||||
model.compile(loss='categorical_crossentropy',
|
||||
optimizer=RMSprop(),
|
||||
optimizer=RMSprop(lr=learning_rate),
|
||||
metrics=['accuracy'])
|
||||
|
||||
# start an Azure ML run
|
||||
|
||||
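This change adds a `--learning-rate` flag (dest `learning_rate`) to `keras_mnist.py`, and the script now passes it to `RMSprop`. The notebook's `script_params` and HyperDrive search space gain a matching `'--learning-rate'` key. The dictionary keys must match the flag names exactly, because each `script_params` entry reaches the training script as a command-line argument. The sketch below assumes a simple `--flag value` flattening and shows where those values land in `args`. The `to_argv` helper is hypothetical, not an Azure ML API, and only two of the script's flags are reproduced.

```python
import argparse

# Two of the flags shown in the keras_mnist.py diff; the other flags are omitted here.
parser = argparse.ArgumentParser()
parser.add_argument('--second-layer-neurons', type=int, dest='n_hidden_2', default=100,
                    help='# of neurons in the second layer')
parser.add_argument('--learning-rate', type=float, dest='learning_rate', default=0.001,
                    help='learning rate')


def to_argv(script_params):
    """Hypothetical helper: flatten a script_params-style dict into '--flag value' pairs."""
    argv = []
    for flag, value in script_params.items():
        argv.extend([flag, str(value)])
    return argv


args = parser.parse_args(to_argv({'--second-layer-neurons': 50, '--learning-rate': 0.01}))
defaults = parser.parse_args([])

print(args.n_hidden_2, args.learning_rate)
print(defaults.n_hidden_2, defaults.learning_rate)
```

A misspelled key such as `'--learning_rate'` would make `parse_args` exit with an "unrecognized arguments" error rather than silently falling back to the default.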
@@ -281,7 +281,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpucluster' of type `AmlCompute`."
|
||||
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named \"gpucluster\" of type `AmlCompute`."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -300,7 +300,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Copy the training files into the script folder\n",
|
||||
"The TensorFlow training script is already created for you. You can simply copy it into the script folder, together with the utility library used to load compressed data file into numpy array."
|
||||
"The Keras training script is already created for you. You can simply copy it into the script folder, together with the utility library used to load compressed data files into NumPy arrays."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -326,7 +326,7 @@
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Construct neural network in TensorFlow\n",
|
||||
"## Construct neural network in Keras\n",
|
||||
"The training script `keras_mnist.py` creates a very simple DNN (deep neural network) with just 2 hidden layers. The input layer has 28 * 28 = 784 neurons, each representing a pixel in an image. The first hidden layer has 300 neurons, and the second hidden layer has 100 neurons. The output layer has 10 neurons, each representing a target label from 0 to 9.\n",
|
||||
"\n",
|
||||
""
|
||||
@@ -355,7 +355,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The next cell will print out the training code for you to inspect it."
|
||||
"The next cell will print out the training code for you to inspect."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -389,7 +389,8 @@
|
||||
" '--data-folder': ds.path('mnist').as_mount(),\n",
|
||||
" '--batch-size': 50,\n",
|
||||
" '--first-layer-neurons': 300,\n",
|
||||
" '--second-layer-neurons': 100 \n",
|
||||
" '--second-layer-neurons': 100,\n",
|
||||
" '--learning-rate': 0.001\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"est = TensorFlow(source_directory=script_folder,\n",
|
||||
@@ -421,7 +422,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Submit job to run\n",
|
||||
"Submit the estimator to an Azure ML experiment to kick off the execution."
|
||||
"Submit the estimator to the Azure ML experiment to kick off the execution."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -487,6 +488,13 @@
|
||||
"run.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The training script prints the Keras version number in its output. Please make a note of it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@@ -562,7 +570,27 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Predict on the test set\n",
|
||||
"Now load the saved Kears model."
|
||||
"Let's check the version of the local Keras installation. Make sure it matches the version number printed by the training script; otherwise you might not be able to load the model properly."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import keras\n",
|
||||
"import tensorflow as tf\n",
|
||||
"\n",
|
||||
"print(\"Keras version:\", keras.__version__)\n",
|
||||
"print(\"Tensorflow version:\", tf.__version__)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's load the downloaded model."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -643,7 +671,8 @@
|
||||
" {\n",
|
||||
" '--batch-size': choice(25, 50, 100),\n",
|
||||
" '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n",
|
||||
" '--second-layer-neurons': choice(10, 50, 200, 500) \n",
|
||||
" '--second-layer-neurons': choice(10, 50, 200, 500),\n",
|
||||
" '--learning-rate': loguniform(-6, -1)\n",
|
||||
" }\n",
|
||||
")"
|
||||
]
|
||||
@@ -698,7 +727,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"htc = HyperDriveRunConfig(estimator=est, \n",
|
||||
"hdc = HyperDriveRunConfig(estimator=est, \n",
|
||||
" hyperparameter_sampling=ps, \n",
|
||||
" policy=policy, \n",
|
||||
" primary_metric_name='Accuracy', \n",
|
||||
@@ -720,8 +749,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"htr = exp.submit(config=htc)\n",
|
||||
"htr"
|
||||
"hdr = exp.submit(config=hdc)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -737,7 +765,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"RunDetails(htr).show()"
|
||||
"RunDetails(hdr).show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -746,7 +774,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"htr.wait_for_completion(show_output=True)"
|
||||
"hdr.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -763,7 +791,8 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"best_run = htr.get_best_run_by_primary_metric()"
|
||||
"best_run = hdr.get_best_run_by_primary_metric()\n",
|
||||
"print(best_run.get_details()['runDefinition']['Arguments'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -851,7 +880,7 @@
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create myenv.yml\n",
|
||||
"We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify packages `numpy`, `tensorflow`."
|
||||
"We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda packages `tensorflow` and `keras`."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -863,7 +892,7 @@
|
||||
"from azureml.core.runconfig import CondaDependencies\n",
|
||||
"\n",
|
||||
"cd = CondaDependencies.create()\n",
|
||||
"cd.add_tensorflow_conda_package()\n",
|
||||
"cd.add_conda_package('tensorflow')\n",
|
||||
"cd.add_conda_package('keras')\n",
|
||||
"cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
|
||||
"\n",
|
||||
@@ -889,7 +918,7 @@
|
||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||
" auth_enabled=True, # this flag generates API keys to secure access\n",
|
||||
" memory_gb=1, \n",
|
||||
" tags={'name':'mnist', 'framework': 'Keras MLP'},\n",
|
||||
" tags={'name':'mnist', 'framework': 'Keras'},\n",
|
||||
" description='Keras MLP on MNIST')"
|
||||
]
|
||||
},
|
||||
@@ -1032,7 +1061,8 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# retrieve the API keys. Two keys were generated.\n",
|
||||
"key1, Key2 = service.get_keys()"
|
||||
"key1, key2 = service.get_keys()\n",
|
||||
"print(key1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -1132,7 +1162,7 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.8"
|
||||
"version": "3.6.7"
|
||||
},
|
||||
"msauthor": "haining"
|
||||
},
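The Keras HyperDrive search space samples the learning rate with `loguniform(-6, -1)`. Azure ML's hyperparameter tuning documentation describes `loguniform(low, high)` as returning `exp(uniform(low, high))`, so the natural logarithm of the sampled value is uniformly distributed. For these bounds, that means learning rates between roughly 0.0025 and 0.37. The sketch below reproduces that distribution with Python's `random` module as an illustration only, not HyperDrive's sampler. The seed and sample count are arbitrary.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Return exp(U(low, high)), so the natural log of the result is uniform on [low, high]."""
    return math.exp(rng.uniform(low, high))


rng = random.Random(42)  # fixed seed so the sketch is reproducible
lower, upper = math.exp(-6), math.exp(-1)
samples = [sample_loguniform(-6, -1, rng) for _ in range(1000)]

print(f"range: [{lower:.5f}, {upper:.5f}]")
print("fraction below 0.01:", sum(s < 0.01 for s in samples) / len(samples))
```

Because the sampling is uniform in log space, small learning rates are explored about as often per e-fold as large ones. A plain `uniform(0.0025, 0.37)` would almost never propose values near 0.003.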
|
||||
|
||||
@@ -217,7 +217,7 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"props = run.upload_file(name='myfile_in_the_cloud.txt', path_or_stream='./myfile.txt')\n",
|
||||
"props = run.upload_file(name='outputs/myfile_in_the_cloud.txt', path_or_stream='./myfile.txt')\n",
|
||||
"props.serialize()"
|
||||
]
|
||||
},
|
||||
|
||||
@@ -81,7 +81,7 @@
|
||||
"from azureml.core import Experiment, Workspace\n",
|
||||
"\n",
|
||||
"# Check core SDK version number\n",
|
||||
"print(\"This notebook was created using version 1.0.15 of the Azure ML SDK\")\n",
|
||||
"print(\"This notebook was created using version 1.0.2 of the Azure ML SDK\")\n",
|
||||
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")\n",
|
||||
"print(\"\")\n",
|
||||
"\n",
|
||||
@@ -138,7 +138,6 @@
|
||||
"* We use `start_logging` to create a new run in this experiment\n",
|
||||
"* We use `run.log()` to record a parameter, alpha, and an error measure, the Mean Squared Error (MSE), to the run. We will be able to review and compare these measures in the Azure Portal at a later time.\n",
|
||||
"* We store the resulting model in the **outputs** directory, which is automatically captured by AML when the run is complete.\n",
|
||||
"* We use `run.take_snapshot()` to capture *this* notebook so we can reproduce this experiment at a later time.\n",
|
||||
"* We use `run.complete()` to indicate that the run is over and results can be captured and finalized."
|
||||
]
|
||||
},
|
||||
@@ -173,9 +172,6 @@
|
||||
"# Save the model to the outputs directory for capture\n",
|
||||
"joblib.dump(value=regression_model, filename='outputs/model.pkl')\n",
|
||||
"\n",
|
||||
"# Take a snapshot of the directory containing this notebook\n",
|
||||
"run.take_snapshot('./')\n",
|
||||
"\n",
|
||||
"# Complete the run\n",
|
||||
"run.complete()"
|
||||
]
|
||||
@@ -238,10 +234,7 @@
|
||||
" run.log(name=\"mse\", value=mse)\n",
|
||||
"\n",
|
||||
" # Save the model to the outputs directory for capture\n",
|
||||
" joblib.dump(value=regression_model, filename='outputs/model.pkl')\n",
|
||||
" \n",
|
||||
" # Capture this notebook with the run\n",
|
||||
" run.take_snapshot('./')\n"
|
||||
" joblib.dump(value=regression_model, filename='outputs/model.pkl')\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -341,7 +341,7 @@
|
||||
"parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n",
|
||||
"args = parser.parse_args()\n",
|
||||
"\n",
|
||||
"data_folder = os.path.join(args.data_folder, 'mnist')\n",
|
||||
"data_folder = args.data_folder\n",
|
||||
"print('Data folder:', data_folder)\n",
|
||||
"\n",
|
||||
"# load train and test set into numpy arrays\n",
|
||||
@@ -426,7 +426,7 @@
|
||||
"* Parameters required from the training script \n",
|
||||
"* Python packages needed for training\n",
|
||||
"\n",
|
||||
"In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.as_mount()`)."
|
||||
"In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.path('mnist').as_mount()`)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -442,8 +442,8 @@
|
||||
"from azureml.train.estimator import Estimator\n",
|
||||
"\n",
|
||||
"script_params = {\n",
|
||||
" '--data-folder': ds.as_mount(),\n",
|
||||
" '--regularization': 0.8\n",
|
||||
" '--data-folder': ds.path('mnist').as_mount(),\n",
|
||||
" '--regularization': 0.05\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"est = Estimator(source_directory=script_folder,\n",
|
||||
@@ -453,13 +453,29 @@
|
||||
" conda_packages=['scikit-learn'])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This is what the mounting point looks like:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(ds.path('mnist').as_mount())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Submit the job to the cluster\n",
|
||||
"\n",
|
||||
"Run the experiment by submitting the estimator object."
|
||||
"Run the experiment by submitting the estimator object. You can also navigate to the Azure portal to monitor the run."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -486,17 +502,17 @@
|
||||
"\n",
|
||||
"## Monitor a remote run\n",
|
||||
"\n",
|
||||
"In total, the first run takes **approximately 10 minutes**. But for subsequent runs, as long as the script dependencies don't change, the same image is reused and hence the container start up time is much faster.\n",
|
||||
"In total, the first run takes **approximately 10 minutes**. But for subsequent runs, as long as the dependencies (the `conda_packages` parameter in the above estimator constructor) don't change, the same image is reused and hence the container startup time is much faster.\n",
|
||||
"\n",
|
||||
"Here is what's happening while you wait:\n",
|
||||
"\n",
|
||||
"- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about 5 minutes**. \n",
|
||||
"- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is built and stored in the ACR (Azure Container Registry) associated with your workspace. Image creation and uploading take **about 5 minutes**. \n",
|
||||
"\n",
|
||||
" This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.\n",
|
||||
"\n",
|
||||
"- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\n",
|
||||
"\n",
|
||||
"- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n",
|
||||
"- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the files in the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n",
|
||||
"\n",
|
||||
"- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n",
|
||||
"\n",
|
||||
@@ -526,7 +542,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
|
||||
"By the way, if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -535,7 +551,7 @@
|
||||
"source": [
|
||||
"### Get log results upon completion\n",
|
||||
"\n",
|
||||
"Model training and monitoring happen in the background. Wait until the model has completed training before running more code. Use `wait_for_completion` to show when the model training is complete."
|
||||
"Model training happens in the background. You can use `wait_for_completion` to block and wait until the model has completed training before running more code. "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -550,7 +566,8 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run.wait_for_completion(show_output=False) # specify True for a verbose log"
|
||||
"# set show_output to True for a verbose log\n",
|
||||
"run.wait_for_completion(show_output=False) "
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -559,7 +576,7 @@
|
||||
"source": [
|
||||
"### Display run results\n",
|
||||
"\n",
|
||||
"You now have a model trained on a remote cluster. Retrieve the accuracy of the model:"
|
||||
"You now have a model trained on a remote cluster. Retrieve all the metrics logged during the run, including the accuracy of the model:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -663,9 +680,9 @@
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.2"
|
||||
"version": "3.6.8"
|
||||
},
|
||||
"msauthor": "sgilley"
|
||||
"msauthor": "haining"
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
|
||||