Compare commits

...

29 Commits

Author SHA1 Message Date
amlrelsa-ms
8dad09a42f update samples from Release-65 as a part of SDK release 2020-09-17 01:14:32 +00:00
Harneet Virk
db2bf8ae93 Merge pull request #1137 from Azure/release_update/Release-64
update samples from Release-64 as a part of  SDK release
2020-09-09 15:31:51 -07:00
amlrelsa-ms
820c09734f update samples from Release-64 as a part of SDK release 2020-09-09 22:30:45 +00:00
Cody
a2a33c70a6 Merge pull request #1123 from oliverw1/patch-2
docs: bring docs in line with code
2020-09-02 11:12:31 -07:00
Cody
2ff791968a Merge pull request #1122 from oliverw1/patch-1
docs: Move unintended side columns below the main rows
2020-09-02 11:11:58 -07:00
Harneet Virk
7186127804 Merge pull request #1128 from Azure/release_update/Release-63
update samples from Release-63 as a part of  SDK release
2020-08-31 13:23:08 -07:00
amlrelsa-ms
b01c52bfd6 update samples from Release-63 as a part of SDK release 2020-08-31 20:00:07 +00:00
Oliver W
28be7bcf58 docs: bring docs in line with code
A non-existant name was being referred to, which only serves confusion.
2020-08-28 10:24:24 +02:00
Oliver W
37a9350fde Properly format markdown table
Remove the unintended two columns that appeared on the right side
2020-08-28 09:29:46 +02:00
Harneet Virk
5080053a35 Merge pull request #1120 from Azure/release_update/Release-62
update samples from Release-62 as a part of  SDK release
2020-08-27 17:12:05 -07:00
amlrelsa-ms
3c02102691 update samples from Release-62 as a part of SDK release 2020-08-27 23:28:05 +00:00
Sheri Gilley
07e1676762 Merge pull request #1010 from GinSiuCheng/patch-1
Include additional details on user authentication
2020-08-25 11:45:58 -05:00
Sheri Gilley
919a3c078f fix code blocks 2020-08-25 11:13:24 -05:00
Sheri Gilley
9b53c924ed add code block for better formatting 2020-08-25 11:09:56 -05:00
Sheri Gilley
04ad58056f fix quotes 2020-08-25 11:06:18 -05:00
Sheri Gilley
576bf386b5 fix quotes 2020-08-25 11:05:25 -05:00
Cody
7e62d1cfd6 Merge pull request #891 from Fokko/patch-1
Don't print the access token
2020-08-22 18:28:33 -07:00
Cody
ec67a569af Merge pull request #804 from omartin2010/patch-3
typo
2020-08-17 14:35:55 -07:00
Cody
6d1e80bcef Merge pull request #1031 from hyoshioka0128/patch-1
Typo "Mircosoft"→"Microsoft"
2020-08-17 14:32:44 -07:00
mx-iao
db00d9ad3c Merge pull request #1100 from Azure/lostmygithubaccount-patch-1
fix minor typo in how-to-use-azureml/README.md
2020-08-17 14:30:18 -07:00
Harneet Virk
d33c75abc3 Merge pull request #1104 from Azure/release_update/Release-61
update samples from Release-61 as a part of  SDK release
2020-08-17 10:59:39 -07:00
amlrelsa-ms
d0dc4836ae update samples from Release-61 as a part of SDK release 2020-08-17 17:45:26 +00:00
Cody
982f8fcc1d Update README.md 2020-08-14 15:25:39 -07:00
Akshaya Annavajhala
79739b5e1b Remove broken links (#1095)
* Remove broken links

* Update README.md
2020-08-10 19:35:41 -04:00
Harneet Virk
aac4fa1fb9 Merge pull request #1081 from Azure/release_update/Release-60
update samples from Release-60 as a part of  SDK 1.11.0 release
2020-08-04 00:04:38 -07:00
Hiroshi Yoshioka
9c32ca9db5 Typo "Mircosoft"→"Microsoft"
https://docs.microsoft.com/en-us/samples/azure/machinelearningnotebooks/azure-machine-learning-service-example-notebooks/
2020-06-29 12:21:23 +09:00
Gin
745b4f0624 Include additional details on user authentication
Additional details should be included for user authentication esp. for enterprise users who may have more than one single aad tenant linked to a user.
2020-06-13 21:24:56 -04:00
Fokko Driesprong
119fd0a8f6 Don't print the access token
That's never a good idea, no exceptions :)
2020-03-31 08:14:05 +02:00
Olivier Martin
d4a486827d typo 2020-02-17 17:16:47 -05:00
93 changed files with 2995 additions and 4723 deletions

View File

@@ -65,7 +65,7 @@ Visit following repos to see projects contributed by Azure ML users:
- [UMass Amherst Student Samples](https://github.com/katiehouse3/microsoft-azure-ml-notebooks) - A number of end-to-end machine learning notebooks, including machine translation, image classification, and customer churn, created by students in the 696DS course at UMass Amherst.
## Data/Telemetry
This repository collects usage data and sends it to Mircosoft to help improve our products and services. Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement)
This repository collects usage data and sends it to Microsoft to help improve our products and services. Read Microsoft's [privacy statement to learn more](https://privacy.microsoft.com/en-US/privacystatement)
To opt out of tracking, please go to the raw markdown or .ipynb files and remove the following line of code:

View File

@@ -103,7 +103,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -4,7 +4,7 @@ Learn how to use Azure Machine Learning services for experimentation and model m
As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) notebook first to set up your Azure ML Workspace. Then, run the notebooks in following recommended order.
* [train-within-notebook](./training/train-within-notebook): Train a model hile tracking run history, and learn how to deploy the model as web service to Azure Container Instance.
* [train-within-notebook](./training/train-within-notebook): Train a model while tracking run history, and learn how to deploy the model as web service to Azure Container Instance.
* [train-on-local](./training/train-on-local): Learn how to submit a run to local computer and use Azure ML managed run configuration.
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
* [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.

View File

@@ -155,7 +155,7 @@ jupyter notebook
- Continuous retraining using Pipelines and Time-Series TabularDataset
- [auto-ml-classification-text-dnn.ipynb](classification-text-dnn/auto-ml-classification-text-dnn.ipynb)
- Classification with text data using deep learning in AutoML
- Classification with text data using deep learning in automated ML
- AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data.
- Depending on the compute cluster the user provides, AutoML tried out Bidirectional Encoder Representations from Transformers (BERT) when a GPU compute is used.
- Bidirectional Long-Short Term neural network (BiLSTM) when a CPU compute is used, thereby optimizing the choice of DNN for the uesr's setup.

View File

@@ -6,12 +6,12 @@ dependencies:
- python>=3.5.2,<3.6.8
- nb_conda
- matplotlib==2.1.0
- numpy~=1.16.0
- numpy~=1.18.0
- cython
- urllib3<1.24
- scipy==1.4.1
- scikit-learn>=0.19.0,<=0.20.3
- pandas>=0.22.0,<=0.23.4
- scikit-learn==0.22.1
- pandas==0.25.1
- py-xgboost<=0.90
- conda-forge::fbprophet==0.5
- holidays==0.9.11
@@ -20,12 +20,9 @@ dependencies:
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-defaults
- azureml-train-automl
- azureml-train
- azureml-widgets
- azureml-pipeline
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.13.0/validated_win32_requirements.txt [--no-deps]

View File

@@ -0,0 +1,28 @@
name: azure_automl
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- python>=3.5.2,<3.6.8
- nb_conda
- matplotlib==2.1.0
- numpy~=1.18.0
- cython
- urllib3<1.24
- scipy==1.4.1
- scikit-learn==0.22.1
- pandas==0.25.1
- py-xgboost<=0.90
- conda-forge::fbprophet==0.5
- holidays==0.9.11
- pytorch::pytorch=1.4.0
- cudatoolkit=10.1.243
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.13.0/validated_linux_requirements.txt [--no-deps]

View File

@@ -7,12 +7,12 @@ dependencies:
- python>=3.5.2,<3.6.8
- nb_conda
- matplotlib==2.1.0
- numpy~=1.16.0
- numpy~=1.18.0
- cython
- urllib3<1.24
- scipy==1.4.1
- scikit-learn>=0.19.0,<=0.20.3
- pandas>=0.22.0,<=0.23.4
- scikit-learn==0.22.1
- pandas==0.25.1
- py-xgboost<=0.90
- conda-forge::fbprophet==0.5
- holidays==0.9.11
@@ -21,11 +21,8 @@ dependencies:
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-defaults
- azureml-train-automl
- azureml-train
- azureml-widgets
- azureml-pipeline
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.13.0/validated_darwin_requirements.txt [--no-deps]

View File

@@ -12,7 +12,7 @@ fi
if [ "$AUTOML_ENV_FILE" == "" ]
then
AUTOML_ENV_FILE="automl_env.yml"
AUTOML_ENV_FILE="automl_env_linux.yml"
fi
if [ ! -f $AUTOML_ENV_FILE ]; then

View File

@@ -89,7 +89,7 @@
"from azureml.automl.core.featurization import FeaturizationConfig\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.explain.model._internal.explanation_client import ExplanationClient"
"from azureml.interpret._internal.explanation_client import ExplanationClient"
]
},
{
@@ -105,7 +105,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -733,24 +733,6 @@
"print(aci_service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete a Web Service\n",
"\n",
"Deletes the specified web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -775,7 +757,9 @@
"source": [
"## Test\n",
"\n",
"Now that the model is trained, run the test data through the trained model to get the predicted values."
"Now that the model is trained, run the test data through the trained model to get the predicted values. This calls the ACI web service to do the prediction.\n",
"\n",
"Note that the JSON passed to the ACI web service is an array of rows of data. Each row should either be an array of values in the same order that was used for training or a dictionary where the keys are the same as the column names used for training. The example below uses dictionary rows."
]
},
{
@@ -815,10 +799,27 @@
"metadata": {},
"outputs": [],
"source": [
"y_pred = fitted_model.predict(X_test)\n",
"import json\n",
"import requests\n",
"\n",
"X_test_json = X_test.to_json(orient='records')\n",
"data = \"{\\\"data\\\": \" + X_test_json +\"}\"\n",
"headers = {'Content-Type': 'application/json'}\n",
"\n",
"resp = requests.post(aci_service.scoring_uri, data, headers=headers)\n",
"\n",
"y_pred = json.loads(json.loads(resp.text))['result']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"actual = array(y_test)\n",
"actual = actual[:,0]\n",
"print(y_pred.shape, \" \", actual.shape)"
"print(len(y_pred), \" \", len(actual))"
]
},
{
@@ -827,8 +828,7 @@
"source": [
"### Calculate metrics for the prediction\n",
"\n",
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
"from the trained model that was returned."
"Now visualize the data as a confusion matrix that compared the predicted values against the actual values.\n"
]
},
{
@@ -838,12 +838,45 @@
"outputs": [],
"source": [
"%matplotlib notebook\n",
"test_pred = plt.scatter(actual, y_pred, color='b')\n",
"test_test = plt.scatter(actual, actual, color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"from sklearn.metrics import confusion_matrix\n",
"import numpy as np\n",
"import itertools\n",
"\n",
"cf =confusion_matrix(actual,y_pred)\n",
"plt.imshow(cf,cmap=plt.cm.Blues,interpolation='nearest')\n",
"plt.colorbar()\n",
"plt.title('Confusion Matrix')\n",
"plt.xlabel('Predicted')\n",
"plt.ylabel('Actual')\n",
"class_labels = ['no','yes']\n",
"tick_marks = np.arange(len(class_labels))\n",
"plt.xticks(tick_marks,class_labels)\n",
"plt.yticks([-0.5,0,1,1.5],['','no','yes',''])\n",
"# plotting text value inside cells\n",
"thresh = cf.max() / 2.\n",
"for i,j in itertools.product(range(cf.shape[0]),range(cf.shape[1])):\n",
" plt.text(j,i,format(cf[i,j],'d'),horizontalalignment='center',color='white' if cf[i,j] >thresh else 'black')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete a Web Service\n",
"\n",
"Deletes the specified web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"aci_service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -93,7 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -97,7 +97,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -151,6 +151,8 @@
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"num_nodes = 2\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"dnntext-cluster\"\n",
"\n",
@@ -163,7 +165,7 @@
" # To use BERT (this is recommended for best performance), select a GPU such as \"STANDARD_NC6\" \n",
" # or similar GPU option\n",
" # available in your workspace\n",
" max_nodes = 1)\n",
" max_nodes = num_nodes)\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
"\n",
"compute_target.wait_for_completion(show_output=True)"
@@ -270,7 +272,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This step requires an Enterprise workspace to gain access to this feature. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade)."
"This step requires an Enterprise workspace to gain access to this feature. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade).\n",
"\n",
"This notebook uses the blocked_models parameter to exclude some models that can take a longer time to train on some text datasets. You can choose to remove models from the blocked_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
]
},
{
@@ -282,7 +286,7 @@
"automl_settings = {\n",
" \"experiment_timeout_minutes\": 20,\n",
" \"primary_metric\": 'accuracy',\n",
" \"max_concurrent_iterations\": 4, \n",
" \"max_concurrent_iterations\": num_nodes, \n",
" \"max_cores_per_iteration\": -1,\n",
" \"enable_dnn\": True,\n",
" \"enable_early_stopping\": True,\n",
@@ -297,6 +301,7 @@
" compute_target=compute_target,\n",
" training_data=train_dataset,\n",
" label_column_name=target_column_name,\n",
" blocked_models = ['LightGBM'],\n",
" **automl_settings\n",
" )"
]

View File

@@ -1,6 +1,5 @@
import pandas as pd
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.estimator import Estimator
from azureml.core.run import Run
@@ -8,13 +7,7 @@ from azureml.core.run import Run
def run_inference(test_experiment, compute_target, script_folder, train_run,
train_dataset, test_dataset, target_column_name, model_name):
train_run.download_file('outputs/conda_env_v_1_0_0.yml',
'inference/condafile.yml')
inference_env = Environment("myenv")
inference_env.docker.enabled = True
inference_env.python.conda_dependencies = CondaDependencies(
conda_dependencies_file_path='inference/condafile.yml')
inference_env = train_run.get_environment()
est = Estimator(source_directory=script_folder,
entry_script='infer.py',

View File

@@ -88,7 +88,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -204,7 +204,6 @@
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', 'applicationinsights', 'azureml-opendatasets', 'azureml-defaults'], \n",
" conda_packages=['numpy==1.16.2'], \n",
" pin_sdk_version=False)\n",
"#cd.add_pip_package('azureml-explain-model')\n",
"conda_run_config.environment.python.conda_dependencies = cd\n",
"\n",
"print('run config is ready')"

View File

@@ -0,0 +1,92 @@
# Experimental Notebooks for Automated ML
Notebooks listed in this folder are leveraging experimental features. Namespaces or function signitures may change in future SDK releases. The notebooks published here will reflect the latest supported APIs. All of these notebooks can run on a client-only installation of the Automated ML SDK.
The client only installation doesn't contain any of the machine learning libraries, such as scikit-learn, xgboost, or tensorflow, making it much faster to install and is less likely to conflict with any packages in an existing environment. However, since the ML libraries are not available locally, models cannot be downloaded and loaded directly in the client. To replace the functionality of having models locally, these notebooks also demonstrate the ModelProxy feature which will allow you to submit a predict/forecast to the training environment.
<a name="localconda"></a>
## Setup using a Local Conda environment
To run these notebook on your own notebook server, use these installation instructions.
The instructions below will install everything you need and then start a Jupyter notebook.
If you would like to use a lighter-weight version of the client that does not install all of the machine learning libraries locally, you can leverage the [experimental notebooks.](experimental/README.md)
### 1. Install mini-conda from [here](https://conda.io/miniconda.html), choose 64-bit Python 3.7 or higher.
- **Note**: if you already have conda installed, you can keep using it but it should be version 4.4.10 or later (as shown by: conda -V). If you have a previous version installed, you can update it using the command: conda update conda.
There's no need to install mini-conda specifically.
### 2. Downloading the sample notebooks
- Download the sample notebooks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) as zip and extract the contents to a local directory. The automated ML sample notebooks are in the "automated-machine-learning" folder.
### 3. Setup a new conda environment
The **automl_setup** script creates a new conda environment, installs the necessary packages, configures the widget and starts a jupyter notebook. It takes the conda environment name as an optional parameter. The default conda environment name is azure_automl. The exact command depends on the operating system. See the specific sections below for Windows, Mac and Linux. It can take about 10 minutes to execute.
Packages installed by the **automl_setup** script:
<ul><li>python</li><li>nb_conda</li><li>matplotlib</li><li>numpy</li><li>cython</li><li>urllib3</li><li>pandas</li><li>azureml-sdk</li><li>azureml-widgets</li><li>pandas-ml</li></ul>
For more details refer to the [automl_env.yml](./automl_env.yml)
## Windows
Start an **Anaconda Prompt** window, cd to the **how-to-use-azureml/automated-machine-learning/experimental** folder where the sample notebooks were extracted and then run:
```
automl_setup
```
## Mac
Install "Command line developer tools" if it is not already installed (you can use the command: `xcode-select --install`).
Start a Terminal windows, cd to the **how-to-use-azureml/automated-machine-learning/experimental** folder where the sample notebooks were extracted and then run:
```
bash automl_setup_mac.sh
```
## Linux
cd to the **how-to-use-azureml/automated-machine-learning/experimental** folder where the sample notebooks were extracted and then run:
```
bash automl_setup_linux.sh
```
### 4. Running configuration.ipynb
- Before running any samples you next need to run the configuration notebook. Click on [configuration](../../configuration.ipynb) notebook
- Execute the cells in the notebook to Register Machine Learning Services Resource Provider and create a workspace. (*instructions in notebook*)
### 5. Running Samples
- Please make sure you use the Python [conda env:azure_automl] kernel when trying the sample Notebooks.
- Follow the instructions in the individual notebooks to explore various features in automated ML.
### 6. Starting jupyter notebook manually
To start your Jupyter notebook manually, use:
```
conda activate azure_automl
jupyter notebook
```
or on Mac or Linux:
```
source activate azure_automl
jupyter notebook
```
<a name="samples"></a>
# Automated ML SDK Sample Notebooks
- [auto-ml-regression.ipynb](regression/auto-ml-regression.ipynb)
- Dataset: Hardware Performance Dataset
- Simple example of using automated ML for regression
- Uses azure compute for training
- Uses ModelProxy for submitting prediction to training environment on azure compute
<a name="documentation"></a>
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn how more about the the settings and features available for automated machine learning experiments.
<a name="pythoncommand"></a>
# Running using python command
Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
You can then run this file using the python command.
However, on Windows the file needs to be modified before it can be run.
The following condition must be added to the main code in the file:
if __name__ == "__main__":
The main code of the file must be indented so that it is under this condition.

View File

@@ -0,0 +1,20 @@
name: azure_automl_experimental
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- python>=3.5.2,<3.8
- nb_conda
- matplotlib==2.1.0
- numpy~=1.18.0
- cython
- urllib3<1.24
- scikit-learn==0.22.1
- pandas==0.25.1
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-defaults
- azureml-sdk
- azureml-widgets
- azureml-explain-model

View File

@@ -0,0 +1,21 @@
name: azure_automl_experimental
dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- nomkl
- python>=3.5.2,<3.8
- nb_conda
- matplotlib==2.1.0
- numpy~=1.18.0
- cython
- urllib3<1.24
- scikit-learn==0.22.1
- pandas==0.25.1
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-defaults
- azureml-sdk
- azureml-widgets
- azureml-explain-model

View File

@@ -0,0 +1,63 @@
@echo off
set conda_env_name=%1
set automl_env_file=%2
set options=%3
set PIP_NO_WARN_SCRIPT_LOCATION=0
IF "%conda_env_name%"=="" SET conda_env_name="azure_automl_experimental"
IF "%automl_env_file%"=="" SET automl_env_file="automl_env.yml"
IF NOT EXIST %automl_env_file% GOTO YmlMissing
IF "%CONDA_EXE%"=="" GOTO CondaMissing
call conda activate %conda_env_name% 2>nul:
if not errorlevel 1 (
echo Upgrading existing conda environment %conda_env_name%
call pip uninstall azureml-train-automl -y -q
call conda env update --name %conda_env_name% --file %automl_env_file%
if errorlevel 1 goto ErrorExit
) else (
call conda env create -f %automl_env_file% -n %conda_env_name%
)
call conda activate %conda_env_name% 2>nul:
if errorlevel 1 goto ErrorExit
call python -m ipykernel install --user --name %conda_env_name% --display-name "Python (%conda_env_name%)"
REM azureml.widgets is now installed as part of the pip install under the conda env.
REM Removing the old user install so that the notebooks will use the latest widget.
call jupyter nbextension uninstall --user --py azureml.widgets
echo.
echo.
echo ***************************************
echo * AutoML setup completed successfully *
echo ***************************************
IF NOT "%options%"=="nolaunch" (
echo.
echo Starting jupyter notebook - please run the configuration notebook
echo.
jupyter notebook --log-level=50 --notebook-dir='..\..'
)
goto End
:CondaMissing
echo Please run this script from an Anaconda Prompt window.
echo You can start an Anaconda Prompt window by
echo typing Anaconda Prompt on the Start menu.
echo If you don't see the Anaconda Prompt app, install Miniconda.
echo If you are running an older version of Miniconda or Anaconda,
echo you can upgrade using the command: conda update conda
goto End
:YmlMissing
echo File %automl_env_file% not found.
:ErrorExit
echo Install failed
:End

View File

@@ -0,0 +1,53 @@
#!/bin/bash
CONDA_ENV_NAME=$1
AUTOML_ENV_FILE=$2
OPTIONS=$3
PIP_NO_WARN_SCRIPT_LOCATION=0
if [ "$CONDA_ENV_NAME" == "" ]
then
CONDA_ENV_NAME="azure_automl_experimental"
fi
if [ "$AUTOML_ENV_FILE" == "" ]
then
AUTOML_ENV_FILE="automl_env.yml"
fi
if [ ! -f $AUTOML_ENV_FILE ]; then
echo "File $AUTOML_ENV_FILE not found"
exit 1
fi
if source activate $CONDA_ENV_NAME 2> /dev/null
then
echo "Upgrading existing conda environment" $CONDA_ENV_NAME
pip uninstall azureml-train-automl -y -q
conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
jupyter nbextension uninstall --user --py azureml.widgets
else
conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
source activate $CONDA_ENV_NAME &&
python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
jupyter nbextension uninstall --user --py azureml.widgets &&
echo "" &&
echo "" &&
echo "***************************************" &&
echo "* AutoML setup completed successfully *" &&
echo "***************************************" &&
if [ "$OPTIONS" != "nolaunch" ]
then
echo "" &&
echo "Starting jupyter notebook - please run the configuration notebook" &&
echo "" &&
jupyter notebook --log-level=50 --notebook-dir '../..'
fi
fi
if [ $? -gt 0 ]
then
echo "Installation failed"
fi

View File

@@ -0,0 +1,55 @@
#!/bin/bash
CONDA_ENV_NAME=$1
AUTOML_ENV_FILE=$2
OPTIONS=$3
PIP_NO_WARN_SCRIPT_LOCATION=0
if [ "$CONDA_ENV_NAME" == "" ]
then
CONDA_ENV_NAME="azure_automl_experimental"
fi
if [ "$AUTOML_ENV_FILE" == "" ]
then
AUTOML_ENV_FILE="automl_env.yml"
fi
if [ ! -f $AUTOML_ENV_FILE ]; then
echo "File $AUTOML_ENV_FILE not found"
exit 1
fi
if source activate $CONDA_ENV_NAME 2> /dev/null
then
echo "Upgrading existing conda environment" $CONDA_ENV_NAME
pip uninstall azureml-train-automl -y -q
conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
jupyter nbextension uninstall --user --py azureml.widgets
else
conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
source activate $CONDA_ENV_NAME &&
conda install lightgbm -c conda-forge -y &&
python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
jupyter nbextension uninstall --user --py azureml.widgets &&
echo "" &&
echo "" &&
echo "***************************************" &&
echo "* AutoML setup completed successfully *" &&
echo "***************************************" &&
if [ "$OPTIONS" != "nolaunch" ]
then
echo "" &&
echo "Starting jupyter notebook - please run the configuration notebook" &&
echo "" &&
jupyter notebook --log-level=50 --notebook-dir '../..'
fi
fi
if [ $? -gt 0 ]
then
echo "Installation failed"
fi

View File

@@ -0,0 +1,481 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/experimental/regression/auto-ml-regression.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Regression with Aml Compute**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n",
"\n",
"If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Configure AutoML using `AutoMLConfig`.\n",
"3. Train the model using remote compute.\n",
"4. Explore the results.\n",
"5. Test the best fitted model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
" \n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# Choose a name for the experiment.\n",
"experiment_name = 'automl-regression-model-proxy'\n",
"\n",
"experiment = Experiment(ws, experiment_name)\n",
"\n",
"output = {}\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Run History Name'] = experiment_name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Using AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you use `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"reg-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" max_nodes=4)\n",
" compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"compute_target.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load Data\n",
"Load the hardware dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we'll split the data using random_split and extract the training data for the model. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"\n",
"# Split the dataset into train and test datasets\n",
"train_data, test_data = dataset.random_split(percentage=0.8, seed=223)\n",
"\n",
"label = \"ERP\"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification, regression or forecasting|\n",
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**training_data**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**label_column_name**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"\n",
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"automlconfig-remarks-sample"
]
},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": 'r2_score',\n",
" \"enable_early_stopping\": True, \n",
" \"experiment_timeout_hours\": 0.3, #for real scenarios we reccommend a timeout of at least one hour \n",
" \"max_concurrent_iterations\": 4,\n",
" \"max_cores_per_iteration\": -1,\n",
" \"verbosity\": logging.INFO,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(task = 'regression',\n",
" compute_target = compute_target,\n",
" training_data = train_data,\n",
" label_column_name = label,\n",
" **automl_settings\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. Execution of remote runs is asynchronous. Depending on the data and the number of iterations this can run for a while. Validation errors and current status will be shown when setting `show_output=True` and the execution will be synchronous."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output = False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you need to retrieve a run that already started, use the following code\n",
"#from azureml.train.automl.run import AutoMLRun\n",
"#remote_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show() "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run.wait_for_completion()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Child Run\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_best_child` method returns the best run. Overloads on `get_best_child` allow you to retrieve the best run for *any* logged metric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run = remote_run.get_best_child()\n",
"print(best_run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Child Run Based on Any Other Metric\n",
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"root_mean_squared_error\"\n",
"best_run = remote_run.get_best_child(metric = lookup_metric)\n",
"print(best_run)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# preview the first 3 rows of the dataset\n",
"\n",
"test_data = test_data.to_pandas_dataframe()\n",
"y_test = test_data['ERP'].fillna(0)\n",
"test_data = test_data.drop('ERP', 1)\n",
"test_data = test_data.fillna(0)\n",
"\n",
"\n",
"train_data = train_data.to_pandas_dataframe()\n",
"y_train = train_data['ERP'].fillna(0)\n",
"train_data = train_data.drop('ERP', 1)\n",
"train_data = train_data.fillna(0)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Creating ModelProxy for submitting prediction runs to the training environment.\n",
"We will create a ModelProxy for the best child run, which will allow us to submit a run that does the prediction in the training environment. Unlike the local client, which can have different versions of some libraries, the training environment will have all the compatible libraries for the model already."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl.model_proxy import ModelProxy\n",
"best_model_proxy = ModelProxy(best_run)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_pred_train = best_model_proxy.predict(train_data).to_pandas_dataframe()\n",
"y_residual_train = y_train - y_pred_train\n",
"\n",
"y_pred_test = best_model_proxy.predict(test_data).to_pandas_dataframe()\n",
"y_residual_test = y_test - y_pred_test"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from sklearn.metrics import mean_squared_error, r2_score\n",
"\n",
"# Set up a multi-plot chart.\n",
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
"f.set_figheight(6)\n",
"f.set_figwidth(16)\n",
"\n",
"# Plot residual values of training set.\n",
"a0.axis([0, 360, -100, 100])\n",
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)),fontsize = 12)\n",
"a0.set_xlabel('Training samples', fontsize = 12)\n",
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
"\n",
"# Plot residual values of test set.\n",
"a1.axis([0, 90, -100, 100])\n",
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)),fontsize = 12)\n",
"a1.set_xlabel('Test samples', fontsize = 12)\n",
"a1.set_yticklabels([])\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"test_pred = plt.scatter(y_test, y_pred_test, color='')\n",
"test_test = plt.scatter(y_test, y_test, color='g')\n",
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "rakellam"
}
],
"categories": [
"how-to-use-azureml",
"automated-machine-learning"
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,4 @@
name: auto-ml-regression-model-proxy
dependencies:
- pip:
- azureml-sdk

View File

@@ -114,7 +114,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -217,7 +217,7 @@
"\n",
"**Time column** is the time axis along which to predict.\n",
"\n",
"**Grain** is another word for an individual time series in your dataset. Grains are identified by values of the columns listed `grain_column_names`, for example \"store\" and \"item\" if your data has multiple time series of sales, one series for each combination of store and item sold.\n",
"**Time series identifier columns** are identified by values of the columns listed `time_series_id_column_names`, for example \"store\" and \"item\" if your data has multiple time series of sales, one series for each combination of store and item sold.\n",
"\n",
"This dataset has only one time series. Please see the [orange juice notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales) for an example of a multi-time series dataset."
]
@@ -269,7 +269,7 @@
"source": [
"target_column_name = 'BeerProduction'\n",
"time_column_name = 'DATE'\n",
"grain_column_names = []\n",
"time_series_id_column_names = []\n",
"freq = 'M' #Monthly data"
]
},
@@ -329,7 +329,7 @@
},
"outputs": [],
"source": [
"max_horizon = 12"
"forecast_horizon = 12"
]
},
{
@@ -364,11 +364,10 @@
},
"outputs": [],
"source": [
"automl_settings = {\n",
" 'time_column_name': time_column_name,\n",
" 'max_horizon': max_horizon,\n",
" 'enable_dnn' : True,\n",
"}\n",
"from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
"forecasting_parameters = ForecastingParameters(\n",
" time_column_name=time_column_name, forecast_horizon=forecast_horizon\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n",
@@ -380,7 +379,8 @@
" compute_target=compute_target,\n",
" max_concurrent_iterations=4,\n",
" max_cores_per_iteration=-1,\n",
" **automl_settings)"
" enable_dnn=True,\n",
" forecasting_parameters=forecasting_parameters)"
]
},
{
@@ -581,7 +581,7 @@
"source": [
"from helper import run_inference\n",
"\n",
"test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run, test_dataset, valid_dataset, max_horizon,\n",
"test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run, test_dataset, valid_dataset, forecast_horizon,\n",
" target_column_name, time_column_name, freq)"
]
},
@@ -603,7 +603,7 @@
"from helper import run_multiple_inferences\n",
"\n",
"summary_df = run_multiple_inferences(summary_df, experiment, test_experiment, compute_target, script_folder, test_dataset, \n",
" valid_dataset, max_horizon, target_column_name, time_column_name, freq)"
" valid_dataset, forecast_horizon, target_column_name, time_column_name, freq)"
]
},
{

View File

@@ -87,7 +87,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -238,6 +238,22 @@
"test.to_pandas_dataframe().head(5).reset_index(drop=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting Parameters\n",
"To define forecasting parameters for your experiment training, you can leverage the ForecastingParameters class. The table below details the forecasting parameter we will be passing into our experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**time_column_name**|The name of your time column.|\n",
"|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n",
"|**country_or_region_for_holidays**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (i.e. 'US', 'GB').|\n",
"|**target_lags**|The target_lags specifies how far back we will construct the lags of the target variable.|\n",
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -257,11 +273,7 @@
"|**compute_target**|The remote compute for training.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n",
"|**time_column_name**|Name of the datetime column in the input data|\n",
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|\n",
"|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (i.e. 'US', 'GB').|\n",
"|**target_lags**|The target_lags specifies how far back we will construct the lags of the target variable.|\n",
"|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
"|**forecasting_parameters**|A class that holds all the forecasting related parameters.|\n",
"\n",
"This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blocked_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
]
@@ -281,7 +293,7 @@
"metadata": {},
"outputs": [],
"source": [
"max_horizon = 14"
"forecast_horizon = 14"
]
},
{
@@ -297,13 +309,14 @@
"metadata": {},
"outputs": [],
"source": [
"time_series_settings = {\n",
" 'time_column_name': time_column_name,\n",
" 'max_horizon': max_horizon, \n",
" 'country_or_region': 'US', # set country_or_region will trigger holiday featurizer\n",
" 'target_lags': 'auto', # use heuristic based lag setting \n",
" 'drop_column_names': ['casual', 'registered'] # these columns are a breakdown of the total and therefore a leak\n",
"}\n",
"from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
"forecasting_parameters = ForecastingParameters(\n",
" time_column_name=time_column_name,\n",
" forecast_horizon=forecast_horizon,\n",
" country_or_region_for_holidays='US', # set country_or_region will trigger holiday featurizer\n",
" target_lags='auto', # use heuristic based lag setting \n",
" drop_column_names=['casual', 'registered'] # these columns are a breakdown of the total and therefore a leak\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n",
@@ -317,7 +330,7 @@
" max_concurrent_iterations=4,\n",
" max_cores_per_iteration=-1,\n",
" verbosity=logging.INFO,\n",
" **time_series_settings)"
" forecasting_parameters=forecasting_parameters)"
]
},
{
@@ -422,7 +435,7 @@
"source": [
"We now use the best fitted model from the AutoML Run to make forecasts for the test set. We will do batch scoring on the test dataset which should have the same schema as training dataset.\n",
"\n",
"The scoring will run on a remote compute. In this example, it will reuse the training compute.|"
"The scoring will run on a remote compute. In this example, it will reuse the training compute."
]
},
{
@@ -439,7 +452,7 @@
"metadata": {},
"source": [
"### Retrieving forecasts from the model\n",
"To run the forecast on the remote compute we will use two helper scripts: forecasting_script and forecasting_helper. These scripts contain the utility methods which will be used by the remote estimator. We copy these scripts to the project folder to upload them to remote compute."
"To run the forecast on the remote compute we will use a helper script: forecasting_script. This script contains the utility methods which will be used by the remote estimator. We copy the script to the project folder to upload it to remote compute."
]
},
{
@@ -453,15 +466,14 @@
"\n",
"script_folder = os.path.join(os.getcwd(), 'forecast')\n",
"os.makedirs(script_folder, exist_ok=True)\n",
"shutil.copy('forecasting_script.py', script_folder)\n",
"shutil.copy('forecasting_helper.py', script_folder)"
"shutil.copy('forecasting_script.py', script_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For brevity we have created the function called run_forecast. It submits the test data to the best model and run the estimation on the selected compute target. Validation errors and current status will be shown when setting `show_output=True` and the execution will be synchronous."
"For brevity, we have created a function called run_forecast that submits the test data to the best model determined during the training run and retrieves forecasts. The test set is longer than the forecast horizon specified at train time, so the forecasting script uses a so-called rolling evaluation to generate predictions over the whole test set. A rolling evaluation iterates the forecaster over the test set, using the actuals in the test set to make lag features as needed. "
]
},
{
@@ -472,8 +484,7 @@
"source": [
"from run_forecast import run_rolling_forecast\n",
"\n",
"remote_run = run_rolling_forecast(test_experiment, compute_target, best_run, test, max_horizon,\n",
" target_column_name, time_column_name)\n",
"remote_run = run_rolling_forecast(test_experiment, compute_target, best_run, test, target_column_name)\n",
"remote_run"
]
},
@@ -537,7 +548,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The MAPE seems high; it is being skewed by an actual with a small absolute value. For a more informative evaluation, we can calculate the metrics by forecast horizon:"
"Since we did a rolling evaluation on the test set, we can analyze the predictions by their forecast horizon relative to the rolling origin. The model was initially trained at a forecast horizon of 14, so each prediction from the model is associated with a horizon value from 1 to 14. The horizon values are in a column named, \"horizon_origin,\" in the prediction set. For example, we can calculate some of the error metrics grouped by the horizon:"
]
},
{
@@ -557,7 +568,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It's also interesting to see the distributions of APE (absolute percentage error) by horizon. On a log scale, the outlying APE in the horizon-3 group is clear."
"To drill down more, we can look at the distributions of APE (absolute percentage error) by horizon. From the chart, it is clear that the overall MAPE is being skewed by one particular point where the actual value is of small absolute value."
]
},
{
@@ -567,7 +578,7 @@
"outputs": [],
"source": [
"df_all_APE = df_all.assign(APE=APE(df_all[target_column_name], df_all['predicted']))\n",
"APEs = [df_all_APE[df_all['horizon_origin'] == h].APE.values for h in range(1, max_horizon + 1)]\n",
"APEs = [df_all_APE[df_all['horizon_origin'] == h].APE.values for h in range(1, forecast_horizon + 1)]\n",
"\n",
"%matplotlib inline\n",
"plt.boxplot(APEs)\n",
@@ -631,5 +642,5 @@
"version": 3
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -1,99 +0,0 @@
import pandas as pd
import numpy as np
from pandas.tseries.frequencies import to_offset
def align_outputs(y_predicted, X_trans, X_test, y_test, target_column_name,
predicted_column_name='predicted',
horizon_colname='horizon_origin'):
"""
Demonstrates how to get the output aligned to the inputs
using pandas indexes. Helps understand what happened if
the output's shape differs from the input shape, or if
the data got re-sorted by time and grain during forecasting.
Typical causes of misalignment are:
* we predicted some periods that were missing in actuals -> drop from eval
* model was asked to predict past max_horizon -> increase max horizon
* data at start of X_test was needed for lags -> provide previous periods
"""
if (horizon_colname in X_trans):
df_fcst = pd.DataFrame({predicted_column_name: y_predicted,
horizon_colname: X_trans[horizon_colname]})
else:
df_fcst = pd.DataFrame({predicted_column_name: y_predicted})
# y and X outputs are aligned by forecast() function contract
df_fcst.index = X_trans.index
# align original X_test to y_test
X_test_full = X_test.copy()
X_test_full[target_column_name] = y_test
# X_test_full's index does not include origin, so reset for merge
df_fcst.reset_index(inplace=True)
X_test_full = X_test_full.reset_index().drop(columns='index')
together = df_fcst.merge(X_test_full, how='right')
# drop rows where prediction or actuals are nan
# happens because of missing actuals
# or at edges of time due to lags/rolling windows
clean = together[together[[target_column_name,
predicted_column_name]].notnull().all(axis=1)]
return(clean)
def do_rolling_forecast(fitted_model, X_test, y_test, target_column_name,
time_column_name, max_horizon, freq='D'):
"""
Produce forecasts on a rolling origin over the given test set.
Each iteration makes a forecast for the next 'max_horizon' periods
with respect to the current origin, then advances the origin by the
horizon time duration. The prediction context for each forecast is set so
that the forecaster uses the actual target values prior to the current
origin time for constructing lag features.
This function returns a concatenated DataFrame of rolling forecasts.
"""
df_list = []
origin_time = X_test[time_column_name].min()
while origin_time <= X_test[time_column_name].max():
# Set the horizon time - end date of the forecast
horizon_time = origin_time + max_horizon * to_offset(freq)
# Extract test data from an expanding window up-to the horizon
expand_wind = (X_test[time_column_name] < horizon_time)
X_test_expand = X_test[expand_wind]
y_query_expand = np.zeros(len(X_test_expand)).astype(np.float)
y_query_expand.fill(np.NaN)
if origin_time != X_test[time_column_name].min():
# Set the context by including actuals up-to the origin time
test_context_expand_wind = (X_test[time_column_name] < origin_time)
context_expand_wind = (
X_test_expand[time_column_name] < origin_time)
y_query_expand[context_expand_wind] = y_test[
test_context_expand_wind]
# Make a forecast out to the maximum horizon
y_fcst, X_trans = fitted_model.forecast(X_test_expand, y_query_expand)
# Align forecast with test set for dates within the
# current rolling window
trans_tindex = X_trans.index.get_level_values(time_column_name)
trans_roll_wind = (trans_tindex >= origin_time) & (
trans_tindex < horizon_time)
test_roll_wind = expand_wind & (
X_test[time_column_name] >= origin_time)
df_list.append(align_outputs(y_fcst[trans_roll_wind],
X_trans[trans_roll_wind],
X_test[test_roll_wind],
y_test[test_roll_wind],
target_column_name))
# Advance the origin time
origin_time = horizon_time
return pd.concat(df_list, ignore_index=True)

View File

@@ -1,37 +1,21 @@
import argparse
import azureml.train.automl
from azureml.automl.runtime.shared import forecasting_models
from azureml.core import Run
from sklearn.externals import joblib
import forecasting_helper
parser = argparse.ArgumentParser()
parser.add_argument(
'--max_horizon', type=int, dest='max_horizon',
default=10, help='Max Horizon for forecasting')
parser.add_argument(
'--target_column_name', type=str, dest='target_column_name',
help='Target Column Name')
parser.add_argument(
'--time_column_name', type=str, dest='time_column_name',
help='Time Column Name')
parser.add_argument(
'--frequency', type=str, dest='freq',
help='Frequency of prediction')
args = parser.parse_args()
max_horizon = args.max_horizon
target_column_name = args.target_column_name
time_column_name = args.time_column_name
freq = args.freq
run = Run.get_context()
# get input dataset by name
test_dataset = run.input_datasets['test_data']
grain_column_names = []
df = test_dataset.to_pandas_dataframe().reset_index(drop=True)
X_test_df = test_dataset.drop_columns(columns=[target_column_name]).to_pandas_dataframe().reset_index(drop=True)
@@ -39,14 +23,12 @@ y_test_df = test_dataset.with_timestamp_columns(None).keep_columns(columns=[targ
fitted_model = joblib.load('model.pkl')
df_all = forecasting_helper.do_rolling_forecast(
fitted_model,
X_test_df,
y_test_df.values.T[0],
target_column_name,
time_column_name,
max_horizon,
freq)
y_pred, X_trans = fitted_model.rolling_evaluation(X_test_df, y_test_df.values)
# Add predictions, actuals, and horizon relative to rolling origin to the test feature data
assign_dict = {'horizon_origin': X_trans['horizon_origin'].values, 'predicted': y_pred,
target_column_name: y_test_df[target_column_name].values}
df_all = X_test_df.assign(**assign_dict)
file_name = 'outputs/predictions.csv'
export_csv = df_all.to_csv(file_name, header=True)

View File

@@ -5,8 +5,7 @@ from azureml.core.run import Run
def run_rolling_forecast(test_experiment, compute_target, train_run, test_dataset,
max_horizon, target_column_name, time_column_name,
freq='D', inference_folder='./forecast'):
target_column_name, inference_folder='./forecast'):
condafile = inference_folder + '/condafile.yml'
train_run.download_file('outputs/model.pkl',
inference_folder + '/model.pkl')
@@ -20,10 +19,7 @@ def run_rolling_forecast(test_experiment, compute_target, train_run, test_datase
est = Estimator(source_directory=inference_folder,
entry_script='forecasting_script.py',
script_params={
'--max_horizon': max_horizon,
'--target_column_name': target_column_name,
'--time_column_name': time_column_name,
'--frequency': freq
'--target_column_name': target_column_name
},
inputs=[test_dataset.as_named_input('test_data')],
compute_target=compute_target,

View File

@@ -97,7 +97,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -288,7 +288,20 @@
"metadata": {},
"outputs": [],
"source": [
"max_horizon = 48"
"forecast_horizon = 48"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting Parameters\n",
"To define forecasting parameters for your experiment training, you can leverage the ForecastingParameters class. The table below details the forecasting parameter we will be passing into our experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**time_column_name**|The name of your time column.|\n",
"|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|"
]
},
{
@@ -297,7 +310,7 @@
"source": [
"## Train\n",
"\n",
"Instantiate an AutoMLConfig object. This config defines the settings and data used to run the experiment. We can provide extra configurations within 'automl_settings', for this forecasting task we add the name of the time column and the maximum forecast horizon.\n",
"Instantiate an AutoMLConfig object. This config defines the settings and data used to run the experiment. We can provide extra configurations within 'automl_settings', for this forecasting task we add the forecasting parameters to hold all the additional forecasting parameters.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
@@ -310,8 +323,7 @@
"|**compute_target**|The remote compute for training.|\n",
"|**n_cross_validations**|Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way.|\n",
"|**enable_early_stopping**|Flag to enble early termination if the score is not improving in the short term.|\n",
"|**time_column_name**|The name of your time column.|\n",
"|**max_horizon**|The number of periods out you would like to predict past your training data. Periods are inferred from your data.|\n"
"|**forecasting_parameters**|A class holds all the forecasting related parameters.|\n"
]
},
{
@@ -327,10 +339,10 @@
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" 'time_column_name': time_column_name,\n",
" 'max_horizon': max_horizon,\n",
"}\n",
"from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
"forecasting_parameters = ForecastingParameters(\n",
" time_column_name=time_column_name, forecast_horizon=forecast_horizon\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n",
@@ -342,7 +354,7 @@
" enable_early_stopping=True,\n",
" n_cross_validations=3, \n",
" verbosity=logging.INFO,\n",
" **automl_settings)"
" forecasting_parameters=forecasting_parameters)"
]
},
{
@@ -550,7 +562,7 @@
"metadata": {},
"source": [
"## Advanced Training <a id=\"advanced_training\"></a>\n",
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
]
},
{
@@ -558,7 +570,7 @@
"metadata": {},
"source": [
"### Using lags and rolling window features\n",
"Now we will configure the target lags, that is the previous values of the target variables, meaning the prediction is no longer horizon-less. We therefore must still specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features.\n",
"Now we will configure the target lags, that is the previous values of the target variables, meaning the prediction is no longer horizon-less. We therefore must still specify the `forecast_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features.\n",
"\n",
"This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blocked_models list but you may need to increase the iteration_timeout_minutes parameter value to get results."
]
@@ -569,12 +581,10 @@
"metadata": {},
"outputs": [],
"source": [
"automl_advanced_settings = {\n",
" 'time_column_name': time_column_name,\n",
" 'max_horizon': max_horizon,\n",
" 'target_lags': 12,\n",
" 'target_rolling_window_size': 4,\n",
"}\n",
"advanced_forecasting_parameters = ForecastingParameters(\n",
" time_column_name=time_column_name, forecast_horizon=forecast_horizon,\n",
" target_lags=12, target_rolling_window_size=4\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting', \n",
" primary_metric='normalized_root_mean_squared_error',\n",
@@ -586,7 +596,7 @@
" enable_early_stopping = True,\n",
" n_cross_validations=3, \n",
" verbosity=logging.INFO,\n",
" **automl_advanced_settings)"
" forecasting_parameters=advanced_forecasting_parameters)"
]
},
{
@@ -635,7 +645,7 @@
"metadata": {},
"source": [
"## Advanced Results<a id=\"advanced_results\"></a>\n",
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
"We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
]
},
{

View File

@@ -94,7 +94,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -311,14 +311,15 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
"lags = [1,2,3]\n",
"forecast_horizon = n_test_periods\n",
"time_series_settings = { \n",
" 'time_column_name': TIME_COLUMN_NAME,\n",
" 'time_series_id_column_names': [ TIME_SERIES_ID_COLUMN_NAME ],\n",
" 'forecast_horizon': forecast_horizon ,\n",
" 'target_lags': lags\n",
"}"
"forecasting_parameters = ForecastingParameters(\n",
" time_column_name=TIME_COLUMN_NAME,\n",
" forecast_horizon=forecast_horizon,\n",
" time_series_id_column_names=[ TIME_SERIES_ID_COLUMN_NAME ],\n",
" target_lags=lags\n",
")"
]
},
{
@@ -351,7 +352,7 @@
" max_concurrent_iterations=4,\n",
" max_cores_per_iteration=-1,\n",
" label_column_name=target_label,\n",
" **time_series_settings)\n",
" forecasting_parameters=forecasting_parameters)\n",
"\n",
"remote_run = experiment.submit(automl_config, show_output=False)"
]

View File

@@ -82,7 +82,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -178,7 +178,7 @@
"source": [
"Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred. \n",
"\n",
"The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we thus define the **grain** - the columns whose values determine the boundaries between time-series: "
"The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we define the **time_series_id_column_names** - the columns whose values determine the boundaries between time-series: "
]
},
{
@@ -187,8 +187,8 @@
"metadata": {},
"outputs": [],
"source": [
"grain_column_names = ['Store', 'Brand']\n",
"nseries = data.groupby(grain_column_names).ngroups\n",
"time_series_id_column_names = ['Store', 'Brand']\n",
"nseries = data.groupby(time_series_id_column_names).ngroups\n",
"print('Data contains {0} individual time-series.'.format(nseries))"
]
},
@@ -207,7 +207,7 @@
"source": [
"use_stores = [2, 5, 8]\n",
"data_subset = data[data.Store.isin(use_stores)]\n",
"nseries = data_subset.groupby(grain_column_names).ngroups\n",
"nseries = data_subset.groupby(time_series_id_column_names).ngroups\n",
"print('Data subset contains {0} individual time-series.'.format(nseries))"
]
},
@@ -216,7 +216,7 @@
"metadata": {},
"source": [
"### Data Splitting\n",
"We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the grain columns."
"We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the time series identifier columns."
]
},
{
@@ -227,15 +227,15 @@
"source": [
"n_test_periods = 20\n",
"\n",
"def split_last_n_by_grain(df, n):\n",
" \"\"\"Group df by grain and split on last n rows for each group.\"\"\"\n",
"def split_last_n_by_series_id(df, n):\n",
" \"\"\"Group df by series identifiers and split on last n rows for each group.\"\"\"\n",
" df_grouped = (df.sort_values(time_column_name) # Sort by ascending time\n",
" .groupby(grain_column_names, group_keys=False))\n",
" .groupby(time_series_id_column_names, group_keys=False))\n",
" df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n",
" df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n",
" return df_head, df_tail\n",
"\n",
"train, test = split_last_n_by_grain(data_subset, n_test_periods)"
"train, test = split_last_n_by_series_id(data_subset, n_test_periods)"
]
},
{
@@ -301,11 +301,11 @@
"For forecasting tasks, AutoML uses pre-processing and estimation steps that are specific to time-series. AutoML will undertake the following pre-processing steps:\n",
"* Detect time-series sample frequency (e.g. hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span \n",
"* Impute missing values in the target (via forward-fill) and feature columns (using median column values) \n",
"* Create grain-based features to enable fixed effects across different series\n",
"* Create features based on time series identifiers to enable fixed effects across different series\n",
"* Create time-based features to assist in learning seasonal patterns\n",
"* Encode categorical variables to numeric quantities\n",
"\n",
"In this notebook, AutoML will train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series. If you're looking for training multiple models for different time-series, please check out the forecasting grouping notebook. \n",
"In this notebook, AutoML will train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series. If you're looking for training multiple models for different time-series, please see the many-models notebook.\n",
"\n",
"You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
]
@@ -327,7 +327,7 @@
"\n",
"The featurization customization in forecasting is an advanced feature in AutoML which allows our customers to change the default forecasting featurization behaviors and column types through `FeaturizationConfig`. The supported scenarios include,\n",
"1. Column purposes update: Override feature type for the specified column. Currently supports DateTime, Categorical and Numeric. This customization can be used in the scenario that the type of the column cannot correctly reflect its purpose. Some numerical columns, for instance, can be treated as Categorical columns which need to be converted to categorical while some can be treated as epoch timestamp which need to be converted to datetime. To tell our SDK to correctly preprocess these columns, a configuration need to be add with the columns and their desired types.\n",
"2. Transformer parameters update: Currently supports parameter change for Imputer only. User can customize imputation methods, the supported methods are constant for target data and mean, median, most frequent and constant for training data. This customization can be used for the scenario that our customers know which imputation methods fit best to the input data. For instance, some datasets use NaN to represent 0 which the correct behavior should impute all the missing value with 0. To achieve this behavior, these columns need to be configured as constant imputation with `fill_value` 0.\n",
"2. Transformer parameters update: Currently supports parameter change for Imputer only. User can customize imputation methods. The supported imputing methods for target column are constant and ffill (forward fill). The supported imputing methods for feature columns are mean, median, most frequent, constant and ffill (forward fill). This customization can be used for the scenario that our customers know which imputation methods fit best to the input data. For instance, some datasets use NaN to represent 0 which the correct behavior should impute all the missing value with 0. To achieve this behavior, these columns need to be configured as constant imputation with `fill_value` 0.\n",
"3. Drop columns: Columns to drop from being featurized. These usually are the columns which are leaky or the columns contain no useful data.\n",
"\n",
"This step requires an Enterprise workspace to gain access to this feature. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page.](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade)"
@@ -350,7 +350,24 @@
"# Fill missing values in the target column, Quantity, with zeros.\n",
"featurization_config.add_transformer_params('Imputer', ['Quantity'], {\"strategy\": \"constant\", \"fill_value\": 0})\n",
"# Fill missing values in the INCOME column with median value.\n",
"featurization_config.add_transformer_params('Imputer', ['INCOME'], {\"strategy\": \"median\"})"
"featurization_config.add_transformer_params('Imputer', ['INCOME'], {\"strategy\": \"median\"})\n",
"# Fill missing values in the Price column with forward fill (last value carried forward).\n",
"featurization_config.add_transformer_params('Imputer', ['Price'], {\"strategy\": \"ffill\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Forecasting Parameters\n",
"To define forecasting parameters for your experiment training, you can leverage the ForecastingParameters class. The table below details the forecasting parameter we will be passing into our experiment.\n",
"\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**time_column_name**|The name of your time column.|\n",
"|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n",
"|**time_series_id_column_names**|The column names used to uniquely identify the time series in data that has multiple rows with the same timestamp. If the time series identifiers are not defined, the data set is assumed to be one time series.|"
]
},
{
@@ -361,9 +378,9 @@
"\n",
"The [AutoMLConfig](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py) object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, the training data, and cross-validation parameters.\n",
"\n",
"For forecasting tasks, there are some additional parameters that can be set: the name of the column holding the date/time, the grain column names, and the maximum forecast horizon. A time column is required for forecasting, while the grain is optional. If grain columns are not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
"For forecasting tasks, there are some additional parameters that can be set in the `ForecastingParameters` class: the name of the column holding the date/time, the timeseries id column names, and the maximum forecast horizon. A time column is required for forecasting, while the time_series_id is optional. If time_series_id columns are not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
"\n",
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the maximum horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning application that estimates the next month of sales should set the horizon according to suitable planning time-scales. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
"The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the forecast horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning application that estimates the next month of sales should set the horizon according to suitable planning time-scales. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
"\n",
"We note here that AutoML can sweep over two types of time-series models:\n",
"* Models that are trained for each series such as ARIMA and Facebook's Prophet. Note that these models are only available for [Enterprise Edition Workspaces](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace#upgrade).\n",
@@ -389,11 +406,8 @@
"|**enable_voting_ensemble**|Allow AutoML to create a Voting ensemble of the best performing models|\n",
"|**enable_stack_ensemble**|Allow AutoML to create a Stack ensemble of the best performing models|\n",
"|**debug_log**|Log file path for writing debugging information|\n",
"|**time_column_name**|Name of the datetime column in the input data|\n",
"|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
"|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|\n",
"|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Setting this enables AutoML to perform featurization on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
"|**max_cores_per_iteration**|Maximum number of cores to utilize per iteration. A value of -1 indicates all available cores should be used.|"
"|**max_cores_per_iteration**|Maximum number of cores to utilize per iteration. A value of -1 indicates all available cores should be used"
]
},
{
@@ -402,11 +416,12 @@
"metadata": {},
"outputs": [],
"source": [
"time_series_settings = {\n",
" 'time_column_name': time_column_name,\n",
" 'grain_column_names': grain_column_names,\n",
" 'max_horizon': n_test_periods\n",
"}\n",
"from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
"forecasting_parameters = ForecastingParameters(\n",
" time_column_name=time_column_name,\n",
" forecast_horizon=n_test_periods,\n",
" time_series_id_column_names=time_series_id_column_names\n",
")\n",
"\n",
"automl_config = AutoMLConfig(task='forecasting',\n",
" debug_log='automl_oj_sales_errors.log',\n",
@@ -420,7 +435,7 @@
" n_cross_validations=3,\n",
" verbosity=logging.INFO,\n",
" max_cores_per_iteration=-1,\n",
" **time_series_settings)"
" forecasting_parameters=forecasting_parameters)"
]
},
{
@@ -537,9 +552,8 @@
"metadata": {},
"outputs": [],
"source": [
"# The featurized data, aligned to y, will also be returned.\n",
"# forecast returns the predictions and the featurized data, aligned to X_test.\n",
"# This contains the assumptions that were made in the forecast\n",
"# and helps align the forecast to the original data\n",
"y_predictions, X_trans = fitted_model.forecast(X_test)"
]
},
@@ -560,7 +574,7 @@
"\n",
"To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). \n",
"\n",
"It is a good practice to always align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
"We'll add predictions and actuals into a single dataframe for convenience in calculating the metrics."
]
},
{
@@ -569,9 +583,8 @@
"metadata": {},
"outputs": [],
"source": [
"from forecasting_helper import align_outputs\n",
"\n",
"df_all = align_outputs(y_predictions, X_trans, X_test, y_test, target_column_name)"
"assign_dict = {'predicted': y_predictions, target_column_name: y_test}\n",
"df_all = X_test.assign(**assign_dict)"
]
},
{
@@ -794,5 +807,5 @@
"task": "Forecasting"
},
"nbformat": 4,
"nbformat_minor": 2
"nbformat_minor": 4
}

View File

@@ -1,98 +0,0 @@
import pandas as pd
import numpy as np
from pandas.tseries.frequencies import to_offset
def align_outputs(y_predicted, X_trans, X_test, y_test, target_column_name,
predicted_column_name='predicted',
horizon_colname='horizon_origin'):
"""
Demonstrates how to get the output aligned to the inputs
using pandas indexes. Helps understand what happened if
the output's shape differs from the input shape, or if
the data got re-sorted by time and grain during forecasting.
Typical causes of misalignment are:
* we predicted some periods that were missing in actuals -> drop from eval
* model was asked to predict past max_horizon -> increase max horizon
* data at start of X_test was needed for lags -> provide previous periods
"""
if (horizon_colname in X_trans):
df_fcst = pd.DataFrame({predicted_column_name: y_predicted,
horizon_colname: X_trans[horizon_colname]})
else:
df_fcst = pd.DataFrame({predicted_column_name: y_predicted})
# y and X outputs are aligned by forecast() function contract
df_fcst.index = X_trans.index
# align original X_test to y_test
X_test_full = X_test.copy()
X_test_full[target_column_name] = y_test
# X_test_full's index does not include origin, so reset for merge
df_fcst.reset_index(inplace=True)
X_test_full = X_test_full.reset_index().drop(columns='index')
together = df_fcst.merge(X_test_full, how='right')
# drop rows where prediction or actuals are nan
# happens because of missing actuals
# or at edges of time due to lags/rolling windows
clean = together[together[[target_column_name,
predicted_column_name]].notnull().all(axis=1)]
return(clean)
def do_rolling_forecast(fitted_model, X_test, y_test, target_column_name, time_column_name, max_horizon, freq='D'):
"""
Produce forecasts on a rolling origin over the given test set.
Each iteration makes a forecast for the next 'max_horizon' periods
with respect to the current origin, then advances the origin by the
horizon time duration. The prediction context for each forecast is set so
that the forecaster uses the actual target values prior to the current
origin time for constructing lag features.
This function returns a concatenated DataFrame of rolling forecasts.
"""
df_list = []
origin_time = X_test[time_column_name].min()
while origin_time <= X_test[time_column_name].max():
# Set the horizon time - end date of the forecast
horizon_time = origin_time + max_horizon * to_offset(freq)
# Extract test data from an expanding window up-to the horizon
expand_wind = (X_test[time_column_name] < horizon_time)
X_test_expand = X_test[expand_wind]
y_query_expand = np.zeros(len(X_test_expand)).astype(np.float)
y_query_expand.fill(np.NaN)
if origin_time != X_test[time_column_name].min():
# Set the context by including actuals up-to the origin time
test_context_expand_wind = (X_test[time_column_name] < origin_time)
context_expand_wind = (
X_test_expand[time_column_name] < origin_time)
y_query_expand[context_expand_wind] = y_test[
test_context_expand_wind]
# Make a forecast out to the maximum horizon
y_fcst, X_trans = fitted_model.forecast(X_test_expand, y_query_expand)
# Align forecast with test set for dates within the
# current rolling window
trans_tindex = X_trans.index.get_level_values(time_column_name)
trans_roll_wind = (trans_tindex >= origin_time) & (
trans_tindex < horizon_time)
test_roll_wind = expand_wind & (
X_test[time_column_name] >= origin_time)
df_list.append(align_outputs(y_fcst[trans_roll_wind],
X_trans[trans_roll_wind],
X_test[test_roll_wind],
y_test[test_roll_wind],
target_column_name))
# Advance the origin time
origin_time = horizon_time
return pd.concat(df_list, ignore_index=True)

View File

@@ -1,22 +0,0 @@
import pandas as pd
import numpy as np
def APE(actual, pred):
"""
Calculate absolute percentage error.
Returns a vector of APE values with same length as actual/pred.
"""
return 100 * np.abs((actual - pred) / actual)
def MAPE(actual, pred):
"""
Calculate mean absolute percentage error.
Remove NA and values where actual is close to zero
"""
not_na = ~(np.isnan(actual) | np.isnan(pred))
not_zero = ~np.isclose(actual, 0.0)
actual_safe = actual[not_na & not_zero]
pred_safe = pred[not_na & not_zero]
return np.mean(APE(actual_safe, pred_safe))

View File

@@ -80,7 +80,7 @@
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig\n",
"from azureml.explain.model._internal.explanation_client import ExplanationClient"
"from azureml.interpret._internal.explanation_client import ExplanationClient"
]
},
{
@@ -96,7 +96,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -354,7 +354,7 @@
"metadata": {},
"source": [
"## Explanation\n",
"In this section, we will show how to compute model explanations and visualize the explanations using azureml-explain-model package. We will also show how to run the automl model and the explainer model through deploying an AKS web service.\n",
"In this section, we will show how to compute model explanations and visualize the explanations using azureml-interpret package. We will also show how to run the automl model and the explainer model through deploying an AKS web service.\n",
"\n",
"Besides retrieving an existing model explanation for an AutoML model, you can also explain your AutoML model with different test data. The following steps will allow you to compute and visualize engineered feature importance based on your test data.\n",
"\n",
@@ -434,7 +434,7 @@
"metadata": {},
"source": [
"#### Initialize the Mimic Explainer for feature importance\n",
"For explaining the AutoML models, use the MimicWrapper from azureml.explain.model package. The MimicWrapper can be initialized with fields in automl_explainer_setup_obj, your workspace and a surrogate model to explain the AutoML model (fitted_model here). The MimicWrapper also takes the automl_run object where engineered explanations will be uploaded."
"For explaining the AutoML models, use the MimicWrapper from azureml-interpret package. The MimicWrapper can be initialized with fields in automl_explainer_setup_obj, your workspace and a surrogate model to explain the AutoML model (fitted_model here). The MimicWrapper also takes the automl_run object where engineered explanations will be uploaded."
]
},
{
@@ -443,7 +443,8 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.mimic_wrapper import MimicWrapper\n",
"from interpret.ext.glassbox import LGBMExplainableModel\n",
"from azureml.interpret.mimic_wrapper import MimicWrapper\n",
"explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator,\n",
" explainable_model=automl_explainer_setup_obj.surrogate_model, \n",
" init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,\n",
@@ -486,7 +487,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer\n",
"from azureml.interpret.scoring.scoring_explainer import TreeScoringExplainer\n",
"import joblib\n",
"\n",
"# Initialize the ScoringExplainer\n",
@@ -507,7 +508,7 @@
"source": [
"### Deploying the scoring and explainer models to a web service to Azure Kubernetes Service (AKS)\n",
"\n",
"We use the TreeScoringExplainer from azureml.explain.model package to create the scoring explainer which will be used to compute the raw and engineered feature importances at the inference time. In the cell below, we register the AutoML model and the scoring explainer with the Model Management Service."
"We use the TreeScoringExplainer from azureml.interpret package to create the scoring explainer which will be used to compute the raw and engineered feature importances at the inference time. In the cell below, we register the AutoML model and the scoring explainer with the Model Management Service."
]
},
{
@@ -529,7 +530,7 @@
"source": [
"#### Create the conda dependencies for setting up the service\n",
"\n",
"We need to create the conda dependencies comprising of the azureml-explain-model, azureml-train-automl and azureml-defaults packages."
"We need to download the conda dependencies using the automl_run object."
]
},
{
@@ -561,16 +562,10 @@
"outputs": [],
"source": [
"%%writefile score.py\n",
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"import pickle\n",
"import azureml.train.automl\n",
"import azureml.explain.model\n",
"from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \\\n",
" automl_setup_model_explanations\n",
"import joblib\n",
"import pandas as pd\n",
"from azureml.core.model import Model\n",
"from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations\n",
"\n",
"\n",
"def init():\n",

View File

@@ -98,7 +98,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -625,7 +625,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.explain.model._internal.explanation_client import ExplanationClient\n",
"from azureml.interpret._internal.explanation_client import ExplanationClient\n",
"client = ExplanationClient.from_run(automl_run)\n",
"engineered_explanations = client.download_model_explanation(raw=False, comment='engineered explanations')\n",
"print(engineered_explanations.get_feature_importance_dict())\n",
@@ -659,7 +659,7 @@
"In this section we will show how you can operationalize an AutoML model and the explainer which was used to compute the explanations in the previous section.\n",
"\n",
"### Register the AutoML model and the scoring explainer\n",
"We use the *TreeScoringExplainer* from *azureml.explain.model* package to create the scoring explainer which will be used to compute the raw and engineered feature importances at the inference time. \n",
"We use the *TreeScoringExplainer* from *azureml-interpret* package to create the scoring explainer which will be used to compute the raw and engineered feature importances at the inference time. \n",
"In the cell below, we register the AutoML model and the scoring explainer with the Model Management Service."
]
},
@@ -681,7 +681,7 @@
"metadata": {},
"source": [
"### Create the conda dependencies for setting up the service\n",
"We need to create the conda dependencies comprising of the *azureml-explain-model*, *azureml-train-automl* and *azureml-defaults* packages. "
"We need to create the conda dependencies comprising of the *azureml* packages using the training environment from the *automl_run*."
]
},
{

View File

@@ -1,14 +1,7 @@
import json
import numpy as np
import pandas as pd
import os
import pickle
import azureml.train.automl
import azureml.explain.model
from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \
automl_setup_model_explanations
import joblib
from azureml.core.model import Model
from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations
def init():

View File

@@ -1,17 +1,18 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
import os
import joblib
from azureml.core.run import Run
from interpret.ext.glassbox import LGBMExplainableModel
from automl.client.core.common.constants import MODEL_PATH
from azureml.core.experiment import Experiment
from azureml.core.dataset import Dataset
from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \
automl_setup_model_explanations, automl_check_model_if_explainable
from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel
from azureml.explain.model.mimic_wrapper import MimicWrapper
from azureml.automl.core.shared.constants import MODEL_PATH
from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer
import joblib
from azureml.core.run import Run
from azureml.interpret.mimic_wrapper import MimicWrapper
from azureml.interpret.scoring.scoring_explainer import TreeScoringExplainer
from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations, \
automl_check_model_if_explainable
OUTPUT_DIR = './outputs/'
os.makedirs(OUTPUT_DIR, exist_ok=True)

View File

@@ -92,7 +92,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -0,0 +1,56 @@
# Adding an init script to an Azure Databricks cluster
The [azureml-cluster-init.sh](./azureml-cluster-init.sh) script configures the environment to
1. Install the latest AutoML library
To create the Azure Databricks cluster-scoped init script
1. Create the base directory you want to store the init script in if it does not exist.
```
dbutils.fs.mkdirs("dbfs:/databricks/init/")
```
2. Create the script azureml-cluster-init.sh
```
dbutils.fs.put("/databricks/init/azureml-cluster-init.sh","""
#!/bin/bash
set -ex
/databricks/python/bin/pip install -r https://aka.ms/automl_linux_requirements.txt
""", True)
```
3. Check that the script exists.
```
display(dbutils.fs.ls("dbfs:/databricks/init/azureml-cluster-init.sh"))
```
1. Configure the cluster to run the script.
* Using the cluster configuration page
1. On the cluster configuration page, click the Advanced Options toggle.
1. At the bottom of the page, click the Init Scripts tab.
1. In the Destination drop-down, select a destination type. Example: 'DBFS'
1. Specify a path to the init script.
```
dbfs:/databricks/init/azureml-cluster-init.sh
```
1. Click Add
* Using the API.
```
curl -n -X POST -H 'Content-Type: application/json' -d '{
"cluster_id": "<cluster_id>",
"num_workers": <num_workers>,
"spark_version": "<spark_version>",
"node_type_id": "<node_type_id>",
"cluster_log_conf": {
"dbfs" : {
"destination": "dbfs:/cluster-logs"
}
},
"init_scripts": [ {
"dbfs": {
"destination": "dbfs:/databricks/init/azureml-cluster-init.sh"
}
} ]
}' https://<databricks-instance>/api/2.0/clusters/edit
```

View File

@@ -13,12 +13,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We support installing AML SDK as library from GUI. When attaching a library follow this https://docs.databricks.com/user-guide/libraries.html and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
"## AutoML Installation\n",
"\n",
"**install azureml-sdk with Automated ML**\n",
"* Source: Upload Python Egg or PyPi\n",
"* PyPi Name: `azureml-sdk[automl]`\n",
"* Select Install Library"
"**For Databricks non ML runtime 7.1(scala 2.21, spark 3.0.0) and up, Install AML sdk by running the following command in the first cell of the notebook.**\n",
"\n",
"%pip install -r https://aka.ms/automl_linux_requirements.txt\n",
"\n",
"**For Databricks non ML runtime 7.0 and lower, Install AML sdk using init script as shown in [readme](readme.md) before running this notebook.**\n"
]
},
{

View File

@@ -13,12 +13,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We support installing AML SDK as library from GUI. When attaching a library follow this https://docs.databricks.com/user-guide/libraries.html and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
"## AutoML Installation\n",
"\n",
"**install azureml-sdk with Automated ML**\n",
"* Source: Upload Python Egg or PyPi\n",
"* PyPi Name: `azureml-sdk[automl]`\n",
"* Select Install Library"
"**For Databricks non ML runtime 7.1(scala 2.21, spark 3.0.0) and up, Install AML sdk by running the following command in the first cell of the notebook.**\n",
"\n",
"%pip install -r https://aka.ms/automl_linux_requirements.txt\n",
"\n",
"**For Databricks non ML runtime 7.0 and lower, Install AML sdk using init script as shown in [readme](readme.md) before running this notebook.**"
]
},
{

View File

@@ -334,14 +334,27 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your AKS cluster\n",
"aks_name = 'my-aks-9' \n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" aks_target = ComputeTarget(workspace=ws, name=aks_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" # Use the default configuration (can also provide parameters to customize)\n",
" prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'my-aks-9' \n",
" # Create the cluster\n",
" aks_target = ComputeTarget.create(workspace = ws, \n",
" name = aks_name, \n",
" provisioning_configuration = prov_config)"
" provisioning_configuration = prov_config)\n",
"\n",
"if aks_target.get_status() != \"Succeeded\":\n",
" aks_target.wait_for_completion(show_output=True)"
]
},
{

View File

@@ -287,8 +287,8 @@
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=azureml_pip_packages)\n",
"azureml_pip_packages.extend([sklearn_dep, pandas_dep])\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(pip_packages=azureml_pip_packages)\n",
"\n",
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",
@@ -427,8 +427,8 @@
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=azureml_pip_packages)\n",
"azureml_pip_packages.extend([sklearn_dep, pandas_dep])\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(pip_packages=azureml_pip_packages)\n",
"\n",
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",

View File

@@ -350,8 +350,7 @@
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"myenv = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n",
"myenv = CondaDependencies.create(pip_packages=['sklearn-pandas', 'pyyaml', sklearn_dep, pandas_dep] + azureml_pip_packages,\n",
" pin_sdk_version=False)\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",

View File

@@ -294,8 +294,8 @@
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=['sklearn_pandas', 'pyyaml'] + azureml_pip_packages,\n",
"azureml_pip_packages.extend(['sklearn-pandas', 'pyyaml', sklearn_dep, pandas_dep])\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(pip_packages=azureml_pip_packages,\n",
" pin_sdk_version=False)\n",
"# Now submit a run on AmlCompute\n",
"from azureml.core.script_run_config import ScriptRunConfig\n",
@@ -459,8 +459,8 @@
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"myenv = CondaDependencies.create(conda_packages=[sklearn_dep, pandas_dep],\n",
" pip_packages=['sklearn-pandas', 'pyyaml'] + azureml_pip_packages,\n",
"azureml_pip_packages.extend(['sklearn-pandas', 'pyyaml', sklearn_dep, pandas_dep])\n",
"myenv = CondaDependencies.create(pip_packages=azureml_pip_packages,\n",
" pin_sdk_version=False)\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",

View File

@@ -1,5 +1,13 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1062,7 +1070,7 @@
"metadata": {
"authors": [
{
"name": "sanpil"
"name": "anshirga"
}
],
"kernelspec": {

View File

@@ -29,8 +29,8 @@ print("Argument 2(output final transformed taxi data): %s" % args.output_transfo
# use the drop_columns() function to delete the original fields as the newly generated features are preferred.
# Rename the rest of the fields to use meaningful descriptions.
normalized_df = normalized_df.astype({"pickup_date": 'datetime64', "dropoff_date": 'datetime64',
"pickup_time": 'datetime64', "dropoff_time": 'datetime64',
normalized_df = normalized_df.astype({"pickup_date": 'datetime64[ns]', "dropoff_date": 'datetime64[ns]',
"pickup_time": 'datetime64[us]', "dropoff_time": 'datetime64[us]',
"distance": 'float64', "cost": 'float64'})
normalized_df["pickup_weekday"] = normalized_df["pickup_date"].dt.dayofweek

View File

@@ -6,10 +6,15 @@ import numpy as np
from azureml.core.model import Model
from sklearn.linear_model import LogisticRegression
from azureml_user.parallel_run import EntryScript
def init():
global iris_model
logger = EntryScript().logger
logger.info("init() is called.")
parser = argparse.ArgumentParser(description="Iris model serving")
parser.add_argument('--model_name', dest="model_name", required=True)
args, unknown_args = parser.parse_known_args()
@@ -20,6 +25,9 @@ def init():
def run(input_data):
logger = EntryScript().logger
logger.info("run() is called with: {}.".format(input_data))
# make inference
num_rows, num_cols = input_data.shape
pred = iris_model.predict(input_data).reshape((num_rows, 1))

View File

@@ -51,7 +51,23 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
@@ -397,7 +413,7 @@
" parallel_run_config=parallel_run_config,\n",
" inputs=[ input_mnist_ds_consumption ],\n",
" output=output_dir,\n",
" allow_reuse=True\n",
" allow_reuse=False\n",
")"
]
},
@@ -426,7 +442,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Monitor the run"
"### Monitor the run\n",
"\n",
"The pipeline run status could be checked in Azure Machine Learning portal (https://ml.azure.com). The link to the pipeline run could be retrieved by inspecting the `pipeline_run` object."
]
},
{
@@ -435,8 +453,8 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()"
"# This will output information of the pipeline run, including the link to the details page of portal.\n",
"pipeline_run"
]
},
{
@@ -452,9 +470,40 @@
"metadata": {},
"outputs": [],
"source": [
"# Wait the run for completion and show output log to console\n",
"pipeline_run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View the prediction results per input image\n",
"In the digit_identification.py file above you can see that the ResultList with the filename and the prediction result gets returned. These are written to the DataStore specified in the PipelineData object as the output data, which in this case is called *inferences*. This containers the outputs from all of the worker nodes used in the compute cluster. You can download this data to view the results ... below just filters to the first 10 rows"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import tempfile\n",
"\n",
"batch_run = pipeline_run.find_step_run(parallelrun_step.name)[0]\n",
"batch_output = batch_run.get_output_data(output_dir.name)\n",
"\n",
"target_dir = tempfile.mkdtemp()\n",
"batch_output.download(local_path=target_dir)\n",
"result_file = os.path.join(target_dir, batch_output.path_on_datastore, parallel_run_config.append_row_file_name)\n",
"\n",
"df = pd.read_csv(result_file, delimiter=\":\", header=None)\n",
"df.columns = [\"Filename\", \"Prediction\"]\n",
"print(\"Prediction has \", df.shape[0], \" rows\")\n",
"df.head(10) "
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -492,15 +541,8 @@
"metadata": {},
"outputs": [],
"source": [
"pipeline_run_2.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View the prediction results per input image\n",
"In the digit_identification.py file above you can see that the ResultList with the filename and the prediction result gets returned. These are written to the DataStore specified in the PipelineData object as the output data, which in this case is called *inferences*. This containers the outputs from all of the worker nodes used in the compute cluster. You can download this data to view the results ... below just filters to the first 10 rows"
"# This will output information of the pipeline run, including the link to the details page of portal.\n",
"pipeline_run_2"
]
},
{
@@ -509,20 +551,8 @@
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import tempfile\n",
"\n",
"batch_run = pipeline_run.find_step_run(parallelrun_step.name)[0]\n",
"batch_output = batch_run.get_output_data(output_dir.name)\n",
"\n",
"target_dir = tempfile.mkdtemp()\n",
"batch_output.download(local_path=target_dir)\n",
"result_file = os.path.join(target_dir, batch_output.path_on_datastore, parallel_run_config.append_row_file_name)\n",
"\n",
"df = pd.read_csv(result_file, delimiter=\":\", header=None)\n",
"df.columns = [\"Filename\", \"Prediction\"]\n",
"print(\"Prediction has \", df.shape[0], \" rows\")\n",
"df.head(10) "
"# Wait the run for completion and show output log to console\n",
"pipeline_run_2.wait_for_completion(show_output=True)"
]
},
{

View File

@@ -49,7 +49,23 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
@@ -318,7 +334,7 @@
"parallel_run_config = ParallelRunConfig(\n",
" source_directory=scripts_folder,\n",
" entry_script=script_file, # the user script to run against each input\n",
" mini_batch_size='5MB',\n",
" mini_batch_size='1KB',\n",
" error_threshold=5,\n",
" output_action='append_row',\n",
" append_row_file_name=\"iris_outputs.txt\",\n",
@@ -349,7 +365,7 @@
" output=output_folder,\n",
" parallel_run_config=parallel_run_config,\n",
" arguments=['--model_name', 'iris-prs'],\n",
" allow_reuse=True\n",
" allow_reuse=False\n",
")"
]
},
@@ -375,13 +391,22 @@
"pipeline_run = Experiment(ws, 'iris-prs').submit(pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## View progress of Pipeline run\n",
"\n",
"The pipeline run status could be checked in Azure Machine Learning portal (https://ml.azure.com). The link to the pipeline run could be retrieved by inspecting the `pipeline_run` object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this will output a table with link to the run details in azure portal\n",
"# This will output information of the pipeline run, including the link to the details page of portal.\n",
"pipeline_run"
]
},
@@ -389,29 +414,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## View progress of Pipeline run\n",
"\n",
"The progress of the pipeline is able to be viewed either through azureml.widgets or a console feed from PipelineRun.wait_for_completion()."
"### Optional: View detailed logs (streaming) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# GUI\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show() "
]
"metadata": {
"tags": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Console logs\n",
"## Wait the run for completion and show output log to console\n",
"pipeline_run.wait_for_completion(show_output=True)"
]
},

View File

@@ -461,7 +461,7 @@
" output=processed_images, # Output file share/blob container\n",
" arguments=[\"--style\", style_param],\n",
" parallel_run_config=parallel_run_config,\n",
" allow_reuse=True #[optional - default value True]\n",
" allow_reuse=False #[optional - default value True]\n",
")"
]
},
@@ -497,7 +497,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Monitor using widget"
"# Monitor pipeline run\n",
"\n",
"The pipeline run status could be checked in Azure Machine Learning portal (https://ml.azure.com). The link to the pipeline run could be retrieved by inspecting the `pipeline_run` object.\n",
"\n"
]
},
{
@@ -506,25 +509,25 @@
"metadata": {},
"outputs": [],
"source": [
"# Track pipeline run progress\n",
"from azureml.widgets import RunDetails\n",
"RunDetails(pipeline_run).show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_run.wait_for_completion()"
"# This will output information of the pipeline run, including the link to the details page of portal.\n",
"pipeline_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Downloads the video in `output_video` folder"
"### Optional: View detailed logs (streaming) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait the run for completion and show output log to console\n",
"pipeline_run.wait_for_completion(show_output=True)"
]
},
{
@@ -534,6 +537,13 @@
"# Download output video"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Downloads the video in `output_video` folder"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -674,7 +684,8 @@
"from azureml.pipeline.core.run import PipelineRun\n",
"published_pipeline_run_candy = PipelineRun(ws.experiments[experiment_name], run_id)\n",
"\n",
"RunDetails(published_pipeline_run_candy).show()"
"# Show detail information of run\n",
"published_pipeline_run_candy"
]
},
{

View File

@@ -418,7 +418,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.hyperdrive.runconfig import HyperDriveRunConfig\n",
"from azureml.train.hyperdrive.runconfig import HyperDriveConfig\n",
"from azureml.train.hyperdrive.sampling import RandomParameterSampling\n",
"from azureml.train.hyperdrive.run import PrimaryMetricGoal\n",
"from azureml.train.hyperdrive.parameter_expressions import choice\n",
@@ -430,7 +430,7 @@
" }\n",
")\n",
"\n",
"hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n",
"hyperdrive_config = HyperDriveConfig(estimator=estimator,\n",
" hyperparameter_sampling=param_sampling, \n",
" primary_metric_name='Accuracy',\n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
@@ -452,7 +452,7 @@
"outputs": [],
"source": [
"# start the HyperDrive run\n",
"hyperdrive_run = experiment.submit(hyperdrive_run_config)"
"hyperdrive_run = experiment.submit(hyperdrive_config)"
]
},
{

View File

@@ -630,27 +630,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predict on the test set\n",
"## Predict on the test set (Optional)\n",
"Now load the saved TensorFlow graph, and list all operations under the `network` scope. This way we can discover the input tensor `network/X:0` and the output tensor `network/output/MatMul:0`, and use them in the scoring script in the next step.\n",
"\n",
"Note: if your local TensorFlow version is different than the version running in the cluster where the model is trained, you might see a \"compiletime version mismatch\" warning. You can ignore it."
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
" import tensorflow as tf\n",
" imported_model = tf.saved_model.load('./model')"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
" pred = imported_model(X_test)\n",
" y_hat = np.argmax(pred, axis=1)\n",
@@ -661,19 +657,15 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
" print(\"Accuracy on the test set:\", np.average(y_hat == y_test))"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
" print(\"Accuracy on the test set:\", np.average(y_hat == y_test))"
]
@@ -1055,7 +1047,7 @@
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
" \n",
" plt.text(x=10, y=-10, s=y_hat[s], fontsize=18, color=font_color)\n",
" plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n",
" plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
" \n",
" i = i + 1\n",

View File

@@ -158,7 +158,7 @@
"source": [
"from azureml.core import Dataset\n",
"\n",
"web_paths = ['http://mattmahoney.net/dc/text8.zip']\n",
"web_paths = ['https://azureopendatastorage.blob.core.windows.net/testpublic/text8.zip']\n",
"dataset = Dataset.File.from_files(path=web_paths)"
]
},

View File

@@ -1,31 +0,0 @@
# imports
import pickle
from datetime import datetime
from azureml.opendatasets import NoaaIsdWeather
from sklearn.linear_model import LinearRegression
# get weather dataset
start = datetime(2019, 1, 1)
end = datetime(2019, 1, 14)
isd = NoaaIsdWeather(start, end)
# convert to pandas dataframe and filter down
df = isd.to_pandas_dataframe().fillna(0)
df = df[df['stationName'].str.contains('FLORIDA', regex=True, na=False)]
# features for training
X_features = ['latitude', 'longitude', 'temperature', 'windAngle', 'windSpeed']
y_features = ['elevation']
# write the training dataset to csv
training_dataset = df[X_features + y_features]
training_dataset.to_csv('training.csv', index=False)
# train the model
X = training_dataset[X_features]
y = training_dataset[y_features]
model = LinearRegression().fit(X, y)
# save the model as a .pkl file
with open('elevation-regression-model.pkl', 'wb') as f:
pickle.dump(model, f)

View File

@@ -1,346 +0,0 @@
latitude,longitude,temperature,windAngle,windSpeed,elevation
26.536,-81.755,17.8,10.0,2.1,9.0
26.536,-81.755,16.7,360.0,1.5,9.0
26.536,-81.755,16.1,350.0,1.5,9.0
26.536,-81.755,15.0,0.0,0.0,9.0
26.536,-81.755,14.4,350.0,1.5,9.0
26.536,-81.755,0.0,0.0,0.0,9.0
26.536,-81.755,13.9,360.0,2.1,9.0
26.536,-81.755,13.3,350.0,1.5,9.0
26.536,-81.755,13.3,10.0,2.1,9.0
26.536,-81.755,13.3,360.0,1.5,9.0
26.536,-81.755,13.3,0.0,0.0,9.0
26.536,-81.755,12.2,0.0,0.0,9.0
26.536,-81.755,11.7,0.0,0.0,9.0
26.536,-81.755,14.4,0.0,0.0,9.0
26.536,-81.755,17.2,10.0,2.6,9.0
26.536,-81.755,20.0,20.0,2.6,9.0
26.536,-81.755,22.2,10.0,3.6,9.0
26.536,-81.755,23.3,30.0,4.6,9.0
26.536,-81.755,23.3,330.0,2.6,9.0
26.536,-81.755,24.4,0.0,0.0,9.0
26.536,-81.755,25.0,360.0,3.1,9.0
26.536,-81.755,24.4,20.0,4.1,9.0
26.536,-81.755,23.3,10.0,2.6,9.0
26.536,-81.755,21.1,30.0,2.1,9.0
26.536,-81.755,18.3,0.0,0.0,9.0
26.536,-81.755,17.2,30.0,2.1,9.0
26.536,-81.755,15.6,60.0,2.6,9.0
26.536,-81.755,15.6,0.0,0.0,9.0
26.536,-81.755,13.9,60.0,2.6,9.0
26.536,-81.755,12.8,70.0,2.6,9.0
26.536,-81.755,0.0,0.0,0.0,9.0
26.536,-81.755,11.7,70.0,2.1,9.0
26.536,-81.755,12.2,20.0,2.1,9.0
26.536,-81.755,11.7,30.0,1.5,9.0
26.536,-81.755,11.1,40.0,2.1,9.0
26.536,-81.755,12.2,40.0,2.6,9.0
26.536,-81.755,12.2,30.0,2.6,9.0
26.536,-81.755,12.2,0.0,0.0,9.0
26.536,-81.755,15.0,30.0,6.2,9.0
26.536,-81.755,17.2,50.0,3.6,9.0
26.536,-81.755,20.6,60.0,5.1,9.0
26.536,-81.755,22.8,50.0,4.6,9.0
26.536,-81.755,24.4,80.0,6.2,9.0
26.536,-81.755,25.0,100.0,5.7,9.0
26.536,-81.755,25.6,60.0,3.1,9.0
26.536,-81.755,25.6,80.0,4.6,9.0
26.536,-81.755,25.0,90.0,5.1,9.0
26.536,-81.755,24.4,80.0,5.1,9.0
26.536,-81.755,21.1,60.0,2.6,9.0
26.536,-81.755,19.4,70.0,3.6,9.0
26.536,-81.755,18.3,70.0,2.6,9.0
26.536,-81.755,18.3,80.0,2.6,9.0
26.536,-81.755,17.2,60.0,1.5,9.0
26.536,-81.755,16.1,70.0,2.6,9.0
26.536,-81.755,15.6,70.0,2.6,9.0
26.536,-81.755,0.0,0.0,0.0,9.0
26.536,-81.755,16.1,50.0,2.6,9.0
26.536,-81.755,15.6,50.0,2.1,9.0
26.536,-81.755,15.0,50.0,1.5,9.0
26.536,-81.755,15.0,0.0,0.0,9.0
26.536,-81.755,15.0,0.0,0.0,9.0
26.536,-81.755,14.4,0.0,0.0,9.0
26.536,-81.755,14.4,30.0,4.1,9.0
26.536,-81.755,16.1,40.0,1.5,9.0
26.536,-81.755,19.4,0.0,1.5,9.0
26.536,-81.755,22.8,90.0,2.6,9.0
26.536,-81.755,24.4,130.0,3.6,9.0
26.536,-81.755,25.6,100.0,4.6,9.0
26.536,-81.755,26.1,120.0,3.1,9.0
26.536,-81.755,26.7,0.0,2.6,9.0
26.536,-81.755,27.2,0.0,0.0,9.0
26.536,-81.755,27.2,40.0,3.1,9.0
26.536,-81.755,26.1,30.0,1.5,9.0
26.536,-81.755,22.8,310.0,2.1,9.0
26.536,-81.755,23.3,330.0,2.1,9.0
-34.067,-56.238,17.5,30.0,3.1,68.0
-34.067,-56.238,21.2,30.0,5.7,68.0
-34.067,-56.238,24.5,30.0,3.1,68.0
-34.067,-56.238,27.5,330.0,3.6,68.0
-34.067,-56.238,29.2,30.0,4.1,68.0
-34.067,-56.238,31.0,20.0,4.6,68.0
-34.067,-56.238,33.0,360.0,2.6,68.0
-34.067,-56.238,33.6,60.0,3.1,68.0
-34.067,-56.238,33.6,30.0,3.6,68.0
-34.067,-56.238,18.6,40.0,3.1,68.0
-34.067,-56.238,22.0,120.0,1.5,68.0
-34.067,-56.238,25.0,120.0,2.6,68.0
-34.067,-56.238,28.6,50.0,3.1,68.0
-34.067,-56.238,30.6,50.0,4.1,68.0
-34.067,-56.238,31.5,30.0,6.7,68.0
-34.067,-56.238,32.0,40.0,7.2,68.0
-34.067,-56.238,33.0,30.0,5.7,68.0
-34.067,-56.238,33.2,360.0,3.6,68.0
-34.067,-56.238,20.6,30.0,3.1,68.0
-34.067,-56.238,21.2,0.0,0.0,68.0
-34.067,-56.238,22.0,210.0,3.1,68.0
-34.067,-56.238,23.0,210.0,3.6,68.0
-34.067,-56.238,24.0,180.0,6.7,68.0
-34.067,-56.238,24.5,210.0,7.2,68.0
-34.067,-56.238,21.0,180.0,8.2,68.0
-34.067,-56.238,20.0,180.0,6.7,68.0
-34.083,-56.233,20.2,180.0,7.2,68.0
-29.917,-71.2,16.6,290.0,4.1,146.0
-29.916,-71.2,17.0,290.0,4.1,147.0
-29.916,-71.2,16.0,310.0,3.1,147.0
-29.916,-71.2,16.0,300.0,2.1,147.0
-29.917,-71.2,15.1,0.0,0.0,146.0
-29.916,-71.2,15.0,0.0,1.0,147.0
-29.916,-71.2,15.0,160.0,1.0,147.0
-29.916,-71.2,15.0,120.0,1.0,147.0
-29.917,-71.2,14.3,190.0,1.0,146.0
-29.916,-71.2,14.0,190.0,1.0,147.0
-29.916,-71.2,14.0,0.0,0.0,147.0
-29.916,-71.2,14.0,100.0,3.1,147.0
-29.917,-71.2,12.9,0.0,0.0,146.0
-29.916,-71.2,13.0,0.0,1.0,147.0
-29.916,-71.2,14.0,0.0,0.5,147.0
-29.916,-71.2,15.0,0.0,0.5,147.0
-29.917,-71.2,15.9,0.0,0.0,146.0
-29.916,-71.2,16.0,0.0,0.0,147.0
-29.916,-71.2,17.0,270.0,4.6,147.0
-29.916,-71.2,19.0,260.0,4.1,147.0
-29.917,-71.2,18.1,270.0,6.2,146.0
-29.916,-71.2,18.0,270.0,6.2,147.0
-29.916,-71.2,19.0,270.0,6.2,147.0
-29.916,-71.2,20.0,260.0,5.1,147.0
-29.917,-71.2,19.6,280.0,6.2,146.0
-29.916,-71.2,20.0,280.0,6.2,147.0
-29.916,-71.2,20.0,270.0,6.2,147.0
-29.916,-71.2,19.0,280.0,6.7,147.0
-29.917,-71.2,18.3,270.0,5.7,146.0
-29.916,-71.2,18.0,270.0,5.7,147.0
-29.916,-71.2,18.0,0.0,0.0,147.0
-29.916,-71.2,17.0,280.0,4.6,147.0
-29.917,-71.2,15.9,280.0,4.1,146.0
-29.916,-71.2,16.0,280.0,4.1,147.0
-29.916,-71.2,15.0,280.0,3.6,147.0
-29.916,-71.2,15.0,280.0,3.6,147.0
-29.917,-71.2,15.4,280.0,4.1,146.0
-29.916,-71.2,15.0,280.0,4.1,147.0
-29.916,-71.2,16.0,240.0,2.1,147.0
-29.916,-71.2,15.0,0.0,0.5,147.0
-29.917,-71.2,15.8,80.0,3.6,146.0
-29.916,-71.2,16.0,80.0,3.6,147.0
-29.916,-71.2,16.0,10.0,1.5,147.0
-29.916,-71.2,16.0,100.0,1.5,147.0
-29.917,-71.2,15.3,130.0,1.5,146.0
-29.916,-71.2,15.0,130.0,1.5,147.0
-29.916,-71.2,15.0,110.0,1.0,147.0
-29.916,-71.2,16.0,280.0,6.2,147.0
-29.917,-71.2,15.9,240.0,3.6,146.0
-29.916,-71.2,16.0,240.0,3.6,147.0
-29.916,-71.2,16.0,240.0,3.1,147.0
-29.916,-71.2,16.0,220.0,3.1,147.0
-29.917,-71.2,16.4,260.0,3.1,146.0
-29.916,-71.2,16.0,260.0,3.1,147.0
-29.916,-71.2,17.0,230.0,2.6,147.0
-29.916,-71.2,18.0,0.0,1.5,147.0
-29.917,-71.2,20.3,340.0,2.6,146.0
-29.916,-71.2,20.0,340.0,2.6,147.0
-29.916,-71.2,21.0,270.0,5.1,147.0
-29.916,-71.2,20.0,270.0,6.7,147.0
-29.917,-71.2,19.2,280.0,6.7,146.0
-29.916,-71.2,19.0,280.0,6.7,147.0
-29.916,-71.2,19.0,310.0,2.6,147.0
-29.916,-71.2,18.0,270.0,5.1,147.0
-29.917,-71.2,17.0,300.0,4.6,146.0
-29.916,-71.2,17.0,300.0,4.6,147.0
-29.916,-71.2,17.0,300.0,3.6,147.0
-29.916,-71.2,17.0,290.0,3.1,147.0
-29.917,-71.2,16.3,290.0,2.1,146.0
-29.916,-71.2,16.0,290.0,2.1,147.0
-29.916,-71.2,17.0,270.0,1.0,147.0
-29.916,-71.2,17.0,0.0,0.5,147.0
-29.917,-71.2,16.5,160.0,2.1,146.0
-29.916,-71.2,17.0,160.0,2.1,147.0
-29.916,-71.2,15.0,120.0,3.1,147.0
-29.916,-71.2,16.0,180.0,1.5,147.0
-29.917,-71.2,14.7,0.0,0.0,146.0
-29.916,-71.2,15.0,0.0,1.0,147.0
-29.916,-71.2,15.0,300.0,1.0,147.0
-29.916,-71.2,16.0,0.0,0.0,147.0
-29.917,-71.2,18.5,110.0,1.0,146.0
-29.916,-71.2,19.0,110.0,1.0,147.0
-29.916,-71.2,20.0,270.0,3.6,147.0
-29.916,-71.2,20.0,270.0,5.7,147.0
-29.917,-71.2,20.0,280.0,6.2,146.0
-29.916,-71.2,20.0,280.0,6.2,147.0
-29.916,-71.2,21.0,290.0,6.7,147.0
-29.916,-71.2,20.0,270.0,6.2,147.0
-29.917,-71.2,21.0,260.0,6.7,146.0
-29.916,-71.2,21.0,260.0,6.7,147.0
-29.916,-71.2,20.0,270.0,6.2,147.0
-29.916,-71.2,19.0,260.0,5.1,147.0
-29.916,-71.2,18.0,280.0,4.6,147.0
-29.917,-71.2,17.5,280.0,3.1,146.0
-29.916,-71.2,18.0,280.0,3.1,147.0
30.349,-85.788,11.1,0.0,0.0,21.0
30.349,-85.788,11.1,0.0,0.0,21.0
30.349,-85.788,9.4,0.0,0.0,21.0
30.349,-85.788,9.4,0.0,0.0,21.0
30.349,-85.788,8.3,300.0,2.1,21.0
30.349,-85.788,11.1,280.0,1.5,21.0
30.349,-85.788,0.0,0.0,0.0,21.0
30.349,-85.788,10.6,320.0,3.1,21.0
30.349,-85.788,9.4,310.0,3.1,21.0
30.349,-85.788,7.8,320.0,2.6,21.0
30.349,-85.788,6.1,340.0,2.1,21.0
30.349,-85.788,6.7,330.0,2.6,21.0
30.349,-85.788,6.1,310.0,1.5,21.0
30.349,-85.788,7.2,310.0,2.1,21.0
30.349,-85.788,12.8,360.0,3.1,21.0
30.349,-85.788,15.0,0.0,3.1,21.0
30.349,-85.788,16.7,20.0,4.6,21.0
30.349,-85.788,18.9,30.0,5.1,21.0
30.349,-85.788,19.4,10.0,4.1,21.0
30.349,-85.788,21.1,330.0,2.6,21.0
30.349,-85.788,21.1,10.0,4.6,21.0
30.349,-85.788,21.7,360.0,4.1,21.0
30.349,-85.788,21.7,30.0,2.1,21.0
30.349,-85.788,21.7,330.0,2.6,21.0
30.349,-85.788,16.1,350.0,2.1,21.0
30.349,-85.788,11.7,0.0,0.0,21.0
30.349,-85.788,8.9,0.0,0.0,21.0
30.349,-85.788,9.4,0.0,0.0,21.0
30.349,-85.788,7.8,0.0,0.0,21.0
30.349,-85.788,11.1,30.0,3.1,21.0
30.349,-85.788,7.2,0.0,0.0,21.0
30.349,-85.788,7.2,0.0,0.0,21.0
30.349,-85.788,0.0,0.0,0.0,21.0
30.349,-85.788,7.8,30.0,2.1,21.0
30.349,-85.788,8.3,40.0,2.6,21.0
30.349,-85.788,7.2,50.0,1.5,21.0
30.349,-85.788,8.3,60.0,1.5,21.0
30.349,-85.788,5.6,40.0,2.1,21.0
30.349,-85.788,6.7,40.0,2.1,21.0
30.349,-85.788,7.8,50.0,3.1,21.0
30.349,-85.788,11.7,70.0,2.6,21.0
30.349,-85.788,15.6,70.0,3.1,21.0
30.349,-85.788,18.9,100.0,3.6,21.0
30.349,-85.788,20.0,130.0,3.6,21.0
30.349,-85.788,21.1,140.0,4.1,21.0
30.349,-85.788,21.7,150.0,4.1,21.0
30.349,-85.788,21.7,170.0,3.1,21.0
30.349,-85.788,22.2,170.0,3.1,21.0
30.349,-85.788,20.6,0.0,0.0,21.0
30.349,-85.788,17.2,0.0,0.0,21.0
30.349,-85.788,14.4,0.0,0.0,21.0
30.349,-85.788,12.8,100.0,1.5,21.0
30.349,-85.788,13.3,100.0,1.5,21.0
30.349,-85.788,10.6,0.0,0.0,21.0
30.349,-85.788,9.4,0.0,0.0,21.0
30.349,-85.788,7.8,0.0,0.0,21.0
30.358,-85.799,8.3,0.0,0.0,21.0
30.349,-85.788,0.0,0.0,0.0,21.0
30.358,-85.799,6.7,0.0,0.0,21.0
30.358,-85.799,7.2,0.0,0.0,21.0
30.358,-85.799,7.2,0.0,0.0,21.0
30.358,-85.799,8.3,50.0,1.5,21.0
30.358,-85.799,9.4,0.0,0.0,21.0
30.358,-85.799,8.9,0.0,0.0,21.0
30.358,-85.799,10.0,340.0,1.5,21.0
30.358,-85.799,12.8,40.0,1.5,21.0
30.358,-85.799,16.7,100.0,2.1,21.0
30.358,-85.799,21.1,100.0,1.5,21.0
30.358,-85.799,23.3,0.0,0.0,21.0
30.358,-85.799,25.0,180.0,4.6,21.0
30.358,-85.799,24.4,230.0,3.6,21.0
30.358,-85.799,25.0,210.0,4.1,21.0
30.358,-85.799,23.9,170.0,4.1,21.0
30.358,-85.799,22.8,0.0,0.0,21.0
30.358,-85.799,19.4,0.0,0.0,21.0
30.358,-85.799,17.8,140.0,2.1,21.0
60.383,5.333,-0.7,0.0,0.0,36.0
60.383,5.333,0.6,270.0,2.0,36.0
60.383,5.333,-0.9,120.0,1.0,36.0
60.383,5.333,-1.6,130.0,2.0,36.0
60.383,5.333,-1.4,150.0,1.0,36.0
60.383,5.333,-1.7,0.0,0.0,36.0
60.383,5.333,-1.7,140.0,1.0,36.0
60.383,5.333,-1.4,0.0,0.0,36.0
60.383,5.333,-1.0,0.0,0.0,36.0
60.383,5.333,-1.0,150.0,1.0,36.0
60.383,5.333,-0.7,140.0,1.0,36.0
60.383,5.333,0.5,150.0,1.0,36.0
60.383,5.333,1.9,0.0,0.0,36.0
60.383,5.333,1.7,0.0,0.0,36.0
60.383,5.333,2.1,310.0,2.0,36.0
60.383,5.333,1.5,90.0,1.0,36.0
60.383,5.333,1.9,290.0,1.0,36.0
60.383,5.333,2.0,320.0,1.0,36.0
60.383,5.333,1.9,330.0,1.0,36.0
60.383,5.333,1.3,350.0,1.0,36.0
60.383,5.333,1.5,120.0,1.0,36.0
60.383,5.333,1.3,150.0,2.0,36.0
60.383,5.333,0.8,140.0,1.0,36.0
60.383,5.333,0.3,300.0,1.0,36.0
60.383,5.333,0.2,140.0,1.0,36.0
60.383,5.333,0.4,140.0,1.0,36.0
60.383,5.333,0.5,320.0,1.0,36.0
60.383,5.333,1.5,330.0,1.0,36.0
60.383,5.333,1.8,40.0,1.0,36.0
60.383,5.333,2.3,170.0,1.0,36.0
60.383,5.333,2.7,140.0,1.0,36.0
60.383,5.333,3.1,330.0,1.0,36.0
60.383,5.333,3.8,350.0,1.0,36.0
60.383,5.333,3.8,140.0,1.0,36.0
60.383,5.333,4.1,150.0,1.0,36.0
60.383,5.333,4.4,180.0,1.0,36.0
60.383,5.333,4.9,300.0,1.0,36.0
60.383,5.333,5.2,320.0,1.0,36.0
60.383,5.333,6.7,340.0,1.0,36.0
60.383,5.333,6.9,250.0,1.0,36.0
60.383,5.333,7.9,300.0,2.0,36.0
60.383,5.333,5.5,140.0,1.0,36.0
60.383,5.333,7.1,140.0,2.0,36.0
60.383,5.333,7.0,280.0,2.0,36.0
60.383,5.333,4.6,170.0,1.0,36.0
60.383,5.333,4.8,330.0,1.0,36.0
60.383,5.333,6.4,260.0,2.0,36.0
60.383,5.333,6.2,340.0,1.0,36.0
60.383,5.333,5.7,320.0,2.0,36.0
60.383,5.333,5.2,100.0,1.0,36.0
60.383,5.333,5.1,310.0,1.0,36.0
60.383,5.333,4.9,290.0,2.0,36.0
60.383,5.333,4.9,310.0,2.0,36.0
60.383,5.333,6.1,320.0,2.0,36.0
60.383,5.333,7.0,250.0,1.0,36.0
60.383,5.333,5.3,140.0,1.0,36.0
60.383,5.333,6.9,350.0,1.0,36.0
60.383,5.333,9.7,110.0,3.0,36.0
60.383,5.333,10.3,300.0,3.0,36.0
60.383,5.333,8.7,310.0,1.0,36.0
60.383,5.333,9.0,270.0,3.0,36.0
60.383,5.333,11.6,80.0,3.0,36.0
60.383,5.333,11.4,80.0,4.0,36.0
60.383,5.333,9.7,70.0,5.0,36.0
60.383,5.333,9.5,80.0,6.0,36.0
60.383,5.333,8.7,80.0,5.0,36.0
60.383,5.333,7.7,80.0,5.0,36.0
60.383,5.333,8.2,80.0,4.0,36.0
60.383,5.333,7.7,30.0,1.0,36.0
60.383,5.333,7.2,310.0,1.0,36.0
60.383,5.333,6.8,300.0,2.0,36.0
60.383,5.333,6.7,140.0,1.0,36.0
1 latitude longitude temperature windAngle windSpeed elevation
2 26.536 -81.755 17.8 10.0 2.1 9.0
3 26.536 -81.755 16.7 360.0 1.5 9.0
4 26.536 -81.755 16.1 350.0 1.5 9.0
5 26.536 -81.755 15.0 0.0 0.0 9.0
6 26.536 -81.755 14.4 350.0 1.5 9.0
7 26.536 -81.755 0.0 0.0 0.0 9.0
8 26.536 -81.755 13.9 360.0 2.1 9.0
9 26.536 -81.755 13.3 350.0 1.5 9.0
10 26.536 -81.755 13.3 10.0 2.1 9.0
11 26.536 -81.755 13.3 360.0 1.5 9.0
12 26.536 -81.755 13.3 0.0 0.0 9.0
13 26.536 -81.755 12.2 0.0 0.0 9.0
14 26.536 -81.755 11.7 0.0 0.0 9.0
15 26.536 -81.755 14.4 0.0 0.0 9.0
16 26.536 -81.755 17.2 10.0 2.6 9.0
17 26.536 -81.755 20.0 20.0 2.6 9.0
18 26.536 -81.755 22.2 10.0 3.6 9.0
19 26.536 -81.755 23.3 30.0 4.6 9.0
20 26.536 -81.755 23.3 330.0 2.6 9.0
21 26.536 -81.755 24.4 0.0 0.0 9.0
22 26.536 -81.755 25.0 360.0 3.1 9.0
23 26.536 -81.755 24.4 20.0 4.1 9.0
24 26.536 -81.755 23.3 10.0 2.6 9.0
25 26.536 -81.755 21.1 30.0 2.1 9.0
26 26.536 -81.755 18.3 0.0 0.0 9.0
27 26.536 -81.755 17.2 30.0 2.1 9.0
28 26.536 -81.755 15.6 60.0 2.6 9.0
29 26.536 -81.755 15.6 0.0 0.0 9.0
30 26.536 -81.755 13.9 60.0 2.6 9.0
31 26.536 -81.755 12.8 70.0 2.6 9.0
32 26.536 -81.755 0.0 0.0 0.0 9.0
33 26.536 -81.755 11.7 70.0 2.1 9.0
34 26.536 -81.755 12.2 20.0 2.1 9.0
35 26.536 -81.755 11.7 30.0 1.5 9.0
36 26.536 -81.755 11.1 40.0 2.1 9.0
37 26.536 -81.755 12.2 40.0 2.6 9.0
38 26.536 -81.755 12.2 30.0 2.6 9.0
39 26.536 -81.755 12.2 0.0 0.0 9.0
40 26.536 -81.755 15.0 30.0 6.2 9.0
41 26.536 -81.755 17.2 50.0 3.6 9.0
42 26.536 -81.755 20.6 60.0 5.1 9.0
43 26.536 -81.755 22.8 50.0 4.6 9.0
44 26.536 -81.755 24.4 80.0 6.2 9.0
45 26.536 -81.755 25.0 100.0 5.7 9.0
46 26.536 -81.755 25.6 60.0 3.1 9.0
47 26.536 -81.755 25.6 80.0 4.6 9.0
48 26.536 -81.755 25.0 90.0 5.1 9.0
49 26.536 -81.755 24.4 80.0 5.1 9.0
50 26.536 -81.755 21.1 60.0 2.6 9.0
51 26.536 -81.755 19.4 70.0 3.6 9.0
52 26.536 -81.755 18.3 70.0 2.6 9.0
53 26.536 -81.755 18.3 80.0 2.6 9.0
54 26.536 -81.755 17.2 60.0 1.5 9.0
55 26.536 -81.755 16.1 70.0 2.6 9.0
56 26.536 -81.755 15.6 70.0 2.6 9.0
57 26.536 -81.755 0.0 0.0 0.0 9.0
58 26.536 -81.755 16.1 50.0 2.6 9.0
59 26.536 -81.755 15.6 50.0 2.1 9.0
60 26.536 -81.755 15.0 50.0 1.5 9.0
61 26.536 -81.755 15.0 0.0 0.0 9.0
62 26.536 -81.755 15.0 0.0 0.0 9.0
63 26.536 -81.755 14.4 0.0 0.0 9.0
64 26.536 -81.755 14.4 30.0 4.1 9.0
65 26.536 -81.755 16.1 40.0 1.5 9.0
66 26.536 -81.755 19.4 0.0 1.5 9.0
67 26.536 -81.755 22.8 90.0 2.6 9.0
68 26.536 -81.755 24.4 130.0 3.6 9.0
69 26.536 -81.755 25.6 100.0 4.6 9.0
70 26.536 -81.755 26.1 120.0 3.1 9.0
71 26.536 -81.755 26.7 0.0 2.6 9.0
72 26.536 -81.755 27.2 0.0 0.0 9.0
73 26.536 -81.755 27.2 40.0 3.1 9.0
74 26.536 -81.755 26.1 30.0 1.5 9.0
75 26.536 -81.755 22.8 310.0 2.1 9.0
76 26.536 -81.755 23.3 330.0 2.1 9.0
77 -34.067 -56.238 17.5 30.0 3.1 68.0
78 -34.067 -56.238 21.2 30.0 5.7 68.0
79 -34.067 -56.238 24.5 30.0 3.1 68.0
80 -34.067 -56.238 27.5 330.0 3.6 68.0
81 -34.067 -56.238 29.2 30.0 4.1 68.0
82 -34.067 -56.238 31.0 20.0 4.6 68.0
83 -34.067 -56.238 33.0 360.0 2.6 68.0
84 -34.067 -56.238 33.6 60.0 3.1 68.0
85 -34.067 -56.238 33.6 30.0 3.6 68.0
86 -34.067 -56.238 18.6 40.0 3.1 68.0
87 -34.067 -56.238 22.0 120.0 1.5 68.0
88 -34.067 -56.238 25.0 120.0 2.6 68.0
89 -34.067 -56.238 28.6 50.0 3.1 68.0
90 -34.067 -56.238 30.6 50.0 4.1 68.0
91 -34.067 -56.238 31.5 30.0 6.7 68.0
92 -34.067 -56.238 32.0 40.0 7.2 68.0
93 -34.067 -56.238 33.0 30.0 5.7 68.0
94 -34.067 -56.238 33.2 360.0 3.6 68.0
95 -34.067 -56.238 20.6 30.0 3.1 68.0
96 -34.067 -56.238 21.2 0.0 0.0 68.0
97 -34.067 -56.238 22.0 210.0 3.1 68.0
98 -34.067 -56.238 23.0 210.0 3.6 68.0
99 -34.067 -56.238 24.0 180.0 6.7 68.0
100 -34.067 -56.238 24.5 210.0 7.2 68.0
101 -34.067 -56.238 21.0 180.0 8.2 68.0
102 -34.067 -56.238 20.0 180.0 6.7 68.0
103 -34.083 -56.233 20.2 180.0 7.2 68.0
104 -29.917 -71.2 16.6 290.0 4.1 146.0
105 -29.916 -71.2 17.0 290.0 4.1 147.0
106 -29.916 -71.2 16.0 310.0 3.1 147.0
107 -29.916 -71.2 16.0 300.0 2.1 147.0
108 -29.917 -71.2 15.1 0.0 0.0 146.0
109 -29.916 -71.2 15.0 0.0 1.0 147.0
110 -29.916 -71.2 15.0 160.0 1.0 147.0
111 -29.916 -71.2 15.0 120.0 1.0 147.0
112 -29.917 -71.2 14.3 190.0 1.0 146.0
113 -29.916 -71.2 14.0 190.0 1.0 147.0
114 -29.916 -71.2 14.0 0.0 0.0 147.0
115 -29.916 -71.2 14.0 100.0 3.1 147.0
116 -29.917 -71.2 12.9 0.0 0.0 146.0
117 -29.916 -71.2 13.0 0.0 1.0 147.0
118 -29.916 -71.2 14.0 0.0 0.5 147.0
119 -29.916 -71.2 15.0 0.0 0.5 147.0
120 -29.917 -71.2 15.9 0.0 0.0 146.0
121 -29.916 -71.2 16.0 0.0 0.0 147.0
122 -29.916 -71.2 17.0 270.0 4.6 147.0
123 -29.916 -71.2 19.0 260.0 4.1 147.0
124 -29.917 -71.2 18.1 270.0 6.2 146.0
125 -29.916 -71.2 18.0 270.0 6.2 147.0
126 -29.916 -71.2 19.0 270.0 6.2 147.0
127 -29.916 -71.2 20.0 260.0 5.1 147.0
128 -29.917 -71.2 19.6 280.0 6.2 146.0
129 -29.916 -71.2 20.0 280.0 6.2 147.0
130 -29.916 -71.2 20.0 270.0 6.2 147.0
131 -29.916 -71.2 19.0 280.0 6.7 147.0
132 -29.917 -71.2 18.3 270.0 5.7 146.0
133 -29.916 -71.2 18.0 270.0 5.7 147.0
134 -29.916 -71.2 18.0 0.0 0.0 147.0
135 -29.916 -71.2 17.0 280.0 4.6 147.0
136 -29.917 -71.2 15.9 280.0 4.1 146.0
137 -29.916 -71.2 16.0 280.0 4.1 147.0
138 -29.916 -71.2 15.0 280.0 3.6 147.0
139 -29.916 -71.2 15.0 280.0 3.6 147.0
140 -29.917 -71.2 15.4 280.0 4.1 146.0
141 -29.916 -71.2 15.0 280.0 4.1 147.0
142 -29.916 -71.2 16.0 240.0 2.1 147.0
143 -29.916 -71.2 15.0 0.0 0.5 147.0
144 -29.917 -71.2 15.8 80.0 3.6 146.0
145 -29.916 -71.2 16.0 80.0 3.6 147.0
146 -29.916 -71.2 16.0 10.0 1.5 147.0
147 -29.916 -71.2 16.0 100.0 1.5 147.0
148 -29.917 -71.2 15.3 130.0 1.5 146.0
149 -29.916 -71.2 15.0 130.0 1.5 147.0
150 -29.916 -71.2 15.0 110.0 1.0 147.0
151 -29.916 -71.2 16.0 280.0 6.2 147.0
152 -29.917 -71.2 15.9 240.0 3.6 146.0
153 -29.916 -71.2 16.0 240.0 3.6 147.0
154 -29.916 -71.2 16.0 240.0 3.1 147.0
155 -29.916 -71.2 16.0 220.0 3.1 147.0
156 -29.917 -71.2 16.4 260.0 3.1 146.0
157 -29.916 -71.2 16.0 260.0 3.1 147.0
158 -29.916 -71.2 17.0 230.0 2.6 147.0
159 -29.916 -71.2 18.0 0.0 1.5 147.0
160 -29.917 -71.2 20.3 340.0 2.6 146.0
161 -29.916 -71.2 20.0 340.0 2.6 147.0
162 -29.916 -71.2 21.0 270.0 5.1 147.0
163 -29.916 -71.2 20.0 270.0 6.7 147.0
164 -29.917 -71.2 19.2 280.0 6.7 146.0
165 -29.916 -71.2 19.0 280.0 6.7 147.0
166 -29.916 -71.2 19.0 310.0 2.6 147.0
167 -29.916 -71.2 18.0 270.0 5.1 147.0
168 -29.917 -71.2 17.0 300.0 4.6 146.0
169 -29.916 -71.2 17.0 300.0 4.6 147.0
170 -29.916 -71.2 17.0 300.0 3.6 147.0
171 -29.916 -71.2 17.0 290.0 3.1 147.0
172 -29.917 -71.2 16.3 290.0 2.1 146.0
173 -29.916 -71.2 16.0 290.0 2.1 147.0
174 -29.916 -71.2 17.0 270.0 1.0 147.0
175 -29.916 -71.2 17.0 0.0 0.5 147.0
176 -29.917 -71.2 16.5 160.0 2.1 146.0
177 -29.916 -71.2 17.0 160.0 2.1 147.0
178 -29.916 -71.2 15.0 120.0 3.1 147.0
179 -29.916 -71.2 16.0 180.0 1.5 147.0
180 -29.917 -71.2 14.7 0.0 0.0 146.0
181 -29.916 -71.2 15.0 0.0 1.0 147.0
182 -29.916 -71.2 15.0 300.0 1.0 147.0
183 -29.916 -71.2 16.0 0.0 0.0 147.0
184 -29.917 -71.2 18.5 110.0 1.0 146.0
185 -29.916 -71.2 19.0 110.0 1.0 147.0
186 -29.916 -71.2 20.0 270.0 3.6 147.0
187 -29.916 -71.2 20.0 270.0 5.7 147.0
188 -29.917 -71.2 20.0 280.0 6.2 146.0
189 -29.916 -71.2 20.0 280.0 6.2 147.0
190 -29.916 -71.2 21.0 290.0 6.7 147.0
191 -29.916 -71.2 20.0 270.0 6.2 147.0
192 -29.917 -71.2 21.0 260.0 6.7 146.0
193 -29.916 -71.2 21.0 260.0 6.7 147.0
194 -29.916 -71.2 20.0 270.0 6.2 147.0
195 -29.916 -71.2 19.0 260.0 5.1 147.0
196 -29.916 -71.2 18.0 280.0 4.6 147.0
197 -29.917 -71.2 17.5 280.0 3.1 146.0
198 -29.916 -71.2 18.0 280.0 3.1 147.0
199 30.349 -85.788 11.1 0.0 0.0 21.0
200 30.349 -85.788 11.1 0.0 0.0 21.0
201 30.349 -85.788 9.4 0.0 0.0 21.0
202 30.349 -85.788 9.4 0.0 0.0 21.0
203 30.349 -85.788 8.3 300.0 2.1 21.0
204 30.349 -85.788 11.1 280.0 1.5 21.0
205 30.349 -85.788 0.0 0.0 0.0 21.0
206 30.349 -85.788 10.6 320.0 3.1 21.0
207 30.349 -85.788 9.4 310.0 3.1 21.0
208 30.349 -85.788 7.8 320.0 2.6 21.0
209 30.349 -85.788 6.1 340.0 2.1 21.0
210 30.349 -85.788 6.7 330.0 2.6 21.0
211 30.349 -85.788 6.1 310.0 1.5 21.0
212 30.349 -85.788 7.2 310.0 2.1 21.0
213 30.349 -85.788 12.8 360.0 3.1 21.0
214 30.349 -85.788 15.0 0.0 3.1 21.0
215 30.349 -85.788 16.7 20.0 4.6 21.0
216 30.349 -85.788 18.9 30.0 5.1 21.0
217 30.349 -85.788 19.4 10.0 4.1 21.0
218 30.349 -85.788 21.1 330.0 2.6 21.0
219 30.349 -85.788 21.1 10.0 4.6 21.0
220 30.349 -85.788 21.7 360.0 4.1 21.0
221 30.349 -85.788 21.7 30.0 2.1 21.0
222 30.349 -85.788 21.7 330.0 2.6 21.0
223 30.349 -85.788 16.1 350.0 2.1 21.0
224 30.349 -85.788 11.7 0.0 0.0 21.0
225 30.349 -85.788 8.9 0.0 0.0 21.0
226 30.349 -85.788 9.4 0.0 0.0 21.0
227 30.349 -85.788 7.8 0.0 0.0 21.0
228 30.349 -85.788 11.1 30.0 3.1 21.0
229 30.349 -85.788 7.2 0.0 0.0 21.0
230 30.349 -85.788 7.2 0.0 0.0 21.0
231 30.349 -85.788 0.0 0.0 0.0 21.0
232 30.349 -85.788 7.8 30.0 2.1 21.0
233 30.349 -85.788 8.3 40.0 2.6 21.0
234 30.349 -85.788 7.2 50.0 1.5 21.0
235 30.349 -85.788 8.3 60.0 1.5 21.0
236 30.349 -85.788 5.6 40.0 2.1 21.0
237 30.349 -85.788 6.7 40.0 2.1 21.0
238 30.349 -85.788 7.8 50.0 3.1 21.0
239 30.349 -85.788 11.7 70.0 2.6 21.0
240 30.349 -85.788 15.6 70.0 3.1 21.0
241 30.349 -85.788 18.9 100.0 3.6 21.0
242 30.349 -85.788 20.0 130.0 3.6 21.0
243 30.349 -85.788 21.1 140.0 4.1 21.0
244 30.349 -85.788 21.7 150.0 4.1 21.0
245 30.349 -85.788 21.7 170.0 3.1 21.0
246 30.349 -85.788 22.2 170.0 3.1 21.0
247 30.349 -85.788 20.6 0.0 0.0 21.0
248 30.349 -85.788 17.2 0.0 0.0 21.0
249 30.349 -85.788 14.4 0.0 0.0 21.0
250 30.349 -85.788 12.8 100.0 1.5 21.0
251 30.349 -85.788 13.3 100.0 1.5 21.0
252 30.349 -85.788 10.6 0.0 0.0 21.0
253 30.349 -85.788 9.4 0.0 0.0 21.0
254 30.349 -85.788 7.8 0.0 0.0 21.0
255 30.358 -85.799 8.3 0.0 0.0 21.0
256 30.349 -85.788 0.0 0.0 0.0 21.0
257 30.358 -85.799 6.7 0.0 0.0 21.0
258 30.358 -85.799 7.2 0.0 0.0 21.0
259 30.358 -85.799 7.2 0.0 0.0 21.0
260 30.358 -85.799 8.3 50.0 1.5 21.0
261 30.358 -85.799 9.4 0.0 0.0 21.0
262 30.358 -85.799 8.9 0.0 0.0 21.0
263 30.358 -85.799 10.0 340.0 1.5 21.0
264 30.358 -85.799 12.8 40.0 1.5 21.0
265 30.358 -85.799 16.7 100.0 2.1 21.0
266 30.358 -85.799 21.1 100.0 1.5 21.0
267 30.358 -85.799 23.3 0.0 0.0 21.0
268 30.358 -85.799 25.0 180.0 4.6 21.0
269 30.358 -85.799 24.4 230.0 3.6 21.0
270 30.358 -85.799 25.0 210.0 4.1 21.0
271 30.358 -85.799 23.9 170.0 4.1 21.0
272 30.358 -85.799 22.8 0.0 0.0 21.0
273 30.358 -85.799 19.4 0.0 0.0 21.0
274 30.358 -85.799 17.8 140.0 2.1 21.0
275 60.383 5.333 -0.7 0.0 0.0 36.0
276 60.383 5.333 0.6 270.0 2.0 36.0
277 60.383 5.333 -0.9 120.0 1.0 36.0
278 60.383 5.333 -1.6 130.0 2.0 36.0
279 60.383 5.333 -1.4 150.0 1.0 36.0
280 60.383 5.333 -1.7 0.0 0.0 36.0
281 60.383 5.333 -1.7 140.0 1.0 36.0
282 60.383 5.333 -1.4 0.0 0.0 36.0
283 60.383 5.333 -1.0 0.0 0.0 36.0
284 60.383 5.333 -1.0 150.0 1.0 36.0
285 60.383 5.333 -0.7 140.0 1.0 36.0
286 60.383 5.333 0.5 150.0 1.0 36.0
287 60.383 5.333 1.9 0.0 0.0 36.0
288 60.383 5.333 1.7 0.0 0.0 36.0
289 60.383 5.333 2.1 310.0 2.0 36.0
290 60.383 5.333 1.5 90.0 1.0 36.0
291 60.383 5.333 1.9 290.0 1.0 36.0
292 60.383 5.333 2.0 320.0 1.0 36.0
293 60.383 5.333 1.9 330.0 1.0 36.0
294 60.383 5.333 1.3 350.0 1.0 36.0
295 60.383 5.333 1.5 120.0 1.0 36.0
296 60.383 5.333 1.3 150.0 2.0 36.0
297 60.383 5.333 0.8 140.0 1.0 36.0
298 60.383 5.333 0.3 300.0 1.0 36.0
299 60.383 5.333 0.2 140.0 1.0 36.0
300 60.383 5.333 0.4 140.0 1.0 36.0
301 60.383 5.333 0.5 320.0 1.0 36.0
302 60.383 5.333 1.5 330.0 1.0 36.0
303 60.383 5.333 1.8 40.0 1.0 36.0
304 60.383 5.333 2.3 170.0 1.0 36.0
305 60.383 5.333 2.7 140.0 1.0 36.0
306 60.383 5.333 3.1 330.0 1.0 36.0
307 60.383 5.333 3.8 350.0 1.0 36.0
308 60.383 5.333 3.8 140.0 1.0 36.0
309 60.383 5.333 4.1 150.0 1.0 36.0
310 60.383 5.333 4.4 180.0 1.0 36.0
311 60.383 5.333 4.9 300.0 1.0 36.0
312 60.383 5.333 5.2 320.0 1.0 36.0
313 60.383 5.333 6.7 340.0 1.0 36.0
314 60.383 5.333 6.9 250.0 1.0 36.0
315 60.383 5.333 7.9 300.0 2.0 36.0
316 60.383 5.333 5.5 140.0 1.0 36.0
317 60.383 5.333 7.1 140.0 2.0 36.0
318 60.383 5.333 7.0 280.0 2.0 36.0
319 60.383 5.333 4.6 170.0 1.0 36.0
320 60.383 5.333 4.8 330.0 1.0 36.0
321 60.383 5.333 6.4 260.0 2.0 36.0
322 60.383 5.333 6.2 340.0 1.0 36.0
323 60.383 5.333 5.7 320.0 2.0 36.0
324 60.383 5.333 5.2 100.0 1.0 36.0
325 60.383 5.333 5.1 310.0 1.0 36.0
326 60.383 5.333 4.9 290.0 2.0 36.0
327 60.383 5.333 4.9 310.0 2.0 36.0
328 60.383 5.333 6.1 320.0 2.0 36.0
329 60.383 5.333 7.0 250.0 1.0 36.0
330 60.383 5.333 5.3 140.0 1.0 36.0
331 60.383 5.333 6.9 350.0 1.0 36.0
332 60.383 5.333 9.7 110.0 3.0 36.0
333 60.383 5.333 10.3 300.0 3.0 36.0
334 60.383 5.333 8.7 310.0 1.0 36.0
335 60.383 5.333 9.0 270.0 3.0 36.0
336 60.383 5.333 11.6 80.0 3.0 36.0
337 60.383 5.333 11.4 80.0 4.0 36.0
338 60.383 5.333 9.7 70.0 5.0 36.0
339 60.383 5.333 9.5 80.0 6.0 36.0
340 60.383 5.333 8.7 80.0 5.0 36.0
341 60.383 5.333 7.7 80.0 5.0 36.0
342 60.383 5.333 8.2 80.0 4.0 36.0
343 60.383 5.333 7.7 30.0 1.0 36.0
344 60.383 5.333 7.2 310.0 1.0 36.0
345 60.383 5.333 6.8 300.0 2.0 36.0
346 60.383 5.333 6.7 140.0 1.0 36.0

View File

@@ -1,547 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/monitor-models/data-drift/drift-on-aks.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Monitor data drift on models deployed to Azure Kubernetes Service \n",
"\n",
"In this tutorial, you will setup a data drift monitor on a toy model that predicts elevation based on a few weather factors which will send email alerts if drift is detected."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Compute instance, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already established your connection to the AzureML Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print('SDK version:', azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"ws"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup training dataset and model\n",
"\n",
"Setup the training dataset and model in preparation for deployment to the Azure Kubernetes Service. \n",
"\n",
"The next few cells will:\n",
" * get the default datastore and upload the `training.csv` dataset to the datastore\n",
" * create and register the dataset \n",
" * register the model with the dataset\n",
" \n",
"See the `config.py` script in this folder for details on how `training.csv` and `elevation-regression-model.pkl` are created. If you train your model in Azure ML using a Dataset, it will be automatically captured when registering the model from the run. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# use default datastore\n",
"dstore = ws.get_default_datastore()\n",
"\n",
"# upload weather data\n",
"dstore.upload('dataset', 'drift-on-aks-data', overwrite=True, show_progress=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"\n",
"# create dataset \n",
"dset = Dataset.Tabular.from_delimited_files(dstore.path('drift-on-aks-data/training.csv'))\n",
"# register dataset\n",
"dset = dset.register(ws, 'drift-demo-dataset')\n",
"# get the dataset by name from the workspace\n",
"dset = Dataset.get_by_name(ws, 'drift-demo-dataset')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import Model\n",
"\n",
"# register the model\n",
"model = Model.register(model_path='elevation-regression-model.pkl',\n",
" model_name='elevation-regression-model.pkl',\n",
" tags={'area': \"weather\", 'type': \"linear regression\"},\n",
" description='Linear regression model to predict elevation based on the weather',\n",
" workspace=ws,\n",
" datasets=[(Dataset.Scenario.TRAINING, dset)]) # need to register the dataset with the model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the inference config\n",
"\n",
"Create the environment and inference config from the `myenv.yml` and `score.py` files. Notice the [Model Data Collector](https://docs.microsoft.com/azure/machine-learning/service/how-to-enable-data-collection) code included in the scoring script. This dependency is currently required to collect model data, but will be removed in the near future as data collection in Azure Machine Learning webservice endpoints is automated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"\n",
"# create the environment from the yml file \n",
"env = Environment.from_conda_specification(name='deploytocloudenv', file_path='myenv.yml')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"\n",
"# create an inference config, combining the environment and entry script \n",
"inference_config = InferenceConfig(entry_script='score.py', environment=env)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the AKS compute target\n",
"\n",
"Create an Azure Kubernetes Service compute target to deploy the model to. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AksCompute, ComputeTarget\n",
"\n",
"# Use the default configuration (you can also provide parameters to customize this).\n",
"# For example, to create a dev/test cluster, use:\n",
"# prov_config = AksCompute.provisioning_configuration(cluster_purpose = AksCompute.ClusterPurpose.DEV_TEST)\n",
"prov_config = AksCompute.provisioning_configuration()\n",
"\n",
"aks_name = 'drift-aks'\n",
"aks_target = ws.compute_targets.get(aks_name)\n",
"\n",
"# Create the cluster\n",
"if not aks_target:\n",
" aks_target = ComputeTarget.create(workspace = ws,\n",
" name = aks_name,\n",
" provisioning_configuration = prov_config)\n",
"\n",
" # Wait for the create process to complete\n",
" aks_target.wait_for_completion(show_output = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy the model to AKS \n",
"\n",
"Deploy the model as a webservice endpoint. Be sure to enable the `collect_model_data` flag so that serving data is collected in blob storage for use by the data drift capability."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AksWebservice\n",
"\n",
"deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1, collect_model_data=True)\n",
"service_name = 'drift-aks-service'\n",
"\n",
"service = Model.deploy(ws, service_name, [model], inference_config, deployment_config, aks_target)\n",
"\n",
"service.wait_for_deployment(True)\n",
"print(service.state)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run recent weather data through the webservice \n",
"\n",
"The below cells take the weather data of Florida from 2019-11-20 to 2019-11-12, filter and transform using the same processes as the training dataset, and runs the data through the service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create dataset \n",
"tset = Dataset.Tabular.from_delimited_files(dstore.path('drift-on-aks-data/testing.csv'))\n",
"\n",
"df = tset.to_pandas_dataframe().fillna(0)\n",
"\n",
"X_features = ['latitude', 'longitude', 'temperature', 'windAngle', 'windSpeed']\n",
"y_features = ['elevation']\n",
"\n",
"X = df[X_features]\n",
"y = df[y_features]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"data = json.dumps({'data': X.values.tolist()})\n",
"\n",
"data_encoded = bytes(data, encoding='utf8')\n",
"prediction = service.run(input_data=data_encoded)\n",
"print(prediction)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Azure Machine Learning Compute cluster\n",
"\n",
"The data drift capability needs a compute target for computing drift and other data metrics. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute\n",
"\n",
"compute_name = 'cpu-cluster'\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('found compute target. just use it. ' + compute_name)\n",
"else:\n",
" print('creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D3_V2', min_nodes=0, max_nodes=2)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout.\n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
" # For a more detailed view of current AmlCompute status, use get_status()\n",
" print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Wait 10 minutes\n",
"\n",
"From the Model Data Collector, it can take up to (but usually less than) 10 minutes for data to arrive in your blob storage account. Wait 10 minutes to ensure cells below will run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"time.sleep(600)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create and update the data drift object"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime, timedelta\n",
"from azureml.datadrift import DataDriftDetector, AlertConfiguration\n",
"\n",
"services = [service_name]\n",
"start = datetime.now() - timedelta(days=2)\n",
"feature_list = X_features\n",
"alert_config = AlertConfiguration(['user@contoso.com'])\n",
"\n",
"try:\n",
" monitor = DataDriftDetector.create_from_model(ws, model.name, model.version, services, \n",
" frequency='Day', \n",
" schedule_start=datetime.utcnow() + timedelta(days=1), \n",
" alert_config=alert_config, \n",
" compute_target='cpu-cluster')\n",
"except KeyError:\n",
" monitor = DataDriftDetector.get(ws, model.name, model.version)\n",
" \n",
"monitor"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# many monitor settings can be updated \n",
"monitor = monitor.update(drift_threshold = 0.1)\n",
"\n",
"monitor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the monitor on today's scoring data\n",
"\n",
"Perform a data drift run on the data sent to the service earlier in this notebook. If you set your email address in the alert configuration and the drift threshold <=0.1 you should recieve an email alert to drift from this run.\n",
"\n",
"Wait for the run to complete before getting the results. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"now = datetime.utcnow()\n",
"target_date = datetime(now.year, now.month, now.day)\n",
"run = monitor.run(target_date, services, feature_list=feature_list, compute_target='cpu-cluster')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get and view results and metrics\n",
"\n",
"For enterprise workspaces, the UI in the Azure Machine Learning studio can be used. Otherwise, the metrics can be queried in Python and plotted. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The run() API initiates a pipeline run for each service in the services list. \n",
"# Here we retrieve the individual service run to get its output results and metrics. \n",
"\n",
"child_run = list(run.get_children())[0]\n",
"child_run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"child_run.wait_for_completion(wait_post_processing=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"results, metrics = monitor.get_output(run_id=child_run.id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"drift_plots = monitor.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Enable the monitor's pipeline schedule\n",
"\n",
"Turn on a scheduled pipeline which will anlayze the serving dataset for drift. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"monitor.enable_schedule()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Delete the DataDriftDetector\n",
"\n",
"Invoking the `delete()` method on the object deletes the the drift monitor permanently and cannot be undone. You will no longer be able to find it in the UI and the `list()` or `get()` methods. The object on which delete() was called will have its state set to deleted and name suffixed with deleted. The baseline and target datasets and model data that was collected, if any, are not deleted. The compute is not deleted. The DataDrift schedule pipeline is disabled and archived."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"monitor.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
" * See [our documentation](https://aka.ms/datadrift/aks) or [Python SDK reference](https://docs.microsoft.com/python/api/overview/azure/ml/intro)\n",
" * [Send requests or feedback](mailto:driftfeedback@microsoft.com) on data drift directly to the team\n",
" * Please open issues with data drift here on GitHub or on StackOverflow if others are likely to run into the same issue"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "jamgan"
}
],
"category": "tutorial",
"compute": [
"Remote"
],
"datasets": [
"NOAA"
],
"deployment": [
"AKS"
],
"exclude_from_index": false,
"framework": [
"Azure ML"
],
"friendly_name": "Data drift on aks",
"index_order": 1.0,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"star_tag": [
"featured"
],
"tags": [
"Dataset",
"Timeseries",
"Drift"
],
"task": "Filtering"
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -1,8 +0,0 @@
name: drift-on-aks
dependencies:
- pip:
- azureml-sdk
- azureml-datadrift
- azureml-monitoring
- azureml-opendatasets
- azureml-widgets

View File

@@ -1,11 +0,0 @@
name: project_environment
dependencies:
- python=3.6.2
- pip:
- azureml-core
- azureml-defaults
- azureml-monitoring
- scikit-learn
- numpy
- packaging
- inference-schema[numpy-support]

View File

@@ -1,44 +0,0 @@
import os
import numpy as np
from azureml.monitoring import ModelDataCollector
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.schema_decorators import input_schema, output_schema
# sklearn.externals.joblib is removed in 0.23
from sklearn import __version__ as sklearnver
from packaging.version import Version
if Version(sklearnver) < Version("0.23.0"):
from sklearn.externals import joblib
else:
import joblib
def init():
global model
global inputs_dc
inputs_dc = ModelDataCollector('elevation-regression-model.pkl', designation='inputs',
feature_names=['latitude', 'longitude', 'temperature', 'windAngle', 'windSpeed'])
# note here "elevation-regression-model.pkl" is the name of the model registered under
# this is a different behavior than before when the code is run locally, even though the code is the same.
# AZUREML_MODEL_DIR is an environment variable created during deployment.
# It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
# For multiple models, it points to the folder containing all deployed models (./azureml-models)
model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'elevation-regression-model.pkl')
model = joblib.load(model_path)
input_sample = np.array([[30, -85, 21, 150, 6]])
output_sample = np.array([8.995])
@input_schema('data', NumpyParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
try:
inputs_dc.collect(data)
result = model.predict(data)
# you can return any datatype as long as it is JSON-serializable
return result.tolist()
except Exception as e:
error = str(e)
return error

View File

@@ -254,6 +254,7 @@
" dockerfile=f.read()\n",
"\n",
" xvfb_env = Environment(name='xvfb-vdisplay')\n",
" xvfb_env.docker.enabled = True\n",
" xvfb_env.docker.base_image = None\n",
" xvfb_env.docker.base_dockerfile = dockerfile\n",
" \n",
@@ -547,24 +548,13 @@
"source": [
"import shutil\n",
"\n",
"# A helper function to download (copy) movies from a dataset to local directory\n",
"# A helper function to download movies from a dataset to local directory\n",
"def download_movies(artifacts_ds, movies, destination):\n",
" # Create the local destination directory \n",
" if path.exists(destination):\n",
" dir_util.remove_tree(destination)\n",
" dir_util.mkpath(destination)\n",
"\n",
" try:\n",
" print(\"Trying mounting dataset and copying movies.\")\n",
" # Note: We assume movie paths start with '\\'\n",
" mount_context = artifacts_ds.mount()\n",
" mount_context.start()\n",
" for movie in movies:\n",
" print('Copying {} ...'.format(movie))\n",
" shutil.copy2(path.join(mount_context.mount_point, movie[1:]), destination)\n",
" mount_context.stop()\n",
" except OSError as e:\n",
" print(\"Mounting failed with error '{0}'. Going with dataset download.\".format(e))\n",
" for i, artifact in enumerate(artifacts_ds.to_path()):\n",
" if artifact in movies:\n",
" print('Downloading {} ...'.format(artifact))\n",
@@ -782,6 +772,7 @@
" dockerfile=f.read()\n",
"\n",
"xvfb_env = Environment(name='xvfb-vdisplay')\n",
"xvfb_env.docker.enabled = True\n",
"xvfb_env.docker.base_image = None\n",
"xvfb_env.docker.base_dockerfile = dockerfile\n",
" \n",

View File

@@ -8,7 +8,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
rm -rf /var/lib/apt/lists/* && \
rm -rf /usr/share/man/*
RUN conda install -y conda=4.7.12 python=3.6.2 && conda clean -ay && \
RUN conda install -y conda=4.7.12 python=3.7 && conda clean -ay && \
pip install --no-cache-dir \
azureml-defaults \
azureml-dataset-runtime[fuse,pandas] \
@@ -26,4 +26,4 @@ RUN conda install -y conda=4.7.12 python=3.6.2 && conda clean -ay && \
setproctitle \
gym[atari] && \
conda install -y -c conda-forge x264='1!152.20180717' ffmpeg=4.0.2 && \
conda install opencv
conda install -c anaconda opencv

View File

@@ -10,10 +10,10 @@
[MLflow](https://mlflow.org/) is an open-source platform for tracking machine learning experiments and managing models. You can use MLflow logging APIs with Azure Machine Learning service: the metrics and artifacts are logged to your Azure ML Workspace.
Try out the sample notebooks:
1. [Use MLflow with Azure Machine Learning for Local Training Run](./train-local/train-local.ipynb)
1. [Use MLflow with Azure Machine Learning for Remote Training Run](./train-remote/train-remote.ipynb)
1. [Deploy Model as Azure Machine Learning Web Service using MLflow](./deploy-model/deploy-model.ipynb)
1. [Train and Deploy PyTorch Image Classifier](./train-deploy-pytorch/train-deploy-pytorch.ipynb)
1. [Use MLflow with Azure Machine Learning for Local Training Run](./using-mlflow/train-local/train-local.ipynb)
1. [Use MLflow with Azure Machine Learning for Remote Training Run](./using-mlflow/train-remote/train-remote.ipynb)
1. [Train and Deploy PyTorch Image Classifier](./using-mlflow/train-and-deploy-pytorch/train-and-deploy-pytorch.ipynb)
1. [Train and Deploy Keras Image Classifier with MLflow auto logging](./using-mlflow/train-and-deploy-keras-auto-logging/train-and-deploy-keras-auto-logging.ipynb)
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/README.png)

View File

@@ -100,7 +100,7 @@
"\n",
"# Check core SDK version number\n",
"\n",
"print(\"This notebook was created using SDK version 1.11.0, you are currently running version\", azureml.core.VERSION)"
"print(\"This notebook was created using SDK version 1.13.0, you are currently running version\", azureml.core.VERSION)"
]
},
{

View File

@@ -0,0 +1,78 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import mlflow
import mlflow.keras
import numpy as np
import warnings
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop
print("Keras version:", keras.__version__)
# Enable auto-logging to MLflow to capture Keras metrics.
mlflow.keras.autolog()
# Model / data parameters
n_inputs = 28 * 28
n_h1 = 300
n_h2 = 100
n_outputs = 10
learning_rate = 0.001
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Flatten image to be (n, 28 * 28)
x_train = x_train.reshape(len(x_train), -1)
x_test = x_test.reshape(len(x_test), -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, n_outputs)
y_test = keras.utils.to_categorical(y_test, n_outputs)
def driver():
warnings.filterwarnings("ignore")
with mlflow.start_run() as run:
# Build a simple MLP model
model = Sequential()
# first hidden layer
model.add(Dense(n_h1, activation='relu', input_shape=(n_inputs,)))
# second hidden layer
model.add(Dense(n_h2, activation='relu'))
# output layer
model.add(Dense(n_outputs, activation='softmax'))
model.summary()
batch_size = 128
epochs = 5
model.compile(loss='categorical_crossentropy',
optimizer=RMSprop(lr=learning_rate),
metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
return run
if __name__ == "__main__":
driver()

View File

@@ -0,0 +1,455 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/using-mlflow/train-and-deploy-keras-auto-logging/train-and-deploy-keras-auto-logging.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use MLflow with Azure Machine Learning to Train and Deploy Keras Image Classifier\n",
"\n",
"This example shows you how to use MLflow together with Azure Machine Learning services for tracking the metrics and artifacts while training a Keras model to classify MNIST digit images and deploy the model as a web service. You'll learn how to:\n",
"\n",
" 1. Set up MLflow tracking URI so as to use Azure ML\n",
" 2. Create experiment\n",
" 3. Instrument your model with MLflow tracking\n",
" 4. Train a Keras model locally with MLflow auto logging\n",
" 5. Train a model on GPU compute on Azure with MLflow auto logging\n",
" 6. View your experiment within your Azure ML Workspace in Azure Portal\n",
" 7. Deploy the model as a web service on Azure Container Instance\n",
" 8. Call the model to make predictions\n",
" \n",
"### Pre-requisites\n",
" \n",
"If you are using a Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../../configuration.ipnyb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n",
"\n",
"Install TensorFlow and Keras, this notebook has been tested with TensorFlow version 2.1.0 and Keras version 2.3.1.\n",
"\n",
"Also, install azureml-mlflow package using ```pip install azureml-mlflow```. Note that azureml-mlflow installs mlflow package itself as a dependency if you haven't done so previously.\n",
"\n",
"### Set-up\n",
"\n",
"Import packages and check versions of Azure ML SDK and MLflow installed on your computer. Then connect to your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys, os\n",
"import mlflow\n",
"import mlflow.azureml\n",
"\n",
"import azureml.core\n",
"from azureml.core import Workspace\n",
"\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)\n",
"print(\"MLflow version:\", mlflow.version.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set tracking URI\n",
"\n",
"Set the MLflow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLflow APIs will go to Azure ML services and will be tracked under your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Experiment\n",
"\n",
"In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = \"keras-with-mlflow\"\n",
"mlflow.set_experiment(experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train model locally while logging metrics and artifacts\n",
"\n",
"The ```scripts/train.py``` program contains the code to load the image dataset, train and test the model. Within this program, the train.driver function wraps the end-to-end workflow.\n",
"\n",
"Within the driver, the ```mlflow.start_run``` starts MLflow tracking. Then, MLflow's automatic logging is used to log metrics, parameters and model for the Keras run.\n",
"\n",
"Let's add the program to search path, import it as a module and invoke the driver function. Note that the training can take few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lib_path = os.path.abspath(\"scripts\")\n",
"sys.path.append(lib_path)\n",
"\n",
"import train"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = train.driver()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train model on GPU compute on Azure\n",
"\n",
"Next, let's run the same script on GPU-enabled compute for faster training. If you've completed the the [Configuration](../../../configuration.ipnyb) notebook, you should have a GPU cluster named \"gpu-cluster\" available in your workspace. Otherwise, follow the instructions in the notebook to create one. For simplicity, this example uses single process on single VM to train the model.\n",
"\n",
"Clone an environment object from the Tensorflow 2.1 Azure ML curated environment. Azure ML curated environments are pre-configured environments to simplify ML setup, reference [this doc](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-a-curated-environment) for more information. To enable MLflow tracking, add ```azureml-mlflow``` as pip package."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"\n",
"env = Environment.get(workspace=ws, name=\"AzureML-TensorFlow-2.1-GPU\").clone(\"mlflow-env\")\n",
"\n",
"env.python.conda_dependencies.add_pip_package(\"azureml-mlflow\")\n",
"env.python.conda_dependencies.add_pip_package(\"keras==2.3.1\")\n",
"env.python.conda_dependencies.add_pip_package(\"numpy\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a ScriptRunConfig to specify the training configuration: script, compute as well as environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=\"./scripts\", script=\"train.py\")\n",
"src.run_config.environment = env\n",
"src.run_config.target = \"gpu-cluster\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get a reference to the experiment you created previously, but this time, as an Azure Machine Learning experiment object.\n",
"\n",
"Then, use the ```Experiment.submit``` method to start the remote training run. Note that the first training run often takes longer as Azure Machine Learning service builds the Docker image for executing the script. Subsequent runs will be faster as the cached image is used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"exp = Experiment(ws, experiment_name)\n",
"run = exp.submit(src)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can monitor the run and its metrics on Azure Portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, you can wait for run to complete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy model as web service\n",
"\n",
"The ```mlflow.azureml.deploy``` function registers the logged Keras+Tensorflow model and deploys the model in a framework-aware manner. It automatically creates the Tensorflow-specific inferencing wrapper code and specifies package dependencies for you. See [this doc](https://mlflow.org/docs/latest/models.html#id34) for more information on deploying models on Azure ML using MLflow.\n",
"\n",
"In this example, we deploy the Docker image to Azure Container Instance: a serverless compute capable of running a single container. You can tag and add descriptions to help keep track of your web service. \n",
"\n",
"[Other inferencing compute choices](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where) include Azure Kubernetes Service which provides scalable endpoint suitable for production use.\n",
"\n",
"Note that the service deployment can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice, Webservice\n",
"\n",
"model_path = \"model\"\n",
"\n",
"aci_config = AciWebservice.deploy_configuration(cpu_cores=2, \n",
" memory_gb=5, \n",
" tags={\"data\": \"MNIST\", \"method\" : \"keras\"}, \n",
" description=\"Predict using webservice\")\n",
"\n",
"webservice, azure_model = mlflow.azureml.deploy(model_uri='runs:/{}/{}'.format(run.id, model_path),\n",
" workspace=ws,\n",
" deployment_config=aci_config,\n",
" service_name=\"keras-mnist-1\",\n",
" model_name=\"keras_mnist\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the deployment has completed you can check the scoring URI of the web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Scoring URI is: {}\".format(webservice.scoring_uri))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case of a service creation issue, you can use ```webservice.get_logs()``` to get logs to debug."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make predictions using a web service\n",
"\n",
"To make the web service, create a test data set as normalized NumPy array. \n",
"\n",
"Then, let's define a utility function that takes a random image and converts it into a format and shape suitable for input to the Keras inferencing end-point. The conversion is done by: \n",
"\n",
" 1. Select a random (image, label) tuple\n",
" 2. Take the image and converting to to NumPy array \n",
" 3. Reshape array into 1 x 1 x N array\n",
" * 1 image in batch, 1 color channel, N = 784 pixels for MNIST images\n",
" * Note also ```x = x.view(-1, 1, 28, 28)``` in net definition in ```train.py``` program to shape incoming scoring requests.\n",
" 4. Convert the NumPy array to list to make it into a built-in type.\n",
" 5. Create a dictionary {\"data\", &lt;list&gt;} that can be converted to JSON string for web service requests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import keras\n",
"import random\n",
"import numpy as np\n",
"\n",
"# the data, split between train and test sets\n",
"(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()\n",
"\n",
"# Scale images to the [0, 1] range\n",
"x_test = x_test.astype(\"float32\") / 255\n",
"x_test = x_test.reshape(len(x_test), -1)\n",
"\n",
"# convert class vectors to binary class matrices\n",
"y_test = keras.utils.to_categorical(y_test, 10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import json\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# send a random row from the test set to score\n",
"random_index = np.random.randint(0, len(x_test)-1)\n",
"input_data = \"{\\\"data\\\": [\" + str(list(x_test[random_index])) + \"]}\"\n",
"\n",
"response = webservice.run(input_data)\n",
"\n",
"response = sorted(response[0].items(), key = lambda x: x[1], reverse = True)\n",
"\n",
"print(\"Predicted label:\", response[0][0])\n",
"plt.imshow(x_test[random_index].reshape(28,28), cmap = \"gray\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also call the web service using a raw POST method against the web service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"response = requests.post(url=webservice.scoring_uri, data=input_data,headers={\"Content-type\": \"application/json\"})\n",
"print(response.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"You can delete the ACI deployment with a delete API call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"webservice.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "hancwang"
}
],
"category": "tutorial",
"celltoolbar": "Edit Metadata",
"compute": [
"Local",
"AML Compute"
],
"datasets": [
"MNIST"
],
"deployment": [
"Azure Container Instance"
],
"exclude_from_index": false,
"framework": [
"Keras"
],
"friendly_name": "Use MLflow with Azure Machine Learning to Train and Deploy Keras Image Classifier",
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"tags": [
"mlflow",
"keras"
],
"task": "Use MLflow with Azure Machine Learning to Train and Deploy Keras Image Classifier, leveraging MLflow auto logging"
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -0,0 +1,150 @@
# Copyright (c) 2017, PyTorch Team
# All rights reserved
# Licensed under BSD 3-Clause License.
# This example is based on PyTorch MNIST example:
# https://github.com/pytorch/examples/blob/master/mnist/main.py
import mlflow
import mlflow.pytorch
from mlflow.utils.environment import _mlflow_conda_env
import warnings
import cloudpickle
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5, 1)
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.fc1 = nn.Linear(4 * 4 * 50, 500)
self.fc2 = nn.Linear(500, 10)
def forward(self, x):
# Added the view for reshaping score requests
x = x.view(-1, 1, 28, 28)
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4 * 4 * 50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
def train(args, model, device, train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
# Use MLflow logging
mlflow.log_metric("epoch_loss", loss.item())
def test(args, model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
# sum up batch loss
test_loss += F.nll_loss(output, target, reduction="sum").item()
# get the index of the max log-probability
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print("\n")
print("Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n".format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
# Use MLflow logging
mlflow.log_metric("average_loss", test_loss)
class Args(object):
pass
# Training settings
args = Args()
setattr(args, 'batch_size', 64)
setattr(args, 'test_batch_size', 1000)
setattr(args, 'epochs', 3) # Higher number for better convergence
setattr(args, 'lr', 0.01)
setattr(args, 'momentum', 0.5)
setattr(args, 'no_cuda', True)
setattr(args, 'seed', 1)
setattr(args, 'log_interval', 10)
setattr(args, 'save_model', True)
use_cuda = not args.no_cuda and torch.cuda.is_available()
torch.manual_seed(args.seed)
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST(
'../data',
train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))])),
batch_size=args.test_batch_size, shuffle=True, **kwargs)
def driver():
warnings.filterwarnings("ignore")
# Dependencies for deploying the model
pytorch_index = "https://download.pytorch.org/whl/"
pytorch_version = "cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl"
deps = [
"cloudpickle=={}".format(cloudpickle.__version__),
pytorch_index + pytorch_version,
"torchvision=={}".format(torchvision.__version__),
"Pillow=={}".format("6.0.0")
]
with mlflow.start_run() as run:
model = Net().to(device)
optimizer = optim.SGD(
model.parameters(),
lr=args.lr,
momentum=args.momentum)
for epoch in range(1, args.epochs + 1):
train(args, model, device, train_loader, optimizer, epoch)
test(args, model, device, test_loader)
# Log model to run history using MLflow
if args.save_model:
model_env = _mlflow_conda_env(additional_pip_deps=deps)
mlflow.pytorch.log_model(model, "model", conda_env=model_env)
return run
if __name__ == "__main__":
driver()

View File

@@ -0,0 +1,464 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/using-mlflow/train-and-deploy-pytorch/train-and-deploy-pytorch.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier\n",
"\n",
"This example shows you how to use MLflow together with Azure Machine Learning services for tracking the metrics and artifacts while training a PyTorch model to classify MNIST digit images and deploy the model as a web service. You'll learn how to:\n",
"\n",
" 1. Set up MLflow tracking URI so as to use Azure ML\n",
" 2. Create experiment\n",
" 3. Instrument your model with MLflow tracking\n",
" 4. Train a PyTorch model locally\n",
" 5. Train a model on GPU compute on Azure\n",
" 6. View your experiment within your Azure ML Workspace in Azure Portal\n",
" 7. Deploy the model as a web service on Azure Container Instance\n",
" 8. Call the model to make predictions\n",
" \n",
"## Pre-requisites\n",
" \n",
"If you are using a Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../../configuration.ipnyb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n",
"\n",
"Install PyTorch, this notebook has been tested with torch==1.4\n",
"\n",
"Also, install azureml-mlflow package using ```pip install azureml-mlflow```. Note that azureml-mlflow installs mlflow package itself as a dependency if you haven't done so previously.\n",
"\n",
"## Set-up\n",
"\n",
"Import packages and check versions of Azure ML SDK and MLflow installed on your computer. Then connect to your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys, os\n",
"import mlflow\n",
"import mlflow.azureml\n",
"\n",
"import azureml.core\n",
"from azureml.core import Workspace\n",
"\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)\n",
"print(\"MLflow version:\", mlflow.version.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set tracking URI\n",
"\n",
"Set the MLflow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLflow APIs will go to Azure ML services and will be tracked under your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"\n",
"In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = \"pytorch-with-mlflow\"\n",
"mlflow.set_experiment(experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model locally while logging metrics and artifacts\n",
"\n",
"The ```scripts/train.py``` program contains the code to load the image dataset, train and test the model. Within this program, the train.driver function wraps the end-to-end workflow.\n",
"\n",
"Within the driver, the ```mlflow.start_run``` starts MLflow tracking. Then, ```mlflow.log_metric``` functions are used to track the convergence of the neural network training iterations. Finally ```mlflow.pytorch.save_model``` is used to save the trained model in framework-aware manner.\n",
"\n",
"Let's add the program to search path, import it as a module and invoke the driver function. Note that the training can take few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lib_path = os.path.abspath(\"scripts\")\n",
"sys.path.append(lib_path)\n",
"\n",
"import train"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = train.driver()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train model on GPU compute on Azure\n",
"\n",
"Next, let's run the same script on GPU-enabled compute for faster training. If you've completed the the [Configuration](../../../configuration.ipnyb) notebook, you should have a GPU cluster named \"gpu-cluster\" available in your workspace. Otherwise, follow the instructions in the notebook to create one. For simplicity, this example uses single process on single VM to train the model.\n",
"\n",
"Clone an environment object from the PyTorch 1.4 Azure ML curated environment. Azure ML curated environments are pre-configured environments to simplify ML setup, reference [this doc](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments#use-a-curated-environment) for more information. To enable MLflow tracking, add ```azureml-mlflow``` as pip package."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"\n",
"env = Environment.get(workspace=ws, name=\"AzureML-PyTorch-1.4-GPU\").clone(\"mlflow-env\")\n",
"\n",
"env.python.conda_dependencies.add_pip_package(\"azureml-mlflow\")\n",
"env.python.conda_dependencies.add_pip_package(\"Pillow==6.0.0\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Create a ScriptRunConfig to specify the training configuration: script, compute as well as environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=\"./scripts\", script=\"train.py\")\n",
"src.run_config.environment = env\n",
"src.run_config.target = \"gpu-cluster\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get a reference to the experiment you created previously, but this time, as an Azure Machine Learning experiment object.\n",
"\n",
"Then, use the ```Experiment.submit``` method to start the remote training run. Note that the first training run often takes longer as Azure Machine Learning service builds the Docker image for executing the script. Subsequent runs will be faster as the cached image is used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"exp = Experiment(ws, experiment_name)\n",
"run = exp.submit(src)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can monitor the run and its metrics on Azure Portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, you can wait for run to complete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy model as web service\n",
"\n",
"The ```mlflow.azureml.deploy``` function registers the logged PyTorch model and deploys the model in a framework-aware manner. It automatically creates the PyTorch-specific inferencing wrapper code and specifies package dependencies for you. See [this doc](https://mlflow.org/docs/latest/models.html#id34) for more information on deploying models on Azure ML using MLflow.\n",
"\n",
"In this example, we deploy the Docker image to Azure Container Instance: a serverless compute capable of running a single container. You can tag and add descriptions to help keep track of your web service. \n",
"\n",
"[Other inferencing compute choices](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where) include Azure Kubernetes Service which provides scalable endpoint suitable for production use.\n",
"\n",
"Note that the service deployment can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice, Webservice\n",
"\n",
"model_path = \"model\"\n",
"\n",
"aci_config = AciWebservice.deploy_configuration(cpu_cores=2, \n",
" memory_gb=5, \n",
" tags={\"data\": \"MNIST\", \"method\" : \"pytorch\"}, \n",
" description=\"Predict using webservice\")\n",
"\n",
"webservice, azure_model = mlflow.azureml.deploy(model_uri='runs:/{}/{}'.format(run.id, model_path),\n",
" workspace=ws,\n",
" deployment_config=aci_config,\n",
" service_name=\"pytorch-mnist-1\",\n",
" model_name=\"pytorch_mnist\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the deployment has completed you can check the scoring URI of the web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Scoring URI is: {}\".format(webservice.scoring_uri))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case of a service creation issue, you can use ```webservice.get_logs()``` to get logs to debug."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Make predictions using a web service\n",
"\n",
"To make the web service, create a test data set as normalized PyTorch tensors. \n",
"\n",
"Then, let's define a utility function that takes a random image and converts it into a format and shape suitable for input to the PyTorch inferencing end-point. The conversion is done by: \n",
"\n",
" 1. Select a random (image, label) tuple\n",
" 2. Take the image and converting the tensor to NumPy array \n",
" 3. Reshape array into 1 x 1 x N array\n",
" * 1 image in batch, 1 color channel, N = 784 pixels for MNIST images\n",
" * Note also ```x = x.view(-1, 1, 28, 28)``` in net definition in ```train.py``` program to shape incoming scoring requests.\n",
" 4. Convert the NumPy array to list to make it into a built-in type.\n",
" 5. Create a dictionary {\"data\", &lt;list&gt;} that can be converted to JSON string for web service requests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torchvision import datasets, transforms\n",
"import random\n",
"import numpy as np\n",
"\n",
"test_data = datasets.MNIST('../data', train=False, transform=transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize((0.1307,), (0.3081,))]))\n",
"\n",
"\n",
"def get_random_image():\n",
" image_idx = random.randint(0,len(test_data))\n",
" image_as_tensor = test_data[image_idx][0]\n",
" return {\"data\": elem for elem in image_as_tensor.numpy().reshape(1,1,-1).tolist()}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, invoke the web service using a random test image. Convert the dictionary containing the image to JSON string before passing it to web service.\n",
"\n",
"The response contains the raw scores for each label, with greater value indicating higher probability. Sort the labels and select the one with greatest score to get the prediction. Let's also plot the image sent to web service for comparison purposes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import json\n",
"import matplotlib.pyplot as plt\n",
"\n",
"test_image = get_random_image()\n",
"\n",
"response = webservice.run(json.dumps(test_image))\n",
"\n",
"response = sorted(response[0].items(), key = lambda x: x[1], reverse = True)\n",
"\n",
"\n",
"print(\"Predicted label:\", response[0][0])\n",
"plt.imshow(np.array(test_image[\"data\"]).reshape(28,28), cmap = \"gray\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also call the web service using a raw POST method against the web service"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"response = requests.post(url=webservice.scoring_uri, data=json.dumps(test_image),headers={\"Content-type\": \"application/json\"})\n",
"print(response.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"You can delete the ACI deployment with a delete API call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"webservice.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "shipatel"
}
],
"category": "tutorial",
"celltoolbar": "Edit Metadata",
"compute": [
"Local",
"AML Compute"
],
"datasets": [
"MNIST"
],
"deployment": [
"Azure Container Instance"
],
"exclude_from_index": false,
"framework": [
"PyTorch"
],
"friendly_name": "Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier",
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"name": "mlflow-sparksummit-pytorch",
"notebookId": 2495374963457641,
"tags": [
"mlflow",
"pytorch"
],
"task": "Use MLflow with Azure Machine Learning to train and deploy PyTorch image classifier model"
},
"nbformat": 4,
"nbformat_minor": 4
}

View File

@@ -604,15 +604,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Predict on the test set\n",
"## Predict on the test set (Optional)\n",
"Let's check the version of the local Keras. Make sure it matches with the version number printed out in the training script. Otherwise you might not be able to load the model properly."
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
" import keras\n",
" import tensorflow as tf\n",
@@ -629,10 +627,8 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
" from keras.models import model_from_json\n",
"\n",
@@ -654,10 +650,8 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
" # evaluate loaded model on test data\n",
" loaded_model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])\n",
@@ -677,10 +671,8 @@
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"metadata": {},
"outputs": [],
"source": [
" print(\"Accuracy on the test set:\", np.average(y_hat == y_test))"
]
@@ -1056,7 +1048,7 @@
" font_color = 'red' if y_test[s] != result[i] else 'black'\n",
" clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
" \n",
" plt.text(x=10, y=-10, s=y_test[s], fontsize=18, color=font_color)\n",
" plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n",
" plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
" \n",
" i = i + 1\n",

View File

@@ -1,406 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-on-amlcompute/train-on-computeinstance.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train using Azure Machine Learning Compute Instance\n",
"\n",
"* Initialize Workspace\n",
"* Introduction to ComputeInstance\n",
"* Create an Experiment\n",
"* Submit ComputeInstance run\n",
"* Additional operations to perform on ComputeInstance"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning ComputeInstance, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to ComputeInstance\n",
"\n",
"\n",
"Azure Machine Learning compute instance is a fully-managed cloud-based workstation optimized for your machine learning development environment. It is created **within your workspace region**.\n",
"\n",
"For more information on ComputeInstance, please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-instance)\n",
"\n",
"**Note**: As with other Azure services, there are limits on certain resources (for eg. AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create ComputeInstance\n",
"First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. Since ComputeInstance is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D3_V2') is supported.\n",
"\n",
"You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "check_region"
},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, ComputeInstance\n",
"\n",
"ComputeInstance.supported_vmsizes(workspace = ws)\n",
"# ComputeInstance.supported_vmsizes(workspace = ws, location='eastus')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "create_instance"
},
"outputs": [],
"source": [
"import datetime\n",
"import time\n",
"\n",
"from azureml.core.compute import ComputeTarget, ComputeInstance\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your instance\n",
"# Compute instance name should be unique across the azure region\n",
"compute_name = \"ci{}\".format(ws._workspace_id)[:10]\n",
"\n",
"# Verify that instance does not exist already\n",
"try:\n",
" instance = ComputeInstance(workspace=ws, name=compute_name)\n",
" print('Found existing instance, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = ComputeInstance.provisioning_configuration(\n",
" vm_size='STANDARD_D3_V2',\n",
" ssh_public_access=False,\n",
" # vnet_resourcegroup_name='<my-resource-group>',\n",
" # vnet_name='<my-vnet-name>',\n",
" # subnet_name='default',\n",
" # admin_user_ssh_public_key='<my-sshkey>'\n",
" )\n",
" instance = ComputeInstance.create(ws, compute_name, compute_config)\n",
" instance.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create An Experiment\n",
"\n",
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"experiment_name = 'train-on-computeinstance'\n",
"experiment = Experiment(workspace = ws, name = experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit ComputeInstance run\n",
"The training script `train.py` is already created for you"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create environment\n",
"\n",
"Create an environment with scikit-learn installed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"myenv = Environment(\"myenv\")\n",
"myenv.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"\n",
"src = ScriptRunConfig(source_directory='', script='train.py')\n",
"\n",
"# Set compute target to the one created in previous step\n",
"src.run_config.target = instance\n",
"\n",
"# Set environment\n",
"src.run_config.environment = myenv\n",
" \n",
"run = experiment.submit(config=src)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use the get_active_runs() to get the currently running or queued jobs on the compute instance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# wait for the run to reach Queued or Running state if it is in Preparing state\n",
"status = run.get_status()\n",
"while status not in ['Queued', 'Running', 'Completed', 'Failed', 'Canceled']:\n",
" state = run.get_status()\n",
" print('Run status: {}'.format(status))\n",
" time.sleep(10)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get active runs which are in Queued or Running state\n",
"active_runs = instance.get_active_runs()\n",
"for active_run in active_runs:\n",
" print(active_run.run_id, ',', active_run.status)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion()\n",
"print(run.get_metrics())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Additional operations to perform on ComputeInstance\n",
"\n",
"You can perform more operations on ComputeInstance such as get status, change the state or deleting the compute."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "get_status"
},
"outputs": [],
"source": [
"# get_status() gets the latest status of the ComputeInstance target\n",
"instance.get_status()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "stop"
},
"outputs": [],
"source": [
"# stop() is used to stop the ComputeInstance\n",
"# Stopping ComputeInstance will stop the billing meter and persist the state on the disk.\n",
"# Available Quota will not be changed with this operation.\n",
"instance.stop(wait_for_completion=True, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"msdoc": "how-to-auto-train-remote.md",
"name": "start"
},
"outputs": [],
"source": [
"# start() is used to start the ComputeInstance if it is in stopped state\n",
"instance.start(wait_for_completion=True, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# restart() is used to restart the ComputeInstance\n",
"instance.restart(wait_for_completion=True, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# delete() is used to delete the ComputeInstance target. Useful if you want to re-use the compute name \n",
"# instance.delete(wait_for_completion=True, show_output=True)"
]
}
],
"metadata": {
"authors": [
{
"name": "ramagott"
}
],
"category": "training",
"compute": [
"Compute Instance"
],
"datasets": [
"Diabetes"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"None"
],
"friendly_name": "Train on Azure Machine Learning Compute Instance",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"tags": [
"None"
],
"task": "Submit a run on Azure Machine Learning Compute Instance."
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,6 +0,0 @@
name: train-on-computeinstance
dependencies:
- scikit-learn
- pip:
- azureml-sdk
- azureml-widgets

View File

@@ -1,48 +0,0 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from azureml.core.run import Run
import os
import numpy as np
# sklearn.externals.joblib is removed in 0.23
try:
from sklearn.externals import joblib
except ImportError:
import joblib
os.makedirs('./outputs', exist_ok=True)
X, y = load_diabetes(return_X_y=True)
run = Run.get_context()
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=0)
data = {"train": {"X": X_train, "y": y_train},
"test": {"X": X_test, "y": y_test}}
# list of numbers from 0.0 to 1.0 with a 0.05 interval
alphas = np.arange(0.0, 1.0, 0.05)
for alpha in alphas:
# Use Ridge algorithm to create a regression model
reg = Ridge(alpha=alpha)
reg.fit(data["train"]["X"], data["train"]["y"])
preds = reg.predict(data["test"]["X"])
mse = mean_squared_error(preds, data["test"]["y"])
run.log('alpha', alpha)
run.log('mse', mse)
model_file_name = 'ridge_{0:.2f}.pkl'.format(alpha)
# save model in the outputs folder so it automatically get uploaded
with open(model_file_name, "wb") as file:
joblib.dump(value=reg, filename=os.path.join('./outputs/',
model_file_name))
print('alpha is {0:.2f}, and mse is {1:0.2f}'.format(alpha, mse))

View File

@@ -1,33 +0,0 @@
import pickle
import json
import numpy as np
from sklearn.linear_model import Ridge
from azureml.core.model import Model
# sklearn.externals.joblib is removed in 0.23
try:
from sklearn.externals import joblib
except ImportError:
import joblib
def init():
global model
# note here "best_model" is the name of the model registered under the workspace
# this call should return the path to the model.pkl file on the local disk.
model_path = Model.get_model_path(model_name='best_model')
# deserialize the model file back into a sklearn model
model = joblib.load(model_path)
# note you can pass in multiple rows for scoring
def run(raw_data):
try:
data = json.loads(raw_data)['data']
data = np.array(data)
result = model.predict(data)
# you can return any data type as long as it is JSON-serializable
return result.tolist()
except Exception as e:
result = str(e)
return result

View File

@@ -1,721 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-within-notebook/train-within-notebook.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and deploy a model\n",
"_**Create and deploy a model directly from a notebook**_\n",
"\n",
"---\n",
"---\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
" 1. Viewing run results\n",
" 1. Simple parameter sweep\n",
" 1. Viewing experiment results\n",
" 1. Select the best model\n",
"1. [Deploy](#Deploy)\n",
" 1. Register the model\n",
" 1. Create a scoring file\n",
" 1. Describe your environment\n",
" 1. Descrice your target compute\n",
" 1. Deploy your webservice\n",
" 1. Test your webservice\n",
" 1. Clean up\n",
"1. [Next Steps](#nextsteps)\n",
"\n",
"---\n",
"\n",
"## Introduction\n",
"Azure Machine Learning provides capabilities to control all aspects of model training and deployment directly from a notebook using the AML Python SDK. In this notebook we will\n",
"* connect to our AML Workspace\n",
"* create an experiment that contains multiple runs with tracked metrics\n",
"* choose the best model created across all runs\n",
"* deploy that model as a service\n",
"\n",
"In the end we will have a model deployed as a web service which we can call from an HTTP endpoint"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Setup\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. From the configuration, the important sections are the workspace configuration and ACI regristration.\n",
"\n",
"We will also need the following libraries install to our conda environment. If these are not installed, use the following command to do so and restart the notebook.\n",
"```shell\n",
"(myenv) $ conda install -y matplotlib tqdm scikit-learn\n",
"```\n",
"\n",
"For this notebook we need the Azure ML SDK and access to our workspace. The following cell imports the SDK, checks the version, and accesses our already configured AzureML workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"install"
],
"name": "load_ws",
"msdoc": "how-to-track-experiments.md"
},
"outputs": [],
"source": [
"import azureml.core\n",
"from azureml.core import Experiment, Workspace\n",
"\n",
"# Check core SDK version number\n",
"print(\"This notebook was created using version 1.0.2 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")\n",
"print(\"\")\n",
"\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Data\n",
"We will use the diabetes dataset for this experiement, a well-known small dataset that comes with scikit-learn. This cell loads the dataset and splits it into random training and testing sets.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"name": "load_data",
"msdoc": "how-to-track-experiments.md"
},
"outputs": [],
"source": [
"from sklearn.datasets import load_diabetes\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.externals import joblib\n",
"\n",
"X, y = load_diabetes(return_X_y = True)\n",
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
"data = {\n",
" \"train\":{\"X\": X_train, \"y\": y_train}, \n",
" \"test\":{\"X\": X_test, \"y\": y_test}\n",
"}\n",
"\n",
"print (\"Data contains\", len(data['train']['X']), \"training samples and\",len(data['test']['X']), \"test samples\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Train\n",
"\n",
"Let's use scikit-learn to train a simple Ridge regression model. We use AML to record interesting information about the model in an Experiment. An Experiment contains a series of trials called Runs. During this trial we use AML in the following way:\n",
"* We access an experiment from our AML workspace by name, which will be created if it doesn't exist\n",
"* We use `start_logging` to create a new run in this experiment\n",
"* We use `run.log()` to record a parameter, alpha, and an accuracy measure - the Mean Squared Error (MSE) to the run. We will be able to review and compare these measures in the Azure Portal at a later time.\n",
"* We store the resulting model in the **outputs** directory, which is automatically captured by AML when the run is complete.\n",
"* We use `run.complete()` to indicate that the run is over and results can be captured and finalized"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"local run",
"outputs upload"
],
"name": "create_experiment",
"msdoc": "how-to-track-experiments.md"
},
"outputs": [],
"source": [
"# Get an experiment object from Azure Machine Learning\n",
"experiment = Experiment(workspace=ws, name=\"train-within-notebook\")\n",
"\n",
"# Create a run object in the experiment\n",
"run = experiment.start_logging()\n",
"# Log the algorithm parameter alpha to the run\n",
"run.log('alpha', 0.03)\n",
"\n",
"# Create, fit, and test the scikit-learn Ridge regression model\n",
"regression_model = Ridge(alpha=0.03)\n",
"regression_model.fit(data['train']['X'], data['train']['y'])\n",
"preds = regression_model.predict(data['test']['X'])\n",
"\n",
"# Output the Mean Squared Error to the notebook and to the run\n",
"print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))\n",
"run.log('mse', mean_squared_error(data['test']['y'], preds))\n",
"\n",
"# Save the model to the outputs directory for capture\n",
"model_file_name = 'outputs/model.pkl'\n",
"\n",
"joblib.dump(value = regression_model, filename = model_file_name)\n",
"\n",
"# upload the model file explicitly into artifacts \n",
"run.upload_file(name = model_file_name, path_or_stream = model_file_name)\n",
"\n",
"# Complete the run\n",
"run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Viewing run results\n",
"Azure Machine Learning stores all the details about the run in the Azure cloud. Let's access those details by retrieving a link to the run using the default run output. Clicking on the resulting link will take you to an interactive page presenting all run information."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Simple parameter sweep\n",
"Now let's take the same concept from above and modify the **alpha** parameter. For each value of alpha we will create a run that will store metrics and the resulting model. In the end we can use the captured run history to determine which model was the best for us to deploy. \n",
"\n",
"Note that by using `with experiment.start_logging() as run` AML will automatically call `run.complete()` at the end of each loop.\n",
"\n",
"This example also uses the **tqdm** library to provide a thermometer feedback"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from tqdm import tqdm\n",
"\n",
"# list of numbers from 0 to 1.0 with a 0.05 interval\n",
"alphas = np.arange(0.0, 1.0, 0.05)\n",
"\n",
"# try a bunch of alpha values in a Linear Regression (Ridge) model\n",
"for alpha in tqdm(alphas):\n",
" # create a bunch of runs, each train a model with a different alpha value\n",
" with experiment.start_logging() as run:\n",
" # Use Ridge algorithm to build a regression model\n",
" regression_model = Ridge(alpha=alpha)\n",
" regression_model.fit(X=data[\"train\"][\"X\"], y=data[\"train\"][\"y\"])\n",
" preds = regression_model.predict(X=data[\"test\"][\"X\"])\n",
" mse = mean_squared_error(y_true=data[\"test\"][\"y\"], y_pred=preds)\n",
"\n",
" # log alpha, mean_squared_error and feature names in run history\n",
" run.log(name=\"alpha\", value=alpha)\n",
" run.log(name=\"mse\", value=mse)\n",
"\n",
" # Save the model to the outputs directory for capture\n",
" joblib.dump(value=regression_model, filename='outputs/model.pkl')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Viewing experiment results\n",
"Similar to viewing the run, we can also view the entire experiment. The experiment report view in the Azure portal lets us view all the runs in a table, and also allows us to customize charts. This way, we can see how the alpha parameter impacts the quality of the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# now let's take a look at the experiment in Azure portal.\n",
"experiment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Select the best model \n",
"Now that we've created many runs with different parameters, we need to determine which model is the best for deployment. For this, we will iterate over the set of runs. From each run we will take the *run id* using the `id` property, and examine the metrics by calling `run.get_metrics()`. \n",
"\n",
"Since each run may be different, we do need to check if the run has the metric that we are looking for, in this case, **mse**. To find the best run, we create a dictionary mapping the run id's to the metrics.\n",
"\n",
"Finally, we use the `tag` method to mark the best run to make it easier to find later. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"runs = {}\n",
"run_metrics = {}\n",
"\n",
"# Create dictionaries containing the runs and the metrics for all runs containing the 'mse' metric\n",
"for r in tqdm(experiment.get_runs()):\n",
" metrics = r.get_metrics()\n",
" if 'mse' in metrics.keys():\n",
" runs[r.id] = r\n",
" run_metrics[r.id] = metrics\n",
"\n",
"# Find the run with the best (lowest) mean squared error and display the id and metrics\n",
"best_run_id = min(run_metrics, key = lambda k: run_metrics[k]['mse'])\n",
"best_run = runs[best_run_id]\n",
"print('Best run is:', best_run_id)\n",
"print('Metrics:', run_metrics[best_run_id])\n",
"\n",
"# Tag the best run for identification later\n",
"best_run.tag(\"Best Run\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"## Deploy\n",
"Now that we have trained a set of models and identified the run containing the best model, we want to deploy the model for real time inference. The process of deploying a model involves\n",
"* registering a model in your workspace\n",
"* creating a scoring file containing init and run methods\n",
"* creating an environment dependency file describing packages necessary for your scoring file\n",
"* deploying the model and packages as a web service"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Register a model\n",
"We have already identified which run contains the \"best model\" by our evaluation criteria. Each run has a file structure associated with it that contains various files collected during the run. Since a run can have many outputs we need to tell AML which file from those outputs represents the model that we want to use for our deployment. We can use the `run.get_file_names()` method to list the files associated with the run, and then use the `run.register_model()` method to place the model in the workspace's model registry.\n",
"\n",
"When using `run.register_model()` we supply a `model_name` that is meaningful for our scenario and the `model_path` of the model relative to the run. In this case, the model path is what is returned from `run.get_file_names()`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"query history"
]
},
"outputs": [],
"source": [
"# View the files in the run\n",
"for f in best_run.get_file_names():\n",
" print(f)\n",
" \n",
"# Register the model with the workspace\n",
"model = best_run.register_model(model_name='best_model', model_path='outputs/model.pkl')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once a model is registered, it is accessible from the list of models on the AML workspace. If you register models with the same name multiple times, AML keeps a version history of those models for you. The `Model.list()` lists all models in a workspace, and can be filtered by name, tags, or model properties. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from history"
]
},
"outputs": [],
"source": [
"# Find all models called \"best_model\" and display their version numbers\n",
"from azureml.core.model import Model\n",
"models = Model.list(ws, name='best_model')\n",
"for m in models:\n",
" print(m.name, m.version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a scoring file\n",
"\n",
"Since your model file can essentially be anything you want it to be, you need to supply a scoring script that can load your model and then apply the model to new data. This script is your 'scoring file'. This scoring file is a python program containing, at a minimum, two methods `init()` and `run()`. The `init()` method is called once when your deployment is started so you can load your model and any other required objects. This method uses the `get_model_path` function to locate the registered model inside the docker container. The `run()` method is called interactively when the web service is called with one or more data samples to predict.\n",
"\n",
"The scoring file used for this exercise is [here](score.py). \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Describe your environment\n",
"\n",
"Each modelling process may require a unique set of packages. Therefore we need to create an environment object describing the dependencies. \n",
"\n",
"Next we create an inference configuration using this environment object and the scoring script that we created previously."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.environment import Environment\n",
"from azureml.core.model import InferenceConfig\n",
"\n",
"env = Environment('deploytocloudenv')\n",
"env.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],pip_packages=['azureml-defaults'])\n",
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Describe your target compute\n",
"In addition to the inference configuration, we also need to describe the type of compute we want to allocate for our webservice. In in this example we are using an [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/) which is a good choice for quick and cost-effective dev/test deployment scenarios. ACI instances require the number of cores you want to run and memory you need. Tags and descriptions are available for you to identify the instances in AML when viewing the Compute tab in the AML Portal.\n",
"\n",
"For production workloads, it is better to use [Azure Kubernentes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. Try [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"deploy service",
"aci"
]
},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
" memory_gb=1, \n",
" tags={'sample name': 'AML 101'}, \n",
" description='This is a great example.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy your webservice\n",
"The final step to deploying your webservice is to call `Model.deploy()`. This function uses the deployment and inference configurations created above to perform the following:\n",
"* Build a docker image\n",
"* Deploy to the docker image to an Azure Container Instance\n",
"* Copy your model files to the Azure Container Instance\n",
"* Call the `init()` function in your scoring file\n",
"* Provide an HTTP endpoint for scoring calls\n",
"\n",
"The `Model.deploy` method requires the following parameters\n",
"* `workspace` - the workspace containing the service\n",
"* `name` - a unique named used to identify the service in the workspace\n",
"* `models` - an array of models to be deployed into the container\n",
"* `inference_config` - a configuration object describing the image environment\n",
"* `deployment_config` - a configuration object describing the compute type\n",
" \n",
"**Note:** The web service creation can take several minutes. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"deploy service",
"aci"
]
},
"outputs": [],
"source": [
"%%time\n",
"from azureml.core.model import Model\n",
"from azureml.core.webservice import Webservice\n",
"\n",
"# Create the webservice using all of the precreated configurations and our best model\n",
"service = Model.deploy(workspace=ws,\n",
" name='my-aci-svc',\n",
" models=[model],\n",
" inference_config=inference_config,\n",
" deployment_config=aciconfig)\n",
"\n",
"# Wait for the service deployment to complete while displaying log output\n",
"service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Test your webservice"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that your web service is runing you can send JSON data directly to the service using the `run` method. This cell pulls the first test sample from the original dataset into JSON and then sends it to the service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"deploy service",
"aci"
]
},
"outputs": [],
"source": [
"import json\n",
"\n",
"service = ws.webservices['my-aci-svc']\n",
"\n",
"# scrape the first row from the test set.\n",
"test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n",
"\n",
"#score on our service\n",
"service.run(input_data = test_samples)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This cell shows how you can send multiple rows to the webservice at once. It then calculates the residuals - that is, the errors - by subtracting out the actual values from the results. These residuals are used later to show a plotted result."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"deploy service",
"aci"
]
},
"outputs": [],
"source": [
"# score the entire test set.\n",
"test_samples = json.dumps({'data': X_test.tolist()})\n",
"\n",
"result = service.run(input_data = test_samples)\n",
"residual = result - y_test"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This cell shows how you can use the `service.scoring_uri` property to access the HTTP endpoint of the service and call it using standard POST operations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"deploy service",
"aci"
]
},
"outputs": [],
"source": [
"import requests\n",
"\n",
"# use the first row from the test set again\n",
"test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n",
"\n",
"# create the required header\n",
"headers = {'Content-Type':'application/json'}\n",
"\n",
"# post the request to the service and display the result\n",
"resp = requests.post(service.scoring_uri, test_samples, headers = headers)\n",
"print(resp.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Residual graph\n",
"One way to understand the behavior of your model is to see how the data performs against data with known results. This cell uses matplotlib to create a histogram of the residual values, or errors, created from scoring the test samples.\n",
"\n",
"A good model should have residual values that cluster around 0 - that is, no error. Observing the resulting histogram can also show you if the model is skewed in any particular direction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"\n",
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw={'width_ratios':[3, 1], 'wspace':0, 'hspace': 0})\n",
"f.suptitle('Residual Values', fontsize = 18)\n",
"\n",
"f.set_figheight(6)\n",
"f.set_figwidth(14)\n",
"\n",
"a0.plot(residual, 'bo', alpha=0.4)\n",
"a0.plot([0,90], [0,0], 'r', lw=2)\n",
"a0.set_ylabel('residue values', fontsize=14)\n",
"a0.set_xlabel('test data set', fontsize=14)\n",
"\n",
"a1.hist(residual, orientation='horizontal', color='blue', bins=10, histtype='step')\n",
"a1.hist(residual, orientation='horizontal', color='blue', alpha=0.2, bins=10)\n",
"a1.set_yticklabels([])\n",
"\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Clean up"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Delete the ACI instance to stop the compute and any associated billing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"deploy service",
"aci"
]
},
"outputs": [],
"source": [
"%%time\n",
"service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id='nextsteps'></a>\n",
"## Next Steps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example, you created a series of models inside the notebook using local data, stored them inside an AML experiment, found the best one and deployed it as a live service! From here you can continue to use Azure Machine Learning in this regard to run your own experiments and deploy your own models, or you can expand into further capabilities of AML!\n",
"\n",
"If you have a model that is difficult to process locally, either because the data is remote or the model is large, try the [train-on-remote-vm](../train-on-remote-vm) notebook to learn about submitting remote jobs.\n",
"\n",
"If you want to take advantage of multiple cloud machines to perform large parameter sweeps try the [train-hyperparameter-tune-deploy-with-pytorch](../../training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch\n",
") sample.\n",
"\n",
"If you want to deploy models to a production cluster try the [production-deploy-to-aks](../../deployment/production-deploy-to-aks\n",
") notebook."
]
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"category": "tutorial",
"compute": [
"Local"
],
"datasets": [
"Diabetes"
],
"deployment": [
"Azure Container Instance"
],
"exclude_from_index": false,
"framework": [
"None"
],
"friendly_name": "Train and deploy a model using Python SDK",
"index_order": 1,
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"tags": [
"None"
],
"task": "Training and deploying a model from a notebook"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -1,8 +0,0 @@
name: train-within-notebook
dependencies:
- tqdm
- scikit-learn
- matplotlib
- pip:
- azureml-sdk
- azureml-widgets

View File

@@ -1,4 +1,5 @@
import os
import sys
def convert(imgf, labelf, outf, n):
@@ -23,8 +24,8 @@ def convert(imgf, labelf, outf, n):
l.close()
mounted_input_path = os.environ['fashion_ds']
mounted_output_path = os.environ['AZUREML_DATAREFERENCE_prepared_fashion_ds']
mounted_input_path = sys.argv[1]
mounted_output_path = sys.argv[2]
os.makedirs(mounted_output_path, exist_ok=True)
convert(os.path.join(mounted_input_path, 'train-images-idx3-ubyte'),

View File

@@ -65,12 +65,9 @@
"source": [
"import os\n",
"import azureml.core\n",
"from azureml.core import Workspace, Dataset, Datastore, ComputeTarget, RunConfiguration, Experiment\n",
"from azureml.core.runconfig import CondaDependencies\n",
"from azureml.core import Workspace, Dataset, Datastore, ComputeTarget, Experiment\n",
"from azureml.pipeline.steps import PythonScriptStep, EstimatorStep\n",
"from azureml.pipeline.core import Pipeline, PipelineData\n",
"from azureml.train.dnn import TensorFlow\n",
"\n",
"from azureml.pipeline.core import Pipeline\n",
"# check core SDK version number\n",
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
]
@@ -141,7 +138,7 @@
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpu-cluster\"\n",
"cluster_name = \"amlcomp\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=workspace, name=cluster_name)\n",
@@ -170,7 +167,7 @@
"\n",
"By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred.\n",
"\n",
"Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and create a dataset from it. We will now upload the [Fashion MNIST](./keras-mnist-fashion) to the default datastore (blob) within your workspace."
"Every workspace comes with a default [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and create a dataset from it. We will now upload the [Fashion MNIST](./data) to the default datastore (blob) within your workspace."
]
},
{
@@ -180,8 +177,8 @@
"outputs": [],
"source": [
"datastore = workspace.get_default_datastore()\n",
"datastore.upload_files(files = ['keras-mnist-fashion/t10k-images-idx3-ubyte', 'keras-mnist-fashion/t10k-labels-idx1-ubyte',\n",
" 'keras-mnist-fashion/train-images-idx3-ubyte','keras-mnist-fashion/train-labels-idx1-ubyte'],\n",
"datastore.upload_files(files = ['data/t10k-images-idx3-ubyte', 'data/t10k-labels-idx1-ubyte',\n",
" 'data/train-images-idx3-ubyte','data/train-labels-idx1-ubyte'],\n",
" target_path = 'mnist-fashion',\n",
" overwrite = True,\n",
" show_progress = True)"
@@ -212,7 +209,7 @@
"source": [
"## Build 2-step ML pipeline\n",
"\n",
"The [Azure Machine Learning Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) enables data scientists to create and manage multiple simple and complex workflows concurrently. A typical pipeline would have multiple tasks to prepare data, train, deploy and evaluate models. Individual steps in the pipeline can make use of diverse compute options (for example: CPU for data preparation and GPU for training) and languages. [Learn More](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/machine-learning-pipelines)\n",
"The [Azure Machine Learning Pipeline](https://docs.microsoft.com/azure/machine-learning/service/concept-ml-pipelines) enables data scientists to create and manage multiple simple and complex workflows concurrently. A typical pipeline would have multiple tasks to prepare data, train, deploy and evaluate models. Individual steps in the pipeline can make use of diverse compute options (for example: CPU for data preparation and GPU for training) and languages. [Learn More](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/machine-learning-pipelines)\n",
"\n",
"\n",
"### Step 1: data preparation\n",
@@ -222,28 +219,11 @@
"Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255. Both mnist_train.csv and mnist_test.csv contain 785 columns. The first column consists of the class labels, which represent the article of clothing. The rest of the columns contain the pixel-values of the associated image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# set up the compute environment to install required packages\n",
"conda = CondaDependencies.create(\n",
" pip_packages=['azureml-sdk','azureml-dataset-runtime[fuse,pandas]'],\n",
" pin_sdk_version=False)\n",
"\n",
"conda.set_pip_option('--pre')\n",
"\n",
"run_config = RunConfiguration()\n",
"run_config.environment.python.conda_dependencies = conda"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Intermediate data (or output of a step) is represented by a `PipelineData` object. preprared_fashion_ds is produced as the output of step 1, and used as the input of step 2. PipelineData introduces a data dependency between steps, and creates an implicit execution order in the pipeline. You can register a `PipelineData` as a dataset and version the output data automatically. [Learn More](https://docs.microsoft.com/azure/machine-learning/service/how-to-version-track-datasets#version-a-pipeline-output-dataset) "
"Intermediate data (or output of a step) is represented by a `OutputFileDatasetConfig` object. preprared_fashion_ds is produced as the output of step 1, and used as the input of step 2. `OutputFileDatasetConfig` introduces a data dependency between steps, and creates an implicit execution order in the pipeline. You can register a `OutputFileDatasetConfig` as a dataset and version the output data automatically."
]
},
{
@@ -252,18 +232,28 @@
"metadata": {},
"outputs": [],
"source": [
"# define output data\n",
"prepared_fashion_ds = PipelineData('prepared_fashion_ds', datastore=datastore).as_dataset()\n",
"from azureml.data import OutputFileDatasetConfig\n",
"\n",
"# register output data as dataset\n",
"prepared_fashion_ds = prepared_fashion_ds.register(name='prepared_fashion_ds', create_new_version=True)"
"# learn more about the output config\n",
"help(OutputFileDatasetConfig)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# write output to datastore under folder `outputdataset` and register it as a dataset after the experiment completes\n",
"# make sure the service principal in your datastore has blob data contributor role in order to write data back\n",
"prepared_fashion_ds = OutputFileDatasetConfig(destination=(datastore, 'outputdataset/{run-id}')).register_on_complete(name='prepared_fashion_ds')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A **PythonScriptStep** is a basic, built-in step to run a Python Script on a compute target. It takes a script name and optionally other parameters like arguments for the script, compute target, inputs and outputs. If no compute target is specified, default compute target for the workspace is used. You can also use a [**RunConfiguration**](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.runconfiguration?view=azure-ml-py) to specify requirements for the PythonScriptStep, such as conda dependencies and docker image."
"A **PythonScriptStep** is a basic, built-in step to run a Python Script on a compute target. It takes a script name and optionally other parameters like arguments for the script, compute target, inputs and outputs. If no compute target is specified, default compute target for the workspace is used. You can also use a [**RunConfiguration**](https://docs.microsoft.com/python/api/azureml-core/azureml.core.runconfiguration?view=azure-ml-py) to specify requirements for the PythonScriptStep, such as conda dependencies and docker image."
]
},
{
@@ -275,12 +265,10 @@
"prep_step = PythonScriptStep(name='prepare step',\n",
" script_name=\"prepare.py\",\n",
" # mount fashion_ds dataset to the compute_target\n",
" inputs=[fashion_ds.as_named_input('fashion_ds').as_mount()],\n",
" outputs=[prepared_fashion_ds],\n",
" arguments=[fashion_ds.as_named_input('fashion_ds').as_mount(), prepared_fashion_ds],\n",
" source_directory=script_folder,\n",
" compute_target=compute_target,\n",
" runconfig=run_config,\n",
" allow_reuse=False)"
" allow_reuse=True)"
]
},
{
@@ -289,9 +277,7 @@
"source": [
"### Step 2: train CNN with Keras\n",
"\n",
"Next, we construct an `azureml.train.dnn.TensorFlow` estimator object. The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed.\n",
"\n",
"[EstimatorStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.estimator_step.estimatorstep?view=azure-ml-py) adds a step to run Tensorflow Estimator in a Pipeline. It takes a dataset as the input."
"Next, we construct an `azureml.train.Estimator` estimator object. [EstimatorStep](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.estimator_step.estimatorstep?view=azure-ml-py) adds a step to run Tensorflow Estimator in a Pipeline. It takes a dataset as the input."
]
},
{
@@ -300,17 +286,17 @@
"metadata": {},
"outputs": [],
"source": [
"# set up training step with Tensorflow estimator\n",
"est = TensorFlow(entry_script='train.py',\n",
"from azureml.train.estimator import Estimator\n",
"# set up training step with Estimator\n",
"est = Estimator(entry_script='train.py',\n",
" source_directory=script_folder,\n",
" pip_packages = ['azureml-sdk', 'keras<=2.3.1', 'tensorflow==2.1.0', 'numpy','scikit-learn', 'matplotlib'],\n",
" pip_packages=['keras','tensorflow','numpy','scikit-learn', 'matplotlib','pandas'],\n",
" compute_target=compute_target)\n",
"\n",
"est_step = EstimatorStep(name='train step',\n",
" estimator=est,\n",
" estimator_entry_script_arguments=[],\n",
" # parse prepared_fashion_ds into TabularDataset and use it as the input\n",
" inputs=[prepared_fashion_ds.parse_delimited_files()],\n",
" # parse prepared_fashion_ds into tabulardataset and use it as input\n",
" estimator_entry_script_arguments=[prepared_fashion_ds.read_delimited_files().as_input(name='prepared_fashion_ds')],\n",
" compute_target=compute_target)"
]
},
@@ -321,7 +307,7 @@
"### Build the pipeline\n",
"Once we have the steps (or steps collection), we can build the [pipeline](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline.pipeline?view=azure-ml-py).\n",
"\n",
"A pipeline is created with a list of steps and a workspace. Submit a pipeline using [submit](https://docs.microsoft.com/python/api/azureml-core/azureml.core.experiment(class)?view=azure-ml-py#submit-config--tags-none----kwargs-). When submit is called, a [PipelineRun](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinerun?view=azure-ml-py) is created which in turn creates [StepRun](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.steprun?view=azure-ml-py) objects for each step in the workflow."
"A pipeline is created with a list of steps and a workspace. Submit a pipeline using `submit`. When submit is called, a [PipelineRun](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinerun?view=azure-ml-py) is created which in turn creates [StepRun](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.steprun?view=azure-ml-py) objects for each step in the workflow."
]
},
{
@@ -374,23 +360,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Azure Machine Learning dataset makes it easy to trace how your data is used in ML. [Learn More](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-version-track-datasets#track-datasets-in-experiments)<br>\n",
"For each Machine Learning experiment, you can easily trace the datasets used as the input through `Run` object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get input datasets\n",
"prep_step = run.find_step_run('prepare step')[0]\n",
"inputs = prep_step.get_details()['inputDatasets']\n",
"input_dataset = inputs[0]['dataset']\n",
"\n",
"# list the files referenced by input_dataset\n",
"input_dataset.to_path()"
"Azure Machine Learning dataset makes it easy to trace how your data is used in ML. [Learn More](https://docs.microsoft.com/azure/machine-learning/service/how-to-version-track-datasets#track-datasets-in-experiments)<br>"
]
},
{
@@ -406,11 +376,10 @@
"metadata": {},
"outputs": [],
"source": [
"fashion_ds = input_dataset.register(workspace = workspace,\n",
"fashion_ds = fashion_ds.register(workspace = workspace,\n",
" name = 'fashion_ds',\n",
" description = 'image and label files from fashion mnist',\n",
" create_new_version = True)\n",
"fashion_ds"
" create_new_version = True)"
]
},
{

View File

@@ -0,0 +1,320 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/datasets-tutorial/scriptun-with-data-input-output.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to use ScriptRun with data input and output\n",
"\n",
"This notebook shows how to use [ScriptRun](https://docs.microsoft.com/python/api/azureml-core/azureml.core.script_run.scriptrun?view=azure-ml-py) with input and output. A run submitted with ScriptRunConfig represents a single trial in an experiment. Submitting the run returns a ScriptRun object, which can be used to monitor the asynchronous execution of the run, log metrics and store output of the run, and analyze results and access artifacts generated by the run.\n",
"\n",
"\n",
"## Prerequisite:\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](https://aka.ms/pl-config) to:\n",
" * install the AML SDK\n",
" * create a workspace and its configuration file (`config.json`)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize workspace\n",
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"amlcomp\"\n",
"\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n",
"\n",
" # create the cluster\n",
" cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(cpu_cluster.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'mlc' of type `AmlCompute`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use a simple script\n",
"We have already created a simple \"hello world\" script. This is the script that we will submit through the [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig?view=azure-ml-py). It reads iris dataset as input, and write it out to `outputdataset` folder in default blob datastore. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"source_directory = 'script_run'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile $source_directory/dummy_train.py\n",
"\n",
"# Copyright (c) Microsoft Corporation. All rights reserved.\n",
"# Licensed under the MIT License.\n",
"import sys\n",
"import os\n",
"\n",
"print(\"*********************************************************\")\n",
"print(\"Hello Azure ML!\")\n",
"\n",
"mounted_input_path = sys.argv[1]\n",
"mounted_output_path = sys.argv[2]\n",
"\n",
"print(\"Argument 1: %s\" % mounted_input_path)\n",
"print(\"Argument 2: %s\" % mounted_output_path)\n",
" \n",
"with open(mounted_input_path, 'r') as f:\n",
" content = f.read()\n",
" with open(os.path.join(mounted_output_path, 'output.csv'), 'w') as fw:\n",
" fw.write(content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every workspace comes with a default datastore (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and create dataset from it. We will now upload the Iris data to the default datastore (blob) within your workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def_blob_store = ws.get_default_datastore()\n",
"def_blob_store.upload_files(files = ['iris.csv'],\n",
" target_path = 'script-run/',\n",
" overwrite = True,\n",
" show_progress = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are ready to define the input and output of your script. They can be passed in via `arguments`, which is a list of command-line arguments to pass to the training script specified in `script`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"from azureml.data import OutputFileDatasetConfig\n",
"\n",
"input_data = Dataset.File.from_files(def_blob_store.path('script-run/iris.csv')).as_named_input('input').as_mount()\n",
"\n",
"# output is configured to write the result back to def_blob_store, under\"may_sample/outputdataset\" folder\n",
"# learn more about options to configure the output, run 'help(OutputFileDatasetConfig)'\n",
"output = OutputFileDatasetConfig(destination=(def_blob_store, 'sample/outputdataset'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"myenv = Environment(\"myenv\")\n",
"\n",
"myenv.docker.enabled = True\n",
"myenv.python.conda_dependencies = CondaDependencies.create(pip_packages=['azureml-sdk>=1.12.0'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=source_directory, \n",
" script='dummy_train.py', \n",
" # to mount the dataset on the remote compute and pass the mounted path as an argument to the training script\n",
" arguments =[input_data, output])\n",
"\n",
"src.run_config.framework = 'python'\n",
"src.run_config.target = cpu_cluster.name\n",
"\n",
"# Set environment\n",
"src.run_config.environment = myenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build and Submit the Experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"exp = Experiment(ws, 'ScriptRun_sample')\n",
"run = exp.submit(config=src)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## View Run Details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
}
],
"metadata": {
"authors": [
{
"name": "sihhu"
}
],
"category": "tutorial",
"compute": [
"AML Compute"
],
"datasets": [
"Custom"
],
"deployment": [
"None"
],
"exclude_from_index": false,
"framework": [
"Azure ML"
],
"friendly_name": "How to use ScriptRun with data input and output",
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"order_index": 7,
"star_tag": [
"None"
],
"tags": [
"Dataset",
"ScriptRun"
],
"task": "Demonstrates the use of Scriptrun with datasets"
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,4 @@
name: how-to-use-scriptrun
dependencies:
- pip:
- azureml-sdk

View File

@@ -0,0 +1,19 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import sys
import os
print("*********************************************************")
print("Hello Azure ML!")
mounted_input_path = sys.argv[1]
mounted_output_path = sys.argv[2]
print("Argument 1: %s" % mounted_input_path)
print("Argument 2: %s" % mounted_output_path)
with open(mounted_input_path, 'r') as f:
content = f.read()
with open(os.path.join(mounted_output_path, 'output.csv'), 'w') as fw:
fw.write(content)

View File

@@ -84,14 +84,9 @@
"outputs": [],
"source": [
"# import packages\n",
"import os\n",
"\n",
"import pandas as pd\n",
"\n",
"from calendar import monthrange\n",
"from datetime import datetime, timedelta\n",
"\n",
"from azureml.core import Dataset, Datastore, Workspace, Run"
"from azureml.core import Dataset, Workspace"
]
},
{
@@ -174,7 +169,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Assign timestamp column for Tabular Dataset to activate Time Series related APIs. The column to be assigned should be a Date type, otherwise the assigning will fail."
"Assign \"datetime\" column as timestamp and \"partition_time\" from folder path as partition_timestamp for Tabular Dataset to activate Time Series related APIs. The column to be assigned should be a Date type, otherwise the assigning will fail."
]
},
{
@@ -183,8 +178,7 @@
"metadata": {},
"outputs": [],
"source": [
"# for this demo, leave out partition_time so timestamp is used\n",
"tsd = dataset.with_timestamp_columns(timestamp='datetime') # partition_timestamp='partition_time')"
"tsd = dataset.with_timestamp_columns(timestamp='datetime', partition_timestamp='partition_time')"
]
},
{
@@ -228,7 +222,9 @@
"source": [
"## Filter Data by Time Windows\n",
"\n",
"Once your data has been loaded into the notebook, you can query by time using the time_before(), time_after(), time_between(), and time_recent() functions. You can also choose to drop or keep certain columns. "
"Once your data has been loaded into the notebook, you can query by time using the time_before(), time_after(), time_between(), and time_recent() functions.The filter is optimized to only load those data files within the partition_timestamp range when partition_timestamp is specified.\n",
"\n",
"include_boundary is default to be true for all the time series related filters, please pass include_boundary=False to exclude boundary."
]
},
{
@@ -276,13 +272,6 @@
"You can chain time functions together."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE:** You must set the partition_timestamp to None to filter on the timestamp. The below cell will fail unless the second line is uncommented "
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -290,8 +279,7 @@
"outputs": [],
"source": [
"# select data that occurs within a given time range\n",
"#tsd = tsd.with_timestamp_columns(timestamp='datetime', partition_timestamp=None)\n",
"tsd2 = tsd.time_after(datetime(2019, 1, 2)).time_before(datetime(2019, 1, 10))\n",
"tsd2 = tsd.time_after(datetime(2019, 1, 1)).time_before(datetime(2019, 1, 10))\n",
"tsd2.to_pandas_dataframe().head(5)"
]
},
@@ -327,16 +315,6 @@
"This function takes in a datetime.timedelta and returns a dataset containing the data from datetime.now()-timedelta() to datetime.now()."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tsd2 = tsd.time_recent(timedelta(weeks=5, days=0))\n",
"tsd2.to_pandas_dataframe().head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -354,6 +332,15 @@
"tsd2.to_pandas_dataframe().tail(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Drop and keep columns\n",
"\n",
"You can also choose to drop or keep certain columns."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -391,9 +378,11 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.exceptions import DatasetTimestampMissingError\n",
"\n",
"try:\n",
" tsd2.time_before(datetime(2019, 6, 12)).to_pandas_dataframe().tail(5)\n",
"except Exception as e:\n",
"except DatasetTimestampMissingError as e:\n",
" print('Expected exception : {}'.format(str(e)))"
]
},
@@ -462,7 +451,7 @@
"source": [
"try:\n",
" tsd2.time_before(datetime(2019, 6, 12)).to_pandas_dataframe().tail(5)\n",
"except Exception as e:\n",
"except DatasetTimestampMissingError as e:\n",
" print('Expected exception : {}'.format(str(e)))"
]
},
@@ -515,10 +504,11 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.exceptions import UserErrorException\n",
"# Illegal clearing, exception is expected.\n",
"try:\n",
" tsd2 = tsd.with_timestamp_columns(timestamp=None, partition_timestamp='partition_time')\n",
"except Exception as e:\n",
"except UserErrorException as e:\n",
" print('Cleaning not allowed because {}'.format(str(e)))\n",
"\n",
"# clear both\n",
@@ -575,7 +565,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
"version": "3.6.10"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
"star_tag": [

View File

@@ -19,8 +19,6 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
| [Forecasting BikeShare Demand](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb) | Forecasting | BikeShare | Remote | None | Azure ML AutoML | Forecasting |
| [Forecasting orange juice sales with deployment](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb) | Forecasting | Orange Juice Sales | Remote | Azure Container Instance | Azure ML AutoML | None |
| [Register a model and deploy locally](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb) | Deployment | None | Local | Local | None | None |
| :star:[Data drift on aks](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/monitor-models/data-drift/drift-on-aks.ipynb) | Filtering | NOAA | Remote | AKS | Azure ML | Dataset, Timeseries, Drift |
| [Train and deploy a model using Python SDK](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb) | Training and deploying a model from a notebook | Diabetes | Local | Azure Container Instance | None | None |
| :star:[Data drift quickdemo](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datadrift-tutorial/datadrift-tutorial.ipynb) | Filtering | NOAA | Remote | None | Azure ML | Dataset, Timeseries, Drift |
| :star:[Introduction to labeled datasets](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets.ipynb) | Train | | Remote | None | Azure ML | Dataset, label, Estimator |
| :star:[Datasets with ML Pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.ipynb) | Train | Fashion MNIST | Remote | None | Azure ML | Dataset, Pipeline, Estimator, ScriptRun |
@@ -47,6 +45,9 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
| :star:[How to use AutoMLStep with AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb) | Demonstrates the use of AutoMLStep | Custom | AML Compute | None | Automated Machine Learning | None |
| :star:[Azure Machine Learning Pipelines with Data Dependency](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb) | Demonstrates how to construct a Pipeline with data dependency between steps | Custom | AML Compute | None | Azure ML | None |
| [How to use run a notebook as a step in AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-notebook-runner-step.ipynb) | Demonstrates the use of NotebookRunnerStep | Custom | AML Compute | None | Azure ML | None |
| [Use MLflow with Azure Machine Learning to Train and Deploy Keras Image Classifier](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-and-deploy-keras-auto-logging/train-and-deploy-keras-auto-logging.ipynb) | Use MLflow with Azure Machine Learning to Train and Deploy Keras Image Classifier, leveraging MLflow auto logging | MNIST | Local, AML Compute | Azure Container Instance | Keras | mlflow, keras |
| [Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-and-deploy-pytorch/train-and-deploy-pytorch.ipynb) | Use MLflow with Azure Machine Learning to train and deploy PyTorch image classifier model | MNIST | Local, AML Compute | Azure Container Instance | PyTorch | mlflow, pytorch |
| [How to use ScriptRun with data input and output](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/scriptrun-with-data-input-output/how-to-use-scriptrun.ipynb) | Demonstrates the use of Scriptrun with datasets | Custom | AML Compute | None | Azure ML | Dataset, ScriptRun |
## Training
@@ -65,7 +66,6 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
| [Resuming a model](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb) | Resume a model in TensorFlow from a previously submitted run | MNIST | AML Compute | None | TensorFlow | None |
| [Training in Spark](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb) | Submiting a run on a spark cluster | None | HDI cluster | None | PySpark | None |
| [Train on Azure Machine Learning Compute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb) | Submit a run on Azure Machine Learning Compute. | Diabetes | AML Compute | None | None | None |
| [Train on Azure Machine Learning Compute Instance](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-computeinstance/train-on-computeinstance.ipynb) | Submit a run on Azure Machine Learning Compute Instance. | Diabetes | Compute Instance | None | None | None |
| [Train on local compute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-local/train-on-local.ipynb) | Train a model locally | Diabetes | Local | None | None | None |
| [Train in a remote Linux virtual machine](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) | Configure and execute a run | Diabetes | Data Science Virtual Machine | None | None | None |
| [Using Tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb) | Export the run history as Tensorboard logs | None | None | None | TensorFlow | None |
@@ -100,6 +100,7 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
| [upload-fairness-dashboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/fairness/upload-fairness-dashboard.ipynb) | | | | | | |
| [azure-ml-with-nvidia-rapids](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb) | | | | | | |
| [auto-ml-continuous-retraining](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/continuous-retraining/auto-ml-continuous-retraining.ipynb) | | | | | | |
| [auto-ml-regression-model-proxy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/experimental/regression/auto-ml-regression-model-proxy.ipynb) | | | | | | |
| [auto-ml-forecasting-beer-remote](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb) | | | | | | |
| [auto-ml-forecasting-energy-demand](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb) | | | | | | |
| [auto-ml-regression](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb) | | | | | | |

View File

@@ -102,7 +102,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.11.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -382,7 +382,8 @@
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"msauthor": "trbye"
"msauthor": "trbye",
"network_required": false
},
"nbformat": 4,
"nbformat_minor": 2

View File

@@ -689,7 +689,8 @@
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"msauthor": "roastala"
"msauthor": "roastala",
"network_required": false
},
"nbformat": 4,
"nbformat_minor": 2

View File

@@ -567,7 +567,8 @@
"pygments_lexer": "ipython3",
"version": "3.7.6"
},
"msauthor": "sgilley"
"msauthor": "sgilley",
"network_required": false
},
"nbformat": 4,
"nbformat_minor": 2

View File

@@ -610,7 +610,8 @@
"pygments_lexer": "ipython3",
"version": "3.6"
},
"msauthor": "vkanne"
"msauthor": "vkanne",
"network_required": false
},
"nbformat": 4,
"nbformat_minor": 2

View File

@@ -64,10 +64,27 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()"
]
},
@@ -397,7 +414,26 @@
"from azureml.pipeline.core import Pipeline\n",
"\n",
"pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n",
"pipeline_run = Experiment(ws, \"batch_scoring\").submit(pipeline)\n",
"pipeline_run = Experiment(ws, \"batch_scoring\").submit(pipeline)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# This will output information of the pipeline run, including the link to the details page of portal.\n",
"pipeline_run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Wait the run for completion and show output log to console\n",
"pipeline_run.wait_for_completion(show_output=True)"
]
},
@@ -544,10 +580,18 @@
"outputs": [],
"source": [
"from azureml.pipeline.core.run import PipelineRun\n",
"from azureml.widgets import RunDetails\n",
"\n",
"published_pipeline_run = PipelineRun(ws.experiments[\"batch_scoring\"], run_id)\n",
"RunDetails(published_pipeline_run).show()"
"published_pipeline_run = PipelineRun(ws.experiments[\"batch_scoring\"], run_id)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Show detail information of the run\n",
"published_pipeline_run"
]
},
{

View File

@@ -98,7 +98,7 @@
"\n",
"for sample_month in range(12):\n",
" temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n",
" .to_pandas_dataframe()\n",
" .get_tabular_dataset().to_pandas_dataframe()\n",
" green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))\n",
" \n",
"green_taxi_df.head(10)"
@@ -649,7 +649,8 @@
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"msauthor": "trbye"
"msauthor": "trbye",
"network_required": false
},
"nbformat": 4,
"nbformat_minor": 2