Compare commits


7 Commits

Author SHA1 Message Date
Sheri Gilley
184680f1d2 Update img-classification-part1-training.ipynb
updated explanation of datastore
2019-08-20 17:52:45 -05:00
Shané Winner
474f58bd0b Merge pull request #540 from trevorbye/master
removing tutorials for single combined tutorial
2019-08-20 15:22:47 -07:00
Trevor Bye
22c8433897 removing tutorials for single combined tutorial 2019-08-20 12:09:21 -07:00
Josée Martens
822cdd0f01 Update issue templates 2019-08-20 08:35:00 -05:00
Josée Martens
6e65d42986 Update issue templates 2019-08-20 08:26:45 -05:00
Harneet Virk
4c0cbac834 Merge pull request #537 from Azure/release_update/Release-141
update samples from Release-141 as a part of 1.0.57 SDK release
2019-08-19 18:32:44 -07:00
vizhur
44a7481ed1 update samples from Release-141 as a part of 1.0.57 SDK release 2019-08-19 23:33:44 +00:00
164 changed files with 33084 additions and 1835 deletions

View File

@@ -1,30 +1,43 @@
 ---
 name: Notebook issue
-about: Create a report to help us improve
+about: Describe your notebook issue
-title: "[Notebook issue]"
+title: "[Notebook] DESCRIPTIVE TITLE"
-labels: ''
+labels: notebook
 assignees: ''
 ---
-**Describe the bug**
-A clear and concise description of what the bug is.
-Provide the following if applicable:
-+ Your Python & SDK version
-+ Python Scripts or the full notebook name
-+ Pipeline definition
-+ Environment definition
-+ Example data
-+ Any log files.
-+ Run and Workspace Id
-**To Reproduce**
-Steps to reproduce the behavior:
-1.
-**Expected behavior**
-A clear and concise description of what you expected to happen.
-**Additional context**
-Add any other context about the problem here.
+### DESCRIPTION: Describe clearly + concisely
+.
+### REPRODUCIBLE: Steps
+.
+### EXPECTATION: Clear description
+.
+### CONFIG/ENVIRONMENT:
+```Provide where applicable
+## Your Python & SDK version:
+## Environment definition:
+## Notebook name or Python scripts:
+## Run and Workspace Id:
+## Pipeline definition:
+## Example data:
+## Any log files:
+```

View File

@@ -103,7 +103,7 @@
 "source": [
 "import azureml.core\n",
 "\n",
-"print(\"This notebook was created using version 1.0.55 of the Azure ML SDK\")\n",
+"print(\"This notebook was created using version 1.0.57 of the Azure ML SDK\")\n",
 "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
 ]
 },

View File

@@ -8,7 +8,7 @@ As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) not
 * [train-on-local](./training/train-on-local): Learn how to submit a run to local computer and use Azure ML managed run configuration.
 * [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
 * [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.
-* [logging-api](./training/logging-api): Learn about the details of logging metrics to run history.
+* [logging-api](./track-and-monitor-experiments/logging-api): Learn about the details of logging metrics to run history.
 * [register-model-create-image-deploy-service](./deployment/register-model-create-image-deploy-service): Learn about the details of model management.
 * [production-deploy-to-aks](./deployment/production-deploy-to-aks) Deploy a model to production at scale on Azure Kubernetes Service.
 * [enable-data-collection-for-models-in-aks](./deployment/enable-data-collection-for-models-in-aks) Learn about data collection APIs for deployed model.

View File

@@ -155,11 +155,11 @@ jupyter notebook
 - [auto-ml-subsampling-local.ipynb](subsampling/auto-ml-subsampling-local.ipynb)
   - How to enable subsampling
-- [auto-ml-dataprep.ipynb](dataprep/auto-ml-dataprep.ipynb)
-  - Using DataPrep for reading data
-- [auto-ml-dataprep-remote-execution.ipynb](dataprep-remote-execution/auto-ml-dataprep-remote-execution.ipynb)
-  - Using DataPrep for reading data with remote execution
+- [auto-ml-dataset.ipynb](dataprep/auto-ml-dataset.ipynb)
+  - Using Dataset for reading data
+- [auto-ml-dataset-remote-execution.ipynb](dataprep-remote-execution/auto-ml-dataset-remote-execution.ipynb)
+  - Using Dataset for reading data with remote execution
 - [auto-ml-classification-with-whitelisting.ipynb](classification-with-whitelisting/auto-ml-classification-with-whitelisting.ipynb)
   - Dataset: scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits)
@@ -229,7 +229,7 @@ The main code of the file must be indented so that it is under this condition.
 2. Check that you have conda 64-bit installed rather than 32-bit. You can check this with the command `conda info`. The `platform` should be `win-64` for Windows or `osx-64` for Mac.
 3. Check that you have conda 4.4.10 or later. You can check the version with the command `conda -V`. If you have a previous version installed, you can update it using the command: `conda update conda`.
 4. On Linux, if the error is `gcc: error trying to exec 'cc1plus': execvp: No such file or directory`, install build essentials using the command `sudo apt-get install build-essential`.
 5. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
 ## automl_setup_linux.sh fails
 If automl_setup_linux.sh fails on Ubuntu Linux with the error: `unable to execute 'gcc': No such file or directory`
@@ -264,13 +264,13 @@ Some Windows environments see an error loading numpy with the latest Python vers
 Check the tensorflow version in the automated ml conda environment. Supported versions are < 1.13. Uninstall tensorflow from the environment if version is >= 1.13
 You may check the version of tensorflow and uninstall as follows
 1) start a command shell, activate conda environment where automated ml packages are installed
 2) enter `pip freeze` and look for `tensorflow`; if found, the version listed should be < 1.13
 3) If the listed version is not a supported version, `pip uninstall tensorflow` in the command shell and enter y for confirmation.
 ## Remote run: DsvmCompute.create fails
 There are several reasons why DsvmCompute.create can fail. The reason is usually in the error message, but you have to look at the end of the error message for the detailed reason. Some common reasons are:
 1) `Compute name is invalid, it should start with a letter, be between 2 and 16 character, and only include letters (a-zA-Z), numbers (0-9) and \'-\'.` Note that underscore is not allowed in the name.
 2) `The requested VM size xxxxx is not available in the current region.` You can select a different region or vm_size.
 ## Remote run: Unable to establish SSH connection
 Automated ML uses the SSH protocol to communicate with remote DSVMs. This defaults to port 22. Possible causes for this error are:
@@ -296,4 +296,4 @@ To resolve this issue, allocate a DSVM with more memory or reduce the value spec
 ## Remote run: Iterations show as "Not Responding" in the RunDetails widget.
 This can be caused by too many concurrent iterations for a remote DSVM. Each concurrent iteration usually takes 100% of a core when it is running. Some iterations can use multiple cores. So, the max_concurrent_iterations setting should always be less than the number of cores of the DSVM.
 To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.
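The tensorflow check described in the troubleshooting text above reduces to comparing the first two components of the version string against 1.13. A minimal sketch of that comparison (the function name and sample version strings are illustrative, not from the repository):

```python
# Sketch: is an installed tensorflow version supported by automated ML (< 1.13)?
# Mirrors the manual `pip freeze` inspection described above.
def tensorflow_supported(version: str) -> bool:
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (1, 13)

print(tensorflow_supported("1.12.0"))  # True: supported
print(tensorflow_supported("1.13.1"))  # False: should be uninstalled
```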

View File

@@ -13,10 +13,13 @@ dependencies:
 - scikit-learn>=0.19.0,<=0.20.3
 - pandas>=0.22.0,<=0.23.4
 - py-xgboost<=0.80
+- pyarrow>=0.11.0
 - pip:
   # Required packages for AzureML execution, history, and data preparation.
-  - azureml-sdk[automl,explain]
+  - azureml-defaults
+  - azureml-train-automl
   - azureml-widgets
+  - azureml-explain-model
   - pandas_ml

View File

@@ -14,10 +14,13 @@ dependencies:
 - scikit-learn>=0.19.0,<=0.20.3
 - pandas>=0.22.0,<0.23.0
 - py-xgboost<=0.80
+- pyarrow>=0.11.0
 - pip:
   # Required packages for AzureML execution, history, and data preparation.
-  - azureml-sdk[automl,explain]
+  - azureml-defaults
+  - azureml-train-automl
   - azureml-widgets
+  - azureml-explain-model
   - pandas_ml

View File

@@ -69,22 +69,17 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"import json\n",
 "import logging\n",
 "\n",
 "from matplotlib import pyplot as plt\n",
-"import numpy as np\n",
 "import pandas as pd\n",
 "import os\n",
-"from sklearn import datasets\n",
-"import azureml.dataprep as dprep\n",
-"from sklearn.model_selection import train_test_split\n",
 "\n",
 "import azureml.core\n",
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
-"from azureml.train.automl import AutoMLConfig\n",
-"from azureml.train.automl.run import AutoMLRun"
+"from azureml.core.dataset import Dataset\n",
+"from azureml.train.automl import AutoMLConfig"
 ]
 },
 {
@@ -155,11 +150,12 @@
 " # Create the cluster.\n",
 " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
 " \n",
-" # Can poll for a minimum number of nodes and for a specific timeout.\n",
-" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
-" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
+"print('Checking cluster status...')\n",
+"# Can poll for a minimum number of nodes and for a specific timeout.\n",
+"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
+"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
 " \n",
-" # For a more detailed view of current AmlCompute status, use get_status()."
+"# For a more detailed view of current AmlCompute status, use get_status()."
 ]
 },
 {
@@ -200,11 +196,8 @@
 "# Set compute target to AmlCompute\n",
 "conda_run_config.target = compute_target\n",
 "conda_run_config.environment.docker.enabled = True\n",
-"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
 "\n",
-"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
-"\n",
-"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
+"cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
 "conda_run_config.environment.python.conda_dependencies = cd"
 ]
 },
@@ -224,11 +217,10 @@
 "outputs": [],
 "source": [
 "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n",
-"dflow = dprep.read_csv(data, infer_column_types=True)\n",
-"dflow.get_profile()\n",
-"X_train = dflow.drop_columns(columns=['y'])\n",
-"y_train = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
-"dflow.head()"
+"dataset = Dataset.Tabular.from_delimited_files(data)\n",
+"X_train = dataset.drop_columns(columns=['y'])\n",
+"y_train = dataset.keep_columns(columns=['y'], validate=True)\n",
+"dataset.take(5).to_pandas_dataframe()"
 ]
 },
 {
@@ -406,7 +398,7 @@
 "def run(rawdata):\n",
 " try:\n",
 " data = json.loads(rawdata)['data']\n",
-" data = numpy.array(data)\n",
+" data = np.array(data)\n",
 " result = model.predict(data)\n",
 " except Exception as e:\n",
 " result = str(e)\n",
@@ -443,7 +435,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
+"for p in ['azureml-train-automl', 'azureml-core']:\n",
 " print('{}\\t{}'.format(p, dependencies[p]))"
 ]
 },
@@ -453,10 +445,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.core.conda_dependencies import CondaDependencies\n",
-"\n",
 "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
-" pip_packages=['azureml-sdk[automl]'])\n",
+" pip_packages=['azureml-train-automl'])\n",
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
@@ -476,7 +466,7 @@
 " content = cefr.read()\n",
 "\n",
 "with open(conda_env_file_name, 'w') as cefw:\n",
-" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
+" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
 "\n",
 "# Substitute the actual model id in the script file.\n",
 "\n",
@@ -618,8 +608,6 @@
 "outputs": [],
 "source": [
 "# Load the bank marketing datasets.\n",
-"from sklearn.datasets import load_diabetes\n",
-"from sklearn.model_selection import train_test_split\n",
 "from numpy import array"
 ]
 },
@@ -630,11 +618,10 @@
 "outputs": [],
 "source": [
 "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv\"\n",
-"dflow = dprep.read_csv(data, infer_column_types=True)\n",
-"dflow.get_profile()\n",
-"X_test = dflow.drop_columns(columns=['y'])\n",
-"y_test = dflow.keep_columns(columns=['y'], validate_column_exists=True)\n",
-"dflow.head()"
+"dataset = Dataset.Tabular.from_delimited_files(data)\n",
+"X_test = dataset.drop_columns(columns=['y'])\n",
+"y_test = dataset.keep_columns(columns=['y'], validate=True)\n",
+"dataset.take(5).to_pandas_dataframe()"
 ]
 },
 {
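The dataprep-to-Dataset migration in the diff above repeats one pattern: split the feature columns from the label column by name with `drop_columns` and `keep_columns`. A pure-Python sketch of that split (the helper names mirror the `TabularDataset` methods, but the dict-based record and the `validate` behavior here are illustrative stand-ins; the real API operates lazily on tabular data, not on dicts):

```python
# Illustrative stand-in for TabularDataset.drop_columns / keep_columns:
# split one record's feature columns from its label column by name.
def drop_columns(record, columns):
    # Keep everything except the named columns (the feature side).
    return {k: v for k, v in record.items() if k not in columns}

def keep_columns(record, columns, validate=False):
    # Keep only the named columns (the label side); optionally check they exist.
    if validate:
        missing = [c for c in columns if c not in record]
        if missing:
            raise KeyError("missing column(s): {}".format(missing))
    return {k: v for k, v in record.items() if k in columns}

row = {"age": 30, "balance": 100, "y": "no"}
X = drop_columns(row, ["y"])                 # features only
y = keep_columns(row, ["y"], validate=True)  # label only
```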

View File

@@ -2,6 +2,8 @@ name: auto-ml-classification-bank-marketing
 dependencies:
 - pip:
   - azureml-sdk
+  - azureml-defaults
+  - azureml-explain-model
   - azureml-train-automl
   - azureml-widgets
   - matplotlib

View File

@@ -74,14 +74,12 @@
 "from matplotlib import pyplot as plt\n",
 "import pandas as pd\n",
 "import os\n",
-"from sklearn.model_selection import train_test_split\n",
-"import azureml.dataprep as dprep\n",
 "\n",
 "import azureml.core\n",
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
-"from azureml.train.automl import AutoMLConfig\n",
-"from azureml.train.automl.run import AutoMLRun"
+"from azureml.core.dataset import Dataset\n",
+"from azureml.train.automl import AutoMLConfig"
 ]
 },
 {
@@ -152,11 +150,12 @@
 " # Create the cluster.\n",
 " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
 " \n",
-" # Can poll for a minimum number of nodes and for a specific timeout.\n",
-" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
-" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
-" \n",
-" # For a more detailed view of current AmlCompute status, use get_status()."
+"print('Checking cluster status...')\n",
+"# Can poll for a minimum number of nodes and for a specific timeout.\n",
+"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
+"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
+"\n",
+"# For a more detailed view of current AmlCompute status, use get_status()."
 ]
 },
 {
@@ -197,11 +196,8 @@
 "# Set compute target to AmlCompute\n",
 "conda_run_config.target = compute_target\n",
 "conda_run_config.environment.docker.enabled = True\n",
-"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
 "\n",
-"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
-"\n",
-"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
+"cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
 "conda_run_config.environment.python.conda_dependencies = cd"
 ]
 },
@@ -211,7 +207,7 @@
 "source": [
 "### Load Data\n",
 "\n",
-"Here create the script to be run in azure compute for loading the data, load the credit card dataset into cards and store the Class column (y) in the y variable and store the remaining data in the x variable. Next split the data using train_test_split and return X_train and y_train for training the model."
+"Here create the script to be run in azure compute for loading the data, load the credit card dataset into cards and store the Class column (y) in the y variable and store the remaining data in the x variable. Next split the data using random_split and return X_train and y_train for training the model."
 ]
 },
 {
@@ -221,10 +217,9 @@
 "outputs": [],
 "source": [
 "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
-"dflow = dprep.read_csv(data, infer_column_types=True)\n",
-"dflow.get_profile()\n",
-"X = dflow.drop_columns(columns=['Class'])\n",
-"y = dflow.keep_columns(columns=['Class'], validate_column_exists=True)\n",
+"dataset = Dataset.Tabular.from_delimited_files(data)\n",
+"X = dataset.drop_columns(columns=['Class'])\n",
+"y = dataset.keep_columns(columns=['Class'], validate=True)\n",
 "X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
 "y_train, y_test = y.random_split(percentage=0.8, seed=223)"
 ]
@@ -447,7 +442,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
+"for p in ['azureml-train-automl', 'azureml-core']:\n",
 " print('{}\\t{}'.format(p, dependencies[p]))"
 ]
 },
@@ -458,7 +453,7 @@
 "outputs": [],
 "source": [
 "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
-" pip_packages=['azureml-sdk[automl]'])\n",
+" pip_packages=['azureml-train-automl'])\n",
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
@@ -478,7 +473,7 @@
 " content = cefr.read()\n",
 "\n",
 "with open(conda_env_file_name, 'w') as cefw:\n",
-" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
+" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
 "\n",
 "# Substitute the actual model id in the script file.\n",
 "\n",

View File

@@ -2,6 +2,8 @@ name: auto-ml-classification-credit-card-fraud
 dependencies:
 - pip:
   - azureml-sdk
+  - azureml-defaults
+  - azureml-explain-model
   - azureml-train-automl
   - azureml-widgets
   - matplotlib

View File

@@ -297,7 +297,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
+"for p in ['azureml-train-automl', 'azureml-core']:\n",
 " print('{}\\t{}'.format(p, dependencies[p]))"
 ]
 },
@@ -310,7 +310,7 @@
 "from azureml.core.conda_dependencies import CondaDependencies\n",
 "\n",
 "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
-" pip_packages=['azureml-sdk[automl]'])\n",
+" pip_packages=['azureml-train-automl'])\n",
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
@@ -330,7 +330,7 @@
 " content = cefr.read()\n",
 "\n",
 "with open(conda_env_file_name, 'w') as cefw:\n",
-" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
+" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
 "\n",
 "# Substitute the actual model id in the script file.\n",
 "\n",

View File

@@ -0,0 +1,509 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/dataprep-remote-execution/auto-ml-dataprep-remote-execution.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Load Data using `TabularDataset` for Remote Execution (AmlCompute)**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we showcase how you can use AzureML Dataset to load data for AutoML.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create a `TabularDataset` pointing to the training data.\n",
"2. Pass the `TabularDataset` to AutoML for a remote run."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for experiment\n",
"experiment_name = 'automl-dataset-remote-bai'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-dataprep-remote-bai'\n",
" \n",
"experiment = Experiment(ws, experiment_name)\n",
" \n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"dataset = Dataset.Tabular.from_delimited_files(example_data)\n",
"dataset.take(5).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the data\n",
"\n",
"You can peek the result of a `TabularDataset` at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only `j` records, which makes it fast even against large datasets.\n",
"\n",
"`TabularDataset` objects are immutable and are composed of a list of subsetting transformations (optional)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dataset.keep_columns(columns=['Primary Type'], validate=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"This creates a general AutoML settings object applicable for both local and remote runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : True,\n",
" \"verbosity\" : logging.INFO\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create or Attach an AmlCompute cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlc2\"\n",
"\n",
"found = False\n",
"\n",
"# Check if this compute target already exists in the workspace.\n",
"\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
"\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n",
"print('Checking cluster status...')\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"# For a more detailed view of current AmlCompute status, use get_status()."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"import pkg_resources\n",
"\n",
"# create a new RunConfig object\n",
"conda_run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n",
"\n",
"cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass Data with `TabularDataset` Objects\n",
"\n",
"The `TabularDataset` objects captured above can also be passed to the `submit` method for a remote run. AutoML will serialize the `TabularDataset` object and send it to the remote compute target. The `TabularDataset` will not be evaluated locally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" path = project_folder,\n",
" run_configuration=conda_run_config,\n",
" X = X,\n",
" y = y,\n",
" **automl_settings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pre-process cache cleanup\n",
"The preprocess data gets cache at user default file store. When the run is completed the cache can be cleaned by running below cell"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run.clean_preprocessor_cache()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cancelling Runs\n",
"You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Cancel the ongoing experiment and stop scheduling new iterations.\n",
"# remote_run.cancel()\n",
"\n",
"# Cancel iteration 1 and move onto iteration 2.\n",
"# remote_run.cancel_iteration(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(remote_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(remote_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
" \n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = remote_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the first iteration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 0\n",
"best_run, fitted_model = remote_run.get_output(iteration = iteration)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test\n",
"\n",
"#### Load Test Data\n",
"For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset_test = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n",
"\n",
"df_test = dataset_test.to_pandas_dataframe()\n",
"df_test = df_test[pd.notnull(df_test['Primary Type'])]\n",
"\n",
"y_test = df_test[['Primary Type']]\n",
"X_test = df_test.drop(['Primary Type', 'FBI Code'], axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing Our Best Fitted Model\n",
"We will use confusion matrix to see how our model works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pandas_ml import ConfusionMatrix\n",
"\n",
"ypred = fitted_model.predict(X_test)\n",
"\n",
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
"\n",
"print(cm)\n",
"\n",
"cm.plot()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -1,10 +1,10 @@
-name: regression-part2-automated-ml
+name: auto-ml-dataset-remote-execution
 dependencies:
 - pip:
   - azureml-sdk
-  - azureml-defaults
-  - azureml-explain-model
   - azureml-train-automl
   - azureml-widgets
+  - azureml-explain-model
   - matplotlib
   - pandas_ml
+  - seaborn


@@ -0,0 +1,402 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/dataprep/auto-ml-dataprep.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Automated Machine Learning\n",
"_**Load Data using `TabularDataset` for Local Execution**_\n",
"\n",
"## Contents\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Data](#Data)\n",
"1. [Train](#Train)\n",
"1. [Results](#Results)\n",
"1. [Test](#Test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"In this example we showcase how you can use AzureML Dataset to load data for AutoML.\n",
"\n",
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create a `TabularDataset` pointing to the training data.\n",
"2. Pass the `TabularDataset` to AutoML for a local run."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import logging\n",
"\n",
"import pandas as pd\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
" \n",
"# choose a name for experiment\n",
"experiment_name = 'automl-dataset-local'\n",
"# project folder\n",
"project_folder = './sample_projects/automl-dataset-local'\n",
" \n",
"experiment = Experiment(ws, experiment_name)\n",
" \n",
"output = {}\n",
"output['SDK version'] = azureml.core.VERSION\n",
"output['Subscription ID'] = ws.subscription_id\n",
"output['Workspace Name'] = ws.name\n",
"output['Resource Group'] = ws.resource_group\n",
"output['Location'] = ws.location\n",
"output['Project Directory'] = project_folder\n",
"output['Experiment Name'] = experiment.name\n",
"pd.set_option('display.max_colwidth', -1)\n",
"outputDf = pd.DataFrame(data = output, index = [''])\n",
"outputDf.T"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"dataset = Dataset.Tabular.from_delimited_files(example_data)\n",
"dataset.take(5).to_pandas_dataframe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the data\n",
"\n",
"You can peek the result of a `TabularDataset` at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only `j` records, which makes it fast even against large datasets.\n",
"\n",
"`TabularDataset` objects are immutable and are composed of a list of subsetting transformations (optional)."
]
},
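  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustrative sketch (the offsets below are arbitrary), you can combine `skip` and `take` to preview a slice of the data without evaluating the whole dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Evaluate and display only records 20-24 of the dataset.\n",
    "dataset.skip(20).take(5).to_pandas_dataframe()"
   ]
  },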
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dataset.keep_columns(columns=['Primary Type'], validate=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train\n",
"\n",
"This creates a general AutoML settings object applicable for both local and remote runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : True,\n",
" \"verbosity\" : logging.INFO\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pass Data with `TabularDataset` Objects\n",
"\n",
"The `TabularDataset` objects captured above can be passed to the `submit` method for a local run. AutoML will retrieve the results from the `TabularDataset` for model training."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl_config = AutoMLConfig(task = 'classification',\n",
" debug_log = 'automl_errors.log',\n",
" X = X,\n",
" y = y,\n",
" **automl_settings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output = True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Widget for Monitoring Runs\n",
"\n",
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
"\n",
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"RunDetails(local_run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve All Child Runs\n",
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"children = list(local_run.get_children())\n",
"metricslist = {}\n",
"for run in children:\n",
" properties = run.get_properties()\n",
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
" metricslist[int(properties['iteration'])] = metrics\n",
" \n",
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
"rundata"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the Best Model\n",
"\n",
"Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run, fitted_model = local_run.get_output()\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Best Model Based on Any Other Metric\n",
"Show the run and the model that has the smallest `log_loss` value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookup_metric = \"log_loss\"\n",
"best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Model from a Specific Iteration\n",
"Show the run and the model from the first iteration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iteration = 0\n",
"best_run, fitted_model = local_run.get_output(iteration = iteration)\n",
"print(best_run)\n",
"print(fitted_model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test\n",
"\n",
"#### Load Test Data\n",
"For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dataset_test = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n",
"\n",
"df_test = dataset_test.to_pandas_dataframe()\n",
"df_test = df_test[pd.notnull(df_test['Primary Type'])]\n",
"\n",
"y_test = df_test[['Primary Type']]\n",
"X_test = df_test.drop(['Primary Type', 'FBI Code'], axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Testing Our Best Fitted Model\n",
"We will use confusion matrix to see how our model works."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pandas_ml import ConfusionMatrix\n",
"\n",
"ypred = fitted_model.predict(X_test)\n",
"\n",
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
"\n",
"print(cm)\n",
"\n",
"cm.plot()"
]
}
],
"metadata": {
"authors": [
{
"name": "savitam"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,8 @@
+name: auto-ml-dataset
+dependencies:
+- pip:
+  - azureml-sdk
+  - azureml-train-automl
+  - azureml-widgets
+  - matplotlib
+  - pandas_ml


@@ -231,6 +231,7 @@
 "automl_config = AutoMLConfig(task='forecasting',\n",
 " debug_log='automl_nyc_energy_errors.log',\n",
 " primary_metric='normalized_root_mean_squared_error',\n",
+" blacklist_models = ['ExtremeRandomTrees'],\n",
 " iterations=10,\n",
 " iteration_timeout_minutes=5,\n",
 " X=X_train,\n",
@@ -481,7 +482,7 @@
 "automl_config_lags = AutoMLConfig(task='forecasting',\n",
 " debug_log='automl_nyc_energy_errors.log',\n",
 " primary_metric='normalized_root_mean_squared_error',\n",
-" blacklist_models=['ElasticNet'],\n",
+" blacklist_models=['ElasticNet','ExtremeRandomTrees','GradientBoosting'],\n",
 " iterations=10,\n",
 " iteration_timeout_minutes=10,\n",
 " X=X_train,\n",


@@ -244,7 +244,8 @@
 "|**X**|Training matrix of features as a pandas DataFrame, shape = [n_training_samples, n_features]|\n",
 "|**y**|Target values as a numpy.ndarray, shape = [n_training_samples, ]|\n",
 "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection|\n",
-"|**enable_ensembling**|Allow AutoML to create ensembles of the best performing models\n",
+"|**enable_voting_ensemble**|Allow AutoML to create a Voting ensemble of the best performing models\n",
+"|**enable_stack_ensemble**|Allow AutoML to create a Stack ensemble of the best performing models\n",
 "|**debug_log**|Log file path for writing debugging information\n",
 "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
 "|**time_column_name**|Name of the datetime column in the input data|\n",
@@ -273,7 +274,8 @@
 " X=X_train,\n",
 " y=y_train,\n",
 " n_cross_validations=3,\n",
-" enable_ensembling=False,\n",
+" enable_voting_ensemble=False,\n",
+" enable_stack_ensemble=False,\n",
 " path=project_folder,\n",
 " verbosity=logging.INFO,\n",
 " **time_series_settings)"
@@ -663,10 +665,10 @@
 "conda_env_file_name = 'fcast_env.yml'\n",
 "\n",
 "dependencies = ml_run.get_run_sdk_dependencies(iteration = best_iteration)\n",
-"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
+"for p in ['azureml-train-automl', 'azureml-core']:\n",
 " print('{}\\t{}'.format(p, dependencies[p]))\n",
 "\n",
-"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
+"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-train-automl'])\n",
 "\n",
 "myenv.save_to_file('.', conda_env_file_name)"
 ]
@@ -688,7 +690,7 @@
 " content = cefr.read()\n",
 "\n",
 "with open(conda_env_file_name, 'w') as cefw:\n",
-" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
+" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
 "\n",
 "# Substitute the actual model id in the script file.\n",
 "\n",


@@ -70,13 +70,12 @@
 "import numpy as np\n",
 "import pandas as pd\n",
 "import os\n",
-"from sklearn.model_selection import train_test_split\n",
-"import azureml.dataprep as dprep\n",
 " \n",
 "\n",
 "import azureml.core\n",
 "from azureml.core.experiment import Experiment\n",
 "from azureml.core.workspace import Workspace\n",
+"from azureml.core.dataset import Dataset\n",
 "from azureml.train.automl import AutoMLConfig"
 ]
 },
@@ -147,11 +146,12 @@
 " # Create the cluster.\n",
 " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
 " \n",
-" # Can poll for a minimum number of nodes and for a specific timeout.\n",
-" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
-" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
+"print('Checking cluster status...')\n",
+"# Can poll for a minimum number of nodes and for a specific timeout.\n",
+"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
+"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
 " \n",
-" # For a more detailed view of current AmlCompute status, use get_status()."
+"# For a more detailed view of current AmlCompute status, use get_status()."
 ]
 },
 {
@@ -192,11 +192,8 @@
 "# Set compute target to AmlCompute\n",
 "conda_run_config.target = compute_target\n",
 "conda_run_config.environment.docker.enabled = True\n",
-"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
 "\n",
-"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n",
-"\n",
-"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy'])\n",
+"cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
 "conda_run_config.environment.python.conda_dependencies = cd"
 ]
 },
@@ -206,7 +203,7 @@
 "source": [
 "### Load Data\n",
 "\n",
-"Here create the script to be run in azure compute for loading the data, load the concrete strength dataset into the X and y variables. Next, split the data using train_test_split and return X_train and y_train for training the model. Finally, return X_train and y_train for training the model."
+"Here create the script to be run in azure compute for loading the data, load the concrete strength dataset into the X and y variables. Next, split the data using random_split and return X_train and y_train for training the model. Finally, return X_train and y_train for training the model."
 ]
 },
 {
@@ -216,13 +213,12 @@
 "outputs": [],
 "source": [
 "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\"\n",
-"dflow = dprep.read_csv(data, infer_column_types=True)\n",
-"dflow.get_profile()\n",
-"X = dflow.drop_columns(columns=['CONCRETE'])\n",
-"y = dflow.keep_columns(columns=['CONCRETE'], validate_column_exists=True)\n",
+"dataset = Dataset.Tabular.from_delimited_files(data)\n",
+"X = dataset.drop_columns(columns=['CONCRETE'])\n",
+"y = dataset.keep_columns(columns=['CONCRETE'], validate=True)\n",
 "X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
 "y_train, y_test = y.random_split(percentage=0.8, seed=223) \n",
-"dflow.head()"
+"dataset.take(5).to_pandas_dataframe()"
 ]
 },
 {
@@ -484,7 +480,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
+"for p in ['azureml-train-automl', 'azureml-core']:\n",
 " print('{}\\t{}'.format(p, dependencies[p]))"
 ]
 },
@@ -494,9 +490,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.core.conda_dependencies import CondaDependencies\n",
-"\n",
-"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
+"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-train-automl'])\n",
 "\n",
 "conda_env_file_name = 'myenv.yml'\n",
 "myenv.save_to_file('.', conda_env_file_name)"
@@ -516,7 +510,7 @@
 " content = cefr.read()\n",
 "\n",
 "with open(conda_env_file_name, 'w') as cefw:\n",
-" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
+" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
 "\n",
 "# Substitute the actual model id in the script file.\n",
 "\n",


@@ -2,6 +2,8 @@ name: auto-ml-regression-concrete-strength
 dependencies:
 - pip:
   - azureml-sdk
+  - azureml-defaults
+  - azureml-explain-model
   - azureml-train-automl
   - azureml-widgets
   - matplotlib


@@ -70,13 +70,12 @@
"import numpy as np\n", "import numpy as np\n",
"import pandas as pd\n", "import pandas as pd\n",
"import os\n", "import os\n",
"from sklearn.model_selection import train_test_split\n",
"import azureml.dataprep as dprep\n",
" \n", " \n",
"\n", "\n",
"import azureml.core\n", "import azureml.core\n",
"from azureml.core.experiment import Experiment\n", "from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n", "from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig" "from azureml.train.automl import AutoMLConfig"
] ]
}, },
@@ -147,11 +146,12 @@
" # Create the cluster.\n", " # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n", " \n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n", "print('Checking cluster status...')\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n", "# Can poll for a minimum number of nodes and for a specific timeout.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", "# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n", " \n",
" # For a more detailed view of current AmlCompute status, use get_status()." "# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -192,11 +192,8 @@
"# Set compute target to AmlCompute\n", "# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n", "conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n", "conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n", "\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n", "cd = CondaDependencies.create(conda_packages=['numpy', 'py-xgboost<=0.80'])\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy'])\n",
"conda_run_config.environment.python.conda_dependencies = cd" "conda_run_config.environment.python.conda_dependencies = cd"
] ]
}, },
@@ -206,7 +203,7 @@
"source": [ "source": [
"### Load Data\n", "### Load Data\n",
"\n", "\n",
"Here create the script to be run in azure compute for loading the data, load the hardware dataset into the X and y variables. Next split the data using train_test_split and return X_train and y_train for training the model." "Here create the script to be run in azure compute for loading the data, load the hardware dataset into the X and y variables. Next split the data using random_split and return X_train and y_train for training the model."
] ]
}, },
{ {
@@ -216,13 +213,12 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n", "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
"dflow = dprep.read_csv(data, infer_column_types=True)\n", "dataset = Dataset.Tabular.from_delimited_files(data)\n",
"dflow.get_profile()\n", "X = dataset.drop_columns(columns=['ERP'])\n",
"X = dflow.drop_columns(columns=['ERP'])\n", "y = dataset.keep_columns(columns=['ERP'], validate=True)\n",
"y = dflow.keep_columns(columns=['ERP'], validate_column_exists=True)\n",
"X_train, X_test = X.random_split(percentage=0.8, seed=223)\n", "X_train, X_test = X.random_split(percentage=0.8, seed=223)\n",
"y_train, y_test = y.random_split(percentage=0.8, seed=223) \n", "y_train, y_test = y.random_split(percentage=0.8, seed=223)\n",
"dflow.head()" "dataset.take(5).to_pandas_dataframe()"
] ]
}, },
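The cell above splits X and y separately, relying on the same `seed` passed to both `random_split` calls to keep the rows aligned. A minimal pure-Python sketch of why that works (`random_split` below is a stand-in for illustration, not the SDK's implementation):

```python
import random

def random_split(rows, percentage, seed):
    # Mimics the idea behind TabularDataset.random_split: the same seed
    # produces the same sequence of per-row draws, so splitting X and y
    # separately with one seed sends matching rows to matching partitions.
    rng = random.Random(seed)
    first, second = [], []
    for row in rows:
        (first if rng.random() < percentage else second).append(row)
    return first, second

X = [[i, i * 2] for i in range(10)]   # toy features
y = [i % 2 for i in range(10)]        # toy labels, row-aligned with X

X_train, X_test = random_split(X, 0.8, seed=223)
y_train, y_test = random_split(y, 0.8, seed=223)

# Same seed -> identical draw sequence -> train/test row counts match.
assert len(X_train) == len(y_train) and len(X_test) == len(y_test)
```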
{ {
@@ -502,7 +498,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n", "for p in ['azureml-train-automl', 'azureml-core']:\n",
" print('{}\\t{}'.format(p, dependencies[p]))" " print('{}\\t{}'.format(p, dependencies[p]))"
] ]
}, },
@@ -512,7 +508,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n", "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-train-automl'])\n",
"\n", "\n",
"conda_env_file_name = 'myenv.yml'\n", "conda_env_file_name = 'myenv.yml'\n",
"myenv.save_to_file('.', conda_env_file_name)" "myenv.save_to_file('.', conda_env_file_name)"
@@ -532,7 +528,7 @@
" content = cefr.read()\n", " content = cefr.read()\n",
"\n", "\n",
"with open(conda_env_file_name, 'w') as cefw:\n", "with open(conda_env_file_name, 'w') as cefw:\n",
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n", " cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
"\n", "\n",
"# Substitute the actual model id in the script file.\n", "# Substitute the actual model id in the script file.\n",
"\n", "\n",


@@ -2,6 +2,8 @@ name: auto-ml-regression-hardware-performance
dependencies: dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib


@@ -73,10 +73,7 @@
"source": [ "source": [
"import logging\n", "import logging\n",
"import os\n", "import os\n",
"import csv\n",
"\n", "\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n", "import pandas as pd\n",
"from sklearn import datasets\n", "from sklearn import datasets\n",
"from sklearn.model_selection import train_test_split\n", "from sklearn.model_selection import train_test_split\n",
@@ -84,8 +81,8 @@
"import azureml.core\n", "import azureml.core\n",
"from azureml.core.experiment import Experiment\n", "from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n", "from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n", "from azureml.core.dataset import Dataset\n",
"import azureml.dataprep as dprep" "from azureml.train.automl import AutoMLConfig"
] ]
}, },
{ {
@@ -137,7 +134,7 @@
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute import ComputeTarget\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n", "amlcompute_cluster_name = \"automlc2\"\n",
"\n", "\n",
"found = False\n", "found = False\n",
"# Check if this compute target already exists in the workspace.\n", "# Check if this compute target already exists in the workspace.\n",
@@ -156,11 +153,12 @@
" # Create the cluster.\\n\",\n", " # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n", "\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n", "print('Checking cluster status...')\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n", "# Can poll for a minimum number of nodes and for a specific timeout.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", "# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n", "\n",
" # For a more detailed view of current AmlCompute status, use get_status()." "# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -236,11 +234,8 @@
"# Set compute target to AmlCompute\n", "# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n", "conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n", "conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n", "\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n", "cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd" "conda_run_config.environment.python.conda_dependencies = cd"
] ]
}, },
@@ -248,9 +243,9 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Dprep reference\n", "### Creating a TabularDataset\n",
"\n", "\n",
"Defined X and y as dprep references, which are passed to automated machine learning in the AutoMLConfig." "Define X and y as `TabularDataset`s, which are passed to automated machine learning in the AutoMLConfig."
] ]
}, },
{ {
@@ -259,8 +254,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X = dprep.read_csv(path=ds.path('irisdata/X_train.csv'), infer_column_types=True)\n", "X = Dataset.Tabular.from_delimited_files(path=ds.path('irisdata/X_train.csv'))\n",
"y = dprep.read_csv(path=ds.path('irisdata/y_train.csv'), infer_column_types=True)" "y = Dataset.Tabular.from_delimited_files(path=ds.path('irisdata/y_train.csv'))"
] ]
}, },
{ {
@@ -498,8 +493,7 @@
" res_path = 'onnx_resource.json'\n", " res_path = 'onnx_resource.json'\n",
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n", " run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
" with open(res_path) as f:\n", " with open(res_path) as f:\n",
" onnx_res = json.load(f)\n", " return json.load(f)\n",
" return onnx_res\n",
"\n", "\n",
"if onnxrt_present and python_version_compatible: \n", "if onnxrt_present and python_version_compatible: \n",
" mdl_bytes = onnx_mdl.SerializeToString()\n", " mdl_bytes = onnx_mdl.SerializeToString()\n",


@@ -2,6 +2,8 @@ name: auto-ml-remote-amlcompute-with-onnx
dependencies: dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib


@@ -74,7 +74,6 @@
"source": [ "source": [
"import logging\n", "import logging\n",
"import os\n", "import os\n",
"import csv\n",
"\n", "\n",
"from matplotlib import pyplot as plt\n", "from matplotlib import pyplot as plt\n",
"import numpy as np\n", "import numpy as np\n",
@@ -84,8 +83,8 @@
"import azureml.core\n", "import azureml.core\n",
"from azureml.core.experiment import Experiment\n", "from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n", "from azureml.core.workspace import Workspace\n",
"from azureml.train.automl import AutoMLConfig\n", "from azureml.core.dataset import Dataset\n",
"import azureml.dataprep as dprep" "from azureml.train.automl import AutoMLConfig"
] ]
}, },
{ {
@@ -137,7 +136,7 @@
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute import ComputeTarget\n",
"\n", "\n",
"# Choose a name for your cluster.\n", "# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpu-cluster\"\n", "amlcompute_cluster_name = \"automlc2\"\n",
"\n", "\n",
"found = False\n", "found = False\n",
"# Check if this compute target already exists in the workspace.\n", "# Check if this compute target already exists in the workspace.\n",
@@ -156,11 +155,12 @@
" # Create the cluster.\\n\",\n", " # Create the cluster.\\n\",\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
"\n", "\n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n", "print('Checking cluster status...')\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n", "# Can poll for a minimum number of nodes and for a specific timeout.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", "# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n", "\n",
" # For a more detailed view of current AmlCompute status, use get_status()." "# For a more detailed view of current AmlCompute status, use get_status()."
] ]
}, },
{ {
@@ -210,11 +210,8 @@
"# Set compute target to AmlCompute\n", "# Set compute target to AmlCompute\n",
"conda_run_config.target = compute_target\n", "conda_run_config.target = compute_target\n",
"conda_run_config.environment.docker.enabled = True\n", "conda_run_config.environment.docker.enabled = True\n",
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"\n", "\n",
"dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n", "cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n",
"\n",
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]', dprep_dependency], conda_packages=['numpy','py-xgboost<=0.80'])\n",
"conda_run_config.environment.python.conda_dependencies = cd" "conda_run_config.environment.python.conda_dependencies = cd"
] ]
}, },
@@ -222,9 +219,9 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Dprep reference\n", "### Creating TabularDataset\n",
"\n", "\n",
"Defined X and y as dprep references, which are passed to automated machine learning in the AutoMLConfig." "Define X and y as `TabularDataset`s, which are passed to Automated ML in the AutoMLConfig. By default, `from_delimited_files` sets `infer_column_types` to true, which infers the column types automatically. If you wish to set the column types manually, use the `set_column_types` argument to specify the type of each column."
] ]
}, },
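The difference between inferred and declared column types can be sketched in plain Python (column names below are hypothetical, and this is an analogy rather than the SDK's code):

```python
# Analogy for set_column_types: instead of inferring a column's type from
# sampled values, apply a declared converter per column, just as
# set_column_types declares a type for each named column.
set_column_types = {"x1": float, "label": int, "name": str}

raw_row = {"x1": "5.1", "label": "4", "name": "setosa"}  # values as read from CSV
typed_row = {col: set_column_types[col](val) for col, val in raw_row.items()}

assert typed_row == {"x1": 5.1, "label": 4, "name": "setosa"}
```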
{ {
@@ -233,8 +230,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X = dprep.read_csv(path=ds.path('digitsdata/X_train.csv'), infer_column_types=True)\n", "X = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/X_train.csv'))\n",
"y = dprep.read_csv(path=ds.path('digitsdata/y_train.csv'), infer_column_types=True)" "y = Dataset.Tabular.from_delimited_files(path=ds.path('digitsdata/y_train.csv'))"
] ]
}, },
{ {


@@ -2,6 +2,8 @@ name: auto-ml-remote-amlcompute
dependencies: dependencies:
- pip: - pip:
- azureml-sdk - azureml-sdk
- azureml-defaults
- azureml-explain-model
- azureml-train-automl - azureml-train-automl
- azureml-widgets - azureml-widgets
- matplotlib - matplotlib


@@ -342,7 +342,6 @@
" n_cross_validations = n_cross_validations, \r\n", " n_cross_validations = n_cross_validations, \r\n",
" preprocess = preprocess,\r\n", " preprocess = preprocess,\r\n",
" verbosity = logging.INFO, \r\n", " verbosity = logging.INFO, \r\n",
" enable_ensembling = False,\r\n",
" X = X_train, \r\n", " X = X_train, \r\n",
" y = y_train, \r\n", " y = y_train, \r\n",
" path = project_folder,\r\n", " path = project_folder,\r\n",


@@ -314,25 +314,18 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Load Training Data Using DataPrep" "## Load Training Data Using Dataset"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Automated ML takes a Dataflow as input.\n", "Automated ML takes a `TabularDataset` as input.\n",
"\n", "\n",
"If you are familiar with Pandas and have done your data preparation work in Pandas already, you can use the `read_pandas_dataframe` method in dprep to convert the DataFrame to a Dataflow.\n", "You are free to use the data preparation libraries/tools of your choice to do the required preparation. Once you are done, you can write the prepared data to a datastore and create a TabularDataset from it.\n",
"```python\n",
"df = pd.read_csv(...)\n",
"# apply some transforms\n",
"dprep.read_pandas_dataframe(df, temp_folder='/path/accessible/by/both/driver/and/worker')\n",
"```\n",
"\n", "\n",
"If you just need to ingest data without doing any preparation, you can directly use AzureML Data Prep (Data Prep) to do so. The code below demonstrates this scenario. Data Prep also has data preparation capabilities, we have many [sample notebooks](https://github.com/Microsoft/AMLDataPrepDocs) demonstrating the capabilities.\n", "You will get the datastore you registered previously and pass it to Dataset for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
"\n",
"You will get the datastore you registered previously and pass it to Data Prep for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
] ]
}, },
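The round trip described above (prepare the data, write it out, read it back as a dataset) can be sketched locally with the standard library; the file path and column names here are illustrative only. With AzureML you would upload the file to a datastore and call `Dataset.Tabular.from_delimited_files` on its path instead of re-reading it locally:

```python
import csv
import os
import tempfile

# Prepare some rows, write them to CSV, then read them back -- the same
# shape as the datastore round trip, done entirely on the local disk.
rows = [{"pixel0": "0", "label": "1"}, {"pixel0": "15", "label": "7"}]
path = os.path.join(tempfile.mkdtemp(), "X.csv")

with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["pixel0", "label"])
    writer.writeheader()
    writer.writerows(rows)

with open(path, newline="") as f:
    round_tripped = list(csv.DictReader(f))

assert round_tripped == rows  # order and values survive the round trip
```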
{ {
@@ -341,21 +334,21 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import azureml.dataprep as dprep\n", "from azureml.core.dataset import Dataset\n",
"from azureml.data.datapath import DataPath\n", "from azureml.data.datapath import DataPath\n",
"\n", "\n",
"datastore = Datastore.get(workspace = ws, datastore_name = datastore_name)\n", "datastore = Datastore.get(workspace = ws, datastore_name = datastore_name)\n",
"\n", "\n",
"X_train = dprep.read_csv(datastore.path('X.csv'))\n", "X_train = Dataset.Tabular.from_delimited_files(datastore.path('X.csv'))\n",
"y_train = dprep.read_csv(datastore.path('y.csv')).to_long(dprep.ColumnSelector(term='.*', use_regex = True))" "y_train = Dataset.Tabular.from_delimited_files(datastore.path('y.csv'))"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Review the Data Preparation Result\n", "## Review the TabularDataset\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only j records for all the steps in the Dataflow, which makes it fast even against large datasets." "You can peek the result of a TabularDataset at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only j records for all the steps in the TabularDataset, which makes it fast even against large datasets."
] ]
}, },
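The reason `skip(i)` and `take(j)` stay fast on large datasets is lazy evaluation: only the requested records flow through the pipeline. A small stdlib sketch of the same idea (an analogy, not the TabularDataset implementation):

```python
import itertools

evaluated = []

def expensive_transform(record):
    # Track how many records are actually materialized by the pipeline.
    evaluated.append(record)
    return record * 10

records = range(1_000_000)                  # stands in for a large dataset
pipeline = (expensive_transform(r) for r in records)  # lazy: nothing runs yet

# Equivalent of skip(2) followed by take(5): pull records 2..6 only.
preview = list(itertools.islice(pipeline, 2, 7))

assert preview == [20, 30, 40, 50, 60]
assert len(evaluated) == 7                  # only 7 of 1,000,000 records evaluated
```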
{ {
@@ -364,7 +357,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X_train.get_profile()" "X_train.take(5).to_pandas_dataframe()"
] ]
}, },
{ {
@@ -373,7 +366,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"y_train.get_profile()" "y_train.take(5).to_pandas_dataframe()"
] ]
}, },
{ {


@@ -331,25 +331,18 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Load Training Data Using DataPrep" "## Load Training Data Using Dataset"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Automated ML takes a Dataflow as input.\n", "Automated ML takes a `TabularDataset` as input.\n",
"\n", "\n",
"If you are familiar with Pandas and have done your data preparation work in Pandas already, you can use the `read_pandas_dataframe` method in dprep to convert the DataFrame to a Dataflow.\n", "You are free to use the data preparation libraries/tools of your choice to do the required preparation. Once you are done, you can write the prepared data to a datastore and create a TabularDataset from it.\n",
"```python\n",
"df = pd.read_csv(...)\n",
"# apply some transforms\n",
"dprep.read_pandas_dataframe(df, temp_folder='/path/accessible/by/both/driver/and/worker')\n",
"```\n",
"\n", "\n",
"If you just need to ingest data without doing any preparation, you can directly use AzureML Data Prep (Data Prep) to do so. The code below demonstrates this scenario. Data Prep also has data preparation capabilities, we have many [sample notebooks](https://github.com/Microsoft/AMLDataPrepDocs) demonstrating the capabilities.\n", "You will get the datastore you registered previously and pass it to Dataset for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
"\n",
"You will get the datastore you registered previously and pass it to Data Prep for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. "
] ]
}, },
{ {
@@ -358,21 +351,21 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"import azureml.dataprep as dprep\n", "from azureml.core.dataset import Dataset\n",
"from azureml.data.datapath import DataPath\n", "from azureml.data.datapath import DataPath\n",
"\n", "\n",
"datastore = Datastore.get(workspace = ws, datastore_name = datastore_name)\n", "datastore = Datastore.get(workspace = ws, datastore_name = datastore_name)\n",
"\n", "\n",
"X_train = dprep.read_csv(datastore.path('X.csv'))\n", "X_train = Dataset.Tabular.from_delimited_files(datastore.path('X.csv'))\n",
"y_train = dprep.read_csv(datastore.path('y.csv')).to_long(dprep.ColumnSelector(term='.*', use_regex = True))" "y_train = Dataset.Tabular.from_delimited_files(datastore.path('y.csv'))"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Review the Data Preparation Result\n", "## Review the TabularDataset\n",
"You can peek the result of a Dataflow at any range using skip(i) and head(j). Doing so evaluates only j records for all the steps in the Dataflow, which makes it fast even against large datasets." "You can peek the result of a TabularDataset at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only j records for all the steps in the TabularDataset, which makes it fast even against large datasets."
] ]
}, },
{ {
@@ -381,7 +374,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X_train.get_profile()" "X_train.take(5).to_pandas_dataframe()"
] ]
}, },
{ {
@@ -390,7 +383,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"y_train.get_profile()" "y_train.take(5).to_pandas_dataframe()"
] ]
}, },
{ {


@@ -115,6 +115,36 @@
" workspace=ws)" " workspace=ws)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can now create and/or use an Environment object when deploying a Webservice. The Environment may have been previously registered with your Workspace, or it can be registered as part of the Webservice deployment. Note, however, that only Environments created with azureml-defaults version 1.0.48 or later work with this new handling.\n",
"\n",
"More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)."
]
},
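A minimal conda specification that satisfies this requirement might look like the following (the file name matches the `from_conda_specification` call below; the Python version and package pins are illustrative):

```yaml
# Hypothetical myenv.yml -- azureml-defaults >= 1.0.48 is required for the
# new Environment-based deployment handling.
name: deploytocloudenv
dependencies:
- python=3.6.2
- scikit-learn
- pip:
  - azureml-defaults>=1.0.48
```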
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"\n",
"env = Environment.from_conda_specification(name='deploytocloudenv', file_path='myenv.yml')\n",
"\n",
"# This is optional at this point\n",
"# env.register(workspace=ws)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -153,10 +183,7 @@
"source": [ "source": [
"from azureml.core.model import InferenceConfig\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"inference_config = InferenceConfig(runtime= \"python\", \n", "inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)"
" entry_script=\"score.py\",\n",
" conda_file=\"myenv.yml\", \n",
" extra_docker_file_steps=\"helloworld.txt\")"
] ]
}, },
{ {


@@ -336,7 +336,7 @@
" num_replicas=1,\n", " num_replicas=1,\n",
" auth_enabled = False)\n", " auth_enabled = False)\n",
"\n", "\n",
"aks_service_name ='my-aks-service'\n", "aks_service_name ='my-aks-service-3'\n",
"\n", "\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n", "aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n", " name = aks_service_name,\n",


@@ -404,7 +404,7 @@
" num_replicas=1,\n", " num_replicas=1,\n",
" auth_enabled = False)\n", " auth_enabled = False)\n",
"\n", "\n",
"aks_service_name ='my-aks-service'\n", "aks_service_name ='my-aks-service-1'\n",
"\n", "\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n", "aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n", " name = aks_service_name,\n",


@@ -694,7 +694,7 @@
" num_replicas=1,\n", " num_replicas=1,\n",
" auth_enabled = False)\n", " auth_enabled = False)\n",
"\n", "\n",
"aks_service_name ='my-aks-service'\n", "aks_service_name ='my-aks-service-2'\n",
"\n", "\n",
"aks_service = Webservice.deploy_from_image(workspace = ws,\n", "aks_service = Webservice.deploy_from_image(workspace = ws,\n",
" name = aks_service_name,\n", " name = aks_service_name,\n",


@@ -22,7 +22,7 @@
"If you want to log custom traces, you will follow the standard deployment process for AKS and you will:\n", "If you want to log custom traces, you will follow the standard deployment process for AKS and you will:\n",
"1. Update scoring file.\n", "1. Update scoring file.\n",
"2. Update AKS configuration.\n", "2. Update AKS configuration.\n",
"3. Build new image and deploy it. " "3. Deploy the model with this new configuration. "
] ]
}, },
{ {
@@ -178,7 +178,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"## 6. Create your new Image" "## 6. Create Inference Configuration"
] ]
}, },
{ {
@@ -187,22 +187,11 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.image import ContainerImage\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", "inference_config = InferenceConfig(runtime= \"python\", \n",
" runtime = \"python\",\n", " entry_script=\"score.py\",\n",
" conda_file = \"myenv.yml\",\n", " conda_file=\"myenv.yml\")"
" description = \"Image with ridge regression model\",\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
" )\n",
"\n",
"image = ContainerImage.create(name = \"myimage1\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
] ]
}, },
{ {
@@ -220,7 +209,7 @@
"source": [ "source": [
"from azureml.core.webservice import AciWebservice\n", "from azureml.core.webservice import AciWebservice\n",
"\n", "\n",
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", "aci_deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
" memory_gb = 1, \n", " memory_gb = 1, \n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", " tags = {'area': \"diabetes\", 'type': \"regression\"}, \n",
" description = 'Predict diabetes using regression model',\n", " description = 'Predict diabetes using regression model',\n",
@@ -236,11 +225,7 @@
"from azureml.core.webservice import Webservice\n", "from azureml.core.webservice import Webservice\n",
"\n", "\n",
"aci_service_name = 'my-aci-service-4'\n", "aci_service_name = 'my-aci-service-4'\n",
"print(aci_service_name)\n", "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aci_deployment_config)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"aci_service.wait_for_deployment(True)\n", "aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)" "print(aci_service.state)"
] ]
@@ -361,7 +346,7 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"#Set the web service configuration\n", "#Set the web service configuration\n",
"aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)" "aks_deployment_config = AksWebservice.deploy_configuration(enable_app_insights=True)"
] ]
}, },
{ {
@@ -379,12 +364,12 @@
"source": [ "source": [
"if aks_target.provisioning_state== \"Succeeded\": \n", "if aks_target.provisioning_state== \"Succeeded\": \n",
" aks_service_name ='aks-w-dc5'\n", " aks_service_name ='aks-w-dc5'\n",
" aks_service = Webservice.deploy_from_image(workspace = ws, \n", " aks_service = Model.deploy(ws,\n",
" name = aks_service_name,\n", " aks_service_name, \n",
" image = image,\n", " [model], \n",
" deployment_config = aks_config,\n", " inference_config, \n",
" deployment_target = aks_target\n", " aks_deployment_config, \n",
" )\n", " deployment_target = aks_target) \n",
" aks_service.wait_for_deployment(show_output = True)\n", " aks_service.wait_for_deployment(show_output = True)\n",
" print(aks_service.state)\n", " print(aks_service.state)\n",
"else:\n", "else:\n",
@@ -464,7 +449,6 @@
"%%time\n", "%%time\n",
"aks_service.delete()\n", "aks_service.delete()\n",
"aci_service.delete()\n", "aci_service.delete()\n",
"image.delete()\n",
"model.delete()" "model.delete()"
] ]
} }


@@ -243,7 +243,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create container image\n", "### Setting up inference configuration\n",
"First we create a YAML file that specifies which dependencies we would like to see in our container." "First we create a YAML file that specifies which dependencies we would like to see in our container."
] ]
}, },
@@ -265,7 +265,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Then we have Azure ML create the container. This step will likely take a few minutes." "Then we create the inference configuration."
] ]
}, },
{ {
@@ -274,48 +274,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.image import ContainerImage\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", "inference_config = InferenceConfig(runtime= \"python\", \n",
" runtime = \"python\",\n", " entry_script=\"score.py\",\n",
" conda_file = \"myenv.yml\",\n", " conda_file=\"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n", " extra_docker_file_steps = \"Dockerfile\")"
" description = \"TinyYOLO ONNX Demo\",\n",
" tags = {\"demo\": \"onnx\"}\n",
" )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxyolo\",\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"In case you need to debug your code, the next line of code accesses the log file." "### Deploy the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
] ]
}, },
{ {
@@ -336,7 +307,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The following cell will likely take a few minutes to run as well." "The following cell will take a few minutes to run as the model gets packaged up and deployed to ACI."
] ]
}, },
{ {
@@ -348,14 +319,9 @@
"from azureml.core.webservice import Webservice\n", "from azureml.core.webservice import Webservice\n",
"from random import randint\n", "from random import randint\n",
"\n", "\n",
"aci_service_name = 'onnx-tinyyolo'+str(randint(0,100))\n", "aci_service_name = 'my-aci-service-15ad'\n",
"print(\"Service\", aci_service_name)\n", "print(\"Service\", aci_service_name)\n",
"\n", "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n", "aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)" "print(aci_service.state)"
] ]


@@ -54,7 +54,7 @@
"\n", "\n",
"### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", "### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
"\n", "\n",
"In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)." "In the following lines of code, we download [the trained ONNX Emotion FER+ model and corresponding test data](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus) and place them in the same folder as this tutorial notebook. For more information about the FER+ dataset, please visit Microsoft Researcher Emad Barsoum's [FER+ source data repository](https://github.com/ebarsoum/FERPlus)."
] ]
}, },
{ {
@@ -176,7 +176,7 @@
"source": [ "source": [
"### ONNX FER+ Model Methodology\n", "### ONNX FER+ Model Methodology\n",
"\n", "\n",
"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/emotion_ferplus) in the ONNX model zoo.\n", "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the well-known FER+ data set, provided as part of the [trained Emotion Recognition model](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus) in the ONNX model zoo.\n",
"\n", "\n",
"The original Facial Emotion Recognition (FER) Dataset was released in 2013 by Pierre-Luc Carrier and Aaron Courville as part of a [Kaggle Competition](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 croud sourced reviewers, creating a more accurate basis for ground truth. \n", "The original Facial Emotion Recognition (FER) Dataset was released in 2013 by Pierre-Luc Carrier and Aaron Courville as part of a [Kaggle Competition](https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data), but some of the labels are not entirely appropriate for the expression. In the FER+ Dataset, each photo was evaluated by at least 10 croud sourced reviewers, creating a more accurate basis for ground truth. \n",
"\n", "\n",
@@ -341,9 +341,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create the Container Image\n", "### Setup inference configuration"
"\n",
"This step will likely take a few minutes."
] ]
}, },
{ {
@@ -352,48 +350,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.image import ContainerImage\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", "inference_config = InferenceConfig(runtime= \"python\", \n",
" runtime = \"python\",\n", " entry_script=\"score.py\",\n",
" conda_file = \"myenv.yml\",\n", " conda_file=\"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n", " extra_docker_file_steps = \"Dockerfile\")"
" description = \"Emotion ONNX Runtime container\",\n",
" tags = {\"demo\": \"onnx\"})\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnximage\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"In case you need to debug your code, the next line of code accesses the log file." "### Deploy the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all done specifying what we want our virtual machine to do. Let's configure and deploy our container image.\n",
"\n",
"### Deploy the container image"
] ]
}, },
{ {
@@ -410,6 +379,13 @@
" description = 'ONNX for emotion recognition model')" " description = 'ONNX for emotion recognition model')"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run as well."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -420,23 +396,11 @@
"\n", "\n",
"aci_service_name = 'onnx-demo-emotion'\n", "aci_service_name = 'onnx-demo-emotion'\n",
"print(\"Service\", aci_service_name)\n", "print(\"Service\", aci_service_name)\n",
"\n", "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n", "aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)" "print(aci_service.state)"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following cell will likely take a few minutes to run as well."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -470,7 +434,7 @@
"\n", "\n",
"### Useful Helper Functions\n", "### Useful Helper Functions\n",
"\n", "\n",
"We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/emotion_ferplus)." "We preprocess and postprocess our data (see score.py file) using the helper functions specified in the [ONNX FER+ Model page in the Model Zoo repository](https://github.com/onnx/models/tree/master/vision/body_analysis/emotion_ferplus)."
] ]
}, },
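For reference, the postprocessing those helpers perform amounts to a softmax over the raw emotion scores followed by a ranking. A minimal pure-Python sketch of that idea follows; the eight-class label list is an assumption based on the FER+ dataset description, not copied from the notebook's score.py:

```python
import math

# Emotion labels assumed from the FER+ dataset description (order may differ in the real model).
EMOTIONS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "disgust", "fear", "contempt"]

def softmax(scores):
    """Convert raw model scores into probabilities that sum to 1."""
    exps = [math.exp(s - max(scores)) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def postprocess(scores):
    """Return emotion labels ranked from most to least likely."""
    probs = softmax(scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [EMOTIONS[i] for i in ranked]

print(postprocess([0.1, 3.2, 0.5, 0.2, 0.1, 0.0, 0.4, 0.1]))  # highest-scoring label comes first
```

The real helpers on the Model Zoo page operate on NumPy arrays; this sketch keeps plain lists so the logic is visible on its own.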
{ {


@@ -54,7 +54,7 @@
"\n", "\n",
"### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n", "### 3. Download sample data and pre-trained ONNX model from ONNX Model Zoo.\n",
"\n", "\n",
"In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/)." "In the following lines of code, we download [the trained ONNX MNIST model and corresponding test data](https://github.com/onnx/models/tree/master/vision/classification/mnist) and place them in the same folder as this tutorial notebook. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/)."
] ]
}, },
{ {
@@ -187,7 +187,7 @@
"source": [ "source": [
"### ONNX MNIST Model Methodology\n", "### ONNX MNIST Model Methodology\n",
"\n", "\n",
"The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous MNIST data set, provided as part of the [trained MNIST model](https://github.com/onnx/models/tree/master/mnist) in the ONNX model zoo.\n", "The image classification model we are using is pre-trained using Microsoft's deep learning cognitive toolkit, [CNTK](https://github.com/Microsoft/CNTK), from the [ONNX model zoo](http://github.com/onnx/models). The model zoo has many other models that can be deployed on cloud providers like AzureML without any additional training. To ensure that our cloud deployed model works, we use testing data from the famous MNIST data set, provided as part of the [trained MNIST model](https://github.com/onnx/models/tree/master/vision/classification/mnist) in the ONNX model zoo.\n",
"\n", "\n",
"***Input: Handwritten Images from MNIST Dataset***\n", "***Input: Handwritten Images from MNIST Dataset***\n",
"\n", "\n",
@@ -325,8 +325,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create the Container Image\n", "### Create Inference Configuration"
"This step will likely take a few minutes."
] ]
}, },
{ {
@@ -335,48 +334,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.image import ContainerImage\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", "inference_config = InferenceConfig(runtime= \"python\", \n",
" runtime = \"python\",\n", " entry_script=\"score.py\",\n",
" conda_file = \"myenv.yml\",\n", " extra_docker_file_steps = \"Dockerfile\",\n",
" docker_file = \"Dockerfile\",\n", " conda_file=\"myenv.yml\")"
" description = \"MNIST ONNX Runtime container\",\n",
" tags = {\"demo\": \"onnx\"}) \n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnximage\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"In case you need to debug your code, the next line of code accesses the log file." "### Deploy the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all done specifying what we want our virtual machine to do. Let's configure and deploy our container image.\n",
"\n",
"### Deploy the container image"
] ]
}, },
{ {
@@ -397,7 +367,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"The following cell will likely take a few minutes to run as well." "The following cell will likely take a few minutes to run."
] ]
}, },
{ {
@@ -410,12 +380,7 @@
"\n", "\n",
"aci_service_name = 'onnx-demo-mnist'\n", "aci_service_name = 'onnx-demo-mnist'\n",
"print(\"Service\", aci_service_name)\n", "print(\"Service\", aci_service_name)\n",
"\n", "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n", "aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)" "print(aci_service.state)"
] ]


@@ -28,7 +28,7 @@
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n", "ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
"\n", "\n",
"## ResNet50 Details\n", "## ResNet50 Details\n",
"ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. For more information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/models/image_classification/resnet). " "ResNet classifies the major object in an input image into a set of 1000 pre-defined classes. For more information about the ResNet50 model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/vision/classification/resnet). "
] ]
}, },
{ {
@@ -221,7 +221,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create container image" "### Create inference configuration"
] ]
}, },
{ {
@@ -249,7 +249,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Then we have Azure ML create the container. This step will likely take a few minutes." "Create the inference configuration object"
] ]
}, },
{ {
@@ -258,48 +258,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.image import ContainerImage\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", "inference_config = InferenceConfig(runtime= \"python\", \n",
" runtime = \"python\",\n", " entry_script=\"score.py\",\n",
" conda_file = \"myenv.yml\",\n", " conda_file=\"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n", " extra_docker_file_steps = \"Dockerfile\")"
" description = \"ONNX ResNet50 Demo\",\n",
" tags = {\"demo\": \"onnx\"}\n",
" )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxresnet50v2\",\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"In case you need to debug your code, the next line of code accesses the log file." "### Deploy the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
] ]
}, },
{ {
@@ -334,12 +305,7 @@
"\n", "\n",
"aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n", "aci_service_name = 'onnx-demo-resnet50'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n", "print(\"Service\", aci_service_name)\n",
"\n", "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n", "aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)" "print(aci_service.state)"
] ]


@@ -28,7 +28,7 @@
"ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n", "ONNX is an open format for representing machine learning and deep learning models. ONNX enables open and interoperable AI by enabling data scientists and developers to use the tools of their choice without worrying about lock-in and flexibility to deploy to a variety of platforms. ONNX is developed and supported by a community of partners including Microsoft, Facebook, and Amazon. For more information, explore the [ONNX website](http://onnx.ai).\n",
"\n", "\n",
"## MNIST Details\n", "## MNIST Details\n",
"The Modified National Institute of Standards and Technology (MNIST) dataset consists of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing numbers from 0 to 9. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/). For more information about the MNIST model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/mnist). " "The Modified National Institute of Standards and Technology (MNIST) dataset consists of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing numbers from 0 to 9. For more information about the MNIST dataset, please visit [Yan LeCun's website](http://yann.lecun.com/exdb/mnist/). For more information about the MNIST model and how it was created can be found on the [ONNX Model Zoo github](https://github.com/onnx/models/tree/master/vision/classification/mnist). "
] ]
}, },
{ {
@@ -401,7 +401,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Create container image\n", "### Create inference configuration\n",
"First we create a YAML file that specifies which dependencies we would like to see in our container." "First we create a YAML file that specifies which dependencies we would like to see in our container."
] ]
}, },
@@ -423,7 +423,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"Then we have Azure ML create the container. This step will likely take a few minutes." "Then we setup the inference configuration "
] ]
}, },
{ {
@@ -432,48 +432,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.image import ContainerImage\n", "from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", "inference_config = InferenceConfig(runtime= \"python\", \n",
" runtime = \"python\",\n", " entry_script=\"score.py\",\n",
" conda_file = \"myenv.yml\",\n", " conda_file=\"myenv.yml\",\n",
" docker_file = \"Dockerfile\",\n", " extra_docker_file_steps = \"Dockerfile\")"
" description = \"MNIST ONNX Demo\",\n",
" tags = {\"demo\": \"onnx\"}\n",
" )\n",
"\n",
"\n",
"image = ContainerImage.create(name = \"onnxmnistdemo\",\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"In case you need to debug your code, the next line of code accesses the log file." "### Deploy the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(image.image_build_log_uri)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We're all set! Let's get our model chugging.\n",
"\n",
"### Deploy the container image"
] ]
}, },
{ {
@@ -504,16 +475,12 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.webservice import Webservice\n", "from azureml.core.webservice import Webservice\n",
"from azureml.core.model import Model\n",
"from random import randint\n", "from random import randint\n",
"\n", "\n",
"aci_service_name = 'onnx-demo-mnist'+str(randint(0,100))\n", "aci_service_name = 'onnx-demo-mnist'+str(randint(0,100))\n",
"print(\"Service\", aci_service_name)\n", "print(\"Service\", aci_service_name)\n",
"\n", "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
" image = image,\n",
" name = aci_service_name,\n",
" workspace = ws)\n",
"\n",
"aci_service.wait_for_deployment(True)\n", "aci_service.wait_for_deployment(True)\n",
"print(aci_service.state)" "print(aci_service.state)"
] ]


@@ -34,7 +34,6 @@
"from azureml.core import Workspace\n", "from azureml.core import Workspace\n",
"from azureml.core.compute import AksCompute, ComputeTarget\n", "from azureml.core.compute import AksCompute, ComputeTarget\n",
"from azureml.core.webservice import Webservice, AksWebservice\n", "from azureml.core.webservice import Webservice, AksWebservice\n",
"from azureml.core.image import Image\n",
"from azureml.core.model import Model" "from azureml.core.model import Model"
] ]
}, },
@@ -97,8 +96,51 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"# Create an image\n", "# Create the Environment\n",
"Create an image using the registered model the script that will load and run the model." "Create an environment that the model will be deployed with"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"conda_deps = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults'])\n",
"myenv = Environment(name='myenv')\n",
"myenv.python.conda_dependencies = conda_deps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use a custom Docker image\n",
"\n",
"You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n",
"\n",
"Only supported with `python` runtime.\n",
"```python\n",
"# use an image available in public Container Registry without authentication\n",
"myenv.docker.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n",
"\n",
"# or, use an image available in a private Container Registry\n",
"myenv.docker.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n",
"myenv.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n",
"myenv.docker.base_image_registry.username = \"username\"\n",
"myenv.docker.base_image_registry.password = \"password\"\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Write the Entry Script\n",
"Write the script that will be used to predict on your model"
] ]
}, },
{ {
@@ -136,67 +178,23 @@
" return error" " return error"
] ]
}, },
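The diff above shows only the tail of score.py; the entry-script contract itself is an init()/run() pair. Below is a condensed sketch of that pattern with a stub predictor standing in for the real model that init() would deserialize; the stub and the payload shape are illustrative assumptions, not the notebook's exact script:

```python
import json

model = None

def init():
    # In the real entry script this deserializes the registered model
    # (e.g. with joblib) from the path Azure ML mounts it at.
    global model
    model = lambda rows: [sum(row) for row in rows]  # stub predictor for illustration

def run(raw_data):
    """Parse the JSON request body, score it, and return a JSON-serializable result."""
    try:
        data = json.loads(raw_data)["data"]
        return {"result": model(data)}
    except Exception as e:
        # Returning the error text keeps the webservice from crashing on bad input.
        return {"error": str(e)}

init()
print(run(json.dumps({"data": [[1, 2, 3], [4, 5, 6]]})))  # {'result': [6, 15]}
```

The deployed service calls init() once at container start and run() once per scoring request, which is why the model is loaded into a module-level global rather than inside run().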
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.image import ContainerImage\n",
"\n",
"image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n",
" runtime = \"python\",\n",
" conda_file = \"myenv.yml\",\n",
" description = \"Image with ridge regression model\",\n",
" tags = {'area': \"diabetes\", 'type': \"regression\"}\n",
" )\n",
"\n",
"image = ContainerImage.create(name = \"myimage1\",\n",
" # this is the model object\n",
" models = [model],\n",
" image_config = image_config,\n",
" workspace = ws)\n",
"\n",
"image.wait_for_creation(show_output = True)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"#### Use a custom Docker image\n", "# Create the InferenceConfig\n",
"Create the inference config that will be used when deploying the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.model import InferenceConfig\n",
"\n", "\n",
"You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n", "inf_config = InferenceConfig(entry_script='score.py', environment=myenv)"
"\n",
"Only Supported for `ContainerImage`(from azureml.core.image) with `python` runtime.\n",
"```python\n",
"# use an image available in public Container Registry without authentication\n",
"image_config.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n",
"\n",
"# or, use an image available in a private Container Registry\n",
"image_config.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n",
"image_config.base_image_registry.address = \"myregistry.azurecr.io\"\n",
"image_config.base_image_registry.username = \"username\"\n",
"image_config.base_image_registry.password = \"password\"\n",
"\n",
"# or, use an image built during training.\n",
"image_config.base_image = run.properties[\"AzureML.DerivedImageName\"]\n",
"```\n",
"You can get the address of training image from the properties of a Run object. Only new runs submitted with azureml-sdk>=1.0.22 to AMLCompute targets will have the 'AzureML.DerivedImageName' property. Instructions on how to get a Run can be found in [manage-runs](../../training/manage-runs/manage-runs.ipynb). \n"
] ]
}, },
{ {
@@ -237,23 +235,21 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"'''\n", "# from azureml.core.compute import ComputeTarget, AksCompute\n",
"from azureml.core.compute import ComputeTarget, AksCompute\n",
"\n", "\n",
"# Create the compute configuration and set virtual network information\n", "# # Create the compute configuration and set virtual network information\n",
"config = AksCompute.provisioning_configuration(location=\"eastus2\")\n", "# config = AksCompute.provisioning_configuration(location=\"eastus2\")\n",
"config.vnet_resourcegroup_name = \"mygroup\"\n", "# config.vnet_resourcegroup_name = \"mygroup\"\n",
"config.vnet_name = \"mynetwork\"\n", "# config.vnet_name = \"mynetwork\"\n",
"config.subnet_name = \"default\"\n", "# config.subnet_name = \"default\"\n",
"config.service_cidr = \"10.0.0.0/16\"\n", "# config.service_cidr = \"10.0.0.0/16\"\n",
"config.dns_service_ip = \"10.0.0.10\"\n", "# config.dns_service_ip = \"10.0.0.10\"\n",
"config.docker_bridge_cidr = \"172.17.0.1/16\"\n", "# config.docker_bridge_cidr = \"172.17.0.1/16\"\n",
"\n", "\n",
"# Create the compute target\n", "# # Create the compute target\n",
"aks_target = ComputeTarget.create(workspace = ws,\n", "# aks_target = ComputeTarget.create(workspace = ws,\n",
" name = \"myaks\",\n", "# name = \"myaks\",\n",
" provisioning_configuration = config)\n", "# provisioning_configuration = config)"
"'''"
] ]
}, },
{ {
@@ -300,17 +296,15 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"'''\n", "# # Use the default configuration (can also provide parameters to customize)\n",
"# Use the default configuration (can also provide parameters to customize)\n", "# resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n",
"resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n",
"\n", "\n",
"create_name='my-existing-aks' \n", "# create_name='my-existing-aks' \n",
"# Create the cluster\n", "# # Create the cluster\n",
"attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", "# attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n",
"aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n", "# aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n",
"# Wait for the operation to complete\n", "# # Wait for the operation to complete\n",
"aks_target.wait_for_completion(True)\n", "# aks_target.wait_for_completion(True)"
"'''"
] ]
}, },
{ {
@@ -326,8 +320,11 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#Set the web service configuration (using default here)\n", "# Set the web service configuration (using default here)\n",
"aks_config = AksWebservice.deploy_configuration()" "aks_config = AksWebservice.deploy_configuration()\n",
"\n",
"# # Enable token auth and disable (key) auth on the webservice\n",
"# aks_config = AksWebservice.deploy_configuration(token_auth_enabled=True, auth_enabled=False)\n"
] ]
}, },
{ {
@@ -339,11 +336,13 @@
"%%time\n", "%%time\n",
"aks_service_name ='aks-service-1'\n", "aks_service_name ='aks-service-1'\n",
"\n", "\n",
"aks_service = Webservice.deploy_from_image(workspace = ws, \n", "aks_service = Model.deploy(workspace=ws,\n",
" name = aks_service_name,\n", " name=aks_service_name,\n",
" image = image,\n", " models=[model],\n",
" deployment_config = aks_config,\n", " inference_config=inf_config,\n",
" deployment_target = aks_target)\n", " deployment_config=aks_config,\n",
" deployment_target=aks_target)\n",
"\n",
"aks_service.wait_for_deployment(show_output = True)\n", "aks_service.wait_for_deployment(show_output = True)\n",
"print(aks_service.state)" "print(aks_service.state)"
] ]
@@ -390,11 +389,12 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# retreive the API keys. AML generates two keys.\n", "# # if (key) auth is enabled, retrieve the API keys. AML generates two keys.\n",
"'''\n", "# key1, Key2 = aks_service.get_keys()\n",
"key1, Key2 = aks_service.get_keys()\n", "# print(key1)\n",
"print(key1)\n", "\n",
"'''" "# # if token auth is enabled, retrieve the token.\n",
"# access_token, refresh_after = aks_service.get_token()"
] ]
}, },
{ {
@@ -404,27 +404,28 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"# construct raw HTTP request and send to the service\n", "# construct raw HTTP request and send to the service\n",
"'''\n", "# %%time\n",
"%%time\n",
"\n", "\n",
"import requests\n", "# import requests\n",
"\n", "\n",
"import json\n", "# import json\n",
"\n", "\n",
"test_sample = json.dumps({'data': [\n", "# test_sample = json.dumps({'data': [\n",
" [1,2,3,4,5,6,7,8,9,10], \n", "# [1,2,3,4,5,6,7,8,9,10], \n",
" [10,9,8,7,6,5,4,3,2,1]\n", "# [10,9,8,7,6,5,4,3,2,1]\n",
"]})\n", "# ]})\n",
"test_sample = bytes(test_sample,encoding = 'utf8')\n", "# test_sample = bytes(test_sample,encoding = 'utf8')\n",
"\n", "\n",
"# Don't forget to add key to the HTTP header.\n", "# # If (key) auth is enabled, don't forget to add key to the HTTP header.\n",
"headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n", "# headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
"\n", "\n",
"resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)\n", "# # If token auth is enabled, don't forget to add token to the HTTP header.\n",
"# headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + access_token}\n",
"\n",
"# resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)\n",
"\n", "\n",
"\n", "\n",
"print(\"prediction:\", resp.text)\n", "# print(\"prediction:\", resp.text)"
"'''"
] ]
}, },
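The payload built by the commented-out cell above can be sketched on its own; the authorization value below is a placeholder for whichever key or token the service is configured to accept:

```python
import json

# Two sample rows, mirroring the commented-out scoring cell.
test_sample = json.dumps({"data": [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
]})
body = bytes(test_sample, encoding="utf8")

# "<key-or-token>" is a placeholder; substitute a service key or AAD token at request time.
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer <key-or-token>"}

print(len(json.loads(body)["data"]))  # the two rows survive the round trip through bytes
```

The actual POST to `aks_service.scoring_uri` is left out here since it needs a live endpoint and valid credentials.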
{ {
@@ -443,7 +444,6 @@
"source": [ "source": [
"%%time\n", "%%time\n",
"aks_service.delete()\n", "aks_service.delete()\n",
"image.delete()\n",
"model.delete()" "model.delete()"
] ]
} }


@@ -0,0 +1,748 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Train and explain models remotely via Azure Machine Learning Compute\n",
"\n",
"\n",
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to train and explain a regression model remotely on an Azure Machine Leanrning Compute Target (AMLCompute).**_\n",
"\n",
"\n",
"\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
" 1. Initialize a Workspace\n",
" 1. Create an Experiment\n",
" 1. Introduction to AmlCompute\n",
" 1. Submit an AmlCompute run in a few different ways\n",
" 1. Option 1: Provision as a run based compute target \n",
" 1. Option 2: Provision as a persistent compute target (Basic)\n",
" 1. Option 3: Provision as a persistent compute target (Advanced)\n",
"1. Additional operations to perform on AmlCompute\n",
"1. [Download model explanations from Azure Machine Learning Run History](#Download)\n",
"1. [Visualize explanations](#Visualize)\n",
"1. [Next steps](#Next)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"This notebook showcases how to train and explain a regression model remotely via Azure Machine Learning Compute (AMLCompute), and download the calculated explanations locally for visualization.\n",
"It demonstrates the API calls that you need to make to submit a run for training and explaining a model to AMLCompute, download the compute explanations remotely, and visualizing the global and local explanations via a visualization dashboard that provides an interactive way of discovering patterns in model predictions and downloaded explanations.\n",
"\n",
"We will showcase one of the tabular data explainers: TabularExplainer (SHAP).\n",
"\n",
"Problem: Boston Housing Price Prediction with scikit-learn (train a model and run an explainer remotely via AmlCompute, then download and visualize the remotely calculated explanations).\n",
"\n",
"| ![explanations-run-history](./img/explanations-run-history.PNG) |\n",
"|:--:|\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't.\n",
"\n",
"\n",
"You will need to enable the notebook extensions before starting the Jupyter kernel in order to see the visualization dashboard.\n",
"```\n",
"(myenv) $ jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"(myenv) $ jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"```\n",
"Or\n",
"\n",
"```\n",
"(myenv) $ jupyter nbextension install azureml.contrib.explain.model.visualize --user --py\n",
"(myenv) $ jupyter nbextension enable azureml.contrib.explain.model.visualize --user --py\n",
"```\n",
"\n",
"If you are using Jupyter Labs run the following commands instead:\n",
"```\n",
"(myenv) $ jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"(myenv) $ jupyter labextension install microsoft-mli-widget\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize a Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create an Experiment\n",
"\n",
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"experiment_name = 'explainer-remote-run-on-amlcompute'\n",
"experiment = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to AmlCompute\n",
"\n",
"Azure Machine Learning Compute is managed compute infrastructure that allows the user to easily create single- to multi-node compute of the appropriate VM family. It is created **within your workspace region** and is a resource that can be used by other users in your workspace. It autoscales by default to max_nodes when a job is submitted, and executes in a containerized environment, packaging the dependencies as specified by the user. \n",
"\n",
"Since it is managed compute, job scheduling and cluster management are handled internally by Azure Machine Learning service. \n",
"\n",
"For more information on Azure Machine Learning Compute, please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute).\n",
"\n",
"If you are an existing BatchAI customer who is migrating to Azure Machine Learning, please read [this article](https://aka.ms/batchai-retirement).\n",
"\n",
"**Note**: As with other Azure services, there are limits on certain resources (e.g., AmlCompute quota) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n",
"\n",
"\n",
"The training script `train_explain.py` is already created for you. Let's have a look."
]
},
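{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a convenience for printing the script so you can review it; it assumes `train_explain.py` sits in the same directory as this notebook, as in this sample's folder layout."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print the training script for review (assumes train_explain.py is in the notebook directory)\n",
"with open('train_explain.py', 'r') as script_file:\n",
"    print(script_file.read())"
]
},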
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit an AmlCompute run in a few different ways\n",
"\n",
"First let's check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. Since AmlCompute is created in the region of your workspace, we will use the supported_vmsizes() function to see if the VM family we want to use ('STANDARD_D2_V2') is supported.\n",
"\n",
"You can also pass a different region to check availability and then re-create your workspace in that region through the [configuration notebook](../../../configuration.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"\n",
"AmlCompute.supported_vmsizes(workspace=ws)\n",
"# AmlCompute.supported_vmsizes(workspace=ws, location='southcentralus')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create project directory\n",
"\n",
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import shutil\n",
"\n",
"project_folder = './explainer-remote-run-on-amlcompute'\n",
"os.makedirs(project_folder, exist_ok=True)\n",
"shutil.copy('train_explain.py', project_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 1: Provision as a run based compute target\n",
"\n",
"You can provision AmlCompute as a compute target at run-time. In this case, the compute is auto-created for your run, scales up to max_nodes that you specify, and then **deleted automatically** after the run completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"\n",
"# create a new runconfig object\n",
"run_config = RunConfiguration()\n",
"\n",
"# signal that you want to use AmlCompute to execute script.\n",
"run_config.target = \"amlcompute\"\n",
"\n",
"# AmlCompute will be created in the same region as workspace\n",
"# Set vm size for AmlCompute\n",
"run_config.amlcompute.vm_size = 'STANDARD_D2_V2'\n",
"\n",
"# enable Docker \n",
"run_config.environment.docker.enabled = True\n",
"\n",
"# set Docker base image to the default CPU-based image\n",
"run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE\n",
"\n",
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model', 'sklearn-pandas', 'azureml-dataprep'\n",
"]\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
" pip_packages=azureml_pip_packages)\n",
"\n",
"# Now submit a run on AmlCompute\n",
"from azureml.core.script_run_config import ScriptRunConfig\n",
"\n",
"script_run_config = ScriptRunConfig(source_directory=project_folder,\n",
" script='train_explain.py',\n",
" run_config=run_config)\n",
"\n",
"run = experiment.submit(script_run_config)\n",
"\n",
"# Show run details\n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 2: Provision as a persistent compute target (Basic)\n",
"\n",
"You can provision a persistent AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continuously re-use the same target, debug it between jobs, or simply share the resource with other users of your workspace.\n",
"\n",
"* `vm_size`: VM family of the nodes provisioned by AmlCompute. Simply choose from the supported_vmsizes() above\n",
"* `max_nodes`: Maximum nodes to autoscale to while running a job on AmlCompute"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" max_nodes=4)\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# create a new RunConfig object\n",
"run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute target created in previous step\n",
"run_config.target = cpu_cluster.name\n",
"\n",
"# enable Docker \n",
"run_config.environment.docker.enabled = True\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model', 'azureml-dataprep'\n",
"]\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
" pip_packages=azureml_pip_packages)\n",
"\n",
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=project_folder, \n",
" script='train_explain.py', \n",
" run_config=run_config) \n",
"run = experiment.submit(config=src)\n",
"run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_metrics()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 3: Provision as a persistent compute target (Advanced)\n",
"\n",
"You can also specify additional properties or change defaults while provisioning AmlCompute using a more advanced configuration. This is useful when you want a dedicated cluster of 4 nodes (for example you can set the min_nodes and max_nodes to 4), or want the compute to be within an existing VNet in your subscription.\n",
"\n",
"In addition to `vm_size` and `max_nodes`, you can specify:\n",
"* `min_nodes`: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute\n",
"* `vm_priority`: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n",
"* `idle_seconds_before_scaledown`: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n",
"* `vnet_resourcegroup_name`: Resource group of the **existing** VNet within which AmlCompute should be provisioned\n",
"* `vnet_name`: Name of VNet\n",
"* `subnet_name`: Name of SubNet within the VNet"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" vm_priority='lowpriority',\n",
" min_nodes=2,\n",
" max_nodes=4,\n",
"                                                           idle_seconds_before_scaledown=300,\n",
" vnet_resourcegroup_name='<my-resource-group>',\n",
" vnet_name='<my-vnet-name>',\n",
" subnet_name='<my-subnet-name>')\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# create a new RunConfig object\n",
"run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute target created in previous step\n",
"run_config.target = cpu_cluster.name\n",
"\n",
"# enable Docker \n",
"run_config.environment.docker.enabled = True\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model', 'azureml-dataprep'\n",
"]\n",
"\n",
"\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'],\n",
" pip_packages=azureml_pip_packages)\n",
"\n",
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=project_folder, \n",
" script='train_explain.py', \n",
" run_config=run_config) \n",
"run = experiment.submit(config=src)\n",
"run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_metrics()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
"\n",
"client = ExplanationClient.from_run(run)\n",
"# Get the top k (e.g., 4) most important features with their importance values\n",
"explanation = client.download_model_explanation(top_k=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional operations to perform on AmlCompute\n",
"\n",
"You can perform more operations on AmlCompute such as updating the node counts or deleting the compute. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get_status() gets the latest status of the AmlCompute target\n",
"cpu_cluster.get_status().serialize()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# update() takes in min_nodes, max_nodes and idle_seconds_before_scaledown and updates the AmlCompute target\n",
"# cpu_cluster.update(min_nodes=1)\n",
"# cpu_cluster.update(max_nodes=10)\n",
"cpu_cluster.update(idle_seconds_before_scaledown=300)\n",
"# cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# delete() is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name\n",
"# 'cpu-cluster' in this case but use a different VM family for instance.\n",
"\n",
"# cpu_cluster.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download \n",
"1. Download model explanation data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient\n",
"\n",
"# Get model explanation data\n",
"client = ExplanationClient.from_run(run)\n",
"global_explanation = client.download_model_explanation()\n",
"local_importance_values = global_explanation.local_importance_values\n",
"expected_values = global_explanation.expected_values\n"
]
},
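{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a sketch; the exact shapes and values depend on your training run), you can inspect the downloaded explanation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Inspect the downloaded explanation: one row of local importance values per\n",
"# evaluation example, and an expected (base) value for the model output\n",
"print('rows explained:', len(local_importance_values))\n",
"print('expected values:', expected_values)"
]
},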
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Or you can use the saved run.id to retrieve the feature importance values\n",
"client = ExplanationClient.from_run_id(ws, experiment_name, run.id)\n",
"global_explanation = client.download_model_explanation()\n",
"local_importance_values = global_explanation.local_importance_values\n",
"expected_values = global_explanation.expected_values"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get the top k (e.g., 4) most important features with their importance values\n",
"global_explanation_topk = client.download_model_explanation(top_k=4)\n",
"global_importance_values = global_explanation_topk.get_ranked_global_values()\n",
"global_importance_names = global_explanation_topk.get_ranked_global_names()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print('global importance values: {}'.format(global_importance_values))\n",
"print('global importance names: {}'.format(global_importance_names))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2. Download model file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# retrieve model for visualization and deployment\n",
"from azureml.core.model import Model\n",
"from sklearn.externals import joblib\n",
"original_model = Model(ws, 'original_model')\n",
"model_path = original_model.download(exist_ok=True)\n",
"original_model = joblib.load(model_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"3. Download test dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# retrieve x_test for visualization\n",
"from sklearn.externals import joblib\n",
"x_test_path = './x_test_boston_housing.pkl'\n",
"run.download_file('x_test_boston_housing.pkl', output_file_path=x_test_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_test = joblib.load('x_test_boston_housing.pkl')"
]
},
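{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, verify that the downloaded model scores the test set before loading the dashboard (a quick check; the exact predictions depend on your training run):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Score x_test with the downloaded model as a sanity check\n",
"preds = original_model.predict(x_test)\n",
"print('predictions for the first 5 examples:', preds[:5])"
]
},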
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize\n",
"Load the visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.contrib.explain.model.visualize import ExplanationDashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ExplanationDashboard(global_explanation, original_model, x_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next\n",
"Learn about other use cases of the explain package:\n",
"1. [Training time: regression problem](../../tabular-data/explain-regression-local.ipynb)\n",
"1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n",
"1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n",
"1. Explain models with engineered features:\n",
" 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n",
" 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n",
"1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n",
"1. Inferencing time: deploy a classification model and explainer:\n",
" 1. [Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n",
" 1. [Deploy a remotely-trained model and explainer](../scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "mesameki"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,8 @@
name: explain-model-on-amlcompute
dependencies:
- pip:
- azureml-sdk
- azureml-explain-model
- azureml-contrib-explain-model
- sklearn-pandas
- azureml-dataprep


@@ -0,0 +1,63 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
from sklearn import datasets
from sklearn.linear_model import Ridge
from azureml.explain.model.tabular_explainer import TabularExplainer
from azureml.contrib.explain.model.explanation.explanation_client import ExplanationClient
from sklearn.model_selection import train_test_split
from azureml.core.run import Run
from sklearn.externals import joblib
import os
import numpy as np
OUTPUT_DIR = './outputs/'
os.makedirs(OUTPUT_DIR, exist_ok=True)
boston_data = datasets.load_boston()
run = Run.get_context()
client = ExplanationClient.from_run(run)
X_train, X_test, y_train, y_test = train_test_split(boston_data.data,
boston_data.target,
test_size=0.2,
random_state=0)
# write x_test out as a pickle file for later visualization
x_test_pkl = 'x_test.pkl'
with open(x_test_pkl, 'wb') as file:
joblib.dump(value=X_test, filename=os.path.join(OUTPUT_DIR, x_test_pkl))
run.upload_file('x_test_boston_housing.pkl', os.path.join(OUTPUT_DIR, x_test_pkl))
alpha = 0.5
# Use Ridge algorithm to create a regression model
reg = Ridge(alpha)
model = reg.fit(X_train, y_train)
preds = reg.predict(X_test)
run.log('alpha', alpha)
model_file_name = 'ridge_{0:.2f}.pkl'.format(alpha)
# save model in the outputs folder so it automatically get uploaded
with open(model_file_name, 'wb') as file:
joblib.dump(value=reg, filename=os.path.join(OUTPUT_DIR,
model_file_name))
# register the model
run.upload_file('original_model.pkl', os.path.join('./outputs/', model_file_name))
original_model = run.register_model(model_name='original_model', model_path='original_model.pkl')
# Explain predictions on your local machine
tabular_explainer = TabularExplainer(model, X_train, features=boston_data.feature_names)
# Explain overall model predictions (global explanation)
# Passing in test dataset for evaluation examples - note it must be a representative sample of the original data
# x_train can be passed as well, but explanations will take longer to compute
# with more evaluation examples, although they may be more accurate
global_explanation = tabular_explainer.explain_global(X_test)
# Uploading model explanation data for storage or visualization in webUX
# The explanation can then be downloaded on any compute
comment = 'Global explanation on regression model trained on boston dataset'
client.upload_model_explanation(global_explanation, comment=comment)


@@ -4,5 +4,5 @@ dependencies:
- azureml-sdk - azureml-sdk
- azureml-explain-model - azureml-explain-model
- azureml-contrib-explain-model - azureml-contrib-explain-model
- azureml-dataprep
- sklearn-pandas - sklearn-pandas
- azureml-dataprep


@@ -460,7 +460,7 @@
"source": [ "source": [
"# Submit syntax\n", "# Submit syntax\n",
"# submit(experiment_name, \n", "# submit(experiment_name, \n",
"# pipeline_params=None, \n", "# pipeline_parameters=None, \n",
"# continue_on_step_failure=False, \n", "# continue_on_step_failure=False, \n",
"# regenerate_outputs=False)\n", "# regenerate_outputs=False)\n",
"\n", "\n",


@@ -321,7 +321,11 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"hyperdriveconfig-remarks-sample"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"hd_config = HyperDriveConfig(estimator=est, \n", "hd_config = HyperDriveConfig(estimator=est, \n",


@@ -299,7 +299,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.pipelince.core import PipelineParameter\n", "from azureml.pipeline.core import PipelineParameter\n",
"\n", "\n",
"# Use the default blob storage\n", "# Use the default blob storage\n",
"def_blob_store = Datastore(ws, \"workspaceblobstore\")\n", "def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",


@@ -28,14 +28,14 @@
"metadata": {}, "metadata": {},
"source": [ "source": [
"## Introduction\n", "## Introduction\n",
"In this example we showcase how you can use the `azureml.dataprep` SDK to load and prepare data for AutoML via AML Pipeline. `azureml.dataprep` can also be used standalone; full documentation can be found [here](https://github.com/Microsoft/PendletonDocs).\n", "In this example we showcase how you can use AzureML Dataset to load data for AutoML via AML Pipeline. \n",
"\n", "\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.\n", "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.\n",
"\n", "\n",
"In this notebook you will learn how to:\n", "In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n", "1. Create an `Experiment` in an existing `Workspace`.\n",
"2. Create or Attach existing AmlCompute to a workspace.\n", "2. Create or Attach existing AmlCompute to a workspace.\n",
"3. Define data loading and preparation steps in a `Dataflow` using `azureml.dataprep`.\n", "3. Define data loading in a `TabularDataset`.\n",
"4. Configure AutoML using `AutoMLConfig`.\n", "4. Configure AutoML using `AutoMLConfig`.\n",
"5. Use AutoMLStep\n", "5. Use AutoMLStep\n",
"6. Train the model using AmlCompute\n", "6. Train the model using AmlCompute\n",
@@ -65,7 +65,6 @@
"import pandas as pd\n", "import pandas as pd\n",
"from sklearn import datasets\n", "from sklearn import datasets\n",
"import pkg_resources\n", "import pkg_resources\n",
"import azureml.dataprep as dprep\n",
"\n", "\n",
"import azureml.core\n", "import azureml.core\n",
"from azureml.core.experiment import Experiment\n", "from azureml.core.experiment import Experiment\n",
@@ -73,6 +72,7 @@
"from azureml.train.automl import AutoMLConfig\n", "from azureml.train.automl import AutoMLConfig\n",
"from azureml.core.compute import AmlCompute\n", "from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n", "from azureml.core.compute import ComputeTarget\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.core.runconfig import RunConfiguration\n", "from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n", "from azureml.core.conda_dependencies import CondaDependencies\n",
"\n", "\n",
@@ -197,13 +197,10 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n", "# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n",
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n", "example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n", "dataset = Dataset.Tabular.from_delimited_files(example_data)\n",
"dflow.get_profile()" "dataset.to_pandas_dataframe().describe()"
] ]
}, },
{ {
@@ -212,20 +209,18 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"# As `Primary Type` is our y data, we need to drop the values those are null in this column.\n", "dataset.take(5).to_pandas_dataframe()"
"dflow = dflow.drop_nulls('Primary Type')\n",
"dflow.head(5)"
] ]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"### Review the Data Preparation Result\n", "### Review the Dataset Result\n",
"\n", "\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n", "You can peek the result of a TabularDataset at any range using `skip(i)` and `take(j).to_pandas_dataframe()`. Doing so evaluates only `j` records for all the steps in the TabularDataset, which makes it fast even against large datasets.\n",
"\n", "\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage." "`TabularDataset` objects are composed of a list of transformation steps (optional)."
] ]
}, },
{ {
@@ -234,8 +229,8 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n", "X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)\n", "y = dataset.keep_columns(columns=['Primary Type'], validate=True)\n",
"print('X and y are ready!')" "print('X and y are ready!')"
] ]
}, },
@@ -441,8 +436,12 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n", "dataset_test = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n",
"dflow_test = dflow_test.drop_nulls('Primary Type')" "df_test = dataset_test.to_pandas_dataframe()\n",
"df_test = df_test[pd.notnull(df_test['Primary Type'])]\n",
"\n",
"y_test = df_test[['Primary Type']]\n",
"X_test = df_test.drop(['Primary Type', 'FBI Code'], axis=1)"
] ]
}, },
{ {
@@ -462,10 +461,6 @@
"source": [ "source": [
"from pandas_ml import ConfusionMatrix\n", "from pandas_ml import ConfusionMatrix\n",
"\n", "\n",
"y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
"X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
"\n",
"\n",
"ypred = best_model.predict(X_test)\n", "ypred = best_model.predict(X_test)\n",
"\n", "\n",
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n", "cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",


@@ -1,5 +1,12 @@
{ {
"cells": [ "cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.png)"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -187,7 +194,19 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"\n",
"aml_compute = ws.get_default_compute_target(\"CPU\")\n", "aml_compute = ws.get_default_compute_target(\"CPU\")\n",
"\n",
"if aml_compute is None:\n",
" amlcompute_cluster_name = \"cpu-cluster\"\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" max_nodes = 4)\n",
"\n",
" aml_compute = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
"\n",
"aml_compute" "aml_compute"
] ]
}, },
@@ -735,6 +754,8 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"%%writefile $train_model_folder/get_data.py\n", "%%writefile $train_model_folder/get_data.py\n",
"import os\n",
"import pandas as pd\n",
"\n", "\n",
"def get_data():\n", "def get_data():\n",
" print(\"In get_data\")\n", " print(\"In get_data\")\n",

View File

@@ -387,11 +387,15 @@
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {
"tags": [
"pipelineparameterssample"
]
},
"outputs": [], "outputs": [],
"source": [ "source": [
"pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n", "pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n",
"pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_params={\"param_batch_size\": 20})" "pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_parameters={\"param_batch_size\": 20})"
] ]
}, },
{ {

View File

@@ -384,7 +384,7 @@
"source": [ "source": [
"pipeline = Pipeline(workspace=ws, steps=[stitch_video_step])\n", "pipeline = Pipeline(workspace=ws, steps=[stitch_video_step])\n",
"# submit the pipeline and provide values for the PipelineParameters used in the pipeline\n", "# submit the pipeline and provide values for the PipelineParameters used in the pipeline\n",
"pipeline_run = Experiment(ws, 'style_transfer').submit(pipeline, pipeline_params={\"style\": \"mosaic\", \"nodecount\": 3})" "pipeline_run = Experiment(ws, 'style_transfer').submit(pipeline, pipeline_parameters={\"style\": \"mosaic\", \"nodecount\": 3})"
] ]
}, },
{ {

View File

@@ -26,9 +26,10 @@
"\n", "\n",
" 1. Interactive Login Authentication\n", " 1. Interactive Login Authentication\n",
" 2. Azure CLI Authentication\n", " 2. Azure CLI Authentication\n",
" 3. Service Principal Authentication\n", " 3. Managed Service Identity (MSI) Authentication\n",
" 4. Service Principal Authentication\n",
" \n", " \n",
"The interactive authentication is suitable for local experimentation on your own computer. Azure CLI authentication is suitable if you are already using Azure CLI for managing Azure resources, and want to sign in only once. The Service Principal authentication is suitable for automated workflows, for example as part of Azure Devops build." "The interactive authentication is suitable for local experimentation on your own computer. Azure CLI authentication is suitable if you are already using Azure CLI for managing Azure resources, and want to sign in only once. The MSI and Service Principal authentication methods are suitable for automated workflows, for example as part of an Azure DevOps build."
] ]
}, },
{ {
@@ -145,6 +146,43 @@
"print(\"Found workspace {} at location {}\".format(ws.name, ws.location))" "print(\"Found workspace {} at location {}\".format(ws.name, ws.location))"
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MSI Authentication\n",
"\n",
"__Note__: _MSI authentication is supported only when using SDK from Azure Virtual Machine. The code below will fail on local computer._\n",
"\n",
"When using the Azure ML SDK on an Azure Virtual Machine (VM), you can use Managed Service Identity (MSI) based authentication. This mode allows the VM to connect to the Workspace without storing credentials in the Python code.\n",
"\n",
"As a pre-requisite, enable System-assigned Managed Identity for your VM as described in [this document](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/qs-configure-portal-windows-vm).\n",
"\n",
"Then, assign the VM access to your Workspace. For example, from the Azure Portal, navigate to your workspace, select __Access Control (IAM)__, __Add Role Assignment__, specify __Virtual Machine__ for the __Assign Access To__ dropdown, and select your VM's identity.\n",
"\n",
"![msi assignment](images/msiaccess.PNG)\n",
"\n",
"After completing these steps, you can authenticate using an MsiAuthentication instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.authentication import MsiAuthentication\n",
"\n",
"msi_auth = MsiAuthentication()\n",
"\n",
"ws = Workspace(subscription_id=\"my-subscription-id\",\n",
" resource_group=\"my-ml-rg\",\n",
" workspace_name=\"my-ml-workspace\",\n",
" auth=msi_auth)\n",
"\n",
"print(\"Found workspace {} at location {}\".format(ws.name, ws.location))"
]
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
@@ -238,6 +276,135 @@
"See [Register an application with the Microsoft identity platform](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app) quickstart for more details about application registrations. " "See [Register an application with the Microsoft identity platform](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app) quickstart for more details about application registrations. "
] ]
}, },
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using Secrets in Remote Runs\n",
"\n",
"Sometimes, you may have to pass a secret to a remote run, for example a username and password to authenticate against an external data source.\n",
"\n",
"The Azure ML SDK enables this use case through the Key Vault associated with your workspace. The workflow for adding a secret is as follows.\n",
"\n",
"On local computer:\n",
"\n",
" 1. Read in a local secret, for example from an environment variable or user input. To keep them secret, do not insert secret values into code as hard-coded strings.\n",
" 2. Obtain a reference to the keyvault\n",
" 3. Add the secret name-value pair in the key vault.\n",
" \n",
"The secret is then available for remote runs as shown further below.\n",
"\n",
"__Note__: The _azureml.core.keyvault.Keyvault_ is different from the _azure.keyvault_ library. It is intended as a simplified wrapper for setting, getting and listing user secrets in the Workspace Key Vault."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os, uuid\n",
"\n",
"local_secret = os.environ.get(\"LOCAL_SECRET\", default = str(uuid.uuid4())) # Use random UUID as a substitute for real secret.\n",
"keyvault = ws.get_default_keyvault()\n",
"keyvault.set_secret(name=\"secret-name\", value = local_secret)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The _set_secret_ method adds a new secret if one doesn't exist, or updates an existing one with a new value.\n",
"\n",
"You can list secret names you've added. This method doesn't return the values of the secrets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"keyvault.list_secrets()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can retrieve the value of the secret, and validate that it matches the original value. \n",
"\n",
"__Note__: This method returns the secret value. Take care not to write the secret value to output."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"retrieved_secret = keyvault.get_secret(name=\"secret-name\")\n",
"local_secret==retrieved_secret"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In submitted runs on local and remote compute, you can use the get_secret method of the Run instance to get the secret value from Key Vault. \n",
"\n",
"The method gives you a simple shortcut: the Run instance is aware of its Workspace and Keyvault, so it can directly obtain the secret without you having to instantiate the Workspace and Keyvault within the remote run.\n",
"\n",
"__Note__: This method returns the secret value. Take care not to write the secret to output.\n",
"\n",
"For example, let's create a simple script _get_secret.py_ that gets the secret we set earlier. In an actual application, you would use the secret, for example to access a database or other password-protected resource."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile get_secret.py\n",
"\n",
"from azureml.core import Run\n",
"\n",
"run = Run.get_context()\n",
"secret_value = run.get_secret(name=\"secret-name\")\n",
"print(\"Got secret value {} , but don't write it out!\".format(len(secret_value) * \"*\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, submit the script as a regular script run, and find the obfuscated secret value in the run output. You can use the same approach with other kinds of runs, such as Estimator runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment, Run\n",
"from azureml.core.script_run_config import ScriptRunConfig\n",
"\n",
"exp = Experiment(workspace = ws, name=\"try-secret\")\n",
"src = ScriptRunConfig(source_directory=\".\", script=\"get_secret.py\")\n",
"\n",
"run = exp.submit(src)\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Furthermore, you can set and get multiple secrets using the set_secrets and get_secrets methods."
]
},
{ {
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
@@ -267,7 +434,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.6.4" "version": "3.6.9"
} }
}, },
"nbformat": 4, "nbformat": 4,

View File

@@ -0,0 +1,19 @@
## Follow these sample notebooks to learn:
1. [Logging API](./logging-api/logging-api.ipynb): experiment with various logging functions to create runs and automatically generate graphs.
2. [Manage runs](./manage-runs/manage-runs.ipynb): learn different ways to start runs and child runs, monitor them, and cancel them.
3. [Tensorboard to monitor runs](./tensorboard/tensorboard.ipynb)
## Use MLflow with Azure Machine Learning service (Preview)
[MLflow](https://mlflow.org/) is an open-source platform for tracking machine learning experiments and managing models. You can use MLflow logging APIs with Azure Machine Learning service: the metrics and artifacts are logged to your Azure ML Workspace.
Try out the sample notebooks:
1. [Use MLflow with Azure Machine Learning for Local Training Run](./train-local/train-local.ipynb)
1. [Use MLflow with Azure Machine Learning for Remote Training Run](./train-remote/train-remote.ipynb)
1. [Deploy Model as Azure Machine Learning Web Service using MLflow](./deploy-model/deploy-model.ipynb)
1. [Train and Deploy PyTorch Image Classifier](./train-deploy-pytorch/train-deploy-pytorch.ipynb)
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/README.png)


View File

@@ -0,0 +1,545 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Logging\n",
"\n",
"_**This notebook showcases various ways to use the Azure Machine Learning service run logging APIs, and view the results in the Azure portal.**_\n",
"\n",
"---\n",
"---\n",
"\n",
"## Table of Contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
" 1. Validate Azure ML SDK installation\n",
" 1. Initialize workspace\n",
" 1. Set experiment\n",
"1. [Logging](#Logging)\n",
" 1. Starting a run\n",
" 1. Viewing a run in the portal\n",
" 1. Viewing the experiment in the portal\n",
" 1. Logging metrics\n",
" 1. Logging string metrics\n",
" 1. Logging numeric metrics\n",
" 1. Logging vectors\n",
" 1. Logging tables\n",
" 1. Uploading files\n",
"1. [Analyzing results](#Analyzing-results)\n",
" 1. Tagging a run\n",
"1. [Next steps](#Next-steps)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"Logging metrics from runs in your experiments allows you to track results from one run to another, determine trends in your outputs, and understand how your inputs correspond to your model and script performance. Azure Machine Learning services (AzureML) allows you to track various types of metrics, including images and arbitrary files, in order to understand, analyze, and audit your experimental progress. \n",
"\n",
"Typically you should log all parameters for your experiment and all numerical and string outputs of your experiment. This will allow you to analyze the performance of your experiments across multiple runs, correlate inputs to outputs, and filter runs based on interesting criteria.\n",
"\n",
"The experiment's Run History report page automatically creates a report that can be customized to show the KPIs, charts, and column sets that are interesting to you. \n",
"\n",
"| ![Run Details](./img/run_details.PNG) | ![Run History](./img/run_history.PNG) |\n",
"|:--:|:--:|\n",
"| *Run Details* | *Run History* |\n",
"\n",
"---\n",
"\n",
"## Setup\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. Also make sure you have tqdm and matplotlib installed in the current kernel.\n",
"\n",
"```\n",
"(myenv) $ conda install -y tqdm matplotlib\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Validate Azure ML SDK installation and get version number for debugging purposes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"install"
]
},
"outputs": [],
"source": [
"from azureml.core import Experiment, Workspace, Run\n",
"import azureml.core\n",
"import numpy as np\n",
"from tqdm import tqdm\n",
"\n",
"# Check core SDK version number\n",
"\n",
"print(\"This notebook was created using SDK version 1.0.57, you are currently running version\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Initialize workspace\n",
"\n",
"Initialize a workspace object from persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"create workspace"
]
},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set experiment\n",
"Create a new experiment (or get the one with the specified name). An *experiment* is a container for an arbitrary set of *runs*. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment = Experiment(workspace=ws, name='logging-api-test')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Logging\n",
"In this section we will explore the various logging mechanisms.\n",
"\n",
"### Starting a run\n",
"\n",
"A *run* is a singular experimental trial. In this notebook we will create a run directly on the experiment by calling `run = exp.start_logging()`. If you were experimenting by submitting a script file as an experiment using ``experiment.submit()``, you would call `run = Run.get_context()` in your script to access the run context of your code. In either case, the logging methods on the returned run object work the same.\n",
"\n",
"This cell also stores the run id for use later in this notebook. The run_id is not necessary for logging."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# start logging for the run\n",
"run = experiment.start_logging()\n",
"\n",
"# access the run id for use later\n",
"run_id = run.id\n",
"\n",
"# change the scale factor on different runs to see how you can compare multiple runs\n",
"scale_factor = 2\n",
"\n",
"# change the category on different runs to see how to organize data in reports\n",
"category = 'Red'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Viewing a run in the Portal\n",
"Once a run is started you can see the run in the portal by simply typing ``run``. Clicking on the \"Link to Portal\" link will take you to the Run Details page that shows the metrics you have logged and other run properties. You can refresh this page after each logging statement to see the updated results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Viewing an experiment in the portal\n",
"You can also view an experiment similarly by typing `experiment`. The portal link will take you to the experiment's Run History page, which shows all runs and allows you to analyze trends across multiple runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Logging metrics\n",
"Metrics are visible in the run details page in the AzureML portal and also can be analyzed in experiment reports. The run details page looks as below and contains tabs for Details, Outputs, Logs, and Snapshot. \n",
"* The Details page displays attributes about the run, plus logged metrics and images. Metrics that are vectors appear as charts. \n",
"* The Outputs page contains any files, such as models, that you uploaded into the \"outputs\" directory from your run into storage. If you place files in the \"outputs\" directory locally, the files are automatically uploaded on your behalf when the run is completed.\n",
"* The Logs page allows you to view any log files created by your run. Logging runs created in notebooks typically do not generate log files.\n",
"* The Snapshot page contains a snapshot of the directory specified in the ``start_logging`` statement, plus the notebook at the time of the ``start_logging`` call. This snapshot and notebook can be downloaded from the Run Details page to continue or reproduce an experiment.\n",
"\n",
"### Logging string metrics\n",
"The following cell logs a string metric. A string metric is simply a string value associated with a name. String metrics are useful for labelling runs and organizing your data. Typically you should log all string parameters as metrics for later analysis - even information such as paths can help you understand how individual experiments perform differently.\n",
"\n",
"String metrics can be used in the following ways:\n",
"* Plot in histograms\n",
"* Group by indicators for numerical plots\n",
"* Filtering runs\n",
"\n",
"String metrics appear in the **Tracked Metrics** section of the Run Details page and can be added as a column in Run History reports."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# log a string metric\n",
"run.log(name='Category', value=category)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logging numerical metrics\n",
"The following cell logs some numerical metrics. Numerical metrics can include metrics such as AUC or MSE. You should log any parameter or significant output measure in order to understand trends across multiple experiments. Numerical metrics appear in the **Tracked Metrics** section of the Run Details page, and can be used in charts or KPI's in experiment Run History reports."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# log numerical values\n",
"run.log(name=\"scale factor\", value = scale_factor)\n",
"run.log(name='Magic Number', value=42 * scale_factor)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logging vectors\n",
"Vectors are good for recording information such as loss curves. You can log a vector by creating a list of numbers, calling ``log_list()`` and supplying a name and the list, or by repeatedly logging a value using the same name.\n",
"\n",
"Vectors are presented in Run Details as a chart, and are directly comparable in experiment reports when placed in a chart. \n",
"\n",
"**Note:** vectors logged into the run are expected to be relatively small. Logging very large vectors into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to a file that will be uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fibonacci_values = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]\n",
"scaled_values = [i * scale_factor for i in fibonacci_values]\n",
"\n",
"# Log a list of values. Note this will generate a single-variable line chart.\n",
"run.log_list(name='Fibonacci', value=scaled_values)\n",
"\n",
"for i in tqdm(range(-10, 10)):\n",
" # log a metric value repeatedly, this will generate a single-variable line chart.\n",
" run.log(name='Sigmoid', value=1 / (1 + np.exp(-i)))\n",
" "
]
},
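Per the note above, a large series belongs in a file rather than in logged metrics. A short sketch of that pattern — the file-writing part runs anywhere; the commented upload call assumes the `run` object from this notebook and uses the same API shown later in the "Uploading files" section:

```python
import json
import os

# A series too large for run.log_list: persist it under ./outputs instead
large_curve = [i * 0.001 for i in range(100_000)]
os.makedirs("outputs", exist_ok=True)
with open("outputs/loss_curve.json", "w") as f:
    json.dump(large_curve, f)

# Files placed under ./outputs are uploaded automatically when the run completes.
# To upload explicitly, use the run's upload API, e.g.:
# run.upload_file(name="outputs/loss_curve.json", path_or_stream="outputs/loss_curve.json")
```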
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logging tables\n",
"Tables are good for recording related sets of information such as accuracy tables, confusion matrices, etc. \n",
"You can log a table in two ways:\n",
"* Create a dictionary of lists where each list represents a column in the table and call ``log_table()``\n",
"* Repeatedly call ``log_row()`` providing the same table name with a consistent set of named args as the column values\n",
"\n",
"Tables are presented in Run Details as a chart using the first two columns of the table.\n",
"\n",
"**Note:** tables logged into the run are expected to be relatively small. Logging very large tables into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to a file that will be uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create a dictionary to hold a table of values\n",
"sines = {}\n",
"sines['angle'] = []\n",
"sines['sine'] = []\n",
"\n",
"for i in tqdm(range(-10, 10)):\n",
" angle = i / 2.0 * scale_factor\n",
" \n",
" # log a 2 (or more) values as a metric repeatedly. This will generate a 2-variable line chart if you have 2 numerical columns.\n",
" run.log_row(name='Cosine Wave', angle=angle, cos=np.cos(angle))\n",
" \n",
" sines['angle'].append(angle)\n",
" sines['sine'].append(np.sin(angle))\n",
"\n",
"# log a dictionary as a table, this will generate a 2-variable chart if you have 2 numerical columns\n",
"run.log_table(name='Sine Wave', value=sines)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Logging images\n",
"You can directly log _matplotlib_ plots and arbitrary images to your run record. This code logs a _matplotlib_ pyplot object. Images show up in the run details page in the Azure ML Portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"# Create a plot\n",
"import matplotlib.pyplot as plt\n",
"angle = np.linspace(-3, 3, 50) * scale_factor\n",
"plt.plot(angle,np.tanh(angle), label='tanh')\n",
"plt.legend(fontsize=12)\n",
"plt.title('Hyperbolic Tangent', fontsize=16)\n",
"plt.grid(True)\n",
"\n",
"# Log the plot to the run. To log an arbitrary image, use the form run.log_image(name, path='./image_path.png')\n",
"run.log_image(name='Hyperbolic Tangent', plot=plt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Uploading files\n",
"\n",
"Files can also be uploaded explicitly and stored as artifacts along with the run record. These files are also visible in the *Outputs* tab of the Run Details page.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"file_name = 'outputs/myfile.txt'\n",
"\n",
"# make sure the outputs directory exists before writing to it\n",
"os.makedirs('outputs', exist_ok=True)\n",
"with open(file_name, \"w\") as f:\n",
" f.write('This is an output file that will be uploaded.\\n')\n",
"\n",
"# Upload the file explicitly into artifacts \n",
"run.upload_file(name = file_name, path_or_stream = file_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Completing the run\n",
"\n",
"Calling `run.complete()` marks the run as completed and triggers the output file collection. If for any reason you need to indicate that the run failed, or you simply need to cancel the run, you can call `run.fail()` or `run.cancel()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Analyzing results"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can refresh the run in the Azure portal to see all of your results. In many cases you will want to analyze runs that were performed previously to inspect the contents or compare results. Runs can be fetched from their parent Experiment object using the ``Run()`` constructor or the ``experiment.get_runs()`` method. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fetched_run = Run(experiment, run_id)\n",
"fetched_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Call ``run.get_metrics()`` to retrieve all the metrics from a run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fetched_run.get_metrics()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"See the files uploaded for this run by calling ``run.get_file_names()``"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fetched_run.get_file_names()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you know the file names in a run, you can download the files using the ``run.download_file()`` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"os.makedirs('files', exist_ok=True)\n",
"\n",
"for f in fetched_run.get_file_names():\n",
" dest = os.path.join('files', f.split('/')[-1])\n",
" print('Downloading file {} to {}...'.format(f, dest))\n",
" fetched_run.download_file(f, dest) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tagging a run\n",
"Often when you analyze the results of a run, you may need to tag that run with important personal or external information. You can add a tag to a run using the ``run.tag()`` method. AzureML supports valueless and valued tags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fetched_run.tag(\"My Favorite Run\")\n",
"fetched_run.tag(\"Competition Rank\", 1)\n",
"\n",
"fetched_run.get_tags()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"To experiment more with logging and to understand how metrics can be visualized, go back to the *Starting a run* section, change the category and scale_factor values, and go through the notebook several times. Play with the KPI, charting, and column selection options on the experiment's Run History reports page to see how the various metrics can be combined and visualized.\n",
"\n",
"After learning about all of the logging options, go to the [train on remote vm](../train-on-remote-vm/train-on-remote-vm.ipynb) notebook and experiment with logging from remote compute contexts."
]
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,8 @@
name: logging-api
dependencies:
- numpy
- matplotlib
- tqdm
- pip:
- azureml-sdk
- azureml-widgets

View File

@@ -0,0 +1,7 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
from azureml.core import Run
submitted_run = Run.get_context()
submitted_run.log(name="message", value="Hello from run!")

View File

@@ -0,0 +1,11 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
from azureml.core import Run
run = Run.get_context()
child_runs = run.create_children(count=5)
for c, child in enumerate(child_runs):
    child.log(name="Hello from child run ", value=c)
    child.complete()

View File

@@ -0,0 +1,8 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.
import time
print("Wait for 10 seconds..")
time.sleep(10)
print("Done waiting")

View File

@@ -0,0 +1,602 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/manage-runs/manage-runs.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Manage runs\n",
"\n",
"## Table of contents\n",
"\n",
"1. [Introduction](#Introduction)\n",
"1. [Setup](#Setup)\n",
"1. [Start, monitor and complete a run](#Start,-monitor-and-complete-a-run)\n",
"1. [Add properties and tags](#Add-properties-and-tags)\n",
"1. [Query properties and tags](#Query-properties-and-tags)\n",
"1. [Start and query child runs](#Start-and-query-child-runs)\n",
"1. [Cancel or fail runs](#Cancel-or-fail-runs)\n",
"1. [Reproduce a run](#Reproduce-a-run)\n",
"1. [Next steps](#Next-steps)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"When you're building enterprise-grade machine learning models, it is important to track, organize, monitor and reproduce your training runs. For example, you might want to trace the lineage behind a model deployed to production, and re-run the training experiment to troubleshoot issues. \n",
"\n",
"This notebook shows examples of how to use the Azure Machine Learning service to manage your training runs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. Also, if you're new to Azure ML, we recommend that you go through [the tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml) first to learn the basic concepts.\n",
"\n",
"Let's first import required packages, check Azure ML SDK version, connect to your workspace and create an Experiment to hold the runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"from azureml.core import Workspace, Experiment, Run\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"print(azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = Experiment(workspace=ws, name=\"explore-runs\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start, monitor and complete a run\n",
"\n",
"A run is a unit of execution, typically used to train a model, but also for other purposes such as loading or transforming data. Runs are tracked by the Azure ML service, and can be instrumented with metrics and artifact logging.\n",
"\n",
"The simplest way to start a run in your interactive Python session is to call the *Experiment.start_logging* method. You can then log metrics from within the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"notebook_run = exp.start_logging()\n",
"\n",
"notebook_run.log(name=\"message\", value=\"Hello from run!\")\n",
"\n",
"print(notebook_run.get_status())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the *get_status* method to get the status of the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(notebook_run.get_status())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, you can simply display the run object to get a link to its details in the Azure Portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"notebook_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The *get_details* method gives you more details on the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"notebook_run.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the *complete* method to end the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"notebook_run.complete()\n",
"print(notebook_run.get_status())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also use Python's *with...as* pattern. The run automatically completes when execution moves out of scope, so you don't need to complete it manually."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with exp.start_logging() as notebook_run:\n",
" notebook_run.log(name=\"message\", value=\"Hello from run!\")\n",
" print(\"Is it still running?\",notebook_run.get_status())\n",
" \n",
"print(\"Has it completed?\",notebook_run.get_status())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, let's look at submitting a run as a separate Python process. To keep the example simple, we submit the run on the local computer. Other targets could include remote VMs and Machine Learning Compute clusters in your Azure ML Workspace.\n",
"\n",
"We use the *hello.py* script as an example. To perform logging, we need to get a reference to the Run instance from within the scope of the script. We do this using the *Run.get_context* method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!more hello.py"
]
},
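{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch (the actual *hello.py* in this folder may differ), such a script only needs *Run.get_context* to log against the submitted run:\n",
"\n",
"```python\n",
"# Hypothetical sketch of hello.py - not necessarily the exact script in this folder\n",
"from azureml.core import Run\n",
"\n",
"run = Run.get_context()  # reference to the current run\n",
"run.log(name=\"message\", value=\"Hello from the submitted script!\")\n",
"```"
]
},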
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's submit the run on a local computer. A standard pattern in the Azure ML SDK is to create a run configuration, and then use the *Experiment.submit* method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run_config = ScriptRunConfig(source_directory='.', script='hello.py')\n",
"\n",
"local_script_run = exp.submit(run_config)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can view the status of the run as before."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(local_script_run.get_status())\n",
"local_script_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Submitted runs have additional log files you can inspect using *get_details_with_logs*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_script_run.get_details_with_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the *wait_for_completion* method to block the local execution until the remote run is complete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_script_run.wait_for_completion(show_output=True)\n",
"print(local_script_run.get_status())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Add properties and tags\n",
"\n",
"Properties and tags help you organize your runs. You can use them to describe, for example, who authored the run, what the results were, and what machine learning approach was used. And as you'll later learn, properties and tags can be used to query the history of your runs to find the important ones.\n",
"\n",
"For example, let's add an \"author\" property to the run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_script_run.add_properties({\"author\":\"azureml-user\"})\n",
"print(local_script_run.get_properties())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Properties are immutable. Once you assign a value it cannot be changed, making them useful as a permanent record for auditing purposes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"try:\n",
" local_script_run.add_properties({\"author\":\"different-user\"})\n",
"except Exception as e:\n",
" print(e)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Tags on the other hand can be changed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_script_run.tag(\"quality\", \"great run\")\n",
"print(local_script_run.get_tags())"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_script_run.tag(\"quality\", \"fantastic run\")\n",
"print(local_script_run.get_tags())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also add a simple string tag. It appears in the tag dictionary with a value of None."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_script_run.tag(\"worth another look\")\n",
"print(local_script_run.get_tags())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Query properties and tags\n",
"\n",
"You can query runs within an experiment that match specific properties and tags."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"list(exp.get_runs(properties={\"author\":\"azureml-user\"},tags={\"quality\":\"fantastic run\"}))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"list(exp.get_runs(properties={\"author\":\"azureml-user\"},tags=\"worth another look\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start and query child runs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use child runs to group together related runs, for example different hyperparameter tuning iterations.\n",
"\n",
"Let's use *hello_with_children* script to create a batch of 5 child runs from within a submitted run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!more hello_with_children.py"
]
},
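{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hypothetical sketch (the actual script in this folder may differ), such a script can create the whole batch of child runs with a single call:\n",
"\n",
"```python\n",
"# Hypothetical sketch - creates and logs to a batch of child runs from a submitted script\n",
"from azureml.core import Run\n",
"\n",
"run = Run.get_context()\n",
"child_runs = run.create_children(count=5)  # one network call for the whole batch\n",
"for i, child in enumerate(child_runs):\n",
"    child.log(name=\"Hello from child run\", value=i)\n",
"    child.complete()\n",
"```"
]
},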
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run_config = ScriptRunConfig(source_directory='.', script='hello_with_children.py')\n",
"\n",
"local_script_run = exp.submit(run_config)\n",
"local_script_run.wait_for_completion(show_output=True)\n",
"print(local_script_run.get_status())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can start child runs one by one. Note that this is less efficient than submitting a batch of runs, because each creation results in a network call.\n",
"\n",
"Child runs also complete automatically as they move out of scope."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with exp.start_logging() as parent_run:\n",
" for c,count in enumerate(range(5)):\n",
" with parent_run.child_run() as child:\n",
" child.log(name=\"Hello from child run\", value=c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To query the child runs belonging to a specific parent, use the *get_children* method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"list(parent_run.get_children())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cancel or fail runs\n",
"\n",
"Sometimes, you realize that the run is not performing as intended, and you want to cancel it instead of waiting for it to complete.\n",
"\n",
"As an example, let's create a Python script with a delay in the middle."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!more hello_with_delay.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use the *cancel* method to cancel a run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run_config = ScriptRunConfig(source_directory='.', script='hello_with_delay.py')\n",
"\n",
"local_script_run = exp.submit(run_config)\n",
"print(\"Did the run start?\",local_script_run.get_status())\n",
"local_script_run.cancel()\n",
"print(\"Did the run cancel?\",local_script_run.get_status())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also mark an unsuccessful run as failed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_script_run = exp.submit(run_config)\n",
"local_script_run.fail()\n",
"print(local_script_run.get_status())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reproduce a run\n",
"\n",
"When updating or troubleshooting a model deployed to production, you sometimes need to revisit the original training run that produced the model. To help you with this, the Azure ML service by default creates snapshots of your scripts at the time of run submission.\n",
"\n",
"You can use the *restore_snapshot* method to obtain a zip package of the latest snapshot of the script folder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"local_script_run.restore_snapshot(path=\"snapshots\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then extract the zip package, examine the code, and submit your run again."
]
},
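{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch (assuming *restore_snapshot* returns the path to the zip package) that inspects the snapshot's contents with Python's standard *zipfile* module:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal sketch: restore_snapshot returns the path to the zip package,\n",
"# which you can inspect with the standard zipfile module.\n",
"import zipfile\n",
"\n",
"snapshot_path = local_script_run.restore_snapshot(path=\"snapshots\")\n",
"with zipfile.ZipFile(snapshot_path) as snapshot_zip:\n",
"    print(snapshot_zip.namelist())"
]
},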
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next steps\n",
"\n",
" * To learn more about logging APIs, see [logging API notebook](./logging-api/logging-api.ipynb)\n",
" * To learn more about remote runs, see [train on AML compute notebook](./train-on-amlcompute/train-on-amlcompute.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,4 @@
name: manage-runs
dependencies:
- pip:
- azureml-sdk


@@ -0,0 +1,562 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tensorboard Integration with Run History\n",
"\n",
"1. Run a Tensorflow job locally and view its TB output live.\n",
"2. The same, for a DSVM.\n",
"3. And once more, with an AmlCompute cluster.\n",
"4. Finally, we'll collect all of these historical runs together into a single Tensorboard graph."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook to:\n",
" * install the AML SDK\n",
" * create a workspace and its configuration file (`config.json`)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Diagnostics\n",
"Opt-in diagnostics for better experience, quality, and security of future releases."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"Diagnostics"
]
},
"outputs": [],
"source": [
"from azureml.telemetry import set_diagnostics_collection\n",
"\n",
"set_diagnostics_collection(send_diagnostics=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep='\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set experiment name and create project\n",
"Choose a name for your run history container in the workspace, and create a folder for the project."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from os import path, makedirs\n",
"experiment_name = 'tensorboard-demo'\n",
"\n",
"# experiment folder\n",
"exp_dir = './sample_projects/' + experiment_name\n",
"\n",
"if not path.exists(exp_dir):\n",
" makedirs(exp_dir)\n",
"\n",
"# runs we started in this session, for the finale\n",
"runs = []"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Download Tensorflow Tensorboard demo code\n",
"\n",
"Tensorflow's repository has an MNIST demo with extensive Tensorboard instrumentation. We'll use it here for our purposes.\n",
"\n",
"Note that we don't need to make any code changes at all - the code works without modification from the Tensorflow repository."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"import os\n",
"\n",
"tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n",
"with open(os.path.join(exp_dir, \"mnist_with_summaries.py\"), \"w\") as file:\n",
" file.write(tf_code.text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure and run locally\n",
"\n",
"We'll start by running this locally. While it might not initially seem that useful to use this for a local run - why not just run TB against the files generated locally? - even in this case there is some value to using this feature. Your local run will be registered in the run history, and your Tensorboard logs will be uploaded to the artifact store associated with this run. Later, you'll be able to restore the logs from any run, regardless of where it happened.\n",
"\n",
"Note that for this run, you will need to install Tensorflow on your local machine by yourself. Further, the Tensorboard module (that is, the one included with Tensorflow) must be accessible to this notebook's kernel, as the local machine is what runs Tensorboard."
]
},
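{
"cell_type": "markdown",
"metadata": {},
"source": [
"An optional sanity check: both of these imports must succeed in this kernel for the local run and the Tensorboard instance below to work."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check - verifies TensorFlow and its bundled Tensorboard\n",
"# module are visible to this notebook's kernel\n",
"import tensorflow as tf\n",
"import tensorboard\n",
"\n",
"print(\"TensorFlow:\", tf.__version__)"
]
},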
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"\n",
"# Create a run configuration.\n",
"run_config = RunConfiguration()\n",
"run_config.environment.python.user_managed_dependencies = True\n",
"\n",
"# You can choose a specific Python environment by pointing to a Python path \n",
"#run_config.environment.python.interpreter_path = '/home/ninghai/miniconda3/envs/sdk2/bin/python'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"from azureml.core.script_run_config import ScriptRunConfig\n",
"\n",
"logs_dir = os.path.join(os.curdir, \"logs\")\n",
"data_dir = os.path.abspath(os.path.join(os.curdir, \"mnist_data\"))\n",
"\n",
"if not path.exists(data_dir):\n",
" makedirs(data_dir)\n",
"\n",
"os.environ[\"TEST_TMPDIR\"] = data_dir\n",
"\n",
"# Writing logs to ./logs results in their being uploaded to Artifact Service,\n",
"# and thus, made accessible to our Tensorboard instance.\n",
"arguments_list = [\"--log_dir\", logs_dir]\n",
"\n",
"# Create an experiment\n",
"exp = Experiment(ws, experiment_name)\n",
"\n",
"# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:\n",
"# arguments_list += [\"--max_steps\", \"5000\"]\n",
"\n",
"script = ScriptRunConfig(exp_dir,\n",
" script=\"mnist_with_summaries.py\",\n",
" run_config=run_config,\n",
" arguments=arguments_list)\n",
"\n",
"run = exp.submit(script)\n",
"# You can also wait for the run to complete\n",
"# run.wait_for_completion(show_output=True)\n",
"runs.append(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start Tensorboard\n",
"\n",
"Now, while the run is in progress, we just need to start Tensorboard with the run as its target, and it will begin streaming logs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"tensorboard-sample"
]
},
"outputs": [],
"source": [
"from azureml.tensorboard import Tensorboard\n",
"\n",
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
"tb = Tensorboard([run])\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stop Tensorboard\n",
"\n",
"When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now, with a DSVM\n",
"\n",
"Tensorboard uploading works with all compute targets. Here we demonstrate it from a DSVM.\n",
"Note that the Tensorboard instance itself will be run by the notebook kernel. Again, this means this notebook's kernel must have access to the Tensorboard module.\n",
"\n",
"If you are unfamiliar with DSVM configuration, check [Train in a remote VM](../../training/train-on-remote-vm/train-on-remote-vm.ipynb) for a more detailed breakdown.\n",
"\n",
"**Note**: To streamline the compute that Azure Machine Learning creates, we are making updates to support creating only single- to multi-node `AmlCompute`. The `DSVMCompute` class will be deprecated in a later release, but the DSVM can be created using the single-line command below and then attached (like any VM) using the sample code below. Also note that we only support Linux VMs for remote execution from AML, and the commands below will spin up a Linux VM only.\n",
"\n",
"```shell\n",
"# create a DSVM in your resource group\n",
"# note you need to be at least a contributor to the resource group in order to execute this command successfully.\n",
"(myenv) $ az vm create --resource-group <resource_group_name> --name <some_vm_name> --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username <username> --admin-password <password> --generate-ssh-keys --authentication-type password\n",
"```\n",
"You can also use [this url](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, RemoteCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"username = os.getenv('AZUREML_DSVM_USERNAME', default='<my_username>')\n",
"address = os.getenv('AZUREML_DSVM_ADDRESS', default='<ip_address_or_fqdn>')\n",
"\n",
"compute_target_name = 'cpudsvm'\n",
"# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n",
"try:\n",
" attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n",
" print('found existing:', attached_dsvm_compute.name)\n",
"except ComputeTargetException:\n",
" config = RemoteCompute.attach_configuration(username=username,\n",
" address=address,\n",
" ssh_port=22,\n",
" private_key_file='./.ssh/id_rsa')\n",
" attached_dsvm_compute = ComputeTarget.attach(ws, compute_target_name, config)\n",
" \n",
" attached_dsvm_compute.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit run using TensorFlow estimator\n",
"\n",
"Instead of manually configuring the DSVM environment, we can use the TensorFlow estimator and everything is set up automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.dnn import TensorFlow\n",
"\n",
"script_params = {\"--log_dir\": \"./logs\"}\n",
"\n",
"# If you want the run to go longer, set --max-steps to a higher number.\n",
"# script_params[\"--max_steps\"] = \"5000\"\n",
"\n",
"tf_estimator = TensorFlow(source_directory=exp_dir,\n",
" compute_target=attached_dsvm_compute,\n",
" entry_script='mnist_with_summaries.py',\n",
" script_params=script_params)\n",
"\n",
"run = exp.submit(tf_estimator)\n",
"\n",
"runs.append(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start Tensorboard with this run\n",
"\n",
"Just like before."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
"tb = Tensorboard([run])\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stop Tensorboard\n",
"\n",
"When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Once more, with an AmlCompute cluster\n",
"\n",
"Just to prove we can, let's create an AmlCompute CPU cluster, and run our demo there, as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"cpucluster\"\n",
"\n",
"cts = ws.compute_targets\n",
"found = False\n",
"if cluster_name in cts and cts[cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[cluster_name]\n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
"compute_target.wait_for_completion(show_output=True, min_node_count=None)\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"# print(compute_target.get_status().serialize())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit run using TensorFlow estimator\n",
"\n",
"Again, we can use the TensorFlow estimator and everything is set up automatically."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"script_params = {\"--log_dir\": \"./logs\"}\n",
"\n",
"# If you want the run to go longer, set --max-steps to a higher number.\n",
"# script_params[\"--max_steps\"] = \"5000\"\n",
"\n",
"tf_estimator = TensorFlow(source_directory=exp_dir,\n",
" compute_target=compute_target,\n",
" entry_script='mnist_with_summaries.py',\n",
" script_params=script_params)\n",
"\n",
"run = exp.submit(tf_estimator)\n",
"\n",
"runs.append(run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Start Tensorboard with this run\n",
"\n",
"Once more..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n",
"tb = Tensorboard([run])\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stop Tensorboard\n",
"\n",
"When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Finale\n",
"\n",
"If you've paid close attention, you'll have noticed that we've been saving the run objects in an array as we went along. We can start a Tensorboard instance that combines all of these run objects into a single process. This way, you can compare historical runs. You can even do this with live runs; if you made some of those previous runs longer via the `--max_steps` parameter, they might still be running, and you'll see them live in this instance as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The Tensorboard constructor takes an array of runs...\n",
"# and it turns out that we have been building one of those all along.\n",
"tb = Tensorboard(runs)\n",
"\n",
"# If successful, start() returns a string with the URI of the instance.\n",
"tb.start()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stop Tensorboard\n",
"\n",
"As you might already know, make sure to call the `stop()` method of the Tensorboard object, or it will stay running (until you kill the kernel associated with this notebook, at least)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tb.stop()"
]
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,6 @@
name: tensorboard
dependencies:
- pip:
- azureml-sdk
- azureml-tensorboard
- tensorflow


@@ -0,0 +1,322 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/using-mlflow/deploy-model/deploy-model.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy Model as Azure Machine Learning Web Service using MLflow\n",
"\n",
"This example shows you how to use mlflow together with Azure Machine Learning services for deploying a model as a web service. You'll learn how to:\n",
"\n",
" 1. Retrieve a previously trained scikit-learn model\n",
" 2. Create a Docker image from the model\n",
" 3. Deploy the model as a web service on Azure Container Instance\n",
" 4. Make a scoring request against the web service.\n",
"\n",
"## Prerequisites and Set-up\n",
"\n",
"This notebook requires you to first complete the [Use MLflow with Azure Machine Learning for Local Training Run](../train-local/train-local.ipynb) or [Use MLflow with Azure Machine Learning for Remote Training Run](../train-remote/train-remote.ipynb) notebook, so that you have an experiment run with an uploaded model in your Azure Machine Learning Workspace.\n",
"\n",
"Also, install the following packages if you haven't already:\n",
"\n",
"```\n",
"pip install azureml-mlflow pandas\n",
"```\n",
"\n",
"Then, import necessary packages:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mlflow\n",
"import azureml.mlflow\n",
"import azureml.core\n",
"from azureml.core import Workspace\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Connect to workspace and set MLflow tracking URI\n",
"\n",
"Setting the tracking URI is required for retrieving the model and creating an image using the MLflow APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve model from previous run\n",
"\n",
"Let's retrieve the experiment from the training notebook, and list the runs within that experiment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = \"experiment-with-mlflow\"\n",
"exp = ws.experiments[experiment_name]\n",
"\n",
"runs = list(exp.get_runs())\n",
"runs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, let's select the most recent training run and find its ID. You also need to specify the path in run history where the model was saved. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"runid = runs[0].id\n",
"model_save_path = \"model\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Docker image\n",
"\n",
"To create a Docker image with Azure Machine Learning for Model Management, use the ```mlflow.azureml.build_image``` method. Specify the model path, your workspace, run ID, and other parameters.\n",
"\n",
"MLflow automatically recognizes the model framework as scikit-learn, and creates the scoring logic and includes library dependencies for you.\n",
"\n",
"Note that the image creation can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mlflow.azureml\n",
"\n",
"azure_image, azure_model = mlflow.azureml.build_image(model_uri=\"runs:/{}/{}\".format(runid, model_save_path),\n",
" workspace=ws,\n",
" model_name='diabetes-sklearn-model',\n",
" image_name='diabetes-sklearn-image',\n",
" synchronous=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy web service\n",
"\n",
"Let's use the Azure Machine Learning SDK to deploy the image as a web service.\n",
"\n",
"First, specify the deployment configuration. Azure Container Instance is a suitable choice for a quick dev-test deployment, while Azure Kubernetes Service is suitable for scalable production deployments.\n",
"\n",
"Then, deploy the image using the Azure Machine Learning SDK's ```deploy_from_image``` method.\n",
"\n",
"Note that the deployment can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice, Webservice\n",
"\n",
"\n",
"aci_config = AciWebservice.deploy_configuration(cpu_cores=1, \n",
" memory_gb=1, \n",
" tags={\"method\" : \"sklearn\"}, \n",
" description='Diabetes model',\n",
" location='eastus2')\n",
"\n",
"\n",
"# Deploy the image to Azure Container Instances (ACI) for real-time serving\n",
"webservice = Webservice.deploy_from_image(\n",
" image=azure_image, workspace=ws, name=\"diabetes-model-1\", deployment_config=aci_config)\n",
"\n",
"\n",
"webservice.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Make a scoring request\n",
"\n",
"Let's take the first few rows of test data and score them using the web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"test_rows = [\n",
" [0.01991321, 0.05068012, 0.10480869, 0.07007254, -0.03596778,\n",
" -0.0266789 , -0.02499266, -0.00259226, 0.00371174, 0.04034337],\n",
" [-0.01277963, -0.04464164, 0.06061839, 0.05285819, 0.04796534,\n",
" 0.02937467, -0.01762938, 0.03430886, 0.0702113 , 0.00720652],\n",
" [ 0.03807591, 0.05068012, 0.00888341, 0.04252958, -0.04284755,\n",
" -0.02104223, -0.03971921, -0.00259226, -0.01811827, 0.00720652]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The MLflow-based web service for a scikit-learn model requires the data to be converted to a Pandas DataFrame and then serialized as JSON."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import pandas as pd\n",
"\n",
"test_rows_as_json = pd.DataFrame(test_rows).to_json(orient=\"split\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's pass the converted and serialized data to the web service to get the predictions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictions = webservice.run(test_rows_as_json)\n",
"\n",
"print(predictions)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can use the web service's scoring URI to make a raw HTTP request."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"webservice.scoring_uri"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can diagnose the web service using the ```get_logs``` method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"webservice.get_logs()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Next Steps\n",
"\n",
"Learn about [model management and inference in Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-model-management-and-deployment)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "rastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
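A note on the payload shape used above: `pd.DataFrame(test_rows).to_json(orient="split")` serializes the frame as a JSON object with `columns`, `index`, and `data` keys. The same wire format can be sketched with only the standard library, which makes explicit what the web service receives (the feature values below are illustrative stand-ins for the notebook's `test_rows`):

```python
import json

# Two hypothetical rows of feature data, standing in for the
# notebook's test_rows (values are illustrative only)
test_rows = [
    [0.02, 0.05, 0.10],
    [-0.01, -0.04, 0.06],
]

# pandas' to_json(orient="split") emits an object with "columns",
# "index" and "data" keys; here the same structure is built by hand
payload = json.dumps({
    "columns": list(range(len(test_rows[0]))),
    "index": list(range(len(test_rows))),
    "data": test_rows,
})

decoded = json.loads(payload)
print(sorted(decoded.keys()))  # ['columns', 'data', 'index']
```

This is why the scoring request in the notebook is a dict of lists rather than a bare array: the MLflow scoring server reconstructs the DataFrame from those three keys.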


@@ -0,0 +1,8 @@
name: deploy-model
dependencies:
- scikit-learn
- matplotlib
- pip:
- azureml-sdk
- azureml-mlflow
- pandas


@@ -0,0 +1,150 @@
# Copyright (c) 2017, PyTorch Team
# All rights reserved
# Licensed under BSD 3-Clause License.
# This example is based on PyTorch MNIST example:
# https://github.com/pytorch/examples/blob/master/mnist/main.py
import mlflow
import mlflow.pytorch
from mlflow.utils.environment import _mlflow_conda_env
import warnings
import cloudpickle
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 20, 5, 1)
self.conv2 = nn.Conv2d(20, 50, 5, 1)
self.fc1 = nn.Linear(4 * 4 * 50, 500)
self.fc2 = nn.Linear(500, 10)
def forward(self, x):
# Added the view for reshaping score requests
x = x.view(-1, 1, 28, 28)
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4 * 4 * 50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim=1)
def train(args, model, device, train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
# Use MLflow logging
mlflow.log_metric("epoch_loss", loss.item())
def test(args, model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
# sum up batch loss
test_loss += F.nll_loss(output, target, reduction="sum").item()
# get the index of the max log-probability
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print("\n")
print("Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n".format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
# Use MLflow logging
mlflow.log_metric("average_loss", test_loss)
class Args(object):
pass
# Training settings
args = Args()
setattr(args, 'batch_size', 64)
setattr(args, 'test_batch_size', 1000)
setattr(args, 'epochs', 3) # Higher number for better convergence
setattr(args, 'lr', 0.01)
setattr(args, 'momentum', 0.5)
setattr(args, 'no_cuda', True)
setattr(args, 'seed', 1)
setattr(args, 'log_interval', 10)
setattr(args, 'save_model', True)
use_cuda = not args.no_cuda and torch.cuda.is_available()
torch.manual_seed(args.seed)
device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST(
'../data',
train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))])),
batch_size=args.test_batch_size, shuffle=True, **kwargs)
def driver():
warnings.filterwarnings("ignore")
# Dependencies for deploying the model
pytorch_index = "https://download.pytorch.org/whl/"
pytorch_version = "cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl"
deps = [
"cloudpickle=={}".format(cloudpickle.__version__),
pytorch_index + pytorch_version,
"torchvision=={}".format(torchvision.__version__),
"Pillow=={}".format("6.0.0")
]
with mlflow.start_run() as run:
model = Net().to(device)
optimizer = optim.SGD(
model.parameters(),
lr=args.lr,
momentum=args.momentum)
for epoch in range(1, args.epochs + 1):
train(args, model, device, train_loader, optimizer, epoch)
test(args, model, device, test_loader)
# Log model to run history using MLflow
if args.save_model:
model_env = _mlflow_conda_env(additional_pip_deps=deps)
mlflow.pytorch.log_model(model, "model", conda_env=model_env)
return run
if __name__ == "__main__":
driver()
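The `x = x.view(-1, 1, 28, 28)` line in `Net.forward` above is what lets the deployed endpoint accept flattened 784-pixel rows. The index arithmetic behind that reshape can be sketched in plain Python, without torch (an illustration only, not part of the sample):

```python
# A flat 784-value row, as a scoring request would carry it;
# values are just the flat indices so the mapping is visible
flat = list(range(28 * 28))

# Reshape to 28 rows of 28 columns, mirroring view(-1, 1, 28, 28)
# for a single image with a single color channel
image = [flat[r * 28:(r + 1) * 28] for r in range(28)]

print(len(image), len(image[0]))   # 28 28
print(image[1][0])                 # row 1, col 0 maps to flat index 28
```

Row-major order means element `(r, c)` of the image comes from flat index `r * 28 + c`, which is exactly how the scoring wrapper reinterprets the incoming list.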


@@ -0,0 +1,481 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/using-mlflow/train-deploy-pytorch/train-deploy-pytorch.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier\n",
"\n",
"This example shows you how to use MLflow together with Azure Machine Learning services for tracking the metrics and artifacts while training a PyTorch model to classify MNIST digit images, and then deploy the model as a web service. You'll learn how to:\n",
"\n",
" 1. Set up the MLflow tracking URI to use Azure ML\n",
" 2. Create experiment\n",
" 3. Instrument your model with MLflow tracking\n",
" 4. Train a PyTorch model locally\n",
" 5. Train a model on GPU compute on Azure\n",
" 6. View your experiment within your Azure ML Workspace in Azure Portal\n",
" 7. Create a Docker image from the trained model\n",
" 8. Deploy the model as a web service on Azure Container Instance\n",
" 9. Call the model to make predictions\n",
" \n",
"### Pre-requisites\n",
" \n",
"Make sure you have completed the [Configuration](../../../configuration.ipynb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n",
"\n",
"Also, install the azureml-mlflow package using ```pip install azureml-mlflow```. Note that azureml-mlflow installs the mlflow package as a dependency if you haven't installed it previously.\n",
"\n",
"### Set-up\n",
"\n",
"Import packages and check versions of Azure ML SDK and MLflow installed on your computer. Then connect to your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys, os\n",
"import mlflow\n",
"import mlflow.azureml\n",
"import mlflow.sklearn\n",
"\n",
"import azureml.core\n",
"from azureml.core import Workspace\n",
"\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)\n",
"print(\"MLflow version:\", mlflow.version.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set tracking URI\n",
"\n",
"Set the MLFlow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLFlow APIs will go to Azure ML services and will be tracked under your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Experiment\n",
"\n",
"In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = \"pytorch-with-mlflow\"\n",
"mlflow.set_experiment(experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train model locally while logging metrics and artifacts\n",
"\n",
"The ```scripts/train.py``` program contains the code to load the image dataset, and to train and test the model. Within this program, the ```train.driver``` function wraps the end-to-end workflow.\n",
"\n",
"Within the driver, the ```mlflow.start_run``` call starts MLflow tracking. Then, ```mlflow.log_metric``` functions are used to track the convergence of the neural network training iterations. Finally, ```mlflow.pytorch.log_model``` is used to save the trained model in a framework-aware manner.\n",
"\n",
"Let's add the program to the search path, import it as a module, and then invoke the driver function. Note that the training can take a few minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lib_path = os.path.abspath(\"scripts\")\n",
"sys.path.append(lib_path)\n",
"\n",
"import train"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = train.driver()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can view the metrics of the run in the Azure Portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(azureml.mlflow.get_portal_url(run))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train model on GPU compute on Azure\n",
"\n",
"Next, let's run the same script on GPU-enabled compute for faster training. If you've completed the [Configuration](../../../configuration.ipynb) notebook, you should have a GPU cluster named \"gpu-cluster\" available in your workspace. Otherwise, follow the instructions in the notebook to create one. For simplicity, this example uses a single process on a single VM to train the model.\n",
"\n",
"Create a PyTorch estimator to specify the training configuration: script, compute as well as additional packages needed. To enable MLflow tracking, include ```azureml-mlflow``` as pip package. The low-level specifications for the training run are encapsulated in the estimator instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.dnn import PyTorch\n",
"\n",
"pt = PyTorch(source_directory=\"./scripts\", \n",
" entry_script = \"train.py\", \n",
" compute_target = \"gpu-cluster\", \n",
" node_count = 1, \n",
" process_count_per_node = 1, \n",
" use_gpu=True,\n",
" pip_packages = [\"azureml-mlflow\", \"Pillow==6.0.0\"])\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get a reference to the experiment you created previously, but this time as an Azure Machine Learning ```Experiment``` object.\n",
"\n",
"Then, use the ```Experiment.submit``` method to start the remote training run. Note that the first training run often takes longer as Azure Machine Learning service builds the Docker image for executing the script. Subsequent runs will be faster as the cached image is used."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"exp = Experiment(ws, experiment_name)\n",
"run = exp.submit(pt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can monitor the run and its metrics in the Azure Portal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also wait for the run to complete."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deploy model as web service\n",
"\n",
"To deploy a web service, first create a Docker image, and then deploy that Docker image on inferencing compute.\n",
"\n",
"The ```mlflow.azureml.build_image``` function builds a Docker image from the saved PyTorch model in a framework-aware manner. It automatically creates the PyTorch-specific inferencing wrapper code and specifies the package dependencies for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_file_names()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then build a Docker image using *runs:/&lt;run.id&gt;/model* as the model_uri path.\n",
"\n",
"Note that the image building can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_path = \"model\"\n",
"\n",
"\n",
"azure_image, azure_model = mlflow.azureml.build_image(model_uri='runs:/{}/{}'.format(run.id, model_path),\n",
" workspace=ws,\n",
" model_name='pytorch_mnist',\n",
" image_name='pytorch-mnist-img',\n",
" synchronous=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, deploy the Docker image to Azure Container Instance: a serverless compute capable of running a single container. You can tag and add descriptions to help keep track of your web service. \n",
"\n",
"[Other inferencing compute choices](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where) include Azure Kubernetes Service which provides scalable endpoint suitable for production use.\n",
"\n",
"Note that the service deployment can take several minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice, Webservice\n",
"\n",
"aci_config = AciWebservice.deploy_configuration(cpu_cores=2, \n",
" memory_gb=5, \n",
" tags={\"data\": \"MNIST\", \"method\" : \"pytorch\"}, \n",
" description=\"Predict using webservice\")\n",
"\n",
"\n",
"# Deploy the image to Azure Container Instances (ACI) for real-time serving\n",
"webservice = Webservice.deploy_from_image(\n",
" image=azure_image, workspace=ws, name=\"pytorch-mnist-1\", deployment_config=aci_config)\n",
"\n",
"\n",
"webservice.wait_for_deployment()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the deployment has completed, you can check the scoring URI of the web service."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Scoring URI is: {}\".format(webservice.scoring_uri))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case of a service creation issue, you can use ```webservice.get_logs()``` to get logs to debug."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make predictions using web service\n",
"\n",
"To test the web service, create a test data set of normalized PyTorch tensors. \n",
"\n",
"Then, let's define a utility function that takes a random image and converts it into the format and shape suitable as input to the PyTorch inferencing endpoint. The conversion is done by: \n",
"\n",
" 1. Select a random (image, label) tuple\n",
" 2. Take the image and convert the tensor to a NumPy array \n",
" 3. Reshape the array into a 1 x 1 x N array\n",
" * 1 image in batch, 1 color channel, N = 784 pixels for MNIST images\n",
" * Note also ```x = x.view(-1, 1, 28, 28)``` in the net definition in the ```train.py``` program to shape incoming scoring requests.\n",
" 4. Convert the NumPy array to a list to make it a built-in type.\n",
" 5. Create a dictionary {\"data\": &lt;list&gt;} that can be converted to a JSON string for web service requests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from torchvision import datasets, transforms\n",
"import random\n",
"import numpy as np\n",
"\n",
"test_data = datasets.MNIST('../data', train=False, transform=transforms.Compose([\n",
" transforms.ToTensor(),\n",
" transforms.Normalize((0.1307,), (0.3081,))]))\n",
"\n",
"\n",
"def get_random_image():\n",
"    # randint is inclusive on both ends, so cap the index at len - 1\n",
"    image_idx = random.randint(0, len(test_data) - 1)\n",
"    image_as_tensor = test_data[image_idx][0]\n",
"    return {\"data\": image_as_tensor.numpy().reshape(1, -1).tolist()}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, invoke the web service using a random test image. Convert the dictionary containing the image to a JSON string before passing it to the web service.\n",
"\n",
"The response contains the raw scores for each label, with a greater value indicating a higher probability. Sort the labels and select the one with the greatest score to get the prediction. Let's also plot the image sent to the web service for comparison purposes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import json\n",
"import matplotlib.pyplot as plt\n",
"\n",
"test_image = get_random_image()\n",
"\n",
"response = webservice.run(json.dumps(test_image))\n",
"\n",
"response = sorted(response[0].items(), key = lambda x: x[1], reverse = True)\n",
"\n",
"\n",
"print(\"Predicted label:\", response[0][0])\n",
"plt.imshow(np.array(test_image[\"data\"]).reshape(28,28), cmap = \"gray\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also call the web service endpoint using a raw HTTP POST request."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"response = requests.post(url=webservice.scoring_uri, data=json.dumps(test_image), headers={\"Content-Type\": \"application/json\"})\n",
"print(response.text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"celltoolbar": "Edit Metadata",
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"name": "mlflow-sparksummit-pytorch",
"notebookId": 2495374963457641
},
"nbformat": 4,
"nbformat_minor": 1
}
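The `requests.post` call in the notebook above can also be issued with only the standard library. A minimal sketch, assuming a placeholder scoring URI (substitute `webservice.scoring_uri` from the notebook) and an all-zeros 784-pixel dummy image:

```python
import json
import urllib.request

# Placeholder URI; replace with webservice.scoring_uri from the notebook
scoring_uri = "http://example.invalid/score"

# Dummy all-zeros MNIST-shaped payload, matching get_random_image's format
body = json.dumps({"data": [[0.0] * 784]}).encode("utf-8")

req = urllib.request.Request(
    scoring_uri,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# response = urllib.request.urlopen(req)  # uncomment against a live service
print(req.get_method())  # POST
```

The request object carries the same method, body, and content type as the `requests` version; only the transport layer differs.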


@@ -0,0 +1,8 @@
name: train-and-deploy-pytorch
dependencies:
- matplotlib
- pip:
- azureml-sdk
- azureml-mlflow
- https://download.pytorch.org/whl/cpu/torch-1.1.0-cp35-cp35m-win_amd64.whl
- https://download.pytorch.org/whl/cpu/torchvision-0.3.0-cp35-cp35m-win_amd64.whl


@@ -0,0 +1,248 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/using-mlflow/train-local/train-local.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use MLflow with Azure Machine Learning for Local Training Run\n",
"\n",
"This example shows you how to use MLflow tracking APIs together with Azure Machine Learning services for storing your metrics and artifacts from a local notebook run. You'll learn how to:\n",
"\n",
" 1. Set up the MLflow tracking URI to use Azure ML\n",
" 2. Create experiment\n",
" 3. Train a model on your local computer while logging metrics and artifacts\n",
" 4. View your experiment within your Azure ML Workspace in Azure Portal.\n",
"\n",
"## Prerequisites and Set-up\n",
"\n",
"Make sure you have completed the [Configuration](../../../configuration.ipynb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n",
"\n",
"Install the azureml-mlflow package before running this notebook. Note that mlflow itself gets installed as a dependency if you haven't installed it yet.\n",
"\n",
"```\n",
"pip install azureml-mlflow\n",
"```\n",
"\n",
"This example also uses scikit-learn and matplotlib packages. Install them:\n",
"```\n",
"pip install scikit-learn matplotlib\n",
"```\n",
"\n",
"Then, import necessary packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mlflow\n",
"import mlflow.sklearn\n",
"import azureml.core\n",
"from azureml.core import Workspace\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Check core SDK version number\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set tracking URI\n",
"\n",
"Set the MLflow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLflow APIs will go to Azure ML services and will be tracked under your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"\n",
"In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = \"experiment-with-mlflow\"\n",
"mlflow.set_experiment(experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create training and test data set\n",
"\n",
"This example uses the diabetes dataset to build a simple regression model. Let's load the dataset and split it into training and test sets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_diabetes\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = load_diabetes(return_X_y = True)\n",
"columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
"data = {\n",
" \"train\":{\"X\": X_train, \"y\": y_train}, \n",
" \"test\":{\"X\": X_test, \"y\": y_test}\n",
"}\n",
"\n",
"print (\"Data contains\", len(data['train']['X']), \"training samples and\",len(data['test']['X']), \"test samples\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train while logging metrics and artifacts\n",
"\n",
"Next, start an MLflow run to train a scikit-learn regression model. Note that the training script has been instrumented using MLflow to:\n",
" * Log model hyperparameter alpha value\n",
" * Log mean squared error against test set\n",
" * Save the scikit-learn based regression model produced by training\n",
" * Save an image that shows actuals vs predictions against test set.\n",
" \n",
"These metrics and artifacts have been recorded to your Azure ML Workspace; in the next step you'll learn how to view them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a run object in the experiment\n",
"model_save_path = \"model\"\n",
"\n",
"with mlflow.start_run() as run:\n",
"    # Log the algorithm hyperparameter alpha to the run\n",
"    mlflow.log_param('alpha', 0.03)\n",
"    # Create, fit, and test the scikit-learn Ridge regression model\n",
"    regression_model = Ridge(alpha=0.03)\n",
"    regression_model.fit(data['train']['X'], data['train']['y'])\n",
"    preds = regression_model.predict(data['test']['X'])\n",
"\n",
"    # Log mean squared error\n",
"    print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))\n",
"    mlflow.log_metric('mse', mean_squared_error(data['test']['y'], preds))\n",
"\n",
"    # Log the trained model as a run artifact\n",
"    mlflow.sklearn.log_model(regression_model, model_save_path)\n",
"\n",
"    # Plot actuals vs predictions and save the plot within the run\n",
"    fig = plt.figure(1)\n",
"    idx = np.argsort(data['test']['y'])\n",
"    plt.plot(data['test']['y'][idx], preds[idx])\n",
"    fig.savefig(\"actuals_vs_predictions.png\")\n",
"    mlflow.log_artifact(\"actuals_vs_predictions.png\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can open the report page for your experiment and runs within it from Azure Portal.\n",
"\n",
"Select one of the runs to view the metrics, and the plot you saved. The saved scikit-learn model appears under **outputs** tab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws.experiments[experiment_name]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Next steps\n",
"\n",
"Try out these notebooks to learn more about MLflow-Azure Machine Learning integration:\n",
"\n",
" * [Train a model using remote compute on Azure Cloud](../train-on-remote/train-on-remote.ipynb)\n",
" * [Deploy the model as a web service](../deploy-model/deploy-model.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "rastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
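The `mse` value logged to the run above is simply the mean of the squared residuals. A stdlib re-implementation for illustration only (the notebook itself uses `sklearn.metrics.mean_squared_error`):

```python
def mean_squared_error(y_true, y_pred):
    """Mean of squared residuals, as logged to the 'mse' metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Two predictions, each off by 1.0, give an MSE of 1.0
print(mean_squared_error([3.0, 5.0], [2.0, 6.0]))  # 1.0
```

A lower MSE indicates predictions closer to the actuals, which is why the actuals-vs-predictions plot saved as a run artifact is a useful companion to this single number.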


@@ -0,0 +1,7 @@
name: train-local
dependencies:
- scikit-learn
- matplotlib
- pip:
- azureml-sdk
- azureml-mlflow


@@ -0,0 +1,318 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use MLflow with Azure Machine Learning for Remote Training Run\n",
"\n",
"This example shows you how to use MLflow tracking APIs together with Azure Machine Learning services for storing your metrics and artifacts from a remote training run. You'll learn how to:\n",
"\n",
" 1. Set up the MLflow tracking URI to use Azure ML\n",
" 2. Create experiment\n",
" 3. Train a model on Machine Learning Compute while logging metrics and artifacts\n",
" 4. View your experiment within your Azure ML Workspace in Azure Portal."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"Make sure you have completed the [Configuration](../../../configuration.ipynb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set-up\n",
"\n",
"Check Azure ML SDK version installed on your computer, and then connect to your Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"from azureml.core import Workspace, Experiment\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)\n",
"\n",
"ws = Workspace.from_config()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's also create a Machine Learning Compute cluster for submitting the remote run. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print(\"Found existing cpu-cluster\")\n",
"except ComputeTargetException:\n",
" print(\"Creating new cpu-cluster\")\n",
" \n",
" # Specify the configuration for the new cluster\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
" min_nodes=0,\n",
" max_nodes=1)\n",
"\n",
" # Create the cluster with the specified name and configuration\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
" \n",
" # Wait for the cluster to complete, show the output log\n",
" cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Azure ML Experiment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following steps show how to submit a training Python script to a cluster as an Azure ML run, while logging happens through MLflow APIs to your Azure ML Workspace. Let's first create an experiment to hold the training runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"experiment_name = \"experiment-with-mlflow\"\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Instrument remote training script using MLflow"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Let's use [*train_diabetes.py*](train_diabetes.py) to train a regression model against the diabetes dataset. Note that the training script uses mlflow.start_run() to start logging, and then logs metrics, saves the trained scikit-learn model, and saves a plot as an artifact.\n",
    "\n",
    "Run the following command to view the script file. Notice the mlflow logging statements, and also notice that the script has no explicit dependency on the azureml library."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"training_script = 'train_diabetes.py'\n",
"with open(training_script, 'r') as f:\n",
" print(f.read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit Run to Cluster \n",
"\n",
"Let's submit the run to cluster. When running on the remote cluster as submitted run, Azure ML sets the MLflow tracking URI to point to your Azure ML Workspace, so that the metrics and artifacts are automatically logged there.\n",
"\n",
"Note that you have to specify the packages your script depends on, including *azureml-mlflow* that implicitly enables the MLflow logging to Azure ML. \n",
"\n",
"First, create a environment with Docker enable and required package dependencies specified."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"mlflow"
]
},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"env = Environment(name=\"mlflow-env\")\n",
"\n",
"env.docker.enabled = True\n",
"\n",
"# Specify conda dependencies with scikit-learn and temporary pointers to mlflow extensions\n",
"cd = CondaDependencies.create(\n",
" conda_packages=[\"scikit-learn\", \"matplotlib\"],\n",
" pip_packages=[\"azureml-mlflow\", \"numpy\"]\n",
" )\n",
"\n",
"env.python.conda_dependencies = cd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, specify a script run configuration that includes the training script, environment and CPU cluster created earlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=\".\", script=training_script)\n",
"src.run_config.environment = env\n",
"src.run_config.target = cpu_cluster.name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, submit the run. Note that the first instance of the run typically takes longer as the Docker-based environment is created, several minutes. Subsequent runs reuse the image and are faster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = exp.submit(src)\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can navigate to your Azure ML Workspace at Azure Portal to view the run metrics and artifacts. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also get the metrics and bring them to your local notebook, and view the details of the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_metrics()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws.get_details()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Next steps\n",
"\n",
" * [Deploy the model as a web service](../deploy-model/deploy-model.ipynb)\n",
" * [Learn more about Azure Machine Learning compute options](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"authors": [
{
"name": "rastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}


@@ -0,0 +1,4 @@
name: train-remote
dependencies:
- pip:
- azureml-sdk


@@ -0,0 +1,46 @@
# Copyright (c) Microsoft. All rights reserved.
# Licensed under the MIT license.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

import mlflow
import mlflow.sklearn

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

with mlflow.start_run():
    X, y = load_diabetes(return_X_y=True)
    columns = ['age', 'gender', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    data = {
        "train": {"X": X_train, "y": y_train},
        "test": {"X": X_test, "y": y_test}}

    mlflow.log_metric("Training samples", len(data['train']['X']))
    mlflow.log_metric("Test samples", len(data['test']['X']))

    # Log the algorithm parameter alpha to the run
    mlflow.log_metric('alpha', 0.03)

    # Create, fit, and test the scikit-learn Ridge regression model
    regression_model = Ridge(alpha=0.03)
    regression_model.fit(data['train']['X'], data['train']['y'])
    preds = regression_model.predict(data['test']['X'])

    # Log the mean squared error
    print('Mean Squared Error is', mean_squared_error(data['test']['y'], preds))
    mlflow.log_metric('mse', mean_squared_error(data['test']['y'], preds))

    # Save the model to the outputs directory for capture
    mlflow.sklearn.log_model(regression_model, "model")

    # Plot actuals vs predictions and save the plot within the run
    fig = plt.figure(1)
    idx = np.argsort(data['test']['y'])
    plt.plot(data['test']['y'][idx], preds[idx])
    fig.savefig("actuals_vs_predictions.png")
    mlflow.log_artifact("actuals_vs_predictions.png")
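The `mse` value the script logs is the standard mean squared error. As a sanity check, here is a minimal pure-Python computation on made-up values (no scikit-learn or MLflow required):

```python
# Mean squared error computed by hand on hypothetical values (illustration only)
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
print(mse)  # 0.375
```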


@@ -286,7 +286,11 @@
 {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {},
+ "metadata": {
+  "tags": [
+   "estimator-remarks-sample"
+  ]
+ },
  "outputs": [],
  "source": [
   "from azureml.train.estimator import Estimator\n",


@@ -252,7 +252,11 @@
 {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {},
+ "metadata": {
+  "tags": [
+   "dnn-chainer-remarks-sample"
+  ]
+ },
  "outputs": [],
  "source": [
   "from azureml.train.dnn import Chainer\n",


@@ -250,7 +250,11 @@
 {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {},
+ "metadata": {
+  "tags": [
+   "dnn-pytorch-remarks-sample"
+  ]
+ },
  "outputs": [],
  "source": [
   "from azureml.train.dnn import PyTorch\n",


@@ -412,7 +412,11 @@
 {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {},
+ "metadata": {
+  "tags": [
+   "dnn-tensorflow-remarks-sample"
+  ]
+ },
  "outputs": [],
  "source": [
   "from azureml.train.dnn import TensorFlow\n",


@@ -7,8 +7,6 @@ Follow these sample notebooks to learn:
 3. [Train on remote VM](train-on-remote-vm): train a model using a remote Azure VM as compute target.
 4. [Train on ML Compute](train-on-amlcompute): train a model using an ML Compute cluster as compute target.
 5. [Train in an HDI Spark cluster](train-in-spark): train a Spark ML model using an HDInsight Spark cluster as compute target.
-6. [Logging API](logging-api): experiment with various logging functions to create runs and automatically generate graphs.
-7. [Manage runs](manage-runs): learn different ways how to start runs and child runs, monitor them, and cancel them.
-8. [Train and hyperparameter tune on Iris Dataset with Scikit-learn](train-hyperparameter-tune-deploy-with-sklearn): train a model using the Scikit-learn estimator and tune hyperparameters with Hyperdrive.
+6. [Train and hyperparameter tune on Iris Dataset with Scikit-learn](train-hyperparameter-tune-deploy-with-sklearn): train a model using the Scikit-learn estimator and tune hyperparameters with Hyperdrive.

 ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/README.png)


@@ -298,7 +298,11 @@
 {
  "cell_type": "code",
  "execution_count": null,
- "metadata": {},
+ "metadata": {
+  "tags": [
+   "sklearn-remarks-sample"
+  ]
+ },
  "outputs": [],
  "source": [
   "from azureml.train.sklearn import SKLearn\n",


@@ -332,7 +332,11 @@
"\n", "\n",
"* [Train on ML Compute](../../train-on-amlcompute)\n", "* [Train on ML Compute](../../train-on-amlcompute)\n",
"\n", "\n",
"* [Train on remote VM](../../train-on-remote-vm)" "* [Train on remote VM](../../train-on-remote-vm)\n",
"\n",
"Learn more about registering and deploying a model:\n",
"\n",
"* [Model Register and Deploy](../../deploy-to-cloud/model-register-and-deploy.ipynb)"
] ]
}, },
{ {


@@ -0,0 +1,9 @@
# Work With Data Using Azure Machine Learning Service
Azure Machine Learning Datasets (preview) make it easier to access and work with your data. Datasets manage data in various scenarios such as model training and pipeline creation. Using the Azure Machine Learning SDK, you can access underlying storage, explore and prepare data, manage the life cycle of different Dataset definitions, and compare between Datasets used in training and in production.
- For an example of using Datasets, see the [sample](datasets).
- For advanced data preparation examples, see [dataprep](dataprep).
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/README.png)


@@ -0,0 +1,300 @@
# Azure Machine Learning Data Prep SDK
The Azure Machine Learning Data Prep SDK helps data scientists explore, cleanse and transform data for machine learning workflows in any Python environment.
Key benefits to the SDK:
- Cross-platform functionality. Write with a single SDK and run it on Windows, macOS, or Linux.
- Intelligent transformations powered by AI, including grouping similar values to their canonical form and deriving columns by examples without custom code.
- Capability to work with large, multiple files of different schema.
- Scalability on a single machine by streaming data during processing rather than loading into memory.
- Seamless integration with other Azure Machine Learning services. You can simply pass your prepared data file into the `AutoMLConfig` object for automated machine learning training.
In this repo you will find:
- [Getting Started Tutorial](tutorials/getting-started/getting-started.ipynb) for a quick introduction to the main features of the Data Prep SDK.
- [Case Study Notebooks](case-studies/new-york-taxi) that present an end-to-end data preparation tutorial: start with a small dataset, profile the data with summary statistics, cleanse it, and perform feature engineering. All transformation steps are saved in a dataflow object, so you can easily reapply the same steps to the full dataset and run them on Spark.
- [How-To Guide Notebooks](how-to-guides) for more in-depth sample code at feature level.
## Installation
Here are the [SDK installation steps](https://aka.ms/aml-data-prep-installation).
## Documentation
Here is more information on how to use the new Data Prep SDK:
- [SDK overview and API reference docs](http://aka.ms/data-prep-sdk) that show different classes, methods, and function parameters for the SDK.
- [Tutorial: Prep NYC taxi data](https://docs.microsoft.com/azure/machine-learning/service/tutorial-data-prep) for regression modeling and then run automated machine learning to build the model.
- [How to load data](https://docs.microsoft.com/azure/machine-learning/service/how-to-load-data) is an overview guide on how to load data using the Data Prep SDK.
- [How to transform data](https://docs.microsoft.com/azure/machine-learning/service/how-to-transform-data) is an overview guide on how to transform data.
- [How to write data](https://docs.microsoft.com/azure/machine-learning/service/how-to-write-data) is an overview guide on how to write data to different storage locations.
## Support
If you have any questions or feedback, send us an email at: [askamldataprep@microsoft.com](mailto:askamldataprep@microsoft.com).
## Release Notes
### 2019-07-25 (version 1.1.9)
New features
- Added support for reading a file directly from a http or https url.
Bug fixes and improvements
- Improved error message when attempting to read a Parquet Dataset from a remote source (which is not currently supported).
- Fixed a bug when writing to Parquet file format in ADLS Gen 2, and updating the ADLS Gen 2 container name in the path.
### 2019-07-09 (version 1.1.8)
New features
- Dataflow objects can now be iterated over, producing a sequence of records. See documentation for `Dataflow.to_record_iterator`.
Bug fixes and improvements
- Increased the robustness of DataPrep SDK.
- Improved handling of pandas DataFrames with non-string Column Indexes.
- Improved the performance of `to_pandas_dataframe` in Datasets.
- Fixed a bug where Spark execution of Datasets failed when run in a multi-node environment.
### 2019-07-01 (version 1.1.7)
We reverted a change that improved performance, as it was causing issues for some customers using Azure Databricks. If you experienced an issue on Azure Databricks, you can upgrade to version 1.1.7 using one of the methods below:
1. Run this script to upgrade: `%sh /home/ubuntu/databricks/python/bin/pip install azureml-dataprep==1.1.7`
2. Recreate the cluster, which will install the latest Data Prep SDK version.
### 2019-06-24 (version 1.1.6)
New features
- Added summary functions for top values (`SummaryFunction.TOPVALUES`) and bottom values (`SummaryFunction.BOTTOMVALUES`).
Bug fixes and improvements
- Significantly improved the performance of `read_pandas_dataframe`.
- Fixed a bug that would cause `get_profile()` on a Dataflow pointing to binary files to fail.
- Exposed `set_diagnostics_collection()` to allow for programmatic enabling/disabling of the telemetry collection.
- Changed the behavior of `get_profile()`. NaN values are now ignored for Min, Mean, Std, and Sum, which aligns with the behavior of Pandas.
### 2019-06-10 (version 1.1.5)
Bug fixes and improvements
- For interpreted datetime values that have a 2-digit year format, the range of valid years has been updated to match Windows May Release. The range has been changed from 1930-2029 to 1950-2049.
- When reading in a file and setting `handleQuotedLineBreaks=True`, `\r` will be treated as a new line.
- Fixed a bug that caused `read_pandas_dataframe` to fail in some cases.
- Improved performance of `get_profile`.
- Improved error messages.
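The 2-digit year change above amounts to a pivot rule: a 2-digit year is placed in the 100-year window starting at the pivot. A minimal sketch (`expand_two_digit_year` is a hypothetical helper, not part of the SDK):

```python
def expand_two_digit_year(yy, earliest=1950, latest=2049):
    """Map a 2-digit year into the inclusive 100-year window [earliest, latest]."""
    assert latest - earliest == 99, "window must span exactly 100 years"
    year = earliest - earliest % 100 + yy  # tentatively place in earliest's century
    if year < earliest:
        year += 100                        # roll forward into the next century
    return year

print(expand_two_digit_year(49))  # 2049 (was 1949 under the old 1930-2029 window)
print(expand_two_digit_year(50))  # 1950
```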
### 2019-05-28 (version 1.1.4)
New features
- You can now use the following expression language functions to extract and parse datetime values into new columns.
- `RegEx.extract_record()` extracts datetime elements into a new column.
- `create_datetime()` creates datetime objects from separate datetime elements.
- When calling `get_profile()`, you can now see that quantile columns are labeled as (est.) to clearly indicate that the values are approximations.
- You can now use ** globbing when reading from Azure Blob Storage.
- e.g. `dprep.read_csv(path='https://yourblob.blob.core.windows.net/yourcontainer/**/data/*.csv')`
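The `**` pattern matches across any number of directory levels. The same semantics can be demonstrated locally with Python's built-in `glob` module (a stand-in for blob storage, not the SDK itself):

```python
import glob
import os
import tempfile

# Build a tiny directory tree to show how '**' matches across directory levels,
# mirroring the blob pattern '.../**/data/*.csv'
root = tempfile.mkdtemp()
for rel in ("2015/data/a.csv", "2016/data/b.csv", "2016/logs/c.csv"):
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

# Only the files under a 'data' directory match, at any depth
matches = sorted(glob.glob(os.path.join(root, "**", "data", "*.csv"), recursive=True))
print([os.path.relpath(m, root) for m in matches])
```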
Bug fixes
- Fixed a bug related to reading a Parquet file from a remote source (Azure Blob).
### 2019-05-08 (version 1.1.3)
New features
- Added support to read from a PostgreSQL database, either by calling `read_postgresql` or using a Datastore.
- See examples in how-to guides:
- [Data Ingestion notebook](https://aka.ms/aml-data-prep-ingestion-nb)
- [Datastore notebook](https://aka.ms/aml-data-prep-datastore-nb)
Bug fixes and improvements
- Fixed issues with column type conversion:
- Now correctly converts a boolean or numeric column to a boolean column.
- Now does not fail when attempting to set a date column to be date type.
- Improved JoinType types and accompanying reference documentation. When joining two dataflows, you can now specify one of these types of join:
- NONE, MATCH, INNER, UNMATCHLEFT, LEFTANTI, LEFTOUTER, UNMATCHRIGHT, RIGHTANTI, RIGHTOUTER, FULLANTI, FULL.
- Improved data type inference to recognize more date formats.
### 2019-04-17 (version 1.1.2)
Note: Data Prep Python SDK will no longer install `numpy` and `pandas` packages. See [updated installation instructions](https://aka.ms/aml-data-prep-installation).
New features
- You can now use the Pivot transform.
- How-to guide: [Pivot notebook](https://aka.ms/aml-data-prep-pivot-nb)
- You can now use regular expressions in native functions.
- Examples:
- `dflow.filter(dprep.RegEx('pattern').is_match(dflow['column_name']))`
- `dflow.assert_value('column_name', dprep.RegEx('pattern').is_match(dprep.value))`
- You can now use `to_upper` and `to_lower` functions in expression language.
- You can now see the number of unique values of each column in a data profile.
- For some of the commonly used reader steps, you can now pass in the `infer_column_types` argument. If it is set to `True`, Data Prep will attempt to detect and automatically convert column types.
- `inference_arguments` is now deprecated.
- You can now call `Dataflow.shape`.
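The regex-filter semantics shown above can be mimicked with the standard library; this uses plain `re` over a list of records rather than the SDK's lazy Dataflow evaluation:

```python
import re

# Stand-in rows; the SDK applies the same idea over a Dataflow column
rows = [{"city": "New York"}, {"city": "Newark"}, {"city": "Boston"}]
pattern = re.compile("New.*")

# analogue of dflow.filter(dprep.RegEx('New.*').is_match(dflow['city']))
kept = [r for r in rows if pattern.match(r["city"])]
print([r["city"] for r in kept])  # ['New York', 'Newark']
```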
Bug fixes and improvements
- `keep_columns` now accepts an additional optional argument `validate_column_exists`, which checks if the result of `keep_columns` will contain any columns.
- All reader steps (which read from a file) now accept an additional optional argument `verify_exists`.
- Improved performance of reading from pandas dataframe and getting data profiles.
- Fixed a bug where slicing a single step from a Dataflow failed with a single index.
### 2019-04-08 (version 1.1.1)
New features
- You can read multiple Datastore/DataPath/DataReference sources using read_* transforms.
- You can perform the following operations on columns to create a new column: division, floor, modulo, power, length.
- Data Prep is now part of the Azure ML diagnostics suite and will log diagnostic information by default.
- To turn this off, set this environment variable to true: DISABLE_DPREP_LOGGER
Bug fixes and improvements
- Improved code documentation for commonly used classes and functions.
- Fixed a bug in auto_read_file that failed to read Excel files.
- Added option to overwrite the folder in read_pandas_dataframe.
- Improved performance of dotnetcore2 dependency installation, and added support for Fedora 27/28 and Ubuntu 1804.
- Improved the performance of reading from Azure Blobs.
- Column type detection now supports columns of type Long.
- Fixed a bug where some date values were being displayed as timestamps instead of Python datetime objects.
- Fixed a bug where some type counts were being displayed as doubles instead of integers.
### 2019-03-25 (version 1.1.0)
Breaking changes
- The concept of the Data Prep Package has been deprecated and is no longer supported. Instead of persisting multiple Dataflows in one Package, you can persist Dataflows individually.
- How-to guide: [Opening and Saving Dataflows notebook](https://aka.ms/aml-data-prep-open-save-dataflows-nb)
New features
- Data Prep can now recognize columns that match a particular Semantic Type, and split accordingly. The STypes currently supported include: email address, geographic coordinates (latitude & longitude), IPv4 and IPv6 addresses, US phone number, and US zip code.
- How-to guide: [Semantic Types notebook](https://aka.ms/aml-data-prep-semantic-types-nb)
- Data Prep now supports the following operations to generate a resultant column from two numeric columns: subtract, multiply, divide, and modulo.
- You can call `verify_has_data()` on a Dataflow to check whether the Dataflow would produce records if executed.
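The two-column arithmetic above produces a new column element-wise. A plain-Python sketch of the idea (made-up values, not the SDK implementation):

```python
# Element-wise derived columns from two numeric columns
a = [10, 20, 30]
b = [3, 4, 8]

derived = {
    "subtract": [x - y for x, y in zip(a, b)],
    "multiply": [x * y for x, y in zip(a, b)],
    "divide":   [x / y for x, y in zip(a, b)],
    "modulo":   [x % y for x, y in zip(a, b)],
}
print(derived["modulo"])  # [1, 0, 6]
```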
Bug fixes and improvements
- You can now specify the number of bins to use in a histogram for numeric column profiles.
- The `read_pandas_dataframe` transform now requires the DataFrame to have string- or byte- typed column names.
- Fixed a bug in the `fill_nulls` transform, where values were not correctly filled in if the column was missing.
### 2019-03-11 (version 1.0.17)
New features
- Now supports adding two numeric columns to generate a resultant column using the expression language.
Bug fixes and improvements
- Improved the documentation and parameter checking for random_split.
### 2019-02-27 (version 1.0.16)
Bug fix
- Fixed a Service Principal authentication issue that was caused by an API change.
### 2019-02-25 (version 1.0.15)
New features
- Data Prep now supports writing file streams from a dataflow. Also provides the ability to manipulate the file stream names to create new file names.
- How-to guide: [Working With File Streams notebook](https://aka.ms/aml-data-prep-file-stream-nb)
Bug fixes and improvements
- Improved performance of t-Digest on large data sets.
- Data Prep now supports reading data from a DataPath.
- One hot encoding now works on boolean and numeric columns.
- Other miscellaneous bug fixes.
### 2019-02-11 (version 1.0.12)
New features
- Data Prep now supports reading from an Azure SQL database using Datastore.
Changes
- Significantly improved the memory performance of certain operations on large data.
- `read_pandas_dataframe()` now requires `temp_folder` to be specified.
- The `name` property on `ColumnProfile` has been deprecated - use `column_name` instead.
### 2019-01-28 (version 1.0.8)
Bug fixes
- Significantly improved the performance of getting data profiles.
- Fixed minor bugs related to error reporting.
### 2019-01-14 (version 1.0.7)
New features
- Datastore improvements (documented in [Datastore how-to-guide](https://aka.ms/aml-data-prep-datastore-nb))
- Added ability to read from and write to Azure File Share and ADLS Datastores in scale-up.
- When using Datastores, Data Prep now supports using service principal authentication instead of interactive authentication.
- Added support for wasb and wasbs urls.
### 2019-01-09 (version 1.0.6)
Bug fixes
- Fixed bug with reading from public readable Azure Blob containers on Spark.
### 2018-12-19 (version 1.0.4)
New features
- `to_bool` function now allows mismatched values to be converted to Error values. This is the new default mismatch behavior for `to_bool` and `set_column_types`, whereas the previous default behavior was to convert mismatched values to False.
- When calling `to_pandas_dataframe`, there is a new option to interpret null/missing values in numeric columns as NaN.
- Added ability to check the return type of some expressions to ensure type consistency and fail early.
- You can now call `parse_json` to parse values in a column as JSON objects and expand them into multiple columns.
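The `parse_json` expansion can be sketched with the stdlib `json` module; the rows, column names, and `payload.<key>` naming below are illustrative assumptions, not the SDK's exact output format:

```python
import json

# Rows whose 'payload' column holds JSON text (made-up data)
rows = [{"id": 1, "payload": '{"lat": 40.7, "lon": -74.0}'},
        {"id": 2, "payload": '{"lat": 41.8, "lon": -87.6}'}]

# Parse each JSON value and expand its keys into new columns
expanded = []
for row in rows:
    obj = json.loads(row["payload"])
    new_row = {"id": row["id"]}
    new_row.update({f"payload.{k}": v for k, v in obj.items()})
    expanded.append(new_row)

print(expanded[0])  # {'id': 1, 'payload.lat': 40.7, 'payload.lon': -74.0}
```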
Bug fixes
- Fixed a bug that crashed `set_column_types` in Python 3.5.2.
- Fixed a bug that crashed when connecting to Datastore using an AML image.
### 2018-12-07 (version 0.5.3)
Fixed missing dependency issue for .NET Core2 on Ubuntu 16.
### 2018-12-03 (version 0.5.2)
Breaking changes
- `SummaryFunction.N` was renamed to `SummaryFunction.Count`.
Bug fixes
- Use the latest AML Run Token when reading from and writing to datastores on remote runs. Previously, if the AML Run Token was updated in Python, the Data Prep runtime would not pick up the updated token.
- Clearer error messages
- to_spark_dataframe() no longer crashes when Spark uses Kryo serialization
- Value Count Inspector can now show more than 1000 unique values
- Random Split no longer fails if the original Dataflow doesn't have a name
### 2018-11-19 (version 0.5.0)
New features
- Created a new DataPrep CLI to execute DataPrep packages and view the data profile for a dataset or dataflow
- Redesigned SetColumnType API to improve usability
- Renamed smart_read_file to auto_read_file
- Now includes skew and kurtosis in the Data Profile
- Can sample with stratified sampling
- Can read from zip files that contain CSV files
- Can split datasets row-wise with Random Split (e.g. into test-train sets)
- Can get all the column data types from a dataflow or a data profile by calling .dtypes
- Can get the row count from a dataflow or a data profile by calling .row_count
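A row-wise random split like the one above can be sketched in plain Python (an illustration, not the SDK implementation):

```python
import random

# Row-wise random split into train/test sets
rng = random.Random(42)  # fixed seed for reproducibility
rows = list(range(100))

# Each row lands in the test set with ~20% probability
test_set = set(r for r in rows if rng.random() < 0.2)
train = [r for r in rows if r not in test_set]

# Every row ends up in exactly one of the two sets
print(len(train) + len(test_set))  # 100
```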
Bug fixes
- Fixed long to double conversion
- Fixed assert after any add column
- Fixed an issue with FuzzyGrouping, where it would not detect groups in some cases
- Fixed sort function to respect multi-column sort order
- Fixed and/or expressions to be similar to how Pandas handles them
- Fixed reading from dbfs path.
- Made error messages more understandable
- Now no longer fails when reading on remote compute target using AML token
- Now no longer fails on Linux DSVM
- Now no longer crashes when non-string values are in string predicates
- Now handles assertion errors when Dataflow should fail correctly
- Now supports dbutils mounted storage locations on Azure Databricks
### 2018-11-05 (version 0.4.0)
New features
- Type Count added to Data Profile
- Value Count and Histogram is now available
- More percentiles in Data Profile
- The Median is available in Summarize
- Python 3.7 is now supported
- When you save a dataflow that contains datastores to a Data Prep package, the datastore information will be persisted as part of the Data Prep package
- Writing to datastore is now supported
Bug fixes
- 64bit unsigned integer overflows are now handled properly on Linux
- Fixed incorrect text label for plain text files in smart_read
- String column type now shows up in metrics view
- Type count now is fixed to show ValueKinds mapped to single FieldType instead of individual ones
- Write_to_csv no longer fails when path is provided as a string
- When using Replace, leaving “find” blank will no longer fail
## Datasets License Information
IMPORTANT: Please read the notice and find out more about this NYC Taxi and Limousine Commission dataset here: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
IMPORTANT: Please read the notice and find out more about this Chicago Police Department dataset here: https://catalog.data.gov/dataset/crimes-2001-to-present-398a4
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/dataprep/README.png)


@@ -0,0 +1,513 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Cleaning up New York Taxi Cab data\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use DataPrep to clean and featurize the data which can then be used to predict taxi trip duration. We will not use the For Hire Vehicle (FHV) datasets as they are not really taxi rides and they don't provide drop-off time and geo-coordinates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from IPython.display import display\n",
"from os import path\n",
"from tempfile import mkdtemp\n",
"\n",
"import pandas as pd\n",
"import azureml.dataprep as dprep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "Let's take a quick peek at the yellow cab data and green cab data to see what the data looks like. DataPrep supports globbing, so you will notice below that we have added a `*` in the path.\n",
"\n",
"*We are using a small sample of the taxi data for this demo. You can find a bigger sample ~6GB by changing \"green-small\" to \"green-sample\" and \"yellow-small\" to \"yellow-sample\" in the paths below.*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pd.set_option('display.max_columns', None)\n",
"\n",
"cache_location = mkdtemp()\n",
"green_path = \"https://dprepdata.blob.core.windows.net/demo/green-small/*\"\n",
"yellow_path = \"https://dprepdata.blob.core.windows.net/demo/yellow-small/*\"\n",
"# (optional) Download and view a subset of the data: https://dprepdata.blob.core.windows.net/demo/green-small/green_tripdata_2013-08.csv\n",
"\n",
"print(\"Retrieving data from the following two sources:\")\n",
"print(green_path)\n",
"print(yellow_path)\n",
"\n",
"green_df = dprep.read_csv(path=green_path, header=dprep.PromoteHeadersMode.GROUPED)\n",
"yellow_df = dprep.auto_read_file(path=yellow_path)\n",
"\n",
"display(green_df.head(5))\n",
"display(yellow_df.head(5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Cleanup"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's define some shortcut transforms that will apply to all Dataflows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"all_columns = dprep.ColumnSelector(term=\".*\", use_regex=True)\n",
"drop_if_all_null = [all_columns, dprep.ColumnRelationship(dprep.ColumnRelationship.ALL)]\n",
"useful_columns = [\n",
" \"cost\", \"distance\"\"distance\", \"dropoff_datetime\", \"dropoff_latitude\", \"dropoff_longitude\",\n",
" \"passengers\", \"pickup_datetime\", \"pickup_latitude\", \"pickup_longitude\", \"store_forward\", \"vendor\"\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's first work with the green taxi data and get it into a good shape that then can be combined with the yellow taxi data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tmp_df = (green_df\n",
" .replace_na(columns=all_columns)\n",
" .drop_nulls(*drop_if_all_null)\n",
" .rename_columns(column_pairs={\n",
" \"VendorID\": \"vendor\",\n",
" \"lpep_pickup_datetime\": \"pickup_datetime\",\n",
" \"Lpep_dropoff_datetime\": \"dropoff_datetime\",\n",
" \"lpep_dropoff_datetime\": \"dropoff_datetime\",\n",
" \"Store_and_fwd_flag\": \"store_forward\",\n",
" \"store_and_fwd_flag\": \"store_forward\",\n",
" \"Pickup_longitude\": \"pickup_longitude\",\n",
" \"Pickup_latitude\": \"pickup_latitude\",\n",
" \"Dropoff_longitude\": \"dropoff_longitude\",\n",
" \"Dropoff_latitude\": \"dropoff_latitude\",\n",
" \"Passenger_count\": \"passengers\",\n",
" \"Fare_amount\": \"cost\",\n",
" \"Trip_distance\": \"distance\"\n",
" })\n",
" .keep_columns(columns=useful_columns))\n",
"tmp_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"green_df = tmp_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's do the same thing to the yellow taxi data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tmp_df = (yellow_df\n",
" .replace_na(columns=all_columns)\n",
" .drop_nulls(*drop_if_all_null)\n",
" .rename_columns(column_pairs={\n",
" \"vendor_name\": \"vendor\",\n",
" \"VendorID\": \"vendor\",\n",
" \"vendor_id\": \"vendor\",\n",
" \"Trip_Pickup_DateTime\": \"pickup_datetime\",\n",
" \"tpep_pickup_datetime\": \"pickup_datetime\",\n",
" \"Trip_Dropoff_DateTime\": \"dropoff_datetime\",\n",
" \"tpep_dropoff_datetime\": \"dropoff_datetime\",\n",
" \"store_and_forward\": \"store_forward\",\n",
" \"store_and_fwd_flag\": \"store_forward\",\n",
" \"Start_Lon\": \"pickup_longitude\",\n",
" \"Start_Lat\": \"pickup_latitude\",\n",
" \"End_Lon\": \"dropoff_longitude\",\n",
" \"End_Lat\": \"dropoff_latitude\",\n",
" \"Passenger_Count\": \"passengers\",\n",
" \"passenger_count\": \"passengers\",\n",
" \"Fare_Amt\": \"cost\",\n",
" \"fare_amount\": \"cost\",\n",
" \"Trip_Distance\": \"distance\",\n",
" \"trip_distance\": \"distance\"\n",
" })\n",
" .keep_columns(columns=useful_columns))\n",
"tmp_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"yellow_df = tmp_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now append the rows from `yellow_df` to `green_df`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"combined_df = green_df.append_rows(dataflows=[yellow_df])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's convert the pickup and drop-off coordinates to decimal type and then take a look at their data profile to see how the data is distributed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"decimal_type = dprep.TypeConverter(data_type=dprep.FieldType.DECIMAL)\n",
"combined_df = combined_df.set_column_types(type_conversions={\n",
" \"pickup_longitude\": decimal_type,\n",
" \"pickup_latitude\": decimal_type,\n",
" \"dropoff_longitude\": decimal_type,\n",
" \"dropoff_latitude\": decimal_type\n",
"})\n",
"combined_df.keep_columns(columns=[\n",
" \"pickup_longitude\", \"pickup_latitude\", \n",
" \"dropoff_longitude\", \"dropoff_latitude\"\n",
"]).get_profile()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the data profile, we can see that there are coordinates that are missing and coordinates that are not in New York. Let's filter out coordinates not in the [city border](https://mapmakerapp.com?map=5b60a055a191245990310739f658)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tmp_df = (combined_df\n",
" .drop_nulls(\n",
" columns=[\"pickup_longitude\", \"pickup_latitude\", \"dropoff_longitude\", \"dropoff_latitude\"],\n",
" column_relationship=dprep.ColumnRelationship(dprep.ColumnRelationship.ANY)\n",
" ) \n",
" .filter(dprep.f_and(\n",
" dprep.col(\"pickup_longitude\") <= -73.72,\n",
" dprep.col(\"pickup_longitude\") >= -74.09,\n",
" dprep.col(\"pickup_latitude\") <= 40.88,\n",
" dprep.col(\"pickup_latitude\") >= 40.53,\n",
" dprep.col(\"dropoff_longitude\") <= -73.72,\n",
" dprep.col(\"dropoff_longitude\") >= -74.09,\n",
" dprep.col(\"dropoff_latitude\") <= 40.88,\n",
" dprep.col(\"dropoff_latitude\") >= 40.53\n",
" )))\n",
"tmp_df.keep_columns(columns=[\n",
" \"pickup_longitude\", \"pickup_latitude\", \n",
" \"dropoff_longitude\", \"dropoff_latitude\"\n",
"]).get_profile()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"combined_df = tmp_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's take a look at the data profile for the `store_forward` column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"combined_df.keep_columns(columns='store_forward').get_profile()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the data profile of `store_forward` above, we can see that the data is inconsistent and there are missing values. Let's fix them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"combined_df = combined_df.replace(columns=\"store_forward\", find=\"0\", replace_with=\"N\").fill_nulls(\"store_forward\", \"N\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now split the pickup and drop-off datetimes into a date column and a time column using `split_column_by_example`. If the `example` parameter is omitted, the function automatically infers where to split based on the data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tmp_df = (combined_df\n",
" .split_column_by_example(source_column=\"pickup_datetime\")\n",
" .split_column_by_example(source_column=\"dropoff_datetime\"))\n",
"tmp_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"combined_df = tmp_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's rename the columns generated by `split_column_by_example` to meaningful names."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tmp_df = (combined_df\n",
" .rename_columns(column_pairs={\n",
" \"pickup_datetime_1\": \"pickup_date\",\n",
" \"pickup_datetime_2\": \"pickup_time\",\n",
" \"dropoff_datetime_1\": \"dropoff_date\",\n",
" \"dropoff_datetime_2\": \"dropoff_time\"\n",
" }))\n",
"tmp_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"combined_df = tmp_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature Engineering"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Datetime features"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's split the pickup and drop-off dates further into day of week, day of month, and month. We will also split the pickup and drop-off times into hour, minute, and second."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tmp_df = (combined_df\n",
" .derive_column_by_example(\n",
" source_columns=\"pickup_date\", \n",
" new_column_name=\"pickup_weekday\", \n",
" example_data=[(\"2009-01-04\", \"Sunday\"), (\"2013-08-22\", \"Thursday\")]\n",
" )\n",
" .derive_column_by_example(\n",
" source_columns=\"dropoff_date\",\n",
" new_column_name=\"dropoff_weekday\",\n",
" example_data=[(\"2013-08-22\", \"Thursday\"), (\"2013-11-03\", \"Sunday\")]\n",
" )\n",
" .split_column_by_example(source_column=\"pickup_date\")\n",
" .split_column_by_example(source_column=\"pickup_time\")\n",
" .split_column_by_example(source_column=\"dropoff_date\")\n",
" .split_column_by_example(source_column=\"dropoff_time\")\n",
" .split_column_by_example(source_column=\"pickup_time_1\")\n",
" .split_column_by_example(source_column=\"dropoff_time_1\")\n",
" .drop_columns(columns=[\n",
" \"pickup_date\", \"pickup_time\", \"dropoff_date\", \"dropoff_time\", \n",
" \"pickup_date_1\", \"dropoff_date_1\", \"pickup_time_1\", \"dropoff_time_1\"\n",
" ])\n",
" .rename_columns(column_pairs={\n",
" \"pickup_date_2\": \"pickup_month\",\n",
" \"pickup_date_3\": \"pickup_monthday\",\n",
" \"pickup_time_1_1\": \"pickup_hour\",\n",
" \"pickup_time_1_2\": \"pickup_minute\",\n",
" \"pickup_time_2\": \"pickup_second\",\n",
" \"dropoff_date_2\": \"dropoff_month\",\n",
" \"dropoff_date_3\": \"dropoff_monthday\",\n",
" \"dropoff_time_1_1\": \"dropoff_hour\",\n",
" \"dropoff_time_1_2\": \"dropoff_minute\",\n",
" \"dropoff_time_2\": \"dropoff_second\"\n",
" }))\n",
"tmp_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"combined_df = tmp_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From the data above, we can see that the pickup and drop-off date and time components produced by the transforms look good. Let's drop the `pickup_datetime` and `dropoff_datetime` columns, as they are no longer needed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tmp_df = combined_df.drop_columns(columns=[\"pickup_datetime\", \"dropoff_datetime\"])\n",
"tmp_df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"combined_df = tmp_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now save the transformation steps as a Dataflow file so we can run them on Spark."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow_path = path.join(mkdtemp(), \"new_york_taxi.dprep\")\n",
"combined_df.save(file_path=dflow_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/dataprep/case-studies/new-york-taxi/new-york-taxi.png)"
]
}
],
"metadata": {
"authors": [
{
"name": "sihhu"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
},
"nbformat": 4,
"nbformat_minor": 2
}
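The cleanup in the notebook above relies on `azureml.dataprep` (`rename_columns`, `keep_columns`, and a `f_and` bounding-box filter). As a rough, hedged illustration of the same rename-and-filter logic, here is a pandas sketch; the tiny frame, its values, and the column subset are made up for the example:

```python
import pandas as pd

# Synthetic stand-in for a few raw green-taxi columns (values invented).
raw = pd.DataFrame({
    "VendorID": [1, 2],
    "lpep_pickup_datetime": ["2015-01-01 00:05:00", "2015-01-01 00:10:00"],
    "Pickup_longitude": [-73.95, -75.00],  # second row falls outside the NYC box
    "Pickup_latitude": [40.75, 41.50],
    "Fare_amount": [12.5, 8.0],
})

# Mirror of the notebook's rename_columns() mapping for this subset.
renames = {
    "VendorID": "vendor",
    "lpep_pickup_datetime": "pickup_datetime",
    "Pickup_longitude": "pickup_longitude",
    "Pickup_latitude": "pickup_latitude",
    "Fare_amount": "cost",
}
df = raw.rename(columns=renames)

# NYC bounding-box filter, mirroring the dprep f_and() predicate.
in_nyc = (
    df["pickup_longitude"].between(-74.09, -73.72)
    & df["pickup_latitude"].between(40.53, 40.88)
)
df = df[in_nyc]
print(len(df))  # → 1: only the in-city row survives
```

`Series.between` is inclusive on both ends, matching the `<=`/`>=` comparisons in the dprep filter.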


@@ -0,0 +1,135 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scale-Out Data Preparation\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once we are done preparing and featurizing the data locally, we can run the same steps on the full dataset in scale-out mode. The New York taxi cab data is about 300 GB in total, which makes it a good fit for scale-out. Let's start by downloading the Dataflow we saved earlier to disk. Feel free to run the `new_york_taxi_cab.ipynb` notebook to generate the Dataflow yourself, in which case you can comment out the download code and set `dflow_path` to where your Dataflow is saved."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from tempfile import mkdtemp\n",
"from os import path\n",
"from urllib.request import urlretrieve\n",
"\n",
"dflow_root = mkdtemp()\n",
"dflow_path = path.join(dflow_root, \"new_york_taxi.dprep\")\n",
"print(\"Downloading Dataflow to: {}\".format(dflow_path))\n",
"urlretrieve(\"https://dprepdata.blob.core.windows.net/demo/new_york_taxi_v2.dprep\", dflow_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's load the Dataflow we just downloaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.dataprep as dprep\n",
"\n",
"df = dprep.Dataflow.open(dflow_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's replace the data sources so they point at the full dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from uuid import uuid4\n",
"\n",
"other_step = df._get_steps()[7].arguments['dataflows'][0]['anonymousSteps'][0]\n",
"other_step['id'] = str(uuid4())\n",
"other_step['arguments']['path']['target'] = 1\n",
"other_step['arguments']['path']['resourceDetails'][0]['path'] = 'https://wranglewestus.blob.core.windows.net/nyctaxi/yellow_tripdata*'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"green_dsource = dprep.BlobDataSource(\"https://wranglewestus.blob.core.windows.net/nyctaxi/green_tripdata*\")\n",
"df = df.replace_datasource(green_dsource)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have replaced the data source, we can run the same steps on the full dataset. We will print the first 5 rows of the Spark DataFrame. Since we are running on the full dataset, this might take a while depending on your Spark cluster's size."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"spark_df = df.to_spark_dataframe()\n",
"spark_df.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/dataprep/case-studies/new-york-taxi/new-york-taxi_scale-out.png)"
]
}
],
"metadata": {
"authors": [
{
"name": "sihhu"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
"skip_execute_as_test": true
},
"nbformat": 4,
"nbformat_minor": 2
}
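The scale-out notebook's key move is to point the saved steps at a wildcard path (`yellow_tripdata*`, `green_tripdata*`) and re-execute them over every matching file. As a rough local stand-in for that idea, here is a glob + pandas sketch; the shard names and columns are invented for the example:

```python
import glob
import os
import tempfile

import pandas as pd

# Create two tiny CSV shards to stand in for the monthly trip files.
root = tempfile.mkdtemp()
for i in range(2):
    pd.DataFrame({"cost": [10.0 + i], "distance": [2.0 + i]}).to_csv(
        os.path.join(root, f"green_tripdata_{i}.csv"), index=False
    )

# Re-run the "pipeline" over every file matching the wildcard,
# mirroring how the notebook swaps in the 'green_tripdata*' path.
parts = [
    pd.read_csv(p)
    for p in sorted(glob.glob(os.path.join(root, "green_tripdata*")))
]
full = pd.concat(parts, ignore_index=True)
print(full.shape)  # → (2, 2): one row per shard, two columns
```

In the actual notebook the re-execution happens on a Spark cluster via `to_spark_dataframe()` rather than in-process, which is what makes the 300 GB run feasible.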


@@ -0,0 +1,45 @@
-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQC/C0oc6vvF1UEc
y9JeGDXdtKynG11wTTIHIokFhNinHNSpJBLmNWFyFkqzvjJCPR4kWuqw4IXhCS3L
VoqRmT680SvUFFF6HnEaa75Bc1YSACn1ZsHuCRGrqO9BaTgt3mM0sRYC67+f+W0E
tA+k+EA0XnTtDdEBX3RLzvaYAR4yijEHIBQeeNemPYK4msW6Xw67ib1xn59blX4Z
a4Z85FjrekmoTl9493bFj6znDTX6wpKsPF7WLEF9S+oD/Lg4EHBi9BfefFxQpGZ9
FQHToFKyz1tA2iaY/9LjCtJcincMkuXt3KuQA4Nv2GiTzz4+FEy1pOqHnyNL2tFR
1G5n04BHAgMBAAECggEAAqcXeltQ76hMZSf3XdMcPF3b394jaAHKZgr2uBrmHzvp
QAf+MzAekET6+I/1hrHujzar95TGhx9ngWFMP0VPd7O31hQKJZXyoBlK5QHC+jEC
ZCPvIW0Cz81itRfO7eQeoIas9ZFscb4240/Uv8eqrI97NCdy9X/rz3mqNuYdEzqN
2v9XlwE/Fyx79O1PQqzPRiQt3n4ss9NO169y7X99KUZtYiZAiyBBGS8wYdaGF69G
URZ3qwoUE+nByZdeRfFLLTy+UDCOwQZV+0V4p0J++YLqQAac340A1F4D60qzMHnv
KVKnMc+RrYYVFOZU+USRlphSl3Ws5j0u94CiLitK4QKBgQDivJVHNmk1JleI/MPF
bx/YT5gzcVRFhGxkGso12JrQiFPs05JmoRFaqNBDNoZYDn2ggUrMwZVfPI5C6+7U
tCe2vrjVpvcAO9reK1u4N9ohpUpkocxWQy0nNHlrorDTZnyKreRtPC87W8xpiwl4
R/+nMgGd8vex7tGfchpThj8ZeQKBgQDXs2sgpE8vmnZBWrXAuGD8M9VnfcALEjwL
Fi3NR+XCr8jHkeIJVbSI2/asWsBGg8v6gV6Cdx9KV9r+fHDzdocS85X4P7crP83A
IX2rTT6Hsmc170SzCDa2jJJyLHQ6qtXBS9ZW8/dPFc1fiBf0NcmTLrRoNg5N8Px6
Qt0T51q3vwKBgQCYAfhOetMD2AW9iEAzwDFoUsxmSKdHx+TnI/LHMMVx4sPpNVqk
RX2d+ylMtmRQ6r4cejHMnkfnRnDVutkubu1lHe5LBpn35Sjx472k/oTWI7uBRdv5
RSYjb5GrsLG9uKrsSnKnLT85G20qoRUjN5nU3LiqzPZ0qviMXfH6ZzkseQKBgQCT
ft6MTY7QUGD4w5xxEiNPkeolgHmnmGpyclITg0x7WlSDEyBrna17wF3m8Y91KH58
56XGtMoyvezEBDgAY1ZuAR7VyEvqSRDahow2bPWLONUWrmxduAohvfIOHJPF4jeU
m9UPVHgSHih3YMpwda9G87LtZ7lUVqtutvYRvCvuZQKBgAypo514DZW7Y9lMCgkR
GpJLKCWFR0Sl9bQXI7N5nAG0YFz5ZhdA1PjS2tj+OKyWR6wekbv3g0CyVXT4XYsi
tKRu9PR2OUQLPv/h2qLAeSOYdScfWoOU5tlb4tkLoUNmj5/N9VpqbvLdDh6hPWQL
o4s+29QYKEoNmOrcZ6oRkRP8
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIICoTCCAYkCAgPoMA0GCSqGSIb3DQEBBQUAMBQxEjAQBgNVBAMMCUNMSS1Mb2dp
bjAiGA8yMDE5MDUwMzIwMDIwOVoYDzIwMjAwNTAzMjAwMjExWjAUMRIwEAYDVQQD
DAlDTEktTG9naW4wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC/C0oc
6vvF1UEcy9JeGDXdtKynG11wTTIHIokFhNinHNSpJBLmNWFyFkqzvjJCPR4kWuqw
4IXhCS3LVoqRmT680SvUFFF6HnEaa75Bc1YSACn1ZsHuCRGrqO9BaTgt3mM0sRYC
67+f+W0EtA+k+EA0XnTtDdEBX3RLzvaYAR4yijEHIBQeeNemPYK4msW6Xw67ib1x
n59blX4Za4Z85FjrekmoTl9493bFj6znDTX6wpKsPF7WLEF9S+oD/Lg4EHBi9Bfe
fFxQpGZ9FQHToFKyz1tA2iaY/9LjCtJcincMkuXt3KuQA4Nv2GiTzz4+FEy1pOqH
nyNL2tFR1G5n04BHAgMBAAEwDQYJKoZIhvcNAQEFBQADggEBAGz3pOgNPESr+QoO
OVCgSS6VtWlmrAcxl5JaiNBFpBGAqfvbfRe1eZY7Rn6fuw1jc3pPBVzNTf8Plel+
DcuLzDLJAEag2GpRE+Xg57DNSwPqP6jZfHRE/ufLwIRLcNG9wRUwqlBvdAu1Kign
nlTZvTEAwxlQdvmIIT1XrTLZ+OwtVXcgrf0vInmueZKz/UDqsSDPY+d426S9eOWt
60h2WgXPU3QvBYfA6Yd2ReeP3+SHwBd4/1ByNFWBytcI9ow3pp2JznU366dfX4IQ
Q0iOTvHzXbfPmtsxqho6+hBbLvXVNWJMg8e22Pp/TyXYqeV5V09k18EgCnuA/9Gd
kKDVROA=
-----END CERTIFICATE-----


@@ -0,0 +1,45 @@
-----BEGIN PRIVATE KEY-----
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDEg09d0uWdyo9c
uKbaJss7BT/NNuBw0Nh2pyHzLCJHyShcRi8UcmAeTlaMXdyr5NqTjqc1VT+CZA/o
IZQxbFfkt87pyRmbIw34B1rCy3/FuT4o6n+rcWaRppBo8bBt1+9P7GID3KS0HWuk
fWoAJaODsuC+mlbuB2s6CwPKbF8X30YGTL12SN73o4xewU8BDRUrSQEG1Gh5+5sV
3abQFx/4DYNVqWQy4e15N5QkV8qCa06wCGAgq6NkgnVZVRZbxS2VQo2V+xEFkJEG
yhtfTS+pRLsvTZQoIoYC+E7gAYmB9KhLPtX50DJ/xmI93/qL4Yt6pcjioecq4//n
NORKAFHBAgMBAAECggEAYab67p3ZmsLI4QOlbmyuu0KNhPXLLGSr3LKLDWMWGeQd
WVVLGfcISqcVHSWbfhP4hjDyaG9XYv1EZk8hbDnxp2eru8NCJTSTQXiuInSrpt65
w+1byh9NH/3Mb0oDKWKPuoC16ENh2VtxXUkxPqd1jQF761uY7Snkn/BPTuzxiFN8
Swrhum4b0CZf1XS4rTuk0b8tgSilGbk1DVMYANmQGb5TjMKjAJHzTIF5LYosXppQ
q8xr24XRMpz4m9KuGZTPePZ3ycGadnQV205uE0fuCsru1V5xsNYKh9LYJYWPyD49
L4eFHgLc2uVL9XFJZ7wujW0z5ZxyhWwObHoWvYrRjQKBgQD4frvJ7W8wJxNDa+3k
rKVnN+vjCSLDqc3HVZPvVkEhZRXAx6PSTYJVAMi2ULdhoxrT6jDwsN7KA1qpqb9n
NOttuAqFrJLPRRTjc5YjvBL1Yb4/wFUMR4OgrhhtwIEXlftXN830zlW1Wvo6S8o1
vkGG9KuoVhfroyu4XAJpokd6CwKBgQDKcqvKIzhrF7Oed3STIpLJieeE0n+Dkz+I
AEXm1E7ulT57BYTYO2jLLSUYnetew+QL85cXFSsuEUgH3H2fhckBdq8jcjJGi+YB
7OA1WLUyvDvM6E6CxguzdNNNbtmhXNyLCOrxjiV35wzj47y/UPcWrURQZgzaxovH
+c8mPeeO4wKBgQDB/GVqwFDxXT+7fVDsGB7TUiNyTBp4dmFvAA6JY2Nax4fQw8jO
jrV02DTXpnFR5js2PXdRHjH9r9qh4iLKVdSIBYkpS0wcREiHOx907Ag8yL31FJcQ
C+/kiqQFYaclG29naef8+OqNteTrh2jmxYxv5ybuNa9cwzeJJ0K25fk4ewKBgQCl
2tooqUAgZHOQILdNj2aIXEVjSHyVE75ZsjeSS187EOP2L2hNKibJRXv9terNYVjj
/bVLgNk2TYwgfKAiX510aIJFXNoZd6WA8EojCkCwhwvK7IrdkliltdEiv+zlyMkZ
0r2AFf9WQuEJllrctf0oA91SrLhdR4ne1CbEYrThFwKBgEoK2tStBVypdnAZe7mI
ahk4Lv3QYqwD+qd8H6VRwbX1EtggWCQh0jAohcCzn2HHq+zjUlT3RF7ey46z0gel
+58sKj7uAHuHJ+pg8xI0CWS8Vy6E2hT5bCanb0rKXguuwx+90Kn/xj/yAK7CeIId
PrJHSlG9/au3N6cbVM65RHPG
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
MIICoTCCAYkCAgPoMA0GCSqGSIb3DQEBBQUAMBQxEjAQBgNVBAMMCUNMSS1Mb2dp
bjAiGA8yMDE5MDcxNjAxMDEwNloYDzIwMjAwNzE2MDEwMTA4WjAUMRIwEAYDVQQD
DAlDTEktTG9naW4wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDEg09d
0uWdyo9cuKbaJss7BT/NNuBw0Nh2pyHzLCJHyShcRi8UcmAeTlaMXdyr5NqTjqc1
VT+CZA/oIZQxbFfkt87pyRmbIw34B1rCy3/FuT4o6n+rcWaRppBo8bBt1+9P7GID
3KS0HWukfWoAJaODsuC+mlbuB2s6CwPKbF8X30YGTL12SN73o4xewU8BDRUrSQEG
1Gh5+5sV3abQFx/4DYNVqWQy4e15N5QkV8qCa06wCGAgq6NkgnVZVRZbxS2VQo2V
+xEFkJEGyhtfTS+pRLsvTZQoIoYC+E7gAYmB9KhLPtX50DJ/xmI93/qL4Yt6pcji
oecq4//nNORKAFHBAgMBAAEwDQYJKoZIhvcNAQEFBQADggEBAIDer4wNPbb+FEGs
P+qwYWkDoDHjk3zG2bw8LEjp28PfzlXg5ng2W/rcNHnWTxkDSp7xCaJLhNuCRXx6
vF8sNsQscW9219ZWv5OSETYivLDX1It24ZepAetWmM4NAamU9ZkJHIVidpyZPtZ+
I9PvrTh44KW8VaPhhR5Gv0cUgq4rjhyHCyk8ZpEB4fO83/1fu5MnQUsPvqzrlgEa
p3/GwG7AGSye0QyWdjrt2rcO0QWrCelZdkFut8kV0FHOzrrEgvoLDBlgzN9/qY+a
Yb0+kqR1WBr58HZRG4i4abRpI49xMNp+egASN/8tPSsaR2BIsVmXBSg9Bd+k/f1V
IUg8NDw=
-----END CERTIFICATE-----


@@ -0,0 +1,54 @@
"Retrieved from https://en.wikipedia.org/wiki/Chicago_City_Council on November 6, 2018"
Ward,Name,Took Office,Party
1,Proco Joe Moreno,2010*,Dem
2,Brian Hopkins,2015,Dem
3,Pat Dowell,2007,Dem
4,Sophia King,2016*,Dem
5,Leslie Hairston,1999,Dem
6,Roderick Sawyer,2011,Dem
7,Gregory Mitchell,2015,Dem
8,Michelle A. Harris,2006*,Dem
9,Anthony Beale,1999,Dem
10,Susie Sadlowski Garza,2015,Dem
11,Patrick Daley Thompson,2015,Dem
12,George Cardenas,2003,Dem
13,Marty Quinn,2011,Dem
14,Edward M. Burke,1969,Dem
15,Raymond Lopez,2015,Dem
16,Toni Foulkes,2007,Dem
17,David H. Moore,2015,Dem
18,Derrick Curtis,2015,Dem
19,Matthew O'Shea,2011,Dem
20,Willie Cochran,2007,Dem
21,Howard Brookins Jr.,2003,Dem
22,Ricardo Muñoz,1993*,Dem
23,Silvana Tabares,2018*,Dem
24,"Michael Scott, Jr.",2015,Dem
25,Daniel Solis,1996*,Dem
26,Roberto Maldonado,2009*,Dem
27,"Walter Burnett, Jr.",1995,Dem
28,Jason Ervin,2011*,Dem
29,Chris Taliaferro,2015,Dem
30,Ariel Reboyras,2003,Dem
31,Milly Santiago,2015,Dem
32,Scott Waguespack,2007,Dem
33,Deb Mell,2013*,Dem
34,Carrie Austin,1994*,Dem
35,Carlos Ramirez-Rosa,2015,Dem
36,Gilbert Villegas,2015,Dem
37,Emma Mitts,2000*,Dem
38,Nicholas Sposato,2011,Ind
39,Margaret Laurino,1994*,Dem
40,Patrick J. O'Connor,1983,Dem
41,Anthony Napolitano,2015,Rep
42,Brendan Reilly,2007,Dem
43,Michele Smith,2011,Dem
44,Thomas M. Tunney,2002*,Dem
45,John Arena,2011,Dem
46,James Cappleman,2011,Dem
47,Ameya Pawar,2011,Dem
48,Harry Osterman,2011,Dem
49,Joe Moore,1991,Dem
50,Debra Silverstein,2011,Dem


@@ -0,0 +1,15 @@
File updated 11/2/2018
ID|Case Number|Date|Block|IUCR|Primary Type|Description|Location Description|Arrest|Domestic|Beat|District|Ward|Community Area|FBI Code|X Coordinate|Y Coordinate|Year|Updated On|Latitude|Longitude|Location
10140490|HY329907|07/05/2015 11:50:00 PM|050XX N NEWLAND AVE|0820|THEFT|$500 AND UNDER|STREET|false|false|1613|016|41|10|06|1129230|1933315|2015|07/12/2015 12:42:46 PM|41.973309466|-87.800174996|(41.973309466, -87.800174996)
10139776|HY329265|07/05/2015 11:30:00 PM|011XX W MORSE AVE|0460|BATTERY|SIMPLE|STREET|false|true|2431|024|49|1|08B|1167370|1946271|2015|07/12/2015 12:42:46 PM|42.008124017|-87.65955018|(42.008124017, -87.65955018)
10140270|HY329253|07/05/2015 11:20:00 PM|121XX S FRONT AVE|0486|BATTERY|DOMESTIC BATTERY SIMPLE|STREET|false|true|0532||9|53|08B|||2015|07/12/2015 12:42:46 PM|||
10139885|HY329308|07/05/2015 11:19:00 PM|051XX W DIVISION ST|0610|BURGLARY|FORCIBLE ENTRY|SMALL RETAIL STORE|false|false|1531|015|37|25|05|1141721|1907465|2015|07/12/2015 12:42:46 PM|41.902152027|-87.754883404|(41.902152027, -87.754883404)
10140379|HY329556|07/05/2015 11:00:00 PM|012XX W LAKE ST|0930|MOTOR VEHICLE THEFT|THEFT/RECOVERY: AUTOMOBILE|STREET|false|false|1215|012|27|28|07|1168413|1901632|2015|07/12/2015 12:42:46 PM|41.885610142|-87.657008701|(41.885610142, -87.657008701)
10140868|HY330421|07/05/2015 10:54:00 PM|118XX S PEORIA ST|1320|CRIMINAL DAMAGE|TO VEHICLE|VEHICLE NON-COMMERCIAL|false|false|0524|005|34|53|14|1172409|1826485|2015|07/12/2015 12:42:46 PM|41.6793109|-87.644545209|(41.6793109, -87.644545209)
10139762|HY329232|07/05/2015 10:42:00 PM|026XX W 37TH PL|1020|ARSON|BY FIRE|VACANT LOT/LAND|false|false|0911|009|12|58|09|1159436|1879658|2015|07/12/2015 12:42:46 PM|41.825500607|-87.690578042|(41.825500607, -87.690578042)
10139722|HY329228|07/05/2015 10:30:00 PM|016XX S CENTRAL PARK AVE|1811|NARCOTICS|POSS: CANNABIS 30GMS OR LESS|ALLEY|true|false|1021|010|24|29|18|1152687|1891389|2015|07/12/2015 12:42:46 PM|41.857827814|-87.715028789|(41.857827814, -87.715028789)
10139774|HY329209|07/05/2015 10:15:00 PM|048XX N ASHLAND AVE|1310|CRIMINAL DAMAGE|TO PROPERTY|APARTMENT|false|false|2032|020|46|3|14|1164821|1932394|2015|07/12/2015 12:42:46 PM|41.970099796|-87.669324377|(41.970099796, -87.669324377)
10139697|HY329177|07/05/2015 10:10:00 PM|058XX S ARTESIAN AVE|1320|CRIMINAL DAMAGE|TO VEHICLE|ALLEY|false|false|0824|008|16|63|14|1160997|1865851|2015|07/12/2015 12:42:46 PM|41.787580282|-87.685233078|(41.787580282, -87.685233078)

File diff suppressed because it is too large


@@ -0,0 +1,11 @@
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
10498554,HZ239907,4/4/2016 23:56,007XX E 111TH ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,FALSE,FALSE,531,5,9,50,11,1183356,1831503,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
10516598,HZ258664,4/15/2016 17:00,082XX S MARSHFIELD AVE,890,THEFT,FROM BUILDING,RESIDENCE,FALSE,FALSE,614,6,21,71,6,1166776,1850053,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
10519196,HZ261252,4/15/2016 10:00,104XX S SACRAMENTO AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,FALSE,FALSE,2211,22,19,74,11,,,2016,5/12/2016 15:50,,,
10519591,HZ261534,4/15/2016 9:00,113XX S PRAIRIE AVE,1120,DECEPTIVE PRACTICE,FORGERY,RESIDENCE,FALSE,FALSE,531,5,9,49,10,,,2016,5/13/2016 15:51,,,
10534446,HZ277630,4/15/2016 10:00,055XX N KEDZIE AVE,890,THEFT,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",FALSE,FALSE,1712,17,40,13,6,,,2016,5/25/2016 15:59,,,
10535059,HZ278872,4/15/2016 4:30,004XX S KILBOURN AVE,810,THEFT,OVER $500,RESIDENCE,FALSE,FALSE,1131,11,24,26,6,,,2016,5/25/2016 15:59,,,
10499802,HZ240778,4/15/2016 10:00,010XX N MILWAUKEE AVE,1152,DECEPTIVE PRACTICE,ILLEGAL USE CASH CARD,RESIDENCE,FALSE,FALSE,1213,12,27,24,11,,,2016,5/27/2016 15:45,,,
10522293,HZ264802,4/15/2016 16:00,019XX W DIVISION ST,1110,DECEPTIVE PRACTICE,BOGUS CHECK,RESTAURANT,FALSE,FALSE,1424,14,1,24,11,1163094,1908003,2016,5/16/2016 15:48,41.90320604,-87.67636193,"(41.903206037, -87.676361925)"
10523111,HZ265911,4/15/2016 8:00,061XX N SHERIDAN RD,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,FALSE,FALSE,2433,24,48,77,11,,,2016,5/16/2016 15:50,,,
10525877,HZ268138,4/15/2016 15:00,023XX W EASTWOOD AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,FALSE,FALSE,1911,19,47,4,11,,,2016,5/18/2016 15:50,,,


@@ -0,0 +1,11 @@
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
10378283,HZ114126,1/10/2016 11:00,033XX W IRVING PARK RD,610,BURGLARY,FORCIBLE ENTRY,RESIDENCE-GARAGE,TRUE,FALSE,1724,17,33,16,5,1153593,1926401,2016,5/22/2016 15:51,41.95388599,-87.71077048,"(41.95388599, -87.710770479)"
10382154,HZ118288,1/10/2016 21:00,055XX S FRANCISCO AVE,1754,OFFENSE INVOLVING CHILDREN,AGG SEX ASSLT OF CHILD FAM MBR,RESIDENCE,FALSE,TRUE,824,8,14,63,2,1157983,1867874,2016,6/1/2016 15:51,41.79319349,-87.69622926,"(41.793193489, -87.696229255)"
10374287,HZ110730,1/10/2016 11:50,043XX W ARMITAGE AVE,5002,OTHER OFFENSE,OTHER VEHICLE OFFENSE,STREET,FALSE,TRUE,2522,25,30,20,26,1146917,1912931,2016,6/7/2016 15:55,41.91705356,-87.73565764,"(41.917053561, -87.735657637)"
10374662,HZ110403,1/10/2016 1:30,073XX S CLAREMONT AVE,497,BATTERY,AGGRAVATED DOMESTIC BATTERY: OTHER DANG WEAPON,STREET,FALSE,TRUE,835,8,18,66,04B,1162007,1855951,2016,2/4/2016 15:44,41.76039236,-87.68180481,"(41.760392356, -87.681804812)"
10374720,HZ110836,1/10/2016 7:30,079XX S RHODES AVE,890,THEFT,FROM BUILDING,OTHER,FALSE,FALSE,624,6,6,44,6,1181279,1852568,2016,2/4/2016 15:44,41.75068679,-87.61127681,"(41.75068679, -87.611276811)"
10375178,HZ110832,1/10/2016 14:20,057XX S KEDZIE AVE,460,BATTERY,SIMPLE,RESTAURANT,FALSE,FALSE,824,8,14,63,08B,1156029,1866379,2016,2/4/2016 15:44,41.78913051,-87.7034346,"(41.78913051, -87.703434602)"
10398695,HZ135279,1/10/2016 23:00,031XX S PARNELL AVE,620,BURGLARY,UNLAWFUL ENTRY,RESIDENCE-GARAGE,FALSE,FALSE,915,9,11,60,5,1173138,1884117,2016,2/4/2016 15:44,41.8374442,-87.64017699,"(41.837444199, -87.640176991)"
10402270,HZ138745,1/10/2016 11:00,051XX S ELIZABETH ST,620,BURGLARY,UNLAWFUL ENTRY,APARTMENT,FALSE,FALSE,934,9,16,61,5,,,2016,2/4/2016 6:53,,,
10380619,HZ116583,1/10/2016 9:41,091XX S PAXTON AVE,4387,OTHER OFFENSE,VIOLATE ORDER OF PROTECTION,RESIDENCE,TRUE,TRUE,413,4,7,48,26,1192434,1844707,2016,2/2/2016 15:56,41.72885134,-87.57065553,"(41.728851343, -87.570655525)"
10400131,HZ136171,1/10/2016 18:00,0000X W TERMINAL ST,810,THEFT,OVER $500,AIRPORT BUILDING NON-TERMINAL - SECURE AREA,FALSE,FALSE,1651,16,41,76,6,,,2016,2/2/2016 15:58,,,


@@ -0,0 +1,204 @@
{
"id": "75637565-60ad-4baa-87d3-396a7930cfe7",
"blocks": [
{
"id": "ba5a8061-129e-4618-953a-ce3e89c8f2cb",
"type": "Microsoft.DPrep.GetFilesBlock",
"arguments": {
"path": {
"target": 0,
"resourceDetails": [
{
"path": "./crime-spring.csv"
}
]
}
},
"isEnabled": true,
"name": null,
"annotation": null
},
{
"id": "1b345643-6b60-4ca1-99f9-2a64ae932a23",
"type": "Microsoft.DPrep.ParseDelimitedBlock",
"arguments": {
"columnHeadersMode": 1,
"fileEncoding": 0,
"handleQuotedLineBreaks": false,
"preview": false,
"separator": ",",
"skipRowsMode": 0
},
"isEnabled": true,
"name": null,
"annotation": null
},
{
"id": "12cf73a2-1487-4915-bfa7-c86be7de08c0",
"type": "Microsoft.DPrep.SetColumnTypesBlock",
"arguments": {
"columnConversion": [
{
"column": {
"type": 2,
"details": {
"selectedColumn": "ID"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "IUCR"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Domestic"
}
},
"typeProperty": 1
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Beat"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "District"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Ward"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Community Area"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Year"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Longitude"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Arrest"
}
},
"typeProperty": 1
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "X Coordinate"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Updated On"
}
},
"typeArguments": {
"dateTimeFormats": [
"%m/%d/%Y %I:%M:%S %p"
]
},
"typeProperty": 4
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Date"
}
},
"typeArguments": {
"dateTimeFormats": [
"%m/%d/%Y %I:%M:%S %p"
]
},
"typeProperty": 4
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Y Coordinate"
}
},
"typeProperty": 3
},
{
"column": {
"type": 2,
"details": {
"selectedColumn": "Latitude"
}
},
"typeProperty": 3
}
]
},
"isEnabled": true,
"name": null,
"annotation": null
},
{
"id": "dfd62543-9285-412b-a930-0aeaaffde699",
"type": "Microsoft.DPrep.HandlePathColumnBlock",
"arguments": {
"pathColumnOperation": 0
},
"isEnabled": true,
"name": null,
"annotation": null
}
],
"inspectors": []
}
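The `.dprep` package above is plain JSON, so its transformation pipeline can be inspected with standard tooling. A minimal sketch using only the Python standard library (the abbreviated inline sample and helper name are illustrative assumptions, not part of the repository):

```python
import json

def block_types(package: dict) -> list:
    """Return the pipeline's block types, in order, from a .dprep package dict."""
    return [block["type"] for block in package.get("blocks", [])]

# Tiny inline fragment mirroring the package structure above (abbreviated).
sample = json.loads("""
{
  "id": "75637565-60ad-4baa-87d3-396a7930cfe7",
  "blocks": [
    {"id": "ba5a8061", "type": "Microsoft.DPrep.GetFilesBlock"},
    {"id": "1b345643", "type": "Microsoft.DPrep.ParseDelimitedBlock"}
  ]
}
""")
print(block_types(sample))
# → ['Microsoft.DPrep.GetFilesBlock', 'Microsoft.DPrep.ParseDelimitedBlock']
```

In the full package, the same call would also surface the `SetColumnTypesBlock` and `HandlePathColumnBlock` steps shown above.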


@@ -0,0 +1,10 @@
10140490 HY329907 7/5/2015 23:50 050XX N NEWLAND AVE 820 THEFT
10139776 HY329265 7/5/2015 23:30 011XX W MORSE AVE 460 BATTERY
10140270 HY329253 7/5/2015 23:20 121XX S FRONT AVE 486 BATTERY
10139885 HY329308 7/5/2015 23:19 051XX W DIVISION ST 610 BURGLARY
10140379 HY329556 7/5/2015 23:00 012XX W LAKE ST 930 MOTOR VEHICLE THEFT
10140868 HY330421 7/5/2015 22:54 118XX S PEORIA ST 1320 CRIMINAL DAMAGE
10139762 HY329232 7/5/2015 22:42 026XX W 37TH PL 1020 ARSON
10139722 HY329228 7/5/2015 22:30 016XX S CENTRAL PARK AVE 1811 NARCOTICS
10139774 HY329209 7/5/2015 22:15 048XX N ASHLAND AVE 1310 CRIMINAL DAMAGE
10139697 HY329177 7/5/2015 22:10 058XX S ARTESIAN AVE 1320 CRIMINAL DAMAGE


@@ -0,0 +1,12 @@
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
10498554,HZ239907,4/15/2016 23:56,007XX E 111TH ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,FALSE,FALSE,531,5,9,50,11,1183356,1831503,2016,5/11/2016 15:48,41.69283384,-87.60431945,"(41.692833841, -87.60431945)"
10516598,HZ258664,4/15/2016 17:00,082XX S MARSHFIELD AVE,890,THEFT,FROM BUILDING,RESIDENCE,FALSE,FALSE,614,6,21,71,6,1166776,1850053,2016,5/12/2016 15:48,41.74410697,-87.66449429,"(41.744106973, -87.664494285)"
10519196,HZ261252,4/15/2016 10:00,104XX S SACRAMENTO AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,FALSE,FALSE,2211,22,19,74,11,,,2016,5/12/2016 15:50,,,
10519591,HZ261534,4/15/2016 9:00,113XX S PRAIRIE AVE,1120,DECEPTIVE PRACTICE,FORGERY,RESIDENCE,FALSE,FALSE,531,5,9,49,10,,,2016,5/13/2016 15:51,,,
10534446,HZ277630,4/15/2016 10:00,055XX N KEDZIE AVE,890,THEFT,FROM BUILDING,"SCHOOL, PUBLIC, BUILDING",FALSE,FALSE,1712,17,40,13,6,,,2016,5/25/2016 15:59,,,
10535059,HZ278872,4/15/2016 4:30,004XX S KILBOURN AVE,810,THEFT,OVER $500,RESIDENCE,FALSE,FALSE,1131,11,24,26,6,,,2016,5/25/2016 15:59,,,
10499802,HZ240778,4/15/2016 10:00,010XX N MILWAUKEE AVE,1152,DECEPTIVE PRACTICE,ILLEGAL USE CASH CARD,RESIDENCE,FALSE,FALSE,1213,12,27,24,11,,,2016,5/27/2016 15:45,,,
10522293,HZ264802,4/15/2016 16:00,019XX W DIVISION ST,1110,DECEPTIVE PRACTICE,BOGUS CHECK,RESTAURANT,FALSE,FALSE,1424,14,1,24,11,1163094,1908003,2016,5/16/2016 15:48,41.90320604,-87.67636193,"(41.903206037, -87.676361925)"
10523111,HZ265911,4/15/2016 8:00,061XX N SHERIDAN RD,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,FALSE,FALSE,2433,24,48,77,11,,,2016,5/16/2016 15:50,,,
10525877,HZ268138,4/15/2016 15:00,023XX W EASTWOOD AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,FALSE,FALSE,1911,19,47,4,11,,,2016,5/18/2016 15:50,,,


@@ -0,0 +1,10 @@
10140490 HY329907 7/5/2015 23:50 050XX N NEWLAND AVE 820 THEFT
10139776 HY329265 7/5/2015 23:30 011XX W MORSE AVE 460 BATTERY
10140270 HY329253 7/5/2015 23:20 121XX S FRONT AVE 486 BATTERY
10139885 HY329308 7/5/2015 23:19 051XX W DIVISION ST 610 BURGLARY
10140379 HY329556 7/5/2015 23:00 012XX W LAKE ST 930 MOTOR VEHICLE THEFT
10140868 HY330421 7/5/2015 22:54 118XX S PEORIA ST 1320 CRIMINAL DAMAGE
10139762 HY329232 7/5/2015 22:42 026XX W 37TH PL 1020 ARSON
10139722 HY329228 7/5/2015 22:30 016XX S CENTRAL PARK AVE 1811 NARCOTICS
10139774 HY329209 7/5/2015 22:15 048XX N ASHLAND AVE 1310 CRIMINAL DAMAGE
10139697 HY329177 7/5/2015 22:10 058XX S ARTESIAN AVE 1320 CRIMINAL DAMAGE


@@ -0,0 +1,11 @@
ID |CaseNumber| |Completed|
10140490 |HY329907| |Y|
10139776 |HY329265| |Y|
10140270 |HY329253| |N|
10139885 |HY329308| |Y|
10140379 |HY329556| |N|
10140868 |HY330421| |N|
10139762 |HY329232| |N|
10139722 |HY329228| |Y|
10139774 |HY329209| |N|
10139697 |HY329177| |N|
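The pipe-delimited sample above mixes `|` separators with padding spaces and empty fields. A minimal sketch of normalizing such rows in plain Python (the helper name and inline rows are illustrative assumptions):

```python
def parse_pipe_row(line: str) -> list:
    """Split a '|'-delimited row, dropping padding spaces and empty fields."""
    return [field.strip() for field in line.split("|") if field.strip()]

rows = ["ID |CaseNumber| |Completed|", "10140490 |HY329907| |Y|"]
for row in rows:
    print(parse_pipe_row(row))
# → ['ID', 'CaseNumber', 'Completed']
# → ['10140490', 'HY329907', 'Y']
```

A dataprep tool would typically handle this with a configurable separator plus whitespace trimming; the helper above just makes the cleanup explicit.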

Some files were not shown because too many files have changed in this diff.