Mirror of https://github.com/Azure/MachineLearningNotebooks.git (synced 2025-12-20 09:37:04 -05:00)

Compare commits: dev01...release-1. (15 commits)

| SHA1 |
|---|
| 249bcac3c7 |
| 4a6bcebccc |
| 56e0ebc5ac |
| 2aa39f2f4a |
| 4d247c1877 |
| f6682f6f6d |
| 26ecf25233 |
| 44c3a486c0 |
| c574f429b8 |
| 77d557a5dc |
| 13dedec4a4 |
| 6f5c52676f |
| 90c105537c |
| ef264b1073 |
| 824ac5e021 |
@@ -52,7 +52,6 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
|
|||||||
|
|
||||||
Visit following repos to see projects contributed by Azure ML users:
|
Visit following repos to see projects contributed by Azure ML users:
|
||||||
|
|
||||||
- [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis.
|
|
||||||
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
|
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
|
||||||
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
|
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
|
||||||
|
|
||||||
|
|||||||
@@ -103,7 +103,7 @@
    "source": [
     "import azureml.core\n",
     "\n",
-    "print(\"This notebook was created using version 1.0.43 of the Azure ML SDK\")\n",
+    "print(\"This notebook was created using version 1.0.45 of the Azure ML SDK\")\n",
     "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
    ]
   },
@@ -20,7 +20,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model in Azure.\n",
+    "The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL\u00c3\u201a\u00c2\u00a0and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model\u00c2\u00a0in Azure.\n",
     " \n",
     "In this notebook, we will do the following:\n",
     " \n",
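To make the GPU-side ETL claim in the RAPIDS description above concrete, here is a minimal sketch of DataFrame preparation on the GPU, assuming a working RAPIDS install with the cudf package; the file name and column names are illustrative and not part of the notebook.

```python
# Minimal cuDF ETL sketch (assumes RAPIDS/cudf is installed; file and column names are illustrative).
import cudf

# Read the data straight into GPU memory.
gdf = cudf.read_csv("weather_sample.csv")

# Typical ETL steps stay on the GPU: drop rows with missing labels and derive a feature.
gdf = gdf.dropna(subset=["temperature"])
gdf["temp_celsius"] = (gdf["temperature"] - 32) * 5.0 / 9.0

# The prepared GPU DataFrame can then feed GPU-capable estimators without a host round-trip.
print(gdf.head())
```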
contrib/datadrift/azure-ml-datadrift.ipynb (new file, 709 lines)
@@ -0,0 +1,709 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Track Data Drift between Training and Inference Data in Production \n",
|
||||||
|
"\n",
|
||||||
|
"With this notebook, you will learn how to enable the DataDrift service to automatically track and determine whether your inference data is drifting from the data your model was initially trained on. The DataDrift service provides metrics and visualizations to help stakeholders identify which specific features cause the concept drift to occur.\n",
|
||||||
|
"\n",
|
||||||
|
"Please email driftfeedback@microsoft.com with any issues. A member from the DataDrift team will respond shortly. \n",
|
||||||
|
"\n",
|
||||||
|
"The DataDrift Public Preview API can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-contrib-datadrift/?view=azure-ml-py). "
|
||||||
|
]
|
||||||
|
},
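For orientation, the cells that follow boil down to the sequence below; this is only a condensed sketch reusing the azureml-contrib-datadrift calls shown later in this notebook (`ws`, `model`, `services`, and `feature_list` are defined in subsequent cells).

```python
# Condensed sketch of the DataDrift workflow covered below (all names are defined later in the notebook).
from datetime import datetime, timedelta
from azureml.contrib.datadrift import DataDriftDetector

# 1. Attach a detector to a registered model and the scoring services that use it.
detector = DataDriftDetector.create(ws, model.name, model.version, services, frequency="Day")

# 2. Kick off an adhoc analysis for a target date.
run = detector.run(datetime.today(), services, feature_list=feature_list, create_compute_target=True)

# 3. Retrieve metrics for a time window and visualize them.
metrics = detector.get_output(start_time=datetime.now() - timedelta(days=2), end_time=datetime.now())
detector.show(with_details=True)
```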
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Prerequisites and Setup"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Install the DataDrift package\n",
|
||||||
|
"\n",
|
||||||
|
"Install the azureml-contrib-datadrift, azureml-contrib-opendatasets and lightgbm packages before running this notebook.\n",
|
||||||
|
"```\n",
|
||||||
|
"pip install azureml-contrib-datadrift\n",
|
||||||
|
"pip install azureml-contrib-datasets\n",
|
||||||
|
"pip install lightgbm\n",
|
||||||
|
"```"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Import Dependencies"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"import os\n",
|
||||||
|
"import time\n",
|
||||||
|
"from datetime import datetime, timedelta\n",
|
||||||
|
"\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import requests\n",
|
||||||
|
"from azureml.contrib.datadrift import DataDriftDetector, AlertConfiguration\n",
|
||||||
|
"from azureml.contrib.opendatasets import NoaaIsdWeather\n",
|
||||||
|
"from azureml.core import Dataset, Workspace, Run\n",
|
||||||
|
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.image import ContainerImage\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Set up Configuraton and Create Azure ML Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Please type in your initials/alias. The prefix is prepended to the names of resources created by this notebook. \n",
|
||||||
|
"prefix = \"dd\"\n",
|
||||||
|
"\n",
|
||||||
|
"# NOTE: Please do not change the model_name, as it's required by the score.py file\n",
|
||||||
|
"model_name = \"driftmodel\"\n",
|
||||||
|
"image_name = \"{}driftimage\".format(prefix)\n",
|
||||||
|
"service_name = \"{}driftservice\".format(prefix)\n",
|
||||||
|
"\n",
|
||||||
|
"# optionally, set email address to receive an email alert for DataDrift\n",
|
||||||
|
"email_address = \"\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Generate Train/Testing Data\n",
|
||||||
|
"\n",
|
||||||
|
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You may replace this step with your own dataset. "
|
||||||
|
]
|
||||||
|
},
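The cell above notes that the NOAA step can be replaced with your own dataset; a minimal sketch of that substitution, assuming a local CSV with a datetime column (the file and column names are illustrative).

```python
# Hypothetical replacement for the NOAA download: bring your own data as a pandas DataFrame.
import pandas as pd

df = pd.read_csv("my_training_data.csv", parse_dates=["datetime"])  # illustrative file/column names

# The downstream cells expect a DataFrame with a label column (here "temperature")
# plus the feature columns you intend to train on and monitor.
print(df.dtypes)
```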
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"usaf_list = ['725724', '722149', '723090', '722159', '723910', '720279',\n",
|
||||||
|
" '725513', '725254', '726430', '720381', '723074', '726682',\n",
|
||||||
|
" '725486', '727883', '723177', '722075', '723086', '724053',\n",
|
||||||
|
" '725070', '722073', '726060', '725224', '725260', '724520',\n",
|
||||||
|
" '720305', '724020', '726510', '725126', '722523', '703333',\n",
|
||||||
|
" '722249', '722728', '725483', '722972', '724975', '742079',\n",
|
||||||
|
" '727468', '722193', '725624', '722030', '726380', '720309',\n",
|
||||||
|
" '722071', '720326', '725415', '724504', '725665', '725424',\n",
|
||||||
|
" '725066']\n",
|
||||||
|
"\n",
|
||||||
|
"columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def enrich_weather_noaa_data(noaa_df):\n",
|
||||||
|
" hours_in_day = 23\n",
|
||||||
|
" week_in_year = 52\n",
|
||||||
|
" \n",
|
||||||
|
" noaa_df[\"hour\"] = noaa_df[\"datetime\"].dt.hour\n",
|
||||||
|
" noaa_df[\"weekofyear\"] = noaa_df[\"datetime\"].dt.week\n",
|
||||||
|
" \n",
|
||||||
|
" noaa_df[\"sine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.sin((2*np.pi*x.dt.week-1)/week_in_year))\n",
|
||||||
|
" noaa_df[\"cosine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.cos((2*np.pi*x.dt.week-1)/week_in_year))\n",
|
||||||
|
"\n",
|
||||||
|
" noaa_df[\"sine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||||
|
" noaa_df[\"cosine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||||
|
" \n",
|
||||||
|
" return noaa_df\n",
|
||||||
|
"\n",
|
||||||
|
"def add_window_col(input_df):\n",
|
||||||
|
" shift_interval = pd.Timedelta('-7 days') # your X days interval\n",
|
||||||
|
" df_shifted = input_df.copy()\n",
|
||||||
|
" df_shifted['datetime'] = df_shifted['datetime'] - shift_interval\n",
|
||||||
|
" df_shifted.drop(list(input_df.columns.difference(['datetime', 'usaf', 'wban', 'sine_hourofday', 'temperature'])), axis=1, inplace=True)\n",
|
||||||
|
"\n",
|
||||||
|
" # merge, keeping only observations where -1 lag is present\n",
|
||||||
|
" df2 = pd.merge(input_df,\n",
|
||||||
|
" df_shifted,\n",
|
||||||
|
" on=['datetime', 'usaf', 'wban', 'sine_hourofday'],\n",
|
||||||
|
" how='inner', # use 'left' to keep observations without lags\n",
|
||||||
|
" suffixes=['', '-7'])\n",
|
||||||
|
" return df2\n",
|
||||||
|
"\n",
|
||||||
|
"def get_noaa_data(start_time, end_time, cols, station_list):\n",
|
||||||
|
" isd = NoaaIsdWeather(start_time, end_time, cols=cols)\n",
|
||||||
|
" # Read into Pandas data frame.\n",
|
||||||
|
" noaa_df = isd.to_pandas_dataframe()\n",
|
||||||
|
" noaa_df = noaa_df.rename(columns={\"stationName\": \"station_name\"})\n",
|
||||||
|
" \n",
|
||||||
|
" df_filtered = noaa_df[noaa_df[\"usaf\"].isin(station_list)]\n",
|
||||||
|
" df_filtered.reset_index(drop=True)\n",
|
||||||
|
" \n",
|
||||||
|
" # Enrich with time features\n",
|
||||||
|
" df_enriched = enrich_weather_noaa_data(df_filtered)\n",
|
||||||
|
" \n",
|
||||||
|
" return df_enriched\n",
|
||||||
|
"\n",
|
||||||
|
"def get_featurized_noaa_df(start_time, end_time, cols, station_list):\n",
|
||||||
|
" df_1 = get_noaa_data(start_time - timedelta(days=7), start_time - timedelta(seconds=1), cols, station_list)\n",
|
||||||
|
" df_2 = get_noaa_data(start_time, end_time, cols, station_list)\n",
|
||||||
|
" noaa_df = pd.concat([df_1, df_2])\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Adding window feature\")\n",
|
||||||
|
" df_window = add_window_col(noaa_df)\n",
|
||||||
|
" \n",
|
||||||
|
" cat_columns = df_window.dtypes == object\n",
|
||||||
|
" cat_columns = cat_columns[cat_columns == True]\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Encoding categorical columns\")\n",
|
||||||
|
" df_encoded = pd.get_dummies(df_window, columns=cat_columns.keys().tolist())\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Dropping unnecessary columns\")\n",
|
||||||
|
" df_featurized = df_encoded.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna().drop_duplicates()\n",
|
||||||
|
" \n",
|
||||||
|
" return df_featurized"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Train model on Jan 1 - 14, 2009 data\n",
|
||||||
|
"df = get_featurized_noaa_df(datetime(2009, 1, 1), datetime(2009, 1, 14, 23, 59, 59), columns, usaf_list)\n",
|
||||||
|
"df.head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"label = \"temperature\"\n",
|
||||||
|
"x_df = df.drop(label, axis=1)\n",
|
||||||
|
"y_df = df[[label]]\n",
|
||||||
|
"x_train, x_test, y_train, y_test = train_test_split(df, y_df, test_size=0.2, random_state=223)\n",
|
||||||
|
"print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)\n",
|
||||||
|
"\n",
|
||||||
|
"training_dir = 'outputs/training'\n",
|
||||||
|
"training_file = \"training.csv\"\n",
|
||||||
|
"\n",
|
||||||
|
"# Generate training dataframe to register as Training Dataset\n",
|
||||||
|
"os.makedirs(training_dir, exist_ok=True)\n",
|
||||||
|
"training_df = pd.merge(x_train.drop(label, axis=1), y_train, left_index=True, right_index=True)\n",
|
||||||
|
"training_df.to_csv(training_dir + \"/\" + training_file)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create/Register Training Dataset"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"dataset_name = \"dataset\"\n",
|
||||||
|
"name_suffix = datetime.utcnow().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
|
||||||
|
"snapshot_name = \"snapshot-{}\".format(name_suffix)\n",
|
||||||
|
"\n",
|
||||||
|
"dstore = ws.get_default_datastore()\n",
|
||||||
|
"dstore.upload(training_dir, \"data/training\", show_progress=True)\n",
|
||||||
|
"dpath = dstore.path(\"data/training/training.csv\")\n",
|
||||||
|
"trainingDataset = Dataset.auto_read_files(dpath, include_path=True)\n",
|
||||||
|
"trainingDataset = trainingDataset.register(workspace=ws, name=dataset_name, description=\"dset\", exist_ok=True)\n",
|
||||||
|
"\n",
|
||||||
|
"trainingDataSnapshot = trainingDataset.create_snapshot(snapshot_name=snapshot_name, compute_target=None, create_data_snapshot=True)\n",
|
||||||
|
"datasets = [(Dataset.Scenario.TRAINING, trainingDataSnapshot)]\n",
|
||||||
|
"print(\"dataset registration done.\\n\")\n",
|
||||||
|
"datasets"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train and Save Model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import lightgbm as lgb\n",
|
||||||
|
"\n",
|
||||||
|
"train = lgb.Dataset(data=x_train, \n",
|
||||||
|
" label=y_train)\n",
|
||||||
|
"\n",
|
||||||
|
"test = lgb.Dataset(data=x_test, \n",
|
||||||
|
" label=y_test,\n",
|
||||||
|
" reference=train)\n",
|
||||||
|
"\n",
|
||||||
|
"params = {'learning_rate' : 0.1,\n",
|
||||||
|
" 'boosting' : 'gbdt',\n",
|
||||||
|
" 'metric' : 'rmse',\n",
|
||||||
|
" 'feature_fraction' : 1,\n",
|
||||||
|
" 'bagging_fraction' : 1,\n",
|
||||||
|
" 'max_depth': 6,\n",
|
||||||
|
" 'num_leaves' : 31,\n",
|
||||||
|
" 'objective' : 'regression',\n",
|
||||||
|
" 'bagging_freq' : 1,\n",
|
||||||
|
" \"verbose\": -1,\n",
|
||||||
|
" 'min_data_per_leaf': 100}\n",
|
||||||
|
"\n",
|
||||||
|
"model = lgb.train(params, \n",
|
||||||
|
" num_boost_round=500,\n",
|
||||||
|
" train_set=train,\n",
|
||||||
|
" valid_sets=[train, test],\n",
|
||||||
|
" verbose_eval=50,\n",
|
||||||
|
" early_stopping_rounds=25)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model_file = 'outputs/{}.pkl'.format(model_name)\n",
|
||||||
|
"\n",
|
||||||
|
"os.makedirs('outputs', exist_ok=True)\n",
|
||||||
|
"joblib.dump(model, model_file)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register Model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model = Model.register(model_path=model_file,\n",
|
||||||
|
" model_name=model_name,\n",
|
||||||
|
" workspace=ws,\n",
|
||||||
|
" datasets=datasets)\n",
|
||||||
|
"\n",
|
||||||
|
"print(model_name, image_name, service_name, model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Deploy Model To AKS"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Prepare Environment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
|
||||||
|
" pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
|
" f.write(myenv.serialize_to_string())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Image"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Image creation may take up to 15 minutes.\n",
|
||||||
|
"\n",
|
||||||
|
"image_name = image_name + str(model.version)\n",
|
||||||
|
"\n",
|
||||||
|
"if not image_name in ws.images:\n",
|
||||||
|
" # Use the score.py defined in this directory as the execution script\n",
|
||||||
|
" # NOTE: The Model Data Collector must be enabled in the execution script for DataDrift to run correctly\n",
|
||||||
|
" image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
|
||||||
|
" runtime=\"python\",\n",
|
||||||
|
" conda_file=\"myenv.yml\",\n",
|
||||||
|
" description=\"Image with weather dataset model\")\n",
|
||||||
|
" image = ContainerImage.create(name=image_name,\n",
|
||||||
|
" models=[model],\n",
|
||||||
|
" image_config=image_config,\n",
|
||||||
|
" workspace=ws)\n",
|
||||||
|
"\n",
|
||||||
|
" image.wait_for_creation(show_output=True)\n",
|
||||||
|
"else:\n",
|
||||||
|
" image = ws.images[image_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Compute Target"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"aks_name = 'dd-demo-e2e'\n",
|
||||||
|
"prov_config = AksCompute.provisioning_configuration()\n",
|
||||||
|
"\n",
|
||||||
|
"if not aks_name in ws.compute_targets:\n",
|
||||||
|
" aks_target = ComputeTarget.create(workspace=ws,\n",
|
||||||
|
" name=aks_name,\n",
|
||||||
|
" provisioning_configuration=prov_config)\n",
|
||||||
|
"\n",
|
||||||
|
" aks_target.wait_for_completion(show_output=True)\n",
|
||||||
|
" print(aks_target.provisioning_state)\n",
|
||||||
|
" print(aks_target.provisioning_errors)\n",
|
||||||
|
"else:\n",
|
||||||
|
" aks_target=ws.compute_targets[aks_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"aks_service_name = service_name\n",
|
||||||
|
"\n",
|
||||||
|
"if not aks_service_name in ws.webservices:\n",
|
||||||
|
" aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)\n",
|
||||||
|
" aks_service = Webservice.deploy_from_image(workspace=ws,\n",
|
||||||
|
" name=aks_service_name,\n",
|
||||||
|
" image=image,\n",
|
||||||
|
" deployment_config=aks_config,\n",
|
||||||
|
" deployment_target=aks_target)\n",
|
||||||
|
" aks_service.wait_for_deployment(show_output=True)\n",
|
||||||
|
" print(aks_service.state)\n",
|
||||||
|
"else:\n",
|
||||||
|
" aks_service = ws.webservices[aks_service_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Run DataDrift Analysis"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Send Scoring Data to Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Download Scoring Data"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Score Model on March 15, 2016 data\n",
|
||||||
|
"scoring_df = get_noaa_data(datetime(2016, 3, 15) - timedelta(days=7), datetime(2016, 3, 16), columns, usaf_list)\n",
|
||||||
|
"# Add the window feature column\n",
|
||||||
|
"scoring_df = add_window_col(scoring_df)\n",
|
||||||
|
"\n",
|
||||||
|
"# Drop features not used by the model\n",
|
||||||
|
"print(\"Dropping unnecessary columns\")\n",
|
||||||
|
"scoring_df = scoring_df.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna()\n",
|
||||||
|
"scoring_df.head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# One Hot Encode the scoring dataset to match the training dataset schema\n",
|
||||||
|
"columns_dict = model.datasets[\"training\"][0].get_profile().columns\n",
|
||||||
|
"extra_cols = ('Path', 'Column1')\n",
|
||||||
|
"for k in extra_cols:\n",
|
||||||
|
" columns_dict.pop(k, None)\n",
|
||||||
|
"training_columns = list(columns_dict.keys())\n",
|
||||||
|
"\n",
|
||||||
|
"categorical_columns = scoring_df.dtypes == object\n",
|
||||||
|
"categorical_columns = categorical_columns[categorical_columns == True]\n",
|
||||||
|
"\n",
|
||||||
|
"test_df = pd.get_dummies(scoring_df[categorical_columns.keys().tolist()])\n",
|
||||||
|
"encoded_df = scoring_df.join(test_df)\n",
|
||||||
|
"\n",
|
||||||
|
"# Populate missing OHE columns with 0 values to match traning dataset schema\n",
|
||||||
|
"difference = list(set(training_columns) - set(encoded_df.columns.tolist()))\n",
|
||||||
|
"for col in difference:\n",
|
||||||
|
" encoded_df[col] = 0\n",
|
||||||
|
"encoded_df.head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Serialize dataframe to list of row dictionaries\n",
|
||||||
|
"encoded_dict = encoded_df.to_dict('records')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Submit Scoring Data to Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%time\n",
|
||||||
|
"\n",
|
||||||
|
"# retreive the API keys. AML generates two keys.\n",
|
||||||
|
"key1, key2 = aks_service.get_keys()\n",
|
||||||
|
"\n",
|
||||||
|
"total_count = len(scoring_df)\n",
|
||||||
|
"i = 0\n",
|
||||||
|
"load = []\n",
|
||||||
|
"for row in encoded_dict:\n",
|
||||||
|
" load.append(row)\n",
|
||||||
|
" i = i + 1\n",
|
||||||
|
" if i % 100 == 0:\n",
|
||||||
|
" payload = json.dumps({\"data\": load})\n",
|
||||||
|
" \n",
|
||||||
|
" # construct raw HTTP request and send to the service\n",
|
||||||
|
" payload_binary = bytes(payload,encoding = 'utf8')\n",
|
||||||
|
" headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
|
||||||
|
" resp = requests.post(aks_service.scoring_uri, payload_binary, headers=headers)\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"prediction:\", resp.content, \"Progress: {}/{}\".format(i, total_count)) \n",
|
||||||
|
"\n",
|
||||||
|
" load = []\n",
|
||||||
|
" time.sleep(3)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Configure DataDrift"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"services = [service_name]\n",
|
||||||
|
"start = datetime.now() - timedelta(days=2)\n",
|
||||||
|
"end = datetime(year=2020, month=1, day=22, hour=15, minute=16)\n",
|
||||||
|
"feature_list = ['usaf', 'wban', 'latitude', 'longitude', 'station_name', 'p_k', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n",
|
||||||
|
"alert_config = AlertConfiguration([email_address]) if email_address else None\n",
|
||||||
|
"\n",
|
||||||
|
"# there will be an exception indicating using get() method if DataDrift object already exist\n",
|
||||||
|
"try:\n",
|
||||||
|
" datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", alert_config=alert_config)\n",
|
||||||
|
"except KeyError:\n",
|
||||||
|
" datadrift = DataDriftDetector.get(ws, model.name, model.version)\n",
|
||||||
|
" \n",
|
||||||
|
"print(\"Details of DataDrift Object:\\n{}\".format(datadrift))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Run an Adhoc DataDriftDetector Run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"target_date = datetime.today()\n",
|
||||||
|
"run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"exp = Experiment(ws, datadrift._id)\n",
|
||||||
|
"dd_run = Run(experiment=exp, run_id=run)\n",
|
||||||
|
"RunDetails(dd_run).show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Get Drift Analysis Results"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"children = list(dd_run.get_children())\n",
|
||||||
|
"for child in children:\n",
|
||||||
|
" child.wait_for_completion()\n",
|
||||||
|
"\n",
|
||||||
|
"drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
|
||||||
|
"drift_metrics"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Show all drift figures, one per serivice.\n",
|
||||||
|
"# If setting with_details is False (by default), only drift will be shown; if it's True, all details will be shown.\n",
|
||||||
|
"\n",
|
||||||
|
"drift_figures = datadrift.show(with_details=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Enable DataDrift Schedule"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"datadrift.enable_schedule()"
|
||||||
|
]
|
||||||
|
}
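With the schedule enabled, the detector runs at the configured frequency. If you later want to pause it, the package is assumed to expose the matching call shown below; treat this as a sketch rather than confirmed API.

```python
# Assumed counterpart of enable_schedule() in azureml-contrib-datadrift; verify against the package docs.
datadrift.disable_schedule()
```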
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "rafarmah"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
contrib/datadrift/score.py (new file, 58 lines)
@@ -0,0 +1,58 @@
|
|||||||
|
import pickle
|
||||||
|
import json
|
||||||
|
import numpy
|
||||||
|
import azureml.train.automl
|
||||||
|
from sklearn.externals import joblib
|
||||||
|
from sklearn.linear_model import Ridge
|
||||||
|
from azureml.core.model import Model
|
||||||
|
from azureml.core.run import Run
|
||||||
|
from azureml.monitoring import ModelDataCollector
|
||||||
|
import time
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
|
||||||
|
def init():
|
||||||
|
global model, inputs_dc, prediction_dc, feature_names, categorical_features
|
||||||
|
|
||||||
|
print("Model is initialized" + time.strftime("%H:%M:%S"))
|
||||||
|
model_path = Model.get_model_path(model_name="driftmodel")
|
||||||
|
model = joblib.load(model_path)
|
||||||
|
|
||||||
|
feature_names = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
|
||||||
|
"sine_weekofyear", "cosine_weekofyear", "sine_hourofday", "cosine_hourofday",
|
||||||
|
"temperature-7"]
|
||||||
|
|
||||||
|
categorical_features = ["usaf", "wban", "p_k", "station_name"]
|
||||||
|
|
||||||
|
inputs_dc = ModelDataCollector(model_name="driftmodel",
|
||||||
|
identifier="inputs",
|
||||||
|
feature_names=feature_names)
|
||||||
|
|
||||||
|
prediction_dc = ModelDataCollector("driftmodel",
|
||||||
|
identifier="predictions",
|
||||||
|
feature_names=["temperature"])
|
||||||
|
|
||||||
|
|
||||||
|
def run(raw_data):
|
||||||
|
global inputs_dc, prediction_dc
|
||||||
|
|
||||||
|
try:
|
||||||
|
data = json.loads(raw_data)["data"]
|
||||||
|
data = pd.DataFrame(data)
|
||||||
|
|
||||||
|
# Remove the categorical features as the model expects OHE values
|
||||||
|
input_data = data.drop(categorical_features, axis=1)
|
||||||
|
|
||||||
|
result = model.predict(input_data)
|
||||||
|
|
||||||
|
# Collect the non-OHE dataframe
|
||||||
|
collected_df = data[feature_names]
|
||||||
|
|
||||||
|
inputs_dc.collect(collected_df.values)
|
||||||
|
prediction_dc.collect(result)
|
||||||
|
return result.tolist()
|
||||||
|
except Exception as e:
|
||||||
|
error = str(e)
|
||||||
|
|
||||||
|
print(error + time.strftime("%H:%M:%S"))
|
||||||
|
return error
|
||||||
@@ -179,6 +179,26 @@ jupyter notebook
|
|||||||
- Simple example of using automated ML for classification with ONNX models
- Uses local compute for training
|
||||||
|
|
||||||
|
- [auto-ml-bank-marketing-subscribers-with-deployment.ipynb](bank-marketing-subscribers-with-deployment/auto-ml-bank-marketing-with-deployment.ipynb)
|
||||||
|
- Dataset: UCI's [bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
|
||||||
|
- Simple example of using automated ML for classification to predict term deposit subscriptions for a bank
|
||||||
|
- Uses azure compute for training
|
||||||
|
|
||||||
|
- [auto-ml-creditcard-with-deployment.ipynb](credit-card-fraud-detection-with-deployment/auto-ml-creditcard-with-deployment.ipynb)
|
||||||
|
- Dataset: Kaggle's [credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
|
||||||
|
- Simple example of using automated ML for classification to identify fraudulent credit card transactions
|
||||||
|
- Uses azure compute for training
|
||||||
|
|
||||||
|
- [auto-ml-hardware-performance-with-deployment.ipynb](hardware-performance-prediction-with-deployment/auto-ml-hardware-performance-with-deployment.ipynb)
|
||||||
|
- Dataset: UCI's [computer hardware dataset](https://archive.ics.uci.edu/ml/datasets/Computer+Hardware)
|
||||||
|
- Simple example of using automated ML for regression to predict the performance of certain combinations of hardware components
|
||||||
|
- Uses azure compute for training
|
||||||
|
|
||||||
|
- [auto-ml-concrete-strength-with-deployment.ipynb](predicting-concrete-strength-with-deployment/auto-ml-concrete-strength-with-deployment.ipynb)
|
||||||
|
- Dataset: UCI's [concrete compressive strength dataset](https://www.kaggle.com/pavanraj159/concrete-compressive-strength-data-set)
|
||||||
|
- Simple example of using automated ML for regression to predict the compressive strength of concrete based on different ingredient combinations and quantities of those ingredients
|
||||||
|
- Uses azure compute for training
|
||||||
|
|
||||||
<a name="documentation"></a>
|
<a name="documentation"></a>
|
||||||
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn how more about the the settings and features available for automated machine learning experiments.
|
See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn how more about the the settings and features available for automated machine learning experiments.
|
||||||
|
|
||||||
|
|||||||
@@ -9,6 +9,8 @@ IF "%automl_env_file%"=="" SET automl_env_file="automl_env.yml"
|
|||||||
|
|
||||||
IF NOT EXIST %automl_env_file% GOTO YmlMissing
|
IF NOT EXIST %automl_env_file% GOTO YmlMissing
|
||||||
|
|
||||||
|
IF "%CONDA_EXE%"=="" GOTO CondaMissing
|
||||||
|
|
||||||
call conda activate %conda_env_name% 2>nul:
|
call conda activate %conda_env_name% 2>nul:
|
||||||
|
|
||||||
if not errorlevel 1 (
|
if not errorlevel 1 (
|
||||||
@@ -42,6 +44,15 @@ IF NOT "%options%"=="nolaunch" (
|
|||||||
|
|
||||||
goto End
|
goto End
|
||||||
|
|
||||||
|
:CondaMissing
|
||||||
|
echo Please run this script from an Anaconda Prompt window.
|
||||||
|
echo You can start an Anaconda Prompt window by
|
||||||
|
echo typing Anaconda Prompt on the Start menu.
|
||||||
|
echo If you don't see the Anaconda Prompt app, install Miniconda.
|
||||||
|
echo If you are running an older version of Miniconda or Anaconda,
|
||||||
|
echo you can upgrade using the command: conda update conda
|
||||||
|
goto End
|
||||||
|
|
||||||
:YmlMissing
|
:YmlMissing
|
||||||
echo File %automl_env_file% not found.
|
echo File %automl_env_file% not found.
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,742 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Classification with Deployment using a Bank Marketing Dataset**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Results](#Results)\n",
|
||||||
|
"1. [Deploy](#Deploy)\n",
|
||||||
|
"1. [Test](#Test)\n",
|
||||||
|
"1. [Acknowledgements](#Acknowledgements)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"\n",
|
||||||
|
"In this example we use the UCI Bank Marketing dataset to showcase how you can use AutoML for a classification problem and deploy it to an Azure Container Instance (ACI). The classification goal is to predict if the client will subscribe to a term deposit with the bank.\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will learn how to:\n",
|
||||||
|
"1. Create an experiment using an existing workspace.\n",
|
||||||
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
|
"3. Train the model using local compute.\n",
|
||||||
|
"4. Explore the results.\n",
|
||||||
|
"5. Register the model.\n",
|
||||||
|
"6. Create a container image.\n",
|
||||||
|
"7. Create an Azure Container Instance (ACI) service.\n",
|
||||||
|
"8. Test the ACI service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"\n",
|
||||||
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"import logging\n",
|
||||||
|
"\n",
|
||||||
|
"from matplotlib import pyplot as plt\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import os\n",
|
||||||
|
"from sklearn import datasets\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
|
"from azureml.train.automl.run import AutoMLRun"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# choose a name for experiment\n",
|
||||||
|
"experiment_name = 'automl-classification-bmarketing'\n",
|
||||||
|
"# project folder\n",
|
||||||
|
"project_folder = './sample_projects/automl-classification-bankmarketing'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment=Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Project Directory'] = project_folder\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||||
|
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||||
|
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
" \n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster.\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
" \n",
|
||||||
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
" \n",
|
||||||
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here load the data in the get_data() script to be utilized in azure compute. To do this first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_Run_config."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"if not os.path.isdir('data'):\n",
|
||||||
|
" os.mkdir('data')\n",
|
||||||
|
" \n",
|
||||||
|
"if not os.path.exists(project_folder):\n",
|
||||||
|
" os.makedirs(project_folder)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to AmlCompute\n",
|
||||||
|
"conda_run_config.target = compute_target\n",
|
||||||
|
"conda_run_config.environment.docker.enabled = True\n",
|
||||||
|
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
||||||
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Load Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here we create the script to be run in azure comput for loading the data, we load the bank marketing dataset into X_train and y_train. Next X_train and y_train is returned for training the model."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile $project_folder/get_data.py\n",
|
||||||
|
"\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"\n",
|
||||||
|
"def _read_x_y(file_name, label_col):\n",
|
||||||
|
" df = pd.read_csv(file_name)\n",
|
||||||
|
" y = None\n",
|
||||||
|
" if label_col in df.columns:\n",
|
||||||
|
" y = df.pop(label_col)\n",
|
||||||
|
" y = y.values[:, None]\n",
|
||||||
|
" X = df.values\n",
|
||||||
|
" return X, y\n",
|
||||||
|
" \n",
|
||||||
|
"def get_data():\n",
|
||||||
|
" # Load the bank marketing datasets.\n",
|
||||||
|
" from sklearn.datasets import load_diabetes\n",
|
||||||
|
" from sklearn.model_selection import train_test_split\n",
|
||||||
|
"\n",
|
||||||
|
" X_train, y_train = _read_x_y('https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv', \"y\")\n",
|
||||||
|
"\n",
|
||||||
|
" columns = ['age','job','marital','education','default','housing','loan','contact','month','day_of_week','duration','campaign','pdays','previous','poutcome','emp.var.rate','cons.price.idx','cons.conf.idx','euribor3m','nr.employed','y']\n",
|
||||||
|
"\n",
|
||||||
|
" return { \"X\" : X_train, \"y\" : y_train[:,0] }"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train\n",
|
||||||
|
"\n",
|
||||||
|
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"|Property|Description|\n",
|
||||||
|
"|-|-|\n",
|
||||||
|
"|**task**|classification or regression|\n",
|
||||||
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||||
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
|
"\n",
|
||||||
|
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
|
" \"iterations\": 10,\n",
|
||||||
|
" \"n_cross_validations\": 2,\n",
|
||||||
|
" \"primary_metric\": 'AUC_weighted',\n",
|
||||||
|
" \"preprocess\": True,\n",
|
||||||
|
" \"max_concurrent_iterations\": 5,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
|
" debug_log = 'automl_errors.log',\n",
|
||||||
|
" path = project_folder,\n",
|
||||||
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" data_script = project_folder + \"/get_data.py\",\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"remote_run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Results"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Widget for Monitoring Runs\n",
|
||||||
|
"\n",
|
||||||
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
|
"\n",
|
||||||
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(remote_run).show() "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy\n",
|
||||||
|
"\n",
|
||||||
|
"### Retrieve the Best Model\n",
|
||||||
|
"\n",
|
||||||
|
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()"
|
||||||
|
]
|
||||||
|
},
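The markdown above mentions that `get_output` also accepts a metric or an iteration; a short sketch of those overloads (the metric name reuses the primary metric configured earlier in this notebook).

```python
# Retrieve the best run/model for a specific logged metric, or the model from a particular iteration.
best_run_auc, fitted_model_auc = remote_run.get_output(metric="AUC_weighted")
run_iter3, model_iter3 = remote_run.get_output(iteration=3)
```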
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Register the Fitted Model for Deployment\n",
|
||||||
|
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"description = 'AutoML Model trained on bank marketing data to predict if a client will subscribe to a term deposit'\n",
|
||||||
|
"tags = None\n",
|
||||||
|
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||||
|
"\n",
|
||||||
|
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
|
]
|
||||||
|
},
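As noted above, `register_model` can also pin a specific metric or iteration instead of taking the best primary-metric run; a hedged sketch of those variants.

```python
# Register the model chosen by a specific metric, or the model from a specific iteration.
model_by_metric = remote_run.register_model(metric="AUC_weighted", description=description, tags=tags)
model_by_iter = remote_run.register_model(iteration=3, description=description, tags=tags)
```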
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Scoring Script\n",
|
||||||
|
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile score.py\n",
|
||||||
|
"import pickle\n",
|
||||||
|
"import json\n",
|
||||||
|
"import numpy\n",
|
||||||
|
"import azureml.train.automl\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def init():\n",
|
||||||
|
" global model\n",
|
||||||
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||||
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
|
" model = joblib.load(model_path)\n",
|
||||||
|
"\n",
|
||||||
|
"def run(rawdata):\n",
|
||||||
|
" try:\n",
|
||||||
|
" data = json.loads(rawdata)['data']\n",
|
||||||
|
" data = numpy.array(data)\n",
|
||||||
|
" result = model.predict(data)\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" result = str(e)\n",
|
||||||
|
" return json.dumps({\"error\": result})\n",
|
||||||
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a YAML File for the Environment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
|
||||||
|
" pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||||
|
"\n",
|
||||||
|
"# Substitute the actual model id in the script file.\n",
|
||||||
|
"\n",
|
||||||
|
"script_file_name = 'score.py'\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a Container Image\n",
|
||||||
|
"\n",
|
||||||
|
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
|
||||||
|
"or when testing a model that is under development."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
|
" execution_script = script_file_name,\n",
|
||||||
|
" conda_file = conda_env_file_name,\n",
|
||||||
|
" tags = {'area': \"bmData\", 'type': \"automl_classification\"},\n",
|
||||||
|
" description = \"Image for automl classification sample\")\n",
|
||||||
|
"\n",
|
||||||
|
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||||
|
" # this is the model object \n",
|
||||||
|
" models = [model],\n",
|
||||||
|
" image_config = image_config, \n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"\n",
|
||||||
|
"image.wait_for_creation(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"if image.creation_state == 'Failed':\n",
|
||||||
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Deploy an image that contains the model and other assets needed by the service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
|
" memory_gb = 1, \n",
|
||||||
|
" tags = {'area': \"bmData\", 'type': \"automl_classification\"}, \n",
|
||||||
|
" description = 'sample service for Automl Classification')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-sample-bankmarketing'\n",
|
||||||
|
"print(aci_service_name)\n",
|
||||||
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
|
" image = image,\n",
|
||||||
|
" name = aci_service_name,\n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Delete a Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Deletes the specified web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#aci_service.delete()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Gets logs from a deployed web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#aci_service.get_logs()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Test\n",
|
||||||
|
"\n",
|
||||||
|
"Now that the model is trained split our data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"def _read_x_y(file_name, label_col):\n",
|
||||||
|
" df = pd.read_csv(file_name)\n",
|
||||||
|
" y = None\n",
|
||||||
|
" if label_col in df.columns:\n",
|
||||||
|
" y = df.pop(label_col)\n",
|
||||||
|
" y = y.values[:, None]\n",
|
||||||
|
" X = df.values\n",
|
||||||
|
" return X, y"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Load the bank marketing datasets.\n",
|
||||||
|
"from sklearn.datasets import load_diabetes\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"from numpy import array\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"X_test, y_test = _read_x_y('https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv',\"y\")\n",
|
||||||
|
"\n",
|
||||||
|
"columns = ['age','job','marital','education','default','housing','loan','contact','month','day_of_week','duration','campaign','pdays','previous','poutcome','emp.var.rate','cons.price.idx','cons.conf.idx','euribor3m','nr.employed','y']"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"y_pred = fitted_model.predict(X_test)\n",
|
||||||
|
"actual = array(y_test.tolist())\n",
|
||||||
|
"print(y_pred.shape, \" \", actual[:,0].shape)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Calculate metrics for the prediction\n",
|
||||||
|
"\n",
|
||||||
|
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
||||||
|
"from the trained model that was returned."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"y_test = y_test[:,0]# Plot outputs\n",
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"test_pred = plt.scatter(y_test, y_pred, color='b')\n",
|
||||||
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.show()"
|
||||||
|
]
|
||||||
|
},
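{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small, optional addition to the plot above, the cell below summarizes the same predictions numerically with scikit-learn."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import accuracy_score, confusion_matrix\n",
"\n",
"# Summarize the local predictions with simple classification metrics.\n",
"print('Accuracy:', accuracy_score(y_test, y_pred))\n",
"print('Confusion matrix:')\n",
"print(confusion_matrix(y_test, y_pred))"
]
},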
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Acknowledgements"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"This Bank Marketing dataset is made available under the Creative Commons (CCO: Public Domain) License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: https://creativecommons.org/publicdomain/zero/1.0/ and is available at: https://www.kaggle.com/janiobachmann/bank-marketing-dataset .\n",
|
||||||
|
"\n",
|
||||||
|
"_**Acknowledgements**_\n",
|
||||||
|
"This data set is originally available within the UCI Machine Learning Database: https://archive.ics.uci.edu/ml/datasets/bank+marketing\n",
|
||||||
|
"\n",
|
||||||
|
"[Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "v-rasav"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.7"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,718 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Classification with Deployment using Credit Card Dataset**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Results](#Results)\n",
|
||||||
|
"1. [Deploy](#Deploy)\n",
|
||||||
|
"1. [Test](#Test)\n",
|
||||||
|
"1. [Acknowledgements](#Acknowledgements)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"\n",
|
||||||
|
"In this example we use the associated credit card dataset to showcase how you can use AutoML for a simple classification problem and deploy it to an Azure Container Instance (ACI). The classification goal is to predict if a creditcard transaction is or is not considered a fraudulent charge.\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will learn how to:\n",
|
||||||
|
"1. Create an experiment using an existing workspace.\n",
|
||||||
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
|
"3. Train the model using local compute.\n",
|
||||||
|
"4. Explore the results.\n",
|
||||||
|
"5. Register the model.\n",
|
||||||
|
"6. Create a container image.\n",
|
||||||
|
"7. Create an Azure Container Instance (ACI) service.\n",
|
||||||
|
"8. Test the ACI service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"\n",
|
||||||
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import logging\n",
|
||||||
|
"\n",
|
||||||
|
"from matplotlib import pyplot as plt\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import os\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
|
"from azureml.train.automl.run import AutoMLRun"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# choose a name for experiment\n",
|
||||||
|
"experiment_name = 'automl-classification-ccard'\n",
|
||||||
|
"# project folder\n",
|
||||||
|
"project_folder = './sample_projects/automl-classification-creditcard'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment=Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Project Directory'] = project_folder\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||||
|
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||||
|
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
" \n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster.\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
" \n",
|
||||||
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
" \n",
|
||||||
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"if not os.path.isdir('data'):\n",
|
||||||
|
" os.mkdir('data')\n",
|
||||||
|
" \n",
|
||||||
|
"if not os.path.exists(project_folder):\n",
|
||||||
|
" os.makedirs(project_folder)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to AmlCompute\n",
|
||||||
|
"conda_run_config.target = compute_target\n",
|
||||||
|
"conda_run_config.environment.docker.enabled = True\n",
|
||||||
|
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy','py-xgboost<=0.80'])\n",
|
||||||
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Load Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here create the script to be run in azure compute for loading the data, load the credit card dataset into cards and store the Class column (y) in the y variable and store the remaining data in the x variable. Next split the data using train_test_split and return X_train and y_train for training the model."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile $project_folder/get_data.py\n",
|
||||||
|
"\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"\n",
|
||||||
|
" \n",
|
||||||
|
"def get_data():\n",
|
||||||
|
" cards = pd.read_csv(\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\")\n",
|
||||||
|
" y = cards.Class\n",
|
||||||
|
" x = cards.drop('Class', axis=1)\n",
|
||||||
|
" X_train, X_test, y_train, y_test = train_test_split(x,y,test_size=0.2, random_state=1)\n",
|
||||||
|
" \n",
|
||||||
|
" return { \"X\" : X_train, \"y\" : y_train.values}"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train\n",
|
||||||
|
"\n",
|
||||||
|
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"|Property|Description|\n",
|
||||||
|
"|-|-|\n",
|
||||||
|
"|**task**|classification or regression|\n",
|
||||||
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
||||||
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n",
|
||||||
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
|
"\n",
|
||||||
|
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
|
" \"iterations\": 10,\n",
|
||||||
|
" \"n_cross_validations\": 2,\n",
|
||||||
|
" \"primary_metric\": 'average_precision_score_weighted',\n",
|
||||||
|
" \"preprocess\": True,\n",
|
||||||
|
" \"max_concurrent_iterations\": 5,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
||||||
|
" debug_log = 'automl_errors_20190417.log',\n",
|
||||||
|
" path = project_folder,\n",
|
||||||
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" data_script = project_folder + \"/get_data.py\",\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
||||||
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"remote_run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Results"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Widget for Monitoring Runs\n",
|
||||||
|
"\n",
|
||||||
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
|
"\n",
|
||||||
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(remote_run).show() "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy\n",
|
||||||
|
"\n",
|
||||||
|
"### Retrieve the Best Model\n",
|
||||||
|
"\n",
|
||||||
|
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()"
|
||||||
|
]
|
||||||
|
},
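{
"cell_type": "markdown",
"metadata": {},
"source": [
"The commented calls below are a minimal sketch of those `get_output` overloads; the metric name and iteration number are placeholders - substitute values from your own run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: retrieve the best run and fitted model by a specific metric or iteration.\n",
"# best_run, fitted_model = remote_run.get_output(metric = 'AUC_weighted')\n",
"# best_run, fitted_model = remote_run.get_output(iteration = 3)"
]
},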
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Register the Fitted Model for Deployment\n",
|
||||||
|
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"description = 'AutoML Model'\n",
|
||||||
|
"tags = None\n",
|
||||||
|
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||||
|
"\n",
|
||||||
|
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Scoring Script\n",
|
||||||
|
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile score.py\n",
|
||||||
|
"import pickle\n",
|
||||||
|
"import json\n",
|
||||||
|
"import numpy\n",
|
||||||
|
"import azureml.train.automl\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"def init():\n",
|
||||||
|
" global model\n",
|
||||||
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||||
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
|
" model = joblib.load(model_path)\n",
|
||||||
|
"\n",
|
||||||
|
"def run(rawdata):\n",
|
||||||
|
" try:\n",
|
||||||
|
" data = json.loads(rawdata)['data']\n",
|
||||||
|
" data = numpy.array(data)\n",
|
||||||
|
" result = model.predict(data)\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" result = str(e)\n",
|
||||||
|
" return json.dumps({\"error\": result})\n",
|
||||||
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a YAML File for the Environment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
|
||||||
|
" pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||||
|
"\n",
|
||||||
|
"# Substitute the actual model id in the script file.\n",
|
||||||
|
"\n",
|
||||||
|
"script_file_name = 'score.py'\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a Container Image\n",
|
||||||
|
"\n",
|
||||||
|
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
|
||||||
|
"or when testing a model that is under development."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
|
" execution_script = script_file_name,\n",
|
||||||
|
" conda_file = conda_env_file_name,\n",
|
||||||
|
" tags = {'area': \"cards\", 'type': \"automl_classification\"},\n",
|
||||||
|
" description = \"Image for automl classification sample\")\n",
|
||||||
|
"\n",
|
||||||
|
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||||
|
" # this is the model object \n",
|
||||||
|
" models = [model],\n",
|
||||||
|
" image_config = image_config, \n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"\n",
|
||||||
|
"image.wait_for_creation(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"if image.creation_state == 'Failed':\n",
|
||||||
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Deploy an image that contains the model and other assets needed by the service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
|
" memory_gb = 1, \n",
|
||||||
|
" tags = {'area': \"cards\", 'type': \"automl_classification\"}, \n",
|
||||||
|
" description = 'sample service for Automl Classification')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-sample-creditcard'\n",
|
||||||
|
"print(aci_service_name)\n",
|
||||||
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
|
" image = image,\n",
|
||||||
|
" name = aci_service_name,\n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Delete a Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Deletes the specified web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#aci_service.delete()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Gets logs from a deployed web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#aci_service.get_logs()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Test\n",
|
||||||
|
"\n",
|
||||||
|
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"cards = pd.read_csv(\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\")\n",
|
||||||
|
"print(cards.head())\n",
|
||||||
|
"y = cards.Class\n",
|
||||||
|
"x = cards.drop('Class', axis=1)\n",
|
||||||
|
"X_train, X_test, y_train, y_test = train_test_split(x,y,test_size=0.2, random_state=1)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"y_pred = fitted_model.predict(X_test)\n",
|
||||||
|
"y_pred"
|
||||||
|
]
|
||||||
|
},
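{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above exercises the fitted model locally. As a minimal sketch, assuming the ACI service created earlier is still running, you can also send a few test rows to the deployed web service; `aci_service.run` posts the JSON payload that the `run` function in `score.py` expects."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Send a small sample of the test data to the deployed ACI service.\n",
"# The payload shape matches what score.py expects: {\"data\": [[...], ...]}.\n",
"sample = json.dumps({'data': X_test[:2].values.tolist()})\n",
"response = aci_service.run(input_data = sample)\n",
"print(response)"
]
},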
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Calculate metrics for the prediction\n",
|
||||||
|
"\n",
|
||||||
|
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
||||||
|
"from the trained model that was returned."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#Randomly select and test\n",
|
||||||
|
"# Plot outputs\n",
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"test_pred = plt.scatter(y_test, y_pred, color='b')\n",
|
||||||
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.show()\n",
|
||||||
|
"\n"
|
||||||
|
]
|
||||||
|
},
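{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small, optional addition to the plot above, the cell below summarizes the local predictions with a few standard scikit-learn classification metrics, which are more informative than accuracy for this highly imbalanced fraud dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix, precision_score, recall_score\n",
"\n",
"# Summarize the local predictions with standard classification metrics.\n",
"print('Confusion matrix:')\n",
"print(confusion_matrix(y_test, y_pred))\n",
"print('Precision:', precision_score(y_test, y_pred))\n",
"print('Recall:   ', recall_score(y_test, y_pred))"
]
},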
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Acknowledgements"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"This Credit Card fraud Detection dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/ and is available at: https://www.kaggle.com/mlg-ulb/creditcardfraud\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Universit\u00c3\u00a9 Libre de Bruxelles) on big data mining and fraud detection. More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project\n",
|
||||||
|
"Please cite the following works: \n",
|
||||||
|
"\u00e2\u20ac\u00a2\tAndrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015\n",
|
||||||
|
"\u00e2\u20ac\u00a2\tDal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon\n",
|
||||||
|
"\u00e2\u20ac\u00a2\tDal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE\n",
|
||||||
|
"o\tDal Pozzolo, Andrea Adaptive Machine learning for credit card fraud detection ULB MLG PhD thesis (supervised by G. Bontempi)\n",
|
||||||
|
"\u00e2\u20ac\u00a2\tCarcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-A\u00c3\u00abl; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier\n",
|
||||||
|
"\u00e2\u20ac\u00a2\tCarcillo, Fabrizio; Le Borgne, Yann-A\u00c3\u00abl; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "v-rasav"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.7"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -129,6 +129,22 @@
|
|||||||
" test_size=0.2, \n",
|
" test_size=0.2, \n",
|
||||||
" random_state=0)\n",
|
" random_state=0)\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Ensure the x_train and x_test are pandas DataFrame."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
"# Convert the X_train and X_test to pandas DataFrame and set column names,\n",
|
"# Convert the X_train and X_test to pandas DataFrame and set column names,\n",
|
||||||
"# This is needed for initializing the input variable names of ONNX model, \n",
|
"# This is needed for initializing the input variable names of ONNX model, \n",
|
||||||
"# and the prediction with the ONNX model using the inference helper.\n",
|
"# and the prediction with the ONNX model using the inference helper.\n",
|
||||||
@@ -158,6 +174,13 @@
|
|||||||
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Set the preprocess=True, currently the InferenceHelper only supports this mode."
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -299,7 +322,7 @@
|
|||||||
" onnxrt_present = False\n",
|
" onnxrt_present = False\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def get_onnx_res(run):\n",
|
"def get_onnx_res(run):\n",
|
||||||
" res_path = '_debug_y_trans_converter.json'\n",
|
" res_path = 'onnx_resource.json'\n",
|
||||||
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
|
" run.download_file(name=constants.MODEL_RESOURCE_PATH_ONNX, output_file_path=res_path)\n",
|
||||||
" with open(res_path) as f:\n",
|
" with open(res_path) as f:\n",
|
||||||
" onnx_res = json.load(f)\n",
|
" onnx_res = json.load(f)\n",
|
||||||
@@ -316,7 +339,7 @@
|
|||||||
" print(pred_prob_onnx)\n",
|
" print(pred_prob_onnx)\n",
|
||||||
"else:\n",
|
"else:\n",
|
||||||
" if not python_version_compatible:\n",
|
" if not python_version_compatible:\n",
|
||||||
" print('Please use Python version 3.6 to run the inference helper.') \n",
|
" print('Please use Python version 3.6 or 3.7 to run the inference helper.') \n",
|
||||||
" if not onnxrt_present:\n",
|
" if not onnxrt_present:\n",
|
||||||
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
" print('Please install the onnxruntime package to do the prediction with ONNX model.')"
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -21,7 +21,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Automated Machine Learning\n",
|
"# Automated Machine Learning\n",
|
||||||
"_**Prepare Data using `azureml.dataprep` for Remote Execution (DSVM)**_\n",
|
"_**Prepare Data using `azureml.dataprep` for Remote Execution (AmlCompute)**_\n",
|
||||||
"\n",
|
"\n",
|
||||||
"## Contents\n",
|
"## Contents\n",
|
||||||
"1. [Introduction](#Introduction)\n",
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
|||||||
@@ -72,7 +72,6 @@
|
|||||||
"# Squash warning messages for cleaner output in the notebook\n",
|
"# Squash warning messages for cleaner output in the notebook\n",
|
||||||
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
||||||
"\n",
|
"\n",
|
||||||
"\n",
|
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
"from azureml.train.automl import AutoMLConfig\n",
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
|
|||||||
@@ -65,10 +65,6 @@
|
|||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import warnings\n",
|
|
||||||
"# Squash warning messages for cleaner output in the notebook\n",
|
|
||||||
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
|||||||
@@ -67,10 +67,6 @@
|
|||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"import logging\n",
|
"import logging\n",
|
||||||
"import warnings\n",
|
|
||||||
"# Squash warning messages for cleaner output in the notebook\n",
|
|
||||||
"warnings.showwarning = lambda *args, **kwargs: None\n",
|
|
||||||
"\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"from azureml.core.workspace import Workspace\n",
|
"from azureml.core.workspace import Workspace\n",
|
||||||
"from azureml.core.experiment import Experiment\n",
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
|||||||
@@ -0,0 +1,812 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Regression with Deployment using Hardware Performance Dataset**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Data](#Data)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Results](#Results)\n",
|
||||||
|
"1. [Test](#Test)\n",
|
||||||
|
"1. [Acknowledgements](#Acknowledgements)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"In this example we use the Predicting Compressive Strength of Concrete Dataset to showcase how you can use AutoML for a regression problem. The regression goal is to predict the compressive strength of concrete based off of different ingredient combinations and the quantities of those ingredients.\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will learn how to:\n",
|
||||||
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
|
"3. Train the model using local compute.\n",
|
||||||
|
"4. Explore the results.\n",
|
||||||
|
"5. Test the best fitted model."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import logging\n",
|
||||||
|
"\n",
|
||||||
|
"from matplotlib import pyplot as plt\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import os\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
" \n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
|
"experiment_name = 'automl-regression-concrete'\n",
|
||||||
|
"project_folder = './sample_projects/automl-regression-concrete'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace Name'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Project Directory'] = project_folder\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||||
|
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||||
|
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
" \n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster.\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
" \n",
|
||||||
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
" \n",
|
||||||
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"if not os.path.isdir('data'):\n",
|
||||||
|
" os.mkdir('data')\n",
|
||||||
|
" \n",
|
||||||
|
"if not os.path.exists(project_folder):\n",
|
||||||
|
" os.makedirs(project_folder)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to AmlCompute\n",
|
||||||
|
"conda_run_config.target = compute_target\n",
|
||||||
|
"conda_run_config.environment.docker.enabled = True\n",
|
||||||
|
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
|
||||||
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Load Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here create the script to be run in azure compute for loading the data, load the concrete strength dataset into the X and y variables. Next, split the data using train_test_split and return X_train and y_train for training the model. Finally, return X_train and y_train for training the model."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile $project_folder/get_data.py\n",
|
||||||
|
"\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"\n",
|
||||||
|
"def _read_x_y(file_name, label_col):\n",
|
||||||
|
" df = pd.read_csv(file_name)\n",
|
||||||
|
" y = None\n",
|
||||||
|
" if label_col in df.columns:\n",
|
||||||
|
" y = df.pop(label_col)\n",
|
||||||
|
" y = y.values[:, None]\n",
|
||||||
|
" X = df.values\n",
|
||||||
|
" return X, y\n",
|
||||||
|
" \n",
|
||||||
|
"def get_data():\n",
|
||||||
|
" X,y = _read_x_y(\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\",\"CONCRETE\")\n",
|
||||||
|
" X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)\n",
|
||||||
|
" \n",
|
||||||
|
" return { \"X\" : X_train, \"y\" : y_train[:,0] }"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train\n",
|
||||||
|
"\n",
|
||||||
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"|Property|Description|\n",
|
||||||
|
"|-|-|\n",
|
||||||
|
"|**task**|classification or regression|\n",
|
||||||
|
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
|
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
|
||||||
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
|
"\n",
|
||||||
|
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
|
" \"iterations\": 10,\n",
|
||||||
|
" \"n_cross_validations\": 5,\n",
|
||||||
|
" \"primary_metric\": 'spearman_correlation',\n",
|
||||||
|
" \"preprocess\": True,\n",
|
||||||
|
" \"max_concurrent_iterations\": 5,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'regression',\n",
|
||||||
|
" debug_log = 'automl.log',\n",
|
||||||
|
" path = project_folder,\n",
|
||||||
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" data_script = project_folder + \"/get_data.py\",\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config, show_output = True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"remote_run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Results\n",
|
||||||
|
"Widget for Monitoring Runs\n",
|
||||||
|
"The widget will first report a \u00e2\u20ac\u0153loading status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
|
"Note: The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(remote_run).show() "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"\n",
|
||||||
|
"Retrieve All Child Runs\n",
|
||||||
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"children = list(remote_run.get_children())\n",
|
||||||
|
"metricslist = {}\n",
|
||||||
|
"for run in children:\n",
|
||||||
|
" properties = run.get_properties()\n",
|
||||||
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||||
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||||
|
"\n",
|
||||||
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
|
"rundata"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Retrieve the Best Model\n",
|
||||||
|
"Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()\n",
|
||||||
|
"print(best_run)\n",
|
||||||
|
"print(fitted_model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Best Model Based on Any Other Metric\n",
|
||||||
|
"Show the run and the model that has the smallest root_mean_squared_error value (which turned out to be the same as the one with largest spearman_correlation value):"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"lookup_metric = \"root_mean_squared_error\"\n",
|
||||||
|
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
||||||
|
"print(best_run)\n",
|
||||||
|
"print(fitted_model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"iteration = 3\n",
|
||||||
|
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
|
||||||
|
"print(third_run)\n",
|
||||||
|
"print(third_model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register the Fitted Model for Deployment\n",
|
||||||
|
"If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"description = 'AutoML Model'\n",
|
||||||
|
"tags = None\n",
|
||||||
|
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||||
|
"\n",
|
||||||
|
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Scoring Script\n",
|
||||||
|
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile score.py\n",
|
||||||
|
"import pickle\n",
|
||||||
|
"import json\n",
|
||||||
|
"import numpy\n",
|
||||||
|
"import azureml.train.automl\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"def init():\n",
|
||||||
|
" global model\n",
|
||||||
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||||
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
|
" model = joblib.load(model_path)\n",
|
||||||
|
"\n",
|
||||||
|
"def run(rawdata):\n",
|
||||||
|
" try:\n",
|
||||||
|
" data = json.loads(rawdata)['data']\n",
|
||||||
|
" data = numpy.array(data)\n",
|
||||||
|
" result = model.predict(data)\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" result = str(e)\n",
|
||||||
|
" return json.dumps({\"error\": result})\n",
|
||||||
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a YAML File for the Environment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||||
|
"\n",
|
||||||
|
"# Substitute the actual model id in the script file.\n",
|
||||||
|
"\n",
|
||||||
|
"script_file_name = 'score.py'\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a Container Image\n",
|
||||||
|
"\n",
|
||||||
|
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
|
||||||
|
"or when testing a model that is under development."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
|
" execution_script = script_file_name,\n",
|
||||||
|
" conda_file = conda_env_file_name,\n",
|
||||||
|
" tags = {'area': \"digits\", 'type': \"automl_regression\"},\n",
|
||||||
|
" description = \"Image for automl regression sample\")\n",
|
||||||
|
"\n",
|
||||||
|
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||||
|
" # this is the model object \n",
|
||||||
|
" models = [model],\n",
|
||||||
|
" image_config = image_config, \n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"\n",
|
||||||
|
"image.wait_for_creation(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"if image.creation_state == 'Failed':\n",
|
||||||
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Deploy an image that contains the model and other assets needed by the service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
|
" memory_gb = 1, \n",
|
||||||
|
" tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
|
||||||
|
" description = 'sample service for Automl Regression')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-sample-concrete'\n",
|
||||||
|
"print(aci_service_name)\n",
|
||||||
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
|
" image = image,\n",
|
||||||
|
" name = aci_service_name,\n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Delete a Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Deletes the specified web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#aci_service.delete()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Gets logs from a deployed web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#aci_service.get_logs()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Test\n",
|
||||||
|
"\n",
|
||||||
|
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"def _read_x_y(file_name, label_col):\n",
|
||||||
|
" df = pd.read_csv(file_name)\n",
|
||||||
|
" y = None\n",
|
||||||
|
" if label_col in df.columns:\n",
|
||||||
|
" y = df.pop(label_col)\n",
|
||||||
|
" y = y.values[:, None]\n",
|
||||||
|
" X = df.values\n",
|
||||||
|
" return X, y"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### Predict on training and test set, and calculate residual values."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"X,y = _read_x_y(\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/compresive_strength_concrete.csv\",\"CONCRETE\")\n",
|
||||||
|
"X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)\n",
|
||||||
|
"\n",
|
||||||
|
"y_pred_train = fitted_model.predict(X_train)\n",
|
||||||
|
"y_residual_train = y_train - y_pred_train\n",
|
||||||
|
"\n",
|
||||||
|
"y_pred_test = fitted_model.predict(X_test)\n",
|
||||||
|
"y_residual_test = y_test - y_pred_test\n",
|
||||||
|
"\n",
|
||||||
|
"y_residual_train.shape"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%matplotlib inline\n",
|
||||||
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
|
"\n",
|
||||||
|
"# Set up a multi-plot chart.\n",
|
||||||
|
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
||||||
|
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
||||||
|
"f.set_figheight(6)\n",
|
||||||
|
"f.set_figwidth(16)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot residual values of training set.\n",
|
||||||
|
"a0.axis([0, 360, -200, 200])\n",
|
||||||
|
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
||||||
|
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
|
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
||||||
|
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)), fontsize = 12)\n",
|
||||||
|
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
||||||
|
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot a histogram.\n",
|
||||||
|
"#a0.hist(y_residual_train, orientation = 'horizontal', color = ['b']*len(y_residual_train), bins = 10, histtype = 'step')\n",
|
||||||
|
"#a0.hist(y_residual_train, orientation = 'horizontal', color = ['b']*len(y_residual_train), alpha = 0.2, bins = 10)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot residual values of test set.\n",
|
||||||
|
"a1.axis([0, 90, -200, 200])\n",
|
||||||
|
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
||||||
|
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
|
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
||||||
|
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)), fontsize = 12)\n",
|
||||||
|
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
||||||
|
"a1.set_yticklabels([])\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot a histogram.\n",
|
||||||
|
"#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), bins = 10, histtype = 'step')\n",
|
||||||
|
"#a1.hist(y_residual_test, orientation = 'horizontal', color = ['b']*len(y_residual_test), alpha = 0.2, bins = 10)\n",
|
||||||
|
"\n",
|
||||||
|
"plt.show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Calculate metrics for the prediction\n",
|
||||||
|
"\n",
|
||||||
|
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
||||||
|
"from the trained model that was returned."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Plot outputs\n",
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"test_pred = plt.scatter(y_test, y_pred_test, color='b')\n",
|
||||||
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Acknowledgements\n",
|
||||||
|
"\n",
|
||||||
|
"This Predicting Compressive Strength of Concrete Dataset is made available under the CC0 1.0 Universal (CC0 1.0)\n",
|
||||||
|
"Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0)\n",
|
||||||
|
"Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/pavanraj159/concrete-compressive-strength-data-set and http://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength\n",
|
||||||
|
"\n",
|
||||||
|
"I-Cheng Yeh, \"Modeling of strength of high performance concrete using artificial neural networks,\" Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998). \n",
|
||||||
|
"\n",
|
||||||
|
"Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science."
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "v-rasav"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.7.1"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,823 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Automated Machine Learning\n",
|
||||||
|
"_**Regression with Deployment using Hardware Performance Dataset**_\n",
|
||||||
|
"\n",
|
||||||
|
"## Contents\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
"1. [Data](#Data)\n",
|
||||||
|
"1. [Train](#Train)\n",
|
||||||
|
"1. [Results](#Results)\n",
|
||||||
|
"1. [Test](#Test)\n",
|
||||||
|
"1. [Acknowledgements](#Acknowledgements)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Introduction\n",
|
||||||
|
"In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. The Regression goal is to predict the performance of certain combinations of hardware parts.\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will learn how to:\n",
|
||||||
|
"1. Create an `Experiment` in an existing `Workspace`.\n",
|
||||||
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
||||||
|
"3. Train the model using local compute.\n",
|
||||||
|
"4. Explore the results.\n",
|
||||||
|
"5. Test the best fitted model."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"As part of the setup you have already created an Azure ML Workspace object. For AutoML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import logging\n",
|
||||||
|
"\n",
|
||||||
|
"from matplotlib import pyplot as plt\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import os\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
" \n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for the experiment and specify the project folder.\n",
|
||||||
|
"experiment_name = 'automl-regression-hardware'\n",
|
||||||
|
"project_folder = './sample_projects/automl-remote-regression'\n",
|
||||||
|
"\n",
|
||||||
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
|
"\n",
|
||||||
|
"output = {}\n",
|
||||||
|
"output['SDK version'] = azureml.core.VERSION\n",
|
||||||
|
"output['Subscription ID'] = ws.subscription_id\n",
|
||||||
|
"output['Workspace Name'] = ws.name\n",
|
||||||
|
"output['Resource Group'] = ws.resource_group\n",
|
||||||
|
"output['Location'] = ws.location\n",
|
||||||
|
"output['Project Directory'] = project_folder\n",
|
||||||
|
"output['Experiment Name'] = experiment.name\n",
|
||||||
|
"pd.set_option('display.max_colwidth', -1)\n",
|
||||||
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
||||||
|
"outputDf.T"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
|
||||||
|
"#### Creation of AmlCompute takes approximately 5 minutes. \n",
|
||||||
|
"If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.core.compute import ComputeTarget\n",
|
||||||
|
"\n",
|
||||||
|
"# Choose a name for your cluster.\n",
|
||||||
|
"amlcompute_cluster_name = \"automlcl\"\n",
|
||||||
|
"\n",
|
||||||
|
"found = False\n",
|
||||||
|
"# Check if this compute target already exists in the workspace.\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
|
||||||
|
" found = True\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
" compute_target = cts[amlcompute_cluster_name]\n",
|
||||||
|
" \n",
|
||||||
|
"if not found:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
|
||||||
|
" #vm_priority = 'lowpriority', # optional\n",
|
||||||
|
" max_nodes = 6)\n",
|
||||||
|
"\n",
|
||||||
|
" # Create the cluster.\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
|
||||||
|
" \n",
|
||||||
|
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
|
||||||
|
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
|
||||||
|
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
|
||||||
|
" \n",
|
||||||
|
" # For a more detailed view of current AmlCompute status, use get_status()."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here load the data in the get_data script to be utilized in azure compute. To do this, first load all the necessary libraries and dependencies to set up paths for the data and to create the conda_run_config."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"if not os.path.isdir('data'):\n",
|
||||||
|
" os.mkdir('data')\n",
|
||||||
|
" \n",
|
||||||
|
"if not os.path.exists(project_folder):\n",
|
||||||
|
" os.makedirs(project_folder)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import RunConfiguration\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"# create a new RunConfig object\n",
|
||||||
|
"conda_run_config = RunConfiguration(framework=\"python\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Set compute target to AmlCompute\n",
|
||||||
|
"conda_run_config.target = compute_target\n",
|
||||||
|
"conda_run_config.environment.docker.enabled = True\n",
|
||||||
|
"conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], conda_packages=['numpy'])\n",
|
||||||
|
"conda_run_config.environment.python.conda_dependencies = cd"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Load Data\n",
|
||||||
|
"\n",
|
||||||
|
"Here create the script to be run in azure compute for loading the data, load the hardware dataset into the X and y variables. Next split the data using train_test_split and return X_train and y_train for training the model."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile $project_folder/get_data.py\n",
|
||||||
|
"\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
|
"\n",
|
||||||
|
"def _read_x_y(file_name, label_col):\n",
|
||||||
|
" df = pd.read_csv(file_name)\n",
|
||||||
|
" y = None\n",
|
||||||
|
" if label_col in df.columns:\n",
|
||||||
|
" y = df.pop(label_col)\n",
|
||||||
|
" y = y.values[:, None]\n",
|
||||||
|
" X = df.values\n",
|
||||||
|
" return X, y\n",
|
||||||
|
" \n",
|
||||||
|
"def get_data():\n",
|
||||||
|
" X,y = _read_x_y(\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\",\"ERP\")\n",
|
||||||
|
" X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)\n",
|
||||||
|
" \n",
|
||||||
|
" return { \"X\" : X_train, \"y\" : y_train[:,0] }"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"\n",
|
||||||
|
"## Train\n",
|
||||||
|
"\n",
|
||||||
|
"Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
|
||||||
|
"\n",
|
||||||
|
"|Property|Description|\n",
|
||||||
|
"|-|-|\n",
|
||||||
|
"|**task**|classification or regression|\n",
|
||||||
|
"|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
|
||||||
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
||||||
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
||||||
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||||
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
||||||
|
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
|
||||||
|
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n",
|
||||||
|
"\n",
|
||||||
|
"**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### If you would like to see even better results increase \"iteration_time_out minutes\" to 10+ mins and increase \"iterations\" to a minimum of 30"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
" \"iteration_timeout_minutes\": 5,\n",
|
||||||
|
" \"iterations\": 10,\n",
|
||||||
|
" \"n_cross_validations\": 5,\n",
|
||||||
|
" \"primary_metric\": 'spearman_correlation',\n",
|
||||||
|
" \"preprocess\": True,\n",
|
||||||
|
" \"max_concurrent_iterations\": 5,\n",
|
||||||
|
" \"verbosity\": logging.INFO,\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(task = 'regression',\n",
|
||||||
|
" debug_log = 'automl_errors_20190417.log',\n",
|
||||||
|
" path = project_folder,\n",
|
||||||
|
" run_configuration=conda_run_config,\n",
|
||||||
|
" data_script = project_folder + \"/get_data.py\",\n",
|
||||||
|
" **automl_settings\n",
|
||||||
|
" )"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config, show_output = False)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"remote_run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Results"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Widget for Monitoring Runs\n",
|
||||||
|
"\n",
|
||||||
|
"The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
|
||||||
|
"\n",
|
||||||
|
"**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(remote_run).show() "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.train.automl.run import AutoMLRun\n",
|
||||||
|
"setup_run = AutoMLRun(experiment, remote_run.id + \"_setup\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Wait until the run finishes.\n",
|
||||||
|
"remote_run.wait_for_completion(show_output = True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Retrieve All Child Runs\n",
|
||||||
|
"You can also use SDK methods to fetch all the child runs and see individual metrics that we log."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"children = list(remote_run.get_children())\n",
|
||||||
|
"metricslist = {}\n",
|
||||||
|
"for run in children:\n",
|
||||||
|
" properties = run.get_properties()\n",
|
||||||
|
" metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n",
|
||||||
|
" metricslist[int(properties['iteration'])] = metrics\n",
|
||||||
|
"\n",
|
||||||
|
"rundata = pd.DataFrame(metricslist).sort_index(1)\n",
|
||||||
|
"rundata"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Retrieve the Best Model\n",
|
||||||
|
"Below we select the best pipeline from our iterations. The get_output method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on get_output allow you to retrieve the best run and fitted model for any logged metric or for a particular iteration."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()\n",
|
||||||
|
"print(best_run)\n",
|
||||||
|
"print(fitted_model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"#### Best Model Based on Any Other Metric\n",
|
||||||
|
"Show the run and the model that has the smallest `root_mean_squared_error` value (which turned out to be the same as the one with largest `spearman_correlation` value):"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"lookup_metric = \"root_mean_squared_error\"\n",
|
||||||
|
"best_run, fitted_model = remote_run.get_output(metric = lookup_metric)\n",
|
||||||
|
"print(best_run)\n",
|
||||||
|
"print(fitted_model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"iteration = 3\n",
|
||||||
|
"third_run, third_model = remote_run.get_output(iteration = iteration)\n",
|
||||||
|
"print(third_run)\n",
|
||||||
|
"print(third_model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register the Fitted Model for Deployment\n",
|
||||||
|
"If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"description = 'AutoML Model'\n",
|
||||||
|
"tags = None\n",
|
||||||
|
"model = remote_run.register_model(description = description, tags = tags)\n",
|
||||||
|
"\n",
|
||||||
|
"print(remote_run.model_id) # This will be written to the script file later in the notebook."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Scoring Script\n",
|
||||||
|
"The scoring script is required to generate the image for deployment. It contains the code to do the predictions on input data."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%writefile score.py\n",
|
||||||
|
"import pickle\n",
|
||||||
|
"import json\n",
|
||||||
|
"import numpy\n",
|
||||||
|
"import azureml.train.automl\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"\n",
|
||||||
|
"def init():\n",
|
||||||
|
" global model\n",
|
||||||
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
||||||
|
" # deserialize the model file back into a sklearn model\n",
|
||||||
|
" model = joblib.load(model_path)\n",
|
||||||
|
"\n",
|
||||||
|
"def run(rawdata):\n",
|
||||||
|
" try:\n",
|
||||||
|
" data = json.loads(rawdata)['data']\n",
|
||||||
|
" data = numpy.array(data)\n",
|
||||||
|
" result = model.predict(data)\n",
|
||||||
|
" except Exception as e:\n",
|
||||||
|
" result = str(e)\n",
|
||||||
|
" return json.dumps({\"error\": result})\n",
|
||||||
|
" return json.dumps({\"result\":result.tolist()})"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a YAML File for the Environment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. Details about retrieving the versions can be found in notebook [12.auto-ml-retrieve-the-training-sdk-versions](12.auto-ml-retrieve-the-training-sdk-versions.ipynb)."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"dependencies = remote_run.get_run_sdk_dependencies(iteration = 1)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"for p in ['azureml-train-automl', 'azureml-sdk', 'azureml-core']:\n",
|
||||||
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"conda_env_file_name = 'myenv.yml'\n",
|
||||||
|
"myenv.save_to_file('.', conda_env_file_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Substitute the actual version number in the environment file.\n",
|
||||||
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
||||||
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-sdk']))\n",
|
||||||
|
"\n",
|
||||||
|
"# Substitute the actual model id in the script file.\n",
|
||||||
|
"\n",
|
||||||
|
"script_file_name = 'score.py'\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'r') as cefr:\n",
|
||||||
|
" content = cefr.read()\n",
|
||||||
|
"\n",
|
||||||
|
"with open(script_file_name, 'w') as cefw:\n",
|
||||||
|
" cefw.write(content.replace('<<modelid>>', remote_run.model_id))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a Container Image\n",
|
||||||
|
"\n",
|
||||||
|
"Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n",
|
||||||
|
"or when testing a model that is under development."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import Image, ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"image_config = ContainerImage.image_configuration(runtime= \"python\",\n",
|
||||||
|
" execution_script = script_file_name,\n",
|
||||||
|
" conda_file = conda_env_file_name,\n",
|
||||||
|
" tags = {'area': \"digits\", 'type': \"automl_regression\"},\n",
|
||||||
|
" description = \"Image for automl regression sample\")\n",
|
||||||
|
"\n",
|
||||||
|
"image = Image.create(name = \"automlsampleimage\",\n",
|
||||||
|
" # this is the model object \n",
|
||||||
|
" models = [model],\n",
|
||||||
|
" image_config = image_config, \n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"\n",
|
||||||
|
"image.wait_for_creation(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"if image.creation_state == 'Failed':\n",
|
||||||
|
" print(\"Image build log at: \" + image.image_build_log_uri)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy the Image as a Web Service on Azure Container Instance\n",
|
||||||
|
"\n",
|
||||||
|
"Deploy an image that contains the model and other assets needed by the service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
||||||
|
" memory_gb = 1, \n",
|
||||||
|
" tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n",
|
||||||
|
" description = 'sample service for Automl Regression')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-sample-hardware'\n",
|
||||||
|
"print(aci_service_name)\n",
|
||||||
|
"aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n",
|
||||||
|
" image = image,\n",
|
||||||
|
" name = aci_service_name,\n",
|
||||||
|
" workspace = ws)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Delete a Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Deletes the specified web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#aci_service.delete()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Get Logs from a Deployed Web Service\n",
|
||||||
|
"\n",
|
||||||
|
"Gets logs from a deployed web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"#aci_service.get_logs()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Test\n",
|
||||||
|
"\n",
|
||||||
|
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"def _read_x_y(file_name, label_col):\n",
|
||||||
|
" df = pd.read_csv(file_name)\n",
|
||||||
|
" y_split = None\n",
|
||||||
|
" if label_col in df.columns:\n",
|
||||||
|
" y_split = df.pop(label_col)\n",
|
||||||
|
" y_split = y_split.values[:, None]\n",
|
||||||
|
" X_split = df.values\n",
|
||||||
|
" return X_split, y_split\n",
|
||||||
|
" \n",
|
||||||
|
"\n",
|
||||||
|
"X,y = _read_x_y(\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\",\"ERP\")\n",
|
||||||
|
"X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"##### Predict on training and test set, and calculate residual values."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"y_pred_train = fitted_model.predict(X_train)\n",
|
||||||
|
"y_residual_train = y_train - y_pred_train\n",
|
||||||
|
"\n",
|
||||||
|
"y_pred_test = fitted_model.predict(X_test)\n",
|
||||||
|
"y_residual_test = y_test - y_pred_test"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Calculate metrics for the prediction\n",
|
||||||
|
"\n",
|
||||||
|
"Now visualize the data on a scatter plot to show what our truth (actual) values are compared to the predicted values \n",
|
||||||
|
"from the trained model that was returned."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%matplotlib inline\n",
|
||||||
|
"from sklearn.metrics import mean_squared_error, r2_score\n",
|
||||||
|
"\n",
|
||||||
|
"# Set up a multi-plot chart.\n",
|
||||||
|
"f, (a0, a1) = plt.subplots(1, 2, gridspec_kw = {'width_ratios':[1, 1], 'wspace':0, 'hspace': 0})\n",
|
||||||
|
"f.suptitle('Regression Residual Values', fontsize = 18)\n",
|
||||||
|
"f.set_figheight(6)\n",
|
||||||
|
"f.set_figwidth(16)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot residual values of training set.\n",
|
||||||
|
"a0.axis([0, 360, -200, 200])\n",
|
||||||
|
"a0.plot(y_residual_train, 'bo', alpha = 0.5)\n",
|
||||||
|
"a0.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
|
"a0.text(16,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_train, y_pred_train))), fontsize = 12)\n",
|
||||||
|
"a0.text(16,140,'R2 score = {0:.2f}'.format(r2_score(y_train, y_pred_train)),fontsize = 12)\n",
|
||||||
|
"a0.set_xlabel('Training samples', fontsize = 12)\n",
|
||||||
|
"a0.set_ylabel('Residual Values', fontsize = 12)\n",
|
||||||
|
"\n",
|
||||||
|
"# Plot residual values of test set.\n",
|
||||||
|
"a1.axis([0, 90, -200, 200])\n",
|
||||||
|
"a1.plot(y_residual_test, 'bo', alpha = 0.5)\n",
|
||||||
|
"a1.plot([-10,360],[0,0], 'r-', lw = 3)\n",
|
||||||
|
"a1.text(5,170,'RMSE = {0:.2f}'.format(np.sqrt(mean_squared_error(y_test, y_pred_test))), fontsize = 12)\n",
|
||||||
|
"a1.text(5,140,'R2 score = {0:.2f}'.format(r2_score(y_test, y_pred_test)),fontsize = 12)\n",
|
||||||
|
"a1.set_xlabel('Test samples', fontsize = 12)\n",
|
||||||
|
"a1.set_yticklabels([])\n",
|
||||||
|
"\n",
|
||||||
|
"plt.show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%matplotlib notebook\n",
|
||||||
|
"test_pred = plt.scatter(y_test, y_pred_test, color='')\n",
|
||||||
|
"test_test = plt.scatter(y_test, y_test, color='g')\n",
|
||||||
|
"plt.legend((test_pred, test_test), ('prediction', 'truth'), loc='upper left', fontsize=8)\n",
|
||||||
|
"plt.show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Acknowledgements\n",
|
||||||
|
"This Predicting Hardware Performance Dataset is made available under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/. Any rights in individual contents of the database are licensed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication License: https://creativecommons.org/publicdomain/zero/1.0/ . The dataset itself can be found here: https://www.kaggle.com/faizunnabi/comp-hardware-performance and https://archive.ics.uci.edu/ml/datasets/Computer+Hardware\n",
|
||||||
|
"\n",
|
||||||
|
"_**Citation Found Here**_\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "v-rasav"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.7.1"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -126,25 +126,6 @@
|
|||||||
"**Note:** Creation of a new workspace can take several minutes."
|
"**Note:** Creation of a new workspace can take several minutes."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"##TESTONLY\n",
|
|
||||||
"# Import the Workspace class and check the Azure ML SDK version.\n",
|
|
||||||
"from azureml.core import Workspace\n",
|
|
||||||
"\n",
|
|
||||||
"ws = Workspace.create(name = workspace_name,\n",
|
|
||||||
" subscription_id = subscription_id,\n",
|
|
||||||
" resource_group = resource_group, \n",
|
|
||||||
" location = workspace_region,\n",
|
|
||||||
" auth = auth_sp,\n",
|
|
||||||
" exist_ok=True)\n",
|
|
||||||
"ws.get_details()"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -612,7 +593,7 @@
|
|||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3.6",
|
"display_name": "Python 3.6",
|
||||||
"language": "Python",
|
"language": "Python",
|
||||||
"name": "Python36"
|
"name": "python36"
|
||||||
},
|
},
|
||||||
"language_info": {
|
"language_info": {
|
||||||
"codemirror_mode": {
|
"codemirror_mode": {
|
||||||
|
|||||||
709
how-to-use-azureml/data-drift/azure-ml-datadrift.ipynb
Normal file
709
how-to-use-azureml/data-drift/azure-ml-datadrift.ipynb
Normal file
@@ -0,0 +1,709 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Track Data Drift between Training and Inference Data in Production \n",
|
||||||
|
"\n",
|
||||||
|
"With this notebook, you will learn how to enable the DataDrift service to automatically track and determine whether your inference data is drifting from the data your model was initially trained on. The DataDrift service provides metrics and visualizations to help stakeholders identify which specific features cause the concept drift to occur.\n",
|
||||||
|
"\n",
|
||||||
|
"Please email driftfeedback@microsoft.com with any issues. A member from the DataDrift team will respond shortly. \n",
|
||||||
|
"\n",
|
||||||
|
"The DataDrift Public Preview API can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-contrib-datadrift/?view=azure-ml-py). "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Prerequisites and Setup"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Install the DataDrift package\n",
|
||||||
|
"\n",
|
||||||
|
"Install the azureml-contrib-datadrift, azureml-contrib-opendatasets and lightgbm packages before running this notebook.\n",
|
||||||
|
"```\n",
|
||||||
|
"pip install azureml-contrib-datadrift\n",
|
||||||
|
"pip install azureml-contrib-datasets\n",
|
||||||
|
"pip install lightgbm\n",
|
||||||
|
"```"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Import Dependencies"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"import os\n",
|
||||||
|
"import time\n",
|
||||||
|
"from datetime import datetime, timedelta\n",
|
||||||
|
"\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import requests\n",
|
||||||
|
"from azureml.contrib.datadrift import DataDriftDetector, AlertConfiguration\n",
|
||||||
|
"from azureml.contrib.opendatasets import NoaaIsdWeather\n",
|
||||||
|
"from azureml.core import Dataset, Workspace, Run\n",
|
||||||
|
"from azureml.core.compute import AksCompute, ComputeTarget\n",
|
||||||
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||||
|
"from azureml.core.experiment import Experiment\n",
|
||||||
|
"from azureml.core.image import ContainerImage\n",
|
||||||
|
"from azureml.core.model import Model\n",
|
||||||
|
"from azureml.core.webservice import Webservice, AksWebservice\n",
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"from sklearn.externals import joblib\n",
|
||||||
|
"from sklearn.model_selection import train_test_split\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Set up Configuraton and Create Azure ML Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Please type in your initials/alias. The prefix is prepended to the names of resources created by this notebook. \n",
|
||||||
|
"prefix = \"dd\"\n",
|
||||||
|
"\n",
|
||||||
|
"# NOTE: Please do not change the model_name, as it's required by the score.py file\n",
|
||||||
|
"model_name = \"driftmodel\"\n",
|
||||||
|
"image_name = \"{}driftimage\".format(prefix)\n",
|
||||||
|
"service_name = \"{}driftservice\".format(prefix)\n",
|
||||||
|
"\n",
|
||||||
|
"# optionally, set email address to receive an email alert for DataDrift\n",
|
||||||
|
"email_address = \"\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Generate Train/Testing Data\n",
|
||||||
|
"\n",
|
||||||
|
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You may replace this step with your own dataset. "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"usaf_list = ['725724', '722149', '723090', '722159', '723910', '720279',\n",
|
||||||
|
" '725513', '725254', '726430', '720381', '723074', '726682',\n",
|
||||||
|
" '725486', '727883', '723177', '722075', '723086', '724053',\n",
|
||||||
|
" '725070', '722073', '726060', '725224', '725260', '724520',\n",
|
||||||
|
" '720305', '724020', '726510', '725126', '722523', '703333',\n",
|
||||||
|
" '722249', '722728', '725483', '722972', '724975', '742079',\n",
|
||||||
|
" '727468', '722193', '725624', '722030', '726380', '720309',\n",
|
||||||
|
" '722071', '720326', '725415', '724504', '725665', '725424',\n",
|
||||||
|
" '725066']\n",
|
||||||
|
"\n",
|
||||||
|
"columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def enrich_weather_noaa_data(noaa_df):\n",
|
||||||
|
" hours_in_day = 23\n",
|
||||||
|
" week_in_year = 52\n",
|
||||||
|
" \n",
|
||||||
|
" noaa_df[\"hour\"] = noaa_df[\"datetime\"].dt.hour\n",
|
||||||
|
" noaa_df[\"weekofyear\"] = noaa_df[\"datetime\"].dt.week\n",
|
||||||
|
" \n",
|
||||||
|
" noaa_df[\"sine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.sin((2*np.pi*x.dt.week-1)/week_in_year))\n",
|
||||||
|
" noaa_df[\"cosine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.cos((2*np.pi*x.dt.week-1)/week_in_year))\n",
|
||||||
|
"\n",
|
||||||
|
" noaa_df[\"sine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||||
|
" noaa_df[\"cosine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||||
|
" \n",
|
||||||
|
" return noaa_df\n",
|
||||||
|
"\n",
|
||||||
|
"def add_window_col(input_df):\n",
|
||||||
|
" shift_interval = pd.Timedelta('-7 days') # your X days interval\n",
|
||||||
|
" df_shifted = input_df.copy()\n",
|
||||||
|
" df_shifted['datetime'] = df_shifted['datetime'] - shift_interval\n",
|
||||||
|
" df_shifted.drop(list(input_df.columns.difference(['datetime', 'usaf', 'wban', 'sine_hourofday', 'temperature'])), axis=1, inplace=True)\n",
|
||||||
|
"\n",
|
||||||
|
" # merge, keeping only observations where -1 lag is present\n",
|
||||||
|
" df2 = pd.merge(input_df,\n",
|
||||||
|
" df_shifted,\n",
|
||||||
|
" on=['datetime', 'usaf', 'wban', 'sine_hourofday'],\n",
|
||||||
|
" how='inner', # use 'left' to keep observations without lags\n",
|
||||||
|
" suffixes=['', '-7'])\n",
|
||||||
|
" return df2\n",
|
||||||
|
"\n",
|
||||||
|
"def get_noaa_data(start_time, end_time, cols, station_list):\n",
|
||||||
|
" isd = NoaaIsdWeather(start_time, end_time, cols=cols)\n",
|
||||||
|
" # Read into Pandas data frame.\n",
|
||||||
|
" noaa_df = isd.to_pandas_dataframe()\n",
|
||||||
|
" noaa_df = noaa_df.rename(columns={\"stationName\": \"station_name\"})\n",
|
||||||
|
" \n",
|
||||||
|
" df_filtered = noaa_df[noaa_df[\"usaf\"].isin(station_list)]\n",
|
||||||
|
" df_filtered.reset_index(drop=True)\n",
|
||||||
|
" \n",
|
||||||
|
" # Enrich with time features\n",
|
||||||
|
" df_enriched = enrich_weather_noaa_data(df_filtered)\n",
|
||||||
|
" \n",
|
||||||
|
" return df_enriched\n",
|
||||||
|
"\n",
|
||||||
|
"def get_featurized_noaa_df(start_time, end_time, cols, station_list):\n",
|
||||||
|
" df_1 = get_noaa_data(start_time - timedelta(days=7), start_time - timedelta(seconds=1), cols, station_list)\n",
|
||||||
|
" df_2 = get_noaa_data(start_time, end_time, cols, station_list)\n",
|
||||||
|
" noaa_df = pd.concat([df_1, df_2])\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Adding window feature\")\n",
|
||||||
|
" df_window = add_window_col(noaa_df)\n",
|
||||||
|
" \n",
|
||||||
|
" cat_columns = df_window.dtypes == object\n",
|
||||||
|
" cat_columns = cat_columns[cat_columns == True]\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Encoding categorical columns\")\n",
|
||||||
|
" df_encoded = pd.get_dummies(df_window, columns=cat_columns.keys().tolist())\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Dropping unnecessary columns\")\n",
|
||||||
|
" df_featurized = df_encoded.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna().drop_duplicates()\n",
|
||||||
|
" \n",
|
||||||
|
" return df_featurized"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Train model on Jan 1 - 14, 2009 data\n",
|
||||||
|
"df = get_featurized_noaa_df(datetime(2009, 1, 1), datetime(2009, 1, 14, 23, 59, 59), columns, usaf_list)\n",
|
||||||
|
"df.head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"label = \"temperature\"\n",
|
||||||
|
"x_df = df.drop(label, axis=1)\n",
|
||||||
|
"y_df = df[[label]]\n",
|
||||||
|
"x_train, x_test, y_train, y_test = train_test_split(df, y_df, test_size=0.2, random_state=223)\n",
|
||||||
|
"print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)\n",
|
||||||
|
"\n",
|
||||||
|
"training_dir = 'outputs/training'\n",
|
||||||
|
"training_file = \"training.csv\"\n",
|
||||||
|
"\n",
|
||||||
|
"# Generate training dataframe to register as Training Dataset\n",
|
||||||
|
"os.makedirs(training_dir, exist_ok=True)\n",
|
||||||
|
"training_df = pd.merge(x_train.drop(label, axis=1), y_train, left_index=True, right_index=True)\n",
|
||||||
|
"training_df.to_csv(training_dir + \"/\" + training_file)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create/Register Training Dataset"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"dataset_name = \"dataset\"\n",
|
||||||
|
"name_suffix = datetime.utcnow().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
|
||||||
|
"snapshot_name = \"snapshot-{}\".format(name_suffix)\n",
|
||||||
|
"\n",
|
||||||
|
"dstore = ws.get_default_datastore()\n",
|
||||||
|
"dstore.upload(training_dir, \"data/training\", show_progress=True)\n",
|
||||||
|
"dpath = dstore.path(\"data/training/training.csv\")\n",
|
||||||
|
"trainingDataset = Dataset.auto_read_files(dpath, include_path=True)\n",
|
||||||
|
"trainingDataset = trainingDataset.register(workspace=ws, name=dataset_name, description=\"dset\", exist_ok=True)\n",
|
||||||
|
"\n",
|
||||||
|
"trainingDataSnapshot = trainingDataset.create_snapshot(snapshot_name=snapshot_name, compute_target=None, create_data_snapshot=True)\n",
|
||||||
|
"datasets = [(Dataset.Scenario.TRAINING, trainingDataSnapshot)]\n",
|
||||||
|
"print(\"dataset registration done.\\n\")\n",
|
||||||
|
"datasets"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train and Save Model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import lightgbm as lgb\n",
|
||||||
|
"\n",
|
||||||
|
"train = lgb.Dataset(data=x_train, \n",
|
||||||
|
" label=y_train)\n",
|
||||||
|
"\n",
|
||||||
|
"test = lgb.Dataset(data=x_test, \n",
|
||||||
|
" label=y_test,\n",
|
||||||
|
" reference=train)\n",
|
||||||
|
"\n",
|
||||||
|
"params = {'learning_rate' : 0.1,\n",
|
||||||
|
" 'boosting' : 'gbdt',\n",
|
||||||
|
" 'metric' : 'rmse',\n",
|
||||||
|
" 'feature_fraction' : 1,\n",
|
||||||
|
" 'bagging_fraction' : 1,\n",
|
||||||
|
" 'max_depth': 6,\n",
|
||||||
|
" 'num_leaves' : 31,\n",
|
||||||
|
" 'objective' : 'regression',\n",
|
||||||
|
" 'bagging_freq' : 1,\n",
|
||||||
|
" \"verbose\": -1,\n",
|
||||||
|
" 'min_data_per_leaf': 100}\n",
|
||||||
|
"\n",
|
||||||
|
"model = lgb.train(params, \n",
|
||||||
|
" num_boost_round=500,\n",
|
||||||
|
" train_set=train,\n",
|
||||||
|
" valid_sets=[train, test],\n",
|
||||||
|
" verbose_eval=50,\n",
|
||||||
|
" early_stopping_rounds=25)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model_file = 'outputs/{}.pkl'.format(model_name)\n",
|
||||||
|
"\n",
|
||||||
|
"os.makedirs('outputs', exist_ok=True)\n",
|
||||||
|
"joblib.dump(model, model_file)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register Model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model = Model.register(model_path=model_file,\n",
|
||||||
|
" model_name=model_name,\n",
|
||||||
|
" workspace=ws,\n",
|
||||||
|
" datasets=datasets)\n",
|
||||||
|
"\n",
|
||||||
|
"print(model_name, image_name, service_name, model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Deploy Model To AKS"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Prepare Environment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
|
||||||
|
" pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
|
" f.write(myenv.serialize_to_string())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Image"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Image creation may take up to 15 minutes.\n",
|
||||||
|
"\n",
|
||||||
|
"image_name = image_name + str(model.version)\n",
|
||||||
|
"\n",
|
||||||
|
"if not image_name in ws.images:\n",
|
||||||
|
" # Use the score.py defined in this directory as the execution script\n",
|
||||||
|
" # NOTE: The Model Data Collector must be enabled in the execution script for DataDrift to run correctly\n",
|
||||||
|
" image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
|
||||||
|
" runtime=\"python\",\n",
|
||||||
|
" conda_file=\"myenv.yml\",\n",
|
||||||
|
" description=\"Image with weather dataset model\")\n",
|
||||||
|
" image = ContainerImage.create(name=image_name,\n",
|
||||||
|
" models=[model],\n",
|
||||||
|
" image_config=image_config,\n",
|
||||||
|
" workspace=ws)\n",
|
||||||
|
"\n",
|
||||||
|
" image.wait_for_creation(show_output=True)\n",
|
||||||
|
"else:\n",
|
||||||
|
" image = ws.images[image_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Compute Target"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"aks_name = 'dd-demo-e2e'\n",
|
||||||
|
"prov_config = AksCompute.provisioning_configuration()\n",
|
||||||
|
"\n",
|
||||||
|
"if not aks_name in ws.compute_targets:\n",
|
||||||
|
" aks_target = ComputeTarget.create(workspace=ws,\n",
|
||||||
|
" name=aks_name,\n",
|
||||||
|
" provisioning_configuration=prov_config)\n",
|
||||||
|
"\n",
|
||||||
|
" aks_target.wait_for_completion(show_output=True)\n",
|
||||||
|
" print(aks_target.provisioning_state)\n",
|
||||||
|
" print(aks_target.provisioning_errors)\n",
|
||||||
|
"else:\n",
|
||||||
|
" aks_target=ws.compute_targets[aks_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"aks_service_name = service_name\n",
|
||||||
|
"\n",
|
||||||
|
"if not aks_service_name in ws.webservices:\n",
|
||||||
|
" aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)\n",
|
||||||
|
" aks_service = Webservice.deploy_from_image(workspace=ws,\n",
|
||||||
|
" name=aks_service_name,\n",
|
||||||
|
" image=image,\n",
|
||||||
|
" deployment_config=aks_config,\n",
|
||||||
|
" deployment_target=aks_target)\n",
|
||||||
|
" aks_service.wait_for_deployment(show_output=True)\n",
|
||||||
|
" print(aks_service.state)\n",
|
||||||
|
"else:\n",
|
||||||
|
" aks_service = ws.webservices[aks_service_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Run DataDrift Analysis"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Send Scoring Data to Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Download Scoring Data"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Score Model on March 15, 2016 data\n",
|
||||||
|
"scoring_df = get_noaa_data(datetime(2016, 3, 15) - timedelta(days=7), datetime(2016, 3, 16), columns, usaf_list)\n",
|
||||||
|
"# Add the window feature column\n",
|
||||||
|
"scoring_df = add_window_col(scoring_df)\n",
|
||||||
|
"\n",
|
||||||
|
"# Drop features not used by the model\n",
|
||||||
|
"print(\"Dropping unnecessary columns\")\n",
|
||||||
|
"scoring_df = scoring_df.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna()\n",
|
||||||
|
"scoring_df.head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# One Hot Encode the scoring dataset to match the training dataset schema\n",
|
||||||
|
"columns_dict = model.datasets[\"training\"][0].get_profile().columns\n",
|
||||||
|
"extra_cols = ('Path', 'Column1')\n",
|
||||||
|
"for k in extra_cols:\n",
|
||||||
|
" columns_dict.pop(k, None)\n",
|
||||||
|
"training_columns = list(columns_dict.keys())\n",
|
||||||
|
"\n",
|
||||||
|
"categorical_columns = scoring_df.dtypes == object\n",
|
||||||
|
"categorical_columns = categorical_columns[categorical_columns == True]\n",
|
||||||
|
"\n",
|
||||||
|
"test_df = pd.get_dummies(scoring_df[categorical_columns.keys().tolist()])\n",
|
||||||
|
"encoded_df = scoring_df.join(test_df)\n",
|
||||||
|
"\n",
|
||||||
|
"# Populate missing OHE columns with 0 values to match traning dataset schema\n",
|
||||||
|
"difference = list(set(training_columns) - set(encoded_df.columns.tolist()))\n",
|
||||||
|
"for col in difference:\n",
|
||||||
|
" encoded_df[col] = 0\n",
|
||||||
|
"encoded_df.head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Serialize dataframe to list of row dictionaries\n",
|
||||||
|
"encoded_dict = encoded_df.to_dict('records')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Submit Scoring Data to Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%time\n",
|
||||||
|
"\n",
|
||||||
|
"# retreive the API keys. AML generates two keys.\n",
|
||||||
|
"key1, key2 = aks_service.get_keys()\n",
|
||||||
|
"\n",
|
||||||
|
"total_count = len(scoring_df)\n",
|
||||||
|
"i = 0\n",
|
||||||
|
"load = []\n",
|
||||||
|
"for row in encoded_dict:\n",
|
||||||
|
" load.append(row)\n",
|
||||||
|
" i = i + 1\n",
|
||||||
|
" if i % 100 == 0:\n",
|
||||||
|
" payload = json.dumps({\"data\": load})\n",
|
||||||
|
" \n",
|
||||||
|
" # construct raw HTTP request and send to the service\n",
|
||||||
|
" payload_binary = bytes(payload,encoding = 'utf8')\n",
|
||||||
|
" headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
|
||||||
|
" resp = requests.post(aks_service.scoring_uri, payload_binary, headers=headers)\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"prediction:\", resp.content, \"Progress: {}/{}\".format(i, total_count)) \n",
|
||||||
|
"\n",
|
||||||
|
" load = []\n",
|
||||||
|
" time.sleep(3)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Configure DataDrift"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"services = [service_name]\n",
|
||||||
|
"start = datetime.now() - timedelta(days=2)\n",
|
||||||
|
"end = datetime(year=2020, month=1, day=22, hour=15, minute=16)\n",
|
||||||
|
"feature_list = ['usaf', 'wban', 'latitude', 'longitude', 'station_name', 'p_k', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n",
|
||||||
|
"alert_config = AlertConfiguration([email_address]) if email_address else None\n",
|
||||||
|
"\n",
|
||||||
|
"# there will be an exception indicating using get() method if DataDrift object already exist\n",
|
||||||
|
"try:\n",
|
||||||
|
" datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", alert_config=alert_config)\n",
|
||||||
|
"except KeyError:\n",
|
||||||
|
" datadrift = DataDriftDetector.get(ws, model.name, model.version)\n",
|
||||||
|
" \n",
|
||||||
|
"print(\"Details of DataDrift Object:\\n{}\".format(datadrift))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Run an Adhoc DataDriftDetector Run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"target_date = datetime.today()\n",
|
||||||
|
"run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"exp = Experiment(ws, datadrift._id)\n",
|
||||||
|
"dd_run = Run(experiment=exp, run_id=run)\n",
|
||||||
|
"RunDetails(dd_run).show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Get Drift Analysis Results"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"children = list(dd_run.get_children())\n",
|
||||||
|
"for child in children:\n",
|
||||||
|
" child.wait_for_completion()\n",
|
||||||
|
"\n",
|
||||||
|
"drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
|
||||||
|
"drift_metrics"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Show all drift figures, one per serivice.\n",
|
||||||
|
"# If setting with_details is False (by default), only drift will be shown; if it's True, all details will be shown.\n",
|
||||||
|
"\n",
|
||||||
|
"drift_figures = datadrift.show(with_details=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Enable DataDrift Schedule"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"datadrift.enable_schedule()"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "rafarmah"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -1 +1,3 @@
|
|||||||
Under contruction...please visit again soon!
|
## Using data drift APIs
|
||||||
|
|
||||||
|
1. [Detect data drift for a model](azure-ml-datadrift.ipynb): Detect data drift for a deployed model (a minimal API sketch follows below).
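
A minimal sketch of the core API calls from that notebook, assuming you already have a `Workspace` object `ws`, a registered `model`, a deployed scoring service, and your own `feature_list` (the service name below is a placeholder):

```python
from datetime import datetime, timedelta
from azureml.contrib.datadrift import DataDriftDetector

services = ["mydriftservice"]  # placeholder: the name of your deployed scoring service
detector = DataDriftDetector.create(ws, model.name, model.version, services, frequency="Day")

# Kick off an ad hoc analysis, then pull the computed metrics and enable the daily schedule.
run = detector.run(datetime.today(), services, feature_list=feature_list, create_compute_target=True)
metrics = detector.get_output(start_time=datetime.now() - timedelta(days=2), end_time=datetime.now())
detector.enable_schedule()
```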
|
||||||
58
how-to-use-azureml/data-drift/score.py
Normal file
58
how-to-use-azureml/data-drift/score.py
Normal file
@@ -0,0 +1,58 @@
|
|||||||
|
import pickle
|
||||||
|
import json
|
||||||
|
import numpy
|
||||||
|
import azureml.train.automl
|
||||||
|
from sklearn.externals import joblib
|
||||||
|
from sklearn.linear_model import Ridge
|
||||||
|
from azureml.core.model import Model
|
||||||
|
from azureml.core.run import Run
|
||||||
|
from azureml.monitoring import ModelDataCollector
|
||||||
|
import time
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
|
||||||
|
def init():
|
||||||
|
global model, inputs_dc, prediction_dc, feature_names, categorical_features
|
||||||
|
|
||||||
|
print("Model is initialized" + time.strftime("%H:%M:%S"))
|
||||||
|
model_path = Model.get_model_path(model_name="driftmodel")
|
||||||
|
model = joblib.load(model_path)
|
||||||
|
|
||||||
|
feature_names = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
|
||||||
|
"sine_weekofyear", "cosine_weekofyear", "sine_hourofday", "cosine_hourofday",
|
||||||
|
"temperature-7"]
|
||||||
|
|
||||||
|
categorical_features = ["usaf", "wban", "p_k", "station_name"]
|
||||||
|
|
||||||
|
inputs_dc = ModelDataCollector(model_name="driftmodel",
|
||||||
|
identifier="inputs",
|
||||||
|
feature_names=feature_names)
|
||||||
|
|
||||||
|
prediction_dc = ModelDataCollector("driftmodel",
|
||||||
|
identifier="predictions",
|
||||||
|
feature_names=["temperature"])
|
||||||
|
|
||||||
|
|
||||||
|
def run(raw_data):
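# raw_data is the JSON payload posted by the scoring client; azure-ml-datadrift.ipynb sends it as
# {"data": [{<feature name>: <value>, ...}, ...]}, i.e. a list of row dictionaries.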
|
||||||
|
global inputs_dc, prediction_dc
|
||||||
|
|
||||||
|
try:
|
||||||
|
data = json.loads(raw_data)["data"]
|
||||||
|
data = pd.DataFrame(data)
|
||||||
|
|
||||||
|
# Remove the categorical features as the model expects OHE values
|
||||||
|
input_data = data.drop(categorical_features, axis=1)
|
||||||
|
|
||||||
|
result = model.predict(input_data)
|
||||||
|
|
||||||
|
# Collect the non-OHE dataframe
|
||||||
|
collected_df = data[feature_names]
|
||||||
|
|
||||||
|
inputs_dc.collect(collected_df.values)
|
||||||
|
prediction_dc.collect(result)
|
||||||
|
return result.tolist()
|
||||||
|
except Exception as e:
|
||||||
|
error = str(e)
|
||||||
|
|
||||||
|
print(error + time.strftime("%H:%M:%S"))
|
||||||
|
return error
|
||||||
@@ -385,9 +385,9 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
"name": "python3"
|
"name": "python36"
|
||||||
},
|
},
|
||||||
"language_info": {
|
"language_info": {
|
||||||
"codemirror_mode": {
|
"codemirror_mode": {
|
||||||
|
|||||||
@@ -265,7 +265,7 @@
|
|||||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Choose a name for your CPU cluster\n",
|
"# Choose a name for your CPU cluster\n",
|
||||||
"cpu_cluster_name = \"cpucluster\"\n",
|
"cpu_cluster_name = \"cpu-cluster\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Verify that cluster does not exist already\n",
|
"# Verify that cluster does not exist already\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
@@ -370,7 +370,7 @@
|
|||||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Choose a name for your CPU cluster\n",
|
"# Choose a name for your CPU cluster\n",
|
||||||
"cpu_cluster_name = \"cpucluster\"\n",
|
"cpu_cluster_name = \"cpu-cluster\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Verify that cluster does not exist already\n",
|
"# Verify that cluster does not exist already\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
@@ -506,7 +506,7 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n",
|
"# Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n",
|
||||||
"# 'cpucluster' in this case but use a different VM family for instance.\n",
|
"# 'cpu-cluster' in this case but use a different VM family for instance.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# cpu_cluster.delete()"
|
"# cpu_cluster.delete()"
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -36,22 +36,6 @@
|
|||||||
"4. Visualize the global and local explanations with the visualization dashboard."
|
"4. Visualize the global and local explanations with the visualization dashboard."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
|
||||||
"cell_type": "markdown",
|
|
||||||
"metadata": {},
|
|
||||||
"source": [
|
|
||||||
"This example needs sklearn-pandas. If it is not installed, uncomment and run the following line."
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"cell_type": "code",
|
|
||||||
"execution_count": null,
|
|
||||||
"metadata": {},
|
|
||||||
"outputs": [],
|
|
||||||
"source": [
|
|
||||||
"#!pip install sklearn-pandas"
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -63,7 +47,6 @@
|
|||||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
||||||
"from sklearn.linear_model import LogisticRegression\n",
|
"from sklearn.linear_model import LogisticRegression\n",
|
||||||
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
"from azureml.explain.model.tabular_explainer import TabularExplainer\n",
|
||||||
"from sklearn_pandas import DataFrameMapper\n",
|
|
||||||
"import pandas as pd\n",
|
"import pandas as pd\n",
|
||||||
"import numpy as np"
|
"import numpy as np"
|
||||||
]
|
]
|
||||||
@@ -113,6 +96,13 @@
|
|||||||
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
|
"x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"sklearn imports"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
@@ -121,7 +111,51 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from sklearn.pipeline import Pipeline\n",
|
"from sklearn.pipeline import Pipeline\n",
|
||||||
"from sklearn.impute import SimpleImputer\n",
|
"from sklearn.impute import SimpleImputer\n",
|
||||||
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
|
"from sklearn.preprocessing import StandardScaler, OneHotEncoder"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We can explain raw features by either using a `sklearn.compose.ColumnTransformer` or a list of fitted transformer tuples. The cell below uses `sklearn.compose.ColumnTransformer`. In case you want to run the example with the list of fitted transformer tuples, comment the cell below and uncomment the cell that follows after. "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from sklearn.compose import ColumnTransformer\n",
|
||||||
|
"\n",
|
||||||
|
"transformations = ColumnTransformer([\n",
|
||||||
|
" (\"age_fare\", Pipeline(steps=[\n",
|
||||||
|
" ('imputer', SimpleImputer(strategy='median')),\n",
|
||||||
|
" ('scaler', StandardScaler())\n",
|
||||||
|
" ]), [\"age\", \"fare\"]),\n",
|
||||||
|
" (\"embarked\", Pipeline(steps=[\n",
|
||||||
|
" (\"imputer\", SimpleImputer(strategy='constant', fill_value='missing')), \n",
|
||||||
|
" (\"encoder\", OneHotEncoder(sparse=False))]), [\"embarked\"]),\n",
|
||||||
|
" (\"sex_pclass\", OneHotEncoder(sparse=False), [\"sex\", \"pclass\"]) \n",
|
||||||
|
"])\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"# Append classifier to preprocessing pipeline.\n",
|
||||||
|
"# Now we have a full prediction pipeline.\n",
|
||||||
|
"clf = Pipeline(steps=[('preprocessor', transformations),\n",
|
||||||
|
" ('classifier', LogisticRegression(solver='lbfgs'))])\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"'''\n",
|
||||||
|
"# Uncomment below if sklearn-pandas is not installed\n",
|
||||||
|
"#!pip install sklearn-pandas\n",
|
||||||
"from sklearn_pandas import DataFrameMapper\n",
|
"from sklearn_pandas import DataFrameMapper\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
|
"# Impute, standardize the numeric features and one-hot encode the categorical features. \n",
|
||||||
@@ -141,7 +175,8 @@
|
|||||||
"# Append classifier to preprocessing pipeline.\n",
|
"# Append classifier to preprocessing pipeline.\n",
|
||||||
"# Now we have a full prediction pipeline.\n",
|
"# Now we have a full prediction pipeline.\n",
|
||||||
"clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
|
"clf = Pipeline(steps=[('preprocessor', DataFrameMapper(transformations)),\n",
|
||||||
" ('classifier', LogisticRegression(solver='lbfgs'))])"
|
" ('classifier', LogisticRegression(solver='lbfgs'))])\n",
|
||||||
|
"'''"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -0,0 +1,24 @@
|
|||||||
|
# Copyright (c) Microsoft. All rights reserved.
|
||||||
|
# Licensed under the MIT license.
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
|
||||||
|
print("In compare.py")
|
||||||
|
print("As a data scientist, this is where I use my compare code.")
|
||||||
|
parser = argparse.ArgumentParser("compare")
|
||||||
|
parser.add_argument("--compare_data1", type=str, help="compare_data1 data")
|
||||||
|
parser.add_argument("--compare_data2", type=str, help="compare_data2 data")
|
||||||
|
parser.add_argument("--output_compare", type=str, help="output_compare directory")
|
||||||
|
parser.add_argument("--pipeline_param", type=int, help="pipeline parameter")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1: %s" % args.compare_data1)
|
||||||
|
print("Argument 2: %s" % args.compare_data2)
|
||||||
|
print("Argument 3: %s" % args.output_compare)
|
||||||
|
print("Argument 4: %s" % args.pipeline_param)
|
||||||
|
|
||||||
|
if not (args.output_compare is None):
|
||||||
|
os.makedirs(args.output_compare, exist_ok=True)
|
||||||
|
print("%s created" % args.output_compare)
|
||||||
@@ -0,0 +1,21 @@
|
|||||||
|
# Copyright (c) Microsoft. All rights reserved.
|
||||||
|
# Licensed under the MIT license.
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
|
||||||
|
print("In extract.py")
|
||||||
|
print("As a data scientist, this is where I use my extract code.")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("extract")
|
||||||
|
parser.add_argument("--input_extract", type=str, help="input_extract data")
|
||||||
|
parser.add_argument("--output_extract", type=str, help="output_extract directory")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1: %s" % args.input_extract)
|
||||||
|
print("Argument 2: %s" % args.output_extract)
|
||||||
|
|
||||||
|
if not (args.output_extract is None):
|
||||||
|
os.makedirs(args.output_extract, exist_ok=True)
|
||||||
|
print("%s created" % args.output_extract)
|
||||||
@@ -0,0 +1,22 @@
|
|||||||
|
# Copyright (c) Microsoft. All rights reserved.
|
||||||
|
# Licensed under the MIT license.
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
|
||||||
|
print("In train.py")
|
||||||
|
print("As a data scientist, this is where I use my training code.")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("train")
|
||||||
|
|
||||||
|
parser.add_argument("--input_data", type=str, help="input data")
|
||||||
|
parser.add_argument("--output_train", type=str, help="output_train directory")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1: %s" % args.input_data)
|
||||||
|
print("Argument 2: %s" % args.output_train)
|
||||||
|
|
||||||
|
if not (args.output_train is None):
|
||||||
|
os.makedirs(args.output_train, exist_ok=True)
|
||||||
|
print("%s created" % args.output_train)
|
||||||
@@ -0,0 +1,58 @@
|
|||||||
|
# Copyright (c) Microsoft. All rights reserved.
|
||||||
|
# Licensed under the MIT license.
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import pandas as pd
|
||||||
|
import azureml.dataprep as dprep
|
||||||
|
|
||||||
|
|
||||||
|
def get_dict(dict_str):
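# Parses a column-rename mapping passed on the command line as a string of the
# form "{'old_name': 'new_name'\; 'other_old': 'other_new'}" into a Python dict
# (the key/value names here are illustrative; the "\;" separator matches the split below).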
|
||||||
|
pairs = dict_str.strip("{}").split("\;")
|
||||||
|
new_dict = {}
|
||||||
|
for pair in pairs:
|
||||||
|
key, value = pair.strip('\\').split(":")
|
||||||
|
new_dict[key.strip().strip("'")] = value.strip().strip("'")
|
||||||
|
|
||||||
|
return new_dict
|
||||||
|
|
||||||
|
|
||||||
|
print("Cleans the input data")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("cleanse")
|
||||||
|
parser.add_argument("--input_cleanse", type=str, help="raw taxi data")
|
||||||
|
parser.add_argument("--output_cleanse", type=str, help="cleaned taxi data directory")
|
||||||
|
parser.add_argument("--useful_columns", type=str, help="useful columns to keep")
|
||||||
|
parser.add_argument("--columns", type=str, help="rename column pattern")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1(input taxi data path): %s" % args.input_cleanse)
|
||||||
|
print("Argument 2(columns to keep): %s" % str(args.useful_columns.strip("[]").split("\;")))
|
||||||
|
print("Argument 3(columns renaming mapping): %s" % str(args.columns.strip("{}").split("\;")))
|
||||||
|
print("Argument 4(output cleansed taxi data path): %s" % args.output_cleanse)
|
||||||
|
|
||||||
|
raw_df = dprep.read_csv(path=args.input_cleanse, header=dprep.PromoteHeadersMode.GROUPED)
|
||||||
|
|
||||||
|
# These functions ensure that null data is removed from the data set,
|
||||||
|
# which will help increase machine learning model accuracy.
|
||||||
|
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep
|
||||||
|
# for more details
|
||||||
|
|
||||||
|
useful_columns = [s.strip().strip("'") for s in args.useful_columns.strip("[]").split("\;")]
|
||||||
|
columns = get_dict(args.columns)
|
||||||
|
|
||||||
|
all_columns = dprep.ColumnSelector(term=".*", use_regex=True)
|
||||||
|
drop_if_all_null = [all_columns, dprep.ColumnRelationship(dprep.ColumnRelationship.ALL)]
|
||||||
|
|
||||||
|
new_df = (raw_df
|
||||||
|
.replace_na(columns=all_columns)
|
||||||
|
.drop_nulls(*drop_if_all_null)
|
||||||
|
.rename_columns(column_pairs=columns)
|
||||||
|
.keep_columns(columns=useful_columns))
|
||||||
|
|
||||||
|
if not (args.output_cleanse is None):
|
||||||
|
os.makedirs(args.output_cleanse, exist_ok=True)
|
||||||
|
print("%s created" % args.output_cleanse)
|
||||||
|
write_df = new_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_cleanse))
|
||||||
|
write_df.run_local()
|
||||||
@@ -0,0 +1,55 @@
|
|||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import azureml.dataprep as dprep
|
||||||
|
|
||||||
|
print("Filters out coordinates for locations that are outside the city border.",
|
||||||
|
"Chain the column filter commands within the filter() function",
|
||||||
|
"and define the minimum and maximum bounds for each field.")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("filter")
|
||||||
|
parser.add_argument("--input_filter", type=str, help="merged taxi data directory")
|
||||||
|
parser.add_argument("--output_filter", type=str, help="filter out out of city locations")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1(input taxi data path): %s" % args.input_filter)
|
||||||
|
print("Argument 2(output filtered taxi data path): %s" % args.output_filter)
|
||||||
|
|
||||||
|
combined_df = dprep.read_csv(args.input_filter + '/part-*')
|
||||||
|
|
||||||
|
# These functions filter out coordinates for locations that are outside the city border.
|
||||||
|
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details
|
||||||
|
|
||||||
|
# Create a condensed view of the dataflow to just show the lat/long fields,
|
||||||
|
# which makes it easier to evaluate missing or out-of-scope coordinates
|
||||||
|
decimal_type = dprep.TypeConverter(data_type=dprep.FieldType.DECIMAL)
|
||||||
|
combined_df = combined_df.set_column_types(type_conversions={
|
||||||
|
"pickup_longitude": decimal_type,
|
||||||
|
"pickup_latitude": decimal_type,
|
||||||
|
"dropoff_longitude": decimal_type,
|
||||||
|
"dropoff_latitude": decimal_type
|
||||||
|
})
|
||||||
|
|
||||||
|
# Filter out coordinates for locations that are outside the city border.
|
||||||
|
# Chain the column filter commands within the filter() function
|
||||||
|
# and define the minimum and maximum bounds for each field
|
||||||
|
latlong_filtered_df = (combined_df
|
||||||
|
.drop_nulls(columns=["pickup_longitude",
|
||||||
|
"pickup_latitude",
|
||||||
|
"dropoff_longitude",
|
||||||
|
"dropoff_latitude"],
|
||||||
|
column_relationship=dprep.ColumnRelationship(dprep.ColumnRelationship.ANY))
|
||||||
|
.filter(dprep.f_and(dprep.col("pickup_longitude") <= -73.72,
|
||||||
|
dprep.col("pickup_longitude") >= -74.09,
|
||||||
|
dprep.col("pickup_latitude") <= 40.88,
|
||||||
|
dprep.col("pickup_latitude") >= 40.53,
|
||||||
|
dprep.col("dropoff_longitude") <= -73.72,
|
||||||
|
dprep.col("dropoff_longitude") >= -74.09,
|
||||||
|
dprep.col("dropoff_latitude") <= 40.88,
|
||||||
|
dprep.col("dropoff_latitude") >= 40.53)))
|
||||||
|
|
||||||
|
if not (args.output_filter is None):
|
||||||
|
os.makedirs(args.output_filter, exist_ok=True)
|
||||||
|
print("%s created" % args.output_filter)
|
||||||
|
write_df = latlong_filtered_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_filter))
|
||||||
|
write_df.run_local()
|
||||||
@@ -0,0 +1,29 @@
|
|||||||
|
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import azureml.dataprep as dprep
|
||||||
|
|
||||||
|
print("Merge Green and Yellow taxi data")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("merge")
|
||||||
|
parser.add_argument("--input_green_merge", type=str, help="cleaned green taxi data directory")
|
||||||
|
parser.add_argument("--input_yellow_merge", type=str, help="cleaned yellow taxi data directory")
|
||||||
|
parser.add_argument("--output_merge", type=str, help="green and yellow taxi data merged")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1(input green taxi data path): %s" % args.input_green_merge)
|
||||||
|
print("Argument 2(input yellow taxi data path): %s" % args.input_yellow_merge)
|
||||||
|
print("Argument 3(output merge taxi data path): %s" % args.output_merge)
|
||||||
|
|
||||||
|
green_df = dprep.read_csv(args.input_green_merge + '/part-*')
|
||||||
|
yellow_df = dprep.read_csv(args.input_yellow_merge + '/part-*')
|
||||||
|
|
||||||
|
# Appending yellow data to green data
|
||||||
|
combined_df = green_df.append_rows([yellow_df])
|
||||||
|
|
||||||
|
if not (args.output_merge is None):
|
||||||
|
os.makedirs(args.output_merge, exist_ok=True)
|
||||||
|
print("%s created" % args.output_merge)
|
||||||
|
write_df = combined_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_merge))
|
||||||
|
write_df.run_local()
|
||||||
@@ -0,0 +1,47 @@
|
|||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import azureml.dataprep as dprep
|
||||||
|
|
||||||
|
print("Replace undefined values to relavant values and rename columns to meaningful names")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("normalize")
|
||||||
|
parser.add_argument("--input_normalize", type=str, help="combined and converted taxi data")
|
||||||
|
parser.add_argument("--output_normalize", type=str, help="replaced undefined values and renamed columns")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1(input taxi data path): %s" % args.input_normalize)
|
||||||
|
print("Argument 2(output normalized taxi data path): %s" % args.output_normalize)
|
||||||
|
|
||||||
|
combined_converted_df = dprep.read_csv(args.input_normalize + '/part-*')
|
||||||
|
|
||||||
|
# These functions replace undefined values and rename to use meaningful names.
|
||||||
|
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details
|
||||||
|
|
||||||
|
replaced_stfor_vals_df = combined_converted_df.replace(columns="store_forward",
|
||||||
|
find="0",
|
||||||
|
replace_with="N").fill_nulls("store_forward", "N")
|
||||||
|
|
||||||
|
replaced_distance_vals_df = replaced_stfor_vals_df.replace(columns="distance",
|
||||||
|
find=".00",
|
||||||
|
replace_with=0).fill_nulls("distance", 0)
|
||||||
|
|
||||||
|
replaced_distance_vals_df = replaced_distance_vals_df.to_number(["distance"])
|
||||||
|
|
||||||
|
time_split_df = (replaced_distance_vals_df
|
||||||
|
.split_column_by_example(source_column="pickup_datetime")
|
||||||
|
.split_column_by_example(source_column="dropoff_datetime"))
|
||||||
|
|
||||||
|
# Split the pickup and dropoff datetime values into the respective date and time columns
|
||||||
|
renamed_col_df = (time_split_df
|
||||||
|
.rename_columns(column_pairs={
|
||||||
|
"pickup_datetime_1": "pickup_date",
|
||||||
|
"pickup_datetime_2": "pickup_time",
|
||||||
|
"dropoff_datetime_1": "dropoff_date",
|
||||||
|
"dropoff_datetime_2": "dropoff_time"}))
|
||||||
|
|
||||||
|
if not (args.output_normalize is None):
|
||||||
|
os.makedirs(args.output_normalize, exist_ok=True)
|
||||||
|
print("%s created" % args.output_normalize)
|
||||||
|
write_df = renamed_col_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_normalize))
|
||||||
|
write_df.run_local()
|
||||||
@@ -0,0 +1,88 @@
|
|||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import azureml.dataprep as dprep
|
||||||
|
|
||||||
|
print("Transforms the renamed taxi data to the required format")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("transform")
|
||||||
|
parser.add_argument("--input_transform", type=str, help="renamed taxi data")
|
||||||
|
parser.add_argument("--output_transform", type=str, help="transformed taxi data")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1(input taxi data path): %s" % args.input_transform)
|
||||||
|
print("Argument 2(output final transformed taxi data): %s" % args.output_transform)
|
||||||
|
|
||||||
|
renamed_df = dprep.read_csv(args.input_transform + '/part-*')
|
||||||
|
|
||||||
|
# These functions transform the renamed data to be used finally for training.
|
||||||
|
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details
|
||||||
|
|
||||||
|
# Split the pickup and dropoff date further into the day of the week, day of the month, and month values.
|
||||||
|
# To get the day of the week value, use the derive_column_by_example() function.
|
||||||
|
# The function takes an array parameter of example objects that define the input data,
|
||||||
|
# and the preferred output. The function automatically determines your preferred transformation.
|
||||||
|
# For the pickup and dropoff time columns, split the time into the hour, minute, and second by using
|
||||||
|
# the split_column_by_example() function with no example parameter. After you generate the new features,
|
||||||
|
# use the drop_columns() function to delete the original fields as the newly generated features are preferred.
|
||||||
|
# Rename the rest of the fields to use meaningful descriptions.
|
||||||
|
|
||||||
|
transformed_features_df = (renamed_df
|
||||||
|
.derive_column_by_example(
|
||||||
|
source_columns="pickup_date",
|
||||||
|
new_column_name="pickup_weekday",
|
||||||
|
example_data=[("2009-01-04", "Sunday"), ("2013-08-22", "Thursday")])
|
||||||
|
.derive_column_by_example(
|
||||||
|
source_columns="dropoff_date",
|
||||||
|
new_column_name="dropoff_weekday",
|
||||||
|
example_data=[("2013-08-22", "Thursday"), ("2013-11-03", "Sunday")])
|
||||||
|
|
||||||
|
.split_column_by_example(source_column="pickup_time")
|
||||||
|
.split_column_by_example(source_column="dropoff_time")
|
||||||
|
|
||||||
|
.split_column_by_example(source_column="pickup_time_1")
|
||||||
|
.split_column_by_example(source_column="dropoff_time_1")
|
||||||
|
.drop_columns(columns=[
|
||||||
|
"pickup_date", "pickup_time", "dropoff_date", "dropoff_time",
|
||||||
|
"pickup_date_1", "dropoff_date_1", "pickup_time_1", "dropoff_time_1"])
|
||||||
|
|
||||||
|
.rename_columns(column_pairs={
|
||||||
|
"pickup_date_2": "pickup_month",
|
||||||
|
"pickup_date_3": "pickup_monthday",
|
||||||
|
"pickup_time_1_1": "pickup_hour",
|
||||||
|
"pickup_time_1_2": "pickup_minute",
|
||||||
|
"pickup_time_2": "pickup_second",
|
||||||
|
"dropoff_date_2": "dropoff_month",
|
||||||
|
"dropoff_date_3": "dropoff_monthday",
|
||||||
|
"dropoff_time_1_1": "dropoff_hour",
|
||||||
|
"dropoff_time_1_2": "dropoff_minute",
|
||||||
|
"dropoff_time_2": "dropoff_second"}))
|
||||||
|
|
||||||
|
# Drop the pickup_datetime and dropoff_datetime columns because they're
|
||||||
|
# no longer needed (granular time features like hour,
|
||||||
|
# minute and second are more useful for model training).
|
||||||
|
processed_df = transformed_features_df.drop_columns(columns=["pickup_datetime", "dropoff_datetime"])
|
||||||
|
|
||||||
|
# Use the type inference functionality to automatically check the data type of each field,
|
||||||
|
# and display the inference results.
|
||||||
|
type_infer = processed_df.builders.set_column_types()
|
||||||
|
type_infer.learn()
|
||||||
|
|
||||||
|
# The inference results look correct based on the data. Now apply the type conversions to the dataflow.
|
||||||
|
type_converted_df = type_infer.to_dataflow()
|
||||||
|
|
||||||
|
# Before you package the dataflow, run two final filters on the data set.
|
||||||
|
# To eliminate incorrectly captured data points,
|
||||||
|
# filter the dataflow on records where both the cost and distance variable values are greater than zero.
|
||||||
|
# This step will significantly improve machine learning model accuracy,
|
||||||
|
# because data points with a zero cost or distance represent major outliers that throw off prediction accuracy.
|
||||||
|
|
||||||
|
final_df = type_converted_df.filter(dprep.col("distance") > 0)
|
||||||
|
final_df = final_df.filter(dprep.col("cost") > 0)
|
||||||
|
|
||||||
|
# Writing the final dataframe to use for training in the following steps
|
||||||
|
if not (args.output_transform is None):
|
||||||
|
os.makedirs(args.output_transform, exist_ok=True)
|
||||||
|
print("%s created" % args.output_transform)
|
||||||
|
write_df = final_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_transform))
|
||||||
|
write_df.run_local()
|
||||||
@@ -0,0 +1,31 @@
|
|||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import azureml.dataprep as dprep
|
||||||
|
import azureml.core
|
||||||
|
|
||||||
|
print("Extracts important features from prepared data")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("featurization")
|
||||||
|
parser.add_argument("--input_featurization", type=str, help="input featurization")
|
||||||
|
parser.add_argument("--useful_columns", type=str, help="columns to use")
|
||||||
|
parser.add_argument("--output_featurization", type=str, help="output featurization")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1(input training data path): %s" % args.input_featurization)
|
||||||
|
print("Argument 2(column features to use): %s" % str(args.useful_columns.strip("[]").split("\;")))
|
||||||
|
print("Argument 3:(output featurized training data path) %s" % args.output_featurization)
|
||||||
|
|
||||||
|
dflow_prepared = dprep.read_csv(args.input_featurization + '/part-*')
|
||||||
|
|
||||||
|
# These functions extract useful features for training
|
||||||
|
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models for more detail
|
||||||
|
|
||||||
|
useful_columns = [s.strip().strip("'") for s in args.useful_columns.strip("[]").split("\;")]
|
||||||
|
dflow = dflow_prepared.keep_columns(useful_columns)
|
||||||
|
|
||||||
|
if not (args.output_featurization is None):
|
||||||
|
os.makedirs(args.output_featurization, exist_ok=True)
|
||||||
|
print("%s created" % args.output_featurization)
|
||||||
|
write_df = dflow.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_featurization))
|
||||||
|
write_df.run_local()
|
||||||
@@ -0,0 +1,12 @@
|
|||||||
|
|
||||||
|
import os
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
|
||||||
|
def get_data():
|
||||||
|
print("In get_data")
|
||||||
|
print(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'])
|
||||||
|
X_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'] + "/part-00000", header=0)
|
||||||
|
y_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_y'] + "/part-00000", header=0)
|
||||||
|
|
||||||
|
return {"X": X_train.values, "y": y_train.values.flatten()}
|
||||||
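Because `get_data()` above only depends on the two data-reference environment variables, it can be smoke-tested locally by pointing them at folders that each contain a `part-00000` CSV. This is a hypothetical sketch; the folder paths and the `get_data.py` file name are placeholders, not repository values.

```python
# Hypothetical local smoke test for the get_data() script above.
# The folder paths are placeholders and must each contain a part-00000 CSV.
import os

os.environ["AZUREML_DATAREFERENCE_output_split_train_x"] = "/tmp/split/train_x"
os.environ["AZUREML_DATAREFERENCE_output_split_train_y"] = "/tmp/split/train_y"

from get_data import get_data  # assumes the script is saved as get_data.py on the import path

data = get_data()
print(data["X"].shape, data["y"].shape)
```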
@@ -0,0 +1,48 @@
|
|||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import azureml.dataprep as dprep
|
||||||
|
import azureml.core
|
||||||
|
from sklearn.model_selection import train_test_split
|
||||||
|
|
||||||
|
|
||||||
|
def write_output(df, path):
|
||||||
|
os.makedirs(path, exist_ok=True)
|
||||||
|
print("%s created" % path)
|
||||||
|
df.to_csv(path + "/part-00000", index=False)
|
||||||
|
|
||||||
|
|
||||||
|
print("Split the data into train and test")
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser("split")
|
||||||
|
parser.add_argument("--input_split_features", type=str, help="input split features")
|
||||||
|
parser.add_argument("--input_split_labels", type=str, help="input split labels")
|
||||||
|
parser.add_argument("--output_split_train_x", type=str, help="output split train features")
|
||||||
|
parser.add_argument("--output_split_train_y", type=str, help="output split train labels")
|
||||||
|
parser.add_argument("--output_split_test_x", type=str, help="output split test features")
|
||||||
|
parser.add_argument("--output_split_test_y", type=str, help="output split test labels")
|
||||||
|
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
print("Argument 1(input taxi data features path): %s" % args.input_split_features)
|
||||||
|
print("Argument 2(input taxi data labels path): %s" % args.input_split_labels)
|
||||||
|
print("Argument 3(output training features split path): %s" % args.output_split_train_x)
|
||||||
|
print("Argument 4(output training labels split path): %s" % args.output_split_train_y)
|
||||||
|
print("Argument 5(output test features split path): %s" % args.output_split_test_x)
|
||||||
|
print("Argument 6(output test labels split path): %s" % args.output_split_test_y)
|
||||||
|
|
||||||
|
x_df = dprep.read_csv(path=args.input_split_features, header=dprep.PromoteHeadersMode.GROUPED).to_pandas_dataframe()
|
||||||
|
y_df = dprep.read_csv(path=args.input_split_labels, header=dprep.PromoteHeadersMode.GROUPED).to_pandas_dataframe()
|
||||||
|
|
||||||
|
# Split the input features and labels into train and test data
|
||||||
|
# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models for more detail
|
||||||
|
|
||||||
|
x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223)
|
||||||
|
|
||||||
|
if not (args.output_split_train_x is None and
|
||||||
|
args.output_split_test_x is None and
|
||||||
|
args.output_split_train_y is None and
|
||||||
|
args.output_split_test_y is None):
|
||||||
|
write_output(x_train, args.output_split_train_x)
|
||||||
|
write_output(y_train, args.output_split_train_y)
|
||||||
|
write_output(x_test, args.output_split_test_x)
|
||||||
|
write_output(y_test, args.output_split_test_y)
|
||||||
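The prep, featurization, and split scripts above all follow the same pattern: inputs and outputs are passed as command-line arguments, and downstream code (such as the `get_data()` script shown earlier) finds the materialized data through `AZUREML_DATAREFERENCE_<name>` environment variables. As a rough, hypothetical sketch of how the split script could be wired up as an Azure ML pipeline step (the step name, script file name, compute target name, and datastore paths below are illustrative assumptions, not taken from this repository):

```python
# Hypothetical sketch only: wiring the train/test split script above into an
# Azure ML pipeline step. Names and paths are assumptions for illustration.
from azureml.core import Workspace, Experiment
from azureml.data.data_reference import DataReference
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Inputs: assumed to already exist on the default datastore under these paths.
features = DataReference(datastore=datastore, data_reference_name="input_split_features",
                         path_on_datastore="taxi/features")
labels = DataReference(datastore=datastore, data_reference_name="input_split_labels",
                       path_on_datastore="taxi/labels")

# Outputs: the PipelineData names become the AZUREML_DATAREFERENCE_<name>
# environment variables that the get_data() script reads.
train_x = PipelineData("output_split_train_x", datastore=datastore)
train_y = PipelineData("output_split_train_y", datastore=datastore)
test_x = PipelineData("output_split_test_x", datastore=datastore)
test_y = PipelineData("output_split_test_y", datastore=datastore)

split_step = PythonScriptStep(
    name="train-test-split",
    script_name="train_test_split.py",    # assumed file name of the script shown above
    source_directory="scripts",           # assumed folder
    compute_target="cpu-cluster",         # assumed compute target name
    arguments=["--input_split_features", features,
               "--input_split_labels", labels,
               "--output_split_train_x", train_x,
               "--output_split_train_y", train_y,
               "--output_split_test_x", test_x,
               "--output_split_test_y", test_y],
    inputs=[features, labels],
    outputs=[train_x, train_y, test_x, test_y])

pipeline = Pipeline(workspace=ws, steps=[split_step])
run = Experiment(ws, "taxi-data-pipeline").submit(pipeline)
```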
@@ -299,7 +299,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You can also specify a custom Docker image for exeution. In this case, you probably want to tell the system not to build a new conda environment for you. Instead, you can specify the path to an existing Python environment in the custom Docker image.\n",
|
"You can also specify a custom Docker image for execution. In this case, you probably want to tell the system not to build a new conda environment for you. Instead, you can specify the path to an existing Python environment in the custom Docker image. If custom Docker image information is not specified, Azure ML uses the default Docker image to run your training. For more information about Docker containers used in Azure ML training, please see [Azure ML Containers repository](https://github.com/Azure/AzureML-Containers).\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**Note**: since the below example points to the preinstalled Python environment in the miniconda3 image maintained by continuum.io on Docker Hub where Azure ML SDK is not present, the logging metric code is not triggered. But a run history record is still recorded. "
|
"**Note**: since the below example points to the preinstalled Python environment in the miniconda3 image maintained by continuum.io on Docker Hub where Azure ML SDK is not present, the logging metric code is not triggered. But a run history record is still recorded. "
|
||||||
]
|
]
|
||||||
|
|||||||
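For reference, a run configuration along the lines described in the cell above might look like the following sketch. This is not the notebook's own code; the image name and interpreter path are assumptions based on the note about the continuum.io miniconda3 image on Docker Hub.

```python
# Sketch: run against a custom Docker image and reuse its preinstalled Python
# instead of letting Azure ML build a conda environment. Values are assumptions.
from azureml.core.runconfig import RunConfiguration

run_config = RunConfiguration()
run_config.environment.docker.enabled = True
run_config.environment.docker.base_image = "continuumio/miniconda3"        # image on Docker Hub
run_config.environment.python.user_managed_dependencies = True             # skip the conda env build
run_config.environment.python.interpreter_path = "/opt/conda/bin/python"   # assumed path inside the image
```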
@@ -115,7 +115,7 @@
|
|||||||
"from azureml.core.compute_target import ComputeTargetException\n",
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for your cluster\n",
|
"# choose a name for your cluster\n",
|
||||||
"cluster_name = \"gpucluster\"\n",
|
"cluster_name = \"gpu-cluster\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"try:\n",
|
"try:\n",
|
||||||
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
||||||
@@ -264,7 +264,8 @@
|
|||||||
" script_params=script_params,\n",
|
" script_params=script_params,\n",
|
||||||
" compute_target=compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
" entry_script='pytorch_train.py',\n",
|
" entry_script='pytorch_train.py',\n",
|
||||||
" use_gpu=True)"
|
" use_gpu=True,\n",
|
||||||
|
" pip_packages=['pillow==5.4.1'])"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -366,7 +367,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveRunConfig, uniform, PrimaryMetricGoal\n",
|
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, uniform, PrimaryMetricGoal\n",
|
||||||
"\n",
|
"\n",
|
||||||
"param_sampling = RandomParameterSampling( {\n",
|
"param_sampling = RandomParameterSampling( {\n",
|
||||||
" 'learning_rate': uniform(0.0005, 0.005),\n",
|
" 'learning_rate': uniform(0.0005, 0.005),\n",
|
||||||
@@ -376,7 +377,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)\n",
|
"early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n",
|
"hyperdrive_config = HyperDriveConfig(estimator=estimator,\n",
|
||||||
" hyperparameter_sampling=param_sampling, \n",
|
" hyperparameter_sampling=param_sampling, \n",
|
||||||
" policy=early_termination_policy,\n",
|
" policy=early_termination_policy,\n",
|
||||||
" primary_metric_name='best_val_acc',\n",
|
" primary_metric_name='best_val_acc',\n",
|
||||||
@@ -399,7 +400,7 @@
|
|||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# start the HyperDrive run\n",
|
"# start the HyperDrive run\n",
|
||||||
"hyperdrive_run = experiment.submit(hyperdrive_run_config)"
|
"hyperdrive_run = experiment.submit(hyperdrive_config)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -100,7 +100,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"This notebook was created using SDK version 1.0.43, you are currently running version\", azureml.core.VERSION)"
|
"print(\"This notebook was created using SDK version 1.0.45, you are currently running version\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -4,8 +4,8 @@
|
|||||||
|
|
||||||
Try out the sample notebooks:
|
Try out the sample notebooks:
|
||||||
|
|
||||||
* [Use MLflow with Azure Machine Learning for local training run](./train-local/train-local.ipynb)
|
* [Use MLflow with Azure Machine Learning for Local Training Run](./train-local/train-local.ipynb)
|
||||||
* [Use MLflow with Azure Machine Learning for remote training run](./train-remote/train-remote.ipynb)
|
* [Use MLflow with Azure Machine Learning for Remote Training Run](./train-remote/train-remote.ipynb)
|
||||||
* [Deploy Model as Azure Machine Learning web service using MLflow](./deploy-model/deploy-model.ipynb)
|
* [Deploy Model as Azure Machine Learning Web Service using MLflow](./deploy-model/deploy-model.ipynb)
|
||||||
|
|
||||||

|

|
||||||
@@ -0,0 +1,150 @@
|
|||||||
|
# Copyright (c) 2017, PyTorch Team
|
||||||
|
# All rights reserved
|
||||||
|
# Licensed under BSD 3-Clause License.
|
||||||
|
|
||||||
|
# This example is based on PyTorch MNIST example:
|
||||||
|
# https://github.com/pytorch/examples/blob/master/mnist/main.py
|
||||||
|
|
||||||
|
import mlflow
|
||||||
|
import mlflow.pytorch
|
||||||
|
from mlflow.utils.environment import _mlflow_conda_env
|
||||||
|
import warnings
|
||||||
|
import cloudpickle
|
||||||
|
import torch
|
||||||
|
import torch.nn as nn
|
||||||
|
import torch.nn.functional as F
|
||||||
|
import torch.optim as optim
|
||||||
|
import torchvision
|
||||||
|
from torchvision import datasets, transforms
|
||||||
|
|
||||||
|
|
||||||
|
class Net(nn.Module):
|
||||||
|
def __init__(self):
|
||||||
|
super(Net, self).__init__()
|
||||||
|
self.conv1 = nn.Conv2d(1, 20, 5, 1)
|
||||||
|
self.conv2 = nn.Conv2d(20, 50, 5, 1)
|
||||||
|
self.fc1 = nn.Linear(4 * 4 * 50, 500)
|
||||||
|
self.fc2 = nn.Linear(500, 10)
|
||||||
|
|
||||||
|
def forward(self, x):
|
||||||
|
# Added the view for reshaping score requests
|
||||||
|
x = x.view(-1, 1, 28, 28)
|
||||||
|
x = F.relu(self.conv1(x))
|
||||||
|
x = F.max_pool2d(x, 2, 2)
|
||||||
|
x = F.relu(self.conv2(x))
|
||||||
|
x = F.max_pool2d(x, 2, 2)
|
||||||
|
x = x.view(-1, 4 * 4 * 50)
|
||||||
|
x = F.relu(self.fc1(x))
|
||||||
|
x = self.fc2(x)
|
||||||
|
return F.log_softmax(x, dim=1)
|
||||||
|
|
||||||
|
|
||||||
|
def train(args, model, device, train_loader, optimizer, epoch):
|
||||||
|
model.train()
|
||||||
|
for batch_idx, (data, target) in enumerate(train_loader):
|
||||||
|
data, target = data.to(device), target.to(device)
|
||||||
|
optimizer.zero_grad()
|
||||||
|
output = model(data)
|
||||||
|
loss = F.nll_loss(output, target)
|
||||||
|
loss.backward()
|
||||||
|
optimizer.step()
|
||||||
|
if batch_idx % args.log_interval == 0:
|
||||||
|
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
|
||||||
|
epoch, batch_idx * len(data), len(train_loader.dataset),
|
||||||
|
100. * batch_idx / len(train_loader), loss.item()))
|
||||||
|
# Use MLflow logging
|
||||||
|
mlflow.log_metric("epoch_loss", loss.item())
|
||||||
|
|
||||||
|
|
||||||
|
def test(args, model, device, test_loader):
|
||||||
|
model.eval()
|
||||||
|
test_loss = 0
|
||||||
|
correct = 0
|
||||||
|
with torch.no_grad():
|
||||||
|
for data, target in test_loader:
|
||||||
|
data, target = data.to(device), target.to(device)
|
||||||
|
output = model(data)
|
||||||
|
# sum up batch loss
|
||||||
|
test_loss += F.nll_loss(output, target, reduction="sum").item()
|
||||||
|
# get the index of the max log-probability
|
||||||
|
pred = output.argmax(dim=1, keepdim=True)
|
||||||
|
correct += pred.eq(target.view_as(pred)).sum().item()
|
||||||
|
|
||||||
|
test_loss /= len(test_loader.dataset)
|
||||||
|
print("\n")
|
||||||
|
print("Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n".format(
|
||||||
|
test_loss, correct, len(test_loader.dataset),
|
||||||
|
100. * correct / len(test_loader.dataset)))
|
||||||
|
# Use MLflow logging
|
||||||
|
mlflow.log_metric("average_loss", test_loss)
|
||||||
|
|
||||||
|
|
||||||
|
class Args(object):
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
# Training settings
|
||||||
|
args = Args()
|
||||||
|
setattr(args, 'batch_size', 64)
|
||||||
|
setattr(args, 'test_batch_size', 1000)
|
||||||
|
setattr(args, 'epochs', 3) # Higher number for better convergence
|
||||||
|
setattr(args, 'lr', 0.01)
|
||||||
|
setattr(args, 'momentum', 0.5)
|
||||||
|
setattr(args, 'no_cuda', True)
|
||||||
|
setattr(args, 'seed', 1)
|
||||||
|
setattr(args, 'log_interval', 10)
|
||||||
|
setattr(args, 'save_model', True)
|
||||||
|
|
||||||
|
use_cuda = not args.no_cuda and torch.cuda.is_available()
|
||||||
|
|
||||||
|
torch.manual_seed(args.seed)
|
||||||
|
|
||||||
|
device = torch.device("cuda" if use_cuda else "cpu")
|
||||||
|
|
||||||
|
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
|
||||||
|
train_loader = torch.utils.data.DataLoader(
|
||||||
|
datasets.MNIST('../data', train=True, download=True,
|
||||||
|
transform=transforms.Compose([
|
||||||
|
transforms.ToTensor(),
|
||||||
|
transforms.Normalize((0.1307,), (0.3081,))
|
||||||
|
])),
|
||||||
|
batch_size=args.batch_size, shuffle=True, **kwargs)
|
||||||
|
test_loader = torch.utils.data.DataLoader(
|
||||||
|
datasets.MNIST(
|
||||||
|
'../data',
|
||||||
|
train=False,
|
||||||
|
transform=transforms.Compose([
|
||||||
|
transforms.ToTensor(),
|
||||||
|
transforms.Normalize((0.1307,), (0.3081,))])),
|
||||||
|
batch_size=args.test_batch_size, shuffle=True, **kwargs)
|
||||||
|
|
||||||
|
|
||||||
|
def driver():
|
||||||
|
warnings.filterwarnings("ignore")
|
||||||
|
# Dependencies for deploying the model
|
||||||
|
pytorch_index = "https://download.pytorch.org/whl/"
|
||||||
|
pytorch_version = "cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl"
|
||||||
|
deps = [
|
||||||
|
"cloudpickle=={}".format(cloudpickle.__version__),
|
||||||
|
pytorch_index + pytorch_version,
|
||||||
|
"torchvision=={}".format(torchvision.__version__),
|
||||||
|
"Pillow=={}".format("6.0.0")
|
||||||
|
]
|
||||||
|
with mlflow.start_run() as run:
|
||||||
|
model = Net().to(device)
|
||||||
|
optimizer = optim.SGD(
|
||||||
|
model.parameters(),
|
||||||
|
lr=args.lr,
|
||||||
|
momentum=args.momentum)
|
||||||
|
for epoch in range(1, args.epochs + 1):
|
||||||
|
train(args, model, device, train_loader, optimizer, epoch)
|
||||||
|
test(args, model, device, test_loader)
|
||||||
|
# Log model to run history using MLflow
|
||||||
|
if args.save_model:
|
||||||
|
model_env = _mlflow_conda_env(additional_pip_deps=deps)
|
||||||
|
mlflow.pytorch.log_model(model, "model", conda_env=model_env)
|
||||||
|
return run
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
driver()
|
||||||
@@ -0,0 +1,481 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier\n",
|
||||||
|
"\n",
|
||||||
|
"This example shows you how to use MLflow together with Azure Machine Learning services for tracking the metrics and artifacts while training a PyTorch model to classify MNIST digit images, and then deploy the model as a web service. You'll learn how to:\n",
|
||||||
|
"\n",
|
||||||
|
" 1. Set up MLflow tracking URI so as to use Azure ML\n",
|
||||||
|
" 2. Create experiment\n",
|
||||||
|
" 3. Instrument your model with MLflow tracking\n",
|
||||||
|
" 4. Train a PyTorch model locally\n",
|
||||||
|
" 5. Train a model on GPU compute on Azure\n",
|
||||||
|
" 6. View your experiment within your Azure ML Workspace in Azure Portal\n",
|
||||||
|
" 7. Create a Docker image from the trained model\n",
|
||||||
|
" 8. Deploy the model as a web service on Azure Container Instance\n",
|
||||||
|
" 9. Call the model to make predictions\n",
|
||||||
|
" \n",
|
||||||
|
"### Pre-requisites\n",
|
||||||
|
" \n",
|
||||||
|
"Make sure you have completed the [Configuration](../../../configuration.ipnyb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n",
|
||||||
|
"\n",
|
||||||
|
"Also, install mlflow-azureml package using ```pip install mlflow-azureml```. Note that mlflow-azureml installs mlflow package itself as a dependency, if you haven't done so previously.\n",
|
||||||
|
"\n",
|
||||||
|
"### Set-up\n",
|
||||||
|
"\n",
|
||||||
|
"Import packages and check versions of Azure ML SDK and MLflow installed on your computer. Then connect to your Workspace."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import sys, os\n",
|
||||||
|
"import mlflow\n",
|
||||||
|
"import mlflow.azureml\n",
|
||||||
|
"import mlflow.sklearn\n",
|
||||||
|
"\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"from azureml.core import Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"SDK version:\", azureml.core.VERSION)\n",
|
||||||
|
"print(\"MLflow version:\", mlflow.version.VERSION)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"ws.get_details()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Set tracking URI\n",
|
||||||
|
"\n",
|
||||||
|
"Set the MLFlow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLFlow APIs will go to Azure ML services and will be tracked under your Workspace."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create Experiment\n",
|
||||||
|
"\n",
|
||||||
|
"In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"experiment_name = \"pytorch-with-mlflow\"\n",
|
||||||
|
"mlflow.set_experiment(experiment_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Train model locally while logging metrics and artifacts\n",
|
||||||
|
"\n",
|
||||||
|
"The ```scripts/train.py``` program contains the code to load the image dataset, and train and test the model. Within this program, the train.driver function wraps the end-to-end workflow.\n",
|
||||||
|
"\n",
|
||||||
|
"Within the driver, the ```mlflow.start_run``` starts MLflow tracking. Then, ```mlflow.log_metric``` functions are used to track the convergence of the neural network training iterations. Finally ```mlflow.pytorch.save_model``` is used to save the trained model in framework-aware manner.\n",
|
||||||
|
"\n",
|
||||||
|
"Let's add the program to search path, import it as a module, and then invoke the driver function. Note that the training can take few minutes."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"lib_path = os.path.abspath(\"scripts\")\n",
|
||||||
|
"sys.path.append(lib_path)\n",
|
||||||
|
"\n",
|
||||||
|
"import train"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run = train.driver()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can view the metrics of the run at Azure Portal"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"print(azureml.mlflow.get_portal_url(run))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Train model on GPU compute on Azure\n",
|
||||||
|
"\n",
|
||||||
|
"Next, let's run the same script on GPU-enabled compute for faster training. If you've completed the the [Configuration](../../../configuration.ipnyb) notebook, you should have a GPU cluster named \"gpu-cluster\" available in your workspace. Otherwise, follow the instructions in the notebook to create one. For simplicity, this example uses single process on single VM to train the model.\n",
|
||||||
|
"\n",
|
||||||
|
"Create a PyTorch estimator to specify the training configuration: script, compute as well as additional packages needed. To enable MLflow tracking, include ```azureml-mlflow``` as pip package. The low-level specifications for the training run are encapsulated in the estimator instance."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.train.dnn import PyTorch\n",
|
||||||
|
"\n",
|
||||||
|
"pt = PyTorch(source_directory=\"./scripts\", \n",
|
||||||
|
" entry_script = \"train.py\", \n",
|
||||||
|
" compute_target = \"gpu-cluster\", \n",
|
||||||
|
" node_count = 1, \n",
|
||||||
|
" process_count_per_node = 1, \n",
|
||||||
|
" use_gpu=True,\n",
|
||||||
|
" pip_packages = [\"azureml-mlflow\", \"Pillow==6.0.0\"])\n",
|
||||||
|
"\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Get a reference to the experiment you created previously, but this time, as Azure Machine Learning experiment object.\n",
|
||||||
|
"\n",
|
||||||
|
"Then, use ```Experiment.submit``` method to start the remote training run. Note that the first training run often takes longer as Azure Machine Learning service builds the Docker image for executing the script. Subsequent runs will be faster as cached image is used."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Experiment\n",
|
||||||
|
"\n",
|
||||||
|
"exp = Experiment(ws, experiment_name)\n",
|
||||||
|
"run = exp.submit(pt)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can monitor the run and its metrics on Azure Portal."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Also, you can wait for run to complete."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run.wait_for_completion(show_output=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy model as web service\n",
|
||||||
|
"\n",
|
||||||
|
"To deploy a web service, first create a Docker image, and then deploy that Docker image on inferencing compute.\n",
|
||||||
|
"\n",
|
||||||
|
"The ```mlflow.azureml.build_image``` function builds a Docker image from saved PyTorch model in a framework-aware manner. It automatically creates the PyTorch-specific inferencing wrapper code and specififies package dependencies for you."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run.get_file_names()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Then build a docker image using *runs:/<run.id>/model* as the model_uri path.\n",
|
||||||
|
"\n",
|
||||||
|
"Note that the image building can take several minutes."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model_path = \"model\"\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"azure_image, azure_model = mlflow.azureml.build_image(model_uri='runs:/{}/{}'.format(run.id, model_path),\n",
|
||||||
|
" workspace=ws,\n",
|
||||||
|
" model_name='pytorch_mnist',\n",
|
||||||
|
" image_name='pytorch-mnist-img',\n",
|
||||||
|
" synchronous=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Then, deploy the Docker image to Azure Container Instance: a serverless compute capable of running a single container. You can tag and add descriptions to help keep track of your web service. \n",
|
||||||
|
"\n",
|
||||||
|
"[Other inferencing compute choices](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where) include Azure Kubernetes Service which provides scalable endpoint suitable for production use.\n",
|
||||||
|
"\n",
|
||||||
|
"Note that the service deployment can take several minutes."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice, Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"aci_config = AciWebservice.deploy_configuration(cpu_cores=2, \n",
|
||||||
|
" memory_gb=5, \n",
|
||||||
|
" tags={\"data\": \"MNIST\", \"method\" : \"pytorch\"}, \n",
|
||||||
|
" description=\"Predict using webservice\")\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"# Deploy the image to Azure Container Instances (ACI) for real-time serving\n",
|
||||||
|
"webservice = Webservice.deploy_from_image(\n",
|
||||||
|
" image=azure_image, workspace=ws, name=\"pytorch-mnist-1\", deployment_config=aci_config)\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"webservice.wait_for_deployment()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Once the deployment has completed you can check the scoring URI of the web service."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"print(\"Scoring URI is: {}\".format(webservice.scoring_uri))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"In case of a service creation issue, you can use ```webservice.get_logs()``` to get logs to debug."
|
||||||
|
]
|
||||||
|
},
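For example, a one-line sketch, assuming the `webservice` object created above:

```python
# Print the container logs of the deployed service to diagnose deployment issues
print(webservice.get_logs())
```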
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Make predictions using web service\n",
|
||||||
|
"\n",
|
||||||
|
"To make the web service, create a test data set as normalized PyTorch tensors. \n",
|
||||||
|
"\n",
|
||||||
|
"Then, let's define a utility function that takes a random image and converts it into format and shape suitable for as input to PyTorch inferencing end-point. The conversion is done by: \n",
|
||||||
|
"\n",
|
||||||
|
" 1. Select a random (image, label) tuple\n",
|
||||||
|
" 2. Take the image and converting the tensor to NumPy array \n",
|
||||||
|
" 3. Reshape array into 1 x 1 x N array\n",
|
||||||
|
" * 1 image in batch, 1 color channel, N = 784 pixels for MNIST images\n",
|
||||||
|
" * Note also ```x = x.view(-1, 1, 28, 28)``` in net definition in ```train.py``` program to shape incoming scoring requests.\n",
|
||||||
|
" 4. Convert the NumPy array to list to make it into a built-in type.\n",
|
||||||
|
" 5. Create a dictionary {\"data\", <list>} that can be converted to JSON string for web service requests."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from torchvision import datasets, transforms\n",
|
||||||
|
"import random\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"\n",
|
||||||
|
"test_data = datasets.MNIST('../data', train=False, transform=transforms.Compose([\n",
|
||||||
|
" transforms.ToTensor(),\n",
|
||||||
|
" transforms.Normalize((0.1307,), (0.3081,))]))\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def get_random_image():\n",
|
||||||
|
" image_idx = random.randint(0,len(test_data))\n",
|
||||||
|
" image_as_tensor = test_data[image_idx][0]\n",
|
||||||
|
" return {\"data\": elem for elem in image_as_tensor.numpy().reshape(1,1,-1).tolist()}"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Then, invoke the web service using a random test image. Convert the dictionary containing the image to JSON string before passing it to web service.\n",
|
||||||
|
"\n",
|
||||||
|
"The response contains the raw scores for each label, with greater value indicating higher probability. Sort the labels and select the one with greatest score to get the prediction. Let's also plot the image sent to web service for comparison purposes."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%matplotlib inline\n",
|
||||||
|
"\n",
|
||||||
|
"import json\n",
|
||||||
|
"import matplotlib.pyplot as plt\n",
|
||||||
|
"\n",
|
||||||
|
"test_image = get_random_image()\n",
|
||||||
|
"\n",
|
||||||
|
"response = webservice.run(json.dumps(test_image))\n",
|
||||||
|
"\n",
|
||||||
|
"response = sorted(response[0].items(), key = lambda x: x[1], reverse = True)\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"Predicted label:\", response[0][0])\n",
|
||||||
|
"plt.imshow(np.array(test_image[\"data\"]).reshape(28,28), cmap = \"gray\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can also call the web service using a raw POST method against the web service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import requests\n",
|
||||||
|
"\n",
|
||||||
|
"response = requests.post(url=webservice.scoring_uri, data=json.dumps(test_image),headers={\"Content-type\": \"application/json\"})\n",
|
||||||
|
"print(response.text)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": []
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "roastala"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"celltoolbar": "Edit Metadata",
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.7.3"
|
||||||
|
},
|
||||||
|
"name": "mlflow-sparksummit-pytorch",
|
||||||
|
"notebookId": 2495374963457641
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 1
|
||||||
|
}
|
||||||
@@ -243,7 +243,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.id"
|
"run"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -1,5 +1,12 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
|
|||||||
@@ -61,7 +61,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"To create and register Datasets you need:\n",
|
"To create and register Datasets you need:\n",
|
||||||
"\n",
|
"\n",
|
||||||
" * An Azure subscription. If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today.\n",
|
" * An Azure subscription. If you don\u00e2\u20ac\u2122t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today.\n",
|
||||||
"\n",
|
"\n",
|
||||||
" * An Azure Machine Learning service workspace. See the [Create an Azure Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace?branch=release-build-amls).\n",
|
" * An Azure Machine Learning service workspace. See the [Create an Azure Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace?branch=release-build-amls).\n",
|
||||||
"\n",
|
"\n",
|
||||||
@@ -399,6 +399,13 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"You have now finished using a dataset from start to finish of your experiment!"
|
"You have now finished using a dataset from start to finish of your experiment!"
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
@@ -408,9 +415,9 @@
|
|||||||
}
|
}
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3",
|
"display_name": "Python 3.6",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
"name": "python3"
|
"name": "python36"
|
||||||
},
|
},
|
||||||
"language_info": {
|
"language_info": {
|
||||||
"codemirror_mode": {
|
"codemirror_mode": {
|
||||||
@@ -422,7 +429,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.7.3"
|
"version": "3.6.4"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
|
|||||||
0
model-deployment/README.md
Normal file
0
model-deployment/README.md
Normal file
95
setup-environment/NBSETUP.md
Normal file
95
setup-environment/NBSETUP.md
Normal file
@@ -0,0 +1,95 @@
|
|||||||
|
# Set up your notebook environment for Azure Machine Learning
|
||||||
|
|
||||||
|
To run the notebooks in this repository use one of the following options.
|
||||||
|
|
||||||
|
## **Option 1: Use Azure Notebooks**
|
||||||
|
Azure Notebooks is a hosted Jupyter-based notebook service in the Azure cloud. The Azure Machine Learning Python SDK is already pre-installed in the Azure Notebooks `Python 3.6` kernel.
|
||||||
|
|
||||||
|
1. [](https://aka.ms/aml-clone-azure-notebooks)
|
||||||
|
[Import sample notebooks](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks
|
||||||
|
1. Follow the instructions in the [Configuration](configuration.ipynb) notebook to create and connect to a workspace
|
||||||
|
1. Open one of the sample notebooks
|
||||||
|
|
||||||
|
**Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook by choosing Kernel > Change Kernel > Python 3.6 from the menus.
|
||||||
|
|
||||||
|
## **Option 2: Use your own notebook server**
|
||||||
|
|
||||||
|
### Quick installation
|
||||||
|
We recommend you create a Python virtual environment ([Miniconda](https://conda.io/miniconda.html) preferred but [virtualenv](https://virtualenv.pypa.io/en/latest/) works too) and install the SDK in it.
|
||||||
|
```sh
|
||||||
|
# install just the base SDK
|
||||||
|
pip install azureml-sdk
|
||||||
|
|
||||||
|
# clone the sample repository
|
||||||
|
git clone https://github.com/Azure/MachineLearningNotebooks.git
|
||||||
|
|
||||||
|
# below steps are optional
|
||||||
|
# install the base SDK, Jupyter notebook server and tensorboard
|
||||||
|
pip install azureml-sdk[notebooks,tensorboard]
|
||||||
|
|
||||||
|
# install model explainability component
|
||||||
|
pip install azureml-sdk[explain]
|
||||||
|
|
||||||
|
# install automated ml components
|
||||||
|
pip install azureml-sdk[automl]
|
||||||
|
|
||||||
|
# install experimental features (not ready for production use)
|
||||||
|
pip install azureml-sdk[contrib]
|
||||||
|
```
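
If you go the Miniconda route, a minimal sketch of creating and activating the environment before running the installs above (the environment name and Python version are just examples):

```sh
# create and activate an isolated environment (name and version are examples)
conda create -n azuremlenv python=3.6 -y
conda activate azuremlenv

# then install the SDK into it
pip install azureml-sdk[notebooks]
```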
|
||||||
|
|
||||||
|
Note the _extras_ (the keywords inside the square brackets) can be combined. For example:
|
||||||
|
```sh
|
||||||
|
# install base SDK, Jupyter notebook and automated ml components
|
||||||
|
pip install azureml-sdk[notebooks,automl]
|
||||||
|
```
|
||||||
|
|
||||||
|
### Full instructions
|
||||||
|
[Install the Azure Machine Learning SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
|
||||||
|
|
||||||
|
Please make sure you start with the [Configuration](configuration.ipynb) notebook to create and connect to a workspace.
|
||||||
|
|
||||||
|
|
||||||
|
### Video walkthrough:
|
||||||
|
|
||||||
|
[!VIDEO https://youtu.be/VIsXeTuW3FU]
|
||||||
|
|
||||||
|
## **Option 3: Use Docker**
|
||||||
|
|
||||||
|
You need to have the Docker engine installed locally and running. Open a command-line window and type the following commands.
|
||||||
|
|
||||||
|
__Note:__ We use version `1.0.10` below as an example, but you can replace that with any available version number you like.
|
||||||
|
|
||||||
|
```sh
|
||||||
|
# clone the sample repository
|
||||||
|
git clone https://github.com/Azure/MachineLearningNotebooks.git
|
||||||
|
|
||||||
|
# change current directory to the folder
|
||||||
|
# where Dockerfile of the specific SDK version is located.
|
||||||
|
cd MachineLearningNotebooks/Dockerfiles/1.0.10
|
||||||
|
|
||||||
|
# build a Docker image with a name (azuremlsdk for example)
|
||||||
|
# and a version number tag (1.0.10 for example).
|
||||||
|
# this can take several minutes depending on your computer speed and network bandwidth.
|
||||||
|
docker build . -t azuremlsdk:1.0.10
|
||||||
|
|
||||||
|
# launch the built Docker container which also automatically starts
|
||||||
|
# a Jupyter server instance listening on port 8887 of the host machine
|
||||||
|
docker run -it -p 8887:8887 azuremlsdk:1.0.10
|
||||||
|
```
|
||||||
|
|
||||||
|
Now you can point your browser to http://localhost:8887. We recommend that you start from the `configuration.ipynb` notebook at the root directory.
|
||||||
|
|
||||||
|
If you need additional Azure ML SDK components, you can either modify the Docker files before you build the Docker images to add additional steps, or install them through command line in the live container after you build the Docker image. For example:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
# install the core SDK and automated ml components
|
||||||
|
pip install azureml-sdk[automl]
|
||||||
|
|
||||||
|
# install the core SDK and model explainability component
|
||||||
|
pip install azureml-sdk[explain]
|
||||||
|
|
||||||
|
# install the core SDK and experimental components
|
||||||
|
pip install azureml-sdk[contrib]
|
||||||
|
```
|
||||||
|
|
||||||
291
setup-environment/configuration.ipynb
Normal file
291
setup-environment/configuration.ipynb
Normal file
@@ -0,0 +1,291 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Configuration\n",
|
||||||
|
"\n",
|
||||||
|
"_**Setting up your Azure Machine Learning services workspace and configuring your notebook library**_\n",
|
||||||
|
"\n",
|
||||||
|
"---\n",
|
||||||
|
"---\n",
|
||||||
|
"\n",
|
||||||
|
"## Table of Contents\n",
|
||||||
|
"\n",
|
||||||
|
"1. [Introduction](#Introduction)\n",
|
||||||
|
" 1. What is an Azure Machine Learning workspace\n",
|
||||||
|
"1. [Setup](#Setup)\n",
|
||||||
|
" 1. Azure subscription\n",
|
||||||
|
" 1. Azure ML SDK and other library installation\n",
|
||||||
|
" 1. Azure Container Instance registration\n",
|
||||||
|
"1. [Configure your Azure ML Workspace](#Configure%20your%20Azure%20ML%20workspace)\n",
|
||||||
|
" 1. Workspace parameters\n",
|
||||||
|
" 1. Access your workspace\n",
|
||||||
|
" 1. Create a new workspace\n",
|
||||||
|
"1. [Next steps](#Next%20steps)\n",
|
||||||
|
"\n",
|
||||||
|
"---\n",
|
||||||
|
"\n",
|
||||||
|
"## Introduction\n",
|
||||||
|
"\n",
|
||||||
|
"This notebook configures your library of notebooks to connect to an Azure Machine Learning (ML) workspace. In this case, a library contains all of the notebooks in the current folder and any nested folders. You can configure this notebook library to use an existing workspace or create a new workspace.\n",
|
||||||
|
"\n",
|
||||||
|
"Typically you will need to run this notebook only once per notebook library as all other notebooks will use connection information that is written here. If you want to redirect your notebook library to work with a different workspace, then you should re-run this notebook.\n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook you will\n",
|
||||||
|
"* Learn about getting an Azure subscription\n",
|
||||||
|
"* Specify your workspace parameters\n",
|
||||||
|
"* Access or create your workspace\n",
|
||||||
|
"* Add a default compute cluster for your workspace\n",
|
||||||
|
"\n",
|
||||||
|
"### What is an Azure Machine Learning workspace\n",
|
||||||
|
"\n",
|
||||||
|
"An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inferencing, and the monitoring of deployed models."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"\n",
|
||||||
|
"This section describes activities required before you can access any Azure ML services functionality."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### 1. Azure Subscription\n",
|
||||||
|
"\n",
|
||||||
|
"In order to create an Azure ML Workspace, first you need access to an Azure subscription. An Azure subscription allows you to manage storage, compute, and other assets in the Azure cloud. You can [create a new subscription](https://azure.microsoft.com/en-us/free/) or access existing subscription information from the [Azure portal](https://portal.azure.com). Later in this notebook you will need information such as your subscription ID in order to create and access AML workspaces.\n",
|
||||||
|
"\n",
|
||||||
|
"### 2. Azure ML SDK and other library installation\n",
|
||||||
|
"\n",
|
||||||
|
"If you are running in your own environment, follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment). If you are running in Azure Notebooks or another Microsoft managed environment, the SDK is already installed.\n",
|
||||||
|
"\n",
|
||||||
|
"Also install following libraries to your environment. Many of the example notebooks depend on them\n",
|
||||||
|
"\n",
|
||||||
|
"```\n",
|
||||||
|
"(myenv) $ conda install -y matplotlib tqdm scikit-learn\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"Once installation is complete, the following cell checks the Azure ML SDK version:"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"install"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import azureml.core\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"This notebook was created using version 1.0.45 of the Azure ML SDK\")\n",
|
||||||
|
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"If you are using an older version of the SDK then this notebook was created using, you should upgrade your SDK.\n",
|
||||||
|
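"\n",
"For example, if your environment uses pip, the SDK can typically be upgraded with the command below (a sketch only; adjust it to your own environment and package manager):\n",
"\n",
"```\n",
"(myenv) $ pip install --upgrade azureml-sdk\n",
"```\n",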
"\n",
|
||||||
|
"### 3. Azure Container Instance registration\n",
|
||||||
|
"Azure Machine Learning uses of [Azure Container Instance (ACI)](https://azure.microsoft.com/services/container-instances) to deploy dev/test web services. An Azure subscription needs to be registered to use ACI. If you or the subscription owner have not yet registered ACI on your subscription, you will need to use the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and execute the following commands. Note that if you ran through the AML [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) you have already registered ACI. \n",
|
||||||
|
"\n",
|
||||||
|
"```shell\n",
|
||||||
|
"# check to see if ACI is already registered\n",
|
||||||
|
"(myenv) $ az provider show -n Microsoft.ContainerInstance -o table\n",
|
||||||
|
"\n",
|
||||||
|
"# if ACI is not registered, run this command.\n",
|
||||||
|
"# note you need to be the subscription owner in order to execute this command successfully.\n",
|
||||||
|
"(myenv) $ az provider register -n Microsoft.ContainerInstance\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"---"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Configure your Azure ML workspace\n",
|
||||||
|
"\n",
|
||||||
|
"### Workspace parameters\n",
|
||||||
|
"\n",
|
||||||
|
"To use an AML Workspace, you will need to import the Azure ML SDK and supply the following information:\n",
|
||||||
|
"* Your subscription id\n",
|
||||||
|
"* A resource group name\n",
|
||||||
|
"* (optional) The region that will host your workspace\n",
|
||||||
|
"* A name for your workspace\n",
|
||||||
|
"\n",
|
||||||
|
"You can get your subscription ID from the [Azure portal](https://portal.azure.com).\n",
|
||||||
|
"\n",
|
||||||
|
"You will also need access to a [_resource group_](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see what resource groups to which you have access, or create a new one in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n",
|
||||||
|
"\n",
|
||||||
|
"The region to host your workspace will be used if you are creating a new workspace. You do not need to specify this if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). You should pick a region that is close to your location or that contains your data.\n",
|
||||||
|
"\n",
|
||||||
|
"The name for your workspace is unique within the subscription and should be descriptive enough to discern among other AML Workspaces. The subscription may be used only by you, or it may be used by your department or your entire enterprise, so choose a name that makes sense for your situation.\n",
|
||||||
|
"\n",
|
||||||
|
"The following cell allows you to specify your workspace parameters. This cell uses the python method `os.getenv` to read values from environment variables which is useful for automation. If no environment variable exists, the parameters will be set to the specified default values. \n",
|
||||||
|
"\n",
|
||||||
|
"If you ran the Azure Machine Learning [quickstart](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) in Azure Notebooks, you already have a configured workspace! You can go to your Azure Machine Learning Getting Started library, view *config.json* file, and copy-paste the values for subscription ID, resource group and workspace name below.\n",
|
||||||
|
"\n",
|
||||||
|
"Replace the default values in the cell below with your workspace parameters"
|
||||||
|
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
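"# Values are read from the corresponding environment variables when they are set; otherwise the defaults below are used\n",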
"subscription_id = os.getenv(\"SUBSCRIPTION_ID\", default=\"<my-subscription-id>\")\n",
|
||||||
|
"resource_group = os.getenv(\"RESOURCE_GROUP\", default=\"<my-resource-group>\")\n",
|
||||||
|
"workspace_name = os.getenv(\"WORKSPACE_NAME\", default=\"<my-workspace-name>\")\n",
|
||||||
|
"workspace_region = os.getenv(\"WORKSPACE_REGION\", default=\"eastus2\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Access your workspace\n",
|
||||||
|
"\n",
|
||||||
|
"The following cell uses the Azure ML SDK to attempt to load the workspace specified by your parameters. If this cell succeeds, your notebook library will be configured to access the workspace from all notebooks using the `Workspace.from_config()` method. The cell can fail if the specified workspace doesn't exist or you don't have permissions to access it. "
|
||||||
|
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"try:\n",
"    ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)\n",
"    # write the details of the workspace to a configuration file to the notebook library\n",
"    ws.write_config()\n",
"    print(\"Workspace configuration succeeded. Skip the workspace creation steps below\")\n",
"except:\n",
"    print(\"Workspace not accessible. Change your parameters or create a new workspace below\")"
]
},
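{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the configuration file has been written, other notebooks in this library can load the workspace without repeating the parameters. The cell below is an optional, illustrative sketch (not part of the original setup steps): it assumes the `ws.write_config()` call above succeeded, reloads the workspace from the saved *config.json*, and prints a few of its properties."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"# Reload the workspace from the config.json written by ws.write_config() above\n",
"ws = Workspace.from_config()\n",
"\n",
"# Show a few basic properties to confirm the connection\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep=\"\\n\")"
]
},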
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a new workspace\n",
"\n",
"If you don't have an existing workspace and are the owner of the subscription or resource group, you can create a new workspace. If you don't have a resource group, the create workspace command will create one for you using the name you provide.\n",
"\n",
"**Note**: As with other Azure services, there are limits on certain resources (for example AmlCompute quota) associated with the Azure ML service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.\n",
"\n",
"This cell will create an Azure ML workspace for you in a subscription provided you have the correct permissions.\n",
|
||||||
|
"\n",
|
||||||
|
"This will fail if:\n",
|
||||||
|
"* You do not have permission to create a workspace in the resource group\n",
|
||||||
|
"* You do not have permission to create a resource group if it's non-existing.\n",
|
||||||
|
"* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription\n",
|
||||||
|
"\n",
|
||||||
|
"If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"create workspace"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"# Create the workspace using the specified parameters\n",
|
||||||
|
"ws = Workspace.create(name = workspace_name,\n",
|
||||||
|
" subscription_id = subscription_id,\n",
|
||||||
|
" resource_group = resource_group, \n",
|
||||||
|
" location = workspace_region,\n",
|
||||||
|
" create_resource_group = True,\n",
|
||||||
|
" exist_ok = True)\n",
|
||||||
|
"ws.get_details()\n",
|
||||||
|
"\n",
|
||||||
|
"# write the details of the workspace to a configuration file to the notebook library\n",
|
||||||
|
"ws.write_config()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
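{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a default compute cluster (optional)\n",
"\n",
"The introduction mentions adding a default compute cluster for your workspace. The cell below is a hedged sketch of one way to do that with the SDK's `AmlCompute` support; the cluster name `cpu-cluster`, the VM size `STANDARD_D2_V2`, and the node counts are illustrative choices rather than requirements, so adjust them to your subscription's quota and your workload."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"cluster_name = \"cpu-cluster\"  # illustrative name\n",
"\n",
"try:\n",
"    # Reuse the cluster if it already exists in the workspace\n",
"    cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
"    print(\"Found existing cluster, using it.\")\n",
"except ComputeTargetException:\n",
"    # Provision a small autoscaling CPU cluster that scales down to 0 nodes when idle\n",
"    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
"                                                           min_nodes=0,\n",
"                                                           max_nodes=4)\n",
"    cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"    cpu_cluster.wait_for_completion(show_output=True)"
]
},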
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"## Next steps\n",
"\n",
"In this notebook you configured this notebook library to connect easily to an Azure ML workspace. You can copy this notebook to your own libraries to connect them to you workspace, or use it to bootstrap new workspaces completely.\n",
|
||||||
|
"\n",
|
||||||
|
"If you came here from another notebook, you can return there and complete that exercise, or you can try out the [Tutorials](./tutorials) or jump into \"how-to\" notebooks and start creating and deploying models. A good place to start is the [train within notebook](./how-to-use-azureml/training/train-within-notebook) example that walks through a simplified but complete end to end machine learning process."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "roastala"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.5"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
training/README.md (new file, 0 lines changed)