mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-20 01:27:06 -05:00
479 lines
16 KiB
Plaintext
479 lines
16 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
|
"\n",
|
|
"Licensed under the MIT License."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Automated Machine Learning\n",
|
|
"_**Classification with Deployment**_\n",
|
|
"\n",
|
|
"## Contents\n",
|
|
"1. [Introduction](#Introduction)\n",
|
|
"1. [Setup](#Setup)\n",
|
|
"1. [Train](#Train)\n",
|
|
"1. [Deploy](#Deploy)\n",
|
|
"1. [Test](#Test)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Introduction\n",
|
|
"\n",
|
|
"In this example we use the scikit learn's [digit dataset](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) to showcase how you can use AutoML for a simple classification problem and deploy it to an Azure Container Instance (ACI).\n",
|
|
"\n",
|
|
"Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
|
|
"\n",
|
|
"In this notebook you will learn how to:\n",
|
|
"1. Create an experiment using an existing workspace.\n",
|
|
"2. Configure AutoML using `AutoMLConfig`.\n",
|
|
"3. Train the model using local compute.\n",
|
|
"4. Explore the results.\n",
|
|
"5. Register the model.\n",
|
|
"6. Create a container image.\n",
|
|
"7. Create an Azure Container Instance (ACI) service.\n",
|
|
"8. Test the ACI service."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Setup\n",
|
|
"\n",
|
|
"As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import json\n",
|
|
"import logging\n",
|
|
"\n",
|
|
"from matplotlib import pyplot as plt\n",
|
|
"import numpy as np\n",
|
|
"import pandas as pd\n",
|
|
"from sklearn import datasets\n",
|
|
"\n",
|
|
"import azureml.core\n",
|
|
"from azureml.core.experiment import Experiment\n",
|
|
"from azureml.core.workspace import Workspace\n",
|
|
"from azureml.train.automl import AutoMLConfig\n",
|
|
"from azureml.train.automl.run import AutoMLRun"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"ws = Workspace.from_config()\n",
|
|
"\n",
|
|
"# choose a name for experiment\n",
|
|
"experiment_name = 'automl-classification-deployment'\n",
|
|
"\n",
|
|
"experiment=Experiment(ws, experiment_name)\n",
|
|
"\n",
|
|
"output = {}\n",
|
|
"output['SDK version'] = azureml.core.VERSION\n",
|
|
"output['Subscription ID'] = ws.subscription_id\n",
|
|
"output['Workspace'] = ws.name\n",
|
|
"output['Resource Group'] = ws.resource_group\n",
|
|
"output['Location'] = ws.location\n",
|
|
"output['Experiment Name'] = experiment.name\n",
|
|
"pd.set_option('display.max_colwidth', -1)\n",
|
|
"outputDf = pd.DataFrame(data = output, index = [''])\n",
|
|
"outputDf.T"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Train\n",
|
|
"\n",
|
|
"Instantiate a AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
|
|
"\n",
|
|
"|Property|Description|\n",
|
|
"|-|-|\n",
|
|
"|**task**|classification or regression|\n",
|
|
"|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
|
|
"|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n",
|
|
"|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n",
|
|
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
|
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
|
|
"|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"digits = datasets.load_digits()\n",
|
|
"X_train = digits.data[10:,:]\n",
|
|
"y_train = digits.target[10:]\n",
|
|
"\n",
|
|
"automl_config = AutoMLConfig(task = 'classification',\n",
|
|
" name = experiment_name,\n",
|
|
" debug_log = 'automl_errors.log',\n",
|
|
" primary_metric = 'AUC_weighted',\n",
|
|
" iteration_timeout_minutes = 20,\n",
|
|
" iterations = 10,\n",
|
|
" verbosity = logging.INFO,\n",
|
|
" X = X_train, \n",
|
|
" y = y_train)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n",
|
|
"In this example, we specify `show_output = True` to print currently running iterations to the console."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"local_run = experiment.submit(automl_config, show_output = True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"local_run"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Deploy\n",
|
|
"\n",
|
|
"### Retrieve the Best Model\n",
|
|
"\n",
|
|
"Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"best_run, fitted_model = local_run.get_output()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Register the Fitted Model for Deployment\n",
|
|
"If neither `metric` nor `iteration` are specified in the `register_model` call, the iteration with the best primary metric is registered."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"description = 'AutoML Model'\n",
|
|
"tags = None\n",
|
|
"model = local_run.register_model(description = description, tags = tags)\n",
|
|
"\n",
|
|
"print(local_run.model_id) # This will be written to the script file later in the notebook."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create Scoring Script"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%writefile score.py\n",
|
|
"import pickle\n",
|
|
"import json\n",
|
|
"import numpy\n",
|
|
"import azureml.train.automl\n",
|
|
"from sklearn.externals import joblib\n",
|
|
"from azureml.core.model import Model\n",
|
|
"\n",
|
|
"\n",
|
|
"def init():\n",
|
|
" global model\n",
|
|
" model_path = Model.get_model_path(model_name = '<<modelid>>') # this name is model.id of model that we want to deploy\n",
|
|
" # deserialize the model file back into a sklearn model\n",
|
|
" model = joblib.load(model_path)\n",
|
|
"\n",
|
|
"def run(rawdata):\n",
|
|
" try:\n",
|
|
" data = json.loads(rawdata)['data']\n",
|
|
" data = numpy.array(data)\n",
|
|
" result = model.predict(data)\n",
|
|
" except Exception as e:\n",
|
|
" result = str(e)\n",
|
|
" return json.dumps({\"error\": result})\n",
|
|
" return json.dumps({\"result\":result.tolist()})"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create a YAML File for the Environment"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"To ensure the fit results are consistent with the training results, the SDK dependency versions need to be the same as the environment that trains the model. The following cells create a file, myenv.yml, which specifies the dependencies from the run."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"experiment = Experiment(ws, experiment_name)\n",
|
|
"ml_run = AutoMLRun(experiment = experiment, run_id = local_run.id)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"dependencies = ml_run.get_run_sdk_dependencies(iteration = 7)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"for p in ['azureml-train-automl', 'azureml-core']:\n",
|
|
" print('{}\\t{}'.format(p, dependencies[p]))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
|
"\n",
|
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n",
|
|
" pip_packages=['azureml-defaults','azureml-train-automl'])\n",
|
|
"\n",
|
|
"conda_env_file_name = 'myenv.yml'\n",
|
|
"myenv.save_to_file('.', conda_env_file_name)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Substitute the actual version number in the environment file.\n",
|
|
"# This is not strictly needed in this notebook because the model should have been generated using the current SDK version.\n",
|
|
"# However, we include this in case this code is used on an experiment from a previous SDK version.\n",
|
|
"\n",
|
|
"with open(conda_env_file_name, 'r') as cefr:\n",
|
|
" content = cefr.read()\n",
|
|
"\n",
|
|
"with open(conda_env_file_name, 'w') as cefw:\n",
|
|
" cefw.write(content.replace(azureml.core.VERSION, dependencies['azureml-train-automl']))\n",
|
|
"\n",
|
|
"# Substitute the actual model id in the script file.\n",
|
|
"\n",
|
|
"script_file_name = 'score.py'\n",
|
|
"\n",
|
|
"with open(script_file_name, 'r') as cefr:\n",
|
|
" content = cefr.read()\n",
|
|
"\n",
|
|
"with open(script_file_name, 'w') as cefw:\n",
|
|
" cefw.write(content.replace('<<modelid>>', local_run.model_id))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Deploy the model as a Web Service on Azure Container Instance\n",
|
|
"\n",
|
|
"Create the configuration needed for deploying the model as a web service service."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.model import InferenceConfig\n",
|
|
"from azureml.core.webservice import AciWebservice\n",
|
|
"\n",
|
|
"inference_config = InferenceConfig(runtime = \"python\", \n",
|
|
" entry_script = script_file_name,\n",
|
|
" conda_file = conda_env_file_name)\n",
|
|
"\n",
|
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n",
|
|
" memory_gb = 1, \n",
|
|
" tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n",
|
|
" description = 'sample service for Automl Classification')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from azureml.core.webservice import Webservice\n",
|
|
"from azureml.core.model import Model\n",
|
|
"\n",
|
|
"aci_service_name = 'automl-sample-01'\n",
|
|
"print(aci_service_name)\n",
|
|
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
|
"aci_service.wait_for_deployment(True)\n",
|
|
"print(aci_service.state)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Get the logs from service deployment"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"if aci_service.state != 'Healthy':\n",
|
|
" # run this command for debugging.\n",
|
|
" print(aci_service.get_logs())"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Delete a Web Service"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#aci_service.delete()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Test"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Randomly select digits and test\n",
|
|
"digits = datasets.load_digits()\n",
|
|
"X_test = digits.data[:10, :]\n",
|
|
"y_test = digits.target[:10]\n",
|
|
"images = digits.images[:10]\n",
|
|
"\n",
|
|
"for index in np.random.choice(len(y_test), 3, replace = False):\n",
|
|
" print(index)\n",
|
|
" test_sample = json.dumps({'data':X_test[index:index + 1].tolist()})\n",
|
|
" predicted = aci_service.run(input_data = test_sample)\n",
|
|
" label = y_test[index]\n",
|
|
" predictedDict = json.loads(predicted)\n",
|
|
" title = \"Label value = %d Predicted value = %s \" % ( label,predictedDict['result'][0])\n",
|
|
" fig = plt.figure(1, figsize = (3,3))\n",
|
|
" ax1 = fig.add_axes((0,0,.8,.8))\n",
|
|
" ax1.set_title(title)\n",
|
|
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
|
|
" plt.show()"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "savitam"
|
|
}
|
|
],
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
} |