mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-20 01:27:06 -05:00
662 lines
21 KiB
Plaintext
662 lines
21 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"# Quickstart: Train and deploy a model in Azure Machine Learning in 10 minutes\n",
|
|
"\n",
|
|
"In this quickstart, learn how to get started with Azure Machine Learning. You'll train an image classification model using the [MNIST](https://azure.microsoft.com/services/open-datasets/catalog/mnist/) dataset.\n",
|
|
"\n",
|
|
"You'll learn how to:\n",
|
|
"\n",
|
|
"> * Download a dataset and look at the data\n",
|
|
"> * Train an image classification model and log metrics\n",
|
|
"> * Deploy the model"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"## Connect to your workspace and create an experiment\n",
|
|
"\n",
|
|
"Import some libraries and create an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all users that have access to the workspace can collaborate on them."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612965916889
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import numpy as np\n",
|
|
"import matplotlib.pyplot as plt\n",
|
|
"\n",
|
|
"import azureml.core\n",
|
|
"from azureml.core import Workspace\n",
|
|
"from azureml.core import Experiment\n",
|
|
"\n",
|
|
"# connect to your workspace\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"\n",
|
|
"# create experiment and start logging to a new run in the experiment\n",
|
|
"experiment_name = \"azure-ml-in10-mins-tutorial\"\n",
|
|
"exp = Experiment(workspace=ws, name=experiment_name)\n",
|
|
"run = exp.start_logging(snapshot_directory=None)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"## Import Data\n",
|
|
"\n",
|
|
"Before you train a model, you need to understand the data you're using to train it. In this section, learn how to:\n",
|
|
"\n",
|
|
"* Download the MNIST dataset\n",
|
|
"* Display some sample images\n",
|
|
"\n",
|
|
"### Download the MNIST dataset\n",
|
|
"\n",
|
|
"You'll use Azure Open Datasets to get the raw MNIST data files. [Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for better models. Each dataset has a corresponding class, `MNIST` in this case, to retrieve the data in different ways."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612965922274
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import os\n",
|
|
"from azureml.core import Dataset\n",
|
|
"from azureml.opendatasets import MNIST\n",
|
|
"\n",
|
|
"data_folder = os.path.join(os.getcwd(), \"data\")\n",
|
|
"os.makedirs(data_folder, exist_ok=True)\n",
|
|
"\n",
|
|
"mnist_file_dataset = MNIST.get_file_dataset()\n",
|
|
"mnist_file_dataset.download(data_folder, overwrite=True)\n",
|
|
"\n",
|
|
"mnist_file_dataset = mnist_file_dataset.register(\n",
|
|
" workspace=ws,\n",
|
|
" name=\"mnist_opendataset\",\n",
|
|
" description=\"training and test dataset\",\n",
|
|
" create_new_version=True,\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"### Take a look at the data\n",
|
|
"\n",
|
|
"Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. \n",
|
|
"\n",
|
|
"Note this step requires a `load_data` function that's included in an `utils.py` file. This file is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612965929041
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"from utils import load_data\n",
|
|
"import matplotlib.pyplot as plt\n",
|
|
"import numpy as np\n",
|
|
"import glob\n",
|
|
"\n",
|
|
"\n",
|
|
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
|
|
"X_train = (\n",
|
|
" load_data(\n",
|
|
" glob.glob(\n",
|
|
" os.path.join(data_folder, \"**/train-images-idx3-ubyte.gz\"), recursive=True\n",
|
|
" )[0],\n",
|
|
" False,\n",
|
|
" )\n",
|
|
" / 255.0\n",
|
|
")\n",
|
|
"X_test = (\n",
|
|
" load_data(\n",
|
|
" glob.glob(\n",
|
|
" os.path.join(data_folder, \"**/t10k-images-idx3-ubyte.gz\"), recursive=True\n",
|
|
" )[0],\n",
|
|
" False,\n",
|
|
" )\n",
|
|
" / 255.0\n",
|
|
")\n",
|
|
"y_train = load_data(\n",
|
|
" glob.glob(\n",
|
|
" os.path.join(data_folder, \"**/train-labels-idx1-ubyte.gz\"), recursive=True\n",
|
|
" )[0],\n",
|
|
" True,\n",
|
|
").reshape(-1)\n",
|
|
"y_test = load_data(\n",
|
|
" glob.glob(\n",
|
|
" os.path.join(data_folder, \"**/t10k-labels-idx1-ubyte.gz\"), recursive=True\n",
|
|
" )[0],\n",
|
|
" True,\n",
|
|
").reshape(-1)\n",
|
|
"\n",
|
|
"\n",
|
|
"# now let's show some randomly chosen images from the traininng set.\n",
|
|
"count = 0\n",
|
|
"sample_size = 30\n",
|
|
"plt.figure(figsize=(16, 6))\n",
|
|
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
|
|
" count = count + 1\n",
|
|
" plt.subplot(1, sample_size, count)\n",
|
|
" plt.axhline(\"\")\n",
|
|
" plt.axvline(\"\")\n",
|
|
" plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
|
|
" plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
|
|
"plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"## Train model and log metrics\n",
|
|
"\n",
|
|
"You'll train the model using the code below. Your training runs and metrics will be registered in the experiment you created, so that this information is available after you've finished.\n",
|
|
"\n",
|
|
"You'll be using the [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier from the [SciKit Learn framework](https://scikit-learn.org/) to classify the data.\n",
|
|
"\n",
|
|
"> **Note: The model training takes around 1 minute to complete.**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612966046970
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# create the model\n",
|
|
"import numpy as np\n",
|
|
"from sklearn.linear_model import LogisticRegression\n",
|
|
"\n",
|
|
"reg = 0.5\n",
|
|
"clf = LogisticRegression(\n",
|
|
" C=1.0 / reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42\n",
|
|
")\n",
|
|
"clf.fit(X_train, y_train)\n",
|
|
"\n",
|
|
"# make predictions using the test set and calculate the accuracy\n",
|
|
"y_hat = clf.predict(X_test)\n",
|
|
"\n",
|
|
"# calculate accuracy on the prediction\n",
|
|
"acc = np.average(y_hat == y_test)\n",
|
|
"print(\"Accuracy is\", acc)\n",
|
|
"\n",
|
|
"run.log(\"regularization rate\", np.float(reg))\n",
|
|
"run.log(\"accuracy\", np.float(acc))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"\n",
|
|
"## Version control your models with the model registry\n",
|
|
"\n",
|
|
"You can use model registration to store and version your models in your workspace. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. Azure Machine Learning supports any model that can be loaded through Python 3.\n",
|
|
"\n",
|
|
"The code below:\n",
|
|
"\n",
|
|
"1. Saves the model to disk\n",
|
|
"1. Uploads the model file to the run \n",
|
|
"1. Registers the uploaded model file\n",
|
|
"1. Transitions the run to a completed state"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612881042710
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"import joblib\n",
|
|
"from azureml.core.model import Model\n",
|
|
"\n",
|
|
"path = \"sklearn_mnist_model.pkl\"\n",
|
|
"joblib.dump(value=clf, filename=path)\n",
|
|
"\n",
|
|
"run.upload_file(name=path, path_or_stream=path)\n",
|
|
"\n",
|
|
"model = run.register_model(\n",
|
|
" model_name=\"sklearn_mnist_model\",\n",
|
|
" model_path=path,\n",
|
|
" description=\"Mnist handwriting recognition\",\n",
|
|
")\n",
|
|
"\n",
|
|
"run.complete()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Deploy the model\n",
|
|
"\n",
|
|
"The next cell deploys the model to an Azure Container Instance so that you can score data in real-time (Azure Machine Learning also provides mechanisms to do batch scoring). A real-time endpoint allows application developers to integrate machine learning into their apps."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612881061728
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# create environment for the deploy\n",
|
|
"from azureml.core.environment import Environment\n",
|
|
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
|
"\n",
|
|
"# to install required packages\n",
|
|
"env = Environment(\"quickstart-env\")\n",
|
|
"cd = CondaDependencies.create(\n",
|
|
" pip_packages=[\"azureml-dataset-runtime[pandas,fuse]\", \"azureml-defaults\"],\n",
|
|
" conda_packages=[\"scikit-learn==0.22.1\"],\n",
|
|
")\n",
|
|
"\n",
|
|
"env.python.conda_dependencies = cd\n",
|
|
"\n",
|
|
"# Register environment to re-use later\n",
|
|
"env.register(workspace=ws)\n",
|
|
"\n",
|
|
"# create config file\n",
|
|
"from azureml.core.webservice import AciWebservice\n",
|
|
"\n",
|
|
"aciconfig = AciWebservice.deploy_configuration(\n",
|
|
" cpu_cores=1,\n",
|
|
" memory_gb=1,\n",
|
|
" tags={\"data\": \"MNIST\", \"method\": \"sklearn\"},\n",
|
|
" description=\"Predict MNIST with sklearn\",\n",
|
|
")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"> **Note: The deployment takes around 3 minutes to complete.**"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"%%time\n",
|
|
"import uuid\n",
|
|
"from azureml.core.webservice import Webservice\n",
|
|
"from azureml.core.model import InferenceConfig\n",
|
|
"from azureml.core.environment import Environment\n",
|
|
"from azureml.core import Workspace\n",
|
|
"from azureml.core.model import Model\n",
|
|
"\n",
|
|
"ws = Workspace.from_config()\n",
|
|
"model = Model(ws, \"sklearn_mnist_model\")\n",
|
|
"\n",
|
|
"\n",
|
|
"myenv = Environment.get(workspace=ws, name=\"quickstart-env\", version=\"1\")\n",
|
|
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n",
|
|
"\n",
|
|
"service_name = \"sklearn-mnist-svc-\" + str(uuid.uuid4())[:4]\n",
|
|
"service = Model.deploy(\n",
|
|
" workspace=ws,\n",
|
|
" name=service_name,\n",
|
|
" models=[model],\n",
|
|
" inference_config=inference_config,\n",
|
|
" deployment_config=aciconfig,\n",
|
|
")\n",
|
|
"\n",
|
|
"service.wait_for_deployment(show_output=True)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"The [*scoring script*](score.py) file referenced in the code above can be found in the same folder as this notebook, and has two functions:\n",
|
|
"\n",
|
|
"1. an `init` function that executes once when the service starts - in this function you normally get the model from the registry and set global variables\n",
|
|
"1. a `run(data)` function that executes each time a call is made to the service. In this function, you normally format the input data, run a prediction, and output the predicted result."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"## Test the model service\n",
|
|
"\n",
|
|
"You can test the model by sending a raw HTTP request to test the web service. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612881527399
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# scoring web service HTTP endpoint\n",
|
|
"print(service.scoring_uri)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612881538381
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# send raw HTTP request to test the web service.\n",
|
|
"import requests\n",
|
|
"\n",
|
|
"# send a random row from the test set to score\n",
|
|
"random_index = np.random.randint(0, len(X_test) - 1)\n",
|
|
"input_data = '{\"data\": [' + str(list(X_test[random_index])) + \"]}\"\n",
|
|
"\n",
|
|
"headers = {\"Content-Type\": \"application/json\"}\n",
|
|
"\n",
|
|
"# for AKS deployment you'd need to the service key in the header as well\n",
|
|
"# api_key = service.get_key()\n",
|
|
"# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}\n",
|
|
"\n",
|
|
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
|
"\n",
|
|
"print(\"POST to url\", service.scoring_uri)\n",
|
|
"# print(\"input data:\", input_data)\n",
|
|
"print(\"label:\", y_test[random_index])\n",
|
|
"print(\"prediction:\", resp.text)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"\n",
|
|
"### View the results of your training\n",
|
|
"\n",
|
|
"When you're finished with an experiment run, you can always return to view the results of your model training here in the Azure Machine Learning studio:\n",
|
|
"\n",
|
|
"1. Select **Experiments** (left-hand menu)\n",
|
|
"1. Select **azure-ml-in10-mins-tutorial**\n",
|
|
"1. Select **Run 1**\n",
|
|
"1. Select the **Metrics** Tab\n",
|
|
"\n",
|
|
"The metrics tab will display the parameter values that were logged to the run."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"source": [
|
|
"### View the model in the model registry\n",
|
|
"\n",
|
|
"You can see the stored model by navigating to **Models** in the left-hand menu bar. Select the **sklearn_mnist_model** to see the details of the model, including the experiment run ID that created the model."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Clean up resources\n",
|
|
"\n",
|
|
"If you're not going to continue to use this model, delete the Model service using:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {
|
|
"gather": {
|
|
"logged": 1612881556520
|
|
},
|
|
"jupyter": {
|
|
"outputs_hidden": false,
|
|
"source_hidden": false
|
|
},
|
|
"nteract": {
|
|
"transient": {
|
|
"deleting": false
|
|
}
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# if you want to keep workspace and only delete endpoint (it will incur cost while running)\n",
|
|
"service.delete()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"If you want to control cost further, stop the compute instance by selecting the \"Stop compute\" button next to the **Compute** dropdown. Then start the compute instance again the next time you need it."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"\n",
|
|
"## Next Steps\n",
|
|
"\n",
|
|
"In this quickstart, you learned how to run machine learning code in Azure Machine Learning.\n",
|
|
"\n",
|
|
"Now that you have working code in a development environment, learn how to submit a **_job_** - ideally on a schedule or trigger (for example, arrival of new data).\n",
|
|
"\n",
|
|
" [**Learn how to get started with Azure ML Job Submission**](../quickstart-azureml-python-sdk/quickstart-azureml-python-sdk.ipynb) "
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"authors": [
|
|
{
|
|
"name": "cewidste"
|
|
}
|
|
],
|
|
"kernelspec": {
|
|
"display_name": "Python 3.6",
|
|
"language": "python",
|
|
"name": "python36"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.6.9"
|
|
},
|
|
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
|
|
"nteract": {
|
|
"version": "nteract-front-end@1.0.0"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 4
|
|
} |