update samples from Release-93 as a part of SDK release

This commit is contained in:
amlrelsa-ms
2021-03-24 16:45:36 +00:00
parent 54a065c698
commit 8b32e8d5ad
46 changed files with 2981 additions and 1505 deletions

View File

@@ -23,6 +23,10 @@ The following tutorials are intended to provide an introductory overview of Azur
| [Deploy an image classification model](https://docs.microsoft.com/azure/machine-learning/tutorial-deploy-models-with-aml) | Deploy a scikit-learn image classification model to Azure Container Instances. | [img-classification-part2-deploy.ipynb](image-classification-mnist-data/img-classification-part2-deploy.ipynb) | Image Classification | Scikit-Learn
| [Deploy an encrypted inferencing service](https://docs.microsoft.com/azure/machine-learning/tutorial-deploy-models-with-aml) |Deploy an image classification model for encrypted inferencing in Azure Container Instances | [img-classification-part3-deploy-encrypted.ipynb](image-classification-mnist-data/img-classification-part3-deploy-encrypted.ipynb) | Image Classification | Scikit-Learn
| [Use automated machine learning to predict taxi fares](https://docs.microsoft.com/azure/machine-learning/tutorial-auto-train-models) | Train a regression model to predict taxi fares using Automated Machine Learning. | [regression-part2-automated-ml.ipynb](regression-automl-nyc-taxi-data/regression-automated-ml.ipynb) | Regression | Automated ML
| Azure ML in 10 minutes, to be run on a Compute Instance |Learn how to run an image classification model, track model metrics, and deploy a model in 10 minutes. | [AzureMLIn10mins.ipynb](quickstart-ci/AzureMLIn10mins.ipynb) | Image Classification | Scikit-Learn |
| Get started with Azure ML Job Submission, to be run on a Compute Instance |Learn how to use the Azure Machine Learning Python SDK to submit batch jobs. | [GettingStartedWithPythonSDK.ipynb](quickstart-ci/GettingStartedWithPythonSDK.ipynb) | Image Classification | Scikit-Learn |
| Get started with Automated ML, to be run on a Compute Instance | Learn how to use Automated ML for Fraud classification. | [ClassificationWithAutomatedML.ipynb](quickstart-ci/ClassificationWithAutomatedML.ipynb) | Classification | Automated ML |
## Advanced Samples

View File

@@ -337,7 +337,7 @@
" error_threshold=1,\n",
" compute_target=compute_target,\n",
" process_count_per_node=2,\n",
" node_count=1\n",
" node_count=2\n",
")"
]
},
@@ -367,10 +367,11 @@
"source": [
"from azureml.pipeline.steps import ParallelRunStep\n",
"from datetime import datetime\n",
"import uuid\n",
"\n",
"parallel_step_name = \"batchscoring-\" + datetime.now().strftime(\"%Y%m%d%H%M\")\n",
"\n",
"label_config = label_ds.as_named_input(\"labels_input\")\n",
"label_config = label_ds.as_named_input(\"labels_input\").as_mount(\"/tmp/{}\".format(str(uuid.uuid4())))\n",
"\n",
"batch_score_step = ParallelRunStep(\n",
" name=parallel_step_name,\n",

View File

@@ -0,0 +1,669 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/quickstart-ci/AzureMLin10mins.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"# Quickstart: Train and deploy a model in Azure Machine Learning in 10 minutes\n",
"\n",
"In this quickstart, learn how to get started with Azure Machine Learning. You'll train an image classification model using the [MNIST](https://azure.microsoft.com/services/open-datasets/catalog/mnist/) dataset.\n",
"\n",
"You'll learn how to:\n",
"\n",
"> * Download a dataset and look at the data\n",
"> * Train an image classification model and log metrics\n",
"> * Deploy the model"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Connect to your workspace and create an experiment\n",
"\n",
"Import some libraries and create an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all users that have access to the workspace can collaborate on them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965916889
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import azureml.core\n",
"from azureml.core import Workspace\n",
"from azureml.core import Experiment\n",
"\n",
"# connect to your workspace\n",
"ws = Workspace.from_config()\n",
"\n",
"# create experiment and start logging to a new run in the experiment\n",
"experiment_name = \"azure-ml-in10-mins-tutorial\"\n",
"exp = Experiment(workspace=ws, name=experiment_name)\n",
"run = exp.start_logging(snapshot_directory=None)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Import Data\n",
"\n",
"Before you train a model, you need to understand the data you're using to train it. In this section, learn how to:\n",
"\n",
"* Download the MNIST dataset\n",
"* Display some sample images\n",
"\n",
"### Download the MNIST dataset\n",
"\n",
"You'll use Azure Open Datasets to get the raw MNIST data files. [Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for better models. Each dataset has a corresponding class, `MNIST` in this case, to retrieve the data in different ways."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965922274
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import os\n",
"from azureml.core import Dataset\n",
"from azureml.opendatasets import MNIST\n",
"\n",
"data_folder = os.path.join(os.getcwd(), \"data\")\n",
"os.makedirs(data_folder, exist_ok=True)\n",
"\n",
"mnist_file_dataset = MNIST.get_file_dataset()\n",
"mnist_file_dataset.download(data_folder, overwrite=True)\n",
"\n",
"mnist_file_dataset = mnist_file_dataset.register(\n",
" workspace=ws,\n",
" name=\"mnist_opendataset\",\n",
" description=\"training and test dataset\",\n",
" create_new_version=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Take a look at the data\n",
"\n",
"Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. \n",
"\n",
"Note this step requires a `load_data` function that's included in an `utils.py` file. This file is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965929041
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from utils import load_data\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import glob\n",
"\n",
"\n",
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
"X_train = (\n",
" load_data(\n",
" glob.glob(\n",
" os.path.join(data_folder, \"**/train-images-idx3-ubyte.gz\"), recursive=True\n",
" )[0],\n",
" False,\n",
" )\n",
" / 255.0\n",
")\n",
"X_test = (\n",
" load_data(\n",
" glob.glob(\n",
" os.path.join(data_folder, \"**/t10k-images-idx3-ubyte.gz\"), recursive=True\n",
" )[0],\n",
" False,\n",
" )\n",
" / 255.0\n",
")\n",
"y_train = load_data(\n",
" glob.glob(\n",
" os.path.join(data_folder, \"**/train-labels-idx1-ubyte.gz\"), recursive=True\n",
" )[0],\n",
" True,\n",
").reshape(-1)\n",
"y_test = load_data(\n",
" glob.glob(\n",
" os.path.join(data_folder, \"**/t10k-labels-idx1-ubyte.gz\"), recursive=True\n",
" )[0],\n",
" True,\n",
").reshape(-1)\n",
"\n",
"\n",
"# now let's show some randomly chosen images from the traininng set.\n",
"count = 0\n",
"sample_size = 30\n",
"plt.figure(figsize=(16, 6))\n",
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
" count = count + 1\n",
" plt.subplot(1, sample_size, count)\n",
" plt.axhline(\"\")\n",
" plt.axvline(\"\")\n",
" plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
" plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Train model and log metrics\n",
"\n",
"You'll train the model using the code below. Your training runs and metrics will be registered in the experiment you created, so that this information is available after you've finished.\n",
"\n",
"You'll be using the [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier from the [SciKit Learn framework](https://scikit-learn.org/) to classify the data.\n",
"\n",
"> **Note: The model training takes around 1 minute to complete.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612966046970
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# create the model\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"reg = 0.5\n",
"clf = LogisticRegression(\n",
" C=1.0 / reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42\n",
")\n",
"clf.fit(X_train, y_train)\n",
"\n",
"# make predictions using the test set and calculate the accuracy\n",
"y_hat = clf.predict(X_test)\n",
"\n",
"# calculate accuracy on the prediction\n",
"acc = np.average(y_hat == y_test)\n",
"print(\"Accuracy is\", acc)\n",
"\n",
"run.log(\"regularization rate\", np.float(reg))\n",
"run.log(\"accuracy\", np.float(acc))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Version control your models with the model registry\n",
"\n",
"You can use model registration to store and version your models in your workspace. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. Azure Machine Learning supports any model that can be loaded through Python 3.\n",
"\n",
"The code below:\n",
"\n",
"1. Saves the model to disk\n",
"1. Uploads the model file to the run \n",
"1. Registers the uploaded model file\n",
"1. Transitions the run to a completed state"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612881042710
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import joblib\n",
"from azureml.core.model import Model\n",
"\n",
"path = \"sklearn_mnist_model.pkl\"\n",
"joblib.dump(value=clf, filename=path)\n",
"\n",
"run.upload_file(name=path, path_or_stream=path)\n",
"\n",
"model = run.register_model(\n",
" model_name=\"sklearn_mnist_model\",\n",
" model_path=path,\n",
" description=\"Mnist handwriting recognition\",\n",
")\n",
"\n",
"run.complete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy the model\n",
"\n",
"The next cell deploys the model to an Azure Container Instance so that you can score data in real-time (Azure Machine Learning also provides mechanisms to do batch scoring). A real-time endpoint allows application developers to integrate machine learning into their apps."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612881061728
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# create environment for the deploy\n",
"from azureml.core.environment import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# to install required packages\n",
"env = Environment(\"quickstart-env\")\n",
"cd = CondaDependencies.create(\n",
" pip_packages=[\"azureml-dataset-runtime[pandas,fuse]\", \"azureml-defaults\"],\n",
" conda_packages=[\"scikit-learn==0.22.1\"],\n",
")\n",
"\n",
"env.python.conda_dependencies = cd\n",
"\n",
"# Register environment to re-use later\n",
"env.register(workspace=ws)\n",
"\n",
"# create config file\n",
"from azureml.core.webservice import AciWebservice\n",
"\n",
"aciconfig = AciWebservice.deploy_configuration(\n",
" cpu_cores=1,\n",
" memory_gb=1,\n",
" tags={\"data\": \"MNIST\", \"method\": \"sklearn\"},\n",
" description=\"Predict MNIST with sklearn\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"> **Note: The deployment takes around 3 minutes to complete.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"%%time\n",
"import uuid\n",
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.environment import Environment\n",
"from azureml.core import Workspace\n",
"from azureml.core.model import Model\n",
"\n",
"ws = Workspace.from_config()\n",
"model = Model(ws, \"sklearn_mnist_model\")\n",
"\n",
"\n",
"myenv = Environment.get(workspace=ws, name=\"quickstart-env\", version=\"1\")\n",
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n",
"\n",
"service_name = \"sklearn-mnist-svc-\" + str(uuid.uuid4())[:4]\n",
"service = Model.deploy(\n",
" workspace=ws,\n",
" name=service_name,\n",
" models=[model],\n",
" inference_config=inference_config,\n",
" deployment_config=aciconfig,\n",
")\n",
"\n",
"service.wait_for_deployment(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [*scoring script*](score.py) file referenced in the code above can be found in the same folder as this notebook, and has two functions:\n",
"\n",
"1. an `init` function that executes once when the service starts - in this function you normally get the model from the registry and set global variables\n",
"1. a `run(data)` function that executes each time a call is made to the service. In this function, you normally format the input data, run a prediction, and output the predicted result."
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Test the model service\n",
"\n",
"You can test the model by sending a raw HTTP request to test the web service. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612881527399
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# scoring web service HTTP endpoint\n",
"print(service.scoring_uri)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612881538381
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# send raw HTTP request to test the web service.\n",
"import requests\n",
"\n",
"# send a random row from the test set to score\n",
"random_index = np.random.randint(0, len(X_test) - 1)\n",
"input_data = '{\"data\": [' + str(list(X_test[random_index])) + \"]}\"\n",
"\n",
"headers = {\"Content-Type\": \"application/json\"}\n",
"\n",
"# for AKS deployment you'd need to the service key in the header as well\n",
"# api_key = service.get_key()\n",
"# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}\n",
"\n",
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
"\n",
"print(\"POST to url\", service.scoring_uri)\n",
"# print(\"input data:\", input_data)\n",
"print(\"label:\", y_test[random_index])\n",
"print(\"prediction:\", resp.text)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"\n",
"### View the results of your training\n",
"\n",
"When you're finished with an experiment run, you can always return to view the results of your model training here in the Azure Machine Learning studio:\n",
"\n",
"1. Select **Experiments** (left-hand menu)\n",
"1. Select **azure-ml-in10-mins-tutorial**\n",
"1. Select **Run 1**\n",
"1. Select the **Metrics** Tab\n",
"\n",
"The metrics tab will display the parameter values that were logged to the run."
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### View the model in the model registry\n",
"\n",
"You can see the stored model by navigating to **Models** in the left-hand menu bar. Select the **sklearn_mnist_model** to see the details of the model, including the experiment run ID that created the model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up resources\n",
"\n",
"If you're not going to continue to use this model, delete the Model service using:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612881556520
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# if you want to keep workspace and only delete endpoint (it will incur cost while running)\n",
"service.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to control cost further, stop the compute instance by selecting the \"Stop compute\" button next to the **Compute** dropdown. Then start the compute instance again the next time you need it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Next Steps\n",
"\n",
"In this quickstart, you learned how to run machine learning code in Azure Machine Learning.\n",
"\n",
"Now that you have working code in a development environment, learn how to submit a **_job_** - ideally on a schedule or trigger (for example, arrival of new data).\n",
"\n",
" [**Learn how to get started with Azure ML Job Submission**](GettingStartedWithPythonSDK.ipynb) "
]
}
],
"metadata": {
"authors": [
{
"name": "cewidste"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python36",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
},
"microsoft": {
"host": {
"AzureML": {
"notebookHasBeenCompleted": true
}
}
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
"nteract": {
"version": "nteract-front-end@1.0.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,11 @@
name: AzureMLIn10mins
dependencies:
- pip:
- azureml-sdk
- sklearn
- numpy
- matplotlib
- joblib
- uuid
- requests
- azureml-opendatasets

View File

@@ -0,0 +1,505 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/quickstart-ci/ClassificationWithAutomatedML.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"# Quickstart: Fraud Classification using Automated ML\n",
"\n",
"In this quickstart, you use automated machine learning in Azure Machine Learning service to train a classification model on an associated fraud credit card dataset. This process accepts training data and configuration settings, and automatically iterates through combinations of different feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model.\n",
"\n",
"You will learn how to:\n",
"\n",
"> * Download a dataset and look at the data\n",
"> * Train a machine learning classification model using autoML \n",
"> * Explore the results\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Connect to your workspace and create an experiment\n",
"\n",
"You start with importing some libraries and creating an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all the users that have access to the workspace can collaborate on them. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612968646250
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import azureml.core\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.dataset import Dataset\n",
"from azureml.train.automl import AutoMLConfig"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612968706273
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# choose a name for your experiment\n",
"experiment_name = \"fraud-classification-automl-tutorial\"\n",
"\n",
"experiment = Experiment(ws, experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Load Data\n",
"\n",
"Load the credit card dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we'll split the data using random_split and extract the training data for the model.\n",
"\n",
"\n",
"Follow this [how-to](https://aka.ms/azureml/howto/createdatasets) if you want to learn more about Datasets and how to use them.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612968722555
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
"training_data, validation_data = dataset.random_split(percentage=0.8, seed=223)\n",
"label_column_name = \"Class\""
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Train\n",
"\n",
"\n",
"\n",
"When you use automated machine learning in Azure ML, you input training data and configuration settings, and the process automatically iterates through combinations of different feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model. \n",
"Learn more about how you configure automated ML [here](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train).\n",
"\n",
"\n",
"Instantiate an [AutoMLConfig](https://docs.microsoft.com/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py) object. This defines the settings and data used to run the experiment.\n",
"\n",
"|Property|Description|\n",
"|-|-|\n",
"|**task**|classification or regression|\n",
"|**primary_metric**|This is the metric that you want to optimize. \n",
"|**enable_early_stopping** | Stop the run if the metric score is not showing improvement.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**training_data**|Input dataset, containing both features and label column.|\n",
"|**label_column_name**|The name of the label column.|\n",
"\n",
"You can find more information about primary metrics [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612968806233
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"automl_settings = {\n",
" \"n_cross_validations\": 3,\n",
" \"primary_metric\": \"average_precision_score_weighted\",\n",
" \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ability to find the best model possible\n",
" \"verbosity\": logging.INFO,\n",
" \"enable_stack_ensemble\": False,\n",
"}\n",
"\n",
"automl_config = AutoMLConfig(\n",
" task=\"classification\",\n",
" debug_log=\"automl_errors.log\",\n",
" training_data=training_data,\n",
" label_column_name=label_column_name,\n",
" **automl_settings,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"Call the `submit` method on the experiment object and pass the run configuration. \n",
"\n",
"**Note: Depending on the data and the number of iterations an AutoML run can take a while to complete.**\n",
"\n",
"In this example, we specify `show_output = True` to print currently running iterations to the console. It is also possible to navigate to the experiment through the **Experiment** activity tab in the left menu, and monitor the run status from there."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612970125369
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"local_run = experiment.submit(automl_config, show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612976292559
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"local_run"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Analyze results\n",
"\n",
"Below we select the best model from our iterations. The `get_output` method on `automl_classifier` returns the best run and the model for the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612976298373
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"best_run, best_model = local_run.get_output()\n",
"best_model"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Tests\n",
"\n",
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612976320370
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# convert the test data to dataframe\n",
"X_test_df = validation_data.drop_columns(\n",
" columns=[label_column_name]\n",
").to_pandas_dataframe()\n",
"y_test_df = validation_data.keep_columns(\n",
" columns=[label_column_name], validate=True\n",
").to_pandas_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612976325829
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# call the predict functions on the model\n",
"y_pred = best_model.predict(X_test_df)\n",
"y_pred"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"\n",
"\n",
"### Calculate metrics for the prediction\n",
"\n",
"Now visualize the data to show what our truth (actual) values are compared to the predicted values \n",
"from the trained model that was returned.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612976330108
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"import numpy as np\n",
"import itertools\n",
"\n",
"cf = confusion_matrix(y_test_df.values, y_pred)\n",
"plt.imshow(cf, cmap=plt.cm.Blues, interpolation=\"nearest\")\n",
"plt.colorbar()\n",
"plt.title(\"Confusion Matrix\")\n",
"plt.xlabel(\"Predicted\")\n",
"plt.ylabel(\"Actual\")\n",
"class_labels = [\"False\", \"True\"]\n",
"tick_marks = np.arange(len(class_labels))\n",
"plt.xticks(tick_marks, class_labels)\n",
"plt.yticks([-0.5, 0, 1, 1.5], [\"\", \"False\", \"True\", \"\"])\n",
"# plotting text value inside cells\n",
"thresh = cf.max() / 2.0\n",
"for i, j in itertools.product(range(cf.shape[0]), range(cf.shape[1])):\n",
" plt.text(\n",
" j,\n",
" i,\n",
" format(cf[i, j], \"d\"),\n",
" horizontalalignment=\"center\",\n",
" color=\"white\" if cf[i, j] > thresh else \"black\",\n",
" )\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Control cost and further exploration\n",
"\n",
"If you want to control cost you can stop the compute instance this notebook is running on by clicking the \"Stop compute\" button next to the status dropdown in the menu above.\n",
"\n",
"\n",
"If you want to run more notebook samples, you can click on **Sample Notebooks** next to the **Files** view and explore the notebooks made available for you there."
]
}
],
"metadata": {
"authors": [
{
"name": "cewidste"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"microsoft": {
"host": {
"AzureML": {
"notebookHasBeenCompleted": true
}
}
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
"nteract": {
"version": "nteract-front-end@1.0.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,4 @@
name: ClassificationWithAutomatedML
dependencies:
- pip:
- azureml-sdk

View File

@@ -0,0 +1,710 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/quickstart-ci/GettingStartedWithPythonSDK.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"# Quickstart: Learn how to get started with Azure ML Job Submission\n",
"\n",
"In this quickstart, you train a machine learning model by submitting a Job to a compute target. \n",
"When training, it is common to start on your local computer, and then later scale out to a cloud-based cluster. \n",
"\n",
"All you need to do is define the environment for each compute target within a script run configuration. Then, when you want to run your training experiment on a different compute target, specify the run configuration for that compute.\n",
"\n",
"This quickstart trains a simple logistic regression using the [MNIST](https://azure.microsoft.com/services/open-datasets/catalog/mnist/) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n",
"\n",
"You will learn how to:\n",
"\n",
"> * Download a dataset and look at the data\n",
"> * Train an image classification model by submitting a batch job to a compute resource\n",
"> * Review training results, find and register the best model"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Connect to your workspace and create an experiment\n",
"\n",
"You start with importing some libraries and creating an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all the users that have access to the workspace can collaborate on them. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965838618
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import azureml.core\n",
"from azureml.core import Workspace\n",
"from azureml.core import Experiment\n",
"\n",
"# connect to your workspace\n",
"ws = Workspace.from_config()\n",
"\n",
"experiment_name = \"get-started-with-jobsubmission-tutorial\"\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Import Data\n",
"\n",
"Before you train a model, you need to understand the data that you are using to train it. In this section you will:\n",
"\n",
"* Download the MNIST dataset\n",
"* Display some sample images\n",
"\n",
"### Download the MNIST dataset\n",
"\n",
"Use Azure Open Datasets to get the raw MNIST data files. [Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for more accurate models. Each dataset has a corresponding class, `MNIST` in this case, to retrieve the data in different ways.\n",
"\n",
"Follow this [how-to](https://aka.ms/azureml/howto/createdatasets) if you want to learn more about Datasets and how to use them.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965850391
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import os\n",
"from azureml.core import Dataset\n",
"from azureml.opendatasets import MNIST\n",
"\n",
"data_folder = os.path.join(os.getcwd(), \"data\")\n",
"os.makedirs(data_folder, exist_ok=True)\n",
"\n",
"mnist_file_dataset = MNIST.get_file_dataset()\n",
"mnist_file_dataset.download(data_folder, overwrite=True)\n",
"\n",
"mnist_file_dataset = mnist_file_dataset.register(\n",
" workspace=ws,\n",
" name=\"mnist_opendataset\",\n",
" description=\"training and test dataset\",\n",
" create_new_version=True,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Take a look at the data\n",
"You will load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. Note this step requires a `load_data` function that's included in an `utils.py` file. This file is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965857960
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# make sure utils.py is in the same directory as this code\n",
"from utils import load_data\n",
"import glob\n",
"\n",
"\n",
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
"X_train = (\n",
" load_data(\n",
" glob.glob(\n",
" os.path.join(data_folder, \"**/train-images-idx3-ubyte.gz\"), recursive=True\n",
" )[0],\n",
" False,\n",
" )\n",
" / 255.0\n",
")\n",
"X_test = (\n",
" load_data(\n",
" glob.glob(\n",
" os.path.join(data_folder, \"**/t10k-images-idx3-ubyte.gz\"), recursive=True\n",
" )[0],\n",
" False,\n",
" )\n",
" / 255.0\n",
")\n",
"y_train = load_data(\n",
" glob.glob(\n",
" os.path.join(data_folder, \"**/train-labels-idx1-ubyte.gz\"), recursive=True\n",
" )[0],\n",
" True,\n",
").reshape(-1)\n",
"y_test = load_data(\n",
" glob.glob(\n",
" os.path.join(data_folder, \"**/t10k-labels-idx1-ubyte.gz\"), recursive=True\n",
" )[0],\n",
" True,\n",
").reshape(-1)\n",
"\n",
"\n",
"# now let's show some randomly chosen images from the training set.\n",
"count = 0\n",
"sample_size = 30\n",
"plt.figure(figsize=(16, 6))\n",
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
" count = count + 1\n",
" plt.subplot(1, sample_size, count)\n",
" plt.axhline(\"\")\n",
" plt.axvline(\"\")\n",
" plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
" plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Submit your training job\n",
"\n",
"In this quickstart you submit a job to run on the local compute, but you can use the same code to submit this training job to other compute targets. With Azure Machine Learning, you can run your script on various compute targets without having to change your training script. \n",
"\n",
"To submit a job you need:\n",
"* A directory\n",
"* A training script\n",
"* Create a script run configuration\n",
"* Submit the job \n",
"\n",
"\n",
"### Directory and training script \n",
"\n",
"You need a directory to deliver the necessary code from your computer to the remote resource. A directory with a training script has been created for you and can be found in the same folder as this notebook.\n",
"\n",
"Take a few minutes to examine the training script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965865707
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"with open(\"sklearn-mnist-batch/train.py\", \"r\") as f:\n",
" print(f.read())"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"Notice how the script gets data and saves models:\n",
"\n",
"+ The training script reads an argument to find the directory containing the data. When you submit the job later, you point to the dataset for this argument:\n",
"`parser.add_argument('--data-folder', type=str, dest='data_folder', help='data directory mounting point')`\n",
"\n",
"\n",
"+ The training script saves your model into a directory named outputs. <br/>\n",
"`joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')`<br/>\n",
"Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial.\n",
"\n",
"The file `utils.py` is referenced from the training script to load the dataset correctly. This script is also copied into the script folder so that it can be accessed along with the training script."
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Configure the training job\n",
"\n",
"Create a [ScriptRunConfig]() object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on. Configure the ScriptRunConfig by specifying:\n",
"\n",
"* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n",
"* The compute target. In this case you will point to local compute\n",
"* The training script name, train.py\n",
"* An environment that contains the libraries needed to run the script\n",
"* Arguments required from the training script. \n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments. Here you will be using a curated environment that has already been made available through the workspace. \n",
"\n",
"Read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) if you want to learn more about Environments and how to use them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965877458
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core.environment import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# use a curated environment that has already been built for you\n",
"\n",
"env = Environment.get(workspace=ws, name=\"AzureML-Scikit-learn-0.20.3\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"Create a [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrunconfig?preserve-view=true&view=azure-ml-py) object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on. A script run configuration is used to configure the information necessary for submitting a training run as part of an experiment. \n",
"\n",
"Read more about configuring and submitting training runs [here](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965882781
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"args = [\"--data-folder\", mnist_file_dataset.as_mount(), \"--regularization\", 0.5]\n",
"\n",
"script_folder = \"sklearn-mnist-batch\"\n",
"src = ScriptRunConfig(\n",
" source_directory=script_folder,\n",
" script=\"train.py\",\n",
" arguments=args,\n",
" compute_target=\"local\",\n",
" environment=env,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Submit the job\n",
"\n",
"Run the experiment by submitting the ScriptRunConfig object. After this there are many options for monitoring your run. You can either navigate to the experiment \"get-started-with-jobsubmission-tutorial\" in the left menu item Experiments to monitor the run (quick link to the run details page in the cell output below), or you can monitor the run inline in this notebook by using the Jupyter widget activated below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612965911435
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"run = exp.submit(config=src)\n",
"run"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Jupyter widget\n",
"\n",
"Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612966026710
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.widgets import RunDetails\n",
"\n",
"RunDetails(run).show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"if you want to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Get log results upon completion\n",
"\n",
"Model training happens in the background. You can use `wait_for_completion` to block and wait until the model has completed training before running more code. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612966045110
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# specify show_output to True for a verbose log\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Display run results\n",
"\n",
"You now have a trained model. Retrieve all the metrics logged during the run, including the accuracy of the model:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612966059052
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"print(run.get_metrics())"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Register model\n",
"\n",
"The last step in the training script wrote the file `outputs/sklearn_mnist_model.pkl` in a directory named `outputs` on the compute where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612966064041
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"print(run.get_file_names())"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"Register the model in the workspace so that you (or your team members with access to the workspace) can later query, examine, and deploy this model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1612966068862
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# register model\n",
"model = run.register_model(\n",
" model_name=\"sklearn_mnist\", model_path=\"outputs/sklearn_mnist_model.pkl\"\n",
")\n",
"print(model.name, model.id, model.version, sep=\"\\t\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Control Cost\n",
"\n",
"If you want to control cost you can stop the compute instance this notebook is running on by clicking the \"Stop compute\" button next to the status dropdown in the menu above.\n",
"\n",
" ## Next Steps\n",
"\n",
"In this quickstart, you have seen how to run jobs-based machine learning code in Azure Machine Learning. \n",
"\n",
"It is also possible to use automated machine learning in Azure Machine Learning service to find the best model in an automated fashion. To see how this works, we recommend that you follow the next quickstart in this series, [**Fraud Classification using Automated ML**](ClassificationWithAutomatedML.ipynb). This quickstart is focused on AutoML using the Python SDK."
]
}
],
"metadata": {
"authors": [
{
"name": "cewidste"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
},
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
"nteract": {
"version": "nteract-front-end@1.0.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -0,0 +1,11 @@
name: GettingStartedWithPythonSDK
dependencies:
- pip:
- azureml-sdk
- sklearn
- numpy
- matplotlib
- joblib
- uuid
- requests
- azureml-opendatasets

View File

@@ -0,0 +1,21 @@
import json
import numpy as np
import os
import joblib
def init():
global model
# AZUREML_MODEL_DIR is an environment variable created during deployment.
# It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
# For multiple models, it points to the folder containing all deployed models (./azureml-models)
model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "sklearn_mnist_model.pkl")
model = joblib.load(model_path)
def run(raw_data):
data = np.array(json.loads(raw_data)["data"])
# make prediction
y_hat = model.predict(data)
# you can return any data type as long as it is JSON-serializable
return y_hat.tolist()

View File

@@ -0,0 +1,82 @@
import argparse
import os
import numpy as np
import glob
from sklearn.linear_model import LogisticRegression
import joblib
from azureml.core import Run
from utils import load_data
# let user feed in 2 parameters, the dataset to mount or download,
# and the regularization rate of the logistic regression model
parser = argparse.ArgumentParser()
parser.add_argument(
"--data-folder", type=str, dest="data_folder", help="data folder mounting point"
)
parser.add_argument(
"--regularization", type=float, dest="reg", default=0.01, help="regularization rate"
)
args = parser.parse_args()
data_folder = args.data_folder
print("Data folder:", data_folder)
# load train and test set into numpy arrays
# note we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.
X_train = (
load_data(
glob.glob(
os.path.join(data_folder, "**/train-images-idx3-ubyte.gz"), recursive=True
)[0],
False,
) /
255.0
)
X_test = (
load_data(
glob.glob(
os.path.join(data_folder, "**/t10k-images-idx3-ubyte.gz"), recursive=True
)[0],
False,
) /
255.0
)
y_train = load_data(
glob.glob(
os.path.join(data_folder, "**/train-labels-idx1-ubyte.gz"), recursive=True
)[0],
True,
).reshape(-1)
y_test = load_data(
glob.glob(
os.path.join(data_folder, "**/t10k-labels-idx1-ubyte.gz"), recursive=True
)[0],
True,
).reshape(-1)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep="\n")
# get hold of the current run
run = Run.get_context()
print("Train a logistic regression model with regularization rate of", args.reg)
clf = LogisticRegression(
C=1.0 / args.reg, solver="liblinear", multi_class="auto", random_state=42
)
clf.fit(X_train, y_train)
print("Predict the test set")
y_hat = clf.predict(X_test)
# calculate accuracy on the prediction
acc = np.average(y_hat == y_test)
print("Accuracy is", acc)
run.log("regularization rate", np.float(args.reg))
run.log("accuracy", np.float(acc))
os.makedirs("outputs", exist_ok=True)
# note file saved in the outputs folder is automatically uploaded into experiment record
joblib.dump(value=clf, filename="outputs/sklearn_mnist_model.pkl")

View File

@@ -0,0 +1,24 @@
import gzip
import numpy as np
import struct
# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
with gzip.open(filename) as gz:
struct.unpack("I", gz.read(4))
n_items = struct.unpack(">I", gz.read(4))
if not label:
n_rows = struct.unpack(">I", gz.read(4))[0]
n_cols = struct.unpack(">I", gz.read(4))[0]
res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
res = res.reshape(n_items[0], n_rows * n_cols)
else:
res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
res = res.reshape(n_items[0], 1)
return res
# one-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
return np.eye(num_of_classes)[array.reshape(-1)]

View File

@@ -0,0 +1,24 @@
import gzip
import numpy as np
import struct
# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
with gzip.open(filename) as gz:
struct.unpack("I", gz.read(4))
n_items = struct.unpack(">I", gz.read(4))
if not label:
n_rows = struct.unpack(">I", gz.read(4))[0]
n_cols = struct.unpack(">I", gz.read(4))[0]
res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
res = res.reshape(n_items[0], n_rows * n_cols)
else:
res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
res = res.reshape(n_items[0], 1)
return res
# one-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
return np.eye(num_of_classes)[array.reshape(-1)]