mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00
update samples from Release-129 as a part of SDK release
@@ -1,498 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Quickstart: Fraud Classification using Automated ML\n",
|
||||
"\n",
|
||||
"In this quickstart, you use automated machine learning in Azure Machine Learning service to train a classification model on an associated fraud credit card dataset. This process accepts training data and configuration settings, and automatically iterates through combinations of different feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model.\n",
|
||||
"\n",
|
||||
"You will learn how to:\n",
|
||||
"\n",
|
||||
"> * Download a dataset and look at the data\n",
|
||||
 * Train a machine learning">
"> * Train a machine learning classification model using automated ML\n",
|
||||
"> * Explore the results\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Connect to your workspace and create an experiment\n",
|
||||
"\n",
|
||||
"You start with importing some libraries and creating an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all the users that have access to the workspace can collaborate on them. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612968646250
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import logging\n",
|
||||
"\n",
|
||||
"from matplotlib import pyplot as plt\n",
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"import azureml.core\n",
|
||||
"from azureml.core.experiment import Experiment\n",
|
||||
"from azureml.core.workspace import Workspace\n",
|
||||
"from azureml.core.dataset import Dataset\n",
|
||||
"from azureml.train.automl import AutoMLConfig"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612968706273
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"ws = Workspace.from_config()\n",
|
||||
"\n",
|
||||
"# choose a name for your experiment\n",
|
||||
"experiment_name = \"fraud-classification-automl-tutorial\"\n",
|
||||
"\n",
|
||||
"experiment = Experiment(ws, experiment_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Load Data\n",
|
||||
"\n",
|
||||
"Load the credit card dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we'll split the data using random_split and extract the training data for the model.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Follow this [how-to](https://aka.ms/azureml/howto/createdatasets) if you want to learn more about Datasets and how to use them.\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612968722555
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
|
||||
"dataset = Dataset.Tabular.from_delimited_files(data)\n",
|
||||
"training_data, validation_data = dataset.random_split(percentage=0.8, seed=223)\n",
|
||||
"label_column_name = \"Class\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Train\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"When you use automated machine learning in Azure ML, you input training data and configuration settings, and the process automatically iterates through combinations of different feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model. \n",
|
||||
"Learn more about how you configure automated ML [here](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Instantiate an [AutoMLConfig](https://docs.microsoft.com/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py) object. This defines the settings and data used to run the experiment.\n",
|
||||
"\n",
|
||||
"|Property|Description|\n",
|
||||
"|-|-|\n",
|
||||
"|**task**|classification or regression|\n",
|
||||
"|**primary_metric**|This is the metric that you want to optimize. \n",
|
||||
"|**enable_early_stopping** | Stop the run if the metric score is not showing improvement.|\n",
|
||||
"|**n_cross_validations**|Number of cross validation splits.|\n",
|
||||
"|**training_data**|Input dataset, containing both features and label column.|\n",
|
||||
"|**label_column_name**|The name of the label column.|\n",
|
||||
"\n",
|
||||
"You can find more information about primary metrics [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612968806233
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"automl_settings = {\n",
|
||||
" \"n_cross_validations\": 3,\n",
|
||||
" \"primary_metric\": \"average_precision_score_weighted\",\n",
|
||||
" \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ability to find the best model possible\n",
|
||||
" \"verbosity\": logging.INFO,\n",
|
||||
" \"enable_stack_ensemble\": False,\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"automl_config = AutoMLConfig(\n",
|
||||
" task=\"classification\",\n",
|
||||
" debug_log=\"automl_errors.log\",\n",
|
||||
" training_data=training_data,\n",
|
||||
" label_column_name=label_column_name,\n",
|
||||
" **automl_settings,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"Call the `submit` method on the experiment object and pass the run configuration. \n",
|
||||
"\n",
|
||||
"**Note: Depending on the data and the number of iterations an AutoML run can take a while to complete.**\n",
|
||||
"\n",
|
||||
"In this example, we specify `show_output = True` to print currently running iterations to the console. It is also possible to navigate to the experiment through the **Experiment** activity tab in the left menu, and monitor the run status from there."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612970125369
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"local_run = experiment.submit(automl_config, show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612976292559
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"local_run"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Analyze results\n",
|
||||
"\n",
|
||||
"Below we select the best model from our iterations. The `get_output` method on `automl_classifier` returns the best run and the model for the run."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612976298373
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"best_run, best_model = local_run.get_output()\n",
|
||||
"best_model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Tests\n",
|
||||
"\n",
|
||||
"Now that the model is trained, split the data in the same way the data was split for training (The difference here is the data is being split locally) and then run the test data through the trained model to get the predicted values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612976320370
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# convert the test data to dataframe\n",
|
||||
"X_test_df = validation_data.drop_columns(\n",
|
||||
" columns=[label_column_name]\n",
|
||||
").to_pandas_dataframe()\n",
|
||||
"y_test_df = validation_data.keep_columns(\n",
|
||||
" columns=[label_column_name], validate=True\n",
|
||||
").to_pandas_dataframe()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612976325829
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# call the predict functions on the model\n",
|
||||
"y_pred = best_model.predict(X_test_df)\n",
|
||||
"y_pred"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"\n",
|
||||
"\n",
|
||||
"### Calculate metrics for the prediction\n",
|
||||
"\n",
|
||||
"Now visualize the data to show what our truth (actual) values are compared to the predicted values \n",
|
||||
"from the trained model that was returned.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612976330108
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.metrics import confusion_matrix\n",
|
||||
"import numpy as np\n",
|
||||
"import itertools\n",
|
||||
"\n",
|
||||
"cf = confusion_matrix(y_test_df.values, y_pred)\n",
|
||||
"plt.imshow(cf, cmap=plt.cm.Blues, interpolation=\"nearest\")\n",
|
||||
"plt.colorbar()\n",
|
||||
"plt.title(\"Confusion Matrix\")\n",
|
||||
"plt.xlabel(\"Predicted\")\n",
|
||||
"plt.ylabel(\"Actual\")\n",
|
||||
"class_labels = [\"False\", \"True\"]\n",
|
||||
"tick_marks = np.arange(len(class_labels))\n",
|
||||
"plt.xticks(tick_marks, class_labels)\n",
|
||||
"plt.yticks([-0.5, 0, 1, 1.5], [\"\", \"False\", \"True\", \"\"])\n",
|
||||
"# plotting text value inside cells\n",
|
||||
"thresh = cf.max() / 2.0\n",
|
||||
"for i, j in itertools.product(range(cf.shape[0]), range(cf.shape[1])):\n",
|
||||
" plt.text(\n",
|
||||
" j,\n",
|
||||
" i,\n",
|
||||
" format(cf[i, j], \"d\"),\n",
|
||||
" horizontalalignment=\"center\",\n",
|
||||
" color=\"white\" if cf[i, j] > thresh else \"black\",\n",
|
||||
" )\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Control cost and further exploration\n",
|
||||
"\n",
|
||||
"If you want to control cost you can stop the compute instance this notebook is running on by clicking the \"Stop compute\" button next to the status dropdown in the menu above.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"If you want to run more notebook samples, you can click on **Sample Notebooks** next to the **Files** view and explore the notebooks made available for you there."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "cewidste"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.9"
|
||||
},
|
||||
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
|
||||
"nteract": {
|
||||
"version": "nteract-front-end@1.0.0"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
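A note on reading the confusion matrix that the notebook above plots: for a heavily imbalanced problem such as fraud detection, precision and recall of the positive class are usually more informative than raw counts. The following is a minimal sketch, assuming the matrix is laid out like the output of scikit-learn's `confusion_matrix` (rows are actual classes, columns are predicted classes). The helper name `precision_recall` and the example counts are hypothetical and are not results from the notebook.

```python
def precision_recall(cf):
    """Return (precision, recall) for the positive class of a 2x2 matrix.

    cf is a nested list [[tn, fp], [fn, tp]]: rows are actual classes,
    columns are predicted classes (assumed layout, see lead-in).
    """
    (tn, fp), (fn, tp) = cf
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only.
example_cf = [[990, 10], [5, 45]]
precision, recall = precision_recall(example_cf)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With real data, `cf.tolist()` on the NumPy array returned by `confusion_matrix` would give a nested list in this shape.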
|
||||
@@ -1,4 +0,0 @@
|
||||
name: quickstart-azureml-automl
dependencies:
- pip:
  - azureml-sdk
|
||||
@@ -1,502 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Quickstart: Train and deploy a model in Azure Machine Learning in 10 minutes\n",
|
||||
"\n",
|
||||
"In this quickstart, learn how to get started with Azure Machine Learning. You'll train an image classification model using the [MNIST](https://docs.microsoft.com/azure/open-datasets/dataset-mnist) dataset.\n",
|
||||
"\n",
|
||||
"You'll learn how to:\n",
|
||||
"\n",
|
||||
"* Download a dataset and look at the data\n",
|
||||
"* Train an image classification model and log metrics using MLflow\n",
|
||||
"* Deploy the model to do real-time inference"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Import Data\n",
|
||||
"\n",
|
||||
"Before you train a model, you need to understand the data you're using to train it. In this section, learn how to:\n",
|
||||
"\n",
|
||||
"* Download the MNIST dataset\n",
|
||||
"* Display some sample images\n",
|
||||
"\n",
|
||||
"You'll use Azure Open Datasets to get the raw MNIST data files. [Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for better models. Each dataset has a corresponding class, `MNIST` in this case, to retrieve the data in different ways."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import os\n",
|
||||
"from azureml.opendatasets import MNIST\n",
|
||||
"\n",
|
||||
"data_folder = os.path.join(os.getcwd(), \"/tmp/qs_data\")\n",
|
||||
"os.makedirs(data_folder, exist_ok=True)\n",
|
||||
"\n",
|
||||
"mnist_file_dataset = MNIST.get_file_dataset()\n",
|
||||
"mnist_file_dataset.download(data_folder, overwrite=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Take a look at the data\n",
|
||||
"\n",
|
||||
"Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. \n",
|
||||
"\n",
|
||||
"Note this step requires a `load_data` function that's included in an `utils.py` file. This file is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from utils import load_data\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import numpy as np\n",
|
||||
"import glob\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
|
||||
"X_train = (\n",
|
||||
" load_data(\n",
|
||||
" glob.glob(\n",
|
||||
" os.path.join(data_folder, \"**/train-images-idx3-ubyte.gz\"), recursive=True\n",
|
||||
" )[0],\n",
|
||||
" False,\n",
|
||||
" )\n",
|
||||
" / 255.0\n",
|
||||
")\n",
|
||||
"X_test = (\n",
|
||||
" load_data(\n",
|
||||
" glob.glob(\n",
|
||||
" os.path.join(data_folder, \"**/t10k-images-idx3-ubyte.gz\"), recursive=True\n",
|
||||
" )[0],\n",
|
||||
" False,\n",
|
||||
" )\n",
|
||||
" / 255.0\n",
|
||||
")\n",
|
||||
"y_train = load_data(\n",
|
||||
" glob.glob(\n",
|
||||
" os.path.join(data_folder, \"**/train-labels-idx1-ubyte.gz\"), recursive=True\n",
|
||||
" )[0],\n",
|
||||
" True,\n",
|
||||
").reshape(-1)\n",
|
||||
"y_test = load_data(\n",
|
||||
" glob.glob(\n",
|
||||
" os.path.join(data_folder, \"**/t10k-labels-idx1-ubyte.gz\"), recursive=True\n",
|
||||
" )[0],\n",
|
||||
" True,\n",
|
||||
").reshape(-1)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# now let's show some randomly chosen images from the traininng set.\n",
|
||||
"count = 0\n",
|
||||
"sample_size = 30\n",
|
||||
"plt.figure(figsize=(16, 6))\n",
|
||||
"for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
|
||||
" count = count + 1\n",
|
||||
" plt.subplot(1, sample_size, count)\n",
|
||||
" plt.axhline(\"\")\n",
|
||||
" plt.axvline(\"\")\n",
|
||||
" plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
|
||||
" plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Train model and log metrics with MLflow\n",
|
||||
"\n",
|
||||
"You'll train the model using the code below. Note that you are using MLflow autologging to track metrics and log model artefacts.\n",
|
||||
"\n",
|
||||
"You'll be using the [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier from the [SciKit Learn framework](https://scikit-learn.org/) to classify the data.\n",
|
||||
"\n",
|
||||
"**Note: The model training takes approximately 2 minutes to complete.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612966046970
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create the model\n",
|
||||
"import mlflow\n",
|
||||
"import numpy as np\n",
|
||||
"from sklearn.linear_model import LogisticRegression\n",
|
||||
"from azureml.core import Workspace\n",
|
||||
"\n",
|
||||
"# connect to your workspace\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"\n",
|
||||
"# create experiment and start logging to a new run in the experiment\n",
|
||||
"experiment_name = \"azure-ml-in10-mins-tutorial\"\n",
|
||||
"\n",
|
||||
"# set up MLflow to track the metrics\n",
|
||||
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())\n",
|
||||
"mlflow.set_experiment(experiment_name)\n",
|
||||
"mlflow.autolog()\n",
|
||||
"\n",
|
||||
"# set up the Logistic regression model\n",
|
||||
"reg = 0.5\n",
|
||||
"clf = LogisticRegression(\n",
|
||||
" C=1.0 / reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"# train the model\n",
|
||||
"with mlflow.start_run() as run:\n",
|
||||
" clf.fit(X_train, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## View Experiment\n",
|
||||
"In the left-hand menu in Azure Machine Learning Studio, select __Experiments__ and then select your experiment (azure-ml-in10-mins-tutorial). An experiment is a grouping of many runs from a specified script or piece of code. Information for the run is stored under that experiment. If the name doesn't exist when you submit an experiment, if you select your run you will see various tabs containing metrics, logs, explanations, etc.\n",
|
||||
"\n",
|
||||
"## Version control your models with the model registry\n",
|
||||
"\n",
|
||||
"You can use model registration to store and version your models in your workspace. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. The code below registers and versions the model you trained above. Once you have executed the code cell below you will be able to see the model in the registry by selecting __Models__ in the left-hand menu in Azure Machine Learning Studio."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612881042710
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# register the model\n",
|
||||
"model_uri = \"runs:/{}/model\".format(run.info.run_id)\n",
|
||||
"model = mlflow.register_model(model_uri, \"sklearn_mnist_model\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Deploy the model for real-time inference\n",
|
||||
"In this section you learn how to deploy a model so that an application can consume (inference) the model over REST.\n",
|
||||
"\n",
|
||||
"### Create deployment configuration\n",
|
||||
"The code cell gets a _curated environment_, which specifies all the dependencies required to host the model (for example, the packages like scikit-learn). Also, you create a _deployment configuration_, which specifies the amount of compute required to host the model. In this case, the compute will have 1CPU and 1GB memory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612881061728
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create environment for the deploy\n",
|
||||
"from azureml.core.environment import Environment\n",
|
||||
"from azureml.core.conda_dependencies import CondaDependencies\n",
|
||||
"from azureml.core.webservice import AciWebservice\n",
|
||||
"\n",
|
||||
"# get a curated environment\n",
|
||||
"env = Environment.get(\n",
|
||||
" workspace=ws, \n",
|
||||
" name=\"AzureML-sklearn-1.0-ubuntu20.04-py38-cpu\",\n",
|
||||
" version=1\n",
|
||||
")\n",
|
||||
"env.inferencing_stack_version='latest'\n",
|
||||
"\n",
|
||||
"# create deployment config i.e. compute resources\n",
|
||||
"aciconfig = AciWebservice.deploy_configuration(\n",
|
||||
" cpu_cores=1,\n",
|
||||
" memory_gb=1,\n",
|
||||
" tags={\"data\": \"MNIST\", \"method\": \"sklearn\"},\n",
|
||||
" description=\"Predict MNIST with sklearn\",\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Deploy model\n",
|
||||
"\n",
|
||||
"This next code cell deploys the model to Azure Container Instance (ACI).\n",
|
||||
"\n",
|
||||
"**Note: The deployment takes approximately 3 minutes to complete.**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"%%time\n",
|
||||
"import uuid\n",
|
||||
"from azureml.core.model import InferenceConfig\n",
|
||||
"from azureml.core.environment import Environment\n",
|
||||
"from azureml.core.model import Model\n",
|
||||
"\n",
|
||||
"# get the registered model\n",
|
||||
"model = Model(ws, \"sklearn_mnist_model\")\n",
|
||||
"\n",
|
||||
"# create an inference config i.e. the scoring script and environment\n",
|
||||
"inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)\n",
|
||||
"\n",
|
||||
"# deploy the service\n",
|
||||
"service_name = \"sklearn-mnist-svc-\" + str(uuid.uuid4())[:4]\n",
|
||||
"service = Model.deploy(\n",
|
||||
" workspace=ws,\n",
|
||||
" name=service_name,\n",
|
||||
" models=[model],\n",
|
||||
" inference_config=inference_config,\n",
|
||||
" deployment_config=aciconfig,\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"service.wait_for_deployment(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The [*scoring script*](score.py) file referenced in the code above can be found in the same folder as this notebook, and has two functions:\n",
|
||||
"\n",
|
||||
"1. an `init` function that executes once when the service starts - in this function you normally get the model from the registry and set global variables\n",
|
||||
"1. a `run(data)` function that executes each time a call is made to the service. In this function, you normally format the input data, run a prediction, and output the predicted result.\n",
|
||||
"\n",
|
||||
"### View Endpoint\n",
|
||||
"Once the model has been successfully deployed, you can view the endpoint by navigating to __Endpoints__ in the left-hand menu in Azure Machine Learning Studio. You will be able to see the state of the endpoint (healthy/unhealthy), logs, and consume (how applications can consume the model)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Test the model service\n",
|
||||
"\n",
|
||||
"You can test the model by sending a raw HTTP request to test the web service. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612881538381
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# send raw HTTP request to test the web service.\n",
|
||||
"import requests\n",
|
||||
"\n",
|
||||
"# send a random row from the test set to score\n",
|
||||
"random_index = np.random.randint(0, len(X_test) - 1)\n",
|
||||
"input_data = '{\"data\": [' + str(list(X_test[random_index])) + \"]}\"\n",
|
||||
"\n",
|
||||
"headers = {\"Content-Type\": \"application/json\"}\n",
|
||||
"\n",
|
||||
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
||||
"\n",
|
||||
"print(\"POST to url\", service.scoring_uri)\n",
|
||||
"print(\"label:\", y_test[random_index])\n",
|
||||
"print(\"prediction:\", resp.text)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Clean up resources\n",
|
||||
"\n",
|
||||
"If you're not going to continue to use this model, delete the Model service using:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612881556520
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# if you want to keep workspace and only delete endpoint (it will incur cost while running)\n",
|
||||
"service.delete()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If you want to control cost further, stop the compute instance by selecting the \"Stop compute\" button next to the **Compute** dropdown. Then start the compute instance again the next time you need it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Next Steps\n",
|
||||
"\n",
|
||||
"In this quickstart, you learned how to run machine learning code in Azure Machine Learning.\n",
|
||||
"\n",
|
||||
"Now that you have working code in a development environment, learn how to submit a **_job_** - ideally on a schedule or trigger (for example, arrival of new data).\n",
|
||||
"\n",
|
||||
" [**Learn how to get started with Azure ML Job Submission**](../quickstart-azureml-python-sdk/quickstart-azureml-python-sdk.ipynb) "
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "cewidste"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.9"
|
||||
},
|
||||
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
|
||||
"nteract": {
|
||||
"version": "nteract-front-end@1.0.0"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
@@ -1,11 +0,0 @@
|
||||
name: quickstart-azureml-in-10mins
dependencies:
- pip:
  - azureml-sdk
  - scikit-learn
  - numpy
  - matplotlib
  - joblib
  - uuid
  - requests
  - azureml-opendatasets
|
||||
@@ -1,21 +0,0 @@
|
||||
import json
import numpy as np
import os
import joblib


def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # For multiple models, it points to the folder containing all deployed models (./azureml-models)
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "model/model.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    data = np.array(json.loads(raw_data)["data"])
    # make prediction
    y_hat = model.predict(data)
    # you can return any data type as long as it is JSON-serializable
    return y_hat.tolist()
|
||||
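The scoring script above expects the request body to be a JSON string whose `"data"` key holds one row of features per sample, and it returns a JSON-serializable list of predictions. The following is a minimal local sketch of that contract, not the deployed service: `StubModel` is a hypothetical stand-in for the model loaded in `init()`, and plain lists replace the NumPy array so the example needs only the standard library.

```python
import json


class StubModel:
    """Hypothetical stand-in for the deployed model (not part of the sample)."""

    def predict(self, rows):
        # Pretend the predicted class is the index of the largest feature value.
        return [max(range(len(row)), key=row.__getitem__) for row in rows]


model = StubModel()


def run(raw_data):
    # Same contract as the scoring script: a JSON string with a "data" key.
    data = json.loads(raw_data)["data"]
    return list(model.predict(data))


# Build a request body the way a client would, then score it locally.
payload = json.dumps({"data": [[0.1, 0.9, 0.0], [0.7, 0.2, 0.1]]})
predictions = run(payload)
print(json.dumps(predictions))
```

Exercising `run` locally like this can help catch payload-shape mistakes before an endpoint is deployed.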
@@ -1,24 +0,0 @@
|
||||
import gzip
import numpy as np
import struct


# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
    with gzip.open(filename) as gz:
        # skip the 4-byte magic number (IDX header fields are big-endian)
        struct.unpack(">I", gz.read(4))
        n_items = struct.unpack(">I", gz.read(4))
        if not label:
            n_rows = struct.unpack(">I", gz.read(4))[0]
            n_cols = struct.unpack(">I", gz.read(4))[0]
            res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
            res = res.reshape(n_items[0], n_rows * n_cols)
        else:
            res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
            res = res.reshape(n_items[0], 1)
    return res


# one-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
    return np.eye(num_of_classes)[array.reshape(-1)]
|
||||
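The `load_data` helper above reads the gzip-compressed IDX layout used by the MNIST files: a big-endian header (magic number, item count, and for images the row and column counts) followed by raw unsigned bytes. The sketch below builds a tiny in-memory file in that layout and parses it with the standard library only. The 2x3 "images" are made-up data for illustration, and plain lists are used instead of NumPy arrays.

```python
import gzip
import io
import struct

# Build a tiny in-memory IDX3 image file: 2 images of 2x3 pixels (made-up data).
n_items, n_rows, n_cols = 2, 2, 3
pixels = bytes(range(n_items * n_rows * n_cols))
header = struct.pack(">IIII", 2051, n_items, n_rows, n_cols)  # 2051 is the IDX3 image magic number
blob = gzip.compress(header + pixels)

# Parse it back: four big-endian unsigned ints, then one byte per pixel.
with gzip.open(io.BytesIO(blob)) as gz:
    magic, count, rows, cols = struct.unpack(">IIII", gz.read(16))
    body = gz.read(count * rows * cols)

# Split the flat byte string into one flattened row per image.
size = rows * cols
images = [list(body[i * size:(i + 1) * size]) for i in range(count)]
print(magic, count, rows, cols)
print(images)
```

Label files follow the same pattern with a shorter header (magic number and item count only), which is why the helper branches on `label`.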
@@ -1,355 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# Quickstart: Learn how to submit batch jobs with the Azure Machine Learning Python SDK\n",
|
||||
"\n",
|
||||
"In this quickstart, you learn how to submit a batch training job using the Python SDK. In this example, we submit the job to the 'local' machine (the compute instance you are running this notebook on). However, you can use exactly the same method to submit the job to different compute targets (for example, AKS, Azure Machine Learning Compute Cluster, Synapse, etc) by changing a single line of code. A full list of support compute targets can be viewed [here](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target). \n",
|
||||
"\n",
|
||||
"This quickstart trains a simple logistic regression using the [MNIST](https://docs.microsoft.com/azure/open-datasets/dataset-mnist) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n",
|
||||
"\n",
|
||||
"You will learn how to:\n",
|
||||
"\n",
|
||||
"> * Download a dataset and look at the data\n",
|
||||
"> * Train an image classification model by submitting a batch job to a compute resource\n",
|
||||
"> * Use MLflow autologging to track model metrics and log the model artefact\n",
|
||||
"> * Review training results, find and register the best model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Connect to your workspace and create an experiment\n",
|
||||
"\n",
|
||||
"You start with importing some libraries and creating an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all the users that have access to the workspace can collaborate on them. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612965838618
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"from azureml.core import Workspace\n",
|
||||
"from azureml.core import Experiment\n",
|
||||
"\n",
|
||||
"# connect to your workspace\n",
|
||||
"ws = Workspace.from_config()\n",
|
||||
"\n",
|
||||
"experiment_name = \"get-started-with-jobsubmission-tutorial\"\n",
|
||||
"exp = Experiment(workspace=ws, name=experiment_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### The MNIST dataset\n",
|
||||
"\n",
|
||||
"Use Azure Open Datasets to get the raw MNIST data files. [Azure Open Datasets](https://docs.microsoft.com/azure/open-datasets/overview-what-are-open-datasets) are curated public datasets that you can use to add scenario-specific features to machine learning solutions for more accurate models. Each dataset has a corresponding class, `MNIST` in this case, to retrieve the data in different ways.\n",
|
||||
"\n",
|
||||
"Follow this [how-to](https://aka.ms/azureml/howto/createdatasets) if you want to learn more about Datasets and how to use them.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612965850391
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.opendatasets import MNIST\n",
|
||||
"\n",
|
||||
"mnist_file_dataset = MNIST.get_file_dataset()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Define the Environment\n",
|
||||
"An Environment defines Python packages, environment variables, and Docker settings that are used in machine learning experiments. Here you will be using a curated environment that has already been made available through the workspace. \n",
|
||||
"\n",
|
||||
"Read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) if you want to learn more about Environments and how to use them."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612965877458
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core.environment import Environment\n",
|
||||
"\n",
|
||||
"# use a curated environment that has already been built for you\n",
|
||||
"\n",
|
||||
"env = Environment.get(workspace=ws, \n",
|
||||
" name=\"AzureML-Scikit-learn0.24-Cuda11-OpenMpi4.1.0-py36\", \n",
|
||||
" version=1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Configure the training job\n",
|
||||
"\n",
|
||||
"Create a [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig?view=azure-ml-py) object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on. Configure the ScriptRunConfig by specifying:\n",
|
||||
"\n",
|
||||
"* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n",
|
||||
"* The compute target. In this case you will point to local compute\n",
|
||||
"* The training script name, train.py\n",
|
||||
"* An environment that contains the libraries needed to run the script\n",
|
||||
"* Arguments required from the training script. \n",
|
||||
"\n",
|
||||
"In this run we will be submitting to \"local\", which is the compute instance you are running this notebook. If you have another compute target (for example: AKS, Azure ML Compute Cluster, Azure Databricks, etc) then you just need to change the `compute_target` argument below. You can learn more about other compute targets [here](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets). "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612965882781
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from azureml.core import ScriptRunConfig\n",
|
||||
"\n",
|
||||
"args = [\"--data-folder\", mnist_file_dataset.as_mount(), \"--regularization\", 0.5]\n",
|
||||
"\n",
|
||||
"src = ScriptRunConfig(\n",
|
||||
" source_directory=\"src\",\n",
|
||||
" script=\"train.py\",\n",
|
||||
" arguments=args,\n",
|
||||
" compute_target=\"local\",\n",
|
||||
" environment=env,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"### Submit the job\n",
|
||||
"\n",
|
||||
"Run the experiment by submitting the ScriptRunConfig object. After this there are many options for monitoring your run. Once submitted, you can either navigate to the experiment \"get-started-with-jobsubmission-tutorial\" in the left menu item __Experiments__ to monitor the run, or you can monitor the run inline as the `run.wait_for_completion(show_output=True)` will stream the logs of the run. You will see that the environment is built for you to ensure reproducibility - this adds a couple of minutes to the run time. On subsequent runs, the environment is re-used making the runtime shorter."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612965911435
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"run = exp.submit(config=src)\n",
|
||||
"run.wait_for_completion(show_output=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Register model\n",
|
||||
"\n",
|
||||
"The training script used the MLflow autologging feature and therefore the model was captured and stored on your behalf. Below we register the model into the Azure Machine Learning Model registry, which lets you keep track of all the models in your Azure Machine Learning workspace.\n",
|
||||
"\n",
|
||||
"Models are identified by name and version. Each time you register a model with the same name as an existing one, the registry assumes that it's a new version. The version is incremented, and the new model is registered under the same name.\n",
|
||||
"\n",
|
||||
"When you register the model, you can provide additional metadata tags and then use the tags when you search for models."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"gather": {
|
||||
"logged": 1612966068862
|
||||
},
|
||||
"jupyter": {
|
||||
"outputs_hidden": false,
|
||||
"source_hidden": false
|
||||
},
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# register model\n",
|
||||
"model = run.register_model(\n",
|
||||
" model_name=\"sklearn_mnist\", model_path=\"model/model.pkl\"\n",
|
||||
")\n",
|
||||
"print(model.name, model.id, model.version, sep=\"\\t\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You will now be able to see the model in the regsitry by selecting __Models__ in the left-hand menu of the Azure Machine Learning Studio."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"nteract": {
|
||||
"transient": {
|
||||
"deleting": false
|
||||
}
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"## Control Cost\n",
|
||||
"\n",
|
||||
"If you want to control cost you can stop the compute instance this notebook is running on by clicking the \"Stop compute\" button next to the status dropdown in the menu above.\n",
|
||||
"\n",
|
||||
" ## Next Steps\n",
|
||||
"\n",
|
||||
"In this quickstart, you have seen how to run jobs-based machine learning code in Azure Machine Learning. \n",
|
||||
"\n",
|
||||
"It is also possible to use automated machine learning in Azure Machine Learning service to find the best model in an automated fashion. To see how this works, we recommend that you follow the next quickstart in this series, [**Fraud Classification using Automated ML**](../quickstart-azureml-automl/quickstart-azureml-automl.ipynb). This quickstart is focused on AutoML using the Python SDK."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "cewidste"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.9"
|
||||
},
|
||||
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License.",
|
||||
"nteract": {
|
||||
"version": "nteract-front-end@1.0.0"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
||||
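The "Register model" cell in the notebook above describes models as identified by name and version, with the version incremented each time the same name is registered again, and with optional tags that can be used for searching. The toy class below only illustrates those semantics. `ToyRegistry` and its methods are hypothetical, it is not the Azure Machine Learning registry or its API, and numbering from 1 is an assumption made for the sketch.

```python
from collections import defaultdict


class ToyRegistry:
    """Hypothetical in-memory registry; illustrates name + version semantics only."""

    def __init__(self):
        self._versions = defaultdict(list)

    def register(self, name, path, tags=None):
        # Registering an existing name creates the next version (assumed to start at 1).
        version = len(self._versions[name]) + 1
        entry = {"name": name, "version": version, "path": path, "tags": tags or {}}
        self._versions[name].append(entry)
        return entry

    def find(self, **tags):
        # Return every entry whose tags contain all requested key/value pairs.
        return [
            entry
            for entries in self._versions.values()
            for entry in entries
            if all(entry["tags"].get(key) == value for key, value in tags.items())
        ]


registry = ToyRegistry()
first = registry.register("sklearn_mnist", "model/model.pkl", tags={"data": "mnist"})
second = registry.register(
    "sklearn_mnist", "model/model.pkl", tags={"data": "mnist", "stage": "candidate"}
)
candidates = registry.find(stage="candidate")
print(first["version"], second["version"], len(candidates))
```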
@@ -1,12 +0,0 @@
|
||||
name: quickstart-azureml-python-sdk
dependencies:
- pip:
  - azureml-sdk
  - scikit-learn
  - numpy
  - matplotlib
  - joblib
  - uuid
  - requests
  - azureml-opendatasets
  - azureml-widgets
|
||||
@@ -1,72 +0,0 @@
|
||||
import argparse
import os
import numpy as np
import glob
# import joblib
import mlflow

from sklearn.linear_model import LogisticRegression
from utils import load_data

# let user feed in 2 parameters, the dataset to mount or download,
# and the regularization rate of the logistic regression model
parser = argparse.ArgumentParser()
parser.add_argument(
    "--data-folder", type=str, dest="data_folder", help="data folder mounting point"
)
parser.add_argument(
    "--regularization", type=float, dest="reg", default=0.01, help="regularization rate"
)
args = parser.parse_args()

data_folder = args.data_folder
print("Data folder:", data_folder)

# load train and test set into numpy arrays
# note we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.
X_train = (
    load_data(
        glob.glob(
            os.path.join(data_folder, "**/train-images-idx3-ubyte.gz"), recursive=True
        )[0],
        False,
    ) / 255.0
)
X_test = (
    load_data(
        glob.glob(
            os.path.join(data_folder, "**/t10k-images-idx3-ubyte.gz"), recursive=True
        )[0],
        False,
    ) / 255.0
)
y_train = load_data(
    glob.glob(
        os.path.join(data_folder, "**/train-labels-idx1-ubyte.gz"), recursive=True
    )[0],
    True,
).reshape(-1)
y_test = load_data(
    glob.glob(
        os.path.join(data_folder, "**/t10k-labels-idx1-ubyte.gz"), recursive=True
    )[0],
    True,
).reshape(-1)

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep="\n")

# use mlflow autologging
mlflow.autolog()

print("Train a logistic regression model with regularization rate of", args.reg)
clf = LogisticRegression(
    C=1.0 / args.reg, solver="liblinear", multi_class="auto", random_state=42
)
clf.fit(X_train, y_train)

print("Predict the test set")
y_hat = clf.predict(X_test)

# calculate accuracy on the prediction
acc = np.average(y_hat == y_test)
print("Accuracy is", acc)
|
||||
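The training script above receives its settings as command-line arguments and converts the regularization rate into scikit-learn's `C` parameter, which is the inverse of the regularization strength. The sketch below replays only that argument handling in isolation. The `/tmp/mnist` path is a placeholder assumed for illustration, not a real mount point, and no model is trained.

```python
import argparse

# Same two arguments as the training script.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--data-folder", type=str, dest="data_folder", help="data folder mounting point"
)
parser.add_argument(
    "--regularization", type=float, dest="reg", default=0.01, help="regularization rate"
)

# Hypothetical argument list, standing in for what the job submission passes.
args = parser.parse_args(["--data-folder", "/tmp/mnist", "--regularization", "0.5"])

# A larger regularization rate gives a smaller C, that is, a more strongly regularized model.
C = 1.0 / args.reg
print(args.data_folder, args.reg, C)

# Omitting --regularization falls back to the default rate of 0.01.
default_args = parser.parse_args(["--data-folder", "/tmp/mnist"])
default_C = 1.0 / default_args.reg
print(default_args.reg, default_C)
```

With the rate of 0.5 used in the notebook, `C` works out to 2.0.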
@@ -1,24 +0,0 @@
|
||||
import gzip
import numpy as np
import struct


# load compressed MNIST gz files and return numpy arrays
def load_data(filename, label=False):
    with gzip.open(filename) as gz:
        # skip the 4-byte magic number (IDX header fields are big-endian)
        struct.unpack(">I", gz.read(4))
        n_items = struct.unpack(">I", gz.read(4))
        if not label:
            n_rows = struct.unpack(">I", gz.read(4))[0]
            n_cols = struct.unpack(">I", gz.read(4))[0]
            res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
            res = res.reshape(n_items[0], n_rows * n_cols)
        else:
            res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
            res = res.reshape(n_items[0], 1)
    return res


# one-hot encode a 1-D array
def one_hot_encode(array, num_of_classes):
    return np.eye(num_of_classes)[array.reshape(-1)]
|
||||