update samples from Release-6 as a part of 1.3.0 SDK stable release

vizhur
2020-04-13 16:22:23 +00:00
parent c520bd1d41
commit 057e22b253
83 changed files with 3024 additions and 1249 deletions

View File

@@ -31,7 +31,7 @@
"\n",
"> * Connect your workspace and create an experiment \n",
"> * Load data and train a scikit-learn model\n",
"> * View training results in the portal\n",
"> * View training results in the studio\n",
"> * Retrieve the best model"
]
},
@@ -74,7 +74,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now create an experiment in your workspace. An experiment is another foundational cloud resource that represents a collection of trials (individual model runs). In this tutorial you use the experiment to create runs and track your model training in the Azure Portal. Parameters include your workspace reference, and a string name for the experiment."
"Now create an experiment in your workspace. An experiment is another foundational cloud resource that represents a collection of trials (individual model runs). In this tutorial you use the experiment to create runs and track your model training in the Azure Machine Learning studio. Parameters include your workspace reference, and a string name for the experiment."
]
},
{
@@ -171,7 +171,7 @@
"\n",
"1. For each alpha hyperparameter value in the `alphas` array, a new run is created within the experiment. The alpha value is logged to differentiate between each run.\n",
"1. In each run, a Ridge model is instantiated, trained, and used to run predictions. The root-mean-squared-error is calculated for the actual versus predicted values, and then logged to the run. At this point the run has metadata attached for both the alpha value and the rmse accuracy.\n",
"1. Next, the model for each run is serialized and uploaded to the run. This allows you to download the model file from the run in the portal.\n",
"1. Next, the model for each run is serialized and uploaded to the run. This allows you to download the model file from the run in the studio.\n",
"1. At the end of each iteration the run is completed by calling `run.complete()`.\n",
"\n"
]
@@ -180,7 +180,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"After the training has completed, call the `experiment` variable to fetch a link to the experiment in the portal."
"After the training has completed, call the `experiment` variable to fetch a link to the experiment in the studio."
]
},
{
@@ -196,14 +196,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## View training results in portal"
"## View training results in studio"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Following the **Link to Azure Portal** takes you to the main experiment page. Here you see all the individual runs in the experiment. Any custom-logged values (`alpha_value` and `rmse`, in this case) become fields for each run, and also become available for the charts and tiles at the top of the experiment page. To add a logged metric to a chart or tile, hover over it, click the edit button, and find your custom-logged metric.\n",
"Following the **Link to Azure Machine Learning studio** takes you to the main experiment page. Here you see all the individual runs in the experiment. Any custom-logged values (`alpha_value` and `rmse`, in this case) become fields for each run, and also become available for the charts and tiles at the top of the experiment page. To add a logged metric to a chart or tile, hover over it, click the edit button, and find your custom-logged metric.\n",
"\n",
"When training models at scale over hundreds and thousands of runs, this page makes it easy to see every model you trained, specifically how they were trained, and how your unique metrics have changed over time."
]
@@ -212,21 +212,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![Main Experiment page in Portal](imgs/experiment_main.png)"
"![Main Experiment page in the studio](../imgs/experiment_main.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Clicking on a run number link in the `RUN NUMBER` column takes you to the page for each individual run. The default tab **Details** shows you more-detailed information on each run. Navigate to the **Outputs** tab, and you see the `.pkl` file for the model that was uploaded to the run during each training iteration. Here you can download the model file, rather than having to retrain it manually."
"Select a run number link in the `RUN NUMBER` column to see the page for an individual run. The default tab **Details** shows you more-detailed information on each run. Navigate to the **Outputs + logs** tab, and you see the `.pkl` file for the model that was uploaded to the run during each training iteration. Here you can download the model file, rather than having to retrain it manually."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Run details page in Portal](imgs/model_download.png)"
"![Run details page in the studio](../imgs/model_download.png)"
]
},
{
@@ -240,7 +240,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to being able to download model files from the experiment in the portal, you can also download them programmatically. The following code iterates through each run in the experiment, and accesses both the logged run metrics and the run details (which contains the run_id). This keeps track of the best run, in this case the run with the lowest root-mean-squared-error."
"In addition to being able to download model files from the experiment in the studio, you can also download them programmatically. The following code iterates through each run in the experiment, and accesses both the logged run metrics and the run details (which contains the run_id). This keeps track of the best run, in this case the run with the lowest root-mean-squared-error."
]
},
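
The best-run bookkeeping described in the cell above can be sketched without the Azure ML SDK. This is a minimal illustration, not the notebook's code: the `runs` list, its `run_id` values, and the `find_best_run` helper are hypothetical stand-ins for the metrics and details each run would report, and the numbers are made up.

```python
# Hypothetical stand-in for the metrics each run would report; the values are
# illustrative only and were not produced by the tutorial.
runs = [
    {"run_id": "run-a", "alpha_value": 0.1, "rmse": 58.2},
    {"run_id": "run-b", "alpha_value": 0.5, "rmse": 56.9},
    {"run_id": "run-c", "alpha_value": 1.0, "rmse": 57.4},
]


def find_best_run(run_records):
    """Return the record with the lowest rmse, or None if there are no records."""
    best = None
    for record in run_records:
        # Keep the first record seen, then replace it whenever a lower rmse appears.
        if best is None or record["rmse"] < best["rmse"]:
            best = record
    return best


best_run = find_best_run(runs)
print(best_run["run_id"], best_run["rmse"])
```

Tracking the whole record rather than only the metric keeps the run identifier available, which is what you would need afterwards to download that run's model file.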
{
@@ -352,7 +352,7 @@
"\n",
"> * Connected your workspace and created an experiment\n",
"> * Loaded data and trained scikit-learn models\n",
"> * Viewed training results in the portal and retrieved models\n",
"> * Viewed training results in the studio and retrieved models\n",
"\n",
"[Deploy your model](https://docs.microsoft.com/azure/machine-learning/service/tutorial-deploy-models-with-aml) with Azure Machine Learning.\n",
"Learn how to develop [automated machine learning](https://docs.microsoft.com/azure/machine-learning/service/tutorial-auto-train-models) experiments."

View File

@@ -39,11 +39,7 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"register model from file"
]
},
"metadata": {},
"outputs": [],
"source": [
"# If you did NOT complete the tutorial, you can instead run this cell \n",
@@ -62,7 +58,19 @@
" model_name=model_name,\n",
" tags={\"data\": \"mnist\", \"model\": \"classification\"},\n",
" description=\"Mnist handwriting recognition\",\n",
" workspace=ws)"
" workspace=ws)\n",
"\n",
"from azureml.core.environment import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# to install required packages\n",
"env = Environment('tutorial-env')\n",
"cd = CondaDependencies.create(pip_packages=['azureml-dataprep[pandas,fuse]>=1.1.14', 'azureml-defaults'], conda_packages = ['scikit-learn==0.22.1'])\n",
"\n",
"env.python.conda_dependencies = cd\n",
"\n",
"# Register environment to re-use later\n",
"env.register(workspace = ws)"
]
},
{
@@ -98,190 +106,16 @@
"print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieve the model\n",
"\n",
"You registered a model in your workspace in the previous tutorial. Now, load this workspace and download the model to your local directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"load workspace",
"download model"
]
},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"from azureml.core.model import Model\n",
"import os \n",
"ws = Workspace.from_config()\n",
"model=Model(ws, 'sklearn_mnist')\n",
"\n",
"model.download(target_dir=os.getcwd(), exist_ok=True)\n",
"\n",
"# verify the downloaded model file\n",
"file_path = os.path.join(os.getcwd(), \"sklearn_mnist_model.pkl\")\n",
"\n",
"os.stat(file_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test model locally\n",
"\n",
"Before deploying, make sure your model is working locally by:\n",
"* Downloading the test data if you haven't already\n",
"* Loading test data\n",
"* Predicting test data\n",
"* Examining the confusion matrix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download test data\n",
"If you haven't already, download the test data to the **./data/** directory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Dataset\n",
"from azureml.opendatasets import MNIST\n",
"\n",
"data_folder = os.path.join(os.getcwd(), 'data')\n",
"os.makedirs(data_folder, exist_ok=True)\n",
"\n",
"mnist_file_dataset = MNIST.get_file_dataset()\n",
"mnist_file_dataset.download(data_folder, overwrite=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load test data\n",
"\n",
"Load the test data from the **./data/** directory created during the training tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from utils import load_data\n",
"import os\n",
"\n",
"data_folder = os.path.join(os.getcwd(), 'data')\n",
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster\n",
"X_test = load_data(os.path.join(data_folder, 't10k-images-idx3-ubyte.gz'), False) / 255.0\n",
"y_test = load_data(os.path.join(data_folder, 't10k-labels-idx1-ubyte.gz'), True).reshape(-1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict test data\n",
"\n",
"Feed the test dataset to the model to get predictions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"import joblib\n",
"\n",
"clf = joblib.load( os.path.join(os.getcwd(), 'sklearn_mnist_model.pkl'))\n",
"y_hat = clf.predict(X_test)\n",
"print(y_hat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Examine the confusion matrix\n",
"\n",
"Generate a confusion matrix to see how many samples from the test set are classified correctly. Notice the mis-classified value for the incorrect predictions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"\n",
"conf_mx = confusion_matrix(y_test, y_hat)\n",
"print(conf_mx)\n",
"print('Overall accuracy:', np.average(y_hat == y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `matplotlib` to display the confusion matrix as a graph. In this graph, the Y axis represents the actual values, and the X axis represents the predicted values. The color of each cell represents the error rate. The lighter the color, the higher the error rate is. For example, many 5's are mis-classified as 3's. Hence you see a bright cell at (5,3)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized\n",
"row_sums = conf_mx.sum(axis=1, keepdims=True)\n",
"norm_conf_mx = conf_mx / row_sums\n",
"np.fill_diagonal(norm_conf_mx, 0)\n",
"\n",
"fig = plt.figure(figsize=(8,5))\n",
"ax = fig.add_subplot(111)\n",
"cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n",
"ticks = np.arange(0, 10, 1)\n",
"ax.set_xticks(ticks)\n",
"ax.set_yticks(ticks)\n",
"ax.set_xticklabels(ticks)\n",
"ax.set_yticklabels(ticks)\n",
"fig.colorbar(cax)\n",
"plt.ylabel('true labels', fontsize=14)\n",
"plt.xlabel('predicted values', fontsize=14)\n",
"plt.savefig('conf.png')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Deploy as web service\n",
"\n",
"Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI. \n",
"Deploy the model as a web service hosted in ACI. \n",
"\n",
"To build the correct environment for ACI, provide the following:\n",
"* A scoring script to show how to use the model\n",
"* An environment file to show what packages need to be installed\n",
"* A configuration file to build the ACI\n",
"* The model you trained before\n",
"\n",
@@ -324,52 +158,6 @@
" return y_hat.tolist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create environment file\n",
"\n",
"Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `scikit-learn` and `azureml-sdk`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"set conda dependencies"
]
},
"outputs": [],
"source": [
"from azureml.core.conda_dependencies import CondaDependencies \n",
"\n",
"myenv = CondaDependencies()\n",
"myenv.add_conda_package(\"scikit-learn==0.22.1\")\n",
"myenv.add_pip_package(\"azureml-defaults\")\n",
"\n",
"with open(\"myenv.yml\",\"w\") as f:\n",
" f.write(myenv.serialize_to_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Review the content of the `myenv.yml` file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with open(\"myenv.yml\",\"r\") as f:\n",
" print(f.read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -432,6 +220,11 @@
"from azureml.core.webservice import Webservice\n",
"from azureml.core.model import InferenceConfig\n",
"from azureml.core.environment import Environment\n",
"from azureml.core import Workspace\n",
"from azureml.core.model import Model\n",
"\n",
"ws = Workspace.from_config()\n",
"model = Model(ws, 'sklearn_mnist')\n",
"\n",
"\n",
"myenv = Environment.get(workspace=ws, name=\"tutorial-env\", version=\"1\")\n",
@@ -470,14 +263,148 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test deployed service\n",
"## Test the model\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download test data\n",
"Download the test data to the **./data/** directory"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from azureml.core import Dataset\n",
"from azureml.opendatasets import MNIST\n",
"\n",
"data_folder = os.path.join(os.getcwd(), 'data')\n",
"os.makedirs(data_folder, exist_ok=True)\n",
"\n",
"mnist_file_dataset = MNIST.get_file_dataset()\n",
"mnist_file_dataset.download(data_folder, overwrite=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load test data\n",
"\n",
"Load the test data from the **./data/** directory created during the training tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from utils import load_data\n",
"import os\n",
"\n",
"data_folder = os.path.join(os.getcwd(), 'data')\n",
"# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster\n",
"X_test = load_data(os.path.join(data_folder, 't10k-images-idx3-ubyte.gz'), False) / 255.0\n",
"y_test = load_data(os.path.join(data_folder, 't10k-labels-idx1-ubyte.gz'), True).reshape(-1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict test data\n",
"\n",
"Feed the test dataset to the model to get predictions.\n",
"\n",
"Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model with a random sample of 30 images from the test data. \n",
"\n",
"The following code goes through these steps:\n",
"1. Send the data as a JSON array to the web service hosted in ACI. \n",
"\n",
"1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.\n",
"1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"test = json.dumps({\"data\": X_test.tolist()})\n",
"test = bytes(test, encoding='utf8')\n",
"y_hat = service.run(input_data=test)"
]
},
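
The request body built in the cell above is a UTF-8 encoded JSON object with a single `data` key. The sketch below shows only that payload shape, using two tiny made-up rows instead of MNIST images; `build_payload` is a hypothetical helper name, and nothing here calls the deployed service.

```python
import json


def build_payload(samples):
    """Serialize rows of pixel values as the UTF-8 JSON body {"data": [...]}."""
    body = json.dumps({"data": [list(row) for row in samples]})
    return bytes(body, encoding="utf8")


# Two made-up "images" of three pixels each, already scaled to the 0-1 range.
payload = build_payload([[0.0, 0.5, 1.0], [1.0, 0.0, 0.25]])

# Decoding the bytes again shows what the scoring script would receive.
decoded = json.loads(payload.decode("utf8"))
print(len(decoded["data"]), "rows of", len(decoded["data"][0]), "values")
```

Converting each row to a plain list first matters because `json.dumps` cannot serialize NumPy arrays directly, which is why the notebook calls `tolist()` on the test data.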
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Examine the confusion matrix\n",
"\n",
"Generate a confusion matrix to see how many samples from the test set are classified correctly. Notice the mis-classified value for the incorrect predictions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import confusion_matrix\n",
"\n",
"conf_mx = confusion_matrix(y_test, y_hat)\n",
"print(conf_mx)\n",
"print('Overall accuracy:', np.average(y_hat == y_test))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use `matplotlib` to display the confusion matrix as a graph. In this graph, the Y axis represents the actual values, and the X axis represents the predicted values. The color of each cell represents the error rate. The lighter the color, the higher the error rate is. For example, many 5's are mis-classified as 3's. Hence you see a bright cell at (5,3)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized\n",
"row_sums = conf_mx.sum(axis=1, keepdims=True)\n",
"norm_conf_mx = conf_mx / row_sums\n",
"np.fill_diagonal(norm_conf_mx, 0)\n",
"\n",
"fig = plt.figure(figsize=(8,5))\n",
"ax = fig.add_subplot(111)\n",
"cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n",
"ticks = np.arange(0, 10, 1)\n",
"ax.set_xticks(ticks)\n",
"ax.set_yticks(ticks)\n",
"ax.set_xticklabels(ticks)\n",
"ax.set_yticklabels(ticks)\n",
"fig.colorbar(cax)\n",
"plt.ylabel('true labels', fontsize=14)\n",
"plt.xlabel('predicted values', fontsize=14)\n",
"plt.savefig('conf.png')\n",
"plt.show()"
]
},
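
The normalization step in the plotting cell above can be checked by hand on a small example. The following is a pure-Python sketch under stated assumptions: the labels are made up, and `confusion_matrix` and `normalized_errors` are hypothetical helpers written for illustration (the notebook itself uses scikit-learn and NumPy). Rows are true labels and columns are predictions, the same convention scikit-learn uses.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples per (true label, predicted label) pair; rows are true labels."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted in zip(y_true, y_pred):
        matrix[true_label][predicted] += 1
    return matrix


def normalized_errors(matrix):
    """Divide each row by its total and zero the diagonal, leaving only error rates."""
    result = []
    for i, row in enumerate(matrix):
        total = sum(row)
        result.append([
            0.0 if (j == i or total == 0) else count / total
            for j, count in enumerate(row)
        ])
    return result


# Made-up labels for a three-class problem.
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

conf = confusion_matrix(y_true, y_pred, 3)
errors = normalized_errors(conf)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(conf)
print(errors)
print(accuracy)
```

Zeroing the diagonal removes the correct predictions, so the remaining values show what fraction of each true class was sent to each wrong class. That is why a bright cell in the plot marks a frequent confusion rather than a frequent class.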
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Show predictions\n",
"\n",
"Test the deployed model with a random sample of 30 images from the test data. \n",
"\n",
"\n",
"1. Print the returned predictions and plot them along with the input images. Red font and inverse image (white on black) is used to highlight the misclassified samples. \n",
"\n",