diff --git a/README.md b/README.md index 9fff0fb4..a935eeef 100644 --- a/README.md +++ b/README.md @@ -1,17 +1,9 @@ ---- -page_type: sample -languages: -- python -products: -- azure -- azure-machine-learning-service -description: "With Azure Machine Learning service, learn to prep data, train, test, deploy, manage, and track machine learning models in a cloud-based environment." ---- - # Azure Machine Learning service example notebooks This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK, which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK gives you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud. +![Azure ML Workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/concept-azure-machine-learning-architecture/workflow.png) + ## Quick installation ```sh @@ -58,10 +50,13 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example --- + +## Community Repository +Visit this [community repository](https://github.com/microsoft/MLOps/tree/master/examples) to find useful end-to-end sample notebooks. Also, please follow these [contribution guidelines](https://github.com/microsoft/MLOps/blob/master/contributing.md) when contributing to this repository. + ## Projects using Azure Machine Learning Visit the following repos to see projects contributed by Azure ML users: - - [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis. - [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT) - [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion) diff --git a/configuration.ipynb b/configuration.ipynb index d82c6131..cd34ebf9 100644 --- a/configuration.ipynb +++ b/configuration.ipynb @@ -103,7 +103,7 @@ "source": [ "import azureml.core\n", "\n", - "print(\"This notebook was created using version 1.0.57 of the Azure ML SDK\")\n", + "print(\"This notebook was created using version 1.0.60 of the Azure ML SDK\")\n", "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" ] }, diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb index 33d45416..46d8fdbc 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb @@ -225,7 +225,9 @@ "|**y**|(sparse) array-like, shape = [n_samples, ], target values.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", "|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (e.g. 'US', 'GB').|\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. " + "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. 
You can specify a new empty folder. \n", + "\n", + "This notebook uses the `blacklist_models` parameter to exclude some models that take longer to train on this dataset. You can remove models from the `blacklist_models` list, but you may then need to increase the `iteration_timeout_minutes` parameter value to get results." ] }, { @@ -246,6 +248,7 @@ "\n", "automl_config = AutoMLConfig(task='forecasting', \n", " primary_metric='normalized_root_mean_squared_error',\n", + " blacklist_models = ['ExtremeRandomTrees'],\n", " iterations=10,\n", " iteration_timeout_minutes=5,\n", " X=X_train,\n", diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb index 042ee804..958bbd96 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb @@ -463,7 +463,9 @@ "source": [ "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation.\n", - "Now that we configured target lags, that is the previous values of the target variables, and the prediction is no longer horizon-less. We therefore must still specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features." + "Now that we have configured target lags, that is, the previous values of the target variable, the prediction is no longer horizon-less. We therefore must specify the `max_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we generate the `max`, `min` and `sum` features.\n", + "\n", + "This notebook uses the `blacklist_models` parameter to exclude some models that take longer to train on this dataset. You can remove models from the `blacklist_models` list, but you may then need to increase the `iteration_timeout_minutes` parameter value to get results.\n",
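+ "\n", + "For illustration, a forecasting configuration that sets these parameters together might look like the following sketch (the horizon, lag, and window values here are hypothetical; the values this notebook actually uses are set in the next cell):\n", + "\n", + "```python\n", + "automl_config_lags = AutoMLConfig(task='forecasting',\n", + "                                  max_horizon=48,                 # learn to forecast up to 48 periods ahead\n", + "                                  target_lags=12,                 # build lag features from the previous 12 target values\n", + "                                  target_rolling_window_size=4,   # rolling max/min/sum over a 4-period window\n", + "                                  X=X_train,\n", + "                                  y=y_train)\n", + "```"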
] }, { @@ -482,7 +484,7 @@ "automl_config_lags = AutoMLConfig(task='forecasting',\n", " debug_log='automl_nyc_energy_errors.log',\n", " primary_metric='normalized_root_mean_squared_error',\n", - " blacklist_models=['ElasticNet','ExtremeRandomTrees','GradientBoosting'],\n", + " blacklist_models=['ElasticNet','ExtremeRandomTrees','GradientBoosting','XGBoostRegressor'],\n", " iterations=10,\n", " iteration_timeout_minutes=10,\n", " X=X_train,\n", diff --git a/how-to-use-azureml/deployment/accelerated-models/NOTICE.txt b/how-to-use-azureml/deployment/accelerated-models/NOTICE.txt new file mode 100644 index 00000000..0451fc53 --- /dev/null +++ b/how-to-use-azureml/deployment/accelerated-models/NOTICE.txt @@ -0,0 +1,217 @@ + +NOTICES AND INFORMATION +Do Not Translate or Localize + +This Azure Machine Learning service example notebooks repository includes material from the projects listed below. + + +1. SSD-Tensorflow (https://github.com/balancap/ssd-tensorflow) + + +%% SSD-Tensorflow NOTICES AND INFORMATION BEGIN HERE +========================================= + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. 
+ + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +========================================= +END OF SSD-Tensorflow NOTICES AND INFORMATION diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/README.md b/how-to-use-azureml/deployment/deploy-to-cloud/README.md new file mode 100644 index 00000000..46288c6e --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-cloud/README.md @@ -0,0 +1,12 @@ +# Model Deployment with Azure ML service +You can use Azure Machine Learning to package, debug, validate and deploy inference containers to a variety of compute targets. This process is known as "MLOps" (ML operationalization). +For more information please check out this article: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where + +## Get Started +To begin, you will need an ML workspace. 
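+If you do not have a workspace yet, you can create one in the Azure portal or programmatically with the SDK's `Workspace.create()` method.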
+For more information please check out this article: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace + +## Deploy to the cloud +You can deploy to the cloud using the Azure ML CLI or the Azure ML SDK. +- CLI example: https://aka.ms/azmlcli +- Notebook example: [model-register-and-deploy](./model-register-and-deploy.ipynb). \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/helloworld.txt b/how-to-use-azureml/deployment/deploy-to-cloud/helloworld.txt new file mode 100644 index 00000000..a12521d8 --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-cloud/helloworld.txt @@ -0,0 +1 @@ +RUN echo "this is test" \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb b/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb new file mode 100644 index 00000000..ef681642 --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb @@ -0,0 +1,343 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deploy-to-cloud/model-register-and-deploy.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register Model and deploy as Webservice\n", + "\n", + "This example shows how to deploy a Webservice in step-by-step fashion:\n", + "\n", + " 1. Register Model\n", + " 2. Deploy Model as Webservice" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can add tags and descriptions to your Models. Note you need to have a `sklearn_regression_model.pkl` file in the current directory. This file is generated by the 01 notebook. 
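A copy of this file is also included alongside this notebook in the repository.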
The call below registers that file as a Model with the same name `sklearn_regression_model.pkl` in the workspace.\n", + "\n", + "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n", + " model_name=\"sklearn_regression_model.pkl\",\n", + " tags={'area': \"diabetes\", 'type': \"regression\"},\n", + " description=\"Ridge regression model to predict diabetes\",\n", + " workspace=ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can now create and/or use an Environment object when deploying a Webservice. The Environment may have been previously registered with your Workspace, or it will be registered as part of the Webservice deployment. Note, however, that only Environments created using azureml-defaults version 1.0.48 or later work with this new handling.\n", + "\n", + "More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "\n", + "env = Environment.from_conda_specification(name='deploytocloudenv', file_path='myenv.yml')\n", + "\n", + "# This is optional at this point\n", + "# env.register(workspace=ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Inference Configuration\n", + "\n", + "There is now support for a source directory: you can upload an entire folder from your local machine as dependencies for the Webservice.\n", + "Note: in that case, the entry_script, conda_file, and extra_docker_file_steps paths are relative to the source_directory path.\n", + "\n", + "Sample code for using a source directory:\n", + "\n", + "```python\n", + "inference_config = InferenceConfig(source_directory=\"C:/abc\",\n", + " runtime=\"python\", \n", + " entry_script=\"x/y/score.py\",\n", + " conda_file=\"env/myenv.yml\", \n", + " extra_docker_file_steps=\"helloworld.txt\")\n", + "```\n", + "\n", + " - source_directory = the source path, as a string; this entire folder gets added to the image, so it's easy to access any files within this folder or subfolder\n", + " - runtime = which runtime to use for the image. Currently supported runtimes are 'spark-py' and 'python'.\n", + " - entry_script = contains the logic for initializing your model and running predictions\n", + " - conda_file = manages conda and python package dependencies.\n", + " - extra_docker_file_steps = optional: any extra steps you want to inject into the Docker file" + ] + },
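+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The entry script must define an `init()` function, which loads the model, and a `run()` function, which handles each scoring request. As a minimal sketch (the full `score.py` used by this notebook, including its input/output schema decorators, ships alongside it):\n", + "\n", + "```python\n", + "from sklearn.externals import joblib\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + "    global model\n", + "    # look up the registered model by name and load it once per container\n", + "    model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", + "    model = joblib.load(model_path)\n", + "\n", + "def run(data):\n", + "    # return any JSON-serializable result\n", + "    return model.predict(data).tolist()\n", + "```" + ] + },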
+ { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create image" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import InferenceConfig\n", + "\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy Model as Webservice on Azure Container Instance\n", + "\n", + "Note that the service creation can take a few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "azuremlexception-remarks-sample" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice, Webservice\n", + "from azureml.exceptions import WebserviceException\n", + "\n", + "deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n", + "aci_service_name = 'aciservice1'\n", + "\n", + "try:\n", + "    # an ACI service name must be unique within the subscription,\n", + "    # so retrieve any existing service with this name and delete it\n", + "    service = Webservice(ws, name=aci_service_name)\n", + "    if service:\n", + "        service.delete()\n", + "except WebserviceException:\n", + "    # no existing service with this name, so there is nothing to delete\n", + "    pass\n", + "\n", + "service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config)\n", + "\n", + "service.wait_for_deployment(True)\n", + "print(service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Test web service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "test_sample = json.dumps({'data': [\n", + "    [1,2,3,4,5,6,7,8,9,10], \n", + "    [10,9,8,7,6,5,4,3,2,1]\n", + "]})\n", + "\n", + "test_sample_encoded = bytes(test_sample, encoding='utf8')\n", + "prediction = service.run(input_data=test_sample_encoded)\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Delete ACI to clean up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Profiling\n", + "\n", + "You can also take advantage of the profiling feature to estimate CPU and memory requirements for models.\n", + "\n", + "```python\n", + "profile = Model.profile(ws, \"profilename\", [model], inference_config, test_sample)\n", + "profile.wait_for_profiling(True)\n", + "profiling_results = profile.get_results()\n", + "print(profiling_results)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Packaging\n", + "\n", + "If you want to build a Docker image that encapsulates your model and its dependencies, you can use the model packaging option. 
The output image will be pushed to your workspace's ACR.\n", + "\n", + "You must include an Environment object in your inference configuration to use `Model.package()`.\n", + "\n", + "```python\n", + "package = Model.package(ws, [model], inference_config)\n", + "package.wait_for_creation(show_output=True) # Or show_output=False to hide the Docker build logs.\n", + "package.pull()\n", + "```\n", + "\n", + "Instead of a fully-built image, you can also generate a Dockerfile and download all the assets needed to build an image on top of your Environment.\n", + "\n", + "```python\n", + "package = Model.package(ws, [model], inference_config, generate_dockerfile=True)\n", + "package.wait_for_creation(show_output=True)\n", + "package.save(\"./local_context_dir\")\n", + "```" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "aashishb" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.yml b/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.yml new file mode 100644 index 00000000..ca6bae19 --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.yml @@ -0,0 +1,4 @@ +name: model-register-and-deploy +dependencies: +- pip: + - azureml-sdk diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/myenv.yml b/how-to-use-azureml/deployment/deploy-to-cloud/myenv.yml new file mode 100644 index 00000000..36ee6703 --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-cloud/myenv.yml @@ -0,0 +1,8 @@ +name: project_environment +dependencies: + - python=3.6.2 + - pip: + - azureml-defaults + - scikit-learn + - numpy + - inference-schema[numpy-support] diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/score.py b/how-to-use-azureml/deployment/deploy-to-cloud/score.py new file mode 100644 index 00000000..0086d27b --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-cloud/score.py @@ -0,0 +1,34 @@ +import pickle +import json +import numpy as np +from sklearn.externals import joblib +from sklearn.linear_model import Ridge +from azureml.core.model import Model + +from inference_schema.schema_decorators import input_schema, output_schema +from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType + + +def init(): + global model + # note here "sklearn_regression_model.pkl" is the name of the model registered under + # this is a different behavior than before when the code is run locally, even though the code is the same. 
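+    # when deployed, get_model_path resolves the registered model name to its local path inside the container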
+ model_path = Model.get_model_path('sklearn_regression_model.pkl') + # deserialize the model file back into a sklearn model + model = joblib.load(model_path) + + +input_sample = np.array([[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]) +output_sample = np.array([3726.995]) + + +@input_schema('data', NumpyParameterType(input_sample)) +@output_schema(NumpyParameterType(output_sample)) +def run(data): + try: + result = model.predict(data) + # you can return any datatype as long as it is JSON-serializable + return result.tolist() + except Exception as e: + error = str(e) + return error diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/sklearn_regression_model.pkl b/how-to-use-azureml/deployment/deploy-to-cloud/sklearn_regression_model.pkl new file mode 100644 index 00000000..d10309b6 Binary files /dev/null and b/how-to-use-azureml/deployment/deploy-to-cloud/sklearn_regression_model.pkl differ diff --git a/how-to-use-azureml/deployment/deploy-to-local/README.md b/how-to-use-azureml/deployment/deploy-to-local/README.md new file mode 100644 index 00000000..9f08e5cd --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-local/README.md @@ -0,0 +1,12 @@ +# Model Deployment with Azure ML service +You can use Azure Machine Learning to package, debug, validate and deploy inference containers to a variety of compute targets. This process is known as "MLOps" (ML operationalization). +For more information please check out this article: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where + +## Get Started +To begin, you will need an ML workspace. +For more information please check out this article: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace + +## Deploy locally +You can deploy a model locally for testing & debugging using the Azure ML CLI or the Azure ML SDK. +- CLI example: https://aka.ms/azmlcli +- Notebook example: [register-model-deploy-local](./register-model-deploy-local.ipynb). \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-to-local/helloworld.txt b/how-to-use-azureml/deployment/deploy-to-local/helloworld.txt new file mode 100644 index 00000000..a12521d8 --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-local/helloworld.txt @@ -0,0 +1 @@ +RUN echo "this is test" \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-to-local/myenv.yml b/how-to-use-azureml/deployment/deploy-to-local/myenv.yml new file mode 100644 index 00000000..36ee6703 --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-local/myenv.yml @@ -0,0 +1,8 @@ +name: project_environment +dependencies: + - python=3.6.2 + - pip: + - azureml-defaults + - scikit-learn + - numpy + - inference-schema[numpy-support] diff --git a/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb b/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb new file mode 100644 index 00000000..b0461399 --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb @@ -0,0 +1,488 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register model and deploy locally with advanced usages\n", + "\n", + "This example shows how to deploy a web service in step-by-step fashion:\n", + "\n", + " 1. Register model\n", + " 2. Deploy the image as a web service in a local Docker container.\n", + " 3. Quickly test changes to your entry script by reloading the local service.\n", + " 4. Optionally, you can also make changes to model, conda or extra_docker_file_steps and update local service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can add tags and descriptions to your models. we are using `sklearn_regression_model.pkl` file in the current directory as a model with the same name `sklearn_regression_model.pkl` in the workspace.\n", + "\n", + "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, framework, category, target customer etc. Note that tags must be alphanumeric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n", + " model_name=\"sklearn_regression_model.pkl\",\n", + " tags={'area': \"diabetes\", 'type': \"regression\"},\n", + " description=\"Ridge regression model to predict diabetes\",\n", + " workspace=ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Manage your dependencies in a folder" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "source_directory = \"C:/abc\"\n", + "\n", + "os.makedirs(source_directory, exist_ok=True)\n", + "os.makedirs(\"C:/abc/x/y\", exist_ok=True)\n", + "os.makedirs(\"C:/abc/env\", exist_ok=True)\n", + "os.makedirs(\"C:/abc/dockerstep\", exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Show `score.py`. 
Note that the `sklearn_regression_model.pkl` in the `get_model_path` call refers to a model named `sklearn_regression_model.pkl` registered under the workspace. It is NOT referencing the local file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile C:/abc/x/y/score.py\n", + "import pickle\n", + "import json\n", + "import numpy as np\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "\n", + "from inference_schema.schema_decorators import input_schema, output_schema\n", + "from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType\n", + "\n", + "def init():\n", + "    global model\n", + "    # note: \"sklearn_regression_model.pkl\" is the name the model was registered under in the workspace.\n", + "    # this behavior differs from running the code locally, even though the code is the same.\n", + "    model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", + "    # deserialize the model file back into a sklearn model\n", + "    model = joblib.load(model_path)\n", + "    global name\n", + "    # note: the entire source_directory from the inference config gets added into the image,\n", + "    # so, as shown below, you can use any extra files placed in it\n", + "    with open('./abc/extradata.json') as json_file: \n", + "        data = json.load(json_file)\n", + "        name = data[\"people\"][0][\"name\"]\n", + "\n", + "input_sample = np.array([[10,9,8,7,6,5,4,3,2,1]])\n", + "output_sample = np.array([3726.995])\n", + "\n", + "@input_schema('data', NumpyParameterType(input_sample))\n", + "@output_schema(NumpyParameterType(output_sample))\n", + "def run(data):\n", + "    try:\n", + "        result = model.predict(data)\n", + "        # you can return any datatype as long as it is JSON-serializable\n", + "        return \"Hello \" + name + \" here is your result = \" + str(result)\n", + "    except Exception as e:\n", + "        error = str(e)\n", + "        return error" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile C:/abc/env/myenv.yml\n", + "name: project_environment\n", + "dependencies:\n", + " - python=3.6.2\n", + " - pip:\n", + "   - azureml-defaults\n", + "   - scikit-learn\n", + "   - numpy\n", + "   - inference-schema[numpy-support]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile C:/abc/dockerstep/customDockerStep.txt\n", + "RUN echo \"this is test\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile C:/abc/extradata.json\n", + "{\n", + "    \"people\": [\n", + "        {\n", + "            \"website\": \"microsoft.com\", \n", + "            \"from\": \"Seattle\", \n", + "            \"name\": \"Mrudula\"\n", + "        }\n", + "    ]\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Inference Configuration\n", + "\n", + " - source_directory = the source path, as a string; this entire folder gets added to the image, so it's easy to access any files within this folder or subfolder\n", + " - runtime = which runtime to use for the image. Currently supported runtimes are 'spark-py' and 'python'.\n", + " - entry_script = contains the logic for initializing your model and running predictions\n", + " - conda_file = manages conda and python package dependencies.\n", + " - extra_docker_file_steps = optional: any extra steps you want to inject into the Docker file" + ] + },
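+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The relative paths in the next cell (`x/y/score.py`, `env/myenv.yml`, `dockerstep/customDockerStep.txt`) resolve against the `source_directory` (`C:/abc`) that was populated by the `%%writefile` cells above." + ] + },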
+ { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import InferenceConfig\n", + "\n", + "inference_config = InferenceConfig(source_directory=\"C:/abc\",\n", + " runtime=\"python\", \n", + " entry_script=\"x/y/score.py\",\n", + " conda_file=\"env/myenv.yml\", \n", + " extra_docker_file_steps=\"dockerstep/customDockerStep.txt\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy Model as a Local Docker Web Service\n", + "\n", + "*Make sure you have Docker installed and running.*\n", + "\n", + "Note that the service creation can take a few minutes.\n", + "\n", + "NOTE:\n", + "\n", + "The Docker image runs as a Linux container. If you are running Docker for Windows, you need to ensure the Linux Engine is running:\n", + "\n", + "    # PowerShell command to switch to Linux engine\n", + "    & 'C:\\Program Files\\Docker\\Docker\\DockerCli.exe' -SwitchLinuxEngine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import LocalWebservice\n", + "\n", + "# This is optional; if not provided, Docker will choose a random unused port.\n", + "deployment_config = LocalWebservice.deploy_configuration(port=6789)\n", + "\n", + "local_service = Model.deploy(ws, \"test\", [model], inference_config, deployment_config)\n", + "\n", + "local_service.wait_for_deployment()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('Local service port: {}'.format(local_service.port))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check Status and Get Container Logs\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(local_service.get_logs())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test Web Service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Call the web service with some input data to get a prediction." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "sample_input = json.dumps({\n", + "    'data': [\n", + "        [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n", + "        [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]\n", + "    ]\n", + "})\n", + "\n", + "sample_input = bytes(sample_input, encoding='utf-8')\n", + "\n", + "print(local_service.run(input_data=sample_input))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reload Service\n", + "\n", + "You can update your score.py file and then call `reload()` to quickly restart the service. This only reloads your execution script and dependency files; it does not rebuild the underlying Docker image. As a result, `reload()` is fast, but if you do need to rebuild the image -- to add a new Conda or pip package, for instance -- you will have to call `update()`, instead (see below).\n",
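+ "\n", + "The next cell rewrites `score.py` so that it also reads the `from` field from `extradata.json`; after `reload()`, the changed response message confirms that the updated script is live."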
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile C:/abc/x/y/score.py\n", + "import pickle\n", + "import json\n", + "import numpy as np\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "\n", + "from inference_schema.schema_decorators import input_schema, output_schema\n", + "from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType\n", + "\n", + "def init():\n", + " global model\n", + " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n", + " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", + " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", + " # deserialize the model file back into a sklearn model\n", + " model = joblib.load(model_path)\n", + " global name, from_location\n", + " # note here, entire source directory on inference config gets added into image\n", + " # bellow is the example how you can use any extra files in image\n", + " with open('./abc/extradata.json') as json_file: \n", + " data = json.load(json_file)\n", + " name = data[\"people\"][0][\"name\"]\n", + " from_location = data[\"people\"][0][\"from\"]\n", + "\n", + "input_sample = np.array([[10,9,8,7,6,5,4,3,2,1]])\n", + "output_sample = np.array([3726.995])\n", + "\n", + "@input_schema('data', NumpyParameterType(input_sample))\n", + "@output_schema(NumpyParameterType(output_sample))\n", + "def run(data):\n", + " try:\n", + " result = model.predict(data)\n", + " # you can return any datatype as long as it is JSON-serializable\n", + " return \"Hello \" + name + \" from \" + from_location + \" here is your result = \" + str(result)\n", + " except Exception as e:\n", + " error = str(e)\n", + " return error" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_service.reload()\n", + "print(\"--------------------------------------------------------------\")\n", + "\n", + "# After calling reload(), run() will return the updated message.\n", + "local_service.run(input_data=sample_input)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Update Service\n", + "\n", + "If you want to change your model(s), Conda dependencies, or deployment configuration, call `update()` to rebuild the Docker image.\n", + "\n", + "```python\n", + "\n", + "local_service.update(models=[SomeOtherModelObject],\n", + " deployment_config=local_config,\n", + " inference_config=inference_config)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Delete Service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_service.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "keriehm" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb 
b/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb new file mode 100644 index 00000000..fc3e541f --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb @@ -0,0 +1,361 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register model and deploy locally\n", + "\n", + "This example shows how to deploy a web service in step-by-step fashion:\n", + "\n", + " 1. Register model\n", + " 2. Deploy the image as a web service in a local Docker container.\n", + " 3. Quickly test changes to your entry script by reloading the local service.\n", + " 4. Optionally, you can also make changes to model, conda or extra_docker_file_steps and update local service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can add tags and descriptions to your models. we are using `sklearn_regression_model.pkl` file in the current directory as a model with the same name `sklearn_regression_model.pkl` in the workspace.\n", + "\n", + "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model, framework, category, target customer etc. Note that tags must be alphanumeric." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path=\"sklearn_regression_model.pkl\",\n", + " model_name=\"sklearn_regression_model.pkl\",\n", + " tags={'area': \"diabetes\", 'type': \"regression\"},\n", + " description=\"Ridge regression model to predict diabetes\",\n", + " workspace=ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Environment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies\n", + "from azureml.core.environment import Environment\n", + "\n", + "environment = Environment(\"LocalDeploy\")\n", + "environment.python.conda_dependencies = CondaDependencies(\"myenv.yml\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Inference Configuration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import InferenceConfig\n", + "\n", + "inference_config = InferenceConfig(entry_script=\"score.py\",\n", + " environment=environment)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploy Model as a Local Docker Web Service\n", + "\n", + "*Make sure you have Docker installed and running.*\n", + "\n", + "Note that the service creation can take few minutes.\n", + "\n", + "NOTE:\n", + "\n", + "The Docker image runs as a Linux container. If you are running Docker for Windows, you need to ensure the Linux Engine is running:\n", + "\n", + " # PowerShell command to switch to Linux engine\n", + " & 'C:\\Program Files\\Docker\\Docker\\DockerCli.exe' -SwitchLinuxEngine" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import LocalWebservice\n", + "\n", + "# This is optional, if not provided Docker will choose a random unused port.\n", + "deployment_config = LocalWebservice.deploy_configuration(port=6789)\n", + "\n", + "local_service = Model.deploy(ws, \"test\", [model], inference_config, deployment_config)\n", + "\n", + "local_service.wait_for_deployment()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('Local service port: {}'.format(local_service.port))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Check Status and Get Container Logs\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(local_service.get_logs())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test Web Service" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Call the web service with some input data to get a prediction." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "\n", + "sample_input = json.dumps({\n", + " 'data': [\n", + " [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n", + " [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]\n", + " ]\n", + "})\n", + "\n", + "sample_input = bytes(sample_input, encoding='utf-8')\n", + "\n", + "local_service.run(input_data=sample_input)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Reload Service\n", + "\n", + "You can update your score.py file and then call `reload()` to quickly restart the service. This will only reload your execution script and dependency files, it will not rebuild the underlying Docker image. As a result, `reload()` is fast, but if you do need to rebuild the image -- to add a new Conda or pip package, for instance -- you will have to call `update()`, instead (see below)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import pickle\n", + "import json\n", + "import numpy as np\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import Ridge\n", + "from azureml.core.model import Model\n", + "\n", + "from inference_schema.schema_decorators import input_schema, output_schema\n", + "from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType\n", + "\n", + "def init():\n", + " global model\n", + " # note here \"sklearn_regression_model.pkl\" is the name of the model registered under\n", + " # this is a different behavior than before when the code is run locally, even though the code is the same.\n", + " model_path = Model.get_model_path('sklearn_regression_model.pkl')\n", + " # deserialize the model file back into a sklearn model\n", + " model = joblib.load(model_path)\n", + "\n", + "input_sample = np.array([[10,9,8,7,6,5,4,3,2,1]])\n", + "output_sample = np.array([3726.995])\n", + "\n", + "@input_schema('data', NumpyParameterType(input_sample))\n", + "@output_schema(NumpyParameterType(output_sample))\n", + "def run(data):\n", + " try:\n", + " result = model.predict(data)\n", + " # you can return any datatype as long as it is JSON-serializable\n", + " return 'hello from updated score.py'\n", + " except Exception as e:\n", + " error = str(e)\n", + " return error" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_service.reload()\n", + "print(\"--------------------------------------------------------------\")\n", + "\n", + "# After calling reload(), run() will return the updated message.\n", + "local_service.run(input_data=sample_input)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Update Service\n", + "\n", + "If you want to change your model(s), Conda dependencies, or deployment configuration, call `update()` to rebuild the Docker image.\n", + "\n", + "```python\n", + "local_service.update(models=[SomeOtherModelObject],\n", + " inference_config=inference_config,\n", + " deployment_config=local_config)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Delete Service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "local_service.delete()" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "keriehm" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + 
"language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-to-local/score.py b/how-to-use-azureml/deployment/deploy-to-local/score.py new file mode 100644 index 00000000..0086d27b --- /dev/null +++ b/how-to-use-azureml/deployment/deploy-to-local/score.py @@ -0,0 +1,34 @@ +import pickle +import json +import numpy as np +from sklearn.externals import joblib +from sklearn.linear_model import Ridge +from azureml.core.model import Model + +from inference_schema.schema_decorators import input_schema, output_schema +from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType + + +def init(): + global model + # note here "sklearn_regression_model.pkl" is the name of the model registered under + # this is a different behavior than before when the code is run locally, even though the code is the same. + model_path = Model.get_model_path('sklearn_regression_model.pkl') + # deserialize the model file back into a sklearn model + model = joblib.load(model_path) + + +input_sample = np.array([[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]) +output_sample = np.array([3726.995]) + + +@input_schema('data', NumpyParameterType(input_sample)) +@output_schema(NumpyParameterType(output_sample)) +def run(data): + try: + result = model.predict(data) + # you can return any datatype as long as it is JSON-serializable + return result.tolist() + except Exception as e: + error = str(e) + return error diff --git a/how-to-use-azureml/deployment/deploy-to-local/sklearn_regression_model.pkl b/how-to-use-azureml/deployment/deploy-to-local/sklearn_regression_model.pkl new file mode 100644 index 00000000..d10309b6 Binary files /dev/null and b/how-to-use-azureml/deployment/deploy-to-local/sklearn_regression_model.pkl differ diff --git a/how-to-use-azureml/explain-model/azure-integration/remote-explanation/img/AzureMachineLearningCycle.png b/how-to-use-azureml/explain-model/azure-integration/remote-explanation/img/AzureMachineLearningCycle.png new file mode 100644 index 00000000..52de479f Binary files /dev/null and b/how-to-use-azureml/explain-model/azure-integration/remote-explanation/img/AzureMachineLearningCycle.png differ diff --git a/how-to-use-azureml/explain-model/azure-integration/remote-explanation/img/explanations-run-history.png b/how-to-use-azureml/explain-model/azure-integration/remote-explanation/img/explanations-run-history.png new file mode 100644 index 00000000..a58ef3b6 Binary files /dev/null and b/how-to-use-azureml/explain-model/azure-integration/remote-explanation/img/explanations-run-history.png differ diff --git a/how-to-use-azureml/explain-model/azure-integration/scoring-time/img/azure-machine-learning-cycle.png b/how-to-use-azureml/explain-model/azure-integration/scoring-time/img/azure-machine-learning-cycle.png new file mode 100644 index 00000000..52de479f Binary files /dev/null and b/how-to-use-azureml/explain-model/azure-integration/scoring-time/img/azure-machine-learning-cycle.png differ diff --git a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb index 4fcf5439..4432d4f4 100644 --- 
+++ b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb
@@ -545,4 +545,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}
\ No newline at end of file
diff --git a/how-to-use-azureml/explain-model/tabular-data/img/interpretability-architecture.png b/how-to-use-azureml/explain-model/tabular-data/img/interpretability-architecture.png
new file mode 100644
index 00000000..a1eff1cc
Binary files /dev/null and b/how-to-use-azureml/explain-model/tabular-data/img/interpretability-architecture.png differ
diff --git a/how-to-use-azureml/machine-learning-pipelines/README.md b/how-to-use-azureml/machine-learning-pipelines/README.md
index aec5dce2..2caedc3c 100644
--- a/how-to-use-azureml/machine-learning-pipelines/README.md
+++ b/how-to-use-azureml/machine-learning-pipelines/README.md
@@ -42,7 +42,7 @@ Take a look at [intro-to-pipelines](./intro-to-pipelines/) for the list of noteb
 * The second type of notebooks illustrate more sophisticated scenarios, and are independent of each other. These notebooks include:
 
 1. [pipeline-batch-scoring.ipynb](https://aka.ms/pl-batch-score): This notebook demonstrates how to run a batch scoring job using Azure Machine Learning pipelines.
-2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans): This notebook demonstrates a multi-step pipeline that uses GPU compute.
+2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans): This notebook demonstrates a multi-step pipeline that uses GPU compute. This sample also showcases how to specify conda dependencies through a runconfig when using Pipelines.
 3. [nyc-taxi-data-regression-model-building.ipynb](https://aka.ms/pl-nyctaxi-tutorial): This notebook is an AzureML Pipelines version of the previously published two part sample.
 
 ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/README.png)
diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb
new file mode 100644
index 00000000..21cabfe2
--- /dev/null
+++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb
@@ -0,0 +1,266 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved. \n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# How to Use Pipeline Drafts\n",
+ "In this notebook, we show you how to use Pipeline Drafts. Pipeline Drafts are mutable pipelines which can be used to submit runs and create Published Pipelines."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Prerequisites and AML Basics\n",
+ "If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n",
+ "\n",
+ "### Initialization Steps"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import azureml.core\n",
+ "from azureml.core import Workspace\n",
+ "from azureml.core import Run, Experiment, Datastore\n",
+ "from azureml.widgets import RunDetails\n",
+ "\n",
+ "# Check core SDK version number\n",
+ "print(\"SDK version:\", azureml.core.VERSION)\n",
+ "\n",
+ "ws = Workspace.from_config()\n",
+ "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Compute Target\n",
+ "Retrieve an already attached Azure Machine Learning Compute to use in the Pipeline, or create it if it does not exist."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.core.compute import AmlCompute, ComputeTarget\n",
+ "from azureml.core.compute_target import ComputeTargetException\n",
+ "\n",
+ "aml_compute_target = \"cpu-cluster\"\n",
+ "try:\n",
+ "    aml_compute = AmlCompute(ws, aml_compute_target)\n",
+ "    print(\"Found existing compute target: {}\".format(aml_compute_target))\n",
+ "except ComputeTargetException:\n",
+ "    print(\"Creating new compute target: {}\".format(aml_compute_target))\n",
+ "    \n",
+ "    provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
+ "                                                                min_nodes = 1, \n",
+ "                                                                max_nodes = 4) \n",
+ "    aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
+ "    aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Build a Pipeline\n",
+ "Build a simple pipeline to use to create a PipelineDraft."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.pipeline.core import Pipeline\n",
+ "from azureml.pipeline.steps import PythonScriptStep\n",
+ "\n",
+ "source_directory = \"publish_run_train\"\n",
+ "\n",
+ "train_step = PythonScriptStep(\n",
+ "    name=\"Training_Step\",\n",
+ "    script_name=\"train.py\", \n",
+ "    compute_target=aml_compute_target, \n",
+ "    source_directory=source_directory)\n",
+ "print(\"train step created\")\n",
+ "\n",
+ "pipeline = Pipeline(workspace=ws, steps=[train_step])\n",
+ "print(\"Pipeline is built\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create a Pipeline Draft\n",
+ "Create a PipelineDraft by specifying a name, description, experiment_name and Pipeline. You can also specify tags, properties and pipeline_parameter values.\n",
+ "\n",
+ "In this example we use the previously created Pipeline object to create the Pipeline Draft. You can also create a Pipeline Draft from an existing Pipeline Run, Published Pipeline, or other Pipeline Draft.\n",
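+ "\n",
+ "For instance, a draft seeded from an existing published pipeline might look like the following. This is a sketch only; it assumes the `pipeline` argument also accepts a PublishedPipeline or PipelineRun object, per the description above, and that `a_published_pipeline` already exists in the workspace:\n",
+ "\n",
+ "```python\n",
+ "from azureml.pipeline.core import PipelineDraft\n",
+ "\n",
+ "draft_from_published = PipelineDraft.create(ws, name=\"DraftFromPublished\",\n",
+ "                                            experiment_name=\"helloworld\",\n",
+ "                                            pipeline=a_published_pipeline)\n",
+ "```"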
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.pipeline.core import PipelineDraft\n",
+ "\n",
+ "pipeline_draft = PipelineDraft.create(ws, name=\"TestPipelineDraft\",\n",
+ "                                      description=\"draft description\",\n",
+ "                                      experiment_name=\"helloworld\",\n",
+ "                                      pipeline=pipeline,\n",
+ "                                      continue_on_step_failure=True,\n",
+ "                                      tags={'dev': 'true'},\n",
+ "                                      properties={'train': 'value'})\n",
+ "\n",
+ "created_pipeline_draft_id = pipeline_draft.id"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### List Pipeline Drafts in a Workspace\n",
+ "Use the PipelineDraft.list() function to list all PipelineDrafts in a Workspace. You can use the optional tags parameter to filter on specified tag values."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pipeline_drafts = PipelineDraft.list(ws, tags={'dev': 'true'})\n",
+ "\n",
+ "for pipeline_draft in pipeline_drafts:\n",
+ "    print(pipeline_draft)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Get a Pipeline Draft by Id"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pipeline_draft = PipelineDraft.get(ws, id=created_pipeline_draft_id)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Update a Pipeline Draft\n",
+ "The update() function of a pipeline draft can be used to update the name, description, experiment name, pipeline parameter assignments, the continue-on-step-failure setting, and the Pipeline associated with the PipelineDraft."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "new_train_step = PythonScriptStep(\n",
+ "    name=\"New_Training_Step\",\n",
+ "    script_name=\"train.py\", \n",
+ "    compute_target=aml_compute_target, \n",
+ "    source_directory=source_directory)\n",
+ "\n",
+ "new_pipeline = Pipeline(workspace=ws, steps=[new_train_step])\n",
+ "\n",
+ "pipeline_draft.update(name=\"UpdatedPipelineDraft\", description=\"has updated train step\", pipeline=new_pipeline)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Submit a Pipeline Run from a Pipeline Draft\n",
+ "Use the pipeline_draft.submit_run() function to submit a PipelineRun. After the run is submitted, the PipelineDraft can still be edited and used to submit new runs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pipeline_run = pipeline_draft.submit_run()\n",
+ "pipeline_run"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create a Published Pipeline from a Pipeline Draft\n",
+ "Use the pipeline_draft.publish() function to create a Published Pipeline from the Pipeline Draft. After creating a Published Pipeline, the Pipeline Draft can still be edited and used to create other Published Pipelines.\n",
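+ "\n",
+ "Once published, the pipeline can also be triggered over its REST endpoint. A minimal sketch (it assumes you can obtain an AAD token, for example via `InteractiveLoginAuthentication`):\n",
+ "\n",
+ "```python\n",
+ "import requests\n",
+ "from azureml.core.authentication import InteractiveLoginAuthentication\n",
+ "\n",
+ "auth = InteractiveLoginAuthentication()\n",
+ "aad_header = auth.get_authentication_header()\n",
+ "\n",
+ "# submit a run of the published pipeline under the 'helloworld' experiment\n",
+ "response = requests.post(published_pipeline.endpoint,\n",
+ "                         headers=aad_header,\n",
+ "                         json={'ExperimentName': 'helloworld'})\n",
+ "print(response.json().get('Id'))\n",
+ "```"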
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "published_pipeline = pipeline_draft.publish()\n",
+ "published_pipeline"
+ ]
+ }
+ ],
+ "metadata": {
+ "authors": [
+ {
+ "name": "elihop"
+ }
+ ],
+ "kernelspec": {
+ "display_name": "Python 3.6",
+ "language": "python",
+ "name": "python36"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.2"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.yml b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.yml
new file mode 100644
index 00000000..198569b0
--- /dev/null
+++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.yml
@@ -0,0 +1,5 @@
+name: aml-pipelines-how-to-use-pipeline-drafts
+dependencies:
+- pip:
+  - azureml-sdk
+  - azureml-widgets
diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb
index a2c5561f..ee0bbb8d 100644
--- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb
+++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb
@@ -315,7 +315,25 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "#### Set Published Pipeline to default version"
+ "#### Add Published Pipeline to PipelineEndpoint\n",
+ "Adds a published pipeline (if it's not present) using `add()`; if you want to add it and set it as the default, use `add_default()`."
 ]
 },
 {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pipeline_endpoint_by_name.add(published_pipeline)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Add Published pipeline to PipelineEndpoint and set it to default version\n",
+ "Adds the published pipeline to the PipelineEndpoint if it is not present, and sets it as the default version."
 ]
 },
 {
@@ -391,40 +409,6 @@
 "pipeline_endpoint_by_name.set_name(name=\"NewName\")"
 ]
 },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### Add Published Pipeline to PipelineEndpoint, \n",
- "Adding published pipeline, if its not present in PipelineEndpoint."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "pipeline_endpoint_by_name.add(published_pipeline)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### Add Published pipeline to PipelineEndpoint and set it to default version\n",
- "Adding published pipeline to PipelineEndpoint if not present and set it to default"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "pipeline_endpoint_by_name.add_default(published_pipeline)"
- ]
- },
 {
 "cell_type": "markdown",
 "metadata": {},
diff --git a/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb b/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb
index fd6d587c..83041a51 100644
--- a/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb
+++ b/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb
@@ -1,5 +1,12 @@
 {
 "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.png)"
+ ]
+ },
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -9,13 +16,6 @@
 "Licensed under the MIT License."
 ]
 },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.png)"
- ]
- },
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -100,7 +100,7 @@
 "\n",
 "# Check core SDK version number\n",
 "\n",
- "print(\"This notebook was created using SDK version 1.0.57, you are currently running version\", azureml.core.VERSION)"
+ "print(\"This notebook was created using SDK version 1.0.60, you are currently running version\", azureml.core.VERSION)"
 ]
 },
@@ -447,6 +447,22 @@
 "fetched_run.get_metrics()"
 ]
 },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Call ``run.get_metrics(name=<metric name>)`` to retrieve a metric value by name. Retrieving a single metric can be faster, especially if the run contains many metrics."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fetched_run.get_metrics(name = \"scale factor\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -522,6 +538,19 @@ "name": "roastala" } ], + "category": "other", + "compute": [ + "None" + ], + "datasets": [], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "None" + ], + "friendly_name": "Logging APIs", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -537,8 +566,14 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.5" - } + "version": "3.6.8" + }, + "order_index": 1, + "star_tag": [], + "tags": [ + "None" + ], + "task": "Logging APIs and analyzing results" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/work-with-data/datasets/README.md b/how-to-use-azureml/work-with-data/datasets/README.md index 51969d15..a08adf6c 100644 --- a/how-to-use-azureml/work-with-data/datasets/README.md +++ b/how-to-use-azureml/work-with-data/datasets/README.md @@ -14,6 +14,7 @@ With Azure Machine Learning datasets, you can: * [Create and register datasets](https://aka.ms/azureml/howto/createdatasets) * Use TabularDatasets in [automated machine learning training](https://aka.ms/automl-dataset) * Use TabularDatasets in [training](https://aka.ms/tabulardataset-samplenotebook) +* Use FileDatasets in [training](https://aka.ms/filedataset-samplenotebook) * For existing Dataset users: [Dataset API change notice](dataset-api-change-notice.md) diff --git a/how-to-use-azureml/work-with-data/datasets/dataset-api-change-notice.md b/how-to-use-azureml/work-with-data/datasets/dataset-api-change-notice.md index 8b377d16..3d1b2683 100644 --- a/how-to-use-azureml/work-with-data/datasets/dataset-api-change-notice.md +++ b/how-to-use-azureml/work-with-data/datasets/dataset-api-change-notice.md @@ -4,27 +4,25 @@ The existing Dataset class only supports data in tabular format. In order to support binary data and address a wider range of machine learning scenarios including deep learning, we will introduce Dataset types. Datasets are categorized into various types based on how users consume them in training. List of Dataset types: - **TabularDataset**: Represents data in a tabular format by parsing the provided file or list of files. TabularDataset can be created from csv, tsv, parquet files, SQL query results etc. For the complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference). It provides you with the ability to materialize the data into a pandas DataFrame. -- (upcoming) **FileDataset**: References single or multiple files in your datastores or public urls. The files can be of any format. FileDataset provides you with the ability to download or mount the files to your compute. -- (upcoming) **LabeledDataset**: Represents labeled data that are produced by Azure Machine Learning Labeling service. LabaledDataset provides you with the ability to materialize the data into formats like [COCO](http://cocodataset.org/#homeo) or [TFRecord](https://www.tensorflow.org/tutorials/load_data/tf_records) on your compute. -- (upcoming) **TimeSeriesDataset**: An extension of TabularDataset that allows for specification of a time column and filtering the Dataset by time. +- **FileDataset**: References single or multiple files in your datastores or public urls. The files can be of any format. 
FileDataset provides you with the ability to download or mount the files to your compute. -In order to transit from the current Dataset design to typed Dataset, we will deprecate a series of methods on the Dataset class and launch the FileDataset and TabularDataset classes. +In order to transit from the current Dataset design to typed Dataset, we will deprecate the following methods over time. ## Which methods on Dataset class will be deprecated in upcoming releases? Methods to be deprecated|Replacement in the new version| ----|-------- -[Dataset.get()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-workspace--name-none--id-none-)|`Dataset.get_by_name()` -[Dataset.from_pandas_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-pandas-dataframe-dataframe--path-none--in-memory-false-)|Creating a Dataset from in-memory DataFrame or local files will cause errors in training on remote compute. Therefore, the new Dataset design will only support creating Datasets from paths in datastores or public web urls. If you are using pandas, you can write the DataFrame into a parquet file, upload it to the cloud, and create a TabularDataset referencing the parquet file using `Dataset.Tabular.from_parquet_files()` -[Dataset.from_delimited_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-delimited-files-path--separator------header--promoteheadersbehavior-all-files-have-same-headers--3---encoding--fileencoding-utf8--0---quoting-false--infer-column-types-true--skip-rows-0--skip-mode--skiplinesbehavior-no-rows--0---comment-none--include-path-false--archive-options-none--partition-format-none-)|`Dataset.Tabular.from_delimited_files()` -[Dataset.auto_read_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#auto-read-files-path--include-path-false--partition-format-none-)|`auto_read_files` does not always produce results that match with users' expectation. To avoid confusion, this method is not introduced with TabularDataset for now. Please use `Dataset.Tabular.from_parquet_files()` or `Dataset.Tabular.from_delimited_files()` depending on your file format. -[Dataset.from_parquet_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-parquet-files-path--include-path-false--partition-format-none-)|`Dataset.Tabular.from_parquet_files()` -[Dataset.from_sql_query()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-sql-query-data-source--query-)|`Dataset.Tabular.from_sql_query()` +[Dataset.get()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-workspace--name-none--id-none-)|[Dataset.get_by_name()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-by-name-workspace--name--version--latest--) +[Dataset.from_pandas_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-pandas-dataframe-dataframe--path-none--in-memory-false-)|Creating a Dataset from in-memory DataFrame or local files will cause errors in training on remote compute. Therefore, the new Dataset design will only support creating Datasets from paths in datastores or public web urls. 
If you are using pandas, you can write the DataFrame into a parquet file, upload it to the cloud, and create a TabularDataset referencing the parquet file using [Dataset.Tabular.from_parquet_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-parquet-files-path--validate-true--include-path-false--set-column-types-none-) +[Dataset.from_delimited_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-delimited-files-path--separator------header--promoteheadersbehavior-all-files-have-same-headers--3---encoding--fileencoding-utf8--0---quoting-false--infer-column-types-true--skip-rows-0--skip-mode--skiplinesbehavior-no-rows--0---comment-none--include-path-false--archive-options-none--partition-format-none-)|[Dataset.Tabular.from_delimited_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-delimited-files-path--validate-true--include-path-false--infer-column-types-true--set-column-types-none--separator------header--promoteheadersbehavior-all-files-have-same-headers--3--) +[Dataset.auto_read_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#auto-read-files-path--include-path-false--partition-format-none-)|`auto_read_files` does not always produce results that match with users' expectation. To avoid confusion, this method is not introduced with TabularDataset for now. Please use [Dataset.Tabular.from_parquet_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-parquet-files-path--validate-true--include-path-false--set-column-types-none-) or [Dataset.Tabular.from_delimited_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-delimited-files-path--validate-true--include-path-false--infer-column-types-true--set-column-types-none--separator------header--promoteheadersbehavior-all-files-have-same-headers--3--) depending on your file format. +[Dataset.from_parquet_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-parquet-files-path--include-path-false--partition-format-none-)|[Dataset.Tabular.from_parquet_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-parquet-files-path--validate-true--include-path-false--set-column-types-none-) +[Dataset.from_sql_query()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-sql-query-data-source--query-)|[Dataset.Tabular.from_sql_query()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-sql-query-query--validate-true--set-column-types-none-) [Dataset.from_excel_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-excel-files-path--sheet-name-none--use-column-headers-false--skip-rows-0--include-path-false--infer-column-types-true--partition-format-none-)|We will support creating a TabularDataset from Excel files in a future release. 
[Dataset.from_json_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-json-files-path--encoding--fileencoding-utf8--0---flatten-nested-arrays-false--include-path-false--partition-format-none-)| We will support creating a TabularDataset from json files in a future release. -[Dataset.to_pandas_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#to-pandas-dataframe--)|`TabularDataset.to_pandas_dataframe()` -[Dataset.to_spark_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#to-spark-dataframe--)|`TabularDataset.to_spark_dataframe()` -[Dataset.head(3)](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#head-count-)|`TabularDataset.take(3).to_pandas_dataframe()` -[Dataset.sample()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#sample-sample-strategy--arguments-)|`TabularDataset.take_sample()` +[Dataset.to_pandas_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#to-pandas-dataframe--)|[TabularDataset.to_pandas_dataframe()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py#to-pandas-dataframe--) +[Dataset.to_spark_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#to-spark-dataframe--)|[TabularDataset.to_spark_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py#to-spark-dataframe--) +[Dataset.head(3)](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#head-count-)|[TabularDataset.take(3).to_pandas_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py#take-count-) +[Dataset.sample()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#sample-sample-strategy--arguments-)|[TabularDataset.take_sample()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py#take-sample-probability--seed-none-) [Dataset.from_binary_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-binary-files-path-)|`Dataset.File.from_files()` @@ -46,8 +44,8 @@ from azureml.core.dataset import Dataset # get existing workspace workspace = Workspace.from_config() -# This method will convert old Dataset without type to a TabularDataset object automatically. -new_ds = Dataset.get_by_name(workapce, 'old_ds_name') +# This method will convert old Dataset without type to either a TabularDataset or a FileDataset object automatically. +new_ds = Dataset.get_by_name(workspace, 'old_ds_name') # register the new typed Dataset with the workspace new_ds.register(workspace, 'new_ds_name') diff --git a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/file-dataset-img-classification.ipynb b/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/file-dataset-img-classification.ipynb new file mode 100644 index 00000000..72089d4b --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/file-dataset-img-classification.ipynb @@ -0,0 +1,716 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. 
All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Train an image classification model with Azure Machine Learning\n",
+ " \n",
+ "This tutorial trains a simple logistic regression using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n",
+ "\n",
+ "Learn how to:\n",
+ "\n",
+ "> * Set up your development environment\n",
+ "> * Access and examine the data via an AzureML FileDataset\n",
+ "> * Train a simple logistic regression model on a remote cluster\n",
+ "> * Review training results, find and register the best model\n",
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "See prerequisites in the [Azure Machine Learning documentation](https://docs.microsoft.com/azure/machine-learning/service/tutorial-train-models-with-aml#prerequisites)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Set up your development environment\n",
+ "\n",
+ "All the setup for your development work can be accomplished in a Python notebook. Setup includes:\n",
+ "\n",
+ "* Importing Python packages\n",
+ "* Connecting to a workspace to enable communication between your local computer and remote resources\n",
+ "* Creating an experiment to track all your runs\n",
+ "* Creating a remote compute target to use for training\n",
+ "\n",
+ "### Import packages\n",
+ "\n",
+ "Import the Python packages you need in this session. Also display the Azure Machine Learning SDK version."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": [
+ "check version"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "%matplotlib inline\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "import azureml.core\n",
+ "from azureml.core import Workspace\n",
+ "\n",
+ "# check core SDK version number\n",
+ "print(\"Azure ML SDK Version: \", azureml.core.VERSION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Connect to workspace\n",
+ "\n",
+ "Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `workspace`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": [
+ "load workspace"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# load workspace configuration from the config.json file in the current folder.\n",
+ "workspace = Workspace.from_config()\n",
+ "print(workspace.name, workspace.location, workspace.resource_group, sep='\\t')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create experiment\n",
+ "\n",
+ "Create an experiment to track the runs in your workspace. A workspace can have multiple experiments."
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create experiment" + ] + }, + "outputs": [], + "source": [ + "experiment_name = 'sklearn-mnist'\n", + "\n", + "from azureml.core import Experiment\n", + "exp = Experiment(workspace=workspace, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create or Attach existing compute resource\n", + "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.\n", + "\n", + "**Creation of compute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace the code will skip the creation process." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create mlc", + "amlcompute" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.compute import AmlCompute\n", + "from azureml.core.compute import ComputeTarget\n", + "\n", + "# Choose a name for your cluster.\n", + "amlcompute_cluster_name = \"azureml-compute\"\n", + "\n", + "found = False\n", + "# Check if this compute target already exists in the workspace.\n", + "cts = workspace.compute_targets\n", + "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", + " found = True\n", + " print('Found existing compute target.')\n", + " compute_target = cts[amlcompute_cluster_name]\n", + "\n", + "if not found:\n", + " print('Creating a new compute target...')\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", + " #vm_priority = 'lowpriority', # optional\n", + " max_nodes = 6)\n", + "\n", + " # Create the cluster.\\n\",\n", + " compute_target = ComputeTarget.create(workspace, amlcompute_cluster_name, provisioning_config)\n", + "\n", + "print('Checking cluster status...')\n", + "# Can poll for a minimum number of nodes and for a specific timeout.\n", + "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", + "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", + "\n", + "# For a more detailed view of current AmlCompute status, use get_status()." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You now have the necessary packages and compute resources to train a model in the cloud. \n", + "\n", + "## Explore data\n", + "\n", + "Before you train a model, you need to understand the data that you are using to train it. You also need to copy the data into the cloud so it can be accessed by your cloud training environment. In this section you learn how to:\n", + "\n", + "* Download the MNIST dataset\n", + "* Display some sample images\n", + "* Upload data to the cloud\n", + "\n", + "### Download the MNIST dataset\n", + "\n", + "Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded." 
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import urllib.request\n",
+ "\n",
+ "data_folder = os.path.join(os.getcwd(), 'data')\n",
+ "os.makedirs(data_folder, exist_ok=True)\n",
+ "\n",
+ "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images.gz'))\n",
+ "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels.gz'))\n",
+ "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))\n",
+ "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Display some sample images\n",
+ "\n",
+ "Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. Note this step requires a `load_data` function that's included in a `utils.py` file. This file is included in the sample folder. Please make sure it is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# make sure utils.py is in the same directory as this code\n",
+ "from utils import load_data\n",
+ "\n",
+ "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n",
+ "X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0\n",
+ "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n",
+ "y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n",
+ "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n",
+ "\n",
+ "# now let's show some randomly chosen images from the training set.\n",
+ "count = 0\n",
+ "sample_size = 30\n",
+ "plt.figure(figsize = (16, 6))\n",
+ "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n",
+ "    count = count + 1\n",
+ "    plt.subplot(1, sample_size, count)\n",
+ "    plt.axhline('')\n",
+ "    plt.axvline('')\n",
+ "    plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n",
+ "    plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now you have an idea of what these images look like and the expected prediction outcome.\n",
+ "\n",
+ "### Upload data to the cloud\n",
+ "\n",
+ "Now make the data accessible remotely by uploading it from your local machine into Azure, so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload/download data and interact with it from your remote compute targets. It is backed by an Azure Blob storage account.\n",
+ "\n",
+ "The MNIST files are uploaded into a directory named `mnist` at the root of the datastore. See [access data from your datastores](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) for more information."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "tags": [
+ "use datastore"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "datastore = workspace.get_default_datastore()\n",
+ "print(datastore.datastore_type, datastore.account_name, datastore.container_name)\n",
+ "\n",
+ "datastore.upload(src_dir=data_folder, target_path='mnist', overwrite=True, show_progress=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Create a FileDataset\n",
+ "A FileDataset references single or multiple files in your datastores or public URLs. The files can be of any format. FileDataset provides you with the ability to download or mount the files to your compute. By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred. [Learn More](https://aka.ms/azureml/howto/createdatasets)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.core.dataset import Dataset\n",
+ "\n",
+ "datastore = workspace.get_default_datastore()\n",
+ "dataset = Dataset.File.from_files(path = [(datastore, 'mnist/')])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Use the `register()` method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "dataset = dataset.register(workspace = workspace,\n",
+ "                           name = 'mnist dataset',\n",
+ "                           description='training and test dataset',\n",
+ "                           create_new_version=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Train on a remote cluster\n",
+ "\n",
+ "For this task, submit the job to the remote training cluster you set up earlier. To submit a job you:\n",
+ "* Create a directory\n",
+ "* Create a training script\n",
+ "* Create an estimator object\n",
+ "* Submit the job \n",
+ "\n",
+ "### Create a directory\n",
+ "\n",
+ "Create a directory to deliver the necessary code from your computer to the remote resource."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "script_folder = os.path.join(os.getcwd(), \"sklearn-mnist\")\n",
+ "os.makedirs(script_folder, exist_ok=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create a training script\n",
+ "\n",
+ "To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in the directory you just created."
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile $script_folder/train.py\n", + "\n", + "import argparse\n", + "import os\n", + "import numpy as np\n", + "\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.externals import joblib\n", + "\n", + "from azureml.core import Run, Dataset\n", + "from utils import load_data\n", + "from uuid import uuid4\n", + "\n", + "# let user feed in the regularization rate of the logistic regression model as an argument\n", + "parser = argparse.ArgumentParser()\n", + "parser.add_argument('--dataset-name', dest='ds_name', help='the name of dataset')\n", + "parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n", + "args = parser.parse_args()\n", + "\n", + "# get hold of the current run\n", + "run = Run.get_context()\n", + "\n", + "workspace = run.experiment.workspace\n", + "dataset_name = args.ds_name\n", + "dataset = Dataset.get_by_name(workspace=workspace, name=dataset_name)\n", + "\n", + "# create a folder on the compute that we will mount the dataset to\n", + "data_folder = '/tmp/mnist/{}'.format(uuid4())\n", + "os.makedirs(data_folder)\n", + "\n", + "with dataset.mount(data_folder):\n", + " import glob\n", + " X_train_path = glob.glob(os.path.join(data_folder, '**/train-images.gz'), recursive=True)[0]\n", + " X_test_path = glob.glob(os.path.join(data_folder, '**/test-images.gz'), recursive=True)[0]\n", + " y_train_path = glob.glob(os.path.join(data_folder, '**/train-labels.gz'), recursive=True)[0]\n", + " y_test_path = glob.glob(os.path.join(data_folder, '**/test-labels.gz'), recursive=True)[0]\n", + " # load train and test set into numpy arrays\n", + " # note we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.\n", + " X_train = load_data(X_train_path, False) / 255.0\n", + " X_test = load_data(X_test_path, False) / 255.0\n", + " y_train = load_data(y_train_path, True).reshape(-1)\n", + " y_test = load_data(y_test_path, True).reshape(-1)\n", + " print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n", + "\n", + " print('Train a logistic regression model with regularization rate of', args.reg)\n", + " clf = LogisticRegression(C=1.0/args.reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42)\n", + " clf.fit(X_train, y_train)\n", + "\n", + " print('Predict the test set')\n", + " y_hat = clf.predict(X_test)\n", + "\n", + " # calculate accuracy on the prediction\n", + " acc = np.average(y_hat == y_test)\n", + " print('Accuracy is', acc)\n", + "\n", + " run.log('regularization rate', np.float(args.reg))\n", + " run.log('accuracy', np.float(acc))\n", + "\n", + " os.makedirs('outputs', exist_ok=True)\n", + " # note file saved in the outputs folder is automatically uploaded into experiment record\n", + " joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice how the script gets data and saves models:\n", + "\n", + "+ The training script gets the mnist dataset registered with the workspace through the Run object, then uses the FileDataset to download file streams defined by it to a target path (data_folder) on the compute." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "+ The training script saves your model into a directory named outputs.
\n", + "`joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')`
\n", + "Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The file `utils.py` is referenced from the training script to load the dataset correctly. Copy this script into the script folder so that it can be accessed along with the training script on the remote resource." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "shutil.copy('utils.py', script_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an estimator\n", + "\n", + "An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as generic Estimator. Create SKLearn estimator for scikit-learn model, by specifying\n", + "\n", + "* The name of the estimator object, `est`\n", + "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", + "* The compute target. In this case you will use the AmlCompute you created\n", + "* The training script name, train.py\n", + "* Parameters required from the training script \n", + "\n", + "In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.environment import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "env = Environment('my_env')\n", + "cd = CondaDependencies.create(pip_packages=['azureml-sdk<0.1.1', 'pandas','scikit-learn','azureml-dataprep[pandas,fuse]~=1.1.13rc'])\n", + "\n", + "env.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "configure estimator" + ] + }, + "outputs": [], + "source": [ + "from azureml.train.sklearn import SKLearn\n", + "\n", + "script_params = {\n", + " '--dataset-name': 'mnist dataset',\n", + " '--regularization': 0.5\n", + "}\n", + "\n", + "est = SKLearn(source_directory=script_folder,\n", + " script_params=script_params,\n", + " compute_target=compute_target,\n", + " environment_definition = env,\n", + " entry_script='train.py')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit the job to the cluster\n", + "\n", + "Run the experiment by submitting the estimator object. And you can navigate to Azure portal to monitor the run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "remote run", + "amlcompute", + "scikit-learn" + ] + }, + "outputs": [], + "source": [ + "run = exp.submit(config=est)\n", + "run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since the call is asynchronous, it returns a **Preparing** or **Running** state as soon as the job is started.\n", + "\n", + "## Monitor a remote run\n", + "\n", + "In total, the first run takes **approximately 10 minutes**.\n", + "\n", + "Here is what's happening while you wait:\n", + "\n", + "- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. The image is built and stored in the ACR (Azure Container Registry) associated with your workspace. 
Image creation and uploading takes **about 5 minutes**. \n", + "\n", + " This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.\n", + "\n", + "- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\n", + "\n", + "- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the files in the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n", + "\n", + "- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n", + "\n", + "\n", + "You can check the progress of a running job in multiple ways. This tutorial uses a Jupyter widget as well as a `wait_for_completion` method. \n", + "\n", + "### Jupyter widget\n", + "\n", + "Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "use notebook widget" + ] + }, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By the way, if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get log results upon completion\n", + "\n", + "Model training happens in the background. You can use `wait_for_completion` to block and wait until the model has completed training before running more code. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "remote run", + "amlcompute", + "scikit-learn" + ] + }, + "outputs": [], + "source": [ + "# specify show_output to True for a verbose log\n", + "run.wait_for_completion(show_output=True) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Display run results\n", + "\n", + "You now have a model trained on a remote cluster. Retrieve all the metrics logged during the run, including the accuracy of the model:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "get metrics" + ] + }, + "outputs": [], + "source": [ + "print(run.get_metrics())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register model\n", + "\n", + "The last step in the training script wrote the file `outputs/sklearn_mnist_model.pkl` in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.\n", + "\n", + "You can see files associated with that run." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "query history" + ] + }, + "outputs": [], + "source": [ + "print(run.get_file_names())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from history" + ] + }, + "outputs": [], + "source": [ + "# register model \n", + "model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')\n", + "print(model.name, model.id, model.version, sep='\\t')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/datasets/datasets-tutorial/filedatasets-tutorial.png)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + }, + "msauthor": "sihhu" + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/utils.py b/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/utils.py new file mode 100644 index 00000000..98170ada --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/utils.py @@ -0,0 +1,27 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. + +import gzip +import numpy as np +import struct + + +# load compressed MNIST gz files and return numpy arrays +def load_data(filename, label=False): + with gzip.open(filename) as gz: + struct.unpack('I', gz.read(4)) + n_items = struct.unpack('>I', gz.read(4)) + if not label: + n_rows = struct.unpack('>I', gz.read(4))[0] + n_cols = struct.unpack('>I', gz.read(4))[0] + res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8) + res = res.reshape(n_items[0], n_rows * n_cols) + else: + res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8) + res = res.reshape(n_items[0], 1) + return res + + +# one-hot encode a 1-D array +def one_hot_encode(array, num_of_classes): + return np.eye(num_of_classes)[array.reshape(-1)] diff --git a/index.md b/index.md new file mode 100644 index 00000000..e791b0d0 --- /dev/null +++ b/index.md @@ -0,0 +1,499 @@ + +# Index +Azure Machine Learning is a cloud service that you use to train, deploy, automate, +and manage machine learning models. This index should assist in navigating the Azure +Machine Learning notebook samples and encourage efficient retrieval of topics and content. 
+![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/Index.png)
+
+## Getting Started
+
+|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags |
+|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:|
+
+## Tutorials
+
+|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags |
+|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:|
+
+## Training
+
+|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags |
+|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:|
+
+## Deployment
+
+|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags |
+|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:|
+
+## Other Notebooks
+
+|Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags |
+|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:|
+| [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) | | | | | | |
+| [azure-ml-with-nvidia-rapids](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb) | | | | | | |
+| [auto-ml-classification](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb) | | | | | | |
+| [auto-ml-classification-bank-marketing](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-bank-marketing/auto-ml-classification-bank-marketing.ipynb) | | | | | | |
+| [auto-ml-classification-credit-card-fraud](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb) | | | | | | |
+| [auto-ml-classification-with-deployment](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb) | | | | | | |
+| [auto-ml-classification-with-onnx](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-onnx/auto-ml-classification-with-onnx.ipynb) | | | | | | |
+| [auto-ml-classification-with-whitelisting](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-whitelisting/auto-ml-classification-with-whitelisting.ipynb) | | | | | | |
+| [auto-ml-dataset](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.ipynb) | | | | | | |
+| [auto-ml-dataset-remote-execution](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/dataset-remote-execution/auto-ml-dataset-remote-execution.ipynb) | | | | | | |
+| [auto-ml-exploring-previous-runs](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/exploring-previous-runs/auto-ml-exploring-previous-runs.ipynb) | | | | | | |
+| [auto-ml-forecasting-bike-share](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb) | | | | | | |
+| [auto-ml-forecasting-energy-demand](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb) | | | | | | |
+| [auto-ml-forecasting-orange-juice-sales](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb) | | | | | | |
+| [auto-ml-missing-data-blacklist-early-termination](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.ipynb) | | | | | | |
+| [auto-ml-model-explanation](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb) | | | | | | |
+| [auto-ml-regression](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb) | | | | | | |
+| [auto-ml-regression-concrete-strength](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.ipynb) | | | | | | |
+| [auto-ml-regression-hardware-performance](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.ipynb) | | | | | | |
+| [auto-ml-remote-amlcompute](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb) | | | | | | |
+| [auto-ml-remote-amlcompute-with-onnx](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/remote-amlcompute-with-onnx/auto-ml-remote-amlcompute-with-onnx.ipynb) | | | | | | |
+| [auto-ml-sample-weight](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/sample-weight/auto-ml-sample-weight.ipynb) | | | | | | |
+| [auto-ml-sparse-data-train-test-split](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/sparse-data-train-test-split/auto-ml-sparse-data-train-test-split.ipynb) | | | | | | |
+| [auto-ml-sql-energy-demand](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/sql-server/energy-demand/auto-ml-sql-energy-demand.ipynb) | | | | | | |
+| [auto-ml-sql-setup](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/sql-server/setup/auto-ml-sql-setup.ipynb) | | | | | | |
+| [auto-ml-subsampling-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/subsampling/auto-ml-subsampling-local.ipynb) | | | | | | |
+| [build-model-run-history-03](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-databricks/amlsdk/build-model-run-history-03.ipynb) | | | | | | |
+| [deploy-to-aci-04](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb) | | | | | | |
+| [deploy-to-aks-existingimage-05](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.ipynb) | | | | | | |
+| [ingest-data-02](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-databricks/amlsdk/ingest-data-02.ipynb) | | | | | | |
+| [installation-and-configuration-01](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-databricks/amlsdk/installation-and-configuration-01.ipynb) | | | | | | |
+| [automl-databricks-local-01](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.ipynb) | | | | | | |
+| [automl-databricks-local-with-deployment](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb) | | | | | | |
+| [aml-pipelines-use-databricks-as-compute-target](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb) | | | | | | |
+| [automl_hdi_local_classification](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/azure-hdi/automl_hdi_local_classification.ipynb) | | | | | | |
+| [model-register-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deploy-to-cloud/model-register-and-deploy.ipynb) | | | | | | |
+| [register-model-deploy-local-advanced](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deploy-to-local/register-model-deploy-local-advanced.ipynb) | | | | | | |
+| [register-model-deploy-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deploy-to-local/register-model-deploy-local.ipynb) | | | | | | |
+| [accelerated-models-object-detection](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/accelerated-models/accelerated-models-object-detection.ipynb) | | | | | | |
+| [accelerated-models-quickstart](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/accelerated-models/accelerated-models-quickstart.ipynb) | | | | | | |
+| [accelerated-models-training](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/accelerated-models/accelerated-models-training.ipynb) | | | | | | |
+| [model-register-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb) | | | | | | |
+| [register-model-deploy-local-advanced](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb) | | | | | | |
+| [register-model-deploy-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb) | | | | | | |
+| [enable-app-insights-in-production-service](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) | | | | | | |
+| [enable-data-collection-for-models-in-aks](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb) | | | | | | |
+| [onnx-convert-aml-deploy-tinyyolo](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb) | | | | | | |
+| [onnx-inference-facial-expression-recognition-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb) | | | | | | |
+| [onnx-inference-mnist-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb) | | | | | | |
+| [onnx-modelzoo-aml-deploy-resnet50](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb) | | | | | | |
+| [onnx-train-pytorch-aml-deploy-mnist](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb) | | | | | | |
+| [production-deploy-to-aks](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) | | | | | | |
+| [production-deploy-to-aks-gpu](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb) | | | | | | |
+| [register-model-create-image-deploy-service](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb) | | | | | | |
+| [explain-model-on-amlcompute](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb) | | | | | | |
+| [save-retrieve-explanations-run-history](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb) | | | | | | |
+| [train-explain-model-locally-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb) | | | | | | |
+| [train-explain-model-on-amlcompute-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb) | | | | | | |
+| [advanced-feature-transformations-explain-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/tabular-data/advanced-feature-transformations-explain-local.ipynb) | | | | | | |
+| [explain-binary-classification-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/tabular-data/explain-binary-classification-local.ipynb) | | | | | | |
+| [explain-multiclass-classification-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/tabular-data/explain-multiclass-classification-local.ipynb) | | | | | | |
+| [explain-regression-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/tabular-data/explain-regression-local.ipynb) | | | | | | |
+| [simple-feature-transformations-explain-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/explain-model/tabular-data/simple-feature-transformations-explain-local.ipynb) | | | | | | |
+| [aml-pipelines-data-transfer](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb) | | | | | | |
+| [aml-pipelines-getting-started](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb) | | | | | | |
+| [aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb) | | | | | | |
+| [aml-pipelines-how-to-use-estimatorstep](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb) | | | | | | |
+| [aml-pipelines-how-to-use-pipeline-drafts](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb) | | | | | | |
+| [aml-pipelines-parameter-tuning-with-hyperdrive](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb) | | | | | | |
+| [aml-pipelines-publish-and-run-using-rest-endpoint](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb) | | | | | | |
+| [aml-pipelines-setup-schedule-for-a-published-pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb) | | | | | | |
+| [aml-pipelines-setup-versioned-pipeline-endpoints](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb) | | | | | | |
+| [aml-pipelines-use-adla-as-compute-target](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-adla-as-compute-target.ipynb) | | | | | | |
+| [aml-pipelines-use-databricks-as-compute-target](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb) | | | | | | |
+| [aml-pipelines-with-automated-machine-learning-step](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb) | | | | | | |
+| [aml-pipelines-with-data-dependency-steps](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb) | | | | | | |
+| [nyc-taxi-data-regression-model-building](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb) | | | | | | |
+| [pipeline-batch-scoring](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb) | | | | | | |
+| [pipeline-style-transfer](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb) | | | | | | |
+| [authentication-in-azureml](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/manage-azureml-service/authentication-in-azureml/authentication-in-azureml.ipynb) | | | | | | |
+| [azure-ml-datadrift](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb) | | | | | | |
+| [Logging APIs](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb) | Logging APIs and analyzing results | | None | None | None | None |
+| [manage-runs](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/manage-runs/manage-runs.ipynb) | | | | | | |
+| [tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb) | | | | | | |
+| [deploy-model](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb) | | | | | | |
+| [train-and-deploy-pytorch](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb) | | | | | | |
+| [train-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-local/train-local.ipynb) | | | | | | |
+| [train-remote](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb) | | | | | | |
+| [logging-api](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/logging-api/logging-api.ipynb) | | | | | | |
+| [manage-runs](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/manage-runs/manage-runs.ipynb) | | | | | | |
+| [train-hyperparameter-tune-deploy-with-sklearn](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb) | | | | | | |
+| [train-in-spark](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb) | | | | | | |
+| [train-on-amlcompute](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb) | | | | | | |
+| [train-on-local](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-on-local/train-on-local.ipynb) | | | | | | |
+| [train-on-remote-vm](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) | | | | | | |
+| [train-within-notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb) | | | | | | |
+| [using-environments](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/using-environments/using-environments.ipynb) | | | | | | |
+| [distributed-chainer](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb) | | | | | | |
+| [distributed-cntk-with-custom-docker](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb) | | | | | | |
+| [distributed-pytorch-with-horovod](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb) | | | | | | |
+| [distributed-tensorflow-with-horovod](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb) | | | | | | |
+| [distributed-tensorflow-with-parameter-server](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb) | | | | | | |
+| [export-run-history-to-tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb) | | | | | | |
+| [how-to-use-estimator](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb) | | | | | | |
+| [notebook_example](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/notebook_example.ipynb) | | | | | | |
+| [tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb) | | | | | | |
+| [train-hyperparameter-tune-deploy-with-chainer](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb) | | | | | | |
+| [train-hyperparameter-tune-deploy-with-keras](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb) | | | | | | |
+| [train-hyperparameter-tune-deploy-with-pytorch](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) | | | | | | |
+| [train-hyperparameter-tune-deploy-with-tensorflow](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) | | | | | | |
+| [train-tensorflow-resume-training](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb) | | | | | | |
+| [new-york-taxi](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/case-studies/new-york-taxi/new-york-taxi.ipynb) | | | | | | |
+| [new-york-taxi_scale-out](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/case-studies/new-york-taxi/new-york-taxi_scale-out.ipynb) | | | | | | |
+| [add-column-using-expression](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb) | | | | | | |
+| [append-columns-and-rows](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/append-columns-and-rows.ipynb) | | | | | | |
+| [assertions](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/assertions.ipynb) | | | | | | |
+| [auto-read-file](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/auto-read-file.ipynb) | | | | | | |
+| [cache](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/cache.ipynb) | | | | | | |
+| [column-manipulations](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/column-manipulations.ipynb) | | | | | | |
+| [column-type-transforms](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/column-type-transforms.ipynb) | | | | | | |
+| [custom-python-transforms](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/custom-python-transforms.ipynb) | | | | | | |
+| [data-ingestion](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/data-ingestion.ipynb) | | | | | | |
+| [data-profile](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/data-profile.ipynb) | | | | | | |
+| [datastore](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/datastore.ipynb) | | | | | | |
+| [derive-column-by-example](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/derive-column-by-example.ipynb) | | | | | | |
+| [external-references](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/external-references.ipynb) | | | | | | |
+| [filtering](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/filtering.ipynb) | | | | | | |
+| [fuzzy-group](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/fuzzy-group.ipynb) | | | | | | |
+| [impute-missing-values](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/impute-missing-values.ipynb) | | | | | | |
+| [join](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/join.ipynb) | | | | | | |
+| [label-encoder](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/label-encoder.ipynb) | | | | | | |
+| [min-max-scaler](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/min-max-scaler.ipynb) | | | | | | |
+| [one-hot-encoder](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/one-hot-encoder.ipynb) | | | | | | |
+| [open-save-dataflows](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb) | | | | | | |
+| [quantile-transformation](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/quantile-transformation.ipynb) | | | | | | |
+| [random-split](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/random-split.ipynb) | | | | | | |
+| [replace-datasource-replace-reference](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/replace-datasource-replace-reference.ipynb) | | | | | | |
+| [replace-fill-error](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/replace-fill-error.ipynb) | | | | | | |
+| [secrets](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/secrets.ipynb) | | | | | | |
+| [semantic-types](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/semantic-types.ipynb) | | | | | | |
+| [split-column-by-example](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/split-column-by-example.ipynb) | | | | | | |
+| [subsetting-sampling](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/subsetting-sampling.ipynb) | | | | | | |
+| [summarize](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/summarize.ipynb) | | | | | | |
+| [working-with-file-streams](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/working-with-file-streams.ipynb) | | | | | | |
+| [writing-data](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/how-to-guides/writing-data.ipynb) | | | | | | |
+| [getting-started](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/dataprep/tutorials/getting-started/getting-started.ipynb) | | | | | | |
+| [datasets-diff](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets/datasets-diff/datasets-diff.ipynb) | | | | | | |
+| [file-dataset-img-classification](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/file-dataset-img-classification.ipynb) | | | | | | |
+| [tabular-dataset-tutorial](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/tabular-dataset-tutorial.ipynb) | | | | | | |
+| [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/setup-environment/configuration.ipynb) | | | | | | |
+| [img-classification-part1-training](https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/img-classification-part1-training.ipynb) | | | | | | |
+| [img-classification-part2-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/img-classification-part2-deploy.ipynb) | | | | | | |
+| [regression-automated-ml](https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/regression-automated-ml.ipynb) | | | | | | |
+| [tutorial-1st-experiment-sdk-train](https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/tutorial-1st-experiment-sdk-train.ipynb) | | | | | | |
diff --git a/setup-environment/configuration.ipynb b/setup-environment/configuration.ipynb
index 1ae793bc..f7c376bb 100644
--- a/setup-environment/configuration.ipynb
+++ b/setup-environment/configuration.ipynb
@@ -102,7 +102,7 @@
 "source": [
 "import azureml.core\n",
 "\n",
- "print(\"This notebook was created using version 1.0.57 of the Azure ML SDK\")\n",
+ "print(\"This notebook was created using version 1.0.60 of the Azure ML SDK\")\n",
 "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
 ]
 },
diff --git a/tutorials/img-classification-part1-training.ipynb b/tutorials/img-classification-part1-training.ipynb
index 9205cf89..c783128b 100644
--- a/tutorials/img-classification-part1-training.ipynb
+++ b/tutorials/img-classification-part1-training.ipynb
@@ -93,7 +93,7 @@
 "source": [
 "# load workspace configuration from the config.json file in the current folder.\n",
 "ws = Workspace.from_config()\n",
- "print(ws.name, ws.location, ws.resource_group, ws.location, sep='\\t')"
+ "print(ws.name, ws.location, ws.resource_group, sep='\\t')"
 ]
 },
 {
@@ -125,10 +125,10 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "### Create or attach existing compute target\n",
+ "### Create or attach existing compute resource\n",
 "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.\n",
 "\n",
- "**Creation of compute target takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace the code will skip the creation process."
+ "**Creation of compute takes approximately 5 minutes.** If an AmlCompute cluster with that name already exists in your workspace, the code will skip the creation process."
 ]
 },
 {
@@ -258,9 +258,9 @@
 "\n",
 "### Upload data to the cloud\n",
 "\n",
- "You downloaded and used the training data on the computer your notebook is running on. In the next section, you will train a model on the remote Azure Machine Learning Compute. The remote compute resource will also need access to your data. To provide access, upload your data to a centralized datastore associated with your workspace. This datastore provides fast access to your data when using remote compute targets in the cloud, as it is in the Azure data center.\n",
+ "Now make the data accessible remotely by uploading it from your local machine into Azure, so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace that lets you upload and download data and interact with it from your remote compute targets. It is backed by an Azure Blob storage account.\n",
 "\n",
- "Upload the MNIST files into a directory named `mnist` at the root of the datastore: See [access data from your datastores](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) for more information."
+ "The MNIST files are uploaded into a directory named `mnist` at the root of the datastore. See [access data from your datastores](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) for more information."
 ]
 },
 {
@@ -690,4 +690,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}
\ No newline at end of file
diff --git a/tutorials/img-classification-part2-deploy.ipynb b/tutorials/img-classification-part2-deploy.ipynb
index b52dba82..c01b3149 100644
--- a/tutorials/img-classification-part2-deploy.ipynb
+++ b/tutorials/img-classification-part2-deploy.ipynb
@@ -1,625 +1,625 @@
 {
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Copyright (c) Microsoft Corporation. All rights reserved.\n",
- "\n",
- "Licensed under the MIT License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Tutorial #2: Deploy an image classification model in Azure Container Instance (ACI)\n",
- "\n",
- "This tutorial is **part two of a two-part tutorial series**. In the [previous tutorial](img-classification-part1-training.ipynb), you trained machine learning models and then registered a model in your workspace on the cloud. \n",
- "\n",
- "Now, you're ready to deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/azure/container-instances/) (ACI). A web service is an image, in this case a Docker image, that encapsulates the scoring logic and the model itself. \n",
- "\n",
- "In this part of the tutorial, you use Azure Machine Learning service (Preview) to:\n",
- "\n",
- "> * Set up your testing environment\n",
- "> * Retrieve the model from your workspace\n",
- "> * Test the model locally\n",
- "> * Deploy the model to ACI\n",
- "> * Test the deployed model\n",
- "\n",
- "ACI is a great solution for testing and understanding the workflow. For scalable production deployments, consider using Azure Kubernetes Service. For more information, see [how to deploy and where](https://docs.microsoft.com/azure/machine-learning/service/how-to-deploy-and-where).\n",
- "\n",
- "\n",
- "## Prerequisites\n",
- "\n",
- "Complete the model training in the [Tutorial #1: Train an image classification model with Azure Machine Learning](train-models.ipynb) notebook. 
\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, - "outputs": [], - "source": [ - "# If you did NOT complete the tutorial, you can instead run this cell \n", - "# This will register a model and download the data needed for this tutorial\n", - "# These prerequisites are created in the training tutorial\n", - "# Feel free to skip this cell if you completed the training tutorial \n", - "\n", - "# register a model\n", - "from azureml.core import Workspace\n", - "ws = Workspace.from_config()\n", - "\n", - "from azureml.core.model import Model\n", - "\n", - "model_name = \"sklearn_mnist\"\n", - "model = Model.register(model_path=\"sklearn_mnist_model.pkl\",\n", - " model_name=model_name,\n", - " tags={\"data\": \"mnist\", \"model\": \"classification\"},\n", - " description=\"Mnist handwriting recognition\",\n", - " workspace=ws)\n", - "\n", - "# download test data\n", - "import os\n", - "import urllib.request\n", - "\n", - "data_folder = os.path.join(os.getcwd(), 'data')\n", - "os.makedirs(data_folder, exist_ok = True)\n", - "\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set up the environment\n", - "\n", - "Start by setting up a testing environment.\n", - "\n", - "### Import packages\n", - "\n", - "Import the Python packages needed for this tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "check version" - ] - }, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - " \n", - "import azureml.core\n", - "\n", - "# display the core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the model\n", - "\n", - "You registered a model in your workspace in the previous tutorial. Now, load this workspace and download the model to your local directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "load workspace", - "download model" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "from azureml.core.model import Model\n", - "import os \n", - "ws = Workspace.from_config()\n", - "model=Model(ws, 'sklearn_mnist')\n", - "\n", - "model.download(target_dir=os.getcwd(), exist_ok=True)\n", - "\n", - "# verify the downloaded model file\n", - "file_path = os.path.join(os.getcwd(), \"sklearn_mnist_model.pkl\")\n", - "\n", - "os.stat(file_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test model locally\n", - "\n", - "Before deploying, make sure your model is working locally by:\n", - "* Loading test data\n", - "* Predicting test data\n", - "* Examining the confusion matrix\n", - "\n", - "### Load test data\n", - "\n", - "Load the test data from the **./data/** directory created during the training tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from utils import load_data\n", - "import os\n", - "\n", - "data_folder = os.path.join(os.getcwd(), 'data')\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster\n", - "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n", - "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Predict test data\n", - "\n", - "Feed the test dataset to the model to get predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pickle\n", - "from sklearn.externals import joblib\n", - "\n", - "clf = joblib.load( os.path.join(os.getcwd(), 'sklearn_mnist_model.pkl'))\n", - "y_hat = clf.predict(X_test)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Examine the confusion matrix\n", - "\n", - "Generate a confusion matrix to see how many samples from the test set are classified correctly. Notice the mis-classified value for the incorrect predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.metrics import confusion_matrix\n", - "\n", - "conf_mx = confusion_matrix(y_test, y_hat)\n", - "print(conf_mx)\n", - "print('Overall accuracy:', np.average(y_hat == y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use `matplotlib` to display the confusion matrix as a graph. In this graph, the X axis represents the actual values, and the Y axis represents the predicted values. The color in each grid represents the error rate. The lighter the color, the higher the error rate is. For example, many 5's are mis-classified as 3's. Hence you see a bright grid at (5,3)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized\n", - "row_sums = conf_mx.sum(axis=1, keepdims=True)\n", - "norm_conf_mx = conf_mx / row_sums\n", - "np.fill_diagonal(norm_conf_mx, 0)\n", - "\n", - "fig = plt.figure(figsize=(8,5))\n", - "ax = fig.add_subplot(111)\n", - "cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n", - "ticks = np.arange(0, 10, 1)\n", - "ax.set_xticks(ticks)\n", - "ax.set_yticks(ticks)\n", - "ax.set_xticklabels(ticks)\n", - "ax.set_yticklabels(ticks)\n", - "fig.colorbar(cax)\n", - "plt.ylabel('true labels', fontsize=14)\n", - "plt.xlabel('predicted values', fontsize=14)\n", - "plt.savefig('conf.png')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy as web service\n", - "\n", - "Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI. 
\n", - "\n", - "To build the correct environment for ACI, provide the following:\n", - "* A scoring script to show how to use the model\n", - "* An environment file to show what packages need to be installed\n", - "* A configuration file to build the ACI\n", - "* The model you trained before\n", - "\n", - "### Create scoring script\n", - "\n", - "Create the scoring script, called score.py, used by the web service call to show how to use the model.\n", - "\n", - "You must include two required functions into the scoring script:\n", - "* The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started. \n", - "\n", - "* The `run(input_data)` function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import os\n", - "import pickle\n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global model\n", - " # retrieve the path to the model file using the model name\n", - " model_path = Model.get_model_path('sklearn_mnist')\n", - " model = joblib.load(model_path)\n", - "\n", - "def run(raw_data):\n", - " data = np.array(json.loads(raw_data)['data'])\n", - " # make prediction\n", - " y_hat = model.predict(data)\n", - " # you can return any data type as long as it is JSON-serializable\n", - " return y_hat.tolist()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create environment file\n", - "\n", - "Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `scikit-learn` and `azureml-sdk`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "set conda dependencies" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies()\n", - "myenv.add_conda_package(\"scikit-learn\")\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Review the content of the `myenv.yml` file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open(\"myenv.yml\",\"r\") as f:\n", - " print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create configuration file\n", - "\n", - "Create a deployment configuration file and specify the number of CPUs and gigabyte of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you feel you need more later, you would have to recreate the image and redeploy the service." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "configure web service", - "aci" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={\"data\": \"MNIST\", \"method\" : \"sklearn\"}, \n", - " description='Predict MNIST with sklearn')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy in ACI\n", - "Estimated time to complete: **about 7-8 minutes**\n", - "\n", - "Configure the image and deploy. The following code goes through these steps:\n", - "\n", - "1. Build an image using:\n", - " * The scoring file (`score.py`)\n", - " * The environment file (`myenv.yml`)\n", - " * The model file\n", - "1. Register that image under the workspace. \n", - "1. Send the image to the ACI container.\n", - "1. Start up a container in ACI using the image.\n", - "1. Get the web service HTTP endpoint." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "configure image", - "create image", - "deploy web service", - "aci" - ] - }, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "from azureml.core.model import InferenceConfig\n", - "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\")\n", - "\n", - "service = Model.deploy(workspace=ws, \n", - " name='sklearn-mnist-svc', \n", - " models=[model], \n", - " inference_config=inference_config, \n", - " deployment_config=aciconfig)\n", - "\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "get scoring uri" - ] - }, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test deployed service\n", - "\n", - "Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model with a random sample of 30 images from the test data. \n", - "\n", - "The following code goes through these steps:\n", - "1. Send the data as a JSON array to the web service hosted in ACI. \n", - "\n", - "1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.\n", - "\n", - "1. Print the returned predictions and plot them along with the input images. Red font and inverse image (white on black) is used to highlight the misclassified samples. \n", - "\n", - " Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "score web service" - ] - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "# find 30 random samples from test set\n", - "n = 30\n", - "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", - "\n", - "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", - "test_samples = bytes(test_samples, encoding='utf8')\n", - "\n", - "# predict using the deployed model\n", - "result = service.run(input_data=test_samples)\n", - "\n", - "# compare actual value vs. the predicted values:\n", - "i = 0\n", - "plt.figure(figsize = (20, 1))\n", - "\n", - "for s in sample_indices:\n", - " plt.subplot(1, n, i + 1)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " \n", - " # use different color for misclassified sample\n", - " font_color = 'red' if y_test[s] != result[i] else 'black'\n", - " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", - " \n", - " plt.text(x=10, y =-10, s=result[i], fontsize=18, color=font_color)\n", - " plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n", - " \n", - " i = i + 1\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also send raw HTTP request to test the web service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "score web service" - ] - }, - "outputs": [], - "source": [ - "import requests\n", - "\n", - "# send a random row from the test set to score\n", - "random_index = np.random.randint(0, len(X_test)-1)\n", - "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n", - "\n", - "headers = {'Content-Type':'application/json'}\n", - "\n", - "# for AKS deployment you'd need to the service key in the header as well\n", - "# api_key = service.get_key()\n", - "# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)} \n", - "\n", - "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", - "\n", - "print(\"POST to url\", service.scoring_uri)\n", - "#print(\"input data:\", input_data)\n", - "print(\"label:\", y_test[random_index])\n", - "print(\"prediction:\", resp.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up resources\n", - "\n", - "To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "delete web service" - ] - }, - "outputs": [], - "source": [ - "service.delete()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "If you're not going to use what you've created here, delete the resources you just created with this quickstart so you don't incur any charges. In the Azure portal, select and delete your resource group. You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n", - "\n", - "\n", - "## Next steps\n", - "\n", - "In this Azure Machine Learning tutorial, you used Python to:\n", - "\n", - "> * Set up your testing environment\n", - "> * Retrieve the model from your workspace\n", - "> * Test the model locally\n", - "> * Deploy the model to ACI\n", - "> * Test the deployed model\n", - " \n", - "You can also try out the [regression tutorial](regression-part1-data-prep.ipynb)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/img-classification-part2-deploy.png)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Tutorial #2: Deploy an image classification model in Azure Container Instance (ACI)\n", + "\n", + "This tutorial is **part two of a two-part tutorial series**. In the [previous tutorial](img-classification-part1-training.ipynb), you trained machine learning models and then registered a model in your workspace on the cloud. \n", + "\n", + "Now, you're ready to deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/azure/container-instances/) (ACI). A web service is an image, in this case a Docker image, that encapsulates the scoring logic and the model itself. \n", + "\n", + "In this part of the tutorial, you use Azure Machine Learning service (Preview) to:\n", + "\n", + "> * Set up your testing environment\n", + "> * Retrieve the model from your workspace\n", + "> * Test the model locally\n", + "> * Deploy the model to ACI\n", + "> * Test the deployed model\n", + "\n", + "ACI is a great solution for testing and understanding the workflow. For scalable production deployments, consider using Azure Kubernetes Service. For more information, see [how to deploy and where](https://docs.microsoft.com/azure/machine-learning/service/how-to-deploy-and-where).\n", + "\n", + "\n", + "## Prerequisites\n", + "\n", + "Complete the model training in the [Tutorial #1: Train an image classification model with Azure Machine Learning](train-models.ipynb) notebook. 
\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "# If you did NOT complete the tutorial, you can instead run this cell \n", + "# This will register a model and download the data needed for this tutorial\n", + "# These prerequisites are created in the training tutorial\n", + "# Feel free to skip this cell if you completed the training tutorial \n", + "\n", + "# register a model\n", + "from azureml.core import Workspace\n", + "ws = Workspace.from_config()\n", + "\n", + "from azureml.core.model import Model\n", + "\n", + "model_name = \"sklearn_mnist\"\n", + "model = Model.register(model_path=\"sklearn_mnist_model.pkl\",\n", + " model_name=model_name,\n", + " tags={\"data\": \"mnist\", \"model\": \"classification\"},\n", + " description=\"Mnist handwriting recognition\",\n", + " workspace=ws)\n", + "\n", + "# download test data\n", + "import os\n", + "import urllib.request\n", + "\n", + "data_folder = os.path.join(os.getcwd(), 'data')\n", + "os.makedirs(data_folder, exist_ok = True)\n", + "\n", + "\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))\n", + "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up the environment\n", + "\n", + "Start by setting up a testing environment.\n", + "\n", + "### Import packages\n", + "\n", + "Import the Python packages needed for this tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "check version" + ] + }, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + " \n", + "import azureml.core\n", + "\n", + "# display the core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the model\n", + "\n", + "You registered a model in your workspace in the previous tutorial. Now, load this workspace and download the model to your local directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "load workspace", + "download model" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "from azureml.core.model import Model\n", + "import os \n", + "ws = Workspace.from_config()\n", + "model=Model(ws, 'sklearn_mnist')\n", + "\n", + "model.download(target_dir=os.getcwd(), exist_ok=True)\n", + "\n", + "# verify the downloaded model file\n", + "file_path = os.path.join(os.getcwd(), \"sklearn_mnist_model.pkl\")\n", + "\n", + "os.stat(file_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test model locally\n", + "\n", + "Before deploying, make sure your model is working locally by:\n", + "* Loading test data\n", + "* Predicting test data\n", + "* Examining the confusion matrix\n", + "\n", + "### Load test data\n", + "\n", + "Load the test data from the **./data/** directory created during the training tutorial." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from utils import load_data\n",
+    "import os\n",
+    "\n",
+    "data_folder = os.path.join(os.getcwd(), 'data')\n",
+    "# note we also shrink the intensity values (X) from 0-255 to 0-1, which helps the model converge faster\n",
+    "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n",
+    "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Predict test data\n",
+    "\n",
+    "Feed the test dataset to the model to get predictions."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pickle\n",
+    "from sklearn.externals import joblib\n",
+    "\n",
+    "clf = joblib.load(os.path.join(os.getcwd(), 'sklearn_mnist_model.pkl'))\n",
+    "y_hat = clf.predict(X_test)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Examine the confusion matrix\n",
+    "\n",
+    "Generate a confusion matrix to see how many samples from the test set are classified correctly. The off-diagonal entries count the mis-classified samples."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.metrics import confusion_matrix\n",
+    "\n",
+    "conf_mx = confusion_matrix(y_test, y_hat)\n",
+    "print(conf_mx)\n",
+    "print('Overall accuracy:', np.average(y_hat == y_test))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Use `matplotlib` to display the confusion matrix as a graph. In this graph, the X axis represents the predicted values and the Y axis represents the actual values, matching the axis labels set in the code below. The color of each grid cell represents the error rate: the lighter the color, the higher the error rate. For example, many 5's are mis-classified as 3's, so you see a bright grid cell at row 5, column 3."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized\n",
+    "row_sums = conf_mx.sum(axis=1, keepdims=True)\n",
+    "norm_conf_mx = conf_mx / row_sums\n",
+    "np.fill_diagonal(norm_conf_mx, 0)\n",
+    "\n",
+    "fig = plt.figure(figsize=(8,5))\n",
+    "ax = fig.add_subplot(111)\n",
+    "cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n",
+    "ticks = np.arange(0, 10, 1)\n",
+    "ax.set_xticks(ticks)\n",
+    "ax.set_yticks(ticks)\n",
+    "ax.set_xticklabels(ticks)\n",
+    "ax.set_yticklabels(ticks)\n",
+    "fig.colorbar(cax)\n",
+    "plt.ylabel('true labels', fontsize=14)\n",
+    "plt.xlabel('predicted values', fontsize=14)\n",
+    "plt.savefig('conf.png')\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Deploy as web service\n",
+    "\n",
+    "Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI.\n",
\n", + "\n", + "To build the correct environment for ACI, provide the following:\n", + "* A scoring script to show how to use the model\n", + "* An environment file to show what packages need to be installed\n", + "* A configuration file to build the ACI\n", + "* The model you trained before\n", + "\n", + "### Create scoring script\n", + "\n", + "Create the scoring script, called score.py, used by the web service call to show how to use the model.\n", + "\n", + "You must include two required functions into the scoring script:\n", + "* The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started. \n", + "\n", + "* The `run(input_data)` function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", + "import os\n", + "import pickle\n", + "from sklearn.externals import joblib\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "from azureml.core.model import Model\n", + "\n", + "def init():\n", + " global model\n", + " # retrieve the path to the model file using the model name\n", + " model_path = Model.get_model_path('sklearn_mnist')\n", + " model = joblib.load(model_path)\n", + "\n", + "def run(raw_data):\n", + " data = np.array(json.loads(raw_data)['data'])\n", + " # make prediction\n", + " y_hat = model.predict(data)\n", + " # you can return any data type as long as it is JSON-serializable\n", + " return y_hat.tolist()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create environment file\n", + "\n", + "Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `scikit-learn` and `azureml-sdk`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "set conda dependencies" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myenv = CondaDependencies()\n", + "myenv.add_conda_package(\"scikit-learn\")\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Review the content of the `myenv.yml` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open(\"myenv.yml\",\"r\") as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create configuration file\n", + "\n", + "Create a deployment configuration file and specify the number of CPUs and gigabyte of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you feel you need more later, you would have to recreate the image and redeploy the service." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "configure web service", + "aci" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={\"data\": \"MNIST\", \"method\" : \"sklearn\"}, \n", + " description='Predict MNIST with sklearn')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy in ACI\n", + "Estimated time to complete: **about 7-8 minutes**\n", + "\n", + "Configure the image and deploy. The following code goes through these steps:\n", + "\n", + "1. Build an image using:\n", + " * The scoring file (`score.py`)\n", + " * The environment file (`myenv.yml`)\n", + " * The model file\n", + "1. Register that image under the workspace. \n", + "1. Send the image to the ACI container.\n", + "1. Start up a container in ACI using the image.\n", + "1. Get the web service HTTP endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "configure image", + "create image", + "deploy web service", + "aci" + ] + }, + "outputs": [], + "source": [ + "%%time\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import InferenceConfig\n", + "\n", + "inference_config = InferenceConfig(runtime= \"python\", \n", + " entry_script=\"score.py\",\n", + " conda_file=\"myenv.yml\")\n", + "\n", + "service = Model.deploy(workspace=ws, \n", + " name='sklearn-mnist-svc', \n", + " models=[model], \n", + " inference_config=inference_config, \n", + " deployment_config=aciconfig)\n", + "\n", + "service.wait_for_deployment(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "get scoring uri" + ] + }, + "outputs": [], + "source": [ + "print(service.scoring_uri)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test deployed service\n", + "\n", + "Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model with a random sample of 30 images from the test data. \n", + "\n", + "The following code goes through these steps:\n", + "1. Send the data as a JSON array to the web service hosted in ACI. \n", + "\n", + "1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.\n", + "\n", + "1. Print the returned predictions and plot them along with the input images. Red font and inverse image (white on black) is used to highlight the misclassified samples. \n", + "\n", + " Since the model accuracy is high, you might have to run the following code a few times before you can see a misclassified sample." 
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "score web service"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "\n",
+    "# find 30 random samples from test set\n",
+    "n = 30\n",
+    "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n",
+    "\n",
+    "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n",
+    "test_samples = bytes(test_samples, encoding='utf8')\n",
+    "\n",
+    "# predict using the deployed model\n",
+    "result = service.run(input_data=test_samples)\n",
+    "\n",
+    "# compare actual values vs. the predicted values\n",
+    "i = 0\n",
+    "plt.figure(figsize=(20, 1))\n",
+    "\n",
+    "for s in sample_indices:\n",
+    "    plt.subplot(1, n, i + 1)\n",
+    "    plt.axhline('')\n",
+    "    plt.axvline('')\n",
+    "\n",
+    "    # use a different color for misclassified samples\n",
+    "    font_color = 'red' if y_test[s] != result[i] else 'black'\n",
+    "    clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n",
+    "\n",
+    "    plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n",
+    "    plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n",
+    "\n",
+    "    i = i + 1\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can also send a raw HTTP request to test the web service."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "score web service"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "\n",
+    "# send a random row from the test set to score\n",
+    "random_index = np.random.randint(0, len(X_test))\n",
+    "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n",
+    "\n",
+    "headers = {'Content-Type':'application/json'}\n",
+    "\n",
+    "# for an AKS deployment you'd need to include the service key in the header as well\n",
+    "# api_key = service.get_key()\n",
+    "# headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}\n",
+    "\n",
+    "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
+    "\n",
+    "print(\"POST to url\", service.scoring_uri)\n",
+    "#print(\"input data:\", input_data)\n",
+    "print(\"label:\", y_test[random_index])\n",
+    "print(\"prediction:\", resp.text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Clean up resources\n",
+    "\n",
+    "To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "tags": [
+     "delete web service"
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "service.delete()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "If you're not going to use what you've created here, delete the resources you just created in this tutorial so you don't incur any charges. In the Azure portal, select and delete your resource group. You can also keep the resource group, but delete a single workspace by displaying the workspace properties and selecting the Delete button.\n",
+    "\n",
+    "\n",
+    "## Next steps\n",
+    "\n",
+    "In this Azure Machine Learning tutorial, you used Python to:\n",
+    "\n",
+    "> * Set up your testing environment\n",
+    "> * Retrieve the model from your workspace\n",
+    "> * Test the model locally\n",
+    "> * Deploy the model to ACI\n",
+    "> * Test the deployed model\n",
+    " \n",
+    "You can also try out the [regression tutorial](regression-part1-data-prep.ipynb)."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/img-classification-part2-deploy.png)" + ] + } ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" + "metadata": { + "authors": [ + { + "name": "roastala" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + }, + "msauthor": "sgilley" }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "sgilley" - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/tutorials/regression-automated-ml.ipynb b/tutorials/regression-automated-ml.ipynb index 59629f3a..6482feb1 100644 --- a/tutorials/regression-automated-ml.ipynb +++ b/tutorials/regression-automated-ml.ipynb @@ -1,654 +1,654 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/regression-part2-automated-ml.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tutorial: Use automated machine learning to predict taxi fares" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this tutorial, you use automated machine learning in Azure Machine Learning service to create a regression model to predict NYC taxi fare prices. This process accepts training data and configuration settings, and automatically iterates through combinations of different feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model.\n", - "\n", - "In this tutorial you learn the following tasks:\n", - "\n", - "* Download, transform, and clean data using Azure Open Datasets\n", - "* Train an automated machine learning regression model\n", - "* Calculate model accuracy\n", - "\n", - "If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version](https://aka.ms/AMLFree) of Azure Machine Learning service today." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "* Complete the [setup tutorial](https://docs.microsoft.com/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup) if you don't already have an Azure Machine Learning service workspace or notebook virtual machine.\n", - "* After you complete the setup tutorial, open the **tutorials/regression-automated-ml.ipynb** notebook using the same notebook server.\n", - "\n", - "This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#local). Run `pip install azureml-sdk[automl] azureml-opendatasets azureml-widgets` to get the required packages." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download and prepare data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Import the necessary packages. The Open Datasets package contains a class representing each data source (`NycTlcGreen` for example) to easily filter date parameters before downloading." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.opendatasets import NycTlcGreen\n", - "import pandas as pd\n", - "from datetime import datetime\n", - "from dateutil.relativedelta import relativedelta" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes to avoid `MemoryError` with large datasets. To download taxi data, iteratively fetch one month at a time, and before appending it to `green_taxi_df` randomly sample 2,000 records from each month to avoid bloating the dataframe. Then preview the data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "green_taxi_df = pd.DataFrame([])\n", - "start = datetime.strptime(\"1/1/2015\",\"%m/%d/%Y\")\n", - "end = datetime.strptime(\"1/31/2015\",\"%m/%d/%Y\")\n", - "\n", - "for sample_month in range(12):\n", - " temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n", - " .to_pandas_dataframe()\n", - " green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))\n", - " \n", - "green_taxi_df.head(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that the initial data is loaded, define a function to create various time-based features from the pickup datetime field. This will create new fields for the month number, day of month, day of week, and hour of day, and will allow the model to factor in time-based seasonality. \n", - "\n", - "Use the `apply()` function on the dataframe to iteratively apply the `build_time_features()` function to each row in the taxi data." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def build_time_features(vector):\n", - " pickup_datetime = vector[0]\n", - " month_num = pickup_datetime.month\n", - " day_of_month = pickup_datetime.day\n", - " day_of_week = pickup_datetime.weekday()\n", - " hour_of_day = pickup_datetime.hour\n", - " \n", - " return pd.Series((month_num, day_of_month, day_of_week, hour_of_day))\n", - "\n", - "green_taxi_df[[\"month_num\", \"day_of_month\",\"day_of_week\", \"hour_of_day\"]] = green_taxi_df[[\"lpepPickupDatetime\"]].apply(build_time_features, axis=1)\n", - "green_taxi_df.head(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Remove some of the columns that you won't need for training or additional feature building." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "columns_to_remove = [\"lpepPickupDatetime\", \"lpepDropoffDatetime\", \"puLocationId\", \"doLocationId\", \"extra\", \"mtaTax\",\n", - " \"improvementSurcharge\", \"tollsAmount\", \"ehailFee\", \"tripType\", \"rateCodeID\", \n", - " \"storeAndFwdFlag\", \"paymentType\", \"fareAmount\", \"tipAmount\"\n", - " ]\n", - "for col in columns_to_remove:\n", - " green_taxi_df.pop(col)\n", - " \n", - "green_taxi_df.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Cleanse data " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run the `describe()` function on the new dataframe to see summary statistics for each field." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "green_taxi_df.describe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the summary statistics, you see that there are several fields that have outliers or values that will reduce model accuracy. First filter the lat/long fields to be within the bounds of the Manhattan area. This will filter out longer taxi trips or trips that are outliers in respect to their relationship with other features. \n", - "\n", - "Additionally filter the `tripDistance` field to be greater than zero but less than 31 miles (the haversine distance between the two lat/long pairs). This eliminates long outlier trips that have inconsistent trip cost.\n", - "\n", - "Lastly, the `totalAmount` field has negative values for the taxi fares, which don't make sense in the context of our model, and the `passengerCount` field has bad data with the minimum values being zero.\n", - "\n", - "Filter out these anomalies using query functions, and then remove the last few columns unnecessary for training." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "final_df = green_taxi_df.query(\"pickupLatitude>=40.53 and pickupLatitude<=40.88\")\n", - "final_df = final_df.query(\"pickupLongitude>=-74.09 and pickupLongitude<=-73.72\")\n", - "final_df = final_df.query(\"tripDistance>=0.25 and tripDistance<31\")\n", - "final_df = final_df.query(\"passengerCount>0 and totalAmount>0\")\n", - "\n", - "columns_to_remove_for_training = [\"pickupLongitude\", \"pickupLatitude\", \"dropoffLongitude\", \"dropoffLatitude\"]\n", - "for col in columns_to_remove_for_training:\n", - " final_df.pop(col)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Call `describe()` again on the data to ensure cleansing worked as expected. 
You now have a prepared and cleansed set of taxi, holiday, and weather data to use for machine learning model training." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "final_df.describe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure workspace\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a workspace object from the existing workspace. A [Workspace](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) is a class that accepts your Azure subscription and resource information. It also creates a cloud resource to monitor and track your model runs. `Workspace.from_config()` reads the file **config.json** and loads the authentication details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "ws = Workspace.from_config()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Split the data into train and test sets" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Split the data into training and test sets by using the `train_test_split` function in the `scikit-learn` library. This function segregates the data into the x (**features**) data set for model training and the y (**values to predict**) data set for testing. The `test_size` parameter determines the percentage of data to allocate to testing. The `random_state` parameter sets a seed to the random generator, so that your train-test splits are deterministic." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.model_selection import train_test_split\n", - "\n", - "y_df = final_df.pop(\"totalAmount\")\n", - "x_df = final_df\n", - "\n", - "x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The purpose of this step is to have data points to test the finished model that haven't been used to train the model, in order to measure true accuracy. \n", - "\n", - "In other words, a well-trained model should be able to accurately make predictions from data it hasn't already seen. You now have data prepared for auto-training a machine learning model." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Automatically train a model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To automatically train a model, take the following steps:\n", - "1. Define settings for the experiment run. Attach your training data to the configuration, and modify settings that control the training process.\n", - "1. Submit the experiment for model tuning. After submitting the experiment, the process iterates through different machine learning algorithms and hyperparameter settings, adhering to your defined constraints. It chooses the best-fit model by optimizing an accuracy metric." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Define training settings" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Define the experiment parameter and model settings for training. 
View the full list of [settings](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train). Submitting the experiment with these default settings will take approximately 5-10 min, but if you want a shorter run time, reduce the `iterations` parameter.\n", - "\n", - "\n", - "|Property| Value in this tutorial |Description|\n", - "|----|----|---|\n", - "|**iteration_timeout_minutes**|2|Time limit in minutes for each iteration. Reduce this value to decrease total runtime.|\n", - "|**iterations**|20|Number of iterations. In each iteration, a new machine learning model is trained with your data. This is the primary value that affects total run time.|\n", - "|**primary_metric**| spearman_correlation | Metric that you want to optimize. The best-fit model will be chosen based on this metric.|\n", - "|**preprocess**| True | By using **True**, the experiment can preprocess the input data (handling missing data, converting text to numeric, etc.)|\n", - "|**verbosity**| logging.INFO | Controls the level of logging.|\n", - "|**n_cross_validations**|5|Number of cross-validation splits to perform when validation data is not specified.|" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "\n", - "automl_settings = {\n", - " \"iteration_timeout_minutes\": 2,\n", - " \"iterations\": 20,\n", - " \"primary_metric\": 'spearman_correlation',\n", - " \"preprocess\": True,\n", - " \"verbosity\": logging.INFO,\n", - " \"n_cross_validations\": 5\n", - "}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use your defined training settings as a `**kwargs` parameter to an `AutoMLConfig` object. Additionally, specify your training data and the type of model, which is `regression` in this case." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.automl import AutoMLConfig\n", - "\n", - "automl_config = AutoMLConfig(task='regression',\n", - " debug_log='automated_ml_errors.log',\n", - " X=x_train.values,\n", - " y=y_train.values.flatten(),\n", - " **automl_settings)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Automated machine learning pre-processing steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same pre-processing steps applied during training are applied to your input data automatically." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train the automatic regression model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create an experiment object in your workspace. An experiment acts as a container for your individual runs. Pass the defined `automl_config` object to the experiment, and set the output to `True` to view progress during the run. \n", - "\n", - "After starting the experiment, the output shown updates live as the experiment runs. For each iteration, you see the model type, the run duration, and the training accuracy. The field `BEST` tracks the best running training score based on your metric type." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.experiment import Experiment\n", - "experiment = Experiment(ws, \"taxi-experiment\")\n", - "local_run = experiment.submit(automl_config, show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explore the results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Explore the results of automatic training with a [Jupyter widget](https://docs.microsoft.com/python/api/azureml-widgets/azureml.widgets?view=azure-ml-py). The widget allows you to see a graph and table of all individual run iterations, along with training accuracy metrics and metadata. Additionally, you can filter on different accuracy metrics than your primary metric with the dropdown selector." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(local_run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the best model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Select the best model from your iterations. The `get_output` function returns the best run and the fitted model for the last fit invocation. By using the overloads on `get_output`, you can retrieve the best run and fitted model for any logged metric or a particular iteration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = local_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the best model accuracy" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use the best model to run predictions on the test data set to predict taxi fares. The function `predict` uses the best model and predicts the values of y, **trip cost**, from the `x_test` data set. Print the first 10 predicted cost values from `y_predict`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "y_predict = fitted_model.predict(x_test.values)\n", - "print(y_predict[:10])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Calculate the `root mean squared error` of the results. Convert the `y_test` dataframe to a list to compare to the predicted values. The function `mean_squared_error` takes two arrays of values and calculates the average squared error between them. Taking the square root of the result gives an error in the same units as the y variable, **cost**. It indicates roughly how far the taxi fare predictions are from the actual fares." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.metrics import mean_squared_error\n", - "from math import sqrt\n", - "\n", - "y_actual = y_test.values.flatten().tolist()\n", - "rmse = sqrt(mean_squared_error(y_actual, y_predict))\n", - "rmse" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run the following code to calculate mean absolute percent error (MAPE) by using the full `y_actual` and `y_predict` data sets. This metric calculates an absolute difference between each predicted and actual value and sums all the differences. 
Then it expresses that sum as a percent of the total of the actual values." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sum_actuals = sum_errors = 0\n", - "\n", - "for actual_val, predict_val in zip(y_actual, y_predict):\n", - " abs_error = actual_val - predict_val\n", - " if abs_error < 0:\n", - " abs_error = abs_error * -1\n", - "\n", - " sum_errors = sum_errors + abs_error\n", - " sum_actuals = sum_actuals + actual_val\n", - "\n", - "mean_abs_percent_error = sum_errors / sum_actuals\n", - "print(\"Model MAPE:\")\n", - "print(mean_abs_percent_error)\n", - "print()\n", - "print(\"Model Accuracy:\")\n", - "print(1 - mean_abs_percent_error)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "From the two prediction accuracy metrics, you see that the model is fairly good at predicting taxi fares from the data set's features, typically within +- $4.00, and approximately 15% error. \n", - "\n", - "The traditional machine learning model development process is highly resource-intensive, and requires significant domain knowledge and time investment to run and compare the results of dozens of models. Using automated machine learning is a great way to rapidly test many different models for your scenario." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up resources" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Do not complete this section if you plan on running other Azure Machine Learning service tutorials." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Stop the notebook VM" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you used a cloud notebook server, stop the VM when you are not using it to reduce cost." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "1. In your workspace, select **Notebook VMs**.\n", - "1. From the list, select the VM.\n", - "1. Select **Stop**.\n", - "1. When you're ready to use the server again, select **Start**." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Delete everything" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If you don't plan to use the resources you created, delete them, so you don't incur any charges." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "1. In the Azure portal, select **Resource groups** on the far left.\n", - "1. From the list, select the resource group you created.\n", - "1. Select **Delete resource group**.\n", - "1. Enter the resource group name. Then select **Delete**.\n", - "\n", - "You can also keep the resource group but delete a single workspace. Display the workspace properties and select **Delete**." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next steps" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this automated machine learning tutorial, you did the following tasks:\n", - "\n", - "> * Configured a workspace and prepared data for an experiment.\n", - "> * Trained by using an automated regression model locally with custom parameters.\n", - "> * Explored and reviewed training results.\n", - "\n", - "[Deploy your model](https://docs.microsoft.com/azure/machine-learning/service/tutorial-deploy-models-with-aml) with Azure Machine Learning service." 
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "authors": [
-   {
-    "name": "jeffshep"
-   }
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Copyright (c) Microsoft Corporation. All rights reserved."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/regression-part2-automated-ml.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Tutorial: Use automated machine learning to predict taxi fares"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this tutorial, you use automated machine learning in Azure Machine Learning service to create a regression model to predict NYC taxi fare prices. This process accepts training data and configuration settings, and automatically iterates through combinations of different feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model.\n",
+    "\n",
+    "In this tutorial you learn the following tasks:\n",
+    "\n",
+    "* Download, transform, and clean data using Azure Open Datasets\n",
+    "* Train an automated machine learning regression model\n",
+    "* Calculate model accuracy\n",
+    "\n",
+    "If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version](https://aka.ms/AMLFree) of Azure Machine Learning service today."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Prerequisites"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "* Complete the [setup tutorial](https://docs.microsoft.com/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup) if you don't already have an Azure Machine Learning service workspace or notebook virtual machine.\n",
+    "* After you complete the setup tutorial, open the **tutorials/regression-automated-ml.ipynb** notebook using the same notebook server.\n",
+    "\n",
+    "This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#local). Run `pip install azureml-sdk[automl] azureml-opendatasets azureml-widgets` to get the required packages."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Download and prepare data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Import the necessary packages. The Open Datasets package contains a class representing each data source (`NycTlcGreen` for example) to easily filter date parameters before downloading."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.opendatasets import NycTlcGreen\n",
+    "import pandas as pd\n",
+    "from datetime import datetime\n",
+    "from dateutil.relativedelta import relativedelta"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes to avoid `MemoryError` with large datasets. 
To download taxi data, iteratively fetch one month at a time, and before appending it to `green_taxi_df` randomly sample 2,000 records from each month to avoid bloating the dataframe. Then preview the data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "green_taxi_df = pd.DataFrame([])\n", + "start = datetime.strptime(\"1/1/2015\",\"%m/%d/%Y\")\n", + "end = datetime.strptime(\"1/31/2015\",\"%m/%d/%Y\")\n", + "\n", + "for sample_month in range(12):\n", + " temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n", + " .to_pandas_dataframe()\n", + " green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))\n", + " \n", + "green_taxi_df.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that the initial data is loaded, define a function to create various time-based features from the pickup datetime field. This will create new fields for the month number, day of month, day of week, and hour of day, and will allow the model to factor in time-based seasonality. \n", + "\n", + "Use the `apply()` function on the dataframe to iteratively apply the `build_time_features()` function to each row in the taxi data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def build_time_features(vector):\n", + " pickup_datetime = vector[0]\n", + " month_num = pickup_datetime.month\n", + " day_of_month = pickup_datetime.day\n", + " day_of_week = pickup_datetime.weekday()\n", + " hour_of_day = pickup_datetime.hour\n", + " \n", + " return pd.Series((month_num, day_of_month, day_of_week, hour_of_day))\n", + "\n", + "green_taxi_df[[\"month_num\", \"day_of_month\",\"day_of_week\", \"hour_of_day\"]] = green_taxi_df[[\"lpepPickupDatetime\"]].apply(build_time_features, axis=1)\n", + "green_taxi_df.head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Remove some of the columns that you won't need for training or additional feature building." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "columns_to_remove = [\"lpepPickupDatetime\", \"lpepDropoffDatetime\", \"puLocationId\", \"doLocationId\", \"extra\", \"mtaTax\",\n", + " \"improvementSurcharge\", \"tollsAmount\", \"ehailFee\", \"tripType\", \"rateCodeID\", \n", + " \"storeAndFwdFlag\", \"paymentType\", \"fareAmount\", \"tipAmount\"\n", + " ]\n", + "for col in columns_to_remove:\n", + " green_taxi_df.pop(col)\n", + " \n", + "green_taxi_df.head(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Cleanse data " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the `describe()` function on the new dataframe to see summary statistics for each field." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "green_taxi_df.describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "From the summary statistics, you see that there are several fields that have outliers or values that will reduce model accuracy. First filter the lat/long fields to be within the bounds of the Manhattan area. This will filter out longer taxi trips or trips that are outliers in respect to their relationship with other features. 
\n", + "\n", + "Additionally filter the `tripDistance` field to be greater than zero but less than 31 miles (the haversine distance between the two lat/long pairs). This eliminates long outlier trips that have inconsistent trip cost.\n", + "\n", + "Lastly, the `totalAmount` field has negative values for the taxi fares, which don't make sense in the context of our model, and the `passengerCount` field has bad data with the minimum values being zero.\n", + "\n", + "Filter out these anomalies using query functions, and then remove the last few columns unnecessary for training." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "final_df = green_taxi_df.query(\"pickupLatitude>=40.53 and pickupLatitude<=40.88\")\n", + "final_df = final_df.query(\"pickupLongitude>=-74.09 and pickupLongitude<=-73.72\")\n", + "final_df = final_df.query(\"tripDistance>=0.25 and tripDistance<31\")\n", + "final_df = final_df.query(\"passengerCount>0 and totalAmount>0\")\n", + "\n", + "columns_to_remove_for_training = [\"pickupLongitude\", \"pickupLatitude\", \"dropoffLongitude\", \"dropoffLatitude\"]\n", + "for col in columns_to_remove_for_training:\n", + " final_df.pop(col)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Call `describe()` again on the data to ensure cleansing worked as expected. You now have a prepared and cleansed set of taxi, holiday, and weather data to use for machine learning model training." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "final_df.describe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure workspace\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create a workspace object from the existing workspace. A [Workspace](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) is a class that accepts your Azure subscription and resource information. It also creates a cloud resource to monitor and track your model runs. `Workspace.from_config()` reads the file **config.json** and loads the authentication details into an object named `ws`. `ws` is used throughout the rest of the code in this tutorial." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "ws = Workspace.from_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Split the data into train and test sets" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Split the data into training and test sets by using the `train_test_split` function in the `scikit-learn` library. This function segregates the data into the x (**features**) data set for model training and the y (**values to predict**) data set for testing. The `test_size` parameter determines the percentage of data to allocate to testing. The `random_state` parameter sets a seed to the random generator, so that your train-test splits are deterministic." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.model_selection import train_test_split\n", + "\n", + "y_df = final_df.pop(\"totalAmount\")\n", + "x_df = final_df\n", + "\n", + "x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The purpose of this step is to have data points to test the finished model that haven't been used to train the model, in order to measure true accuracy. \n", + "\n", + "In other words, a well-trained model should be able to accurately make predictions from data it hasn't already seen. You now have data prepared for auto-training a machine learning model." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Automatically train a model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To automatically train a model, take the following steps:\n", + "1. Define settings for the experiment run. Attach your training data to the configuration, and modify settings that control the training process.\n", + "1. Submit the experiment for model tuning. After submitting the experiment, the process iterates through different machine learning algorithms and hyperparameter settings, adhering to your defined constraints. It chooses the best-fit model by optimizing an accuracy metric." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define training settings" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Define the experiment parameter and model settings for training. View the full list of [settings](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train). Submitting the experiment with these default settings will take approximately 5-10 min, but if you want a shorter run time, reduce the `iterations` parameter.\n", + "\n", + "\n", + "|Property| Value in this tutorial |Description|\n", + "|----|----|---|\n", + "|**iteration_timeout_minutes**|2|Time limit in minutes for each iteration. Reduce this value to decrease total runtime.|\n", + "|**iterations**|20|Number of iterations. In each iteration, a new machine learning model is trained with your data. This is the primary value that affects total run time.|\n", + "|**primary_metric**| spearman_correlation | Metric that you want to optimize. The best-fit model will be chosen based on this metric.|\n", + "|**preprocess**| True | By using **True**, the experiment can preprocess the input data (handling missing data, converting text to numeric, etc.)|\n", + "|**verbosity**| logging.INFO | Controls the level of logging.|\n", + "|**n_cross_validations**|5|Number of cross-validation splits to perform when validation data is not specified.|" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "\n", + "automl_settings = {\n", + " \"iteration_timeout_minutes\": 2,\n", + " \"iterations\": 20,\n", + " \"primary_metric\": 'spearman_correlation',\n", + " \"preprocess\": True,\n", + " \"verbosity\": logging.INFO,\n", + " \"n_cross_validations\": 5\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use your defined training settings as a `**kwargs` parameter to an `AutoMLConfig` object. Additionally, specify your training data and the type of model, which is `regression` in this case." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.automl import AutoMLConfig\n", + "\n", + "automl_config = AutoMLConfig(task='regression',\n", + " debug_log='automated_ml_errors.log',\n", + " X=x_train.values,\n", + " y=y_train.values.flatten(),\n", + " **automl_settings)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Automated machine learning pre-processing steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same pre-processing steps applied during training are applied to your input data automatically." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Train the automatic regression model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Create an experiment object in your workspace. An experiment acts as a container for your individual runs. Pass the defined `automl_config` object to the experiment, and set the output to `True` to view progress during the run. \n", + "\n", + "After starting the experiment, the output shown updates live as the experiment runs. For each iteration, you see the model type, the run duration, and the training accuracy. The field `BEST` tracks the best running training score based on your metric type." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.experiment import Experiment\n", + "experiment = Experiment(ws, \"taxi-experiment\")\n", + "local_run = experiment.submit(automl_config, show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore the results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Explore the results of automatic training with a [Jupyter widget](https://docs.microsoft.com/python/api/azureml-widgets/azureml.widgets?view=azure-ml-py). The widget allows you to see a graph and table of all individual run iterations, along with training accuracy metrics and metadata. Additionally, you can filter on different accuracy metrics than your primary metric with the dropdown selector." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(local_run).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Retrieve the best model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Select the best model from your iterations. The `get_output` function returns the best run and the fitted model for the last fit invocation. By using the overloads on `get_output`, you can retrieve the best run and fitted model for any logged metric or a particular iteration." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_run, fitted_model = local_run.get_output()\n", + "print(best_run)\n", + "print(fitted_model)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Test the best model accuracy" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Use the best model to run predictions on the test data set to predict taxi fares. The function `predict` uses the best model and predicts the values of y, **trip cost**, from the `x_test` data set. 
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Test the best model accuracy"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Use the best model to run predictions on the test data set to predict taxi fares. The `predict` function uses the best model to predict the values of y, **trip cost**, from the `x_test` data set. Print the first 10 predicted cost values from `y_predict`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y_predict = fitted_model.predict(x_test.values)\n",
+    "print(y_predict[:10])"
+   ]
+  },
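+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To eyeball how close the predictions are, you can line up the first few predicted fares against the actual fares. This cell is an illustrative sketch, not part of the original tutorial; the `comparison` name is arbitrary."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "# Compare the first 10 predicted fares with the actual fares side by side.\n",
+    "comparison = pd.DataFrame({'actual': y_test.values[:10], 'predicted': y_predict[:10]})\n",
+    "print(comparison)"
+   ]
+  },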
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Calculate the root mean squared error (RMSE) of the results. Convert the `y_test` dataframe to a list so you can compare it to the predicted values. The `mean_squared_error` function takes two arrays of values and calculates the average squared error between them. Taking the square root of the result gives an error in the same units as the y variable, **cost**. It indicates roughly how far the taxi fare predictions are from the actual fares."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.metrics import mean_squared_error\n",
+    "from math import sqrt\n",
+    "\n",
+    "y_actual = y_test.values.flatten().tolist()\n",
+    "rmse = sqrt(mean_squared_error(y_actual, y_predict))\n",
+    "rmse"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Run the following code to calculate the mean absolute percent error (MAPE) by using the full `y_actual` and `y_predict` data sets. This metric calculates an absolute difference between each predicted and actual value and sums all the differences. Then it expresses that sum as a percent of the total of the actual values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sum_actuals = sum_errors = 0\n",
+    "\n",
+    "for actual_val, predict_val in zip(y_actual, y_predict):\n",
+    "    abs_error = actual_val - predict_val\n",
+    "    if abs_error < 0:\n",
+    "        abs_error = abs_error * -1\n",
+    "\n",
+    "    sum_errors = sum_errors + abs_error\n",
+    "    sum_actuals = sum_actuals + actual_val\n",
+    "\n",
+    "mean_abs_percent_error = sum_errors / sum_actuals\n",
+    "print(\"Model MAPE:\")\n",
+    "print(mean_abs_percent_error)\n",
+    "print()\n",
+    "print(\"Model Accuracy:\")\n",
+    "print(1 - mean_abs_percent_error)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "From the two prediction accuracy metrics, you see that the model is fairly good at predicting taxi fares from the data set's features, typically within ±$4.00 and with approximately 15% error.\n",
+    "\n",
+    "The traditional machine learning model development process is highly resource-intensive, and requires significant domain knowledge and time to run and compare the results of dozens of models. Using automated machine learning is a great way to rapidly test many different models for your scenario."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Clean up resources"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Don't complete this section if you plan to run other Azure Machine Learning service tutorials."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Stop the notebook VM"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you used a cloud notebook server, stop the VM when you aren't using it to reduce cost."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "1. In your workspace, select **Notebook VMs**.\n",
+    "1. From the list, select the VM.\n",
+    "1. Select **Stop**.\n",
+    "1. When you're ready to use the server again, select **Start**."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Delete everything"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you don't plan to use the resources you created, delete them so you don't incur any charges."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "1. In the Azure portal, select **Resource groups** on the far left.\n",
+    "1. From the list, select the resource group you created.\n",
+    "1. Select **Delete resource group**.\n",
+    "1. Enter the resource group name. Then select **Delete**.\n",
+    "\n",
+    "You can also keep the resource group but delete a single workspace. Display the workspace properties and select **Delete**."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Next steps"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this automated machine learning tutorial, you did the following tasks:\n",
+    "\n",
+    "> * Configured a workspace and prepared data for an experiment.\n",
+    "> * Trained an automated regression model locally with custom parameters.\n",
+    "> * Explored and reviewed the training results.\n",
+    "\n",
+    "[Deploy your model](https://docs.microsoft.com/azure/machine-learning/service/tutorial-deploy-models-with-aml) with Azure Machine Learning service."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
 ],
- "kernelspec": {
-  "display_name": "Python 3.6",
-  "language": "python",
-  "name": "python36"
+ "metadata": {
+  "authors": [
+   {
+    "name": "jeffshep"
+   }
+  ],
+  "kernelspec": {
+   "display_name": "Python 3.6",
+   "language": "python",
+   "name": "python36"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.7"
+  },
+  "msauthor": "trbye"
 },
- "language_info": {
-  "codemirror_mode": {
-   "name": "ipython",
-   "version": 3
-  },
-  "file_extension": ".py",
-  "mimetype": "text/x-python",
-  "name": "python",
-  "nbconvert_exporter": "python",
-  "pygments_lexer": "ipython3",
-  "version": "3.6.7"
- },
- "msauthor": "trbye"
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/tutorials/regression-automated-ml.yml b/tutorials/regression-automated-ml.yml
index 20fa3135..f971e01d 100644
--- a/tutorials/regression-automated-ml.yml
+++ b/tutorials/regression-automated-ml.yml
@@ -4,4 +4,4 @@ dependencies:
   - azureml-sdk
   - azureml-train-automl
   - azureml-widgets
-  - azureml-opendatasets
\ No newline at end of file
+  - azureml-opendatasets