diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md deleted file mode 100644 index ae0f1c25..00000000 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ /dev/null @@ -1,30 +0,0 @@ ---- -name: Bug report -about: Create a report to help us improve -title: "[Notebook issue]" -labels: '' -assignees: '' - ---- - -**Describe the bug** -A clear and concise description of what the bug is. - -Provide the following if applicable: -+ Your Python & SDK version -+ Python Scripts or the full notebook name -+ Pipeline definition -+ Environment definition -+ Example data -+ Any log files. -+ Run and Workspace Id - -**To Reproduce** -Steps to reproduce the behavior: -1. - -**Expected behavior** -A clear and concise description of what you expected to happen. - -**Additional context** -Add any other context about the problem here. diff --git a/.github/ISSUE_TEMPLATE/notebook-issue.md b/.github/ISSUE_TEMPLATE/notebook-issue.md deleted file mode 100644 index c012da21..00000000 --- a/.github/ISSUE_TEMPLATE/notebook-issue.md +++ /dev/null @@ -1,43 +0,0 @@ ---- -name: Notebook issue -about: Describe your notebook issue -title: "[Notebook] DESCRIPTIVE TITLE" -labels: notebook -assignees: '' - ---- - -### DESCRIPTION: Describe clearly + concisely - - -. -### REPRODUCIBLE: Steps - - -. -### EXPECTATION: Clear description - - -. 
-### CONFIG/ENVIRONMENT: -```Provide where applicable - -## Your Python & SDK version: - -## Environment definition: - -## Notebook name or Python scripts: - -## Run and Workspace Id: - -## Pipeline definition: - -## Example data: - -## Any log files: - - - - - -``` diff --git a/README.md b/README.md index a935eeef..b9e9241b 100644 --- a/README.md +++ b/README.md @@ -58,8 +58,10 @@ Visit this [community repository](https://github.com/microsoft/MLOps/tree/master Visit following repos to see projects contributed by Azure ML users: - [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis. - - [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT) + - [Learn about Natural Language Processing best practices using Azure Machine Learning service](https://github.com/microsoft/nlp) + - [Pre-Train BERT models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT) - [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion) + - [UMass Amherst Student Samples](https://github.com/katiehouse3/microsoft-azure-ml-notebooks) - A number of end-to-end machine learning notebooks, including machine translation, image classification, and customer churn, created by students in the 696DS course at UMass Amherst. ## Data/Telemetry This repository collects usage data and sends it to Microsoft to help improve our products and services. 
Read Microsoft's [privacy statement](https://privacy.microsoft.com/en-US/privacystatement) to learn more. diff --git a/configuration.ipynb b/configuration.ipynb index 555b0c73..95532139 100644 --- a/configuration.ipynb +++ b/configuration.ipynb @@ -103,7 +103,7 @@ "source": [ "import azureml.core\n", "\n", - "print(\"This notebook was created using version 1.0.62 of the Azure ML SDK\")\n", + "print(\"This notebook was created using version 1.0.65 of the Azure ML SDK\")\n", "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" ] }, diff --git a/contrib/RAPIDS/README.md b/contrib/RAPIDS/README.md deleted file mode 100644 index c3628350..00000000 --- a/contrib/RAPIDS/README.md +++ /dev/null @@ -1,307 +0,0 @@ -## How to use the RAPIDS on AzureML materials -### Setting up requirements -The material requires the Azure ML SDK and the Jupyter Notebook server to run the interactive execution. Please refer to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up"). Follow the instructions under **Local Computer** and make sure to run the last step, adding the extra package progressbar2 (`pip install progressbar2`). - -After following the directions, the user should end up with a conda environment (myenv) that can be activated in an Anaconda prompt. - -The user also requires an Azure Subscription with a Machine Learning Services quota of 24 nodes or more in the desired region (enough to select a vmSize with 4 GPUs, as used in the notebook) for the desired VM family ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or 
[ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)); the specific vmSize to be used within the chosen family also needs to be whitelisted for Machine Learning Services usage. - -  -### Getting and running the material -Clone the AzureML Notebooks repository from GitHub by running the following command in a local_directory: - -* C:\local_directory>git clone https://github.com/Azure/MachineLearningNotebooks.git - -In a conda prompt, navigate to the local directory, activate the conda environment (myenv) where the Azure ML SDK was installed, and launch Jupyter Notebook. - -* (myenv) C:\local_directory>jupyter notebook - -From the resulting browser at http://localhost:8888/tree, navigate to the master notebook: - -* http://localhost:8888/tree/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb - -  -The following notebook will appear: - -![](imgs/NotebookHome.png) - -  -### Master Jupyter Notebook -The notebook can be executed interactively, step by step, by pressing the Run button (circled in red in the image above). - -The first couple of functional steps import the necessary AzureML libraries. If you experience any errors, please refer back to the [environment setup](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up") instructions. - -  -#### Setting up a Workspace -The following step gathers the information necessary to set up a workspace to execute the RAPIDS script. This needs to be done only once, or not at all if you already have a usable workspace set up in the Azure Portal: - -![](imgs/WorkSpaceSetUp.png) - - -Be sure to set the correct values for the subscription\_id, resource\_group, workspace\_name, and region before executing the step. 
An example is: - -    subscription_id = os.environ.get("SUBSCRIPTION_ID", "1358e503-xxxx-4043-xxxx-65b83xxxx32d") -    resource_group = os.environ.get("RESOURCE_GROUP", "AML-Rapids-Testing") -    workspace_name = os.environ.get("WORKSPACE_NAME", "AML_Rapids_Tester") -    workspace_region = os.environ.get("WORKSPACE_REGION", "West US 2") - -  -The resource\_group and workspace_name can take any value; the region should match the region for which the subscription has the required Machine Learning Services node quota. - -The first time the code is executed it will redirect to the Azure Portal to validate subscription credentials. After the workspace is created, its related information is stored in a local file so that this step can be subsequently skipped. The next step simply loads the saved workspace: - -![](imgs/saved_workspace.png) - -Once a workspace has been created, the user can skip its creation and jump straight to this step. The configuration file resides in: - -* C:\local_directory\\MachineLearningNotebooks\contrib\RAPIDS\aml_config\config.json - -  -#### Creating an AML Compute Target -The following step creates an AML Compute Target: - -![](imgs/target_creation.png) - -The vm\_size parameter of the AmlCompute.provisioning\_configuration() call has to be a member of one of the VM families ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)), which are the families provisioned with the P40 or V100 GPUs supported by RAPIDS. In this particular case a Standard\_NC24s\_V2 was used. 
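As a quick local sanity check of that family constraint, the rule can be sketched in plain Python. This is a minimal sketch, assuming a simple SKU-naming heuristic; the helper name and the parsing rule below are illustrative and are not part of the Azure ML SDK:

```python
import re

# VM families listed above as RAPIDS-capable: NC_v2, NC_v3, ND and ND_v2
# (the families provisioned with the GPUs RAPIDS supports).
RAPIDS_FAMILIES = {("NC", 2), ("NC", 3), ("ND", 1), ("ND", 2)}

def is_rapids_capable(vm_size):
    """Heuristically check whether an Azure vm_size belongs to a RAPIDS-capable family."""
    # e.g. "Standard_NC24s_v2" -> series "NC", version 2
    m = re.fullmatch(r"Standard_([A-Z]+)\d+r?s?(?:_v(\d+))?", vm_size)
    if not m:
        return False
    return (m.group(1), int(m.group(2) or 1)) in RAPIDS_FAMILIES
```

Running a check like this before provisioning fails fast locally instead of waiting for the service-side error described below.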
- -  -If the output of running the step shows an error of the form: - -![](imgs/targeterror1.png) - -It indicates that even though the subscription has a VM node quota for that family, it does not have a Machine Learning Services node quota for that family. -You will need to request a node quota increase for that family in that region for **Machine Learning Services**. - -  -Another possible error is the following: - -![](imgs/targeterror2.png) - -This indicates that the specified vmSize has not been whitelisted for usage with Machine Learning Services, and a request to whitelist it should be filed. - -The successful creation of the compute target produces an output like the following: - -![](imgs/targetsuccess.png) -  -#### RAPIDS script uploading and viewing -The next step copies the RAPIDS script process_data.py, which is a slightly modified implementation of the [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb), into a script processing folder and presents its contents to the user. (The script is discussed in detail in the next section.) -If the user wants to use a different RAPIDS script, the references to the process_data.py script have to be changed. - -![](imgs/scriptuploading.png) -  -#### Data Uploading -The RAPIDS script loads and extracts features from Fannie Mae's Mortgage Dataset to train an XGBoost prediction model. The script uses two years of data. - -The next few steps download and decompress the data and make it available to the script as an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data). 
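In outline, the download-and-decompress logic works as in the sketch below. This is a simplified sketch: the function names mirror the ones in the screenshots that follow, but the progress bar and md5 check of the real helpers are omitted, and the URL constant is taken from the text further down:

```python
import os
import tarfile
from urllib.request import urlretrieve

MORTGAGE_URL = ("http://rapidsai-data.s3-website.us-east-2.amazonaws.com/"
                "notebook-mortgage-data/mortgage_2000-2001.tgz")

def download_file(url, filename):
    """Download url to filename, skipping the download if the file already exists."""
    if not os.path.exists(filename):
        urlretrieve(url, filename)
    return filename

def decompress_file(filename, path):
    """Extract a .tgz archive into path and return the number of entries extracted."""
    with tarfile.open(filename) as tar:
        members = tar.getmembers()
        tar.extractall(path=path)
    return len(members)
```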
- -  -The following functions are used to download and decompress the input data: - - -![](imgs/dcf1.png) -![](imgs/dcf2.png) -![](imgs/dcf3.png) -![](imgs/dcf4.png) - -  -The next step uses those functions to download the file -http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/mortgage_2000-2001.tgz -locally and decompress it into the local folder path = .\mortgage_2000-2001. -The step takes several minutes; the intermediate outputs provide progress indicators. - -![](imgs/downamddecom.png) - -  -The decompressed data should have the following structure: -* .\mortgage_2000-2001\acq\Acquisition_Q.txt -* .\mortgage_2000-2001\perf\Performance_Q.txt -* .\mortgage_2000-2001\names.csv - -The data is divided into partitions that roughly correspond to yearly quarters. RAPIDS includes support for multi-node, multi-GPU deployments, enabling scaling up and out on much larger dataset sizes. The user will be able to verify that the number of partitions the script is able to process increases with the number of GPUs used. The RAPIDS script is implemented for single-machine scenarios. An example supporting multiple nodes will be published later. - -  -The next step uploads the data into the [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) under the reference fileroot = mortgage_2000-2001. - -The step takes several minutes to load the data; the output provides a progress indicator. - -![](imgs/datastore.png) - -Once the data has been loaded into the Azure Machine Learning Datastore, in subsequent runs the user can comment out the ds.upload line and just use the mortgage_2000-2001 datastore reference. - -  -#### Setting up required libraries and environment to run RAPIDS code -There are two options to set up the environment to run RAPIDS code. The following steps show how to use a prebuilt conda environment. 
A recommended alternative is to specify a base Docker image and package dependencies. You can find sample code for that in the notebook. - -![](imgs/install2.png) - -  -#### Wrapper function to submit the RAPIDS script as an Azure Machine Learning experiment - -The next step defines a wrapper function to be used when the user runs the RAPIDS script with different arguments. It takes as arguments: *cpu\_training*, a flag that indicates if the run is meant to be processed with CPUs only; *gpu\_count*, the number of GPUs to be used; and *part_count*, the number of data partitions to be used. - -![](imgs/wrapper.png) - -  -The core of the function is configuring the run by instantiating a ScriptRunConfig object, which defines the source_directory for the script to be executed, the name of the script, and the arguments to be passed to the script. -In addition to the wrapper function arguments, two other arguments are passed: *data\_dir*, the directory where the data is stored, and *end_year*, the latest year to use partitions from. - - -As mentioned earlier, the size of the data that can be processed increases with the number of GPUs. In the function, the dictionary *max\_gpu\_count\_data\_partition_mapping* maps each GPU count to the maximum number of partitions that we empirically found the system can handle. The function throws a warning when the number of partitions for a given number of GPUs exceeds that maximum; the script is still executed, but the user should expect an error, as an out-of-memory situation will be encountered. -If the user wants to use a different RAPIDS script, the reference to the process_data.py script has to be changed. - -  -#### Submitting Experiments -We are ready to submit experiments: launching the RAPIDS script with different sets of parameters. - -  -The following couple of steps submit experiments under different conditions. 
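The argument-handling core of that wrapper can be sketched as plain Python. This is a sketch under stated assumptions: the mapping values and the command-line flag names below are illustrative guesses, not the empirically measured limits or the exact flags used by process_data.py:

```python
import warnings

# Illustrative values only: the notebook determines these limits empirically.
max_gpu_count_data_partition_mapping = {1: 3, 2: 5, 3: 7, 4: 8}

def build_script_arguments(cpu_training, gpu_count, part_count,
                           data_dir="mortgage_2000-2001", end_year=2001):
    """Build the argument list handed to ScriptRunConfig for process_data.py."""
    if not cpu_training:
        limit = max_gpu_count_data_partition_mapping.get(gpu_count, 0)
        if part_count > limit:
            # Warn, but still build the arguments: the run is submitted anyway
            # and is expected to fail with an out-of-memory error.
            warnings.warn("%d partitions exceeds the empirical maximum of %d "
                          "for %d GPU(s); expect an out-of-memory failure."
                          % (part_count, limit, gpu_count))
    return ["--cpu_training", str(cpu_training),
            "--gpu_count", str(gpu_count),
            "--part_count", str(part_count),
            "--data_dir", data_dir,
            "--end_year", str(end_year)]
```

The wrapper would then pass this list as the arguments parameter of the ScriptRunConfig it instantiates and submit the resulting configuration as an experiment run.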
- -![](imgs/submission1.png) - -  -The user can change the variable num\_gpu between 1 and the number of GPUs supported by the chosen vmSize. The variable part\_count can take any value between 1 and 11, but if it exceeds the maximum for num_gpu, the run will result in an error. - -  -If the experiment is successfully submitted, it is placed in a queue for processing; its status will appear as Queued and an output like the following will appear: - -![](imgs/queue.png) - -  -When the experiment starts running, its status will appear as Running and the output will change to something like this: - -![](imgs/running.png) - -  -#### Reproducing the performance gains plot results on the Blog Post -When the run has finished successfully, its status will appear as Completed and the output will change to something like this: - -  -![](imgs/completed.png) - -This is the output for an experiment run with three partitions and one GPU; notice that the reported processing time is 49.16 seconds, just as depicted on the performance gains plot in the blog post. - -  - -![](imgs/2GPUs.png) - - -This output corresponds to a run with three partitions and two GPUs; notice that the reported processing time is 37.50 seconds, just as depicted on the performance gains plot in the blog post. - -  -![](imgs/3GPUs.png) - -This output corresponds to an experiment run with three partitions and three GPUs; notice that the reported processing time is 24.40 seconds, just as depicted on the performance gains plot in the blog post. - -  -![](imgs/4gpus.png) - -This output corresponds to an experiment run with three partitions and four GPUs; notice that the reported processing time is 23.33 seconds, just as depicted on the performance gains plot in the blog post. - -  -![](imgs/CPUBase.png) - -This output corresponds to an experiment run with three partitions and using only CPU; notice that the reported processing time is 9 minutes and 1.21 seconds, or 541.21 seconds, just as depicted on the 
performance gains plot in the blog post. - -  -![](imgs/OOM.png) - -This output corresponds to an experiment run with nine partitions and four GPUs; notice that the notebook throws a warning signaling that the number of partitions exceeds the maximum the system can handle with that many GPUs, and the run ends up failing, hence having a status of Failed. - -  -##### Freeing Resources -In the last step the notebook deletes the compute target. (This step is optional, especially if min_nodes in the cluster is set to 0, in which case the cluster scales down to 0 nodes when there is no usage.) - -![](imgs/clusterdelete.png) - -  -### RAPIDS Script -The Master Notebook runs experiments by launching a RAPIDS script with different sets of parameters. In this section, the RAPIDS script, process_data.py in the material, is analyzed. - -The script first imports all the necessary libraries and parses the arguments passed by the Master Notebook. - -Then all the internal functions to be used by the script are defined. - -  -#### Wrapper Auxiliary Functions: -The functions below are wrappers for a configuration module for librmm, the RAPIDS Memory Manager Python interface: - -![](imgs/wap1.png)![](imgs/wap2.png) - -  -A couple of other functions are wrappers for the submission of jobs to the Dask client: - -![](imgs/wap3.png) -![](imgs/wap4.png) - -  -#### Data Loading Functions: -The data is loaded through the use of the following three functions: - -![](imgs/DLF1.png)![](imgs/DLF2.png)![](imgs/DLF3.png) - -All three functions use the library function cudf.read_csv(), the cuDF version of its well-known Pandas counterpart. - -  -#### Data Transformation and Feature Extraction Functions: -The raw data is transformed and processed to extract features by joining, slicing, grouping, aggregating, and factoring the original dataframes, just as is done with Pandas. 
The following functions in the script are used for that purpose: -![](imgs/fef1.png)![](imgs/fef2.png)![](imgs/fef3.png)![](imgs/fef4.png)![](imgs/fef5.png) - -![](imgs/fef6.png)![](imgs/fef7.png)![](imgs/fef8.png)![](imgs/fef9.png) - -  -#### Main() Function -The previous functions are used in the main() function to accomplish several steps: set up the Dask client, perform all the ETL operations, and set up and train an XGBoost model. The function also assigns the data to be processed by each Dask worker. - -  -##### Setting Up the Dask client: -The following lines: - -![](imgs/daskini.png) - -  -initialize and set up a Dask client with a number of workers corresponding to the number of GPUs to be used in the run. A successful execution of the setup results in the following output: - -![](imgs/daskoutput.png) - -##### All ETL functions are used in single calls to process\_quarter_gpu, one per data partition - -![](imgs/ETL.png) - -  -##### Concatenating the data assigned to each Dask worker -The partitions assigned to each worker are concatenated and set up for training. - -![](imgs/Dask2.png) - -  -##### Setting Training Parameters -The parameters used for training a gradient boosted decision tree model are set up in the following code block: -![](imgs/PArameters.png) - -Notice how the parameters are modified when using the CPU-only mode. - -  -##### Launching the training of a gradient boosted decision tree model using XGBoost. 
- -![](imgs/training.png) - -The outputs of the script can be observed in the master notebook as the script is executed - -![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/RAPIDS/README.png) - - - - - - - - - - - - - - - - - diff --git a/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb b/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb index f69e5080..97fecf56 100644 --- a/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb +++ b/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb @@ -1,554 +1,559 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids/azure-ml-with-nvidia-rapids.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# NVIDIA RAPIDS in Azure Machine Learning" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. 
This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model in Azure.\n", - " \n", - "In this notebook, we will do the following:\n", - " \n", - "* Create an Azure Machine Learning Workspace\n", - "* Create an AMLCompute target\n", - "* Use a script to process our data and train a model\n", - "* Obtain the data required to run this sample\n", - "* Create an AML run configuration to launch a machine learning job\n", - "* Run the script to prepare data for training and train the model\n", - " \n", - "Prerequisites:\n", - "* An Azure subscription to create a Machine Learning Workspace\n", - "* Familiarity with the Azure ML SDK (refer to [notebook samples](https://github.com/Azure/MachineLearningNotebooks))\n", - "* A Jupyter notebook environment with Azure Machine Learning SDK installed. Refer to instructions to [setup the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Verify if Azure ML SDK is installed" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "from azureml.core import Workspace, Experiment\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "from azureml.core.compute import AmlCompute, ComputeTarget\n", - "from azureml.data.data_reference import DataReference\n", - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core import ScriptRunConfig\n", - "from azureml.widgets import RunDetails" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create Azure ML Workspace" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following 
step is optional if you already have a workspace. If you want to use an existing workspace, then\n", - "skip this workspace creation step and move on to the next step to load the workspace.\n", - " \n", - "Important: in the code cell below, be sure to set the correct values for the subscription_id, \n", - "resource_group, workspace_name, region before executing this code cell." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"\")\n", - "resource_group = os.environ.get(\"RESOURCE_GROUP\", \"\")\n", - "workspace_name = os.environ.get(\"WORKSPACE_NAME\", \"\")\n", - "workspace_region = os.environ.get(\"WORKSPACE_REGION\", \"\")\n", - "\n", - "ws = Workspace.create(workspace_name, subscription_id=subscription_id, resource_group=resource_group, location=workspace_region)\n", - "\n", - "# write config to a local directory for future use\n", - "ws.write_config()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Load existing Workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "# if a locally-saved configuration file for the workspace is not available, use the following to load workspace\n", - "# ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)\n", - "\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')\n", - "\n", - "scripts_folder = \"scripts_folder\"\n", - "\n", - "if not os.path.isdir(scripts_folder):\n", - " os.mkdir(scripts_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create AML Compute Target" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - 
"source": [ - "Because NVIDIA RAPIDS requires P40 or V100 GPUs, the user needs to specify compute targets from one of [NC_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview) virtual machine types in Azure; these are the families of virtual machines in Azure that are provisioned with these GPUs.\n", - " \n", - "Pick one of the supported VM SKUs based on the number of GPUs you want to use for ETL and training in RAPIDS.\n", - " \n", - "The script in this notebook is implemented for single-machine scenarios. An example supporting multiple nodes will be published later." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "gpu_cluster_name = \"gpucluster\"\n", - "\n", - "if gpu_cluster_name in ws.compute_targets:\n", - " gpu_cluster = ws.compute_targets[gpu_cluster_name]\n", - " if gpu_cluster and type(gpu_cluster) is AmlCompute:\n", - " print('Found compute target. 
Will use {0} '.format(gpu_cluster_name))\n", - "else:\n", - " print(\"creating new cluster\")\n", - " # vm_size parameter below could be modified to one of the RAPIDS-supported VM types\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"Standard_NC6s_v2\", min_nodes=1, max_nodes = 1)\n", - "\n", - " # create the cluster\n", - " gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n", - " gpu_cluster.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Script to process data and train model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The _process_data.py_ script used in the step below is a slightly modified implementation of [RAPIDS Mortgage E2E example](https://github.com/rapidsai/notebooks-contrib/blob/master/intermediate_notebooks/E2E/mortgage/mortgage_e2e.ipynb)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# copy process_data.py into the script folder\n", - "import shutil\n", - "shutil.copy('./process_data.py', os.path.join(scripts_folder, 'process_data.py'))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Data required to run this sample" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This sample uses [Fannie Mae's Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Once you obtain access to the data, you will need to make this data available in an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data), for use in this sample. The following code shows how to do that." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Downloading Data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import tarfile\n", - "import hashlib\n", - "from urllib.request import urlretrieve\n", - "\n", - "def validate_downloaded_data(path):\n", - " if(os.path.isdir(path) and os.path.exists(path + '//names.csv')) :\n", - " if(os.path.isdir(path + '//acq' ) and len(os.listdir(path + '//acq')) == 8):\n", - " if(os.path.isdir(path + '//perf' ) and len(os.listdir(path + '//perf')) == 11):\n", - " print(\"Data has been downloaded and decompressed at: {0}\".format(path))\n", - " return True\n", - " print(\"Data has not been downloaded and decompressed\")\n", - " return False\n", - "\n", - "def show_progress(count, block_size, total_size):\n", - " global pbar\n", - " global processed\n", - " \n", - " if count == 0:\n", - " pbar = ProgressBar(maxval=total_size)\n", - " processed = 0\n", - " \n", - " processed += block_size\n", - " processed = min(processed,total_size)\n", - " pbar.update(processed)\n", - "\n", - " \n", - "def download_file(fileroot):\n", - " filename = fileroot + '.tgz'\n", - " if(not os.path.exists(filename) or hashlib.md5(open(filename, 'rb').read()).hexdigest() != '82dd47135053303e9526c2d5c43befd5' ):\n", - " url_format = 'http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/{0}.tgz'\n", - " url = url_format.format(fileroot)\n", - " print(\"...Downloading file :{0}\".format(filename))\n", - " urlretrieve(url, filename)\n", - " pbar.finish()\n", - " print(\"...File :{0} finished downloading\".format(filename))\n", - " else:\n", - " print(\"...File :{0} has been downloaded already\".format(filename))\n", - " return filename\n", - "\n", - "def decompress_file(filename,path):\n", - " tar = tarfile.open(filename)\n", - " print(\"...Getting information from {0} about files to decompress\".format(filename))\n", - " members = 
tar.getmembers()\n", - " numFiles = len(members)\n", - " so_far = 0\n", - " for member_info in members:\n", - " tar.extract(member_info,path=path)\n", - " so_far += 1\n", - " print(\"...All {0} files have been decompressed\".format(numFiles))\n", - " tar.close()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "fileroot = 'mortgage_2000-2001'\n", - "path = '.\\\\{0}'.format(fileroot)\n", - "pbar = None\n", - "processed = 0\n", - "\n", - "if(not validate_downloaded_data(path)):\n", - " print(\"Downloading and Decompressing Input Data\")\n", - " filename = download_file(fileroot)\n", - " decompress_file(filename,path)\n", - " print(\"Input Data has been Downloaded and Decompressed\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Uploading Data to Workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "\n", - "# download and uncompress data in a local directory before uploading to data store\n", - "# directory specified in src_dir parameter below should have the acq, perf directories with data and names.csv file\n", - "\n", - "# ---->>>> UNCOMMENT THE BELOW LINE TO UPLOAD YOUR DATA IF NOT DONE SO ALREADY <<<<----\n", - "# ds.upload(src_dir=path, target_path=fileroot, overwrite=True, show_progress=True)\n", - "\n", - "# data already uploaded to the datastore\n", - "data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore=fileroot)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create AML run configuration to launch a machine learning job" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "RunConfiguration is used to submit jobs to Azure Machine Learning service. When creating RunConfiguration for a job, users can either \n", - "1. 
specify a Docker image with prebuilt conda environment and use it without any modifications to run the job, or \n", - "2. specify a Docker image as the base image and conda or pip packages as dependnecies to let AML build a new Docker image with a conda environment containing specified dependencies to use in the job\n", - "\n", - "The second option is the recommended option in AML. \n", - "The following steps have code for both options. You can pick the one that is more appropriate for your requirements. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Specify prebuilt conda environment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following code shows how to install RAPIDS using conda. The `rapids.yml` file contains the list of packages necessary to run this tutorial. **NOTE:** Initial build of the image might take up to 20 minutes as the service needs to build and cache the new image; once the image is built the subequent runs use the cached image and the overhead is minimal." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n", - "run_config = RunConfiguration(conda_dependencies=cd)\n", - "run_config.framework = 'python'\n", - "run_config.target = gpu_cluster_name\n", - "run_config.environment.docker.enabled = True\n", - "run_config.environment.docker.gpu_support = True\n", - "run_config.environment.docker.base_image = \"mcr.microsoft.com/azureml/base-gpu:intelmpi2018.3-cuda10.0-cudnn7-ubuntu16.04\"\n", - "run_config.environment.spark.precache_packages = False\n", - "run_config.data_references={'data':data_ref.to_config()}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Using Docker" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can specify RAPIDS Docker image." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# run_config = RunConfiguration()\n", - "# run_config.framework = 'python'\n", - "# run_config.environment.python.user_managed_dependencies = True\n", - "# run_config.environment.python.interpreter_path = '/conda/envs/rapids/bin/python'\n", - "# run_config.target = gpu_cluster_name\n", - "# run_config.environment.docker.enabled = True\n", - "# run_config.environment.docker.gpu_support = True\n", - "# run_config.environment.docker.base_image = \"rapidsai/rapidsai:cuda9.2-runtime-ubuntu18.04\"\n", - "# # run_config.environment.docker.base_image_registry.address = '' # not required if the base_image is in Docker hub\n", - "# # run_config.environment.docker.base_image_registry.username = '' # needed only for private images\n", - "# # run_config.environment.docker.base_image_registry.password = '' # needed only for private images\n", - "# run_config.environment.spark.precache_packages = False\n", - "# run_config.data_references={'data':data_ref.to_config()}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Wrapper function to submit Azure Machine Learning experiment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# parameter cpu_predictor indicates if training should be done on CPU. 
If set to true, GPUs are used *only* for ETL and *not* for training\n", - "# parameter num_gpu indicates number of GPUs to use among the GPUs available in the VM for ETL and if cpu_predictor is false, for training as well \n", - "def run_rapids_experiment(cpu_training, gpu_count, part_count):\n", - " # any value between 1-4 is allowed here depending the type of VMs available in gpu_cluster\n", - " if gpu_count not in [1, 2, 3, 4]:\n", - " raise Exception('Value specified for the number of GPUs to use {0} is invalid'.format(gpu_count))\n", - "\n", - " # following data partition mapping is empirical (specific to GPUs used and current data partitioning scheme) and may need to be tweaked\n", - " max_gpu_count_data_partition_mapping = {1: 3, 2: 4, 3: 6, 4: 8}\n", - " \n", - " if part_count > max_gpu_count_data_partition_mapping[gpu_count]:\n", - " print(\"Too many partitions for the number of GPUs, exceeding memory threshold\")\n", - " \n", - " if part_count > 11:\n", - " print(\"Warning: Maximum number of partitions available is 11\")\n", - " part_count = 11\n", - " \n", - " end_year = 2000\n", - " \n", - " if part_count > 4:\n", - " end_year = 2001 # use more data with more GPUs\n", - "\n", - " src = ScriptRunConfig(source_directory=scripts_folder, \n", - " script='process_data.py', \n", - " arguments = ['--num_gpu', gpu_count, '--data_dir', str(data_ref),\n", - " '--part_count', part_count, '--end_year', end_year,\n", - " '--cpu_predictor', cpu_training\n", - " ],\n", - " run_config=run_config\n", - " )\n", - "\n", - " exp = Experiment(ws, 'rapidstest')\n", - " run = exp.submit(config=src)\n", - " RunDetails(run).show()\n", - " return run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit experiment (ETL & training on GPU)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cpu_predictor = False\n", - "# the value for num_gpu should be less than or equal to the number 
of GPUs available in the VM\n", - "num_gpu = 1\n", - "data_part_count = 1\n", - "# train using CPU, use GPU for both ETL and training\n", - "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit experiment (ETL on GPU, training on CPU)\n", - "\n", - "To observe performance difference between GPU-accelerated RAPIDS based training with CPU-only training, set 'cpu_predictor' predictor to 'True' and rerun the experiment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "cpu_predictor = True\n", - "# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n", - "num_gpu = 1\n", - "data_part_count = 1\n", - "# train using CPU, use GPU for ETL\n", - "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Delete cluster" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# delete the cluster\n", - "# gpu_cluster.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "ksivas" - } + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# NVIDIA RAPIDS in Azure Machine Learning" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. 
In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL and GPU-capable ML algorithms in RAPIDS, data preparation and model training can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train a model in Azure.\n", + " \n", + "In this notebook, we will do the following:\n", + " \n", + "* Create an Azure Machine Learning Workspace\n", + "* Create an AMLCompute target\n", + "* Use a script to process our data and train a model\n", + "* Obtain the data required to run this sample\n", + "* Create an AML run configuration to launch a machine learning job\n", + "* Run the script to prepare data for training and train the model\n", + " \n", + "Prerequisites:\n", + "* An Azure subscription to create a Machine Learning Workspace\n", + "* Familiarity with the Azure ML SDK (refer to [notebook samples](https://github.com/Azure/MachineLearningNotebooks))\n", + "* A Jupyter notebook environment with Azure Machine Learning SDK installed. 
Refer to instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Verify if Azure ML SDK is installed" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "from azureml.core import Workspace, Experiment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "from azureml.core.compute import AmlCompute, ComputeTarget\n", + "from azureml.data.data_reference import DataReference\n", + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core import ScriptRunConfig\n", + "from azureml.widgets import RunDetails" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Azure ML Workspace" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following step is optional if you already have a workspace. If you want to use an existing workspace, then\n", + "skip this workspace creation step and move on to the next step to load the workspace.\n", + " \n", + "Important: in the code cell below, be sure to set the correct values for subscription_id, \n", + "resource_group, workspace_name, and workspace_region before executing this code cell."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"\")\n", + "resource_group = os.environ.get(\"RESOURCE_GROUP\", \"\")\n", + "workspace_name = os.environ.get(\"WORKSPACE_NAME\", \"\")\n", + "workspace_region = os.environ.get(\"WORKSPACE_REGION\", \"\")\n", + "\n", + "ws = Workspace.create(workspace_name, subscription_id=subscription_id, resource_group=resource_group, location=workspace_region)\n", + "\n", + "# write config to a local directory for future use\n", + "ws.write_config()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Load existing Workspace" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "# if a locally-saved configuration file for the workspace is not available, use the following to load workspace\n", + "# ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)\n", + "print('Workspace name: ' + ws.name, \n", + " 'Azure region: ' + ws.location, \n", + " 'Subscription id: ' + ws.subscription_id, \n", + " 'Resource group: ' + ws.resource_group, sep = '\\n')\n", + "\n", + "scripts_folder = \"scripts_folder\"\n", + "\n", + "if not os.path.isdir(scripts_folder):\n", + " os.mkdir(scripts_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create AML Compute Target" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because NVIDIA RAPIDS requires P40 or V100 GPUs, the user needs to specify compute targets from one of [NC_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or 
[ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview) virtual machine types in Azure; these are the Azure virtual machine families provisioned with these GPUs.\n", + " \n", + "Pick one of the supported VM SKUs based on the number of GPUs you want to use for ETL and training in RAPIDS.\n", + " \n", + "The script in this notebook is implemented for single-machine scenarios. An example supporting multiple nodes will be published later." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "gpu_cluster_name = \"gpucluster\"\n", + "\n", + "if gpu_cluster_name in ws.compute_targets:\n", + " gpu_cluster = ws.compute_targets[gpu_cluster_name]\n", + " if gpu_cluster and type(gpu_cluster) is AmlCompute:\n", + " print('Found existing compute target, using it: ' + gpu_cluster_name)\n", + "else:\n", + " print(\"Creating new cluster\")\n", + " # vm_size parameter below could be modified to one of the RAPIDS-supported VM types\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"Standard_NC6s_v2\", min_nodes=1, max_nodes = 1)\n", + "\n", + " # create the cluster\n", + " gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n", + " gpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Script to process data and train model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The _process_data.py_ script used in the step below is a slightly modified implementation of the [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb)."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# copy process_data.py into the script folder\n", + "import shutil\n", + "shutil.copy('./process_data.py', os.path.join(scripts_folder, 'process_data.py'))\n", + "\n", + "with open(os.path.join(scripts_folder, 'process_data.py'), 'r') as process_data_script:\n", + " print(process_data_script.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data required to run this sample" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample uses [Fannie Mae's Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Once you obtain access to the data, you will need to make it available in an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) for use in this sample. The following code shows how to do that." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Downloading Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Important: the Python package progressbar2 is required to run the following cell. If it is not available in the environment where this notebook is running, please install it."
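The `download_file` helper in the next cell verifies the archive by reading the entire file into memory before hashing it. For a multi-gigabyte tarball, hashing in fixed-size chunks keeps memory use flat. A minimal standalone sketch (the 64 KiB chunk size is an arbitrary choice, not from the notebook):

```python
import hashlib

def file_md5(path, chunk_size=64 * 1024):
    """Compute a file's MD5 digest without loading the whole file into memory."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a b'' sentinel yields chunks until read() returns empty
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

`file_md5(filename)` could replace the `hashlib.md5(open(filename, 'rb').read()).hexdigest()` expression in `download_file` with identical results.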
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import tarfile\n", + "import hashlib\n", + "from urllib.request import urlretrieve\n", + "from progressbar import ProgressBar\n", + "\n", + "def validate_downloaded_data(path):\n", + " if(os.path.isdir(path) and os.path.exists(path + '//names.csv')) :\n", + " if(os.path.isdir(path + '//acq' ) and len(os.listdir(path + '//acq')) == 8):\n", + " if(os.path.isdir(path + '//perf' ) and len(os.listdir(path + '//perf')) == 11):\n", + " print(\"Data has been downloaded and decompressed at: {0}\".format(path))\n", + " return True\n", + " print(\"Data has not been downloaded and decompressed\")\n", + " return False\n", + "\n", + "def show_progress(count, block_size, total_size):\n", + " global pbar\n", + " global processed\n", + " \n", + " if count == 0:\n", + " pbar = ProgressBar(maxval=total_size)\n", + " processed = 0\n", + " \n", + " processed += block_size\n", + " processed = min(processed,total_size)\n", + " pbar.update(processed)\n", + "\n", + " \n", + "def download_file(fileroot):\n", + " filename = fileroot + '.tgz'\n", + " if(not os.path.exists(filename) or hashlib.md5(open(filename, 'rb').read()).hexdigest() != '82dd47135053303e9526c2d5c43befd5' ):\n", + " url_format = 'http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/{0}.tgz'\n", + " url = url_format.format(fileroot)\n", + " print(\"...Downloading file :{0}\".format(filename))\n", + " urlretrieve(url, filename,show_progress)\n", + " pbar.finish()\n", + " print(\"...File :{0} finished downloading\".format(filename))\n", + " else:\n", + " print(\"...File :{0} has been downloaded already\".format(filename))\n", + " return filename\n", + "\n", + "def decompress_file(filename,path):\n", + " tar = tarfile.open(filename)\n", + " print(\"...Getting information from {0} about files to decompress\".format(filename))\n", + " members = tar.getmembers()\n", + " numFiles = 
len(members)\n", + " so_far = 0\n", + " for member_info in members:\n", + " tar.extract(member_info,path=path)\n", + " show_progress(so_far, 1, numFiles)\n", + " so_far += 1\n", + " pbar.finish()\n", + " print(\"...All {0} files have been decompressed\".format(numFiles))\n", + " tar.close()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fileroot = 'mortgage_2000-2001'\n", + "path = '.\\\\{0}'.format(fileroot)\n", + "pbar = None\n", + "processed = 0\n", + "\n", + "if(not validate_downloaded_data(path)):\n", + " print(\"Downloading and Decompressing Input Data\")\n", + " filename = download_file(fileroot)\n", + " decompress_file(filename,path)\n", + " print(\"Input Data has been Downloaded and Decompressed\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Uploading Data to Workspace" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ds = ws.get_default_datastore()\n", + "\n", + "# download and uncompress data in a local directory before uploading to data store\n", + "# directory specified in src_dir parameter below should have the acq, perf directories with data and names.csv file\n", + "ds.upload(src_dir=path, target_path=fileroot, overwrite=True, show_progress=True)\n", + "\n", + "# data already uploaded to the datastore\n", + "data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore=fileroot)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create AML run configuration to launch a machine learning job" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "RunConfiguration is used to submit jobs to Azure Machine Learning service. When creating RunConfiguration for a job, users can either \n", + "1. specify a Docker image with prebuilt conda environment and use it without any modifications to run the job, or \n", + "2. 
specify a Docker image as the base image and conda or pip packages as dependencies to let AML build a new Docker image with a conda environment containing the specified dependencies to use in the job\n", + "\n", + "The second option is the recommended option in AML. \n", + "The following steps have code for both options. You can pick the one that is more appropriate for your requirements. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Specify prebuilt conda environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following code shows how to use an existing image from [Docker Hub](https://hub.docker.com/r/rapidsai/rapidsai/) that has a prebuilt conda environment named 'rapids' when creating a RunConfiguration. Note that this conda environment does not include the azureml-defaults package, which is required for AML functionality such as metrics tracking and model management. This package is automatically installed when you use the 'Specify package dependencies' option, which is why that is the recommended way to create a RunConfiguration in AML."
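The second option below references a conda environment definition file, `rapids.yml`, that lists the packages AML should install into the image it builds. The repository's actual `rapids.yml` is not shown here; purely as an illustration, such a file typically looks like the following (channel names, package choices, and versions below are assumptions, not taken from this notebook):

```yaml
# illustrative sketch only - the real rapids.yml in the repo may differ
name: rapids
channels:
  - rapidsai
  - nvidia
  - conda-forge
dependencies:
  - python=3.6
  - cudf       # RAPIDS GPU DataFrame library
  - cuml       # RAPIDS GPU ML algorithms
  - pip:
      - azureml-defaults   # enables AML metrics tracking, model management
```

Listing `azureml-defaults` under `pip` is what gives the AML-built image the run-tracking functionality that the prebuilt Docker Hub image lacks.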
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run_config = RunConfiguration()\n", + "run_config.framework = 'python'\n", + "run_config.environment.python.user_managed_dependencies = True\n", + "run_config.environment.python.interpreter_path = '/conda/envs/rapids/bin/python'\n", + "run_config.target = gpu_cluster_name\n", + "run_config.environment.docker.enabled = True\n", + "run_config.environment.docker.gpu_support = True\n", + "run_config.environment.docker.base_image = \"rapidsai/rapidsai:cuda9.2-runtime-ubuntu18.04\"\n", + "# run_config.environment.docker.base_image_registry.address = '' # not required if the base_image is in Docker hub\n", + "# run_config.environment.docker.base_image_registry.username = '' # needed only for private images\n", + "# run_config.environment.docker.base_image_registry.password = '' # needed only for private images\n", + "run_config.environment.spark.precache_packages = False\n", + "run_config.data_references={'data':data_ref.to_config()}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Specify package dependencies" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The following code shows how to list package dependencies in a conda environment definition file (rapids.yml) when creating a RunConfiguration" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n", + "# run_config = RunConfiguration(conda_dependencies=cd)\n", + "# run_config.framework = 'python'\n", + "# run_config.target = gpu_cluster_name\n", + "# run_config.environment.docker.enabled = True\n", + "# run_config.environment.docker.gpu_support = True\n", + "# run_config.environment.docker.base_image = \"\"\n", + "# run_config.environment.docker.base_image_registry.address = '' # not required if the base_image is in 
Docker hub\n", + "# run_config.environment.docker.base_image_registry.username = '' # needed only for private images\n", + "# run_config.environment.docker.base_image_registry.password = '' # needed only for private images\n", + "# run_config.environment.spark.precache_packages = False\n", + "# run_config.data_references={'data':data_ref.to_config()}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Wrapper function to submit Azure Machine Learning experiment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# parameter cpu_training indicates if training should be done on CPU. If set to true, GPUs are used *only* for ETL and *not* for training\n", + "# parameter gpu_count indicates the number of GPUs to use among the GPUs available in the VM for ETL and, if cpu_training is false, for training as well \n", + "def run_rapids_experiment(cpu_training, gpu_count, part_count):\n", + " # any value between 1-4 is allowed here depending on the type of VMs available in gpu_cluster\n", + " if gpu_count not in [1, 2, 3, 4]:\n", + " raise Exception('Value specified for the number of GPUs to use {0} is invalid'.format(gpu_count))\n", + "\n", + " # following data partition mapping is empirical (specific to GPUs used and current data partitioning scheme) and may need to be tweaked\n", + " max_gpu_count_data_partition_mapping = {1: 3, 2: 4, 3: 6, 4: 8}\n", + " \n", + " if part_count > max_gpu_count_data_partition_mapping[gpu_count]:\n", + " print(\"Too many partitions for the number of GPUs, exceeding memory threshold\")\n", + " \n", + " if part_count > 11:\n", + " print(\"Warning: Maximum number of partitions available is 11\")\n", + " part_count = 11\n", + " \n", + " end_year = 2000\n", + " \n", + " if part_count > 4:\n", + " end_year = 2001 # use more data with more GPUs\n", + "\n", + " src = ScriptRunConfig(source_directory=scripts_folder, \n", + " script='process_data.py', \n", + " 
arguments = ['--num_gpu', gpu_count, '--data_dir', str(data_ref),\n", + " '--part_count', part_count, '--end_year', end_year,\n", + " '--cpu_predictor', cpu_training\n", + " ],\n", + " run_config=run_config\n", + " )\n", + "\n", + " exp = Experiment(ws, 'rapidstest')\n", + " run = exp.submit(config=src)\n", + " RunDetails(run).show()\n", + " return run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit experiment (ETL & training on GPU)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cpu_predictor = False\n", + "# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n", + "num_gpu = 1\n", + "data_part_count = 1\n", + "# use GPU for both ETL and training\n", + "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit experiment (ETL on GPU, training on CPU)\n", + "\n", + "To observe the performance difference between GPU-accelerated RAPIDS-based training and CPU-only training, set 'cpu_predictor' to 'True' and rerun the experiment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "cpu_predictor = True\n", + "# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n", + "num_gpu = 1\n", + "data_part_count = 1\n", + "# train using CPU, use GPU for ETL\n", + "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Delete cluster" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# delete the cluster\n", + "# gpu_cluster.delete()" + ] + } ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" + "metadata": { 
+ "authors": [ + { + "name": "ksivas" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/contrib/RAPIDS/imgs/2GPUs.png b/contrib/RAPIDS/imgs/2GPUs.png deleted file mode 100644 index 07e38374..00000000 Binary files a/contrib/RAPIDS/imgs/2GPUs.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/3GPUs.png b/contrib/RAPIDS/imgs/3GPUs.png deleted file mode 100644 index 80e44c4e..00000000 Binary files a/contrib/RAPIDS/imgs/3GPUs.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/4gpus.png b/contrib/RAPIDS/imgs/4gpus.png deleted file mode 100644 index 28411cdd..00000000 Binary files a/contrib/RAPIDS/imgs/4gpus.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/CPUBase.png b/contrib/RAPIDS/imgs/CPUBase.png deleted file mode 100644 index f84869de..00000000 Binary files a/contrib/RAPIDS/imgs/CPUBase.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/DLF1.png b/contrib/RAPIDS/imgs/DLF1.png deleted file mode 100644 index 673454fe..00000000 Binary files a/contrib/RAPIDS/imgs/DLF1.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/DLF2.png b/contrib/RAPIDS/imgs/DLF2.png deleted file mode 100644 index ea45be22..00000000 Binary files a/contrib/RAPIDS/imgs/DLF2.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/DLF3.png b/contrib/RAPIDS/imgs/DLF3.png 
deleted file mode 100644 index 2cf0ab9d..00000000 Binary files a/contrib/RAPIDS/imgs/DLF3.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/Dask2.png b/contrib/RAPIDS/imgs/Dask2.png deleted file mode 100644 index 2a4c9248..00000000 Binary files a/contrib/RAPIDS/imgs/Dask2.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/ETL.png b/contrib/RAPIDS/imgs/ETL.png deleted file mode 100644 index 2b8001d1..00000000 Binary files a/contrib/RAPIDS/imgs/ETL.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/NotebookHome.png b/contrib/RAPIDS/imgs/NotebookHome.png deleted file mode 100644 index 16b45760..00000000 Binary files a/contrib/RAPIDS/imgs/NotebookHome.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/OOM.png b/contrib/RAPIDS/imgs/OOM.png deleted file mode 100644 index 0121f1b0..00000000 Binary files a/contrib/RAPIDS/imgs/OOM.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/PArameters.png b/contrib/RAPIDS/imgs/PArameters.png deleted file mode 100644 index 6279164d..00000000 Binary files a/contrib/RAPIDS/imgs/PArameters.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/WorkSpaceSetUp.png b/contrib/RAPIDS/imgs/WorkSpaceSetUp.png deleted file mode 100644 index fb09d2f0..00000000 Binary files a/contrib/RAPIDS/imgs/WorkSpaceSetUp.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/clusterdelete.png b/contrib/RAPIDS/imgs/clusterdelete.png deleted file mode 100644 index 634b92d6..00000000 Binary files a/contrib/RAPIDS/imgs/clusterdelete.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/completed.png b/contrib/RAPIDS/imgs/completed.png deleted file mode 100644 index ddf04e20..00000000 Binary files a/contrib/RAPIDS/imgs/completed.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/daskini.png b/contrib/RAPIDS/imgs/daskini.png deleted file mode 100644 index f1cd700d..00000000 Binary files a/contrib/RAPIDS/imgs/daskini.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/daskoutput.png 
b/contrib/RAPIDS/imgs/daskoutput.png deleted file mode 100644 index b69d988d..00000000 Binary files a/contrib/RAPIDS/imgs/daskoutput.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/datastore.png b/contrib/RAPIDS/imgs/datastore.png deleted file mode 100644 index 0a5b3289..00000000 Binary files a/contrib/RAPIDS/imgs/datastore.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/dcf1.png b/contrib/RAPIDS/imgs/dcf1.png deleted file mode 100644 index 173b2dc9..00000000 Binary files a/contrib/RAPIDS/imgs/dcf1.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/dcf2.png b/contrib/RAPIDS/imgs/dcf2.png deleted file mode 100644 index 4c890759..00000000 Binary files a/contrib/RAPIDS/imgs/dcf2.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/dcf3.png b/contrib/RAPIDS/imgs/dcf3.png deleted file mode 100644 index 58ba3be4..00000000 Binary files a/contrib/RAPIDS/imgs/dcf3.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/dcf4.png b/contrib/RAPIDS/imgs/dcf4.png deleted file mode 100644 index 086815f1..00000000 Binary files a/contrib/RAPIDS/imgs/dcf4.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/downamddecom.png b/contrib/RAPIDS/imgs/downamddecom.png deleted file mode 100644 index f02b5b89..00000000 Binary files a/contrib/RAPIDS/imgs/downamddecom.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef1.png b/contrib/RAPIDS/imgs/fef1.png deleted file mode 100644 index e15ee2d3..00000000 Binary files a/contrib/RAPIDS/imgs/fef1.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef2.png b/contrib/RAPIDS/imgs/fef2.png deleted file mode 100644 index dd5426ee..00000000 Binary files a/contrib/RAPIDS/imgs/fef2.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef3.png b/contrib/RAPIDS/imgs/fef3.png deleted file mode 100644 index 5fe4ecb2..00000000 Binary files a/contrib/RAPIDS/imgs/fef3.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef4.png b/contrib/RAPIDS/imgs/fef4.png deleted file mode 100644 index 
0883617e..00000000 Binary files a/contrib/RAPIDS/imgs/fef4.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef5.png b/contrib/RAPIDS/imgs/fef5.png deleted file mode 100644 index ec3e4428..00000000 Binary files a/contrib/RAPIDS/imgs/fef5.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef6.png b/contrib/RAPIDS/imgs/fef6.png deleted file mode 100644 index 295a86d5..00000000 Binary files a/contrib/RAPIDS/imgs/fef6.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef7.png b/contrib/RAPIDS/imgs/fef7.png deleted file mode 100644 index 1281df0b..00000000 Binary files a/contrib/RAPIDS/imgs/fef7.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef8.png b/contrib/RAPIDS/imgs/fef8.png deleted file mode 100644 index 49f096d5..00000000 Binary files a/contrib/RAPIDS/imgs/fef8.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/fef9.png b/contrib/RAPIDS/imgs/fef9.png deleted file mode 100644 index 8f5abbce..00000000 Binary files a/contrib/RAPIDS/imgs/fef9.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/install2.png b/contrib/RAPIDS/imgs/install2.png deleted file mode 100644 index 24f3d29c..00000000 Binary files a/contrib/RAPIDS/imgs/install2.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/installation.png b/contrib/RAPIDS/imgs/installation.png deleted file mode 100644 index 8b06c540..00000000 Binary files a/contrib/RAPIDS/imgs/installation.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/queue.png b/contrib/RAPIDS/imgs/queue.png deleted file mode 100644 index ab51a1e5..00000000 Binary files a/contrib/RAPIDS/imgs/queue.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/running.png b/contrib/RAPIDS/imgs/running.png deleted file mode 100644 index 13a327fe..00000000 Binary files a/contrib/RAPIDS/imgs/running.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/saved_workspace.png b/contrib/RAPIDS/imgs/saved_workspace.png deleted file mode 100644 index fdc1919f..00000000 Binary files 
a/contrib/RAPIDS/imgs/saved_workspace.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/scriptuploading.png b/contrib/RAPIDS/imgs/scriptuploading.png deleted file mode 100644 index d0726784..00000000 Binary files a/contrib/RAPIDS/imgs/scriptuploading.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/submission1.png b/contrib/RAPIDS/imgs/submission1.png deleted file mode 100644 index d07e0889..00000000 Binary files a/contrib/RAPIDS/imgs/submission1.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/target_creation.png b/contrib/RAPIDS/imgs/target_creation.png deleted file mode 100644 index b98d623a..00000000 Binary files a/contrib/RAPIDS/imgs/target_creation.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/targeterror1.png b/contrib/RAPIDS/imgs/targeterror1.png deleted file mode 100644 index d1c2884a..00000000 Binary files a/contrib/RAPIDS/imgs/targeterror1.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/targeterror2.png b/contrib/RAPIDS/imgs/targeterror2.png deleted file mode 100644 index 69a3d9b8..00000000 Binary files a/contrib/RAPIDS/imgs/targeterror2.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/targetsuccess.png b/contrib/RAPIDS/imgs/targetsuccess.png deleted file mode 100644 index 301ebefb..00000000 Binary files a/contrib/RAPIDS/imgs/targetsuccess.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/training.png b/contrib/RAPIDS/imgs/training.png deleted file mode 100644 index d047a9ce..00000000 Binary files a/contrib/RAPIDS/imgs/training.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/wap1.png b/contrib/RAPIDS/imgs/wap1.png deleted file mode 100644 index 1d336565..00000000 Binary files a/contrib/RAPIDS/imgs/wap1.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/wap2.png b/contrib/RAPIDS/imgs/wap2.png deleted file mode 100644 index 245458a5..00000000 Binary files a/contrib/RAPIDS/imgs/wap2.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/wap3.png 
b/contrib/RAPIDS/imgs/wap3.png deleted file mode 100644 index 8d5553da..00000000 Binary files a/contrib/RAPIDS/imgs/wap3.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/wap4.png b/contrib/RAPIDS/imgs/wap4.png deleted file mode 100644 index 56ce1a10..00000000 Binary files a/contrib/RAPIDS/imgs/wap4.png and /dev/null differ diff --git a/contrib/RAPIDS/imgs/wrapper.png b/contrib/RAPIDS/imgs/wrapper.png deleted file mode 100644 index 0f4ab763..00000000 Binary files a/contrib/RAPIDS/imgs/wrapper.png and /dev/null differ diff --git a/contrib/RAPIDS/process_data.py b/contrib/RAPIDS/process_data.py index be8d54de..474cc83a 100644 --- a/contrib/RAPIDS/process_data.py +++ b/contrib/RAPIDS/process_data.py @@ -15,6 +15,21 @@ from glob import glob import os import argparse +def initialize_rmm_pool(): + from librmm_cffi import librmm_config as rmm_cfg + + rmm_cfg.use_pool_allocator = True + #rmm_cfg.initial_pool_size = 2<<30 # set to 2GiB. Default is 1/2 total GPU memory + import cudf + return cudf._gdf.rmm_initialize() + +def initialize_rmm_no_pool(): + from librmm_cffi import librmm_config as rmm_cfg + + rmm_cfg.use_pool_allocator = False + import cudf + return cudf._gdf.rmm_initialize() + def run_dask_task(func, **kwargs): task = func(**kwargs) return task @@ -192,26 +207,26 @@ def gpu_load_names(col_path): def create_ever_features(gdf, **kwargs): everdf = gdf[['loan_id', 'current_loan_delinquency_status']] - everdf = everdf.groupby('loan_id', method='hash').max().reset_index() + everdf = everdf.groupby('loan_id', method='hash').max() del(gdf) - everdf['ever_30'] = (everdf['current_loan_delinquency_status'] >= 1).astype('int8') - everdf['ever_90'] = (everdf['current_loan_delinquency_status'] >= 3).astype('int8') - everdf['ever_180'] = (everdf['current_loan_delinquency_status'] >= 6).astype('int8') - everdf.drop_column('current_loan_delinquency_status') + everdf['ever_30'] = (everdf['max_current_loan_delinquency_status'] >= 1).astype('int8') + everdf['ever_90'] = 
(everdf['max_current_loan_delinquency_status'] >= 3).astype('int8') + everdf['ever_180'] = (everdf['max_current_loan_delinquency_status'] >= 6).astype('int8') + everdf.drop_column('max_current_loan_delinquency_status') return everdf def create_delinq_features(gdf, **kwargs): delinq_gdf = gdf[['loan_id', 'monthly_reporting_period', 'current_loan_delinquency_status']] del(gdf) - delinq_30 = delinq_gdf.query('current_loan_delinquency_status >= 1')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index() - delinq_30['delinquency_30'] = delinq_30['monthly_reporting_period'] - delinq_30.drop_column('monthly_reporting_period') - delinq_90 = delinq_gdf.query('current_loan_delinquency_status >= 3')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index() - delinq_90['delinquency_90'] = delinq_90['monthly_reporting_period'] - delinq_90.drop_column('monthly_reporting_period') - delinq_180 = delinq_gdf.query('current_loan_delinquency_status >= 6')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index() - delinq_180['delinquency_180'] = delinq_180['monthly_reporting_period'] - delinq_180.drop_column('monthly_reporting_period') + delinq_30 = delinq_gdf.query('current_loan_delinquency_status >= 1')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min() + delinq_30['delinquency_30'] = delinq_30['min_monthly_reporting_period'] + delinq_30.drop_column('min_monthly_reporting_period') + delinq_90 = delinq_gdf.query('current_loan_delinquency_status >= 3')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min() + delinq_90['delinquency_90'] = delinq_90['min_monthly_reporting_period'] + delinq_90.drop_column('min_monthly_reporting_period') + delinq_180 = delinq_gdf.query('current_loan_delinquency_status >= 6')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min() + delinq_180['delinquency_180'] = 
delinq_180['min_monthly_reporting_period'] + delinq_180.drop_column('min_monthly_reporting_period') del(delinq_gdf) delinq_merge = delinq_30.merge(delinq_90, how='left', on=['loan_id'], type='hash') delinq_merge['delinquency_90'] = delinq_merge['delinquency_90'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]')) @@ -264,15 +279,16 @@ def create_joined_df(gdf, everdf, **kwargs): def create_12_mon_features(joined_df, **kwargs): testdfs = [] n_months = 12 - for y in range(1, n_months + 1): tmpdf = joined_df[['loan_id', 'timestamp_year', 'timestamp_month', 'delinquency_12', 'upb_12']] tmpdf['josh_months'] = tmpdf['timestamp_year'] * 12 + tmpdf['timestamp_month'] tmpdf['josh_mody_n'] = ((tmpdf['josh_months'].astype('float64') - 24000 - y) / 12).floor() - tmpdf = tmpdf.groupby(['loan_id', 'josh_mody_n'], method='hash').agg({'delinquency_12': 'max','upb_12': 'min'}).reset_index() - tmpdf['delinquency_12'] = (tmpdf['delinquency_12']>3).astype('int32') - tmpdf['delinquency_12'] +=(tmpdf['upb_12']==0).astype('int32') - tmpdf['upb_12'] = tmpdf['upb_12'] + tmpdf = tmpdf.groupby(['loan_id', 'josh_mody_n'], method='hash').agg({'delinquency_12': 'max','upb_12': 'min'}) + tmpdf['delinquency_12'] = (tmpdf['max_delinquency_12']>3).astype('int32') + tmpdf['delinquency_12'] +=(tmpdf['min_upb_12']==0).astype('int32') + tmpdf.drop_column('max_delinquency_12') + tmpdf['upb_12'] = tmpdf['min_upb_12'] + tmpdf.drop_column('min_upb_12') tmpdf['timestamp_year'] = (((tmpdf['josh_mody_n'] * n_months) + 24000 + (y - 1)) / 12).floor().astype('int16') tmpdf['timestamp_month'] = np.int8(y) tmpdf.drop_column('josh_mody_n') @@ -313,7 +329,6 @@ def last_mile_cleaning(df, **kwargs): 'delinquency_30', 'delinquency_90', 'delinquency_180', 'upb_12', 'zero_balance_effective_date','foreclosed_after', 'disposition_date','timestamp' ] - for column in drop_list: df.drop_column(column) for col, dtype in df.dtypes.iteritems(): @@ -327,6 +342,7 @@ def last_mile_cleaning(df, **kwargs): 
return df.to_arrow(preserve_index=False) def main(): + #print('XGBOOST_BUILD_DOC is ' + os.environ['XGBOOST_BUILD_DOC']) parser = argparse.ArgumentParser("rapidssample") parser.add_argument("--data_dir", type=str, help="location of data") parser.add_argument("--num_gpu", type=int, help="Number of GPUs to use", default=1) @@ -348,6 +364,7 @@ def main(): print('data_dir = {0}'.format(data_dir)) print('num_gpu = {0}'.format(num_gpu)) print('part_count = {0}'.format(part_count)) + #part_count = part_count + 1 # adding one because the usage below is not inclusive print('end_year = {0}'.format(end_year)) print('cpu_predictor = {0}'.format(cpu_predictor)) @@ -363,17 +380,19 @@ def main(): client print(client.ncores()) - # to download data for this notebook, visit https://rapidsai.github.io/demos/datasets/mortgage-data and update the following paths accordingly +# to download data for this notebook, visit https://rapidsai.github.io/demos/datasets/mortgage-data and update the following paths accordingly acq_data_path = "{0}/acq".format(data_dir) #"/rapids/data/mortgage/acq" perf_data_path = "{0}/perf".format(data_dir) #"/rapids/data/mortgage/perf" col_names_path = "{0}/names.csv".format(data_dir) # "/rapids/data/mortgage/names.csv" start_year = 2000 +#end_year = 2000 # end_year is inclusive -- converted to parameter +#part_count = 2 # the number of data files to train against -- converted to parameter + client.run(initialize_rmm_pool) client - print('--->>> Workers used: {0}'.format(client.ncores())) - - # NOTE: The ETL calculates additional features which are then dropped before creating the XGBoost DMatrix. - # This can be optimized to avoid calculating the dropped features. + print(client.ncores()) +# NOTE: The ETL calculates additional features which are then dropped before creating the XGBoost DMatrix. +# This can be optimized to avoid calculating the dropped features. 
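The `create_ever_features` change earlier in this diff (dropping `.reset_index()` and reading the aggregate back through a `max_`-prefixed column, a naming convention of the older cudf groupby shown here) boils down to a max-per-loan followed by threshold flags. A minimal sketch with plain pandas as a stand-in for cudf — the toy frame and its values are illustrative only:

```python
import pandas as pd

# Toy performance frame: one row per (loan, month) observation.
gdf = pd.DataFrame({
    "loan_id": [1, 1, 2, 2, 3],
    "current_loan_delinquency_status": [0, 2, 4, 7, 0],
})

# Max delinquency status ever observed per loan — the groupby('loan_id').max()
# step of create_ever_features. (Pandas keeps the original column name; the
# older cudf groupby in the diff renames it with a 'max_' prefix.)
everdf = gdf.groupby("loan_id", as_index=False)["current_loan_delinquency_status"].max()

# Ever-delinquent flags at the 30/90/180-day thresholds (status >= 1/3/6),
# matching the ever_30 / ever_90 / ever_180 columns in the diff.
for name, threshold in [("ever_30", 1), ("ever_90", 3), ("ever_180", 6)]:
    everdf[name] = (everdf["current_loan_delinquency_status"] >= threshold).astype("int8")

# Drop the intermediate aggregate, as drop_column() does in the script.
everdf = everdf.drop(columns=["current_loan_delinquency_status"])
print(everdf)
```

The same pattern (groupby-aggregate, rename/consume the prefixed result column, drop the intermediate) repeats in `create_delinq_features` and `create_12_mon_features` above.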
print("Reading ...") t1 = datetime.datetime.now() gpu_dfs = [] @@ -395,9 +414,14 @@ def main(): wait(gpu_dfs) t2 = datetime.datetime.now() - print("Reading time: {0}".format(str(t2-t1))) - print('--->>> Number of data parts: {0}'.format(len(gpu_dfs))) - + print("Reading time ...") + print(t2-t1) + print('len(gpu_dfs) is {0}'.format(len(gpu_dfs))) + + client.run(cudf._gdf.rmm_finalize) + client.run(initialize_rmm_no_pool) + client + print(client.ncores()) dxgb_gpu_params = { 'nround': 100, 'max_depth': 8, @@ -414,7 +438,7 @@ def main(): 'n_gpus': 1, 'distributed_dask': True, 'loss': 'ls', - 'objective': 'reg:squarederror', + 'objective': 'gpu:reg:linear', 'max_features': 'auto', 'criterion': 'friedman_mse', 'grow_policy': 'lossguide', @@ -422,13 +446,13 @@ def main(): } if cpu_predictor: - print('\n---->>>> Training using CPUs <<<<----\n') + print('Training using CPUs') dxgb_gpu_params['predictor'] = 'cpu_predictor' dxgb_gpu_params['tree_method'] = 'hist' dxgb_gpu_params['objective'] = 'reg:linear' else: - print('\n---->>>> Training using GPUs <<<<----\n') + print('Training using GPUs') print('Training parameters are {0}'.format(dxgb_gpu_params)) @@ -457,13 +481,14 @@ def main(): gpu_dfs = [gpu_df.persist() for gpu_df in gpu_dfs] gc.collect() wait(gpu_dfs) - - # TRAIN THE MODEL + labels = None t1 = datetime.datetime.now() bst = dxgb_gpu.train(client, dxgb_gpu_params, gpu_dfs, labels, num_boost_round=dxgb_gpu_params['nround']) t2 = datetime.datetime.now() - print('\n---->>>> Training time: {0} <<<<----\n'.format(str(t2-t1))) + print("Training time ...") + print(t2-t1) + print('str(bst) is {0}'.format(str(bst))) print('Exiting script') if __name__ == '__main__': diff --git a/contrib/RAPIDS/rapids.yml b/contrib/RAPIDS/rapids.yml deleted file mode 100644 index 24d01713..00000000 --- a/contrib/RAPIDS/rapids.yml +++ /dev/null @@ -1,48 +0,0 @@ -name: rapids0.9 -channels: - - nvidia - - rapidsai/label/xgboost - - rapidsai - - conda-forge - - numba - - pytorch -dependencies: 
- - python=3.7 - - pytorch - - cudatoolkit=10.0 - - dask-cuda=0.9.1 - - cudf=0.9.* - - cuml=0.9.* - - cugraph=0.9.* - - rapidsai/label/xgboost::xgboost=0.90.rapidsdev1 - - rapidsai/label/xgboost::dask-xgboost=0.2.* - - conda-forge::numpy=1.16.4 - - cython - - dask - - distributed=2.3.2 - - pynvml=8.0.2 - - gcsfs - - requests - - jupyterhub - - jupyterlab - - matplotlib - - ipywidgets - - ipyvolume - - seaborn - - scipy - - pandas - - boost - - nodejs - - pytest - - pip - - pip: - - git+https://github.com/cupy/cupy.git - - setuptools - - torch - - torchvision - - pytorch-ignite - - graphviz - - networkx - - dask-kubernetes - - dask_labextension - - jupyterlab-nvdashboard diff --git a/how-to-use-azureml/automated-machine-learning/automl_env.yml b/how-to-use-azureml/automated-machine-learning/automl_env.yml index 8114c9d8..20bf96b9 100644 --- a/how-to-use-azureml/automated-machine-learning/automl_env.yml +++ b/how-to-use-azureml/automated-machine-learning/automl_env.yml @@ -6,7 +6,7 @@ dependencies: - python>=3.5.2,<3.6.8 - nb_conda - matplotlib==2.1.0 -- numpy>=1.11.0,<=1.16.2 +- numpy>=1.16.0,<=1.16.2 - cython - urllib3<1.24 - scipy>=1.0.0,<=1.1.0 @@ -14,6 +14,7 @@ dependencies: - pandas>=0.22.0,<=0.23.4 - py-xgboost<=0.80 - pyarrow>=0.11.0 +- conda-forge::fbprophet==0.5 - pip: # Required packages for AzureML execution, history, and data preparation. 
diff --git a/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml b/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml index 36114400..179e46b5 100644 --- a/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml +++ b/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml @@ -7,7 +7,7 @@ dependencies: - python>=3.5.2,<3.6.8 - nb_conda - matplotlib==2.1.0 -- numpy>=1.11.0,<=1.16.2 +- numpy>=1.16.0,<=1.16.2 - cython - urllib3<1.24 - scipy>=1.0.0,<=1.1.0 @@ -15,6 +15,7 @@ dependencies: - pandas>=0.22.0,<0.23.0 - py-xgboost<=0.80 - pyarrow>=0.11.0 +- conda-forge::fbprophet==0.5 - pip: # Required packages for AzureML execution, history, and data preparation. diff --git a/how-to-use-azureml/automated-machine-learning/classification-bank-marketing/auto-ml-classification-bank-marketing.ipynb b/how-to-use-azureml/automated-machine-learning/classification-bank-marketing/auto-ml-classification-bank-marketing.ipynb index d13b3bb2..6204db34 100644 --- a/how-to-use-azureml/automated-machine-learning/classification-bank-marketing/auto-ml-classification-bank-marketing.ipynb +++ b/how-to-use-azureml/automated-machine-learning/classification-bank-marketing/auto-ml-classification-bank-marketing.ipynb @@ -191,7 +191,7 @@ "source": [ "### Load Data\n", "\n", - "Load the bank marketing dataset into X_train and y_train. X_train contains the training features, which are inputs to the model. y_train contains the training labels, which are the expected output of the model." + "Load the bank marketing dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model." 
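The rewritten "Load Data" cell above keeps features and label together in one dataset and only names the label column, instead of splitting `X_train`/`y_train` up front. A rough local sketch of that shape with pandas standing in for `Dataset.Tabular` — the CSV contents below are made up, only the `y` label column name comes from the notebook:

```python
import io
import pandas as pd

# Stand-in for Dataset.Tabular.from_delimited_files(): a tiny in-memory CSV
# with hypothetical feature columns and the notebook's label column 'y'.
csv_text = "age,job,y\n30,admin,no\n45,technician,yes\n52,services,no\n"
df = pd.read_csv(io.StringIO(csv_text))

# Equivalent of dataset.take(5).to_pandas_dataframe(): peek at a few rows.
print(df.head(5))

# In the training_data/label_column_name style, the label is not split out;
# AutoML is told which column holds it.
label_column_name = "y"
assert label_column_name in df.columns
```

This mirrors why the `X_train = dataset.drop_columns(...)` / `y_train = dataset.keep_columns(...)` lines could be deleted from the cell.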
] }, { @@ -202,8 +202,6 @@ "source": [ "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_train.csv\"\n", "dataset = Dataset.Tabular.from_delimited_files(data)\n", - "X_train = dataset.drop_columns(columns=['y'])\n", - "y_train = dataset.keep_columns(columns=['y'], validate=True)\n", "dataset.take(5).to_pandas_dataframe()" ] }, @@ -222,8 +220,8 @@ "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n", + "|**training_data**|Input dataset, containing both features and label column.|\n", + "|**label_column_name**|The name of the label column.|\n", "\n", "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)" ] @@ -247,8 +245,8 @@ "automl_config = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", " run_configuration=conda_run_config,\n", - " X = X_train,\n", - " y = y_train,\n", + " training_data = dataset,\n", + " label_column_name = 'y',\n", " **automl_settings\n", " )" ] @@ -428,7 +426,7 @@ "outputs": [], "source": [ "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n", - " pip_packages=['azureml-train-automl'])\n", + " pip_packages=['azureml-defaults','azureml-train-automl'])\n", "\n", "conda_env_file_name = 'myenv.yml'\n", "myenv.save_to_file('.', conda_env_file_name)" @@ -465,45 +463,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a Container Image\n", - "\n", - "Next use Azure Container Instances for deploying models as a web service for quickly 
deploying and validating your model\n", - "or when testing a model that is under development." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script = script_file_name,\n", - " conda_file = conda_env_file_name,\n", - " tags = {'area': \"bmData\", 'type': \"automl_classification\"},\n", - " description = \"Image for automl classification sample\")\n", - "\n", - "image = Image.create(name = \"automlsampleimage\",\n", - " # this is the model object \n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)\n", - "\n", - "if image.creation_state == 'Failed':\n", - " print(\"Image build log at: \" + image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy the Image as a Web Service on Azure Container Instance\n", - "\n", - "Deploy an image that contains the model and other assets needed by the service." 
+ "### Deploy the model as a Web Service on Azure Container Instance" ] }, { @@ -512,28 +472,23 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import Model\n", + "\n", + "inference_config = InferenceConfig(runtime = \"python\", \n", + " entry_script = script_file_name,\n", + " conda_file = conda_env_file_name)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", " memory_gb = 1, \n", " tags = {'area': \"bmData\", 'type': \"automl_classification\"}, \n", - " description = 'sample service for Automl Classification')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", + " description = 'sample service for Automl Classification')\n", "\n", "aci_service_name = 'automl-sample-bankmarketing'\n", "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", + "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n", "aci_service.wait_for_deployment(True)\n", "print(aci_service.state)" ] diff --git a/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb b/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb index ffcf6261..56f6fdb0 100644 --- a/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb +++ b/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb @@ -191,7 +191,7 @@ "source": [ "### Load Data\n", 
"\n", - "Load the credit card dataset into X and y. X contains the features, which are inputs to the model. y contains the labels, which are the expected output of the model. Next split the data using random_split and return X_train and y_train for training the model." + "Load the credit card dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we'll split the data using random_split and extract the training data for the model." ] }, { @@ -202,10 +202,10 @@ "source": [ "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n", "dataset = Dataset.Tabular.from_delimited_files(data)\n", - "X = dataset.drop_columns(columns=['Class'])\n", - "y = dataset.keep_columns(columns=['Class'], validate=True)\n", - "X_train, X_test = X.random_split(percentage=0.8, seed=223)\n", - "y_train, y_test = y.random_split(percentage=0.8, seed=223)" + "training_data, validation_data = dataset.random_split(percentage=0.8, seed=223)\n", + "label_column_name = 'Class'\n", + "X_test = validation_data.drop_columns(columns=[label_column_name])\n", + "y_test = validation_data.keep_columns(columns=[label_column_name], validate=True)\n" ] }, { @@ -223,8 +223,8 @@ "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", "|**iterations**|Number of iterations. 
In each iteration AutoML trains a specific pipeline with the data.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n", + "|**training_data**|Input dataset, containing both features and label column.|\n", + "|**label_column_name**|The name of the label column.|\n", "\n", "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)" ] @@ -255,8 +255,8 @@ "automl_config = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", " run_configuration=conda_run_config,\n", - " X = X_train,\n", - " y = y_train,\n", + " training_data = training_data,\n", + " label_column_name = label_column_name,\n", " **automl_settings\n", " )" ] @@ -435,7 +435,7 @@ "outputs": [], "source": [ "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n", - " pip_packages=['azureml-train-automl'])\n", + " pip_packages=['azureml-defaults','azureml-train-automl'])\n", "\n", "conda_env_file_name = 'myenv.yml'\n", "myenv.save_to_file('.', conda_env_file_name)" @@ -472,45 +472,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a Container Image\n", - "\n", - "Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n", - "or when testing a model that is under development." 
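The credit-card cell above replaces the paired `X.random_split` / `y.random_split` calls with a single seeded split of the whole dataset, then carves `X_test`/`y_test` out of the validation portion. A small pandas sketch of that flow, under the assumption that a seeded 80% sample approximates `random_split(percentage=0.8, seed=223)` — the toy frame is illustrative, only the `Class` label name and the 0.8/223 values come from the diff:

```python
import pandas as pd

# Hypothetical dataset with one feature column and the 'Class' label.
df = pd.DataFrame({"feature": range(10), "Class": [0, 1] * 5})

# Single seeded split of the combined dataset, keeping features and label
# together — the rows not sampled become the validation split.
training_data = df.sample(frac=0.8, random_state=223)
validation_data = df.drop(training_data.index)

# Mirror the notebook's X_test / y_test extraction from the validation split.
label_column_name = "Class"
X_test = validation_data.drop(columns=[label_column_name])
y_test = validation_data[[label_column_name]]
```

Splitting once keeps each row's features and label in the same partition, which is the point of the change: the old per-array splits relied on both calls agreeing row-for-row.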
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script = script_file_name,\n", - " conda_file = conda_env_file_name,\n", - " tags = {'area': \"cards\", 'type': \"automl_classification\"},\n", - " description = \"Image for automl classification sample\")\n", - "\n", - "image = Image.create(name = \"automlsampleimage\",\n", - " # this is the model object \n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)\n", - "\n", - "if image.creation_state == 'Failed':\n", - " print(\"Image build log at: \" + image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy the Image as a Web Service on Azure Container Instance\n", - "\n", - "Deploy an image that contains the model and other assets needed by the service." 
+ "### Deploy the model as a Web Service on Azure Container Instance" ] }, { @@ -519,28 +481,23 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import Model\n", + "\n", + "inference_config = InferenceConfig(runtime = \"python\", \n", + " entry_script = script_file_name,\n", + " conda_file = conda_env_file_name)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", " memory_gb = 1, \n", " tags = {'area': \"cards\", 'type': \"automl_classification\"}, \n", - " description = 'sample service for Automl Classification')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", + " description = 'sample service for Automl Classification')\n", "\n", "aci_service_name = 'automl-sample-creditcard'\n", "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", + "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n", "aci_service.wait_for_deployment(True)\n", "print(aci_service.state)" ] diff --git a/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb b/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb index 930fb4f1..7f428fd8 100644 --- a/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb +++ b/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb @@ -305,7 +305,7 @@ "from azureml.core.conda_dependencies import 
CondaDependencies\n", "\n", "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost<=0.80'],\n", - " pip_packages=['azureml-train-automl'])\n", + " pip_packages=['azureml-defaults','azureml-train-automl'])\n", "\n", "conda_env_file_name = 'myenv.yml'\n", "myenv.save_to_file('.', conda_env_file_name)" @@ -342,40 +342,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a Container Image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script = script_file_name,\n", - " conda_file = conda_env_file_name,\n", - " tags = {'area': \"digits\", 'type': \"automl_classification\"},\n", - " description = \"Image for automl classification sample\")\n", - "\n", - "image = Image.create(name = \"automlsampleimage\",\n", - " # this is the model object \n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)\n", - "\n", - "if image.creation_state == 'Failed':\n", - " print(\"Image build log at: \" + image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy the Image as a Web Service on Azure Container Instance" + "### Deploy the model as a Web Service on Azure Container Instance\n", + "\n", + "Create the configuration needed for deploying the model as a web service service." 
] }, { @@ -384,8 +353,13 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", "\n", + "inference_config = InferenceConfig(runtime = \"python\", \n", + " entry_script = script_file_name,\n", + " conda_file = conda_env_file_name)\n", + "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", " memory_gb = 1, \n", " tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n", @@ -399,17 +373,33 @@ "outputs": [], "source": [ "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import Model\n", "\n", "aci_service_name = 'automl-sample-01'\n", "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", + "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n", "aci_service.wait_for_deployment(True)\n", "print(aci_service.state)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Get the logs from service deployment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if aci_service.state != 'Healthy':\n", + " # run this command for debugging.\n", + " print(aci_service.get_logs())" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -426,22 +416,6 @@ "#aci_service.delete()" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get Logs from a Deployed Web Service" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#aci_service.get_logs()" - ] - }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/automated-machine-learning/dataset-remote-execution/auto-ml-dataset-remote-execution.ipynb 
b/how-to-use-azureml/automated-machine-learning/dataset-remote-execution/auto-ml-dataset-remote-execution.ipynb index 2dd27e1f..d57166e1 100644 --- a/how-to-use-azureml/automated-machine-learning/dataset-remote-execution/auto-ml-dataset-remote-execution.ipynb +++ b/how-to-use-azureml/automated-machine-learning/dataset-remote-execution/auto-ml-dataset-remote-execution.ipynb @@ -138,8 +138,8 @@ "metadata": {}, "outputs": [], "source": [ - "X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n", - "y = dataset.keep_columns(columns=['Primary Type'], validate=True)" + "training_data = dataset.drop_columns(columns=['FBI Code'])\n", + "label_column_name = 'Primary Type'" ] }, { @@ -251,8 +251,8 @@ "automl_config = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", " run_configuration=conda_run_config,\n", - " X = X,\n", - " y = y,\n", + " training_data = training_data,\n", + " label_column_name = label_column_name,\n", " **automl_settings)" ] }, diff --git a/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.ipynb b/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.ipynb index 89ac30d8..727bd939 100644 --- a/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.ipynb +++ b/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.ipynb @@ -138,8 +138,8 @@ "metadata": {}, "outputs": [], "source": [ - "X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n", - "y = dataset.keep_columns(columns=['Primary Type'], validate=True)" + "training_data = dataset.drop_columns(columns=['FBI Code'])\n", + "label_column_name = 'Primary Type'" ] }, { @@ -183,8 +183,8 @@ "source": [ "automl_config = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", - " X = X,\n", - " y = y,\n", + " training_data = training_data,\n", + " label_column_name = label_column_name,\n", " **automl_settings)" ] }, diff --git 
a/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.yml b/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.yml index 87242fe5..5fc7806a 100644 --- a/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.yml +++ b/how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.yml @@ -6,3 +6,4 @@ dependencies: - azureml-widgets - matplotlib - pandas_ml + - azureml-dataprep[pandas] diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb index e15f87cd..a721a891 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb @@ -218,8 +218,8 @@ "|**primary_metric**|This is the metric that you want to optimize.
Forecasting supports the following primary metrics: spearman_correlation, normalized_root_mean_squared_error, r2_score,
normalized_mean_absolute_error\n", "|**iterations**|Number of iterations. In each iteration, Auto ML trains a specific pipeline on the given data|\n", "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n", + "|**training_data**|Input dataset, containing both features and label column.|\n", + "|**label_column_name**|The name of the label column.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", "|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (i.e. 'US', 'GB').|\n", "\n", @@ -247,9 +247,9 @@ " blacklist_models = ['ExtremeRandomTrees'],\n", " iterations=10,\n", " iteration_timeout_minutes=5,\n", - " X=X_train,\n", - " y=y_train,\n", - " n_cross_validations=3,\n", + " training_data=train,\n", + " label_column_name=target_column_name,\n", + " n_cross_validations=3, \n", " verbosity=logging.INFO,\n", " **automl_settings)" ] diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb index bf7764e5..3d9d497d 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb @@ -227,7 +227,7 @@ "automl_config = AutoMLConfig(task='forecasting',\n", " debug_log='automl_nyc_energy_errors.log',\n", " primary_metric='normalized_root_mean_squared_error',\n", - " blacklist_models = ['ExtremeRandomTrees'],\n", + " blacklist_models = ['ExtremeRandomTrees', 'AutoArima'],\n", " iterations=10,\n", " iteration_timeout_minutes=5,\n", " X=X_train,\n", diff --git 
a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb index eec96f7f..6f5ff246 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb @@ -663,7 +663,7 @@ "for p in ['azureml-train-automl', 'azureml-core']:\n", " print('{}\\t{}'.format(p, dependencies[p]))\n", "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-train-automl'])\n", + "myenv = CondaDependencies.create(conda_packages=['numpy>=1.16.0,<=1.16.2','scikit-learn','fbprophet==0.5'], pip_packages=['azureml-defaults','azureml-train-automl'])\n", "\n", "myenv.save_to_file('.', conda_env_file_name)" ] @@ -700,40 +700,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a Container Image" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script = script_file_name,\n", - " conda_file = conda_env_file_name,\n", - " tags = {'type': \"automl-forecasting\"},\n", - " description = \"Image for automl forecasting sample\")\n", - "\n", - "image = Image.create(name = \"automl-fcast-image\",\n", - " # this is the model object \n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)\n", - "\n", - "if image.creation_state == 'Failed':\n", - " print(\"Image build log at: \" + image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - 
"metadata": {}, - "source": [ - "### Deploy the Image as a Web Service on Azure Container Instance" + "### Deploy the model as a Web Service on Azure Container Instance" ] }, { @@ -742,29 +709,23 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import Model\n", + "\n", + "inference_config = InferenceConfig(runtime = \"python\", \n", + " entry_script = script_file_name,\n", + " conda_file = conda_env_file_name)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", " memory_gb = 2, \n", " tags = {'type': \"automl-forecasting\"},\n", - " description = \"Automl forecasting sample service\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", + " description = \"Automl forecasting sample service\")\n", "\n", "aci_service_name = 'automl-forecast-01'\n", "print(aci_service_name)\n", - "\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", + "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n", "aci_service.wait_for_deployment(True)\n", "print(aci_service.state)" ] diff --git a/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.ipynb b/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.ipynb new file mode 100644 index 00000000..129cdf89 --- /dev/null +++ b/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.ipynb @@ -0,0 +1,593 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": 
{}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-bank-marketing/auto-ml-classification-bank-marketing.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Automated Machine Learning\n", + "_**Regression on remote compute using Computer Hardware dataset with model explanations**_\n", + "\n", + "## Contents\n", + "1. [Introduction](#Introduction)\n", + "1. [Setup](#Setup)\n", + "1. [Train](#Train)\n", + "1. [Results](#Results)\n", + "1. [Explanations](#Explanations)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction\n", + "\n", + "In this example we use the Hardware Performance Dataset to showcase how you can use AutoML for a simple regression problem. After training AutoML models for this regression data set, we show how you can compute model explanations on your remote compute using a sample explainer script.\n", + "\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace. \n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an `Experiment` in an existing `Workspace`.\n", + "2. Configure AutoML using `AutoMLConfig`.\n", + "3. Train the model using remote compute.\n", + "4. Explore the results.\n", + "5. Setup remote compute for computing the model explanations for a given AutoML model.\n", + "6. Start an AzureML experiment on your remote compute to compute explanations for an AutoML model.\n", + "7. 
Download the feature importance for engineered features and visualize the explanations for engineered features. \n", + "8. Download the feature importance for raw features and visualize the explanations for raw features. \n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import logging\n", + "\n", + "from matplotlib import pyplot as plt\n", + "import pandas as pd\n", + "import os\n", + "\n", + "import azureml.core\n", + "from azureml.core.experiment import Experiment\n", + "from azureml.core.workspace import Workspace\n", + "from azureml.core.dataset import Dataset\n", + "from azureml.train.automl import AutoMLConfig" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "\n", + "# choose a name for experiment\n", + "experiment_name = 'automl-regression-computer-hardware'\n", + "\n", + "experiment=Experiment(ws, experiment_name)\n", + "\n", + "output = {}\n", + "output['SDK version'] = azureml.core.VERSION\n", + "output['Subscription ID'] = ws.subscription_id\n", + "output['Workspace'] = ws.name\n", + "output['Resource Group'] = ws.resource_group\n", + "output['Location'] = ws.location\n", + "output['Experiment Name'] = experiment.name\n", + "pd.set_option('display.max_colwidth', -1)\n", + "outputDf = pd.DataFrame(data = output, index = [''])\n", + "outputDf.T" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create or Attach existing AmlCompute\n", + "You will need to create a compute target for your AutoML run. 
In this tutorial, you create AmlCompute as your training compute resource.\n", + "#### Creation of AmlCompute takes approximately 5 minutes. \n", + "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", + "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import AmlCompute\n", + "from azureml.core.compute import ComputeTarget\n", + "\n", + "# Choose a name for your cluster.\n", + "amlcompute_cluster_name = \"automlcl\"\n", + "\n", + "found = False\n", + "# Check if this compute target already exists in the workspace.\n", + "cts = ws.compute_targets\n", + "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", + " found = True\n", + " print('Found existing compute target.')\n", + " compute_target = cts[amlcompute_cluster_name]\n", + " \n", + "if not found:\n", + " print('Creating a new compute target...')\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", + " #vm_priority = 'lowpriority', # optional\n", + " max_nodes = 6)\n", + "\n", + " # Create the cluster.\n", + " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", + " \n", + "print('Checking cluster status...')\n", + "# Can poll for a minimum number of nodes and for a specific timeout.\n", + "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", + "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", + " \n", + "# For a more detailed view of current AmlCompute status, use get_status()." 
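The `wait_for_completion(..., timeout_in_minutes=20)` call above follows a common poll-until-terminal-state pattern. As a rough, self-contained sketch of that pattern (the `get_status` callable here is a stand-in for the real AmlCompute status query, not the SDK API):

```python
import time

def wait_for_completion(get_status, timeout_in_minutes=20, poll_seconds=1):
    """Poll get_status() until it reports a terminal state or the timeout expires.

    get_status is a zero-argument callable returning a status string; this is a
    hypothetical stand-in for the cluster status query.
    """
    deadline = time.time() + timeout_in_minutes * 60
    while time.time() < deadline:
        status = get_status()
        if status in ('Succeeded', 'Failed'):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError('cluster did not reach a terminal state in time')

# Simulated status sequence: two intermediate polls, then success.
states = iter(['Creating', 'Resizing', 'Succeeded'])
print(wait_for_completion(lambda: next(states), poll_seconds=0))  # Succeeded
```

The real SDK call additionally accepts `min_node_count` and reports progress with `show_output`, but the control flow is the same poll loop.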
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Conda Dependencies for the AutoML training experiment\n", + "\n", + "Create the conda dependencies for running the AutoML experiment on remote compute." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "import pkg_resources\n", + "\n", + "# create a new RunConfig object\n", + "conda_run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "# Set compute target to AmlCompute\n", + "conda_run_config.target = compute_target\n", + "conda_run_config.environment.docker.enabled = True\n", + "\n", + "cd = CondaDependencies.create(conda_packages=['numpy','py-xgboost<=0.80'])\n", + "conda_run_config.environment.python.conda_dependencies = cd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup Training and Test Data for the AutoML experiment\n", + "\n", + "Here we create the train and test datasets for the hardware performance dataset. We also register the datasets in your workspace using a name so that these datasets may be accessed from the remote compute."
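The `random_split(percentage=0.8, seed=223)` call used below partitions the rows deterministically for a given seed. A plain-Python analogue of that behaviour (an illustrative sketch, not the SDK implementation) is:

```python
import random

def random_split(rows, percentage=0.8, seed=223):
    # Shuffle a copy with a fixed seed so the same split is reproducible,
    # then cut at the requested fraction.
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * percentage)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))
train, test = random_split(rows)
print(len(train), len(test))  # 8 2
```

Fixing the seed is what makes the registered train/test datasets stable across re-runs of the notebook.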
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Data source\n", + "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n", + "\n", + "# Create dataset from the url\n", + "dataset = Dataset.Tabular.from_delimited_files(data)\n", + "\n", + "# Split the dataset into train and test datasets\n", + "train_dataset, test_dataset = dataset.random_split(percentage=0.8, seed=223)\n", + "\n", + "# Register the train dataset with your workspace\n", + "train_dataset.register(workspace = ws, name = 'hardware_performance_train_dataset',\n", + " description = 'hardware performance training data',\n", + " create_new_version=True)\n", + "\n", + "# Register the test dataset with your workspace\n", + "test_dataset.register(workspace = ws, name = 'hardware_performance_test_dataset',\n", + " description = 'hardware performance test data',\n", + " create_new_version=True)\n", + "\n", + "# Drop the labeled column from the train dataset\n", + "X_train = train_dataset.drop_columns(columns=['ERP'])\n", + "y_train = train_dataset.keep_columns(columns=['ERP'], validate=True)\n", + "\n", + "# Drop the labeled column from the test dataset\n", + "X_test = test_dataset.drop_columns(columns=['ERP']) \n", + "\n", + "# Display the top rows in the train dataset\n", + "X_train.take(5).to_pandas_dataframe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train\n", + "\n", + "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n", + "\n", + "|Property|Description|\n", + "|-|-|\n", + "|**task**|classification or regression|\n", + "|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics:
spearman_correlation, normalized_root_mean_squared_error, r2_score,
normalized_mean_absolute_error|\n", + "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", + "|**n_cross_validations**|Number of cross validation splits.|\n", + "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", + "|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n", + "\n", + "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_settings = {\n", + " \"iteration_timeout_minutes\": 5,\n", + " \"iterations\": 10,\n", + " \"n_cross_validations\": 2,\n", + " \"primary_metric\": 'spearman_correlation',\n", + " \"preprocess\": True,\n", + " \"max_concurrent_iterations\": 1,\n", + " \"verbosity\": logging.INFO,\n", + "}\n", + "\n", + "automl_config = AutoMLConfig(task = 'regression',\n", + " debug_log = 'automl_errors_model_exp.log',\n", + " run_configuration=conda_run_config,\n", + " X = X_train,\n", + " y = y_train,\n", + " **automl_settings\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while.\n", + "In this example, we specify `show_output = True` to print currently running iterations to the console." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "remote_run = experiment.submit(automl_config, show_output = True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "remote_run" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Results" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Widget for Monitoring Runs\n", + "\n", + "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "\n", + "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(remote_run).show() " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explanations\n", + "This section walks you through the workflow for computing model explanations for an AutoML model on your remote compute.\n", + "\n", + "### Retrieve any AutoML Model for explanations\n", + "\n", + "Below we select one of the AutoML pipelines from our iterations. The `get_output` method returns the AutoML run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_run, fitted_model = remote_run.get_output(iteration=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Set up the model explanation run on the remote compute\n", + "The following section provides details on how to set up an AzureML experiment that computes model explanations for an AutoML model on your remote compute." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Sample script used for computing explanations\n", + "View the sample script for computing the model explanations for your AutoML model on remote compute." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open('train_explainer.py', 'r') as cefr:\n", + " print(cefr.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Substitute values in your sample script\n", + "The following cell shows how to change the values in the sample script so that it matches your experiment and dataset."
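This parameterization is plain find-and-replace over the script text; the cell that follows performs the same chained `str.replace` calls on `train_explainer.py`. A minimal self-contained illustration, using a hypothetical two-placeholder template string rather than the real script (whose placeholder names are elided in this copy):

```python
# Hypothetical template standing in for train_explainer.py.
template = (
    "experiment = Experiment(ws, '<EXPERIMENT_NAME>')\n"
    "target_column = '<TARGET_COLUMN>'\n"
)

# Chain of replacements, mirroring the notebook's pattern.
content = template
content = content.replace('<EXPERIMENT_NAME>', 'automl-regression-computer-hardware')
content = content.replace('<TARGET_COLUMN>', 'ERP')

print(content)
```

In the notebook the result is written back to the script folder so the substituted script is what actually runs on the remote compute.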
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "# create script folder\n", + "script_folder = './sample_projects/automl-regression-computer-hardware'\n", + "if not os.path.exists(script_folder):\n", + " os.makedirs(script_folder)\n", + "\n", + "# Copy the sample script to script folder.\n", + "shutil.copy('train_explainer.py', script_folder)\n", + "\n", + "# Create the explainer script that will run on the remote compute.\n", + "script_file_name = script_folder + '/train_explainer.py'\n", + "\n", + "# Open the sample script for modification\n", + "with open(script_file_name, 'r') as cefr:\n", + " content = cefr.read()\n", + "\n", + "# Replace the values in the train_explainer.py file with the appropriate values\n", + "content = content.replace('<>', automl_run.experiment.name) # your experiment name.\n", + "content = content.replace('<>', automl_run.id) # Run-id of the AutoML run for which you want to explain the model.\n", + "content = content.replace('<>', 'ERP') # Your target column name\n", + "content = content.replace('<>', 'regression') # Training task type\n", + "# Name of your training dataset registered with your workspace\n", + "content = content.replace('<>', 'hardware_performance_train_dataset') \n", + "# Name of your test dataset registered with your workspace\n", + "content = content.replace('<>', 'hardware_performance_test_dataset')\n", + "\n", + "# Write sample file into your script folder.\n", + "with open(script_file_name, 'w') as cefw:\n", + " cefw.write(content)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Create conda configuration for the model explanations experiment\n", + "We need the `azureml-explain-model`, `azureml-train-automl` and `azureml-core` packages for computing model explanations for your AutoML model on remote compute."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "import pkg_resources\n", + "\n", + "# create a new RunConfig object\n", + "conda_run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "# Set compute target to AmlCompute\n", + "conda_run_config.target = compute_target\n", + "conda_run_config.environment.docker.enabled = True\n", + "azureml_pip_packages = [\n", + " 'azureml-train-automl', 'azureml-core', 'azureml-explain-model'\n", + "]\n", + "\n", + "# specify CondaDependencies obj\n", + "conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n", + " conda_packages=['scikit-learn', 'numpy','py-xgboost<=0.80'],\n", + " pip_packages=azureml_pip_packages)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Submit the experiment for model explanations\n", + "Submit the experiment with the above `run_config` and the sample script for computing explanations." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Now submit a run on AmlCompute for model explanations\n", + "from azureml.core.script_run_config import ScriptRunConfig\n", + "\n", + "script_run_config = ScriptRunConfig(source_directory=script_folder,\n", + " script='train_explainer.py',\n", + " run_config=conda_run_config)\n", + "\n", + "run = experiment.submit(script_run_config)\n", + "\n", + "# Show run details\n", + "run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "# Shows output of the run on stdout.\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Feature importance and explanation dashboard\n", + "In this section we describe how you can download the explanation results from the explanations experiment and visualize the feature importance for your AutoML model. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Setup for visualizing the model explanation results\n", + "To visualize the explanation results for the *fitted_model* we need to perform the following steps:\n", + "1. Featurize test data samples.\n", + "\n", + "The *automl_explainer_setup_obj* contains all the structures from the above list. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n", + "explainer_setup_class = automl_setup_model_explanations(fitted_model, 'regression', X_test=X_test)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Download engineered feature importance from the artifact store\n", + "You can use *ExplanationClient* to download the engineered feature explanations from the artifact store of the *automl_run*. 
You can also use ExplanationDashboard to view the dashboard visualization of the feature importance values of the engineered features." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.explain.model._internal.explanation_client import ExplanationClient\n", + "from azureml.contrib.explain.model.visualize import ExplanationDashboard\n", + "client = ExplanationClient.from_run(automl_run)\n", + "engineered_explanations = client.download_model_explanation(raw=False)\n", + "print(engineered_explanations.get_feature_importance_dict())\n", + "ExplanationDashboard(engineered_explanations, explainer_setup_class.automl_estimator, explainer_setup_class.X_test_transform)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Download raw feature importance from the artifact store\n", + "You can use *ExplanationClient* to download the raw feature explanations from the artifact store of the *automl_run*. You can also use ExplanationDashboard to view the dashboard visualization of the feature importance values of the raw features."
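The `get_feature_importance_dict()` call printed above returns a plain feature-name-to-importance mapping. Ranking and inspecting such a mapping is straightforward; a small sketch with made-up importance values (the real ones come from the explanation download):

```python
# Hypothetical importance values for the hardware dataset's columns; in the
# notebook this dict comes from explanations.get_feature_importance_dict().
importances = {'MYCT': 0.12, 'MMIN': 0.45, 'MMAX': 0.61, 'CACH': 0.08}

# Sort features by importance, most important first.
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f'{name}: {score:.2f}')
```

This kind of quick ranking is useful for a sanity check before opening the full dashboard visualization.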
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "raw_explanations = client.download_model_explanation(raw=True)\n", + "print(raw_explanations.get_feature_importance_dict())\n", + "ExplanationDashboard(raw_explanations, explainer_setup_class.automl_pipeline, explainer_setup_class.X_test_raw)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "v-rasav" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.yml b/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.yml new file mode 100644 index 00000000..d5733851 --- /dev/null +++ b/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.yml @@ -0,0 +1,10 @@ +name: auto-ml-model-explanations-remote-compute +dependencies: +- pip: + - azureml-sdk + - azureml-train-automl + - azureml-widgets + - matplotlib + - pandas_ml + - azureml-explain-model + - azureml-contrib-explain-model diff --git a/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/train_explainer.py b/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/train_explainer.py new file mode 100644 index 00000000..1748d3eb --- /dev/null +++ b/how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/train_explainer.py @@ -0,0 +1,64 @@ +# Copyright (c) Microsoft. 
All rights reserved. +# Licensed under the MIT license. +import os + +from azureml.core.run import Run +from azureml.core.experiment import Experiment +from sklearn.externals import joblib +from azureml.core.dataset import Dataset +from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations +from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel +from azureml.explain.model.mimic_wrapper import MimicWrapper +from automl.client.core.common.constants import MODEL_PATH + + +OUTPUT_DIR = './outputs/' +os.makedirs(OUTPUT_DIR, exist_ok=True) + +# Get workspace from the run context +run = Run.get_context() +ws = run.experiment.workspace + +# Get the AutoML run object from the experiment name and the workspace +experiment = Experiment(ws, '<>') +automl_run = Run(experiment=experiment, run_id='<>') + +# Download the best model from the artifact store +automl_run.download_file(name=MODEL_PATH, output_file_path='model.pkl') + +# Load the AutoML model into memory +fitted_model = joblib.load('model.pkl') + +# Get the train dataset from the workspace +train_dataset = Dataset.get_by_name(workspace=ws, name='<>') +# Drop the labelled column to get the training set. +X_train = train_dataset.drop_columns(columns=['<>']) +y_train = train_dataset.keep_columns(columns=['<>'], validate=True) + +# Get the test dataset from the workspace +test_dataset = Dataset.get_by_name(workspace=ws, name='<>') +# Drop the labelled column to get the testing set. 
+X_test = test_dataset.drop_columns(columns=['<>']) + +# Set up the class for explaining the AutoML models +automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, '<>', + X=X_train, X_test=X_test, + y=y_train) + +# Initialize the Mimic Explainer +explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, + init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run, + features=automl_explainer_setup_obj.engineered_feature_names, + feature_maps=[automl_explainer_setup_obj.feature_map], + classes=automl_explainer_setup_obj.classes) + +# Compute the engineered explanations +engineered_explanations = explainer.explain(['local', 'global'], + eval_dataset=automl_explainer_setup_obj.X_test_transform) + +# Compute the raw explanations +raw_explanations = explainer.explain(['local', 'global'], get_raw=True, + raw_feature_names=automl_explainer_setup_obj.raw_feature_names, + eval_dataset=automl_explainer_setup_obj.X_test_transform) + +print("Engineered and raw explanations computed successfully") diff --git a/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb b/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb index 58d00ff6..1e9c85d8 100644 --- a/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb +++ b/how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb @@ -21,14 +21,16 @@ "metadata": {}, "source": [ "# Automated Machine Learning\n", - "_**Explain classification model and visualize the explanation**_\n", + "_**Explain classification model, visualize the explanation and operationalize the explainer along with the AutoML model**_\n", "\n", "## Contents\n", "1. [Introduction](#Introduction)\n", "1. [Setup](#Setup)\n", "1. [Data](#Data)\n", "1. [Train](#Train)\n", - "1. [Results](#Results)" + "1. [Results](#Results)\n", + "1. 
[Explanations](#Explanations)\n", + "1. [Operationalize](#Operationalize)" ] }, { @@ -45,7 +47,8 @@ "2. Instantiating AutoMLConfig\n", "3. Training the Model using local compute and explain the model\n", "4. Visualization model's feature importance in widget\n", - "5. Explore best model's explanation" + "5. Explore any model's explanation\n", + "6. Operationalize the AutoML model and the explanation model" ] }, { @@ -70,7 +73,8 @@ "from azureml.core.experiment import Experiment\n", "from azureml.core.workspace import Workspace\n", "from azureml.train.automl import AutoMLConfig\n", - "from azureml.core.dataset import Dataset" + "from azureml.core.dataset import Dataset\n", + "from azureml.explain.model._internal.explanation_client import ExplanationClient" ] }, { @@ -83,8 +87,6 @@ "\n", "# choose a name for experiment\n", "experiment_name = 'automl-model-explanation'\n", - "# project folder\n", - "project_folder = './sample_projects/automl-model-explanation'\n", "\n", "experiment=Experiment(ws, experiment_name)\n", "\n", @@ -94,7 +96,6 @@ "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "outputDf = pd.DataFrame(data = output, index = [''])\n", @@ -140,7 +141,7 @@ "metadata": {}, "outputs": [], "source": [ - "test_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_validate.csv\"\n", + "test_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/bankmarketing_test.csv\"\n", "test_dataset = Dataset.Tabular.from_delimited_files(test_data)\n", "X_test = test_dataset.drop_columns(columns=['y']).to_pandas_dataframe()\n", "y_test = test_dataset.keep_columns(columns=['y'], validate=True).to_pandas_dataframe()" @@ -162,8 +163,7 @@ "|**iterations**|Number of 
iterations. In each iteration Auto ML trains the data with a specific pipeline|\n", "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n", - "|**model_explainability**|Indicate to explain each trained pipeline or not |\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. |" + "|**model_explainability**|Indicates whether to explain each trained pipeline or not |" ] }, { @@ -182,8 +182,7 @@ " X = X_train, \n", " y = y_train,\n", " n_cross_validations = 5,\n", - " model_explainability=True,\n", - " path=project_folder)" + " model_explainability=True)" ] }, { @@ -209,7 +208,7 @@ "metadata": {}, "outputs": [], "source": [ - "best_run, fitted_model = local_run.get_output()" + "local_run" ] }, { @@ -266,56 +265,69 @@ "source": [ "### Best Model 's explanation\n", "\n", - "Retrieve the explanation from the best_run. And explanation information includes:\n", - "\n", - "1.\tshap_values: The explanation information generated by shap lib\n", - "2.\texpected_values: The expected value of the model applied to set of X_train data.\n", - "3.\toverall_summary: The model level feature importance values sorted in descending order\n", - "4.\toverall_imp: The feature names sorted in the same order as in overall_summary\n", - "5.\tper_class_summary: The class level feature importance values sorted in descending order. Only available for the classification case\n", - "6.\tper_class_imp: The feature names sorted in the same order as in per_class_summary. Only available for the classification case\n", - "\n", - "Note:- The **retrieve_model_explanation()** API only works in case AutoML has been configured with **'model_explainability'** flag set to **True**. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.automl.automlexplainer import retrieve_model_explanation\n", - "\n", - "shap_values, expected_values, overall_summary, overall_imp, per_class_summary, per_class_imp = \\\n", - " retrieve_model_explanation(best_run)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(overall_summary)\n", - "print(overall_imp)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(per_class_summary)\n", - "print(per_class_imp)" + "Retrieve the explanation from the *best_run* which includes explanations for engineered features and raw features." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Computing model explanations and visualizing the explanations using azureml-explain-model package\n", - "Beside retrieve the existed model explanation information, explain the model with different train/test data. The following steps will allow you to compute and visualize engineered feature importance and raw feature importance based on your test data. " + "#### Download engineered feature importance from artifact store\n", + "You can use *ExplanationClient* to download the engineered feature explanations from the artifact store of the *best_run*." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "client = ExplanationClient.from_run(best_run)\n", + "engineered_explanations = client.download_model_explanation(raw=False)\n", + "print(engineered_explanations.get_feature_importance_dict())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Download raw feature importance from artifact store\n", + "You can use *ExplanationClient* to download the raw feature explanations from the artifact store of the *best_run*." 
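The two download cells above each end by printing a feature-importance dictionary mapping feature names to importance values. As a stand-alone illustration of working with that shape of output (the feature names and values below are invented, not taken from the bank-marketing run), ranking such a dictionary in plain Python looks like this:

```python
# Hypothetical feature-importance dict, shaped like the output of
# get_feature_importance_dict(); names and values are invented.
importances = {"duration": 0.41, "nr.employed": 0.17, "emp.var.rate": 0.09, "age": 0.02}

# Rank features by absolute importance and keep the top three.
top = sorted(importances.items(), key=lambda kv: abs(kv[1]), reverse=True)[:3]
for name, value in top:
    print(f"{name}: {value:.2f}")
```

Sorting by absolute value matters because local importances can be negative (a feature pushing the prediction away from a class is still influential).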
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "client = ExplanationClient.from_run(best_run)\n", + "raw_explanations = client.download_model_explanation(raw=True)\n", + "print(raw_explanations.get_feature_importance_dict())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explanations\n", + "In this section, we will show how to compute model explanations and visualize the explanations using the azureml-explain-model package. Besides retrieving an existing model explanation for an AutoML model, you can also explain your AutoML model with different test data. The following steps will allow you to compute and visualize engineered feature importance and raw feature importance based on your test data. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Retrieve any other AutoML model from training" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_run, fitted_model = local_run.get_output(iteration=0)" + ] + }, { @@ -349,7 +361,7 @@ "metadata": {}, "source": [ "#### Initialize the Mimic Explainer for feature importance\n", - "For explaining the AutoML models, use the *MimicWrapper* from *azureml.explain.model* package. The *MimicWrapper* can be initialized with fields in *automl_explainer_setup_obj*, your workspace and a LightGBM model which acts as a surrogate model to explain the AutoML model (*fitted_model* here). The *MimicWrapper* also takes the *best_run* object where the raw and engineered explanations will be uploaded." + "For explaining the AutoML models, use the *MimicWrapper* from the *azureml.explain.model* package. The *MimicWrapper* can be initialized with fields in *automl_explainer_setup_obj*, your workspace and a LightGBM model which acts as a surrogate model to explain the AutoML model (*fitted_model* here). 
The *MimicWrapper* also takes the *automl_run* object where the raw and engineered explanations will be uploaded." ] }, { @@ -361,7 +373,7 @@ "from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n", "from azureml.explain.model.mimic_wrapper import MimicWrapper\n", "explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, \n", - " init_dataset=automl_explainer_setup_obj.X_transform, run=best_run,\n", + " init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,\n", " features=automl_explainer_setup_obj.engineered_feature_names, \n", " feature_maps=[automl_explainer_setup_obj.feature_map],\n", " classes=automl_explainer_setup_obj.classes)" @@ -413,8 +425,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Download engineered feature importance from artifact store\n", - "You can use *ExplanationClient* to download the engineered feature explanations from the artifact store of the *best_run*." + "### Operationalize\n", + "In this section, we will show how you can operationalize an AutoML model and the explainer that was used to compute the explanations in the previous section.\n", + "\n", + "#### Register the AutoML model and the scoring explainer\n", + "We use the *TreeScoringExplainer* from the *azureml.explain.model* package to create the scoring explainer, which will be used to compute the raw and engineered feature importances at inference time. Note that we initialize the scoring explainer with the *feature_map* that was computed previously. The *feature_map* will be used by the scoring explainer to return the raw feature importance.\n", + "\n", + "In the cell below, we pickle the scoring explainer and register the AutoML model and the scoring explainer with the Model Management Service." 
] }, { @@ -423,18 +440,29 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.explain.model._internal.explanation_client import ExplanationClient\n", - "client = ExplanationClient.from_run(best_run)\n", - "engineered_explanations = client.download_model_explanation(raw=False)\n", - "print(engineered_explanations.get_feature_importance_dict())" + "from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer, save\n", + "\n", + "# Initialize the ScoringExplainer\n", + "scoring_explainer = TreeScoringExplainer(explainer._internal_explainer, feature_maps=[automl_explainer_setup_obj.feature_map])\n", + "\n", + "# Pickle scoring explainer locally\n", + "save(scoring_explainer, exist_ok=True)\n", + "\n", + "# Register the trained AutoML model present in the 'outputs' folder in the artifacts\n", + "original_model = automl_run.register_model(model_name='automl_model', \n", + " model_path='outputs/model.pkl')\n", + "\n", + "# Register scoring explainer\n", + "automl_run.upload_file('scoring_explainer.pkl', 'scoring_explainer.pkl')\n", + "scoring_explainer_model = automl_run.register_model(model_name='scoring_explainer', model_path='scoring_explainer.pkl')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Download raw feature importance from artifact store\n", - "You can use *ExplanationClient* to download the raw feature explanations from the artifact store of the *best_run*." + "#### Create the conda dependencies for setting up the service\n", + "We need to create the conda dependencies comprising the *azureml-explain-model*, *azureml-train-automl* and *azureml-defaults* packages. 
" ] }, { @@ -443,10 +471,135 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.explain.model._internal.explanation_client import ExplanationClient\n", - "client = ExplanationClient.from_run(best_run)\n", - "raw_explanations = client.download_model_explanation(raw=True)\n", - "print(raw_explanations.get_feature_importance_dict())" + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "azureml_pip_packages = [\n", + " 'azureml-explain-model', 'azureml-train-automl', 'azureml-defaults'\n", + "]\n", + " \n", + "\n", + "# specify CondaDependencies obj\n", + "myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas', 'numpy', 'py-xgboost<=0.80'],\n", + " pip_packages=azureml_pip_packages,\n", + " pin_sdk_version=True)\n", + "\n", + "with open(\"myenv.yml\",\"w\") as f:\n", + " f.write(myenv.serialize_to_string())\n", + "\n", + "with open(\"myenv.yml\",\"r\") as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View your scoring file" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "with open(\"score_local_explain.py\",\"r\") as f:\n", + " print(f.read())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Deploy the service\n", + "In the cell below, we deploy the service using the conda file and the scoring file from the previous steps. 
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import InferenceConfig\n", + "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.model import Model\n", + "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={\"data\": \"Bank Marketing\", \n", + " \"method\" : \"local_explanation\"}, \n", + " description='Get local explanations for Bank marketing test data')\n", + "\n", + "inference_config = InferenceConfig(runtime= \"python\", \n", + " entry_script=\"score_local_explain.py\",\n", + " conda_file=\"myenv.yml\")\n", + "\n", + "# Use configs and models generated above\n", + "service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n", + "service.wait_for_deployment(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### View the service logs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.get_logs()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Inference using some test data\n", + "Run inference on some test data to see the predicted value from the AutoML model, and view the engineered and raw feature importances for the predicted value."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if service.state == 'Healthy':\n", + " # Serialize the first row of the test data into json\n", + " X_test_json = X_test[:1].to_json(orient='records')\n", + " print(X_test_json)\n", + " # Call the service to get the predictions and the engineered and raw explanations\n", + " output = service.run(X_test_json)\n", + " # Print the predicted value\n", + " print(output['predictions'])\n", + " # Print the engineered feature importances for the predicted value\n", + " print(output['engineered_local_importance_values'])\n", + " # Print the raw feature importances for the predicted value\n", + " print(output['raw_local_importance_values'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Delete the service\n", + "Delete the service once you have finished inferencing." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "service.delete()" ] } ], @@ -471,7 +624,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.6.7" } }, "nbformat": 4, diff --git a/how-to-use-azureml/automated-machine-learning/model-explanation/score_local_explain.py b/how-to-use-azureml/automated-machine-learning/model-explanation/score_local_explain.py new file mode 100644 index 00000000..8061f3da --- /dev/null +++ b/how-to-use-azureml/automated-machine-learning/model-explanation/score_local_explain.py @@ -0,0 +1,42 @@ +import json +import numpy as np +import pandas as pd +import os +import pickle +import azureml.train.automl +import azureml.explain.model +from azureml.train.automl.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations +from sklearn.externals import joblib +from azureml.core.model import Model + + +def init(): + + global automl_model + global scoring_explainer + + # Retrieve the 
path to the model file using the model name + # Assume original model is named original_prediction_model + automl_model_path = Model.get_model_path('automl_model') + scoring_explainer_path = Model.get_model_path('scoring_explainer') + + automl_model = joblib.load(automl_model_path) + scoring_explainer = joblib.load(scoring_explainer_path) + + +def run(raw_data): + # Get predictions and explanations for each data point + data = pd.read_json(raw_data, orient='records') + # Make prediction + predictions = automl_model.predict(data) + # Setup for inferencing explanations + automl_explainer_setup_obj = automl_setup_model_explanations(automl_model, + X_test=data, task='classification') + # Retrieve model explanations for engineered explanations + engineered_local_importance_values = scoring_explainer.explain(automl_explainer_setup_obj.X_test_transform) + # Retrieve model explanations for raw explanations + raw_local_importance_values = scoring_explainer.explain(automl_explainer_setup_obj.X_test_transform, get_raw=True) + # You can return any data type as long as it is JSON-serializable + return {'predictions': predictions.tolist(), + 'engineered_local_importance_values': engineered_local_importance_values, + 'raw_local_importance_values': raw_local_importance_values} diff --git a/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.ipynb b/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.ipynb index 832902ae..1eab80ff 100644 --- a/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.ipynb +++ b/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.ipynb @@ -473,7 +473,7 @@ "metadata": {}, "outputs": [], "source": [ - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], 
pip_packages=['azureml-train-automl'])\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-defaults','azureml-train-automl'])\n", "\n", "conda_env_file_name = 'myenv.yml'\n", "myenv.save_to_file('.', conda_env_file_name)" @@ -510,45 +510,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a Container Image\n", - "\n", - "Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n", - "or when testing a model that is under development." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script = script_file_name,\n", - " conda_file = conda_env_file_name,\n", - " tags = {'area': \"digits\", 'type': \"automl_regression\"},\n", - " description = \"Image for automl regression sample\")\n", - "\n", - "image = Image.create(name = \"automlsampleimage\",\n", - " # this is the model object \n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)\n", - "\n", - "if image.creation_state == 'Failed':\n", - " print(\"Image build log at: \" + image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy the Image as a Web Service on Azure Container Instance\n", - "\n", - "Deploy an image that contains the model and other assets needed by the service." 
+ "### Deploy the model as a Web Service on Azure Container Instance" ] }, { @@ -557,28 +519,23 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import Model\n", + "\n", + "inference_config = InferenceConfig(runtime = \"python\", \n", + " entry_script = script_file_name,\n", + " conda_file = conda_env_file_name)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", " memory_gb = 1, \n", " tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n", - " description = 'sample service for Automl Regression')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", + " description = 'sample service for Automl Regression')\n", "\n", "aci_service_name = 'automl-sample-concrete'\n", "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", + "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n", "aci_service.wait_for_deployment(True)\n", "print(aci_service.state)" ] diff --git a/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.yml b/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.yml index e29c5b3e..30e42672 100644 --- a/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.yml +++ b/how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.yml @@ -8,3 +8,4 @@ dependencies: - azureml-widgets - matplotlib - pandas_ml + - azureml-dataprep[pandas] diff 
--git a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.ipynb b/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.ipynb index 13d7581a..b92b3fe2 100644 --- a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.ipynb +++ b/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.ipynb @@ -491,7 +491,7 @@ "metadata": {}, "outputs": [], "source": [ - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-train-automl'])\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn','py-xgboost==0.80'], pip_packages=['azureml-defaults','azureml-train-automl'])\n", "\n", "conda_env_file_name = 'myenv.yml'\n", "myenv.save_to_file('.', conda_env_file_name)" @@ -528,45 +528,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a Container Image\n", - "\n", - "Next use Azure Container Instances for deploying models as a web service for quickly deploying and validating your model\n", - "or when testing a model that is under development." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script = script_file_name,\n", - " conda_file = conda_env_file_name,\n", - " tags = {'area': \"digits\", 'type': \"automl_regression\"},\n", - " description = \"Image for automl regression sample\")\n", - "\n", - "image = Image.create(name = \"automlsampleimage\",\n", - " # this is the model object \n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)\n", - "\n", - "if image.creation_state == 'Failed':\n", - " print(\"Image build log at: \" + image.image_build_log_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy the Image as a Web Service on Azure Container Instance\n", - "\n", - "Deploy an image that contains the model and other assets needed by the service." 
+ "### Deploy the model as a Web Service on Azure Container Instance" ] }, { @@ -575,28 +537,23 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import Model\n", + "\n", + "inference_config = InferenceConfig(runtime = \"python\", \n", + " entry_script = script_file_name,\n", + " conda_file = conda_env_file_name)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", " memory_gb = 1, \n", " tags = {'area': \"digits\", 'type': \"automl_regression\"}, \n", - " description = 'sample service for Automl Regression')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", + " description = 'sample service for Automl Regression')\n", "\n", "aci_service_name = 'automl-sample-hardware'\n", "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", + "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n", "aci_service.wait_for_deployment(True)\n", "print(aci_service.state)" ] diff --git a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.yml b/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.yml index 94323586..7d1b2aec 100644 --- a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.yml +++ b/how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.yml @@ -8,3 +8,4 @@ dependencies: - azureml-widgets - matplotlib - pandas_ml + - 
azureml-dataprep[pandas] diff --git a/how-to-use-azureml/automated-machine-learning/remote-amlcompute-with-onnx/auto-ml-remote-amlcompute-with-onnx.ipynb b/how-to-use-azureml/automated-machine-learning/remote-amlcompute-with-onnx/auto-ml-remote-amlcompute-with-onnx.ipynb index 32c06d56..6f0e65f5 100644 --- a/how-to-use-azureml/automated-machine-learning/remote-amlcompute-with-onnx/auto-ml-remote-amlcompute-with-onnx.ipynb +++ b/how-to-use-azureml/automated-machine-learning/remote-amlcompute-with-onnx/auto-ml-remote-amlcompute-with-onnx.ipynb @@ -93,9 +93,8 @@ "source": [ "ws = Workspace.from_config()\n", "\n", - "# Choose a name for the run history container in the workspace.\n", + "# Choose an experiment name.\n", "experiment_name = 'automl-remote-amlcompute-with-onnx'\n", - "project_folder = './project'\n", "\n", "experiment = Experiment(ws, experiment_name)\n", "\n", @@ -105,7 +104,6 @@ "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "outputDf = pd.DataFrame(data = output, index = [''])\n", @@ -179,12 +177,6 @@ "source": [ "iris = datasets.load_iris()\n", "\n", - "if not os.path.isdir('data'):\n", - " os.mkdir('data')\n", - "\n", - "if not os.path.exists(project_folder):\n", - " os.makedirs(project_folder)\n", - "\n", "X_train, X_test, y_train, y_test = train_test_split(iris.data, \n", " iris.target, \n", " test_size=0.2, \n", @@ -211,6 +203,9 @@ "X_test = pd.DataFrame(X_test, columns=['c1', 'c2', 'c3', 'c4'])\n", "y_train = pd.DataFrame(y_train, columns=['label'])\n", "\n", + "if not os.path.isdir('data'):\n", + " os.mkdir('data')\n", + "\n", "X_train.to_csv(\"data/X_train.csv\", index=False)\n", "y_train.to_csv(\"data/y_train.csv\", index=False)\n", "\n", @@ -264,7 +259,7 @@ "source": [ "## Train\n", "\n", - "You can specify 
`automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local excutions too.\n", + "You can specify `automl_settings` as `**kwargs` as well. \n", "\n", "**Note:** Set the parameter enable_onnx_compatible_models=True, if you also want to generate the ONNX compatible models. Please note, the forecasting task and TensorFlow models are not ONNX compatible yet.\n", "\n", @@ -276,7 +271,7 @@ "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", - "|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|\n", + "|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of nodes in the AmlCompute cluster.|\n", "|**enable_onnx_compatible_models**|Enable the ONNX compatible models in the experiment.|" ] }, @@ -305,7 +300,6 @@ "\n", "automl_config = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", - " path = project_folder,\n", " run_configuration=conda_run_config,\n", " X = X,\n", " y = y,\n", diff --git a/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb b/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb index c3591826..3e9bc0c6 100644 --- a/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb +++ b/how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb @@ -95,9 +95,8 @@ "source": [ "ws = Workspace.from_config()\n", "\n", - "# Choose a name for the run history container in the workspace.\n", + "# Choose an experiment name.\n", "experiment_name = 
'automl-remote-amlcompute'\n", - "project_folder = './project'\n", "\n", "experiment = Experiment(ws, experiment_name)\n", "\n", @@ -107,7 +106,6 @@ "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "outputDf = pd.DataFrame(data = output, index = [''])\n", @@ -183,10 +181,7 @@ "\n", "if not os.path.isdir('data'):\n", " os.mkdir('data')\n", - " \n", - "if not os.path.exists(project_folder):\n", - " os.makedirs(project_folder)\n", - " \n", + "\n", "pd.DataFrame(data_train.data[100:,:]).to_csv(\"data/X_train.csv\", index=False)\n", "pd.DataFrame(data_train.target[100:]).to_csv(\"data/y_train.csv\", index=False)\n", "\n", @@ -240,7 +235,7 @@ "source": [ "## Train\n", "\n", - "You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local excutions too.\n", + "You can specify `automl_settings` as `**kwargs` as well.\n", "\n", "**Note:** When using AmlCompute, you can't pass Numpy arrays directly to the fit method.\n", "\n", @@ -250,7 +245,7 @@ "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", - "|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|" + "|**max_concurrent_iterations**|Maximum number of iterations that would be executed in parallel. 
This should be less than the number of nodes in the AmlCompute cluster.|" ] }, { @@ -261,7 +256,7 @@ "source": [ "automl_settings = {\n", " \"iteration_timeout_minutes\": 10,\n", - " \"iterations\": 20,\n", + " \"iterations\": 10,\n", " \"n_cross_validations\": 5,\n", " \"primary_metric\": 'AUC_weighted',\n", " \"preprocess\": False,\n", @@ -271,7 +266,6 @@ "\n", "automl_config = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", - " path = project_folder,\n", " run_configuration=conda_run_config,\n", " X = X,\n", " y = y,\n", diff --git a/how-to-use-azureml/automated-machine-learning/sample-weight/auto-ml-sample-weight.ipynb b/how-to-use-azureml/automated-machine-learning/sample-weight/auto-ml-sample-weight.ipynb index cdcaff88..1816dd7e 100644 --- a/how-to-use-azureml/automated-machine-learning/sample-weight/auto-ml-sample-weight.ipynb +++ b/how-to-use-azureml/automated-machine-learning/sample-weight/auto-ml-sample-weight.ipynb @@ -82,8 +82,6 @@ "experiment_name = 'non_sample_weight_experiment'\n", "sample_weight_experiment_name = 'sample_weight_experiment'\n", "\n", - "project_folder = './sample_projects/sample_weight'\n", - "\n", "experiment = Experiment(ws, experiment_name)\n", "sample_weight_experiment=Experiment(ws, sample_weight_experiment_name)\n", "\n", @@ -93,7 +91,6 @@ "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "outputDf = pd.DataFrame(data = output, index = [''])\n", @@ -131,8 +128,7 @@ " n_cross_validations = 2,\n", " verbosity = logging.INFO,\n", " X = X_train, \n", - " y = y_train,\n", - " path = project_folder)\n", + " y = y_train)\n", "\n", "automl_sample_weight = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", @@ -143,8 +139,7 @@ " verbosity = 
logging.INFO,\n", " X = X_train, \n", " y = y_train,\n", - " sample_weight = sample_weight,\n", - " path = project_folder)" + " sample_weight = sample_weight)" ] }, { diff --git a/how-to-use-azureml/automated-machine-learning/sparse-data-train-test-split/auto-ml-sparse-data-train-test-split.ipynb b/how-to-use-azureml/automated-machine-learning/sparse-data-train-test-split/auto-ml-sparse-data-train-test-split.ipynb index e0772263..208490ba 100644 --- a/how-to-use-azureml/automated-machine-learning/sparse-data-train-test-split/auto-ml-sparse-data-train-test-split.ipynb +++ b/how-to-use-azureml/automated-machine-learning/sparse-data-train-test-split/auto-ml-sparse-data-train-test-split.ipynb @@ -87,8 +87,6 @@ "\n", "# choose a name for the experiment\n", "experiment_name = 'sparse-data-train-test-split'\n", - "# project folder\n", - "project_folder = './sample_projects/sparse-data-train-test-split'\n", "\n", "experiment = Experiment(ws, experiment_name)\n", "\n", @@ -98,7 +96,6 @@ "output['Workspace'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "outputDf = pd.DataFrame(data = output, index = [''])\n", @@ -165,8 +162,7 @@ "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", "|**y**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n", "|**X_valid**|(sparse) array-like, shape = [n_samples, n_features] for the custom validation set.|\n", - "|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. 
You can specify a new empty folder.|" + "|**y_valid**|(sparse) array-like, shape = [n_samples, ], Multi-class targets.|" ] }, { @@ -185,8 +181,7 @@ " X = X_train, \n", " y = y_train,\n", " X_valid = X_valid, \n", - " y_valid = y_valid, \n", - " path = project_folder)" + " y_valid = y_valid)" ] }, { diff --git a/how-to-use-azureml/automated-machine-learning/sql-server/energy-demand/PredictEnergyDemand.sql b/how-to-use-azureml/automated-machine-learning/sql-server/energy-demand/PredictEnergyDemand.sql deleted file mode 100644 index 1161a758..00000000 --- a/how-to-use-azureml/automated-machine-learning/sql-server/energy-demand/PredictEnergyDemand.sql +++ /dev/null @@ -1,17 +0,0 @@ --- This shows using the AutoMLPredict stored procedure to predict using a forecasting model for the nyc_energy dataset. - -DECLARE @Model NVARCHAR(MAX) = (SELECT TOP 1 Model FROM dbo.aml_model - WHERE ExperimentName = 'automl-sql-forecast' - ORDER BY CreatedDate DESC) - -EXEC dbo.AutoMLPredict @input_query=' -SELECT CAST(timeStamp AS NVARCHAR(30)) AS timeStamp, - demand, - precip, - temp -FROM nyc_energy -WHERE demand IS NOT NULL AND precip IS NOT NULL AND temp IS NOT NULL -AND timeStamp >= ''2017-02-01''', -@label_column='demand', -@model=@model -WITH RESULT SETS ((timeStamp NVARCHAR(30), actual_demand FLOAT, precip FLOAT, temp FLOAT, predicted_demand FLOAT)) \ No newline at end of file diff --git a/how-to-use-azureml/automated-machine-learning/subsampling/auto-ml-subsampling-local.ipynb b/how-to-use-azureml/automated-machine-learning/subsampling/auto-ml-subsampling-local.ipynb index b45e186a..dd5d5203 100644 --- a/how-to-use-azureml/automated-machine-learning/subsampling/auto-ml-subsampling-local.ipynb +++ b/how-to-use-azureml/automated-machine-learning/subsampling/auto-ml-subsampling-local.ipynb @@ -77,9 +77,8 @@ "source": [ "ws = Workspace.from_config()\n", "\n", - "# Choose a name for the experiment and specify the project folder.\n", + "# Choose a name for the experiment.\n", 
"experiment_name = 'automl-subsampling'\n", - "project_folder = './sample_projects/automl-subsampling'\n", "\n", "experiment = Experiment(ws, experiment_name)\n", "\n", @@ -89,7 +88,6 @@ "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", "output['Experiment Name'] = experiment.name\n", "pd.set_option('display.max_colwidth', -1)\n", "pd.DataFrame(data = output, index = ['']).T" @@ -150,8 +148,7 @@ " verbosity = logging.INFO,\n", " X = X_train, \n", " y = y_train,\n", - " enable_subsampling=True,\n", - " path = project_folder)" + " enable_subsampling=True)" ] }, { @@ -170,13 +167,6 @@ "source": [ "local_run = experiment.submit(automl_config, show_output = True)" ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { diff --git a/how-to-use-azureml/azure-databricks/Databricks_AMLSDK_1-4_6.dbc b/how-to-use-azureml/azure-databricks/Databricks_AMLSDK_1-4_6.dbc deleted file mode 100644 index 74d80537..00000000 Binary files a/how-to-use-azureml/azure-databricks/Databricks_AMLSDK_1-4_6.dbc and /dev/null differ diff --git a/how-to-use-azureml/azure-databricks/README.md b/how-to-use-azureml/azure-databricks/README.md index 3a063f0c..4749c0c6 100644 --- a/how-to-use-azureml/azure-databricks/README.md +++ b/how-to-use-azureml/azure-databricks/README.md @@ -1,73 +1,33 @@ -Azure Databricks is a managed Spark offering on Azure and customers already use it for advanced analytics. It provides a collaborative Notebook based environment with CPU or GPU based compute cluster. +Azure Databricks is a managed Spark offering on Azure and customers already use it for advanced analytics. It provides a collaborative Notebook based environment with CPU or GPU based compute cluster. -In this section, you will find sample notebooks on how to use Azure Machine Learning SDK with Azure Databricks. 
You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use Automated ML capability (**public preview**) of Azure ML SDK with Azure Databricks. +In this section, you will find sample notebooks on how to use Azure Machine Learning SDK with Azure Databricks. You can train a model using Spark MLlib and then deploy the model to ACI/AKS from within Azure Databricks. You can also use Automated ML capability (**public preview**) of Azure ML SDK with Azure Databricks. -- Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning. -- You can keep the data within the same cluster. -- You can leverage the local worker nodes with autoscale and auto termination capabilities. -- You can use multiple cores of your Azure Databricks cluster to perform simultenous training. -- You can further tune the model generated by automated machine learning if you chose to. -- Every run (including the best run) is available as a pipeline, which you can tune further if needed. +- Customers who use Azure Databricks for advanced analytics can now use the same cluster to run experiments with or without automated machine learning. +- You can keep the data within the same cluster. +- You can leverage the local worker nodes with autoscale and auto termination capabilities. +- You can use multiple cores of your Azure Databricks cluster to perform simultaneous training. +- You can further tune the model generated by automated machine learning if you choose to. +- Every run (including the best run) is available as a pipeline, which you can tune further if needed. - The model trained using Azure Databricks can be registered in Azure ML SDK workspace and then deployed to Azure managed compute (ACI or AKS) using the Azure Machine learning SDK.
Please follow our [Azure doc](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#azure-databricks) to install the sdk in your Azure Databricks cluster before trying any of the sample notebooks. -**Single file** - +**Single file** - The following archive contains all the sample notebooks. You can run the notebooks after importing [DBC](Databricks_AMLSDK_1-4_6.dbc) in your Databricks workspace instead of downloading individually. -Notebooks 1-4 have to be run sequentially & are related to Income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to data prep, train and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks. +Notebooks 1-4 have to be run sequentially & are related to an income prediction experiment based on this [dataset](https://archive.ics.uci.edu/ml/datasets/adult) and demonstrate how to prep data, train, and operationalize a Spark ML model with Azure ML Python SDK from within Azure Databricks. Notebook 6 is an Automated ML sample notebook for Classification. Learn more about [how to use Azure Databricks as a development environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for Azure Machine Learning service. -**Databricks as a Compute Target from Azure ML Pipelines** -You can use Azure Databricks as a compute target from [Azure Machine Learning Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). Take a look at this notebook for details: [aml-pipelines-use-databricks-as-compute-target.ipynb](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb).
- -# Linked Azure Databricks and Azure Machine Learning Workspaces (Preview) -Customers can now link Azure Databricks and AzureML Workspaces to better enable cross-Azure ML scenarios by [managing their tracking data in a single place when using the MLflow client](https://mlflow.org/docs/latest/tracking.html#mlflow-tracking) - the Azure ML workspace. - -## Linking the Workspaces (Admin operation) - -1. The Azure Databricks Azure portal blade now includes a new button to link an Azure ML workspace. -![New ADB Portal Link button](./img/adb-link-button.png) -2. Both a new or existing Azure ML Workspace can be linked in the resulting prompt. Follow any instructions to set up the Azure ML Workspace. -![Link Prompt](./img/link-prompt.png) -3. After a successful link operation, you should see the Azure Databricks overview reflect the linked status -![Linked Successfully](./img/adb-successful-link.png) - -## Configure MLflow to send data to Azure ML (All roles) - -1. Add azureml-mlflow as a library to any notebook or cluster that should send data to Azure ML. You can do this via: - 1. [DBUtils](https://docs.azuredatabricks.net/user-guide/dev-tools/dbutils.html#dbutils-library) - ``` - dbutils.library.installPyPI("azureml-mlflow") - dbutils.library.restartPython() # Removes Python state - ``` - 2. [Cluster Libraries](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster) - ![Cluster Library](./img/cluster-library.png) -2. [Set the MLflow tracking URI](https://mlflow.org/docs/latest/tracking.html#where-runs-are-recorded) to the following scheme: - ``` - adbazureml://${azuremlRegion}.experiments.azureml.net/history/v1.0/subscriptions/${azuremlSubscriptionId}/resourceGroups/${azuremlResourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${azuremlWorkspaceName} - ``` - 1. 
You can automatically configure this on your clusters for all subsequent notebook sessions using this helper script instead of manually setting the tracking URI in the notebook: - * [AzureML Tracking Cluster Init Script](./linking/README.md) -3. If configured correctly, you'll now be able to see your MLflow tracking data in both Azure ML (via the REST API and all clients) and Azure Databricks (in the MLflow UI and using the MLflow client) - - -## Known Preview Limitations -While we roll this experience out to customers for feedback, there are some known limitations we'd love comments on in addition to any other issues seen in your workflow. -### 1-to-1 Workspace linking -Currently, an Azure ML Workspace can only be linked to one Azure Databricks Workspace at a time. -### Data synchronization -At the moment, data is only generated in the Azure Machine Learning workspace for tracking. Editing tags via the Azure Databricks MLflow UI won't be reflected in the Azure ML UI. -### Java and R support -The experience currently is only available from the Python MLflow client. +**Databricks as a Compute Target from AML Pipelines** +You can use Azure Databricks as a compute target from [Azure Machine Learning Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). Take a look at this notebook for details: [aml-pipelines-use-databricks-as-compute-target.ipynb](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb). For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks). 
**Please let us know your feedback.** + - -![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/README.png) +![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/README.png) \ No newline at end of file diff --git a/how-to-use-azureml/azure-databricks/amlsdk/build-model-run-history-03.ipynb b/how-to-use-azureml/azure-databricks/amlsdk/build-model-run-history-03.ipynb index 6220209b..6584e388 100644 --- a/how-to-use-azureml/azure-databricks/amlsdk/build-model-run-history-03.ipynb +++ b/how-to-use-azureml/azure-databricks/amlsdk/build-model-run-history-03.ipynb @@ -11,13 +11,6 @@ "Licensed under the MIT License." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/build-model-run-history-03.png)" - ] - }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb b/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb index 7c849e3f..b5464501 100644 --- a/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb +++ b/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb @@ -11,13 +11,6 @@ "Licensed under the MIT License." 
] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/deploy-to-aci-04.png)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -178,42 +171,18 @@ "source": [ "#deploy to ACI\n", "from azureml.core.webservice import AciWebservice, Webservice\n", + "from azureml.core.model import Model, InferenceConfig\n", "\n", - "myaci_config = AciWebservice.deploy_configuration(\n", - " cpu_cores = 2, \n", - " memory_gb = 2, \n", - " tags = {'name':'Databricks Azure ML ACI'}, \n", - " description = 'This is for ADB and AML example. Azure Databricks & Azure ML SDK demo with ACI by Parashar.')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# this will take 10-15 minutes to finish\n", + "myaci_config = AciWebservice.deploy_configuration(cpu_cores = 2, \n", + " memory_gb = 2, \n", + " tags = {'name':'Databricks Azure ML ACI'}, \n", + " description = 'This is for ADB and AML example.')\n", "\n", - "service_name = \"aciws\"\n", - "runtime = \"spark-py\" \n", - "driver_file = \"score_sparkml.py\"\n", - "my_conda_file = \"mydeployenv.yml\"\n", - "\n", - "# image creation\n", - "from azureml.core.image import ContainerImage\n", - "myimage_config = ContainerImage.image_configuration(execution_script = driver_file, \n", - " runtime = runtime, \n", - " conda_file = my_conda_file)\n", - "\n", - "# Webservice creation\n", - "myservice = Webservice.deploy_from_model(\n", - " workspace=ws, \n", - " name=service_name,\n", - " deployment_config = myaci_config,\n", - " models = [mymodel],\n", - " image_config = myimage_config\n", - " )\n", + "inference_config = InferenceConfig(runtime= 'spark-py', \n", + " entry_script='score_sparkml.py',\n", + " conda_file='mydeployenv.yml')\n", "\n", + "myservice = Model.deploy(ws, 'aciws', [mymodel], 
inference_config, myaci_config)\n", "myservice.wait_for_deployment(show_output=True)" ] }, diff --git a/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.ipynb b/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-05.ipynb similarity index 60% rename from how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.ipynb rename to how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-05.ipynb index 77cd7a65..a46e9685 100644 --- a/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.ipynb +++ b/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-05.ipynb @@ -11,13 +11,6 @@ "Licensed under the MIT License." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.png)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -70,11 +63,23 @@ "metadata": {}, "outputs": [], "source": [ - "# List images by ws\n", + "#Register the model\n", + "import os\n", + "from azureml.core.model import Model\n", "\n", - "from azureml.core.image import ContainerImage\n", - "for i in ContainerImage.list(workspace = ws):\n", - " print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))" + "model_name = \"AdultCensus_runHistory_aks.mml\" # \n", + "model_name_dbfs = os.path.join(\"/dbfs\", model_name)\n", + "\n", + "print(\"copy model from dbfs to local\")\n", + "model_local = \"file:\" + os.getcwd() + \"/\" + model_name\n", + "dbutils.fs.cp(model_name, model_local, True)\n", + "\n", + "mymodel = Model.register(model_path = model_name, # this points to a local file\n", + " model_name = model_name, # this is the name the model is registered as, am using same name for both path and name. 
\n", + " description = \"ADB trained model by Parashar\",\n", + " workspace = ws)\n", + "\n", + "print(mymodel.name, mymodel.description, mymodel.version)" ] }, { @@ -83,8 +88,69 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.image import Image\n", - "myimage = Image(workspace=ws, name=\"aciws\")" + "#%%writefile score_sparkml.py\n", + "score_sparkml = \"\"\"\n", + " \n", + "import json\n", + " \n", + "def init():\n", + " # One-time initialization of PySpark and predictive model\n", + " import pyspark\n", + " from azureml.core.model import Model\n", + " from pyspark.ml import PipelineModel\n", + " \n", + " global trainedModel\n", + " global spark\n", + " \n", + " spark = pyspark.sql.SparkSession.builder.appName(\"ADB and AML notebook by Parashar\").getOrCreate()\n", + " model_name = \"{model_name}\" #interpolated\n", + " model_path = Model.get_model_path(model_name)\n", + " trainedModel = PipelineModel.load(model_path)\n", + " \n", + "def run(input_json):\n", + " if isinstance(trainedModel, Exception):\n", + " return json.dumps({{\"trainedModel\":str(trainedModel)}})\n", + " \n", + " try:\n", + " sc = spark.sparkContext\n", + " input_list = json.loads(input_json)\n", + " input_rdd = sc.parallelize(input_list)\n", + " input_df = spark.read.json(input_rdd)\n", + " \n", + " # Compute prediction\n", + " prediction = trainedModel.transform(input_df)\n", + " #result = prediction.first().prediction\n", + " predictions = prediction.collect()\n", + " \n", + " #Get each scored result\n", + " preds = [str(x['prediction']) for x in predictions]\n", + " result = \",\".join(preds)\n", + " # you can return any data type as long as it is JSON-serializable\n", + " return result.tolist()\n", + " except Exception as e:\n", + " result = str(e)\n", + " return result\n", + " \n", + "\"\"\".format(model_name=model_name)\n", + " \n", + "exec(score_sparkml)\n", + " \n", + "with open(\"score_sparkml.py\", \"w\") as file:\n", + " file.write(score_sparkml)" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies \n", + "\n", + "myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) #showing how to add libs as an eg. - not needed for this model.\n", + "\n", + "with open(\"mydeployenv.yml\",\"w\") as f:\n", + " f.write(myacienv.serialize_to_string())" ] }, { @@ -120,34 +186,17 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.webservice import Webservice\n", - "help( Webservice.deploy_from_image)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice, AksWebservice\n", - "from azureml.core.image import ContainerImage\n", + "#deploy to AKS\n", + "from azureml.core.webservice import AksWebservice, Webservice\n", + "from azureml.core.model import InferenceConfig\n", "\n", - "#Set the web service configuration (using default here with app insights)\n", "aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)\n", "\n", - "#unique service name\n", - "service_name ='ps-aks-service'\n", - "\n", - "# Webservice creation using single command, there is a variant to use image directly as well.\n", - "aks_service = Webservice.deploy_from_image(\n", - " workspace=ws, \n", - " name=service_name,\n", - " deployment_config = aks_config,\n", - " image = myimage,\n", - " deployment_target = aks_target\n", - " )\n", + "inference_config = InferenceConfig(runtime = 'spark-py', \n", + " entry_script ='score_sparkml.py',\n", + " conda_file ='mydeployenv.yml')\n", "\n", + "aks_service = Model.deploy(ws, 'ps-aks-service', [mymodel], inference_config, aks_config, aks_target)\n", "aks_service.wait_for_deployment(show_output=True)" ] }, @@ -206,7 +255,6 @@ "source": [ "#comment to not delete the web service\n", "aks_service.delete()\n", - "#image.delete()\n", 
"#model.delete()\n", "aks_target.delete() " ] diff --git a/how-to-use-azureml/azure-databricks/amlsdk/ingest-data-02.ipynb b/how-to-use-azureml/azure-databricks/amlsdk/ingest-data-02.ipynb index b3fb67d6..ee2996cf 100644 --- a/how-to-use-azureml/azure-databricks/amlsdk/ingest-data-02.ipynb +++ b/how-to-use-azureml/azure-databricks/amlsdk/ingest-data-02.ipynb @@ -11,13 +11,6 @@ "Licensed under the MIT License." ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/ingest-data-02.png)" - ] - }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/azure-databricks/amlsdk/installation-and-configuration-01.ipynb b/how-to-use-azureml/azure-databricks/amlsdk/installation-and-configuration-01.ipynb index db4e2974..1db74aa5 100644 --- a/how-to-use-azureml/azure-databricks/amlsdk/installation-and-configuration-01.ipynb +++ b/how-to-use-azureml/azure-databricks/amlsdk/installation-and-configuration-01.ipynb @@ -11,13 +11,6 @@ "Licensed under the MIT License." 
] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/azure-databricks/amlsdk/installation-and-configuration-01.png)" - ] - }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb b/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb index f765cccf..5ade28a3 100644 --- a/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb +++ b/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb @@ -638,7 +638,7 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies\n", "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-sdk[automl]'])\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults', 'azureml-sdk[automl]'])\n", "\n", "conda_env_file_name = 'mydeployenv.yml'\n", "myenv.save_to_file('.', conda_env_file_name)" @@ -648,30 +648,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Create ACI config" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#deploy to ACI\n", - "from azureml.core.webservice import AciWebservice, Webservice\n", - "\n", - "myaci_config = AciWebservice.deploy_configuration(\n", - " cpu_cores = 2, \n", - " memory_gb = 2, \n", - " tags = {'name':'Databricks Azure ML ACI'}, \n", - " description = 'This is for ADB and AutoML example.')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy the Image as a Web Service on Azure Container Instance\n", + "## Deploy the model as a Web Service on Azure Container Instance\n", "Replace servicename with any 
meaningful name of service" ] }, @@ -683,30 +660,26 @@ "source": [ "# this will take 10-15 minutes to finish\n", "\n", + "from azureml.core.webservice import AciWebservice, Webservice\n", + "from azureml.core.model import InferenceConfig\n", + "from azureml.core.model import Model\n", "import uuid\n", - "from azureml.core.image import ContainerImage\n", + "\n", + "myaci_config = AciWebservice.deploy_configuration(\n", + " cpu_cores = 2, \n", + " memory_gb = 2, \n", + " tags = {'name':'Databricks Azure ML ACI'}, \n", + " description = 'This is for ADB and AutoML example.')\n", + "\n", + "inference_config = InferenceConfig(runtime= 'spark-py', \n", + " entry_script='score.py',\n", + " conda_file='mydeployenv.yml')\n", "\n", "guid = str(uuid.uuid4()).split(\"-\")[0]\n", "service_name = \"myservice-{}\".format(guid)\n", "print(\"Creating service with name: {}\".format(service_name))\n", - "runtime = \"spark-py\" \n", - "driver_file = \"score.py\"\n", - "my_conda_file = \"mydeployenv.yml\"\n", - "\n", - "# image creation\n", - "myimage_config = ContainerImage.image_configuration(execution_script = driver_file, \n", - " runtime = runtime, \n", - " conda_file = 'mydeployenv.yml')\n", - "\n", - "# Webservice creation\n", - "myservice = Webservice.deploy_from_model(\n", - " workspace=ws, \n", - " name=service_name,\n", - " deployment_config = myaci_config,\n", - " models = [model],\n", - " image_config = myimage_config\n", - " )\n", "\n", + "myservice = Model.deploy(ws, service_name, [model], inference_config, myaci_config)\n", "myservice.wait_for_deployment(show_output=True)" ] }, diff --git a/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/README.md b/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/README.md deleted file mode 100644 index 440463bb..00000000 --- a/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/README.md +++ /dev/null @@ -1,16 +0,0 @@ -# Using Databricks as a Compute Target from Azure 
Machine Learning Pipeline -To use Databricks as a compute target from Azure Machine Learning Pipeline, a DatabricksStep is used. This notebook demonstrates the use of DatabricksStep in Azure Machine Learning Pipeline. - -The notebook will show: - -1. Running an arbitrary Databricks notebook that the customer has in Databricks workspace -2. Running an arbitrary Python script that the customer has in DBFS -3. Running an arbitrary Python script that is available on local computer (will upload to DBFS, and then run in Databricks) -4. Running a JAR job that the customer has in DBFS. - -## Before you begin: -1. **Create an Azure Databricks workspace** in the same subscription where you have your Azure Machine Learning workspace. -You will need details of this workspace later on to define DatabricksStep. [More information](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.Databricks%2Fworkspaces). -2. **Create PAT (access token)** at the Azure Databricks portal. [More information](https://docs.databricks.com/api/latest/authentication.html#generate-a-token). -3. **Add demo notebook to ADB** This notebook has a sample you can use as is. Launch Azure Databricks attached to your Azure Machine Learning workspace and add a new notebook. -4. 
**Create/attach a Blob storage** for use from ADB \ No newline at end of file diff --git a/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb b/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb index 3e2ea7ac..51a46fc7 100644 --- a/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb +++ b/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb @@ -403,7 +403,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "databricksstep-remarks-sample" + ] + }, "outputs": [], "source": [ "notebook_path=os.getenv(\"DATABRICKS_NOTEBOOK_PATH\", \"\") # Databricks notebook path\n", diff --git a/how-to-use-azureml/azure-databricks/img/adb-link-button.png b/how-to-use-azureml/azure-databricks/img/adb-link-button.png deleted file mode 100755 index 03cc1d4e..00000000 Binary files a/how-to-use-azureml/azure-databricks/img/adb-link-button.png and /dev/null differ diff --git a/how-to-use-azureml/azure-databricks/img/adb-successful-link.png b/how-to-use-azureml/azure-databricks/img/adb-successful-link.png deleted file mode 100755 index f2d62cbf..00000000 Binary files a/how-to-use-azureml/azure-databricks/img/adb-successful-link.png and /dev/null differ diff --git a/how-to-use-azureml/azure-databricks/img/cluster-library.png b/how-to-use-azureml/azure-databricks/img/cluster-library.png deleted file mode 100755 index b86c5f51..00000000 Binary files a/how-to-use-azureml/azure-databricks/img/cluster-library.png and /dev/null differ diff --git a/how-to-use-azureml/azure-databricks/img/link-prompt.png b/how-to-use-azureml/azure-databricks/img/link-prompt.png deleted file mode 100755 index 3384edc1..00000000 Binary files 
a/how-to-use-azureml/azure-databricks/img/link-prompt.png and /dev/null differ diff --git a/how-to-use-azureml/azure-databricks/linking/README.md b/how-to-use-azureml/azure-databricks/linking/README.md deleted file mode 100644 index 5bcb788f..00000000 --- a/how-to-use-azureml/azure-databricks/linking/README.md +++ /dev/null @@ -1,56 +0,0 @@ -# Adding an init script to an Azure Databricks cluster - -The [azureml-cluster-init.sh](./azureml-cluster-init.sh) script configures the environment to -1. Use the configured AzureML Workspace with Workspace.from_config() -2. Set the default MLflow Tracking Server to be the AzureML managed one - -Modify azureml-cluster-init.sh by providing the values for region, subscriptionId, resourceGroupName, and workspaceName of your target Azure ML workspace in the highlighted section at the top of the script. - -To create the Azure Databricks cluster-scoped init script - -1. Create the base directory you want to store the init script in if it does not exist. - ``` - dbutils.fs.mkdirs("dbfs:/databricks//") - ``` - -2. Create the script by copying the contents of azureml-cluster-init.sh - ``` - dbutils.fs.put("/databricks//azureml-cluster-init.sh",""" - - """, True) - -3. Check that the script exists. - ``` - display(dbutils.fs.ls("dbfs:/databricks//azureml-cluster-init.sh")) - ``` - -1. Configure the cluster to run the script. - * Using the cluster configuration page - 1. On the cluster configuration page, click the Advanced Options toggle. - 1. At the bottom of the page, click the Init Scripts tab. - 1. In the Destination drop-down, select a destination type. Example: 'DBFS' - 1. Specify a path to the init script. - ``` - dbfs:/databricks//azureml-cluster-init.sh - ``` - 1. Click Add - - * Using the API. 
- ``` - curl -n -X POST -H 'Content-Type: application/json' -d '{ - "cluster_id": "", - "num_workers": , - "spark_version": "", - "node_type_id": "", - "cluster_log_conf": { - "dbfs" : { - "destination": "dbfs:/cluster-logs" - } - }, - "init_scripts": [ { - "dbfs": { - "destination": "dbfs:/databricks//azureml-cluster-init.sh" - } - } ] - }' https:///api/2.0/clusters/edit - ``` diff --git a/how-to-use-azureml/azure-databricks/linking/azureml-cluster-init.sh b/how-to-use-azureml/azure-databricks/linking/azureml-cluster-init.sh deleted file mode 100644 index 36ecfa52..00000000 --- a/how-to-use-azureml/azure-databricks/linking/azureml-cluster-init.sh +++ /dev/null @@ -1,24 +0,0 @@ -#!/bin/bash -# This script configures the environment to -# 1. Use the configured AzureML Workspace with azureml.core.Workspace.from_config() -# 2. Set the default MLflow Tracking Server to be the AzureML managed one - -############## START CONFIGURATION ################# -# Provide the required *AzureML* workspace information -region="" # example: westus2 -subscriptionId="" # example: bcb65f42-f234-4bff-91cf-9ef816cd9936 -resourceGroupName="" # example: dev-rg -workspaceName="" # example: myazuremlws - -# Optional config directory -configLocation="/databricks/config.json" -############### END CONFIGURATION ################# - - -# Drop the workspace configuration on the cluster -sudo touch $configLocation -sudo echo {\\"subscription_id\\": \\"${subscriptionId}\\", \\"resource_group\\": \\"${resourceGroupName}\\", \\"workspace_name\\": \\"${workspaceName}\\"} > $configLocation - -# Set the MLflow Tracking URI -trackingUri="adbazureml://${region}.experiments.azureml.net/history/v1.0/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${workspaceName}" -sudo echo export MLFLOW_TRACKING_URI=${trackingUri} >> /databricks/spark/conf/spark-env.sh diff --git a/how-to-use-azureml/azure-hdi/README.md 
b/how-to-use-azureml/azure-hdi/README.md deleted file mode 100644 index 91a7582d..00000000 --- a/how-to-use-azureml/azure-hdi/README.md +++ /dev/null @@ -1,55 +0,0 @@ -**Azure HDInsight** - -Azure HDInsight is a fully managed cloud Hadoop & Spark offering that gives -optimized open-source analytic clusters for Spark, Hive, MapReduce, HBase, -Storm, and Kafka. HDInsight Spark clusters provide kernels that you can use with -the Jupyter notebook on [Apache Spark](https://spark.apache.org/) for testing -your applications. - -How Azure HDInsight works with Azure Machine Learning service - -- You can train a model using Spark clusters and deploy the model to ACI/AKS - from within Azure HDInsight. - -- You can also use [automated machine - learning](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-automated-ml) capabilities - integrated within Azure HDInsight. - -You can use Azure HDInsight as a compute target from an [Azure Machine Learning -pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). - -**Set up your HDInsight cluster** - -Create an [HDInsight -cluster](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters) - -**Quick create: Basic cluster setup** - -This article walks you through setup in the [Azure -portal](https://portal.azure.com/), where you can create an HDInsight cluster -using *Quick create* or *Custom*. - -![hdinsight create options custom quick create](media/0a235b34c0b881117e51dc31a232dbe1.png) - -Follow the instructions on the screen to do a basic cluster setup.
Details are -provided below for: - -- [Resource group - name](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters#resource-group-name) - -- [Cluster types and - configuration](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters#cluster-types) - (Cluster must be Spark 2.3 (HDI 3.6) or greater) - -- Cluster login and SSH username - -- [Location](https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters#location) - -**Import the sample HDI notebook in Jupyter** - -**Important links:** - -Create HDI cluster: - - -![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-hdi/README.png) diff --git a/how-to-use-azureml/azure-hdi/automl_hdi_local_classification.ipynb b/how-to-use-azureml/azure-hdi/automl_hdi_local_classification.ipynb deleted file mode 100644 index 8f3fa251..00000000 --- a/how-to-use-azureml/azure-hdi/automl_hdi_local_classification.ipynb +++ /dev/null @@ -1,612 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-hdi/automl_hdi_local_classification.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Automated ML on Azure HDInsight\n", - "\n", - "In this example we use the scikit-learn's digit dataset to showcase how you can use AutoML for a simple classification problem.\n", - "\n", - "In this notebook you will learn how to:\n", - "1. Create Azure Machine Learning Workspace object and initialize your notebook directory to easily reload this object from a configuration file.\n", - "2. 
Create an `Experiment` in an existing `Workspace`.\n", - "3. Configure Automated ML using `AutoMLConfig`.\n", - "4. Train the model using Azure HDInsight.\n", - "5. Explore the results.\n", - "6. Test the best fitted model.\n", - "\n", - "Before running this notebook, please follow the readme for using Automated ML on Azure HDI for installing necessary libraries to your cluster." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Check the Azure ML Core SDK Version to Validate Your Installation" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "import pandas as pd\n", - "from azureml.core.authentication import ServicePrincipalAuthentication\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun\n", - "import logging\n", - "\n", - "print(\"SDK Version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize an Azure ML Workspace\n", - "### What is an Azure ML Workspace and Why Do I Need One?\n", - "\n", - "An Azure ML workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, operationalization, and the monitoring of operationalized models.\n", - "\n", - "\n", - "### What do I Need?\n", - "\n", - "To create or access an Azure ML workspace, you will need to import the Azure ML library and specify following information:\n", - "* A name for your workspace. You can choose one.\n", - "* Your subscription id. 
Use the `id` value from the `az account show` command output above.\n", - "* The resource group name. The resource group organizes Azure resources and provides a default region for the resources in the group. The resource group will be created if it doesn't exist. Resource groups can be created and viewed in the [Azure portal](https://portal.azure.com)\n", - "* Supported regions include `eastus2`, `eastus`,`westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "import pandas as pd\n", - "from azureml.core.authentication import ServicePrincipalAuthentication\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun\n", - "import logging\n", - "\n", - "subscription_id = \"\" #you should be owner or contributor\n", - "resource_group = \"\" #you should be owner or contributor\n", - "workspace_name = \"\" #your workspace name\n", - "workspace_region = \"\" #your region\n", - "\n", - "\n", - "tenant_id = \"\"\n", - "app_id = \"\"\n", - "app_key = \"\"\n", - "\n", - "auth_sp = ServicePrincipalAuthentication(tenant_id = tenant_id,\n", - " service_principal_id = app_id,\n", - " service_principal_password = app_key)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creating a Workspace\n", - "If you already have access to an Azure ML workspace you want to use, you can skip this cell. Otherwise, this cell will create an Azure ML workspace for you in the specified subscription, provided you have the correct permissions for the given `subscription_id`.\n", - "\n", - "This will fail when:\n", - "1. The workspace already exists.\n", - "2. You do not have permission to create a workspace in the resource group.\n", - "3. 
You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription.\n", - "\n", - "If workspace creation fails for any reason other than already existing, please work with your IT administrator to provide you with the appropriate permissions or to provision the required resources.\n", - "\n", - "**Note:** Creation of a new workspace can take several minutes." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configuring Your Local Environment\n", - "You can validate that you have access to the specified workspace and write a configuration file to the default configuration location, `./aml_config/config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace(workspace_name = workspace_name,\n", - " subscription_id = subscription_id,\n", - " resource_group = resource_group,\n", - " auth = auth_sp)\n", - "\n", - "# Persist the subscription id, resource group name, and workspace name in aml_config/config.json.\n", - "ws.write_config()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a Folder to Host Sample Projects\n", - "Finally, create a folder where all the sample projects will be hosted." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "sample_projects_folder = './sample_projects'\n", - "\n", - "if not os.path.isdir(sample_projects_folder):\n", - " os.mkdir(sample_projects_folder)\n", - " \n", - "print('Sample projects will be created in {}.'.format(sample_projects_folder))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create an Experiment\n", - "\n", - "As part of the setup you have already created an Azure ML `Workspace` object. 
For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import logging\n", - "import os\n", - "import random\n", - "import time\n", - "\n", - "from matplotlib import pyplot as plt\n", - "from matplotlib.pyplot import imshow\n", - "import numpy as np\n", - "import pandas as pd\n", - "\n", - "import azureml.core\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.train.automl import AutoMLConfig\n", - "from azureml.train.automl.run import AutoMLRun" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Choose a name for the experiment and specify the project folder.\n", - "experiment_name = 'automl-local-classification-hdi'\n", - "project_folder = './sample_projects/automl-local-classification-hdi'\n", - "\n", - "experiment = Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace Name'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Experiment Name'] = experiment.name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "pd.DataFrame(data = output, index = ['']).T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Registering a Datastore" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A datastore saves connection information for a storage service (e.g. Azure Blob, Azure Data Lake, Azure SQL) to your workspace so that you can access the service without exposing credentials in your code. The first thing you will need to do is register a datastore; see our [python SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) for how to register datastores. __Note: as a security best practice, do not check code that registers datastores with secrets into your source control__\n", - "\n", - "The code below registers a datastore pointing to a publicly readable blob container."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Datastore\n", - "\n", - "datastore_name = 'demo_training'\n", - "container_name = 'digits' \n", - "account_name = 'automlpublicdatasets'\n", - "Datastore.register_azure_blob_container(\n", - " workspace = ws, \n", - " datastore_name = datastore_name, \n", - " container_name = container_name, \n", - " account_name = account_name,\n", - " overwrite = True\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Below is an example on how to register a private blob container\n", - "```python\n", - "datastore = Datastore.register_azure_blob_container(\n", - " workspace = ws, \n", - " datastore_name = 'example_datastore', \n", - " container_name = 'example-container', \n", - " account_name = 'storageaccount',\n", - " account_key = 'accountkey'\n", - ")\n", - "```\n", - "The example below shows how to register an Azure Data Lake store. 
Please make sure you have granted the necessary permissions for the service principal to access the data lake.\n", - "```python\n", - "datastore = Datastore.register_azure_data_lake(\n", - " workspace = ws,\n", - " datastore_name = 'example_datastore',\n", - " store_name = 'adlsstore',\n", - " tenant_id = 'tenant-id-of-service-principal',\n", - " client_id = 'client-id-of-service-principal',\n", - " client_secret = 'client-secret-of-service-principal'\n", - ")\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load Training Data Using DataPrep" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Automated ML takes a Dataflow as input.\n", - "\n", - "If you are familiar with Pandas and have done your data preparation work in Pandas already, you can use the `read_pandas_dataframe` method in dprep to convert the DataFrame to a Dataflow.\n", - "```python\n", - "df = pd.read_csv(...)\n", - "# apply some transforms\n", - "dprep.read_pandas_dataframe(df, temp_folder='/path/accessible/by/both/driver/and/worker')\n", - "```\n", - "\n", - "If you just need to ingest data without doing any preparation, you can directly use AzureML Data Prep (Data Prep) to do so. The code below demonstrates this scenario. Data Prep also has data preparation capabilities, we have many [sample notebooks](https://github.com/Microsoft/AMLDataPrepDocs) demonstrating the capabilities.\n", - "\n", - "You will get the datastore you registered previously and pass it to Data Prep for reading. The data comes from the digits dataset: `sklearn.datasets.load_digits()`. `DataPath` points to a specific location within a datastore. 
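The Pandas-first route described above can be tried locally without the AzureML SDK; only the final `dprep.read_pandas_dataframe` call needs `azureml-dataprep`. A minimal sketch of the "apply some transforms" step, using made-up column names for illustration:

```python
import pandas as pd

# Hypothetical raw data standing in for the CSV the notebook ingests.
df = pd.DataFrame({"pixel0": [0.0, 4.0, -1.0], "label": ["0", "1", "2"]})

# Typical prep steps you might apply before handing the frame to
# dprep.read_pandas_dataframe(df, temp_folder=...):
df["label"] = df["label"].astype(int)   # cast string labels to integers
df = df[df["pixel0"] >= 0]              # drop rows with invalid pixel values

print(len(df))  # 2 rows survive the filter
```

The same transforms could equally be expressed as Dataflow steps; doing them in Pandas first is simply convenient when the prep work already exists.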
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.dataprep as dprep\n", - "from azureml.data.datapath import DataPath\n", - "\n", - "datastore = Datastore.get(workspace = ws, datastore_name = datastore_name)\n", - "\n", - "X_train = dprep.read_csv(datastore.path('X.csv'))\n", - "y_train = dprep.read_csv(datastore.path('y.csv')).to_long(dprep.ColumnSelector(term='.*', use_regex = True))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Review the Data Preparation Result\n", - "You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only j records for all the steps in the Dataflow, which makes it fast even against large datasets." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X_train.get_profile()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "y_train.get_profile()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure AutoML\n", - "\n", - "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n", - "\n", - "|Property|Description|\n", - "|-|-|\n", - "|**task**|classification or regression|\n", - "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:
accuracy<br>AUC_weighted<br>average_precision_score_weighted<br>norm_macro_recall<br>precision_score_weighted|\n", - "|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: spearman_correlation<br>normalized_root_mean_squared_error<br>r2_score<br>normalized_mean_absolute_error|\n", - "|**iteration_timeout_minutes**|Time limit in minutes for each iteration.|\n", - "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", - "|**n_cross_validations**|Number of cross validation splits.|\n", - "|**spark_context**|Spark Context object. for HDInsight, use spark_context=sc|\n", - "|**max_concurrent_iterations**|Maximum number of iterations to execute in parallel. This should be <= number of worker nodes in your Azure HDInsight cluster.|\n", - "|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n", - "|**y**|(sparse) array-like, shape = [n_samples, ], [n_samples, n_classes]<br>
Multi-class targets. An indicator matrix turns on multilabel classification. This should be an array of integers.|\n", - "|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder.|\n", - "|**preprocess**|set this to True to enable pre-processing of data eg. string to numeric using one-hot encoding|\n", - "|**exit_score**|Target score for experiment. It is associated with the metric. eg. exit_score=0.995 will exit experiment after that|" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", - " primary_metric = 'AUC_weighted',\n", - " iteration_timeout_minutes = 10,\n", - " iterations = 3,\n", - " preprocess = True,\n", - " n_cross_validations = 10,\n", - " max_concurrent_iterations = 2, #change it based on number of worker nodes\n", - " verbosity = logging.INFO,\n", - " spark_context=sc, #HDI /spark related\n", - " X = X_train, \n", - " y = y_train,\n", - " path = project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train the Models\n", - "\n", - "Call the `submit` method on the experiment object and pass the run configuration. Execution of local runs is synchronous. Depending on the data and the number of iterations this can run for a while." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "local_run = experiment.submit(automl_config, show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Explore the Results" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The following will show the child runs and waits for the parent run to complete." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Retrieve All Child Runs after the experiment is completed (in portal)\n", - "You can also use SDK methods to fetch all the child runs and see individual metrics that we log." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "children = list(local_run.get_children())\n", - "metricslist = {}\n", - "for run in children:\n", - " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)} \n", - " metricslist[int(properties['iteration'])] = metrics\n", - "\n", - "rundata = pd.DataFrame(metricslist).sort_index(1)\n", - "rundata" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve the Best Model after the above run is complete \n", - "\n", - "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." 
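The selection that `get_output` performs can be pictured with plain Python over hypothetical logged metrics (the values below are made up for illustration and are not what AutoML would produce):

```python
# Hypothetical per-iteration metrics, standing in for what AutoML logs.
runs = {0: {"AUC_weighted": 0.91, "log_loss": 0.40},
        1: {"AUC_weighted": 0.95, "log_loss": 0.31},
        2: {"AUC_weighted": 0.93, "log_loss": 0.28}}

def best_iteration(metric, maximize=True):
    # Pick the iteration whose logged value for `metric` is best.
    pick = max if maximize else min
    return pick(runs, key=lambda i: runs[i][metric])

print(best_iteration("AUC_weighted"))               # highest AUC wins
print(best_iteration("log_loss", maximize=False))   # lowest log loss wins
```

Note that the best iteration can differ by metric, which is why `get_output(metric=...)` exists alongside the default (primary-metric) overload.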
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run, fitted_model = local_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Best Model Based on Any Other Metric (after the run above completes)\n", - "Show the run and the model that has the smallest `log_loss` value:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lookup_metric = \"log_loss\"\n", - "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n", - "print(best_run)\n", - "print(fitted_model)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the Best Fitted Model\n", - "\n", - "#### Load Test Data - you can split the dataset beforehand, pass the training dataset to AutoML, and use the test dataset to evaluate the best model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "blob_location = \"https://{}.blob.core.windows.net/{}\".format(account_name, container_name)\n", - "X_test = pd.read_csv(\"{}/X_valid.csv\".format(blob_location), header=0)\n", - "y_test = pd.read_csv(\"{}/y_valid.csv\".format(blob_location), header=0)\n", - "images = pd.read_csv(\"{}/images.csv\".format(blob_location), header=None)\n", - "images = np.reshape(images.values, (100,8,8))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Testing Our Best Fitted Model\n", - "We will try to predict digits and see how well the model performs; this is just an illustrative example."
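The hold-out split suggested above can be sketched with NumPy alone (synthetic arrays standing in for the digits data, which the notebook reads from blob storage):

```python
import numpy as np

# Synthetic stand-in for the digits data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))       # 100 flattened 8x8 images
y = rng.integers(0, 10, size=100)    # digit labels 0-9

# 80/20 hold-out: shuffle the indices once, then slice.
idx = rng.permutation(len(y))
cut = int(0.8 * len(y))
X_train_part, X_holdout = X[idx[:cut]], X[idx[cut:]]
y_train_part, y_holdout = y[idx[:cut]], y[idx[cut:]]

print(X_train_part.shape, X_holdout.shape)  # (80, 64) (20, 64)
```

Only the training portion would be handed to AutoML; the hold-out rows never influence model selection, so the final evaluation is unbiased.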
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Randomly select digits and test.\n", - "for index in np.random.choice(len(y_test), 2, replace = False):\n", - " print(index)\n", - " predicted = fitted_model.predict(X_test[index:index + 1])[0]\n", - " label = y_test.values[index]\n", - " title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n", - " fig = plt.figure(3, figsize = (5,5))\n", - " ax1 = fig.add_axes((0,0,.8,.8))\n", - " ax1.set_title(title)\n", - " plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n", - " display(fig)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "When deploying an automated ML trained model, please specify _pippackages=['azureml-sdk[automl]']_ in your CondaDependencies.\n", - "\n", - "Please refer to only the **Deploy** section in this notebook - Deployment of Automated ML trained model" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "savitam" - }, - { - "name": "sasum" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "Python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "python", - "version": 3 - }, - "mimetype": "text/x-python", - "name": "pyspark3", - "pygments_lexer": "python3" - }, - "name": "auto-ml-classification-local-adb", - "notebookId": 587284549713154 - }, - "nbformat": 4, - "nbformat_minor": 1 -} \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb b/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb index ef681642..10637e59 100644 --- a/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb +++ b/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb @@ -16,13 +16,6 @@ 
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.png)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deploy-to-cloud/model-register-and-deploy.png)" - ] - }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb b/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb deleted file mode 100644 index 70e02ff9..00000000 --- a/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb +++ /dev/null @@ -1,407 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Deploying a web service to Azure Kubernetes Service (AKS)\n", - "This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (one time action), and deploying a service to it. \n", - "We then test and delete the service, image and model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "from azureml.core.compute import AksCompute, ComputeTarget\n", - "from azureml.core.webservice import Webservice, AksWebservice\n", - "from azureml.core.image import Image\n", - "from azureml.core.model import Model" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "print(azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Get workspace\n", - "Load the existing workspace from the config file info." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Register the model\n", - "Register an existing trained model, adding a description and tags. Prior to registering the model, you should have a TensorFlow [Saved Model](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md) in the `resnet50` directory. You can download a [pretrained resnet50](https://github.com/tensorflow/models/tree/master/official/resnet#pre-trained-model) and unpack it to that directory."
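Before registering, it can help to sanity-check that the directory really holds an unpacked SavedModel. A small local sketch using only the standard library (the layout rule here is the TensorFlow SavedModel convention, not an AzureML API):

```python
import os
import tempfile

def looks_like_saved_model(path):
    # A TensorFlow SavedModel directory holds a saved_model.pb (or .pbtxt)
    # graph definition plus a variables/ subdirectory with the weights.
    has_graph = any(os.path.isfile(os.path.join(path, name))
                    for name in ("saved_model.pb", "saved_model.pbtxt"))
    return has_graph and os.path.isdir(os.path.join(path, "variables"))

# Demonstrate against a throwaway directory laid out like an unpacked model.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "saved_model.pb"), "wb").close()
    os.mkdir(os.path.join(d, "variables"))
    print(looks_like_saved_model(d))  # True
```

Running this check against the real `resnet50` directory before calling `Model.register` catches a half-extracted download early, instead of at image-build time.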
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Register the model\n", - "from azureml.core.model import Model\n", - "model = Model.register(model_path = \"resnet50\", # this points to a local directory\n", - " model_name = \"resnet50\", # this is the name the model is registered as\n", - " tags = {'area': \"Image classification\", 'type': \"classification\"},\n", - " description = \"Image classification trained on Imagenet Dataset\",\n", - " workspace = ws)\n", - "\n", - "print(model.name, model.description, model.version)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Create an image\n", - "Create an image using the registered model and the script that will load and run the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import tensorflow as tf\n", - "import numpy as np\n", - "import ujson\n", - "from azureml.core.model import Model\n", - "from azureml.contrib.services.aml_request import AMLRequest, rawhttp\n", - "from azureml.contrib.services.aml_response import AMLResponse\n", - "\n", - "def init():\n", - " global session\n", - " global input_name\n", - " global output_name\n", - " \n", - " session = tf.Session()\n", - "\n", - " model_path = Model.get_model_path('resnet50')\n", - " model = tf.saved_model.loader.load(session, ['serve'], model_path)\n", - " if len(model.signature_def['serving_default'].inputs) > 1:\n", - " raise ValueError(\"This score.py only supports one input\")\n", - " if len(model.signature_def['serving_default'].outputs) > 1:\n", - " raise ValueError(\"This score.py only supports one output\")\n", - " input_name = [tensor.name for tensor in model.signature_def['serving_default'].inputs.values()][0]\n", - " output_name = [tensor.name for tensor in model.signature_def['serving_default'].outputs.values()][0]\n", - " \n", - "\n", - "@rawhttp\n", - "def 
run(request):\n", - " if request.method == 'POST':\n", - " reqBody = request.get_data(False)\n", - " resp = score(reqBody)\n", - " return AMLResponse(resp, 200)\n", - " if request.method == 'GET':\n", - " respBody = str.encode(\"GET is not supported\")\n", - " return AMLResponse(respBody, 405)\n", - " return AMLResponse(\"bad request\", 500)\n", - "\n", - "def score(data):\n", - " result = session.run(output_name, {input_name: [data]})\n", - " return ujson.dumps(result[0])\n", - "\n", - "if __name__ == \"__main__\":\n", - " init()\n", - " with open(\"test_image.jpg\", 'rb') as f:\n", - " content = f.read()\n", - " print(score(content))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(conda_packages=['tensorflow-gpu==1.12.0','numpy','ujson','azureml-contrib-services'])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script = \"score.py\",\n", - " runtime = \"python\",\n", - " conda_file = \"myenv.yml\",\n", - " gpu_enabled = True\n", - " )\n", - "\n", - "image = ContainerImage.create(name = \"GpuImage\",\n", - " # this is the model object\n", - " models = [model],\n", - " image_config = image_config,\n", - " workspace = ws)\n", - "\n", - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Provision the AKS Cluster\n", - "This is a one time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, then you would have to recreate it." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Use the default configuration (can also provide parameters to customize)\n", - "prov_config = AksCompute.provisioning_configuration(vm_size=\"Standard_NC6\")\n", - "\n", - "aks_name = 'my-aks-9' \n", - "# Create the cluster\n", - "aks_target = ComputeTarget.create(workspace = ws, \n", - " name = aks_name, \n", - " provisioning_configuration = prov_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Create AKS Cluster in an existing virtual network (optional)\n", - "See code snippet below. Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-enable-virtual-network#use-azure-kubernetes-service) for more details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "'''\n", - "from azureml.core.compute import ComputeTarget, AksCompute\n", - "\n", - "# Create the compute configuration and set virtual network information\n", - "config = AksCompute.provisioning_configuration(vm_size=\"Standard_NC6\", location=\"eastus2\")\n", - "config.vnet_resourcegroup_name = \"mygroup\"\n", - "config.vnet_name = \"mynetwork\"\n", - "config.subnet_name = \"default\"\n", - "config.service_cidr = \"10.0.0.0/16\"\n", - "config.dns_service_ip = \"10.0.0.10\"\n", - "config.docker_bridge_cidr = \"172.17.0.1/16\"\n", - "\n", - "# Create the compute target\n", - "aks_target = ComputeTarget.create(workspace = ws,\n", - " name = \"myaks\",\n", - " provisioning_configuration = config)\n", - "'''" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Enable SSL on the AKS Cluster (optional)\n", - "See code snippet below. 
Check the documentation [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-secure-web-service) for more details." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# provisioning_config = AksCompute.provisioning_configuration(ssl_cert_pem_file=\"cert.pem\", ssl_key_pem_file=\"key.pem\", ssl_cname=\"www.contoso.com\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_target.wait_for_completion(show_output = True)\n", - "print(aks_target.provisioning_state)\n", - "print(aks_target.provisioning_errors)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Optional step: Attach existing AKS cluster\n", - "\n", - "If you have an existing AKS cluster in your Azure subscription, you can attach it to the Workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "'''\n", - "# Use the default configuration (can also provide parameters to customize)\n", - "resource_id = '/subscriptions/92c76a2f-0e1c-4216-b65e-abf7a3f34c1e/resourcegroups/raymondsdk0604/providers/Microsoft.ContainerService/managedClusters/my-aks-0605d37425356b7d01'\n", - "\n", - "create_name='my-existing-aks' \n", - "# Create the cluster\n", - "attach_config = AksCompute.attach_configuration(resource_id=resource_id)\n", - "aks_target = ComputeTarget.attach(workspace=ws, name=create_name, attach_configuration=attach_config)\n", - "# Wait for the operation to complete\n", - "aks_target.wait_for_completion(True)\n", - "'''" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Deploy web service to AKS" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Set the web service configuration (using default here)\n", - "aks_config = AksWebservice.deploy_configuration()" - ] 
- }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_service_name ='aks-service-1'\n", - "\n", - "aks_service = Webservice.deploy_from_image(workspace = ws, \n", - " name = aks_service_name,\n", - " image = image,\n", - " deployment_config = aks_config,\n", - " deployment_target = aks_target)\n", - "aks_service.wait_for_deployment(show_output = True)\n", - "print(aks_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Test the web service\n", - "We test the web service by passing the test image's content." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "import requests\n", - "key1, key2 = aks_service.get_keys()\n", - "\n", - "headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n", - "test_sample = open('test_image.jpg', 'rb').read()\n", - "resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Clean up\n", - "Delete the service, image, model and compute target" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "aks_service.delete()\n", - "image.delete()\n", - "model.delete()\n", - "aks_target.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "aashishb" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.0" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git 
a/how-to-use-azureml/explain-model/azure-integration/scoring-time/score.py b/how-to-use-azureml/explain-model/azure-integration/scoring-time/score.py deleted file mode 100644 index 82b0bd0f..00000000 --- a/how-to-use-azureml/explain-model/azure-integration/scoring-time/score.py +++ /dev/null @@ -1,33 +0,0 @@ -import json -import numpy as np -import pandas as pd -import os -import pickle -from sklearn.externals import joblib -from sklearn.linear_model import LogisticRegression -from azureml.core.model import Model - - -def init(): - - global original_model - global scoring_explainer - - # Retrieve the path to the model file using the model name - # Assume original model is named original_prediction_model - original_model_path = Model.get_model_path('original_model') - scoring_explainer_path = Model.get_model_path('IBM_attrition_explainer') - - original_model = joblib.load(original_model_path) - scoring_explainer = joblib.load(scoring_explainer_path) - - -def run(raw_data): - # Get predictions and explanations for each data point - data = pd.read_json(raw_data) - # Make prediction - predictions = original_model.predict(data) - # Retrieve model explanations - local_importance_values = scoring_explainer.explain(data) - # You can return any data type as long as it is JSON-serializable - return {'predictions': predictions.tolist(), 'local_importance_values': local_importance_values} diff --git a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb index 37f41b7f..403536e7 100644 --- a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb +++ b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb @@ -311,21 +311,6 @@ "Deploy Model and ScoringExplainer" ] }, - { - "cell_type": "code", - 
"execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={\"data\": \"IBM_Attrition\", \n", - " \"method\" : \"local_explanation\"}, \n", - " description='Get local explanations for IBM Employee Attrition data')" - ] - }, { "cell_type": "code", "execution_count": null, @@ -381,21 +366,23 @@ "outputs": [], "source": [ "from azureml.core.webservice import Webservice\n", - "from azureml.core.image import ContainerImage\n", + "from azureml.core.model import InferenceConfig\n", + "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.model import Model\n", "\n", - "# Use the custom scoring, docker, and conda files we created above\n", - "image_config = ContainerImage.image_configuration(execution_script=\"score_local_explain.py\",\n", - " docker_file=\"dockerfile\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={\"data\": \"IBM_Attrition\", \n", + " \"method\" : \"local_explanation\"}, \n", + " description='Get local explanations for IBM Employee Attrition data')\n", + "\n", + "inference_config = InferenceConfig(runtime= \"python\", \n", + " entry_script=\"score_local_explain.py\",\n", + " conda_file=\"myenv.yml\",\n", + " extra_docker_file_steps=\"dockerfile\")\n", "\n", "# Use configs and models generated above\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='model-scoring',\n", - " deployment_config=aciconfig,\n", - " models=[scoring_explainer_model, original_model],\n", - " image_config=image_config)\n", - "\n", + "service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n", "service.wait_for_deployment(show_output=True)" ] }, diff --git 
a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb index 33e5d191..7138574f 100644 --- a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb +++ b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb @@ -380,15 +380,7 @@ "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={\"data\": \"IBM_Attrition\", \n", - " \"method\" : \"local_explanation\"}, \n", - " description='Get local explanations for IBM Employee Attrition data')" - ] + "source": [] }, { "cell_type": "code", @@ -444,21 +436,23 @@ "outputs": [], "source": [ "from azureml.core.webservice import Webservice\n", - "from azureml.core.image import ContainerImage\n", + "from azureml.core.model import InferenceConfig\n", + "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.model import Model\n", "\n", - "# Use the custom scoring, docker, and conda files we created above\n", - "image_config = ContainerImage.image_configuration(execution_script=\"score_remote_explain.py\",\n", - " docker_file=\"dockerfile\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={\"data\": \"IBM_Attrition\", \n", + " \"method\" : \"local_explanation\"}, \n", + " description='Get local explanations for IBM Employee Attrition data')\n", + "\n", + "inference_config = InferenceConfig(runtime= \"python\", \n", + " entry_script=\"score_remote_explain.py\",\n", + " conda_file=\"myenv.yml\",\n", + " 
extra_docker_file_steps=\"dockerfile\")\n", "\n", "# Use configs and models generated above\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='model-scoring-service',\n", - " deployment_config=aciconfig,\n", - " models=[scoring_explainer_model, original_model],\n", - " image_config=image_config)\n", - "\n", + "service = Model.deploy(ws, 'model-scoring-service', [scoring_explainer_model, original_model], inference_config, aciconfig)\n", "service.wait_for_deployment(show_output=True)" ] }, diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.md b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.md index 6437e363..09eab61c 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.md +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.md @@ -17,5 +17,6 @@ These notebooks below are designed to go in sequence. 12. [aml-pipelines-setup-versioned-pipeline-endpoints.ipynb](https://aka.ms/pl-ver-endpoint): This notebook shows how you can setup PipelineEndpoint and submit a Pipeline using the PipelineEndpoint. 13. [aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb](https://aka.ms/pl-datapath): This notebook showcases how to use DataPath and PipelineParameter in AML Pipeline. 14. [aml-pipelines-how-to-use-pipeline-drafts.ipynb](http://aka.ms/pl-pl-draft): This notebook shows how to use Pipeline Drafts. Pipeline Drafts are mutable pipelines which can be used to submit runs and create Published Pipelines. +15. [aml-pipelines-how-to-use-modulestep.ipynb](https://aka.ms/pl-modulestep): This notebook shows how to define Module, ModuleVersion and how to use them in an AML Pipeline using ModuleStep. 
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.png) diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb index b22fb9fa..2f9b6fb9 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb @@ -155,8 +155,6 @@ "metadata": {}, "outputs": [], "source": [ - "from msrest.exceptions import HttpOperationError\n", - "\n", "datastore_name='MyAdlsDatastore'\n", "subscription_id=os.getenv(\"ADL_SUBSCRIPTION_62\", \"\") # subscription id of ADLS account\n", "resource_group=os.getenv(\"ADL_RESOURCE_GROUP_62\", \"\") # resource group of ADLS account\n", @@ -493,6 +491,21 @@ "name": "diray" } ], + "category": "tutorial", + "compute": [ + "ADF" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "Azure Machine Learning Pipeline with DataTransferStep", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -509,7 +522,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" - } + }, + "order_index": 4, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of DataTransferStep" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb index 490113df..e775c753 100644 --- 
a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb @@ -607,6 +607,21 @@ "name": "sanpil" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "Getting Started with Azure Machine Learning Pipelines", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -623,7 +638,12 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 1, + "tags": [ + "None" + ], + "task": "Getting Started notebook for AML Pipelines" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb index 16f46f2e..95146af2 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb @@ -304,7 +304,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "azurebatchstep-remarks-sample" + ] + }, "outputs": [], "source": [ "step = AzureBatchStep(\n", @@ -360,6 +364,21 @@ "name": "diray" } ], + "category": "tutorial", + "compute": [ + "Azure Batch" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "Azure Machine Learning Pipeline with AzureBatchStep", "kernelspec": { "display_name": "Python 
3.6", "language": "python", @@ -376,7 +395,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 9, + "star_tag": [ + "None" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of AzureBatchStep" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb index c5011aaa..a038b9e9 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb @@ -212,7 +212,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "estimatorstep-remarks-sample" + ] + }, "outputs": [], "source": [ "from azureml.pipeline.steps import EstimatorStep\n", @@ -269,6 +273,21 @@ "name": "sanpil" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "Azure Machine Learning Pipeline with EstimatorStep", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -285,7 +304,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 7, + "star_tag": [ + "None" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of EstimatorStep" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb new file mode 100644 index 00000000..3e8ed493 --- /dev/null +++ 
b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb @@ -0,0 +1,500 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "# How to create Module, ModuleVersion, and use them in a pipeline with ModuleStep\n", + "In this notebook, we introduce the concept of versioned modules and how to use them in an Azure Machine Learning Pipeline.\n", + "\n", + "The core idea behind introducing Module, ModuleVersion and ModuleStep is to allow the separation between reusable executable components and their actual usage. These reusable software components (such as scripts or executables) can be used in different scenarios and by different users. This follows the same idea of separating software frameworks/libraries and their actual usage in applications. Module and ModuleVersion take the role of the reusable executable components, while ModuleStep links them to an actual usage.\n", + "\n", + "A Module is a container of its versions, where each version is the actual computational unit. It is up to users to define the semantics of this hierarchical structure of container and versions. 
For example, versions could represent different use cases, stages of development, etc.\n", + "\n", + "Each ModuleVersion may have inputs and outputs, and relies on parameters and its environment configuration to operate.\n", + "\n", + "Because Modules can now be separated from execution in a pipeline, there's a need for a mechanism to reconnect these again, and allow using Modules and their versions in a Pipeline. This is done using a new kind of Step called ModuleStep, which allows embedding a Module (and more precisely, a version of it) in a Pipeline.\n", + " \n", + "This notebook shows the usage of a module that computes the sum and product of two numbers. As a module can only be used as a step in a pipeline, we define two different versions for it, to be used in two different use cases:\n", + "\n", + "1) As the module powering the initial step of a pipeline, where the step does not receive any input from preceding steps.\n", + "\n", + "2) As a module powering a step in the pipeline that receives inputs from preceding steps.\n", + "\n", + "Once these two versions are defined, we show how to embed them as steps in the pipeline." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites and AML Basics\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't. 
This sets you up with a working config file that has information on your workspace, subscription id, etc.\n", + "\n", + "### Initialization Steps" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace, Experiment, Datastore, RunConfiguration\n", + "from azureml.core.compute import AmlCompute\n", + "from azureml.core.compute import ComputeTarget\n", + "from azureml.pipeline.core import Pipeline, PipelineData, PipelineParameter\n", + "from azureml.pipeline.core.graph import InputPortDef, OutputPortDef\n", + "from azureml.pipeline.core.module import Module\n", + "from azureml.pipeline.steps import ModuleStep\n", + "\n", + "workspace = Workspace.from_config()\n", + "print(workspace.name, workspace.resource_group, workspace.location, workspace.subscription_id, sep = '\\n')\n", + "\n", + "aml_compute_target = \"cpu-cluster\"\n", + "try:\n", + " aml_compute = AmlCompute(workspace, aml_compute_target)\n", + " print(\"Found existing compute target: {}\".format(aml_compute_target))\n", + "except:\n", + " print(\"Creating new compute target: {}\".format(aml_compute_target))\n", + " \n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", + " min_nodes = 1, \n", + " max_nodes = 4) \n", + " aml_compute = ComputeTarget.create(workspace, aml_compute_target, provisioning_config)\n", + " aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + "\n", + "datastore = Datastore(workspace=workspace, name=\"workspaceblobstore\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a Module" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A Module is a container that manages computational units. Each such computational unit is a version of the module, and is called a ModuleVersion. 
We start by either creating a module or fetching an existing one by its ID or by its name." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "module = Module.create(workspace, name=\"AddAndMultiply\", description=\"A module that adds and multiplies\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Calculation entry ModuleVersion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A ModuleVersion is an actual computational unit. Creating one involves defining its inputs, outputs, computation, and other configuration items. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here we define that this version is to be used at the beginning of the pipeline, hence it has no incoming ports, only outgoing ones." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "module-remarks-sample" + ] + }, + "outputs": [], + "source": [ + "out_sum = OutputPortDef(name=\"out_sum\", default_datastore_name=datastore.name, default_datastore_mode=\"mount\", \n", + " label=\"Sum of two numbers\")\n", + "out_prod = OutputPortDef(name=\"out_prod\", default_datastore_name=datastore.name, default_datastore_mode=\"mount\", \n", + " label=\"Product of two numbers\")\n", + "entry_version = module.publish_python_script(\"calculate.py\", \"initial\", \n", + " inputs=[], outputs=[out_sum, out_prod], params = {\"initialNum\":12},\n", + " version=\"1\", source_directory=\"./calc\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Calculation middle/end ModuleVersion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Another version of the module performs a computation in the middle or at the end of the pipeline. This version has both inputs and outputs, as it is to be either followed by another computation, or emits its outputs." 
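The notebook publishes `calculate.py` from the `./calc` folder but does not show its contents. Below is a minimal sketch of what such a script might look like, inferred from the argument names the ModuleStep instances pass later in the notebook (`--arg_num1`, `--file_num1`, `--output_sum`, and so on); the actual file layout and argument handling are assumptions.

```python
import argparse


def compute(num1, num2):
    """Core computation: the sum and product the module emits."""
    return num1 + num2, num1 * num2


def main(argv=None):
    # Argument names mirror those passed via ModuleStep's `arguments` list.
    # Numbers arrive either as literal arguments (initial step) or as files
    # mounted from the previous step's outputs (middle/end steps).
    parser = argparse.ArgumentParser()
    parser.add_argument("--arg_num1", type=float)
    parser.add_argument("--arg_num2", type=float)
    parser.add_argument("--file_num1")
    parser.add_argument("--file_num2")
    parser.add_argument("--output_sum", required=True)
    parser.add_argument("--output_product", required=True)
    args = parser.parse_args(argv)

    def read(path, fallback):
        # Prefer a mounted input file when one was wired in.
        if path is None:
            return fallback
        with open(path) as f:
            return float(f.read())

    num1 = read(args.file_num1, args.arg_num1)
    num2 = read(args.file_num2, args.arg_num2)
    total, product = compute(num1, num2)
    with open(args.output_sum, "w") as f:
        f.write(str(total))
    with open(args.output_product, "w") as f:
        f.write(str(product))


if __name__ == "__main__":
    main()
```

Handling both literal and file inputs in one script is what lets the same `calculate.py` back both the "initial" and "middle" versions published above.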
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "moduleversion-remarks-sample" + ] + }, + "outputs": [], + "source": [ + "in1_mid = InputPortDef(name=\"in1\", default_datastore_mode=\"mount\", \n", + " default_data_reference_name=datastore.name, label=\"First input number\")\n", + "in2_mid = InputPortDef(name=\"in2\", default_datastore_mode=\"mount\", \n", + " default_data_reference_name=datastore.name, label=\"Second input number\")\n", + "out_sum_mid = OutputPortDef(name=\"out_sum\", default_datastore_name=datastore.name, default_datastore_mode=\"mount\",\n", + " label=\"Sum of two numbers\")\n", + "out_prod_mid = OutputPortDef(name=\"out_prod\", default_datastore_name=datastore.name, default_datastore_mode=\"mount\",\n", + " label=\"Product of two numbers\")\n", + "module.publish_python_script(\n", + " \"calculate.py\", \"middle\", inputs=[in1_mid, in2_mid], outputs=[out_sum_mid, out_prod_mid], version=\"2\", is_default=True, \n", + " source_directory=\"./calc\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Using a Module in a Pipeline with ModuleStep" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Introduction" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using a Module, and more precisely a specific version of it, in a pipeline is done via a specialized kind of step called ModuleStep, which holds enough information to pinpoint a specific ModuleVersion. \n", + "\n", + "Another responsibility of a ModuleStep is to wire the actual data that is used in the pipeline to the input/output definitions of the ModuleVersion. This wiring is done by mapping each of the input and output definitions to a data element in the pipeline. 
Defining the wiring is done using a dictionary whose keys are the names of the inputs/outputs, and the mapped value is the data element (e.g., a PipelineData object)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Deciding which ModuleVersion to use - resolving" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is up to the ModuleStep to decide which ModuleVersion to use. That decision is based on the parameters given to the ModuleStep, and it follows this process:\n", + "1. If a ModuleVersion object was provided, use it.\n", + "2. For the given Module object, if a version was provided, use it.\n", + "3. The given Module object resolves which is the right version:\n", + " 1. If a default ModuleVersion was defined for the Module, use it.\n", + " 2. If all the ModuleVersions in the Module follow semantic versioning, take the one with the highest version.\n", + " 3. Take the ModuleVersion with the most recent update." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### First Step and its wires" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

The first step in a pipeline does not have incoming inputs, but it does have outputs. For that we'd use the ModuleVersion that was designed for this use case.

\n", + "We start off by preparing the outgoing edges as two PipelineData objects (to be later linked to another step), and wiring these to the ModuleVersion's outputs by creating a dictionary mapping." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "first_sum = PipelineData(\"sum_out\", datastore=datastore, output_mode=\"mount\",is_directory=False)\n", + "first_prod = PipelineData(\"prod_out\", datastore=datastore, output_mode=\"mount\",is_directory=False)\n", + "step_output_wiring = {\"out_sum\":first_sum, \"out_prod\":first_prod}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Initial ModuleStep" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "

In order for the step to know which ModuleVersion to use, we provide the initial ModuleVersion object. We wire the ModuleVersion's outputs with the step_output_wiring map we just created.

\n", + "The initial ModuleStep uses the ModuleVersion that does not have inputs from the pipeline; however, it still needs to receive two numbers to operate upon. We'll provide these numbers as arguments to the step. The first is provided as a parameter, the other one is hard-coded." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "first_num_param = PipelineParameter(name=\"initialNum\", default_value=17)\n", + "first_step = ModuleStep(module_version=entry_version,\n", + " inputs_map={}, outputs_map=step_output_wiring, \n", + " runconfig=RunConfiguration(), \n", + " compute_target=aml_compute, \n", + " arguments = [\"--output_sum\", first_sum, \n", + " \"--output_product\", first_prod,\n", + " \"--arg_num1\", first_num_param, \n", + " \"--arg_num2\", \"2\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Second step and its wires" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The second step in the pipeline receives its inputs from the previous step, and emits its outputs to the next step. Thus the ModuleStep here needs a different kind of ModuleVersion, one that has both inputs and outputs defined. We have defined such a ModuleVersion, and moreover, defined it to be the default version of our Module. This allows us to provide the Module object to the ModuleStep, which would resolve to that default ModuleVersion when needed." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Wires" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The wiring to the previous step relies on the PipelineData objects we defined before, and for them we create a new dictionary mapping to the ModuleVersion. The wiring to the next step requires us to define another pair of PipelineData objects, which also need a dictionary mapping." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "modulestep-remarks-sample2" + ] + }, + "outputs": [], + "source": [ + "middle_step_input_wiring = {\"in1\":first_sum, \"in2\":first_prod}\n", + "middle_sum = PipelineData(\"middle_sum\", datastore=datastore, output_mode=\"mount\",is_directory=False)\n", + "middle_prod = PipelineData(\"middle_prod\", datastore=datastore, output_mode=\"mount\",is_directory=False)\n", + "middle_step_output_wiring = {\"out_sum\":middle_sum, \"out_prod\":middle_prod}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Middle ModuleStep - resolving to the default ModuleVersion" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "modulestep-remarks-sample" + ] + }, + "outputs": [], + "source": [ + "middle_step = ModuleStep(module=module,\n", + " inputs_map= middle_step_input_wiring, \n", + " outputs_map= middle_step_output_wiring,\n", + " runconfig=RunConfiguration(), compute_target=aml_compute,\n", + " arguments = [\"--file_num1\", first_sum, \"--file_num2\", first_prod,\n", + " \"--output_sum\", middle_sum, \"--output_product\", middle_prod])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## End step and its wires" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The last step in the pipeline also has inputs and outputs, so its configuration is similar to the previous step's. We still use PipelineData objects as the step's outputs; even though no following step reads them, they act as the end result of the pipeline."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Wires" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "last_step_input_wiring = {\"in1\":middle_sum, \"in2\":middle_prod}\n", + "end_sum = PipelineData(\"end_sum\", datastore=datastore, output_mode=\"mount\",is_directory=False)\n", + "end_prod = PipelineData(\"end_prod\", datastore=datastore, output_mode=\"mount\",is_directory=False)\n", + "last_step_output_wiring = {\"out_sum\":end_sum, \"out_prod\":end_prod}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Last ModuleStep - specifying the exact version" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "end_step = ModuleStep(module=module, version=\"2\",\n", + " inputs_map= last_step_input_wiring,\n", + " outputs_map= last_step_output_wiring,\n", + " runconfig=RunConfiguration(), compute_target=aml_compute,\n", + " arguments=[\"--file_num1\", middle_sum, \"--file_num2\", middle_prod,\n", + " \"--output_sum\", end_sum, \"--output_product\", end_prod])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pipeline, experiment, submission" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we create a pipeline out of the previously defined steps, then create an experiment and submit the pipeline to it."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline = Pipeline(workspace=workspace, steps=[first_step, middle_step, end_step])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "experiment = Experiment(workspace, 'testmodulestesp')\n", + "experiment.submit(pipeline)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "authors": [ + { + "name": "yrubin" + } + ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "How to use ModuleStep with AML Pipelines", + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + }, + "order_index": 14, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of ModuleStep" + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.yml b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.yml similarity index 57% rename from how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.yml rename to how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.yml index 0c2ef761..2b63300b 100644 --- a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.yml +++ 
b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.yml @@ -1,4 +1,4 @@ -name: distributed-chainer +name: aml-pipelines-how-to-use-modulestep dependencies: - pip: - azureml-sdk diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb index 21cabfe2..aca0155d 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb @@ -41,7 +41,6 @@ "source": [ "import azureml.core\n", "from azureml.core import Workspace\n", - "from azureml.core import Run, Experiment, Datastore\n", "from azureml.widgets import RunDetails\n", "\n", "# Check core SDK version number\n", @@ -66,11 +65,12 @@ "outputs": [], "source": [ "from azureml.core.compute import AmlCompute, ComputeTarget\n", + "from azureml.core.compute_target import ComputeTargetException\n", "aml_compute_target = \"cpu-cluster\"\n", "try:\n", " aml_compute = AmlCompute(ws, aml_compute_target)\n", " print(\"Found existing compute target: {}\".format(aml_compute_target))\n", - "except:\n", + "except ComputeTargetException:\n", " print(\"Creating new compute target: {}\".format(aml_compute_target))\n", " \n", " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", @@ -243,6 +243,21 @@ "name": "elihop" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "How to use Pipeline Drafts to create a Published Pipeline", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -259,7 +274,15 @@ "nbconvert_exporter": 
"python", "pygments_lexer": "ipython3", "version": "3.6.2" - } + }, + "order_index": 14, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of Pipeline Drafts" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb index 2b5139e0..230fbb85 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb @@ -19,7 +19,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Azure Machine Learning Pipeline with HyperDriveStep\n", + "# Azure Machine Learning Pipeline with HyperDriveStep\n", "\n", "\n", "This notebook is used to demonstrate the use of HyperDriveStep in AML Pipeline." @@ -377,7 +377,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "hyperdrivestep-remarks-sample" + ] + }, "outputs": [], "source": [ "metrics_output_name = 'metrics_output'\n", @@ -471,7 +475,7 @@ "import pandas as pd\n", "import json\n", "with open(metrics_output._path_on_datastore) as f: \n", - " metrics_output_result = f.read()\n", + " metrics_output_result = f.read()\n", " \n", "deserialized_metrics_output = json.loads(metrics_output_result)\n", "df = pd.DataFrame(deserialized_metrics_output)\n", @@ -534,7 +538,7 @@ "metadata": {}, "source": [ "## Deploy the model in ACI\n", - "Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). 
Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", + "Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). \n", "### Create score.py\n", "First, we will create a scoring script that will be invoked by the web service call. \n", "\n", @@ -605,50 +609,7 @@ "metadata": {}, "source": [ "### Deploy to ACI\n", - "We are almost ready to deploy. Create a deployment configuration and specify the number of CPUs and gigbyte of RAM needed for your ACI container. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n", - " description='Tensorflow DNN on MNIST')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Deployment Process\n", - "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scene, it will do the following:\n", - "1. **Register model** \n", - "Take the local `model` folder (which contains our previously downloaded trained model files) and register it (and the files inside that folder) as a model named `model` under the workspace. Azure ML will register the model directory or model file(s) we specify to the `model_paths` parameter of the `Webservice.deploy` call.\n", - "2. **Build Docker image** \n", - "Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the `model` folder containing the TensorFlow model files. \n", - "3. **Register image** \n", - "Register that image under the workspace. \n", - "4. 
**Ship to ACI** \n", - "And finally ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" + "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scene, AzureML will build a Docker container image with the given configuration, if already not available. This image will be deployed to the ACI infrastructure and the scoring script and model will be mounted on the container. The model will then be available as a web service with an HTTP endpoint to accept REST client calls." ] }, { @@ -658,14 +619,21 @@ "outputs": [], "source": [ "%%time\n", + "from azureml.core.model import InferenceConfig\n", + "from azureml.core.webservice import AciWebservice\n", "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import Model\n", "\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='tf-mnist-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=imgconfig)\n", + "inference_config = InferenceConfig(runtime = \"python\", \n", + " entry_script = \"score.py\",\n", + " conda_file = \"myenv.yml\")\n", "\n", + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n", + " description='Tensorflow DNN on MNIST')\n", + "\n", + "service = Model.deploy(ws, 'tf-mnist-svc', [model], inference_config, aciconfig)\n", "service.wait_for_deployment(show_output=True)" ] }, @@ -831,6 +799,21 @@ "name": "sanpil" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + 
], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "Azure Machine Learning Pipeline with HyperDriveStep", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -847,7 +830,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 8, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of HyperDriveStep" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb index 04cccd09..e6a439be 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb @@ -403,6 +403,21 @@ "name": "diray" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "How to Publish a Pipeline and Invoke the REST endpoint", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -419,7 +434,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 3, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of Published Pipelines" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb 
index 7a410a75..efc984ca 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb @@ -57,18 +57,6 @@ "#### Retrieve an already attached Azure Machine Learning Compute" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Run, Experiment, Datastore\n", - "\n", - "from azureml.widgets import RunDetails\n", - "\n" - ] - }, { "cell_type": "code", "execution_count": null, @@ -76,11 +64,12 @@ "outputs": [], "source": [ "from azureml.core.compute import AmlCompute, ComputeTarget\n", + "from azureml.core.compute_target import ComputeTargetException\n", "aml_compute_target = \"cpu-cluster\"\n", "try:\n", " aml_compute = AmlCompute(ws, aml_compute_target)\n", " print(\"Found existing compute target: {}\".format(aml_compute_target))\n", - "except:\n", + "except ComputeTargetException:\n", " print(\"Creating new compute target: {}\".format(aml_compute_target))\n", " \n", " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", @@ -437,6 +426,21 @@ "name": "diray" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "How to Setup a Schedule for a Published Pipeline", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -453,7 +457,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 10, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of Schedules for Published Pipelines" }, "nbformat": 4, "nbformat_minor": 2 diff --git 
a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb index 4b956e86..9e25d26d 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb @@ -87,17 +87,17 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core import Run, Experiment, Datastore\n", "from azureml.core.compute import AmlCompute, ComputeTarget\n", "from azureml.pipeline.steps import PythonScriptStep\n", "from azureml.pipeline.core import Pipeline\n", "\n", "#Retrieve an already attached Azure Machine Learning Compute\n", + "from azureml.core.compute_target import ComputeTargetException\n", "aml_compute_target = \"cpu-cluster\"\n", "try:\n", " aml_compute = AmlCompute(ws, aml_compute_target)\n", " print(\"Found existing compute target: {}\".format(aml_compute_target))\n", - "except:\n", + "except ComputeTargetException:\n", " print(\"Creating new compute target: {}\".format(aml_compute_target))\n", " \n", " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", @@ -530,6 +530,21 @@ "name": "mameghwa" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "How to setup a versioned Pipeline Endpoint", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -546,7 +561,12 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" - } + }, + "order_index": 12, + "tags": [ + "None" + ], + "task": "Demonstrates the use of PipelineEndpoint to run a specific version of the 
Published Pipeline" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb index 903362cf..607f24f0 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb @@ -456,6 +456,21 @@ "name": "sanpil" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "How to use DataPath as a PipelineParameter", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -472,7 +487,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 13, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of DataPath as a PipelineParameter" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-adla-as-compute-target.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-adla-as-compute-target.ipynb index dbf3caa1..2082ee1d 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-adla-as-compute-target.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-adla-as-compute-target.ipynb @@ -298,7 +298,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "adlastep-remarks-sample" + ] + }, "outputs": [], "source": [ "adla_step = 
AdlaStep(\n", @@ -353,6 +357,21 @@ "name": "diray" } ], + "category": "tutorial", + "compute": [ + "Azure Data Lake Analytics" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "How to use AdlaStep with AML Pipelines", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -369,7 +388,12 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" - } + }, + "order_index": 6, + "tags": [ + "None" + ], + "task": "Demonstrates the use of AdlaStep" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb index afbd1082..ddc12107 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb @@ -709,6 +709,21 @@ "name": "diray" } ], + "category": "tutorial", + "compute": [ + "Azure Databricks" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML, Azure Databricks" + ], + "friendly_name": "How to use DatabricksStep with AML Pipelines", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -725,7 +740,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" - } + }, + "order_index": 5, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of DatabricksStep" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb 
b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb index aa943c48..3b7c8f63 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb @@ -477,6 +477,21 @@ "name": "sanpil" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Automated Machine Learning" + ], + "friendly_name": "How to use AutoMLStep with AML Pipelines", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -493,7 +508,15 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 11, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of AutoMLStep" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb index 3a32e59a..19dd96d5 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb @@ -527,6 +527,21 @@ "name": "diray" } ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "Azure Machine Learning Pipelines with Data Dependency", "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -543,7 +558,15 @@ 
"nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "order_index": 2, + "star_tag": [ + "featured" + ], + "tags": [ + "None" + ], + "task": "Demonstrates how to construct a Pipeline with data dependency between steps" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/calc/calculate.py b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/calc/calculate.py new file mode 100644 index 00000000..c1beb5eb --- /dev/null +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/calc/calculate.py @@ -0,0 +1,63 @@ +# Copyright (c) Microsoft. All rights reserved. +# Licensed under the MIT license. + +import argparse +import os +import codecs + +print("In calculate.py") +parser = argparse.ArgumentParser("calculate") +parser.add_argument("--arg_num1", type=int, help="First number as parameter") +parser.add_argument("--arg_num2", type=int, help="Second number as parameter") +parser.add_argument("--file_num1", type=str, help="First number, read from file") +parser.add_argument("--file_num2", type=str, help="Second number, read from file") +parser.add_argument("--output_sum", type=str, help="output_sum directory") +parser.add_argument("--output_product", type=str, help="output_product directory") + +args = parser.parse_args() + +print("Argument 1: %s" % args.arg_num1) +print("Argument 2: %s" % args.arg_num2) +print("Argument 3: %s" % args.file_num1) +print("Argument 4: %s" % args.file_num2) +print("Argument 5: %s" % args.output_sum) +print("Argument 6: %s" % args.output_product) + + +def get_number_from_file(file_path): + with codecs.open(file_path, "r", encoding="utf-8-sig") as f: + val = int(f.read()) + f.close() + return val + + +def get_num(arg_num, file_num): + if arg_num is None and not file_num: + return 0 + else: + num = arg_num if arg_num is not None else get_number_from_file(file_num) + return num + + +def write_num_to_file(num, file_path): 
+ if file_path is not None and file_path != '': + output_dir = file_path + else: + output_dir = '.' + filename = output_dir + + if output_dir != '.' and not os.path.exists(os.path.dirname(filename)): + os.makedirs(os.path.dirname(filename)) + + fo = open(filename, 'w+') + fo.write(str(num)) + fo.close() + + +num1 = get_num(args.arg_num1, args.file_num1) +num2 = get_num(args.arg_num2, args.file_num2) +res_sum = num1 + num2 +res_product = num1 * num2 +print("results: sum:", res_sum, ", product:", res_product) +write_num_to_file(res_sum, args.output_sum) +write_num_to_file(res_product, args.output_product) diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/compare.py b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/compare.py deleted file mode 100644 index 1784bc7b..00000000 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/compare.py +++ /dev/null @@ -1,24 +0,0 @@ -# Copyright (c) Microsoft. All rights reserved. -# Licensed under the MIT license.
- -import argparse -import os - -print("In compare.py") -print("As a data scientist, this is where I use my compare code.") -parser = argparse.ArgumentParser("compare") -parser.add_argument("--compare_data1", type=str, help="compare_data1 data") -parser.add_argument("--compare_data2", type=str, help="compare_data2 data") -parser.add_argument("--output_compare", type=str, help="output_compare directory") -parser.add_argument("--pipeline_param", type=int, help="pipeline parameter") - -args = parser.parse_args() - -print("Argument 1: %s" % args.compare_data1) -print("Argument 2: %s" % args.compare_data2) -print("Argument 3: %s" % args.output_compare) -print("Argument 4: %s" % args.pipeline_param) - -if not (args.output_compare is None): - os.makedirs(args.output_compare, exist_ok=True) - print("%s created" % args.output_compare) diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/dummy_train.py b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/dummy_train.py deleted file mode 100644 index 0ad3b5ff..00000000 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/dummy_train.py +++ /dev/null @@ -1,30 +0,0 @@ -# Copyright (c) Microsoft Corporation. All rights reserved. -# Licensed under the MIT License. 
-import argparse -import os - -print("*********************************************************") -print("Hello Azure ML!") - -parser = argparse.ArgumentParser() -parser.add_argument('--datadir', type=str, help="data directory") -parser.add_argument('--output', type=str, help="output") -args = parser.parse_args() - -print("Argument 1: %s" % args.datadir) -print("Argument 2: %s" % args.output) - -if not (args.output is None): - os.makedirs(args.output, exist_ok=True) - print("%s created" % args.output) - -try: - from azureml.core import Run - run = Run.get_context() - print("Log Fibonacci numbers.") - run.log_list('Fibonacci numbers', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]) - run.complete() -except: - print("Warning: you need to install Azure ML SDK in order to log metrics.") - -print("*********************************************************") diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/extract.py b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/extract.py deleted file mode 100644 index 0134a090..00000000 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/extract.py +++ /dev/null @@ -1,21 +0,0 @@ -# Copyright (c) Microsoft. All rights reserved. -# Licensed under the MIT license. 
- -import argparse -import os - -print("In extract.py") -print("As a data scientist, this is where I use my extract code.") - -parser = argparse.ArgumentParser("extract") -parser.add_argument("--input_extract", type=str, help="input_extract data") -parser.add_argument("--output_extract", type=str, help="output_extract directory") - -args = parser.parse_args() - -print("Argument 1: %s" % args.input_extract) -print("Argument 2: %s" % args.output_extract) - -if not (args.output_extract is None): - os.makedirs(args.output_extract, exist_ok=True) - print("%s created" % args.output_extract) diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/train-db-local.py b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/train-db-local.py deleted file mode 100644 index 99b511af..00000000 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/train-db-local.py +++ /dev/null @@ -1,5 +0,0 @@ -# Copyright (c) Microsoft. All rights reserved. -# Licensed under the MIT license. - -print("In train.py") -print("As a data scientist, this is where I use my training code.") diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/train.py b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/train.py deleted file mode 100644 index 961f5ebf..00000000 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/train.py +++ /dev/null @@ -1,22 +0,0 @@ -# Copyright (c) Microsoft. All rights reserved. -# Licensed under the MIT license. 
- -import argparse -import os - -print("In train.py") -print("As a data scientist, this is where I use my training code.") - -parser = argparse.ArgumentParser("train") - -parser.add_argument("--input_data", type=str, help="input data") -parser.add_argument("--output_train", type=str, help="output_train directory") - -args = parser.parse_args() - -print("Argument 1: %s" % args.input_data) -print("Argument 2: %s" % args.output_train) - -if not (args.output_train is None): - os.makedirs(args.output_train, exist_ok=True) - print("%s created" % args.output_train) diff --git a/how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb b/how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb index db24ffee..ef1fbcb6 100644 --- a/how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb +++ b/how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb @@ -718,7 +718,26 @@ "pygments_lexer": "ipython3", "version": "3.6.6" }, - "msauthor": "dipeck" + "friendly_name": "Train a model with hyperparameter tuning", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Train a Convolutional Neural Network (CNN)", + "datasets": [ + "MNIST" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "Azure Container Instance" + ], + "framework": [ + "Chainer" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/chainer/training/distributed-chainer/distributed-chainer.ipynb b/how-to-use-azureml/ml-frameworks/chainer/training/distributed-chainer/distributed-chainer.ipynb index 012cc9d2..542d5a73 100644 --- 
a/how-to-use-azureml/ml-frameworks/chainer/training/distributed-chainer/distributed-chainer.ipynb +++ b/how-to-use-azureml/ml-frameworks/chainer/training/distributed-chainer/distributed-chainer.ipynb @@ -313,7 +313,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" - } + }, + "friendly_name": "Distributed Training with Chainer", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Use the Chainer estimator to perform distributed training", + "datasets": [ + "MNIST" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "Chainer" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb b/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb index 821381ac..f4554e78 100644 --- a/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb +++ b/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb @@ -708,7 +708,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" - } + }, + "friendly_name": "Training with hyperparameter tuning using PyTorch", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Train an image classification model using transfer learning with the PyTorch estimator", + "datasets": [ + "ImageNet" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "Azure Container Instance" + ], + "framework": [ + "PyTorch" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git 
a/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb b/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb index 2aaf0d8c..f711b27f 100644 --- a/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb @@ -333,7 +333,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" - } + }, + "friendly_name": "Distributed PyTorch", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Train a model using distributed training via Horovod", + "datasets": [ + "MNIST" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "PyTorch" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo/distributed-pytorch-with-nccl-gloo.ipynb b/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo/distributed-pytorch-with-nccl-gloo.ipynb index 151cce38..91ab32c7 100644 --- a/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo/distributed-pytorch-with-nccl-gloo.ipynb +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo/distributed-pytorch-with-nccl-gloo.ipynb @@ -375,7 +375,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" - } + }, + "friendly_name": "Distributed training with PyTorch", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Train a model using distributed training via NCCL/Gloo", + "datasets": [ + "MNIST" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ +
"None" + ], + "framework": [ + "PyTorch" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb b/how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb index 7b274376..f0a72833 100644 --- a/how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb +++ b/how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb @@ -561,7 +561,27 @@ "pygments_lexer": "ipython3", "version": "3.6.6" }, - "msauthor": "dipeck" + "msauthor": "dipeck", + "friendly_name": "Training and hyperparameter tuning with Scikit-learn", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Train a support vector machine (SVM) to perform classification", + "datasets": [ + "Iris" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "Scikit-learn" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/tf_mnist.py b/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/tf_mnist.py index f5ab7099..34bb8fa0 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/tf_mnist.py +++ b/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/tf_mnist.py @@ -5,6 +5,7 @@ import numpy as np import argparse import os import tensorflow as tf +import glob from azureml.core import Run from utils import 
load_data @@ -21,17 +22,22 @@ parser.add_argument('--second-layer-neurons', type=int, dest='n_hidden_2', defau parser.add_argument('--learning-rate', type=float, dest='learning_rate', default=0.01, help='learning rate') args = parser.parse_args() -data_folder = os.path.join(args.data_folder, 'mnist') +data_folder = args.data_folder +print('Data folder:', data_folder) -print('training dataset is stored here:', data_folder) - -X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0 -X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0 - -y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1) -y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1) +# load train and test set into numpy arrays +# note we scale the pixel intensity values to 0-1 (by dividing them by 255.0) so the model can converge faster. +X_train = load_data(glob.glob(os.path.join(data_folder, '**/train-images-idx3-ubyte.gz'), + recursive=True)[0], False) / 255.0 +X_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-images-idx3-ubyte.gz'), + recursive=True)[0], False) / 255.0 +y_train = load_data(glob.glob(os.path.join(data_folder, '**/train-labels-idx1-ubyte.gz'), + recursive=True)[0], True).reshape(-1) +y_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-labels-idx1-ubyte.gz'), + recursive=True)[0], True).reshape(-1) print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n') + training_set_size = X_train.shape[0] n_inputs = 28 * 28 diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb b/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb index ff9786c0..eb90fd07 100644 ---
a/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb +++ b/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb @@ -228,8 +228,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Upload MNIST dataset to default datastore \n", - "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can either be backed by an Azure Blob Storage or and Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used, in case the data is not already in Blob Storage or File Share." + "## Create a FileDataset\n", + "A FileDataset references single or multiple files in your datastores or public urls. The files can be of any format. FileDataset provides you with the ability to download or mount the files to your compute. By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred. 
[Learn More](https://aka.ms/azureml/howto/createdatasets)" ] }, { @@ -238,14 +238,20 @@ "metadata": {}, "outputs": [], "source": [ - "ds = ws.get_default_datastore()" + "from azureml.core.dataset import Dataset\n", + "web_paths = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',\n", + " 'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',\n", + " 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',\n", + " 'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'\n", + " ]\n", + "dataset = Dataset.File.from_files(path = web_paths)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In this next step, we will upload the training and test set into the workspace's default datastore, which we will then later be mount on an `AmlCompute` cluster for training." + "Use the register() method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script." ] }, { @@ -254,7 +260,12 @@ "metadata": {}, "outputs": [], "source": [ - "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" + "dataset = dataset.register(workspace = ws,\n", + " name = 'mnist dataset',\n", + " description='training and test dataset',\n", + " create_new_version=True)\n", + "# list the files referenced by dataset\n", + "dataset.to_path()" ] }, { @@ -409,6 +420,27 @@ "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release." 
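The notebook mounts the registered MNIST dataset into the run via `dataset.as_named_input('mnist').as_mount()`, and the updated `tf_mnist.py` then locates the files with a recursive `glob` rather than a hard-coded subfolder, since a mounted FileDataset may place files under nested directories. A standard-library sketch of that lookup pattern (the directory layout below is fabricated for illustration):

```python
import glob
import gzip
import os
import tempfile

# Simulate a dataset mount point where files may sit in nested folders.
root = tempfile.mkdtemp()
nested = os.path.join(root, "mnist", "raw")
os.makedirs(nested)
with gzip.open(os.path.join(nested, "train-images-idx3-ubyte.gz"), "wb") as f:
    f.write(b"placeholder")

# '**' with recursive=True searches every subdirectory, so the training
# script does not need to know the exact folder structure of the mount.
matches = glob.glob(os.path.join(root, "**/train-images-idx3-ubyte.gz"),
                    recursive=True)
print(len(matches))  # 1
```

Taking `[0]` from the match list, as the script does, assumes exactly one copy of each file exists under the mount.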
] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.environment import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# set up environment\n", + "env = Environment('my_env')\n", + "# ensure latest azureml-dataprep and other required packages installed in the environment\n", + "cd = CondaDependencies.create(pip_packages=['keras',\n", + " 'azureml-sdk',\n", + " 'tensorflow-gpu',\n", + " 'matplotlib',\n", + " 'azureml-dataprep[pandas,fuse]>=1.1.14'])\n", + "\n", + "env.python.conda_dependencies = cd" + ] + }, { "cell_type": "code", "execution_count": null, @@ -422,7 +454,7 @@ "from azureml.train.dnn import TensorFlow\n", "\n", "script_params = {\n", - " '--data-folder': ws.get_default_datastore().as_mount(),\n", + " '--data-folder': dataset.as_named_input('mnist').as_mount(),\n", " '--batch-size': 50,\n", " '--first-layer-neurons': 300,\n", " '--second-layer-neurons': 100,\n", @@ -433,8 +465,8 @@ " script_params=script_params,\n", " compute_target=compute_target,\n", " entry_script='tf_mnist.py', \n", - " use_gpu=True, \n", - " framework_version='1.13')" + " framework_version='1.13',\n", + " environment_definition=env)" ] }, { @@ -722,10 +754,10 @@ "outputs": [], "source": [ "est = TensorFlow(source_directory=script_folder,\n", - " script_params={'--data-folder': ws.get_default_datastore().as_mount()},\n", + " script_params={'--data-folder': dataset.as_named_input('mnist').as_mount()},\n", " compute_target=compute_target,\n", " entry_script='tf_mnist.py', \n", - " use_gpu=True)" + " environment_definition=env)" ] }, { @@ -1122,6 +1154,22 @@ "name": "ninhu" } ], + "category": "training", + "compute": [ + "AML Compute" + ], + "datasets": [ + "MNIST" + ], + "deployment": [ + "Azure Container Instance" + ], + "exclude_from_index": false, + "framework": [ + "TensorFlow" + ], + "friendly_name": "Training and hyperparameter tuning using the
TensorFlow estimator", + "index_order": 1, "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -1137,8 +1185,12 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.8" - } + "version": "3.6.9" + }, + "tags": [ + "None" + ], + "task": "Train a deep neural network" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.yml b/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.yml index 4b9dd138..2d50f3c4 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.yml +++ b/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.yml @@ -6,3 +6,9 @@ dependencies: - pip: - azureml-sdk - azureml-widgets + - pandas + - keras + - tensorflow-gpu + - matplotlib + - azureml-dataprep + - fuse diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb b/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb index 568b7648..d46379fe 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb +++ b/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb @@ -20,7 +20,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Distributed Tensorflow with Horovod\n", + "# Distributed TensorFlow with Horovod\n", "In this tutorial, you will train a word2vec 
model in TensorFlow using distributed training via [Horovod](https://github.com/uber/horovod)." ] }, @@ -402,7 +402,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" - } + }, + "friendly_name": "Distributed training using TensorFlow with Horovod", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Use the TensorFlow estimator to train a word2vec model", + "datasets": [ + "None" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "TensorFlow" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb b/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb index a5e3e143..b1d7058b 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb +++ b/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb @@ -314,7 +314,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" - } + }, + "friendly_name": "Distributed TensorFlow with parameter server", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Use the TensorFlow estimator to train a model using distributed training", + "datasets": [ + "MNIST" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "TensorFlow" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/tf_mnist_with_checkpoint.py 
b/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/tf_mnist_with_checkpoint.py index 85e80cbd..acfd711e 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/tf_mnist_with_checkpoint.py +++ b/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/tf_mnist_with_checkpoint.py @@ -6,6 +6,7 @@ import argparse import os import re import tensorflow as tf +import glob from azureml.core import Run from utils import load_data @@ -24,17 +25,23 @@ previous_model_location = args.resume_from # You can also use environment variable to get the model/checkpoint files location # previous_model_location = os.path.expandvars(os.getenv("AZUREML_DATAREFERENCE_MODEL_LOCATION", None)) -data_folder = os.path.join(args.data_folder, 'mnist') +data_folder = args.data_folder +print('Data folder:', data_folder) -print('training dataset is stored here:', data_folder) +# load train and test set into numpy arrays +# note we scale the pixel intensity values to 0-1 (by dividing them by 255.0) so the model can converge faster.
-X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0 -X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0 - -y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1) -y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1) +X_train = load_data(glob.glob(os.path.join(data_folder, '**/train-images-idx3-ubyte.gz'), + recursive=True)[0], False) / 255.0 +X_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-images-idx3-ubyte.gz'), + recursive=True)[0], False) / 255.0 +y_train = load_data(glob.glob(os.path.join(data_folder, '**/train-labels-idx1-ubyte.gz'), + recursive=True)[0], True).reshape(-1) +y_test = load_data(glob.glob(os.path.join(data_folder, '**/t10k-labels-idx1-ubyte.gz'), + recursive=True)[0], True).reshape(-1) print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n') + training_set_size = X_train.shape[0] n_inputs = 28 * 28 diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb b/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb index da294e7d..cca765a2 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb +++ b/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb @@ -146,17 +146,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Upload data to datastore\n", - "To make data accessible for remote training, AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. 
\n", - "\n", - "If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. For this tutorial, although you can download the data in your training script, we will demonstrate how to upload the training data to a datastore and access it during training to illustrate the datastore functionality." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First download the data from Yan LeCun's web site directly and save them in a data folder locally." + "## Create a FileDataset\n", + "A FileDataset references single or multiple files in your datastores or public urls. The files can be of any format. FileDataset provides you with the ability to download or mount the files to your compute. By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred. 
[Learn More](https://aka.ms/azureml/howto/createdatasets)" ] }, { @@ -165,22 +156,21 @@ "metadata": {}, "outputs": [], "source": [ - "import os\n", - "import urllib\n", - "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" + "#initialize file dataset \n", + "from azureml.core.dataset import Dataset\n", + "web_paths = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',\n", + " 'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',\n", + " 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',\n", + " 'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'\n", + " ]\n", + "dataset = Dataset.File.from_files(path = web_paths)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore." + "Use the register() method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script." ] }, { @@ -189,15 +179,11 @@ "metadata": {}, "outputs": [], "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Upload MNIST data to the default datastore." 
+ "#register dataset to workspace\n", + "dataset = dataset.register(workspace = ws,\n", + " name = 'mnist dataset',\n", + " description='training and test dataset',\n", + " create_new_version=True)" ] }, { @@ -206,24 +192,8 @@ "metadata": {}, "outputs": [], "source": [ - "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For convenience, let's get a reference to the datastore. In the next section, we can then pass this reference to our training script's `--data-folder` argument. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds_data = ds.as_mount()\n", - "print(ds_data)" + "# list the files referenced by dataset\n", + "dataset.to_path()" ] }, { @@ -247,6 +217,7 @@ "metadata": {}, "outputs": [], "source": [ + "import os\n", "script_folder = './tf-resume-training'\n", "os.makedirs(script_folder, exist_ok=True)" ] @@ -303,6 +274,27 @@ "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release." 
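The estimator cells in this notebook hand inputs to the entry script as command-line flags: `--data-folder` receives the dataset mount path, and `resume_from` points the script at a previous run's checkpoint location. A minimal, illustrative sketch of that argument contract (flag names assumed to mirror `tf_mnist_with_checkpoint.py`; argparse maps dashes in flag names to underscores on the parsed object):

```python
import argparse

# Declare the two inputs the entry script expects. In a real run, AzureML
# substitutes the mounted dataset path for '--data-folder'.
parser = argparse.ArgumentParser("tf_mnist_with_checkpoint")
parser.add_argument("--data-folder", type=str,
                    help="directory where the mounted dataset appears")
parser.add_argument("--resume-from", type=str, default=None,
                    help="checkpoint location of a previous run, if resuming")

# Simulate the arguments a fresh (non-resumed) run would receive.
args = parser.parse_args(["--data-folder", "/mnt/mnist"])
print(args.data_folder)          # /mnt/mnist
print(args.resume_from is None)  # True
```

A resumed run would simply add `--resume-from <checkpoint dir>` to the same invocation, which is what the `resume_from` estimator parameter arranges.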
] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.environment import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "# set up environment\n", + "env = Environment('my_env')\n", + "# ensure latest azureml-dataprep and other required packages installed in the environment\n", + "cd = CondaDependencies.create(pip_packages=['keras',\n", + " 'azureml-sdk',\n", + " 'tensorflow-gpu',\n", + " 'matplotlib',\n", + " 'azureml-dataprep[pandas,fuse]>=1.1.14'])\n", + "\n", + "env.python.conda_dependencies = cd" + ] + }, { "cell_type": "code", "execution_count": null, @@ -312,13 +304,23 @@ "from azureml.train.dnn import TensorFlow\n", "\n", "script_params={\n", - " '--data-folder': ds_data\n", + " '--data-folder': dataset.as_named_input('mnist').as_mount()\n", "}\n", "\n", "estimator= TensorFlow(source_directory=script_folder,\n", " compute_target=compute_target,\n", " script_params=script_params,\n", - " entry_script='tf_mnist_with_checkpoint.py')" + " entry_script='tf_mnist_with_checkpoint.py',\n", + " environment_definition=env)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dataset.to_path()" + ] + }, { @@ -420,14 +422,15 @@ "from azureml.train.dnn import TensorFlow\n", "\n", "script_params={\n", - " '--data-folder': ds_data\n", + " '--data-folder': dataset.as_named_input('mnist').as_mount()\n", "}\n", "\n", "estimator2 = TensorFlow(source_directory=script_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_mnist_with_checkpoint.py',\n", - " resume_from=model_location)" + " compute_target=compute_target,\n", + " script_params=script_params,\n", + " entry_script='tf_mnist_with_checkpoint.py',\n", + " resume_from=model_location,\n", + " environment_definition=env)" ] }, { @@ -463,6 +466,22 @@ "name": "hesuri" } ], + "category":
"training", + "compute": [ + "AML Compute" + ], + "datasets": [ + "MNIST" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "TensorFlow" + ], + "friendly_name": "Resuming a model", + "index_order": 1, "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -478,9 +497,13 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.6.9" }, - "msauthor": "hesuri" + "msauthor": "hesuri", + "tags": [ + "None" + ], + "task": "Resume a model in TensorFlow from a previously submitted run" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.yml b/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.yml index c814eef5..1731084f 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.yml +++ b/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.yml @@ -3,3 +3,9 @@ dependencies: - pip: - azureml-sdk - azureml-widgets + - pandas + - keras + - tensorflow-gpu + - matplotlib + - azureml-dataprep + - fuse diff --git a/how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb b/how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb index b80973a3..a04ae149 100644 --- a/how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb +++ b/how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb @@ -137,7 +137,7 @@ " '722071', '720326', '725415', '724504', '725665', '725424',\n", " '725066']\n", "\n", - "columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n", + "columns = ['wban', 'datetime', 'latitude', 'longitude', 
'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n", "\n", "\n", "def enrich_weather_noaa_data(noaa_df):\n", @@ -604,12 +604,14 @@ "services = [service_name]\n", "start = datetime.now() - timedelta(days=2)\n", "end = datetime(year=2020, month=1, day=22, hour=15, minute=16)\n", - "feature_list = ['usaf', 'wban', 'latitude', 'longitude', 'station_name', 'p_k', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n", + "feature_list = ['latitude', 'longitude', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n", "alert_config = AlertConfiguration([email_address]) if email_address else None\n", "\n", "# there will be an exception indicating using get() method if DataDrift object already exist\n", "try:\n", - " datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", alert_config=alert_config)\n", + " # With consideration for data latency, by default the scheduled jobs will process previous day's data. \n", + " # In this demo, scoring data will be generated from current day, therefore set schedule start time to next day to process current day's data.\n", + " datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", schedule_start=datetime.utcnow() + timedelta(days=1), alert_config=alert_config)\n", "except KeyError:\n", " datadrift = DataDriftDetector.get(ws, model.name, model.version)\n", " \n", diff --git a/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb b/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb index 84264fa6..7f0f6169 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb +++ b/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb @@ -100,7 +100,7 @@ "\n", "# Check core SDK version number\n", "\n", - "print(\"This notebook was created using SDK version 1.0.62, you are currently running version\", azureml.core.VERSION)" + 
"print(\"This notebook was created using SDK version 1.0.65, you are currently running version\", azureml.core.VERSION)" ] }, { diff --git a/how-to-use-azureml/track-and-monitor-experiments/manage-runs/manage-runs.ipynb b/how-to-use-azureml/track-and-monitor-experiments/manage-runs/manage-runs.ipynb index ae0b0536..7003c125 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/manage-runs/manage-runs.ipynb +++ b/how-to-use-azureml/track-and-monitor-experiments/manage-runs/manage-runs.ipynb @@ -595,7 +595,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" - } + }, + "friendly_name": "Managing your training runs", + "exclude_from_index": false, + "index_order": 2, + "category": "training", + "task": "Monitor and complete runs", + "datasets": [ + "None" + ], + "compute": [ + "Local" + ], + "deployment": [ + "None" + ], + "framework": [ + "None" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb b/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb index 2d72f3fd..aa283715 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb +++ b/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb @@ -22,7 +22,7 @@ "source": [ "# Tensorboard Integration with Run History\n", "\n", - "1. Run a Tensorflow job locally and view its TB output live.\n", + "1. Run a TensorFlow job locally and view its TB output live.\n", "2. The same, for a DSVM.\n", "3. And once more, with an AmlCompute cluster.\n", "4. Finally, we'll collect all of these historical runs together into a single Tensorboard graph." 
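The Tensorboard notebook's last step — collecting historical runs into a single Tensorboard view — works because TensorBoard treats each immediate subdirectory of its log directory as a separate run; the integration downloads each run's event files into such a layout. A standard-library sketch of that directory structure (run names fabricated for illustration):

```python
import os
import tempfile

# TensorBoard renders every immediate subdirectory of --logdir as its own
# run, so combining historical runs is a matter of directory layout.
logdir = tempfile.mkdtemp()
for run_name in ["amlcompute_run", "dsvm_run", "local_run"]:
    os.makedirs(os.path.join(logdir, run_name))
    # a real run would write events.out.tfevents.* files here
    open(os.path.join(logdir, run_name, "events.out.tfevents.0"), "w").close()

print(sorted(os.listdir(logdir)))  # ['amlcompute_run', 'dsvm_run', 'local_run']
```

Pointing `tensorboard --logdir` at this directory would then show the three runs side by side on the same graphs.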
@@ -555,7 +555,29 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" - } + }, + "friendly_name": "TensorBoard integration with run history", + "exclude_from_index": false, + "index_order": 3, + "category": "training", + "task": "Run a TensorFlow job and view its TensorBoard output live", + "datasets": [ + "None" + ], + "compute": [ + "Local", + "DSVM", + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "TensorFlow" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb index 8f36497f..daaa76d3 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb +++ b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb @@ -315,7 +315,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" - } + }, + "friendly_name": "Deploy a model as a web service using MLflow", + "exclude_from_index": false, + "index_order": 4, + "category": "deployment", + "task": "Use MLflow with AML", + "datasets": [ + "Diabetes" + ], + "compute": [ + "None" + ], + "deployment": [ + "Azure Container Instance" + ], + "framework": [ + "Scikit-learn" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb index 40c75405..4c7ec016 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb +++ b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb
@@ -473,6 +473,26 @@ "pygments_lexer": "ipython3", "version": "3.7.3" }, + "friendly_name": "Use MLflow with Azure Machine Learning for training and deployment", + "exclude_from_index": false, + "index_order": 6, + "category": "tutorial", + "task": "Use MLflow with Azure Machine Learning to train and deploy PyTorch image classifier model", + "datasets": [ + "MNIST" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "Azure Container Instance" + ], + "framework": [ + "PyTorch" + ], + "tags": [ + "None" + ], "name": "mlflow-sparksummit-pytorch", "notebookId": 2495374963457641 }, diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-local/train-local.ipynb b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-local/train-local.ipynb index bcd99417..9d845942 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-local/train-local.ipynb +++ b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-local/train-local.ipynb @@ -241,7 +241,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" - } + }, + "friendly_name": "Use MLflow with AML for a local training run", + "exclude_from_index": false, + "index_order": 7, + "category": "training", + "task": "Use MLflow tracking APIs together with Azure Machine Learning for storing your metrics and artifacts", + "datasets": [ + "Diabetes" + ], + "compute": [ + "Local" + ], + "deployment": [ + "None" + ], + "framework": [ + "None" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb index c6da843d..9953a074 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb +++
b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb @@ -311,7 +311,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" - } + }, + "friendly_name": "Use MLflow with AML for a remote training run", + "exclude_from_index": false, + "index_order": 8, + "category": "training", + "task": "Use MLflow tracking APIs together with AML for storing your metrics and artifacts", + "datasets": [ + "Diabetes" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "None" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/chainer_mnist_distributed.py b/how-to-use-azureml/training-with-deep-learning/distributed-chainer/chainer_mnist_distributed.py deleted file mode 100644 index deb4b5f6..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/chainer_mnist_distributed.py +++ /dev/null @@ -1,153 +0,0 @@ - -import argparse - -import chainer -import chainer.cuda -import chainer.functions as F -import chainer.links as L -from chainer import training -from chainer.training import extensions - -import chainermn -import chainermn.datasets -import chainermn.functions - - -chainer.disable_experimental_feature_warning = True - - -class MLP0SubA(chainer.Chain): - def __init__(self, comm, n_out): - super(MLP0SubA, self).__init__( - l1=L.Linear(784, n_out)) - - def __call__(self, x): - return F.relu(self.l1(x)) - - -class MLP0SubB(chainer.Chain): - def __init__(self, comm): - super(MLP0SubB, self).__init__() - - def __call__(self, y): - return y - - -class MLP0(chainermn.MultiNodeChainList): - # Model on worker 0. 
- def __init__(self, comm, n_out): - super(MLP0, self).__init__(comm=comm) - self.add_link(MLP0SubA(comm, n_out), rank_in=None, rank_out=1) - self.add_link(MLP0SubB(comm), rank_in=1, rank_out=None) - - -class MLP1Sub(chainer.Chain): - def __init__(self, n_units, n_out): - super(MLP1Sub, self).__init__( - l2=L.Linear(None, n_units), - l3=L.Linear(None, n_out)) - - def __call__(self, h0): - h1 = F.relu(self.l2(h0)) - return self.l3(h1) - - -class MLP1(chainermn.MultiNodeChainList): - # Model on worker 1. - def __init__(self, comm, n_units, n_out): - super(MLP1, self).__init__(comm=comm) - self.add_link(MLP1Sub(n_units, n_out), rank_in=0, rank_out=0) - - -def main(): - parser = argparse.ArgumentParser( - description='ChainerMN example: pipelined neural network') - parser.add_argument('--batchsize', '-b', type=int, default=100, - help='Number of images in each mini-batch') - parser.add_argument('--epoch', '-e', type=int, default=20, - help='Number of sweeps over the dataset to train') - parser.add_argument('--gpu', '-g', action='store_true', - help='Use GPU') - parser.add_argument('--out', '-o', default='result', - help='Directory to output the result') - parser.add_argument('--unit', '-u', type=int, default=1000, - help='Number of units') - args = parser.parse_args() - - # Prepare ChainerMN communicator. 
- if args.gpu: - comm = chainermn.create_communicator('hierarchical') - data_axis, model_axis = comm.rank % 2, comm.rank // 2 - data_comm = comm.split(data_axis, comm.rank) - model_comm = comm.split(model_axis, comm.rank) - device = comm.intra_rank - else: - comm = chainermn.create_communicator('naive') - data_axis, model_axis = comm.rank % 2, comm.rank // 2 - data_comm = comm.split(data_axis, comm.rank) - model_comm = comm.split(model_axis, comm.rank) - device = -1 - - if model_comm.size != 2: - raise ValueError( - 'This example can only be executed on the even number' - 'of processes.') - - if comm.rank == 0: - print('==========================================') - if args.gpu: - print('Using GPUs') - print('Num unit: {}'.format(args.unit)) - print('Num Minibatch-size: {}'.format(args.batchsize)) - print('Num epoch: {}'.format(args.epoch)) - print('==========================================') - - if data_axis == 0: - model = L.Classifier(MLP0(model_comm, args.unit)) - elif data_axis == 1: - model = MLP1(model_comm, args.unit, 10) - - if device >= 0: - chainer.cuda.get_device_from_id(device).use() - model.to_gpu() - - optimizer = chainermn.create_multi_node_optimizer( - chainer.optimizers.Adam(), data_comm) - optimizer.setup(model) - - # Original dataset on worker 0 and 1. - # Datasets of worker 0 and 1 are split and distributed to all workers. 
- if model_axis == 0: - train, test = chainer.datasets.get_mnist() - if data_axis == 1: - train = chainermn.datasets.create_empty_dataset(train) - test = chainermn.datasets.create_empty_dataset(test) - else: - train, test = None, None - train = chainermn.scatter_dataset(train, data_comm, shuffle=True) - test = chainermn.scatter_dataset(test, data_comm, shuffle=True) - - train_iter = chainer.iterators.SerialIterator( - train, args.batchsize, shuffle=False) - test_iter = chainer.iterators.SerialIterator( - test, args.batchsize, repeat=False, shuffle=False) - - updater = training.StandardUpdater(train_iter, optimizer, device=device) - trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out) - evaluator = extensions.Evaluator(test_iter, model, device=device) - evaluator = chainermn.create_multi_node_evaluator(evaluator, data_comm) - trainer.extend(evaluator) - - # Some display and output extentions are necessary only for worker 0. - if comm.rank == 0: - trainer.extend(extensions.LogReport()) - trainer.extend(extensions.PrintReport( - ['epoch', 'main/loss', 'validation/main/loss', - 'main/accuracy', 'validation/main/accuracy', 'elapsed_time'])) - trainer.extend(extensions.ProgressBar()) - - trainer.run() - - -if __name__ == '__main__': - main() diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb deleted file mode 100644 index 1911927f..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb +++ /dev/null @@ -1,321 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed Chainer\n", - "In this tutorial, you will run a Chainer training example on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using ChainerMN distributed training across a GPU cluster." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. 
`Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current AmlCompute. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the AmlCompute ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './chainer-distr'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script\n", - "Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `train_mnist.py`. In practice, you should be able to take any custom Chainer training script as is and run it with Azure ML without having to modify your code." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once your script is ready, copy the training script `train_mnist.py` into the project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('train_mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed Chainer tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'chainer-distr'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a Chainer estimator\n", - "The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import MpiConfiguration\n", - "from azureml.train.dnn import Chainer\n", - "\n", - "estimator = Chainer(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " entry_script='train_mnist.py',\n", - " node_count=2,\n", - " distributed_training=MpiConfiguration(),\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI, you must provide the argument `distributed_backend='mpi'`. Using this estimator with these settings, Chainer and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `Chainer` constructor's `pip_packages` or `conda_packages` parameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "ninhu" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/train_mnist.py b/how-to-use-azureml/training-with-deep-learning/distributed-chainer/train_mnist.py deleted file mode 100644 index 29c77f2d..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-chainer/train_mnist.py +++ /dev/null @@ -1,125 +0,0 @@ -# Official ChainerMN example taken from -# https://github.com/chainer/chainer/blob/master/examples/chainermn/mnist/train_mnist.py - -from __future__ import print_function - -import argparse - -import chainer -import chainer.functions as F -import chainer.links as L -from chainer import training -from chainer.training import extensions - -import chainermn - - -class MLP(chainer.Chain): - - def __init__(self, n_units, n_out): - super(MLP, self).__init__( - # the size of the inputs to each layer will be inferred - l1=L.Linear(784, n_units), # n_in -> n_units - l2=L.Linear(n_units, n_units), # n_units -> n_units - l3=L.Linear(n_units, n_out), # n_units -> n_out - ) - - def __call__(self, x): - h1 = F.relu(self.l1(x)) - h2 = F.relu(self.l2(h1)) - return self.l3(h2) - - -def main(): - 
parser = argparse.ArgumentParser(description='ChainerMN example: MNIST') - parser.add_argument('--batchsize', '-b', type=int, default=100, - help='Number of images in each mini-batch') - parser.add_argument('--communicator', type=str, - default='non_cuda_aware', help='Type of communicator') - parser.add_argument('--epoch', '-e', type=int, default=20, - help='Number of sweeps over the dataset to train') - parser.add_argument('--gpu', '-g', default=True, - help='Use GPU') - parser.add_argument('--out', '-o', default='result', - help='Directory to output the result') - parser.add_argument('--resume', '-r', default='', - help='Resume the training from snapshot') - parser.add_argument('--unit', '-u', type=int, default=1000, - help='Number of units') - args = parser.parse_args() - - # Prepare ChainerMN communicator. - - if args.gpu: - if args.communicator == 'naive': - print("Error: 'naive' communicator does not support GPU.\n") - exit(-1) - comm = chainermn.create_communicator(args.communicator) - device = comm.intra_rank - else: - if args.communicator != 'naive': - print('Warning: using naive communicator ' - 'because only naive supports CPU-only execution') - comm = chainermn.create_communicator('naive') - device = -1 - - if comm.rank == 0: - print('==========================================') - print('Num process (COMM_WORLD): {}'.format(comm.size)) - if args.gpu: - print('Using GPUs') - print('Using {} communicator'.format(args.communicator)) - print('Num unit: {}'.format(args.unit)) - print('Num Minibatch-size: {}'.format(args.batchsize)) - print('Num epoch: {}'.format(args.epoch)) - print('==========================================') - - model = L.Classifier(MLP(args.unit, 10)) - if device >= 0: - chainer.cuda.get_device_from_id(device).use() - model.to_gpu() - - # Create a multi node optimizer from a standard Chainer optimizer. 
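The deleted `train_mnist.py` above selects a ChainerMN communicator based on whether GPUs are requested: `'naive'` is the only communicator that supports CPU-only execution, but it cannot drive GPUs, so GPU runs must reject it. That branching can be restated as a small standalone function (the name `pick_communicator` is ours, not ChainerMN's):

```python
def pick_communicator(use_gpu, requested):
    """Mirror the deleted script's communicator selection: GPU runs
    refuse 'naive'; CPU-only runs force 'naive' regardless of request."""
    if use_gpu:
        if requested == 'naive':
            raise ValueError("'naive' communicator does not support GPU")
        return requested
    if requested != 'naive':
        print("Warning: using naive communicator "
              "because only naive supports CPU-only execution")
    return 'naive'

print(pick_communicator(False, 'non_cuda_aware'))  # naive
print(pick_communicator(True, 'non_cuda_aware'))   # non_cuda_aware
```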
- optimizer = chainermn.create_multi_node_optimizer( - chainer.optimizers.Adam(), comm) - optimizer.setup(model) - - # Split and distribute the dataset. Only worker 0 loads the whole dataset. - # Datasets of worker 0 are evenly split and distributed to all workers. - if comm.rank == 0: - train, test = chainer.datasets.get_mnist() - else: - train, test = None, None - train = chainermn.scatter_dataset(train, comm, shuffle=True) - test = chainermn.scatter_dataset(test, comm, shuffle=True) - - train_iter = chainer.iterators.SerialIterator(train, args.batchsize) - test_iter = chainer.iterators.SerialIterator(test, args.batchsize, - repeat=False, shuffle=False) - - updater = training.StandardUpdater(train_iter, optimizer, device=device) - trainer = training.Trainer(updater, (args.epoch, 'epoch'), out=args.out) - - # Create a multi node evaluator from a standard Chainer evaluator. - evaluator = extensions.Evaluator(test_iter, model, device=device) - evaluator = chainermn.create_multi_node_evaluator(evaluator, comm) - trainer.extend(evaluator) - - # Some display and output extensions are necessary only for one worker. - # (Otherwise, there would just be repeated outputs.) 
- if comm.rank == 0: - trainer.extend(extensions.dump_graph('main/loss')) - trainer.extend(extensions.LogReport()) - trainer.extend(extensions.PrintReport( - ['epoch', 'main/loss', 'validation/main/loss', - 'main/accuracy', 'validation/main/accuracy', 'elapsed_time'])) - trainer.extend(extensions.ProgressBar()) - - if args.resume: - chainer.serializers.load_npz(args.resume, trainer) - - trainer.run() - - -if __name__ == '__main__': - main() diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb deleted file mode 100644 index 2f75cab0..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb +++ /dev/null @@ -1,341 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed PyTorch with Horovod\n", - "In this tutorial, you will train a PyTorch model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using distributed training via [Horovod](https://github.com/uber/horovod) across a GPU cluster." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
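The deleted script relies on `chainermn.scatter_dataset` to split worker 0's dataset across all ranks, with the other ranks passing `None`. A rough sketch of an even contiguous split, with no MPI dependency, looks like the following (the helper `even_split` is ours and only approximates what ChainerMN does internally):

```python
def even_split(dataset, size, rank):
    """Return the contiguous shard of `dataset` owned by worker `rank`
    out of `size` workers; earlier ranks absorb any remainder."""
    n, rem = divmod(len(dataset), size)
    start = rank * n + min(rank, rem)
    stop = start + n + (1 if rank < rem else 0)
    return dataset[start:stop]

data = list(range(10))
shards = [even_split(data, 3, r) for r in range(3)]
print(shards)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Shard sizes differ by at most one element, and concatenating the shards in rank order reconstructs the original dataset.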
Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`\n", - "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) on single-node PyTorch training using Azure Machine Learning" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current AmlCompute. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the AmlCompute ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './pytorch-distr-hvd'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script\n", - "Now you will need to create your training script. In this tutorial, the script for distributed training of MNIST is already provided for you at `pytorch_horovod_mnist.py`. In practice, you should be able to take any custom PyTorch training script as is and run it with Azure ML without having to modify your code.\n", - "\n", - "However, if you would like to use Azure ML's [metric logging](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#logging) capabilities, you will have to add a small amount of Azure ML logic inside your training script. In this example, at each logging interval, we will log the loss for that minibatch to our Azure ML run.\n", - "\n", - "To do so, in `pytorch_horovod_mnist.py`, we will first access the Azure ML `Run` object within the script:\n", - "```Python\n", - "from azureml.core.run import Run\n", - "run = Run.get_context()\n", - "```\n", - "Later within the script, we log the loss metric to our run:\n", - "```Python\n", - "run.log('loss', loss.item())\n", - "```" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once your script is ready, copy the training script `pytorch_horovod_mnist.py` into the project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('pytorch_horovod_mnist.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed PyTorch tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'pytorch-distr-hvd'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a PyTorch estimator\n", - "The Azure ML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import MpiConfiguration\n", - "from azureml.train.dnn import PyTorch\n", - "\n", - "estimator = PyTorch(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " entry_script='pytorch_horovod_mnist.py',\n", - " node_count=2,\n", - " distributed_training=MpiConfiguration(),\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_training=MpiConfiguration()`. 
Using this estimator with these settings, PyTorch, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `PyTorch` constructor's `pip_packages` or `conda_packages` parameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes. You can see that the widget automatically plots and visualizes the loss metric that we logged to the Azure ML run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True) # this provides a verbose log" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "ninhu" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.yml b/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.yml deleted file mode 100644 index 58bb77d8..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.yml +++ /dev/null @@ -1,5 +0,0 @@ -name: distributed-pytorch-with-horovod -dependencies: -- pip: - - azureml-sdk - - azureml-widgets diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/pytorch_horovod_mnist.py b/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/pytorch_horovod_mnist.py deleted file mode 100644 index 83562526..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/pytorch_horovod_mnist.py +++ /dev/null @@ -1,170 +0,0 @@ -# Copyright (c) 2017, PyTorch contributors -# Modifications copyright (C) Microsoft Corporation -# Licensed under the BSD license -# Adapted from https://github.com/uber/horovod/blob/master/examples/pytorch_mnist.py - -from __future__ import print_function -import argparse -import torch.nn as nn -import torch.nn.functional as F -import 
torch.optim as optim -from torchvision import datasets, transforms -import torch.utils.data.distributed -import horovod.torch as hvd - -from azureml.core.run import Run -# get the Azure ML run object -run = Run.get_context() - -print("Torch version:", torch.__version__) - -# Training settings -parser = argparse.ArgumentParser(description='PyTorch MNIST Example') -parser.add_argument('--batch-size', type=int, default=64, metavar='N', - help='input batch size for training (default: 64)') -parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N', - help='input batch size for testing (default: 1000)') -parser.add_argument('--epochs', type=int, default=10, metavar='N', - help='number of epochs to train (default: 10)') -parser.add_argument('--lr', type=float, default=0.01, metavar='LR', - help='learning rate (default: 0.01)') -parser.add_argument('--momentum', type=float, default=0.5, metavar='M', - help='SGD momentum (default: 0.5)') -parser.add_argument('--no-cuda', action='store_true', default=False, - help='disables CUDA training') -parser.add_argument('--seed', type=int, default=42, metavar='S', - help='random seed (default: 42)') -parser.add_argument('--log-interval', type=int, default=10, metavar='N', - help='how many batches to wait before logging training status') -parser.add_argument('--fp16-allreduce', action='store_true', default=False, - help='use fp16 compression during allreduce') -args = parser.parse_args() -args.cuda = not args.no_cuda and torch.cuda.is_available() - -hvd.init() -torch.manual_seed(args.seed) - -if args.cuda: - # Horovod: pin GPU to local rank. 
- torch.cuda.set_device(hvd.local_rank()) - torch.cuda.manual_seed(args.seed) - - -kwargs = {} -train_dataset = \ - datasets.MNIST('data-%d' % hvd.rank(), train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])) -train_sampler = torch.utils.data.distributed.DistributedSampler( - train_dataset, num_replicas=hvd.size(), rank=hvd.rank()) -train_loader = torch.utils.data.DataLoader( - train_dataset, batch_size=args.batch_size, sampler=train_sampler, **kwargs) - -test_dataset = \ - datasets.MNIST('data-%d' % hvd.rank(), train=False, transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])) -test_sampler = torch.utils.data.distributed.DistributedSampler( - test_dataset, num_replicas=hvd.size(), rank=hvd.rank()) -test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=args.test_batch_size, - sampler=test_sampler, **kwargs) - - -class Net(nn.Module): - def __init__(self): - super(Net, self).__init__() - self.conv1 = nn.Conv2d(1, 10, kernel_size=5) - self.conv2 = nn.Conv2d(10, 20, kernel_size=5) - self.conv2_drop = nn.Dropout2d() - self.fc1 = nn.Linear(320, 50) - self.fc2 = nn.Linear(50, 10) - - def forward(self, x): - x = F.relu(F.max_pool2d(self.conv1(x), 2)) - x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2)) - x = x.view(-1, 320) - x = F.relu(self.fc1(x)) - x = F.dropout(x, training=self.training) - x = self.fc2(x) - return F.log_softmax(x) - - -model = Net() - -if args.cuda: - # Move model to GPU. - model.cuda() - -# Horovod: broadcast parameters. -hvd.broadcast_parameters(model.state_dict(), root_rank=0) - -# Horovod: scale learning rate by the number of GPUs. -optimizer = optim.SGD(model.parameters(), lr=args.lr * hvd.size(), - momentum=args.momentum) - -# Horovod: (optional) compression algorithm. 
-compression = hvd.Compression.fp16 if args.fp16_allreduce else hvd.Compression.none - -# Horovod: wrap optimizer with DistributedOptimizer. -optimizer = hvd.DistributedOptimizer(optimizer, - named_parameters=model.named_parameters(), - compression=compression) - - -def train(epoch): - model.train() - train_sampler.set_epoch(epoch) - for batch_idx, (data, target) in enumerate(train_loader): - if args.cuda: - data, target = data.cuda(), target.cuda() - optimizer.zero_grad() - output = model(data) - loss = F.nll_loss(output, target) - loss.backward() - optimizer.step() - if batch_idx % args.log_interval == 0: - print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format( - epoch, batch_idx * len(data), len(train_sampler), - 100. * batch_idx / len(train_loader), loss.item())) - - # log the loss to the Azure ML run - run.log('loss', loss.item()) - - -def metric_average(val, name): - tensor = torch.tensor(val) - avg_tensor = hvd.allreduce(tensor, name=name) - return avg_tensor.item() - - -def test(): - model.eval() - test_loss = 0. - test_accuracy = 0. - for data, target in test_loader: - if args.cuda: - data, target = data.cuda(), target.cuda() - output = model(data) - # sum up batch loss - test_loss += F.nll_loss(output, target, size_average=False).item() - # get the index of the max log-probability - pred = output.data.max(1, keepdim=True)[1] - test_accuracy += pred.eq(target.data.view_as(pred)).cpu().float().sum() - - test_loss /= len(test_sampler) - test_accuracy /= len(test_sampler) - - test_loss = metric_average(test_loss, 'avg_loss') - test_accuracy = metric_average(test_accuracy, 'avg_accuracy') - - if hvd.rank() == 0: - print('\nTest set: Average loss: {:.4f}, Accuracy: {:.2f}%\n'.format( - test_loss, 100. 
* test_accuracy)) - - -for epoch in range(1, args.epochs + 1): - train(epoch) - test() diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb deleted file mode 100644 index 1ad9d946..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb +++ /dev/null @@ -1,410 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/manage-runs/manage-runs.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed Tensorflow with Horovod\n", - "In this tutorial, you will train a word2vec model in TensorFlow using distributed training via [Horovod](https://github.com/uber/horovod)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
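The `metric_average` helper in the PyTorch script above uses `hvd.allreduce` to average a scalar across all workers before reporting it from rank 0. Conceptually that reduction is just a mean over per-worker values; a plain-Python sketch (with a list standing in for the values held by each Horovod rank — no Horovod required):

```python
def metric_average(per_worker_values):
    """Sketch of what an averaging allreduce computes: every worker
    contributes its local value, and all workers receive the mean."""
    return sum(per_worker_values) / len(per_worker_values)


# e.g. per-worker test losses gathered from 4 ranks
print(metric_average([0.5, 0.25, 0.25, 1.0]))  # 0.5
```

In the real script each rank only holds its own tensor; Horovod performs this exchange and division over MPI so that every rank ends up with the same averaged result.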
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload data to datastore\n", - "To make data accessible for remote training, AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. \n", - "\n", - "If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. 
For this tutorial, although you can download the data in your training script, we will demonstrate how to upload the training data to a datastore and access it during training to illustrate the datastore functionality." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, download the training data from [here](http://mattmahoney.net/dc/text8.zip) to your local machine:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import urllib\n", - "\n", - "os.makedirs('./data', exist_ok=True)\n", - "download_url = 'http://mattmahoney.net/dc/text8.zip'\n", - "urllib.request.urlretrieve(download_url, filename='./data/text8.zip')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Upload the contents of the data directory to the path `./data` on the default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='data', target_path='data', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For convenience, let's get a reference to the path on the datastore with the zip file of training data. We can do so using the `path` method. In the next section, we can then pass this reference to our training script's `--input_data` argument. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "path_on_datastore = 'data/text8.zip'\n", - "ds_data = ds.path(path_on_datastore)\n", - "print(ds_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "project_folder = './tf-distr-hvd'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `tf_horovod_word2vec.py` into this project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('tf_horovod_word2vec.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'tf-distr-hvd'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a TensorFlow estimator\n", - "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow).\n", - "\n", - "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import MpiConfiguration\n", - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params={\n", - " '--input_data': ds_data\n", - "}\n", - "\n", - "estimator= TensorFlow(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_horovod_word2vec.py',\n", - " node_count=2,\n", - " distributed_training=MpiConfiguration(),\n", - " framework_version='1.13')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with one worker per node. 
In order to execute a distributed run using MPI/Horovod, you must provide the argument `distributed_training=MpiConfiguration()`. Using this estimator with these settings, TensorFlow, Horovod and their dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters.\n", - "\n", - "Note that we passed our training data reference `ds_data` to our script's `--input_data` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." 
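Both training scripts in this tutorial series adjust hyperparameters by the number of workers: the learning rate is scaled linearly (`lr * hvd.size()`, since the effective batch grows with the worker count) and the word2vec script divides the step budget across workers (`num_steps = 4000 // hvd.size() + 1`). The arithmetic, pulled out as a standalone sketch (function names here are illustrative, not Horovod APIs):

```python
def scaled_lr(base_lr, world_size):
    # Linear LR scaling: with world_size workers each consuming a full
    # minibatch per step, the effective batch size grows by world_size,
    # so the learning rate is scaled to match.
    return base_lr * world_size


def scaled_steps(total_steps, world_size):
    # Each worker takes a share of the total steps, mirroring
    # `num_steps = 4000 // hvd.size() + 1` in the word2vec script.
    return total_steps // world_size + 1


print(scaled_lr(0.01, 4))     # 0.04
print(scaled_steps(4000, 4))  # 1001
```

Linear LR scaling is a common heuristic for synchronous data-parallel training; for large worker counts it is often combined with a warmup schedule, which these example scripts omit for simplicity.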
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.yml b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.yml deleted file mode 100644 index 15d0a491..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.yml +++ /dev/null @@ -1,5 +0,0 @@ -name: distributed-tensorflow-with-horovod -dependencies: -- pip: - - azureml-sdk - - azureml-widgets diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/tf_horovod_word2vec.py b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/tf_horovod_word2vec.py deleted file mode 100644 index f29fb278..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/tf_horovod_word2vec.py +++ /dev/null @@ -1,259 +0,0 @@ -# Copyright 2015 The TensorFlow Authors. All Rights Reserved. -# Modifications copyright (C) 2017 Uber Technologies, Inc. 
-# Additional modifications copyright (C) Microsoft Corporation -# Licensed under the Apache License, Version 2.0 -# Script adapted from: https://github.com/uber/horovod/blob/master/examples/tensorflow_word2vec.py -# ====================================== -"""Basic word2vec example.""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import collections -import math -import os -import random -import zipfile -import argparse - -import numpy as np -from six.moves import urllib -from six.moves import xrange # pylint: disable=redefined-builtin -import tensorflow as tf -import horovod.tensorflow as hvd -from azureml.core.run import Run - -# Horovod: initialize Horovod. -hvd.init() - -parser = argparse.ArgumentParser() -parser.add_argument('--input_data', type=str, help='training data') - -args = parser.parse_args() - -input_data = args.input_data -print("the input data is at %s" % input_data) - -# Step 1: Download the data. -url = 'http://mattmahoney.net/dc/text8.zip' - - -def maybe_download(filename, expected_bytes): - """Download a file if not present, and make sure it's the right size.""" - if not filename: - filename = "text8.zip" - if not os.path.exists(filename): - print("Downloading the data from http://mattmahoney.net/dc/text8.zip") - filename, _ = urllib.request.urlretrieve(url, filename) - else: - print("Use the data from %s" % input_data) - statinfo = os.stat(filename) - if statinfo.st_size == expected_bytes: - print('Found and verified', filename) - else: - print(statinfo.st_size) - raise Exception( - 'Failed to verify ' + url + '. Can you get to it with a browser?') - return filename - - -filename = maybe_download(input_data, 31344016) - - -# Read the data into a list of strings. 
-def read_data(filename): - """Extract the first file enclosed in a zip file as a list of words.""" - with zipfile.ZipFile(filename) as f: - data = tf.compat.as_str(f.read(f.namelist()[0])).split() - return data - - -vocabulary = read_data(filename) -print('Data size', len(vocabulary)) - -# Step 2: Build the dictionary and replace rare words with UNK token. -vocabulary_size = 50000 - - -def build_dataset(words, n_words): - """Process raw inputs into a dataset.""" - count = [['UNK', -1]] - count.extend(collections.Counter(words).most_common(n_words - 1)) - dictionary = dict() - for word, _ in count: - dictionary[word] = len(dictionary) - data = list() - unk_count = 0 - for word in words: - if word in dictionary: - index = dictionary[word] - else: - index = 0 # dictionary['UNK'] - unk_count += 1 - data.append(index) - count[0][1] = unk_count - reversed_dictionary = dict(zip(dictionary.values(), dictionary.keys())) - return data, count, dictionary, reversed_dictionary - - -data, count, dictionary, reverse_dictionary = build_dataset(vocabulary, - vocabulary_size) -del vocabulary # Hint to reduce memory. -print('Most common words (+UNK)', count[:5]) -print('Sample data', data[:10], [reverse_dictionary[i] for i in data[:10]]) - - -# Step 3: Function to generate a training batch for the skip-gram model. 
-def generate_batch(batch_size, num_skips, skip_window): - assert num_skips <= 2 * skip_window - # Adjust batch_size to match num_skips - batch_size = batch_size // num_skips * num_skips - span = 2 * skip_window + 1 # [ skip_window target skip_window ] - # Backtrack a little bit to avoid skipping words in the end of a batch - data_index = random.randint(0, len(data) - span - 1) - batch = np.ndarray(shape=(batch_size), dtype=np.int32) - labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32) - buffer = collections.deque(maxlen=span) - for _ in range(span): - buffer.append(data[data_index]) - data_index = (data_index + 1) % len(data) - for i in range(batch_size // num_skips): - target = skip_window # target label at the center of the buffer - targets_to_avoid = [skip_window] - for j in range(num_skips): - while target in targets_to_avoid: - target = random.randint(0, span - 1) - targets_to_avoid.append(target) - batch[i * num_skips + j] = buffer[skip_window] - labels[i * num_skips + j, 0] = buffer[target] - buffer.append(data[data_index]) - data_index = (data_index + 1) % len(data) - return batch, labels - - -batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1) -for i in range(8): - print(batch[i], reverse_dictionary[batch[i]], - '->', labels[i, 0], reverse_dictionary[labels[i, 0]]) - -# Step 4: Build and train a skip-gram model. - -max_batch_size = 128 -embedding_size = 128 # Dimension of the embedding vector. -skip_window = 1 # How many words to consider left and right. -num_skips = 2 # How many times to reuse an input to generate a label. - -# We pick a random validation set to sample nearest neighbors. Here we limit the -# validation samples to the words that have a low numeric ID, which by -# construction are also the most frequent. -valid_size = 16 # Random set of words to evaluate similarity on. -valid_window = 100 # Only pick dev samples in the head of the distribution. 
-valid_examples = np.random.choice(valid_window, valid_size, replace=False) -num_sampled = 64 # Number of negative examples to sample. - -graph = tf.Graph() - -with graph.as_default(): - - # Input data. - train_inputs = tf.placeholder(tf.int32, shape=[None]) - train_labels = tf.placeholder(tf.int32, shape=[None, 1]) - valid_dataset = tf.constant(valid_examples, dtype=tf.int32) - - # Look up embeddings for inputs. - embeddings = tf.Variable( - tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0)) - embed = tf.nn.embedding_lookup(embeddings, train_inputs) - - # Construct the variables for the NCE loss - nce_weights = tf.Variable( - tf.truncated_normal([vocabulary_size, embedding_size], - stddev=1.0 / math.sqrt(embedding_size))) - nce_biases = tf.Variable(tf.zeros([vocabulary_size])) - - # Compute the average NCE loss for the batch. - # tf.nce_loss automatically draws a new sample of the negative labels each - # time we evaluate the loss. - loss = tf.reduce_mean( - tf.nn.nce_loss(weights=nce_weights, - biases=nce_biases, - labels=train_labels, - inputs=embed, - num_sampled=num_sampled, - num_classes=vocabulary_size)) - - # Horovod: adjust learning rate based on number of GPUs. - optimizer = tf.train.GradientDescentOptimizer(1.0 * hvd.size()) - - # Horovod: add Horovod Distributed Optimizer. - optimizer = hvd.DistributedOptimizer(optimizer) - - train_op = optimizer.minimize(loss) - - # Compute the cosine similarity between minibatch examples and all embeddings. - norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True)) - normalized_embeddings = embeddings / norm - valid_embeddings = tf.nn.embedding_lookup( - normalized_embeddings, valid_dataset) - similarity = tf.matmul( - valid_embeddings, normalized_embeddings, transpose_b=True) - - # Add variable initializer. - init = tf.global_variables_initializer() - - # Horovod: broadcast initial variable states from rank 0 to all other processes. 
- # This is necessary to ensure consistent initialization of all workers when - # training is started with random weights or restored from a checkpoint. - bcast = hvd.broadcast_global_variables(0) - -# Step 5: Begin training. - -# Horovod: adjust number of steps based on number of GPUs. -num_steps = 4000 // hvd.size() + 1 - -# Horovod: pin GPU to be used to process local rank (one GPU per process) -config = tf.ConfigProto() -config.gpu_options.allow_growth = True -config.gpu_options.visible_device_list = str(hvd.local_rank()) - -with tf.Session(graph=graph, config=config) as session: - # We must initialize all variables before we use them. - init.run() - bcast.run() - print('Initialized') - run = Run.get_context() - average_loss = 0 - for step in xrange(num_steps): - # simulate various sentence lengths by randomization - batch_size = random.randint(max_batch_size // 2, max_batch_size) - batch_inputs, batch_labels = generate_batch( - batch_size, num_skips, skip_window) - feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels} - - # We perform one update step by evaluating the optimizer op (including it - # in the list of returned values for session.run()) - _, loss_val = session.run([train_op, loss], feed_dict=feed_dict) - average_loss += loss_val - - if step % 2000 == 0: - if step > 0: - average_loss /= 2000 - # The average loss is an estimate of the loss over the last 2000 batches. - print('Average loss at step ', step, ': ', average_loss) - run.log("Loss", average_loss) - average_loss = 0 - final_embeddings = normalized_embeddings.eval() - - # Evaluate similarity at the end on worker 0.
- if hvd.rank() == 0: - sim = similarity.eval() - for i in xrange(valid_size): - valid_word = reverse_dictionary[valid_examples[i]] - top_k = 8 # number of nearest neighbors - nearest = (-sim[i, :]).argsort()[1:top_k + 1] - log_str = 'Nearest to %s:' % valid_word - for k in xrange(top_k): - close_word = reverse_dictionary[nearest[k]] - log_str = '%s %s,' % (log_str, close_word) - print(log_str) diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb deleted file mode 100644 index 2bca25dd..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb +++ /dev/null @@ -1,325 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Distributed TensorFlow with parameter server\n", - "In this tutorial, you will train a TensorFlow model on the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset using native [distributed TensorFlow](https://www.tensorflow.org/deploy/distributed)." 
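Azure ML coordinates a parameter-server run by setting the `TF_CONFIG` environment variable on each node, and the `tf_mnist_replica.py` script used in this tutorial parses that variable to decide whether a process is a `ps` or a `worker`. A minimal sketch of that mechanism; the hostnames and ports below are made up for illustration (on a real cluster the backend injects the variable, you never set it yourself):

```python
import json
import os

# Hypothetical TF_CONFIG for a cluster with one parameter server and two
# workers; AML injects a variable of this shape on each node.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "ps": ["host0:2222"],
        "worker": ["host1:2222", "host2:2222"],
    },
    "task": {"type": "worker", "index": 1},
})

# This mirrors how the training script recovers its role.
tf_config = json.loads(os.environ["TF_CONFIG"])
cluster = tf_config["cluster"]           # job name -> list of host:port
job_name = tf_config["task"]["type"]     # "ps" or "worker"
task_index = tf_config["task"]["index"]  # this process's slot in its job list
print(job_name, task_index, cluster["worker"][task_index])
```

The `cluster` dictionary is what `tf.train.ClusterSpec` is built from, and `(job_name, task_index)` selects which entry in that spec belongs to the current process.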
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that we have the cluster ready to go, let's run our distributed training job." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './tf-distr-ps'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `tf_mnist_replica.py` into this project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('tf_mnist_replica.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'tf-distr-ps'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a TensorFlow estimator\n", - "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import TensorflowConfiguration\n", - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params={\n", - " '--num_gpus': 1,\n", - " '--train_steps': 500\n", - "}\n", - "\n", - "distributed_training = TensorflowConfiguration()\n", - "distributed_training.worker_count = 2\n", - "\n", - "estimator = TensorFlow(source_directory=project_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_mnist_replica.py',\n", - " node_count=2,\n", - " distributed_training=distributed_training,\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code specifies that we will run our training script on `2` nodes, with two workers and one parameter server. To execute a native distributed TensorFlow run with a parameter server, pass a `TensorflowConfiguration` object to the estimator's `distributed_training` parameter, as shown above. Using this estimator with these settings, TensorFlow and its dependencies will be installed for you. However, if your script also uses other packages, make sure to install them via the `TensorFlow` constructor's `pip_packages` or `conda_packages` parameters." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
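When monitoring from a plain Python script rather than a notebook, the widget is unavailable, but the same information can be obtained by polling the run. A sketch using only the documented `Run.get_status()` method; `wait_until_terminal` is a hypothetical helper, not part of the SDK:

```python
import time

def wait_until_terminal(run, poll_seconds=15):
    """Poll an AML run until it reaches a terminal state; return that state."""
    terminal_states = {"Completed", "Failed", "Canceled"}
    while True:
        status = run.get_status()
        print("run status:", status)
        if status in terminal_states:
            return status
        time.sleep(poll_seconds)
```

Calling `wait_until_terminal(run)` blocks much like `run.wait_for_completion()`, but the explicit loop is a convenient place to add your own timeouts or notifications between polls.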
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True) # this provides a verbose log" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "ninhu" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.yml b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.yml deleted file mode 100644 index bc5a30eb..00000000 --- a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.yml +++ /dev/null @@ -1,5 +0,0 @@ -name: distributed-tensorflow-with-parameter-server -dependencies: -- pip: - - azureml-sdk - - azureml-widgets diff --git a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/tf_mnist_replica.py b/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/tf_mnist_replica.py deleted file mode 100644 index 96d40fed..00000000 --- 
a/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/tf_mnist_replica.py +++ /dev/null @@ -1,271 +0,0 @@ -# Copyright 2016 The TensorFlow Authors. All Rights Reserved. -# Licensed under the Apache License, Version 2.0 -# Script adapted from: -# https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/dist_test/python/mnist_replica.py -# ============================================================================== -"""Distributed MNIST training and validation, with model replicas. -A simple softmax model with one hidden layer is defined. The parameters -(weights and biases) are located on one parameter server (ps), while the ops -are executed on two worker nodes by default. The TF sessions also run on the -worker node. -Multiple invocations of this script can be done in parallel, with different -values for --task_index. There should be exactly one invocation with ---task_index=0, which will create a master session that carries out variable -initialization. The other, non-master, sessions will wait for the master -session to finish the initialization before proceeding to the training stage. -The coordination between the multiple worker invocations occurs due to -the definition of the parameters on the same ps devices. The parameter updates -from one worker are visible to all other workers. As such, the workers can -perform forward computation and gradient calculation in parallel, which -should lead to increased training speed for the simple model.
-""" - -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import math -import sys -import tempfile -import time -import json - -import tensorflow as tf -from tensorflow.examples.tutorials.mnist import input_data -from azureml.core.run import Run - -flags = tf.app.flags -flags.DEFINE_string("data_dir", "/tmp/mnist-data", - "Directory for storing mnist data") -flags.DEFINE_boolean("download_only", False, - "Only perform downloading of data; Do not proceed to " - "session preparation, model definition or training") -flags.DEFINE_integer("num_gpus", 0, "Total number of gpus for each machine." - "If you don't use GPU, please set it to '0'") -flags.DEFINE_integer("replicas_to_aggregate", None, - "Number of replicas to aggregate before parameter update " - "is applied (For sync_replicas mode only; default: " - "num_workers)") -flags.DEFINE_integer("hidden_units", 100, - "Number of units in the hidden layer of the NN") -flags.DEFINE_integer("train_steps", 200, - "Number of (global) training steps to perform") -flags.DEFINE_integer("batch_size", 100, "Training batch size") -flags.DEFINE_float("learning_rate", 0.01, "Learning rate") -flags.DEFINE_boolean( - "sync_replicas", False, - "Use the sync_replicas (synchronized replicas) mode, " - "wherein the parameter updates from workers are aggregated " - "before applied to avoid stale gradients") -flags.DEFINE_boolean( - "existing_servers", False, "Whether servers already exists. If True, " - "will use the worker hosts via their GRPC URLs (one client process " - "per worker host). 
Otherwise, will create an in-process TensorFlow " - "server.") - -FLAGS = flags.FLAGS - -IMAGE_PIXELS = 28 - - -def main(unused_argv): - data_root = os.path.join("outputs", "MNIST") - mnist = None - tf_config = os.environ.get("TF_CONFIG") - if not tf_config or tf_config == "": - raise ValueError("TF_CONFIG not found.") - tf_config_json = json.loads(tf_config) - cluster = tf_config_json.get('cluster') - job_name = tf_config_json.get('task', {}).get('type') - task_index = tf_config_json.get('task', {}).get('index') - job_name = "worker" if job_name == "master" else job_name - sentinel_path = os.path.join(data_root, "complete.txt") - if job_name == "worker" and task_index == 0: - mnist = input_data.read_data_sets(data_root, one_hot=True) - with open(sentinel_path, 'w+') as f: - f.write("download complete") - else: - while not os.path.exists(sentinel_path): - time.sleep(0.01) - mnist = input_data.read_data_sets(data_root, one_hot=True) - - if FLAGS.download_only: - sys.exit(0) - - print("job name = %s" % job_name) - print("task index = %d" % task_index) - print("number of GPUs = %d" % FLAGS.num_gpus) - - # Construct the cluster and start the server - cluster_spec = tf.train.ClusterSpec(cluster) - - # Get the number of workers. - num_workers = len(cluster_spec.task_indices("worker")) - - if not FLAGS.existing_servers: - # Not using existing servers. Create an in-process server. 
- server = tf.train.Server( - cluster_spec, job_name=job_name, task_index=task_index) - if job_name == "ps": - server.join() - - is_chief = (task_index == 0) - if FLAGS.num_gpus > 0: - # Avoid gpu allocation conflict: now allocate task_num -> #gpu - # for each worker in the corresponding machine - gpu = (task_index % FLAGS.num_gpus) - worker_device = "/job:worker/task:%d/gpu:%d" % (task_index, gpu) - elif FLAGS.num_gpus == 0: - # Just allocate the CPU to worker server - cpu = 0 - worker_device = "/job:worker/task:%d/cpu:%d" % (task_index, cpu) - # The device setter will automatically place Variables ops on separate - # parameter servers (ps). The non-Variable ops will be placed on the workers. - # The ps use CPU and workers use corresponding GPU - with tf.device( - tf.train.replica_device_setter( - worker_device=worker_device, - ps_device="/job:ps/cpu:0", - cluster=cluster)): - global_step = tf.Variable(0, name="global_step", trainable=False) - - # Variables of the hidden layer - hid_w = tf.Variable( - tf.truncated_normal( - [IMAGE_PIXELS * IMAGE_PIXELS, FLAGS.hidden_units], - stddev=1.0 / IMAGE_PIXELS), - name="hid_w") - hid_b = tf.Variable(tf.zeros([FLAGS.hidden_units]), name="hid_b") - - # Variables of the softmax layer - sm_w = tf.Variable( - tf.truncated_normal( - [FLAGS.hidden_units, 10], - stddev=1.0 / math.sqrt(FLAGS.hidden_units)), - name="sm_w") - sm_b = tf.Variable(tf.zeros([10]), name="sm_b") - - # Ops: located on the worker specified with task_index - x = tf.placeholder(tf.float32, [None, IMAGE_PIXELS * IMAGE_PIXELS]) - y_ = tf.placeholder(tf.float32, [None, 10]) - - hid_lin = tf.nn.xw_plus_b(x, hid_w, hid_b) - hid = tf.nn.relu(hid_lin) - - y = tf.nn.softmax(tf.nn.xw_plus_b(hid, sm_w, sm_b)) - cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))) - - opt = tf.train.AdamOptimizer(FLAGS.learning_rate) - - if FLAGS.sync_replicas: - if FLAGS.replicas_to_aggregate is None: - replicas_to_aggregate = num_workers - else: - 
replicas_to_aggregate = FLAGS.replicas_to_aggregate - - opt = tf.train.SyncReplicasOptimizer( - opt, - replicas_to_aggregate=replicas_to_aggregate, - total_num_replicas=num_workers, - name="mnist_sync_replicas") - - train_step = opt.minimize(cross_entropy, global_step=global_step) - - if FLAGS.sync_replicas: - local_init_op = opt.local_step_init_op - if is_chief: - local_init_op = opt.chief_init_op - - ready_for_local_init_op = opt.ready_for_local_init_op - - # Initial token and chief queue runners required by the sync_replicas mode - chief_queue_runner = opt.get_chief_queue_runner() - sync_init_op = opt.get_init_tokens_op() - - init_op = tf.global_variables_initializer() - train_dir = tempfile.mkdtemp() - - if FLAGS.sync_replicas: - sv = tf.train.Supervisor( - is_chief=is_chief, - logdir=train_dir, - init_op=init_op, - local_init_op=local_init_op, - ready_for_local_init_op=ready_for_local_init_op, - recovery_wait_secs=1, - global_step=global_step) - else: - sv = tf.train.Supervisor( - is_chief=is_chief, - logdir=train_dir, - init_op=init_op, - recovery_wait_secs=1, - global_step=global_step) - - sess_config = tf.ConfigProto( - allow_soft_placement=True, - log_device_placement=False, - device_filters=["/job:ps", - "/job:worker/task:%d" % task_index]) - - # The chief worker (task_index==0) will prepare the session, - # while the remaining workers will wait for the preparation to complete. - if is_chief: - print("Worker %d: Initializing session..." % task_index) - else: - print("Worker %d: Waiting for session to be initialized..." % - task_index) - - if FLAGS.existing_servers: - # Look up this task's worker host in the cluster dict parsed from TF_CONFIG - server_grpc_url = "grpc://" + cluster["worker"][task_index] - print("Using existing server at: %s" % server_grpc_url) - - sess = sv.prepare_or_wait_for_session(server_grpc_url, config=sess_config) - else: - sess = sv.prepare_or_wait_for_session(server.target, config=sess_config) - - print("Worker %d: Session initialization complete."
% task_index) - - if FLAGS.sync_replicas and is_chief: - # Chief worker will start the chief queue runner and call the init op. - sess.run(sync_init_op) - sv.start_queue_runners(sess, [chief_queue_runner]) - - # Perform training - time_begin = time.time() - print("Training begins @ %f" % time_begin) - - local_step = 0 - while True: - # Training feed - batch_xs, batch_ys = mnist.train.next_batch(FLAGS.batch_size) - train_feed = {x: batch_xs, y_: batch_ys} - - _, step = sess.run([train_step, global_step], feed_dict=train_feed) - local_step += 1 - - now = time.time() - print("%f: Worker %d: training step %d done (global step: %d)" % - (now, task_index, local_step, step)) - - if step >= FLAGS.train_steps: - break - - time_end = time.time() - print("Training ends @ %f" % time_end) - training_time = time_end - time_begin - print("Training elapsed time: %f s" % training_time) - - # Validation feed - val_feed = {x: mnist.validation.images, y_: mnist.validation.labels} - val_xent = sess.run(cross_entropy, feed_dict=val_feed) - print("After %d training step(s), validation cross entropy = %g" % - (FLAGS.train_steps, val_xent)) - if job_name == "worker" and task_index == 0: - run = Run.get_context() - run.log("CrossEntropy", val_xent) - - -if __name__ == "__main__": - tf.app.run() diff --git a/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb b/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb index ff076fb8..b350103c 100644 --- a/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb @@ -253,7 +253,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" - } + }, + "friendly_name": "Using Tensorboard", + 
"exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Export the run history as Tensorboard logs", + "datasets": [ + "None" + ], + "compute": [ + "None" + ], + "deployment": [ + "None" + ], + "framework": [ + "TensorFlow" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb index 83c901d4..a2f15bf3 100644 --- a/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb @@ -534,7 +534,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" - } + }, + "friendly_name": "Estimators in AML with hyperparameter tuning", + "exclude_from_index": false, + "index_order": 1, + "category": "starter", + "task": "Use the Estimator pattern in Azure Machine Learning SDK", + "datasets": [ + "None" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "None" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb b/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb deleted file mode 100644 index 2f638e63..00000000 --- a/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb +++ /dev/null @@ -1,562 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tensorboard Integration with Run History\n", - "\n", - "1. Run a Tensorflow job locally and view its TB output live.\n", - "2. The same, for a DSVM.\n", - "3. And once more, with an AmlCompute cluster.\n", - "4. Finally, we'll collect all of these historical runs together into a single Tensorboard graph." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set experiment name and create project\n", - "Choose a name for your run history container in the workspace, and create a folder for the project." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from os import path, makedirs\n", - "experiment_name = 'tensorboard-demo'\n", - "\n", - "# experiment folder\n", - "exp_dir = './sample_projects/' + experiment_name\n", - "\n", - "if not path.exists(exp_dir):\n", - " makedirs(exp_dir)\n", - "\n", - "# runs we started in this session, for the finale\n", - "runs = []" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download Tensorflow Tensorboard demo code\n", - "\n", - "Tensorflow's repository has an MNIST demo with extensive Tensorboard instrumentation. We'll use it here for our purposes.\n", - "\n", - "Note that we don't need to make any code changes at all - the code works without modification from the Tensorflow repository." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "import os\n", - "\n", - "tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n", - "with open(os.path.join(exp_dir, \"mnist_with_summaries.py\"), \"w\") as file:\n", - " file.write(tf_code.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure and run locally\n", - "\n", - "We'll start by running this locally. Using this feature for a local run might not seem that useful at first (why not just run TB against the files generated locally?), but it still adds value: your local run will be registered in the run history, and your Tensorboard logs will be uploaded to the artifact store associated with this run. Later, you'll be able to restore the logs from any run, regardless of where it happened.\n", - "\n", - "Note that for this run, you will need to install Tensorflow on your local machine yourself. Further, the Tensorboard module (that is, the one included with Tensorflow) must be accessible to this notebook's kernel, as the local machine is what runs Tensorboard."
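Before submitting a local run, it can help to verify that the required modules really are importable from this kernel. This check is not part of the original notebook; `tensorflow` and `tensorboard` are the standard module names:

```python
import importlib

def missing_modules(names):
    """Return the subset of `names` that cannot be imported in this kernel."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:  # any import failure counts as unavailable
            missing.append(name)
    return missing

# Both should come back empty on a machine with TensorFlow installed,
# since TensorFlow bundles the tensorboard module.
print(missing_modules(["tensorflow", "tensorboard"]))
```

If either name is reported missing, install it (e.g. `pip install tensorflow`) before running the local submission below.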
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "\n", - "# Create a run configuration.\n", - "run_config = RunConfiguration()\n", - "run_config.environment.python.user_managed_dependencies = True\n", - "\n", - "# You can choose a specific Python environment by pointing to a Python path \n", - "#run_config.environment.python.interpreter_path = '/home/ninghai/miniconda3/envs/sdk2/bin/python'" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "from azureml.core.script_run_config import ScriptRunConfig\n", - "\n", - "logs_dir = os.path.join(os.curdir, \"logs\")\n", - "data_dir = os.path.abspath(os.path.join(os.curdir, \"mnist_data\"))\n", - "\n", - "if not path.exists(data_dir):\n", - " makedirs(data_dir)\n", - "\n", - "os.environ[\"TEST_TMPDIR\"] = data_dir\n", - "\n", - "# Writing logs to ./logs results in their being uploaded to Artifact Service,\n", - "# and thus, made accessible to our Tensorboard instance.\n", - "arguments_list = [\"--log_dir\", logs_dir]\n", - "\n", - "# Create an experiment\n", - "exp = Experiment(ws, experiment_name)\n", - "\n", - "# If you would like the run to go for longer, add --max_steps 5000 to the arguments list:\n", - "# arguments_list += [\"--max_steps\", \"5000\"]\n", - "\n", - "script = ScriptRunConfig(exp_dir,\n", - " script=\"mnist_with_summaries.py\",\n", - " run_config=run_config,\n", - " arguments=arguments_list)\n", - "\n", - "run = exp.submit(script)\n", - "# You can also wait for the run to complete\n", - "# run.wait_for_completion(show_output=True)\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard\n", - "\n", - "Now, while the run is in progress, we just need to start Tensorboard with the run as its target, and it will begin 
streaming logs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "tensorboard-sample" - ] - }, - "outputs": [], - "source": [ - "from azureml.tensorboard import Tensorboard\n", - "\n", - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Now, with a DSVM\n", - "\n", - "Tensorboard uploading works with all compute targets. Here we demonstrate it from a DSVM.\n", - "Note that the Tensorboard instance itself will be run by the notebook kernel. Again, this means this notebook's kernel must have access to the Tensorboard module.\n", - "\n", - "If you are unfamiliar with DSVM configuration, check [Train in a remote VM](../../training/train-on-remote-vm/train-on-remote-vm.ipynb) for a more detailed breakdown.\n", - "\n", - "**Note**: To streamline the compute that Azure Machine Learning creates, we are making updates to support creating only single- to multi-node `AmlCompute`. The `DSVMCompute` class will be deprecated in a later release, but the DSVM can be created using the below single-line command and then attached (like any VM) using the sample code below.
Also note that we only support Linux VMs for remote execution from AML, and the commands below will spin up a Linux VM only.\n", - "\n", - "```shell\n", - "# create a DSVM in your resource group\n", - "# note you need to be at least a contributor to the resource group in order to execute this command successfully.\n", - "(myenv) $ az vm create --resource-group --name --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username --admin-password --generate-ssh-keys --authentication-type password\n", - "```\n", - "You can also use [this URL](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, RemoteCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "username = os.getenv('AZUREML_DSVM_USERNAME', default='')\n", - "address = os.getenv('AZUREML_DSVM_ADDRESS', default='')\n", - "\n", - "compute_target_name = 'cpudsvm'\n", - "# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase \n", - "try:\n", - "    attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n", - "    print('found existing:', attached_dsvm_compute.name)\n", - "except ComputeTargetException:\n", - "    config = RemoteCompute.attach_configuration(username=username,\n", - "                                                address=address,\n", - "                                                ssh_port=22,\n", - "                                                private_key_file='./.ssh/id_rsa')\n", - "    attached_dsvm_compute = ComputeTarget.attach(ws, compute_target_name, config)\n", - "    \n", - "    attached_dsvm_compute.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit run using TensorFlow estimator\n", - "\n", - "Instead of manually configuring the DSVM environment, we can
use the TensorFlow estimator and everything is set up automatically." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\"--log_dir\": \"./logs\"}\n", - "\n", - "# If you want the run to go longer, set --max-steps to a higher number.\n", - "# script_params[\"--max_steps\"] = \"5000\"\n", - "\n", - "tf_estimator = TensorFlow(source_directory=exp_dir,\n", - " compute_target=attached_dsvm_compute,\n", - " entry_script='mnist_with_summaries.py',\n", - " script_params=script_params)\n", - "\n", - "run = exp.submit(tf_estimator)\n", - "\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard with this run\n", - "\n", - "Just like before." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Once more, with an AmlCompute cluster\n", - "\n", - "Just to prove we can, let's create an AmlCompute CPU cluster, and run our demo there, as well." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"cpucluster\"\n", - "\n", - "cts = ws.compute_targets\n", - "found = False\n", - "if cluster_name in cts and cts[cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[cluster_name]\n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - "compute_target.wait_for_completion(show_output=True, min_node_count=None)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "# print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit run using TensorFlow estimator\n", - "\n", - "Again, we can use the TensorFlow estimator and everything is set up automatically." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "script_params = {\"--log_dir\": \"./logs\"}\n", - "\n", - "# If you want the run to go longer, set --max-steps to a higher number.\n", - "# script_params[\"--max_steps\"] = \"5000\"\n", - "\n", - "tf_estimator = TensorFlow(source_directory=exp_dir,\n", - " compute_target=compute_target,\n", - " entry_script='mnist_with_summaries.py',\n", - " script_params=script_params)\n", - "\n", - "run = exp.submit(tf_estimator)\n", - "\n", - "runs.append(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Start Tensorboard with this run\n", - "\n", - "Once more..." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs, so be sure and pass it in as a single-element array here\n", - "tb = Tensorboard([run])\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "When you're done, make sure to call the `stop()` method of the Tensorboard object, or it will stay running even after your job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Finale\n", - "\n", - "If you've paid close attention, you'll have noticed that we've been saving the run objects in an array as we went along. We can start a Tensorboard instance that combines all of these run objects into a single process. This way, you can compare historical runs. You can even do this with live runs; if you made some of those previous runs longer via the `--max_steps` parameter, they might still be running, and you'll see them live in this instance as well." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# The Tensorboard constructor takes an array of runs...\n", - "# and it turns out that we have been building one of those all along.\n", - "tb = Tensorboard(runs)\n", - "\n", - "# If successful, start() returns a string with the URI of the instance.\n", - "tb.start()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Stop Tensorboard\n", - "\n", - "As you might already know, make sure to call the `stop()` method of the Tensorboard object, or it will stay running (until you kill the kernel associated with this notebook, at least)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "tb.stop()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.yml b/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.yml deleted file mode 100644 index de683457..00000000 --- a/how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.yml +++ /dev/null @@ -1,6 +0,0 @@ -name: tensorboard -dependencies: -- pip: - - azureml-sdk - - azureml-tensorboard - - tensorflow diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_mnist.py b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_mnist.py deleted file mode 100644 index df2d6a6e..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_mnist.py +++ /dev/null @@ -1,139 +0,0 @@ - -import argparse -import os - -import numpy as np - -import chainer -from chainer import backend -from chainer import backends -from chainer.backends import cuda -from chainer import Function, gradient_check, report, training, utils, Variable -from chainer import datasets, iterators, optimizers, serializers -from chainer import Link, Chain, ChainList -import chainer.functions as F -import chainer.links as L -from chainer.training import extensions -from chainer.dataset import concat_examples -from 
chainer.backends.cuda import to_cpu - -from azureml.core.run import Run -run = Run.get_context() - - -class MyNetwork(Chain): - -    def __init__(self, n_mid_units=100, n_out=10): -        super(MyNetwork, self).__init__() -        with self.init_scope(): -            self.l1 = L.Linear(None, n_mid_units) -            self.l2 = L.Linear(n_mid_units, n_mid_units) -            self.l3 = L.Linear(n_mid_units, n_out) - -    def forward(self, x): -        h = F.relu(self.l1(x)) -        h = F.relu(self.l2(h)) -        return self.l3(h) - - -def main(): -    parser = argparse.ArgumentParser(description='Chainer example: MNIST') -    parser.add_argument('--batchsize', '-b', type=int, default=100, -                        help='Number of images in each mini-batch') -    parser.add_argument('--epochs', '-e', type=int, default=20, -                        help='Number of sweeps over the dataset to train') -    parser.add_argument('--output_dir', '-o', default='./outputs', -                        help='Directory to output the result') -    parser.add_argument('--gpu_id', '-g', type=int, default=0, -                        help='ID of the GPU to be used. Set to -1 if you use CPU') -    args = parser.parse_args() - -    # Download the MNIST data if you haven't downloaded it yet -    train, test = datasets.mnist.get_mnist(withlabel=True, ndim=1) - -    gpu_id = args.gpu_id -    batchsize = args.batchsize -    epochs = args.epochs -    run.log('Batch size', np.int(batchsize)) -    run.log('Epochs', np.int(epochs)) - -    train_iter = iterators.SerialIterator(train, batchsize) -    test_iter = iterators.SerialIterator(test, batchsize, -                                         repeat=False, shuffle=False) - -    model = MyNetwork() - -    if gpu_id >= 0: -        # Make the specified GPU current -        chainer.backends.cuda.get_device_from_id(gpu_id).use() -        model.to_gpu()  # Copy the model to the GPU - -    # Choose an optimizer algorithm -    optimizer = optimizers.MomentumSGD(lr=0.01, momentum=0.9) - -    # Give the optimizer a reference to the model so that it -    # can locate the model's parameters.
- optimizer.setup(model) - - while train_iter.epoch < epochs: - # ---------- One iteration of the training loop ---------- - train_batch = train_iter.next() - image_train, target_train = concat_examples(train_batch, gpu_id) - - # Calculate the prediction of the network - prediction_train = model(image_train) - - # Calculate the loss with softmax_cross_entropy - loss = F.softmax_cross_entropy(prediction_train, target_train) - - # Calculate the gradients in the network - model.cleargrads() - loss.backward() - - # Update all the trainable parameters - optimizer.update() - # --------------------- until here --------------------- - - # Check the validation accuracy of prediction after every epoch - if train_iter.is_new_epoch: # If this iteration is the final iteration of the current epoch - - # Display the training loss - print('epoch:{:02d} train_loss:{:.04f} '.format( - train_iter.epoch, float(to_cpu(loss.array))), end='') - - test_losses = [] - test_accuracies = [] - while True: - test_batch = test_iter.next() - image_test, target_test = concat_examples(test_batch, gpu_id) - - # Forward the test data - prediction_test = model(image_test) - - # Calculate the loss - loss_test = F.softmax_cross_entropy(prediction_test, target_test) - test_losses.append(to_cpu(loss_test.array)) - - # Calculate the accuracy - accuracy = F.accuracy(prediction_test, target_test) - accuracy.to_cpu() - test_accuracies.append(accuracy.array) - - if test_iter.is_new_epoch: - test_iter.epoch = 0 - test_iter.current_position = 0 - test_iter.is_new_epoch = False - test_iter._pushed_position = None - break - - val_accuracy = np.mean(test_accuracies) - print('val_loss:{:.04f} val_accuracy:{:.04f}'.format( - np.mean(test_losses), val_accuracy)) - - run.log("Accuracy", np.float(val_accuracy)) - - serializers.save_npz(os.path.join(args.output_dir, 'model.npz'), model) - - -if __name__ == '__main__': - main() diff --git 
a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_mnist_hd.py b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_mnist_hd.py deleted file mode 100644 index 46d43588..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_mnist_hd.py +++ /dev/null @@ -1,134 +0,0 @@ - -import argparse - -import numpy as np - -import chainer -from chainer import backend -from chainer import backends -from chainer.backends import cuda -from chainer import Function, gradient_check, report, training, utils, Variable -from chainer import datasets, iterators, optimizers, serializers -from chainer import Link, Chain, ChainList -import chainer.functions as F -import chainer.links as L -from chainer.training import extensions -from chainer.dataset import concat_examples -from chainer.backends.cuda import to_cpu - -from azureml.core.run import Run -run = Run.get_context() - - -class MyNetwork(Chain): - - def __init__(self, n_mid_units=100, n_out=10): - super(MyNetwork, self).__init__() - with self.init_scope(): - self.l1 = L.Linear(None, n_mid_units) - self.l2 = L.Linear(n_mid_units, n_mid_units) - self.l3 = L.Linear(n_mid_units, n_out) - - def forward(self, x): - h = F.relu(self.l1(x)) - h = F.relu(self.l2(h)) - return self.l3(h) - - -def main(): - parser = argparse.ArgumentParser(description='Chainer example: MNIST') - parser.add_argument('--batchsize', '-b', type=int, default=100, - help='Number of images in each mini-batch') - parser.add_argument('--epochs', '-e', type=int, default=20, - help='Number of sweeps over the dataset to train') - parser.add_argument('--output_dir', '-o', default='./outputs', - help='Directory to output the result') - args = parser.parse_args() - - # Download the MNIST data if you haven't downloaded it yet - train, test = datasets.mnist.get_mnist(withlabel=True, ndim=1) - - batchsize = args.batchsize 
- epochs = args.epochs - run.log('Batch size', np.int(batchsize)) - run.log('Epochs', np.int(epochs)) - - train_iter = iterators.SerialIterator(train, batchsize) - test_iter = iterators.SerialIterator(test, batchsize, - repeat=False, shuffle=False) - - model = MyNetwork() - - gpu_id = -1 # Set to -1 if you use CPU - if gpu_id >= 0: - # Make a specified GPU current - chainer.backends.cuda.get_device_from_id(0).use() - model.to_gpu() # Copy the model to the GPU - - # Choose an optimizer algorithm - optimizer = optimizers.MomentumSGD(lr=0.01, momentum=0.9) - - # Give the optimizer a reference to the model so that it - # can locate the model's parameters. - optimizer.setup(model) - - while train_iter.epoch < epochs: - # ---------- One iteration of the training loop ---------- - train_batch = train_iter.next() - image_train, target_train = concat_examples(train_batch, gpu_id) - - # Calculate the prediction of the network - prediction_train = model(image_train) - - # Calculate the loss with softmax_cross_entropy - loss = F.softmax_cross_entropy(prediction_train, target_train) - - # Calculate the gradients in the network - model.cleargrads() - loss.backward() - - # Update all the trainable parameters - optimizer.update() - # --------------------- until here --------------------- - - # Check the validation accuracy of prediction after every epoch - if train_iter.is_new_epoch: # If this iteration is the final iteration of the current epoch - - # Display the training loss - print('epoch:{:02d} train_loss:{:.04f} '.format( - train_iter.epoch, float(to_cpu(loss.array))), end='') - - test_losses = [] - test_accuracies = [] - while True: - test_batch = test_iter.next() - image_test, target_test = concat_examples(test_batch, gpu_id) - - # Forward the test data - prediction_test = model(image_test) - - # Calculate the loss - loss_test = F.softmax_cross_entropy(prediction_test, target_test) - test_losses.append(to_cpu(loss_test.array)) - - # Calculate the accuracy - accuracy = 
F.accuracy(prediction_test, target_test) - accuracy.to_cpu() - test_accuracies.append(accuracy.array) - - if test_iter.is_new_epoch: - test_iter.epoch = 0 - test_iter.current_position = 0 - test_iter.is_new_epoch = False - test_iter._pushed_position = None - break - - val_accuracy = np.mean(test_accuracies) - print('val_loss:{:.04f} val_accuracy:{:.04f}'.format( - np.mean(test_losses), val_accuracy)) - - run.log("Accuracy", np.float(val_accuracy)) - - -if __name__ == '__main__': - main() diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_score.py b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_score.py deleted file mode 100644 index f6ec3a6c..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_score.py +++ /dev/null @@ -1,45 +0,0 @@ -import numpy as np -import os -import json - -from chainer import serializers, using_config, Variable, datasets -import chainer.functions as F -import chainer.links as L -from chainer import Chain - -from azureml.core.model import Model - - -class MyNetwork(Chain): - - def __init__(self, n_mid_units=100, n_out=10): - super(MyNetwork, self).__init__() - with self.init_scope(): - self.l1 = L.Linear(None, n_mid_units) - self.l2 = L.Linear(n_mid_units, n_mid_units) - self.l3 = L.Linear(n_mid_units, n_out) - - def forward(self, x): - h = F.relu(self.l1(x)) - h = F.relu(self.l2(h)) - return self.l3(h) - - -def init(): - global model - - model_root = Model.get_model_path('chainer-dnn-mnist') - - # Load our saved artifacts - model = MyNetwork() - serializers.load_npz(model_root, model) - - -def run(input_data): - i = np.array(json.loads(input_data)['data']) - - _, test = datasets.get_mnist() - x = Variable(np.asarray([test[i][0]])) - y = model(x) - - return np.ndarray.tolist(y.data.argmax(axis=1)) diff --git 
a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb deleted file mode 100644 index 85fd5f53..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb +++ /dev/null @@ -1,779 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train and hyperparameter tune with Chainer\n", - "\n", - "In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a Convolutional Neural Network (CNN) on a single-node GPU with Chainer to perform handwritten digit recognition on the popular MNIST dataset. We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!jupyter nbextension install --py --user azureml.widgets\n", - "!jupyter nbextension enable --py --user azureml.widgets" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " min_nodes=2,\n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './chainer-mnist'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script\n", - "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `chainer_mnist.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n", - "\n", - "However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script. \n", - "\n", - "In `chainer_mnist.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML `Run` object within the script:\n", - "```Python\n", - "from azureml.core.run import Run\n", - "run = Run.get_context()\n", - "```\n", - "Further within `chainer_mnist.py`, we log the batchsize and epochs parameters, and the highest accuracy the model achieves:\n", - "```Python\n", - "run.log('Batch size', np.int(args.batchsize))\n", - "run.log('Epochs', np.int(args.epochs))\n", - "\n", - "run.log('Accuracy', np.float(val_accuracy))\n", - "```\n", - "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once your script is ready, copy the training script `chainer_mnist.py` into your project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('chainer_mnist.py', project_folder)\n", - "shutil.copy('chainer_score.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Chainer tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'chainer-mnist'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a Chainer estimator\n", - "The Azure ML SDK's Chainer estimator enables you to easily submit Chainer training jobs for both single-node and distributed runs. The following code will define a single-node Chainer job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "dnn-chainer-remarks-sample" - ] - }, - "outputs": [], - "source": [ - "from azureml.train.dnn import Chainer\n", - "\n", - "script_params = {\n", - " '--epochs': 10,\n", - " '--batchsize': 128,\n", - " '--output_dir': './outputs'\n", - "}\n", - "\n", - "estimator = Chainer(source_directory=project_folder, \n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " pip_packages=['numpy', 'pytest'],\n", - " entry_script='chainer_mnist.py',\n", - " use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. To leverage the Azure VM's GPU for training, we set `use_gpu=True`." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# to get more details of your run\n", - "print(run.get_details())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tune model hyperparameters\n", - "Now that we've seen how to do a simple Chainer training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start a hyperparameter sweep\n", - "First, we will define the hyperparameter space to sweep over. Let's tune the batch size and epochs parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, accuracy.\n", - "\n", - "Then, we specify the early termination policy to use to early terminate poorly performing runs. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. 
In this tutorial, we will apply this policy every epoch (since we report our `Accuracy` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `3` epochs (`delay_evaluation=3`).\n", - "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive.runconfig import HyperDriveConfig\n", - "from azureml.train.hyperdrive.sampling import RandomParameterSampling\n", - "from azureml.train.hyperdrive.policy import BanditPolicy\n", - "from azureml.train.hyperdrive.run import PrimaryMetricGoal\n", - "from azureml.train.hyperdrive.parameter_expressions import choice\n", - " \n", - "\n", - "param_sampling = RandomParameterSampling( {\n", - " \"--batchsize\": choice(128, 256),\n", - " \"--epochs\": choice(5, 10, 20, 40)\n", - " }\n", - ")\n", - "\n", - "hyperdrive_config = HyperDriveConfig(estimator=estimator,\n", - " hyperparameter_sampling=param_sampling, \n", - " primary_metric_name='Accuracy',\n", - " policy=BanditPolicy(evaluation_interval=1, slack_factor=0.1, delay_evaluation=3),\n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", - " max_total_runs=8,\n", - " max_concurrent_runs=4)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, launch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# start the HyperDrive run\n", - "hyperdrive_run = experiment.submit(hyperdrive_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor HyperDrive runs\n", - "You can monitor the progress of the runs with the following Jupyter widget. 
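To make the `slack_factor` used by the `BanditPolicy` above concrete: with `slack_factor=0.1`, a run is terminated at an evaluation point if its reported accuracy falls below `best_accuracy / (1 + 0.1)`. This arithmetic (not Azure ML code, just an illustration of the rule) can be sketched as:

```python
def bandit_allows(run_metric, best_metric, slack_factor=0.1):
    # A run survives an evaluation point if its metric is within the
    # slack factor of the best metric reported so far.
    return run_metric >= best_metric / (1 + slack_factor)

# With a best accuracy of 0.99 so far, the cutoff is 0.99 / 1.1 (about 0.9):
print(bandit_allows(0.91, 0.99))  # survives
print(bandit_allows(0.85, 0.99))  # would be terminated
```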
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(hyperdrive_run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hyperdrive_run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find and register best model\n", - "When all jobs finish, we can find the run that achieved the highest accuracy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = hyperdrive_run.get_best_run_by_primary_metric()\n", - "print(best_run.get_details()['runDefinition']['arguments'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, let's list the model files uploaded during the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(best_run.get_file_names())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can then register the best run's model file `outputs/model.npz` as a model named `chainer-dnn-mnist` in the workspace for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name='chainer-dnn-mnist', model_path='outputs/model.npz')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy the model in ACI\n", - "Now, we are ready to deploy the model as a web service running in Azure Container Instances ([ACI](https://azure.microsoft.com/en-us/services/container-instances/)). 
Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", - "\n", - "### Create scoring script\n", - "First, we will create a scoring script that will be invoked by the web service call.\n", - "+ Note that the scoring script must have two required functions, `init()` and `run(input_data)`.\n", - " + In `init()`, you typically load the model into a global object. This function is executed only once when the Docker container is started.\n", - " + In `run(input_data)`, the model is used to predict a value based on the input data. The input and output of `run` use NPZ as the serialization and de-serialization format because it is the preferred format for Chainer, but you are not limited to it.\n", - " \n", - "Refer to the scoring script `chainer_score.py` for this tutorial. Our web service will use this file to make predictions. When writing your own scoring script, be sure to test it locally before deploying the web service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "shutil.copy('chainer_score.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create myenv.yml\n", - "We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda packages `numpy` and `chainer`." 
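The `init()`/`run(input_data)` contract described above can be illustrated with a dependency-free sketch. This is not `chainer_score.py` itself: it substitutes a stand-in "model" and JSON serialization so the shape of the contract runs anywhere.

```python
import json

model = None

def init():
    # In a real scoring script you would load the registered model here
    # (for example via azureml.core.model.Model.get_model_path).
    # A stand-in "classifier" keeps this sketch self-contained.
    global model
    model = lambda rows: [sum(row) % 10 for row in rows]

def run(input_data):
    # Deserialize the request, score it, and serialize the response.
    rows = json.loads(input_data)['data']
    return json.dumps({'result': model(rows)})

init()
print(run('{"data": [[1, 2, 3]]}'))  # {"result": [6]}
```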
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import CondaDependencies\n", - "\n", - "cd = CondaDependencies.create()\n", - "cd.add_conda_package('numpy')\n", - "cd.add_conda_package('chainer')\n", - "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", - "\n", - "print(cd.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy to ACI\n", - "We are almost ready to deploy. Create a deployment configuration and specify the number of CPUs and gigabytes of RAM needed for your ACI container." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,\n", - " auth_enabled=True, # this flag generates API keys to secure access\n", - " memory_gb=1,\n", - " tags={'name': 'mnist', 'framework': 'Chainer'},\n", - " description='Chainer DNN with MNIST')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Deployment Process**\n", - "\n", - "Now we can deploy. **This cell will run for about 7-8 minutes.** Behind the scenes, it will do the following:\n", - "\n", - "1. **Build Docker image**\n", - "Build a Docker image using the scoring file (chainer_score.py), the environment file (myenv.yml), and the model object.\n", - "2. **Register image**\n", - "Register that image under the workspace.\n", - "3. **Ship to ACI**\n", - "And finally ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls." 
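For reference, the `myenv.yml` written by the `CondaDependencies` cell above looks roughly like the following. The exact package pins, channels, and pip entries depend on your SDK version, so treat this as an approximation rather than the literal generated file:

```
# Approximate shape of the generated myenv.yml (exact contents vary by SDK version).
name: project_environment
dependencies:
- python=3.6.2
- pip:
  - azureml-defaults
- numpy
- chainer
```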
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "imgconfig = ContainerImage.image_configuration(execution_script=\"chainer_score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='chainer-mnist-1',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=imgconfig)\n", - "\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.get_logs())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:** `print(service.get_logs())`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is the scoring web service endpoint: `print(service.scoring_uri)`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the deployed model\n", - "Let's test the deployed model. Pick a random sample from the test set, and send it to the web service hosted in ACI for a prediction. Note that here we are using an HTTP request to invoke the service.\n", - "\n", - "We can retrieve the API keys used for accessing the HTTP endpoint and construct a raw HTTP request to send to the service. Don't forget to add the key to the HTTP header." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# retrieve the API keys. two keys were generated.\n", - "key1, key2 = service.get_keys()\n", - "print(key1)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import matplotlib.pyplot as plt\n", - "import urllib.request\n", - "import gzip\n", - "import numpy as np\n", - "import struct\n", - "import requests\n", - "import os\n", - "\n", - "\n", - "# load compressed MNIST gz files and return numpy arrays\n", - "def load_data(filename, label=False):\n", - " with gzip.open(filename) as gz:\n", - " struct.unpack('I', gz.read(4))\n", - " n_items = struct.unpack('>I', gz.read(4))\n", - " if not label:\n", - " n_rows = struct.unpack('>I', gz.read(4))[0]\n", - " n_cols = struct.unpack('>I', gz.read(4))[0]\n", - " res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)\n", - " res = res.reshape(n_items[0], n_rows * n_cols)\n", - " else:\n", - " res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)\n", - " res = res.reshape(n_items[0], 1)\n", - " return res\n", - "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')\n", - "\n", - "X_test = load_data('./data/mnist/test-images.gz', False)\n", - "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", - "\n", - "\n", - "# send a random row from the test set to score\n", - "random_index = np.random.randint(0, len(X_test)-1)\n", - "input_data = \"{\\\"data\\\": [\" + str(random_index) + \"]}\"\n", - "\n", - "headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n", - "\n", - "# send sample to service for scoring\n", - 
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", - "\n", - "print(\"label:\", y_test[random_index])\n", - "print(\"prediction:\", resp.text[1])\n", - "\n", - "plt.imshow(X_test[random_index].reshape((28,28)), cmap='gray')\n", - "plt.axis('off')\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's look at the workspace after the web service was deployed. You should see\n", - "\n", - " + a registered model named 'chainer-dnn-mnist' with the ID 'chainer-dnn-mnist:1'\n", - " + an image called 'chainer-mnist-1' with a Docker image location pointing to your workspace's Azure Container Registry (ACR)\n", - " + a web service called 'chainer-mnist-1' with its scoring URI" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, model in models.items():\n", - " print(\"Model: {}, ID: {}\".format(name, model.id))\n", - " \n", - "images = ws.images\n", - "for name, image in images.items():\n", - " print(\"Image: {}, location: {}\".format(name, image.image_location))\n", - " \n", - "webservices = ws.webservices\n", - "for name, webservice in webservices.items():\n", - " print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can delete the ACI deployment with a simple delete API call." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "dipeck" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "dipeck" - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.yml b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.yml deleted file mode 100644 index 6024bba0..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.yml +++ /dev/null @@ -1,12 +0,0 @@ -name: train-hyperparameter-tune-deploy-with-chainer -dependencies: -- pip: - - azureml-sdk - - azureml-widgets - - numpy - - matplotlib - - json - - urllib - - gzip - - struct - - requests diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb index 45ae4330..fee5fad7 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb +++ 
b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb @@ -1160,7 +1160,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" - } + }, + "friendly_name": "Train a DNN using hyperparameter tuning and deploying with Keras", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Create a multi-class classifier", + "datasets": [ + "MNIST" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "Azure Container Instance" + ], + "framework": [ + "TensorFlow" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.yml b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.yml index b57c079e..62c4b3b7 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.yml +++ b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.yml @@ -6,3 +6,4 @@ dependencies: - azureml-sdk - azureml-widgets - keras + - pandas diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/pytorch_score.py b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/pytorch_score.py deleted file mode 100644 index 5df2d8dc..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/pytorch_score.py +++ /dev/null @@ -1,31 +0,0 @@ -# Copyright (c) Microsoft. All rights reserved. -# Licensed under the MIT license. 
- -import torch -import torch.nn as nn -from torchvision import transforms -import json - -from azureml.core.model import Model - - -def init(): - global model - model_path = Model.get_model_path('pytorch-birds') - model = torch.load(model_path, map_location=lambda storage, loc: storage) - model.eval() - - -def run(input_data): - input_data = torch.tensor(json.loads(input_data)['data']) - - # get prediction - with torch.no_grad(): - output = model(input_data) - classes = ['chicken', 'turkey'] - softmax = nn.Softmax(dim=1) - pred_probs = softmax(output).numpy()[0] - index = torch.argmax(output, 1) - - result = {"label": classes[index], "probability": str(pred_probs[index])} - return result diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/pytorch_train.py b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/pytorch_train.py deleted file mode 100644 index 733c9a22..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/pytorch_train.py +++ /dev/null @@ -1,206 +0,0 @@ -# Copyright (c) 2017, PyTorch contributors -# Modifications copyright (C) Microsoft Corporation -# Licensed under the BSD license -# Adapted from https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html - -from __future__ import print_function, division -import torch -import torch.nn as nn -import torch.optim as optim -from torch.optim import lr_scheduler -from torchvision import datasets, models, transforms -import numpy as np -import time -import os -import copy -import argparse - -from azureml.core.run import Run -# get the Azure ML run object -run = Run.get_context() - - -def load_data(data_dir): - """Load the train/val data.""" - - # Data augmentation and normalization for training - # Just normalization for validation - data_transforms = { - 'train': transforms.Compose([ - transforms.RandomResizedCrop(224), - 
transforms.RandomHorizontalFlip(), - transforms.ToTensor(), - transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) - ]), - 'val': transforms.Compose([ - transforms.Resize(256), - transforms.CenterCrop(224), - transforms.ToTensor(), - transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) - ]), - } - - image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), - data_transforms[x]) - for x in ['train', 'val']} - dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4, - shuffle=True, num_workers=4) - for x in ['train', 'val']} - dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']} - class_names = image_datasets['train'].classes - - return dataloaders, dataset_sizes, class_names - - -def train_model(model, criterion, optimizer, scheduler, num_epochs, data_dir): - """Train the model.""" - - # load training/validation data - dataloaders, dataset_sizes, class_names = load_data(data_dir) - - device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') - - since = time.time() - - best_model_wts = copy.deepcopy(model.state_dict()) - best_acc = 0.0 - - for epoch in range(num_epochs): - print('Epoch {}/{}'.format(epoch, num_epochs - 1)) - print('-' * 10) - - # Each epoch has a training and validation phase - for phase in ['train', 'val']: - if phase == 'train': - scheduler.step() - model.train() # Set model to training mode - else: - model.eval() # Set model to evaluate mode - - running_loss = 0.0 - running_corrects = 0 - - # Iterate over data. 
- for inputs, labels in dataloaders[phase]: - inputs = inputs.to(device) - labels = labels.to(device) - - # zero the parameter gradients - optimizer.zero_grad() - - # forward - # track history if only in train - with torch.set_grad_enabled(phase == 'train'): - outputs = model(inputs) - _, preds = torch.max(outputs, 1) - loss = criterion(outputs, labels) - - # backward + optimize only if in training phase - if phase == 'train': - loss.backward() - optimizer.step() - - # statistics - running_loss += loss.item() * inputs.size(0) - running_corrects += torch.sum(preds == labels.data) - - epoch_loss = running_loss / dataset_sizes[phase] - epoch_acc = running_corrects.double() / dataset_sizes[phase] - - print('{} Loss: {:.4f} Acc: {:.4f}'.format( - phase, epoch_loss, epoch_acc)) - - # deep copy the model - if phase == 'val' and epoch_acc > best_acc: - best_acc = epoch_acc - best_model_wts = copy.deepcopy(model.state_dict()) - - # log the best val accuracy to AML run - run.log('best_val_acc', np.float(best_acc)) - - print() - - time_elapsed = time.time() - since - print('Training complete in {:.0f}m {:.0f}s'.format( - time_elapsed // 60, time_elapsed % 60)) - print('Best val Acc: {:4f}'.format(best_acc)) - - # load best model weights - model.load_state_dict(best_model_wts) - return model - - -def fine_tune_model(num_epochs, data_dir, learning_rate, momentum): - """Load a pretrained model and reset the final fully connected layer.""" - - # log the hyperparameter metrics to the AML run - run.log('lr', np.float(learning_rate)) - run.log('momentum', np.float(momentum)) - - model_ft = models.resnet18(pretrained=True) - num_ftrs = model_ft.fc.in_features - model_ft.fc = nn.Linear(num_ftrs, 2) # only 2 classes to predict - - device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') - model_ft = model_ft.to(device) - - criterion = nn.CrossEntropyLoss() - - # Observe that all parameters are being optimized - optimizer_ft = optim.SGD(model_ft.parameters(), - 
lr=learning_rate, momentum=momentum) - - # Decay LR by a factor of 0.1 every 7 epochs - exp_lr_scheduler = lr_scheduler.StepLR( - optimizer_ft, step_size=7, gamma=0.1) - - model = train_model(model_ft, criterion, optimizer_ft, - exp_lr_scheduler, num_epochs, data_dir) - - return model - - -def download_data(): - """Download and extract the training data.""" - import urllib.request - from zipfile import ZipFile - # download data - data_file = './fowl_data.zip' - download_url = 'https://msdocsdatasets.blob.core.windows.net/pytorchfowl/fowl_data.zip' - urllib.request.urlretrieve(download_url, filename=data_file) - - # extract files - with ZipFile(data_file, 'r') as zip: - print('extracting files...') - zip.extractall() - print('finished extracting') - data_dir = zip.namelist()[0] - - # delete zip file - os.remove(data_file) - return data_dir - - -def main(): - print("Torch version:", torch.__version__) - - # get command-line arguments - parser = argparse.ArgumentParser() - parser.add_argument('--num_epochs', type=int, default=25, - help='number of epochs to train') - parser.add_argument('--output_dir', type=str, help='output directory') - parser.add_argument('--learning_rate', type=float, - default=0.001, help='learning rate') - parser.add_argument('--momentum', type=float, default=0.9, help='momentum') - args = parser.parse_args() - - data_dir = download_data() - print("data directory is: " + data_dir) - model = fine_tune_model(args.num_epochs, data_dir, - args.learning_rate, args.momentum) - os.makedirs(args.output_dir, exist_ok=True) - torch.save(model, os.path.join(args.output_dir, 'model.pt')) - - -if __name__ == "__main__": - main() diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/test_img.jpg b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/test_img.jpg deleted file mode 100644 index f2878b48..00000000 Binary files 
a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/test_img.jpg and /dev/null differ diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb deleted file mode 100644 index 2a73ef9e..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb +++ /dev/null @@ -1,750 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train, hyperparameter tune, and deploy with PyTorch\n", - "\n", - "In this tutorial, you will train, hyperparameter tune, and deploy a PyTorch model using the Azure Machine Learning (Azure ML) Python SDK.\n", - "\n", - "This tutorial will train an image classification model using transfer learning, based on PyTorch's [Transfer Learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). The model is trained to classify chickens and turkeys by first using a pretrained ResNet18 model that has been trained on the [ImageNet](http://image-net.org/index) dataset." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML `Workspace`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute\n", - "Now that you have your data and training script prepared, you are ready to train on your remote compute cluster. You can take advantage of Azure compute to leverage GPUs to cut down your training time. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './pytorch-birds'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download training data\n", - "The dataset we will use (located on a public blob [here](https://msdocsdatasets.blob.core.windows.net/pytorchfowl/fowl_data.zip) as a zip file) consists of about 120 training images each for turkeys and chickens, with 100 validation images for each class. The images are a subset of the [Open Images v5 Dataset](https://storage.googleapis.com/openimages/web/index.html). We will download and extract the dataset as part of our training script `pytorch_train.py`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script\n", - "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `pytorch_train.py`. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n", - "\n", - "However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script. \n", - "\n", - "In `pytorch_train.py`, we will log some metrics to our Azure ML run. 
To do so, we will access the Azure ML `Run` object within the script:\n", - "```Python\n", - "from azureml.core.run import Run\n", - "run = Run.get_context()\n", - "```\n", - "Further within `pytorch_train.py`, we log the learning rate and momentum parameters, and the best validation accuracy the model achieves:\n", - "```Python\n", - "run.log('lr', np.float(learning_rate))\n", - "run.log('momentum', np.float(momentum))\n", - "\n", - "run.log('best_val_acc', np.float(best_acc))\n", - "```\n", - "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once your script is ready, copy the training script `pytorch_train.py` into your project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('pytorch_train.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this transfer learning PyTorch tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'pytorch-birds'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a PyTorch estimator\n", - "The Azure ML SDK's PyTorch estimator enables you to easily submit PyTorch training jobs for both single-node and distributed runs. For more information on the PyTorch estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-pytorch). 
The following code will define a single-node PyTorch job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "dnn-pytorch-remarks-sample" - ] - }, - "outputs": [], - "source": [ - "from azureml.train.dnn import PyTorch\n", - "\n", - "script_params = {\n", - " '--num_epochs': 30,\n", - " '--output_dir': './outputs'\n", - "}\n", - "\n", - "estimator = PyTorch(source_directory=project_folder, \n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " entry_script='pytorch_train.py',\n", - " use_gpu=True,\n", - " pip_packages=['pillow==5.4.1'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. Please note the following:\n", - "- We specified the number of epochs to train for via the `--num_epochs` argument. The training script downloads and extracts the dataset itself, so no data argument is needed here.\n", - "- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by Azure ML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even after your remote run is over. In this tutorial, we will save our trained model to this output directory.\n", - "\n", - "To leverage the Azure VM's GPU for training, we set `use_gpu=True`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# to get more details of your run\n", - "print(run.get_details())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tune model hyperparameters\n", - "Now that we've seen how to do a simple PyTorch training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start a hyperparameter sweep\n", - "First, we will define the hyperparameter space to sweep over. Since our training script uses a learning rate schedule to decay the learning rate every several epochs, let's tune the initial learning rate and the momentum parameters. 
In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (`best_val_acc`).\n", - "\n", - "Then, we specify the early termination policy used to terminate poorly performing runs early. Here we use the `BanditPolicy`, which will terminate any run that doesn't fall within the slack factor of our primary evaluation metric. In this tutorial, we will apply this policy every epoch (since we report our `best_val_acc` metric every epoch and `evaluation_interval=1`). Notice we will delay the first policy evaluation until after the first `10` epochs (`delay_evaluation=10`).\n", - "Refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-tune-hyperparameters#specify-an-early-termination-policy) for more information on the BanditPolicy and other policies available." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, uniform, PrimaryMetricGoal\n", - "\n", - "param_sampling = RandomParameterSampling( {\n", - " 'learning_rate': uniform(0.0005, 0.005),\n", - " 'momentum': uniform(0.9, 0.99)\n", - " }\n", - ")\n", - "\n", - "early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)\n", - "\n", - "hyperdrive_config = HyperDriveConfig(estimator=estimator,\n", - " hyperparameter_sampling=param_sampling, \n", - " policy=early_termination_policy,\n", - " primary_metric_name='best_val_acc',\n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", - " max_total_runs=8,\n", - " max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, launch the hyperparameter tuning job."
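The slack-factor rule that `BanditPolicy` applies can be sketched in a few lines of plain Python (a simplified illustration of the rule only — the function name here is ours, not the SDK's):

```python
def bandit_should_terminate(run_metric, best_metric, slack_factor):
    """For a maximized metric, a run is cancelled once its reported value
    drops below best_metric / (1 + slack_factor)."""
    return run_metric < best_metric / (1 + slack_factor)

# With slack_factor=0.15 and a best validation accuracy of 0.92, the cutoff
# is 0.92 / 1.15 = 0.8: runs reporting below that are terminated early.
print(bandit_should_terminate(0.85, 0.92, 0.15))  # False -- within slack
print(bandit_should_terminate(0.70, 0.92, 0.15))  # True -- outside slack
```

With `evaluation_interval=1` this check is applied at every metric report, and `delay_evaluation=10` simply postpones the first check.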
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# start the HyperDrive run\n", - "hyperdrive_run = experiment.submit(hyperdrive_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor HyperDrive runs\n", - "You can monitor the progress of the runs with the following Jupyter widget. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(hyperdrive_run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Or block until the HyperDrive sweep has completed:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hyperdrive_run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find and register the best model\n", - "Once all the runs complete, we can find the run that produced the model with the highest accuracy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = hyperdrive_run.get_best_run_by_primary_metric()\n", - "best_run_metrics = best_run.get_metrics()\n", - "print(best_run)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print('Best Run is:\\n Validation accuracy: {0:.5f} \\n Learning rate: {1:.5f} \\n Momentum: {2:.5f}'.format(\n", - " best_run_metrics['best_val_acc'][-1],\n", - " best_run_metrics['lr'],\n", - " best_run_metrics['momentum'])\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, register the model from your best-performing run to your workspace. The `model_path` parameter takes in the relative path on the remote VM to the model file in your `outputs` directory. 
In the next section, we will deploy this registered model as a web service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name = 'pytorch-birds', model_path = 'outputs/model.pt')\n", - "print(model.name, model.id, model.version, sep = '\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy model as web service\n", - "Once you have your trained model, you can deploy the model on Azure. In this tutorial, we will deploy the model as a web service in [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/) (ACI). For more information on deploying models using Azure ML, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-deploy-and-where)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create scoring script\n", - "\n", - "First, we will create a scoring script that will be invoked by the web service call. Note that the scoring script must have two required functions:\n", - "* `init()`: In this function, you typically load the model into a `global` object. This function is executed only once when the Docker container is started. \n", - "* `run(input_data)`: In this function, the model is used to predict a value based on the input data. The input and output typically use JSON as serialization and deserialization format, but you are not limited to that.\n", - "\n", - "Refer to the scoring script `pytorch_score.py` for this tutorial. Our web service will use this file to predict whether an image is a chicken or a turkey. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service." 
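The required `init()`/`run()` shape of a scoring script can be sketched without any ML framework (a stub for illustration only — the real `pytorch_score.py` loads the registered model and runs PyTorch inference):

```python
import json

model = None  # loaded once per container in init()

def init():
    """Runs once when the container starts: load the model into a global."""
    global model
    # Stand-in for loading the registered PyTorch model from disk.
    model = lambda rows: ["chicken" if sum(r) > 0 else "turkey" for r in rows]

def run(input_data):
    """Runs per request: deserialize the JSON input, predict, return JSON."""
    data = json.loads(input_data)["data"]
    return json.dumps({"result": model(data)})

init()
print(run(json.dumps({"data": [[1, 2], [-3, -4]]})))
```

Because both functions take and return plain JSON strings, a skeleton like this is easy to exercise locally before deploying.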
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create environment file\n", - "Then, we will need to create an environment file (`myenv.yml`) that specifies all of the scoring script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image by Azure ML. In this case, we need to specify `azureml-core`, `torch` and `torchvision`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(pip_packages=['azureml-defaults', 'torch', 'torchvision'])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())\n", - " \n", - "print(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure the container image\n", - "Now configure the Docker image that you will use to build your ACI container." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(execution_script='pytorch_score.py', \n", - " runtime='python', \n", - " conda_file='myenv.yml',\n", - " description='Image with bird model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure the ACI container\n", - "We are almost ready to deploy. Create a deployment configuration file to specify the number of CPUs and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of `1` core and `1` gigabyte of RAM is usually sufficient for many models." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={'data': 'birds', 'method':'transfer learning', 'framework':'pytorch'},\n", - " description='Classify turkey/chickens using transfer learning with PyTorch')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy the registered model\n", - "Finally, let's deploy a web service from our registered model. Deploy the web service using the ACI config and image config files created in the previous steps. We pass the `model` object in a list to the `models` parameter. If you would like to deploy more than one registered model, append the additional models to this list." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "\n", - "service_name = 'aci-birds'\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name=service_name,\n", - " models=[model],\n", - " image_config=image_config,\n", - " deployment_config=aciconfig,)\n", - "\n", - "service.wait_for_deployment(show_output=True)\n", - "print(service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If your deployment fails for any reason and you need to redeploy, make sure to delete the service before you do so: `service.delete()`" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.get_logs()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - 
"source": [ - "Get the web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the web service\n", - "Finally, let's test our deployed web service. We will send the data as a JSON string to the web service hosted in ACI and use the SDK's `run` API to invoke the service. Here we will take an image from our validation data to predict on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "from PIL import Image\n", - "import matplotlib.pyplot as plt\n", - "\n", - "%matplotlib inline\n", - "plt.imshow(Image.open('test_img.jpg'))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import torch\n", - "from torchvision import transforms\n", - " \n", - "def preprocess(image_file):\n", - " \"\"\"Preprocess the input image.\"\"\"\n", - " data_transforms = transforms.Compose([\n", - " transforms.Resize(256),\n", - " transforms.CenterCrop(224),\n", - " transforms.ToTensor(),\n", - " transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n", - " ])\n", - "\n", - " image = Image.open(image_file)\n", - " image = data_transforms(image).float()\n", - " image = torch.tensor(image)\n", - " image = image.unsqueeze(0)\n", - " return image.numpy()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "input_data = preprocess('test_img.jpg')\n", - "result = service.run(input_data=json.dumps({'data': input_data.tolist()}))\n", - "print(result)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up\n", 
- "Once you no longer need the web service, you can delete it with a simple API call." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "ninhu" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.yml b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.yml deleted file mode 100644 index 09f8d5a9..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.yml +++ /dev/null @@ -1,9 +0,0 @@ -name: train-hyperparameter-tune-deploy-with-pytorch -dependencies: -- pip: - - azureml-sdk - - azureml-widgets - - pillow==5.4.1 - - matplotlib - - https://download.pytorch.org/whl/cpu/torch-1.1.0-cp35-cp35m-win_amd64.whl - - https://download.pytorch.org/whl/cpu/torchvision-0.3.0-cp35-cp35m-win_amd64.whl diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/nn.png b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/nn.png deleted file mode 100644 index 8910281e..00000000 Binary files a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/nn.png and /dev/null differ diff --git 
a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/tf_mnist.py b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/tf_mnist.py deleted file mode 100644 index f5ab7099..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/tf_mnist.py +++ /dev/null @@ -1,106 +0,0 @@ -# Copyright (c) Microsoft Corporation. All rights reserved. -# Licensed under the MIT License. - -import numpy as np -import argparse -import os -import tensorflow as tf - -from azureml.core import Run -from utils import load_data - -print("TensorFlow version:", tf.VERSION) - -parser = argparse.ArgumentParser() -parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point') -parser.add_argument('--batch-size', type=int, dest='batch_size', default=50, help='mini batch size for training') -parser.add_argument('--first-layer-neurons', type=int, dest='n_hidden_1', default=100, - help='# of neurons in the first layer') -parser.add_argument('--second-layer-neurons', type=int, dest='n_hidden_2', default=100, - help='# of neurons in the second layer') -parser.add_argument('--learning-rate', type=float, dest='learning_rate', default=0.01, help='learning rate') -args = parser.parse_args() - -data_folder = os.path.join(args.data_folder, 'mnist') - -print('training dataset is stored here:', data_folder) - -X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0 -X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0 - -y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1) -y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1) - -print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n') -training_set_size = X_train.shape[0] - -n_inputs = 28 * 28 -n_h1 = args.n_hidden_1 -n_h2 = args.n_hidden_2 -n_outputs = 10 
-learning_rate = args.learning_rate -n_epochs = 20 -batch_size = args.batch_size - -with tf.name_scope('network'): - # construct the DNN - X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X') - y = tf.placeholder(tf.int64, shape=(None), name='y') - h1 = tf.layers.dense(X, n_h1, activation=tf.nn.relu, name='h1') - h2 = tf.layers.dense(h1, n_h2, activation=tf.nn.relu, name='h2') - output = tf.layers.dense(h2, n_outputs, name='output') - -with tf.name_scope('train'): - cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=output) - loss = tf.reduce_mean(cross_entropy, name='loss') - optimizer = tf.train.GradientDescentOptimizer(learning_rate) - train_op = optimizer.minimize(loss) - -with tf.name_scope('eval'): - correct = tf.nn.in_top_k(output, y, 1) - acc_op = tf.reduce_mean(tf.cast(correct, tf.float32)) - -init = tf.global_variables_initializer() -saver = tf.train.Saver() - -# start an Azure ML run -run = Run.get_context() - -with tf.Session() as sess: - init.run() - for epoch in range(n_epochs): - - # randomly shuffle training set - indices = np.random.permutation(training_set_size) - X_train = X_train[indices] - y_train = y_train[indices] - - # batch index - b_start = 0 - b_end = b_start + batch_size - for _ in range(training_set_size // batch_size): - # get a batch - X_batch, y_batch = X_train[b_start: b_end], y_train[b_start: b_end] - - # update batch index for the next batch - b_start = b_start + batch_size - b_end = min(b_start + batch_size, training_set_size) - - # train - sess.run(train_op, feed_dict={X: X_batch, y: y_batch}) - # evaluate training set - acc_train = acc_op.eval(feed_dict={X: X_batch, y: y_batch}) - # evaluate validation set - acc_val = acc_op.eval(feed_dict={X: X_test, y: y_test}) - - # log accuracies - run.log('training_acc', np.float(acc_train)) - run.log('validation_acc', np.float(acc_val)) - print(epoch, '-- Training accuracy:', acc_train, '\b Validation accuracy:', acc_val) - y_hat = 
np.argmax(output.eval(feed_dict={X: X_test}), axis=1) - - run.log('final_acc', np.float(acc_val)) - - os.makedirs('./outputs/model', exist_ok=True) - # files saved in the "./outputs" folder are automatically uploaded into run history - saver.save(sess, './outputs/model/mnist-tf.model') diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb deleted file mode 100644 index 8af4e963..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb +++ /dev/null @@ -1,1181 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "bf74d2e9-2708-49b1-934b-e0ede342f475" - } - }, - "source": [ - "# Train, hyperparameter tune, and deploy with TensorFlow\n", - "\n", - "## Introduction\n", - "This tutorial shows how to train a simple deep neural network using the MNIST dataset and TensorFlow on Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of `28x28` pixels, representing a number from 0 to 9.
The goal is to create a multi-class classifier to identify the digit each image represents, and deploy it as a web service in Azure.\n", - "\n", - "For more information about the MNIST dataset, please visit [Yann LeCun's website](http://yann.lecun.com/exdb/mnist/).\n", - "\n", - "## Prerequisites:\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", - "* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's get started. First, let's import some Python libraries." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "c377ea0c-0cd9-4345-9be2-e20fb29c94c3" - } - }, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import numpy as np\n", - "import os\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "edaa7f2f-2439-4148-b57a-8c794c0945ec" - } - }, - "outputs": [], - "source": [ - "import azureml\n", - "from azureml.core import Workspace\n", - "\n", - "# check core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt in to diagnostics for a better experience, quality, and security of future releases."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "59f52294-4a25-4c92-bab8-3b07f0f44d15" - } - }, - "source": [ - "## Create an Azure ML experiment\n", - "Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "bc70f780-c240-4779-96f3-bc5ef9a37d59" - } - }, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "script_folder = './tf-mnist'\n", - "os.makedirs(script_folder, exist_ok=True)\n", - "\n", - "exp = Experiment(workspace=ws, name='tf-mnist')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "defe921f-8097-44c3-8336-8af6700804a7" - } - }, - "source": [ - "## Download MNIST dataset\n", - "In order to train on the MNIST dataset, we will first need to download it directly from Yann LeCun's website and save it in a local `data` folder." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import urllib\n", - "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "c3f2f57c-7454-4d3e-b38d-b0946cf066ea" - } - }, - "source": [ - "## Show some sample images\n", - "Let's load the downloaded compressed files into numpy arrays using some utility functions included in the `utils.py` library file from the current folder. Then we use `matplotlib` to plot 30 random images from the dataset along with their labels."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "nbpresent": { - "id": "396d478b-34aa-4afa-9898-cdce8222a516" - } - }, - "outputs": [], - "source": [ - "from utils import load_data\n", - "\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n", - "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", - "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", - "\n", - "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", - "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", - "\n", - "count = 0\n", - "sample_size = 30\n", - "plt.figure(figsize = (16, 6))\n", - "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", - " count = count + 1\n", - " plt.subplot(1, sample_size, count)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n", - " plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload MNIST dataset to default datastore \n", - "A [datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) is a place where data can be stored that is then made accessible to a Run either by means of mounting or copying the data to the compute target. A datastore can be backed by either Azure Blob Storage or an Azure File Share (ADLS will be supported in the future). For simple data handling, each workspace provides a default datastore that can be used, in case the data is not already in Blob Storage or File Share.
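How the `target_path` used at upload time relates to the path the training script later sees can be illustrated in plain Python (the mount point shown is hypothetical — at run time it is supplied by Azure ML):

```python
import posixpath

# ds.upload(src_dir='./data/mnist', target_path='mnist', ...) stores the files
# under <datastore root>/mnist. When the datastore is mounted on the compute
# target, the script receives the mount point as an argument and joins the
# same target_path onto it, as tf_mnist.py does with its --data-folder value:
mount_point = "/mnt/azureml/workspaceblobstore"  # hypothetical mount location
data_folder = posixpath.join(mount_point, "mnist")
print(data_folder)  # /mnt/azureml/workspaceblobstore/mnist
```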
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this next step, we will upload the training and test set into the workspace's default datastore, which we will later mount on an `AmlCompute` cluster for training." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If a cluster with the given name cannot be found, we will create a new one here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n", - "1. create the configuration (this step is local and only takes a second)\n", - "2. create the cluster (this step will take about **20 seconds**)\n", - "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process.
Please make sure to wait until the call returns before moving to the next cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " # can poll for a minimum number of nodes and for a specific timeout. \n", - " # if no min node count is provided it uses the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpu-cluster' of type `AmlCompute`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "compute_targets = ws.compute_targets\n", - "for name, ct in compute_targets.items():\n", - " print(name, ct.type, ct.provisioning_state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Copy the training files into the script folder\n", - "The TensorFlow training script is already created for you. 
You can simply copy it into the script folder, together with the utility library used to load the compressed data files into numpy arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "# the training logic is in the tf_mnist.py file.\n", - "shutil.copy('./tf_mnist.py', script_folder)\n", - "\n", - "# utils.py helps load data from the downloaded MNIST dataset into numpy arrays.\n", - "shutil.copy('./utils.py', script_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "nbpresent": { - "id": "2039d2d5-aca6-4f25-a12f-df9ae6529cae" - } - }, - "source": [ - "## Construct neural network in TensorFlow\n", - "The training script `tf_mnist.py` creates a very simple DNN (deep neural network) with just 2 hidden layers. The input layer has 28 * 28 = 784 neurons, each representing a pixel in an image. The first hidden layer has 300 neurons, and the second hidden layer has 100 neurons. The output layer has 10 neurons, each representing a target label from 0 to 9.\n", - "\n", - "![DNN](nn.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Azure ML concepts \n", - "Please note the following three things in the code below:\n", - "1. The script accepts arguments using the argparse package. In this case there is one argument, `--data_folder`, which specifies the file system folder in which the script can find the MNIST data.\n", - "```\n", - "    parser = argparse.ArgumentParser()\n", - "    parser.add_argument('--data_folder')\n", - "```\n", - "2. The script accesses the Azure ML `Run` object by executing `run = Run.get_context()`. Further down, the script uses `run` to report the training accuracy and the validation accuracy as training progresses.\n", - "```\n", - "    run.log('training_acc', np.float(acc_train))\n", - "    run.log('validation_acc', np.float(acc_val))\n", - "```\n", - "3. When running the script on Azure ML, you can write files out to a folder `./outputs` that is relative to the root directory. This folder is specially tracked by Azure ML: any files written to it during script execution on the remote target will be picked up by Run History, and these files (known as artifacts) will be available as part of the run history record." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The next cell will print out the training code so you can inspect it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "with open(os.path.join(script_folder, './tf_mnist.py'), 'r') as f:\n", - "    print(f.read())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create TensorFlow estimator\n", - "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object, use the `AmlCompute` cluster as the compute target, and pass the mount point of the datastore to the training code as a parameter.\n", - "\n", - "The TensorFlow estimator provides a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a Docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting Docker image.\n", - "\n", - "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release."
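The estimator constructed in the next cell passes a `script_params` dictionary, which Azure ML turns into command-line arguments for the entry script. The parsing side can be sketched with plain `argparse` -- the argument names below mirror the notebook's `script_params` keys, and the sample values are illustrative, not taken from a real run:

```python
import argparse

# mirror the script_params keys used by the estimator (names assumed)
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder')
parser.add_argument('--batch-size', type=int, dest='batch_size')
parser.add_argument('--first-layer-neurons', type=int, dest='n_h1')
parser.add_argument('--second-layer-neurons', type=int, dest='n_h2')
parser.add_argument('--learning-rate', type=float, dest='learning_rate')

# simulate the command line Azure ML would construct from script_params
args = parser.parse_args(['--data-folder', '/mnt/mnist',
                          '--batch-size', '50',
                          '--first-layer-neurons', '300',
                          '--second-layer-neurons', '100',
                          '--learning-rate', '0.01'])
print(args.batch_size, args.n_h1, args.learning_rate)  # → 50 300 0.01
```

Note that argparse converts the dashed flag `--data-folder` into the destination `data_folder` automatically; the explicit `dest` values are only spelled out here for clarity.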
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "dnn-tensorflow-remarks-sample" - ] - }, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\n", - " '--data-folder': ws.get_default_datastore().as_mount(),\n", - " '--batch-size': 50,\n", - " '--first-layer-neurons': 300,\n", - " '--second-layer-neurons': 100,\n", - " '--learning-rate': 0.01\n", - "}\n", - "\n", - "est = TensorFlow(source_directory=script_folder,\n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " entry_script='tf_mnist.py', \n", - " use_gpu=True, \n", - " framework_version='1.13')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Submit job to run\n", - "Submit the estimator to an Azure ML experiment to kick off the execution." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = exp.submit(est)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor the Run \n", - "As the Run is executed, it will go through the following stages:\n", - "1. Preparing: A docker image is created matching the Python environment specified by the TensorFlow estimator and it will be uploaded to the workspace's Azure Container Registry. This step will only happen once for each Python environment -- the container will then be cached for subsequent runs. Creating and uploading the image takes about **5 minutes**. While the job is preparing, logs are streamed to the run history and can be viewed to monitor the progress of the image creation.\n", - "\n", - "2. Scaling: If the compute needs to be scaled up (i.e. the Batch AI cluster requires more nodes to execute the run than currently available), the cluster will attempt to scale up in order to make the required amount of nodes available. Scaling typically takes about **5 minutes**.\n", - "\n", - "3. 
Running: All scripts in the script folder are uploaded to the compute target, datastores are mounted or copied, and the `entry_script` is executed. While the job is running, stdout and the `./logs` folder are streamed to the run history and can be viewed to monitor the progress of the run.\n", - "\n", - "4. Post-Processing: The `./outputs` folder of the run is copied over to the run history.\n", - "\n", - "There are multiple ways to check the progress of a running job. We can use a Jupyter notebook widget. \n", - "\n", - "**Note: The widget will automatically update every 10-15 seconds, always showing you the most up-to-date information about the run**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also periodically check the status of the run object, and navigate to the Azure portal to monitor the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### The Run object \n", - "The Run object provides the interface to the run history -- both to the job and to the control plane (this notebook), and both while the job is running and after it has completed. It provides a number of interesting features, for instance:\n", - "* `run.get_details()`: Provides a rich set of properties of the run\n", - "* `run.get_metrics()`: Provides a dictionary with all the metrics that were reported for the Run\n", - "* `run.get_file_names()`: List all the files that were uploaded to the run history for this Run.
This will include the `outputs` and `logs` folder, azureml-logs and other logs, as well as files that were explicitly uploaded to the run using `run.upload_file()`\n", - "\n", - "Below are some examples -- please run through them and inspect their output. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_details()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_metrics()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Plot accuracy over epochs\n", - "Since we can retrieve the metrics from the run, we can easily make plots using `matplotlib` in the notebook. Then we can add the plotted image to the run using `run.log_image()`, so all information about the run is kept together." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "\n", - "os.makedirs('./imgs', exist_ok=True)\n", - "metrics = run.get_metrics()\n", - "\n", - "plt.figure(figsize = (13,5))\n", - "plt.plot(metrics['validation_acc'], 'r-', lw=4, alpha=.6)\n", - "plt.plot(metrics['training_acc'], 'b--', alpha=0.5)\n", - "plt.legend(['Full evaluation set', 'Training set mini-batch'])\n", - "plt.xlabel('epochs', fontsize=14)\n", - "plt.ylabel('accuracy', fontsize=14)\n", - "plt.title('Accuracy over Epochs', fontsize=16)\n", - "run.log_image(name='acc_over_epochs.png', plot=plt)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Download the saved model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the training script, a TensorFlow `saver` object is used to persist the model in a local folder (local to the compute target). 
The model was saved to the `./outputs` folder on the disk of the `AmlCompute` cluster node where the job ran. Azure ML automatically uploaded everything written to the `./outputs` folder into the run history file store. Subsequently, we can use the `Run` object to download the model files the `saver` object saved. They are under the `outputs/model` folder in the run history file store, and are downloaded into a local folder named `model`. Note that the TensorFlow model consists of four files in binary format; they are not human-readable." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a model folder in the current directory\n", - "os.makedirs('./model', exist_ok=True)\n", - "\n", - "for f in run.get_file_names():\n", - "    if f.startswith('outputs/model'):\n", - "        output_file_path = os.path.join('./model', f.split('/')[-1])\n", - "        print('Downloading from {} to {} ...'.format(f, output_file_path))\n", - "        run.download_file(name=f, output_file_path=output_file_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Predict on the test set\n", - "Now load the saved TensorFlow graph, and list all operations under the `network` scope. This way we can discover the input tensor `network/X:0` and the output tensor `network/output/MatMul:0`, and use them in the scoring script in the next step.\n", - "\n", - "Note: if your local TensorFlow version is different from the version running in the cluster where the model was trained, you might see a \"compiletime version mismatch\" warning. You can ignore it."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import tensorflow as tf\n", - "\n", - "tf.reset_default_graph()\n", - "\n", - "saver = tf.train.import_meta_graph(\"./model/mnist-tf.model.meta\")\n", - "graph = tf.get_default_graph()\n", - "\n", - "for op in graph.get_operations():\n", - " if op.name.startswith('network'):\n", - " print(op.name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Feed test dataset to the persisted model to get predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# input tensor. this is an array of 784 elements, each representing the intensity of a pixel in the digit image.\n", - "X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", - "# output tensor. this is an array of 10 elements, each representing the probability of predicted value of the digit.\n", - "output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", - "\n", - "with tf.Session() as sess:\n", - " saver.restore(sess, './model/mnist-tf.model')\n", - " k = output.eval(feed_dict={X : X_test})\n", - "# get the prediction, which is the index of the element that has the largest probability value.\n", - "y_hat = np.argmax(k, axis=1)\n", - "\n", - "# print the first 30 labels and predictions\n", - "print('labels: \\t', y_test[:30])\n", - "print('predictions:\\t', y_hat[:30])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Calculate the overall accuracy by comparing the predicted value against the test set." 
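The evaluation step can be restated on toy data: take the argmax of each row of scores to get the predicted class, then average the agreement with the true labels. The numbers here are made up purely for illustration:

```python
import numpy as np

# toy "scores": 3 samples x 4 classes (values are illustrative)
k = np.array([[0.1, 0.7, 0.1, 0.1],
              [0.8, 0.1, 0.05, 0.05],
              [0.2, 0.2, 0.5, 0.1]])

# prediction = index of the largest score per row
y_hat = np.argmax(k, axis=1)
y_true = np.array([1, 0, 3])

# accuracy = fraction of predictions matching the labels (2 of 3 here)
accuracy = np.average(y_hat == y_true)
print(y_hat.tolist(), accuracy)  # → [1, 0, 2] and 2/3
```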
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"Accuracy on the test set:\", np.average(y_hat == y_test))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Intelligent hyperparameter tuning\n", - "We have trained the model with one set of hyperparameters; now let's see how we can do hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n", - "from azureml.train.hyperdrive import choice, loguniform\n", - "\n", - "ps = RandomParameterSampling(\n", - "    {\n", - "        '--batch-size': choice(25, 50, 100),\n", - "        '--first-layer-neurons': choice(10, 50, 200, 300, 500),\n", - "        '--second-layer-neurons': choice(10, 50, 200, 500),\n", - "        '--learning-rate': loguniform(-6, -1)\n", - "    }\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we will create a new estimator without the above parameters, since they will be passed in later. Note we still need to keep the `data-folder` parameter, since that's not a hyperparameter we will sweep." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "est = TensorFlow(source_directory=script_folder,\n", - "                 script_params={'--data-folder': ws.get_default_datastore().as_mount()},\n", - "                 compute_target=compute_target,\n", - "                 entry_script='tf_mnist.py', \n", - "                 use_gpu=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we will define an early termination policy. The `BanditPolicy` essentially checks the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML terminates the job. This saves us from continuing to explore hyperparameters that don't show promise of helping us reach our target metric." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we are ready to configure a HyperDrive run configuration object, and specify the primary metric `validation_acc` that's recorded in your training runs. If you go back and visit the training script, you will notice that this value is logged after every epoch (a full pass over the training set). We also want to tell the service that we are looking to maximize this value. We also set the maximum total number of runs to 8, and the maximum number of concurrent runs to 4, which matches the number of nodes in our compute cluster." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "htc = HyperDriveConfig(estimator=est, \n", - "                       hyperparameter_sampling=ps, \n", - "                       policy=policy, \n", - "                       primary_metric_name='validation_acc', \n", - "                       primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n", - "                       max_total_runs=8,\n", - "                       max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, let's launch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "htr = exp.submit(config=htc)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can use a run history widget to show the progress. Be patient, as this might take a while to complete."
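To build intuition for the bandit policy defined earlier: with `slack_factor=0.1` and a maximized metric, a run is cut when its reported metric falls below `best / (1 + slack_factor)`. This sketch illustrates the documented cutoff formula only; it is not the service's implementation, and the metric values are made up:

```python
def should_terminate(run_metric, best_metric, slack_factor=0.1):
    # for a maximized metric, terminate when the run falls below
    # best / (1 + slack_factor)
    return run_metric < best_metric / (1 + slack_factor)

# with a best validation_acc of 0.95, the cutoff is 0.95 / 1.1 ≈ 0.8636
print(should_terminate(0.80, 0.95))  # True: below the cutoff, run is stopped
print(should_terminate(0.90, 0.95))  # False: within the slack, run continues
```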
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(htr).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "htr.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Find and register best model \n", - "When all the jobs finish, we can find out the one that has the highest accuracy." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = htr.get_best_run_by_primary_metric()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now let's list the model files uploaded during the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(best_run.get_file_names())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can then register the folder (and all files in it) as a model named `tf-dnn-mnist` under the workspace for deployment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name='tf-dnn-mnist', model_path='outputs/model')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy the model in ACI\n", - "Now we are ready to deploy the model as a web service running in Azure Container Instance [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n", - "### Create score.py\n", - "First, we will create a scoring script that will be invoked by the web service call. \n", - "\n", - "* Note that the scoring script must have two required functions, `init()` and `run(input_data)`. 
\n", - " * In `init()` function, you typically load the model into a global object. This function is executed only once when the Docker container is started. \n", - " * In `run(input_data)` function, the model is used to predict a value based on the input data. The input and output to `run` typically use JSON as serialization and de-serialization format but you are not limited to that." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import os\n", - "import tensorflow as tf\n", - "\n", - "from azureml.core.model import Model\n", - "\n", - "def init():\n", - " global X, output, sess\n", - " tf.reset_default_graph()\n", - " model_root = Model.get_model_path('tf-dnn-mnist')\n", - " saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n", - " X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", - " output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", - " \n", - " sess = tf.Session()\n", - " saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n", - "\n", - "def run(raw_data):\n", - " data = np.array(json.loads(raw_data)['data'])\n", - " # make prediction\n", - " out = output.eval(session=sess, feed_dict={X: data})\n", - " y_hat = np.argmax(out, axis=1)\n", - " return y_hat.tolist()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create myenv.yml\n", - "We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify packages `numpy`, `tensorflow`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import CondaDependencies\n", - "\n", - "cd = CondaDependencies.create()\n", - "cd.add_conda_package('numpy')\n", - "cd.add_tensorflow_conda_package()\n", - "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", - "\n", - "print(cd.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy to ACI\n", - "We are almost ready to deploy. Create a deployment configuration and specify the number of CPUs and gigbyte of RAM needed for your ACI container. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={'name':'mnist', 'framework': 'TensorFlow DNN'},\n", - " description='Tensorflow DNN on MNIST')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Deployment Process\n", - "Now we can deploy. **This cell will run for about 7-8 minutes**. Behind the scene, it will do the following:\n", - "1. **Register model** \n", - "Take the local `model` folder (which contains our previously downloaded trained model files) and register it (and the files inside that folder) as a model named `model` under the workspace. Azure ML will register the model directory or model file(s) we specify to the `model_paths` parameter of the `Webservice.deploy` call.\n", - "2. **Build Docker image** \n", - "Build a Docker image using the scoring file (`score.py`), the environment file (`myenv.yml`), and the `model` folder containing the TensorFlow model files. \n", - "3. **Register image** \n", - "Register that image under the workspace. \n", - "4. 
**Ship to ACI** \n", - "And finally ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.image import ContainerImage\n", - "\n", - "imgconfig = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "\n", - "service = Webservice.deploy_from_model(workspace=ws,\n", - " name='tf-mnist-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=imgconfig)\n", - "\n", - "service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:**" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.get_logs())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This is the scoring web service endpoint:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(service.scoring_uri)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test the deployed model\n", - "Let's test the deployed model. Pick 30 random samples from the test set, and send it to the web service hosted in ACI. Note here we are using the `run` API in the SDK to invoke the service. 
You can also make raw HTTP calls using any HTTP tool such as curl.\n", - "\n", - "After the invocation, we print the returned predictions and plot them along with the input images, using a red font color and an inverted image (white on black) to highlight the misclassified samples. Note that since the model accuracy is pretty high, you might have to run the cell below a few times before you see a misclassified sample." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "\n", - "# find 30 random samples from test set\n", - "n = 30\n", - "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", - "\n", - "test_samples = json.dumps({\"data\": X_test[sample_indices].tolist()})\n", - "test_samples = bytes(test_samples, encoding='utf8')\n", - "\n", - "# predict using the deployed model\n", - "result = service.run(input_data=test_samples)\n", - "\n", - "# compare actual values vs. the predicted values:\n", - "i = 0\n", - "plt.figure(figsize = (20, 1))\n", - "\n", - "for s in sample_indices:\n", - "    plt.subplot(1, n, i + 1)\n", - "    plt.axhline('')\n", - "    plt.axvline('')\n", - "    \n", - "    # use different color for misclassified sample\n", - "    font_color = 'red' if y_test[s] != result[i] else 'black'\n", - "    clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", - "    \n", - "    # show the prediction returned by the service\n", - "    plt.text(x=10, y=-10, s=result[i], fontsize=18, color=font_color)\n", - "    plt.imshow(X_test[s].reshape(28, 28), cmap=clr_map)\n", - "    \n", - "    i = i + 1\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can also send a raw HTTP request to the service." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "\n", - "# send a random row from the test set to score\n", - "random_index = np.random.randint(0, len(X_test)-1)\n", - "input_data = \"{\\\"data\\\": [\" + str(list(X_test[random_index])) + \"]}\"\n", - "\n", - "headers = {'Content-Type':'application/json'}\n", - "\n", - "resp = requests.post(service.scoring_uri, input_data, headers=headers)\n", - "\n", - "print(\"POST to url\", service.scoring_uri)\n", - "#print(\"input data:\", input_data)\n", - "print(\"label:\", y_test[random_index])\n", - "print(\"prediction:\", resp.text)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's look at the workspace after the web service was deployed. You should see \n", - "* a registered model named 'tf-dnn-mnist' with the id 'tf-dnn-mnist:1'\n", - "* an image with a Docker image location pointing to your workspace's Azure Container Registry (ACR) \n", - "* a webservice called 'tf-mnist-svc' with a scoring URL" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "models = ws.models\n", - "for name, model in models.items():\n", - "    print(\"Model: {}, ID: {}\".format(name, model.id))\n", - "    \n", - "images = ws.images\n", - "for name, image in images.items():\n", - "    print(\"Image: {}, location: {}\".format(name, image.image_location))\n", - "    \n", - "webservices = ws.webservices\n", - "for name, webservice in webservices.items():\n", - "    print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean up\n", - "You can delete the ACI deployment with a simple delete API call."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "ninhu" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.yml b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.yml deleted file mode 100644 index b7a72c28..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.yml +++ /dev/null @@ -1,8 +0,0 @@ -name: train-hyperparameter-tune-deploy-with-tensorflow -dependencies: -- numpy -- matplotlib -- tensorflow -- pip: - - azureml-sdk - - azureml-widgets diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/utils.py b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/utils.py deleted file mode 100644 index 98170ada..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/utils.py +++ /dev/null @@ -1,27 +0,0 @@ -# Copyright (c) Microsoft Corporation. All rights reserved. -# Licensed under the MIT License. 
- -import gzip -import numpy as np -import struct - - -# load compressed MNIST gz files and return numpy arrays -def load_data(filename, label=False): - with gzip.open(filename) as gz: - struct.unpack('I', gz.read(4)) - n_items = struct.unpack('>I', gz.read(4)) - if not label: - n_rows = struct.unpack('>I', gz.read(4))[0] - n_cols = struct.unpack('>I', gz.read(4))[0] - res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8) - res = res.reshape(n_items[0], n_rows * n_cols) - else: - res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8) - res = res.reshape(n_items[0], 1) - return res - - -# one-hot encode a 1-D array -def one_hot_encode(array, num_of_classes): - return np.eye(num_of_classes)[array.reshape(-1)] diff --git a/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/tf_mnist_with_checkpoint.py b/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/tf_mnist_with_checkpoint.py deleted file mode 100644 index 85e80cbd..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/tf_mnist_with_checkpoint.py +++ /dev/null @@ -1,123 +0,0 @@ -# Copyright (c) Microsoft Corporation. All rights reserved. -# Licensed under the MIT License. 
- -import numpy as np -import argparse -import os -import re -import tensorflow as tf - -from azureml.core import Run -from utils import load_data - -print("TensorFlow version:", tf.VERSION) - -parser = argparse.ArgumentParser() -parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point') - -parser.add_argument('--resume-from', type=str, default=None, - help='location of the model or checkpoint files from where to resume the training') -args = parser.parse_args() - - -previous_model_location = args.resume_from -# You can also use environment variable to get the model/checkpoint files location -# previous_model_location = os.path.expandvars(os.getenv("AZUREML_DATAREFERENCE_MODEL_LOCATION", None)) - -data_folder = os.path.join(args.data_folder, 'mnist') - -print('training dataset is stored here:', data_folder) - -X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0 -X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0 - -y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1) -y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1) - -print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n') -training_set_size = X_train.shape[0] - -n_inputs = 28 * 28 -n_h1 = 100 -n_h2 = 100 -n_outputs = 10 -learning_rate = 0.01 -n_epochs = 20 -batch_size = 50 - -with tf.name_scope('network'): - # construct the DNN - X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X') - y = tf.placeholder(tf.int64, shape=(None), name='y') - h1 = tf.layers.dense(X, n_h1, activation=tf.nn.relu, name='h1') - h2 = tf.layers.dense(h1, n_h2, activation=tf.nn.relu, name='h2') - output = tf.layers.dense(h2, n_outputs, name='output') - -with tf.name_scope('train'): - cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=output) - loss = tf.reduce_mean(cross_entropy, name='loss') - optimizer = 
tf.train.GradientDescentOptimizer(learning_rate) - train_op = optimizer.minimize(loss) - -with tf.name_scope('eval'): - correct = tf.nn.in_top_k(output, y, 1) - acc_op = tf.reduce_mean(tf.cast(correct, tf.float32)) - -init = tf.global_variables_initializer() -saver = tf.train.Saver() - -# start an Azure ML run -run = Run.get_context() - -with tf.Session() as sess: - start_epoch = 0 - if previous_model_location: - checkpoint_file_path = tf.train.latest_checkpoint(previous_model_location) - saver.restore(sess, checkpoint_file_path) - checkpoint_filename = os.path.basename(checkpoint_file_path) - num_found = re.search(r'\d+', checkpoint_filename) - if num_found: - start_epoch = int(num_found.group(0)) - print("Resuming from epoch {}".format(str(start_epoch))) - else: - init.run() - - for epoch in range(start_epoch, n_epochs): - - # randomly shuffle training set - indices = np.random.permutation(training_set_size) - X_train = X_train[indices] - y_train = y_train[indices] - - # batch index - b_start = 0 - b_end = b_start + batch_size - for _ in range(training_set_size // batch_size): - # get a batch - X_batch, y_batch = X_train[b_start: b_end], y_train[b_start: b_end] - - # update batch index for the next batch - b_start = b_start + batch_size - b_end = min(b_start + batch_size, training_set_size) - - # train - sess.run(train_op, feed_dict={X: X_batch, y: y_batch}) - # evaluate training set - acc_train = acc_op.eval(feed_dict={X: X_batch, y: y_batch}) - # evaluate validation set - acc_val = acc_op.eval(feed_dict={X: X_test, y: y_test}) - - # log accuracies - run.log('training_acc', np.float(acc_train)) - run.log('validation_acc', np.float(acc_val)) - print(epoch, '-- Training accuracy:', acc_train, '\b Validation accuracy:', acc_val) - y_hat = np.argmax(output.eval(feed_dict={X: X_test}), axis=1) - - if epoch % 5 == 0: - saver.save(sess, './outputs/', global_step=epoch) - - # saving only half of the model and resuming again from same epoch - if not 
previous_model_location and epoch == 10: - break - - run.log('final_acc', np.float(acc_val)) diff --git a/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb b/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb deleted file mode 100644 index 94c51ff4..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb +++ /dev/null @@ -1,487 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/tensorflow-resume-training.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Resuming TensorFlow training from a previous run\n", - "In this tutorial, you will resume training an MNIST model in TensorFlow from a previously submitted run."
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n", - "* Go through the [configuration notebook](../../../configuration.ipynb) to:\n", - " * install the AML SDK\n", - " * create a workspace and its configuration file (`config.json`)\n", - "* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "Diagnostics" - ] - }, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace\n", - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep='\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"gpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target.')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " compute_target.wait_for_completion(show_output=True)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Upload data to datastore\n", - "To make data accessible for remote training, AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. \n", - "\n", - "If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. 
For this tutorial, although you can download the data in your training script, we will demonstrate how to upload the training data to a datastore and access it during training to illustrate the datastore functionality." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, download the data directly from Yann LeCun's website and save it in a local data folder." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import urllib\n", - "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds = ws.get_default_datastore()\n", - "print(ds.datastore_type, ds.account_name, ds.container_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Upload MNIST data to the default datastore."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "For convenience, let's get a reference to the datastore. In the next section, we can then pass this reference to our training script's `--data-folder` argument. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ds_data = ds.as_mount()\n", - "print(ds_data)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory\n", - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "script_folder = './tf-resume-training'\n", - "os.makedirs(script_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copy the training script `tf_mnist_with_checkpoint.py` into this project directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "# the training logic is in the tf_mnist_with_checkpoint.py file.\n", - "shutil.copy('./tf_mnist_with_checkpoint.py', script_folder)\n", - "\n", - "# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n", - "shutil.copy('./utils.py', script_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment\n", - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'tf-resume-training'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a TensorFlow estimator\n", - "The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow).\n", - "\n", - "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\n", - " '--data-folder': ds_data\n", - "}\n", - "\n", - "estimator = TensorFlow(source_directory=script_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_mnist_with_checkpoint.py')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the above code, we passed our training data reference `ds_data` to our script's `--data-folder` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data on our datastore." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job\n", - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)\n", - "print(run)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor your run\n", - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, you can block until the script has completed training before running more code."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Now let's resume the training from the above run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, we will get the DataPath to the outputs directory of the above run, which\n", - "contains the checkpoint files and/or the model." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model_location = run._get_outputs_datapath()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, we will create a new TensorFlow estimator and pass in the model location. When the `resume_from` parameter is passed, a new entry is added to `script_params` with the key `resume_from` and the model/checkpoint file location as its value, and that location is automatically mounted on the compute target." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import TensorFlow\n", - "\n", - "script_params = {\n", - " '--data-folder': ds_data\n", - "}\n", - "\n", - "estimator2 = TensorFlow(source_directory=script_folder,\n", - " compute_target=compute_target,\n", - " script_params=script_params,\n", - " entry_script='tf_mnist_with_checkpoint.py',\n", - " resume_from=model_location)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now you can submit the experiment, and it should resume from the previous run's checkpoint files."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run2 = experiment.submit(estimator2)\n", - "print(run2)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run2.wait_for_completion(show_output=True)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "hesuri" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "hesuri" - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/train-tensorflow-resume-training.yml b/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/train-tensorflow-resume-training.yml deleted file mode 100644 index c814eef5..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/train-tensorflow-resume-training.yml +++ /dev/null @@ -1,5 +0,0 @@ -name: train-tensorflow-resume-training -dependencies: -- pip: - - azureml-sdk - - azureml-widgets diff --git a/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/utils.py b/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/utils.py deleted file mode 100644 index 98170ada..00000000 --- a/how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/utils.py +++ /dev/null @@ -1,27 +0,0 @@ -# Copyright (c) Microsoft Corporation. All rights reserved. -# Licensed under the MIT License. 
- -import gzip -import numpy as np -import struct - - -# load compressed MNIST gz files and return numpy arrays -def load_data(filename, label=False): - with gzip.open(filename) as gz: - struct.unpack('I', gz.read(4)) - n_items = struct.unpack('>I', gz.read(4)) - if not label: - n_rows = struct.unpack('>I', gz.read(4))[0] - n_cols = struct.unpack('>I', gz.read(4))[0] - res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8) - res = res.reshape(n_items[0], n_rows * n_cols) - else: - res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8) - res = res.reshape(n_items[0], 1) - return res - - -# one-hot encode a 1-D array -def one_hot_encode(array, num_of_classes): - return np.eye(num_of_classes)[array.reshape(-1)] diff --git a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb deleted file mode 100644 index d4890cfa..00000000 --- a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb +++ /dev/null @@ -1,568 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train and hyperparameter tune on Iris Dataset with Scikit-learn\n", - "In this tutorial, we demonstrate how to use the Azure ML Python SDK to train a support vector machine (SVM) on a single-node CPU with Scikit-learn to perform classification on the popular [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris). We will also demonstrate how to perform hyperparameter tuning of the model using Azure ML's HyperDrive service." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "* Go through the [Configuration](../../../configuration.ipynb) notebook to install the Azure Machine Learning Python SDK and create an Azure ML Workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Diagnostics" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Opt-in diagnostics for better experience, quality, and security of future releases." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.telemetry import set_diagnostics_collection\n", - "\n", - "set_diagnostics_collection(send_diagnostics=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize workspace" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.workspace import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create AmlCompute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. 
Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If a cluster with the given name cannot be found, we will create a new one here. We will create an `AmlCompute` cluster of `STANDARD_D2_V2` CPU VMs. This process is broken down into 3 steps:\n", - "1. create the configuration (this step is local and only takes a second)\n", - "2. create the cluster (this step will take about **20 seconds**)\n", - "3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and provides only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "# choose a name for your cluster\n", - "cluster_name = \"cpu-cluster\"\n", - "\n", - "try:\n", - " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", - " print('Found existing compute target')\n", - "except ComputeTargetException:\n", - " print('Creating a new compute target...')\n", - " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n", - " max_nodes=4)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", - "\n", - " # can poll for a minimum number of nodes and for a specific timeout.
\n", - " # if no min node count is provided it uses the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - "\n", - "# use get_status() to get a detailed status for the current cluster. \n", - "print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above code retrieves a CPU compute target. Scikit-learn does not support GPU computing." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train model on the remote compute" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that you have your data and training script prepared, you are ready to train on your remote compute. You can take advantage of Azure compute to leverage a CPU cluster." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a project directory" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script and any additional files your training script depends on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "\n", - "project_folder = './sklearn-iris'\n", - "os.makedirs(project_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Prepare training script" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now you will need to create your training script. In this tutorial, the training script is already provided for you at `train_iris.py`.
In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.\n", - "\n", - "However, if you would like to use Azure ML's [tracking and metrics](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#metrics) capabilities, you will have to add a small amount of Azure ML code inside your training script.\n", - "\n", - "In `train_iris.py`, we will log some metrics to our Azure ML run. To do so, we will access the Azure ML Run object within the script:\n", - "\n", - "```python\n", - "from azureml.core.run import Run\n", - "run = Run.get_context()\n", - "```\n", - "\n", - "Further within `train_iris.py`, we log the kernel and penalty parameters, and the highest accuracy the model achieves:\n", - "\n", - "```python\n", - "run.log('Kernel type', np.string(args.kernel))\n", - "run.log('Penalty', np.float(args.penalty))\n", - "\n", - "run.log('Accuracy', np.float(accuracy))\n", - "```\n", - "\n", - "These run metrics will become particularly important when we begin hyperparameter tuning our model in the \"Tune model hyperparameters\" section.\n", - "\n", - "Once your script is ready, copy the training script `train_iris.py` into your project directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "\n", - "shutil.copy('train_iris.py', project_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an experiment" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this Scikit-learn tutorial." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'train_iris'\n", - "experiment = Experiment(ws, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a Scikit-learn estimator" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The Azure ML SDK's Scikit-learn estimator enables you to easily submit Scikit-learn training jobs for single-node runs. The following code will define a single-node Scikit-learn job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "sklearn-remarks-sample" - ] - }, - "outputs": [], - "source": [ - "from azureml.train.sklearn import SKLearn\n", - "\n", - "script_params = {\n", - " '--kernel': 'linear',\n", - " '--penalty': 1.0,\n", - "}\n", - "\n", - "estimator = SKLearn(source_directory=project_folder, \n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " entry_script='train_iris.py',\n", - " pip_packages=['joblib==0.13.2']\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit job" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Run your experiment by submitting your estimator object. Note that this call is asynchronous." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = experiment.submit(estimator)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Monitor your run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.cancel()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Tune model hyperparameters" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now that we've seen how to do a simple Scikit-learn training run using the SDK, let's see if we can further improve the accuracy of our model. We can optimize our model's hyperparameters using Azure Machine Learning's hyperparameter tuning capabilities." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Start a hyperparameter sweep" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "First, we will define the hyperparameter space to sweep over. Let's tune the `kernel` and `penalty` parameters. In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, `Accuracy`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.hyperdrive.runconfig import HyperDriveRunConfig\n", - "from azureml.train.hyperdrive.sampling import RandomParameterSampling\n", - "from azureml.train.hyperdrive.run import PrimaryMetricGoal\n", - "from azureml.train.hyperdrive.parameter_expressions import choice\n", - " \n", - "\n", - "param_sampling = RandomParameterSampling( {\n", - " \"--kernel\": choice('linear', 'rbf', 'poly', 'sigmoid'),\n", - " \"--penalty\": choice(0.5, 1, 1.5)\n", - " }\n", - ")\n", - "\n", - "hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n", - " hyperparameter_sampling=param_sampling, \n", - " primary_metric_name='Accuracy',\n", - " primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n", - " max_total_runs=12,\n", - " max_concurrent_runs=4)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, lauch the hyperparameter tuning job." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# start the HyperDrive run\n", - "hyperdrive_run = experiment.submit(hyperdrive_run_config)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Monitor HyperDrive runs" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can monitor the progress of the runs with the following Jupyter widget." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "RunDetails(hyperdrive_run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "hyperdrive_run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Find and register best model\n", - "When all jobs finish, we can find out the one that has the highest accuracy." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "best_run = hyperdrive_run.get_best_run_by_primary_metric()\n", - "print(best_run.get_details()['runDefinition']['arguments'])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, let's list the model files uploaded during the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(best_run.get_file_names())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can then register the folder (and all files in it) as a model named `sklearn-iris` under the workspace for deployment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model = best_run.register_model(model_name='sklearn-iris', model_path='outputs/model.joblib')" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "dipeck" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "msauthor": "dipeck" - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.yml b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.yml deleted file mode 100644 index 2691a849..00000000 --- a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.yml +++ /dev/null @@ -1,6 +0,0 @@ -name: train-hyperparameter-tune-deploy-with-sklearn 
-dependencies: -- pip: - - azureml-sdk - - azureml-widgets - - numpy diff --git a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train_iris.py b/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train_iris.py deleted file mode 100644 index bc9099d8..00000000 --- a/how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train_iris.py +++ /dev/null @@ -1,60 +0,0 @@ -# Modified from https://www.geeksforgeeks.org/multiclass-classification-using-scikit-learn/ - -import argparse -import os - -# importing necessary libraries -import numpy as np - -from sklearn import datasets -from sklearn.metrics import confusion_matrix -from sklearn.model_selection import train_test_split - -import joblib - -from azureml.core.run import Run -run = Run.get_context() - - -def main(): - parser = argparse.ArgumentParser() - - parser.add_argument('--kernel', type=str, default='linear', - help='Kernel type to be used in the algorithm') - parser.add_argument('--penalty', type=float, default=1.0, - help='Penalty parameter of the error term') - - args = parser.parse_args() - run.log('Kernel type', np.str(args.kernel)) - run.log('Penalty', np.float(args.penalty)) - - # loading the iris dataset - iris = datasets.load_iris() - - # X -> features, y -> label - X = iris.data - y = iris.target - - # dividing X, y into train and test data - X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0) - - # training a linear SVM classifier - from sklearn.svm import SVC - svm_model_linear = SVC(kernel=args.kernel, C=args.penalty).fit(X_train, y_train) - svm_predictions = svm_model_linear.predict(X_test) - - # model accuracy for X_test - accuracy = svm_model_linear.score(X_test, y_test) - print('Accuracy of SVM classifier on test set: {:.2f}'.format(accuracy)) - run.log('Accuracy', np.float(accuracy)) - # creating a confusion matrix - cm = confusion_matrix(y_test, svm_predictions) - print(cm) - - os.makedirs('outputs', 
exist_ok=True) - # files saved in the "outputs" folder are automatically uploaded into run history - joblib.dump(svm_model_linear, 'outputs/model.joblib') - - -if __name__ == '__main__': - main() diff --git a/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb b/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb index d7b17dbf..e426751b 100644 --- a/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb +++ b/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb @@ -278,7 +278,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" - } + }, + "friendly_name": "Training in Spark", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Submitting a run on a Spark cluster", + "datasets": [ + "None" + ], + "compute": [ + "HDI cluster" + ], + "deployment": [ + "None" + ], + "framework": [ + "PySpark" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb b/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb index de79e052..38b7d320 100644 --- a/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb +++ b/how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb @@ -441,7 +441,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" - } + }, + "friendly_name": "Train on Azure Machine Learning Compute", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Submit an Azure Machine Learning Compute run", + "datasets": [ + "Diabetes" + ], + "compute": [ + "AML Compute" + ], + "deployment": [ + "None" + ], + "framework": [ + "None" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training/train-on-local/train-on-local.ipynb b/how-to-use-azureml/training/train-on-local/train-on-local.ipynb index
917e14c9..bf6f42bd 100644 --- a/how-to-use-azureml/training/train-on-local/train-on-local.ipynb +++ b/how-to-use-azureml/training/train-on-local/train-on-local.ipynb @@ -667,7 +667,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" - } + }, + "friendly_name": "Train on local compute", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Train a model locally", + "datasets": [ + "Diabetes" + ], + "compute": [ + "Local" + ], + "deployment": [ + "None" + ], + "framework": [ + "None" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb b/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb index a4a70bfa..d19160b8 100644 --- a/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb +++ b/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb @@ -606,7 +606,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" - } + }, + "friendly_name": "Train in a remote Linux virtual machine", + "exclude_from_index": false, + "index_order": 1, + "category": "training", + "task": "Configure and execute a run", + "datasets": [ + "Diabetes" + ], + "compute": [ + "Data Science Virtual Machine" + ], + "deployment": [ + "None" + ], + "framework": [ + "None" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb b/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb index 083bef3c..5961832e 100644 --- a/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb +++ b/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb @@ -313,8 +313,7 @@ "* registering a model in your workspace\n", "* creating a scoring file containing init and run methods\n", "* creating an 
environment dependency file describing packages necessary for your scoring file\n", - "* creating a docker image containing a properly described environment, your model, and your scoring file\n", - "* deploying that docker image as a web service" + "* deploying the model and packages as a web service" ] }, { @@ -386,9 +385,9 @@ "source": [ "### Describe your environment\n", "\n", - "Each modelling process may require a unique set of packages. Therefore we need to create a dependency file providing instructions to AML on how to construct a docker image that can support the models and any other objects required for inference. In the following cell, we create a environment dependency file, *myenv.yml* that specifies which libraries are needed by the scoring script. You can create this file manually, or use the `CondaDependencies` class to create it for you.\n", + "Each modelling process may require a unique set of packages. Therefore we need to create an environment object describing the dependencies. \n", "\n", - "Next we use this environment file to describe the docker container that we need to create in order to deploy our model. This container is created using our environment description and includes our scoring script." + "Next we create an inference configuration using this environment object and the scoring script that we created previously." 
] }, { @@ -397,24 +396,13 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "from azureml.core.image import ContainerImage\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "from azureml.core.environment import Environment\n", + "from azureml.core.model import InferenceConfig\n", "\n", - "# Create an empty conda environment and add the scikit-learn package\n", - "env = CondaDependencies()\n", - "env.add_conda_package(\"scikit-learn\")\n", - "\n", - "# Display the environment\n", - "print(env.serialize_to_string())\n", - "\n", - "# Write the environment to disk\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(env.serialize_to_string())\n", - "\n", - "# Create a configuration object indicating how our deployment container needs to be created\n", - "image_config = ContainerImage.image_configuration(execution_script=\"score.py\", \n", - " runtime=\"python\", \n", - " conda_file=\"myenv.yml\")" + "env = Environment('deploytocloudenv')\n", + "env.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)" ] }, { @@ -422,7 +410,7 @@ "metadata": {}, "source": [ "### Describe your target compute\n", - "In addition to the container, we also need to describe the type of compute we want to allocate for our webservice. In in this example we are using an [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/) which is a good choice for quick and cost-effective dev/test deployment scenarios. ACI instances require the number of cores you want to run and memory you need. Tags and descriptions are available for you to identify the instances in AML when viewing the Compute tab in the AML Portal.\n", + "In addition to the inference configuration, we also need to describe the type of compute we want to allocate for our webservice. 
In this example we are using an [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/) which is a good choice for quick and cost-effective dev/test deployment scenarios. ACI instances require the number of cores you want to run and memory you need. Tags and descriptions are available for you to identify the instances in AML when viewing the Compute tab in the AML Portal.\n", "\n", "For production workloads, it is better to use [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. Try [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n" ] @@ -451,18 +439,18 @@ "metadata": {}, "source": [ "### Deploy your webservice\n", - "The final step to deploying your webservice is to call `WebService.deploy_from_model()`. This function uses the deployment and image configurations created above to perform the following:\n", + "The final step to deploying your webservice is to call `Model.deploy()`.
This function uses the deployment and inference configurations created above to perform the following:\n", "* Build a docker image\n", "* Deploy the docker image to an Azure Container Instance\n", "* Copy your model files to the Azure Container Instance\n", "* Call the `init()` function in your scoring file\n", "* Provide an HTTP endpoint for scoring calls\n", "\n", - "The `deploy_from_model` method requires the following parameters\n", + "The `Model.deploy` method requires the following parameters:\n", "* `workspace` - the workspace containing the service\n", "* `name` - a unique name used to identify the service in the workspace\n", "* `models` - an array of models to be deployed into the container\n", - "* `image_config` - a configuration object describing the image environment\n", + "* `inference_config` - a configuration object describing the image environment\n", "* `deployment_config` - a configuration object describing the compute type\n", " \n", "**Note:** The web service creation can take several minutes.
" @@ -480,14 +468,15 @@ "outputs": [], "source": [ "%%time\n", + "from azureml.core.model import Model\n", "from azureml.core.webservice import Webservice\n", "\n", "# Create the webservice using all of the precreated configurations and our best model\n", - "service = Webservice.deploy_from_model(name='my-aci-svc',\n", - " deployment_config=aciconfig,\n", - " models=[model],\n", - " image_config=image_config,\n", - " workspace=ws)\n", + "service = Model.deploy(workspace=ws,\n", + " name='my-aci-svc',\n", + " models=[model],\n", + " inference_config=inference_config,\n", + " deployment_config=aciconfig)\n", "\n", "# Wait for the service deployment to complete while displaying log output\n", "service.wait_for_deployment(show_output=True)" @@ -682,6 +671,22 @@ "name": "roastala" } ], + "category": "tutorial", + "compute": [ + "Local" + ], + "datasets": [ + "Diabetes" + ], + "deployment": [ + "Azure Container Instance" + ], + "exclude_from_index": false, + "framework": [ + "None" + ], + "friendly_name": "", + "index_order": 1, "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -698,7 +703,11 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" - } + }, + "tags": [ + "None" + ], + "task": "Training and deploying a model from a notebook" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training/using-environments/example/example.py b/how-to-use-azureml/training/using-environments/example/example.py deleted file mode 100644 index 90386313..00000000 --- a/how-to-use-azureml/training/using-environments/example/example.py +++ /dev/null @@ -1,8 +0,0 @@ -# Copyright (c) Microsoft Corporation. All rights reserved. 
-# Licensed under the MIT License - -# Very simple script to demonstrate run in environment -# Print message passed in as environment variable -import os - -print(os.environ.get("MESSAGE")) diff --git a/how-to-use-azureml/training/using-environments/using-environments.ipynb b/how-to-use-azureml/training/using-environments/using-environments.ipynb index 3f420c8f..ba566ead 100644 --- a/how-to-use-azureml/training/using-environments/using-environments.ipynb +++ b/how-to-use-azureml/training/using-environments/using-environments.ipynb @@ -369,7 +369,27 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" - } + }, + "friendly_name": "Using Azure ML environments", + "exclude_from_index": false, + "index_order": 1, + "category": "starter", + "task": "Creating and registering environments", + "datasets": [ + "None" + ], + "compute": [ + "Local" + ], + "deployment": [ + "None" + ], + "framework": [ + "None" + ], + "tags": [ + "None" + ] }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/work-with-data/dataprep/data/10x10-float64-csr.npz b/how-to-use-azureml/work-with-data/dataprep/data/10x10-float64-csr.npz new file mode 100644 index 00000000..3f7505df Binary files /dev/null and b/how-to-use-azureml/work-with-data/dataprep/data/10x10-float64-csr.npz differ diff --git a/how-to-use-azureml/work-with-data/dataprep/data/crime.jsonl b/how-to-use-azureml/work-with-data/dataprep/data/crime.jsonl new file mode 100644 index 00000000..56b6e0b0 --- /dev/null +++ b/how-to-use-azureml/work-with-data/dataprep/data/crime.jsonl @@ -0,0 +1,10 @@ +{"ID": 10140490, "Case Number": "HY329907", "Date": "7/5/2015 23:50", "Block": "050XX N NEWLAND AVE 820", "Primary Type": "THEFT"} +{"ID": 10139776, "Case Number": "HY329265", "Date": "7/5/2015 23:30", "Block": "011XX W MORSE AVE 460", "Primary Type": "BATTERY"} +{"ID": 10140270, "Case Number": "HY329253", "Date": "7/5/2015 23:20", "Block": "121XX S FRONT AVE 486", "Primary Type": "BATTERY"} +{"ID": 
10139885, "Case Number": "HY329308", "Date": "7/5/2015 23:19", "Block": "051XX W DIVISION ST 610", "Primary Type": "BURGLARY"} +{"ID": 10140379, "Case Number": "HY329556", "Date": "7/5/2015 23:00", "Block": "012XX W LAKE ST 930", "Primary Type": "MOTOR VEHICLE THEFT"} +{"ID": 10140868, "Case Number": "HY330421", "Date": "7/5/2015 23:54", "Block": "118XX S PEORIA ST 1320", "Primary Type": "CRIMINAL DAMAGE"} +{"ID": 10139762, "Case Number": "HY329232", "Date": "7/5/2015 23:42", "Block": "026XX W 37TH PL 1020", "Primary Type": "ARSON"} +{"ID": 10139722, "Case Number": "HY329228", "Date": "7/5/2015 23:30", "Block": "016XX S CENTRAL PARK AVE 1811", "Primary Type": "NARCOTICS"} +{"ID": 10139774, "Case Number": "HY329209", "Date": "7/5/2015 23:15", "Block": "048XX N ASHLAND AVE 1310", "Primary Type": "CRIMINAL DAMAGE"} +{"ID": 10139697, "Case Number": "HY329177", "Date": "7/5/2015 23:10", "Block": "058XX S ARTESIAN AVE 1320", "Primary Type": "CRIMINAL DAMAGE"} \ No newline at end of file diff --git a/how-to-use-azureml/work-with-data/dataprep/data/file-url.csv b/how-to-use-azureml/work-with-data/dataprep/data/file-url.csv new file mode 100644 index 00000000..fd477e80 --- /dev/null +++ b/how-to-use-azureml/work-with-data/dataprep/data/file-url.csv @@ -0,0 +1,4 @@ +file_url +AmlDatastore://dataprep_blob/crime0-10.csv +AmlDatastore://dataprep_blob/housing.xlsx +AmlDatastore://dataprep_blob/input/crime0-10.xlsx diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/custom-python-transforms.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/custom-python-transforms.ipynb index 43a3ca62..c0653203 100644 --- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/custom-python-transforms.ipynb +++ b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/custom-python-transforms.ipynb @@ -109,12 +109,13 @@ "outputs": [], "source": [ "pt_dflow = dflow\n", - "dflow = pt_dflow.transform_partition(\"\"\"\n", + "\n", "def transform(df, index):\n", " 
df['Latitude'].fillna('0',inplace=True)\n", " df['Longitude'].fillna('0',inplace=True)\n", " return df\n", - "\"\"\")\n", + "\n", + "dflow = pt_dflow.map_partition(fn=transform)\n", "dflow.head(5)" ] }, diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/data-ingestion.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/data-ingestion.ipynb index 3f04e85a..58b6df4a 100644 --- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/data-ingestion.ipynb +++ b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/data-ingestion.ipynb @@ -41,8 +41,10 @@ "[Read Excel](#excel)
\n", "[Read Fixed Width Files](#fixed-width)
\n", "[Read Parquet](#parquet)
\n", + "[Read Npz](#npz)
\n", "[Read Part Files Using Globbing](#globbing)
\n", "[Read JSON](#json)
\n", + "[Read JSON Lines](#jsonlines)
\n", "[Read SQL](#sql)
\n", "[Read PostgreSQL](#postgresql)
\n", "[Read From Azure Blob](#azure-blob)
\n", @@ -512,6 +514,60 @@ " print(fileindent + f)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read Npz" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For reading `.npz` files use `read_npz_file`.\n", + "\n", + "**Note:** Currently the only supported npz files are those containing CSR Matrixes saved using SciPy's `sparse.save_npz` method." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dflow = dprep.read_npz_file('../data/10x10-float64-csr.npz')\n", + "df = dflow.to_pandas_dataframe()\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.dataprep.native import preppy_to_ndarrays\n", + "from pandas.util.testing import assert_frame_equal\n", + "import os\n", + "import pandas\n", + "import glob\n", + "from collections import OrderedDict\n", + "\n", + "paths = [os.path.abspath(file) for file in glob.iglob('./testdata/npz-10x10-csr/part-*', recursive=False)]\n", + "paths.sort()\n", + "dataset = preppy_to_ndarrays(paths)\n", + "expected_df = pandas.DataFrame.from_dict(OrderedDict(dataset))\n", + "assert_frame_equal(expected_df, df)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -593,6 +649,32 @@ "dflow_flat_arrays.head(5)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Read JSON Lines Files\n", + "\n", + "In addition to JSON objects, Data Prep can also read files consisting of JSON lines." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dflow_json_lines = dprep.read_json_lines('../data/crime.jsonl')\n", + "dflow_json_lines.head(5)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -845,11 +927,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To configure the ACL for the ADLS filesystem, use the objectId of the user or, here, ServicePrincipal:\n", + "To configure the ACL (Access Control List) for the ADLS filesystem, use the objectId of the user or, here, ServicePrincipal:\n", "```\n", "az ad sp show --id \"fbc406bf-f7c2-410d-bc26-8b08e4dab1aa\" --query objectId\n", "```\n", - "Configure Read and Execute access for the ADLS file system. Since the underlying HDFS ACL model doesn't support inheritance, folders and files need to be ACL-ed individually.\n", + "Configure both Read and Execute access for the ADLS file system. Since the underlying HDFS ACL model doesn't support inheritance, folders and files need to be ACL-ed individually. 
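As an aside on the JSON Lines format read above: it is simply one JSON object per line, which is why it can be read record by record. A minimal stdlib sketch of the parsing (a plain-Python illustration, not the Data Prep implementation):

```python
import json

# Two records in the same shape as the crime.jsonl sample above (abridged keys).
jsonl_text = (
    '{"ID": 10140490, "Primary Type": "THEFT"}\n'
    '{"ID": 10139776, "Primary Type": "BATTERY"}\n'
)

# JSON Lines parsing is just json.loads applied line by line.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
print(len(records), records[0]["Primary Type"])  # 2 THEFT
```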
Please double check if the app also has permission to access the hierarchical containers.\n", "```\n", "az dls fs access set-entry --account dpreptestfiles --acl-spec \"user:999a21ef-75aa-4538-b325-249285672204:r-x\" --path /\n", "az dls fs access set-entry --account dpreptestfiles --acl-spec \"user:999a21ef-75aa-4538-b325-249285672204:r--\" --path /crime-spring.csv\n", @@ -965,7 +1047,24 @@ "ctx = adal.AuthenticationContext('https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47')\n", "token = ctx.acquire_token_with_client_certificate('https://storage.azure.com/', servicePrincipalAppId, certificate, certThumbprint)\n", "dflow = dprep.read_csv(path = ADLSGen2(path='https://adlsgen2datapreptest.dfs.core.windows.net/datapreptest/people.csv', accessToken=token['accessToken']))\n", - "dflow.to_pandas_dataframe().head()" + "dflow.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Reading from ADLSGen2 using the ABFS uri syntax is also supported by Data Prep." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dflow = dprep.read_csv(path = ADLSGen2(path='abfss://adlsgen2datapreptest.dfs.core.windows.net/datapreptest/people.csv', accessToken=token['accessToken']))\n", + "dflow.head()" ] }, { @@ -1076,7 +1175,7 @@ "metadata": {}, "outputs": [], "source": [ - "dflow = dprep.read_csv('https://dprepdata.blob.core.windows.net/test/Sample-Spreadsheet-10-rows.csv')\n", + "dflow = dprep.read_csv(dprep.HttpDataSource('https://dprepdata.blob.core.windows.net/test/Sample-Spreadsheet-10-rows.csv'))\n", "dflow.head(5)" ] } diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb index 92064377..8b17de4e 100644 --- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb +++ b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb @@ -119,7 +119,8 @@ "metadata": {}, "outputs": [], "source": [ - "dflow.save(temp_dflow_path)" + "dflow.save(temp_dflow_path)\n", + "temp_dflow_path" ] }, { @@ -164,6 +165,18 @@ "language": "python", "name": "python36" }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.4" + }, "notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License." 
}, "nbformat": 4, diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/random-split.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/random-split.ipynb index 4f87af22..7b8817de 100644 --- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/random-split.ipynb +++ b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/random-split.ipynb @@ -113,6 +113,31 @@ "source": [ "(dflow_test, dflow_train) = dflow.random_split(percentage=0.1, seed=12345)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Multi-Split" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In addition to the random split demonstrated above, it is also possible to split a single Dataflow into multiple Dataflows, each containing a random exclusive subset of the overall data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "total_dflow = dprep.read_csv(path='../data/crime-spring.csv')\n", + "subset_dflows = total_dflow.multi_split(4, seed=2) # Split in 4 parts, each part contains a random 25% of the data\n", + "print([dflow.row_count for dflow in subset_dflows])" + ] } ], "metadata": { diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/subsetting-sampling.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/subsetting-sampling.ipynb index d1abb62f..046edbae 100644 --- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/subsetting-sampling.ipynb +++ b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/subsetting-sampling.ipynb @@ -176,6 +176,29 @@ "multi_strata_sample.head(5)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Selecting Partitions\n", + "\n", + "The data produced by Dataflows is processed in partitions. How different data sources and formats are partitioned is guaranteed to be stable for a specific execution mode and version of azureml.dataprep. 
Usually, these partitions should not be interacted with directly; instead, higher-level APIs should be used. In certain advanced scenarios, however, it can be useful to create a Dataflow that contains only a subset of the partitions of another. The `select_partitions` method can help accomplish this." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "partition_count = dflow.get_partition_count()\n", + "# We'll keep only even-numbered partitions\n", + "desired_partitions = [p for p in range(0, partition_count) if p % 2 == 0]\n", + "subset_dflow = dflow.select_partitions(desired_partitions)\n", + "\n", + "subset_dflow.to_pandas_dataframe()" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/working-with-file-streams.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/working-with-file-streams.ipynb index e92c1e1c..fb30ae5e 100644 --- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/working-with-file-streams.ipynb +++ b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/working-with-file-streams.ipynb @@ -160,6 +160,26 @@ " base_path=dprep.LocalFileOutput('./test_out/crime/'),\n", " file_names_column='CleanName').run_local()" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Converting Data Into Streams\n", + "\n", + "Tabular data can be easily converted into a series of streams containing the data expressed in a binary or text format. These streams can then be written out using the capabilities outlined above. The number of resulting streams will depend on the number of partitions in the input data."
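Why does the stream count track the partition count? Each partition is serialized independently into its own stream. A toy plain-Python illustration of that relationship (not the Data Prep API; CSV stands in for the Parquet serialization):

```python
import csv
import io

# A toy dataset already split into three partitions of rows.
partitions = [
    [(1, "THEFT"), (2, "BATTERY")],
    [(3, "ARSON")],
    [(4, "NARCOTICS"), (5, "CRIMINAL DAMAGE")],
]

# Serialize each partition independently: one byte stream per partition.
streams = []
for part in partitions:
    buf = io.StringIO()
    csv.writer(buf).writerows(part)
    streams.append(buf.getvalue().encode("utf-8"))

print(len(streams))  # 3
```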
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tabular_dflow = dprep.auto_read_file('../data/crime-full.csv')\n", + "streams_dflow = tabular_dflow.to_parquet_streams()\n", + "streams_dflow.head(1)" + ] } ], "metadata": { diff --git a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/file-dataset-img-classification.ipynb b/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/file-dataset-img-classification.ipynb deleted file mode 100644 index d2d99569..00000000 --- a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/file-dataset-img-classification.ipynb +++ /dev/null @@ -1,716 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Train an image classification model with Azure Machine Learning\n", - " \n", - "This tutorial trains a simple logistic regression using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28x28 pixels, representing a number from 0 to 9. The goal is to create a multi-class classifier to identify the digit a given image represents. \n", - "\n", - "Learn how to:\n", - "\n", - "> * Set up your development environment\n", - "> * Access and examine the data via AzureML FileDataset\n", - "> * Train a simple logistic regression model on a remote cluster\n", - "> * Review training results, find and register the best model\n", - "\n", - "## Prerequisites\n", - "\n", - "See prerequisites in the [Azure Machine Learning documentation](https://docs.microsoft.com/azure/machine-learning/service/tutorial-train-models-with-aml#prerequisites)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set up your development environment\n", - "\n", - "All the setup for your development work can be accomplished in a Python notebook. Setup includes:\n", - "\n", - "* Importing Python packages\n", - "* Connecting to a workspace to enable communication between your local computer and remote resources\n", - "* Creating an experiment to track all your runs\n", - "* Creating a remote compute target to use for training\n", - "\n", - "### Import packages\n", - "\n", - "Import Python packages you need in this session. Also display the Azure Machine Learning SDK version." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "check version" - ] - }, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "\n", - "import azureml.core\n", - "from azureml.core import Workspace\n", - "\n", - "# check core SDK version number\n", - "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Connect to workspace\n", - "\n", - "Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `workspace`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "load workspace" - ] - }, - "outputs": [], - "source": [ - "# load workspace configuration from the config.json file in the current folder.\n", - "workspace = Workspace.from_config()\n", - "print(workspace.name, workspace.location, workspace.resource_group, sep='\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create experiment\n", - "\n", - "Create an experiment to track the runs in your workspace. A workspace can have multiple experiments. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create experiment" - ] - }, - "outputs": [], - "source": [ - "experiment_name = 'sklearn-mnist'\n", - "\n", - "from azureml.core import Experiment\n", - "exp = Experiment(workspace=workspace, name=experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create or Attach existing compute resource\n", - "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.\n", - "\n", - "**Creation of compute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace the code will skip the creation process." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create mlc", - "amlcompute" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", - "\n", - "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"azureml-compute\"\n", - "\n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = workspace.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", - "\n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 6)\n", 
- "\n", - " # Create the cluster.\n", - " compute_target = ComputeTarget.create(workspace, amlcompute_cluster_name, provisioning_config)\n", - "\n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", - "\n", - "# For a more detailed view of current AmlCompute status, use get_status()." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You now have the necessary packages and compute resources to train a model in the cloud. \n", - "\n", - "## Explore data\n", - "\n", - "Before you train a model, you need to understand the data that you are using to train it. You also need to copy the data into the cloud so it can be accessed by your cloud training environment. In this section you learn how to:\n", - "\n", - "* Download the MNIST dataset\n", - "* Display some sample images\n", - "* Upload data to the cloud\n", - "\n", - "### Download the MNIST dataset\n", - "\n", - "Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "import urllib.request\n", - "\n", - "data_folder = os.path.join(os.getcwd(), 'data')\n", - "os.makedirs(data_folder, exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images.gz'))\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels.gz'))\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Display some sample images\n", - "\n", - "Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. Note that this step requires a `load_data` function that's included in a `utils.py` file. This file is included in the sample folder. Please make sure it is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# make sure utils.py is in the same directory as this code\n", - "from utils import load_data\n", - "\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. 
This helps the model converge faster.\n", - "X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0\n", - "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n", - "y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n", - "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n", - "\n", - "# now let's show some randomly chosen images from the training set.\n", - "count = 0\n", - "sample_size = 30\n", - "plt.figure(figsize = (16, 6))\n", - "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", - " count = count + 1\n", - " plt.subplot(1, sample_size, count)\n", - " plt.axhline('')\n", - " plt.axvline('')\n", - " plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n", - " plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n", - "plt.show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now you have an idea of what these images look like and the expected prediction outcome.\n", - "\n", - "### Upload data to the cloud\n", - "\n", - "Now make the data accessible remotely by uploading it from your local machine into Azure so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload/download data, and interact with it from your remote compute targets. It is backed by an Azure Blob storage account.\n", - "\n", - "The MNIST files are uploaded into a directory named `mnist` at the root of the datastore. See [access data from your datastores](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data) for more information."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "use datastore" - ] - }, - "outputs": [], - "source": [ - "datastore = workspace.get_default_datastore()\n", - "print(datastore.datastore_type, datastore.account_name, datastore.container_name)\n", - "\n", - "datastore.upload(src_dir=data_folder, target_path='mnist', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a FileDataset\n", - "A FileDataset references single or multiple files in your datastores or public URLs. The files can be of any format. FileDataset provides you with the ability to download or mount the files to your compute. By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred. [Learn More](https://aka.ms/azureml/howto/createdatasets)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.dataset import Dataset\n", - "\n", - "datastore = workspace.get_default_datastore()\n", - "dataset = Dataset.File.from_files(path = [(datastore, 'mnist/')])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use the `register()` method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dataset = dataset.register(workspace = workspace,\n", - " name = 'mnist dataset',\n", - " description='training and test dataset',\n", - " create_new_version=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train on a remote cluster\n", - "\n", - "For this task, submit the job to the remote training cluster you set up earlier. To submit a job you:\n", - "* Create a directory\n", - "* Create a training script\n", - "* Create an estimator object\n", - "* Submit the job \n", - "\n", - "### Create a directory\n", - "\n", - "Create a directory to deliver the necessary code from your computer to the remote resource." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "script_folder = os.path.join(os.getcwd(), \"sklearn-mnist\")\n", - "os.makedirs(script_folder, exist_ok=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a training script\n", - "\n", - "To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in the directory you just created. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $script_folder/train.py\n", - "\n", - "import argparse\n", - "import os\n", - "import numpy as np\n", - "\n", - "from sklearn.linear_model import LogisticRegression\n", - "from sklearn.externals import joblib\n", - "\n", - "from azureml.core import Run, Dataset\n", - "from utils import load_data\n", - "from uuid import uuid4\n", - "\n", - "# let user feed in the regularization rate of the logistic regression model as an argument\n", - "parser = argparse.ArgumentParser()\n", - "parser.add_argument('--dataset-name', dest='ds_name', help='the name of dataset')\n", - "parser.add_argument('--regularization', type=float, dest='reg', default=0.01, help='regularization rate')\n", - "args = parser.parse_args()\n", - "\n", - "# get hold of the current run\n", - "run = Run.get_context()\n", - "\n", - "workspace = run.experiment.workspace\n", - "dataset_name = args.ds_name\n", - "dataset = Dataset.get_by_name(workspace=workspace, name=dataset_name)\n", - "\n", - "# create a folder on the compute that we will mount the dataset to\n", - "data_folder = '/tmp/mnist/{}'.format(uuid4())\n", - "os.makedirs(data_folder)\n", - "\n", - "with dataset.mount(data_folder):\n", - " import glob\n", - " X_train_path = glob.glob(os.path.join(data_folder, '**/train-images.gz'), recursive=True)[0]\n", - " X_test_path = glob.glob(os.path.join(data_folder, '**/test-images.gz'), recursive=True)[0]\n", - " y_train_path = glob.glob(os.path.join(data_folder, '**/train-labels.gz'), recursive=True)[0]\n", - " y_test_path = glob.glob(os.path.join(data_folder, '**/test-labels.gz'), recursive=True)[0]\n", - " # load train and test set into numpy arrays\n", - " # note we scale the pixel intensity values to 0-1 (by dividing it with 255.0) so the model can converge faster.\n", - " X_train = load_data(X_train_path, False) / 255.0\n", - " X_test = load_data(X_test_path, False) / 
255.0\n", - " y_train = load_data(y_train_path, True).reshape(-1)\n", - " y_test = load_data(y_test_path, True).reshape(-1)\n", - " print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\\n')\n", - "\n", - " print('Train a logistic regression model with regularization rate of', args.reg)\n", - " clf = LogisticRegression(C=1.0/args.reg, solver=\"liblinear\", multi_class=\"auto\", random_state=42)\n", - " clf.fit(X_train, y_train)\n", - "\n", - " print('Predict the test set')\n", - " y_hat = clf.predict(X_test)\n", - "\n", - " # calculate accuracy on the prediction\n", - " acc = np.average(y_hat == y_test)\n", - " print('Accuracy is', acc)\n", - "\n", - " run.log('regularization rate', float(args.reg))\n", - " run.log('accuracy', float(acc))\n", - "\n", - " os.makedirs('outputs', exist_ok=True)\n", - " # note: files saved in the outputs folder are automatically uploaded into the experiment record\n", - " joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Notice how the script gets data and saves models:\n", - "\n", - "+ The training script gets the MNIST dataset registered with the workspace through the Run object, then mounts the FileDataset to a target path (data_folder) on the compute and reads the files from there."
\n", - "`joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')`
\n", - "Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The file `utils.py` is referenced from the training script to load the dataset correctly. Copy this script into the script folder so that it can be accessed along with the training script on the remote resource." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "shutil.copy('utils.py', script_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create an estimator\n", - "\n", - "An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as a generic `Estimator`. Create an `SKLearn` estimator for the scikit-learn model by specifying:\n", - "\n", - "* The name of the estimator object, `est`\n", - "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", - "* The compute target. In this case you will use the AmlCompute you created\n", - "* The training script name, train.py\n", - "* Parameters required from the training script \n", - "\n", - "In this tutorial, this target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.environment import Environment\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "env = Environment('my_env')\n", - "cd = CondaDependencies.create(pip_packages=['azureml-sdk', 'pandas','scikit-learn','azureml-dataprep[pandas,fuse]==1.1.14'])\n", - "\n", - "env.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "configure estimator" - ] - }, - "outputs": [], - "source": [ - "from azureml.train.sklearn import SKLearn\n", - "\n", - "script_params = {\n", - " '--dataset-name': 'mnist dataset',\n", - " '--regularization': 0.5\n", - "}\n", - "\n", - "est = SKLearn(source_directory=script_folder,\n", - " script_params=script_params,\n", - " compute_target=compute_target,\n", - " environment_definition = env,\n", - " entry_script='train.py')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Submit the job to the cluster\n", - "\n", - "Run the experiment by submitting the estimator object. And you can navigate to Azure portal to monitor the run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "remote run", - "amlcompute", - "scikit-learn" - ] - }, - "outputs": [], - "source": [ - "run = exp.submit(config=est)\n", - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Since the call is asynchronous, it returns a **Preparing** or **Running** state as soon as the job is started.\n", - "\n", - "## Monitor a remote run\n", - "\n", - "In total, the first run takes **approximately 10 minutes**.\n", - "\n", - "Here is what's happening while you wait:\n", - "\n", - "- **Image creation**: A Docker image is created matching the Python environment specified by the estimator. 
The image is built and stored in the ACR (Azure Container Registry) associated with your workspace. Image creation and uploading takes **about 5 minutes**. \n", - "\n", - " This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs.\n", - "\n", - "- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\n", - "\n", - "- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the files in the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\n", - "\n", - "- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\n", - "\n", - "\n", - "You can check the progress of a running job in multiple ways. This tutorial uses a Jupyter widget as well as a `wait_for_completion` method. \n", - "\n", - "### Jupyter widget\n", - "\n", - "Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "use notebook widget" - ] - }, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(run).show()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "By the way, if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get log results upon completion\n", - "\n", - "Model training happens in the background. You can use `wait_for_completion` to block and wait until the model has completed training before running more code. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "remote run", - "amlcompute", - "scikit-learn" - ] - }, - "outputs": [], - "source": [ - "# specify show_output to True for a verbose log\n", - "run.wait_for_completion(show_output=True) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Display run results\n", - "\n", - "You now have a model trained on a remote cluster. Retrieve all the metrics logged during the run, including the accuracy of the model:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "get metrics" - ] - }, - "outputs": [], - "source": [ - "print(run.get_metrics())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register model\n", - "\n", - "The last step in the training script wrote the file `outputs/sklearn_mnist_model.pkl` in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.\n", - "\n", - "You can see files associated with that run." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "query history" - ] - }, - "outputs": [], - "source": [ - "print(run.get_file_names())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from history" - ] - }, - "outputs": [], - "source": [ - "# register model \n", - "model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')\n", - "print(model.name, model.id, model.version, sep='\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/datasets/datasets-tutorial/filedatasets-tutorial.png)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "roastala" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.4" - }, - "msauthor": "sihhu" - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/tabular-dataset-tutorial.ipynb b/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/tabular-dataset-tutorial.ipynb deleted file mode 100644 index 07d03723..00000000 --- a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/tabular-dataset-tutorial.ipynb +++ /dev/null @@ -1,312 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Tutorial: Learn how to use TabularDatasets in Azure Machine Learning" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this tutorial, you will learn how to use Azure Machine Learning Datasets to train a classification model with the Azure Machine Learning SDK for Python. 
You will:\n", - "\n", - "☑ Set up a Python environment and import packages\n", - "\n", - "☑ Load the Titanic data from your Azure Blob Storage. (The [original data](https://www.kaggle.com/c/titanic/data) can be found on Kaggle)\n", - "\n", - "☑ Create and register a TabularDataset in your workspace\n", - "\n", - "☑ Train a classification model using the TabularDataset" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Pre-requisites:\n", - "To create and work with datasets, you need:\n", - "* An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today.\n", - "* An [Azure Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace)\n", - "* The [Azure Machine Learning SDK for Python installed](https://docs.microsoft.com/python/api/overview/azure/ml/install?view=azure-ml-py), which includes the azureml-datasets package.\n", - "\n", - "Data and a `train.py` script to store in your Azure Blob storage account:\n", - " * [Titanic data](./train-dataset/Titanic.csv)\n", - " * [train.py](./train-dataset/train.py)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize a Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "from azureml.core import Workspace, Datastore, Dataset\n", - "\n", - "# Get existing workspace from config.json file in the same folder as the tutorial notebook\n", - "# You can download the config file from your workspace\n", - "workspace = Workspace.from_config()\n", - "print(workspace)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a TabularDataset\n", - "\n", - "Datasets 
are categorized into various types based on how users consume them in training. In this tutorial, you will create and use a TabularDataset in training. A TabularDataset represents data in a tabular format by parsing the provided file or list of files. A TabularDataset can be created from CSV, TSV, and Parquet files, SQL query results, etc. For the complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference). It provides you with the ability to materialize the data into a pandas DataFrame." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "By creating a dataset, you create a reference to the data source location, along with a copy of its metadata. The data remains in its existing location, so no extra storage cost is incurred.\n", - "\n", - "We will now upload the [Titanic data](./train-dataset/Titanic.csv) to the default datastore (blob) within your workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "datastore = workspace.get_default_datastore()\n", - "datastore.upload_files(files = ['./train-dataset/Titanic.csv'],\n", - " target_path = 'train-dataset/',\n", - " overwrite = True,\n", - " show_progress = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then we will create an unregistered TabularDataset pointing to the path in the datastore. Creating a Dataset from multiple paths is also supported. 
[Learn more](https://aka.ms/azureml/howto/createdatasets) " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'train-dataset/Titanic.csv')])\n", - "\n", - "# preview the first 3 rows of the dataset\n", - "dataset.take(3).to_pandas_dataframe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use the `register()` method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dataset = dataset.register(workspace = workspace,\n", - " name = 'titanic dataset',\n", - " description='training dataset')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create or Attach existing AmlCompute\n", - "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your training. In this tutorial, you create `AmlCompute` as your training compute resource.\n", - "\n", - "**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n", - "\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", - "\n", - "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"your-cluster-name\"\n", - "\n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = workspace.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", - "\n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 6)\n", - "\n", - " # Create the cluster.\n", - " compute_target = ComputeTarget.create(workspace, amlcompute_cluster_name, provisioning_config)\n", - "\n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", - "\n", - "# For a more detailed view of current AmlCompute status, use get_status()." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create an Experiment\n", - "**Experiment** is a logical container in an Azure ML Workspace. It hosts run records, which can include run metrics and output artifacts from your experiments."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "experiment_name = 'training-datasets'\n", - "experiment = Experiment(workspace = workspace, name = experiment_name)\n", - "project_folder = './train-dataset/'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Configure & Run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import RunConfiguration\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "import pkg_resources\n", - "\n", - "# create a new RunConfig object\n", - "conda_run_config = RunConfiguration(framework=\"python\")\n", - "\n", - "# Set compute target to AmlCompute\n", - "conda_run_config.target = compute_target\n", - "conda_run_config.environment.docker.enabled = True\n", - "conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", - "\n", - "dprep_dependency = 'azureml-dataprep==' + pkg_resources.get_distribution(\"azureml-dataprep\").version\n", - "\n", - "cd = CondaDependencies.create(pip_packages=['azureml-sdk', 'scikit-learn', 'pandas', dprep_dependency])\n", - "conda_run_config.environment.python.conda_dependencies = cd" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a new RunConfig object\n", - "run_config = RunConfiguration()\n", - "\n", - "run_config.environment.python.user_managed_dependencies = True\n", - "\n", - "from azureml.core import Run\n", - "from azureml.core import ScriptRunConfig\n", - "\n", - "src = ScriptRunConfig(source_directory=project_folder, \n", - " script='train.py', \n", - " run_config=conda_run_config) \n", - "run = experiment.submit(config=src)\n", - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, 
- "source": [ - "## View run history details" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You have now finished using a dataset from start to finish of your experiment!" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/datasets/datasets-tutorial/datasets-tutorial.png)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "cforbe" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - }, - "notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License." 
- }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/train-dataset/train.py b/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/train-dataset/train.py deleted file mode 100644 index 785c8e74..00000000 --- a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/train-dataset/train.py +++ /dev/null @@ -1,43 +0,0 @@ -import azureml.dataprep as dprep -import azureml.core -import pandas as pd -import logging -import os -import datetime -import shutil - -from azureml.core import Workspace, Datastore, Dataset, Experiment, Run -from sklearn.model_selection import train_test_split -from azureml.core.compute import ComputeTarget, AmlCompute -from azureml.core.compute_target import ComputeTargetException -from sklearn.tree import DecisionTreeClassifier -from sklearn.externals import joblib - -run = Run.get_context() -workspace = run.experiment.workspace - -dataset_name = 'titanic dataset' - -dataset = Dataset.get_by_name(workspace=workspace, name=dataset_name) -df = dataset.to_pandas_dataframe() - -x_col = ['Pclass', 'Sex', 'SibSp', 'Parch'] -y_col = ['Survived'] -x_df = df.loc[:, x_col] -y_df = df.loc[:, y_col] - -x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223) - -data = {"train": {"X": x_train, "y": y_train}, - - "test": {"X": x_test, "y": y_test}} - -clf = DecisionTreeClassifier().fit(data["train"]["X"], data["train"]["y"]) -model_file_name = 'decision_tree.pkl' - -print('Accuracy of Decision Tree classifier on training set: {:.2f}'.format(clf.score(x_train, y_train))) -print('Accuracy of Decision Tree classifier on test set: {:.2f}'.format(clf.score(x_test, y_test))) - -os.makedirs('./outputs', exist_ok=True) -with open(model_file_name, "wb") as file: - joblib.dump(value=clf, filename='outputs/' + model_file_name) diff --git a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/utils.py 
b/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/utils.py deleted file mode 100644 index 98170ada..00000000 --- a/how-to-use-azureml/work-with-data/datasets/datasets-tutorial/utils.py +++ /dev/null @@ -1,27 +0,0 @@ -# Copyright (c) Microsoft Corporation. All rights reserved. -# Licensed under the MIT License. - -import gzip -import numpy as np -import struct - - -# load compressed MNIST gz files and return numpy arrays -def load_data(filename, label=False): - with gzip.open(filename) as gz: - struct.unpack('I', gz.read(4)) - n_items = struct.unpack('>I', gz.read(4)) - if not label: - n_rows = struct.unpack('>I', gz.read(4))[0] - n_cols = struct.unpack('>I', gz.read(4))[0] - res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8) - res = res.reshape(n_items[0], n_rows * n_cols) - else: - res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8) - res = res.reshape(n_items[0], 1) - return res - - -# one-hot encode a 1-D array -def one_hot_encode(array, num_of_classes): - return np.eye(num_of_classes)[array.reshape(-1)] diff --git a/index.md b/index.md index 51fe0a68..cebdf4d4 100644 --- a/index.md +++ b/index.md @@ -9,19 +9,93 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an |Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags | |:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:| +| [Using Azure ML environments](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/using-environments/using-environments.ipynb) | Creating and registering environments | None | Local | None | None | None | + +| [Estimators in AML with hyperparameter tuning](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb) | Use the Estimator pattern in Azure Machine Learning SDK | None | AML Compute | None | None | None | 
## Tutorials |Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags | |:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:| -| :star:[Use pipelines for batch scoring](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/tutorial-pipeline-batch-scoring-classification.ipynb) | Batch scoring | None | AmlCompute | Published pipeline | Azure ML Pipelines | None | +| [Train within notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb) | Training and deploying a model from a notebook | Diabetes | Local | Azure Container Instance | None | None | + +| [Use MLflow with Azure Machine Learning for training and deployment](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb) | Use MLflow with Azure Machine Learning to train and deploy a PyTorch image classifier model | MNIST | AML Compute | Azure Container Instance | PyTorch | None | + +| :star:[Azure Machine Learning Pipeline with DataTransferStep](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb) | Demonstrates the use of DataTransferStep | Custom | ADF | None | Azure ML | None | + +| [Getting Started with Azure Machine Learning Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb) | Getting Started notebook for AML Pipelines | Custom | AML Compute | None | Azure ML | None | + +| [Azure Machine Learning Pipeline with
AzureBatchStep](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb) | Demonstrates the use of AzureBatchStep | Custom | Azure Batch | None | Azure ML | None | + +| [Azure Machine Learning Pipeline with EstimatorStep](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb) | Demonstrates the use of EstimatorStep | Custom | AML Compute | None | Azure ML | None | + +| :star:[How to use ModuleStep with AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-modulestep.ipynb) | Demonstrates the use of ModuleStep | Custom | AML Compute | None | Azure ML | None | + +| :star:[How to use Pipeline Drafts to create a Published Pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb) | Demonstrates the use of Pipeline Drafts | Custom | AML Compute | None | Azure ML | None | + +| :star:[Azure Machine Learning Pipeline with HyperDriveStep](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb) | Demonstrates the use of HyperDriveStep | Custom | AML Compute | None | Azure ML | None | + +| :star:[How to Publish a Pipeline and Invoke the REST endpoint](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb) | Demonstrates the use of Published Pipelines | Custom | AML Compute | None | Azure ML | None | + +| :star:[How to Setup a 
Schedule for a Published Pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb) | Demonstrates the use of Schedules for Published Pipelines | Custom | AML Compute | None | Azure ML | None | + +| [How to setup a versioned Pipeline Endpoint](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb) | Demonstrates the use of PipelineEndpoint to run a specific version of the Published Pipeline | Custom | AML Compute | None | Azure ML | None | + +| :star:[How to use DataPath as a PipelineParameter](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb) | Demonstrates the use of DataPath as a PipelineParameter | Custom | AML Compute | None | Azure ML | None | + +| [How to use AdlaStep with AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-adla-as-compute-target.ipynb) | Demonstrates the use of AdlaStep | Custom | Azure Data Lake Analytics | None | Azure ML | None | + +| :star:[How to use DatabricksStep with AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb) | Demonstrates the use of DatabricksStep | Custom | Azure Databricks | None | Azure ML, Azure Databricks | None | + +| :star:[How to use AutoMLStep with AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb) | Demonstrates 
the use of AutoMLStep | Custom | AML Compute | None | Automated Machine Learning | None | + +| :star:[Azure Machine Learning Pipelines with Data Dependency](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb) | Demonstrates how to construct a Pipeline with data dependency between steps | Custom | AML Compute | None | Azure ML | None | ## Training |Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags | |:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:| +| [Train a model with hyperparameter tuning](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb) | Train a Convolutional Neural Network (CNN) | MNIST | AML Compute | Azure Container Instance | Chainer | None | + +| [Distributed Training with Chainer](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/chainer/training/distributed-chainer/distributed-chainer.ipynb) | Use the Chainer estimator to perform distributed training | MNIST | AML Compute | None | Chainer | None | + +| [Training with hyperparameter tuning using PyTorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) | Train an image classification model using transfer learning with the PyTorch estimator | ImageNet | AML Compute | Azure Container Instance | PyTorch | None | + +| [Distributed PyTorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb) | Train a model 
using distributed training via Horovod | MNIST | AML Compute | None | PyTorch | None | + +| [Distributed training with PyTorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo/distributed-pytorch-with-nccl-gloo.ipynb) | Train a model using distributed training via NCCL/Gloo | MNIST | AML Compute | None | PyTorch | None | + +| [Training and hyperparameter tuning with Scikit-learn](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb) | Train a support vector machine (SVM) to perform classification | Iris | AML Compute | None | Scikit-learn | None | + +| [Training and hyperparameter tuning using the TensorFlow estimator](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) | Train a deep neural network | MNIST | AML Compute | Azure Container Instance | TensorFlow | None | + +| [Distributed training using TensorFlow with Horovod](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb) | Use the TensorFlow estimator to train a word2vec model | None | AML Compute | None | TensorFlow | None | + +| [Distributed TensorFlow with parameter server](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb) | Use the TensorFlow estimator to train a model using distributed training | MNIST | AML Compute | None | TensorFlow | None | + +| [Resuming a
model](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb) | Resume a model in TensorFlow from a previously submitted run | MNIST | AML Compute | None | TensorFlow | None | + +| [Training in Spark](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb) | Submitting a run on a Spark cluster | None | HDI cluster | None | PySpark | None | + +| [Train on Azure Machine Learning Compute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb) | Submit an Azure Machine Learning Compute run | Diabetes | AML Compute | None | None | None | + +| [Train on local compute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-local/train-on-local.ipynb) | Train a model locally | Diabetes | Local | None | None | None | + +| [Train in a remote Linux virtual machine](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) | Configure and execute a run | Diabetes | Data Science Virtual Machine | None | None | None | + +| [Using Tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb) | Export the run history as Tensorboard logs | None | None | None | TensorFlow | None | + +| [Train a DNN using hyperparameter tuning and deploying with Keras](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb) | Create a multi-class classifier | MNIST | AML Compute | Azure Container Instance |
TensorFlow | None | + +| [Managing your training runs](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/manage-runs/manage-runs.ipynb) | Monitor and complete runs | None | Local | None | None | None | + +| [Tensorboard integration with run history](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb) | Run a TensorFlow job and view its Tensorboard output live | None | Local, DSVM, AML Compute | None | TensorFlow | None | + +| [Use MLflow with AML for a local training run](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-local/train-local.ipynb) | Use MLflow tracking APIs together with Azure Machine Learning for storing your metrics and artifacts | Diabetes | Local | None | None | None | + +| [Use MLflow with AML for a remote training run](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb) | Use MLflow tracking APIs together with AML for storing your metrics and artifacts | Diabetes | AML Compute | None | None | None | @@ -30,177 +104,230 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an |Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags | |:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:| +| [Deploy a model as a web service using MLflow](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb) | Use MLflow with AML | Diabetes | None | Azure Container Instance | Scikit-learn | None | ## Other Notebooks |Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags | 
|:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:| -| [Logging APIs](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb) | Logging APIs and analyzing results | None | None | None | None | None | | [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) | | | | | | | + | [azure-ml-with-nvidia-rapids](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb) | | | | | | | + | [auto-ml-classification](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb) | | | | | | | + | [auto-ml-classification-bank-marketing](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-bank-marketing/auto-ml-classification-bank-marketing.ipynb) | | | | | | | + | [auto-ml-classification-credit-card-fraud](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb) | | | | | | | + | [auto-ml-classification-with-deployment](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb) | | | | | | | + | [auto-ml-classification-with-onnx](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-with-onnx/auto-ml-classification-with-onnx.ipynb) | | | | | | | + | 
[auto-ml-classification-with-whitelisting](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-with-whitelisting/auto-ml-classification-with-whitelisting.ipynb) | | | | | | | + | [auto-ml-dataset](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/dataset/auto-ml-dataset.ipynb) | | | | | | | + | [auto-ml-dataset-remote-execution](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/dataset-remote-execution/auto-ml-dataset-remote-execution.ipynb) | | | | | | | + | [auto-ml-exploring-previous-runs](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/exploring-previous-runs/auto-ml-exploring-previous-runs.ipynb) | | | | | | | + | [auto-ml-forecasting-bike-share](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb) | | | | | | | + | [auto-ml-forecasting-energy-demand](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb) | | | | | | | + | [auto-ml-forecasting-orange-juice-sales](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb) | | | | | | | + | [auto-ml-missing-data-blacklist-early-termination](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/missing-data-blacklist-early-termination/auto-ml-missing-data-blacklist-early-termination.ipynb) | | | | | | | + | 
[auto-ml-model-explanation](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/model-explanation/auto-ml-model-explanation.ipynb) | | | | | | | + +| [auto-ml-model-explanations-remote-compute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/model-explanation-remote-amlcompute/auto-ml-model-explanations-remote-compute.ipynb) | | | | | | | + | [auto-ml-regression](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb) | | | | | | | + | [auto-ml-regression-concrete-strength](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/regression-concrete-strength/auto-ml-regression-concrete-strength.ipynb) | | | | | | | + | [auto-ml-regression-hardware-performance](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/regression-hardware-performance/auto-ml-regression-hardware-performance.ipynb) | | | | | | | + | [auto-ml-remote-amlcompute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/remote-amlcompute/auto-ml-remote-amlcompute.ipynb) | | | | | | | + | [auto-ml-remote-amlcompute-with-onnx](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/remote-amlcompute-with-onnx/auto-ml-remote-amlcompute-with-onnx.ipynb) | | | | | | | + | [auto-ml-sample-weight](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/sample-weight/auto-ml-sample-weight.ipynb) | | | | | | | + | [auto-ml-sparse-data-train-test-split](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/sparse-data-train-test-split/auto-ml-sparse-data-train-test-split.ipynb) | | | | 
| | | + | [auto-ml-sql-energy-demand](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/sql-server/energy-demand/auto-ml-sql-energy-demand.ipynb) | | | | | | | + | [auto-ml-sql-setup](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/sql-server/setup/auto-ml-sql-setup.ipynb) | | | | | | | + | [auto-ml-subsampling-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/subsampling/auto-ml-subsampling-local.ipynb) | | | | | | | + | [build-model-run-history-03](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/build-model-run-history-03.ipynb) | | | | | | | + | [deploy-to-aci-04](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb) | | | | | | | -| [deploy-to-aks-existingimage-05](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.ipynb) | | | | | | | + +| [deploy-to-aks-05](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-05.ipynb) | | | | | | | + | [ingest-data-02](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/ingest-data-02.ipynb) | | | | | | | + | [installation-and-configuration-01](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/installation-and-configuration-01.ipynb) | | | | | | | + | [automl-databricks-local-01](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.ipynb) | | | | | | | + | 
[automl-databricks-local-with-deployment](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb) | | | | | | | + | [aml-pipelines-use-databricks-as-compute-target](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb) | | | | | | | -| [automl_hdi_local_classification](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-hdi/automl_hdi_local_classification.ipynb) | | | | | | | -| [model-register-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deploy-to-cloud/model-register-and-deploy.ipynb) | | | | | | | -| [register-model-deploy-local-advanced](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deploy-to-local/register-model-deploy-local-advanced.ipynb) | | | | | | | -| [register-model-deploy-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deploy-to-local/register-model-deploy-local.ipynb) | | | | | | | + | [accelerated-models-object-detection](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/accelerated-models/accelerated-models-object-detection.ipynb) | | | | | | | + | [accelerated-models-quickstart](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/accelerated-models/accelerated-models-quickstart.ipynb) | | | | | | | + | [accelerated-models-training](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/accelerated-models/accelerated-models-training.ipynb) | | | | | | | + | [model-register-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb) | | | | | | | + | 
[register-model-deploy-local-advanced](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb) | | | | | | | + | [register-model-deploy-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb) | | | | | | | + | [enable-app-insights-in-production-service](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) | | | | | | | + | [enable-data-collection-for-models-in-aks](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb) | | | | | | | + | [onnx-convert-aml-deploy-tinyyolo](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb) | | | | | | | + | [onnx-inference-facial-expression-recognition-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb) | | | | | | | + | [onnx-inference-mnist-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb) | | | | | | | + | [onnx-modelzoo-aml-deploy-resnet50](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb) | | | | | | | + | [onnx-train-pytorch-aml-deploy-mnist](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb) | | | | | | | + | 
[production-deploy-to-aks](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) | | | | | | |
-| [production-deploy-to-aks-gpu](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb) | | | | | | |
+ | [register-model-create-image-deploy-service](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb) | | | | | | |
+ | [explain-model-on-amlcompute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb) | | | | | | |
+ | [save-retrieve-explanations-run-history](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb) | | | | | | |
+ | [train-explain-model-locally-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb) | | | | | | |
+ | [train-explain-model-on-amlcompute-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb) | | | | | | |
+ | [advanced-feature-transformations-explain-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/tabular-data/advanced-feature-transformations-explain-local.ipynb) | | | | | | |
+ | [explain-binary-classification-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/tabular-data/explain-binary-classification-local.ipynb) | | | | | | |
+ | [explain-multiclass-classification-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/tabular-data/explain-multiclass-classification-local.ipynb) | | | | | | |
+ | [explain-regression-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/tabular-data/explain-regression-local.ipynb) | | | | | | |
+ | [simple-feature-transformations-explain-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/tabular-data/simple-feature-transformations-explain-local.ipynb) | | | | | | |
-| [aml-pipelines-data-transfer](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb) | | | | | | |
-| [aml-pipelines-getting-started](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb) | | | | | | |
-| [aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb) | | | | | | |
-| [aml-pipelines-how-to-use-estimatorstep](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-estimatorstep.ipynb) | | | | | | |
-| [aml-pipelines-how-to-use-pipeline-drafts](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb) | | | | | | |
-| [aml-pipelines-parameter-tuning-with-hyperdrive](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb) | | | | | | |
-| [aml-pipelines-publish-and-run-using-rest-endpoint](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb) | | | | | | |
-| [aml-pipelines-setup-schedule-for-a-published-pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb) | | | | | | |
-| [aml-pipelines-setup-versioned-pipeline-endpoints](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb) | | | | | | |
-| [aml-pipelines-showcasing-datapath-and-pipelineparameter](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb) | | | | | | |
-| [aml-pipelines-use-adla-as-compute-target](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-adla-as-compute-target.ipynb) | | | | | | |
-| [aml-pipelines-use-databricks-as-compute-target](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb) | | | | | | |
-| [aml-pipelines-with-automated-machine-learning-step](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb) | | | | | | |
-| [aml-pipelines-with-data-dependency-steps](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb) | | | | | | |
+ | [nyc-taxi-data-regression-model-building](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb) | | | | | | |
+ | [pipeline-batch-scoring](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb) | | | | | | |
+ | [pipeline-style-transfer](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb) | | | | | | |
+ | [authentication-in-azureml](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/manage-azureml-service/authentication-in-azureml/authentication-in-azureml.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-chainer](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb) | | | | | | |
-| [distributed-chainer](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/chainer/training/distributed-chainer/distributed-chainer.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-pytorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) | | | | | | |
-| [distributed-pytorch-with-horovod](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb) | | | | | | |
-| [distributed-pytorch-with-nccl-gloo](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo/distributed-pytorch-with-nccl-gloo.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-sklearn](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-tensorflow](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) | | | | | | |
-| [distributed-tensorflow-with-horovod](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb) | | | | | | |
-| [distributed-tensorflow-with-parameter-server](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb) | | | | | | |
-| [train-tensorflow-resume-training](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb) | | | | | | |
+ | [azure-ml-datadrift](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb) | | | | | | |
-| [manage-runs](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/manage-runs/manage-runs.ipynb) | | | | | | |
-| [tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb) | | | | | | |
-| [deploy-model](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb) | | | | | | |
-| [train-and-deploy-pytorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb) | | | | | | |
-| [train-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-local/train-local.ipynb) | | | | | | |
-| [train-remote](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb) | | | | | | |
-| [logging-api](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/logging-api/logging-api.ipynb) | | | | | | |
-| [manage-runs](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/manage-runs/manage-runs.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-sklearn](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb) | | | | | | |
-| [train-in-spark](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb) | | | | | | |
-| [train-on-amlcompute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb) | | | | | | |
-| [train-on-local](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-local/train-on-local.ipynb) | | | | | | |
-| [train-on-remote-vm](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) | | | | | | |
-| [train-within-notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb) | | | | | | |
-| [using-environments](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/using-environments/using-environments.ipynb) | | | | | | |
-| [distributed-chainer](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/distributed-chainer/distributed-chainer.ipynb) | | | | | | |
+
+| [Logging APIs](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb) | Logging APIs and analyzing results | | None | None | None | None |
+ | [distributed-cntk-with-custom-docker](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb) | | | | | | |
-| [distributed-pytorch-with-horovod](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb) | | | | | | |
-| [distributed-tensorflow-with-horovod](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb) | | | | | | |
-| [distributed-tensorflow-with-parameter-server](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-parameter-server/distributed-tensorflow-with-parameter-server.ipynb) | | | | | | |
-| [export-run-history-to-tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb) | | | | | | |
-| [how-to-use-estimator](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb) | | | | | | |
+ | [notebook_example](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/notebook_example.ipynb) | | | | | | |
-| [tensorboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/tensorboard/tensorboard.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-chainer](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-keras](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-pytorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) | | | | | | |
-| [train-hyperparameter-tune-deploy-with-tensorflow](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) | | | | | | |
-| [train-tensorflow-resume-training](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb) | | | | | | |
+ | [new-york-taxi](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/case-studies/new-york-taxi/new-york-taxi.ipynb) | | | | | | |
+ | [new-york-taxi_scale-out](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/case-studies/new-york-taxi/new-york-taxi_scale-out.ipynb) | | | | | | |
+ | [add-column-using-expression](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb) | | | | | | |
+ | [append-columns-and-rows](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/append-columns-and-rows.ipynb) | | | | | | |
+ | [assertions](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/assertions.ipynb) | | | | | | |
+ | [auto-read-file](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/auto-read-file.ipynb) | | | | | | |
+ | [cache](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/cache.ipynb) | | | | | | |
+ | [column-manipulations](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/column-manipulations.ipynb) | | | | | | |
+ | [column-type-transforms](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/column-type-transforms.ipynb) | | | | | | |
+ | [custom-python-transforms](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/custom-python-transforms.ipynb) | | | | | | |
+ | [data-ingestion](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/data-ingestion.ipynb) | | | | | | |
+ | [data-profile](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/data-profile.ipynb) | | | | | | |
+ | [datastore](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/datastore.ipynb) | | | | | | |
+ | [derive-column-by-example](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/derive-column-by-example.ipynb) | | | | | | |
+ | [external-references](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/external-references.ipynb) | | | | | | |
+ | [filtering](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/filtering.ipynb) | | | | | | |
+ | [fuzzy-group](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/fuzzy-group.ipynb) | | | | | | |
+ | [impute-missing-values](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/impute-missing-values.ipynb) | | | | | | |
+ | [join](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/join.ipynb) | | | | | | |
+ | [label-encoder](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/label-encoder.ipynb) | | | | | | |
+ | [min-max-scaler](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/min-max-scaler.ipynb) | | | | | | |
+ | [one-hot-encoder](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/one-hot-encoder.ipynb) | | | | | | |
+ | [open-save-dataflows](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/open-save-dataflows.ipynb) | | | | | | |
+ | [quantile-transformation](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/quantile-transformation.ipynb) | | | | | | |
+ | [random-split](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/random-split.ipynb) | | | | | | |
+ | [replace-datasource-replace-reference](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/replace-datasource-replace-reference.ipynb) | | | | | | |
+ | [replace-fill-error](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/replace-fill-error.ipynb) | | | | | | |
+ | [secrets](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/secrets.ipynb) | | | | | | |
+ | [semantic-types](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/semantic-types.ipynb) | | | | | | |
+ | [split-column-by-example](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/split-column-by-example.ipynb) | | | | | | |
+ | [subsetting-sampling](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/subsetting-sampling.ipynb) | | | | | | |
+ | [summarize](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/summarize.ipynb) | | | | | | |
+ | [working-with-file-streams](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/working-with-file-streams.ipynb) | | | | | | |
+ | [writing-data](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/how-to-guides/writing-data.ipynb) | | | | | | |
+ | [getting-started](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/dataprep/tutorials/getting-started/getting-started.ipynb) | | | | | | |
-| [datasets-diff](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets/datasets-diff/datasets-diff.ipynb) | | | | | | |
-| [file-dataset-img-classification](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets/datasets-tutorial/file-dataset-img-classification.ipynb) | | | | | | |
-| [tabular-dataset-tutorial](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets/datasets-tutorial/tabular-dataset-tutorial.ipynb) | | | | | | |
+ | [tabular-timeseries-dataset-filtering](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets/datasets-tutorial/tabular-timeseries-dataset-filtering.ipynb) | | | | | | |
+ | [train-with-datasets](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets/datasets-tutorial/train-with-datasets.ipynb) | | | | | | |
+ | [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master//setup-environment/configuration.ipynb) | | | | | | |
+ | [img-classification-part1-training](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/img-classification-part1-training.ipynb) | | | | | | |
+ | [img-classification-part2-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/img-classification-part2-deploy.ipynb) | | | | | | |
+ | [regression-automated-ml](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/regression-automated-ml.ipynb) | | | | | | |
+ | [tutorial-1st-experiment-sdk-train](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/tutorial-1st-experiment-sdk-train.ipynb) | | | | | | |
+| [tutorial-pipeline-batch-scoring-classification](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/tutorial-pipeline-batch-scoring-classification.ipynb) | | | | | | |
+
diff --git a/setup-environment/configuration.ipynb b/setup-environment/configuration.ipynb
index 1425c490..7a9c6570 100644
--- a/setup-environment/configuration.ipynb
+++ b/setup-environment/configuration.ipynb
@@ -102,7 +102,7 @@
    "source": [
     "import azureml.core\n",
     "\n",
-    "print(\"This notebook was created using version 1.0.62 of the Azure ML SDK\")\n",
+    "print(\"This notebook was created using version 1.0.65 of the Azure ML SDK\")\n",
     "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
    ]
   },
diff --git a/tutorials/img-classification-part1-training.yml b/tutorials/img-classification-part1-training.yml
index c76cf572..3e92b4d1 100644
--- a/tutorials/img-classification-part1-training.yml
+++ b/tutorials/img-classification-part1-training.yml
@@ -5,3 +5,4 @@ dependencies:
   - azureml-widgets
   - matplotlib
   - sklearn
+  - pandas
diff --git a/tutorials/imgs/flow2.png b/tutorials/imgs/flow2.png
deleted file mode 100644
index f5c8968b..00000000
Binary files a/tutorials/imgs/flow2.png and /dev/null differ
diff --git a/tutorials/tutorial-pipeline-batch-scoring-classification.ipynb b/tutorials/tutorial-pipeline-batch-scoring-classification.ipynb
index 72762315..360bdd4a 100644
--- a/tutorials/tutorial-pipeline-batch-scoring-classification.ipynb
+++ b/tutorials/tutorial-pipeline-batch-scoring-classification.ipynb
@@ -691,29 +691,6 @@
     "name": "sanpil"
    }
   ],
-  "friendly_name": "Use pipelines for batch scoring",
-  "exclude_from_index": false,
-  "index_order": 1,
-  "category": "tutorial",
-  "star_tag": [
-   "featured"
-  ],
-  "task": "Batch scoring",
-  "datasets": [
-   "None"
-  ],
-  "compute": [
-   "AmlCompute"
-  ],
-  "deployment": [
-   "Published pipeline"
-  ],
-  "framework": [
-   "Azure ML Pipelines"
-  ],
-  "tags": [
-   "None"
-  ],
   "kernelspec": {
    "display_name": "Python 3.6",
    "language": "python",
diff --git a/tutorials/tutorial-pipeline-batch-scoring-classification.yml b/tutorials/tutorial-pipeline-batch-scoring-classification.yml
index 8a0dc35b..bb640269 100644
--- a/tutorials/tutorial-pipeline-batch-scoring-classification.yml
+++ b/tutorials/tutorial-pipeline-batch-scoring-classification.yml
@@ -2,8 +2,8 @@ name: tutorial-pipeline-batch-scoring-classification
 dependencies:
 - pip:
   - azureml-sdk
-  - azureml-widgets
   - azureml-pipeline-core
   - azureml-pipeline-steps
   - pandas
   - requests
+  - azureml-widgets