Merge pull request #1980 from Man-MSFT/mafong/fairness-dep

Remove fairness notebooks
2025-12-20 09:37:04 -05:00 · 2025-03-14 09:42:02 -07:00 · 2025-03-13 14:25:59 -07:00 · 2024-12-16 08:44:42 -08:00 · 2024-12-13 15:51:10 -08:00 · 2024-12-13 08:50:52 -08:00
409 changed files with 133192 additions and 28307 deletions
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -0,0 +1,9 @@
 # Microsoft Open Source Code of Conduct
 This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
 Resources:
 - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
 - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
 - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
--- a/21
+++ b/21
@@ -0,0 +1,21 @@
    MIT License
    Copyright (c) Microsoft Corporation. All rights reserved.
    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to deal
    in the Software without restriction, including without limitation the rights
    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software is
    furnished to do so, subject to the following conditions:
    The above copyright notice and this permission notice shall be included in all
    copies or substantial portions of the Software.
    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    SOFTWARE.
--- a/Licenses/sdk-license/LICENSE
+++ b/Licenses/sdk-license/LICENSE
@@ -0,0 +1,14 @@
 This software is made available to you on the condition that you agree to
 [your agreement][1] governing your use of Azure.
 If you do not have an existing agreement governing your use of Azure, you agree that 
 your agreement governing use of Azure is the [Microsoft Online Subscription Agreement][2]
 (which incorporates the [Online Services Terms][3]).
 By using the software you agree to these terms. This software may collect data
 that is transmitted to Microsoft. Please see the [Microsoft Privacy Statement][4]
 to learn more about how Microsoft processes personal data.
 [1]: https://azure.microsoft.com/en-us/support/legal/
 [2]: https://azure.microsoft.com/en-us/support/legal/subscription-agreement/
 [3]: http://www.microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=46
 [4]: http://go.microsoft.com/fwlink/?LinkId=248681 
--- a/Licenses/sdk-preview-license/LICENSE
+++ b/Licenses/sdk-preview-license/LICENSE
@@ -0,0 +1,15 @@
 This Preview is made available to you on the condition that you agree to the
 [Supplemental Terms of Use for Microsoft Azure Previews][1], which supplement
 [your agreement][2] governing your use of Azure.
 If you do not have an existing agreement governing your use of Azure, you agree that 
 your agreement governing use of Azure is the [Microsoft Online Subscription Agreement][3]
 (which incorporates the [Online Services Terms][4]).
 By using the Preview you agree to these terms. This Preview may collect data
 that is transmitted to Microsoft. Please see the [Microsoft Privacy Statement][5]
 to learn more about how Microsoft processes personal data.
 [1]: https://azure.microsoft.com/en-us/support/legal/preview-supplemental-terms/
 [2]: https://azure.microsoft.com/en-us/support/legal/
 [3]: https://azure.microsoft.com/en-us/support/legal/subscription-agreement/
 [4]: http://www.microsoftvolumelicensing.com/DocumentSearch.aspx?Mode=3&DocumentTypeId=46
 [5]: http://go.microsoft.com/fwlink/?LinkId=248681 
--- a/NBSETUP.md
+++ b/NBSETUP.md
@@ -0,0 +1,95 @@
 # Set up your notebook environment for Azure Machine Learning
 To run the notebooks in this repository, use one of the following options.
 ## **Option 1: Use Azure Notebooks**
 Azure Notebooks is a hosted Jupyter-based notebook service in the Azure cloud. The Azure Machine Learning Python SDK is pre-installed in the Azure Notebooks `Python 3.6` kernel.
 1. [![Azure Notebooks](https://notebooks.azure.com/launch.png)](https://aka.ms/aml-clone-azure-notebooks)
 [Import sample notebooks](https://aka.ms/aml-clone-azure-notebooks) into Azure Notebooks
 1. Follow the instructions in the [Configuration](configuration.ipynb) notebook to create and connect to a workspace
 1. Open one of the sample notebooks
    **Make sure the Azure Notebook kernel is set to `Python 3.6`** when you open a notebook by choosing Kernel > Change Kernel > Python 3.6 from the menus.
 ## **Option 2: Use your own notebook server**
 ### Quick installation
 We recommend you create a Python virtual environment ([Miniconda](https://conda.io/miniconda.html) preferred but [virtualenv](https://virtualenv.pypa.io/en/latest/) works too) and install the SDK in it.
 ```sh
 # install just the base SDK
 pip install azureml-sdk
 # clone the sample repository
 git clone https://github.com/Azure/MachineLearningNotebooks.git
 # below steps are optional
 # install the base SDK, Jupyter notebook server and tensorboard
 pip install azureml-sdk[notebooks,tensorboard]
 # install model explainability component
 pip install azureml-sdk[interpret]
 # install automated ml components
 pip install azureml-sdk[automl]
 # install experimental features (not ready for production use)
 pip install azureml-sdk[contrib]
 ```
 Note the _extras_ (the keywords inside the square brackets) can be combined. For example:
 ```sh
 # install base SDK, Jupyter notebook and automated ml components
 pip install azureml-sdk[notebooks,automl]
 ```
 ### Full instructions
 [Install the Azure Machine Learning SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
 Please make sure you start with the [Configuration](configuration.ipynb) notebook to create and connect to a workspace.
 ### Video walkthrough:
 [!VIDEO https://youtu.be/VIsXeTuW3FU]
 ## **Option 3: Use Docker**
 You need to have the Docker engine installed locally and running. Open a command-line window and run the following commands.
 __Note:__ We use version `1.0.10` below as an example, but you can replace it with any available version number you like.
 ```sh
 # clone the sample repository
 git clone https://github.com/Azure/MachineLearningNotebooks.git
 # change the current directory to the folder
 # where the Dockerfile for the specific SDK version is located.
 cd MachineLearningNotebooks/Dockerfiles/1.0.10
 # build a Docker image with a name (azuremlsdk, for example)
 # and a version number tag (1.0.10 for example).
 # this can take several minutes depending on your computer speed and network bandwidth.
 docker build . -t azuremlsdk:1.0.10
 # launch the built Docker container which also automatically starts
 # a Jupyter server instance listening on port 8887 of the host machine
 docker run -it -p 8887:8887 azuremlsdk:1.0.10
 ```
 Now you can point your browser to http://localhost:8887. We recommend that you start from the `configuration.ipynb` notebook at the root directory.
 If you need additional Azure ML SDK components, you can either modify the Dockerfiles to add extra steps before you build the Docker images, or install the components from the command line in the live container after you build the image. For example:
 ```sh
 # install the core SDK and automated ml components
 pip install azureml-sdk[automl]
 # install the core SDK and model explainability component
 pip install azureml-sdk[interpret]
 # install the core SDK and experimental components
 pip install azureml-sdk[contrib]
 ```
--- a/README.md
+++ b/README.md
@@ -0,0 +1,43 @@
 # Azure Machine Learning Python SDK notebooks
 ### **With the introduction of AzureML SDK v2, this samples repository for the v1 SDK is now deprecated and will not be monitored or updated. Users are encouraged to visit the [v2 SDK samples repository](https://github.com/Azure/azureml-examples) instead for up-to-date and enhanced examples of how to build, train, and deploy machine learning models with AzureML's newest features.**
 Welcome to the Azure Machine Learning Python SDK notebooks repository!
 ## Getting started
 These notebooks are recommended for use in an Azure Machine Learning [Compute Instance](https://docs.microsoft.com/azure/machine-learning/concept-compute-instance), where you can run them without any additional setup.
 However, the notebooks can be run in any development environment with the correct `azureml` packages installed.
 Install the `azureml.core` Python package:
 ```sh
 pip install azureml-core
 ```
 Install additional packages as needed:
 ```sh
 pip install azureml-mlflow
 pip install azureml-dataset-runtime
 pip install azureml-automl-runtime
 pip install azureml-pipeline
 pip install azureml-pipeline-steps
 ...
 ```
 We recommend starting with one of the [quickstarts](tutorials/compute-instance-quickstarts).
 ## Contributing
 This repository is a push-only mirror. Pull requests are ignored.
 ## Code of Conduct
 This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). Please see the [code of conduct](CODE_OF_CONDUCT.md) for details.
 ## Reference
 - [Documentation](https://docs.microsoft.com/azure/machine-learning)
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -0,0 +1,41 @@
 <!-- BEGIN MICROSOFT SECURITY.MD V0.0.7 BLOCK -->
 ## Security
 Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).
 If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below.
 ## Reporting Security Issues
 **Please do not report security vulnerabilities through public GitHub issues.**
 Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report).
 If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com).  If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).
 You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc). 
 Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
  * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.)
  * Full paths of source file(s) related to the manifestation of the issue
  * The location of the affected source code (tag/branch/commit or direct URL)
  * Any special configuration required to reproduce the issue
  * Step-by-step instructions to reproduce the issue
  * Proof-of-concept or exploit code (if possible)
  * Impact of the issue, including how an attacker might exploit the issue
 This information will help us triage your report more quickly.
 If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs.
 ## Preferred Languages
 We prefer all communications to be in English.
 ## Policy
 Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd).
 <!-- END MICROSOFT SECURITY.MD BLOCK -->
--- a/configuration.ipynb
+++ b/configuration.ipynb
@@ -103,7 +103,7 @@
      "source": [
        "import azureml.core\n",
        "\n",
-    "print(\"This notebook was created using version AZUREML-SDK-VERSION of the Azure ML SDK\")\n",
+        "print(\"This notebook was created using version 1.59.0 of the Azure ML SDK\")\n",
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
@@ -329,7 +329,7 @@
        "    print(\"Creating new gpu-cluster\")\n",
        "    \n",
        "    # Specify the configuration for the new cluster\n",
-    "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
+        "    compute_config = AmlCompute.provisioning_configuration(vm_size=\"Standard_NC6s_v3\",\n",
        "                                                           min_nodes=0,\n",
        "                                                           max_nodes=4)\n",
        "    # Create the cluster with the specified name and configuration\n",
@@ -367,9 +367,9 @@
      }
    ],
    "kernelspec": {
-   "display_name": "Python 3.6",
+      "display_name": "Python 3.8 - AzureML",
      "language": "python",
-   "name": "python36"
+      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
--- a/contrib/RAPIDS/README.md
+++ b/contrib/RAPIDS/README.md
@@ -0,0 +1,305 @@
 ## How to use the RAPIDS on AzureML materials
 ### Setting up requirements
 The material requires the Azure ML SDK and a Jupyter Notebook server for interactive execution. Please refer to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up"). Follow the instructions under **Local Computer**, and make sure to run the last step, <span style="font-family: Courier New;">pip install \<new package\></span>, with <span style="font-family: Courier New;">new package = progressbar2 (pip install progressbar2)</span>.
 After following the directions, the user should have a conda environment (<span style="font-family: Courier New;">myenv</span>) that can be activated in an Anaconda prompt.
 The user also needs an Azure subscription with a Machine Learning Services quota of 24 nodes or more in the desired region (enough to select a vmSize with 4 GPUs, as used in the notebook) for the desired VM family ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)). The specific vmSize to be used within the chosen family must also be whitelisted for Machine Learning Services usage.
 &nbsp;  
 ### Getting and running the material 
 Clone the AzureML Notebooks repository from GitHub by running the following command in a local directory:
 * C:\local_directory>git clone https://github.com/Azure/MachineLearningNotebooks.git
 At a conda prompt, navigate to the local directory, activate the conda environment (<span style="font-family: Courier New;">myenv</span>) where the Azure ML SDK was installed, and launch Jupyter Notebook.
 * (<span style="font-family: Courier New;">myenv</span>) C:\local_directory>jupyter notebook
 From the resulting browser page at http://localhost:8888/tree, navigate to the master notebook:
 * http://localhost:8888/tree/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
 &nbsp;  
 The following notebook will appear:  
 ![](imgs/NotebookHome.png)
 &nbsp;  
 ### Master Jupyter Notebook
 The notebook can be executed interactively, step by step, by pressing the Run button (circled in red in the image above).
 The first few steps import the necessary AzureML libraries. If you experience any errors, please refer back to the [environment setup](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up") instructions.
 &nbsp;  
 #### Setting up a Workspace
 The following step gathers the information needed to set up a workspace in which to execute the RAPIDS script. This needs to be done only once, and can be skipped entirely if you already have a workspace set up in the Azure Portal:
 ![](imgs/WorkSpaceSetUp.png)
 Be sure to set the correct values for the subscription\_id, resource\_group, workspace\_name, and region before executing the step. For example:
    import os
    subscription_id = os.environ.get("SUBSCRIPTION_ID", "1358e503-xxxx-4043-xxxx-65b83xxxx32d")
    resource_group = os.environ.get("RESOURCE_GROUP", "AML-Rapids-Testing")
    workspace_name = os.environ.get("WORKSPACE_NAME", "AML_Rapids_Tester")
    workspace_region = os.environ.get("WORKSPACE_REGION", "West US 2")
 &nbsp;  
 The resource\_group and workspace\_name can take any value; the region should match the region for which the subscription has the required Machine Learning Services node quota.
 The first time the code is executed, it redirects to the Azure Portal to validate subscription credentials. After the workspace is created, its details are stored in a local file so that this step can be skipped subsequently. The next step simply loads the saved workspace:
 ![](imgs/saved_workspace.png)
 Once a workspace has been created, the user can skip its creation and jump directly to this step. The configuration file resides in:
 * C:\local_directory\\MachineLearningNotebooks\contrib\RAPIDS\aml_config\config.json
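For reference, the saved file is plain JSON and can be checked before loading the workspace. A minimal sketch, assuming the file holds the `subscription_id`, `resource_group` and `workspace_name` keys; `read_workspace_config` is a hypothetical helper, not part of the Azure ML SDK:

```python
import json
from pathlib import Path

# Keys assumed to be present in the saved workspace configuration file.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path):
    """Hypothetical helper: load a saved config.json and check the expected keys.

    A key that is absent or has an empty value is reported as missing.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    return config
```

With the SDK installed, `Workspace.from_config(path=...)` is the usual way to load the workspace from this file; the helper above only makes a missing or incomplete file fail early with a clear message.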
 &nbsp;  
 #### Creating an AML Compute Target 
 The following step creates an AML Compute Target:
 ![](imgs/target_creation.png)
 The vm\_size parameter of AmlCompute.provisioning\_configuration() has to be a size from one of the VM families ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)), which provide the P100, P40 or V100 GPUs supported by RAPIDS. In this particular case, a Standard\_NC24s\_v2 was used.
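A quick pre-flight check can catch an unsupported size before provisioning starts. A minimal sketch; `is_rapids_vm_size` is a hypothetical helper, and the regular expression is an assumption based on Azure's size-naming pattern for these families, so confirm it against the sizes actually offered in your region:

```python
import re

# Assumed naming patterns for the families listed above:
#   NC_v2 / NC_v3: e.g. Standard_NC6s_v2, Standard_NC24rs_v3
#   ND:            e.g. Standard_ND6s, Standard_ND24rs
#   ND_v2:         e.g. Standard_ND40rs_v2
_RAPIDS_VM_PATTERN = re.compile(
    r"^Standard_(NC\d+r?s_v[23]|ND\d+r?s(_v2)?)$", re.IGNORECASE
)


def is_rapids_vm_size(vm_size: str) -> bool:
    """Return True if vm_size matches the NC_v2, NC_v3, ND or ND_v2 naming pattern."""
    return bool(_RAPIDS_VM_PATTERN.match(vm_size.strip()))
```

The check is case-insensitive, so both `Standard_NC24s_v2` and `Standard_NC24s_V2` pass, while the original `Standard_NC6` (no `s_v2`/`s_v3` suffix) does not.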
 &nbsp;  
 If running the step produces an error like the following:
 ![](imgs/targeterror1.png)
 This indicates that although the subscription has a node quota for VMs of that family, it does not have a node quota for Machine Learning Services for that family.
 You will need to request a node quota increase for that family in that region for **Machine Learning Services**.
 &nbsp;  
 Another possible error is the following: 
 ![](imgs/targeterror2.png)
 This indicates that the specified vmSize has not been whitelisted for usage on Machine Learning Services, and a request to do so should be filed.
 A successful creation of the compute target produces output like the following:
 ![](imgs/targetsuccess.png)
 &nbsp;  
 #### RAPIDS script uploading and viewing
 The next step copies the RAPIDS script, process_data.py, which is a slightly modified implementation of the [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb), into a script processing folder and displays its contents. (The script is discussed in detail in a later section.)
 If the user wants to use a different RAPIDS script, the references to the <span style="font-family: Courier New;">process_data.py</span> script have to be changed.
 ![](imgs/scriptuploading.png)
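Staging the script is ordinary file handling. A minimal sketch, assuming a hypothetical `stage_script` helper (the notebook's own cell appears only in the image above); the folder it fills is the kind of directory later passed to ScriptRunConfig as the source directory:

```python
import shutil
from pathlib import Path


def stage_script(script_path, script_folder):
    """Hypothetical helper: copy a script into script_folder and return its text.

    The folder is created if needed; the returned text can be printed so the
    user can review the script, as the notebook step does.
    """
    folder = Path(script_folder)
    folder.mkdir(parents=True, exist_ok=True)
    copied = Path(shutil.copy(script_path, folder))
    return copied.read_text(encoding="utf-8")
```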
 &nbsp;  
 #### Data Uploading
 The RAPIDS script loads and extracts features from Fannie Mae’s Mortgage Dataset to train an XGBoost prediction model. The script uses two years of data.
 The next few steps download and decompress the data, which is then made available to the script as an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data).
 &nbsp;  
 The following functions are used to download and decompress the input data:
 ![](imgs/dcf1.png)
 ![](imgs/dcf2.png)
 ![](imgs/dcf3.png)
 ![](imgs/dcf4.png)
 &nbsp;  
 The next step uses those functions to download this file locally:
 http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/mortgage_2000-2001.tgz
 and to decompress it into the local folder .\mortgage_2000-2001.
 The step takes several minutes; the intermediate outputs provide progress indicators.
 ![](imgs/downamddecom.png)
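The same download-and-extract flow can be written with the Python standard library alone. A minimal sketch; `download_file` and `extract_tgz` are hypothetical helpers, not the notebook functions shown in the images above, and the safer `filter="data"` extraction option is used only on Python versions that provide it:

```python
import tarfile
import urllib.request
from pathlib import Path

# URL quoted in this walkthrough; its continued availability is not guaranteed.
MORTGAGE_URL = ("http://rapidsai-data.s3-website.us-east-2.amazonaws.com/"
                "notebook-mortgage-data/mortgage_2000-2001.tgz")


def download_file(url, dest_path):
    """Download url to dest_path, skipping the download if the file already exists."""
    dest = Path(dest_path)
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def extract_tgz(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive into dest_dir; return the member names."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Reject absolute paths, links outside dest_dir, device files, etc.
            archive.extractall(dest_dir, filter="data")
        else:
            archive.extractall(dest_dir)
    return names
```

For example, `extract_tgz(download_file(MORTGAGE_URL, "mortgage_2000-2001.tgz"), ".")` would repeat the step, provided the URL is still reachable; the archive's internal layout decides where the files land, so compare the result with the expected structure.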
 &nbsp;  
 The decompressed data should have the following structure:
 * .\mortgage_2000-2001\acq\Acquisition_\<year\>Q\<num\>.txt
 * .\mortgage_2000-2001\perf\Performance_\<year\>Q\<num\>.txt
 * .\mortgage_2000-2001\names.csv
 The data is divided into partitions that roughly correspond to the quarters of each year. RAPIDS supports multi-node, multi-GPU deployments, enabling scaling up and out to much larger datasets. The user will be able to verify that the number of partitions the script can process increases with the number of GPUs used. The RAPIDS script is implemented for single-machine scenarios; an example supporting multiple nodes will be published later.
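Because each partition is one quarter, the files to process can be enumerated from the folder layout above. A minimal sketch, assuming exactly one acquisition file and one performance file per quarter, named as listed; `list_partitions` is a hypothetical helper, and its *end_year* parameter mirrors the script argument of the same name described later:

```python
import re
from pathlib import Path

# Matches names such as Acquisition_2000Q1.txt and Performance_2000Q1.txt.
_PARTITION_RE = re.compile(r"^(Acquisition|Performance)_(\d{4})Q([1-4])\.txt$")


def list_partitions(root, end_year):
    """Return sorted (year, quarter, acq_path, perf_path) tuples up to end_year.

    Quarters missing either the acquisition or the performance file are skipped.
    """
    found = {}
    for sub in ("acq", "perf"):
        folder = Path(root) / sub
        if not folder.is_dir():
            continue
        for path in folder.iterdir():
            match = _PARTITION_RE.match(path.name)
            if match:
                kind, year, quarter = match.group(1), int(match.group(2)), int(match.group(3))
                found.setdefault((year, quarter), {})[kind] = path
    return [
        (year, quarter, files["Acquisition"], files["Performance"])
        for (year, quarter), files in sorted(found.items())
        if year <= end_year and "Acquisition" in files and "Performance" in files
    ]
```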
 &nbsp;  
 The next step uploads the data into the [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) under the reference <span style="font-family: Courier New;">fileroot = mortgage_2000-2001</span>.
 The step takes several minutes to upload the data; the output provides a progress indicator.
 ![](imgs/datastore.png)
 Once the data has been loaded into the Azure Machine Learning Datastore, in subsequent runs the user can comment out the ds.upload line and simply refer to the <span style="font-family: Courier New;">mortgage_2000-2001</span> datastore reference.
 &nbsp;  
 #### Setting up required libraries and environment to run RAPIDS code
 There are two options for setting up the environment to run RAPIDS code. The following steps show how to use a prebuilt conda environment. A recommended alternative is to specify a base Docker image and package dependencies; sample code for that is in the notebook.
 ![](imgs/install2.png)
 &nbsp;  
 #### Wrapper function to submit the RAPIDS script as an Azure Machine Learning experiment
 The next step defines a wrapper function used to run the RAPIDS script with different arguments. It takes the following arguments: <span style="font-family: Times New Roman;">*cpu\_training*</span>, a flag that indicates whether the run should use the CPU only; <span style="font-family: Times New Roman;">*gpu\_count*</span>, the number of GPUs to use when GPUs are enabled; and *part\_count*, the number of data partitions to use.
 ![](imgs/wrapper.png)
 &nbsp;  
 The core of the function configures the run by instantiating a ScriptRunConfig object, which defines the source_directory of the script to be executed, the name of the script, and the arguments to be passed to the script.
 In addition to the wrapper function arguments, two other arguments are passed: <span style="font-family: Times New Roman;">*data\_dir*</span>, the directory where the data is stored, and <span style="font-family: Times New Roman;">*end_year*</span>, the last year from which partitions are used.
 As mentioned earlier, the size of the data that can be processed increases with the number of GPUs. In the function, the dictionary <span style="font-family: Times New Roman;">*max\_gpu\_count\_data\_partition_mapping*</span> maps each number of GPUs to the maximum number of partitions that we empirically found the system can handle. The function issues a warning when the number of partitions exceeds that maximum for the given number of GPUs; the script is still executed, but the user should expect it to fail with an out-of-memory error.
 If the user wants to use a different RAPIDS script, the reference to the process_data.py script has to be changed.
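The warn-but-continue behavior can be sketched as follows. `check_partition_count` is a hypothetical helper, and its `max_partitions_by_gpu` argument stands in for `max_gpu_count_data_partition_mapping`; the empirically measured limits are not reproduced here, so supply your own values:

```python
import warnings


def check_partition_count(gpu_count, part_count, max_partitions_by_gpu):
    """Warn, but do not stop, when part_count exceeds the limit for gpu_count GPUs.

    max_partitions_by_gpu maps a GPU count to the largest partition count known
    to fit in memory. GPU counts with no entry (for example, CPU-only runs) are
    not checked. Returns True when the request is within the known limit.
    """
    limit = max_partitions_by_gpu.get(gpu_count)
    if limit is not None and part_count > limit:
        warnings.warn(
            f"{part_count} partitions exceed the maximum of {limit} for "
            f"{gpu_count} GPU(s); expect the run to fail with an out-of-memory error."
        )
        return False
    return True
```

Returning a flag rather than raising keeps the behavior described above: the caller can still submit the run and let it fail.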
 &nbsp;  
 #### Submitting Experiments
 We are ready to submit experiments: launching the RAPIDS script with different sets of parameters.
 &nbsp;  
 The next two steps submit experiments under different conditions.
 ![](imgs/submission1.png)
 &nbsp;  
 The user can set num\_gpu to any value from 1 up to the number of GPUs supported by the chosen vmSize. part\_count can take any value from 1 to 11, but if it exceeds the maximum for num\_gpu, the run will fail.
 &nbsp;  
 If the experiment is successfully submitted, it is placed in a queue for processing, its status appears as Queued, and output like the following is displayed:
 ![](imgs/queue.png)
 &nbsp;  
 When the experiment starts running, its status changes to Running and the output changes to something like this:
 ![](imgs/running.png)
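These status changes can also be followed programmatically. In the SDK, `Run.get_status()` returns the current status and `Run.wait_for_completion()` blocks until the run ends; the polling logic itself can be sketched as below, assuming any zero-argument `get_status` callable (such as `run.get_status`). Treating `Canceled` as a final status is an assumption alongside the `Completed` and `Failed` statuses shown in this walkthrough:

```python
import time

# Completed and Failed appear in this walkthrough; Canceled is an assumption.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(get_status, poll_seconds=30.0, timeout_seconds=None,
                             sleep=time.sleep):
    """Poll get_status() until it returns a terminal status.

    Returns the sequence of distinct statuses observed (e.g. Queued, Running,
    Completed). Raises TimeoutError if timeout_seconds elapses first.
    """
    history = []
    waited = 0.0
    while True:
        status = get_status()
        if not history or history[-1] != status:
            history.append(status)
        if status in TERMINAL_STATUSES:
            return history
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError(f"run still {status!r} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so the loop can be exercised without real waiting.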
 &nbsp;  
 #### Reproducing the performance gains plot from the blog post
 When the run has finished successfully, its status appears as Completed and the output changes to something like this:
 &nbsp; 
 ![](imgs/completed.png)
 This is the output for an experiment run with three partitions and one GPU. Notice that the reported processing time is 49.16 seconds, just as depicted in the performance gains plot in the blog post.
 &nbsp;  
 ![](imgs/2GPUs.png)
 This output corresponds to a run with three partitions and two GPUs. Notice that the reported processing time is 37.50 seconds, just as depicted in the performance gains plot in the blog post.
 &nbsp;  
 ![](imgs/3GPUs.png)
 This output corresponds to an experiment run with three partitions and three GPUs. Notice that the reported processing time is 24.40 seconds, just as depicted in the performance gains plot in the blog post.
 &nbsp;  
 ![](imgs/4gpus.png)
 This output corresponds to an experiment run with three partitions and four GPUs. Notice that the reported processing time is 23.33 seconds, just as depicted in the performance gains plot in the blog post.
 &nbsp;  
 ![](imgs/CPUBase.png)
 This output corresponds to an experiment run with three partitions using only the CPU. Notice that the reported processing time is 9 minutes and 1.21 seconds, or 541.21 seconds, just as depicted in the performance gains plot in the blog post.
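The reported times make it easy to compute the speedup over the CPU-only baseline. A short worked example that uses only the numbers quoted above:

```python
# Processing times in seconds, as reported above for three partitions.
times_seconds = {
    "CPU only": 9 * 60 + 1.21,  # 9 minutes 1.21 seconds = 541.21 s
    "1 GPU": 49.16,
    "2 GPUs": 37.50,
    "3 GPUs": 24.40,
    "4 GPUs": 23.33,
}

cpu_time = times_seconds["CPU only"]
# Speedup = CPU-only time divided by the GPU run's time.
speedups = {name: cpu_time / t for name, t in times_seconds.items() if name != "CPU only"}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.1f}x faster than CPU only")
```

This gives roughly 11x with one GPU and about 23x with four GPUs; for three partitions, going from three to four GPUs adds comparatively little (24.40 s to 23.33 s).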
 &nbsp;  
 ![](imgs/OOM.png)
 This output corresponds to an experiment run with nine partitions and four GPUs. Notice that the notebook issues a warning that the number of partitions exceeds the maximum the system can handle with that many GPUs; the run ends up failing, so its status is Failed.
 &nbsp;  
 #### Freeing Resources
 In the last step, the notebook deletes the compute target. (This step is optional, especially if min_nodes for the cluster is set to 0, in which case the cluster scales down to 0 nodes when it is not in use.)
 ![](imgs/clusterdelete.png)
 &nbsp;  
 ### RAPIDS Script
 The Master Notebook runs experiments by launching a RAPIDS script with different sets of parameters. This section analyzes the RAPIDS script, process_data.py in the material.
 The script first imports all the necessary libraries and parses the arguments passed by the Master Notebook.
 Then all the internal functions used by the script are defined.
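Argument parsing at the top of the script can be sketched with `argparse`. The flag names and defaults below are assumptions chosen to mirror the wrapper-function arguments described earlier (*data_dir*, *end_year*, *cpu_training*, the GPU count and the partition count); the actual script may spell or default them differently:

```python
import argparse


def parse_args(argv=None):
    """Hypothetical argument parser for a script such as process_data.py."""
    parser = argparse.ArgumentParser(description="RAPIDS ETL and XGBoost training")
    parser.add_argument("--data_dir", required=True,
                        help="folder (or mounted datastore path) holding the mortgage data")
    parser.add_argument("--num_gpu", type=int, default=1,
                        help="number of GPUs, one Dask worker per GPU")
    parser.add_argument("--part_count", type=int, default=1,
                        help="number of data partitions to process")
    parser.add_argument("--end_year", type=int, default=2001,
                        help="last year whose partitions are used")
    parser.add_argument("--cpu_training", action="store_true",
                        help="run ETL and training on the CPU only")
    return parser.parse_args(argv)
```

On the notebook side, the same flags would appear as alternating names and values in the `arguments` list given to ScriptRunConfig.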
 &nbsp;  
 #### Wrapper Auxiliary Functions:
 The functions below are wrappers for a configuration module for librmm, the RAPIDS Memory Manager Python interface:
 ![](imgs/wap1.png)![](imgs/wap2.png)
 &nbsp;  
 A couple of other functions are wrappers for submitting jobs to the Dask client:
 ![](imgs/wap3.png)
 ![](imgs/wap4.png)
 &nbsp;  
 #### Data Loading Functions:
 The data is loaded using the following three functions:
 ![](imgs/DLF1.png)![](imgs/DLF2.png)![](imgs/DLF3.png)
 All three functions use the library function cudf.read_csv(), the cuDF counterpart of the well-known pandas read_csv() function.
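Without a GPU, the loading step can be approximated on the CPU. A minimal standard-library sketch, assuming headerless, pipe-delimited (`|`) files and a caller-supplied list of column names; with pandas or cuDF, the comparable call would be `read_csv(path, sep="|", names=columns)`. The helper name `read_pipe_delimited` is hypothetical:

```python
import csv


def read_pipe_delimited(path, columns):
    """Read a headerless '|'-delimited text file into a list of dicts keyed by columns."""
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        for record in csv.reader(handle, delimiter="|"):
            if not record:
                continue  # skip blank lines
            if len(record) != len(columns):
                raise ValueError(f"expected {len(columns)} fields, got {len(record)}: {record}")
            rows.append(dict(zip(columns, record)))
    return rows
```

Values stay strings here; cuDF and pandas infer or accept explicit dtypes, which is one reason the GPU path is both faster and more convenient.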
 &nbsp;  
 #### Data Transformation and Feature Extraction Functions:
 The raw data is transformed and processed to extract features by joining, slicing, grouping, aggregating, factorizing, and otherwise manipulating the original dataframes, just as is done with pandas. The following functions in the script are used for that purpose:
 ![](imgs/fef1.png)![](imgs/fef2.png)![](imgs/fef3.png)![](imgs/fef4.png)![](imgs/fef5.png)
 ![](imgs/fef6.png)![](imgs/fef7.png)![](imgs/fef8.png)![](imgs/fef9.png)
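Each of these steps follows the familiar group, aggregate and join pattern. A deliberately small, CPU-only sketch of one such step using the standard library; the column names `loan_id` and `delinquency_status` and the worst-delinquency-per-loan feature are illustrative assumptions, not the script's actual features:

```python
from collections import defaultdict


def max_delinquency_by_loan(performance_rows):
    """Group performance rows by loan_id and keep the worst (maximum) delinquency status."""
    worst = defaultdict(int)
    for row in performance_rows:
        loan_id = row["loan_id"]
        worst[loan_id] = max(worst[loan_id], int(row["delinquency_status"]))
    return dict(worst)


def join_features(acquisition_rows, worst_delinquency):
    """Left-join the per-loan aggregate onto acquisition rows; loans without history get 0."""
    return [
        {**row, "max_delinquency": worst_delinquency.get(row["loan_id"], 0)}
        for row in acquisition_rows
    ]
```

In cuDF or pandas the same step would be a `groupby(...).max()` followed by a left `merge`, executed on whole columns at once.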
 &nbsp;  
 #### Main() Function
 The previous functions are used in the Main function to accomplish several steps: setting up the Dask client, performing all ETL operations, and setting up and training an XGBoost model. The function also assigns the data that each Dask worker needs to process.
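The exact policy for dividing partitions among Dask workers is not spelled out here. A minimal sketch of one simple policy, round-robin assignment with one worker per GPU; both the policy and the `assign_partitions` helper are assumptions:

```python
def assign_partitions(partitions, worker_count):
    """Distribute partitions round-robin across worker_count workers.

    Returns one list per worker; per-worker loads differ by at most one partition.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    assignment = [[] for _ in range(worker_count)]
    for index, partition in enumerate(partitions):
        assignment[index % worker_count].append(partition)
    return assignment
```

With more workers than partitions, some workers receive no data under this scheme.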
 &nbsp;  
 ##### Setting Up the Dask Client:
 The following lines:
 ![](imgs/daskini.png)
 &nbsp;  
 Initialize and set up a Dask client with a number of workers corresponding to the number of GPUs used in the run. A successful setup results in the following output:
 ![](imgs/daskoutput.png)
 ##### All ETL functions are called through process\_quarter\_gpu, with one call per data partition
 ![](imgs/ETL.png)
 &nbsp;  
 ##### Concatenating the data assigned to each Dask worker
 The partitions assigned to each worker are concatenated and set up for training.
 ![](imgs/Dask2.png)
 &nbsp;  
 ##### Setting Training Parameters
 The parameters used for the training of a gradient boosted decision tree model are set up in the following code block:
 ![](imgs/PArameters.png)
 Notice how the parameters are modified when using the CPU-only mode.
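The CPU/GPU switch usually comes down to the tree-construction method. A minimal sketch, assuming XGBoost's `tree_method` parameter is what changes (`gpu_hist` for GPU training, `hist` for CPU training); `training_params` is a hypothetical helper and the other values are placeholders, not the script's settings. In XGBoost 2.0 and later, GPU training is instead requested with `tree_method="hist"` plus `device="cuda"`:

```python
def training_params(cpu_training, max_depth=8):
    """Build an XGBoost parameter dict, switching the tree method for CPU-only runs.

    max_depth and eta are placeholder values for illustration only.
    """
    params = {
        "max_depth": max_depth,  # placeholder
        "eta": 0.1,              # placeholder learning rate
    }
    # Histogram-based tree construction on the CPU, or its GPU variant.
    params["tree_method"] = "hist" if cpu_training else "gpu_hist"
    return params
```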
 &nbsp;  
 ##### Launching the training of a gradient boosted decision tree model using XGBoost
 ![](imgs/training.png)
 The outputs of the script can be observed in the master notebook as the script is executed.
--- a/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
+++ b/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
@@ -0,0 +1,547 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids/azure-ml-with-nvidia-rapids.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# NVIDIA RAPIDS in Azure Machine Learning"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL and GPU-capable ML algorithms in RAPIDS, data preparation and model training can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train models in Azure.\n",
        " \n",
        "In this notebook, we will do the following:\n",
        " \n",
        "* Create an Azure Machine Learning Workspace\n",
        "* Create an AMLCompute target\n",
        "* Use a script to process our data and train a model\n",
        "* Obtain the data required to run this sample\n",
        "* Create an AML run configuration to launch a machine learning job\n",
        "* Run the script to prepare data for training and train the model\n",
        " \n",
        "Prerequisites:\n",
        "* An Azure subscription to create a Machine Learning Workspace\n",
        "* Familiarity with the Azure ML SDK (refer to [notebook samples](https://github.com/Azure/MachineLearningNotebooks))\n",
        "* A Jupyter notebook environment with the Azure Machine Learning SDK installed. Refer to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Verify that the Azure ML SDK is installed"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import azureml.core\n",
        "print(\"SDK version:\", azureml.core.VERSION)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "from azureml.core import Workspace, Experiment\n",
        "from azureml.core.conda_dependencies import CondaDependencies\n",
        "from azureml.core.compute import AmlCompute, ComputeTarget\n",
        "from azureml.data.data_reference import DataReference\n",
        "from azureml.core.runconfig import RunConfiguration\n",
        "from azureml.core import ScriptRunConfig\n",
        "from azureml.widgets import RunDetails"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create Azure ML Workspace"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The following step is optional: if you already have a workspace, skip it and move on to the next step to load the existing workspace.\n",
        " \n",
        "<font color='red'>Important</font>: before executing the code cell below, set the correct values for `subscription_id`, `resource_group`, `workspace_name` and `workspace_region`, either through the `SUBSCRIPTION_ID`, `RESOURCE_GROUP`, `WORKSPACE_NAME` and `WORKSPACE_REGION` environment variables or by replacing the placeholder defaults."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"<subscription_id>\")\n",
        "resource_group = os.environ.get(\"RESOURCE_GROUP\", \"<resource_group>\")\n",
        "workspace_name = os.environ.get(\"WORKSPACE_NAME\", \"<workspace_name>\")\n",
        "workspace_region = os.environ.get(\"WORKSPACE_REGION\", \"<region>\")\n",
        "\n",
        "ws = Workspace.create(workspace_name, subscription_id=subscription_id, resource_group=resource_group, location=workspace_region)\n",
        "\n",
        "# write config to a local directory for future use\n",
        "ws.write_config()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load existing Workspace"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# if a locally-saved configuration file for the workspace is not available, use the following to load workspace\n",
        "# ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)\n",
        "\n",
        "print('Workspace name: ' + ws.name, \n",
        "      'Azure region: ' + ws.location, \n",
        "      'Subscription id: ' + ws.subscription_id, \n",
        "      'Resource group: ' + ws.resource_group, sep = '\\n')\n",
        "\n",
        "scripts_folder = \"scripts_folder\"\n",
        "\n",
        "if not os.path.isdir(scripts_folder):\n",
        "    os.mkdir(scripts_folder)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create AML Compute Target"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because NVIDIA RAPIDS requires Pascal-generation or newer GPUs (such as the P40, P100 or V100), you need to create the compute target from one of the [NC_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview) virtual machine families in Azure, which are provisioned with these GPUs.\n",
        " \n",
        "Pick one of the supported VM SKUs based on the number of GPUs you want to use for ETL and training in RAPIDS.\n",
        " \n",
        "The script in this notebook is implemented for single-machine scenarios. An example supporting multiple nodes will be published later."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "gpu_cluster_name = \"gpucluster\"\n",
        "\n",
        "if gpu_cluster_name in ws.compute_targets:\n",
        "    gpu_cluster = ws.compute_targets[gpu_cluster_name]\n",
        "    if gpu_cluster and type(gpu_cluster) is AmlCompute:\n",
        "        print('Found compute target. Will use {0} '.format(gpu_cluster_name))\n",
        "else:\n",
        "    print(\"creating new cluster\")\n",
        "    # vm_size parameter below could be modified to one of the RAPIDS-supported VM types\n",
        "    provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"Standard_NC6s_v3\", min_nodes=1, max_nodes = 1)\n",
        "\n",
        "    # create the cluster\n",
        "    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n",
        "    gpu_cluster.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Script to process data and train model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# copy process_data.py into the script folder\n",
        "import shutil\n",
        "shutil.copy('./process_data.py', os.path.join(scripts_folder, 'process_data.py'))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Data required to run this sample"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample uses [Fannie Mae's Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Once you obtain access to the data, you will need to make this data available in an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data), for use in this sample. The following code shows how to do that."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Downloading Data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import tarfile\n",
        "import hashlib\n",
        "from urllib.request import urlretrieve\n",
        "\n",
        "def validate_downloaded_data(path):\n",
        "    # the decompressed data should contain names.csv, 8 acquisition files and 11 performance files\n",
        "    acq_dir = os.path.join(path, 'acq')\n",
        "    perf_dir = os.path.join(path, 'perf')\n",
        "    if (os.path.isfile(os.path.join(path, 'names.csv'))\n",
        "            and os.path.isdir(acq_dir) and len(os.listdir(acq_dir)) == 8\n",
        "            and os.path.isdir(perf_dir) and len(os.listdir(perf_dir)) == 11):\n",
        "        print(\"Data has been downloaded and decompressed at: {0}\".format(path))\n",
        "        return True\n",
        "    print(\"Data has not been downloaded and decompressed\")\n",
        "    return False\n",
        "\n",
        "def show_progress(count, block_size, total_size):\n",
        "    # reporthook for urlretrieve: print the percentage downloaded so far on a single line\n",
        "    if total_size > 0:\n",
        "        percent = min(100, count * block_size * 100 // total_size)\n",
        "        print(\"\\r...{0}%\".format(percent), end='')\n",
        "\n",
        "def file_md5(filename, chunk_size=1 << 20):\n",
        "    # hash the archive in 1 MiB chunks so the whole file is never held in memory\n",
        "    md5 = hashlib.md5()\n",
        "    with open(filename, 'rb') as f:\n",
        "        for chunk in iter(lambda: f.read(chunk_size), b''):\n",
        "            md5.update(chunk)\n",
        "    return md5.hexdigest()\n",
        "\n",
        "def download_file(fileroot):\n",
        "    filename = fileroot + '.tgz'\n",
        "    # download again if the archive is missing or its checksum does not match\n",
        "    if not os.path.exists(filename) or file_md5(filename) != '82dd47135053303e9526c2d5c43befd5':\n",
        "        url_format = 'http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/{0}.tgz'\n",
        "        url = url_format.format(fileroot)\n",
        "        print(\"...Downloading file: {0}\".format(filename))\n",
        "        urlretrieve(url, filename, reporthook=show_progress)\n",
        "        print(\"\\n...File: {0} finished downloading\".format(filename))\n",
        "    else:\n",
        "        print(\"...File: {0} has been downloaded already\".format(filename))\n",
        "    return filename\n",
        "\n",
        "def decompress_file(filename, path):\n",
        "    print(\"...Getting information from {0} about files to decompress\".format(filename))\n",
        "    with tarfile.open(filename) as tar:\n",
        "        members = tar.getmembers()\n",
        "        tar.extractall(path=path, members=members)\n",
        "    print(\"...All {0} files have been decompressed\".format(len(members)))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "fileroot = 'mortgage_2000-2001'\n",
        "path = os.path.join('.', fileroot)\n",
        "pbar = None\n",
        "processed = 0\n",
        "\n",
        "if(not validate_downloaded_data(path)):\n",
        "    print(\"Downloading and Decompressing Input Data\")\n",
        "    filename = download_file(fileroot)\n",
        "    decompress_file(filename,path)\n",
        "    print(\"Input Data has been Downloaded and Decompressed\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Uploading Data to Workspace"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ds = ws.get_default_datastore()\n",
        "\n",
        "# download and uncompress data in a local directory before uploading to data store\n",
        "# directory specified in src_dir parameter below should have the acq, perf directories with data and names.csv file\n",
        "\n",
        "# ---->>>> UNCOMMENT THE BELOW LINE TO UPLOAD YOUR DATA IF NOT DONE SO ALREADY <<<<----\n",
        "# ds.upload(src_dir=path, target_path=fileroot, overwrite=True, show_progress=True)\n",
        "\n",
        "# data already uploaded to the datastore\n",
        "data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore=fileroot)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create AML run configuration to launch a machine learning job"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "RunConfiguration is used to submit jobs to the Azure Machine Learning service. When creating a RunConfiguration for a job, you can either\n",
        "1. specify a Docker image with a prebuilt conda environment and use it without any modifications to run the job, or\n",
        "2. specify a Docker image as the base image and conda or pip packages as dependencies, so that AML builds a new Docker image with a conda environment containing those dependencies to use in the job\n",
        "\n",
        "The second option is the recommended option in AML.\n",
        "The following steps have code for both options. You can pick the one that is more appropriate for your requirements."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Specify conda dependencies"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The following code installs RAPIDS with conda on top of a CUDA base image. The `rapids.yml` file contains the list of packages needed to run this tutorial. **NOTE:** The initial build of the image might take up to 20 minutes, as the service needs to build and cache the new image; once the image is built, subsequent runs use the cached image and the overhead is minimal."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n",
        "run_config = RunConfiguration(conda_dependencies=cd)\n",
        "run_config.framework = 'python'\n",
        "run_config.target = gpu_cluster_name\n",
        "run_config.environment.docker.enabled = True\n",
        "run_config.environment.docker.gpu_support = True\n",
        "run_config.environment.docker.base_image = \"mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu20.04\"\n",
        "run_config.environment.spark.precache_packages = False\n",
        "run_config.data_references={'data':data_ref.to_config()}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Using Docker"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Alternatively, you can run the job in a prebuilt RAPIDS Docker image and use the conda environment it already contains."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# run_config = RunConfiguration()\n",
        "# run_config.framework = 'python'\n",
        "# run_config.environment.python.user_managed_dependencies = True\n",
        "# run_config.environment.python.interpreter_path = '/conda/envs/rapids/bin/python'\n",
        "# run_config.target = gpu_cluster_name\n",
        "# run_config.environment.docker.enabled = True\n",
        "# run_config.environment.docker.gpu_support = True\n",
        "# run_config.environment.docker.base_image = \"rapidsai/rapidsai:cuda9.2-runtime-ubuntu20.04\"\n",
        "# # run_config.environment.docker.base_image_registry.address = '<registry_url>' # not required if the base_image is in Docker hub\n",
        "# # run_config.environment.docker.base_image_registry.username = '<user_name>' # needed only for private images\n",
        "# # run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
        "# run_config.environment.spark.precache_packages = False\n",
        "# run_config.data_references={'data':data_ref.to_config()}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Wrapper function to submit Azure Machine Learning experiment"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# cpu_training: if True, GPUs are used *only* for ETL and training runs on the CPU\n",
        "# gpu_count: number of GPUs in the VM to use for ETL and, when cpu_training is False, for training as well\n",
        "# part_count: number of performance-data partitions (files) to process\n",
        "def run_rapids_experiment(cpu_training, gpu_count, part_count):\n",
        "    # any value between 1 and 4 is allowed here, depending on the type of VMs available in gpu_cluster\n",
        "    if gpu_count not in [1, 2, 3, 4]:\n",
        "        raise ValueError('Value specified for the number of GPUs to use {0} is invalid'.format(gpu_count))\n",
        "\n",
        "    if part_count > 11:\n",
        "        print(\"Warning: Maximum number of partitions available is 11\")\n",
        "        part_count = 11\n",
        "\n",
        "    # following data partition mapping is empirical (specific to GPUs used and current data partitioning scheme) and may need to be tweaked\n",
        "    max_gpu_count_data_partition_mapping = {1: 3, 2: 4, 3: 6, 4: 8}\n",
        "\n",
        "    if part_count > max_gpu_count_data_partition_mapping[gpu_count]:\n",
        "        print(\"Warning: too many partitions for the number of GPUs; the job may exceed GPU memory\")\n",
        "        \n",
        "    end_year = 2000\n",
        "    \n",
        "    if part_count > 4:\n",
        "        end_year = 2001 # use more data with more GPUs\n",
        "\n",
        "    src = ScriptRunConfig(source_directory=scripts_folder, \n",
        "                          script='process_data.py', \n",
        "                          arguments = ['--num_gpu', gpu_count, '--data_dir', str(data_ref),\n",
        "                                      '--part_count', part_count, '--end_year', end_year,\n",
        "                                      '--cpu_predictor', cpu_training\n",
        "                                      ],\n",
        "                          run_config=run_config\n",
        "                         )\n",
        "\n",
        "    exp = Experiment(ws, 'rapidstest')\n",
        "    run = exp.submit(config=src)\n",
        "    RunDetails(run).show()\n",
        "    return run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Submit experiment (ETL & training on GPU)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "cpu_predictor = False\n",
        "# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
        "num_gpu = 1\n",
        "data_part_count = 1\n",
        "# use GPU for both ETL and training\n",
        "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Submit experiment (ETL on GPU, training on CPU)\n",
        "\n",
        "To compare the performance of GPU-accelerated RAPIDS training with CPU-only training, set `cpu_predictor` to `True` and rerun the experiment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "cpu_predictor = True\n",
        "# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
        "num_gpu = 1\n",
        "data_part_count = 1\n",
        "# train using CPU, use GPU for ETL\n",
        "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Delete cluster"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# delete the cluster\n",
        "# gpu_cluster.delete()"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "ksivas"
      }
    ],
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
 }
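The partition rules in the notebook's `run_rapids_experiment` wrapper can be isolated into a pure-Python helper that is testable without submitting a job. This is a minimal sketch: `plan_partitions` is a hypothetical name, the constants mirror the notebook (11 performance files, the empirical mapping `{1: 3, 2: 4, 3: 6, 4: 8}`), and unlike the notebook, which only prints a warning, it clamps `part_count` to the per-GPU limit. Whether clamping or failing fast is preferable is a judgment call.

```python
from typing import Tuple

# Number of performance files in the 2000-2001 mortgage sample (mirrors the notebook)
MAX_PARTITIONS = 11

# Empirical GPU count -> maximum partitions mapping copied from the notebook
MAX_PARTS_PER_GPU_COUNT = {1: 3, 2: 4, 3: 6, 4: 8}


def plan_partitions(gpu_count: int, part_count: int) -> Tuple[int, int]:
    """Validate inputs and return (part_count, end_year) for a run.

    Hypothetical helper: clamps instead of only warning.
    """
    if gpu_count not in MAX_PARTS_PER_GPU_COUNT:
        raise ValueError(f"Invalid number of GPUs: {gpu_count}")
    # First cap at the number of partitions that actually exist
    part_count = min(part_count, MAX_PARTITIONS)
    # Then apply the empirical per-GPU memory limit
    part_count = min(part_count, MAX_PARTS_PER_GPU_COUNT[gpu_count])
    # More than 4 partitions also pulls in the 2001 data, as in the notebook
    end_year = 2001 if part_count > 4 else 2000
    return part_count, end_year
```

For example, `plan_partitions(4, 20)` first caps the request at 11 partitions and then at the 4-GPU limit of 8, which also switches `end_year` to 2001.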
--- a/contrib/RAPIDS/imgs/2GPUs.png
+++ b/contrib/RAPIDS/imgs/2GPUs.png
--- a/contrib/RAPIDS/imgs/3GPUs.png
+++ b/contrib/RAPIDS/imgs/3GPUs.png
--- a/contrib/RAPIDS/imgs/4gpus.png
+++ b/contrib/RAPIDS/imgs/4gpus.png
--- a/contrib/RAPIDS/imgs/CPUBase.png
+++ b/contrib/RAPIDS/imgs/CPUBase.png
--- a/contrib/RAPIDS/imgs/DLF1.png
+++ b/contrib/RAPIDS/imgs/DLF1.png
--- a/contrib/RAPIDS/imgs/DLF2.png
+++ b/contrib/RAPIDS/imgs/DLF2.png
--- a/contrib/RAPIDS/imgs/DLF3.png
+++ b/contrib/RAPIDS/imgs/DLF3.png
--- a/contrib/RAPIDS/imgs/Dask2.png
+++ b/contrib/RAPIDS/imgs/Dask2.png
--- a/contrib/RAPIDS/imgs/ETL.png
+++ b/contrib/RAPIDS/imgs/ETL.png
--- a/contrib/RAPIDS/imgs/NotebookHome.png
+++ b/contrib/RAPIDS/imgs/NotebookHome.png
--- a/contrib/RAPIDS/imgs/OOM.png
+++ b/contrib/RAPIDS/imgs/OOM.png
--- a/contrib/RAPIDS/imgs/PArameters.png
+++ b/contrib/RAPIDS/imgs/PArameters.png
--- a/contrib/RAPIDS/imgs/WorkSpaceSetUp.png
+++ b/contrib/RAPIDS/imgs/WorkSpaceSetUp.png
--- a/contrib/RAPIDS/imgs/clusterdelete.png
+++ b/contrib/RAPIDS/imgs/clusterdelete.png
--- a/contrib/RAPIDS/imgs/completed.png
+++ b/contrib/RAPIDS/imgs/completed.png
--- a/contrib/RAPIDS/imgs/daskini.png
+++ b/contrib/RAPIDS/imgs/daskini.png
--- a/contrib/RAPIDS/imgs/daskoutput.png
+++ b/contrib/RAPIDS/imgs/daskoutput.png
--- a/contrib/RAPIDS/imgs/datastore.png
+++ b/contrib/RAPIDS/imgs/datastore.png
--- a/contrib/RAPIDS/imgs/dcf1.png
+++ b/contrib/RAPIDS/imgs/dcf1.png
--- a/contrib/RAPIDS/imgs/dcf2.png
+++ b/contrib/RAPIDS/imgs/dcf2.png
--- a/contrib/RAPIDS/imgs/dcf3.png
+++ b/contrib/RAPIDS/imgs/dcf3.png
--- a/contrib/RAPIDS/imgs/dcf4.png
+++ b/contrib/RAPIDS/imgs/dcf4.png
--- a/contrib/RAPIDS/imgs/downamddecom.png
+++ b/contrib/RAPIDS/imgs/downamddecom.png
--- a/contrib/RAPIDS/imgs/fef1.png
+++ b/contrib/RAPIDS/imgs/fef1.png
--- a/contrib/RAPIDS/imgs/fef2.png
+++ b/contrib/RAPIDS/imgs/fef2.png
--- a/contrib/RAPIDS/imgs/fef3.png
+++ b/contrib/RAPIDS/imgs/fef3.png
--- a/contrib/RAPIDS/imgs/fef4.png
+++ b/contrib/RAPIDS/imgs/fef4.png
--- a/contrib/RAPIDS/imgs/fef5.png
+++ b/contrib/RAPIDS/imgs/fef5.png
--- a/contrib/RAPIDS/imgs/fef6.png
+++ b/contrib/RAPIDS/imgs/fef6.png
--- a/contrib/RAPIDS/imgs/fef7.png
+++ b/contrib/RAPIDS/imgs/fef7.png
--- a/contrib/RAPIDS/imgs/fef8.png
+++ b/contrib/RAPIDS/imgs/fef8.png
--- a/contrib/RAPIDS/imgs/fef9.png
+++ b/contrib/RAPIDS/imgs/fef9.png
--- a/contrib/RAPIDS/imgs/install2.png
+++ b/contrib/RAPIDS/imgs/install2.png
--- a/contrib/RAPIDS/imgs/installation.png
+++ b/contrib/RAPIDS/imgs/installation.png
--- a/contrib/RAPIDS/imgs/queue.png
+++ b/contrib/RAPIDS/imgs/queue.png
--- a/contrib/RAPIDS/imgs/running.png
+++ b/contrib/RAPIDS/imgs/running.png
--- a/contrib/RAPIDS/imgs/saved_workspace.png
+++ b/contrib/RAPIDS/imgs/saved_workspace.png
--- a/contrib/RAPIDS/imgs/scriptuploading.png
+++ b/contrib/RAPIDS/imgs/scriptuploading.png
--- a/contrib/RAPIDS/imgs/submission1.png
+++ b/contrib/RAPIDS/imgs/submission1.png
--- a/contrib/RAPIDS/imgs/target_creation.png
+++ b/contrib/RAPIDS/imgs/target_creation.png
--- a/contrib/RAPIDS/imgs/targeterror1.png
+++ b/contrib/RAPIDS/imgs/targeterror1.png
--- a/contrib/RAPIDS/imgs/targeterror2.png
+++ b/contrib/RAPIDS/imgs/targeterror2.png
--- a/contrib/RAPIDS/imgs/targetsuccess.png
+++ b/contrib/RAPIDS/imgs/targetsuccess.png
--- a/contrib/RAPIDS/imgs/training.png
+++ b/contrib/RAPIDS/imgs/training.png
--- a/contrib/RAPIDS/imgs/wap1.png
+++ b/contrib/RAPIDS/imgs/wap1.png
--- a/contrib/RAPIDS/imgs/wap2.png
+++ b/contrib/RAPIDS/imgs/wap2.png
--- a/contrib/RAPIDS/imgs/wap3.png
+++ b/contrib/RAPIDS/imgs/wap3.png
--- a/contrib/RAPIDS/imgs/wap4.png
+++ b/contrib/RAPIDS/imgs/wap4.png
--- a/contrib/RAPIDS/imgs/wrapper.png
+++ b/contrib/RAPIDS/imgs/wrapper.png
--- a/contrib/RAPIDS/process_data.py
+++ b/contrib/RAPIDS/process_data.py
@@ -0,0 +1,470 @@
 import numpy as np
 import datetime
 import dask_xgboost as dxgb_gpu
 import dask
 import dask_cudf
 from dask_cuda import LocalCUDACluster
 from dask.delayed import delayed
 from dask.distributed import Client, wait
 import xgboost as xgb
 import cudf
 from cudf.dataframe import DataFrame
 from collections import OrderedDict
 import gc
 from glob import glob
 import os
 import argparse
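 def run_dask_task(func, **kwargs):
    # call func (e.g. a dask.delayed-wrapped function) with the keyword arguments and return the resulting task
    task = func(**kwargs)
    return task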
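 def process_quarter_gpu(client, col_names_path, acq_data_path, year=2000, quarter=1, perf_file=""):
    # build a lazy (delayed) ETL task for one quarter of data and submit it to the Dask cluster; returns a future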
    dask_client = client
    ml_arrays = run_dask_task(delayed(run_gpu_workflow),
                                          col_path=col_names_path,
                                          acq_path=acq_data_path,
                                          quarter=quarter,
                                          year=year,
                                          perf_file=perf_file)
    return dask_client.compute(ml_arrays,
                          optimize_graph=False,
                          fifo_timeout="0ms")
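 def null_workaround(df, **kwargs):
    # replace nulls with -1: categorical columns are cast to int32 first, numeric columns are filled directly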
    for column, data_type in df.dtypes.items():
        if str(data_type) == "category":
            df[column] = df[column].astype('int32').fillna(-1)
        if str(data_type) in ['int8', 'int16', 'int32', 'int64', 'float32', 'float64']:
            df[column] = df[column].fillna(-1)
    return df
 def run_gpu_workflow(col_path, acq_path, quarter=1, year=2000, perf_file="", **kwargs):
    names = gpu_load_names(col_path=col_path)
    acq_gdf = gpu_load_acquisition_csv(acquisition_path= acq_path + "/Acquisition_"
                                      + str(year) + "Q" + str(quarter) + ".txt")
    acq_gdf = acq_gdf.merge(names, how='left', on=['seller_name'])
    acq_gdf.drop_column('seller_name')
    acq_gdf['seller_name'] = acq_gdf['new']
    acq_gdf.drop_column('new')
    perf_df_tmp = gpu_load_performance_csv(perf_file)
    gdf = perf_df_tmp
    everdf = create_ever_features(gdf)
    delinq_merge = create_delinq_features(gdf)
    everdf = join_ever_delinq_features(everdf, delinq_merge)
    del(delinq_merge)
    joined_df = create_joined_df(gdf, everdf)
    testdf = create_12_mon_features(joined_df)
    joined_df = combine_joined_12_mon(joined_df, testdf)
    del(testdf)
    perf_df = final_performance_delinquency(gdf, joined_df)
    del(gdf, joined_df)
    final_gdf = join_perf_acq_gdfs(perf_df, acq_gdf)
    del(perf_df)
    del(acq_gdf)
    final_gdf = last_mile_cleaning(final_gdf)
    return final_gdf
 def gpu_load_performance_csv(performance_path, **kwargs):
    """ Loads performance data
    Returns
    -------
    GPU DataFrame
    """
    cols = [
        "loan_id", "monthly_reporting_period", "servicer", "interest_rate", "current_actual_upb",
        "loan_age", "remaining_months_to_legal_maturity", "adj_remaining_months_to_maturity",
        "maturity_date", "msa", "current_loan_delinquency_status", "mod_flag", "zero_balance_code",
        "zero_balance_effective_date", "last_paid_installment_date", "foreclosed_after",
        "disposition_date", "foreclosure_costs", "prop_preservation_and_repair_costs",
        "asset_recovery_costs", "misc_holding_expenses", "holding_taxes", "net_sale_proceeds",
        "credit_enhancement_proceeds", "repurchase_make_whole_proceeds", "other_foreclosure_proceeds",
        "non_interest_bearing_upb", "principal_forgiveness_upb", "repurchase_make_whole_proceeds_flag",
        "foreclosure_principal_write_off_amount", "servicing_activity_indicator"
    ]
    dtypes = OrderedDict([
        ("loan_id", "int64"),
        ("monthly_reporting_period", "date"),
        ("servicer", "category"),
        ("interest_rate", "float64"),
        ("current_actual_upb", "float64"),
        ("loan_age", "float64"),
        ("remaining_months_to_legal_maturity", "float64"),
        ("adj_remaining_months_to_maturity", "float64"),
        ("maturity_date", "date"),
        ("msa", "float64"),
        ("current_loan_delinquency_status", "int32"),
        ("mod_flag", "category"),
        ("zero_balance_code", "category"),
        ("zero_balance_effective_date", "date"),
        ("last_paid_installment_date", "date"),
        ("foreclosed_after", "date"),
        ("disposition_date", "date"),
        ("foreclosure_costs", "float64"),
        ("prop_preservation_and_repair_costs", "float64"),
        ("asset_recovery_costs", "float64"),
        ("misc_holding_expenses", "float64"),
        ("holding_taxes", "float64"),
        ("net_sale_proceeds", "float64"),
        ("credit_enhancement_proceeds", "float64"),
        ("repurchase_make_whole_proceeds", "float64"),
        ("other_foreclosure_proceeds", "float64"),
        ("non_interest_bearing_upb", "float64"),
        ("principal_forgiveness_upb", "float64"),
        ("repurchase_make_whole_proceeds_flag", "category"),
        ("foreclosure_principal_write_off_amount", "float64"),
        ("servicing_activity_indicator", "category")
    ])
    print(performance_path)
    return cudf.read_csv(performance_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
 def gpu_load_acquisition_csv(acquisition_path, **kwargs):
    """ Loads acquisition data
    Returns
    -------
    GPU DataFrame
    """
    cols = [
        'loan_id', 'orig_channel', 'seller_name', 'orig_interest_rate', 'orig_upb', 'orig_loan_term', 
        'orig_date', 'first_pay_date', 'orig_ltv', 'orig_cltv', 'num_borrowers', 'dti', 'borrower_credit_score', 
        'first_home_buyer', 'loan_purpose', 'property_type', 'num_units', 'occupancy_status', 'property_state',
        'zip', 'mortgage_insurance_percent', 'product_type', 'coborrow_credit_score', 'mortgage_insurance_type', 
        'relocation_mortgage_indicator'
    ]
    dtypes = OrderedDict([
        ("loan_id", "int64"),
        ("orig_channel", "category"),
        ("seller_name", "category"),
        ("orig_interest_rate", "float64"),
        ("orig_upb", "int64"),
        ("orig_loan_term", "int64"),
        ("orig_date", "date"),
        ("first_pay_date", "date"),
        ("orig_ltv", "float64"),
        ("orig_cltv", "float64"),
        ("num_borrowers", "float64"),
        ("dti", "float64"),
        ("borrower_credit_score", "float64"),
        ("first_home_buyer", "category"),
        ("loan_purpose", "category"),
        ("property_type", "category"),
        ("num_units", "int64"),
        ("occupancy_status", "category"),
        ("property_state", "category"),
        ("zip", "int64"),
        ("mortgage_insurance_percent", "float64"),
        ("product_type", "category"),
        ("coborrow_credit_score", "float64"),
        ("mortgage_insurance_type", "float64"),
        ("relocation_mortgage_indicator", "category")
    ])
    print(acquisition_path)
    return cudf.read_csv(acquisition_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
 def gpu_load_names(col_path):
    """ Loads names used for renaming the banks
    Returns
    -------
    GPU DataFrame
    """
    cols = [
        'seller_name', 'new'
    ]
    dtypes = OrderedDict([
        ("seller_name", "category"),
        ("new", "category"),
    ])
    return cudf.read_csv(col_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
 def create_ever_features(gdf, **kwargs):
    # per loan, flag whether it was ever at least 1, 3 or 6 months (30, 90, 180 days) delinquent
    everdf = gdf[['loan_id', 'current_loan_delinquency_status']]
    everdf = everdf.groupby('loan_id', method='hash').max().reset_index()
    del(gdf)
    everdf['ever_30'] = (everdf['current_loan_delinquency_status'] >= 1).astype('int8')
    everdf['ever_90'] = (everdf['current_loan_delinquency_status'] >= 3).astype('int8')
    everdf['ever_180'] = (everdf['current_loan_delinquency_status'] >= 6).astype('int8')
    everdf.drop_column('current_loan_delinquency_status')
    return everdf
 def create_delinq_features(gdf, **kwargs):
    delinq_gdf = gdf[['loan_id', 'monthly_reporting_period', 'current_loan_delinquency_status']]
    del(gdf)
    delinq_30 = delinq_gdf.query('current_loan_delinquency_status >= 1')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index()
    delinq_30['delinquency_30'] = delinq_30['monthly_reporting_period']
    delinq_30.drop_column('monthly_reporting_period')
    delinq_90 = delinq_gdf.query('current_loan_delinquency_status >= 3')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index()
    delinq_90['delinquency_90'] = delinq_90['monthly_reporting_period']
    delinq_90.drop_column('monthly_reporting_period')
    delinq_180 = delinq_gdf.query('current_loan_delinquency_status >= 6')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index()
    delinq_180['delinquency_180'] = delinq_180['monthly_reporting_period']
    delinq_180.drop_column('monthly_reporting_period')
    del(delinq_gdf)
    delinq_merge = delinq_30.merge(delinq_90, how='left', on=['loan_id'], type='hash')
    delinq_merge['delinquency_90'] = delinq_merge['delinquency_90'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
    delinq_merge = delinq_merge.merge(delinq_180, how='left', on=['loan_id'], type='hash')
    delinq_merge['delinquency_180'] = delinq_merge['delinquency_180'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
    del(delinq_30)
    del(delinq_90)
    del(delinq_180)
    return delinq_merge
 def join_ever_delinq_features(everdf_tmp, delinq_merge, **kwargs):
    everdf = everdf_tmp.merge(delinq_merge, on=['loan_id'], how='left', type='hash')
    del(everdf_tmp)
    del(delinq_merge)
    everdf['delinquency_30'] = everdf['delinquency_30'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
    everdf['delinquency_90'] = everdf['delinquency_90'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
    everdf['delinquency_180'] = everdf['delinquency_180'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
    return everdf
 def create_joined_df(gdf, everdf, **kwargs):
    test = gdf[['loan_id', 'monthly_reporting_period', 'current_loan_delinquency_status', 'current_actual_upb']]
    del(gdf)
    test['timestamp'] = test['monthly_reporting_period']
    test.drop_column('monthly_reporting_period')
    test['timestamp_month'] = test['timestamp'].dt.month
    test['timestamp_year'] = test['timestamp'].dt.year
    test['delinquency_12'] = test['current_loan_delinquency_status']
    test.drop_column('current_loan_delinquency_status')
    test['upb_12'] = test['current_actual_upb']
    test.drop_column('current_actual_upb')
    test['upb_12'] = test['upb_12'].fillna(999999999)
    test['delinquency_12'] = test['delinquency_12'].fillna(-1)
    joined_df = test.merge(everdf, how='left', on=['loan_id'], type='hash')
    del(everdf)
    del(test)
    joined_df['ever_30'] = joined_df['ever_30'].fillna(-1)
    joined_df['ever_90'] = joined_df['ever_90'].fillna(-1)
    joined_df['ever_180'] = joined_df['ever_180'].fillna(-1)
    joined_df['delinquency_30'] = joined_df['delinquency_30'].fillna(-1)
    joined_df['delinquency_90'] = joined_df['delinquency_90'].fillna(-1)
    joined_df['delinquency_180'] = joined_df['delinquency_180'].fillna(-1)
    joined_df['timestamp_year'] = joined_df['timestamp_year'].astype('int32')
    joined_df['timestamp_month'] = joined_df['timestamp_month'].astype('int32')
    return joined_df
 def create_12_mon_features(joined_df, **kwargs):
    testdfs = []
    n_months = 12
    for y in range(1, n_months + 1):
        tmpdf = joined_df[['loan_id', 'timestamp_year', 'timestamp_month', 'delinquency_12', 'upb_12']]
        tmpdf['josh_months'] = tmpdf['timestamp_year'] * 12 + tmpdf['timestamp_month']
        tmpdf['josh_mody_n'] = ((tmpdf['josh_months'].astype('float64') - 24000 - y) / 12).floor()
        tmpdf = tmpdf.groupby(['loan_id', 'josh_mody_n'], method='hash').agg({'delinquency_12': 'max','upb_12': 'min'}).reset_index()
        tmpdf['delinquency_12'] = (tmpdf['delinquency_12']>3).astype('int32')
        tmpdf['delinquency_12'] +=(tmpdf['upb_12']==0).astype('int32')
        tmpdf['timestamp_year'] = (((tmpdf['josh_mody_n'] * n_months) + 24000 + (y - 1)) / 12).floor().astype('int16')
        tmpdf['timestamp_month'] = np.int8(y)
        tmpdf.drop_column('josh_mody_n')
        testdfs.append(tmpdf)
        del(tmpdf)
    del(joined_df)
    return cudf.concat(testdfs)
 def combine_joined_12_mon(joined_df, testdf, **kwargs):
    joined_df.drop_column('delinquency_12')
    joined_df.drop_column('upb_12')
    joined_df['timestamp_year'] = joined_df['timestamp_year'].astype('int16')
    joined_df['timestamp_month'] = joined_df['timestamp_month'].astype('int8')
    return joined_df.merge(testdf, how='left', on=['loan_id', 'timestamp_year', 'timestamp_month'], type='hash')
 def final_performance_delinquency(gdf, joined_df, **kwargs):
    merged = null_workaround(gdf)
    joined_df = null_workaround(joined_df)
    merged['timestamp_month'] = merged['monthly_reporting_period'].dt.month
    merged['timestamp_month'] = merged['timestamp_month'].astype('int8')
    merged['timestamp_year'] = merged['monthly_reporting_period'].dt.year
    merged['timestamp_year'] = merged['timestamp_year'].astype('int16')
    merged = merged.merge(joined_df, how='left', on=['loan_id', 'timestamp_year', 'timestamp_month'], type='hash')
    merged.drop_column('timestamp_year')
    merged.drop_column('timestamp_month')
    return merged
 def join_perf_acq_gdfs(perf, acq, **kwargs):
    perf = null_workaround(perf)
    acq = null_workaround(acq)
    return perf.merge(acq, how='left', on=['loan_id'], type='hash')
 def last_mile_cleaning(df, **kwargs):
    drop_list = [
        'loan_id', 'orig_date', 'first_pay_date', 'seller_name',
        'monthly_reporting_period', 'last_paid_installment_date', 'maturity_date', 'ever_30', 'ever_90', 'ever_180',
        'delinquency_30', 'delinquency_90', 'delinquency_180', 'upb_12',
        'zero_balance_effective_date','foreclosed_after', 'disposition_date','timestamp'
    ]
    for column in drop_list:
        df.drop_column(column)
    for col, dtype in df.dtypes.items():
        if str(dtype)=='category':
            df[col] = df[col].cat.codes
        df[col] = df[col].astype('float32')
    df['delinquency_12'] = df['delinquency_12'] > 0
    df['delinquency_12'] = df['delinquency_12'].fillna(False).astype('int32')
    for column in df.columns:
        df[column] = df[column].fillna(-1)
    return df.to_arrow(preserve_index=False)
 def main():
    parser = argparse.ArgumentParser("rapidssample")
    parser.add_argument("--data_dir", type=str, help="location of data")
    parser.add_argument("--num_gpu", type=int, help="Number of GPUs to use", default=1)
    parser.add_argument("--part_count", type=int, help="Number of data files to train against", default=2)
    parser.add_argument("--end_year", type=int, help="Year to end the data load", default=2000)
    parser.add_argument("--cpu_predictor", type=str, help="Flag to use CPU for prediction", default='False')
    parser.add_argument('-f', type=str, default='') # added for notebook execution scenarios
    args = parser.parse_args()
    data_dir = args.data_dir
    num_gpu = args.num_gpu
    part_count = args.part_count
    end_year = args.end_year
    cpu_predictor = args.cpu_predictor.lower() in ('yes', 'true', 't', 'y', '1')
    if cpu_predictor:
        print('Training with CPUs requires num_gpu = 1')
        num_gpu = 1
    print('data_dir = {0}'.format(data_dir))
    print('num_gpu = {0}'.format(num_gpu))
    print('part_count = {0}'.format(part_count))
    print('end_year = {0}'.format(end_year))
    print('cpu_predictor = {0}'.format(cpu_predictor))
    import subprocess
    cmd = "hostname --all-ip-addresses"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()
    IPADDR = str(output.decode()).split()[0]
    cluster = LocalCUDACluster(ip=IPADDR,n_workers=num_gpu)
    client = Client(cluster)
    print(client.ncores())
    # to download data for this notebook, visit https://rapidsai.github.io/demos/datasets/mortgage-data and update the following paths accordingly
    acq_data_path = "{0}/acq".format(data_dir) #"/rapids/data/mortgage/acq"
    perf_data_path = "{0}/perf".format(data_dir) #"/rapids/data/mortgage/perf"
    col_names_path = "{0}/names.csv".format(data_dir) # "/rapids/data/mortgage/names.csv"
    start_year = 2000
    print('--->>> Workers used: {0}'.format(client.ncores()))
    # NOTE: The ETL calculates additional features which are then dropped before creating the XGBoost DMatrix.
    # This can be optimized to avoid calculating the dropped features.
    print("Reading ...")
    t1 = datetime.datetime.now()
    gpu_dfs = []
    gpu_time = 0
    quarter = 1
    year = start_year
    count = 0
    while year <= end_year:
        for file in glob(os.path.join(perf_data_path + "/Performance_" + str(year) + "Q" + str(quarter) + "*")):
            if count < part_count:
                gpu_dfs.append(process_quarter_gpu(client, col_names_path, acq_data_path, year=year, quarter=quarter, perf_file=file))
                count += 1
                print('file: {0}'.format(file))
                print('count: {0}'.format(count))
        quarter += 1
        if quarter == 5:
            year += 1
            quarter = 1
    wait(gpu_dfs)
    t2 = datetime.datetime.now()
    print("Reading time: {0}".format(str(t2-t1)))
    print('--->>> Number of data parts: {0}'.format(len(gpu_dfs)))
    dxgb_gpu_params = {
        'nround':            100,
        'max_depth':         8,
        'max_leaves':        2**8,
        'alpha':             0.9,
        'eta':               0.1,
        'gamma':             0.1,
        'learning_rate':     0.1,
        'subsample':         1,
        'reg_lambda':        1,
        'scale_pos_weight':  2,
        'min_child_weight':  30,
        'tree_method':       'gpu_hist',
        'n_gpus':            1, 
        'distributed_dask':  True,
        'loss':              'ls',
        'objective':         'reg:squarederror',
        'max_features':      'auto',
        'criterion':         'friedman_mse',
        'grow_policy':       'lossguide',
        'verbose':           True
    }
    if cpu_predictor:
        print('\n---->>>> Training using CPUs <<<<----\n')
        dxgb_gpu_params['predictor'] = 'cpu_predictor'
        dxgb_gpu_params['tree_method'] = 'hist'
        dxgb_gpu_params['objective'] = 'reg:squarederror'
    else:
        print('\n---->>>> Training using GPUs <<<<----\n')
    print('Training parameters are {0}'.format(dxgb_gpu_params))
    gpu_dfs = [delayed(DataFrame.from_arrow)(gpu_df) for gpu_df in gpu_dfs[:part_count]]
    gpu_dfs = [gpu_df for gpu_df in gpu_dfs]
    wait(gpu_dfs)
    tmp_map = [(gpu_df, list(client.who_has(gpu_df).values())[0]) for gpu_df in gpu_dfs]
    new_map = {}
    for key, value in tmp_map:
        if value not in new_map:
            new_map[value] = [key]
        else:
            new_map[value].append(key)
    del(tmp_map)
    gpu_dfs = []
    for list_delayed in new_map.values():
        gpu_dfs.append(delayed(cudf.concat)(list_delayed))
    del(new_map)
    gpu_dfs = [(gpu_df[['delinquency_12']], gpu_df[delayed(list)(gpu_df.columns.difference(['delinquency_12']))]) for gpu_df in gpu_dfs]
    gpu_dfs = [(gpu_df[0].persist(), gpu_df[1].persist()) for gpu_df in gpu_dfs]
    gpu_dfs = [dask.delayed(xgb.DMatrix)(gpu_df[1], gpu_df[0]) for gpu_df in gpu_dfs]
    gpu_dfs = [gpu_df.persist() for gpu_df in gpu_dfs]
    gc.collect()
    wait(gpu_dfs)
    # TRAIN THE MODEL
    labels = None
    t1 = datetime.datetime.now()
    bst = dxgb_gpu.train(client, dxgb_gpu_params, gpu_dfs, labels, num_boost_round=dxgb_gpu_params['nround'])
    t2 = datetime.datetime.now()
    print('\n---->>>> Training time: {0} <<<<----\n'.format(str(t2-t1)))
    print('Exiting script')
 if __name__ == '__main__':
    main()
--- a/how-to-use-azureml/README.md
+++ b/how-to-use-azureml/README.md
@@ -0,0 +1,14 @@
 ## Examples to get started with Azure Machine Learning service
 Learn how to use Azure Machine Learning services for experimentation and model management.
 As a prerequisite, run the [configuration notebook](../configuration.ipynb) first to set up your Azure ML Workspace. Then run the notebooks in the following recommended order.
 * [train-within-notebook](./training/train-within-notebook): Train a model while tracking run history, and learn how to deploy the model as web service to Azure Container Instance.
 * [train-on-local](./training/train-on-local): Learn how to submit a run to local computer and use Azure ML managed run configuration.
 * [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
 * [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.
 * [logging-api](./track-and-monitor-experiments/logging-api): Learn about the details of logging metrics to run history.
 * [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service): Learn how to use App Insights with a production web service.
 Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
--- a/how-to-use-azureml/automated-machine-learning/README.md
+++ b/how-to-use-azureml/automated-machine-learning/README.md
@@ -0,0 +1,299 @@
 # Table of Contents
 1. [Automated ML Introduction](#introduction)
 1. [Setup using Compute Instances](#jupyter)
 1. [Setup using a Local Conda environment](#localconda)
 1. [Setup using Azure Databricks](#databricks)
 1. [Automated ML SDK Sample Notebooks](#samples)
 1. [Documentation](#documentation)
 1. [Running using python command](#pythoncommand)
 1. [Troubleshooting](#troubleshooting)
 <a name="introduction"></a>
 # Automated ML introduction
 Automated machine learning (automated ML) builds high-quality machine learning models for you by automating model and hyperparameter selection. Bring a labeled dataset that you want to build a model for, and automated ML will give you a high-quality machine learning model that you can use for predictions.
 If you are new to data science, automated ML helps you get started by simplifying machine learning model building. It removes the need to perform model selection and hyperparameter selection yourself, and in one step creates a high-quality trained model for you to use.
 If you are an experienced data scientist, automated ML helps increase your productivity by intelligently performing model and hyperparameter selection for your training, generating high-quality models much faster than manually specifying several combinations of parameters and running training jobs. Automated ML provides visibility into and access to all the training jobs and the performance characteristics of the models, to help you further tune the pipeline if you desire.
 Below are the three execution environments supported by automated ML.
 <a name="jupyter"></a>
 ## Setup using Compute Instances - Jupyter based notebooks from an Azure Virtual Machine
 1. Open the [ML Azure portal](https://ml.azure.com)
 1. Select Compute
 1. Select Compute Instances
 1. Click New
 1. Type a Compute Name, select a Virtual Machine type and select a Virtual Machine size
 1. Click Create
 <a name="localconda"></a>
 ## Setup using a Local Conda environment
 To run these notebooks on your own notebook server, use these installation instructions.
 The instructions below will install everything you need and then start a Jupyter notebook.
 ### 1. Install Miniconda from [here](https://conda.io/miniconda.html), choosing 64-bit Python 3.7 or higher.
 - **Note**: if you already have conda installed, you can keep using it (there's no need to install Miniconda specifically), but it should be version 4.4.10 or later (as shown by `conda -V`). If you have a previous version installed, you can update it using the command `conda update conda`.
 ### 2. Downloading the sample notebooks
 - Download the sample notebooks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) as a zip file and extract the contents to a local directory. The automated ML sample notebooks are in the "automated-machine-learning" folder.
 ### 3. Set up a new conda environment
 The **automl_setup** script creates a new conda environment, installs the necessary packages, configures the widget and starts a Jupyter notebook. It takes the conda environment name as an optional parameter; the default conda environment name is azure_automl. The exact command depends on the operating system; see the specific sections below for Windows, Mac and Linux. It can take about 10 minutes to run.
 Packages installed by the **automl_setup** script:
    <ul><li>python</li><li>nb_conda</li><li>matplotlib</li><li>numpy</li><li>cython</li><li>urllib3</li><li>scipy</li><li>scikit-learn</li><li>pandas</li><li>tensorflow</li><li>py-xgboost</li><li>azureml-sdk</li><li>azureml-widgets</li><li>pandas-ml</li></ul>
 For more details refer to the [automl_env.yml](./automl_env.yml)
 ## Windows
 Start an **Anaconda Prompt** window, cd to the **how-to-use-azureml/automated-machine-learning** folder where the sample notebooks were extracted and then run:
 ```
 automl_setup
 ```
 ## Mac
 Install "Command line developer tools" if it is not already installed (you can use the command: `xcode-select --install`).
 Start a Terminal window, cd to the **how-to-use-azureml/automated-machine-learning** folder where the sample notebooks were extracted and then run:
 ```
 bash automl_setup_mac.sh
 ```
 ## Linux
 cd to the **how-to-use-azureml/automated-machine-learning** folder where the sample notebooks were extracted and then run:
 ```
 bash automl_setup_linux.sh
 ```
 ### 4. Running configuration.ipynb
 - Before running any samples, you need to run the [configuration](../../configuration.ipynb) notebook.
 - Execute the cells in the notebook to register the Machine Learning Services resource provider and create a workspace (*instructions in notebook*).
 ### 5. Running Samples
 - Please make sure you use the Python [conda env:azure_automl] kernel when trying the sample Notebooks.
 - Follow the instructions in the individual notebooks to explore various features in automated ML.
 ### 6. Starting jupyter notebook manually
 To start your Jupyter notebook manually, use:
 ```
 conda activate azure_automl
 jupyter notebook
 ```
 or on Mac or Linux:
 ```
 source activate azure_automl
 jupyter notebook
 ```
 <a name="databricks"></a>
 ## Setup using Azure Databricks
 **NOTE**: Please create your Azure Databricks cluster as v7.1 (high concurrency preferred) with **Python 3** (dropdown).
 **NOTE**: You need at least Contributor access to your Azure subscription to run the notebook.
 - You can find detailed README instructions on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/automl).
 - Download the sample notebook automl-databricks-local-01.ipynb from [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/automl) and import it into the Azure Databricks workspace.
 - Attach the notebook to the cluster.
 <a name="samples"></a>
 # Automated ML SDK Sample Notebooks
 ## Classification
 - **Classify Credit Card Fraud**
    - Dataset: [Kaggle's credit card fraud detection dataset](https://www.kaggle.com/mlg-ulb/creditcardfraud)
      - **[Jupyter Notebook (remote run)](classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb)**
          - run the experiment remotely on AML Compute cluster
          - test the performance of the best model in the local environment
      - **[Jupyter Notebook (local run)](local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb)**
          - run experiment in the local environment
          - use Mimic Explainer for computing feature importance
          - deploy the best model along with the explainer to an Azure Kubernetes Service (AKS) cluster, which will compute the raw and engineered feature importances at inference time
 - **Predict Term Deposit Subscriptions in a Bank**
    - Dataset: [UCI's bank marketing dataset](https://www.kaggle.com/janiobachmann/bank-marketing-dataset)
        - **[Jupyter Notebook](classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb)**
          - run experiment remotely on AML Compute cluster to generate ONNX compatible models
          - view the featurization steps that were applied during training
          - view feature importance for the best model
          - download the best model in ONNX format and use it for inference with ONNX Runtime
          - deploy the best model in PKL format to Azure Container Instance (ACI)
 - **Predict Newsgroup based on Text from News Article**
    - Dataset: [20 newsgroups text dataset](https://scikit-learn.org/0.19/datasets/twenty_newsgroups.html)
        - **[Jupyter Notebook](classification-text-dnn/auto-ml-classification-text-dnn.ipynb)**
          - AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data
          - AutoML will use Bidirectional Encoder Representations from Transformers (BERT) when GPU compute is used
          - a Bidirectional Long Short-Term Memory (BiLSTM) network will be used when CPU compute is used, thereby adapting the choice of DNN to the available compute
 ## Regression
 - **Predict Performance of Hardware Parts**
    - Dataset: Hardware Performance Dataset
        - **[Jupyter Notebook](regression/auto-ml-regression.ipynb)**
            - run the experiment remotely on AML Compute cluster
            - get best trained model for a different metric than the one the experiment was optimized for
            - test the performance of the best model in the local environment
        - **[Jupyter Notebook (advanced)](regression/auto-ml-regression.ipynb)**
            - run the experiment remotely on AML Compute cluster
            - customize featurization: override column purpose within the dataset, configure transformer parameters
            - get best trained model for a different metric than the one the experiment was optimized for
            - run a model explanation experiment on the remote cluster
            - deploy the model along the explainer and run online inferencing
 ## Time Series Forecasting
 - **Forecast Energy Demand**
    - Dataset: [NYC energy demand data](http://mis.nyiso.com/public/P-58Blist.htm)
        - **[Jupyter Notebook](forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb)**
          - run experiment remotely on AML Compute cluster
          - use lags and rolling window features
          - view the featurization steps that were applied during training
          - get the best model, use it to forecast on test data and compare the accuracy of predictions against real data
 - **Forecast Orange Juice Sales (Multi-Series)**
    - Dataset: [Dominick's grocery sales of orange juice](forecasting-orange-juice-sales/dominicks_OJ.csv)
        - **[Jupyter Notebook](forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb)**
          - run experiment remotely on AML Compute cluster
          - customize time-series featurization, change column purpose and override transformer hyperparameters
          - evaluate locally the performance of the generated best model
          - deploy the best model as a webservice on Azure Container Instance (ACI)
          - get online predictions from the deployed model
 - **Forecast Demand of a Bike-Sharing Service**
    - Dataset: [Bike demand data](forecasting-bike-share/bike-no.csv)
        - **[Jupyter Notebook](forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb)**
          - run experiment remotely on AML Compute cluster
          - integrate holiday features
          - run rolling forecast for test set that is longer than the forecast horizon
          - compute metrics on the predictions from the remote forecast
 - **The Forecast Function Interface**
    - Dataset: Generated for sample purposes
        - **[Jupyter Notebook](forecasting-forecast-function/auto-ml-forecasting-function.ipynb)**
          - train a forecaster using a remote AML Compute cluster
          - capabilities of forecast function (e.g. forecast farther into the horizon)
          - generate confidence intervals
 - **Forecast Beverage Production**
    - Dataset: [Monthly beer production data](forecasting-beer-remote/Beer_no_valid_split_train.csv)
        - **[Jupyter Notebook](forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb)**
          - train using a remote AML Compute cluster
          - enable the DNN learning model
          - forecast on a remote compute cluster and compare different model performance
 - **Continuous Retraining with NOAA Weather Data**
    - Dataset: [NOAA weather data from Azure Open Datasets](https://azure.microsoft.com/en-us/services/open-datasets/)
        - **[Jupyter Notebook](continuous-retraining/auto-ml-continuous-retraining.ipynb)**
          - continuously retrain a model using Pipelines and AutoML
          - create a Pipeline to upload a time series dataset to an Azure blob
          - create a Pipeline to run an AutoML experiment and register the best resulting model in the Workspace
          - publish the training pipeline created and schedule it to run daily
 <a name="documentation"></a>
 # Documentation
 See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
 <a name="pythoncommand"></a>
 # Running using python command
 Jupyter Notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
 You can then run this file using the `python` command.
 However, on Windows the file needs to be modified before it can be run.
 The following condition must be added to the main code in the file: `if __name__ == "__main__":`
 The main code of the file must then be indented so that it is under this condition.
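A minimal sketch of what the exported file might look like once the notebook's main code is moved under the guard. `train()` and `main()` are hypothetical names standing in for the notebook cells. A likely reason the guard matters on Windows is that Python's `multiprocessing` there starts worker processes with the "spawn" method, which re-imports the main module in each child, so unguarded top-level code would run again in every worker.

```python
# Hypothetical layout of a notebook exported as a .py file.


def train():
    # Stand-in for the notebook's training cells (hypothetical).
    data = [1.0, 2.0, 3.0]
    return sum(data) / len(data)


def main():
    # Everything that used to run at the top level of the notebook goes here.
    result = train()
    print("mean =", result)
    return result


# Runs only when the file is executed directly (python file.py),
# not when it is imported, for example by a spawned worker process.
if __name__ == "__main__":
    main()
```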
 <a name="troubleshooting"></a>
 # Troubleshooting
 ## automl_setup fails
 1. On Windows, make sure that you are running automl_setup from an Anaconda Prompt window rather than a regular cmd window. You can launch the "Anaconda Prompt" window by pressing the Start button and typing "Anaconda Prompt". If you don't see the application "Anaconda Prompt", you might not have conda or Miniconda installed. In that case, you can install it from [here](https://conda.io/miniconda.html).
 2. Check that you have conda 64-bit installed rather than 32-bit.  You can check this with the command `conda info`.  The `platform` should be `win-64` for Windows or `osx-64` for Mac.
 3. Check that you have conda 4.7.8 or later.  You can check the version with the command `conda -V`.  If you have a previous version installed, you can update it using the command: `conda update conda`.
 4. On Linux, if the error is `gcc: error trying to exec 'cc1plus': execvp: No such file or directory`, install build essentials using the command `sudo apt-get install build-essential`.
 5. Pass a new name as the first parameter to automl_setup so that it creates a new conda environment. You can view existing conda environments using `conda env list` and remove them with `conda env remove -n <environmentname>`.
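Checks 2 and 3 can also be scripted. The sketch below parses the output of `conda -V`, compares it with the 4.7.8 minimum stated above, and reports whether the running Python interpreter is 64-bit (only a proxy for the `win-64`/`osx-64` platform shown by `conda info`, not a replacement for it). It assumes `conda` is on the `PATH`; on Windows, launching it may require running from an Anaconda Prompt. The helper names are hypothetical.

```python
import re
import struct
import subprocess

MIN_CONDA = (4, 7, 8)  # minimum version quoted in the troubleshooting steps


def parse_conda_version(text):
    """Turn output such as 'conda 4.7.8' into a tuple of ints, e.g. (4, 7, 8)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_64bit_python():
    # Size of a C pointer in the running interpreter: 8 bytes on 64-bit builds.
    return struct.calcsize("P") == 8


def conda_is_recent_enough(min_version=MIN_CONDA):
    """Return True/False, or None if `conda -V` could not be run."""
    try:
        out = subprocess.run(["conda", "-V"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    # Some conda releases may print the version to stderr, so fall back to it.
    return parse_conda_version(out.stdout or out.stderr) >= min_version
```

Comparing tuples of integers avoids the string-comparison trap where "4.10.0" would sort before "4.7.8".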
 ## automl_setup_linux.sh fails
 If automl_setup_linux.sh fails on Ubuntu Linux with the error: `unable to execute 'gcc': No such file or directory`
 1. Make sure that outbound ports 53 and 80 are enabled.  On an Azure VM, you can do this from the Azure Portal by selecting the VM and clicking on Networking.
 2. Run the command: `sudo apt-get update`
 3. Run the command: `sudo apt-get install build-essential --fix-missing`
 4. Run `automl_setup_linux.sh` again.
 ## configuration.ipynb fails
 1) For local conda, make sure that you have successfully run automl_setup first.
 2) Check that the subscription_id is correct. You can find the subscription_id in the Azure Portal by selecting All services and then Subscriptions. The characters "<" and ">" should not be included in the subscription_id value. For example, `subscription_id = "12345678-90ab-1234-5678-1234567890ab"` has the valid format.
 3) Check that you have Contributor or Owner access to the Subscription.
 4) Check that the region is one of the supported regions: `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus`
 5) Check that you have access to the region using the Azure Portal.
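Before re-running `configuration.ipynb`, the values from steps 2 and 4 can be sanity-checked locally. This is a minimal sketch using only the standard library; `check_config` is a hypothetical helper, and the region set is copied from the list above, which may lag behind the regions Azure currently supports.

```python
import uuid

# Regions listed as supported in this README (may be out of date).
SUPPORTED_REGIONS = {
    "eastus2", "eastus", "westcentralus", "southeastasia",
    "westeurope", "australiaeast", "westus2", "southcentralus",
}


def check_config(subscription_id, region):
    """Return a list of problems found; an empty list means the values look valid."""
    problems = []
    if "<" in subscription_id or ">" in subscription_id:
        problems.append("remove the '<' and '>' placeholder characters")
    try:
        # uuid.UUID accepts several spellings, so also require the
        # canonical hyphenated 8-4-4-4-12 form.
        if str(uuid.UUID(subscription_id)) != subscription_id.lower():
            raise ValueError(subscription_id)
    except ValueError:
        problems.append("subscription_id is not in 8-4-4-4-12 GUID format")
    if region not in SUPPORTED_REGIONS:
        problems.append(f"region {region!r} is not in the supported list")
    return problems
```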
 ## import AutoMLConfig fails after upgrade from before 1.0.76 to 1.0.76 or later
 There were package changes in automated machine learning version 1.0.76, which require the previous version to be uninstalled before upgrading to the new version.
 If you have manually upgraded from a version of automated machine learning before 1.0.76 to 1.0.76 or later, you may get the error:
 `ImportError: cannot import name 'AutoMLConfig'`
 This can be resolved by running:
 `pip uninstall azureml-train-automl` and then 
 `pip install azureml-train-automl`
 The automl_setup.cmd script does this automatically.
 ## workspace.from_config fails
 If the call `ws = Workspace.from_config()` fails:
 1) Make sure that you have run the `configuration.ipynb` notebook successfully.
 2) If you are running a notebook from a folder that is not under the folder where you ran `configuration.ipynb`, copy the aml_config folder and the config.json file it contains to the new folder. Workspace.from_config reads the config.json in the notebook folder or its parent folder.
 3) If you are switching to a new subscription, resource group, workspace or region, make sure that you run the `configuration.ipynb` notebook again.  Changing config.json directly will only work if the workspace already exists in the specified resource group under the specified subscription.
 4) If you want to change the region, use a different workspace, resource group or subscription. `Workspace.create` will not create or update a workspace if it already exists, even if the specified region is different.
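To see which `config.json` a notebook would pick up, the lookup described in item 2 can be approximated with a short search up the folder tree. `find_config` is a hypothetical helper written for illustration, not the SDK's implementation; the real `Workspace.from_config` may search additional locations.

```python
from pathlib import Path


def find_config(start_dir=".", file_name="config.json"):
    """Return the first config file found in start_dir or any parent, else None.

    Hypothetical approximation of the lookup described above: at each level it
    checks <dir>/config.json and then <dir>/aml_config/config.json.
    """
    current = Path(start_dir).resolve()
    for folder in [current, *current.parents]:
        for candidate in (folder / file_name, folder / "aml_config" / file_name):
            if candidate.is_file():
                return candidate
    return None
```

Calling `find_config()` from the notebook folder shows whether a stale config.json from an earlier workspace would be picked up.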
 ## Sample notebook fails
 If a sample notebook fails with an error that property, method or library does not exist:
 1) Check that you have selected the correct kernel in Jupyter Notebook. The kernel is displayed in the top right of the notebook page. It can be changed using the `Kernel | Change Kernel` menu option. For Azure Notebooks, it should be `Python 3.6`. For local conda environments, it should be the conda environment name that you specified in automl_setup. The default is azure_automl. Note that the kernel is saved as part of the notebook, so if you switch to a new conda environment, you will have to select the new kernel in the notebook.
 2) Check that the notebook is for the SDK version that you are using. You can check the SDK version by executing `azureml.core.VERSION` in a Jupyter notebook cell. You can download previous versions of the sample notebooks from GitHub by clicking the `Branch` button, selecting the `Tags` tab and then selecting the version.
 ## Numpy import fails on Windows
 Some Windows environments see an error loading numpy with the latest Python version 3.6.8.  If you see this issue, try with Python version 3.6.7.
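To tell whether an environment matches the combination described above, inspect the running interpreter. The function below is a hypothetical helper. It only checks the platform and the Python version; it does not detect the numpy error itself.

```python
import sys


def is_known_numpy_issue(version_info=sys.version_info, platform=sys.platform) -> bool:
    """True when running the Windows + Python 3.6.8 combination described above."""
    return platform == "win32" and tuple(version_info[:3]) == (3, 6, 8)


if is_known_numpy_issue():
    print("Python 3.6.8 on Windows: consider recreating the environment with Python 3.6.7.")
```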
 ## Numpy import fails
 Check the TensorFlow version in the Automated ML conda environment.  Supported versions are < 1.13.  Uninstall TensorFlow from the environment if the version is >= 1.13.
 You can check the TensorFlow version and uninstall it as follows:
 1) Start a command shell and activate the conda environment where the Automated ML packages are installed.
 2) Enter `pip freeze` and look for `tensorflow`.  If it is found, the listed version should be < 1.13.
 3) If the listed version is not supported, run `pip uninstall tensorflow` in the command shell and enter `y` to confirm.
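You can also script this check in Python instead of scanning `pip freeze` output. This sketch uses the standard-library `importlib.metadata` module (Python 3.8 and later) to look up the installed `tensorflow` distribution. The helper names are hypothetical, and `major_minor` assumes a plain numeric version such as `1.12.0`.

```python
from importlib import metadata
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' of a numeric version such as '1.12.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_supported_tensorflow(version: Optional[str]) -> bool:
    """TensorFlow is acceptable if it is absent or older than 1.13."""
    return version is None or major_minor(version) < (1, 13)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    tf_version = installed_version("tensorflow")
    if not is_supported_tensorflow(tf_version):
        print(f"tensorflow {tf_version} found; run: pip uninstall tensorflow")
```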
 ## KeyError: 'brand' when running AutoML on local compute or Azure Databricks cluster
 If a new environment was created after 10 June 2020 using SDK 1.7.0 or lower, training may fail with the above error due to an update in the py-cpuinfo package.  (Environments created on or before 10 June 2020 are unaffected, as are experiments run on remote compute, because those use cached training images.)  To work around this issue, take either of the following steps:
 1) Update the SDK version to 1.8.0 or higher (this will also downgrade py-cpuinfo to 5.0.0):
 `pip install --upgrade azureml-sdk[automl]`
 2) Downgrade the installed version of py-cpuinfo to 5.0.0:
 `pip install py-cpuinfo==5.0.0`
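The conditions in this note can be expressed as a small version check. This hypothetical sketch is based on one reading of the note. It treats an environment as affected when the SDK is older than 1.8.0 and the installed py-cpuinfo is newer than 5.0.0. It looks up the `azureml-sdk` and `py-cpuinfo` distributions with `importlib.metadata` and assumes plain numeric version strings.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn a plain numeric version such as '1.7.0' into (1, 7, 0)."""
    return tuple(int(part) for part in version.split("."))


def needs_cpuinfo_workaround(sdk_version: str, cpuinfo_version: str) -> bool:
    """Hypothetical rule: SDK older than 1.8.0 combined with py-cpuinfo newer than 5.0.0."""
    return parse(sdk_version) < (1, 8, 0) and parse(cpuinfo_version) > (5, 0, 0)


def installed(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    sdk, cpuinfo = installed("azureml-sdk"), installed("py-cpuinfo")
    if sdk and cpuinfo and needs_cpuinfo_workaround(sdk, cpuinfo):
        print("Upgrade azureml-sdk[automl] or run: pip install py-cpuinfo==5.0.0")
```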
 ## Remote run: DsvmCompute.create fails
 There are several reasons why `DsvmCompute.create` can fail.  The reason is usually in the error message, but you have to look at the end of the error message for the details.  Some common reasons are:
 1) `Compute name is invalid, it should start with a letter, be between 2 and 16 character, and only include letters (a-zA-Z), numbers (0-9) and \'-\'.`  Note that underscores are not allowed in the name.
 2) `The requested VM size xxxxx is not available in the current region.`  You can select a different region or vm_size.
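You can check the naming rule quoted in the first error message before calling `DsvmCompute.create`. The regular expression below is a sketch derived from that message alone: a leading letter, then letters, digits or `-`, for 2 to 16 characters in total. The service may enforce additional rules that this pattern does not capture.

```python
import re

# Derived from the error message: the first character is a letter and the
# remaining 1 to 15 characters are letters, digits or '-' (2 to 16 overall).
COMPUTE_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{1,15}")


def is_valid_compute_name(name: str) -> bool:
    """Return True if `name` satisfies the rule stated in the error message."""
    return COMPUTE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["mydsvm", "my_dsvm", "1dsvm", "d"]:
    print(candidate, is_valid_compute_name(candidate))
```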
 ## Remote run: Unable to establish SSH connection
 Automated ML uses the SSH protocol to communicate with remote DSVMs.  This defaults to port 22.  Possible causes for this error are:
 1) The DSVM is not ready for SSH connections.  When DSVM creation completes, the DSVM might still not be ready to accept SSH connections.  The sample notebooks have a one-minute delay to allow for this.
 2) Your Azure subscription may restrict the IP address ranges that can access the DSVM on port 22.  You can check this in the Azure Portal by selecting the Virtual Machine and then clicking Networking.  The Virtual Machine name is the name that you provided in the notebook plus 10 alphanumeric characters to make the name unique.  The Inbound Port Rules define what can access the VM on specific ports.  Note that the rules are evaluated in priority order, so a Deny entry with a low priority number will override an Allow entry with a higher priority number.
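The priority behavior described in step 2 can be modeled in a few lines. This is a simplified, hypothetical sketch. Each rule has only a priority, an action and an optional port, and the model ignores source address ranges and protocols. Rules are checked from the lowest priority number upward, and the first rule that matches the port decides the outcome. If nothing matches, the sketch assumes the traffic is denied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InboundRule:
    priority: int         # lower number = evaluated first
    action: str           # "Allow" or "Deny"
    port: Optional[int]   # None means "any port"


def effective_action(rules: List[InboundRule], port: int = 22) -> str:
    """Return the action of the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port is None or rule.port == port:
            return rule.action
    return "Deny"  # assumption: unmatched traffic is treated as blocked


rules = [InboundRule(200, "Allow", 22), InboundRule(100, "Deny", 22)]
print(effective_action(rules))  # Deny: priority 100 is evaluated first
```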
 ## Remote run: setup iteration fails
 This is often an issue with the `get_data` method.
 1) Check that the `get_data` method is valid by running it locally.
 2) Make sure that `get_data` isn't referring to any local files.  `get_data` is executed on the remote DSVM.  So, it doesn't have direct access to local data files.  Instead you can store the data files with DataStore.  See [auto-ml-remote-execution-with-datastore.ipynb](remote-execution-with-datastore/auto-ml-remote-execution-with-datastore.ipynb)
 3) You can get to the error log for the setup iteration by clicking the `Click here to see the run in Azure portal` link, click `Back to Experiment`, click on the highest run number and then click on Logs.
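For step 1, a local smoke test can be as simple as calling `get_data` and checking its result before submitting the remote run. The sketch below assumes that `get_data` takes no arguments and returns a dict with `"X"` (features) and `"y"` (labels). `check_get_data` and `example_get_data` are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict


def check_get_data(get_data: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Call get_data locally and run basic sanity checks.

    Assumes get_data returns a dict with "X" (features) and "y" (labels)."""
    data = get_data()
    missing = {"X", "y"} - set(data)
    if missing:
        raise ValueError(f"get_data result is missing keys: {sorted(missing)}")
    if len(data["X"]) != len(data["y"]):
        raise ValueError(f"X has {len(data['X'])} rows but y has {len(data['y'])} labels")
    return data


def example_get_data() -> Dict[str, Any]:
    # Hypothetical get_data: builds data in memory instead of reading local files.
    X = [[0.1, 1.0], [0.4, 0.0], [0.9, 1.0]]
    y = [0, 0, 1]
    return {"X": X, "y": y}


print(len(check_get_data(example_get_data)["y"]), "labels loaded")
```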
 ## Remote run: disk full
 Automated ML creates files under /tmp/azureml_runs for each iteration that it runs.  It creates a folder named with the iteration ID, for example: AutoML_9a038a18-77cc-48f1-80fb-65abdbc33abe_93.  Under this folder there is an azureml-logs folder, which contains logs.  If you run too many iterations on the same DSVM, these files can fill the disk.
 You can delete the files under /tmp/azureml_runs or just delete the VM and create a new one.
 If your get_data downloads files, make sure to delete them, or they can use disk space as well.
 When using DataStore, it is good to specify an absolute path for the files so that they are downloaded just once.  If you specify a relative path, it will download a file for each iteration.
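You can also script the cleanup of the run folders. The sketch below adds up the size of each per-iteration folder under `/tmp/azureml_runs`, deletes those folders, and returns the number of bytes freed. The helper names are hypothetical. The sketch is meant to be run on the DSVM itself, and only when no Automated ML iterations are running there.

```python
import shutil
from pathlib import Path


def folder_size_bytes(path: Path) -> int:
    """Total size of all regular files below `path`."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def clean_run_folders(runs_root: str = "/tmp/azureml_runs") -> int:
    """Delete each per-iteration folder under `runs_root` and return the bytes freed.

    Only run this on the DSVM when no Automated ML iterations are active."""
    root = Path(runs_root)
    if not root.is_dir():
        return 0
    freed = 0
    for child in root.iterdir():
        # Skip plain files and symbolic links; only remove real run folders.
        if child.is_dir() and not child.is_symlink():
            freed += folder_size_bytes(child)
            shutil.rmtree(child)
    return freed
```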
 ## Remote run: Iterations fail and the log contains "MemoryError"
 This can be caused by insufficient memory on the DSVM.  Automated ML loads all training data into memory.  So, the available memory should be more than the training data size.
 If you are using a remote DSVM, memory is needed for each concurrent iteration.  The max_concurrent_iterations setting specifies the maximum number of concurrent iterations.  For example, if the training data size is 8 GB and max_concurrent_iterations is set to 10, at least 80 GB of memory is required.
 To resolve this issue, allocate a DSVM with more memory or reduce the value specified for max_concurrent_iterations.
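The sizing rule above is simple multiplication, and you can also turn it around to find the highest concurrency a given DSVM can support. The helpers below are a hypothetical sketch of that arithmetic only. Because the rule gives a minimum, treat the results as lower bounds on memory rather than exact requirements.

```python
def min_memory_gb(training_data_gb: float, max_concurrent_iterations: int) -> float:
    """Minimum DSVM memory: the training data size for each concurrent iteration."""
    return training_data_gb * max_concurrent_iterations


def max_concurrency_for(memory_gb: float, training_data_gb: float) -> int:
    """Largest max_concurrent_iterations that fits the same rule.

    A result of 0 means even a single iteration does not fit: use a larger VM."""
    return int(memory_gb // training_data_gb)


print(min_memory_gb(8, 10))        # 80, matching the example above
print(max_concurrency_for(56, 8))  # 7
```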
 ## Remote run: Iterations show as "Not Responding" in the RunDetails widget
 This can be caused by too many concurrent iterations for a remote DSVM.  Each concurrent iteration usually takes 100% of a core when it is running.  Some iterations can use multiple cores.  So, the max_concurrent_iterations setting should always be less than the number of cores of the DSVM.
 To resolve this issue, try reducing the value specified for the max_concurrent_iterations setting.
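A starting value for `max_concurrent_iterations` that follows this advice is one less than the number of cores. The helper below is a hypothetical sketch. `os.cpu_count()` reports the cores of the machine the code runs on, so either run it on the DSVM or pass the DSVM's core count explicitly. The helper never returns less than 1, so on a single-core VM it cannot satisfy the "less than the number of cores" guideline.

```python
import os
from typing import Optional


def suggested_max_concurrent_iterations(cores: Optional[int] = None) -> int:
    """Return a concurrency below the core count (never less than 1)."""
    if cores is None:
        # os.cpu_count() can return None when the count is unknown.
        cores = os.cpu_count() or 1
    return max(1, cores - 1)


print(suggested_max_concurrent_iterations(4))  # 3
```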
--- a/how-to-use-azureml/automated-machine-learning/automl_env.yml
+++ b/how-to-use-azureml/automated-machine-learning/automl_env.yml
@@ -0,0 +1,26 @@
 name: azure_automl
 channels:
  - conda-forge
  - pytorch
  - main
 dependencies:
  # The python interpreter version.
  # Azure ML only supports 3.8 and later.
 - pip==22.3.1
 - python>=3.10,<3.11
 - holidays==0.29
 - scipy==1.10.1
 - tqdm==4.66.1
 - pip:
  # Required packages for AzureML execution, history, and data preparation.
  - azureml-widgets~=1.59.0
  - azureml-defaults~=1.59.0
  - -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.59.0/validated_win32_requirements.txt [--no-deps]
  - matplotlib==3.7.1
  - xgboost==1.5.2
  - prophet==1.1.4
  - onnx==1.16.1
  - setuptools-git==1.2
  - spacy==3.7.4
  - https://aka.ms/automl-resources/packages/en_core_web_sm-3.7.1.tar.gz
--- a/how-to-use-azureml/automated-machine-learning/automl_env_linux.yml
+++ b/how-to-use-azureml/automated-machine-learning/automl_env_linux.yml
@@ -0,0 +1,30 @@
 name: azure_automl
 channels:
  - conda-forge
  - pytorch
  - main
 dependencies:
  # The python interpreter version.
  # Azure ML only supports 3.7 and later.
 - pip==22.3.1
 - python>=3.10,<3.11
 - matplotlib==3.7.1
 - numpy>=1.21.6,<=1.23.5
 - urllib3==1.26.7
 - scipy==1.10.1
 - scikit-learn==1.5.1
 - holidays==0.29
 - pytorch::pytorch=1.11.0
 - cudatoolkit=10.1.243
 - notebook
 - pip:
  # Required packages for AzureML execution, history, and data preparation.
  - azureml-widgets~=1.59.0
  - azureml-defaults~=1.59.0
  - pytorch-transformers==1.0.0
  - spacy==3.7.4
  - xgboost==1.5.2
  - prophet==1.1.4
  - https://aka.ms/automl-resources/packages/en_core_web_sm-3.7.1.tar.gz
  - -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.59.0/validated_linux_requirements.txt [--no-deps]
--- a/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml
+++ b/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml
@@ -0,0 +1,26 @@
 name: azure_automl
 channels:
  - conda-forge
  - pytorch
  - main
 dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.7 and later.
 - pip==22.3.1
 - python>=3.10,<3.11
 - numpy>=1.21.6,<=1.23.5
 - scipy==1.10.1
 - scikit-learn==1.5.1
 - holidays==0.29
 - pip:
  # Required packages for AzureML execution, history, and data preparation.
  - azureml-widgets~=1.59.0
  - azureml-defaults~=1.59.0
  - pytorch-transformers==1.0.0
  - prophet==1.1.4
  - xgboost==1.5.2
  - spacy==3.7.4
  - matplotlib==3.7.1
  - https://aka.ms/automl-resources/packages/en_core_web_sm-3.7.1.tar.gz
  - -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.59.0/validated_darwin_requirements.txt [--no-deps]
--- a/how-to-use-azureml/automated-machine-learning/automl_setup.cmd
+++ b/how-to-use-azureml/automated-machine-learning/automl_setup.cmd
@@ -0,0 +1,80 @@
@echo off
 set conda_env_name=%1
 set automl_env_file=%2
 set options=%3
 set PIP_NO_WARN_SCRIPT_LOCATION=0
 IF "%conda_env_name%"=="" SET conda_env_name="azure_automl"
 IF "%automl_env_file%"=="" SET automl_env_file="automl_env.yml"
 SET check_conda_version_script="check_conda_version.py"
 IF NOT EXIST %automl_env_file% GOTO YmlMissing
 IF "%CONDA_EXE%"=="" GOTO CondaMissing
 IF NOT EXIST %check_conda_version_script% GOTO VersionCheckMissing
 python "%check_conda_version_script%"
 IF errorlevel 1 GOTO ErrorExit:
 SET replace_version_script="replace_latest_version.ps1"
 IF EXIST %replace_version_script% (
  powershell -file %replace_version_script% %automl_env_file%
 )
 call conda activate %conda_env_name% 2>nul:
 if not errorlevel 1 (
  echo Upgrading existing conda environment %conda_env_name%
  call pip uninstall azureml-train-automl -y -q
  call conda env update --name %conda_env_name% --file %automl_env_file%
  if errorlevel 1 goto ErrorExit
 ) else (
  call conda env create -f %automl_env_file% -n %conda_env_name%
 )
 python "%conda_prefix%\scripts\pywin32_postinstall.py" -install
 call conda activate %conda_env_name% 2>nul:
 if errorlevel 1 goto ErrorExit
 call python -m ipykernel install --user --name %conda_env_name% --display-name "Python (%conda_env_name%)"
 REM azureml.widgets is now installed as part of the pip install under the conda env.
 REM Removing the old user install so that the notebooks will use the latest widget.
 call jupyter nbextension uninstall --user --py azureml.widgets
 echo.
 echo.
 echo ***************************************
 echo * AutoML setup completed successfully *
 echo ***************************************
 IF NOT "%options%"=="nolaunch" (
  echo.
  echo Starting jupyter notebook - please run the configuration notebook 
  echo.
  jupyter notebook --log-level=50 --notebook-dir='..\..'
 )
 goto End
 :CondaMissing
 echo Please run this script from an Anaconda Prompt window.
 echo You can start an Anaconda Prompt window by
 echo typing Anaconda Prompt on the Start menu.
 echo If you don't see the Anaconda Prompt app, install Miniconda.
 echo If you are running an older version of Miniconda or Anaconda,
 echo you can upgrade using the command: conda update conda
 goto End
 :VersionCheckMissing
 echo File %check_conda_version_script% not found.
 goto End
 :YmlMissing
 echo File %automl_env_file% not found.
 :ErrorExit
 echo Install failed
 :End
--- a/how-to-use-azureml/automated-machine-learning/automl_setup_linux.sh
+++ b/how-to-use-azureml/automated-machine-learning/automl_setup_linux.sh
@@ -0,0 +1,66 @@
 #!/bin/bash
 CONDA_ENV_NAME=$1
 AUTOML_ENV_FILE=$2
 OPTIONS=$3
 PIP_NO_WARN_SCRIPT_LOCATION=0
 CHECK_CONDA_VERSION_SCRIPT="check_conda_version.py"
 if [ "$CONDA_ENV_NAME" == "" ]
 then
  CONDA_ENV_NAME="azure_automl"
 fi
 if [ "$AUTOML_ENV_FILE" == "" ]
 then
  AUTOML_ENV_FILE="automl_env_linux.yml"
 fi
 if [ ! -f $AUTOML_ENV_FILE ]; then
    echo "File $AUTOML_ENV_FILE not found"
    exit 1
 fi
 if [ ! -f $CHECK_CONDA_VERSION_SCRIPT ]; then
    echo "File $CHECK_CONDA_VERSION_SCRIPT not found"
    exit 1
 fi
 python "$CHECK_CONDA_VERSION_SCRIPT"
 if [ $? -ne 0 ]; then
    exit 1
 fi
 sed -i 's/AZUREML-SDK-VERSION/latest/' $AUTOML_ENV_FILE
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
   echo "Upgrading existing conda environment" $CONDA_ENV_NAME
   pip uninstall azureml-train-automl -y -q
   conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
   jupyter nbextension uninstall --user --py azureml.widgets
 else
   conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
   source activate $CONDA_ENV_NAME &&
   python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
   jupyter nbextension uninstall --user --py azureml.widgets &&
   echo "" &&
   echo "" &&
   echo "***************************************" &&
   echo "* AutoML setup completed successfully *" &&
   echo "***************************************" &&
   if [ "$OPTIONS" != "nolaunch" ]
   then
      echo "" &&
      echo "Starting jupyter notebook - please run the configuration notebook" &&
      echo "" &&
      jupyter notebook --log-level=50 --notebook-dir '../..'
   fi
 fi
 if [ $? -gt 0 ]
 then
   echo "Installation failed"
 fi
--- a/how-to-use-azureml/automated-machine-learning/automl_setup_mac.sh
+++ b/how-to-use-azureml/automated-machine-learning/automl_setup_mac.sh
@@ -0,0 +1,69 @@
 #!/bin/bash
 CONDA_ENV_NAME=$1
 AUTOML_ENV_FILE=$2
 OPTIONS=$3
 PIP_NO_WARN_SCRIPT_LOCATION=0
 CHECK_CONDA_VERSION_SCRIPT="check_conda_version.py"
 if [ "$CONDA_ENV_NAME" == "" ]
 then
  CONDA_ENV_NAME="azure_automl"
 fi
 if [ "$AUTOML_ENV_FILE" == "" ]
 then
  AUTOML_ENV_FILE="automl_env_mac.yml"
 fi
 if [ ! -f $AUTOML_ENV_FILE ]; then
    echo "File $AUTOML_ENV_FILE not found"
    exit 1
 fi
 if [ ! -f $CHECK_CONDA_VERSION_SCRIPT ]; then
    echo "File $CHECK_CONDA_VERSION_SCRIPT not found"
    exit 1
 fi
 python "$CHECK_CONDA_VERSION_SCRIPT"
 if [ $? -ne 0 ]; then
    exit 1
 fi
 sed -i '' 's/AZUREML-SDK-VERSION/latest/' $AUTOML_ENV_FILE
 brew install libomp
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
   echo "Upgrading existing conda environment" $CONDA_ENV_NAME
   pip uninstall azureml-train-automl -y -q
   conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
   jupyter nbextension uninstall --user --py azureml.widgets
 else
   conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
   source activate $CONDA_ENV_NAME &&
   conda install lightgbm -c conda-forge -y &&
   python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
   jupyter nbextension uninstall --user --py azureml.widgets &&
   echo "" &&
   echo "" &&
   echo "***************************************" &&
   echo "* AutoML setup completed successfully *" &&
   echo "***************************************" &&
   if [ "$OPTIONS" != "nolaunch" ]
   then
      echo "" &&
      echo "Starting jupyter notebook - please run the configuration notebook" &&
      echo "" &&
      jupyter notebook --log-level=50 --notebook-dir '../..'
   fi
 fi
 if [ $? -gt 0 ]
 then
   echo "Installation failed"
 fi
--- a/how-to-use-azureml/automated-machine-learning/check_conda_version.py
+++ b/how-to-use-azureml/automated-machine-learning/check_conda_version.py
@@ -0,0 +1,26 @@
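 # Prefer the standalone packaging library; fall back to the copy vendored in setuptools.
 try:
    from packaging import version
 except ImportError:
    from setuptools._vendor.packaging import version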
 import platform
 try:
    import conda
 except Exception:
    print('Failed to import conda.')
    print('This setup is usually run from the base conda environment.')
    print('You can activate the base environment using the command "conda activate base"')
    exit(1)
 architecture = platform.architecture()[0]
 if architecture != "64bit":
    print('This setup requires 64bit Anaconda or Miniconda.  Found: ' + architecture)
    exit(1)
 minimumVersion = "4.7.8"
 versionInvalid = (version.parse(conda.__version__) < version.parse(minimumVersion))
 if versionInvalid:
    print('Setup requires conda version ' + minimumVersion + ' or higher.')
    print('You can use the command "conda update conda" to upgrade conda.')
 exit(versionInvalid)
--- a/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb
--- a/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb
@@ -0,0 +1,504 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
         "_**Classification of credit card fraudulent transactions on remote compute**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Train](#Train)\n",
        "1. [Results](#Results)\n",
        "1. [Test](#Test)\n",
        "1. [Acknowledgements](#Acknowledgements)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "\n",
        "In this example we use the associated credit card dataset to showcase how you can use AutoML for a simple classification problem. The goal is to predict if a credit card transaction is considered a fraudulent charge.\n",
        "\n",
        "This notebook is using remote compute to train the model.\n",
        "\n",
         "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, if you haven't already, go through the [configuration](../../../configuration.ipynb) notebook first to establish your connection to the AzureML Workspace.\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Create an experiment using an existing workspace.\n",
        "2. Configure AutoML using `AutoMLConfig`.\n",
        "3. Train the model using remote compute.\n",
        "4. Explore the results.\n",
        "5. Test the fitted model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "import pandas as pd\n",
        "import os\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.core.dataset import Dataset\n",
        "from azureml.train.automl import AutoMLConfig"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# choose a name for experiment\n",
        "experiment_name = \"automl-classification-ccard-remote\"\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output[\"Subscription ID\"] = ws.subscription_id\n",
        "output[\"Workspace\"] = ws.name\n",
        "output[\"Resource Group\"] = ws.resource_group\n",
        "output[\"Location\"] = ws.location\n",
        "output[\"Experiment Name\"] = experiment.name\n",
        "output[\"SDK Version\"] = azureml.core.VERSION\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create or Attach existing AmlCompute\n",
        "A compute target is required to execute the Automated ML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
        "\n",
        "#### Creation of AmlCompute takes approximately 5 minutes. \n",
        "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your CPU cluster\n",
        "cpu_cluster_name = \"cpu-cluster-1\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
        "    print(\"Found existing cluster, use it.\")\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=\"STANDARD_DS12_V2\", max_nodes=6\n",
        "    )\n",
        "    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
         "## Data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load Data\n",
        "\n",
        "Load the credit card dataset from a csv file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we'll split the data using random_split and extract the training data for the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "name": "load-data"
      },
      "outputs": [],
      "source": [
        "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard.csv\"\n",
        "dataset = Dataset.Tabular.from_delimited_files(data)\n",
        "training_data, validation_data = dataset.random_split(percentage=0.8, seed=223)\n",
        "label_column_name = \"Class\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Train\n",
        "\n",
         "Instantiate an `AutoMLConfig` object. This defines the settings and data used to run the experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|classification or regression|\n",
        "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics: <br><i>accuracy</i><br><i>AUC_weighted</i><br><i>average_precision_score_weighted</i><br><i>norm_macro_recall</i><br><i>precision_score_weighted</i>|\n",
        "|**enable_early_stopping**|Stop the run if the metric score is not showing improvement.|\n",
        "|**n_cross_validations**|Number of cross validation splits.|\n",
        "|**training_data**|Input dataset, containing both features and label column.|\n",
        "|**label_column_name**|The name of the label column.|\n",
        "\n",
        "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "name": "automl-config"
      },
      "outputs": [],
      "source": [
        "automl_settings = {\n",
        "    \"n_cross_validations\": 3,\n",
        "    \"primary_metric\": \"average_precision_score_weighted\",\n",
        "    \"enable_early_stopping\": True,\n",
         "    \"max_concurrent_iterations\": 2,  # This is a limit for testing purposes; increase it to match your cluster size\n",
         "    \"experiment_timeout_hours\": 0.25,  # This is a time limit for testing purposes; remove it for real use cases, as it drastically limits the ability to find the best model possible\n",
        "    \"verbosity\": logging.INFO,\n",
        "}\n",
        "\n",
        "automl_config = AutoMLConfig(\n",
        "    task=\"classification\",\n",
        "    debug_log=\"automl_errors.log\",\n",
        "    compute_target=compute_target,\n",
        "    training_data=training_data,\n",
        "    label_column_name=label_column_name,\n",
        "    **automl_settings,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations this can run for a while. Validation errors and current status will be shown when setting `show_output=True` and the execution will be synchronous."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run = experiment.submit(automl_config, show_output=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# If you need to retrieve a run that already started, use the following code\n",
        "# from azureml.train.automl.run import AutoMLRun\n",
        "# remote_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Results"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Widget for Monitoring Runs\n",
        "\n",
        "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
        "\n",
         "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "widget-rundetails-sample"
        ]
      },
      "outputs": [],
      "source": [
        "from azureml.widgets import RunDetails\n",
        "\n",
        "RunDetails(remote_run).show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Explain model\n",
        "\n",
        "Automated ML models can be explained and visualized using the SDK Explainability library. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Analyze results\n",
        "\n",
        "### Retrieve the Best Model\n",
        "\n",
        "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model.  Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run, fitted_model = remote_run.get_output()\n",
        "fitted_model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Print the properties of the model\n",
         "The `fitted_model` is a Python object, and you can read its different properties.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Test the fitted model\n",
        "\n",
         "Now that the model is trained, convert the validation split created earlier with `random_split` into pandas DataFrames locally, and then run the test data through the trained model to get the predicted values."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# convert the test data to dataframe\n",
        "X_test_df = validation_data.drop_columns(\n",
        "    columns=[label_column_name]\n",
        ").to_pandas_dataframe()\n",
        "y_test_df = validation_data.keep_columns(\n",
        "    columns=[label_column_name], validate=True\n",
        ").to_pandas_dataframe()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# call the predict functions on the model\n",
        "y_pred = fitted_model.predict(X_test_df)\n",
        "y_pred"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Calculate metrics for the prediction\n",
        "\n",
         "Now visualize a confusion matrix that compares the truth (actual) values with the values predicted \n",
         "by the trained model that was returned."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from sklearn.metrics import confusion_matrix\n",
        "import numpy as np\n",
        "import itertools\n",
        "\n",
        "cf = confusion_matrix(y_test_df.values, y_pred)\n",
        "plt.imshow(cf, cmap=plt.cm.Blues, interpolation=\"nearest\")\n",
        "plt.colorbar()\n",
        "plt.title(\"Confusion Matrix\")\n",
        "plt.xlabel(\"Predicted\")\n",
        "plt.ylabel(\"Actual\")\n",
        "class_labels = [\"False\", \"True\"]\n",
        "tick_marks = np.arange(len(class_labels))\n",
        "plt.xticks(tick_marks, class_labels)\n",
        "plt.yticks([-0.5, 0, 1, 1.5], [\"\", \"False\", \"True\", \"\"])\n",
        "# plotting text value inside cells\n",
        "thresh = cf.max() / 2.0\n",
        "for i, j in itertools.product(range(cf.shape[0]), range(cf.shape[1])):\n",
        "    plt.text(\n",
        "        j,\n",
        "        i,\n",
        "        format(cf[i, j], \"d\"),\n",
        "        horizontalalignment=\"center\",\n",
        "        color=\"white\" if cf[i, j] > thresh else \"black\",\n",
        "    )\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Acknowledgements"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This Credit Card fraud Detection dataset is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/ and is available at: https://www.kaggle.com/mlg-ulb/creditcardfraud\n",
        "\n",
        "The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Universit\u00e9 Libre de Bruxelles) on big data mining and fraud detection.\n",
        "More details on current and past projects on related topics are available on https://www.researchgate.net/project/Fraud-detection-5 and the page of the DefeatFraud project.\n",
        "\n",
        "Please cite the following works:\n",
        "\n",
        "Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson and Gianluca Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In Symposium on Computational Intelligence and Data Mining (CIDM), IEEE, 2015\n",
        "\n",
        "Dal Pozzolo, Andrea; Caelen, Olivier; Le Borgne, Yann-Ael; Waterschoot, Serge; Bontempi, Gianluca. Learned lessons in credit card fraud detection from a practitioner perspective, Expert systems with applications,41,10,4915-4928,2014, Pergamon\n",
        "\n",
        "Dal Pozzolo, Andrea; Boracchi, Giacomo; Caelen, Olivier; Alippi, Cesare; Bontempi, Gianluca. Credit card fraud detection: a realistic modeling and a novel learning strategy, IEEE transactions on neural networks and learning systems,29,8,3784-3797,2018,IEEE\n",
        "\n",
        "Dal Pozzolo, Andrea. Adaptive Machine learning for credit card fraud detection. ULB MLG PhD thesis (supervised by G. Bontempi)\n",
        "\n",
        "Carcillo, Fabrizio; Dal Pozzolo, Andrea; Le Borgne, Yann-A\u00ebl; Caelen, Olivier; Mazzer, Yannis; Bontempi, Gianluca. Scarff: a scalable framework for streaming credit card fraud detection with Spark, Information fusion,41, 182-194,2018,Elsevier\n",
        "\n",
        "Carcillo, Fabrizio; Le Borgne, Yann-A\u00ebl; Caelen, Olivier; Bontempi, Gianluca. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization, International Journal of Data Science and Analytics, 5,4,285-300,2018,Springer International Publishing\n",
        "\n",
        "Bertrand Lebichot, Yann-A\u00ebl Le Borgne, Liyun He, Frederic Obl\u00e9, Gianluca Bontempi. Deep-Learning Domain Adaptation Techniques for Credit Cards Fraud Detection, INNSBDDL 2019: Recent Advances in Big Data and Deep Learning, pp 78-88, 2019\n",
        "\n",
        "Fabrizio Carcillo, Yann-A\u00ebl Le Borgne, Olivier Caelen, Frederic Obl\u00e9, Gianluca Bontempi. Combining Unsupervised and Supervised Learning in Credit Card Fraud Detection. Information Sciences, 2019"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "ratanase"
      }
    ],
    "category": "tutorial",
    "compute": [
      "AML Compute"
    ],
    "datasets": [
      "Creditcard"
    ],
    "deployment": [
      "None"
    ],
    "exclude_from_index": false,
    "file_extension": ".py",
    "framework": [
      "None"
    ],
    "friendly_name": "Classification of credit card fraudulent transactions using Automated ML",
    "index_order": 5,
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.7"
    },
    "mimetype": "text/x-python",
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
    "tags": [
      "remote_run",
      "AutomatedML"
    ],
    "task": "Classification",
    "version": "3.6.7"
  },
  "nbformat": 4,
  "nbformat_minor": 2
 }
--- a/how-to-use-azureml/automated-machine-learning/continuous-retraining/auto-ml-continuous-retraining.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/continuous-retraining/auto-ml-continuous-retraining.ipynb
@@ -0,0 +1,602 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/continuous-retraining/auto-ml-continuous-retraining.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning \n",
        "**Continuous retraining using Pipelines and Time-Series TabularDataset**\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "2. [Setup](#Setup)\n",
        "3. [Compute](#Compute)\n",
        "4. [Run Configuration](#Run-Configuration)\n",
        "5. [Data Ingestion Pipeline](#Data-Ingestion-Pipeline)\n",
        "6. [Training Pipeline](#Training-Pipeline)\n",
        "7. [Publish Retraining Pipeline and Schedule](#Publish-Retraining-Pipeline-and-Schedule)\n",
        "8. [Test Retraining](#Test-Retraining)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "In this example we use AutoML and Pipelines to enable continuous retraining of a model based on updates to the training dataset. We will create two pipelines. The first one demonstrates a training dataset that gets updated over time; we leverage the time-series capabilities of `TabularDataset` to achieve this. The second pipeline trains the model and is triggered by a pipeline `Schedule` to enable continuous retraining.\n",
        "Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
        "In this notebook you will learn how to:\n",
        "* Create an Experiment in an existing Workspace.\n",
        "* Configure AutoML using AutoMLConfig.\n",
        "* Create a data ingestion pipeline to update a time-series based TabularDataset.\n",
        "* Create a training pipeline to prepare data, run AutoML, register the model and set up pipeline triggers.\n",
        "\n",
        "## Setup\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "\n",
        "from matplotlib import pyplot as plt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "from sklearn import datasets\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.train.automl import AutoMLConfig"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Accessing the Azure ML workspace requires authentication with Azure.\n",
        "\n",
        "The default authentication is interactive authentication using the default tenant. Executing the `ws = Workspace.from_config()` line in the cell below will prompt for authentication the first time that it is run.\n",
        "\n",
        "If you have multiple Azure tenants, you can specify the tenant by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
        "```\n",
        "from azureml.core.authentication import InteractiveLoginAuthentication\n",
        "auth = InteractiveLoginAuthentication(tenant_id='mytenantid')\n",
        "ws = Workspace.from_config(auth=auth)\n",
        "```\n",
        "If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n",
        "```\n",
        "from azureml.core.authentication import ServicePrincipalAuthentication\n",
        "auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n",
        "ws = Workspace.from_config(auth=auth)\n",
        "```\n",
        "For more details, see aka.ms/aml-notebook-auth."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "dstor = ws.get_default_datastore()\n",
        "\n",
        "# Choose a name for the run history container in the workspace.\n",
        "experiment_name = \"retrain-noaaweather\"\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output[\"Subscription ID\"] = ws.subscription_id\n",
        "output[\"Workspace\"] = ws.name\n",
        "output[\"Resource Group\"] = ws.resource_group\n",
        "output[\"Location\"] = ws.location\n",
        "output[\"Run History Name\"] = experiment_name\n",
        "output[\"SDK Version\"] = azureml.core.VERSION\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Compute \n",
        "\n",
        "#### Create or Attach existing AmlCompute\n",
        "\n",
        "You will need to create a compute target for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
        "\n",
        "#### Creation of AmlCompute takes approximately 5 minutes. \n",
        "If an AmlCompute cluster with that name already exists in your workspace, this code will skip the creation process.\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your CPU cluster\n",
        "amlcompute_cluster_name = \"cont-cluster\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
        "    print(\"Found existing cluster, use it.\")\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=\"STANDARD_DS12_V2\", max_nodes=4\n",
        "    )\n",
        "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Run Configuration"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n",
        "\n",
        "# create a new RunConfig object\n",
        "conda_run_config = RunConfiguration(framework=\"python\")\n",
        "\n",
        "# Set compute target to AmlCompute\n",
        "conda_run_config.target = compute_target\n",
        "\n",
        "conda_run_config.environment.docker.enabled = True\n",
        "\n",
        "cd = CondaDependencies.create(\n",
        "    pip_packages=[\n",
        "        \"azureml-sdk[automl]\",\n",
        "        \"applicationinsights\",\n",
        "        \"azureml-opendatasets\",\n",
        "        \"azureml-defaults\",\n",
        "    ],\n",
        "    conda_packages=[\"numpy==1.19.5\"],\n",
        "    pin_sdk_version=False,\n",
        ")\n",
        "conda_run_config.environment.python.conda_dependencies = cd\n",
        "\n",
        "print(\"run config is ready\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data Ingestion Pipeline \n",
        "For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You can replace this with your own dataset, or you can skip this pipeline if you already have a time-series based `TabularDataset`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# The name and target column of the Dataset to create\n",
        "dataset = \"NOAA-Weather-DS4\"\n",
        "target_column_name = \"temperature\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n",
        "### Upload Data Step\n",
        "The data ingestion pipeline has a single step with a script that queries the latest weather data and uploads it to the blob store. During the first run, the script creates and registers a time-series based `TabularDataset` with two weeks of weather data ending on May 1, 2021. For each subsequent run, the script queries NOAA for weather data since the last modified time of the dataset (`dataset.data_changed_time`), writes it to a data.csv file, and uploads that file as a new partition in the blob store."
      ]
    },
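    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Each upload lands in a folder whose path encodes the upload time. That layout has to match the `partition_format` (`/{partition_date:yyyy/MM/dd/HH/mm/ss}/data.csv`) that `upload_weather_data.py` uses when it registers the dataset, so that every upload becomes a new `partition_date` value. A minimal standard-library sketch of the path construction, using a hypothetical helper name `partition_folder`:\n",
        "\n",
        "```python\n",
        "from datetime import datetime\n",
        "\n",
        "\n",
        "def partition_folder(ds_name, end_time):\n",
        "    # Same {ds_name}/yyyy/MM/dd/HH/mm/ss layout that upload_weather_data.py builds\n",
        "    return '{}/{}'.format(ds_name, end_time.strftime('%Y/%m/%d/%H/%M/%S'))\n",
        "\n",
        "\n",
        "folder = partition_folder('NOAA-Weather-DS4', datetime(2021, 5, 1, 0, 0))\n",
        "print(folder + '/data.csv')  # NOAA-Weather-DS4/2021/05/01/00/00/00/data.csv\n",
        "```"
      ]
    },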
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core import Pipeline, PipelineParameter\n",
        "from azureml.pipeline.steps import PythonScriptStep\n",
        "\n",
        "ds_name = PipelineParameter(name=\"ds_name\", default_value=dataset)\n",
        "upload_data_step = PythonScriptStep(\n",
        "    script_name=\"upload_weather_data.py\",\n",
        "    allow_reuse=False,\n",
        "    name=\"upload_weather_data\",\n",
        "    arguments=[\"--ds_name\", ds_name],\n",
        "    compute_target=compute_target,\n",
        "    runconfig=conda_run_config,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Submit Pipeline Run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_pipeline = Pipeline(\n",
        "    description=\"pipeline_with_uploaddata\", workspace=ws, steps=[upload_data_step]\n",
        ")\n",
        "data_pipeline_run = experiment.submit(\n",
        "    data_pipeline, pipeline_parameters={\"ds_name\": dataset}\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_pipeline_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Training Pipeline\n",
        "### Prepare Training Data Step\n",
        "\n",
        "This step runs a script that checks whether new data has become available since the model was last trained. If no new data is available, it cancels the remaining pipeline steps. We need to set the `allow_reuse` flag to `False` so that the step runs even when its inputs don't change. The script also needs the name of the model so it can look up when the model was last trained."
      ]
    },
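    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The decision made by `check_data.py` comes down to comparing two timezone-aware timestamps: the dataset's `data_changed_time` (treated as UTC) and the `created_time` of the registered model, falling back to `datetime.min` when no model exists yet. A minimal sketch of that comparison, using the hypothetical helper name `has_new_data` and the standard library's `timezone.utc` in place of the script's `pytz.UTC`:\n",
        "\n",
        "```python\n",
        "from datetime import datetime, timezone\n",
        "\n",
        "\n",
        "def has_new_data(dataset_changed_time, last_train_time=None):\n",
        "    # No registered model yet: compare against the earliest representable time\n",
        "    if last_train_time is None:\n",
        "        last_train_time = datetime.min.replace(tzinfo=timezone.utc)\n",
        "    # Treat the dataset timestamp as UTC, as check_data.py does\n",
        "    changed = dataset_changed_time.replace(tzinfo=timezone.utc)\n",
        "    # Retrain only if the data changed strictly after the last training run\n",
        "    return changed > last_train_time\n",
        "\n",
        "\n",
        "trained = datetime(2021, 5, 1, tzinfo=timezone.utc)\n",
        "print(has_new_data(datetime(2021, 5, 2), trained))  # True\n",
        "print(has_new_data(datetime(2021, 4, 30), trained))  # False\n",
        "```"
      ]
    },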
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core import PipelineData\n",
        "\n",
        "# The model name with which to register the trained model in the workspace.\n",
        "model_name = PipelineParameter(\"model_name\", default_value=\"noaaweatherds\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_prep_step = PythonScriptStep(\n",
        "    script_name=\"check_data.py\",\n",
        "    allow_reuse=False,\n",
        "    name=\"check_data\",\n",
        "    arguments=[\"--ds_name\", ds_name, \"--model_name\", model_name],\n",
        "    compute_target=compute_target,\n",
        "    runconfig=conda_run_config,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core import Dataset\n",
        "\n",
        "train_ds = Dataset.get_by_name(ws, dataset)\n",
        "train_ds = train_ds.drop_columns([\"partition_date\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### AutoMLStep\n",
        "Create an AutoMLConfig and a training step."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.train.automl import AutoMLConfig\n",
        "from azureml.pipeline.steps import AutoMLStep\n",
        "\n",
        "automl_settings = {\n",
        "    \"iteration_timeout_minutes\": 10,\n",
        "    \"experiment_timeout_hours\": 0.25,\n",
        "    \"n_cross_validations\": 3,\n",
        "    \"primary_metric\": \"r2_score\",\n",
        "    \"max_concurrent_iterations\": 3,\n",
        "    \"max_cores_per_iteration\": -1,\n",
        "    \"verbosity\": logging.INFO,\n",
        "    \"enable_early_stopping\": True,\n",
        "}\n",
        "\n",
        "automl_config = AutoMLConfig(\n",
        "    task=\"regression\",\n",
        "    debug_log=\"automl_errors.log\",\n",
        "    path=\".\",\n",
        "    compute_target=compute_target,\n",
        "    training_data=train_ds,\n",
        "    label_column_name=target_column_name,\n",
        "    **automl_settings,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core import PipelineData, TrainingOutput\n",
        "\n",
        "metrics_output_name = \"metrics_output\"\n",
        "best_model_output_name = \"best_model_output\"\n",
        "\n",
        "metrics_data = PipelineData(\n",
        "    name=\"metrics_data\",\n",
        "    datastore=dstor,\n",
        "    pipeline_output_name=metrics_output_name,\n",
        "    training_output=TrainingOutput(type=\"Metrics\"),\n",
        ")\n",
        "model_data = PipelineData(\n",
        "    name=\"model_data\",\n",
        "    datastore=dstor,\n",
        "    pipeline_output_name=best_model_output_name,\n",
        "    training_output=TrainingOutput(type=\"Model\"),\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_step = AutoMLStep(\n",
        "    name=\"automl_module\",\n",
        "    automl_config=automl_config,\n",
        "    outputs=[metrics_data, model_data],\n",
        "    allow_reuse=False,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Register Model Step\n",
        "This step runs a script that registers the best model from the AutoML step in the workspace, together with its training dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "register_model_step = PythonScriptStep(\n",
        "    script_name=\"register_model.py\",\n",
        "    name=\"register_model\",\n",
        "    allow_reuse=False,\n",
        "    arguments=[\n",
        "        \"--model_name\",\n",
        "        model_name,\n",
        "        \"--model_path\",\n",
        "        model_data,\n",
        "        \"--ds_name\",\n",
        "        ds_name,\n",
        "    ],\n",
        "    inputs=[model_data],\n",
        "    compute_target=compute_target,\n",
        "    runconfig=conda_run_config,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Submit Pipeline Run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
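        "# Make sure the data check finishes before AutoML training starts\n",
        "automl_step.run_after(data_prep_step)\n",
        "\n",
        "training_pipeline = Pipeline(\n",
        "    description=\"training_pipeline\",\n",
        "    workspace=ws,\n",
        "    steps=[data_prep_step, automl_step, register_model_step],\n",
        ")"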
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_pipeline_run = experiment.submit(\n",
        "    training_pipeline,\n",
        "    pipeline_parameters={\"ds_name\": dataset, \"model_name\": \"noaaweatherds\"},\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_pipeline_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Publish Retraining Pipeline and Schedule\n",
        "Once we are happy with the pipeline, we can publish the training pipeline to the workspace and create a schedule that triggers on blob changes. The schedule polls the datastore where the data is being uploaded every `polling_interval` minutes (1440 minutes, or once a day, in the cell below) and runs the retraining pipeline if the data has changed. A new version of the model will be registered to the workspace once the run is complete."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline_name = \"Retraining-Pipeline-NOAAWeather\"\n",
        "\n",
        "published_pipeline = training_pipeline.publish(\n",
        "    name=pipeline_name, description=\"Pipeline that retrains AutoML model\"\n",
        ")\n",
        "\n",
        "published_pipeline"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core import Schedule\n",
        "\n",
        "schedule = Schedule.create(\n",
        "    workspace=ws,\n",
        "    name=\"RetrainingSchedule\",\n",
        "    pipeline_parameters={\"ds_name\": dataset, \"model_name\": \"noaaweatherds\"},\n",
        "    pipeline_id=published_pipeline.id,\n",
        "    experiment_name=experiment_name,\n",
        "    datastore=dstor,\n",
        "    wait_for_provisioning=True,\n",
        "    polling_interval=1440,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Test Retraining\n",
        "Here we set up the data ingestion pipeline to run on a schedule, to verify that the retraining pipeline runs as expected.\n",
        "\n",
        "Note: \n",
        "* Azure NOAA Weather data is updated daily, and retraining will not be triggered if there is no new data available. \n",
        "* Depending on the polling interval set in the schedule, the retraining may take some time to trigger after the data ingestion pipeline completes."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline_name = \"DataIngestion-Pipeline-NOAAWeather\"\n",
        "\n",
        "published_pipeline = data_pipeline.publish(\n",
        "    name=pipeline_name, description=\"Pipeline that updates NOAAWeather Dataset\"\n",
        ")\n",
        "\n",
        "published_pipeline"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence\n",
        "\n",
        "# Run the data ingestion pipeline once a day (time-based trigger)\n",
        "recurrence = ScheduleRecurrence(frequency=\"Day\", interval=1)\n",
        "\n",
        "schedule = Schedule.create(\n",
        "    workspace=ws,\n",
        "    name=\"RetrainingSchedule-DataIngestion\",\n",
        "    pipeline_parameters={\"ds_name\": dataset},\n",
        "    pipeline_id=published_pipeline.id,\n",
        "    experiment_name=experiment_name,\n",
        "    recurrence=recurrence,\n",
        "    wait_for_provisioning=True,\n",
        ")"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "vivijay"
      }
    ],
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
 }
--- a/how-to-use-azureml/automated-machine-learning/continuous-retraining/check_data.py
+++ b/how-to-use-azureml/automated-machine-learning/continuous-retraining/check_data.py
@@ -0,0 +1,49 @@
 import argparse
 import os
 import azureml.core
 from datetime import datetime
 import pandas as pd
 import pytz
 from azureml.core import Dataset, Model
 from azureml.core.run import Run, _OfflineRun
 from azureml.core import Workspace
 run = Run.get_context()
 ws = None
 if type(run) == _OfflineRun:
    ws = Workspace.from_config()
 else:
    ws = run.experiment.workspace
 print("Check for new data.")
 parser = argparse.ArgumentParser("split")
 parser.add_argument("--ds_name", help="input dataset name")
 parser.add_argument("--model_name", help="name of the deployed model")
 args = parser.parse_args()
 print("Argument 1(ds_name): %s" % args.ds_name)
 print("Argument 2(model_name): %s" % args.model_name)
 # Get the latest registered model
 try:
    model = Model(ws, args.model_name)
    last_train_time = model.created_time
    print("Model was last trained on {0}.".format(last_train_time))
 except Exception:
    print("Could not get last model train time.")
    last_train_time = datetime.min.replace(tzinfo=pytz.UTC)
 train_ds = Dataset.get_by_name(ws, args.ds_name)
 dataset_changed_time = train_ds.data_changed_time.replace(tzinfo=pytz.UTC)
 print("dataset_changed_time=" + str(dataset_changed_time))
 print("last_train_time=" + str(last_train_time))
 if dataset_changed_time <= last_train_time:
    print("Cancelling run since there is no new data.")
    run.parent.cancel()
 else:
    # New data is available since the model was last trained
    print("Dataset was last updated on {0}. Retraining...".format(dataset_changed_time))
--- a/how-to-use-azureml/automated-machine-learning/continuous-retraining/register_model.py
+++ b/how-to-use-azureml/automated-machine-learning/continuous-retraining/register_model.py
@@ -0,0 +1,35 @@
 from azureml.core import Dataset
 from azureml.core.model import Model
 from azureml.core.run import Run, _OfflineRun
 from azureml.core import Workspace
 import argparse
 parser = argparse.ArgumentParser()
 parser.add_argument("--model_name")
 parser.add_argument("--model_path")
 parser.add_argument("--ds_name")
 args = parser.parse_args()
 print("Argument 1(model_name): %s" % args.model_name)
 print("Argument 2(model_path): %s" % args.model_path)
 print("Argument 3(ds_name): %s" % args.ds_name)
 run = Run.get_context()
 ws = None
 if type(run) == _OfflineRun:
    ws = Workspace.from_config()
 else:
    ws = run.experiment.workspace
 train_ds = Dataset.get_by_name(ws, args.ds_name)
 datasets = [(Dataset.Scenario.TRAINING, train_ds)]
 # Register model with training dataset
 model = Model.register(
    workspace=ws,
    model_path=args.model_path,
    model_name=args.model_name,
    datasets=datasets,
 )
 print("Registered version {0} of model {1}".format(model.version, model.name))
--- a/how-to-use-azureml/automated-machine-learning/continuous-retraining/upload_weather_data.py
+++ b/how-to-use-azureml/automated-machine-learning/continuous-retraining/upload_weather_data.py
@@ -0,0 +1,161 @@
 import argparse
 import os
 from datetime import datetime
 from dateutil.relativedelta import relativedelta
 import pandas as pd
 import traceback
 from azureml.core import Dataset
 from azureml.core.run import Run, _OfflineRun
 from azureml.core import Workspace
 from azureml.opendatasets import NoaaIsdWeather
 run = Run.get_context()
 ws = None
 if type(run) == _OfflineRun:
    ws = Workspace.from_config()
 else:
    ws = run.experiment.workspace
 usaf_list = [
    "725724",
    "722149",
    "723090",
    "722159",
    "723910",
    "720279",
    "725513",
    "725254",
    "726430",
    "720381",
    "723074",
    "726682",
    "725486",
    "727883",
    "723177",
    "722075",
    "723086",
    "724053",
    "725070",
    "722073",
    "726060",
    "725224",
    "725260",
    "724520",
    "720305",
    "724020",
    "726510",
    "725126",
    "722523",
    "703333",
    "722249",
    "722728",
    "725483",
    "722972",
    "724975",
    "742079",
    "727468",
    "722193",
    "725624",
    "722030",
    "726380",
    "720309",
    "722071",
    "720326",
    "725415",
    "724504",
    "725665",
    "725424",
    "725066",
 ]
 def get_noaa_data(start_time, end_time):
    columns = [
        "usaf",
        "wban",
        "datetime",
        "latitude",
        "longitude",
        "elevation",
        "windAngle",
        "windSpeed",
        "temperature",
        "stationName",
        "p_k",
    ]
    isd = NoaaIsdWeather(start_time, end_time, cols=columns)
    noaa_df = isd.to_pandas_dataframe()
    df_filtered = noaa_df[noaa_df["usaf"].isin(usaf_list)]
    df_filtered = df_filtered.reset_index(drop=True)
    print(
        "Received {0} rows of training data between {1} and {2}".format(
            df_filtered.shape[0], start_time, end_time
        )
    )
    return df_filtered
 print("Check for new data and prepare the data")
 parser = argparse.ArgumentParser("split")
 parser.add_argument("--ds_name", help="name of the Dataset to update")
 args = parser.parse_args()
 print("Argument 1(ds_name): %s" % args.ds_name)
 dstor = ws.get_default_datastore()
 register_dataset = False
 end_time = datetime.utcnow()
 try:
    ds = Dataset.get_by_name(ws, args.ds_name)
    end_time_last_slice = ds.data_changed_time.replace(tzinfo=None)
    print("Dataset {0} last updated on {1}".format(args.ds_name, end_time_last_slice))
 except Exception:
    print(traceback.format_exc())
    print(
        "Dataset with name {0} not found, registering new dataset.".format(args.ds_name)
    )
    register_dataset = True
    end_time = datetime(2021, 5, 1, 0, 0)
    end_time_last_slice = end_time - relativedelta(weeks=2)
 try:
    train_df = get_noaa_data(end_time_last_slice, end_time)
 except Exception as ex:
    print("get_noaa_data failed:", ex)
    train_df = None
 if train_df is not None and train_df.size > 0:
    print(
        "Received {0} rows of new data after {1}.".format(
            train_df.shape[0], end_time_last_slice
        )
    )
    folder_name = "{}/{:04d}/{:02d}/{:02d}/{:02d}/{:02d}/{:02d}".format(
        args.ds_name,
        end_time.year,
        end_time.month,
        end_time.day,
        end_time.hour,
        end_time.minute,
        end_time.second,
    )
    file_path = "{0}/data.csv".format(folder_name)
    # Add a new partition to the registered dataset
    os.makedirs(folder_name, exist_ok=True)
    train_df.to_csv(file_path, index=False)
    dstor.upload_files(
        files=[file_path], target_path=folder_name, overwrite=True, show_progress=True
    )
 else:
    print("No new data since {0}.".format(end_time_last_slice))
 if register_dataset:
    ds = Dataset.Tabular.from_delimited_files(
        dstor.path("{}/**/*.csv".format(args.ds_name)),
        partition_format="/{partition_date:yyyy/MM/dd/HH/mm/ss}/data.csv",
    )
    ds.register(ws, name=args.ds_name)
--- a/how-to-use-azureml/automated-machine-learning/experimental/README.md
+++ b/how-to-use-azureml/automated-machine-learning/experimental/README.md
@@ -0,0 +1,92 @@
 # Experimental Notebooks for Automated ML
 The notebooks in this folder use experimental features. Namespaces or function signatures may change in future SDK releases; the notebooks published here reflect the latest supported APIs. All of these notebooks can run on a client-only installation of the Automated ML SDK.
 The client-only installation doesn't include any of the machine learning libraries, such as scikit-learn, xgboost, or tensorflow, which makes it much faster to install and less likely to conflict with packages in an existing environment. However, because the ML libraries are not available locally, models cannot be downloaded and loaded directly in the client. To replace that functionality, these notebooks also demonstrate the ModelProxy feature, which lets you submit predict/forecast requests to the training environment.
 <a name="localconda"></a>
 ## Setup using a Local Conda environment
 To run these notebooks on your own notebook server, use these installation instructions.
 The instructions below install everything you need and then start a Jupyter notebook.
 ### 1. Install Miniconda from [here](https://conda.io/miniconda.html), choosing 64-bit Python 3.8 or higher.
 - **Note**: if you already have conda installed, you can keep using it, but it should be version 4.4.10 or later (as shown by `conda -V`). If you have an older version, you can update it with `conda update conda`. There's no need to install Miniconda specifically.
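The requirements in step 1 and the note above (64-bit Python 3.8 or higher, conda 4.4.10 or later) can be checked from Python before running the setup script. This is a minimal sketch, not part of the SDK: the helper names are hypothetical, and it assumes you pass in the text printed by `conda -V`. Versions are compared as integer tuples because plain string comparison gets cases like `4.10` versus `4.4` wrong.

```python
import re
import sys

# Minimum versions taken from step 1 and the note above.
MIN_PYTHON = (3, 8)
MIN_CONDA = (4, 4, 10)


def parse_version(text):
    """Return the first dotted version number in text (e.g. '4.4.10') as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def conda_is_recent_enough(conda_v_output):
    """Check the text printed by `conda -V` (e.g. 'conda 4.4.10') against MIN_CONDA."""
    return parse_version(conda_v_output) >= MIN_CONDA


def python_is_recent_enough(version_info=sys.version_info):
    """Check a (major, minor, ...) version tuple against MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def python_is_64bit():
    """True on a 64-bit interpreter, where sys.maxsize is 2**63 - 1."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print("Python version OK:", python_is_recent_enough())
    print("64-bit Python:", python_is_64bit())
    # Paste the output of `conda -V` here; the value below is only an example.
    print("conda version OK:", conda_is_recent_enough("conda 4.4.10"))
```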
 ### 2. Downloading the sample notebooks
 - Download the sample notebooks from [GitHub](https://github.com/Azure/MachineLearningNotebooks) as zip and extract the contents to a local directory.  The automated ML sample notebooks are in the "automated-machine-learning" folder.
 ### 3. Set up a new conda environment
 The **automl_setup_thin_client** script creates a new conda environment, installs the necessary packages, configures the widget, and starts a Jupyter notebook. It takes the conda environment name as an optional parameter; the default name is azure_automl_experimental. The exact command depends on the operating system; see the sections below for Windows, Mac, and Linux. The script can take about 10 minutes to run.
 Packages installed by the **automl_setup_thin_client** script:
    <ul><li>python</li><li>nb_conda</li><li>matplotlib</li><li>numpy</li><li>cython</li><li>urllib3</li><li>pandas</li><li>azureml-sdk</li><li>azureml-widgets</li><li>pandas-ml</li></ul>
 For more details refer to the [automl_env_thin_client.yml](./automl_env_thin_client.yml)
 ## Windows
 Start an **Anaconda Prompt** window, cd to the **how-to-use-azureml/automated-machine-learning/experimental** folder where the sample notebooks were extracted and then run:
 ```
 automl_setup_thin_client
 ```
 ## Mac
 Install "Command line developer tools" if it is not already installed (you can use the command: `xcode-select --install`).
 Start a Terminal window, cd to the **how-to-use-azureml/automated-machine-learning/experimental** folder where the sample notebooks were extracted and then run:
 ```
 bash automl_setup_thin_client_mac.sh
 ```
 ## Linux
 cd to the **how-to-use-azureml/automated-machine-learning/experimental** folder where the sample notebooks were extracted and then run:
 ```
 bash automl_setup_thin_client_linux.sh
 ```
 ### 4. Running configuration.ipynb
 - Before running any samples, you first need to run the [configuration](../../configuration.ipynb) notebook.
 - Execute the cells in the notebook to register the Machine Learning Services resource provider and create a workspace (*instructions in the notebook*).
 ### 5. Running Samples
 - Make sure you use the Python [conda env:azure_automl_experimental] kernel when running the sample notebooks.
 - Follow the instructions in the individual notebooks to explore various features in automated ML.
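To confirm from inside a notebook that the kernel runs in the environment created by the setup script, you can inspect `sys.prefix`. This is a minimal sketch, assuming the default environment name and conda's usual `<conda root>/envs/<name>` layout for named environments; the helper names are hypothetical.

```python
import os
import sys

# Default environment name used by automl_setup_thin_client (assumption: not overridden).
EXPECTED_ENV = "azure_automl_experimental"


def active_env_name(prefix=sys.prefix):
    """Return the environment name implied by an interpreter prefix.

    Named conda environments live in <conda root>/envs/<name>, so the last
    path component of the prefix is the environment name.
    """
    return os.path.basename(os.path.normpath(prefix))


def using_expected_kernel(prefix=sys.prefix, expected=EXPECTED_ENV):
    """True if the interpreter prefix belongs to the expected environment."""
    return active_env_name(prefix) == expected


if __name__ == "__main__":
    # Run in a notebook cell to see which environment backs the current kernel.
    print("Kernel environment:", active_env_name())
    print("Using", EXPECTED_ENV + ":", using_expected_kernel())
```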
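 ### 6. Starting Jupyter notebook manually
 To start your Jupyter notebook manually, use the following (substitute your environment name if you passed a different one to the setup script):
 ```
 conda activate azure_automl_experimental
 jupyter notebook
 ```
 or on Mac or Linux:
 ```
 source activate azure_automl_experimental
 jupyter notebook
 ```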
 <a name="samples"></a>
 # Automated ML SDK Sample Notebooks
 - [auto-ml-regression-model-proxy.ipynb](regression-model-proxy/auto-ml-regression-model-proxy.ipynb)
    - Dataset: Hardware Performance Dataset
    - Simple example of using automated ML for regression
    - Uses Azure compute for training
    - Uses ModelProxy to submit prediction requests to the training environment on Azure compute
 <a name="documentation"></a>
 See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
 <a name="pythoncommand"></a>
 # Running using python command
 Jupyter notebook provides a File / Download as / Python (.py) option for saving the notebook as a Python file.
 You can then run this file using the python command.
 However, on Windows the file needs to be modified before it can be run.
 The following condition must be added to the main code in the file:
    if __name__ == "__main__":
 The main code of the file must be indented so that it is under this condition.
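A minimal sketch of the resulting layout follows; `run_notebook_code` is a hypothetical name for the code exported from the notebook. The README does not say why Windows needs the guard, but the usual reason is that Python's `multiprocessing` starts child processes on Windows with the spawn method, which re-imports the main module, so any unguarded top-level code would run again in each child process.

```python
def run_notebook_code():
    """Hypothetical stand-in for the code exported from the notebook.

    Workspace setup, AutoMLConfig creation and experiment.submit(...) would
    go here, indented one level so they only run when this function is called.
    """
    return "done"


if __name__ == "__main__":
    # Runs only when the file is executed directly (python my_notebook.py),
    # not when the module is imported, for example by a child process that
    # Windows starts with the "spawn" method.
    print(run_notebook_code())
```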
--- a/how-to-use-azureml/automated-machine-learning/experimental/autofeaturization-codegen/codegen-for-autofeaturization.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/experimental/autofeaturization-codegen/codegen-for-autofeaturization.ipynb
@@ -0,0 +1,346 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/custom-model-training-from-autofeaturization-run.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning - Codegen for AutoFeaturization \n",
        "_**Autofeaturization of credit card fraudulent transactions dataset on remote compute and codegen functionality**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Autofeaturization](#Autofeaturization)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Introduction'></a>\n",
        "## Introduction"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**Autofeaturization** lets you run an AutoML experiment that only featurizes the datasets. The featurized datasets, along with the transformer, are stored in AML storage and linked to the run, from which they can later be retrieved and used to train models.\n",
        "\n",
        "**To run Autofeaturization, set `iterations` to 0 and `featurization` to `'auto'`.**\n",
        "\n",
        "Please refer to [Autofeaturization and custom model training](../autofeaturization-custom-model-training/custom-model-training-from-autofeaturization-run.ipynb) for more details.\n",
        "\n",
        "[Codegen](https://github.com/Azure/automl-codegen-preview) is a feature that, when enabled, provides you with the script behind the run and a notebook in which you can tweak the inputs or code and rerun it.\n",
        "\n",
        "In this example we use the credit card fraudulent transactions dataset to show how you can use AutoML for autofeaturization and how to enable the `Codegen` feature.\n",
        "\n",
        "This notebook uses remote compute to complete the featurization.\n",
        "\n",
        "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../configuration.ipynb) notebook first, if you haven't already, to establish your connection to the AzureML Workspace.\n",
        "\n",
        "Here you will learn how to create an autofeaturization experiment with the codegen feature enabled, using an existing workspace."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Setup'></a>\n",
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "import pandas as pd\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.core.dataset import Dataset\n",
        "from azureml.train.automl import AutoMLConfig"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"This notebook was created using version 1.59.0 of the Azure ML SDK\")\n",
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# choose a name for experiment\n",
        "experiment_name = 'automl-autofeaturization-ccard-codegen-remote'\n",
        "\n",
        "experiment=Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Experiment Name'] = experiment.name\n",
        "outputDf = pd.DataFrame(data = output, index = [''])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create or Attach existing AmlCompute\n",
        "A compute target is required to execute the Automated ML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
        "\n",
        "#### Creation of AmlCompute takes approximately 5 minutes. \n",
        "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your CPU cluster\n",
        "cpu_cluster_name = \"cpu-codegen\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
        "    print('Found existing cluster, use it.')\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
        "                                                           max_nodes=6)\n",
        "    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Data'></a>\n",
        "## Data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load Data\n",
        "\n",
        "Load the credit card fraudulent transactions dataset from a CSV file, containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. \n",
        "\n",
        "Here the autofeaturization run will featurize the training data passed in."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "##### Training Dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard_train.csv\"\n",
        "training_dataset = Dataset.Tabular.from_delimited_files(training_data) # Tabular dataset\n",
        "\n",
        "label_column_name = 'Class' # output label"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Autofeaturization'></a>\n",
        "## AutoFeaturization\n",
        "\n",
        "Instantiate an AutoMLConfig object. This defines the settings and data used to run the autofeaturization experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|classification or regression or forecasting|\n",
        "|**training_data**|Input training dataset, containing both features and label column.|\n",
        "|**iterations**|For an autofeaturization run, iterations will be 0.|\n",
        "|**featurization**|For an autofeaturization run, featurization can be 'auto' or 'custom'.|\n",
        "|**label_column_name**|The name of the label column.|\n",
        "|**enable_code_generation**|Set to `True` to enable codegen for the run.|\n",
        "\n",
        "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_config = AutoMLConfig(task = 'classification',\n",
        "                             debug_log = 'automl_errors.log',\n",
        "                             iterations = 0, # autofeaturization run can be triggered by setting iterations to 0\n",
        "                             compute_target = compute_target,\n",
        "                             training_data = training_dataset,\n",
        "                             label_column_name = label_column_name,\n",
        "                             featurization = 'auto',\n",
        "                             verbosity = logging.INFO,\n",
        "                             enable_code_generation = True # enable codegen\n",
        "                            )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Depending on the data, this can run for a while. Setting `show_output=True` displays validation errors and the current status, and makes the execution synchronous."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run = experiment.submit(automl_config, show_output = False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Results"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Widget for Monitoring Runs\n",
        "\n",
        "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
        "\n",
        "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.widgets import RunDetails\n",
        "RunDetails(remote_run).show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Codegen Script and Notebook"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The codegen script and notebook are available in the `Outputs + logs` section of the remote run's details page. Look for `autofeaturization_notebook.ipynb` under `/outputs/generated_code`. To modify the featurization code, open `script.py` and edit it. The codegen notebook can be run with the same environment configuration as the AutoML run above."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Experiment Complete!"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "bhavanatumma"
      }
    ],
    "interpreter": {
      "hash": "adb464b67752e4577e3dc163235ced27038d19b7d88def00d75d1975bde5d9ab"
    },
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
 }
--- a/how-to-use-azureml/automated-machine-learning/experimental/autofeaturization-custom-model-training/custom-model-training-from-autofeaturization-run.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/experimental/autofeaturization-custom-model-training/custom-model-training-from-autofeaturization-run.ipynb
@@ -0,0 +1,729 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/custom-model-training-from-autofeaturization-run.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning - AutoFeaturization (Part 1)\n",
        "_**Autofeaturization of credit card fraudulent transactions dataset on remote compute**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Autofeaturization](#Autofeaturization)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Introduction'></a>\n",
        "## Introduction"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Autofeaturization lets you run an AutoML experiment that only featurizes the datasets. The featurized datasets, along with the transformer, are stored in the experiment, from which they can later be retrieved and used to train models, either with AutoML or with custom training.\n",
        "\n",
        "**To run Autofeaturization, set `iterations` to 0 and `featurization` to `'auto'`. The run featurizes the datasets and then ends; no training takes place.**\n",
        "\n",
        "*Limitation: sparse data is not supported at the moment. A dataset with extensive categorical data might be featurized into sparse data, which is not allowed as input to AutoML. Support for sparse data is in progress.*\n",
        "\n",
        "In this example we use the credit card fraudulent transactions dataset to show how you can use AutoML for autofeaturization. The goal is to clean and featurize the training dataset.\n",
        "\n",
        "This notebook uses remote compute to complete the featurization.\n",
        "\n",
        "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../configuration.ipynb) notebook first, if you haven't already, to establish your connection to the AzureML Workspace.\n",
        "\n",
        "In the steps below, you will learn how to:\n",
        "1. Create an autofeaturization experiment using an existing workspace.\n",
        "2. View the featurized datasets and transformer."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Setup'></a>\n",
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "import pandas as pd\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.core.dataset import Dataset\n",
        "from azureml.train.automl import AutoMLConfig"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"This notebook was created using version 1.59.0 of the Azure ML SDK\")\n",
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# choose a name for experiment\n",
        "experiment_name = 'automl-autofeaturization-ccard-remote'\n",
        "\n",
        "experiment=Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Experiment Name'] = experiment.name\n",
        "outputDf = pd.DataFrame(data = output, index = [''])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Create or Attach existing AmlCompute\n",
        "A compute target is required to execute the Automated ML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
        "\n",
        "#### Creation of AmlCompute takes approximately 5 minutes. \n",
        "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your CPU cluster\n",
        "cpu_cluster_name = \"cpu-cluster\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
        "    print('Found existing cluster, use it.')\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
        "                                                           max_nodes=6)\n",
        "    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Data'></a>\n",
        "## Data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load Data\n",
        "\n",
        "Load the credit card fraudulent transactions dataset from a CSV file, containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. \n",
        "\n",
        "Here the autofeaturization run will featurize the training data passed in."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "##### Training Dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard_train.csv\"\n",
        "training_dataset = Dataset.Tabular.from_delimited_files(training_data) # Tabular dataset\n",
        "\n",
        "label_column_name = 'Class' # output label"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Autofeaturization'></a>\n",
        "## AutoFeaturization\n",
        "\n",
        "Instantiate an AutoMLConfig object. This defines the settings and data used to run the autofeaturization experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|classification or regression|\n",
        "|**training_data**|Input training dataset, containing both features and label column.|\n",
        "|**iterations**|For an autofeaturization run, iterations will be 0.|\n",
        "|**featurization**|For an autofeaturization run, featurization will be 'auto'.|\n",
        "|**label_column_name**|The name of the label column.|\n",
        "\n",
        "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_config = AutoMLConfig(task = 'classification',\n",
        "                             debug_log = 'automl_errors.log',\n",
        "                             iterations = 0, # autofeaturization run can be triggered by setting iterations to 0\n",
        "                             compute_target = compute_target,\n",
        "                             training_data = training_dataset,\n",
        "                             label_column_name = label_column_name,\n",
        "                             featurization = 'auto',\n",
        "                             verbosity = logging.INFO\n",
        "                            )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Depending on the data, this can run for a while. Setting `show_output=True` displays validation errors and the current status, and makes the execution synchronous."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run = experiment.submit(automl_config, show_output = False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Transformer and Featurized Datasets\n",
        "Once the run completes, the featurized datasets appear under `Outputs + logs` on the details page of the remote run, organized as shown below: the featurized dataset is stored under `/outputs/featurization/data` and the transformer under `/outputs/featurization/pipeline`.\n",
        "\n",
        "Below you will learn how to refer to the data saved in your run and retrieve it."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Featurized Data](https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/autofeaturization_img.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Results"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Widget for Monitoring Runs\n",
        "\n",
        "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n",
        "\n",
        "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.widgets import RunDetails\n",
        "RunDetails(remote_run).show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning - AutoFeaturization (Part 2)\n",
        "_**Training a custom model with the featurized data from an autofeaturization run on the credit card fraud transactions dataset**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Data Setup](#DataSetup)\n",
        "1. [Autofeaturization Data](#AutofeaturizationData)\n",
        "1. [Train](#Train)\n",
        "1. [Results](#Results)\n",
        "1. [Test](#Test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Introduction'></a>\n",
        "## Introduction\n",
        "\n",
        "Here we use the featurized dataset saved by the run above to show how you can perform custom training, using the transformer from the autofeaturization run to transform validation and test datasets.\n",
        "\n",
        "The goal is to use the data and transformer from the autofeaturization run to run a custom training experiment independently.\n",
        "\n",
        "In the below steps, you will learn how to:\n",
        "1. Read transformer from a completed autofeaturization run and transform data\n",
        "2. Pull featurized data from a completed autofeaturization run\n",
        "3. Run a custom training experiment with the above data\n",
        "4. Check results"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='DataSetup'></a>\n",
        "## Data Setup"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We will load the featurized training data and also load the transformer from the above autofeaturized run. This transformer can then be used to transform the test data to check the accuracy of the custom model after training."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load Test Data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Load the test dataset from CSV and split it into X and y columns so they can be featurized with the transformer later."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/creditcard_test.csv\"\n",
        "\n",
        "test_dataset = pd.read_csv(test_data)\n",
        "label_column_name = 'Class'\n",
        "\n",
        "X_test_data = test_dataset[test_dataset.columns.difference([label_column_name])]\n",
        "y_test_data = test_dataset[label_column_name].values\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load the data transformer from the remote run artifacts"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### (Method 1)\n",
        "\n",
        "Method 1 builds an MLflow URI that reads the transformer directly from the remote run storage."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import mlflow\n",
        "mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())\n",
        "\n",
        "# Set uri to fetch data transformer from remote parent run.\n",
        "artifact_path = \"/outputs/featurization/pipeline/\"\n",
        "uri = \"runs:/\" + remote_run.id + artifact_path\n",
        "\n",
        "print(uri)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### (Method 2)\n",
        "\n",
        "Method 2 downloads the transformer to a local directory, from which it can be loaded to transform the data. Uncomment to use."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "''' import pathlib\n",
        "\n",
        "# Download the transformer to the local directory\n",
        "transformers_file_path = \"/outputs/featurization/pipeline/\"\n",
        "local_path = \"./transformer\"\n",
        "remote_run.download_files(prefix=transformers_file_path, output_directory=local_path, batch_size=500)\n",
        "\n",
        "# Build a well-formed file URI (avoids the extra slash that \"file:///\" + an absolute POSIX path produces)\n",
        "path = pathlib.Path(local_path) / transformers_file_path.strip(\"/\")\n",
        "str_uri = path.absolute().as_uri()\n",
        "\n",
        "print(str_uri) '''"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Transform Data"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**Note:** Not all datasets produce a y_transformer. The dataset used in this notebook requires one because the y column is categorical.\n",
        "\n",
        "The commented code below loads the MLflow transformer model and uses it to transform the test data for the experiments that follow. To run it, make sure the environment requirements are satisfied: you can create the environment from the `conda.yaml` file under `/outputs/featurization/pipeline/` and run the code in it."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "''' from azureml.automl.core.shared.constants import Transformers\n",
        "\n",
        "transformers = mlflow.sklearn.load_model(uri) # Using method 1\n",
        "data_transformers = transformers.get_transformers()\n",
        "x_transformer = data_transformers[Transformers.X_TRANSFORMER]\n",
        "y_transformer = data_transformers[Transformers.Y_TRANSFORMER]\n",
        "\n",
        "X_test = x_transformer.transform(X_test_data)\n",
        "y_test = y_transformer.transform(y_test_data) '''"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run the following cell to see the featurization summary of the X transformer. Uncomment to use."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "''' X_data_summary = x_transformer.get_featurization_summary(is_user_friendly=False)\n",
        "\n",
        "summary_df = pd.DataFrame.from_records(X_data_summary)\n",
        "summary_df '''"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load Datastore\n",
        "\n",
        "The datastore below holds the featurized datasets, which we load and access next. Check the path and file names against the saved structure in your experiment's `Outputs + logs`, as seen in <i>Autofeaturization Part 1</i>."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.datastore import Datastore\n",
        "\n",
        "ds = Datastore.get(ws, \"workspaceartifactstore\")\n",
        "experiment_loc = \"ExperimentRun/dcid.\" + remote_run.id\n",
        "\n",
        "remote_data_path = \"/outputs/featurization/data/\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='AutofeaturizationData'></a>\n",
        "## Autofeaturization Data\n",
        "\n",
        "We will load the training data from the previously completed autofeaturization experiment. The resulting featurized dataframe can be passed into the custom model for training. Here we download the file from the experiment storage to a local path and read it."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "train_data_file_path = \"full_training_dataset.df.parquet\"\n",
        "local_data_path = \"./data/\" + train_data_file_path\n",
        "\n",
        "remote_run.download_file(remote_data_path + train_data_file_path, local_data_path)\n",
        "\n",
        "full_training_data = pd.read_parquet(local_data_path)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Another way to load the data is to open the autofeaturization experiment above and find the featurized dataset IDs under `Output datasets`. To use this approach, uncomment the code below and replace the ID accordingly."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# train_data = Dataset.get_by_id(ws, 'cb4418ee-bac4-45ac-b055-600653bdf83a') # replace the featurized full_training_dataset id\n",
        "# full_training_data = train_data.to_pandas_dataframe()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Training Data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Separate the label column (`automl_y`) and the sample-weight column (`automl_weights`) from the featurized training dataset; the remaining columns are the training features."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "Y_COLUMN = \"automl_y\"\n",
        "SW_COLUMN = \"automl_weights\"\n",
        "\n",
        "X_train = full_training_data[full_training_data.columns.difference([Y_COLUMN, SW_COLUMN])]\n",
        "y_train = full_training_data[Y_COLUMN].values\n",
        "sample_weight = full_training_data[SW_COLUMN].values"
      ]
    },
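    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "**Note on column order:** `Index.difference` returns its labels sorted, so `X_train` above may not keep the column order in which the transformer produced the features. If the transformed test data is a NumPy array in transformer order, its columns may not line up with `X_train`. The toy cell below uses hypothetical column names rather than the real featurized columns. It sketches the sorting behavior and an order-preserving alternative; verify the column order in your own run before scoring."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Hypothetical toy frame standing in for the featurized training data\n",
        "toy = pd.DataFrame({'b_feat': [1, 2], 'a_feat': [3, 4], 'automl_y': [0, 1], 'automl_weights': [1.0, 1.0]})\n",
        "drop_cols = ['automl_y', 'automl_weights']\n",
        "\n",
        "# Index.difference sorts its result by default: ['a_feat', 'b_feat']\n",
        "print(list(toy.columns.difference(drop_cols)))\n",
        "\n",
        "# A list comprehension keeps the original column order: ['b_feat', 'a_feat']\n",
        "print([col for col in toy.columns if col not in drop_cols])"
      ]
    },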
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Train'></a>\n",
        "## Train"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Here we pass the training data to a LightGBM classifier, but any custom model can be used with your data. First, install LightGBM."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "! pip install lightgbm"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import lightgbm as lgb\n",
        "\n",
        "model = lgb.LGBMClassifier(learning_rate=0.08, max_depth=-1, random_state=42)  # max_depth <= 0 means no depth limit\n",
        "model.fit(X_train, y_train, sample_weight=sample_weight)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Once training is done, the test data transformed with the transformer loaded above can be used to calculate the test accuracy."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print('Training accuracy {:.4f}'.format(model.score(X_train, y_train)))\n",
        "\n",
        "# Uncomment below to test the model on test data \n",
        "# print('Testing accuracy {:.4f}'.format(model.score(X_test, y_test)))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Results'></a>\n",
        "## Analyze results\n",
        "\n",
        "### Retrieve the Model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<a id='Test'></a>\n",
        "## Test the fitted model\n",
        "\n",
        "Now that the model is trained, run the transformed test data (`X_test`) through it to get the predicted values. Uncomment the cell below once `X_test` has been produced in the Transform Data step."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Uncomment below to test the model on test data\n",
        "# y_pred = model.predict(X_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Experiment Complete!"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "bhavanatumma"
      }
    ],
    "interpreter": {
      "hash": "adb464b67752e4577e3dc163235ced27038d19b7d88def00d75d1975bde5d9ab"
    },
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
 }
--- a/how-to-use-azureml/automated-machine-learning/experimental/automl_setup_thin_client.cmd
+++ b/how-to-use-azureml/automated-machine-learning/experimental/automl_setup_thin_client.cmd
@@ -0,0 +1,63 @@
@echo off
 set conda_env_name=%1
 set automl_env_file=%2
 set options=%3
 set PIP_NO_WARN_SCRIPT_LOCATION=0
 IF "%conda_env_name%"=="" SET conda_env_name="azure_automl_experimental"
 IF "%automl_env_file%"=="" SET automl_env_file="automl_thin_client_env.yml"
 IF NOT EXIST %automl_env_file% GOTO YmlMissing
 IF "%CONDA_EXE%"=="" GOTO CondaMissing
 call conda activate %conda_env_name% 2>nul:
 if not errorlevel 1 (
  echo Upgrading existing conda environment %conda_env_name%
  call pip uninstall azureml-train-automl -y -q
  call conda env update --name %conda_env_name% --file %automl_env_file%
  if errorlevel 1 goto ErrorExit
 ) else (
  call conda env create -f %automl_env_file% -n %conda_env_name%
 )
 call conda activate %conda_env_name% 2>nul:
 if errorlevel 1 goto ErrorExit
 call python -m ipykernel install --user --name %conda_env_name% --display-name "Python (%conda_env_name%)"
 REM azureml.widgets is now installed as part of the pip install under the conda env.
 REM Removing the old user install so that the notebooks will use the latest widget.
 call jupyter nbextension uninstall --user --py azureml.widgets
 echo.
 echo.
 echo ***************************************
 echo * AutoML setup completed successfully *
 echo ***************************************
 IF NOT "%options%"=="nolaunch" (
  echo.
  echo Starting jupyter notebook - please run the configuration notebook 
  echo.
  jupyter notebook --log-level=50 --notebook-dir='..\..'
 )
 goto End
 :CondaMissing
 echo Please run this script from an Anaconda Prompt window.
 echo You can start an Anaconda Prompt window by
 echo typing Anaconda Prompt on the Start menu.
 echo If you don't see the Anaconda Prompt app, install Miniconda.
 echo If you are running an older version of Miniconda or Anaconda,
 echo you can upgrade using the command: conda update conda
 goto End
 :YmlMissing
 echo File %automl_env_file% not found.
 :ErrorExit
 echo Install failed
 :End
--- a/how-to-use-azureml/automated-machine-learning/experimental/automl_setup_thin_client_linux.sh
+++ b/how-to-use-azureml/automated-machine-learning/experimental/automl_setup_thin_client_linux.sh
@@ -0,0 +1,53 @@
 #!/bin/bash
 CONDA_ENV_NAME=$1
 AUTOML_ENV_FILE=$2
 OPTIONS=$3
 PIP_NO_WARN_SCRIPT_LOCATION=0
 if [ "$CONDA_ENV_NAME" == "" ]
 then
  CONDA_ENV_NAME="azure_automl_experimental"
 fi
 if [ "$AUTOML_ENV_FILE" == "" ]
 then
  AUTOML_ENV_FILE="automl_thin_client_env.yml"
 fi
 if [ ! -f $AUTOML_ENV_FILE ]; then
    echo "File $AUTOML_ENV_FILE not found"
    exit 1
 fi
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
   echo "Upgrading existing conda environment" $CONDA_ENV_NAME
   pip uninstall azureml-train-automl -y -q
   conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
   jupyter nbextension uninstall --user --py azureml.widgets
 else
   conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
   source activate $CONDA_ENV_NAME &&
   python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
   jupyter nbextension uninstall --user --py azureml.widgets &&
   echo "" &&
   echo "" &&
   echo "***************************************" &&
   echo "* AutoML setup completed successfully *" &&
   echo "***************************************" &&
   if [ "$OPTIONS" != "nolaunch" ]
   then
      echo "" &&
      echo "Starting jupyter notebook - please run the configuration notebook" &&
      echo "" &&
      jupyter notebook --log-level=50 --notebook-dir '../..'
   fi
 fi
 if [ $? -gt 0 ]
 then
   echo "Installation failed"
 fi
--- a/how-to-use-azureml/automated-machine-learning/experimental/automl_setup_thin_client_mac.sh
+++ b/how-to-use-azureml/automated-machine-learning/experimental/automl_setup_thin_client_mac.sh
@@ -0,0 +1,55 @@
 #!/bin/bash
 CONDA_ENV_NAME=$1
 AUTOML_ENV_FILE=$2
 OPTIONS=$3
 PIP_NO_WARN_SCRIPT_LOCATION=0
 if [ "$CONDA_ENV_NAME" == "" ]
 then
  CONDA_ENV_NAME="azure_automl_experimental"
 fi
 if [ "$AUTOML_ENV_FILE" == "" ]
 then
  AUTOML_ENV_FILE="automl_thin_client_env_mac.yml"
 fi
 if [ ! -f $AUTOML_ENV_FILE ]; then
    echo "File $AUTOML_ENV_FILE not found"
    exit 1
 fi
 if source activate $CONDA_ENV_NAME 2> /dev/null
 then
   echo "Upgrading existing conda environment" $CONDA_ENV_NAME
   pip uninstall azureml-train-automl -y -q
   conda env update --name $CONDA_ENV_NAME --file $AUTOML_ENV_FILE &&
   jupyter nbextension uninstall --user --py azureml.widgets
 else
   conda env create -f $AUTOML_ENV_FILE -n $CONDA_ENV_NAME &&
   source activate $CONDA_ENV_NAME &&
   conda install lightgbm -c conda-forge -y &&
   python -m ipykernel install --user --name $CONDA_ENV_NAME --display-name "Python ($CONDA_ENV_NAME)" &&
   jupyter nbextension uninstall --user --py azureml.widgets &&
   echo "" &&
   echo "" &&
   echo "***************************************" &&
   echo "* AutoML setup completed successfully *" &&
   echo "***************************************" &&
   if [ "$OPTIONS" != "nolaunch" ]
   then
      echo "" &&
      echo "Starting jupyter notebook - please run the configuration notebook" &&
      echo "" &&
      jupyter notebook --log-level=50 --notebook-dir '../..'
   fi
 fi
 if [ $? -gt 0 ]
 then
   echo "Installation failed"
 fi
--- a/how-to-use-azureml/automated-machine-learning/experimental/automl_thin_client_env.yml
+++ b/how-to-use-azureml/automated-machine-learning/experimental/automl_thin_client_env.yml
@@ -0,0 +1,15 @@
 name: azure_automl_experimental
 dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.7.0 and later.
 - pip<=22.3.1
 - python>=3.7.0,<3.11
 - pip:
  # Required packages for AzureML execution, history, and data preparation.
  - azureml-defaults
  - azureml-sdk
  - azureml-widgets
  - azureml-mlflow
  - pandas
  - mlflow
--- a/how-to-use-azureml/automated-machine-learning/experimental/automl_thin_client_env_mac.yml
+++ b/how-to-use-azureml/automated-machine-learning/experimental/automl_thin_client_env_mac.yml
@@ -0,0 +1,24 @@
 name: azure_automl_experimental
 channels:
  - conda-forge
  - main
 dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.7.0 and later.
 - pip<=20.2.4
 - nomkl
 - python>=3.7.0,<3.11
 - urllib3==1.26.7
 - PyJWT < 2.0.0
 - numpy>=1.21.6,<=1.22.3
 - pip:
  # Required packages for AzureML execution, history, and data preparation.
  - azure-core==1.24.1
  - azure-identity==1.7.0
  - azureml-defaults
  - azureml-sdk
  - azureml-widgets
  - azureml-mlflow
  - pandas
  - mlflow
--- a/how-to-use-azureml/automated-machine-learning/experimental/regression-model-proxy/auto-ml-regression-model-proxy.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/experimental/regression-model-proxy/auto-ml-regression-model-proxy.ipynb
@@ -0,0 +1,470 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/experimental/regression-model-proxy/auto-ml-regression-model-proxy.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Regression with Aml Compute**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Train](#Train)\n",
        "1. [Results](#Results)\n",
        "1. [Test](#Test)\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "In this example we use an experimental feature, Model Proxy, to run predictions with the best generated model without downloading the model locally. The prediction runs on the same compute and environment that were used to train the model. Because this feature is experimental, the API may change; if you face any issues, make sure you are running the latest version of this notebook.\n",
        "This notebook also uses MLflow to save models, which makes the resulting models more portable. See https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow for more details about MLflow in Azure ML.\n",
        "\n",
        "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration](../../../../configuration.ipynb) notebook first, if you haven't already, to establish your connection to the AzureML Workspace.\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Create an `Experiment` in an existing `Workspace`.\n",
        "2. Configure AutoML using `AutoMLConfig`.\n",
        "3. Train the model using remote compute.\n",
        "4. Explore the results.\n",
        "5. Test the best fitted model."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n",
        "\n",
        "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import json\n",
        "import logging\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.core.dataset import Dataset\n",
        "from azureml.train.automl import AutoMLConfig"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"This notebook was created using version 1.59.0 of the Azure ML SDK\")\n",
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# Choose a name for the experiment.\n",
        "experiment_name = 'automl-regression-model-proxy'\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Run History Name'] = experiment_name\n",
        "output"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Using AmlCompute\n",
        "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you use `AmlCompute` as your training compute resource."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your CPU cluster\n",
        "# Try to ensure that the cluster name is unique across the notebooks\n",
        "cpu_cluster_name = \"reg-model-proxy\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
        "    print('Found existing cluster, use it.')\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n",
        "                                                           max_nodes=4)\n",
        "    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Load Data\n",
        "Load the hardware dataset from a CSV file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output of the model. Next, we split the data using `random_split` and extract the training data for the model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/machineData.csv\"\n",
        "dataset = Dataset.Tabular.from_delimited_files(data)\n",
        "\n",
        "# Split the dataset into train and test datasets\n",
        "train_data, test_data = dataset.random_split(percentage=0.8, seed=223)\n",
        "\n",
        "label = \"ERP\"\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The split data will be used both on the remote compute by ModelProxy and locally to compare results.\n",
        "So, we persist the split data to avoid discrepancies caused by different package versions in the local and remote environments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ds = ws.get_default_datastore()\n",
        "\n",
        "train_data = Dataset.Tabular.register_pandas_dataframe(\n",
        "    train_data.to_pandas_dataframe(), target=(ds, \"machineTrainData\"), name=\"train_data\")\n",
        "\n",
        "test_data = Dataset.Tabular.register_pandas_dataframe(\n",
        "    test_data.to_pandas_dataframe(), target=(ds, \"machineTestData\"), name=\"test_data\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Train\n",
        "\n",
        "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|classification, regression or forecasting|\n",
        "|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics: <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
        "|**n_cross_validations**|Number of cross validation splits.|\n",
        "|**training_data**|Input dataset containing both the training features and the label column.|\n",
        "|**label_column_name**|The name of the label column in the training data.|\n",
        "\n",
        "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "automlconfig-remarks-sample"
        ]
      },
      "outputs": [],
      "source": [
        "automl_settings = {\n",
        "    \"n_cross_validations\": 3,\n",
        "    \"primary_metric\": 'r2_score',\n",
        "    \"enable_early_stopping\": True, \n",
        "    \"experiment_timeout_hours\": 0.3, #for real scenarios we recommend a timeout of at least one hour \n",
        "    \"max_concurrent_iterations\": 4,\n",
        "    \"max_cores_per_iteration\": -1,\n",
        "    \"verbosity\": logging.INFO,\n",
        "    \"save_mlflow\": True,\n",
        "}\n",
        "\n",
        "automl_config = AutoMLConfig(task = 'regression',\n",
        "                             compute_target = compute_target,\n",
        "                             training_data = train_data,\n",
        "                             label_column_name = label,\n",
        "                             **automl_settings\n",
        "                            )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Execution of remote runs is asynchronous. Depending on the data and the number of iterations, this can run for a while. Setting `show_output=True` displays validation errors and current status, and makes the execution synchronous."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run = experiment.submit(automl_config, show_output = False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# If you need to retrieve a run that already started, use the following code\n",
        "#from azureml.train.automl.run import AutoMLRun\n",
        "#remote_run = AutoMLRun(experiment = experiment, run_id = '<replace with your run id>')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Results"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieve the Best Child Run\n",
        "\n",
        "Below we select the best pipeline from our iterations. The `get_best_child` method returns the best run. Passing the `metric` argument to `get_best_child` lets you retrieve the best run for *any* logged metric."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run = remote_run.get_best_child()\n",
        "print(best_run)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Show hyperparameters\n",
        "Show the model pipeline used for the best run with its hyperparameters.\n",
        "For ensemble pipelines it shows the iterations and algorithms that are ensembled."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "run_properties = best_run.get_details()['properties']\n",
        "pipeline_script = json.loads(run_properties['pipeline_script'])\n",
        "print(json.dumps(pipeline_script, indent = 1)) \n",
        "\n",
        "if 'ensembled_iterations' in run_properties:\n",
        "    print(\"\")\n",
        "    print(\"Ensembled Iterations\")\n",
        "    print(run_properties['ensembled_iterations'])\n",
        "    \n",
        "if 'ensembled_algorithms' in run_properties:\n",
        "    print(\"\")\n",
        "    print(\"Ensembled Algorithms\")\n",
        "    print(run_properties['ensembled_algorithms'])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Best Child Run Based on Any Other Metric\n",
        "Show the run that has the smallest `root_mean_squared_error` value (it turned out to be the same run as the one with the largest `spearman_correlation` value):"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "lookup_metric = \"root_mean_squared_error\"\n",
        "best_run = remote_run.get_best_child(metric = lookup_metric)\n",
        "print(best_run)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "y_test = test_data.keep_columns('ERP')\n",
        "test_data = test_data.drop_columns('ERP')\n",
        "\n",
        "y_train = train_data.keep_columns('ERP')\n",
        "train_data = train_data.drop_columns('ERP')\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Create a ModelProxy to submit prediction runs to the training environment\n",
        "We will create a ModelProxy for the best child run, which allows us to submit a run that does the prediction in the training environment. Unlike the local client, which may have different versions of some libraries, the training environment already has all the libraries the model is compatible with."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.train.automl.model_proxy import ModelProxy\n",
        "best_model_proxy = ModelProxy(best_run)\n",
        "y_pred_train = best_model_proxy.predict(train_data)\n",
        "y_pred_test = best_model_proxy.predict(test_data)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Exploring results"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "y_pred_train = y_pred_train.to_pandas_dataframe().values.flatten()\n",
        "y_train = y_train.to_pandas_dataframe().values.flatten()\n",
        "y_residual_train = y_train - y_pred_train\n",
        "\n",
        "y_pred_test = y_pred_test.to_pandas_dataframe().values.flatten()\n",
        "y_test = y_test.to_pandas_dataframe().values.flatten()\n",
        "y_residual_test = y_test - y_pred_test\n",
        "print(y_residual_train)\n",
        "print(y_residual_test)"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "sekrupa"
      }
    ],
    "categories": [
      "how-to-use-azureml",
      "automated-machine-learning"
    ],
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
 }
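The "Exploring results" cell above ends with flattened NumPy arrays of actuals, predictions and residuals. As a minimal sketch, assuming you only want quick summary numbers from those arrays, the plain formulas for RMSE, MAE and R² can be computed directly. `summarize_residuals` is a hypothetical helper, not part of the Azure ML SDK, and these unnormalized formulas may differ from the normalized metrics AutoML logs.

```python
import numpy as np


def summarize_residuals(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute simple regression metrics from flattened actuals and predictions.

    Hypothetical helper for illustration; it is not part of the Azure ML SDK.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same number of elements")
    # Same sign convention as y_residual_train / y_residual_test above.
    residual = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    mae = float(np.mean(np.abs(residual)))
    # R^2 = 1 - SS_res / SS_tot; undefined (NaN here) when the actuals are constant.
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2, "mean_residual": float(residual.mean())}


# Usage with small made-up arrays (not results from the notebook):
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])
print(summarize_residuals(y_true, y_pred))
```

In the notebook you would pass `y_test` and `y_pred_test` (or the train counterparts) after they have been flattened; a mean residual far from zero hints at systematic over- or under-prediction.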
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-many-models/Backtesting.png
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-many-models/Backtesting.png
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-many-models/assets/score.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-many-models/assets/score.py
@@ -0,0 +1,174 @@
 from typing import Any, Dict, Optional, List
 import argparse
 import json
 import os
 import re
 import numpy as np
 import pandas as pd
 from matplotlib import pyplot as plt
 from matplotlib.backends.backend_pdf import PdfPages
 from azureml.automl.core.shared import constants
 from azureml.automl.core.shared.types import GrainType
 from azureml.automl.runtime.shared.score import scoring
 GRAIN = "time_series_id"
 BACKTEST_ITER = "backtest_iteration"
 ACTUALS = "actual_level"
 PREDICTIONS = "predicted_level"
 ALL_GRAINS = "all_sets"
 FORECASTS_FILE = "forecast.csv"
 SCORES_FILE = "scores.csv"
 PLOTS_FILE = "plots_fcst_vs_actual.pdf"
 RE_INVALID_SYMBOLS = re.compile("[: ]")
 def _compute_metrics(df: pd.DataFrame, metrics: List[str]):
    """
    Compute metrics for one data frame.
    :param df: The data frame which contains actual_level and predicted_level columns.
    :param metrics: The names of the metrics to compute.
    :return: The data frame with two columns - metric_name and metric.
    """
    scores = scoring.score_regression(
        y_test=df[ACTUALS], y_pred=df[PREDICTIONS], metrics=metrics
    )
    metrics_df = pd.DataFrame(list(scores.items()), columns=["metric_name", "metric"])
    metrics_df.sort_values(["metric_name"], inplace=True)
    metrics_df.reset_index(drop=True, inplace=True)
    return metrics_df
 def _format_grain_name(grain: GrainType) -> str:
    """
    Convert grain name to string.
    :param grain: the grain name.
    :return: the string representation of the given grain.
    """
    if not isinstance(grain, tuple) and not isinstance(grain, list):
        return str(grain)
    grain = list(map(str, grain))
    return "|".join(grain)
 def compute_all_metrics(
    fcst_df: pd.DataFrame,
    ts_id_colnames: List[str],
    metric_names: Optional[List[str]] = None,
 ):
    """
    Calculate metrics per grain.
    :param fcst_df: forecast data frame. Must contain 2 columns: 'actual_level' and 'predicted_level'
    :param ts_id_colnames: list of grain column names; if empty or None, only the overall metrics are computed
    :param metric_names: (optional) the list of metric names to return
    :return: data frame with metric_name, metric and time_series_id columns, with one block of rows per grain and one for all grains ("all_sets")
    """
    if not metric_names:
        metric_names = list(constants.Metric.SCALAR_REGRESSION_SET)
    if ts_id_colnames is None:
        ts_id_colnames = []
    metrics_list = []
    if ts_id_colnames:
        for grain, df in fcst_df.groupby(ts_id_colnames):
            one_grain_metrics_df = _compute_metrics(df, metric_names)
            one_grain_metrics_df[GRAIN] = _format_grain_name(grain)
            metrics_list.append(one_grain_metrics_df)
    # overall metrics
    one_grain_metrics_df = _compute_metrics(fcst_df, metric_names)
    one_grain_metrics_df[GRAIN] = ALL_GRAINS
    metrics_list.append(one_grain_metrics_df)
    # collect into a data frame
    return pd.concat(metrics_list)
 def _draw_one_plot(
    df: pd.DataFrame,
    time_column_name: str,
    grain_column_names: List[str],
    pdf: PdfPages,
 ) -> None:
    """
    Draw the single plot.
    :param df: The data frame with the data to build plot.
    :param time_column_name: The name of a time column.
    :param grain_column_names: The names of the grain columns.
    :param pdf: The pdf backend used to render the plot.
    """
    fig, _ = plt.subplots(figsize=(20, 10))
    df = df.set_index(time_column_name)
    plt.plot(df[[ACTUALS, PREDICTIONS]])
    plt.xticks(rotation=45)
    iteration = df[BACKTEST_ITER].iloc[0]
    if grain_column_names:
        grain_name = [df[grain].iloc[0] for grain in grain_column_names]
        plt.title(f"Time series ID: {_format_grain_name(grain_name)} {iteration}")
    plt.legend(["actual", "forecast"])
    pdf.savefig(fig)
    plt.close(fig)
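 def calculate_scores_and_build_plots(
    input_dir: str, output_dir: str, automl_settings: Dict[str, Any]
 ):
    """
    Read the backtest forecasts, plot forecasts vs. actuals and compute metrics.
    :param input_dir: The directory with the forecast CSV files.
    :param output_dir: The directory where the forecasts, scores and plots are written.
    :param automl_settings: The AutoML settings with the time column and time series ID column names.
    """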
    os.makedirs(output_dir, exist_ok=True)
    grains = automl_settings.get(
        constants.TimeSeries.TIME_SERIES_ID_COLUMN_NAMES,
        automl_settings.get(constants.TimeSeries.GRAIN_COLUMN_NAMES, None),
    )
    time_column_name = automl_settings.get(constants.TimeSeries.TIME_COLUMN_NAME)
    if grains is None:
        grains = []
    if isinstance(grains, str):
        grains = [grains]
    while BACKTEST_ITER in grains:
        grains.remove(BACKTEST_ITER)
    dfs = []
    for fle in os.listdir(input_dir):
        file_path = os.path.join(input_dir, fle)
        if os.path.isfile(file_path) and file_path.endswith(".csv"):
            df_iter = pd.read_csv(file_path, parse_dates=[time_column_name])
            for _, iteration in df_iter.groupby(BACKTEST_ITER):
                dfs.append(iteration)
    forecast_df = pd.concat(dfs, sort=False, ignore_index=True)
    # To make sure plots are in order, sort the predictions by grain and iteration.
    ts_index = grains + [BACKTEST_ITER]
    forecast_df.sort_values(by=ts_index, inplace=True)
    pdf = PdfPages(os.path.join(output_dir, PLOTS_FILE))
    for _, one_forecast in forecast_df.groupby(ts_index):
        _draw_one_plot(one_forecast, time_column_name, grains, pdf)
    pdf.close()
    forecast_df.to_csv(os.path.join(output_dir, FORECASTS_FILE), index=False)
    # Drop rows where the actuals or predictions are NaN or infinite.
    forecast_df.replace([np.inf, -np.inf], np.nan, inplace=True)
    forecast_df.dropna(subset=[ACTUALS, PREDICTIONS], inplace=True)
    metrics = compute_all_metrics(forecast_df, grains + [BACKTEST_ITER])
    metrics.to_csv(os.path.join(output_dir, SCORES_FILE), index=False)
 if __name__ == "__main__":
    args = {"forecasts": "--forecasts", "scores_out": "--output-dir"}
    parser = argparse.ArgumentParser("Parsing input arguments.")
    for argname, arg in args.items():
        parser.add_argument(arg, dest=argname, required=True)
    parsed_args, _ = parser.parse_known_args()
    input_dir = parsed_args.forecasts
    output_dir = parsed_args.scores_out
    with open(
        os.path.join(
            os.path.dirname(os.path.realpath(__file__)), "automl_settings.json"
        )
    ) as json_file:
        automl_settings = json.load(json_file)
    calculate_scores_and_build_plots(input_dir, output_dir, automl_settings)
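`score.py` above scores each combination of time series ID and backtest iteration with AutoML's `scoring.score_regression`, then adds one overall block labelled `all_sets`. The sketch below shows the same grouping pattern with pandas and NumPy only, assuming hand-computed RMSE and MAE as a stand-in for AutoML's scoring module (which needs the AutoML runtime installed). `per_grain_metrics` and `_rmse_mae` are hypothetical names; the column constants are copied from `score.py`.

```python
from typing import List

import numpy as np
import pandas as pd

ACTUALS = "actual_level"
PREDICTIONS = "predicted_level"
GRAIN = "time_series_id"
ALL_GRAINS = "all_sets"


def _rmse_mae(df: pd.DataFrame) -> pd.DataFrame:
    """Return a long-format frame with MAE and RMSE for one group of rows."""
    err = df[ACTUALS].to_numpy(dtype=float) - df[PREDICTIONS].to_numpy(dtype=float)
    scores = {
        "mean_absolute_error": float(np.mean(np.abs(err))),
        "root_mean_squared_error": float(np.sqrt(np.mean(err ** 2))),
    }
    return pd.DataFrame(list(scores.items()), columns=["metric_name", "metric"])


def per_grain_metrics(fcst_df: pd.DataFrame, id_cols: List[str]) -> pd.DataFrame:
    """Score every (time series, iteration) group plus all rows together."""
    # Same cleanup as score.py: turn +/-inf into NaN, then drop incomplete rows.
    clean = fcst_df.replace([np.inf, -np.inf], np.nan)
    clean = clean.dropna(subset=[ACTUALS, PREDICTIONS])
    parts = []
    for key, group in clean.groupby(id_cols):
        key = key if isinstance(key, tuple) else (key,)
        part = _rmse_mae(group)
        part[GRAIN] = "|".join(map(str, key))  # mirrors _format_grain_name
        parts.append(part)
    overall = _rmse_mae(clean)
    overall[GRAIN] = ALL_GRAINS
    parts.append(overall)
    return pd.concat(parts, ignore_index=True)


# Usage with a tiny made-up forecast frame; the inf prediction row is dropped.
df = pd.DataFrame(
    {
        "ts_id": ["ts_A", "ts_A", "ts_B", "ts_B"],
        "backtest_iteration": ["it_1"] * 4,
        ACTUALS: [10.0, 12.0, 20.0, 22.0],
        PREDICTIONS: [11.0, 12.0, 18.0, np.inf],
    }
)
print(per_grain_metrics(df, ["ts_id", "backtest_iteration"]))
```

Keeping the output in long format (one row per metric and grain), as `score.py` does, makes it easy to append new metrics or grains without changing the CSV schema.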
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-many-models/auto-ml-forecasting-backtest-many-models.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-many-models/auto-ml-forecasting-backtest-many-models.ipynb
@@ -0,0 +1,779 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-backtest-many-models/auto-ml-forecasting-backtest-many-models.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Many Models with Backtesting - Automated ML\n",
        "**_Backtest many models time series forecasts with Automated Machine Learning_**\n",
        "\n",
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In this notebook we use a synthetic dataset to demonstrate backtesting in the many models scenario. Backtesting lets us check how AutoML would have performed on historical data. To do that, we repeatedly step back through the data set by the backtesting period and split the data into train and test sets at each step. These data sets are then used to train and evaluate the models.<br>\n",
        "\n",
        "This is a quick way of evaluating AutoML as if it were in production. Note that we do not test the historical performance of a particular model here; for that, see this [notebook](../forecasting-backtest-single-model/auto-ml-forecasting-backtest-single-model.ipynb). Instead, the best model can differ between backtest iterations, because AutoML chooses the best model for each training set.\n",
        "\n",
        "![Backtesting](Backtesting.png)\n",
        "\n",
        "**NOTE: There are limits on how many runs can execute in parallel per workspace. We currently recommend setting the parallelism to a maximum of 320 runs per experiment per workspace. Users who increase parallelism beyond this limit may encounter Too Many Requests errors (HTTP 429).**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Prerequisites\n",
        "You'll need to create a compute instance by following [these](https://learn.microsoft.com/en-us/azure/machine-learning/v1/how-to-create-manage-compute-instance?tabs=python) instructions."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1.0 Set up workspace, datastore, experiment"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1613003526897
        }
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core import Workspace, Datastore\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "\n",
        "from pandas.tseries.frequencies import to_offset\n",
        "\n",
        "# Set up your workspace\n",
        "ws = Workspace.from_config()\n",
        "ws.get_details()\n",
        "\n",
        "# Set up your datastores\n",
        "dstore = ws.get_default_datastore()\n",
        "\n",
        "output = {}\n",
        "output[\"SDK version\"] = azureml.core.VERSION\n",
        "output[\"Subscription ID\"] = ws.subscription_id\n",
        "output[\"Workspace\"] = ws.name\n",
        "output[\"Resource Group\"] = ws.resource_group\n",
        "output[\"Location\"] = ws.location\n",
        "output[\"Default datastore name\"] = dstore.name\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This notebook is compatible with Azure ML SDK version 1.35.1 or later."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Choose an experiment"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1613003540729
        }
      },
      "outputs": [],
      "source": [
        "from azureml.core import Experiment\n",
        "\n",
        "experiment = Experiment(ws, \"automl-many-models-backtest\")\n",
        "\n",
        "print(\"Experiment name: \" + experiment.name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2.0 Data\n",
        "\n",
        "#### 2.1 Data generation\n",
        "For this notebook we will generate an artificial data set with two [time series IDs](https://docs.microsoft.com/en-us/python/api/azureml-automl-core/azureml.automl.core.forecasting_parameters.forecastingparameters?view=azure-ml-py). We will then generate backtest folds, upload them to the default blob storage, and create a [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Simulate data: 2 time series (grains) with 700 daily observations each.\n",
        "TIME_COLNAME = \"date\"\n",
        "TARGET_COLNAME = \"value\"\n",
        "TIME_SERIES_ID_COLNAME = \"ts_id\"\n",
        "\n",
        "sample_size = 700\n",
        "# Set the random seed for reproducibility of results.\n",
        "np.random.seed(20)\n",
        "X1 = pd.DataFrame(\n",
        "    {\n",
        "        TIME_COLNAME: pd.date_range(start=\"2018-01-01\", periods=sample_size),\n",
        "        TARGET_COLNAME: np.random.normal(loc=100, scale=20, size=sample_size),\n",
        "        TIME_SERIES_ID_COLNAME: \"ts_A\",\n",
        "    }\n",
        ")\n",
        "X2 = pd.DataFrame(\n",
        "    {\n",
        "        TIME_COLNAME: pd.date_range(start=\"2018-01-01\", periods=sample_size),\n",
        "        TARGET_COLNAME: np.random.normal(loc=100, scale=20, size=sample_size),\n",
        "        TIME_SERIES_ID_COLNAME: \"ts_B\",\n",
        "    }\n",
        ")\n",
        "\n",
        "X = pd.concat([X1, X2], ignore_index=True, sort=False)\n",
        "print(\"Simulated dataset contains {} rows \\n\".format(X.shape[0]))\n",
        "X.head()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we will generate 8 backtesting folds with a backtesting period of 7 days and the same forecasting horizon. We will add the column \"backtest_iteration\", which identifies each backtesting period by its last training date."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "offset_type = \"7D\"\n",
        "NUMBER_OF_BACKTESTS = 8  # number of train/test sets to generate\n",
        "\n",
        "dfs_train = []\n",
        "dfs_test = []\n",
        "for ts_id, df_one in X.groupby(TIME_SERIES_ID_COLNAME):\n",
        "\n",
        "    data_end = df_one[TIME_COLNAME].max()\n",
        "\n",
        "    for i in range(NUMBER_OF_BACKTESTS):\n",
        "        train_cutoff_date = data_end - to_offset(offset_type)\n",
        "        df_one = df_one.copy()\n",
        "        df_one[\"backtest_iteration\"] = \"iteration_\" + str(train_cutoff_date)\n",
        "        train = df_one[df_one[TIME_COLNAME] <= train_cutoff_date]\n",
        "        test = df_one[\n",
        "            (df_one[TIME_COLNAME] > train_cutoff_date)\n",
        "            & (df_one[TIME_COLNAME] <= data_end)\n",
        "        ]\n",
        "        data_end = train[TIME_COLNAME].max()\n",
        "        dfs_train.append(train)\n",
        "        dfs_test.append(test)\n",
        "\n",
        "X_train = pd.concat(dfs_train, sort=False, ignore_index=True)\n",
        "X_test = pd.concat(dfs_test, sort=False, ignore_index=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### 2.2 Create the Tabular Data Set\n",
        "\n",
        "A Datastore is a place where data can be stored and then made accessible to a compute target, either by mounting the data or by copying it to the compute target.\n",
        "\n",
        "Please refer to the [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) documentation for how to access data from a Datastore.\n",
        "\n",
        "In this next step, we will upload the data and create a TabularDataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.data.dataset_factory import TabularDatasetFactory\n",
        "\n",
        "ds = ws.get_default_datastore()\n",
        "# Upload the train and test data frames to the default datastore and register them as TabularDatasets.\n",
        "train_data = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    X_train, target=(ds, \"data_mm\"), name=\"data_train\"\n",
        ")\n",
        "test_data = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    X_test, target=(ds, \"data_mm\"), name=\"data_test\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3.0 Build the training pipeline\n",
        "Now that the dataset, workspace, and datastore are set up, we can put together a pipeline for training.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Choose a compute target\n",
        "\n",
        "You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "**Creation of AmlCompute takes approximately 5 minutes.**\n",
        "\n",
        "If an AmlCompute with that name is already in your workspace, this code skips the creation process. As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1613007037308
        }
      },
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "\n",
        "# Name your cluster\n",
        "compute_name = \"backtest-mm\"\n",
        "\n",
        "\n",
        "if compute_name in ws.compute_targets:\n",
        "    compute_target = ws.compute_targets[compute_name]\n",
        "    if compute_target and type(compute_target) is AmlCompute:\n",
        "        print(\"Found compute target: \" + compute_name)\n",
        "else:\n",
        "    print(\"Creating a new compute target...\")\n",
        "    provisioning_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=\"STANDARD_DS12_V2\", max_nodes=6\n",
        "    )\n",
        "    # Create the compute target\n",
        "    compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
        "\n",
        "    # Can poll for a minimum number of nodes and for a specific timeout.\n",
        "    # If no min node count is provided it will use the scale settings for the cluster\n",
        "    compute_target.wait_for_completion(\n",
        "        show_output=True, min_node_count=None, timeout_in_minutes=20\n",
        "    )\n",
        "\n",
        "    # For a more detailed view of current cluster status, use the 'status' property\n",
        "    print(compute_target.status.serialize())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Set up training parameters\n",
        "\n",
        "We need to provide ``ForecastingParameters``, ``AutoMLConfig`` and ``ManyModelsTrainParameters`` objects. For the forecasting task we also need to define several settings including the name of the time column, the maximum forecast horizon, and the partition column name(s) definition.\n",
        "\n",
        "#### ``ForecastingParameters`` arguments\n",
        "| Property                           | Description|\n",
        "| :---------------                   | :------------------- |\n",
        "| **forecast_horizon**               | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
        "| **time_column_name**               | The name of your time column. |\n",
        "| **time_series_id_column_names**    | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
        "| **cv_step_size**                   | Number of periods between two consecutive cross-validation folds. The default value is \\\"auto\\\", in which case AutoML determines the cross-validation step size automatically, if a validation set is not provided. Alternatively, you can specify an integer value. |\n",
        "\n",
        "#### ``AutoMLConfig`` arguments\n",
        "| Property                           | Description|\n",
        "| :---------------                   | :------------------- |\n",
        "| **task**                           | forecasting |\n",
        "| **primary_metric**                 | This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i> |\n",
        "| **blocked_models**                 | Blocked models won't be used by AutoML. |\n",
        "| **iteration_timeout_minutes**      | Maximum amount of time in minutes that the model can train. This is optional but provides customers with greater control on exit criteria. |\n",
        "| **iterations**                     | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
        "| **experiment_timeout_hours**       | Maximum amount of time in hours that each experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. **It does not control the overall timeout for the pipeline run; instead, it controls the timeout for each training run per partitioned time series.** |\n",
        "| **label_column_name**              | The name of the label column. |\n",
        "| **n_cross_validations**            | Number of cross-validation splits. The default value is \\\"auto\\\", in which case AutoML determines the number of cross-validations automatically, if a validation set is not provided. Alternatively, you can specify an integer value. Rolling Origin Validation is used to split time-series in a temporally consistent way. |\n",
        "| **enable_early_stopping**          | Flag to enable early termination if the primary metric is no longer improving. |\n",
        "| **enable_engineered_explanations** | Engineered feature explanations will be downloaded if the enable_engineered_explanations flag is set to True. By default it is set to False to save storage space. |\n",
        "| **track_child_runs**               | Flag to disable tracking of child runs. Only the best run is tracked if the flag is set to False (this includes the model and metrics of the run). |\n",
        "| **pipeline_fetch_max_batch_size**  | Determines how many pipelines (training algorithms) to fetch at a time for training; this helps reduce throttling when training at large scale. |\n",
        "\n",
        "\n",
        "#### ``ManyModelsTrainParameters`` arguments\n",
        "| Property                           | Description|\n",
        "| :---------------                   | :------------------- |\n",
        "| **automl_settings**                | The ``AutoMLConfig`` object defined above. |\n",
        "| **partition_column_names**         | The names of columns used to group your models. For timeseries, the groups must not split up individual time-series. That is, each group must contain one or more whole time-series. |"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1613007061544
        }
      },
      "outputs": [],
      "source": [
        "from azureml.train.automl.runtime._many_models.many_models_parameters import (\n",
        "    ManyModelsTrainParameters,\n",
        ")\n",
        "from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
        "from azureml.train.automl.automlconfig import AutoMLConfig\n",
        "\n",
        "partition_column_names = [TIME_SERIES_ID_COLNAME, \"backtest_iteration\"]\n",
        "\n",
        "forecasting_parameters = ForecastingParameters(\n",
        "    time_column_name=TIME_COLNAME,\n",
        "    forecast_horizon=6,\n",
        "    time_series_id_column_names=partition_column_names,\n",
        "    cv_step_size=\"auto\",\n",
        ")\n",
        "\n",
        "automl_settings = AutoMLConfig(\n",
        "    task=\"forecasting\",\n",
        "    primary_metric=\"normalized_root_mean_squared_error\",\n",
        "    iteration_timeout_minutes=10,\n",
        "    iterations=15,\n",
        "    experiment_timeout_hours=0.25,\n",
        "    label_column_name=TARGET_COLNAME,\n",
        "    n_cross_validations=\"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
        "    track_child_runs=False,\n",
        "    forecasting_parameters=forecasting_parameters,\n",
        ")\n",
        "\n",
        "\n",
        "mm_paramters = ManyModelsTrainParameters(\n",
        "    automl_settings=automl_settings, partition_column_names=partition_column_names\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Set up many models pipeline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "A parallel run step is used to train multiple models at once. To configure the ``ParallelRunConfig``, you need to determine the appropriate number of workers and nodes for your use case. ``process_count_per_node`` should be based on the number of cores of the compute VM. ``node_count`` determines the number of compute nodes to use; increasing the node count speeds up the training process.\n",
        "\n",
        "| Property                           | Description|\n",
        "| :---------------                   | :------------------- |\n",
        "| **experiment**                     | The experiment used for training. |\n",
        "| **train_data**                     | The file dataset to be used as input to the training run. |\n",
        "| **compute_target**                 | The compute target that runs the training pipeline. |\n",
        "| **node_count**                     | The number of compute nodes to be used for running the user script. We recommend starting with 3 and increasing ``node_count`` if training takes too long. |\n",
        "| **process_count_per_node**         | The number of processes per node. We recommend a 2:1 ratio of cores to processes per node; for example, if a node has 16 cores, configure 8 or fewer processes per node for optimal performance. |\n",
        "| **train_pipeline_parameters**      | The set of configuration parameters defined in the previous section. |\n",
        "| **run_invocation_timeout**         | Maximum amount of time, in seconds, that each invocation of the ``ParallelRunStep`` is allowed to take. This is optional but gives you greater control over exit criteria. It must exceed ``experiment_timeout_hours`` (converted to seconds) by at least 300 seconds. |\n",
        "\n",
        "Calling ``AutoMLPipelineBuilder.get_many_models_train_steps`` creates a new aggregated dataset, which is generated dynamically when the pipeline executes.\n",
        "\n",
        "**Note**: The total time taken for the **training step** in the pipeline to complete is approximately $ \\frac{t}{ p \\times n } \\times ts $\n",
        "where,\n",
        "- $ t $ is the time taken to train one partition (shown in the training logs)\n",
        "- $ p $ is ``process_count_per_node``\n",
        "- $ n $ is ``node_count``\n",
        "- $ ts $ is the total number of partitions defined by ``partition_column_names``"
      ]
    },
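    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a purely illustrative example (the values of $ t $ and $ ts $ here are hypothetical, not measured from this notebook): the next cell uses $ p = 2 $ and $ n = 2 $. If training one partition takes $ t = 4 $ minutes and there are $ ts = 20 $ partitions, the training step takes roughly\n",
        "\n",
        "$$ \\frac{4}{2 \\times 2} \\times 20 = 20 \\text{ minutes} $$"
      ]
    },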
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
        "\n",
        "\n",
        "training_pipeline_steps = AutoMLPipelineBuilder.get_many_models_train_steps(\n",
        "    experiment=experiment,\n",
        "    train_data=train_data,\n",
        "    compute_target=compute_target,\n",
        "    node_count=2,\n",
        "    process_count_per_node=2,\n",
        "    run_invocation_timeout=1200,\n",
        "    train_pipeline_parameters=mm_paramters,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core import Pipeline\n",
        "\n",
        "training_pipeline = Pipeline(ws, steps=training_pipeline_steps)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Submit the pipeline to run\n",
        "Next, we submit the pipeline. The whole training pipeline takes about 20 minutes using a STANDARD_DS12_V2 VM with the current ``ParallelRunConfig`` settings."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_run = experiment.submit(training_pipeline)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "training_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Check the run status. If ``training_run`` has completed, continue to the next section; otherwise, check the portal for failures."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4.0 Backtesting\n",
        "Now that we have selected the best AutoML model for each backtest fold, we will use these models to generate the forecasts and compare them with the actuals."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Set up output dataset for inference data\n",
        "The output of inference can be represented as an [OutputFileDatasetConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.output_dataset_config.outputdatasetconfig?view=azure-ml-py) object, which can be registered as a dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.data import OutputFileDatasetConfig\n",
        "\n",
        "output_inference_data_ds = OutputFileDatasetConfig(\n",
        "    name=\"many_models_inference_output\",\n",
        "    destination=(dstore, \"backtesting/inference_data/\"),\n",
        ").register_on_complete(name=\"backtesting_data_ds\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For many models, we need to provide a ``ManyModelsInferenceParameters`` object.\n",
        "\n",
        "#### ``ManyModelsInferenceParameters`` arguments\n",
        "| Property                           | Description|\n",
        "| :---------------                   | :------------------- |\n",
        "| **partition_column_names**         | List of column names that identify groups. |\n",
        "| **target_column_name**             | \\[Optional] The target column name; needed only if the inference dataset contains the target. |\n",
        "| **time_column_name**               | \\[Optional] The time column name; needed only for time series. |\n",
        "| **inference_type**                 | \\[Optional] Which inference method to use on the model. Possible values are 'forecast', 'predict_proba', and 'predict'. |\n",
        "| **forecast_mode**                  | \\[Optional] The type of forecast to be used, either 'rolling' or 'recursive'; defaults to 'recursive'. |\n",
        "| **step**                           | \\[Optional] Number of periods to advance the forecasting window in each iteration **(for rolling forecast only)**; defaults to 1. |\n",
        "\n",
        "#### ``get_many_models_batch_inference_steps`` arguments\n",
        "| Property                           | Description|\n",
        "| :---------------                   | :------------------- |\n",
        "| **experiment**                     | The experiment used for the inference run. |\n",
        "| **inference_data**                 | The data to use for inferencing. It should have the same schema as the data used for training. |\n",
        "| **compute_target**                 | The compute target that runs the inference pipeline. |\n",
        "| **node_count**                     | The number of compute nodes to be used for running the user script. We recommend starting with the number of cores per node (varies by compute SKU). |\n",
        "| **process_count_per_node**         | \\[Optional] The number of processes per node. By default it's 2 (should be at most half of the number of cores in a single node of the compute cluster that will be used for the experiment). |\n",
        "| **inference_pipeline_parameters**  | \\[Optional] The ``ManyModelsInferenceParameters`` object defined above. |\n",
        "| **append_row_file_name**           | \\[Optional] The name of the output file; defaults to 'parallel_run_step.txt'. Supports the 'txt' and 'csv' file extensions. A 'txt' file extension generates the output in 'txt' format with space as separator and without column names. A 'csv' file extension generates the output in 'csv' format with comma as separator and with column names. |\n",
        "| **train_run_id**                   | \\[Optional] The run ID of the **training pipeline**. By default, it is the latest successful training pipeline run in the experiment. |\n",
        "| **train_experiment_name**          | \\[Optional] The experiment that contains the training pipeline. Only needed when the training pipeline is not in the same experiment as the inference pipeline. |\n",
        "| **run_invocation_timeout**         | \\[Optional] Maximum amount of time, in seconds, that each invocation of the ``ParallelRunStep`` is allowed to take. This gives you greater control over exit criteria. |\n",
        "| **output_datastore**               | \\[Optional] The ``Datastore`` or ``OutputDatasetConfig`` to be used for output. If specified, all pipeline output is written to that location. If unspecified, the default datastore is used. |\n",
        "| **arguments**                      | \\[Optional] Arguments to be passed to the inference script. A possible argument is '--forecast_quantiles' followed by quantile values. |"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
        "from azureml.train.automl.runtime._many_models.many_models_parameters import (\n",
        "    ManyModelsInferenceParameters,\n",
        ")\n",
        "\n",
        "mm_parameters = ManyModelsInferenceParameters(\n",
        "    partition_column_names=partition_column_names,\n",
        "    time_column_name=TIME_COLNAME,\n",
        "    target_column_name=TARGET_COLNAME,\n",
        ")\n",
        "\n",
        "output_file_name = \"parallel_run_step.csv\"\n",
        "\n",
        "inference_steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(\n",
        "    experiment=experiment,\n",
        "    inference_data=test_data,\n",
        "    node_count=2,\n",
        "    process_count_per_node=2,\n",
        "    compute_target=compute_target,\n",
        "    run_invocation_timeout=300,\n",
        "    output_datastore=output_inference_data_ds,\n",
        "    train_run_id=training_run.id,\n",
        "    train_experiment_name=training_run.experiment.name,\n",
        "    inference_pipeline_parameters=mm_parameters,\n",
        "    append_row_file_name=output_file_name,\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core import Pipeline\n",
        "\n",
        "inference_pipeline = Pipeline(ws, steps=inference_steps)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "inference_run = experiment.submit(inference_pipeline)\n",
        "inference_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 5.0 Retrieve results and calculate metrics\n",
        "\n",
        "The pipeline returns one file with the predictions for all time series IDs and outputs the result to the forecasting_output Blob container. The details of the blob container are listed in 'forecasting_output.txt' under Outputs+logs.\n",
        "\n",
        "The next code snippet does the following:\n",
        "1. Downloads the contents of the output folder that is passed to the parallel run step\n",
        "2. Reads the ``parallel_run_step.csv`` file that contains the predictions into a pandas DataFrame\n",
        "3. Renames the target and prediction columns and saves the table in CSV format\n",
        "4. Displays the first 10 rows of the predictions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.contrib.automl.pipeline.steps.utilities import get_output_from_mm_pipeline\n",
        "\n",
        "PREDICTION_COLNAME = \"Predictions\"\n",
        "forecasting_results_name = \"forecasting_results\"\n",
        "forecasting_output_name = \"many_models_inference_output\"\n",
        "forecast_file = get_output_from_mm_pipeline(\n",
        "    inference_run, forecasting_results_name, forecasting_output_name, output_file_name\n",
        ")\n",
        "df = pd.read_csv(forecast_file, parse_dates=[0])\n",
        "print(\n",
        "    f\"Prediction has {df.shape[0]} rows. The first 10 rows are displayed below.\"\n",
        ")\n",
        "# Rename the target and prediction columns, then save the csv file to read it in the next step.\n",
        "df.rename(\n",
        "    columns={TARGET_COLNAME: \"actual_level\", PREDICTION_COLNAME: \"predicted_level\"},\n",
        "    inplace=True,\n",
        ")\n",
        "df.to_csv(os.path.join(forecasting_results_name, \"forecast.csv\"), index=False)\n",
        "df.head(10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## View metrics\n",
        "We will read in the obtained results and run the helper script, which will generate metrics and create the plots of predicted versus actual values."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from assets.score import calculate_scores_and_build_plots\n",
        "\n",
        "backtesting_results = \"backtesting_mm_results\"\n",
        "os.makedirs(backtesting_results, exist_ok=True)\n",
        "calculate_scores_and_build_plots(\n",
        "    forecasting_results_name,\n",
        "    backtesting_results,\n",
        "    automl_settings.as_serializable_dict(),\n",
        ")\n",
        "pd.DataFrame({\"File\": os.listdir(backtesting_results)})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The directory contains a set of files with results:\n",
        "- forecast.csv contains forecasts for all backtest iterations. The backtest_iteration column contains the iteration identifier, with the last training date as a suffix.\n",
        "- scores.csv contains all metrics. If the data set contains several time series, metrics are given for every combination of time series ID and iteration, as well as for all iterations and time series IDs together, which are marked as \"all_sets\".\n",
        "- plots_fcst_vs_actual.pdf contains the forecast vs. actuals plots; each combination of time series and iteration is saved as a separate plot.\n",
        "\n",
        "For demonstration purposes, we will display the table of metrics for the time series with ID \"ts_A\". We define a utility function that builds the table of metrics."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def get_metrics_for_ts(all_metrics, ts):\n",
        "    \"\"\"\n",
        "    Get the metrics for the time series with ID ts and return them as a pandas DataFrame.\n",
        "\n",
        "    :param all_metrics: The table with all the metrics.\n",
        "    :param ts: The ID of a time series of interest.\n",
        "    :return: The pandas DataFrame with metrics for one time series.\n",
        "    \"\"\"\n",
        "    results_df = None\n",
        "    for ts_id, one_series in all_metrics.groupby(\"time_series_id\"):\n",
        "        # IDs have the form \"<ts>|<iteration>\"; match the full series ID.\n",
        "        if not ts_id.startswith(ts + \"|\"):\n",
        "            continue\n",
        "        iteration = ts_id.split(\"|\")[-1]\n",
        "        # Build a new frame instead of modifying a slice of all_metrics in place.\n",
        "        df = (\n",
        "            one_series[[\"metric_name\", \"metric\"]]\n",
        "            .rename({\"metric\": iteration}, axis=1)\n",
        "            .set_index(\"metric_name\")\n",
        "        )\n",
        "        if results_df is None:\n",
        "            results_df = df\n",
        "        else:\n",
        "            results_df = results_df.merge(\n",
        "                df, how=\"inner\", left_index=True, right_index=True\n",
        "            )\n",
        "    if results_df is None:\n",
        "        raise ValueError(f\"No metrics found for time series {ts!r}.\")\n",
        "    return results_df.sort_index(axis=1)\n",
        "\n",
        "\n",
        "metrics_df = pd.read_csv(os.path.join(backtesting_results, \"scores.csv\"))\n",
        "ts = \"ts_A\"\n",
        "get_metrics_for_ts(metrics_df, ts)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Forecast vs actuals plots."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from IPython.display import IFrame\n",
        "\n",
        "IFrame(\"./backtesting_mm_results/plots_fcst_vs_actual.pdf\", width=800, height=300)"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "jialiu"
      }
    ],
    "categories": [
      "how-to-use-azureml",
      "automated-machine-learning"
    ],
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.5"
    },
    "vscode": {
      "interpreter": {
        "hash": "6bd77c88278e012ef31757c15997a7bea8c943977c43d6909403c00ae11d43ca"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
 }
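
The `get_metrics_for_ts` helper in the notebook above turns the long-format `scores.csv` (one row per ID, metric name, and value) into a wide table with one column per backtest iteration. The following is a minimal pure-Python sketch of that reshaping, not the notebook's code. It assumes a single time series ID column, so IDs look like `"<ts>|<iteration>"`; the function name `metrics_for_ts` and the row tuples are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row layout mirroring scores.csv: (time_series_id, metric_name, metric).
ScoreRow = Tuple[str, str, float]


def metrics_for_ts(rows: Iterable[ScoreRow], ts: str) -> Dict[str, Dict[str, float]]:
    """Pivot long-format scores into {metric_name: {iteration: value}} for one series.

    Assumes IDs of the form "<ts>|<iteration>", produced by joining the grain
    columns with "|" (a single time series ID column is assumed).
    """
    table: Dict[str, Dict[str, float]] = {}
    for ts_id, metric_name, value in rows:
        parts = ts_id.split("|")
        # Exact match on the series part skips other series (e.g. "ts_AB")
        # and the "all_sets" summary rows.
        if parts[0] != ts:
            continue
        iteration = parts[-1]
        table.setdefault(metric_name, {})[iteration] = value
    # Sort iteration columns by name, like sort_index(axis=1) in the notebook.
    return {name: dict(sorted(cols.items())) for name, cols in table.items()}


rows = [
    ("ts_A|iteration_2020-01-02", "rmse", 1.0),
    ("ts_A|iteration_2020-01-01", "rmse", 2.0),
    ("all_sets", "rmse", 5.0),
]
print(metrics_for_ts(rows, "ts_A"))
```

Matching the series part exactly, rather than with a bare prefix test, keeps an ID such as `ts_AB` from being picked up when asking for `ts_A`.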
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/Backtesting.png
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/Backtesting.png
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/assets/data_split.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/assets/data_split.py
@@ -0,0 +1,45 @@
 import argparse
 import os
 import pandas as pd
 import azureml.train.automl.runtime._hts.hts_runtime_utilities as hru
 from azureml.core import Run
 from azureml.core.dataset import Dataset
 # Parse the arguments.
 args = {
    "step_size": "--step-size",
    "step_number": "--step-number",
    "time_column_name": "--time-column-name",
    "time_series_id_column_names": "--time-series-id-column-names",
    "out_dir": "--output-dir",
 }
 parser = argparse.ArgumentParser("Parsing input arguments.")
 for argname, arg in args.items():
    parser.add_argument(arg, dest=argname, required=True)
 parsed_args, _ = parser.parse_known_args()
 step_number = int(parsed_args.step_number)
 step_size = int(parsed_args.step_size)
 # Create the working directory to store the temporary csv files.
 working_dir = parsed_args.out_dir
 os.makedirs(working_dir, exist_ok=True)
 # Set input and output
 script_run = Run.get_context()
 input_dataset = script_run.input_datasets["training_data"]
 X_train = input_dataset.to_pandas_dataframe()
 # Split the data.
 for i in range(step_number):
    file_name = os.path.join(working_dir, "backtest_{}.csv".format(i))
    if parsed_args.time_series_id_column_names:
        dfs = []
        for _, one_series in X_train.groupby([parsed_args.time_series_id_column_names]):
            one_series = one_series.sort_values(
                by=[parsed_args.time_column_name], inplace=False
            )
            dfs.append(one_series.iloc[: len(one_series) - step_size * i])
        pd.concat(dfs, sort=False, ignore_index=True).to_csv(file_name, index=False)
    else:
        X_train.sort_values(by=[parsed_args.time_column_name], inplace=True)
        X_train.iloc[: len(X_train) - step_size * i].to_csv(file_name, index=False)
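
The splitting loop in `data_split.py` writes one file per backtest iteration; iteration `i` keeps each series sorted by time and drops its last `step_size * i` rows. The sketch below reproduces that idea in plain Python so it can be checked without Azure ML or pandas. The row tuple layout and the name `backtest_splits` are hypothetical. One deliberate difference: it clamps the cut at zero rows, whereas `iloc[: len(one_series) - step_size * i]` in the script keeps the leading rows of a series that is shorter than `step_size * i`, because a negative stop counts from the end.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout for this sketch: (series_id, time_index, value).
Row = Tuple[str, int, float]


def backtest_splits(rows: List[Row], step_size: int, step_number: int) -> List[List[Row]]:
    """Return one truncated copy of the data per backtest iteration.

    Iteration i keeps each series sorted by time and drops its last
    step_size * i observations, like the per-series loop in data_split.py.
    """
    by_series: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_series[row[0]].append(row)
    splits: List[List[Row]] = []
    for i in range(step_number):
        split: List[Row] = []
        for series_id in sorted(by_series):
            series = sorted(by_series[series_id], key=lambda r: r[1])
            # Clamp at zero: a negative stop would silently keep the head of the series.
            keep = max(0, len(series) - step_size * i)
            split.extend(series[:keep])
        splits.append(split)
    return splits


rows = [("A", t, float(t)) for t in range(5)] + [("B", t, 10.0 + t) for t in range(3)]
sizes = [len(split) for split in backtest_splits(rows, step_size=1, step_number=3)]
print(sizes)  # [8, 6, 4]
```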
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/assets/retrain_models.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/assets/retrain_models.py
@@ -0,0 +1,178 @@
 # ---------------------------------------------------------
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # ---------------------------------------------------------
 """The batch script needed for back testing of models using PRS."""
 import argparse
 import json
 import logging
 import os
 import pickle
 import re
 import pandas as pd
 from azureml.core.experiment import Experiment
 from azureml.core.model import Model
 from azureml.core.run import Run
 from azureml.automl.core.shared import constants
 from azureml.automl.runtime.shared.score import scoring
 from azureml.train.automl import AutoMLConfig
 RE_INVALID_SYMBOLS = re.compile(r"[:\s]")
 model_name = None
 target_column_name = None
 current_step_run = None
 output_dir = None
 logger = logging.getLogger(__name__)
 def _get_automl_settings():
    with open(
        os.path.join(
            os.path.dirname(os.path.realpath(__file__)), "automl_settings.json"
        )
    ) as json_file:
        return json.load(json_file)
 def init():
    global model_name
    global target_column_name
    global output_dir
    global automl_settings
    global model_uid
    global forecast_quantiles
    logger.info("Initialization of the run.")
    parser = argparse.ArgumentParser("Parsing input arguments.")
    parser.add_argument("--output-dir", dest="out", required=True)
    parser.add_argument("--model-name", dest="model", default=None)
    parser.add_argument("--model-uid", dest="model_uid", default=None)
    parser.add_argument(
        "--forecast_quantiles",
        nargs="*",
        type=float,
        help="forecast quantiles list",
        default=None,
    )
    parsed_args, _ = parser.parse_known_args()
    model_name = parsed_args.model
    automl_settings = _get_automl_settings()
    target_column_name = automl_settings.get("label_column_name")
    output_dir = parsed_args.out
    model_uid = parsed_args.model_uid
    forecast_quantiles = parsed_args.forecast_quantiles
    os.makedirs(output_dir, exist_ok=True)
    os.environ["AUTOML_IGNORE_PACKAGE_VERSION_INCOMPATIBILITIES".lower()] = "True"
 def get_run():
    global current_step_run
    if current_step_run is None:
        current_step_run = Run.get_context()
    return current_step_run
 def run_backtest(data_input_name: str, file_name: str, experiment: Experiment):
    """Re-train the model and return metrics."""
    data_input = pd.read_csv(
        data_input_name,
        parse_dates=[automl_settings[constants.TimeSeries.TIME_COLUMN_NAME]],
    )
    print(data_input.head())
    if not automl_settings.get(constants.TimeSeries.GRAIN_COLUMN_NAMES):
        # There are no grains.
        data_input.sort_values(
            [automl_settings[constants.TimeSeries.TIME_COLUMN_NAME]], inplace=True
        )
        X_train = data_input.iloc[: -automl_settings["max_horizon"]]
        y_train = X_train.pop(target_column_name).values
        X_test = data_input.iloc[-automl_settings["max_horizon"] :]
        y_test = X_test.pop(target_column_name).values
    else:
        # The data contain grains.
        dfs_train = []
        dfs_test = []
        for _, one_series in data_input.groupby(
            automl_settings.get(constants.TimeSeries.GRAIN_COLUMN_NAMES)
        ):
            one_series.sort_values(
                [automl_settings[constants.TimeSeries.TIME_COLUMN_NAME]], inplace=True
            )
            dfs_train.append(one_series.iloc[: -automl_settings["max_horizon"]])
            dfs_test.append(one_series.iloc[-automl_settings["max_horizon"] :])
        X_train = pd.concat(dfs_train, sort=False, ignore_index=True)
        y_train = X_train.pop(target_column_name).values
        X_test = pd.concat(dfs_test, sort=False, ignore_index=True)
        y_test = X_test.pop(target_column_name).values
    last_training_date = str(
        X_train[automl_settings[constants.TimeSeries.TIME_COLUMN_NAME]].max()
    )
    if file_name:
        # If file name is provided, we will load model and retrain it on backtest data.
        with open(file_name, "rb") as fp:
            fitted_model = pickle.load(fp)
        fitted_model.fit(X_train, y_train)
    else:
        # We will run the experiment and select the best model.
        X_train[target_column_name] = y_train
        automl_config = AutoMLConfig(training_data=X_train, **automl_settings)
        automl_run = current_step_run.submit_child(automl_config, show_output=True)
        best_run, fitted_model = automl_run.get_output()
        # As we have generated models, we need to register them for future use.
        description = "Backtest model example"
        tags = {"last_training_date": last_training_date, "experiment": experiment.name}
        if model_uid:
            tags["model_uid"] = model_uid
        automl_run.register_model(
            model_name=best_run.properties["model_name"],
            description=description,
            tags=tags,
        )
        print(f"The model {best_run.properties['model_name']} was registered.")
    # By default we will have forecast quantiles of 0.5, which is our target
    if forecast_quantiles:
        if 0.5 not in forecast_quantiles:
            forecast_quantiles.append(0.5)
        fitted_model.quantiles = forecast_quantiles
    x_pred = fitted_model.forecast_quantiles(X_test)
    x_pred["actual_level"] = y_test
    x_pred["backtest_iteration"] = f"iteration_{last_training_date}"
    x_pred.rename({0.5: "predicted_level"}, axis=1, inplace=True)
    date_safe = RE_INVALID_SYMBOLS.sub("_", last_training_date)
    x_pred.to_csv(os.path.join(output_dir, f"iteration_{date_safe}.csv"), index=False)
    return x_pred
 def run(input_files):
    """Run the script"""
    logger.info("Running mini batch.")
    ws = get_run().experiment.workspace
    file_name = None
    if model_name:
        models = Model.list(ws, name=model_name)
        if not models:
            raise ValueError(
                "No model named {} was found in the workspace.".format(model_name)
            )
        # Use the latest registered version of the model.
        cloud_model = max(models, key=lambda one_mod: one_mod.version)
        logger.info(
            "Using existing model from the workspace. Model version: {}".format(
                cloud_model.version
            )
        )
        file_name = cloud_model.download(exist_ok=True)
    forecasts = []
    logger.info("Running backtest.")
    for input_file in input_files:
        forecasts.append(run_backtest(input_file, file_name, get_run().experiment))
    return pd.concat(forecasts)
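
In `run_backtest`, each series is sorted by time, the last `max_horizon` rows become the test set, and the forecasts are written to a file named after the last training date with colons and whitespace replaced by underscores. The following is a minimal pure-Python sketch of those two steps, not the script itself; the function names `holdout_split` and `iteration_file_name` are hypothetical. The range check is an addition: with `max_horizon = 0`, the script's `iloc[: -0]` would produce an empty training set, since `-0` is `0`.

```python
import re
from typing import List, Sequence, Tuple

# Same pattern as RE_INVALID_SYMBOLS in retrain_models.py: colons and whitespace.
RE_INVALID_SYMBOLS = re.compile(r"[:\s]")


def holdout_split(values: Sequence[float], max_horizon: int) -> Tuple[List[float], List[float]]:
    """Split one time-ordered series: the last max_horizon points become the test set."""
    if max_horizon <= 0 or max_horizon >= len(values):
        raise ValueError("max_horizon must be between 1 and len(values) - 1")
    return list(values[:-max_horizon]), list(values[-max_horizon:])


def iteration_file_name(last_training_date: str) -> str:
    """Build the per-iteration CSV name with file-system-unsafe symbols replaced."""
    date_safe = RE_INVALID_SYMBOLS.sub("_", last_training_date)
    return f"iteration_{date_safe}.csv"


print(holdout_split([1.0, 2.0, 3.0, 4.0, 5.0], 2))  # ([1.0, 2.0, 3.0], [4.0, 5.0])
print(iteration_file_name("2020-01-01 00:00:00"))  # iteration_2020-01-01_00_00_00.csv
```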
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/assets/score.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/assets/score.py
@@ -0,0 +1,171 @@
 from typing import Any, Dict, Optional, List
 import argparse
 import json
 import os
 import re
 import numpy as np
 import pandas as pd
 from matplotlib import pyplot as plt
 from matplotlib.backends.backend_pdf import PdfPages
 from azureml.automl.core.shared import constants
 from azureml.automl.core.shared.types import GrainType
 from azureml.automl.runtime.shared.score import scoring
 GRAIN = "time_series_id"
 BACKTEST_ITER = "backtest_iteration"
 ACTUALS = "actual_level"
 PREDICTIONS = "predicted_level"
 ALL_GRAINS = "all_sets"
 FORECASTS_FILE = "forecast.csv"
 SCORES_FILE = "scores.csv"
 PLOTS_FILE = "plots_fcst_vs_actual.pdf"
 RE_INVALID_SYMBOLS = re.compile("[: ]")
 def _compute_metrics(df: pd.DataFrame, metrics: List[str]):
    """
    Compute metrics for one data frame.
    :param df: The data frame which contains actual_level and predicted_level columns.
    :param metrics: The names of the metrics to compute.
    :return: The data frame with two columns - metric_name and metric.
    """
    scores = scoring.score_regression(
        y_test=df[ACTUALS], y_pred=df[PREDICTIONS], metrics=metrics
    )
    metrics_df = pd.DataFrame(list(scores.items()), columns=["metric_name", "metric"])
    metrics_df.sort_values(["metric_name"], inplace=True)
    metrics_df.reset_index(drop=True, inplace=True)
    return metrics_df
 def _format_grain_name(grain: GrainType) -> str:
    """
    Convert grain name to string.
    :param grain: the grain name.
    :return: the string representation of the given grain.
    """
    if not isinstance(grain, tuple) and not isinstance(grain, list):
        return str(grain)
    grain = list(map(str, grain))
    return "|".join(grain)
 def compute_all_metrics(
    fcst_df: pd.DataFrame,
    ts_id_colnames: List[str],
    metric_names: Optional[List[str]] = None,
 ):
    """
    Calculate metrics per grain.
    :param fcst_df: forecast data frame. Must contain 2 columns: 'actual_level' and 'predicted_level'
    :param ts_id_colnames: list of grain column names; may be None or empty to compute only the overall metrics
    :param metric_names: (optional) the list of metric names to return
    :return: data frame with metric_name, metric and time_series_id columns, containing the metrics for every grain and the overall metrics marked as 'all_sets'
    """
    if not metric_names:
        metric_names = list(constants.Metric.SCALAR_REGRESSION_SET)
    if ts_id_colnames is None:
        ts_id_colnames = []
    metrics_list = []
    if ts_id_colnames:
        for grain, df in fcst_df.groupby(ts_id_colnames):
            one_grain_metrics_df = _compute_metrics(df, metric_names)
            one_grain_metrics_df[GRAIN] = _format_grain_name(grain)
            metrics_list.append(one_grain_metrics_df)
    # overall metrics
    one_grain_metrics_df = _compute_metrics(fcst_df, metric_names)
    one_grain_metrics_df[GRAIN] = ALL_GRAINS
    metrics_list.append(one_grain_metrics_df)
    # collect into a data frame
    return pd.concat(metrics_list)
 def _draw_one_plot(
    df: pd.DataFrame,
    time_column_name: str,
    grain_column_names: List[str],
    pdf: PdfPages,
 ) -> None:
    """
    Draw the single plot.
    :param df: The data frame with the data to build plot.
    :param time_column_name: The name of a time column.
    :param grain_column_names: The name of grain columns.
    :param pdf: The pdf backend used to render the plot.
    """
    fig, _ = plt.subplots(figsize=(20, 10))
    df = df.set_index(time_column_name)
    plt.plot(df[[ACTUALS, PREDICTIONS]])
    plt.xticks(rotation=45)
    iteration = df[BACKTEST_ITER].iloc[0]
    if grain_column_names:
        grain_name = [df[grain].iloc[0] for grain in grain_column_names]
        plt.title(f"Time series ID: {_format_grain_name(grain_name)} {iteration}")
    plt.legend(["actual", "forecast"])
    pdf.savefig(fig)
    plt.close(fig)
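 def calculate_scores_and_build_plots(
    input_dir: str, output_dir: str, automl_settings: Dict[str, Any]
 ):
    """
    Combine the backtest forecasts, plot them and compute the metrics.
    :param input_dir: The directory with the forecast csv files.
    :param output_dir: The directory to write forecast.csv, scores.csv and the pdf with plots to.
    :param automl_settings: The AutoML settings with the time column and time series ID column names.
    """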
    os.makedirs(output_dir, exist_ok=True)
    grains = automl_settings.get(constants.TimeSeries.GRAIN_COLUMN_NAMES)
    time_column_name = automl_settings.get(constants.TimeSeries.TIME_COLUMN_NAME)
    if grains is None:
        grains = []
    if isinstance(grains, str):
        grains = [grains]
    # Build a new list so that the caller's settings are not modified.
    grains = [grain for grain in grains if grain != BACKTEST_ITER]
    dfs = []
    for fle in os.listdir(input_dir):
        file_path = os.path.join(input_dir, fle)
        if os.path.isfile(file_path) and file_path.endswith(".csv"):
            df_iter = pd.read_csv(file_path, parse_dates=[time_column_name])
            for _, iteration in df_iter.groupby(BACKTEST_ITER):
                dfs.append(iteration)
    forecast_df = pd.concat(dfs, sort=False, ignore_index=True)
    # To make sure plots are in order, sort the predictions by grain and iteration.
    ts_index = grains + [BACKTEST_ITER]
    forecast_df.sort_values(by=ts_index, inplace=True)
    pdf = PdfPages(os.path.join(output_dir, PLOTS_FILE))
    for _, one_forecast in forecast_df.groupby(ts_index):
        _draw_one_plot(one_forecast, time_column_name, grains, pdf)
    pdf.close()
    forecast_df.to_csv(os.path.join(output_dir, FORECASTS_FILE), index=False)
    # Remove np.NaN and np.inf from the prediction and actuals data.
    forecast_df.replace([np.inf, -np.inf], np.nan, inplace=True)
    forecast_df.dropna(subset=[ACTUALS, PREDICTIONS], inplace=True)
    metrics = compute_all_metrics(forecast_df, grains + [BACKTEST_ITER])
    metrics.to_csv(os.path.join(output_dir, SCORES_FILE), index=False)
 if __name__ == "__main__":
    args = {"forecasts": "--forecasts", "scores_out": "--output-dir"}
    parser = argparse.ArgumentParser(description="Parsing input arguments.")
    for argname, arg in args.items():
        parser.add_argument(arg, dest=argname, required=True)
    parsed_args, _ = parser.parse_known_args()
    input_dir = parsed_args.forecasts
    output_dir = parsed_args.scores_out
    with open(
        os.path.join(
            os.path.dirname(os.path.realpath(__file__)), "automl_settings.json"
        )
    ) as json_file:
        automl_settings = json.load(json_file)
    calculate_scores_and_build_plots(input_dir, output_dir, automl_settings)
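The metric step of the scoring script follows a clean-then-group pattern, illustrated by the minimal sketch below. First, infinite values become NaN. Next, rows missing an actual or a prediction are dropped. Finally, errors are aggregated per time series and backtest iteration. This is not the script's `compute_all_metrics`, which is defined elsewhere and not shown. The sketch makes two assumptions. It uses hypothetical column names in place of the `ACTUALS`, `PREDICTIONS` and `BACKTEST_ITER` constants, and it computes only RMSE and MAE.

```python
import numpy as np
import pandas as pd

# Hypothetical column names standing in for the ACTUALS, PREDICTIONS and
# BACKTEST_ITER constants of the scoring script.
ACTUALS = "actual_level"
PREDICTIONS = "predicted_level"
BACKTEST_ITER = "backtest_iteration"


def sketch_group_metrics(forecast_df: pd.DataFrame, grains: list) -> pd.DataFrame:
    """Return RMSE and MAE for every (time series ID, backtest iteration) group."""
    # Treat +/-inf as missing, then drop rows lacking an actual or a prediction.
    df = forecast_df.replace([np.inf, -np.inf], np.nan)
    df = df.dropna(subset=[ACTUALS, PREDICTIONS])
    rows = []
    for key, group in df.groupby(grains + [BACKTEST_ITER]):
        errors = group[ACTUALS] - group[PREDICTIONS]
        rows.append(
            {
                "group": key,
                "rmse": float(np.sqrt(np.mean(errors**2))),
                "mae": float(np.mean(np.abs(errors))),
            }
        )
    return pd.DataFrame(rows)


# Usage: the infinite prediction for ts0 is dropped before scoring.
example = pd.DataFrame(
    {
        "time_series_id": ["ts0", "ts0", "ts1"],
        BACKTEST_ITER: ["iter0", "iter0", "iter0"],
        ACTUALS: [1.0, 3.0, 2.0],
        PREDICTIONS: [2.0, np.inf, 2.5],
    }
)
result = sketch_group_metrics(example, ["time_series_id"])
print(result)
```

Dropping non-finite rows before grouping keeps a single bad prediction from turning a whole group's metric into `inf` or NaN.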
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/auto-ml-forecasting-backtest-single-model.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/auto-ml-forecasting-backtest-single-model.ipynb
@@ -0,0 +1,729 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License.\n",
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/auto-ml-forecasting-backtest-single-model.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Model backtesting**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "2. [Setup](#Setup)\n",
        "3. [Data](#Data)\n",
        "4. [Prepare remote compute and data](#prepare_remote)\n",
        "5. [Create the configuration for AutoML backtesting](#train)\n",
        "6. [Backtest AutoML](#backtest_automl)\n",
        "7. [View metrics](#Metrics)\n",
        "8. [Backtest the best model](#backtest_model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "Model backtesting evaluates a model's performance on historical data. To do that, we step back through the data set by the backtesting period several times, and at each step we split the data into train and test sets. These sets are then used to train and evaluate the model.<br>\n",
        "This notebook demonstrates backtesting of a single model, which works best for small data sets containing one or a few time series. For scenarios where we would like to choose the best AutoML model for every backtest iteration, please see the [AutoML Forecasting Backtest Many Models Example](../forecasting-backtest-many-models/auto-ml-forecasting-backtest-many-models.ipynb) notebook.\n",
        "![Backtesting](Backtesting.png)\n",
        "This notebook demonstrates two ways of backtesting:\n",
        "- AutoML backtesting: we train a separate AutoML model on each historical backtest fold.\n",
        "- Model backtesting: from the first run we select the best model trained on the most recent data, then retrain it on the past data and evaluate it."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "import shutil\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core import Experiment, Model, Workspace"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This notebook is compatible with Azure ML SDK version 1.35.1 or later."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As part of the setup you have already created a <b>Workspace</b>."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "output = {}\n",
        "output[\"Subscription ID\"] = ws.subscription_id\n",
        "output[\"Workspace\"] = ws.name\n",
        "output[\"SKU\"] = ws.sku\n",
        "output[\"Resource Group\"] = ws.resource_group\n",
        "output[\"Location\"] = ws.location\n",
        "output[\"SDK Version\"] = azureml.core.VERSION\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data\n",
        "For demonstration purposes, we will simulate one year of daily data for two time series. To do this, we need to specify the time column name, the time series ID column name and the label column name. We will forecast two weeks (14 days) ahead."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "TIME_COLUMN_NAME = \"date\"\n",
        "TIME_SERIES_ID_COLUMN_NAMES = \"time_series_id\"\n",
        "LABEL_COLUMN_NAME = \"y\"\n",
        "FORECAST_HORIZON = 14\n",
        "FREQUENCY = \"D\"\n",
        "\n",
        "\n",
        "def simulate_timeseries_data(\n",
        "    train_len: int,\n",
        "    test_len: int,\n",
        "    time_column_name: str,\n",
        "    target_column_name: str,\n",
        "    time_series_id_column_name: str,\n",
        "    time_series_number: int = 1,\n",
        "    freq: str = \"H\",\n",
        "):\n",
        "    \"\"\"\n",
        "    Return the simulated time series of the desired length.\n",
        "\n",
        "    :param train_len: The length of training data (one series).\n",
        "    :type train_len: int\n",
        "    :param test_len: The length of testing data (one series).\n",
        "    :type test_len: int\n",
        "    :param time_column_name: The desired name of a time column.\n",
        "    :type time_column_name: str\n",
        "    :param target_column_name: The desired name of the target column.\n",
        "    :type target_column_name: str\n",
        "    :param time_series_id_column_name: The desired name of the time series ID column.\n",
        "    :type time_series_id_column_name: str\n",
        "    :param time_series_number: The number of time series in the data set.\n",
        "    :type time_series_number: int\n",
        "    :param freq: The frequency string representing pandas offset.\n",
        "                 see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html\n",
        "    :type freq: str\n",
        "    :returns: the tuple of training features, training labels, test features and test labels.\n",
        "    :rtype: tuple\n",
        "\n",
        "    \"\"\"\n",
        "    data_train = []  # type: List[pd.DataFrame]\n",
        "    data_test = []  # type: List[pd.DataFrame]\n",
        "    data_length = train_len + test_len\n",
        "    for i in range(time_series_number):\n",
        "        X = pd.DataFrame(\n",
        "            {\n",
        "                time_column_name: pd.date_range(\n",
        "                    start=\"2000-01-01\", periods=data_length, freq=freq\n",
        "                ),\n",
        "                target_column_name: np.arange(data_length).astype(float)\n",
        "                + np.random.rand(data_length)\n",
        "                + i * 5,\n",
        "                \"ext_predictor\": np.asarray(range(42, 42 + data_length)),\n",
        "                time_series_id_column_name: np.repeat(\"ts{}\".format(i), data_length),\n",
        "            }\n",
        "        )\n",
        "        data_train.append(X[:train_len])\n",
        "        data_test.append(X[train_len:])\n",
        "    train = pd.concat(data_train)\n",
        "    label_train = train.pop(target_column_name).values\n",
        "    test = pd.concat(data_test)\n",
        "    label_test = test.pop(target_column_name).values\n",
        "    return train, label_train, test, label_test\n",
        "\n",
        "\n",
        "n_test_periods = FORECAST_HORIZON\n",
        "n_train_periods = 365\n",
        "X_train, y_train, X_test, y_test = simulate_timeseries_data(\n",
        "    train_len=n_train_periods,\n",
        "    test_len=n_test_periods,\n",
        "    time_column_name=TIME_COLUMN_NAME,\n",
        "    target_column_name=LABEL_COLUMN_NAME,\n",
        "    time_series_id_column_name=TIME_SERIES_ID_COLUMN_NAMES,\n",
        "    time_series_number=2,\n",
        "    freq=FREQUENCY,\n",
        ")\n",
        "X_train[LABEL_COLUMN_NAME] = y_train"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's see what the training data looks like."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "X_train.tail()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Prepare remote compute and data <a id=\"prepare_remote\"></a>\n",
        "The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace) is paired with a storage account, which contains the default data store. We will use it to upload the artificial data and create a [tabular dataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training. A tabular dataset defines a series of lazily evaluated, immutable operations to load data from the data source into a tabular representation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.data.dataset_factory import TabularDatasetFactory\n",
        "\n",
        "ds = ws.get_default_datastore()\n",
        "# Upload the data frame to the default data store and register it as a tabular dataset.\n",
        "train_data = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    X_train, target=(ds, \"data\"), name=\"data_backtest\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You will need to create a compute target for backtesting. In this [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute), you create AmlCompute as your training compute resource.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your CPU cluster\n",
        "amlcompute_cluster_name = \"backtest-cluster\"\n",
        "\n",
        "# Reuse the cluster if it already exists; otherwise create it.\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
        "    print(\"Found existing cluster, use it.\")\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=\"STANDARD_DS12_V2\", max_nodes=6\n",
        "    )\n",
        "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create the configuration for AutoML backtesting <a id=\"train\"></a>\n",
        "\n",
        "This dictionary defines the AutoML settings. For this forecasting task we need to define several settings, including the name of the time column, the maximum forecast horizon, and the time series ID (grain) column names.\n",
        "\n",
        "| Property                           | Description|\n",
        "| :---------------                   | :------------------- |\n",
        "| **task**                           | forecasting |\n",
        "| **primary_metric**                 | This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>normalized_root_mean_squared_error</i><br><i>normalized_mean_absolute_error</i> |\n",
        "| **iteration_timeout_minutes**      | Maximum amount of time in minutes that the model can train. This is optional but provides customers with greater control on exit criteria. |\n",
        "| **iterations**                     | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
        "| **experiment_timeout_hours**       | Maximum amount of time in hours that the experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. |\n",
        "| **label_column_name**              | The name of the label column. |\n",
        "| **max_horizon**               | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
        "| **n_cross_validations**            | Number of cross-validation splits. The default value is \"auto\", in which case AutoML determines the number of cross-validations automatically if a validation set is not provided. Alternatively, users can specify an integer value. Rolling origin validation is used to split the time series in a temporally consistent way. |\n",
        "| **cv_step_size**                   | Number of periods between two consecutive cross-validation folds. The default value is \"auto\", in which case AutoML determines the cross-validation step size automatically if a validation set is not provided. Alternatively, users can specify an integer value. |\n",
        "| **time_column_name**               | The name of your time column. |\n",
        "| **grain_column_names**     | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_settings = {\n",
        "    \"task\": \"forecasting\",\n",
        "    \"primary_metric\": \"normalized_root_mean_squared_error\",\n",
        "    \"iteration_timeout_minutes\": 10,  # This needs to be changed based on the dataset. Explore how long training takes before setting this value.\n",
        "    \"iterations\": 15,\n",
        "    \"experiment_timeout_hours\": 1,  # This also needs to be changed based on the dataset. For larger data sets this number needs to be bigger.\n",
        "    \"label_column_name\": LABEL_COLUMN_NAME,\n",
        "    \"n_cross_validations\": \"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
        "    \"cv_step_size\": \"auto\",\n",
        "    \"time_column_name\": TIME_COLUMN_NAME,\n",
        "    \"max_horizon\": FORECAST_HORIZON,\n",
        "    \"track_child_runs\": False,\n",
        "    \"grain_column_names\": TIME_SERIES_ID_COLUMN_NAMES,\n",
        "}"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Backtest AutoML <a id=\"backtest_automl\"></a>\n",
        "First we set the backtesting parameters: we step back by 30 days five times, and at each step we forecast the next two weeks."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# The number of periods to step back on each backtest iteration.\n",
        "BACKTESTING_PERIOD = 30\n",
        "# The number of times we will backtest the model.\n",
        "NUMBER_OF_BACKTESTS = 5"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To train AutoML on the backtesting folds we will use an [Azure Machine Learning pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines). It will generate the backtest folds, train a model on each of them and calculate the accuracy metrics. To run the pipeline, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve (here, forecasting), while a Run corresponds to a specific approach to the problem."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from uuid import uuid1\n",
        "\n",
        "from pipeline_helper import get_backtest_pipeline\n",
        "\n",
        "pipeline_exp = Experiment(ws, \"automl-backtesting\")\n",
        "\n",
        "# We will create the unique identifier to mark our models.\n",
        "model_uid = str(uuid1())\n",
        "\n",
        "pipeline = get_backtest_pipeline(\n",
        "    experiment=pipeline_exp,\n",
        "    dataset=train_data,\n",
        "    # The STANDARD_DS12_V2 has 4 vCPUs per node; we set 2 processes per node to be safe.\n",
        "    process_per_node=2,\n",
        "    # The maximum number of nodes for our compute is 6.\n",
        "    node_count=6,\n",
        "    compute_target=compute_target,\n",
        "    automl_settings=automl_settings,\n",
        "    step_size=BACKTESTING_PERIOD,\n",
        "    step_number=NUMBER_OF_BACKTESTS,\n",
        "    model_uid=model_uid,\n",
        "    forecast_quantiles=[0.025, 0.975],  # Optional\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run the pipeline and wait for results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline_run = pipeline_exp.submit(pipeline)\n",
        "pipeline_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "After the run is complete, we can download the results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "metrics_output = pipeline_run.get_pipeline_output(\"results\")\n",
        "metrics_output.download(\"backtest_metrics\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## View metrics<a id=\"Metrics\"></a>\n",
        "To distinguish these metrics from the model backtest metrics, which we will obtain in the next section, we will move the directory with the metrics out of backtest_metrics and remove the parent folder. We will create a utility function for that."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
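        "def copy_scoring_directory(new_name):\n",
        "    \"\"\"Move the latest downloaded results directory to new_name and delete backtest_metrics.\"\"\"\n",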
        "    scores_path = os.path.join(\"backtest_metrics\", \"azureml\")\n",
        "    directory_list = [os.path.join(scores_path, d) for d in os.listdir(scores_path)]\n",
        "    latest_file = max(directory_list, key=os.path.getctime)\n",
        "    print(\n",
        "        f\"The output directory {latest_file} was created on {pd.Timestamp(os.path.getctime(latest_file), unit='s')} GMT.\"\n",
        "    )\n",
        "    shutil.move(os.path.join(latest_file, \"results\"), new_name)\n",
        "    shutil.rmtree(\"backtest_metrics\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Move the directory and list its contents."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "copy_scoring_directory(\"automl_backtest\")\n",
        "pd.DataFrame({\"File\": os.listdir(\"automl_backtest\")})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The directory contains a set of files with results:\n",
        "- forecast.csv contains forecasts for all backtest iterations. The backtest_iteration column contains the iteration identifier with the last training date as a suffix.\n",
        "- scores.csv contains all metrics. If the data set contains several time series, the metrics are given for every combination of time series ID and iteration; the scores computed over all iterations and time series IDs are marked as \"all_sets\".\n",
        "- plots_fcst_vs_actual.pdf contains the forecast vs actuals plots for each iteration and time series.\n",
        "\n",
        "For demonstration purposes we will display the table of metrics for the time series with ID \"ts0\". Again, we will create a utility function, which will be reused in model backtesting."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def get_metrics_for_ts(all_metrics, ts):\n",
        "    \"\"\"\n",
        "    Get the metrics for the time series with ID ts and return them as a pandas DataFrame.\n",
        "\n",
        "    :param all_metrics: The table with all the metrics.\n",
        "    :param ts: The ID of a time series of interest.\n",
        "    :return: The pandas DataFrame with metrics for one time series.\n",
        "    \"\"\"\n",
        "    results_df = None\n",
        "    for ts_id, one_series in all_metrics.groupby(\"time_series_id\"):\n",
        "        if not ts_id.startswith(ts):\n",
        "            continue\n",
        "        iteration = ts_id.split(\"|\")[-1]\n",
        "        # Avoid inplace operations on a slice of all_metrics.\n",
        "        df = (\n",
        "            one_series[[\"metric_name\", \"metric\"]]\n",
        "            .rename(columns={\"metric\": iteration})\n",
        "            .set_index(\"metric_name\")\n",
        "        )\n",
        "        if results_df is None:\n",
        "            results_df = df\n",
        "        else:\n",
        "            results_df = results_df.merge(\n",
        "                df, how=\"inner\", left_index=True, right_index=True\n",
        "            )\n",
        "    results_df.sort_index(axis=1, inplace=True)\n",
        "    return results_df\n",
        "\n",
        "\n",
        "metrics_df = pd.read_csv(os.path.join(\"automl_backtest\", \"scores.csv\"))\n",
        "ts_id = \"ts0\"\n",
        "get_metrics_for_ts(metrics_df, ts_id)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Forecast vs actuals plots."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from IPython.display import IFrame\n",
        "\n",
        "IFrame(\"./automl_backtest/plots_fcst_vs_actual.pdf\", width=800, height=300)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## <font color='blue'>Backtest the best model</font> <a id=\"backtest_model\"></a>\n",
        "\n",
        "For model backtesting we will use the same parameters we used to backtest AutoML. All the models obtained in the previous run were registered in our workspace. To identify each model, it was tagged with its last training date."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "model_list = Model.list(ws, tags=[[\"experiment\", \"automl-backtesting\"]])\n",
        "model_data = {\"name\": [], \"last_training_date\": []}\n",
        "for model in model_list:\n",
        "    if (\n",
        "        \"last_training_date\" not in model.tags\n",
        "        or \"model_uid\" not in model.tags\n",
        "        or model.tags[\"model_uid\"] != model_uid\n",
        "    ):\n",
        "        continue\n",
        "    model_data[\"name\"].append(model.name)\n",
        "    model_data[\"last_training_date\"].append(\n",
        "        pd.Timestamp(model.tags[\"last_training_date\"])\n",
        "    )\n",
        "df_models = pd.DataFrame(model_data)\n",
        "df_models.sort_values([\"last_training_date\"], inplace=True)\n",
        "df_models.reset_index(inplace=True, drop=True)\n",
        "df_models"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We will backtest the model trained on the most recent data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "model_name = df_models[\"name\"].iloc[-1]\n",
        "model_name"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrain the models\n",
        "Assemble the pipeline that will retrain the best model from the AutoML run on the historical data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline_exp = Experiment(ws, \"model-backtesting\")\n",
        "\n",
        "pipeline = get_backtest_pipeline(\n",
        "    experiment=pipeline_exp,\n",
        "    dataset=train_data,\n",
        "    # The STANDARD_DS12_V2 has 4 vCPUs per node; we set 2 processes per node to be safe.\n",
        "    process_per_node=2,\n",
        "    # The maximum number of nodes for our compute is 6.\n",
        "    node_count=6,\n",
        "    compute_target=compute_target,\n",
        "    automl_settings=automl_settings,\n",
        "    step_size=BACKTESTING_PERIOD,\n",
        "    step_number=NUMBER_OF_BACKTESTS,\n",
        "    model_name=model_name,\n",
        "    forecast_quantiles=[0.025, 0.975],\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Launch the backtesting pipeline."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pipeline_run = pipeline_exp.submit(pipeline)\n",
        "pipeline_run.wait_for_completion(show_output=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The metrics are stored in the pipeline output named \"results\". The next cell downloads the table with metrics."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "metrics_output = pipeline_run.get_pipeline_output(\"results\")\n",
        "metrics_output.download(\"backtest_metrics\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Again, we will copy the data files from the downloaded directory, but in this case we will call the folder \"model_backtest\"; it will contain the same files as the one for AutoML backtesting."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "copy_scoring_directory(\"model_backtest\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Finally, we will display the metrics."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "model_metrics_df = pd.read_csv(os.path.join(\"model_backtest\", \"scores.csv\"))\n",
        "get_metrics_for_ts(model_metrics_df, ts_id)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Forecast vs actuals plots."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from IPython.display import IFrame\n",
        "\n",
        "IFrame(\"./model_backtest/plots_fcst_vs_actual.pdf\", width=800, height=300)"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "jialiu"
      }
    ],
    "category": "tutorial",
    "compute": [
      "Remote"
    ],
    "datasets": [
      "None"
    ],
    "deployment": [
      "None"
    ],
    "exclude_from_index": false,
    "framework": [
      "Azure ML AutoML"
    ],
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.5"
    },
    "vscode": {
      "interpreter": {
        "hash": "6bd77c88278e012ef31757c15997a7bea8c943977c43d6909403c00ae11d43ca"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
 }
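The notebook describes its backtesting scheme as follows: step back by `BACKTESTING_PERIOD` periods `NUMBER_OF_BACKTESTS` times, and after each cutoff forecast `FORECAST_HORIZON` periods. That scheme can be sketched with pandas alone. Both the function name `sketch_backtest_folds` and the cutoff rule are assumptions for illustration. The assumed rule is that fold `i` ends `(i + 1) * step_size` periods before the latest timestamp. The actual split is performed by `data_split.py`, which is not shown here and may place the cutoffs differently.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset


def sketch_backtest_folds(
    df: pd.DataFrame,
    time_column_name: str,
    step_size: int,
    step_number: int,
    horizon: int,
    freq: str = "D",
):
    """Yield (cutoff, train, test) for each backtest iteration.

    Assumed rule (hypothetical): iteration i trains on rows up to
    max_date - (i + 1) * step_size periods and tests on the next
    `horizon` periods.
    """
    offset = to_offset(freq)
    max_date = df[time_column_name].max()
    times = df[time_column_name]
    for i in range(step_number):
        cutoff = max_date - (i + 1) * step_size * offset
        test_end = cutoff + horizon * offset
        train = df[times <= cutoff]
        test = df[(times > cutoff) & (times <= test_end)]
        yield cutoff, train, test


# Usage with the notebook's settings on one year of daily data for one series.
BACKTESTING_PERIOD = 30
NUMBER_OF_BACKTESTS = 5
FORECAST_HORIZON = 14
data = pd.DataFrame(
    {"date": pd.date_range("2000-01-01", periods=365, freq="D"), "y": range(365)}
)
folds = list(
    sketch_backtest_folds(
        data, "date", BACKTESTING_PERIOD, NUMBER_OF_BACKTESTS, FORECAST_HORIZON
    )
)
for cutoff, train, test in folds:
    print(cutoff.date(), len(train), len(test))
```

Under this rule, each fold's test window starts right after its cutoff. Because the windows never overlap the training rows, every forecast is scored only on data the model has not seen.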
--- a/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/pipeline_helper.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-backtest-single-model/pipeline_helper.py
@@ -0,0 +1,169 @@
 from typing import Any, Dict, Optional
 import os
 import azureml.train.automl.runtime._hts.hts_runtime_utilities as hru
 from azureml._restclient.jasmine_client import JasmineClient
 from azureml.contrib.automl.pipeline.steps import utilities
 from azureml.core import RunConfiguration
 from azureml.core.compute import ComputeTarget
 from azureml.core.experiment import Experiment
 from azureml.data import LinkTabularOutputDatasetConfig, TabularDataset
 from azureml.pipeline.core import Pipeline, PipelineData, PipelineParameter
 from azureml.pipeline.steps import ParallelRunConfig, ParallelRunStep, PythonScriptStep
 from azureml.train.automl.constants import Scenarios
 from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
 PROJECT_FOLDER = "assets"
 SETTINGS_FILE = "automl_settings.json"
 def get_backtest_pipeline(
    experiment: Experiment,
    dataset: TabularDataset,
    process_per_node: int,
    node_count: int,
    compute_target: ComputeTarget,
    automl_settings: Dict[str, Any],
    step_size: int,
    step_number: int,
    model_name: Optional[str] = None,
    model_uid: Optional[str] = None,
    forecast_quantiles: Optional[list] = None,
 ) -> Pipeline:
    """
    :param experiment: The experiment used to run the pipeline.
    :param dataset: Tabular data set to be used for model training.
    :param process_per_node: The number of processes per node. Generally it should be the number of cores
                             on the node divided by two.
    :param node_count: The number of nodes to be used.
    :param compute_target: The compute target to be used to run the pipeline.
    :param model_name: The name of the model to be backtested.
    :param automl_settings: The dictionary with automl settings.
    :param step_size: The number of periods to step back in backtesting.
    :param step_number: The number of backtesting iterations.
    :param model_uid: The uid to mark models from this run of the experiment.
    :param forecast_quantiles: The forecast quantiles that are required in the inference.
    :return: The pipeline that retrains and backtests the models.
             **Note:** The collected forecasts are written by the final step, 'score',
             to the pipeline output named 'results'.
    """
    jasmine_client = JasmineClient(
        service_context=experiment.workspace.service_context,
        experiment_name=experiment.name,
        experiment_id=experiment.id,
    )
    env = jasmine_client.get_curated_environment(
        scenario=Scenarios.AUTOML,
        enable_dnn=False,
        enable_gpu=False,
        compute=compute_target,
        compute_sku=experiment.workspace.compute_targets.get(
            compute_target.name
        ).vm_size,
    )
    data_results = PipelineData(
        name="results", datastore=None, pipeline_output_name="results"
    )
    ############################################################
    # Split the dataset using a Python script.
    ############################################################
    run_config = RunConfiguration()
    run_config.docker.use_docker = True
    run_config.environment = env
    utilities.set_environment_variables_for_run(run_config)
    split_data = PipelineData(name="split_data_output", datastore=None).as_dataset()
    split_step = PythonScriptStep(
        name="split_data_for_backtest",
        script_name="data_split.py",
        inputs=[dataset.as_named_input("training_data")],
        outputs=[split_data],
        source_directory=PROJECT_FOLDER,
        arguments=[
            "--step-size",
            step_size,
            "--step-number",
            step_number,
            "--time-column-name",
            automl_settings.get("time_column_name"),
            "--time-series-id-column-names",
            automl_settings.get("grain_column_names"),
            "--output-dir",
            split_data,
        ],
        runconfig=run_config,
        compute_target=compute_target,
        allow_reuse=False,
    )
    ############################################################
    # Run the backtest with a ParallelRunStep.
    ############################################################
    settings_path = os.path.join(PROJECT_FOLDER, SETTINGS_FILE)
    hru.dump_object_to_json(automl_settings, settings_path)
    mini_batch_size = PipelineParameter(name="batch_size_param", default_value=str(1))
    back_test_config = ParallelRunConfig(
        source_directory=PROJECT_FOLDER,
        entry_script="retrain_models.py",
        mini_batch_size=mini_batch_size,
        error_threshold=-1,
        output_action="append_row",
        append_row_file_name="outputs.txt",
        compute_target=compute_target,
        environment=env,
        process_count_per_node=process_per_node,
        run_invocation_timeout=3600,
        node_count=node_count,
    )
    utilities.set_environment_variables_for_run(back_test_config)
    forecasts = PipelineData(name="forecasts", datastore=None)
    if model_name:
        parallel_step_name = "{}-backtest".format(model_name.replace("_", "-"))
    else:
        parallel_step_name = "AutoML-backtest"
    prs_args = [
        "--target_column_name",
        automl_settings.get("label_column_name"),
        "--output-dir",
        forecasts,
    ]
    if model_name is not None:
        prs_args.append("--model-name")
        prs_args.append(model_name)
    if model_uid is not None:
        prs_args.append("--model-uid")
        prs_args.append(model_uid)
    if forecast_quantiles:
        prs_args.append("--forecast_quantiles")
        prs_args.extend(forecast_quantiles)
    backtest_prs = ParallelRunStep(
        name=parallel_step_name,
        parallel_run_config=back_test_config,
        arguments=prs_args,
        inputs=[split_data],
        output=forecasts,
        allow_reuse=False,
    )
    ############################################################
    # Collect the forecasts and write them to the 'results' pipeline output.
    ############################################################
    collection_step = PythonScriptStep(
        name="score",
        script_name="score.py",
        inputs=[forecasts.as_mount()],
        outputs=[data_results],
        source_directory=PROJECT_FOLDER,
        arguments=["--forecasts", forecasts, "--output-dir", data_results],
        runconfig=run_config,
        compute_target=compute_target,
        allow_reuse=False,
    )
    # Build and return the pipeline.
    return Pipeline(
        workspace=experiment.workspace,
        steps=[split_step, backtest_prs, collection_step],
    )
--- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb
@@ -0,0 +1,828 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<font color=\"red\" size=\"5\"><strong>!Important!</strong><br>This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share)).</font>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "**BikeShare Demand Forecasting**\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Compute](#Compute)\n",
        "1. [Data](#Data)\n",
        "1. [Train](#Train)\n",
        "1. [Featurization](#Featurization)\n",
        "1. [Evaluate](#Evaluate)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "This notebook demonstrates demand forecasting for a bike-sharing service using AutoML.\n",
        "\n",
        "AutoML highlights here include built-in holiday featurization, accessing engineered feature names, and working with the `forecast` function. Please also look at the additional forecasting notebooks, which document lagging, rolling windows, forecast quantiles, other ways to use the forecast function, and forecaster deployment.\n",
        "\n",
        "Make sure you have executed the [configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) before running this notebook.\n",
        "\n",
        "Notebook synopsis:\n",
        "1. Creating an Experiment in an existing Workspace\n",
        "2. Configuration and remote run of AutoML for a time-series model with lag and holiday features\n",
        "3. Viewing the engineered names for featurized data and featurization summary for all raw features\n",
        "4. Evaluating the fitted model using a rolling test "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1680248038565
        }
      },
      "outputs": [],
      "source": [
        "import json\n",
        "import logging\n",
        "from datetime import datetime\n",
        "\n",
        "import azureml.core\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "from azureml.automl.core.featurization import FeaturizationConfig\n",
        "from azureml.core import Dataset, Experiment, Workspace\n",
        "from azureml.train.automl import AutoMLConfig"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This notebook is compatible with Azure ML SDK version 1.35.0 or later."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# choose a name for the run history container in the workspace\n",
        "experiment_name = \"automl-bikeshareforecasting\"\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output[\"Subscription ID\"] = ws.subscription_id\n",
        "output[\"Workspace\"] = ws.name\n",
        "output[\"SKU\"] = ws.sku\n",
        "output[\"Resource Group\"] = ws.resource_group\n",
        "output[\"Location\"] = ws.location\n",
        "output[\"Run History Name\"] = experiment_name\n",
        "output[\"SDK Version\"] = azureml.core.VERSION\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Compute\n",
        "You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
        "\n",
        "#### Creation of AmlCompute takes approximately 5 minutes.\n",
        "If an AmlCompute cluster with that name already exists in your workspace, this code skips the creation process.\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your cluster.\n",
        "amlcompute_cluster_name = \"bike-cluster\"\n",
        "\n",
        "# Reuse the cluster if it already exists; otherwise create it.\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
        "    print(\"Found existing cluster, use it.\")\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=\"STANDARD_DS12_V2\", max_nodes=4\n",
        "    )\n",
        "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data\n",
        "\n",
        "Let's set up what we know about the dataset. \n",
        "\n",
        "**Target column** is what we want to forecast.\n",
        "\n",
        "**Time column** is the time axis along which to predict."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "target_column_name = \"cnt\"\n",
        "time_column_name = \"date\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "You are now ready to load the historical bike share data. We will load the CSV file into a plain pandas DataFrame."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "all_data = pd.read_csv(\"bike-no.csv\", parse_dates=[time_column_name])\n",
        "\n",
        "# Drop the columns 'casual' and 'registered' as these columns are a breakdown of the total and therefore a leak.\n",
        "all_data.drop([\"casual\", \"registered\"], axis=1, inplace=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "source": [
        "### Split the data\n",
        "\n",
        "The first split we make is into train and test sets. Note that we split on time: data before 2012-09-01 is used for training, and data on or after 2012-09-01 is used for testing."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "gather": {
          "logged": 1680247376789
        },
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "# split on time: train on data through 2012-08-31, test on data from 2012-09-01 onward\n",
        "train = all_data[all_data[time_column_name] <= pd.Timestamp(\"2012-08-31\")].copy()\n",
        "test = all_data[all_data[time_column_name] >= pd.Timestamp(\"2012-09-01\")].copy()"
      ]
    },
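    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As an optional sanity check, the next cell is a small, self-contained sketch that is not part of the original workflow. It applies the same cutoff rule to a synthetic daily series (illustrative values only) and asserts that every row lands in exactly one set and that all training dates precede all test dates. For daily data, `< 2012-09-01` selects the same rows as `<= 2012-08-31`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "# Synthetic daily series standing in for the bike share data (illustrative values only).\n",
        "demo = pd.DataFrame({\"date\": pd.date_range(\"2012-08-25\", \"2012-09-07\", freq=\"D\")})\n",
        "demo[\"cnt\"] = range(len(demo))\n",
        "\n",
        "cutoff = pd.Timestamp(\"2012-09-01\")\n",
        "demo_train = demo[demo[\"date\"] < cutoff]\n",
        "demo_test = demo[demo[\"date\"] >= cutoff]\n",
        "\n",
        "# Each row is in exactly one set, and the two sets do not overlap in time.\n",
        "assert len(demo_train) + len(demo_test) == len(demo)\n",
        "assert demo_train[\"date\"].max() < demo_test[\"date\"].min()\n",
        "print(demo_train[\"date\"].max().date(), demo_test[\"date\"].min().date())"
      ]
    },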
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Upload data to datastore\n",
        "\n",
        "The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace) is paired with a storage account, which contains the default datastore. We will use it to upload the bike share data and create a [tabular dataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training. A tabular dataset defines a series of lazily evaluated, immutable operations that load data from the data source into a tabular representation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "outputs_hidden": false,
          "source_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "from azureml.data.dataset_factory import TabularDatasetFactory\n",
        "\n",
        "datastore = ws.get_default_datastore()\n",
        "\n",
        "train_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    train, target=(datastore, \"dataset/\"), name=\"bike_no_train\"\n",
        ")\n",
        "\n",
        "test_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    test, target=(datastore, \"dataset/\"), name=\"bike_no_test\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Forecasting Parameters\n",
        "To define forecasting parameters for your experiment training, you can use the ForecastingParameters class. The table below details the forecasting parameters we will pass to our experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**time_column_name**|The name of your time column.|\n",
        "|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n",
        "|**country_or_region_for_holidays**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (e.g. 'US', 'GB').|\n",
        "|**target_lags**|How far back to construct lags of the target variable.|\n",
        "|**freq**|Forecast frequency. This optional parameter represents the period with which the forecast is desired, for example, daily, weekly, yearly, etc. Use this parameter to correct time series containing irregular data points or to pad short time series. The frequency needs to be a pandas offset alias. Please refer to [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) for more information.|\n",
        "|**cv_step_size**|Number of periods between two consecutive cross-validation folds. The default value is \"auto\", in which case AutoML determines the cross-validation step size automatically if a validation set is not provided. Users can also specify an integer value.|"
      ]
    },
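    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To illustrate how `forecast_horizon` and `freq` relate, the next cell uses plain pandas, not AutoML, to list the calendar dates that a 14-period daily horizon covers after the last training date. It is only a mental model of the horizon; AutoML generates its forecast dates internally. The `demo_` variable names are hypothetical and chosen so they do not overwrite the notebook's own settings."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "demo_horizon = 14\n",
        "demo_freq = \"D\"  # pandas offset alias for calendar days\n",
        "last_train_date = pd.Timestamp(\"2012-08-31\")\n",
        "\n",
        "# The horizon counts `demo_freq` periods after the last observed date,\n",
        "# so drop the first element, which is the last training date itself.\n",
        "horizon_dates = pd.date_range(last_train_date, periods=demo_horizon + 1, freq=demo_freq)[1:]\n",
        "assert len(horizon_dates) == demo_horizon\n",
        "print(horizon_dates[0].date(), \"to\", horizon_dates[-1].date())"
      ]
    },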
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Train\n",
        "\n",
        "Instantiate an AutoMLConfig object. This defines the settings and data used to run the experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|forecasting|\n",
        "|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics:<br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
        "|**blocked_models**|Models in blocked_models won't be used by AutoML. All supported models can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n",
        "|**experiment_timeout_hours**|Experimentation timeout in hours.|\n",
        "|**training_data**|Input dataset, containing both features and label column.|\n",
        "|**label_column_name**|The name of the label column.|\n",
        "|**compute_target**|The remote compute for training.|\n",
        "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection. The default value is \"auto\", in which case AutoML determines the number of cross-validations automatically if a validation set is not provided. Users can also specify an integer value.|\n",
        "|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n",
        "|**forecasting_parameters**|A class that holds all the forecasting related parameters.|\n",
        "\n",
        "This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blocked_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Setting forecaster maximum horizon \n",
        "\n",
        "The forecast horizon is the number of periods into the future that the model should predict. Here, we set the horizon to 14 periods (i.e. 14 days). Notice that this is much shorter than the number of days in the test set; we will need to use a rolling test to evaluate the performance on the whole test set. For more discussion of forecast horizons and guiding principles for setting them, please see the [energy demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand).  "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "forecast_horizon = 14"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Convert prediction type to integer\n",
        "The featurization configuration can be used to change the default prediction type from decimal numbers to integers. This is useful when the target column is expected to contain whole numbers, such as the number of bikes rented per day."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "featurization_config = FeaturizationConfig()\n",
        "# Force predictions for the target column to be of integer type.\n",
        "featurization_config.add_prediction_transform_type(\"Integer\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Config AutoML"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
        "\n",
        "forecasting_parameters = ForecastingParameters(\n",
        "    time_column_name=time_column_name,\n",
        "    forecast_horizon=forecast_horizon,\n",
        "    country_or_region_for_holidays=\"US\",  # setting country_or_region enables the holiday featurizer\n",
        "    target_lags=\"auto\",  # use heuristic based lag setting\n",
        "    freq=\"D\",  # Set the forecast frequency to be daily\n",
        "    cv_step_size=\"auto\",\n",
        ")\n",
        "\n",
        "automl_config = AutoMLConfig(\n",
        "    task=\"forecasting\",\n",
        "    primary_metric=\"normalized_root_mean_squared_error\",\n",
        "    featurization=featurization_config,\n",
        "    blocked_models=[\"ExtremeRandomTrees\"],\n",
        "    experiment_timeout_hours=0.3,\n",
        "    training_data=train_dataset,\n",
        "    label_column_name=target_column_name,\n",
        "    compute_target=compute_target,\n",
        "    enable_early_stopping=True,\n",
        "    n_cross_validations=\"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
        "    max_concurrent_iterations=4,\n",
        "    max_cores_per_iteration=-1,\n",
        "    verbosity=logging.INFO,\n",
        "    forecasting_parameters=forecasting_parameters,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We will now submit the experiment. You can go to the Azure ML portal to view the run details."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run = experiment.submit(automl_config, show_output=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieve the Best Run details\n",
        "Below we retrieve the best Run object from among all the runs in the experiment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run = remote_run.get_best_child()\n",
        "best_run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Featurization\n",
        "\n",
        "We can look at the engineered feature names generated in time-series featurization via the JSON file named 'engineered_feature_names.json' under the run outputs. Note that a number of named holiday periods are represented. We recommend that you have at least one year of data when using this feature to ensure that all yearly holidays are captured in the training featurization."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Download the JSON file locally\n",
        "best_run.download_file(\n",
        "    \"outputs/engineered_feature_names.json\", \"engineered_feature_names.json\"\n",
        ")\n",
        "with open(\"engineered_feature_names.json\", \"r\") as f:\n",
        "    records = json.load(f)\n",
        "\n",
        "records"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### View the featurization summary\n",
        "\n",
        "You can also see what featurization steps were performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:\n",
        "\n",
        "- Raw feature name\n",
        "- Number of engineered features formed out of this raw feature\n",
        "- Type detected\n",
        "- Whether the feature was dropped\n",
        "- List of feature transformations for the raw feature"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Download the featurization summary JSON file locally\n",
        "best_run.download_file(\n",
        "    \"outputs/featurization_summary.json\", \"featurization_summary.json\"\n",
        ")\n",
        "\n",
        "# Render the JSON as a pandas DataFrame\n",
        "with open(\"featurization_summary.json\", \"r\") as f:\n",
        "    records = json.load(f)\n",
        "fs = pd.DataFrame.from_records(records)\n",
        "\n",
        "# View a summary of the featurization\n",
        "fs[\n",
        "    [\n",
        "        \"RawFeatureName\",\n",
        "        \"TypeDetected\",\n",
        "        \"Dropped\",\n",
        "        \"EngineeredFeatureCount\",\n",
        "        \"Transformations\",\n",
        "    ]\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Evaluate"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now use the best fitted model from the AutoML run to make forecasts for the test set. We will do batch scoring on the test dataset, which should have the same schema as the training dataset.\n",
        "\n",
        "The scoring will run on a remote compute. In this example, it will reuse the training compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_experiment = Experiment(ws, experiment_name + \"_test\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieving forecasts from the model\n",
        "To run the forecast on the remote compute, we will use a helper script, forecasting_script.py, which contains the utility methods used by the remote estimator. We copy the script to the project folder so that it is uploaded to the remote compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import shutil\n",
        "\n",
        "script_folder = os.path.join(os.getcwd(), \"forecast\")\n",
        "os.makedirs(script_folder, exist_ok=True)\n",
        "shutil.copy(\"forecasting_script.py\", script_folder)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For brevity, we have created a function called `run_rolling_forecast` (in `run_forecast.py`) that submits the test data to the best model determined during the training run and retrieves forecasts. The test set is longer than the forecast horizon specified at train time, so the forecasting script uses a so-called rolling evaluation to generate predictions over the whole test set. A rolling evaluation iterates the forecaster over the test set, using the actuals in the test set to make lag features as needed."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from run_forecast import run_rolling_forecast\n",
        "\n",
        "remote_run = run_rolling_forecast(\n",
        "    test_experiment, compute_target, best_run, test_dataset, target_column_name\n",
        ")\n",
        "remote_run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion(show_output=False)"
      ]
    },
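    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the rolling evaluation more concrete, the next cell is a simplified local sketch of the idea. It is not the implementation in `forecasting_script.py`: the helper `rolling_forecast` and the last-value `naive_forecaster` are hypothetical stand-ins for the script and the fitted AutoML model. At each origin, the forecaster sees every actual up to that origin and predicts up to `horizon` steps ahead. The next actual is then revealed without refitting, so later dates receive several predictions from different origins, as in the output of the remote run."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "\n",
        "def naive_forecaster(known, n_steps):\n",
        "    # Hypothetical stand-in model: repeat the last observed value.\n",
        "    return [known.iloc[-1]] * n_steps\n",
        "\n",
        "\n",
        "def rolling_forecast(history, test, horizon, forecaster):\n",
        "    # Hypothetical helper; history and test are Series indexed by date, and test follows history.\n",
        "    rows = []\n",
        "    known = history\n",
        "    for step in range(len(test)):\n",
        "        origin = known.index[-1]  # latest date with a known actual\n",
        "        targets = test.iloc[step : step + horizon]\n",
        "        preds = forecaster(known, len(targets))\n",
        "        for date, actual, pred in zip(targets.index, targets.values, preds):\n",
        "            rows.append(\n",
        "                {\"forecast_origin\": origin, \"date\": date, \"actual\": actual, \"predicted\": pred}\n",
        "            )\n",
        "        # Roll the origin forward by revealing one more actual (the model is not refit).\n",
        "        known = pd.concat([known, test.iloc[step : step + 1]])\n",
        "    return pd.DataFrame(rows)\n",
        "\n",
        "\n",
        "demo_dates = pd.date_range(\"2012-08-29\", periods=8, freq=\"D\")\n",
        "demo_series = pd.Series(range(8), index=demo_dates, dtype=float)\n",
        "demo_fcst = rolling_forecast(\n",
        "    demo_series.iloc[:3], demo_series.iloc[3:], horizon=3, forecaster=naive_forecaster\n",
        ")\n",
        "\n",
        "# 2012-09-03 is forecast from three origins: 2012-08-31, 2012-09-01 and 2012-09-02.\n",
        "assert (demo_fcst[\"date\"] == pd.Timestamp(\"2012-09-03\")).sum() == 3\n",
        "demo_fcst[demo_fcst[\"date\"] == pd.Timestamp(\"2012-09-03\")]"
      ]
    },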
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Download the prediction result for metrics calculation\n",
        "The test data with predictions are saved in the artifact outputs/predictions.csv. You can download it, calculate some error metrics for the forecasts, and visualize the predictions vs. the actuals."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.download_file(\"outputs/predictions.csv\", \"predictions.csv\")\n",
        "fcst_df = pd.read_csv(\"predictions.csv\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note that the rolling forecast can contain multiple predictions for each date, each from a different forecast origin. For example, consider 2012-09-05:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "fcst_df[fcst_df.date == \"2012-09-05\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Here, the forecast origin refers to the latest date of actuals available for a given forecast. The earliest origin in the rolling forecast, 2012-08-31, is the last day in the training data. For origin date 2012-09-01, the forecasts use actual recorded counts from the training data *and* the actual count recorded on 2012-09-01. Note that the model is not retrained for origin dates later than 2012-08-31, but the values for model features, such as lagged values of daily count, are updated.\n",
        "\n",
        "Let's calculate the metrics over all rolling forecasts:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.shared import constants\n",
        "from azureml.automl.runtime.shared.score import scoring\n",
        "\n",
        "# Score all rolling forecasts with the scalar regression metrics from AutoML\n",
        "scores = scoring.score_regression(\n",
        "    y_test=fcst_df[target_column_name],\n",
        "    y_pred=fcst_df[\"predicted\"],\n",
        "    metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n",
        ")\n",
        "\n",
        "print(\"[Test data scores]\\n\")\n",
        "for key, value in scores.items():\n",
        "    print(\"{}:   {:.3f}\".format(key, value))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For more details on which metrics are included and how they are calculated, see [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics). You can also calculate residuals, as described [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals).\n",
        "\n",
        "The rolling forecast error metrics are much higher than the validation metrics reported by the AutoML job. What's going on here? We investigate in the following cells."
      ]
    },
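    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a first diagnostic, it can help to break the rolling forecast error down by forecast horizon, that is, the number of days between the forecast origin and the forecasted date. The cell below is a minimal sketch on a small synthetic frame with made-up values. The helper `mae_by_horizon` is hypothetical (it is not part of AutoML), and it assumes the `date`, `forecast_origin`, and `predicted` columns used in this notebook. On the real predictions it could be called as `mae_by_horizon(fcst_df, target_column_name)`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "\n",
        "def mae_by_horizon(df, target_col, time_col=\"date\", origin_col=\"forecast_origin\", pred_col=\"predicted\"):\n",
        "    \"\"\"Hypothetical helper: mean absolute error of rolling forecasts per horizon (in days).\"\"\"\n",
        "    # Horizon = days between the forecast origin and the forecasted date\n",
        "    horizon = (pd.to_datetime(df[time_col]) - pd.to_datetime(df[origin_col])).dt.days\n",
        "    abs_err = (df[target_col] - df[pred_col]).abs()\n",
        "    return abs_err.groupby(horizon).mean().rename_axis(\"horizon_days\")\n",
        "\n",
        "\n",
        "# Synthetic example (made-up values): two origins, each forecasting two days ahead\n",
        "demo = pd.DataFrame(\n",
        "    {\n",
        "        \"date\": [\"2012-09-01\", \"2012-09-02\", \"2012-09-02\", \"2012-09-03\"],\n",
        "        \"forecast_origin\": [\"2012-08-31\", \"2012-08-31\", \"2012-09-01\", \"2012-09-01\"],\n",
        "        \"cnt\": [100, 120, 120, 90],\n",
        "        \"predicted\": [110, 100, 125, 100],\n",
        "    }\n",
        ")\n",
        "print(mae_by_horizon(demo, \"cnt\"))  # horizon 1 -> 7.5, horizon 2 -> 15.0"
      ]
    },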
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Forecast versus actuals plot\n",
        "We will plot predictions and actuals on a time series plot. Since there are many forecasts for each date, we select the 14-day-ahead forecast from each forecast origin for our comparison."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from matplotlib import pyplot as plt\n",
        "\n",
        "%matplotlib inline\n",
        "\n",
        "fcst_df_h14 = (\n",
        "    fcst_df.groupby(\"forecast_origin\", as_index=False)\n",
        "    .last()\n",
        "    .drop(columns=[\"forecast_origin\"])\n",
        ")\n",
        "fcst_df_h14.set_index(time_column_name, inplace=True)\n",
        "plt.plot(fcst_df_h14[[target_column_name, \"predicted\"]])\n",
        "plt.xticks(rotation=45)\n",
        "plt.title(\"Predicted vs. Actuals\")\n",
        "plt.legend([\"actual\", \"14-day-ahead forecast\"])\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Looking at the plot, there are two clear issues:\n",
        "1. An anomalously low count value on October 29th, 2012.\n",
        "2. End-of-year holidays (Thanksgiving and Christmas) in late November and late December.\n",
        "\n",
        "What happened on Oct. 29th, 2012? That day, Hurricane Sandy brought severe storm surge flooding to the east coast of the United States, particularly around New York City. This is certainly an anomalous event that the model did not account for!\n",
        "\n",
        "As for the late-year holidays, the model apparently did not learn the full drop in bike share rentals on these major holidays. The training data runs from 2011 through August 2012, so the model saw only a single occurrence of each of these holidays. This makes holiday effects hard to resolve; however, a larger AutoML model search may produce a model that is more holiday-aware.\n",
        "\n",
        "If we keep only the predictions for dates before the Thanksgiving holiday (2012-11-22) and remove the anomalous day 2012-10-29, the metrics are closer to validation levels:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "date_filter = (fcst_df.date != \"2012-10-29\") & (fcst_df.date < \"2012-11-22\")\n",
        "scores = scoring.score_regression(\n",
        "    y_test=fcst_df[date_filter][target_column_name],\n",
        "    y_pred=fcst_df[date_filter][\"predicted\"],\n",
        "    metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n",
        ")\n",
        "\n",
        "print(\"[Test data scores (filtered)]\\n\")\n",
        "for key, value in scores.items():\n",
        "    print(\"{}:   {:.3f}\".format(key, value))"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "jialiu"
      }
    ],
    "category": "tutorial",
    "compute": [
      "Remote"
    ],
    "datasets": [
      "BikeShare"
    ],
    "deployment": [
      "None"
    ],
    "exclude_from_index": false,
    "file_extension": ".py",
    "framework": [
      "Azure ML AutoML"
    ],
    "friendly_name": "Forecasting BikeShare Demand",
    "index_order": 1,
    "kernel_info": {
      "name": "python38-azureml"
    },
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    },
    "microsoft": {
      "ms_spell_check": {
        "ms_spell_check_language": "en"
      }
    },
    "mimetype": "text/x-python",
    "name": "python",
    "npconvert_exporter": "python",
    "nteract": {
      "version": "nteract-front-end@1.0.0"
    },
    "pygments_lexer": "ipython3",
    "tags": [
      "Forecasting"
    ],
    "task": "Forecasting",
    "version": 3,
    "vscode": {
      "interpreter": {
        "hash": "6bd77c88278e012ef31757c15997a7bea8c943977c43d6909403c00ae11d43ca"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
 }
--- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/bike-no.csv
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/bike-no.csv
@@ -0,0 +1,732 @@
 instant,date,season,yr,mnth,weekday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
 1,1/1/2011,1,0,1,6,2,0.344167,0.363625,0.805833,0.160446,331,654,985
 2,1/2/2011,1,0,1,0,2,0.363478,0.353739,0.696087,0.248539,131,670,801
 3,1/3/2011,1,0,1,1,1,0.196364,0.189405,0.437273,0.248309,120,1229,1349
 4,1/4/2011,1,0,1,2,1,0.2,0.212122,0.590435,0.160296,108,1454,1562
 5,1/5/2011,1,0,1,3,1,0.226957,0.22927,0.436957,0.1869,82,1518,1600
 6,1/6/2011,1,0,1,4,1,0.204348,0.233209,0.518261,0.0895652,88,1518,1606
 7,1/7/2011,1,0,1,5,2,0.196522,0.208839,0.498696,0.168726,148,1362,1510
 8,1/8/2011,1,0,1,6,2,0.165,0.162254,0.535833,0.266804,68,891,959
 9,1/9/2011,1,0,1,0,1,0.138333,0.116175,0.434167,0.36195,54,768,822
 10,1/10/2011,1,0,1,1,1,0.150833,0.150888,0.482917,0.223267,41,1280,1321
 11,1/11/2011,1,0,1,2,2,0.169091,0.191464,0.686364,0.122132,43,1220,1263
 12,1/12/2011,1,0,1,3,1,0.172727,0.160473,0.599545,0.304627,25,1137,1162
 13,1/13/2011,1,0,1,4,1,0.165,0.150883,0.470417,0.301,38,1368,1406
 14,1/14/2011,1,0,1,5,1,0.16087,0.188413,0.537826,0.126548,54,1367,1421
 15,1/15/2011,1,0,1,6,2,0.233333,0.248112,0.49875,0.157963,222,1026,1248
 16,1/16/2011,1,0,1,0,1,0.231667,0.234217,0.48375,0.188433,251,953,1204
 17,1/17/2011,1,0,1,1,2,0.175833,0.176771,0.5375,0.194017,117,883,1000
 18,1/18/2011,1,0,1,2,2,0.216667,0.232333,0.861667,0.146775,9,674,683
 19,1/19/2011,1,0,1,3,2,0.292174,0.298422,0.741739,0.208317,78,1572,1650
 20,1/20/2011,1,0,1,4,2,0.261667,0.25505,0.538333,0.195904,83,1844,1927
 21,1/21/2011,1,0,1,5,1,0.1775,0.157833,0.457083,0.353242,75,1468,1543
 22,1/22/2011,1,0,1,6,1,0.0591304,0.0790696,0.4,0.17197,93,888,981
 23,1/23/2011,1,0,1,0,1,0.0965217,0.0988391,0.436522,0.2466,150,836,986
 24,1/24/2011,1,0,1,1,1,0.0973913,0.11793,0.491739,0.15833,86,1330,1416
 25,1/25/2011,1,0,1,2,2,0.223478,0.234526,0.616957,0.129796,186,1799,1985
 26,1/26/2011,1,0,1,3,3,0.2175,0.2036,0.8625,0.29385,34,472,506
 27,1/27/2011,1,0,1,4,1,0.195,0.2197,0.6875,0.113837,15,416,431
 28,1/28/2011,1,0,1,5,2,0.203478,0.223317,0.793043,0.1233,38,1129,1167
 29,1/29/2011,1,0,1,6,1,0.196522,0.212126,0.651739,0.145365,123,975,1098
 30,1/30/2011,1,0,1,0,1,0.216522,0.250322,0.722174,0.0739826,140,956,1096
 31,1/31/2011,1,0,1,1,2,0.180833,0.18625,0.60375,0.187192,42,1459,1501
 32,2/1/2011,1,0,2,2,2,0.192174,0.23453,0.829565,0.053213,47,1313,1360
 33,2/2/2011,1,0,2,3,2,0.26,0.254417,0.775417,0.264308,72,1454,1526
 34,2/3/2011,1,0,2,4,1,0.186957,0.177878,0.437826,0.277752,61,1489,1550
 35,2/4/2011,1,0,2,5,2,0.211304,0.228587,0.585217,0.127839,88,1620,1708
 36,2/5/2011,1,0,2,6,2,0.233333,0.243058,0.929167,0.161079,100,905,1005
 37,2/6/2011,1,0,2,0,1,0.285833,0.291671,0.568333,0.1418,354,1269,1623
 38,2/7/2011,1,0,2,1,1,0.271667,0.303658,0.738333,0.0454083,120,1592,1712
 39,2/8/2011,1,0,2,2,1,0.220833,0.198246,0.537917,0.36195,64,1466,1530
 40,2/9/2011,1,0,2,3,2,0.134783,0.144283,0.494783,0.188839,53,1552,1605
 41,2/10/2011,1,0,2,4,1,0.144348,0.149548,0.437391,0.221935,47,1491,1538
 42,2/11/2011,1,0,2,5,1,0.189091,0.213509,0.506364,0.10855,149,1597,1746
 43,2/12/2011,1,0,2,6,1,0.2225,0.232954,0.544167,0.203367,288,1184,1472
 44,2/13/2011,1,0,2,0,1,0.316522,0.324113,0.457391,0.260883,397,1192,1589
 45,2/14/2011,1,0,2,1,1,0.415,0.39835,0.375833,0.417908,208,1705,1913
 46,2/15/2011,1,0,2,2,1,0.266087,0.254274,0.314348,0.291374,140,1675,1815
 47,2/16/2011,1,0,2,3,1,0.318261,0.3162,0.423478,0.251791,218,1897,2115
 48,2/17/2011,1,0,2,4,1,0.435833,0.428658,0.505,0.230104,259,2216,2475
 49,2/18/2011,1,0,2,5,1,0.521667,0.511983,0.516667,0.264925,579,2348,2927
 50,2/19/2011,1,0,2,6,1,0.399167,0.391404,0.187917,0.507463,532,1103,1635
 51,2/20/2011,1,0,2,0,1,0.285217,0.27733,0.407826,0.223235,639,1173,1812
 52,2/21/2011,1,0,2,1,2,0.303333,0.284075,0.605,0.307846,195,912,1107
 53,2/22/2011,1,0,2,2,1,0.182222,0.186033,0.577778,0.195683,74,1376,1450
 54,2/23/2011,1,0,2,3,1,0.221739,0.245717,0.423043,0.094113,139,1778,1917
 55,2/24/2011,1,0,2,4,2,0.295652,0.289191,0.697391,0.250496,100,1707,1807
 56,2/25/2011,1,0,2,5,2,0.364348,0.350461,0.712174,0.346539,120,1341,1461
 57,2/26/2011,1,0,2,6,1,0.2825,0.282192,0.537917,0.186571,424,1545,1969
 58,2/27/2011,1,0,2,0,1,0.343478,0.351109,0.68,0.125248,694,1708,2402
 59,2/28/2011,1,0,2,1,2,0.407273,0.400118,0.876364,0.289686,81,1365,1446
 60,3/1/2011,1,0,3,2,1,0.266667,0.263879,0.535,0.216425,137,1714,1851
 61,3/2/2011,1,0,3,3,1,0.335,0.320071,0.449583,0.307833,231,1903,2134
 62,3/3/2011,1,0,3,4,1,0.198333,0.200133,0.318333,0.225754,123,1562,1685
 63,3/4/2011,1,0,3,5,2,0.261667,0.255679,0.610417,0.203346,214,1730,1944
 64,3/5/2011,1,0,3,6,2,0.384167,0.378779,0.789167,0.251871,640,1437,2077
 65,3/6/2011,1,0,3,0,2,0.376522,0.366252,0.948261,0.343287,114,491,605
 66,3/7/2011,1,0,3,1,1,0.261739,0.238461,0.551304,0.341352,244,1628,1872
 67,3/8/2011,1,0,3,2,1,0.2925,0.3024,0.420833,0.12065,316,1817,2133
 68,3/9/2011,1,0,3,3,2,0.295833,0.286608,0.775417,0.22015,191,1700,1891
 69,3/10/2011,1,0,3,4,3,0.389091,0.385668,0,0.261877,46,577,623
 70,3/11/2011,1,0,3,5,2,0.316522,0.305,0.649565,0.23297,247,1730,1977
 71,3/12/2011,1,0,3,6,1,0.329167,0.32575,0.594583,0.220775,724,1408,2132
 72,3/13/2011,1,0,3,0,1,0.384348,0.380091,0.527391,0.270604,982,1435,2417
 73,3/14/2011,1,0,3,1,1,0.325217,0.332,0.496957,0.136926,359,1687,2046
 74,3/15/2011,1,0,3,2,2,0.317391,0.318178,0.655652,0.184309,289,1767,2056
 75,3/16/2011,1,0,3,3,2,0.365217,0.36693,0.776522,0.203117,321,1871,2192
 76,3/17/2011,1,0,3,4,1,0.415,0.410333,0.602917,0.209579,424,2320,2744
 77,3/18/2011,1,0,3,5,1,0.54,0.527009,0.525217,0.231017,884,2355,3239
 78,3/19/2011,1,0,3,6,1,0.4725,0.466525,0.379167,0.368167,1424,1693,3117
 79,3/20/2011,1,0,3,0,1,0.3325,0.32575,0.47375,0.207721,1047,1424,2471
 80,3/21/2011,2,0,3,1,2,0.430435,0.409735,0.737391,0.288783,401,1676,2077
 81,3/22/2011,2,0,3,2,1,0.441667,0.440642,0.624583,0.22575,460,2243,2703
 82,3/23/2011,2,0,3,3,2,0.346957,0.337939,0.839565,0.234261,203,1918,2121
 83,3/24/2011,2,0,3,4,2,0.285,0.270833,0.805833,0.243787,166,1699,1865
 84,3/25/2011,2,0,3,5,1,0.264167,0.256312,0.495,0.230725,300,1910,2210
 85,3/26/2011,2,0,3,6,1,0.265833,0.257571,0.394167,0.209571,981,1515,2496
 86,3/27/2011,2,0,3,0,2,0.253043,0.250339,0.493913,0.1843,472,1221,1693
 87,3/28/2011,2,0,3,1,1,0.264348,0.257574,0.302174,0.212204,222,1806,2028
 88,3/29/2011,2,0,3,2,1,0.3025,0.292908,0.314167,0.226996,317,2108,2425
 89,3/30/2011,2,0,3,3,2,0.3,0.29735,0.646667,0.172888,168,1368,1536
 90,3/31/2011,2,0,3,4,3,0.268333,0.257575,0.918333,0.217646,179,1506,1685
 91,4/1/2011,2,0,4,5,2,0.3,0.283454,0.68625,0.258708,307,1920,2227
 92,4/2/2011,2,0,4,6,2,0.315,0.315637,0.65375,0.197146,898,1354,2252
 93,4/3/2011,2,0,4,0,1,0.378333,0.378767,0.48,0.182213,1651,1598,3249
 94,4/4/2011,2,0,4,1,1,0.573333,0.542929,0.42625,0.385571,734,2381,3115
 95,4/5/2011,2,0,4,2,2,0.414167,0.39835,0.642083,0.388067,167,1628,1795
 96,4/6/2011,2,0,4,3,1,0.390833,0.387608,0.470833,0.263063,413,2395,2808
 97,4/7/2011,2,0,4,4,1,0.4375,0.433696,0.602917,0.162312,571,2570,3141
 98,4/8/2011,2,0,4,5,2,0.335833,0.324479,0.83625,0.226992,172,1299,1471
 99,4/9/2011,2,0,4,6,2,0.3425,0.341529,0.8775,0.133083,879,1576,2455
 100,4/10/2011,2,0,4,0,2,0.426667,0.426737,0.8575,0.146767,1188,1707,2895
 101,4/11/2011,2,0,4,1,2,0.595652,0.565217,0.716956,0.324474,855,2493,3348
 102,4/12/2011,2,0,4,2,2,0.5025,0.493054,0.739167,0.274879,257,1777,2034
 103,4/13/2011,2,0,4,3,2,0.4125,0.417283,0.819167,0.250617,209,1953,2162
 104,4/14/2011,2,0,4,4,1,0.4675,0.462742,0.540417,0.1107,529,2738,3267
 105,4/15/2011,2,0,4,5,1,0.446667,0.441913,0.67125,0.226375,642,2484,3126
 106,4/16/2011,2,0,4,6,3,0.430833,0.425492,0.888333,0.340808,121,674,795
 107,4/17/2011,2,0,4,0,1,0.456667,0.445696,0.479583,0.303496,1558,2186,3744
 108,4/18/2011,2,0,4,1,1,0.5125,0.503146,0.5425,0.163567,669,2760,3429
 109,4/19/2011,2,0,4,2,2,0.505833,0.489258,0.665833,0.157971,409,2795,3204
 110,4/20/2011,2,0,4,3,1,0.595,0.564392,0.614167,0.241925,613,3331,3944
 111,4/21/2011,2,0,4,4,1,0.459167,0.453892,0.407083,0.325258,745,3444,4189
 112,4/22/2011,2,0,4,5,2,0.336667,0.321954,0.729583,0.219521,177,1506,1683
 113,4/23/2011,2,0,4,6,2,0.46,0.450121,0.887917,0.230725,1462,2574,4036
 114,4/24/2011,2,0,4,0,2,0.581667,0.551763,0.810833,0.192175,1710,2481,4191
 115,4/25/2011,2,0,4,1,1,0.606667,0.5745,0.776667,0.185333,773,3300,4073
 116,4/26/2011,2,0,4,2,1,0.631667,0.594083,0.729167,0.3265,678,3722,4400
 117,4/27/2011,2,0,4,3,2,0.62,0.575142,0.835417,0.3122,547,3325,3872
 118,4/28/2011,2,0,4,4,2,0.6175,0.578929,0.700833,0.320908,569,3489,4058
 119,4/29/2011,2,0,4,5,1,0.51,0.497463,0.457083,0.240063,878,3717,4595
 120,4/30/2011,2,0,4,6,1,0.4725,0.464021,0.503333,0.235075,1965,3347,5312
 121,5/1/2011,2,0,5,0,2,0.451667,0.448204,0.762083,0.106354,1138,2213,3351
 122,5/2/2011,2,0,5,1,2,0.549167,0.532833,0.73,0.183454,847,3554,4401
 123,5/3/2011,2,0,5,2,2,0.616667,0.582079,0.697083,0.342667,603,3848,4451
 124,5/4/2011,2,0,5,3,2,0.414167,0.40465,0.737083,0.328996,255,2378,2633
 125,5/5/2011,2,0,5,4,1,0.459167,0.441917,0.444167,0.295392,614,3819,4433
 126,5/6/2011,2,0,5,5,1,0.479167,0.474117,0.59,0.228246,894,3714,4608
 127,5/7/2011,2,0,5,6,1,0.52,0.512621,0.54125,0.16045,1612,3102,4714
 128,5/8/2011,2,0,5,0,1,0.528333,0.518933,0.631667,0.0746375,1401,2932,4333
 129,5/9/2011,2,0,5,1,1,0.5325,0.525246,0.58875,0.176,664,3698,4362
 130,5/10/2011,2,0,5,2,1,0.5325,0.522721,0.489167,0.115671,694,4109,4803
 131,5/11/2011,2,0,5,3,1,0.5425,0.5284,0.632917,0.120642,550,3632,4182
 132,5/12/2011,2,0,5,4,1,0.535,0.523363,0.7475,0.189667,695,4169,4864
 133,5/13/2011,2,0,5,5,2,0.5125,0.4943,0.863333,0.179725,692,3413,4105
 134,5/14/2011,2,0,5,6,2,0.520833,0.500629,0.9225,0.13495,902,2507,3409
 135,5/15/2011,2,0,5,0,2,0.5625,0.536,0.867083,0.152979,1582,2971,4553
 136,5/16/2011,2,0,5,1,1,0.5775,0.550512,0.787917,0.126871,773,3185,3958
 137,5/17/2011,2,0,5,2,2,0.561667,0.538529,0.837917,0.277354,678,3445,4123
 138,5/18/2011,2,0,5,3,2,0.55,0.527158,0.87,0.201492,536,3319,3855
 139,5/19/2011,2,0,5,4,2,0.530833,0.510742,0.829583,0.108213,735,3840,4575
 140,5/20/2011,2,0,5,5,1,0.536667,0.529042,0.719583,0.125013,909,4008,4917
 141,5/21/2011,2,0,5,6,1,0.6025,0.571975,0.626667,0.12065,2258,3547,5805
 142,5/22/2011,2,0,5,0,1,0.604167,0.5745,0.749583,0.148008,1576,3084,4660
 143,5/23/2011,2,0,5,1,2,0.631667,0.590296,0.81,0.233842,836,3438,4274
 144,5/24/2011,2,0,5,2,2,0.66,0.604813,0.740833,0.207092,659,3833,4492
 145,5/25/2011,2,0,5,3,1,0.660833,0.615542,0.69625,0.154233,740,4238,4978
 146,5/26/2011,2,0,5,4,1,0.708333,0.654688,0.6775,0.199642,758,3919,4677
 147,5/27/2011,2,0,5,5,1,0.681667,0.637008,0.65375,0.240679,871,3808,4679
 148,5/28/2011,2,0,5,6,1,0.655833,0.612379,0.729583,0.230092,2001,2757,4758
 149,5/29/2011,2,0,5,0,1,0.6675,0.61555,0.81875,0.213938,2355,2433,4788
 150,5/30/2011,2,0,5,1,1,0.733333,0.671092,0.685,0.131225,1549,2549,4098
 151,5/31/2011,2,0,5,2,1,0.775,0.725383,0.636667,0.111329,673,3309,3982
 152,6/1/2011,2,0,6,3,2,0.764167,0.720967,0.677083,0.207092,513,3461,3974
 153,6/2/2011,2,0,6,4,1,0.715,0.643942,0.305,0.292287,736,4232,4968
 154,6/3/2011,2,0,6,5,1,0.62,0.587133,0.354167,0.253121,898,4414,5312
 155,6/4/2011,2,0,6,6,1,0.635,0.594696,0.45625,0.123142,1869,3473,5342
 156,6/5/2011,2,0,6,0,2,0.648333,0.616804,0.6525,0.138692,1685,3221,4906
 157,6/6/2011,2,0,6,1,1,0.678333,0.621858,0.6,0.121896,673,3875,4548
 158,6/7/2011,2,0,6,2,1,0.7075,0.65595,0.597917,0.187808,763,4070,4833
 159,6/8/2011,2,0,6,3,1,0.775833,0.727279,0.622083,0.136817,676,3725,4401
 160,6/9/2011,2,0,6,4,2,0.808333,0.757579,0.568333,0.149883,563,3352,3915
 161,6/10/2011,2,0,6,5,1,0.755,0.703292,0.605,0.140554,815,3771,4586
 162,6/11/2011,2,0,6,6,1,0.725,0.678038,0.654583,0.15485,1729,3237,4966
 163,6/12/2011,2,0,6,0,1,0.6925,0.643325,0.747917,0.163567,1467,2993,4460
 164,6/13/2011,2,0,6,1,1,0.635,0.601654,0.494583,0.30535,863,4157,5020
 165,6/14/2011,2,0,6,2,1,0.604167,0.591546,0.507083,0.269283,727,4164,4891
 166,6/15/2011,2,0,6,3,1,0.626667,0.587754,0.471667,0.167912,769,4411,5180
 167,6/16/2011,2,0,6,4,2,0.628333,0.595346,0.688333,0.206471,545,3222,3767
 168,6/17/2011,2,0,6,5,1,0.649167,0.600383,0.735833,0.143029,863,3981,4844
 169,6/18/2011,2,0,6,6,1,0.696667,0.643954,0.670417,0.119408,1807,3312,5119
 170,6/19/2011,2,0,6,0,2,0.699167,0.645846,0.666667,0.102,1639,3105,4744
 171,6/20/2011,2,0,6,1,2,0.635,0.595346,0.74625,0.155475,699,3311,4010
 172,6/21/2011,3,0,6,2,2,0.680833,0.637646,0.770417,0.171025,774,4061,4835
 173,6/22/2011,3,0,6,3,1,0.733333,0.693829,0.7075,0.172262,661,3846,4507
 174,6/23/2011,3,0,6,4,2,0.728333,0.693833,0.703333,0.238804,746,4044,4790
 175,6/24/2011,3,0,6,5,1,0.724167,0.656583,0.573333,0.222025,969,4022,4991
 176,6/25/2011,3,0,6,6,1,0.695,0.643313,0.483333,0.209571,1782,3420,5202
 177,6/26/2011,3,0,6,0,1,0.68,0.637629,0.513333,0.0945333,1920,3385,5305
 178,6/27/2011,3,0,6,1,2,0.6825,0.637004,0.658333,0.107588,854,3854,4708
 179,6/28/2011,3,0,6,2,1,0.744167,0.692558,0.634167,0.144283,732,3916,4648
 180,6/29/2011,3,0,6,3,1,0.728333,0.654688,0.497917,0.261821,848,4377,5225
 181,6/30/2011,3,0,6,4,1,0.696667,0.637008,0.434167,0.185312,1027,4488,5515
 182,7/1/2011,3,0,7,5,1,0.7225,0.652162,0.39625,0.102608,1246,4116,5362
 183,7/2/2011,3,0,7,6,1,0.738333,0.667308,0.444583,0.115062,2204,2915,5119
 184,7/3/2011,3,0,7,0,2,0.716667,0.668575,0.6825,0.228858,2282,2367,4649
 185,7/4/2011,3,0,7,1,2,0.726667,0.665417,0.637917,0.0814792,3065,2978,6043
 186,7/5/2011,3,0,7,2,1,0.746667,0.696338,0.590417,0.126258,1031,3634,4665
 187,7/6/2011,3,0,7,3,1,0.72,0.685633,0.743333,0.149883,784,3845,4629
 188,7/7/2011,3,0,7,4,1,0.75,0.686871,0.65125,0.1592,754,3838,4592
 189,7/8/2011,3,0,7,5,2,0.709167,0.670483,0.757917,0.225129,692,3348,4040
 190,7/9/2011,3,0,7,6,1,0.733333,0.664158,0.609167,0.167912,1988,3348,5336
 191,7/10/2011,3,0,7,0,1,0.7475,0.690025,0.578333,0.183471,1743,3138,4881
 192,7/11/2011,3,0,7,1,1,0.7625,0.729804,0.635833,0.282337,723,3363,4086
 193,7/12/2011,3,0,7,2,1,0.794167,0.739275,0.559167,0.200254,662,3596,4258
 194,7/13/2011,3,0,7,3,1,0.746667,0.689404,0.631667,0.146133,748,3594,4342
 195,7/14/2011,3,0,7,4,1,0.680833,0.635104,0.47625,0.240667,888,4196,5084
 196,7/15/2011,3,0,7,5,1,0.663333,0.624371,0.59125,0.182833,1318,4220,5538
 197,7/16/2011,3,0,7,6,1,0.686667,0.638263,0.585,0.208342,2418,3505,5923
 198,7/17/2011,3,0,7,0,1,0.719167,0.669833,0.604167,0.245033,2006,3296,5302
 199,7/18/2011,3,0,7,1,1,0.746667,0.703925,0.65125,0.215804,841,3617,4458
 200,7/19/2011,3,0,7,2,1,0.776667,0.747479,0.650417,0.1306,752,3789,4541
 201,7/20/2011,3,0,7,3,1,0.768333,0.74685,0.707083,0.113817,644,3688,4332
 202,7/21/2011,3,0,7,4,2,0.815,0.826371,0.69125,0.222021,632,3152,3784
 203,7/22/2011,3,0,7,5,1,0.848333,0.840896,0.580417,0.1331,562,2825,3387
 204,7/23/2011,3,0,7,6,1,0.849167,0.804287,0.5,0.131221,987,2298,3285
 205,7/24/2011,3,0,7,0,1,0.83,0.794829,0.550833,0.169171,1050,2556,3606
 206,7/25/2011,3,0,7,1,1,0.743333,0.720958,0.757083,0.0908083,568,3272,3840
 207,7/26/2011,3,0,7,2,1,0.771667,0.696979,0.540833,0.200258,750,3840,4590
 208,7/27/2011,3,0,7,3,1,0.775,0.690667,0.402917,0.183463,755,3901,4656
 209,7/28/2011,3,0,7,4,1,0.779167,0.7399,0.583333,0.178479,606,3784,4390
 210,7/29/2011,3,0,7,5,1,0.838333,0.785967,0.5425,0.174138,670,3176,3846
 211,7/30/2011,3,0,7,6,1,0.804167,0.728537,0.465833,0.168537,1559,2916,4475
 212,7/31/2011,3,0,7,0,1,0.805833,0.729796,0.480833,0.164813,1524,2778,4302
 213,8/1/2011,3,0,8,1,1,0.771667,0.703292,0.550833,0.156717,729,3537,4266
 214,8/2/2011,3,0,8,2,1,0.783333,0.707071,0.49125,0.20585,801,4044,4845
 215,8/3/2011,3,0,8,3,2,0.731667,0.679937,0.6575,0.135583,467,3107,3574
 216,8/4/2011,3,0,8,4,2,0.71,0.664788,0.7575,0.19715,799,3777,4576
 217,8/5/2011,3,0,8,5,1,0.710833,0.656567,0.630833,0.184696,1023,3843,4866
 218,8/6/2011,3,0,8,6,2,0.716667,0.676154,0.755,0.22825,1521,2773,4294
 219,8/7/2011,3,0,8,0,1,0.7425,0.715292,0.752917,0.201487,1298,2487,3785
 220,8/8/2011,3,0,8,1,1,0.765,0.703283,0.592083,0.192175,846,3480,4326
 221,8/9/2011,3,0,8,2,1,0.775,0.724121,0.570417,0.151121,907,3695,4602
 222,8/10/2011,3,0,8,3,1,0.766667,0.684983,0.424167,0.200258,884,3896,4780
 223,8/11/2011,3,0,8,4,1,0.7175,0.651521,0.42375,0.164796,812,3980,4792
 224,8/12/2011,3,0,8,5,1,0.708333,0.654042,0.415,0.125621,1051,3854,4905
 225,8/13/2011,3,0,8,6,2,0.685833,0.645858,0.729583,0.211454,1504,2646,4150
 226,8/14/2011,3,0,8,0,2,0.676667,0.624388,0.8175,0.222633,1338,2482,3820
 227,8/15/2011,3,0,8,1,1,0.665833,0.616167,0.712083,0.208954,775,3563,4338
 228,8/16/2011,3,0,8,2,1,0.700833,0.645837,0.578333,0.236329,721,4004,4725
 229,8/17/2011,3,0,8,3,1,0.723333,0.666671,0.575417,0.143667,668,4026,4694
 230,8/18/2011,3,0,8,4,1,0.711667,0.662258,0.654583,0.233208,639,3166,3805
 231,8/19/2011,3,0,8,5,2,0.685,0.633221,0.722917,0.139308,797,3356,4153
 232,8/20/2011,3,0,8,6,1,0.6975,0.648996,0.674167,0.104467,1914,3277,5191
 233,8/21/2011,3,0,8,0,1,0.710833,0.675525,0.77,0.248754,1249,2624,3873
 234,8/22/2011,3,0,8,1,1,0.691667,0.638254,0.47,0.27675,833,3925,4758
 235,8/23/2011,3,0,8,2,1,0.640833,0.606067,0.455417,0.146763,1281,4614,5895
 236,8/24/2011,3,0,8,3,1,0.673333,0.630692,0.605,0.253108,949,4181,5130
 237,8/25/2011,3,0,8,4,2,0.684167,0.645854,0.771667,0.210833,435,3107,3542
 238,8/26/2011,3,0,8,5,1,0.7,0.659733,0.76125,0.0839625,768,3893,4661
 239,8/27/2011,3,0,8,6,2,0.68,0.635556,0.85,0.375617,226,889,1115
 240,8/28/2011,3,0,8,0,1,0.707059,0.647959,0.561765,0.304659,1415,2919,4334
 241,8/29/2011,3,0,8,1,1,0.636667,0.607958,0.554583,0.159825,729,3905,4634
 242,8/30/2011,3,0,8,2,1,0.639167,0.594704,0.548333,0.125008,775,4429,5204
 243,8/31/2011,3,0,8,3,1,0.656667,0.611121,0.597917,0.0833333,688,4370,5058
 244,9/1/2011,3,0,9,4,1,0.655,0.614921,0.639167,0.141796,783,4332,5115
 245,9/2/2011,3,0,9,5,2,0.643333,0.604808,0.727083,0.139929,875,3852,4727
 246,9/3/2011,3,0,9,6,1,0.669167,0.633213,0.716667,0.185325,1935,2549,4484
 247,9/4/2011,3,0,9,0,1,0.709167,0.665429,0.742083,0.206467,2521,2419,4940
 248,9/5/2011,3,0,9,1,2,0.673333,0.625646,0.790417,0.212696,1236,2115,3351
 249,9/6/2011,3,0,9,2,3,0.54,0.5152,0.886957,0.343943,204,2506,2710
 250,9/7/2011,3,0,9,3,3,0.599167,0.544229,0.917083,0.0970208,118,1878,1996
 251,9/8/2011,3,0,9,4,3,0.633913,0.555361,0.939565,0.192748,153,1689,1842
 252,9/9/2011,3,0,9,5,2,0.65,0.578946,0.897917,0.124379,417,3127,3544
 253,9/10/2011,3,0,9,6,1,0.66,0.607962,0.75375,0.153608,1750,3595,5345
 254,9/11/2011,3,0,9,0,1,0.653333,0.609229,0.71375,0.115054,1633,3413,5046
 255,9/12/2011,3,0,9,1,1,0.644348,0.60213,0.692174,0.088913,690,4023,4713
 256,9/13/2011,3,0,9,2,1,0.650833,0.603554,0.7125,0.141804,701,4062,4763
 257,9/14/2011,3,0,9,3,1,0.673333,0.6269,0.697083,0.1673,647,4138,4785
 258,9/15/2011,3,0,9,4,2,0.5775,0.553671,0.709167,0.271146,428,3231,3659
 259,9/16/2011,3,0,9,5,2,0.469167,0.461475,0.590417,0.164183,742,4018,4760
 260,9/17/2011,3,0,9,6,2,0.491667,0.478512,0.718333,0.189675,1434,3077,4511
 261,9/18/2011,3,0,9,0,1,0.5075,0.490537,0.695,0.178483,1353,2921,4274
 262,9/19/2011,3,0,9,1,2,0.549167,0.529675,0.69,0.151742,691,3848,4539
 263,9/20/2011,3,0,9,2,2,0.561667,0.532217,0.88125,0.134954,438,3203,3641
 264,9/21/2011,3,0,9,3,2,0.595,0.550533,0.9,0.0964042,539,3813,4352
 265,9/22/2011,3,0,9,4,2,0.628333,0.554963,0.902083,0.128125,555,4240,4795
 266,9/23/2011,4,0,9,5,2,0.609167,0.522125,0.9725,0.0783667,258,2137,2395
 267,9/24/2011,4,0,9,6,2,0.606667,0.564412,0.8625,0.0783833,1776,3647,5423
 268,9/25/2011,4,0,9,0,2,0.634167,0.572637,0.845,0.0503792,1544,3466,5010
 269,9/26/2011,4,0,9,1,2,0.649167,0.589042,0.848333,0.1107,684,3946,4630
 270,9/27/2011,4,0,9,2,2,0.636667,0.574525,0.885417,0.118171,477,3643,4120
 271,9/28/2011,4,0,9,3,2,0.635,0.575158,0.84875,0.148629,480,3427,3907
 272,9/29/2011,4,0,9,4,1,0.616667,0.574512,0.699167,0.172883,653,4186,4839
 273,9/30/2011,4,0,9,5,1,0.564167,0.544829,0.6475,0.206475,830,4372,5202
 274,10/1/2011,4,0,10,6,2,0.41,0.412863,0.75375,0.292296,480,1949,2429
 275,10/2/2011,4,0,10,0,2,0.356667,0.345317,0.791667,0.222013,616,2302,2918
 276,10/3/2011,4,0,10,1,2,0.384167,0.392046,0.760833,0.0833458,330,3240,3570
 277,10/4/2011,4,0,10,2,1,0.484167,0.472858,0.71,0.205854,486,3970,4456
 278,10/5/2011,4,0,10,3,1,0.538333,0.527138,0.647917,0.17725,559,4267,4826
 279,10/6/2011,4,0,10,4,1,0.494167,0.480425,0.620833,0.134954,639,4126,4765
 280,10/7/2011,4,0,10,5,1,0.510833,0.504404,0.684167,0.0223917,949,4036,4985
 281,10/8/2011,4,0,10,6,1,0.521667,0.513242,0.70125,0.0454042,2235,3174,5409
 282,10/9/2011,4,0,10,0,1,0.540833,0.523983,0.7275,0.06345,2397,3114,5511
 283,10/10/2011,4,0,10,1,1,0.570833,0.542925,0.73375,0.0423042,1514,3603,5117
 284,10/11/2011,4,0,10,2,2,0.566667,0.546096,0.80875,0.143042,667,3896,4563
 285,10/12/2011,4,0,10,3,3,0.543333,0.517717,0.90625,0.24815,217,2199,2416
 286,10/13/2011,4,0,10,4,2,0.589167,0.551804,0.896667,0.141787,290,2623,2913
 287,10/14/2011,4,0,10,5,2,0.550833,0.529675,0.71625,0.223883,529,3115,3644
 288,10/15/2011,4,0,10,6,1,0.506667,0.498725,0.483333,0.258083,1899,3318,5217
 289,10/16/2011,4,0,10,0,1,0.511667,0.503154,0.486667,0.281717,1748,3293,5041
 290,10/17/2011,4,0,10,1,1,0.534167,0.510725,0.579583,0.175379,713,3857,4570
 291,10/18/2011,4,0,10,2,2,0.5325,0.522721,0.701667,0.110087,637,4111,4748
 292,10/19/2011,4,0,10,3,3,0.541739,0.513848,0.895217,0.243339,254,2170,2424
 293,10/20/2011,4,0,10,4,1,0.475833,0.466525,0.63625,0.422275,471,3724,4195
 294,10/21/2011,4,0,10,5,1,0.4275,0.423596,0.574167,0.221396,676,3628,4304
 295,10/22/2011,4,0,10,6,1,0.4225,0.425492,0.629167,0.0926667,1499,2809,4308
 296,10/23/2011,4,0,10,0,1,0.421667,0.422333,0.74125,0.0995125,1619,2762,4381
 297,10/24/2011,4,0,10,1,1,0.463333,0.457067,0.772083,0.118792,699,3488,4187
 298,10/25/2011,4,0,10,2,1,0.471667,0.463375,0.622917,0.166658,695,3992,4687
 299,10/26/2011,4,0,10,3,2,0.484167,0.472846,0.720417,0.148642,404,3490,3894
 300,10/27/2011,4,0,10,4,2,0.47,0.457046,0.812917,0.197763,240,2419,2659
 301,10/28/2011,4,0,10,5,2,0.330833,0.318812,0.585833,0.229479,456,3291,3747
 302,10/29/2011,4,0,10,6,3,0.254167,0.227913,0.8825,0.351371,57,570,627
 303,10/30/2011,4,0,10,0,1,0.319167,0.321329,0.62375,0.176617,885,2446,3331
 304,10/31/2011,4,0,10,1,1,0.34,0.356063,0.703333,0.10635,362,3307,3669
 305,11/1/2011,4,0,11,2,1,0.400833,0.397088,0.68375,0.135571,410,3658,4068
 306,11/2/2011,4,0,11,3,1,0.3775,0.390133,0.71875,0.0820917,370,3816,4186
 307,11/3/2011,4,0,11,4,1,0.408333,0.405921,0.702083,0.136817,318,3656,3974
 308,11/4/2011,4,0,11,5,2,0.403333,0.403392,0.6225,0.271779,470,3576,4046
 309,11/5/2011,4,0,11,6,1,0.326667,0.323854,0.519167,0.189062,1156,2770,3926
 310,11/6/2011,4,0,11,0,1,0.348333,0.362358,0.734583,0.0920542,952,2697,3649
 311,11/7/2011,4,0,11,1,1,0.395,0.400871,0.75875,0.057225,373,3662,4035
 312,11/8/2011,4,0,11,2,1,0.408333,0.412246,0.721667,0.0690375,376,3829,4205
 313,11/9/2011,4,0,11,3,1,0.4,0.409079,0.758333,0.0621958,305,3804,4109
 314,11/10/2011,4,0,11,4,2,0.38,0.373721,0.813333,0.189067,190,2743,2933
 315,11/11/2011,4,0,11,5,1,0.324167,0.306817,0.44625,0.314675,440,2928,3368
 316,11/12/2011,4,0,11,6,1,0.356667,0.357942,0.552917,0.212062,1275,2792,4067
 317,11/13/2011,4,0,11,0,1,0.440833,0.43055,0.458333,0.281721,1004,2713,3717
 318,11/14/2011,4,0,11,1,1,0.53,0.524612,0.587083,0.306596,595,3891,4486
 319,11/15/2011,4,0,11,2,2,0.53,0.507579,0.68875,0.199633,449,3746,4195
 320,11/16/2011,4,0,11,3,3,0.456667,0.451988,0.93,0.136829,145,1672,1817
 321,11/17/2011,4,0,11,4,2,0.341667,0.323221,0.575833,0.305362,139,2914,3053
 322,11/18/2011,4,0,11,5,1,0.274167,0.272721,0.41,0.168533,245,3147,3392
 323,11/19/2011,4,0,11,6,1,0.329167,0.324483,0.502083,0.224496,943,2720,3663
 324,11/20/2011,4,0,11,0,2,0.463333,0.457058,0.684583,0.18595,787,2733,3520
 325,11/21/2011,4,0,11,1,3,0.4475,0.445062,0.91,0.138054,220,2545,2765
 326,11/22/2011,4,0,11,2,3,0.416667,0.421696,0.9625,0.118792,69,1538,1607
 327,11/23/2011,4,0,11,3,2,0.440833,0.430537,0.757917,0.335825,112,2454,2566
 328,11/24/2011,4,0,11,4,1,0.373333,0.372471,0.549167,0.167304,560,935,1495
 329,11/25/2011,4,0,11,5,1,0.375,0.380671,0.64375,0.0988958,1095,1697,2792
 330,11/26/2011,4,0,11,6,1,0.375833,0.385087,0.681667,0.0684208,1249,1819,3068
 331,11/27/2011,4,0,11,0,1,0.459167,0.4558,0.698333,0.208954,810,2261,3071
 332,11/28/2011,4,0,11,1,1,0.503478,0.490122,0.743043,0.142122,253,3614,3867
 333,11/29/2011,4,0,11,2,2,0.458333,0.451375,0.830833,0.258092,96,2818,2914
 334,11/30/2011,4,0,11,3,1,0.325,0.311221,0.613333,0.271158,188,3425,3613
 335,12/1/2011,4,0,12,4,1,0.3125,0.305554,0.524583,0.220158,182,3545,3727
 336,12/2/2011,4,0,12,5,1,0.314167,0.331433,0.625833,0.100754,268,3672,3940
 337,12/3/2011,4,0,12,6,1,0.299167,0.310604,0.612917,0.0957833,706,2908,3614
 338,12/4/2011,4,0,12,0,1,0.330833,0.3491,0.775833,0.0839583,634,2851,3485
 339,12/5/2011,4,0,12,1,2,0.385833,0.393925,0.827083,0.0622083,233,3578,3811
 340,12/6/2011,4,0,12,2,3,0.4625,0.4564,0.949583,0.232583,126,2468,2594
 341,12/7/2011,4,0,12,3,3,0.41,0.400246,0.970417,0.266175,50,655,705
 342,12/8/2011,4,0,12,4,1,0.265833,0.256938,0.58,0.240058,150,3172,3322
 343,12/9/2011,4,0,12,5,1,0.290833,0.317542,0.695833,0.0827167,261,3359,3620
 344,12/10/2011,4,0,12,6,1,0.275,0.266412,0.5075,0.233221,502,2688,3190
 345,12/11/2011,4,0,12,0,1,0.220833,0.253154,0.49,0.0665417,377,2366,2743
 346,12/12/2011,4,0,12,1,1,0.238333,0.270196,0.670833,0.06345,143,3167,3310
 347,12/13/2011,4,0,12,2,1,0.2825,0.301138,0.59,0.14055,155,3368,3523
 348,12/14/2011,4,0,12,3,2,0.3175,0.338362,0.66375,0.0609583,178,3562,3740
 349,12/15/2011,4,0,12,4,2,0.4225,0.412237,0.634167,0.268042,181,3528,3709
 350,12/16/2011,4,0,12,5,2,0.375,0.359825,0.500417,0.260575,178,3399,3577
 351,12/17/2011,4,0,12,6,2,0.258333,0.249371,0.560833,0.243167,275,2464,2739
 352,12/18/2011,4,0,12,0,1,0.238333,0.245579,0.58625,0.169779,220,2211,2431
 353,12/19/2011,4,0,12,1,1,0.276667,0.280933,0.6375,0.172896,260,3143,3403
 354,12/20/2011,4,0,12,2,2,0.385833,0.396454,0.595417,0.0615708,216,3534,3750
 355,12/21/2011,1,0,12,3,2,0.428333,0.428017,0.858333,0.2214,107,2553,2660
 356,12/22/2011,1,0,12,4,2,0.423333,0.426121,0.7575,0.047275,227,2841,3068
 357,12/23/2011,1,0,12,5,1,0.373333,0.377513,0.68625,0.274246,163,2046,2209
 358,12/24/2011,1,0,12,6,1,0.3025,0.299242,0.5425,0.190304,155,856,1011
 359,12/25/2011,1,0,12,0,1,0.274783,0.279961,0.681304,0.155091,303,451,754
 360,12/26/2011,1,0,12,1,1,0.321739,0.315535,0.506957,0.239465,430,887,1317
 361,12/27/2011,1,0,12,2,2,0.325,0.327633,0.7625,0.18845,103,1059,1162
 362,12/28/2011,1,0,12,3,1,0.29913,0.279974,0.503913,0.293961,255,2047,2302
 363,12/29/2011,1,0,12,4,1,0.248333,0.263892,0.574167,0.119412,254,2169,2423
 364,12/30/2011,1,0,12,5,1,0.311667,0.318812,0.636667,0.134337,491,2508,2999
 365,12/31/2011,1,0,12,6,1,0.41,0.414121,0.615833,0.220154,665,1820,2485
 366,1/1/2012,1,1,1,0,1,0.37,0.375621,0.6925,0.192167,686,1608,2294
 367,1/2/2012,1,1,1,1,1,0.273043,0.252304,0.381304,0.329665,244,1707,1951
 368,1/3/2012,1,1,1,2,1,0.15,0.126275,0.44125,0.365671,89,2147,2236
 369,1/4/2012,1,1,1,3,2,0.1075,0.119337,0.414583,0.1847,95,2273,2368
 370,1/5/2012,1,1,1,4,1,0.265833,0.278412,0.524167,0.129987,140,3132,3272
 371,1/6/2012,1,1,1,5,1,0.334167,0.340267,0.542083,0.167908,307,3791,4098
 372,1/7/2012,1,1,1,6,1,0.393333,0.390779,0.531667,0.174758,1070,3451,4521
 373,1/8/2012,1,1,1,0,1,0.3375,0.340258,0.465,0.191542,599,2826,3425
 374,1/9/2012,1,1,1,1,2,0.224167,0.247479,0.701667,0.0989,106,2270,2376
 375,1/10/2012,1,1,1,2,1,0.308696,0.318826,0.646522,0.187552,173,3425,3598
 376,1/11/2012,1,1,1,3,2,0.274167,0.282821,0.8475,0.131221,92,2085,2177
 377,1/12/2012,1,1,1,4,2,0.3825,0.381938,0.802917,0.180967,269,3828,4097
 378,1/13/2012,1,1,1,5,1,0.274167,0.249362,0.5075,0.378108,174,3040,3214
 379,1/14/2012,1,1,1,6,1,0.18,0.183087,0.4575,0.187183,333,2160,2493
 380,1/15/2012,1,1,1,0,1,0.166667,0.161625,0.419167,0.251258,284,2027,2311
 381,1/16/2012,1,1,1,1,1,0.19,0.190663,0.5225,0.231358,217,2081,2298
 382,1/17/2012,1,1,1,2,2,0.373043,0.364278,0.716087,0.34913,127,2808,2935
 383,1/18/2012,1,1,1,3,1,0.303333,0.275254,0.443333,0.415429,109,3267,3376
 384,1/19/2012,1,1,1,4,1,0.19,0.190038,0.4975,0.220158,130,3162,3292
 385,1/20/2012,1,1,1,5,2,0.2175,0.220958,0.45,0.20275,115,3048,3163
 386,1/21/2012,1,1,1,6,2,0.173333,0.174875,0.83125,0.222642,67,1234,1301
 387,1/22/2012,1,1,1,0,2,0.1625,0.16225,0.79625,0.199638,196,1781,1977
 388,1/23/2012,1,1,1,1,2,0.218333,0.243058,0.91125,0.110708,145,2287,2432
 389,1/24/2012,1,1,1,2,1,0.3425,0.349108,0.835833,0.123767,439,3900,4339
 390,1/25/2012,1,1,1,3,1,0.294167,0.294821,0.64375,0.161071,467,3803,4270
 391,1/26/2012,1,1,1,4,2,0.341667,0.35605,0.769583,0.0733958,244,3831,4075
 392,1/27/2012,1,1,1,5,2,0.425,0.415383,0.74125,0.342667,269,3187,3456
 393,1/28/2012,1,1,1,6,1,0.315833,0.326379,0.543333,0.210829,775,3248,4023
 394,1/29/2012,1,1,1,0,1,0.2825,0.272721,0.31125,0.24005,558,2685,3243
 395,1/30/2012,1,1,1,1,1,0.269167,0.262625,0.400833,0.215792,126,3498,3624
 396,1/31/2012,1,1,1,2,1,0.39,0.381317,0.416667,0.261817,324,4185,4509
 397,2/1/2012,1,1,2,3,1,0.469167,0.466538,0.507917,0.189067,304,4275,4579
 398,2/2/2012,1,1,2,4,2,0.399167,0.398971,0.672917,0.187187,190,3571,3761
 399,2/3/2012,1,1,2,5,1,0.313333,0.309346,0.526667,0.178496,310,3841,4151
 400,2/4/2012,1,1,2,6,2,0.264167,0.272725,0.779583,0.121896,384,2448,2832
 401,2/5/2012,1,1,2,0,2,0.265833,0.264521,0.687917,0.175996,318,2629,2947
 402,2/6/2012,1,1,2,1,1,0.282609,0.296426,0.622174,0.1538,206,3578,3784
 403,2/7/2012,1,1,2,2,1,0.354167,0.361104,0.49625,0.147379,199,4176,4375
 404,2/8/2012,1,1,2,3,2,0.256667,0.266421,0.722917,0.133721,109,2693,2802
 405,2/9/2012,1,1,2,4,1,0.265,0.261988,0.562083,0.194037,163,3667,3830
 406,2/10/2012,1,1,2,5,2,0.280833,0.293558,0.54,0.116929,227,3604,3831
 407,2/11/2012,1,1,2,6,3,0.224167,0.210867,0.73125,0.289796,192,1977,2169
 408,2/12/2012,1,1,2,0,1,0.1275,0.101658,0.464583,0.409212,73,1456,1529
 409,2/13/2012,1,1,2,1,1,0.2225,0.227913,0.41125,0.167283,94,3328,3422
 410,2/14/2012,1,1,2,2,2,0.319167,0.333946,0.50875,0.141179,135,3787,3922
 411,2/15/2012,1,1,2,3,1,0.348333,0.351629,0.53125,0.1816,141,4028,4169
 412,2/16/2012,1,1,2,4,2,0.316667,0.330162,0.752917,0.091425,74,2931,3005
 413,2/17/2012,1,1,2,5,1,0.343333,0.351629,0.634583,0.205846,349,3805,4154
 414,2/18/2012,1,1,2,6,1,0.346667,0.355425,0.534583,0.190929,1435,2883,4318
 415,2/19/2012,1,1,2,0,2,0.28,0.265788,0.515833,0.253112,618,2071,2689
 416,2/20/2012,1,1,2,1,1,0.28,0.273391,0.507826,0.229083,502,2627,3129
 417,2/21/2012,1,1,2,2,1,0.287826,0.295113,0.594348,0.205717,163,3614,3777
 418,2/22/2012,1,1,2,3,1,0.395833,0.392667,0.567917,0.234471,394,4379,4773
 419,2/23/2012,1,1,2,4,1,0.454167,0.444446,0.554583,0.190913,516,4546,5062
 420,2/24/2012,1,1,2,5,2,0.4075,0.410971,0.7375,0.237567,246,3241,3487
 421,2/25/2012,1,1,2,6,1,0.290833,0.255675,0.395833,0.421642,317,2415,2732
 422,2/26/2012,1,1,2,0,1,0.279167,0.268308,0.41,0.205229,515,2874,3389
 423,2/27/2012,1,1,2,1,1,0.366667,0.357954,0.490833,0.268033,253,4069,4322
 424,2/28/2012,1,1,2,2,1,0.359167,0.353525,0.395833,0.193417,229,4134,4363
 425,2/29/2012,1,1,2,3,2,0.344348,0.34847,0.804783,0.179117,65,1769,1834
 426,3/1/2012,1,1,3,4,1,0.485833,0.475371,0.615417,0.226987,325,4665,4990
 427,3/2/2012,1,1,3,5,2,0.353333,0.359842,0.657083,0.144904,246,2948,3194
 428,3/3/2012,1,1,3,6,2,0.414167,0.413492,0.62125,0.161079,956,3110,4066
 429,3/4/2012,1,1,3,0,1,0.325833,0.303021,0.403333,0.334571,710,2713,3423
 430,3/5/2012,1,1,3,1,1,0.243333,0.241171,0.50625,0.228858,203,3130,3333
 431,3/6/2012,1,1,3,2,1,0.258333,0.255042,0.456667,0.200875,221,3735,3956
 432,3/7/2012,1,1,3,3,1,0.404167,0.3851,0.513333,0.345779,432,4484,4916
 433,3/8/2012,1,1,3,4,1,0.5275,0.524604,0.5675,0.441563,486,4896,5382
 434,3/9/2012,1,1,3,5,2,0.410833,0.397083,0.407083,0.4148,447,4122,4569
 435,3/10/2012,1,1,3,6,1,0.2875,0.277767,0.350417,0.22575,968,3150,4118
 436,3/11/2012,1,1,3,0,1,0.361739,0.35967,0.476957,0.222587,1658,3253,4911
 437,3/12/2012,1,1,3,1,1,0.466667,0.459592,0.489167,0.207713,838,4460,5298
 438,3/13/2012,1,1,3,2,1,0.565,0.542929,0.6175,0.23695,762,5085,5847
 439,3/14/2012,1,1,3,3,1,0.5725,0.548617,0.507083,0.115062,997,5315,6312
 440,3/15/2012,1,1,3,4,1,0.5575,0.532825,0.579583,0.149883,1005,5187,6192
 441,3/16/2012,1,1,3,5,2,0.435833,0.436229,0.842083,0.113192,548,3830,4378
 442,3/17/2012,1,1,3,6,2,0.514167,0.505046,0.755833,0.110704,3155,4681,7836
 443,3/18/2012,1,1,3,0,2,0.4725,0.464,0.81,0.126883,2207,3685,5892
 444,3/19/2012,1,1,3,1,1,0.545,0.532821,0.72875,0.162317,982,5171,6153
 445,3/20/2012,1,1,3,2,1,0.560833,0.538533,0.807917,0.121271,1051,5042,6093
 446,3/21/2012,2,1,3,3,2,0.531667,0.513258,0.82125,0.0895583,1122,5108,6230
 447,3/22/2012,2,1,3,4,1,0.554167,0.531567,0.83125,0.117562,1334,5537,6871
 448,3/23/2012,2,1,3,5,2,0.601667,0.570067,0.694167,0.1163,2469,5893,8362
 449,3/24/2012,2,1,3,6,2,0.5025,0.486733,0.885417,0.192783,1033,2339,3372
 450,3/25/2012,2,1,3,0,2,0.4375,0.437488,0.880833,0.220775,1532,3464,4996
 451,3/26/2012,2,1,3,1,1,0.445833,0.43875,0.477917,0.386821,795,4763,5558
 452,3/27/2012,2,1,3,2,1,0.323333,0.315654,0.29,0.187192,531,4571,5102
 453,3/28/2012,2,1,3,3,1,0.484167,0.47095,0.48125,0.291671,674,5024,5698
 454,3/29/2012,2,1,3,4,1,0.494167,0.482304,0.439167,0.31965,834,5299,6133
 455,3/30/2012,2,1,3,5,2,0.37,0.375621,0.580833,0.138067,796,4663,5459
 456,3/31/2012,2,1,3,6,2,0.424167,0.421708,0.738333,0.250617,2301,3934,6235
 457,4/1/2012,2,1,4,0,2,0.425833,0.417287,0.67625,0.172267,2347,3694,6041
 458,4/2/2012,2,1,4,1,1,0.433913,0.427513,0.504348,0.312139,1208,4728,5936
 459,4/3/2012,2,1,4,2,1,0.466667,0.461483,0.396667,0.100133,1348,5424,6772
 460,4/4/2012,2,1,4,3,1,0.541667,0.53345,0.469583,0.180975,1058,5378,6436
 461,4/5/2012,2,1,4,4,1,0.435,0.431163,0.374167,0.219529,1192,5265,6457
 462,4/6/2012,2,1,4,5,1,0.403333,0.390767,0.377083,0.300388,1807,4653,6460
 463,4/7/2012,2,1,4,6,1,0.4375,0.426129,0.254167,0.274871,3252,3605,6857
 464,4/8/2012,2,1,4,0,1,0.5,0.492425,0.275833,0.232596,2230,2939,5169
 465,4/9/2012,2,1,4,1,1,0.489167,0.476638,0.3175,0.358196,905,4680,5585
 466,4/10/2012,2,1,4,2,1,0.446667,0.436233,0.435,0.249375,819,5099,5918
 467,4/11/2012,2,1,4,3,1,0.348696,0.337274,0.469565,0.295274,482,4380,4862
 468,4/12/2012,2,1,4,4,1,0.3975,0.387604,0.46625,0.290429,663,4746,5409
 469,4/13/2012,2,1,4,5,1,0.4425,0.431808,0.408333,0.155471,1252,5146,6398
 470,4/14/2012,2,1,4,6,1,0.495,0.487996,0.502917,0.190917,2795,4665,7460
 471,4/15/2012,2,1,4,0,1,0.606667,0.573875,0.507917,0.225129,2846,4286,7132
 472,4/16/2012,2,1,4,1,1,0.664167,0.614925,0.561667,0.284829,1198,5172,6370
 473,4/17/2012,2,1,4,2,1,0.608333,0.598487,0.390417,0.273629,989,5702,6691
 474,4/18/2012,2,1,4,3,2,0.463333,0.457038,0.569167,0.167912,347,4020,4367
 475,4/19/2012,2,1,4,4,1,0.498333,0.493046,0.6125,0.0659292,846,5719,6565
 476,4/20/2012,2,1,4,5,1,0.526667,0.515775,0.694583,0.149871,1340,5950,7290
 477,4/21/2012,2,1,4,6,1,0.57,0.542921,0.682917,0.283587,2541,4083,6624
 478,4/22/2012,2,1,4,0,3,0.396667,0.389504,0.835417,0.344546,120,907,1027
 479,4/23/2012,2,1,4,1,2,0.321667,0.301125,0.766667,0.303496,195,3019,3214
 480,4/24/2012,2,1,4,2,1,0.413333,0.405283,0.454167,0.249383,518,5115,5633
 481,4/25/2012,2,1,4,3,1,0.476667,0.470317,0.427917,0.118792,655,5541,6196
 482,4/26/2012,2,1,4,4,2,0.498333,0.483583,0.756667,0.176625,475,4551,5026
 483,4/27/2012,2,1,4,5,1,0.4575,0.452637,0.400833,0.347633,1014,5219,6233
 484,4/28/2012,2,1,4,6,2,0.376667,0.377504,0.489583,0.129975,1120,3100,4220
 485,4/29/2012,2,1,4,0,1,0.458333,0.450121,0.587083,0.116908,2229,4075,6304
 486,4/30/2012,2,1,4,1,2,0.464167,0.457696,0.57,0.171638,665,4907,5572
 487,5/1/2012,2,1,5,2,2,0.613333,0.577021,0.659583,0.156096,653,5087,5740
 488,5/2/2012,2,1,5,3,1,0.564167,0.537896,0.797083,0.138058,667,5502,6169
 489,5/3/2012,2,1,5,4,2,0.56,0.537242,0.768333,0.133696,764,5657,6421
 490,5/4/2012,2,1,5,5,1,0.6275,0.590917,0.735417,0.162938,1069,5227,6296
 491,5/5/2012,2,1,5,6,2,0.621667,0.584608,0.756667,0.152992,2496,4387,6883
 492,5/6/2012,2,1,5,0,2,0.5625,0.546737,0.74,0.149879,2135,4224,6359
 493,5/7/2012,2,1,5,1,2,0.5375,0.527142,0.664167,0.230721,1008,5265,6273
 494,5/8/2012,2,1,5,2,2,0.581667,0.557471,0.685833,0.296029,738,4990,5728
 495,5/9/2012,2,1,5,3,2,0.575,0.553025,0.744167,0.216412,620,4097,4717
 496,5/10/2012,2,1,5,4,1,0.505833,0.491783,0.552083,0.314063,1026,5546,6572
 497,5/11/2012,2,1,5,5,1,0.533333,0.520833,0.360417,0.236937,1319,5711,7030
 498,5/12/2012,2,1,5,6,1,0.564167,0.544817,0.480417,0.123133,2622,4807,7429
 499,5/13/2012,2,1,5,0,1,0.6125,0.585238,0.57625,0.225117,2172,3946,6118
 500,5/14/2012,2,1,5,1,2,0.573333,0.5499,0.789583,0.212692,342,2501,2843
 501,5/15/2012,2,1,5,2,2,0.611667,0.576404,0.794583,0.147392,625,4490,5115
 502,5/16/2012,2,1,5,3,1,0.636667,0.595975,0.697917,0.122512,991,6433,7424
 503,5/17/2012,2,1,5,4,1,0.593333,0.572613,0.52,0.229475,1242,6142,7384
 504,5/18/2012,2,1,5,5,1,0.564167,0.551121,0.523333,0.136817,1521,6118,7639
 505,5/19/2012,2,1,5,6,1,0.6,0.566908,0.45625,0.083975,3410,4884,8294
 506,5/20/2012,2,1,5,0,1,0.620833,0.583967,0.530417,0.254367,2704,4425,7129
 507,5/21/2012,2,1,5,1,2,0.598333,0.565667,0.81125,0.233204,630,3729,4359
 508,5/22/2012,2,1,5,2,2,0.615,0.580825,0.765833,0.118167,819,5254,6073
 509,5/23/2012,2,1,5,3,2,0.621667,0.584612,0.774583,0.102,766,4494,5260
 510,5/24/2012,2,1,5,4,1,0.655,0.6067,0.716667,0.172896,1059,5711,6770
 511,5/25/2012,2,1,5,5,1,0.68,0.627529,0.747083,0.14055,1417,5317,6734
 512,5/26/2012,2,1,5,6,1,0.6925,0.642696,0.7325,0.198992,2855,3681,6536
 513,5/27/2012,2,1,5,0,1,0.69,0.641425,0.697083,0.215171,3283,3308,6591
 514,5/28/2012,2,1,5,1,1,0.7125,0.6793,0.67625,0.196521,2557,3486,6043
 515,5/29/2012,2,1,5,2,1,0.7225,0.672992,0.684583,0.2954,880,4863,5743
 516,5/30/2012,2,1,5,3,2,0.656667,0.611129,0.67,0.134329,745,6110,6855
 517,5/31/2012,2,1,5,4,1,0.68,0.631329,0.492917,0.195279,1100,6238,7338
 518,6/1/2012,2,1,6,5,2,0.654167,0.607962,0.755417,0.237563,533,3594,4127
 519,6/2/2012,2,1,6,6,1,0.583333,0.566288,0.549167,0.186562,2795,5325,8120
 520,6/3/2012,2,1,6,0,1,0.6025,0.575133,0.493333,0.184087,2494,5147,7641
 521,6/4/2012,2,1,6,1,1,0.5975,0.578283,0.487083,0.284833,1071,5927,6998
 522,6/5/2012,2,1,6,2,2,0.540833,0.525892,0.613333,0.209575,968,6033,7001
 523,6/6/2012,2,1,6,3,1,0.554167,0.542292,0.61125,0.077125,1027,6028,7055
 524,6/7/2012,2,1,6,4,1,0.6025,0.569442,0.567083,0.15735,1038,6456,7494
 525,6/8/2012,2,1,6,5,1,0.649167,0.597862,0.467917,0.175383,1488,6248,7736
 526,6/9/2012,2,1,6,6,1,0.710833,0.648367,0.437083,0.144287,2708,4790,7498
 527,6/10/2012,2,1,6,0,1,0.726667,0.663517,0.538333,0.133721,2224,4374,6598
 528,6/11/2012,2,1,6,1,2,0.720833,0.659721,0.587917,0.207713,1017,5647,6664
 529,6/12/2012,2,1,6,2,2,0.653333,0.597875,0.833333,0.214546,477,4495,4972
 530,6/13/2012,2,1,6,3,1,0.655833,0.611117,0.582083,0.343279,1173,6248,7421
 531,6/14/2012,2,1,6,4,1,0.648333,0.624383,0.569583,0.253733,1180,6183,7363
 532,6/15/2012,2,1,6,5,1,0.639167,0.599754,0.589583,0.176617,1563,6102,7665
 533,6/16/2012,2,1,6,6,1,0.631667,0.594708,0.504167,0.166667,2963,4739,7702
 534,6/17/2012,2,1,6,0,1,0.5925,0.571975,0.59875,0.144904,2634,4344,6978
 535,6/18/2012,2,1,6,1,2,0.568333,0.544842,0.777917,0.174746,653,4446,5099
 536,6/19/2012,2,1,6,2,1,0.688333,0.654692,0.69,0.148017,968,5857,6825
 537,6/20/2012,2,1,6,3,1,0.7825,0.720975,0.592083,0.113812,872,5339,6211
 538,6/21/2012,3,1,6,4,1,0.805833,0.752542,0.567917,0.118787,778,5127,5905
 539,6/22/2012,3,1,6,5,1,0.7775,0.724121,0.57375,0.182842,964,4859,5823
 540,6/23/2012,3,1,6,6,1,0.731667,0.652792,0.534583,0.179721,2657,4801,7458
 541,6/24/2012,3,1,6,0,1,0.743333,0.674254,0.479167,0.145525,2551,4340,6891
 542,6/25/2012,3,1,6,1,1,0.715833,0.654042,0.504167,0.300383,1139,5640,6779
 543,6/26/2012,3,1,6,2,1,0.630833,0.594704,0.373333,0.347642,1077,6365,7442
 544,6/27/2012,3,1,6,3,1,0.6975,0.640792,0.36,0.271775,1077,6258,7335
 545,6/28/2012,3,1,6,4,1,0.749167,0.675512,0.4225,0.17165,921,5958,6879
 546,6/29/2012,3,1,6,5,1,0.834167,0.786613,0.48875,0.165417,829,4634,5463
 547,6/30/2012,3,1,6,6,1,0.765,0.687508,0.60125,0.161071,1455,4232,5687
 548,7/1/2012,3,1,7,0,1,0.815833,0.750629,0.51875,0.168529,1421,4110,5531
 549,7/2/2012,3,1,7,1,1,0.781667,0.702038,0.447083,0.195267,904,5323,6227
 550,7/3/2012,3,1,7,2,1,0.780833,0.70265,0.492083,0.126237,1052,5608,6660
 551,7/4/2012,3,1,7,3,1,0.789167,0.732337,0.53875,0.13495,2562,4841,7403
 552,7/5/2012,3,1,7,4,1,0.8275,0.761367,0.457917,0.194029,1405,4836,6241
 553,7/6/2012,3,1,7,5,1,0.828333,0.752533,0.450833,0.146142,1366,4841,6207
 554,7/7/2012,3,1,7,6,1,0.861667,0.804913,0.492083,0.163554,1448,3392,4840
 555,7/8/2012,3,1,7,0,1,0.8225,0.790396,0.57375,0.125629,1203,3469,4672
 556,7/9/2012,3,1,7,1,2,0.710833,0.654054,0.683333,0.180975,998,5571,6569
 557,7/10/2012,3,1,7,2,2,0.720833,0.664796,0.6675,0.151737,954,5336,6290
 558,7/11/2012,3,1,7,3,1,0.716667,0.650271,0.633333,0.151733,975,6289,7264
 559,7/12/2012,3,1,7,4,1,0.715833,0.654683,0.529583,0.146775,1032,6414,7446
 560,7/13/2012,3,1,7,5,2,0.731667,0.667933,0.485833,0.08085,1511,5988,7499
 561,7/14/2012,3,1,7,6,2,0.703333,0.666042,0.699167,0.143679,2355,4614,6969
 562,7/15/2012,3,1,7,0,1,0.745833,0.705196,0.717917,0.166667,1920,4111,6031
 563,7/16/2012,3,1,7,1,1,0.763333,0.724125,0.645,0.164187,1088,5742,6830
 564,7/17/2012,3,1,7,2,1,0.818333,0.755683,0.505833,0.114429,921,5865,6786
 565,7/18/2012,3,1,7,3,1,0.793333,0.745583,0.577083,0.137442,799,4914,5713
 566,7/19/2012,3,1,7,4,1,0.77,0.714642,0.600417,0.165429,888,5703,6591
 567,7/20/2012,3,1,7,5,2,0.665833,0.613025,0.844167,0.208967,747,5123,5870
 568,7/21/2012,3,1,7,6,3,0.595833,0.549912,0.865417,0.2133,1264,3195,4459
 569,7/22/2012,3,1,7,0,2,0.6675,0.623125,0.7625,0.0939208,2544,4866,7410
 570,7/23/2012,3,1,7,1,1,0.741667,0.690017,0.694167,0.138683,1135,5831,6966
 571,7/24/2012,3,1,7,2,1,0.750833,0.70645,0.655,0.211454,1140,6452,7592
 572,7/25/2012,3,1,7,3,1,0.724167,0.654054,0.45,0.1648,1383,6790,8173
 573,7/26/2012,3,1,7,4,1,0.776667,0.739263,0.596667,0.284813,1036,5825,6861
 574,7/27/2012,3,1,7,5,1,0.781667,0.734217,0.594583,0.152992,1259,5645,6904
 575,7/28/2012,3,1,7,6,1,0.755833,0.697604,0.613333,0.15735,2234,4451,6685
 576,7/29/2012,3,1,7,0,1,0.721667,0.667933,0.62375,0.170396,2153,4444,6597
 577,7/30/2012,3,1,7,1,1,0.730833,0.684987,0.66875,0.153617,1040,6065,7105
 578,7/31/2012,3,1,7,2,1,0.713333,0.662896,0.704167,0.165425,968,6248,7216
 579,8/1/2012,3,1,8,3,1,0.7175,0.667308,0.6775,0.141179,1074,6506,7580
 580,8/2/2012,3,1,8,4,1,0.7525,0.707088,0.659583,0.129354,983,6278,7261
 581,8/3/2012,3,1,8,5,2,0.765833,0.722867,0.6425,0.215792,1328,5847,7175
 582,8/4/2012,3,1,8,6,1,0.793333,0.751267,0.613333,0.257458,2345,4479,6824
 583,8/5/2012,3,1,8,0,1,0.769167,0.731079,0.6525,0.290421,1707,3757,5464
 584,8/6/2012,3,1,8,1,2,0.7525,0.710246,0.654167,0.129354,1233,5780,7013
 585,8/7/2012,3,1,8,2,2,0.735833,0.697621,0.70375,0.116908,1278,5995,7273
 586,8/8/2012,3,1,8,3,2,0.75,0.707717,0.672917,0.1107,1263,6271,7534
 587,8/9/2012,3,1,8,4,1,0.755833,0.699508,0.620417,0.1561,1196,6090,7286
 588,8/10/2012,3,1,8,5,2,0.715833,0.667942,0.715833,0.238813,1065,4721,5786
 589,8/11/2012,3,1,8,6,2,0.6925,0.638267,0.732917,0.206479,2247,4052,6299
 590,8/12/2012,3,1,8,0,1,0.700833,0.644579,0.530417,0.122512,2182,4362,6544
 591,8/13/2012,3,1,8,1,1,0.720833,0.662254,0.545417,0.136212,1207,5676,6883
 592,8/14/2012,3,1,8,2,1,0.726667,0.676779,0.686667,0.169158,1128,5656,6784
 593,8/15/2012,3,1,8,3,1,0.706667,0.654037,0.619583,0.169771,1198,6149,7347
 594,8/16/2012,3,1,8,4,1,0.719167,0.654688,0.519167,0.141796,1338,6267,7605
 595,8/17/2012,3,1,8,5,1,0.723333,0.2424,0.570833,0.231354,1483,5665,7148
 596,8/18/2012,3,1,8,6,1,0.678333,0.618071,0.603333,0.177867,2827,5038,7865
 597,8/19/2012,3,1,8,0,2,0.635833,0.603554,0.711667,0.08645,1208,3341,4549
 598,8/20/2012,3,1,8,1,2,0.635833,0.595967,0.734167,0.129979,1026,5504,6530
 599,8/21/2012,3,1,8,2,1,0.649167,0.601025,0.67375,0.0727708,1081,5925,7006
 600,8/22/2012,3,1,8,3,1,0.6675,0.621854,0.677083,0.0702833,1094,6281,7375
 601,8/23/2012,3,1,8,4,1,0.695833,0.637008,0.635833,0.0845958,1363,6402,7765
 602,8/24/2012,3,1,8,5,2,0.7025,0.6471,0.615,0.0721458,1325,6257,7582
 603,8/25/2012,3,1,8,6,2,0.661667,0.618696,0.712917,0.244408,1829,4224,6053
 604,8/26/2012,3,1,8,0,2,0.653333,0.595996,0.845833,0.228858,1483,3772,5255
 605,8/27/2012,3,1,8,1,1,0.703333,0.654688,0.730417,0.128733,989,5928,6917
 606,8/28/2012,3,1,8,2,1,0.728333,0.66605,0.62,0.190925,935,6105,7040
 607,8/29/2012,3,1,8,3,1,0.685,0.635733,0.552083,0.112562,1177,6520,7697
 608,8/30/2012,3,1,8,4,1,0.706667,0.652779,0.590417,0.0771167,1172,6541,7713
 609,8/31/2012,3,1,8,5,1,0.764167,0.6894,0.5875,0.168533,1433,5917,7350
 610,9/1/2012,3,1,9,6,2,0.753333,0.702654,0.638333,0.113187,2352,3788,6140
 611,9/2/2012,3,1,9,0,2,0.696667,0.649,0.815,0.0640708,2613,3197,5810
 612,9/3/2012,3,1,9,1,1,0.7075,0.661629,0.790833,0.151121,1965,4069,6034
 613,9/4/2012,3,1,9,2,1,0.725833,0.686888,0.755,0.236321,867,5997,6864
 614,9/5/2012,3,1,9,3,1,0.736667,0.708983,0.74125,0.187808,832,6280,7112
 615,9/6/2012,3,1,9,4,2,0.696667,0.655329,0.810417,0.142421,611,5592,6203
 616,9/7/2012,3,1,9,5,1,0.703333,0.657204,0.73625,0.171646,1045,6459,7504
 617,9/8/2012,3,1,9,6,2,0.659167,0.611121,0.799167,0.281104,1557,4419,5976
 618,9/9/2012,3,1,9,0,1,0.61,0.578925,0.5475,0.224496,2570,5657,8227
 619,9/10/2012,3,1,9,1,1,0.583333,0.565654,0.50375,0.258713,1118,6407,7525
 620,9/11/2012,3,1,9,2,1,0.5775,0.554292,0.52,0.0920542,1070,6697,7767
 621,9/12/2012,3,1,9,3,1,0.599167,0.570075,0.577083,0.131846,1050,6820,7870
 622,9/13/2012,3,1,9,4,1,0.6125,0.579558,0.637083,0.0827208,1054,6750,7804
 623,9/14/2012,3,1,9,5,1,0.633333,0.594083,0.6725,0.103863,1379,6630,8009
 624,9/15/2012,3,1,9,6,1,0.608333,0.585867,0.501667,0.247521,3160,5554,8714
 625,9/16/2012,3,1,9,0,1,0.58,0.563125,0.57,0.0901833,2166,5167,7333
 626,9/17/2012,3,1,9,1,2,0.580833,0.55305,0.734583,0.151742,1022,5847,6869
 627,9/18/2012,3,1,9,2,2,0.623333,0.565067,0.8725,0.357587,371,3702,4073
 628,9/19/2012,3,1,9,3,1,0.5525,0.540404,0.536667,0.215175,788,6803,7591
 629,9/20/2012,3,1,9,4,1,0.546667,0.532192,0.618333,0.118167,939,6781,7720
 630,9/21/2012,3,1,9,5,1,0.599167,0.571971,0.66875,0.154229,1250,6917,8167
 631,9/22/2012,3,1,9,6,1,0.65,0.610488,0.646667,0.283583,2512,5883,8395
 632,9/23/2012,4,1,9,0,1,0.529167,0.518933,0.467083,0.223258,2454,5453,7907
 633,9/24/2012,4,1,9,1,1,0.514167,0.502513,0.492917,0.142404,1001,6435,7436
 634,9/25/2012,4,1,9,2,1,0.55,0.544179,0.57,0.236321,845,6693,7538
 635,9/26/2012,4,1,9,3,1,0.635,0.596613,0.630833,0.2444,787,6946,7733
 636,9/27/2012,4,1,9,4,2,0.65,0.607975,0.690833,0.134342,751,6642,7393
 637,9/28/2012,4,1,9,5,2,0.619167,0.585863,0.69,0.164179,1045,6370,7415
 638,9/29/2012,4,1,9,6,1,0.5425,0.530296,0.542917,0.227604,2589,5966,8555
 639,9/30/2012,4,1,9,0,1,0.526667,0.517663,0.583333,0.134958,2015,4874,6889
 640,10/1/2012,4,1,10,1,2,0.520833,0.512,0.649167,0.0908042,763,6015,6778
 641,10/2/2012,4,1,10,2,3,0.590833,0.542333,0.871667,0.104475,315,4324,4639
 642,10/3/2012,4,1,10,3,2,0.6575,0.599133,0.79375,0.0665458,728,6844,7572
 643,10/4/2012,4,1,10,4,2,0.6575,0.607975,0.722917,0.117546,891,6437,7328
 644,10/5/2012,4,1,10,5,1,0.615,0.580187,0.6275,0.10635,1516,6640,8156
 645,10/6/2012,4,1,10,6,1,0.554167,0.538521,0.664167,0.268025,3031,4934,7965
 646,10/7/2012,4,1,10,0,2,0.415833,0.419813,0.708333,0.141162,781,2729,3510
 647,10/8/2012,4,1,10,1,2,0.383333,0.387608,0.709583,0.189679,874,4604,5478
 648,10/9/2012,4,1,10,2,2,0.446667,0.438112,0.761667,0.1903,601,5791,6392
 649,10/10/2012,4,1,10,3,1,0.514167,0.503142,0.630833,0.187821,780,6911,7691
 650,10/11/2012,4,1,10,4,1,0.435,0.431167,0.463333,0.181596,834,6736,7570
 651,10/12/2012,4,1,10,5,1,0.4375,0.433071,0.539167,0.235092,1060,6222,7282
 652,10/13/2012,4,1,10,6,1,0.393333,0.391396,0.494583,0.146142,2252,4857,7109
 653,10/14/2012,4,1,10,0,1,0.521667,0.508204,0.640417,0.278612,2080,4559,6639
 654,10/15/2012,4,1,10,1,2,0.561667,0.53915,0.7075,0.296037,760,5115,5875
 655,10/16/2012,4,1,10,2,1,0.468333,0.460846,0.558333,0.182221,922,6612,7534
 656,10/17/2012,4,1,10,3,1,0.455833,0.450108,0.692917,0.101371,979,6482,7461
 657,10/18/2012,4,1,10,4,2,0.5225,0.512625,0.728333,0.236937,1008,6501,7509
 658,10/19/2012,4,1,10,5,2,0.563333,0.537896,0.815,0.134954,753,4671,5424
 659,10/20/2012,4,1,10,6,1,0.484167,0.472842,0.572917,0.117537,2806,5284,8090
 660,10/21/2012,4,1,10,0,1,0.464167,0.456429,0.51,0.166054,2132,4692,6824
 661,10/22/2012,4,1,10,1,1,0.4875,0.482942,0.568333,0.0814833,830,6228,7058
 662,10/23/2012,4,1,10,2,1,0.544167,0.530304,0.641667,0.0945458,841,6625,7466
 663,10/24/2012,4,1,10,3,1,0.5875,0.558721,0.63625,0.0727792,795,6898,7693
 664,10/25/2012,4,1,10,4,2,0.55,0.529688,0.800417,0.124375,875,6484,7359
 665,10/26/2012,4,1,10,5,2,0.545833,0.52275,0.807083,0.132467,1182,6262,7444
 666,10/27/2012,4,1,10,6,2,0.53,0.515133,0.72,0.235692,2643,5209,7852
 667,10/28/2012,4,1,10,0,2,0.4775,0.467771,0.694583,0.398008,998,3461,4459
 668,10/29/2012,4,1,10,1,3,0.44,0.4394,0.88,0.3582,2,20,22
 669,10/30/2012,4,1,10,2,2,0.318182,0.309909,0.825455,0.213009,87,1009,1096
 670,10/31/2012,4,1,10,3,2,0.3575,0.3611,0.666667,0.166667,419,5147,5566
 671,11/1/2012,4,1,11,4,2,0.365833,0.369942,0.581667,0.157346,466,5520,5986
 672,11/2/2012,4,1,11,5,1,0.355,0.356042,0.522083,0.266175,618,5229,5847
 673,11/3/2012,4,1,11,6,2,0.343333,0.323846,0.49125,0.270529,1029,4109,5138
 674,11/4/2012,4,1,11,0,1,0.325833,0.329538,0.532917,0.179108,1201,3906,5107
 675,11/5/2012,4,1,11,1,1,0.319167,0.308075,0.494167,0.236325,378,4881,5259
 676,11/6/2012,4,1,11,2,1,0.280833,0.281567,0.567083,0.173513,466,5220,5686
 677,11/7/2012,4,1,11,3,2,0.295833,0.274621,0.5475,0.304108,326,4709,5035
 678,11/8/2012,4,1,11,4,1,0.352174,0.341891,0.333478,0.347835,340,4975,5315
 679,11/9/2012,4,1,11,5,1,0.361667,0.355413,0.540833,0.214558,709,5283,5992
 680,11/10/2012,4,1,11,6,1,0.389167,0.393937,0.645417,0.0578458,2090,4446,6536
 681,11/11/2012,4,1,11,0,1,0.420833,0.421713,0.659167,0.1275,2290,4562,6852
 682,11/12/2012,4,1,11,1,1,0.485,0.475383,0.741667,0.173517,1097,5172,6269
 683,11/13/2012,4,1,11,2,2,0.343333,0.323225,0.662917,0.342046,327,3767,4094
 684,11/14/2012,4,1,11,3,1,0.289167,0.281563,0.552083,0.199625,373,5122,5495
 685,11/15/2012,4,1,11,4,2,0.321667,0.324492,0.620417,0.152987,320,5125,5445
 686,11/16/2012,4,1,11,5,1,0.345,0.347204,0.524583,0.171025,484,5214,5698
 687,11/17/2012,4,1,11,6,1,0.325,0.326383,0.545417,0.179729,1313,4316,5629
 688,11/18/2012,4,1,11,0,1,0.3425,0.337746,0.692917,0.227612,922,3747,4669
 689,11/19/2012,4,1,11,1,2,0.380833,0.375621,0.623333,0.235067,449,5050,5499
 690,11/20/2012,4,1,11,2,2,0.374167,0.380667,0.685,0.082725,534,5100,5634
 691,11/21/2012,4,1,11,3,1,0.353333,0.364892,0.61375,0.103246,615,4531,5146
 692,11/22/2012,4,1,11,4,1,0.34,0.350371,0.580417,0.0528708,955,1470,2425
 693,11/23/2012,4,1,11,5,1,0.368333,0.378779,0.56875,0.148021,1603,2307,3910
 694,11/24/2012,4,1,11,6,1,0.278333,0.248742,0.404583,0.376871,532,1745,2277
 695,11/25/2012,4,1,11,0,1,0.245833,0.257583,0.468333,0.1505,309,2115,2424
 696,11/26/2012,4,1,11,1,1,0.313333,0.339004,0.535417,0.04665,337,4750,5087
 697,11/27/2012,4,1,11,2,2,0.291667,0.281558,0.786667,0.237562,123,3836,3959
 698,11/28/2012,4,1,11,3,1,0.296667,0.289762,0.50625,0.210821,198,5062,5260
 699,11/29/2012,4,1,11,4,1,0.28087,0.298422,0.555652,0.115522,243,5080,5323
 700,11/30/2012,4,1,11,5,1,0.298333,0.323867,0.649583,0.0584708,362,5306,5668
 701,12/1/2012,4,1,12,6,2,0.298333,0.316904,0.806667,0.0597042,951,4240,5191
 702,12/2/2012,4,1,12,0,2,0.3475,0.359208,0.823333,0.124379,892,3757,4649
 703,12/3/2012,4,1,12,1,1,0.4525,0.455796,0.7675,0.0827208,555,5679,6234
 704,12/4/2012,4,1,12,2,1,0.475833,0.469054,0.73375,0.174129,551,6055,6606
 705,12/5/2012,4,1,12,3,1,0.438333,0.428012,0.485,0.324021,331,5398,5729
 706,12/6/2012,4,1,12,4,1,0.255833,0.258204,0.50875,0.174754,340,5035,5375
 707,12/7/2012,4,1,12,5,2,0.320833,0.321958,0.764167,0.1306,349,4659,5008
 708,12/8/2012,4,1,12,6,2,0.381667,0.389508,0.91125,0.101379,1153,4429,5582
 709,12/9/2012,4,1,12,0,2,0.384167,0.390146,0.905417,0.157975,441,2787,3228
 710,12/10/2012,4,1,12,1,2,0.435833,0.435575,0.925,0.190308,329,4841,5170
 711,12/11/2012,4,1,12,2,2,0.353333,0.338363,0.596667,0.296037,282,5219,5501
 712,12/12/2012,4,1,12,3,2,0.2975,0.297338,0.538333,0.162937,310,5009,5319
 713,12/13/2012,4,1,12,4,1,0.295833,0.294188,0.485833,0.174129,425,5107,5532
 714,12/14/2012,4,1,12,5,1,0.281667,0.294192,0.642917,0.131229,429,5182,5611
 715,12/15/2012,4,1,12,6,1,0.324167,0.338383,0.650417,0.10635,767,4280,5047
 716,12/16/2012,4,1,12,0,2,0.3625,0.369938,0.83875,0.100742,538,3248,3786
 717,12/17/2012,4,1,12,1,2,0.393333,0.4015,0.907083,0.0982583,212,4373,4585
 718,12/18/2012,4,1,12,2,1,0.410833,0.409708,0.66625,0.221404,433,5124,5557
 719,12/19/2012,4,1,12,3,1,0.3325,0.342162,0.625417,0.184092,333,4934,5267
 720,12/20/2012,4,1,12,4,2,0.33,0.335217,0.667917,0.132463,314,3814,4128
 721,12/21/2012,1,1,12,5,2,0.326667,0.301767,0.556667,0.374383,221,3402,3623
 722,12/22/2012,1,1,12,6,1,0.265833,0.236113,0.44125,0.407346,205,1544,1749
 723,12/23/2012,1,1,12,0,1,0.245833,0.259471,0.515417,0.133083,408,1379,1787
 724,12/24/2012,1,1,12,1,2,0.231304,0.2589,0.791304,0.0772304,174,746,920
 725,12/25/2012,1,1,12,2,2,0.291304,0.294465,0.734783,0.168726,440,573,1013
 726,12/26/2012,1,1,12,3,3,0.243333,0.220333,0.823333,0.316546,9,432,441
 727,12/27/2012,1,1,12,4,2,0.254167,0.226642,0.652917,0.350133,247,1867,2114
 728,12/28/2012,1,1,12,5,2,0.253333,0.255046,0.59,0.155471,644,2451,3095
 729,12/29/2012,1,1,12,6,2,0.253333,0.2424,0.752917,0.124383,159,1182,1341
 730,12/30/2012,1,1,12,0,1,0.255833,0.2317,0.483333,0.350754,364,1432,1796
 731,12/31/2012,1,1,12,1,2,0.215833,0.223487,0.5775,0.154846,439,2290,2729
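The rows above look like the daily table of the UCI Bike Sharing dataset. The CSV header row is not part of this excerpt, so the column names below (`instant`, `dteday`, `season`, `yr`, `mnth`, `weekday`, `weathersit`, `temp`, `atemp`, `hum`, `windspeed`, `casual`, `registered`, `cnt`) are an assumption. This minimal pandas sketch parses three rows copied verbatim from the data and checks invariants that the rows themselves suggest:

- the last two rider counts sum to the total;
- `yr` is 0 for 2011 and 1 for 2012;
- `weekday` uses 0 for Sunday. This coding is inferred from the dates: 1/1/2012 was a Sunday and is coded 0.

```python
import io

import pandas as pd

# Assumed column names; the CSV header is not shown in this excerpt.
COLUMNS = [
    "instant", "dteday", "season", "yr", "mnth", "weekday", "weathersit",
    "temp", "atemp", "hum", "windspeed", "casual", "registered", "cnt",
]

# Three rows copied verbatim from the data above.
SAMPLE = """\
366,1/1/2012,1,1,1,0,1,0.37,0.375621,0.6925,0.192167,686,1608,2294
668,10/29/2012,4,1,10,1,3,0.44,0.4394,0.88,0.3582,2,20,22
731,12/31/2012,1,1,12,1,2,0.215833,0.223487,0.5775,0.154846,439,2290,2729
"""


def load_rows(text: str) -> pd.DataFrame:
    """Parse header-less rows and convert the date column to datetime."""
    df = pd.read_csv(io.StringIO(text), header=None, names=COLUMNS)
    df["dteday"] = pd.to_datetime(df["dteday"], format="%m/%d/%Y")
    return df


def check_invariants(df: pd.DataFrame) -> None:
    """Raise ValueError if counts, year flag, or weekday look inconsistent."""
    if not (df["casual"] + df["registered"] == df["cnt"]).all():
        raise ValueError("casual + registered does not equal cnt")
    if not (df["yr"] == df["dteday"].dt.year - 2011).all():
        raise ValueError("yr flag does not match the date")
    # pandas uses Monday=0..Sunday=6; shift so Sunday=0 as in the data.
    if not ((df["dteday"].dt.dayofweek + 1) % 7 == df["weekday"]).all():
        raise ValueError("weekday code does not match the date")


df = load_rows(SAMPLE)
check_invariants(df)
print(df[["dteday", "casual", "registered", "cnt"]])
```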
--- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/forecasting_script.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/forecasting_script.py
@@ -0,0 +1,53 @@
 import argparse
 from azureml.core import Dataset, Run
 import joblib
 parser = argparse.ArgumentParser()
 parser.add_argument(
    "--target_column_name",
    type=str,
    dest="target_column_name",
    help="Target Column Name",
 )
 parser.add_argument(
    "--test_dataset", type=str, dest="test_dataset", help="Test Dataset"
 )
 args = parser.parse_args()
 target_column_name = args.target_column_name
 test_dataset_id = args.test_dataset
 run = Run.get_context()
 ws = run.experiment.workspace
 # get the input dataset by id
 test_dataset = Dataset.get_by_id(ws, id=test_dataset_id)
 X_test_df = (
    test_dataset.drop_columns(columns=[target_column_name])
    .to_pandas_dataframe()
    .reset_index(drop=True)
 )
 y_test_df = (
    test_dataset.with_timestamp_columns(None)
    .keep_columns(columns=[target_column_name])
    .to_pandas_dataframe()
 )
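 # model.pkl was downloaded into this source directory by run_forecast.py
 fitted_model = joblib.load("model.pkl")
 # Forecast over a rolling origin that advances one period at a time (step=1)
 X_rf = fitted_model.rolling_forecast(X_test_df, y_test_df.values, step=1)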
 # Rename the forecast origin, prediction, and actual columns returned by rolling_forecast
 assign_dict = {
    fitted_model.forecast_origin_column_name: "forecast_origin",
    fitted_model.forecast_column_name: "predicted",
    fitted_model.actual_column_name: target_column_name,
 }
 X_rf.rename(columns=assign_dict, inplace=True)
 file_name = "outputs/predictions.csv"
 X_rf.to_csv(file_name, header=True)
 # Upload the predictions into artifacts
 run.upload_file(name=file_name, path_or_stream=file_name)
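The script hands the evaluation to `fitted_model.rolling_forecast(X_test_df, y_test_df.values, step=1)`. That call forecasts from a series of origins that move through the test period. Each forecast uses the actuals observed up to its origin, and the origin then advances by `step` periods.

The sketch below makes that bookkeeping concrete without an AutoML model. It applies the same rolling-origin loop to a naive last-value forecaster, which is a stand-in chosen purely for illustration and not what AutoML fits. The following are all assumptions:

- the function name `naive_rolling_forecast`;
- the `horizon`, `date`, and `actual` columns;
- the two-day horizon.

`forecast_origin` and `predicted` match the names the script assigns. The five counts are the `cnt` values for 12/27/2012 to 12/31/2012 from the data above.

```python
import pandas as pd


def naive_rolling_forecast(y: pd.Series, horizon: int, step: int = 1) -> pd.DataFrame:
    """Rolling-origin forecast with a naive (last observed value) model.

    At each origin, predict the next `horizon` values as the value observed
    at the origin, then advance the origin by `step` periods. The naive
    model is a hypothetical stand-in for the fitted AutoML model.
    """
    rows = []
    for origin in range(0, len(y) - 1, step):
        last_value = y.iloc[origin]
        for h in range(1, horizon + 1):
            target = origin + h
            if target >= len(y):
                break  # no actual available beyond the end of the test data
            rows.append(
                {
                    "forecast_origin": y.index[origin],
                    "date": y.index[target],
                    "horizon": h,
                    "predicted": last_value,
                    "actual": y.iloc[target],
                }
            )
    return pd.DataFrame(rows)


cnt = pd.Series(
    [2114, 3095, 1341, 1796, 2729],
    index=pd.date_range("2012-12-27", periods=5, freq="D"),
)
out = naive_rolling_forecast(cnt, horizon=2)
# Mean absolute error for each step ahead of the origin.
mae = (out["actual"] - out["predicted"]).abs().groupby(out["horizon"]).mean()
print(mae)
```

Grouping errors by horizon, as in the last lines, is a common way to see how accuracy degrades the further a forecast reaches past its origin.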
--- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/metrics_helper.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/metrics_helper.py
@@ -0,0 +1,22 @@
 import pandas as pd
 import numpy as np
 def APE(actual, pred):
    """
    Calculate absolute percentage error.
    Returns a vector of APE values with the same length as actual/pred.
    """
    return 100 * np.abs((actual - pred) / actual)
 def MAPE(actual, pred):
    """
    Calculate mean absolute percentage error.
    Pairs where either value is NaN, or where the actual value is close to
    zero (so APE is undefined), are dropped before averaging.
    """
    not_na = ~(np.isnan(actual) | np.isnan(pred))
    not_zero = ~np.isclose(actual, 0.0)
    actual_safe = actual[not_na & not_zero]
    pred_safe = pred[not_na & not_zero]
    return np.mean(APE(actual_safe, pred_safe))
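`MAPE` divides by the actual values, so it discards pairs that would make that division meaningless. The snippet below checks that behavior. `APE` and `MAPE` are copied from the file above so the snippet runs on its own. The input arrays are made-up values chosen so the filtering is easy to follow by hand.

```python
import numpy as np


def APE(actual, pred):
    """Element-wise absolute percentage error (copied from metrics_helper.py)."""
    return 100 * np.abs((actual - pred) / actual)


def MAPE(actual, pred):
    """Mean APE after dropping NaN pairs and near-zero actuals (copied from metrics_helper.py)."""
    not_na = ~(np.isnan(actual) | np.isnan(pred))
    not_zero = ~np.isclose(actual, 0.0)
    actual_safe = actual[not_na & not_zero]
    pred_safe = pred[not_na & not_zero]
    return np.mean(APE(actual_safe, pred_safe))


actual = np.array([100.0, 0.0, 50.0, np.nan])
pred = np.array([110.0, 5.0, 40.0, 30.0])

# The zero actual and the NaN actual are filtered out; the remaining pairs
# have APE of 10% and 20%, so MAPE is 15%.
print(MAPE(actual, pred))
```

The filter uses `np.isclose(actual, 0.0)` with NumPy's default tolerance, so only actuals within about 1e-8 of zero are dropped. Small but nonzero actuals still enter the average and can dominate it.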
--- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/run_forecast.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/run_forecast.py
@@ -0,0 +1,40 @@
 from azureml.core import ScriptRunConfig
 def run_rolling_forecast(
    test_experiment,
    compute_target,
    train_run,
    test_dataset,
    target_column_name,
    inference_folder="./forecast",
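 ):
    """Download the trained model from train_run, submit forecasting_script.py
    on compute_target to forecast test_dataset, and return the submitted run."""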
    train_run.download_file("outputs/model.pkl", inference_folder + "/model.pkl")
    inference_env = train_run.get_environment()
    config = ScriptRunConfig(
        source_directory=inference_folder,
        script="forecasting_script.py",
        arguments=[
            "--target_column_name",
            target_column_name,
            "--test_dataset",
            test_dataset.as_named_input(test_dataset.name),
        ],
        compute_target=compute_target,
        environment=inference_env,
    )
    run = test_experiment.submit(
        config,
        tags={
            "training_run_id": train_run.id,
            "run_algorithm": train_run.properties["run_algorithm"],
            "valid_score": train_run.properties["score"],
            "primary_metric": train_run.properties["primary_metric"],
        },
    )
    run.log("run_algorithm", run.tags["run_algorithm"])
    return run
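The `arguments` list that `run_rolling_forecast` passes to `ScriptRunConfig` has to line up with the options that forecasting_script.py declares. One way to see that contract locally, without a workspace, is to rebuild the script's parser and feed it an equivalent argument list.

The target name `cnt` and the dataset id string below are placeholders. In a real run, AzureML fills in the named dataset input instead. Judging by the script's use of `Dataset.get_by_id`, that value arrives as a dataset id. `build_parser` is a hypothetical helper, not part of the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Recreate the options declared in forecasting_script.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--target_column_name",
        type=str,
        dest="target_column_name",
        help="Target Column Name",
    )
    parser.add_argument(
        "--test_dataset", type=str, dest="test_dataset", help="Test Dataset"
    )
    return parser


# Same shape as the `arguments` list in run_rolling_forecast (placeholder values).
argv = ["--target_column_name", "cnt", "--test_dataset", "placeholder-dataset-id"]
args = build_parser().parse_args(argv)
print(args.target_column_name, args.test_dataset)
```

Neither option is declared with `required=True`, so omitting one yields `None` rather than an immediate argparse error. The script would only fail later, when it uses the missing value.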
--- a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb
@@ -0,0 +1,847 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "<font color=\"red\" size=\"5\"><strong>!Important!</strong> </br>This notebook is outdated and is not supported by the AutoML Team. Please use the supported version ([link](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-energy-demand/automl-forecasting-task-energy-demand-advanced-mlflow.ipynb)).</font>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Forecasting using the Energy Demand Dataset**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#introduction)\n",
        "1. [Setup](#setup)\n",
        "1. [Data and Forecasting Configurations](#data)\n",
        "1. [Train](#train)\n",
        "1. [Generate and Evaluate the Forecast](#forecast)\n",
        "\n",
        "Advanced Forecasting\n",
        "1. [Advanced Training](#advanced_training)\n",
        "1. [Advanced Results](#advanced_results)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Introduction<a id=\"introduction\"></a>\n",
        "\n",
        "In this example we use the associated New York City energy demand dataset to showcase how you can use AutoML for a simple forecasting problem and explore the results. The goal is to predict the energy demand for the next 48 hours based on historic time-series data.\n",
        "\n",
        "If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the [configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) first, if you haven't already, to establish your connection to the AzureML Workspace.\n",
        "\n",
        "In this notebook you will learn how to:\n",
        "1. Create an Experiment using an existing Workspace\n",
        "1. Configure AutoML using 'AutoMLConfig'\n",
        "1. Train the model using AmlCompute\n",
        "1. Explore the engineered features and results\n",
        "1. Generate the forecast and compute the out-of-sample accuracy metrics\n",
        "1. Configure and remotely run AutoML for a time-series model with lag and rolling window features\n",
        "1. Run and explore the forecast with lag features"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Setup<a id=\"setup\"></a>"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import json\n",
        "import logging\n",
        "\n",
        "from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score\n",
        "from matplotlib import pyplot as plt\n",
        "import pandas as pd\n",
        "import numpy as np\n",
        "import warnings\n",
        "import os\n",
        "\n",
        "# Squash warning messages for cleaner output in the notebook\n",
        "warnings.showwarning = lambda *args, **kwargs: None\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core import Experiment, Workspace, Dataset\n",
        "from azureml.train.automl import AutoMLConfig\n",
        "from datetime import datetime"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This notebook is compatible with Azure ML SDK version 1.35.0 or later."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# choose a name for the run history container in the workspace\n",
        "experiment_name = \"automl-forecasting-energydemand\"\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output[\"Subscription ID\"] = ws.subscription_id\n",
        "output[\"Workspace\"] = ws.name\n",
        "output[\"Resource Group\"] = ws.resource_group\n",
        "output[\"Location\"] = ws.location\n",
        "output[\"Run History Name\"] = experiment_name\n",
        "output[\"SDK Version\"] = azureml.core.VERSION\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Create or Attach existing AmlCompute\n",
        "A compute target is required to execute a remote Automated ML run. \n",
        "\n",
        "[Azure Machine Learning Compute](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) is a managed-compute infrastructure that allows the user to easily create a single or multi-node compute. In this tutorial, you create AmlCompute as your training compute resource.\n",
        "\n",
        "#### Creation of AmlCompute takes approximately 5 minutes. \n",
        "If an AmlCompute cluster with that name already exists in your workspace, this code skips the creation process.\n",
        "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "# Choose a name for your cluster.\n",
        "amlcompute_cluster_name = \"energy-cluster\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
        "    print(\"Found existing cluster, use it.\")\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(\n",
        "        vm_size=\"STANDARD_DS12_V2\", max_nodes=6\n",
        "    )\n",
        "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Data<a id=\"data\"></a>\n",
        "\n",
        "We will use energy consumption [data from New York City](http://mis.nyiso.com/public/P-58Blist.htm) for model training. The data is stored in a tabular format and includes energy demand and basic weather data at an hourly frequency. \n",
        "\n",
        "With Azure Machine Learning datasets you can keep a single copy of data in your storage, easily access data during model training, and share data and collaborate with other users. Below, we upload the dataset and create a [tabular dataset](https://docs.microsoft.com/bs-latn-ba/azure/machine-learning/service/how-to-create-register-datasets#dataset-types) to be used for training and prediction."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's set up what we know about the dataset.\n",
        "\n",
        "<b>Target column</b> is what we want to forecast.<br></br>\n",
        "<b>Time column</b> is the time axis along which to predict.\n",
        "\n",
        "The other columns, \"temp\" and \"precip\", are implicitly designated as features."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "target_column_name = \"demand\"\n",
        "time_column_name = \"timeStamp\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "dataset = Dataset.Tabular.from_delimited_files(\n",
        "    path=\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/nyc_energy.csv\"\n",
        ").with_timestamp_columns(fine_grain_timestamp=time_column_name)\n",
        "dataset.take(5).to_pandas_dataframe().reset_index(drop=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The NYC Energy dataset is missing energy demand values for all datetimes later than August 10th, 2017 5AM. Below, we trim the rows containing these missing values from the end of the dataset."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Cut off the end of the dataset due to large number of nan values\n",
        "dataset = dataset.time_before(datetime(2017, 8, 10, 5))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Split the data into train and test sets"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The first split we make is into train and test sets. Note that we are splitting on time: data up to and including August 8th, 2017 5AM is used for training, and the 48 hours from August 8th, 2017 6AM through August 10th, 2017 5AM, matching the forecast horizon, are used for testing."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# split into train based on time\n",
        "train = (\n",
        "    dataset.time_before(datetime(2017, 8, 8, 5), include_boundary=True)\n",
        "    .to_pandas_dataframe()\n",
        "    .reset_index(drop=True)\n",
        ")\n",
        "train.sort_values(time_column_name).tail(5)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# split into test based on time\n",
        "test = (\n",
        "    dataset.time_between(datetime(2017, 8, 8, 6), datetime(2017, 8, 10, 5))\n",
        "    .to_pandas_dataframe()\n",
        "    .reset_index(drop=True)\n",
        ")\n",
        "test.head(5)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "jupyter": {
          "outputs_hidden": false
        },
        "nteract": {
          "transient": {
            "deleting": false
          }
        }
      },
      "outputs": [],
      "source": [
        "# register the split train and test data in workspace storage\n",
        "from azureml.data.dataset_factory import TabularDatasetFactory\n",
        "\n",
        "datastore = ws.get_default_datastore()\n",
        "train_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    train, target=(datastore, \"dataset/\"), name=\"nyc_energy_train\"\n",
        ")\n",
        "test_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
        "    test, target=(datastore, \"dataset/\"), name=\"nyc_energy_test\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Setting the maximum forecast horizon\n",
        "\n",
        "The forecast horizon is the number of periods into the future that the model should predict. It is generally recommended that users set forecast horizons to less than 100 time periods (i.e. less than 100 hours in the NYC energy example). Furthermore, **AutoML's memory use and computation time increase in proportion to the length of the horizon**, so consider carefully how this value is set. If a long horizon forecast really is necessary, consider aggregating the series to a coarser time scale.\n",
        "\n",
        "Learn more about forecast horizons in our [Auto-train a time-series forecast model](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-auto-train-forecast#configure-and-run-experiment) guide.\n",
        "\n",
        "In this example, we set the horizon to 48 hours."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "forecast_horizon = 48"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Forecasting Parameters\n",
        "To define forecasting parameters for your experiment training, you can use the ForecastingParameters class. The table below details the forecasting parameters we will pass into our experiment.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**time_column_name**|The name of your time column.|\n",
        "|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n",
        "|**freq**|Forecast frequency. This optional parameter represents the period with which the forecast is desired, for example, daily, weekly, yearly, etc. Use this parameter for the correction of time series containing irregular data points or for padding of short time series. The frequency needs to be a pandas offset alias. Please refer to [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) for more information.|\n",
        "|**cv_step_size**|Number of periods between two consecutive cross-validation folds. The default value is \"auto\", in which case AutoML determines the cross-validation step size automatically if a validation set is not provided. Alternatively, you can specify an integer value.|"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Train<a id=\"train\"></a>\n",
        "\n",
        "Instantiate an AutoMLConfig object. This config defines the settings and data used to run the experiment. For this forecasting task, the forecasting-specific settings are passed in through the `forecasting_parameters` argument.\n",
        "\n",
        "|Property|Description|\n",
        "|-|-|\n",
        "|**task**|forecasting|\n",
        "|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>|\n",
        "|**blocked_models**|Models in blocked_models won't be used by AutoML. The full list of supported models is available [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n",
        "|**experiment_timeout_hours**|Maximum amount of time in hours that the experiment can take before it terminates.|\n",
        "|**training_data**|The training data to be used within the experiment.|\n",
        "|**label_column_name**|The name of the label column.|\n",
        "|**compute_target**|The remote compute for training.|\n",
        "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection. The default value is \"auto\", in which case AutoML determines the number of cross-validations automatically if a validation set is not provided. Alternatively, you can specify an integer value.|\n",
        "|**enable_early_stopping**|Flag to enable early termination if the score is not improving in the short term.|\n",
        "|**forecasting_parameters**|An object that holds all the forecasting-related parameters.|\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blocked_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
        "\n",
        "forecasting_parameters = ForecastingParameters(\n",
        "    time_column_name=time_column_name,\n",
        "    forecast_horizon=forecast_horizon,\n",
        "    freq=\"H\",  # Set the forecast frequency to be hourly\n",
        "    cv_step_size=\"auto\",\n",
        ")\n",
        "\n",
        "automl_config = AutoMLConfig(\n",
        "    task=\"forecasting\",\n",
        "    primary_metric=\"normalized_root_mean_squared_error\",\n",
        "    blocked_models=[\"ExtremeRandomTrees\", \"AutoArima\", \"Prophet\"],\n",
        "    experiment_timeout_hours=0.3,\n",
        "    training_data=train_dataset,\n",
        "    label_column_name=target_column_name,\n",
        "    compute_target=compute_target,\n",
        "    enable_early_stopping=True,\n",
        "    n_cross_validations=\"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
        "    verbosity=logging.INFO,\n",
        "    forecasting_parameters=forecasting_parameters,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Call the `submit` method on the experiment object and pass the run configuration. Depending on the data and the number of iterations, this can run for a while.\n",
        "You can specify `show_output=True` to print currently running iterations to the console."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run = experiment.submit(automl_config, show_output=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "remote_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Retrieve the Best Run details\n",
        "Below we retrieve the best Run object from among all the runs in the experiment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run = remote_run.get_best_child()\n",
        "best_run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Featurization\n",
        "We can look at the engineered feature names generated in time-series featurization via the JSON file named 'engineered_feature_names.json' under the run outputs."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Download the JSON file locally\n",
        "best_run.download_file(\n",
        "    \"outputs/engineered_feature_names.json\", \"engineered_feature_names.json\"\n",
        ")\n",
        "with open(\"engineered_feature_names.json\", \"r\") as f:\n",
        "    records = json.load(f)\n",
        "\n",
        "records"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### View featurization summary\n",
        "You can also see what featurization steps were performed on different raw features in the user data. For each raw feature in the user data, the following information is displayed:\n",
        "\n",
        "+ Raw feature name\n",
        "+ Number of engineered features formed out of this raw feature\n",
        "+ Type detected\n",
        "+ Whether the feature was dropped\n",
        "+ List of feature transformations for the raw feature"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Download the featurization summary JSON file locally\n",
        "best_run.download_file(\n",
        "    \"outputs/featurization_summary.json\", \"featurization_summary.json\"\n",
        ")\n",
        "\n",
        "# Render the JSON as a pandas DataFrame\n",
        "with open(\"featurization_summary.json\", \"r\") as f:\n",
        "    records = json.load(f)\n",
        "fs = pd.DataFrame.from_records(records)\n",
        "\n",
        "# View a summary of the featurization\n",
        "fs[\n",
        "    [\n",
        "        \"RawFeatureName\",\n",
        "        \"TypeDetected\",\n",
        "        \"Dropped\",\n",
        "        \"EngineeredFeatureCount\",\n",
        "        \"Transformations\",\n",
        "    ]\n",
        "]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Forecasting<a id=\"forecast\"></a>\n",
        "\n",
        "Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. We will do batch scoring on the test dataset, which should have the same schema as the training dataset.\n",
        "\n",
        "The inference will run on a remote compute. In this example, it will re-use the training compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_experiment = Experiment(ws, experiment_name + \"_inference\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieving forecasts from the model\n",
        "We have created a function called `run_remote_inference`, in the `run_forecast` module, that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script, `forecasting_script`, which is uploaded to and executed on the remote compute."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from run_forecast import run_remote_inference\n",
        "\n",
        "remote_run_infer = run_remote_inference(\n",
        "    test_experiment=test_experiment,\n",
        "    compute_target=compute_target,\n",
        "    train_run=best_run,\n",
        "    test_dataset=test_dataset,\n",
        "    target_column_name=target_column_name,\n",
        ")\n",
        "remote_run_infer.wait_for_completion(show_output=False)\n",
        "\n",
        "# download the inference output file to the local machine\n",
        "remote_run_infer.download_file(\"outputs/predictions.csv\", \"predictions.csv\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Evaluate\n",
        "To evaluate the accuracy of the forecast, we compare the predictions against the actual energy demand values using a selection of metrics, including the mean absolute percentage error (MAPE). For more metrics that can be used for evaluation after training, please see [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics), and [how to calculate residuals](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# load forecast data frame\n",
        "fcst_df = pd.read_csv(\"predictions.csv\", parse_dates=[time_column_name])\n",
        "fcst_df.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.shared import constants\n",
        "from azureml.automl.runtime.shared.score import scoring\n",
        "from matplotlib import pyplot as plt\n",
        "\n",
        "# use automl metrics module\n",
        "scores = scoring.score_regression(\n",
        "    y_test=fcst_df[target_column_name],\n",
        "    y_pred=fcst_df[\"predicted\"],\n",
        "    metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n",
        ")\n",
        "\n",
        "print(\"[Test data scores]\\n\")\n",
        "for key, value in scores.items():\n",
        "    print(\"{}:   {:.3f}\".format(key, value))\n",
        "\n",
        "# Plot outputs\n",
        "%matplotlib inline\n",
        "test_pred = plt.scatter(fcst_df[target_column_name], fcst_df[\"predicted\"], color=\"b\")\n",
        "test_test = plt.scatter(\n",
        "    fcst_df[target_column_name], fcst_df[target_column_name], color=\"g\"\n",
        ")\n",
        "plt.legend(\n",
        "    (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n",
        ")\n",
        "plt.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Advanced Training <a id=\"advanced_training\"></a>\n",
        "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, time series identifier columns and any additional features. This often works well, because common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Using lags and rolling window features\n",
        "Now we will configure the target lags, that is, the previous values of the target variable, which means the prediction is no longer horizon-less. We therefore must specify the `forecast_horizon` that the model will learn to forecast. The `target_lags` keyword specifies how far back we will construct the lags of the target variable, and the `target_rolling_window_size` specifies the size of the rolling window over which we will generate the `max`, `min` and `sum` features.\n",
        "\n",
        "This notebook uses the blocked_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blocked_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "advanced_forecasting_parameters = ForecastingParameters(\n",
        "    time_column_name=time_column_name,\n",
        "    forecast_horizon=forecast_horizon,\n",
        "    target_lags=12,\n",
        "    target_rolling_window_size=4,\n",
        "    cv_step_size=\"auto\",\n",
        ")\n",
        "\n",
        "automl_config = AutoMLConfig(\n",
        "    task=\"forecasting\",\n",
        "    primary_metric=\"normalized_root_mean_squared_error\",\n",
        "    blocked_models=[\n",
        "        \"ElasticNet\",\n",
        "        \"ExtremeRandomTrees\",\n",
        "        \"GradientBoosting\",\n",
        "        \"XGBoostRegressor\",\n",
        "        \"AutoArima\",\n",
        "        \"Prophet\",\n",
        "    ],  # These models are blocked for tutorial purposes, remove this for real use cases.\n",
        "    experiment_timeout_hours=0.3,\n",
        "    training_data=train_dataset,\n",
        "    label_column_name=target_column_name,\n",
        "    compute_target=compute_target,\n",
        "    enable_early_stopping=True,\n",
        "    n_cross_validations=\"auto\",  # Feel free to set to a small integer (>=2) if runtime is an issue.\n",
        "    verbosity=logging.INFO,\n",
        "    forecasting_parameters=advanced_forecasting_parameters,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now start a new remote run, this time with lag and rolling window featurization. AutoML applies featurizations in the setup stage, prior to iterating over ML models. The full training set is featurized first, followed by featurization of each of the CV splits. Lag and rolling window features introduce additional complexity, so the run will take longer than in the previous example that lacked these featurizations."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "advanced_remote_run = experiment.submit(automl_config, show_output=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "advanced_remote_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieve the Best Run details"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run_lags = advanced_remote_run.get_best_child()\n",
        "best_run_lags"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Advanced Results<a id=\"advanced_results\"></a>\n",
        "We now use the best model from the run with lag and rolling window features to forecast the test set on the remote compute, as before, and evaluate it with the same metrics so that it can be compared with the first model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_experiment_advanced = Experiment(ws, experiment_name + \"_inference_advanced\")\n",
        "advanced_remote_run_infer = run_remote_inference(\n",
        "    test_experiment=test_experiment_advanced,\n",
        "    compute_target=compute_target,\n",
        "    train_run=best_run_lags,\n",
        "    test_dataset=test_dataset,\n",
        "    target_column_name=target_column_name,\n",
        "    inference_folder=\"./forecast_advanced\",\n",
        ")\n",
        "advanced_remote_run_infer.wait_for_completion(show_output=False)\n",
        "\n",
        "# download the inference output file to the local machine\n",
        "advanced_remote_run_infer.download_file(\n",
        "    \"outputs/predictions.csv\", \"predictions_advanced.csv\"\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "fcst_adv_df = pd.read_csv(\"predictions_advanced.csv\", parse_dates=[time_column_name])\n",
        "fcst_adv_df.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.automl.core.shared import constants\n",
        "from azureml.automl.runtime.shared.score import scoring\n",
        "from matplotlib import pyplot as plt\n",
        "\n",
        "# use automl metrics module\n",
        "scores = scoring.score_regression(\n",
        "    y_test=fcst_adv_df[target_column_name],\n",
        "    y_pred=fcst_adv_df[\"predicted\"],\n",
        "    metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n",
        ")\n",
        "\n",
        "print(\"[Test data scores]\\n\")\n",
        "for key, value in scores.items():\n",
        "    print(\"{}:   {:.3f}\".format(key, value))\n",
        "\n",
        "# Plot outputs\n",
        "%matplotlib inline\n",
        "test_pred = plt.scatter(\n",
        "    fcst_adv_df[target_column_name], fcst_adv_df[\"predicted\"], color=\"b\"\n",
        ")\n",
        "test_test = plt.scatter(\n",
        "    fcst_adv_df[target_column_name], fcst_adv_df[target_column_name], color=\"g\"\n",
        ")\n",
        "plt.legend(\n",
        "    (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n",
        ")\n",
        "plt.show()"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "jialiu"
      }
    ],
    "categories": [
      "how-to-use-azureml",
      "automated-machine-learning"
    ],
    "kernel_info": {
      "name": "python3"
    },
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.10"
    },
    "microsoft": {
      "ms_spell_check": {
        "ms_spell_check_language": "en"
      }
    },
    "nteract": {
      "version": "nteract-front-end@1.0.0"
    },
    "vscode": {
      "interpreter": {
        "hash": "6bd77c88278e012ef31757c15997a7bea8c943977c43d6909403c00ae11d43ca"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
 }
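The notebook's Advanced Results cell contrasts horizon-less regression with lag-based models. A small pandas sketch can illustrate that difference. It is not AutoML's featurization: the daily series, the lag orders 1 to 3, and the forecast origin are all hypothetical.

- Calendar features are derived from the timestamp, so they exist for every future date.
- A lag of order k is known only up to k steps past the last observed value.
- Lags also leave NaNs at the very start of the series.

```python
import numpy as np
import pandas as pd

# Hypothetical daily demand series with synthetic values (illustration only).
idx = pd.date_range("2017-01-01", periods=20, freq="D")
demand = pd.Series(np.arange(20, dtype=float), index=idx, name="demand")

# Horizon-less features: derived from the timestamp alone, so they can be
# computed for any future date without knowing past demand.
calendar = pd.DataFrame({"dayofweek": idx.dayofweek, "month": idx.month}, index=idx)

# Lag features: shifted copies of the target. The first k rows of lag_k are
# NaN because there is no earlier history (the "edges of time").
lag_orders = (1, 2, 3)
lags = pd.DataFrame({f"lag_{k}": demand.shift(k) for k in lag_orders})

# Hypothetical forecast origin: demand is observed for the first 12 days only,
# and we want to predict the remaining 8 days.
origin = 12
observed = demand.copy()
observed.iloc[origin:] = np.nan
future_lags = pd.DataFrame(
    {f"lag_{k}": observed.shift(k) for k in lag_orders}
).iloc[origin:]

# Number of unknown lag values at each forecast step (horizon 1, 2, ...).
missing_per_step = future_lags.isna().sum(axis=1).tolist()
print("missing lag values per horizon step:", missing_per_step)
print(
    "missing calendar values in the forecast window:",
    int(calendar.iloc[origin:].isna().sum().sum()),
)
```

With lag orders up to 3, every lag is observed at horizon step 1, and none is observed from step 4 onward. The calendar features remain complete throughout. This is why a model that uses lags, unlike the horizon-less regression, is sensitive to how far ahead it forecasts.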
--- a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/forecasting_script.py
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/forecasting_script.py
@@ -0,0 +1,61 @@
 """
 This is the script that is executed on the compute instance. It relies
 on the model.pkl file which is uploaded along with this script to the
 compute instance.
 """
 import argparse
 from azureml.core import Dataset, Run
 import joblib
 from pandas.tseries.frequencies import to_offset
 parser = argparse.ArgumentParser()
 parser.add_argument(
    "--target_column_name",
    type=str,
    dest="target_column_name",
    help="Target Column Name",
 )
 parser.add_argument(
    "--test_dataset", type=str, dest="test_dataset", help="Test Dataset"
 )
 args = parser.parse_args()
 target_column_name = args.target_column_name
 test_dataset_id = args.test_dataset
 run = Run.get_context()
 ws = run.experiment.workspace
 # get the input dataset by id
 test_dataset = Dataset.get_by_id(ws, id=test_dataset_id)
 X_test = test_dataset.to_pandas_dataframe().reset_index(drop=True)
 y_test = X_test.pop(target_column_name).values
 # generate forecast
 fitted_model = joblib.load("model.pkl")
 # Default quantiles: 0.025 and 0.975 bound a 95% prediction interval; 0.5 is the median forecast
 quantiles = [0.025, 0.5, 0.975]
 predicted_column_name = "predicted"
 PI = "prediction_interval"
 fitted_model.quantiles = quantiles
 pred_quantiles = fitted_model.forecast_quantiles(X_test)
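 # Format the outer quantiles as "[lower, upper]"; .iloc gives positional access,
 # since the row labels are the quantile values rather than 0 and 1
 pred_quantiles[PI] = pred_quantiles[[min(quantiles), max(quantiles)]].apply(
    lambda x: "[{}, {}]".format(x.iloc[0], x.iloc[1]), axis=1
 )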
 X_test[target_column_name] = y_test
 X_test[PI] = pred_quantiles[PI]
 X_test[predicted_column_name] = pred_quantiles[0.5]
 # drop rows where the prediction or the actual value is NaN; this happens
 # when actuals are missing, or at the edges of the time range where
 # lag/rolling-window features cannot be computed
 clean = X_test[
    X_test[[target_column_name, predicted_column_name]].notnull().all(axis=1)
 ]
 file_name = "outputs/predictions.csv"
 clean.to_csv(file_name, header=True, index=False)  # write rows without the DataFrame index
 # Upload the predictions into artifacts
 run.upload_file(name=file_name, path_or_stream=file_name)
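The script's post-processing can be sketched with pandas alone. It does three things:

- formats the outer quantiles as a prediction interval,
- takes the 0.5 quantile as the point forecast,
- drops rows with a missing actual or prediction.

This is a hedged, self-contained sketch rather than the script itself. The small `pred_quantiles` frame is a hypothetical stand-in for the output of `forecast_quantiles()`. Its float column labels are an assumption, based on how the script indexes `pred_quantiles[0.5]`. The coverage check at the end is added for illustration and is not part of the script.

```python
import numpy as np
import pandas as pd

quantiles = [0.025, 0.5, 0.975]
lower, upper = min(quantiles), max(quantiles)

# Hypothetical stand-in for fitted_model.forecast_quantiles(X_test):
# one column per quantile, labelled by the quantile value.
pred_quantiles = pd.DataFrame(
    {
        0.025: [90.0, 95.0, np.nan, 80.0],
        0.5: [100.0, 104.0, np.nan, 88.0],
        0.975: [110.0, 113.0, np.nan, 96.0],
    }
)
# Hypothetical actuals; NaN marks a missing observation.
y_test = np.array([102.0, np.nan, 99.0, 97.0])


def format_interval(row: pd.Series) -> str:
    # Positional access (.iloc) avoids treating 0 and 1 as column labels.
    return "[{}, {}]".format(row.iloc[0], row.iloc[1])


result = pd.DataFrame({"actual": y_test})
result["prediction_interval"] = pred_quantiles[[lower, upper]].apply(
    format_interval, axis=1
)
result["predicted"] = pred_quantiles[0.5]

# Keep rows where both the actual and the median forecast are present.
clean = result[result[["actual", "predicted"]].notnull().all(axis=1)]

# Empirical coverage: share of kept actuals inside [lower, upper]. For a
# well-calibrated 95% interval this approaches 0.95 on large test sets.
bounds = pred_quantiles.loc[clean.index, [lower, upper]]
inside = (clean["actual"] >= bounds[lower]) & (clean["actual"] <= bounds[upper])
print(clean)
print("coverage:", inside.mean())
```

In this toy data, two rows survive the filter. One actual (97.0) falls outside its interval, so the coverage is 0.5. A coverage far from 0.95 on a realistic test set would suggest that the intervals are miscalibrated.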
Author	SHA1	Message	Date
jeff-shepherd	f1aff553c4	Merge pull request #1980 from Man-MSFT/mafong/fairness-dep Remove fairness notebooks	2025-03-14 09:42:02 -07:00
Man Fong	d195a673e2	Remove fairness notebooks	2025-03-13 14:25:59 -07:00
jeff-shepherd	8dce0fa6fe	Merge pull request #1977 from Azure/jeffshep/windowsonnx Pin onnx on Windows	2024-12-16 08:44:42 -08:00
Jeff Shepherd	4e8a240a71	Pin onnx on Windows	2024-12-13 15:51:10 -08:00
jeff-shepherd	5b019e28de	Merge pull request #1976 from Azure/release_update_stablev2/Release-247 update samples from Release-247 as a part of 1.59.0 SDK stable release	2024-12-13 08:50:52 -08:00
amlrelsa-ms	bf4cb1e86c	update samples from Release-247 as a part of 1.59.0 SDK stable release	2024-12-10 17:34:41 +00:00
jeff-shepherd	eaa7c56590	Merge pull request #1974 from Azure/jeffshep/post158sync Remove deprecated sample notebooks	2024-11-04 09:20:56 -08:00
Jeff Shepherd	8fc0fa040d	Remove deprecated sample notebooks	2024-11-01 11:49:20 -07:00
jeff-shepherd	56e13b0b9a	Merge pull request #1972 from Azure/release_update_stablev2/Release-243 update samples from Release-243 as a part of 1.58.0 SDK stable release	2024-10-21 09:03:36 -07:00
amlrelsa-ms	785fe3c962	update samples from Release-243 as a part of 1.58.0 SDK stable release	2024-10-16 17:50:12 +00:00
jeff-shepherd	3c341f6e9a	Merge pull request #1968 from Azure/release_update_stablev2/Release-240 update samples from Release-240 as a part of 1.57.0 SDK stable release	2024-08-08 08:36:05 -07:00
amlrelsa-ms	aae88e87ea	update samples from Release-240 as a part of 1.57.0 SDK stable release	2024-08-05 21:57:46 +00:00
jeff-shepherd	2352e458c7	Merge pull request #1963 from Azure/release_update_stablev2/Release-209 update samples from Release-209 as a part of 1.56.0 SDK stable release	2024-05-16 09:15:57 -07:00
amlrelsa-ms	8373b93887	update samples from Release-209 as a part of 1.56.0 SDK stable release	2024-04-29 18:42:13 +00:00
jeff-shepherd	f0442166cd	Updated curated environments in sample notebooks (#1958 ) * Updated curated environments in sample notebooks * Fixed continuous retraining notebook	2024-02-15 13:01:44 -05:00
jeff-shepherd	33ca8c7933	Merge pull request #1957 from Azure/release_update_stablev2/Release-207 update samples from Release-207 as a part of 1.55.0 SDK stable release	2024-02-07 08:48:02 -08:00
amlrelsa-ms	3fd1ce8993	update samples from Release-207 as a part of 1.55.0 SDK stable release	2024-02-06 19:58:35 +00:00
jeff-shepherd	aa93588190	Merge pull request #1954 from Azure/jeffshep/pinpy38 Temporarily pin back to Python 3.8	2023-12-07 11:03:20 -08:00
Jeff Shepherd	12520400e5	Temporarily pin back to Python 3.8	2023-12-06 13:24:28 -08:00
jeff-shepherd	35614e83fa	Merge pull request #1951 from Azure/release_update_stablev2/Release-200 update samples from Release-200 as a part of 1.54.0 SDK stable release	2023-11-22 18:24:05 -08:00
amlrelsa-ms	ff22ac01cc	update samples from Release-200 as a part of 1.54.0 SDK stable release	2023-11-21 17:51:12 +00:00
jeff-shepherd	e7dd826f34	Merge pull request #1946 from Azure/jeffshep/pinscikit-learn Pin scikit-learn to avoid conflict with azureml-responsibleai	2023-10-23 14:57:13 -07:00
Jeff Shepherd	fcc882174b	Pin scikit-learn to avoid conflict with azureml-responsibleai	2023-10-23 09:53:39 -07:00
jeff-shepherd	6872d8a3bb	Merge pull request #1941 from Azure/jeffshep/updatefor1.53.2 Updated automl_env.yml for Azure ML SDK 1.53.2	2023-10-10 08:49:04 -07:00
Jeff Shepherd	a2cb4c3589	Updated fbprophet to prophet	2023-10-10 08:47:09 -07:00
Jeff Shepherd	15008962b2	Updated automl_env.yml for Azure ML SDK 1.53.2	2023-10-05 19:29:26 -07:00
jeff-shepherd	9414b51fac	Merge pull request #1937 from Azure/jeffshep/fixwindows153 Fixed Windows automl_setup for 1.53.0	2023-08-31 21:56:12 -07:00
Jeff Shepherd	80ac414582	Fixed Windows automl_setup for 1.53.0	2023-08-31 16:54:20 -07:00
jeff-shepherd	cbc151660b	Merge pull request #1936 from Azure/jeffshep/fixtabulardataset Fixed tabular-dataset-partition-per-column.ipynb	2023-08-25 15:34:08 -07:00
Jeff Shepherd	0024abc6e3	Fixed tabular-dataset-partition-per-column.ipynb and removed deploy-to-cloud/model-register-and-deploy.ipynb	2023-08-25 13:52:29 -07:00
jeff-shepherd	fa13385860	Merge pull request #1935 from Azure/release_update_stablev2/Release-193 update samples from Release-193 as a part of 1.53.0 SDK stable release	2023-08-23 11:41:24 -07:00
Jeff Shepherd	0c5f6daf52	Fixed readme syntax	2023-08-23 11:37:30 -07:00
Jeff Shepherd	c11e9fc1da	Fixed readme syntax	2023-08-23 11:36:17 -07:00
Jeff Shepherd	280150713e	Restored V2 message	2023-08-23 10:20:25 -07:00
amlrelsa-ms	bb11c80b1b	update samples from Release-193 as a part of 1.53.0 SDK stable release	2023-08-23 03:24:03 +00:00
Diondra Peck	d0961b98bf	Add disclaimer to README	2023-06-28 15:47:49 -07:00
Paul Shealy	302589b7f9	Merge pull request #1915 from Azure/release_update_stablev2/Release-171 Release update stablev2/release 171 for SDK 1.51.0	2023-06-07 19:19:33 -07:00
amlrelsa-ms	cc85949d6d	update samples from Release-171 as a part of 1.51 SDK stable release	2023-06-06 21:58:24 +05:30
amlrelsa-sa	3a1824e3ad	update samples from Release-170 as a part of 1.51 SDK stable release	2023-06-06 10:50:33 +05:30
Paul Shealy	579643326d	Merge pull request #1911 from diondrapeck/add-deprecation-disclaimer Add repository deprecation disclaimer and pointer to v2 repo	2023-05-25 08:04:29 -07:00
Diondra Peck	14f76f227e	Add deprecation disclaimer	2023-05-23 12:48:14 -07:00
Paul Shealy	25baf5203a	Merge pull request #1899 from Azure/release_update/Release-177 update samples from Release-177 as a part of SDK release	2023-04-17 13:01:27 -07:00
amlrelsa-ms	1178fcb0ba	update samples from Release-177 as a part of SDK release	2023-04-17 10:22:59 +00:00
Sasidhar Kasturi	e4d84c8e45	update samples from Release-169 as a part of 1.50.0 SDK stable release (#1898 ) Co-authored-by: amlrelsa-ms <amlrelsa@microsoft.com>	2023-04-14 10:39:38 -04:00
Harneet Virk	7a3ab1e44c	Merge pull request #1895 from Azure/release_update/Release-175 update samples from Release-175 as a part of SDK release	2023-03-28 10:17:27 -07:00
amlrelsa-ms	598a293dfa	update samples from Release-175 as a part of SDK release	2023-03-28 01:02:26 +00:00
Harneet Virk	40b3068462	Merge pull request #1884 from Azure/release_update_stablev2/Release-166 update samples from Release-166 as a part of 1.49.0 SDK stable release	2023-02-13 21:22:05 -08:00
amlrelsa-ms	0ecbbbce75	update samples from Release-166 as a part of 1.49.0 SDK stable release	2023-02-14 02:46:24 +00:00
Harneet Virk	9b1e130d18	Merge pull request #1867 from Azure/release_update/Release-173 update samples from Release-173 as a part of SDK release	2022-12-19 19:37:41 -08:00
amlrelsa-ms	0e17b33d2a	update samples from Release-173 as a part of SDK release	2022-12-20 03:35:58 +00:00
Harneet Virk	34d80abd26	Merge pull request #1864 from Azure/release_update/Release-172 update samples from Release-172 as a part of SDK release	2022-12-16 09:28:16 -08:00
amlrelsa-ms	249278ab77	update samples from Release-172 as a part of SDK release	2022-12-15 17:32:05 +00:00
Harneet Virk	25fdb17f80	Merge pull request #1862 from Azure/release_update/Release-170 update samples from Release-170 as a part of SDK release	2022-12-06 10:06:06 -08:00
amlrelsa-ms	3a02a27f1e	update samples from Release-170 as a part of SDK release	2022-12-06 03:22:18 +00:00
Harneet Virk	4eed9d529f	Merge pull request #1861 from Azure/release_update/Release-169 update samples from Release-169 as a part of SDK release	2022-12-05 12:33:52 -08:00
amlrelsa-ms	f344d410a2	update samples from Release-169 as a part of SDK release	2022-12-05 20:12:47 +00:00
Harneet Virk	9dc1228063	Merge pull request #1860 from Azure/release_update/Release-168 update samples from Release-168 as a part of SDK release	2022-12-05 09:54:01 -08:00
amlrelsa-ms	4404e62f58	update samples from Release-168 as a part of SDK release	2022-12-05 17:52:07 +00:00
Harneet Virk	38d5743bbb	Merge pull request #1852 from Azure/release_update/Release-167 update samples from Release-167 as a part of SDK release	2022-11-08 11:01:10 -08:00
amlrelsa-ms	0814eee151	update samples from Release-167 as a part of SDK release	2022-11-08 01:17:48 +00:00
Harneet Virk	f45b815221	Merge pull request #1848 from Azure/release_update/Release-166 update samples from Release-166 as a part of SDK release	2022-10-26 12:04:10 -07:00
amlrelsa-ms	bd629ae454	update samples from Release-166 as a part of SDK release	2022-10-26 18:46:34 +00:00
Harneet Virk	41de75a584	Merge pull request #1846 from Azure/release_update_stablev2/Release-156 update samples from Release-156 as a part of 1.47.0 SDK stable release	2022-10-25 21:01:03 -07:00
amlrelsa-ms	96a426dc36	update samples from Release-156 as a part of 1.47.0 SDK stable release	2022-10-25 21:28:24 +00:00
Harneet Virk	824dd40f7e	Merge pull request #1836 from Azure/release_update/Release-165 update samples from Release-165 as a part of SDK release	2022-10-11 13:07:26 -07:00
amlrelsa-ms	fa2e649fe8	update samples from Release-165 as a part of SDK release	2022-10-11 19:33:50 +00:00
Harneet Virk	e25e8e3a41	Merge pull request #1832 from Azure/release_update/Release-164 update samples from Release-164 as a part of SDK release	2022-10-05 11:29:47 -07:00
amlrelsa-ms	aa3670a902	update samples from Release-164 as a part of SDK release	2022-10-05 17:31:10 +00:00
Harneet Virk	ef1f9205ac	Merge pull request #1831 from Azure/release_update_stablev2/Release-153 update samples from Release-153 as a part of 1.46.0 SDK stable release	2022-10-04 15:04:25 -07:00
amlrelsa-ms	3228bbfc63	update samples from Release-153 as a part of 1.46.0 SDK stable release	2022-09-30 17:30:23 +00:00
Harneet Virk	f18a0dfc4d	Merge pull request #1825 from Azure/release_update/Release-163 update samples from Release-163 as a part of SDK release	2022-09-20 14:12:22 -07:00
amlrelsa-ms	badb620261	update samples from Release-163 as a part of SDK release	2022-09-20 21:11:25 +00:00
Harneet Virk	acf46100ae	Merge pull request #1817 from Azure/release_update/Release-161 update samples from Release-161 as a part of SDK release	2022-09-16 15:54:11 -07:00
amlrelsa-ms	cf2e3804d5	update samples from Release-161 as a part of SDK release	2022-09-16 20:16:37 +00:00
Harneet Virk	b7be42357f	Merge pull request #1814 from Azure/release_update/Release-160 update samples from Release-160 as a part of SDK release	2022-09-12 18:57:44 -07:00
amlrelsa-ms	3ac82c07ae	update samples from Release-160 as a part of SDK release	2022-09-13 01:24:40 +00:00
Harneet Virk	9743c0a1fa	Merge pull request #1755 from Azure/users/GitHubPolicyService/11f57c70-4141-4c68-9224-aceb8eab1c48 Adding Microsoft SECURITY.MD	2022-09-06 16:52:36 -07:00
Harneet Virk	ba4dac530e	Merge pull request #1808 from Azure/release_update/Release-157 update samples from Release-157 as a part of SDK release	2022-09-06 16:33:03 -07:00
amlrelsa-ms	7f7f0040fd	update samples from Release-157 as a part of SDK release	2022-09-06 23:16:24 +00:00
Harneet Virk	9ca567cd9c	Merge pull request #1802 from Azure/release_update/Release-156 update samples from Release-156 as a part of SDK release	2022-08-18 17:23:55 -07:00
amlrelsa-ms	ae7b234ba0	update samples from Release-156 as a part of SDK release	2022-08-18 23:57:09 +00:00
Harneet Virk	9788d1965f	Merge pull request #1799 from Azure/release_update/Release-155 update samples from Release-155 as a part of SDK release	2022-08-12 14:18:11 -07:00
amlrelsa-ms	387e43a423	update samples from Release-155 as a part of SDK release	2022-08-12 20:38:16 +00:00
Harneet Virk	25f407fc81	Merge pull request #1796 from Azure/release_update/Release-154 update samples from Release-154 as a part of SDK release	2022-08-10 11:36:05 -07:00
amlrelsa-ms	dcb2c4638f	update samples from Release-154 as a part of SDK release	2022-08-10 18:10:45 +00:00
Harneet Virk	7fb5dd3ef9	Merge pull request #1795 from Azure/release_update/Release-153 update samples from Release-153 as a part of SDK release	2022-08-09 15:39:30 -07:00
amlrelsa-ms	6a38f4bec3	update samples from Release-153 as a part of SDK release	2022-08-09 21:50:34 +00:00
Harneet Virk	aed078aeab	Merge pull request #1793 from Azure/release_update/Release-152 update samples from Release-152 as a part of SDK release	2022-08-08 11:51:52 -07:00
amlrelsa-ms	f999f41ed3	update samples from Release-152 as a part of SDK release	2022-08-08 17:27:37 +00:00
Harneet Virk	07e43ee7e4	Merge pull request #1791 from Azure/release_update/Release-151 update samples from Release-151 as a part of SDK release	2022-08-05 13:12:57 -07:00
amlrelsa-ms	aac706c3f0	update samples from Release-151 as a part of SDK release	2022-08-05 20:01:34 +00:00
Harneet Virk	4ccb278051	Merge pull request #1789 from Azure/release_update/Release-150 update samples from Release-150 as a part of SDK release	2022-08-04 12:08:14 -07:00
amlrelsa-ms	64a733480b	update samples from Release-150 as a part of SDK release	2022-08-03 16:29:31 +00:00
Harneet Virk	dd0976f678	Merge pull request #1779 from Azure/release_update/Release-149 update samples from Release-149 as a part of SDK release	2022-07-07 08:37:35 -07:00
amlrelsa-ms	15a3ca649d	update samples from Release-149 as a part of SDK release	2022-07-07 00:18:42 +00:00
Harneet Virk	3c4770cfe5	Merge pull request #1776 from Azure/release_update/Release-148 update samples from Release-148 as a part of SDK release	2022-07-01 13:41:03 -07:00
amlrelsa-ms	8d7de05908	update samples from Release-148 as a part of SDK release	2022-07-01 20:40:11 +00:00
Harneet Virk	863faae57f	Merge pull request #1772 from Azure/release_update/Release-147 Update samples from Release-147 as a part of SDK release 1.43	2022-06-27 10:32:58 -07:00
amlrelsa-ms	8d3f5adcdb	update samples from Release-147 as a part of SDK release	2022-06-27 17:29:38 +00:00
Harneet Virk	cd3394e129	Merge pull request #1771 from Azure/release_update/Release-146 update samples from Release-146 as a part of SDK release	2022-06-20 14:31:06 -07:00
amlrelsa-ms	ee5d0239a3	update samples from Release-146 as a part of SDK release	2022-06-20 20:45:50 +00:00
Harneet Virk	388111cedc	Merge pull request #1763 from Azure/release_update/Release-144 update samples from Release-144 as a part of SDK release	2022-06-03 11:04:13 -07:00
amlrelsa-ms	b86191ed7f	update samples from Release-144 as a part of SDK release	2022-06-03 17:28:37 +00:00
Harneet Virk	22753486de	Merge pull request #1762 from Azure/release_update/Release-143 update samples from Release-143 as a part of SDK release	2022-06-01 11:29:19 -07:00
amlrelsa-ms	cf1d1dbf01	update samples from Release-143 as a part of SDK release	2022-06-01 17:26:59 +00:00
Harneet Virk	2e45d9800d	Merge pull request #1758 from Azure/release_update/Release-142 update samples from Release-142 as a part of SDK release	2022-05-27 15:44:52 -07:00
amlrelsa-ms	a9a8de02ec	update samples from Release-142 as a part of SDK release	2022-05-27 18:58:51 +00:00
microsoft-github-policy-service[bot]	e0c9376aab	Microsoft mandatory file	2022-05-25 17:12:16 +00:00
Harneet Virk	dd8339e650	Merge pull request #1754 from Azure/release_update/Release-141 update samples from Release-141 as a part of SDK release	2022-05-25 10:12:10 -07:00
amlrelsa-ms	1594ee64a1	update samples from Release-141 as a part of SDK release	2022-05-25 16:56:26 +00:00
Harneet Virk	83ed8222d2	Merge pull request #1750 from Azure/release_update/Release-140 update samples from Release-140 as a part of SDK release	2022-05-04 16:16:28 -07:00
amlrelsa-ms	b0aa91acce	update samples from Release-140 as a part of SDK release	2022-05-04 23:01:56 +00:00
Harneet Virk	5928ba83bb	Merge pull request #1748 from Azure/release_update/Release-138 update samples from Release-138 as a part of SDK release	2022-04-29 10:40:01 -07:00
amlrelsa-ms	ffa3a43979	update samples from Release-138 as a part of SDK release	2022-04-29 17:09:13 +00:00
Harneet Virk	7ce79a43f1	Merge pull request #1746 from Azure/release_update/Release-137 update samples from Release-137 as a part of SDK release	2022-04-27 11:50:44 -07:00
amlrelsa-ms	edcc50ab0c	update samples from Release-137 as a part of SDK release	2022-04-27 17:59:44 +00:00
Harneet Virk	4a391522d0	Merge pull request #1742 from Azure/release_update/Release-136 update samples from Release-136 as a part of SDK release	2022-04-25 13:16:03 -07:00
amlrelsa-ms	1903f78285	update samples from Release-136 as a part of SDK release	2022-04-25 17:08:42 +00:00
Harneet Virk	a4dfcc4693	Merge pull request #1730 from Azure/release_update/Release-135 update samples from Release-135 as a part of SDK release	2022-04-04 14:47:18 -07:00
amlrelsa-ms	faffb3fef7	update samples from Release-135 as a part of SDK release	2022-04-04 20:15:29 +00:00
Harneet Virk	6c6227c403	Merge pull request #1729 from rezasherafat/rl_notebook_update add docker subfolder to pong notebook directly.	2022-03-30 16:05:10 -07:00
Reza Sherafat	e3be364e7a	add docker subfolder to pong notebook directly.	2022-03-30 22:47:50 +00:00
Harneet Virk	90e20a60e9	Merge pull request #1726 from Azure/release_update/Release-131 update samples from Release-131 as a part of SDK release	2022-03-29 19:32:11 -07:00
amlrelsa-ms	33a4eacf1d	update samples from Release-131 as a part of SDK release	2022-03-30 02:26:53 +00:00
Harneet Virk	e30b53fddc	Merge pull request #1725 from Azure/release_update/Release-130 update samples from Release-130 as a part of SDK release	2022-03-29 15:41:28 -07:00
amlrelsa-ms	95b0392ed2	update samples from Release-130 as a part of SDK release	2022-03-29 22:33:38 +00:00
Harneet Virk	796798cb49	Merge pull request #1724 from Azure/release_update/Release-129 update samples from Release-129 as a part of 1.40.0 SDK release	2022-03-29 12:18:30 -07:00