update samples from Release-130 as a part of SDK release

This commit is contained in:
amlrelsa-ms
2022-03-29 22:33:38 +00:00
parent 796798cb49
commit 95b0392ed2
534 changed files with 151904 additions and 27048 deletions

contrib/RAPIDS/README.md Normal file

@@ -0,0 +1,305 @@
## How to use the RAPIDS on AzureML materials
### Setting up requirements
The material requires the Azure ML SDK and the Jupyter Notebook server to run the interactive execution. Please refer to the instructions to [set up the environment.](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up") Follow the instructions under **Local Computer**, and make sure to run the last step, <span style="font-family: Courier New;">pip install \<new package\></span>, with <span style="font-family: Courier New;">progressbar2</span> as the new package (<span style="font-family: Courier New;">pip install progressbar2</span>).
After following the directions, the user should end up with a conda environment (<span style="font-family: Courier New;">myenv</span>) that can be activated from an Anaconda prompt.
The user also requires an Azure subscription with a Machine Learning Services quota in the desired region of 24 nodes or more (enough to select a vmSize with 4 GPUs, as used in the notebook) for the desired VM family ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)). The specific vmSize to be used within the chosen family also needs to be whitelisted for Machine Learning Services usage.
&nbsp;
### Getting and running the material
Clone the AzureML Notebooks repository from GitHub by running the following command in a local directory:
* C:\local_directory>git clone https://github.com/Azure/MachineLearningNotebooks.git
In an Anaconda prompt, navigate to the local directory, activate the conda environment (<span style="font-family: Courier New;">myenv</span>) where the Azure ML SDK was installed, and launch Jupyter Notebook.
* (<span style="font-family: Courier New;">myenv</span>) C:\local_directory>jupyter notebook
In the resulting browser window at http://localhost:8888/tree, navigate to the master notebook:
* http://localhost:8888/tree/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
&nbsp;
The following notebook will appear:
![](imgs/NotebookHome.png)
&nbsp;
### Master Jupyter Notebook
The notebook can be executed interactively, step by step, by pressing the Run button (circled in red in the image above).
The first couple of functional steps import the necessary AzureML libraries. If you experience any errors, please refer back to the [environment setup](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up") instructions.
&nbsp;
#### Setting up a Workspace
The following step gathers the information necessary to set up a workspace in which to execute the RAPIDS script. This needs to be done only once, or not at all if you already have a usable workspace set up in the Azure Portal:
![](imgs/WorkSpaceSetUp.png)
It is important to set the correct values for subscription\_id, resource\_group, workspace\_name, and region before executing the step. An example is:
subscription_id = os.environ.get("SUBSCRIPTION_ID", "1358e503-xxxx-4043-xxxx-65b83xxxx32d")
resource_group = os.environ.get("RESOURCE_GROUP", "AML-Rapids-Testing")
workspace_name = os.environ.get("WORKSPACE_NAME", "AML_Rapids_Tester")
workspace_region = os.environ.get("WORKSPACE_REGION", "West US 2")
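The workspace-creation cell itself then boils down to a couple of SDK calls; a minimal sketch, mirroring the notebook cell:

```python
import os
from azureml.core import Workspace

subscription_id = os.environ.get("SUBSCRIPTION_ID", "<subscription_id>")
resource_group = os.environ.get("RESOURCE_GROUP", "AML-Rapids-Testing")
workspace_name = os.environ.get("WORKSPACE_NAME", "AML_Rapids_Tester")
workspace_region = os.environ.get("WORKSPACE_REGION", "West US 2")

# creates the workspace (redirects to the Azure Portal for credentials the first time)
ws = Workspace.create(workspace_name, subscription_id=subscription_id,
                      resource_group=resource_group, location=workspace_region)
ws.write_config()  # caches aml_config/config.json for later sessions
```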
&nbsp;
The resource\_group and workspace_name can take any value; the region should match a region in which the subscription has the required Machine Learning Services node quota.
The first time the code is executed, it redirects to the Azure Portal to validate the subscription credentials. After the workspace is created, its related information is stored in a local file so that this step can subsequently be skipped. The next step simply loads the saved workspace:
![](imgs/saved_workspace.png)
Once a workspace has been created, the user can skip its creation and jump straight to this step. The configuration file resides in:
* C:\local_directory\\MachineLearningNotebooks\contrib\RAPIDS\aml_config\config.json
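Reloading the saved configuration in later sessions is then a single call, as in the notebook:

```python
from azureml.core import Workspace

# reads the aml_config/config.json written by ws.write_config()
ws = Workspace.from_config()
```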
&nbsp;
#### Creating an AML Compute Target
The following step creates an AML Compute Target:
![](imgs/target_creation.png)
The vm\_size parameter of the AmlCompute.provisioning\_configuration() call has to belong to one of the VM families ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)), which are the ones provisioned with the P40 or V100 GPUs supported by RAPIDS. In this particular case, a Standard\_NC24s\_V2 was used.
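Condensed from the corresponding notebook cell, the provisioning call looks like this (the cluster name and node counts are the notebook's defaults; choose the vm_size as discussed above):

```python
from azureml.core.compute import AmlCompute, ComputeTarget

gpu_cluster_name = "gpucluster"

if gpu_cluster_name in ws.compute_targets:
    gpu_cluster = ws.compute_targets[gpu_cluster_name]  # reuse an existing cluster
else:
    provisioning_config = AmlCompute.provisioning_configuration(
        vm_size="Standard_NC24s_v2", min_nodes=1, max_nodes=1)
    gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)
    gpu_cluster.wait_for_completion(show_output=True)
```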
&nbsp;
If the output of running the step has an error of the form:
![](imgs/targeterror1.png)
It is an indication that, even though the subscription has a node quota of VMs for that family, it does not have a node quota for Machine Learning Services for that family.
You will need to request an increased node quota for that family in that region for **Machine Learning Services**.
&nbsp;
Another possible error is the following:
![](imgs/targeterror2.png)
This indicates that the specified vmSize has not been whitelisted for usage with Machine Learning Services, and a request to do so should be filed.
The successful creation of the compute target produces output like the following:
![](imgs/targetsuccess.png)
&nbsp;
#### RAPIDS script uploading and viewing
The next step copies the RAPIDS script process_data.py, a slightly modified implementation of the [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb), into a script processing folder and presents its contents to the user (the script is discussed in detail in the next section).
If the user wants to use a different RAPIDS script, the references to the <span style="font-family: Courier New;">process_data.py</span> script have to be changed.
![](imgs/scriptuploading.png)
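In essence, this step is a file copy; a minimal sketch, mirroring the notebook cell:

```python
import os
import shutil

scripts_folder = "scripts_folder"
os.makedirs(scripts_folder, exist_ok=True)

# stage the training script where ScriptRunConfig will look for it
shutil.copy('./process_data.py', os.path.join(scripts_folder, 'process_data.py'))
```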
&nbsp;
#### Data Uploading
The RAPIDS script loads and extracts features from Fannie Mae's mortgage dataset to train an XGBoost prediction model. The script uses two years of data.
The next few steps download and decompress the data and make it available to the script as an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data).
&nbsp;
The following functions are used to download and decompress the input data:
![](imgs/dcf1.png)
![](imgs/dcf2.png)
![](imgs/dcf3.png)
![](imgs/dcf4.png)
&nbsp;
The next step uses those functions to download the following file locally:
* http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/mortgage_2000-2001.tgz
and to decompress it into the local folder path = .\mortgage_2000-2001.
The step takes several minutes; the intermediate outputs provide progress indicators.
![](imgs/downamddecom.png)
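The driver cell tying these functions together looks like this (taken from the notebook; the pbar and processed globals feed the progress-bar reporthook):

```python
fileroot = 'mortgage_2000-2001'
path = '.\\{0}'.format(fileroot)
pbar = None      # globals updated by the show_progress reporthook
processed = 0

if not validate_downloaded_data(path):
    print("Downloading and Decompressing Input Data")
    filename = download_file(fileroot)   # fetches mortgage_2000-2001.tgz
    decompress_file(filename, path)      # extracts into .\mortgage_2000-2001
    print("Input Data has been Downloaded and Decompressed")
```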
&nbsp;
The decompressed data should have the following structure:
* .\mortgage_2000-2001\acq\Acquisition_<year>Q<num>.txt
* .\mortgage_2000-2001\perf\Performance_<year>Q<num>.txt
* .\mortgage_2000-2001\names.csv
The data is divided into partitions that roughly correspond to yearly quarters. RAPIDS includes support for multi-node, multi-GPU deployments, enabling scaling up and out to much larger dataset sizes. The user will be able to verify that the number of partitions the script can process increases with the number of GPUs used. The RAPIDS script here is implemented for single-machine scenarios; an example supporting multiple nodes will be published later.
&nbsp;
The next step uploads the data into the [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) under the reference <span style="font-family: Courier New;">fileroot = mortgage_2000-2001</span>.
The step takes several minutes to load the data; the output provides a progress indicator.
![](imgs/datastore.png)
Once the data has been loaded into the Azure Machine Learning datastore, the user can, in subsequent runs, comment out the ds.upload line and just use the <span style="font-family: Courier New;">mortgage_2000-2001</span> datastore reference.
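Condensed from the notebook cell, the upload and the datastore reference look like this:

```python
from azureml.data.data_reference import DataReference

ds = ws.get_default_datastore()

# upload once; on subsequent runs comment this line out
ds.upload(src_dir=path, target_path=fileroot, overwrite=True, show_progress=True)

# reference the uploaded data by its datastore path on every run
data_ref = DataReference(data_reference_name='data', datastore=ds,
                         path_on_datastore=fileroot)
```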
&nbsp;
#### Setting up required libraries and environment to run RAPIDS code
There are two options for setting up the environment to run RAPIDS code. The following step shows how to use a prebuilt conda environment. A recommended alternative is to specify a base Docker image and package dependencies; you can find sample code for that in the notebook.
![](imgs/install2.png)
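For reference, the conda-based option from the notebook reduces to:

```python
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration

cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')
run_config = RunConfiguration(conda_dependencies=cd)
run_config.framework = 'python'
run_config.target = gpu_cluster_name
run_config.environment.docker.enabled = True
run_config.environment.docker.gpu_support = True
run_config.environment.docker.base_image = "mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu20.04"
run_config.environment.spark.precache_packages = False
run_config.data_references = {'data': data_ref.to_config()}
```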
&nbsp;
#### Wrapper function to submit the RAPIDS script as an Azure Machine Learning experiment
The next step defines a wrapper function to be used when the user runs the RAPIDS script with different arguments. It takes as arguments <span style="font-family: Times New Roman;">*cpu\_training*</span>, a flag that indicates whether the run should be processed with CPU only; <span style="font-family: Times New Roman;">*gpu\_count*</span>, the number of GPUs to be used; and <span style="font-family: Times New Roman;">*part_count*</span>, the number of data partitions to be used.
![](imgs/wrapper.png)
&nbsp;
The core of the function is the configuration of the run through the instantiation of a ScriptRunConfig object, which defines the source_directory for the script to be executed, the name of the script, and the arguments to be passed to it.
In addition to the wrapper function arguments, two other arguments are passed: <span style="font-family: Times New Roman;">*data\_dir*</span>, the directory where the data is stored, and <span style="font-family: Times New Roman;">*end_year*</span>, the last year from which to use partitions.
As mentioned earlier, the size of the data that can be processed increases with the number of GPUs. In the function, the dictionary <span style="font-family: Times New Roman;">*max\_gpu\_count\_data\_partition_mapping*</span> maps each GPU count to the maximum number of partitions that we empirically found the system can handle. The function throws a warning when the number of partitions for a given number of GPUs exceeds that maximum but still executes the script; the user should then expect the run to fail with an out-of-memory error.
If the user wants to use a different RAPIDS script, the reference to the process_data.py script has to be changed. A condensed sketch of the wrapper follows.
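The sketch below condenses the notebook's wrapper; it assumes the scripts_folder, data_ref, and run_config objects created in the previous steps:

```python
from azureml.core import Experiment, ScriptRunConfig
from azureml.widgets import RunDetails

def run_rapids_experiment(cpu_training, gpu_count, part_count):
    if gpu_count not in [1, 2, 3, 4]:
        raise Exception('Value specified for the number of GPUs to use {0} is invalid'.format(gpu_count))

    # empirically determined maximum partitions per GPU count
    max_gpu_count_data_partition_mapping = {1: 3, 2: 4, 3: 6, 4: 8}
    if part_count > max_gpu_count_data_partition_mapping[gpu_count]:
        print("Too many partitions for the number of GPUs, exceeding memory threshold")

    end_year = 2001 if part_count > 4 else 2000  # use more data with more partitions

    src = ScriptRunConfig(source_directory=scripts_folder,
                          script='process_data.py',
                          arguments=['--num_gpu', gpu_count, '--data_dir', str(data_ref),
                                     '--part_count', part_count, '--end_year', end_year,
                                     '--cpu_predictor', cpu_training],
                          run_config=run_config)

    run = Experiment(ws, 'rapidstest').submit(config=src)
    RunDetails(run).show()
    return run
```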
&nbsp;
#### Submitting Experiments
We are ready to submit experiments: launching the RAPIDS script with different sets of parameters.
&nbsp;
The following couple of steps submit experiments under different conditions.
![](imgs/submission1.png)
&nbsp;
The user can set the variable num\_gpu to anything between one and the number of GPUs supported by the chosen vmSize. The variable part\_count can take any value between 1 and 11, but if it exceeds the maximum for num_gpu, the run will result in an error.
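In code, a submission cell looks like this (the values mirror the notebook's defaults):

```python
cpu_predictor = False   # use GPUs for both ETL and training
num_gpu = 1             # up to the number of GPUs in the chosen vmSize
data_part_count = 1     # 1..11; respect the maximum for num_gpu

run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)
```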
&nbsp;
If the experiment is successfully submitted, it is placed in a queue for processing; its status appears as Queued and output like the following is shown:
![](imgs/queue.png)
&nbsp;
When the experiment starts running, its status appears as Running and the output changes to something like this:
![](imgs/running.png)
&nbsp;
#### Reproducing the performance gains plot from the blog post
When the run has finished successfully, its status appears as Completed and the output changes to something like this:
&nbsp;
![](imgs/completed.png)
This is the output for an experiment run with three partitions and one GPU. Notice that the reported processing time is 49.16 seconds, just as depicted in the performance gains plot in the blog post.
&nbsp;
![](imgs/2GPUs.png)
This output corresponds to a run with three partitions and two GPUs. Notice that the reported processing time is 37.50 seconds, just as depicted in the performance gains plot in the blog post.
&nbsp;
![](imgs/3GPUs.png)
This output corresponds to an experiment run with three partitions and three GPUs. Notice that the reported processing time is 24.40 seconds, just as depicted in the performance gains plot in the blog post.
&nbsp;
![](imgs/4gpus.png)
This output corresponds to an experiment run with three partitions and four GPUs. Notice that the reported processing time is 23.33 seconds, just as depicted in the performance gains plot in the blog post.
&nbsp;
![](imgs/CPUBase.png)
This output corresponds to an experiment run with three partitions using only the CPU. Notice that the reported processing time is 9 minutes and 1.21 seconds, or 541.21 seconds, just as depicted in the performance gains plot in the blog post.
&nbsp;
![](imgs/OOM.png)
This output corresponds to an experiment run with nine partitions and four GPUs. Notice that the notebook throws a warning signaling that the number of partitions exceeds the maximum that the system can handle with that many GPUs; the run ends up failing, hence its status of Failed.
&nbsp;
##### Freeing Resources
In the last step, the notebook deletes the compute target. (This step is optional, especially if min_nodes for the cluster is set to 0, in which case the cluster scales down to 0 nodes when there is no usage.)
![](imgs/clusterdelete.png)
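The corresponding call, commented out by default in the notebook:

```python
# skip this if min_nodes=0, since the cluster then scales down to zero nodes when idle
gpu_cluster.delete()
```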
&nbsp;
### RAPIDS Script
The master notebook runs experiments by launching a RAPIDS script with different sets of parameters. In this section, the RAPIDS script, process_data.py in the material, is analyzed.
The script first imports all the necessary libraries and parses the arguments passed by the master notebook.
Then all the internal functions used by the script are defined.
&nbsp;
#### Wrapper Auxiliary Functions:
The functions below are wrappers for a configuration module for librmm, the RAPIDS Memory Manager Python interface:
![](imgs/wap1.png)![](imgs/wap2.png)
&nbsp;
A couple of other functions are wrappers for the submission of jobs to the DASK client:
![](imgs/wap3.png)
![](imgs/wap4.png)
&nbsp;
#### Data Loading Functions:
The data is loaded through the use of the following three functions:
![](imgs/DLF1.png)![](imgs/DLF2.png)![](imgs/DLF3.png)
All three functions use the library function cudf.read_csv(), the cuDF version of its well-known Pandas counterpart.
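For instance, gpu_load_names() in the script listing below reduces to a single such call; the col_path value here is illustrative:

```python
import cudf
from collections import OrderedDict

col_path = 'mortgage_2000-2001/names.csv'  # illustrative path to the names file
dtypes = OrderedDict([("seller_name", "category"), ("new", "category")])

# cuDF parses the '|'-delimited file directly into GPU memory, mirroring pandas.read_csv
names = cudf.read_csv(col_path, names=list(dtypes.keys()), delimiter='|',
                      dtype=list(dtypes.values()), skiprows=1)
```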
&nbsp;
#### Data Transformation and Feature Extraction Functions:
The raw data is transformed and processed to extract features by joining, slicing, grouping, aggregating, factoring, etc., the original dataframes, just as is done with Pandas. The following functions in the script are used for that purpose:
![](imgs/fef1.png)![](imgs/fef2.png)![](imgs/fef3.png)![](imgs/fef4.png)![](imgs/fef5.png)
![](imgs/fef6.png)![](imgs/fef7.png)![](imgs/fef8.png)![](imgs/fef9.png)
&nbsp;
#### Main() Function
The previous functions are used in the main() function to accomplish several steps: setting up the Dask client, performing all the ETL operations, and setting up and training an XGBoost model. The function also assigns the data each Dask worker needs to process.
&nbsp;
##### Setting Up DASK client:
The following lines:
![](imgs/daskini.png)
&nbsp;
Initialize and set up a Dask client with a number of workers corresponding to the number of GPUs to be used in the run. A successful execution of the setup results in the following output:
![](imgs/daskoutput.png)
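In essence, the script does the following (num_gpu comes from the --num_gpu argument; see the full listing below):

```python
import subprocess
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

num_gpu = 1  # value of the --num_gpu script argument

# first IP address of the host, as resolved in process_data.py
IPADDR = subprocess.check_output("hostname --all-ip-addresses".split()).decode().split()[0]

cluster = LocalCUDACluster(ip=IPADDR, n_workers=num_gpu)  # one Dask worker per GPU
client = Client(cluster)
print(client.ncores())  # lists the workers and cores available
```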
##### All ETL functions are invoked through single calls to process\_quarter_gpu, one per data partition
![](imgs/ETL.png)
&nbsp;
##### Concatenating the data assigned to each Dask worker
The partitions assigned to each worker are concatenated and set up for training.
![](imgs/Dask2.png)
&nbsp;
##### Setting Training Parameters
The parameters used for the training of a gradient boosted decision tree model are set up in the following code block:
![](imgs/PArameters.png)
Notice how the parameters are modified when using the CPU-only mode.
&nbsp;
##### Launching the training of a gradient boosted decision tree model using XGBoost.
![](imgs/training.png)
The outputs of the script can be observed in the master notebook as the script executes.

contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb Normal file

@@ -0,0 +1,547 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids/azure-ml-with-nvidia-rapids.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# NVIDIA RAPIDS in Azure Machine Learning"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL\u00c2\u00a0and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model\u00c3\u201a\u00c2\u00a0in Azure.\n",
" \n",
"In this notebook, we will do the following:\n",
" \n",
"* Create an Azure Machine Learning Workspace\n",
"* Create an AMLCompute target\n",
"* Use a script to process our data and train a model\n",
"* Obtain the data required to run this sample\n",
"* Create an AML run configuration to launch a machine learning job\n",
"* Run the script to prepare data for training and train the model\n",
" \n",
"Prerequisites:\n",
"* An Azure subscription to create a Machine Learning Workspace\n",
"* Familiarity with the Azure ML SDK (refer to [notebook samples](https://github.com/Azure/MachineLearningNotebooks))\n",
"* A Jupyter notebook environment with Azure Machine Learning SDK installed. Refer to instructions to [setup the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Verify if Azure ML SDK is installed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import azureml.core\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from azureml.core import Workspace, Experiment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"from azureml.data.data_reference import DataReference\n",
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core import ScriptRunConfig\n",
"from azureml.widgets import RunDetails"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create Azure ML Workspace"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following step is optional if you already have a workspace. If you want to use an existing workspace, then\n",
"skip this workspace creation step and move on to the next step to load the workspace.\n",
" \n",
"<font color='red'>Important</font>: in the code cell below, be sure to set the correct values for the subscription_id, \n",
"resource_group, workspace_name, region before executing this code cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"subscription_id = os.environ.get(\"SUBSCRIPTION_ID\", \"<subscription_id>\")\n",
"resource_group = os.environ.get(\"RESOURCE_GROUP\", \"<resource_group>\")\n",
"workspace_name = os.environ.get(\"WORKSPACE_NAME\", \"<workspace_name>\")\n",
"workspace_region = os.environ.get(\"WORKSPACE_REGION\", \"<region>\")\n",
"\n",
"ws = Workspace.create(workspace_name, subscription_id=subscription_id, resource_group=resource_group, location=workspace_region)\n",
"\n",
"# write config to a local directory for future use\n",
"ws.write_config()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load existing Workspace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"\n",
"# if a locally-saved configuration file for the workspace is not available, use the following to load workspace\n",
"# ws = Workspace(subscription_id=subscription_id, resource_group=resource_group, workspace_name=workspace_name)\n",
"\n",
"print('Workspace name: ' + ws.name, \n",
" 'Azure region: ' + ws.location, \n",
" 'Subscription id: ' + ws.subscription_id, \n",
" 'Resource group: ' + ws.resource_group, sep = '\\n')\n",
"\n",
"scripts_folder = \"scripts_folder\"\n",
"\n",
"if not os.path.isdir(scripts_folder):\n",
" os.mkdir(scripts_folder)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create AML Compute Target"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because NVIDIA RAPIDS requires P40 or V100 GPUs, the user needs to specify compute targets from one of [NC_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview) virtual machine types in Azure; these are the families of virtual machines in Azure that are provisioned with these GPUs.\n",
" \n",
"Pick one of the supported VM SKUs based on the number of GPUs you want to use for ETL and training in RAPIDS.\n",
" \n",
"The script in this notebook is implemented for single-machine scenarios. An example supporting multiple nodes will be published later."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gpu_cluster_name = \"gpucluster\"\n",
"\n",
"if gpu_cluster_name in ws.compute_targets:\n",
" gpu_cluster = ws.compute_targets[gpu_cluster_name]\n",
" if gpu_cluster and type(gpu_cluster) is AmlCompute:\n",
" print('Found compute target. Will use {0} '.format(gpu_cluster_name))\n",
"else:\n",
" print(\"creating new cluster\")\n",
" # vm_size parameter below could be modified to one of the RAPIDS-supported VM types\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"Standard_NC6s_v2\", min_nodes=1, max_nodes = 1)\n",
"\n",
" # create the cluster\n",
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n",
" gpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Script to process data and train model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# copy process_data.py into the script folder\n",
"import shutil\n",
"shutil.copy('./process_data.py', os.path.join(scripts_folder, 'process_data.py'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data required to run this sample"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sample uses [Fannie Mae's Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Once you obtain access to the data, you will need to make this data available in an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data), for use in this sample. The following code shows how to do that."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Downloading Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tarfile\n",
"import hashlib\n",
"from urllib.request import urlretrieve\n",
"\n",
"def validate_downloaded_data(path):\n",
" if(os.path.isdir(path) and os.path.exists(path + '//names.csv')) :\n",
" if(os.path.isdir(path + '//acq' ) and len(os.listdir(path + '//acq')) == 8):\n",
" if(os.path.isdir(path + '//perf' ) and len(os.listdir(path + '//perf')) == 11):\n",
" print(\"Data has been downloaded and decompressed at: {0}\".format(path))\n",
" return True\n",
" print(\"Data has not been downloaded and decompressed\")\n",
" return False\n",
"\n",
"def show_progress(count, block_size, total_size):\n",
" global pbar\n",
" global processed\n",
" \n",
" if count == 0:\n",
" pbar = ProgressBar(maxval=total_size)\n",
" processed = 0\n",
" \n",
" processed += block_size\n",
" processed = min(processed,total_size)\n",
" pbar.update(processed)\n",
"\n",
" \n",
"def download_file(fileroot):\n",
" filename = fileroot + '.tgz'\n",
" if(not os.path.exists(filename) or hashlib.md5(open(filename, 'rb').read()).hexdigest() != '82dd47135053303e9526c2d5c43befd5' ):\n",
" url_format = 'http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/{0}.tgz'\n",
" url = url_format.format(fileroot)\n",
" print(\"...Downloading file :{0}\".format(filename))\n",
" urlretrieve(url, filename)\n",
" pbar.finish()\n",
" print(\"...File :{0} finished downloading\".format(filename))\n",
" else:\n",
" print(\"...File :{0} has been downloaded already\".format(filename))\n",
" return filename\n",
"\n",
"def decompress_file(filename,path):\n",
" tar = tarfile.open(filename)\n",
" print(\"...Getting information from {0} about files to decompress\".format(filename))\n",
" members = tar.getmembers()\n",
" numFiles = len(members)\n",
" so_far = 0\n",
" for member_info in members:\n",
" tar.extract(member_info,path=path)\n",
" so_far += 1\n",
" print(\"...All {0} files have been decompressed\".format(numFiles))\n",
" tar.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fileroot = 'mortgage_2000-2001'\n",
"path = '.\\\\{0}'.format(fileroot)\n",
"pbar = None\n",
"processed = 0\n",
"\n",
"if(not validate_downloaded_data(path)):\n",
" print(\"Downloading and Decompressing Input Data\")\n",
" filename = download_file(fileroot)\n",
" decompress_file(filename,path)\n",
" print(\"Input Data has been Downloaded and Decompressed\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Uploading Data to Workspace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds = ws.get_default_datastore()\n",
"\n",
"# download and uncompress data in a local directory before uploading to data store\n",
"# directory specified in src_dir parameter below should have the acq, perf directories with data and names.csv file\n",
"\n",
"# ---->>>> UNCOMMENT THE BELOW LINE TO UPLOAD YOUR DATA IF NOT DONE SO ALREADY <<<<----\n",
"# ds.upload(src_dir=path, target_path=fileroot, overwrite=True, show_progress=True)\n",
"\n",
"# data already uploaded to the datastore\n",
"data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore=fileroot)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create AML run configuration to launch a machine learning job"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"RunConfiguration is used to submit jobs to Azure Machine Learning service. When creating RunConfiguration for a job, users can either \n",
"1. specify a Docker image with prebuilt conda environment and use it without any modifications to run the job, or \n",
"2. specify a Docker image as the base image and conda or pip packages as dependnecies to let AML build a new Docker image with a conda environment containing specified dependencies to use in the job\n",
"\n",
"The second option is the recommended option in AML. \n",
"The following steps have code for both options. You can pick the one that is more appropriate for your requirements. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Specify prebuilt conda environment"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following code shows how to install RAPIDS using conda. The `rapids.yml` file contains the list of packages necessary to run this tutorial. **NOTE:** Initial build of the image might take up to 20 minutes as the service needs to build and cache the new image; once the image is built the subequent runs use the cached image and the overhead is minimal."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n",
"run_config = RunConfiguration(conda_dependencies=cd)\n",
"run_config.framework = 'python'\n",
"run_config.target = gpu_cluster_name\n",
"run_config.environment.docker.enabled = True\n",
"run_config.environment.docker.gpu_support = True\n",
"run_config.environment.docker.base_image = \"mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu20.04\"\n",
"run_config.environment.spark.precache_packages = False\n",
"run_config.data_references={'data':data_ref.to_config()}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Using Docker"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, you can specify RAPIDS Docker image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# run_config = RunConfiguration()\n",
"# run_config.framework = 'python'\n",
"# run_config.environment.python.user_managed_dependencies = True\n",
"# run_config.environment.python.interpreter_path = '/conda/envs/rapids/bin/python'\n",
"# run_config.target = gpu_cluster_name\n",
"# run_config.environment.docker.enabled = True\n",
"# run_config.environment.docker.gpu_support = True\n",
"# run_config.environment.docker.base_image = \"rapidsai/rapidsai:cuda9.2-runtime-ubuntu18.04\"\n",
"# # run_config.environment.docker.base_image_registry.address = '<registry_url>' # not required if the base_image is in Docker hub\n",
"# # run_config.environment.docker.base_image_registry.username = '<user_name>' # needed only for private images\n",
"# # run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
"# run_config.environment.spark.precache_packages = False\n",
"# run_config.data_references={'data':data_ref.to_config()}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wrapper function to submit Azure Machine Learning experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# parameter cpu_predictor indicates if training should be done on CPU. If set to true, GPUs are used *only* for ETL and *not* for training\n",
"# parameter num_gpu indicates number of GPUs to use among the GPUs available in the VM for ETL and if cpu_predictor is false, for training as well \n",
"def run_rapids_experiment(cpu_training, gpu_count, part_count):\n",
" # any value between 1-4 is allowed here depending the type of VMs available in gpu_cluster\n",
" if gpu_count not in [1, 2, 3, 4]:\n",
" raise Exception('Value specified for the number of GPUs to use {0} is invalid'.format(gpu_count))\n",
"\n",
" # following data partition mapping is empirical (specific to GPUs used and current data partitioning scheme) and may need to be tweaked\n",
" max_gpu_count_data_partition_mapping = {1: 3, 2: 4, 3: 6, 4: 8}\n",
" \n",
" if part_count > max_gpu_count_data_partition_mapping[gpu_count]:\n",
" print(\"Too many partitions for the number of GPUs, exceeding memory threshold\")\n",
" \n",
" if part_count > 11:\n",
" print(\"Warning: Maximum number of partitions available is 11\")\n",
" part_count = 11\n",
" \n",
" end_year = 2000\n",
" \n",
" if part_count > 4:\n",
" end_year = 2001 # use more data with more GPUs\n",
"\n",
" src = ScriptRunConfig(source_directory=scripts_folder, \n",
" script='process_data.py', \n",
" arguments = ['--num_gpu', gpu_count, '--data_dir', str(data_ref),\n",
" '--part_count', part_count, '--end_year', end_year,\n",
" '--cpu_predictor', cpu_training\n",
" ],\n",
" run_config=run_config\n",
" )\n",
"\n",
" exp = Experiment(ws, 'rapidstest')\n",
" run = exp.submit(config=src)\n",
" RunDetails(run).show()\n",
" return run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit experiment (ETL & training on GPU)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cpu_predictor = False\n",
"# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
"num_gpu = 1\n",
"data_part_count = 1\n",
"# train using CPU, use GPU for both ETL and training\n",
"run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit experiment (ETL on GPU, training on CPU)\n",
"\n",
"To observe performance difference between GPU-accelerated RAPIDS based training with CPU-only training, set 'cpu_predictor' predictor to 'True' and rerun the experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cpu_predictor = True\n",
"# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
"num_gpu = 1\n",
"data_part_count = 1\n",
"# train using CPU, use GPU for ETL\n",
"run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Delete cluster"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# delete the cluster\n",
"# gpu_cluster.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "ksivas"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}

BIN contrib/RAPIDS/imgs/ (binary image files added, including ETL.png and OOM.png; previews not shown)

contrib/RAPIDS/process_data.py Normal file

@@ -0,0 +1,470 @@
import numpy as np
import datetime
import dask_xgboost as dxgb_gpu
import dask
import dask_cudf
from dask_cuda import LocalCUDACluster
from dask.delayed import delayed
from dask.distributed import Client, wait
import xgboost as xgb
import cudf
from cudf.dataframe import DataFrame
from collections import OrderedDict
import gc
from glob import glob
import os
import argparse
def run_dask_task(func, **kwargs):
task = func(**kwargs)
return task
def process_quarter_gpu(client, col_names_path, acq_data_path, year=2000, quarter=1, perf_file=""):
dask_client = client
ml_arrays = run_dask_task(delayed(run_gpu_workflow),
col_path=col_names_path,
acq_path=acq_data_path,
quarter=quarter,
year=year,
perf_file=perf_file)
return dask_client.compute(ml_arrays,
optimize_graph=False,
fifo_timeout="0ms")
def null_workaround(df, **kwargs):
for column, data_type in df.dtypes.items():
if str(data_type) == "category":
df[column] = df[column].astype('int32').fillna(-1)
if str(data_type) in ['int8', 'int16', 'int32', 'int64', 'float32', 'float64']:
df[column] = df[column].fillna(-1)
return df
def run_gpu_workflow(col_path, acq_path, quarter=1, year=2000, perf_file="", **kwargs):
names = gpu_load_names(col_path=col_path)
acq_gdf = gpu_load_acquisition_csv(acquisition_path= acq_path + "/Acquisition_"
+ str(year) + "Q" + str(quarter) + ".txt")
acq_gdf = acq_gdf.merge(names, how='left', on=['seller_name'])
acq_gdf.drop_column('seller_name')
acq_gdf['seller_name'] = acq_gdf['new']
acq_gdf.drop_column('new')
perf_df_tmp = gpu_load_performance_csv(perf_file)
gdf = perf_df_tmp
everdf = create_ever_features(gdf)
delinq_merge = create_delinq_features(gdf)
everdf = join_ever_delinq_features(everdf, delinq_merge)
del(delinq_merge)
joined_df = create_joined_df(gdf, everdf)
testdf = create_12_mon_features(joined_df)
joined_df = combine_joined_12_mon(joined_df, testdf)
del(testdf)
perf_df = final_performance_delinquency(gdf, joined_df)
del(gdf, joined_df)
final_gdf = join_perf_acq_gdfs(perf_df, acq_gdf)
del(perf_df)
del(acq_gdf)
final_gdf = last_mile_cleaning(final_gdf)
return final_gdf
def gpu_load_performance_csv(performance_path, **kwargs):
""" Loads performance data
Returns
-------
GPU DataFrame
"""
cols = [
"loan_id", "monthly_reporting_period", "servicer", "interest_rate", "current_actual_upb",
"loan_age", "remaining_months_to_legal_maturity", "adj_remaining_months_to_maturity",
"maturity_date", "msa", "current_loan_delinquency_status", "mod_flag", "zero_balance_code",
"zero_balance_effective_date", "last_paid_installment_date", "foreclosed_after",
"disposition_date", "foreclosure_costs", "prop_preservation_and_repair_costs",
"asset_recovery_costs", "misc_holding_expenses", "holding_taxes", "net_sale_proceeds",
"credit_enhancement_proceeds", "repurchase_make_whole_proceeds", "other_foreclosure_proceeds",
"non_interest_bearing_upb", "principal_forgiveness_upb", "repurchase_make_whole_proceeds_flag",
"foreclosure_principal_write_off_amount", "servicing_activity_indicator"
]
dtypes = OrderedDict([
("loan_id", "int64"),
("monthly_reporting_period", "date"),
("servicer", "category"),
("interest_rate", "float64"),
("current_actual_upb", "float64"),
("loan_age", "float64"),
("remaining_months_to_legal_maturity", "float64"),
("adj_remaining_months_to_maturity", "float64"),
("maturity_date", "date"),
("msa", "float64"),
("current_loan_delinquency_status", "int32"),
("mod_flag", "category"),
("zero_balance_code", "category"),
("zero_balance_effective_date", "date"),
("last_paid_installment_date", "date"),
("foreclosed_after", "date"),
("disposition_date", "date"),
("foreclosure_costs", "float64"),
("prop_preservation_and_repair_costs", "float64"),
("asset_recovery_costs", "float64"),
("misc_holding_expenses", "float64"),
("holding_taxes", "float64"),
("net_sale_proceeds", "float64"),
("credit_enhancement_proceeds", "float64"),
("repurchase_make_whole_proceeds", "float64"),
("other_foreclosure_proceeds", "float64"),
("non_interest_bearing_upb", "float64"),
("principal_forgiveness_upb", "float64"),
("repurchase_make_whole_proceeds_flag", "category"),
("foreclosure_principal_write_off_amount", "float64"),
("servicing_activity_indicator", "category")
])
print(performance_path)
return cudf.read_csv(performance_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
def gpu_load_acquisition_csv(acquisition_path, **kwargs):
""" Loads acquisition data
Returns
-------
GPU DataFrame
"""
cols = [
'loan_id', 'orig_channel', 'seller_name', 'orig_interest_rate', 'orig_upb', 'orig_loan_term',
'orig_date', 'first_pay_date', 'orig_ltv', 'orig_cltv', 'num_borrowers', 'dti', 'borrower_credit_score',
'first_home_buyer', 'loan_purpose', 'property_type', 'num_units', 'occupancy_status', 'property_state',
'zip', 'mortgage_insurance_percent', 'product_type', 'coborrow_credit_score', 'mortgage_insurance_type',
'relocation_mortgage_indicator'
]
dtypes = OrderedDict([
("loan_id", "int64"),
("orig_channel", "category"),
("seller_name", "category"),
("orig_interest_rate", "float64"),
("orig_upb", "int64"),
("orig_loan_term", "int64"),
("orig_date", "date"),
("first_pay_date", "date"),
("orig_ltv", "float64"),
("orig_cltv", "float64"),
("num_borrowers", "float64"),
("dti", "float64"),
("borrower_credit_score", "float64"),
("first_home_buyer", "category"),
("loan_purpose", "category"),
("property_type", "category"),
("num_units", "int64"),
("occupancy_status", "category"),
("property_state", "category"),
("zip", "int64"),
("mortgage_insurance_percent", "float64"),
("product_type", "category"),
("coborrow_credit_score", "float64"),
("mortgage_insurance_type", "float64"),
("relocation_mortgage_indicator", "category")
])
print(acquisition_path)
return cudf.read_csv(acquisition_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
def gpu_load_names(col_path):
""" Loads names used for renaming the banks
Returns
-------
GPU DataFrame
"""
cols = [
'seller_name', 'new'
]
dtypes = OrderedDict([
("seller_name", "category"),
("new", "category"),
])
return cudf.read_csv(col_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
def create_ever_features(gdf, **kwargs):
everdf = gdf[['loan_id', 'current_loan_delinquency_status']]
everdf = everdf.groupby('loan_id', method='hash').max().reset_index()
del(gdf)
everdf['ever_30'] = (everdf['current_loan_delinquency_status'] >= 1).astype('int8')
everdf['ever_90'] = (everdf['current_loan_delinquency_status'] >= 3).astype('int8')
everdf['ever_180'] = (everdf['current_loan_delinquency_status'] >= 6).astype('int8')
everdf.drop_column('current_loan_delinquency_status')
return everdf
def create_delinq_features(gdf, **kwargs):
delinq_gdf = gdf[['loan_id', 'monthly_reporting_period', 'current_loan_delinquency_status']]
del(gdf)
delinq_30 = delinq_gdf.query('current_loan_delinquency_status >= 1')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index()
delinq_30['delinquency_30'] = delinq_30['monthly_reporting_period']
delinq_30.drop_column('monthly_reporting_period')
delinq_90 = delinq_gdf.query('current_loan_delinquency_status >= 3')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index()
delinq_90['delinquency_90'] = delinq_90['monthly_reporting_period']
delinq_90.drop_column('monthly_reporting_period')
delinq_180 = delinq_gdf.query('current_loan_delinquency_status >= 6')[['loan_id', 'monthly_reporting_period']].groupby('loan_id', method='hash').min().reset_index()
delinq_180['delinquency_180'] = delinq_180['monthly_reporting_period']
delinq_180.drop_column('monthly_reporting_period')
del(delinq_gdf)
delinq_merge = delinq_30.merge(delinq_90, how='left', on=['loan_id'], type='hash')
delinq_merge['delinquency_90'] = delinq_merge['delinquency_90'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
delinq_merge = delinq_merge.merge(delinq_180, how='left', on=['loan_id'], type='hash')
delinq_merge['delinquency_180'] = delinq_merge['delinquency_180'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
del(delinq_30)
del(delinq_90)
del(delinq_180)
return delinq_merge
def join_ever_delinq_features(everdf_tmp, delinq_merge, **kwargs):
everdf = everdf_tmp.merge(delinq_merge, on=['loan_id'], how='left', type='hash')
del(everdf_tmp)
del(delinq_merge)
everdf['delinquency_30'] = everdf['delinquency_30'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
everdf['delinquency_90'] = everdf['delinquency_90'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
everdf['delinquency_180'] = everdf['delinquency_180'].fillna(np.dtype('datetime64[ms]').type('1970-01-01').astype('datetime64[ms]'))
return everdf
def create_joined_df(gdf, everdf, **kwargs):
test = gdf[['loan_id', 'monthly_reporting_period', 'current_loan_delinquency_status', 'current_actual_upb']]
del(gdf)
test['timestamp'] = test['monthly_reporting_period']
test.drop_column('monthly_reporting_period')
test['timestamp_month'] = test['timestamp'].dt.month
test['timestamp_year'] = test['timestamp'].dt.year
test['delinquency_12'] = test['current_loan_delinquency_status']
test.drop_column('current_loan_delinquency_status')
test['upb_12'] = test['current_actual_upb']
test.drop_column('current_actual_upb')
test['upb_12'] = test['upb_12'].fillna(999999999)
test['delinquency_12'] = test['delinquency_12'].fillna(-1)
joined_df = test.merge(everdf, how='left', on=['loan_id'], type='hash')
del(everdf)
del(test)
joined_df['ever_30'] = joined_df['ever_30'].fillna(-1)
joined_df['ever_90'] = joined_df['ever_90'].fillna(-1)
joined_df['ever_180'] = joined_df['ever_180'].fillna(-1)
joined_df['delinquency_30'] = joined_df['delinquency_30'].fillna(-1)
joined_df['delinquency_90'] = joined_df['delinquency_90'].fillna(-1)
joined_df['delinquency_180'] = joined_df['delinquency_180'].fillna(-1)
joined_df['timestamp_year'] = joined_df['timestamp_year'].astype('int32')
joined_df['timestamp_month'] = joined_df['timestamp_month'].astype('int32')
return joined_df
def create_12_mon_features(joined_df, **kwargs):
testdfs = []
n_months = 12
for y in range(1, n_months + 1):
tmpdf = joined_df[['loan_id', 'timestamp_year', 'timestamp_month', 'delinquency_12', 'upb_12']]
tmpdf['josh_months'] = tmpdf['timestamp_year'] * 12 + tmpdf['timestamp_month']
tmpdf['josh_mody_n'] = ((tmpdf['josh_months'].astype('float64') - 24000 - y) / 12).floor()
tmpdf = tmpdf.groupby(['loan_id', 'josh_mody_n'], method='hash').agg({'delinquency_12': 'max','upb_12': 'min'}).reset_index()
tmpdf['delinquency_12'] = (tmpdf['delinquency_12']>3).astype('int32')
tmpdf['delinquency_12'] +=(tmpdf['upb_12']==0).astype('int32')
tmpdf['upb_12'] = tmpdf['upb_12']
tmpdf['timestamp_year'] = (((tmpdf['josh_mody_n'] * n_months) + 24000 + (y - 1)) / 12).floor().astype('int16')
tmpdf['timestamp_month'] = np.int8(y)
tmpdf.drop_column('josh_mody_n')
testdfs.append(tmpdf)
del(tmpdf)
del(joined_df)
return cudf.concat(testdfs)
def combine_joined_12_mon(joined_df, testdf, **kwargs):
joined_df.drop_column('delinquency_12')
joined_df.drop_column('upb_12')
joined_df['timestamp_year'] = joined_df['timestamp_year'].astype('int16')
joined_df['timestamp_month'] = joined_df['timestamp_month'].astype('int8')
return joined_df.merge(testdf, how='left', on=['loan_id', 'timestamp_year', 'timestamp_month'], type='hash')
def final_performance_delinquency(gdf, joined_df, **kwargs):
merged = null_workaround(gdf)
joined_df = null_workaround(joined_df)
merged['timestamp_month'] = merged['monthly_reporting_period'].dt.month
merged['timestamp_month'] = merged['timestamp_month'].astype('int8')
merged['timestamp_year'] = merged['monthly_reporting_period'].dt.year
merged['timestamp_year'] = merged['timestamp_year'].astype('int16')
merged = merged.merge(joined_df, how='left', on=['loan_id', 'timestamp_year', 'timestamp_month'], type='hash')
merged.drop_column('timestamp_year')
merged.drop_column('timestamp_month')
return merged
def join_perf_acq_gdfs(perf, acq, **kwargs):
perf = null_workaround(perf)
acq = null_workaround(acq)
return perf.merge(acq, how='left', on=['loan_id'], type='hash')
def last_mile_cleaning(df, **kwargs):
drop_list = [
'loan_id', 'orig_date', 'first_pay_date', 'seller_name',
'monthly_reporting_period', 'last_paid_installment_date', 'maturity_date', 'ever_30', 'ever_90', 'ever_180',
'delinquency_30', 'delinquency_90', 'delinquency_180', 'upb_12',
'zero_balance_effective_date','foreclosed_after', 'disposition_date','timestamp'
]
for column in drop_list:
df.drop_column(column)
for col, dtype in df.dtypes.iteritems():
if str(dtype)=='category':
df[col] = df[col].cat.codes
df[col] = df[col].astype('float32')
df['delinquency_12'] = df['delinquency_12'] > 0
df['delinquency_12'] = df['delinquency_12'].fillna(False).astype('int32')
for column in df.columns:
df[column] = df[column].fillna(-1)
return df.to_arrow(preserve_index=False)
def main():
parser = argparse.ArgumentParser("rapidssample")
parser.add_argument("--data_dir", type=str, help="location of data")
parser.add_argument("--num_gpu", type=int, help="Number of GPUs to use", default=1)
parser.add_argument("--part_count", type=int, help="Number of data files to train against", default=2)
parser.add_argument("--end_year", type=int, help="Year to end the data load", default=2000)
parser.add_argument("--cpu_predictor", type=str, help="Flag to use CPU for prediction", default='False')
parser.add_argument('-f', type=str, default='') # added for notebook execution scenarios
args = parser.parse_args()
data_dir = args.data_dir
num_gpu = args.num_gpu
part_count = args.part_count
end_year = args.end_year
cpu_predictor = args.cpu_predictor.lower() in ('yes', 'true', 't', 'y', '1')
if cpu_predictor:
print('Training with CPUs requires num_gpu = 1')
num_gpu = 1
print('data_dir = {0}'.format(data_dir))
print('num_gpu = {0}'.format(num_gpu))
print('part_count = {0}'.format(part_count))
print('end_year = {0}'.format(end_year))
print('cpu_predictor = {0}'.format(cpu_predictor))
import subprocess
cmd = "hostname --all-ip-addresses"
process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
IPADDR = str(output.decode()).split()[0]
cluster = LocalCUDACluster(ip=IPADDR,n_workers=num_gpu)
client = Client(cluster)
print(client.ncores())
# to download data for this notebook, visit https://rapidsai.github.io/demos/datasets/mortgage-data and update the following paths accordingly
acq_data_path = "{0}/acq".format(data_dir) #"/rapids/data/mortgage/acq"
perf_data_path = "{0}/perf".format(data_dir) #"/rapids/data/mortgage/perf"
col_names_path = "{0}/names.csv".format(data_dir) # "/rapids/data/mortgage/names.csv"
start_year = 2000
print('--->>> Workers used: {0}'.format(client.ncores()))
# NOTE: The ETL calculates additional features which are then dropped before creating the XGBoost DMatrix.
# This can be optimized to avoid calculating the dropped features.
print("Reading ...")
t1 = datetime.datetime.now()
gpu_dfs = []
gpu_time = 0
quarter = 1
year = start_year
count = 0
while year <= end_year:
for file in glob(os.path.join(perf_data_path + "/Performance_" + str(year) + "Q" + str(quarter) + "*")):
if count < part_count:
gpu_dfs.append(process_quarter_gpu(client, col_names_path, acq_data_path, year=year, quarter=quarter, perf_file=file))
count += 1
print('file: {0}'.format(file))
print('count: {0}'.format(count))
quarter += 1
if quarter == 5:
year += 1
quarter = 1
wait(gpu_dfs)
t2 = datetime.datetime.now()
print("Reading time: {0}".format(str(t2-t1)))
print('--->>> Number of data parts: {0}'.format(len(gpu_dfs)))
dxgb_gpu_params = {
'nround': 100,
'max_depth': 8,
'max_leaves': 2**8,
'alpha': 0.9,
'eta': 0.1,
'gamma': 0.1,
'learning_rate': 0.1,
'subsample': 1,
'reg_lambda': 1,
'scale_pos_weight': 2,
'min_child_weight': 30,
'tree_method': 'gpu_hist',
'n_gpus': 1,
'distributed_dask': True,
'loss': 'ls',
'objective': 'reg:squarederror',
'max_features': 'auto',
'criterion': 'friedman_mse',
'grow_policy': 'lossguide',
'verbose': True
}
if cpu_predictor:
print('\n---->>>> Training using CPUs <<<<----\n')
dxgb_gpu_params['predictor'] = 'cpu_predictor'
dxgb_gpu_params['tree_method'] = 'hist'
dxgb_gpu_params['objective'] = 'reg:linear'
else:
print('\n---->>>> Training using GPUs <<<<----\n')
print('Training parameters are {0}'.format(dxgb_gpu_params))
gpu_dfs = [delayed(DataFrame.from_arrow)(gpu_df) for gpu_df in gpu_dfs[:part_count]]
gpu_dfs = [gpu_df for gpu_df in gpu_dfs]
wait(gpu_dfs)
tmp_map = [(gpu_df, list(client.who_has(gpu_df).values())[0]) for gpu_df in gpu_dfs]
new_map = {}
for key, value in tmp_map:
if value not in new_map:
new_map[value] = [key]
else:
new_map[value].append(key)
del(tmp_map)
gpu_dfs = []
for list_delayed in new_map.values():
gpu_dfs.append(delayed(cudf.concat)(list_delayed))
del(new_map)
gpu_dfs = [(gpu_df[['delinquency_12']], gpu_df[delayed(list)(gpu_df.columns.difference(['delinquency_12']))]) for gpu_df in gpu_dfs]
gpu_dfs = [(gpu_df[0].persist(), gpu_df[1].persist()) for gpu_df in gpu_dfs]
gpu_dfs = [dask.delayed(xgb.DMatrix)(gpu_df[1], gpu_df[0]) for gpu_df in gpu_dfs]
gpu_dfs = [gpu_df.persist() for gpu_df in gpu_dfs]
gc.collect()
wait(gpu_dfs)
# TRAIN THE MODEL
labels = None
t1 = datetime.datetime.now()
bst = dxgb_gpu.train(client, dxgb_gpu_params, gpu_dfs, labels, num_boost_round=dxgb_gpu_params['nround'])
t2 = datetime.datetime.now()
print('\n---->>>> Training time: {0} <<<<----\n'.format(str(t2-t1)))
print('Exiting script')
if __name__ == '__main__':
main()