diff --git a/contrib/RAPIDS/README.md b/contrib/RAPIDS/README.md
new file mode 100644
index 00000000..f8ab2cbd
--- /dev/null
+++ b/contrib/RAPIDS/README.md
@@ -0,0 +1,305 @@
+## How to use the RAPIDS on AzureML materials
+### Setting up requirements
+These materials require the Azure ML SDK and a Jupyter Notebook server for interactive execution. Please refer to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up"). Follow the instructions under **Local Computer**, and make sure to run the last step, <span style="font-family: Courier New;">pip install \<new package\></span>, with <span style="font-family: Courier New;">\<new package\></span> = <span style="font-family: Courier New;">progressbar2</span> (that is, <span style="font-family: Courier New;">pip install progressbar2</span>).
+  
+After following these directions, the user should have a conda environment (<span style="font-family: Courier New;">myenv</span>) that can be activated in an Anaconda prompt.
+
+The user also needs an Azure subscription with a Machine Learning Services quota of at least 24 cores in the desired region for the desired VM family ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)), which is enough to select a vmSize with 4 GPUs, as used in the notebook. The specific vmSize to be used within the chosen family also needs to be whitelisted for Machine Learning Services usage.
+
+&nbsp;  
+### Getting and running the material 
+Clone the AzureML Notebooks repository from GitHub by running the following command from a local directory:
+
+* C:\local_directory>git clone https://github.com/Azure/MachineLearningNotebooks.git
+
+In a conda prompt, navigate to the local directory, activate the conda environment (<span style="font-family: Courier New;">myenv</span>) in which the Azure ML SDK was installed, and launch Jupyter Notebook.
+
+* (<span style="font-family: Courier New;">myenv</span>) C:\local_directory>jupyter notebook
+
+In the browser window that opens at http://localhost:8888/tree, navigate to the master notebook:
+
+* http://localhost:8888/tree/MachineLearningNotebooks/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
+
+&nbsp;  
+The following notebook will appear:  
+
+![](imgs/NotebookHome.png)
+
+&nbsp;  
+### Master Jupyter Notebook
+The notebook can be executed interactively, step by step, by pressing the Run button (circled in red in the image above).
+
+The first couple of functional steps import the necessary AzureML libraries. If you experience any errors, please refer back to the instructions to [set up the environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local "Local Computer Set Up").
+
+&nbsp;  
+#### Setting up a Workspace
+The following step gathers the information necessary to set up a workspace in which to execute the RAPIDS script. This needs to be done only once, or not at all if you already have a workspace set up in the Azure portal that you can use:
+
+![](imgs/WorkSpaceSetUp.png)
+
+
+Before executing the step, make sure to set the correct values for subscription\_id, resource\_group, workspace\_name, and region. For example:
+
+    subscription_id = os.environ.get("SUBSCRIPTION_ID", "1358e503-xxxx-4043-xxxx-65b83xxxx32d")
+    resource_group = os.environ.get("RESOURCE_GROUP", "AML-Rapids-Testing")
+    workspace_name = os.environ.get("WORKSPACE_NAME", "AML_Rapids_Tester")
+    workspace_region = os.environ.get("WORKSPACE_REGION", "West US 2")
+
+&nbsp;  
+The resource\_group and workspace\_name can take any value; the region, however, must match the region in which the subscription has the required Machine Learning Services quota.
+
+The first time the code is executed, it redirects to the Azure portal to validate the subscription credentials. After the workspace is created, its information is stored in a local file so that this step can be skipped subsequently. The next step simply loads the saved workspace:
+
+![](imgs/saved_workspace.png)
+
+Once a workspace has been created, the user can skip its creation and jump directly to this step. The configuration file is located at:
+
+* C:\local_directory\\MachineLearningNotebooks\contrib\RAPIDS\aml_config\config.json
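+
+The saved configuration is a small JSON file, so it can be inspected before the workspace is loaded from it. The standard-library sketch below assumes the file contains the keys subscription\_id, resource\_group, and workspace\_name; load\_workspace\_config is a hypothetical helper, not part of the Azure ML SDK.
+
+```python
+import json
+from pathlib import Path
+
+# Keys assumed to be present in the saved workspace configuration file
+REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")
+
+
+def load_workspace_config(config_path):
+    """Hypothetical helper: read a saved config.json and check that the expected keys are set."""
+    path = Path(config_path)
+    if not path.is_file():
+        raise FileNotFoundError("No workspace configuration found at {0}".format(path))
+    with path.open("r", encoding="utf-8") as f:
+        config = json.load(f)
+    # Collect every expected key that is absent or empty
+    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
+    if missing:
+        raise ValueError("Workspace configuration is missing: {0}".format(", ".join(missing)))
+    return config
+```
+
+Called with the path of the configuration file shown above, it either returns the parsed settings or fails early with a message naming the missing file or keys.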
+
+&nbsp;  
+#### Creating an AML Compute Target 
+The following step creates an AML compute target:
+
+![](imgs/target_creation.png)
+
+The vm\_size parameter of AmlCompute.provisioning\_configuration() must be a size from one of the VM families ([NC\_v3](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series), [NC\_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv2-series), [ND](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#nd-series) or [ND_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ndv2-series-preview)), which provide the P100, P40, or V100 GPUs supported by RAPIDS. In this particular case, a Standard\_NC24s\_v2 was used.
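+
+As a local pre-flight check, a vm\_size string can be matched against the families listed above before it is passed to AmlCompute.provisioning\_configuration(). The sketch below is an assumption built on Azure's size naming (for example Standard\_NC24s\_v2); gpu\_family is a hypothetical helper, not an Azure ML SDK function, and its family table should be checked against the current Azure GPU size documentation.
+
+```python
+import re
+
+# Supported families keyed by (series, version suffix) -> (family name, GPU model)
+SUPPORTED_FAMILIES = {
+    ("NC", "v3"): ("NC_v3", "V100"),
+    ("NC", "v2"): ("NC_v2", "P100"),
+    ("ND", None): ("ND", "P40"),
+    ("ND", "v2"): ("ND_v2", "V100"),
+}
+
+# Standard_NC24s_v2 -> series "NC", vCPUs "24", suffix "s", version "v2"
+VM_SIZE_PATTERN = re.compile(r"^Standard_(NC|ND)(\d+)(r?s)(?:_(v\d))?$", re.IGNORECASE)
+
+
+def gpu_family(vm_size):
+    """Return (family, gpu) for a supported vm_size, or None if it is not recognized."""
+    match = VM_SIZE_PATTERN.match(vm_size.strip())
+    if match is None:
+        return None
+    series = match.group(1).upper()
+    version = match.group(4).lower() if match.group(4) else None
+    return SUPPORTED_FAMILIES.get((series, version))
+```
+
+For example, gpu\_family("Standard\_NC24s\_v2") returns ("NC\_v2", "P100"), while the original K80-based Standard\_NC6 returns None because it is not one of the supported families.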
+
+&nbsp;  
+If the output of running the step has an error of the form:
+
+![](imgs/targeterror1.png)
+
+This indicates that, although the subscription has a VM quota for that family, it does not have a Machine Learning Services quota for that family.
+You will need to request a quota increase for that family in that region for **Machine Learning Services**.
+
+&nbsp;  
+Another possible error is the following: 
+
+![](imgs/targeterror2.png)
+
+This indicates that the specified vmSize has not been whitelisted for use with Machine Learning Services, and a request to whitelist it should be filed.
+
+A successful creation of the compute target produces output like the following:
+
+![](imgs/targetsuccess.png)
+&nbsp;  
+#### RAPIDS script uploading and viewing
+The next step copies the RAPIDS script process_data.py, a slightly modified implementation of the [RAPIDS E2E example](https://github.com/rapidsai/notebooks/blob/master/mortgage/E2E.ipynb), into a script processing folder and presents its contents to the user. (The script is discussed in detail in the RAPIDS Script section below.)
+If the user wants to use a different RAPIDS script, the references to the <span style="font-family: Courier New;">process_data.py</span> script have to be changed.
+
+![](imgs/scriptuploading.png)
+&nbsp;  
+#### Data Uploading
+The RAPIDS script loads and extracts features from Fannie Mae’s Mortgage Dataset to train an XGBoost prediction model. The script uses up to two years of data (2000 and 2001).
+
+The next few steps download and decompress the data and make it available to the script through an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data).
+
+&nbsp;  
+The following functions are used to download and decompress the input data:
+
+
+![](imgs/dcf1.png)
+![](imgs/dcf2.png)
+![](imgs/dcf3.png)
+![](imgs/dcf4.png)
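+
+The download function passes a progress callback to urllib.request.urlretrieve, which calls it with the number of blocks transferred so far, the block size, and the total size (-1 if the server does not report one). The sketch below keeps the callback's state in a closure instead of global variables and prints a plain percentage instead of using progressbar2; make\_reporthook is a hypothetical name.
+
+```python
+import sys
+
+
+def make_reporthook(stream=sys.stdout):
+    """Build a urlretrieve-compatible progress callback that keeps its own state."""
+    state = {"last_percent": -1}
+
+    def reporthook(count, block_size, total_size):
+        if total_size <= 0:
+            return None  # size unknown (-1): no percentage can be computed
+        downloaded = min(count * block_size, total_size)
+        percent = downloaded * 100 // total_size
+        if percent != state["last_percent"]:  # write each percentage only once
+            state["last_percent"] = percent
+            stream.write("\r...{0:3d}%".format(percent))
+            if percent == 100:
+                stream.write("\n")
+            stream.flush()
+        return percent
+
+    return reporthook
+```
+
+It would be used as urlretrieve(url, filename, make\_reporthook()), with no module-level progress state to reset between downloads.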
+
+&nbsp;  
+The next step uses those functions to download the file
+http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/mortgage_2000-2001.tgz
+and to decompress it into the local folder <span style="font-family: Courier New;">.\mortgage_2000-2001</span>.
+The step takes several minutes; the intermediate outputs provide progress indicators.
+
+![](imgs/downamddecom.png)
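+
+The download is skipped when a local archive with the expected MD5 checksum already exists. The notebook computes that checksum by reading the whole archive into memory in one call; the sketch below hashes the file in fixed-size chunks instead, so memory use stays constant however large the archive is. file\_md5 and needs\_download are hypothetical names, and the expected value is the checksum hard-coded in the notebook's download\_file function.
+
+```python
+import hashlib
+import os
+
+# Checksum hard-coded in the notebook for mortgage_2000-2001.tgz
+EXPECTED_MD5 = "82dd47135053303e9526c2d5c43befd5"
+
+
+def file_md5(path, chunk_size=1024 * 1024):
+    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
+    digest = hashlib.md5()
+    with open(path, "rb") as f:
+        for chunk in iter(lambda: f.read(chunk_size), b""):
+            digest.update(chunk)
+    return digest.hexdigest()
+
+
+def needs_download(path, expected_md5=EXPECTED_MD5):
+    """True if the archive is missing or its checksum does not match."""
+    return not os.path.exists(path) or file_md5(path) != expected_md5
+```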
+
+&nbsp;  
+The decompressed data should have the following structure:
+* .\mortgage_2000-2001\acq\Acquisition_\<year\>Q\<num\>.txt
+* .\mortgage_2000-2001\perf\Performance_\<year\>Q\<num\>.txt
+* .\mortgage_2000-2001\names.csv
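+
+The notebook's validate\_downloaded\_data function checks for this layout before downloading again, expecting names.csv plus 8 files in acq and 11 files in perf. The sketch below expresses the same check with pathlib; validate\_mortgage\_layout is a hypothetical name, and unlike the notebook it only counts files whose names start with Acquisition\_ or Performance\_.
+
+```python
+from pathlib import Path
+
+# Expected counts, taken from validate_downloaded_data() in the notebook
+EXPECTED_ACQ_FILES = 8
+EXPECTED_PERF_FILES = 11
+
+
+def validate_mortgage_layout(root):
+    """Return True if root holds names.csv, acq/Acquisition_* and perf/Performance_* files."""
+    root = Path(root)
+    acq_dir, perf_dir = root / "acq", root / "perf"
+    if not ((root / "names.csv").is_file() and acq_dir.is_dir() and perf_dir.is_dir()):
+        return False
+    acq_files = [p for p in acq_dir.glob("Acquisition_*") if p.is_file()]
+    perf_files = [p for p in perf_dir.glob("Performance_*") if p.is_file()]
+    return len(acq_files) == EXPECTED_ACQ_FILES and len(perf_files) == EXPECTED_PERF_FILES
+```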
+
+The data is divided into partitions that roughly correspond to quarters. RAPIDS supports multi-node, multi-GPU deployments, enabling scaling up and out to much larger datasets. The user will be able to verify that the number of partitions the script can process increases with the number of GPUs used. The RAPIDS script is implemented for single-machine scenarios; an example supporting multiple nodes will be published later.
+
+&nbsp;  
+The next step uploads the data into the [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) under the reference <span style="font-family: Courier New;">fileroot = mortgage_2000-2001</span>.
+
+The step takes several minutes to load the data; the output provides a progress indicator.
+
+![](imgs/datastore.png)
+
+Once the data has been loaded into the Azure Machine Learning Datastore, in subsequent runs the user can comment out the ds.upload line and simply use the existing <span style="font-family: Courier New;">mortgage_2000-2001</span> datastore reference.
+
+&nbsp;  
+#### Setting up required libraries and environment to run RAPIDS code
+There are two options to set up the environment to run RAPIDS code. The following step shows how to use a prebuilt conda environment. A recommended alternative is to specify a base Docker image and package dependencies; sample code for that is in the notebook.
+
+![](imgs/install2.png)
+
+&nbsp;  
+#### Wrapper function to submit the RAPIDS script as an Azure Machine Learning experiment
+
+The next step defines a wrapper function used to run the RAPIDS script with different arguments. It takes three arguments: <span style="font-family: Times New Roman;">*cpu\_training*</span>, a flag that indicates whether training should be done on the CPU (in which case the GPUs are used only for ETL); <span style="font-family: Times New Roman;">*gpu\_count*</span>, the number of GPUs to use; and <span style="font-family: Times New Roman;">*part\_count*</span>, the number of data partitions to use.
+
+![](imgs/wrapper.png)
+
+&nbsp;  
+The core of the function configures the run by instantiating a ScriptRunConfig object, which defines the source_directory of the script to be executed, the name of the script, and the arguments to be passed to it.
+In addition to the wrapper function arguments, two other arguments are passed: <span style="font-family: Times New Roman;">*data\_dir*</span>, the directory where the data is stored, and <span style="font-family: Times New Roman;">*end\_year*</span>, the latest year from which partitions are used.
+
+
+As mentioned earlier, the amount of data that can be processed increases with the number of GPUs. In the function, the dictionary <span style="font-family: Times New Roman;">*max\_gpu\_count\_data\_partition\_mapping*</span> maps each number of GPUs to the maximum number of partitions that we empirically found the system can handle. When the number of partitions exceeds that maximum, the function prints a warning but still submits the run; the user should then expect the run to fail with an out-of-memory error.
+If the user wants to use a different RAPIDS script, the reference to the process_data.py script has to be changed.
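+
+The partition checks in the wrapper function can be isolated into a small, testable helper. The sketch below mirrors the rules of run\_rapids\_experiment in the notebook (the empirical mapping {1: 3, 2: 4, 3: 6, 4: 8}, the cap of 11 partitions, and end\_year = 2001 once more than 4 partitions are requested); plan\_partitions is a hypothetical name, and it returns its warnings and raises ValueError where the notebook prints and raises a generic Exception.
+
+```python
+# Empirical limits from the notebook: maximum partitions per number of GPUs
+MAX_GPU_COUNT_DATA_PARTITION_MAPPING = {1: 3, 2: 4, 3: 6, 4: 8}
+MAX_PARTITIONS = 11  # partitions available in the 2000-2001 data
+
+
+def plan_partitions(gpu_count, part_count):
+    """Return (part_count, end_year, warnings) following the wrapper function's rules."""
+    if gpu_count not in MAX_GPU_COUNT_DATA_PARTITION_MAPPING:
+        raise ValueError("Value specified for the number of GPUs to use {0} is invalid".format(gpu_count))
+    warnings = []
+    if part_count > MAX_GPU_COUNT_DATA_PARTITION_MAPPING[gpu_count]:
+        # The run is still submitted, but it is expected to run out of GPU memory
+        warnings.append("Too many partitions for the number of GPUs, exceeding memory threshold")
+    if part_count > MAX_PARTITIONS:
+        warnings.append("Maximum number of partitions available is {0}".format(MAX_PARTITIONS))
+        part_count = MAX_PARTITIONS
+    # As in the notebook: more than 4 partitions also draws on the 2001 data
+    end_year = 2001 if part_count > 4 else 2000
+    return part_count, end_year, warnings
+```
+
+For instance, plan\_partitions(4, 9) keeps nine partitions, sets end\_year to 2001, and returns one warning, which matches the failing nine-partition, four-GPU run shown later in this README.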
+
+&nbsp;  
+#### Submitting Experiments
+We are ready to submit experiments: launching the RAPIDS script with different sets of parameters.
+
+&nbsp;  
+The following couple of steps submit experiments under different conditions. 
+
+![](imgs/submission1.png)
+
+&nbsp;  
+The user can set the variable num\_gpu to any value between one and the number of GPUs available in the chosen vmSize. The variable part\_count can take any value between 1 and 11, but if it exceeds the maximum for num\_gpu, the run will result in an error.
+
+&nbsp;  
+If the experiment is successfully submitted, it is placed in a queue for processing, its status appears as Queued, and output like the following is displayed:
+
+![](imgs/queue.png)
+
+&nbsp;  
+When the experiment starts running, its status changes to Running and the output changes to something like this:
+
+![](imgs/running.png)
+
+&nbsp;  
+#### Reproducing the performance gains plot results on the Blog Post
+When the run has finished successfully, its status changes to Completed and the output changes to something like this:
+
+&nbsp; 
+![](imgs/completed.png)
+
+This is the output for an experiment run with three partitions and one GPU. Notice that the reported processing time is 49.16 seconds, as depicted in the performance gains plot in the blog post.
+
+&nbsp;  
+
+![](imgs/2GPUs.png)
+
+
+This output corresponds to a run with three partitions and two GPUs. Notice that the reported processing time is 37.50 seconds, as depicted in the performance gains plot in the blog post.
+
+&nbsp;  
+![](imgs/3GPUs.png)
+
+This output corresponds to an experiment run with three partitions and three GPUs. Notice that the reported processing time is 24.40 seconds, as depicted in the performance gains plot in the blog post.
+
+&nbsp;  
+![](imgs/4gpus.png)
+
+This output corresponds to an experiment run with three partitions and four GPUs. Notice that the reported processing time is 23.33 seconds, as depicted in the performance gains plot in the blog post.
+
+&nbsp;  
+![](imgs/CPUBase.png)
+
+This output corresponds to an experiment run with three partitions using only the CPU. Notice that the reported processing time is 9 minutes and 1.21 seconds, or 541.21 seconds, as depicted in the performance gains plot in the blog post.
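+
+The reported times can be turned into speedups relative to the CPU-only run. The short calculation below uses only the numbers quoted in this section (three partitions in every run); rounding to one decimal place is just for display.
+
+```python
+# Processing times in seconds reported above, three data partitions in every run
+reported_seconds = {
+    "CPU only": 9 * 60 + 1.21,  # 9 minutes 1.21 seconds
+    "1 GPU": 49.16,
+    "2 GPUs": 37.50,
+    "3 GPUs": 24.40,
+    "4 GPUs": 23.33,
+}
+
+baseline = reported_seconds["CPU only"]
+# Speedup = CPU-only time divided by the time of each run
+speedups = {run: baseline / seconds for run, seconds in reported_seconds.items()}
+
+for run, seconds in reported_seconds.items():
+    print("{0:>8}: {1:7.2f} s, speedup {2:4.1f}x".format(run, seconds, speedups[run]))
+```
+
+On these figures, a single GPU is about 11x faster than the CPU-only run and four GPUs about 23x, with little gain from the fourth GPU when only three partitions are processed.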
+
+&nbsp;  
+![](imgs/OOM.png)
+
+This output corresponds to an experiment run with nine partitions and four GPUs. Notice that the notebook prints a warning that the number of partitions exceeds the maximum the system can handle with that many GPUs, and the run ends up failing, hence its status of Failed.
+
+&nbsp;  
+#### Freeing Resources
+In the last step, the notebook deletes the compute target. (This step is optional, especially if min_nodes for the cluster is set to 0, in which case the cluster scales down to 0 nodes when it is not in use.)
+
+![](imgs/clusterdelete.png)
+
+&nbsp;  
+### RAPIDS Script
+The Master Notebook runs experiments by launching a RAPIDS script with different sets of parameters. This section analyzes the RAPIDS script, process_data.py in the materials.
+
+The script first imports all the necessary libraries and parses the arguments passed by the Master Notebook.
+
+Then all the internal functions used by the script are defined.
+
+&nbsp;  
+#### Wrapper Auxiliary Functions:
+The functions below are wrappers for a configuration module of librmm, the RAPIDS Memory Manager Python interface:
+
+![](imgs/wap1.png)![](imgs/wap2.png)
+
+&nbsp;  
+A couple of other functions are wrappers for the submission of jobs to the DASK client:
+
+![](imgs/wap3.png)
+![](imgs/wap4.png)
+
+&nbsp;  
+#### Data Loading Functions:
+The data is loaded using the following three functions:
+
+![](imgs/DLF1.png)![](imgs/DLF2.png)![](imgs/DLF3.png)
+
+All three functions use cudf.read_csv(), the cuDF counterpart of the well-known pandas read_csv() function.
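+
+To illustrate what these loaders parse without needing a GPU, the sketch below reads header-less, pipe-delimited records with the standard csv module; with pandas the equivalent call would be read_csv(path, sep="|", header=None, names=...). The '|' delimiter and the column names are assumptions based on the RAPIDS E2E mortgage example and cover only a few leading columns, so check them against the names passed to cudf.read_csv() in process_data.py.
+
+```python
+import csv
+import io
+
+# Assumed leading acquisition columns (illustrative subset, not the full schema)
+ACQ_COLUMNS = ["loan_id", "orig_channel", "seller_name", "orig_interest_rate"]
+
+
+def read_pipe_delimited(text_stream, column_names):
+    """Parse header-less, '|'-separated records into dicts keyed by column_names."""
+    rows = []
+    for fields in csv.reader(text_stream, delimiter="|"):
+        if not fields:
+            continue  # skip blank lines
+        # zip() keeps only as many fields as there are known column names
+        rows.append(dict(zip(column_names, fields)))
+    return rows
+```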
+
+&nbsp;  
+#### Data Transformation and Feature Extraction Functions:
+The raw data is transformed and processed to extract features by joining, slicing, grouping, aggregating, factorizing, etc., the original dataframes, just as is done with pandas. The following functions in the script are used for that purpose:
+![](imgs/fef1.png)![](imgs/fef2.png)![](imgs/fef3.png)![](imgs/fef4.png)![](imgs/fef5.png)
+
+![](imgs/fef6.png)![](imgs/fef7.png)![](imgs/fef8.png)![](imgs/fef9.png)
+
+&nbsp;  
+#### Main() Function
+The previous functions are used in the Main() function to set up the Dask client, perform all ETL operations, and set up and train an XGBoost model. The function also assigns the data to be processed by each Dask worker.
+
+&nbsp;  
+##### Setting Up DASK client:
+The following lines:
+
+![](imgs/daskini.png)
+
+&nbsp;  
+Initialize and set up a Dask client with as many workers as the number of GPUs used in the run. A successful setup results in the following output:
+
+![](imgs/daskoutput.png)
+
+##### All ETL functions are invoked within a single call to process\_quarter\_gpu per data partition
+
+![](imgs/ETL.png)
+
+&nbsp;  
+##### Concatenating the data assigned to each Dask worker
+The partitions assigned to each worker are concatenated and set up for training.
+
+![](imgs/Dask2.png)
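+
+How process_data.py divides the partitions among workers is shown only in the image above. As an illustration of the idea, the sketch below uses one simple scheme, a round-robin assignment of partition indices to workers; this scheme, the function name assign\_partitions, and the worker labels are all assumptions rather than the script's actual code.
+
+```python
+def assign_partitions(part_count, worker_names):
+    """Round-robin partition indices across workers: {worker: [partition, ...]}."""
+    if not worker_names:
+        raise ValueError("At least one worker is required")
+    assignment = {name: [] for name in worker_names}
+    for part in range(part_count):
+        # Partition i goes to worker i mod number_of_workers
+        worker = worker_names[part % len(worker_names)]
+        assignment[worker].append(part)
+    return assignment
+```
+
+For example, assign\_partitions(5, ["gpu-0", "gpu-1"]) gives gpu-0 the partitions [0, 2, 4] and gpu-1 the partitions [1, 3]; each worker would then concatenate its own partitions before training.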
+
+&nbsp;  
+##### Setting Training Parameters
+The parameters used for the training of a gradient boosted decision tree model are set up in the following code block:
+![](imgs/PArameters.png)
+
+Notice how the parameters are modified when using the CPU-only mode.
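+
+A common way to implement this switch is XGBoost's tree\_method parameter: "gpu\_hist" builds trees on the GPU and "hist" is the CPU histogram method (XGBoost 2.0 and later prefer tree\_method="hist" with device="cuda" for GPU training). The sketch below shows only that pattern; build\_xgb\_params is hypothetical and its other values are placeholders, not necessarily the ones shown in the image above.
+
+```python
+def build_xgb_params(cpu_training, max_depth=8):
+    """Return an XGBoost parameter dict for GPU or CPU training (illustrative values)."""
+    return {
+        "max_depth": max_depth,  # placeholder depth, tune for the data
+        "eta": 0.1,  # placeholder learning rate
+        "objective": "reg:squarederror",  # assumed objective for this sketch
+        # CPU histogram method vs. GPU histogram method
+        "tree_method": "hist" if cpu_training else "gpu_hist",
+    }
+```
+
+The resulting dictionary would then be passed as the params argument of xgboost.train() or of the distributed training call used by the script.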
+
+&nbsp;  
+##### Launching the training of a gradient boosted decision tree model using XGBoost.
+
+![](imgs/training.png)
+
+The outputs of the script can be observed in the master notebook as the script executes.
+
diff --git a/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb b/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
index 95027895..fc13e0fa 100644
--- a/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
+++ b/contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb
@@ -20,7 +20,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model in Azure.\n",
+    "The [RAPIDS](https://www.developer.nvidia.com/rapids) suite of software libraries from NVIDIA enables the execution of end-to-end data science and analytics pipelines entirely on GPUs. In many machine learning projects, a significant portion of the model training time is spent in setting up the data; this stage of the process is known as Extraction, Transformation and Loading, or ETL. By using the DataFrame API for ETL and GPU-capable ML algorithms in RAPIDS, data preparation and training models can be done in GPU-accelerated end-to-end pipelines without incurring serialization costs between the pipeline stages. This notebook demonstrates how to use NVIDIA RAPIDS to prepare data and train model in Azure.\n",
     " \n",
     "In this notebook, we will do the following:\n",
     " \n",
@@ -62,6 +62,7 @@
    "source": [
     "import os\n",
     "from azureml.core import Workspace, Experiment\n",
+    "from azureml.core.conda_dependencies import CondaDependencies\n",
     "from azureml.core.compute import AmlCompute, ComputeTarget\n",
     "from azureml.data.data_reference import DataReference\n",
     "from azureml.core.runconfig import RunConfiguration\n",
@@ -210,21 +211,107 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This sample uses [Fannie Mae’s Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Refer to the 'Available mortgage datasets' section in [instructions](https://rapidsai.github.io/demos/datasets/mortgage-data) to get sample data.\n",
-    "\n",
-    "Once you obtain access to the data, you will need to make this data available in an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data), for use in this sample."
+    "This sample uses [Fannie Mae's Single-Family Loan Performance Data](http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html). Once you obtain access to the data, you will need to make this data available in an [Azure Machine Learning Datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data), for use in this sample. The following code shows how to do that."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "<font color='red'>Important</font>: The following step assumes the data is uploaded to the Workspace's default data store under a folder named 'mortgagedata2000_01'. Note that uploading data to the Workspace's default data store is not necessary and the data can be referenced from any datastore, e.g., from Azure Blob or File service, once it is added as a datastore to the workspace. The path_on_datastore parameter needs to be updated, depending on where the data is available.  The directory where the data is available should have the following folder structure, as the process_data.py script expects this directory structure:\n",
-    "* _&lt;data directory>_/acq\n",
-    "* _&lt;data directory>_/perf\n",
-    "* _names.csv_\n",
+    "### Downloading Data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<font color='red'>Important</font>: The Python package progressbar2 is required to run the following cell. If it is not available in the environment where this notebook is running, please install it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import tarfile\n",
+    "import hashlib\n",
+    "from urllib.request import urlretrieve\n",
+    "from progressbar import ProgressBar\n",
     "\n",
-    "The 'acq' and 'perf' refer to directories containing data files. The _&lt;data directory>_ is the path specified in _path&#95;on&#95;datastore_ parameter in the step below."
+    "def validate_downloaded_data(path):\n",
+    "    acq_dir = os.path.join(path, 'acq')\n",
+    "    perf_dir = os.path.join(path, 'perf')\n",
+    "    if (os.path.isfile(os.path.join(path, 'names.csv'))\n",
+    "            and os.path.isdir(acq_dir) and len(os.listdir(acq_dir)) == 8\n",
+    "            and os.path.isdir(perf_dir) and len(os.listdir(perf_dir)) == 11):\n",
+    "        print(\"Data has been downloaded and decompressed at: {0}\".format(path))\n",
+    "        return True\n",
+    "    print(\"Data has not been downloaded and decompressed\")\n",
+    "    return False\n",
+    "\n",
+    "def show_progress(count, block_size, total_size):\n",
+    "    global pbar\n",
+    "    global processed\n",
+    "    \n",
+    "    if count == 0:\n",
+    "        pbar = ProgressBar(maxval=total_size)\n",
+    "        processed = 0\n",
+    "    \n",
+    "    processed += block_size\n",
+    "    processed = min(processed,total_size)\n",
+    "    pbar.update(processed)\n",
+    "\n",
+    "        \n",
+    "def download_file(fileroot):\n",
+    "    filename = fileroot + '.tgz'\n",
+    "    if(not os.path.exists(filename) or hashlib.md5(open(filename, 'rb').read()).hexdigest() != '82dd47135053303e9526c2d5c43befd5' ):\n",
+    "        url_format = 'http://rapidsai-data.s3-website.us-east-2.amazonaws.com/notebook-mortgage-data/{0}.tgz'\n",
+    "        url = url_format.format(fileroot)\n",
+    "        print(\"...Downloading file :{0}\".format(filename))\n",
+    "        urlretrieve(url, filename,show_progress)\n",
+    "        pbar.finish()\n",
+    "        print(\"...File :{0} finished downloading\".format(filename))\n",
+    "    else:\n",
+    "        print(\"...File :{0} has been downloaded already\".format(filename))\n",
+    "    return filename\n",
+    "\n",
+    "def decompress_file(filename,path):\n",
+    "    tar = tarfile.open(filename)\n",
+    "    print(\"...Getting information from {0} about files to decompress\".format(filename))\n",
+    "    members = tar.getmembers()\n",
+    "    numFiles = len(members)\n",
+    "    so_far = 0\n",
+    "    for member_info in members:\n",
+    "        tar.extract(member_info,path=path)\n",
+    "        show_progress(so_far, 1, numFiles)\n",
+    "        so_far += 1\n",
+    "    pbar.finish()\n",
+    "    print(\"...All {0} files have been decompressed\".format(numFiles))\n",
+    "    tar.close()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "fileroot = 'mortgage_2000-2001'\n",
+    "path = os.path.join('.', fileroot)\n",
+    "pbar = None\n",
+    "processed = 0\n",
+    "\n",
+    "if(not validate_downloaded_data(path)):\n",
+    "    print(\"Downloading and Decompressing Input Data\")\n",
+    "    filename = download_file(fileroot)\n",
+    "    decompress_file(filename,path)\n",
+    "    print(\"Input Data has been Downloaded and Decompressed\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Uploading Data to Workspace"
    ]
   },
   {
@@ -237,10 +324,10 @@
     "\n",
     "# download and uncompress data in a local directory before uploading to data store\n",
     "# directory specified in src_dir parameter below should have the acq, perf directories with data and names.csv file\n",
-    "# ds.upload(src_dir='<local directory that has data>', target_path='mortgagedata2000_01', overwrite=True, show_progress=True)\n",
+    "ds.upload(src_dir=path, target_path=fileroot, overwrite=True, show_progress=True)\n",
     "\n",
     "# data already uploaded to the datastore\n",
-    "data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore='mortgagedata2000_01')"
+    "data_ref = DataReference(data_reference_name='data', datastore=ds, path_on_datastore=fileroot)"
    ]
   },
   {
@@ -254,7 +341,26 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "AML allows the option of using existing Docker images with prebuilt conda environments. The following step use an existing image from [Docker Hub](https://hub.docker.com/r/rapidsai/rapidsai/)."
+    "RunConfiguration is used to submit jobs to Azure Machine Learning service. When creating RunConfiguration for a job, users can either \n",
+    "1. specify a Docker image with prebuilt conda environment and use it without any modifications to run the job, or \n",
+    "2. specify a Docker image as the base image and conda or pip packages as dependencies to let AML build a new Docker image with a conda environment containing the specified dependencies to use in the job\n",
+    "\n",
+    "The second option is the recommended option in AML. \n",
+    "The following steps have code for both options. You can pick the one that is more appropriate for your requirements. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Specify prebuilt conda environment"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following code shows how to use an existing image from [Docker Hub](https://hub.docker.com/r/rapidsai/rapidsai/) that has a prebuilt conda environment named 'rapids' when creating a RunConfiguration. Note that this conda environment does not include the azureml-defaults package, which is required for using AML functionality like metrics tracking, model management, etc. This package is installed automatically when you use the 'Specify package dependencies' option, which is why that option is recommended for creating a RunConfiguration in AML."
    ]
   },
   {
@@ -266,18 +372,52 @@
     "run_config = RunConfiguration()\n",
     "run_config.framework = 'python'\n",
     "run_config.environment.python.user_managed_dependencies = True\n",
-    "# use conda environment named 'rapids' available in the Docker image\n",
-    "# this conda environment does not include azureml-defaults package that is required for using AML functionality like metrics tracking, model management etc.\n",
     "run_config.environment.python.interpreter_path = '/conda/envs/rapids/bin/python'\n",
     "run_config.target = gpu_cluster_name\n",
     "run_config.environment.docker.enabled = True\n",
     "run_config.environment.docker.gpu_support = True\n",
-    "# if registry is not mentioned the image is pulled from Docker Hub\n",
-    "run_config.environment.docker.base_image = \"rapidsai/rapidsai:cuda9.2_ubuntu16.04_root\"\n",
+    "run_config.environment.docker.base_image = \"rapidsai/rapidsai:cuda9.2-runtime-ubuntu18.04\"\n",
+    "# run_config.environment.docker.base_image_registry.address = '<registry_url>' # not required if the base_image is in Docker hub\n",
+    "# run_config.environment.docker.base_image_registry.username = '<user_name>' # needed only for private images\n",
+    "# run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
     "run_config.environment.spark.precache_packages = False\n",
     "run_config.data_references={'data':data_ref.to_config()}"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Specify package dependencies"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The following code shows how to list package dependencies in a conda environment definition file (rapids.yml) when creating a RunConfiguration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# cd = CondaDependencies(conda_dependencies_file_path='rapids.yml')\n",
+    "# run_config = RunConfiguration(conda_dependencies=cd)\n",
+    "# run_config.framework = 'python'\n",
+    "# run_config.target = gpu_cluster_name\n",
+    "# run_config.environment.docker.enabled = True\n",
+    "# run_config.environment.docker.gpu_support = True\n",
+    "# run_config.environment.docker.base_image = \"<image>\"\n",
+    "# run_config.environment.docker.base_image_registry.address = '<registry_url>' # not required if the base_image is in Docker hub\n",
+    "# run_config.environment.docker.base_image_registry.username = '<user_name>' # needed only for private images\n",
+    "# run_config.environment.docker.base_image_registry.password = '<password>' # needed only for private images\n",
+    "# run_config.environment.spark.precache_packages = False\n",
+    "# run_config.data_references={'data':data_ref.to_config()}"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -293,17 +433,24 @@
    "source": [
     "# parameter cpu_predictor indicates if training should be done on CPU. If set to true, GPUs are used *only* for ETL and *not* for training\n",
     "# parameter num_gpu indicates number of GPUs to use among the GPUs available in the VM for ETL and if cpu_predictor is false, for training as well \n",
-    "def run_rapids_experiment(cpu_training, gpu_count):\n",
+    "def run_rapids_experiment(cpu_training, gpu_count, part_count):\n",
     "    # any value between 1-4 is allowed here depending the type of VMs available in gpu_cluster\n",
     "    if gpu_count not in [1, 2, 3, 4]:\n",
     "        raise Exception('Value specified for the number of GPUs to use {0} is invalid'.format(gpu_count))\n",
     "\n",
     "    # following data partition mapping is empirical (specific to GPUs used and current data partitioning scheme) and may need to be tweaked\n",
-    "    gpu_count_data_partition_mapping = {1: 2, 2: 4, 3: 5, 4: 7}\n",
-    "    part_count = gpu_count_data_partition_mapping[gpu_count]\n",
-    "\n",
+    "    max_gpu_count_data_partition_mapping = {1: 3, 2: 4, 3: 6, 4: 8}\n",
+    "    \n",
+    "    if part_count > max_gpu_count_data_partition_mapping[gpu_count]:\n",
+    "        print(\"Too many partitions for the number of GPUs, exceeding memory threshold\")\n",
+    "        \n",
+    "    if part_count > 11:\n",
+    "        print(\"Warning: Maximum number of partitions available is 11\")\n",
+    "        part_count = 11\n",
+    "        \n",
     "    end_year = 2000\n",
-    "    if gpu_count > 2:\n",
+    "    \n",
+    "    if part_count > 4:\n",
     "        end_year = 2001 # use more data with more GPUs\n",
     "\n",
     "    src = ScriptRunConfig(source_directory=scripts_folder, \n",
@@ -317,7 +464,8 @@
     "\n",
     "    exp = Experiment(ws, 'rapidstest')\n",
     "    run = exp.submit(config=src)\n",
-    "    RunDetails(run).show()"
+    "    RunDetails(run).show()\n",
+    "    return run"
    ]
   },
   {
@@ -335,9 +483,10 @@
    "source": [
     "cpu_predictor = False\n",
     "# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
-    "num_gpu = 1 \n",
+    "num_gpu = 1\n",
+    "data_part_count = 1\n",
     "# train using CPU, use GPU for both ETL and training\n",
-    "run_rapids_experiment(cpu_predictor, num_gpu)"
+    "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)"
    ]
   },
   {
@@ -358,8 +507,9 @@
     "cpu_predictor = True\n",
     "# the value for num_gpu should be less than or equal to the number of GPUs available in the VM\n",
     "num_gpu = 1\n",
+    "data_part_count = 1\n",
     "# train using CPU, use GPU for ETL\n",
-    "run_rapids_experiment(cpu_predictor, num_gpu)"
+    "run = run_rapids_experiment(cpu_predictor, num_gpu, data_part_count)"
    ]
   },
   {
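The partition-count checks added to `run_rapids_experiment` can be tried without an Azure ML workspace. The following is a minimal sketch built around a hypothetical helper named `validate_part_count`. It reproduces the notebook's empirical limits (`{1: 3, 2: 4, 3: 6, 4: 8}`), the cap of 11 partitions, and the switch to `end_year = 2001` above 4 partitions. Unlike the notebook, it raises `ValueError` rather than a bare `Exception` and returns the warnings instead of printing them.

```python
# Hypothetical helper mirroring the checks in run_rapids_experiment.
# It does not submit a run; it only computes the values the notebook would pass on.
MAX_PARTS_PER_GPU_COUNT = {1: 3, 2: 4, 3: 6, 4: 8}  # empirical limits from the notebook
MAX_AVAILABLE_PARTS = 11                            # hard cap used by the notebook


def validate_part_count(gpu_count, part_count):
    """Return (part_count, end_year, warnings) for a requested configuration."""
    if gpu_count not in MAX_PARTS_PER_GPU_COUNT:
        raise ValueError(
            "Value specified for the number of GPUs to use {0} is invalid".format(gpu_count))

    warnings = []
    if part_count > MAX_PARTS_PER_GPU_COUNT[gpu_count]:
        # The notebook only warns here; the run may still fail with out-of-memory errors.
        warnings.append("too many partitions for the number of GPUs")
    if part_count > MAX_AVAILABLE_PARTS:
        # Only this check actually lowers part_count.
        warnings.append("maximum number of partitions available is {0}".format(MAX_AVAILABLE_PARTS))
        part_count = MAX_AVAILABLE_PARTS

    # More than 4 partitions pulls in a second year of data.
    end_year = 2001 if part_count > 4 else 2000
    return part_count, end_year, warnings
```

Calling `validate_part_count(2, 12)` yields `part_count` 11 and `end_year` 2001 plus two warnings, which matches what the notebook cell would print before submitting the run.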
diff --git a/contrib/RAPIDS/imgs/2GPUs.png b/contrib/RAPIDS/imgs/2GPUs.png
new file mode 100644
index 00000000..07e38374
Binary files /dev/null and b/contrib/RAPIDS/imgs/2GPUs.png differ
diff --git a/contrib/RAPIDS/imgs/3GPUs.png b/contrib/RAPIDS/imgs/3GPUs.png
new file mode 100644
index 00000000..80e44c4e
Binary files /dev/null and b/contrib/RAPIDS/imgs/3GPUs.png differ
diff --git a/contrib/RAPIDS/imgs/4gpus.png b/contrib/RAPIDS/imgs/4gpus.png
new file mode 100644
index 00000000..28411cdd
Binary files /dev/null and b/contrib/RAPIDS/imgs/4gpus.png differ
diff --git a/contrib/RAPIDS/imgs/CPUBase.png b/contrib/RAPIDS/imgs/CPUBase.png
new file mode 100644
index 00000000..f84869de
Binary files /dev/null and b/contrib/RAPIDS/imgs/CPUBase.png differ
diff --git a/contrib/RAPIDS/imgs/DLF1.png b/contrib/RAPIDS/imgs/DLF1.png
new file mode 100644
index 00000000..673454fe
Binary files /dev/null and b/contrib/RAPIDS/imgs/DLF1.png differ
diff --git a/contrib/RAPIDS/imgs/DLF2.png b/contrib/RAPIDS/imgs/DLF2.png
new file mode 100644
index 00000000..ea45be22
Binary files /dev/null and b/contrib/RAPIDS/imgs/DLF2.png differ
diff --git a/contrib/RAPIDS/imgs/DLF3.png b/contrib/RAPIDS/imgs/DLF3.png
new file mode 100644
index 00000000..2cf0ab9d
Binary files /dev/null and b/contrib/RAPIDS/imgs/DLF3.png differ
diff --git a/contrib/RAPIDS/imgs/Dask2.png b/contrib/RAPIDS/imgs/Dask2.png
new file mode 100644
index 00000000..2a4c9248
Binary files /dev/null and b/contrib/RAPIDS/imgs/Dask2.png differ
diff --git a/contrib/RAPIDS/imgs/ETL.png b/contrib/RAPIDS/imgs/ETL.png
new file mode 100644
index 00000000..2b8001d1
Binary files /dev/null and b/contrib/RAPIDS/imgs/ETL.png differ
diff --git a/contrib/RAPIDS/imgs/NotebookHome.png b/contrib/RAPIDS/imgs/NotebookHome.png
new file mode 100644
index 00000000..16b45760
Binary files /dev/null and b/contrib/RAPIDS/imgs/NotebookHome.png differ
diff --git a/contrib/RAPIDS/imgs/OOM.png b/contrib/RAPIDS/imgs/OOM.png
new file mode 100644
index 00000000..0121f1b0
Binary files /dev/null and b/contrib/RAPIDS/imgs/OOM.png differ
diff --git a/contrib/RAPIDS/imgs/PArameters.png b/contrib/RAPIDS/imgs/PArameters.png
new file mode 100644
index 00000000..6279164d
Binary files /dev/null and b/contrib/RAPIDS/imgs/PArameters.png differ
diff --git a/contrib/RAPIDS/imgs/WorkSpaceSetUp.png b/contrib/RAPIDS/imgs/WorkSpaceSetUp.png
new file mode 100644
index 00000000..fb09d2f0
Binary files /dev/null and b/contrib/RAPIDS/imgs/WorkSpaceSetUp.png differ
diff --git a/contrib/RAPIDS/imgs/clusterdelete.png b/contrib/RAPIDS/imgs/clusterdelete.png
new file mode 100644
index 00000000..634b92d6
Binary files /dev/null and b/contrib/RAPIDS/imgs/clusterdelete.png differ
diff --git a/contrib/RAPIDS/imgs/completed.png b/contrib/RAPIDS/imgs/completed.png
new file mode 100644
index 00000000..ddf04e20
Binary files /dev/null and b/contrib/RAPIDS/imgs/completed.png differ
diff --git a/contrib/RAPIDS/imgs/daskini.png b/contrib/RAPIDS/imgs/daskini.png
new file mode 100644
index 00000000..f1cd700d
Binary files /dev/null and b/contrib/RAPIDS/imgs/daskini.png differ
diff --git a/contrib/RAPIDS/imgs/daskoutput.png b/contrib/RAPIDS/imgs/daskoutput.png
new file mode 100644
index 00000000..b69d988d
Binary files /dev/null and b/contrib/RAPIDS/imgs/daskoutput.png differ
diff --git a/contrib/RAPIDS/imgs/datastore.png b/contrib/RAPIDS/imgs/datastore.png
new file mode 100644
index 00000000..0a5b3289
Binary files /dev/null and b/contrib/RAPIDS/imgs/datastore.png differ
diff --git a/contrib/RAPIDS/imgs/dcf1.png b/contrib/RAPIDS/imgs/dcf1.png
new file mode 100644
index 00000000..173b2dc9
Binary files /dev/null and b/contrib/RAPIDS/imgs/dcf1.png differ
diff --git a/contrib/RAPIDS/imgs/dcf2.png b/contrib/RAPIDS/imgs/dcf2.png
new file mode 100644
index 00000000..4c890759
Binary files /dev/null and b/contrib/RAPIDS/imgs/dcf2.png differ
diff --git a/contrib/RAPIDS/imgs/dcf3.png b/contrib/RAPIDS/imgs/dcf3.png
new file mode 100644
index 00000000..58ba3be4
Binary files /dev/null and b/contrib/RAPIDS/imgs/dcf3.png differ
diff --git a/contrib/RAPIDS/imgs/dcf4.png b/contrib/RAPIDS/imgs/dcf4.png
new file mode 100644
index 00000000..086815f1
Binary files /dev/null and b/contrib/RAPIDS/imgs/dcf4.png differ
diff --git a/contrib/RAPIDS/imgs/downamddecom.png b/contrib/RAPIDS/imgs/downamddecom.png
new file mode 100644
index 00000000..f02b5b89
Binary files /dev/null and b/contrib/RAPIDS/imgs/downamddecom.png differ
diff --git a/contrib/RAPIDS/imgs/fef1.png b/contrib/RAPIDS/imgs/fef1.png
new file mode 100644
index 00000000..e15ee2d3
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef1.png differ
diff --git a/contrib/RAPIDS/imgs/fef2.png b/contrib/RAPIDS/imgs/fef2.png
new file mode 100644
index 00000000..dd5426ee
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef2.png differ
diff --git a/contrib/RAPIDS/imgs/fef3.png b/contrib/RAPIDS/imgs/fef3.png
new file mode 100644
index 00000000..5fe4ecb2
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef3.png differ
diff --git a/contrib/RAPIDS/imgs/fef4.png b/contrib/RAPIDS/imgs/fef4.png
new file mode 100644
index 00000000..0883617e
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef4.png differ
diff --git a/contrib/RAPIDS/imgs/fef5.png b/contrib/RAPIDS/imgs/fef5.png
new file mode 100644
index 00000000..ec3e4428
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef5.png differ
diff --git a/contrib/RAPIDS/imgs/fef6.png b/contrib/RAPIDS/imgs/fef6.png
new file mode 100644
index 00000000..295a86d5
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef6.png differ
diff --git a/contrib/RAPIDS/imgs/fef7.png b/contrib/RAPIDS/imgs/fef7.png
new file mode 100644
index 00000000..1281df0b
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef7.png differ
diff --git a/contrib/RAPIDS/imgs/fef8.png b/contrib/RAPIDS/imgs/fef8.png
new file mode 100644
index 00000000..49f096d5
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef8.png differ
diff --git a/contrib/RAPIDS/imgs/fef9.png b/contrib/RAPIDS/imgs/fef9.png
new file mode 100644
index 00000000..8f5abbce
Binary files /dev/null and b/contrib/RAPIDS/imgs/fef9.png differ
diff --git a/contrib/RAPIDS/imgs/install2.png b/contrib/RAPIDS/imgs/install2.png
new file mode 100644
index 00000000..24f3d29c
Binary files /dev/null and b/contrib/RAPIDS/imgs/install2.png differ
diff --git a/contrib/RAPIDS/imgs/installation.png b/contrib/RAPIDS/imgs/installation.png
new file mode 100644
index 00000000..8b06c540
Binary files /dev/null and b/contrib/RAPIDS/imgs/installation.png differ
diff --git a/contrib/RAPIDS/imgs/queue.png b/contrib/RAPIDS/imgs/queue.png
new file mode 100644
index 00000000..ab51a1e5
Binary files /dev/null and b/contrib/RAPIDS/imgs/queue.png differ
diff --git a/contrib/RAPIDS/imgs/running.png b/contrib/RAPIDS/imgs/running.png
new file mode 100644
index 00000000..13a327fe
Binary files /dev/null and b/contrib/RAPIDS/imgs/running.png differ
diff --git a/contrib/RAPIDS/imgs/saved_workspace.png b/contrib/RAPIDS/imgs/saved_workspace.png
new file mode 100644
index 00000000..fdc1919f
Binary files /dev/null and b/contrib/RAPIDS/imgs/saved_workspace.png differ
diff --git a/contrib/RAPIDS/imgs/scriptuploading.png b/contrib/RAPIDS/imgs/scriptuploading.png
new file mode 100644
index 00000000..d0726784
Binary files /dev/null and b/contrib/RAPIDS/imgs/scriptuploading.png differ
diff --git a/contrib/RAPIDS/imgs/submission1.png b/contrib/RAPIDS/imgs/submission1.png
new file mode 100644
index 00000000..d07e0889
Binary files /dev/null and b/contrib/RAPIDS/imgs/submission1.png differ
diff --git a/contrib/RAPIDS/imgs/target_creation.png b/contrib/RAPIDS/imgs/target_creation.png
new file mode 100644
index 00000000..b98d623a
Binary files /dev/null and b/contrib/RAPIDS/imgs/target_creation.png differ
diff --git a/contrib/RAPIDS/imgs/targeterror1.png b/contrib/RAPIDS/imgs/targeterror1.png
new file mode 100644
index 00000000..d1c2884a
Binary files /dev/null and b/contrib/RAPIDS/imgs/targeterror1.png differ
diff --git a/contrib/RAPIDS/imgs/targeterror2.png b/contrib/RAPIDS/imgs/targeterror2.png
new file mode 100644
index 00000000..69a3d9b8
Binary files /dev/null and b/contrib/RAPIDS/imgs/targeterror2.png differ
diff --git a/contrib/RAPIDS/imgs/targetsuccess.png b/contrib/RAPIDS/imgs/targetsuccess.png
new file mode 100644
index 00000000..301ebefb
Binary files /dev/null and b/contrib/RAPIDS/imgs/targetsuccess.png differ
diff --git a/contrib/RAPIDS/imgs/training.png b/contrib/RAPIDS/imgs/training.png
new file mode 100644
index 00000000..d047a9ce
Binary files /dev/null and b/contrib/RAPIDS/imgs/training.png differ
diff --git a/contrib/RAPIDS/imgs/wap1.png b/contrib/RAPIDS/imgs/wap1.png
new file mode 100644
index 00000000..1d336565
Binary files /dev/null and b/contrib/RAPIDS/imgs/wap1.png differ
diff --git a/contrib/RAPIDS/imgs/wap2.png b/contrib/RAPIDS/imgs/wap2.png
new file mode 100644
index 00000000..245458a5
Binary files /dev/null and b/contrib/RAPIDS/imgs/wap2.png differ
diff --git a/contrib/RAPIDS/imgs/wap3.png b/contrib/RAPIDS/imgs/wap3.png
new file mode 100644
index 00000000..8d5553da
Binary files /dev/null and b/contrib/RAPIDS/imgs/wap3.png differ
diff --git a/contrib/RAPIDS/imgs/wap4.png b/contrib/RAPIDS/imgs/wap4.png
new file mode 100644
index 00000000..56ce1a10
Binary files /dev/null and b/contrib/RAPIDS/imgs/wap4.png differ
diff --git a/contrib/RAPIDS/imgs/wrapper.png b/contrib/RAPIDS/imgs/wrapper.png
new file mode 100644
index 00000000..0f4ab763
Binary files /dev/null and b/contrib/RAPIDS/imgs/wrapper.png differ
diff --git a/contrib/RAPIDS/process_data.py b/contrib/RAPIDS/process_data.py
index a7600b40..474cc83a 100644
--- a/contrib/RAPIDS/process_data.py
+++ b/contrib/RAPIDS/process_data.py
@@ -1,9 +1,9 @@
-# License Info: https://github.com/rapidsai/notebooks/blob/master/LICENSE
 import numpy as np
 import datetime
 import dask_xgboost as dxgb_gpu
 import dask
 import dask_cudf
+from dask_cuda import LocalCUDACluster
 from dask.delayed import delayed
 from dask.distributed import Client, wait
 import xgboost as xgb
@@ -15,53 +15,6 @@ from glob import glob
 import os
 import argparse
 
-parser = argparse.ArgumentParser("rapidssample")
-parser.add_argument("--data_dir", type=str, help="location of data")
-parser.add_argument("--num_gpu", type=int, help="Number of GPUs to use", default=1)
-parser.add_argument("--part_count", type=int, help="Number of data files to train against", default=2)
-parser.add_argument("--end_year", type=int, help="Year to end the data load", default=2000)
-parser.add_argument("--cpu_predictor", type=str, help="Flag to use CPU for prediction", default='False')
-parser.add_argument('-f', type=str, default='') # added for notebook execution scenarios
-args = parser.parse_args()
-data_dir = args.data_dir
-num_gpu = args.num_gpu
-part_count = args.part_count
-end_year = args.end_year
-cpu_predictor = args.cpu_predictor.lower() in ('yes', 'true', 't', 'y', '1')
-
-print('data_dir = {0}'.format(data_dir))
-print('num_gpu = {0}'.format(num_gpu))
-print('part_count = {0}'.format(part_count))
-part_count = part_count + 1 # adding one because the usage below is not inclusive
-print('end_year = {0}'.format(end_year))
-print('cpu_predictor = {0}'.format(cpu_predictor))
-
-import subprocess
-
-cmd = "hostname --all-ip-addresses"
-process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
-output, error = process.communicate()
-IPADDR = str(output.decode()).split()[0]
-print('IPADDR is {0}'.format(IPADDR))
-
-cmd = "/rapids/notebooks/utils/dask-setup.sh 0"
-process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
-output, error = process.communicate()
-
-cmd = "/rapids/notebooks/utils/dask-setup.sh rapids " + str(num_gpu) + " 8786 8787 8790 " + str(IPADDR) + " MASTER"
-process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
-output, error = process.communicate()
-
-print(output.decode())
-
-import dask
-from dask.delayed import delayed
-from dask.distributed import Client, wait
-
-_client = IPADDR + str(":8786")
-
-client = dask.distributed.Client(_client)
-
 def initialize_rmm_pool():
     from librmm_cffi import librmm_config as rmm_cfg
 
@@ -81,15 +34,17 @@ def run_dask_task(func, **kwargs):
     task = func(**kwargs)
     return task
 
-def process_quarter_gpu(year=2000, quarter=1, perf_file=""):
+def process_quarter_gpu(client, col_names_path, acq_data_path, year=2000, quarter=1, perf_file=""):
+    dask_client = client
     ml_arrays = run_dask_task(delayed(run_gpu_workflow),
+                                          col_path=col_names_path,
+                                          acq_path=acq_data_path,
                                           quarter=quarter,
                                           year=year,
                                           perf_file=perf_file)
-    return client.compute(ml_arrays,
+    return dask_client.compute(ml_arrays,
                           optimize_graph=False,
-                          fifo_timeout="0ms"
-                         )
+                          fifo_timeout="0ms")
 
 def null_workaround(df, **kwargs):
     for column, data_type in df.dtypes.items():
@@ -99,9 +54,9 @@ def null_workaround(df, **kwargs):
             df[column] = df[column].fillna(-1)
     return df
 
-def run_gpu_workflow(quarter=1, year=2000, perf_file="", **kwargs):
-    names = gpu_load_names()
-    acq_gdf = gpu_load_acquisition_csv(acquisition_path= acq_data_path + "/Acquisition_"
+def run_gpu_workflow(col_path, acq_path, quarter=1, year=2000, perf_file="", **kwargs):
+    names = gpu_load_names(col_path=col_path)
+    acq_gdf = gpu_load_acquisition_csv(acquisition_path= acq_path + "/Acquisition_"
                                       + str(year) + "Q" + str(quarter) + ".txt")
     acq_gdf = acq_gdf.merge(names, how='left', on=['seller_name'])
     acq_gdf.drop_column('seller_name')
@@ -231,7 +186,7 @@ def gpu_load_acquisition_csv(acquisition_path, **kwargs):
     
     return cudf.read_csv(acquisition_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
 
-def gpu_load_names(**kwargs):
+def gpu_load_names(col_path):
     """ Loads names used for renaming the banks
     
     Returns
@@ -248,7 +203,7 @@ def gpu_load_names(**kwargs):
         ("new", "category"),
     ])
 
-    return cudf.read_csv(col_names_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
+    return cudf.read_csv(col_path, names=cols, delimiter='|', dtype=list(dtypes.values()), skiprows=1)
 
 def create_ever_features(gdf, **kwargs):
     everdf = gdf[['loan_id', 'current_loan_delinquency_status']]
@@ -384,117 +339,157 @@ def last_mile_cleaning(df, **kwargs):
     df['delinquency_12'] = df['delinquency_12'].fillna(False).astype('int32')
     for column in df.columns:
         df[column] = df[column].fillna(-1)
-    return df.to_arrow(index=False)
+    return df.to_arrow(preserve_index=False)
 
+def main():
+    #print('XGBOOST_BUILD_DOC is ' + os.environ['XGBOOST_BUILD_DOC'])
+    parser = argparse.ArgumentParser("rapidssample")
+    parser.add_argument("--data_dir", type=str, help="location of data")
+    parser.add_argument("--num_gpu", type=int, help="Number of GPUs to use", default=1)
+    parser.add_argument("--part_count", type=int, help="Number of data files to train against", default=2)
+    parser.add_argument("--end_year", type=int, help="Year to end the data load", default=2000)
+    parser.add_argument("--cpu_predictor", type=str, help="Flag to train on CPU (GPUs are then used only for ETL)", default='False')
+    parser.add_argument('-f', type=str, default='') # added for notebook execution scenarios
+    args = parser.parse_args()
+    data_dir = args.data_dir
+    num_gpu = args.num_gpu
+    part_count = args.part_count
+    end_year = args.end_year
+    cpu_predictor = args.cpu_predictor.lower() in ('yes', 'true', 't', 'y', '1')
+
+    if cpu_predictor:
+        print('Training with CPUs requires num_gpu = 1')
+        num_gpu = 1
+
+    print('data_dir = {0}'.format(data_dir))
+    print('num_gpu = {0}'.format(num_gpu))
+    print('part_count = {0}'.format(part_count))
+    #part_count = part_count + 1 # adding one because the usage below is not inclusive
+    print('end_year = {0}'.format(end_year))
+    print('cpu_predictor = {0}'.format(cpu_predictor))
+    
+    import subprocess
+
+    cmd = "hostname --all-ip-addresses"
+    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
+    output, error = process.communicate()
+    IPADDR = str(output.decode()).split()[0]
+    
+    cluster = LocalCUDACluster(ip=IPADDR, n_workers=num_gpu)
+    client = Client(cluster)
+    print(client)
+    print(client.ncores())
 
 # to download data for this notebook, visit https://rapidsai.github.io/demos/datasets/mortgage-data and update the following paths accordingly
-acq_data_path = "{0}/acq".format(data_dir) #"/rapids/data/mortgage/acq"
-perf_data_path = "{0}/perf".format(data_dir) #"/rapids/data/mortgage/perf"
-col_names_path = "{0}/names.csv".format(data_dir) # "/rapids/data/mortgage/names.csv"
-start_year = 2000
+    acq_data_path = "{0}/acq".format(data_dir) #"/rapids/data/mortgage/acq"
+    perf_data_path = "{0}/perf".format(data_dir) #"/rapids/data/mortgage/perf"
+    col_names_path = "{0}/names.csv".format(data_dir) # "/rapids/data/mortgage/names.csv"
+    start_year = 2000
 #end_year = 2000 # end_year is inclusive -- converted to parameter
 #part_count = 2 # the number of data files to train against -- converted to parameter
 
-client.run(initialize_rmm_pool)
-
+    client.run(initialize_rmm_pool)
+    print(client)
+    print(client.ncores())
 # NOTE: The ETL calculates additional features which are then dropped before creating the XGBoost DMatrix.
 # This can be optimized to avoid calculating the dropped features.
-print("Reading ...")
-t1 = datetime.datetime.now()
-gpu_dfs = []
-gpu_time = 0
-quarter = 1
-year = start_year
-count = 0
-while year <= end_year:
-    for file in glob(os.path.join(perf_data_path + "/Performance_" + str(year) + "Q" + str(quarter) + "*")):
-        if count < part_count:
-            gpu_dfs.append(process_quarter_gpu(year=year, quarter=quarter, perf_file=file))
-            count += 1
-            print('file: {0}'.format(file))
-            print('count: {0}'.format(count))
-    quarter += 1
-    if quarter == 5:
-        year += 1
-        quarter = 1
+    print("Reading ...")
+    t1 = datetime.datetime.now()
+    gpu_dfs = []
+    gpu_time = 0
+    quarter = 1
+    year = start_year
+    count = 0
+    while year <= end_year:
+        for file in glob(os.path.join(perf_data_path + "/Performance_" + str(year) + "Q" + str(quarter) + "*")):
+            if count < part_count:
+                gpu_dfs.append(process_quarter_gpu(client, col_names_path, acq_data_path, year=year, quarter=quarter, perf_file=file))
+                count += 1
+                print('file: {0}'.format(file))
+                print('count: {0}'.format(count))
+        quarter += 1
+        if quarter == 5:
+            year += 1
+            quarter = 1
+            
+    wait(gpu_dfs)
+    t2 = datetime.datetime.now()
+    print("Reading time ...")
+    print(t2-t1)
+    print('len(gpu_dfs) is {0}'.format(len(gpu_dfs)))
+    
+    client.run(cudf._gdf.rmm_finalize)
+    client.run(initialize_rmm_no_pool)
+    print(client)
+    print(client.ncores())
+    dxgb_gpu_params = {
+        'nround':            100,
+        'max_depth':         8,
+        'max_leaves':        2**8,
+        'alpha':             0.9,
+        'eta':               0.1,
+        'gamma':             0.1,
+        'learning_rate':     0.1,
+        'subsample':         1,
+        'reg_lambda':        1,
+        'scale_pos_weight':  2,
+        'min_child_weight':  30,
+        'tree_method':       'gpu_hist',
+        'n_gpus':            1, 
+        'distributed_dask':  True,
+        'loss':              'ls',
+        'objective':         'gpu:reg:linear',
+        'max_features':      'auto',
+        'criterion':         'friedman_mse',
+        'grow_policy':       'lossguide',
+        'verbose':           True
+    }
+      
+    if cpu_predictor:
+        print('Training using CPUs')
+        dxgb_gpu_params['predictor'] = 'cpu_predictor'
+        dxgb_gpu_params['tree_method'] = 'hist'
+        dxgb_gpu_params['objective'] = 'reg:linear'
         
-wait(gpu_dfs)
-t2 = datetime.datetime.now()
-print("Reading time ...")
-print(t2-t1)
-print('len(gpu_dfs) is {0}'.format(len(gpu_dfs)))
-
-client.run(cudf._gdf.rmm_finalize)
-client.run(initialize_rmm_no_pool)
-
-dxgb_gpu_params = {
-    'nround':            100,
-    'max_depth':         8,
-    'max_leaves':        2**8,
-    'alpha':             0.9,
-    'eta':               0.1,
-    'gamma':             0.1,
-    'learning_rate':     0.1,
-    'subsample':         1,
-    'reg_lambda':        1,
-    'scale_pos_weight':  2,
-    'min_child_weight':  30,
-    'tree_method':       'gpu_hist',
-    'n_gpus':            1, 
-    'distributed_dask':  True,
-    'loss':              'ls',
-    'objective':         'gpu:reg:linear',
-    'max_features':      'auto',
-    'criterion':         'friedman_mse',
-    'grow_policy':       'lossguide',
-    'verbose':           True
-}
-  
-if cpu_predictor:
-    print('Training using CPUs')
-    dxgb_gpu_params['predictor'] = 'cpu_predictor'
-    dxgb_gpu_params['tree_method'] = 'hist'
-    dxgb_gpu_params['objective'] = 'reg:linear'
-    
-else:
-    print('Training using GPUs')
-
-print('Training parameters are {0}'.format(dxgb_gpu_params))
-
-gpu_dfs = [delayed(DataFrame.from_arrow)(gpu_df) for gpu_df in gpu_dfs[:part_count]]
-    
-gpu_dfs = [gpu_df for gpu_df in gpu_dfs]
-
-wait(gpu_dfs)
-tmp_map = [(gpu_df, list(client.who_has(gpu_df).values())[0]) for gpu_df in gpu_dfs]
-new_map = {}
-for key, value in tmp_map:
-    if value not in new_map:
-        new_map[value] = [key]
     else:
-        new_map[value].append(key)
+        print('Training using GPUs')
+    
+    print('Training parameters are {0}'.format(dxgb_gpu_params))
+    
+    gpu_dfs = [delayed(DataFrame.from_arrow)(gpu_df) for gpu_df in gpu_dfs[:part_count]]
+    gpu_dfs = [gpu_df for gpu_df in gpu_dfs]
+    wait(gpu_dfs)
+    
+    tmp_map = [(gpu_df, list(client.who_has(gpu_df).values())[0]) for gpu_df in gpu_dfs]
+    new_map = {}
+    for key, value in tmp_map:
+        if value not in new_map:
+            new_map[value] = [key]
+        else:
+            new_map[value].append(key)
+    
+    del(tmp_map)
+    gpu_dfs = []
+    for list_delayed in new_map.values():
+        gpu_dfs.append(delayed(cudf.concat)(list_delayed))
+    
+    del(new_map)
+    gpu_dfs = [(gpu_df[['delinquency_12']], gpu_df[delayed(list)(gpu_df.columns.difference(['delinquency_12']))]) for gpu_df in gpu_dfs]
+    gpu_dfs = [(gpu_df[0].persist(), gpu_df[1].persist()) for gpu_df in gpu_dfs]
+    
+    gpu_dfs = [dask.delayed(xgb.DMatrix)(gpu_df[1], gpu_df[0]) for gpu_df in gpu_dfs]
+    gpu_dfs = [gpu_df.persist() for gpu_df in gpu_dfs]
+    gc.collect()
+    wait(gpu_dfs)
+    
+    labels = None
+    t1 = datetime.datetime.now()
+    bst = dxgb_gpu.train(client, dxgb_gpu_params, gpu_dfs, labels, num_boost_round=dxgb_gpu_params['nround'])
+    t2 = datetime.datetime.now()
+    print("Training time ...")
+    print(t2-t1)
+    print('str(bst) is {0}'.format(str(bst)))
+    print('Exiting script')
 
-del(tmp_map)
-gpu_dfs = []
-for list_delayed in new_map.values():
-    gpu_dfs.append(delayed(cudf.concat)(list_delayed))
-
-del(new_map)
-gpu_dfs = [(gpu_df[['delinquency_12']], gpu_df[delayed(list)(gpu_df.columns.difference(['delinquency_12']))]) for gpu_df in gpu_dfs]
-gpu_dfs = [(gpu_df[0].persist(), gpu_df[1].persist()) for gpu_df in gpu_dfs]
-gpu_dfs = [dask.delayed(xgb.DMatrix)(gpu_df[1], gpu_df[0]) for gpu_df in gpu_dfs]
-gpu_dfs = [gpu_df.persist() for gpu_df in gpu_dfs]
-
-gc.collect()
-labels = None
-
-print('str(gpu_dfs) is {0}'.format(str(gpu_dfs)))
-
-wait(gpu_dfs)
-t1 = datetime.datetime.now()
-bst = dxgb_gpu.train(client, dxgb_gpu_params, gpu_dfs, labels, num_boost_round=dxgb_gpu_params['nround'])
-t2 = datetime.datetime.now()
-print("Training time ...")
-print(t2-t1)
-print('str(bst) is {0}'.format(str(bst)))
-print('Exiting script')
\ No newline at end of file
+if __name__ == '__main__':
+    main()
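The ETL loop in `main()` walks quarters from `start_year` Q1 through `end_year` Q4 and queues Performance files until `part_count` is reached. After ETL, the script groups the resulting partitions by the worker that holds them so that `cudf.concat` produces one frame per worker before each `xgb.DMatrix` is built. Below is a plain-Python sketch of those two steps that needs no Dask or GPU. `select_performance_files` and `group_by_worker` are hypothetical names. The list of file names stands in for `glob`. The `who_has` dictionary (partition key mapped to a tuple of worker addresses) stands in for what the script collects per partition from `client.who_has`. Sorting the file names is an added assumption for deterministic output, since `glob` returns files in arbitrary order.

```python
from collections import defaultdict


def select_performance_files(file_names, start_year, end_year, part_count):
    """Pick at most part_count files, walking quarters from start_year Q1 to end_year Q4."""
    selected = []
    for year in range(start_year, end_year + 1):
        for quarter in range(1, 5):
            # Same prefix the script globs for: Performance_<year>Q<quarter>*
            prefix = "Performance_{0}Q{1}".format(year, quarter)
            for name in sorted(file_names):
                if name.startswith(prefix) and len(selected) < part_count:
                    selected.append((year, quarter, name))
    return selected


def group_by_worker(who_has):
    """Group partition keys by the worker(s) holding them, like tmp_map/new_map in main()."""
    groups = defaultdict(list)
    for key, workers in who_has.items():
        # Tuples are hashable, so the set of holders can serve as the dict key.
        groups[tuple(workers)].append(key)
    return dict(groups)
```

Grouping by the full tuple of holders mirrors the script, which keys `new_map` on whatever is reported for each partition. A partition replicated on two workers would therefore form its own group.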
diff --git a/contrib/RAPIDS/rapids.yml b/contrib/RAPIDS/rapids.yml
new file mode 100644
index 00000000..57ddad04
--- /dev/null
+++ b/contrib/RAPIDS/rapids.yml
@@ -0,0 +1,35 @@
+name: rapids
+channels:
+- nvidia
+- numba
+- conda-forge
+- rapidsai
+- defaults
+- pytorch
+
+dependencies:
+- arrow-cpp=0.12.0
+- bokeh
+- cffi=1.11.5
+- cmake=3.12
+- cuda92
+- cython==0.29
+- dask=1.1.1
+- distributed=1.25.3
+- faiss-gpu=1.5.0
+- numba=0.42
+- numpy=1.15.4
+- nvstrings
+- pandas=0.23.4
+- pyarrow=0.12.0
+- scikit-learn
+- scipy
+- cudf
+- cuml
+- python=3.6.2
+- jupyterlab
+- pip:
+  - file:/rapids/xgboost/python-package/dist/xgboost-0.81-py3-none-any.whl
+  - git+https://github.com/rapidsai/dask-xgboost@dask-cudf
+  - git+https://github.com/rapidsai/dask-cudf@master
+  - git+https://github.com/rapidsai/dask-cuda@master