{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 04. Train in a remote Linux VM\n",
"* Create Workspace\n",
"* Create `train.py` file\n",
"* Create and attach a remote VM (e.g. DSVM) as a compute resource\n",
"* Upload data files into the default datastore\n",
"* Configure & execute a run in a few different ways\n",
"    - Use system-built conda\n",
"    - Use existing Python environment\n",
"    - Use Docker\n",
"* Find the best model in the run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check core SDK version number\n",
"import azureml.core\n",
"\n",
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')"
]
},
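{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you don't have a `config.json` on disk, you can also look the workspace up directly. A minimal sketch, assuming you know your workspace name, subscription ID, and resource group (the placeholder values are hypothetical):\n",
"```python\n",
"from azureml.core import Workspace\n",
"\n",
"# fetch an existing workspace by name instead of reading ./config.json\n",
"ws = Workspace.get(name='<workspace_name>',\n",
"                   subscription_id='<subscription_id>',\n",
"                   resource_group='<resource_group>')\n",
"```"
]
},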
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create Experiment\n",
"\n",
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = 'train-on-remote-vm'\n",
"\n",
"from azureml.core import Experiment\n",
"exp = Experiment(workspace=ws, name=experiment_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's also create a local folder to hold the training script."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"script_folder = './vm-run'\n",
"os.makedirs(script_folder, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Upload data files into datastore\n",
"Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and access it from the compute target."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get the default datastore\n",
"ds = ws.get_default_datastore()\n",
"print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)"
]
},
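{
"cell_type": "markdown",
"metadata": {},
"source": [
"We use the default datastore below, but you can also register additional datastores backed by other storage accounts. A minimal sketch, assuming an existing blob container you have keys for (all names are hypothetical placeholders):\n",
"```python\n",
"from azureml.core import Datastore\n",
"\n",
"# register an extra blob-container-backed datastore with the workspace\n",
"extra_ds = Datastore.register_azure_blob_container(workspace=ws,\n",
"                                                   datastore_name='my_extra_datastore',\n",
"                                                   container_name='my-container',\n",
"                                                   account_name='<storage_account_name>',\n",
"                                                   account_key='<storage_account_key>')\n",
"```"
]
},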
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Load diabetes data from `scikit-learn` and save it as 2 local files."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import load_diabetes\n",
"import numpy as np\n",
"\n",
"training_data = load_diabetes()\n",
"np.save(file='./features.npy', arr=training_data['data'])\n",
"np.save(file='./labels.npy', arr=training_data['target'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's upload the 2 files into the default datastore under a path named `diabetes`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ds.upload_files(['./features.npy', './labels.npy'], target_path='diabetes', overwrite=True)"
]
},
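{
"cell_type": "markdown",
"metadata": {},
"source": [
"To verify the upload, or to pull the files back to local disk later, the datastore also supports downloading by prefix. A minimal sketch using the `diabetes` path created above:\n",
"```python\n",
"# download everything under the 'diabetes' path to the current directory\n",
"ds.download(target_path='.', prefix='diabetes', overwrite=True)\n",
"```"
]
},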
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## View `train.py`\n",
"\n",
"For convenience, we created a training script for you. It is printed below as text, but you can also run `%pycat ./train.py` in a cell to show the file. Please pay special attention to how we load the features and labels from files in the `data_folder` path, which is passed in as an argument to the training script (shown later)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# copy train.py into the script folder\n",
"import shutil\n",
"shutil.copy('./train.py', os.path.join(script_folder, 'train.py'))\n",
"\n",
"with open(os.path.join(script_folder, 'train.py'), 'r') as training_script:\n",
"    print(training_script.read())"
]
},
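{
"cell_type": "markdown",
"metadata": {},
"source": [
"For orientation, here is a minimal sketch of what such a training script could look like. This is an illustrative reconstruction, not the shipped `train.py`; the Ridge regression, the alpha sweep, and the train/test split are assumptions, though the `--data-folder` argument and the `alpha`/`mse` metric names match how the notebook uses the run later.\n",
"```python\n",
"import argparse\n",
"import os\n",
"\n",
"import numpy as np\n",
"from sklearn.linear_model import Ridge\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"from azureml.core.run import Run\n",
"\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument('--data-folder', type=str, help='folder containing the downloaded datastore files')\n",
"args = parser.parse_args()\n",
"\n",
"# the 'diabetes' sub-path mirrors the target_path used at upload time (an assumption)\n",
"data_path = os.path.join(args.data_folder, 'diabetes')\n",
"X = np.load(os.path.join(data_path, 'features.npy'))\n",
"y = np.load(os.path.join(data_path, 'labels.npy'))\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
"\n",
"run = Run.get_context()  # handle used to log metrics against this run\n",
"for alpha in np.arange(0.05, 1.0, 0.05):\n",
"    model = Ridge(alpha=alpha).fit(X_train, y_train)\n",
"    mse = mean_squared_error(y_test, model.predict(X_test))\n",
"    run.log('alpha', alpha)  # these metric names are read back later in the notebook\n",
"    run.log('mse', mse)\n",
"```"
]
},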
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create and Attach a DSVM as a compute target\n",
"\n",
"**Note**: To streamline the compute that Azure Machine Learning creates, we are making updates to support creating only single- to multi-node `AmlCompute`. The DSVM can be created using the single-line command below and then attached (like any VM) using the sample code that follows. Also note that we only support Linux VMs for remote execution from AML, and the command below will spin up a Linux VM only.\n",
"\n",
"```shell\n",
"# create a DSVM in your resource group\n",
"# note you need to be at least a contributor to the resource group in order to execute this command successfully\n",
"(myenv) $ az vm create --resource-group <resource_group_name> --name <some_vm_name> --image microsoft-dsvm:linux-data-science-vm-ubuntu:linuxdsvmubuntu:latest --admin-username <username> --admin-password <password> --generate-ssh-keys --authentication-type password\n",
"```\n",
"\n",
"**Note**: You can also use [this url](https://portal.azure.com/#create/microsoft-dsvm.linux-data-science-vm-ubuntulinuxdsvmubuntu) to create the VM using the Azure Portal.\n",
"\n",
"**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can specify the port number in the attach configuration object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, RemoteCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"username = os.getenv('AZUREML_DSVM_USERNAME', default='<my_username>')\n",
"address = os.getenv('AZUREML_DSVM_ADDRESS', default='<ip_address_or_fqdn>')\n",
"\n",
"compute_target_name = 'cpudsvm'\n",
"# if you want to connect using SSH key instead of username/password you can provide parameters private_key_file and private_key_passphrase\n",
"try:\n",
"    attached_dsvm_compute = RemoteCompute(workspace=ws, name=compute_target_name)\n",
"    print('found existing:', attached_dsvm_compute.name)\n",
"except ComputeTargetException:\n",
"    attach_config = RemoteCompute.attach_configuration(address=address,\n",
"                                                       ssh_port=22,\n",
"                                                       username=username,\n",
"                                                       private_key_file='./.ssh/id_rsa')\n",
"    attached_dsvm_compute = ComputeTarget.attach(workspace=ws,\n",
"                                                 name=compute_target_name,\n",
"                                                 attach_configuration=attach_config)\n",
"    attached_dsvm_compute.wait_for_completion(show_output=True)"
]
},
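{
"cell_type": "markdown",
"metadata": {},
"source": [
"The snippet above authenticates with an SSH private key. `attach_configuration` also accepts password authentication and a non-default SSH port; a minimal sketch (the credentials and port are hypothetical placeholders):\n",
"```python\n",
"# password-based attach over a non-default SSH port such as 5022\n",
"attach_config = RemoteCompute.attach_configuration(address=address,\n",
"                                                   ssh_port=5022,\n",
"                                                   username=username,\n",
"                                                   password='<password>')\n",
"```"
]
},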
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure & Run\n",
"First let's create a `DataReferenceConfiguration` object to inform the system what data folder to download to the compute target."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import DataReferenceConfiguration\n",
"dr = DataReferenceConfiguration(datastore_name=ds.name,\n",
"                                path_on_datastore='diabetes',\n",
"                                mode='download',  # download files from datastore to compute target\n",
"                                overwrite=True)"
]
},
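{
"cell_type": "markdown",
"metadata": {},
"source": [
"`mode='download'` copies the files onto the compute target before the run starts. On Linux targets the datastore path can be mounted instead of downloaded; a minimal sketch:\n",
"```python\n",
"# mount the datastore path rather than copying the files\n",
"dr_mount = DataReferenceConfiguration(datastore_name=ds.name,\n",
"                                      path_on_datastore='diabetes',\n",
"                                      mode='mount',\n",
"                                      overwrite=True)\n",
"```"
]
},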
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can try a few different ways to run the training script in the VM."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conda run\n",
"You can ask the system to build a conda environment based on your dependency specification, and submit your script to run there. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Environment\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"conda_env = Environment(\"conda-env\")\n",
"conda_env.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
]
},
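{
"cell_type": "markdown",
"metadata": {},
"source": [
"`CondaDependencies.create` can also pin package versions and mix pip packages in with conda packages; a minimal sketch (the specific pin and package choices are illustrative):\n",
"```python\n",
"# combine a pinned conda package with pip packages in one specification\n",
"cd = CondaDependencies.create(conda_packages=['scikit-learn==0.22.1', 'numpy'],\n",
"                              pip_packages=['azureml-defaults'])\n",
"conda_env.python.conda_dependencies = cd\n",
"```"
]
},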
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=script_folder,\n",
"                      script='train.py',\n",
"                      # pass the datastore reference as a parameter to the training script\n",
"                      arguments=['--data-folder', str(ds.as_download())]\n",
"                     )\n",
"\n",
"src.run_config.framework = \"python\"\n",
"src.run_config.environment = conda_env\n",
"src.run_config.target = attached_dsvm_compute.name\n",
"src.run_config.data_references = {ds.name: dr}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = exp.submit(config=src)\n",
"\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
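{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of cancelling programmatically, in case you prefer that over the portal:\n",
"```python\n",
"# stop the run if it hasn't reached a terminal state yet\n",
"if run.get_status() not in ('Completed', 'Failed', 'Canceled'):\n",
"    run.cancel()\n",
"```"
]
},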
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show the run object. You can navigate to the Azure portal to see detailed information about the run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Native VM run\n",
"You can also configure the run to use an existing Python environment in the VM to execute the script, without asking the system to create a conda environment for you."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"conda_env.python.user_managed_dependencies = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The run below will likely fail because `train.py` needs `azureml`, `scikit-learn`, and other dependencies, which are not found in that Python environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = exp.submit(config=src)\n",
"\n",
"from azureml.exceptions import ActivityFailedException\n",
"\n",
"try:\n",
"    run.wait_for_completion(show_output=True)\n",
"except ActivityFailedException as ex:\n",
"    print(ex)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can choose to SSH into the VM and install the Azure ML SDK, and any other missing dependencies, in that Python environment. For demonstration purposes, we are simply going to use another script, `train2.py`, that doesn't have azureml or datastore dependencies, and submit it instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# copy train2.py into the script folder\n",
"shutil.copy('./train2.py', os.path.join(script_folder, 'train2.py'))\n",
"\n",
"with open(os.path.join(script_folder, 'train2.py'), 'r') as training_script:\n",
"    print(training_script.read())\n",
"\n",
"src.run_config.data_references = {}\n",
"src.script = \"train2.py\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's try again. This time it should work fine."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run = exp.submit(config=src)\n",
"\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that even in this case you get a run record with some basic statistics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure a Docker run with a new conda environment on the VM\n",
"You can also execute your script in a Docker container on the VM. If you choose this option, the system pulls down a base Docker image, builds a new conda environment inside it if you ask for one (you can skip this step if you are using a custom Docker image with a preconfigured Python environment), starts a container, and runs your script in there. The image is also uploaded into the ACR (Azure Container Registry) associated with your workspace, and reused if your dependencies don't change in subsequent runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"conda_env.docker.enabled = True\n",
"conda_env.python.user_managed_dependencies = False\n",
"\n",
"print('Base Docker image is:', conda_env.docker.base_image)"
]
},
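{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you would rather pin the base image explicitly than rely on the printed default, the SDK exposes constants for its stock CPU and GPU images; a minimal sketch:\n",
"```python\n",
"from azureml.core.runconfig import DEFAULT_CPU_IMAGE\n",
"\n",
"# pin the stock CPU base image explicitly\n",
"conda_env.docker.base_image = DEFAULT_CPU_IMAGE\n",
"```"
]
},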
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit the Experiment\n",
"Submit the script to run in the Docker image in the remote VM. If you run this for the first time, the system will download the base image, layer in the conda packages specified for the environment on top of the base image, create a container, and then execute the script in the container."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"src.script = \"train.py\"\n",
"src.run_config.data_references = {ds.name: dr}\n",
"\n",
"run = exp.submit(config=src)\n",
"\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use a custom Docker image instead\n",
"\n",
"You can also specify a custom Docker image if you don't want to use the default image provided by Azure ML.\n",
"\n",
"```python\n",
"# use an image available in Docker Hub without authentication\n",
"conda_env.docker.base_image = \"continuumio/miniconda3\"\n",
"\n",
"# or, use an image available in a private Azure Container Registry\n",
"conda_env.docker.base_image = \"mycustomimage:1.0\"\n",
"conda_env.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n",
"conda_env.docker.base_image_registry.username = \"username\"\n",
"conda_env.docker.base_image_registry.password = \"password\"\n",
"```\n",
"\n",
"When you are using a custom Docker image, you might already have your environment set up properly in a Python environment in the Docker image. In that case, you can skip specifying conda dependencies and just use the `user_managed_dependencies` option instead:\n",
"```python\n",
"conda_env.python.user_managed_dependencies = True\n",
"# path to the Python environment in the custom Docker image\n",
"conda_env.python.interpreter_path = '/opt/conda/bin/python'\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View run history details"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Find the best model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have tried various execution modes, we can find the best model from the last run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get all metrics logged in the run\n",
"metrics = run.get_metrics()\n",
"metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# find the index where MSE is the smallest\n",
"indices = list(range(0, len(metrics['mse'])))\n",
"min_mse_index = min(indices, key=lambda x: metrics['mse'][x])\n",
"\n",
"print('When alpha is {1:0.2f}, we have min MSE {0:0.2f}.'.format(\n",
"    metrics['mse'][min_mse_index],\n",
"    metrics['alpha'][min_mse_index]\n",
"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up compute resource\n",
"\n",
"Use `detach()` to detach an existing DSVM from the workspace without deleting it. Use `delete()` if you created a new `DsvmCompute` and want to delete it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# attached_dsvm_compute.detach()\n",
"# attached_dsvm_compute.delete()"
]
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}