Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


# How to configure a training run with data input and output

This notebook shows how to use [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) with data input and output. A run submitted with ScriptRunConfig represents a single trial in an experiment. Submitting the run returns a ScriptRun object, which you can use to monitor the asynchronous execution of the run, log metrics, store run outputs, analyze results, and access artifacts generated by the run.


## Prerequisites
* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning
* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](https://aka.ms/pl-config) to:
    * install the AML SDK
    * create a workspace and its configuration file (`config.json`)

## Initialize workspace
Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`.

In [None]:
from azureml.core import Workspace
ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')
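`Workspace.from_config()` reads the workspace details from `config.json`, a small JSON file with the keys `subscription_id`, `resource_group` and `workspace_name`. The following is a minimal local sketch of writing and validating such a file. The helper names and the placeholder values are hypothetical; substitute your own workspace details.

```python
import json
import os
import tempfile

# Keys that config.json is expected to contain
REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}

# Hypothetical placeholder values: replace them with your own workspace details.
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}


def write_workspace_config(directory, cfg):
    """Write cfg to <directory>/config.json and return the file path."""
    path = os.path.join(directory, "config.json")
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return path


def read_workspace_config(path):
    """Load config.json and raise KeyError if any expected key is missing."""
    with open(path) as f:
        cfg = json.load(f)
    missing = REQUIRED_KEYS - set(cfg)
    if missing:
        raise KeyError(f"config.json is missing keys: {sorted(missing)}")
    return cfg


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = write_workspace_config(tmp, config)
    loaded = read_workspace_config(cfg_path)

print(sorted(loaded))
```

Checking the keys up front gives a clearer error than a failed workspace lookup when the file was copied incompletely.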

## Create or attach an existing AmlCompute cluster
You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.

If a cluster with the given name is not found, the cell below creates a new one: an `AmlCompute` cluster of `STANDARD_D2_V2` CPU VMs. This process is broken down into 3 steps:
1. Create the configuration (this step is local and only takes a second).
2. Create the cluster (this step takes about **20 seconds**).
3. Provision the VMs to bring the cluster to its initial size, as defined by the scale settings (`min_nodes` is not set here, so it defaults to 0). When nodes need to be allocated, this step can take about **3-5 minutes** and provides only sparse output in the process. Make sure to wait until the call returns before moving to the next cell.

In [None]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "cpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    # can poll for a minimum number of nodes and for a specific timeout. 
    # if no min node count is provided it uses the scale settings for the cluster
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# use get_status() to get a detailed status for the current cluster. 
print(compute_target.get_status().serialize())

Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpu-cluster' of type `AmlCompute`.
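The workspace's `compute_targets` property is a dictionary that maps each compute target name to its `ComputeTarget` object. Below is a minimal sketch for summarizing it; the helper name `summarize_compute_targets` is hypothetical, and it assumes each target exposes a `type` attribute (for example `'AmlCompute'`).

```python
def summarize_compute_targets(workspace):
    """Return a dict mapping each compute target name to its type string."""
    # workspace.compute_targets is a dict of {name: ComputeTarget}
    return {name: target.type for name, target in workspace.compute_targets.items()}


# Usage in this notebook (requires the azureml SDK and the `ws` object created above):
# print(summarize_compute_targets(ws))
```

Because the helper only reads attributes, it can also be exercised offline with a stand-in object that has the same shape.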

## Use a simple script
The cells below create a simple "hello world" script. This is the script that we will submit through the [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig?view=azure-ml-py). It reads the Iris dataset as input and writes it out to the `sample/outputdataset` folder in the default blob datastore.

In [None]:
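import os

source_directory = 'script_run'
# %%writefile does not create missing folders, so create the source directory first
os.makedirs(source_directory, exist_ok=True)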

In [None]:
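%%writefile $source_directory/dummy_train.py

# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import sys
import os

print("*********************************************************")
print("Hello Azure ML!")

# ScriptRunConfig passes the mounted input path and the output folder path
# as the first and second command-line arguments.
mounted_input_path = sys.argv[1]
mounted_output_path = sys.argv[2]

print("Argument 1: %s" % mounted_input_path)
print("Argument 2: %s" % mounted_output_path)

# Make sure the output folder exists before writing (no-op if it already does)
os.makedirs(mounted_output_path, exist_ok=True)

# Copy the input file unchanged to output.csv in the output folder
with open(mounted_input_path, 'r') as f:
    content = f.read()
with open(os.path.join(mounted_output_path, 'output.csv'), 'w') as fw:
    fw.write(content)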
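Before submitting to the cluster, it can help to check the script's copy logic locally. The sketch below reproduces that logic in a hypothetical `copy_to_output` function and runs it on temporary files; the two-line CSV is dummy content, not the Iris data.

```python
import os
import tempfile


def copy_to_output(input_path, output_dir):
    """Mirror dummy_train.py: copy input_path to <output_dir>/output.csv."""
    os.makedirs(output_dir, exist_ok=True)
    out_path = os.path.join(output_dir, "output.csv")
    with open(input_path, "r") as f, open(out_path, "w") as fw:
        fw.write(f.read())
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    # Dummy stand-in for iris.csv, used only to exercise the copy logic
    sample = os.path.join(tmp, "sample.csv")
    with open(sample, "w") as f:
        f.write("col_a,col_b\n1,2\n")
    result = copy_to_output(sample, os.path.join(tmp, "out"))
    with open(result) as f:
        copied = f.read()

print(copied.splitlines()[0])
```

Running the logic on local temporary folders catches path and file-mode mistakes in seconds rather than after a remote run.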

Every workspace comes with a default datastore (and you can register more), which is backed by the Azure Blob storage account associated with the workspace. You can use it to transfer data from your local machine to the cloud and to create datasets from it. We will now upload the Iris data (`iris.csv` in the current directory) to the default datastore (blob) within your workspace.

In [None]:
def_blob_store = ws.get_default_datastore()
def_blob_store.upload_files(files=['iris.csv'],
                            target_path='script-run/',
                            overwrite=True,
                            show_progress=True)

Now we are ready to define the input and output of the script. Both are passed via `arguments`, a list of command-line arguments for the training script specified in `script`.

In [None]:
from azureml.core import Dataset
from azureml.data import OutputFileDatasetConfig

input_data = Dataset.File.from_files(def_blob_store.path('script-run/iris.csv')).as_named_input('input').as_mount()

# output is configured to write the result back to def_blob_store, under "sample/outputdataset" folder
# learn more about options to configure the output, run 'help(OutputFileDatasetConfig)'
output = OutputFileDatasetConfig(destination=(def_blob_store, 'sample/outputdataset'))

In [None]:
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

myenv = Environment("myenv")

myenv.docker.enabled = True
myenv.python.conda_dependencies = CondaDependencies.create(pip_packages=['azureml-sdk>=1.12.0'])

In [None]:
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory=source_directory,
                      script='dummy_train.py',
                      # mount the input dataset and the output folder on the remote compute,
                      # and pass both mounted paths as arguments to the training script
                      arguments=[input_data, output],
                      compute_target=compute_target,
                      environment=myenv)
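At run time, each entry in `arguments` reaches the script as a plain path string: as `dummy_train.py` assumes, the first is the path of the mounted input file and the second is the output folder. The sketch below is an `argparse` alternative to indexing `sys.argv` directly. The function name `parse_run_args` and the example paths are hypothetical.

```python
import argparse


def parse_run_args(argv=None):
    """Parse the two positional paths passed to the training script."""
    parser = argparse.ArgumentParser(
        description="Copy the input dataset to the output folder."
    )
    parser.add_argument("input_path", help="path of the mounted input file")
    parser.add_argument("output_path", help="path of the mounted output folder")
    # argv=None makes argparse read sys.argv[1:], as it would on the cluster
    return parser.parse_args(argv)


# Simulate the command line locally with hypothetical mount paths
args = parse_run_args(["/tmp/inputs/iris.csv", "/tmp/outputs"])
print(args.input_path, args.output_path)
```

Compared with `sys.argv[1]`, `argparse` prints a usage message when an argument is missing instead of raising a bare `IndexError`.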

## Build and Submit the Experiment

In [None]:
from azureml.core import Experiment
exp = Experiment(ws, 'ScriptRun_sample')
run = exp.submit(config=src)

## View Run Details

In [None]:
run.wait_for_completion(show_output=True)
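Once the run completes, you can check that the output matches the input. The sketch below assumes you have first downloaded the output folder, for example with `def_blob_store.download(target_path='.', prefix='sample/outputdataset')`, and that the files land under `./sample/outputdataset`. The helper `output_matches_input` is hypothetical.

```python
import os


def output_matches_input(local_input, downloaded_dir, filename="output.csv"):
    """Return True if <downloaded_dir>/<filename> has the same text as local_input."""
    candidate = os.path.join(downloaded_dir, filename)
    if not os.path.isfile(candidate):
        return False
    # Text mode normalizes line endings, mirroring how dummy_train.py reads and writes
    with open(local_input) as a, open(candidate) as b:
        return a.read() == b.read()


# Usage in this notebook, after downloading the output:
# print(output_matches_input('iris.csv', os.path.join('sample', 'outputdataset')))
```

The comparison uses text mode rather than a byte-level check because the script rewrites the file in text mode, which can change line endings.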