Copyright (c) Microsoft Corporation. All rights reserved. 
Licensed under the MIT License.

# How to Publish a Pipeline and Invoke the REST endpoint
In this notebook, we build a three-step pipeline, publish it, and then trigger a run of the published pipeline by calling its REST endpoint.

## Prerequisites and Azure Machine Learning Basics
If you haven't already, first go through the configuration notebook at https://github.com/Azure/MachineLearningNotebooks. It sets you up with a working config file that contains your workspace name, subscription ID, and resource group.
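`Workspace.from_config()` reads that config file, which is a small JSON document named `config.json`; by default the SDK searches for it starting in the current directory. The sketch below writes and reads such a file. The three key names are the ones the SDK expects, while the values are placeholders you must replace with your own workspace details; the helper functions are hypothetical and only illustrate the file's shape.

```python
import json
from pathlib import Path

# Placeholder values: replace them with your own workspace details.
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

REQUIRED_KEYS = {"subscription_id", "resource_group", "workspace_name"}


def write_config(directory: str) -> Path:
    """Write config.json into `directory` and return the file path (hypothetical helper)."""
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def read_config(path: Path) -> dict:
    """Load config.json and check that every key the SDK needs is present."""
    loaded = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - loaded.keys()
    if missing:
        raise KeyError(f"config.json is missing keys: {sorted(missing)}")
    return loaded
```

With a real file in place, `Workspace.from_config()` picks up these values without any extra arguments.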

### Initialization Steps

In [None]:
import azureml.core
from azureml.core import Workspace, Run, Experiment, Datastore
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute import DataFactoryCompute
from azureml.widgets import RunDetails

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

from azureml.data.data_reference import DataReference
from azureml.pipeline.core import Pipeline, PipelineData, StepSequence
from azureml.pipeline.steps import PythonScriptStep
from azureml.pipeline.steps import DataTransferStep
from azureml.pipeline.core import PublishedPipeline
from azureml.pipeline.core.graph import PipelineParameter

print("Pipeline SDK-specific imports completed")

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

# Default datastore (Azure file storage)
def_file_store = ws.get_default_datastore() 
print("Default datastore's name: {}".format(def_file_store.name))

def_blob_store = Datastore(ws, "workspaceblobstore")
print("Blobstore's name: {}".format(def_blob_store.name))

# project folder
project_folder = '.'

### Compute Targets
#### Retrieve an existing Azure Machine Learning Compute cluster, or create one

In [None]:

from azureml.core.compute_target import ComputeTargetException

aml_compute_target = "aml-compute"
try:
    # Reuse the cluster if it already exists in the workspace
    aml_compute = AmlCompute(ws, aml_compute_target)
    print("Found existing compute target.")
except ComputeTargetException:
    # Otherwise provision a new cluster that scales between 1 and 4 nodes
    print("Creating new compute target")
    provisioning_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                                min_nodes=1,
                                                                max_nodes=4)
    aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)
    aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

print(aml_compute.get_status().serialize())


## Building Pipeline Steps with Inputs and Outputs
A step in the pipeline can take data as input. This data can be either a data source that lives in one of the accessible data locations (referenced here with `DataReference`) or intermediate data produced by a previous step in the pipeline (defined here with `PipelineData`).

In [None]:
# Reference the data uploaded to blob storage using DataReference
# Assign the datasource to blob_input_data variable
blob_input_data = DataReference(
 datastore=def_blob_store,
 data_reference_name="test_data",
 path_on_datastore="20newsgroups/20news.pkl")
print("DataReference object created")

In [None]:
# Define intermediate data using PipelineData
processed_data1 = PipelineData("processed_data1",datastore=def_blob_store)
print("PipelineData object created")

#### Define a Step that consumes a datasource and produces intermediate data.
Here, `trainStep` reads the blob data referenced by `blob_input_data` and writes its results to the intermediate data `processed_data1`.

**Open `train.py` on your local machine and examine its arguments, inputs, and outputs. The argument names passed to the step below must match the names the script parses.**
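The contents of `train.py` are not reproduced in this notebook. As a hypothetical sketch, assuming the script uses `argparse`, its argument handling could look like the following. At run time, Azure ML replaces the `DataReference` and `PipelineData` objects in the step's `arguments` list with file-system paths on the compute target, so the script receives plain strings under the flag names `--input_data` and `--output_train`. The function names here are illustrative, not taken from the real script.

```python
import argparse
import os


def parse_train_args(argv=None):
    """Hypothetical train.py argument parsing; flag names mirror the step's `arguments`."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py argument handling")
    parser.add_argument("--input_data", type=str, required=True,
                        help="Path resolved from the blob_input_data DataReference")
    parser.add_argument("--output_train", type=str, required=True,
                        help="Directory resolved from the processed_data1 PipelineData")
    return parser.parse_args(argv)


def prepare_output(args):
    """Create the output directory before writing results into it."""
    os.makedirs(args.output_train, exist_ok=True)
    return args.output_train
```

Inside the real script you would call `parse_train_args()` with no arguments so that it reads `sys.argv`; if a flag name in the script and in `arguments` differ, the step fails with an argparse error.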

In [None]:
# trainStep consumes the DataReference defined in the previous cell
# and produces processed_data1
trainStep = PythonScriptStep(
 script_name="train.py", 
 arguments=["--input_data", blob_input_data, "--output_train", processed_data1],
 inputs=[blob_input_data],
 outputs=[processed_data1],
 compute_target=aml_compute, 
 source_directory=project_folder
)
print("trainStep created")

#### Define a Step that consumes intermediate data and produces intermediate data
Here, `extractStep` consumes `processed_data1`, the intermediate data produced by `trainStep`, and produces new intermediate data, `processed_data2`.

**Open `extract.py` on your local machine and examine its arguments, inputs, and outputs. The argument names passed to the step below must match the names the script parses.**
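The contents of `extract.py` are not shown either. The hypothetical sketch below, again assuming `argparse`, illustrates how one step hands data to the next: `--input_extract` receives the directory that `trainStep` wrote as `processed_data1`, and `--output_extract` receives a fresh directory for `processed_data2`. The "processing" (recording the names of the incoming files in a `manifest.txt`) is purely illustrative.

```python
import argparse
import os


def parse_extract_args(argv=None):
    """Hypothetical extract.py argument parsing; flag names mirror extractStep's `arguments`."""
    parser = argparse.ArgumentParser(description="Hypothetical extract.py argument handling")
    parser.add_argument("--input_extract", required=True,
                        help="Directory written by trainStep (processed_data1)")
    parser.add_argument("--output_extract", required=True,
                        help="Directory for this step's output (processed_data2)")
    return parser.parse_args(argv)


def extract(input_dir, output_dir):
    """Illustrative processing: list the previous step's files and record them."""
    os.makedirs(output_dir, exist_ok=True)
    names = sorted(os.listdir(input_dir))
    with open(os.path.join(output_dir, "manifest.txt"), "w") as f:
        f.write("\n".join(names))
    return names
```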

In [None]:
# extractStep uses the intermediate data produced by trainStep (processed_data1)
# and produces a new output, processed_data2
processed_data2 = PipelineData("processed_data2", datastore=def_blob_store)

extractStep = PythonScriptStep(
 script_name="extract.py",
 arguments=["--input_extract", processed_data1, "--output_extract", processed_data2],
 inputs=[processed_data1],
 outputs=[processed_data2],
 compute_target=aml_compute, 
 source_directory=project_folder)
print("extractStep created")

#### Define a Step that consumes multiple intermediate data and produces intermediate data
Here, `compareStep` consumes two pieces of intermediate data, `processed_data1` and `processed_data2`, and produces `processed_data3`.

### PipelineParameter

This step also takes a [PipelineParameter](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.graph.pipelineparameter?view=azure-ml-py) argument, which lets you supply a different value each time you call the REST endpoint of the published pipeline.

In [None]:
# pipeline_arg defaults to 10; callers of the published pipeline can override it
pipeline_param = PipelineParameter(name="pipeline_arg", default_value=10)
print("pipeline parameter created")

**Open `compare.py` on your local machine and examine its arguments, inputs, and outputs. The argument names passed to the step below must match the names the script parses.**
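Like the data arguments, the `PipelineParameter` reaches the script as a command-line string, so a script that needs a number has to convert it. The hypothetical sketch below, assuming `compare.py` uses `argparse`, does the conversion with `type=int`. The script-side default of 10 only mirrors `default_value=10` from the cell above; in a pipeline run the parameter's value is always passed on the command line.

```python
import argparse


def parse_compare_args(argv=None):
    """Hypothetical compare.py argument parsing; flag names mirror compareStep's `arguments`."""
    parser = argparse.ArgumentParser(description="Hypothetical compare.py argument handling")
    parser.add_argument("--compare_data1", required=True, help="processed_data1 directory")
    parser.add_argument("--compare_data2", required=True, help="processed_data2 directory")
    parser.add_argument("--output_compare", required=True, help="processed_data3 directory")
    # The pipeline parameter arrives as text; type=int converts it to a number.
    parser.add_argument("--pipeline_param", type=int, default=10,
                        help="Value of the pipeline_arg PipelineParameter")
    return parser.parse_args(argv)
```

For example, a run submitted with `"ParameterAssignments": {"pipeline_arg": 45}` would give `args.pipeline_param == 45` inside the script.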

In [None]:
# Now define compareStep, which takes two inputs (both intermediate data) and produces one output
processed_data3 = PipelineData("processed_data3", datastore=def_blob_store)

compareStep = PythonScriptStep(
 script_name="compare.py",
 arguments=["--compare_data1", processed_data1, "--compare_data2", processed_data2, "--output_compare", processed_data3, "--pipeline_param", pipeline_param],
 inputs=[processed_data1, processed_data2],
 outputs=[processed_data3], 
 compute_target=aml_compute, 
 source_directory=project_folder)
print("compareStep created")

#### Build the pipeline

In [None]:
# Listing compareStep is enough: the steps it depends on are added through their PipelineData links
pipeline1 = Pipeline(workspace=ws, steps=[compareStep])
print("Pipeline is built")

pipeline1.validate()
print("Simple validation complete") 
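Only `compareStep` is passed in `steps`, yet the pipeline also runs `trainStep` and `extractStep`, because those steps produce the `PipelineData` that `compareStep` consumes. The pure-Python sketch below models that idea; it is not the SDK's implementation. Given which step produces and consumes each data object, it walks back from the requested steps to collect every upstream step, then orders them with `graphlib.TopologicalSorter` (standard library, Python 3.9+). The step and data names are copied from this notebook.

```python
from graphlib import TopologicalSorter

# Model of the steps above: step name -> (input data names, output data names).
STEPS = {
    "trainStep": (["test_data"], ["processed_data1"]),
    "extractStep": (["processed_data1"], ["processed_data2"]),
    "compareStep": (["processed_data1", "processed_data2"], ["processed_data3"]),
}


def resolve(requested, steps=STEPS):
    """Return every step needed to run `requested`, in a valid execution order."""
    # Map each data name to the step that produces it.
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    graph = {}  # step -> set of steps it depends on
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name in graph:
            continue
        inputs, _ = steps[name]
        # Inputs with no producing step (such as the DataReference "test_data") are external.
        deps = {producer[i] for i in inputs if i in producer}
        graph[name] = deps
        stack.extend(deps)
    return list(TopologicalSorter(graph).static_order())
```

Calling `resolve(["compareStep"])` returns `["trainStep", "extractStep", "compareStep"]`, which matches the three steps you should see in the pipeline run.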

## Publish the pipeline

In [None]:
published_pipeline1 = pipeline1.publish(name="My_New_Pipeline", description="My Published Pipeline Description")
print(published_pipeline1.id)

### Run published pipeline using its REST endpoint

In [None]:
from azureml.core.authentication import AzureCliAuthentication
import requests

# AzureCliAuthentication reuses your Azure CLI login (run `az login` first)
cli_auth = AzureCliAuthentication()
aad_token = cli_auth.get_authentication_header()

rest_endpoint1 = published_pipeline1.endpoint
print(rest_endpoint1)

# Submit a run and override the default value of pipeline_arg
response = requests.post(rest_endpoint1,
                         headers=aad_token,
                         json={"ExperimentName": "My_Pipeline1",
                               "RunSource": "SDK",
                               "ParameterAssignments": {"pipeline_arg": 45}})
response.raise_for_status()  # surface HTTP errors before reading the response body
run_id = response.json()["Id"]

print(run_id)
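If you trigger the published pipeline from several places, the request above can be wrapped in a small helper. The sketch below is hypothetical: `trigger_pipeline_run` is not part of the Azure ML SDK. It builds the same JSON body used above (`ExperimentName`, `RunSource`, `ParameterAssignments`) and takes the HTTP function as an argument, so you pass `requests.post` in the notebook and a fake in tests. When `parameters` is omitted it sends an empty `ParameterAssignments`, on the assumption that the service then falls back to each parameter's `default_value`.

```python
def trigger_pipeline_run(post, endpoint, auth_header, experiment_name, parameters=None):
    """Submit a run of a published pipeline and return the new run's ID (hypothetical helper).

    `post` is any callable with the calling convention of requests.post.
    """
    body = {
        "ExperimentName": experiment_name,
        "RunSource": "SDK",
        "ParameterAssignments": dict(parameters or {}),
    }
    response = post(endpoint, headers=auth_header, json=body)
    # Fail with the HTTP error rather than a confusing KeyError on "Id".
    response.raise_for_status()
    return response.json()["Id"]
```

In the notebook, the call would be `run_id = trigger_pipeline_run(requests.post, rest_endpoint1, aad_token, "My_Pipeline1", {"pipeline_arg": 45})`.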

# Next: Data Transfer
The next [notebook](./aml-pipelines-data-transfer.ipynb) will showcase data transfer steps between different types of data stores.