Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.


# Azure Machine Learning Pipeline with KustoStep
To use Kusto as a compute target from an [Azure Machine Learning Pipeline](https://aka.ms/pl-concept), use a KustoStep. A KustoStep runs Kusto queries on a target Kusto cluster from within an Azure ML Pipeline. Each KustoStep targets one Kusto cluster and can run multiple queries on it. This notebook demonstrates the use of KustoStep in an Azure Machine Learning (AML) Pipeline.

## Before you begin:

1. **Have an Azure Machine Learning workspace**: You will need details of this workspace later on to define KustoStep.
2. **Have a Service Principal**: You will need a service principal, whose credentials are used to access your cluster. See [how to create a service principal in the Azure portal](https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal) for more information.
3. **Have a Blob storage**: You will need an Azure Blob storage account to hold the output of your Kusto query.

## Azure Machine Learning and Pipeline SDK-specific imports

In [None]:
import os
import azureml.core
from azureml.core.runconfig import JarLibrary
from azureml.core.compute import ComputeTarget, KustoCompute
from azureml.exceptions import ComputeTargetException
from azureml.core import Workspace, Experiment
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import KustoStep
from azureml.core.datastore import Datastore
from azureml.data.data_reference import DataReference

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

## Initialize Workspace

Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't.

In [None]:
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')

## Attach Kusto compute target
Next, create a Kusto compute target and give it a name. You will use this name to refer to the Kusto compute target inside Azure Machine Learning, and your workspace will be associated with it. You will also need to provide the credentials used to access your target Kusto cluster and database:

- **Resource Group** - The resource group name of your Azure Machine Learning workspace
- **Workspace Name** - The workspace name of your Azure Machine Learning workspace
- **Resource ID** - The resource ID of your Kusto cluster
- **Tenant ID** - The tenant ID associated with your Kusto cluster
- **Application ID** - The Application ID associated with your Kusto cluster
- **Application Key** - The Application key associated with your Kusto cluster
- **Kusto Connection String** - The connection string of your Kusto cluster
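
Because the Application Key is a secret, you may prefer not to paste it into the notebook. The following is a minimal sketch, assuming the credentials are exported as environment variables. The variable names (`KUSTO_RESOURCE_ID` and so on) and the helper `load_kusto_credentials` are hypothetical and not part of the Azure ML SDK.

```python
import os

# Hypothetical environment variable names; adjust to your own conventions.
REQUIRED_VARS = {
    "resource_id": "KUSTO_RESOURCE_ID",
    "kusto_connection_string": "KUSTO_CONNECTION_STRING",
    "application_id": "KUSTO_APPLICATION_ID",
    "application_key": "KUSTO_APPLICATION_KEY",
    "tenant_id": "KUSTO_TENANT_ID",
}


def load_kusto_credentials(env=None):
    """Read the Kusto credentials from a mapping (os.environ by default).

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {field: env[var] for field, var in REQUIRED_VARS.items()}


# Demonstration with an in-memory mapping standing in for os.environ.
example_env = {var: "example-" + field for field, var in REQUIRED_VARS.items()}
credentials = load_kusto_credentials(example_env)
print(sorted(credentials))
```

The keys of the returned dictionary match the variable names used in the next cell, so each value can be assigned in place of the corresponding placeholder string.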


In [None]:
compute_name = "<compute_name>" # Name to associate with new compute in workspace

# Account details associated with the target Kusto cluster
resource_id = "<resource_id>" # Resource ID of the Kusto cluster
kusto_connection_string = "<kusto_connection_string>" # Connection string of the Kusto cluster
application_id = "<application_id>" # Application ID associated with the Kusto cluster
application_key = "<application_key>" # Application Key associated with the Kusto cluster
tenant_id = "<tenant_id>" # Tenant ID associated with the Kusto cluster

try:
    kusto_compute = KustoCompute(workspace=ws, name=compute_name)
    print('Compute target {} already exists'.format(compute_name))
except ComputeTargetException:
    print('Compute not found, will use provided parameters to attach new one')
    config = KustoCompute.attach_configuration(resource_group=ws.resource_group, workspace_name=ws.name, 
                                               resource_id=resource_id, tenant_id=tenant_id, 
                                               kusto_connection_string=kusto_connection_string, 
                                               application_id=application_id, application_key=application_key)
    kusto_compute = ComputeTarget.attach(ws, compute_name, config)
    kusto_compute.wait_for_completion(True)

## Setup output
KustoStep currently supports uploading results only to Azure Blob storage. Let's define the output location with a PipelineData object, backed by a blob datastore, to be used in KustoStep.

In [None]:
from azureml.pipeline.core import PipelineParameter

# Use the default blob storage
def_blob_store = Datastore.get(ws, "workspaceblobstore")
print('Datastore {} will be used'.format(def_blob_store.name))

step_1_output = PipelineData("output", datastore=def_blob_store)

## Add a KustoStep to Pipeline
A KustoStep adds a Kusto query as a step in a Pipeline. It takes the following parameters:
- **name:** Name of the step
- **compute_target:** Name of the Kusto compute target
- **database_name:** Name of the database to perform Kusto query on
- **query_directory:** Path to folder that contains only a text file with Kusto queries (see [here](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/) for more details on Kusto queries). 
    - If the query is parameterized, then the text file must also include any declaration of query parameters (see [here](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/queryparametersstatement?pivots=azuredataexplorer) for more details on query parameters declaration statements). 
    - For example, the query text file could contain just the query `StormEvents | count | as HowManyRecords;`, where StormEvents is the table name.
    - Note: the text file should contain only the declarations and queries, without quotation marks around them.
- **output:** Output binding to an Azure Blob store.
- **parameter_dict (optional):** Dictionary that contains the values of parameters declared in the query text file in the **query_directory** mentioned above.
    - Dictionary key is the parameter name, and dictionary value is the parameter value.
    - For example, parameter_dict = {"paramName1": "paramValue1", "paramName2": "paramValue2"}
- **allow_reuse (optional):** Whether the step should reuse previous results when run with the same settings and inputs (defaults to False)
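
The relationship between the query text file and **parameter_dict** can be sketched as follows. This is an illustration under assumptions, not the SDK's own tooling: the helper `write_query_file`, the file name `query.txt`, the parameter `stateName`, and the `State` column are all hypothetical (the `StormEvents` table name comes from the example above). The sketch writes one text file containing a `declare query_parameters` statement followed by the query, and builds a dictionary whose keys match the declared parameter names.

```python
import tempfile
from pathlib import Path


def write_query_file(directory, declarations, query, filename="query.txt"):
    """Write declarations and a query to a single text file in `directory`.

    Refuses to write if the directory already holds other files, since the
    query directory is expected to contain only the query text file.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    others = [p.name for p in directory.iterdir() if p.name != filename]
    if others:
        raise ValueError("Query directory contains other files: " + ", ".join(others))
    path = directory / filename
    # No quotation marks are added around the declarations or the query.
    path.write_text("\n".join(list(declarations) + [query]) + "\n", encoding="utf-8")
    return path


# A temporary folder stands in for your real query directory.
example_query_directory = tempfile.mkdtemp(prefix="kusto_query_")
query_path = write_query_file(
    example_query_directory,
    declarations=["declare query_parameters(stateName:string);"],
    query="StormEvents | where State == stateName | count | as HowManyRecords;",
)

# Keys must match the parameter names declared in the query file.
example_parameter_dict = {"stateName": "TEXAS"}
print(query_path.read_text(encoding="utf-8"))
```

With a file like this in place, the folder path would be passed as `query_directory` and the dictionary as `parameter_dict` when constructing the KustoStep.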

In [None]:
database_name = "<database_name>" # Name of the database to perform Kusto queries on
query_directory = "<query_directory>" # Path to folder that contains a text file with Kusto queries

kustoStep = KustoStep(
    name='KustoNotebook',
    compute_target=compute_name,
    database_name=database_name,
    query_directory=query_directory,
    output=step_1_output,
)

## Build and submit the Experiment

In [None]:
steps = [kustoStep]
pipeline = Pipeline(workspace=ws, steps=steps)
pipeline_run = Experiment(ws, 'Notebook_demo').submit(pipeline)
pipeline_run.wait_for_completion()

## View Run Details

In [None]:
from azureml.widgets import RunDetails
RunDetails(pipeline_run).show()
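
When the notebook widget is not available (for example, when running as a plain script), a text summary of the run can serve the same purpose. The sketch below assumes that `pipeline_run` is the `PipelineRun` returned by `submit` above, and that it exposes `get_steps()` with each step run offering `id` and `get_status()`. The helper name `summarize_steps` is hypothetical.

```python
def summarize_steps(pipeline_run):
    """Return a list of (step run id, status) pairs for a pipeline run.

    Relies only on pipeline_run.get_steps() and, for each step run,
    the `id` attribute and the get_status() method.
    """
    return [(step.id, step.get_status()) for step in pipeline_run.get_steps()]


# Usage, once pipeline_run exists:
# for step_id, status in summarize_steps(pipeline_run):
#     print(step_id, status)
```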