MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-kusto-as-compute-target.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.  \n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Azure Machine Learning Pipeline with KustoStep\n",
        "To use Kusto as a compute target from [Azure Machine Learning Pipeline](https://aka.ms/pl-concept), a KustoStep is used. A KustoStep enables the functionality of running Kusto queries on a target Kusto cluster in Azure ML Pipelines. Each KustoStep can target one Kusto cluster and perform multiple queries on them. This notebook demonstrates the use of KustoStep in Azure Machine Learning (AML) Pipeline.\n",
        "\n",
        "## Before you begin:\n",
        "\n",
        "1. **Have an Azure Machine Learning workspace**: You will need details of this workspace later on to define KustoStep.\n",
        "2. **Have a Service Principal**: You will need a service principal and use its credentials to access your cluster. See [this](https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal) for more information.\n",
        "3. **Have a Blob storage**: You will need a Azure Blob storage for uploading the output of your Kusto query."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Azure Machine Learning and Pipeline SDK-specific imports"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import azureml.core\n",
        "from azureml.core.runconfig import JarLibrary\n",
        "from azureml.core.compute import ComputeTarget, KustoCompute\n",
        "from azureml.exceptions import ComputeTargetException\n",
        "from azureml.core import Workspace, Experiment\n",
        "from azureml.pipeline.core import Pipeline, PipelineData\n",
        "from azureml.pipeline.steps import KustoStep\n",
        "from azureml.core.datastore import Datastore\n",
        "from azureml.data.data_reference import DataReference\n",
        "\n",
        "# Check core SDK version number\n",
        "print(\"SDK version:\", azureml.core.VERSION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Initialize Workspace\n",
        "\n",
        "Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://aka.ms/pl-config) first if you haven't."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Attach Kusto compute target\n",
        "Next, you need to create a Kusto compute target and give it a name. You will use this name to refer to your Kusto compute target inside Azure Machine Learning. Your workspace will be associated to this Kusto compute target. You will also need to provide some credentials that will be used to enable access to your target Kusto cluster and database.\n",
        "\n",
        "- **Resource Group** - The resource group name of your Azure Machine Learning workspace\n",
        "- **Workspace Name** - The workspace name of your Azure Machine Learning workspace\n",
        "- **Resource ID** - The resource ID of your Kusto cluster\n",
        "- **Tenant ID** - The tenant ID associated to your Kusto cluster\n",
        "- **Application ID** - The Application ID associated to your Kusto cluster\n",
        "- **Application Key** - The Application key associated to your Kusto cluster\n",
        "- **Kusto Connection String** - The connection string of your Kusto cluster\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "sample-databrickscompute-attach"
        ]
      },
      "outputs": [],
      "source": [
        "compute_name = \"<compute_name>\" # Name to associate with new compute in workspace\n",
        "\n",
        "# Account details associated to the target Kusto cluster\n",
        "resource_id = \"<resource_id>\" # Resource ID of the Kusto cluster\n",
        "kusto_connection_string = \"<kusto_connection_string>\" # Connection string of the Kusto cluster\n",
        "application_id = \"<application_id>\" # Application ID associated to the Kusto cluster\n",
        "application_key = \"<application_key>\" # Application Key associated to the Kusto cluster\n",
        "tenant_id = \"<tenant_id>\" # Tenant ID associated to the Kusto cluster\n",
        "\n",
        "try:\n",
        "    kusto_compute = KustoCompute(workspace=ws, name=compute_name)\n",
        "    print('Compute target {} already exists'.format(compute_name))\n",
        "except ComputeTargetException:\n",
        "    print('Compute not found, will use provided parameters to attach new one')\n",
        "    config = KustoCompute.attach_configuration(resource_group=ws.resource_group, workspace_name=ws.name, \n",
        "                                               resource_id=resource_id, tenant_id=tenant_id, \n",
        "                                               kusto_connection_string=kusto_connection_string, \n",
        "                                               application_id=application_id, application_key=application_key)\n",
        "    kusto_compute=ComputeTarget.attach(ws, compute_name, config)\n",
        "    kusto_compute.wait_for_completion(True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup output\n",
        "To use Kusto as a compute target for Azure Machine Learning Pipeline, a KustoStep is used. Currently KustoStep only supports uploading results to Azure Blob store. Let's define an output datastore via PipelineData to be used in KustoStep."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.pipeline.core import PipelineParameter\n",
        "\n",
        "# Use the default blob storage\n",
        "def_blob_store = Datastore.get(ws, \"workspaceblobstore\")\n",
        "print('Datastore {} will be used'.format(def_blob_store.name))\n",
        "\n",
        "step_1_output = PipelineData(\"output\", datastore=def_blob_store)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Add a KustoStep to Pipeline\n",
        "Adds a Kusto query as a step in a Pipeline.\n",
        "- **name:** Name of the Module\n",
        "- **compute_target:** Name of Kusto compute target\n",
        "- **database_name:** Name of the database to perform Kusto query on\n",
        "- **query_directory:** Path to folder that contains only a text file with Kusto queries (see [here](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/) for more details on Kusto queries). \n",
        "    - If the query is parameterized, then the text file must also include any declaration of query parameters (see [here](https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/queryparametersstatement?pivots=azuredataexplorer) for more details on query parameters declaration statements). \n",
        "    - An example of the query text file could just contain the query \"StormEvents | count | as HowManyRecords;\", where StormEvents is the table name. \n",
        "    - Note. the text file should just contain the declarations and queries without quotation marks around them.\n",
        "- **outputs:** Output binding to an Azure Blob Store.\n",
        "- **parameter_dict (optional):** Dictionary that contains the values of parameters declared in the query text file in the **query_directory** mentioned above.\n",
        "    - Dictionary key is the parameter name, and dictionary value is the parameter value.\n",
        "    - For example, parameter_dict = {\"paramName1\": \"paramValue1\", \"paramName2\": \"paramValue2\"}\n",
        "- **allow_reuse (optional):** Whether the step should reuse previous results when run with the same settings/inputs (default to False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "database_name = \"<database_name>\" # Name of the database to perform Kusto queries on\n",
        "query_directory = \"<query_directory>\" # Path to folder that contains a text file with Kusto queries\n",
        "\n",
        "kustoStep = KustoStep(\n",
        "    name='KustoNotebook',\n",
        "    compute_target=compute_name,\n",
        "    database_name=database_name,\n",
        "    query_directory=query_directory,\n",
        "    output=step_1_output,\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Build and submit the Experiment"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "steps = [kustoStep]\n",
        "pipeline = Pipeline(workspace=ws, steps=steps)\n",
        "pipeline_run = Experiment(ws, 'Notebook_demo').submit(pipeline)\n",
        "pipeline_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# View Run Details"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.widgets import RunDetails\n",
        "RunDetails(pipeline_run).show()"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "t-kachia"
      }
    ],
    "category": "tutorial",
    "compute": [
      "Kusto"
    ],
    "datasets": [
      "Custom"
    ],
    "deployment": [
      "None"
    ],
    "exclude_from_index": false,
    "framework": [
      "Azure ML, Kusto"
    ],
    "friendly_name": "How to use KustoStep with AML Pipelines",
    "kernelspec": {
      "display_name": "Python 3.8 - AzureML",
      "language": "python",
      "name": "python38-azureml"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.6"
    },
    "order_index": 5,
    "star_tag": [
      "featured"
    ],
    "tags": [
      "None"
    ],
    "task": "Demonstrates the use of KustoStep"
  },
  "nbformat": 4,
  "nbformat_minor": 2
}