update samples from Release-65 as a part of SDK release

2025-12-20 09:37:04 -05:00 · 2020-09-17 01:14:32 +00:00
parent db2bf8ae93
commit 8dad09a42f
9 changed files with 787 additions and 10 deletions
--- a/how-to-use-azureml/automated-machine-learning/README.md
+++ b/how-to-use-azureml/automated-machine-learning/README.md
@@ -154,6 +154,12 @@ jupyter notebook
 - [auto-ml-continuous-retraining.ipynb](continuous-retraining/auto-ml-continuous-retraining.ipynb)
    - Continuous retraining using Pipelines and Time-Series TabularDataset
 - [auto-ml-classification-text-dnn.ipynb](classification-text-dnn/auto-ml-classification-text-dnn.ipynb)
    - Classification with text data using deep learning in automated ML
    - AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data.
    - Depending on the compute cluster the user provides, AutoML tries Bidirectional Encoder Representations from Transformers (BERT) when GPU compute is used.
    - It tries a Bidirectional Long Short-Term Memory network (BiLSTM) when CPU compute is used, thereby optimizing the choice of DNN for the user's setup.
 <a name="documentation"></a>
 See [Configure automated machine learning experiments](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-auto-train) to learn more about the settings and features available for automated machine learning experiments.
--- a/how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.ipynb
@@ -0,0 +1,592 @@
 {
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Copyright (c) Microsoft Corporation. All rights reserved.\n",
        "\n",
        "Licensed under the MIT License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.png)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Automated Machine Learning\n",
        "_**Text Classification Using Deep Learning**_\n",
        "\n",
        "## Contents\n",
        "1. [Introduction](#Introduction)\n",
        "1. [Setup](#Setup)\n",
        "1. [Data](#Data)\n",
        "1. [Train](#Train)\n",
        "1. [Evaluate](#Evaluate)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Introduction\n",
        "This notebook demonstrates classification with text data using deep learning in AutoML.\n",
        "\n",
        "AutoML highlights here include using deep neural networks (DNNs) to create embedded features from text data. Depending on the compute cluster the user provides, AutoML tried out Bidirectional Encoder Representations from Transformers (BERT) when a GPU compute is used, and Bidirectional Long-Short Term neural network (BiLSTM) when a CPU compute is used, thereby optimizing the choice of DNN for the uesr's setup.\n",
        "\n",
        "Make sure you have executed the [configuration](../../../configuration.ipynb) before running this notebook.\n",
        "\n",
        "An Enterprise workspace is required for this notebook. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade).\n",
        "\n",
        "Notebook synopsis:\n",
        "1. Creating an Experiment in an existing Workspace\n",
        "2. Configuration and remote run of AutoML for a text dataset (20 Newsgroups dataset from scikit-learn) for classification\n",
        "3. Registering the best model for future use\n",
        "4. Evaluating the final model on a test set"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Setup"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import logging\n",
        "import os\n",
        "import shutil\n",
        "\n",
        "import pandas as pd\n",
        "\n",
        "import azureml.core\n",
        "from azureml.core.experiment import Experiment\n",
        "from azureml.core.workspace import Workspace\n",
        "from azureml.core.dataset import Dataset\n",
        "from azureml.core.compute import AmlCompute\n",
        "from azureml.core.compute import ComputeTarget\n",
        "from azureml.core.run import Run\n",
        "from azureml.widgets import RunDetails\n",
        "from azureml.core.model import Model \n",
        "from helper import run_inference, get_result_df\n",
        "from azureml.train.automl import AutoMLConfig\n",
        "from sklearn.datasets import fetch_20newsgroups"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "print(\"This notebook was created using version 1.13.0 of the Azure ML SDK\")\n",
        "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As part of the setup you have already created a <b>Workspace</b>. To run AutoML, you also need to create an <b>Experiment</b>. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "ws = Workspace.from_config()\n",
        "\n",
        "# Choose an experiment name.\n",
        "experiment_name = 'automl-classification-text-dnn'\n",
        "\n",
        "experiment = Experiment(ws, experiment_name)\n",
        "\n",
        "output = {}\n",
        "output['Subscription ID'] = ws.subscription_id\n",
        "output['Workspace Name'] = ws.name\n",
        "output['Resource Group'] = ws.resource_group\n",
        "output['Location'] = ws.location\n",
        "output['Experiment Name'] = experiment.name\n",
        "pd.set_option('display.max_colwidth', -1)\n",
        "outputDf = pd.DataFrame(data = output, index = [''])\n",
        "outputDf.T"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Set up a compute cluster\n",
        "This section uses a user-provided compute cluster (named \"dnntext-cluster\" in this example). If a cluster with this name does not exist in the user's workspace, the below code will create a new cluster. You can choose the parameters of the cluster as mentioned in the comments.\n",
        "\n",
        "Whether you provide/select a CPU or GPU cluster, AutoML will choose the appropriate DNN for that setup - BiLSTM or BERT text featurizer will be included in the candidate featurizers on CPU and GPU respectively.  If your goal is to obtain the most accurate model, we recommend you use GPU clusters since BERT featurizers usually outperform BiLSTM featurizers."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from azureml.core.compute import ComputeTarget, AmlCompute\n",
        "from azureml.core.compute_target import ComputeTargetException\n",
        "\n",
        "num_nodes = 2\n",
        "\n",
        "# Choose a name for your cluster.\n",
        "amlcompute_cluster_name = \"dnntext-cluster\"\n",
        "\n",
        "# Verify that cluster does not exist already\n",
        "try:\n",
        "    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
        "    print('Found existing cluster, use it.')\n",
        "except ComputeTargetException:\n",
        "    compute_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\", # CPU for BiLSTM, such as \"STANDARD_D2_V2\" \n",
        "                                                           # To use BERT (this is recommended for best performance), select a GPU such as \"STANDARD_NC6\" \n",
        "                                                           # or similar GPU option\n",
        "                                                           # available in your workspace\n",
        "                                                           max_nodes = num_nodes)\n",
        "    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
        "\n",
        "compute_target.wait_for_completion(show_output=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Get data\n",
        "For this notebook we will use 20 Newsgroups data from scikit-learn. We filter the data to contain four classes and take a sample as training data. Please note that for accuracy improvement, more data is needed. For this notebook we provide a small-data example so that you can use this template to use with your larger sized data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_dir = \"text-dnn-data\" # Local directory to store data\n",
        "blobstore_datadir = data_dir # Blob store directory to store data in\n",
        "target_column_name = 'y'\n",
        "feature_column_name = 'X'\n",
        "\n",
        "def get_20newsgroups_data():\n",
        "    '''Fetches 20 Newsgroups data from scikit-learn\n",
        "       Returns them in form of pandas dataframes\n",
        "    '''\n",
        "    remove = ('headers', 'footers', 'quotes')\n",
        "    categories = [\n",
        "        'rec.sport.baseball',\n",
        "        'rec.sport.hockey',\n",
        "        'comp.graphics',\n",
        "        'sci.space',\n",
        "        ]\n",
        "\n",
        "    data = fetch_20newsgroups(subset = 'train', categories = categories,\n",
        "                                    shuffle = True, random_state = 42,\n",
        "                                    remove = remove)\n",
        "    data = pd.DataFrame({feature_column_name: data.data, target_column_name: data.target})\n",
        "\n",
        "    data_train = data[:200]\n",
        "    data_test = data[200:300]    \n",
        "\n",
        "    data_train = remove_blanks_20news(data_train, feature_column_name, target_column_name)\n",
        "    data_test = remove_blanks_20news(data_test, feature_column_name, target_column_name)\n",
        "    \n",
        "    return data_train, data_test\n",
        "    \n",
        "def remove_blanks_20news(data, feature_column_name, target_column_name):\n",
        "    \n",
        "    data[feature_column_name] = data[feature_column_name].replace(r'\\n', ' ', regex=True).apply(lambda x: x.strip())\n",
        "    data = data[data[feature_column_name] != '']\n",
        "    \n",
        "    return data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Fetch data and upload to datastore for use in training"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "data_train, data_test = get_20newsgroups_data()\n",
        "\n",
        "if not os.path.isdir(data_dir):\n",
        "    os.mkdir(data_dir)\n",
        "    \n",
        "train_data_fname = data_dir + '/train_data.csv'\n",
        "test_data_fname = data_dir + '/test_data.csv'\n",
        "\n",
        "data_train.to_csv(train_data_fname, index=False)\n",
        "data_test.to_csv(test_data_fname, index=False)\n",
        "\n",
        "datastore = ws.get_default_datastore()\n",
        "datastore.upload(src_dir=data_dir, target_path=blobstore_datadir,\n",
        "                    overwrite=True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "train_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, blobstore_datadir + '/train_data.csv')])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Prepare AutoML run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This step requires an Enterprise workspace to gain access to this feature. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade).\n",
        "\n",
        "This notebook uses the blocked_models parameter to exclude some models that can take a longer time to train on some text datasets. You can choose to remove models from the blocked_models list but you may need to increase the experiment_timeout_hours parameter value to get results."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_settings = {\n",
        "    \"experiment_timeout_minutes\": 20,\n",
        "    \"primary_metric\": 'accuracy',\n",
        "    \"max_concurrent_iterations\": num_nodes, \n",
        "    \"max_cores_per_iteration\": -1,\n",
        "    \"enable_dnn\": True,\n",
        "    \"enable_early_stopping\": True,\n",
        "    \"validation_size\": 0.3,\n",
        "    \"verbosity\": logging.INFO,\n",
        "    \"enable_voting_ensemble\": False,\n",
        "    \"enable_stack_ensemble\": False,\n",
        "}\n",
        "\n",
        "automl_config = AutoMLConfig(task = 'classification',\n",
        "                             debug_log = 'automl_errors.log',\n",
        "                             compute_target=compute_target,\n",
        "                             training_data=train_dataset,\n",
        "                             label_column_name=target_column_name,\n",
        "                             blocked_models = ['LightGBM'],\n",
        "                             **automl_settings\n",
        "                            )"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "#### Submit AutoML Run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_run = experiment.submit(automl_config, show_output=True)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "automl_run"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Displaying the run objects gives you links to the visual tools in the Azure Portal. Go try them!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieve the Best Model\n",
        "Below we select the best model pipeline from our iterations, use it to test on test data on the same compute cluster."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can test the model locally to get a feel of the input/output. When the model contains BERT, this step will require pytorch and pytorch-transformers installed in your local environment. The exact versions of these packages can be found in the **automl_env.yml** file located in the local copy of your MachineLearningNotebooks folder here:\n",
        "MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/automl_env.yml"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "best_run, fitted_model = automl_run.get_output()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "You can now see what text transformations are used to convert text data to features for this dataset, including deep learning transformations based on BiLSTM or Transformer (BERT is one implementation of a Transformer) models."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "text_transformations_used = []\n",
        "for column_group in fitted_model.named_steps['datatransformer'].get_featurization_summary():\n",
        "    text_transformations_used.extend(column_group['Transformations'])\n",
        "text_transformations_used"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Registering the best model\n",
        "We now register the best fitted model from the AutoML Run for use in future deployments.  "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Get results stats, extract the best model from AutoML run, download and register the resultant best model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "summary_df = get_result_df(automl_run)\n",
        "best_dnn_run_id = summary_df['run_id'].iloc[0]\n",
        "best_dnn_run = Run(experiment, best_dnn_run_id)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "model_dir = 'Model' # Local folder where the model will be stored temporarily\n",
        "if not os.path.isdir(model_dir):\n",
        "    os.mkdir(model_dir)\n",
        "    \n",
        "best_dnn_run.download_file('outputs/model.pkl', model_dir + '/model.pkl')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Register the model in your Azure Machine Learning Workspace. If you previously registered a model, please make sure to delete it so as to replace it with this new model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Register the model\n",
        "model_name = 'textDNN-20News'\n",
        "model = Model.register(model_path = model_dir + '/model.pkl',\n",
        "                       model_name = model_name,\n",
        "                       tags=None,\n",
        "                       workspace=ws)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Evaluate on Test Data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We now use the best fitted model from the AutoML Run to make predictions on the test set.  \n",
        "\n",
        "Test set schema should match that of the training set."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, blobstore_datadir + '/test_data.csv')])\n",
        "\n",
        "# preview the first 3 rows of the dataset\n",
        "test_dataset.take(3).to_pandas_dataframe()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_experiment = Experiment(ws, experiment_name + \"_test\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "script_folder = os.path.join(os.getcwd(), 'inference')\n",
        "os.makedirs(script_folder, exist_ok=True)\n",
        "shutil.copy('infer.py', script_folder)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_run = run_inference(test_experiment, compute_target, script_folder, best_dnn_run,\n",
        "                         train_dataset, test_dataset, target_column_name, model_name)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Display computed metrics"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_run"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "RunDetails(test_run).show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "test_run.wait_for_completion()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "pd.Series(test_run.get_metrics())"
      ]
    }
  ],
  "metadata": {
    "authors": [
      {
        "name": "anshirga"
      }
    ],
    "compute": [
      "AML Compute"
    ],
    "datasets": [
      "None"
    ],
    "deployment": [
      "None"
    ],
    "exclude_from_index": false,
    "framework": [
      "None"
    ],
    "friendly_name": "DNN Text Featurization",
    "index_order": 2,
    "kernelspec": {
      "display_name": "Python 3.6",
      "language": "python",
      "name": "python36"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.7"
    },
    "tags": [
      "None"
    ],
    "task": "Text featurization using DNNs for classification"
  },
  "nbformat": 4,
  "nbformat_minor": 2
 }
--- a/how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.yml
+++ b/how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.yml
@@ -0,0 +1,4 @@
 name: auto-ml-classification-text-dnn
 dependencies:
 - pip:
  - azureml-sdk
--- a/how-to-use-azureml/automated-machine-learning/classification-text-dnn/helper.py
+++ b/how-to-use-azureml/automated-machine-learning/classification-text-dnn/helper.py
@@ -0,0 +1,56 @@
 import pandas as pd
 from azureml.core import Environment
 from azureml.train.estimator import Estimator
 from azureml.core.run import Run
 def run_inference(test_experiment, compute_target, script_folder, train_run,
                  train_dataset, test_dataset, target_column_name, model_name):
    inference_env = train_run.get_environment()
    est = Estimator(source_directory=script_folder,
                    entry_script='infer.py',
                    script_params={
                        '--target_column_name': target_column_name,
                        '--model_name': model_name
                    },
                    inputs=[
                        train_dataset.as_named_input('train_data'),
                        test_dataset.as_named_input('test_data')
                    ],
                    compute_target=compute_target,
                    environment_definition=inference_env)
    run = test_experiment.submit(
        est, tags={
            'training_run_id': train_run.id,
            'run_algorithm': train_run.properties['run_algorithm'],
            'valid_score': train_run.properties['score'],
            'primary_metric': train_run.properties['primary_metric']
        })
    run.log("run_algorithm", run.tags['run_algorithm'])
    return run
 def get_result_df(remote_run):
    children = list(remote_run.get_children(recursive=True))
    summary_df = pd.DataFrame(index=['run_id', 'run_algorithm',
                                     'primary_metric', 'Score'])
    goal_minimize = False
    for run in children:
        if('run_algorithm' in run.properties and 'score' in run.properties):
            summary_df[run.id] = [run.id, run.properties['run_algorithm'],
                                  run.properties['primary_metric'],
                                  float(run.properties['score'])]
            if('goal' in run.properties):
                goal_minimize = run.properties['goal'].split('_')[-1] == 'min'
    summary_df = summary_df.T.sort_values(
        'Score',
        ascending=goal_minimize).drop_duplicates(['run_algorithm'])
    summary_df = summary_df.set_index('run_algorithm')
    return summary_df
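As a reading aid for `get_result_df`, the following is a minimal sketch of the same selection logic using plain dictionaries in place of `Run` objects and pandas. The `best_per_algorithm` name and the sample `records` (including the `accuracy_max` goal string) are hypothetical; only the `run_algorithm`, `score`, and `goal` property names are taken from the helper.

```python
def best_per_algorithm(records):
    """Return the best-scoring record per algorithm, best first.

    Each record is a dict shaped like run.properties in helper.py
    (hypothetical stand-in data, not real Run objects).
    """
    goal_minimize = False
    scored = []
    for rec in records:
        # Skip runs that carry no algorithm or score, as the helper does.
        if 'run_algorithm' in rec and 'score' in rec:
            scored.append({
                'run_id': rec['run_id'],
                'run_algorithm': rec['run_algorithm'],
                'Score': float(rec['score']),
            })
            if 'goal' in rec:
                goal_minimize = rec['goal'].split('_')[-1] == 'min'

    # Ascending when minimizing, descending when maximizing.
    scored.sort(key=lambda row: row['Score'], reverse=not goal_minimize)

    # Keep only the first (best) row per algorithm.
    seen = set()
    best = []
    for row in scored:
        if row['run_algorithm'] not in seen:
            seen.add(row['run_algorithm'])
            best.append(row)
    return best


records = [
    {'run_id': 'r1', 'run_algorithm': 'LogisticRegression',
     'score': '0.81', 'goal': 'accuracy_max'},
    {'run_id': 'r2', 'run_algorithm': 'LogisticRegression',
     'score': '0.86', 'goal': 'accuracy_max'},
    {'run_id': 'r3', 'run_algorithm': 'SVM',
     'score': '0.90', 'goal': 'accuracy_max'},
    {'run_id': 'r4'},  # no score recorded, so it is skipped
]
ranked = best_per_algorithm(records)
print([row['run_id'] for row in ranked])
```

The first entry of the result plays the role of `summary_df['run_id'].iloc[0]` in the notebook.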
--- a/how-to-use-azureml/automated-machine-learning/classification-text-dnn/infer.py
+++ b/how-to-use-azureml/automated-machine-learning/classification-text-dnn/infer.py
@@ -0,0 +1,60 @@
 import argparse
 import numpy as np
 import joblib
 from azureml.automl.runtime.shared.score import scoring, constants
 from azureml.core import Run
 from azureml.core.model import Model
 parser = argparse.ArgumentParser()
 parser.add_argument(
    '--target_column_name', type=str, dest='target_column_name',
    help='Target Column Name')
 parser.add_argument(
    '--model_name', type=str, dest='model_name',
    help='Name of registered model')
 args = parser.parse_args()
 target_column_name = args.target_column_name
 model_name = args.model_name
 print('args passed are: ')
 print('Target column name: ', target_column_name)
 print('Name of registered model: ', model_name)
 model_path = Model.get_model_path(model_name)
 # deserialize the model file back into a sklearn model
 model = joblib.load(model_path)
 run = Run.get_context()
 # get input dataset by name
 test_dataset = run.input_datasets['test_data']
 train_dataset = run.input_datasets['train_data']
 X_test_df = test_dataset.drop_columns(columns=[target_column_name]) \
                        .to_pandas_dataframe()
 y_test_df = test_dataset.with_timestamp_columns(None) \
                        .keep_columns(columns=[target_column_name]) \
                        .to_pandas_dataframe()
 y_train_df = train_dataset.with_timestamp_columns(None) \
                         .keep_columns(columns=[target_column_name]) \
                         .to_pandas_dataframe()
 predicted = model.predict_proba(X_test_df)
 # Use the AutoML scoring module
 class_labels = np.unique(np.concatenate((y_train_df.values, y_test_df.values)))
 train_labels = model.classes_
 classification_metrics = list(constants.CLASSIFICATION_SCALAR_SET)
 scores = scoring.score_classification(y_test_df.values, predicted,
                                      classification_metrics,
                                      class_labels, train_labels)
 print("scores:")
 print(scores)
 for key, value in scores.items():
    run.log(key, value)
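The scoring call in `infer.py` passes both `class_labels` (the union of labels seen in the train and test sets) and `train_labels` (`model.classes_`), presumably because the columns of `predict_proba` follow the training label order. A minimal standard-library sketch of that relationship follows; the data and the `predicted_labels` helper are hypothetical, and this is not the AutoML scoring module.

```python
def predicted_labels(probabilities, train_labels):
    """Map each probability row to the label of its highest-scoring column."""
    labels = []
    for row in probabilities:
        best_col = max(range(len(row)), key=row.__getitem__)
        labels.append(train_labels[best_col])
    return labels


# Hypothetical labels: class 2 appears only in the test set.
y_train = [0, 1, 3]
y_test = [1, 3, 2, 0]

# Union of all labels, analogous to class_labels in infer.py.
class_labels = sorted(set(y_train) | set(y_test))
# Stand-in for model.classes_: only labels seen during training.
train_labels = sorted(set(y_train))

# One row per test sample, one column per entry of train_labels.
probabilities = [
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
    [0.5, 0.3, 0.2],
    [0.8, 0.1, 0.1],
]

y_pred = predicted_labels(probabilities, train_labels)
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(class_labels, y_pred, accuracy)
```

A label that never occurs in training can never be predicted, which is why the metric computation needs to know both label sets.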
--- a/how-to-use-azureml/azure-databricks/automl/README.md
+++ b/how-to-use-azureml/azure-databricks/automl/README.md
@@ -0,0 +1,56 @@
 # Adding an init script to an Azure Databricks cluster
 The [azureml-cluster-init.sh](./azureml-cluster-init.sh) script configures the environment to:
 1. Install the latest AutoML library
 To create the Azure Databricks cluster-scoped init script:
 1. Create the base directory you want to store the init script in if it does not exist.
    ```
    dbutils.fs.mkdirs("dbfs:/databricks/init/")
    ```
 2. Create the script azureml-cluster-init.sh
    ```
    dbutils.fs.put("/databricks/init/azureml-cluster-init.sh","""
    #!/bin/bash
    set -ex
    /databricks/python/bin/pip install -r https://aka.ms/automl_linux_requirements.txt
    """, True)
    ```
 3. Check that the script exists.
    ```
    display(dbutils.fs.ls("dbfs:/databricks/init/azureml-cluster-init.sh"))
    ```
 4. Configure the cluster to run the script.
    * Using the cluster configuration page
        1. On the cluster configuration page, click the Advanced Options toggle.
        1. At the bottom of the page, click the Init Scripts tab.
        1. In the Destination drop-down, select a destination type. Example: 'DBFS'
        1. Specify a path to the init script.
            ```
            dbfs:/databricks/init/azureml-cluster-init.sh
            ```
        1. Click Add
    * Using the API.
        ```
        curl -n -X POST -H 'Content-Type: application/json' -d '{
        "cluster_id": "<cluster_id>",
        "num_workers": <num_workers>,
        "spark_version": "<spark_version>",
        "node_type_id": "<node_type_id>",
        "cluster_log_conf": {
            "dbfs" : {
            "destination": "dbfs:/cluster-logs"
            }
        },
        "init_scripts": [ {
            "dbfs": {
            "destination": "dbfs:/databricks/init/azureml-cluster-init.sh"
            }
        } ]
        }' https://<databricks-instance>/api/2.0/clusters/edit
        ```
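The request body from the API example can also be assembled programmatically, which avoids shell-quoting mistakes. The following is a minimal sketch using only the standard library; the `build_edit_payload` helper and the sample argument values are hypothetical, while the field names are copied from the curl example. It only builds and prints the JSON; sending it to `/api/2.0/clusters/edit` is left out.

```python
import json

# Init script path taken from the steps in this README.
INIT_SCRIPT = "dbfs:/databricks/init/azureml-cluster-init.sh"


def build_edit_payload(cluster_id, num_workers, spark_version, node_type_id):
    """Build a clusters/edit request body with the init script attached."""
    return {
        "cluster_id": cluster_id,
        "num_workers": num_workers,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "cluster_log_conf": {"dbfs": {"destination": "dbfs:/cluster-logs"}},
        "init_scripts": [{"dbfs": {"destination": INIT_SCRIPT}}],
    }


# Placeholder values; replace them with the values for your own cluster.
payload = build_edit_payload(
    "<cluster_id>", 2, "<spark_version>", "<node_type_id>")
body = json.dumps(payload, indent=2)
print(body)
```

The printed text can be saved to a file and passed to curl with `-d @file.json` in place of the inline JSON.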
--- a/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.ipynb
+++ b/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.ipynb
@@ -13,12 +13,13 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "We support installing AML SDK as library from GUI. When attaching a library follow this https://docs.databricks.com/user-guide/libraries.html and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
+        "## AutoML Installation\n",
        "\n",
-        "**install azureml-sdk with Automated ML**\n",
+        "**For Databricks non ML runtime 7.1 (Scala 2.12, Spark 3.0.0) and up, install the AML SDK by running the following command in the first cell of the notebook.**\n",
-        "* Source: Upload Python Egg or PyPi\n",
+        "\n",
-        "* PyPi Name: `azureml-sdk[automl]`\n",
+        "%pip install -r https://aka.ms/automl_linux_requirements.txt\n",
-        "* Select Install Library"
+        "\n",
        "**For Databricks non ML runtime 7.0 and lower, Install AML sdk using init script as shown in [readme](readme.md) before running this notebook.**\n"
      ]
    },
    {
--- a/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb
+++ b/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb
@@ -13,12 +13,13 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "We support installing AML SDK as library from GUI. When attaching a library follow this https://docs.databricks.com/user-guide/libraries.html and add the below string as your PyPi package. You can select the option to attach the library to all clusters or just one cluster.\n",
+        "## AutoML Installation\n",
        "\n",
-        "**install azureml-sdk with Automated ML**\n",
+        "**For Databricks non ML runtime 7.1 (Scala 2.12, Spark 3.0.0) and up, install the AML SDK by running the following command in the first cell of the notebook.**\n",
-        "* Source: Upload Python Egg or PyPi\n",
+        "\n",
-        "* PyPi Name: `azureml-sdk[automl]`\n",
+        "%pip install -r https://aka.ms/automl_linux_requirements.txt\n",
-        "* Select Install Library"
+        "\n",
        "**For Databricks non ML runtime 7.0 and lower, Install AML sdk using init script as shown in [readme](readme.md) before running this notebook.**"
      ]
    },
    {
--- a/index.md
+++ b/index.md
@@ -94,6 +94,7 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
 ## Other Notebooks
 |Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags |
 |:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:|
 | [DNN Text Featurization](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.ipynb) | Text featurization using DNNs for classification | None | AML Compute | None | None | None |
 | [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) |  |  |  |  |  |  |
 | [fairlearn-azureml-mitigation](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/fairness/fairlearn-azureml-mitigation.ipynb) |  |  |  |  |  |  |
 | [upload-fairness-dashboard](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/fairness/upload-fairness-dashboard.ipynb) |  |  |  |  |  |  |