Merge pull request #1255 from Azure/release_update/Release-79

update samples from Release-79 as a part of  SDK release
This commit is contained in:
Harneet Virk
2020-12-07 11:11:32 -08:00
committed by GitHub
39 changed files with 372 additions and 280 deletions

View File

@@ -28,7 +28,7 @@ git clone https://github.com/Azure/MachineLearningNotebooks.git
pip install azureml-sdk[notebooks,tensorboard]
# install model explainability component
pip install azureml-sdk[explain]
pip install azureml-sdk[interpret]
# install automated ml components
pip install azureml-sdk[automl]
@@ -86,7 +86,7 @@ If you need additional Azure ML SDK components, you can either modify the Docker
pip install azureml-sdk[automl]
# install the core SDK and model explainability component
pip install azureml-sdk[explain]
pip install azureml-sdk[interpret]
# install the core SDK and experimental components
pip install azureml-sdk[contrib]

View File

@@ -103,7 +103,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -38,7 +38,7 @@
"## Introduction\n",
"This notebook shows how to use [Fairlearn (an open source fairness assessment and unfairness mitigation package)](http://fairlearn.github.io) and Azure Machine Learning Studio for a binary classification problem. This example uses the well-known adult census dataset. For the purposes of this notebook, we shall treat this as a loan decision problem. We will pretend that the label indicates whether or not each individual repaid a loan in the past. We will use the data to train a predictor to predict whether previously unseen individuals will repay a loan or not. The assumption is that the model predictions are used to decide whether an individual should be offered a loan. Its purpose is purely illustrative of a workflow including a fairness dashboard - in particular, we do **not** include a full discussion of the detailed issues which arise when considering fairness in machine learning. For such discussions, please [refer to the Fairlearn website](http://fairlearn.github.io/).\n",
"\n",
"We will apply the [grid search algorithm](https://fairlearn.github.io/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
"We will apply the [grid search algorithm](https://fairlearn.github.io/master/api_reference/fairlearn.reductions.html#fairlearn.reductions.GridSearch) from the Fairlearn package using a specific notion of fairness called Demographic Parity. This produces a set of models, and we will view these in a dashboard both locally and in the Azure Machine Learning Studio.\n",
"\n",
"### Setup\n",
"\n",
@@ -98,8 +98,11 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_openml\n",
"data = fetch_openml(data_id=1590, as_frame=True)\n",
"from utilities import fetch_openml_with_retries\n",
"\n",
"data = fetch_openml_with_retries(data_id=1590)\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
"Y = (data.target == '>50K') * 1\n",
"\n",

View File

@@ -98,8 +98,11 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_openml\n",
"data = fetch_openml(data_id=1590, as_frame=True)\n",
"from utilities import fetch_openml_with_retries\n",
"\n",
"data = fetch_openml_with_retries(data_id=1590)\n",
" \n",
"# Extract the items we want\n",
"X_raw = data.data\n",
"Y = (data.target == '>50K') * 1"
]

View File

@@ -0,0 +1,28 @@
# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------
"""Utilities for azureml-contrib-fairness notebooks."""
from sklearn.datasets import fetch_openml
import time
def fetch_openml_with_retries(data_id, max_retries=4, retry_delay=60):
"""Fetch a given dataset from OpenML with retries as specified."""
for i in range(max_retries):
try:
print("Download attempt {0} of {1}".format(i + 1, max_retries))
data = fetch_openml(data_id=data_id, as_frame=True)
break
except Exception as e:
print("Download attempt failed with exception:")
print(e)
if i + 1 != max_retries:
print("Will retry after {0} seconds".format(retry_delay))
time.sleep(retry_delay)
retry_delay = retry_delay * 2
else:
raise RuntimeError("Unable to download dataset from OpenML")
return data

View File

@@ -3,7 +3,7 @@ dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- python>=3.5.2,<3.6.8
- python>=3.5.2,<3.8
- nb_conda
- boto3==1.15.18
- matplotlib==2.1.0
@@ -21,8 +21,8 @@ dependencies:
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.18.0
- azureml-widgets~=1.19.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.18.0/validated_win32_requirements.txt [--no-deps]
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.19.0/validated_win32_requirements.txt [--no-deps]

View File

@@ -3,7 +3,7 @@ dependencies:
# The python interpreter version.
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- python>=3.5.2,<3.6.8
- python>=3.5.2,<3.8
- nb_conda
- boto3==1.15.18
- matplotlib==2.1.0
@@ -21,9 +21,9 @@ dependencies:
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.18.0
- azureml-widgets~=1.19.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.18.0/validated_linux_requirements.txt [--no-deps]
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.19.0/validated_linux_requirements.txt [--no-deps]

View File

@@ -4,7 +4,7 @@ dependencies:
# Currently Azure ML only supports 3.5.2 and later.
- pip<=19.3.1
- nomkl
- python>=3.5.2,<3.6.8
- python>=3.5.2,<3.8
- nb_conda
- boto3==1.15.18
- matplotlib==2.1.0
@@ -22,8 +22,8 @@ dependencies:
- pip:
# Required packages for AzureML execution, history, and data preparation.
- azureml-widgets~=1.18.0
- azureml-widgets~=1.19.0
- pytorch-transformers==1.0.0
- spacy==2.1.8
- https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.18.0/validated_darwin_requirements.txt [--no-deps]
- -r https://automlcesdkdataresources.blob.core.windows.net/validated-requirements/1.19.0/validated_darwin_requirements.txt [--no-deps]

View File

@@ -105,7 +105,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -93,7 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -96,7 +96,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -298,7 +298,7 @@
" compute_target=compute_target,\n",
" training_data=train_dataset,\n",
" label_column_name=target_column_name,\n",
" blocked_models = ['LightGBM'],\n",
" blocked_models = ['LightGBM', 'XGBoostClassifier'],\n",
" **automl_settings\n",
" )"
]

View File

@@ -81,7 +81,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -68,6 +68,7 @@
"import logging\n",
"\n",
"from matplotlib import pyplot as plt\n",
"import json\n",
"import numpy as np\n",
"import pandas as pd\n",
" \n",
@@ -92,7 +93,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},
@@ -322,6 +323,24 @@
"print(best_run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Show hyperparameters\n",
"Show the model pipeline used for the best run with its hyperparameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run_properties = json.loads(best_run.get_details()['properties']['pipeline_script'])\n",
"print(json.dumps(run_properties, indent = 1)) "
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -113,7 +113,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -3,11 +3,11 @@ from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.train.estimator import Estimator
from azureml.core.run import Run
from azureml.automl.core.shared import constants
def split_fraction_by_grain(df, fraction, time_column_name,
grain_column_names=None):
if not grain_column_names:
df['tmp_grain_column'] = 'grain'
grain_column_names = ['tmp_grain_column']
@@ -17,10 +17,10 @@ def split_fraction_by_grain(df, fraction, time_column_name,
.groupby(grain_column_names, group_keys=False))
df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-int(len(dfg) *
fraction)] if fraction > 0 else dfg)
fraction)] if fraction > 0 else dfg)
df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-int(len(dfg) *
fraction):] if fraction > 0 else dfg[:0])
fraction):] if fraction > 0 else dfg[:0])
if 'tmp_grain_column' in grain_column_names:
for df2 in (df, df_head, df_tail):
@@ -59,11 +59,13 @@ def get_result_df(remote_run):
'primary_metric', 'Score'])
goal_minimize = False
for run in children:
if('run_algorithm' in run.properties and 'score' in run.properties):
if run.get_status().lower() == constants.RunState.COMPLETE_RUN \
and 'run_algorithm' in run.properties and 'score' in run.properties:
# We only count in the completed child runs.
summary_df[run.id] = [run.id, run.properties['run_algorithm'],
run.properties['primary_metric'],
float(run.properties['score'])]
if('goal' in run.properties):
if ('goal' in run.properties):
goal_minimize = run.properties['goal'].split('_')[-1] == 'min'
summary_df = summary_df.T.sort_values(
@@ -118,7 +120,6 @@ def run_multiple_inferences(summary_df, train_experiment, test_experiment,
compute_target, script_folder, test_dataset,
lookback_dataset, max_horizon, target_column_name,
time_column_name, freq):
for run_name, run_summary in summary_df.iterrows():
print(run_name)
print(run_summary)

View File

@@ -87,7 +87,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -1,22 +1,24 @@
import argparse
import azureml.train.automl
from azureml.core import Run
from azureml.core import Dataset, Run
from sklearn.externals import joblib
parser = argparse.ArgumentParser()
parser.add_argument(
'--target_column_name', type=str, dest='target_column_name',
help='Target Column Name')
parser.add_argument(
'--test_dataset', type=str, dest='test_dataset',
help='Test Dataset')
args = parser.parse_args()
target_column_name = args.target_column_name
test_dataset_id = args.test_dataset
run = Run.get_context()
# get input dataset by name
test_dataset = run.input_datasets['test_data']
ws = run.experiment.workspace
df = test_dataset.to_pandas_dataframe().reset_index(drop=True)
# get the input dataset by id
test_dataset = Dataset.get_by_id(ws, id=test_dataset_id)
X_test_df = test_dataset.drop_columns(columns=[target_column_name]).to_pandas_dataframe().reset_index(drop=True)
y_test_df = test_dataset.with_timestamp_columns(None).keep_columns(columns=[target_column_name]).to_pandas_dataframe()

View File

@@ -1,29 +1,32 @@
from azureml.train.estimator import Estimator
from azureml.core import ScriptRunConfig
def run_rolling_forecast(test_experiment, compute_target, train_run, test_dataset,
target_column_name, inference_folder='./forecast'):
def run_rolling_forecast(test_experiment, compute_target, train_run,
test_dataset, target_column_name,
inference_folder='./forecast'):
train_run.download_file('outputs/model.pkl',
inference_folder + '/model.pkl')
inference_env = train_run.get_environment()
est = Estimator(source_directory=inference_folder,
entry_script='forecasting_script.py',
script_params={
'--target_column_name': target_column_name
},
inputs=[test_dataset.as_named_input('test_data')],
compute_target=compute_target,
environment_definition=inference_env)
config = ScriptRunConfig(source_directory=inference_folder,
script='forecasting_script.py',
arguments=['--target_column_name',
target_column_name,
'--test_dataset',
test_dataset.as_named_input(test_dataset.name)],
compute_target=compute_target,
environment=inference_env)
run = test_experiment.submit(est,
tags={
'training_run_id': train_run.id,
'run_algorithm': train_run.properties['run_algorithm'],
'valid_score': train_run.properties['score'],
'primary_metric': train_run.properties['primary_metric']
})
run = test_experiment.submit(config,
tags={'training_run_id':
train_run.id,
'run_algorithm':
train_run.properties['run_algorithm'],
'valid_score':
train_run.properties['score'],
'primary_metric':
train_run.properties['primary_metric']})
run.log("run_algorithm", run.tags['run_algorithm'])
return run

View File

@@ -97,7 +97,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -94,7 +94,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -82,7 +82,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -96,7 +96,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -96,7 +96,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -4,7 +4,7 @@ import os
import joblib
from interpret.ext.glassbox import LGBMExplainableModel
from automl.client.core.common.constants import MODEL_PATH
from azureml.automl.core.shared.constants import MODEL_PATH
from azureml.core.experiment import Experiment
from azureml.core.dataset import Dataset
from azureml.core.run import Run

View File

@@ -92,7 +92,7 @@
"metadata": {},
"outputs": [],
"source": [
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -280,6 +280,7 @@
"\n",
"aks_name = \"my-aks\"\n",
"\n",
"creating_compute = False\n",
"try:\n",
" aks_target = ComputeTarget(ws, aks_name)\n",
" print(\"Using existing AKS cluster {}.\".format(aks_name))\n",
@@ -290,7 +291,8 @@
" prov_config = AksCompute.provisioning_configuration()\n",
" aks_target = ComputeTarget.create(workspace=ws,\n",
" name=aks_name,\n",
" provisioning_configuration=prov_config)"
" provisioning_configuration=prov_config)\n",
" creating_compute = True"
]
},
{
@@ -300,7 +302,7 @@
"outputs": [],
"source": [
"%%time\n",
"if aks_target.provisioning_state != \"Succeeded\":\n",
"if creating_compute:\n",
" aks_target.wait_for_completion(show_output=True)"
]
},

View File

@@ -23,7 +23,7 @@
"# Train and explain models remotely via Azure Machine Learning Compute\n",
"\n",
"\n",
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to train and explain a regression model remotely on an Azure Machine Leanrning Compute Target (AMLCompute).**_\n",
"_**This notebook showcases how to use the Azure Machine Learning Interpretability SDK to train and explain a regression model remotely on an Azure Machine Learning Compute Target (AMLCompute).**_\n",
"\n",
"\n",
"\n",
@@ -35,10 +35,7 @@
" 1. Initialize a Workspace\n",
" 1. Create an Experiment\n",
" 1. Introduction to AmlCompute\n",
" 1. Submit an AmlCompute run in a few different ways\n",
" 1. Option 1: Provision as a run based compute target \n",
" 1. Option 2: Provision as a persistent compute target (Basic)\n",
" 1. Option 3: Provision as a persistent compute target (Advanced)\n",
" 1. Submit an AmlCompute run\n",
"1. Additional operations to perform on AmlCompute\n",
"1. [Download model explanations from Azure Machine Learning Run History](#Download)\n",
"1. [Visualize explanations](#Visualize)\n",
@@ -158,7 +155,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit an AmlCompute run in a few different ways\n",
"## Submit an AmlCompute run\n",
"\n",
"First lets check which VM families are available in your region. Azure is a regional service and some specialized SKUs (especially GPUs) are only available in certain regions. Since AmlCompute is created in the region of your workspace, we will use the supported_vms () function to see if the VM family we want to use ('STANDARD_D2_V2') is supported.\n",
"\n",
@@ -204,7 +201,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 1: Provision a compute target (Basic)\n",
"### Provision a compute target\n",
"\n",
"You can provision an AmlCompute resource by simply defining two parameters thanks to smart defaults. By default it autoscales from 0 nodes and provisions dedicated VMs to run your job in a container. This is useful when you want to continously re-use the same target, debug it between jobs or simply share the resource with other users of your workspace.\n",
"\n",
@@ -327,183 +324,6 @@
"run.get_metrics()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Option 2: Provision a compute target (Advanced)\n",
"\n",
"You can also specify additional properties or change defaults while provisioning AmlCompute using a more advanced configuration. This is useful when you want a dedicated cluster of 4 nodes (for example you can set the min_nodes and max_nodes to 4), or want the compute to be within an existing VNet in your subscription.\n",
"\n",
"In addition to `vm_size` and `max_nodes`, you can specify:\n",
"* `min_nodes`: Minimum nodes (default 0 nodes) to downscale to while running a job on AmlCompute\n",
"* `vm_priority`: Choose between 'dedicated' (default) and 'lowpriority' VMs when provisioning AmlCompute. Low Priority VMs use Azure's excess capacity and are thus cheaper but risk your run being pre-empted\n",
"* `idle_seconds_before_scaledown`: Idle time (default 120 seconds) to wait after run completion before auto-scaling to min_nodes\n",
"* `vnet_resourcegroup_name`: Resource group of the **existing** VNet within which AmlCompute should be provisioned\n",
"* `vnet_name`: Name of VNet\n",
"* `subnet_name`: Name of SubNet within the VNet"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpu-cluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print('Found existing cluster, use it.')\n",
"except ComputeTargetException:\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n",
" vm_priority='lowpriority',\n",
" min_nodes=2,\n",
" max_nodes=4,\n",
" idle_seconds_before_scaledown='300',\n",
" vnet_resourcegroup_name='<my-resource-group>',\n",
" vnet_name='<my-vnet-name>',\n",
" subnet_name='<my-subnet-name>')\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
"\n",
"cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure & Run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.runconfig import RunConfiguration\n",
"from azureml.core.conda_dependencies import CondaDependencies\n",
"\n",
"# Create a new RunConfig object\n",
"run_config = RunConfiguration(framework=\"python\")\n",
"\n",
"# Set compute target to AmlCompute target created in previous step\n",
"run_config.target = cpu_cluster.name\n",
"\n",
"# Enable Docker \n",
"run_config.environment.docker.enabled = True\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-interpret', 'azureml-telemetry', 'azureml-interpret'\n",
"]\n",
"\n",
"\n",
"\n",
"# Note: this is to pin the scikit-learn and pandas versions to be same as notebook.\n",
"# In production scenario user would choose their dependencies\n",
"import pkg_resources\n",
"available_packages = pkg_resources.working_set\n",
"sklearn_ver = None\n",
"pandas_ver = None\n",
"for dist in available_packages:\n",
" if dist.key == 'scikit-learn':\n",
" sklearn_ver = dist.version\n",
" elif dist.key == 'pandas':\n",
" pandas_ver = dist.version\n",
"sklearn_dep = 'scikit-learn'\n",
"pandas_dep = 'pandas'\n",
"if sklearn_ver:\n",
" sklearn_dep = 'scikit-learn=={}'.format(sklearn_ver)\n",
"if pandas_ver:\n",
" pandas_dep = 'pandas=={}'.format(pandas_ver)\n",
"# Specify CondaDependencies obj\n",
"# The CondaDependencies specifies the conda and pip packages that are installed in the environment\n",
"# the submitted job is run in. Note the remote environment(s) needs to be similar to the local\n",
"# environment, otherwise if a model is trained or deployed in a different environment this can\n",
"# cause errors. Please take extra care when specifying your dependencies in a production environment.\n",
"azureml_pip_packages.extend([sklearn_dep, pandas_dep])\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(pip_packages=azureml_pip_packages)\n",
"\n",
"from azureml.core import Run\n",
"from azureml.core import ScriptRunConfig\n",
"\n",
"src = ScriptRunConfig(source_directory=project_folder, \n",
" script='train_explain.py', \n",
" run_config=run_config) \n",
"run = experiment.submit(config=src)\n",
"run"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"# Shows output of the run on stdout.\n",
"run.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.get_metrics()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Additional operations to perform on AmlCompute\n",
"\n",
"You can perform more operations on AmlCompute such as updating the node counts or deleting the compute. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Get_status () gets the latest status of the AmlCompute target\n",
"cpu_cluster.get_status().serialize()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Update () takes in the min_nodes, max_nodes and idle_seconds_before_scaledown and updates the AmlCompute target\n",
"# cpu_cluster.update(min_nodes=1)\n",
"# cpu_cluster.update(max_nodes=10)\n",
"cpu_cluster.update(idle_seconds_before_scaledown=300)\n",
"# cpu_cluster.update(min_nodes=2, max_nodes=4, idle_seconds_before_scaledown=600)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Delete () is used to deprovision and delete the AmlCompute target. Useful if you want to re-use the compute name \n",
"# 'cpu-cluster' in this case but use a different VM family for instance.\n",
"\n",
"# cpu_cluster.delete()"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -19,8 +19,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to Setup a Schedule for a Published Pipeline\n",
"In this notebook, we will show you how you can run an already published pipeline on a schedule."
"# How to Setup a Schedule for a Published Pipeline or Pipeline Endpoint\n",
"In this notebook, we will show you how you can run an already published pipeline or a pipeline endpoint on a schedule."
]
},
{
@@ -159,6 +159,43 @@
"print(\"Newly published pipeline id: {}\".format(published_pipeline1.id))"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Create a Pipeline Endpoint\n",
"Alternatively, you can create a schedule to run a pipeline endpoint instead of a published pipeline. You will need this to create a schedule against a pipeline endpoint in the last section of this notebook. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.pipeline.core import PipelineEndpoint\n",
"\n",
"pipeline_endpoint = PipelineEndpoint.publish(workspace=ws, name=\"ScheduledPipelineEndpoint\",\n",
" pipeline=pipeline1, description=\"Publish pipeline endpoint for schedule test\")\n",
"pipeline_endpoint"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -196,14 +233,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a schedule for the pipeline using a recurrence\n",
"### Create a schedule for the published pipeline using a recurrence\n",
"This schedule will run on a specified recurrence interval."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azureml.pipeline.core.schedule import ScheduleRecurrence, Schedule\n",
@@ -308,7 +355,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"gather": {
"logged": 1606157800044
}
},
"outputs": [],
"source": [
"# Set the wait_for_provisioning flag to False if you do not want to wait \n",
@@ -410,7 +461,11 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"gather": {
"logged": 1606157862620
}
},
"outputs": [],
"source": [
"# Set the wait_for_provisioning flag to False if you do not want to wait \n",
@@ -419,14 +474,151 @@
"schedule = Schedule.get(ws, schedule_id)\n",
"print(\"Disabled schedule {}. New status is: {}\".format(schedule.id, schedule.status))"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Create a schedule for a pipeline endpoint\n",
"Alternative to creating schedules for a published pipeline, you can also create schedules to run pipeline endpoints.\n",
"Retrieve the pipeline endpoint id to create a schedule. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"gather": {
"logged": 1606157888851
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"pipeline_endpoint_by_name = PipelineEndpoint.get(workspace=ws, name=\"ScheduledPipelineEndpoint\")\n",
"published_pipeline_endpoint_id = pipeline_endpoint_by_name.id\n",
"\n",
"recurrence = ScheduleRecurrence(frequency=\"Day\", interval=2, hours=[22], minutes=[30]) # Runs every other day at 10:30pm\n",
"\n",
"schedule = Schedule.create_for_pipeline_endpoint(workspace=ws, name=\"My_Endpoint_Schedule\",\n",
" pipeline_endpoint_id=published_pipeline_endpoint_id,\n",
" experiment_name='Schedule_Run',\n",
" recurrence=recurrence, description=\"Schedule_Run\",\n",
" wait_for_provisioning=True)\n",
"\n",
"# You may want to make sure that the schedule is provisioned properly\n",
"# before making any further changes to the schedule\n",
"\n",
"print(\"Created schedule with id: {}\".format(schedule.id))"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Get all schedules for a given pipeline endpoint\n",
"Once you have the pipeline endpoint ID, then you can get all schedules for that pipeline endopint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"schedules_for_pipeline_endpoints = Schedule.\\\n",
" get_schedules_for_pipeline_endpoint_id(ws,\n",
" pipeline_endpoint_id=published_pipeline_endpoint_id)\n",
"print('Got all schedules for pipeline endpoint:', published_pipeline_endpoint_id, 'Count:',\n",
" len(schedules_for_pipeline_endpoints))\n",
"\n",
"print('done')"
]
},
{
"cell_type": "markdown",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Disable the schedule created for running the pipeline endpont\n",
"Recall the best practice of disabling schedules when not in use.\n",
"The number of schedule triggers allowed per month per region per subscription is 100,000.\n",
"This is calculated using the project trigger counts for all active schedules."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"fetched_schedule = Schedule.get(ws, schedule_id)\n",
"print(\"Using schedule with id: {}\".format(fetched_schedule.id))\n",
"\n",
"# Set the wait_for_provisioning flag to False if you do not want to wait \n",
"# for the call to provision the schedule in the backend.\n",
"fetched_schedule.disable(wait_for_provisioning=True)\n",
"fetched_schedule = Schedule.get(ws, schedule_id)\n",
"print(\"Disabled schedule {}. New status is: {}\".format(fetched_schedule.id, fetched_schedule.status))"
]
}
],
"metadata": {
"authors": [
{
"name": "sanpil"
"name": "shbijlan"
}
],
"categories": [
"how-to-use-azureml",
"machine-learning-pipelines",
"intro-to-pipelines"
],
"category": "tutorial",
"compute": [
"AML Compute"
@@ -441,7 +633,7 @@
"framework": [
"Azure ML"
],
"friendly_name": "How to Setup a Schedule for a Published Pipeline",
"friendly_name": "How to Setup a Schedule for a Published Pipeline or Pipeline Endpoint",
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
@@ -459,6 +651,9 @@
"pygments_lexer": "ipython3",
"version": "3.6.7"
},
"nteract": {
"version": "nteract-front-end@1.0.0"
},
"order_index": 10,
"star_tag": [
"featured"
@@ -466,7 +661,7 @@
"tags": [
"None"
],
"task": "Demonstrates the use of Schedules for Published Pipelines"
"task": "Demonstrates the use of Schedules for Published Pipelines and Pipeline endpoints"
},
"nbformat": 4,
"nbformat_minor": 2

View File

@@ -30,7 +30,7 @@
"## Introduction\n",
"In this example we showcase how you can use AzureML Dataset to load data for AutoML via AML Pipeline. \n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook, please also take a look at the [Automated ML setup-using-a-local-conda-environment](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning#setup-using-a-local-conda-environment) section to setup the environment.\n",
"\n",
"In this notebook you will learn how to:\n",
"1. Create an `Experiment` in an existing `Workspace`.\n",

View File

@@ -2,7 +2,3 @@ name: aml-pipelines-with-automated-machine-learning-step
dependencies:
- pip:
- azureml-sdk
- azureml-train-automl
- azureml-widgets
- matplotlib
- pandas_ml

View File

@@ -284,7 +284,7 @@
"# Specify CondaDependencies obj, add necessary packages\n",
"aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n",
" conda_packages=['pandas','scikit-learn'], \n",
" pip_packages=['azureml-sdk[automl,explain]', 'pyarrow'])\n",
" pip_packages=['azureml-sdk[automl]', 'pyarrow'])\n",
"\n",
"print (\"Run configuration created.\")"
]

View File

@@ -121,6 +121,33 @@
" auth=interactive_auth)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Despite having access to the workspace, you may sometimes see the following error when retrieving it:\n",
"\n",
"```\n",
"You are currently logged-in to xxxxxxxx-xxx-xxxx-xxxx-xxxxxxxxxxxx tenant. You don't have access to xxxxxx-xxxx-xxx-xxx-xxxxxxxxxx subscription, please check if it is in this tenant.\n",
"```\n",
"\n",
"This error sometimes occurs when you are trying to access a subscription to which you were recently added. In this case, you need to force authentication again to avoid using a cached authentication token that has not picked up the new permissions. You can do so by setting `force=true` on the `InteractiveLoginAuthentication()` object's constructor as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"forced_interactive_auth = InteractiveLoginAuthentication(tenant_id=\"my-tenant-id\", force=True)\n",
"\n",
"ws = Workspace(subscription_id=\"my-subscription-id\",\n",
" resource_group=\"my-ml-rg\",\n",
" workspace_name=\"my-ml-workspace\",\n",
" auth=forced_interactive_auth)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -408,7 +435,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment, Run\n",
"from azureml.core import Experiment\n",
"from azureml.core.script_run_config import ScriptRunConfig\n",
"\n",
"exp = Experiment(workspace = ws, name=\"try-secret\")\n",
@@ -424,13 +451,6 @@
"source": [
"Furthermore, you can set and get multiple secrets using set_secrets and get_secrets methods."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {

View File

@@ -169,7 +169,7 @@
"from azure.mgmt.network import NetworkManagementClient\n",
"\n",
"# Virtual network name\n",
"vnet_name =\"your_vnet\"\n",
"vnet_name =\"rl_pong_vnet\"\n",
"\n",
"# Default subnet\n",
"subnet_name =\"default\"\n",

View File

@@ -189,7 +189,7 @@
"from azure.mgmt.network import NetworkManagementClient\n",
"\n",
"# Virtual network name\n",
"vnet_name =\"your_vnet\"\n",
"vnet_name =\"rl_minecraft_vnet\"\n",
"\n",
"# Default subnet\n",
"subnet_name =\"default\"\n",

View File

@@ -100,7 +100,7 @@
"\n",
"# Check core SDK version number\n",
"\n",
"print(\"This notebook was created using SDK version 1.18.0, you are currently running version\", azureml.core.VERSION)"
"print(\"This notebook was created using SDK version 1.19.0, you are currently running version\", azureml.core.VERSION)"
]
},
{

View File

@@ -35,7 +35,7 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an
| :star:[How to use Pipeline Drafts to create a Published Pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-pipeline-drafts.ipynb) | Demonstrates the use of Pipeline Drafts | Custom | AML Compute | None | Azure ML | None |
| :star:[Azure Machine Learning Pipeline with HyperDriveStep](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb) | Demonstrates the use of HyperDriveStep | Custom | AML Compute | None | Azure ML | None |
| :star:[How to Publish a Pipeline and Invoke the REST endpoint](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-publish-and-run-using-rest-endpoint.ipynb) | Demonstrates the use of Published Pipelines | Custom | AML Compute | None | Azure ML | None |
| :star:[How to Setup a Schedule for a Published Pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb) | Demonstrates the use of Schedules for Published Pipelines | Custom | AML Compute | None | Azure ML | None |
| :star:[How to Setup a Schedule for a Published Pipeline or Pipeline Endpoint](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb) | Demonstrates the use of Schedules for Published Pipelines and Pipeline endpoints | Custom | AML Compute | None | Azure ML | None |
| [How to setup a versioned Pipeline Endpoint](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb) | Demonstrates the use of PipelineEndpoint to run a specific version of the Published Pipeline | Custom | AML Compute | None | Azure ML | None |
| :star:[How to use DataPath as a PipelineParameter](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb) | Demonstrates the use of DataPath as a PipelineParameter | Custom | AML Compute | None | Azure ML | None |
| :star:[How to use Dataset as a PipelineParameter](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-showcasing-dataset-and-pipelineparameter.ipynb) | Demonstrates the use of Dataset as a PipelineParameter | Custom | AML Compute | None | Azure ML | None |

View File

@@ -28,7 +28,7 @@ git clone https://github.com/Azure/MachineLearningNotebooks.git
pip install azureml-sdk[notebooks,tensorboard]
# install model explainability component
pip install azureml-sdk[explain]
pip install azureml-sdk[interpret]
# install automated ml components
pip install azureml-sdk[automl]
@@ -86,7 +86,7 @@ If you need additional Azure ML SDK components, you can either modify the Docker
pip install azureml-sdk[automl]
# install the core SDK and model explainability component
pip install azureml-sdk[explain]
pip install azureml-sdk[interpret]
# install the core SDK and experimental components
pip install azureml-sdk[contrib]

View File

@@ -102,7 +102,7 @@
"source": [
"import azureml.core\n",
"\n",
"print(\"This notebook was created using version 1.18.0 of the Azure ML SDK\")\n",
"print(\"This notebook was created using version 1.19.0 of the Azure ML SDK\")\n",
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
]
},

View File

@@ -306,7 +306,7 @@
"\n",
"|Property| Value in this tutorial |Description|\n",
"|----|----|---|\n",
"|**iteration_timeout_minutes**|2|Time limit in minutes for each iteration. Reduce this value to decrease total runtime.|\n",
"|**iteration_timeout_minutes**|10|Time limit in minutes for each iteration. Increase this value for larger datasets that need more time for each iteration.|\n",
"|**experiment_timeout_hours**|0.3|Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n",
"|**enable_early_stopping**|True|Flag to enable early termination if the score is not improving in the short term.|\n",
"|**primary_metric**| spearman_correlation | Metric that you want to optimize. The best-fit model will be chosen based on this metric.|\n",
@@ -324,7 +324,7 @@
"import logging\n",
"\n",
"automl_settings = {\n",
" \"iteration_timeout_minutes\": 2,\n",
" \"iteration_timeout_minutes\": 10,\n",
" \"experiment_timeout_hours\": 0.3,\n",
" \"enable_early_stopping\": True,\n",
" \"primary_metric\": 'spearman_correlation',\n",