{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Important:** This notebook is outdated and is no longer supported by the AutoML team. Please use the [supported version](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-recipes-univariate)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Running AutoML experiments\n",
"\n",
"See the `auto-ml-forecasting-univariate-recipe-experiment-settings` notebook to determine the settings for seasonal features and target lags, and whether the series needs to be differenced. To simplify experimentation, you only need to specify three parameters: `DIFFERENCE_SERIES`, `TARGET_LAGS` and `STL_TYPE`. Once these are set, the notebook generates the corresponding transformations and settings to run experiments, generate forecasts, compute inference set metrics and plot forecasts against actuals. If `DIFFERENCE_SERIES` is set to `True`, it also converts the forecast from first differences back to levels (the original units of measurement) before calculating inference set metrics.\n",
"\n",
"\n",
"The output generated by this notebook is saved in the `experiment_output` folder."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import logging\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import azureml.automl.runtime\n",
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"import matplotlib.pyplot as plt\n",
"from helper_functions import ts_train_test_split, compute_metrics\n",
"\n",
"import azureml.core\n",
"from azureml.core.workspace import Workspace\n",
"from azureml.core.experiment import Experiment\n",
"from azureml.train.automl import AutoMLConfig\n",
"\n",
"\n",
"# set printing options\n",
"np.set_printoptions(precision=4, suppress=True, linewidth=100)\n",
"pd.set_option(\"display.max_columns\", 500)\n",
"pd.set_option(\"display.width\", 1000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As part of the setup you have already created a **Workspace**. You will also need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
"> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()\n",
"amlcompute_cluster_name = \"recipe-cluster\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == \"AmlCompute\":\n",
"    found = True\n",
"    print(\"Found existing compute target.\")\n",
"    compute_target = cts[amlcompute_cluster_name]\n",
"\n",
"if not found:\n",
"    print(\"Creating a new compute target...\")\n",
"    provisioning_config = AmlCompute.provisioning_configuration(\n",
"        vm_size=\"STANDARD_D2_V2\", max_nodes=6\n",
"    )\n",
"\n",
"    # Create the cluster.\n",
"    compute_target = ComputeTarget.create(\n",
"        ws, amlcompute_cluster_name, provisioning_config\n",
"    )\n",
"\n",
"print(\"Checking cluster status...\")\n",
"# Can poll for a minimum number of nodes and for a specific timeout.\n",
"# If no min_node_count is provided, it will use the scale settings for the cluster.\n",
"compute_target.wait_for_completion(\n",
"    show_output=True, min_node_count=None, timeout_in_minutes=20\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Data\n",
"\n",
"Here, we load the data from the CSV file and drop the Covid period."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"main_data_loc = \"data\"\n",
"train_file_name = \"S4248SM144SCEN.csv\"\n",
"\n",
"TARGET_COLNAME = \"S4248SM144SCEN\"\n",
"TIME_COLNAME = \"observation_date\"\n",
"# Start of the Covid period, which is excluded from evaluation.\n",
"COVID_PERIOD_START = \"2020-03-01\"\n",
"\n",
"# load data\n",
"df = pd.read_csv(os.path.join(main_data_loc, train_file_name))\n",
"df[TIME_COLNAME] = pd.to_datetime(df[TIME_COLNAME], format=\"%Y-%m-%d\")\n",
"df.sort_values(by=TIME_COLNAME, inplace=True)\n",
"\n",
"# remove the Covid period\n",
"df = df.query('{} <= \"{}\"'.format(TIME_COLNAME, COVID_PERIOD_START))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set parameters\n",
"\n",
"The first set of parameters is based on the analysis performed in the `auto-ml-forecasting-univariate-recipe-experiment-settings` notebook. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# set parameters based on the settings notebook analysis\n",
"DIFFERENCE_SERIES = True\n",
"TARGET_LAGS = None\n",
"STL_TYPE = None"
]
},
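{
"cell_type": "markdown",
"metadata": {},
"source": [
"When `DIFFERENCE_SERIES` is `True`, the model is trained on first differences and the forecast has to be converted back to levels before metrics are computed. The block below is a minimal sketch of that round trip on a made-up toy series. The names `levels`, `diffs`, `anchor` and `recovered` are illustrative assumptions and are not part of this notebook's helper functions.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy monthly series in levels (original units); the values are made up.\n",
"levels = pd.Series(\n",
"    [100.0, 104.0, 103.0, 110.0, 115.0],\n",
"    index=pd.date_range('2019-01-01', periods=5, freq='MS'),\n",
")\n",
"\n",
"# First differences y_t - y_{t-1}; the first value is undefined, so drop it.\n",
"diffs = levels.diff().dropna()\n",
"\n",
"# Convert back to levels: last known level plus the cumulative sum of differences.\n",
"anchor = levels.iloc[0]\n",
"recovered = anchor + diffs.cumsum()\n",
"\n",
"print(recovered)\n",
"```\n",
"\n",
"For a forecast, the anchor would typically be the last observed level in the training data, and `diffs` would be the predicted differences over the forecast horizon."
]
},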
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, define additional parameters to be used in the AutoML config class.\n",
"\n",
"