diff --git a/automl/05.auto-ml-missing-data-Blacklist-Early-Termination.ipynb b/automl/05.auto-ml-missing-data-Blacklist-Early-Termination.ipynb index 521d6681..8bfcffd4 100644 --- a/automl/05.auto-ml-missing-data-Blacklist-Early-Termination.ipynb +++ b/automl/05.auto-ml-missing-data-Blacklist-Early-Termination.ipynb @@ -155,7 +155,7 @@ "source": [ "## Configure AutoML\n", "\n", - "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment.\n", + "Instantiate an `AutoMLConfig` object to specify the settings and data used to run the experiment. This includes setting `exit_score`, which should cause the run to complete before the `iterations` count is reached.\n", "\n", "|Property|Description|\n", "|-|-|\n", @@ -185,7 +185,7 @@ " iterations = 20,\n", " n_cross_validations = 5,\n", " preprocess = True,\n", - " exit_score = 0.994,\n", + " exit_score = 0.9984,\n", " blacklist_algos = ['KNeighborsClassifier','LinearSVMWrapper'],\n", " verbosity = logging.INFO,\n", " X = X_train, \n", diff --git a/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb b/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb index d0113534..2b0ec153 100644 --- a/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb +++ b/automl/08.auto-ml-remote-execution-with-text-file-on-DSVM.ipynb @@ -13,15 +13,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# AutoML 08: Remote Execution with Text File\n", + "# AutoML 08: Remote Execution with DataStore\n", "\n", - "This sample accesses a data file on a remote DSVM. This is more efficient than reading the file from Azure Blob storage in the `get_data` method.\n", + "This sample accesses a data file on a remote DSVM through a DataStore. Advantages of using a DataStore:\n", + "1. DataStore secures the access details.\n", + "2. DataStore supports reading from and writing to blob and file stores.\n", + "3. 
AutoML natively supports copying data from the DataStore to the DSVM.\n", "\n", "Make sure you have executed the [00.configuration](00.configuration.ipynb) before running this notebook.\n", "\n", - "In this notebook you will learn how to:\n", - "1. Configure the DSVM to allow files to be accessed directly by the `get_data` function.\n", - "2. Using `get_data` to return data from a local file.\n", + "In this notebook you will learn how to:\n", + "1. Configure the DSVM to allow files to be accessed directly by the get_data() function.\n", + "2. Use get_data() to return data from a local file.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Create an Experiment\n", + "## Create Experiment\n", "\n", - "As part of the setup you have already created an Azure ML `Workspace` object. For AutoML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments." + "As part of the setup you have already created a Workspace. For AutoML you will need to create an Experiment. An Experiment is a named object in a Workspace that is used to run experiments." ] }, { @@ -66,7 +69,7 @@ "ws = Workspace.from_config()\n", "\n", "# choose a name for experiment\n", - "experiment_name = 'automl-remote-dsvm-file'\n", + "experiment_name = 'automl-remote-datastore-file'\n", "# project folder\n", "project_folder = './sample_projects/automl-remote-dsvm-file'\n", "\n", @@ -90,7 +93,7 @@ "source": [ "## Diagnostics\n", "\n", - "Opt-in diagnostics for better experience, quality, and security of future releases." 
+ "Opt-in diagnostics for better experience, quality, and security of future releases." ] }, { @@ -100,7 +103,7 @@ "outputs": [], "source": [ "from azureml.telemetry import set_diagnostics_collection\n", - "set_diagnostics_collection(send_diagnostics = True)" + "set_diagnostics_collection(send_diagnostics=True)" ] }, { @@ -108,7 +111,9 @@ "metadata": {}, "source": [ "## Create a Remote Linux DSVM\n", - "**Note:** If creation fails with a message about Marketplace purchase eligibilty, start creation of a DSVM through the [Azure portal](https://portal.azure.com), and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled this setting, you can exit the portal without actually creating the DSVM, and creation of the DSVM through the notebook should work.\n" + "Note: If creation fails with a message about Marketplace purchase eligibility, go to portal.azure.com, start creating a DSVM there, and select \"Want to create programmatically\" to enable programmatic creation. Once you've enabled it, you can exit without actually creating the VM.\n", + "\n", + "**Note**: By default SSH runs on port 22 and you don't need to specify it. But if for security reasons you switch to a different port (such as 5022), you can append the port number to the address. [Read more](https://render.githubusercontent.com/documentation/sdk/ssh-issue.md) on this." 
] }, { @@ -118,24 +123,35 @@ "outputs": [], "source": [ "from azureml.core.compute import DsvmCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "compute_target_name = 'mydsvm'\n", "\n", - "dsvm_name = 'mydsvm'\n", "try:\n", - " dsvm_compute = DsvmCompute(ws, dsvm_name)\n", - " print('Found existing DSVM.')\n", - "except:\n", - " print('Creating a new DSVM.')\n", - " dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n", - " dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n", - " dsvm_compute.wait_for_completion(show_output = True)" + " dsvm_compute = DsvmCompute(workspace=ws, name=compute_target_name)\n", + " print('Found existing DSVM:', dsvm_compute.name)\n", + "except ComputeTargetException:\n", + " dsvm_config = DsvmCompute.provisioning_configuration(vm_size=\"Standard_D2_v2\")\n", + " dsvm_compute = DsvmCompute.create(ws, name=compute_target_name, provisioning_configuration=dsvm_config)\n", + " dsvm_compute.wait_for_completion(show_output=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Copy the Data File to the DSVM\n", - "Download the data file and copy the data file to the `/tmp/data` on the DSVM." + "## Copy the data file locally\n", + "\n", + "Download the data file.\n" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!mkdir data" ] }, { @@ -146,18 +162,94 @@ "source": [ "df = pd.read_csv(\"https://automldemods.blob.core.windows.net/datasets/PlayaEvents2016,_1.6MB,_3.4k-rows.cleaned.2.tsv\",\n", " delimiter=\"\\t\", quotechar='\"')\n", - "df.to_csv(\"data.tsv\", sep=\"\\t\", quotechar='\"', index=False)\n", - "\n", - "# Now copy the file data.tsv to the folder /tmp/data on the DSVM." 
+ "df.to_csv(\"data/data.tsv\", sep=\"\\t\", quotechar='\"', index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Create the `get_data.py` File\n", - "For remote executions you should author a `get_data.py` file containing a `get_data()` function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n", - "In this example, the `get_data()` function returns a [dictionary](README.md#getdata)." + "## Upload data to the cloud" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now make the data accessible remotely by uploading it from your local machine into Azure so it can be accessed for remote training. The datastore is a convenient construct associated with your workspace for you to upload/download data, and interact with it from your remote compute targets. It is backed by an Azure Blob storage account.\n", + "\n", + "The data.tsv file is uploaded into a directory named data at the root of the datastore." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Workspace, Datastore\n", + "#blob_datastore = Datastore(ws, blob_datastore_name)\n", + "ds = ws.get_default_datastore()\n", + "print(ds.datastore_type, ds.account_name, ds.container_name)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# ds.upload_files(\"data.tsv\")\n", + "ds.upload(src_dir='./data', target_path='data', overwrite=True, show_progress=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Configure & Run\n", + "\n", + "First let's create a DataReferenceConfiguration object to inform the system what data folder to download to the compute target." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import DataReferenceConfiguration\n", + "dr = DataReferenceConfiguration(datastore_name=ds.name, \n", + " path_on_datastore='data', \n", + " mode='download', # download files from datastore to compute target\n", + " overwrite=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.runconfig import RunConfiguration\n", + "\n", + "# create a new RunConfig object\n", + "conda_run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "# Set compute target to the Linux DSVM\n", + "conda_run_config.target = dsvm_compute.name\n", + "# set the data reference of the run coonfiguration\n", + "conda_run_config.data_references = {ds.name: dr}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Get Data File\n", + "For remote executions you should author a get_data.py file containing a get_data() function. This file should be in the root directory of the project. You can encapsulate code to read data either from a blob storage or local disk in this file.\n", + "\n", + "The *get_data()* function returns a [dictionary](README.md#getdata)." 
] }, { @@ -182,18 +274,20 @@ "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import LabelEncoder\n", "import os\n", + "from os.path import join, dirname\n", "\n", "def get_data():\n", " # Burning man 2016 data\n", - " df = pd.read_csv('/tmp/data/data.tsv',\n", - " delimiter=\"\\t\", quotechar='\"')\n", + " df = pd.read_csv(join(dirname(os.path.realpath(__file__)),\n", + " os.environ[\"AZUREML_DATAREFERENCE_workspacefilestore\"],\n", + " \"data.tsv\"), delimiter=\"\\t\", quotechar='\"')\n", " # get integer labels\n", " le = LabelEncoder()\n", " le.fit(df[\"Label\"].values)\n", " y = le.transform(df[\"Label\"].values)\n", " df = df.drop([\"Label\"], axis=1)\n", "\n", - " df_train, _, y_train, _ = train_test_split(df, y, test_size = 0.1, random_state = 42)\n", + " df_train, _, y_train, _ = train_test_split(df, y, test_size=0.1, random_state=42)\n", "\n", " return { \"X\" : df.values, \"y\" : y }" ] @@ -202,21 +296,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Configure AutoML \n", + "## Instantiate AutoML \n", "\n", - "You can specify `automl_settings` as `**kwargs` as well. Also note that you can use a `get_data()` function for local excutions too.\n", + "You can specify automl_settings as **kwargs** as well. Also note that you can use the get_data() semantics for local executions too. \n", "\n", - "**Note:** When using Remote DSVM, you can't pass Numpy arrays directly to the fit method.\n", + "Note: For Remote DSVM and Batch AI you cannot pass Numpy arrays directly to the fit method.\n", "\n", "|Property|Description|\n", "|-|-|\n", - "|**primary_metric**|This is the metric that you want to optimize. Classification supports the following primary metrics:
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", - "|**max_time_sec**|Time limit in seconds for each iteration.|\n", - "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data.|\n", - "|**n_cross_validations**|Number of cross validation splits.|\n", - "|**concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM.|\n", - "|**preprocess**|Setting this to *True* enables AutoML to perform preprocessing on the input to handle *missing data*, and perform some common *feature extraction*.|\n", - "|**max_cores_per_iteration**|Indicates how many cores on the compute target would be used to train a single pipeline.
Default is *1*, you can set it to *-1* to use all cores.|" + "|**primary_metric**|This is the metric that you want to optimize.
Classification supports the following primary metrics:
accuracy
AUC_weighted
balanced_accuracy
average_precision_score_weighted
precision_score_weighted|\n", + "|**max_time_sec**|Time limit in seconds for each iteration|\n", + "|**iterations**|Number of iterations. In each iteration AutoML trains a specific pipeline with the data|\n", + "|**n_cross_validations**|Number of cross validation splits|\n", + "|**concurrent_iterations**|Maximum number of iterations that would be executed in parallel. This should be less than the number of cores on the DSVM|\n", + "|**preprocess**| *True/False*
Setting this to *True* enables AutoML to perform preprocessing
on the input to handle *missing data*, and perform some common *feature extraction*|\n", + "|**max_cores_per_iteration**| Indicates how many cores on the compute target would be used to train a single pipeline.
Default is *1*, you can set it to *-1* to use all cores|" ] }, { @@ -236,21 +330,21 @@ "}\n", "automl_config = AutoMLConfig(task = 'classification',\n", " debug_log = 'automl_errors.log',\n", - " path = project_folder,\n", - " compute_target = dsvm_compute,\n", + " path=project_folder,\n", + " run_configuration=conda_run_config,\n", + " #compute_target = dsvm_compute,\n", " data_script = project_folder + \"/get_data.py\",\n", - " **automl_settings)" + " **automl_settings\n", + " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Train the Model \n", + "## Training the Model \n", "\n", - "Call the `submit` method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run.\n", "\n", - "In this example, we specify `show_output = False` to suppress console output while the run is in progress." + "Call the *submit* method on the experiment object and pass the run configuration. For remote runs the execution is asynchronous, so you will see the iterations get populated as they complete. You can interact with the widgets and models even when the experiment is running to retrieve the best model up to that point. Once you are satisfied with the model, you can cancel a particular iteration or the whole run." 
] }, { @@ -259,27 +353,21 @@ "metadata": {}, "outputs": [], "source": [ - "remote_run = experiment.submit(automl_config, show_output = False)" + "remote_run = experiment.submit(automl_config, show_output=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Exploring the results" ] }, { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Widget for Monitoring Runs\n", + "## Exploring the Results \n", + "#### Widget for monitoring runs\n", "\n", - "The widget will first report a \"loading\" status while running the first iteration. After completing the first iteration, an auto-updating graph and table will be shown. The widget will refresh once per minute, so you should see the graph update as child runs complete.\n", + "The widget will sit on \"loading\" until the first iteration completes, then you will see an auto-updating graph and table show up. It refreshes once per minute, so you should see the graph update as child runs complete.\n", "\n", - "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under `/tmp/azureml_run/{iterationid}/azureml-logs`.\n", + "You can click on a pipeline to see run properties and output logs. Logs are also available on the DSVM under /tmp/azureml_run/{iterationid}/azureml-logs.\n", "\n", - "**Note:** The widget displays a link at the bottom. Use this link to open a web interface to explore the individual run details." + "NOTE: The widget displays a link at the bottom. This links to a web UI where you can explore the individual run details." ] }, { @@ -298,7 +386,7 @@ "source": [ "\n", "#### Retrieve All Child Runs\n", - "You can also use SDK methods to fetch all the child runs and see individual metrics that we log." + "You can also use SDK methods to fetch all the child runs and see individual metrics that we log.
" ] }, { @@ -311,7 +399,7 @@ "metricslist = {}\n", "for run in children:\n", " properties = run.get_properties()\n", - " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", + " metrics = {k: v for k, v in run.get_metrics().items() if isinstance(v, float)}\n", " metricslist[int(properties['iteration'])] = metrics\n", "\n", "rundata = pd.DataFrame(metricslist).sort_index(1)\n", @@ -322,8 +410,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Cancelling Runs\n", - "You can cancel ongoing remote runs using the `cancel` and `cancel_iteration` functions." + "## Canceling Runs\n", + "You can cancel ongoing remote runs using the *cancel()* and *cancel_iteration()* functions." ] }, { @@ -332,10 +420,10 @@ "metadata": {}, "outputs": [], "source": [ - "# Cancel the ongoing experiment and stop scheduling new iterations.\n", - "# remote_run.cancel()\n", + "# Cancel the ongoing experiment and stop scheduling new iterations\n", + "# remote_run.cancel()\n", "\n", - "# Cancel iteration 1 and move onto iteration 2.\n", + "# Cancel iteration 1 and move onto iteration 2\n", "# remote_run.cancel_iteration(1)" ] }, { @@ -345,7 +433,7 @@ "source": [ "### Retrieve the Best Model\n", "\n", - "Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." + "Below we select the best pipeline from our iterations. The *get_output* method on automl_classifier returns the best run and the fitted model for the last *fit* invocation. There are overloads on *get_output* that allow you to retrieve the best run and fitted model for *any* logged metric or a particular *iteration*." 
] }, { @@ -361,8 +449,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Best Model Based on Any Other Metric\n", - "Show the run and the model which has the smallest `accuracy` value:" + "#### Best Model based on any other metric" ] }, { @@ -372,15 +459,14 @@ "outputs": [], "source": [ "# lookup_metric = \"accuracy\"\n", - "# best_run, fitted_model = remote_run.get_output(metric = lookup_metric)" + "# best_run, fitted_model = remote_run.get_output(metric=lookup_metric)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Model from a Specific Iteration\n", - "Show the run and the model from the first iteration:" + "#### Model from a specific iteration" ] }, { @@ -390,14 +476,14 @@ "outputs": [], "source": [ "# iteration = 1\n", - "# best_run, fitted_model = remote_run.get_output(iteration = iteration)" + "# best_run, fitted_model = remote_run.get_output(iteration=iteration)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Register the Fitted Model for Deployment" + "### Register fitted model for deployment" ] }, { @@ -406,17 +492,17 @@ "metadata": {}, "outputs": [], "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "remote_run.register_model(description = description, tags = tags)\n", - "remote_run.model_id # Use this id to deploy the model as a web service in Azure." 
+ "#description = 'AutoML Model'\n", + "#tags = None\n", + "#remote_run.register_model(description=description, tags=tags)\n", + "#remote_run.model_id # Use this id to deploy the model as a web service in Azure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Test the Best Fitted Model \n" + "### Testing the Fitted Model \n" ] }, { @@ -437,16 +523,16 @@ "le = LabelEncoder()\n", "le.fit(df[\"Label\"].values)\n", "y = le.transform(df[\"Label\"].values)\n", - "df = df.drop([\"Label\"], axis = 1)\n", + "df = df.drop([\"Label\"], axis=1)\n", "\n", - "_, df_test, _, y_test = train_test_split(df, y, test_size = 0.1, random_state = 42)\n", + "_, df_test, _, y_test = train_test_split(df, y, test_size=0.1, random_state=42)\n", "\n", - "y_pred = fitted_model.predict(df_test.values)\n", + "ypred = fitted_model.predict(df_test.values)\n", "\n", - "y_pred_strings = le.inverse_transform(y_pred)\n", - "y_test_strings = le.inverse_transform(y_test)\n", + "ypred_strings = le.inverse_transform(ypred)\n", + "ytest_strings = le.inverse_transform(y_test)\n", "\n", - "cm = ConfusionMatrix(y_test_strings, y_pred_strings)\n", + "cm = ConfusionMatrix(ytest_strings, ypred_strings)\n", "\n", "print(cm)\n", "\n", diff --git a/tutorials/03.auto-train-models.ipynb b/tutorials/03.auto-train-models.ipynb index 5bb6882c..25212625 100644 --- a/tutorials/03.auto-train-models.ipynb +++ b/tutorials/03.auto-train-models.ipynb @@ -15,7 +15,7 @@ "source": [ "# Tutorial: Train a classification model with automated machine learning\n", "\n", - "In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform data preprocessing, algorithm selection and hyperparameter selection in an automated way for you. 
The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n", + "In this tutorial, you'll learn how to generate a machine learning model using automated machine learning (automated ML). Azure Machine Learning can perform algorithm selection and hyperparameter selection in an automated way for you. The final model can then be deployed following the workflow in the [Deploy a model](02.deploy-models.ipynb) tutorial.\n", "\n", "[flow diagram](./imgs/flow2.png)\n", "\n", @@ -133,8 +133,8 @@ "digits = datasets.load_digits()\n", "\n", "# Exclude the first 100 rows from training so that they can be used for test.\n", - "X_digits = digits.data[100:,:]\n", - "y_digits = digits.target[100:]" + "X_train = digits.data[100:,:]\n", + "y_train = digits.target[100:]" ] }, { @@ -155,13 +155,13 @@ "count = 0\n", "sample_size = 30\n", "plt.figure(figsize = (16, 6))\n", - "for i in np.random.permutation(X_digits.shape[0])[:sample_size]:\n", + "for i in np.random.permutation(X_train.shape[0])[:sample_size]:\n", " count = count + 1\n", " plt.subplot(1, sample_size, count)\n", " plt.axhline('')\n", " plt.axvline('')\n", - " plt.text(x = 2, y = -2, s = y_digits[i], fontsize = 18)\n", - " plt.imshow(X_digits[i].reshape(8, 8), cmap = plt.cm.Greys)\n", + " plt.text(x = 2, y = -2, s = y_train[i], fontsize = 18)\n", + " plt.imshow(X_train[i].reshape(8, 8), cmap = plt.cm.Greys)\n", "plt.show()" ] }, @@ -187,8 +187,7 @@ "|**max_time_sec**|12,000|Time limit in seconds for each iteration|\n", "|**iterations**|20|Number of iterations. In each iteration, the model trains with the data with a specific pipeline|\n", "|**n_cross_validations**|3|Number of cross validation splits|\n", - "|**preprocess**|False| *True/False* Enables experiment to perform preprocessing on the input. 
Preprocessing handles *missing data*, and performs some common *feature extraction*|\n", - "|**exit_score**|0.995|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n", + "|**exit_score**|0.9985|*double* value indicating the target for *primary_metric*. Once the target is surpassed the run terminates|\n", "|**blacklist_algos**|['kNN','LinearSVM']|*Array* of *strings* indicating algorithms to ignore.\n" ] }, @@ -210,11 +209,10 @@ " max_time_sec = 12000,\n", " iterations = 20,\n", " n_cross_validations = 3,\n", - " preprocess = False,\n", " exit_score = 0.9985,\n", " blacklist_algos = ['kNN','LinearSVM'],\n", - " X = X_digits,\n", - " y = y_digits,\n", + " X = X_train,\n", + " y = y_train,\n", " path=project_folder)" ] }, @@ -351,8 +349,10 @@ "source": [ "# find 30 random samples from test set\n", "n = 30\n", - "sample_indices = np.random.permutation(X_digits.shape[0])[0:n]\n", - "test_samples = X_digits[sample_indices]\n", + "X_test = digits.data[:100, :]\n", + "y_test = digits.target[:100]\n", + "sample_indices = np.random.permutation(X_test.shape[0])[0:n]\n", + "test_samples = X_test[sample_indices]\n", "\n", "\n", "# predict using the model\n", @@ -368,11 +368,11 @@ " plt.axvline('')\n", " \n", " # use different color for misclassified sample\n", - " font_color = 'red' if y_digits[s] != result[i] else 'black'\n", - " clr_map = plt.cm.gray if y_digits[s] != result[i] else plt.cm.Greys\n", + " font_color = 'red' if y_test[s] != result[i] else 'black'\n", + " clr_map = plt.cm.gray if y_test[s] != result[i] else plt.cm.Greys\n", " \n", " plt.text(x = 2, y = -2, s = result[i], fontsize = 18, color = font_color)\n", - " plt.imshow(X_digits[s].reshape(8, 8), cmap = clr_map)\n", + " plt.imshow(X_test[s].reshape(8, 8), cmap = clr_map)\n", " \n", " i = i + 1\n", "plt.show()" @@ -393,7 +393,7 @@ "> * Review training results\n", "> * Register the best model\n", "\n", - "Learn more about [how to configure settings for 
automatic training]() or [how to use automatic training on a remote resource]()." + "Learn more about [how to configure settings for automatic training](https://aka.ms/aml-how-configure-auto) or [how to use automatic training on a remote resource](https://aka.ms/aml-how-to-auto-remote)." ] } ],
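A note on the `get_data.py` change in notebook 08: the rewritten function builds the data path from an environment variable of the form `AZUREML_DATAREFERENCE_<datastore name>`, which points at the folder the data reference was downloaded to on the compute target. A minimal stdlib sketch of that path construction, for review purposes only (the environment value set below is illustrative; the real value is injected by the service at run time):

```python
import os
from os.path import dirname, join, realpath

def resolve_data_path(script_path,
                      env_var="AZUREML_DATAREFERENCE_workspacefilestore",
                      filename="data.tsv"):
    # Mirror the construction used in get_data.py:
    # <directory of the script> / <datastore download folder> / <file name>
    base = dirname(realpath(script_path))
    rel = os.environ.get(env_var, "")
    return join(base, rel, filename)

# Simulated usage: pretend the service downloaded the datastore folder to ./data
os.environ["AZUREML_DATAREFERENCE_workspacefilestore"] = "data"
print(resolve_data_path("get_data.py"))
```

`pd.read_csv` can then be pointed at the returned path, exactly as the updated `get_data()` does.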
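On the tutorial 03 change: the edit renames `X_digits`/`y_digits` to `X_train`/`y_train` and reserves the first 100 rows of the digits dataset as an explicit test set. A tiny sketch of that row-slicing convention, using plain Python lists as stand-ins for the digits arrays (the shapes here are made up for illustration):

```python
# Stand-in for digits.data / digits.target: 110 fake rows.
data = [[i, i + 1] for i in range(110)]
target = [i % 10 for i in range(110)]

# First 100 rows are held out for test; the rest train (mirrors the tutorial).
X_test, y_test = data[:100], target[:100]
X_train, y_train = data[100:], target[100:]

print(len(X_train), len(X_test))
```

Keeping the test slice out of the arrays passed to `AutoMLConfig` is what makes the later prediction cells a fair evaluation.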