version 1.0.39

This commit is contained in:
Roope Astala
2019-05-14 16:01:14 -04:00
parent 8b1bffc200
commit 2d41c00488
76 changed files with 4441 additions and 3730 deletions

View File

@@ -24,8 +24,8 @@ pip install azureml-sdk
git clone https://github.com/Azure/MachineLearningNotebooks.git
# below steps are optional
# install the base SDK and a Jupyter notebook server
pip install azureml-sdk[notebooks]
# install the base SDK, Jupyter notebook server and tensorboard
pip install azureml-sdk[notebooks,tensorboard]
# install model explainability component
pip install azureml-sdk[explain]

View File

@@ -11,8 +11,7 @@ pip install azureml-sdk
Read more detailed instructions on [how to set up your environment](./NBSETUP.md) using Azure Notebook service, your own Jupyter notebook server, or Docker.
## How to navigate and use the example notebooks?
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [Configuration](./configuration.ipynb) notebook first if you haven't already to establish your connection to the AzureML Workspace.
It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples.
If you want to...
@@ -21,7 +20,7 @@ If you want to...
* ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb).
* ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb).
* ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb).
* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](./how-to-use-azureml/machine-learning-pipelines/pipeline-mpi-batch-prediction.ipynb).
* ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring).
* ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) and [model data collection](./how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.ipynb).
## Tutorials
@@ -56,8 +55,4 @@ Visit following repos to see projects contributed by Azure ML users:
- [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
- [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/README.png)

View File

@@ -32,7 +32,6 @@
" 1. Workspace parameters\n",
" 1. Access your workspace\n",
" 1. Create a new workspace\n",
" 1. Create compute resources\n",
"1. [Next steps](#Next%20steps)\n",
"\n",
"---\n",
@@ -235,97 +234,6 @@
"ws.write_config()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create compute resources for your training experiments\n",
"\n",
"Many of the sample notebooks use Azure ML managed compute (AmlCompute) to train models using a dynamically scalable pool of compute. In this section you will create default compute clusters for use by the other notebooks and any other operations you choose.\n",
"\n",
"To create a cluster, you need to specify a compute configuration that specifies the type of machine to be used and the scalability behaviors. Then you choose a name for the cluster that is unique within the workspace that can be used to address the cluster later.\n",
"\n",
"The cluster parameters are:\n",
"* vm_size - this describes the virtual machine type and size used in the cluster. All machines in the cluster are the same type. You can get the list of vm sizes available in your region by using the CLI command\n",
"\n",
"```shell\n",
"az vm list-skus -o tsv\n",
"```\n",
"* min_nodes - this sets the minimum size of the cluster. If you set the minimum to 0 the cluster will shut down all nodes while note in use. Setting this number to a value higher than 0 will allow for faster start-up times, but you will also be billed when the cluster is not in use.\n",
"* max_nodes - this sets the maximum size of the cluster. Setting this to a larger number allows for more concurrency and a greater distributed processing of scale-out jobs.\n",
"\n",
"\n",
"To create a **CPU** cluster now, run the cell below. The autoscale settings mean that the cluster will scale down to 0 nodes when inactive and up to 4 nodes when busy."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your CPU cluster\n",
"cpu_cluster_name = \"cpucluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n",
" print(\"Found existing cpucluster\")\n",
"except ComputeTargetException:\n",
" print(\"Creating new cpucluster\")\n",
" \n",
" # Specify the configuration for the new cluster\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_D2_V2\",\n",
" min_nodes=0,\n",
" max_nodes=4)\n",
"\n",
" # Create the cluster with the specified name and configuration\n",
" cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n",
" \n",
" # Wait for the cluster to complete, show the output log\n",
" cpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To create a **GPU** cluster, run the cell below. Note that your subscription must have sufficient quota for GPU VMs or the command will fail. To increase quota, see [these instructions](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request). "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# Choose a name for your GPU cluster\n",
"gpu_cluster_name = \"gpucluster\"\n",
"\n",
"# Verify that cluster does not exist already\n",
"try:\n",
" gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n",
" print(\"Found existing gpu cluster\")\n",
"except ComputeTargetException:\n",
" print(\"Creating new gpucluster\")\n",
" \n",
" # Specify the configuration for the new cluster\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
" min_nodes=0,\n",
" max_nodes=4)\n",
" # Create the cluster with the specified name and configuration\n",
" gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n",
"\n",
" # Wait for the cluster to complete, show the output log\n",
" gpu_cluster.wait_for_completion(show_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -249,7 +249,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.automl._vendor.automl.client.core.common.onnx_convert import OnnxConverter\n",
"from azureml.automl.core.onnx_convert import OnnxConverter\n",
"onnx_fl_path = \"./best_model.onnx\"\n",
"OnnxConverter.save_onnx_model(onnx_mdl, onnx_fl_path)"
]

View File

@@ -328,6 +328,12 @@
" print()\n",
" for estimator in step[1].estimators:\n",
" print_model(estimator[1], estimator[0]+ ' - ')\n",
" elif hasattr(step[1], '_base_learners') and hasattr(step[1], '_meta_learner'):\n",
" print(\"\\nMeta Learner\")\n",
" pprint(step[1]._meta_learner)\n",
" print()\n",
" for estimator in step[1]._base_learners:\n",
" print_model(estimator[1], estimator[0]+ ' - ')\n",
" else:\n",
" pprint(step[1].get_params())\n",
" print()\n",

View File

@@ -117,21 +117,12 @@
"outputs": [],
"source": [
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
"X = dprep.auto_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
"\n",
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n",
"# Here we read a comma delimited file and convert all columns to integers.\n",
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
"dflow.get_profile()"
]
},
{
@@ -140,7 +131,30 @@
"metadata": {},
"outputs": [],
"source": [
"X.skip(1).head(5)"
"# As `Primary Type` is our y data, we need to drop the values those are null in this column.\n",
"dflow = dflow.drop_nulls('Primary Type')\n",
"dflow.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
]
},
{
@@ -162,9 +176,8 @@
" \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : False,\n",
" \"verbosity\" : logging.INFO,\n",
" \"n_cross_validations\": 3\n",
" \"preprocess\" : True,\n",
" \"verbosity\" : logging.INFO\n",
"}"
]
},
@@ -181,7 +194,7 @@
"metadata": {},
"outputs": [],
"source": [
"dsvm_name = 'mydsvmc'\n",
"dsvm_name = 'mydsvmb'\n",
"\n",
"try:\n",
" while ws.compute_targets[dsvm_name].provisioning_state == 'Creating':\n",
@@ -257,6 +270,23 @@
"remote_run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Pre-process cache cleanup\n",
"The preprocess data gets cache at user default file store. When the run is completed the cache can be cleaned by running below cell"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"remote_run.clean_preprocessor_cache()"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -376,7 +406,8 @@
"source": [
"## Test\n",
"\n",
"#### Load Test Data"
"#### Load Test Data\n",
"For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step."
]
},
{
@@ -385,12 +416,8 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
"dflow_test = dflow_test.drop_nulls('Primary Type')"
]
},
{
@@ -398,7 +425,7 @@
"metadata": {},
"source": [
"#### Testing Our Best Fitted Model\n",
"We will try to predict 2 digits and see how our model works."
"We will use confusion matrix to see how our model works."
]
},
{
@@ -407,65 +434,19 @@
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"from pandas_ml import ConfusionMatrix\n",
"\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appendix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
"y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
"X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sklearn.digits.data + target\n",
"digits_complete = dprep.auto_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(digits_complete.to_pandas_dataframe().shape)\n",
"labels_column = 'Column64'\n",
"dflow_X = digits_complete.drop_columns(columns = [labels_column])\n",
"dflow_y = digits_complete.keep_columns(columns = [labels_column])"
"\n",
"ypred = fitted_model.predict(X_test)\n",
"\n",
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
"\n",
"print(cm)\n",
"\n",
"cm.plot()"
]
}
],

View File

@@ -115,23 +115,12 @@
"outputs": [],
"source": [
"# You can use `auto_read_file` which intelligently figures out delimiters and datatypes of a file.\n",
"# The data referenced here was pulled from `sklearn.datasets.load_digits()`.\n",
"simple_example_data_root = 'https://dprepdata.blob.core.windows.net/automl-notebook-data/'\n",
"X = dprep.auto_read_file(simple_example_data_root + 'X.csv').skip(1) # Remove the header row.\n",
"\n",
"# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n",
"# You can also use `read_csv` and `to_*` transformations to read (with overridable delimiter)\n",
"# and convert column types manually.\n",
"# Here we read a comma delimited file and convert all columns to integers.\n",
"y = dprep.read_csv(simple_example_data_root + 'y.csv').to_long(dprep.ColumnSelector(term='.*', use_regex = True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets."
"example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n",
"dflow = dprep.auto_read_file(example_data).skip(1) # Remove the header row.\n",
"dflow.get_profile()"
]
},
{
@@ -140,7 +129,30 @@
"metadata": {},
"outputs": [],
"source": [
"X.skip(1).head(5)"
"# As `Primary Type` is our y data, we need to drop the values those are null in this column.\n",
"dflow = dflow.drop_nulls('Primary Type')\n",
"dflow.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Review the Data Preparation Result\n",
"\n",
"You can peek the result of a Dataflow at any range using `skip(i)` and `head(j)`. Doing so evaluates only `j` records for all the steps in the Dataflow, which makes it fast even against large datasets.\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = dflow.drop_columns(columns=['Primary Type', 'FBI Code'])\n",
"y = dflow.keep_columns(columns=['Primary Type'], validate_column_exists=True)"
]
},
{
@@ -162,7 +174,7 @@
" \"iteration_timeout_minutes\" : 10,\n",
" \"iterations\" : 2,\n",
" \"primary_metric\" : 'AUC_weighted',\n",
" \"preprocess\" : False,\n",
" \"preprocess\" : True,\n",
" \"verbosity\" : logging.INFO\n",
"}"
]
@@ -326,7 +338,8 @@
"source": [
"## Test\n",
"\n",
"#### Load Test Data"
"#### Load Test Data\n",
"For the test data, it should have the same preparation step as the train data. Otherwise it might get failed at the preprocessing step."
]
},
{
@@ -335,12 +348,8 @@
"metadata": {},
"outputs": [],
"source": [
"from sklearn import datasets\n",
"\n",
"digits = datasets.load_digits()\n",
"X_test = digits.data[:10, :]\n",
"y_test = digits.target[:10]\n",
"images = digits.images[:10]"
"dflow_test = dprep.auto_read_file(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv').skip(1)\n",
"dflow_test = dflow_test.drop_nulls('Primary Type')"
]
},
{
@@ -348,7 +357,7 @@
"metadata": {},
"source": [
"#### Testing Our Best Fitted Model\n",
"We will try to predict 2 digits and see how our model works."
"We will use confusion matrix to see how our model works."
]
},
{
@@ -357,65 +366,18 @@
"metadata": {},
"outputs": [],
"source": [
"#Randomly select digits and test\n",
"from matplotlib import pyplot as plt\n",
"import numpy as np\n",
"from pandas_ml import ConfusionMatrix\n",
"\n",
"for index in np.random.choice(len(y_test), 2, replace = False):\n",
" print(index)\n",
" predicted = fitted_model.predict(X_test[index:index + 1])[0]\n",
" label = y_test[index]\n",
" title = \"Label value = %d Predicted value = %d \" % (label, predicted)\n",
" fig = plt.figure(1, figsize=(3,3))\n",
" ax1 = fig.add_axes((0,0,.8,.8))\n",
" ax1.set_title(title)\n",
" plt.imshow(images[index], cmap = plt.cm.gray_r, interpolation = 'nearest')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Appendix"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Capture the `Dataflow` Objects for Later Use in AutoML\n",
"y_test = dflow_test.keep_columns(columns=['Primary Type']).to_pandas_dataframe()\n",
"X_test = dflow_test.drop_columns(columns=['Primary Type', 'FBI Code']).to_pandas_dataframe()\n",
"\n",
"`Dataflow` objects are immutable and are composed of a list of data preparation steps. A `Dataflow` object can be branched at any point for further usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sklearn.digits.data + target\n",
"digits_complete = dprep.auto_read_file('https://dprepdata.blob.core.windows.net/automl-notebook-data/digits-complete.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`digits_complete` (sourced from `sklearn.datasets.load_digits()`) is forked into `dflow_X` to capture all the feature columns and `dflow_y` to capture the label column."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(digits_complete.to_pandas_dataframe().shape)\n",
"labels_column = 'Column64'\n",
"dflow_X = digits_complete.drop_columns(columns = [labels_column])\n",
"dflow_y = digits_complete.keep_columns(columns = [labels_column])"
"ypred = fitted_model.predict(X_test)\n",
"\n",
"cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n",
"\n",
"print(cm)\n",
"\n",
"cm.plot()"
]
}
],

View File

@@ -220,7 +220,7 @@
"|**X**|(sparse) array-like, shape = [n_samples, n_features]|\n",
"|**y**|(sparse) array-like, shape = [n_samples, ], targets values.|\n",
"|**n_cross_validations**|Number of cross validation splits.|\n",
"|**country**|The country used to generate holiday features. These should be ISO 3166 two-letter country codes (i.e. 'US', 'GB').|\n",
"|**country_or_region**|The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes (i.e. 'US', 'GB').|\n",
"|**path**|Relative path to the project folder. AutoML stores configuration files for the experiment under this folder. You can specify a new empty folder. "
]
},
@@ -235,8 +235,8 @@
" \"time_column_name\": time_column_name,\n",
" # these columns are a breakdown of the total and therefore a leak\n",
" \"drop_column_names\": ['casual', 'registered'],\n",
" # knowing the country allows Automated ML to bring in holidays\n",
" \"country\" : 'US',\n",
" # knowing the country/region allows Automated ML to bring in holidays\n",
" \"country_or_region\" : 'US',\n",
" \"max_horizon\" : max_horizon,\n",
" \"target_lags\": 1 \n",
"}\n",

View File

@@ -112,9 +112,7 @@
"metadata": {},
"source": [
"### Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you attach the default `AmlCompute` as your training compute resource.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
@@ -125,34 +123,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"\n",
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"automlcl\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
" \n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 6)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()."
"compute_target = ws.get_default_compute_target(\"CPU\")"
]
},
{

View File

@@ -130,7 +130,7 @@
" print('Found an existing DSVM.')\n",
"except:\n",
" print('Creating a new DSVM.')\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2s_v3\")\n",
" dsvm_config = DsvmCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\")\n",
" dsvm_compute = DsvmCompute.create(ws, name = dsvm_name, provisioning_configuration = dsvm_config)\n",
" dsvm_compute.wait_for_completion(show_output = True)\n",
" print(\"Waiting one minute for ssh to be accessible\")\n",

View File

@@ -33,7 +33,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace."
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
]
},
{

View File

@@ -35,7 +35,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise,make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't."
]
},
{

View File

@@ -1,5 +1,12 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -24,13 +31,6 @@
"4. Build new image and deploy it. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/enable-data-collection-for-models-in-aks/enable-data-collection-for-models-in-aks.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -1,5 +1,4 @@
# ONNX on Azure Machine Learning
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/README.png)
These tutorials show how to create and deploy Open Neural Network eXchange ([ONNX](http://onnx.ai)) models in Azure Machine Learning environments using [ONNX Runtime](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-build-deploy-onnx) for inference. Once deployed as a web service, you can ping the model with your own set of images to be analyzed!
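For illustration, here is a minimal sketch of pinging a deployed scoring endpoint; the scoring URI and input payload shape are hypothetical placeholders, not values from these tutorials:

```python
# Minimal sketch of calling a deployed ONNX model web service.
# The scoring URI and input shape are placeholders; in practice use
# `service.scoring_uri` and the input your model expects.
import json
import requests

scoring_uri = "http://<your-service>.azurecontainer.io/score"  # placeholder
sample = {"data": [[0.0] * 784]}  # placeholder input shape

response = requests.post(scoring_uri,
                         data=json.dumps(sample),
                         headers={"Content-Type": "application/json"})
print(response.json())
```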
@@ -37,3 +36,4 @@ Licensed under the MIT License.
These tutorials were developed by Vinitra Swamy and Prasanth Pulavarthi of the Microsoft AI Frameworks team and adapted for presentation at Microsoft Ignite 2018.
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/onnx/README.png)

View File

@@ -3,7 +3,6 @@
Follow these sample notebooks to learn:
1. [Explain tabular data locally](explain-tabular-data-local): Basic example of explaining model trained on tabular data.
4. [Explain on remote AMLCompute](explain-on-amlcompute): Explain a model on a remote AMLCompute target.
5. [Explain tabular data with Run History](explain-tabular-data-run-history): Explain a model with Run History.
7. [Explain raw features](explain-tabular-data-raw-features): Explain the raw features of a trained model.

View File

@@ -203,9 +203,6 @@
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"# auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
"run_config.auto_prepare_environment = True\n",
"\n",
"azureml_pip_packages = [\n",
" 'azureml-defaults', 'azureml-contrib-explain-model', 'azureml-core', 'azureml-telemetry',\n",
" 'azureml-explain-model'\n",

View File

@@ -217,6 +217,20 @@
"# 2. Load visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"# Or, in Jupyter Labs, uncomment below\n",
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"# jupyter labextension install microsoft-mli-widget"
]
},
{
"cell_type": "code",
"execution_count": null,

View File

@@ -218,6 +218,20 @@
"## Load visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"# Or, in Jupyter Labs, uncomment below\n",
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"# jupyter labextension install microsoft-mli-widget"
]
},
{
"cell_type": "code",
"execution_count": null,

View File

@@ -210,6 +210,20 @@
"## Load visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"# Or, in Jupyter Labs, uncomment below\n",
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"# jupyter labextension install microsoft-mli-widget"
]
},
{
"cell_type": "code",
"execution_count": null,

View File

@@ -12,16 +12,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.png)"
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License."
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/explain-model/explain-tabular-data-raw-features/explain-sklearn-raw-features.png)"
]
},
{
@@ -240,6 +240,20 @@
"# 2. Load visualization dashboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Note you will need to have extensions enabled prior to jupyter kernel starting\n",
"!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize\n",
"# Or, in Jupyter Labs, uncomment below\n",
"# jupyter labextension install @jupyter-widgets/jupyterlab-manager\n",
"# jupyter labextension install microsoft-mli-widget"
]
},
{
"cell_type": "code",
"execution_count": null,

View File

@@ -32,28 +32,16 @@ Azure Machine Learning Pipelines optimize for simplicity, speed, and efficiency.
**Tracking and versioning**: Instead of manually tracking data and result paths as you iterate, use the pipelines SDK to explicitly name and version your data sources, inputs, and outputs as well as manage scripts and data separately for increased productivity.
## End-to-end introductory notebook series
### Notebooks
Learn about Azure Machine Learning Pipelines by following the notebooks in this directory **in sequence**:
In this directory, there are two types of notebooks:
|Notebook|Description|
|--------|-----------|
|1. [aml-pipelines-getting-started.ipynb](https://aka.ms/pl-get-started)|Get started and run Azure Machine Learning Pipeline steps in parallel and in sequence.|
|2. [aml-pipelines-with-data-dependency-steps.ipynb](https://aka.ms/pl-data-dep)|Connect pipeline steps where data produced by one step is used by subsequent steps to force an explicit dependency between the steps. |
|3. [aml-pipelines-publish-and-run-using-rest-endpoint.ipynb](https://aka.ms/pl-pub-rep)|Publish pipelines to get a REST endpoint consumable by Python and non-Python clients. |
|4. [aml-pipelines-data-transfer.ipynb](https://aka.ms/pl-data-trans)|Transfer data between supported datastores in pipelines.|
|5. [aml-pipelines-use-adla-as-compute-target.ipynb](https://aka.ms/pl-adla)|Run pipelines on Azure Data Lake Analytics (ADLA).|
|6. [aml-pipelines-how-to-use-estimatorstep.ipynb](https://aka.ms/pl-estimator)|Add estimator training to a pipeline with `EstimatorStep`.|
|7. [aml-pipelines-parameter-tuning-with-hyperdrive.ipynb](https://aka.ms/pl-hyperdrive)|Hyperparameter tune in your pipelines with `HyperDriveStep`.|
|8. [aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb](https://aka.ms/pl-azbatch)|Run custom code in an Azure Batch cluster with `AzureBatchStep`.|
|9. [aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb](https://aka.ms/pl-schedule)|Schedule published pipeline job at specific intervals or after change in datastore.|
|10. [aml-pipelines-with-automated-machine-learning-step.ipynb](https://aka.ms/pl-automl)|Use automated ML in your pipelines with `AutoMLStep`.|
* The first type introduces you to core Azure Machine Learning Pipelines features. These notebooks are designed to be followed in sequence and are all located in the "intro-to-pipelines" folder:
Take a look at [intro-to-pipelines](./intro-to-pipelines/) for the list of notebooks that introduce core Azure Machine Learning Pipelines concepts.
## Advanced scenarios
* The second type illustrates more sophisticated scenarios; these notebooks are independent of each other and include:
|Notebook|Description|
|--------|-----------|
|[pipeline-batch-scoring.ipynb](https://aka.ms/pl-batch-score)|Run a batch scoring job using Azure Machine Learning pipelines|
|[pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans)||
1. [pipeline-batch-scoring.ipynb](https://aka.ms/pl-batch-score): This notebook demonstrates how to run a batch scoring job using Azure Machine Learning pipelines.
2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans): This notebook demonstrates a multi-step pipeline that uses GPU compute.
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/README.png)

View File

@@ -8,7 +8,6 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -16,7 +15,6 @@
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -62,7 +60,7 @@
"source": [
"## Initialize Workspace\n",
"\n",
"Initialize a workspace object from persisted configuration.If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json\n",
"Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\\config.json\n",
"\n",
"If you don't have a config.json file, please go through the configuration Notebook located here:\n",
"https://github.com/Azure/MachineLearningNotebooks. \n",

View File

@@ -8,7 +8,6 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -16,7 +15,6 @@
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -46,7 +44,7 @@
"metadata": {},
"source": [
"## Prerequisites and Azure Machine Learning Basics\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n"
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. \n"
]
},
{
@@ -155,7 +153,7 @@
"metadata": {},
"source": [
"### Datastore concepts\n",
"A [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target. \n",
"A [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target. \n",
"\n",
"A Datastore can either be backed by an Azure File Storage (default) or by an Azure Blob Storage.\n",
"\n",
@@ -198,7 +196,7 @@
"metadata": {},
"source": [
"#### (Optional) See your files using Azure Portal\n",
"Once you successfully uploaded the files, you can browse to them (or upload more files) using [Azure Portal](https://portal.azure.com). At the portal, make sure you have selected **AzureML Nursery** as your subscription (click *Resource Groups* and then select the subscription). Then look for your **Machine Learning Workspace** (it has your *alias* as the name). It has a link to your storage. Click on the storage link. It will take you to a page where you can see [Blobs](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction), [Files](https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction), [Tables](https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-overview), and [Queues](https://docs.microsoft.com/en-us/azure/storage/queues/storage-queues-introduction). We have just uploaded a file to the Blob storage and another one to the File storage. You should be able to see both of these files in their respective locations. "
"Once you successfully uploaded the files, you can browse to them (or upload more files) using [Azure Portal](https://portal.azure.com). At the portal, make sure you have selected your subscription (click *Resource Groups* and then select the subscription). Then look for your **Machine Learning Workspace** name. It has a link to your storage. Click on the storage link. It will take you to a page where you can see [Blobs](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction), [Files](https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction), [Tables](https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-overview), and [Queues](https://docs.microsoft.com/en-us/azure/storage/queues/storage-queues-introduction). We have uploaded a file each to the Blob storage and to the File storage in the above step. You should be able to see both of these files in their respective locations. "
]
},
{
@@ -235,15 +233,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Retrieve or create a Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
"\n",
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. We will create an Azure Machine Learning Compute containing **STANDARD_D2_V2 CPU VMs**. This process is broken down into the following steps:\n",
"\n",
"1. Create the configuration\n",
"2. Create the Azure Machine Learning compute\n",
"\n",
"**This process will take about 3 minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**"
"#### Retrieve default Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target."
]
},
{
@@ -252,22 +243,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"aml_compute_target = \"aml-compute\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new compute target\")\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
"print(\"Azure Machine Learning Compute attached\")\n"
"aml_compute = ws.get_default_compute_target(\"CPU\")"
]
},
{
@@ -304,11 +280,15 @@
"## Creating a Step in a Pipeline\n",
"A Step is a unit of execution. Step typically needs a target of execution (compute target), a script to execute, and may require script arguments and inputs, and can produce outputs. The step also could take a number of other parameters. Azure Machine Learning Pipelines provides the following built-in Steps:\n",
"\n",
"- [**PythonScriptStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py): Add a step to run a Python script in a Pipeline.\n",
"- [**PythonScriptStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py): Adds a step to run a Python script in a Pipeline.\n",
"- [**AdlaStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.adla_step.adlastep?view=azure-ml-py): Adds a step to run U-SQL script using Azure Data Lake Analytics.\n",
"- [**DataTransferStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.data_transfer_step.datatransferstep?view=azure-ml-py): Transfers data between Azure Blob and Data Lake accounts.\n",
"- [**DatabricksStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.databricks_step.databricksstep?view=azure-ml-py): Adds a DataBricks notebook as a step in a Pipeline.\n",
"- [**HyperDriveStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.hyper_drive_step.hyperdrivestep?view=azure-ml-py): Creates a Hyper Drive step for Hyper Parameter Tuning in a Pipeline.\n",
"- [**AzureBatchStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.azurebatch_step.azurebatchstep?view=azure-ml-py): Creates a step for submitting jobs to Azure Batch\n",
"- [**EstimatorStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.estimator_step.estimatorstep?view=azure-ml-py): Adds a step to run Estimator in a Pipeline.\n",
"- [**MpiStep**](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.mpi_step.mpistep?view=azure-ml-py): Adds a step to run a MPI job in a Pipeline.\n",
"- [**AutoMLStep**](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py): Creates a AutoML step in a Pipeline.\n",
"\n",
"The following code will create a PythonScriptStep to be executed in the Azure Machine Learning Compute we created above using train.py, one of the files already made available in the project folder.\n",
"\n",
@@ -395,9 +375,6 @@
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"# auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
"run_config.auto_prepare_environment = True\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
"\n",
@@ -476,8 +453,8 @@
"source": [
"# Submit syntax\n",
"# submit(experiment_name, \n",
"# pipeline_parameters=None, \n",
"# continue_on_node_failure=False, \n",
"# pipeline_params=None, \n",
"# continue_on_step_failure=False, \n",
"# regenerate_outputs=False)\n",
"\n",
"pipeline_run1 = Experiment(ws, 'Hello_World1').submit(pipeline1, regenerate_outputs=False)\n",

View File

@@ -74,7 +74,7 @@
"source": [
"Initialize a workspace object from persisted configuration. Make sure the config file is present at .\\config.json\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, if you don't have a config.json file, please go through the configuration Notebook located [here](https://github.com/Azure/MachineLearningNotebooks). \n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, If you don't have a config.json file, please go through the configuration Notebook located [here](https://github.com/Azure/MachineLearningNotebooks). \n",
"\n",
"This sets you up with a working config file that has information on your workspace, subscription id, etc. "
]
@@ -113,25 +113,7 @@
"metadata": {},
"outputs": [],
"source": [
"batch_compute_name = 'mybatchcompute' # Name to associate with new compute in workspace\n",
"\n",
"# Batch account details needed to attach as compute to workspace\n",
"batch_account_name = \"<batch_account_name>\" # Name of the Batch account\n",
"batch_resource_group = \"<batch_resource_group>\" # Name of the resource group which contains this account\n",
"\n",
"try:\n",
" # check if already attached\n",
" batch_compute = BatchCompute(ws, batch_compute_name)\n",
"except ComputeTargetException:\n",
" print('Attaching Batch compute...')\n",
" provisioning_config = BatchCompute.attach_configuration(resource_group=batch_resource_group, \n",
" account_name=batch_account_name)\n",
" batch_compute = ComputeTarget.attach(ws, batch_compute_name, provisioning_config)\n",
" batch_compute.wait_for_completion()\n",
" print(\"Provisioning state:{}\".format(batch_compute.provisioning_state))\n",
" print(\"Provisioning errors:{}\".format(batch_compute.provisioning_errors))\n",
"\n",
"print(\"Using Batch compute:{}\".format(batch_compute.cluster_resource_id))"
"batch_compute = ws.get_default_compute_target(\"CPU\")"
]
},
{
@@ -196,8 +178,8 @@
"\n",
"\n",
"def upload_file_to_datastore(datastore, file_name, content):\n",
" dir = create_local_file(content=content, file_name=file_name)\n",
" datastore.upload(src_dir=dir, overwrite=True, show_progress=True)"
" src_dir = create_local_file(content=content, file_name=file_name)\n",
" datastore.upload(src_dir=src_dir, overwrite=True, show_progress=True)"
]
},
{

View File

@@ -27,7 +27,7 @@
"\n",
"## Prerequisite:\n",
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n",
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise,go through the [configuration notebook](../../../configuration.ipynb) to:\n",
"* If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n",
" * install the AML SDK\n",
" * create a workspace and its configuration file (`config.json`)"
]
@@ -76,18 +76,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource."
]
},
{
@@ -96,25 +86,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"cpucluster\"\n",
"\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n",
"\n",
" # create the cluster\n",
" cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"cpu_cluster = ws.get_default_compute_target(\"CPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(cpu_cluster.get_status().serialize())"

View File

@@ -135,14 +135,7 @@
"metadata": {},
"source": [
"## Retrieve or create a Azure Machine Learning compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.\n",
"\n",
"If we could not find the compute with the given name in the previous cell, then we will create a new compute here. This process is broken down into the following steps:\n",
"\n",
"1. Create the configuration\n",
"2. Create the Azure Machine Learning compute\n",
"\n",
"**This process will take a few minutes and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell.**\n"
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's get the default Azure Machine Learning Compute in the current workspace. We will then run the training script on this compute target."
]
},
{
@@ -151,20 +144,7 @@
"metadata": {},
"outputs": [],
"source": [
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target {}.'.format(cluster_name))\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size=\"STANDARD_NC6\",\n",
" max_nodes=4)\n",
"\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
" compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)\n",
"\n",
"print(\"Azure Machine Learning Compute attached\")"
"compute_target = ws.get_default_compute_target(\"GPU\")"
]
},
{
@@ -266,7 +246,7 @@
"metadata": {},
"outputs": [],
"source": [
"hd_config = HyperDriveRunConfig(estimator=est, \n",
"hd_config = HyperDriveConfig(estimator=est, \n",
" hyperparameter_sampling=ps,\n",
" policy=early_termination_policy,\n",
" primary_metric_name='validation_acc', \n",
@@ -302,7 +282,7 @@
"### HyperDriveStep\n",
"HyperDriveStep can be used to run HyperDrive job as a step in pipeline.\n",
"- **name:** Name of the step\n",
"- **hyperdrive_run_config:** A HyperDriveRunConfig that defines the configuration for this HyperDrive run\n",
"- **hyperdrive_config:** A HyperDriveConfig that defines the configuration for this HyperDrive run\n",
"- **estimator_entry_script_arguments:** List of command-line arguments for estimator entry script\n",
"- **inputs:** List of input port bindings\n",
"- **outputs:** List of output port bindings\n",
@@ -324,7 +304,7 @@
"\n",
"hd_step = HyperDriveStep(\n",
" name=\"hyperdrive_module\",\n",
" hyperdrive_run_config=hd_config,\n",
" hyperdrive_config=hd_config,\n",
" estimator_entry_script_arguments=['--data-folder', data_folder],\n",
" inputs=[data_folder],\n",
" metrics_output=metirics_data)"

View File

@@ -79,20 +79,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"aml_compute_target = \"cpucluster\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new compute target\")\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n"
"aml_compute = ws.get_default_compute_target(\"CPU\")"
]
},
{

View File

@@ -54,7 +54,7 @@
"metadata": {},
"source": [
"### Compute Targets\n",
"#### Retrieve an already attached Azure Machine Learning Compute"
"#### Retrieve the default Azure Machine Learning Compute"
]
},
{
@@ -63,31 +63,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Run, Experiment, Datastore\n",
"\n",
"from azureml.widgets import RunDetails\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute, ComputeTarget\n",
"aml_compute_target = \"aml-compute\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"Found existing compute target: {}\".format(aml_compute_target))\n",
"except:\n",
" print(\"Creating new compute target: {}\".format(aml_compute_target))\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)"
"aml_compute = ws.get_default_compute_target(\"CPU\")"
]
},
{
@@ -383,7 +359,7 @@
"source": [
"### Create a schedule for the pipeline using a Datastore\n",
"This schedule will run when additions or modifications are made to Blobs in the Datastore.\n",
"By default, the Datastore container is monitored for changes. Use the path_on_datastore parameter to instead specify a path on the Datastore to monitor for changes. Changes made to subfolders in the container/path will not trigger the schedule.\n",
"By default, the Datastore container is monitored for changes. Use the path_on_datastore parameter to instead specify a path on the Datastore to monitor for changes. Note: the path_on_datastore will be under the container for the datastore, so the actual path monitored will be container/path_on_datastore. Changes made to subfolders in the container/path will not trigger the schedule.\n",
"Note: Only Blob Datastores are supported."
]
},
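{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a minimal sketch of such a call (assuming a published pipeline id, a Blob `datastore` object, and `Schedule` imported from `azureml.pipeline.core`; the name and path values are placeholders):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: schedule a published pipeline to run on changes under a specific folder of a Blob datastore\n",
"# schedule = Schedule.create(ws, name=\"MyDatastoreSchedule\",\n",
"#                            pipeline_id=pub_pipeline_id,\n",
"#                            experiment_name=\"Schedule_Run\",\n",
"#                            datastore=datastore,\n",
"#                            wait_for_provisioning=True,\n",
"#                            description=\"Schedule Run\",\n",
"#                            polling_interval=5,               # poll every 5 minutes (the default)\n",
"#                            path_on_datastore=\"dir/subdir\")   # monitor only this folder under the container"
]
},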
@@ -403,6 +379,7 @@
" datastore=datastore,\n",
" wait_for_provisioning=True,\n",
" description=\"Schedule Run\")\n",
" #polling_interval=5, use polling_interval to specify how often to poll for blob additions/modifications. Default value is 5 minutes.\n",
" #path_on_datastore=\"file/path\") use path_on_datastore to specify a specific folder to monitor for changes.\n",
"\n",
"# You may want to make sure that the schedule is provisioned properly\n",

View File

@@ -0,0 +1,553 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# How to Setup a PipelineEndpoint and Submit a Pipeline Using the PipelineEndpoint.\n",
"In this notebook, we will see how to setup a PipelineEndpoint and run specific pipeline version.\n",
"\n",
"PipelineEndpoint can be used to update a published pipeline while maintaining same endpoint.\n",
"PipelineEndpoint, provides a way to keep track of [PublishedPipelines](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.publishedpipeline) using versions. PipelineEndpoint uses endpoint with version information to trigger underlying published pipeline. Pipeline endpoints are uniquely named within a workspace. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prerequisites and AML Basics\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration Notebook](https://github.com/Azure/MachineLearningNotebooks) first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Notebook Overview\n",
"In this notebook, we provide an introduction to Azure machine learning PipelineEndpoints. It covers:\n",
"* [Create PipelineEndpoint](#Create-PipelineEndpoint), How to create PipelineEndpoint.\n",
"* [Retrieving PipelineEndpoint](#Retrieving-PipelineEndpoint), How to get specific PipelineEndpoint from worskpace by name/Id and get all [PipelineEndpoints](#Get-all-PipelineEndpoints-in-workspace) within workspace.\n",
"* [PipelineEndpoint Properties](#PipelineEndpoint-properties). How to get and set PipelineEndpoint properties, such as default version of PipelineEndpoint.\n",
"* [PipelineEndpoint Submission](#PipelineEndpoint-Submission). How to run a Pipeline using PipelineEndpoint."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create PipelineEndpoint\n",
"Following are required input parameters to create PipelineEndpoint:\n",
"\n",
"* *workspace*: AML workspace.\n",
"* *name*: name of PipelineEndpoint, it is unique within workspace.\n",
"* *description*: description details for PipelineEndpoint.\n",
"* *pipeline*: A [Pipeline](#Steps-to-create-simple-Pipeline) or [PublishedPipeline](#Publish-Pipeline), to set default version of PipelineEndpoint. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Initialization, Steps to create a Pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.steps import PythonScriptStep\n",
"from azureml.pipeline.core import Pipeline\n",
"\n",
"aml_compute = ws.get_default_compute_target(\"CPU\")\n",
"\n",
"# source_directory\n",
"source_directory = '.'\n",
"# define a single step pipeline for demonstration purpose.\n",
"trainStep = PythonScriptStep(\n",
" name=\"Training_Step\",\n",
" script_name=\"train.py\", \n",
" compute_target=aml_compute_target, \n",
" source_directory=source_directory\n",
")\n",
"print(\"TrainStep created\")\n",
"# build and validate Pipeline\n",
"pipeline = Pipeline(workspace=ws, steps=[trainStep])\n",
"print(\"Pipeline is built\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Publish Pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datetime import datetime\n",
"\n",
"timenow = datetime.now().strftime('%m-%d-%Y-%H-%M')\n",
"\n",
"pipeline_name = timenow + \"-Pipeline\"\n",
"print(pipeline_name)\n",
"\n",
"published_pipeline = pipeline.publish(\n",
" name=pipeline_name, \n",
" description=pipeline_name)\n",
"print(\"Newly published pipeline id: {}\".format(published_pipeline.id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Publishing PipelineEndpoint\n",
"Create PipelineEndpoint with required parameters: workspace, name, description and pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.pipeline.core import PipelineEndpoint\n",
"\n",
"pipeline_endpoint = PipelineEndpoint.publish(workspace=ws, name=\"PipelineEndpointTest\",\n",
" pipeline=pipeline, description=\"Test description Notebook\")\n",
"pipeline_endpoint"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Retrieving PipelineEndpoint\n",
"\n",
"PipelineEndpoint is uniquely defined by name and id within workspace. PipelineEndpoint in workspace can be retrived by Id or by name."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get PipelineEndpoint by Name\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name = PipelineEndpoint.get(workspace=ws, name=\"PipelineEndpointTest\")\n",
"pipeline_endpoint_by_name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get PipelineEndpoint by Id\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#get the PipelineEndpoint Id\n",
"pipeline_endpoint_by_name = PipelineEndpoint.get(workspace=ws, name=\"PipelineEndpointTest\")\n",
"endpoint_id = pipeline_endpoint_by_name.id\n",
"\n",
"pipeline_endpoint_by_id = PipelineEndpoint.get(workspace=ws, id=endpoint_id)\n",
"pipeline_endpoint_by_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get all PipelineEndpoints in workspace\n",
"Returns all PipelineEndpoints within workspace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"endpoint_list = PipelineEndpoint.get_all(workspace=ws, active_only=True)\n",
"endpoint_list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PipelineEndpoint properties"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Default Version of PipelineEndpoint\n",
"Default version of PipelineEndpoint starts from \"0\" and increments on addition of pipelines.\n",
"\n",
"##### Get the Default Version"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"default_version = pipeline_endpoint_by_name.get_default_version()\n",
"default_version"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Set default version \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name.set_default_version(\"0\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get the Published Pipeline corresponds to specific version of PipelineEndpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline = pipeline_endpoint_by_name.get_pipeline(\"0\")\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get default version Published Pipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline = pipeline_endpoint_by_name.get_pipeline()\n",
"pipeline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Set Published Pipeline to default version"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Set Published Pipeline to PipelineEndpoint, if exists\n",
"pipeline_endpoint_by_name.set_default(published_pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get all Versions in PipelineEndpoint\n",
"Returns list of published pipelines and its versions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"versions = pipeline_endpoint_by_name.get_all_versions()\n",
"\n",
"for ve in versions:\n",
" print(ve.version)\n",
" print(ve.pipeline.id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get all Published Pipelines in PipelineEndpoint\n",
"Returns all active pipelines in PipelineEnpoint, if active_only flag is set to True."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipelines = pipeline_endpoint_by_name.get_all_pipelines(active_only=True)\n",
"pipelines"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Name property of PipelineEndpoint\n",
"PipelineEndpoint is uniquely identified by name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Set Name PipelineEndpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name.set_name(name=\"NewName\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add Published Pipeline to PipelineEndpoint, \n",
"Adding published pipeline, if its not present in PipelineEndpoint."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name.add(published_pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Add Published pipeline to PipelineEndpoint and set it to default version\n",
"Adding published pipeline to PipelineEndpoint if not present and set it to default"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name.add_default(published_pipeline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PipelineEndpoint Submission\n",
"PipelineEndpoint triggers specific versioned pipeline or default pipeline by:\n",
"* Rest Endpoint \n",
"* Submit call."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Run Pipeline by endpoint property of PipelineEndpoint\n",
"Run specific pipeline using endpoint property of PipelineEndpoint and executing http post."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pipeline_endpoint_by_name = PipelineEndpoint.get(workspace=ws, name=\"PipelineEndpointTest\")\n",
"\n",
"# endpoint with id \n",
"rest_endpoint_id = pipeline_endpoint_by_name.endpoint\n",
"\n",
"# for default version pipeline\n",
"rest_endpoint_id_without_version_with_id = rest_endpoint_id\n",
"\n",
"# for specific version pipeline just append version info\n",
"version=\"0\"\n",
"rest_endpoint_id_with_version = rest_endpoint_id_without_version_with_id+\"/\"+ version\n",
"print(rest_endpoint_id_with_version)\n",
"pipeline_endpoint_by_name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# endpoint with name\n",
"rest_endpoint_name = rest_endpoint_id.split(\"Id\", 1)[0] + \"Name?name=\" + pipeline_endpoint_by_name.name\n",
"\n",
"# for default version pipeline\n",
"rest_endpoint_name_without_version = rest_endpoint_name\n",
"\n",
"# for specific version pipeline just append version info\n",
"version=\"0\"\n",
"rest_endpoint_name_with_version = rest_endpoint_name_without_version+\"&pipelineVersion=\"+ version\n",
"print(rest_endpoint_name_with_version)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[This notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to AML workspace."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"import requests\n",
"\n",
"auth = InteractiveLoginAuthentication()\n",
"aad_token = auth.get_authentication_header()\n",
"\n",
"#endpoint = pipeline_endpoint_by_name.url\n",
"\n",
"print(\"You can perform HTTP POST on URL {} to trigger this pipeline\".format(rest_endpoint_name_with_version))\n",
"\n",
"# specify the param when running the pipeline\n",
"response = requests.post(rest_endpoint_name_with_version, \n",
" headers=aad_token, \n",
" json={\"ExperimentName\": \"default_pipeline\",\n",
" \"RunSource\": \"SDK\",\n",
" \"ParameterAssignments\": {\"1\": \"united\", \"2\":\"city\"}})\n",
"\n",
"run_id = response.json()[\"Id\"]\n",
"print(run_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Run Pipeline by Submit call of PipelineEndpoint \n",
"Run specific pipeline using Submit api of PipelineEndpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# submit pipeline with specific version\n",
"run_id = pipeline_endpoint_by_name.submit(\"TestPipelineEndpoint\", pipeline_version=\"0\")\n",
"print(run_id)\n",
"\n",
"# submit pipeline with default version\n",
"run_id = pipeline_endpoint_by_name.submit(\"TestPipelineEndpoint\")\n",
"print(run_id)"
]
}
],
"metadata": {
"authors": [
{
"name": "mameghwa"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -404,7 +404,9 @@
"metadata": {},
"source": [
"### 1. Running the demo notebook already added to the Databricks workspace\n",
"Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable\u00c2\u00a0notebook_path\u00c2\u00a0when you run the code cell below:"
"Create a notebook in the Azure Databricks workspace, and provide the path to that notebook as the value associated with the environment variable \"DATABRICKS_NOTEBOOK_PATH\". This will then set the variable\u00c2\u00a0notebook_path\u00c2\u00a0when you run the code cell below:\n",
"\n",
"your notebook's path in Azure Databricks UI by hovering over to notebook's title. A typical path of notebook looks like this `/Users/example@databricks.com/example`. See [Databricks Workspace](https://docs.azuredatabricks.net/user-guide/workspace.html) to learn about the folder structure."
]
},
{

View File

@@ -130,9 +130,7 @@
"metadata": {},
"source": [
"### Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process."
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your AutoML run. In this tutorial, you get the default `AmlCompute` as your training compute resource."
]
},
{
@@ -141,31 +139,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Choose a name for your cluster.\n",
"amlcompute_cluster_name = \"cpucluster\"\n",
"\n",
"found = False\n",
"# Check if this compute target already exists in the workspace.\n",
"cts = ws.compute_targets\n",
"if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n",
" found = True\n",
" print('Found existing compute target.')\n",
" compute_target = cts[amlcompute_cluster_name]\n",
" \n",
"if not found:\n",
" print('Creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n",
" #vm_priority = 'lowpriority', # optional\n",
" max_nodes = 4)\n",
"\n",
" # Create the cluster.\n",
" compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n",
" \n",
" # Can poll for a minimum number of nodes and for a specific timeout.\n",
" # If no min_node_count is provided, it will use the scale settings for the cluster.\n",
" compute_target.wait_for_completion(show_output = True, min_node_count = 1, timeout_in_minutes = 10)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()."
"compute_target = ws.get_default_compute_target(\"CPU\")"
]
},
{
@@ -244,7 +218,7 @@
" X_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/X_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
" y_train = pd.read_csv(\"/tmp/azureml_runs/bai_data/y_train.tsv\", delimiter=\"\\t\", header=None, quotechar='\"')\n",
"\n",
" return { \"X\" : X_train.values, \"y\" : y_train[0].values }\n"
" return { \"X\" : X_train.values, \"y\" : y_train.values.flatten() }\n"
]
},
{

View File

@@ -128,7 +128,7 @@
"metadata": {},
"source": [
"#### Retrieve or create a Aml compute\n",
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Aml Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target."
"Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's get the default Aml Compute in the current workspace. We will then run the training script on this compute target."
]
},
{
@@ -137,22 +137,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"aml_compute_target = \"aml-compute\"\n",
"try:\n",
" aml_compute = AmlCompute(ws, aml_compute_target)\n",
" print(\"found existing compute target.\")\n",
"except ComputeTargetException:\n",
" print(\"creating new compute target\")\n",
" \n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n",
" min_nodes = 1, \n",
" max_nodes = 4) \n",
" aml_compute = ComputeTarget.create(ws, aml_compute_target, provisioning_config)\n",
" aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
"print(\"Aml Compute attached\")\n"
"aml_compute = ws.get_default_compute_target(\"CPU\")\n"
]
},
{
@@ -297,9 +282,6 @@
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"# auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
"run_config.auto_prepare_environment = True\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])"
]

View File

@@ -162,7 +162,7 @@
"metadata": {},
"source": [
"### Create and attach Compute targets\n",
"Use the below code to create and attach Compute targets. "
"Use the below code to get the default Compute target. "
]
},
{
@@ -171,33 +171,9 @@
"metadata": {},
"outputs": [],
"source": [
"# choose a name for your cluster\n",
"aml_compute_name = os.environ.get(\"AML_COMPUTE_NAME\", \"gpucluster\")\n",
"cluster_min_nodes = os.environ.get(\"AML_COMPUTE_MIN_NODES\", 0)\n",
"cluster_max_nodes = os.environ.get(\"AML_COMPUTE_MAX_NODES\", 1)\n",
"vm_size = os.environ.get(\"AML_COMPUTE_SKU\", \"STANDARD_NC6\")\n",
"cluster_type = os.environ.get(\"AML_CLUSTER_TYPE\", \"GPU\")\n",
"\n",
"\n",
"if aml_compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[aml_compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('found compute target. just use it. ' + aml_compute_name)\n",
"else:\n",
" print('creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n",
" vm_priority = 'lowpriority', # optional\n",
" min_nodes = cluster_min_nodes, \n",
" max_nodes = cluster_max_nodes)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, aml_compute_name, provisioning_config)\n",
" \n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
" # For a more detailed view of current Azure Machine Learning Compute status, use get_status()\n",
" print(compute_target.get_status().serialize())"
"compute_target = ws.get_default_compute_target(cluster_type)"
]
},
{

View File

@@ -32,7 +32,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwsie, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. "
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. "
]
},
{

View File

@@ -0,0 +1,260 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"## Authentication in Azure Machine Learning\n",
"\n",
"This notebook shows you how to authenticate to your Azure ML Workspace using\n",
"\n",
" 1. Interactive Login Authentication\n",
" 2. Azure CLI Authentication\n",
" 3. Service Principal Authentication\n",
" \n",
"The interactive authentication is suitable for local experimentation on your own computer. Azure CLI authentication is suitable if you are already using Azure CLI for managing Azure resources, and want to sign in only once. The Service Principal authentication is suitable for automated workflows, for example as part of Azure Devops build."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/manage-azureml-service/authentication-in-azureml/authentication-in-azureml.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interactive Authentication\n",
"\n",
"Interactive authentication is the default mode when using Azure ML SDK.\n",
"\n",
"When you connect to your workspace using workspace.from_config, you will get an interactive login dialog."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace.from_config()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Also, if you explicitly specify the subscription ID, resource group and resource group, you will get the dialog."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ws = Workspace(subscription_id=\"my-subscription-id\",\n",
" resource_group=\"my-ml-rg\",\n",
" workspace_name=\"my-ml-workspace\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note the user you're authenticated as must have access to the subscription and resource group. If you receive an error\n",
"\n",
"```\n",
"AuthenticationException: You don't have access to xxxxxx-xxxx-xxx-xxx-xxxxxxxxxx subscription. All the subscriptions that you have access to = ...\n",
"```\n",
"\n",
"check that the you used correct login and entered the correct subscription ID."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In some cases, you may see a version of the error message containing text: ```All the subscriptions that you have access to = []```\n",
"\n",
"In such a case, you may have to specify the tenant ID of the Azure Active Directory you're using. An example would be accessing a subscription as a guest to a tenant that is not your default. You specify the tenant by explicitly instantiating _InteractiveLoginAuthentication_ with tenant ID as argument ([see instructions how to obtain tenant Id](#get-tenant-id))."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.authentication import InteractiveLoginAuthentication\n",
"\n",
"interactive_auth = InteractiveLoginAuthentication(tenant_id=\"my-tenant-id\")\n",
"\n",
"ws = Workspace(subscription_id=\"my-subscription-id\",\n",
" resource_group=\"my-ml-rg\",\n",
" workspace_name=\"my-ml-workspace\",\n",
" auth=interactive_auth)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Azure CLI Authentication\n",
"\n",
"If you have installed azure-cli package, and used ```az login``` command to log in to your Azure Subscription, you can use _AzureCliAuthentication_ class.\n",
"\n",
"Note that interactive authentication described above won't use existing Azure CLI auth tokens. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.authentication import AzureCliAuthentication\n",
"\n",
"cli_auth = AzureCliAuthentication()\n",
"\n",
"ws = Workspace(subscription_id=\"my-subscription-id\",\n",
" resource_group=\"my-ml-rg\",\n",
" workspace_name=\"my-ml-workspace\",\n",
" auth=cli_auth)\n",
"\n",
"print(\"Found workspace {} at location {}\".format(ws.name, ws.location))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Service Principal Authentication\n",
"\n",
"When setting up a machine learning workflow as an automated process, we recommend using Service Principal Authentication. This approach decouples the authentication from any specific user login, and allows managed access control.\n",
"\n",
"Note that you must have administrator privileges over the Azure subscription to complete these steps.\n",
"\n",
"The first step is to create a service principal. First, go to [Azure Portal](https://portal.azure.com), select **Azure Active Directory** and **App Registrations**. Then select **+New application registration**, give your service principal a name, for example _my-svc-principal_. You can leave application type as is, and specify a dummy value for Sign-on URL, such as _https://invalid_.\n",
"\n",
"Then click **Create**.\n",
"\n",
"![service principal creation]<img src=\"images/svc-pr-1.PNG\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step is to obtain the _Application ID_ (also called username) and create _password_ for the service principal.\n",
"\n",
"From the page for your newly created service principal, copy the _Application ID_. Then select **Settings** and **Keys**, write a description for your key, and select duration. Then click **Save**, and copy the _password_ to a secure location.\n",
"\n",
"![application id and password](images/svc-pr-2.PNG)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a id =\"get-tenant-id\"></a>\n",
"\n",
"Also, you need to obtain the tenant ID of your Azure subscription. Go back to **Azure Active Directory**, select **Properties** and copy _Directory ID_.\n",
"\n",
"![tenant id](images/svc-pr-3.PNG)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, you need to give the service principal permissions to access your workspace. Navigate to **Resource Groups**, to the resource group for your Machine Learning Workspace. \n",
"\n",
"Then select **Access Control (IAM)** and **Add a role assignment**. For _Role_, specify which level of access you need to grant, for example _Contributor_. Start entering your service principal name and once it is found, select it, and click **Save**.\n",
"\n",
"![add role](images/svc-pr-4.PNG)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now you are ready to use the service principal authentication. For example, to connect to your Workspace, see code below and enter your own values for tenant ID, application ID, subscription ID, resource group and workspace.\n",
"\n",
"**We strongly recommended that you do not insert the secret password to code**. Instead, you can use environment variables to pass it to your code, for example through Azure Key Vault, or through secret build variables in Azure DevOps. For local testing, you can for example use following PowerShell command to set the environment variable.\n",
"\n",
"```\n",
"$env:AZUREML_PASSWORD = \"my-password\"\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from azureml.core.authentication import ServicePrincipalAuthentication\n",
"\n",
"svc_pr_password = os.environ.get(\"AZUREML_PASSWORD\")\n",
"\n",
"svc_pr = ServicePrincipalAuthentication(\n",
" tenant_id=\"my-tenant-id\",\n",
" service_principal_id=\"my-application-id\",\n",
" service_principal_password=svc_pr_password)\n",
"\n",
"\n",
"ws = Workspace(\n",
" subscription_id=\"my-subscription-id\",\n",
" resource_group=\"my-ml-rg\",\n",
" workspace_name=\"my-ml-workspace\",\n",
" auth=svc_pr\n",
" )\n",
"\n",
"print(\"Found workspace {} at location {}\".format(ws.name, ws.location))"
]
}
],
"metadata": {
"authors": [
{
"name": "roastala"
}
],
"kernelspec": {
"display_name": "Python 3.6",
"language": "python",
"name": "python36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@@ -17,4 +17,5 @@ These examples show you:
Learn more about how to use `Estimator` class to [train deep neural networks with Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-ml-models).
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/README.png)

View File

@@ -95,10 +95,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code gets the default compute cluster.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
@@ -109,24 +107,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current AmlCompute. \n",
"print(compute_target.get_status().serialize())"
@@ -136,7 +117,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
"The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"."
]
},
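{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch (reusing the `ws` workspace object from above; the variable name is ours):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# retrieve the workspace's default CPU compute instead of the GPU one\n",
"cpu_compute_target = ws.get_default_compute_target(type=\"CPU\")"
]
},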
{

View File

@@ -98,10 +98,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use default `AmlCompute` as the training compute resource.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
@@ -112,24 +110,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current AmlCompute\n",
"print(compute_target.get_status().serialize())"

View File

@@ -96,10 +96,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code creates an `STANDARD_NC6` GPU cluster that autoscales from `0` to `4` nodes.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource. Specifically, the below code uses the default compute in the workspace.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
@@ -110,24 +108,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',\n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current AmlCompute. \n",
"print(compute_target.get_status().serialize())"
@@ -137,7 +118,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
"The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"."
]
},
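{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch (reusing the `ws` workspace object from above; the variable name is ours):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# retrieve the workspace's default CPU compute instead\n",
"cpu_compute_target = ws.get_default_compute_target(type=\"CPU\")"
]
},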
{

View File

@@ -13,7 +13,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training-with-deep-learning/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.png)"
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/training/manage-runs/manage-runs.png)"
]
},
{
@@ -98,10 +98,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
@@ -112,24 +110,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"compute_target = ws.get_default_compute_target(\"GPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())"
@@ -139,7 +120,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
"The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"."
]
},
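{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch (reusing the `ws` workspace object from above; the variable name is ours):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# retrieve the workspace's default CPU compute instead of the GPU one\n",
"cpu_compute_target = ws.get_default_compute_target(type=\"CPU\")"
]
},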
{

View File

@@ -98,10 +98,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
@@ -112,24 +110,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())"

View File

@@ -50,22 +50,6 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install the Azure ML TensorBoard integration package if you haven't already."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install azureml-tensorboard"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -1,14 +1,31 @@
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
import argparse

print("*********************************************************")
print("Hello Azure ML!")

parser = argparse.ArgumentParser()
parser.add_argument('--numbers-in-sequence', type=int, dest='num_in_sequence', default=10,
                    help='number of fibonacci numbers in sequence')
args = parser.parse_args()
num = args.num_in_sequence


def fibo(n):
    if n < 2:
        return n
    else:
        return fibo(n - 1) + fibo(n - 2)


try:
    from azureml.core import Run
    run = Run.get_context()
    print("Log Fibonacci numbers.")
    for i in range(0, num - 1):
        run.log('Fibonacci numbers', fibo(i))
    run.complete()
except ImportError:
    print("Warning: you need to install Azure ML SDK in order to log metrics.")

View File

@@ -113,18 +113,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource."
]
},
{
@@ -133,25 +123,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"cpucluster\"\n",
"\n",
"try:\n",
" cpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', max_nodes=4)\n",
"\n",
" # create the cluster\n",
" cpu_cluster = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" cpu_cluster.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"cpu_cluster = ws.get_default_compute_target(\"CPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(cpu_cluster.get_status().serialize())"
@@ -161,7 +133,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'cpucluster' of type `AmlCompute`."
"Now that you have retrieved the compute target, let's see what the workspace's `compute_targets` property returns."
]
},
{
@@ -184,7 +156,7 @@
},
"source": [
"## Use a simple script\n",
"We have already created a simple \"hello world\" script. This is the script that we will submit through the estimator pattern. It prints a hello-world message, and if Azure ML SDK is installed, it will also logs an array of values ([Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_number))."
"We have already created a simple \"hello world\" script. This is the script that we will submit through the estimator pattern. It prints a hello-world message, and if Azure ML SDK is installed, it will also logs an array of values ([Fibonacci numbers](https://en.wikipedia.org/wiki/Fibonacci_number)). The script takes as input the number of Fibonacci numbers in the sequence to log."
]
},
{
@@ -235,7 +207,10 @@
"outputs": [],
"source": [
"# use a conda environment, don't use Docker, on local computer\n",
"est = Estimator(source_directory='.', compute_target='local', entry_script='dummy_train.py', use_docker=False)\n",
"script_params = {\n",
" '--numbers-in-sequence': 10\n",
"}\n",
"est = Estimator(source_directory='.', script_params=script_params, compute_target='local', entry_script='dummy_train.py', use_docker=False)\n",
"run = exp.submit(est)\n",
"RunDetails(run).show()"
]
@@ -254,7 +229,10 @@
"outputs": [],
"source": [
"# use a conda environment on default Docker image in an AmlCompute cluster\n",
"est = Estimator(source_directory='.', compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)\n",
"script_params = {\n",
" '--numbers-in-sequence': 10\n",
"}\n",
"est = Estimator(source_directory='.', script_params=script_params, compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)\n",
"run = exp.submit(est)\n",
"RunDetails(run).show()"
]
@@ -273,7 +251,11 @@
"outputs": [],
"source": [
"# add a conda package\n",
"script_params = {\n",
" '--numbers-in-sequence': 10\n",
"}\n",
"est = Estimator(source_directory='.', \n",
" script_params=script_params, \n",
" compute_target='local', \n",
" entry_script='dummy_train.py', \n",
" use_docker=False, \n",
@@ -313,7 +295,12 @@
"user_managed_dependencies = True\n",
"\n",
"# submit to a local Docker container. if you don't have Docker engine running locally, you can set compute_target to cpu_cluster.\n",
"est = Estimator(source_directory='.', compute_target='local', \n",
"script_params = {\n",
" '--numbers-in-sequence': 10\n",
"}\n",
"est = Estimator(source_directory='.', \n",
" script_params=script_params, \n",
" compute_target='local', \n",
" entry_script='dummy_train.py',\n",
" custom_docker_image=image_name,\n",
" # uncomment below line to use your private ACR\n",
@@ -332,6 +319,130 @@
"Note: if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Intelligent hyperparameter tuning\n",
"\n",
"The simple \"hello world\" script above lets the user fix the value of a parameter for the number of Fibonacci numbers in the sequence to log. Similarly, when training models, you can fix values of parameters of the training algorithm itself. E.g. the learning rate, the number of layers, the number of nodes in each layer in a neural network, etc. These adjustable parameters that govern the training process are referred to as the hyperparameters of the model. The goal of hyperparameter tuning is to search across various hyperparameter configurations and find the configuration that results in the best performance.\n",
"\n",
"\n",
"To demonstrate how Azure Machine Learning can help you automate the process of hyperarameter tuning, we will launch multiple runs with different values for numbers in the sequence. First let's define the parameter space using random sampling."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n",
"from azureml.train.hyperdrive import choice\n",
"\n",
"ps = RandomParameterSampling(\n",
" {\n",
" '--numbers-in-sequence': choice(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will create a new estimator without the above numbers-in-sequence parameter since that will be passed in later. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"est = Estimator(source_directory='.', script_params={}, compute_target=cpu_cluster, entry_script='dummy_train.py', use_docker=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will look at training metrics and early termination policies. When training a model, users are interested in logging and optimizing certain metrics of the model e.g. maximize the accuracy of the model, or minimize loss. This metric is logged by the training script for each run. In our simple script above, we are logging Fibonacci numbers in a sequence. But a training script could just as easily log other metrics like accuracy or loss, which can be used to evaluate the performance of a given training run.\n",
"\n",
"The intelligent hyperparameter tuning capability in Azure Machine Learning automatically terminates poorly performing runs using an early termination policy. Early termination reduces wastage of compute resources and instead uses these resources for exploring other hyperparameter configurations. In this example, we use the BanditPolicy. This basically states to check the job every 2 iterations. If the primary metric (defined later) falls outside of the top 10% range, Azure ML will terminate the training run. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we are ready to configure a run configuration object for hyperparameter tuning. We need to call out the primary metric that we want the experiment to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script and we specify that we are looking to maximize this value. Next, we control the resource budget for the experiment by setting the maximum total number of training runs to 10. We also set the maximum number of training runs to run concurrently at 4, which is the same as the number of nodes in our computer cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hdc = HyperDriveConfig(estimator=est, \n",
" hyperparameter_sampling=ps, \n",
" policy=policy, \n",
" primary_metric_name='Fibonacci numbers', \n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, \n",
" max_total_runs=10,\n",
" max_concurrent_runs=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, let's launch the hyperparameter tuning job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"hdr = exp.submit(config=hdc)\n",
"RunDetails(hdr).show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When all the runs complete, we can find the run with the best performance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_run = hdr.get_best_run_by_primary_metric()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can register the model from the best run and use it to deploy a web service that can be used for Inferencing. Details on how how you can do this can be found in the sample folders for the ohter types of estimators.\n"
]
},
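{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that registration (the model name and the file path below are assumptions for illustration; use the path your training script actually wrote under `outputs/`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# register the model file produced by the best run\n",
"# the model name and the path under outputs/ are illustrative assumptions\n",
"model = best_run.register_model(model_name='best_fibonacci_model', model_path='outputs/model.pkl')\n",
"print(model.name, model.version)"
]
},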
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -51,22 +51,6 @@
"print(\"SDK version:\", azureml.core.VERSION)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install the Azure ML TensorBoard package."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install azureml-tensorboard"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -95,10 +95,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n",
"\n",
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace, this code will skip the creation process.\n",
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, we use Azure ML managed compute ([AmlCompute](https://docs.microsoft.com/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute)) for our remote training compute resource.\n",
"\n",
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
]
@@ -109,24 +107,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target.')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" compute_target.wait_for_completion(show_output=True)\n",
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())"
@@ -136,7 +117,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
"The above code retrieves the default GPU compute. If you instead want to use default CPU compute, provide type=\"CPU\"."
]
},
{
@@ -337,7 +318,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.hyperdrive.runconfig import HyperDriveRunConfig\n",
"from azureml.train.hyperdrive.runconfig import HyperDriveConfig\n",
"from azureml.train.hyperdrive.sampling import RandomParameterSampling\n",
"from azureml.train.hyperdrive.policy import BanditPolicy\n",
"from azureml.train.hyperdrive.run import PrimaryMetricGoal\n",
@@ -350,7 +331,7 @@
" }\n",
")\n",
"\n",
"hyperdrive_run_config = HyperDriveRunConfig(estimator=estimator,\n",
"hyperdrive_config = HyperDriveConfig(estimator=estimator,\n",
" hyperparameter_sampling=param_sampling, \n",
" primary_metric_name='Accuracy',\n",
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
@@ -372,7 +353,7 @@
"outputs": [],
"source": [
"# start the HyperDrive run\n",
"hyperdrive_run = experiment.submit(hyperdrive_run_config)"
"hyperdrive_run = experiment.submit(hyperdrive_config)"
]
},
{

View File

@@ -239,18 +239,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource."
]
},
{
@@ -259,26 +249,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())"
@@ -288,7 +259,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named \"gpucluster\" of type `AmlCompute`."
"Now that you have retrtieved the compute target, let's see what the workspace's `compute_targets` property returns."
]
},
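{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of listing them, assuming the `ws` workspace object from above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# list the compute targets attached to the workspace\n",
"for name, target in ws.compute_targets.items():\n",
"    print(name, target.type)"
]
},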
{
@@ -671,7 +642,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveRunConfig, PrimaryMetricGoal\n",
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n",
"from azureml.train.hyperdrive import choice, loguniform\n",
"\n",
"ps = RandomParameterSampling(\n",
@@ -734,7 +705,7 @@
"metadata": {},
"outputs": [],
"source": [
"hdc = HyperDriveRunConfig(estimator=est, \n",
"hdc = HyperDriveConfig(estimator=est, \n",
" hyperparameter_sampling=ps, \n",
" policy=policy, \n",
" primary_metric_name='Accuracy', \n",

View File

@@ -261,18 +261,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Attach existing AmlCompute\n",
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_NC6` GPU VMs. This process is broken down into 3 steps:\n",
"1. create the configuration (this step is local and only takes a second)\n",
"2. create the cluster (this step will take about **20 seconds**)\n",
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
"## Get default AmlCompute\n",
"You can create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you use default `AmlCompute` as your training compute resource."
]
},
{
@@ -281,26 +271,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
"from azureml.core.compute_target import ComputeTargetException\n",
"\n",
"# choose a name for your cluster\n",
"cluster_name = \"gpucluster\"\n",
"\n",
"try:\n",
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
" print('Found existing compute target')\n",
"except ComputeTargetException:\n",
" print('Creating a new compute target...')\n",
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
" max_nodes=4)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
"\n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it uses the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
"compute_target = ws.get_default_compute_target(type=\"GPU\")\n",
"\n",
"# use get_status() to get a detailed status for the current cluster. \n",
"print(compute_target.get_status().serialize())"
@@ -310,7 +281,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you have created the compute target, let's see what the workspace's `compute_targets` property returns. You should now see one entry named 'gpucluster' of type `AmlCompute`."
"Now that you have retrieved the compute target, let's see what the workspace's `compute_targets` property returns."
]
},
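{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch (assuming the `ws` workspace object from above), you can enumerate the attached compute targets like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# enumerate the compute targets attached to the workspace\n",
"for name, target in ws.compute_targets.items():\n",
"    print(name, target.type)"
]
},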
{
@@ -691,7 +662,7 @@
"metadata": {},
"outputs": [],
"source": [
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveRunConfig, PrimaryMetricGoal\n",
"from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, HyperDriveConfig, PrimaryMetricGoal\n",
"from azureml.train.hyperdrive import choice, loguniform\n",
"\n",
"ps = RandomParameterSampling(\n",
@@ -753,7 +724,7 @@
"metadata": {},
"outputs": [],
"source": [
"htc = HyperDriveRunConfig(estimator=est, \n",
"htc = HyperDriveConfig(estimator=est, \n",
" hyperparameter_sampling=ps, \n",
" policy=policy, \n",
" primary_metric_name='validation_acc', \n",

View File

@@ -271,9 +271,11 @@
"metadata": {},
"source": [
"### Logging vectors\n",
"Vectors are good for recording information such as loss curves. You can log a vector by create a list of numbers and call ``log_list()`` and supply a name and the list, or by repeatedly logging a value using the same name.\n",
"Vectors are good for recording information such as loss curves. You can log a vector by creating a list of numbers, calling ``log_list()`` and supplying a name and the list, or by repeatedly logging a value using the same name.\n",
"\n",
"Vectors are presented in Run Details as a chart, and are directly comparable in experiment reports when placed in a chart. **Note:** vectors logged into the run are expected to be relatively small. Logging very large vectors into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to file that will be uploaded."
"Vectors are presented in Run Details as a chart, and are directly comparable in experiment reports when placed in a chart. \n",
"\n",
"**Note:** vectors logged into the run are expected to be relatively small. Logging very large vectors into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to file that will be uploaded."
]
},
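{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch (assuming an active `run` object) of logging a list of values in one call:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# log a small vector under a single metric name\n",
"run.log_list(name='Fibonacci', value=[0, 1, 1, 2, 3, 5, 8, 13])"
]
},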
{
@@ -304,7 +306,9 @@
"* Create a dictionary of lists where each list represents a column in the table and call ``log_table()``\n",
"* Repeatedly call ``log_row()`` providing the same table name with a consistent set of named args as the column values\n",
"\n",
"Tables are presented in Run Details as a chart using the first two columns of the table **Note:** tables logged into the run are expected to be relatively small. Logging very large tables into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to file that will be uploaded."
"Tables are presented in Run Details as a chart using the first two columns of the table \n",
"\n",
"**Note:** tables logged into the run are expected to be relatively small. Logging very large tables into Azure ML can result in reduced performance. If you need to store large amounts of data associated with the run, you can write the data to file that will be uploaded."
]
},
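{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a minimal sketch (assuming an active `run` object) of logging a small table from a dictionary of lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# log a table where each key is a column name and each list is a column of values\n",
"run.log_table(name='Sine wave', value={'angle': [0.0, 1.57, 3.14], 'sine': [0.0, 1.0, 0.0]})"
]
},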
{
@@ -365,7 +369,7 @@
"source": [
"### Uploading files\n",
"\n",
"Any files that are placed in the ``.\\outputs`` directory are automatically uploaded when the run is completed. These files are also visible in the *Outputs* tab of the Run Details page. Files can also be uploaded explicitly and stored as artifacts along with the run record.\n"
"Files can also be uploaded explicitly and stored as artifacts along with the run record. These files are also visible in the *Outputs* tab of the Run Details page.\n"
]
},
{
@@ -374,9 +378,13 @@
"metadata": {},
"outputs": [],
"source": [
"%%writefile .\\outputs\\myfile.txt\n",
"file_name = 'outputs/myfile.txt'\n",
"\n",
"This is an output file that will be automatically uploaded."
"with open(file_name, \"w\") as f:\n",
" f.write('This is an output file that will be uploaded.\\n')\n",
"\n",
"# Upload the file explicitly into artifacts \n",
"run.upload_file(name = file_name, path_or_stream = file_name)"
]
},
{
@@ -504,7 +512,7 @@
"## Next steps\n",
"To experiment more with logging and to understand how metrics can be visualized, go back to the *Start a run* section, try changing the category and scale_factor values and going through the notebook several times. Play with the KPI, charting, and column selection options on the experiment's Run History reports page to see how the various metrics can be combined and visualized.\n",
"\n",
"After learning about all of the logging options, go to the [train on remote vm](..\\train_on_remote_vm\\train_on_remote_vm.ipnyb) notebook and experiment with logging from remote compute contexts."
"After learning about all of the logging options, go to the [train on remote vm](..\\train-on-remote-vm\\train-on-remote-vm.ipynb) notebook and experiment with logging from remote compute contexts."
]
}
],

View File

@@ -52,7 +52,7 @@
"source": [
"## Setup\n",
"\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't. Also, if you're new to Azure ML, we recommend that you go through [the tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml) first to learn the basic concepts.\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. Also, if you're new to Azure ML, we recommend that you go through [the tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml) first to learn the basic concepts.\n",
"\n",
"Let's first import required packages, check Azure ML SDK version, connect to your workspace and create an Experiment to hold the runs."
]

View File

@@ -32,7 +32,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't."
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace."
]
},
{
@@ -117,7 +117,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note** You can use Docker-based execution to run the Spark job in local computer or a remote VM. Please see the `train-in-remote-vm` notebook for example on how to configure and run in Docker mode in a VM. Make sure you choose a Docker image that has Spark installed, such as `azureml.core.runconfig.DEFAULT_MMLSPARK_CPU_IMAGE`."
"**Note** You can use Docker-based execution to run the Spark job in local computer or a remote VM. Please see the `train-in-remote-vm` notebook for example on how to configure and run in Docker mode in a VM. Make sure you choose a Docker image that has Spark installed, such as `microsoft/mmlspark:0.12`."
]
},
{

View File

@@ -38,7 +38,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't."
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace."
]
},
{
@@ -168,9 +168,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Provision as a run based compute target\n",
"### Get the default compute target\n",
"\n",
"You can provision AmlCompute as a compute target at run-time. In this case, the compute is auto-created for your run, scales up to max_nodes that you specify, and then **deleted automatically** after the run completes."
"In this case, we use the default `AmlCompute`target from the workspace."
]
},
{
@@ -186,12 +186,10 @@
"# create a new runconfig object\n",
"run_config = RunConfiguration()\n",
"\n",
"# signal that you want to use AmlCompute to execute script.\n",
"run_config.target = \"amlcompute\"\n",
"default_compute_target = ws.get_default_compute_target(type=\"CPU\")\n",
"\n",
"# AmlCompute will be created in the same region as workspace\n",
"# Set vm size for AmlCompute\n",
"run_config.amlcompute.vm_size = 'STANDARD_D2_V2'\n",
"# signal that you want to use AmlCompute to execute script.\n",
"run_config.target = default_compute_target.name\n",
"\n",
"# enable Docker \n",
"run_config.environment.docker.enabled = True\n",
@@ -202,9 +200,6 @@
"# use conda_dependencies.yml to create a conda environment in the Docker image for execution\n",
"run_config.environment.python.user_managed_dependencies = False\n",
"\n",
"# auto-prepare the Docker image when used for execution (if it is not already prepared)\n",
"run_config.auto_prepare_environment = True\n",
"\n",
"# specify CondaDependencies obj\n",
"run_config.environment.python.conda_dependencies = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
"\n",

View File

@@ -21,22 +21,55 @@
"metadata": {},
"source": [
"# 02. Train locally\n",
"* Create or load workspace.\n",
"* Create scripts locally.\n",
"* Create `train.py` in a folder, along with a `my.lib` file.\n",
"* Configure & execute a local run in a user-managed Python environment.\n",
"* Configure & execute a local run in a system-managed Python environment.\n",
"* Configure & execute a local run in a Docker environment.\n",
"* Query run metrics to find the best model\n",
"* Register model for operationalization."
"_**Train a model locally: Directly on your machine and within a Docker container**_\n",
"\n",
"---\n",
"\n",
"\n",
"## Table of contents\n",
"1. [Introduction](#intro)\n",
"1. [Pre-requisites](#pre-reqs)\n",
"1. [Initialize Workspace](#init)\n",
"1. [Create An Experiment](#exp)\n",
"1. [View training and auxiliary scripts](#view)\n",
"1. [Configure & Run](#config-run)\n",
" 1. User-managed environment\n",
" 1. Set the environment up\n",
" 1. Submit the script to run in the user-managed environment\n",
" 1. Get run history details\n",
" 1. System-managed environment\n",
" 1. Set the environment up\n",
" 1. Submit the script to run in the system-managed environment\n",
" 1. Get run history details\n",
" 1. Docker-based execution\n",
" 1. Set the environment up\n",
" 1. Submit the script to run in the system-managed environment\n",
" 1. Get run history details\n",
" 1. Use a custom Docker image\n",
"1. [Query run metrics](#query)\n",
"\n",
"---\n",
"\n",
"## 1. Introduction <a id='intro'></a>\n",
"\n",
"In this notebook, we will learn how to:\n",
"\n",
"* Connect to our AML workspace\n",
"* Create or load a workspace\n",
"* Configure & execute a local run in:\n",
" - a user-managed Python environment\n",
" - a system-managed Python environment\n",
" - a Docker environment\n",
"* Query run metrics to find the best model trained in the run\n",
"* Register that model for operationalization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't."
"## 2. Pre-requisites <a id='pre-reqs'></a>\n",
"In this notebook, we assume that you have set your Azure Machine Learning workspace. If you have not, make sure you go through the [configuration notebook](../../../configuration.ipynb) first. In the end, you should have configuration file that contains the subscription ID, resource group and name of your workspace."
]
},
{
@@ -55,9 +88,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Initialize Workspace\n",
"## 3. Initialize Workspace <a id='init'></a>\n",
"\n",
"Initialize a workspace object from persisted configuration."
"Initialize your workspace object from configuration file"
]
},
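{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that initialization, assuming a `config.json` written by the configuration notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Workspace\n",
"\n",
"# load workspace details from the config.json created by the configuration notebook\n",
"ws = Workspace.from_config()\n",
"print(ws.name, ws.resource_group, ws.location)"
]
},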
{
@@ -76,8 +109,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create An Experiment\n",
"**Experiment** is a logical container in an Azure ML Workspace. It hosts run records which can include run metrics and output artifacts from your experiments."
"## 4. Create An Experiment <a id='exp'></a>\n",
"An experiment is a logical container in an Azure ML Workspace. It contains a series of trials called `Runs`. As such, it hosts run records such as run metrics, logs, and other output artifacts from your experiments."
]
},
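{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of creating one (the experiment name below is an assumption for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import Experiment\n",
"\n",
"# the name below is illustrative; pick any name that identifies this set of runs\n",
"exp = Experiment(workspace=ws, name='train-on-local')\n",
"print(exp.name)"
]
},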
{
@@ -95,9 +128,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## View `train.py`\n",
"## 5. View training and auxiliary scripts <a id='view'></a>\n",
"\n",
"`train.py` is already created for you."
"For convenience, we already created the training (`train.py`) script and supportive libraries (`mylib.py`) for you. Take a few minutes to examine both files."
]
},
{
@@ -110,13 +143,6 @@
" print(f.read())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note `train.py` also references a `mylib.py` file."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -131,9 +157,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configure & Run\n",
"### User-managed environment\n",
"Below, we use a user-managed run, which means you are responsible to ensure all the necessary packages are available in the Python environment you choose to run the script."
"## 6. Configure & Run <a id='config-run'></a>\n",
"### 6.A User-managed environment\n",
"\n",
"#### 6.A.a Set the environment up\n",
"When using a user-managed environment, you are responsible for ensuring that all the necessary packages are available in the Python environment you choose to run the script in."
]
},
{
@@ -157,8 +185,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit script to run in the user-managed environment\n",
"Note whole script folder is submitted for execution, including the `mylib.py` file."
"#### 6.A.b Submit the script to run in the user-managed environment\n",
"Whatever the way you manage your environment, you need to use the `ScriptRunConfig` class. It allows you to further configure your run by pointing to the `train.py` script and to the working directory, which also contains the `mylib.py` file. These inputs indeed provide the commands to execute in the run. Once the run is configured, you submit it to your experiment."
]
},
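{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that submission, assuming the `exp` experiment object and the `run_config_user_managed` configuration defined earlier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core import ScriptRunConfig\n",
"\n",
"# point the run at the training script and the folder that also contains mylib.py\n",
"src = ScriptRunConfig(source_directory='./', script='train.py', run_config=run_config_user_managed)\n",
"run = exp.submit(src)"
]
},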
{
@@ -177,7 +205,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get run history details"
"#### 6.A.c Get run history details\n",
"\n",
"While all calculations were run on your machine (cf. below), by using a `run` you also captured the results of your calculations into your run and experiment. You can then see them on the Azure portal, through the link displayed as output of the following cell.\n",
"\n",
"**Note**: The recording of the computation results into your run was made possible by the `run.log()` commands in the `train.py` file."
]
},
{
@@ -200,7 +232,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Block to wait till run finishes."
"Block any execution to wait until the run finishes."
]
},
{
@@ -216,8 +248,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### System-managed environment\n",
"You can also ask the system to build a new conda environment and execute your scripts in it. The environment is built once and will be reused in subsequent executions as long as the conda dependencies remain unchanged. "
"**Note:** All these calculations were run on your local machine, in the conda environment you defined above. You can find the results in:\n",
"- `~/.azureml/envs/azureml_xxxx` for the conda environment you just created\n",
"- `~/AppData/Local/Temp/azureml_runs/train-on-local_xxxx` for the machine learning models you trained (this path may differ depending on the platform you use). This folder also contains\n",
" - Logs (under azureml_logs/)\n",
" - Output pickled files (under outputs/)\n",
" - The configuration files (credentials, local and docker image setups)\n",
" - The train.py and mylib.py scripts\n",
" - The current notebook\n",
"\n",
"Take a few minutes to examine the output of the cell above. It shows the content of some of the log files, and extra information on the conda environment used."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.B System-managed environment\n",
"#### 6.B.a Set the environment up\n",
"Now, instead of managing the setup of the environment yourself, you can ask the system to build a new conda environment for you. The environment is built once, and will be reused in subsequent executions as long as the conda dependencies remain unchanged."
]
},
{
@@ -231,7 +280,6 @@
"run_config_system_managed = RunConfiguration()\n",
"\n",
"run_config_system_managed.environment.python.user_managed_dependencies = False\n",
"run_config_system_managed.auto_prepare_environment = True\n",
"\n",
"# Specify conda dependencies with scikit-learn\n",
"cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
@@ -242,8 +290,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit script to run in the system-managed environment\n",
"A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 mninutes. But this conda environment is reused so long as you don't change the conda dependencies."
"#### 6.B.b Submit the script to run in the system-managed environment\n",
"A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 minutes.\n",
"\n",
"The commands used to execute the run are then the same as the ones you used above."
]
},
{
@@ -260,7 +310,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Get run history details"
"#### 6.B.c Get run history details"
]
},
{
@@ -272,13 +322,6 @@
"run"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Block and wait till run finishes."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -292,12 +335,34 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Docker-based execution\n",
"**IMPORTANT**: You must have Docker engine installed locally in order to use this execution mode. If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.\n",
"### 6.C Docker-based execution\n",
"In this section, you will train the same models, but you will do so in a Docker container, on your local machine. For this, you then need to have the Docker engine installed locally. If you don't have it yet, please follow the instructions below.\n",
"\n",
"NOTE: The GPU base image must be used on Microsoft Azure Services only such as ACI, AML Compute, Azure VMs, and AKS.\n",
"#### How to install Docker\n",
"\n",
"You can also ask the system to pull down a Docker image and execute your scripts in it."
"- [Linux](https://docs.docker.com/install/linux/docker-ce/ubuntu/)\n",
"- [MacOs](https://docs.docker.com/docker-for-mac/install/)\n",
"- [Windows](https://docs.docker.com/docker-for-windows/install/)\n",
"\n",
" In case of issues, troubleshooting documentation can be found [here](https://docs.docker.com/docker-for-windows/troubleshoot/#running-docker-for-windows-in-nested-virtualization-scenarios). Additionally, you can follow the steps below, if Virtualization is not enabled on your machine:\n",
" - Go to Task Manager > Performance\n",
" - Check that Virtualization is enabled\n",
" - If it is not, go to `Start > Settings > Update and security > Recovery > Advanced Startup - Restart now > Troubleshoot > Advanced options > UEFI firmware settings - restart`\n",
" - In the BIOS, go to `Advanced > System options > Click the \"Virtualization Technology (VTx)\" only > Save > Exit > Save all changes` -- This will restart the machine\n",
"\n",
"**Notes**: \n",
"- If your kernel is already running in a Docker container, such as **Azure Notebooks**, this mode will **NOT** work.\n",
"- If you use a GPU base image, it needs to be used on Microsoft Azure Services such as ACI, AML Compute, Azure VMs, or AKS.\n",
"\n",
"You can also ask the system to pull down a Docker image and execute your scripts in it.\n",
"\n",
"#### 6.C.a Set the environment up\n",
"\n",
"In the cell below, you will configure your run to execute in a Docker container. It will:\n",
"- run on a CPU\n",
"- contain a conda environment in which the scikit-learn library will be installed.\n",
"\n",
"As before, you will finish configuring your run by pointing to the `train.py` and `mylib.py` files."
]
},
{
@@ -308,11 +373,10 @@
"source": [
"run_config_docker = RunConfiguration()\n",
"run_config_docker.environment.python.user_managed_dependencies = False\n",
"run_config_docker.auto_prepare_environment = True\n",
"run_config_docker.environment.docker.enabled = True\n",
"\n",
"# use the default CPU-based Docker image from Azure ML\n",
"run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n",
"run_config_docker.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE # Reference Docker image\n",
"\n",
"# Specify conda dependencies with scikit-learn\n",
"cd = CondaDependencies.create(conda_packages=['scikit-learn'])\n",
@@ -325,8 +389,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit script to run in the system-managed environment\n",
"A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 minutes. But this conda environment is reused so long as you don't change the conda dependencies."
"#### 6.C.b Submit the script to run in the system-managed environment\n",
"\n",
"The run is now configured and ready to be executed in a Docker container. If you are running this for the first time, the Docker container will get created, as well as the conda environment inside it. This will take several minutes. Once all this is generated, however, this conda environment will be reused as long as you don't change the conda dependencies."
]
},
{
@@ -345,7 +410,34 @@
" else:\n",
" run = exp.submit(src)\n",
"else:\n",
" print(\"Docker engine not installed.\")"
" print(\"Docker engine is not installed.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Potential issue on Windows and how to solve it\n",
"\n",
"If you are using a Windows machine, the creation of the Docker image may fail, and you may see the following error message\n",
"`docker: Error response from daemon: Drive has not been shared. Failed to launch docker container. Check that docker is running and that C:\\ on Windows and /tmp elsewhere is shared.`\n",
"\n",
"This is because the process above tries to create a linux-based, i.e. non-windows-based, Docker image. To fix this, you can:\n",
"- Open the Docker user interface\n",
"- Navigate to Settings > Shared drives\n",
"- Select C (or both C and D, if you have one)\n",
"- Apply\n",
"\n",
"When this is done, you can try and re-run the command above.\n",
"\n",
"<img src=\"./docker_settings.png\" width=\"500\" align=\"left\">"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 6.C.c Get run history details"
]
},
{
@@ -354,7 +446,7 @@
"metadata": {},
"outputs": [],
"source": [
"#Get run history details\n",
"# Get run history details\n",
"run"
]
},
@@ -371,22 +463,51 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Use a custom Docker image\n",
"The results obtained here should be the same as those obtained before. However, take a look at the \"Execution summary\" section in the output of the cell above. Look for \"docker\". There, you should see the \"enabled\" field set to True. Compare this to the 2 prior runs (\"enabled\" was then set to False)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 6.C.d Use a custom Docker image\n",
"\n",
"You can also specify a custom Docker image if you don't want to use the default image provided by Azure ML.\n",
"You can also specify a custom Docker image, if you don't want to use the default image provided by Azure ML.\n",
"\n",
"You can either pull an image directly from Anaconda:\n",
"```python\n",
"# use an image available in Docker Hub without authentication\n",
"# Use an image available in Docker Hub without authentication\n",
"run_config_docker.environment.docker.base_image = \"continuumio/miniconda3\"\n",
"```\n",
"\n",
"# or, use an image available in a private Azure Container Registry\n",
"Or one of the images you may already have created:\n",
"```python\n",
"# or, use an image available in your private Azure Container Registry\n",
"run_config_docker.environment.docker.base_image = \"mycustomimage:1.0\"\n",
"run_config_docker.environment.docker.base_image_registry.address = \"myregistry.azurecr.io\"\n",
"run_config_docker.environment.docker.base_image_registry.username = \"username\"\n",
"run_config_docker.environment.docker.base_image_registry.password = \"password\"\n",
"```\n",
"\n",
"When you are using a custom Docker image, you might already have your environment setup properly in a Python environment in the Docker image. In that case, you can skip specifying conda dependencies, and just use `user_managed_dependencies` option instead:\n",
"##### Where to find my Docker image name and registry credentials\n",
" If you do not know what the name of your Docker image or container registry is, or if you don't know how to access the username and password needed above, proceed as follows:\n",
" - Docker image name:\n",
" - In the portal, under your resource group, click on your current workspace\n",
" - Click on Experiments\n",
" - Click on Images\n",
" - Click on the image of your choice\n",
" - Copy the \"ID\" string\n",
" - In this notebook, replace \"mycustomimage:1/0\" with that ID string\n",
" - Username and password:\n",
" - In the portal, under your resource group, click on the container registry associated with your workspace\n",
" - If you have several and don't know which one you need, click on your workspace, go to Overview and click on the \"Registry\" name on the upper right of the screen\n",
" - There, go to \"Access keys\"\n",
" - Copy the username and one of the passwords\n",
" - In this notebook, replace \"username\" and \"password\" by these values\n",
"\n",
"In any case, you will need to use the lines above in place of the line marked as `# Reference Docker image` in section 6.C.a. \n",
"\n",
"When you are using your custom Docker image, you might already have your Python environment properly set up. In that case, you can skip specifying conda dependencies, and just use the `user_managed_dependencies` option instead:\n",
"```python\n",
"run_config_docker.environment.python.user_managed_dependencies = True\n",
"# path to the Python environment in the custom Docker image\n",
@@ -398,7 +519,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Query run metrics"
"## 7. Query run metrics <a id='query'></a>\n",
"\n",
"Once your run has completed, you can now extract the metrics you captured by using the `get_metrics` method. As shown in the `train.py` file, these metrics are \"alpha\" and \"mse\"."
]
},
{
@@ -412,7 +535,7 @@
},
"outputs": [],
"source": [
"# get all metris logged in the run\n",
"# Get all metris logged in the run\n",
"run.get_metrics()\n",
"metrics = run.get_metrics()"
]
@@ -483,7 +606,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We know the model `ridge_0.40.pkl` is the best performing model from the earlier queries. So let's register it with the workspace."
"From the results obtained above, `ridge_0.40.pkl` is the best performing model. You can now register that particular model with the workspace. Once you have done so, go back to the portal and click on \"Models\". You should see it there."
]
},
{
@@ -492,7 +615,7 @@
"metadata": {},
"outputs": [],
"source": [
"# supply a model name, and the full path to the serialized model file.\n",
"# Supply a model name, and the full path to the serialized model file.\n",
"model = run.register_model(model_name='best_ridge_model', model_path='./outputs/ridge_0.40.pkl')"
]
},
@@ -502,14 +625,14 @@
"metadata": {},
"outputs": [],
"source": [
"print(model.name, model.version, model.url)"
"print(\"Registered model:\\n --> Name: {}\\n --> Version: {}\\n --> URL: {}\".format(model.name, model.version, model.url))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now you can deploy this model following the example in the 01 notebook."
"You can now deploy your model by following [this example](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb)."
]
}
],

View File

@@ -37,7 +37,7 @@
"metadata": {},
"source": [
"## Prerequisites\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration notebook](../../../configuration.ipynb) first if you haven't."
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
]
},
{

View File

@@ -43,7 +43,7 @@
" 1. Deploy your webservice\n",
" 1. Test your webservice\n",
" 1. Clean up\n",
"1. [Next Steps](#Next%20Steps)\n",
"1. [Next Steps](#nextsteps)\n",
"\n",
"---\n",
"\n",
@@ -64,7 +64,7 @@
"---\n",
"\n",
"## Setup\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have completed the [Configuration](../../../configuration.ipnyb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met. From the configuration, the important sections are the workspace configuration and ACI regristration.\n",
"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't already to establish your connection to the AzureML Workspace. From the configuration, the important sections are the workspace configuration and ACI regristration.\n",
"\n",
"We will also need the following libraries install to our conda environment. If these are not installed, use the following command to do so and restart the notebook.\n",
"```shell\n",
@@ -177,7 +177,12 @@
"run.log('mse', mean_squared_error(data['test']['y'], preds))\n",
"\n",
"# Save the model to the outputs directory for capture\n",
"joblib.dump(value=regression_model, filename='outputs/model.pkl')\n",
"model_file_name = 'outputs/model.pkl'\n",
"\n",
"joblib.dump(value = regression_model, filename = model_file_name)\n",
"\n",
"# upload the model file explicitly into artifacts \n",
"run.upload_file(name = model_file_name, path_or_stream = model_file_name)\n",
"\n",
"# Complete the run\n",
"run.complete()"
@@ -221,8 +226,6 @@
"import numpy as np\n",
"from tqdm import tqdm\n",
"\n",
"model_name = \"model.pkl\"\n",
"\n",
"# list of numbers from 0 to 1.0 with a 0.05 interval\n",
"alphas = np.arange(0.0, 1.0, 0.05)\n",
"\n",
@@ -421,7 +424,7 @@
"### Describe your target compute\n",
"In addition to the container, we also need to describe the type of compute we want to allocate for our webservice. In in this example we are using an [Azure Container Instance](https://azure.microsoft.com/en-us/services/container-instances/) which is a good choice for quick and cost-effective dev/test deployment scenarios. ACI instances require the number of cores you want to run and memory you need. Tags and descriptions are available for you to identify the instances in AML when viewing the Compute tab in the AML Portal.\n",
"\n",
"For production workloads, it is better to use [Azure Kubernentes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. Try [this notebook](11.production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n"
"For production workloads, it is better to use [Azure Kubernentes Service (AKS)](https://azure.microsoft.com/en-us/services/kubernetes-service/) instead. Try [this notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) to see how that can be done from Azure ML.\n"
]
},
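{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of such a configuration (the core count, memory size, tags, and description below are assumptions for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azureml.core.webservice import AciWebservice\n",
"\n",
"# 1 core and 1 GB of memory are plenty for a small dev/test scoring service\n",
"aci_config = AciWebservice.deploy_configuration(cpu_cores=1,\n",
"                                                memory_gb=1,\n",
"                                                tags={'sample': 'ridge regression'},\n",
"                                                description='Dev/test deployment of a ridge regression model')"
]
},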
{
@@ -517,6 +520,9 @@
"outputs": [],
"source": [
"import json\n",
"\n",
"service = ws.webservices['my-aci-svc']\n",
"\n",
"# scrape the first row from the test set.\n",
"test_samples = json.dumps({\"data\": X_test[0:1, :].tolist()})\n",
"\n",
@@ -650,7 +656,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"<a id='nextsteps'></a>\n",
"## Next Steps"
]
},
@@ -668,13 +674,6 @@
"If you want to deploy models to a production cluster try the [production-deploy-to-aks](../../deployment/production-deploy-to-aks\n",
") notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -698,7 +697,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
"version": "3.6.5"
}
},
"nbformat": 4,

View File

@@ -15,7 +15,7 @@ You will find in this repo:
- [How-To Guide Notebooks](how-to-guides) for more in-depth sample code at feature level.
## Installation
Here are the [SDK installation steps](https://docs.microsoft.com/python/api/overview/azure/dataprep/intro?view=azure-dataprep-py#install).
Here are the [SDK installation steps](https://aka.ms/aml-data-prep-installation).
## Documentation
Here is more information on how to use the new Data Prep SDK:
@@ -31,17 +31,40 @@ If you have any questions or feedback, send us an email at: [askamldataprep@micr
## Release Notes
### 2019-04-08 (version 1.1.1)
### 2019-04-17 (version 1.1.2)
Note: Data Prep Python SDK will no longer install `numpy` and `pandas` packages. See [updated installation instructions](https://aka.ms/aml-data-prep-installation).
New features
- You can read multiple Datastore/DataPath/DataReference sources using read_* transforms.
- You can now use the Pivot transform.
- How-to guide: [Pivot notebook](https://aka.ms/aml-data-prep-pivot-nb)
- You can now use regular expressions in native functions.
- Examples:
- `dflow.filter(dprep.RegEx('pattern').is_match(dflow['column_name']))`
- `dflow.assert_value('column_name', dprep.RegEx('pattern').is_match(dprep.value))`
- You can now use the `to_upper` and `to_lower` functions in the expression language.
- You can now see the number of unique values of each column in a data profile.
- For some of the commonly used reader steps, you can now pass in the `infer_column_types` argument. If it is set to `True`, Data Prep will attempt to detect and automatically convert column types.
- `inference_arguments` is now deprecated.
- You can now call `Dataflow.shape`.
Bug fixes and improvements
- `keep_columns` now accepts an additional optional argument `validate_column_exists`, which checks if the result of `keep_columns` will contain any columns.
- All reader steps (which read from a file) now accept an additional optional argument `verify_exists`.
- Improved performance of reading from pandas dataframe and getting data profiles.
- Fixed a bug where slicing a single step from a Dataflow failed with a single index.
### 2019-04-08 (version 1.1.1)
New features
- You can read multiple Datastore/DataPath/DataReference sources using read_* transforms.
- You can perform the following operations on columns to create a new column: division, floor, modulo, power, length.
- Data Prep is now part of the Azure ML diagnostics suite and will log diagnostic information by default.
- To turn this off, set this environment variable to true: DISABLE_DPREP_LOGGER
Bug fixes and improvements
- Improved code documentation for commonly used classes and functions.
- Fixed a bug in auto_read_file that failed to read Excel files.
- Added option to overwrite the folder in read_pandas_dataframe.
- Improved performance of dotnetcore2 dependency installation, and added support for Fedora 27/28 and Ubuntu 1804.
- Improved the performance of reading from Azure Blobs.

View File

@@ -99,6 +99,46 @@
"dflow_length.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `to_upper()`\n",
"Using the to_upper() expression, add a new numeric column \"Upper Case\", which contains the length of the string in \"Primary Type\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow_to_upper = dflow.add_column(new_column_name='Upper Case',\n",
" prior_column='Primary Type',\n",
" expression=dflow['Primary Type'].to_upper())\n",
"dflow_to_upper.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `to_lower()`\n",
"Using the to_lower() expression, add a new numeric column \"Lower Case\", which contains the length of the string in \"Primary Type\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dflow_to_lower = dflow.add_column(new_column_name='Lower Case',\n",
" prior_column='Primary Type',\n",
" expression=dflow['Primary Type'].to_lower())\n",
"dflow_to_lower.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},

View File

@@ -110,7 +110,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When reading delimited files, you can let the underlying runtime infer the parsing parameters (e.g. separator, encoding, whether to use headers, etc.) simply by not providing them. In this case, you can read a file by specifying only its location, then retrieve the first 10 rows to evaluate the result."
"When reading delimited files, the only required parameter is `path`. Other parameters (e.g. separator, encoding, whether to use headers, etc.) are available to modify default behavior.In this case, you can read a file by specifying only its location, then retrieve the first 5 rows to evaluate the result."
]
},
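{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch, assuming the `dprep` import and the sample data path used elsewhere in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# read the file with default parsing behavior, then preview the first rows\n",
"dflow = dprep.read_csv(path='../data/crime_duplicate_headers.csv')\n",
"dflow.head(5)"
]
},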
{
@@ -144,7 +144,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the data set contains the correct headers and the extraneous row has been skipped by read_csv. Next, look at the data types of the columns."
"Now the data set contains the correct headers and the extraneous row has been skipped by `read_csv`. Next, look at the data types of the columns."
]
},
{
@@ -160,8 +160,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Unfortunately, all of the columns came back as strings. This is because, by default, Data Prep will not change the type of the data. Since the data source is a text file, all values are kept as strings. In this case, however, numeric columns should be parsed as numbers. To do this, set the `inference_arguments` parameter to a new instance of the `InferenceArguments` class, which will trigger type inference to be performed.\n",
"Note that setting inference arguments at this step also requires you to choose a strategy for dealing with ambiguous dates. The example below shows the month before day option."
"Unfortunately, all of the columns came back as strings. This is because, by default, Data Prep will not change the type of the data. Since the data source is a text file, all values are kept as strings. In this case, however, numeric columns should be parsed as numbers. To do this, set the `infer_column_types` parameter to `True`, which will trigger type inference to be performed.\n"
]
},
{
@@ -172,7 +171,7 @@
"source": [
"dflow_inferred_types = dprep.read_csv(path='../data/crime_duplicate_headers.csv',\n",
" skip_rows=1,\n",
" inference_arguments=dprep.InferenceArguments(day_first=False))\n",
" infer_column_types=True)\n",
"dflow_inferred_types.dtypes"
]
},

View File

@@ -192,7 +192,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
"version": "3.6.4"
}
},
"nbformat": 4,

View File

@@ -0,0 +1,155 @@
# Azure Machine Learning Datasets (preview)
Azure Machine Learning Datasets (preview) make it easier to access and work with your data. Datasets manage data in various scenarios such as model training and pipeline creation. Using the Azure Machine Learning SDK, you can access underlying storage, explore and prepare data, manage the life cycle of different Dataset definitions, and compare between Datasets used in training and in production.
## Create and Register Datasets
It's easy to create Datasets from either local files or Azure Datastores.
```Python
from azureml.core.workspace import Workspace
from azureml.core.datastore import Datastore
from azureml.core.dataset import Dataset
datastore_name = 'your datastore name'
# get existing workspace
workspace = Workspace.from_config()
# get Datastore from the workspace
dstore = Datastore.get(workspace, datastore_name)
# create an in-memory Dataset on your local machine
dataset = Dataset.from_delimited_files(dstore.path('data/src/crime.csv'))
```
To consume Datasets across various scenarios in the Azure Machine Learning service, such as automated machine learning, model training, and pipeline creation, you need to register the Datasets with your workspace. By doing so, you can also share and reuse the Datasets within your organization.
```Python
dataset = dataset.register(workspace = workspace,
name = 'dataset_crime',
description = 'Training data'
)
```
## Sampling
Sampling can be particularly useful with Datasets that are too large to efficiently analyze in full. It enables data scientists to work with a manageable amount of data to build and train machine learning models. At this time, the [`sample()`](https://docs.microsoft.com//python/api/azureml-core/azureml.core.dataset(class)?view=azure-ml-py#sample-sample-strategy--arguments-) method from the Dataset class supports Top N, Simple Random, and Stratified sampling strategies.
After sampling, you can convert your sampled Dataset to a pandas DataFrame for training. By using the native [`sample()`](https://docs.microsoft.com//python/api/azureml-core/azureml.core.dataset(class)?view=azure-ml-py#sample-sample-strategy--arguments-) method from the Dataset class, you load the sampled data on the fly instead of loading the full data into memory.
### Top N sample
For Top N sampling, the first n records of your Dataset are your sample. This is helpful if you are just trying to get an idea of what your data records look like or to see what fields are in your data.
```Python
top_n_sample_dataset = dataset.sample('top_n', {'n': 5})
top_n_sample_dataset.to_pandas_dataframe()
```
### Simple random sample
In Simple Random sampling, every member of the data population has an equal chance of being selected as part of the sample. In the `simple_random` sample strategy, records from your Dataset are selected based on the specified probability, and a modified Dataset is returned. The seed parameter is optional.
```Python
simple_random_sample_dataset = dataset.sample('simple_random', {'probability':0.3, 'seed': seed})
simple_random_sample_dataset.to_pandas_dataframe()
```
### Stratified sample
Stratified samples ensure that certain groups of a population are represented in the sample. In the `stratified` sample strategy, the population is divided into strata, or subgroups, based on similarities, and records are randomly selected from each stratum according to the weights indicated by the `fractions` parameter.
In the following example, we group each record by the specified columns, and include each record based on the stratum weights in `fractions`. If a stratum is not specified or a record cannot be grouped, the default weight to sample is 0.
```Python
# take 50% of records with `Primary Type` as `THEFT` and 20% of records with `Primary Type` as `DECEPTIVE PRACTICE` into sample Dataset
fractions = {}
fractions[('THEFT',)] = 0.5
fractions[('DECEPTIVE PRACTICE',)] = 0.2
sample_dataset = dataset.sample('stratified', {'columns': ['Primary Type'], 'fractions': fractions, 'seed': seed})
sample_dataset.to_pandas_dataframe()
```
## Explore with summary statistics
Detect anomalies, missing values, or error counts with the [`get_profile()`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-profile-arguments-none--generate-if-not-exist-true--workspace-none--compute-target-none-) method. This function gets the profile and summary statistics of your data, which in turn helps determine the necessary data preparation operations to apply.
```Python
# Get the pre-calculated profile. If no pre-calculated profile is available,
# or it is out of date, this method generates a new profile of the Dataset.
dataset.get_profile()
```
||Type|Min|Max|Count|Missing Count|Not Missing Count|Percent Missing|Error Count|Empty Count|0.1% Quantile|1% Quantile|5% Quantile|25% Quantile|50% Quantile|75% Quantile|95% Quantile|99% Quantile|99.9% Quantile|Mean|Standard Deviation|Variance|Skewness|Kurtosis
-|----|---|---|-----|-------------|-----------------|---------------|-----------|-----------|-------------|-----------|-----------|------------|------------|------------|------------|------------|--------------|----|------------------|--------|--------|--------
ID|FieldType.INTEGER|1.04986e+07|1.05351e+07|10.0|0.0|10.0|0.0|0.0|0.0|1.04986e+07|1.04992e+07|1.04986e+07|1.05166e+07|1.05209e+07|1.05259e+07|1.05351e+07|1.05351e+07|1.05351e+07|1.05195e+07|12302.7|1.51358e+08|-0.495701|-1.02814
Case Number|FieldType.STRING|HZ239907|HZ278872|10.0|0.0|10.0|0.0|0.0|0.0||||||||||||||
Date|FieldType.DATE|2016-04-04 23:56:00+00:00|2016-04-15 17:00:00+00:00|10.0|0.0|10.0|0.0|0.0|0.0||||||||||||||
Block|FieldType.STRING|004XX S KILBOURN AVE|113XX S PRAIRIE AVE|10.0|0.0|10.0|0.0|0.0|0.0||||||||||||||
IUCR|FieldType.INTEGER|810|1154|10.0|0.0|10.0|0.0|0.0|0.0|810|850|810|890|1136|1153|1154|1154|1154|1058.5|137.285|18847.2|-0.785501|-1.3543
Primary Type|FieldType.STRING|DECEPTIVE PRACTICE|THEFT|10.0|0.0|10.0|0.0|0.0|0.0||||||||||||||
Description|FieldType.STRING|BOGUS CHECK|OVER $500|10.0|0.0|10.0|0.0|0.0|0.0||||||||||||||
Location Description|FieldType.STRING||SCHOOL, PUBLIC, BUILDING|10.0|0.0|10.0|0.0|0.0|1.0||||||||||||||
Arrest|FieldType.BOOLEAN|False|False|10.0|0.0|10.0|0.0|0.0|0.0||||||||||||||
Domestic|FieldType.BOOLEAN|False|False|10.0|0.0|10.0|0.0|0.0|0.0||||||||||||||
Beat|FieldType.INTEGER|531|2433|10.0|0.0|10.0|0.0|0.0|0.0|531|531|531|614|1318.5|1911|2433|2433|2433|1371.1|692.094|478994|0.105418|-1.60684
District|FieldType.INTEGER|5|24|10.0|0.0|10.0|0.0|0.0|0.0|5|5|5|6|13|19|24|24|24|13.5|6.94822|48.2778|0.0930109|-1.62325
Ward|FieldType.INTEGER|1|48|10.0|0.0|10.0|0.0|0.0|0.0|1|5|1|9|22.5|40|48|48|48|24.5|16.2635|264.5|0.173723|-1.51271
Community Area|FieldType.INTEGER|4|77|10.0|0.0|10.0|0.0|0.0|0.0|4|8.5|4|24|37.5|71|77|77|77|41.2|26.6366|709.511|0.112157|-1.73379
FBI Code|FieldType.INTEGER|6|11|10.0|0.0|10.0|0.0|0.0|0.0|6|6|6|6|11|11|11|11|11|9.4|2.36643|5.6|-0.702685|-1.59582
X Coordinate|FieldType.INTEGER|1.16309e+06|1.18336e+06|10.0|7.0|3.0|0.7|0.0|0.0|1.16309e+06|1.16309e+06|1.16309e+06|1.16401e+06|1.16678e+06|1.17921e+06|1.18336e+06|1.18336e+06|1.18336e+06|1.17108e+06|10793.5|1.165e+08|0.335126|-2.33333
Y Coordinate|FieldType.INTEGER|1.8315e+06|1.908e+06|10.0|7.0|3.0|0.7|0.0|0.0|1.8315e+06|1.8315e+06|1.8315e+06|1.83614e+06|1.85005e+06|1.89352e+06|1.908e+06|1.908e+06|1.908e+06|1.86319e+06|39905.2|1.59243e+09|0.293465|-2.33333
Year|FieldType.INTEGER|2016|2016|10.0|0.0|10.0|0.0|0.0|0.0|2016|2016|2016|2016|2016|2016|2016|2016|2016|2016|0|0|NaN|NaN
Updated On|FieldType.DATE|2016-05-11 15:48:00+00:00|2016-05-27 15:45:00+00:00|10.0|0.0|10.0|0.0|0.0|0.0||||||||||||||
Latitude|FieldType.DECIMAL|41.6928|41.9032|10.0|7.0|3.0|0.7|0.0|0.0|41.6928|41.6928|41.6928|41.7057|41.7441|41.8634|41.9032|41.9032|41.9032|41.78|0.109695|0.012033|0.292478|-2.33333
Longitude|FieldType.DECIMAL|-87.6764|-87.6043|10.0|7.0|3.0|0.7|0.0|0.0|-87.6764|-87.6764|-87.6764|-87.6734|-87.6645|-87.6194|-87.6043|-87.6043|-87.6043|-87.6484|0.0386264|0.001492|0.344429|-2.33333
Location|FieldType.STRING||(41.903206037, -87.676361925)|10.0|0.0|10.0|0.0|0.0|7.0||||||||||||||
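The profile above also suggests data preparation steps: for example, `X Coordinate`, `Y Coordinate`, `Latitude`, and `Longitude` are 70% missing in this small sample. A hedged sketch of one possible follow-up, using the same `drop_columns` transform that appears in the training example below:
```Python
# Drop the sparsely populated columns flagged by the profile before training.
# The column names come from the sample profile above; adjust them to your data.
clean_dataset = dataset.drop_columns(['X Coordinate', 'Y Coordinate',
                                      'Latitude', 'Longitude'])
clean_dataset.to_pandas_dataframe().head()
```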
## Training with Dataset
Now that you have registered your Dataset, you can retrieve it by name in your train.py script and easily convert it to a pandas or Spark DataFrame.
```Python
# Sample train.py script
from azureml.core import Dataset, Run
from sklearn.tree import DecisionTreeClassifier

# Get the context of the submitted run and the workspace it belongs to
run = Run.get_context()
workspace = run.experiment.workspace

# Access the Dataset registered with the workspace by name
dataset_name = 'training_data'
dataset = Dataset.get(workspace=workspace, name=dataset_name)

# Split the current Dataset definition into 30% validation and 70% training
ds_def = dataset.get_definition()
dataset_val, dataset_train = ds_def.random_split(percentage=0.3)

# Materialize labels and features as pandas DataFrames
y_df = dataset_train.keep_columns(['HasDetections']).to_pandas_dataframe()
x_df = dataset_train.drop_columns(['HasDetections']).to_pandas_dataframe()
y_val = dataset_val.keep_columns(['HasDetections']).to_pandas_dataframe()
x_val = dataset_val.drop_columns(['HasDetections']).to_pandas_dataframe()

data = {"train": {"X": x_df, "y": y_df},
        "validation": {"X": x_val, "y": y_val}}

clf = DecisionTreeClassifier().fit(data["train"]["X"], data["train"]["y"])
print('Accuracy of Decision Tree classifier on training set: {:.2f}'.format(clf.score(x_df, y_df)))
print('Accuracy of Decision Tree classifier on validation set: {:.2f}'.format(clf.score(x_val, y_val)))
```
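To run a script like this on remote compute, submit it as an experiment run. A minimal sketch, assuming an existing compute target named `my-compute` and an illustrative experiment name; the generic `Estimator` packages the script together with its pip dependencies:
```Python
from azureml.core import Experiment, Workspace
from azureml.train.estimator import Estimator

workspace = Workspace.from_config()

# Package train.py with its dependencies and point it at existing compute
estimator = Estimator(source_directory='.',
                      entry_script='train.py',
                      compute_target='my-compute',  # assumed to already exist
                      pip_packages=['azureml-sdk', 'scikit-learn'])

run = Experiment(workspace, 'dataset-training').submit(estimator)
run.wait_for_completion(show_output=True)
```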
For an end-to-end walkthrough, see the [Dataset tutorial](datasets-tutorial.ipynb). It covers how to:
- Explore and prepare data for training the model.
- Register the Dataset in your workspace for easy access during training.
- Take snapshots of data so that models can be trained with the same data every time (see the sketch after this list).
- Use the registered Dataset in your training script.
- Create and use multiple Dataset definitions so that updates to a definition don't break existing pipelines and scripts.
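A minimal sketch of the snapshot and definition workflows from the list above, assuming the `create_snapshot`, `get_snapshot`, and `update_definition` methods of the Dataset class in this SDK release; the snapshot name and update message are illustrative:
```Python
from azureml.core import Dataset, Workspace

workspace = Workspace.from_config()
dataset = Dataset.get(workspace, name='dataset_crime')

# Snapshot: save the profile, and optionally a copy of the data itself,
# so that models can be trained on exactly the same data later
snapshot = dataset.create_snapshot(snapshot_name='crime_snapshot',
                                   create_data_snapshot=True)
training_data = dataset.get_snapshot('crime_snapshot')

# Definitions: update the Dataset while consumers pinned to an earlier
# definition keep working unchanged
new_definition = dataset.get_definition().drop_columns(['Location'])
dataset = dataset.update_definition(new_definition, 'dropped Location column')
```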
View File
@@ -18,5 +18,3 @@ If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwi
* [Part 2](regression-part2-automated-ml.ipynb): Train a model using Automated Machine Learning.
Also find quickstarts and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).
![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/README.png)
View File
@@ -9,13 +9,6 @@
"Licensed under the MIT License."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/img-classification-part1-training.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -133,9 +126,7 @@
"metadata": {},
"source": [
"### Create or Attach existing compute resource\n",
"By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.\n",
"\n",
"**Creation of compute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace the code will skip the creation process."
"By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you use default Azure Machine Learning Compute as your training environment."
]
},
{
@@ -149,38 +140,10 @@
},
"outputs": [],
"source": [
"from azureml.core.compute import AmlCompute\n",
"from azureml.core.compute import ComputeTarget\n",
"import os\n",
"\n",
"# choose a name for your cluster\n",
"compute_name = os.environ.get(\"AML_COMPUTE_CLUSTER_NAME\", \"cpucluster\")\n",
"compute_min_nodes = os.environ.get(\"AML_COMPUTE_CLUSTER_MIN_NODES\", 0)\n",
"compute_max_nodes = os.environ.get(\"AML_COMPUTE_CLUSTER_MAX_NODES\", 4)\n",
"\n",
"# This example uses CPU VM. For using GPU VM, set SKU to STANDARD_NC6\n",
"vm_size = os.environ.get(\"AML_COMPUTE_CLUSTER_SKU\", \"STANDARD_D2_V2\")\n",
"\n",
"\n",
"if compute_name in ws.compute_targets:\n",
" compute_target = ws.compute_targets[compute_name]\n",
" if compute_target and type(compute_target) is AmlCompute:\n",
" print('found compute target. just use it. ' + compute_name)\n",
"else:\n",
" print('creating a new compute target...')\n",
" provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,\n",
" min_nodes = compute_min_nodes, \n",
" max_nodes = compute_max_nodes)\n",
"\n",
" # create the cluster\n",
" compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
" \n",
" # can poll for a minimum number of nodes and for a specific timeout. \n",
" # if no min node count is provided it will use the scale settings for the cluster\n",
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
" \n",
" # For a more detailed view of current AmlCompute status, use get_status()\n",
" print(compute_target.get_status().serialize())"
"cluster_type = os.environ.get(\"AML_COMPUTE_CLUSTER_TYPE\", \"CPU\")\n",
"compute_target = ws.get_default_compute_target(cluster_type)"
]
},
{