Mirror of https://github.com/Azure/MachineLearningNotebooks.git
update samples from Release-137 as a part of 1.0.53 SDK release
@@ -38,6 +38,7 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example
 - [Machine Learning Pipelines](./how-to-use-azureml/machine-learning-pipelines) - Examples showing how to create and use reusable pipelines for training and batch scoring
 - [Deployment](./how-to-use-azureml/deployment) - Examples showing how to deploy and manage machine learning models and solutions
 - [Azure Databricks](./how-to-use-azureml/azure-databricks) - Examples showing how to use Azure ML with Azure Databricks
+- [Monitor Models](./how-to-use-azureml/monitor-models) - Examples showing how to enable model monitoring services such as DataDrift

 ---
 ## Documentation
@@ -52,6 +53,7 @@ The [How to use Azure ML](./how-to-use-azureml) folder contains specific example

 Visit following repos to see projects contributed by Azure ML users:

+- [AMLSamples](https://github.com/Azure/AMLSamples) Number of end-to-end examples, including face recognition, predictive maintenance, customer churn and sentiment analysis.
 - [Fine tune natural language processing models using Azure Machine Learning service](https://github.com/Microsoft/AzureML-BERT)
 - [Fashion MNIST with Azure ML SDK](https://github.com/amynic/azureml-sdk-fashion)

@@ -103,7 +103,7 @@
 "source": [
 "import azureml.core\n",
 "\n",
-"print(\"This notebook was created using version 1.0.48 of the Azure ML SDK\")\n",
+"print(\"This notebook was created using version 1.0.53 of the Azure ML SDK\")\n",
 "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
 ]
 },
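Unwrapped from the escaped notebook JSON, the updated version check reads as the following plain Python (a sketch of the same two statements):

```python
import azureml.core

# The samples in this release were authored against SDK 1.0.53.
print("This notebook was created using version 1.0.53 of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")
```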
@@ -2,6 +2,7 @@ name: azure_automl
 dependencies:
 # The python interpreter version.
 # Currently Azure ML only supports 3.5.2 and later.
+- pip
 - nomkl
 - python>=3.5.2,<3.6.8
 - nb_conda
@@ -578,7 +578,7 @@
 "metadata": {
 "authors": [
 {
-"name": "xiaga@microsoft.com, tosingli@microsoft.com, erwright@microsoft.com"
+"name": "erwright"
 }
 ],
 "kernelspec": {
@@ -587,7 +587,7 @@
 "metadata": {
 "authors": [
 {
-"name": "xiaga, tosingli, erwright"
+"name": "erwright"
 }
 ],
 "kernelspec": {
@@ -829,7 +829,7 @@
 "metadata": {
 "authors": [
 {
-"name": "erwright, tosingli"
+"name": "erwright"
 }
 ],
 "kernelspec": {
@@ -87,7 +87,7 @@ These instruction setup the integration for SQL Server 2017 on Windows.
 sudo /opt/mssql/mlservices/bin/python/python -m pip install --upgrade sklearn
 ```
 7. Start SQL Server.
-8. Execute the files aml_model.sql, aml_connection.sql, AutoMLGetMetrics.sql, AutoMLPredict.sql and AutoMLTrain.sql in SQL Server Management Studio.
+8. Execute the files aml_model.sql, aml_connection.sql, AutoMLGetMetrics.sql, AutoMLPredict.sql, AutoMLForecast.sql and AutoMLTrain.sql in SQL Server Management Studio.
 9. Create an Azure Machine Learning Workspace. You can use the instructions at: [https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-workspace)
 10. Create a config.json file using the subscription id, resource group name and workspace name that you used to create the workspace. The file is described at: [https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#workspace)
 11. Create an Azure service principal. You can do this with the commands:
@@ -109,5 +109,5 @@ First you need to load the sample data in the database.

 You can then run the queries in the energy-demand folder:
 * TrainEnergyDemand.sql runs AutoML, trains multiple models on data and selects the best model.
-* PredictEnergyDemand.sql predicts based on the most recent training run.
+* ForecastEnergyDemand.sql forecasts based on the most recent training run.
 * GetMetrics.sql returns all the metrics for each model in the most recent training run.
@@ -12,7 +12,7 @@ Easily create and train a model using various deep neural networks (DNNs) as a f
 To learn more about the azureml-accel-model classes, see the section [Model Classes](#model-classes) below or the [Azure ML Accel Models SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel?view=azure-ml-py).

 ### Step 1: Create an Azure ML workspace
-Follow [these instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python) to install the Azure ML SDK on your local machine, create an Azure ML workspace, and set up your notebook environment, which is required for the next step.
+Follow [these instructions](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace) to install the Azure ML SDK on your local machine, create an Azure ML workspace, and set up your notebook environment, which is required for the next step.

 ### Step 2: Check your FPGA quota
 Use the Azure CLI to check whether you have quota.
@@ -1,5 +1,12 @@
 {
 "cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+""
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -230,11 +237,14 @@
 "\n",
 "# Convert model\n",
 "convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors_str)\n",
-"# If it fails, you can run wait_for_completion again with show_output=True.\n",
-"convert_request.wait_for_completion(show_output=False)\n",
-"converted_model = convert_request.result\n",
-"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
+"if convert_request.wait_for_completion(show_output = False):\n",
+" # If the above call succeeded, get the converted model\n",
+" converted_model = convert_request.result\n",
+" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
 " converted_model.id, converted_model.created_time, '\\n')\n",
+"else:\n",
+" print(\"Model conversion failed. Showing output.\")\n",
+" convert_request.wait_for_completion(show_output = True)\n",
 "\n",
 "# Package into AccelContainerImage\n",
 "image_config = AccelContainerImage.image_configuration()\n",
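Unwrapped from the escaped notebook JSON, the new conversion cell follows this pattern (a sketch; `ws`, `registered_model`, `input_tensors` and `output_tensors_str` are defined earlier in the notebook):

```python
from azureml.accel import AccelOnnxConverter

# Convert the registered TensorFlow model to an accelerated ONNX model.
convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors_str)

if convert_request.wait_for_completion(show_output=False):
    # If the above call succeeded, get the converted model
    converted_model = convert_request.result
    print("\nSuccessfully converted: ", converted_model.name, converted_model.url, converted_model.version,
          converted_model.id, converted_model.created_time, '\n')
else:
    print("Model conversion failed. Showing output.")
    convert_request.wait_for_completion(show_output=True)
```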
@@ -298,6 +308,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"%%time\n",
 "aks_target.wait_for_completion(show_output = True)\n",
 "print(aks_target.provisioning_state)\n",
 "print(aks_target.provisioning_errors)"
@@ -316,6 +327,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"%%time\n",
 "from azureml.core.webservice import Webservice, AksWebservice\n",
 "\n",
 "# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
@@ -342,10 +354,9 @@
 "## 5. Test the service\n",
 "<a id=\"create-client\"></a>\n",
 "### 5.a. Create Client\n",
-"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions. \n",
-"\n",
-"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
+"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
 "\n",
+"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
 "**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
 ]
 },
@@ -356,18 +367,10 @@
 "outputs": [],
 "source": [
 "# Using the grpc client in AzureML Accelerated Models SDK\n",
-"from azureml.accel.client import PredictionClient\n",
-"\n",
-"address = aks_service.scoring_uri\n",
-"ssl_enabled = address.startswith(\"https\")\n",
-"address = address[address.find('/')+2:].strip('/')\n",
-"port = 443 if ssl_enabled else 80\n",
+"from azureml.accel import client_from_service\n",
 "\n",
 "# Initialize AzureML Accelerated Models client\n",
-"client = PredictionClient(address=address,\n",
-" port=port,\n",
-" use_ssl=ssl_enabled,\n",
-" service_name=aks_service.name)"
+"client = client_from_service(aks_service)"
 ]
 },
 {
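In plain Python, this cell switches from hand-building a PredictionClient to the client_from_service helper shown in the diff (a sketch; `aks_service` is the AksWebservice deployed in the previous cells):

```python
# Before: construct the gRPC PredictionClient manually from the scoring URI
from azureml.accel.client import PredictionClient

address = aks_service.scoring_uri
ssl_enabled = address.startswith("https")
address = address[address.find('/') + 2:].strip('/')
port = 443 if ssl_enabled else 80
client = PredictionClient(address=address,
                          port=port,
                          use_ssl=ssl_enabled,
                          service_name=aks_service.name)

# After: let the SDK derive the connection details from the Webservice object
from azureml.accel import client_from_service

client = client_from_service(aks_service)
```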
@@ -486,7 +489,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.0"
+"version": "3.5.6"
 }
 },
 "nbformat": 4,
@@ -0,0 +1,8 @@
+name: accelerated-models-object-detection
+dependencies:
+- pip:
+  - azureml-sdk
+  - azureml-accel-models
+  - tensorflow
+  - opencv-python
+  - matplotlib
@@ -1,5 +1,12 @@
 {
 "cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+""
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -270,12 +277,15 @@
 "from azureml.accel import AccelOnnxConverter\n",
 "\n",
 "convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
-"# If it fails, you can run wait_for_completion again with show_output=True.\n",
-"convert_request.wait_for_completion(show_output = False)\n",
-"# If the above call succeeded, get the converted model\n",
-"converted_model = convert_request.result\n",
-"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
-" converted_model.id, converted_model.created_time, '\\n')"
+"\n",
+"if convert_request.wait_for_completion(show_output = False):\n",
+" # If the above call succeeded, get the converted model\n",
+" converted_model = convert_request.result\n",
+" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
+" converted_model.id, converted_model.created_time, '\\n')\n",
+"else:\n",
+" print(\"Model conversion failed. Showing output.\")\n",
+" convert_request.wait_for_completion(show_output = True)"
 ]
 },
 {
@@ -366,6 +376,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"%%time\n",
 "aks_target.wait_for_completion(show_output = True)\n",
 "print(aks_target.provisioning_state)\n",
 "print(aks_target.provisioning_errors)"
@@ -384,9 +395,10 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"%%time\n",
 "from azureml.core.webservice import Webservice, AksWebservice\n",
 "\n",
-"#Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
+"# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
 "# Authentication is enabled by default, but for testing we specify False\n",
 "aks_config = AksWebservice.deploy_configuration(autoscale_enabled=False,\n",
 " num_replicas=1,\n",
@@ -415,10 +427,9 @@
 "metadata": {},
 "source": [
 "### 7.a. Create Client\n",
-"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions.\n",
-"\n",
-"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice, see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
+"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
 "\n",
+"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice, see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
 "**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
 ]
 },
@@ -429,18 +440,10 @@
 "outputs": [],
 "source": [
 "# Using the grpc client in AzureML Accelerated Models SDK\n",
-"from azureml.accel.client import PredictionClient\n",
-"\n",
-"address = aks_service.scoring_uri\n",
-"ssl_enabled = address.startswith(\"https\")\n",
-"address = address[address.find('/')+2:].strip('/')\n",
-"port = 443 if ssl_enabled else 80\n",
+"from azureml.accel import client_from_service\n",
 "\n",
 "# Initialize AzureML Accelerated Models client\n",
-"client = PredictionClient(address=address,\n",
-" port=port,\n",
-" use_ssl=ssl_enabled,\n",
-" service_name=aks_service.name)"
+"client = client_from_service(aks_service)"
 ]
 },
 {
@@ -540,7 +543,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.0"
+"version": "3.5.6"
 }
 },
 "nbformat": 4,
@@ -0,0 +1,6 @@
+name: accelerated-models-quickstart
+dependencies:
+- pip:
+  - azureml-sdk
+  - azureml-accel-models
+  - tensorflow
@@ -1,5 +1,12 @@
 {
 "cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+""
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -410,6 +417,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"%%time\n",
 "# Launch the training\n",
 "tf.reset_default_graph()\n",
 "sess = tf.Session(graph=tf.get_default_graph())\n",
@@ -582,11 +590,14 @@
 "\n",
 "# Convert model\n",
 "convert_request = AccelOnnxConverter.convert_tf_model(ws, registered_model, input_tensors, output_tensors)\n",
-"# If it fails, you can run wait_for_completion again with show_output=True.\n",
-"convert_request.wait_for_completion(show_output=False)\n",
-"converted_model = convert_request.result\n",
-"print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
+"if convert_request.wait_for_completion(show_output = False):\n",
+" # If the above call succeeded, get the converted model\n",
+" converted_model = convert_request.result\n",
+" print(\"\\nSuccessfully converted: \", converted_model.name, converted_model.url, converted_model.version, \n",
 " converted_model.id, converted_model.created_time, '\\n')\n",
+"else:\n",
+" print(\"Model conversion failed. Showing output.\")\n",
+" convert_request.wait_for_completion(show_output = True)\n",
 "\n",
 "# Package into AccelContainerImage\n",
 "image_config = AccelContainerImage.image_configuration()\n",
@@ -655,6 +666,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"%%time\n",
 "aks_target.wait_for_completion(show_output = True)\n",
 "print(aks_target.provisioning_state)\n",
 "print(aks_target.provisioning_errors)"
@@ -673,6 +685,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
+"%%time\n",
 "from azureml.core.webservice import Webservice, AksWebservice\n",
 "\n",
 "# Set the web service configuration (for creating a test service, we don't want autoscale enabled)\n",
@@ -700,10 +713,9 @@
 "\n",
 "<a id=\"create-client\"></a>\n",
 "### 9.a. Create Client\n",
-"The image supports gRPC and the TensorFlow Serving \"predict\" API. We have a client that can call into the docker image to get predictions. \n",
-"\n",
-"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).",
+"The image supports gRPC and the TensorFlow Serving \"predict\" API. We will create a PredictionClient from the Webservice object that can call into the docker image to get predictions. If you do not have the Webservice object, you can also create [PredictionClient](https://docs.microsoft.com/en-us/python/api/azureml-accel-models/azureml.accel.predictionclient?view=azure-ml-py) directly.\n",
 "\n",
+"**Note:** If you chose to use auth_enabled=True when creating your AksWebservice.deploy_configuration(), see documentation [here](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.webservice(class)?view=azure-ml-py#get-keys--) on how to retrieve your keys and use either key as an argument to PredictionClient(...,access_token=key).\n",
 "**WARNING:** If you are running on Azure Notebooks free compute, you will not be able to make outgoing calls to your service. Try locating your client on a different machine to consume it."
 ]
 },
@@ -714,18 +726,10 @@
 "outputs": [],
 "source": [
 "# Using the grpc client in AzureML Accelerated Models SDK\n",
-"from azureml.accel.client import PredictionClient\n",
-"\n",
-"address = aks_service.scoring_uri\n",
-"ssl_enabled = address.startswith(\"https\")\n",
-"address = address[address.find('/')+2:].strip('/')\n",
-"port = 443 if ssl_enabled else 80\n",
+"from azureml.accel import client_from_service\n",
 "\n",
 "# Initialize AzureML Accelerated Models client\n",
-"client = PredictionClient(address=address,\n",
-" port=port,\n",
-" use_ssl=ssl_enabled,\n",
-" service_name=aks_service.name)"
+"client = client_from_service(aks_service)"
 ]
 },
 {
@@ -854,7 +858,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.6.0"
+"version": "3.5.6"
 }
 },
 "nbformat": 4,
@@ -0,0 +1,9 @@
+name: accelerated-models-training
+dependencies:
+- pip:
+  - azureml-sdk
+  - azureml-accel-models
+  - tensorflow
+  - keras
+  - tqdm
+  - sklearn
@@ -150,7 +150,9 @@
 "> Estimator object initialization involves specifying a list of DataReference objects in its 'inputs' parameter.\n",
 " In Pipelines, a step can take another step's output or DataReferences as input. So when creating an EstimatorStep,\n",
 " the parameters 'inputs' and 'outputs' need to be set explicitly and that will override 'inputs' parameter\n",
-" specified in the Estimator object."
+" specified in the Estimator object.\n",
+" \n",
+"> The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
 ]
 },
 {
@@ -170,7 +172,9 @@
 " data_reference_name=\"input_data\",\n",
 " path_on_datastore=\"20newsgroups/20news.pkl\")\n",
 "\n",
-"output = PipelineData(\"output\", datastore=def_blob_store)"
+"output = PipelineData(\"output\", datastore=def_blob_store)\n",
+"\n",
+"source_directory = 'estimator_train'"
 ]
 },
 {
@@ -181,7 +185,7 @@
 "source": [
 "from azureml.train.estimator import Estimator\n",
 "\n",
-"est = Estimator(source_directory='.', \n",
+"est = Estimator(source_directory=source_directory, \n",
 " compute_target=cpu_cluster, \n",
 " entry_script='dummy_train.py', \n",
 " conda_packages=['scikit-learn'])"
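Collected into plain Python, the updated cells of this hunk look roughly like this (a sketch; `cpu_cluster`, `def_blob_store` and the 20newsgroups DataReference are defined in earlier cells of the notebook):

```python
from azureml.pipeline.core import PipelineData
from azureml.train.estimator import Estimator

# Intermediate pipeline output written to the default blob store
output = PipelineData("output", datastore=def_blob_store)

# Keep the step's script and its dependent files in their own folder so only that
# folder is snapshotted and the step can be reused when nothing in it changes.
source_directory = 'estimator_train'

est = Estimator(source_directory=source_directory,
                compute_target=cpu_cluster,
                entry_script='dummy_train.py',
                conda_packages=['scikit-learn'])
```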
@@ -88,7 +88,11 @@
 "metadata": {},
 "source": [
 "## Create an Azure ML experiment\n",
-"Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n"
+"Let's create an experiment named \"tf-mnist\" and a folder to hold the training scripts. \n",
+"\n",
+"> The best practice is to use separate folders for scripts and its dependent files for each step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step. \n",
+"\n",
+"> The script runs will be recorded under the experiment in Azure."
 ]
 },
 {
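The code cell that follows this markdown normally creates the folder and the experiment; a minimal sketch, assuming the folder is simply named after the experiment (the actual cell in the notebook may differ):

```python
import os
from azureml.core import Experiment

# A dedicated folder keeps only the training script in the step's snapshot.
script_folder = os.path.join(os.getcwd(), "tf-mnist")  # folder name assumed for illustration
os.makedirs(script_folder, exist_ok=True)

exp = Experiment(workspace=ws, name="tf-mnist")
```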
@@ -57,10 +57,8 @@
 "ws = Workspace.from_config()\n",
 "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
 "\n",
-"# Default datastore (Azure file storage)\n",
-"def_file_store = ws.get_default_datastore() \n",
-"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
-"\n",
+"# Default datastore (Azure blob storage)\n",
+"# def_blob_store = ws.get_default_datastore()\n",
 "def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
 "print(\"Blobstore's name: {}\".format(def_blob_store.name))"
 ]
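Unwrapped from the notebook JSON, the corrected datastore cell is essentially the following (a sketch; in the notebook the imports appear in an earlier cell):

```python
from azureml.core import Workspace, Datastore

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\n')

# Default datastore (Azure blob storage)
# def_blob_store = ws.get_default_datastore()
def_blob_store = Datastore(ws, "workspaceblobstore")
print("Blobstore's name: {}".format(def_blob_store.name))
```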
@@ -147,7 +145,9 @@
 "#### Define a Step that consumes a datasource and produces intermediate data.\n",
 "In this step, we define a step that consumes a datasource and produces intermediate data.\n",
 "\n",
-"**Open `train.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** "
+"**Open `train.py` in the local machine and examine the arguments, inputs, and outputs for the script. That will give you a good sense of why the script argument names used below are important.** \n",
+"\n",
+"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
 ]
 },
 {
@@ -158,13 +158,16 @@
 "source": [
 "# trainStep consumes the datasource (Datareference) in the previous step\n",
 "# and produces processed_data1\n",
+"\n",
+"source_directory = \"publish_run_train\"\n",
+"\n",
 "trainStep = PythonScriptStep(\n",
 " script_name=\"train.py\", \n",
 " arguments=[\"--input_data\", blob_input_data, \"--output_train\", processed_data1],\n",
 " inputs=[blob_input_data],\n",
 " outputs=[processed_data1],\n",
 " compute_target=aml_compute, \n",
-" source_directory='.'\n",
+" source_directory=source_directory\n",
 ")\n",
 "print(\"trainStep created\")"
 ]
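Unwrapped, the updated step definition has this shape (a sketch; `blob_input_data`, `processed_data1` and `aml_compute` come from earlier cells):

```python
from azureml.pipeline.steps import PythonScriptStep

# Scripts for this step live in their own folder so only that folder is snapshotted.
source_directory = "publish_run_train"

trainStep = PythonScriptStep(
    script_name="train.py",
    arguments=["--input_data", blob_input_data, "--output_train", processed_data1],
    inputs=[blob_input_data],
    outputs=[processed_data1],
    compute_target=aml_compute,
    source_directory=source_directory
)
print("trainStep created")
```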
@@ -188,6 +191,7 @@
 "# extractStep to use the intermediate data produced by step4\n",
 "# This step also produces an output processed_data2\n",
 "processed_data2 = PipelineData(\"processed_data2\", datastore=def_blob_store)\n",
+"source_directory = \"publish_run_extract\"\n",
 "\n",
 "extractStep = PythonScriptStep(\n",
 " script_name=\"extract.py\",\n",
@@ -195,7 +199,7 @@
 " inputs=[processed_data1],\n",
 " outputs=[processed_data2],\n",
 " compute_target=aml_compute, \n",
-" source_directory='.')\n",
+" source_directory=source_directory)\n",
 "print(\"extractStep created\")"
 ]
 },
@@ -247,8 +251,7 @@
 "source": [
 "# Now define step6 that takes two inputs (both intermediate data), and produce an output\n",
 "processed_data3 = PipelineData(\"processed_data3\", datastore=def_blob_store)\n",
-"\n",
-"\n",
+"source_directory = \"publish_run_compare\"\n",
 "\n",
 "compareStep = PythonScriptStep(\n",
 " script_name=\"compare.py\",\n",
@@ -256,7 +259,7 @@
 " inputs=[processed_data1, processed_data2],\n",
 " outputs=[processed_data3], \n",
 " compute_target=aml_compute, \n",
-" source_directory='.')\n",
+" source_directory=source_directory)\n",
 "print(\"compareStep created\")"
 ]
 },
@@ -103,7 +103,7 @@
 "metadata": {},
 "source": [
 "### Define a pipeline step\n",
-"Define a single step pipeline for demonstration purpose."
+"Define a single step pipeline for demonstration purpose. The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
 ]
 },
 {
@@ -114,11 +114,13 @@
 "source": [
 "from azureml.pipeline.steps import PythonScriptStep\n",
 "\n",
+"source_directory = \"publish_run_train\"\n",
+"\n",
 "trainStep = PythonScriptStep(\n",
 " name=\"Training_Step\",\n",
 " script_name=\"train.py\", \n",
 " compute_target=aml_compute_target, \n",
-" source_directory='.'\n",
+" source_directory=source_directory\n",
 ")\n",
 "print(\"TrainStep created\")"
 ]
@@ -76,7 +76,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"#### Initialization, Steps to create a Pipeline"
+"#### Initialization, Steps to create a Pipeline\n",
+"\n",
+"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
 ]
 },
 {
@@ -105,7 +107,7 @@
 " aml_compute.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
 "\n",
 "# source_directory\n",
-"source_directory = '.'\n",
+"source_directory = 'publish_run_train'\n",
 "# define a single step pipeline for demonstration purpose.\n",
 "trainStep = PythonScriptStep(\n",
 " name=\"Training_Step\",\n",
@@ -290,7 +290,9 @@
 "- **priority:** the priority value to use for the current job *(optional)*\n",
 "- **runtime_version:** the runtime version of the Data Lake Analytics engine *(optional)*\n",
 "- **source_directory:** folder that contains the script, assemblies etc. *(optional)*\n",
-"- **hash_paths:** list of paths to hash to detect a change (script file is always hashed) *(optional)*"
+"- **hash_paths:** list of paths to hash to detect a change (script file is always hashed) *(optional)*\n",
+"\n",
+"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
 ]
 },
 {
@@ -175,7 +175,7 @@
 "metadata": {},
 "source": [
 "## Data Connections with Inputs and Outputs\n",
-"The DatabricksStep supports Azure Blob and ADLS for inputs and outputs. You also will need to define a [Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html) scope to enable authentication to external data sources such as Blob and ADLS from Databricks.\n",
+"The DatabricksStep supports DBFS, Azure Blob and ADLS for inputs and outputs. You also will need to define a [Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html) scope to enable authentication to external data sources such as Blob and ADLS from Databricks.\n",
 "\n",
 "- Databricks documentation on [Azure Blob](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html)\n",
 "- Databricks documentation on [ADLS](https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake.html)\n",
@@ -108,7 +108,9 @@
 "metadata": {},
 "source": [
 "## Create an Azure ML experiment\n",
-"Let's create an experiment named \"automl-classification\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n"
+"Let's create an experiment named \"automl-classification\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n",
+"\n",
+"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
 ]
 },
 {
@@ -76,14 +76,20 @@
 "ws = Workspace.from_config()\n",
 "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n",
 "\n",
-"# Default datastore (Azure file storage)\n",
-"def_file_store = ws.get_default_datastore() \n",
-"print(\"Default datastore's name: {}\".format(def_file_store.name))\n",
-"\n",
+"# Default datastore (Azure blob storage)\n",
+"# def_blob_store = ws.get_default_datastore()\n",
 "def_blob_store = Datastore(ws, \"workspaceblobstore\")\n",
 "print(\"Blobstore's name: {}\".format(def_blob_store.name))"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"### Source Directory\n",
+"The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step."
+]
+},
 {
 "cell_type": "code",
 "execution_count": null,
@@ -91,7 +97,7 @@
 "outputs": [],
 "source": [
 "# source directory\n",
-"source_directory = '.'\n",
+"source_directory = 'data_dependency_run_train'\n",
 " \n",
 "print('Sample scripts will be created in {} directory.'.format(source_directory))"
 ]
@@ -340,6 +346,7 @@
 "# step5 to use the intermediate data produced by step4\n",
 "# This step also produces an output processed_data2\n",
 "processed_data2 = PipelineData(\"processed_data2\", datastore=def_blob_store)\n",
+"source_directory = \"data_dependency_run_extract\"\n",
 "\n",
 "extractStep = PythonScriptStep(\n",
 " script_name=\"extract.py\",\n",
@@ -386,6 +393,7 @@
 "source": [
 "# Now define the compare step which takes two inputs and produces an output\n",
 "processed_data3 = PipelineData(\"processed_data3\", datastore=def_blob_store)\n",
+"source_directory = \"data_dependency_run_compare\"\n",
 "\n",
 "compareStep = PythonScriptStep(\n",
 " script_name=\"compare.py\",\n",
@@ -0,0 +1,24 @@
+# Copyright (c) Microsoft. All rights reserved.
+# Licensed under the MIT license.
+
+import argparse
+import os
+
+print("In compare.py")
+print("As a data scientist, this is where I use my compare code.")
+parser = argparse.ArgumentParser("compare")
+parser.add_argument("--compare_data1", type=str, help="compare_data1 data")
+parser.add_argument("--compare_data2", type=str, help="compare_data2 data")
+parser.add_argument("--output_compare", type=str, help="output_compare directory")
+parser.add_argument("--pipeline_param", type=int, help="pipeline parameter")
+
+args = parser.parse_args()
+
+print("Argument 1: %s" % args.compare_data1)
+print("Argument 2: %s" % args.compare_data2)
+print("Argument 3: %s" % args.output_compare)
+print("Argument 4: %s" % args.pipeline_param)
+
+if not (args.output_compare is None):
+    os.makedirs(args.output_compare, exist_ok=True)
+    print("%s created" % args.output_compare)
@@ -0,0 +1,21 @@
+# Copyright (c) Microsoft. All rights reserved.
+# Licensed under the MIT license.
+
+import argparse
+import os
+
+print("In extract.py")
+print("As a data scientist, this is where I use my extract code.")
+
+parser = argparse.ArgumentParser("extract")
+parser.add_argument("--input_extract", type=str, help="input_extract data")
+parser.add_argument("--output_extract", type=str, help="output_extract directory")
+
+args = parser.parse_args()
+
+print("Argument 1: %s" % args.input_extract)
+print("Argument 2: %s" % args.output_extract)
+
+if not (args.output_extract is None):
+    os.makedirs(args.output_extract, exist_ok=True)
+    print("%s created" % args.output_extract)
@@ -0,0 +1,22 @@
+# Copyright (c) Microsoft. All rights reserved.
+# Licensed under the MIT license.
+
+import argparse
+import os
+
+print("In train.py")
+print("As a data scientist, this is where I use my training code.")
+
+parser = argparse.ArgumentParser("train")
+
+parser.add_argument("--input_data", type=str, help="input data")
+parser.add_argument("--output_train", type=str, help="output_train directory")
+
+args = parser.parse_args()
+
+print("Argument 1: %s" % args.input_data)
+print("Argument 2: %s" % args.output_train)
+
+if not (args.output_train is None):
+    os.makedirs(args.output_train, exist_ok=True)
+    print("%s created" % args.output_train)
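These helper scripts only parse arguments and create their output folders; the pipeline steps earlier in this commit feed them by name. A sketch of how the names line up, assembled from the trainStep hunk above (`blob_input_data`, `processed_data1` and `aml_compute` come from the notebook, not from this file):

```python
# The step passes pipeline objects as command-line arguments...
trainStep = PythonScriptStep(
    script_name="train.py",
    arguments=["--input_data", blob_input_data, "--output_train", processed_data1],
    inputs=[blob_input_data],
    outputs=[processed_data1],
    compute_target=aml_compute,
    source_directory="publish_run_train"
)

# ...and train.py reads them back with argparse, matching the names above:
#   parser.add_argument("--input_data", ...)
#   parser.add_argument("--output_train", ...)
```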
@@ -0,0 +1,30 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License.
+import argparse
+import os
+
+print("*********************************************************")
+print("Hello Azure ML!")
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--datadir', type=str, help="data directory")
+parser.add_argument('--output', type=str, help="output")
+args = parser.parse_args()
+
+print("Argument 1: %s" % args.datadir)
+print("Argument 2: %s" % args.output)
+
+if not (args.output is None):
+    os.makedirs(args.output, exist_ok=True)
+    print("%s created" % args.output)
+
+try:
+    from azureml.core import Run
+    run = Run.get_context()
+    print("Log Fibonacci numbers.")
+    run.log_list('Fibonacci numbers', [0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
+    run.complete()
+except:
+    print("Warning: you need to install Azure ML SDK in order to log metrics.")
+
+print("*********************************************************")
@@ -0,0 +1,24 @@
+# Copyright (c) Microsoft. All rights reserved.
+# Licensed under the MIT license.
+
+import argparse
+import os
+
+print("In compare.py")
+print("As a data scientist, this is where I use my compare code.")
+parser = argparse.ArgumentParser("compare")
+parser.add_argument("--compare_data1", type=str, help="compare_data1 data")
+parser.add_argument("--compare_data2", type=str, help="compare_data2 data")
+parser.add_argument("--output_compare", type=str, help="output_compare directory")
+parser.add_argument("--pipeline_param", type=int, help="pipeline parameter")
+
+args = parser.parse_args()
+
+print("Argument 1: %s" % args.compare_data1)
+print("Argument 2: %s" % args.compare_data2)
+print("Argument 3: %s" % args.output_compare)
+print("Argument 4: %s" % args.pipeline_param)
+
+if not (args.output_compare is None):
+    os.makedirs(args.output_compare, exist_ok=True)
+    print("%s created" % args.output_compare)
@@ -0,0 +1,21 @@
+# Copyright (c) Microsoft. All rights reserved.
+# Licensed under the MIT license.
+
+import argparse
+import os
+
+print("In extract.py")
+print("As a data scientist, this is where I use my extract code.")
+
+parser = argparse.ArgumentParser("extract")
+parser.add_argument("--input_extract", type=str, help="input_extract data")
+parser.add_argument("--output_extract", type=str, help="output_extract directory")
+
+args = parser.parse_args()
+
+print("Argument 1: %s" % args.input_extract)
+print("Argument 2: %s" % args.output_extract)
+
+if not (args.output_extract is None):
+    os.makedirs(args.output_extract, exist_ok=True)
+    print("%s created" % args.output_extract)
@@ -0,0 +1,22 @@
+# Copyright (c) Microsoft. All rights reserved.
+# Licensed under the MIT license.
+
+import argparse
+import os
+
+print("In train.py")
+print("As a data scientist, this is where I use my training code.")
+
+parser = argparse.ArgumentParser("train")
+
+parser.add_argument("--input_data", type=str, help="input data")
+parser.add_argument("--output_train", type=str, help="output_train directory")
+
+args = parser.parse_args()
+
+print("Argument 1: %s" % args.input_data)
+print("Argument 2: %s" % args.output_train)
+
+if not (args.output_train is None):
+    os.makedirs(args.output_train, exist_ok=True)
+    print("%s created" % args.output_train)
@@ -0,0 +1,724 @@
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Track Data Drift between Training and Inference Data in Production \n",
+"\n",
+"With this notebook, you will learn how to enable the DataDrift service to automatically track and determine whether your inference data is drifting from the data your model was initially trained on. The DataDrift service provides metrics and visualizations to help stakeholders identify which specific features cause the concept drift to occur.\n",
+"\n",
+"Please email driftfeedback@microsoft.com with any issues. A member from the DataDrift team will respond shortly. \n",
+"\n",
+"The DataDrift Public Preview API can be found [here](https://docs.microsoft.com/en-us/python/api/azureml-contrib-datadrift/?view=azure-ml-py). "
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+""
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Prerequisites and Setup"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Install the DataDrift package\n",
+"\n",
+"Install the azureml-contrib-datadrift, azureml-opendatasets and lightgbm packages before running this notebook.\n",
+"```\n",
+"pip install azureml-contrib-datadrift\n",
+"pip install lightgbm\n",
+"```"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Import Dependencies"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"import json\n",
+"import os\n",
+"import time\n",
+"from datetime import datetime, timedelta\n",
+"\n",
+"import numpy as np\n",
+"import pandas as pd\n",
+"import requests\n",
+"from azureml.contrib.datadrift import DataDriftDetector, AlertConfiguration\n",
+"from azureml.opendatasets import NoaaIsdWeather\n",
+"from azureml.core import Dataset, Workspace, Run\n",
+"from azureml.core.compute import AksCompute, ComputeTarget\n",
+"from azureml.core.conda_dependencies import CondaDependencies\n",
+"from azureml.core.experiment import Experiment\n",
+"from azureml.core.image import ContainerImage\n",
+"from azureml.core.model import Model\n",
+"from azureml.core.webservice import Webservice, AksWebservice\n",
+"from azureml.widgets import RunDetails\n",
+"from sklearn.externals import joblib\n",
+"from sklearn.model_selection import train_test_split\n"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## Set up Configuraton and Create Azure ML Workspace\n",
+"\n",
+"If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# Please type in your initials/alias. The prefix is prepended to the names of resources created by this notebook. \n",
+"prefix = \"dd\"\n",
+"\n",
+"# NOTE: Please do not change the model_name, as it's required by the score.py file\n",
+"model_name = \"driftmodel\"\n",
+"image_name = \"{}driftimage\".format(prefix)\n",
+"service_name = \"{}driftservice\".format(prefix)\n",
+"\n",
+"# optionally, set email address to receive an email alert for DataDrift\n",
+"email_address = \"\""
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"ws = Workspace.from_config()\n",
+"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Generate Train/Testing Data\n",
|
||||||
|
"\n",
|
||||||
|
"For this demo, we will use NOAA weather data from [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/). You may replace this step with your own dataset. "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"usaf_list = ['725724', '722149', '723090', '722159', '723910', '720279',\n",
|
||||||
|
" '725513', '725254', '726430', '720381', '723074', '726682',\n",
|
||||||
|
" '725486', '727883', '723177', '722075', '723086', '724053',\n",
|
||||||
|
" '725070', '722073', '726060', '725224', '725260', '724520',\n",
|
||||||
|
" '720305', '724020', '726510', '725126', '722523', '703333',\n",
|
||||||
|
" '722249', '722728', '725483', '722972', '724975', '742079',\n",
|
||||||
|
" '727468', '722193', '725624', '722030', '726380', '720309',\n",
|
||||||
|
" '722071', '720326', '725415', '724504', '725665', '725424',\n",
|
||||||
|
" '725066']\n",
|
||||||
|
"\n",
|
||||||
|
"columns = ['usaf', 'wban', 'datetime', 'latitude', 'longitude', 'elevation', 'windAngle', 'windSpeed', 'temperature', 'stationName', 'p_k']\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"def enrich_weather_noaa_data(noaa_df):\n",
|
||||||
|
" hours_in_day = 23\n",
|
||||||
|
" week_in_year = 52\n",
|
||||||
|
" \n",
|
||||||
|
" noaa_df[\"hour\"] = noaa_df[\"datetime\"].dt.hour\n",
|
||||||
|
" noaa_df[\"weekofyear\"] = noaa_df[\"datetime\"].dt.week\n",
|
||||||
|
" \n",
|
||||||
|
" noaa_df[\"sine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.sin((2*np.pi*x.dt.week-1)/week_in_year))\n",
|
||||||
|
" noaa_df[\"cosine_weekofyear\"] = noaa_df['datetime'].transform(lambda x: np.cos((2*np.pi*x.dt.week-1)/week_in_year))\n",
|
||||||
|
"\n",
|
||||||
|
" noaa_df[\"sine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.sin(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||||
|
" noaa_df[\"cosine_hourofday\"] = noaa_df['datetime'].transform(lambda x: np.cos(2*np.pi*x.dt.hour/hours_in_day))\n",
|
||||||
|
" \n",
|
||||||
|
" return noaa_df\n",
|
||||||
|
"\n",
|
||||||
|
"def add_window_col(input_df):\n",
|
||||||
|
" shift_interval = pd.Timedelta('-7 days') # your X days interval\n",
|
||||||
|
" df_shifted = input_df.copy()\n",
|
||||||
|
" df_shifted['datetime'] = df_shifted['datetime'] - shift_interval\n",
|
||||||
|
" df_shifted.drop(list(input_df.columns.difference(['datetime', 'usaf', 'wban', 'sine_hourofday', 'temperature'])), axis=1, inplace=True)\n",
|
||||||
|
"\n",
|
||||||
|
" # merge, keeping only observations where -1 lag is present\n",
|
||||||
|
" df2 = pd.merge(input_df,\n",
|
||||||
|
" df_shifted,\n",
|
||||||
|
" on=['datetime', 'usaf', 'wban', 'sine_hourofday'],\n",
|
||||||
|
" how='inner', # use 'left' to keep observations without lags\n",
|
||||||
|
" suffixes=['', '-7'])\n",
|
||||||
|
" return df2\n",
|
||||||
|
"\n",
|
||||||
|
"def get_noaa_data(start_time, end_time, cols, station_list):\n",
|
||||||
|
" isd = NoaaIsdWeather(start_time, end_time, cols=cols)\n",
|
||||||
|
" # Read into Pandas data frame.\n",
|
||||||
|
" noaa_df = isd.to_pandas_dataframe()\n",
|
||||||
|
" noaa_df = noaa_df.rename(columns={\"stationName\": \"station_name\"})\n",
|
||||||
|
" \n",
|
||||||
|
" df_filtered = noaa_df[noaa_df[\"usaf\"].isin(station_list)]\n",
|
||||||
|
" df_filtered.reset_index(drop=True)\n",
|
||||||
|
" \n",
|
||||||
|
" # Enrich with time features\n",
|
||||||
|
" df_enriched = enrich_weather_noaa_data(df_filtered)\n",
|
||||||
|
" \n",
|
||||||
|
" return df_enriched\n",
|
||||||
|
"\n",
|
||||||
|
"def get_featurized_noaa_df(start_time, end_time, cols, station_list):\n",
|
||||||
|
" df_1 = get_noaa_data(start_time - timedelta(days=7), start_time - timedelta(seconds=1), cols, station_list)\n",
|
||||||
|
" df_2 = get_noaa_data(start_time, end_time, cols, station_list)\n",
|
||||||
|
" noaa_df = pd.concat([df_1, df_2])\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Adding window feature\")\n",
|
||||||
|
" df_window = add_window_col(noaa_df)\n",
|
||||||
|
" \n",
|
||||||
|
" cat_columns = df_window.dtypes == object\n",
|
||||||
|
" cat_columns = cat_columns[cat_columns == True]\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Encoding categorical columns\")\n",
|
||||||
|
" df_encoded = pd.get_dummies(df_window, columns=cat_columns.keys().tolist())\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"Dropping unnecessary columns\")\n",
|
||||||
|
" df_featurized = df_encoded.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna().drop_duplicates()\n",
|
||||||
|
" \n",
|
||||||
|
" return df_featurized"
|
||||||
|
]
|
||||||
|
},
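The sine/cosine pairs above are a standard cyclical encoding: they map hour 23 and hour 0 (or week 52 and week 1) to nearby points on a circle instead of opposite ends of a numeric range. A tiny standalone illustration of the same transform used in `enrich_weather_noaa_data`:

```python
import numpy as np

hours_in_day = 23
hours = np.array([0, 6, 12, 18, 23])

# Same transform that enrich_weather_noaa_data applies to the datetime column.
sine = np.sin(2 * np.pi * hours / hours_in_day)
cosine = np.cos(2 * np.pi * hours / hours_in_day)

# hour 0 -> (0.0, 1.0) and hour 23 -> (0.0, 1.0): adjacent on the circle,
# even though the raw values 0 and 23 are far apart numerically.
for h, s, c in zip(hours, sine.round(2), cosine.round(2)):
    print(h, s, c)
```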
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Train model on Jan 1 - 14, 2009 data\n",
|
||||||
|
"df = get_featurized_noaa_df(datetime(2009, 1, 1), datetime(2009, 1, 14, 23, 59, 59), columns, usaf_list)\n",
|
||||||
|
"df.head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"label = \"temperature\"\n",
|
||||||
|
"x_df = df.drop(label, axis=1)\n",
|
||||||
|
"y_df = df[[label]]\n",
|
||||||
|
"x_train, x_test, y_train, y_test = train_test_split(df, y_df, test_size=0.2, random_state=223)\n",
|
||||||
|
"print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)\n",
|
||||||
|
"\n",
|
||||||
|
"training_dir = 'outputs/training'\n",
|
||||||
|
"training_file = \"training.csv\"\n",
|
||||||
|
"\n",
|
||||||
|
"# Generate training dataframe to register as Training Dataset\n",
|
||||||
|
"os.makedirs(training_dir, exist_ok=True)\n",
|
||||||
|
"training_df = pd.merge(x_train.drop(label, axis=1), y_train, left_index=True, right_index=True)\n",
|
||||||
|
"training_df.to_csv(training_dir + \"/\" + training_file)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create/Register Training Dataset"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"dataset_name = \"dataset\"\n",
|
||||||
|
"name_suffix = datetime.utcnow().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
|
||||||
|
"snapshot_name = \"snapshot-{}\".format(name_suffix)\n",
|
||||||
|
"\n",
|
||||||
|
"dstore = ws.get_default_datastore()\n",
|
||||||
|
"dstore.upload(training_dir, \"data/training\", show_progress=True)\n",
|
||||||
|
"dpath = dstore.path(\"data/training/training.csv\")\n",
|
||||||
|
"trainingDataset = Dataset.auto_read_files(dpath, include_path=True)\n",
|
||||||
|
"trainingDataset = trainingDataset.register(workspace=ws, name=dataset_name, description=\"dset\", exist_ok=True)\n",
|
||||||
|
"\n",
|
||||||
|
"datasets = [(Dataset.Scenario.TRAINING, trainingDataset)]\n",
|
||||||
|
"print(\"dataset registration done.\\n\")\n",
|
||||||
|
"datasets"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train and Save Model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import lightgbm as lgb\n",
|
||||||
|
"\n",
|
||||||
|
"train = lgb.Dataset(data=x_train, \n",
|
||||||
|
" label=y_train)\n",
|
||||||
|
"\n",
|
||||||
|
"test = lgb.Dataset(data=x_test, \n",
|
||||||
|
" label=y_test,\n",
|
||||||
|
" reference=train)\n",
|
||||||
|
"\n",
|
||||||
|
"params = {'learning_rate' : 0.1,\n",
|
||||||
|
" 'boosting' : 'gbdt',\n",
|
||||||
|
" 'metric' : 'rmse',\n",
|
||||||
|
" 'feature_fraction' : 1,\n",
|
||||||
|
" 'bagging_fraction' : 1,\n",
|
||||||
|
" 'max_depth': 6,\n",
|
||||||
|
" 'num_leaves' : 31,\n",
|
||||||
|
" 'objective' : 'regression',\n",
|
||||||
|
" 'bagging_freq' : 1,\n",
|
||||||
|
" \"verbose\": -1,\n",
|
||||||
|
" 'min_data_per_leaf': 100}\n",
|
||||||
|
"\n",
|
||||||
|
"model = lgb.train(params, \n",
|
||||||
|
" num_boost_round=500,\n",
|
||||||
|
" train_set=train,\n",
|
||||||
|
" valid_sets=[train, test],\n",
|
||||||
|
" verbose_eval=50,\n",
|
||||||
|
" early_stopping_rounds=25)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model_file = 'outputs/{}.pkl'.format(model_name)\n",
|
||||||
|
"\n",
|
||||||
|
"os.makedirs('outputs', exist_ok=True)\n",
|
||||||
|
"joblib.dump(model, model_file)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Register Model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model = Model.register(model_path=model_file,\n",
|
||||||
|
" model_name=model_name,\n",
|
||||||
|
" workspace=ws,\n",
|
||||||
|
" datasets=datasets)\n",
|
||||||
|
"\n",
|
||||||
|
"print(model_name, image_name, service_name, model)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Deploy Model To AKS"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Prepare Environment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
|
||||||
|
" pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
|
||||||
|
"\n",
|
||||||
|
"with open(\"myenv.yml\",\"w\") as f:\n",
|
||||||
|
" f.write(myenv.serialize_to_string())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Image"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Image creation may take up to 15 minutes.\n",
|
||||||
|
"\n",
|
||||||
|
"image_name = image_name + str(model.version)\n",
|
||||||
|
"\n",
|
||||||
|
"if not image_name in ws.images:\n",
|
||||||
|
" # Use the score.py defined in this directory as the execution script\n",
|
||||||
|
" # NOTE: The Model Data Collector must be enabled in the execution script for DataDrift to run correctly\n",
|
||||||
|
" image_config = ContainerImage.image_configuration(execution_script=\"score.py\",\n",
|
||||||
|
" runtime=\"python\",\n",
|
||||||
|
" conda_file=\"myenv.yml\",\n",
|
||||||
|
" description=\"Image with weather dataset model\")\n",
|
||||||
|
" image = ContainerImage.create(name=image_name,\n",
|
||||||
|
" models=[model],\n",
|
||||||
|
" image_config=image_config,\n",
|
||||||
|
" workspace=ws)\n",
|
||||||
|
"\n",
|
||||||
|
" image.wait_for_creation(show_output=True)\n",
|
||||||
|
"else:\n",
|
||||||
|
" image = ws.images[image_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create Compute Target"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"aks_name = 'dd-demo-e2e'\n",
|
||||||
|
"prov_config = AksCompute.provisioning_configuration()\n",
|
||||||
|
"\n",
|
||||||
|
"if not aks_name in ws.compute_targets:\n",
|
||||||
|
" aks_target = ComputeTarget.create(workspace=ws,\n",
|
||||||
|
" name=aks_name,\n",
|
||||||
|
" provisioning_configuration=prov_config)\n",
|
||||||
|
"\n",
|
||||||
|
" aks_target.wait_for_completion(show_output=True)\n",
|
||||||
|
" print(aks_target.provisioning_state)\n",
|
||||||
|
" print(aks_target.provisioning_errors)\n",
|
||||||
|
"else:\n",
|
||||||
|
" aks_target=ws.compute_targets[aks_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"aks_service_name = service_name\n",
|
||||||
|
"\n",
|
||||||
|
"if not aks_service_name in ws.webservices:\n",
|
||||||
|
" aks_config = AksWebservice.deploy_configuration(collect_model_data=True, enable_app_insights=True)\n",
|
||||||
|
" aks_service = Webservice.deploy_from_image(workspace=ws,\n",
|
||||||
|
" name=aks_service_name,\n",
|
||||||
|
" image=image,\n",
|
||||||
|
" deployment_config=aks_config,\n",
|
||||||
|
" deployment_target=aks_target)\n",
|
||||||
|
" aks_service.wait_for_deployment(show_output=True)\n",
|
||||||
|
" print(aks_service.state)\n",
|
||||||
|
"else:\n",
|
||||||
|
" aks_service = ws.webservices[aks_service_name]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Run DataDrift Analysis"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Send Scoring Data to Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Download Scoring Data"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Score Model on March 15, 2016 data\n",
|
||||||
|
"scoring_df = get_noaa_data(datetime(2016, 3, 15) - timedelta(days=7), datetime(2016, 3, 16), columns, usaf_list)\n",
|
||||||
|
"# Add the window feature column\n",
|
||||||
|
"scoring_df = add_window_col(scoring_df)\n",
|
||||||
|
"\n",
|
||||||
|
"# Drop features not used by the model\n",
|
||||||
|
"print(\"Dropping unnecessary columns\")\n",
|
||||||
|
"scoring_df = scoring_df.drop(['windAngle', 'windSpeed', 'datetime', 'elevation'], axis=1).dropna()\n",
|
||||||
|
"scoring_df.head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# One Hot Encode the scoring dataset to match the training dataset schema\n",
|
||||||
|
"columns_dict = model.datasets[\"training\"][0].get_profile().columns\n",
|
||||||
|
"extra_cols = ('Path', 'Column1')\n",
|
||||||
|
"for k in extra_cols:\n",
|
||||||
|
" columns_dict.pop(k, None)\n",
|
||||||
|
"training_columns = list(columns_dict.keys())\n",
|
||||||
|
"\n",
|
||||||
|
"categorical_columns = scoring_df.dtypes == object\n",
|
||||||
|
"categorical_columns = categorical_columns[categorical_columns == True]\n",
|
||||||
|
"\n",
|
||||||
|
"test_df = pd.get_dummies(scoring_df[categorical_columns.keys().tolist()])\n",
|
||||||
|
"encoded_df = scoring_df.join(test_df)\n",
|
||||||
|
"\n",
|
||||||
|
"# Populate missing OHE columns with 0 values to match traning dataset schema\n",
|
||||||
|
"difference = list(set(training_columns) - set(encoded_df.columns.tolist()))\n",
|
||||||
|
"for col in difference:\n",
|
||||||
|
" encoded_df[col] = 0\n",
|
||||||
|
"encoded_df.head()"
|
||||||
|
]
|
||||||
|
},
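The loop above back-fills the missing one-hot columns one at a time. A slightly more compact alternative, sketched here and not part of the notebook, is a single `DataFrame.reindex` call. It keeps the existing column order and the raw categorical columns that score.py still expects.

```python
# Hedged alternative to the back-fill loop above: add, in one call, every training
# column the scoring frame is missing, without disturbing the existing columns.
missing_cols = [c for c in training_columns if c not in encoded_df.columns]
encoded_df = encoded_df.reindex(columns=encoded_df.columns.tolist() + missing_cols, fill_value=0)
encoded_df.head()
```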
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Serialize dataframe to list of row dictionaries\n",
|
||||||
|
"encoded_dict = encoded_df.to_dict('records')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Submit Scoring Data to Service"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%time\n",
|
||||||
|
"\n",
|
||||||
|
"# retreive the API keys. AML generates two keys.\n",
|
||||||
|
"key1, key2 = aks_service.get_keys()\n",
|
||||||
|
"\n",
|
||||||
|
"total_count = len(scoring_df)\n",
|
||||||
|
"i = 0\n",
|
||||||
|
"load = []\n",
|
||||||
|
"for row in encoded_dict:\n",
|
||||||
|
" load.append(row)\n",
|
||||||
|
" i = i + 1\n",
|
||||||
|
" if i % 100 == 0:\n",
|
||||||
|
" payload = json.dumps({\"data\": load})\n",
|
||||||
|
" \n",
|
||||||
|
" # construct raw HTTP request and send to the service\n",
|
||||||
|
" payload_binary = bytes(payload,encoding = 'utf8')\n",
|
||||||
|
" headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
|
||||||
|
" resp = requests.post(aks_service.scoring_uri, payload_binary, headers=headers)\n",
|
||||||
|
" \n",
|
||||||
|
" print(\"prediction:\", resp.content, \"Progress: {}/{}\".format(i, total_count)) \n",
|
||||||
|
"\n",
|
||||||
|
" load = []\n",
|
||||||
|
" time.sleep(3)"
|
||||||
|
]
|
||||||
|
},
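Before looping over the full scoring set, a quick smoke test through the SDK helper can confirm the service responds at all; a minimal sketch using the `encoded_dict` rows prepared above (the `Webservice.run` call handles authentication for you):

```python
import json

# Hedged smoke test: score the first five rows through the SDK instead of raw HTTP.
sample_payload = json.dumps({"data": encoded_dict[:5]})
print(aks_service.run(input_data=sample_payload))
```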
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We need to wait up to 10 minutes for the Model Data Collector to dump the model input and inference data to storage in the Workspace, where it's used by the DataDriftDetector job."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"time.sleep(600)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Configure DataDrift"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"services = [service_name]\n",
|
||||||
|
"start = datetime.now() - timedelta(days=2)\n",
|
||||||
|
"end = datetime(year=2020, month=1, day=22, hour=15, minute=16)\n",
|
||||||
|
"feature_list = ['usaf', 'wban', 'latitude', 'longitude', 'station_name', 'p_k', 'sine_hourofday', 'cosine_hourofday', 'temperature-7']\n",
|
||||||
|
"alert_config = AlertConfiguration([email_address]) if email_address else None\n",
|
||||||
|
"\n",
|
||||||
|
"# there will be an exception indicating using get() method if DataDrift object already exist\n",
|
||||||
|
"try:\n",
|
||||||
|
" datadrift = DataDriftDetector.create(ws, model.name, model.version, services, frequency=\"Day\", alert_config=alert_config)\n",
|
||||||
|
"except KeyError:\n",
|
||||||
|
" datadrift = DataDriftDetector.get(ws, model.name, model.version)\n",
|
||||||
|
" \n",
|
||||||
|
"print(\"Details of DataDrift Object:\\n{}\".format(datadrift))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Run an Adhoc DataDriftDetector Run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"target_date = datetime.today()\n",
|
||||||
|
"run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"exp = Experiment(ws, datadrift._id)\n",
|
||||||
|
"dd_run = Run(experiment=exp, run_id=run)\n",
|
||||||
|
"RunDetails(dd_run).show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Get Drift Analysis Results"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"children = list(dd_run.get_children())\n",
|
||||||
|
"for child in children:\n",
|
||||||
|
" child.wait_for_completion()\n",
|
||||||
|
"\n",
|
||||||
|
"drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
|
||||||
|
"drift_metrics"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Show all drift figures, one per serivice.\n",
|
||||||
|
"# If setting with_details is False (by default), only drift will be shown; if it's True, all details will be shown.\n",
|
||||||
|
"\n",
|
||||||
|
"drift_figures = datadrift.show(with_details=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Enable DataDrift Schedule"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"datadrift.enable_schedule()"
|
||||||
|
]
|
||||||
|
}
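Once the schedule is enabled, the detector runs at the configured frequency until it is turned off again. If you are only experimenting, remember to stop it and delete the demo service to avoid ongoing cost; a minimal clean-up sketch, assuming the objects defined above:

```python
# Hedged clean-up sketch: stop the scheduled drift runs and remove the demo service.
datadrift.disable_schedule()
aks_service.delete()
```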
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "rafarmah"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
},
|
||||||
|
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
name: azure-ml-datadrift
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-contrib-datadrift
|
||||||
|
- azureml-opendatasets
|
||||||
|
- lightgbm
|
||||||
|
- azureml-widgets
|
||||||
58
how-to-use-azureml/monitor-models/data-drift/score.py
Normal file
@@ -0,0 +1,58 @@
|
|||||||
|
import pickle
|
||||||
|
import json
|
||||||
|
import numpy
|
||||||
|
import azureml.train.automl
|
||||||
|
from sklearn.externals import joblib
|
||||||
|
from sklearn.linear_model import Ridge
|
||||||
|
from azureml.core.model import Model
|
||||||
|
from azureml.core.run import Run
|
||||||
|
from azureml.monitoring import ModelDataCollector
|
||||||
|
import time
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
|
||||||
|
def init():
|
||||||
|
global model, inputs_dc, prediction_dc, feature_names, categorical_features
|
||||||
|
|
||||||
|
print("Model is initialized" + time.strftime("%H:%M:%S"))
|
||||||
|
model_path = Model.get_model_path(model_name="driftmodel")
|
||||||
|
model = joblib.load(model_path)
|
||||||
|
|
||||||
|
feature_names = ["usaf", "wban", "latitude", "longitude", "station_name", "p_k",
|
||||||
|
"sine_weekofyear", "cosine_weekofyear", "sine_hourofday", "cosine_hourofday",
|
||||||
|
"temperature-7"]
|
||||||
|
|
||||||
|
categorical_features = ["usaf", "wban", "p_k", "station_name"]
|
||||||
|
|
||||||
|
inputs_dc = ModelDataCollector(model_name="driftmodel",
|
||||||
|
identifier="inputs",
|
||||||
|
feature_names=feature_names)
|
||||||
|
|
||||||
|
prediction_dc = ModelDataCollector("driftmodel",
|
||||||
|
identifier="predictions",
|
||||||
|
feature_names=["temperature"])
|
||||||
|
|
||||||
|
|
||||||
|
def run(raw_data):
|
||||||
|
global inputs_dc, prediction_dc
|
||||||
|
|
||||||
|
try:
|
||||||
|
data = json.loads(raw_data)["data"]
|
||||||
|
data = pd.DataFrame(data)
|
||||||
|
|
||||||
|
# Remove the categorical features as the model expects OHE values
|
||||||
|
input_data = data.drop(categorical_features, axis=1)
|
||||||
|
|
||||||
|
result = model.predict(input_data)
|
||||||
|
|
||||||
|
# Collect the non-OHE dataframe
|
||||||
|
collected_df = data[feature_names]
|
||||||
|
|
||||||
|
inputs_dc.collect(collected_df.values)
|
||||||
|
prediction_dc.collect(result)
|
||||||
|
return result.tolist()
|
||||||
|
except Exception as e:
|
||||||
|
error = str(e)
|
||||||
|
|
||||||
|
print(error + time.strftime("%H:%M:%S"))
|
||||||
|
return error
|
||||||
@@ -153,7 +153,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"tensorboard-export-sample"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Export Run History to Tensorboard logs\n",
|
"# Export Run History to Tensorboard logs\n",
|
||||||
|
|||||||
@@ -227,7 +227,11 @@
|
|||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"tensorboard-sample"
|
||||||
|
]
|
||||||
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.tensorboard import Tensorboard\n",
|
"from azureml.tensorboard import Tensorboard\n",
|
||||||
|
|||||||
@@ -1,5 +1,6 @@
|
|||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
|
import os
|
||||||
|
|
||||||
import numpy as np
|
import numpy as np
|
||||||
|
|
||||||
@@ -131,6 +132,8 @@ def main():
|
|||||||
|
|
||||||
run.log("Accuracy", np.float(val_accuracy))
|
run.log("Accuracy", np.float(val_accuracy))
|
||||||
|
|
||||||
|
serializers.save_npz(os.path.join(args.output_dir, 'model.npz'), model)
|
||||||
|
|
||||||
|
|
||||||
if __name__ == '__main__':
|
if __name__ == '__main__':
|
||||||
main()
|
main()
|
||||||
|
|||||||
@@ -0,0 +1,45 @@
|
|||||||
|
import numpy as np
|
||||||
|
import os
|
||||||
|
import json
|
||||||
|
|
||||||
|
from chainer import serializers, using_config, Variable, datasets
|
||||||
|
import chainer.functions as F
|
||||||
|
import chainer.links as L
|
||||||
|
from chainer import Chain
|
||||||
|
|
||||||
|
from azureml.core.model import Model
|
||||||
|
|
||||||
|
|
||||||
|
class MyNetwork(Chain):
|
||||||
|
|
||||||
|
def __init__(self, n_mid_units=100, n_out=10):
|
||||||
|
super(MyNetwork, self).__init__()
|
||||||
|
with self.init_scope():
|
||||||
|
self.l1 = L.Linear(None, n_mid_units)
|
||||||
|
self.l2 = L.Linear(n_mid_units, n_mid_units)
|
||||||
|
self.l3 = L.Linear(n_mid_units, n_out)
|
||||||
|
|
||||||
|
def forward(self, x):
|
||||||
|
h = F.relu(self.l1(x))
|
||||||
|
h = F.relu(self.l2(h))
|
||||||
|
return self.l3(h)
|
||||||
|
|
||||||
|
|
||||||
|
def init():
|
||||||
|
global model
|
||||||
|
|
||||||
|
model_root = Model.get_model_path('chainer-dnn-mnist')
|
||||||
|
|
||||||
|
# Load our saved artifacts
|
||||||
|
model = MyNetwork()
|
||||||
|
serializers.load_npz(model_root, model)
|
||||||
|
|
||||||
|
|
||||||
|
def run(input_data):
|
||||||
|
i = np.array(json.loads(input_data)['data'])
|
||||||
|
|
||||||
|
_, test = datasets.get_mnist()
|
||||||
|
x = Variable(np.asarray([test[i][0]]))
|
||||||
|
y = model(x)
|
||||||
|
|
||||||
|
return np.ndarray.tolist(y.data.argmax(axis=1))
|
||||||
@@ -45,6 +45,16 @@
|
|||||||
"print(\"SDK version:\", azureml.core.VERSION)"
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"!jupyter nbextension install --py --user azureml.widgets\n",
|
||||||
|
"!jupyter nbextension enable --py --user azureml.widgets"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -121,6 +131,7 @@
|
|||||||
"except ComputeTargetException:\n",
|
"except ComputeTargetException:\n",
|
||||||
" print('Creating a new compute target...')\n",
|
" print('Creating a new compute target...')\n",
|
||||||
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
|
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
|
||||||
|
" min_nodes=2,\n",
|
||||||
" max_nodes=4)\n",
|
" max_nodes=4)\n",
|
||||||
"\n",
|
"\n",
|
||||||
" # create the cluster\n",
|
" # create the cluster\n",
|
||||||
@@ -206,7 +217,8 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"import shutil\n",
|
"import shutil\n",
|
||||||
"\n",
|
"\n",
|
||||||
"shutil.copy('chainer_mnist.py', project_folder)"
|
"shutil.copy('chainer_mnist.py', project_folder)\n",
|
||||||
|
"shutil.copy('chainer_score.py', project_folder)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -353,6 +365,7 @@
|
|||||||
"hyperdrive_config = HyperDriveConfig(estimator=estimator,\n",
|
"hyperdrive_config = HyperDriveConfig(estimator=estimator,\n",
|
||||||
" hyperparameter_sampling=param_sampling, \n",
|
" hyperparameter_sampling=param_sampling, \n",
|
||||||
" primary_metric_name='Accuracy',\n",
|
" primary_metric_name='Accuracy',\n",
|
||||||
|
" policy=BanditPolicy(evaluation_interval=1, slack_factor=0.1, delay_evaluation=3),\n",
|
||||||
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
|
" primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,\n",
|
||||||
" max_total_runs=8,\n",
|
" max_total_runs=8,\n",
|
||||||
" max_concurrent_runs=4)\n"
|
" max_concurrent_runs=4)\n"
|
||||||
@@ -398,14 +411,344 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"run.wait_for_completion(show_output=True)"
|
"hyperdrive_run.wait_for_completion(show_output=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Find and register best model\n",
|
||||||
|
"When all jobs finish, we can find out the one that has the highest accuracy."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"best_run = hyperdrive_run.get_best_run_by_primary_metric()\n",
|
||||||
|
"print(best_run.get_details()['runDefinition']['arguments'])"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now, let's list the model files uploaded during the run."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"print(best_run.get_file_names())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"We can then register the folder (and all files in it) as a model named `chainer-dnn-mnist` under the workspace for deployment"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model = best_run.register_model(model_name='chainer-dnn-mnist', model_path='outputs/model.npz')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Deploy the model in ACI\n",
|
||||||
|
"Now, we are ready to deploy the model as a web service running in Azure Container Instance, [ACI](https://azure.microsoft.com/en-us/services/container-instances/). Azure Machine Learning accomplishes this by constructing a Docker image with the scoring logic and model baked in.\n",
|
||||||
|
"\n",
|
||||||
|
"### Create scoring script\n",
|
||||||
|
"First, we will create a scoring script that will be invoked by the web service call.\n",
|
||||||
|
"+ Now that the scoring script must have two required functions, `init()` and `run(input_data)`.\n",
|
||||||
|
" + In `init()`, you typically load the model into a global object. This function is executed only once when the Docker contianer is started.\n",
|
||||||
|
" + In `run(input_data)`, the model is used to predict a value based on the input data. The input and output to `run` uses NPZ as the serialization and de-serialization format because it is the preferred format for Chainer, but you are not limited to it.\n",
|
||||||
|
" \n",
|
||||||
|
"Refer to the scoring script `chainer_score.py` for this tutorial. Our web service will use this file to predict. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service."
|
||||||
|
]
|
||||||
|
},
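A quick way to exercise `chainer_score.py` locally before deploying is to download the registered model into the folder layout that `Model.get_model_path` searches outside a service and then call `init()`/`run()` directly. This is a rough sketch only: it assumes chainer is installed locally, that the registered model is version 1, and that `azureml-models/<name>/<version>/` is where the local lookup happens.

```python
# Hedged local smoke test for chainer_score.py (run it before deploying).
import json

# Place model.npz where Model.get_model_path() can find it locally (version 1 assumed).
model.download(target_dir='azureml-models/chainer-dnn-mnist/1', exist_ok=True)

import chainer_score            # the scoring script in this folder
chainer_score.init()            # loads model.npz once, as the service would
print(chainer_score.run(json.dumps({'data': [0]})))   # score MNIST test image index 0
```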
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"shutil.copy('chainer_score.py', project_folder)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create myenv.yml\n",
|
||||||
|
"We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda packages `numpy` and `chainer`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.runconfig import CondaDependencies\n",
|
||||||
|
"\n",
|
||||||
|
"cd = CondaDependencies.create()\n",
|
||||||
|
"cd.add_conda_package('numpy')\n",
|
||||||
|
"cd.add_conda_package('chainer')\n",
|
||||||
|
"cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n",
|
||||||
|
"\n",
|
||||||
|
"print(cd.serialize_to_string())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Deploy to ACI\n",
|
||||||
|
"We are almost ready to deploy. Create a deployment configuration and specify the number of CPUs and gigabytes of RAM needed for your ACI container."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.webservice import AciWebservice\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,\n",
|
||||||
|
" auth_enabled=True, # this flag generates API keys to secure access\n",
|
||||||
|
" memory_gb=1,\n",
|
||||||
|
" tags={'name': 'mnist', 'framework': 'Chainer'},\n",
|
||||||
|
" description='Chainer DNN with MNIST')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"**Deployment Process**\n",
|
||||||
|
"\n",
|
||||||
|
"Now we can deploy. **This cell will run for about 7-8 minutes.** Behind the scenes, it will do the following:\n",
|
||||||
|
"\n",
|
||||||
|
"1. **Build Docker image**\n",
|
||||||
|
"Build a Docker image using the scoring file (chainer_score.py), the environment file (myenv.yml), and the model object.\n",
|
||||||
|
"2. **Register image**\n",
|
||||||
|
"Register that image under the workspace.\n",
|
||||||
|
"3. **Ship to ACI**\n",
|
||||||
|
"And finally ship the image to the ACI infrastructure, start up a container in ACI using that image, and expose an HTTP endpoint to accept REST client calls."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.image import ContainerImage\n",
|
||||||
|
"\n",
|
||||||
|
"imgconfig = ContainerImage.image_configuration(execution_script=\"chainer_score.py\", \n",
|
||||||
|
" runtime=\"python\", \n",
|
||||||
|
" conda_file=\"myenv.yml\")"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%%time\n",
|
||||||
|
"from azureml.core.webservice import Webservice\n",
|
||||||
|
"\n",
|
||||||
|
"service = Webservice.deploy_from_model(workspace=ws,\n",
|
||||||
|
" name='chainer-mnist-1',\n",
|
||||||
|
" deployment_config=aciconfig,\n",
|
||||||
|
" models=[model],\n",
|
||||||
|
" image_config=imgconfig)\n",
|
||||||
|
"\n",
|
||||||
|
"service.wait_for_deployment(show_output=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"print(service.get_logs())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"print(service.scoring_uri)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"**Tip: If something goes wrong with the deployment, the first thing to look at is the logs from the service by running the following command:** `print(service.get_logs())`"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"This is the scoring web service endpoint: `print(service.scoring_uri)`"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Test the deployed model\n",
|
||||||
|
"Let's test the deployed model. Pick a random sample from the test set, and send it to the web service hosted in ACI for a prediction. Note, here we are using the an HTTP request to invoke the service.\n",
|
||||||
|
"\n",
|
||||||
|
"We can retrieve the API keys used for accessing the HTTP endpoint and construct a raw HTTP request to send to the service. Don't forget to add key to the HTTP header."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# retreive the API keys. two keys were generated.\n",
|
||||||
|
"key1, Key2 = service.get_keys()\n",
|
||||||
|
"print(key1)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"%matplotlib inline\n",
|
||||||
|
"import matplotlib.pyplot as plt\n",
|
||||||
|
"import urllib\n",
|
||||||
|
"import gzip\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import struct\n",
|
||||||
|
"import requests\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"# load compressed MNIST gz files and return numpy arrays\n",
|
||||||
|
"def load_data(filename, label=False):\n",
|
||||||
|
" with gzip.open(filename) as gz:\n",
|
||||||
|
" struct.unpack('I', gz.read(4))\n",
|
||||||
|
" n_items = struct.unpack('>I', gz.read(4))\n",
|
||||||
|
" if not label:\n",
|
||||||
|
" n_rows = struct.unpack('>I', gz.read(4))[0]\n",
|
||||||
|
" n_cols = struct.unpack('>I', gz.read(4))[0]\n",
|
||||||
|
" res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)\n",
|
||||||
|
" res = res.reshape(n_items[0], n_rows * n_cols)\n",
|
||||||
|
" else:\n",
|
||||||
|
" res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)\n",
|
||||||
|
" res = res.reshape(n_items[0], 1)\n",
|
||||||
|
" return res\n",
|
||||||
|
"\n",
|
||||||
|
"os.makedirs('./data/mnist', exist_ok=True)\n",
|
||||||
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n",
|
||||||
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')\n",
|
||||||
|
"\n",
|
||||||
|
"X_test = load_data('./data/mnist/test-images.gz', False)\n",
|
||||||
|
"y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n",
|
||||||
|
"\n",
|
||||||
|
"\n",
|
||||||
|
"# send a random row from the test set to score\n",
|
||||||
|
"random_index = np.random.randint(0, len(X_test)-1)\n",
|
||||||
|
"input_data = \"{\\\"data\\\": [\" + str(random_index) + \"]}\"\n",
|
||||||
|
"\n",
|
||||||
|
"headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n",
|
||||||
|
"\n",
|
||||||
|
"# send sample to service for scoring\n",
|
||||||
|
"resp = requests.post(service.scoring_uri, input_data, headers=headers)\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"label:\", y_test[random_index])\n",
|
||||||
|
"print(\"prediction:\", resp.text[1])\n",
|
||||||
|
"\n",
|
||||||
|
"plt.imshow(X_test[random_index].reshape((28,28)), cmap='gray')\n",
|
||||||
|
"plt.axis('off')\n",
|
||||||
|
"plt.show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Let's look at the workspace after the web service was deployed. You should see\n",
|
||||||
|
"\n",
|
||||||
|
" + a registered model named 'chainer-dnn-mnist' and with the id 'chainer-dnn-mnist:1'\n",
|
||||||
|
" + an image called 'chainer-mnist-svc' and with a docker image location pointing to your workspace's Azure Container Registry (ACR)\n",
|
||||||
|
" + a webservice called 'chainer-mnist-svc' with some scoring URL"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"models = ws.models\n",
|
||||||
|
"for name, model in models.items():\n",
|
||||||
|
" print(\"Model: {}, ID: {}\".format(name, model.id))\n",
|
||||||
|
" \n",
|
||||||
|
"images = ws.images\n",
|
||||||
|
"for name, image in images.items():\n",
|
||||||
|
" print(\"Image: {}, location: {}\".format(name, image.image_location))\n",
|
||||||
|
" \n",
|
||||||
|
"webservices = ws.webservices\n",
|
||||||
|
"for name, webservice in webservices.items():\n",
|
||||||
|
" print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Clean up"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can delete the ACI deployment with a simple delete API call."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"service.delete()"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"authors": [
|
"authors": [
|
||||||
{
|
{
|
||||||
"name": "ninhu"
|
"name": "dipeck"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
@@ -424,7 +767,8 @@
|
|||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.6"
|
"version": "3.6.6"
|
||||||
}
|
},
|
||||||
|
"msauthor": "dipeck"
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 2
|
"nbformat_minor": 2
|
||||||
|
|||||||
@@ -4,4 +4,9 @@ dependencies:
|
|||||||
- azureml-sdk
|
- azureml-sdk
|
||||||
- azureml-widgets
|
- azureml-widgets
|
||||||
- numpy
|
- numpy
|
||||||
- pytest
|
- matplotlib
|
||||||
|
- json
|
||||||
|
- urllib
|
||||||
|
- gzip
|
||||||
|
- struct
|
||||||
|
- requests
|
||||||
|
|||||||
@@ -11,7 +11,7 @@ from azureml.core.model import Model
|
|||||||
|
|
||||||
def init():
|
def init():
|
||||||
global model
|
global model
|
||||||
model_path = Model.get_model_path('pytorch-hymenoptera')
|
model_path = Model.get_model_path('pytorch-birds')
|
||||||
model = torch.load(model_path, map_location=lambda storage, loc: storage)
|
model = torch.load(model_path, map_location=lambda storage, loc: storage)
|
||||||
model.eval()
|
model.eval()
|
||||||
|
|
||||||
@@ -22,7 +22,7 @@ def run(input_data):
|
|||||||
# get prediction
|
# get prediction
|
||||||
with torch.no_grad():
|
with torch.no_grad():
|
||||||
output = model(input_data)
|
output = model(input_data)
|
||||||
classes = ['ants', 'bees']
|
classes = ['chicken', 'turkey']
|
||||||
softmax = nn.Softmax(dim=1)
|
softmax = nn.Softmax(dim=1)
|
||||||
pred_probs = softmax(output).numpy()[0]
|
pred_probs = softmax(output).numpy()[0]
|
||||||
index = torch.argmax(output, 1)
|
index = torch.argmax(output, 1)
|
||||||
|
|||||||
@@ -165,8 +165,8 @@ def download_data():
|
|||||||
import urllib
|
import urllib
|
||||||
from zipfile import ZipFile
|
from zipfile import ZipFile
|
||||||
# download data
|
# download data
|
||||||
data_file = './hymenoptera_data.zip'
|
data_file = './fowl_data.zip'
|
||||||
download_url = 'https://download.pytorch.org/tutorial/hymenoptera_data.zip'
|
download_url = 'https://msdocsdatasets.blob.core.windows.net/pytorchfowl/fowl_data.zip'
|
||||||
urllib.request.urlretrieve(download_url, filename=data_file)
|
urllib.request.urlretrieve(download_url, filename=data_file)
|
||||||
|
|
||||||
# extract files
|
# extract files
|
||||||
|
|||||||
Binary image file changed (123 KiB → 1.6 MiB); not shown.
@@ -24,7 +24,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"In this tutorial, you will train, hyperparameter tune, and deploy a PyTorch model using the Azure Machine Learning (Azure ML) Python SDK.\n",
|
"In this tutorial, you will train, hyperparameter tune, and deploy a PyTorch model using the Azure Machine Learning (Azure ML) Python SDK.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This tutorial will train an image classification model using transfer learning, based on PyTorch's [Transfer Learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). The model is trained to classify ants and bees by first using a pretrained ResNet18 model that has been trained on the [ImageNet](http://image-net.org/index) dataset."
|
"This tutorial will train an image classification model using transfer learning, based on PyTorch's [Transfer Learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). The model is trained to classify chickens and turkeys by first using a pretrained ResNet18 model that has been trained on the [ImageNet](http://image-net.org/index) dataset."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -165,7 +165,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"import os\n",
|
"import os\n",
|
||||||
"\n",
|
"\n",
|
||||||
"project_folder = './pytorch-hymenoptera'\n",
|
"project_folder = './pytorch-birds'\n",
|
||||||
"os.makedirs(project_folder, exist_ok=True)"
|
"os.makedirs(project_folder, exist_ok=True)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -174,7 +174,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"### Download training data\n",
|
"### Download training data\n",
|
||||||
"The dataset we will use (located [here](https://download.pytorch.org/tutorial/hymenoptera_data.zip) as a zip file) consists of about 120 training images each for ants and bees, with 75 validation images for each class. [Hymenoptera](https://en.wikipedia.org/wiki/Hymenoptera) is the order of insects that includes ants and bees. We will download and extract the dataset as part of our training script `pytorch_train.py`"
|
"The dataset we will use (located on a public blob [here](https://msdocsdatasets.blob.core.windows.net/pytorchfowl/fowl_data.zip) as a zip file) consists of about 120 training images each for turkeys and chickens, with 100 validation images for each class. The images are a subset of the [Open Images v5 Dataset](https://storage.googleapis.com/openimages/web/index.html). We will download and extract the dataset as part of our training script `pytorch_train.py`"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -235,7 +235,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Experiment\n",
|
"from azureml.core import Experiment\n",
|
||||||
"\n",
|
"\n",
|
||||||
"experiment_name = 'pytorch-hymenoptera'\n",
|
"experiment_name = 'pytorch-birds'\n",
|
||||||
"experiment = Experiment(ws, name=experiment_name)"
|
"experiment = Experiment(ws, name=experiment_name)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -273,7 +273,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. Please note the following:\n",
|
"The `script_params` parameter is a dictionary containing the command-line arguments to your training script `entry_script`. Please note the following:\n",
|
||||||
"- We passed our training data reference `ds_data` to our script's `--data_dir` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the training data `hymenoptera_data` on our datastore.\n",
|
"- We passed our training data reference `ds_data` to our script's `--data_dir` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the training data `fowl_data` on our datastore.\n",
|
||||||
"- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by Azure ML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. In this tutorial, we will save our trained model to this output directory.\n",
|
"- We specified the output directory as `./outputs`. The `outputs` directory is specially treated by Azure ML in that all the content in this directory gets uploaded to your workspace as part of your run history. The files written to this directory are therefore accessible even once your remote run is over. In this tutorial, we will save our trained model to this output directory.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"To leverage the Azure VM's GPU for training, we set `use_gpu=True`."
|
"To leverage the Azure VM's GPU for training, we set `use_gpu=True`."
|
||||||
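For readers skimming the diff, the estimator cell this markdown describes looks roughly like the sketch below. `ds_data`, `compute_target`, and `project_folder` come from earlier cells that are not part of this diff, and the exact argument list of `pytorch_train.py` is an assumption.

```python
# Hedged sketch of the estimator cell the markdown above refers to.
from azureml.train.dnn import PyTorch

script_params = {
    '--data_dir': ds_data,        # datastore reference mounted on the remote compute
    '--output_dir': './outputs'   # contents of ./outputs are uploaded to run history
}

estimator = PyTorch(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    entry_script='pytorch_train.py',
                    use_gpu=True)
```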
@@ -481,7 +481,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"model = best_run.register_model(model_name = 'pytorch-hymenoptera', model_path = 'outputs/model.pt')\n",
|
"model = best_run.register_model(model_name = 'pytorch-birds', model_path = 'outputs/model.pt')\n",
|
||||||
"print(model.name, model.id, model.version, sep = '\\t')"
|
"print(model.name, model.id, model.version, sep = '\\t')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -503,7 +503,7 @@
|
|||||||
"* `init()`: In this function, you typically load the model into a `global` object. This function is executed only once when the Docker container is started. \n",
|
"* `init()`: In this function, you typically load the model into a `global` object. This function is executed only once when the Docker container is started. \n",
|
||||||
"* `run(input_data)`: In this function, the model is used to predict a value based on the input data. The input and output typically use JSON as serialization and deserialization format, but you are not limited to that.\n",
|
"* `run(input_data)`: In this function, the model is used to predict a value based on the input data. The input and output typically use JSON as serialization and deserialization format, but you are not limited to that.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Refer to the scoring script `pytorch_score.py` for this tutorial. Our web service will use this file to predict whether an image is an ant or a bee. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service."
|
"Refer to the scoring script `pytorch_score.py` for this tutorial. Our web service will use this file to predict whether an image is a chicken or a turkey. When writing your own scoring script, don't forget to test it locally first before you go and deploy the web service."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -549,7 +549,7 @@
|
|||||||
"image_config = ContainerImage.image_configuration(execution_script='pytorch_score.py', \n",
|
"image_config = ContainerImage.image_configuration(execution_script='pytorch_score.py', \n",
|
||||||
" runtime='python', \n",
|
" runtime='python', \n",
|
||||||
" conda_file='myenv.yml',\n",
|
" conda_file='myenv.yml',\n",
|
||||||
" description='Image with hymenoptera model')"
|
" description='Image with bird model')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -570,8 +570,8 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n",
|
||||||
" memory_gb=1, \n",
|
" memory_gb=1, \n",
|
||||||
" tags={'data': 'hymenoptera', 'method':'transfer learning', 'framework':'pytorch'},\n",
|
" tags={'data': 'birds', 'method':'transfer learning', 'framework':'pytorch'},\n",
|
||||||
" description='Classify ants/bees using transfer learning with PyTorch')"
|
" description='Classify turkey/chickens using transfer learning with PyTorch')"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -591,7 +591,7 @@
|
|||||||
"%%time\n",
|
"%%time\n",
|
||||||
"from azureml.core.webservice import Webservice\n",
|
"from azureml.core.webservice import Webservice\n",
|
||||||
"\n",
|
"\n",
|
||||||
"service_name = 'aci-hymenoptera'\n",
|
"service_name = 'aci-birds'\n",
|
||||||
"service = Webservice.deploy_from_model(workspace=ws,\n",
|
"service = Webservice.deploy_from_model(workspace=ws,\n",
|
||||||
" name=service_name,\n",
|
" name=service_name,\n",
|
||||||
" models=[model],\n",
|
" models=[model],\n",
|
||||||
@@ -659,6 +659,7 @@
|
|||||||
"from PIL import Image\n",
|
"from PIL import Image\n",
|
||||||
"import matplotlib.pyplot as plt\n",
|
"import matplotlib.pyplot as plt\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
"%matplotlib inline\n",
|
||||||
"plt.imshow(Image.open('test_img.jpg'))"
|
"plt.imshow(Image.open('test_img.jpg'))"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -0,0 +1,123 @@
|
|||||||
|
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||||
|
# Licensed under the MIT License.
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
import argparse
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import tensorflow as tf
|
||||||
|
|
||||||
|
from azureml.core import Run
|
||||||
|
from utils import load_data
|
||||||
|
|
||||||
|
print("TensorFlow version:", tf.VERSION)
|
||||||
|
|
||||||
|
parser = argparse.ArgumentParser()
|
||||||
|
parser.add_argument('--data-folder', type=str, dest='data_folder', help='data folder mounting point')
|
||||||
|
|
||||||
|
parser.add_argument('--resume-from', type=str, default=None,
|
||||||
|
help='location of the model or checkpoint files from where to resume the training')
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
|
||||||
|
previous_model_location = args.resume_from
|
||||||
|
# You can also use environment variable to get the model/checkpoint files location
|
||||||
|
# previous_model_location = os.path.expandvars(os.getenv("AZUREML_DATAREFERENCE_MODEL_LOCATION", None))
|
||||||
|
|
||||||
|
data_folder = os.path.join(args.data_folder, 'mnist')
|
||||||
|
|
||||||
|
print('training dataset is stored here:', data_folder)
|
||||||
|
|
||||||
|
X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0
|
||||||
|
X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0
|
||||||
|
|
||||||
|
y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)
|
||||||
|
y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)
|
||||||
|
|
||||||
|
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep='\n')
|
||||||
|
training_set_size = X_train.shape[0]
|
||||||
|
|
||||||
|
n_inputs = 28 * 28
|
||||||
|
n_h1 = 100
|
||||||
|
n_h2 = 100
|
||||||
|
n_outputs = 10
|
||||||
|
learning_rate = 0.01
|
||||||
|
n_epochs = 20
|
||||||
|
batch_size = 50
|
||||||
|
|
||||||
|
with tf.name_scope('network'):
|
||||||
|
# construct the DNN
|
||||||
|
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name='X')
|
||||||
|
y = tf.placeholder(tf.int64, shape=(None), name='y')
|
||||||
|
h1 = tf.layers.dense(X, n_h1, activation=tf.nn.relu, name='h1')
|
||||||
|
h2 = tf.layers.dense(h1, n_h2, activation=tf.nn.relu, name='h2')
|
||||||
|
output = tf.layers.dense(h2, n_outputs, name='output')
|
||||||
|
|
||||||
|
with tf.name_scope('train'):
|
||||||
|
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=output)
|
||||||
|
loss = tf.reduce_mean(cross_entropy, name='loss')
|
||||||
|
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
|
||||||
|
train_op = optimizer.minimize(loss)
|
||||||
|
|
||||||
|
with tf.name_scope('eval'):
|
||||||
|
correct = tf.nn.in_top_k(output, y, 1)
|
||||||
|
acc_op = tf.reduce_mean(tf.cast(correct, tf.float32))
|
||||||
|
|
||||||
|
init = tf.global_variables_initializer()
|
||||||
|
saver = tf.train.Saver()
|
||||||
|
|
||||||
|
# start an Azure ML run
|
||||||
|
run = Run.get_context()
|
||||||
|
|
||||||
|
with tf.Session() as sess:
|
||||||
|
start_epoch = 0
|
||||||
|
if previous_model_location:
|
||||||
|
checkpoint_file_path = tf.train.latest_checkpoint(previous_model_location)
|
||||||
|
saver.restore(sess, checkpoint_file_path)
|
||||||
|
checkpoint_filename = os.path.basename(checkpoint_file_path)
|
||||||
|
num_found = re.search(r'\d+', checkpoint_filename)
|
||||||
|
if num_found:
|
||||||
|
start_epoch = int(num_found.group(0))
|
||||||
|
print("Resuming from epoch {}".format(str(start_epoch)))
|
||||||
|
else:
|
||||||
|
init.run()
|
||||||
|
|
||||||
|
for epoch in range(start_epoch, n_epochs):
|
||||||
|
|
||||||
|
# randomly shuffle training set
|
||||||
|
indices = np.random.permutation(training_set_size)
|
||||||
|
X_train = X_train[indices]
|
||||||
|
y_train = y_train[indices]
|
||||||
|
|
||||||
|
# batch index
|
||||||
|
b_start = 0
|
||||||
|
b_end = b_start + batch_size
|
||||||
|
for _ in range(training_set_size // batch_size):
|
||||||
|
# get a batch
|
||||||
|
X_batch, y_batch = X_train[b_start: b_end], y_train[b_start: b_end]
|
||||||
|
|
||||||
|
# update batch index for the next batch
|
||||||
|
b_start = b_start + batch_size
|
||||||
|
b_end = min(b_start + batch_size, training_set_size)
|
||||||
|
|
||||||
|
# train
|
||||||
|
sess.run(train_op, feed_dict={X: X_batch, y: y_batch})
|
||||||
|
# evaluate training set
|
||||||
|
acc_train = acc_op.eval(feed_dict={X: X_batch, y: y_batch})
|
||||||
|
# evaluate validation set
|
||||||
|
acc_val = acc_op.eval(feed_dict={X: X_test, y: y_test})
|
||||||
|
|
||||||
|
# log accuracies
|
||||||
|
run.log('training_acc', np.float(acc_train))
|
||||||
|
run.log('validation_acc', np.float(acc_val))
|
||||||
|
print(epoch, '-- Training accuracy:', acc_train, '\b Validation accuracy:', acc_val)
|
||||||
|
y_hat = np.argmax(output.eval(feed_dict={X: X_test}), axis=1)
|
||||||
|
|
||||||
|
if epoch % 5 == 0:
|
||||||
|
saver.save(sess, './outputs/', global_step=epoch)
|
||||||
|
|
||||||
|
# saving only half of the model and resuming again from same epoch
|
||||||
|
if not previous_model_location and epoch == 10:
|
||||||
|
break
|
||||||
|
|
||||||
|
run.log('final_acc', np.float(acc_val))
|
||||||
@@ -0,0 +1,487 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copyright (c) Microsoft Corporation. All rights reserved.\n",
|
||||||
|
"\n",
|
||||||
|
"Licensed under the MIT License."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Resuming Tensorflow training from previous run\n",
|
||||||
|
"In this tutorial, you will resume a mnist model in TensorFlow from a previously submitted run."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Prerequisites\n",
|
||||||
|
"* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning (AML)\n",
|
||||||
|
"* Go through the [configuration notebook](../../../configuration.ipynb) to:\n",
|
||||||
|
" * install the AML SDK\n",
|
||||||
|
" * create a workspace and its configuration file (`config.json`)\n",
|
||||||
|
"* Review the [tutorial](../train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) on single-node TensorFlow training using the SDK"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Check core SDK version number\n",
|
||||||
|
"import azureml.core\n",
|
||||||
|
"\n",
|
||||||
|
"print(\"SDK version:\", azureml.core.VERSION)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Diagnostics\n",
|
||||||
|
"Opt-in diagnostics for better experience, quality, and security of future releases."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {
|
||||||
|
"tags": [
|
||||||
|
"Diagnostics"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.telemetry import set_diagnostics_collection\n",
|
||||||
|
"\n",
|
||||||
|
"set_diagnostics_collection(send_diagnostics=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Initialize workspace\n",
|
||||||
|
"Initialize a [Workspace](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.workspace import Workspace\n",
|
||||||
|
"\n",
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print('Workspace name: ' + ws.name, \n",
|
||||||
|
" 'Azure region: ' + ws.location, \n",
|
||||||
|
" 'Subscription id: ' + ws.subscription_id, \n",
|
||||||
|
" 'Resource group: ' + ws.resource_group, sep='\\n')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Create or Attach existing AmlCompute\n",
|
||||||
|
"You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for training your model. In this tutorial, you create `AmlCompute` as your training compute resource.\n",
|
||||||
|
"\n",
|
||||||
|
"**Creation of AmlCompute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
|
||||||
|
"\n",
|
||||||
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||||
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
|
"\n",
|
||||||
|
"# choose a name for your cluster\n",
|
||||||
|
"cluster_name = \"gpu-cluster\"\n",
|
||||||
|
"\n",
|
||||||
|
"try:\n",
|
||||||
|
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
||||||
|
" print('Found existing compute target.')\n",
|
||||||
|
"except ComputeTargetException:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n",
|
||||||
|
" max_nodes=4)\n",
|
||||||
|
"\n",
|
||||||
|
" # create the cluster\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
|
||||||
|
"\n",
|
||||||
|
" compute_target.wait_for_completion(show_output=True)\n",
|
||||||
|
"\n",
|
||||||
|
"# use get_status() to get a detailed status for the current cluster. \n",
|
||||||
|
"print(compute_target.get_status().serialize())"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"The above code creates a GPU cluster. If you instead want to create a CPU cluster, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Upload data to datastore\n",
|
||||||
|
"To make data accessible for remote training, AML provides a convenient way to do so via a [Datastore](https://docs.microsoft.com/azure/machine-learning/service/how-to-access-data). The datastore provides a mechanism for you to upload/download data to Azure Storage, and interact with it from your remote compute targets. \n",
|
||||||
|
"\n",
|
||||||
|
"If your data is already stored in Azure, or you download the data as part of your training script, you will not need to do this step. For this tutorial, although you can download the data in your training script, we will demonstrate how to upload the training data to a datastore and access it during training to illustrate the datastore functionality."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"First download the data from Yan LeCun's web site directly and save them in a data folder locally."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import os\n",
|
||||||
|
"import urllib\n",
|
||||||
|
"\n",
|
||||||
|
"os.makedirs('./data/mnist', exist_ok=True)\n",
|
||||||
|
"\n",
|
||||||
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n",
|
||||||
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n",
|
||||||
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n",
|
||||||
|
"urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Each workspace is associated with a default datastore. In this tutorial, we will upload the training data to this default datastore."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ds = ws.get_default_datastore()\n",
|
||||||
|
"print(ds.datastore_type, ds.account_name, ds.container_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Upload MNIST data to the default datastore."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"For convenience, let's get a reference to the datastore. In the next section, we can then pass this reference to our training script's `--data-folder` argument. "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"ds_data = ds.as_mount()\n",
|
||||||
|
"print(ds_data)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Train model on the remote compute"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a project directory\n",
|
||||||
|
"Create a directory that will contain all the necessary code from your local machine that you will need access to on the remote resource. This includes the training script, and any additional files your training script depends on."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"script_folder = './tf-resume-training'\n",
|
||||||
|
"os.makedirs(script_folder, exist_ok=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Copy the training script `tf_mnist_with_checkpoint.py` into this project directory."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import shutil\n",
|
||||||
|
"\n",
|
||||||
|
"# the training logic is in the tf_mnist_with_checkpoint.py file.\n",
|
||||||
|
"shutil.copy('./tf_mnist_with_checkpoint.py', script_folder)\n",
|
||||||
|
"\n",
|
||||||
|
"# the utils.py just helps loading data from the downloaded MNIST dataset into numpy arrays.\n",
|
||||||
|
"shutil.copy('./utils.py', script_folder)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create an experiment\n",
|
||||||
|
"Create an [Experiment](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#experiment) to track all the runs in your workspace for this distributed TensorFlow tutorial. "
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Experiment\n",
|
||||||
|
"\n",
|
||||||
|
"experiment_name = 'tf-resume-training'\n",
|
||||||
|
"experiment = Experiment(ws, name=experiment_name)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Create a TensorFlow estimator\n",
|
||||||
|
"The AML SDK's TensorFlow estimator enables you to easily submit TensorFlow training jobs for both single-node and distributed runs. For more information on the TensorFlow estimator, refer [here](https://docs.microsoft.com/azure/machine-learning/service/how-to-train-tensorflow).\n",
|
||||||
|
"\n",
|
||||||
|
"The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.train.dnn import TensorFlow\n",
|
||||||
|
"\n",
|
||||||
|
"script_params={\n",
|
||||||
|
" '--data-folder': ds_data\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"estimator= TensorFlow(source_directory=script_folder,\n",
|
||||||
|
" compute_target=compute_target,\n",
|
||||||
|
" script_params=script_params,\n",
|
||||||
|
" entry_script='tf_mnist_with_checkpoint.py')"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"In the above code, we passed our training data reference `ds_data` to our script's `--data-folder` argument. This will 1) mount our datastore on the remote compute and 2) provide the path to the data zip file on our datastore."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Submit job\n",
|
||||||
|
"### Run your experiment by submitting your estimator object. Note that this call is asynchronous."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run = experiment.submit(estimator)\n",
|
||||||
|
"print(run)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Monitor your run\n",
|
||||||
|
"You can monitor the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"RunDetails(run).show()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Alternatively, you can block until the script has completed training before running more code."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run.wait_for_completion(show_output=True)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# Now let's resume the training from the above run"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"First, we will get the DataPath to the outputs directory of the above run which\n",
|
||||||
|
"contains the checkpoint files and/or model"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"model_location = run._get_outputs_datapath()"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now, we will create a new TensorFlow estimator and pass in the model location. On passing 'resume_from' parameter, a new entry in script_params is created with key as 'resume_from' and value as the model/checkpoint files location and the location gets automatically mounted on the compute target."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from azureml.train.dnn import TensorFlow\n",
|
||||||
|
"\n",
|
||||||
|
"script_params={\n",
|
||||||
|
" '--data-folder': ds_data\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"estimator2 = TensorFlow(source_directory=script_folder,\n",
|
||||||
|
" compute_target=compute_target,\n",
|
||||||
|
" script_params=script_params,\n",
|
||||||
|
" entry_script='tf_mnist_with_checkpoint.py',\n",
|
||||||
|
" resume_from=model_location)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now you can submit the experiment and it should resume from previous run's checkpoint files."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run2 = experiment.submit(estimator2)\n",
|
||||||
|
"print(run2)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"run2.wait_for_completion(show_output=True)"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"authors": [
|
||||||
|
{
|
||||||
|
"name": "hesuri"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.6"
|
||||||
|
},
|
||||||
|
"msauthor": "hesuri"
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
|
||||||
@@ -0,0 +1,5 @@
|
|||||||
|
name: train-tensorflow-resume-training
|
||||||
|
dependencies:
|
||||||
|
- pip:
|
||||||
|
- azureml-sdk
|
||||||
|
- azureml-widgets
|
||||||
@@ -0,0 +1,27 @@
|
|||||||
|
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||||
|
# Licensed under the MIT License.
|
||||||
|
|
||||||
|
import gzip
|
||||||
|
import numpy as np
|
||||||
|
import struct
|
||||||
|
|
||||||
|
|
||||||
|
# load compressed MNIST gz files and return numpy arrays
|
||||||
|
def load_data(filename, label=False):
|
||||||
|
with gzip.open(filename) as gz:
|
||||||
|
struct.unpack('I', gz.read(4))
|
||||||
|
n_items = struct.unpack('>I', gz.read(4))
|
||||||
|
if not label:
|
||||||
|
n_rows = struct.unpack('>I', gz.read(4))[0]
|
||||||
|
n_cols = struct.unpack('>I', gz.read(4))[0]
|
||||||
|
res = np.frombuffer(gz.read(n_items[0] * n_rows * n_cols), dtype=np.uint8)
|
||||||
|
res = res.reshape(n_items[0], n_rows * n_cols)
|
||||||
|
else:
|
||||||
|
res = np.frombuffer(gz.read(n_items[0]), dtype=np.uint8)
|
||||||
|
res = res.reshape(n_items[0], 1)
|
||||||
|
return res
|
||||||
|
|
||||||
|
|
||||||
|
# one-hot encode a 1-D array
|
||||||
|
def one_hot_encode(array, num_of_classes):
|
||||||
|
return np.eye(num_of_classes)[array.reshape(-1)]
|
||||||
@@ -100,7 +100,7 @@
|
|||||||
"\n",
|
"\n",
|
||||||
"# Check core SDK version number\n",
|
"# Check core SDK version number\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"This notebook was created using SDK version 1.0.48\r\n, you are currently running version\", azureml.core.VERSION)"
|
"print(\"This notebook was created using SDK version 1.0.53, you are currently running version\", azureml.core.VERSION)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|||||||
@@ -120,19 +120,42 @@
|
|||||||
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
"As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"If we could not find the cluster with the given name, then we will create a new cluster here. We will create an `AmlCompute` cluster of `STANDARD_D2_V2` CPU VMs. This process is broken down into 3 steps:\n",
|
||||||
|
"1. create the configuration (this step is local and only takes a second)\n",
|
||||||
|
"2. create the cluster (this step will take about **20 seconds**)\n",
|
||||||
|
"3. provision the VMs to bring the cluster to the initial size (of 1 in this case). This step will take about **3-5 minutes** and is providing only sparse output in the process. Please make sure to wait until the call returns before moving to the next cell"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core.compute import ComputeTarget\n",
|
"from azureml.core.compute import ComputeTarget, AmlCompute\n",
|
||||||
|
"from azureml.core.compute_target import ComputeTargetException\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# choose a name for your cluster\n",
|
"# choose a name for your cluster\n",
|
||||||
"cluster_name = \"cpu-cluster\"\n",
|
"cluster_name = \"cpu-cluster\"\n",
|
||||||
"\n",
|
"\n",
|
||||||
"compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
"try:\n",
|
||||||
"print('Found existing compute target.')\n",
|
" compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n",
|
||||||
|
" print('Found existing compute target')\n",
|
||||||
|
"except ComputeTargetException:\n",
|
||||||
|
" print('Creating a new compute target...')\n",
|
||||||
|
" compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n",
|
||||||
|
" max_nodes=4)\n",
|
||||||
|
"\n",
|
||||||
|
" # create the cluster\n",
|
||||||
|
" compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n",
|
||||||
|
"\n",
|
||||||
|
" # can poll for a minimum number of nodes and for a specific timeout. \n",
|
||||||
|
" # if no min node count is provided it uses the scale settings for the cluster\n",
|
||||||
|
" compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# use get_status() to get a detailed status for the current cluster. \n",
|
"# use get_status() to get a detailed status for the current cluster. \n",
|
||||||
"print(compute_target.get_status().serialize())"
|
"print(compute_target.get_status().serialize())"
|
||||||
@@ -142,7 +165,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The above code retrieves an existing CPU compute target. Scikit-learn does not support GPU computing."
|
"The above code retrieves a CPU compute target. Scikit-learn does not support GPU computing."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
@@ -289,7 +312,7 @@
|
|||||||
" script_params=script_params,\n",
|
" script_params=script_params,\n",
|
||||||
" compute_target=compute_target,\n",
|
" compute_target=compute_target,\n",
|
||||||
" entry_script='train_iris.py',\n",
|
" entry_script='train_iris.py',\n",
|
||||||
" pip_packages=['joblib']\n",
|
" pip_packages=['joblib==0.13.2']\n",
|
||||||
" )"
|
" )"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -507,7 +530,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"model = best_run.register_model(model_name='sklearn-iris', model_path='model.joblib')"
|
"model = best_run.register_model(model_name='sklearn-iris', model_path='outputs/model.joblib')"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|||||||
@@ -1,6 +1,7 @@
|
|||||||
# Modified from https://www.geeksforgeeks.org/multiclass-classification-using-scikit-learn/
|
# Modified from https://www.geeksforgeeks.org/multiclass-classification-using-scikit-learn/
|
||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
|
import os
|
||||||
|
|
||||||
# importing necessary libraries
|
# importing necessary libraries
|
||||||
import numpy as np
|
import numpy as np
|
||||||
@@ -50,8 +51,9 @@ def main():
|
|||||||
cm = confusion_matrix(y_test, svm_predictions)
|
cm = confusion_matrix(y_test, svm_predictions)
|
||||||
print(cm)
|
print(cm)
|
||||||
|
|
||||||
# save model
|
os.makedirs('outputs', exist_ok=True)
|
||||||
joblib.dump(svm_model_linear, 'model.joblib')
|
# files saved in the "outputs" folder are automatically uploaded into run history
|
||||||
|
joblib.dump(svm_model_linear, 'outputs/model.joblib')
|
||||||
|
|
||||||
|
|
||||||
if __name__ == '__main__':
|
if __name__ == '__main__':
|
||||||
|
|||||||
@@ -102,7 +102,7 @@
|
|||||||
"source": [
|
"source": [
|
||||||
"import azureml.core\n",
|
"import azureml.core\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"This notebook was created using version 1.0.48\r\n of the Azure ML SDK\")\n",
|
"print(\"This notebook was created using version 1.0.53 of the Azure ML SDK\")\n",
|
||||||
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
"print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -1,385 +1,385 @@
|
|||||||
{
|
{
|
||||||
"cells": [
|
"cells": [
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Copyright (c) Microsoft Corporation. All rights reserved."
|
"Copyright (c) Microsoft Corporation. All rights reserved."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# Tutorial: Train your first model"
|
"# Tutorial: Train your first model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"This tutorial is **part two of a two-part tutorial series**. In the previous tutorial, you created a workspace and chose a development environment. In this tutorial, you learn the foundational design patterns in Azure Machine Learning service, and train a simple scikit-learn model based on the diabetes data set. After completing this tutorial, you will have the practical knowledge of the SDK to scale up to developing more-complex experiments and workflows. \n",
|
"This tutorial is **part two of a two-part tutorial series**. In the previous tutorial, you created a workspace and chose a development environment. In this tutorial, you learn the foundational design patterns in Azure Machine Learning service, and train a simple scikit-learn model based on the diabetes data set. After completing this tutorial, you will have the practical knowledge of the SDK to scale up to developing more-complex experiments and workflows. \n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this tutorial, you learn the following tasks:\n",
|
"In this tutorial, you learn the following tasks:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"> * Connect your workspace and create an experiment \n",
|
"> * Connect your workspace and create an experiment \n",
|
||||||
"> * Load data and train a scikit-learn model\n",
|
"> * Load data and train a scikit-learn model\n",
|
||||||
"> * View training results in the portal\n",
|
"> * View training results in the portal\n",
|
||||||
"> * Retrieve the best model"
|
"> * Retrieve the best model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Prerequisites\n",
|
"## Prerequisites\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The only prerequisite is to run the previous tutorial, Setup environment and workspace."
|
"The only prerequisite is to run the previous tutorial, Setup environment and workspace."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Connect workspace and create experiment"
|
"## Connect workspace and create experiment"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Import the `Workspace` class, and load your subscription information from the file `config.json` using the function `from_config().` This looks for the JSON file in the current directory by default, but you can also specify a path parameter to point to the file using `from_config(path=\"your/file/path\")`. If you are running this notebook in a cloud notebook server in your workspace, the file is automatically in the root directory.\n",
|
"Import the `Workspace` class, and load your subscription information from the file `config.json` using the function `from_config().` This looks for the JSON file in the current directory by default, but you can also specify a path parameter to point to the file using `from_config(path=\"your/file/path\")`. If you are running this notebook in a cloud notebook server in your workspace, the file is automatically in the root directory.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If the following code asks for additional authentication, simply paste the link in a browser and enter the authentication token."
|
"If the following code asks for additional authentication, simply paste the link in a browser and enter the authentication token."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Workspace\n",
|
"from azureml.core import Workspace\n",
|
||||||
"ws = Workspace.from_config()"
|
"ws = Workspace.from_config()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Now create an experiment in your workspace. An experiment is another foundational cloud resource that represents a collection of trials (individual model runs). In this tutorial you use the experiment to create runs and track your model training in the Azure Portal. Parameters include your workspace reference, and a string name for the experiment."
|
"Now create an experiment in your workspace. An experiment is another foundational cloud resource that represents a collection of trials (individual model runs). In this tutorial you use the experiment to create runs and track your model training in the Azure Portal. Parameters include your workspace reference, and a string name for the experiment."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Experiment\n",
|
"from azureml.core import Experiment\n",
|
||||||
"experiment = Experiment(workspace=ws, name=\"diabetes-experiment\")"
|
"experiment = Experiment(workspace=ws, name=\"diabetes-experiment\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Load data and prepare for training"
|
"## Load data and prepare for training"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"For this tutorial, you use the diabetes data set, which is a pre-normalized data set included in scikit-learn. This data set uses features like age, gender, and BMI to predict diabetes disease progression. Load the data from the `load_diabetes()` static function, and split it into training and test sets using `train_test_split()`. This function segregates the data so the model has unseen data to use for testing following training."
|
"For this tutorial, you use the diabetes data set, which is a pre-normalized data set included in scikit-learn. This data set uses features like age, gender, and BMI to predict diabetes disease progression. Load the data from the `load_diabetes()` static function, and split it into training and test sets using `train_test_split()`. This function segregates the data so the model has unseen data to use for testing following training."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from sklearn.datasets import load_diabetes\n",
|
"from sklearn.datasets import load_diabetes\n",
|
||||||
"from sklearn.model_selection import train_test_split\n",
|
"from sklearn.model_selection import train_test_split\n",
|
||||||
"\n",
|
"\n",
|
||||||
"X, y = load_diabetes(return_X_y = True)\n",
|
"X, y = load_diabetes(return_X_y = True)\n",
|
||||||
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=66)"
|
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=66)"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Train a model"
|
"## Train a model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Training a simple scikit-learn model can easily be done locally for small-scale training, but when training many iterations with dozens of different feature permutations and hyperparameter settings, it is easy to lose track of what models you've trained and how you trained them. The following design pattern shows how to leverage the SDK to easily keep track of your training in the cloud.\n",
|
"Training a simple scikit-learn model can easily be done locally for small-scale training, but when training many iterations with dozens of different feature permutations and hyperparameter settings, it is easy to lose track of what models you've trained and how you trained them. The following design pattern shows how to leverage the SDK to easily keep track of your training in the cloud.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Build a script that trains ridge models in a loop through different hyperparameter alpha values."
|
"Build a script that trains ridge models in a loop through different hyperparameter alpha values."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from sklearn.linear_model import Ridge\n",
|
"from sklearn.linear_model import Ridge\n",
|
||||||
"from sklearn.metrics import mean_squared_error\n",
|
"from sklearn.metrics import mean_squared_error\n",
|
||||||
"from sklearn.externals import joblib\n",
|
"from sklearn.externals import joblib\n",
|
||||||
"import math\n",
|
"import math\n",
|
||||||
"\n",
|
"\n",
|
||||||
"alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\n",
|
"alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]\n",
|
||||||
"\n",
|
"\n",
|
||||||
"for alpha in alphas:\n",
|
"for alpha in alphas:\n",
|
||||||
" run = experiment.start_logging()\n",
|
" run = experiment.start_logging()\n",
|
||||||
" run.log(\"alpha_value\", alpha)\n",
|
" run.log(\"alpha_value\", alpha)\n",
|
||||||
" \n",
|
" \n",
|
||||||
" model = Ridge(alpha=alpha)\n",
|
" model = Ridge(alpha=alpha)\n",
|
||||||
" model.fit(X=X_train, y=y_train)\n",
|
" model.fit(X=X_train, y=y_train)\n",
|
||||||
" y_pred = model.predict(X=X_test)\n",
|
" y_pred = model.predict(X=X_test)\n",
|
||||||
" rmse = math.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred))\n",
|
" rmse = math.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred))\n",
|
||||||
" run.log(\"rmse\", rmse)\n",
|
" run.log(\"rmse\", rmse)\n",
|
||||||
" \n",
|
" \n",
|
||||||
" model_name = \"model_alpha_\" + str(alpha) + \".pkl\"\n",
|
" model_name = \"model_alpha_\" + str(alpha) + \".pkl\"\n",
|
||||||
" filename = \"outputs/\" + model_name\n",
|
" filename = \"outputs/\" + model_name\n",
|
||||||
" \n",
|
" \n",
|
||||||
" joblib.dump(value=model, filename=filename)\n",
|
" joblib.dump(value=model, filename=filename)\n",
|
||||||
" run.upload_file(name=model_name, path_or_stream=filename)\n",
|
" run.upload_file(name=model_name, path_or_stream=filename)\n",
|
||||||
" run.complete()"
|
" run.complete()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"The above code accomplishes the following:\n",
|
"The above code accomplishes the following:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. For each alpha hyperparameter value in the `alphas` array, a new run is created within the experiment. The alpha value is logged to differentiate between each run.\n",
|
"1. For each alpha hyperparameter value in the `alphas` array, a new run is created within the experiment. The alpha value is logged to differentiate between each run.\n",
|
||||||
"1. In each run, a Ridge model is instantiated, trained, and used to run predictions. The root-mean-squared-error is calculated for the actual versus predicted values, and then logged to the run. At this point the run has metadata attached for both the alpha value and the rmse accuracy.\n",
|
"1. In each run, a Ridge model is instantiated, trained, and used to run predictions. The root-mean-squared-error is calculated for the actual versus predicted values, and then logged to the run. At this point the run has metadata attached for both the alpha value and the rmse accuracy.\n",
|
||||||
"1. Next, the model for each run is serialized and uploaded to the run. This allows you to download the model file from the run in the portal.\n",
|
"1. Next, the model for each run is serialized and uploaded to the run. This allows you to download the model file from the run in the portal.\n",
|
||||||
"1. At the end of each iteration the run is completed by calling `run.complete()`.\n",
|
"1. At the end of each iteration the run is completed by calling `run.complete()`.\n",
|
||||||
"\n"
|
"\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"After the training has completed, call the `experiment` variable to fetch a link to the experiment in the portal."
|
"After the training has completed, call the `experiment` variable to fetch a link to the experiment in the portal."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"experiment"
|
"experiment"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## View training results in portal"
|
"## View training results in portal"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Following the **Link to Azure Portal** takes you to the main experiment page. Here you see all the individual runs in the experiment. Any custom-logged values (`alpha_value` and `rmse`, in this case) become fields for each run, and also become available for the charts and tiles at the top of the experiment page. To add a logged metric to a chart or tile, hover over it, click the edit button, and find your custom-logged metric.\n",
|
"Following the **Link to Azure Portal** takes you to the main experiment page. Here you see all the individual runs in the experiment. Any custom-logged values (`alpha_value` and `rmse`, in this case) become fields for each run, and also become available for the charts and tiles at the top of the experiment page. To add a logged metric to a chart or tile, hover over it, click the edit button, and find your custom-logged metric.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"When training models at scale over hundreds and thousands of runs, this page makes it easy to see every model you trained, specifically how they were trained, and how your unique metrics have changed over time."
|
"When training models at scale over hundreds and thousands of runs, this page makes it easy to see every model you trained, specifically how they were trained, and how your unique metrics have changed over time."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Clicking on a run number link in the `RUN NUMBER` column takes you to the page for each individual run. The default tab **Details** shows you more-detailed information on each run. Navigate to the **Outputs** tab, and you see the `.pkl` file for the model that was uploaded to the run during each training iteration. Here you can download the model file, rather than having to retrain it manually."
|
"Clicking on a run number link in the `RUN NUMBER` column takes you to the page for each individual run. The default tab **Details** shows you more-detailed information on each run. Navigate to the **Outputs** tab, and you see the `.pkl` file for the model that was uploaded to the run during each training iteration. Here you can download the model file, rather than having to retrain it manually."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
""
|
""
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Get the best model"
|
"## Get the best model"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"In addition to being able to download model files from the experiment in the portal, you can also download them programmatically. The following code iterates through each run in the experiment, and accesses both the logged run metrics and the run details (which contains the run_id). This keeps track of the best run, in this case the run with the lowest root-mean-squared-error."
|
"In addition to being able to download model files from the experiment in the portal, you can also download them programmatically. The following code iterates through each run in the experiment, and accesses both the logged run metrics and the run details (which contains the run_id). This keeps track of the best run, in this case the run with the lowest root-mean-squared-error."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"minimum_rmse_runid = None\n",
|
"minimum_rmse_runid = None\n",
|
||||||
"minimum_rmse = None\n",
|
"minimum_rmse = None\n",
|
||||||
"\n",
|
"\n",
|
||||||
"for run in experiment.get_runs():\n",
|
"for run in experiment.get_runs():\n",
|
||||||
" run_metrics = run.get_metrics()\n",
|
" run_metrics = run.get_metrics()\n",
|
||||||
" run_details = run.get_details()\n",
|
" run_details = run.get_details()\n",
|
||||||
" # each logged metric becomes a key in this returned dict\n",
|
" # each logged metric becomes a key in this returned dict\n",
|
||||||
" run_rmse = run_metrics[\"rmse\"]\n",
|
" run_rmse = run_metrics[\"rmse\"]\n",
|
||||||
" run_id = run_details[\"runId\"]\n",
|
" run_id = run_details[\"runId\"]\n",
|
||||||
" \n",
|
" \n",
|
||||||
" if minimum_rmse is None:\n",
|
" if minimum_rmse is None:\n",
|
||||||
" minimum_rmse = run_rmse\n",
|
" minimum_rmse = run_rmse\n",
|
||||||
" minimum_rmse_runid = run_id\n",
|
" minimum_rmse_runid = run_id\n",
|
||||||
" else:\n",
|
" else:\n",
|
||||||
" if run_rmse < minimum_rmse:\n",
|
" if run_rmse < minimum_rmse:\n",
|
||||||
" minimum_rmse = run_rmse\n",
|
" minimum_rmse = run_rmse\n",
|
||||||
" minimum_rmse_runid = run_id\n",
|
" minimum_rmse_runid = run_id\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(\"Best run_id: \" + minimum_rmse_runid)\n",
|
"print(\"Best run_id: \" + minimum_rmse_runid)\n",
|
||||||
"print(\"Best run_id rmse: \" + str(minimum_rmse)) "
|
"print(\"Best run_id rmse: \" + str(minimum_rmse)) "
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Use the best run id to fetch the individual run using the `Run` constructor along with the experiment object. Then call `get_file_names()` to see all the files available for download from this run. In this case, you only uploaded one file for each run during training."
|
"Use the best run id to fetch the individual run using the `Run` constructor along with the experiment object. Then call `get_file_names()` to see all the files available for download from this run. In this case, you only uploaded one file for each run during training."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from azureml.core import Run\n",
|
"from azureml.core import Run\n",
|
||||||
"best_run = Run(experiment=experiment, run_id=minimum_rmse_runid)\n",
|
"best_run = Run(experiment=experiment, run_id=minimum_rmse_runid)\n",
|
||||||
"print(best_run.get_file_names())"
|
"print(best_run.get_file_names())"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Call `download()` on the run object, specifying the model file name to download. By default this function downloads to the current directory."
|
"Call `download()` on the run object, specifying the model file name to download. By default this function downloads to the current directory."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"best_run.download_file(name=\"model_alpha_0.1.pkl\")"
|
"best_run.download_file(name=\"model_alpha_0.1.pkl\")"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
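For reference, `Run.download_file` also accepts an `output_file_path` argument when you want the file somewhere other than the current directory. The sketch below is not part of the original notebook; it assumes the model was serialized with joblib during training and that the `joblib` package is available in your environment.

```python
# Hypothetical usage sketch: download the pickled model to an explicit path and
# load it back for local inspection. Assumes the file was written with joblib.
import joblib

best_run.download_file(name="model_alpha_0.1.pkl",
                       output_file_path="./downloaded_model.pkl")
model = joblib.load("./downloaded_model.pkl")
print(model)
```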
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Clean up resources\n",
|
"## Clean up resources\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Do not complete this section if you plan on running other Azure Machine Learning service tutorials.\n",
|
"Do not complete this section if you plan on running other Azure Machine Learning service tutorials.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Stop the notebook VM\n",
|
"### Stop the notebook VM\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you used a cloud notebook server, stop the VM when you are not using it to reduce cost.\n",
|
"If you used a cloud notebook server, stop the VM when you are not using it to reduce cost.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. In your workspace, select **Notebook VMs**.\n",
|
"1. In your workspace, select **Notebook VMs**.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. From the list, select the VM.\n",
|
"1. From the list, select the VM.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. Select **Stop**.\n",
|
"1. Select **Stop**.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. When you're ready to use the server again, select **Start**.\n",
|
"1. When you're ready to use the server again, select **Start**.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"### Delete everything\n",
|
"### Delete everything\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you don't plan to use the resources you created, delete them, so you don't incur any charges:\n",
|
"If you don't plan to use the resources you created, delete them, so you don't incur any charges:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. In the Azure portal, select **Resource groups** on the far left.\n",
|
"1. In the Azure portal, select **Resource groups** on the far left.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. From the list, select the resource group you created.\n",
|
"1. From the list, select the resource group you created.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. Select **Delete resource group**.\n",
|
"1. Select **Delete resource group**.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. Enter the resource group name. Then select **Delete**.\n",
|
"1. Enter the resource group name. Then select **Delete**.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"You can also keep the resource group but delete a single workspace. Display the workspace properties and select **Delete**."
|
"You can also keep the resource group but delete a single workspace. Display the workspace properties and select **Delete**."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Next steps\n",
|
"## Next steps\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this tutorial, you did the following tasks:\n",
|
"In this tutorial, you did the following tasks:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"> * Connected your workspace and created an experiment\n",
|
"> * Connected your workspace and created an experiment\n",
|
||||||
"> * Loaded data and trained scikit-learn models\n",
|
"> * Loaded data and trained scikit-learn models\n",
|
||||||
"> * Viewed training results in the portal and retrieved models\n",
|
"> * Viewed training results in the portal and retrieved models\n",
|
||||||
"\n",
|
"\n",
|
||||||
"[Deploy your model](https://docs.microsoft.com/azure/machine-learning/service/tutorial-deploy-models-with-aml) with Azure Machine Learning.\n",
|
"[Deploy your model](https://docs.microsoft.com/azure/machine-learning/service/tutorial-deploy-models-with-aml) with Azure Machine Learning.\n",
|
||||||
"Learn how to develop [automated machine learning](https://docs.microsoft.com/azure/machine-learning/service/tutorial-auto-train-models) experiments."
|
"Learn how to develop [automated machine learning](https://docs.microsoft.com/azure/machine-learning/service/tutorial-auto-train-models) experiments."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
|
||||||
"metadata": {
|
|
||||||
"authors": [
|
|
||||||
{
|
|
||||||
"name": "trbye"
|
|
||||||
}
|
|
||||||
],
|
],
|
||||||
"kernelspec": {
|
"metadata": {
|
||||||
"display_name": "Python 3.6",
|
"authors": [
|
||||||
"language": "python",
|
{
|
||||||
"name": "python36"
|
"name": "trbye"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3.6",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python36"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.6.5"
|
||||||
|
},
|
||||||
|
"msauthor": "trbye"
|
||||||
},
|
},
|
||||||
"language_info": {
|
"nbformat": 4,
|
||||||
"codemirror_mode": {
|
"nbformat_minor": 2
|
||||||
"name": "ipython",
|
|
||||||
"version": 3
|
|
||||||
},
|
|
||||||
"file_extension": ".py",
|
|
||||||
"mimetype": "text/x-python",
|
|
||||||
"name": "python",
|
|
||||||
"nbconvert_exporter": "python",
|
|
||||||
"pygments_lexer": "ipython3",
|
|
||||||
"version": "3.6.5"
|
|
||||||
},
|
|
||||||
"msauthor": "trbye"
|
|
||||||
},
|
|
||||||
"nbformat": 4,
|
|
||||||
"nbformat_minor": 2
|
|
||||||
}
|
}
|
||||||
45
work-with-data/dataprep/data/ADLSgen2-datapreptest.crt
Normal file
45
work-with-data/dataprep/data/ADLSgen2-datapreptest.crt
Normal file
@@ -0,0 +1,45 @@
|
|||||||
|
-----BEGIN PRIVATE KEY-----
|
||||||
|
MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQC/C0oc6vvF1UEc
|
||||||
|
y9JeGDXdtKynG11wTTIHIokFhNinHNSpJBLmNWFyFkqzvjJCPR4kWuqw4IXhCS3L
|
||||||
|
VoqRmT680SvUFFF6HnEaa75Bc1YSACn1ZsHuCRGrqO9BaTgt3mM0sRYC67+f+W0E
|
||||||
|
tA+k+EA0XnTtDdEBX3RLzvaYAR4yijEHIBQeeNemPYK4msW6Xw67ib1xn59blX4Z
|
||||||
|
a4Z85FjrekmoTl9493bFj6znDTX6wpKsPF7WLEF9S+oD/Lg4EHBi9BfefFxQpGZ9
|
||||||
|
FQHToFKyz1tA2iaY/9LjCtJcincMkuXt3KuQA4Nv2GiTzz4+FEy1pOqHnyNL2tFR
|
||||||
|
1G5n04BHAgMBAAECggEAAqcXeltQ76hMZSf3XdMcPF3b394jaAHKZgr2uBrmHzvp
|
||||||
|
QAf+MzAekET6+I/1hrHujzar95TGhx9ngWFMP0VPd7O31hQKJZXyoBlK5QHC+jEC
|
||||||
|
ZCPvIW0Cz81itRfO7eQeoIas9ZFscb4240/Uv8eqrI97NCdy9X/rz3mqNuYdEzqN
|
||||||
|
2v9XlwE/Fyx79O1PQqzPRiQt3n4ss9NO169y7X99KUZtYiZAiyBBGS8wYdaGF69G
|
||||||
|
URZ3qwoUE+nByZdeRfFLLTy+UDCOwQZV+0V4p0J++YLqQAac340A1F4D60qzMHnv
|
||||||
|
KVKnMc+RrYYVFOZU+USRlphSl3Ws5j0u94CiLitK4QKBgQDivJVHNmk1JleI/MPF
|
||||||
|
bx/YT5gzcVRFhGxkGso12JrQiFPs05JmoRFaqNBDNoZYDn2ggUrMwZVfPI5C6+7U
|
||||||
|
tCe2vrjVpvcAO9reK1u4N9ohpUpkocxWQy0nNHlrorDTZnyKreRtPC87W8xpiwl4
|
||||||
|
R/+nMgGd8vex7tGfchpThj8ZeQKBgQDXs2sgpE8vmnZBWrXAuGD8M9VnfcALEjwL
|
||||||
|
Fi3NR+XCr8jHkeIJVbSI2/asWsBGg8v6gV6Cdx9KV9r+fHDzdocS85X4P7crP83A
|
||||||
|
IX2rTT6Hsmc170SzCDa2jJJyLHQ6qtXBS9ZW8/dPFc1fiBf0NcmTLrRoNg5N8Px6
|
||||||
|
Qt0T51q3vwKBgQCYAfhOetMD2AW9iEAzwDFoUsxmSKdHx+TnI/LHMMVx4sPpNVqk
|
||||||
|
RX2d+ylMtmRQ6r4cejHMnkfnRnDVutkubu1lHe5LBpn35Sjx472k/oTWI7uBRdv5
|
||||||
|
RSYjb5GrsLG9uKrsSnKnLT85G20qoRUjN5nU3LiqzPZ0qviMXfH6ZzkseQKBgQCT
|
||||||
|
ft6MTY7QUGD4w5xxEiNPkeolgHmnmGpyclITg0x7WlSDEyBrna17wF3m8Y91KH58
|
||||||
|
56XGtMoyvezEBDgAY1ZuAR7VyEvqSRDahow2bPWLONUWrmxduAohvfIOHJPF4jeU
|
||||||
|
m9UPVHgSHih3YMpwda9G87LtZ7lUVqtutvYRvCvuZQKBgAypo514DZW7Y9lMCgkR
|
||||||
|
GpJLKCWFR0Sl9bQXI7N5nAG0YFz5ZhdA1PjS2tj+OKyWR6wekbv3g0CyVXT4XYsi
|
||||||
|
tKRu9PR2OUQLPv/h2qLAeSOYdScfWoOU5tlb4tkLoUNmj5/N9VpqbvLdDh6hPWQL
|
||||||
|
o4s+29QYKEoNmOrcZ6oRkRP8
|
||||||
|
-----END PRIVATE KEY-----
|
||||||
|
-----BEGIN CERTIFICATE-----
|
||||||
|
MIICoTCCAYkCAgPoMA0GCSqGSIb3DQEBBQUAMBQxEjAQBgNVBAMMCUNMSS1Mb2dp
|
||||||
|
bjAiGA8yMDE5MDUwMzIwMDIwOVoYDzIwMjAwNTAzMjAwMjExWjAUMRIwEAYDVQQD
|
||||||
|
DAlDTEktTG9naW4wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC/C0oc
|
||||||
|
6vvF1UEcy9JeGDXdtKynG11wTTIHIokFhNinHNSpJBLmNWFyFkqzvjJCPR4kWuqw
|
||||||
|
4IXhCS3LVoqRmT680SvUFFF6HnEaa75Bc1YSACn1ZsHuCRGrqO9BaTgt3mM0sRYC
|
||||||
|
67+f+W0EtA+k+EA0XnTtDdEBX3RLzvaYAR4yijEHIBQeeNemPYK4msW6Xw67ib1x
|
||||||
|
n59blX4Za4Z85FjrekmoTl9493bFj6znDTX6wpKsPF7WLEF9S+oD/Lg4EHBi9Bfe
|
||||||
|
fFxQpGZ9FQHToFKyz1tA2iaY/9LjCtJcincMkuXt3KuQA4Nv2GiTzz4+FEy1pOqH
|
||||||
|
nyNL2tFR1G5n04BHAgMBAAEwDQYJKoZIhvcNAQEFBQADggEBAGz3pOgNPESr+QoO
|
||||||
|
OVCgSS6VtWlmrAcxl5JaiNBFpBGAqfvbfRe1eZY7Rn6fuw1jc3pPBVzNTf8Plel+
|
||||||
|
DcuLzDLJAEag2GpRE+Xg57DNSwPqP6jZfHRE/ufLwIRLcNG9wRUwqlBvdAu1Kign
|
||||||
|
nlTZvTEAwxlQdvmIIT1XrTLZ+OwtVXcgrf0vInmueZKz/UDqsSDPY+d426S9eOWt
|
||||||
|
60h2WgXPU3QvBYfA6Yd2ReeP3+SHwBd4/1ByNFWBytcI9ow3pp2JznU366dfX4IQ
|
||||||
|
Q0iOTvHzXbfPmtsxqho6+hBbLvXVNWJMg8e22Pp/TyXYqeV5V09k18EgCnuA/9Gd
|
||||||
|
kKDVROA=
|
||||||
|
-----END CERTIFICATE-----
|
||||||
@@ -222,7 +222,7 @@
|
|||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.8"
|
"version": "3.6.4"
|
||||||
},
|
},
|
||||||
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
|
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
|
||||||
},
|
},
|
||||||
|
|||||||
@@ -47,6 +47,7 @@
|
|||||||
"[Read PostgreSQL](#postgresql)<br>\n",
|
"[Read PostgreSQL](#postgresql)<br>\n",
|
||||||
"[Read From Azure Blob](#azure-blob)<br>\n",
|
"[Read From Azure Blob](#azure-blob)<br>\n",
|
||||||
"[Read From ADLS](#adls)<br>\n",
|
"[Read From ADLS](#adls)<br>\n",
|
||||||
|
"[Read From ADLSGen2](#adlsgen2)<br>\n",
|
||||||
"[Read Pandas DataFrame](#pandas-df)<br>"
|
"[Read Pandas DataFrame](#pandas-df)<br>"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -315,6 +316,25 @@
|
|||||||
"df"
|
"df"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You can see in the results that the FBI Code column now contains some NaN values where before, when calling head, it didn't. By default, `to_pandas_dataframe` attempts to coalesce columns into a single type for better performance and lower memory overhead. This specific column has a mixutre of both numbers and strings and the strings were replaced with NaN values.\n",
|
||||||
|
"\n",
|
||||||
|
"If you wish to keep the mixed-type column in the Pandas DataFrame, you can set the `extended_types` argument to True when calling `to_pandas_dataframe`."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"df = dflow_skipped_rows.to_pandas_dataframe(extended_types=True)\n",
|
||||||
|
"df"
|
||||||
|
]
|
||||||
|
},
|
||||||
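As a hedged follow-up (not part of the original notebook), you can confirm that the mixed types were preserved by counting the Python types present in the column; the column name below is taken from the prose above and may differ in your data.

```python
# Count how many values in the mixed-type column are strings vs. numbers after
# reading with extended_types=True. "FBI Code" is assumed from the text above.
print(df["FBI Code"].map(type).value_counts())
```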
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -635,7 +655,7 @@
|
|||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"df = dflow.to_pandas_dataframe()\n",
|
"df = dflow.to_pandas_dataframe(extended_types=True)\n",
|
||||||
"df.dtypes"
|
"df.dtypes"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@@ -751,7 +771,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"There are two ways the Data Prep API can acquire the necessary OAuth token to access Azure DataLake Storage:\n",
|
"Data Prep currently supports both ADLS and ADLSGen2. There are two ways the Data Prep API can acquire the necessary OAuth token to access Azure DataLake Storage:\n",
|
||||||
"1. Retrieve the access token from a recent login session of the user's [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) login.\n",
|
"1. Retrieve the access token from a recent login session of the user's [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) login.\n",
|
||||||
"2. Use a ServicePrincipal (SP) and a certificate as a secret."
|
"2. Use a ServicePrincipal (SP) and a certificate as a secret."
|
||||||
]
|
]
|
||||||
@@ -883,6 +903,70 @@
|
|||||||
"dflow.to_pandas_dataframe().head()"
|
"dflow.to_pandas_dataframe().head()"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"<a id=\"adlsgen2\"></a>"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Read from ADLSGen2"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Please refer to the Read for ADLS section above to get details of how to register a Service Principal and obtain an OAuth access token.[ADLS](http://localhost:8888/notebooks/notebooks/how-to-guides/data-ingestion.ipynb#adls)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Configure ADLSGen2 Account for ServicePrincipal"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"certThumbprint = '23:66:84:6B:3A:14:9E:B1:17:CA:EE:E3:BB:2C:21:2D:20:B0:DF:F2'\n",
|
||||||
|
"certificate = ''\n",
|
||||||
|
"with open('../data/ADLSgen2-datapreptest.crt', 'rt', encoding='utf-8') as crtFile:\n",
|
||||||
|
" certificate = crtFile.read()\n",
|
||||||
|
"\n",
|
||||||
|
"servicePrincipalAppId = \"127a58c3-f307-46a1-969e-a6b63da3f411\""
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"### Acquire an OAuth Access Token for ADLSGen2"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import adal\n",
|
||||||
|
"from azureml.dataprep.api.datasources import ADLSGen2\n",
|
||||||
|
"\n",
|
||||||
|
"ctx = adal.AuthenticationContext('https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47')\n",
|
||||||
|
"token = ctx.acquire_token_with_client_certificate('https://storage.azure.com/', servicePrincipalAppId, certificate, certThumbprint)\n",
|
||||||
|
"dflow = dprep.read_csv(path = ADLSGen2(path='https://adlsgen2datapreptest.dfs.core.windows.net/datapreptest/people.csv', accessToken=token['accessToken']))\n",
|
||||||
|
"dflow.to_pandas_dataframe().head()"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
@@ -923,7 +1007,24 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"After loading in the data you can now do `read_pandas_dataframe`."
|
"After loading in the data you can now do `read_pandas_dataframe`. If you only need to consume the Dataflow created from the current environment, you can read the DataFrame in memory."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"dflow_df = dprep.read_pandas_dataframe(df, in_memory=True)\n",
|
||||||
|
"dflow_df.head(5)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"However, if you intend to use this Dataflow past the end of your current Python session (such as by saving the Dataflow to a file), you can provide a cache directory where the contents of the DataFrame will be stored so they can be retrieved later."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
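A minimal sketch of that second case follows. It is not part of the original notebook and assumes that `read_pandas_dataframe` exposes a `temp_folder` argument naming the cache directory; check the Data Prep SDK version you have installed.

```python
# Hypothetical sketch: back the Dataflow with an on-disk cache so it remains
# usable after this Python session ends. temp_folder is assumed to be the
# parameter that names the cache directory.
dflow_cached = dprep.read_pandas_dataframe(df, temp_folder="./dataflow_cache")
dflow_cached.head(5)
```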
{
|
{
|
||||||
|
|||||||
@@ -183,6 +183,37 @@
|
|||||||
"dflow_adls = dprep.read_csv(path=DataPath(datastore, path_on_datastore='/input/crime0-10.csv'))\n",
|
"dflow_adls = dprep.read_csv(path=DataPath(datastore, path_on_datastore='/input/crime0-10.csv'))\n",
|
||||||
"dflow_adls.head(5)"
|
"dflow_adls.head(5)"
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Now you can read all the files in the `dataprep_adlsgen2` datastore which references an ADLSGen2 Storage account."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# read a file from ADLSGen2\n",
|
||||||
|
"datastore = Datastore(workspace=workspace, name='adlsgen2')\n",
|
||||||
|
"dflow_adlsgen2 = dprep.read_csv(path=DataPath(datastore, path_on_datastore='/testfolder/peopletest.csv'))\n",
|
||||||
|
"dflow_adlsgen2.head(5)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# read all files from ADLSGen2 directory\n",
|
||||||
|
"datastore = Datastore(workspace=workspace, name='adlsgen2')\n",
|
||||||
|
"dflow_adlsgen2 = dprep.read_csv(path=DataPath(datastore, path_on_datastore='/testfolder/testdir'))\n",
|
||||||
|
"dflow_adlsgen2.head()"
|
||||||
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
|
|||||||
@@ -186,7 +186,7 @@
|
|||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"Now we have successfully split the data into useful columns through examples. "
|
"Now we have successfully split the data into useful columns through examples."
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|||||||