update samples - test

Merge pull request #621 from Azure/ak/revert-db-overwrite
Revert automatic overwrite of databricks content
2025-12-20 09:37:04 -05:00 · 2019-10-15 22:01:55 +00:00 · 2019-10-15 16:07:37 -04:00
10 changed files with 27 additions and 150 deletions
--- a/how-to-use-azureml/azure-databricks/Databricks_AMLSDK_1-4_6.dbc
+++ b/how-to-use-azureml/azure-databricks/Databricks_AMLSDK_1-4_6.dbc
--- a/how-to-use-azureml/azure-databricks/README.md
+++ b/how-to-use-azureml/azure-databricks/README.md
@@ -21,49 +21,9 @@ Notebook 6 is an Automated ML sample notebook for Classification.

 Learn more about [how to use Azure Databricks as a development environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#azure-databricks) for Azure Machine Learning service.

-**Databricks as a Compute Target from Azure ML Pipelines**
+**Databricks as a Compute Target from AML Pipelines**
 You can use Azure Databricks as a compute target from [Azure Machine Learning Pipelines](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines). Take a look at this notebook for details: [aml-pipelines-use-databricks-as-compute-target.ipynb](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/azure-databricks/databricks-as-remote-compute-target/aml-pipelines-use-databricks-as-compute-target.ipynb). 

-# Linked Azure Databricks and Azure Machine Learning Workspaces (Preview)
-Customers can now link Azure Databricks and AzureML Workspaces to better enable cross-Azure ML scenarios by [managing their tracking data in a single place when using the MLflow client](https://mlflow.org/docs/latest/tracking.html#mlflow-tracking) - the Azure ML workspace.
-
-## Linking the Workspaces (Admin operation)
-
-1. The Azure Databricks Azure portal blade now includes a new button to link an Azure ML workspace.
-![New ADB Portal Link button](./img/adb-link-button.png)
-2. Both a new or existing Azure ML Workspace can be linked in the resulting prompt. Follow any instructions to set up the Azure ML Workspace.
-![Link Prompt](./img/link-prompt.png)
-3. After a successful link operation, you should see the Azure Databricks overview reflect the linked status
-![Linked Successfully](./img/adb-successful-link.png)
-
-## Configure MLflow to send data to Azure ML (All roles)
-
-1. Add azureml-mlflow as a library to any notebook or cluster that should send data to Azure ML. You can do this via:
-    1. [DBUtils](https://docs.azuredatabricks.net/user-guide/dev-tools/dbutils.html#dbutils-library)
-        ```
-        dbutils.library.installPyPI("azureml-mlflow")
-        dbutils.library.restartPython()  # Removes Python state
-        ```
-    2. [Cluster Libraries](https://docs.azuredatabricks.net/user-guide/libraries.html#install-a-library-on-a-cluster)
-    ![Cluster Library](./img/cluster-library.png)
-2. [Set the MLflow tracking URI](https://mlflow.org/docs/latest/tracking.html#where-runs-are-recorded) to the following scheme:
-    ```
-    adbazureml://${azuremlRegion}.experiments.azureml.net/history/v1.0/subscriptions/${azuremlSubscriptionId}/resourceGroups/${azuremlResourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${azuremlWorkspaceName}
-    ```
-    1. You can automatically configure this on your clusters for all subsequent notebook sessions using this helper script instead of manually setting the tracking URI in the notebook:
-        * [AzureML Tracking Cluster Init Script](./linking/README.md)
-3. If configured correctly, you'll now be able to see your MLflow tracking data in both Azure ML (via the REST API and all clients) and Azure Databricks (in the MLflow UI and using the MLflow client)
-
-
-## Known Preview Limitations
-While we roll this experience out to customers for feedback, there are some known limitations we'd love comments on in addition to any other issues seen in your workflow.
-### 1-to-1 Workspace linking
-Currently, an Azure ML Workspace can only be linked to one Azure Databricks Workspace at a time.
-### Data synchronization
-At the moment, data is only generated in the Azure Machine Learning workspace for tracking. Editing tags via the Azure Databricks MLflow UI won't be reflected in the Azure ML UI.
-### Java and R support
-The experience currently is only available from the Python MLflow client.
-
 For more on SDK concepts, please refer to [notebooks](https://github.com/Azure/MachineLearningNotebooks).

 **Please let us know your feedback.**
--- a/how-to-use-azureml/azure-databricks/img/adb-link-button.png
+++ b/how-to-use-azureml/azure-databricks/img/adb-link-button.png
--- a/how-to-use-azureml/azure-databricks/img/adb-successful-link.png
+++ b/how-to-use-azureml/azure-databricks/img/adb-successful-link.png
--- a/how-to-use-azureml/azure-databricks/img/cluster-library.png
+++ b/how-to-use-azureml/azure-databricks/img/cluster-library.png
--- a/how-to-use-azureml/azure-databricks/img/link-prompt.png
+++ b/how-to-use-azureml/azure-databricks/img/link-prompt.png
--- a/how-to-use-azureml/azure-databricks/linking/README.md
+++ b/how-to-use-azureml/azure-databricks/linking/README.md
@@ -1,56 +0,0 @@
-# Adding an init script to an Azure Databricks cluster
-
-The [azureml-cluster-init.sh](./azureml-cluster-init.sh) script configures the environment to
-1. Use the configured AzureML Workspace with Workspace.from_config()
-2. Set the default MLflow Tracking Server to be the AzureML managed one
-
-Modify azureml-cluster-init.sh by providing the values for region, subscriptionId, resourceGroupName, and workspaceName of your target Azure ML workspace in the highlighted section at the top of the script.
-
-To create the Azure Databricks cluster-scoped init script
-
-1. Create the base directory you want to store the init script in if it does not exist.
-    ```
-    dbutils.fs.mkdirs("dbfs:/databricks/<directory>/")
-    ```
-
-2. Create the script by copying the contents of azureml-cluster-init.sh
-    ```
-    dbutils.fs.put("/databricks/<directory>/azureml-cluster-init.sh","""
-    <configured_contents_of_azureml-cluster-init.sh>
-    """, True)
-
-3. Check that the script exists.
-    ```
-    display(dbutils.fs.ls("dbfs:/databricks/<directory>/azureml-cluster-init.sh"))
-    ```
-
-1. Configure the cluster to run the script.
-    * Using the cluster configuration page
-        1. On the cluster configuration page, click the Advanced Options toggle.
-        1. At the bottom of the page, click the Init Scripts tab.
-        1. In the Destination drop-down, select a destination type. Example: 'DBFS'
-        1. Specify a path to the init script.
-            ```
-            dbfs:/databricks/<directory>/azureml-cluster-init.sh
-            ```
-        1. Click Add
-
-    * Using the API.
-        ```
-        curl -n -X POST -H 'Content-Type: application/json' -d '{
-        "cluster_id": "<cluster_id>",
-        "num_workers": <num_workers>,
-        "spark_version": "<spark_version>",
-        "node_type_id": "<node_type_id>",
-        "cluster_log_conf": {
-            "dbfs" : {
-            "destination": "dbfs:/cluster-logs"
-            }
-        },
-        "init_scripts": [ {
-            "dbfs": {
-            "destination": "dbfs:/databricks/<directory>/azureml-cluster-init.sh"
-            }
-        } ]
-        }' https://<databricks-instance>/api/2.0/clusters/edit
-        ```
--- a/how-to-use-azureml/azure-databricks/linking/azureml-cluster-init.sh
+++ b/how-to-use-azureml/azure-databricks/linking/azureml-cluster-init.sh
@@ -1,24 +0,0 @@
-#!/bin/bash
-# This script configures the environment to
-# 1. Use the configured AzureML Workspace with azureml.core.Workspace.from_config()
-# 2. Set the default MLflow Tracking Server to be the AzureML managed one
-
-############## START CONFIGURATION #################
-# Provide the required *AzureML* workspace information
-region="" # example: westus2
-subscriptionId="" # example: bcb65f42-f234-4bff-91cf-9ef816cd9936
-resourceGroupName="" # example: dev-rg
-workspaceName="" # example: myazuremlws
-
-# Optional config directory
-configLocation="/databricks/config.json"
-############### END CONFIGURATION #################
-
-
-# Drop the workspace configuration on the cluster
-sudo touch $configLocation
-sudo echo {\\"subscription_id\\": \\"${subscriptionId}\\", \\"resource_group\\": \\"${resourceGroupName}\\", \\"workspace_name\\": \\"${workspaceName}\\"} > $configLocation
-
-# Set the MLflow Tracking URI
-trackingUri="adbazureml://${region}.experiments.azureml.net/history/v1.0/subscriptions/${subscriptionId}/resourceGroups/${resourceGroupName}/providers/Microsoft.MachineLearningServices/workspaces/${workspaceName}"
-sudo echo export MLFLOW_TRACKING_URI=${trackingUri} >> /databricks/spark/conf/spark-env.sh
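Note on the removed init script: it derives two values from the four workspace settings, namely the JSON written to `config.json` and the `adbazureml://` tracking URI. The sketch below rebuilds both in Python, using the example values from the script's own comments. The function names are hypothetical and belong to neither azureml nor MLflow. Only the URI layout and the three JSON keys are taken from the diff.

```python
import json


def build_tracking_uri(region, subscription_id, resource_group, workspace_name):
    """Assemble the tracking URI in the layout used by the removed script."""
    return (
        f"adbazureml://{region}.experiments.azureml.net/history/v1.0"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace_name}"
    )


def build_workspace_config(subscription_id, resource_group, workspace_name):
    """Return the JSON text the script wrote to its config location."""
    return json.dumps(
        {
            "subscription_id": subscription_id,
            "resource_group": resource_group,
            "workspace_name": workspace_name,
        }
    )


# Example values copied from the comments in azureml-cluster-init.sh.
uri = build_tracking_uri(
    "westus2", "bcb65f42-f234-4bff-91cf-9ef816cd9936", "dev-rg", "myazuremlws"
)
config_text = build_workspace_config(
    "bcb65f42-f234-4bff-91cf-9ef816cd9936", "dev-rg", "myazuremlws"
)
```

Building the JSON with `json.dumps` avoids the hand-escaped quotes in the shell version. Whether the linked-workspace preview still accepts this URI scheme is not something this diff confirms.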
--- a/how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb
+++ b/how-to-use-azureml/monitor-models/data-drift/azure-ml-datadrift.ipynb
@@ -361,7 +361,7 @@
      "outputs": [],
      "source": [
        "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn', 'joblib', 'lightgbm', 'pandas'],\n",
-        "                                 pip_packages=['azureml-monitoring', 'azureml-sdk[automl]'])\n",
+        "                                 pip_packages=['azureml-monitoring', 'azureml-defaults'])\n",
        "\n",
        "with open(\"myenv.yml\",\"w\") as f:\n",
        "    f.write(myenv.serialize_to_string())"
@@ -626,7 +626,8 @@
      "metadata": {},
      "outputs": [],
      "source": [
-        "target_date = datetime.today()\n",
+        "now = datetime.utcnow()\n",
+        "target_date = datetime(now.year, now.month, now.day)\n",
        "run = datadrift.run(target_date, services, feature_list=feature_list, create_compute_target=True)"
      ]
    },
@@ -655,7 +656,7 @@
      "source": [
        "child_run.wait_for_completion(wait_post_processing=True)\n",
        "\n",
-        "drift_metrics = datadrift.get_output(start_time=start, end_time=end)\n",
+        "drift_metrics = datadrift.get_output(run_id=run.id)\n",
        "drift_metrics"
      ]
    },
@@ -668,7 +669,7 @@
        "# Show all drift figures, one per serivice.\n",
        "# If setting with_details is False (by default), only drift will be shown; if it's True, all details will be shown.\n",
        "\n",
-        "drift_figures = datadrift.show(with_details=True)"
+        "drift_figures = datadrift.show()"
      ]
    },
    {
@@ -691,7 +692,7 @@
  "metadata": {
    "authors": [
      {
-        "name": "rafarmah"
+        "name": "dmdatadrift"
      }
    ],
    "kernelspec": {
--- a/how-to-use-azureml/monitor-models/data-drift/score.py
+++ b/how-to-use-azureml/monitor-models/data-drift/score.py
@@ -1,14 +1,10 @@
-import pickle
 import json
-import numpy
-import azureml.train.automl
-from sklearn.externals import joblib
-from sklearn.linear_model import Ridge
-from azureml.core.model import Model
-from azureml.core.run import Run
-from azureml.monitoring import ModelDataCollector
 import time
+
 import pandas as pd
+from azureml.core.model import Model
+from azureml.monitoring import ModelDataCollector
+from sklearn.externals import joblib


 def init():
@@ -25,11 +21,11 @@ def init():
    categorical_features = ["usaf", "wban", "p_k", "station_name"]

    inputs_dc = ModelDataCollector(model_name="driftmodel",
-                                   identifier="inputs",
+                                   designation="inputs",
                                   feature_names=feature_names)

-    prediction_dc = ModelDataCollector("driftmodel",
-                                       identifier="predictions",
+    prediction_dc = ModelDataCollector(model_name="driftmodel",
+                                       designation="predictions",
                                       feature_names=["temperature"])
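Note on the `target_date` change in azure-ml-datadrift.ipynb: the new lines replace `datetime.today()` (local time, with hours and minutes) with the current UTC date truncated to midnight. The diff gives no reason for the change; one plausible reading is that the drift run expects a day boundary. A minimal sketch of the truncation follows. The helper name `utc_midnight` is hypothetical. `datetime.now(timezone.utc).replace(tzinfo=None)` is used here as a naive-UTC stand-in for the notebook's `datetime.utcnow()`.

```python
from datetime import datetime, timezone


def utc_midnight(now=None):
    """Return a naive datetime for 00:00:00 on the UTC day of `now`.

    If `now` is omitted, the current UTC time is used. A supplied `now`
    is assumed to already be a naive datetime expressed in UTC.
    """
    if now is None:
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    # Same construction as the notebook: keep year/month/day, drop the time.
    return datetime(now.year, now.month, now.day)


# Using the first commit timestamp listed on this page as a fixed input.
sample = utc_midnight(datetime(2019, 10, 15, 22, 1, 55))
```

Here `sample` equals `datetime(2019, 10, 15, 0, 0)`. The result would then be passed as the first argument of `datadrift.run(...)`, as in the hunk.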
Author	SHA1	Message	Date
vizhur	828a976907	update samples - test	2019-10-15 22:01:55 +00:00
vizhur	1a373f11a0	Merge pull request #621 from Azure/ak/revert-db-overwrite Revert automatic overwrite of databricks content	2019-10-15 16:07:37 -04:00