MachineLearningNotebooks/databricks/automl_adb_readme.md at df025e6a1779bf391eb503f931f641a71834c0ee

jprdonnelly/MachineLearningNotebooks

Fork 0

Files

Parashar Shah df025e6a17 instructions for installing automl on adb

2018-11-29 19:57:34 -08:00

2.3 KiB

Raw Blame History

Create Azure Databricks Cluster:

Select New Cluster and fill in following detail:

Cluster name: clustername
Cluster Mode: High Concurrency preferred
Databricks Runtime: Any 4.* runtime (NO GPU) or Recommended: 4.3(includes Apache spark 2.3.1, Scala 2.11))
Python version: 3
Driver type – you may select a small driver node size (eg. Standard_DS3_v2 0.75 DBU)
Worker node VM types: Memory optimized preferred.
- Number of concurrent runs should be less than or equal to the number of cores in your Databricks cluster.
- For a 1 GB numeric only dataset, to do 10 cross validations with run 16 concurrent runs, the minimum usable cluster memory should be 1 GB X 16 concurrent runs X 3 = 48 GB. This is in addition to what Spark itself will use on your cluster.
- For text dataset, with featurization (eg. one hot encoding) & cross validation this requirement is much higher. For a 500 MB string+numeric dataset, to do 5 cross validation with 4 concurrent runs, the minimum usable cluster memory should be 0.5 GB X 4 concurrent runs X 3 X 5 cross validations X 3 = 90 GB
Uncheck Enable Autoscaling
Workers: 2 or higher
It will take few minutes to create the cluster.

Ensure that the cluster state is running before proceeding further.

Install Azure ML with Automated ML SDK

Select Import library
Source: Upload Python Egg or PyPI
PyPi Name: azureml-sdk[automl_databricks]
Click Install Library
Do not select Attach automatically to all clusters. In case you have selected earlier then you can go to your Home folder and deselect it.
Select the check box Attach next to your cluster name

(More details on attach and detach are here - https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster )

Ensure that there are no errors until Status changes to Attached. It may take a couple of minutes.

Note - If you have the old build the please deselect it from cluster’s installed libs > move to trash. Install the new build and restart the cluster. And if still there is an issue then detach and reattach your cluster.

Now you are run the Automated ML sample notebook on your Azure Databricks cluster.

2.3 KiB Raw Blame History Unescape Escape

2.3 KiB

Raw Blame History