instructions for installing automl on adb

2018-11-29 19:57:34 -08:00
parent 613db3158d
commit df025e6a17
1 changed files with 45 additions and 0 deletions
--- a/databricks/automl_adb_readme.md
+++ b/databricks/automl_adb_readme.md
@@ -0,0 +1,45 @@
+**Create Azure Databricks Cluster:**
+
+Select New Cluster and fill in following detail:
+ - Cluster name: clustername
+ - Cluster Mode: **High Concurrency** preferred
+ - Databricks Runtime: Any 4.* runtime (NO GPU) or Recommended: 4.3(includes Apache spark 2.3.1, Scala 2.11))
+ - Python version: **3**
+ - Driver type – you may select a small driver node size (eg. Standard_DS3_v2 0.75 DBU)
+ - Worker node VM types: Memory optimized preferred.
+	 - Number of concurrent runs should be less than or equal to the number
+   of cores in your Databricks cluster.
+   -   For a 1 GB numeric only dataset, to do 10 cross validations with run 16 concurrent runs, the minimum usable cluster memory should be 1
+   GB X 16 concurrent runs X 3 = 48 GB. This is in addition to what
+   Spark itself will use on your cluster.
+   -   For text dataset, with featurization (eg. one hot encoding) & cross validation this requirement is much higher. For a 500 MB
+   string+numeric dataset, to do 5 cross validation with 4 concurrent
+   runs, the minimum usable cluster memory should be 0.5 GB X 4
+   concurrent runs X 3 X 5 cross validations X 3 = 90 GB
+  - Uncheck _Enable Autoscaling_
+- Workers: 2 or higher
+- It will take few minutes to create the cluster.
+
+Ensure that the cluster state is running before proceeding further.
+
+**Install Azure ML with Automated ML SDK**
+
+- Select Import library
+
+- Source: Upload Python Egg or PyPI
+
+- PyPi Name: **azureml-sdk[automl_databricks]**
+
+- Click Install Library
+
+- Do not select _Attach automatically to all clusters_. In case you have selected earlier then you can go to your Home folder and deselect it.
+
+- Select the check box _Attach_ next to your cluster name
+
+(More details on attach and detach are here - [https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster](https://docs.databricks.com/user-guide/libraries.html#attach-a-library-to-a-cluster) )
+
+- Ensure that there are no errors until Status changes to _Attached_. It may take a couple of minutes.
+
+**Note** - If you have the old build the please deselect it from cluster’s installed libs > move to trash. Install the new build and restart the cluster. And if still there is an issue then detach and reattach your cluster.
+
+**Now you are run the Automated ML sample notebook on your Azure Databricks cluster.**