update samples - test

2019-11-01 14:48:01 +00:00
parent 46ec74f8df
commit 4ed3f0767a
308 changed files with 13971 additions and 59495 deletions
--- a/tutorials/tutorial-1st-experiment-sdk-train.ipynb
+++ b/tutorials/tutorial-1st-experiment-sdk-train.ipynb
@@ -98,7 +98,7 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "For this tutorial, you use the diabetes data set, which is a pre-normalized data set included in scikit-learn. This data set uses features like age, gender, and BMI to predict diabetes disease progression. Load the data from the `load_diabetes()` static function, and split it into training and test sets using `train_test_split()`. This function segregates the data so the model has unseen data to use for testing following training."
+        "For this tutorial, you use the diabetes data set, which uses features like age, gender, and BMI to predict diabetes disease progression. Load the data from the Azure Open Datasets class, and split it into training and test sets using `train_test_split()`. This function segregates the data so the model has unseen data to use for testing following training."
      ]
    },
    {
@@ -107,11 +107,13 @@
      "metadata": {},
      "outputs": [],
      "source": [
-        "from sklearn.datasets import load_diabetes\n",
+        "from azureml.opendatasets import Diabetes\n",
        "from sklearn.model_selection import train_test_split\n",
        "\n",
-        "X, y = load_diabetes(return_X_y = True)\n",
-        "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=66)"
+        "x_df = Diabetes.get_tabular_dataset().to_pandas_dataframe().dropna()\n",
+        "y_df = x_df.pop(\"Y\")\n",
+        "\n",
+        "X_train, X_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=66)"
      ]
    },
    {