update samples from Release-67 as a part of SDK release

2025-12-19 17:17:04 -05:00 · 2020-09-23 22:48:56 +00:00
parent 8e2032fcde
commit 6059c1dc0c
1 changed file with 5 additions and 12 deletions
--- a/tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.ipynb
+++ b/tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.ipynb
@@ -50,7 +50,7 @@
        "* Complete the [setup tutorial](https://docs.microsoft.com/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup) if you don't already have an Azure Machine Learning service workspace or notebook virtual machine.\n",
        "* After you complete the setup tutorial, open the **tutorials/regression-automated-ml.ipynb** notebook using the same notebook server.\n",
        "\n",
-        "This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#local). Run `pip install azureml-sdk[automl] azureml-opendatasets azureml-widgets` to get the required packages."
+        "This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/README.md#setup-using-a-local-conda-environment)."
      ]
    },
    {
@@ -73,8 +73,8 @@
      "metadata": {},
      "outputs": [],
      "source": [
-        "from azureml.opendatasets import NycTlcGreen\n",
        "import pandas as pd\n",
+        "from azureml.core import Dataset\n",
        "from datetime import datetime\n",
        "from dateutil.relativedelta import relativedelta"
      ]
@@ -83,7 +83,7 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-        "Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes to avoid `MemoryError` with large datasets. To download taxi data, iteratively fetch one month at a time, and before appending it to `green_taxi_df` randomly sample 2,000 records from each month to avoid bloating the dataframe. Then preview the data."
+        "Begin by creating a dataframe to hold the taxi data. Then preview the data."
      ]
    },
    {
@@ -92,15 +92,8 @@
      "metadata": {},
      "outputs": [],
      "source": [
-        "green_taxi_df = pd.DataFrame([])\n",
-        "start = datetime.strptime(\"1/1/2015\",\"%m/%d/%Y\")\n",
-        "end = datetime.strptime(\"1/31/2015\",\"%m/%d/%Y\")\n",
-        "\n",
-        "for sample_month in range(12):\n",
-        "    temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n",
-        "        .get_tabular_dataset().to_pandas_dataframe()\n",
-        "    green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))\n",
-        "    \n",
+        "green_taxi_dataset = Dataset.Tabular.from_parquet_files(path=\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/green_taxi_data.parquet\")\n",
+        "green_taxi_df = green_taxi_dataset.to_pandas_dataframe()\n",
        "green_taxi_df.head(10)"
      ]
    },
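
Note on the removed cell: the deleted markdown describes fetching one month at a time and sampling 2,000 records from each month before appending to the dataframe. A minimal sketch of that per-month sampling pattern follows, using only the standard library and synthetic records in place of the taxi data. The helper names `month_windows` and `sample_per_month`, and the `pickup_date` and `fare` fields, are hypothetical and not part of the notebook or the Azure SDK. Unlike the removed cell, this sketch caps the sample size at the number of records available in a month.

```python
import calendar
import random
from datetime import date


def month_windows(year):
    """Yield (first_day, last_day) for each calendar month of `year`."""
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, 1), date(year, month, last_day)


def sample_per_month(records, year, per_month, seed=0):
    """Keep at most `per_month` randomly chosen records from each month."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sampled = []
    for start, end in month_windows(year):
        in_month = [r for r in records if start <= r["pickup_date"] <= end]
        # Cap the sample size so a sparse month does not raise ValueError.
        k = min(per_month, len(in_month))
        sampled.extend(rng.sample(in_month, k))
    return sampled


# Synthetic stand-in for the taxi data: one record per day of 2015.
records = [
    {"pickup_date": date.fromordinal(d), "fare": 10.0}
    for d in range(date(2015, 1, 1).toordinal(), date(2016, 1, 1).toordinal())
]
subset = sample_per_month(records, year=2015, per_month=5)
```

The added lines in this commit make that loop unnecessary in the notebook, since the data is read from a single parquet file instead.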