update samples from Release-67 as a part of SDK release

amlrelsa-ms
2020-09-23 22:48:56 +00:00
parent 8e2032fcde
commit 6059c1dc0c

@@ -50,7 +50,7 @@
"* Complete the [setup tutorial](https://docs.microsoft.com/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup) if you don't already have an Azure Machine Learning service workspace or notebook virtual machine.\n",
"* After you complete the setup tutorial, open the **tutorials/regression-automated-ml.ipynb** notebook using the same notebook server.\n",
"\n",
-"This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#local). Run `pip install azureml-sdk[automl] azureml-opendatasets azureml-widgets` to get the required packages."
+"This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/README.md#setup-using-a-local-conda-environment)."
]
},
{
@@ -73,8 +73,8 @@
"metadata": {},
"outputs": [],
"source": [
-"from azureml.opendatasets import NycTlcGreen\n",
"import pandas as pd\n",
+"from azureml.core import Dataset\n",
"from datetime import datetime\n",
"from dateutil.relativedelta import relativedelta"
]
@@ -83,7 +83,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes to avoid `MemoryError` with large datasets. To download taxi data, iteratively fetch one month at a time, and before appending it to `green_taxi_df` randomly sample 2,000 records from each month to avoid bloating the dataframe. Then preview the data."
+"Begin by creating a dataframe to hold the taxi data. Then preview the data."
]
},
{
@@ -92,15 +92,8 @@
"metadata": {},
"outputs": [],
"source": [
-"green_taxi_df = pd.DataFrame([])\n",
-"start = datetime.strptime(\"1/1/2015\",\"%m/%d/%Y\")\n",
-"end = datetime.strptime(\"1/31/2015\",\"%m/%d/%Y\")\n",
-"\n",
-"for sample_month in range(12):\n",
-" temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n",
-" .get_tabular_dataset().to_pandas_dataframe()\n",
-" green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))\n",
-" \n",
+"green_taxi_dataset = Dataset.Tabular.from_parquet_files(path=\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/green_taxi_data.parquet\")\n",
+"green_taxi_df = green_taxi_dataset.to_pandas_dataframe()\n",
"green_taxi_df.head(10)"
]
},
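
Review note: the loop removed in the last hunk fetched twelve one-month windows of 2015 by adding `relativedelta(months=sample_month)` to 1/1/2015 and 1/31/2015. Because `relativedelta` clips a day that does not exist in the target month to that month's last day, those windows line up with calendar months. The following is a minimal standard-library sketch of the same windows, for reference only; `month_windows` is a hypothetical helper name, not part of the notebook or of any Azure ML package.

```python
import calendar
from datetime import datetime


def month_windows(year):
    """Return a (first_day, last_day) datetime pair for each calendar month of `year`."""
    windows = []
    for month in range(1, 13):
        # monthrange returns (weekday of the 1st, number of days in the month)
        last_day = calendar.monthrange(year, month)[1]
        windows.append((datetime(year, month, 1), datetime(year, month, last_day)))
    return windows


# Hypothetical usage: the twelve windows the removed loop iterated over.
windows_2015 = month_windows(2015)
for start, end in windows_2015[:3]:
    print(start.date(), end.date())
```

The replacement cell no longer needs these windows, since it reads a single parquet file with `Dataset.Tabular.from_parquet_files`.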