mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-19 17:17:04 -05:00
Merge pull request #1157 from Azure/release_update/Release-67
update samples from Release-67 as a part of SDK release
@@ -50,7 +50,7 @@
 "* Complete the [setup tutorial](https://docs.microsoft.com/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup) if you don't already have an Azure Machine Learning service workspace or notebook virtual machine.\n",
 "* After you complete the setup tutorial, open the **tutorials/regression-automated-ml.ipynb** notebook using the same notebook server.\n",
 "\n",
-"This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#local). Run `pip install azureml-sdk[automl] azureml-opendatasets azureml-widgets` to get the required packages."
+"This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/README.md#setup-using-a-local-conda-environment)."
 ]
 },
 {
@@ -73,8 +73,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
 "from azureml.opendatasets import NycTlcGreen\n",
 "import pandas as pd\n",
 "from azureml.core import Dataset\n",
 "from datetime import datetime\n",
 "from dateutil.relativedelta import relativedelta"
 ]
@@ -83,7 +83,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes to avoid `MemoryError` with large datasets. To download taxi data, iteratively fetch one month at a time, and before appending it to `green_taxi_df` randomly sample 2,000 records from each month to avoid bloating the dataframe. Then preview the data."
+"Begin by creating a dataframe to hold the taxi data. Then preview the data."
 ]
 },
 {
@@ -92,15 +92,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"green_taxi_df = pd.DataFrame([])\n",
-"start = datetime.strptime(\"1/1/2015\",\"%m/%d/%Y\")\n",
-"end = datetime.strptime(\"1/31/2015\",\"%m/%d/%Y\")\n",
-"\n",
-"for sample_month in range(12):\n",
-"    temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n",
-"        .get_tabular_dataset().to_pandas_dataframe()\n",
-"    green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))\n",
-"    \n",
+"green_taxi_dataset = Dataset.Tabular.from_parquet_files(path=\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/green_taxi_data.parquet\")\n",
+"green_taxi_df = green_taxi_dataset.to_pandas_dataframe()\n",
+"green_taxi_df.head(10)"
 ]
 },
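An aside on the removed code path: the old loop grew `green_taxi_df` row-block by row-block with `DataFrame.append`, an API that was deprecated in pandas 1.4 and removed in pandas 2.0. A modern rewrite of that monthly-sampling pattern collects the per-month frames in a list and concatenates once with `pd.concat`. The sketch below is illustrative only: `fetch_month` is a hypothetical stand-in for the `NycTlcGreen(...).get_tabular_dataset().to_pandas_dataframe()` download, synthesizing dummy records so it runs offline.

```python
import numpy as np
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import relativedelta

def fetch_month(start: datetime, end: datetime) -> pd.DataFrame:
    # Hypothetical stand-in for the NycTlcGreen open-dataset download;
    # synthesizes 5,000 dummy trip records so the sketch runs offline.
    rng = np.random.default_rng(start.month)
    n = 5000
    return pd.DataFrame({
        "lpepPickupDatetime": pd.date_range(start, end, periods=n),
        "totalAmount": rng.uniform(3.0, 60.0, size=n),
    })

start = datetime.strptime("1/1/2015", "%m/%d/%Y")
end = datetime.strptime("1/31/2015", "%m/%d/%Y")

# Collect one 2,000-row sample per month, then concatenate once at the end.
# This is cheaper than growing a DataFrame incrementally and avoids the
# DataFrame.append API (deprecated in pandas 1.4, removed in 2.0).
monthly = [
    fetch_month(start + relativedelta(months=m),
                end + relativedelta(months=m)).sample(2000)
    for m in range(12)
]
green_taxi_df = pd.concat(monthly, ignore_index=True)
```

Concatenating once keeps the operation O(total rows) rather than re-copying the accumulated frame on every iteration, which is why `pd.concat` over a list is the idiomatic replacement for append-in-a-loop.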