Compare commits

2 Commits

Author | SHA1 | Message | Date
amlrelsa-ms | 6059c1dc0c | update samples from Release-67 as a part of SDK release | 2020-09-23 22:48:56 +00:00
Harneet Virk | 8e2032fcde | Merge pull request #1153 from Azure/release_update/Release-66; update samples from Release-66 as a part of SDK release | 2020-09-21 16:04:23 -07:00

@@ -50,7 +50,7 @@
 "* Complete the [setup tutorial](https://docs.microsoft.com/azure/machine-learning/service/tutorial-1st-experiment-sdk-setup) if you don't already have an Azure Machine Learning service workspace or notebook virtual machine.\n",
 "* After you complete the setup tutorial, open the **tutorials/regression-automated-ml.ipynb** notebook using the same notebook server.\n",
 "\n",
-"This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment#local). Run `pip install azureml-sdk[automl] azureml-opendatasets azureml-widgets` to get the required packages."
+"This tutorial is also available on [GitHub](https://github.com/Azure/MachineLearningNotebooks/tree/master/tutorials) if you wish to run it in your own [local environment](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/README.md#setup-using-a-local-conda-environment)."
 ]
 },
 {
@@ -73,8 +73,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"from azureml.opendatasets import NycTlcGreen\n",
 "import pandas as pd\n",
+"from azureml.core import Dataset\n",
 "from datetime import datetime\n",
 "from dateutil.relativedelta import relativedelta"
 ]
@@ -83,7 +83,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Begin by creating a dataframe to hold the taxi data. When working in a non-Spark environment, Open Datasets only allows downloading one month of data at a time with certain classes to avoid `MemoryError` with large datasets. To download taxi data, iteratively fetch one month at a time, and before appending it to `green_taxi_df` randomly sample 2,000 records from each month to avoid bloating the dataframe. Then preview the data."
+"Begin by creating a dataframe to hold the taxi data. Then preview the data."
 ]
 },
 {
@@ -92,15 +92,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"green_taxi_df = pd.DataFrame([])\n",
-"start = datetime.strptime(\"1/1/2015\",\"%m/%d/%Y\")\n",
-"end = datetime.strptime(\"1/31/2015\",\"%m/%d/%Y\")\n",
-"\n",
-"for sample_month in range(12):\n",
-" temp_df_green = NycTlcGreen(start + relativedelta(months=sample_month), end + relativedelta(months=sample_month)) \\\n",
-" .get_tabular_dataset().to_pandas_dataframe()\n",
-" green_taxi_df = green_taxi_df.append(temp_df_green.sample(2000))\n",
-" \n",
+"green_taxi_dataset = Dataset.Tabular.from_parquet_files(path=\"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/green_taxi_data.parquet\")\n",
+"green_taxi_df = green_taxi_dataset.to_pandas_dataframe()\n",
 "green_taxi_df.head(10)"
 ]
 },
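
Reviewer note: the loop removed in the last hunk fetched one month at a time and kept a 2,000-row random sample of each month. A minimal standard-library sketch of that pattern follows, for comparison with the single parquet load that replaces it. It does not call the Azure SDK. `month_windows`, `fetch_month`, and `sample_year` are hypothetical names introduced here, and `fetch_month` returns synthetic rows in place of a real per-month download.

```python
import calendar
import random
from datetime import datetime


def month_windows(year, months=12):
    """Yield (start, end) datetimes for each calendar month of `year`."""
    for month in range(1, months + 1):
        last_day = calendar.monthrange(year, month)[1]
        yield datetime(year, month, 1), datetime(year, month, last_day)


def fetch_month(start, end):
    """Hypothetical stand-in for a per-month download; returns synthetic rows."""
    rng = random.Random(start.month)  # seeded so the stub is reproducible
    return [
        {"month": start.month, "trip_id": i, "fare": round(rng.uniform(3, 60), 2)}
        for i in range(5000)
    ]


def sample_year(year, per_month=2000, seed=0):
    """Collect a fixed-size random sample from each month, one month at a time."""
    rng = random.Random(seed)
    sampled = []
    for start, end in month_windows(year):
        rows = fetch_month(start, end)
        # Cap the sample size so a month with few rows cannot raise ValueError.
        sampled.extend(rng.sample(rows, min(per_month, len(rows))))
    return sampled


rows = sample_year(2015)
print(len(rows))
```

With the synthetic 5,000-row months above, the result holds 12 × 2,000 = 24,000 rows. Only one month's rows are in memory at a time before sampling, which is the memory concern the removed markdown cell described.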