mirror of
https://github.com/Azure/MachineLearningNotebooks.git
synced 2025-12-21 10:05:09 -05:00
update samples from Release-132 as a part of 1.0.48 SDK release
This commit is contained in:
@@ -0,0 +1,360 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Add Column using Expression\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With Azure ML Data Prep you can add a new column to data with `Dataflow.add_column` by using a Data Prep expression to calculate the value from existing columns. This is similar to using Python to create a [new script column](./custom-python-transforms.ipynb#New-Script-Column) except the Data Prep expressions are more limited and will execute faster. The expressions used are the same as for [filtering rows](./filtering.ipynb#Filtering-rows) and hence have the same functions and operators available.\n",
|
||||
"<p>\n",
|
||||
"Here we add additional columns. First we get input data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import azureml.dataprep as dprep"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# loading data\n",
|
||||
"dflow = dprep.auto_read_file('../data/crime-spring.csv')\n",
|
||||
"dflow.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `substring(start, length)`\n",
|
||||
"Add a new column \"Case Category\" using the `substring(start, length)` expression to extract the prefix from the \"Case Number\" column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"case_category = dflow.add_column(new_column_name='Case Category',\n",
|
||||
" prior_column='Case Number',\n",
|
||||
" expression=dflow['Case Number'].substring(0, 2))\n",
|
||||
"case_category.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `substring(start)`\n",
|
||||
"Add a new column \"Case Id\" using the `substring(start)` expression to extract just the number from \"Case Number\" column and then convert it to numeric."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"case_id = dflow.add_column(new_column_name='Case Id',\n",
|
||||
" prior_column='Case Number',\n",
|
||||
" expression=dflow['Case Number'].substring(2))\n",
|
||||
"case_id = case_id.to_number('Case Id')\n",
|
||||
"case_id.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `length()`\n",
|
||||
"Using the length() expression, add a new numeric column \"Length\", which contains the length of the string in \"Primary Type\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_length = dflow.add_column(new_column_name='Length',\n",
|
||||
" prior_column='Primary Type',\n",
|
||||
" expression=dflow['Primary Type'].length())\n",
|
||||
"dflow_length.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `to_upper()`\n",
|
||||
"Using the to_upper() expression, add a new numeric column \"Upper Case\", which contains the length of the string in \"Primary Type\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_to_upper = dflow.add_column(new_column_name='Upper Case',\n",
|
||||
" prior_column='Primary Type',\n",
|
||||
" expression=dflow['Primary Type'].to_upper())\n",
|
||||
"dflow_to_upper.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `to_lower()`\n",
|
||||
"Using the to_lower() expression, add a new numeric column \"Lower Case\", which contains the length of the string in \"Primary Type\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_to_lower = dflow.add_column(new_column_name='Lower Case',\n",
|
||||
" prior_column='Primary Type',\n",
|
||||
" expression=dflow['Primary Type'].to_lower())\n",
|
||||
"dflow_to_lower.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `RegEx.extract_record()`\n",
|
||||
"Using the `RegEx.extract_record()` expression, add a new record column \"Stream Date Record\", which contains the name capturing groups in the regex with value."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_regex_extract_record = dprep.auto_read_file('../data/stream-path.csv')\n",
|
||||
"regex = dprep.RegEx('\\/(?<year>\\d{4})\\/(?<month>\\d{2})\\/(?<day>\\d{2})\\/')\n",
|
||||
"dflow_regex_extract_record = dflow_regex_extract_record.add_column(new_column_name='Stream Date Record',\n",
|
||||
" prior_column='Stream Path',\n",
|
||||
" expression=regex.extract_record(dflow_regex_extract_record['Stream Path']))\n",
|
||||
"dflow_regex_extract_record.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `create_datetime()`\n",
|
||||
"Using the `create_datetime()` expression, add a new column \"Stream Date\", which contains datetime values constructed from year, month, day values extracted from a record column \"Stream Date Record\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"year = dprep.col('year', dflow_regex_extract_record['Stream Date Record'])\n",
|
||||
"month = dprep.col('month', dflow_regex_extract_record['Stream Date Record'])\n",
|
||||
"day = dprep.col('day', dflow_regex_extract_record['Stream Date Record'])\n",
|
||||
"dflow_create_datetime = dflow_regex_extract_record.add_column(new_column_name='Stream Date',\n",
|
||||
" prior_column='Stream Date Record',\n",
|
||||
" expression=dprep.create_datetime(year, month, day))\n",
|
||||
"dflow_create_datetime.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `col(column1) + col(column2)`\n",
|
||||
"Add a new column \"Total\" to show the result of adding the values in the \"FBI Code\" column to the \"Community Area\" column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_total = dflow.add_column(new_column_name='Total',\n",
|
||||
" prior_column='FBI Code',\n",
|
||||
" expression=dflow['Community Area']+dflow['FBI Code'])\n",
|
||||
"dflow_total.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `col(column1) - col(column2)`\n",
|
||||
"Add a new column \"Subtract\" to show the result of subtracting the values in the \"FBI Code\" column from the \"Community Area\" column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_diff = dflow.add_column(new_column_name='Difference',\n",
|
||||
" prior_column='FBI Code',\n",
|
||||
" expression=dflow['Community Area']-dflow['FBI Code'])\n",
|
||||
"dflow_diff.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `col(column1) * col(column2)`\n",
|
||||
"Add a new column \"Product\" to show the result of multiplying the values in the \"FBI Code\" column to the \"Community Area\" column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_prod = dflow.add_column(new_column_name='Product',\n",
|
||||
" prior_column='FBI Code',\n",
|
||||
" expression=dflow['Community Area']*dflow['FBI Code'])\n",
|
||||
"dflow_prod.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `col(column1) / col(column2)`\n",
|
||||
"Add a new column \"True Quotient\" to show the result of true (decimal) division of the values in \"Community Area\" column by the \"FBI Code\" column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_true_div = dflow.add_column(new_column_name='True Quotient',\n",
|
||||
" prior_column='FBI Code',\n",
|
||||
" expression=dflow['Community Area']/dflow['FBI Code'])\n",
|
||||
"dflow_true_div.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `col(column1) // col(column2)`\n",
|
||||
"Add a new column \"Floor Quotient\" to show the result of floor (integer) division of the values in \"Community Area\" column by the \"FBI Code\" column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_floor_div = dflow.add_column(new_column_name='Floor Quotient',\n",
|
||||
" prior_column='FBI Code',\n",
|
||||
" expression=dflow['Community Area']//dflow['FBI Code'])\n",
|
||||
"dflow_floor_div.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `col(column1) % col(column2)`\n",
|
||||
"Add a new column \"Mod\" to show the result of applying the modulo operation on the \"FBI Code\" column and the \"Community Area\" column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_mod = dflow.add_column(new_column_name='Mod',\n",
|
||||
" prior_column='FBI Code',\n",
|
||||
" expression=dflow['Community Area']%dflow['FBI Code'])\n",
|
||||
"dflow_mod.head(5)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### `col(column1) ** col(column2)`\n",
|
||||
"Add a new column \"Power\" to show the result of applying the exponentiation operation when the base is the \"Community Area\" column and the exponent is \"FBI Code\" column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"dflow_pow = dflow.add_column(new_column_name='Power',\n",
|
||||
" prior_column='FBI Code',\n",
|
||||
" expression=dflow['Community Area']**dflow['FBI Code'])\n",
|
||||
"dflow_pow.head(5)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"authors": [
|
||||
{
|
||||
"name": "sihhu"
|
||||
}
|
||||
],
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.6",
|
||||
"language": "python",
|
||||
"name": "python36"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.6.4"
|
||||
},
|
||||
"notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 2
|
||||
}
|
||||
Reference in New Issue
Block a user