
# Add Column using Expression


With Azure ML Data Prep, you can add a new column to your data with `Dataflow.add_column`, using a Data Prep expression to calculate each value from existing columns. This is similar to using Python to create a [new script column](./custom-python-transforms.ipynb#New-Script-Column), except that Data Prep expressions are more limited and execute faster. The expressions are the same ones used for [filtering rows](./filtering.ipynb#Filtering-rows), so the same functions and operators are available.

The examples below add several columns. First, load the input data.

In [None]:
import azureml.dataprep as dprep

In [None]:
# loading data
dflow = dprep.auto_read_file('../data/crime-spring.csv')
dflow.head(5)

#### `substring(start, length)`
Add a new column "Case Category" using the `substring(start, length)` expression to extract the prefix from the "Case Number" column.

In [None]:
case_category = dflow.add_column(new_column_name='Case Category',
                                 prior_column='Case Number',
                                 expression=dflow['Case Number'].substring(0, 2))
case_category.head(5)

#### `substring(start)`
Add a new column "Case Id" using the `substring(start)` expression to extract just the number from the "Case Number" column, then convert it to a numeric type.

In [None]:
case_id = dflow.add_column(new_column_name='Case Id',
                           prior_column='Case Number',
                           expression=dflow['Case Number'].substring(2))
case_id = case_id.to_number('Case Id')
case_id.head(5)
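
For intuition, the two `substring` forms behave much like Python string slicing. The sketch below is plain Python, not the Data Prep API. The `substring` helper and the sample value `HY123456` are made up for illustration, and the sketch assumes a zero-based start index, as the calls above suggest.

```python
# Plain-Python analogy for the substring expressions (not the Data Prep API).
def substring(value, start, length=None):
    """Return value[start:], or value[start:start + length] when length is given."""
    if length is None:
        return value[start:]
    return value[start:start + length]


case_number = 'HY123456'  # hypothetical sample value, not taken from the data set

case_category = substring(case_number, 0, 2)  # prefix, like substring(0, 2)
case_id = int(substring(case_number, 2))      # remainder, converted to a number

print(case_category, case_id)
```

The conversion step matters: without it, the extracted digits remain a string, which is why the cell above calls `to_number` on the new column.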

#### `length()`
Using the `length()` expression, add a new numeric column "Length", which contains the length of the string in "Primary Type".

In [None]:
dflow_length = dflow.add_column(new_column_name='Length',
                                prior_column='Primary Type',
                                expression=dflow['Primary Type'].length())
dflow_length.head(5)

#### `to_upper()`
Using the `to_upper()` expression, add a new string column "Upper Case", which contains the string in "Primary Type" converted to upper case.

In [None]:
dflow_to_upper = dflow.add_column(new_column_name='Upper Case',
                                  prior_column='Primary Type',
                                  expression=dflow['Primary Type'].to_upper())
dflow_to_upper.head(5)

#### `to_lower()`
Using the `to_lower()` expression, add a new string column "Lower Case", which contains the string in "Primary Type" converted to lower case.

In [None]:
dflow_to_lower = dflow.add_column(new_column_name='Lower Case',
                                  prior_column='Primary Type',
                                  expression=dflow['Primary Type'].to_lower())
dflow_to_lower.head(5)
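
As a rough point of comparison, the three string expressions above correspond to familiar Python built-ins. This is a plain-Python sketch, not Data Prep code, and the sample value is hypothetical.

```python
# Plain-Python counterparts of length(), to_upper() and to_lower().
primary_type = 'Deceptive Practice'  # hypothetical sample value

length = len(primary_type)         # numeric result, like length()
upper_case = primary_type.upper()  # string result, like to_upper()
lower_case = primary_type.lower()  # string result, like to_lower()

print(length, upper_case, lower_case)
```

Only `length()` yields a number; the two case conversions yield strings.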

#### `RegEx.extract_record()`
Using the `RegEx.extract_record()` expression, add a new record column "Stream Date Record", which contains each named capturing group in the regex together with its matched value.

In [None]:
dflow_regex_extract_record = dprep.auto_read_file('../data/stream-path.csv')
regex = dprep.RegEx(r'\/(?<year>\d{4})\/(?<month>\d{2})\/(?<day>\d{2})\/')
dflow_regex_extract_record = dflow_regex_extract_record.add_column(new_column_name='Stream Date Record',
                                                                   prior_column='Stream Path',
                                                                   expression=regex.extract_record(dflow_regex_extract_record['Stream Path']))
dflow_regex_extract_record.head(5)
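
To see what a record of named groups looks like, the same idea can be sketched with Python's standard `re` module. Note that Python's `re` does not accept the `(?<year>...)` syntax used above and spells named groups as `(?P<year>...)` instead. The helper name and the sample path below are hypothetical.

```python
import re

# Python's re module writes named groups as (?P<name>...).
DATE_PATTERN = re.compile(r'/(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/')


def extract_date_record(path):
    """Return a dict of named groups, or None when the path has no date segment."""
    match = DATE_PATTERN.search(path)
    return match.groupdict() if match else None


stream_path = 'https://example.com/streams/2019/01/15/data.csv'  # hypothetical path
record = extract_date_record(stream_path)
print(record)
```

Each key in the resulting dictionary is a group name and each value is the matched text, still as a string.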

#### `create_datetime()`
Using the `create_datetime()` expression, add a new column "Stream Date", which contains datetime values constructed from the year, month, and day values in the record column "Stream Date Record".

In [None]:
year = dprep.col('year', dflow_regex_extract_record['Stream Date Record'])
month = dprep.col('month', dflow_regex_extract_record['Stream Date Record'])
day = dprep.col('day', dflow_regex_extract_record['Stream Date Record'])
dflow_create_datetime = dflow_regex_extract_record.add_column(new_column_name='Stream Date',
                                                              prior_column='Stream Date Record',
                                                              expression=dprep.create_datetime(year, month, day))
dflow_create_datetime.head(5)
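
A plain-Python sketch of the same construction, using the standard `datetime` module rather than Data Prep, is shown below. The record is a hypothetical stand-in for one row of "Stream Date Record", and the explicit `int` conversions are an assumption needed here because the values in this sketch are strings.

```python
from datetime import datetime

# Hypothetical record, shaped like one value of "Stream Date Record".
record = {'year': '2019', 'month': '01', 'day': '15'}

# Build a datetime from the three fields; the time of day defaults to midnight.
stream_date = datetime(int(record['year']),
                       int(record['month']),
                       int(record['day']))

print(stream_date.isoformat())
```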

#### `col(column1) + col(column2)`
Add a new column "Total" to show the result of adding the values in the "FBI Code" column to the "Community Area" column.

In [None]:
dflow_total = dflow.add_column(new_column_name='Total',
                               prior_column='FBI Code',
                               expression=dflow['Community Area']+dflow['FBI Code'])
dflow_total.head(5)

#### `col(column1) - col(column2)`
Add a new column "Difference" to show the result of subtracting the values in the "FBI Code" column from the "Community Area" column.

In [None]:
dflow_diff = dflow.add_column(new_column_name='Difference',
                              prior_column='FBI Code',
                              expression=dflow['Community Area']-dflow['FBI Code'])
dflow_diff.head(5)

#### `col(column1) * col(column2)`
Add a new column "Product" to show the result of multiplying the values in the "Community Area" column by the "FBI Code" column.

In [None]:
dflow_prod = dflow.add_column(new_column_name='Product',
                              prior_column='FBI Code',
                              expression=dflow['Community Area']*dflow['FBI Code'])
dflow_prod.head(5)

#### `col(column1) / col(column2)`
Add a new column "True Quotient" to show the result of true (decimal) division of the values in the "Community Area" column by the "FBI Code" column.

In [None]:
dflow_true_div = dflow.add_column(new_column_name='True Quotient',
                                  prior_column='FBI Code',
                                  expression=dflow['Community Area']/dflow['FBI Code'])
dflow_true_div.head(5)

#### `col(column1) // col(column2)`
Add a new column "Floor Quotient" to show the result of floor (integer) division of the values in the "Community Area" column by the "FBI Code" column.

In [None]:
dflow_floor_div = dflow.add_column(new_column_name='Floor Quotient',
                                   prior_column='FBI Code',
                                   expression=dflow['Community Area']//dflow['FBI Code'])
dflow_floor_div.head(5)
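
The difference between the two division operators is easiest to see on a single pair of numbers. The sketch below uses plain Python operators with hypothetical values; it illustrates Python's own semantics, and it is an assumption that Data Prep expressions follow them in every edge case, such as negative operands.

```python
# Hypothetical values standing in for one row of the two columns.
community_area = 25
fbi_code = 6

true_quotient = community_area / fbi_code    # keeps the fractional part
floor_quotient = community_area // fbi_code  # rounds down to a whole number

print(true_quotient, floor_quotient)
```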

#### `col(column1) % col(column2)`
Add a new column "Mod" to show the remainder left after dividing the values in the "Community Area" column by the "FBI Code" column.

In [None]:
dflow_mod = dflow.add_column(new_column_name='Mod',
                             prior_column='FBI Code',
                             expression=dflow['Community Area']%dflow['FBI Code'])
dflow_mod.head(5)
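
In Python, the modulo result is tied to floor division by a simple identity: the floor quotient times the divisor, plus the remainder, gives back the dividend. The sketch below checks this on hypothetical values with plain Python operators; whether Data Prep matches Python for negative operands is not covered here.

```python
# Hypothetical values standing in for one row of the two columns.
community_area = 25
fbi_code = 6

remainder = community_area % fbi_code       # what is left over
floor_quotient = community_area // fbi_code  # whole number of times fbi_code fits

# The two results recombine into the original dividend.
reconstructed = floor_quotient * fbi_code + remainder

print(remainder, reconstructed == community_area)
```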

#### `col(column1) ** col(column2)`
Add a new column "Power" to show the result of exponentiation, where the base is the "Community Area" column and the exponent is the "FBI Code" column.

In [None]:
dflow_pow = dflow.add_column(new_column_name='Power',
                             prior_column='FBI Code',
                             expression=dflow['Community Area']**dflow['FBI Code'])
dflow_pow.head(5)