diff --git a/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb b/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb
deleted file mode 100644
index 3fa0e65e..00000000
--- a/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb
+++ /dev/null
@@ -1,360 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Add Column using Expression\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "With Azure ML Data Prep you can add a new column to data with `Dataflow.add_column` by using a Data Prep expression to calculate the value from existing columns. This is similar to using Python to create a [new script column](./custom-python-transforms.ipynb#New-Script-Column) except the Data Prep expressions are more limited and will execute faster. The expressions used are the same as for [filtering rows](./filtering.ipynb#Filtering-rows) and hence have the same functions and operators available.\n",
- "\n",
- "Here we add additional columns. First we get input data."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import azureml.dataprep as dprep"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# loading data\n",
- "dflow = dprep.auto_read_file('../data/crime-spring.csv')\n",
- "dflow.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `substring(start, length)`\n",
- "Add a new column \"Case Category\" using the `substring(start, length)` expression to extract the prefix from the \"Case Number\" column."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "case_category = dflow.add_column(new_column_name='Case Category',\n",
- " prior_column='Case Number',\n",
- " expression=dflow['Case Number'].substring(0, 2))\n",
- "case_category.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `substring(start)`\n",
- "Add a new column \"Case Id\" using the `substring(start)` expression to extract just the number from the \"Case Number\" column, then convert it to a numeric type."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "case_id = dflow.add_column(new_column_name='Case Id',\n",
- " prior_column='Case Number',\n",
- " expression=dflow['Case Number'].substring(2))\n",
- "case_id = case_id.to_number('Case Id')\n",
- "case_id.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `length()`\n",
- "Using the `length()` expression, add a new numeric column \"Length\", which contains the length of the string in \"Primary Type\"."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_length = dflow.add_column(new_column_name='Length',\n",
- " prior_column='Primary Type',\n",
- " expression=dflow['Primary Type'].length())\n",
- "dflow_length.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `to_upper()`\n",
- "Using the `to_upper()` expression, add a new string column \"Upper Case\", which contains the value of \"Primary Type\" converted to upper case."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_to_upper = dflow.add_column(new_column_name='Upper Case',\n",
- " prior_column='Primary Type',\n",
- " expression=dflow['Primary Type'].to_upper())\n",
- "dflow_to_upper.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `to_lower()`\n",
- "Using the `to_lower()` expression, add a new string column \"Lower Case\", which contains the value of \"Primary Type\" converted to lower case."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_to_lower = dflow.add_column(new_column_name='Lower Case',\n",
- " prior_column='Primary Type',\n",
- " expression=dflow['Primary Type'].to_lower())\n",
- "dflow_to_lower.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `RegEx.extract_record()`\n",
- "Using the `RegEx.extract_record()` expression, add a new record column \"Stream Date Record\", which contains the named capturing groups in the regex and their values."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_regex_extract_record = dprep.auto_read_file('../data/stream-path.csv')\n",
- "regex = dprep.RegEx('\\/(?