diff --git a/work-with-data/dataprep/how-to-guides/column-manipulations.ipynb b/work-with-data/dataprep/how-to-guides/column-manipulations.ipynb
deleted file mode 100644
index bf1836f9..00000000
--- a/work-with-data/dataprep/how-to-guides/column-manipulations.ipynb
+++ /dev/null
@@ -1,563 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Column Manipulations\n",
- "\n",
- "Azure ML Data Prep has many methods for manipulating columns, including basic CUD operations and several other more complex manipulations.\n",
- "\n",
- "This notebook will focus primarily on data-agnostic operations. For all other column manipulation operations, we will link to their specific how-to guide.\n",
- "\n",
- "## Table of Contents\n",
- "[ColumnSelector](#ColumnSelector)
\n",
- "[add_column](#add_column)
\n",
- "[append_columns](#append_columns)
\n",
- "[drop_columns](#drop_columns)
\n",
- "[duplicate_column](#duplicate_column)
\n",
- "[fuzzy_group_column](#fuzzy_group_column)
\n",
- "[keep_columns](#keep_columns)
\n",
- "[map_column](#map_column)
\n",
- "[new_script_column](#new_script_column)
\n",
- "[rename_columns](#rename_columns)
\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## ColumnSelector\n",
- "`ColumnSelector` is a Data Prep class that allows us to select columns by name. The idea is to be able to describe columns generally instead of explicitly, using a search term or regex expression, with various options.\n",
- "\n",
- "Note that a `ColumnSelector` does not represent the columns they match themselves, but the selector of the described columns. Therefore if we use the same `ColumnSelector` on two different dataflows, we may get different results depending on the columns of each dataflow.\n",
- "\n",
- "Column manipulations that can utilize `ColumnSelector` will be noted in their respective sections in this book."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.dataprep import auto_read_file\n",
- "dflow = auto_read_file(path='../data/crime-dirty.csv')\n",
- "dflow.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "All parameters to a `ColumnSelector` are shown here for completeness. We will use `keep_columns` in our example, which will keep only the columns in the dataflow that we tell it to keep.\n",
- "\n",
- "In the below example, we match all columns with the letter 'i'. Because we set `ignore_case` to false and `match_whole_word` to false, then any column that contains 'i' or 'I' will be selected."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.dataprep import ColumnSelector\n",
- "column_selector = ColumnSelector(term=\"i\",\n",
- " use_regex=False,\n",
- " ignore_case=True,\n",
- " match_whole_word=False,\n",
- " invert=False)\n",
- "dflow_selected = dflow.keep_columns(column_selector)\n",
- "dflow_selected.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "If we set `invert` to true, we get the opposite of what we matched earlier."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "column_selector = ColumnSelector(term=\"i\",\n",
- " use_regex=False,\n",
- " ignore_case=True,\n",
- " match_whole_word=False,\n",
- " invert=True)\n",
- "dflow_selected = dflow.keep_columns(column_selector)\n",
- "dflow_selected.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "If we change the search term to 'I' and set case sensitivity to true, we get only the handful of columns that contain an upper case 'I'."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "column_selector = ColumnSelector(term=\"I\",\n",
- " use_regex=False,\n",
- " ignore_case=False,\n",
- " match_whole_word=False,\n",
- " invert=False)\n",
- "dflow_selected = dflow.keep_columns(column_selector)\n",
- "dflow_selected.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "And if we set `match_whole_word` to true, we get no results at all as there is no column called 'I'."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "column_selector = ColumnSelector(term=\"I\",\n",
- " use_regex=False,\n",
- " ignore_case=False,\n",
- " match_whole_word=True,\n",
- " invert=False)\n",
- "dflow_selected = dflow.keep_columns(column_selector)\n",
- "dflow_selected.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Finally, the `use_regex` flag dictates whether or not to treat the search term as a regex. It can be combined still with the other options.\n",
- "\n",
- "Here we define all columns that begin with the capital letter 'I'."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "column_selector = ColumnSelector(term=\"I.*\",\n",
- " use_regex=True,\n",
- " ignore_case=True,\n",
- " match_whole_word=True,\n",
- " invert=False)\n",
- "dflow_selected = dflow.keep_columns(column_selector)\n",
- "dflow_selected.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## add_column"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Please see [add-column-using-expression](add-column-using-expression.ipynb)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## append_columns"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Please see [append-columns-and-rows](append-columns-and-rows.ipynb)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## drop_columns"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Data Prep supports dropping columns one or more columns in a single statement. Supports `ColumnSelector`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.dataprep import auto_read_file\n",
- "dflow = auto_read_file(path='../data/crime-dirty.csv')\n",
- "dflow.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Note that there are 22 columns to begin with. We will now drop the 'ID' column and observe that the resulting dataflow contains 21 columns."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_dropped = dflow.drop_columns('ID')\n",
- "dflow_dropped.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We can also drop more than one column at once by passing a list of column names."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_dropped = dflow_dropped.drop_columns(['IUCR', 'Description'])\n",
- "dflow_dropped.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## duplicate_column"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Data Prep supports duplicating columns one or more columns in a single statement.\n",
- "\n",
- "Duplicated columns are placed to the immediate right of their source column."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.dataprep import auto_read_file\n",
- "dflow = auto_read_file(path='../data/crime-dirty.csv')\n",
- "dflow.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We decide which column(s) to duplicate and what the new column name(s) should be with a key value pairing (dictionary)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_dupe = dflow.duplicate_column({'ID': 'ID2', 'IUCR': 'IUCR_Clone'})\n",
- "dflow_dupe.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## fuzzy_group_column"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Please see [fuzzy-group](fuzzy-group.ipynb)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## keep_columns"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Data Prep supports keeping one or more columns in a single statement. The resulting dataflow will contain only the column(s) specified; dropping all the other columns. Supports `ColumnSelector`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.dataprep import auto_read_file\n",
- "dflow = auto_read_file(path='../data/crime-dirty.csv')\n",
- "dflow.head(5)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_keep = dflow.keep_columns(['ID', 'Date', 'Description'])\n",
- "dflow_keep.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Similar to `drop_columns`, we can pass a single column name or a list of them."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_keep = dflow_keep.keep_columns('ID')\n",
- "dflow_keep.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## map_column"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Data Prep supports string mapping. For a column containing strings, we can provide specific mappings from an original value to a new value, and then produce a new column that contains the mapped values.\n",
- "\n",
- "The mapped columns are placed to the immediate right of their source column."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.dataprep import auto_read_file\n",
- "dflow = auto_read_file(path='../data/crime-dirty.csv')\n",
- "dflow.head(5)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.dataprep import ReplacementsValue\n",
- "replacements = [ReplacementsValue('THEFT', 'THEFT2'), ReplacementsValue('BATTERY', 'BATTERY!!!')]\n",
- "dflow_mapped = dflow.map_column(column='Primary Type', \n",
- " new_column_id='Primary Type V2',\n",
- " replacements=replacements)\n",
- "dflow_mapped.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## new_script_column"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Please see [custom-python-transforms](custom-python-transforms.ipynb)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## rename_columns"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Data Prep supports renaming one or more columns in a single statement."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.dataprep import auto_read_file\n",
- "dflow = auto_read_file(path='../data/crime-dirty.csv')\n",
- "dflow.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We decide which column(s) to rename and what the new column name(s) should be with a key value pairing (dictionary)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_renamed = dflow.rename_columns({'ID': 'ID2', 'IUCR': 'IUCR_Clone'})\n",
- "dflow_renamed.head(5)"
- ]
- }
- ],
- "metadata": {
- "authors": [
- {
- "name": "sihhu"
- }
- ],
- "kernelspec": {
- "display_name": "Python 3.6",
- "language": "python",
- "name": "python36"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.4"
- },
- "notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License."
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
\ No newline at end of file