From 883ad806ba6a9459f3dc4775e7df761a49125066 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Shan=C3=A9=20Winner?= <43390034+swinner95@users.noreply.github.com>
Date: Sun, 28 Jul 2019 00:20:22 -0700
Subject: [PATCH] Delete add-column-using-expression.ipynb
---
.../add-column-using-expression.ipynb | 361 ------------------
1 file changed, 361 deletions(-)
delete mode 100644 how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb
diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb
deleted file mode 100644
index 46b09c65..00000000
--- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb
+++ /dev/null
@@ -1,361 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Add Column using Expression\n",
- "Copyright (c) Microsoft Corporation. All rights reserved.\n",
- "Licensed under the MIT License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "With Azure ML Data Prep you can add a new column to data with `Dataflow.add_column` by using a Data Prep expression to calculate the value from existing columns. This is similar to using Python to create a [new script column](./custom-python-transforms.ipynb#New-Script-Column) except the Data Prep expressions are more limited and will execute faster. The expressions used are the same as for [filtering rows](./filtering.ipynb#Filtering-rows) and hence have the same functions and operators available.\n",
- "\n",
- "Here we add additional columns. First we get input data."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import azureml.dataprep as dprep"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# loading data\n",
- "dflow = dprep.auto_read_file('../data/crime-spring.csv')\n",
- "dflow.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `substring(start, length)`\n",
- "Add a new column \"Case Category\" using the `substring(start, length)` expression to extract the prefix from the \"Case Number\" column."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "case_category = dflow.add_column(new_column_name='Case Category',\n",
- " prior_column='Case Number',\n",
- " expression=dflow['Case Number'].substring(0, 2))\n",
- "case_category.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `substring(start)`\n",
- "Add a new column \"Case Id\" using the `substring(start)` expression to extract just the number from the \"Case Number\" column, then convert it to numeric."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "case_id = dflow.add_column(new_column_name='Case Id',\n",
- " prior_column='Case Number',\n",
- " expression=dflow['Case Number'].substring(2))\n",
- "case_id = case_id.to_number('Case Id')\n",
- "case_id.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `length()`\n",
- "Using the `length()` expression, add a new numeric column \"Length\", which contains the length of the string in \"Primary Type\"."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_length = dflow.add_column(new_column_name='Length',\n",
- " prior_column='Primary Type',\n",
- " expression=dflow['Primary Type'].length())\n",
- "dflow_length.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `to_upper()`\n",
- "Using the `to_upper()` expression, add a new string column \"Upper Case\", which contains the string in \"Primary Type\" converted to upper case."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_to_upper = dflow.add_column(new_column_name='Upper Case',\n",
- " prior_column='Primary Type',\n",
- " expression=dflow['Primary Type'].to_upper())\n",
- "dflow_to_upper.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `to_lower()`\n",
- "Using the `to_lower()` expression, add a new string column \"Lower Case\", which contains the string in \"Primary Type\" converted to lower case."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_to_lower = dflow.add_column(new_column_name='Lower Case',\n",
- " prior_column='Primary Type',\n",
- " expression=dflow['Primary Type'].to_lower())\n",
- "dflow_to_lower.head(5)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### `RegEx.extract_record()`\n",
- "Using the `RegEx.extract_record()` expression, add a new record column \"Stream Date Record\", which contains the named capturing groups in the regex together with their matched values."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "dflow_regex_extract_record = dprep.auto_read_file('../data/stream-path.csv')\n",
- "regex = dprep.RegEx('\\/(?