From 883ad806ba6a9459f3dc4775e7df761a49125066 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Shan=C3=A9=20Winner?= <43390034+swinner95@users.noreply.github.com> Date: Sun, 28 Jul 2019 00:20:22 -0700 Subject: [PATCH] Delete add-column-using-expression.ipynb --- .../add-column-using-expression.ipynb | 361 ------------------ 1 file changed, 361 deletions(-) delete mode 100644 how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb deleted file mode 100644 index 46b09c65..00000000 --- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.ipynb +++ /dev/null @@ -1,361 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/dataprep/how-to-guides/add-column-using-expression.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Add Column using Expression\n", - "Copyright (c) Microsoft Corporation. All rights reserved.
\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "With Azure ML Data Prep you can add a new column to data with `Dataflow.add_column` by using a Data Prep expression to calculate the value from existing columns. This is similar to using Python to create a [new script column](./custom-python-transforms.ipynb#New-Script-Column) except the Data Prep expressions are more limited and will execute faster. The expressions used are the same as for [filtering rows](./filtering.ipynb#Filtering-rows) and hence have the same functions and operators available.\n", - "

\n", - "Here we add additional columns. First we get input data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.dataprep as dprep" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# loading data\n", - "dflow = dprep.auto_read_file('../data/crime-spring.csv')\n", - "dflow.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `substring(start, length)`\n", - "Add a new column \"Case Category\" using the `substring(start, length)` expression to extract the prefix from the \"Case Number\" column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "case_category = dflow.add_column(new_column_name='Case Category',\n", - " prior_column='Case Number',\n", - " expression=dflow['Case Number'].substring(0, 2))\n", - "case_category.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `substring(start)`\n", - "Add a new column \"Case Id\" using the `substring(start)` expression to extract just the number from \"Case Number\" column and then convert it to numeric." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "case_id = dflow.add_column(new_column_name='Case Id',\n", - " prior_column='Case Number',\n", - " expression=dflow['Case Number'].substring(2))\n", - "case_id = case_id.to_number('Case Id')\n", - "case_id.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `length()`\n", - "Using the length() expression, add a new numeric column \"Length\", which contains the length of the string in \"Primary Type\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_length = dflow.add_column(new_column_name='Length',\n", - " prior_column='Primary Type',\n", - " expression=dflow['Primary Type'].length())\n", - "dflow_length.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `to_upper()`\n", - "Using the to_upper() expression, add a new numeric column \"Upper Case\", which contains the length of the string in \"Primary Type\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_to_upper = dflow.add_column(new_column_name='Upper Case',\n", - " prior_column='Primary Type',\n", - " expression=dflow['Primary Type'].to_upper())\n", - "dflow_to_upper.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `to_lower()`\n", - "Using the to_lower() expression, add a new numeric column \"Lower Case\", which contains the length of the string in \"Primary Type\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_to_lower = dflow.add_column(new_column_name='Lower Case',\n", - " prior_column='Primary Type',\n", - " expression=dflow['Primary Type'].to_lower())\n", - "dflow_to_lower.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `RegEx.extract_record()`\n", - "Using the `RegEx.extract_record()` expression, add a new record column \"Stream Date Record\", which contains the name capturing groups in the regex with value." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_regex_extract_record = dprep.auto_read_file('../data/stream-path.csv')\n", - "regex = dprep.RegEx('\\/(?\\d{4})\\/(?\\d{2})\\/(?\\d{2})\\/')\n", - "dflow_regex_extract_record = dflow_regex_extract_record.add_column(new_column_name='Stream Date Record',\n", - " prior_column='Stream Path',\n", - " expression=regex.extract_record(dflow_regex_extract_record['Stream Path']))\n", - "dflow_regex_extract_record.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `create_datetime()`\n", - "Using the `create_datetime()` expression, add a new column \"Stream Date\", which contains datetime values constructed from year, month, day values extracted from a record column \"Stream Date Record\"." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "year = dprep.col('year', dflow_regex_extract_record['Stream Date Record'])\n", - "month = dprep.col('month', dflow_regex_extract_record['Stream Date Record'])\n", - "day = dprep.col('day', dflow_regex_extract_record['Stream Date Record'])\n", - "dflow_create_datetime = dflow_regex_extract_record.add_column(new_column_name='Stream Date',\n", - " prior_column='Stream Date Record',\n", - " expression=dprep.create_datetime(year, month, day))\n", - "dflow_create_datetime.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `col(column1) + col(column2)`\n", - "Add a new column \"Total\" to show the result of adding the values in the \"FBI Code\" column to the \"Community Area\" column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_total = dflow.add_column(new_column_name='Total',\n", - " prior_column='FBI Code',\n", - " expression=dflow['Community Area']+dflow['FBI Code'])\n", - "dflow_total.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `col(column1) - col(column2)`\n", - "Add a new column \"Subtract\" to show the result of subtracting the values in the \"FBI Code\" column from the \"Community Area\" column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_diff = dflow.add_column(new_column_name='Difference',\n", - " prior_column='FBI Code',\n", - " expression=dflow['Community Area']-dflow['FBI Code'])\n", - "dflow_diff.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `col(column1) * col(column2)`\n", - "Add a new column \"Product\" to show the result of multiplying the values in the \"FBI Code\" column to the \"Community Area\" column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_prod = dflow.add_column(new_column_name='Product',\n", - " prior_column='FBI Code',\n", - " expression=dflow['Community Area']*dflow['FBI Code'])\n", - "dflow_prod.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `col(column1) / col(column2)`\n", - "Add a new column \"True Quotient\" to show the result of true (decimal) division of the values in \"Community Area\" column by the \"FBI Code\" column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_true_div = dflow.add_column(new_column_name='True Quotient',\n", - " prior_column='FBI Code',\n", - " expression=dflow['Community Area']/dflow['FBI Code'])\n", - "dflow_true_div.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `col(column1) // col(column2)`\n", - "Add a new column \"Floor Quotient\" to show the result of floor (integer) division of the values in \"Community Area\" column by the \"FBI Code\" column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_floor_div = dflow.add_column(new_column_name='Floor Quotient',\n", - " prior_column='FBI Code',\n", - " expression=dflow['Community Area']//dflow['FBI Code'])\n", - "dflow_floor_div.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `col(column1) % col(column2)`\n", - "Add a new column \"Mod\" to show the result of applying the modulo operation on the \"FBI Code\" column and the \"Community Area\" column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_mod = dflow.add_column(new_column_name='Mod',\n", - " prior_column='FBI Code',\n", - " expression=dflow['Community Area']%dflow['FBI Code'])\n", - "dflow_mod.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### `col(column1) ** col(column2)`\n", - "Add a new column \"Power\" to show the result of applying the exponentiation operation when the base is the \"Community Area\" column and the exponent is \"FBI Code\" column." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow_pow = dflow.add_column(new_column_name='Power',\n", - " prior_column='FBI Code',\n", - " expression=dflow['Community Area']**dflow['FBI Code'])\n", - "dflow_pow.head(5)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "sihhu" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file