Delete filtering.ipynb

2019-08-21 10:11:53 -07:00
parent f7f1087337
commit d5ba3916f7
1 changed files with 0 additions and 220 deletions
--- a/work-with-data/dataprep/how-to-guides/filtering.ipynb
+++ b/work-with-data/dataprep/how-to-guides/filtering.ipynb
@@ -1,220 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/work-with-data/dataprep/how-to-guides/filtering.png)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "# Filtering\n"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "Azure ML Data Prep has the ability to filter out columns or rows using `Dataflow.drop_columns` or `Dataflow.filter`."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# initial set up\n",
-        "import azureml.dataprep as dprep\n",
-        "from datetime import datetime\n",
-        "dflow = dprep.read_csv(path='../data/crime-spring.csv')\n",
-        "dflow.head(5)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## Filtering columns"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "To filter columns, use `Dataflow.drop_columns`. This method takes a list of columns to drop or a more complex argument called `ColumnSelector`."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Filtering columns with list of strings"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "In this example, `drop_columns` takes a list of strings. Each string should exactly match the desired column to drop."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "dflow = dflow.drop_columns(['ID', 'Location Description', 'Ward', 'Community Area', 'FBI Code'])\n",
-        "dflow.head(5)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Filtering columns with regex"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "Alternatively, a `ColumnSelector` can be used to drop columns that match a regex expression. In this example, we drop all the columns that match the expression `Column*|.*longitud|.*latitude`."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "dflow = dflow.drop_columns(dprep.ColumnSelector('Column*|.*longitud|.*latitude', True, True))\n",
-        "dflow.head(5)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "## Filtering rows"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "To filter rows, use `DataFlow.filter`. This method takes an `Expression` as an argument, and returns a new dataflow with the rows in which the expression evaluates to `True`. Expressions are built by indexing the `Dataflow` with a column name (`dataflow['myColumn']`) and regular operators (`>`, `<`, `>=`, `<=`, `==`, `!=`)."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Filtering rows with simple expressions"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "Index into the Dataflow specifying the column name as a string argument `dataflow['column_name']` and in combination with one of the following standard operators `>, <, >=, <=, ==, !=`, build an expression such as `dataflow['District'] > 9`.  Finally, pass the built expression into the `Dataflow.filter` function.\n",
-        "\n",
-        "In this example, `dataflow.filter(dataflow['District'] > 9)` returns a new dataflow with the rows in which the value of \"District\" is greater than '10' \n",
-        "\n",
-        "*Note that \"District\" is first converted to numeric, which allows us to build an expression comparing it against other numeric values.*"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "dflow = dflow.to_number(['District'])\n",
-        "dflow = dflow.filter(dflow['District'] > 9)\n",
-        "dflow.head(5)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "### Filtering rows with complex expressions"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "To filter using complex expressions, combine one or more simple expressions with the operators `&`, `|`, and `~`. Please note that the precedence of these operators is lower than that of the comparison operators; therefore, you'll need to use parentheses to group clauses together. \n",
-        "\n",
-        "In this example, `Dataflow.filter` returns a new dataflow with the rows in which \"Primary Type\" equals 'DECEPTIVE PRACTICE' and \"District\" is greater than or equal to '10'."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "dflow = dflow.to_number(['District'])\n",
-        "dflow = dflow.filter((dflow['Primary Type'] == 'DECEPTIVE PRACTICE') & (dflow['District'] >= 10))\n",
-        "dflow.head(5)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "metadata": {},
-      "source": [
-        "It is also possible to filter rows combining more than one expression builder to create a nested expression.\n",
-        "\n",
-        "*Note that `'Date'` and `'Updated On'` are first converted to datetime, which allows us to build an expression comparing it against other datetime values.*"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "dflow = dflow.to_datetime(['Date', 'Updated On'], ['%Y-%m-%d %H:%M:%S'])\n",
-        "dflow = dflow.to_number(['District', 'Y Coordinate'])\n",
-        "comparison_date = datetime(2016,4,13)\n",
-        "dflow = dflow.filter(\n",
-        "    ((dflow['Date'] > comparison_date) | (dflow['Updated On'] > comparison_date))\n",
-        "    | ((dflow['Y Coordinate'] > 1900000) & (dflow['District'] > 10.0)))\n",
-        "dflow.head(5)"
-      ]
-    }
-  ],
-  "metadata": {
-    "authors": [
-      {
-        "name": "sihhu"
-      }
-    ],
-    "kernelspec": {
-      "display_name": "Python 3.6",
-      "language": "python",
-      "name": "python36"
-    },
-    "language_info": {
-      "codemirror_mode": {
-        "name": "ipython",
-        "version": 3
-      },
-      "file_extension": ".py",
-      "mimetype": "text/x-python",
-      "name": "python",
-      "nbconvert_exporter": "python",
-      "pygments_lexer": "ipython3",
-      "version": "3.6.4"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 2
-}