From 1f4e4cdda22b9b36a8e973f727994d3bad42f1f8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Shan=C3=A9=20Winner?= <43390034+swinner95@users.noreply.github.com> Date: Sun, 28 Jul 2019 00:23:28 -0700 Subject: [PATCH] Delete filtering.ipynb --- .../dataprep/how-to-guides/filtering.ipynb | 222 ------------------ 1 file changed, 222 deletions(-) delete mode 100644 how-to-use-azureml/work-with-data/dataprep/how-to-guides/filtering.ipynb diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/filtering.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/filtering.ipynb deleted file mode 100644 index 780309bb..00000000 --- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/filtering.ipynb +++ /dev/null @@ -1,222 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/dataprep/how-to-guides/filtering.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Filtering\n", - "Copyright (c) Microsoft Corporation. All rights reserved.
\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Azure ML Data Prep has the ability to filter out columns or rows using `Dataflow.drop_columns` or `Dataflow.filter`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# initial set up\n", - "import azureml.dataprep as dprep\n", - "from datetime import datetime\n", - "dflow = dprep.read_csv(path='../data/crime-spring.csv')\n", - "dflow.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Filtering columns" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To filter columns, use `Dataflow.drop_columns`. This method takes a list of columns to drop or a more complex argument called `ColumnSelector`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Filtering columns with list of strings" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this example, `drop_columns` takes a list of strings. Each string should exactly match the desired column to drop." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow = dflow.drop_columns(['ID', 'Location Description', 'Ward', 'Community Area', 'FBI Code'])\n", - "dflow.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Filtering columns with regex" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Alternatively, a `ColumnSelector` can be used to drop columns that match a regex expression. In this example, we drop all the columns that match the expression `Column*|.*longitud|.*latitude`." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow = dflow.drop_columns(dprep.ColumnSelector('Column*|.*longitud|.*latitude', True, True))\n", - "dflow.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Filtering rows" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To filter rows, use `Dataflow.filter`. This method takes an `Expression` as an argument and returns a new dataflow containing the rows for which the expression evaluates to `True`. Expressions are built by indexing the `Dataflow` with a column name (`dataflow['myColumn']`) and applying the standard comparison operators (`>`, `<`, `>=`, `<=`, `==`, `!=`)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Filtering rows with simple expressions" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Index into the Dataflow with the column name as a string, `dataflow['column_name']`, and combine the result with one of the standard operators `>, <, >=, <=, ==, !=` to build an expression such as `dataflow['District'] > 9`. 
Finally, pass the built expression into the `Dataflow.filter` function.\n", - "\n", - "In this example, `dataflow.filter(dataflow['District'] > 9)` returns a new dataflow with the rows in which the value of \"District\" is greater than 9.\n", - "\n", - "*Note that \"District\" is first converted to numeric, which allows us to build an expression comparing it against other numeric values.*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow = dflow.to_number(['District'])\n", - "dflow = dflow.filter(dflow['District'] > 9)\n", - "dflow.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Filtering rows with complex expressions" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To filter using complex expressions, combine one or more simple expressions with the operators `&`, `|`, and `~`. Note that in Python these operators bind more tightly than the comparison operators; therefore, you need to wrap each comparison clause in parentheses.\n", - "\n", - "In this example, `Dataflow.filter` returns a new dataflow with the rows in which \"Primary Type\" equals 'DECEPTIVE PRACTICE' and \"District\" is greater than or equal to 10." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow = dflow.to_number(['District'])\n", - "dflow = dflow.filter((dflow['Primary Type'] == 'DECEPTIVE PRACTICE') & (dflow['District'] >= 10))\n", - "dflow.head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It is also possible to filter rows by combining more than one expression builder to create a nested expression.\n", - "\n", - "*Note that `'Date'` and `'Updated On'` are first converted to datetime, which allows us to build an expression comparing them against other datetime values.*" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dflow = dflow.to_datetime(['Date', 'Updated On'], ['%Y-%m-%d %H:%M:%S'])\n", - "dflow = dflow.to_number(['District', 'Y Coordinate'])\n", - "comparison_date = datetime(2016,4,13)\n", - "dflow = dflow.filter(\n", - " ((dflow['Date'] > comparison_date) | (dflow['Updated On'] > comparison_date))\n", - " | ((dflow['Y Coordinate'] > 1900000) & (dflow['District'] > 10.0)))\n", - "dflow.head(5)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "sihhu" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file