From 33bda032b82e4ea5255e577b78c8cde16f9bd796 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Shan=C3=A9=20Winner?= <43390034+swinner95@users.noreply.github.com>
Date: Sun, 28 Jul 2019 00:24:43 -0700
Subject: [PATCH] Delete one-hot-encoder.ipynb

---
 .../how-to-guides/one-hot-encoder.ipynb | 180 ------------------
 1 file changed, 180 deletions(-)
 delete mode 100644 how-to-use-azureml/work-with-data/dataprep/how-to-guides/one-hot-encoder.ipynb

diff --git a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/one-hot-encoder.ipynb b/how-to-use-azureml/work-with-data/dataprep/how-to-guides/one-hot-encoder.ipynb
deleted file mode 100644
index ad5350ec..00000000
--- a/how-to-use-azureml/work-with-data/dataprep/how-to-guides/one-hot-encoder.ipynb
+++ /dev/null
@@ -1,180 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/dataprep/how-to-guides/one-hot-encoder.png)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# One Hot Encoder\n",
-    "Copyright (c) Microsoft Corporation. All rights reserved. \n",
-    "Licensed under the MIT License."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Azure ML Data Prep has the ability to perform one hot encoding on a selected column using `one_hot_encode`. The result Dataflow will have a new binary column for each categorical label encountered in the selected column."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import azureml.dataprep as dprep\n",
-    "dflow = dprep.read_csv(path='../data/crime-spring.csv')\n",
-    "dflow.head(5)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To use `one_hot_encode` from a Dataflow, simply specify the source column. `one_hot_encode` will figure out all the distinct values or categorical labels in the source column using the current data, and it will return a new Dataflow with a new binary column for each categorical label. Note that the categorical labels are remembered in the Dataflow step."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dflow_result = dflow.one_hot_encode(source_column='Location Description')\n",
-    "dflow_result.head(5)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "By default, all the new columns will use the `source_column` name as a prefix. However, if you would like to specify your own prefix, simply pass a `prefix` string as a second parameter."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dflow_result = dflow.one_hot_encode(source_column='Location Description', prefix='LOCATION_')\n",
-    "dflow_result.head(5)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To have more control over the categorical labels, create a builder using `dataflow.builders.one_hot_encode`. The builder allows to preview and modify the categorical labels before generating a new Dataflow with the results."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "builder = dflow.builders.one_hot_encode(source_column='Location Description', prefix='LOCATION_')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To generate the categorical labels, call the `learn` method on the builder object:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "builder.learn()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To preview the categorical labels, simply access them through the property `categorical_labels` on the builder object:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "builder.categorical_labels"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To modify the generated `categorical_labels`, assign a new value to `categorical_labels` or modify the existing one. The following example adds a missing label not found on the sample data to `categorical_labels`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "builder.categorical_labels.append('TOWNHOUSE')\n",
-    "builder.categorical_labels"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Once the desired results are achieved, call `builder.to_dataflow` to get the new Dataflow with the encoded labels."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dflow_result = builder.to_dataflow()\n",
-    "dflow_result.head(5)"
-   ]
-  }
- ],
- "metadata": {
-  "authors": [
-   {
-    "name": "sihhu"
-   }
-  ],
-  "kernelspec": {
-   "display_name": "Python 3.6",
-   "language": "python",
-   "name": "python36"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.4"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
\ No newline at end of file