{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# One Hot Encoder\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Azure ML Data Prep can one-hot encode a selected column using `one_hot_encode`. The resulting Dataflow has a new binary column for each categorical label encountered in the selected column." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import azureml.dataprep as dprep\n", "\n", "# Load the sample data into a Dataflow and preview the first rows.\n", "dflow = dprep.read_csv(path='../data/crime-spring.csv')\n", "dflow.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To use `one_hot_encode` on a Dataflow, specify the source column. `one_hot_encode` determines the distinct values (the categorical labels) in the source column from the current data and returns a new Dataflow with a new binary column for each label. Note that the categorical labels are stored in the Dataflow step." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflow_result = dflow.one_hot_encode(source_column='Location Description')\n", "dflow_result.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, all the new columns use the `source_column` name as a prefix. To specify your own prefix, pass a `prefix` string as the second parameter."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflow_result = dflow.one_hot_encode(source_column='Location Description', prefix='LOCATION_')\n", "dflow_result.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To have more control over the categorical labels, create a builder using `dataflow.builders.one_hot_encode`. The builder lets you preview and modify the categorical labels before generating a new Dataflow with the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "builder = dflow.builders.one_hot_encode(source_column='Location Description', prefix='LOCATION_')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To generate the categorical labels, call the `learn` method on the builder object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "builder.learn()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To preview the categorical labels, access the `categorical_labels` property on the builder object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "builder.categorical_labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To modify the generated `categorical_labels`, assign a new value to `categorical_labels` or modify the existing one. The following example appends a label that does not appear in the sample data to `categorical_labels`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "builder.categorical_labels.append('TOWNHOUSE')\n", "builder.categorical_labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the labels are as desired, call `builder.to_dataflow` to get the new Dataflow with the encoded labels."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflow_result = builder.to_dataflow()\n", "dflow_result.head(5)" ] } ], "metadata": { "authors": [ { "name": "sihhu" } ], "kernelspec": { "display_name": "Python 3.6", "language": "python", "name": "python36" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" }, "notice": "Copyright (c) Microsoft Corporation. All rights reserved. Licensed under the MIT License." }, "nbformat": 4, "nbformat_minor": 2 }