# Data Profile


A DataProfile collects summary statistics for each column of the data produced by a Dataflow. You can use a profile to:
- Understand the input data.
- Determine which columns might need further preparation.
- Verify that data preparation operations produced the desired result.

`Dataflow.get_profile()` executes the Dataflow, calculates profile information, and returns a newly constructed DataProfile.

In [None]:
import azureml.dataprep as dprep

dflow = dprep.auto_read_file('../data/crime-spring.csv')

profile = dflow.get_profile()
profile
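
To make "summary statistics on each column" concrete, the following stand-alone sketch computes a few such statistics in plain Python. It does not use `azureml.dataprep` and is not how the library computes profiles; `summarize_column` and the sample rows are hypothetical names and data chosen for illustration.

```python
from statistics import mean

# Hypothetical sample rows standing in for the records a Dataflow would produce.
rows = [
    {"Beat": 1011, "Primary Type": "THEFT"},
    {"Beat": 722, "Primary Type": "BATTERY"},
    {"Beat": None, "Primary Type": "THEFT"},
]

def summarize_column(rows, name):
    """Return a few basic statistics for one column (illustrative only)."""
    values = [r[name] for r in rows]
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing_count": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }
    # Only numeric columns get a mean in this sketch.
    if present and all(isinstance(v, (int, float)) for v in present):
        summary["mean"] = mean(present)
    return summary

# One summary per column, indexed by column name.
summary_by_column = {name: summarize_column(rows, name) for name in rows[0]}
```

As with a DataProfile, the result is indexed by column name, and the numeric column carries more statistics than the string column.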

A DataProfile contains a collection of ColumnProfiles, indexed by column name. Each ColumnProfile has attributes for the calculated column statistics. For non-numeric columns, profiles include only basic statistics like min, max, and error count. For numeric columns, profiles also include statistical moments and estimated quantiles.

In [None]:
profile.columns['Beat']

You can also extract and filter data from profiles by using list and dict comprehensions.

In [None]:
# Compare against None so that a variance of exactly 0.0 is not filtered out.
variances = [c.variance for c in profile.columns.values() if c.variance is not None]
variances

In [None]:
column_types = {c.name: c.type for c in profile.columns.values()}
column_types
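
The same comprehension pattern can be tried without a Dataflow. The sketch below uses `SimpleNamespace` objects as stand-ins for ColumnProfiles; the attribute names mirror the ones used above (`name`, `type`, `variance`), while the values and the string type labels are invented for illustration and are not real profile output.

```python
from types import SimpleNamespace

# Stand-ins for ColumnProfile objects; values are invented for illustration.
columns = {
    "Beat": SimpleNamespace(name="Beat", type="DECIMAL", variance=120.5),
    "Ward": SimpleNamespace(name="Ward", type="DECIMAL", variance=0.0),
    "Primary Type": SimpleNamespace(name="Primary Type", type="STRING", variance=None),
}

# Keep every column that has a variance, including a variance of exactly 0.0.
variance_by_name = {
    c.name: c.variance for c in columns.values() if c.variance is not None
}

# Group column names by their reported type.
names_by_type = {}
for c in columns.values():
    names_by_type.setdefault(c.type, []).append(c.name)
```

Testing with `is not None` rather than plain truthiness matters here: a constant column has a variance of 0.0, which is falsy and would otherwise be dropped.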

If a column has fewer than a thousand unique values, its ColumnProfile contains a summary of values with their respective counts.

In [None]:
profile.columns['Primary Type'].value_counts
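
For intuition, a value summary with a cardinality cutoff can be sketched with `collections.Counter`. The thousand-value threshold comes from the text above; the `value_counts` helper is hypothetical, and returning `None` for high-cardinality data is an assumption of this sketch rather than documented library behavior.

```python
from collections import Counter

MAX_UNIQUE_VALUES = 1000  # threshold taken from the text above

def value_counts(values, max_unique=MAX_UNIQUE_VALUES):
    """Count occurrences of each value, or return None for high-cardinality data."""
    counts = Counter(values)
    if len(counts) >= max_unique:
        # Assumption: no summary is produced once the cutoff is reached.
        return None
    # Most frequent values first.
    return counts.most_common()

primary_types = ["THEFT", "BATTERY", "THEFT", "NARCOTICS", "THEFT"]
counts = value_counts(primary_types)
```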

Numeric ColumnProfiles include an estimated histogram of the data.

In [None]:
profile.columns['District'].histogram

To configure the number of bins in the histogram, pass an integer as the `number_of_histogram_bins` argument to `get_profile()`.

In [None]:
# Request a histogram with exactly five bins.
profile_five_bins = dflow.get_profile(number_of_histogram_bins=5)
profile_five_bins.columns['District'].histogram
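
To see what the bin count controls, here is a minimal equal-width histogram in plain Python. The profile's histogram is described above as estimated, so this exact computation is only an analogy; `equal_width_histogram` and the sample values are hypothetical.

```python
def equal_width_histogram(values, number_of_bins):
    """Return (lower_bound, upper_bound, count) tuples for equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / number_of_bins
    counts = [0] * number_of_bins
    for v in values:
        # The maximum value belongs to the last bin rather than a new one.
        index = min(int((v - lo) / width), number_of_bins - 1) if width else 0
        counts[index] += 1
    return [
        (lo + i * width, lo + (i + 1) * width, counts[i])
        for i in range(number_of_bins)
    ]

# Invented district numbers, for illustration only.
districts = [1, 2, 2, 3, 5, 8, 8, 9, 10, 11]
histogram = equal_width_histogram(districts, number_of_bins=5)
```

Fewer bins give a coarser view of the distribution; more bins show finer detail at the cost of noisier counts per bin.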

For columns containing data of mixed types, the ColumnProfile also provides counts of each type.

In [None]:
profile.columns['X Coordinate'].type_counts
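
As a rough analogy, counting the types present in a column can be sketched with `collections.Counter`. The sample values are invented, and Python's built-in type names stand in for the library's own field types, which this sketch does not reproduce.

```python
from collections import Counter

# Invented sample of a column whose values were parsed into mixed types.
x_coordinates = [1176011.0, "unknown", 1154385.0, None, 1163211.0]

# Count how many values of each Python type appear in the column.
type_counts = Counter(type(v).__name__ for v in x_coordinates)
```

A column with more than one entry in its type counts is a likely candidate for further cleaning or conversion.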