
# Impute missing values


Azure ML Data Prep can impute missing values in specified columns. In this example, we will impute the missing _Latitude_ and _Longitude_ values in the input data.

In [None]:
import azureml.dataprep as dprep

In [None]:
# loading input data
dflow = dprep.read_csv(path='../data/crime-spring.csv')
dflow = dflow.keep_columns(['ID', 'Arrest', 'Latitude', 'Longitude'])
dflow = dflow.to_number(['Latitude', 'Longitude'])
dflow.head(5)

The third record in the input data is missing both _Latitude_ and _Longitude_. To impute those values, we can use `ImputeMissingValuesBuilder` to learn a fixed program that fills each column with either a calculated `MIN`, `MAX`, or `MEAN` value or a `CUSTOM` value. When `group_by_columns` is specified, missing values are imputed per group, with `MIN`, `MAX`, and `MEAN` calculated within each group.
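To make the per-group behavior concrete, here is a minimal plain-Python sketch of the idea. It does not use Azure ML Data Prep and is not how the library is implemented; the `records` list and the `impute_by_group` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records (not taken from crime-spring.csv); None marks a missing value.
records = [
    {"ID": 1, "Arrest": "false", "Latitude": 41.0, "Longitude": -87.0},
    {"ID": 2, "Arrest": "false", "Latitude": 43.0, "Longitude": -88.0},
    {"ID": 3, "Arrest": "false", "Latitude": None, "Longitude": None},
    {"ID": 4, "Arrest": "true", "Latitude": 50.0, "Longitude": -80.0},
    {"ID": 5, "Arrest": "true", "Latitude": None, "Longitude": -81.0},
]


def impute_by_group(rows, group_key, mean_column, custom_column, custom_value):
    """Return copies of rows with mean_column filled by its per-group mean
    and custom_column filled with a fixed custom value."""
    # Collect the observed (non-missing) values of mean_column for each group.
    observed = defaultdict(list)
    for row in rows:
        if row[mean_column] is not None:
            observed[row[group_key]].append(row[mean_column])
    group_means = {group: mean(values) for group, values in observed.items()}

    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        if new_row[mean_column] is None:
            # Remains None if the group has no observed values at all.
            new_row[mean_column] = group_means.get(new_row[group_key])
        if new_row[custom_column] is None:
            new_row[custom_column] = custom_value
        imputed.append(new_row)
    return imputed


imputed = impute_by_group(records, "Arrest", "Latitude", "Longitude", 42)
```

In this sketch, each missing _Latitude_ is filled from the mean of its own `Arrest` group rather than from the mean of the whole column, while a missing _Longitude_ receives the same custom value regardless of group.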

First, let us quickly check the `MEAN` value of the _Latitude_ column.

In [None]:
dflow_mean = dflow.summarize(group_by_columns=['Arrest'],
                             summary_columns=[dprep.SummaryColumnsValue(column_id='Latitude',
                                                                        summary_column_name='Latitude_MEAN',
                                                                        summary_function=dprep.SummaryFunction.MEAN)])
dflow_mean = dflow_mean.filter(dprep.col('Arrest') == 'FALSE')
dflow_mean.head(1)

The `MEAN` value of _Latitude_ looks reasonable, so we will impute _Latitude_ with it. As for _Longitude_, we will impute it with the custom value `42`, based on external knowledge.

In [None]:
# impute with MEAN
impute_mean = dprep.ImputeColumnArguments(column_id='Latitude',
                                          impute_function=dprep.ReplaceValueFunction.MEAN)
# impute with custom value 42
impute_custom = dprep.ImputeColumnArguments(column_id='Longitude',
                                            custom_impute_value=42)
# get instance of ImputeMissingValuesBuilder
impute_builder = dflow.builders.impute_missing_values(impute_columns=[impute_mean, impute_custom],
                                                      group_by_columns=['Arrest'])
# call learn() to learn a fixed program to impute missing values
impute_builder.learn()
# call to_dataflow() to get a dataflow with impute step added
dflow_imputed = impute_builder.to_dataflow()

In [None]:
# check impute result
dflow_imputed.head(5)

As the result above shows, the missing _Latitude_ has been imputed with the `MEAN` value of the group in which _Arrest_ is false, and the missing _Longitude_ has been imputed with `42`.