Impala is case insensitive for column names and generally deals with them in all lower case. Kudu is case sensitive. This can lead to problems when a table is created externally in Kudu with a column name that contains upper case letters.

This patch solves the problem by having KuduColumn always store its name in lower case, so that general Impala code written to expect lower-cased column names can use Column.getName() safely. It also adds the method KuduColumn.getKuduName(), which returns the column name with the casing it has in Kudu. Any code that passes column names into the Kudu API must call this method first to get the correct column name.

There are four specific situations fixed by this patch:
- When ordering on a Kudu column, the Analyzer would create two SlotDescriptors that point to the same column because registerSlotRef() was being called with inconsistent casing. It is now always called with the lower-cased names.
- 'ADD RANGE PARTITION' would fail to find the range partition column if its name isn't all lower case in Kudu.
- 'ALTER TABLE DROP COLUMN' and 'ALTER TABLE CHANGE' only worked if the column name was specified with its Kudu casing.
- 'CREATE EXTERNAL TABLE' called on a Kudu table with column names that differ only in case now returns an error, since Impala has no way of handling this situation.

Testing:
- Added e2e tests in test_kudu.py.
- Manually edited functional_kudu to change column names to have mixed casing and ran the Kudu tests.

Change-Id: I14aba88510012174716691b9946e1c7d54d01b44
Reviewed-on: http://gerrit.cloudera.org:8080/6902
Reviewed-by: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins
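The two-name scheme described above (a lower-cased name for general Impala code, the original casing for Kudu API calls) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Impala's actual KuduColumn: the class KuduColumnCaseSketch, its nested Column class and the helpers loadColumns() and toKuduName() are hypothetical names, and the case-only duplicate check is just one possible way to produce the 'CREATE EXTERNAL TABLE' error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the naming scheme; not Impala's real KuduColumn.
public class KuduColumnCaseSketch {

  // Simplified column holding both spellings of its name.
  static final class Column {
    private final String name;      // lower-cased, safe for general Impala code
    private final String kuduName;  // original casing, used for Kudu API calls

    Column(String kuduName) {
      this.kuduName = kuduName;
      this.name = kuduName.toLowerCase(Locale.ROOT);
    }

    String getName() { return name; }
    String getKuduName() { return kuduName; }
  }

  // Builds columns from Kudu column names and rejects names that differ only
  // in case, since a case-insensitive engine cannot tell them apart.
  static List<Column> loadColumns(List<String> kuduNames) {
    Map<String, String> seen = new HashMap<>();
    List<Column> cols = new ArrayList<>();
    for (String kuduName : kuduNames) {
      Column col = new Column(kuduName);
      String prev = seen.putIfAbsent(col.getName(), kuduName);
      if (prev != null) {
        throw new IllegalArgumentException(
            "Columns '" + prev + "' and '" + kuduName + "' differ only in case");
      }
      cols.add(col);
    }
    return cols;
  }

  // Resolves a user-supplied, case-insensitive column name to the exact name
  // that must be passed to Kudu (e.g. for DROP COLUMN or a range partition).
  static String toKuduName(List<Column> cols, String userName) {
    String lower = userName.toLowerCase(Locale.ROOT);
    for (Column c : cols) {
      if (c.getName().equals(lower)) {
        return c.getKuduName();
      }
    }
    throw new IllegalArgumentException("Unknown column: " + userName);
  }

  public static void main(String[] args) {
    List<Column> cols = loadColumns(List.of("Id", "UserName"));
    System.out.println(cols.get(1).getName());        // username
    System.out.println(toKuduName(cols, "USERNAME")); // UserName
  }
}
```

Lower-casing with Locale.ROOT keeps the result independent of the JVM's default locale (for example, the Turkish dotted and dotless i), so the same Kudu column always maps to the same Impala name.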