This change whitelists the filesystems that can be set as the default
filesystem on which Impala runs.
This patch also configures Impala to use S3 as the default filesystem,
rather than only as a secondary filesystem as before.
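A whitelist like the one described above can be sketched as a scheme check on the default filesystem URI. This is a minimal illustration and not Impala's code. The function names are hypothetical. The whitelisted schemes ("hdfs", "s3a") are assumptions chosen for the example, not the exact set this patch accepts.

```python
from urllib.parse import urlparse

# Hypothetical whitelist: these schemes are assumptions for illustration,
# not necessarily the exact set accepted by Impala.
SUPPORTED_DEFAULT_FS_SCHEMES = frozenset({"hdfs", "s3a"})


def is_supported_default_fs(default_fs_uri: str) -> bool:
    """Return True if the URI's scheme is in the whitelist."""
    scheme = urlparse(default_fs_uri).scheme.lower()
    return scheme in SUPPORTED_DEFAULT_FS_SCHEMES


def check_default_fs(default_fs_uri: str) -> None:
    """Hypothetical startup check: reject an unsupported default filesystem."""
    if not is_supported_default_fs(default_fs_uri):
        raise ValueError(f"Unsupported default filesystem: {default_fs_uri}")
```

Checking the parsed scheme, rather than doing a prefix match on the raw string, keeps the whitelist to one entry per filesystem.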
Change-Id: I2f45bef6c94ece634045acb906d12591587ccfed
Reviewed-on: http://gerrit.cloudera.org:8080/1121
Reviewed-by: anujphadke <aphadke@cloudera.com>
Tested-by: Internal Jenkins
Impala could crash or return wrong results if it used a codegen'd
Avro decoding function to scan an Avro file whose schema differs from
the table schema. With the AVRO-1617 fix, we make sure Impala does not
use codegen if the table schema has fewer columns than the file schema.
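The rule above can be sketched as a per-file choice between a codegen'd decoder and an interpreted fallback. This is an illustration only. The function name and the two decoder callables are hypothetical, and Impala's actual scanner is implemented in C++.

```python
from typing import Callable, Sequence

# A decoder turns one encoded Avro record into a list of values (hypothetical type).
Decoder = Callable[[bytes], list]


def choose_avro_decoder(
    table_columns: Sequence[str],
    file_columns: Sequence[str],
    codegen_decoder: Decoder,
    interpreted_decoder: Decoder,
) -> Decoder:
    """Pick the decoder to use for one Avro file.

    The codegen'd decoder is only used when the table schema has at least as
    many columns as the file schema. Otherwise, fall back to the interpreted path.
    """
    if len(table_columns) < len(file_columns):
        return interpreted_decoder
    return codegen_decoder
```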
Change-Id: I268419e421404ad6b084482dee417634f17ecf60
Reviewed-on: http://gerrit.cloudera.org:8080/1696
Reviewed-by: Juan Yu <jyu@cloudera.com>
Tested-by: Internal Jenkins
This patch contains the following changes:
- Add a metastore_snapshot_file parameter to build.sh
- Enable skipping metadata loading.
- Refactor create-load-data.sh into functions.
- Many scripts source impala-config, which produced a lot of log spew; this output is
now muted.
- Unnecessary log spew from compute-table-stats has been muted.
- build_thirdparty.sh now determines its parallelism from the system; it was previously
hard-coded to 4.
- Only force-load data for a particular dataset if a schema change is detected.
Change-Id: I909336451e5c1ca57d21f040eb94c0e831546837
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5540
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: jenkins
Hive converts "bytes"-type fields to an array<tinyint> column, for
which we cannot even load the metadata. However, if a bytes field
appears in a file schema but not in the table schema, this change
allows us to read (but not materialize) the field. Without this
change, we cannot read the file at all.
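Reading a bytes field without materializing it amounts to skipping over it. The Avro specification encodes a "bytes" value as a long (a zig-zag varint) holding the length, followed by that many raw bytes. The sketch below shows that skip logic. The helper names are hypothetical, and this is not Impala's scanner code.

```python
from typing import Tuple


def read_zigzag_long(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an Avro long (zig-zag varint) at pos; return (value, new_pos)."""
    shift = 0
    raw = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        raw |= (b & 0x7F) << shift  # low 7 bits carry data
        if not (b & 0x80):          # high bit clear: last byte
            break
        shift += 7
    # Undo zig-zag encoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (raw >> 1) ^ -(raw & 1), pos


def skip_bytes_field(buf: bytes, pos: int) -> int:
    """Skip an Avro "bytes" value without copying it; return the new position."""
    length, pos = read_zigzag_long(buf, pos)
    if length < 0 or pos + length > len(buf):
        raise ValueError("invalid bytes length")
    return pos + length
```

For example, a record holding a 3-byte bytes field followed by the int 42 is encoded as `b"\x06abc\x54"`. Skipping the bytes field leaves the reader at offset 4, where the int can be decoded.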
This change also adds a "bytes"-type field to one of the files in
functional_avro_snap.schema_resolution_test.
Change-Id: I25953ee049e174fc4dbff5d68520a6f87e545339
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3823
Reviewed-by: Skye Wanderman-Milne <skye@cloudera.com>
Tested-by: jenkins
(cherry picked from commit 0e2e7c1ac0f63623b7ec3724920e9927cd782508)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3895
Avro tables that were not created with a column-definition list do not have
their columns properly populated in the Metastore backend DB (HIVE-6308).
For such tables, COMPUTE STATS and Hive's ANALYZE TABLE cannot succeed.
This patch makes COMPUTE STATS fail during analysis for such broken Avro
tables, and adds tests for Avro tables whose column-definition list does
not match their Avro schema.
Change-Id: I561ecea944ae2f83d69950b7a1ab9edaa89bdcea
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1892
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: jenkins
Reviewed-on: http://gerrit.ent.cloudera.com:8080/1920
Fixes a regression where the catalog server did not properly resolve column names
when a table's column definitions did not match its Avro schema definition.
In this case, the Avro schema definition should be used instead of the table
columns. We had no test tables with mismatched definitions, so this wasn't caught.
The schema and columns are loaded when a table's metadata is loaded, so the fix is
simply to add a toThrift() method to Column and to stop referencing
metastore.getSd().getCols() directly, since it may return the "wrong" set of columns.
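The resolution rule above (prefer the Avro schema over the metastore's column list) can be sketched as follows. This is a simplified illustration with a hypothetical function name. It only resolves column names and ignores partition columns and type mapping, which a real catalog implementation would also need to handle.

```python
import json
from typing import List, Optional, Sequence


def resolve_column_names(
    metastore_col_names: Sequence[str], avro_schema_json: Optional[str]
) -> List[str]:
    """Return the column names to use for a table.

    If the table has an Avro schema, its record fields win over the metastore's
    column list, which may be the "wrong" set of columns. Without an Avro
    schema, fall back to the metastore columns.
    """
    if avro_schema_json is None:
        return list(metastore_col_names)
    schema = json.loads(avro_schema_json)
    return [field["name"] for field in schema["fields"]]
```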
Change-Id: I341a3a8834f5748f90c246d2093ddb983ecfdd4f
Reviewed-on: http://gerrit.ent.cloudera.com:8080/770
Reviewed-by: Lenni Kuff <lskuff@cloudera.com>
Tested-by: Lenni Kuff <lskuff@cloudera.com>