HS2 NULL_TYPE should be implemented using TStringValue.
However, due to an incompatibility with the Hive JDBC driver
implementation at the time, Impala chose to implement the NULL type
using TBoolValue (see IMPALA-914, IMPALA-1370).
HIVE-4172 might be the root cause of that decision. Today, the Hive
JDBC driver (org.apache.hive.jdbc.HiveDriver) no longer has that issue,
as shown in this reproduction after applying this patch:
./bin/run-jdbc-client.sh -q "select null" -t NOSASL
Using JDBC Driver Name: org.apache.hive.jdbc.HiveDriver
Connecting to: jdbc:hive2://localhost:21050/;auth=noSasl
Executing: select null
----[START]----
NULL
----[END]----
Returned 1 row(s) in 0.343s
Thus, we can reimplement NULL_TYPE using TStringValue to match
HiveServer2 behavior.
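A minimal sketch of the encoding change for a row-major HS2 result set,
assuming the thrift-generated Python bindings for TCLIService (names as
in TCLIService.thrift):
  from TCLIService.ttypes import TBoolValue, TColumnValue, TStringValue

  # Before: a NULL_TYPE cell was encoded as a bool value with no payload.
  old_cell = TColumnValue(boolVal=TBoolValue())
  # After: it is encoded as a string value with no payload, matching
  # HiveServer2.
  new_cell = TColumnValue(stringVal=TStringValue())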
Testing:
- Pass core tests.
Change-Id: I354110164b360013d9893f1eb4398c3418f80472
Reviewed-on: http://gerrit.cloudera.org:8080/22852
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This puts all of the thrift-generated python code into the
impala_thrift_gen package. This is similar to what Impyla does for its
thrift-generated python code, except that this change uses the
impala_thrift_gen package rather than Impyla's impala._thrift_gen.
This is a preparatory patch for fixing the absolute import
issues.
This patches all of the thrift files to add the python namespace, and
includes code to apply the same patching to the thirdparty thrift
files (hive_metastore.thrift, fb303.thrift).
Putting all the generated python into a package makes it easier to see
where imports are coming from. When the subsequent change rearranges
the shell code, the thrift-generated code can stay in a separate
directory.
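For illustration, an import of the generated TCLIService module changes
roughly as follows (module names are illustrative):
  # Before: generated modules lived at the top level of sys.path.
  from TCLIService import TCLIService

  # After: generated modules live inside the impala_thrift_gen package.
  from impala_thrift_gen.TCLIService import TCLIService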
This uses isort to sort the imports for the affected Python files
with the provided .isort.cfg file. This also adds an impala-isort
shell script to make it easy to run.
Testing:
- Ran a core job
Change-Id: Ie2927f22c7257aa38a78084efe5bd76d566493c0
Reviewed-on: http://gerrit.cloudera.org:8080/20169
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Riza Suminto <riza.suminto@cloudera.com>
This patch migrates query_test/test_kudu.py to use the hs2 client protocol.
Here are the steps taken:
- Override default_test_protocol() to return 'hs2'.
See documentation in ImpalaTestSuite about what this method does.
- Remove usage of the deprecated cursor and unique_cursor fixtures.
- Replace all direct ImpalaTestSuite.client usage with helper
  function calls such as execute_query() or execute_query_using_vector().
- Remove all "SET" query invocations and instead pass an exec_option
  dictionary to the helper methods (see the sketch after this list).
- Verify Kudu modified/inserted row counts by reading runtime profile
  counters instead of query output.
- Add an HS2_TYPES section to test cases where only TYPES exists.
- Remove all drop_impala_table_after_context() calls and replace them
  with proper use of the unique_database fixture.
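A hedged before/after sketch of the "SET" replacement above (table and
option names are illustrative; the options argument follows the
ImpalaTestSuite helper convention):
  # Before: options were set via separate SET statements on the client.
  self.client.execute("set kudu_read_mode=READ_AT_SNAPSHOT")
  self.client.execute("select count(*) from %s.t" % unique_database)

  # After: options are passed as an exec_option dictionary to the helper.
  self.execute_query("select count(*) from %s.t" % unique_database,
                     {'kudu_read_mode': 'READ_AT_SNAPSHOT'})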
KuduTestSuite is fixed to the hs2 protocol dimension. Meanwhile,
CustomKuduTest is fixed to the beeswax protocol dimension until a
proper migration can be done.
Added the following convenience methods:
- ImpalaTestSuite.default_test_protocol() to allow an individual test
  class to override its default test protocol.
- ImpylaHS2ResultSet.tuples() to access the raw HS2 result set that is
a list of tuples.
This patch also added several literal constants around test vector
dimensions to help with traceability.
Fixed a bug where "SHOW PARTITIONS" over a Kudu table via hs2 showed
NULL for the #Replicas column because TResultRowBuilder did not have an
overload for int values. Adjusted the numFiles variable inside
HdfsTable.getTableStats() from int to long to match the Type.BIGINT of
the '#Files' column.
Fixed py.test classes that do not inherit BaseTestSuite. Fixed flake8
issues in test_statestore.py.
Testing:
- Ran and passed all tests extending KuduTestSuite in exhaustive mode.
Change-Id: I5f38baf5a0bbde1a1ad0bb4666c300f4f3cabd33
Reviewed-on: http://gerrit.cloudera.org:8080/22358
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
This takes steps to make Python 2 behave like Python 3 as
a way to flush out issues with running on Python 3. Specifically,
it handles two main differences:
1. Python 3 requires absolute imports within packages. This
can be emulated via "from __future__ import absolute_import"
2. Python 3 changed division to "true" division that doesn't
round to an integer. This can be emulated via
"from __future__ import division"
This changes all Python files to add imports for absolute_import
and division. For completeness, this also includes print_function in the
import.
I scrutinized each old-division location and converted some locations
to use the integer division '//' operator if it needed an integer
result (e.g. for indices, counts of records, etc). Some code was also using
relative imports and needed to be adjusted to handle absolute_import.
This fixes all Pylint warnings about no-absolute-import and old-division,
and these warnings are now banned.
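For illustration, the pattern applied to each file looks like this (the
variables are illustrative):
  from __future__ import absolute_import, division, print_function

  # '/' is now true division on both Python 2 and 3; code that needs an
  # integer result (indices, record counts, etc.) uses '//' instead.
  num_full_batches = total_rows // batch_size
  progress = finished_rows / total_rows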
Testing:
- Ran core tests
Change-Id: Idb0fcbd11f3e8791f5951c4944be44fb580e576b
Reviewed-on: http://gerrit.cloudera.org:8080/19588
Reviewed-by: Joe McDonnell <joemcdonnell@cloudera.com>
Tested-by: Joe McDonnell <joemcdonnell@cloudera.com>
This patch adds support for BINARY columns for all table formats with
the exception of Kudu.
In Hive the main difference between STRING and BINARY is that STRING is
assumed to be UTF8 encoded, while BINARY can be any byte array.
Some other differences in Hive:
- BINARY can only be cast from/to STRING.
- Only a small subset of built-in STRING functions support BINARY.
- In several file formats (e.g. text) BINARY is base64 encoded.
- No NDV is calculated during COMPUTE STATISTICS.
As Impala doesn't treat STRINGs as UTF8, BINARY and STRING become nearly
identical, especially from the backend's perspective. For this reason,
BINARY is implemented a bit differently compared to other types:
while the frontend treats STRING and BINARY as two separate types, most
of the backend uses PrimitiveType::TYPE_STRING for BINARY too, e.g.
in SlotDesc. Only the following parts of backend need to differentiate
between STRING and BINARY:
- table scanners
- table writers
- HS2/Beeswax service
These parts have access to column metadata, which allows adding special
handling for BINARY.
Only a few builtins are allowed for BINARY at the moment:
- length
- min/max/count
- coalesce and similar "selector" functions
Other STRING functions can only be used by casting to STRING first.
Adding support for more of these functions is easy, as the BINARY type
simply has to be "connected" to the existing STRING function's
signature. Functions whose result depends on utf8_mode need to ensure
that BINARY always behaves as if utf8_mode=0 (for example, length() is
mapped to bytes(), since length() counts UTF-8 characters if
utf8_mode=1).
All kinds of UDFs (native, Hive legacy, Hive generic) support BINARY,
though in the case of legacy Hive UDFs it is only supported if the
argument and return types are set explicitly, to ensure backward
compatibility.
See IMPALA-11340 for details.
The original plan was to behave as close to Hive as possible, but I
realized that Hive has more relaxed casting rules than Impala, which
led to STRING<->BINARY casts being necessary in more cases in Impala.
This was needed to disallow passing a BINARY to functions that expect
a STRING argument. An example of the difference is that in
INSERT ... VALUES (), string literals need to be explicitly cast to
BINARY, while this is not needed in Hive.
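A hedged sketch of the stricter rules, written as statements an E2E
test might run (table and column names are illustrative):
  # OK: explicit casts between STRING and BINARY.
  self.execute_query("select cast(binary_col as string) from binary_tbl")
  # OK: unlike Hive, VALUES needs an explicit cast to BINARY.
  self.execute_query(
      "insert into binary_tbl values (cast('abc' as binary))")
  # Error: BINARY cannot be passed where a STRING is expected.
  self.execute_query("select upper(binary_col) from binary_tbl")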
Testing:
- Added functional.binary_tbl for all file formats (except Kudu)
to test scanning.
- Removed functional.unsupported_types and related tests, as now
Impala supports all (non-complex) types that Hive does.
- Added FE/EE tests, mainly based on the ones added for the DATE type.
Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Reviewed-on: http://gerrit.cloudera.org:8080/16066
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
IMPALA-7312 added the query option FETCH_ROWS_TIMEOUT_MS, but it only
applies to fetch requests against a query that has already transitioned
to the 'FINISHED' state. This patch changes the timeout so that it
applies to queries in the 'RUNNING' state as well. Before this patch,
fetch requests issued while a query was 'RUNNING' blocked until the query
transitioned to the 'FINISHED' state, and only then fetched and returned
results. After this patch, fetch requests against queries in the
'RUNNING' state will block for 'FETCH_ROWS_TIMEOUT_MS' and then return.
For HS2 clients, fetch requests that return while a query is 'RUNNING'
set their TStatusCode to STILL_EXECUTING_STATUS. For Beeswax clients,
fetch requests that return while a query is 'RUNNING' set the 'ready'
flag to false. For both clients, hasMoreRows is set to true.
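A minimal sketch of the client-side loop this enables, assuming the
thrift-generated TCLIService bindings ('client', 'op_handle' and
'process' are placeholders):
  from TCLIService.ttypes import TFetchResultsReq, TStatusCode

  req = TFetchResultsReq(operationHandle=op_handle, maxRows=1024)
  while True:
      resp = client.FetchResults(req)
      if resp.status.statusCode == TStatusCode.STILL_EXECUTING_STATUS:
          continue  # still RUNNING; call returned after FETCH_ROWS_TIMEOUT_MS
      process(resp.results)
      if not resp.hasMoreRows:
          break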
If the following sequence of events occurs:
* A fetch request is issued and blocks on a 'RUNNING' query
* The query transitions to the 'FINISHED' state
* The fetch request attempts to read multiple batches
Then the time spent waiting for the query to finish is deducted from
the timeout used when waiting for rows to be produced by the Coordinator
fragment.
Fixed a bug in the current usage of FETCH_ROWS_TIMEOUT_MS where the
time units for FETCH_ROWS_TIMEOUT_MS and MonotonicStopWatch were not
being converted properly.
Tests:
* Moved existing fetch timeout tests from hs2/test_fetch.py into a new
test file hs2/test_fetch_timeout.py.
* Added several new tests to hs2/test_fetch_timeout.py to validate that
the timeout is applied to 'RUNNING' queries and that the timeout applies
across a 'RUNNING' and 'FINISHED' query.
* Added new tests to query_test/test_fetch.py to validate the timeout
while using the Beeswax protocol.
Change-Id: I2cba6bf062dcc1af19471d21857caa797c1ea4a4
Reviewed-on: http://gerrit.cloudera.org:8080/14332
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Adds the query option FETCH_ROWS_TIMEOUT_MS to control the client
timeout when fetching rows. It is set to 10 seconds by default to avoid
unnecessary fetch requests. The timeout applies whether result spooling
is enabled or disabled.
When result spooling is disabled, the timeout controls how long the
client thread will wait for a single RowBatch to be produced by the
coordinator fragment. When result spooling is enabled, a client can
fetch multiple RowBatches at a time, so the timeout controls the total
time spent waiting for RowBatches to be produced.
The timeout applies to both waiting for rows to be sent by the fragment
instance thread, and waiting for rows to be materialized (e.g. the time
measured by RowMaterializationTimer).
Testing:
* Added new tests to test_fetch.py
* Ran core tests
Change-Id: I331acaba23a65dab43cca48e9dc0dc957b9c632d
Reviewed-on: http://gerrit.cloudera.org:8080/14157
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Session/operation secrets are part of the HS2 handle
but we haven't made use of them up until now. This
patch checks the value and treats it as part of the
session key, as originally intended. I.e. if the
secret is missing, the session lookup fails.
The operation secret is the same as the session secret
to save having to generate and store extra secrets
(there's no real benefit).
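A conceptual sketch of the new lookup rule (Python pseudocode, not the
actual C++ implementation):
  def lookup_session(session_id, secret):
      session = sessions.get(session_id)
      # The secret is treated as part of the key: a wrong or missing
      # secret behaves exactly like an unknown session id.
      if session is None or session.secret != secret:
          raise KeyError("invalid session")
      return session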
A secret is added to each Beeswax session. This secret is
internal to the server and not exposed. Adds validation
that queries accessed via Beeswax belong to the
same user as the session.
We switch uuid_generator_ to use boost::random_device, which
uses /dev/urandom as its source of randomness to be more robust -
otherwise it's hard to be sure that we won't have collisions,
although it doesn't seem to be a problem in practice.
For requests that provide both a session and a query ID -
GetRuntimeProfile() and GetExecSummary() - the code already checks
that the session's user matches the query.
An exception to the validation mechanisms above is added
for Close() and Cancel() beeswax operations, because impala-shell
and some administrative tools allow cancellation of
queries on different threads and from different tools.
We skip validating the session secret when cancelling queries
from the web UI, since web UI users don't have the secret.
Testing:
* Ran exhaustive tests.
* Add tests for all HS2 RPCs that provide invalid session secrets.
* Add tests for HS2 RPCs that provide both session and query
ID to ensure that query belongs to the session.
* Add basic test for beeswax testing accessing a query from
different connections.
Change-Id: I4c014d1a32e273275a773f842b9ed9793dbdba6b
Reviewed-on: http://gerrit.cloudera.org:8080/13585
Reviewed-by: Lars Volker <lv@cloudera.com>
Reviewed-by: Thomas Marshall <tmarshall@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
DATE values describe a particular year/month/day in the form
yyyy-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a
time of day component. The range of values supported for the DATE type
is 0000-01-01 to 9999-12-31.
This initial DATE type support covers TEXT and HBASE fileformats only.
'DateValue' is used as the internal type to represent DATE values.
The changes are as follows:
- Support for DATE literal syntax.
- Explicit casting between DATE and other types (note that invalid
casts will fail with an error just like invalid DECIMAL_V2 casts,
while failed casts to other types do not lead to a warning or error):
- from STRING to DATE. The string value must be formatted as
yyyy-MM-dd HH:mm:ss.SSSSSSSSS. The date component is mandatory,
the time component is optional. If the time component is
present, it will be truncated silently.
- from DATE to STRING. The resulting string value is formatted as
yyyy-MM-dd.
- from TIMESTAMP to DATE. The source timestamp's time of day
component is ignored.
- from DATE to TIMESTAMP. The target timestamp's time of day
component is set to 00:00:00.
- Implicit casting between DATE and other types:
- from STRING to DATE if the source string value is used in a
context where a DATE value is expected.
- from DATE to TIMESTAMP if the source date value is used in a
context where a TIMESTAMP value is expected.
- Since STRING -> DATE, STRING -> TIMESTAMP and DATE -> TIMESTAMP
implicit conversions are now all possible, the existing function
overload resolution logic is not adequate anymore.
For example, it resolves the
if(false, '2011-01-01', DATE '1499-02-02') function call to the
if(BOOLEAN, TIMESTAMP, TIMESTAMP) version of the overloaded
function, instead of the if(BOOLEAN, DATE, DATE) version.
This is clearly wrong, so the function overload resolution logic had
to be changed to resolve function calls to the best-fit overloaded
function definition if there are multiple applicable candidates.
An overloaded function definition is an applicable candidate for a
function call if each actual parameter in the function call either
matches the corresponding formal parameter's type (without casting)
or is implicitly castable to that type.
When looking for the best-fit applicable candidate, a parameter
match score (i.e. the number of actual parameters in the function
call that match their corresponding formal parameter's type without
casting) is calculated and the applicable candidate with the highest
parameter match score is chosen.
There's one more issue that the new resolution logic has to address:
if two applicable candidates have the same parameter match score and
the only difference between the two is that the first one requires a
STRING -> TIMESTAMP implicit cast for some of its parameters while
the second one requires a STRING -> DATE implicit cast for the same
parameters, then the first candidate has to be chosen so as not to
break backward compatibility.
E.g. the year('2019-02-15') function call must resolve to
year(TIMESTAMP) instead of year(DATE). Note that year(DATE) is not
implemented yet, so this is not an issue at the moment, but it will
be in the future.
When the resolution algorithm considers overloaded function
definitions, it first orders them lexicographically by the types in
their parameter lists. To ensure the backward-compatible behavior,
the PrimitiveType.DATE enum value has to come after
PrimitiveType.TIMESTAMP. (A sketch of this resolution logic follows
the change list below.)
- Codegen infrastructure changes for expression evaluation.
- 'IS [NOT] NULL' and '[NOT] IN' predicates.
- Common comparison operators (including the 'BETWEEN' operator).
- Infrastructure changes for built-in functions.
- Some built-in functions: conditional, aggregate, analytical and
math functions.
- C++ UDF/UDA support.
- Support partitioning and grouping by DATE.
- Beeswax, HiveServer2 support.
These items are tightly coupled and it makes sense to implement them
in one change-set.
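A conceptual sketch of the overload resolution described above (Python
pseudocode of the scoring; implicitly_castable is a placeholder
predicate, and the actual logic lives in the frontend):
  def resolve(candidates, arg_types):
      # 'candidates' is assumed pre-sorted lexicographically by
      # parameter types, with TIMESTAMP ordered before DATE, so ties
      # resolve to the TIMESTAMP overload for backward compatibility.
      applicable = [c for c in candidates
                    if all(a == f or implicitly_castable(a, f)
                           for a, f in zip(arg_types, c.param_types))]
      if not applicable:
          return None
      # Highest number of exact (no-cast) parameter matches wins;
      # max() keeps the earliest candidate on ties.
      return max(applicable,
                 key=lambda c: sum(a == f for a, f
                                   in zip(arg_types, c.param_types)))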
Testing:
- A new partitioned TEXT table 'functional.date_tbl' (and the
corresponding HBASE table 'functional_hbase.date_tbl') was
introduced for DATE-related tests.
- BE and FE tests were extended to cover DATE type.
- E2E tests:
- since DATE type is supported for TEXT and HBASE fileformats
only, most DATE tests were implemented separately in
tests/query_test/test_date_queries.py.
Note that this change-set is not a complete DATE type implementation,
but it lays the foundation for future work:
- Add date support to the random query generator.
- Implement a complete set of built-in functions.
- Add Parquet support.
- Add Kudu support.
- Optionally support Avro and ORC.
For further details, see IMPALA-6169.
Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Reviewed-on: http://gerrit.cloudera.org:8080/12481
Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Impala incorrectly returned NULLs in the "Max Size" column of the SHOW
COLUMN STATS result when executed through the HS2 interface. The issue
was that the column was specified to be type INT in the result schema,
but the actual type of the contents that we inserted into it was
"long". This is not an issue in impala-shell because beeswax results
are stringified without inspecting the metadata.
The issue was fixed by changing the type from INT to BIGINT.
Change-Id: I419657744635dfdc2e1562fe60a597617fff446e
Reviewed-on: http://gerrit.cloudera.org:8080/6109
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Impala Public Jenkins
For files that have a Cloudera copyright (and no other copyright
notice), make changes to follow the ASF source file header policy here:
http://www.apache.org/legal/src-headers.html#headers
Specifically:
1) Remove the Cloudera copyright.
2) Modify NOTICE.txt according to
http://www.apache.org/legal/src-headers.html#notice
to follow that format and add a line for Cloudera.
3) Replace or add the existing ASF license text with the one given
on the website.
Much of this change was automatically generated via:
git grep -li 'Copyright.*Cloudera' > modified_files.txt
cat modified_files.txt | xargs perl -n -i -e 'print unless m#Copyright.*Cloudera#i;'
cat modified_files.txt | xargs fix_apache_license.py [1]
Some manual fixups were performed following those steps, especially when
license text was completely missing from the file.
[1] https://gist.github.com/anonymous/ff71292094362fc5c594 with minor
modification to ORIG_LICENSE to match Impala's license text.
Change-Id: I2e0bd8420945b953e1b806041bea4d72a3943d86
Reviewed-on: http://gerrit.cloudera.org:8080/3779
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Tested-by: Internal Jenkins
Many python files had a hashbang and the executable bit set though
they were not intended to be run as standalone scripts. That makes it
very difficult to determine which python files are actually scripts.
A future patch will update the hashbang in real python scripts so they
use $IMPALA_HOME/bin/impala-python.
Change-Id: I04eafdc73201feefe65b85817a00474e182ec2ba
Reviewed-on: http://gerrit.cloudera.org:8080/599
Reviewed-by: Casey Ching <casey@cloudera.com>
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
The user() builtin always returns the connected user. However, if the
client wants to see which user its queries are actually delegated to,
there was no easy way to do that.
This patch adds effective_user(), which returns the proxy delegated user
for authorization purposes. If no delegated user is set, the effective
user is the same as that returned from user().
The only way to test this is via a new custom cluster test, which sets
impala.doas.user so that the effective user might be different from the
connected one.
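A hedged illustration of the expected behavior (user names are
illustrative):
  # Connected as 'hue' with impala.doas.user='alice':
  result = self.execute_query("select user(), effective_user()")
  # -> 'hue', 'alice'; with no delegation configured, both columns
  # show the connected user.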
Change-Id: I7048c27c6808a6986dbe1246929816176dca9f76
Reviewed-on: http://gerrit.cloudera.org:8080/458
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: Internal Jenkins
Since Impala may sometimes return fewer rows than asked for, it's not
safe to test for an exact size response from a single fetch() call
except in very particular cases (when either 0 or 1 rows are
expected). The HS2 fetch_first tests relied on the full request always
being honoured in a couple of places, and as a result are prone to
occasional failures due to 'underflow'. This patch changes the fetch()
call to use fetch_until() in all places where underflow could happen,
and removes the xfail restriction from those V1 and V6 tests.
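A minimal sketch of the fetch_until() pattern ('fetch_fn' is a
placeholder for the underlying fetch call):
  def fetch_until(fetch_fn, expected_rows):
      # A single fetch may legitimately return fewer rows than asked
      # for, so keep fetching until the expected count is reached.
      rows = []
      while len(rows) < expected_rows:
          batch = fetch_fn(expected_rows - len(rows))
          if not batch:
              break  # no more rows available
          rows.extend(batch)
      return rows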
Change-Id: Ia62f3624947530d516a87f84e706e305048b916f
Reviewed-on: http://gerrit.cloudera.org:8080/192
Reviewed-by: Dan Hecht <dhecht@cloudera.com>
Reviewed-by: Matthew Jacobs <mj@cloudera.com>
Tested-by: Internal Jenkins
The bug was that VARCHAR was handled in the wrong case within
a switch on the result type. As a result, the HS2 type qualifiers
were not populated. This issue manifested itself from ODBC/JDBC as
reporting 32767 as the max length of any VARCHAR type in query results.
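A minimal sketch of the qualifier that was left unpopulated (types and
the map key as defined in TCLIService.thrift):
  from TCLIService.ttypes import TTypeQualifiers, TTypeQualifierValue

  # A VARCHAR(10) result column should carry its max length:
  quals = TTypeQualifiers(qualifiers={
      'characterMaximumLength': TTypeQualifierValue(i32Value=10)})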
Change-Id: Ieffff1f51a23472b2b97ae53d8148e1cb209f5b8
Reviewed-on: http://gerrit.cloudera.org:8080/66
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
This patch adds the possibility to specify the number of replicas that
should be cached in main memory. This can be useful in high QPS
scenarios, as the load is no longer concentrated on a single cached
replica but spread across a set of cached replicas. While the cache replication
factor can be larger than the block replication factor on disk, the
difference will be ignored by HDFS until more replicas become
available.
This extends the current syntax for specifying the cache pool in the
following way:
cached in 'poolName'
is extended with the optional replication factor
cached in 'poolName' with replication = XX
By default, the cache replication factor is set to 1. As this value is
not yet configurable in HDFS, it's defined as a constant in the JniCatalog
thrift specification. If a partitioned table is cached, all its child
partitions inherit this cache replication factor. If child partitions
have a custom cache replication factor, changing the cache replication
factor on the partitioned table afterwards will overwrite this custom
value. If a new partition is added to the table, it will again inherit
the cache replication factor of the parent independent of the cache pool
that is used to cache the partition.
To allow reviewing changes to and the status of the replication factor
for tables and partitions, the replication factor is part of the output
of the "show partitions" command.
Change-Id: I2aee63258d6da14fb5ce68574c6b070cf948fb4d
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5533
Tested-by: jenkins
Reviewed-by: Martin Grund <mgrund@cloudera.com>
COMPUTE STATS runs two child queries via the HS2 interface. When the
client that is issuing the parent query is also using HS2, Impala would
inherit the protocol version from the parent for use in the child
queries, and that particularly affected the orientation that results were
returned in. However, the compute stats post-processing path assumed that
the results were in V1 (i.e. row-major) format.
This patch adds a new key-value pair to the HS2 conf overlay to tell
Impala that this is a child query. When it is set, Impala always uses
the V1 protocol internally, meaning that child query results are always
returned in row-major format. Since they are always consumed internally,
and never sent to the user directly, this is a viable simplification.
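A minimal sketch of how a child query would set the flag (the overlay
key name is hypothetical, not the actual constant):
  from TCLIService.ttypes import TExecuteStatementReq

  req = TExecuteStatementReq(
      sessionHandle=session,
      statement=child_stmt,
      # Hypothetical key: marks this as an internal child query so
      # results always come back in the row-major V1 layout.
      confOverlay={'impala.internal.child_query': 'true'})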
Change-Id: I9846ec2cb6a4f3b54ab0d29dd4a99916442b8e71
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5538
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
Because we add 'total' to the last row in SHOW PARTITIONS, we set the
partition key columns to be string. At least, that's what the comment
said, but in fact we didn't do that.
This patch also corrects the column type for max width, which should be INT.
Change-Id: I787ab17be27f45107340119017e528c58a3daad3
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4678
Reviewed-by: Henry Robinson <henry@cloudera.com>
Tested-by: jenkins
This patch adds support for V6 of the HS2 protocol, which notably
includes columnar organisation of result sets. Clients that set their
protocol version to < V6 will receive result sets in the traditional row
orientation.
The performance of fetches over HS2 goes up significantly as a result,
since the V1 protocol had some pathologies in its deserialisation
performance.
Beeswax:
Row materialisation: 455ms, client processing time: 523ms
HS2 V6:
Row materialisation: 444ms, client processing time: 1.8s
HS2 V1:
Row materialisation: 585ms, client processing time: 15.9s (!)
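A conceptual sketch of the layout difference (thrift-generated types
from TCLIService.thrift, assuming an INT result column):
  # V1-V5: row-major; one TColumnValue per column inside each TRow.
  values = [row.colVals[0].i32Val.value for row in resp.results.rows]

  # V6: columnar; one TColumn holds every value of the column plus a
  # 'nulls' bitmap, which is far cheaper to deserialise.
  values = resp.results.columns[0].i32Val.values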
TODO: Add support for the CHAR datatype
The following patch is also included:
Fix wait-for-hiveserver2.py when Impala moves to HS2 V6
Due to HIVE-6050, older versions of Hive are not compatible with newer
clients (even those that try to use old protocol
versions). wait-for-hiveserver2.py uses HS2 to talk to the HiveServer2
service, but picks up the newer V6 protocol version, and fails.
This patch temporarily re-adds cli_service.thrift (renaming the Thrift
service as LegacyTCLIService) only for wait-for-hiveserver2.py to
use. As soon as Impala's thirdparty Hive moves to HS2 V6, we can get rid
of this change.
Change-Id: I2cbe884345ae7e772620b80a29b6574bd6532940
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/4402
Tested-by: jenkins
Reviewed-by: Henry Robinson <henry@cloudera.com>
The driving motivation is to be able to return precision/scale for
decimal types.
Change-Id: I1b49c5a61b59a292bc09612c7945191078a22bf8
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/3772
Reviewed-by: Nong Li <nong@cloudera.com>
Tested-by: jenkins