impala

mirror of https://github.com/apache/impala.git synced 2026-01-07 18:02:33 -05:00

Author	SHA1	Message	Date
Tim Armstrong	0589b86481	Changes to allow running stress test against MiniCluster Miscellaneous fixes to allow running the binary mem_limit search against a local mini cluster of varying size. Change-Id: Ic87f8e6eeae97791c9e3d69355aac45d366a1882 Reviewed-on: http://gerrit.cloudera.org:8080/2209 Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com> Tested-by: Internal Jenkins	2016-02-19 01:30:11 +00:00
Casey Ching	f288867833	Stress test: Various changes The major changes are: 1) Collect backtrace and fatal log on crash. 2) Poll memory usage. The data is only displayed at this time. 3) Support kerberos. 4) Add random queries. 5) Generate random and TPC-H nested data on a remote cluster. The random data generator was converted to use MR for scaling. 6) Add a cluster abstraction to run data loading for #5 on a remote or local cluster. This also moves and consolidates some Cloudera Manager utilities that were in the stress test. 7) Cleanup the wrappers around impyla. That stuff was getting messy. Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7 Reviewed-on: http://gerrit.cloudera.org:8080/1298 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2016-01-20 23:00:25 +00:00
Szehon Ho	5787dc2cf3	First commit to run the random query generator on Hive. With this change, random query generator can run continuously on Hive and approximately half of its generated queries are able to run. 1. Connect timeout from Impyla to HS2 was too small, increasing it to match Impala's. 2. Query timeout to wait for Hive queries was too short, making it configurable so we can play with different values. 3. Hive does not support 'with' clause in subquery, but interestingly supports it at the top-level. Added a profile flag "use_nested_with" to disable nested with's. 4. Hive does not support 'having' without 'group by'. Added a profile flag "use_having_without_groupby" to always generate a group by with having. 5. Hive does not support "interval" keyword for timestamp. Added a profile 'restrict' list to restrict certain functions, and added 'dateAdd' to this list for Hive. 6. Hive 'greatest' and 'least' UDF's do not do implicit type casting like other databases. Modified the query-generator to only choose args of the same type for these, and for HiveSqlWriter to add a cast as there were still some lingering issues like udf's on int returning bigint. 7. Hive always orders the Nulls first in ORDER BY ASC, opposite to other databases, and does not have any 'NULLS FIRST' or 'NULLS LAST' option. Thus the only workaround is to add a "nulls_order_asc" flag to the profile, and pass it in to the ref database's SqlWriter to generate the 'NULLS FIRST' or 'NULLS LAST' statement on that end. 8. Hive strangely does not support multiple sort keys in a window without frame specification. The workaround is for HiveSqlWriter to add 'rows unbounded preceding' to specify the default frame if there are no existing frames. Change-Id: I2a5b07e37378f695de1b50af49845283468b4f0f Reviewed-on: http://gerrit.cloudera.org:8080/619 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-08-21 08:19:04 +00:00
Casey Ching	a4fe24c1b2	Python: Add more logging and CM options to common CLI parser Example output of --help: Options: --debug-log-file=DEBUG_LOG_FILE Path to debug log file. [default: /tmp/concurrent_select.py.log] --cm-host=host name The host name of the CM server. --cm-port=port number The port of the CM server. [default: 7180] --cm-user=user name The name of the CM user. [default: admin] --cm-password=password The password for the CM user. [default: admin] --cm-cluster-name=name If CM manages multiple clusters, use this to specify which cluster to use. Change-Id: I614383f4a65e700348572204e3d8fd5670f5bcf7 Reviewed-on: http://gerrit.cloudera.org:8080/472 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Casey Ching <casey@cloudera.com>	2015-08-15 23:10:10 +00:00
Szehon Ho	44151730db	This adds the following flags to data_generator to populate data into Hive. ('--use-hive', action='store_true', default=False, help='Use Hive') ('--hive-host', default='localhost', help="The name of the host running the HS2") ('--hive-port', default=10000, type=int, help="The hs2 port of the host") ('--hive-user', default='hive', help="The user name to use when connecting to HiveServer2") ('--hive-password', default='hive', help="The password to use when connecting to HiveServer2") ('--hdfs-host', help='The host of HDFS backing Hive tables, necessary for external HiveServer2') ('--hdfs-port', help='The port of HDFS backing Hive tables, necessary for external HiveServer2') These configurations allow it to talk to an external HiveServer2, so that it can be used as a standalone tool running against a Hive cluster in Hive automated The Hive connection is backed by Impyla. Impyla has been fixed to work with Hive on the latest patch: `a1053ce73e` Change-Id: I29b5c8937babf711f8c93ceb3c91fb75cd91d8eb Reviewed-on: http://gerrit.cloudera.org:8080/553 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: Internal Jenkins	2015-07-24 23:43:30 +00:00
casey	b013495e1d	Misc updates to the query generator (part 1 of 2) Summary of changes: 1) Simplified type system. The old system was overly complicated for the task of query generation. The modeling of types used to mirror the types used in Impala. For simplicity, the new system only uses a subset of types: Boolean, Char, Decimal, Float, Int, and Timestamp. 2) Functions now have fully typed signatures. Previously you had to know which functions accepted which inputs; now arbitrary permutations of functions can be generated. The chance of being able to add a new function without needing to change the query generation logic is much higher now. 3) Query generation profiles. The randomness of the previous version was hardcoded in various places throughout the query generator. Now there is a profile to determine which SQL features should be used. There is still a lot of room for improvement in terms of intuitiveness and documentation for configuring the profiles. 4) Greater diversity of queries. Besides the function permutations, various restrictions to simplify query generation have been removed. Also constants are used in queries. 5) Eliminate spinning and infinite loops. The old version would sometimes "hope" that a generated SQL element would be compatible with the context and, if not, it would try again, which would lead to noticeable spinning and/or infinite loops. 6) Catch up with Impala 2.0 features: subqueries, analytics, and Char/VarChar. Change-Id: Ia25f4e85d6a06f7958a906aa42d9f90d63675bc0 Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5640 Reviewed-by: Casey Ching <casey@cloudera.com> Tested-by: jenkins	2014-12-19 03:30:44 -08:00

6 Commits
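
The Hive flags listed in commit 44151730db read like option-parser argument tuples. As a rough illustration only, they could be wired up as below. This is a sketch, not the code from data_generator: the use of argparse, the function name build_hive_parser, and the example host name are assumptions, while the flag names, defaults, and help strings are copied from the commit message.

```python
import argparse


def build_hive_parser():
    """Build a parser carrying the Hive flags named in commit 44151730db.

    Hypothetical helper: the real data_generator may use a different
    option-parsing library and register these flags elsewhere.
    """
    parser = argparse.ArgumentParser(description="Hive connection options (sketch)")
    parser.add_argument('--use-hive', action='store_true', default=False,
                        help='Use Hive')
    parser.add_argument('--hive-host', default='localhost',
                        help="The name of the host running the HS2")
    parser.add_argument('--hive-port', default=10000, type=int,
                        help="The hs2 port of the host")
    parser.add_argument('--hive-user', default='hive',
                        help="The user name to use when connecting to HiveServer2")
    parser.add_argument('--hive-password', default='hive',
                        help="The password to use when connecting to HiveServer2")
    # The commit gives no default or type for the HDFS flags, so they stay
    # unset (None) unless supplied and are parsed as plain strings.
    parser.add_argument('--hdfs-host',
                        help='The host of HDFS backing Hive tables, '
                             'necessary for external HiveServer2')
    parser.add_argument('--hdfs-port',
                        help='The port of HDFS backing Hive tables, '
                             'necessary for external HiveServer2')
    return parser


# Example: point the tool at an external HiveServer2 (host name is made up).
args = build_hive_parser().parse_args(
    ['--use-hive', '--hive-host', 'hs2.example.com', '--hive-port', '10001'])
```

With no arguments the parser falls back to the defaults quoted in the commit message (localhost, port 10000, user and password "hive"), and the two HDFS options remain unset.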