6 Commits

Author SHA1 Message Date
Tim Armstrong
0589b86481 Changes to allow running stress test against MiniCluster
Miscellaneous fixes to allow running the binary mem_limit search against
a local mini cluster of varying size.
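
As a rough illustration of what a binary mem_limit search does, the sketch below narrows down the smallest memory limit at which a query still succeeds. The names find_min_mem_limit and query_succeeds are hypothetical and not taken from the stress test. It also assumes that a query which passes at some limit passes at every larger limit.

```python
def find_min_mem_limit(query_succeeds, low_mb, high_mb, tolerance_mb=1):
    """Return the smallest limit (within tolerance_mb) at which the query passes.

    query_succeeds is a hypothetical callable taking a limit in MB and
    returning True or False. low_mb is assumed to be a limit at which the
    query fails, high_mb one at which it succeeds.
    """
    if tolerance_mb < 1:
        raise ValueError("tolerance_mb must be at least 1")
    if not query_succeeds(high_mb):
        raise ValueError("query fails even at the upper bound")
    # Invariant: the query is known to succeed at high_mb.
    while high_mb - low_mb > tolerance_mb:
        mid_mb = (low_mb + high_mb) // 2
        if query_succeeds(mid_mb):
            high_mb = mid_mb   # mid is enough memory, search lower
        else:
            low_mb = mid_mb    # mid is too little memory, search higher
    return high_mb
```

In a real run, query_succeeds would execute the query against the cluster with the given limit, so each probe is expensive and the logarithmic number of probes matters.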

Change-Id: Ic87f8e6eeae97791c9e3d69355aac45d366a1882
Reviewed-on: http://gerrit.cloudera.org:8080/2209
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins
2016-02-19 01:30:11 +00:00
Casey Ching
f288867833 Stress test: Various changes
The major changes are:

1) Collect backtrace and fatal log on crash.
2) Poll memory usage. The data is only displayed at this time.
3) Support kerberos.
4) Add random queries.
5) Generate random and TPC-H nested data on a remote cluster. The
   random data generator was converted to use MR for scaling.
6) Add a cluster abstraction to run data loading for #5 on a
   remote or local cluster. This also moves and consolidates some
   Cloudera Manager utilities that were in the stress test.
7) Clean up the wrappers around impyla. That stuff was getting
   messy.

Change-Id: I4e4b72dbee1c867626a0b22291dd6462819e35d7
Reviewed-on: http://gerrit.cloudera.org:8080/1298
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2016-01-20 23:00:25 +00:00
Szehon Ho
5787dc2cf3 First commit to run the random query generator on Hive.
With this change, random query generator can run continuously on Hive
and approximately half of its generated queries are able to run.

1. Connect timeout from Impyla to HS2 was too small,
increasing it to match Impala's.
2. Query timeout to wait for Hive queries was too short,
making it configurable so we can play with different values.
3. Hive does not support 'with' clause in subquery,
but interestingly supports it at the top-level.
Added a profile flag "use_nested_with" to disable nested with's.
4. Hive does not support 'having' without 'group by'.
Added a profile flag "use_having_without_groupby" to always
generate a group by with having.
5. Hive does not support "interval" keyword for timestamp.
Added a profile 'restrict' list to restrict certain functions,
and added 'dateAdd' to this list for Hive.
6. Hive 'greatest' and 'least' UDF's do not do implicit type casting
like other databases.  Modified the query-generator to only choose args of
the same type for these, and for HiveSqlWriter to add a cast as there
were still some lingering issues like udf's on int returning bigint.
7. Hive always orders the Nulls first in ORDER BY ASC,
opposite to other databases,
and does not have any 'NULLS FIRST' or 'NULLS LAST' option.
Thus the only workaround is to add a "nulls_order_asc" flag
to the profile, and pass it in to the ref database's SqlWriter
to generate the 'NULLS FIRST' or 'NULLS LAST' statement on that end.
8. Hive strangely does not support multiple sort keys in a window
without frame specification.  The workaround is for HiveSqlWriter
to add 'rows unbounded preceding' to specify the default frame if
there are no existing frames.
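
Workarounds 7 and 8 can be pictured with the minimal sketch below. It is not the actual SqlWriter or HiveSqlWriter code: the function names and the nulls_first_in_asc parameter are hypothetical stand-ins for the "nulls_order_asc" profile flag. The DESC behavior (NULLs last when ASC puts them first) and the choice to add the frame only when there are multiple sort keys are both assumptions.

```python
def order_by_item(expr, ascending=True, nulls_first_in_asc=True):
    """Render one ORDER BY item for the reference database with explicit
    NULL placement, so that it matches the engine under test.

    nulls_first_in_asc=True models an engine that sorts NULLs first in
    ASC order (assumed here to sort them last in DESC order).
    """
    direction = "ASC" if ascending else "DESC"
    nulls_first = (ascending == nulls_first_in_asc)
    return "%s %s NULLS %s" % (expr, direction, "FIRST" if nulls_first else "LAST")


def window_spec(order_keys, frame=None):
    """Render a window specification, adding an explicit frame when there
    are several sort keys and the caller supplied none."""
    if frame is None and len(order_keys) > 1:
        frame = "ROWS UNBOUNDED PRECEDING"
    parts = ["ORDER BY " + ", ".join(order_keys)]
    if frame:
        parts.append(frame)
    return "(" + " ".join(parts) + ")"
```

For example, order_by_item("c1") gives "c1 ASC NULLS FIRST", and window_spec(["a", "b"]) gives "(ORDER BY a, b ROWS UNBOUNDED PRECEDING)".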

Change-Id: I2a5b07e37378f695de1b50af49845283468b4f0f
Reviewed-on: http://gerrit.cloudera.org:8080/619
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-08-21 08:19:04 +00:00
Casey Ching
a4fe24c1b2 Python: Add more logging and CM options to common CLI parser
Example output of --help:

Options:
  --debug-log-file=DEBUG_LOG_FILE
                        Path to debug log file. [default:
                        /tmp/concurrent_select.py.log]
  --cm-host=host name   The host name of the CM server.
  --cm-port=port number
                        The port of the CM server. [default: 7180]
  --cm-user=user name   The name of the CM user. [default: admin]
  --cm-password=password
                        The password for the CM user. [default: admin]
  --cm-cluster-name=name
                        If CM manages multiple clusters, use this to
                        specify which cluster to use.

Change-Id: I614383f4a65e700348572204e3d8fd5670f5bcf7
Reviewed-on: http://gerrit.cloudera.org:8080/472
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Casey Ching <casey@cloudera.com>
2015-08-15 23:10:10 +00:00
Szehon Ho
44151730db This adds the following flags to data_generator to populate data into Hive.
('--use-hive', action='store_true', default=False,
    help='Use Hive')
('--hive-host', default='localhost',
    help="The name of the host running the HS2")
('--hive-port', default=10000, type=int,
    help="The hs2 port of the host")
('--hive-user', default='hive',
    help="The user name to use when connecting to HiveServer2")
('--hive-password', default='hive',
    help="The password to use when connecting to HiveServer2")
('--hdfs-host',
    help='The host of HDFS backing Hive tables, necessary for external HiveServer2')
('--hdfs-port',
    help='The port of HDFS backing Hive tables, necessary for external HiveServer2')
These configurations allow it to talk to an external HiveServer2, so that it can be used as a standalone tool running against a Hive cluster in Hive automated

The Hive connection is backed by Impyla.
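
For reference, the flags listed above could be wired into a parser roughly as follows. This is a sketch assuming argparse. The build_parser name and the description string are hypothetical, and the real data_generator may register its options differently.

```python
import argparse


def build_parser():
    # Hypothetical wrapper; flag names, defaults and help text are copied
    # from the commit message above.
    parser = argparse.ArgumentParser(description="data_generator options (sketch)")
    parser.add_argument('--use-hive', action='store_true', default=False,
                        help='Use Hive')
    parser.add_argument('--hive-host', default='localhost',
                        help="The name of the host running the HS2")
    parser.add_argument('--hive-port', default=10000, type=int,
                        help="The hs2 port of the host")
    parser.add_argument('--hive-user', default='hive',
                        help="The user name to use when connecting to HiveServer2")
    parser.add_argument('--hive-password', default='hive',
                        help="The password to use when connecting to HiveServer2")
    parser.add_argument('--hdfs-host',
                        help='The host of HDFS backing Hive tables, necessary for external HiveServer2')
    parser.add_argument('--hdfs-port',
                        help='The port of HDFS backing Hive tables, necessary for external HiveServer2')
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# reproducible; the values are made up for the example.
args = build_parser().parse_args(['--use-hive', '--hive-port', '10001'])
```

Flags that are not passed fall back to their defaults, so args.hive_host is 'localhost' here and args.hdfs_host is None.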

Impyla has been fixed to work with Hive on the latest patch:
a1053ce73e

Change-Id: I29b5c8937babf711f8c93ceb3c91fb75cd91d8eb
Reviewed-on: http://gerrit.cloudera.org:8080/553
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
2015-07-24 23:43:30 +00:00
casey
b013495e1d Misc updates to the query generator (part 1 of 2)
Summary of changes:

  1) Simplified type system. The old system was overly complicated for
     the task of query generation. The modeling of types used to mirror
     the types used in Impala. For simplicity, the new system uses only a
     subset of types: Boolean, Char, Decimal, Float, Int, and Timestamp.

  2) Functions now have fully typed signatures. Previously you had to
     know which functions accepted which inputs; now arbitrary
     permutations of functions can be generated. The chance of being
     able to add a new function without needing to change the query
     generation logic is much higher now.

  3) Query generation profiles. The randomness of the previous version
     was hardcoded in various places throughout the query generator.
     Now there is a profile to determine which SQL features should be
     used. There is still a lot of room for improvement in terms of
     intuitiveness and documentation for configuring the profiles.

  4) Greater diversity of queries. Besides the function permutations,
     various restrictions to simplify query generation have been
     removed. Also constants are used in queries.

  5) Eliminate spinning and infinite loops. Also the old version would
     sometimes "hope" that a generated SQL element would be compatible
     with the context and if not, it would try again which would lead
     to noticeable spinning and/or infinite loops.

  6) Catch up with Impala 2.0 features: subqueries, analytics, and
     Char/VarChar.
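
To illustrate points 2 and 5, the sketch below shows how fully typed signatures let a generator pick only functions that fit the required type, rather than generating an element and retrying when it does not fit. It is not the query generator's code: Signature, SIGNATURES, COLUMNS, functions_returning and build_expr are hypothetical names, and the registry entries are examples only.

```python
import random
from collections import namedtuple

Signature = namedtuple("Signature", ["name", "return_type", "arg_types"])

# Illustrative registry; a new function is added by appending one entry.
SIGNATURES = [
    Signature("abs", "Int", ("Int",)),
    Signature("length", "Int", ("Char",)),
    Signature("concat", "Char", ("Char", "Char")),
]

# Hypothetical columns available for each type.
COLUMNS = {"Int": ["int_col"], "Char": ["char_col"]}


def functions_returning(return_type, signatures=SIGNATURES):
    """Return every signature whose result has the requested type."""
    return [s for s in signatures if s.return_type == return_type]


def build_expr(return_type, depth, rng):
    """Build a random expression of the requested type.

    Only compatible functions are considered, so there is no retry loop,
    and the depth budget guarantees termination.
    """
    candidates = functions_returning(return_type)
    if depth <= 0 or not candidates:
        return rng.choice(COLUMNS[return_type])
    sig = rng.choice(candidates)
    args = [build_expr(arg_type, depth - 1, rng) for arg_type in sig.arg_types]
    return "%s(%s)" % (sig.name, ", ".join(args))
```

Calling build_expr("Int", 2, random.Random(0)) yields some nesting of abs and length over the example columns. The exact result depends on the random seed.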

Change-Id: Ia25f4e85d6a06f7958a906aa42d9f90d63675bc0
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/5640
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: jenkins
2014-12-19 03:30:44 -08:00