The generated data is identical to the pregenerated tpch.tar.gz
and tpcds.tar.gz data that was used previously and was not
publicly accessible.
This adds a "preload" hook to bin/load-data.py that can execute custom
logic for each data set. This is used to call the TPC-H and TPC-DS data
generation utilities that are already available in the Impala toolchain.
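As a rough illustration of the per-data-set hook idea described above, a
minimal sketch might look like the following. It is not the code in
bin/load-data.py: the names PRELOAD_COMMANDS and run_preload are
hypothetical, and the echo commands are placeholders standing in for the
toolchain's TPC-H and TPC-DS generators.

```python
import subprocess

# Hypothetical registry mapping a data set name to the command that
# generates its raw data. The echo commands are placeholders, not the
# real generator invocations from the Impala toolchain.
PRELOAD_COMMANDS = {
    "tpch": ["echo", "generate tpch"],
    "tpcds": ["echo", "generate tpcds"],
}


def run_preload(dataset, commands=PRELOAD_COMMANDS, runner=subprocess.check_call):
    """Run the preload command registered for `dataset`, if any.

    Returns True if a hook ran, or False if the data set has no hook.
    `runner` is injectable so the hook can be exercised without
    spawning a process.
    """
    cmd = commands.get(dataset)
    if cmd is None:
        # Data sets without a registered hook are loaded unchanged.
        return False
    runner(cmd)
    return True
```

In this sketch a loader would call run_preload(name) once per data set
before loading it, so data sets without a hook are unaffected.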
Testing:
Ran a private test job with loading from snapshot disabled and without
the tpch/tpcds tarballs available.
Change-Id: Ieccfbd7d8d4a91bffddbe35abb7f5572e71a71cf
Reviewed-on: http://gerrit.cloudera.org:8080/3761
Reviewed-by: Tim Armstrong <tarmstrong@cloudera.com>
Tested-by: Internal Jenkins