612 Commits

Author SHA1 Message Date
Henry Robinson
829c0dc948 Remove line break from version metric 2014-01-08 10:49:45 -08:00
Nong Li
ab709cb517 Use FNV hash for data sender and different seeds for each step of the execution. 2014-01-08 10:49:45 -08:00
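Note on ab709cb517: the commit message does not say which FNV variant or which seeds are used, so the following is only a minimal sketch, assuming 64-bit FNV-1a with the seed taken as the initial hash value. The names `fnv1a_64` and `partition_for` are hypothetical and are not taken from the Impala source. The sketch shows why a per-step seed can help: the same key hashes differently at each step, so rows that were routed to one node by the sender's hash do not all collapse into the same buckets when hashed again later.

```python
# Illustrative sketch only; not the Impala implementation.
FNV64_PRIME = 0x100000001B3
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes, seed: int = FNV64_OFFSET_BASIS) -> int:
    """64-bit FNV-1a over `data`, starting from `seed` (assumed seeding scheme)."""
    h = seed & MASK64
    for byte in data:
        h ^= byte                        # mix in one input byte
        h = (h * FNV64_PRIME) & MASK64   # multiply by the FNV prime, keep 64 bits
    return h


def partition_for(key: bytes, num_partitions: int, step_seed: int) -> int:
    """Pick a destination partition for `key` using a step-specific seed."""
    return fnv1a_64(key, step_seed) % num_partitions
```

With the default seed this reduces to standard FNV-1a; passing a different `step_seed` for the data sender than for a later hash table gives two differently seeded hash functions over the same keys.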
Henry Robinson
cd9debb792 Typo in HdfsTableSink::GetHashTblKey 2014-01-08 10:49:44 -08:00
Alan Choi
b282175461 IMPA-213 Disable DN server check; disable all checks if Impala cannot detect CDH version 2014-01-08 10:49:44 -08:00
Alex Behm
7ccb1b8194 IMPALA-229: The built-in function regexp_extract() returns wrong results. 2014-01-08 10:49:43 -08:00
Nong Li
925223d437 Change query id for debug page urls to the same as all our other query id formats. 2014-01-08 10:49:42 -08:00
Nong Li
57aac373ae Add last refresh time metric. 2014-01-08 10:49:42 -08:00
Nong Li
2a4982ffd4 Remove <pre> tag for encoded profiles. 2014-01-08 10:49:41 -08:00
Nong Li
1f6481382e Fix parquet test setup. 2014-01-08 10:49:41 -08:00
Henry Robinson
189575f23f Version metric for statestored and impalad 2014-01-08 10:49:41 -08:00
Nong Li
0dcfbfafed Fix bugs in parquet scanner. 2014-01-08 10:49:39 -08:00
Nong Li
c998c30771 Compress encoded runtime profiles and persist them to log. 2014-01-08 10:49:39 -08:00
Henry Robinson
14d29aa579 Add plan and number of fragment instances to profile 2014-01-08 10:49:38 -08:00
Henry Robinson
1f9f656247 Throughput counters in data-stream sender 2014-01-08 10:49:36 -08:00
Henry Robinson
b72b711bdb IMPALA-211: Fix excessive logging in state-store subscriber 2014-01-08 10:49:36 -08:00
Lenni Kuff
d0c08eb8d6 IMPALA-215: DDL commands stay in the in flight query log 2014-01-08 10:49:35 -08:00
Nong Li
0891179bbf Add load factor to hash table counters. 2014-01-08 10:49:35 -08:00
Henry Robinson
1cc976819e Revert "IMPALA-206: Stop INSERT queries from always finishing in EXCEPTION state"
This reverts commit 40ea325b53d3154328686ea1152417b8abbcb2ac.
2014-01-08 10:49:35 -08:00
Nong Li
32ee207de4 Fix data errors test. 2014-01-08 10:49:34 -08:00
Alan Choi
612e1b22dc Fix impala-server.scan-ranges.num-missing-volume-id metric
There's a bug in hdfs-scan-node.cc where we only increment this metric once per query.
2014-01-08 10:49:33 -08:00
Henry Robinson
02183620d7 IMPALA-206: Stop INSERT queries from always finishing in EXCEPTION state 2014-01-08 10:49:33 -08:00
Alex Behm
1b2e8280d4 Fix NULL issues. 2014-01-08 10:49:32 -08:00
Nong Li
189f4313dc Fix disk-io-mgr-test. 2014-01-08 10:49:32 -08:00
Alan Choi
c419ae1891 Add 4.1 direct read configuration check
Impala detects the HDFS version by reading the Namenode web UI and runs
the corresponding check.

On 4.1, Impala tries to check the datanode (server side) config by reading
the datanode web UI.
2014-01-08 10:49:31 -08:00
Nong Li
8c3287db82 Integrate io mgr and mem limits 2014-01-08 10:49:31 -08:00
Henry Robinson
bc63ac2461 Add destination plan node ID to data-stream sender profile 2014-01-08 10:49:30 -08:00
Alan Choi
5f9e26b4a8 Average Scanner Thread Concurrency is a new metric in the profile that reports
the average number of active scanner threads (i.e. those that are not blocked by
IO).

In the hdfs-scan-node, whenever a thread is started, it will increment the
active_scanner_thread_counter_. When a scanner thread enters the
scan-range-context's GetRawBytes or GetBytes, the counter will be decremented.

A new sampling thread is created to sample the value of
active_scanner_thread_counter_ and compute the average.

Bucket counting of HdfsReadThreadConcurrent is also added.

The output of the hdfs-scan-node profile is also updated. Here's the new output
for hdfs-scan-node after running count(*) from tpch.lineitem.

      HDFS_SCAN_NODE (id=0):(10s254ms 99.75%)
        File Formats: TEXT/NONE:12
        Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:6/351.21M (351208888) 1:6/402.65M (402653184)
         - AverageHdfsReadThreadConcurrency: 1.95
           - HdfsReadThreadConcurrencyCountPercentage=0: 0.00
           - HdfsReadThreadConcurrencyCountPercentage=1: 5.00
           - HdfsReadThreadConcurrencyCountPercentage=2: 95.00
           - HdfsReadThreadConcurrencyCountPercentage=3: 0.00
         - AverageScannerThreadConcurrency: 0.15
         - BytesRead: 718.94 MB
         - MemoryUsed: 0.00
         - NumDisksAccessed: 2
         - PerReadThreadRawHdfsThroughput: 36.75 MB/sec
         - RowsReturned: 6.00M (6001215)
         - RowsReturnedRate: 585.25 K/sec
         - ScanRangesComplete: 12
         - ScannerThreadsInvoluntaryContextSwitches: 168
         - ScannerThreadsTotalWallClockTime: 1m40s
           - DelimiterParseTime: 2s128ms
           - MaterializeTupleTime: 723.0us
           - ScannerThreadsSysTime: 10.0ms
           - ScannerThreadsUserTime: 2s090ms
         - ScannerThreadsVoluntaryContextSwitches: 99
         - TotalRawHdfsReadTime: 19s561ms
         - TotalReadThroughput: 68.69 MB/sec
2014-01-08 10:49:30 -08:00
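Note on 5f9e26b4a8: the sampling and bucket counting described in that message can be sketched roughly as follows. This is a minimal illustration, not the hdfs-scan-node code: the class `ConcurrencySampler` and its methods are hypothetical, and samples are fed in explicitly instead of being read by a background sampling thread, which keeps the example deterministic.

```python
# Illustrative sketch only; names are hypothetical, not from Impala.
from collections import Counter


class ConcurrencySampler:
    """Collects periodic samples of an 'active threads' counter."""

    def __init__(self):
        self.samples = []

    def sample(self, active_threads: int) -> None:
        # In the commit, a sampling thread would read the counter periodically;
        # here the caller passes the observed value directly.
        self.samples.append(active_threads)

    def average(self) -> float:
        """Mean of all samples taken so far (0.0 if there are none)."""
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def bucket_percentages(self, max_value: int) -> dict:
        """Percentage of samples observed at each value 0..max_value."""
        counts = Counter(self.samples)
        total = len(self.samples)
        return {
            v: (100.0 * counts[v] / total if total else 0.0)
            for v in range(max_value + 1)
        }


sampler = ConcurrencySampler()
for observed in [1] + [2] * 19:   # 20 samples: one at 1 thread, nineteen at 2
    sampler.sample(observed)
```

With these 20 samples the buckets come out as 5% at concurrency 1 and 95% at concurrency 2, and the average is 0.05 * 1 + 0.95 * 2 = 1.95, which is the same relationship the HdfsReadThreadConcurrencyCountPercentage and AverageHdfsReadThreadConcurrency lines show in the profile output above.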
Marcel Kornacker
d7e22f44bb Partitioned hash joins
- added PlanNode.numNodes, PlanNode.avgRowSize and PlanNode.computeStats()
- fixing up some cardinality estimates
- Planner now tries to make a cost-based decision between broadcast join and join with full repartitioning (both inputs)
- ExchangeNode now distinguishes between its input and output row descriptor: the output potentially contains more tuples
- fixed problem related to cancellation and concurrent hash table builds.

Not included:
- partitioned joins that take advantage of existing partitions of the inputs; those will have to wait for a follow-on change
2014-01-08 10:49:29 -08:00
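Note on d7e22f44bb: the message does not spell out the cost formula, so the following is a sketch under stated assumptions rather than the planner's actual logic. It assumes that the cost of each strategy is the number of bytes sent over the network: a broadcast join ships the whole right-hand input to every node, while full repartitioning ships each input once. `InputStats` and `choose_join_strategy` are hypothetical names; the fields loosely mirror the cardinality, avgRowSize and numNodes statistics mentioned in the commit.

```python
# Illustrative sketch only; the cost model is an assumption, not Impala's.
from dataclasses import dataclass


@dataclass
class InputStats:
    cardinality: int      # estimated number of rows
    avg_row_size: float   # estimated bytes per row

    def total_bytes(self) -> float:
        return self.cardinality * self.avg_row_size


def choose_join_strategy(lhs: InputStats, rhs: InputStats, num_nodes: int) -> str:
    """Return 'broadcast' or 'partitioned', whichever moves fewer bytes."""
    broadcast_cost = rhs.total_bytes() * num_nodes           # rhs copied to every node
    partition_cost = lhs.total_bytes() + rhs.total_bytes()   # both inputs reshuffled once
    return "broadcast" if broadcast_cost <= partition_cost else "partitioned"
```

Under this model a small dimension table joined to a large fact table favours broadcast, while two inputs of similar size on a multi-node cluster favour repartitioning.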
Henry Robinson
f25c60884e Fix various counter issues 2014-01-08 10:49:29 -08:00
Henry Robinson
5f7084b4d2 IMP-837: Make JSON for StatMetrics valid 2014-01-08 10:49:28 -08:00
Henry Robinson
eb385311cb IMPALA-201: State-store subscriber should use client cache 2014-01-08 10:49:28 -08:00
Nong Li
1fcfb72bc4 IMPALA-145: Fix order by limit 0 crash. 2014-01-08 10:49:27 -08:00
Nong Li
6f4b30a66b Fix non-deterministic error reporting (DataErrorsTest) 2014-01-08 10:49:26 -08:00
Nong Li
05739af43d Added /inflight_query_ids to debug webpage. 2014-01-08 10:49:26 -08:00
Henry Robinson
554b6b2cb7 Stop subscriber from delaying further heartbeats when in recovery mode 2014-01-08 10:49:25 -08:00
Nong Li
ebab23841a Add back pressure in hdfs-scan-node to prevent excessive buffer queueing. 2014-01-08 10:49:25 -08:00
Skye Wanderman-Milne
a7e15b1417 Update Parquet scanner to only scan a file if assigned the first split.
Also re-enable Parquet tests.
2014-01-08 10:49:25 -08:00
Alan Choi
4d8d58f35b IMP-834 add backend addr to profile 2014-01-08 10:49:24 -08:00
Henry Robinson
f144b90650 IMPALA-200: Sort /backends by name 2014-01-08 10:49:24 -08:00
Henry Robinson
2ae20cbbb7 Statestore-2.0: New state-store implementation
* API simplified to deal only with 'topics', not services and objects
* Scalability improved: heartbeat loop is now multi-threaded
* State-store can store arbitrary objects
* State-store may send either deltas or complete topic state (delta computation to come)
2014-01-08 10:49:23 -08:00
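Note on 2ae20cbbb7: the message says the state-store may send either deltas or complete topic state, with delta computation still to come. One possible way to tell the two apart is to version every entry, as in the sketch below. This is an assumption for illustration only: the `Topic` class, its methods, and the versioning scheme are hypothetical and are not taken from the state-store code, and deletions are left out for brevity.

```python
# Illustrative sketch only; versioning scheme and names are assumptions.
class Topic:
    """A named collection of opaque entries, each tagged with a version."""

    def __init__(self, name: str):
        self.name = name
        self.version = 0
        self.entries = {}   # key -> (value, topic version at last change)

    def put(self, key: str, value: bytes) -> int:
        """Add or update an entry and return the new topic version."""
        self.version += 1
        self.entries[key] = (value, self.version)
        return self.version

    def full_state(self) -> dict:
        """Complete topic state, as sent to a subscriber with no history."""
        return {key: value for key, (value, _) in self.entries.items()}

    def delta_since(self, last_seen_version: int) -> dict:
        """Only the entries changed after the version a subscriber last saw."""
        return {
            key: value
            for key, (value, changed_at) in self.entries.items()
            if changed_at > last_seen_version
        }
```

A subscriber that remembers the last version it processed could then be sent `delta_since(version)` on each heartbeat, falling back to `full_state()` when it has no history.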
Nong Li
0b5b6c5667 Add snappy compression to parquet 2014-01-08 10:49:23 -08:00
Henry Robinson
e7488e25e6 Build hash-node build tables in parallel 2014-01-08 10:49:23 -08:00
Henry Robinson
9a0e40958c DataStreamRecvr counters to measure amount of time senders are blocked 2014-01-08 10:49:22 -08:00
Henry Robinson
1f12e8a85c IMPALA-169: Update DataStreamSender to get and release client from cache for each RPC.
Also introduces scoped ClientConnection to avoid leaking clients.
2014-01-08 10:49:22 -08:00
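Note on 1f12e8a85c: the idea of taking a client from the cache for each RPC and releasing it through a scoped object can be sketched as below. This is a minimal illustration in Python, with a context manager standing in for a C++ scoped object; `ClientCache` and `client_connection` are hypothetical names and the cache policy is an assumption, not the Impala implementation.

```python
# Illustrative sketch only; names and cache policy are assumptions.
from contextlib import contextmanager


class ClientCache:
    """Keeps idle clients per address and hands them out on request."""

    def __init__(self, factory):
        self._factory = factory   # callable(address) -> new client
        self._idle = {}           # address -> list of idle clients
        self.in_use = 0

    def get(self, address):
        idle = self._idle.setdefault(address, [])
        client = idle.pop() if idle else self._factory(address)
        self.in_use += 1
        return client

    def release(self, address, client):
        self._idle.setdefault(address, []).append(client)
        self.in_use -= 1


@contextmanager
def client_connection(cache, address):
    """Scoped access: the client always goes back to the cache on exit."""
    client = cache.get(address)
    try:
        yield client
    finally:
        cache.release(address, client)   # runs even if the RPC raises
```

Because the release happens in the `finally` clause, a client is returned to the cache even when the RPC fails, which is the leak the scoped connection is meant to prevent.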
Alex Behm
673d7b97cf IMPALA-190: Insert with NULL partition keys results in SIGSEGV. 2014-01-08 10:49:22 -08:00
Alan Choi
80168d1162 IMPALA-175 fix unregister query 2014-01-08 10:49:21 -08:00
Alan Choi
4a503a4e35 IMP-808 construct runtime state in fe-support to eval now() 2014-01-08 10:49:20 -08:00
Nong Li
10af855e04 Fix uninitialized variable in io mgr. 2014-01-08 10:49:20 -08:00
Nong Li
b3cdcc6290 Fix includes in hdfs-text-scanner.cc 2014-01-08 10:49:19 -08:00
Nong Li
20fc700002 Fix precision issue in text table writer. 2014-01-08 10:49:19 -08:00