the average number of active scanner threads (i.e. those that are not blocked by
IO).
In the hdfs-scan-node, whenever a scanner thread is started, it increments
active_scanner_thread_counter_. When a scanner thread enters the
scan-range-context's GetRawBytes or GetBytes, the counter is decremented.
A new sampling thread is created to sample the value of
active_scanner_thread_counter_ and compute the average.
Bucket counting of HdfsReadThreadConcurrency is also added.
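A minimal sketch of the sampling scheme, with hypothetical names (not the
actual counter code): scanner threads bump an atomic on start and drop it
while blocked in GetRawBytes/GetBytes, and the sampling thread periodically
reads the value to derive the average:

  #include <atomic>
  #include <chrono>
  #include <cstdint>
  #include <thread>

  std::atomic<int> active_scanner_threads{0};  // stand-in for active_scanner_thread_counter_

  void SampleLoop(std::atomic<bool>& done, double* average_out) {
    int64_t samples = 0, total = 0;
    while (!done.load()) {
      total += active_scanner_threads.load();  // one sample of the concurrency
      ++samples;
      std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
    if (samples > 0) *average_out = static_cast<double>(total) / samples;
  }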
The hdfs-scan-node profile output is also updated. Here's the new output
after running count(*) from tpch.lineitem.
HDFS_SCAN_NODE (id=0):(10s254ms 99.75%)
File Formats: TEXT/NONE:12
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:6/351.21M
(351208888) 1:6/402.65M (402653184)
- AverageHdfsReadThreadConcurrency: 1.95
- HdfsReadThreadConcurrencyCountPercentage=0: 0.00
- HdfsReadThreadConcurrencyCountPercentage=1: 5.00
- HdfsReadThreadConcurrencyCountPercentage=2: 95.00
- HdfsReadThreadConcurrencyCountPercentage=3: 0.00
- AverageScannerThreadConcurrency: 0.15
- BytesRead: 718.94 MB
- MemoryUsed: 0.00
- NumDisksAccessed: 2
- PerReadThreadRawHdfsThroughput: 36.75 MB/sec
- RowsReturned: 6.00M (6001215)
- RowsReturnedRate: 585.25 K/sec
- ScanRangesComplete: 12
- ScannerThreadsInvoluntaryContextSwitches: 168
- ScannerThreadsTotalWallClockTime: 1m40s
- DelimiterParseTime: 2s128ms
- MaterializeTupleTime: 723.0us
- ScannerThreadsSysTime: 10.0ms
- ScannerThreadsUserTime: 2s090ms
- ScannerThreadsVoluntaryContextSwitches: 99
- TotalRawHdfsReadTime: 19s561ms
- TotalReadThroughput: 68.69 MB/sec
- added PlanNode.numNodes, PlanNode.avgRowSize and PlanNode.computeStats()
- fixed up some cardinality estimates
- Planner now tries to make a cost-based decision between a broadcast join and a join with full repartitioning of both inputs (see the sketch after this list)
- ExchangeNode now distinguishes between its input and output row descriptor: the output potentially contains more tuples
- fixed a problem related to cancellation and concurrent hash table builds.
Not included:
- partitioned joins that take advantage of existing partitions of the inputs; those will have to wait for a follow-on change
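As referenced above, a rough sketch of the kind of cost comparison involved;
the real logic lives in the Java planner and uses the new PlanNode stats, so
the formulas here are simplifying assumptions:

  #include <cstdint>

  // Broadcasting ships the build side to every node; full repartitioning
  // ships both inputs across the network once.
  bool ChooseBroadcastJoin(int64_t lhs_bytes, int64_t rhs_bytes, int num_nodes) {
    int64_t broadcast_cost = rhs_bytes * num_nodes;  // copy build side everywhere
    int64_t partition_cost = lhs_bytes + rhs_bytes;  // reshuffle both inputs once
    return broadcast_cost < partition_cost;
  }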
* API simplified to deal only with 'topics', not services and objects
* Scalability improved: heartbeat loop is now multi-threaded
* State-store can store arbitrary objects
* State-store may send either deltas or complete topic state (delta computation to come)
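A minimal sketch of the topic abstraction, assuming each topic is a versioned
key/value map (hypothetical types, not the state-store's actual interface):

  #include <cstdint>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  struct TopicEntry {
    std::string value;
    int64_t version;
  };

  struct Topic {
    std::map<std::string, TopicEntry> entries;

    // Entries changed after 'from_version'; passing 0 returns the complete
    // topic state, i.e. a full (non-delta) update.
    std::vector<std::pair<std::string, TopicEntry>> Update(int64_t from_version) const {
      std::vector<std::pair<std::string, TopicEntry>> out;
      for (const auto& kv : entries) {
        if (kv.second.version > from_version) out.push_back(kv);
      }
      return out;
    }
  };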
Adds support for:
* ALTER TABLE <table> PARTITION (partitionSpec) SET FILEFORMAT
* ALTER TABLE <table> PARTITION (partitionSpec) SET LOCATION
This enables setting the location and file format of individual partitions.
This patch adds support for
- ALTER TABLE ADD|REPLACE COLUMNS
- ALTER TABLE DROP COLUMN
- ALTER TABLE ADD|DROP PARTITION
- ALTER TABLE SET FILEFORMAT
- ALTER TABLE SET LOCATION
- ALTER TABLE RENAME
* Changed frontend analysis for HBase tables
* Changed Thrift messages to allow HBase as a sink type.
* JNI wrapper around HTable
* Created hbase-table-sink
* Created hbase-table-writer
* Statically initialized much of the JNI-related code for HBase.
* Cleaned up some cpplint issues.
* Changed JUnit analysis tests
* Created a new HBase test table.
* Added functional tests for HBase inserts.
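A rough sketch of the shape of the JNI wrapper (hypothetical, not the actual
hbase-table-writer code), caching global references to
org.apache.hadoop.hbase.client.HTable at static-init time:

  #include <jni.h>

  static jclass htable_cl_ = nullptr;
  static jmethodID htable_ctor_ = nullptr;
  static jmethodID htable_put_id_ = nullptr;

  // Looks up the HTable class and the methods the writer needs, promoting
  // the class to a global ref so it survives across JNI calls.
  bool InitHTableJni(JNIEnv* env) {
    jclass local_cl = env->FindClass("org/apache/hadoop/hbase/client/HTable");
    if (local_cl == nullptr) return false;
    htable_cl_ = reinterpret_cast<jclass>(env->NewGlobalRef(local_cl));
    env->DeleteLocalRef(local_cl);
    htable_ctor_ = env->GetMethodID(htable_cl_, "<init>",
        "(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;)V");
    htable_put_id_ = env->GetMethodID(htable_cl_, "put",
        "(Lorg/apache/hadoop/hbase/client/Put;)V");
    return htable_ctor_ != nullptr && htable_put_id_ != nullptr;
  }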
This makes partition pruning more effective by extending it to predicates that are fully bound by the partition column,
e.g., '<col> IN (1, 2, 3)' will also be used to prune partitions, in addition to equality and binary comparisons.
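A minimal sketch of the idea (hypothetical helper; the real pruning happens in
the frontend planner): any predicate fully bound by the partition column can be
evaluated against each partition's key value at plan time, and only matching
partitions generate scan ranges:

  #include <cstdint>
  #include <functional>
  #include <vector>

  // Returns the indexes of partitions whose key value satisfies the predicate.
  std::vector<int> PrunePartitions(const std::vector<int64_t>& partition_keys,
                                   const std::function<bool(int64_t)>& pred) {
    std::vector<int> kept;
    for (int i = 0; i < static_cast<int>(partition_keys.size()); ++i) {
      if (pred(partition_keys[i])) kept.push_back(i);
    }
    return kept;
  }

  // e.g. '<col> IN (1, 2, 3)' becomes:
  //   PrunePartitions(keys, [](int64_t v) { return v == 1 || v == 2 || v == 3; });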
- new class MemLimit
- new query flag MEM_LIMIT
- implementation of impalad flag mem_limit
Still missing:
- parsing a mem limit spec that contains "M/G", as in: 1.25G
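A rough sketch of what that still-missing parser could look like (hypothetical
helper, not committed code):

  #include <cstdint>
  #include <cstdlib>
  #include <string>

  // Accepts "1.25G", "512M", or a plain byte count; returns -1 on bad input.
  int64_t ParseMemSpec(const std::string& spec) {
    if (spec.empty()) return -1;
    double multiplier = 1.0;
    size_t len = spec.size();
    char suffix = spec[len - 1];
    if (suffix == 'G' || suffix == 'g') multiplier = 1024.0 * 1024.0 * 1024.0;
    else if (suffix == 'M' || suffix == 'm') multiplier = 1024.0 * 1024.0;
    if (multiplier != 1.0) --len;  // strip the recognized suffix
    std::string number = spec.substr(0, len);
    char* end = nullptr;
    double value = strtod(number.c_str(), &end);
    if (end != number.c_str() + number.size() || value < 0) return -1;
    return static_cast<int64_t>(value * multiplier);
  }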
Rdtsc is not accurate, due to changes in cpu frequency. Very often, the time
reported in the profile is even longer than the time reported by the shell.
This patch replaces Rdtsc with CLOCK_MONOTONIC. It is as fast as Rdtsc but
accurate: it is affected neither by cpu frequency changes nor by the user
setting the system clock.
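A minimal sketch of the timing primitive (hypothetical helper name):

  #include <cstdint>
  #include <ctime>

  // CLOCK_MONOTONIC is unaffected by cpu frequency scaling and by the user
  // setting the system clock.
  int64_t MonotonicNanos() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
  }

  // Usage: int64_t start = MonotonicNanos(); ...;
  //        int64_t elapsed_ns = MonotonicNanos() - start;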
Note that the new profile will always report time, rather than clock
cycles. Here's the new profile:
Averaged Fragment 1:(68.241ms 0.00%)
completion times: min:69ms max:69ms mean: 69ms stddev:0
execution rates: min:91.60 KB/sec max:91.60 KB/sec mean:91.60 KB/sec
stddev:0.00 /sec
split sizes: min: 6.32 KB, max: 6.32 KB, avg: 6.32 KB, stddev: 0.00
- RowsProduced: 1
CodeGen:
- CodegenTime: 566.104us   <-- reported in microseconds instead of clock cycles
- CompileTime: 33.202ms
- LoadTime: 2.671ms
- ModuleFileSize: 44.61 KB
DataStreamSender:
- BytesSent: 16.00 B
- DataSinkTime: 50.719us
- SerializeBatchTime: 18.365us
- ThriftTransmitTime: 145.945us
AGGREGATION_NODE (id=1):(68.384ms 15.50%)
- BuildBuckets: 1.02K
- BuildTime: 13.734us
- GetResultsTime: 6.650us
- MemoryUsed: 32.01 KB
- RowsReturned: 1
- RowsReturnedRate: 14.00 /sec
HDFS_SCAN_NODE (id=0):(57.808ms 84.71%)
- BytesRead: 6.32 KB
- DelimiterParseTime: 62.370us
- MaterializeTupleTime: 767ns
- MemoryUsed: 0.00
- PerDiskReadThroughput: 9.32 MB/sec
- RowsReturned: 100
- RowsReturnedRate: 1.73 K/sec
- ScanRangesComplete: 4
- ScannerThreadsInvoluntaryContextSwitches: 0
- ScannerThreadsReadTime: 662.431us
- ScannerThreadsSysTime: 0
- ScannerThreadsTotalWallClockTime: 25ms
- ScannerThreadsUserTime: 0
- ScannerThreadsVoluntaryContextSwitches: 4
- TotalReadThroughput: 0.00 /sec
This adds Impala support for CREATE/DROP DATABASE/TABLE. With this change, Impala
supports creating tables in the metastore stored as text, sequence, and RCFile
formats. It currently supports only unpartitioned tables stored in HDFS.
- this adds a SelectNode that evaluates conjuncts and enforces the limit (sketched below)
- all limits are now distributed: enforced both by the child plan fragment and
by the merging ExchangeNode
- all limits w/ Order By are now distributed: enforced both by the child plan fragment and
by the merging TopN node
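As referenced above, a minimal sketch of the SelectNode behavior with
hypothetical stand-in types (the real node operates on row batches):

  #include <cstdint>
  #include <functional>
  #include <vector>

  using Row = std::vector<int64_t>;                  // stand-in for a row
  using Conjunct = std::function<bool(const Row&)>;  // stand-in for an evaluated expr

  // Keeps rows that pass every conjunct, stopping once 'limit' rows are emitted.
  std::vector<Row> SelectWithLimit(const std::vector<Row>& input,
                                   const std::vector<Conjunct>& conjuncts,
                                   int64_t limit) {
    std::vector<Row> out;
    for (const Row& row : input) {
      bool pass = true;
      for (const Conjunct& c : conjuncts) {
        if (!c(row)) { pass = false; break; }
      }
      if (!pass) continue;
      out.push_back(row);
      if (static_cast<int64_t>(out.size()) == limit) break;  // limit reached
    }
    return out;
  }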
This patch:
1. uses boost uuid (see the sketch below)
2. adds unit tests for HiveServer2 metadata operations
3. adds a JDBC metadata unit test
4. implements all remaining HiveServer2 metadata operations: GetFunctions and GetTableTypes
5. removes the in-process impala server from fe-support
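A minimal sketch of generating a unique id with boost uuid, as a stand-in for
however the server actually builds its ids:

  #include <boost/uuid/uuid.hpp>
  #include <boost/uuid/uuid_generators.hpp>
  #include <boost/uuid/uuid_io.hpp>
  #include <iostream>

  int main() {
    boost::uuids::random_generator gen;
    boost::uuids::uuid id = gen();  // 128-bit random uuid
    std::cout << boost::uuids::to_string(id) << std::endl;
    return 0;
  }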
This patch implements the HiveServer2 API.
We have tested it with Lenni's patch against the tpch workload. It has also
been tested manually against Hive's beeline with queries and metadata operations.
All of the HiveServer2 code is implemented in impala-hs2-server.cc. Beeswax
code is refactored to impala-beeswax-server.cc.
HiveServer2 has a few more metadata operations. These operations go through
impala-hs2-server to ddl-executor and then to the FE. The logic is implemented in
fe/src/main/java/com/cloudera/impala/service/MetadataOp.java.
Because of the Thrift union issue, I had to modify the generated C++ files.
Therefore, all of the HiveServer2 Thrift-generated C++ code is checked into
be/src/service/hiveserver2/. Once the Thrift issue is resolved, I'll remove
these files.
Change-Id: I9a8fe5a09bf250ddc43584249bdc87b6da5a5881