CREATE DATABASE statement
Creates a new database.
In Impala, a database is both:
- A logical construct for grouping related tables, views, and functions within their own namespace.
  You might use a separate database for each application, set of related tables, or round of experimentation.
- A physical construct represented by a directory tree in HDFS. Internal tables, their partitions, and
  data files are all located under this directory. You can perform HDFS-level operations on the directory,
  such as backing it up and measuring space usage, or remove the database with a DROP DATABASE statement.
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name [COMMENT 'database_comment']
  [LOCATION hdfs_path];
A database is physically represented as a directory in HDFS whose name ends with the extension .db,
located under the main Impala data directory. If the associated HDFS directory does not exist, it is created for you.
All databases and their associated directories are top-level objects, with no physical or logical nesting.
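As a minimal sketch of the syntax above, the following statement creates a database with a comment. The database name experiments and the comment text are hypothetical and chosen only for illustration.

```sql
-- Hypothetical database name and comment, for illustration only.
-- IF NOT EXISTS makes the statement safe to re-run: it does nothing
-- if a database named experiments already exists.
CREATE DATABASE IF NOT EXISTS experiments
  COMMENT 'Scratch area for ad hoc tests';

-- With no LOCATION clause, the directory experiments.db is created
-- under the main Impala data directory.
```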
After creating a database, to make it the current database within an impala-shell session,
use the USE statement. You can refer to tables in the current database without prepending
any qualifier to their names.
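The difference between qualified and unqualified names might look like the following sketch. The database experiments and the table t1 are hypothetical names, and the database is assumed to exist already.

```sql
-- Make the hypothetical database experiments the current database.
USE experiments;

-- Hypothetical table; it is created inside the current database.
CREATE TABLE t1 (x INT);

-- Unqualified name: resolved against the current database.
SELECT COUNT(*) FROM t1;

-- Fully qualified name: works regardless of the current database.
SELECT COUNT(*) FROM experiments.t1;
```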
When you first connect to Impala through impala-shell, the database you start in (before
issuing any CREATE DATABASE or USE statements) is named default.
After creating a database, your impala-shell session or another
impala-shell connected to the same node can immediately access that database. To access
the database through the Impala daemon on a different node, issue the INVALIDATE METADATA
statement first while connected to that other node.
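A sketch of that sequence, as it might be run in an impala-shell session connected to the other node, follows. The database name experiments is hypothetical.

```sql
-- Run while connected to the Impala daemon on the other node.
-- Reloads metadata so that databases created elsewhere become visible.
INVALIDATE METADATA;

-- The hypothetical database experiments can now be used on this node.
USE experiments;
```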
Setting the LOCATION attribute for a new database is a way to work with sets of files in an
HDFS directory structure outside the default Impala data directory, as opposed to setting the
LOCATION attribute for each individual table.
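For example, a database could be pointed at an existing directory tree as in the sketch below. The database name sales_archive and the HDFS path are hypothetical; substitute a path that the impalad user is allowed to write under.

```sql
-- Hypothetical name and path, for illustration only.
-- The database directory is the given path instead of a .db directory
-- under the default Impala data directory.
CREATE DATABASE sales_archive
  COMMENT 'Files managed outside the default data directory'
  LOCATION '/user/etl/sales_archive';
```

Setting the location once at the database level avoids repeating a LOCATION clause on every CREATE TABLE statement for tables that belong together.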
When you create a database in Impala, the database can also be used by Hive.
When you create a database in Hive, issue an INVALIDATE METADATA
statement in Impala to make Impala permanently aware of the new database.
The SHOW DATABASES statement lists all databases, or the databases whose names
match a wildcard pattern. In more recent Impala releases, the
SHOW DATABASES output includes a second column that displays the associated
comment, if any, for each database.
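The two forms of SHOW DATABASES might be used as follows. The pattern exp* is a hypothetical example that would match names such as experiments.

```sql
-- List every database visible to this session.
SHOW DATABASES;

-- List only the databases whose names match a wildcard pattern.
-- In Impala patterns, * matches any sequence of characters.
SHOW DATABASES LIKE 'exp*';
```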
To specify that any tables created within a database reside on the Amazon S3 system,
you can include an s3a:// prefix on the LOCATION
attribute. In more recent Impala releases, Impala automatically creates any
required folders as the databases, tables, and partitions are created, and removes
them when they are dropped.
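A minimal sketch of an S3-backed database follows. The bucket name example-bucket, the path, and the database name s3_staging are all hypothetical, and the cluster is assumed to be configured with credentials for that bucket.

```sql
-- Hypothetical bucket, path, and database name, for illustration only.
-- Tables created in this database without their own LOCATION clause
-- are placed under this S3 path.
CREATE DATABASE s3_staging
  LOCATION 's3a://example-bucket/impala/s3_staging';
```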
The user ID that the impalad daemon runs under,
typically the impala user, must have write
permission for the parent HDFS directory under which the database
is located.