docs/content/code-security/codeql-cli/codeql-cli-manual/database-create.md

---
title: database create
versions: # DO NOT MANUALLY EDIT. CHANGES WILL BE OVERWRITTEN BY A 🤖
  fpt: '*'
  ghec: '*'
  ghes: '*'
topics:
  - Advanced Security
  - Code scanning
  - CodeQL
type: reference
product: '{% data reusables.gated-features.codeql %}'
autogenerated: codeql-cli
intro: |-
  Create a CodeQL database for a source tree that can be analyzed using
  one of the CodeQL products.
redirect_from:
  - /code-security/codeql-cli/manual/database-create
---


<!-- Content after this section is automatically generated -->

{% data reusables.codeql-cli.man-pages-version-note %}

## Synopsis

```shell copy
codeql database create [--language=<lang>[,<lang>...]] [--github-auth-stdin] [--github-url=<url>] [--source-root=<dir>] [--threads=<num>] [--ram=<MB>] [--command=<command>] [--extractor-option=<extractor-option-name=value>] <options>... -- <database>
```

## Description

Create a CodeQL database for a source tree that can be analyzed using
one of the CodeQL products.

## Options

### Primary Options

#### `<database>`

\[Mandatory] Path to the CodeQL database to create. This directory will
be created, and _must not_ already exist (but its parent must).

If the `--db-cluster` option is given, this will not be a database
itself, but a directory that will _contain_ databases for several
languages built from the same source root.

It is important that this directory is not in a location that the build
process will interfere with. For instance, the `target` directory of a
Maven project would not be a suitable choice.

#### `--[no-]overwrite`

\[Advanced] If the database already exists, delete it and proceed with
this command instead of failing. If the directory exists, but it does
not look like a database, an error will be thrown.

#### `--[no-]force-overwrite`

\[Advanced] If the database already exists, delete it even if it does
not look like a database and proceed with this command instead of
failing. This option should be used with caution as it may recursively
delete the entire database directory.

#### `--codescanning-config=<file>`

\[Advanced] Read a Code Scanning configuration file specifying options
on how to create the CodeQL databases and what queries to run in later
steps. For more details on the format of this configuration file, refer
to [AUTOTITLE](/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning). To run queries from
this file in a later step, invoke [codeql database analyze](/code-security/codeql-cli/codeql-cli-manual/database-analyze) without any other queries specified.

#### `--[no-]db-cluster`

Instead of creating a single database, create a "cluster" of databases
for different languages, each of which is a subdirectory of the
directory given on the command line.

#### `-l, --language=<lang>[,<lang>...]`

The language that the new database will be used to analyze.

Use [codeql resolve languages](/code-security/codeql-cli/codeql-cli-manual/resolve-languages) to get a list of the pluggable language extractors found on the search path.

When the `--db-cluster` option is given, this can appear multiple times,
or the value can be a comma-separated list of languages.

If this option is omitted, and the source root being analysed is a
checkout of a GitHub repository, the CodeQL CLI will make a call to the
GitHub API to attempt to automatically determine what languages to
analyse. Note that to be able to do this, a GitHub PAT token must be
supplied either in the environment variable GITHUB\_TOKEN or via standard
input using the `--github-auth-stdin` option.

#### `--build-mode=<mode>`

The build mode that will be used to create the database.

Choose your build mode based on the language you are analyzing:

`none`: The database will be created without building the source root.
Available for C#, Java, JavaScript/TypeScript, Python, and Ruby.

`autobuild`: The database will be created by attempting to automatically
build the source root. Available for C/C++, C#, Go, Java/Kotlin, and
Swift.

`manual`: The database will be created by building the source root using
a manually specified build command. Available for C/C++, C#, Go,
Java/Kotlin, and Swift.

When creating a database with `--command`, there is no need to
additionally specify '--build-mode manual'.

Available since `v2.16.4`.

#### `-s, --source-root=<dir>`

\[Default: .] The root source code directory. In many cases, this will
be the checkout root. Files within it are considered to be the primary
source files for this database. In some output formats, files will be
referred to by their relative path from this directory.

#### `-j, --threads=<num>`

Use this many threads for the import operation, and pass it as a hint to
any invoked build commands.

Defaults to 1. You can pass 0 to use one thread per core on the machine,
or -_N_ to leave _N_ cores unused (except still use at least one
thread).

#### `-M, --ram=<MB>`

Use this much memory for the import operation, and pass it as a hint to
any invoked build commands.

#### `-c, --command=<command>`

For compiled languages, build commands that will cause the compiler to
be invoked on the source code to analyze. These commands will be
executed under an instrumentation environment that allows analysis of
generated code and (in some cases) standard libraries.

If no build command is specified, the command attempts to figure out
automatically how to build the source tree, based on heuristics from the
selected language pack.

Beware that some combinations of multiple languages _require_ an
explicit build command to be specified.

#### `--no-cleanup`

\[Advanced] Suppress all database cleanup after finalization. Useful
for debugging purposes.

#### `--no-pre-finalize`

\[Advanced] Skip any pre-finalize script specified by the active CodeQL
extractor.

#### `--[no-]skip-empty`

\[Advanced] Output a warning instead of failing if a database is empty
because no source code was seen during the build. The empty database
will be left unfinalized.

#### `--[no-]linkage-aware-import`

\[Advanced] Controls whether [codeql dataset import](/code-security/codeql-cli/codeql-cli-manual/dataset-import) is linkage-aware _(default)_ or not. On projects where this part of database creation
consumes too much memory, disabling this option may help them progress
at the expense of database completeness.

Available since `v2.15.3`.

### Baseline calculation options

#### `--[no-]calculate-baseline`

\[Advanced] Calculate baseline information about the code being
analyzed and add it to the database. By default, this is enabled unless
the source root is the root of a filesystem. This flag can be used to
either disable, or force the behavior to be enabled even in the root of
the filesystem.

#### `--[no-]sublanguage-file-coverage`

\[GitHub.com and GitHub Enterprise Server v3.12.0+ only] Use
sub-language file coverage information. This calculates, displays, and
exports separate file coverage information for languages which share a
CodeQL extractor like C and C++, Java and Kotlin, and JavaScript and
TypeScript.

Available since `v2.15.2`.

### Extractor selection options

#### `--search-path=<dir>[:<dir>...]`

A list of directories under which extractor packs may be found. The
directories can either be the extractor packs themselves or directories
that contain extractors as immediate subdirectories.

If the path contains multiple directory trees, their order defines
precedence between them: if the target language is matched in more than
one of the directory trees, the one given first wins.

The extractors bundled with the CodeQL toolchain itself will always be
found, but if you need to use separately distributed extractors you need
to give this option (or, better yet, set up `--search-path` in a
per-user configuration file).

(Note: On Windows the path separator is `;`).

### Options to configure how to call the GitHub API to auto-detect languages.

#### `-a, --github-auth-stdin`

Accept a GitHub Apps token or personal access token via standard input.

This overrides the GITHUB\_TOKEN environment variable.

#### `-g, --github-url=<url>`

URL of the GitHub instance to use. If omitted, the CLI will attempt to
autodetect this from the checkout path and if this is not possible
default to <https://github.com/>

### Options to configure the package manager.

#### `--registries-auth-stdin`

Authenticate to GitHub Enterprise Server Container registries by passing
a comma-separated list of \<registry\_url>=\<token> pairs.

For example, you can pass
`https://containers.GHEHOSTNAME1/v2/=TOKEN1,https://containers.GHEHOSTNAME2/v2/=TOKEN2`
to authenticate to two GitHub Enterprise Server instances.

This overrides the CODEQL\_REGISTRIES\_AUTH and GITHUB\_TOKEN environment
variables. If you only need to authenticate to the github.com Container
registry, you can instead authenticate using the simpler
`--github-auth-stdin` option.

### Low-level dataset cleanup options

#### `--max-disk-cache=<MB>`

Set the maximum amount of space that the disk cache for intermediate
query results can use.

If this size is not configured explicitly, the evaluator will try to use
a "reasonable" amount of cache space, based on the size of the dataset
and the complexity of the queries. Explicitly setting a higher limit
than this default usage will enable additional caching which can speed
up later queries.

#### `--min-disk-free=<MB>`

\[Advanced] Set target amount of free space on file system.

If `--max-disk-cache` is not given, the evaluator will try hard to
curtail disk cache usage if the free space on the file system drops
below this value.

#### `--min-disk-free-pct=<pct>`

\[Advanced] Set target fraction of free space on file system.

If `--max-disk-cache` is not given, the evaluator will try hard to
curtail disk cache usage if the free space on the file system drops
below this percentage.

#### `--cache-cleanup=<mode>`

Select how aggressively to trim the cache. Choices include:

`clear`: Remove the entire cache, trimming down to the state of a
freshly extracted dataset

`trim` _(default)_: Trim everything except explicitly "cached"
predicates.

`fit`: Simply make sure the defined size limits for the disk cache are
observed, deleting as many intermediates as necessary.

#### `--cleanup-upgrade-backups`

Delete any backup directories resulting from database upgrades.

### Tracing options

#### `--no-tracing`

\[Advanced] Do not trace the specified command, instead rely on it to
produce all necessary data directly.

#### `--extra-tracing-config=<tracing-config.lua>`

\[Advanced] The path to a tracer configuration file. It may be used to
modify the behaviour of the build tracer. It may be used to pick out
compiler processes that run as part of the build command, and trigger
the execution of other tools. The extractors will provide default tracer
configuration files that should work in most situations.

### Build command customization options

#### `--working-dir=<dir>`

\[Advanced] The directory in which the specified command should be
executed. If this argument is not provided, the command is executed in
the value of `--source-root` passed to [codeql database create](/code-security/codeql-cli/codeql-cli-manual/database-create), if one exists. If no `--source-root` argument is provided, the command is executed in the
current working directory.

#### `--no-run-unnecessary-builds`

\[Advanced] Only run the specified build command(s) if a database under
construction uses an extractor that depends on tracing a build process.
If this option is not given, the command will be executed even when
CodeQL doesn't need it, on the assumption that you need its side
effects for other reasons.

### Options to control extractor behavior

#### `-O, --extractor-option=<extractor-option-name=value>`

Set options for CodeQL extractors. `extractor-option-name` should be of
the form extractor\_name.group1.group2.option\_name or
group1.group2.option\_name. If `extractor_option_name` starts with an
extractor name, the indicated extractor must declare the option
group1.group2.option\_name. Otherwise, any extractor that declares the
option group1.group2.option\_name will have the option set. `value` can
be any string that does not contain a newline.

You can use this command-line option repeatedly to set multiple
extractor options. If you provide multiple values for the same extractor
option, the behaviour depends on the type that the extractor option
expects. String options will use the last value provided. Array options
will use all the values provided, in order. Extractor options specified
using this command-line option are processed after extractor options
given via `--extractor-options-file`.

When passed to [codeql database init](/code-security/codeql-cli/codeql-cli-manual/database-init) or `codeql database begin-tracing`, the options will only be
applied to the indirect tracing environment. If your workflow also makes
calls to
[codeql database trace-command](/code-security/codeql-cli/codeql-cli-manual/database-trace-command) then the options also need to be passed there if desired.

See <https://codeql.github.com/docs/codeql-cli/extractor-options> for
more information on CodeQL extractor options, including how to list the
options declared by each extractor.

#### `--extractor-options-file=<extractor-options-bundle-file>`

Specify extractor option bundle files. An extractor option bundle file
is a JSON file (extension `.json`) or YAML file (extension `.yaml` or
`.yml`) that sets extractor options. The file must have the top-level
map key 'extractor' and, under it, extractor names as second-level map
keys. Further levels of maps represent nested extractor groups, and
string and array options are map entries with string and array values.

Extractor option bundle files are read in the order they are specified.
If different extractor option bundle files specify the same extractor
option, the behaviour depends on the type that the extractor option
expects. String options will use the last value provided. Array options
will use all the values provided, in order. Extractor options specified
using this command-line option are processed before extractor options
given via `--extractor-option`.

When passed to [codeql database init](/code-security/codeql-cli/codeql-cli-manual/database-init) or `codeql database begin-tracing`, the options will only be
applied to the indirect tracing environment. If your workflow also makes
calls to
[codeql database trace-command](/code-security/codeql-cli/codeql-cli-manual/database-trace-command) then the options also need to be passed there if desired.

See <https://codeql.github.com/docs/codeql-cli/extractor-options> for
more information on CodeQL extractor options, including how to list the
options declared by each extractor.

### Common options

#### `-h, --help`

Show this help text.

#### `-J=<opt>`

\[Advanced] Give option to the JVM running the command.

(Beware that options containing spaces will not be handled correctly.)

#### `-v, --verbose`

Incrementally increase the number of progress messages printed.

#### `-q, --quiet`

Incrementally decrease the number of progress messages printed.

#### `--verbosity=<level>`

\[Advanced] Explicitly set the verbosity level to one of errors,
warnings, progress, progress+, progress++, progress+++. Overrides `-v`
and `-q`.

#### `--logdir=<dir>`

\[Advanced] Write detailed logs to one or more files in the given
directory, with generated names that include timestamps and the name of
the running subcommand.

(To write a log file with a name you have full control over, instead
give `--log-to-stderr` and redirect stderr as desired.)

#### `--common-caches=<dir>`

\[Advanced] Controls the location of cached data on disk that will
persist between several runs of the CLI, such as downloaded QL packs and
compiled query plans. If not set explicitly, this defaults to a
directory named `.codeql` in the user's home directory; it will be
created if it doesn't already exist.

Available since `v2.15.2`.