Connecting to a Database

Biofilter needs a database to run any report. You have two options:

Option A: connect to a database that already exists (someone else manages it).
Option B: bootstrap a new local database, then run the ETL to populate it.

Pick the one that matches your situation.

Option A: Connect to an existing database

Use this when you have a connection string from a colleague, a shared lab instance, or a managed deployment.

What you need

A connection URL in SQLAlchemy format:

postgresql+psycopg2://<user>:<password>@<host>:<port>/<database>

Example: postgresql+psycopg2://bioadmin:secret@db.example.com:5432/biofilter_prod
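Passwords often contain characters such as @, : or / that have a special meaning inside a URL, and SQLAlchemy expects them to be percent-encoded. The sketch below shows one way to assemble the URL from its parts in a POSIX shell. The helper names urlencode and make_db_url are illustrative and not part of Biofilter, and the encoder is only meant for ASCII input.

```shell
# Percent-encode every character outside the RFC 3986 unreserved set.
# Illustrative helper (not part of Biofilter); ASCII input only.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Assemble a SQLAlchemy-style PostgreSQL URL: user, password, host, port, database.
make_db_url() {
  printf 'postgresql+psycopg2://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5"
}

# Example: make_db_url bioadmin 'p@ss/w:rd' db.example.com 5432 biofilter_prod
```

Only the user name and password are encoded here; host, port, and database name are passed through unchanged.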

Setting it

You can configure the connection in two ways; use whichever fits your setup.

Via configuration file (persistent across runs):

biofilter config init --path .
biofilter config set database.db_uri "postgresql+psycopg2://bioadmin:secret@db.example.com:5432/biofilter_prod"

Via environment variable (preferred in containers, CI, or short-lived shells):

export DATABASE_URL="postgresql+psycopg2://bioadmin:secret@db.example.com:5432/biofilter_prod"
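In CI logs it is easy to leak the password by echoing the variable. A minimal sketch that masks the password before printing is shown below. The redact_db_url helper is hypothetical, not a Biofilter command, and it assumes any @ inside the password has been percent-encoded.

```shell
# Mask the password in a URL of the form scheme://user:password@host/...
# Hypothetical helper, not part of Biofilter. URLs without credentials
# (for example sqlite:///./file.sqlite3) pass through unchanged.
redact_db_url() {
  printf '%s\n' "$1" | sed -E 's#(://[^:/@]+):[^@]*@#\1:***@#'
}

# Example: echo "Using $(redact_db_url "$DATABASE_URL")"
```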

Verify the connection

Show the resolved configuration:

biofilter config show

Test that the database is actually reachable:

biofilter db ping

If the ping succeeds, you’ll see the engine, host, database name, and latency. You’re done; skip to "Find a report that fits your need".
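In containers or CI the database may still be starting when the first command runs. The retry sketch below assumes that biofilter db ping exits with a non-zero status when the database is unreachable, which this page does not state. The function name wait_for_db and its two arguments (attempts, delay in seconds) are illustrative.

```shell
# Retry the ping until it succeeds or the attempts run out.
# Assumption: "biofilter db ping" returns non-zero on failure.
wait_for_db() {
  attempts=${1:-5}
  delay=${2:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if biofilter db ping; then
      return 0
    fi
    echo "ping attempt $i/$attempts failed" >&2
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example: wait_for_db 10 3 || exit 1
```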

Option B: Bootstrap a new local database

Use this when you want to run Biofilter entirely on your own machine. Two engines are supported:

Engine	Best for	Notes
SQLite	Quick start, single user, light datasets	No setup, file-based
PostgreSQL	Production, multi-user, full data	Recommended for variants and large ETLs

1. Initialize configuration

biofilter config init --path .

This creates a .biofilter.toml in the current directory. Set the database URI and the directory that will hold raw and processed ETL files:

# SQLite (simplest)
biofilter config set database.db_uri "sqlite:///./biofilter_dev.sqlite3"

# OR PostgreSQL
biofilter config set database.db_uri "postgresql+psycopg2://bioadmin:secret@localhost:5432/biofilter_dev"

biofilter config set etl.data_root "./biofilter_data"

Validate:

biofilter config show
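A relative SQLite path such as ./biofilter_dev.sqlite3 is normally resolved by SQLAlchemy against the current working directory, so running the same configuration from another directory can point at a different file (unless Biofilter resolves the path itself, which this page does not say). One way to avoid the ambiguity is an absolute path, which in SQLAlchemy URL syntax gives four slashes on POSIX systems. The helper name sqlite_url_for is illustrative.

```shell
# Print a SQLAlchemy SQLite URL with an absolute path for the given file.
# Illustrative helper; the parent directory must already exist.
sqlite_url_for() {
  dir=$(CDPATH= cd -- "$(dirname -- "$1")" && pwd) || return 1
  printf 'sqlite:///%s/%s\n' "$dir" "$(basename -- "$1")"
}

# Example:
#   biofilter config set database.db_uri "$(sqlite_url_for ./biofilter_dev.sqlite3)"
```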

2. Create the schema

Confirm the database is reachable, then apply the schema:

biofilter db ping
biofilter db migrate --target head
biofilter db upgrade

The ping reports the engine, host, database name, and latency. The migrate command then applies every schema migration up to head, and the upgrade command loads seed data (entity groups, relationship types, source systems).
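When scripting these steps, it helps to stop at the first failure so that seed data is never loaded into a half-migrated schema. A minimal sketch follows, assuming each biofilter command exits with a non-zero status on failure (not confirmed on this page). The function name bootstrap_schema is illustrative.

```shell
# Run the three schema steps in order; && skips the rest after a failure.
# Assumption: each biofilter command returns non-zero when it fails.
bootstrap_schema() {
  biofilter db ping &&
    biofilter db migrate --target head &&
    biofilter db upgrade
}

# Example: bootstrap_schema || echo "schema setup failed" >&2
```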

3. Run your first ETL

An ETL update pulls and ingests data for a single source. Start with hgnc (small, fast, no dependencies):

biofilter etl update --data-source hgnc
biofilter etl status

etl status shows which data sources are loaded and when. From here you can add more sources (gene_ncbi, reactome, mondo, …) as needed.
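Loading several sources can be scripted as a loop over the same command. The sketch below stops at the first failure, since a later source may depend on an earlier one. It assumes a non-zero exit status on failure, and it leaves the ordering to the caller because this page does not list the dependencies between sources. The function name load_sources is illustrative.

```shell
# Update each named data source in the order given, then print the status.
# Assumption: "biofilter etl update" returns non-zero when it fails.
load_sources() {
  for src in "$@"; do
    echo "Updating $src" >&2
    biofilter etl update --data-source "$src" || return 1
  done
  biofilter etl status
}

# Example (order the names as their dependencies require):
#   load_sources hgnc gene_ncbi
```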

For the full ETL operations guide, see ETL.

Next step

Now that you can talk to a database, find a report and run it.