# Connecting to a Database

Biofilter needs a database to run any report. You have two options:

- **Option A**: connect to a database that already exists (someone else manages it).
- **Option B**: bootstrap a new local database, then run the ETL to populate it.

Pick the one that matches your situation.

---

## Option A: Connect to an existing database

Use this when you have a connection string from a colleague, a shared lab instance, or a managed deployment.

### What you need

A connection URL in SQLAlchemy format:

```
postgresql+psycopg2://<user>:<password>@<host>:<port>/<database>
```

Example: `postgresql+psycopg2://bioadmin:secret@db.example.com:5432/biofilter_prod`

### Setting it

You can configure the connection in two ways. Pick whichever feels cleaner.

**Via configuration file** (persistent across runs):

```bash
biofilter config init --path .
biofilter config set database.db_uri "postgresql+psycopg2://bioadmin:secret@db.example.com:5432/biofilter_prod"
```

**Via environment variable** (preferred in containers, CI, or short-lived shells):

```bash
export DATABASE_URL="postgresql+psycopg2://bioadmin:secret@db.example.com:5432/biofilter_prod"
```

### Verify the connection

Show the resolved configuration:

```bash
biofilter config show
```

Test that the database is actually reachable:

```bash
biofilter db ping
```

If the ping succeeds, you'll see the engine, host, database name, and latency. You're done. Skip to [Find a report that fits your need](finding_reports.md).

---

## Option B: Bootstrap a new local database

Use this when you want to run BF4 fully on your own machine. Two engines are supported:

| Engine         | Best for                                 | Notes                                   |
| -------------- | ---------------------------------------- | --------------------------------------- |
| **SQLite**     | Quick start, single user, light datasets | No setup, file-based                    |
| **PostgreSQL** | Production, multi-user, full data        | Recommended for variants and large ETLs |

### 1. Initialize configuration

```bash
biofilter config init --path .
```

This creates a `.biofilter.toml` in the current directory. Set the database URI (choose one engine) and the directory that will hold raw and processed ETL files:

```bash
# SQLite (simplest)
biofilter config set database.db_uri "sqlite:///./biofilter_dev.sqlite3"

# OR PostgreSQL
biofilter config set database.db_uri "postgresql+psycopg2://bioadmin:secret@localhost:5432/biofilter_dev"

biofilter config set etl.data_root "./biofilter_data"
```

Validate:

```bash
biofilter config show
```

### 2. Create the schema

Confirm the database is reachable, then apply the schema:

```bash
biofilter db ping
biofilter db migrate --target head
biofilter db upgrade
```

The ping returns engine, host, database name, and latency. The migration applies all schema changes. The upgrade loads seed data (entity groups, relationship types, source systems).

### 3. Run your first ETL

This pulls and ingests data for a single source. Start with `hgnc` (small, fast, no dependencies):

```bash
biofilter etl update --data-source hgnc
biofilter etl status
```

`etl status` shows which data sources are loaded and when each was loaded. From here you can add more sources (`gene_ncbi`, `reactome`, `mondo`, …) as needed.

For the full ETL operations guide, see [ETL](../etl.md).

---

## Next step

Now that you can talk to a database, [find a report](finding_reports.md) and [run it](running_reports.md).
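
---

## Appendix: checking a connection URL before use

Both options above depend on a well-formed connection URL, and a malformed one usually only surfaces when `biofilter db ping` fails. The sketch below shows one way to catch that earlier in a script or CI job. The two functions, `looks_like_db_url` and `mask_db_url`, are hypothetical helpers and are not part of the `biofilter` CLI. The patterns cover only the two URL forms used in this guide (PostgreSQL via `psycopg2`, and SQLite), and they assume any `@` in the password is percent-encoded.

```bash
#!/usr/bin/env sh
# Hypothetical pre-flight helpers; neither function ships with biofilter.

# Return 0 if the URL matches one of the two forms used in this guide.
looks_like_db_url() {
  printf '%s\n' "$1" | grep -Eq \
    -e '^postgresql\+psycopg2://[^:/@]+:[^@]*@[^:/@]+:[0-9]+/[^/]+$' \
    -e '^sqlite:///.+'
}

# Print the URL with the password replaced, so it can be echoed in logs.
# URLs without a user:password part (such as SQLite) pass through unchanged.
mask_db_url() {
  printf '%s\n' "$1" | sed -E 's#^([a-z0-9+]+://[^:/@]+):[^@]*@#\1:***@#'
}

# Example: check DATABASE_URL (the variable used in Option A) if it is set.
url="${DATABASE_URL:-}"
if looks_like_db_url "$url"; then
  echo "DATABASE_URL looks well-formed: $(mask_db_url "$url")"
else
  echo "DATABASE_URL is unset or has an unexpected shape" >&2
fi
```

A check like this only validates the shape of the string. It says nothing about whether the host is reachable or the credentials are valid, so `biofilter db ping` remains the real test.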