Database Topology
=================
.. verified:: 2025-11-12
:reviewer: Christof Buchbender
The ops-db-api uses a distributed PostgreSQL architecture with a single authoritative database and read-only replicas at remote sites.
.. contents:: Table of Contents
:local:
:depth: 2
Overview
--------
The database topology consists of:
* **Main Database**: Cologne, Germany (write target)
* **Replica Database(s)**: Observatory in Chile, potentially other sites (read-only)
* **Streaming Replication**: PostgreSQL WAL-based continuous sync
.. mermaid::
   graph LR
       Main[("Main Database<br/>Cologne<br/>Read + Write")]
       Replica1[("Replica<br/>Observatory<br/>Read Only")]
       Replica2[("Replica<br/>Institute<br/>Read Only")]
       Main -.->|WAL Stream| Replica1
       Main -.->|WAL Stream| Replica2
       API_Main["API @ Main"] -->|Writes| Main
       API_Main -->|Reads| Main
       API_Obs["API @ Observatory"] -->|Buffered Writes| Main
       API_Obs -->|Reads| Replica1
       style Main fill:#90EE90
       style Replica1 fill:#FFB6C1
       style Replica2 fill:#FFB6C1
Main Database (Cologne)
-----------------------
Characteristics
~~~~~~~~~~~~~~~
**Role**: Single authoritative source of truth
**Location**: Cologne, Germany (data center)
**Access**:
* Direct write access (low latency)
* Direct read access (low latency)
* Public endpoint for remote API instances
**Configuration**:
.. code-block:: ini
# postgresql.conf
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
synchronous_commit = on
**Replication Settings**:
.. code-block:: ini
# Enable WAL archiving
archive_mode = on
archive_command = 'cp %p /archive/%f'
# Replication slots (one per replica)
max_replication_slots = 10
Schema Management
~~~~~~~~~~~~~~~~~
Database schema is managed by the ``ops-db`` package:
.. code-block:: bash

   # Install ops-db package
   pip install git+https://github.com/ccatobs/ops-db.git

.. code-block:: python

   # Models are imported from ccat_ops_db
   from ccat_ops_db import models
**Key tables**:
* ``executed_obs_unit`` - Observatory operation records
* ``raw_data_package`` - Data file groupings
* ``raw_data_file`` - Individual file metadata
* ``observing_program`` - Planned observations
* ``source`` - Astronomical sources
* ``user`` - System users
* ``api_token`` - Authentication tokens
Write Operations
~~~~~~~~~~~~~~~~
All writes go through the main database:
.. code-block:: python
# At main site: Direct write
async with main_db_session() as session:
obs = models.ExecutedObsUnit(**data)
session.add(obs)
await session.commit()
# At secondary site: Buffered write
transaction = builder.create(
model_class=models.ExecutedObsUnit,
data=data
)
await transaction_manager.buffer_transaction(transaction)
# Background processor later writes to main database
Replica Databases
-----------------
Observatory Replica (Chile)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Role**: Local read-only copy for observatory operations
**Location**: CCAT observatory, Chile (5600m altitude)
**Purpose**:
* Fast local reads (no network latency)
* Continue operations during network outages
* Merge with buffered data for consistent views
**Connection from API**:
.. code-block:: bash

   # Site configuration for secondary site
   LOCAL_DB_HOST=localhost  # Points to local replica
   LOCAL_DB_PORT=5432
**Replication Lag**:
Typical lag depends on network conditions:
* Good network: 1-10 seconds
* Slow network: 10-60 seconds
* Network down: Accumulates until reconnection
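For monitoring, these ranges can be turned into a simple lag classifier. A minimal sketch; the cutoffs and the function name are illustrative assumptions, not part of the API:

```python
from datetime import timedelta

# Thresholds mirror the ranges listed above; names and cutoffs are illustrative.
def classify_replication_lag(lag: timedelta) -> str:
    """Map a measured replication lag onto the rough categories above."""
    if lag <= timedelta(seconds=10):
        return "good network"
    if lag <= timedelta(seconds=60):
        return "slow network"
    return "possible outage (lag accumulating)"

print(classify_replication_lag(timedelta(seconds=3)))   # good network
```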
Institute Replicas (Optional)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Additional replicas can be deployed at partner institutions:
**Purpose**:
* Distributed data access for scientists
* Load balancing for read queries
* Disaster recovery backup
**Configuration**: Same as observatory replica
PostgreSQL Streaming Replication
---------------------------------
How It Works
~~~~~~~~~~~~
PostgreSQL uses Write-Ahead Logging (WAL) for replication:
1. **Main database writes to WAL** before applying changes
2. **WAL records sent to replicas** continuously or in batches
3. **Replicas replay WAL** to apply the same changes
4. **Replication lag** is the time between main write and replica apply
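The four steps above can be sketched with a toy in-memory model. This is purely illustrative; PostgreSQL's WAL is a binary on-disk log, not a Python list:

```python
# Toy model of write-ahead logging and replay; illustrative only.
class ToyMain:
    def __init__(self):
        self.wal = []          # ordered log of changes (stands in for the WAL)
        self.data = {}

    def write(self, key, value):
        self.wal.append((key, value))   # 1. write to the WAL first
        self.data[key] = value          # then apply the change locally

class ToyReplica:
    def __init__(self):
        self.data = {}
        self.replayed = 0               # position in the log (stands in for the LSN)

    def replay(self, wal):
        # 2./3. receive and replay WAL records, in order
        for key, value in wal[self.replayed:]:
            self.data[key] = value
        self.replayed = len(wal)

main, replica = ToyMain(), ToyReplica()
main.write("obs_1", "done")
main.write("obs_2", "running")
lag_records = len(main.wal) - replica.replayed   # 4. lag, measured in records
replica.replay(main.wal)
print(replica.data == main.data)   # True
```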
WAL (Write-Ahead Log)
~~~~~~~~~~~~~~~~~~~~~
The WAL contains:
* All data modifications (INSERT, UPDATE, DELETE)
* Transaction commits and rollbacks
* Schema changes (DDL)
* Checkpoint records
**WAL segment size**: Typically 16MB per file
**WAL retention**: Configured to retain enough for replicas to catch up
**WAL format**: Binary, PostgreSQL-specific
LSN (Log Sequence Number)
~~~~~~~~~~~~~~~~~~~~~~~~~~
Each WAL record has a unique LSN:
**Format**: ``high/low``, two 32-bit hexadecimal halves of a 64-bit byte position in the WAL

Example: ``0/12345678``

**Properties**:

* Monotonically increasing
* Unique per WAL record
* Identical on main and replicas for the same record
* Used to track replication progress
**Querying LSN**:
.. code-block:: sql
-- On main database: Current write position
SELECT pg_current_wal_lsn();
-- Returns: 0/12345678
-- On replica: Last replayed position
SELECT pg_last_wal_replay_lsn();
-- Returns: 0/12345600 (slightly behind)
**Comparing LSNs**:
.. code-block:: python
def parse_lsn(lsn_string: str) -> int:
"""Convert LSN string to integer for comparison"""
file_part, offset_part = lsn_string.split('/')
return (int(file_part, 16) << 32) + int(offset_part, 16)
main_lsn = parse_lsn("0/12345678")
replica_lsn = parse_lsn("0/12345600")
if replica_lsn >= main_lsn:
print("Replica has caught up")
else:
bytes_behind = main_lsn - replica_lsn
print(f"Replica is {bytes_behind} bytes behind")
Replication Slots
~~~~~~~~~~~~~~~~~
**Purpose**: Ensure main database retains WAL until replica has consumed it
**Creating a slot**:
.. code-block:: sql
SELECT pg_create_physical_replication_slot('observatory_slot');
**Monitoring slots**:
.. code-block:: sql

   SELECT
       slot_name,
       active,
       restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS bytes_behind
   FROM pg_replication_slots;
**Danger**: Inactive slots can cause WAL accumulation and disk space issues
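Building on the query above, a monitoring script might flag problem slots before WAL fills the disk. A sketch; the row shape, threshold, and function name are assumptions for illustration:

```python
# Each row: (slot_name, active, bytes_behind), as returned by the query above.
def find_problem_slots(rows, max_bytes_behind=1 << 30):
    """Flag slots that are inactive or retaining more than ~1 GiB of WAL."""
    problems = []
    for slot_name, active, bytes_behind in rows:
        if not active:
            problems.append((slot_name, "inactive: WAL will accumulate"))
        elif bytes_behind > max_bytes_behind:
            problems.append((slot_name, f"{bytes_behind} bytes behind"))
    return problems

rows = [("observatory_slot", True, 4096),
        ("institute_slot", False, 0)]
print(find_problem_slots(rows))  # [('institute_slot', 'inactive: WAL will accumulate')]
```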
Connection Configuration
------------------------
API Configuration
~~~~~~~~~~~~~~~~~
The API uses separate connection strings for main and local databases:
**At Main Site** (Cologne):
.. code-block:: bash
SITE_TYPE=main
MAIN_DB_HOST=localhost
MAIN_DB_PORT=5432
LOCAL_DB_HOST=localhost # Same as main
LOCAL_DB_PORT=5432
**At Secondary Site** (Observatory):
.. code-block:: bash
SITE_TYPE=secondary
MAIN_DB_HOST=main-db.example.com # Remote
MAIN_DB_PORT=5432
LOCAL_DB_HOST=localhost # Local replica
LOCAL_DB_PORT=5432
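These variables drive read/write routing in the API: all writes target the main database, while reads go to whatever is local. A minimal sketch of that selection logic, assuming the variable names above (the function itself is illustrative, not the API's implementation):

```python
import os

# Illustrative routing: writes always target the main database; reads use
# the local host, which at a secondary site points to the replica.
def db_target(operation: str) -> str:
    site_type = os.environ.get("SITE_TYPE", "main")
    if operation == "write" or site_type == "main":
        return os.environ["MAIN_DB_HOST"]
    return os.environ["LOCAL_DB_HOST"]

os.environ.update(SITE_TYPE="secondary",
                  MAIN_DB_HOST="main-db.example.com",
                  LOCAL_DB_HOST="localhost")
print(db_target("write"))  # main-db.example.com
print(db_target("read"))   # localhost
```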
Replication Monitoring
----------------------
Checking Replication Status
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**On main database**:
.. code-block:: sql

   -- Active replication connections
   SELECT
       client_addr,
       state,
       sync_state,
       replay_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_behind
   FROM pg_stat_replication;
**On replica**:
.. code-block:: sql
-- Replication status
SELECT
pg_is_in_recovery(), -- Should be true
pg_last_wal_receive_lsn(),
pg_last_wal_replay_lsn(),
pg_last_xact_replay_timestamp();
Replication Lag Metrics
~~~~~~~~~~~~~~~~~~~~~~~~
**Time-based lag**:
.. code-block:: sql

   -- On replica; pg_stat_replication is only populated on the main
   -- database, so no FROM clause is needed here
   SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
**Byte-based lag**:
.. code-block:: sql
-- On main database
SELECT
client_addr,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_behind
FROM pg_stat_replication;
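The same byte difference can be computed client-side from two LSN strings. The sketch below mirrors ``pg_wal_lsn_diff`` in pure Python; the helper names are illustrative:

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/12345678' into a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)

# Pure-Python analogue of pg_wal_lsn_diff(a, b): bytes from position b to a.
def wal_lsn_diff(a: str, b: str) -> int:
    return parse_lsn(a) - parse_lsn(b)

print(wal_lsn_diff("0/12345678", "0/12345600"))  # 120
```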
LSN Tracking in API
~~~~~~~~~~~~~~~~~~~
The API actively tracks replication using LSN:
.. literalinclude:: ../../ccat_ops_db_api/transaction_buffering/lsn_tracker.py
:language: python
:lines: 18-52
:emphasize-lines: 11-12, 27-30
Failover and Recovery
---------------------
Network Partition Scenarios
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Scenario 1: Observatory loses connection to main**
* Buffering continues (Redis local)
* Reads from local replica (may be stale)
* Background processor retries periodically
* When reconnected: Buffer drains automatically
**Scenario 2: Main database goes down**
* At main site: API becomes unavailable
* At secondary site: Buffering continues
* Reads work from replica
* When main restored: Buffer processes
**Scenario 3: Replica falls very behind**
* Reads become increasingly stale
* Smart query manager still merges buffer
* LSN tracking detects lag
* Alert monitoring triggers
* Manual intervention may be needed
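The reconnect-and-drain behaviour in scenarios 1 and 2 can be sketched as a retry loop. The buffer, the failing-write stub, and the attempt limit below are all illustrative assumptions; the real logic lives in the transaction-buffering code:

```python
import itertools

# Illustrative retry loop: keep writing buffered transactions, oldest first,
# until the main database accepts them or attempts run out.
def drain_buffer(buffer, try_write, max_attempts=5):
    """Retry writing buffered transactions until the main DB accepts them."""
    for _ in range(max_attempts):
        try:
            while buffer:
                try_write(buffer[0])   # write oldest transaction first
                buffer.pop(0)          # drop it only after a successful write
            return True                # buffer fully drained
        except ConnectionError:
            continue                   # main still unreachable; retry
    return False

# Simulated outage: the first two write attempts fail, then the link recovers.
attempts = itertools.count()
def try_write(tx):
    if next(attempts) < 2:
        raise ConnectionError("main database unreachable")

buffer = ["tx1", "tx2", "tx3"]
print(drain_buffer(buffer, try_write))  # True
print(buffer)                           # []
```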
Database Migrations
-------------------
Schema changes must be coordinated:
Migration Process
~~~~~~~~~~~~~~~~~
1. **Develop migration** in ops-db package
2. **Test on development database**
3. **Apply to main database** (Cologne)
4. **Replicas receive automatically** via WAL
5. **Update API** to use new schema
6. **Deploy API changes** to all sites
Example Migration
~~~~~~~~~~~~~~~~~
.. code-block:: python
# In ops-db package
# alembic/versions/xxx_add_observation_status.py
def upgrade():
op.add_column(
'executed_obs_unit',
sa.Column('detailed_status', sa.String(50))
)
def downgrade():
op.drop_column('executed_obs_unit', 'detailed_status')
**Apply migration**:
.. code-block:: bash
# On main database
alembic upgrade head
**Replicas** receive the schema change automatically through WAL.
Backup and Restore
------------------
Backup Strategy
~~~~~~~~~~~~~~~
**Main database**:
* Continuous WAL archiving to S3/backup storage
* Daily full backups with pg_dump
* Point-in-time recovery capability
* Retention: 30 days
**Replicas**:
* No backups needed (can rebuild from main)
* Optional: Snapshot for faster rebuild
Restore Procedure
~~~~~~~~~~~~~~~~~
.. code-block:: bash
# Restore from backup
pg_restore -d ccat_ops_db backup_file.dump
# Or rebuild replica from main
pg_basebackup -h main-db.example.com -D /var/lib/postgresql/data -U replication -v -P
Performance Considerations
--------------------------
Replication Impact on Main
~~~~~~~~~~~~~~~~~~~~~~~~~~~
**WAL generation**: Minimal overhead (< 5%)
**Network bandwidth**: Depends on write volume
* Light writes: < 1 MB/s
* Heavy writes: 10-50 MB/s
**Replication slots**: Can accumulate WAL if replica is down (monitor!)
Replica Query Performance
~~~~~~~~~~~~~~~~~~~~~~~~~~
**Read-only**: Indexes cannot be created on a replica; only indexes that already exist on the main database are available

**Hot standby feedback**: Reduces query cancellations on the replica, at the cost of delaying vacuum on the main database
**Connection pooling**: Same as main database
**Query routing**: API automatically queries local replica
Summary
-------
The database topology provides:
* **Single source of truth**: Main database in Cologne
* **Distributed reads**: Replicas at each site
* **Automatic synchronization**: PostgreSQL streaming replication
* **Precise tracking**: LSN-based replication monitoring
* **Network resilience**: Operations continue during outages
Key characteristics:
* All writes → main database
* Reads from local replica (with buffer merge)
* WAL-based replication (continuous or near-continuous)
* LSN tracking for cache management
* Monitoring for replication health
Next Steps
----------
* :doc:`site-configuration` - How sites are configured
* :doc:`../deep-dive/transaction-buffering/lsn-tracking` - LSN tracking implementation
* :doc:`../development/debugging-buffering` - Troubleshooting replication issues