Location Model
==============

.. verified:: 2025-11-25
   :reviewer: Christof Buchbender

CCAT data centers are geographically distributed (Chile, Cologne, Cornell). Data must be tracked across multiple sites, each with multiple storage locations of different types.

Site
----

:py:class:`~ccat_ops_db.models.Site` groups data locations that belong to the same physical or logical location.

**Examples**:

* CCAT (short_name: ccat): Cerro Chajnantor, Chile - telescope location
* Cologne (short_name: cologne): University of Cologne, Germany - primary archive
* Cornell (short_name: us): Cornell University, USA - US archive

For complete attribute details, see :py:class:`~ccat_ops_db.models.Site`.

DataLocation
------------

:py:class:`~ccat_ops_db.models.DataLocation` is the base class for all storage locations with polymorphic storage types. It defines WHERE data can be stored within a site.

**LocationType Enum**: SOURCE (telescope instrument computers), BUFFER (intermediate storage), LONG_TERM_ARCHIVE (permanent storage), PROCESSING (temporary analysis areas).

**StorageType Enum**: DISK (traditional filesystem), S3 (object storage), TAPE (tape libraries).

For complete attribute details, see :py:class:`~ccat_ops_db.models.DataLocation`.

Polymorphic Storage Types
-------------------------

The database uses polymorphic inheritance to support different storage backends:

.. mermaid::

   graph TB
       DL[DataLocation<br/>Base Class]
       DISK[DiskDataLocation]
       S3[S3DataLocation]
       TAPE[TapeDataLocation]

       DL -->|polymorphic| DISK
       DL -->|polymorphic| S3
       DL -->|polymorphic| TAPE

       style DL fill:#e1f5ff
       style DISK fill:#fff4e1
       style S3 fill:#ffe1f5
       style TAPE fill:#e1ffe1

DiskDataLocation
^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.DiskDataLocation` represents filesystem-based storage (local or remote). Used for local telescope storage, network-mounted buffers, and processing areas.

**Example**: FYST source location at "telescope.ccat.cl:/data/fyst"

For complete attribute details, see :py:class:`~ccat_ops_db.models.DiskDataLocation`.

S3DataLocation
^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.S3DataLocation` represents object storage for large-scale archival. Used for long-term archives and cloud storage. Credentials are retrieved via :py:func:`~ccat_ops_db.models.S3DataLocation.get_s3_credentials` method using environment variable patterns.

**Example**: Cologne long-term archive using Coscine S3-compatible storage

For complete attribute details, see :py:class:`~ccat_ops_db.models.S3DataLocation`.

TapeDataLocation
^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.TapeDataLocation` represents tape library systems for deep archival. Used for long-term cold storage with high capacity and low access frequency. Not currently in production, but supported by the architecture.

For complete attribute details, see :py:class:`~ccat_ops_db.models.TapeDataLocation`.

Buffer Hierarchy and Failover
-----------------------------

Multiple buffer locations can exist at a site, enabling failover and load distribution.

**Active Flag**: Indicates if location is operational

**Priority Field**: Defines failover order (lower number = higher priority)

**Use Case**: If primary buffer is full or offline, data-transfer can route to secondary buffer.

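
The selection rule implied by the active flag and the priority field can be sketched as follows. This is a minimal illustration, not the data-transfer implementation: ``BufferLocation`` and ``select_buffer`` are hypothetical stand-ins for the real models, and the "buffer is full" condition is not modeled here.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional, Sequence


   @dataclass(frozen=True)
   class BufferLocation:
       """Hypothetical stand-in for a buffer DataLocation row."""

       name: str
       priority: int  # lower number = higher priority
       active: bool   # False while the location is disabled


   def select_buffer(locations: Sequence[BufferLocation]) -> Optional[BufferLocation]:
       """Return the active location with the lowest priority number, or None."""
       candidates = [loc for loc in locations if loc.active]
       return min(candidates, key=lambda loc: loc.priority, default=None)


   # Both buffers are active: the one with priority 0 is chosen.
   buffers = [
       BufferLocation("cologne_buffer_2", priority=1, active=True),
       BufferLocation("cologne_buffer_1", priority=0, active=True),
   ]
   primary = select_buffer(buffers)

   # The primary buffer is disabled for maintenance: routing falls back.
   degraded = [
       BufferLocation("cologne_buffer_1", priority=0, active=False),
       BufferLocation("cologne_buffer_2", priority=1, active=True),
   ]
   fallback = select_buffer(degraded)

Because the choice depends only on the two fields, switching ``active`` on a location changes routing without any code change.
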
**Example**:

* cologne_buffer_1 (priority 0, active=True) - Primary buffer
* cologne_buffer_2 (priority 1, active=True) - Secondary buffer

The system uses:

* **Priority** (lower number = higher priority): Determines which location to use first
* **Active** flag: Allows temporarily disabling locations for maintenance

Example Locations
-----------------

.. list-table:: Example Data Locations
   :header-rows: 1
   :widths: 20 20 20 20 20

   * - Site
     - Name
     - LocationType
     - StorageType
     - Path/Details
   * - CCAT
     - fyst_source
     - SOURCE
     - DISK
     - telescope.ccat.cl:/data/fyst
   * - Cologne
     - cologne_buffer_1
     - BUFFER
     - DISK
     - buffer.data.uni-koeln.de:/mnt/buffer
   * - Cologne
     - cologne_lta
     - LONG_TERM_ARCHIVE
     - S3
     - bucket: ccat-archive
   * - Cologne
     - ramses_processing
     - PROCESSING
     - DISK
     - ramses.cluster:/scratch/ccat

Why This Structure?
-------------------

**Polymorphic Design**

Allows different storage backends without changing core logic. The same code can work with disk, S3, or tape storage.

**Site Grouping**

Enables geographic routing and replication strategies. Data can be replicated across multiple sites for redundancy.

**Location Type vs Storage Type**

* ``location_type`` captures functional role (where in the workflow)
* ``storage_type`` captures technical implementation
* Separation allows flexibility: A BUFFER location could be DISK or S3 depending on site infrastructure

**Active/Priority Fields**

Enable dynamic routing and failover without code changes. Locations can be disabled for maintenance or prioritized based on capacity.

Integration with Physical Copies
---------------------------------

Each :py:class:`~ccat_ops_db.models.PhysicalCopy` references a :py:class:`~ccat_ops_db.models.DataLocation`.
The ``full_path`` property combines:

* For :py:class:`~ccat_ops_db.models.DiskDataLocation`: ``DataLocation.path + file.relative_path``
* For :py:class:`~ccat_ops_db.models.S3DataLocation`: ``DataLocation.bucket_name + file.relative_path`` (S3 key)
* For :py:class:`~ccat_ops_db.models.TapeDataLocation`: ``DataLocation.mount_path + file.relative_path``

Geographic Distribution
-----------------------

Storage locations currently span multiple sites:

* **CCAT Observatory (Chile)** - SOURCE and BUFFER locations at telescope
* **University of Cologne (Germany)** - BUFFER, LONG_TERM_ARCHIVE, and PROCESSING (RAMSES)
* **Cornell University (USA)** - Future archive site

**Future Expansion**: The architecture supports additional sites and multi-tiered transfer routing (e.g., Chile → Cologne → Cornell).

Related Documentation
---------------------

* Complete API reference: :doc:`../api_reference/models`
* Transfer model: :doc:`transfer_model`
* Data model: :doc:`data_model`
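
As a worked illustration of the ``full_path`` composition described under Integration with Physical Copies, the sketch below joins the per-backend base attribute with a file's relative path. It is an assumption-laden example, not the ``ccat_ops_db`` implementation: the ``full_path`` function, the plain-dict locations, the sample file name, and the use of a single ``/`` separator are all hypothetical.

.. code-block:: python

   import posixpath

   # Which attribute supplies the base of the path for each storage type,
   # following the list in "Integration with Physical Copies".
   BASE_ATTRIBUTE = {
       "DISK": "path",
       "S3": "bucket_name",
       "TAPE": "mount_path",
   }


   def full_path(storage_type: str, location: dict, relative_path: str) -> str:
       """Join the location's base attribute with the file's relative path.

       Assumption: components are joined with exactly one "/" separator.
       """
       base = location[BASE_ATTRIBUTE[storage_type]]
       return posixpath.join(base, relative_path.lstrip("/"))


   # Hypothetical locations and file name, for illustration only.
   relative = "obs/scan_0001.fits"
   disk_path = full_path("DISK", {"path": "/mnt/buffer"}, relative)
   s3_key = full_path("S3", {"bucket_name": "ccat-archive"}, relative)

Here ``disk_path`` is a filesystem path under the buffer mount, while ``s3_key`` is the bucket name followed by the object key.
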