Data Model
==========

.. verified:: 2025-11-25
   :reviewer: Christof Buchbender

ops-db doesn't store actual data files - it tracks metadata about where files exist.
This is crucial for distributed data management across multiple sites and storage types.

File Hierarchy
--------------

Data files are organized in a hierarchy from individual files to transfer bundles:

.. mermaid::

   graph TB
       RDF[RawDataFile]
       RDP[RawDataPackage]
       DTP[DataTransferPackage]

       RDF -->|bundled into| RDP
       RDP -->|bundled into| DTP

       style RDF fill:#e1f5ff
       style RDP fill:#fff4e1
       style DTP fill:#ffe1f5

RawDataFile
-----------

:py:class:`~ccat_ops_db.models.RawDataFile` represents an individual data file produced by an
instrument module.

It uses a UUID because files are created by telescope systems that may be offline. See also
:doc:`/ops-db-api/docs/index`.

For complete attribute details, see :py:class:`~ccat_ops_db.models.RawDataFile`.

RawDataPackage
--------------

:py:class:`~ccat_ops_db.models.RawDataPackage` is a bundled collection of related
:py:class:`~ccat_ops_db.models.RawDataFile` objects packaged as a tar archive. The
RawDataPackage groups files per `ExecutedObsUnit` and `InstrumentModule`.

Thousands of small files are inefficient for archiving. Packaging in this way consolidates
them into manageable units of closely related files that will be processed together.
The packaging preserves directory structure and metadata, so that unpacking the data restores
the original directory structure.

**State Meanings**:

* **WAITING** (yellow hourglass in UI): Only exists in primary location
* **TRANSFERRING** (blue circle): Part of an active DataTransferPackage
* **ARCHIVED** (green checkmark): Successfully stored in long-term archive
* **FAILED** (red cross): Transfer or archive failed

For complete attribute details, see :py:class:`~ccat_ops_db.models.RawDataPackage`.

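The grouping and packaging described above can be sketched with the standard library alone.
This is a minimal illustration, not the ops-db or data-transfer implementation: the
``FileRecord`` dataclass, its field names, and the ``group_files`` and ``build_package``
helpers are hypothetical stand-ins for the real models. The sketch groups file records by
executed observation unit and instrument module, then writes one tar archive per group with
paths stored relative to a common root, so that unpacking restores the original directory layout.

.. code-block:: python

   import tarfile
   from collections import defaultdict
   from dataclasses import dataclass
   from pathlib import Path


   @dataclass(frozen=True)
   class FileRecord:
       """Hypothetical stand-in for a RawDataFile row (field names are assumptions)."""

       relative_path: str
       executed_obs_unit_id: int
       instrument_module_id: int


   def group_files(records):
       """Group records by (executed_obs_unit_id, instrument_module_id)."""
       groups = defaultdict(list)
       for record in records:
           key = (record.executed_obs_unit_id, record.instrument_module_id)
           groups[key].append(record)
       return dict(groups)


   def build_package(root, records, archive_path):
       """Write the records' files into a tar archive, keeping paths relative to root."""
       root = Path(root)
       with tarfile.open(str(archive_path), "w") as archive:
           for record in sorted(records, key=lambda r: r.relative_path):
               # arcname stores the path relative to the data root, so extraction
               # recreates the same directory structure at the destination.
               archive.add(str(root / record.relative_path), arcname=record.relative_path)
       return Path(archive_path)

Calling ``group_files`` on all records of an observing session and then ``build_package``
once per group would yield one archive per observation unit and instrument module pair, under
the assumptions stated above.
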
RawDataPackageMetadata
----------------------

:py:class:`~ccat_ops_db.models.RawDataPackageMetadata` stores additional metadata for
IVOA-compatible metadata generation.

Keeps the core :py:class:`~ccat_ops_db.models.RawDataPackage` model clean while allowing
flexible metadata storage.

For complete attribute details, see :py:class:`~ccat_ops_db.models.RawDataPackageMetadata`.

DataTransferPackage
-------------------

:py:class:`~ccat_ops_db.models.DataTransferPackage` bundles multiple
:py:class:`~ccat_ops_db.models.RawDataPackage` objects for efficient network transfer.

Optimizes network transfer efficiency - many packages → fewer transfer operations. For long
distance transfers, optimal package sizes exist in the range of 10-50TB.

One :py:class:`~ccat_ops_db.models.DataTransferPackage` can have multiple
:py:class:`~ccat_ops_db.models.DataTransfer` records (same bundle to multiple destinations).

For complete attribute details, see :py:class:`~ccat_ops_db.models.DataTransferPackage`.

Physical Copy Tracking
----------------------

The Physical Copy System
^^^^^^^^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.PhysicalCopy` tracks where each file/package physically exists
across all storage locations.

Data can exist in multiple places simultaneously (buffer, archive, staging area), enabling
safe deletion and staged unpacking.

It is polymorphic with subclasses:
:py:class:`~ccat_ops_db.models.RawDataFilePhysicalCopy`,
:py:class:`~ccat_ops_db.models.RawDataPackagePhysicalCopy`, and
:py:class:`~ccat_ops_db.models.DataTransferPackagePhysicalCopy`.

Each subclass has a ``full_path`` property that constructs the actual filesystem/S3 path.

For complete attribute details, see :py:class:`~ccat_ops_db.models.PhysicalCopy` and its
subclasses.

PhysicalCopyStatus Enum
^^^^^^^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.PhysicalCopyStatus` tracks the lifecycle state of a physical copy:

.. list-table:: PhysicalCopyStatus Values
   :header-rows: 1
   :widths: 30 70

   * - Status
     - Meaning
   * - PRESENT
     - File exists and is available
   * - STAGED
     - Package unpacked, original archive removed to save space
   * - DELETION_POSSIBLE
     - Eligible for cleanup (exists in other locations)
   * - DELETION_PENDING
     - Scheduled for removal
   * - DELETION_SCHEDULED
     - Cleanup task queued
   * - DELETION_IN_PROGRESS
     - Currently being deleted
   * - DELETION_FAILED
     - Deletion attempt failed
   * - DELETED
     - Successfully removed

For complete enum details, see :py:class:`~ccat_ops_db.models.PhysicalCopyStatus`.

Physical Copy Relationships
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. mermaid::

   graph TB
       RDP[RawDataPackage]
       PC1[PhysicalCopy<br/>at Location 1]
       PC2[PhysicalCopy<br/>at Location 2]
       PC3[PhysicalCopy<br/>at Location 3]
       DL1[DataLocation 1<br/>Chile Buffer]
       DL2[DataLocation 2<br/>Cologne Archive]
       DL3[DataLocation 3<br/>Processing]

       RDP -->|has| PC1
       RDP -->|has| PC2
       RDP -->|has| PC3
       PC1 -->|at| DL1
       PC2 -->|at| DL2
       PC3 -->|at| DL3

       style RDP fill:#e1f5ff
       style PC1 fill:#fff4e1
       style PC2 fill:#fff4e1
       style PC3 fill:#fff4e1

**Example**: A :py:class:`~ccat_ops_db.models.RawDataPackage` might have 3 physical copies:

* One PRESENT at Chile buffer
* One PRESENT at Cologne archive
* One STAGED at processing location (unpacked, archive removed)

Status and State Management
---------------------------

Status Enum
^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.Status` is used for operations (transfer, archive, staging):

.. list-table:: Status Values
   :header-rows: 1
   :widths: 30 70

   * - Status
     - Meaning
   * - PENDING
     - Queued but not started
   * - SCHEDULED
     - Assigned to worker
   * - IN_PROGRESS
     - Currently executing
   * - COMPLETED
     - Finished successfully
   * - FAILED
     - Failed and won't retry

For complete enum details, see :py:class:`~ccat_ops_db.models.Status`.

PackageState Enum
^^^^^^^^^^^^^^^^^

:py:class:`~ccat_ops_db.models.PackageState` is used for data lifecycle:

.. list-table:: PackageState Values
   :header-rows: 1
   :widths: 30 70

   * - State
     - Meaning
   * - WAITING
     - Only in primary location
   * - TRANSFERRING
     - Being transferred
   * - ARCHIVED
     - In long-term archive
   * - FAILED
     - Operation failed

For complete enum details, see :py:class:`~ccat_ops_db.models.PackageState`.

Why This Structure?
-------------------

**Separation of File and Package Levels**

Allows tracking at two granularities:

* File-level: For detailed provenance and access
* Package-level: For efficient transfer and storage management

**Physical Copy Tracking**

Enables:

* Knowing exactly where each copy exists
* Safe deletion (only delete if copies exist elsewhere)
* Staged unpacking (remove archive after extraction to save space)

**Status/State Separation**

Allows:

* Tracking operational progress (status)
* Tracking data lifecycle state (state)
* Retry logic and failure handling

Relationship to Data Transfer
-----------------------------

The :doc:`/data-transfer/docs/index` package orchestrates moving data based on these records.
ops-db just tracks metadata - data-transfer does the actual file operations.

For detailed data flow documentation, see the :doc:`/data-transfer/docs/index` documentation.

Related Documentation
---------------------

* Complete API reference: :doc:`../api_reference/models`
* Location model: :doc:`location_model`
* Transfer model: :doc:`transfer_model`
* Data transfer workflows: :doc:`/data-transfer/docs/index`