DAS Data Standard & Pipeline
Overview
The GDR's distributed acoustic sensing (DAS) data pipeline automatically converts DAS data from non-standardized SEG-Y formats into a standardized HDF5 format, based on PRODML and the IRIS DAS RCN's DAS metadata standard. This pipeline works both for DAS data uploaded directly to the GDR submission form (not recommended), and for DAS data added to the GDR Data Lake.
Tips to Ensure your Data are Standardized
To ensure your DAS data are standardized, follow these tips:
- Do not put your DAS data into zipped directories.
- While your data will still be standardized if you upload them directly to a GDR submission form, the process is much less efficient - both for you and for us! As an alternative, consider adding your DAS data to the GDR Data Lake. For more information, contact GDR Help.
- Ensure that your DAS data are in a compatible format:
  - SEG-Y format
- Provide as much metadata as possible, either in the SEG-Y file headers, in a separate README file, or by using our DAS metadata template, which is based on the DAS RCN Metadata Standard (see link under Helpful Resources below).
- Upload a channel map with your data that includes individual DAS channel coordinates and elevation.
If you think your dataset should have been standardized but was not, please contact GDR Help. We are happy to assist and are constantly looking to improve our data standards and pipelines.
Helpful Resources
Here are some helpful resources related to the GDR's DAS data standard and pipeline:
- GRC Paper on DAS data pipeline - or request a copy from GDR Help.
- PRODML Standards
- DAS RCN Metadata Standard
- HDF5 File Format
How Does it Work?
When DAS data in SEG-Y format are added to the GDR Data Lake, they are recognized as DAS data and their location is passed to our DAS data pipeline, which creates a new directory in the data lake for the standardized version of the DAS dataset. Before translation begins, our curators may reach out to request additional metadata so that the dataset adheres more closely to the DAS RCN's metadata standard. Note that the GDR stores most of the metadata within the HDF5 files, rather than following the DAS RCN's suggestion to store all metadata in a separate JSON file. This is so that most of the metadata stays with the data.
Once any additional metadata is acquired, it is passed to the DAS data pipeline along with the non-standardized SEG-Y DAS data files. Metadata is extracted from the headers in the SEG-Y files using direct and fuzzy field matching, as shown in the image to the right. This metadata is stored in the appropriate attributes of the standardized HDF5 file's groups and datasets.
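As a rough illustration of direct and fuzzy field matching, the sketch below tries an exact match on a header label first and then falls back to a similarity match. It is not the GDR pipeline's code: the alias lists, the `label: value` line layout, the 0.8 cutoff, and the function names `match_label` and `extract_metadata` are all assumptions made for this example, and Python's `difflib` stands in for whatever matching method the pipeline actually uses.

```python
import difflib

# Hypothetical aliases for two attributes named in the GDR DAS Data Standard.
# The alias lists are illustrative assumptions, not the pipeline's own.
KNOWN_LABELS = {
    "InterrogatorManufacturer": ["manufacturer", "interrogator manufacturer", "vendor"],
    "InterrogatorModel": ["model", "interrogator model"],
}


def match_label(label, cutoff=0.8):
    """Return the standardized attribute name for a header label, or None."""
    cleaned = label.strip().lower()
    alias_to_attribute = {
        alias: attribute
        for attribute, aliases in KNOWN_LABELS.items()
        for alias in aliases
    }
    # Direct match: the cleaned label is exactly a known alias.
    if cleaned in alias_to_attribute:
        return alias_to_attribute[cleaned]
    # Fuzzy match: take the most similar alias at or above the cutoff.
    close = difflib.get_close_matches(
        cleaned, list(alias_to_attribute), n=1, cutoff=cutoff
    )
    return alias_to_attribute[close[0]] if close else None


def extract_metadata(text_header_lines):
    """Collect values from 'label: value' lines whose label matches an attribute."""
    metadata = {}
    for line in text_header_lines:
        if ":" not in line:
            continue
        label, value = line.split(":", 1)
        attribute = match_label(label)
        if attribute is not None:
            metadata[attribute] = value.strip()
    return metadata


# Made-up header lines: one misspelled label, one exact label, one unrelated.
header = ["Manufacterer: ExampleCo", "Interrogator Model: X1", "Comment: none"]
metadata = extract_metadata(header)
```

Here the misspelled label `Manufacterer` is still assigned to `InterrogatorManufacturer` through the fuzzy fallback, while the unrelated `Comment` line is ignored.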
Data is extracted from the traces in each SEG-Y file and mapped to the associated channel (or column) in an array in the standardized HDF5 file.
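The trace-to-channel mapping can be sketched in a few lines, assuming each SEG-Y trace has already been read into a list of samples. The function name `traces_to_array` and the sample values are hypothetical, and plain Python lists stand in for the array types a real pipeline would likely use.

```python
def traces_to_array(traces):
    """Arrange per-channel traces as rows of timesteps and columns of channels."""
    if not traces:
        return []
    n_samples = len(traces[0])
    if any(len(trace) != n_samples for trace in traces):
        raise ValueError("all traces must have the same number of samples")
    # zip(*traces) transposes: row i holds sample i from every channel.
    return [list(row) for row in zip(*traces)]


# Made-up values: two traces (channels) with three samples (timesteps) each.
traces = [
    [0.1, 0.2, 0.3],  # trace 0 becomes column 0
    [1.1, 1.2, 1.3],  # trace 1 becomes column 1
]
raw_data_array = traces_to_array(traces)
number_of_channels = len(traces)  # counting traces gives the channel count
```

The result has m = 3 rows (timesteps) and n = 2 columns (channels), matching the m x n layout of the raw data array.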
The resulting HDF5 format is structured like the image below, with one major group for DasRawData, one for DasMetadata, and an optional additional group for DasProcessedData.
The GDR DAS Data Standard
GDR Standardized HDF5 Object Name & Description | HDF5 Object Type | SEG-Y Field Location | SEG-Y Field Label (if applicable) | Extraction Method | Required |
---|---|---|---|---|---|
DasRawData: Group of information and data associated with raw DAS data. | Group | Traces | NA | Forms array by assuming each trace is a DAS channel. | |
DasMetadata: Group of DAS metadata attributes. Format should follow that of the DAS RCN's metadata standard. | Group | Binary & Text Headers | NA | Fuzzy and exact keyword matching. | |
DasMetadata/OverviewStatement: High-level overview information for the DAS deployment. | Attribute | Text Header | NA | Fuzzy and exact keyword matching. | |
DasMetadata/Interrogator: Group of DAS interrogator metadata. Should include a unique identifier for each interrogator if more than one is used. Includes attributes for InterrogatorManufacturer and InterrogatorModel. | Group | Text Header | NA | Fuzzy matching for common manufacturers/models in the text header. | |
DasMetadata/Interrogator/Acquisition/SampleRate: Sampling rate used for acquisition. Should be in the units specified by Acquisition/SampleRateUnit. | Attribute | Binary Header | Interval | Computes rate in Hz from sample interval which is in microseconds for SEG-Y files. | |
DasMetadata/Interrogator/Acquisition/NumberOfChannels: Number of channels used in DAS survey. | Attribute | Traces | NA | Counts number of “traces” in SEG-Y file. | |
DasRawData/RawDataArray: An array of size m x n, where m is the number of timesteps and n is the number of channels, containing the raw DAS data. | Dataset | Traces | NA | Forms array by assuming each trace is a DAS channel. | |
DasRawData/DasTimeArray: 1D array of timestamps for each sample in microseconds since Unix epoch (1970-01-01). | Group | Text Header | NA | Time array is built using the first UTC timestamp, adding sampling interval to successive timesteps. If timestamp not found or provided by data owner, first timestamp is set to zero. | |
DasMetadata/Interrogator/Acquisition/SampleRateUnit: Units of sampling rate. Should be Hz for PRODML. | Attribute | Binary Header | Interval | Standard in SEG-Y is microseconds. Convert to Hz. | |
DasMetadata/Interrogator/Acquisition: Information common to all channels. | Group | Traces, Binary & Text Headers | NA | Fuzzy and exact keyword matching. | |
DasMetadata/Interrogator/Acquisition/AcquisitionStartTime: Start UTC timestamp of DAS data acquisition. | Attribute | Text Header | NA | Use first time from DasTimeArray. | |
DasMetadata/Interrogator/Acquisition/AcquisitionEndTime: End UTC timestamp of DAS data acquisition. | Attribute | Text Header | NA | Use last time from DasTimeArray. | |
DasMetadata/Interrogator/Acquisition/StartDate: Start date of DAS data acquisition in UTC. | Attribute | Text Header | NA | Use first time from DasTimeArray. | |
DasMetadata/Interrogator/Acquisition/EndDate: End date of DAS data acquisition in UTC. | Attribute | Text Header | NA | Use last time from DasTimeArray. | |
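The SampleRate and DasTimeArray extraction rules in the table above amount to simple arithmetic, sketched below. The function names `sample_rate_hz` and `build_time_array` are hypothetical, and the sketch assumes the sample interval has already been read from the SEG-Y binary header as an integer number of microseconds and that any start timestamp is a timezone-aware UTC `datetime`.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sample_rate_hz(sample_interval_us):
    """Convert a sample interval in microseconds to a sampling rate in Hz."""
    if sample_interval_us <= 0:
        raise ValueError("sample interval must be positive")
    return 1_000_000 / sample_interval_us


def build_time_array(n_samples, sample_interval_us, start_time=None):
    """Return timestamps in microseconds since the Unix epoch.

    With no start timestamp available, the first timestamp is set to zero.
    """
    if start_time is None:
        start_us = 0
    else:
        delta = start_time - EPOCH
        # Integer arithmetic avoids floating-point rounding of microseconds.
        start_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return [start_us + i * sample_interval_us for i in range(n_samples)]


# Made-up values: a 500 microsecond interval corresponds to 2000 Hz.
rate = sample_rate_hz(500)
times = build_time_array(
    3, 500, start_time=datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
)
acquisition_start_us, acquisition_end_us = times[0], times[-1]
```

The first and last entries of the time array supply the acquisition start and end times, mirroring the "use first time" and "use last time" rules in the table.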