DAS Data Standard & Pipeline

Overview

The GDR's distributed acoustic sensing (DAS) data pipeline automatically converts DAS data from non-standardized SEG-Y formats into a standardized HDF5 format based on PRODML and the IRIS DAS RCN's DAS metadata standard. The pipeline works both for DAS data uploaded directly to the GDR submission form (not recommended) and for DAS data added to the GDR Data Lake.

Tips to Ensure your Data are Standardized

To ensure your DAS data are standardized, follow these tips:

  • Do not put your DAS data into zipped directories.
  • Data uploaded directly to a GDR submission form will still be standardized, but that route is much less efficient, both for you and for us. Consider adding your DAS data to the GDR Data Lake instead. For more information, contact GDR Help.
  • Ensure that your DAS data are in a compatible format:
    • SEG-Y format
  • Provide as much metadata as possible, either in the SEG-Y file headers, in a separate README file, or by using our DAS metadata template, which is based on the DAS RCN Metadata Standard (see link under Helpful Resources below).
  • Upload a channel map with your data that includes individual DAS channel coordinates and elevation.

If you think your dataset should have been standardized but was not, please contact GDR Help. We are happy to assist and are constantly looking to improve our data standards and pipelines.

Helpful Resources

Here are some helpful resources related to the GDR's DAS data standard and pipeline:

How Does it Work?

When DAS data in SEG-Y format is added to the GDR Data Lake, it is recognized as DAS data and its location is passed to our DAS data pipeline, which creates a new directory in the data lake for the standardized version of the DAS dataset. Before translation begins, our curators may reach out to request additional metadata so that the result adheres more closely to the DAS RCN's metadata standard. Note that the GDR stores most of the metadata within the HDF5 files, rather than following the DAS RCN's suggestion of storing all metadata in a separate JSON file, so that most of the metadata stays with the data.

Once any additional metadata is acquired, it is passed to the DAS data pipeline along with the non-standardized SEG-Y DAS data files. Metadata is extracted from the headers in the SEG-Y files using direct and fuzzy field matching, as shown in the image to the right. This metadata is stored in the appropriate attributes of the standardized HDF5 file's groups and datasets.
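The direct-then-fuzzy matching step could look roughly like the sketch below. It is not the GDR pipeline's actual code: the keyword table, the `KEY: value` layout assumed for text-header lines, the 0.8 similarity cutoff, and the function name `match_header_fields` are all hypothetical, chosen only to illustrate the idea with Python's standard `difflib` module.

```python
import difflib

# Hypothetical lookup from header keywords to standardized attribute names.
KNOWN_FIELDS = {
    "interrogator manufacturer": "InterrogatorManufacturer",
    "interrogator model": "InterrogatorModel",
    "overview statement": "OverviewStatement",
}


def match_header_fields(header_lines, cutoff=0.8):
    """Map 'KEY: value' header lines to standardized attribute names.

    An exact (direct) keyword match is tried first; if that fails, the
    closest known keyword above the similarity cutoff is used instead.
    """
    matched = {}
    for line in header_lines:
        if ":" not in line:
            continue  # assumed layout: only 'KEY: value' lines carry metadata
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()

        if key in KNOWN_FIELDS:  # direct match
            matched[KNOWN_FIELDS[key]] = value
            continue

        # Fuzzy match: tolerate misspellings and small wording differences.
        close = difflib.get_close_matches(key, list(KNOWN_FIELDS), n=1, cutoff=cutoff)
        if close:
            # Never overwrite a value that was already matched.
            matched.setdefault(KNOWN_FIELDS[close[0]], value)
    return matched
```

A lower cutoff catches more misspellings but raises the risk of assigning a value to the wrong attribute, which is one reason supplying metadata explicitly remains preferable.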

Data is extracted from the traces in each SEG-Y file and mapped to the associated channel (or column) in an array in the standardized HDF5 file.
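As a minimal illustration of this mapping, the sketch below arranges a list of traces (one per DAS channel) into rows of timesteps and columns of channels, matching the m x n layout given for RawDataArray in the standard. The function name `traces_to_array` and the use of plain Python lists are assumptions made for the example; a real pipeline would more likely use an array library.

```python
def traces_to_array(traces):
    """Arrange per-channel traces into an m x n list of lists.

    traces: a list of n traces, each holding m samples (one trace per channel).
    Returns m rows (timesteps), each with n values (channels).
    """
    if not traces:
        return []
    n_samples = len(traces[0])
    if any(len(trace) != n_samples for trace in traces):
        raise ValueError("all traces must have the same number of samples")
    # zip(*traces) transposes: row i holds sample i from every channel.
    return [list(row) for row in zip(*traces)]


# Two channels with three samples each become three timesteps by two channels.
array = traces_to_array([[1, 2, 3], [10, 20, 30]])
```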

The resulting HDF5 format is structured like the image below, with one major group for DasRawData, one for DasMetadata, and an optional additional group for DasProcessedData.

The GDR DAS Data Standard

| GDR Standardized HDF5 Object Name & Description | HDF5 Object Type | SEG-Y Field Location | SEG-Y Field Label (if applicable) | Extraction Method | Required |
|---|---|---|---|---|---|
| DasRawData: Group of information and data associated with raw DAS data. | Group | Traces | NA | Forms array by assuming each trace is a DAS channel. | |
| DasMetadata: Group of DAS metadata attributes. Format should follow that of the DAS RCN's metadata standard. | Group | Binary & Text Headers | NA | Fuzzy and exact keyword matching. | |
| DasMetadata/OverviewStatement: High-level overview information for the DAS deployment. | Attribute | Text Header | NA | Fuzzy and exact keyword matching. | |
| DasMetadata/Interrogator: Group of DAS interrogator metadata. Should include a unique identifier for each interrogator if more than one is used. Includes attributes for InterrogatorManufacturer and InterrogatorModel. | Group | Text Header | NA | Fuzzy matching for common manufacturers/models in the text header. | |
| DasMetadata/Interrogator/Acquisition/SampleRate: Sampling rate used for acquisition. Should be in the units specified by Acquisition/SampleRateUnit. | Attribute | Binary Header | Interval | Computes rate in Hz from sample interval which is in microseconds for SEG-Y files. | |
| DasMetadata/Interrogator/Acquisition/NumberOfChannels: Number of channels used in DAS survey. | Attribute | Traces | NA | Counts number of “traces” in SEG-Y file. | |
| DasRawData/RawDataArray: An array of size m x n, where m is the number of timesteps and n is the number of channels, containing the raw DAS data. | Dataset | Traces | NA | Forms array by assuming each trace is a DAS channel. | |
| DasRawData/DasTimeArray: 1D array of timestamps for each sample in microseconds since Unix epoch (1970-01-01). | Group | Text Header | NA | Time array is built using the first UTC timestamp, adding sampling interval to successive timesteps. If timestamp not found or provided by data owner, first timestamp is set to zero. | |
| DasMetadata/Interrogator/Acquisition/SampleRateUnit: Units of sampling rate. Should be Hz for PRODML. | Attribute | Binary Header | Interval | Standard in SEG-Y is microseconds. Convert to Hz. | |
| DasMetadata/Interrogator/Acquisition: Information common to all channels. | Group | Traces, Binary & Text Headers | NA | Fuzzy and exact keyword matching. | |
| DasMetadata/Interrogator/Acquisition/AcquisitionStartTime: Start UTC timestamp of DAS data acquisition. | Attribute | Text Header | NA | Use first time from DasTimeArray. | |
| DasMetadata/Interrogator/Acquisition/AcquisitionEndTime: End UTC timestamp of DAS data acquisition. | Attribute | Text Header | NA | Use last time from DasTimeArray. | |
| DasMetadata/Interrogator/Acquisition/StartDate: Start date of DAS data acquisition in UTC. | Attribute | Text Header | NA | Use first time from DasTimeArray. | |
| DasMetadata/Interrogator/Acquisition/EndDate: End date of DAS data acquisition in UTC. | Attribute | Text Header | NA | Use last time from DasTimeArray. | |

* Note that the standard above is based only on metadata we have been able to find in SEG-Y files, so it is likely incomplete. We will continue to improve it as we encounter new DAS datasets with more complete metadata packages. In the meantime, data owners should aim to include as many of the metadata fields suggested by the DAS RCN metadata standard as possible to help us achieve a complete picture.
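
The extraction methods listed for SampleRate and DasTimeArray can be sketched in a few lines. This is a hedged illustration rather than the pipeline's implementation: the function names `sample_rate_hz` and `build_time_array` are hypothetical, and the start timestamp is assumed to have already been parsed into a timezone-aware UTC `datetime`.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sample_rate_hz(interval_us):
    """Convert a SEG-Y sample interval in microseconds to a rate in Hz."""
    if interval_us <= 0:
        raise ValueError("sample interval must be positive")
    return 1_000_000 / interval_us


def build_time_array(n_samples, interval_us, start_utc=None):
    """Return one timestamp per sample, in microseconds since the Unix epoch.

    start_utc must be a timezone-aware datetime. If it is None (no timestamp
    found or provided), the first timestamp is set to zero.
    """
    if start_utc is None:
        start_us = 0
    else:
        # Dividing by a one-microsecond timedelta gives an exact integer.
        start_us = (start_utc - UNIX_EPOCH) // timedelta(microseconds=1)
    # Add the sampling interval to each successive timestep.
    return [start_us + i * interval_us for i in range(n_samples)]


# A 1000-microsecond sample interval corresponds to 1000 Hz.
rate = sample_rate_hz(1000)
times = build_time_array(3, 1000)
# The acquisition start and end times are the first and last array entries.
acquisition_start, acquisition_end = times[0], times[-1]
```

When the start timestamp is missing, the resulting times are relative offsets rather than true UTC times, so any start and end attributes derived from them carry the same limitation.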