DAS Data Standard & Pipeline

Overview

The GDR's distributed acoustic sensing (DAS) data pipeline automatically converts DAS data from non-standardized SEG-Y formats into a standardized HDF5 format, based on PRODML and the IRIS DAS RCN's DAS metadata standard. This pipeline works both for DAS data uploaded directly to the GDR submission form (not recommended), and for DAS data added to the GDR Data Lake.

Tips to Ensure your Data are Standardized

To ensure your data are standardized, follow these tips:

  • Do not put your DAS data into zipped directories.
  • While your data will still be standardized if you upload them directly to a GDR submission form, the process is much less efficient - both for you and for us! As an alternative, consider adding your DAS data to the GDR Data Lake. For more information, contact GDR Help.
  • Ensure that your DAS data are in a compatible format:
    • SEG-Y format
  • Provide as much metadata as possible, either in the SEG-Y file headers, in a separate README file, or by using our DAS metadata template, which is based on the DAS RCN Metadata Standard (see link under Helpful Resources below).

If you think your data were not standardized in error, please contact GDR Help. We are happy to assist and are constantly looking to improve our data standards and pipelines.

Helpful Resources

Here are some helpful resources related to the GDR's DAS data standard and pipeline:

How Does it Work?

When DAS data in SEG-Y format is added to the GDR Data Lake, it is recognized as DAS data and its location is passed into our DAS data pipeline, which creates a new directory in the data lake for the standardized version of the DAS dataset. Before translation begins, our curators may reach out to request additional metadata. Once any additional metadata is acquired, it is passed into the DAS data pipeline along with the non-standardized SEG-Y DAS data files. Metadata is extracted from the headers in the SEG-Y files using direct and fuzzy field matching, as described by the image below. This metadata is stored in the appropriate attributes of the standardized HDF5 file's groups and datasets.
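The direct and fuzzy field matching described above can be illustrated with a minimal sketch. It assumes the text header has already been split into "label: value" lines, and it uses Python's standard difflib module for the fuzzy step. The keyword lists, the 0.8 cutoff, and the function names are hypothetical and are not taken from the GDR pipeline itself.

```python
import difflib

# Hypothetical mapping from standardized attribute names to header keywords.
# These keyword lists are illustrative assumptions, not the GDR's actual rules.
FIELD_KEYWORDS = {
    "InterrogatorManufacturer": ["manufacturer", "interrogator manufacturer"],
    "InterrogatorModel": ["model", "interrogator model"],
    "OverviewStatement": ["overview", "description"],
}


def match_field(label, cutoff=0.8):
    """Return the standardized field for a header label, or None.

    Tries a direct (case-insensitive) match first, then falls back to a
    fuzzy match based on difflib's similarity ratio.
    """
    key = label.strip().lower()
    # Direct matching: the label equals a known keyword.
    for field, keywords in FIELD_KEYWORDS.items():
        if key in keywords:
            return field
    # Fuzzy matching: pick the closest known keyword above the cutoff.
    lookup = {kw: field for field, kws in FIELD_KEYWORDS.items() for kw in kws}
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


def extract_metadata(text_header_lines):
    """Collect recognized 'label: value' lines from a text header into a dict."""
    metadata = {}
    for line in text_header_lines:
        if ":" not in line:
            continue  # not a label/value line
        label, value = line.split(":", 1)
        field = match_field(label)
        if field is not None:
            metadata[field] = value.strip()
    return metadata
```

With this sketch, a misspelled label such as "Interogator Model" still maps to InterrogatorModel, while an unrelated label such as "Client" is ignored. A stricter or looser cutoff trades missed fields against false matches.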

Data is extracted from the traces in each SEG-Y file and mapped to the associated channel (or column) in an array in the standardized HDF5 file.
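As a rough illustration of this mapping, the sketch below treats each SEG-Y trace as one channel and arranges the traces as columns, giving an array with one row per timestep and one column per channel. It uses plain Python lists so that it runs without extra dependencies; a real pipeline would more likely use an array library. The function name and sample values are hypothetical.

```python
def traces_to_array(traces):
    """Arrange per-channel traces as columns of a timestep-by-channel array.

    `traces` is a list of equal-length sample lists, one per SEG-Y trace.
    Returns a list of m rows (timesteps), each holding n channel values.
    """
    if not traces:
        return []
    n_samples = len(traces[0])
    if any(len(trace) != n_samples for trace in traces):
        raise ValueError("all traces must have the same number of samples")
    # zip(*traces) transposes: row i collects sample i from every trace.
    return [list(row) for row in zip(*traces)]


# Three hypothetical traces (channels) with four samples each.
traces = [
    [0.1, 0.2, 0.3, 0.4],
    [1.1, 1.2, 1.3, 1.4],
    [2.1, 2.2, 2.3, 2.4],
]
raw = traces_to_array(traces)
n_timesteps, n_channels = len(raw), len(raw[0])
```

Here the number of channels equals the number of traces, which is the same count the pipeline stores as NumberOfChannels.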

The resulting HDF5 format is structured like the image below, with one major group for DasRawData, one for DasMetadata, and an optional additional group for DasProcessedData.

The GDR DAS Data Standard

| Standardized HDF5 Object Name & Description | HDF5 Object Type | SEG-Y Field Location | SEG-Y Field Label (if applicable) | Extraction Method | Required |
| --- | --- | --- | --- | --- | --- |
| DasRawData: Group of information and data associated with raw DAS data. | Group | Traces | NA | Forms array by assuming each trace is a DAS channel. | |
| DasMetadata: Group of DAS metadata attributes. Format should follow that of the DAS RCN's metadata standard. | Group | Binary & Text Headers | NA | Fuzzy and exact keyword matching. | |
| DasMetadata/OverviewStatement: High-level overview information for the DAS deployment. | Attribute | Text Header | NA | Fuzzy and exact keyword matching. | |
| DasMetadata/Interrogator: Group of DAS interrogator metadata. Should include a unique identifier for each interrogator if more than one is used. Includes attributes for InterrogatorManufacturer and InterrogatorModel. | Group | Text Header | NA | Fuzzy matching for common manufacturers/models in the text header. | |
| DasMetadata/Interrogator/Acquisition/SampleRate: Sampling rate used for acquisition. Should be in the units specified by Acquisition/SampleRateUnit. | Attribute | Binary Header | Interval | Computes the rate in Hz from the sample interval, which is in microseconds for SEG-Y files. | |
| DasMetadata/Interrogator/Acquisition/NumberOfChannels: Number of channels used in DAS survey. | Attribute | Traces | NA | Counts number of “traces” in SEG-Y file. | |
| DasMetadata/Interrogator/Acquisition/ChannelGroup: Information common to all channels. | Group | Text Header | CoordinateUnits | Looks for UTM zone in text header and determines units from SEG-Y trace header 'CoordinateUnits' field. | |
| DasMetadata/Interrogator/Acquisition/ChannelGroup/ChannelMetadata: Properties unique to individual channel, including ElevationAboveSeaLevel, X-Coordinate, and Y-Coordinate. | Dataset | Trace Headers | ElevationScalar, CoordinateUnits, GroupX, GroupY | Exact matching and pairing to each channel. | |
| DasRawData/RawDataArray: An array of size m x n, where m is the number of timesteps and n is the number of channels, containing the raw DAS data. | Dataset | Traces | NA | Forms array by assuming each trace is a DAS channel. | |
| DasRawData/DasTimeArray: An array of size m x n, where m is the number of timesteps and n is the number of channels, containing subgroups for processed DAS data (e.g., DasFbeData, DasSpectraData, etc.). | Group | Text Header | NA | Time array is built using the first UTC timestamp, adding sampling interval to successive timesteps. If timestamp not found, first timestamp is set to zero. | |
| DasMetadata/Interrogator/Acquisition/SampleRateUnit: Units of sampling rate. Should be Hz for PRODML. | Attribute | Binary Header | Interval | Standard in SEG-Y is microseconds. Convert to Hz. | |
| DasMetadata/Interrogator/Acquisition: Information common to all channels. | Group | Traces, Binary & Text Headers | NA | Fuzzy and exact keyword matching. | |
| DasMetadata/Interrogator/Acquisition/AcquisitionStartTime: Start UTC timestamp of DAS data acquisition. | Attribute | Text Header | NA | Use first time from DasTimeArray. | |
| DasMetadata/Interrogator/Acquisition/AcquisitionEndTime: End UTC timestamp of DAS data acquisition. | Attribute | Text Header | NA | Use last time from DasTimeArray. | |
| DasMetadata/Interrogator/Acquisition/StartDate: Start date of DAS data acquisition in UTC. | Attribute | Text Header | NA | Use first time from DasTimeArray. | |
| DasMetadata/Interrogator/Acquisition/EndDate: End date of DAS data acquisition in UTC. | Attribute | Text Header | NA | Use last time from DasTimeArray. | |
* Note that the standard above is built only from metadata we have been able to find in SEG-Y files, so it may be incomplete. We will continue to improve it as we encounter new DAS datasets with more complete metadata packages. In the meantime, data owners should aim to include as many of the metadata fields suggested by the DAS RCN metadata standard as possible to help us achieve a complete picture.
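The SampleRate and DasTimeArray extraction methods listed in the standard above can be sketched as follows. This is a minimal illustration, assuming the sample interval has already been read from the SEG-Y binary header as a number of microseconds. The function names are hypothetical, and returning elapsed seconds when no start timestamp is found is only one possible reading of "first timestamp is set to zero".

```python
from datetime import datetime, timedelta, timezone


def sample_rate_hz(interval_us):
    """Convert a SEG-Y sample interval in microseconds to a rate in Hz."""
    if interval_us <= 0:
        raise ValueError("sample interval must be positive")
    return 1_000_000 / interval_us


def build_time_array(n_samples, interval_us, first_timestamp=None):
    """Return one time value per timestep.

    With a UTC start time, the values are datetimes. Without one, they are
    elapsed seconds starting from zero (an assumption of this sketch).
    """
    if first_timestamp is None:
        return [i * interval_us / 1_000_000 for i in range(n_samples)]
    return [
        first_timestamp + timedelta(microseconds=i * interval_us)
        for i in range(n_samples)
    ]


# Hypothetical acquisition: 1000-microsecond interval, three samples.
start = datetime(2022, 1, 1, tzinfo=timezone.utc)
times = build_time_array(3, 1000, start)
acquisition_start, acquisition_end = times[0], times[-1]
```

In this sketch, AcquisitionStartTime and AcquisitionEndTime correspond to the first and last entries of the time array, matching the "use first time" and "use last time" extraction methods in the standard.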