Data Standards

What are Data Standards?

One component of high-quality data is reusability, which standardization can enhance. Data standards create consistency in the formatting and contents of like datasets, which lessens preprocessing requirements and ensures that a given dataset provides adequate information.

Why Standardize?

High-quality data is key to producing high-quality machine learning results and to applying machine learning to real-world problems. Machine learning is frequently exploratory in nature, meaning that data curation is often an iterative process throughout the life of a project. Modular data pipelines and standardized processes and practices help streamline this process, supporting the move to data-centric machine learning workflows, which often produce outcomes that are more applicable to real-world problems.

Any step that lessens data curation requirements therefore helps improve geothermal machine learning and data science outcomes. Data standardization puts similar datasets into a standard format, which lessens the time researchers spend reformatting and combining datasets. This reduces the time required for adequate data curation, lowering the overall cost of machine learning projects and leaving more time for exploring different machine learning experiments and properly interpreting results. It also allows users to incorporate many datasets into machine learning projects efficiently, and possibly automatically, rather than focusing on just one or a few manually parsed datasets.

Automated Data Pipelines

Data pipelines have been implemented for select high-value datasets to automate the standardization process. The GDR's data pipelines automatically recognize certain types of datasets and convert them into a standardized format while preserving the original data file. This shift takes the burden of data standardization off users and project teams, allowing more project resources to go toward research and development activities, and increases the amount of standardized geothermal data available through the GDR. Each data pipeline will be accompanied by a data standard and a set of recommendations for its data type, to guide data collection toward maximum usability in future research.
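
As a rough illustration of the recognize-then-convert idea described above, here is a minimal Python sketch. It is not the GDR's actual pipeline: the vendor labels, column names, unit factors, and the functions detect_type and standardize are all hypothetical, chosen only to show how a file might be recognized by its header and mapped to a common schema while the original content is left untouched.

```python
import csv
import io

# Hypothetical header signatures; real vendor exports and the GDR standard differ.
SIGNATURES = {
    "vendor_a": {"Hole Depth (ft)", "ROP (ft/hr)"},
    "vendor_b": {"depth_m", "rop_m_per_hr"},
}

# Hypothetical mapping: source column -> (standard column, unit conversion factor).
COLUMN_MAPS = {
    "vendor_a": {
        "Hole Depth (ft)": ("depth_m", 0.3048),
        "ROP (ft/hr)": ("rop_m_per_hr", 0.3048),
    },
    "vendor_b": {
        "depth_m": ("depth_m", 1.0),
        "rop_m_per_hr": ("rop_m_per_hr", 1.0),
    },
}


def detect_type(header):
    """Return the first known dataset type whose signature columns all appear in the header."""
    columns = set(header)
    for name, signature in SIGNATURES.items():
        if signature <= columns:
            return name
    return None


def standardize(raw_text):
    """Convert recognized CSV text into rows with standard columns.

    Returns None when the dataset type is not recognized. The input string
    is only read, never modified, so the original content is preserved.
    """
    reader = csv.DictReader(io.StringIO(raw_text))
    kind = detect_type(reader.fieldnames or [])
    if kind is None:
        return None
    mapping = COLUMN_MAPS[kind]
    rows = []
    for row in reader:
        rows.append(
            {std: float(row[src]) * factor for src, (std, factor) in mapping.items()}
        )
    return rows
```

In this sketch an unrecognized file simply yields None, so it could be stored as submitted without a standardized copy. How a production pipeline handles such cases is not stated in the text above.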

Existing GDR Data Standards and Pipelines

Check out our existing data standards:

Drilling Data (Pason, RigCLOUD, and RIMBase*)

*RIMBase is included in the standard but is not currently in the pipeline.

Coming Soon

The GDR is currently developing data standards for the following data types:

  • Geospatial data
  • Distributed acoustic sensing (DAS) data
  • Stimulation data

NGDS Content Models

The National Geothermal Data System (NGDS) provides standardized templates in Excel and XML formats into which users can enter their data. The NGDS Content Models were developed with the intent of being all-inclusive, meaning that there is a column for every possible measurement associated with a particular data type. The list below names the existing NGDS Content Models.

  • Abandoned Mines
  • Active Fault / Quaternary Fault
  • Aqueous Chemistry
  • Borehole Lithology Intercepts
  • Borehole Lithology Interval Feature
  • Borehole Temperature Observation
  • Contour Lines
  • Direct Use Feature
  • Drill Stem Test Observations (deprecated)
  • Fault Feature / Shear Displacement Structure
  • Fluid Flux Injection and Disposal
  • Geologic Contact Feature
  • Geologic Fault Feature / Shear Displacement Structure
  • Geologic Reservoir
  • Geologic Units
  • Geothermal Area
  • Geothermal Fluid Production (deprecated)
  • Geothermal Metadata Compilation
  • Geothermal Power Plant Facility
  • Gravity Stations
  • Heat Flow
  • Heat Pump Facility
  • Hydraulic Properties
  • Mineral Recovery Brines
  • Physical Sample
  • Powell and Cumming Geothermometry
  • Power Plant Production
  • Radiogenic Heat Production
  • Rock Chemistry
  • Seismic Event Hypocenter
  • Thermal Conductivity Observation
  • Thermal/Hot Spring Feature
  • Volcanic Vents
  • Well Fluid Production
  • Well Header Observation
  • Well Log Observation
  • Well Tests
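
To make the "all-inclusive" template idea concrete, here is a small Python sketch of fitting a partial record to a fixed set of template columns. The column names and the function fit_to_template are hypothetical and are not taken from any actual NGDS Content Model, which defines its own fields.

```python
# Hypothetical template columns; actual NGDS Content Models define their own fields.
TEMPLATE_COLUMNS = ["well_name", "depth_m", "temperature_c", "measurement_date"]


def fit_to_template(record, columns=TEMPLATE_COLUMNS):
    """Return a row containing every template column, in template order.

    Fields the record does not supply are left as empty strings.
    Raises ValueError if the record has a field the template does not define.
    """
    unknown = set(record) - set(columns)
    if unknown:
        raise ValueError(f"fields not in template: {sorted(unknown)}")
    return {col: record.get(col, "") for col in columns}
```

Under this assumption, a record with only a well name and a temperature still produces a full-width row, with the depth and date columns left blank.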

Learn More or Submit Feedback

If you want to learn more about the importance of data standardization for data science in geothermal, check out this Stanford Geothermal Workshop paper: Taverna, N., Weers, J., Huggins, J., Anderson, A., Frone, Z. “Improving the Quality of Geothermal Data Through Data Standards and Pipelines Within the Geothermal Data Repository (GDR).” Proceedings of the 48th Workshop on Geothermal Reservoir Engineering, Stanford Geothermal Program (2023).

The GDR team is continuously working to align its efforts with the needs of the geothermal community, and we would like to invite you to provide your feedback here: