OOI Data Management and QA/QC: Lessons Learned and the Path Ahead

Michael F. Vardaro, M. Crowley, L. Belabbassi, M. Smith, L. Garzio, F. Knuth, J. Kerfoot, S. Glenn, O. Schofield, S. Lichtenwalner
Rutgers University, Dept. of Marine & Coastal Sciences, New Brunswick, NJ
AGU Ocean Sciences 2016

Table of Contents:
1. OOI Data Evaluation Team
2. OOI Data Flow
3. OOI Data Evaluation Procedures
4. Lessons Learned
   a) Metadata
   b) Communications
   c) QA vs. QC
5. Future Plans

OOI By the Numbers

7 Arrays
50 Stable Platforms (Moorings, Profilers, Nodes)
33 Mobile Assets (Gliders, AUVs)
>850 Instruments
>2500 Science Data Products
>100K Science/Engineering Data Products

Data Team

Leadership & Oversight
  CI PI: Manish Parashar
  Science/User PI: Oscar Schofield
  Science/User PI: Scott Glenn

Program Management
  System Engineering PM (RU RDI2): Ivan Rodero
  Project Coordinator: Caroline McHugh
  Science PM (RU COOL): Mike Crowley

Software Team
  Software Team Manager: Bill Kish
  Software Test Eng.: Matthew Danku
  Software Configuration & Maintenance: Janeen Pisciotta

System Team
  IT Manager: Juan José Villalobos
  Network Engineer: Jim Housell
  System Administrator: Aamir Jadoon

Data Team
  Data Manager: Mike Vardaro
  Asst. Data Manager: Leila Belabbassi
  Data Evaluator: Friedrich Knuth
  Data Evaluator: Lori Garzio
  Data Evaluator: Mike Smith

User Services Team
  Technical Help Desk & Education/Outreach: Sage Lichtenwalner

Rutgers Support
OOI Management: Steady State: 1 FTE
OOI Operations: 12 FTEs

Data Flow

OOI Data Product Levels

Raw data: The datasets as they are received from the instrument
  o May contain multiple L0, L1, or L2 parameters, data for multiple sensors, and be in native sensor units
  o Always persisted and archived by the OOI

  o Example: format 0 binary file from an SBE-37IM on a Global Flanking Mooring.

Level 0 (L0): Unprocessed, parsed data parameter that is in instrument/sensor units and resolution
  o Sensor by sensor (unpacked and/or de-interleaved) and available in OOI supported formats (e.g., NetCDF)
  o Always persisted and archived by the OOI
  o Example: SBE-37IM Temperature portion of the hex string

Level 1 (L1): Data parameter that has been calibrated and is in scientific units
  o QC may be applied at this level, utilizing simple automated techniques or human inspection
  o Actions to transform Level 0 to Level 1 data are captured and presented in the metadata of the Level 1 data
  o Example: SBE-37IM Temperature converted from hex to binary and scaled to produce degrees C

Level 2 (L2): Derived data parameter created via an algorithm that draws on multiple L1 data products
  o Products may come from the same or from separate instruments
  o Example: SBE-37IM Density and Salinity

Data Flow Example: Pioneer Profiler

Instruments: CTDPF DOFST FLORT PARAD

VEL3D ADCPS/T MOPAK

[Flow diagram. Labels, in source order: Instrument, Profiler Controller, RTE, Inductive Modem, Platform Controller, Telemetry, Iridium, OMC Platform Shore Server, rsync, OMC Data Server, Data Files from CGSN, RU Acquisition Point Server, Dataset Agent Driver, uFrame Database, Data Product Algorithm, GUI, User. Location labels: Ocean, WHOI, Rutgers.]

Data Processing Flow

[Flow diagram. Labels, in source order: Cabled Ingest Config (MIOs/SD); Data from RSN (Recorded, Streaming); Preload Database (All); QC Parameters (RU); A/A Parameters (MIOs); Dataset Agent Driver (Omaha); Acquisition Point Server (RU); Data files from CGSN (Telemetered and Recovered); Ingestion Sheet (.csv) (MIOs/RU); uFrame Database (Omaha); Data Product Algorithms, QC, A/A (Omaha); Cal Sheets (MIOs); GUI, User (ASA-RPS); Algorithms (OSU).]

Data Access Points

Data Portal (GUI)
THREDDS Server
  o Aggregated data products in NetCDF and CSV
  o ERDDAP (Spring-Summer 2016)
Raw data files (Spring 2016)
Shipboard Data
Large Format Data
  o Still Camera & HD Video
  o Seismic Sensors
  o Hydrophones
  o Bioacoustic Sonar
Direct Pass to community organizations (e.g. seismic to IRIS)

OOI Essential Ocean Variables (EOVs)

1. All Arrays, all platforms:
   a. CTD data products (Temperature, Conductivity, Pressure, Density, Salinity)
   b. Dissolved Oxygen
   c. ADCP (all series)
   d. Bulk Meteorology (all products)
   e. Surface Wave Spectra significant wave height
   f. Fluorometric Products (CDOM/Chlorophyll/Backscatter)
   g. Nitrate
   h. Seawater pH
   i. In-Water and Air/Sea pCO2
2. Cabled Array EOVs (these instruments are only on the Cabled Array):
   a. HD Camera Products
   b. Bottom Pressure/Tilt Products

   c. Seafloor Pressure
   d. Low-Frequency Hydrophone Products
   e. Vent Fluid and Particulate DNA Sampler D1000 Temperature Products

QC Procedures and Tools

Data Evaluation Team QA/QC Testing

OOI Automated QC Procedures

7 automated QC algorithms can produce 8 flags (including a logical "or" that combines flags), which are plottable and are included in downloaded files
Coded based on specifications written by OOI Project Scientists, derived from QARTOD manuals and other observatory experiences
Algorithms refer to lookup tables assembled by OOI Project Scientists with input from subject matter experts: https://github.com/ooi-integration/qc-lookup
1. Global Range Test
2. Local Range Test
3. Spike Test
4. Stuck Value Test
5. Trend Test
6. Temporal Gradient Test
7. Spatial Gradient Test (Profile)

QC Challenges

Local range values will require ongoing gathering of environmental data for each platform

Spike test is currently very simple, and needs tweaking to avoid false positives/negatives (especially in biological data) and to work with certain data types
Trend test may not work as designed, because it requires the system to compare data prior to the user request date
Gradient test is complicated to apply, requires 2D dataset
Not all QC algorithms apply to all data products (ongoing review with Project Scientists)
The QC algorithms do NOT trigger alerts in the system
Alerts/alarms only trigger when new data is telemetered/streamed

Can set alerts on L1/L2 data streams based on Global/Local range values

Data Evaluation Procedures & Tools

Data Management Plan (Spring 2016): 1102-00000_Data_Management_Plan_OOI
Data QA/QC and Sampling Plan: 1102-00300_QAQC_Cal_Physical_Samples_OOI
Sampling Strategy Document: 1102-00200_Observation_and_Sampling_Approach_OOI

Data Product Specifications: 1341-000xx (DPS) and 1342-000xx (Data Flow)
Data Product and QC Algorithms:
  github.com/ooici/ion-functions/tree/master/ion_functions/data
  github.com/ooici/ion-functions/tree/master/ion_functions/qc
Quality Control Lookup tables: github.com/ooi-integration/qc-lookup
Data Team download and plotting tools: github.com/najascutellatus/plot-nc-ooi

Lessons Learned

Metadata standards

Process of defining, collecting, and determining presentation of metadata has been ongoing for longer than anticipated
Software developers interpreted metadata standards differently than intended by the science requirements
Climate and Forecast (CF-1.6) standards adopted
Presentation in NetCDF header is incomplete but being improved

Provenance contained within file, which is not standard
  o Encoded in JSON format, requires extraction
  o Contains error notifications and information on how product was created
More formatting work needs to be done

Communication

The main method of communication from the data team to users is via annotation (requires enhancement)
Main method of communication from users to data team is through help desk requests (via website)
Also needed to standardize and improve communication between data team and Marine Implementing Organization personnel

QA vs. QC

Quality Control is a product check to identify flaws or defects
Quality Assurance is a defined process used to diagnose why a product is flawed or being produced incorrectly
The data team has had to become increasingly familiar with the end-to-end production of data products
Wider teamwork required to track (sometimes mysterious) issues back to a specific cause
  o Software error or missing data vs
  o Instrument miscalibration by vendor vs
  o Instrument settings changed during deployment vs
  o Mooring run over by ship

Future Plans

Near-Term Priorities

Deliver high quality data to the community, enabled and accelerated via feedback and community eyes on the products
  o Requires transparency in software releases, QC procedures, and documentation
Hybrid data delivery approaches to augment asynchronous data delivery via GUI, using THREDDS, ERDDAP, etc.
Organize a cross-project team to define and develop a method for post-recovery secondary calibration. This group will rely on external community input.
Several high value and high interest OOI datasets (e.g. bioacoustics, covariance flux measurements, vent fluid chemistry) will require entraining the external community in data quality and delivery discussions
Addition of a data forum and additional communications to external scientists
  o OOI is more than a collection of sensors: an ongoing distributed community discussion

OOI at Oceans 2016

TOWN HALL TOMORROW AT 12:30pm - Rooms 220-221
Consortium for Ocean Leadership & OOI Booth 611 in Exhibitor Hall
Posters Today at 4pm
OD14A-2394 - OOI Data Access and Visualization via the Graphical User Interface - Lori Garzio
OD14A-2397 - The Ocean Observatories Initiative: Unprecedented access to real-time data streaming from the Cabled Array through OOI Cyberinfrastructure - Friedrich Knuth
OD14A-2396 - OOI Data Acquisition Functions and Automated Python Modules - Mike Smith
OD14A-2399 - OOI Data Pre-Processing: Diagnostic Tools to Prepare Data for QA/QC Processing - Leila Belabbassi

Poster Thursday 4pm
ED44B-1728: The OOI Ocean Education Portal: Enabling the Development of Online Data Investigations - Sage Lichtenwalner

Questions?

OOI Main Web site: http://oceanobservatories.org
Data Portal: http://ooinet.oceanobservatories.org
Mike Vardaro, Data Manager, OOI CI Data Team [email protected]

Acknowledgements: NSF, COL, Rutgers University, University of Washington, WHOI, Oregon State University, RPS-ASA, Raytheon, UCSD/SIO
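
A rough sketch of three of the automated tests listed under OOI Automated QC Procedures (global range, spike, stuck value), and of combining their flags with a logical "or". This is not the ion-functions code. The function names, thresholds, and sample values are hypothetical, and the convention that True marks a suspect point is assumed for illustration and may not match the OOI flag encoding. The spike rule used here (a point that departs from both neighbors in the same direction) is a simplified stand-in.

```python
from typing import List, Sequence


def global_range_flags(values: Sequence[float], lo: float, hi: float) -> List[bool]:
    """Flag (True) every value outside the closed interval [lo, hi]."""
    return [not (lo <= v <= hi) for v in values]


def spike_flags(values: Sequence[float], threshold: float) -> List[bool]:
    """Flag an interior point that differs from BOTH neighbors by more than
    `threshold`, in the same direction. End points are never flagged."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        d_prev = values[i] - values[i - 1]
        d_next = values[i] - values[i + 1]
        flags[i] = (
            abs(d_prev) > threshold
            and abs(d_next) > threshold
            and d_prev * d_next > 0
        )
    return flags


def stuck_value_flags(values: Sequence[float], resolution: float, min_run: int) -> List[bool]:
    """Flag every point in a run of at least `min_run` consecutive values
    whose successive differences are all smaller than `resolution`."""
    flags = [False] * len(values)
    start = 0  # index where the current run of near-identical values began
    for i in range(1, len(values) + 1):
        run_ends = i == len(values) or abs(values[i] - values[i - 1]) >= resolution
        if run_ends:
            if i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags


def combine_flags(*flag_lists: Sequence[bool]) -> List[bool]:
    """Logical OR across tests: a point is suspect if any test flagged it."""
    return [any(column) for column in zip(*flag_lists)]


# Hypothetical temperature series (degrees C) with one spike, one stuck run,
# and one out-of-range reading. Limits below are illustrative only.
temperature_c = [10.1, 10.2, 25.0, 10.3, 10.3, 10.3, 10.3, 60.0]

range_f = global_range_flags(temperature_c, lo=-5.0, hi=45.0)
spike_f = spike_flags(temperature_c, threshold=5.0)
stuck_f = stuck_value_flags(temperature_c, resolution=0.001, min_run=4)
combined = combine_flags(range_f, spike_f, stuck_f)

print(combined)
```

With the sample series above, the combined flag marks the isolated 25.0 reading, the run of four identical 10.3 values, and the out-of-range 60.0 reading. In practice the limits would come from lookup tables such as those in qc-lookup rather than from constants in the code.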
