DataGuide-AAPF-HSI

This document details the file system organization and navigation procedures for accessing both unprocessed hyperspectral imaging (HSI) data and the derived phenotypic measurements of the plants captured by the Ag Alumni Seed Phenotyping Facility (AAPF). The data can be categorized into three main parts:

  • Raw Scanned Data: This folder structure houses individual scans (.hdr, .md5, .raw, .tiff) for each measurement.
  • Interim Data Product: These files support quality control procedures and include images of various vegetation indices.
  • Masterfile (.xlsx): This file contains a compilation of spreadsheets and a guide document explaining the data contents.

File system structure

Raw Scanned Data

Example (does not contain all associated files):
401_80 (/depot/smarterag/data/HSI/401_80)
├── HSI_R_2401856_240606084952173_SWIR-SIDE-DAT_401_80_240606085015154.hdr
├── HSI_R_2401856_240606084952173_SWIR-SIDE-DAT_401_80_240606085015154.md5
├── HSI_R_2401856_240606084952173_SWIR-SIDE-DAT_401_80_240606085015154.raw
├── HSI_R_2401856_240606084952173_SWIR-SIDE-DAT_401_80_240606085015154.tiff
├── HSI_R_2401856_240606084952173_SWIR-TOP-DAT_401_80_240606085006035.hdr
├── HSI_R_2401856_240606084952173_SWIR-TOP-DAT_401_80_240606085006035.md5
├── HSI_R_2401856_240606084952173_SWIR-TOP-DAT_401_80_240606085006035.raw
└── HSI_R_2401856_240606084952173_SWIR-TOP-DAT_401_80_240606085006035.tiff
  • Parent directory
    • {Experiment_Number}_{Treatment}
    • Example: 407_WL, 407_WW, 401_20-1, 401_20-2, 401_80
  • Data files
    • Header File (.hdr): This file stores metadata associated with the data acquisition process. It includes information such as sensor specifications, date and time of capture, etc.
    • MD5 File (.md5): This file contains a Message Digest 5 (MD5) checksum, a 128-bit fingerprint computed from the data itself that can be used to verify data integrity. Accidental corruption during storage or transmission produces a mismatch between the recomputed MD5 checksum and the one stored in the file.
    • Hyperspectral Datacube (.raw): This file stores the raw hyperspectral data. A hyperspectral datacube is a three-dimensional array where each layer represents a two-dimensional image captured at a specific wavelength band.
    • Image File (.tiff): This file stores one of two image representations, depending on the sensor. For VNIR data, it is a True Color Composite (TCC) image, which combines the red, green, and blue bands of the hyperspectral datacube into a color representation similar to the natural scene. For SWIR data, it is a False Color Composite (FCC) image, which also combines three bands from the datacube, but the chosen bands do not correspond to the usual red, green, and blue channels. Both TCC and FCC images are intended primarily for quality control.
    • Filename convention: HSI_R_{POT_BARCODE}_{Time_In}_{VNIR|SWIR}-{SIDE|TOP}-DAT_{Experiment_No}_{Treatment}_{Time_Out}.{hdr|md5|raw|tiff}. For additional information, refer to this link.
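
The filename convention above can be unpacked with a regular expression. The following is a minimal sketch rather than part of the AAPF pipeline: the function names are hypothetical, the `gif` extension is included only because the quality-control GIFs are described as following the same convention, and reading the first 12 digits of a timestamp as YYMMDDhhmmss is an assumption based on the example filenames.

```python
import re
from datetime import datetime

# Pattern mirrors the documented convention:
# HSI_R_{POT_BARCODE}_{Time_In}_{VNIR|SWIR}-{SIDE|TOP}-DAT_{Experiment_No}_{Treatment}_{Time_Out}.{ext}
FILENAME_RE = re.compile(
    r"^HSI_R_(?P<pot_barcode>\d+)_(?P<time_in>\d+)_"
    r"(?P<sensor>VNIR|SWIR)-(?P<view>SIDE|TOP)-DAT_"
    r"(?P<experiment_no>\d+)_(?P<treatment>.+)_(?P<time_out>\d+)"
    r"\.(?P<ext>hdr|md5|raw|tiff|gif)$"
)


def parse_hsi_filename(name):
    """Return the filename fields as a dict, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


def leading_timestamp(stamp):
    """ASSUMPTION: the first 12 digits encode YYMMDDhhmmss; the rest is sub-second."""
    return datetime.strptime(stamp[:12], "%y%m%d%H%M%S")


fields = parse_hsi_filename(
    "HSI_R_2401856_240606084952173_SWIR-SIDE-DAT_401_80_240606085015154.hdr"
)
print(fields["pot_barcode"], fields["sensor"], fields["view"], fields["treatment"])
print(leading_timestamp(fields["time_in"]))
```

Treatments that contain a hyphen, such as 20-1, are kept intact because the treatment group matches everything between the experiment number and the final timestamp.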

Interim Data Product & Masterfile

Example (does not contain all associated files):
401 (/depot/smarterag/hsiproc/401 or /depot/smarterag/hsiproc/working/401)
├── Hyperspectral_data_AAPF_experiment_401.xlsx (masterfile)
├── quality_control
│   ├── header_summary_plants.csv
│   ├── header_summary_ref.csv
│   ├── HSI_R_2401761_240514080207482_VNIR-SIDE-DAT_401_20-1_240514080219626.gif
│   ├── HSI_R_2401761_240514080207482_VNIR-TOP-DAT_401_20-1_24051408022152.gif
│   ├── HSI_R_2401761_24061408172718_VNIR-SIDE-DAT_401_20-1_240614081746352.gif
└── VI_images
    ├── NDVI_default.zip
    ├── NDWI_default.zip
    ├── NLI_default.zip
    └── NMDI_default.zip
  • Parent directory
    • {Experiment_Number}
    • Example: 401, 407
  • Subdirectories
    • quality_control
    • VI_images
  • Data files
    • Masterfile (.xlsx): This file stores the derived phenotypic measurements of the plants and a guide document.
    • HSI Data Processing Quality Control Data (.csv and .gif files under the quality_control directory): These files support quality assessment of the HSI data processing pipeline. They include summaries of header information for plants and for white and dark references (header_summary_plants.csv, header_summary_ref.csv), a detailed record of the processing procedures employed (processing_strategy.csv), and animated GIF files used to check coregistration quality between visible and near-infrared (VNIR) and shortwave infrared (SWIR) data. The GIF files follow the same filename convention as the raw scanned data.
    • Compressed Vegetation Index Images (.zip, under VI_images directory): This section houses compressed image files for various types of vegetation indices calculated from the HSI data.

How to Read Information in Header Summary (header_summary_plants.csv, header_summary_ref.csv)

The definitions of the column headers and their values mostly originate from the sensor vendor and system integrator.

| Column Head | Description |
| --- | --- |
| header path | File path to the header file |
| camera model | Camera model |
| serial number | Serial number of the camera |
| roi left | Camera- and height-specific parameter, not relevant to vegetation segments |
| roi top | Camera- and height-specific parameter, not relevant to vegetation segments |
| roi width | Camera- and height-specific parameter, not relevant to vegetation segments |
| roi height | Camera- and height-specific parameter, not relevant to vegetation segments |
| samples | Width of the data cube (conveyor belt direction) |
| bands | Number of spectral channels |
| lines | Number of scan lines from top to bottom |
| scanStartTime | Scan start time |
| scanEndTime | Scan end time |
| Wavelength | List of central wavelengths for each channel (separated by vertical bars) |
| fwhm | Full width at half maximum of each channel (separated by vertical bars) |
| PlantID | Plant ID (POT_BARCODE) |
| CarrierID | System-generated carrier ID |
| PlantHeight | Plant height (❗ calculation method unclear) |
| PlantWidth | Plant width (❗ calculation method unclear) |
| LotID | Experiment ID |
| CameraPosition | Set to "Both" when data from the opposing viewpoint (top for side views, side for top views) is also available. (❗ Verification pending) |
| illumination | "4" under normal illumination conditions (no malfunction) |
| specdir | Spectral range of the camera and viewing direction (VNIR-SIDE, VNIR-TOP, SWIR-SIDE, or SWIR-TOP) |
| scan_order | Unique integer assigned to each scan, incremented in order of image capture during the experiment within the AAPF. The value is identical for all cameras (VNIR-SIDE, VNIR-TOP, SWIR-SIDE, and SWIR-TOP) operating concurrently during a single scanning session, which captures data for the same plant on the same day. |
| dark white | "Dark" or "White", only available for header_summary_ref.csv |
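
As an illustration of working with these columns, the sketch below splits the "|"-separated Wavelength and fwhm cells into numbers and groups rows that share a scan_order. It assumes the summary is a comma-delimited CSV whose header row uses the column names listed above. The helper names are hypothetical, and the sample rows are invented for demonstration and not taken from a real summary file.

```python
import csv
import io
from collections import defaultdict


def split_pipe_floats(cell):
    """Turn a '|'-separated cell such as '402.8|405.1' into a list of floats."""
    return [float(tok) for tok in cell.split("|") if tok.strip()]


def group_by_scan_order(rows):
    """Map scan_order -> {specdir: row} so concurrent scans can be looked up together."""
    sessions = defaultdict(dict)
    for row in rows:
        sessions[int(row["scan_order"])][row["specdir"]] = row
    return dict(sessions)


# Illustrative rows only: these values are invented, not real measurements.
SAMPLE = """PlantID,specdir,scan_order,Wavelength,fwhm
2401856,VNIR-SIDE,1,402.8|405.1|407.4,2.3|2.3|2.3
2401856,SWIR-SIDE,1,950.0|955.5,5.5|5.5
2401761,VNIR-SIDE,2,402.8|405.1|407.4,2.3|2.3|2.3
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
sessions = group_by_scan_order(rows)
wavelengths = split_pipe_floats(sessions[1]["VNIR-SIDE"]["Wavelength"])
print(sorted(sessions[1]), wavelengths)
```

To read an actual summary, replace `io.StringIO(SAMPLE)` with an open file handle for header_summary_plants.csv, provided the delimiter assumption holds.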

How to Read Hyperspectral Data Processing Summary (.csv)

This table provides a concise overview of the data processing steps applied concurrently to each HSI acquisition session, encompassing output data products from VNIR-SIDE, VNIR-TOP, SWIR-SIDE, and SWIR-TOP sensors. A detailed explanation of each column header follows:

| Column Head | Description |
| --- | --- |
| scan_order | Same scan_order as defined in the Header Summary section. |
| LotID | Same LotID code as defined in the Header Summary section. |
| VNIR-SIDE, VNIR-TOP, SWIR-SIDE, SWIR-TOP | Header filenames for each sensor and view combination (SIDE or TOP) at the given scan order. |
| step1 | Processing strategy for side-view data (SIDE). When both VNIR and SWIR data are available, the strategy is Combine SIDE, which coregisters the VNIR and SWIR data during processing. If only VNIR data exists, the strategy is Use VNIR-SIDE: no coregistration is performed and all data extraction relies solely on VNIR data. If VNIR data is missing, with or without SWIR data, the strategy is Pass. |
| step2 | Same as step1, but for top-down view data (TOP). The values match those of step1 with "SIDE" replaced by "TOP". |
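
The decision rule described for step1 and step2 can be written as a small function. This is a sketch of the rule as stated in this guide, not the pipeline's own code. The function name is hypothetical, and the returned strings are assumed to match the labels quoted above.

```python
def processing_strategy(has_vnir, has_swir, view):
    """Return the step1/step2 label for one view ('SIDE' or 'TOP')."""
    if view not in ("SIDE", "TOP"):
        raise ValueError("view must be 'SIDE' or 'TOP'")
    if has_vnir and has_swir:
        # Both sensors present: VNIR and SWIR are coregistered and combined.
        return f"Combine {view}"
    if has_vnir:
        # VNIR only: no coregistration, extraction relies on VNIR alone.
        return f"Use VNIR-{view}"
    # VNIR missing (with or without SWIR): the view is skipped.
    return "Pass"


for vnir, swir in [(True, True), (True, False), (False, True), (False, False)]:
    print(vnir, swir, processing_strategy(vnir, swir, "SIDE"))
```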

Interpretation of Animated GIF Files for VNIR-SWIR Coregistration Quality Assessment

(Example animations: side view and top-down view.)

This section details the interpretation of an animated GIF file used to evaluate the coregistration quality between VNIR (visible and near-infrared) and SWIR (short-wave infrared) imagery. The animation consists of four frames, each displaying a vegetation mask derived from a specific processing step:

  • 1/4 frame: Vegetation mask obtained from the VNIR data (presented upside-down to match the original data orientation for both side and top-down views).
  • 2/4 frame: Vegetation mask derived from the SWIR data after an initial coarse alignment using affine transformation parameters (translation, scaling, and rotation) specified in HSI data processing parameters.
  • 3/4 frame: Vegetation mask from the SWIR data following a subsequent geometric refinement using the cv.findTransformECC function. Significant misalignment between the 3/4 frame and the 4/4 frame indicates the need to adjust the translation, scaling, and rotation parameters within the HSI data processing parameters.
  • 4/4 frame: The vegetation mask from the 1/4 frame with an additional morphological filter applied. The default parameter applies no filtering, while the upper, middle, and lower parameters use only the upper, middle, or lower third of the plant segments, respectively (applicable to side-view images only). Future iterations may incorporate additional morphological filter options.
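
To make the upper, middle, and lower options of the 4/4 frame concrete, here is a minimal pure-Python sketch of keeping one vertical third of a binary mask. It is not the pipeline's implementation. The function name is hypothetical, and two assumptions are made: row 0 is the top of the plant (the VNIR masks in the GIFs are shown upside-down, so the orientation may need flipping), and the thirds are measured over the rows that contain vegetation.

```python
def keep_vertical_third(mask, part="default"):
    """Keep only the upper, middle, or lower third of the masked rows.

    mask is a list of rows of 0/1 values. ASSUMPTION: row 0 is the top of the
    plant and thirds are measured over the rows that contain vegetation.
    """
    if part == "default":
        return [row[:] for row in mask]  # no filtering, just a copy
    if part not in ("upper", "middle", "lower"):
        raise ValueError(f"unknown part: {part}")
    occupied = [i for i, row in enumerate(mask) if any(row)]
    if not occupied:
        return [row[:] for row in mask]  # empty mask: nothing to split
    top, bottom = occupied[0], occupied[-1] + 1  # half-open row range
    height = bottom - top
    cuts = [top, top + height // 3, top + (2 * height) // 3, bottom]
    index = ("upper", "middle", "lower").index(part)
    start, stop = cuts[index], cuts[index + 1]
    return [
        row[:] if start <= i < stop else [0] * len(row)
        for i, row in enumerate(mask)
    ]


example = [[0, 0], [1, 1], [1, 0], [1, 1], [0, 1], [1, 1], [1, 1]]
for part in ("upper", "middle", "lower"):
    print(part, keep_vertical_third(example, part))
```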

How to Read Information in the Masterfile (.xlsx, spreadsheet)

Phenotypic measurements derived from the HSI data and auxiliary measurements are stored in spreadsheets. The master data file (*.xlsx) organizes its worksheets into two groups:

  • Vegetative_indices_Side, Vegetative_indices_Top: These worksheets hold the vegetation indices calculated for the side and top views.
  • Reflectance_Side, Reflectance_Top: These worksheets contain reflectance values at each measured wavelength for the side and top views.

Here's a breakdown of the column headers for the above worksheets:

| Column Head | Description |
| --- | --- |
| Filename-VNIR-SIDE | Filename of input VNIR-SIDE data |
| Filename-VNIR-TOP | Filename of input VNIR-TOP data |
| Filename-SWIR-SIDE | Filename of input SWIR-SIDE data |
| Filename-SWIR-TOP | Filename of input SWIR-TOP data |
| EXP ID | Internal experiment number assigned in PPEW |
| POT_BARCODE | Unique number assigned to plant |
| VARIETY | Variety assigned in PPEW |
| TREATMENT | Treatment applied to plant |
| SCAN_TIME | Scan start time |
| SCAN_DATE | Scan start date |
| DFP | Age of plant in days from planting at time of imaging |
| {vegetation index}_{statistic} (e.g., NDVI_avg) | Vegetation index (VI) statistics within masked vegetation areas. List of vegetation indices. List of statistics: avg (average), sd (standard deviation), max, min, p## (##-th percentile) |
| Wavelength in nm (e.g., 402.8) | Average reflectance over vegetation mask area (ratio of reflected energy to incident energy) |
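
When loading these worksheets programmatically, it can help to sort column names into vegetation-index statistics, wavelength columns, and plain metadata. The sketch below does this from the naming rules in the table. The function name and the returned tuple layout are hypothetical, and it assumes statistic suffixes are limited to those listed (avg, sd, max, min, p##).

```python
import re

# {vegetation index}_{statistic}, with the statistic limited to the documented set.
STAT_RE = re.compile(r"^(?P<index>.+)_(?P<stat>avg|sd|max|min|p\d+)$")


def classify_column(name):
    """Classify a masterfile column as a VI statistic, a wavelength, or metadata."""
    match = STAT_RE.match(name)
    if match:
        return ("vi_stat", match["index"], match["stat"])
    try:
        # Reflectance columns are named by their wavelength in nm, e.g. "402.8".
        return ("wavelength_nm", float(name))
    except ValueError:
        return ("metadata", name)


for column in ["POT_BARCODE", "DFP", "NDVI_avg", "NMDI_p95", "402.8"]:
    print(column, classify_column(column))
```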