Global-Policy-Lab/SNAPD

Standardized Nitrogen and Phosphorus Dataset (SNAPD)

This repository contains the code necessary to replicate the harmonized water quality dataset from:

E. Krasovich, P. Lau, J. Tseng, J. Longmate, K. Bell, and S. Hsiang. "Harmonized nitrogen and phosphorus concentrations in the Mississippi/Atchafalaya River Basin from 1980 to 2018," Scientific Data, 2022.

Data inputs and outputs are available on our HydroShare Repository.


1. Setup

All scripts are written in R. Throughout this README, paths to code and data assume you execute scripts from the top-level SNAPD/ folder using R or RStudio, matching the folder structure in the HydroShare repository.
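Because every path is relative to the top-level folder, it can help to confirm the working directory before sourcing anything. The following is a minimal sketch; the helper `is_snapd_root()` is hypothetical (it is not part of the repository) and simply looks for the master workflow script.

```r
# Return TRUE if `path` looks like the top-level SNAPD/ folder.
# Checking for the master workflow script is an illustrative assumption,
# not something the repository itself enforces.
is_snapd_root <- function(path = getwd()) {
  file.exists(file.path(path, "Code", "_master_workflow_and_setup.R"))
}

if (!is_snapd_root()) {
  message("Working directory is not the top-level SNAPD/ folder: ", getwd())
}
```

If the message appears, call `setwd()` with the path to your local SNAPD/ folder and check again.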

1.1 Installing dependencies

Run Code/_install_us_wq_packages.R to install all required R packages. The master workflow also installs them when it runs (see Section 3).
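To see in advance which packages are not yet installed, a check along these lines may be useful. This is a sketch only: `missing_packages()` is a hypothetical helper, and the package names in the example are placeholders rather than the actual SNAPD dependency list, which lives in the install script.

```r
# Return the subset of `pkgs` that is not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Placeholder names, NOT the SNAPD package list.
needed <- c("stats", "utils")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
}
```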

1.2 Downloading shapefiles

Before running any code, download the following two shapefiles and save them to Data/_A_workflow/:


2. Repository Structure

SNAPD/
├── README.md
├── LICENSE
├── Code/
│   ├── _install_us_wq_packages.R       # Install required R packages
│   ├── _master_workflow_and_setup.R    # Master script: runs full pipeline
│   ├── A00_us_raw_wqd_retrieval_workflow.R
│   ├── A01download_wq_sites_from_WQP.R
│   ├── A02create_and_clean_WQP_site_df.R
│   ├── A03download_wqd_by_nutrient.R
│   ├── A04merge_wqd_w_site_data_by_download.R
│   ├── A05crop_wqp_sites_to_mrb.R
│   ├── B00_us_wqd_processing_workflow.R
│   ├── B01standardize_wq_org_names.R
│   ├── B02recover_state_and_make_unique_sites.R
│   ├── B03flag_sample_level_metadata.R
│   ├── B04flag_raw_obs_w_unknown_chemical_form.R
│   ├── B05flag_result_level_metadata.R
│   ├── B06flag_and_convert_wqd_units.R
│   ├── B07merge_nutrient_compounds_and_rename_RSFs.R
│   ├── B08get_upper_DLs_and_merge_w_wqd.R
│   ├── B09impute_non_detects.R
│   ├── B10flag_potential_outliers.R
│   ├── B11flag_duplicate_types.R
│   ├── B12create_full_flagged_dataset.R
│   ├── B13harmonize_duplicates.R
│   ├── B14combine_parameters.R
│   ├── B15final_cleaning.R
│   ├── C00_us_wq_data_figures_and_tables_workflow.R
│   ├── C01create_raw_wqd_summary_table.R
│   ├── C02create_harmonization_process_table.R
│   ├── C03create_final_wqd_summary_table.R
│   ├── C04create_technical_validation_histograms.R
│   └── C05make_sankey_plots.R
└── Data/                               # Not tracked — see HydroShare for all data

Data files are not tracked in this repository. Static copies of all inputs and outputs are available on HydroShare.


3. Replication

The data harmonization pipeline has three stages, each run by its own lettered workflow script (A00, B00, and C00).

Run the full pipeline

The entire pipeline can be run from the master workflow:

source("Code/_master_workflow_and_setup.R")

The master workflow installs packages, loads libraries, creates directories, sets file paths, and sources each stage in sequence. Set your working directory to the top-level SNAPD/ folder before running.
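For readers who want to run stages individually, the directory-creation step can be approximated as below. This is a sketch under assumptions: `create_stage_dirs()` is a hypothetical helper, the folder names are the ones used in the output paths in this README, and the master workflow may create additional folders not shown here.

```r
# Create the per-stage data folders under `root` if they do not exist yet.
# Folder names come from the output paths mentioned in this README.
create_stage_dirs <- function(root = ".") {
  dirs <- file.path(root, "Data", c("_A_workflow", "_B_workflow", "_C_workflow"))
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dirs)
}

# Example: create the folders inside a temporary directory.
demo_root <- tempfile("snapd_demo_")
created <- create_stage_dirs(demo_root)
print(basename(created))
```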


Stage 1 — Data retrieval from the Water Quality Portal

Entry point: Code/A00_us_raw_wqd_retrieval_workflow.R

Downloads raw water quality site and sample data from the Water Quality Portal and performs minimal cleaning. Output is saved to Data/_A_workflow/all_raw_wqd_and_sites.fst.

Note: The Water Quality Portal is frequently updated. Running Stage 1 may produce data that differs from what we used. We recommend skipping this stage and using our archived output on HydroShare unless new/updated data is desired. Running Stage 1 with new data may require downstream code adjustments.
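One way to decide whether Stage 1 can be skipped is to check for its output file first. The sketch below uses the output path given above; the helper names `stage1_output()` and `has_stage1_output()` are hypothetical.

```r
# Path of the Stage 1 output, relative to the top-level SNAPD/ folder.
stage1_output <- function(root = ".") {
  file.path(root, "Data", "_A_workflow", "all_raw_wqd_and_sites.fst")
}

# TRUE if the Stage 1 output is already present under `root`.
has_stage1_output <- function(root = ".") {
  file.exists(stage1_output(root))
}

if (has_stage1_output()) {
  message("Found Stage 1 output; Stage 1 can be skipped.")
} else {
  message("Stage 1 output missing: download it from HydroShare or run Stage 1.")
}
```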

| Script | Description |
| --- | --- |
| A01 | Download WQ sites from WQP |
| A02 | Create and clean WQP site dataframe |
| A03 | Download WQ data by nutrient |
| A04 | Merge WQ data with site data |
| A05 | Crop WQP sites to Mississippi/Atchafalaya River Basin |
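To give a sense of what a Water Quality Portal request looks like, the sketch below builds a query URL for the portal's public web service in base R. It is not the code used in A01 or A03: `wqp_url()` is a hypothetical helper, and the state code and characteristic name are arbitrary example values rather than the queries used to build SNAPD.

```r
# Build a Water Quality Portal query URL that returns CSV.
# `service` selects site ("Station") or sample ("Result") records;
# query parameters are passed as named arguments.
wqp_url <- function(service = c("Station", "Result"), ...) {
  service <- match.arg(service)
  params <- c(list(...), mimeType = "csv")
  encoded <- vapply(
    params,
    function(v) URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0("https://www.waterqualitydata.us/data/", service, "/search?", query)
}

# Example values only (Illinois sites reporting nitrate).
url <- wqp_url("Station", statecode = "US:17", characteristicName = "Nitrate")
print(url)
```

Passing the resulting URL to `read.csv()` would fetch the records, subject to network access and the portal's current content.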

Stage 2 — Data processing and harmonization

Entry point: Code/B00_us_wqd_processing_workflow.R

Performs the cleaning and harmonization steps described in Table 2 of Krasovich et al. (2022). Outputs are saved to Data/_B_workflow/, including:

  • WQP_to_SNAPD_flagged.fst — intermediate flagged dataset
  • SNAPD.fst — final harmonized dataset
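
The files listed above can be loaded for inspection once they exist locally. The sketch below assumes the .fst files are read with the `fst` package, which this README does not state explicitly; `read_snapd()` is a hypothetical wrapper that fails with a clear message if the file or the package is missing.

```r
# Read an .fst file produced by Stage 2, with basic checks.
# Assumes the 'fst' package is the intended reader for these files.
read_snapd <- function(path = file.path("Data", "_B_workflow", "SNAPD.fst")) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  if (!requireNamespace("fst", quietly = TRUE)) {
    stop("The 'fst' package is required to read .fst files.", call. = FALSE)
  }
  fst::read_fst(path)
}

# Only attempt the read if the final dataset is present.
default_path <- file.path("Data", "_B_workflow", "SNAPD.fst")
if (file.exists(default_path)) {
  snapd <- read_snapd(default_path)
  str(snapd)
}
```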

| Script | Description |
| --- | --- |
| B01 | Standardize WQ organization names |
| B02 | Recover state codes and make unique sites |
| B03 | Flag sample-level metadata |
| B04 | Flag observations with unknown chemical form |
| B05 | Flag result-level metadata |
| B06 | Flag and convert WQ data units |
| B07 | Merge nutrient compounds and rename result sample fractions |
| B08 | Get upper detection limits and merge with WQ data |
| B09 | Impute non-detects |
| B10 | Flag potential outliers |
| B11 | Flag duplicate types |
| B12 | Create full flagged dataset |
| B13 | Harmonize duplicates |
| B14 | Combine parameters |
| B15 | Final cleaning |
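
As an illustration of the kind of operation a "flag and convert units" step involves, the sketch below converts concentrations to mg/L and flags unrecognized units. It is not the logic in B06: the function name, the unit list, the column names, and the flag labels are all hypothetical, and the authors' actual rules are described in Table 2 of Krasovich et al. (2022).

```r
# Convert concentrations to mg/L and flag units that are not recognized.
# The unit table and flag labels are illustrative assumptions.
to_mg_per_l <- function(value, unit) {
  factors <- c("mg/l" = 1, "ug/l" = 1e-3, "g/l" = 1e3)
  key <- tolower(trimws(unit))
  f <- unname(factors[key])  # NA where the unit is not in the table
  data.frame(
    value_mg_l = value * f,
    unit_flag = ifelse(is.na(f), "unknown_unit", "ok"),
    stringsAsFactors = FALSE
  )
}

# Example: one value in ug/L, one in mg/L, one in an unlisted unit.
converted <- to_mg_per_l(c(1500, 2, 5), c("ug/L", "mg/l", "ppm"))
print(converted)
```

Flagging rather than dropping unrecognized units keeps the original record available for later review, which matches the flag-then-clean ordering of the scripts listed above.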

Stage 3 — Figures and tables

Entry point: Code/C00_us_wq_data_figures_and_tables_workflow.R

Creates the figures and tables used in Krasovich et al. (2022). Requires the outputs of Stages 1 and 2, either generated locally or taken from the HydroShare archive. Outputs are saved to Data/_C_workflow/.

The outputs are Table 1, Table 2, Table 5, Figure 4, Figure 5 (A and B), Figure 6 (A and B), and SNAPD_final_wqd_sites.csv (used to make Figures 1 and 3 in QGIS). Final figure edits are made in Adobe Illustrator after export from R.

| Script | Description |
| --- | --- |
| C01 | Create raw WQ data summary table (Table 1) |
| C02 | Create harmonization process table (Table 2) |
| C03 | Create final WQ data summary table (Table 5) |
| C04 | Create technical validation histograms (Figures 4 and 5) |
| C05 | Make Sankey plots (Figure 6) |

4. Data

Static copies of all data inputs and outputs are archived on HydroShare:

HydroShare Repository: http://www.hydroshare.org/resource/9547035cf37940eb9b500b7994a378a1

Variable definitions for all datasets are in the Data Records section of Krasovich et al. (2022).


5. Citation

Please cite the dataset as:

Krasovich, E., P. Lau, J. Tseng, J. Longmate, K. Bell, S. Hsiang (2022). Standardized Nitrogen and Phosphorus Dataset (SNAPD), HydroShare. http://www.hydroshare.org/resource/9547035cf37940eb9b500b7994a378a1

And the associated paper as:

Krasovich, E., Lau, P., Tseng, J., Longmate, J., Bell, K., & Hsiang, S. (2022). Harmonized nitrogen and phosphorus concentrations in the Mississippi/Atchafalaya River Basin from 1980 to 2018. Scientific Data, 9, 556. https://doi.org/10.1038/s41597-022-01650-6
