staRdom is a package for R version 4 (R Development Core Team 2019) to analyse fluorescence and absorbance data of dissolved organic matter (DOM). The important features are:
staRdom has been developed and is maintained at WasserCluster Lunz (https://www.wcl.ac.at/index.php/en/) and the University of Natural Resources and Life Sciences, Vienna (https://boku.ac.at/).
staRdom comes with an Rmd template (infos on Rmd) where you can
start your analysis with example data and add your personal data and
parameters whenever you feel ready. We recommend to go interactively
through the template while reading this vignette to get an overview of
what is possible. Each version of staRdom provides a new template and
usually using templates from different versions should be fine. In case
you have difficulties running the calculations in the template, please
use the template of the very same version as the installed staRdom
package. As an advanced user, you can just include the functions in
whatever calculations you want to do (please see details in the advanced vignette). This
vignette describes the template. If you are interested in the specific
functions please refer to the help in R, which can be accessed by
help(function)
or in RStudio pressing F1 while the cursor
is on the function name in the code.
Later in the vignette there is also a chapter about troubleshooting. If you experience problems you may find a useful solution there.
This file aims for beginners in R and describes an easy way of calculating EEM peaks and absorbance (slope) parameters by just setting variables (no use of functions) in an Rmd file. The options are limited and a PARAFAC analysis cannot be done this way. This way of more or less automatic analysis bears the risks of missing informations in the data and overlooking problems like outliers and noise in the data. For more possibilities, options and a PARAFAC analysis please refer to the vignette for the PARAFAC analysis.
If you are a beginner in R you may find some help at the R-Studio online-learning (https://docs.posit.co/ide/user/), or Modern R with the tidyverse (https://modern-rstats.eu/) by Bruno Rodrigues.
The package is available on CRAN and can be installed via
install.packages("staRdom")
.
The template is accessible with the command
file.edit(system.file("EEM_simple_analysis.Rmd", package = "staRdom"))
.
You can and should save this file if you want to preserve it. The
example data is saved within the package structure and you can find the
containing folders with the commands
system.file(package = "staRdom")
. Raw data is in the
sub-folder “extdata”. The original settings in the template refer to
sample data and you can do full featured test runs with that. In case of
any problems, it can help to run the code chunk-wise (https://rmarkdown.rstudio.com/lesson-3.html).
On top of the template there are the header parameters necessary to create a report with knitr (https://yihui.org/knitr/). Parameters can be changed and just show up in the final report (e.g. author, title) or alter the appearance of the document. It is possible to create reports in other file formats. Please find details at https://rmarkdown.rstudio.com/lesson-9.html.
The directory your generated files are saved in is set by output_dir at line 7. It is important that you keep the “;” at the end of the line. Folders are delimited by “/”. In RStudio, pressing the tab key while the cursor is in the path can reveal possible folders on your drive.
title: "DOM analysis"
subtitle: "EEM peak picking, absorbance slope parameters"
date: "`r format(Sys.time(), '%B %e %Y')`"
author: "WCL"
knit: (function(inputFile, encoding) {
output_dir = 'C:/some_folder/output/';
rmarkdown::render(inputFile,
encoding=encoding,
output_file=file.path(output_dir,paste0('DOM_EEM_analysis_report_',format(Sys.time(), "%Y%m%d_%H%M%S"),'.htm')))
})
You need to specify the output folder on line 61 as well.
Please be sure to use the same file names for your fluorescence data, absorbance data and meta data, as differing file names are often the reason for a non working analysis.
The parameter sample_dir specifies the directory where your data files from the fluorometer are. They have to be in a text format (Cary Eclipse .csv files, Aqualog .dat files, Shimadzu .TXT files, Fluoromax-4 .dat files, Hitachi .TXT files, generic .CSV files). Samples can be stored in subfolders as well. Please be sure, that your file names are unique. File names must not contain ” ” (space) or “-” (minus) or start with a number. The command system.file() as used in the template EEM_simple_analysis.Rmd is used to access the example data and is not needed if you want to use your own data.
# Set the directory with your sample files. Please see eem_read() help for details on file formats.
# Sub folders are read in and are considered different sample sets.
# Import is done with eem_read() (package eemR), please see details there.
# The template refers to data coming with the package. Please use your data
# by setting the path to your files!
sample_dir = "C:/some_folder/input/fluor/" # e.g. sample_dir = "C:/some_folder/input/fluor/", system.file() accesses the example data coming with the package!
# Set the used instrument (with hyphens!):
# Cary Eclipse: "cary"
# Aqualog: "aqualog"
# Shimadzu: "shimadzu"
# Fluoromax-4: "fluoromax4"
# And furthermore, without hyphens:
# generic csv, excitation column-wise: eem_csv
# generic csv, emission column-wise: eem_csv2
# Hitachi F-7000: eem_hitachi
fluorometer = eem_csv
Absorbance data are needed for inner-filter effect correction and the
calculation of the slope parameters. They are taken from the folder
specified by absorbance_dir
. The filenames or column
designations must be identical to the EEM file names to link
fluorescence and absorbance data distinctly. Please be sure, that your
file names are unique. File names must not contain ” ” (space) or “-”
(minus) or start with a number. For the calculations the light path
length (in cm) used in the photometric measurement has to be set.
### Absorbance data ###
#~~~~~~~~~~~~~~~~~~~~~#
# Absorbance data is read from *.TXT or *.CSV files.
# Either a directory containing one or more files can be named or a single file containing all samples.
# Absorbance data is used for inner-filter-effect correction and calculation of the slope parameters.
# Those steps can be skipped but keep in mind it is important for a profound analysis!
#
# path of adsorbance data as directory or single file, sub folders are not read:
absorbance_dir = "C:/some_folder/input/absorbance/" # e.g. absorbance_dir = "C:/some_folder/input/absorbance/", system.file() accesses the exmaple data coming with the package!
# Path length of absorbance measurement in cm that was used in absorbance measurement.
# If it is set to "meta" data from the metadata table is used (details see below).
absorbance_path = 5 # e.g. absorbance_path = 5
In case your samples’ measurements differ and you need to set parameters sample-wise, you can set distinct dilution factors, photometer cuvette lengths and Raman areas in a table. You can skip that if dilution factors and cuvette lengths are similar and you used blank samples for calculating the Raman area. Distinct numbers can then be set as described below. A dilution factor of e.g. 10 means, there is 1 part sample and 9 parts ultrapure water added.
### Meta data ###
#~~~~~~~~~~~~~~~#
# Adding a table with meta data is OPTIONAL!
# The table can contain dilution factors, path lengths of
# the photometer and raman areas and is intended
# for cases where different values should be used for different
# samples. Each column can be used optionally.
# read table with metadata as *.TXT or *.CSV
# either a path or FALSE if no metadata file is used.
metadata = system.file("extdata/metatable.csv",package = "staRdom") # e.g. metadata = "C:/some_folder/input/metatable.csv"", system.file() accesses the exmaple data coming with the package!
### Meta data: names of columns ###
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# designation of column with sample names
col_samples = "sample"
# if you want to use dilution factors (e.g. 10 if 1 part sample and 9 parts solvent) from the meta data table, state the name
# of the column containing the dilution data and set dilution = "meta" (below)
col_dilution = "dilution"
# if you want to use the cuvette length (in cm) for the absorbance from the meta data table,
# state the name of the column containing the cuvette lengths and set absorbance_path = "meta" (below)
col_cuv_len = "cuv_len"
# if you want to use the raman area (under the curve) data from the meta data table, state the name
# of the column containing the raman areas and set raman_normalisation = "meta" (below)
col_raman_area = "raman"
Spectral correction is done to remove instrument-specific influences on the EEMs (DeRose and Resch-Genger 2010). Some instruments can do this automatically. Correction vectors should be provided in at least the same range as the EEM measurements. If this is not the case, the EEMs are cut to this range. Please provide paths to csv tables containing wavelengths in the first column and correction values in the second.
#### Spectral correction of EEMs ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# Some instruments, but not all need a spectral correction to compensate
# for specific deviations in the measurements. A vector for emission and
# excitation is used each. EEMs are cut to match the wavelength range of
# the vectors if used. Please provide paths to csv tables containing
# wavelengths in the first column and correction values in the second. If
# you do not want spectral correction to be done, setting these two input
# files it not necessary.
# Emission correction vector
Emcor <- system.file("extdata/CorrectionFiles/mcorrs_4nm.csv",package="staRdom") # e.g. "C:\folder\emcor.csv", FALSE
# Excitation correction vector
Excor <- system.file("extdata/CorrectionFiles/xc06se06n.csv",package="staRdom")
If you want to export your picked peaks and slope parameters as a
table, set the parameter output_xls = TRUE
. Exporting XLS
files needs a properly configured Java environment. If any problems are
encountered, a CSV is written to your output directory instead and can
be opened with a spreadsheet software. Furthermore, you can specify the
cell separator and decimal point of you data files in case a CSV file is
written.
### Table output ###
#~~~~~~~~~~~~~~~~~~#
# Write a table with peaks and slope parameters.
# Written as xls or, in case of missing java environment or the package xlsx as csv.
output_xls = TRUE # e.g. TRUE
# In case of a csv export you can define the separator and the decimal point here.
out_sep_dec = c("\t",".") # e.g. out_sep_dec = c("\t",".")
The script offers several options for exporting plots.
output_overview_png
states whether overview plots
containing a number of samples (overview_number
) each are
saved in the output directory and output_single_png
is the
parameter if you want to export single PNGs from each sample. The
parameter scale_col
defines if the colour range of all
plots is synchronised. If you want to compare different samples, it is
easier if the colour code has the same range. Weak peaks in samples with
lower fluorophore presence than other samples can be found easier if the
colours are not synchronised.
With the parameters overview
and
single_plots
these plots can be included in the report
using the same parameters scale_col
and
overview_number
.
### Plot settings PNG ###
#~~~~~~~~~~~~~~~~~~~~~~~#
# State whether you want pngs of the single EEM spectra written in your output directory
output_single_png = FALSE # e.g. TRUE
## State whether you want pngs of multiple EEM spectra written in your output directory
output_overview_png = FALSE # e.g. TRUE
## number of EEM spectra plottet in each overview image
overview_number = 6 # e.g. 6
# The scaling of the different sample plots can be chosen.
# Either all samples are coloured according to the range of the
# complete sample set (TRUE) or each plot is scaled separately (FALSE).
scale_col = FALSE # e.g. TRUE
### Plot settings report ###
#~~~~~~~~~~~~~~~~~~~~~~~~~~#
# This block defines which plots are included in the report.
#
# Add plots with several EEM samples per plot.
# The number per plot is defined by overview_number above.
overview = TRUE # e.g. TRUE
# State whether you want plots from single EEM spectra in the report.
single_plots = FALSE # e.g. TRUE
The data of the analysis can be stored in an Rdata file to keep track of your work or to deepen the analysis later (e.g. PARAFAC).
#### Save data for further analysis in R ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# File name where data is stored in RData format in the output directory.
# Set to FALSE if you dont want your eem data saved.
# Date, time and file extension is added automatically so you do not overwrite previous saved data.
data_file = "eem_data" # e.g. "eem_data"" or FALSE
# Desired name for the variable containing the eem data.
eem_name = "eem_list"
Raw data from fluorometers and photometers bear several shortcomings. Murphy et al. (2013) addressed several ways of EEM data correction that are used in the staRdom template. Some correction methods need specific data (e.g. absorbance ) to be applied. Corrections can be necessary and can help you focus on certain aspects and information covered by noise otherwise. But depending on your aim they might not be necessary. Bro and Smilde (2003) offer additional information on correction of EEM data. If either of the correction steps was already done (e.g. by the instrument), the specific correction step can just be skipped. In this case R might still tell you about missing correction steps and warnings might appear, but these can be ignored as long as the whole correction was done as desired.
The instrumental baseline drift in absorbance data can be removed by subtracting the mean of the absorbance at high wavelengths (Li and Hur 2017). The default is to use the spectrum between 680 and 700 nm but any other range can be set manually.
#### Normalising absorbance data to baseline ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# Absorbance data can be corrected by subtracting a baseline value from each sample.
# In high wavelength ranges (default 680-700 nm), the absorbance is assumed to be 0.
# The average value of that range (or any other range) is subtracted from the whole spectrum.
# abs_norm can be set TRUE to use the default range, you can specify the desired range by a vector of length 2 and you can set it FALSE to skip this correction.
abs_norm = TRUE # e.g. TRUE, c(700,800)
If samples were diluted before the spectroscopic measurements, the
dilution factor can be set here and the sample will be corrected
accordingly. As an example a dilution factor of 10 means a 1:10 dilution
(1 part sample and 9 parts ultrapure water). By setting
dilution = "meta"
, data from the meta table is used and
each sample can be corrected by an individual dilution factor.
### Correction of diluted samples ###
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# Set a dilution factor if your samples were diluted.
# All samples are multiplied with this factor.
# Please use a meta table (above) if your dilutions are differing
# 1 for no dilution, 10 for dilution 1:10 (1 part sample and 9
# parts ultrapure water), "meta" for data from meta table
dilution = "meta" # e.g. 1 for undiluted samples
The reason of diluting a sample can be inner-filter effects that cannot be corrected conveniently if they are too high (Kothawala et al. 2013). On the contrary, absorbance data might be better analysed undiluted. To combine the results of diluted EEM measurements and undiluted absorbance measurements, the following parameter can be set to do this automatically. Please check the results because depending on your sample names, automatically guessed combinations might not be recognised correctly!
# In case of diluted samples, two absorbance measurements of the
# same sample in different dilutions might be present. If this is
# the case, EEMs are renamed to the undiluted sample, absorbance
# data might be multiplied by the dilution factor if it is only
# presentas diluted sample. This can be done automatically. In the
# final protocol a table shows, what has been done to the samples.
# Please check this table and see, if the output is what you
# wanted it to be!
dil_sample_name_correction = FALSE
The spectral correction is described above. Here you can set, whether a spectral correction should be applied. Correction vectors need to bet supplied above.
#### Spectral correction of EEMs ####
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# Some instruments, but not all need a spectral correction to compensate
# for specific deviations in the measurements. Please sepecify, if you want
# spectral correection to be done.
spectral_cor = TRUE # e.g. TRUE, set to FALSE, if your instrument already provided EEMs with spectral correction
EEM data can be cut in both dimensions. Peaks are calculated before
the reduction. Cut ranges are set with vectors containing the upper and
lower limits: c(lower,upper)
. If you want to avoid any
cutting, set the vectors to c(0,Inf)
. Inf
means infinity, so the script keeps data from 0 to infinity. The script
also allows to cut all samples to the size of the sample with the
shortest range which is necessary if you want to perform a PARAFAC
analysis. Cutting can be necessary to remove noisy data in advance of a
PARAFAC analysis or to increase the visibility of important peaks in
plots.
### Cut data to certain range ###
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# Set a vector with range of wavelengths to be plotted and saved.
# Peak picking is done before range reduction.
# Emission wavelength:
em_range = c(0,Inf) # e.g. c(300,500), c(0,Inf) to use everything
# Excitation wavelength:
ex_range = c(0,Inf) # e.g. c(300,500), c(0,Inf) to use everything
# Cut all samples to fit largest range available in all samples
cut_range_to_smallest = FALSE # e.g. FALSE
Blank samples are data from measuring ultrapure water. Systematic biases can be removed by subtracting the blank sample from each sample. The blank samples have to contain “nano”, “miliq”, “milliq”, “mq” or “blank” (cases are ignored) in the file name. Regular samples must not contain one of these words. Blanks have to be in the same (sub)folder as the samples that are corrected with the certain blank. Multiple blanks in one (sub)folder are averaged. It needs to be measured with each sample set (e.g. once a day) (Murphy et al. 2013) and kept together in one folder.
### Blank correction ###
#~~~~~~~~~~~~~~~~~~~~~~#
# A blank sample is subtracted from each sample. Blank samples have to be
# in the same (sub)folder as the according EEM samples. So different blanks are used
# for different subsets. The file names of the blanks have to contain nano,
# miliq, milliq, mq or blank (cases are ignored). Other samples must not
# contain these words in their names respectively!
blank_correction = FALSE # e.g. FALSE
The inner-filter effect is caused by absorbance that blocks light in the pathway from the source to the sensor during fluorescence measurements. To apply the inner-filter effects correction described in Kothawala et al. (2013), absorbance data have to be measured for each sample. By knowing the exact absorbance, this effect can be mathematically corrected. In case of a total absorbance greater than 1.5, the sample has to be diluted because otherwise, the linear relationship is not appropriate anymore.
Diagonal scatter peaks hinder the analysis of EEM data as they usually are much greater than peaks from DOM. They can be partly removed by subtracting the blank sample as described above. Diagonal peaks are called Rayleigh and Raman peaks of first and second order. They can also make a PARAFAC analysis impossible. Senesi (1990), Lakowicz et al. (2006) and Coble et al. (1990) offer additional information.
The width of the removed scatter slot can be set. Make sure not to lose too much data while still removing the whole peak. If you use the interpolation below, a remaining diagonal peak hints at insufficient width. Elcoroaristizabal et al. (- Elcoroaristizabal et al. 2015), Bahram et al. (2006) and Zepp et al. (2004) suggest an interpolation of the removed scattering prior to a PARAFAC analysis and offer a description.
### Remove scattering and interpolate missing data ###
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# Scattering is removed from the EEM spectra.
remove_scatter <- c(TRUE, TRUE, TRUE, TRUE) # logical values, ordered by raman1,raman2,rayleigh1,rayleigh2
# Set the width of removed scatter slot (usually 10 to 20).
# If you can still see traces of scattering after interpolation,
# this value should be increased. You can specify a vector containing
# separate widths for each scatter c(15,16,16,14), ordered by raman1,raman2,rayleigh1,rayleigh2
# In case one or more scatter peaks are skipped, this vector must remain of length 4 and positions of the certain widths must be kept.
remove_scatter_width = c(15,15,15,15) # e.g. 15 or c(15,15,15,15)
# state whether removed scattering should be interpolated
interpolation <- TRUE # e.g. TRUE
Fluorescence intensities can differ between analyses on different fluorometers, different settings or different days on the same fluorometer. The so-called Raman normalisation makes samples comparable and normalises fluorescence intensities to Raman units. In staRdom, it can be applied in two ways. Either you use a blank sample (details see at chapter Blank correction above) to calculate the value for the normalisation (Lawaetz and Stedmon 2009) or you provide a certain value, that is used. Fixed values for each sample can be set in the meta table as well.
### Raman normailsation ###
#~~~~~~~~~~~~~~~~~~~~~~~~~#
# State whether a Raman normalisation should be performed
# Either "blank" if a blank is present in each (sub)folder of the EEM data.
# Blank samples have to be in the same (sub)folder as the EEM samples. So
# different blanks are used for different subsets. The file names of the
# blanks have to contain nano, miliq, milliq, mq or blank (cases are ignored).
# Other samples must not contain these words in their names respectively!
# Normalisation is then calculated with this blank, the raman area as a number
# or "meta" if the raman areas should be taken from the meta data table.
raman_normalisation = "blank" # e.g. "blank", FALSE, 160, "meta"
For calculating the peaks, the EEMs can be smoothed. If so, peaks and indices are calculated from smoothed EEMs but these are not saved. The smoothing parameter specifies the size of a moving average window in nm.
If you reach the box below in the template, all parameters are set and you can finally run the analysis.
#############################################
# #
# THERE ARE NO SETTINGS BELOW. #
# YOU CAN KLICK "KNIT" AT THE MENU BAR. #
# In case of errors, chunk-wise execution #
# of the code can reveal problems! #
# #
# Please read the help of the used #
# functions if you encounter #
# any problems: #
# Press F1 while cursor in function or #
# type help(function) in command line! #
# #
# Please read the #
# error messages carefully! #
# Naming of the input files and table #
# column and row names is crucial! #
# #
#############################################
You can run the script by clicking the “Knit” button in the toolbar of RStudio. At the first run of the script you may be asked if you want to install several packages. Please confirm. This can take some time. Your generated files are placed in your specified output folder. In case you experience problems, consider to start over with a “fresh” template.
The script is running in R environment (R Development Core Team 2019). Using a graphical user interface like RStudio (https://posit.co/download/rstudio-desktop/) can help beginners to get into it.
You can install staRdom via RStudio by klicking Tools -> Install
Packages… or by entering the command
install.packages("staRdom")
in the command line.
If any of the programs are already installed on your computer, you can skip the respective step. In case of problems while running the script, consider re-installing/updating the respective programs.
Download:
https://cran.r-project.org/mirrors.html
Installation manual:
https://cran.r-project.org/doc/manuals/r-release/R-admin.html#R-Installation-and-Administration
Download:
https://posit.co/download/rstudio-desktop/
Please find the installer for your specific operating system there.
Optionally(!) you need a Java runtime environment to import data from XLS files and a TeX environment (e.g. MikTeX for Windows) to export PDF files. You can use the script to the full extend without those.
NA stands for ‘Not available’ and means the wavelength range of the certain peak is missing. Be sure that you measured the range of the certain peak on your instrument.
If samples differ considerably in the amount of DOM, scaling might be
a problem. You can scale each sample plot separately by setting
scale_col = FALSE
.
If you encounter problems with reading csv files please visit:
Be sure the specified drive and folder existis on your system (e.g. C:/) and you have write access.