The package provides methods to read output files from the MetIDQ™ software into R. Metabolomics data is read and reformatted into an S4 object for convenient data handling, statistics and downstream analysis.
There is a version available on CRAN.
The latest version can directly be installed from the github.
The package takes metabolomic measurements as “.xlsx” files generated from the MetIDQ™ software. Additionally, meta data for each sample can be provided for further analysis.
Effective quantification of metabolites requires a reliable signal. The limit of detection (LOD) is defined as 3 times signal-to-noise of the baseline, calculated by the software MetIDQ™ for each metabolite. Values are classified as “Valid”, “LOQ” (below limit of quantification) or “LOD” (below limit of detection). This information is encoded in the color in the “.xlsx” table. Further color codes can include “ISTD Out of Range”, “Invalid” and “Incomplete”. The MetAlyzer packages allow to read both information, the value and the quantification status, gives statistics about the quantification efficacy and allows filtering based on the LOD.
To present the base functionalities of MetAlyzer, we use a data set
published by Gegner
et al. 2022. This data set was analyzed using the MxP®Quant
500 kit and inlcudes 6 different extraction methods (1 to 6) to quantify
630 metabolites in triplicates for 4 tissue types (Drosophila, Mouse
Liver, Mouse Kidney, Zebrafish Liver). It is included in the MetAlyzer
package and can be accessed via example_extraction_data()
.
MetAlyzer_dataset
reads the data from Excel into R and
stores it as a SummarizedExperiment (SE).
fpath <- example_extraction_data()
metalyzer_se <- MetAlyzer_dataset(file_path = fpath)
#>
#>
#> _____ ______ _______ _________ ________ ___ ___ ___ ________ _______ ________
#> |\ _ \ _ \|\ ___ \|\___ ___\\ __ \|\ \ |\ \ / /|\_____ \|\ ___ \ |\ __ \
#> \ \ \\\__\ \ \ \ __/\|___ \ \_\ \ \|\ \ \ \ \ \ \/ / /\|___/ /\ \ __/|\ \ \|\ \
#> \ \ \\|__| \ \ \ \_|/__ \ \ \ \ \ __ \ \ \ \ \ / / / / /\ \ \_|/_\ \ _ _\
#> \ \ \ \ \ \ \ \_|\ \ \ \ \ \ \ \ \ \ \ \____ \/ / / / /_/__\ \ \_|\ \ \ \\ \|
#> \ \__\ \ \__\ \_______\ \ \__\ \ \__\ \__\ \_______\__/ / / |\________\ \_______\ \__\\ _\
#> \|__| \|__|\|_______| \|__| \|__|\|__|\|_______|\____/ / \|_______|\|_______|\|__|\|__|
#> \|____|/
#>
#>
#> Info: Reading color code "FFFFCCCC" as "#FFCCCC"
#> Info: Reading color code "FF00CD66" as "#00CD66"
#> Info: Reading color code "FF6A5ACD" as "#6A5ACD"
#> Info: Reading color code "FF87CEEB" as "#87CEEB"
#> Info: Reading color code "FFFFFFCC" as "#FFFFCC"
#>
#> Measured concentration values:
#> ------------------------------
#> 0% 25% 50% 75% 100%
#> 0.000 0.017 1.760 21.200 288149.000
#>
#> NAs: 5348 (8.38%)
#> Note: 'Metabolism Indicators' are frequently NA!
#>
#> Measured quantification status:
#> -------------------------------
#> Valid: 24095 (37.77%)
#> LOQ: 5799 (9.09%)
#> LOD: 21789 (34.16%)
#> Invalid: 12105 (18.98%)
#> NAs: 0 (0%)
metalyzer_se
#> class: SummarizedExperiment
#> dim: 862 74
#> metadata(4): file_path sheet_index status_list aggregated_data
#> assays(2): conc_values quant_status
#> rownames(862): C0 C2 ... Sum of TGs Sum of Unsaturated TGs
#> rowData names(1): metabolic_classes
#> colnames(74): 9 10 ... 89 90
#> colData names(7): Plate Bar Code Sample Bar Code ... Sample Volume
#> Measurement Time
The SE includes assay information with concentration values and quantification status, while storing metabolites and their respective classes in the row data. It is important to note that there are 630 metabolites, and 232 metabolite indicators calculated using the MetIDQ™ software. Sample-wise metadata is stored in the column data. Additional data such as the file path (‘file_path’), the Excel sheet number (‘sheet_index’), a list of colors for each quantification status (status_list), as well as a tibble of aggregated concentration data and quantification status for downstream analysis and visualizations (‘aggregated_data’) are stored in the SE’s metadata.
Summaries of concentration values and quantification status are
automatically called during data loading, but can also manually be
called using summarizeConcValues(metalyzer_se)
and
summarizeQuantData(metalyzer_se)
. The function
summarizeConcValues
prints the quantiles and the amount /
percentage of NAs in the concentration data. The function
summarizeQuantData
gives an overview of the quantification
status of the metabolites. E.g 34.16% of metabolites where below the
Limit of Detection.
The various elements within the SummarizedExperiment (SE) can be accessed in a manner consistent with a typical SE.
meta_data <- colData(metalyzer_se)
head(meta_data)
#> DataFrame with 6 rows and 7 columns
#> Plate Bar Code Sample Bar Code Sample Type Sample Description
#> <character> <character> <character> <character>
#> 9 1030423581-1 | 10304.. 1030420836 Sample 1
#> 10 1030423581-1 | 10304.. 1030420841 Sample 1
#> 11 1030423581-1 | 10304.. 1030420855 Sample 1
#> 12 1030423581-1 | 10304.. 1030423460 Sample 2
#> 13 1030423581-1 | 10304.. 1030420860 Sample 2
#> 14 1030423581-1 | 10304.. 1030420874 Sample 2
#> Tissue Sample Volume Measurement Time
#> <character> <character> <character>
#> 9 Drosophila 10 2020.10.05 17:52:54 ..
#> 10 Drosophila 10 2020.10.05 17:59:08 ..
#> 11 Drosophila 10 2020.10.05 18:05:19 ..
#> 12 Drosophila 10 2020.10.05 18:11:33 ..
#> 13 Drosophila 10 2020.10.05 18:17:47 ..
#> 14 Drosophila 10 2020.10.05 18:23:58 ..
metabolites <- rowData(metalyzer_se)
head(metabolites)
#> DataFrame with 6 rows and 1 column
#> metabolic_classes
#> <character>
#> C0 Acylcarnitines
#> C2 Acylcarnitines
#> C3 Acylcarnitines
#> C3-DC (C4-OH) Acylcarnitines
#> C3-OH Acylcarnitines
#> C3:1 Acylcarnitines
concentration_values <- assays(metalyzer_se)$conc_values
head(concentration_values, c(5, 5))
#> 9 10 11 12 13
#> C0 203.000 86.800 246.000 198.000 369.000
#> C2 29.500 15.800 34.600 39.200 42.200
#> C3 39.200 9.290 49.900 20.100 50.000
#> C3-DC (C4-OH) 0.057 0.055 0.076 0.679 0.071
#> C3-OH 0.106 0.132 0.116 0.145 0.170
quantification_status <- assays(metalyzer_se)$quant_status
head(quantification_status, c(5, 5))
#> 9 10 11 12 13
#> C0 "Valid" "Valid" "Valid" "Valid" "Valid"
#> C2 "Valid" "Valid" "Valid" "Valid" "Valid"
#> C3 "Valid" "Valid" "Valid" "Valid" "Valid"
#> C3-DC (C4-OH) "LOD" "LOD" "LOD" "Valid" "LOD"
#> C3-OH "LOD" "LOD" "LOD" "LOD" "LOD"
In addition to accessing the aggregated data through
metadata(metalyzer_se)$aggregated_data
, it can be directly
accessed using aggregatedData.
aggregated_data <- aggregatedData(metalyzer_se)
head(aggregated_data)
#> # A tibble: 6 × 5
#> # Groups: Metabolite [1]
#> ID Metabolite Class Concentration Status
#> <fct> <fct> <fct> <dbl> <fct>
#> 1 9 C0 Acylcarnitines 203 Valid
#> 2 10 C0 Acylcarnitines 86.8 Valid
#> 3 11 C0 Acylcarnitines 246 Valid
#> 4 12 C0 Acylcarnitines 198 Valid
#> 5 13 C0 Acylcarnitines 369 Valid
#> 6 14 C0 Acylcarnitines 127 Valid
The functions filterMetabolites
and
filterMetaData
can be used to subset the data set based on
certain metabolites or meta variables. Both functions subset
conc_values, quant_status and
aggregated_data.
Individual metabolites as well as metabolic classes can be passed to
filterMetabolites
. Here, we want to exclude the metabolism
indicators.
metalyzer_se <- filterMetabolites(metalyzer_se, drop_metabolites = "Metabolism Indicators")
#> Info: 232 metabolites were removed!
metalyzer_se
#> class: SummarizedExperiment
#> dim: 630 74
#> metadata(4): file_path sheet_index status_list aggregated_data
#> assays(2): conc_values quant_status
#> rownames(630): C0 C2 ... TG(22:6_34:3) Choline
#> rowData names(1): metabolic_classes
#> colnames(74): 9 10 ... 89 90
#> colData names(7): Plate Bar Code Sample Bar Code ... Sample Volume
#> Measurement Time
The extraction methods are encoded in the column “Sample Description” of the meta data. There are 2 blank samples in the data that we want to remove and only keep extraction method 1 to 6.
metalyzer_se <- filterMetaData(metalyzer_se, `Sample Description` %in% 1:6)
#> Info: 2 samples were removed!
To be more specific and for easier handling, we change the column “Sample Description” of the meta data to “Extraction_Method”.
metalyzer_se <- renameMetaData(metalyzer_se, "Extraction_Method" = "Sample Description")
meta_data <- colData(metalyzer_se)
head(meta_data)
#> DataFrame with 6 rows and 7 columns
#> Plate Bar Code Sample Bar Code Sample Type Extraction_Method
#> <character> <character> <character> <character>
#> 9 1030423581-1 | 10304.. 1030420836 Sample 1
#> 10 1030423581-1 | 10304.. 1030420841 Sample 1
#> 11 1030423581-1 | 10304.. 1030420855 Sample 1
#> 12 1030423581-1 | 10304.. 1030423460 Sample 2
#> 13 1030423581-1 | 10304.. 1030420860 Sample 2
#> 14 1030423581-1 | 10304.. 1030420874 Sample 2
#> Tissue Sample Volume Measurement Time
#> <character> <character> <character>
#> 9 Drosophila 10 2020.10.05 17:52:54 ..
#> 10 Drosophila 10 2020.10.05 17:59:08 ..
#> 11 Drosophila 10 2020.10.05 18:05:19 ..
#> 12 Drosophila 10 2020.10.05 18:11:33 ..
#> 13 Drosophila 10 2020.10.05 18:17:47 ..
#> 14 Drosophila 10 2020.10.05 18:23:58 ..
Additional meta data can be easily added to the Summarized Experiment. Here, we want to add the column “Date” with the current date to the meta data as well as the column “Replicate” with extraction method replicates.
replicate_meta_data <- example_meta_data()
head(replicate_meta_data)
#> Method Replicate
#> 1 1 R1
#> 2 1 R2
#> 3 1 R3
#> 4 2 R1
#> 5 2 R2
#> 6 2 R3
metalyzer_se <- updateMetaData(
metalyzer_se,
Date = Sys.Date(),
Replicate = replicate_meta_data$Replicate
)
meta_data <- colData(metalyzer_se)
head(meta_data)
#> DataFrame with 6 rows and 9 columns
#> Plate Bar Code Sample Bar Code Sample Type Extraction_Method
#> <character> <character> <character> <character>
#> 9 1030423581-1 | 10304.. 1030420836 Sample 1
#> 10 1030423581-1 | 10304.. 1030420841 Sample 1
#> 11 1030423581-1 | 10304.. 1030420855 Sample 1
#> 12 1030423581-1 | 10304.. 1030423460 Sample 2
#> 13 1030423581-1 | 10304.. 1030420860 Sample 2
#> 14 1030423581-1 | 10304.. 1030420874 Sample 2
#> Tissue Sample Volume Measurement Time Date Replicate
#> <character> <character> <character> <factor> <factor>
#> 9 Drosophila 10 2020.10.05 17:52:54 .. 2024-03-12 R1
#> 10 Drosophila 10 2020.10.05 17:59:08 .. 2024-03-12 R2
#> 11 Drosophila 10 2020.10.05 18:05:19 .. 2024-03-12 R3
#> 12 Drosophila 10 2020.10.05 18:11:33 .. 2024-03-12 R1
#> 13 Drosophila 10 2020.10.05 18:17:47 .. 2024-03-12 R2
#> 14 Drosophila 10 2020.10.05 18:23:58 .. 2024-03-12 R3
MetAlyzer includes functions to perform data analysis as it was
performed for MetaboExtract
(Andresen
et al. 2022). All analysing functions in MetAlyzer use the
aggregated concentration data and quantification status
(‘aggregated_data’). The calculate_cv
calculates
the mean, standard deviation (SD) and the coefficient of variation (CV)
within each group and saves it to ‘aggregated_data’. Each group
is additionally categorized into given thresholds of variation based on
the calculated CV.
metalyzer_se <- calculate_cv(
metalyzer_se,
groups = c("Tissue", "Extraction_Method", "Metabolite"),
cv_thresholds = c(0.1, 0.2, 0.3),
na.rm = TRUE
)
#> Info: Calculating mean and coefficient of variation (groupwise: Tissue * Extraction_Method * Metabolite)... finished!
aggregated_data <- aggregatedData(metalyzer_se) %>%
select(c(Extraction_Method, Metabolite, Mean, SD, CV, CV_thresh))
#> Adding missing grouping variables: `Tissue`
head(aggregated_data)
#> # A tibble: 6 × 7
#> # Groups: Tissue, Extraction_Method, Metabolite [2]
#> Tissue Extraction_Method Metabolite Mean SD CV CV_thresh
#> <fct> <fct> <fct> <dbl> <dbl> <dbl> <fct>
#> 1 Drosophila 1 C0 179. 82.4 0.461 more30
#> 2 Drosophila 1 C0 179. 82.4 0.461 more30
#> 3 Drosophila 1 C0 179. 82.4 0.461 more30
#> 4 Drosophila 2 C0 231. 124. 0.538 more30
#> 5 Drosophila 2 C0 231. 124. 0.538 more30
#> 6 Drosophila 2 C0 231. 124. 0.538 more30
As described in Gegner et al., an ANOVA is calculated to identify extraction methods with superior and inferior extraction performance. NAs or zero values are first imputed, to avoid NAs during log2 transformation. To impute NAs, the impute_NA argument can be set to TRUE. The argument impute_perc_of_min specifies the percentage of the minimum concentration value that is used for imputation.
metalyzer_se <- calculate_anova(
metalyzer_se,
categorical = "Extraction_Method",
groups = c("Tissue", "Metabolite"),
impute_perc_of_min = 0.2,
impute_NA = TRUE
)
#> Info: Imputing concentrations (groupwise: Metabolite) with 20% of the minimal positive value... finished!
#> Info: Calculating ANOVA (groupwise: Tissue * Metabolite)... finished!
aggregated_data <- aggregatedData(metalyzer_se) %>%
select(c(Extraction_Method, Metabolite, imputed_Conc, log2_Conc, ANOVA_n, ANOVA_Group))
#> Adding missing grouping variables: `Tissue`
head(aggregated_data)
#> # A tibble: 6 × 7
#> # Groups: Tissue, Metabolite, Extraction_Method [2]
#> Tissue Extraction_Method Metabolite imputed_Conc log2_Conc ANOVA_n ANOVA_Group
#> <fct> <fct> <fct> <dbl> <dbl> <int> <chr>
#> 1 Droso… 1 C0 203 7.67 3 A
#> 2 Droso… 1 C0 86.8 6.44 3 A
#> 3 Droso… 1 C0 246 7.94 3 A
#> 4 Droso… 2 C0 198 7.63 3 A
#> 5 Droso… 2 C0 369 8.53 3 A
#> 6 Droso… 2 C0 127 6.99 3 A
The result of the imputation is stored in an extra column “imputed_Conc”. In total 12,212 zero values were imputed in this example.
cat("Number of zero values before imputation:",
sum(aggregatedData(metalyzer_se)$Concentration == 0, na.rm = TRUE), "\n")
#> Number of zero values before imputation: 12212
cat("Number of zero values after imputation:",
sum(aggregatedData(metalyzer_se)$imputed_Conc == 0, na.rm = TRUE), "\n")
#> Number of zero values after imputation: 0
MetAlyzer also includes functions to calculate and visualize the log2 fold change of data with paired effects. For this, we load a dataset with control and mutated samples analyzed using the MxP®Quant 500 XL kit. The XL kit covers 332 additional metabolites and 6 additional metabolic classes compared to the MxP®Quant 500 kit.
fpath <- example_mutation_data_xl()
metalyzer_se <- MetAlyzer_dataset(file_path = fpath)
#>
#>
#> _____ ______ _______ _________ ________ ___ ___ ___ ________ _______ ________
#> |\ _ \ _ \|\ ___ \|\___ ___\\ __ \|\ \ |\ \ / /|\_____ \|\ ___ \ |\ __ \
#> \ \ \\\__\ \ \ \ __/\|___ \ \_\ \ \|\ \ \ \ \ \ \/ / /\|___/ /\ \ __/|\ \ \|\ \
#> \ \ \\|__| \ \ \ \_|/__ \ \ \ \ \ __ \ \ \ \ \ / / / / /\ \ \_|/_\ \ _ _\
#> \ \ \ \ \ \ \ \_|\ \ \ \ \ \ \ \ \ \ \ \____ \/ / / / /_/__\ \ \_|\ \ \ \\ \|
#> \ \__\ \ \__\ \_______\ \ \__\ \ \__\ \__\ \_______\__/ / / |\________\ \_______\ \__\\ _\
#> \|__| \|__|\|_______| \|__| \|__|\|__|\|_______|\____/ / \|_______|\|_______|\|__|\|__|
#> \|____|/
#>
#>
#> Info: Reading color code "FFCBD2D7" as "#CBD2D7"
#> Info: Reading color code "FF7FB2C5" as "#7FB2C5"
#> Info: Reading color code "FFCBD2D7" as "#CBD2D7"
#> Info: Reading color code "FFB9DE83" as "#B9DE83"
#> Info: Reading color code "FFB9DE83" as "#B9DE83"
#> Info: Reading color code "FFA28BA3" as "#A28BA3"
#> Info: Reading color code "FFA28BA3" as "#A28BA3"
#> Info: Reading color code "FFB2D1DC" as "#B2D1DC"
#> Info: Reading color code "FFB2D1DC" as "#B2D1DC"
#> Info: Reading color code "FF7FB2C5" as "#7FB2C5"
#>
#> Measured concentration values:
#> ------------------------------
#> 0% 25% 50% 75% 100%
#> 0.00 1.63 7.77 43.07 139200.40
#>
#> NAs: 155 (0.85%)
#>
#>
#> Measured quantification status:
#> -------------------------------
#> Valid: 14280 (77.85%)
#> LOQ: 819 (4.47%)
#> LOD: 3243 (17.68%)
#> NAs: 0 (0%)
metalyzer_se
#> class: SummarizedExperiment
#> dim: 1019 18
#> metadata(4): file_path sheet_index status_list aggregated_data
#> assays(2): conc_values quant_status
#> rownames(1019): C0 C2 ... TG 22:6_34:3 Choline
#> rowData names(1): metabolic_classes
#> colnames(18): 32 33 ... 48 49
#> colData names(11): Plate bar code Sample bar code ... Org. info
#> Measurement time
Again, we want to remove the metabolism indicators.
metalyzer_se <- filterMetabolites(metalyzer_se, drop_metabolites = "Metabolism Indicators")
#> Info: No metabolites were removed!
metalyzer_se
#> class: SummarizedExperiment
#> dim: 1019 18
#> metadata(4): file_path sheet_index status_list aggregated_data
#> assays(2): conc_values quant_status
#> rownames(1019): C0 C2 ... TG 22:6_34:3 Choline
#> rowData names(1): metabolic_classes
#> colnames(18): 32 33 ... 48 49
#> colData names(11): Plate bar code Sample bar code ... Org. info
#> Measurement time
In this data set, the column “Sample Description” holds information about the wether a sample is a wild type or a mutant.
meta_data <- colData(metalyzer_se)
meta_data$`Sample Description`
#> [1] "Mutant" "Mutant" "Mutant" "Mutant" "Mutant" "Mutant" "Mutant"
#> [8] "Mutant" "Mutant" "Control" "Control" "Control" "Control" "Control"
#> [15] "Control" "Control" "Control" "Control"
To determine the direction of the effect during log2 fold change calculation, the information is converted into a factor and saved as a new column “Control_Mutant”. The log2 fold change will now be calculated from “Control” to “Mutant”.
control_mutant <- factor(colData(metalyzer_se)$`Sample Description`, levels = c("Control", "Mutant"))
metalyzer_se <- updateMetaData(metalyzer_se, Control_Mutant = control_mutant)
meta_data <- colData(metalyzer_se)
meta_data$Control_Mutant
#> [1] Mutant Mutant Mutant Mutant Mutant Mutant Mutant Mutant Mutant
#> [10] Control Control Control Control Control Control Control Control Control
#> Levels: Control Mutant
In order to avoid NAs during log2 transformation the same imputation
of NAs or zero values is applied as for calculate_anova
. To
impute NAs, the impute_NA argument can be set to TRUE. The
argument impute_perc_of_min specifies the percentage of the
minimum concentration value that is used for imputation.
metalyzer_se <- calculate_log2FC(
metalyzer_se,
categorical = "Control_Mutant",
impute_perc_of_min = 0.2,
impute_NA = TRUE
)
#> Info: Imputing concentrations (groupwise: Metabolite) with 20% of the minimal positive value... finished!
#> Info: Calculating log2 fold change from Mutant to Control (column: Control_Mutant).
#> Info: Calculating log2 fold change groupwise (Metabolite) using a linear model... finished!
In addition to accessing the log2 fold change data through
metadata(metalyzer_se)$log2FC
, it can be directly accessed
using log2FC
.
log2FC(metalyzer_se)
#> # A tibble: 1,019 × 5
#> Metabolite Class log2FC pval qval
#> <fct> <fct> <dbl> <dbl> <dbl>
#> 1 C0 Acylcarnitines -0.0953 0.694 0.838
#> 2 C2 Acylcarnitines 0.323 0.431 0.643
#> 3 C3 Acylcarnitines -0.351 0.219 0.488
#> 4 C3-DC (C4-OH) Acylcarnitines 0.315 0.420 0.633
#> 5 C3-OH Acylcarnitines 0.593 0.460 0.667
#> 6 C3:1 Acylcarnitines 0.203 0.781 0.894
#> 7 C4 Acylcarnitines 0.240 0.185 0.462
#> 8 C4:1 Acylcarnitines 1.11 0.365 0.585
#> 9 C5 Acylcarnitines 0.115 0.628 0.791
#> 10 C5-DC (C6-OH) Acylcarnitines 0.342 0.0498 0.272
#> # ℹ 1,009 more rows
The log2 fold change between the wild type and the mutant can be visualized with a volcano plot:
log2fc_vulcano <- plot_log2FC(
metalyzer_se,
hide_labels_for = rownames(rowData(metalyzer_se)),
vulcano=TRUE
)
log2fc_vulcano
as well as in a scatter plot categorized by metabolic classes:
log2fc_by_class <- plot_log2FC(
metalyzer_se,
hide_labels_for = rownames(rowData(metalyzer_se)),
vulcano=FALSE
)
log2fc_by_class
MetAlyzer also provides a visualization of the log2 fold change across different pathways, in order to identify groups of metabolites impacted by the effect.