What is “metabolomics”?
Metabolomics is an emerging field of analytical chemistry whose goal is to quantify all metabolites in a biologic sample, typically through application of multiple analytical platforms that enable determinations across a variety of metabolite classes (Figure 18.1). As with other omics-based technologies, this comprehensive coverage of metabolism has the benefit of allowing the observation of metabolic stressor effects not only at a single target, but also the ripple effect of the stressor, including compensatory responses, across the metabolic landscape. Metabolomics integrates and aggregates the information stemming from genomic, transcriptomic and proteomic events, providing access to a quantitative description of the metabolic phenotype of an individual (Figure 18.2). Moreover, while the genetic code and protein machineries responsible for metabolism provide the distinguishing characteristics between genera, species, and even individuals, the metabolites modified by this elegant and elaborate machinery are far less variable across disparate organisms. For example, all mammals use the same basic suite of metabolites to store, transport, and generate energy, to build membranes, and to drive purine and pyrimidine turnover. Therefore, metabolomics provides a unique and powerful hypothesis-testing and hypothesis-generating tool in both the pre-clinical and clinical research environments.
The ability to discriminate individuals based on patterns of metabolites measured in biofluids has a long history, one that has advanced along with technology. An early biomedical application of this concept is newborn screening for inborn errors of metabolism. These tests employ diagnostic profiling of metabolites in blood spots (e.g. acylcarnitines, amino acids, organic acids) that can reveal tissue accumulation of intermediates resulting from specific metabolic lesions [7,8]. Metabolomics developed as an expansion of metabolic profiling, using pattern recognition to segregate subjects using both identified and unidentified components. However, the field is undergoing a transformation into a quantitative science as analyte peak identification and quantitative efficiency improve. As these efforts continue, repositories and searchable databases like those established within the Human Metabolome Database (www.hmdb.ca/) for metabolite reference ranges in the contexts of health and specific illnesses will expand. The blood and urine metabolomes contain a broad array of metabolites, many of which are dynamically linked to concentrations in tissues and thus report on peripheral tissue metabolism. While less common, metabolomic studies of cerebrospinal fluid have also proven useful, offering novel insights into the progression of diseases such as amyotrophic lateral sclerosis. Currently, the human metabolome is believed to contain at least 6400 unique chemical entities. While this number is impressive, many analytes observed in routine metabolomic screens remain unidentified (non-annotated). In the metabolomic paradigm, “unknowns” are considered useful and are assigned unique identifiers allowing their discrete identification in future studies. If such an unknown is later found to associate with a specific disease or treatment intervention, the effort to elucidate its structure may be warranted.
Mass spectrometry and proton nuclear magnetic resonance (NMR) spectroscopy are complementary techniques most routinely employed for metabolomic analyses, often using blood plasma or urine. While NMR is nondestructive, rapid, and inherently quantitative, it is also relatively insensitive and as such best suited for higher-abundance compounds. On the other hand, while mass spectrometric methods are inherently more sensitive and able to discriminate complex mixtures by employing chromatographic separations, true quantification requires access to authentic standards. As with all information-rich technologies, the exploration, analysis and interpretation of metabolomics data are challenging. However, routine statistical approaches exist to effectively manage these data sets and distill them into discrete pieces of information. Metabolomics results are inherently data-heavy and complex, and investigating them properly requires familiarity with multivariate statistical analysis approaches. For instance, in case-control studies, partial least squares discriminant analysis (PLS-DA) is often the default analysis approach. This process filters the data set to find those input variables that best characterize the differences between conditions (e.g. between case and control; pre- and post-treatment). While this approach to data reduction can remove 60–80% of the collected data to focus on variables that most robustly discriminate comparator groups, it can still leave hundreds of metabolites to consider. However, many of these compounds correlate strongly, and understanding which compounds move together (either increasing or decreasing in concert) can further assemble the data into nodes of information that are descriptive of a biologic process.
Hierarchical cluster analyses are the simplest approach to filtering data in this manner, and can parse the body of data into 5–10 discrete analyte groupings, a manageable number to aid biologic interpretation. Another important issue that arises when dealing with such large data sets is the multiple comparison problem. Namely, using the statistical convention of a 95% confidence interval, if 100 measurements are made, 5 of these are expected to be statistically different by chance (i.e., α=0.05). Various approaches have been employed to control for this “multiple comparison error.” The simplest and most conservative of these is the Bonferroni correction, which limits the odds that a given observation arises by chance by setting the p-value for significance at α/n. Therefore, if 1000 metabolites are measured and tested for significance, a p<0.00005 would be required to claim a significant difference. A less conservative approach is controlling the false discovery rate, which constrains the likelihood of making a false positive interpretation. However, an underlying assumption in these corrections is that the measured variables are independent and not collinear. In the case of metabolism, many metabolites are inherently linked. Thus, rather than incurring a penalty, repeated determination of a difference in a group of related compounds should enhance interpretations as to the veracity of the results. In practice, this means that users of such data must carefully determine the true number of “independent” measurements that exist in a multivariate data set before application of these p-value adjustments.
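Both steps above — grouping related analytes by hierarchical clustering and correcting p-values for multiple comparisons — can be sketched as follows. This is an illustrative example on simulated values, assuming SciPy is available; the FDR step shown is the Benjamini–Hochberg procedure, one common way of controlling the false discovery rate.

```python
# Sketch: Bonferroni and Benjamini-Hochberg (FDR) corrections on simulated
# p-values, plus hierarchical clustering of metabolites into analyte groupings.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)

# --- Multiple comparison corrections ---
p_values = rng.uniform(size=1000)              # 1000 tests, here all null
alpha = 0.05

# Bonferroni: significance threshold alpha / n = 0.05 / 1000 = 0.00005
bonferroni_threshold = alpha / len(p_values)
n_bonferroni = int(np.sum(p_values < bonferroni_threshold))

# Benjamini-Hochberg: largest k with p(k) <= (k/n) * alpha; reject p(1)..p(k)
ranked = np.sort(p_values)
bh_critical = alpha * np.arange(1, len(ranked) + 1) / len(ranked)
passing = np.nonzero(ranked <= bh_critical)[0]
n_fdr = 0 if passing.size == 0 else int(passing[-1]) + 1

# --- Hierarchical clustering into discrete analyte groupings ---
profiles = rng.normal(size=(30, 8))            # 30 metabolites x 8 conditions
Z = linkage(profiles, method="average")        # average-linkage dendrogram
groups = fcluster(Z, t=6, criterion="maxclust")  # cut tree into <= 6 groupings
```

Note that, as the text cautions, both corrections treat the 1000 tests as independent; when metabolites cluster into a handful of correlated groups, the effective number of independent comparisons (and thus the appropriate n in α/n) is considerably smaller.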
Beyond the statistical analysis, visualization of these complex data sets is an important tool for their analysis and interpretation. Where metabolic networks exist, fluxes and flows along metabolic pathways can be mapped. Similarly, correlation networks can be established showing interconnections between discrete pieces of information, helping to bridge and integrate the information into a more complete picture.
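Building such a correlation network can be sketched with NumPy alone: metabolites become nodes, and any pair whose absolute pairwise correlation exceeds a chosen threshold becomes an edge. The data and the 0.7 cutoff below are purely illustrative assumptions; the resulting edge list could be handed to any graph-plotting tool for visualization.

```python
# Minimal correlation-network sketch: metabolites as nodes, strong pairwise
# correlations as edges. Data and threshold are illustrative.
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(50, 12))               # 50 samples x 12 metabolites

# Make metabolites 0 and 1 move together, mimicking a shared pathway.
data[:, 1] = data[:, 0] + 0.1 * rng.normal(size=50)

corr = np.corrcoef(data, rowvar=False)         # 12 x 12 correlation matrix
threshold = 0.7                                # illustrative edge cutoff

edges = [(i, j)
         for i in range(corr.shape[0])
         for j in range(i + 1, corr.shape[1])
         if abs(corr[i, j]) > threshold]       # e.g. (0, 1) from the shared signal
```

Nodes connected by many edges then form the descriptive "nodes of information" discussed above, with densely connected subgraphs typically corresponding to metabolites moving in concert within a shared biologic process.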