Chemical safety evaluation: Limitations of emerging test methods

Jennifer McPartland, Ph.D., is a Health Scientist. Richard Denison, Ph.D., is a Senior Scientist.


This is the fourth in a series of blog posts on new approaches that federal agencies are exploring to improve how chemicals are evaluated for safety.  In this post, we’ll discuss a number of current limitations and challenges that must be overcome if the new approaches are to fulfill their promise of transforming the current chemical safety testing paradigm. 

  • In vivo versus in vitro.  In our second post in this series, we addressed the question of whether high-throughput (HT) tests tell us anything different than we could learn from traditional tests in laboratory animals.  We noted that traditional toxicity testing aims to determine whether a particular dose of a chemical results in an observable change in the health or normal functioning of the whole animal (in vivo), while HT tests look to see whether and by what mechanism a chemical induces changes at the cellular or molecular level (in vitro) that may be precursor events leading to an actual disease outcome.  These types of changes can’t be easily detected or measured in whole animals.  However, the converse question must always be asked:  Can tests conducted in vitro accurately reflect the effects that a chemical would have in the more complex and complete environment of a whole animal?

Both EPA and the National Research Council have acknowledged that high-throughput, in vitro methods may not capture processes affecting chemicals that may occur within the more complex biology of the whole organism or within or between tissues.  To quote an EPA study, “The most widely held criticism of this in vitro-to-in vivo prediction approach is that genes or cells are not organisms and that the emergent properties of tissues and organisms are key determinants of whether a particular chemical will be toxic.”  This concern frequently arises in related conversations around chemical metabolites.

The toxicity posed by a chemical does not always derive from the chemical itself; it can instead come from the compounds generated by its breakdown inside an organism (called metabolites).  A classic example is the polycyclic aromatic hydrocarbon benzo[a]pyrene, the metabolites of which are mutagenic and carcinogenic.  Metabolism can also work in reverse, of course, rendering a toxic chemical less toxic or non-toxic.  Many of the high-throughput assays utilized in ToxCast and other HT systems lack explicit metabolizing capabilities.  EPA is exploring ways to better incorporate whole-animal capabilities, such as metabolism, into HT tools.  But until there is greater confidence that these complexities are being captured, this issue will continue to limit the extent to which in vitro HT test data can be considered fully predictive of in vivo effects.

  • HT tests don’t yet adequately cover the waterfront.   Determining whether a chemical perturbs a biological pathway necessarily requires that the pathway is included within the battery of high-throughput assays.  In other words, it’s impossible to detect an adverse effect if it’s not being tested for.  Dr. Robert Kavlock, Director of the National Center for Computational Toxicology at the EPA, put this well during an interview with Environmental Health Perspectives on ToxCast, “And then another lack that we have is we’re looking at 467 [HT] assays right now. We may need to have 2,000 or 3,000 assays before we cover enough human biology to be comfortable that when we say something doesn’t have an effect, that we’ve covered all the bases correctly.”

Likewise, during the NexGen Public Conference, Dr. Linda Birnbaum, Director of the National Institute of Environmental Health Sciences (NIEHS), identified gene targets relevant to disease pathways involved in diabetes that currently are not included in the HT battery of assays.  These gene targets were suggested by experts during an NIEHS workshop on chemicals and their relationship to obesity and diabetes.  It will be critical for ToxCast-like efforts to continuously mine and integrate the latest science into their HT assay regimes.

  • Ability to account for diversity in the population.  Another challenge, not unique to newer testing methods, is the ability to account for real-world diversity among the human population that influences susceptibilities and vulnerabilities to toxic chemical exposures.  Individual differences in our genomes, epigenomes, life stage, gender, pre-existing health conditions and other characteristics are integral in determining the ultimate health effect of a chemical exposure.  Traditional animal toxicity tests typically use inbred, genetically identical animal strains to extrapolate and predict a chemical’s effect in humans.  This experimental design presents shortcomings not only because animal data are being used to predict effects in humans, but also because data from a highly homogeneous population are being used to make predictions for a very diverse human population.

Newer methods like HT testing will need to surmount similar constraints that arise from testing on homogeneous populations of cells or cell components.  As we mentioned in the last post in this series, use of stem cells offers some ability to account for early life stages, and it may be possible to use multiple, genetically diverse cell lines to incorporate genetic variations.  This challenge has not escaped the federal experts behind new testing initiatives.  In fact, NIH Director Francis Collins confronted this issue in a 2008 publication, commenting on federal research endeavors that involve testing thousands of compounds on different human cell lines to account for differential susceptibility to effects.

  • Accounting for different patterns of exposure.  We are not exposed to one chemical at a time; we are in fact exposed to multiple chemicals simultaneously.  Along the same lines, the frequency, duration, and intensity of exposure to a chemical or mixture of chemicals vary among us.  The ultimate impact of a toxic chemical exposure on our health may be quite different if that exposure happens, for example, one time late in life and at a high dose than if exposure is continuous, starting at a young age, and at a low dose.  Similar to the previous challenge, this is not a challenge specific to newer testing methodologies.  Nevertheless, whether and, if so, how these types of issues can and will be addressed in more novel testing strategies should be articulated to stakeholders.  Once again, this has been acknowledged by agency experts in a peer-reviewed publication:  “A related challenge is the understanding of what short-timescale (hours to days) in vitro assays can tell us about long-timescale (months to years) processes that lead to in vivo toxicity end points such as cancer.”

  • What level of perturbation is biologically significant?  At some point, an informed decision will need to be made as to what level of chemically induced perturbation observed in an HT assay is considered sufficiently indicative or predictive of a toxic effect.  In other words, even if an assay performs perfectly, determining how to interpret and translate HT data into a measure of actual toxicity to humans is a challenge, especially when one attempts to overlay issues like human variability.  Not to beat a dead horse, but again this is not an issue unique to newer testing methods.  With the newer methods, decision rules will be needed to govern extrapolation to humans, analogous to adjustment factors and other means used currently to extrapolate from animal studies, aided where possible by data from human epidemiological studies and other benchmark references.  What’s most important at this stage is that federal efforts continue to confront this challenge head-on and are transparent about the approaches used to translate HT assay data into measures of human toxicity.

  • Insufficient accounting for epigenetic effects.  Epigenetics is a burgeoning field of science that studies how gene expression and function can be altered by means other than a change in the sequence of DNA, i.e., a mutation.  As we have noted in an earlier post, epigenetic changes are critical to normal human development and function.  For example, epigenetics is the reason why skin cells stay skin cells and don’t change into kidney cells and vice versa.  Evidence is increasing that certain chemicals can interfere with normal epigenetic patterns.  For example, epigenetic changes induced by tributyltin have been shown to influence the programming of stem cells to become fat cells as opposed to bone cells.  The current ToxCast battery of assays is limited in explicitly measuring epigenetic effects of chemicals; see slide 13 here, and this description of one of the few such assays currently available.

  • False Negatives/False Positives.  Fundamental to the success of HT assays is their ability to correctly identify chemicals that are – and are not – of concern.  Such concerns inform EPA’s intense focus on validating these methods.  EPA’s validation strategy largely involves testing chemicals with well-defined hazard characteristics in the high-throughput assays, and seeing if the HT assays appropriately identify their hazards.

To the extent use of HT assays is presently discussed, it is generally within the context of screening chemicals for prioritization for further assessment.  Within this context, a proclivity of such assays to allow “false negatives” would be of much greater concern than their yielding “false positives.”  Why?

If a truly hazardous chemical isn’t “caught” in HT assays, then it could be erroneously deemed low-priority and set aside indefinitely.  While the converse could also happen – a chemical that is actually safe could be erroneously flagged as toxic and assigned a higher priority than warranted – such an error would almost certainly later be caught, as it would be subject to further scrutiny.
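To make the distinction concrete, here is a minimal sketch of how a validation exercise of the kind described above might be scored: run reference chemicals with known hazard status through an assay, then count where the assay agrees and disagrees. Everything in it is an illustrative assumption on our part, not EPA's actual procedure: the `ScreeningResult` record, the function names, and the placeholder chemicals and outcomes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    """One reference chemical run through a hypothetical HT assay."""
    chemical: str
    known_hazard: bool   # reference classification from existing data
    assay_flagged: bool  # whether the HT assay flagged the chemical


def confusion_counts(results):
    """Tally true/false positives and negatives against the reference."""
    tp = sum(r.known_hazard and r.assay_flagged for r in results)
    fn = sum(r.known_hazard and not r.assay_flagged for r in results)
    fp = sum(not r.known_hazard and r.assay_flagged for r in results)
    tn = sum(not r.known_hazard and not r.assay_flagged for r in results)
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn}


def false_negative_rate(counts):
    """Share of known-hazardous chemicals the assay failed to flag."""
    hazardous = counts["tp"] + counts["fn"]
    if hazardous == 0:
        raise ValueError("no known-hazardous reference chemicals")
    return counts["fn"] / hazardous


def false_positive_rate(counts):
    """Share of known-safe chemicals the assay flagged anyway."""
    safe = counts["fp"] + counts["tn"]
    if safe == 0:
        raise ValueError("no known-safe reference chemicals")
    return counts["fp"] / safe


# Made-up placeholder data, for illustration only.
EXAMPLE_RESULTS = [
    ScreeningResult("chem_A", known_hazard=True, assay_flagged=True),
    ScreeningResult("chem_B", known_hazard=True, assay_flagged=True),
    ScreeningResult("chem_C", known_hazard=True, assay_flagged=False),   # missed
    ScreeningResult("chem_D", known_hazard=False, assay_flagged=True),   # over-flagged
    ScreeningResult("chem_E", known_hazard=False, assay_flagged=False),
    ScreeningResult("chem_F", known_hazard=False, assay_flagged=False),
]

if __name__ == "__main__":
    counts = confusion_counts(EXAMPLE_RESULTS)
    print(counts)
    print("false negative rate:", false_negative_rate(counts))
    print("false positive rate:", false_positive_rate(counts))
```

In a prioritization setting, the two error rates would not be weighted equally: a chemical like the hypothetical `chem_C` drops out of view, whereas `chem_D` simply gets a closer look it did not need.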

Already, false negatives have arisen in ToxCast.  For example, during last year’s NIEHS Workshop on the “Role of Environmental Chemicals in the Development of Obesity and Diabetes,” experts examining organotins and phthalates noted that ToxCast high-throughput assays did not successfully identify chemicals known to interfere with PPAR—a protein important for proper lipid and fatty acid metabolism—in assays designed to flag this interference.

Now, using HT tools to screen chemicals is a realistic near-term first routine use of these technologies.   But we should proceed with caution, because even prioritization is a decision with consequences.  As Dr. Kavlock put it, “You want to have as few false negatives as possible in the system, because if you put something low in a priority queue, we may never get to it, and so you really want to have confidence that when you say something is negative, it really does have a low potential.”

  • Ultimate Challenge: Use in regulatory decision-making.  If federal agencies are serious about advancing the newer testing methods to a point where they can form the core of a new toxicity testing and assessment paradigm, by extension these methods will also need to be deemed a sufficient basis for regulatory decisions.  In comparison to the other challenges discussed in this post, this one is the ultimate challenge:  to move the new methods from the research and development phase to serve as the basis for risk management and other regulatory determinations.  A corollary implication is that data derived using the new methods must be able to meet statutory and regulatory standards governing how the safety of a chemical is to be determined.

To meet this challenge, regulatory bodies will ultimately need to attain sufficient buy-in or acceptance from relevant stakeholders in the industry, NGO, and governmental sectors.  And to achieve that buy-in, at a minimum each of the challenges we’ve laid out in this post will need to be addressed.  Now, all of this will not happen overnight, of course, and will likely take many years.  But it is imperative that the ultimate challenge is kept in sight, guiding the development of newer testing strategies as they move forward.


New molecular and computational testing approaches should ultimately improve our ability to protect human health and the environment from toxic chemical exposures.  To get there, however, scientific and stakeholder confidence in the capabilities of the new methods must be built.  In addition, explicit recognition and communication of their limitations, the associated uncertainty, and appropriate and inappropriate applications are essential as these methods mature and evolve.

Our impression from the NexGen Public Conference is that federal officials by and large embrace this perspective and forthrightly acknowledge some of the challenges we have discussed.  EPA’s earnestness in building confidence in the new approaches is evident in the intense research it is conducting to assess and test their capabilities.  A number of scientific publications by EPA personnel on high-throughput testing and related efforts are already available here.

EPA is also working on a series of chemical-disease/disorder prototypes.   The agency is using these prototypes as case studies to determine whether researchers can successfully “reverse-engineer” well-established adverse outcomes for data-rich chemicals using the newer types of molecular, systems biology-based methods.  The four initial draft prototypes underway are:  1) lung injury caused by ozone, 2) developmental impairment caused by thyroid hormone disruptors, 3) cancer caused by polycyclic aromatic hydrocarbons, and 4) cancer caused by benzene.

We applaud EPA’s and other agencies’ efforts in improving chemical assessments and ultimately chemical risk management.  The impetus for NexGen is in part due to the thousands of chemicals on the market that have not been adequately assessed for safety.  We should all support developing means to assess chemical safety better, faster, at lower cost, and with fewer laboratory animals.  This is part of the vision laid out in two seminal National Academy reports, Toxicity Testing in the 21st Century and Science and Decisions.

But at the same time, we can’t get too far ahead of ourselves.  New approaches must be validated as effective for their intended uses and contexts before being so used.  And every effort needs to be directed at clearly communicating uncertainty and limitations even as their use expands.  Doing so will not only build stakeholder confidence in the project, but will also afford opportunities to work on solutions to address those uncertainties and challenges.

For EDF’s part, we believe the agency’s recent flurry of activity—ranging from CompTox to NexGen to Tox21 to the recent rollout of the Chemical Safety for Sustainability Program—is important to follow, encourage and help shape.
