Posted By: Sarah Ratzel, PhD, Science Editor, AJHG
Each month, the editors of The American Journal of Human Genetics interview the author(s) of a recently published paper. This month, we check in with John A. (Tony) Capra and Will Bush to discuss their paper, “Comprehensive Analysis of Constraint on the Spatial Distribution of Missense Variants in Human Protein Structures.”
AJHG: What caused you to start working on this project?
Tony: The roots of this project go all the way back to when I was in graduate school. As a graduate student, I studied how quantifying evolutionary patterns in protein sequences and structures between species could help us understand their functions (e.g., Capra et al. 2009). Then, I transitioned to working on the genetics of recent human evolution and didn’t think much about proteins for several years. When I started my own lab, a colleague came to me with a question about the function of a protein-coding variant in a human protein. As lead author Mike Sivley and I mapped the evolutionary conservation of this variant and its 3D neighborhood across species, we realized that it was silly not to include the wealth of information about genetic variation within human populations in these analyses as well. Around the same time, my colleague Will Bush had a similar idea. Once we got together and implemented a pipeline to map a few variants into protein structures, there was nothing (except lots of debugging!) to stop us from doing it comprehensively. More than 4 million variants later, we had this paper.
Will: I have had a long fascination with structural biology, and have focused much of my work on genomic analyses that are informed in some way by the biological context where variation occurs. This project started for me when multiple studies were published using technologies that explicitly target coding variation, which point to protein-level thinking. Around this time, I met Tony, who has expertise in protein evolution, and this project felt like the perfect way to start a new collaboration.
AJHG: What about this paper most excites you?
Tony: This paper is a great example of how looking across fields can help solve hard problems. Once we had mapped protein-coding variants into 3D structures, we needed to find a way to quantify whether their spatial patterns exhibited evidence of constraint. After several failed attempts, we realized that this problem had a lot in common with questions that field ecologists commonly ask about the distribution of individuals across physical ranges. A bit of reading revealed the Ripley’s K framework for evaluating and comparing spatial distributions of observations. We had to adapt the methodology for our application, but making this connection to a problem in another field provided the foundation for our solution. I like that an approach from ecology helped us to re-establish a strong link between human genetics and structural biology.
Our results also illustrate why data sharing is so important. By putting two big publicly available databases together, we were able to learn something new about how genetic variants are constrained in 3D space. It would not have been possible without the efforts and foresight of the groups that collected and maintain protein structural data (the Protein Data Bank) and genetic variation data (gnomAD, COSMIC, TCGA, and ClinVar). Thank you to all of them!
Will: Like Tony, I am excited about the potential of modeling genomic data in a totally different way! The field of geospatial analysis has grown dramatically over the last few years, so using Ripley’s K just scratches the surface of the potential approaches that could be applied in this context. Given all the data that is available for research, the idea of data integration has become quite popular, but there are often many methodological hurdles to combining data of different types or from different domains in a coherent way. I’m excited that our work contributes in this area, and I echo Tony’s thanks to all the wonderful resources that provided the data we used in this work.
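Editor's note: for readers unfamiliar with Ripley's K, the sketch below shows the basic textbook estimator applied to points in 3D. It is not the adapted method from the paper, which the authors say required changes for protein structures. The function names, the bounding `volume` argument, and the toy coordinates are all assumptions made for illustration, and no edge correction is applied.

```python
import math
from itertools import combinations


def ripley_k(points, radius, volume):
    """Naive Ripley's K estimate for 3D points (no edge correction).

    points: sequence of (x, y, z) coordinates (hypothetical input format).
    radius: distance threshold t.
    volume: volume of the region the points were observed in (assumed known).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count unordered pairs of points that lie within the radius of each other.
    close_pairs = sum(
        1 for p, q in combinations(points, 2) if math.dist(p, q) <= radius
    )
    # Each unordered pair counts twice in the ordered-pair sum over i != j.
    return volume * 2 * close_pairs / (n * (n - 1))


def csr_expectation(radius):
    """Expected K under complete spatial randomness in 3D: volume of a sphere."""
    return 4.0 / 3.0 * math.pi * radius ** 3


# Toy example: two nearby points and one distant point (made-up coordinates).
toy_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
k_observed = ripley_k(toy_points, radius=1.5, volume=100.0)
k_random = csr_expectation(1.5)
```

An observed K well above the random expectation at a given radius suggests clustering at that scale, and a value well below it suggests dispersion. In practice, significance is usually judged against simulated or permuted point sets rather than the closed-form expectation alone.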
AJHG: Thinking about the bigger picture, what implications do you see from this work for the larger human genetics community?
Tony: This paper provides a framework that I believe will improve analysis strategies in both human genetics and structural biology. Both fields have seen substantial increases in the amount of data available over the past 15 years, and our work illustrates the potential to extract insight from the integration of patterns of human genetic variation with 3D structures. We have many new ideas about fully leveraging this combined point of view.
I hope that the human genetics community will recognize that structural biology has many powerful tools that can help us with variant interpretation. However, our results demonstrate that getting the full benefit of the structural perspective requires considering the complex 3D context of variants. This goes beyond the basic structural information, like secondary structure, that is often included in variant pathogenicity predictors.
We also think that we human geneticists have a lot to teach structural biologists, especially about the flexibility and dynamics of their structures. But that’s a topic for another paper!
Will: Beyond our key findings, I hope that this work will inspire other ways to think of the genome in 3D! Chromatin conformation studies are now producing spatial maps of DNA within the nucleus, and we know that these patterns influence gene expression. Long non-coding RNAs fold into complex forms to achieve their functions – many possibilities exist!
AJHG: What advice do you have for trainees/young scientists?
Tony: Talk to diverse scientists (and non-scientists). This will help you make unexpected connections between fields. Much of the motivation for this project came out of the fact that my office happens to be on the same floor as the Vanderbilt Center for Structural Biology. Different fields have powerful datasets and methods that have direct relevance to important problems (like variant interpretation). The challenge is finding them and then figuring out how they fit together! It is much easier to be creative when you have a broad knowledge of what is state-of-the-art in different fields.
Will: Keep your work organized and persevere. Mike Sivley is a meticulous note-taker, so it was easy at any given moment to go back to prior results and put everything together. Taking good notes is also a great way to know what questions you are asking, and to push through until you have an answer. With any project, there is a time when multiple setbacks make you question the whole endeavor. Looking back over notes from an entire project is the best way to see how much you’ve learned in the process, and that can be a strong motivator to push forward.
AJHG: And for fun, tell us something about your life outside of the lab.
Tony: I secretly want to be a bartender. I suspect this is because I watched too many re-runs of Cheers when I was young. I also hate getting to work before noon.
Will: I intentionally schedule my meetings with Tony before noon, and I really love a good bourbon, especially from Tony’s bar.
Tony Capra, PhD, Assistant Professor at Vanderbilt University, has been an ASHG member since 2012. Will Bush, Assistant Professor at Case Western Reserve University, has been an ASHG member since 2005 and served on the Society’s Communications Committee from 2012 to 2017.