We recently spoke with Geoffrey Ginsburg, MD, PhD, Chief Medical and Scientific Officer at All of Us, who leads the effort to set the scientific vision and strategy for the program and oversees the program’s collection and curation of data and the integration of new data types.
All of Us is a precision medicine research program at the National Institutes of Health (NIH) that seeks to recruit at least one million participants in the United States. Earlier this spring, All of Us shared a program announcement, highlighting the release of its initial genomic dataset.
What led you to join All of Us?
I have been in the precision medicine space for most of my career, pushing for the ability to use a variety of data sources for impact on human health. In my prior position at Duke University, I ran a precision medicine program in implementation science and the utilization of genetic and genomic testing in healthcare environments. When the All of Us opportunity came along, I saw it as a once-in-a-lifetime chance to shape a national research agenda focused on precision medicine. I was highly motivated to contribute to NIH because I’ve been well funded by NIH for most of my academic career, and this position is really a chance to give back in terms of public service. All of Us, in my mind, puts everything together, integrating huge data sources for discovery and translation.
What scientific framework do you envision for All of Us?
I was brought on to help craft the scientific agenda for the next five to ten years. I have embarked on a journey to create a living scientific agenda: science is not static. It is a daunting task to shape a scientific agenda of this magnitude, but a very exciting one. To date, the focus at All of Us has rightfully been on enrollment, recruitment, data quality, data capture, and building the infrastructure to make this entire program work. I think we are now at an inflection point where we have an enormous amount of data and scientists who are excited to use the data.
We plan to release the first version of our scientific plan this year. First and foremost, our focus will be on important scientific themes that cut across multiple disease areas. One important theme for us is health equity and diversity, specifically how health and disease states manifest across diverse populations. All of Us is disease-agnostic, so this research program is uniquely poised to ask these questions among longitudinal population studies. An example is the genesis of risk models that include polygenic risk scores emerging from genetics studies, focusing on the social determinants of health and the environmental determinants of health for a holistic picture of risk.
We recently announced a funding opportunity to the extramural community that will specifically use the All of Us dataset as the foundation. In parallel, we are working with specific institutes on ancillary studies where they can use All of Us data as a platform to enrich their scientific agendas.
What is unique about this dataset and what kinds of analyses will be possible?
Most longitudinal population studies have not been able to reach the level of diversity that we have, which makes All of Us a unique resource for the research community. The latest data that I’ve seen suggests that 80% of our current participants identify with a group that has been historically underrepresented in biomedical research. This includes groups that have not been actively engaged in research, such as those in rural environments, certain age groups, and people with disabilities. We are actively changing the paradigm of research to be fully inclusive, which is a guiding principle for us.
I am excited about the release in March of nearly 100,000 whole genome datasets and around 165,000 genomic arrays. This data is phenomenal and unique in its representation of groups never seen before in research databases. The genomics teams at NIH and at other centers found nearly 600 million novel variants. More than 75,000 genomic datasets are linked to other data types, so we have a wealth of “complete” data. For the ASHG community, this provides an enormous opportunity to do exploratory or confirmatory research. Our projects directory lists the ongoing research projects using All of Us data.
Why is diversity important to the All of Us Research Program?
This program is designed to help advance precision medicine opportunities for all people in the United States. We need to think about the diversity of this country’s population and the lack of representation of many demographics historically in research, including the variety of socioeconomic backgrounds and ancestries. There are significant gaps that exist in research data that bias the use of those data, which has been shown in many papers and by many groups over the years. The fruits of our labor in the All of Us program cannot be fully realized unless we address these issues head-on. We need to make sure the program is not only relevant to all, but valuable to all.
Our Data Snapshots page details the demographics of our participants. This includes all participants recruited to date, although it is not necessarily reflective of data that’s been curated into the Researcher Workbench.
In what ways have you sought to engage communities or reach researchers that are typically underrepresented in research? To what do you attribute your success in recruiting such a diverse cohort?
Diversity of participants and engagement of communities have been priorities since the inception of All of Us. We are actively listening to participants and engaging ambassadors from underrepresented communities. We have a chief engagement officer, Karriem Watson, who is fully dedicated to this strategy and has been actively reaching out to a wide range of communities within our country, including by working closely with Historically Black Colleges and Universities (HBCUs) and universities with a diverse researcher base. He works across NIH institutes involved in minority health, has established great partnerships with the researcher community, and has initiated and promoted minority research symposia. Some of the research supplements that we announced are specifically diversity supplements or are focused on researcher communities who are carrying out research agendas on diverse populations. We also have a director of health equity, Martin Mendoza, whose primary mission is to ensure that our research agenda embraces the full diversity of our participants and engages a diverse researcher population. These partnerships are most appreciated and are essential if our scientific agenda is to reflect both the diversity of our participants and the diversity of researchers.
What is next for the All of Us Research Program? What new data types or opportunities are being explored? When might the research community anticipate further releases of data?
The data we have now is waiting to be explored by the researcher community. We already have more than 2,000 researchers actively using the data. These researchers are affiliated with the nearly 350 institutions with Data Use and Registration Agreements in place. Our goal is to have the number of participating researchers go beyond 10,000, maybe 20,000, in five to ten years.
There are five core data streams: electronic health records (EHR), whole genome sequencing and genotyping arrays, wearable digital technologies, biometrics, and health surveys. We have nearly 500,000 consented individuals and close to 350,000 participants who have data across most of the five core datasets, so there is an enormous opportunity to just look at the existing data. Researchers can look forward to the second release of genomic data in early 2023.
One of the components of the scientific plan will be to address what research questions we can answer with the existing data and what kinds of questions we want to answer in the future that require new data types. We are exploring the utility of data types such as geospatial linkages, raw imaging data, and longitudinal data from wearables. We are very cognizant of the potential for the digital divide to put more bias in the data, so we are actively thinking about ways to bridge that divide and to engage people with technology in ways they haven’t before.
What would a researcher need to know about gaining access to the data? What information can be accessed through the Data Browser compared to registering to request access to the health data at the Research Hub?
There are three tiers of access: 1) the Public Tier, 2) the Registered Tier, and 3) the Controlled Tier. The All of Us Researcher Workbench requires registration to access the Registered Tier and Controlled Tier data.
The publicly available participant data is accessible through the Data Browser. Currently, participant-provided information, including data from surveys, wearable devices, physical measurements taken at the time of participant enrollment, and EHR, is available. Total counts for certain disease areas, like COVID-19 or heart disease cases, are searchable in this tier.
Access to the Registered Tier requires data-use agreements between the researcher’s organization and the All of Us Research Program. Protecting participants’ identity and privacy is a key issue, so date of birth and other potentially identifying information are obfuscated in this tier. The only organizations that currently have access to the data in this tier are not-for-profit or healthcare-related institutions. The genomic data resides in the Controlled Tier along with more complete demographic information.
There are a few educational modules that researchers need to pass to get access to Controlled Tier data. These modules are designed to provide guidance to the research community on data privacy and security. We want researchers to have a positive experience using our data. We are being proactive with training programs and workshops to walk researchers through how to get on to the Researcher Workbench and what is needed to start analyzing the data.
Are there lessons other researchers can learn from All of Us?
Programs aspiring to create diverse representation in their own databases should be open to conversations with us. All our tools are in the public domain. All of Us can help other programs, large and small, learn from some of our successes as well as our mistakes and use the processes we have optimized. The All of Us Research Program is also involved with the International HundredK+ Cohorts Consortium (IHCC), which aims to create a global platform for translational research. IHCC can be a resource to learn how these large-scale efforts have worked in low- and middle-income countries and could inform outreach to parts of the U.S. as well. Finally, we invite all researchers to join us, come to the All of Us Research Hub, ask and answer your questions using our data, and help us all to advance precision medicine.