Data Speaks

One of the many ways the CTSC “builds research teams of the future to improve human health” is through data curation, provisioning, and analysis. Over the last few years, a small team – in collaboration with colleagues within UC Davis, across the UC system, and from the CTSA consortium – expanded the resources to support biomedical research at UC Davis.

Researchers have extensive clinical and health data available; however, simply getting access is not enough. Research teams also need to know how to evaluate the data for meaning and use it to develop information that drives knowledge-based decision making in this data-rich but information-poor environment. When clinician scientists need assistance with data – access, use of tools to characterize and generate knowledge, and analysis – the CTSC Biomedical Informatics and Biostatistics teams are poised to help.

Beginning With Data Fluency

Data is a foundational requirement that communicates and informs patient care and research strategies. When working in a data-driven healthcare organization, it is important to create common expectations about resources and approaches that can transform complex concepts into actions. A common barrier many researchers face is limited experience in defining and communicating their own data needs. Communication in a manner that engenders collaborative discovery requires a common language. By enhancing interdisciplinary data fluency, researchers can more effectively engage their peers and community to facilitate shared problem solving and faster research. Furthermore, researchers who are data fluent can more easily gain actionable insights unique to their subject domain expertise by gathering, organizing, interpreting, and presenting data in order to achieve innovations and improve health.

What is data fluency and why is it important?

Data fluency – also known as data literacy – is the ability to collect, manage, evaluate, and apply data in a critical manner to generate information and new knowledge. Increasing data fluency empowers researchers to do more with data. Broad variation in data fluency competency exists across the research community. For example, researchers may be curious about a question they believe data can answer, but they may be unsure how to determine what data may be available, how to procure and interpret the data, and/or whether there are existing efforts in the use of such data. Deliberate upskilling in data fluency is an effective way to enhance appropriate use of data across interdisciplinary teams and thereby accelerate innovative research.

How do I upskill my data fluency competency?

In partnership with the UC Davis Library, the CTSC offers services to health researchers that promote data fluency through a variety of tools and resources. The Blaisdell Medical Library (BML) Health Library Informaticist, Christy Navarro, (see inset) is available to help elevate data fluency competencies at UC Davis Health. Here are her suggestions on how to get started:

  1. Determine where you are today. The Data Literacy Project is “a global community dedicated to creating a data-literate world.” This website offers resources to assess your skills, learn new skills, and engage with others on the use of data. By answering a few questions, the assessment tool will create a personalized list of learning resources and recommendations.
  2. See what resources are available. UC Davis Health has a robust data resource homepage that lists collections of available data; tools to analyze, synthesize, and interpret data; and services to help researchers collect and use data effectively.
  3. Ask for guidance. Contact the Health Library Informaticist for assistance identifying various resources across UC Davis Health and the UC Davis Library. Requests are also used to prioritize and develop new workshops.
  4. Sign up for notification of CTSC events. The CTSC events calendar lists workshops, monthly office hours, and other trainings. Schedule automated calendar updates by clicking on the Subscribe link found above the event calendar. Also consider signing up for the CTSC announcement listserv by sending an email to HS-CTSC Mail.

Christy Navarro, M.S.
Health Library Informaticist

Introducing Christy Navarro

Christy Navarro serves the health sciences community as a Health Library Informaticist at Blaisdell Medical Library. With 20 years of experience in health care and 15 years of experience in consumer, patient, and research subject privacy, as well as information security, she is passionate about helping researchers meet their research goals and accelerating innovative research through the dissemination of health data resources. Her interests include the study of algorithmic bias that reflects systematic discrimination. With partial support from the CTSC, Navarro provides workshops, develops guides, holds office hours, and consults on a variety of topics related to health data.

What are the components of data fluency?

Data fluency incorporates concepts and competencies – both core and advanced – that allow researchers to harness the power of data. As shown in the diagram below, data fluency competencies are organized into 5 main categories relating to data utilization. The CTSC and BML resources establish a conceptual framework and help researchers collect, manage, evaluate, and apply best practices to health data.

Data fluency graphic

Source: Ridsdale, Chantel and Rothwell, James and Ali-Hassan, Hossam and Bliemel, Michael and Irvine, Dean and Kelley, Dan and Matwin, Stan and Smit, Mike and Wuetherick, Brad. (2016). Data Literacy: A Multidisciplinary Synthesis of the Literature. Diagram used with permission of authors.

Speaking of Data

As with most industries and domains that involve complex and abstract topics, various metaphors have evolved to help people understand processes they cannot easily “see” for themselves. Here are a few examples:

  • Data Curation: A metaphor from museums with artifacts. The museum curator collects artifacts from collectors and carefully documents where the artifact came from and what is known about it. Data curators collect “raw” data and associate meta-data with it so that others know where the data came from and what it means within the context of its origins.
  • Data Archaeology:  A term used by Identifier Technology Health Indicators staff to remind people that clinical data have complex histories. Like archaeologists uncovering “layers” of history, data archaeologists peel back the layers of data to provide context and meaning.
  • Data Provenance: A fancy term about the source of the data, or how it came into existence.

Emerging Resources

The CTSC has long provided biomedical informatics expertise, resources, services, and training at UC Davis Health. As defined by the American Medical Informatics Association (AMIA), "Biomedical Informatics (BMI) is the interdisciplinary field that studies and pursues the effective uses of biomedical data, information, and knowledge for scientific inquiry, problem solving, and decision making, motivated by efforts to improve human health." The advent of COVID-19 research and its associated public health ramifications dramatically accelerated the production and provision of data and became a catalyst for bringing together resources to curate, access, and collaborate across health and non-health research environments. As these resources proliferate, utility, access, and application remain a focus for the CTSC. To assist researchers on the quest for answers to health questions, the CTSC provides, supports, and links researchers with several emerging data resources:

  • The UC Health Data Warehouse (UCHDW) aggregates data from the five UC medical centers. UCHDW adopted the Observational Medical Outcomes Partnership (OMOP) data model, which allows for the systematic analysis of disparate observational databases. Data, dashboards, and visualizations are available for 5 million+ patients seen since 2012 resulting from 100 million+ encounters and treated by nearly 100,000 health care providers, with 300 million+ procedures, more than a quarter billion medication orders, and over 1 billion vital signs, measurements, and test results. Over 600,000 of these patients are primary care patients. Success of this collaboration led to the creation of the Center for Data-Driven Insights and Innovation (CDI2), which now oversees the UCHDW. Jason Adams (see next story) is the UC Davis Health representative for the CDI2.
  • CORDS is the COvid Research Data Set, a de-identified subset of the UC-wide Health Data Warehouse composed of electronic medical record data from patients tested for COVID-19 across all 5 UC medical centers. Usage of this limited data set is considered ‘non-human subjects research’; as such, research and public health use of this data set will not require individual IRB approval for investigators. Nicholas Anderson, Director of the CTSC Informatics program, engages with CDI2 and other related datasets through his role with UC BRAID (Biomedical Research Acceleration, Integration, and Development).
  • N3C is the National COVID Cohort Collaborative, a new analytic platform under development by the NIH through the stewardship of the National Center for Advancing Translational Sciences (NCATS) and contributions of the Clinical and Translational Science Award (CTSA) informatics community. It contains clinical data from the electronic health records of people across the United States who were tested for the novel coronavirus or have had related symptoms. Over 2.8 million patient data records (February 2021) have been extracted from healthcare centers with a CTSA hub (e.g., the CTSC at UC Davis). The data are transformed into a set of harmonized data models for use by scientists and citizens in different forms to study COVID-19, including potential risk factors, protective factors, and long-term health consequences.
  • Accrual to Clinical Trials (ACT) Network is a real-time data query platform that allows researchers to explore and validate feasibility of clinical studies from the desktop. This secure, HIPAA-compliant, and IRB-approved dataset (acquired from CTSA hubs) helps researchers design and complete clinical studies by providing aggregate patient count data for cohort discovery and iterative testing.
  • NCATS’ National Center for Data to Health (CD2H) accelerates advancements in informatics by using findable, accessible, interoperable, and reusable (FAIR) principles to promote collaboration across the CTSA Program community. CD2H tools and resources make it simple and valuable for CTSA Program members to get engaged, connect with peers, and contribute. By promoting collaboration, CD2H fosters a robust translational science informatics ecosystem that collectively develops solutions to solve clinical problems faster, more efficiently, and more effectively.
  • DataLab facilitates the application of data science and informatics methods and best practices to enhance research and learning in all domains across the University. Working across the entirety of the research data pipeline, DataLab provides support and training from method and algorithm development to analysis and visualization of translational, high-impact data-driven research. Computer scientists on the UC Davis campus are available to assist researchers in accessing/utilizing complex data such as CORDS and the N3C national data warehouses. The CTSC provides partial support for two graduate student researchers to augment resources, identify opportunities, and facilitate collaborations.
  • The UC Davis Center for Data Science and Artificial Intelligence Research (CeDAR) responds to the great challenges of our society by translating complex data into powerful solutions. Data science and the development of artificial intelligence (AI) have widespread applications. Nicholas Anderson, faculty director of the CTSC Informatics program, is associate director of CeDAR, which advances data science foundations, methods and applications, and weaves them into the fabric of the university, promoting a highly efficient exchange of information and expertise to enhance real-world data science applications.
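For readers curious what "aggregate patient count data for cohort discovery" can look like against an OMOP-style data model, the following is a minimal sketch, not the query used by UCHDW, CORDS, N3C, or the ACT Network. It assumes a tiny in-memory SQLite database with simplified versions of the OMOP `person` and `condition_occurrence` tables; the sample rows, the concept ID `999001`, and the helper name `cohort_count` are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical concept ID standing in for a condition of interest.
TARGET_CONCEPT_ID = 999001

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
""")

# Fabricated toy rows, for illustration only.
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(1, 1950), (2, 1985), (3, 1960), (4, 2001)],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (10, 1, TARGET_CONCEPT_ID, "2020-04-01"),
        (11, 1, TARGET_CONCEPT_ID, "2020-09-15"),  # same patient, second record
        (12, 2, TARGET_CONCEPT_ID, "2020-06-20"),
        (13, 3, 999002, "2020-07-04"),             # a different condition
    ],
)


def cohort_count(db: sqlite3.Connection, concept_id: int, max_birth_year: int) -> int:
    """Count distinct patients with the condition who were born in or before max_birth_year."""
    row = db.execute(
        """
        SELECT COUNT(DISTINCT p.person_id)
        FROM person AS p
        JOIN condition_occurrence AS c ON c.person_id = p.person_id
        WHERE c.condition_concept_id = ?
          AND p.year_of_birth <= ?
        """,
        (concept_id, max_birth_year),
    ).fetchone()
    return row[0]


print(cohort_count(conn, TARGET_CONCEPT_ID, 1970))
```

Only an aggregate count leaves the query, never row-level patient data, which is the general idea behind a feasibility check before a study is designed. Counting distinct `person_id` values keeps a patient with several records from being counted more than once.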

Supporting Data Integration Through Collaboration

In 2018, UC Davis Health IT initiated a new research informatics capability to build a strong and sustainable collaborative environment for clinical, research, and quality impacts. Co-directed by Jason Adams, former CTSC MCRTP scholar, and Kent Anderson, associate director of the CTSC Informatics program, the UC Davis Health Data Provisioning Core (DPC) links clinical expertise, applied informatics, and IT research engineers to advanced clinical domain capabilities. The DPC supports the majority of health data analysis needs at UC Davis Health through the provisioning of well-characterized health-related data and derived information to data requestors.

Data Provisioning Core

The mission of the DPC is to: 1) Implement best practices in clinical informatics to guide the acquisition, transmission, aggregation, semantic curation, characterization, protection, and delivery of health data for secondary use, and 2) Enable the learning healthcare system through a generation of reusable enterprise data assets and collaborative partnerships between users of data and UC Davis Health IT.

Jason Adams, M.D.
Director, Digital Health Innovation,
Co-Director, UC Davis Health Data
Provisioning Core (DPC),
Co-Chair, UC Davis Health Data
Oversight Committee (HDOC)

The first phase included developing collaborative teams and then identifying faculty to lead specific use cases for integrating clinical and analytic environments. The second phase expanded access to these resources with an applied research focus that engaged a broader selection of faculty and partners for advancing quality assurance, as well as clinical and research-initiated work in cancer and genomics.

Kent Anderson, M.S.
Director, IT Informatics,
UC Davis Health,
Associate Director, CTSC
Biomedical Informatics Program,
Co-Director, UC Davis Health
Data Provisioning Core (DPC)

The CTSC partnered with the UC Davis Comprehensive Cancer Center and Research Informatics to develop the Cancer Center Data Integration Informatics Initiative (CCDI3) – the cancer data domain-specific unit within the DPC. The CCDI3 provides access to clinical data on UC Davis patients, as well as computational tools, methods, and expertise that link patient phenotypes, lifestyle factors, biomarkers, genotypes, health monitoring data, and other ‘omics’ data to optimize individual treatment plans and longitudinal evaluation.

Now in its third phase, the DPC has expanded to include broader collaborations and additional clinical specialties (Pulmonary Critical Care, Emergency Medicine, and Ambulatory Care) within UC Davis Health. The DPC represents an innovative, data-driven model to enhance the application of advanced computational methods for health data analysis.

CTSC Biomedical Informatics Resources

Nicholas Anderson, M.S., Ph.D.
Director, CTSC Biomedical
Robert D. Cardiff Professor of
Director of Informatics Research,
UC Davis Health

The CTSC Biomedical Informatics team, led by Nicholas Anderson and Kent Anderson, provides access to clinical and translational informatics tools, data, training, and expertise. These resources support and expand capabilities for the research community through design and analysis of clinical and integrative data-driven research. Through consultation, resources, and expertise, the Biomedical Informatics component of the CTSC provides researchers with access to tools at all stages of the translational science research cycle. The team facilitates essential partnerships across UC Davis to share common needs for data access, data management, and training. The program provides researchers with access to emerging research initiatives to advance data-driven precision medicine, population health, and technology-enabled medicine. The following longstanding resources are described in detail on the CTSC website:

  • Consultation
  • EMR Data Retrieval for Research
  • Biorepository Core Resource
  • UC Davis Cohort Discovery and the ACT network
  • Research Electronic Data Capture (REDCap)

Fending Off Analysis Paralysis

"Data is like garbage. You’d better know what you are going to do with it before you start collecting it."
- Mark Twain

The need for data in research is ubiquitous. Knowing what data are needed and what to do with them is priceless. Among the many resources the CTSC provides to researchers, the biostatistics service is one of the most often requested. Managed by Sandra Taylor, the CTSC Biostatistics program strengthens research plans through study design, analysis, and consultation. The program supports clinical and translational research design through training, collaboration, service, and discovery in biostatistics, research design, and epidemiology. A strong outreach and partnering function supports interdisciplinary approaches to clinical and translational research. A team of doctoral- and master's-level biostatisticians assists in the development of protocols, statistical plans, and data safety monitoring plans; in data analysis; and in preparing statistical sections of grant applications, abstracts, and manuscripts.

The CTSC Biostatistics team provides office hours to answer researchers’ questions, consultations, seminars, and up to 10 hours of assistance for faculty and staff (2 hours for medical residents and students) – all at no charge to biomedical researchers. More in-depth support is available on an hourly basis or as effort when grant funding is allocated. During the consultation, the biostatistician will also determine the sample size necessary for valid statistical inference. Requesting assistance at least 6 weeks before a grant proposal deadline will allow adequate time for thoughtful review and meaningful input.
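To give a sense of what a sample size determination involves, here is a minimal sketch for one common textbook case: comparing the means of two equal-sized groups with a two-sided test, using the normal approximation n = 2 × ((z for 1 − alpha/2 + z for power) / d)², where d is the standardized effect size (difference in means divided by the common standard deviation). This is an illustration only, not the method the CTSC biostatisticians use; the function name `n_per_group` and the default values are assumptions, and a real study design will usually call for a method tailored to its outcome and analysis plan.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided, two-sample comparison of means.

    effect_size is the standardized difference (difference in means / common SD).
    The normal approximation is used, so the result can be slightly smaller
    than one based on the t distribution.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# A medium standardized effect (0.5) at alpha = 0.05 and 80% power.
print(n_per_group(0.5))  # prints 63
```

Halving the effect size roughly quadruples the required sample, which is one reason an early conversation about the expected effect is so valuable.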

CTSC Biostatistics Program Management - Sandra Taylor, Ph.D.

Sandra Taylor oversees daily operations of the CTSC Biostatistics Program. In this role, she allocates requests for statistical assistance to CTSC biostatisticians, educates investigators about statistical support available through the CTSC, and seeks to improve research quality by promoting greater integration of statistics into project planning and execution. Taylor earned her doctoral degree in Biostatistics from the University of California, Davis and holds a master's degree in Zoology and Physiology.


Sandra Taylor, Ph.D.
Principal Biostatistician, Program Manager,
CTSC Biostatistics Program

In collaboration with other UC Davis Health Biostatistics Cores, the CTSC Biostatistics team regularly delivers seminars on statistical topics of interest to clinical and translational investigators. In addition, the CTSC supports a robust array of resources on the CTSC website and an annual award for extended biostatistical services to an implementation science project selected from among the presenters at the UC Davis Health Annual Health Quality Symposium.

Among their many successes, the CTSC Biostatistics team has a long, successful history working with scholars to advance their research careers. With subject matter expertise provided by the Biostatistics team, CTSC scholars have published, received pilot awards, developed commercial products, and advanced their careers.

Experiencing Data Overload?

We recognize that we have covered a lot of ground in this issue of the newsletter. However, there’s no need to remember all of the details.

The UC Davis IT Health Informatics team (Kent Anderson, Director; Jason Adams, Medical Director) provides a centralized resource that augments the CTSC informatics webpages. The Health Data Resources website is a compilation of the data assets, software tools, and analytics support resources available to access and facilitate the most effective use of data.

This directory provides clinicians, faculty, and staff the ability to leverage the data-rich electronic health records at UC Davis Health. Integrated clinical, financial, and operational data provide timely information that adds knowledge and identifies opportunities to improve efficiency and patient care. From raw EMR data sources to highly curated and validated datasets, the data assets go beyond making decisions about patients at the time of care to enabling deeper analyses of patient populations.

Another place to find information about health data utilization and management and many other aspects of clinical research is the CTSC Clinical Research Guidebook. Managed by the CTSC Clinical Trials Office (Kate Marusina, Director), this comprehensive online compendium details the vast array of processes, procedures, and resources for health research available at UC Davis. Please note that access to the guidebook requires a UC Davis Health login.

"A wealth of information creates a poverty of attention." - Herbert Simon