Dr. Kim-Anh Do with colleagues at COTS 2026

Our research team works at the forefront of translational data science, focusing on COmputational Statistics, Medicine, and Integrative omiC Research (COSMIC).

Our group develops statistical and machine-learning methods for translational data science challenges, including the integration and analysis of biological omic data (genomic, transcriptomic, proteomic, metabolomic, microbiome, single-cell RNA sequencing, and spatial transcriptomic data). We also build computational tools that apply these methods efficiently to large biological data sets, small clinical data sets, and large pan-cancer data sets.
We are committed to advancing statistical research and its impact on healthcare. Rigorous statistical methods are essential for unraveling the complexities of correlated data in biology and medicine, and they contribute directly to progress in these fields, particularly in personalized medicine and precision oncology.
Our group offers collaborative expertise to data scientists, basic and computational biologists, clinicians, bioinformaticians, geneticists, and statisticians in the following areas:
  • Bayesian Methodology
  • Graphical and Neural Network Modeling
  • Feature Selection of Biomarkers
  • Deep Learning Methodology and Causal Models to Maximize Prediction Performance
  • Clinical Trial Design
  • Survival Modeling
We also design state-of-the-art courses for the education and training of the next generation of computational statisticians and data scientists.

In the News

Posted on 08-13-2024

Kim-Anh Do selected to receive the Janet L. Norwood Award for 2024!

This award is given annually and recognizes outstanding achievement by a woman in the statistical sciences.

Posted on 05-30-2024

Our Groundbreaking Article on Lethal Lung Cancer Biomarkers Published in Cancers Journal's Special Issue!

The published article is titled "Validation of a Blood-Based Protein Biomarker Panel for a Risk Assessment of Lethal Lung Cancer in the Physicians’ Health Study."

Posted on 12-21-2023

Our Research Study Earns a Place in MD Anderson's 23 Cancer Research Highlights of 2023!

The study was featured on the MD Anderson website under the headline "Blood test developed at MD Anderson aids in predicting lung cancer mortality risk."

Selected Published and Impactful Papers

Published on 12-01-2025
SMAGS-LASSO: A Novel Feature Selection Method for Sensitivity Maximization in Early Cancer Detection
We developed SMAGS-LASSO, a machine learning algorithm that combines the Sensitivity Maximization at a Given Specificity (SMAGS) framework, developed by our group, with L1 regularization for feature selection.

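The quantity at the core of SMAGS can be illustrated with a short sketch. The Python below assumes NumPy and uses hypothetical function names: it fixes the decision threshold so that specificity among controls reaches a target level, reports the resulting sensitivity among cases, and subtracts an L1 penalty on the weights of a linear score. It only illustrates that kind of objective; it is not the published SMAGS-LASSO implementation and omits the optimizer used to fit the weights.

```python
import numpy as np


def sensitivity_at_specificity(scores, labels, target_spec):
    """Sensitivity when the threshold is set so that specificity >= target_spec.

    scores: 1-D array, higher means "more likely a case".
    labels: 1-D array of 0 (control) / 1 (case).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    controls = np.sort(scores[labels == 0])
    cases = scores[labels == 1]
    if controls.size == 0 or cases.size == 0:
        raise ValueError("need at least one control and one case")
    # Smallest control order statistic such that at least a target_spec
    # fraction of controls score at or below it.
    k = int(np.ceil(target_spec * controls.size)) - 1
    threshold = controls[min(max(k, 0), controls.size - 1)]
    # Samples scoring strictly above the threshold are called positive.
    return float(np.mean(cases > threshold))


def penalized_objective(w, X, y, target_spec, lam):
    """Hypothetical SMAGS-style objective: sensitivity at the target
    specificity for the linear score X @ w, minus an L1 penalty on w."""
    w = np.asarray(w, dtype=float)
    scores = X @ w
    return sensitivity_at_specificity(scores, y, target_spec) - lam * np.sum(np.abs(w))


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5, 3, 4, 5, 6, 7]  # five controls, then five cases
    labels = [0] * 5 + [1] * 5
    # Threshold is 4 (80% of controls at or below it); cases 5, 6, 7 exceed it.
    print(sensitivity_at_specificity(scores, labels, 0.8))  # prints 0.6
```

Because the threshold is an order statistic of the control scores, this objective is piecewise constant in the weights, which is one reason a dedicated optimization strategy is needed in practice rather than plain gradient descent.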
Published on 07-11-2025
CAT: a conditional association test for microbiome data using a permutation approach
This paper proposes a novel conditional association test, CAT, that can account for other features and phylogenetic relatedness when testing the association between a feature and an outcome.

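Conditional association testing by permutation can be illustrated with a generic residual-permutation test, a standard technique swapped in here for illustration. The sketch assumes NumPy, and its function names are hypothetical. It adjusts both the feature and the outcome for other covariates by least squares, then compares the observed partial correlation with correlations obtained after permuting the feature residuals. Unlike CAT, it does not model phylogenetic relatedness or the compositional nature of microbiome data, and permuting residuals gives only an approximate null distribution.

```python
import numpy as np


def residualize(v, Z):
    """Residuals of v after least-squares regression on Z plus an intercept."""
    v = np.asarray(v, dtype=float)
    Z1 = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = np.linalg.lstsq(Z1, v, rcond=None)
    return v - Z1 @ beta


def conditional_permutation_test(x, y, Z, n_perm=999, seed=0):
    """Test association between feature x and outcome y given covariates Z.

    Returns the observed absolute partial correlation and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    rx, ry = residualize(x, Z), residualize(y, Z)

    def stat(a, b):
        return abs(np.corrcoef(a, b)[0, 1])

    observed = stat(rx, ry)
    # Null distribution: break the x-y link while keeping the adjustment for Z.
    null = np.array([stat(rng.permutation(rx), ry) for _ in range(n_perm)])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    Z = rng.normal(size=(n, 2))               # other features to condition on
    x = Z[:, 0] + rng.normal(size=n)          # feature of interest
    y = 2.0 * x + Z[:, 1] + rng.normal(size=n)
    obs, p = conditional_permutation_test(x, y, Z)
    print(f"partial |r| = {obs:.2f}, p = {p:.3f}")
```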
Published on 04-26-2025
Grape-Pi: graph-based neural networks for enhanced protein identification in proteomics pipelines
We developed a graph neural network (GNN)-based model, Graph Neural Network using Protein–Protein Interaction for Enhancing Protein Identification (Grape-Pi), which is applicable to all proteomics pipelines.

Published on 01-07-2025
Causal models and prediction in cell line perturbation experiments
We propose causal structural equations for modeling how perturbations affect cells. From this model, we derive two estimators for predicting responses: a Linear Regression (LR) estimator and a causal structure...

Published on 05-30-2024
Validation of a Blood-Based Protein Biomarker Panel for a Risk Assessment of Lethal Lung Cancer in the Physicians' Health Study
This study aimed to assess the performance of a four-marker protein panel (4MP), comprising the precursor form of surfactant protein B, cancer antigen 125, carcinoembryonic antigen, and cytokeratin-19, for predicting lung cancer...

Published on 01-04-2024
Contributions of the Microbiome-Derived Metabolome for Risk Assessment and Prognostication of Pancreatic Cancer
The occurrence of microbial metabolites in biofluids enables risk assessment and prognostication of PDAC, as well as having potential for design of interception strategies. In this review, we first...

Accepted on 11-29-2023
Attempts to Understand Oral Mucositis in Head and Neck Cancer Patients through Omics Studies: A Narrative Review
Oral mucositis (OM) is a common and clinically impactful side effect of cytotoxic cancer treatment, particularly in patients with head and neck squamous cell carcinoma (HNSCC) who undergo radiotherapy with...

Revised on 11-09-2023
Estimating Causal Effects with Hidden Confounding using Instrumental Variables and Environments
Recent works have proposed regression models which are invariant across data collection environments [24, 20, 11, 16, 8]. These estimators often have a causal interpretation under conditions on the environments...

Published on 09-20-2023
Mortality Benefit of a Blood-Based Biomarker Panel for Lung Cancer on the Basis of the Prostate, Lung, Colorectal, and Ovarian Cohort
To investigate the utility of integrating a panel of circulating protein biomarkers in combination with a risk model on the basis of subject characteristics to identify individuals at high risk...

Published on 09-19-2023
A blood-based metabolomic signature predictive of risk for pancreatic cancer
Emerging evidence implicates microbiome involvement in the development of pancreatic cancer (PaCa). Here, we investigate whether increases in circulating microbial-related metabolites associate with PaCa risk by applying metabolomics profiling to...

Published on 09-08-2023
Study finds link between oral microbiome and common side effect in patients with head and neck cancer
Oral mucositis refers to erythematous and ulcerative lesions of the oral mucosa observed in patients with cancer being treated with chemotherapy, and/or with radiation therapy to fields involving the oral...

Published on 02-05-2023
Performance determinants of unsupervised clustering methods for microbiome data
In microbiome data analysis, unsupervised clustering is often used to identify naturally occurring clusters, which can then be assessed for associations with characteristics of interest. In this work, we systematically...

Posted on 11-01-2022
A Blood-Based Metabolite Panel for Distinguishing Ovarian Cancer from Benign Pelvic Masses
A blood-based metabolite panel was developed that demonstrates independent predictive ability and complements the Risk of Ovarian Malignancy Algorithm (ROMA) for distinguishing early-stage ovarian cancer from benign disease to better inform clinical decision making.

Posted on 10-21-2022
A machine learning-based method for automatically identifying novel cells in annotating single-cell RNA-seq data
We developed a straightforward yet effective method combining an autoencoder with iterative feature selection to automatically identify novel cells from scRNA-seq data. Our method trains an autoencoder with the labeled training...

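The idea of flagging novel cells by how poorly a model trained on labeled cells reconstructs them can be sketched with a linear autoencoder. The Python below assumes NumPy and uses hypothetical function names. It substitutes principal component analysis for a trained neural autoencoder (an optimal linear autoencoder under squared error spans the same subspace as the top principal components) and omits the iterative feature selection step mentioned in the summary. A test cell is called novel when its reconstruction error exceeds a high quantile of the training errors.

```python
import numpy as np


def fit_linear_autoencoder(X_train, k):
    """Encoder/decoder weights of a k-dimensional linear autoencoder via SVD."""
    mu = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
    return mu, Vt[:k]  # rows are orthonormal directions in gene space


def reconstruction_error(X, mu, W):
    """Squared error after encoding (X - mu) @ W.T and decoding with W."""
    Xc = X - mu
    recon = (Xc @ W.T) @ W
    return np.sum((Xc - recon) ** 2, axis=1)


def flag_novel_cells(X_train, X_test, k=2, q=0.99):
    """True for test cells whose error exceeds the q-quantile of training errors."""
    mu, W = fit_linear_autoencoder(X_train, k)
    cutoff = np.quantile(reconstruction_error(X_train, mu, W), q)
    return reconstruction_error(X_test, mu, W) > cutoff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    basis = rng.normal(size=(2, 10))  # simulated "known cell type" structure
    train = rng.normal(size=(300, 2)) @ basis + 0.01 * rng.normal(size=(300, 10))
    novel = 3.0 * rng.normal(size=(20, 10))  # cells that do not follow that structure
    print(flag_novel_cells(train, novel).mean())
```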
Posted on 10-14-2022
Bayesian hierarchical quantile regression with application to characterizing the immune architecture of lung cancer
The successful development and implementation of precision immuno-oncology therapies requires a deeper understanding of the immune architecture at a patient level. T-cell receptor (TCR) repertoire sequencing is a relatively new...

Posted on 09-21-2022
Transcriptomic Signatures of Hypomethylating Agent Failure in Myelodysplastic Syndromes and Chronic Myelomonocytic Leukemia (Exp Hematol)
Hypomethylating agents (HMAs) are the standard of care for myelodysplastic syndromes (MDS) and chronic myelomonocytic leukemia (CMML). HMA treatment failure is a major clinical problem and its mechanisms are poorly...

Published on 09-09-2022
CAMLU: A machine learning-based method for automatically identifying novel cells in annotating single-cell RNA-seq data
Single-cell RNA sequencing (scRNA-seq) has been widely used to decompose complex tissues into functionally distinct cell types. The first and usually the most important step of scRNA-seq data analysis is...

Posted on 09-05-2022
MDM2 antagonist improves therapeutic activity of azacitidine in myelodysplastic syndromes and chronic myelomonocytic leukemia
Failure of hypomethylating agent (HMA) treatment is an important issue in myelodysplastic syndromes (MDS) and chronic myelomonocytic leukemia (CMML). Recent studies indicated that the function of wild-type TP53 positively impacts outcome...

Published on 08-28-2019
NExUS: Bayesian simultaneous network estimation across unequal sample sizes
Network-based analyses of high-throughput genomics data provide a holistic, systems-level understanding of various biological mechanisms for a common population. However, when estimating multiple networks across heterogeneous sub-populations, varying sample sizes...

Published on 10-08-2018
PRECISE: Personalized Integrated Network Modeling of the Cancer Proteome Atlas
Personalized (patient-specific) approaches have recently emerged with a precision medicine paradigm that acknowledges the fact that molecular pathway structures and activity might be considerably different within and across tumors. The...

Published on 07-06-2015
DINGO: differential network analysis in genomics
Cancer progression and development are initiated by aberrations in various molecular networks through coordinated changes across multiple genes and pathways. It is important to understand how these networks change under...
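What "network change between conditions" means can be made concrete by comparing each gene pair's correlation across two groups. The sketch below uses a Fisher z-test for the difference between two independent correlations, a standard technique swapped in for illustration; it is not the DINGO method. It assumes NumPy, and the function name is hypothetical.

```python
import math

import numpy as np


def differential_correlation(X1, X2):
    """Fisher z-test for each gene pair's correlation in group 1 vs group 2.

    X1: (n1, p) array of samples x genes for group 1; X2: (n2, p) for group 2.
    Returns a list of (i, j, z, p_value) for i < j, sorted by p_value.
    """
    n1, p = X1.shape
    n2 = X2.shape[0]
    # Fisher transform; clipping avoids infinite values when |r| == 1.
    z1 = np.arctanh(np.clip(np.corrcoef(X1, rowvar=False), -0.999999, 0.999999))
    z2 = np.arctanh(np.clip(np.corrcoef(X2, rowvar=False), -0.999999, 0.999999))
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    results = []
    for i in range(p):
        for j in range(i + 1, p):
            z = (z1[i, j] - z2[i, j]) / se
            # Two-sided normal p-value: P(|N(0,1)| > |z|) = erfc(|z| / sqrt(2)).
            p_value = math.erfc(abs(z) / math.sqrt(2.0))
            results.append((i, j, float(z), p_value))
    return sorted(results, key=lambda r: r[3])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    a = rng.normal(size=n)
    # Genes 0 and 1 are tightly linked in group 1 only.
    X1 = np.column_stack([a, a + 0.1 * rng.normal(size=n), rng.normal(size=n)])
    X2 = rng.normal(size=(n, 3))
    for i, j, z, p in differential_correlation(X1, X2):
        print(i, j, round(z, 2), p)
```

With many genes, the per-pair p-values would also need a multiple-testing correction, such as the Benjamini–Hochberg procedure.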