Biostatistics and Epidemiology

Faculty and Staff

Samprit Banerjee, PhD

Samprit Banerjee, Ph.D., M.Stat., has been an assistant professor of biostatistics in healthcare policy & research at Weill Cornell Medicine since 2011. He received bachelor’s and master’s degrees in statistics from the Indian Statistical Institute, Kolkata, India, and obtained a Ph.D. in biostatistics from the University of Alabama at Birmingham.

As a biostatistician, Dr. Banerjee has broad experience in all major areas of biostatistics, namely statistical genetics and genomics, clinical trials, and observational studies. His methodological research interests are in (a) multivariate statistics and its various applications, including longitudinal studies and classification problems, and (b) high-dimensional problems in various biomedical applications. His collaborations span randomized clinical trials (in psychiatry), mental health research with big biomedical data, comparative effectiveness research of medical devices, health services research, statistical genetics, and cancer genomics. Dr. Banerjee also serves as the Director of the Biostatistics Unit of the Institute of Geriatric Psychiatry at WCMC and as the chief statistician for ICOR (International Consortium for Orthopedic Registries). Below are a few areas of his contributions to science:

  1. Multivariate Methodology: Research studies in medicine do not always analyze multiple correlated outcomes, primarily because of the difficulty in interpretation and the statistical complexity. Dr. Banerjee’s primary interest is to understand the interplay between multiple correlated outcomes in determining treatment efficacy, mediating treatment effects, and discovering patient sub-groups. The main step in analyzing multiple correlated outcomes is to model the covariance/correlation between these traits accurately. To do this, Dr. Banerjee has studied the estimation of the covariance matrix in higher dimensions and proposed an improved estimator that shows robust performance in a wide range of situations. He has also developed a Bayesian multivariate model in the context of quantitative trait loci (see his contribution to Statistical genetics) to detect genetic loci jointly affecting multiple correlated outcomes/traits. In the spirit of multivariate statistics, he has also developed methods for performing multivariate meta-analysis of survival curves and applied them to distributed health network data. In addition to his research in multivariate methodology, Dr. Banerjee has applied multivariate clustering and classification techniques (e.g. hierarchical clustering, linear discriminant analysis, and factor analysis) to identify patient sub-groups in various applications (e.g. sub-groups based on clinical profile in mental health research and on genomic signature in cancer research).
  2. Mental Health Research: Dr. Banerjee collaborates extensively with researchers in psychiatry and has designed and analyzed several randomized trials (including cluster randomized trials) that study various behavioral interventions, psychotherapies, drugs, and home care management interventions in older adults with depression, psychosis, and bipolar disorder. These trials typically examine the treatment effect on a longitudinal outcome, mediators of this treatment effect (target engagement), and moderators of treatment efficacy. These trials also have a high degree of missing data. Dr. Banerjee uses mixed models to study longitudinal outcomes, causal inference methods to study the effects of mediation, and pattern mixture models to study biases due to missing data.
  3. Comparative Effectiveness Research: Dr. Banerjee collaborates with a team of researchers to study the comparative effectiveness (CE) of orthopedic devices in hip and knee replacement surgeries. Due to the lack of clinical trials, there is a gap in evidence on the CE of these devices. ICOR is a distributed network of multiple international and national joint registries that participate by providing summary information on these various devices for several risk factors. Dr. Banerjee has developed semi-parametric Bayesian methodology to meta-analyze time to revision surgery (a survival outcome) from these multiple registries to provide CE estimates. Unlike conventional approaches to meta-analysis, this approach can investigate time trends of survival curves and explore interaction effects.
  4. Cancer genomics: Dr. Banerjee has collaborated extensively with a team of scientists on studies to understand the molecular pathology of prostate cancer. The first decade of the 21st century saw a large number of genome-wide association studies of single nucleotide polymorphisms (SNPs). However, association studies of other structural variants such as copy number variants (CNVs) were rare, partly due to a lack of methods to infer CNVs from array data for germline variants. He has developed a computational method to detect copy number variants from array data. The method was applied to a genome-wide association study of germline CNVs in prostate cancer, which found a couple of functionally active, low-frequency CNVs associated with the risk of prostate cancer. In addition, Dr. Banerjee has worked on gene expression data from various platforms, genome-wide association studies of SNPs and copy number variants, enrichment of molecular pathways, the impact of structural variants on the evolution of pathways, and next-generation sequencing data.
  5. Statistical genetics: In quantitative genetics, one of the goals is to find genomic positions that are associated with and linked to complex traits or outcomes. Complex outcomes underlying a disease are rarely uncorrelated, yet they were typically analyzed independently. For his dissertation, Dr. Banerjee developed a Bayesian model selection procedure to select genomic locations that are jointly associated with multiple correlated traits. He has also collaborated on developing hierarchical models for various types of traits (e.g. ordinal, binary, and categorical).

Dr. Banerjee's CV


  • Division Office
  • (646) 962-8020

Selected Recent Publications