Conferences
C1 - Are we already FAIR? – The future of data sharing

‘Data sharing’ is becoming increasingly important for the efficient use of resources. In 2007, the OECD called for easy access to research data for the scientific community. In 2016, the FAIR principles (findable, accessible, interoperable, reusable) for research data were published. In 2018, the German government decided to establish a National Research Data Infrastructure (NFDI), within which NFDI4Health is responsible for personal health data. This talk will present the infrastructures realized so far and discuss potential (statistical) hurdles using illustrative examples. Further European developments, such as the European Health Data Space, will also be addressed.
C4 - Statistical Modeling of Complex Networks

Complex networks have received considerable attention from the statistical community, especially for analyzing and describing the interactions within complex random systems. Networks are everywhere, from social networks to biological networks and transportation systems. However, understanding and analyzing these complex networks is a scientific challenge. In this talk, we will discuss how statistical techniques allow us to uncover underlying patterns, identify communities, measure robustness, and predict the future behavior of complex networks.
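The abstract does not say which criterion is used to identify communities. As one widely used example, not necessarily the one discussed in the talk, the sketch below computes Newman's modularity Q for a given partition of an undirected, unweighted graph: for each community it takes the fraction of edges that fall inside it and subtracts the fraction expected if edges were placed at random with the same node degrees. The edge list and partition are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where L_c is
    the number of edges inside c, d_c is the total degree of the nodes in c,
    and m is the total number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: summed degree of nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, partition), 4))
```

Splitting the graph at the bridge gives a clearly positive Q, while putting every node in one community always gives Q = 0; community-detection algorithms search for partitions that make Q large.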
C5 - Statistical Pitfalls in Measuring Biological Aging with Epigenetic Clocks: Insights from a Chronic Disease Setting

The third goal of the United Nations 2030 Agenda for Sustainable Development is to “Ensure healthy lives and promote well-being for all at all ages”. Its targets show how broad the field of health care is, covering maternal mortality, epidemics of infectious diseases, premature mortality from chronic diseases, prevention and treatment of substance abuse, access to sexual and reproductive health care, and affordable health care, to name a few. Beyond the essential and undeniable role of statisticians in measuring the many indicators related to this goal, how can statisticians contribute to reaching its targets?
In this talk, I will focus on statistical challenges faced when working with chronic diseases, with particular emphasis on people with Multiple Sclerosis (MS). Beyond chronological age, biological age reflects the cumulative damage to a person’s cells over time and is increasingly recognized as a critical factor in understanding health disparities and the progression of chronic conditions. Several biomarkers have recently been proposed to measure biological age and assess the cumulative burden of aging, which directly affects not only life expectancy but also quality of life, especially for people with debilitating diseases. Among the most popular of these markers, epigenetic “clocks” use statistical and machine learning tools to estimate age from DNA methylation (DNAm) patterns.
Epigenetic modifications are a reversible mechanism for regulating genome function without altering the underlying DNA sequence. They have been linked to aging through several factors, and DNAm can be affected by environmental exposures and lifestyle habits. Since the first epigenetic clock was proposed in 2011, multiple clocks have been reported with increasing accuracy and precision and broader prospects for application in aging research. Still, they rely on regression coefficients estimated in general training populations, which are then applied for out-of-sample prediction. Commonly, the predictions are then regressed on chronological age and technical variables, and the resulting residuals, called Epigenetic Age Acceleration (EAA), are often used for hypothesis testing.
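A minimal sketch of the EAA computation described above: regress the clock's predicted age on chronological age by ordinary least squares and keep the residuals. Technical variables (for example batch indicators) are omitted here for brevity and would enter as additional regressors; the function name and the ages are hypothetical.

```python
def epigenetic_age_acceleration(predicted_age, chronological_age):
    """Residuals of the OLS fit predicted_age ~ a + b * chronological_age."""
    n = len(predicted_age)
    mean_x = sum(chronological_age) / n
    mean_y = sum(predicted_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chronological_age)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological_age, predicted_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive EAA: epigenetically "older" than expected for one's age.
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_age, predicted_age)]


# Hypothetical chronological ages and clock predictions, in years.
chrono = [30, 40, 50, 60, 70]
clock = [33, 41, 49, 63, 69]
eaa = epigenetic_age_acceleration(clock, chrono)
print([round(r, 2) for r in eaa])
```

By construction the residuals sum to zero and are uncorrelated with chronological age in the sample. They also inherit the clock's prediction error, which no point prediction reports, so tests that treat EAA as an observed variable are an instance of the prediction-based inference discussed below.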
The lack of interval estimates for individual predictions, the availability of several algorithms, the lack of a gold-standard measure of biological age, and the use of prediction-based inference add to the statistical challenges of these markers. This talk will discuss these challenges, illustrated with data from a clinical study on biological aging in people with MS. I will also discuss the role of statisticians in ensuring that such issues are properly considered, especially as these measures may become outcomes in clinical trials of anti-aging treatments and of the prevention of chronic disease progression.
C7 - Influence of Local Ancestry on Cell-Type Gene Expression in the Brazilian Population

The genetic diversity of the Brazilian population, resulting from a complex history of migration and admixture, offers a unique opportunity to study how genetic ancestry influences gene expression. This study proposes an integrated analysis of single-cell RNA sequencing (scRNA-seq) and whole-genome sequencing (WGS) data to investigate the relationship between local ancestry and differential gene expression across cell types. We will discuss scRNA-seq preprocessing methods, including filtering of low-quality cells, normalization, batch-effect removal, and cell-type annotation. We will also present preprocessing methods for whole-genome sequencing data, genotyping, and local ancestry inference. A regression model for negative binomial responses will be discussed and used to integrate both data modalities, enabling comparison of gene expression between groups with different local ancestry. Extensions that allow for variables measured with error and for correlated observations will also be discussed. This strategy will make it possible to identify genes whose expression is significantly influenced by local ancestry in different cell types. These findings may provide new insights into the genetic basis of the phenotypic differences observed in the Brazilian population and contribute to understanding the interactions between genetics and environment. This study demonstrates an innovative, integrated approach to exploring the influence of genetic ancestry on gene expression at the cellular level using scRNA-seq and WGS data. Applying these methodologies may open new avenues for research in population genetics and personalized medicine, especially in genetically diverse populations such as Brazil's.
Funding: BRAINN/FAPESP 2013/07559-3
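The abstract does not give the exact form of the negative binomial regression. As a minimal sketch, assuming the common NB2 parameterization with a log link, the code below evaluates the log-likelihood for one gene's counts in one cell type, with a hypothetical binary local-ancestry indicator and a log size-factor offset, and maximizes it by a crude grid search at fixed dispersion. All names and numbers are hypothetical, and the grid search only stands in for a proper optimizer.

```python
import math


def nb2_loglik(counts, ancestry, log_size, beta0, beta1, alpha):
    """NB2 log-likelihood with a log link.

    mean_i = exp(log_size_i + beta0 + beta1 * ancestry_i)
    Var(Y_i) = mean_i + alpha * mean_i**2, with dispersion alpha > 0.
    """
    r = 1.0 / alpha  # negative binomial "size" parameter
    total = 0.0
    for y, a, off in zip(counts, ancestry, log_size):
        mu = math.exp(off + beta0 + beta1 * a)
        total += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                  + r * math.log(r / (r + mu))
                  + y * math.log(mu / (r + mu)))
    return total


def grid_fit(counts, ancestry, log_size, alpha, step=0.05):
    """Crude grid search for (beta0, beta1) at fixed alpha (illustration only)."""
    grid0 = [i * step for i in range(-round(3 / step), round(3 / step) + 1)]
    grid1 = [i * step for i in range(-round(2 / step), round(2 / step) + 1)]
    return max(((b0, b1) for b0 in grid0 for b1 in grid1),
               key=lambda b: nb2_loglik(counts, ancestry, log_size,
                                        b[0], b[1], alpha))


counts = [2, 3, 1, 4, 8, 10, 7, 9]   # hypothetical counts for one gene
ancestry = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical local-ancestry indicator
log_size = [0.0] * 8                 # log size factors (all equal here)
b0, b1 = grid_fit(counts, ancestry, log_size, alpha=0.1)
print(round(b0, 2), round(b1, 2))
```

Here exp(beta1) is the ratio of expected expression between the two ancestry groups. The extensions mentioned above (covariates measured with error, correlated cells from the same donor) would change this likelihood and are not sketched.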
C8 - Recurrent Event Process Models: change point models and clustering of events

Recurrent event data arise when an event may occur repeatedly over time. Examples include recurrence of bladder cancer tumors, epileptic seizures, or pulmonary exacerbations. This talk will mainly discuss two projects. The first focuses on modeling pulmonary exacerbations and their relationship to a longitudinal binary outcome, and the second aims to understand the clustering of events within individuals.
The first project was motivated by a study of cystic fibrosis, a hereditary lung disease characterized by progressive loss of lung function. Chronic Pseudomonas aeruginosa (PA) infection is associated with worse clinical outcomes, including more frequent pulmonary exacerbations (PE). The longitudinal progression of PA infection and recurrent PE events are likely intrinsically linked, but their temporal interrelationship has not been fully characterized. The rate of PA progression is known to increase with age, with potentially sharp changes in its trajectory. Using data from the Early Pseudomonas Infection Control Observational Study, we propose a joint model of longitudinal PA and recurrent PE events. The model includes individual-specific random effects in the longitudinal sub-model that are linked to those in the recurrent event sub-model. The longitudinal sub-model includes two change points to capture sharp changes in the trajectory, while the recurrent event sub-model uses a counting process formulation and accommodates delayed entry. The results indicate that, in children, the odds of PA increase modestly, by 5.13% per year, starting at age 6.9, followed by a more pronounced rise of 27.12% around age 14.5. In addition, a higher probability of PA is associated with a higher risk of subsequent PE events.
The second project focuses on epileptic seizures, with the primary goal of understanding how seizures cluster within individuals. We model this clustering using a self-exciting stochastic process.
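To make the two change points of the first project concrete, the sketch below writes the log-odds of PA as a continuous piecewise-linear ("broken-stick") function of age with knots at 6.9 and 14.5 years. The abstract reports only percentage changes in odds, so the slopes are an assumption: zero before 6.9, log(1.0513) per year between 6.9 and 14.5, and log(1.2712) per year after 14.5 (reading the 27.12% as a per-year rate). The intercept is hypothetical, and the random effects and the link to the recurrent-event sub-model are omitted.

```python
import math


def log_odds_pa(age, intercept, slopes, change_points):
    """Continuous piecewise-linear log-odds with change points.

    slopes[0] applies before change_points[0]; slopes[k] applies after
    change_points[k - 1]. Written with hinge terms:
    intercept + s0 * age + sum_k (s_k - s_{k-1}) * max(age - c_k, 0).
    """
    eta = intercept + slopes[0] * age
    for k, c in enumerate(change_points, start=1):
        eta += (slopes[k] - slopes[k - 1]) * max(age - c, 0.0)
    return eta


def prob_pa(age, intercept, slopes, change_points):
    """Probability of PA via the inverse logit."""
    return 1.0 / (1.0 + math.exp(-log_odds_pa(age, intercept, slopes, change_points)))


# Assumed per-year slopes on the log-odds scale; hypothetical intercept.
slopes = [0.0, math.log(1.0513), math.log(1.2712)]
cps = [6.9, 14.5]
b0 = -2.0


def odds_ratio_one_year(age):
    """Odds ratio for one additional year of age starting at `age`."""
    return math.exp(log_odds_pa(age + 1, b0, slopes, cps)
                    - log_odds_pa(age, b0, slopes, cps))


for a in (5, 10, 16):
    print(a, round(odds_ratio_one_year(a), 4), round(prob_pa(a, b0, slopes, cps), 3))
```

The hinge form keeps the trajectory continuous at each change point while letting the yearly odds ratio jump from one segment to the next; in the joint model the change-point locations are estimated rather than fixed.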
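The seizure project names only a self-exciting stochastic process. A common instance is the Hawkes process with an exponential kernel, whose intensity is lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)): each event temporarily raises the rate of further events, which produces clustering. The sketch below assumes that kernel and a single individual observed on [0, T], and evaluates the intensity and the exact log-likelihood; the seizure times and parameter values are hypothetical.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t given events strictly before t."""
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def hawkes_loglik(events, T, mu, alpha, beta):
    """Exact log-likelihood of event times on [0, T] (O(n^2) version).

    sum_i log lambda(t_i) - integral_0^T lambda(s) ds, where the integral is
    mu * T + (alpha / beta) * sum_i (1 - exp(-beta * (T - t_i))).
    """
    log_terms = sum(math.log(hawkes_intensity(t, events, mu, alpha, beta))
                    for t in events)
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - ti)) for ti in events)
    return log_terms - compensator


seizure_times = [1.0, 1.2, 1.3, 4.0, 8.5, 8.6]  # hypothetical, in days
T = 10.0
print(hawkes_loglik(seizure_times, T, mu=0.3, alpha=0.8, beta=2.0))
# alpha = 0 reduces the model to a homogeneous Poisson process (no clustering).
print(hawkes_loglik(seizure_times, T, mu=0.6, alpha=0.0, beta=2.0))
```

Setting alpha = 0 gives a natural no-clustering comparison model, and the ratio alpha / beta (the expected number of events directly triggered by each event) must be below 1 for the process to be stationary.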
C9 - Causal Inference on Flexible Non-mixture Cure Rate Modeling with Piecewise Hazard and Gaussian Process

In oncology, survival analysis often requires the inclusion of a cure fraction to account for individuals who are effectively cured of their disease. The concept of a cure fraction in survival analysis was introduced in a study of long-term survival following cancer therapy. This early work laid the foundation for defining cured individuals as those no longer at risk of cancer recurrence after a certain period. Recently, there has been increasing interest in semiparametric mixture cure models, which relax some parametric assumptions.
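For reference, the standard mixture cure model writes the population survival as S_pop(t) = pi + (1 - pi) * S_u(t), where pi is the cured fraction and S_u is the survival function of the uncured. A minimal sketch, assuming an exponential S_u purely for illustration; the parameter values are hypothetical.

```python
import math


def mixture_cure_survival(t, cure_prob, rate):
    """Population survival pi + (1 - pi) * S_u(t), with exponential S_u (assumed)."""
    s_uncured = math.exp(-rate * t)
    return cure_prob + (1.0 - cure_prob) * s_uncured


for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(mixture_cure_survival(t, cure_prob=0.3, rate=0.5), 4))
```

The curve starts at 1 and levels off at the cured fraction (0.3 here) instead of decreasing to 0, which is the plateau a cure model is meant to capture.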
The non-mixture cure rate model, another branch of cure rate modeling, represents a significant advance in the field. Unlike the traditional mixture cure rate model, it introduces a latent variable, often interpreted as the unobserved number of cancer cells, to indirectly estimate an individual’s cure status. This latent-factor approach offers several benefits, including the flexibility to incorporate a proportional hazards structure and improved computational efficiency. Over time, the non-mixture cure rate model has been extensively extended to handle complex data, leveraging semiparametric methods to model the survival function. The non-mixture cure rate model typically incorporates covariates into the cure rate parameter through a log-linear form, assuming a Poisson distribution for the unobserved cancer cell count. However, the common use of a linear functional form for covariate effects can be restrictive, particularly for continuous covariates, which often display nonlinear relationships.
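A minimal sketch of the non-mixture (promotion time) formulation described above, assuming the latent number of cancer cells is Poisson with mean theta = exp(x'beta) and that each cell's promotion time has CDF F(t) = 1 - exp(-H(t)) with a piecewise-constant hazard, as suggested by the talk title. Then S_pop(t) = exp(-theta * F(t)) and the cure fraction is exp(-theta). Cutpoints, hazards, covariates, and coefficients are hypothetical.

```python
import math


def piecewise_cumhaz(t, cutpoints, hazards):
    """Cumulative hazard H(t) for a piecewise-constant hazard.

    cutpoints: increasing boundaries [0 = s_0 < s_1 < ... < s_K];
    hazards[k] is the constant hazard on [s_k, s_{k+1}), and the last
    interval extends to infinity. Requires len(hazards) == len(cutpoints).
    """
    H = 0.0
    for k, lam in enumerate(hazards):
        start = cutpoints[k]
        end = cutpoints[k + 1] if k + 1 < len(cutpoints) else math.inf
        if t <= start:
            break
        H += lam * (min(t, end) - start)
    return H


def non_mixture_survival(t, x, beta, cutpoints, hazards):
    """Population survival exp(-theta * F(t)) with theta = exp(x'beta)."""
    theta = math.exp(sum(xj * bj for xj, bj in zip(x, beta)))
    F = 1.0 - math.exp(-piecewise_cumhaz(t, cutpoints, hazards))
    return math.exp(-theta * F)


cut = [0.0, 1.0, 3.0]   # hypothetical interval boundaries (years)
haz = [0.2, 0.5, 0.3]   # hypothetical hazard on each interval
x = [1.0, 0.5]          # intercept and one covariate
beta = [0.2, -0.4]      # hypothetical log-linear coefficients
for t in (0.0, 2.0, 5.0, 50.0):
    print(t, round(non_mixture_survival(t, x, beta, cut, haz), 4))
print("cure fraction:", round(math.exp(-math.exp(0.2 * 1.0 - 0.4 * 0.5)), 4))
```

Because -log S_pop(t) = theta * F(t), the population hazard is theta * f(t), so covariates act multiplicatively on the hazard; this is the proportional hazards structure the abstract mentions.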
In modeling cure rates, it is often assumed that the effects of continuous covariates vary smoothly over their domain. However, the exact relationship between these covariates and the event of interest is not typically known a priori and may exhibit complex, nonlinear patterns. To flexibly capture these nonlinear covariate effects, we impose a Gaussian Process prior over the effects of the continuous covariates.
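To illustrate what a Gaussian Process prior over a covariate effect means, the sketch below draws a random smooth function f(x) from a zero-mean GP with a squared-exponential kernel k(x, x') = sigma^2 * exp(-(x - x')^2 / (2 * l^2)) on a grid of covariate values. The kernel choice, its parameters, and the age grid are assumptions, since the abstract does not state which covariance function is used; in the model, such an f would presumably replace the linear term for a continuous covariate in the log of the cure rate parameter.

```python
import math
import random


def sq_exp_kernel(xs, variance, lengthscale, jitter=1e-6):
    """Squared-exponential covariance matrix with a small diagonal jitter."""
    n = len(xs)
    return [[variance * math.exp(-((xs[i] - xs[j]) ** 2) / (2 * lengthscale ** 2))
             + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def sample_gp_prior(xs, variance=1.0, lengthscale=10.0, seed=0):
    """One draw f ~ N(0, K) computed as f = L z with z standard normal."""
    rng = random.Random(seed)
    L = cholesky(sq_exp_kernel(xs, variance, lengthscale))
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(xs))]


ages = [30 + 2 * i for i in range(21)]  # hypothetical covariate grid (e.g. age)
f = sample_gp_prior(ages)
print([round(v, 2) for v in f[:5]])
```

The length-scale controls how quickly the covariate effect can change: larger values give smoother, nearly linear draws, while smaller values allow more local nonlinearity. The small jitter keeps the Cholesky factorization numerically stable.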
In this presentation, we consider non-mixture cure rate models with Gaussian Process priors and further develop a causal inference approach to determine the order of treatment procedures. The methodology is illustrated with a breast cancer study.