Browsing by Author "Mueller, Peter"
- A Product Partition Model With Regression on Covariates (AMER STATISTICAL ASSOC, 2011). Mueller, Peter; Quintana, Fernando; Rosner, Gary L. We propose a probability model for random partitions in the presence of covariates. In other words, we develop a model-based clustering algorithm that exploits available covariates. The motivating application is predicting time to progression for patients in a breast cancer trial. We proceed by reporting a weighted average of the responses of clusters of earlier patients. The weights should be determined by the similarity of the new patient's covariate with the covariates of patients in each cluster. We achieve the desired inference by defining a random partition model that includes a regression on covariates. Patients with similar covariates are a priori more likely to be clustered together. Posterior predictive inference in this model formalizes the desired prediction.
- A semiparametric Bayesian model for repeatedly repeated binary outcomes (WILEY-BLACKWELL, 2008). Quintana, Fernando A.; Mueller, Peter; Rosner, Gary L.; Relling, Mary V. We discuss the analysis of data from single-nucleotide polymorphism arrays comparing tumour and normal tissues. The data consist of sequences of indicators for loss of heterozygosity (LOH) and involve three nested levels of repetition: chromosomes for a given patient, regions within chromosomes and single-nucleotide polymorphisms nested within regions. We propose to analyse these data by using a semiparametric model for multilevel repeated binary data. At the top level of the hierarchy we assume a sampling model for the observed binary LOH sequences that arises from a partial exchangeability argument. This implies a mixture of Markov chains model. The mixture is defined with respect to the Markov transition probabilities. We assume a non-parametric prior for the random-mixing measure. The resulting model takes the form of a semiparametric random-effects model with the matrix of transition probabilities being the random effects. The model includes appropriate dependence assumptions for the two remaining levels of the hierarchy, i.e. for regions within chromosomes and for chromosomes within patient. We use the model to identify regions of increased LOH in a data set coming from a study of treatment-related leukaemia in children with an initial cancer diagnosis. The model successfully identifies the desired regions and performs well compared with other available alternatives.
- A short tutorial on Bayesian Nonparametrics (2016). Jara, Alejandro; Mueller, Peter; Xu, Yanxun
- BM-BC: a Bayesian method of base calling for Solexa sequence data (2012). Quintana Quintana, Fernando; Jara Weitzmann, Alejandro; Ji, Yuan; Mitra, Riten; Mueller, Peter; Liu, Ping; Lu, Yue; Liang, Shoudan. Base calling is a critical step in the Solexa next-generation sequencing procedure. It compares the position-specific intensity measurements that reflect the signal strength of four possible bases (A, C, G, T) at each genomic position, and outputs estimates of the true sequences for short reads of DNA or RNA. We present a Bayesian method of base calling, BM-BC, for Solexa-GA sequencing data. The Bayesian method builds on a hierarchical model that accounts for three sources of noise in the data, which are known to affect the accuracy of the base calls: fading, phasing, and cross-talk between channels. We show that the new method improves the precision of base calling compared with currently leading methods. Furthermore, the proposed method provides a probability score that measures the confidence of each base call. This probability score can be used to estimate the false discovery rate of the base calling or to rank the precision of the estimated DNA sequences, which in turn can be useful for downstream analysis such as sequence alignment.
- Cluster‐specific variable selection for product partition models (2015). Quintana Quintana, Fernando; Mueller, Peter; Papoila, Ana Luisa
- DPpackage: Bayesian Semi- and Nonparametric Modeling in R (JOURNAL STATISTICAL SOFTWARE, 2011). Jara, Alejandro; Hanson, Timothy E.; Quintana, Fernando A.; Mueller, Peter; Rosner, Gary L. Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
- Regression with Variable Dimension Covariates (2024). Mueller, Peter; Quintana, Fernando Andres; Page, Garritt L. Regression is one of the most fundamental statistical inference problems. Broadly defined, a regression problem is the estimation of the distribution of an outcome using a family of probability models indexed by covariates. Despite the ubiquitous nature of regression problems and the abundance of related methods and results, there is a surprising gap in the literature. There are no well-established methods for regression with varying-dimension covariate vectors, despite the common occurrence of such problems. In this paper we review some recent related papers proposing varying-dimension regression by way of random partitions.
- Semi-parametric Bayesian Inference for Multi-Season Baseball Data (INT SOC BAYESIAN ANALYSIS, 2008). Quintana, Fernando A.; Mueller, Peter; Rosner, Gary L.; Munsell, Mark. We analyze complete sequences of successes (hits, walks, and sacrifices) for a group of players from the American and National Leagues, collected over 4 seasons. The goal is to describe how players' performance varies from season to season. In particular, we wish to assess and compare the effect of available occasion-specific covariates over seasons. The data are binary sequences for each player and each season. We model dependence in the binary sequence by an autoregressive logistic model. The model includes lagged terms up to a fixed order. For each player and season we introduce a different set of autologistic regression coefficients, i.e., the regression coefficients are random effects that are specific to each season and player. We use a nonparametric approach to define a random effects distribution. The nonparametric model is defined as a mixture with a Dirichlet process prior for the mixing measure. The described model is justified by a representation theorem for order-k exchangeable sequences. Besides the repeated measurements for each season and player, multiple seasons within a given player define an additional level of repeated measurements. We introduce dependence at this level of repeated measurements by relating the season-specific random effects vectors in an autoregressive fashion. We ultimately conclude that while some covariates, like the ERA of the opposing pitcher, are always relevant, others, like an indicator for the game being into the seventh inning, may be significant only for certain seasons, and some others, like the score of the game, can safely be ignored.
- Semiparametric Bayesian classification with longitudinal markers (BLACKWELL PUBLISHING, 2007). De la Cruz Mesia, Rolando; Quintana, Fernando A.; Mueller, Peter. We analyse data from a study involving 173 pregnant women. The data are observed values of the beta human chorionic gonadotropin hormone measured during the first 80 days of gestational age, including from one up to six longitudinal responses for each woman. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from data that are available at the early stages of pregnancy. We achieve the desired classification with a semiparametric hierarchical model. Specifically, we consider a Dirichlet process mixture prior for the distribution of the random effects in each group. The unknown random-effects distributions are allowed to vary across groups but are made dependent by using a design vector to select different features of a single underlying random probability measure. The resulting model is an extension of the dependent Dirichlet process model, with an additional probability model for group classification. The model is shown to perform better than an alternative model which is based on independent Dirichlet processes for the groups. Relevant posterior distributions are summarized by using Markov chain Monte Carlo methods.
- Some issues in nonparametric Bayesian modelling using species sampling models (SAGE PUBLICATIONS LTD, 2008). Navarrete, Carlos; Quintana, Fernando A.; Mueller, Peter. We review some aspects of nonparametric Bayesian data analysis with discrete random probability measures. We focus on the class of species sampling models (SSMs). We critically investigate the common use of the Dirichlet process (DP) prior as a default SSM choice. We discuss alternative prior specifications from a theoretical, computational and data analysis perspective. We conclude with a recommendation to consider SSM priors beyond the special case of the DP prior, and make specific recommendations on how different choices can be used to reflect prior information and how they impact the desired inference. We show the required changes in the posterior simulation schemes, and argue that the additional generality can be achieved without additional computational effort.