• La Universidad
    • Historia
    • Rectoría
    • Autoridades
    • Secretaría General
    • Pastoral UC
    • Organización
    • Hechos y cifras
    • Noticias UC
  • 2011-03-15-13-28-09
  • Facultades
    • Agronomía e Ingeniería Forestal
    • Arquitectura, Diseño y Estudios Urbanos
    • Artes
    • Ciencias Biológicas
    • Ciencias Económicas y Administrativas
    • Ciencias Sociales
    • College
    • Comunicaciones
    • Derecho
    • Educación
    • Filosofía
    • Física
    • Historia, Geografía y Ciencia Política
    • Ingeniería
    • Letras
    • Matemáticas
    • Medicina
    • Química
    • Teología
    • Sede regional Villarrica
  • 2011-03-15-13-28-09
  • Organizaciones vinculadas
  • 2011-03-15-13-28-09
  • Bibliotecas
  • 2011-03-15-13-28-09
  • Mi Portal UC
  • 2011-03-15-13-28-09
  • Correo UC
- Repository logo
  • English
  • Català
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Latviešu
  • Magyar
  • Nederlands
  • Polski
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Yкраї́нська
  • Log in
    Log in
    Have you forgotten your password?
Repository logo
  • Communities & Collections
  • All of DSpace
  • English
  • Català
  • Čeština
  • Deutsch
  • Español
  • Français
  • Gàidhlig
  • Latviešu
  • Magyar
  • Nederlands
  • Polski
  • Português
  • Português do Brasil
  • Suomi
  • Svenska
  • Türkçe
  • Қазақ
  • বাংলা
  • हिंदी
  • Ελληνικά
  • Yкраї́нська
  • Log in
    Log in
    Have you forgotten your password?
  1. Home
  2. Browse by Author

Browsing by Author "Barcelo, Pablo"

Now showing 1 - 11 of 11
Results Per Page
Sort Options
  • No Thumbnail Available
    Item
    Chile's New Interdisciplinary Institute for Foundational Research on Data
    (2020) Arenas, Marcelo; Barcelo, Pablo
  • No Thumbnail Available
    Item
    On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results
    (2023) Arenas, Marcelo; Barcelo, Pablo; Bertossi, Leopoldo; Monet, Mikael
    Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, we prove a strong positive result stating that the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits under the so-called product distributions on entities. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). Our positive result extends even beyond binary classifiers, as it continues to hold if each feature is associated with a finite domain of possible values.We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always poly-nomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard). It also implies that computing SHAP-scores is #P-hard even over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing SHAP-scores over such class. In stark contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists (under widely believed complexity assumptions) for the computation of SHAP-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula phi and features x, y, whether the SHAP-score of x in phi is smaller than the SHAP-score of y in phi.
  • No Thumbnail Available
    Item
    On the expressiveness of LARA: A proposal for unifying linear and relational algebra
    (2022) Barcelo, Pablo; Higuera, Nelson; Perez, Jorge; Subercaseaux, Bernardo
    We study the expressive power of the LARA language - a recently proposed unified model for expressing relational and linear algebra operations - both in terms of traditional database query languages and some analytic tasks often performed in machine learning pipelines. Since LARA is parameterized by a set of user-defined functions which allow to transform values in tables, known as extension functions, the exact expressive power of the language depends on how these functions are defined. We start by showing LARA to be expressive complete with respect to a syntactic fragment of relational algebra with aggregation (under the mild assumption that extension functions in LARA can cope with traditional relational algebra operations such as selection and renaming). We then look further into the expressiveness of LARA based on different classes of extension functions, and distinguish two main cases depending on the level of genericity that queries are enforced to satisfy. Under strong genericity assumptions the language cannot express matrix convolution, a very important operation in current machine learning pipelines. This language is also local, and thus cannot express operations such as matrix inverse that exhibit a recursive behavior. For expressing convolution, one can relax the genericity requirement by adding an underlying linear order on the domain. This, however, destroys locality and turns the expressive power of the language much more difficult to understand. In particular, although under complexity assumptions some versions of the resulting language can still not express matrix inverse, a proof of this fact without such assumptions seems challenging to obtain.
  • No Thumbnail Available
    Item
    Query Languages for Data Exchange: Beyond Unions of Conjunctive Queries
    (2011) Arenas, Marcelo; Barcelo, Pablo; Reutter, Juan
    The class of unions of conjunctive queries (UCQ) has been shown to be particularly well-behaved for data exchange; its certain answers can be computed in polynomial time (in terms of data complexity). However, this is not the only class with this property; the certain answers to any Datalog program can also can be computed in polynomial time. The problem is that both UCQ and Datalog do not allow negated atoms, as adding an unrestricted form of negation to these languages yields to intractability.
  • No Thumbnail Available
    Item
    Querying Regular Graph Patterns
    (2014) Barcelo, Pablo; Libkin, Leonid; Reutter, Juan L.
    Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, that is, graph patterns. While queries need to be posed against such data, techniques for querying patterns are generally lacking, and properties of such queries are not well understood.
  • No Thumbnail Available
    Item
    Regular Languages of Nested Words: Fixed Points, Automata, and Synchronization
    (2011) Arenas, Marcelo; Barcelo, Pablo; Libkin, Leonid
    Nested words provide a natural model of runs of programs with recursive procedure calls. The usual connection between monadic second-order logic (MSO) and automata extends from words to nested words and gives us a natural notion of regular languages of nested words.
  • No Thumbnail Available
    Item
    Regularizing conjunctive features for classification
    (2021) Barcelo, Pablo; Baumgartner, Alexander; Dalmau, Victor; Kimelfeld, Benny
    We consider the feature-generation task wherein we are given a database with entities labeled as positive and negative examples, and we want to find feature queries that linearly separate the two sets of examples. We focus on conjunctive feature queries, and explore two problems: (a) deciding if separating feature queries exist (separability), and (b) generating such queries when they exist. To restrict the complexity of the generated classifiers, we explore various ways of regularizing them by limiting their dimension, the number of joins in feature queries, and their generalized hypertreewidth (ghw). We show that the separability problem is tractable for bounded ghw; yet, the generation problem is not because feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without explicitly generating such queries. (C) 2021 Elsevier Inc. All rights reserved.
  • Loading...
    Thumbnail Image
    Item
    Screening of COVID-19 cases through a Bayesian network symptoms model and psychophysical olfactory test
    (CELL PRESS, 2021) Eyheramendy, Susana; Saa, Pedro A.; Undurraga, Eduardo A.; Valencia, Carlos; Lopez, Carolina; Mendez, Luis; Pizarro Berdichevsky, Javier; Finkelstein Kulka, Andres; Solari, Sandra; Salas, Nicolas; Bahamondes, Pedro; Ugarte, Martin; Barcelo, Pablo; Arenas, Marcelo; Agosin, Eduardo
    The sudden loss of smell is among the earliest and most prevalent symptoms of COVID-19 when measured with a clinical psychophysical test. Research has shown the potential impact of frequent screening for olfactory dysfunction, but existing tests are expensive and time consuming. We developed a low-cost ($0.50/test) rapid psychophysical olfactory test (KOR) for frequent testing and a model-based COVID-19 screening framework using a Bayes Network symptoms model. We trained and validated the model on two samples: suspected COVID-19 cases in five healthcare centers (n = 926; 33% prevalence, 309 RT-PCR confirmed) and healthy miners (n = 1,365; 1.1% prevalence, 15 RT-PCR confirmed). The model predicted COVID-19 status with 76% and 96% accuracy in the healthcare and miners samples, respectively (healthcare: AUC = 0.79 [0.75-0.82], sensitivity: 59%, specificity: 87%; miners: AUC = 0.71 [0.63-0.79], sensitivity: 40%, specificity: 97%, at 0.50 infection probability threshold). Our results highlight the potential for low-cost, frequent, accessible, routine COVID-19 testing to support society's reopening.
  • No Thumbnail Available
    Item
    Semantic Optimization of Conjunctive Queries
    (2020) Barcelo, Pablo; Figueira, Diego; Gottlob, Georg; Pieris, Andreas
    This work deals with the problem of semantic optimization of the central class of conjunctive queries (CQs). Since CQ evaluation is NP-complete, a long line of research has focussed on identifying fragments of CQs that can be efficiently evaluated. One of the most general restrictions corresponds to generalized hypetreewidth bounded by a fixed constant k >= 1; the associated fragment is denoted GHW(k). A CQ is semantically in GHW(k) if it is equivalent to a CQ in GHW(k). The problem of checking whether a CQ is semantically in GHW(k) has been studied in the constraint-free case, and it has been shown to be NP-complete. However, in case the database is subject to constraints such as tuple-generating dependencies (TGDs) that can express, e.g., inclusion dependencies, or equality-generating dependencies (EGDs) that capture, e.g., key dependencies, a CQ may turn out to be semantically in GHW(k) under the constraints, while not being semantically in GHW(k) without the constraints. This opens avenues to new query optimization techniques. In this article, we initiate and develop the theory of semantic optimization of CQs under constraints. More precisely, we study the following natural problem: Given a CQ and a set of constraints, is the query semantically in GHW(k), for a fixed k >= 1, under the constraints, or, in other words, is the query equivalent to one that belongs to GHW(k) over all those databases that satisfy the constraints? We show that, contrary to what one might expect, decidability of CQ containment is a necessary but not a sufficient condition for the decidability of the problem in question. In particular, we show that checking whether a CQ is semantically in GHW1 is undecidable in the presence of full TGDs (i.e., Datalog rules) or EGDs. In view of the above negative results, we focus on the main classes of TGDs for which CQ containment is decidable and that do not capture the class of full TGDs, i.e., guarded, non-recursive, and sticky sets of TGDs, and show that the problem in question is decidable, while its complexity coincides with the complexity of CQ containment. We also consider key dependencies over unary and binary relations, and we show that the problem in question is decidable in elementary time. Furthermore, we investigate whether being semantically in GHW(k) alleviates the cost of query evaluation. Finally, in case a CQ is not semantically in GHW(k), we discuss how it can be approximated via a CQ that falls in GHW(k) in an optimal way. Such approximations might help finding "quick" answers to the input query when exact evaluation is intractable.
  • No Thumbnail Available
    Item
    The Expressive Power of Graph Neural Networks as a Query Language
    (2020) Barcelo, Pablo; Kostylev, Egor, V; Monet, Mikael; Perez, Jorge; Reutter, Juan L.; Silva, Juan-Pablo
    In this paper we survey our recent results characterizing various graph neural network (GNN) architectures in terms of their ability to classify nodes over graphs, for classifiers based on unary logical formulas- or queries. We focus on the language FOC2, a well-studied fragment of FO. This choice is motivated by the fact that FOC2 is related to the Weisfeiler-Lehman (WL) test for checking graph isomorphism, which has the same ability as GNNs for distinguishing nodes on graphs. We unveil the exact relationship between FOC2 and GNNs in terms of node classification. To tackle this problem, we start by studying a popular basic class of GNNs, which we call AC-GNNs, in which the features of each node in a graph are updated, in successive layers, according only to the features of its neighbors. We prove that the unary FOC2 formulas that can be captured by an AC-GNN are exactly those that can be expressed in its guarded fragment, which in turn corresponds to graded modal logic. This result implies in particular that AC-GNNs are too weak to capture all FOC2 formulas. We then seek for what needs to be added to AC-GNNs for capturing all FOC2. We show that it suffices to add readouts layers, which allow updating the node features not only in terms of its neighbors, but also in terms of a global attribute vector. We call GNNs with readouts ACR-GNNs. We also describe experiments that validate our findings by showing that, on synthetic data conforming to FOC2 but not to graded modal logic, AC-GNNs struggle to fit in while ACR-GNNs can generalise even to graphs of sizes not seen during training.
  • Loading...
    Thumbnail Image
    Item
    The Tractability of SHAP-Score-Based Explanations over Deterministic and Decomposable Boolean Circuits
    (ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE, 2021) Arenas, Marcelo; Barcelo, Pablo; Bertossi, Leopoldo; Monet, Mikael
    Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP - score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, it has recently been claimed that the SHAP-score can be computed in polynomial time over the class of decision trees. In this paper, we provide a proof of a stronger result over Boolean models: the SHAP -score can be computed in polynomial time over deterministic and decomposable Boolean circuits. Such circuits, also known as tractable Boolean circuits, generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always polynomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard).

Bibliotecas - Pontificia Universidad Católica de Chile- Dirección oficinas centrales: Av. Vicuña Mackenna 4860. Santiago de Chile.

  • Cookie settings
  • Privacy policy
  • End User Agreement
  • Send Feedback