DSpace Angular :: Browsing by Author "Arenas, Marcelo"

Browsing by Author "Arenas, Marcelo"

Now showing 1 - 20 of 32

An extension of SPARQL for RDFS
(SPRINGER-VERLAG BERLIN, 2008) Arenas, Marcelo; Gutierrez, Claudio; Perez, Jorge; Christophides, V.; Collard, M.; Gutierrez, C.
RDF Schema (RDFS) extends RDF with a schema vocabulary with a predefined semantics. Evaluating queries which involve this vocabulary is challenging, and there is not yet consensus in the Semantic Web community on how to define a query language for RDFS. In this paper, we introduce a language for querying RDFS data. This language is obtained by extending SPARQL with nested regular expressions that allow to navigate through an RDF graph with RDFS vocabulary. This language is expressive enough to answer SPARQL queries involving RDFS vocabulary, by directly traversing the input graph.
Bidirectional Constraints for Exchanging Data: Beyond Monotone Queries
(IJCAI-INT JOINT CONF ARTIF INTELL, 2015) Arenas, Marcelo; Dieguez, Gabriel; Perez, Jorge
In this paper, we propose to use the language of bidirectional constraints to specify schema mappings in the context of data exchange. These constraints impose restrictions over both the source and the target data, and have the potential to minimize the ambiguity in the description of the target data to be materialized. We start by making a case for the usefulness of bidirectional constraints to give a meaningful closed-world semantics for st-tgds, which is motivated by Clark's predicate completion and Reiter's formalization of the closed-world assumption of a logical theory. We then formally study the use of bidirectional constraints in data exchange. In particular, we pinpoint the complexity of the existence-of-solutions and the query evaluation problems in several different scenarios, including in the latter case both monotone and nonmonotone queries.
Chile's New Interdisciplinary Institute for Foundational Research on Data
(2020) Arenas, Marcelo; Barcelo, Pablo
Composition and Inversion of Schema Mappings
(ASSOC COMPUTING MACHINERY, 2009) Arenas, Marcelo; Perez, Jorge; Reutter, Juan; Riveros, Cristian
COMPOSITION WITH TARGET CONSTRAINTS
(2011) Arenas, Marcelo; Fagin, Ronald; Nash, Alan
It is known that the composition of schema mappings, each specified by source-to-target tgds (st-tgds), can be specified by a second-order tgd (SO tgd). We consider the question of what happens when target constraints are allowed. Specifically, we consider the question of specifying the composition of standard schema mappings (those specified by st-tgds, target egds, and a weakly acyclic set of target tgds). We show that SO tgds, even with the assistance of arbitrary source constraints and target constraints, cannot specify in general the composition of two standard schema mappings. Therefore, we introduce source-to-target second-order dependencies (st-SO dependencies), which are similar to SO tgds, but allow equations hi the conclusion. We, show that st-SO dependencies (along with target egds and target tgds) are sufficient to express the composition of every finite sequence of standard schema mappings, and further, every at SO dependency specifies such a composition. In addition to this expressive power, we show that st-SO dependencies enjoy other desirable properties. In particular, they have a polynomial-time chase that generates a universal solution. This universal solution can be used to find the certain answers to unions of conjunctive queries in polynomial time.
Counting the Answers to a Query
(2022) Arenas, Marcelo; Croquevielle, Luis Alberto; Jayaram, Rajesh; Riveros, Cristian
Counting the answers to a query is a fundamental problem in databases, with several applications in the evaluation, optimization, and visualization of queries. Unfortunately, counting query answers is a #P-hard problem in most cases, so it is unlikely to be solvable in polynomial time. Recently, new results on approximate counting have been developed, specifically by showing that some problems in automata theory admit fully polynomial-time randomized approximation schemes. These results have several implications for the problem of counting the answers to a query; in particular, for graph and conjunctive queries. In this work, we present the main ideas of these approximation results, by using labeled DAGs instead of automata to simplify the presentation. In addition, we review how to apply these results to count query answers in different areas of databases.
Game-based notions of locality over finite models
(2008) Arenas, Marcelo; Barceló, Pablo; Libkin, Leonid
Locality notions in logic say that the truth value of a formula can be determined locally, by looking at the isomorphism type of a small neighbourhood of its free variables. Such notions have proved to be useful in many applications. They all, however, refer to isomorphisms of neighbourhoods, which most local logics cannot test. A stronger notion of locality says that the truth value of a formula is determined by what the logic itself can say about that small neighbourhood. Since the expressiveness of many logics can be characterized by games, one can also say that the truth value of a formula is determined by the type, with respect to a game, of that small neighbourhood. Such game-based notions of locality can often be applied when traditional isomorphism-based notions of locality cannot. Our goal is to study game-based notions of locality. We work with an abstract view of games that subsumes games for many logics. We look at three, progressively more complicated locality notions. The easiest requires only very mild conditions on the game and works for most logics of interest. The other notions, based on Hanf's and Gaifman's theorems, require more restrictions. We state those restrictions and give examples of logics that satisfy and fail the respective game-based notions of locality. (c) 2007 Elsevier B.V. All rights reserved.
Improved screening of COVID-19 cases through a Bayesian network symptoms model and psychophysical olfactory test
(2021) Eyheramendy, Susana ; Saa, Pedro A. ; Undurraga, Eduardo A. ; Valencia, Carlos ; López, Carolina ; Méndez, Luis ; Pizarro-Berdichevsky, Javier ; Finkelstein-Kulka, Andrés ; Solari, Sandra ; Salas, Nicolás ; Bahamondes, Pedro ; Ugarte, Martín ; Barceló, Pablo ; Arenas, Marcelo ; Agosin, Eduardo
Information systems preface
(PERGAMON-ELSEVIER SCIENCE LTD, 2009) Arenas, Marcelo; Schwartzbach, Michael I.
Is it Possible to Verify if a Transaction is Spendable?
(FRONTIERS MEDIA SA, 2021) Arenas, Marcelo; Reisenegger, Thomas; Reutter, Juan; Vrgoc, Domagoj
With the popularity of Bitcoin, there is a growing need to understand the functionality, security, and performance of various mechanisms that comprise it. In this paper, we analyze Bitcoin's scripting language, Script, that is one of the main building blocks of Bitcoin transactions. We formally define the semantics of Script, and study the problem of determining whether a user-defined script is well-formed; that is, whether it can be unlocked, or whether it contains errors that would prevent this from happening.
Locality of queries and transformations (Invited talk)
(SPRINGER, 2006) Arenas, Marcelo
MillenniumDB: An Open-Source Graph Database System
(2023) Vrgoc, Domagoj; Rojas, Carlos; Angles, Renzo; Arenas, Marcelo; Arroyuelo, Diego; Buil-Aranda, Carlos; Hogan, Aidan; Navarro, Gonzalo; Riveros, Cristian; Romero, Juan
In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported, thus providing a flexible data management engine for diverse types of knowledge graph. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as graph-specific algorithms for evaluating path queries. In this paper, we present the main design principles underlying MillenniumDB, describing the abstract graph model and query semantics supported, the concrete data model and query syntax implemented, as well as the storage, indexing, query planning and query evaluation techniques used. We evaluate MillenniumDB over real-world data and queries from the Wikidata knowledge graph, where we find that it outperforms other popular persistent graph database engines (including both enterprise and open source alternatives) that support similar query features.
#NFA Admits an FPRAS: Efficient Enumeration, Counting, and Uniform Generation for Logspace Classes
(ASSOC COMPUTING MACHINERY, 2021) Arenas, Marcelo; Alberto Croquevielle, Luis; Jayaram, Rajesh; Riveros, Cristian
In this work, we study two simple yet general complexity classes, based on logspace Turing machines, that provide a unifying framework for efficient query evaluation in areas such as information extraction and graph databases, among others. We investigate the complexity of three fundamental algorithmic problems for these classes: enumeration, counting, and uniform generation of solutions, and show that they have several desirable properties in this respect.
nSPARQL: A Navigational Language for RDF
(SPRINGER-VERLAG BERLIN, 2008) Perez, Jorge; Arenas, Marcelo; Gutierrez, Claudio; Sheth, A.; Staab, S.; Paolucci, M.; Maynard, D.; Finin, T.; Krishnaprasad, T.
Navigational features have been largely recognized as fundamental for graph database query languages. This fact has motivated several authors to propose RDF query languages with navigational capabilities. In particular, we have argued in a previous paper that nested regular expressions are appropriate to navigate RDF data, and we have proposed the nSPARQL query language for RDF, that uses nested regular expressions as building blocks. In this paper, we study some of the fundamental properties of nSPARQL concerning expressiveness and complexity of evaluation. Regarding expressiveness, we show that nSPARQL is expressive enough to answer queries considering the semantics of the RDFS vocabulary by directly traversing the input graph. We also show that nesting is necessary to obtain this last result, and we study the expressiveness of the combination of nested regular expressions and SPARQL operators. Regarding complexity of evaluation, we prove that the evaluation of a nested regular expression E over an RDF graph G can be computed in time O(vertical bar G vertical bar . vertical bar E vertical bar).
OBDA: Query Rewriting or Materialization? In Practice, Both!
(SPRINGER INTERNATIONAL PUBLISHING AG, 2014) Sequeda, Juan F.; Arenas, Marcelo; Miranker, Daniel P.; Mika, P.; Tudorache, T.; Bernstein, A.; Welty, C.; Knoblock, C.; Vrandecic, D.; Groth, P.; Noy, N.; Janowicz, K.; Goble, C.
Given a source relational database, a target OWL ontology and a mapping from the source database to the target ontology, Ontology-Based Data Access (OBDA) concerns answering queries over the target ontology using these three components. This paper presents the development of Ultrawrap(OBDA), an OBDA system comprising bidirectional evaluation; that is, a hybridization of query rewriting and materialization. We observe that by compiling the ontological entailments as mappings, implementing the mappings as SQL views and materializing a subset of the views, the underlying SQL optimizer is able to reduce the execution time of a SPARQL query by rewriting the query in terms of the views specified by the mappings. To the best of our knowledge, this is the first OBDA system supporting ontologies with transitivity by using SQL recursion. Our contributions include: (1) an efficient algorithm to compile ontological entailments as mappings; (2) a proof that every SPARQL query can be rewritten into a SQL query in the context of mappings; (3) a cost model to determine which views to materialize to attain the fastest execution time; and (4) an empirical evaluation comparing with a state-of-the-art OBDA system, which validates the cost model and demonstrates favorable execution times.
On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results
(2023) Arenas, Marcelo; Barcelo, Pablo; Bertossi, Leopoldo; Monet, Mikael
Scores based on Shapley values are widely used for providing explanations to classification results over machine learning models. A prime example of this is the influential SHAP-score, a version of the Shapley value that can help explain the result of a learned model on a specific entity by assigning a score to every feature. While in general computing Shapley values is a computationally intractable problem, we prove a strong positive result stating that the SHAP-score can be computed in polynomial time over deterministic and decomposable Boolean circuits under the so-called product distributions on entities. Such circuits are studied in the field of Knowledge Compilation and generalize a wide range of Boolean circuits and binary decision diagrams classes, including binary decision trees, Ordered Binary Decision Diagrams (OBDDs) and Free Binary Decision Diagrams (FBDDs). Our positive result extends even beyond binary classifiers, as it continues to hold if each feature is associated with a finite domain of possible values.We also establish the computational limits of the notion of SHAP-score by observing that, under a mild condition, computing it over a class of Boolean models is always poly-nomially as hard as the model counting problem for that class. This implies that both determinism and decomposability are essential properties for the circuits that we consider, as removing one or the other renders the problem of computing the SHAP-score intractable (namely, #P-hard). It also implies that computing SHAP-scores is #P-hard even over the class of propositional formulas in DNF. Based on this negative result, we look for the existence of fully-polynomial randomized approximation schemes (FPRAS) for computing SHAP-scores over such class. In stark contrast to the model counting problem for DNF formulas, which admits an FPRAS, we prove that no such FPRAS exists (under widely believed complexity assumptions) for the computation of SHAP-scores. Surprisingly, this negative result holds even for the class of monotone formulas in DNF. These techniques can be further extended to prove another strong negative result: Under widely believed complexity assumptions, there is no polynomial-time algorithm that checks, given a monotone DNF formula phi and features x, y, whether the SHAP-score of x in phi is smaller than the SHAP-score of y in phi.
On the complexity of verifying consistency of XML specifications
(SIAM PUBLICATIONS, 2008) Arenas, Marcelo; Fan, Wenfei; Libkin, Leonid
XML specifications often consist of a type definition (typically, a document type definition (DTD)) and a set of integrity constraints. It has been shown previously that such specifications can be inconsistent, and thus it is often desirable to check consistency at compile time. It is known [W. Fan and L. Libkin, J. ACM, 49 (2002), pp. 368 - 406] that for general keys, foreign keys, and DTDs the consistency problem is undecidable; however, it becomes NP-complete when all keys are one-attribute (unary) and tractable, if no foreign keys are used. In this paper, we consider a variety of previously studied constraints for XML data and investigate the complexity of the consistency problem. Our main conclusion is that, in the presence of foreign key constraints, compile-time veri. cation of consistency is infeasible. We look at absolute constraints that hold in the entire document and relative constraints that hold only in a part of the document. For absolute constraints, we prove decidability and establish complexity bounds for primary multiattribute keys and unary foreign keys and study unary constraints that involve regular expressions. For relative constraints, we prove that even for unary constraints the consistency problem is undecidable. We also show that results continue to hold for extended DTDs, a more expressive typing mechanism for XML.
Query language-based inverses of schema mappings: semantics, computation, and closure properties
(2012) Arenas, Marcelo; Perez, Jorge; Reutter, Juan; Riveros, Cristian
The inversion of schema mappings has been identified as one of the fundamental operators for the development of a general framework for metadata management. During the last few years, three alternative notions of inversion for schema mappings have been proposed (Fagin-inverse (Fagin, TODS 32(4), 25:1-25:53, 2007), quasi-inverse (Fagin et al., TODS 33(2), 11:1-11:52, 2008), and maximum recovery (Arenas et al., TODS 34(4), 22:1-22:48, 2009)). However, these notions lack some fundamental properties that limit their practical applicability: most of them are expressed in languages including features that are difficult to use in practice, some of these inverses are not guaranteed to exist for mappings specified with source-to-target tuple-generating dependencies (st-tgds), and it has been futile to search for a meaningful mapping language that is closed under any of these notions of inverse. In this paper, we develop a framework for the inversion of schema mappings that fulfills all of the above requirements. It is based on the notion of -maximum recovery, for a query language , a notion designed to generate inverse mappings that recover back only the information that can be retrieved with queries in . By focusing on the language of conjunctive queries (CQ), we are able to find a mapping language that contains the class of st-tgds, is closed under CQ-maximum recovery, and for which the chase procedure can be used to exchange data efficiently. Furthermore, we show that our choices of inverse notion and mapping language are optimal, in the sense that choosing a more expressive inverse operator or mapping language causes the loss of these properties.
Query Languages for Data Exchange: Beyond Unions of Conjunctive Queries
(2011) Arenas, Marcelo; Barcelo, Pablo; Reutter, Juan
The class of unions of conjunctive queries (UCQ) has been shown to be particularly well-behaved for data exchange; its certain answers can be computed in polynomial time (in terms of data complexity). However, this is not the only class with this property; the certain answers to any Datalog program can also can be computed in polynomial time. The problem is that both UCQ and Datalog do not allow negated atoms, as adding an unrestricted form of negation to these languages yields to intractability.
Querying in the Age of Graph Databases and Knowledge Graphs
(ASSOC COMPUTING MACHINERY, 2021) Arenas, Marcelo; Gutierrez, Claudio; Sequeda, Juan F.
Graphs have become the best way we know of representing knowledge. The computing community has investigated and developed the support for managing graphs by means of digital technology. Graph databases and knowledge graphs surface as the most successful solutions to this program. This tutorial will provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs.

Browsing by Author "Arenas, Marcelo"

Results Per Page

Sort Options