DSpace Angular :: Browsing by Author "Toro Icarte, Rodrigo Andrés"

Now showing 1 - 10 of 10

Benchmarking machine learning methods for portfolio management: challenges and opportunities
(2025) Laguna Altamirano, Santiago Andrés; Baier Aranda, Jorge Andrés; Toro Icarte, Rodrigo Andrés; Pontificia Universidad Católica de Chile. Escuela de Ingeniería
Machine learning has become a powerful tool in investment portfolio management over the last decade. However, certain key practical challenges are often overlooked, such as market diversity, realistic transaction costs for large trades, and robust constraints on model testing. This work evaluates the effectiveness and scalability of machine learning methods under more realistic conditions, using the constituent stocks of the S&P 500 and the DJIA. We analyze reinforcement learning, imitation learning, DAgger, and model-based techniques. To the best of our knowledge, this is the first study to systematically compare all of these approaches. Our findings show that the best methods outperform the annualized return and Sharpe ratio of standard benchmark indices.
Can a general-purpose commonsense ontology improve performance of learning-based image retrieval?
(2015) Toro Icarte, Rodrigo Andrés; Baier Aranda, Jorge Andrés; Soto Arriaza, Álvaro Marcelo; Pontificia Universidad Católica de Chile. Escuela de Ingeniería
The knowledge representation community has invested great effort in building commonsense ontologies. These contain thousands of relations about different aspects of the everyday world, for example “every man is a person” or “books are used for reading”. Among this large number of relations, some contain information relevant to the visual world. However, to date, no state-of-the-art algorithm for any computer vision task has explicitly incorporated this knowledge. Such algorithms typically use machine learning techniques to learn recognition models from examples (thousands of them). In this thesis we study whether a general-purpose ontology, specifically ConceptNet (the MIT ontology), can play a role in state-of-the-art computer vision. We chose sentence-based image retrieval as our test scenario. Our starting point is a deep convolutional network that lets us build an image retrieval algorithm based on word detectors. We then present a variant that incorporates commonsense relations from ConceptNet. As a result, we obtained an improvement over the state of the art on the MSCOCO 5K dataset.
Comparación de MILP, SAT Y ASP para la síntesis de autómatas mínimos a partir de trazas [Comparison of MILP, SAT, and ASP for synthesizing minimal automata from traces]
(2025) Medina Jorquera, Alex Pavel; Baier Aranda, Jorge Andrés; Toro Icarte, Rodrigo Andrés; Pontificia Universidad Católica de Chile. Escuela de Ingeniería
In this work, we compare three main approaches to synthesizing minimal deterministic finite automata (DFA) from positively and negatively labeled traces: Mixed-Integer Linear Programming (MILP), SAT solving, and Answer Set Programming (ASP). We adapt existing SAT models, develop new MILP- and ASP-based encodings, and create a dataset inspired by the StaMinA competition to evaluate the models' performance. The results show that SAT-based approaches are consistently superior, solving both simple and complex problems more efficiently. ASP-based models perform competitively under specific configurations, while MILP showed limitations by comparison. This study highlights the importance of optimizations in ASP-based models and their potential for future applications, such as the extension to nondeterministic automata and other discrete optimization problems.
Connection-Aware Heuristics for Scheduling and Distributing Jobs under Dynamic Dew Computing Environments
(2024) Sanabria Quispe, Pablo; Montoya Tapia, Sebastián Ignacio; Neyem, Andrés; Toro Icarte, Rodrigo Andrés; Hirsch, Matías; Mateos, Cristian
Due to the widespread use of mobile and IoT devices, coupled with their continually expanding processing capabilities, dew computing environments have become a significant focus for researchers. These environments enable resource-constrained devices to contribute computing power to a local network. One major challenge within these environments revolves around task scheduling, specifically determining the optimal distribution of jobs across the available devices in the network. This challenge becomes particularly pronounced in dynamic environments where network conditions constantly change. This work proposes integrating the “reliability” concept into cutting-edge human-designed job distribution heuristics, named ReleSEAS and RelBPA, as a means of adapting to dynamic and ever-changing network conditions caused by nodes’ mobility. Additionally, we introduce a reinforcement learning (RL) approach, embedding both the notion of reliability and real-time network status into the RL agent. Our research rigorously contrasts our proposed algorithms’ throughput and job completion rates with their predecessors. Simulated results reveal a marked improvement in overall throughput, with our algorithms potentially boosting the environment’s performance. They also show a significant enhancement in job completion within dynamic environments compared to baseline findings. Moreover, when RL is applied, it surpasses the job completion rate of human-designed heuristics. Our study emphasizes the advantages of embedding inherent network characteristics into job distribution algorithms for dew computing. Such incorporation gives them a profound understanding of the network’s diverse resources. Consequently, this insight enables the algorithms to manage resources more adeptly and effectively.
Encouraging exploration in vision and language navigation: a path towards better generalization
(2024) Hinostroza Espinoza, Cristian Andrés; Baier Aranda, Jorge Andrés; Toro Icarte, Rodrigo Andrés; Pontificia Universidad Católica de Chile. Escuela de Ingeniería
In Vision-and-Language Navigation (VLN), given a natural-language instruction describing a certain target, and a 3D environment, the task is to find a sequence of actions that allows an agent to navigate from its current location to the target. A fundamental challenge in VLN is that training data is not representative of the distribution of environments. This lack of data may result in very poor performance on unseen environments. In this paper we study a novel approach which explicitly incorporates the notion of exploration training. Specifically, we propose Explore Supervision (EXPS), which is designed to provide VLN agents with supervision which strategically encourages exploration in areas around the initial shortest path to the target. We implemented EXPS on top of a state-of-the-art model for the REVERIE challenge, achieving an improvement of 4.77% in success rate on the unseen validation set.
Learning Reward Machines: A Study in Partially Observable Reinforcement Learning
(2023) Toro Icarte, Rodrigo Andrés; Klassen, Toryn Q.; Valenzano, Richard; Castro Anich, Margarita; Waldie, Ethan; McIlraith, Sheila A.
Reinforcement Learning (RL) is a machine learning paradigm wherein an artificial agent interacts with an environment with the purpose of learning behaviour that maximizes the expected cumulative reward it receives from the environment. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
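As a rough illustration of the automata-based reward representation this abstract describes, the sketch below models a reward machine as a finite-state machine whose transitions fire on high-level events and emit rewards. The class name, the state names (`u0`, `u1`, `u2`), and the "coffee, then office" task are hypothetical choices for illustration only; this is not the authors' implementation, and it does not cover learning the machine from experience.

```python
class RewardMachine:
    """Minimal reward machine: a finite-state machine over high-level events.

    All names here are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, initial_state, transitions):
        # transitions maps (state, event) -> (next_state, reward)
        self.initial_state = initial_state
        self.transitions = transitions
        self.state = initial_state

    def reset(self):
        """Return the machine to its initial state."""
        self.state = self.initial_state

    def step(self, event):
        """Advance on an observed event and return the emitted reward.

        Events with no matching transition leave the state unchanged
        and yield zero reward.
        """
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Hypothetical task: get coffee first, then deliver it to the office.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u2", 1.0),
    },
)

# Visiting the office before getting coffee earns nothing.
rewards = [rm.step(event) for event in ["office", "coffee", "office"]]
print(rewards, rm.state)
```

Because the reward depends on the machine state rather than on the latest event alone, each machine state can be treated as its own subproblem, which is the decomposition the abstract refers to.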
On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation
(ML Research Press, 2024) Labarca Silva, Álvaro; Parra Santander, Denis; Toro Icarte, Rodrigo Andrés
In recent years, Reinforcement Learning (RL) has shown great promise in session-based recommendation. Sequential models that use RL have reached state-of-the-art performance for the Next-item Prediction (NIP) task. This result is intriguing, as the NIP task only evaluates how well the system can correctly recommend the next item to the user, while the goal of RL is to find a policy that optimizes rewards in the long term - sometimes at the expense of suboptimal short-term performance. Then, how can RL improve the system's performance on short-term metrics? This article investigates this question by exploring proxy learning objectives, which we identify as goals RL models might be following, and thus could explain the performance boost. We found that RL - when used as an auxiliary loss - promotes the learning of embeddings that capture information about the user's previously interacted items. Subsequently, we replaced the RL objective with a straightforward auxiliary loss designed to predict the number of items the user interacted with. This substitution results in performance gains comparable to RL. These findings pave the way to improve performance and understanding of RL methods for recommender systems.
Real time search with linear temporal goals
(2023) Middleton Moreno, Jaime Andrés; Baier Aranda, Jorge Andrés; Toro Icarte, Rodrigo Andrés; Pontificia Universidad Católica de Chile. Escuela de Ingeniería
In Real-Time Heuristic Search (RTHS) we are given a search graph G, a heuristic, and an objective, which is to find a path from a start node to a given goal node in G. As such, one does not impose any trajectory constraints on the path, besides reaching the goal. In this work we consider a version of RTHS in which temporally extended goals can be defined on the form of the path. Such goals are specified in Linear Temporal Logic over Finite Traces (LTLf), an expressive language that has been considered in many other frameworks, such as Automated Planning, Synthesis, and Reinforcement Learning, but has not yet been studied in the context of RTHS. First, we propose a general automata-theoretic approach for RTHS, whereby LTLf goals are supported as the result of searching over the cross product of the search graph and the automaton for the LTLf goal; specifically, we describe LTL-LRTA*, a version of LSS LRTA*. Second, we propose an approach to produce heuristics for LTLf goals, based on existing goal-dependent heuristics. Finally, we propose a greedy strategy for RTHS with LTLf goals, which focuses search to make progress over the structure of the automaton; this yields LTL-LRTA*A. In our experimental evaluation over standard benchmarks we show LTL-LRTA*A may outperform LTL-LRTA* substantially for a variety of LTLf goals.
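The cross-product idea mentioned in this abstract can be sketched in a few lines. The example below is not LTL-LRTA* and is not a real-time algorithm: it uses plain breadth-first search, and it assumes the LTLf goal has already been compiled into a deterministic automaton. The function names, the graph, the node labels, and the automaton for "visit a, then reach g" are all hypothetical.

```python
from collections import deque


def advance(delta, q, symbol):
    """Automaton transition; symbols without a transition leave q unchanged."""
    return delta.get((q, symbol), q)


def product_search(graph, labels, start, delta, q0, accepting):
    """Breadth-first search over pairs (graph node, automaton state).

    Returns a list of graph nodes whose label sequence drives the automaton
    into an accepting state, or None if no such path exists.
    """
    init = (start, advance(delta, q0, labels.get(start)))
    parent = {init: None}
    queue = deque([init])
    while queue:
        current = queue.popleft()
        node, q = current
        if q in accepting:
            # Walk parent pointers back to the start to rebuild the path.
            path = []
            while current is not None:
                path.append(current[0])
                current = parent[current]
            return path[::-1]
        for nxt in graph.get(node, []):
            succ = (nxt, advance(delta, q, labels.get(nxt)))
            if succ not in parent:
                parent[succ] = current
                queue.append(succ)
    return None


# Hypothetical instance: the goal "visit a, then reach g" as a 3-state automaton.
graph = {"S": ["A", "G"], "A": ["G"], "G": []}
labels = {"A": "a", "G": "g"}
delta = {("q0", "a"): "q1", ("q1", "g"): "q2"}

path = product_search(graph, labels, "S", delta, "q0", {"q2"})
print(path)
```

Note that the direct edge from `S` to `G` does not satisfy the goal, since the automaton is still in `q0` when `G` is reached; the search has to go through `A` first, which is what tracking the automaton state alongside the graph node buys.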
Reward Machines for Deep RL in Noisy and Uncertain Environments
(2024) Li, Andrew C.; Chen, Zizhao; Klassen, Toryn Q.; Vaezipoor, Pashootan; Toro Icarte, Rodrigo Andrés; McIlraith, Sheila A.
Reward Machines provide an automaton-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing the underlying structure of a reward function, they enable the decomposition of an RL task, leading to impressive gains in sample efficiency. Although Reward Machines and similar formal specifications have a rich history of application towards sequential decision-making problems, they critically rely on a ground-truth interpretation of the domain-specific vocabulary that forms the building blocks of the reward function; such ground-truth interpretations are elusive in the real world due in part to partial observability and noisy sensing. In this work, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that exploit task structure under uncertain interpretation of the domain-specific vocabulary. Through theory and experiments, we expose pitfalls in naive approaches to this problem while simultaneously demonstrating how task structure can be successfully leveraged under noisy interpretations of the vocabulary.
The unexpected results of reinforcement learning for sequential recommendation
(2024) Labarca Silva, Álvaro; Parra Santander, Denis; Toro Icarte, Rodrigo Andrés; Pontificia Universidad Católica de Chile. Escuela de Ingeniería
In recent years, Reinforcement Learning (RL) has shown great promise in session-based recommendation. Sequential models that use RL have reached state-of-the-art results on the Next-item Prediction (NIP) task. This result is intriguing, as the NIP task only evaluates how well the system can correctly recommend the next item to the user, while the goal of RL is to find a policy that optimizes reward in the long term, sometimes at the expense of suboptimal short-term performance. How, then, can RL improve the system's performance on short-term metrics? This article investigates this question by exploring proxy learning objectives, which we identify as goals that RL models might be following and that could therefore explain the performance gain. We found that RL, when used as an auxiliary loss, promotes the learning of embeddings that capture information about the items the user previously interacted with. We then replaced the RL objective with a straightforward auxiliary loss designed to predict the number of items the user has interacted with. This substitution results in a performance improvement comparable to that of RL. These results pave the way to improving the performance and understanding of RL models for recommender systems.
