Browsing by Author "McIlraith, Sheila A."
Now showing 1 - 4 of 4
- Item: Learning Reward Machines: A Study in Partially Observable Reinforcement Learning (2023). Toro Icarte, Rodrigo Andrés; Klassen, Toryn Q.; Valenzano, Richard; Castro Anich, Margarita; Waldie, Ethan; McIlraith, Sheila A.
  Reinforcement Learning (RL) is a machine learning paradigm wherein an artificial agent interacts with an environment with the purpose of learning behaviour that maximizes the expected cumulative reward it receives from the environment. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
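  The abstract above describes a reward machine as an automaton whose transitions track task progress and emit rewards. A minimal sketch of that idea, with all names and the example task purely illustrative (not taken from the paper's code):

  ```python
  class RewardMachine:
      """A finite automaton over propositional events that emits rewards.

      transitions: {(state, event): (next_state, reward)}
      Unknown (state, event) pairs self-loop with zero reward.
      """

      def __init__(self, transitions, initial_state):
          self.transitions = transitions
          self.initial = initial_state
          self.state = initial_state

      def step(self, event):
          next_state, reward = self.transitions.get(
              (self.state, event), (self.state, 0.0))
          self.state = next_state
          return reward

      def reset(self):
          self.state = self.initial

  # Hypothetical two-step task: "get coffee, then deliver it to the office".
  rm = RewardMachine(
      transitions={
          ("u0", "coffee"): ("u1", 0.0),  # subtask 1 done, no reward yet
          ("u1", "office"): ("u2", 1.0),  # delivery completes the task
      },
      initial_state="u0",
  )

  total = 0.0
  for event in ["hallway", "coffee", "hallway", "office"]:
      total += rm.step(event)
  ```

  Each RM state (u0, u1, u2 here) defines one subproblem; a memoryless policy per state is what the decomposition learns.
  
  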
- Item: Planning with Preferences (2008). Baier, Jorge A.; McIlraith, Sheila A.
  Automated planning is a branch of AI that addresses the problem of generating a set of actions to achieve a specified goal state, given an initial state of the world. It is an active area of research that is central to the development of intelligent agents and autonomous robots. In many real-world applications, a multitude of valid plans exist, and a user distinguishes plans of high quality by how well they adhere to the user's preferences. To generate such high-quality plans automatically, a planning system must provide a means of specifying the user's preferences with respect to the planning task, as well as a means of generating plans that ideally optimize these preferences. In the last few years, there has been significant research in the area of planning with preferences. In this article we review current approaches to preference representation for planning, and we survey and contrast the various approaches to generating preferred plans that have been developed to date.
- Item: Reward Machines for Deep RL in Noisy and Uncertain Environments (2024). Li, Andrew C.; Chen, Zizhao; Klassen, Toryn Q.; Vaezipoor, Pashootan; Toro Icarte, Rodrigo Andrés; McIlraith, Sheila A.
  Reward Machines provide an automaton-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing the underlying structure of a reward function, they enable the decomposition of an RL task, leading to impressive gains in sample efficiency. Although Reward Machines and similar formal specifications have a rich history of application towards sequential decision-making problems, they critically rely on a ground-truth interpretation of the domain-specific vocabulary that forms the building blocks of the reward function. Such ground-truth interpretations are elusive in the real world due in part to partial observability and noisy sensing. In this work, we explore the use of Reward Machines for Deep RL in noisy and uncertain environments. We characterize this problem as a POMDP and propose a suite of RL algorithms that exploit task structure under uncertain interpretation of the domain-specific vocabulary. Through theory and experiments, we expose pitfalls in naive approaches to this problem while simultaneously demonstrating how task structure can be successfully leveraged under noisy interpretations of the vocabulary.
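  When the vocabulary is detected noisily, as this abstract describes, one natural device is a belief over reward-machine states rather than a single known state. A hedged sketch of such a belief update, with all names, transitions, and probabilities illustrative assumptions rather than the paper's actual algorithm:

  ```python
  def update_belief(belief, event_probs, transitions):
      """Propagate a belief over RM states through a noisy event detection.

      belief: {rm_state: probability}
      event_probs: {event: probability} from a noisy detector (sums to 1)
      transitions: {(state, event): next_state}; undefined pairs self-loop
      """
      new_belief = {}
      for state, p_state in belief.items():
          for event, p_event in event_probs.items():
              nxt = transitions.get((state, event), state)
              new_belief[nxt] = new_belief.get(nxt, 0.0) + p_state * p_event
      return new_belief

  # Hypothetical task automaton: see "coffee" to reach u1, then "office" to reach u2.
  transitions = {("u0", "coffee"): "u1", ("u1", "office"): "u2"}
  belief = {"u0": 1.0}
  # Detector is 80% sure it observed "coffee", 20% that nothing relevant occurred.
  belief = update_belief(belief, {"coffee": 0.8, "none": 0.2}, transitions)
  # belief is now {"u1": 0.8, "u0": 0.2}
  ```

  A policy conditioned on this belief (instead of a single RM state) is one way task structure can still be exploited despite the noisy interpretation of the vocabulary.
  
  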
- Item: Reward Machines for Deep RL in Noisy and Uncertain Environments (2024). Li, Andrew C.; Chen, Zizhao; Klassen, Toryn Q.; Vaezipoor, Pashootan; Toro Icarte, Rodrigo Andrés; McIlraith, Sheila A.