
Browsing by Author "McIlraith, Sheila A."

Now showing 1 - 2 of 2
  • Item
    A heuristic search approach to planning with temporally extended preferences
    (Elsevier Science BV, 2009) Baier, Jorge A.; Bacchus, Fahiem; McIlraith, Sheila A.
    Planning with preferences involves not only finding a plan that achieves the goal; it requires finding a preferred plan that achieves the goal, where preferences over plans are specified as part of the planner's input. In this paper we provide a technique for accomplishing this objective. Our technique can deal with a rich class of preferences, including so-called temporally extended preferences (TEPs). Unlike simple preferences, which express desired properties of the final state achieved by a plan, TEPs can express desired properties of the entire sequence of states traversed by a plan, allowing the user to express a much richer set of preferences. Our technique involves converting a planning problem with TEPs into an equivalent planning problem containing only simple preferences. This conversion is accomplished by augmenting the input planning domain with a new set of predicates and actions for updating these predicates. We then provide a collection of new heuristics and a specialized search algorithm that can guide the planner towards preferred plans. Under some fairly general conditions our method is able to find a most preferred plan, i.e., an optimal plan. It can accomplish this without having to resort to admissible heuristics, which often perform poorly in practice. Nor does our technique require an assumption of restricted plan length or makespan. We have implemented our approach in the HPLAN-P planning system and used it to compete in the 5th International Planning Competition, where it achieved distinguished performance in the Qualitative Preferences track. (C) 2008 Elsevier B.V. All rights reserved. (An illustrative sketch of this compilation into simple preferences appears after the results list.)
  • Item
    Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
    (2022) Icarte, Rodrigo Toro; Klassen, Toryn Q.; Valenzano, Richard; McIlraith, Sheila A.
    Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible, that is, to show the reward function's code to the RL agent so it can exploit the function's internal structure to learn optimal policies in a more sample-efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language; as such, they support loops, sequences, and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification. (A minimal reward-machine sketch appears after the results list.)
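
As an illustration of the compilation idea described in the first abstract above ("A heuristic search approach to planning with temporally extended preferences"), the minimal Python sketch below, which is not taken from the paper, tracks the temporally extended preference "eventually p, and some time after that q" with a small automaton. The class name EventuallyThenMonitor and the propositions p, q, and r are invented for the example. The automaton's state stands in for the extra predicate the compilation adds to the domain, and the step() updates stand in for the added actions that keep that predicate current as a plan unfolds.

    # Illustrative monitor for the TEP "eventually p, then later q".
    class EventuallyThenMonitor:
        def __init__(self):
            # Automaton states: waiting_for_p -> waiting_for_q -> satisfied.
            self.state = "waiting_for_p"

        def step(self, holds):
            """Advance the monitor; `holds` is the set of propositions true in the new world state."""
            if self.state == "waiting_for_p" and "p" in holds:
                self.state = "waiting_for_q"
            elif self.state == "waiting_for_q" and "q" in holds:
                self.state = "satisfied"

        def is_satisfied(self):
            return self.state == "satisfied"

    # A plan's trace is the sequence of world states it visits; in the compilation,
    # extra action effects would perform these updates inside the planner itself.
    monitor = EventuallyThenMonitor()
    for world_state in [{"r"}, {"p"}, {"q"}]:
        monitor.step(world_state)
    print(monitor.is_satisfied())  # True: p became true, and q became true afterwards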
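
The second abstract ("Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning") describes reward machines as finite state machines whose transitions expose the structure of the reward function to the learning agent. The sketch below is a minimal illustration of that data structure, not code from the paper; the RewardMachine class, the state names u0, u1, and u_accept, and the propositions a and b are invented for the example. Each environment step is abstracted to the set of propositions it made true; the machine returns a reward and advances its state, and that state is fully visible structure the RL agent can condition its policy on.

    # Minimal reward-machine sketch: transitions map (state, proposition) -> (next_state, reward).
    class RewardMachine:
        def __init__(self, initial_state, transitions):
            self.initial_state = initial_state
            self.transitions = transitions
            self.state = initial_state

        def reset(self):
            self.state = self.initial_state

        def step(self, true_props):
            """Consume the propositions made true by the last environment step."""
            for prop in true_props:
                if (self.state, prop) in self.transitions:
                    next_state, reward = self.transitions[(self.state, prop)]
                    self.state = next_state
                    return reward
            return 0.0  # no transition fired: zero reward, state unchanged

    # Task "reach a, then reach b": reward 1.0 only when b is observed after a.
    rm = RewardMachine("u0", {("u0", "a"): ("u1", 0.0),
                              ("u1", "b"): ("u_accept", 1.0)})
    for props in [set(), {"a"}, {"b"}]:
        print(rm.state, rm.step(props))  # prints: u0 0.0, u0 0.0, u1 1.0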

