Approaches that use human feedback to guide RL agents suffer from poor scalability. Moreover, it is difficult for a user to give informed feedback on whether an agent took its actions for the right reasons. We therefore want to study whether a visual interpretation of an agent's policy, in the form of an attention or saliency mechanism, can help users make better feedback decisions: can a user distinguish a good policy from a bad one when given such a visualization? The main task is to find a suitable visualization technique and to run a comparative study on RL agent policies. As an extension, the goal is to investigate whether augmenting the observations in this way can improve the performance of human-in-the-loop RL algorithms in game-based environments such as Atari.
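As a concrete starting point, one common family of visualization techniques is gradient-based saliency: backpropagate the selected action's logit to the input pixels and show the resulting heatmap to the human evaluator. The sketch below is a minimal illustration of this idea, assuming a PyTorch CNN policy over stacked Atari frames; the architecture, the names PolicyNet and saliency_map, and the observation shape are illustrative assumptions, not part of the project description.

```python
# Minimal sketch of gradient-based saliency for an RL policy (assumed
# setup: PyTorch CNN over a stack of 4 grayscale 84x84 Atari frames).
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small CNN policy producing action logits (illustrative)."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 9 * 9, n_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(obs))

def saliency_map(policy: PolicyNet, obs: torch.Tensor) -> torch.Tensor:
    """Gradient of the greedy action's logit w.r.t. the input pixels."""
    obs = obs.clone().requires_grad_(True)
    logits = policy(obs)
    action = logits.argmax(dim=-1)
    # Backpropagate the selected action's logit to the observation.
    logits.gather(-1, action.unsqueeze(-1)).sum().backward()
    # Aggregate |gradient| over the frame-stack channels -> 84x84 heatmap.
    return obs.grad.abs().sum(dim=1)

# Usage: overlay the heatmap on the current frame before presenting it
# to the user who gives feedback.
policy = PolicyNet(n_actions=6)
obs = torch.rand(1, 4, 84, 84)       # placeholder observation
heatmap = saliency_map(policy, obs)  # shape (1, 84, 84)
```

Perturbation-based saliency is a plausible alternative to the gradient approach sketched here; comparing such techniques in terms of how well they support human feedback would be part of the project.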
Sammy Christen (firstname.lastname@example.org)
Dr. David Lindlbauer (email@example.com)
CLS Student Project (MPG ETH CLS)
ETH Organisational Labels (ETHZ)
Engineering and Technology
Behavioural and Cognitive Sciences
Max Planck ETH Center for Learning Systems