Deep reinforcement learning (DRL) models have shown great promise in various applications, but their practical adoption in critical domains is limited due to their opaque decision-making processes. To address this challenge, explainable AI (XAI) techniques aim to enhance transparency and interpretability of black-box models. However, most current interpretable systems focus on supervised learning problems, leaving reinforcement learning relatively unexplored.
This paper extends the work of PW-Net, an interpretable wrapper model for DRL agents inspired by image classification methodologies. We introduce Shared-PW-Net, an interpretable deep learning model that features a fully trainable prototype layer. Unlike PW-Net, Shared-PW-Net does not rely on pre-existing prototypes. Instead, it leverages the concept of ProtoPool to automatically learn general prototypes assigned to actions during training. Additionally, we propose a novel prototype initialization method that significantly improves the model’s performance.
Through extensive experimentation, we demonstrate that our Shared-PW-Net achieves the same reward performance as existing methods without requiring human intervention. Our model’s fully trainable prototype layer, coupled with the innovative prototype initialization approach, contributes to a clearer and more interpretable decision-making process. The code for this work is publicly available for further exploration and applications.
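As a rough illustration of the idea of a shared, trainable prototype pool whose prototypes are assigned to actions, the following is a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the class name `SharedPrototypeLayer`, the slot-based soft assignment, the log-ratio similarity function, and all sizes are assumptions chosen for illustration, and training (which would normally be done with an automatic differentiation framework) is omitted.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity(z, prototype, eps=1e-4):
    """Similarity that grows as the squared distance shrinks.

    Assumption: a log-ratio of squared L2 distance, a form commonly used
    in prototype networks; the paper's exact function may differ.
    """
    d2 = sum((a - b) ** 2 for a, b in zip(z, prototype))
    return math.log((d2 + 1.0) / (d2 + eps))


class SharedPrototypeLayer:
    """Hypothetical layer: one prototype pool shared by all actions.

    Each action owns a few slots; each slot holds logits over the pool,
    so several actions may softly select the same prototype.
    """

    def __init__(self, n_prototypes, dim, n_actions, n_slots, seed=0):
        rng = random.Random(seed)
        # Pool of prototype vectors (would be trainable parameters).
        self.prototypes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(n_prototypes)
        ]
        # Assignment logits, indexed [action][slot][prototype]
        # (would also be trainable parameters).
        self.slot_logits = [
            [[rng.gauss(0.0, 1.0) for _ in range(n_prototypes)]
             for _ in range(n_slots)]
            for _ in range(n_actions)
        ]

    def action_scores(self, z):
        """Score each action from the latent state z."""
        sims = [similarity(z, p) for p in self.prototypes]
        scores = []
        for action_slots in self.slot_logits:
            score = 0.0
            for logits in action_slots:
                weights = softmax(logits)  # soft choice of one prototype
                score += sum(w * s for w, s in zip(weights, sims))
            scores.append(score)
        return scores


layer = SharedPrototypeLayer(n_prototypes=4, dim=3, n_actions=2, n_slots=2)
scores = layer.action_scores([0.1, -0.2, 0.3])
```

Here `scores` holds one value per action. In a trained model of this kind, the largest assignment weights in each slot would indicate which prototypes explain a given action.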
Publication details
2023, CEUR Workshop Proceedings Vol-3518
Understanding Deep RL agent decisions: a novel interpretable approach with trainable prototypes (04b Conference paper in volume)
Borzillo Caterina, Ragno Alessio, Capobianco Roberto
keywords