On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation

dc.catalogador: pva
dc.contributor.author: Labarca Silva, Álvaro
dc.contributor.author: Parra Santander, Denis
dc.contributor.author: Toro Icarte, Rodrigo Andrés
dc.date.accessioned: 2025-06-12T20:01:50Z
dc.date.available: 2025-06-12T20:01:50Z
dc.date.issued: 2024
dc.description.abstract: In recent years, Reinforcement Learning (RL) has shown great promise in session-based recommendation. Sequential models that use RL have reached state-of-the-art performance on the Next-item Prediction (NIP) task. This result is intriguing, since the NIP task only evaluates how well the system can recommend the next item to the user, while the goal of RL is to find a policy that optimizes rewards in the long term, sometimes at the expense of short-term performance. How, then, can RL improve the system's performance on short-term metrics? This article investigates this question by exploring proxy learning objectives: goals the RL models might actually be pursuing that would explain the performance boost. We find that RL, when used as an auxiliary loss, promotes the learning of embeddings that capture information about the items the user has previously interacted with. We then replace the RL objective with a straightforward auxiliary loss designed to predict the number of items the user has interacted with, a substitution that yields performance gains comparable to RL (an illustrative sketch follows this record). These findings pave the way to improving both the performance and the understanding of RL methods for recommender systems.
dc.description.funder: National Center for Artificial Intelligence CENIA
dc.description.funder: Fondecyt
dc.fechaingreso.objetodigital: 2025-06-12
dc.format.extent: 19 pages
dc.fuente.origen: SCOPUS
dc.identifier.issn: 2640-3498
dc.identifier.scopusid: SCOPUS_ID:85203845318
dc.identifier.uri: https://proceedings.mlr.press/v235/silva24b.html
dc.identifier.uri: https://repositorio.uc.cl/handle/11534/104663
dc.information.autoruc: Escuela de Ingeniería; Labarca Silva, Álvaro; S/I; 1025772
dc.information.autoruc: Escuela de Ingeniería; Parra Santander, Denis; 0000-0001-9878-8761; 1011554
dc.information.autoruc: Escuela de Ingeniería; Toro Icarte, Rodrigo Andrés; 0000-0002-7734-099X; 170373
dc.language.iso: en
dc.nota.acceso: full text
dc.pagina.final: 45450
dc.pagina.inicio: 45432
dc.publisher: ML Research Press
dc.relation.ispartof: Proceedings of the 41st International Conference on Machine Learning
dc.revista: PMLR
dc.rights: open access
dc.rights.license: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 620
dc.subject.dewey: Engineering
dc.title: On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation
dc.type: conference paper
dc.volumen: 235
sipa.codpersvinculados: 1025772
sipa.codpersvinculados: 1011554
sipa.codpersvinculados: 170373
sipa.trazabilidad: SCOPUS;2024-09-22
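
The abstract's central idea lends itself to a short illustration. Below is a minimal sketch, assuming a PyTorch GRU session encoder: a next-item prediction (NIP) head and an auxiliary head that regresses the number of items the user has interacted with share one session embedding, and the auxiliary term stands in for the RL loss. The names NextItemRecommender, joint_loss, and the weight alpha are hypothetical, not details from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NextItemRecommender(nn.Module):
        # Hypothetical sketch, not the paper's implementation: a GRU session
        # encoder whose final state feeds two heads sharing one embedding,
        # a next-item prediction (NIP) head and an auxiliary head regressing
        # the number of items the user has interacted with.
        def __init__(self, num_items, dim=64):
            super().__init__()
            self.item_emb = nn.Embedding(num_items, dim)
            self.encoder = nn.GRU(dim, dim, batch_first=True)
            self.nip_head = nn.Linear(dim, num_items)  # logits over the catalog
            self.count_head = nn.Linear(dim, 1)        # interaction-count regressor

        def forward(self, item_seq):  # item_seq: (batch, seq_len) of item ids
            hidden, _ = self.encoder(self.item_emb(item_seq))
            session = hidden[:, -1]   # last-step session embedding
            return self.nip_head(session), self.count_head(session).squeeze(-1)

    def joint_loss(nip_logits, count_pred, next_item, true_count, alpha=0.5):
        # Cross-entropy for NIP plus the count-regression auxiliary term that
        # replaces the RL loss; alpha is an assumed weighting factor.
        return (F.cross_entropy(nip_logits, next_item)
                + alpha * F.mse_loss(count_pred, true_count.float()))

Any sequential encoder could stand in for the GRU; the point is only that the count head shapes the shared session embedding to encode how many items came before, which is the proxy signal the paper identifies behind RL's short-term gains.
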
Files
Original bundle
Name: On the Unexpected Effectiveness of Reinforcement Learning.pdf
Size: 342.36 KB
Format: Adobe Portable Document Format