arxiv:2504.03163

Enhanced Penalty-based Bidirectional Reinforcement Learning Algorithms

Published on Apr 4, 2025

Authors:

Abstract

Penalty functions integrated with bidirectional learning enhance reinforcement learning agent performance by guiding action avoidance and improving learning efficiency in complex environments.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

This research focuses on enhancing reinforcement learning (RL) algorithms by integrating penalty functions that guide agents to avoid unwanted actions while optimizing rewards. The goal is to improve the learning process by ensuring that agents learn not only which actions are suitable but also which actions to avoid. Additionally, we reintroduce a bidirectional learning approach that enables agents to learn from both initial and terminal states, improving learning speed and robustness in complex environments. Our proposed Penalty-Based Bidirectional methodology is evaluated on ManiSkill benchmark environments, where it improves success rate by approximately 4% compared to existing RL implementations. The findings indicate that this integrated strategy enhances policy learning, adaptability, and overall performance in challenging scenarios.
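The abstract does not state the exact penalty formulation. One common way to integrate a penalty into RL is to subtract a weighted penalty term from the environment reward, r' = r - λ·1[action is unwanted], so the agent learns both what to do and what to avoid. The sketch below illustrates that idea only, under loudly stated assumptions: the toy corridor, the `JUMP` action flagged as "unwanted", the penalty weight λ, and all function names (`env_step`, `shaped_reward`, `q_learning`, `greedy_action`) are hypothetical. Tabular Q-learning stands in for the paper's learners, and the bidirectional component is not modeled.

```python
import random

# Hypothetical toy corridor (not from the paper): states 0..GOAL, start at 0.
GOAL = 6
STEP, JUMP = 0, 1          # JUMP moves two cells but is assumed to be "unwanted"
ACTIONS = (STEP, JUMP)


def env_step(state, action):
    """Return (next_state, env_reward, unsafe, done) for the toy corridor."""
    move = 1 if action == STEP else 2
    next_state = min(state + move, GOAL)
    unsafe = action == JUMP            # assumed marker of an action to avoid
    return next_state, -1.0, unsafe, next_state == GOAL


def shaped_reward(env_reward, unsafe, penalty_weight):
    """Penalty-augmented reward: r' = r - lambda * 1[action is unwanted]."""
    return env_reward - penalty_weight * float(unsafe)


def q_learning(penalty_weight, episodes=2000, alpha=0.5, gamma=1.0,
               epsilon=0.2, seed=0):
    """Tabular Q-learning on the toy corridor using the shaped reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection over the current Q-table.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, r, unsafe, done = env_step(state, action)
            # The penalty enters the TD target through the shaped reward.
            target = shaped_reward(r, unsafe, penalty_weight)
            if not done:
                target += gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_action(q, state):
    """Greedy action for a state; ties resolve to STEP (the first action)."""
    return max(ACTIONS, key=lambda a: q[state][a])


if __name__ == "__main__":
    q_plain = q_learning(penalty_weight=0.0)
    q_penalized = q_learning(penalty_weight=5.0)
    print("greedy action at start, no penalty:", greedy_action(q_plain, 0))
    print("greedy action at start, penalty 5.0:", greedy_action(q_penalized, 0))
```

Without the penalty, jumping reaches the goal in fewer transitions, so it is the greedy choice at the start state; with λ = 5 each jump costs more than the step it saves, and the learned greedy policy avoids it everywhere. The penalty is applied through the reward rather than by masking the action, so the agent still explores the unwanted action and learns its cost from experience.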


