<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">
<ArticleSet>
<Article>
<Journal>
				<PublisherName>University of Guilan</PublisherName>
				<JournalTitle>Journal of Mathematical Modeling</JournalTitle>
				<Issn>2345-394X</Issn>
				<Volume>14</Volume>
				<Issue>1</Issue>
				<PubDate PubStatus="epublish">
					<Year>2026</Year>
					<Month>03</Month>
					<Day>01</Day>
				</PubDate>
			</Journal>
<ArticleTitle>Proximal policy optimization with adaptive generalized advantage estimate: critic-aware refinements</ArticleTitle>
<VernacularTitle></VernacularTitle>
			<FirstPage>177</FirstPage>
			<LastPage>190</LastPage>
			<ELocationID EIdType="pii">9132</ELocationID>
			
<ELocationID EIdType="doi">10.22124/jmm.2025.29704.2654</ELocationID>
			
			<Language>EN</Language>
<AuthorList>
<Author>
					<FirstName>Naemeh</FirstName>
					<LastName>Mohammadpour</LastName>
<Affiliation>Department of Mechanical Engineering, Amirkabir University of Technology, Tehran, Iran</Affiliation>

</Author>
<Author>
					<FirstName>Meysam</FirstName>
					<LastName>Fozi</LastName>
<Affiliation>Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran</Affiliation>

</Author>
<Author>
					<FirstName>Mohammad Mehdi</FirstName>
					<LastName>Ebadzadeh</LastName>
<Affiliation>Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran</Affiliation>

</Author>
<Author>
					<FirstName>Ali</FirstName>
					<LastName>Azimi</LastName>
<Affiliation>Department of Mechanical Engineering, Amirkabir University of Technology, Tehran, Iran</Affiliation>

</Author>
<Author>
					<FirstName>Ali</FirstName>
					<LastName>Kamali Iglie</LastName>
<Affiliation>Department of Mechanical Engineering, Amirkabir University of Technology, Tehran, Iran</Affiliation>

</Author>
</AuthorList>
				<PublicationType>Journal Article</PublicationType>
			<History>
				<PubDate PubStatus="received">
					<Year>2025</Year>
					<Month>02</Month>
					<Day>01</Day>
				</PubDate>
			</History>
		<Abstract>Proximal Policy Optimization (PPO) is one of the most widely used methods in reinforcement learning, designed to optimize policy updates while maintaining training stability. However, in complex, high-dimensional environments, maintaining a suitable balance between bias and variance is a significant challenge. The λ parameter in Generalized Advantage Estimation (GAE) influences this balance by controlling the trade-off between short-term and long-term return estimates. In this study, we propose a method for adaptively adjusting λ, in which λ is updated dynamically during training instead of remaining fixed. The updates are guided by internal learning signals such as the value function loss and Explained Variance, a statistical measure that reflects how accurately the critic estimates target returns. To further enhance training robustness, we incorporate a Policy Update Delay (PUD) mechanism to mitigate instability from overly frequent policy updates. The main objective of this approach is to reduce dependence on expensive and time-consuming hyperparameter tuning. By leveraging internal indicators from the learning process, the proposed method contributes to the development of more adaptive, stable, and generalizable reinforcement learning algorithms. To assess the effectiveness of the approach, experiments are conducted in four standard benchmark environments: Ant-v4, HalfCheetah-v4, and Humanoid-v4 from OpenAI Gym, and Quadruped-Walk from the DeepMind Control Suite. The results demonstrate that the proposed method can substantially improve the performance and stability of PPO across these environments. Our implementation is publicly available at https://github.com/naempr/PPO-with-adaptive-GAE.</Abstract>
		<ObjectList>
			<Object Type="keyword">
			<Param Name="value">Reinforcement learning</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">proximal policy optimization</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">generalized advantage estimate</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">bias-variance trade-off</Param>
			</Object>
		</ObjectList>
<ArchiveCopySource DocType="pdf">https://jmm.guilan.ac.ir/article_9132_c7dfb26a364dde09244cec154dec609e.pdf</ArchiveCopySource>
</Article>
</ArticleSet>
