
Reinforcement Learning Basics

Policies

A policy is the rule an agent uses to decide what action to take. A deterministic policy is typically denoted $μ$, so that $a_t = μ(s_t)$; when the action is selected stochastically, the policy is instead written $π(\cdot|s_t)$, and the action at timestep $t$ is sampled as $a_t \sim π(\cdot|s_t)$.


When the policy is stochastic, the action is sampled from a categorical distribution if the action space is discrete, and typically from a Gaussian distribution if the action space is continuous.
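
A minimal sketch of both cases, assuming NumPy; the logits, mean, and standard deviation below are hypothetical stand-ins for the outputs of a learned policy network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete action space: sample an action categorically from π(.|s).
logits = np.array([2.0, 0.5, -1.0])            # hypothetical per-action scores for state s
probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> categorical probabilities
discrete_action = rng.choice(len(probs), p=probs)

# Continuous action space: sample from a diagonal Gaussian π(.|s).
mean = np.array([0.3, -0.7])                   # hypothetical policy mean for state s
std = np.array([0.2, 0.5])                     # hypothetical per-dimension std
continuous_action = rng.normal(mean, std)

print(discrete_action, continuous_action)
```

Sampling each continuous dimension independently, as above, corresponds to the common diagonal-Gaussian policy.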


Value Functions

Value functions return the expected return of a specified state or state-action pair, assuming the agent acts according to a particular policy.

  1. On-Policy Value Function $V^π(s)$: the expected return when starting in state $s$ and always acting according to policy $π$,
    $$ V^π(s) = \mathbb E_{τ\sim π} [R(τ)|s_0=s]$$

  2. On-Policy Action-Value Function $Q^π(s,a)$: the expected return when starting in state $s$, taking an arbitrary action $a$, and acting according to $π$ thereafter (a Monte Carlo sketch of both follows the list),
    $$ Q^π(s,a) = \mathbb E_{τ\sim π} [R(τ)|s_0=s,a_0=a]$$
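
As referenced above, a minimal Monte Carlo sketch of both definitions on a toy two-state MDP; the dynamics, horizon, and uniform-random policy are hypothetical choices made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA, HORIZON, N_EPISODES = 0.9, 20, 5000

def step(s, a):
    # Hypothetical dynamics: in state 0, action 1 usually reaches state 1;
    # every transition into state 1 pays reward 1, and state 1 is "sticky".
    if s == 0:
        return (1, 1.0) if (a == 1 and rng.random() < 0.8) else (0, 0.0)
    return (1, 1.0) if rng.random() < 0.9 else (0, 0.0)

def policy(s):
    # Uniform random π(.|s) over the two actions {0, 1}.
    return int(rng.integers(2))

def rollout_return(s, first_action=None):
    # Discounted return R(τ) of one sampled trajectory from s;
    # the first action is fixed only when estimating Q.
    ret = 0.0
    for t in range(HORIZON):
        a = first_action if (t == 0 and first_action is not None) else policy(s)
        s, r = step(s, a)
        ret += GAMMA ** t * r
    return ret

# V(0): average return over trajectories starting in state 0 under π.
v0 = np.mean([rollout_return(0) for _ in range(N_EPISODES)])
# Q(0, a): same, but with the first action fixed to a.
q0 = {a: np.mean([rollout_return(0, a) for _ in range(N_EPISODES)]) for a in (0, 1)}
print(f"V(0) ≈ {v0:.2f}, Q(0,0) ≈ {q0[0]:.2f}, Q(0,1) ≈ {q0[1]:.2f}")
```

Fixing only the first action is exactly what distinguishes $Q^π$ from $V^π$.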

Advantage Functions

Advantage function $A^π(s,a)$: how much better a specific action is than the policy's average action


$A^π(s,a)$ describes how much better it is to take a specific action $a$ in state $s$ than to select an action at random according to $π(\cdot|s)$, assuming $π$ is followed thereafter. Hence it can be written as


$$ A^π(s,a) = Q^π(s,a) - V^π(s)$$
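
A minimal numeric sketch, assuming hypothetical Q-values and policy probabilities for a single state: since $V^π(s) = \mathbb E_{a\sim π}[Q^π(s,a)]$, the advantage is zero in expectation under $π(\cdot|s)$.

```python
import numpy as np

q = np.array([1.0, 3.0, 2.0])   # hypothetical Q(s, a) for actions a = 0, 1, 2
pi = np.array([0.2, 0.5, 0.3])  # hypothetical π(a|s)

v = pi @ q                       # V(s) = Σ_a π(a|s) Q(s, a) = 2.3
advantage = q - v                # A(s, a) = Q(s, a) - V(s), per action
print(v, advantage, pi @ advantage)  # expected advantage under π is 0 (up to float error)
```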

