Policies
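A policy is a rule that an agent uses to decide what action to take, typically denoted as $\pi$. A deterministic policy maps a state directly to an action, $a_t = \mu(s_t)$, while a stochastic policy defines a distribution over actions, $a_t \sim \pi(\cdot \mid s_t)$.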
When the policy is stochastic, the action is typically sampled from a categorical distribution if the action space is discrete, and from a Gaussian distribution if the action space is continuous.
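As a rough illustration of the two sampling cases, here is a minimal sketch using only the Python standard library. The function names, the logits, and the mean/log-std values are made up for the example; in practice these numbers would come from a policy network, and the continuous case here assumes an independent (diagonal) Gaussian per action dimension.

```python
import math
import random

def sample_discrete_action(logits, rng):
    """Sample an action index from a categorical distribution defined by logits."""
    # Softmax weights, with the max subtracted for numerical stability.
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def sample_continuous_action(mean, log_std, rng):
    """Sample each action dimension from an independent Gaussian."""
    return [rng.gauss(mu, math.exp(ls)) for mu, ls in zip(mean, log_std)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical policy outputs for a single state.
discrete_action = sample_discrete_action([2.0, 0.5, -1.0], rng)
continuous_action = sample_continuous_action([0.0, 1.0], [-0.5, -0.5], rng)

print(discrete_action, continuous_action)
```

Parameterizing the Gaussian by the log of the standard deviation is a common choice because the log can take any real value, whereas the standard deviation itself must stay positive.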
Value Functions
A value function returns the value, i.e. the expected return, of a specified state or state-action pair.
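On-Policy Value Function $V^{\pi}(s)$: the expected return when starting in state $s$ and always acting according to policy $\pi$.
$$V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s \right]$$
On-Policy Action-Value Function $Q^{\pi}(s, a)$: the expected return when starting in state $s$, taking an arbitrary action $a$ (which need not come from the policy), and then always acting according to policy $\pi$.
$$Q^{\pi}(s, a) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s, a_0 = a \right]$$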
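To make the two definitions concrete, the sketch below estimates both quantities by Monte Carlo rollouts. Everything in it is an assumption made for illustration: a hypothetical 3-state chain environment, a policy that moves right with probability 0.5, and a discounted return with factor 0.9. The only difference between the two estimators is whether the first action is forced or sampled from the policy.

```python
import random

GAMMA = 0.9    # discount factor (assumed for this example)
TERMINAL = 2   # index of the terminal state in the toy chain

def step(state, action):
    """Toy dynamics: action 1 moves one state to the right, action 0 stays put."""
    next_state = min(state + action, TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward

def policy(state, rng):
    """Toy stochastic policy: move right with probability 0.5 in every state."""
    return 1 if rng.random() < 0.5 else 0

def rollout_return(state, rng, first_action=None, horizon=200):
    """Discounted return of one trajectory; optionally force the first action."""
    total, discount = 0.0, 1.0
    for t in range(horizon):
        if state == TERMINAL:
            break
        if t == 0 and first_action is not None:
            action = first_action
        else:
            action = policy(state, rng)
        state, reward = step(state, action)
        total += discount * reward
        discount *= GAMMA
    return total

def estimate_v(state, rng, episodes=5000):
    """Monte Carlo estimate of the on-policy value of a state."""
    return sum(rollout_return(state, rng) for _ in range(episodes)) / episodes

def estimate_q(state, action, rng, episodes=5000):
    """Monte Carlo estimate of the on-policy action-value of a state-action pair."""
    returns = (rollout_return(state, rng, first_action=action) for _ in range(episodes))
    return sum(returns) / episodes

rng = random.Random(0)
v0 = estimate_v(0, rng)
q0_stay = estimate_q(0, 0, rng)
q0_right = estimate_q(0, 1, rng)
print(v0, q0_stay, q0_right)
```

For this toy chain the exact values can be worked out by hand from the Bellman equations: $V^{\pi}(0) \approx 0.744$, $Q^{\pi}(0, \text{stay}) \approx 0.669$, and $Q^{\pi}(0, \text{right}) \approx 0.818$, so the estimates should land close to these numbers.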
Advantage Functions
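The advantage function $A^{\pi}(s, a) = Q^{\pi}(s, a) - V^{\pi}(s)$ describes how much better it is to take a specific action $a$ in state $s$ than to sample an action from $\pi(\cdot \mid s)$, assuming the agent acts according to $\pi$ afterwards.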
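A minimal sketch of the computation for a single state, assuming a discrete action space and tabular values. It uses the relation $V^{\pi}(s) = \mathbb{E}_{a \sim \pi}\left[ Q^{\pi}(s, a) \right]$ to obtain the state value, then subtracts it from each action-value. The Q-values and action probabilities are hypothetical numbers chosen for the example.

```python
def state_value(q_values, action_probs):
    """V(s) as the policy-weighted average of Q(s, a) over actions."""
    return sum(p * q for p, q in zip(action_probs, q_values))

def advantages(q_values, action_probs):
    """A(s, a) = Q(s, a) - V(s) for every action in one state."""
    v = state_value(q_values, action_probs)
    return [q - v for q in q_values]

# Hypothetical values for one state with two actions.
q_values = [1.0, 3.0]       # Q(s, a) for a = 0 and a = 1
action_probs = [0.5, 0.5]   # pi(a | s)

adv = advantages(q_values, action_probs)
expected_adv = sum(p * a for p, a in zip(action_probs, adv))
print(adv, expected_adv)  # [-1.0, 1.0] 0.0
```

The expected advantage under the policy is zero by construction, which is why the advantage is read as a relative measure: a positive value means the action is better than the policy's average behavior in that state.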
Discussion
References
[1] https://spinningup.openai.com/en/latest/spinningup/rl_intro.html