Reinforcement Learning Basics

2024. 8. 20. 16:06·DL·ML/Study

Policies

A policy is a rule that determines what action to take. A deterministic policy is typically denoted as $μ$, so that $a_t = μ(s_t)$. When the action is selected stochastically, the policy is written as $π(⋅|s_t)$, and the action at timestep $t$ is sampled as $a_t \sim π(⋅|s_t)$.


When the policy is stochastic, the action is typically sampled from a categorical distribution if the action space is discrete, and from a Gaussian distribution if the action space is continuous.
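The two sampling cases can be sketched as follows. This is a minimal illustration rather than any particular library's API: the function names, the use of logits for the discrete case, and the diagonal Gaussian parameterized by `mean` and `log_std` are all assumptions made for the example.

```python
import numpy as np

def sample_categorical(logits, rng):
    """Sample a discrete action index from softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the max before exponentiating for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def sample_diagonal_gaussian(mean, log_std, rng):
    """Sample a continuous action a = mean + std * z, with z ~ N(0, I)."""
    mean = np.asarray(mean, dtype=float)
    std = np.exp(np.asarray(log_std, dtype=float))
    return mean + std * rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)
discrete_action = sample_categorical([0.1, 2.0, -1.0], rng)         # index in {0, 1, 2}
continuous_action = sample_diagonal_gaussian([0.0, 1.0], [-1.0, -1.0], rng)  # shape (2,)
```

In practice the logits or the mean would be produced by a function of the state $s_t$ (for example a neural network); here they are hard-coded to keep the sketch self-contained.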


Value Functions

A value function gives the expected return of starting in a specified state or state-action pair and then acting according to a particular policy.

  1. On-Policy Value Function $V^π(s)$: the expected return when starting in state $s$ and always acting according to policy $π$, where $R(τ)$ is the return of trajectory $τ$
    $$ V^π(s) = \mathbb E_{τ\sim π} [R(τ)|s_0=s]$$

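  2. On-Policy Action-Value Function $Q^π(s,a)$: the expected return when starting in state $s$, taking an arbitrary action $a$ (which need not come from the policy), and acting according to policy $π$ thereafter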
    $$ Q^π(s,a) = \mathbb E_{τ\sim π} [R(τ)|s_0=s,a_0=a]$$
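Both expectations can be approximated by averaging returns over sampled trajectories. The sketch below uses a hypothetical two-state, two-action toy MDP that is not from the original post, and it assumes that $R(τ)$ is the undiscounted sum of rewards over a fixed horizon of two steps. The only difference between the two estimators is whether the first action is fixed or drawn from the policy.

```python
import numpy as np

# Hypothetical toy MDP (an assumption for illustration): REWARD[s, a].
REWARD = np.array([[0.0, 1.0],
                   [2.0, 0.0]])
HORIZON = 2  # assumed finite horizon; R(tau) is the undiscounted reward sum

def step(state, action):
    """Deterministic toy dynamics: the next state equals the chosen action."""
    return action, REWARD[state, action]

def policy(state, rng):
    """Uniformly random policy pi(.|s) over the two actions."""
    return int(rng.integers(2))

def rollout_return(state, rng, first_action=None):
    """Return R(tau) for one trajectory starting in `state`.

    If `first_action` is given it is used as a_0; later actions follow pi.
    """
    total = 0.0
    for t in range(HORIZON):
        if t == 0 and first_action is not None:
            action = first_action
        else:
            action = policy(state, rng)
        state, reward = step(state, action)
        total += reward
    return total

def estimate_v(state, rng, n=20000):
    """Monte Carlo estimate of V^pi(state)."""
    return float(np.mean([rollout_return(state, rng) for _ in range(n)]))

def estimate_q(state, action, rng, n=20000):
    """Monte Carlo estimate of Q^pi(state, action)."""
    return float(np.mean([rollout_return(state, rng, first_action=action)
                          for _ in range(n)]))

rng = np.random.default_rng(0)
v0 = estimate_v(0, rng)      # exact value for this toy problem is 1.25
q01 = estimate_q(0, 1, rng)  # exact value for this toy problem is 2.0
```

For this toy problem the exact values can be worked out by hand ($V^π(0) = 1.25$, $Q^π(0,1) = 2.0$), so the estimates should land close to them, with the remaining gap shrinking as `n` grows.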

Advantage Functions

Advantage function $A^π(s,a)$: how much better an action is than the others on average


$A^π(s,a)$ describes how much better it is to take a specific action $a$ in state $s$ than to take an action sampled randomly according to $π(\cdot|s)$, assuming the agent acts according to $π$ afterwards. Hence it can be represented as


$$ A^π(s,a) = Q^π(s,a) - V^π(s)$$
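For a single state with a discrete action space, this definition can be sketched in a few lines, using the identity $V^π(s) = \mathbb E_{a\sim π}[Q^π(s,a)]$. The function name and the numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def advantage(q_values, action_probs):
    """Compute A(s, a) = Q(s, a) - V(s) for one state with discrete actions.

    V(s) is taken as the expectation of Q(s, a) under pi(.|s).
    """
    q_values = np.asarray(q_values, dtype=float)
    action_probs = np.asarray(action_probs, dtype=float)
    v = float(np.dot(action_probs, q_values))
    return q_values - v, v

# Hypothetical Q-values and policy probabilities for three actions.
q = [1.0, 2.0, 4.0]
pi = [0.5, 0.25, 0.25]
adv, v = advantage(q, pi)
# v   == 2.0
# adv == [-1.0, 0.0, 2.0]
```

Because $V^π(s)$ is the policy-weighted average of $Q^π(s,a)$, the advantage averaged under $π(\cdot|s)$ is zero: here $0.5 \cdot (-1) + 0.25 \cdot 0 + 0.25 \cdot 2 = 0$. A positive advantage marks an action that is better than the policy's average behavior in that state.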

References

[1] https://spinningup.openai.com/en/latest/spinningup/rl_intro.html
