您当前的位置: 首页 >  ar

FPGA硅农

暂无认证

  • 0浏览

    0关注

    282博文

    0收益

  • 0浏览

    0点赞

    0打赏

    0留言

私信
关注
热门博文

【强化学习】Sarsa(Lamda)算法

FPGA硅农 发布时间:2021-12-15 18:54:12 ,浏览量:0

算法描述

在这里插入图片描述

代码实现
import numpy as np
import pandas as pd
import time

N_STATES = 25   # the length of the 2 dimensional world
ACTIONS = ['left', 'right','up','down']     # available actions
EPSILON = 0.7   # greedy police
ALPHA = 0.8     # learning rate
GAMMA = 0.9    # discount factor
MAX_EPISODES = 1000   # maximum episodes
FRESH_TIME = 0.00001    # fresh time for one move

def build_q_table(n_states, actions):
    table = pd.DataFrame(
        np.zeros((n_states, len(actions))),     # q_table initial values
        columns=actions,    # actions's name
    )
    return table

def build_e_table(n_states, actions):
    table = pd.DataFrame(
        np.zeros((n_states, len(actions))),
        columns=actions,
    )
    return table

def choose_action(state, q_table):
    state_actions = q_table.iloc[state, :]
    if (np.random.uniform() > EPSILON) or ((state_actions == 0).all()):  # act non-greedy or state-action have no value
        if state==0:
            action_name=np.random.choice(['right','down'])
        elif state>0 and state20 and state0 and state20 and state            
关注
打赏
1658642721
查看更多评论
0.0377s