Red

red_irl_model

RedRewardModel

class ding.reward_model.red_irl_model.RedRewardModel(config: Dict, device: str, tb_logger: SummaryWriter)[source]
Overview:

The implementation of the reward model in RED (https://arxiv.org/abs/1905.06750)

Interface:

estimate, train, load_expert_data, collect_data, clear_data, __init__, _train

Config:

| ID | Symbol                 | Type | Default Value   | Description                                                          | Other (Shape)                                                                  |
|----|------------------------|------|-----------------|-----------------------------------------------------------------------|--------------------------------------------------------------------------------|
| 1  | type                   | str  | red             | Reward model register name, refer to registry REWARD_MODEL_REGISTRY  |                                                                                |
| 2  | expert_data_path       | str  | expert_data.pkl | Path to the expert dataset                                            | Should be a '.pkl' file                                                        |
| 3  | sample_size            | int  | 1000            | Sample data from the expert dataset with a fixed size                |                                                                                |
| 4  | sigma                  | int  | 5               | Hyperparameter of r(s,a): r(s,a) = exp(-sigma * L(s,a))              |                                                                                |
| 5  | batch_size             | int  | 64              | Training batch size                                                   |                                                                                |
| 6  | hidden_size            | int  | 128             | Linear model hidden size                                              |                                                                                |
| 7  | update_per_collect     | int  | 100             | Number of updates per collect                                         |                                                                                |
| 8  | clear_buffer_per_iters | int  | 1               | Clear the buffer every fixed number of iterations                     | Makes sure the replay buffer's data count isn't too small (handled in the entry code) |
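For reference, the defaults in the table above can be collected into a single config dict. This is only a minimal sketch: the field names follow the Symbol column, and any extra fields a particular DI-engine version may require (e.g. network input sizes) are not shown.

    red_config = dict(
        type='red',                          # register name in REWARD_MODEL_REGISTRY
        expert_data_path='expert_data.pkl',  # path to the expert dataset ('.pkl' file)
        sample_size=1000,                    # number of samples drawn from the expert dataset
        sigma=5,                             # r(s, a) = exp(-sigma * L(s, a))
        batch_size=64,                       # training batch size
        hidden_size=128,                     # linear model hidden size
        update_per_collect=100,              # number of updates per collect
        clear_buffer_per_iters=1,            # clear the buffer every fixed number of iters
    )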
Properties:
  • online_net (SENet): The reward model; by default it is initialized once when training begins.

__init__(config: Dict, device: str, tb_logger: SummaryWriter) → None[source]
Overview:

Initialize self. See help(type(self)) for accurate signature.

Arguments:
  • config (Dict): Training config

  • device (str): Device to use, e.g. "cpu" or "cuda"

  • tb_logger (SummaryWriter): Logger for model summaries, by default a tensorboard SummaryWriter
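
A minimal construction sketch, assuming the config is the red_config dict shown after the Config table, wrapped in an EasyDict so its fields are attribute-accessible (a common DI-engine convention, but an assumption here), and that a tensorboard SummaryWriter is used as the logger with a hypothetical log directory.

    import torch
    from easydict import EasyDict
    from torch.utils.tensorboard import SummaryWriter

    from ding.reward_model.red_irl_model import RedRewardModel

    cfg = EasyDict(red_config)                            # red_config as sketched above
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    tb_logger = SummaryWriter('./log/red_reward_model')   # hypothetical log dir
    reward_model = RedRewardModel(cfg, device, tb_logger)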

clear_data()[source]
Overview:

Clear the collected training data. Not implemented here, since by default the reward model (i.e. online_net) is trained only once; if online_net were trained continuously, this method should be implemented to clear stale data.

collect_data(data) None[source]
Overview:

Collect training data. Not implemented here, since by default the reward model (i.e. online_net) is trained only once; if online_net were trained continuously, this method should be implemented to gather new data.

estimate(data: list) → List[Dict][source]
Overview:

Estimate the reward of each data item by rewriting its reward key.

Arguments:
  • data (list): The list of data used for estimation; each item must contain at least the obs and action keys.

Effects:
  • This is a side effect function which updates the reward values in place.
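
A usage sketch, assuming the reward_model constructed above has already been trained, and that each item is a dict of torch tensors; the observation and action shapes below are illustrative, not taken from any particular environment.

    import torch

    # Hypothetical transitions: only 'obs' and 'action' are required by estimate().
    data = [
        {'obs': torch.randn(4), 'action': torch.tensor(0), 'reward': torch.tensor(0.)},
        {'obs': torch.randn(4), 'action': torch.tensor(1), 'reward': torch.tensor(0.)},
    ]

    # estimate() rewrites each item's 'reward' in place with the RED reward
    # r(s, a) = exp(-sigma * L(s, a)).
    reward_model.estimate(data)
    print([item['reward'] for item in data])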

load_expert_data() → None[source]
Overview:

Load the expert data from the path specified by config['expert_data_path'].

Effects:

This is a side effect function which updates the expert data attribute (e.g. self.expert_data)
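
A sketch of producing an expert dataset file that load_expert_data() can read from config['expert_data_path']. The assumption that the pickle holds a list of transition dicts with obs and action keys mirrors the format used by estimate() above; check the collector output of your DI-engine version for the exact format.

    import pickle
    import torch

    # Hypothetical expert transitions; the exact format is an assumption (see above).
    expert_data = [
        {'obs': torch.randn(4), 'action': torch.tensor(0)},
        {'obs': torch.randn(4), 'action': torch.tensor(1)},
    ]
    with open('expert_data.pkl', 'wb') as f:
        pickle.dump(expert_data, f)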

train() → None[source]
Overview:

Train the RED reward model. By default, the RED model is trained only once.

Effects:
  • This is a side effect function which updates the reward model and increments the training iteration count.
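
Putting the pieces together, a minimal end-to-end sketch reusing reward_model and data from the earlier snippets. Whether load_expert_data() is already invoked inside __init__ depends on the implementation; it is called explicitly here only to make the data flow visible.

    # Load expert data from cfg.expert_data_path, fit the RED model once,
    # then relabel collected transitions with the learned reward.
    reward_model.load_expert_data()
    reward_model.train()
    reward_model.estimate(data)   # rewrites each item's 'reward' in place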