Pdeil

pdeil_irl_model

PdeilRewardModel

class ding.reward_model.pdeil_irl_model.PdeilRewardModel(cfg: dict, device, tb_logger: SummaryWriter)[source]
Overview:

The Pdeil reward model class (https://arxiv.org/abs/2112.06746)

Interface:

estimate, train, load_expert_data, collect_data, clear_data, __init__, _train, _batch_mn_pdf

Config:

ID | Symbol                 | Type  | Default Value   | Description                                                | Other(Shape)
---|------------------------|-------|-----------------|------------------------------------------------------------|------------------------
1  | type                   | str   | pdeil           | Reward model register name, refer to registry REWARD_MODEL_REGISTRY |
2  | expert_data_path       | str   | expert_data.pkl | Path to the expert dataset                                 | Should be a '.pkl' file
3  | discrete_action        | bool  | False           | Whether the action is discrete                             |
4  | alpha                  | float | 0.5             | Coefficient for the probability density estimator          |
5  | clear_buffer_per_iters | int   | 1               | Clear the buffer every fixed number of iterations, to ensure the replay buffer's data count isn't too small (code works in the entry) |
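Assembled from the table above, a minimal config might look like the following. This is an illustrative sketch: the keys and defaults follow the documented table, but it is not a verified complete default config for the model.

```python
# Illustrative PDEIL reward-model config, assembled from the documented table.
# Keys and values follow the table's defaults; not a verified full config.
pdeil_config = dict(
    type='pdeil',                        # register name in REWARD_MODEL_REGISTRY
    expert_data_path='expert_data.pkl',  # path to the expert dataset (a '.pkl' file)
    discrete_action=False,               # whether the action space is discrete
    alpha=0.5,                           # coefficient for the probability density estimator
    clear_buffer_per_iters=1,            # clear the replay buffer every N iterations
)
```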
__init__(cfg: dict, device, tb_logger: SummaryWriter) → None[source]
Overview:

Initialize self. See help(type(self)) for an accurate signature. Some rules for naming the attributes of self:

  • e_ : expert values

  • _sigma_ : standard deviation values

  • p_ : current policy values

  • _s_ : states

  • _a_ : actions

Arguments:
  • cfg (Dict): Training config

  • device (str): Device usage, i.e. “cpu” or “cuda”

  • tb_logger (str): Logger, defaultly set as ‘SummaryWriter’ for model summary

clear_data()[source]
Overview:

Clear the training data. This is a side-effect function which clears the data attribute in self.

collect_data(item: list)[source]
Overview:

Collect training data by iterating over the data items in the input list.

Arguments:
  • item (list): Raw training data (e.g. some form of states, actions, obs, etc.)

Effects:
  • This is a side-effect function which updates the data attribute in self by iterating over the items in the input list

estimate(data: list) List[Dict][source]
Overview:

Estimate reward by rewriting the reward keys.

Arguments:
  • data (list): the list of data used for estimation, with at least obs and action keys.

Effects:
  • This is a side effect function which updates the reward values in place.
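To illustrate the expected data shape and the in-place contract, the sketch below mimics what estimate does from the caller's point of view: each item in the list carries at least obs and action keys, and the reward key is rewritten in place. The helper and the constant reward function are hypothetical stand-ins, not the model's actual computation.

```python
# Hypothetical sketch of estimate()'s in-place contract: each item in the
# input list has at least 'obs' and 'action' keys, and the model rewrites
# the 'reward' key of every item in place.
def rewrite_rewards(data, reward_fn):
    for item in data:
        item['reward'] = reward_fn(item['obs'], item['action'])
    return data

batch = [
    {'obs': [0.0, 0.0], 'action': 0, 'reward': 0.0},
    {'obs': [1.0, 1.0], 'action': 1, 'reward': 0.0},
]
# A constant stand-in reward function, purely for illustration.
rewrite_rewards(batch, lambda obs, act: 1.0)
```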

load_expert_data() → None[source]
Overview:

Load the expert data from the path given by config['expert_data_path'] in self.

Effects:

This is a side effect function which updates the expert data attribute (e.g. self.expert_data)

train()[source]
Overview:

Training the Pdeil reward model.
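Training fits probability density estimators to the expert data and the current policy's data; the reward is then derived from how likely a sample is under the expert density relative to the policy density. As a loose illustration of that idea only, the sketch below uses 1-D Gaussians and a uniform prior; the actual model works with multivariate densities (see _batch_mn_pdf) and the alpha coefficient, neither of which is reproduced here.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-D Gaussian density; the real model uses multivariate normals."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_gaussian(samples):
    """Maximum-likelihood mean and (population) std of a 1-D sample."""
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, math.sqrt(var)

# Toy data: expert states cluster near 1.0, policy states near 0.0.
expert_states = [0.9, 1.0, 1.1, 1.05, 0.95]
policy_states = [-0.1, 0.0, 0.1, 0.05, -0.05]

e_mu, e_sigma = fit_gaussian(expert_states)
p_mu, p_sigma = fit_gaussian(policy_states)

def reward(s):
    """Probability that s was generated by the expert, under a uniform prior."""
    pe = gaussian_pdf(s, e_mu, e_sigma)
    pp = gaussian_pdf(s, p_mu, p_sigma)
    return pe / (pe + pp)

# States near the expert cluster get rewards close to 1,
# states near the policy cluster get rewards close to 0.
```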