Pdeil¶
pdeil_irl_model¶
PdeilRewardModel¶
- class ding.reward_model.pdeil_irl_model.PdeilRewardModel(cfg: dict, device, tb_logger: SummaryWriter)[source]¶
- Overview:
The Pdeil reward model class (https://arxiv.org/abs/2112.06746)
- Interface:
estimate, train, load_expert_data, collect_data, clear_data, __init__, _train, _batch_mn_pdf
- Config:
ID | Symbol | Type | Default Value | Description | Other(Shape)
1 | type | str | pdeil | Reward model register name, refer to registry REWARD_MODEL_REGISTRY |
2 | expert_data_path | str | expert_data.pkl | Path to the expert dataset | Should be a '.pkl' file
3 | discrete_action | bool | False | Whether the action is discrete |
4 | alpha | float | 0.5 | Coefficient for the probability density estimator |
5 | clear_buffer_per_iters | int | 1 | Clear buffer per fixed iters | Make sure the replay buffer's data count isn't too few (handled in the entry code)
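The table above maps directly onto a plain Python config dict. A minimal sketch, assuming the default values listed above; the key names follow the Symbol column and would normally be merged into the entry config of your pipeline:

```python
# Hypothetical config sketch built from the defaults in the table above.
pdeil_reward_config = dict(
    type='pdeil',                        # register name in REWARD_MODEL_REGISTRY
    expert_data_path='expert_data.pkl',  # path to a '.pkl' expert dataset
    discrete_action=False,               # True if the action space is discrete
    alpha=0.5,                           # coefficient for the probability density estimator
    clear_buffer_per_iters=1,            # clear collected data every N train iterations
)
```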
- __init__(cfg: dict, device, tb_logger: SummaryWriter) → None [source]¶
- Overview:
Initialize self. See help(type(self)) for accurate signature. Some rules in naming the attributes of self.:
e_ : expert values
_sigma_ : standard deviation values
p_ : current policy values
_s_ : states
_a_ : actions
- Arguments:
cfg (Dict): Training config
device (str): Device usage, i.e. "cpu" or "cuda"
tb_logger (SummaryWriter): Logger, by default a 'SummaryWriter', used for model summary
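A hedged construction sketch (not from the source): it reuses the pdeil_reward_config dict from the Config section above, wraps it in an EasyDict on the assumption that attribute access is used internally, and assumes an expert_data.pkl file already exists at the configured path:

```python
# Sketch only: paths and logger directory are placeholders.
from easydict import EasyDict
from torch.utils.tensorboard import SummaryWriter

from ding.reward_model.pdeil_irl_model import PdeilRewardModel

cfg = EasyDict(pdeil_reward_config)       # config sketch from the Config section
tb_logger = SummaryWriter('./log/pdeil')  # any SummaryWriter instance
reward_model = PdeilRewardModel(cfg, device='cpu', tb_logger=tb_logger)
```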
- clear_data()[source]¶
- Overview:
Clearing training data. This is a side effect function which clears the data attribute in self.
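As the clear_buffer_per_iters entry in the Config table notes, clearing is expected to be driven from the entry code. A hedged sketch of that pattern, continuing the construction example above; the loop bound and variable names are illustrative:

```python
# Illustrative entry-loop pattern: max_iterations and train_iter are placeholders.
max_iterations = 1000
for train_iter in range(max_iterations):
    # ... collect new transitions, call collect_data() and train() here ...
    if train_iter % cfg.clear_buffer_per_iters == 0:
        reward_model.clear_data()  # drop accumulated policy data
```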
- collect_data(item: list)[source]¶
- Overview:
Collecting training data by iterating data items in the input list
- Arguments:
item (list): Raw training data (e.g. some form of states, actions, obs, etc.)
- Effects:
This is a side effect function which updates the data attribute in self by iterating over the items in the input list.
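A hedged usage sketch, continuing the construction example above; the transition fields shown (obs/action/reward shapes) are illustrative and should match your environment:

```python
# `new_data` is a list of transition dicts with at least 'obs' and 'action' entries.
import torch

new_data = [
    {'obs': torch.randn(4), 'action': torch.tensor([0.2]), 'reward': torch.zeros(1)},
    {'obs': torch.randn(4), 'action': torch.tensor([-0.5]), 'reward': torch.zeros(1)},
]
reward_model.collect_data(new_data)  # accumulate policy data inside the model
reward_model.train()                 # fit the internal estimators on the collected data
```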
- estimate(data: list) → List[Dict] [source]¶
- Overview:
Estimate reward by rewriting the reward keys.
- Arguments:
data (list): the list of data used for estimation, with at least obs and action keys.
- Effects:
This is a side effect function which updates the reward values in place.
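A hedged sketch of reward estimation, continuing the example above; per the signature, the processed list is returned while the reward values are rewritten in place:

```python
# Each item passed to estimate() must contain at least 'obs' and 'action'.
estimated = reward_model.estimate(new_data)
for item in estimated:
    print(item['reward'])  # reward values rewritten by the PDEIL model
```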