Pdeil
pdeil_irl_model
PdeilRewardModel
- class ding.reward_model.pdeil_irl_model.PdeilRewardModel(cfg: dict, device, tb_logger: SummaryWriter)[source]
- Overview:
The Pdeil reward model class (https://arxiv.org/abs/2112.06746)
- Interface:
  estimate, train, load_expert_data, collect_data, clear_data, __init__, _train, _batch_mn_pdf
- Config:
| ID | Symbol                 | Type  | Default Value   | Description                                                            | Other (Shape)                                                              |
|----|------------------------|-------|-----------------|------------------------------------------------------------------------|----------------------------------------------------------------------------|
| 1  | type                   | str   | pdeil           | Reward model register name, refer to registry `REWARD_MODEL_REGISTRY` |                                                                            |
| 2  | expert_data_path       | str   | expert_data.pkl | Path to the expert dataset                                             | Should be a '.pkl' file                                                    |
| 3  | discrete_action        | bool  | False           | Whether the action is discrete                                         |                                                                            |
| 4  | alpha                  | float | 0.5             | Coefficient for the probability density estimator                      |                                                                            |
| 5  | clear_buffer_per_iters | int   | 1               | Clear the buffer every fixed number of iterations                      | Makes sure the replay buffer's data count isn't too low (handled in entry) |
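For reference, a minimal config sketch assembled from the keys in the table above; the flat `dict` layout is an illustrative assumption, not necessarily the exact nesting the training entry expects:

```python
# Illustrative PDEIL reward-model config built from the documented keys.
# NOTE: the flat dict layout is an assumption; adapt it to your entry's config tree.
pdeil_reward_config = dict(
    type='pdeil',                        # register name in REWARD_MODEL_REGISTRY
    expert_data_path='expert_data.pkl',  # must point to an existing '.pkl' expert dataset
    discrete_action=False,               # True if the action space is discrete
    alpha=0.5,                           # coefficient for the probability density estimator
    clear_buffer_per_iters=1,            # clear collected data every N iterations (handled in entry)
)
```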
- __init__(cfg: dict, device, tb_logger: SummaryWriter) -> None[source]
- Overview:
  Initialize `self`. See `help(type(self))` for an accurate signature. Some rules in naming the attributes of `self`:
  - `e_`: expert values
  - `_sigma_`: standard deviation values
  - `p_`: current policy values
  - `_s_`: states
  - `_a_`: actions
- Arguments:
  - cfg (`Dict`): Training config
  - device (`str`): Device to use, i.e. "cpu" or "cuda"
  - tb_logger (`SummaryWriter`): Logger, by default a `SummaryWriter` used for model summaries
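A hypothetical instantiation sketch, assuming the config above and that the expert dataset exists at `expert_data_path` (the interface lists `load_expert_data`, so a valid path is needed); only the argument order follows the documented signature:

```python
# Hypothetical instantiation sketch; the log directory, device choice, and
# config come from the illustrative example above, not from this API's docs.
from torch.utils.tensorboard import SummaryWriter
from ding.reward_model.pdeil_irl_model import PdeilRewardModel

tb_logger = SummaryWriter('./log/pdeil')  # passed through as the model's logger
reward_model = PdeilRewardModel(pdeil_reward_config, 'cpu', tb_logger)
```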
- clear_data()[source]
- Overview:
  Clear the training data. This is a side-effect function which clears the data attribute in `self`.
- collect_data(item: list)[source]
- Overview:
  Collect training data by iterating over the data items in the input list.
- Arguments:
  - item (`list`): Raw training data (e.g. some form of states, actions, obs, etc.)
- Effects:
  This is a side-effect function which updates the data attribute in `self` by iterating over the items in the input list.
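Continuing the sketch above, a hedged example of the collect/clear cycle; the transition dicts and tensor shapes are placeholders, since real items come from a collector and depend on the environment:

```python
import torch

# Placeholder transitions; obs/action shapes are assumptions for illustration.
batch = [{'obs': torch.randn(4), 'action': torch.randn(2)} for _ in range(32)]

reward_model.collect_data(batch)  # side effect: extends the data attribute on self
# ... train the reward model on the accumulated data ...
reward_model.clear_data()         # side effect: empties the data attribute on self
```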
- estimate(data: list) -> List[Dict][source]
- Overview:
  Estimate rewards by rewriting the reward keys of the input data.
- Arguments:
  - data (`list`): The list of data used for estimation, with at least `obs` and `action` keys.
- Effects:
  This is a side-effect function which updates the reward values in place.
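A usage sketch for `estimate`, continuing from the example above; the placeholder `reward` values and tensor shapes are assumptions:

```python
import torch

# Each item carries at least 'obs' and 'action'; 'reward' is rewritten in place.
data = [
    {'obs': torch.randn(4), 'action': torch.randn(2), 'reward': torch.zeros(1)}
    for _ in range(8)
]
estimated = reward_model.estimate(data)  # per the signature, also returns the updated List[Dict]
print(estimated[0]['reward'])            # the PDEIL-estimated reward replaces the original
```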