Pwil¶
pwil_irl_model¶
PwilRewardModel¶
- class ding.reward_model.pwil_irl_model.PwilRewardModel(config: Dict, device: str, tb_logger: SummaryWriter)[source]¶
- Overview:
The Pwil reward model class (https://arxiv.org/pdf/2006.04678.pdf)
- Interface:
estimate, train, load_expert_data, collect_data, clear_data, __init__, _train, _get_state_distance, _get_action_distance
- Config:
| ID | Symbol | Type | Default Value | Description | Other(Shape) |
| --- | --- | --- | --- | --- | --- |
| 1 | type | str | pwil | Reward model register name, refer to registry REWARD_MODEL_REGISTRY | |
| 2 | expert_data_path | str | expert_data.pkl | Path to the expert dataset | Should be a '.pkl' file |
| 3 | sample_size | int | 1000 | Sample data from the expert dataset with a fixed size | |
| 4 | alpha | int | 5 | Factor alpha | |
| 5 | beta | int | 5 | Factor beta | |
| 6 | s_size | int | 4 | State size | |
| 7 | a_size | int | 2 | Action size | |
| 8 | clear_buffer_per_iters | int | 1 | Clear the buffer every fixed number of iters | Make sure the replay buffer's data count isn't too few (code works in entry) |
- Properties:
reward_table (Dict): In this algorithm, the reward model is a dictionary.
- __init__(config: Dict, device: str, tb_logger: SummaryWriter) None[source]¶
- Overview:
Initialize self. See help(type(self)) for accurate signature.
- Arguments:
  - cfg (Dict): Training config
  - device (str): Device usage, i.e. "cpu" or "cuda"
  - tb_logger (SummaryWriter): Logger, set by default as 'SummaryWriter' for model summary
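A minimal construction sketch (not from the source): the config keys mirror the table above, DI-engine's EasyDict-style config is assumed, and the expert dataset file at expert_data_path is assumed to exist on disk.

```python
from easydict import EasyDict
from torch.utils.tensorboard import SummaryWriter
from ding.reward_model.pwil_irl_model import PwilRewardModel

# Config keys follow the table above; values are illustrative defaults.
cfg = EasyDict(dict(
    type='pwil',
    expert_data_path='expert_data.pkl',  # assumed to exist on disk
    sample_size=1000,
    alpha=5,
    beta=5,
    s_size=4,   # state dimension of the target environment
    a_size=2,   # action dimension of the target environment
    clear_buffer_per_iters=1,
))

tb_logger = SummaryWriter('./log/pwil_reward_model')
reward_model = PwilRewardModel(cfg, device='cpu', tb_logger=tb_logger)
```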
- _get_action_distance(a1: list, a2: list) Tensor[source]¶
- Overview:
Getting distances of actions given 2 action lists. One single action is of shape torch.Size([n]) (n is referred to in in-code comments).
- Arguments:
  - a1 (torch.Tensor list): the 1st list of actions, of size M
  - a2 (torch.Tensor list): the 2nd list of actions, of size N
- Returns:
  - distance (torch.Tensor): Euclidean distance tensor of the action tensor lists, of size M x N.
- _get_state_distance(s1: list, s2: list) Tensor[source]¶
- Overview:
Getting distances of states given 2 state lists. One single state is of shape torch.Size([n]) (n is referred to in in-code comments).
- Arguments:
  - s1 (torch.Tensor list): the 1st list of states, of size M
  - s2 (torch.Tensor list): the 2nd list of states, of size N
- Returns:
  - distance (torch.Tensor): Euclidean distance tensor of the state tensor lists, of size M x N.
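Both distance helpers compute the full M x N Euclidean distance matrix between two lists of 1-D tensors. A standalone sketch of that computation (an illustration, not the library's internal code), using torch.cdist:

```python
import torch

def pairwise_euclidean_distance(xs: list, ys: list) -> torch.Tensor:
    # Stack M and N tensors of shape [n] into [M, n] and [N, n],
    # then compute the M x N matrix of pairwise Euclidean distances.
    return torch.cdist(torch.stack(xs), torch.stack(ys), p=2)

# Example: two lists of 4-dimensional states.
s1 = [torch.randn(4) for _ in range(8)]   # M = 8
s2 = [torch.randn(4) for _ in range(5)]   # N = 5
print(pairwise_euclidean_distance(s1, s2).shape)  # torch.Size([8, 5])
```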
- clear_data() None[source]¶
- Overview:
Clearing training data. This is a side effect function which clears the data attribute in self.
- collect_data(data: list) None[source]¶
- Overview:
Collecting training data formatted by fn:concat_state_action_pairs.
- Arguments:
  - data (list): Raw training data (e.g. some form of states, actions, obs, etc.)
- Effects:
  This is a side effect function which updates the data attribute in self; in this algorithm, the s_size and a_size for states and actions are also updated in the self.cfg Dict, and reward_factor is updated when collect_data is called.
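A hypothetical call continuing the construction sketch above; the transition layout (dicts with obs and action tensors) follows the argument description, and the exact shapes are illustrative only.

```python
import torch

# Illustrative batch of collected transitions; only 'obs' and 'action' matter here.
train_data = [
    {'obs': torch.randn(4), 'action': torch.tensor([1])}
    for _ in range(64)
]

# reward_model constructed as in the sketch above. The call updates the stored
# data, the s_size/a_size entries in the config, and reward_factor, as described.
reward_model.collect_data(train_data)
```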
- estimate(data: list) List[Dict][source]¶
- Overview:
Estimate reward by rewriting the reward key in each row of the data.
- Arguments:
  - data (list): the list of data used for estimation, with at least obs and action keys.
- Effects:
  This is a side effect function which updates the reward_table with (obs, action) tuples from the input.
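A usage sketch, again continuing the example above: estimate rewrites the reward key of each row, so the PWIL reward can be read back from the result. The extra reward key in the input is illustrative.

```python
import torch

eval_data = [
    {'obs': torch.randn(4), 'action': torch.tensor([0]), 'reward': torch.zeros(1)}
    for _ in range(32)
]

# reward_model constructed as in the sketch above.
estimated = reward_model.estimate(eval_data)
print(estimated[0]['reward'])  # PWIL reward written in place of the env reward
```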
- load_expert_data() None[source]¶
- Overview:
Getting the expert data from config['expert_data_path'] attribute in self.
- Effects:
  This is a side effect function which updates the expert data attribute (e.g. self.expert_data); in this algorithm, self.expert_s and self.expert_a for states and actions are also updated.
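The expert dataset is read from the '.pkl' file at config['expert_data_path']. A sketch of producing such a file, assuming (this schema is an assumption, not stated above) that it holds a list of transition dicts with obs and action entries:

```python
import pickle
import torch

# Hypothetical expert trajectories; the schema is an assumption for illustration.
expert_data = [
    {'obs': torch.randn(4), 'action': torch.tensor([1])}
    for _ in range(1000)
]

with open('expert_data.pkl', 'wb') as f:
    pickle.dump(expert_data, f)
```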
collect_state_action_pairs¶
- Overview:
Concatenate state and action pairs from the input iterator.
- Arguments:
  - iterator (Iterable): Iterables with at least obs and action tensor keys.
- Returns:
  - res (torch.Tensor): State and action pairs.
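A standalone sketch of the concatenation described here (an illustration, not the library's exact implementation): flatten each obs and action tensor, concatenate them per item, and stack all pairs into one tensor.

```python
import torch

def concat_state_action_pairs_sketch(iterator):
    # For every item, concatenate the flattened observation and action tensors,
    # then stack all pairs into a single [len(items), s_size + a_size] tensor.
    return torch.stack([
        torch.cat([item['obs'].flatten().float(), item['action'].flatten().float()])
        for item in iterator
    ])
```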