Pwil
pwil_irl_model
PwilRewardModel
- class ding.reward_model.pwil_irl_model.PwilRewardModel(config: Dict, device: str, tb_logger: SummaryWriter)
- Overview:
The PWIL (Primal Wasserstein Imitation Learning) reward model class (https://arxiv.org/pdf/2006.04678.pdf)
- Interface:
  estimate, train, load_expert_data, collect_data, clear_data, __init__, _train, _get_state_distance, _get_action_distance
- Config:

  | ID | Symbol | Type | Default Value | Description | Other(Shape) |
  | -- | ------ | ---- | ------------- | ----------- | ------------ |
  | 1 | type | str | pwil | Reward model register name, refer to registry REWARD_MODEL_REGISTRY | |
  | 2 | expert_data_path | str | expert_data.pkl | Path to the expert dataset | Should be a '.pkl' file |
  | 3 | sample_size | int | 1000 | Sample data from the expert dataset with fixed size | |
  | 4 | alpha | int | 5 | Factor alpha | |
  | 5 | beta | int | 5 | Factor beta | |
  | 6 | s_size | int | 4 | State size | |
  | 7 | a_size | int | 2 | Action size | |
  | 8 | clear_buffer_per_iters | int | 1 | Clear buffer per fixed iters | Make sure the replay buffer's data count isn't too few. (code works in entry) |

  A hypothetical sketch of these fields as a Python config dict is given after the Properties list below.

- Properties:
reward_table (Dict): In this algorithm, the reward model is a dictionary.
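As promised above, here is a hypothetical sketch collecting the config fields from the table into a Python dict; the key names follow the table, while the dict layout itself is an illustrative assumption rather than the library's canonical config object.

```python
# Hypothetical config sketch based on the table above; the plain-dict layout
# is an assumption (DI-engine configs are often EasyDict-style in practice).
pwil_config = dict(
    type='pwil',                          # reward model register name
    expert_data_path='expert_data.pkl',   # path to the '.pkl' expert dataset
    sample_size=1000,                     # samples drawn from the expert dataset
    alpha=5,                              # reward factor alpha
    beta=5,                               # reward factor beta
    s_size=4,                             # state size
    a_size=2,                             # action size
    clear_buffer_per_iters=1,             # clear the replay buffer every N iters
)
```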
- __init__(config: Dict, device: str, tb_logger: SummaryWriter) → None
- Overview:
  Initialize self. See help(type(self)) for accurate signature.
- Arguments:
  - config (Dict): Training config
  - device (str): Device usage, i.e. "cpu" or "cuda"
  - tb_logger (SummaryWriter): Logger, set by default as SummaryWriter for model summary
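A minimal instantiation sketch, assuming the hypothetical pwil_config dict from above and a standard TensorBoard SummaryWriter; the log directory is illustrative.

```python
from torch.utils.tensorboard import SummaryWriter
from ding.reward_model.pwil_irl_model import PwilRewardModel

# Assumes the hypothetical pwil_config dict sketched above; the real model
# may expect an EasyDict-style config rather than a plain dict.
tb_logger = SummaryWriter(log_dir='./log/pwil')  # illustrative log directory
reward_model = PwilRewardModel(pwil_config, device='cpu', tb_logger=tb_logger)
```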
- _get_action_distance(a1: list, a2: list) → Tensor
- Overview:
  Getting distances of actions given 2 action lists. One single action is of shape torch.Size([n]) (n is referred to in in-code comments).
- Arguments:
  - a1 (torch.Tensor list): the 1st action list, of size M
  - a2 (torch.Tensor list): the 2nd action list, of size N
- Returns:
  - distance (torch.Tensor): Euclidean distance tensor of the action tensor lists, of size M x N
- _get_state_distance(s1: list, s2: list) → Tensor
- Overview:
  Getting distances of states given 2 state lists. One single state is of shape torch.Size([n]) (n is referred to in in-code comments).
- Arguments:
  - s1 (torch.Tensor list): the 1st state list, of size M
  - s2 (torch.Tensor list): the 2nd state list, of size N
- Returns:
  - distance (torch.Tensor): Euclidean distance tensor of the state tensor lists, of size M x N
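Both distance helpers compute the same pairwise Euclidean distance over lists of tensors; the sketch below is a functional equivalent under that assumption, not the in-class implementation.

```python
import torch

def pairwise_euclidean(x_list, y_list):
    # Stack M and N tensors of shape torch.Size([n]) into (M, n) and (N, n),
    # then compute the M x N matrix of pairwise Euclidean distances.
    x = torch.stack(x_list)
    y = torch.stack(y_list)
    return torch.cdist(x, y, p=2)

s1 = [torch.randn(4) for _ in range(3)]  # M = 3 states with s_size = 4
s2 = [torch.randn(4) for _ in range(5)]  # N = 5 states
dist = pairwise_euclidean(s1, s2)        # dist.shape == torch.Size([3, 5])
```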
- clear_data() → None
- Overview:
  Clearing training data. This is a side-effect function which clears the data attribute in self.
- collect_data(data: list) → None
- Overview:
  Collecting training data formatted by fn:concat_state_action_pairs.
- Arguments:
  - data (list): Raw training data (e.g. some form of states, actions, obs, etc.)
- Effects:
  This is a side-effect function which updates the data attribute in self; in this algorithm, the s_size and a_size for states and actions are also updated in the self.cfg Dict, and reward_factor is updated whenever collect_data is called.
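A hypothetical call sketch; the transition layout (dicts with obs, action, and reward tensors) is an assumption consistent with the estimate interface below.

```python
import torch

# Assumes the reward_model instance from the __init__ sketch above; the
# transition dict layout here is an illustrative assumption.
train_data = [
    {'obs': torch.randn(4), 'action': torch.randn(2), 'reward': torch.zeros(1)},
    {'obs': torch.randn(4), 'action': torch.randn(2), 'reward': torch.zeros(1)},
]
reward_model.collect_data(train_data)
```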
- estimate(data: list) → List[Dict]
- Overview:
  Estimate reward by rewriting the reward key in each row of the data.
- Arguments:
  - data (list): the list of data used for estimation, with at least obs and action keys
- Effects:
  This is a side-effect function which updates the reward_table with (obs, action) tuples from the input.
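A usage sketch under the same assumed transition layout; estimate returns the rows with their reward key rewritten.

```python
import torch

# Assumes the reward_model instance and transition layout sketched above.
batch = [
    {'obs': torch.randn(4), 'action': torch.randn(2), 'reward': torch.zeros(1)},
]
estimated = reward_model.estimate(batch)  # rows with the 'reward' key rewritten
```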
- load_expert_data() → None
- Overview:
  Getting the expert data from the config['expert_data_path'] attribute in self.
- Effects:
  This is a side-effect function which updates the expert data attribute (e.g. self.expert_data); in this algorithm, self.expert_s and self.expert_a for states and actions are also updated.
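A functional sketch of what load_expert_data amounts to, under the assumption that the '.pkl' file holds a list of transition dicts with obs and action keys; the actual file layout may differ.

```python
import pickle

# Sketch only: the actual method reads config['expert_data_path'] from self,
# and the pickled data layout here is an assumption.
with open('expert_data.pkl', 'rb') as f:
    expert_data = pickle.load(f)                     # -> self.expert_data
expert_s = [item['obs'] for item in expert_data]     # -> self.expert_s
expert_a = [item['action'] for item in expert_data]  # -> self.expert_a
```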
concat_state_action_pairs
- Overview:
  Concatenate state and action pairs from the input iterator.
- Arguments:
  - iterator (Iterable): Iterable with at least obs and action tensor keys
- Returns:
  - res (torch.Tensor): State and action pairs
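A minimal sketch of the concatenation described above; the exact return layout of the library function may differ, so the helper is named with a _sketch suffix to mark it as hypothetical.

```python
import torch

def concat_state_action_pairs_sketch(iterator):
    # Concatenate each obs tensor with its action tensor, then stack the
    # results into a single tensor (return layout assumed).
    return torch.stack([
        torch.cat([item['obs'].flatten(), item['action'].flatten()])
        for item in iterator
    ])

pairs = concat_state_action_pairs_sketch([
    {'obs': torch.randn(4), 'action': torch.randn(2)},
    {'obs': torch.randn(4), 'action': torch.randn(2)},
])
# pairs.shape == torch.Size([2, 6])  (s_size + a_size = 4 + 2)
```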