Base Model¶
base_reward_estimate¶
BaseRewardModel¶
- class ding.reward_model.base_reward_model.BaseRewardModel[source]¶
- Overview:
The base class of reward model.
- Interface:
default_config, estimate, train, clear_data, collect_data, load_expert_data
- abstract clear_data() → None [source]¶
- Overview:
Clear the training data. This can be a side-effect function which clears the data attribute in
self
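Concrete subclasses usually implement this as a one-line reset of an internal buffer. A minimal sketch, assuming the subclass stores its data in a hypothetical `train_data` list attribute (the attribute name is not mandated by the base class):

```python
class BufferedRewardModel:
    """Hypothetical subclass keeping collected transitions in a plain list."""

    def __init__(self) -> None:
        self.train_data: list = []  # hypothetical buffer attribute

    def clear_data(self) -> None:
        # Side effect only: empty the buffer held on self.
        self.train_data.clear()
```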
- abstract collect_data(data) → None [source]¶
- Overview:
Collect training data in a designated format or with a designated transition.
- Arguments:
data (
Any
): Raw training data (e.g. some form of states, actions, obs, etc.)
- Returns / Effects:
This can be a side-effect function which updates the data attribute in
self
- abstract estimate(data: list) → Any [source]¶
- Overview:
Estimate the reward.
- Arguments:
data (
List
): The list of data used for estimation.
- Returns / Effects:
This can be a side-effect function which updates the reward value in place. If this function returns, an example returned object can be reward (
Any
): the estimated reward
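Putting the abstract interface together, a minimal concrete subclass might look like the following sketch. In DI-engine it would subclass ding.reward_model.base_reward_model.BaseRewardModel; a plain class is shown here so the snippet runs without the library installed, and the fixed-reward logic in `estimate` is purely illustrative:

```python
from typing import Any, List


class FixedRewardModel:
    """Illustrative reward model that buffers data and returns a fixed reward.

    The train_data attribute and the constant-reward behaviour are
    assumptions for the sketch, not part of the BaseRewardModel contract.
    """

    def __init__(self, reward_value: float = 1.0) -> None:
        self.reward_value = reward_value
        self.train_data: List[Any] = []  # hypothetical buffer attribute

    def collect_data(self, data: Any) -> None:
        # Side effect: append raw transitions to the internal buffer.
        self.train_data.extend(data)

    def train(self) -> None:
        # A real model would fit its network on self.train_data here.
        pass

    def estimate(self, data: list) -> List[float]:
        # Return one estimated reward per input transition.
        return [self.reward_value for _ in data]

    def clear_data(self) -> None:
        # Side effect: drop all collected training data.
        self.train_data.clear()
```

A typical cycle then reads: collect_data on each batch of transitions, train periodically, estimate to relabel rewards, and clear_data once the buffer is consumed.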
create_reward_model¶
- Overview:
Create a reward estimation model according to the given config.
- Arguments:
cfg (
Dict
): Training config
device (
str
): Device to use, i.e. "cpu" or "cuda"
tb_logger (
SummaryWriter
): Logger, by default a tensorboard SummaryWriter used for model summary
- Returns:
reward (
Any
): The reward model
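The factory pattern behind create_reward_model can be mimicked in a few lines. The sketch below is an assumption-laden stand-in, not the library code: it assumes the config carries a "type" key that selects a registered model class, echoing (but not reproducing) DI-engine's registry mechanism:

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping a config "type" string to a model class.
_REWARD_MODEL_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator registering a reward model class under a name."""
    def deco(cls):
        _REWARD_MODEL_REGISTRY[name] = cls
        return cls
    return deco


@register('constant')
class ConstantRewardModel:
    """Illustrative model returning a fixed reward from its config."""

    def __init__(self, cfg: dict, device: str, tb_logger: Any) -> None:
        self.cfg, self.device, self.tb_logger = cfg, device, tb_logger

    def estimate(self, data: list):
        return [self.cfg.get('reward_value', 1.0) for _ in data]


def create_reward_model(cfg: dict, device: str, tb_logger: Any) -> Any:
    # Look up the class named by cfg['type'] and instantiate it.
    return _REWARD_MODEL_REGISTRY[cfg['type']](cfg, device, tb_logger)
```

In the real library the returned object implements the BaseRewardModel interface described above, so the caller can use estimate, train, collect_data, and clear_data uniformly regardless of the configured model type.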