Base Model
base_reward_model
BaseRewardModel
- class ding.reward_model.base_reward_model.BaseRewardModel[source]
- Overview:
The base class of the reward model. A minimal subclass sketch is given after this class entry.
- Interface:
default_config, estimate, train, clear_data, collect_data, load_expert_data
- abstract clear_data() → None[source]
- Overview:
Clear the training data. This can be a side-effect function which clears the data attribute in self.
- abstract collect_data(data) → None[source]
- Overview:
Collect training data in the designated format or with the designated transition.
- Arguments:
data (Any): Raw training data (e.g. some form of states, actions, obs, etc.)
- Returns / Effects:
This can be a side-effect function which updates the data attribute in self.
- abstract estimate(data: list) → Any[source]
- Overview:
Estimate the reward.
- Arguments:
data (List): The list of data used for estimation
- Returns / Effects:
This can be a side-effect function which updates the reward value. If this function returns, an example returned object is reward (Any): the estimated reward.
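The following is a minimal subclass sketch that exercises the interface listed above. The ConstantRewardModel name, its constant-reward logic, and its (cfg, device, tb_logger) constructor are illustrative assumptions rather than part of the library; only the overridden method names come from the BaseRewardModel interface.

```python
# Minimal sketch of a BaseRewardModel subclass; the class name, the constant
# reward and the (cfg, device, tb_logger) constructor are assumptions for
# illustration, not part of the library API.
from ding.reward_model.base_reward_model import BaseRewardModel


class ConstantRewardModel(BaseRewardModel):
    """Toy reward model that assigns a fixed reward to every transition."""

    def __init__(self, cfg, device, tb_logger):
        # Constructor arguments mirror those documented for create_reward_model;
        # this mirroring is an assumption, not a requirement of BaseRewardModel.
        super().__init__()
        self.cfg = cfg
        self.device = device
        self.tb_logger = tb_logger
        self._data = []  # buffer for collected training data

    def estimate(self, data: list):
        # Side-effect style: overwrite the 'reward' field of each transition dict.
        for item in data:
            item['reward'] = 1.0
        return data

    def train(self):
        # A real model would fit its parameters on self._data here;
        # a constant reward has nothing to learn.
        pass

    def collect_data(self, data) -> None:
        # Accumulate raw training data in the designated format.
        self._data.extend(data)

    def clear_data(self) -> None:
        # Side effect: drop the buffered training data.
        self._data.clear()
```

In typical use, such a model collects data with collect_data, calls train, estimates rewards for the learner with estimate, and calls clear_data before the next iteration.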
create_reward_model
- Overview:
Create a reward estimation model according to the given configuration.
- Arguments:
cfg (Dict): Training config
device (str): Device to use, i.e. “cpu” or “cuda”
tb_logger (str): Logger, set by default as ‘SummaryWriter’ for model summary
- Returns:
reward (Any): The reward model
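Below is a hedged call sketch for create_reward_model. The import path, the ‘rnd’ type name, the obs_shape field, and the log directory are assumptions made for illustration; only the (cfg, device, tb_logger) argument list comes from the documentation above, and each concrete reward model defines its own required config entries.

```python
# Hedged call sketch for create_reward_model; the 'rnd' type name and the
# config fields below are assumptions, each concrete reward model requires
# its own model-specific entries.
from easydict import EasyDict
from tensorboardX import SummaryWriter

from ding.reward_model import create_reward_model

cfg = EasyDict(
    type='rnd',   # assumed: a name registered by a concrete reward model
    obs_shape=4,  # assumed: model-specific field for a toy 1-D observation
)
tb_logger = SummaryWriter('./log/reward_model')  # logger used for model summary
reward_model = create_reward_model(cfg, device='cpu', tb_logger=tb_logger)

# The returned object exposes the BaseRewardModel interface:
# collect_data -> train -> estimate -> clear_data.
```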