
IQN

Overview

IQN was proposed in Implicit Quantile Networks for Distributional Reinforcement Learning. The goal of distributional RL is to provide a more comprehensive picture of the returns of different actions by modeling the full probability distribution of the value function rather than only its expectation. The key difference between IQN and QRDQN is that IQN introduces an implicit quantile network, a deterministic parametric function trained to re-parameterize samples from a base distribution, e.g. tau ~ U([0, 1]), into the corresponding quantile values of the target return distribution, whereas QRDQN directly learns a fixed set of pre-defined quantiles.

Quick Facts

  1. IQN is a model-free and value-based RL algorithm.

  2. IQN only supports discrete action spaces.

  3. IQN is an off-policy algorithm.

  4. Usually, IQN uses eps-greedy or multinomial sampling for exploration.

  5. IQN can be equipped with RNN.

Key Equations

In implicit quantile networks, a sampled quantile tau is first encoded into an embedding vector via:

\[\phi_{j}(\tau):=\operatorname{ReLU}\left(\sum_{i=0}^{n-1} \cos (\pi i \tau) w_{i j}+b_{j}\right)\]

Then the quantile embedding is multiplied element-wise with the embedding of the environment observation, and the subsequent fully-connected layers map the resulting product to the corresponding quantile value.
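As a rough illustration of this embedding (a sketch only; the function and tensor names below are assumptions, not DI-engine's actual implementation), the following PyTorch snippet implements the cosine feature map above and fuses it with a state embedding:

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def cosine_quantile_embedding(tau: torch.Tensor, layer: nn.Linear) -> torch.Tensor:
        """phi_j(tau) = ReLU(sum_i cos(pi * i * tau) * w_ij + b_j).

        tau: (batch, num_quantiles) samples from U([0, 1]).
        layer: Linear layer mapping the n cosine features to the embedding dim.
        Returns: (batch, num_quantiles, embedding_dim).
        """
        n = layer.in_features
        i = torch.arange(n, dtype=tau.dtype, device=tau.device)     # i = 0, 1, ..., n-1
        cos_features = torch.cos(math.pi * i * tau.unsqueeze(-1))   # (B, N, n)
        return F.relu(layer(cos_features))                          # (B, N, embed_dim)

    # Element-wise fusion with the observation embedding (shapes are illustrative):
    B, N, embed_dim, num_actions = 4, 32, 64, 6
    cos_layer = nn.Linear(64, embed_dim)             # n = 64 cosine features
    tau = torch.rand(B, N)                           # tau ~ U([0, 1])
    state_embed = torch.randn(B, embed_dim)          # output of the observation encoder
    phi = cosine_quantile_embedding(tau, cos_layer)  # (B, N, embed_dim)
    fused = state_embed.unsqueeze(1) * phi           # (B, N, embed_dim)
    quantile_values = nn.Linear(embed_dim, num_actions)(fused)  # one value per quantile per action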

Key Graphs

The comparison among DQN, C51, QRDQN and IQN is shown as follows:

../_images/dis_reg_compare.png

Extensions

IQN can be combined with:
  • PER (Prioritized Experience Replay)

    Tip

    Whether PER improves IQN depends on the task and the training strategy.

  • Multi-step TD-loss

  • Double (target) Network

  • RNN

Implementation

Tip

Our benchmark results for IQN use the same hyper-parameters as DQN, except for IQN's exclusive hyper-parameter, the number of quantiles, which is empirically set to 32. We do not recommend setting the number of quantiles larger than 64, as this brings only marginal gains at the cost of much higher forward latency.

The default config of IQN is defined as follows:
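As a rough, illustrative sketch only (the key names and values below are assumptions written in DI-engine's dict-config style, not the actual defaults), an IQN policy config typically exposes the discount factor, the n-step length, the PER switch, and the target-network update frequency:

    # Illustrative sketch: key names and values are assumptions,
    # not DI-engine's actual default IQN config.
    iqn_config = dict(
        type='iqn',
        cuda=False,
        on_policy=False,             # IQN is off-policy
        priority=False,              # set True to enable PER (see Extensions)
        discount_factor=0.99,
        nstep=3,                     # multi-step TD-loss
        learn=dict(
            batch_size=64,
            learning_rate=1e-3,
            target_update_freq=100,  # double (target) network update period
        ),
        collect=dict(n_sample=8),
        other=dict(
            eps=dict(type='exp', start=0.95, end=0.05, decay=10000),  # eps-greedy exploration
        ),
    )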

The network interface used by IQN is defined as follows:
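Whatever the concrete interface, the head of an IQN network outputs one value per sampled quantile for each action, and the scalar Q-value used for (eps-)greedy action selection is the mean over the sampled quantiles. A minimal sketch (tensor names and shapes are assumptions):

    import torch

    # Output of an IQN-style head: shape (batch, num_quantiles, num_actions)
    quantile_values = torch.randn(4, 32, 6)   # placeholder for a real forward pass
    q_values = quantile_values.mean(dim=1)    # Q(s, a) ~= E_tau[Z_tau(s, a)], shape (batch, num_actions)
    greedy_action = q_values.argmax(dim=-1)   # greedy action used by eps-greedy exploration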

The Bellman update of IQN is defined in the function iqn_nstep_td_error of ding/rl_utils/td.py.
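For reference, the underlying update is the n-step quantile-regression Huber loss from the IQN paper. The sketch below illustrates that loss in plain PyTorch; it is not the signature or implementation of iqn_nstep_td_error, and all names are assumptions:

    import torch

    def quantile_huber_loss(pred_quantiles, pred_taus, target_quantiles, kappa=1.0):
        """Quantile Huber loss behind IQN's Bellman update (illustrative).

        pred_quantiles:   (B, N)  quantile values of the taken action at fractions pred_taus
        pred_taus:        (B, N)  quantile fractions tau ~ U([0, 1]) used for the prediction
        target_quantiles: (B, N') n-step Bellman targets built with the target network
        """
        # Pairwise TD errors: delta[b, i, j] = target_j - pred_i
        delta = target_quantiles.unsqueeze(1) - pred_quantiles.unsqueeze(2)     # (B, N, N')
        # Huber loss on each pairwise error
        huber = torch.where(
            delta.abs() <= kappa,
            0.5 * delta.pow(2),
            kappa * (delta.abs() - 0.5 * kappa),
        )
        # Asymmetric quantile weighting |tau - 1{delta < 0}|
        weight = (pred_taus.unsqueeze(2) - (delta.detach() < 0).float()).abs()  # (B, N, N')
        # Sum over predicted quantiles, average over target samples and the batch
        return (weight * huber / kappa).sum(dim=1).mean()

    def nstep_target(next_quantiles, rewards, done, gamma, nstep):
        """n-step distributional Bellman target (illustrative).

        next_quantiles: (B, N') quantiles of the greedy next action from the target network
        rewards:        (nstep, B) per-step rewards; done: (B,) terminal flags of the segment
        """
        discount = gamma ** torch.arange(nstep, dtype=rewards.dtype).unsqueeze(1)  # (nstep, 1)
        n_step_return = (discount * rewards).sum(dim=0)                            # (B,)
        return n_step_return.unsqueeze(1) + (gamma ** nstep) * (1.0 - done).unsqueeze(1) * next_quantiles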

Benchmark

environment | best mean reward | evaluation results | config link | comparison
Pong (PongNoFrameskip-v4) | 20 | ../_images/IQN_pong.png | config_link_p | Tianshou(20)
Qbert (QbertNoFrameskip-v4) | 16331 | ../_images/IQN_qbert.png | config_link_q | Tianshou(15520)
SpaceInvaders (SpaceInvadersNoFrameskip-v4) | 1493 | ../_images/IQN_spaceinvaders.png | config_link_s | Tianshou(1370)

P.S.:
  1. The above results are obtained by running the same configuration on five different random seeds (0, 1, 2, 3, 4).

References

(IQN) Will Dabney, Georg Ostrovski, David Silver, Rémi Munos: “Implicit Quantile Networks for Distributional Reinforcement Learning”, 2018; arXiv:1806.06923. https://arxiv.org/pdf/1806.06923

Other Public Implementations