A distributional deep reinforcement-learning agent — Implicit Quantile Networks — playing Atari 2600 Pong from raw pixels, recorded during a greedy evaluation episode. The agent never sees the game's rules; it learned the full return distribution of each action by trial and error. Trained with PyTorch + the Arcade Learning Environment, far too heavy for a static site, so this is a recorded eval.
A distributional deep-RL agent (Implicit Quantile Networks) playing Pong from raw pixels — recorded during a greedy evaluation.