A production-ready, highly modular Reinforcement Learning project featuring an interactive "Human vs. AI" Pong game. The core demonstration allows a human player to challenge an AI opponent controlled by a trained neural network, with the ability to dynamically swap the AI's difficulty mid-game via keyboard shortcuts.
Goal: Build a live, interactive Pong game where:
- Left Paddle (Human): Controlled via keyboard (W/S keys)
- Right Paddle (AI): Controlled by a Stable-Baselines3 PPO agent
- Dynamic Difficulty: Press keys 1-4 to "level up" the AI with different trained models (Novice → Master)
This project demonstrates:
- Pure game physics simulation (collision detection, scoring)
- Gymnasium environment design for RL training
- Stable-Baselines3 PPO training pipeline
- Real-time PyGame rendering with AI inference
- FastAPI WebSocket streaming for future dashboard integration
```
PongAI/
├── engine/
│   └── pong.py            # Pure physics engine (no pygame)
├── rl/
│   └── env.py             # Gymnasium environment wrapper
├── train/
│   └── ppo.py             # PPO training pipeline
├── demo/
│   └── play.py            # Interactive live demo
├── api/
│   └── app.py             # FastAPI WebSocket server
├── models/                # Trained model checkpoints (autogenerated)
├── pong_tensorboard/      # TensorBoard logs (autogenerated)
├── requirements.txt       # Python dependencies
└── README.md              # This file
```
- Python 3.10+
- Game Engine: `pygame` (2.5.2+)
- RL Framework: `gymnasium` (0.30.0+) with `stable-baselines3` (2.3.0+)
- API/Streaming: `fastapi`, `uvicorn`, `websockets`
- Numerical Computing: `numpy`
Pure mathematical simulation with NO pygame dependencies.
- `Paddle`: Represents a paddle with position and movement logic
- `Ball`: Represents the ball with position, velocity, and boundary collision
- `PongEngine`: Main game state manager
```python
SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600
PADDLE_WIDTH = 15
PADDLE_HEIGHT = 100
BALL_SIZE = 10
PADDLE_SPEED = 6.0
BALL_SPEED = 5.0
MAX_BALL_SPEED = 8.0
```
`PongEngine.reset() → Dict[str, float]`
- Initializes/resets the game state
- Centers ball and paddles
- Assigns random initial ball velocity
- Returns raw state dictionary
`PongEngine.step(action_left: int, action_right: int) → Tuple[Dict, bool, bool]`
- Executes one simulation step
- Actions: `0` (Stay), `1` (Up), `2` (Down)
- Collision Detection: Precise AABB collision between ball and paddles
- Reverses X velocity on paddle hit
- Applies a Y velocity modifier based on the paddle intersection point (adds "spin")
- Prevents the ball from getting stuck in a paddle
- Scoring: Detects the ball passing the left/right boundaries
- Returns: `(state_dict, left_scored, right_scored)`
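A minimal headless usage sketch built on these two calls (assuming `PongEngine` is importable from `engine/pong.py`, with random actions standing in for real policies):

```python
import random

from engine.pong import PongEngine

engine = PongEngine()
state = engine.reset()  # centered ball and paddles, random initial velocity

for _ in range(1000):
    # Actions: 0 = Stay, 1 = Up, 2 = Down (random here, for illustration)
    state, left_scored, right_scored = engine.step(
        random.randint(0, 2), random.randint(0, 2)
    )
    if left_scored or right_scored:
        state = engine.reset()
```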
Key Physics:
```python
# AABB Collision Detection
def _aabb_collision(box1, box2) -> bool:
    # Returns True if the axis-aligned boxes overlap (x/y/w/h fields assumed)
    return (box1.x < box2.x + box2.w and box1.x + box1.w > box2.x and
            box1.y < box2.y + box2.h and box1.y + box1.h > box2.y)

# Paddle-Ball Collision Response
def _handle_paddle_collision(paddle, is_left):
    # 1. Reverse ball X velocity
    # 2. Add Y velocity based on the relative hit position:
    #    - Hit top: negative Y velocity
    #    - Hit bottom: positive Y velocity
    #    - Hit middle: minimal change
```

Wraps the physics engine for Stable-Baselines3 training.
Action Space: `Discrete(3)`
- `0`: Stay
- `1`: Up
- `2`: Down
Observation Space: `Box(-1.0, 1.0, shape=(6,), dtype=float32)`

A 6-element normalized state vector:
- `ball_x` - Ball X position (normalized to [-1, 1])
- `ball_y` - Ball Y position (normalized to [-1, 1])
- `ball_vx` - Ball X velocity (normalized to [-1, 1])
- `ball_vy` - Ball Y velocity (normalized to [-1, 1])
- `ai_paddle_y` - AI paddle Y position (normalized to [-1, 1])
- `opponent_paddle_y` - Opponent paddle Y position (normalized to [-1, 1])
Normalization Strategy:
```python
# Position normalization (map [0, width/height] to [-1, 1])
normalized = (value / max_value) * 2.0 - 1.0

# Velocity normalization (clip to [-1, 1])
normalized = clip(velocity / 10.0, -1.0, 1.0)
```

Reward Function:
- `+1.0` if the AI scores (right paddle)
- `-1.0` if the opponent scores (left paddle)
- `+0.1` if the AI paddle successfully deflects the ball
- `-0.001` per step (encourages fast wins)
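A sketch of how these terms might combine inside `_calculate_reward()` (the signature is assumed for illustration; the constants match the list above):

```python
def _calculate_reward(self, left_scored: bool, right_scored: bool,
                      deflected: bool) -> float:
    reward = -0.001    # per-step penalty: encourages fast wins
    if right_scored:
        reward += 1.0  # AI (right paddle) scored
    if left_scored:
        reward -= 1.0  # opponent (left paddle) scored
    if deflected:
        reward += 0.1  # AI paddle deflected the ball
    return reward
```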
Opponent AI:
- A simple hardcoded tracker that follows the ball's Y position (see the sketch below)
- Provides a consistent training partner
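In outline, the tracker just steers the paddle center toward the ball; a minimal sketch (engine attribute names such as `ball.y` and `left_paddle.y` are assumed):

```python
from engine.pong import PADDLE_HEIGHT  # constant from the engine module

def _get_opponent_action(self) -> int:
    # Follow the ball's Y position with a small dead zone to avoid jitter
    paddle_center = self.engine.left_paddle.y + PADDLE_HEIGHT / 2
    if self.engine.ball.y < paddle_center - 10:
        return 1  # Up
    if self.engine.ball.y > paddle_center + 10:
        return 2  # Down
    return 0      # Stay
```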
Stable-Baselines3 PPO training script.
Environment Setup:
- Creates vectorized environments using `DummyVecEnv` (single-threaded) or `SubprocVecEnv` (multiprocessing), as sketched below
- Configurable number of parallel environments (default: 4)
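A sketch of that setup (the environment class name `PongEnv` is assumed; `SubprocVecEnv` needs the `__main__` guard on platforms that spawn worker processes):

```python
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv

from rl.env import PongEnv  # class name assumed

NUM_ENVS = 4

def make_env():
    return PongEnv()

if __name__ == "__main__":
    # Single-process (easier to debug):
    vec_env = DummyVecEnv([make_env for _ in range(NUM_ENVS)])
    # Or multiprocessing for throughput:
    # vec_env = SubprocVecEnv([make_env for _ in range(NUM_ENVS)])
```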
PPO Configuration:
```python
PPO(
    policy="MlpPolicy",
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
)
```

Callbacks:
- `CheckpointCallback`: Saves a model every 50,000 timesteps to `/models/`
- TensorBoard logging to `/pong_tensorboard/`
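Wiring those into training might look like this (a sketch reusing `vec_env` from the setup above; note that `save_freq` in `CheckpointCallback` counts per-environment steps, so it is divided by the environment count to save every 50,000 total timesteps):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

NUM_ENVS = 4  # matches the vectorized setup above

checkpoint_callback = CheckpointCallback(
    save_freq=50_000 // NUM_ENVS,   # per-env steps: every 50k total timesteps
    save_path="models/",
    name_prefix="rl_model",         # yields rl_model_50000_steps.zip, ...
)

model = PPO("MlpPolicy", vec_env, tensorboard_log="pong_tensorboard/")
model.learn(total_timesteps=1_000_000, callback=checkpoint_callback)
model.save("models/rl_model_final")
```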
Usage:
```bash
python train/ppo.py
```

Output:
- Model checkpoints: `models/rl_model_50000_steps.zip`, `rl_model_100000_steps.zip`, etc.
- Final model: `models/rl_model_final.zip`
- TensorBoard logs for monitoring training progress
PyGame frontend with real-time AI inference and dynamic model swapping.
- W/S Keys: Move left paddle (human player)
- 1-4 Keys: Switch AI difficulty
  - Novice (50k steps)
  - Intermediate (200k steps)
  - Advanced (500k steps)
  - Master (1M steps)
- SPACE: Pause/Resume
- R: Reset game
Dynamic Model Loading:
- Instant model swapping without restarting (see the sketch below)
- Models are loaded from the `/models/` directory
- Demonstrates "leveling up" the AI mid-game
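A sketch of the swapping logic (the key-to-checkpoint mapping is illustrative; the filenames follow the training output above):

```python
import pygame
from stable_baselines3 import PPO

# Number keys mapped to checkpoints of increasing strength
LEVELS = {
    pygame.K_1: ("Novice (50k)", "models/rl_model_50000_steps.zip"),
    pygame.K_2: ("Intermediate (200k)", "models/rl_model_200000_steps.zip"),
    pygame.K_3: ("Advanced (500k)", "models/rl_model_500000_steps.zip"),
    pygame.K_4: ("Master (1M)", "models/rl_model_1000000_steps.zip"),
}

def switch_level(self, key):
    if key in LEVELS:
        self.level_name, path = LEVELS[key]
        self.model = PPO.load(path)  # hot-swap the policy without restarting
```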
Game Loop (60 FPS):
- Handle PyGame events (keyboard, window close)
- Get the human action from keyboard input
- Get the AI action via `model.predict(state, deterministic=True)`
- Call `engine.step(human_action, ai_action)`
- Render using PyGame drawing functions
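Put together, one frame of the loop looks roughly like this (a sketch; helper names on the demo class are assumed):

```python
clock = pygame.time.Clock()
while self.running:
    self.handle_events()                    # keyboard, window close (name assumed)
    human_action = self.get_human_action()  # W/S keys (name assumed)
    state = self.env_wrapper._normalize_state(self.engine._get_state())
    ai_action, _ = self.model.predict(state, deterministic=True)
    self.engine.step(human_action, int(ai_action))
    self.render()                           # PyGame drawing (name assumed)
    clock.tick(60)                          # cap the loop at 60 FPS
```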
Rendering:
- Black background with white paddles and ball
- Center dashed line separator
- Score display (left paddle vs. right paddle)
- Current AI level display
- Control instructions
Normalized State Extraction:
```python
# Get the raw state from the engine
state_dict = self.engine._get_state()

# Normalize using the environment wrapper
state = self.env_wrapper._normalize_state(state_dict)

# Get the AI action (deterministic for consistency)
action, _ = self.model.predict(state, deterministic=True)
```

Real-time game data streaming for future dashboard integration.
GET /
- Health check and service info
- Returns connected client count
GET /health
- Simple health check endpoint
WebSocket /ws/game-data
- Accepts WebSocket connections
- Receives game-state JSON and broadcasts it to all clients
- Expected message format (from the demo):

```json
{
  "human_score": 5,
  "ai_score": 3,
  "current_level": "Advanced (500k)",
  "ai_action": 1,
  "ball_x": 400.0,
  "ball_y": 300.0,
  "human_paddle_y": 250.0,
  "ai_paddle_y": 280.0,
  "timestamp": 1704067200.123
}
```
POST /broadcast
- REST endpoint to broadcast data without WebSocket
- Useful for testing or external integrations
GET /clients
- Returns current number of connected clients
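For quick testing, a minimal subscriber sketch using the `websockets` package from the requirements:

```python
import asyncio

import websockets

async def watch():
    # Subscribe to the broadcast stream and print each game-state message
    async with websockets.connect("ws://127.0.0.1:8000/ws/game-data") as ws:
        while True:
            print(await ws.recv())

asyncio.run(watch())
```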
```python
class ConnectionManager:
    async def connect(self, websocket): ...        # Accept new connection
    def disconnect(self, websocket): ...           # Remove client
    async def broadcast(self, data): ...           # Send to all clients
    async def send_personal(self, ws, data): ...   # Send to specific client
```

Run the server:
```bash
python -m api.app
# Server starts on http://127.0.0.1:8000
# WebSocket: ws://127.0.0.1:8000/ws/game-data
```

Install the dependencies:
```bash
pip install -r requirements.txt
```

If you don't have pre-trained models, train them first:
```bash
# Train for 1 million timesteps with 4 parallel environments
python train/ppo.py

# Monitor training with TensorBoard
tensorboard --logdir=./pong_tensorboard/
```

Training Notes:
- Takes 30-60 minutes on a modern GPU
- Checkpoint models are saved every 50,000 steps
- Monitor training progress via TensorBoard at http://localhost:6006
- Adjust `total_timesteps` or `num_envs` based on your hardware
```bash
python demo/play.py
```

In the demo:
- Control the left paddle with W (up) / S (down)
- Press 1-4 to instantly change the AI difficulty
- Press SPACE to pause/resume
- Press R to reset the score
- Close the window to exit
```bash
# Run the API server (development mode with auto-reload)
python -m api.app

# Visit http://localhost:8000 for a health check
# WebSocket endpoint: ws://localhost:8000/ws/game-data
```

The training pipeline automatically saves checkpoints:
- 50k steps: Basic rally pattern learning
- 100k steps: Paddle positioning and defense
- 200k steps: Intermediate play with varied strategies
- 500k steps: Advanced spike and positioning
- 1M steps: Master-level near-optimal play
Track training via TensorBoard:
- Cumulative Reward: Should trend upward
- Episode Length: Indicates longer rallies as AI improves
- Win Rate: Demonstrates AI improvement
Edit `/engine/pong.py` constants:
```python
SCREEN_WIDTH = 1200     # Wider screen
PADDLE_SPEED = 8.0      # Faster paddles
BALL_SPEED = 7.0        # Faster ball
MAX_BALL_SPEED = 10.0   # Higher ceiling
```

Edit `/rl/env.py` `_calculate_reward()`:
```python
if right_scored:
    reward += 2.0   # More generous scoring reward
reward -= 0.005     # Increase the step penalty for faster wins
```

Edit `/train/ppo.py`:
```python
PPO(
    learning_rate=1e-4,  # Lower for finer control
    n_steps=4096,        # Larger batches
    gamma=0.95,          # Shorter horizon
    clip_range=0.1,      # More conservative updates
)
```

Edit `/rl/env.py` `_get_opponent_action()` to implement:
- A trained secondary model (self-play)
- Adaptive difficulty levels
- Predictive tracking
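For example, predictive tracking can aim at the ball's projected intercept instead of its current position; a sketch (engine attribute names are assumed, and wall bounces are ignored for brevity):

```python
from engine.pong import PADDLE_HEIGHT, PADDLE_WIDTH

def _get_opponent_action(self) -> int:
    ball = self.engine.ball  # attribute names assumed
    if ball.vx >= 0:
        return 0  # ball moving away from the left paddle: hold position
    # Project the ball's Y at the paddle's X plane (wall bounces ignored)
    steps = (ball.x - PADDLE_WIDTH) / -ball.vx
    target_y = ball.y + ball.vy * steps
    paddle_center = self.engine.left_paddle.y + PADDLE_HEIGHT / 2
    if target_y < paddle_center - 10:
        return 1  # Up
    if target_y > paddle_center + 10:
        return 2  # Down
    return 0      # Stay
```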
- ✅ Pure Physics Engine: Completely independent from rendering; reusable for headless training
- ✅ Proper Collision Detection: AABB + velocity-based response with paddle spin
- ✅ Normalization: All observations normalized to [-1, 1] for stable training
- ✅ Vectorized Training: Support for parallel environments to speed up learning
- ✅ Checkpoint Strategy: Regular model saves for difficulty progression
- ✅ Clean Architecture: Strict separation of concerns (physics, RL, rendering, API)
- ✅ Deterministic Inference: Uses `deterministic=True` for consistent demo behavior
- ✅ Error Handling: Graceful fallbacks for missing models
- ✅ Modular Design: Each component can be used independently
- ✅ Comprehensive Logging: Print statements for debugging and monitoring
| File | Purpose | Status |
|---|---|---|
| `engine/pong.py` | Physics engine, collision math | ✅ Complete |
| `rl/env.py` | Gymnasium wrapper, normalization, rewards | ✅ Complete |
| `train/ppo.py` | Training pipeline with callbacks | ✅ Complete |
| `demo/play.py` | PyGame frontend, model loading, inference | ✅ Complete |
| `api/app.py` | FastAPI WebSocket server | ✅ Complete |
| `requirements.txt` | Dependencies | ✅ Complete |
| `README.md` | Documentation | ✅ Complete |
Issue: "No module named 'engine'" when running demo
# Run from project root, not from demo/ directory
cd PongAI
python demo/play.pyIssue: Model not found when starting demo
- Train a model first:
python train/ppo.py - Or download pre-trained models from the releases page
Issue: PyGame window doesn't respond
- Check that you have a display connection (not headless)
- Ensure pygame version is 2.5.2+
Issue: Training is very slow
- Set
use_multiprocessing=Falseintrain/ppo.pyfor debugging - Reduce
n_stepsor increasenum_envsfor faster learning - GPU acceleration is automatic with stable-baselines3
Optional extensions not included in this scaffold:
- Web dashboard (React/Vue frontend consuming WebSocket data)
- Spectator mode with live streaming
- Multiple AI opponents simultaneously
- Transfer learning from easier to harder tasks
- Imitation learning from human demonstrations
- Multi-agent self-play tournaments
This project is provided as-is for educational and demonstration purposes.
This is a production-ready scaffold with:
- Complete, functional code (no placeholders)
- Professional error handling and logging
- Modular architecture for easy extension
- Extensive documentation
- Best practices for RL + game development
All components are independently testable and can be used in other projects.
Last Updated: April 2026
Stable-Baselines3 Version: 2.3.0
Gymnasium Version: 0.30.0
Python Version: 3.10+