A production-ready, highly modular Reinforcement Learning project featuring an interactive "Human vs. AI" Pong game. The core demonstration allows a human player to challenge an AI opponent controlled by a trained neural network, with the ability to dynamically swap the AI's difficulty mid-game via keyboard shortcuts.
Goal: Build a live, interactive Pong game where:
- Left Paddle (Human): Controlled via keyboard (W/S keys)
- Right Paddle (AI): Controlled by a Stable-Baselines3 PPO agent
- Dynamic Difficulty: Press keys 1-4 to "level up" the AI with different trained models (Novice → Master)
This project demonstrates:
- Pure game physics simulation (collision detection, scoring)
- Gymnasium environment design for RL training
- Stable-Baselines3 PPO training pipeline
- Real-time PyGame rendering with AI inference
- FastAPI WebSocket streaming for future dashboard integration
```
PongAI/
├── engine/
│   └── pong.py            # Pure physics engine (no pygame)
├── rl/
│   └── env.py             # Gymnasium environment wrapper
├── train/
│   └── ppo.py             # PPO training pipeline
├── demo/
│   └── play.py            # Interactive live demo
├── api/
│   └── app.py             # FastAPI WebSocket server
├── models/                # Trained model checkpoints (autogenerated)
├── pong_tensorboard/      # TensorBoard logs (autogenerated)
├── requirements.txt       # Python dependencies
└── README.md              # This file
```
- Python 3.10+
- Game Engine: `pygame` (2.5.2+)
- RL Framework: `gymnasium` (0.30.0+) with `stable-baselines3` (2.3.0+)
- API/Streaming: `fastapi`, `uvicorn`, `websockets`
- Numerical Computing: `numpy`
Pure mathematical simulation with NO pygame dependencies.
- `Paddle`: Represents a paddle with position and movement logic
- `Ball`: Represents the ball with position, velocity, and boundary collision
- `PongEngine`: Main game state manager
```python
SCREEN_WIDTH = 800
SCREEN_HEIGHT = 600
PADDLE_WIDTH = 15
PADDLE_HEIGHT = 100
BALL_SIZE = 10
PADDLE_SPEED = 6.0
BALL_SPEED = 5.0
MAX_BALL_SPEED = 8.0
```
`PongEngine.reset() → Dict[str, float]`
- Initializes/resets the game state
- Centers ball and paddles
- Assigns random initial ball velocity
- Returns raw state dictionary
`PongEngine.step(action_left: int, action_right: int) → Tuple[Dict, bool, bool]`
- Executes one simulation step
- Actions: `0` (Stay), `1` (Up), `2` (Down)
- Collision Detection: Precise AABB collision between ball and paddles
- Reverses X velocity on paddle hit
- Applies a Y velocity modifier based on the paddle intersection point (adds "spin")
- Prevents the ball from getting stuck in a paddle
- Scoring: Detects the ball passing the left/right boundaries
- Returns: `(state_dict, left_scored, right_scored)`
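A minimal headless usage sketch built on these two calls (assuming `PongEngine` is importable from `engine/pong.py`, with random actions standing in for real policies):

```python
import random

from engine.pong import PongEngine

engine = PongEngine()
state = engine.reset()  # centered ball and paddles, random initial velocity

for _ in range(1000):
    # Actions: 0 = Stay, 1 = Up, 2 = Down (random here, for illustration)
    state, left_scored, right_scored = engine.step(
        random.randint(0, 2), random.randint(0, 2)
    )
    if left_scored or right_scored:
        state = engine.reset()
```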
Key Physics:
```python
# AABB Collision Detection
def _aabb_collision(box1, box2) -> bool:
    # Returns True if the axis-aligned boxes overlap (x/y/w/h fields assumed)
    return (box1.x < box2.x + box2.w and box1.x + box1.w > box2.x and
            box1.y < box2.y + box2.h and box1.y + box1.h > box2.y)

# Paddle-Ball Collision Response
def _handle_paddle_collision(paddle, is_left):
    # 1. Reverse ball X velocity
    # 2. Add Y velocity based on the relative hit position:
    #    - Hit top: negative Y velocity
    #    - Hit bottom: positive Y velocity
    #    - Hit middle: minimal change
```

Wraps the physics engine for Stable-Baselines3 training.
Action Space: `Discrete(3)`
- `0`: Stay
- `1`: Up
- `2`: Down
Observation Space: `Box(-1.0, 1.0, shape=(6,), dtype=float32)`

A 6-element normalized state vector:
- `ball_x` - Ball X position (normalized to [-1, 1])
- `ball_y` - Ball Y position (normalized to [-1, 1])
- `ball_vx` - Ball X velocity (normalized to [-1, 1])
- `ball_vy` - Ball Y velocity (normalized to [-1, 1])
- `ai_paddle_y` - AI paddle Y position (normalized to [-1, 1])
- `opponent_paddle_y` - Opponent paddle Y position (normalized to [-1, 1])
Normalization Strategy:
```python
# Position normalization (map [0, width/height] to [-1, 1])
normalized = (value / max_value) * 2.0 - 1.0

# Velocity normalization (clip to [-1, 1])
normalized = clip(velocity / 10.0, -1.0, 1.0)
```

Reward Function:
- `+1.0` if the AI scores (right paddle)
- `-1.0` if the opponent scores (left paddle)
- `+0.1` if the AI paddle successfully deflects the ball
- `-0.001` per step (encourages fast wins)
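A sketch of how these terms might combine inside `_calculate_reward()` (the signature is assumed for illustration; the constants match the list above):

```python
def _calculate_reward(self, left_scored: bool, right_scored: bool,
                      deflected: bool) -> float:
    reward = -0.001    # per-step penalty: encourages fast wins
    if right_scored:
        reward += 1.0  # AI (right paddle) scored
    if left_scored:
        reward -= 1.0  # opponent (left paddle) scored
    if deflected:
        reward += 0.1  # AI paddle deflected the ball
    return reward
```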
Opponent AI:
- A simple hardcoded tracker that follows the ball's Y position (see the sketch below)
- Provides a consistent training partner
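In outline, the tracker just steers the paddle center toward the ball; a minimal sketch (engine attribute names such as `ball.y` and `left_paddle.y` are assumed):

```python
from engine.pong import PADDLE_HEIGHT  # constant from the engine module

def _get_opponent_action(self) -> int:
    # Follow the ball's Y position with a small dead zone to avoid jitter
    paddle_center = self.engine.left_paddle.y + PADDLE_HEIGHT / 2
    if self.engine.ball.y < paddle_center - 10:
        return 1  # Up
    if self.engine.ball.y > paddle_center + 10:
        return 2  # Down
    return 0      # Stay
```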
Stable-Baselines3 PPO training script.
Environment Setup:
- Creates vectorized environments using `DummyVecEnv` (single-threaded) or `SubprocVecEnv` (multiprocessing), as sketched below
- Configurable number of parallel environments (default: 4)
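A sketch of that setup (the environment class name `PongEnv` is assumed; `SubprocVecEnv` needs the `__main__` guard on platforms that spawn worker processes):

```python
from stable_baselines3.common.vec_env import DummyVecEnv, SubprocVecEnv

from rl.env import PongEnv  # class name assumed

NUM_ENVS = 4

def make_env():
    return PongEnv()

if __name__ == "__main__":
    # Single-process (easier to debug):
    vec_env = DummyVecEnv([make_env for _ in range(NUM_ENVS)])
    # Or multiprocessing for throughput:
    # vec_env = SubprocVecEnv([make_env for _ in range(NUM_ENVS)])
```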
PPO Configuration:
```python
PPO(
    policy="MlpPolicy",
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
)
```

Callbacks:
- `CheckpointCallback`: Saves a model every 50,000 timesteps to `/models/`
- TensorBoard logging to `/pong_tensorboard/`
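Wiring those into training might look like this (a sketch reusing `vec_env` from the setup above; note that `save_freq` in `CheckpointCallback` counts per-environment steps, so it is divided by the environment count to save every 50,000 total timesteps):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

NUM_ENVS = 4  # matches the vectorized setup above

checkpoint_callback = CheckpointCallback(
    save_freq=50_000 // NUM_ENVS,   # per-env steps: every 50k total timesteps
    save_path="models/",
    name_prefix="rl_model",         # yields rl_model_50000_steps.zip, ...
)

model = PPO("MlpPolicy", vec_env, tensorboard_log="pong_tensorboard/")
model.learn(total_timesteps=1_000_000, callback=checkpoint_callback)
model.save("models/rl_model_final")
```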
Usage:
```bash
python train/ppo.py
```

Output:
- Model checkpoints: `models/rl_model_50000_steps.zip`, `rl_model_100000_steps.zip`, etc.
- Final model: `models/rl_model_final.zip`
- TensorBoard logs for monitoring training progress
PyGame frontend with real-time AI inference and dynamic model swapping.
- W/S Keys: Move left paddle (human player)
- 1-4 Keys: Switch AI difficulty
  - Novice (50k steps)
  - Intermediate (200k steps)
  - Advanced (500k steps)
  - Master (1M steps)
- SPACE: Pause/Resume
- R: Reset game
Dynamic Model Loading:
- Instant model swapping without restarting (see the sketch below)
- Models are loaded from the `/models/` directory
- Demonstrates "leveling up" the AI mid-game
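A sketch of the swapping logic (the key-to-checkpoint mapping is illustrative; the filenames follow the training output above):

```python
import pygame
from stable_baselines3 import PPO

# Number keys mapped to checkpoints of increasing strength
LEVELS = {
    pygame.K_1: ("Novice (50k)", "models/rl_model_50000_steps.zip"),
    pygame.K_2: ("Intermediate (200k)", "models/rl_model_200000_steps.zip"),
    pygame.K_3: ("Advanced (500k)", "models/rl_model_500000_steps.zip"),
    pygame.K_4: ("Master (1M)", "models/rl_model_1000000_steps.zip"),
}

def switch_level(self, key):
    if key in LEVELS:
        self.level_name, path = LEVELS[key]
        self.model = PPO.load(path)  # hot-swap the policy without restarting
```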
Game Loop (60 FPS):
- Handle PyGame events (keyboard, window close)
- Get the human action from keyboard input
- Get the AI action via `model.predict(state, deterministic=True)`
- Call `engine.step(human_action, ai_action)`
- Render using PyGame drawing functions
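Put together, one frame of the loop looks roughly like this (a sketch; helper names on the demo class are assumed):

```python
clock = pygame.time.Clock()
while self.running:
    self.handle_events()                    # keyboard, window close (name assumed)
    human_action = self.get_human_action()  # W/S keys (name assumed)
    state = self.env_wrapper._normalize_state(self.engine._get_state())
    ai_action, _ = self.model.predict(state, deterministic=True)
    self.engine.step(human_action, int(ai_action))
    self.render()                           # PyGame drawing (name assumed)
    clock.tick(60)                          # cap the loop at 60 FPS
```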
Rendering:
- Black background with white paddles and ball
- Center dashed line separator
- Score display (left paddle vs. right paddle)
- Current AI level display
- Control instructions
Normalized State Extraction:
```python
# Get the raw state from the engine
state_dict = self.engine._get_state()

# Normalize using the environment wrapper
state = self.env_wrapper._normalize_state(state_dict)

# Get the AI action (deterministic for consistency)
action, _ = self.model.predict(state, deterministic=True)
```

Real-time game data streaming for future dashboard integration.
GET /
- Health check and service info
- Returns connected client count
GET /health
- Simple health check endpoint
WebSocket /ws/game-data
- Accepts WebSocket connections
- Receives game-state JSON and broadcasts it to all clients
- Expected message format (from the demo):

```json
{
  "human_score": 5,
  "ai_score": 3,
  "current_level": "Advanced (500k)",
  "ai_action": 1,
  "ball_x": 400.0,
  "ball_y": 300.0,
  "human_paddle_y": 250.0,
  "ai_paddle_y": 280.0,
  "timestamp": 1704067200.123
}
```
POST /broadcast
- REST endpoint to broadcast data without WebSocket
- Useful for testing or external integrations
GET /clients
- Returns current number of connected clients
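For quick testing, a minimal subscriber sketch using the `websockets` package from the requirements:

```python
import asyncio

import websockets

async def watch():
    # Subscribe to the broadcast stream and print each game-state message
    async with websockets.connect("ws://127.0.0.1:8000/ws/game-data") as ws:
        while True:
            print(await ws.recv())

asyncio.run(watch())
```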
```python
class ConnectionManager:
    async def connect(self, websocket): ...        # Accept new connection
    def disconnect(self, websocket): ...           # Remove client
    async def broadcast(self, data): ...           # Send to all clients
    async def send_personal(self, ws, data): ...   # Send to specific client
```

Run the server:
```bash
python -m api.app
# Server starts on http://127.0.0.1:8000
# WebSocket: ws://127.0.0.1:8000/ws/game-data
```

Install the dependencies:
```bash
pip install -r requirements.txt
```

If you don't have pre-trained models, train them first:
```bash
# Train for 1 million timesteps with 4 parallel environments
python train/ppo.py

# Monitor training with TensorBoard
tensorboard --logdir=./pong_tensorboard/
```

Training Notes:
- Takes 30-60 minutes on a modern GPU
- Checkpoint models are saved every 50,000 steps
- Monitor training progress via TensorBoard at http://localhost:6006
- Adjust `total_timesteps` or `num_envs` based on your hardware
```bash
python demo/play.py
```

In the demo:
- Control the left paddle with W (up) / S (down)
- Press 1-4 to instantly change the AI difficulty
- Press SPACE to pause/resume
- Press R to reset the score
- Close the window to exit
```bash
# Run the API server (development mode with auto-reload)
python -m api.app

# Visit http://localhost:8000 for a health check
# WebSocket endpoint: ws://localhost:8000/ws/game-data
```

The training pipeline automatically saves checkpoints:
- 50k steps: Basic rally pattern learning
- 100k steps: Paddle positioning and defense
- 200k steps: Intermediate play with varied strategies
- 500k steps: Advanced spike and positioning
- 1M steps: Master-level near-optimal play
Track training via TensorBoard:
- Cumulative Reward: Should trend upward
- Episode Length: Indicates longer rallies as AI improves
- Win Rate: Demonstrates AI improvement
Edit `/engine/pong.py` constants:
```python
SCREEN_WIDTH = 1200     # Wider screen
PADDLE_SPEED = 8.0      # Faster paddles
BALL_SPEED = 7.0        # Faster ball
MAX_BALL_SPEED = 10.0   # Higher ceiling
```

Edit `/rl/env.py` `_calculate_reward()`:
```python
if right_scored:
    reward += 2.0   # More generous scoring reward
reward -= 0.005     # Increase the step penalty for faster wins
```

Edit `/train/ppo.py`:
```python
PPO(
    learning_rate=1e-4,  # Lower for finer control
    n_steps=4096,        # Larger batches
    gamma=0.95,          # Shorter horizon
    clip_range=0.1,      # More conservative updates
)
```

Edit `/rl/env.py` `_get_opponent_action()` to implement:
- A trained secondary model (self-play)
- Adaptive difficulty levels
- Predictive tracking
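For example, predictive tracking can aim at the ball's projected intercept instead of its current position; a sketch (engine attribute names are assumed, and wall bounces are ignored for brevity):

```python
from engine.pong import PADDLE_HEIGHT, PADDLE_WIDTH

def _get_opponent_action(self) -> int:
    ball = self.engine.ball  # attribute names assumed
    if ball.vx >= 0:
        return 0  # ball moving away from the left paddle: hold position
    # Project the ball's Y at the paddle's X plane (wall bounces ignored)
    steps = (ball.x - PADDLE_WIDTH) / -ball.vx
    target_y = ball.y + ball.vy * steps
    paddle_center = self.engine.left_paddle.y + PADDLE_HEIGHT / 2
    if target_y < paddle_center - 10:
        return 1  # Up
    if target_y > paddle_center + 10:
        return 2  # Down
    return 0      # Stay
```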
- ✅ Pure Physics Engine: Completely independent from rendering; reusable for headless training
- ✅ Proper Collision Detection: AABB + velocity-based response with paddle spin
- ✅ Normalization: All observations normalized to [-1, 1] for stable training
- ✅ Vectorized Training: Support for parallel environments to speed up learning
- ✅ Checkpoint Strategy: Regular model saves for difficulty progression
- ✅ Clean Architecture: Strict separation of concerns (physics, RL, rendering, API)
- ✅ Deterministic Inference: Uses `deterministic=True` for consistent demo behavior
- ✅ Error Handling: Graceful fallbacks for missing models
- ✅ Modular Design: Each component can be used independently
- ✅ Comprehensive Logging: Print statements for debugging and monitoring
| File | Purpose | Status |
|---|---|---|
| `engine/pong.py` | Physics engine, collision math | ✅ Complete |
| `rl/env.py` | Gymnasium wrapper, normalization, rewards | ✅ Complete |
| `train/ppo.py` | Training pipeline with callbacks | ✅ Complete |
| `demo/play.py` | PyGame frontend, model loading, inference | ✅ Complete |
| `api/app.py` | FastAPI WebSocket server | ✅ Complete |
| `requirements.txt` | Dependencies | ✅ Complete |
| `README.md` | Documentation | ✅ Complete |
Issue: "No module named 'engine'" when running demo
# Run from project root, not from demo/ directory
cd PongAI
python demo/play.pyIssue: Model not found when starting demo
- Train a model first:
python train/ppo.py - Or download pre-trained models from the releases page
Issue: PyGame window doesn't respond
- Check that you have a display connection (not headless)
- Ensure pygame version is 2.5.2+
Issue: Training is very slow
- Set
use_multiprocessing=Falseintrain/ppo.pyfor debugging - Reduce
n_stepsor increasenum_envsfor faster learning - GPU acceleration is automatic with stable-baselines3
Optional extensions not included in this scaffold:
- Web dashboard (React/Vue frontend consuming WebSocket data)
- Spectator mode with live streaming
- Multiple AI opponents simultaneously
- Transfer learning from easier to harder tasks
- Imitation learning from human demonstrations
- Multi-agent self-play tournaments
This project is provided as-is for educational and demonstration purposes.
This is a production-ready scaffold with:
- Complete, functional code (no placeholders)
- Professional error handling and logging
- Modular architecture for easy extension
- Extensive documentation
- Best practices for RL + game development
All components are independently testable and can be used in other projects.
Last Updated: April 2026
Stable-Baselines3 Version: 2.3.0
Gymnasium Version: 0.30.0
Python Version: 3.10+