PongAI: Play the classic Pong game against an RL agent!

A production-ready, highly modular Reinforcement Learning project featuring an interactive "Human vs. AI" Pong game. The core demonstration allows a human player to challenge an AI opponent controlled by a trained neural network, with the ability to dynamically swap the AI's difficulty mid-game via keyboard shortcuts.

Project Overview

Goal: Build a live, interactive Pong game where:

  • Left Paddle (Human): Controlled via keyboard (W/S keys)
  • Right Paddle (AI): Controlled by a Stable-Baselines3 PPO agent
  • Dynamic Difficulty: Press keys 1-4 to "level up" the AI with different trained models (Novice → Master)

This project demonstrates:

  • Pure game physics simulation (collision detection, scoring)
  • Gymnasium environment design for RL training
  • Stable-Baselines3 PPO training pipeline
  • Real-time PyGame rendering with AI inference
  • FastAPI WebSocket streaming for future dashboard integration

Architecture

Directory Structure

PongAI/
├── engine/
│   └── pong.py              # Pure physics engine (no pygame)
├── rl/
│   └── env.py               # Gymnasium environment wrapper
├── train/
│   └── ppo.py               # PPO training pipeline
├── demo/
│   └── play.py              # Interactive live demo
├── api/
│   └── app.py               # FastAPI WebSocket server
├── models/                  # Trained model checkpoints (autogenerated)
├── pong_tensorboard/        # TensorBoard logs (autogenerated)
├── requirements.txt         # Python dependencies
└── README.md                # This file

Technology Stack

  • Python 3.10+
  • Rendering: pygame (2.5.2+)
  • RL Framework: gymnasium (0.30.0+) with stable-baselines3 (2.3.0+)
  • API/Streaming: fastapi, uvicorn, websockets
  • Numerical Computing: numpy
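
For reference, a requirements.txt consistent with the versions listed above might look like this (the repo's exact pins may differ):

pygame>=2.5.2
gymnasium>=0.30.0
stable-baselines3>=2.3.0
fastapi
uvicorn
websockets
numpy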

📋 Module Documentation

1. /engine/pong.py - Core Physics Engine

Pure mathematical simulation with NO pygame dependencies.

Key Classes:

  • Paddle: Represents a paddle with position and movement logic
  • Ball: Represents the ball with position, velocity, and boundary collision
  • PongEngine: Main game state manager

Constants:

  • SCREEN_WIDTH = 800, SCREEN_HEIGHT = 600
  • PADDLE_WIDTH = 15, PADDLE_HEIGHT = 100, BALL_SIZE = 10
  • PADDLE_SPEED = 6.0, BALL_SPEED = 5.0, MAX_BALL_SPEED = 8.0

Core Methods:

PongEngine.reset() → Dict[str, float]

  • Initializes/resets the game state
  • Centers ball and paddles
  • Assigns random initial ball velocity
  • Returns raw state dictionary

PongEngine.step(action_left: int, action_right: int) → Tuple[Dict, bool, bool]

  • Executes one simulation step
  • Actions: 0 (Stay), 1 (Up), 2 (Down)
  • Collision Detection: Precise AABB collision between ball and paddles
    • Reverses X velocity on paddle hit
    • Applies Y velocity modifier based on paddle intersection point (adds "spin")
    • Prevents ball from getting stuck in paddle
  • Scoring: Detects ball passing left/right boundaries
  • Returns: (state_dict, left_scored, right_scored)

Key Physics:

# AABB Collision Detection: True when two axis-aligned boxes overlap.
# Each box is an (x, y, width, height) tuple.
def _aabb_collision(box1, box2) -> bool:
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    return x1 < x2 + w2 and x1 + w1 > x2 and y1 < y2 + h2 and y1 + h1 > y2

# Paddle-Ball Collision Response:
#   1. Reverse ball X velocity
#   2. Add Y velocity based on relative hit position ("spin")
#      - Hit top: negative Y velocity
#      - Hit bottom: positive Y velocity
#      - Hit middle: minimal change
def _handle_paddle_collision(paddle, is_left):
    ...
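
A quick usage sketch of the engine API, following the reset()/step() signatures documented above:

from engine.pong import PongEngine

engine = PongEngine()
state = engine.reset()   # centered paddles, random initial ball velocity

# Advance one frame: left paddle moves up (1), right paddle stays (0).
state, left_scored, right_scored = engine.step(action_left=1, action_right=0)
if right_scored:
    print("Right paddle scored!")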

2. /rl/env.py - Gymnasium Environment

Wraps the physics engine for Stable-Baselines3 training.

Class: CustomPongEnv(gym.Env)

Action Space: Discrete(3)

  • 0: Stay
  • 1: Up
  • 2: Down

Observation Space: Box(-1.0, 1.0, shape=(6,), dtype=float32)

6-element normalized state vector:

  1. ball_x - Ball X position (normalized to [-1, 1])
  2. ball_y - Ball Y position (normalized to [-1, 1])
  3. ball_vx - Ball X velocity (normalized to [-1, 1])
  4. ball_vy - Ball Y velocity (normalized to [-1, 1])
  5. ai_paddle_y - AI paddle Y position (normalized to [-1, 1])
  6. opponent_paddle_y - Opponent paddle Y position (normalized to [-1, 1])

Normalization Strategy:

# Position normalization: map [0, width/height] to [-1, 1]
normalized = (value / max_value) * 2.0 - 1.0

# Velocity normalization: scale down, then clip to [-1, 1]
normalized = np.clip(velocity / 10.0, -1.0, 1.0)

Reward Function:

  • +1.0 if AI scores (right paddle)
  • -1.0 if opponent scores (left paddle)
  • +0.1 if AI paddle successfully deflects the ball
  • -0.001 per step (encourages fast wins)

Opponent AI:

  • Simple hardcoded tracker that follows the ball's Y position
  • Provides consistent training partner
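
Putting these pieces together, here is a condensed sketch of CustomPongEnv. It paraphrases the behavior described above rather than quoting the repo; the state-dict keys, the constant imports, and the assumption that an episode ends at each point are guesses:

import numpy as np
import gymnasium as gym
from gymnasium import spaces

from engine.pong import PongEngine, SCREEN_WIDTH, SCREEN_HEIGHT

class CustomPongEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.engine = PongEngine()
        self.action_space = spaces.Discrete(3)  # 0: Stay, 1: Up, 2: Down
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(6,), dtype=np.float32)

    def _normalize_state(self, s):
        # Positions mapped from [0, max] to [-1, 1]; velocities clipped to [-1, 1].
        return np.array([
            (s["ball_x"] / SCREEN_WIDTH) * 2.0 - 1.0,
            (s["ball_y"] / SCREEN_HEIGHT) * 2.0 - 1.0,
            np.clip(s["ball_vx"] / 10.0, -1.0, 1.0),
            np.clip(s["ball_vy"] / 10.0, -1.0, 1.0),
            (s["ai_paddle_y"] / SCREEN_HEIGHT) * 2.0 - 1.0,
            (s["opponent_paddle_y"] / SCREEN_HEIGHT) * 2.0 - 1.0,
        ], dtype=np.float32)

    def _get_opponent_action(self):
        # Hardcoded opponent: track the ball's Y position.
        s = self.engine._get_state()
        if s["ball_y"] < s["opponent_paddle_y"]:
            return 1  # Up
        if s["ball_y"] > s["opponent_paddle_y"]:
            return 2  # Down
        return 0      # Stay

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self._normalize_state(self.engine.reset()), {}

    def step(self, action):
        state, left_scored, right_scored = self.engine.step(
            self._get_opponent_action(), int(action))
        # +1 when the AI (right) scores, -1 when the opponent scores, plus a
        # small per-step penalty; the +0.1 deflection bonus is omitted here.
        reward = float(right_scored) - float(left_scored) - 0.001
        terminated = left_scored or right_scored
        return self._normalize_state(state), reward, terminated, False, {}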

3. /train/ppo.py - Training Pipeline

Stable-Baselines3 PPO training script.

Key Features:

Environment Setup:

  • Creates vectorized environments using DummyVecEnv (single-threaded) or SubprocVecEnv (multiprocessing)
  • Configurable number of parallel environments (default: 4)

PPO Configuration:

PPO(
    policy="MlpPolicy",
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
)

Callbacks:

  • CheckpointCallback: Saves model every 50,000 timesteps to /models/
  • TensorBoard logging to /pong_tensorboard/
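
A minimal version of the training script built from the pieces above (the hyperparameters and paths come from this README; the variable names are assumptions):

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.callbacks import CheckpointCallback

from rl.env import CustomPongEnv

NUM_ENVS = 4  # default number of parallel environments

# Four single-threaded copies of the environment.
vec_env = DummyVecEnv([CustomPongEnv for _ in range(NUM_ENVS)])

model = PPO(
    "MlpPolicy", vec_env,
    learning_rate=3e-4, n_steps=2048, batch_size=64, n_epochs=10,
    gamma=0.99, gae_lambda=0.95, clip_range=0.2, ent_coef=0.01,
    tensorboard_log="./pong_tensorboard/", verbose=1,
)

# SB3 counts save_freq in vec-env steps, so divide by NUM_ENVS to get
# one checkpoint per 50,000 total timesteps.
checkpoint = CheckpointCallback(
    save_freq=50_000 // NUM_ENVS, save_path="./models/", name_prefix="rl_model",
)

model.learn(total_timesteps=1_000_000, callback=checkpoint)
model.save("models/rl_model_final")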

Usage:

python train/ppo.py

Output:

  • Model checkpoints: models/rl_model_50000_steps.zip, rl_model_100000_steps.zip, etc.
  • Final model: models/rl_model_final.zip
  • TensorBoard logs for monitoring training progress

4. /demo/play.py - Interactive Live Demo

PyGame frontend with real-time AI inference and dynamic model swapping.

Controls:

  • W/S Keys: Move left paddle (human player)
  • 1-4 Keys: Switch AI difficulty
    1. Novice (50k steps)
    2. Intermediate (200k steps)
    3. Advanced (500k steps)
    4. Master (1M steps)
  • SPACE: Pause/Resume
  • R: Reset game

Features:

Dynamic Model Loading:

  • Instant model swapping without restarting
  • Models loaded from /models/ directory
  • Demonstrates "leveling up" the AI mid-game
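
The swap itself can be as simple as reloading a checkpoint when a number key is pressed. A sketch (the checkpoint filenames follow the training output above; mapping "Master" to the final model is a guess):

import pygame
from stable_baselines3 import PPO

# Number keys -> checkpoints of increasing strength.
LEVELS = {
    pygame.K_1: ("Novice (50k)",        "models/rl_model_50000_steps.zip"),
    pygame.K_2: ("Intermediate (200k)", "models/rl_model_200000_steps.zip"),
    pygame.K_3: ("Advanced (500k)",     "models/rl_model_500000_steps.zip"),
    pygame.K_4: ("Master (1M)",         "models/rl_model_final.zip"),
}

for event in pygame.event.get():
    if event.type == pygame.KEYDOWN and event.key in LEVELS:
        name, path = LEVELS[event.key]
        try:
            model = PPO.load(path)   # hot-swap: no restart required
            current_level = name
        except FileNotFoundError:
            print(f"Missing checkpoint: {path}")  # graceful fallback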

Game Loop (60 FPS):

  1. Handle PyGame events (keyboard, window close)
  2. Get human action from keyboard input
  3. Get AI action via model.predict(state, deterministic=True)
  4. Call engine.step(human_action, ai_action)
  5. Render using PyGame drawing functions
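
In code, one iteration of that loop looks roughly like this (engine, env_wrapper, model, and clock are assumed to be set up already; draw_frame is a hypothetical helper wrapping the pygame drawing calls):

# 1. Handle events (window close, level keys, pause, reset).
for event in pygame.event.get():
    if event.type == pygame.QUIT:
        running = False

# 2. Human action from held keys: W = up, S = down, otherwise stay.
keys = pygame.key.get_pressed()
human_action = 1 if keys[pygame.K_w] else 2 if keys[pygame.K_s] else 0

# 3. AI action from the current model (deterministic for consistency).
obs = env_wrapper._normalize_state(engine._get_state())
ai_action, _ = model.predict(obs, deterministic=True)

# 4. Advance the physics, then 5. render and lock to 60 FPS.
engine.step(human_action, int(ai_action))
draw_frame()
clock.tick(60)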

Rendering:

  • Black background with white paddles and ball
  • Center dashed line separator
  • Score display (left paddle vs. right paddle)
  • Current AI level display
  • Control instructions

Normalized State Extraction:

# Get raw state from engine
state_dict = self.engine._get_state()

# Normalize using environment wrapper
state = self.env_wrapper._normalize_state(state_dict)

# Get AI action (deterministic for consistency)
action, _ = self.model.predict(state, deterministic=True)

5. /api/app.py - FastAPI WebSocket Server

Real-time game data streaming for future dashboard integration.

Endpoints:

GET /

  • Health check and service info
  • Returns connected client count

GET /health

  • Simple health check endpoint

WebSocket /ws/game-data

  • Accepts WebSocket connections
  • Receives game state JSON and broadcasts to all clients
  • Expected message format (from demo):
    {
      "human_score": 5,
      "ai_score": 3,
      "current_level": "Advanced (500k)",
      "ai_action": 1,
      "ball_x": 400.0,
      "ball_y": 300.0,
      "human_paddle_y": 250.0,
      "ai_paddle_y": 280.0,
      "timestamp": 1704067200.123
    }

POST /broadcast

  • REST endpoint to broadcast data without WebSocket
  • Useful for testing or external integrations

GET /clients

  • Returns current number of connected clients

Connection Manager:

class ConnectionManager:
    async def connect(self, websocket): ...       # Accept and register a new connection
    def disconnect(self, websocket): ...          # Remove a client
    async def broadcast(self, data): ...          # Send to all connected clients
    async def send_personal(self, ws, data): ...  # Send to a specific client

Usage:

python -m api.app
# Server starts on http://127.0.0.1:8000
# WebSocket: ws://127.0.0.1:8000/ws/game-data
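
To consume the stream, a minimal client using the websockets package might look like this (field names follow the message format shown above):

import asyncio
import json

import websockets

async def watch_game():
    # Connect to the broadcast stream and print score updates.
    async with websockets.connect("ws://127.0.0.1:8000/ws/game-data") as ws:
        async for message in ws:
            state = json.loads(message)
            print(f"Human {state['human_score']} : {state['ai_score']} AI "
                  f"[{state['current_level']}]")

asyncio.run(watch_game())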

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Train the AI (Optional)

If you don't have pre-trained models, train them first:

# Train for 1 million timesteps with 4 parallel environments
python train/ppo.py

# Monitor training with TensorBoard
tensorboard --logdir=./pong_tensorboard/

Training Notes:

  • Takes 30-60 minutes on a modern GPU
  • Checkpoint models are saved every 50,000 steps
  • Monitor training progress via TensorBoard at http://localhost:6006
  • Adjust total_timesteps or num_envs based on your hardware

3. Run the Live Demo

python demo/play.py

In the demo:

  • Control left paddle with W (up) / S (down)
  • Press 1-4 to instantly change AI difficulty
  • Press SPACE to pause/resume
  • Press R to reset the score
  • Close window to exit

4. (Optional) Start the API Server

# Run API server (development mode with auto-reload)
python -m api.app

# Visit http://localhost:8000 for health check
# WebSocket endpoint: ws://localhost:8000/ws/game-data

📊 Training Details

Curriculum & Checkpoints

The training pipeline automatically saves checkpoints:

  • 50k steps: Basic rally pattern learning
  • 100k steps: Paddle positioning and defense
  • 200k steps: Intermediate play with varied strategies
  • 500k steps: Advanced spike and positioning
  • 1M steps: Master-level near-optimal play

Performance Metrics

Track training via TensorBoard:

  • Cumulative Reward: Should trend upward
  • Episode Length: Indicates longer rallies as AI improves
  • Win Rate: Demonstrates AI improvement

Customization

Modifying Game Physics

Edit /engine/pong.py constants:

SCREEN_WIDTH = 1200      # Wider screen
PADDLE_SPEED = 8.0       # Faster paddles
BALL_SPEED = 7.0         # Faster ball
MAX_BALL_SPEED = 10.0    # Higher ceiling

Adjusting Reward Function

Edit /rl/env.py _calculate_reward():

if right_scored:
    reward += 2.0  # More generous scoring reward

reward -= 0.005   # Increase step penalty for faster wins

Changing PPO Hyperparameters

Edit /train/ppo.py:

PPO(
    learning_rate=1e-4,    # Lower for finer control
    n_steps=4096,          # Larger batches
    gamma=0.95,            # Shorter horizon
    clip_range=0.1,        # More conservative updates
)

Custom Opponent AI

Edit /rl/env.py _get_opponent_action() to implement:

  • A trained secondary model (self-play, sketched below)
  • Adaptive difficulty levels
  • Predictive tracking
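
For instance, a self-play variant that drives the left paddle with a frozen earlier checkpoint could look roughly like this (reusing the _normalize_state sketch from the environment section; a proper implementation would also mirror the observation so the frozen agent "sees" the board from its own side):

from stable_baselines3 import PPO

def _get_opponent_action(self):
    # Lazily load a frozen checkpoint the first time it is needed.
    if not hasattr(self, "_opponent_model"):
        self._opponent_model = PPO.load("models/rl_model_200000_steps.zip")
    obs = self._normalize_state(self.engine._get_state())
    action, _ = self._opponent_model.predict(obs, deterministic=True)
    return int(action)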

Project Features & Best Practices

✅ Pure Physics Engine: Completely independent from rendering; reusable for headless training
✅ Proper Collision Detection: AABB + velocity-based response with paddle spin
✅ Normalization: All observations normalized to [-1, 1] for stable training
✅ Vectorized Training: Support for parallel environments to speed up learning
✅ Checkpoint Strategy: Regular model saves for difficulty progression
✅ Clean Architecture: Strict separation of concerns (physics, RL, rendering, API)
✅ Deterministic Inference: Uses deterministic=True for consistent demo behavior
✅ Error Handling: Graceful fallbacks for missing models
✅ Modular Design: Each component can be used independently
✅ Comprehensive Logging: Print statements for debugging and monitoring

File-by-File Summary

File              Purpose                                      Status
engine/pong.py    Physics engine, collision math               ✅ Complete
rl/env.py         Gymnasium wrapper, normalization, rewards    ✅ Complete
train/ppo.py      Training pipeline with callbacks             ✅ Complete
demo/play.py      PyGame frontend, model loading, inference    ✅ Complete
api/app.py        FastAPI WebSocket server                     ✅ Complete
requirements.txt  Dependencies                                 ✅ Complete
README.md         Documentation                                ✅ Complete

Troubleshooting

Issue: "No module named 'engine'" when running demo

# Run from project root, not from demo/ directory
cd PongAI
python demo/play.py

Issue: Model not found when starting demo

  • Train a model first: python train/ppo.py
  • Or download pre-trained models from the releases page

Issue: PyGame window doesn't respond

  • Check that you have a display connection (not headless)
  • Ensure pygame version is 2.5.2+

Issue: Training is very slow

  • Set use_multiprocessing=False in train/ppo.py for debugging
  • Reduce n_steps or increase num_envs for faster learning
  • GPU acceleration is automatic with stable-baselines3

Future Enhancements

Optional extensions not included in this scaffold:

  • Web dashboard (React/Vue frontend consuming WebSocket data)
  • Spectator mode with live streaming
  • Multiple AI opponents simultaneously
  • Transfer learning from easier to harder tasks
  • Imitation learning from human demonstrations
  • Multi-agent self-play tournaments

License

This project is provided as-is for educational and demonstration purposes.

Author Notes

This is a production-ready scaffold with:

  • Complete, functional code (no placeholders)
  • Professional error handling and logging
  • Modular architecture for easy extension
  • Extensive documentation
  • Best practices for RL + game development

All components are independently testable and can be used in other projects.


Last Updated: April 2026
Stable-Baselines3 Version: 2.3.0
Gymnasium Version: 0.30.0
Python Version: 3.10+
