Polygen is a dual-purpose AI system designed for cutting-edge synthetic media generation and robust deepfake detection. Exploring the boundary between reality and artificial intelligence, this project combines advanced diffusion models with state-of-the-art forensic analysis techniques.
Polygen's detection engine utilizes a hybrid approach, combining deep neural networks with low-level signal processing to identify manipulated media.
- Neural Ensembles: Leverages an ensemble of EfficientNet-B4 and Xception models trained on diverse forgery datasets.
- Signal Processing Refinements: Incorporates Fast Fourier Transform (FFT) analysis and Photo Response Non-Uniformity (PRNU) noise extraction to detect subtle anomalies invisible to the human eye.
- Explainable AI (XAI): Generates Grad-CAM heatmaps, visually highlighting the specific facial regions that influenced the model's prediction.
- Media Support: Comprehensive analysis for both static images (JPEG, PNG) and videos (MP4), processing up to 5-crop face extractions for enhanced reliability.
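The FFT refinement above can be illustrated with a toy frequency check: synthetic or tampered imagery often carries disproportionate energy in high spatial frequencies. A minimal NumPy sketch (the function name and the 0.25 cutoff are illustrative, not Polygen's actual implementation):

```python
import numpy as np

def high_freq_energy_ratio(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy outside a central low-frequency square.

    Generative models often leave periodic traces in the high-frequency
    band, so an unusually large ratio can flag a candidate forgery.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    energy = np.abs(spectrum) ** 2
    h, w = energy.shape
    ch, cw = int(h * cutoff), int(w * cutoff)
    # Low frequencies sit at the center after fftshift.
    low = energy[h // 2 - ch : h // 2 + ch, w // 2 - cw : w // 2 + cw].sum()
    total = energy.sum()
    return float((total - low) / (total + 1e-12))
```

In practice such a scalar would be one feature alongside PRNU noise residuals and the neural ensemble's score, not a detector on its own.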
The generative suite is built for speed, quality, and control, utilizing the latest in latent diffusion technologies.
- Text-to-Image (SDXL Turbo): Rapid generation of photorealistic and artistic images from complex prompts using Stability AI's SDXL Turbo.
- Image-to-Image (ControlNet): Structure-preserving transformations. Upload an image and dictate structural rules via Canny edge detection.
- Precision Inpainting: Smart masking tools allowing users to seamlessly insert, replace, or remove elements within existing images using Stable Diffusion Inpainting.
- Real-ESRGAN Upscaling: Integrated tiled 4x upscaling to eliminate generation artifacts and enhance output resolution for ultra-high-definition results.
- Real-time Latent Preview: Watch the image materialize during the sampling steps with integrated visual callbacks.
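Real-time latent previews typically avoid running the full VAE decoder at every sampling step by applying a cheap linear map from the 4-channel latent space to RGB. A minimal NumPy sketch of that idea (the projection coefficients are an illustrative community approximation for Stable Diffusion latents, not necessarily what Polygen uses):

```python
import numpy as np

# Rough linear map from a 4-channel SD latent to RGB. Real previews may
# use a small learned projection; these coefficients are illustrative.
LATENT_TO_RGB = np.array([
    [0.298,  0.207,  0.208],   # latent channel 0
    [0.187,  0.286,  0.173],   # latent channel 1
    [-0.158, 0.189,  0.264],   # latent channel 2
    [-0.184, -0.271, -0.473],  # latent channel 3
])

def latents_to_preview(latents: np.ndarray) -> np.ndarray:
    """Map latents of shape (4, H, W) to a uint8 RGB preview of shape (H, W, 3)."""
    rgb = np.tensordot(latents, LATENT_TO_RGB, axes=([0], [0]))  # (H, W, 3)
    rgb = np.clip((rgb + 1.0) / 2.0, 0.0, 1.0)  # shift roughly into [0, 1]
    return (rgb * 255).astype(np.uint8)
```

A per-step callback in the sampling loop would call this on the intermediate latents and push the resulting thumbnail to the UI.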
- Backend: FastAPI (Python), serving concurrent ML pipelines asynchronously.
- Frontend: Vanilla HTML5, CSS3, JavaScript.
- ML Engine: PyTorch, Diffusers, OpenCV, timm, BasicSR/RealESRGAN.
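Serving concurrent ML pipelines asynchronously generally means keeping blocking inference off the event loop. A minimal asyncio sketch of that pattern (the handler and function names are illustrative; Polygen's actual FastAPI routes will differ):

```python
import asyncio
import time

def run_inference(payload: str) -> str:
    """Stand-in for a blocking ML pipeline call (e.g., a PyTorch forward pass)."""
    time.sleep(0.1)  # simulates GPU/CPU-bound work
    return f"result:{payload}"

async def handle_request(payload: str) -> str:
    # Offload the blocking call to a worker thread so the event loop
    # can keep accepting other requests in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, run_inference, payload)

async def main() -> list:
    # Two concurrent "requests" overlap instead of running back to back.
    return await asyncio.gather(handle_request("a"), handle_request("b"))
```

FastAPI applies the same principle automatically when a route is declared `async` and awaits offloaded work, or when a plain `def` route is run in its thread pool.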
- Python 3.8+
- NVIDIA GPU (highly recommended; optimized for >= 4 GB VRAM) or CPU fallback.
- Clone the repository:
git clone https://github.com/yourusername/polygen.git
cd polygen
- Set up a Virtual Environment:
python -m venv .venv
.\.venv\Scripts\Activate  # Windows
# source .venv/bin/activate  # Mac/Linux
- Install Dependencies:
Note: For GPU acceleration, make sure you have a PyTorch build with CUDA support installed.
pip install -r requirements.txt
- Start the FastAPI Backend: Navigate to the project root and start the server:
python -m backend.main
Note: On the first run, the system automatically downloads the required foundational models (EfficientNet weights, SDXL tokenizers/UNets) and caches them in your local directory.
- Access the Interface: Open your preferred web browser and navigate to:
http://localhost:8000/static/index.html
polygen/
├── backend/ # FastAPI core, API routers (detection, generation, stats)
├── frontend/ # Next-gen UI (HTML/CSS/JS assets)
├── ml_modules/
│ ├── detection/ # Forensics: detector ensembles, Grad-CAM, signal refinements
│ └── generation/ # Generative: SDXL Turbo, ControlNet, Real-ESRGAN
├── models/ # Local checkpoint directory (checkpoints/safetensors)
├── scripts/ # Dataset prep, training, and utilities
└── requirements.txt # Python dependencies
- Implementation of Video-to-Video generative filters (e.g., temporally consistent stylization).
- Real-time webcam manipulation detection.
- Audio deepfake analysis integration.