Stealth WAF-bypass scraping engine with AI-powered structured data extraction.
Turn any website into a structured JSON API, no matter what WAF protects it.
PhantomAPI is a production-grade REST API framework that turns any website into a structured data source, even if that site has no public API and is protected by Cloudflare, DataDome, or a similar WAF layer.
It drives a real, fingerprint-spoofed Chrome browser, cleans the DOM, then feeds the content to GPT-4o, which returns exactly the data you asked for as a clean JSON object. It supports both synchronous (instant JSON return) and asynchronous (webhook delivery) extraction modes.
```
          POST /api/v1/extract
                   │
                   ▼
┌─────────────────────────────────────┐
│        Stealth Chrome Engine        │
│  · undetected-chromedriver          │
│  · Advanced stealth flags           │
│  · Proxy rotation                   │
│  · Exponential backoff retry        │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│      BeautifulSoup DOM Cleaner      │
│  · script / style / svg removed     │
│  · Attribute stripping              │
│  · 12,000-char token guard          │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│            OpenAI GPT-4o            │
│  · json_object response mode        │
│  · Zero-temperature extraction      │
└─────────────────────────────────────┘
                   │
                   ▼
     Clean JSON Response (Sync)
                  OR
      Webhook Delivery (Async)
```
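The DOM-cleaning stage can be pictured with a dependency-free sketch. The project itself uses BeautifulSoup4 + lxml; this stdlib version (`DOMCleaner` and `clean_dom` are illustrative names, not the project's API) only shows the idea: drop `script`/`style`/`svg` subtrees, discard attributes, keep the text, and cap the output at the `MAX_CONTENT_CHARS` guard.

```python
# Simplified, stdlib-only sketch of the DOM cleaner; not the project's code.
from html.parser import HTMLParser

MAX_CONTENT_CHARS = 12_000  # mirrors the MAX_CONTENT_CHARS setting


class DOMCleaner(HTMLParser):
    SKIP = {"script", "style", "svg", "noscript"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):  # attributes are discarded
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_dom(html: str) -> str:
    cleaner = DOMCleaner()
    cleaner.feed(html)
    return " ".join(cleaner.chunks)[:MAX_CONTENT_CHARS]
```

Capping the cleaned text rather than the raw HTML means the token budget is spent on content, not markup.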
| Layer | Technology |
|---|---|
| API | FastAPI + Uvicorn |
| Scraping | undetected-chromedriver + Selenium |
| DOM Parsing | BeautifulSoup4 + lxml |
| AI Engine | OpenAI GPT-4o |
| Validation | Pydantic v2 |
| Rate Limit | SlowAPI + Asyncio Semaphore |
| Retries | Tenacity + exponential backoff |
| Deployment | Docker + Docker Compose |
| Logging | colorlog |
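The retry behaviour that Tenacity provides can be pictured with a hand-rolled equivalent. The constants mirror the `RETRY_ATTEMPTS` / `RETRY_DELAY` settings; `with_retries` is an illustrative name, not the project's API.

```python
# Exponential-backoff retry sketch: RETRY_ATTEMPTS tries, with delays
# doubling from RETRY_DELAY seconds (2s, 4s, 8s, ...).
import time

RETRY_ATTEMPTS = 3
RETRY_DELAY = 2  # seconds


def with_retries(func, attempts=RETRY_ATTEMPTS, base_delay=RETRY_DELAY,
                 sleep=time.sleep):
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_error
```

The `sleep` parameter is injected so the policy can be tested without actually waiting.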
Run with Docker:

```bash
git clone https://github.com/ossiqn/PhantomAPI.git
cd PhantomAPI
cp .env.example .env
docker-compose up -d --build
```

Or run locally:

```bash
git clone https://github.com/ossiqn/PhantomAPI.git
cd PhantomAPI
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
python main.py
```

Synchronous mode returns the extracted JSON directly in the HTTP response.
```bash
curl -X POST "http://localhost:8000/api/v1/extract" \
  -H "Content-Type: application/json" \
  -H "X-OpenAI-Key: sk-..." \
  -d '{
    "url": "https://target-site.com/products",
    "prompt": "Extract all product names and prices as a JSON array."
  }'
```

For asynchronous mode, provide a `webhook_url`. The API immediately returns `202 Accepted` with a `task_id` and processes the extraction in the background. Once complete, the result is POSTed to your webhook.
```bash
curl -X POST "http://localhost:8000/api/v1/extract" \
  -H "Content-Type: application/json" \
  -H "X-OpenAI-Key: sk-..." \
  -d '{
    "url": "https://target-site.com/products",
    "prompt": "Extract all product names and prices as a JSON array.",
    "webhook_url": "https://your-server.com/webhook/receive"
  }'
```

| Field | Type | Required | Description |
|---|---|---|---|
| `url` | string | yes | Full URL of the target page |
| `prompt` | string | yes | What data to extract and how to structure it |
| `wait_for_selector` | string | no | CSS selector to wait for before capturing the DOM |
| `javascript` | string | no | Custom JS to execute after page load (max 2,000 chars) |
| `webhook_url` | string | no | Target URL to receive the async extraction result |

| Header | Required | Description |
|---|---|---|
| `X-OpenAI-Key` | yes | Your OpenAI API key |
```json
{
  "success": true,
  "url": "https://target-site.com/products",
  "extracted_data": {
    "products": [
      { "name": "Product A", "price": "$19.99" },
      { "name": "Product B", "price": "$34.99" }
    ]
  },
  "tokens_used": 812,
  "proxy_used": "http://1.2.3.4:8080",
  "elapsed_ms": 7430.21
}
```

Create a `proxies.txt` file in the project root:

```
# Lines starting with # are ignored
http://user:pass@1.2.3.4:8080
socks5://9.10.11.12:1080
http://5.6.7.8:3128
```

- Proxies are selected randomly on each request.
- Bad proxies are automatically removed from the rotation pool on failure.
- If the file does not exist, PhantomAPI runs on your direct IP without interruption.
| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/extract` | Run extraction |
| GET | `/api/v1/health` | Engine health check |
| GET | `/docs` | Swagger UI |
| GET | `/redoc` | ReDoc UI |
| Variable | Default | Description |
|---|---|---|
| `APP_HOST` | `0.0.0.0` | Server bind host |
| `APP_PORT` | `8000` | Server port |
| `APP_ENV` | `production` | Environment label |
| `PAGE_LOAD_TIMEOUT` | `30` | Seconds before browser timeout |
| `RETRY_ATTEMPTS` | `3` | Max browser retry count |
| `RETRY_DELAY` | `2` | Base delay between retries, in seconds |
| `MAX_CONTENT_CHARS` | `12000` | Max characters forwarded to OpenAI |
| `PROXY_FILE_PATH` | `proxies.txt` | Path to the proxy list file |
| `RATE_LIMIT_PER_MINUTE` | `30` | Max requests per minute per IP |
| `MAX_CONCURRENT_TASKS` | `5` | Max simultaneous browser instances (queue cap) |
| `ADVANCED_STEALTH_MODE` | `true` | Enable aggressive WAF-bypass Chrome flags |
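The project validates settings with Pydantic v2; this dependency-free sketch (`load_settings` is an illustrative name) only shows how the variables in the table map onto typed Python values.

```python
# Environment-variable loading sketch with the defaults from the table.
import os


def load_settings(env=None):
    env = os.environ if env is None else env
    return {
        "APP_HOST": env.get("APP_HOST", "0.0.0.0"),
        "APP_PORT": int(env.get("APP_PORT", "8000")),
        "APP_ENV": env.get("APP_ENV", "production"),
        "PAGE_LOAD_TIMEOUT": int(env.get("PAGE_LOAD_TIMEOUT", "30")),
        "RETRY_ATTEMPTS": int(env.get("RETRY_ATTEMPTS", "3")),
        "RETRY_DELAY": int(env.get("RETRY_DELAY", "2")),
        "MAX_CONTENT_CHARS": int(env.get("MAX_CONTENT_CHARS", "12000")),
        "PROXY_FILE_PATH": env.get("PROXY_FILE_PATH", "proxies.txt"),
        "RATE_LIMIT_PER_MINUTE": int(env.get("RATE_LIMIT_PER_MINUTE", "30")),
        "MAX_CONCURRENT_TASKS": int(env.get("MAX_CONCURRENT_TASKS", "5")),
        # Booleans arrive as strings from the environment.
        "ADVANCED_STEALTH_MODE": env.get("ADVANCED_STEALTH_MODE", "true").lower() == "true",
    }
```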
| Status | Meaning |
|---|---|
| `401` | Missing or invalid `X-OpenAI-Key` header |
| `408` | Target page timed out after all retry attempts |
| `422` | Validation error or empty page content |
| `429` | Rate limit exceeded |
| `500` | Unexpected internal error |
| `503` | WAF bypass failed, OpenAI unreachable, or server full |
```
PhantomAPI/
├── main.py
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .env.example
├── .gitignore
└── src/
    ├── api/
    │   ├── routes.py
    │   └── middleware.py
    ├── core/
    │   ├── config.py
    │   ├── schemas.py
    │   └── exceptions.py
    ├── services/
    │   ├── scraper.py
    │   └── ai_parser.py
    └── utils/
        ├── proxy_manager.py
        ├── rate_limiter.py
        └── logger.py
```
- API keys are never stored, logged, or hardcoded; they are passed per-request via header only.
- Rate limiting is enforced per IP via SlowAPI.
- A smart queue (asyncio semaphore) prevents server overload by capping concurrent Chrome instances.
- Custom JavaScript input is capped at 2,000 characters to prevent abuse.
- All exception traces stay server-side; clients receive sanitized error messages.
This project is licensed under the MIT License. See the LICENSE file for details.
| Platform | Link |
|---|---|
| Telegram | t.me/ossiqn |
| Telegram Archive | t.me/ossiqnarsiv |
| Website | ossiqn.com.tr |
| Instagram | instagram.com/ossiqnstwo |
| Forum | blueshield.com.tr |
Built by Ossiqn. PhantomAPI is intended for legal use only; always ensure you have permission to scrape a target website.
