ACCIDENT

Code and assets for the ACCIDENT dataset paper, including baseline methods and the CARLA-based synthetic data generation pipeline.

If you are new to the repo, start with the repository map and the quick start below.

Repository map

docs/ - onboarding and repository navigation docs
baselines/ - all baseline implementations, split into heuristic and LLM/VLM families
generation/ - dataset generation pipelines, currently including CARLA-based synthesis
dataset/ - local dataset cache plus the normalized real_videos/ layout used by the baselines

Quick start

To reproduce the paper baselines, download the dataset first, then run one of the baseline families below.

Download the dataset

First create a virtual environment and install the download requirements, which provide the Kaggle CLI:

uv venv .venv
source .venv/bin/activate
uv pip install -r dataset/requirements.txt

Then download the dataset:

bash dataset/download_dataset.sh

The script downloads the Kaggle dataset picekl/accident and prepares dataset/real_videos/ for the baseline code.

The Kaggle CLI must be authenticated before you run the script. See dataset/README.md for setup.

Heuristic baselines

These are the easiest entry point because they already expose command-line interfaces.

cd baselines/heuristic
uv sync
python naive.py
python optical_flow.py --take 5
python bbox_dynamics.py --take 5

See baselines/heuristic/README.md for details.

LLM / VLM baselines

These experiments mix one tracked script and several notebooks:

cd baselines/llm
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python baselines/temporal/main.py --model qwen --dataset-path ../../dataset --range 0:10

See baselines/llm/README.md for the full execution order.

Synthetic data generation

Use the CARLA project when you want to generate or extend synthetic accident data rather than run baselines:

cd generation/carla-simulation
docker compose up --build

See generation/carla-simulation/README.md for requirements and workflow details.

Dataset layout used by the baselines

The heuristic baseline scripts and the LLM temporal script now both accept an explicit dataset root. The expected layout is:

dataset/
  real_videos/
    labels.csv
    test_metadata.csv
    videos/
      ...
  synthetic_videos/
    ...

If your dataset lives elsewhere, pass --dataset-path /path/to/real_videos to the supported scripts. Passing --dataset-path /path/to/dataset also works as long as that directory contains real_videos/.
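The two accepted forms of --dataset-path can be handled with one small resolution step. The following is a minimal sketch of that idea, not the code used in the repo; the function name resolve_real_videos is hypothetical, and the only assumption taken from the layout above is that the dataset root contains a real_videos/ directory.

```python
from pathlib import Path


def resolve_real_videos(dataset_path: str) -> Path:
    """Return the directory expected to hold labels.csv, test_metadata.csv and videos/.

    Hypothetical helper: illustrates how both /path/to/dataset and
    /path/to/real_videos could map to the same directory.
    """
    root = Path(dataset_path).expanduser()
    nested = root / "real_videos"
    # Case 1: the caller passed the dataset root, which contains real_videos/.
    if nested.is_dir():
        return nested
    # Case 2: the caller passed the real_videos directory itself.
    if root.is_dir():
        return root
    raise FileNotFoundError(f"No dataset directory found at {root}")
```

Checking for the nested real_videos/ directory first means a dataset root is never mistaken for the video directory itself.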

Recommendations for new users

Start with baselines/heuristic/naive.py to verify that your labels and metadata are readable.
Use --take 5 on heavier baselines before launching full runs.
Keep generated outputs inside each subproject's local output folders so reruns stay reproducible.
Treat notebooks as analysis companions; use the tracked scripts when you want a repeatable paper run.
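If you want a readability check that does not depend on any baseline code, the first recommendation can be approximated with the standard library. This is a rough sketch under stated assumptions, not what naive.py does: the helper names preview_csv and check_real_videos are hypothetical, the files are assumed to be UTF-8 CSVs with a header row, and no particular column names are assumed.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path: Path, take: int = 5) -> tuple[list[str], list[dict[str, str]]]:
    """Read the header and at most `take` data rows from a CSV file."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, take))
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = list(reader.fieldnames or [])
    return columns, rows


def check_real_videos(real_videos: Path, take: int = 5) -> dict[str, int]:
    """Return the number of previewed rows per file.

    Raises FileNotFoundError if a file is missing and ValueError if it has no header.
    """
    counts = {}
    for name in ("labels.csv", "test_metadata.csv"):
        columns, rows = preview_csv(real_videos / name, take)
        if not columns:
            raise ValueError(f"{name} has no header row")
        counts[name] = len(rows)
    return counts
```

Calling check_real_videos(Path("dataset/real_videos")) after the download step reads only the first few rows of each file, in the same spirit as the --take 5 recommendation.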
