This repository hosts the official implementation of ReImagine, a framework for controllable high-quality human video generation via image-first synthesis. For more context, see the paper on arXiv and the project website.
- April 23 2026: Updated the Image-First Synthesis demo.
- April 22 2026: Initial repository launch.
Stay tuned for further updates!
We develop and test with Python 3.10, PyTorch 2.4.1, and CUDA 12.4. Install the CUDA 12.4 PyTorch wheels, then install this package in editable mode:
```bash
conda create -n reimagine python=3.10
conda activate reimagine
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
pip install -e .
```
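A quick sanity check that the installed wheels match your driver (plain PyTorch, nothing repo-specific):

```python
import torch

# Expect 2.4.1, 12.4, and True on a correctly configured machine.
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```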
Download checkpoints with the Hugging Face Hub CLI (`hf download`, or `huggingface-cli download` on older installs). For FLUX.1-Kontext-dev, you can skip the monolithic `flux1-kontext-dev.safetensors` and the `vae/` tree:

```bash
hf download black-forest-labs/FLUX.1-Kontext-dev \
  --local-dir ./models/FLUX.1-Kontext-dev \
  --exclude "flux1-kontext-dev.safetensors" \
  --exclude "vae/**"
```

For ControlNet:
```bash
hf download jasperai/Flux.1-dev-Controlnet-Surface-Normals \
  --local-dir ./models/Flux.1-dev-Controlnet-Surface-Normals
```
ReImagine LoRA weight files are hosted on Hugging Face at taited/ReImagine-Pretrained.

| SMPL-X Params | Input Type | File | Status |
|---|---|---|---|
| w/o | Canonical human (front & back views) | kontext-wo_smplx-lora.safetensors | Available |
| w/o | Disentangled assets (face, clothes, shoes) | TBA | Planned |
Download: Use the same Hugging Face CLI as for the base models:
```bash
hf download taited/ReImagine-Pretrained --local-dir ./models/ReImagine-Pretrained
```

Once you have prepared the pretrained weights, use `inference_img.py` to infer each frame. The script requires two image inputs: a wide reference image (left = front, right = back) and a normal map. The normal map is rendered from the SMPL-X body in its global coordinate system, using the camera parameters.
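For reference, the wide image is simply the front and back views placed side by side. A minimal sketch with Pillow (the file names here are placeholders, not paths the repo ships):

```python
from PIL import Image

# Build the wide reference image expected by inference_img.py:
# front view on the left, back view on the right.
front = Image.open("front.png").convert("RGB")
back = Image.open("back.png").convert("RGB")

# Match heights before concatenating.
h = min(front.height, back.height)
front = front.resize((front.width * h // front.height, h))
back = back.resize((back.width * h // back.height, h))

wide = Image.new("RGB", (front.width + back.width, h))
wide.paste(front, (0, 0))
wide.paste(back, (front.width, 0))
wide.save("reference_wide.png")
```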
For more details on the usage of `inference_img.py`, check the full guide and example.
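As orientation, a purely hypothetical invocation showing the shape of a single-frame call (all flag names below are assumptions, not the script's actual interface; see the guide or `python inference_img.py --help` for the real one):

```bash
# Hypothetical flags, for illustration only -- consult the script's --help.
python inference_img.py \
  --reference ./inputs/reference_wide.png \
  --normal ./inputs/normal_0001.png \
  --lora ./models/ReImagine-Pretrained/kontext-wo_smplx-lora.safetensors \
  --output ./outputs/frame_0001.png
```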
The code for Temporal-Refinement Video Synthesis is currently being organized for open-source release. Once available, it will allow inference on video data with temporal refinement.
Stay tuned for updates!
- Code for Image-First Synthesis inference (`inference_img.py`)
- Pretrained LoRA weights (available for download)
- Documentation and usage instructions for basic inference
- Code for Temporal-Refinement Video Synthesis
- Pretrained model weights for Disentangled assets (face, clothes, shoes)
- Full dataset release
We are actively organizing and updating the repository. Updates will be added here as each item becomes available.
This repository’s implementation is based on DiffSynth Studio (ModelScope). We thank the authors and maintainers for releasing their work. The upstream project is licensed under the Apache License 2.0.
We also thank the teams behind FLUX.1-Kontext-dev and Flux.1-dev-Controlnet-Surface-Normals for the open-source releases this project builds on.
If you find this project useful, please consider citing our paper:
```bibtex
@article{sun2025rethinking,
  title={ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis},
  author={Sun, Zhengwentai and Zheng, Keru and Li, Chenghong and Liao, Hongjie and Yang, Xihe and Li, Heyuan and Zhi, Yihao and Ning, Shuliang and Cui, Shuguang and Han, Xiaoguang},
  journal={arXiv preprint arXiv:2604.19720},
  year={2026},
  url={https://arxiv.org/abs/2604.19720v1}
}
```