llm-d

llm-d enables high-performance distributed inference in production on Kubernetes

Welcome to llm-d: a Kubernetes-native high-performance distributed LLM inference framework


Built on vLLM, Kubernetes, and the Inference Gateway, llm-d offers modular solutions for distributed inference, including KV-cache-aware routing and disaggregated (prefill/decode) serving, with the aim of fast time-to-value and competitive performance per dollar.
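
To make "KV-cache-aware routing" concrete: the scheduler steers each request toward the replica that already holds the longest matching prefix of the prompt in its KV cache, so prefill work can be skipped. The sketch below is purely illustrative, with hypothetical names and arbitrary weights; llm-d's actual scheduler is implemented as an Inference Gateway extension and weighs many more signals.

```python
# Illustrative sketch of KV-cache-aware routing (not llm-d's real code).
# Idea: prefer the replica whose KV cache already covers the longest
# prefix of the incoming prompt, balanced against its current load.
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    queue_depth: int                          # outstanding requests
    cached_blocks: set[str] = field(default_factory=set)  # prefix-block hashes


def block_hashes(prompt: str, block_size: int = 16) -> list[str]:
    """Hash growing prefixes in fixed-size steps, mirroring how vLLM
    tracks its KV cache in per-block units keyed by prefix."""
    return [str(hash(prompt[:end]))
            for end in range(block_size, len(prompt) + 1, block_size)]


def score(replica: Replica, prompt: str) -> float:
    blocks = block_hashes(prompt)
    hits = 0
    for h in blocks:                          # only consecutive prefix hits count
        if h not in replica.cached_blocks:
            break
        hits += 1
    cache_score = hits / max(len(blocks), 1)  # fraction of prompt already cached
    load_penalty = 0.1 * replica.queue_depth  # arbitrary weight for this sketch
    return cache_score - load_penalty


def route(replicas: list[Replica], prompt: str) -> Replica:
    """Pick the replica with the best cache-hit/load trade-off."""
    return max(replicas, key=lambda r: score(r, prompt))


pods = [Replica("pod-0", queue_depth=2), Replica("pod-1", queue_depth=0)]
pods[0].cached_blocks.update(block_hashes("You are a helpful assistant. "))
print(route(pods, "You are a helpful assistant. What is llm-d?").name)  # pod-0
```

A production scorer also accounts for session affinity and prefill/decode roles, but the prefix-matching intuition carries over.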

🚀 Quick Start Guide

New to llm-d? Here's how to get started:

  1. Join our Slack 💬 - Get your invite and visit llm-d.slack.com
  2. Explore our code 📂 - Browse the GitHub Organization
  3. Join a meeting 📅 - Add the community calendar
  4. Pick your area 🎯 - Browse the Special Interest Groups

📚 Key Resources

💬 Communication Channels

🗓️ Regular Meetings

All meetings are open to the public! 🌟

  • 📅 Weekly Standup: Every Wednesday at 12:30pm ET - Project updates and open discussion
  • 🎯 SIG Meetings: Various times throughout the week - See SIG details for schedules

Join to participate, ask questions, or just listen and learn!

🎯 Special Interest Groups (SIGs)

Want to dive deeper into specific areas? Our Special Interest Groups are focused teams working on different aspects of llm-d:

  • Inference Scheduler - Intelligent request routing and load balancing
  • Benchmarking - Performance testing and optimization
  • PD-Disaggregation - Prefill/decode separation patterns (see the sketch below)
  • KV-Disaggregation - KV caching and distributed storage
  • Installation - Kubernetes integration and deployment
  • Autoscaling - Traffic-aware autoscaling and resource management
  • Observability - Monitoring, logging, and metrics

View more SIG Details →
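
As a taste of what the PD-Disaggregation SIG works on, the sketch below splits serving into its two phases: a compute-bound prefill step that builds the KV cache for the whole prompt, and a memory-bandwidth-bound decode step that generates tokens one at a time. Separating them lets each pool be scaled and provisioned independently. Everything here is hypothetical stand-in code; in llm-d the phases run in separate vLLM pods and KV-cache blocks are transferred between them, not Python objects.

```python
# Conceptual sketch of disaggregated (prefill/decode) serving: a hypothetical,
# single-process stand-in for what really happens across separate vLLM pods.

def prefill(prompt_tokens: list[int]) -> dict:
    """Compute-bound phase: one forward pass over the full prompt
    produces the KV cache. Runs on a dedicated prefill pod."""
    return {"kv": f"<cache for {len(prompt_tokens)} prompt tokens>"}


def decode(kv_cache: dict, max_new_tokens: int) -> list[int]:
    """Bandwidth-bound phase: autoregressive generation, one token per
    step, reusing and extending the transferred KV cache."""
    generated: list[int] = []
    for step in range(max_new_tokens):
        next_token = step  # stand-in for a real model forward pass
        generated.append(next_token)
    return generated


# The router sends the prompt to a prefill worker first; then the KV
# cache (not the raw prompt) moves to a decode worker for generation.
cache = prefill(prompt_tokens=[101, 2023, 2003, 102])
tokens = decode(cache, max_new_tokens=8)
print(len(tokens), "tokens generated")
```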

🤝 How to Contribute

Getting Involved

Contributing Code

  1. Read Guidelines: Review our Code of Conduct and contribution process
  2. Sign Commits: All commits require DCO sign-off; run git commit -s to add the Signed-off-by trailer automatically

Ways to Contribute

  • 🐛 Bug fixes and small features - Submit PRs directly to component repos
  • 🚀 New features with APIs - Submit a project proposal first
  • 📚 Documentation - Help improve guides and examples
  • 🧪 Testing & Benchmarking - Contribute to our test coverage
  • 💡 Experimental features - Start in llm-d-incubation org

🔒 Security & Safety

🌐 Connect With Us

Follow llm-d for updates, discussions, and community highlights on Slack, X, Bluesky, LinkedIn, Reddit, and YouTube.

❓ Need Help?

Questions? Ideas? Just want to chat? We're here to help! The llm-d community team is friendly and responsive.


License: Apache 2.0

Pinned Repositories

  1. llm-d (Shell) - Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
  2. llm-d-inference-scheduler (Go) - Inference scheduler for llm-d
  3. llm-d-kv-cache (Go) - Distributed KV cache scheduling & offloading libraries
  4. llm-d-benchmark (Python) - llm-d benchmark scripts and tooling

Repositories

The organization hosts 18 public repositories. Beyond the pinned projects above, they include:
  • llm-d-workload-variant-autoscaler (Go) - Variant optimization autoscaler for distributed inference workloads
  • llm-d-prism (JavaScript) - Performance analysis for distributed inference systems
  • llm-d.github.io (JavaScript) - Website for llm-d; builds the site published at llm-d.ai
  • llm-d-inference-sim (Go) - A lightweight, configurable, real-time simulator that mimics the behavior of vLLM without GPUs or running actual heavy models
  • llm-d-python-template (Makefile) - Python project template for llm-d repos; use "Use this template" to create a new Python project with standard CI, linting, Prow, and governance
  • llm-d-inference-payload-processor (Makefile) - Inference payload processor for llm-d
