Open-source AI-native exercise engine for learning by invention.
Traditional education tests memory. LearnForge tests what actually matters:
| Capability | Exercise type |
|---|---|
| Recall | MCQ |
| Understanding | Fill in the blank (AI semantic) |
| Communication | Writeup (AI rubric scoring) |
| Programming ability | Code (deterministic pytest) |
Learners solve exercises on their own machine. No cloud dependency. No accounts. You bring your own AI key—or run Ollama locally for free.
Most AI-powered education tools do one of two things: generate content or grade essays. LearnForge is different: it treats exercises as data and evaluation as a pipeline.
- Exercise authors write YAML. The schema enforces what a good exercise looks like (a sketch of the idea follows this list).
- MCQ grading is deterministic. No AI involved, no hallucinated "correct" answers.
- Code grading is deterministic. pytest decides pass/fail; AI only explains failures.
- Writeup/fill-blank grading uses AI with an explicit rubric, not vibes.
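Concretely, "exercises as data" means every YAML file is parsed into a typed model before any evaluation runs. A minimal sketch of the idea, using a simplified MCQ model (the real models live in `learnforge/models/` and are richer):

```python
from pathlib import Path

import yaml
from pydantic import BaseModel, ValidationInfo, field_validator

class MCQExercise(BaseModel):
    """Simplified sketch; LearnForge's actual models live in learnforge/models/."""
    id: str
    type: str
    title: str
    prompt: str
    options: dict[str, str]
    answer: str

    @field_validator("answer")
    @classmethod
    def answer_must_be_an_option(cls, v: str, info: ValidationInfo) -> str:
        # Reject exercises whose answer key doesn't point at a real option.
        if v not in info.data.get("options", {}):
            raise ValueError(f"answer {v!r} is not one of the options")
        return v

raw = yaml.safe_load(Path("exercises/machine_learning/001_mcq.yaml").read_text())
exercise = MCQExercise.model_validate(raw)  # clear ValidationError if anything is wrong
```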
Requirements: Python 3.12+, pip
```bash
git clone https://github.com/your-org/learnforge
cd learnforge
pip install -e .
```

Install with your preferred AI provider:

```bash
pip install -e ".[anthropic]"      # Claude
pip install -e ".[openai]"         # GPT
pip install -e ".[google]"         # Gemini
pip install -e ".[all-providers]"  # all of the above
# Ollama needs no extra package — just install Ollama locally
```

Copy the example and fill in your key:

```bash
cp .env.example .env
```

Then edit `.env`:
```ini
# Pick one (or set LEARNFORGE_PROVIDER to force a specific one)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

# Or use Ollama locally (no key needed):
# LEARNFORGE_PROVIDER=ollama
# OLLAMA_BASE_URL=http://localhost:11434
# OLLAMA_MODEL=llama3
```

LearnForge auto-detects which provider to use from whichever key is present. Set `LEARNFORGE_PROVIDER` if you want to force a specific one.
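Detection can be as simple as checking the environment in a fixed priority order and falling back to Ollama. A minimal sketch of the idea (the actual logic lives in `learnforge/providers/factory.py`; the priority order here is an assumption):

```python
import os

# Sketch only: real detection lives in learnforge/providers/factory.py.
# The priority order below is an assumption, not LearnForge's documented order.
_PRIORITY = [
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("google", "GEMINI_API_KEY"),
]

def detect_provider() -> str:
    forced = os.environ.get("LEARNFORGE_PROVIDER")
    if forced:
        return forced  # explicit override always wins
    for name, env_var in _PRIORITY:
        if os.environ.get(env_var):
            return name  # first key found decides the provider
    return "ollama"  # no key set: assume a local Ollama server
```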
Run individual exercises:

```bash
learnforge run exercises/machine_learning/001_mcq.yaml
learnforge run exercises/machine_learning/002_fill_blank.yaml
learnforge run exercises/machine_learning/003_writeup.yaml
learnforge run exercises/machine_learning/004_code.yaml --solution my_solution.py
```

Or start an interactive session:

```bash
learnforge start
learnforge start --dir exercises/machine_learning
```

Test a code solution directly:

```bash
learnforge test exercises/machine_learning/004_code.yaml my_solution.py
```

The `test` command runs hidden pytest cases against your solution file and reports which tests passed or failed. AI is used to explain failures—never to decide correctness.

Add `--verbose` for more detail:

```bash
learnforge --verbose run exercises/machine_learning/001_mcq.yaml
```

An MCQ exercise is a plain YAML file:

```yaml
id: ml_001
type: mcq
title: Gradient Descent Objective
difficulty: beginner # beginner | intermediate | advanced
concept_tags: [gradient_descent, optimization]
prompt: "What does gradient descent minimize?"
options:
  A: Number of training rows
  B: Loss function
  C: Learning rate
  D: Accuracy
answer: B
```

Evaluation: exact letter match. No AI required.
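The deterministic check is small enough to audit at a glance. A sketch of what it amounts to (`grade_mcq` is an illustrative name, not LearnForge's actual function):

```python
def grade_mcq(submitted: str, answer: str) -> bool:
    """Deterministic MCQ grading: normalize both letters and compare."""
    return submitted.strip().upper() == answer.strip().upper()

assert grade_mcq(" b ", "B")    # whitespace and case are forgiven
assert not grade_mcq("C", "B")  # anything else is simply wrong
```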
A fill-in-the-blank exercise adds a rubric of acceptable answers:

```yaml
id: ml_002
type: fill_blank
title: Gradient Descent Target
prompt: "Gradient descent minimizes the ______."
rubric:
  acceptable_patterns:
    - loss function
    - cost function
    - objective function
  semantic_check: true  # false = keyword-only match
  passing_threshold: 0.70
```

Evaluation: AI checks whether the answer is semantically equivalent to any acceptable pattern.
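The semantic check fits naturally into the single-method provider interface described below. A sketch, assuming a yes/no prompt (the wording is illustrative; LearnForge's actual evaluator prompt may differ):

```python
from learnforge.providers.base import AIProvider

def semantically_matches(provider: AIProvider, answer: str, patterns: list[str]) -> bool:
    # Illustrative prompt only; the real evaluator's wording and parsing may differ.
    prompt = (
        f"Reference answers: {', '.join(patterns)}\n"
        f"Student answer: {answer}\n"
        "Does the student answer mean the same as any reference answer? "
        "Reply with exactly YES or NO."
    )
    reply = provider.complete(prompt, system="You judge semantic equivalence of short answers.")
    return reply.strip().upper().startswith("YES")
```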
A writeup exercise is scored against an explicit rubric:

```yaml
id: ml_003
type: writeup
title: Explain Overfitting
prompt: "Explain overfitting in your own words."
rubric:
  must_include:
    - training performance
    - generalization
    - test data
  scoring_dimensions: [correctness, depth, clarity]
  min_words: 40
  passing_threshold: 0.60
```

Evaluation: AI scores each dimension 0–1, averages to an overall score, and returns dimension-specific feedback.
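Once the AI has produced per-dimension scores, aggregation is plain arithmetic. A sketch with hypothetical names:

```python
def overall_score(dimension_scores: dict[str, float]) -> float:
    """Average per-dimension scores (each in 0-1) into one overall score."""
    return sum(dimension_scores.values()) / len(dimension_scores)

scores = {"correctness": 0.9, "depth": 0.6, "clarity": 0.75}
assert abs(overall_score(scores) - 0.75) < 1e-9  # clears the 0.60 passing_threshold above
```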
A code exercise pairs starter code with a hidden pytest file:

```yaml
id: ml_004
type: code
title: Implement Binary Search
prompt: |
  Implement binary_search(arr: list, target: int) -> int.
  Return the index or -1 if not found. O(log n) required.
starter_code: |
  def binary_search(arr: list, target: int) -> int:
      pass
rubric:
  test_file: tests/test_004_binary_search.py  # relative to this YAML's dir
  timeout_seconds: 10
  passing_threshold: 1.0  # all tests must pass
```

Evaluation: pytest runs the hidden test file against the learner's code in an isolated temp directory. AI does not decide correctness—ever.
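The isolation can be as small as a temp directory, a copy of the learner's file saved as `solution.py`, and a pytest subprocess with a timeout. A minimal sketch of that approach (the real runner lives in `learnforge/runners/` and may differ in detail):

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

def run_hidden_tests(solution: Path, test_file: Path, timeout: int = 10) -> bool:
    """Run the hidden pytest file against a learner's code in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        # Tests import `from solution import ...`, so the learner's file becomes solution.py.
        shutil.copy(solution, tmp_dir / "solution.py")
        shutil.copy(test_file, tmp_dir / test_file.name)
        result = subprocess.run(
            [sys.executable, "-m", "pytest", test_file.name, "-q"],
            cwd=tmp_dir,
            capture_output=True,
            text=True,
            timeout=timeout,  # raises TimeoutExpired if the solution hangs
        )
    # pytest's exit code is the only thing that decides pass/fail.
    return result.returncode == 0
```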
- Create a YAML file anywhere under `exercises/`.
- For code exercises, create a `tests/` subdirectory next to the YAML and write a pytest file that imports from `solution` (e.g. `from solution import my_func`); see the sketch after these steps.
- Run it:

```bash
learnforge run exercises/my_topic/my_exercise.yaml
```

The YAML schema is enforced by Pydantic — you'll get a clear error if anything is missing or wrong.
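As a concrete (illustrative) example, the hidden test file for the binary-search exercise above might look like this; the file actually shipped with LearnForge may differ:

```python
# exercises/machine_learning/tests/test_004_binary_search.py (illustrative sketch)
from solution import binary_search  # the learner's file is exposed as solution.py

def test_found():
    assert binary_search([1, 3, 5, 7, 9], 7) == 3

def test_not_found():
    assert binary_search([1, 3, 5, 7, 9], 4) == -1

def test_empty():
    assert binary_search([], 1) == -1
```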
- Create `learnforge/providers/my_provider.py`:

```python
from learnforge.providers.base import AIProvider

class MyProvider(AIProvider):
    @property
    def name(self) -> str:
        return "myprovider/model-name"

    def complete(self, prompt: str, system: str = "") -> str:
        # call your API, return a string
        ...
```

- Register it in `learnforge/providers/factory.py` by adding an `if _try("myprovider"):` branch.
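That branch might look roughly like the following; this is a guess at the shape, since `_try` and the surrounding factory code are defined in `factory.py` and may differ:

```python
# In learnforge/providers/factory.py -- illustrative sketch, not the actual code.
if _try("myprovider"):
    from learnforge.providers.my_provider import MyProvider
    return MyProvider()
```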
```
learnforge/
├── exercises/
│ └── machine_learning/
│ ├── 001_mcq.yaml
│ ├── 002_fill_blank.yaml
│ ├── 003_writeup.yaml
│ ├── 004_code.yaml
│ └── tests/
│ └── test_004_binary_search.py
│
├── tests/ ← LearnForge's own test suite
│ ├── test_schemas.py
│ ├── test_evaluators.py
│ ├── test_loader.py
│ └── test_code_runner.py
│
├── learnforge/
│ ├── cli/ ← typer commands, rich display
│ ├── core/ ← YAML loader, exercise discovery
│ ├── models/ ← Pydantic exercise models
│ ├── evaluators/ ← MCQ, FillBlank, Writeup, Code evaluators
│ ├── providers/ ← Anthropic, OpenAI, Gemini, Ollama
│ ├── runners/ ← subprocess code runner
│ ├── schemas/ ← EvaluationResult schema
│ └── utils/ ← config (pydantic-settings), logging
│
├── .env.example
├── pyproject.toml
└── LICENSE
```
Run the test suite:

```bash
pip install -e ".[dev]"
pytest
pytest --cov=learnforge --cov-report=term-missing  # with coverage
```

**AI decides meaning, tests decide correctness.** Code exercise grading is 100% deterministic. The AI is used only to explain failures to the learner in plain English—it cannot change the grade.
**Exercises are data, not code.** Authors write YAML. Pydantic validates the schema at load time. This keeps exercise authoring accessible and makes the format easy to parse, version, and share.
**Provider-agnostic from day one.** The `AIProvider` interface has a single method: `complete(prompt, system) -> str`. Adding a new provider means adding one file.
**No cloud dependency.** Everything runs locally. Ollama support means you can evaluate semantically without sending any data to external services.
MIT — see LICENSE.