LearnForge

Open-source AI-native exercise engine for learning by invention.

Traditional education tests memory. LearnForge tests what actually matters:

Capability            Exercise type
Recall                MCQ
Understanding         Fill in the blank (AI semantic)
Communication         Writeup (AI rubric scoring)
Programming ability   Code (deterministic pytest)

Learners solve exercises on their own machine. No cloud dependency. No accounts. You bring your own AI key—or run Ollama locally for free.


Why it exists

Most AI-powered education tools do one of two things: generate content or grade essays. LearnForge is different: it treats exercises as data and evaluation as a pipeline.

  • Exercise authors write YAML. The schema enforces what a good exercise looks like.
  • MCQ grading is deterministic. No AI involved, no hallucinated "correct" answers.
  • Code grading is deterministic. pytest decides pass/fail; AI only explains failures.
  • Writeup/fill-blank grading uses AI with an explicit rubric, not vibes.

Installation

Requirements: Python 3.12+, pip

git clone https://github.com/cloudxlab/learnforge
cd learnforge
pip install -e .

Install with your preferred AI provider:

pip install -e ".[anthropic]"   # Claude
pip install -e ".[openai]"      # GPT
pip install -e ".[google]"      # Gemini
pip install -e ".[all-providers]"  # all of the above
# Ollama needs no extra package — just install Ollama locally

Setting API keys

Copy the example and fill in your key:

cp .env.example .env

Then edit .env:

# Pick one (or set LEARNFORGE_PROVIDER to force a specific one)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

# Or use Ollama locally (no key needed):
# LEARNFORGE_PROVIDER=ollama
# OLLAMA_BASE_URL=http://localhost:11434
# OLLAMA_MODEL=llama3

LearnForge auto-detects the provider based on which key is present. Set LEARNFORGE_PROVIDER if you want to force a specific one.
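
Roughly, provider selection works like this (an illustrative sketch; the actual detection order lives in learnforge/providers/factory.py and may differ):

import os

def pick_provider() -> str:
    # An explicit override always wins.
    forced = os.getenv("LEARNFORGE_PROVIDER")
    if forced:
        return forced
    # Otherwise fall back to whichever API key is present (the order shown here is illustrative).
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic"
    if os.getenv("OPENAI_API_KEY"):
        return "openai"
    if os.getenv("GEMINI_API_KEY"):
        return "google"
    return "ollama"  # no key needed; assumes a local Ollama server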


Running exercises

Run any exercise interactively

learnforge run exercises/machine_learning/001_mcq.yaml
learnforge run exercises/machine_learning/002_fill_blank.yaml
learnforge run exercises/machine_learning/003_writeup.yaml
learnforge run exercises/machine_learning/004_code.yaml --solution my_solution.py

Browse and pick from all exercises

learnforge start
learnforge start --dir exercises/machine_learning

Test a code exercise solution

learnforge test exercises/machine_learning/004_code.yaml my_solution.py

The test command runs hidden pytest cases against your solution file and reports which tests passed or failed. AI is used to explain failures—never to decide correctness.

Enable debug logging

learnforge --verbose run exercises/machine_learning/001_mcq.yaml

Exercise types

MCQ — multiple choice

id: ml_001
type: mcq
title: Gradient Descent Objective
difficulty: beginner          # beginner | intermediate | advanced
concept_tags: [gradient_descent, optimization]
prompt: "What does gradient descent minimize?"
options:
  A: Number of training rows
  B: Loss function
  C: Learning rate
  D: Accuracy
answer: B

Evaluation: exact letter match. No AI required.
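
In code terms, that grading step is a trivial comparison, along these lines (illustrative only):

def grade_mcq(selected: str, answer: str) -> bool:
    # Case-insensitive exact match on the chosen option letter.
    return selected.strip().upper() == answer.strip().upper()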


Fill in the blank

id: ml_002
type: fill_blank
title: Gradient Descent Target
prompt: "Gradient descent minimizes the ______."
rubric:
  acceptable_patterns:
    - loss function
    - cost function
    - objective function
  semantic_check: true       # false = keyword-only match
  passing_threshold: 0.70

Evaluation: AI checks whether the answer is semantically equivalent to any acceptable pattern.
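
A rough sketch of that flow (not the actual evaluator; the prompt wording and function shape are assumptions):

def evaluate_fill_blank(answer: str, rubric, provider) -> bool:
    if not rubric.semantic_check:
        # Keyword-only mode: accept if any acceptable pattern appears in the answer.
        return any(p.lower() in answer.lower() for p in rubric.acceptable_patterns)
    # Semantic mode: ask the AI provider for an equivalence score and compare to the threshold.
    prompt = (
        f"Answer: {answer}\n"
        f"Acceptable: {', '.join(rubric.acceptable_patterns)}\n"
        "Return a similarity score between 0 and 1."
    )
    score = float(provider.complete(prompt))
    return score >= rubric.passing_threshold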


Writeup

id: ml_003
type: writeup
title: Explain Overfitting
prompt: "Explain overfitting in your own words."
rubric:
  must_include:
    - training performance
    - generalization
    - test data
  scoring_dimensions: [correctness, depth, clarity]
  min_words: 40
  passing_threshold: 0.60

Evaluation: AI scores each dimension 0–1, averages to an overall score, and returns dimension-specific feedback.
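
For example, if the AI returned correctness 0.8, depth 0.5, and clarity 0.7, the overall score would be (0.8 + 0.5 + 0.7) / 3 ≈ 0.67, which clears the 0.60 passing_threshold above.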


Code

id: ml_004
type: code
title: Implement Binary Search
prompt: |
  Implement binary_search(arr: list, target: int) -> int.
  Return the index or -1 if not found. O(log n) required.
starter_code: |
  def binary_search(arr: list, target: int) -> int:
      pass
rubric:
  test_file: tests/test_004_binary_search.py   # relative to this YAML's dir
  timeout_seconds: 10
  passing_threshold: 1.0    # all tests must pass

Evaluation: pytest runs the hidden test file against the learner's code in an isolated temp directory. AI does not decide correctness—ever.
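
The hidden test file follows the convention described under "Adding exercises" below: it imports the learner's code as solution. A minimal sketch of what tests/test_004_binary_search.py might contain (the real file may cover more cases):

from solution import binary_search  # hidden tests import the learner's code as `solution`

def test_found():
    assert binary_search([1, 3, 5, 7, 9], 7) == 3

def test_not_found():
    assert binary_search([1, 3, 5, 7, 9], 4) == -1

def test_empty():
    assert binary_search([], 1) == -1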


Adding exercises

  1. Create a YAML file anywhere under exercises/.
  2. For code exercises, create a tests/ subdirectory next to the YAML and write a pytest file that imports from solution (e.g. from solution import my_func).
  3. Run it: learnforge run exercises/my_topic/my_exercise.yaml

The YAML schema is enforced by Pydantic — you'll get a clear error if anything is missing or wrong.
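
To give a feel for what that validation covers, here is a rough sketch of an MCQ-style model (field names taken from the example above; the actual models in learnforge/models/ may differ):

from pydantic import BaseModel

class MCQExercise(BaseModel):
    id: str
    type: str                    # "mcq"
    title: str
    difficulty: str              # beginner | intermediate | advanced
    concept_tags: list[str]
    prompt: str
    options: dict[str, str]      # e.g. {"A": "...", "B": "..."}
    answer: str                  # the correct option key, e.g. "B"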


Adding AI providers

  1. Create learnforge/providers/my_provider.py:
from learnforge.providers.base import AIProvider

class MyProvider(AIProvider):
    @property
    def name(self) -> str:
        return "myprovider/model-name"

    def complete(self, prompt: str, system: str = "") -> str:
        # call your API, return a string
        ...
  2. Register it in learnforge/providers/factory.py by adding an if _try("myprovider"): branch.

Project structure

learnforge/
├── exercises/
│   └── machine_learning/
│       ├── 001_mcq.yaml
│       ├── 002_fill_blank.yaml
│       ├── 003_writeup.yaml
│       ├── 004_code.yaml
│       └── tests/
│           └── test_004_binary_search.py
│
├── tests/                        ← LearnForge's own test suite
│   ├── test_schemas.py
│   ├── test_evaluators.py
│   ├── test_loader.py
│   └── test_code_runner.py
│
├── learnforge/
│   ├── cli/          ← typer commands, rich display
│   ├── core/         ← YAML loader, exercise discovery
│   ├── models/       ← Pydantic exercise models
│   ├── evaluators/   ← MCQ, FillBlank, Writeup, Code evaluators
│   ├── providers/    ← Anthropic, OpenAI, Gemini, Ollama
│   ├── runners/      ← subprocess code runner
│   ├── schemas/      ← EvaluationResult schema
│   └── utils/        ← config (pydantic-settings), logging
│
├── .env.example
├── pyproject.toml
└── LICENSE

Running the test suite

pip install -e ".[dev]"
pytest
pytest --cov=learnforge --cov-report=term-missing   # with coverage

Design principles

AI decides meaning, tests decide correctness. Code exercise grading is 100% deterministic. The AI is used only to explain failures to the learner in plain English—it cannot change the grade.

Exercises are data, not code. Authors write YAML. Pydantic validates the schema at load time. This keeps exercise authoring accessible and makes the format easy to parse, version, and share.

Provider-agnostic from day one. The AIProvider interface has a single method: complete(prompt, system) → str. Adding a new provider is adding one file.

No cloud dependency. Everything runs locally. Ollama support means you can evaluate semantically without sending any data to external services.


License

MIT — see LICENSE.
