LearnForge

Open-source AI-native exercise engine for learning by invention.

Traditional education tests memory. LearnForge tests what actually matters:

Capability            Exercise type
Recall                MCQ
Understanding         Fill in the blank (AI semantic)
Communication         Writeup (AI rubric scoring)
Programming ability   Code (deterministic pytest)

Learners solve exercises on their own machine. No cloud dependency. No accounts. You bring your own AI key—or run Ollama locally for free.


Why it exists

Most AI-powered education tools do one of two things: generate content or grade essays. LearnForge is different: it treats exercises as data and evaluation as a pipeline.

  • Exercise authors write YAML. The schema enforces what a good exercise looks like.
  • MCQ grading is deterministic. No AI involved, no hallucinated "correct" answers.
  • Code grading is deterministic. pytest decides pass/fail; AI only explains failures.
  • Writeup/fill-blank grading uses AI with an explicit rubric, not vibes.

Installation

Requirements: Python 3.12+, pip

git clone https://github.com/cloudxlab/learnforge
cd learnforge
pip install -e .

Install with your preferred AI provider:

pip install -e ".[anthropic]"   # Claude
pip install -e ".[openai]"      # GPT
pip install -e ".[google]"      # Gemini
pip install -e ".[all-providers]"  # all of the above
# Ollama needs no extra package — just install Ollama locally

Setting API keys

Copy the example and fill in your key:

cp .env.example .env

Then edit .env:

# Pick one (or set LEARNFORGE_PROVIDER to force a specific one)
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GEMINI_API_KEY=...

# Or use Ollama locally (no key needed):
# LEARNFORGE_PROVIDER=ollama
# OLLAMA_BASE_URL=http://localhost:11434
# OLLAMA_MODEL=llama3

LearnForge auto-detects the provider based on which key is present. Set LEARNFORGE_PROVIDER if you want to force a specific one.
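
Roughly, provider selection works like this (an illustrative sketch; the actual detection order lives in learnforge/providers/factory.py and may differ):

import os

def pick_provider() -> str:
    # An explicit override always wins.
    forced = os.getenv("LEARNFORGE_PROVIDER")
    if forced:
        return forced
    # Otherwise fall back to whichever API key is present (the order shown here is illustrative).
    if os.getenv("ANTHROPIC_API_KEY"):
        return "anthropic"
    if os.getenv("OPENAI_API_KEY"):
        return "openai"
    if os.getenv("GEMINI_API_KEY"):
        return "google"
    return "ollama"  # no key needed; assumes a local Ollama server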


Running exercises

Run any exercise interactively

learnforge run exercises/machine_learning/001_mcq.yaml
learnforge run exercises/machine_learning/002_fill_blank.yaml
learnforge run exercises/machine_learning/003_writeup.yaml
learnforge run exercises/machine_learning/004_code.yaml --solution my_solution.py

Browse and pick from all exercises

learnforge start
learnforge start --dir exercises/machine_learning

Test a code exercise solution

learnforge test exercises/machine_learning/004_code.yaml my_solution.py

The test command runs hidden pytest cases against your solution file and reports which tests passed or failed. AI is used to explain failures—never to decide correctness.

Enable debug logging

learnforge --verbose run exercises/machine_learning/001_mcq.yaml

Exercise types

MCQ — multiple choice

id: ml_001
type: mcq
title: Gradient Descent Objective
difficulty: beginner          # beginner | intermediate | advanced
concept_tags: [gradient_descent, optimization]
prompt: "What does gradient descent minimize?"
options:
  A: Number of training rows
  B: Loss function
  C: Learning rate
  D: Accuracy
answer: B

Evaluation: exact letter match. No AI required.
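
In code terms, that grading step is a trivial comparison, along these lines (illustrative only):

def grade_mcq(selected: str, answer: str) -> bool:
    # Case-insensitive exact match on the chosen option letter.
    return selected.strip().upper() == answer.strip().upper()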


Fill in the blank

id: ml_002
type: fill_blank
title: Gradient Descent Target
prompt: "Gradient descent minimizes the ______."
rubric:
  acceptable_patterns:
    - loss function
    - cost function
    - objective function
  semantic_check: true       # false = keyword-only match
  passing_threshold: 0.70

Evaluation: AI checks whether the answer is semantically equivalent to any acceptable pattern.
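
A rough sketch of that flow (not the actual evaluator; the prompt wording and function shape are assumptions):

def evaluate_fill_blank(answer: str, rubric, provider) -> bool:
    if not rubric.semantic_check:
        # Keyword-only mode: accept if any acceptable pattern appears in the answer.
        return any(p.lower() in answer.lower() for p in rubric.acceptable_patterns)
    # Semantic mode: ask the AI provider for an equivalence score and compare to the threshold.
    prompt = (
        f"Answer: {answer}\n"
        f"Acceptable: {', '.join(rubric.acceptable_patterns)}\n"
        "Return a similarity score between 0 and 1."
    )
    score = float(provider.complete(prompt))
    return score >= rubric.passing_threshold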


Writeup

id: ml_003
type: writeup
title: Explain Overfitting
prompt: "Explain overfitting in your own words."
rubric:
  must_include:
    - training performance
    - generalization
    - test data
  scoring_dimensions: [correctness, depth, clarity]
  min_words: 40
  passing_threshold: 0.60

Evaluation: AI scores each dimension 0–1, averages to an overall score, and returns dimension-specific feedback.
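
For example, if the AI returned correctness 0.8, depth 0.5, and clarity 0.7, the overall score would be (0.8 + 0.5 + 0.7) / 3 ≈ 0.67, which clears the 0.60 passing_threshold above.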


Code

id: ml_004
type: code
title: Implement Binary Search
prompt: |
  Implement binary_search(arr: list, target: int) -> int.
  Return the index or -1 if not found. O(log n) required.
starter_code: |
  def binary_search(arr: list, target: int) -> int:
      pass
rubric:
  test_file: tests/test_004_binary_search.py   # relative to this YAML's dir
  timeout_seconds: 10
  passing_threshold: 1.0    # all tests must pass

Evaluation: pytest runs the hidden test file against the learner's code in an isolated temp directory. AI does not decide correctness—ever.
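
The hidden test file follows the convention described under "Adding exercises" below: it imports the learner's code as solution. A minimal sketch of what tests/test_004_binary_search.py might contain (the real file may cover more cases):

from solution import binary_search  # hidden tests import the learner's code as `solution`

def test_found():
    assert binary_search([1, 3, 5, 7, 9], 7) == 3

def test_not_found():
    assert binary_search([1, 3, 5, 7, 9], 4) == -1

def test_empty():
    assert binary_search([], 1) == -1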


Adding exercises

  1. Create a YAML file anywhere under exercises/.
  2. For code exercises, create a tests/ subdirectory next to the YAML and write a pytest file that imports from solution (e.g. from solution import my_func).
  3. Run it: learnforge run exercises/my_topic/my_exercise.yaml

The YAML schema is enforced by Pydantic — you'll get a clear error if anything is missing or wrong.
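
To give a feel for what that validation covers, here is a rough sketch of an MCQ-style model (field names taken from the example above; the actual models in learnforge/models/ may differ):

from pydantic import BaseModel

class MCQExercise(BaseModel):
    id: str
    type: str                    # "mcq"
    title: str
    difficulty: str              # beginner | intermediate | advanced
    concept_tags: list[str]
    prompt: str
    options: dict[str, str]      # e.g. {"A": "...", "B": "..."}
    answer: str                  # the correct option key, e.g. "B"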


Adding AI providers

  1. Create learnforge/providers/my_provider.py:
from learnforge.providers.base import AIProvider

class MyProvider(AIProvider):
    @property
    def name(self) -> str:
        return "myprovider/model-name"

    def complete(self, prompt: str, system: str = "") -> str:
        # call your API, return a string
        ...
  2. Register it in learnforge/providers/factory.py by adding an if _try("myprovider"): branch.

Project structure

learnforge/
├── exercises/
│   └── machine_learning/
│       ├── 001_mcq.yaml
│       ├── 002_fill_blank.yaml
│       ├── 003_writeup.yaml
│       ├── 004_code.yaml
│       └── tests/
│           └── test_004_binary_search.py
│
├── tests/                        ← LearnForge's own test suite
│   ├── test_schemas.py
│   ├── test_evaluators.py
│   ├── test_loader.py
│   └── test_code_runner.py
│
├── learnforge/
│   ├── cli/          ← typer commands, rich display
│   ├── core/         ← YAML loader, exercise discovery
│   ├── models/       ← Pydantic exercise models
│   ├── evaluators/   ← MCQ, FillBlank, Writeup, Code evaluators
│   ├── providers/    ← Anthropic, OpenAI, Gemini, Ollama
│   ├── runners/      ← subprocess code runner
│   ├── schemas/      ← EvaluationResult schema
│   └── utils/        ← config (pydantic-settings), logging
│
├── .env.example
├── pyproject.toml
└── LICENSE

Running the test suite

pip install -e ".[dev]"
pytest
pytest --cov=learnforge --cov-report=term-missing   # with coverage

Design principles

AI decides meaning, tests decide correctness. Code exercise grading is 100% deterministic. The AI is used only to explain failures to the learner in plain English—it cannot change the grade.

Exercises are data, not code. Authors write YAML. Pydantic validates the schema at load time. This keeps exercise authoring accessible and makes the format easy to parse, version, and share.

Provider-agnostic from day one. The AIProvider interface has a single method: complete(prompt, system) → str. Adding a new provider is adding one file.

No cloud dependency. Everything runs locally. Ollama support means you can evaluate semantically without sending any data to external services.


License

MIT — see LICENSE.
