English | 简体中文
A compact CUDA SGEMM learning project that walks from a readable baseline kernel to Tensor Core WMMA, with cuBLAS verification and a CMake-first build.
- One optimization ladder: naive -> tiled -> bank-conflict-free -> double-buffer -> Tensor Core.
- Comparable kernel interfaces: every FP32 kernel uses the same `(A, B, C, M, K, N, stream)` launcher shape.
- Verification-first harness: kernel output is checked against cuBLAS, with separate tolerances for the FP32 and Tensor Core paths.
- Learning-oriented docs: GitHub Pages carries the full walkthrough instead of duplicating it in the README.
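As a hedged sketch of the bottom rung of the ladder and the shared launcher shape (names, block sizes, and signatures here are illustrative, not the repository's actual code):

```cuda
#include <cuda_runtime.h>

// Baseline SGEMM: C = A * B, row-major, one thread per output element.
// A is M x K, B is K x N, C is M x N.
__global__ void sgemm_naive(const float* A, const float* B, float* C,
                            int M, int K, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

// Every FP32 kernel in the ladder exposes this same launcher shape,
// so the benchmark can swap implementations behind one signature.
void launch_sgemm_naive(const float* A, const float* B, float* C,
                        int M, int K, int N, cudaStream_t stream) {
    dim3 block(16, 16);  // illustrative thread tile
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block, 0, stream>>>(A, B, C, M, K, N);
}
```

Each later rung (tiled, bank-conflict-free, double-buffered) keeps this launcher shape and changes only how the kernel stages data through shared memory.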
```sh
git clone https://github.com/LessUp/sgemm-optimization.git
cd sgemm-optimization
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
./build/bin/sgemm_benchmark -a
ctest --test-dir build
```

Runtime tests and benchmarks require a CUDA-capable local machine. Hosted CI is limited to compile-time, formatting, repository-structure, OpenSpec, and Pages checks.
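The cuBLAS comparison can be pictured as an element-wise relative-error check with a path-dependent tolerance. A minimal host-side sketch, assuming illustrative tolerance values (not the project's actual settings):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Compare a kernel's output against a reference result (e.g. from cuBLAS).
// Tensor Core paths accumulate in reduced precision, so they get a looser bound.
bool matches_reference(const std::vector<float>& out,
                       const std::vector<float>& ref,
                       bool tensor_core_path) {
    const float tol = tensor_core_path ? 1e-2f : 1e-4f;  // illustrative values
    for (std::size_t i = 0; i < out.size(); ++i) {
        // Guard the denominator so near-zero reference entries don't blow up.
        float denom = std::max(std::fabs(ref[i]), 1.0f);
        if (std::fabs(out[i] - ref[i]) / denom > tol) return false;
    }
    return true;
}
```

The same harness runs both paths; only the tolerance passed in differs.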
| Goal | Entry point |
|---|---|
| Use the project site | GitHub Pages |
| Build and run once | Getting Started |
| Follow the kernel ladder | Learning Path |
| Inspect the source layout | Architecture |
| Read the normative specs | Specifications |
```
src/kernels/   CUDA SGEMM implementations
src/utils/     CUDA RAII, verification, and benchmark helpers
src/main.cu    benchmark CLI
tests/         Google Test coverage against cuBLAS
docs/          learning documentation mirrored on Pages
openspec/      stable specs and change workflow
```
MIT. See LICENSE.md.