SimuMax models are defined with JSON files under configs/models. The model config describes the static architecture that cost, memory, and simulator analysis build on top of.
SimuMax works with three input files together:
- system: machine capability and efficiency data
- strategy: parallelism and runtime policy
- model: architecture description
Do not start from an empty file unless you have to.
Recommended path:
- Copy the nearest existing JSON from configs/models.
- Keep the original file around as a reference.
- Change only the structural fields that are different.
- Run `perf` with a known-good `strategy` and `system` first.
Good starting points:
- dense model: configs/models/llama3-8b.json
- MoE + MLA model: configs/models/deepseekv2.json
A minimal dense model config looks like this:

```json
{
  "model_type": "dense",
  "model_name": "my_dense_model",
  "hidden_size": 4096,
  "head_num": 32,
  "kv_head_num": 8,
  "head_size": 128,
  "intermediate_size": 14336,
  "layer_num": 32,
  "vocab_size": 128256,
  "use_swiglu": true
}
```

For a first dense model adaptation, the usual shortest path is (illustrated after the list):
- copy `llama3-8b.json`
- update `model_name`
- update `layer_num`, `hidden_size`, `head_num`, `kv_head_num`, `intermediate_size`, and `vocab_size`
- update `attention_type` only if the target model is not standard MHA
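For example, a copy of `llama3-8b.json` adapted to a hypothetical larger dense model might only change the fields below; every value here is a placeholder rather than a real checkpoint, and the rest of the copied file stays as-is:

```json
{
  "model_name": "my_dense_13b",
  "layer_num": 40,
  "hidden_size": 5120,
  "head_num": 40,
  "kv_head_num": 8,
  "intermediate_size": 13824,
  "vocab_size": 32000
}
```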
At minimum, keep these aligned with the target Megatron or real model:
`layer_num`, `hidden_size`, `head_num`, `kv_head_num`, `intermediate_size`, `vocab_size`, `attention_type`, `use_swiglu`
If these are wrong, both timing and memory can drift even when the strategy and system are correct.
Common fields:
- `model_name`: display/debug name
- `layer_num`: number of transformer layers
- `hidden_size`
- `head_num`
- `kv_head_num`
- `intermediate_size`
- `vocab_size`
- `use_swiglu`
- `attention_type`
These fields drive the main dense attention/MLP math and the embedding / LM head shapes.
MoE-related fields:
- `expert_num`
- `topk`
- `moe_ffn_hidden_size`
- `moe_shared_expert_intermediate_size`
- `dense_layers`
- `capacity`
- `moe_pad_expert_input_to_capacity`
- `group_linear_mode`
Important note:
`dense_layers` is the number of dense transformer layers that appear before the MoE layers in the current model layout. It matters for stage-level memory and timing, especially with pipeline parallelism.
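As an illustration, the MoE-related portion of a config might look like the fragment below. The field names are the ones listed above, the values are placeholders, and `configs/models/deepseekv2.json` is the reference for a complete, working MoE + MLA config:

```json
{
  "expert_num": 64,
  "topk": 2,
  "moe_ffn_hidden_size": 1408,
  "moe_shared_expert_intermediate_size": 2816,
  "dense_layers": 1
}
```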
MLA-related fields:
- `attention_type` = `"mla"`
- `qk_head_dim`
- `qk_pos_emb_head_dim`
- `v_head_dim`
- `q_lora_rank`
- `kv_lora_rank`
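A sketch of the MLA-related fields, again with placeholder values only meant to show which fields go together (check `configs/models/deepseekv2.json` for real values):

```json
{
  "attention_type": "mla",
  "qk_head_dim": 128,
  "qk_pos_emb_head_dim": 64,
  "v_head_dim": 128,
  "q_lora_rank": 1536,
  "kv_lora_rank": 512
}
```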
MoE / MLA checklist:
- MoE users should double-check: `expert_num`, `topk`, `moe_ffn_hidden_size`, `moe_shared_expert_intermediate_size`, `dense_layers`
- MLA users should double-check: `attention_type="mla"`, `qk_head_dim`, `qk_pos_emb_head_dim`, `v_head_dim`, `q_lora_rank`, `kv_lora_rank`
Megatron-style vocabulary padding is modeled with:
`make_vocab_size_divisible_by`, `padded_vocab_size`, `orig_vocab_size`
This is relevant when aligning perf or simulator output with Megatron real runs.
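As a rough illustration of how the three fields relate (assuming Megatron's usual rounding rule, where the original vocabulary is padded up to a multiple of `make_vocab_size_divisible_by` times the tensor-parallel size): an `orig_vocab_size` of 50257 with a divisor of 128 and no tensor parallelism pads to 50304. The placeholder values below show the fields together:

```json
{
  "make_vocab_size_divisible_by": 128,
  "orig_vocab_size": 50257,
  "padded_vocab_size": 50304
}
```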
The model config should describe the same structural shape assumptions as the target Megatron run:
- number of layers
- dense vs MoE layout
- MLA vs MHA
- expert count and top-k
- LoRA ranks for MLA
- vocabulary size and padding behavior
If a real run and a model config disagree on these fields, both timing and memory can drift even when the strategy and system are correct.
- Start from an existing JSON in configs/models.
- Copy the nearest architecture.
- Update only the structural fields that change.
- Pair it with a strategy and system config.
- Run `perf` first, then use `simulate()` if you need trace or memory lifecycle evidence.
To sanity-check a model config from Python, load it with `ModelConfig` and print a few structural fields:

```python
from simumax.core.config import ModelConfig

# Load an existing model config and spot-check the structural fields
model = ModelConfig.init_from_config_file("configs/models/llama3-8b.json")
print(model.layer_num, model.hidden_size, model.model_type)
```