Split generated R/aaa-auto.R into per-category R/aaa-<cat>.R files#2621
Split generated R/aaa-auto.R into per-category R/aaa-<cat>.R files#2621schochastics wants to merge 3 commits intomainfrom
Conversation
Stimulus generates one monolithic R/aaa-auto.R (~14.8k lines) covering every C igraph wrapper. This commit introduces a categorization layer that splits the generated output into 26 per-category files matching how the functions are grouped in the igraph reference manual, with subcategory banner comments inside each file. - tools/aaa-categories.yaml: authoritative category -> subcategory -> fn mapping, reconciled against every R_igraph_* symbol .Call()'d from the generated wrappers (491 entries; 8 closure wrappers mapped back to their underlying C functions via the src/rcallback.c whitelist) - tools/rebuild-cats.R: idempotent reconciliation tool; fails loudly if new functions appear in the generated wrappers without a categorization - tools/split-aaa-auto.R: post-processes stimulus output into R/aaa-<cat>.R - Makefile-cigraph: stimulus now writes to .build/aaa-auto.R (ignored), the split script produces the in-repo R/ files. Phony target r_wrappers covers the full pipeline
|
A developper grepping for a function name?! Like, not using IDE navigation?! Anyway awesome, I'll review this now, thanks a ton! |
maelle
left a comment
There was a problem hiding this comment.
Thank you! A few comments 😄
| @@ -0,0 +1,669 @@ | |||
| # Functions ordered by category | |||
| basicigraph: | |||
There was a problem hiding this comment.
| basicigraph: | |
| basic-igraph: |
|
|
||
| # R files that are generated/copied | ||
|
|
||
| RGEN = R/aaa-auto.R src/rinterface.c \ |
There was a problem hiding this comment.
why can't we add the post-processing step to this makefile so that generating the functions remain a single call?
| -t $(vendored_srcdir)/interfaces/types.yaml \ | ||
| -t tools/stimulus/types-RR.yaml \ | ||
| -l RR | ||
| Rscript tools/split-aaa-auto.R .build/aaa-auto.R |
There was a problem hiding this comment.
aaah ok, this was wrong in the PR description
| callback | ||
| ) { | ||
| # Argument checks | ||
| ensure_igraph(graph) |
There was a problem hiding this comment.
I have no specific opinion on categories but before we validate this (and split the test file), could you please add some stats to the PR thread: min/median/max number of lines per aaa file?
aaa-structural.R was 160 functions / 5,237 lines — too unwieldy for IDE navigation. Promote three natural sub-clusters to top-level categories, shrinking aaa-structural.R to 84 functions / 2,417 lines: - aaa-paths.R (38 fns) — distances, shortest paths, widest paths - aaa-centrality.R (30 fns) — centrality measures + centralization - aaa-trees.R (8 fns) — spanning trees and tree unfolding Implementation: tools/rebuild-cats.R gains a `category_moves` mechanism that relocates whole (cat, sub) groups to a new top-level on the flattened table. The structural/trees subcategory is renamed to spanning-trees-and-forests for clarity inside the new trees category. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Claude knows that I dont use IDE navigation I guess 😆 |
cc @maelle. Not very even but I guess that is hard to achieve with a good categorization anyway 🤷 |
Summary
Stimulus generates a single ~14,800-line
R/aaa-auto.Rcontaining every C igraph wrapper. This PR introduces a categorization layer that splits that monolithic output into 26 per-category files (R/aaa-basicigraph.R,R/aaa-cliques.R, …,R/aaa-visitors.R) so navigating the generated wrappers aligns with how igraph groups functions in its reference manual. Subcategories appear inside each file as banner comments.Why do this? Today a developer grepping for
bfs_impllands in the middle of a 14.8k-line file with no navigational cues; afterwards they land at the top ofR/aaa-visitors.Runder the# ==== breadth-first-search ====banner.The split happens as a post-processing step on the stimulus output; stimulus itself is unchanged (it doesn't support multi-file output natively). A new
tools/aaa-categories.yamlis the single source of truth for which function goes where, and two new tools keep everything reconciled.What's in the diff
tools/aaa-categories.yaml(new)igraph_*C functions. 491 entries across 26 categories, covering everyR_igraph_*symbol.Call()'d in the generated wrappers.tools/rebuild-cats.R(new)tools/split-aaa-auto.R(new)_implwrapper's category, and writes one file per category with subcategory banners. Preserves each wrapper's source byte-for-byte.Makefile-cigraph.build/aaa-auto.R(ignored), and the split script produces the in-repoR/aaa-<cat>.Rfiles. New phony targetr_wrapperscovers the full pipeline.R/aaa-auto.R→R/aaa-<cat>.R× 26.Call()semantics unchanged — it is a purely organizational change..gitignore/.Rbuildignore.build/.The closure-normalization rule
Nine
.Call()targets in the generated wrappers end in_closure(e.g.R_igraph_bfs_closure). These are R-binding helpers defined in src/rcallback.c that wrap an underlying C function with SEXP-callback support — they are not standalone C library functions.rebuild-cats.Rencodes the 9-entry whitelist and maps them back to their semantic names (e.g.igraph_bfs_closure→igraph_bfs) so each wrapper lands where a reader would expect.R_igraph_transitive_closureis not affected — there "closure" is a graph-theory term, not a wrapper suffix.Categorization highlights
The initial YAML layout mirrored igraph's legacy docbook sections. Several cleanups were applied:
undocumentedcategory — all 8 entries moved to real homes:igraph_residual_graph,igraph_reverse_residual_graph→flows/maximum-flowsigraph_hrg_sample_many→hrg/hrg-samplingigraph_has_attribute_table,igraph_finalizer→nongraph/internaligraph_eigen_adjacency→structural/spectral-propertiesigraph_eigen_matrix,igraph_eigen_matrix_symmetric,igraph_solve_lsap→nongraph/linear-algebra(new subcategory)regular-structre-generators→regular-structure-generators;Sparsifiers→sparsifiers;motifs/uncategorized→motifs/graph-census.igraph_transitive_closureandigraph_transitive_closure_dagmoved fromstructural/graph-components→operators/miscellaneous-operators(they produce a derived graph, not component analysis).structural/shortest-path-related-functions(34 entries) →distances-and-metrics(22) +shortest-paths(12).structural/other-operations(11) →matrix-representations(5) +mutual-edges(3) +summary-statistics(3).Developer workflow
After a stimulus upgrade or new igraph C function landing upstream:
The second step fails loudly with the exact names that need adding if
aaa-categories.yamldrifts from the generated wrappers.Validation performed
R/aaa-*.Rfiles parse cleanly._implwrappers distributed across the files, zero duplicates.R_igraph_*symbols preserved (the 491st beingR_igraph_finalizer, which appears in every impl'son.exitbut has no wrapper of its own).tools/rebuild-cats.Rproduces byte-identical output on re-run (idempotent).tools/split-aaa-auto.Rproduces byte-identical output on re-run from the same source (idempotent).Test plan
devtools::load_all(".")succeedsR CMD check/ CI passesmake -f Makefile-cigraph r_wrappersround-trips cleanly on a machine with the stimulus venvigraph_*wrappers broadly, so a green test run is the main check)cc @maelle for review — this is purely an organizational/tooling change; no behavior should change, but the restructuring is substantial so a second pair of eyes on the categorization choices would be welcome.
🤖 Generated with Claude Code