Split generated R/aaa-auto.R into per-category R/aaa-<cat>.R files#2621

Open
schochastics wants to merge 3 commits into main from categories

@schochastics
Contributor

Summary

Stimulus generates a single ~14,800-line R/aaa-auto.R containing every C igraph wrapper. This PR introduces a categorization layer that splits that monolithic output into 26 per-category files (R/aaa-basicigraph.R, R/aaa-cliques.R, …, R/aaa-visitors.R) so navigating the generated wrappers aligns with how igraph groups functions in its reference manual. Subcategories appear inside each file as banner comments.

Why do this? Today a developer grepping for bfs_impl lands in the middle of a 14.8k-line file with no navigational cues; afterwards they land at the top of R/aaa-visitors.R under the # ==== breadth-first-search ==== banner.

The split happens as a post-processing step on the stimulus output; stimulus itself is unchanged (it doesn't support multi-file output natively). A new tools/aaa-categories.yaml is the single source of truth for which function goes where, and two new tools keep everything reconciled.

What's in the diff

Change Purpose
tools/aaa-categories.yaml (new) Authoritative map: category → subcategory → list of igraph_* C functions. 491 entries across 26 categories, covering every R_igraph_* symbol .Call()'d in the generated wrappers.
tools/rebuild-cats.R (new) Reconciles the YAML against whatever R/aaa-*.R files are present. Idempotent; fails loudly if an ungrouped function appears in the generated wrappers.
tools/split-aaa-auto.R (new) Parses the stimulus output, looks up each _impl wrapper's category, and writes one file per category with subcategory banners. Preserves each wrapper's source byte-for-byte.
Makefile-cigraph Stimulus now writes to .build/aaa-auto.R (ignored), and the split script produces the in-repo R/aaa-<cat>.R files. New phony target r_wrappers covers the full pipeline.
R/aaa-auto.R → R/aaa-<cat>.R × 26 The actual split output. All existing .Call() semantics are unchanged; this is a purely organizational change.
.gitignore / .Rbuildignore Ignore .build/.
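For anyone curious about the mechanics, here is a minimal sketch of what a split step like tools/split-aaa-auto.R does. It is not the script in this PR: it assumes each wrapper starts at column 0 as `<name>_impl <- function(`, cuts the generated file at those lines, and takes a hypothetical `lookup` vector mapping wrapper names to "category/subcategory" (in the real pipeline that mapping comes from tools/aaa-categories.yaml). `split_wrappers()` and `write_split()` are made-up names.

```r
# Hypothetical sketch of the split step; helper names and the exact banner
# layout are assumptions, not the real tools/split-aaa-auto.R.

# Cut the generated file into one chunk per `*_impl` wrapper. A chunk runs from
# its definition line up to the line before the next definition, so wrapper
# text (including trailing blank lines) is copied line-for-line.
# Anything before the first wrapper (e.g. a file header) is dropped.
split_wrappers <- function(lines) {
  starts <- grep("^[A-Za-z0-9_.]+_impl <- function\\(", lines)
  ends <- c(starts[-1] - 1L, length(lines))
  chunks <- Map(function(s, e) lines[s:e], starts, ends)
  names(chunks) <- sub(" <- function\\(.*$", "", lines[starts])
  chunks
}

# `lookup`: named character vector, wrapper name -> "category/subcategory".
# Writes one aaa-<category>.R per category, with a banner per subcategory.
write_split <- function(chunks, lookup, out_dir) {
  where <- lookup[names(chunks)]
  if (anyNA(where)) {
    stop("Uncategorized wrappers: ",
         paste(names(chunks)[is.na(where)], collapse = ", "))
  }
  cat_of <- sub("/.*$", "", where)
  sub_of <- sub("^[^/]*/", "", where)
  for (ct in sort(unique(cat_of))) {
    out <- character()
    for (sb in sort(unique(sub_of[cat_of == ct]))) {
      out <- c(out, paste0("# ==== ", sb, " ===="), "")
      for (nm in names(chunks)[cat_of == ct & sub_of == sb]) {
        out <- c(out, chunks[[nm]])
      }
    }
    writeLines(out, file.path(out_dir, paste0("aaa-", ct, ".R")))
  }
  invisible(sort(unique(cat_of)))
}
```

Since only banners and file boundaries are added, a wrapper's body in R/aaa-<cat>.R should diff cleanly against the same wrapper in the stimulus output.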

The closure-normalization rule

Nine .Call() targets in the generated wrappers end in _closure (e.g. R_igraph_bfs_closure). These are R-binding helpers defined in src/rcallback.c that wrap an underlying C function with SEXP-callback support; they are not standalone C library functions. rebuild-cats.R encodes the 9-entry whitelist and maps them back to their semantic names (e.g. igraph_bfs_closure → igraph_bfs) so each wrapper lands where a reader would expect. R_igraph_transitive_closure is not affected: there, "closure" is a graph-theory term, not a wrapper suffix.
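A minimal sketch of the normalization rule, assuming the whitelist is a plain character vector. Only R_igraph_bfs_closure is named in this description; the other eight entries live in rebuild-cats.R and are not reproduced here, so `closure_whitelist` and `normalize_symbol()` are illustrative names.

```r
# Illustrative whitelist: only R_igraph_bfs_closure is named in the PR text;
# the real 9-entry list lives in tools/rebuild-cats.R.
closure_whitelist <- c("R_igraph_bfs_closure")

# Map a .Call() symbol to the C function name used as a key in the YAML.
normalize_symbol <- function(sym, whitelist = closure_whitelist) {
  fn <- sub("^R_", "", sym)                 # R_igraph_foo -> igraph_foo
  is_wrapper <- sym %in% whitelist          # only whitelisted symbols lose the suffix
  fn[is_wrapper] <- sub("_closure$", "", fn[is_wrapper])
  fn
}

normalize_symbol(c("R_igraph_bfs_closure", "R_igraph_transitive_closure"))
# returns c("igraph_bfs", "igraph_transitive_closure")
```

Using an explicit whitelist rather than a blanket `_closure$` regex is what keeps R_igraph_transitive_closure under its own name.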

Categorization highlights

The initial YAML layout mirrored igraph's legacy docbook sections. Several cleanups were applied:

  • Retired the undocumented category — all 8 entries moved to real homes:
    • igraph_residual_graph, igraph_reverse_residual_graph → flows/maximum-flows
    • igraph_hrg_sample_many → hrg/hrg-sampling
    • igraph_has_attribute_table, igraph_finalizer → nongraph/internal
    • igraph_eigen_adjacency → structural/spectral-properties
    • igraph_eigen_matrix, igraph_eigen_matrix_symmetric, igraph_solve_lsap → nongraph/linear-algebra (new subcategory)
  • Typo/case fixes: regular-structre-generators → regular-structure-generators; Sparsifiers → sparsifiers; motifs/uncategorized → motifs/graph-census.
  • Semantic relocations: igraph_transitive_closure and igraph_transitive_closure_dag moved from structural/graph-components to operators/miscellaneous-operators (they produce a derived graph, not component analysis).
  • Split oversized buckets:
    • structural/shortest-path-related-functions (34 entries) → distances-and-metrics (22) + shortest-paths (12).
    • structural/other-operations (11) → matrix-representations (5) + mutual-edges (3) + summary-statistics (3).

Developer workflow

After a stimulus upgrade, or when a new igraph C function lands upstream:

make -f Makefile-cigraph r_wrappers   # regenerates the split files
Rscript tools/rebuild-cats.R          # validates/updates the categories YAML

The second step fails loudly with the exact names that need adding if aaa-categories.yaml drifts from the generated wrappers.
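A rough sketch of that drift check, assuming the YAML has already been read into a nested list (for example with yaml::read_yaml(), which returns category → subcategory → functions as nested lists). `find_uncategorized()` is a hypothetical name, and for brevity it skips the closure normalization described above.

```r
# Hypothetical drift check in the spirit of tools/rebuild-cats.R.
# `wrapper_lines`: text of the generated R/aaa-*.R files.
# `categories`: nested list category -> subcategory -> igraph_* names,
#   e.g. as returned by yaml::read_yaml("tools/aaa-categories.yaml").
find_uncategorized <- function(wrapper_lines, categories) {
  # every R_igraph_* symbol mentioned in the wrappers (.Call targets, on.exit)
  hits <- regmatches(wrapper_lines,
                     gregexpr("R_igraph_[A-Za-z0-9_]+", wrapper_lines))
  symbols <- unique(unlist(hits))
  known <- unlist(categories, use.names = FALSE)
  missing <- setdiff(sub("^R_", "", symbols), known)
  if (length(missing) > 0) {
    # fail loudly, listing exactly what needs adding to the YAML
    stop("Add to tools/aaa-categories.yaml: ",
         paste(sort(missing), collapse = ", "), call. = FALSE)
  }
  invisible(symbols)
}
```

Because the regex also picks up symbols in on.exit() calls, R_igraph_finalizer has to be categorized too, which matches its entry under nongraph/internal.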

Validation performed

  • All 26 R/aaa-*.R files parse cleanly.
  • 490 _impl wrappers distributed across the files, zero duplicates.
  • 491 unique R_igraph_* symbols preserved (the 491st being R_igraph_finalizer, which appears in every impl's on.exit but has no wrapper of its own).
  • tools/rebuild-cats.R produces byte-identical output on re-run (idempotent).
  • tools/split-aaa-auto.R produces byte-identical output on re-run from the same source (idempotent).
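Checks of this kind can be reproduced with base R along these lines. This is a sketch, not the exact commands used for the numbers above; `impl_definitions()` and `same_bytes()` are made-up helpers, and the idempotence check assumes you kept a copy of the first run's output to compare against.

```r
# Collect top-level `name <- function(...)` definitions ending in _impl.
# parse() stops with an error on a syntax error, so this also covers
# the "parses cleanly" check.
impl_definitions <- function(files) {
  defs <- unlist(lapply(files, function(f) {
    exprs <- parse(f, keep.source = FALSE)
    vapply(exprs, function(e) {
      is_fn <- is.call(e) && identical(e[[1]], as.name("<-")) &&
        is.name(e[[2]]) && is.call(e[[3]]) &&
        identical(e[[3]][[1]], as.name("function"))
      if (is_fn) as.character(e[[2]]) else NA_character_
    }, character(1))
  }))
  defs <- defs[!is.na(defs) & grepl("_impl$", defs)]
  list(n = length(defs), duplicates = unique(defs[duplicated(defs)]))
}

# Byte-level comparison, e.g. first-run output vs. re-run output.
same_bytes <- function(a, b) {
  identical(readBin(a, "raw", file.size(a)), readBin(b, "raw", file.size(b)))
}

# files <- list.files("R", pattern = "^aaa-.*\\.R$", full.names = TRUE)
# impl_definitions(files)   # the PR reports n = 490 and no duplicates
```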

Test plan

  • devtools::load_all(".") succeeds
  • R CMD check / CI passes
  • make -f Makefile-cigraph r_wrappers round-trips cleanly on a machine with the stimulus venv
  • Spot-check that at least one wrapper from each of the 26 category files still behaves correctly (the existing testthat suite covers the igraph_* wrappers broadly, so a green test run is the main check)

cc @maelle for review — this is purely an organizational/tooling change; no behavior should change, but the restructuring is substantial so a second pair of eyes on the categorization choices would be welcome.

🤖 Generated with Claude Code

schochastics and others added 2 commits April 23, 2026 14:34
Stimulus generates one monolithic R/aaa-auto.R (~14.8k lines) covering
every C igraph wrapper. This commit introduces a categorization layer that
splits the generated output into 26 per-category files matching how the
functions are grouped in the igraph reference manual, with subcategory
banner comments inside each file.

- tools/aaa-categories.yaml: authoritative category -> subcategory -> fn
  mapping, reconciled against every R_igraph_* symbol .Call()'d from the
  generated wrappers (491 entries; 8 closure wrappers mapped back to their
  underlying C functions via the src/rcallback.c whitelist)
- tools/rebuild-cats.R: idempotent reconciliation tool; fails loudly if
  new functions appear in the generated wrappers without a categorization
- tools/split-aaa-auto.R: post-processes stimulus output into R/aaa-<cat>.R
- Makefile-cigraph: stimulus now writes to .build/aaa-auto.R (ignored), the
  split script produces the in-repo R/ files. Phony target r_wrappers
  covers the full pipeline
@maelle
Contributor

maelle commented Apr 30, 2026

A developer grepping for a function name?! Like, not using IDE navigation?!

Anyway awesome, I'll review this now, thanks a ton!

@maelle maelle left a comment

Thank you! A few comments 😄

Comment thread tools/aaa-categories.yaml
@@ -0,0 +1,669 @@
# Functions ordered by category
basicigraph:

Suggested change
basicigraph:
basic-igraph:

Comment thread tools/aaa-categories.yaml
Comment thread Makefile-cigraph

# R files that are generated/copied

RGEN = R/aaa-auto.R src/rinterface.c \

Why can't we add the post-processing step to this Makefile so that generating the functions remains a single call?

Comment thread Makefile-cigraph
-t $(vendored_srcdir)/interfaces/types.yaml \
-t tools/stimulus/types-RR.yaml \
-l RR
Rscript tools/split-aaa-auto.R .build/aaa-auto.R

aaah ok, this was wrong in the PR description

Comment thread R/aaa-cycles.R
callback
) {
# Argument checks
ensure_igraph(graph)

I have no specific opinion on categories but before we validate this (and split the test file), could you please add some stats to the PR thread: min/median/max number of lines per aaa file?

aaa-structural.R was 160 functions / 5,237 lines — too unwieldy for IDE
navigation. Promote three natural sub-clusters to top-level categories,
shrinking aaa-structural.R to 84 functions / 2,417 lines:

  - aaa-paths.R       (38 fns) — distances, shortest paths, widest paths
  - aaa-centrality.R  (30 fns) — centrality measures + centralization
  - aaa-trees.R        (8 fns) — spanning trees and tree unfolding

Implementation: tools/rebuild-cats.R gains a `category_moves` mechanism
that relocates whole (cat, sub) groups to a new top-level on the
flattened table. The structural/trees subcategory is renamed to
spanning-trees-and-forests for clarity inside the new trees category.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
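One way to picture the category_moves mechanism: flatten the YAML into one row per function (fn, cat, sub), then rewrite cat/sub for every row matching a given (cat, sub) pair. The data-frame layout, `apply_moves()`, and the specific moves below are assumptions for illustration; structural/trees → trees/spanning-trees-and-forests comes from the commit message above, while the shortest-paths → paths row is a guess at how the paths category was populated.

```r
# Assumed shape: one row per (from_cat, from_sub) group to relocate.
category_moves <- data.frame(
  from_cat = c("structural", "structural"),
  from_sub = c("trees", "shortest-paths"),
  to_cat   = c("trees", "paths"),
  to_sub   = c("spanning-trees-and-forests", "shortest-paths"),
  stringsAsFactors = FALSE
)

# `tbl` is the flattened YAML: columns fn, cat, sub.
# Whole groups move together; rows not matched by any move are untouched.
apply_moves <- function(tbl, moves) {
  for (i in seq_len(nrow(moves))) {
    hit <- tbl$cat == moves$from_cat[i] & tbl$sub == moves$from_sub[i]
    tbl$cat[hit] <- moves$to_cat[i]
    tbl$sub[hit] <- moves$to_sub[i]
  }
  tbl
}
```

Moves are applied in order, so a later move could pick up rows produced by an earlier one; keeping source and target groups disjoint avoids that.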
@schochastics
Contributor Author

> A developer grepping for a function name?! Like, not using IDE navigation?!
>
> Anyway awesome, I'll review this now, thanks a ton!

Claude knows that I don't use IDE navigation, I guess 😆

@schochastics
Contributor Author

File Functions Lines
aaa-structural.R 84 2,417
aaa-paths.R 38 1,544
aaa-games.R 36 1,145
aaa-generators.R 35 881
aaa-centrality.R 30 1,071
aaa-layout.R 28 1,000
aaa-operators.R 24 529
aaa-basicigraph.R 23 534
aaa-cliques.R 21 536
aaa-community.R 21 731
aaa-flows.R 21 714
aaa-isomorphism.R 21 957
aaa-foreign.R 18 436
aaa-cycles.R 13 389
aaa-bipartite.R 10 286
aaa-hrg.R 10 245
aaa-nongraph.R 10 258
aaa-motifs.R 9 221
aaa-trees.R 8 211
aaa-coloring.R 5 115
aaa-processes.R 5 199
aaa-separators.R 5 92
aaa-visitors.R 5 316
aaa-embedding.R 3 104
aaa-graphlets.R 3 106
aaa-error.R 1 20
aaa-progress.R 1 22
aaa-spatial.R 1 20
aaa-status.R 1 20
total 490 15,119

cc @maelle. Not very even but I guess that is hard to achieve with a good categorization anyway 🤷
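For the min/median/max asked for above, the table can be summarized directly. The vectors below are copied from the table in the same row order; recomputing from disk (commented out at the end) would be the more robust option once the split settles.

```r
# Functions and lines per R/aaa-*.R file, in the row order of the table above.
fns <- c(84, 38, 36, 35, 30, 28, 24, 23, 21, 21, 21, 21, 18, 13, 10,
         10, 10, 9, 8, 5, 5, 5, 5, 3, 3, 1, 1, 1, 1)
n_lines <- c(2417, 1544, 1145, 881, 1071, 1000, 529, 534, 536, 731,
             714, 957, 436, 389, 286, 245, 258, 221, 211, 115,
             199, 92, 316, 104, 106, 20, 22, 20, 20)

summarize_counts <- function(x) c(min = min(x), median = median(x), max = max(x))
rbind(functions = summarize_counts(fns), lines = summarize_counts(n_lines))
# functions: min 1, median 10, max 84
# lines:     min 20, median 316, max 2417

# Same numbers straight from disk (assumes the split files are in R/):
# files <- list.files("R", pattern = "^aaa-.*\\.R$", full.names = TRUE)
# n_lines <- vapply(files, function(f) length(readLines(f)), integer(1))
```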


2 participants