Knowledge graph engine for InterSystems IRIS — temporal property graph, vector search, openCypher, graph analytics, and pre-aggregated analytics.
pip install iris-vector-graph # Core: just intersystems-irispython
pip install iris-vector-graph[full] # Full: + FastAPI, GraphQL, numpy, networkx
pip install iris-vector-graph[plaid] # + sklearn for PLAID K-means buildzpm "install iris-vector-graph-core"
Pure ObjectScript — VecIndex, PLAIDSearch, PageRank, Subgraph, GraphIndex, TemporalIndex. No Python. Works on any IRIS 2024.1+, all license tiers.
| Capability | Description |
|---|---|
| Temporal Graph | Bidirectional time-indexed edges — ^KG("tout"/"tin"/"bucket"). O(results) window queries via B-tree traversal. 134K+ edges/sec ingest (RE2-TT benchmark). |
| Pre-aggregated Analytics | ^KG("tagg") per-bucket COUNT/SUM/AVG/MIN/MAX and HLL COUNT DISTINCT. O(1) aggregation queries — 0.085ms for 1-bucket, 0.24ms for 24-hour window. |
| BM25Index | Pure ObjectScript Okapi BM25 lexical search — ^BM25Idx globals, zero SQL tables. Automatic kg_TXT upgrade when "default" index exists. Cypher CALL ivg.bm25.search(name, query, k). 0.3ms median search. |
| VecIndex | RP-tree ANN vector search — pure ObjectScript + $vectorop SIMD. Annoy-style two-means splitting. |
| IVFFlat | Inverted File flat vector index — Python k-means build (sklearn), pure ObjectScript query. Tunable nprobe recall/speed tradeoff. nprobe=nlist → exact search. Cypher CALL ivg.ivf.search(name, vec, k, nprobe). |
| PLAID | Multi-vector retrieval (ColBERT-style) — centroid scoring → candidate gen → exact MaxSim. Single server-side call. |
| HNSW | Native IRIS VECTOR index via kg_KNN_VEC. Sub-2ms search. |
| Edge Embeddings | Semantic search over graph relationships — embed_edges() encodes each (s, p, o_id) triple into kg_EdgeEmbeddings; edge_vector_search() retrieves the most similar edges to a query vector. Snapshot-portable. |
| Cypher | openCypher parser/translator — MATCH, WHERE, RETURN, CREATE, UNION, CASE WHEN, variable-length paths, shortestPath() / allShortestPaths(), CALL subqueries. Bolt 5.4 protocol (TCP + WebSocket) for standard driver connectivity. |
| Graph Analytics | PageRank, WCC, CDLP, PPR-guided subgraph — pure ObjectScript over ^KG globals. |
| FHIR Bridge | ICD-10→MeSH mapping via UMLS for clinical-to-KG integration. |
| GraphQL | Auto-generated schema from knowledge graph labels. |
| Embedded Python | EmbeddedConnection — zero-boilerplate dbapi2 adapter for IRIS Language=python methods. |
import iris
from iris_vector_graph.engine import IRISGraphEngine
conn = iris.connect(hostname='localhost', port=1972, namespace='USER', username='_SYSTEM', password='SYS')
engine = IRISGraphEngine(conn)
engine.initialize_schema()from iris_vector_graph.embedded import EmbeddedConnection
from iris_vector_graph.engine import IRISGraphEngine
engine = IRISGraphEngine(EmbeddedConnection())
engine.initialize_schema()A built-in Cypher server speaks the Bolt protocol, so standard graph tooling (drivers, visualization, LangChain) works out of the box:
IRIS_HOST=localhost IRIS_PORT=1972 IRIS_NAMESPACE=USER \
IRIS_USERNAME=_SYSTEM IRIS_PASSWORD=SYS \
python3 -m uvicorn iris_vector_graph.cypher_api:app --port 8000- Browser —
http://localhost:8000/browser/(force-directed graph visualization) - Bolt TCP —
bolt://localhost:7687(Python/Java/Go/.NET drivers, LangChain, cypher-shell) - HTTP API —
http://localhost:8000/api/cypher(curl, httpie, REST clients)
Store and query time-stamped edges — service calls, events, metrics, log entries — with sub-millisecond window queries and O(1) aggregation.
IVG has two distinct edge APIs that write to different storage and support different query patterns:
create_edge / bulk_create_edges |
create_edge_temporal / bulk_create_edges_temporal |
|
|---|---|---|
| Writes to | Graph_KG.rdf_edges SQL (durability) + ^KG("out",0,...) globals (query, synchronous) |
^KG("tout"/"tin") (time-ordered) + ^KG("out",0,...) (adjacency) |
| Query via | MATCH (a)-[:R]->(b) — immediately visible, no BuildKG() needed |
get_edges_in_window(), get_temporal_aggregate(), temporal Cypher WHERE r.ts >= $start; also visible in MATCH (a)-[:R]->(b) |
| Models | Structural relationship — "A is connected to B" | Event log — "A called B at time T with weight W" |
| Example | (service:auth)-[:DEPENDS_ON]->(service:payment) |
(service:auth)-[:CALLS_AT {ts: 1705000042, weight: 38ms}]->(service:payment) |
Use create_edge when the relationship is a permanent structural fact: schema dependencies, ontology hierarchies, entity co-occurrences, foreign key relationships.
Use create_edge_temporal when the relationship is a time-series event: service calls, metric emissions, log events, cost observations, anything you'll query by time window or aggregate over time.
The same node pair can have both: a structural DEPENDS_ON edge (created once) and thousands of temporal CALLS_AT events (one per call). Both are immediately visible in MATCH (a)-[r]->(b) — no rebuild required.
Deleting an edge:
engine.delete_edge("service:auth", "DEPENDS_ON", "service:payment")
# removes from rdf_edges SQL and kills ^KG("out",0,...) immediatelyNote — bulk ingest:
bulk_create_edgesis optimized for high-volume ingest (535M edges validated) and intentionally skips the per-edge^KGwrite for performance. Edges inserted in bulk are visible toMATCH/BFS only after callingBuildKG()at the end of the ingest session.bulk_create_edges_temporaldoes write^KGimmediately.create_edge(single) always writes immediately.
import time
# Single edge
engine.create_edge_temporal(
source="service:auth",
predicate="CALLS_AT",
target="service:payment",
timestamp=int(time.time()),
weight=42.7, # latency_ms, metric value, or 1.0
)
# Bulk ingest — 134K+ edges/sec (RE2-TT benchmark, 535M edges validated)
edges = [
{"s": "service:auth", "p": "CALLS_AT", "o": "service:payment", "ts": 1712000000, "w": 42.7},
{"s": "service:payment", "p": "CALLS_AT", "o": "db:postgres", "ts": 1712000001, "w": 8.1},
{"s": "service:auth", "p": "EMITS_METRIC_AT","o": "metric:cpu", "ts": 1712000000, "w": 73.2},
]
engine.bulk_create_edges_temporal(edges)now = int(time.time())
# All calls from auth in the last 5 minutes
edges = engine.get_edges_in_window(
source="service:auth",
predicate="CALLS_AT",
start=now - 300,
end=now,
)
# [{"s": "service:auth", "p": "CALLS_AT", "o": "service:payment", "ts": 1712000042, "w": 38.2}, ...]
# Edge velocity — call count in last N seconds (reads pre-aggregated bucket, O(1))
velocity = engine.get_edge_velocity("service:auth", window_seconds=300)
# 847
# Burst detection — which nodes exceeded threshold in last N seconds
bursts = engine.find_burst_nodes(predicate="CALLS_AT", window_seconds=60, threshold=500)
# [{"id": "service:auth", "velocity": 1243}, {"id": "service:checkout", "velocity": 731}]now = int(time.time())
# Average latency for auth→payment calls in the last 5 minutes
avg_latency = engine.get_temporal_aggregate(
source="service:auth",
predicate="CALLS_AT",
metric="avg", # "count" | "sum" | "avg" | "min" | "max"
ts_start=now - 300,
ts_end=now,
)
# 41.3 (float, milliseconds)
# All metrics for count, and extremes
count = engine.get_temporal_aggregate("service:auth", "CALLS_AT", "count", now-300, now)
p_min = engine.get_temporal_aggregate("service:auth", "CALLS_AT", "min", now-300, now)
p_max = engine.get_temporal_aggregate("service:auth", "CALLS_AT", "max", now-300, now)
# GROUP BY source — all services, CALLS_AT, last 5 minutes
groups = engine.get_bucket_groups(predicate="CALLS_AT", ts_start=now-300, ts_end=now)
# [
# {"source": "service:auth", "predicate": "CALLS_AT", "count": 847, "avg": 41.3, "min": 2.1, "max": 312.0},
# {"source": "service:checkout", "predicate": "CALLS_AT", "count": 312, "avg": 28.7, "min": 1.4, "max": 189.0},
# ...
# ]
# COUNT DISTINCT targets — fanout detection (16-register HLL, ~26% error, good for threshold detection)
distinct_targets = engine.get_distinct_count("service:auth", "CALLS_AT", now-3600, now)
# 14 (distinct services called by auth in last hour)# Attach arbitrary attributes to any temporal edge
engine.create_edge_temporal(
source="service:auth",
predicate="CALLS_AT",
target="service:payment",
timestamp=1712000000,
weight=42.7,
attrs={"trace_id": "abc123", "status": 200, "region": "us-east-1"},
)
# Retrieve attributes
attrs = engine.get_edge_attrs(
ts=1712000000,
source="service:auth",
predicate="CALLS_AT",
target="service:payment",
)
# {"trace_id": "abc123", "status": 200, "region": "us-east-1"}# Export temporal edges for a time window
engine.export_temporal_edges_ndjson(
path="traces_2026-04-01.ndjson",
start=1743465600,
end=1743552000,
)
# Import — resume an ingest from a file
engine.import_graph_ndjson("traces_2026-04-01.ndjson")// Ingest
Do ##class(Graph.KG.TemporalIndex).InsertEdge("svc:auth","CALLS_AT","svc:pay",ts,42.7,"")
// Bulk ingest (JSON array)
Set n = ##class(Graph.KG.TemporalIndex).BulkInsert(edgesJSON)
// Query window — returns JSON array
Set result = ##class(Graph.KG.TemporalIndex).QueryWindow("svc:auth","CALLS_AT",tsStart,tsEnd)
// Pre-aggregated average latency
Set avg = ##class(Graph.KG.TemporalIndex).GetAggregate("svc:auth","CALLS_AT","avg",tsStart,tsEnd)
// GROUP BY source
Set groups = ##class(Graph.KG.TemporalIndex).GetBucketGroups("CALLS_AT",tsStart,tsEnd)
// COUNT DISTINCT targets (HLL)
Set n = ##class(Graph.KG.TemporalIndex).GetDistinctCount("svc:auth","CALLS_AT",tsStart,tsEnd)engine.vec_create_index("drugs", 384, "cosine")
engine.vec_insert("drugs", "metformin", embedding_vector)
engine.vec_build("drugs")
results = engine.vec_search("drugs", query_vector, k=5)
# [{"id": "metformin", "score": 0.95}, ...]Inverted File with Flat quantization — Python k-means build, pure ObjectScript query. Tunable nprobe recall/speed tradeoff; nprobe=nlist gives exact results.
# Build: reads kg_NodeEmbeddings, runs MiniBatchKMeans, stores ^IVF globals
result = engine.ivf_build("kg_idx", nlist=256, metric="cosine")
# {"nlist": 256, "indexed": 10000, "dim": 768}
# Search: finds nprobe nearest centroids, scores their cells
results = engine.ivf_search("kg_idx", query_vector, k=10, nprobe=32)
# [("NCIT:C12345", 0.97), ("NCIT:C67890", 0.94), ...]
# Lifecycle
info = engine.ivf_info("kg_idx") # {"nlist":256,"dim":768,"indexed":10000,...}
engine.ivf_drop("kg_idx")Cypher:
CALL ivg.ivf.search('kg_idx', $query_vec, 10, 32) YIELD node, score
RETURN node, score ORDER BY score DESCGlobal storage: ^IVF(name, "cfg"|"centroid"|"list") — independent of ^KG, ^VecIdx, ^PLAID, ^BM25Idx.
Embed every graph triple as a natural-language sentence and search relationships semantically. Useful for retrieving the edges most similar to a free-text query — e.g., "drug strongly associated with autoimmune disease".
engine = IRISGraphEngine(conn, embedding_dimension=768)
engine.initialize_schema()
engine.embed_edges(
text_fn=lambda s, p, o: f"{s} {p.replace('_', ' ')} {o}",
batch_size=500,
)
results = engine.edge_vector_search(
query_embedding=my_encoder.encode("drug associated with autoimmune disease"),
top_k=10,
score_threshold=0.7,
)
for r in results:
print(r["s"], r["p"], r["o_id"], r["score"])embed_edges(model, text_fn, where, batch_size, force, progress_callback) -> dict
| Param | Default | Description |
|---|---|---|
text_fn |
lambda s,p,o: f"{s} {p} {o}" |
Serializes each triple to the string that gets embedded |
where |
None | SQL fragment on (s, p, o_id) to embed a subset — e.g. "p = 'associated_with'" |
force |
False | Re-embed edges already in kg_EdgeEmbeddings |
batch_size |
500 | Edges per batch; commits after each batch |
Returns {"embedded": int, "skipped": int, "errors": int, "total": int}. Restores the original embedder in a finally block.
edge_vector_search(query_embedding, top_k=10, score_threshold=None) -> list[dict]
Returns [{"s": str, "p": str, "o_id": str, "score": float}, ...] sorted descending by cosine similarity. The kg_EdgeEmbeddings table (VECTOR(DOUBLE, {dim}), composite PK on (s, p, o_id)) is included in save_snapshot() / restore_snapshot() — edge embeddings survive a snapshot round-trip without re-embedding.
# Build: Python K-means + ObjectScript inverted index
engine.plaid_build("colbert_idx", docs) # docs = [{"id": "x", "tokens": [[f1,...], ...]}, ...]
# Search: single server-side call, pure $vectorop
results = engine.plaid_search("colbert_idx", query_tokens, k=10)
# [{"id": "doc_3", "score": 0.94}, ...]Finds the minimum-cost path between two nodes using Dijkstra's algorithm. Unlike shortestPath() which minimizes hops, this minimizes the sum of edge weights.
Edge weights come from the numeric value stored in ^KG("out",0,s,p,o) — set automatically when you call create_edge or WriteAdjacency with a weight parameter.
# Store weighted edges
engine.create_node("svc:auth")
engine.create_node("svc:db")
iris_obj = engine._iris_obj()
iris_obj.classMethodVoid("Graph.KG.EdgeScan", "WriteAdjacency",
"svc:auth", "CALLS", "svc:db", "5.2") # weight=5.2ms latency
iris_obj.classMethodVoid("Graph.KG.EdgeScan", "WriteAdjacency",
"svc:auth", "CALLS", "svc:cache", "0.3")
iris_obj.classMethodVoid("Graph.KG.EdgeScan", "WriteAdjacency",
"svc:cache", "CALLS", "svc:db", "0.8")-- Minimum-latency path (prefers cache hop at cost 1.1 over direct at cost 5.2)
CALL ivg.shortestPath.weighted(
'svc:auth', 'svc:db',
'weight',
9999,
10
) YIELD path, totalCost
RETURN path, totalCostReturns:
{
"nodes": ["svc:auth", "svc:cache", "svc:db"],
"rels": ["CALLS", "CALLS"],
"costs": [0.3, 0.8],
"length": 2,
"totalCost": 1.1
}Parameters: (from, to, weightProp, maxCost, maxHops)
| Parameter | Description | Default |
|---|---|---|
from |
Source node ID (string or $param) |
required |
to |
Target node ID | required |
weightProp |
Edge weight property name (currently uses ^KG value) |
"weight" |
maxCost |
Stop searching if cost exceeds this | 9999 |
maxHops |
Maximum path length | 10 |
YIELD columns: path (JSON with nodes/rels/costs/length/totalCost), totalCost (float)
Falls back to unit weight (1.0 per hop = equivalent to BFS) when no weight is stored for an edge.
-- Filter edges by timestamp — routes to ^KG("tout") B-tree, O(results)
MATCH (a)-[r:CALLS_AT]->(b)
WHERE r.ts >= $start AND r.ts <= $end
RETURN r.ts, r.weight
ORDER BY r.ts DESC
-- Temporal + property filter
MATCH (a:Service)-[r:CALLS_AT]->(b)
WHERE r.ts >= $start AND r.ts <= $end
AND r.weight > 1000
RETURN a.id, b.id, r.ts, r.weight
ORDER BY r.weight DESC
-- Inbound direction — routes to ^KG("tin")
MATCH (b:Service)<-[r:CALLS_AT]-(a)
WHERE r.ts >= $start AND r.ts <= $end
RETURN a.id, b.id, r.tsSweet spot: Temporal Cypher is designed for trajectory-style queries (≤~50 edges, ordered output). For aggregation over large windows, use
get_temporal_aggregate()/get_bucket_groups()— these are O(1) pre-aggregated and 400× faster.
-- Named paths
MATCH p = (a:Service)-[r:CALLS]->(b:Service)
WHERE a.id = 'auth'
RETURN p, length(p), nodes(p), relationships(p)
-- Variable-length paths
MATCH (a:Service)-[:CALLS*1..3]->(b:Service)
WHERE a.id = 'auth'
RETURN b.id
-- Shortest path between two nodes (v1.49.0+)
MATCH p = shortestPath((a {id: $from})-[*..8]-(b {id: $to}))
RETURN p, length(p), nodes(p), relationships(p)
-- All shortest paths — returns every minimum-length path
MATCH p = allShortestPaths((a {id: $from})-[*..8]-(b {id: $to}))
RETURN p
-- CASE WHEN
MATCH (n:Service)
RETURN n.id,
CASE WHEN n.calls > 1000 THEN 'high' WHEN n.calls > 100 THEN 'medium' ELSE 'low' END AS load
-- UNION
MATCH (n:ServiceA) RETURN n.id
UNION
MATCH (n:ServiceB) RETURN n.id
-- Vector search in Cypher
CALL ivg.vector.search('Service', 'embedding', [0.1, 0.2, ...], 5) YIELD node, score
RETURN node, scorefrom iris_vector_graph.operators import IRISGraphOperators
ops = IRISGraphOperators(conn)
# Personalized PageRank
scores = ops.kg_PAGERANK(seed_entities=["service:auth"], damping=0.85)
# K-hop subgraph
subgraph = ops.kg_SUBGRAPH(seed_ids=["service:auth"], k_hops=3)
# PPR-guided subgraph (prevents k^n blowup)
guided = ops.kg_PPR_GUIDED_SUBGRAPH(seed_ids=["service:auth"], top_k=50, max_hops=5)
# Community detection
communities = ops.kg_CDLP()
components = ops.kg_WCC()# Load ICD-10→MeSH mappings from UMLS MRCONSO
# python scripts/ingest/load_umls_bridges.py --mrconso /path/to/MRCONSO.RRF
anchors = engine.get_kg_anchors(icd_codes=["J18.0", "E11.9"])
# → ["MeSH:D001996", "MeSH:D003924"] (filtered to nodes in KG)| Global | Purpose |
|---|---|
^KG("out", s, p, o) |
Knowledge graph — outbound edges |
^KG("in", o, p, s) |
Knowledge graph — inbound edges |
^KG("tout", ts, s, p, o) |
Temporal index — outbound, ordered by timestamp |
^KG("tin", ts, o, p, s) |
Temporal index — inbound, ordered by timestamp |
^KG("bucket", bucket, s) |
Pre-aggregated edge count per 5-minute bucket |
^KG("tagg", bucket, s, p, key) |
Pre-aggregated COUNT/SUM/MIN/MAX/HLL per bucket |
^KG("edgeprop", ts, s, p, o, key) |
Rich edge attributes |
^NKG |
Integer-encoded ^KG for Arno acceleration |
^VecIdx |
VecIndex RP-tree ANN |
^PLAID |
PLAID multi-vector |
^BM25Idx |
BM25 lexical search index |
| Table | Purpose |
|---|---|
nodes |
Node registry (node_id PK) |
rdf_edges |
Edges (s, p, o_id) |
rdf_labels |
Node labels (s, label) |
rdf_props |
Node properties (s, key, val) |
kg_NodeEmbeddings |
HNSW vector index (id, emb VECTOR) |
kg_EdgeEmbeddings |
Triple embeddings (s, p, o_id, emb VECTOR) — composite PK |
fhir_bridges |
ICD-10→MeSH clinical code mappings |
| Class | Key Methods |
|---|---|
Graph.KG.TemporalIndex |
InsertEdge, BulkInsert, QueryWindow, GetVelocity, FindBursts, GetAggregate, GetBucketGroups, GetDistinctCount, Purge |
Graph.KG.VecIndex |
Create, InsertJSON, Build, SearchJSON, SearchMultiJSON, InsertBatchJSON |
Graph.KG.PLAIDSearch |
StoreCentroids, BuildInvertedIndex, Search |
Graph.KG.PageRank |
RunJson, PageRankGlobalJson |
Graph.KG.Algorithms |
WCCJson, CDLPJson |
Graph.KG.Subgraph |
SubgraphJson, PPRGuidedJson |
Graph.KG.Traversal |
BuildKG, BuildNKG, BFSFastJson, ShortestPathJson |
Graph.KG.BulkLoader |
BulkLoad (INSERT %NOINDEX %NOCHECK + %BuildIndices) |
Graph.KG.BM25Index |
Build, Search, Insert, Drop, Info, SearchProc (kg_BM25 stored procedure) |
Graph.KG.IVFIndex |
Build, Search, Drop, Info, SearchProc (kg_IVF stored procedure) |
Graph.KG.EdgeScan |
MatchEdges (Graph_KG.MatchEdges stored procedure), WriteAdjacency, DeleteAdjacency |
| Operation | Latency | Dataset |
|---|---|---|
| Temporal edge ingest | 134K edges/sec | RE2-TT 535M edges, Enterprise IRIS |
| Window query (selective) | 0.1ms | O(results), B-tree traversal |
| GetAggregate (1 bucket, 5min) | 0.085ms | 50K-edge dataset |
| GetAggregate (288 buckets, 24hr) | 0.160ms | O(buckets), not O(edges) |
| GetBucketGroups (3 sources, 1hr) | 0.193ms | |
| GetDistinctCount (1 bucket) | 0.101ms | 16-register HLL |
| VecIndex search (1K vecs, 128-dim) | 4ms | RP-tree + $vectorop SIMD |
| HNSW search (143K vecs, 768-dim) | 1.7ms | Native IRIS VECTOR index |
| PLAID search (500 docs, 4 tokens) | ~14ms | Centroid scoring + MaxSim |
| BM25Index search (174 nodes, 3-term) | 0.3ms | Pure ObjectScript $Order posting-list |
| PPR (10K nodes) | 62ms | Pure ObjectScript |
| 1-hop neighbors | 0.3ms | $Order on ^KG |
- Python SDK Reference
- Architecture
- Schema Reference
- Temporal Graph Full Spec
- Setup Guide
- Testing Policy
- feat:
embed_edges(model, text_fn, where, batch_size, force, progress_callback)— embed every(s, p, o_id)triple intokg_EdgeEmbeddings(VECTOR(DOUBLE))(spec 065) - feat:
edge_vector_search(query_embedding, top_k, score_threshold)— cosine similarity search over edge embeddings - feat:
kg_EdgeEmbeddingsadded to schema DDL (CREATE TABLE IF NOT EXISTS, composite PK),get_schema_status()required tables, and snapshot save/restore - Default text serialization:
"{s} {p} {o_id}"— caller-overridable viatext_fn;force=Falseskips already-embedded edges; mirrorsembed_nodesAPI exactly
- feat:
startNode(r)andendNode(r)functions — return source/target node IDs from a relationship variable - feat: Property access on function call results —
startNode(r).id,endNode(r).nameetc - fix:
UNWIND relationships(p) AS r RETURN startNode(r).id, endNode(r).id, type(r)— canonical path unpacking pattern now works
- feat:
engine.save_snapshot(path)— portable.ivgZIP: SQL tables as NDJSON + globals as NDJSON (endian-safe, cross-version) (spec 064) - feat:
IRISGraphEngine.snapshot_info(path)— @staticmethod, no connection needed; metadata header with IRIS version, ivg version, has_vector_sql - feat:
engine.restore_snapshot(path, merge=False)— destructive or additive restore; UPSERT on merge - feat:
engine.get_unembedded_nodes()— find nodes with no embedding after restore - feat:
embed_fnanduse_iris_embeddingparams on IRISGraphEngine.init - feat:
Graph.KG.SnapshotObjectScript class for file I/O helpers - fix: save_snapshot skips IRIS RowID columns (edge_id etc) — prevents non-insertable column errors on restore
- 5 E2E tests: roundtrip, snapshot_info staticmethod, destructive restore, merge restore, globals BFS after restore
- feat:
CALL ivg.shortestPath.weighted(from, to, weightProp, maxCost, maxHops) YIELD path, totalCost— Dijkstra minimum-cost path in pure ObjectScript - Uses edge weights from
^KG("out",0,...)globals (set by create_edge WriteAdjacency) - Falls back to unit weight 1.0 when weightProp not found
- Supports directed ("out") and undirected ("both") traversal
- 4 E2E tests: prefer lower-cost longer path, no path, same source/target, unit weight fallback
- fix: Bug 6 final — SQLCODE -400 on rdf_edges CREATE INDEX now debug-level (ALTER TABLE fallback handles it)
- fix: type(r) now returns edge predicate column (e.p) not node_id
- fix: id(n) now returns actual node_id column
- feat: =~ regex match operator — translates to IRIS %MATCHES
- fix: N-Quads import captures graph URI from quad's 4th element as graph_id
- fix: Bug 6 (final) — SQLCODE -400 on rdf_edges index creation now falls back to ALTER TABLE ADD INDEX; all standard indexes created even when Graph.KG.Edge class was never compiled
- fix: Graph.KG.Edge/TestEdge persistent classes excluded from ObjectScript deploy (fix DDL table ownership conflict — Bug 6)
- fix: conftest removes conflicting .cls before LoadDir
- fix: apoc.meta.data() samples all nodes per label via JOIN on rdf_labels (no longer skips labels with no first-node properties)
- feat: import_rdf/bulk_create_edges/create_edge_temporal/bulk_create_edges_temporal all accept graph= parameter
- feat: USE GRAPH filtering now strict (exact graph_id match, no NULL leakage)
- feat: UNIQUE constraint updated to (s,p,o_id,graph_id) allowing same triple in multiple named graphs
- feat: db.schema.relTypeProperties() returns actual relationship property names
- fix: import_rdf _ensure_node uses WHERE NOT EXISTS (no duplicate key errors)
- fix: import_rdf edge INSERT scoped to graph_id in WHERE NOT EXISTS check
- fix: graph_id column uses %EXACT for case-sensitive storage
- test: 8 E2E tests proving fail-before/pass-after for all 5 FRs (spec 061)
- fix: initialize_schema() idempotent — "already has index" suppressed (Bug 1)
- fix: idx_props_val_ifind (iFind) and idx_edges_confidence (JSON_VALUE) now optional — graceful skip on Community (Bugs 2+3)
- test: 6 new E2E schema init tests covering idempotency, required tables, optional indexes, core procedures (spec 060)
- fix: materialize_inference respects named graphs — inferred triples use correct graph_id (spec 055)
- fix: materialize_inference/retract_inference accept graph= parameter
- feat: Cypher % (modulo → MOD) and ^ (power → POWER) operators (spec 056)
- feat: FOREACH clause —
FOREACH (x IN list | update_clause)(spec 057) - fix: EXISTS { (n)-[r]->(m) } with edge patterns now works; MATCH keyword optional inside EXISTS (spec 058)
- feat: Pattern comprehension
[(a)-[r]->(b) | proj]collecting edge projections (spec 059)
- feat:
engine.materialize_inference(rules="rdfs"|"owl")— transitive subClassOf/subPropertyOf closure, rdf:type inheritance, domain/range, OWL equivalentClass/inverseOf/TransitiveProperty/SymmetricProperty - feat:
engine.retract_inference()— removes all inferred triples, restoring asserted-only graph - feat:
import_rdf(path, infer="rdfs")— runs inference automatically after load - Inferred triples tagged
qualifiers={"inferred":true}for easy exclusion
- feat: Named graphs —
create_edge(graph='name'),list_graphs(),drop_graph(name) - feat:
USE GRAPH 'name' MATCH (a)-[r]->(b)Cypher syntax adds graph_id filter - feat: Schema migration —
graph_idcolumn added tordf_edges(idempotent, run on initialize_schema)
- feat:
engine.import_rdf(path)— load Turtle (.ttl), N-Triples (.nt), N-Quads (.nq) into the graph - Format auto-detected from extension; streaming batch ingest; blank node synthetic IDs; language tags preserved
- feat:
ALL/ANY/NONE/SINGLE(x IN list WHERE ...)list predicate expressions - feat:
[x IN list WHERE pred | proj]list comprehensions - feat:
reduce(acc = init, x IN list | body)reduce expressions - feat:
filter()/extract()legacy list functions as aliases - feat: Arithmetic operators
+,-,*,/in Cypher expressions
- feat:
apoc.meta.data()returns proper schema columns — LangChainNeo4jGraph()connects without error - feat:
apoc.meta.schema()returns schema summary
- feat:
keys(n)returns node property keys via rdf_props subquery - feat:
range(start, end)andrange(start, end, step)generate integer lists - feat:
size(list)uses JSON_ARRAYLENGTH;head(),last(),tail(),isEmpty()implemented
- Fix:
initialize_schema()createsSQLUser.*views automatically — no more manual DEFAULT_SCHEMA workaround - Fix:
initialize_schema()detects pre-compiled ObjectScript classes via%Dictionary— fast 0.2ms PPR path activates correctly instead of falling back to 1800ms Python path
- Fix:
MATCH (a)-[r]->(b)with unbound source falls back tordf_edgesSQL (avoids IRIS SqlProc 32KB string limit for large graphs with 88K+ edges) MatchEdgesis now only used when source node ID is bound — safe path for single-node traversal
- Fix:
bulk_create_edgesnow callsBuildKG()after batch SQL — bulk-inserted static edges immediately visible to MATCH/BFS - Fix:
BuildKG()already uses shard-0^KG("out",0,...)layout (confirmed, no code change needed)
- Unified edge store PR-A —
MATCH (a)-[r]->(b)now returns both static and temporal edges (spec 048) Graph.KG.EdgeScan—MatchEdges(sourceId, predicate, shard)SqlProc scans^KG("out",0,...)globalscreate_edgewrites^KGsynchronously;delete_edge(new) kills^KGentry synchronously- Cypher
MATCH (a)-[r]->(b)routes toMatchEdgesCTE — no SQL JOIN on rdf_edges TemporalIndexand all traversal code updated to shard-0 layout- IVF index fixes:
$vector("double"), JSON float arrays, leading-zero scores,VECTOR(DOUBLE)schema - Parser: negative float literals in list expressions now work
shortestPath()/allShortestPaths()openCypher syntax — fixes parse error reported by mindwalk (spec 047)MATCH p = shortestPath((a {id:$from})-[*..8]-(b {id:$to})) RETURN pnow works end-to-endRETURN p→ JSON{"nodes":[...],"rels":[...],"length":N};RETURN length(p),nodes(p),relationships(p)all supportedallShortestPaths(...)returns all minimum-length paths (diamond graphs return both paths)Graph.KG.Traversal.ShortestPathJson— pure ObjectScript BFS with multi-parent backtracking for all-paths support- Parser fix:
[*..N](dot-dot without leading integer) now parses correctly - Parser fix: bare
--undirected relationship pattern now parses correctly - Translator/engine fix:
CREATEwithout RETURN clause no longer throwsUnboundLocalError
- IVFFlat vector index —
Graph.KG.IVFIndexObjectScript class +^IVFglobals (spec 046) ivf_build(name, nlist, metric, batch_size)— Python MiniBatchKMeans build fromkg_NodeEmbeddings; stores centroids + inverted lists as$vectorin^IVFglobalsivf_search(name, query, k, nprobe)— pure ObjectScript centroid scoring → cell scan → top-k;nprobe=nlistgives exact searchivf_drop(name)/ivf_info(name)— lifecycle managementGraph_KG.kg_IVFSQL stored procedure — enablesJSON_TABLECTE pattern- Cypher
CALL ivg.ivf.search(name, query_vec, k, nprobe) YIELD node, score - Translator fix:
ORDER BY <alias> DESCnow resolves SELECT-level aliases (e.g.count(r) AS deg) withoutUndefinederror cypher_api.py: Bolt TCP/WS sessions use dedicated IRIS connections (_make_engine) to prevent connection contention with HTTP handlers;threading.Lockon shared engine cachetest_bolt_server.py: fixed 2TestBoltSessionHellotests using deprecatedasyncio.get_event_loop().run_until_complete()→asyncio.run()
- Bolt 5.4 protocol server — TCP (port 7687) + WebSocket (port 8000). Standard graph drivers (Python, Java, Go, .NET), LangChain, and visualization tools connect via
bolt:// - Graph browser — bundled at
/browser/with force-directed visualization, schema sidebar,:sysinfo - Cypher HTTP API —
/api/cypher+ Bolt-compatible transactional endpoints. API key auth viaX-API-Key - System procedures —
db.labels(),db.relationshipTypes(),db.schema.visualization(),dbms.queryJmx(),SHOW DATABASES/PROCEDURES/FUNCTIONS - Graph object encoding —
RETURN n, r, mproduces typed Node/Relationship structures for visualization - SQL audit —
FETCH FIRST→TOP,DISTINCT TOPorder, IN clause chunking at 499 - Translator fixes — anonymous nodes, BM25 CTE literals, var-length min-hop, UNION ALL with LIMIT
- Embedding fixes — probe false negative, string model loading
scripts/load_demo_data.py— canonical dataset loader (NCIT + HLA immunology + embeddings + BM25)- 456 tests, 0 skipped
- BM25Index — pure ObjectScript Okapi BM25 lexical search over
^BM25Idxglobals. Zero SQL tables, no Enterprise license required. Graph.KG.BM25Index.Build(name, propsCSV)— indexes all graph nodes by specified text properties; returns{"indexed":N,"avgdl":F,"vocab_size":V}Graph.KG.BM25Index.Search(name, query, k)— Robertson BM25 scoring via$Orderposting-list traversal; returns JSON[{"id":nodeId,"score":S},...]Graph.KG.BM25Index.Insert(name, docId, text)— incremental document add/replace; updates IDF only for new document's terms (O(doc_length))Graph.KG.BM25Index.Drop(name)— O(1) Kill of full indexGraph.KG.BM25Index.Info(name)— returns{"N":N,"avgdl":F,"vocab_size":V}or{}if not found- Python wrappers:
engine.bm25_build(),bm25_search(),bm25_insert(),bm25_drop(),bm25_info() kg_TXTautomatic upgrade:_kg_TXT_fallbackdetects a"default"BM25 index and routes through BM25 instead of LIKE-based fallback- Cypher
CALL ivg.bm25.search(name, $query, k) YIELD node, score— Stage CTE usingGraph_KG.kg_BM25SQL stored procedure - Translator fix:
BM25andPPRCTEs now use own column names in RETURN clause (BM25.nodenotBM25.node_id) - SC-002 benchmark: 0.3ms median search on 174-node community IRIS instance
translate_relationship_pattern: inline property filters on relationship nodes were silently dropped —MATCH (t)-[:R]->(c {id: 'x'})returned all nodes instead of filtering. Fixed by applyingsource_node.propertiesandtarget_node.propertiesafter JOIN construction.vector_search:TO_VECTOR(?, DOUBLE, {dim})now includes explicit dimension in query cast, resolving type mismatch on IRIS 2025.1 when column dimension is known- 2 regression tests added (375 unit tests total)
embedded.py: auto-fixessys.pathshadowing — ensures/usr/irissys/lib/pythonis first so the embeddedirismodule takes priority over pip-installedintersystems_irispythonembedded.py: clear error message when shadowed iris (noiris.sql) is detected, naming the root cause- Documented the XD timeout constraint and embed_daemon pattern for long-running ML operations in embedded context
- 3 new tests covering path-fix and shadowing detection
embed_nodes: FK-safe delete — DELETE failure onkg_NodeEmbeddings(spurious FK error in embedded Python context) is silently ignored; INSERT proceeds correctlyvector_search: usesVECTOR_COSINE(TO_VECTOR(col), ...)so it works on both native VECTOR columns AND VARCHAR-stored vectors (e.g. DocChunk.VectorChunk from fhir-017)
embed_nodes(model, where, text_fn, batch_size, force, progress_callback)— incremental node embedding overGraph_KG.nodeswith SQL WHERE filter, custom text builder, and per-call model override. Unblocks mixed-ontology graphs (embed only KG8 nodes without re-embedding NCIT's 200K nodes).vector_search(table, vector_col, query_embedding, top_k, id_col, return_cols, score_threshold)— search any IRIS VECTOR column, not justkg_NodeEmbeddings. Works on DocChunk tables, RAG corpora, custom HNSW indexes.multi_vector_search(sources, query_embedding, top_k, fusion='rrf')— unified search across multiple IRIS VECTOR tables with RRF fusion. Returnssource_tableper result. Powers hybrid KG+FHIR document search.validate_vector_table(table, vector_col)— returns{dimension, row_count}for any IRIS VECTOR column.
- SQL Table Bridge — map existing IRIS SQL tables as virtual graph nodes/edges with zero data copy
engine.map_sql_table(table, id_column, label)— register any IRIS table as a Cypher-queryable node set; no ETL, no data movementengine.map_sql_relationship(source, predicate, target, target_fk=None, via_table=None)— FK and M:M join relationships traversable via Cypherengine.attach_embeddings_to_table(label, text_columns, force=False)— overlay HNSW vector search on existing table rowsengine.list_table_mappings(),remove_table_mapping(),reload_table_mappings()— mapping lifecycle management- Cypher
MATCH (n:MappedLabel)routes to registered SQL table with WHERE pushdown — O(SQL query), not O(copy) - Mixed queries:
MATCH (p:MappedPatient)-[:HAS_DOC]->(d:NativeDocument)spans both mapped and native nodes seamlessly - SQL mapping wins over native
Graph_KG.nodesrows for the same label (FR-016) TableNotMappedErrorraised with helpful message whenattach_embeddings_to_tableis called on unregistered label
EmbeddedConnectionandEmbeddedCursornow importable directly fromiris_vector_graph(top-level)IRISGraphEngine(iris.sql)— acceptsiris.sqlmodule directly; auto-wraps inEmbeddedConnection(no manual wrapper needed inside IRIS Language=python methods)load_obo(encoding=, encoding_errors='replace')— handles UTF-8 BOM and Latin-1 bytes from IRIS-written files; fixes NCIT.obo loading edge caseload_obo/load_networkxacceptprogress_callback=lambda n_nodes, n_edges: ...— called every 10K items; enables progress reporting for large ontologies (NCIT.obo: 200K+ concepts)- Verified: temporal Cypher (
WHERE r.ts >= $start AND r.ts <= $end) works end-to-end viaEmbeddedConnectionpath
- Cypher temporal edge filtering:
WHERE r.ts >= $start AND r.ts <= $endroutes MATCH patterns to^KG("tout")B-tree — O(results), not O(total edges) r.tsandr.weightaccessible in RETURN and ORDER BY on temporal edges- Inbound direction
(b)<-[r:P]-(a) WHERE r.ts >= $startroutes to^KG("tin") r.tswithout WHERE filter → NULL + query-level warning (prevents accidental full scans)r.weight > exprin WHERE applies as post-filter on temporal result set- Uses IRIS-compatible derived table subquery (not WITH CTE) — works on protocol 65 xDBC
w→weightcanonical field name in temporal CTE (consistent with v1.41.0 API aliases)- Sweet spot: trajectory queries ≤50 edges. For aggregation, use
get_temporal_aggregate().
get_edges_in_window()now returnssource/target/predicate/timestamp/weightaliases alongsides/o/p/ts/w— backward compatibleget_edges_in_window(direction="in")— query inbound edges by target node (uses^KG("tin"))create_edge_temporal(..., upsert=True)andbulk_create_edges_temporal(..., upsert=True)— skip write if edge already exists at that timestamppurge_before(ts)— delete all temporal edges older thants, with^KG("tagg")and^KG("bucket")cleanupGraph.KG.TemporalIndex.PurgeBefore(ts)andQueryWindowInbound(target, predicate, ts_start, ts_end)ObjectScript methods
iris_vector_graph.embedded.EmbeddedConnection— dbapi2 adapter for IRIS Language=python methods- Zero-boilerplate:
IRISGraphEngine(EmbeddedConnection())works inside IRIS identically to externaliris.connect() commit()/rollback()are intentional no-ops (IRIS manages transactions in embedded context)START TRANSACTION/COMMIT/ROLLBACKviacursor.execute()silently dropped (avoids<COMMAND>in wgproto jobs)fetchmany(),rowcount,descriptionfully implemented
- Pre-aggregated temporal analytics:
^KG("tagg")COUNT/SUM/AVG/MIN/MAX at O(1) GetAggregate,GetBucketGroups,GetDistinctCountObjectScript methodsget_temporal_aggregate(),get_bucket_groups(),get_distinct_count()Python wrappers- 16-register HyperLogLog COUNT DISTINCT (SHA1, ~26% error — suitable for fanout threshold detection)
- Benchmark: 134K–157K edges/sec sustained across RE2-TT/RE2-OB/RE1-TT (535M edges total)
- Rich edge properties:
^KG("edgeprop", ts, s, p, o, key)— arbitrary typed attributes per temporal edge get_edge_attrs(),create_edge_temporal(attrs={...})- NDJSON import/export:
import_graph_ndjson(),export_graph_ndjson(),export_temporal_edges_ndjson()
- Temporal property graph:
create_edge_temporal(),bulk_create_edges_temporal() get_edges_in_window(),get_edge_velocity(),find_burst_nodes()^KG("tout"/"tin"/"bucket")globals — bidirectional time-indexed edge storeGraph.KG.TemporalIndexObjectScript class
- UNION / UNION ALL in Cypher
- EXISTS {} subquery predicates
- Variable-length paths:
MATCH (a)-[:REL*1..5]->(b)via BFSFastJson bridge
- CASE WHEN / THEN / ELSE / END in Cypher RETURN and WHERE
- CAST functions:
toInteger(),toFloat(),toString(),toBoolean()
- RDF 1.2 reification API:
reify_edge(),get_reifications(),delete_reification()
- BulkLoader:
INSERT %NOINDEX %NOCHECK+%BuildIndices— 46K rows/sec SQL ingest - RDF 1.2 reification schema DDL
- OBO ontology ingest:
load_obo(),load_networkx()
- Lightweight install — base requires only
intersystems-irispython - Optional extras:
[full],[plaid],[dev],[ml],[visualization],[biodata]
- PLAID multi-vector retrieval —
PLAIDSearch.clspure ObjectScript +$vectorop - PLAID packed token storage: 53
$Order→ 1$Get
- VecIndex nprobe recall fix (counts leaf visits, not branch points)
- Annoy-style two-means tree splitting (fixes degenerate trees)
- Batch APIs:
SearchMultiJSON,InsertBatchJSON
- VecIndex RP-tree ANN
SearchJSON/InsertJSON— eliminated xecute path (250ms → 4ms)
- Arno acceleration wrappers:
khop(),ppr(),random_walk()
^NKGinteger index for Arno acceleration
- FHIR-to-KG bridge:
fhir_bridgestable,get_kg_anchors(), UMLS MRCONSO ingest
- Cypher named path bindings, CALL subqueries, PPR-guided subgraph
License: MIT | Author: Thomas Dyar (thomas.dyar@intersystems.com)