AT Search

A decentralized search and discovery layer for AT Protocol objects, built as a Kademlia DHT overlay that routes queries to indexers without replacing AT Protocol's native identity, repo, or PDS model.

Monorepo layout and tooling

This repo is a single workspace: the root package.json lists "workspaces": ["packages/*"]. Bun is the supported tool for installs and scripts (bun install, bun run …); the lockfile is bun.lock. Run installs from the repository root so workspace:* links resolve.

Packages

Directory	Package name	Purpose
`packages/common`	`@atsearch/common`	Shared types, descriptor derivation, Ed25519 signing. `bun run --filter @atsearch/common build` runs `tsc` to `dist/`. `bun run --filter @atsearch/common test` runs Jest.
`packages/indexer`	`@atsearch/indexer`	SQLite index, DHT provider, `GET /pointers/:key`, `GET /record`. Depends on `@atsearch/common` via `workspace:`. `build`* → `tsc`; `dev` → `tsx watch`. `seed` → `tsx src/seed.ts`. The compiled app is run with `node dist/index.js` in production and in `scripts/run-demo.sh` because better-sqlite3 is built for Node’s native ABI (Bun cannot load that binary).
`packages/query-node`	`@atsearch/query-node`	Search API (`/search`, `/resolve`, `/interactions`), Slingshot/Constellation/XRPC hydration, ranking. Depends on `@atsearch/common` via `workspace:`. `build`* / `dev` same pattern as indexer (no better-sqlite3; Bun can run `dist/index.js`).
`packages/demo-client`	`@atsearch/demo-client`	SvelteKit UI. No `workspace:` dependency on `@atsearch/common`; it only calls the query node over HTTP. `vite build`* / `vite dev`.

Root scripts

Defined in the root package.json and meant to be run from the repo root:

Command	Effect
`bun install`	Installs all workspace dependencies and links `workspace:*` packages.
`bun run build`	`bun run --filter '*' build` — runs `build` in every package that defines it (build order follows Bun’s workspace scheduling; `common` should complete before dependents in practice).
`bun run dev`	Runs each package’s `dev` script in parallel (useful less often than running one package’s dev server).
`bun run test`	Runs `test` in each package that defines it (currently @atsearch/common).
`bun run lint`	`tsc --noEmit` / `svelte-check` per package where configured.
`bun run seed`	`bun run --filter @atsearch/indexer seed` — writes seed data into `ATSEARCH_DB_PATH` (default `./data/indexer.db`).
`bun run demo`	`bash scripts/run-demo.sh` — builds packages, seeds, then starts indexer (Node), query node (Bun), and the demo (Bun + Vite). See that script for ports and env.

Work on one package: cd packages/<name> && bun run dev, or from root bun run --filter @atsearch/<name> <script> (e.g. bun run --filter @atsearch/query-node dev).

Architecture

┌───────────────────────────────────────────────────────────────────┐
│  Publisher                                                        │
│  Writes records (com.example.thing) into their AT Proto repo     │
└─────────────────────────┬─────────────────────────────────────────┘
                          │ AT Proto firehose / polling
                          ▼
┌───────────────────────────────────────────────────────────────────┐
│  Indexer Node                                                     │
│  • Ingests com.example.thing records                             │
│  • Derives descriptor keys (type, tag, token, geo)               │
│  • Stores (uri, cid, descriptors) in SQLite                      │
│  • Advertises as DHT provider for each descriptor key            │
│  • Serves GET /pointers/:descriptorKey (signed PointerRecords)   │
└────────────┬──────────────────────────────┬──────────────────────┘
             │ DHT advertise                │ HTTP /pointers/:key
             ▼                              ▼
┌───────────────────┐           ┌───────────────────────────────────┐
│  Kademlia DHT     │           │  Query Node                       │
│  (libp2p kad-dht) │◄─────────►│  • Accepts GET /search?q=…        │
│  descriptorKey    │   find    │  • Derives descriptor keys        │
│    → providers    │  providers│  • Queries DHT for providers      │
└───────────────────┘           │  • Fetches pointer records        │
                                │  • Hydrates uri+cid via Slingshot → XRPC fallback → indexer cache │
                                │  • Constellation for likes/replies │
                                │  • Verifies CID on PDS            │
                                │  • Ranks and returns results      │
                                └────────────────┬──────────────────┘
                                                 │ JSON results
                                                 ▼
                                ┌────────────────────────────────────┐
                                │  Demo Client (SvelteKit + Vite)    │
                                │  • Search input                    │
                                │  • Results, pagination, interactions │
                                │  • Verification status             │
                                │  • Matched descriptors             │
                                └────────────────────────────────────┘

Data flow (conceptual)

See Monorepo layout and tooling for package names and how they are built and run.

Identity model

Why CID alone is insufficient

A CID (Content Identifier) is an immutable hash of a record's content at a specific point in time. Using a CID alone as an identity is incorrect for three reasons:

No location. A CID tells you nothing about where to fetch the record. You need the AT URI to resolve the DID to a PDS and fetch the record.
No ownership. The same content could be published by multiple different accounts under different URIs. Records with identical content (same CID) from different repositories are distinct objects.
No update path. When a record is updated, the CID changes. The AT URI remains stable. Treating CID as identity breaks identity across updates.

Strong reference: `uri + cid`

Every result in AT Search is represented as a strong reference:

uri: at://did:plc:abc/com.example.thing/123
cid: bafyreiXXXXXX

The URI (at://did/collection/rkey) is the durable identity. It tells you who published the record and where.
The CID is the immutable content hash for that specific version of the record.
Together uri + cid uniquely identifies a specific version of a specific record.

The current serving location (PDS endpoint) is derived at query time by resolving the DID — it is not stored as part of identity and can change without invalidating the reference.

Key invariants

Scenario	Behavior
Same CID, different URIs	Never deduplicated — these are different objects
Same URI, new CID	Treated as a new version — both uri+cid pairs are indexed
PDS migration	Identity unchanged — re-resolved from DID at fetch time
Stale DHT entry	Ignored if pointer `expiresAt` is in the past
Duplicate providers	Merged; candidates deduplicated by `uri + cid`

DHT vs AT Protocol roles

Concern	DHT layer	AT Protocol
Record storage	❌ never	✅ authoritative
Record identity	❌ never	✅ AT URI
Content truth	❌ never	✅ CID from PDS
Discovery hints	✅ descriptor → provider	❌
Pointer records	✅ (hint only, verified)	❌
DID resolution	❌	✅ PLC directory / did:web

The DHT is purely a discovery and hinting layer. Every query ultimately verifies results against the live AT Protocol PDS. No record content lives in the DHT.

Descriptor derivation

The indexer normalises several collections (see packages/common / packages/indexer ingest). From a com.example.thing record, descriptors include:

Descriptor type	Example	Source
`type:…`	`type:com.example.thing`	Static
`tag:…`	`tag:mutual-aid`	`record.tags`
`token:…`	`token:fridge`	Tokenized title + description
`geo:…`	`geo:c2`, `geo:c2b2`, `geo:c2b2n`	Geohash prefixes at lengths 2, 4, full

Rules: lowercase, stopwords removed, deduped.

Each descriptor key is hashed as sha256("atsearch:v1:" + key) before being used as a DHT CID.

Pointer record structure

Indexers serve signed pointer records over HTTP at GET /pointers/:descriptorKey.

interface PointerRecord {
  version: 1
  descriptorKey: string        // e.g. "tag:food"
  ref: {
    uri: string                // at://did/collection/rkey
    cid: string                // immutable content hash
  }
  providerPeerId: string       // libp2p peer ID of this indexer
  providerDid?: string         // optional AT Proto DID of this indexer
  indexedAt: string            // ISO timestamp
  expiresAt: string            // ISO timestamp (TTL: 24h default)
  signature: string            // Ed25519 signature (base64), covers all other fields
}

The signature covers all fields except signature itself, serialized as canonically sorted JSON. This allows query nodes to verify that pointer records have not been tampered with in transit.

Ranking

Results are scored as follows:

Condition	Score change
All query tokens match record	+5
Per matching token	+1
Per matching tag	+2
Geohash match	+2 per level
CID verified against PDS	+1
Expired pointer	-3
Fetch failure	-2
CID mismatch (stale pointer)	-2

Results are returned sorted descending by score.

Setup

Prerequisites

Bun 1.0+ (package manager and runtime used in this repo)

Install

bun install

Build

bun run build

Test

bun run test

Running the demo

Quick start (local mode, no AT Proto credentials needed)

# 1. Install dependencies
bun install

# 2. Build all packages
bun run build

# 3. Seed the database with synthetic records
bun run seed

# 4. Start the full stack
bun run demo

scripts/run-demo.sh uses Bun for installs, builds, seeding, the query node, and the demo client. The indexer process is started with node because better-sqlite3 is a native addon built for Node’s ABI; running that package under Bun’s runtime triggers NODE_MODULE_VERSION mismatch errors. Set NODE_BIN if node is not on your PATH.

The script checks that indexer HTTP, indexer DHT, query HTTP, query DHT, and Vite ports are free before starting (so you do not accidentally hit an old query node and get 404 on /interactions). Override ports if needed, for example:

INDEXER_PORT=3011 QUERY_PORT=3012 DEMO_PORT=5180 bun run demo

This will start:

Indexer on http://localhost:3001 (or INDEXER_PORT)
Query node on http://localhost:3002 (or QUERY_PORT)
Demo client on http://localhost:5173 (or DEMO_PORT; the script prints the real URL)

Open the printed URL and try queries like fridge, vancouver, mutual-aid. For a Bluesky / federated profile not present in your local SQLite index, enter a handle (user.bsky.social) or DID (did:plc:…) alone on the search line; the query node resolves it via Slingshot / public XRPC and returns app.bsky.actor.profile/self as a hit.

Deploy publicly (single VM, Docker Compose)

This repo includes a production-ready docker-compose.yml that runs the full stack on one host:

indexer: HTTP :3001, libp2p DHT :8001
query-node: HTTP :3002, libp2p DHT :8002
web: Caddy on :80/:443 serving the demo and proxying /api/* → query-node

Prereqs

Install Docker + Compose plugin
Point a DNS name at the server (A/AAAA record)
Allow inbound 80/tcp + 443/tcp
Optional: allow inbound 8001/tcp + 8002/tcp (if you want DHT reachability from other machines)

Configure environment

On the server (repo root), create .env with at least:

ATSEARCH_PUBLIC_HOSTNAME=atsearch.yourdomain.com

Recommended:

USE_MICROCOSM=true
MICROCOSM_SLINGSHOT_BASE_URL=https://slingshot.microcosm.blue
MICROCOSM_CONSTELLATION_BASE_URL=https://constellation.microcosm.blue
FALLBACK_ATPROTO_XRPC_BASE_URL=https://public.api.bsky.app
APP_USER_AGENT='at-search-demo/0.1 (public; +https://atsearch.yourdomain.com)'

# Optional: stable pointer signatures across restarts
# ATSEARCH_NODE_KEY=ed25519_private_key_hex

Launch

docker compose up -d --build

Then:

UI: https://atsearch.yourdomain.com
API (proxied): https://atsearch.yourdomain.com/api/health

Notes

The demo UI defaults to calling window.location.origin + "/api"; Caddy reverse-proxies that to the query node.
SQLite persists in the atsearch_data Docker volume.
If you do not want DHT ports exposed publicly, remove the 8001:8001 / 8002:8002 mappings from docker-compose.yml.
If you run another app on the same host and want it to call the search API, bind the query node to localhost only and call it at http://127.0.0.1:<port>. (The provided docker-compose.yml binds it to 127.0.0.1:13002.)

Manual start

# Terminal 1 — indexer
cd packages/indexer
ATSEARCH_LIBP2P_LISTEN_HOST=127.0.0.1 \
ATSEARCH_DB_PATH=../../data/indexer.db ATSEARCH_HTTP_PORT=3001 bun run dev

# Terminal 2 — query node
cd packages/query-node
USE_MICROCOSM=true \
MICROCOSM_SLINGSHOT_BASE_URL=https://slingshot.microcosm.blue \
MICROCOSM_CONSTELLATION_BASE_URL=https://constellation.microcosm.blue \
FALLBACK_ATPROTO_XRPC_BASE_URL=https://public.api.bsky.app \
APP_USER_AGENT='at-search-demo/0.1 (manual)' \
ATSEARCH_LIBP2P_LISTEN_HOST=127.0.0.1 \
ATSEARCH_HTTP_PORT=3002 ATSEARCH_INDEXER_URLS=http://localhost:3001 bun run dev

# Terminal 3 — demo client
cd packages/demo-client
# The dev server proxies /api → query-node (see vite.config.ts), so you don't need VITE_QUERY_API_URL.
# If you changed the query node port, set VITE_QUERY_PROXY_TARGET accordingly.
# Example: VITE_QUERY_PROXY_TARGET=http://127.0.0.1:3012 bun run dev -- --port 5180
bun run dev -- --port 5173

With AT Proto credentials (poll mode)

ATSEARCH_MODE=poll \
ATSEARCH_PDS_URL=https://bsky.social \
ATSEARCH_POLL_DIDS=did:plc:yourDid \
ATSEARCH_DB_PATH=./data/indexer.db \
ATSEARCH_HTTP_PORT=3001 \
  bun run --filter @atsearch/indexer dev

Environment variables

Deploy to Render.com

This repo includes a Render Blueprint at render.yaml that provisions:

at-search-web (public) — static demo + /api proxy
at-search-query-node (private) — Fastify API
at-search-indexer (private) — Fastify indexer + SQLite disk

Steps

Push this repo to GitHub/GitLab.
In Render, choose New → Blueprint and select the repository.
Render will detect render.yaml and propose creating the three services.
After deploy completes, open the at-search-web URL; the UI will call /api/* on the same origin.

Notes (Render limitations)

Render services only expose a single HTTP port. The Blueprint sets ATSEARCH_DHT_PORT=0 to disable DHT listeners by default.
The indexer uses an attached disk mounted at /data and stores SQLite at /data/indexer.db.
If you want to run ATSEARCH_MODE=poll, set ATSEARCH_MODE=poll, ATSEARCH_PDS_URL, and ATSEARCH_POLL_DIDS on the at-search-indexer service in Render.

Indexer

Variable	Default	Description
`ATSEARCH_MODE`	`local`	`local`, `poll`, or `jetstream` (jetstream logs a warning; use `MIGRATION_MICROCOSM.md`)
`ATSEARCH_PDS_URL`	`https://bsky.social`	PDS base URL
`ATSEARCH_HANDLE`	—	AT handle (for auth; poll-related tooling if used)
`ATSEARCH_PASSWORD`	—	App password
`ATSEARCH_DHT_BOOTSTRAP`	—	Comma-separated bootstrap multiaddrs
`ATSEARCH_DHT_PORT`	`8001`	libp2p TCP listen port
`ATSEARCH_LIBP2P_LISTEN_HOST`	`127.0.0.1`	IPv4 for libp2p TCP bind (`0.0.0.0` can fail on some hosts with `ERR_NO_VALID_ADDRESSES`)
`ATSEARCH_HTTP_PORT`	`3001`	HTTP server port
`ATSEARCH_DB_PATH`	`./data/indexer.db`	SQLite database path
`ATSEARCH_NODE_KEY`	—	Ed25519 private key hex (persisted signing key)

Query node

Variable	Default	Description
`ATSEARCH_DHT_BOOTSTRAP`	—	Comma-separated bootstrap multiaddrs
`ATSEARCH_DHT_PORT`	`8002`	libp2p TCP listen port
`ATSEARCH_LIBP2P_LISTEN_HOST`	`127.0.0.1`	Same as indexer; libp2p bind address
`ATSEARCH_HTTP_PORT`	`3002`	HTTP server port
`ATSEARCH_INDEXER_URLS`	`http://localhost:3001`	Comma-separated indexer base URLs
`USE_MICROCOSM`	`true` when set; if unset, defaults to on when `MICROCOSM_SLINGSHOT_BASE_URL` is non-empty	Prefer Slingshot for record/identity reads
`MICROCOSM_SLINGSHOT_BASE_URL`	—	e.g. `https://slingshot.microcosm.blue`
`MICROCOSM_CONSTELLATION_BASE_URL`	—	e.g. `https://constellation.microcosm.blue` (for `GET /interactions` / backlinks)
`FALLBACK_ATPROTO_XRPC_BASE_URL`	`https://public.api.bsky.app`	App View / public XRPC when Slingshot does not serve a method
`APP_USER_AGENT`	`at-search-demo/0.1 (local dev)`	User-Agent on outbound Microcosm + XRPC requests

Why Microcosm?

Slingshot is an edge cache for public com.atproto.* reads (records, handles). The query node tries it first, then falls back to direct XRPC and finally the indexer’s SQLite cache (seed / poll data).

Constellation indexes backlinks from firehose-seen records. The demo uses it for likes and replies on Bluesky posts (GET /interactions?subjectUri=…), not for full-text search.

Demo client

Variable	Default	Description
`VITE_QUERY_API_URL`	(unset)	Optional override for query node base URL. If unset, the demo calls `window.location.origin + "/api"` (and in dev, Vite proxies `/api` to the local query node).

Seed data

The seed script (bun run seed) inserts 12 synthetic records covering:

Vancouver community fridge (2 records: original + updated version)
East Van Tool Library (overlapping tags and geo)
Seattle Capitol Hill Fridge (overlapping token fridge)
Toronto Kensington food pantry (overlapping tag food)
London Hackney Bike Repair (different geography)
Berlin Kreuzberg Free Shop (different geography)
Duplicate content record: identical to the Vancouver fridge but different URI (demonstrates CID-alone deduplication is wrong)
Updated Vancouver fridge: same URI as original but new CID (demonstrates versioning)
Stale Community Garden (old createdAt, exercises stale pointer path)
Auckland Grey Lynn Repair Cafe
Sydney Newtown Tool Library (overlapping tag tools)
Vancouver Seed Library (overlapping geo)

Limitations

Firehose full decode not implemented. The firehose emits CAR-encoded blocks. Full record extraction requires @atproto/sync + DAG-CBOR decoding; the prototype logs seen records but falls back to PDS polling for actual content.
DHT → HTTP peer resolution not implemented. In production, DHT findProviders returns multiaddrs that need to be resolved to HTTP endpoints. The prototype uses configured ATSEARCH_INDEXER_URLS directly (DHT discovery supplements but does not replace this in the current implementation).
No DHT record persistence. libp2p kad-dht provider records expire; real indexers would need periodic re-advertisement.
CID computation. AT Protocol CIDs are DAG-CBOR encoded. We cannot recompute them from JSON. Instead, we compare against the CID returned by the PDS via com.atproto.repo.getRecord. This is correct but means verification requires a live PDS.
No authentication for write operations. The indexer's pointer signing keys are ephemeral by default. In production, keys should be persisted and associated with a verified AT Proto DID.
SQLite is single-node. For production, a distributed store or at least replicated SQLite would be needed.
No rate limiting on the query node or indexer HTTP endpoints.

Future work

Full CAR block decoding for firehose ingestion
DHT multiaddr → HTTP endpoint resolution (removes need for out-of-band indexer URL configuration)
DID-anchored indexer identity (indexer signs with DID key)
Geohash range queries (expand geo prefix search)
Lexicon-driven descriptor derivation (support any AT Proto record type)
Server-side search pagination (the demo UI paginates 10 results per page client-side)
Persistent DHT routing tables
HTTPS + TLS for indexer HTTP endpoints
Query node result caching
Prometheus metrics for query latency, DHT lookup time, verification success rate

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
deploy		deploy
lexicons		lexicons
packages		packages
scripts		scripts
.env.example		.env.example
.gitignore		.gitignore
.npmrc		.npmrc
Dockerfile.demo-client		Dockerfile.demo-client
Dockerfile.indexer		Dockerfile.indexer
Dockerfile.query-node		Dockerfile.query-node
Dockerfile.web-render		Dockerfile.web-render
README.md		README.md
bun.lock		bun.lock
docker-compose.yml		docker-compose.yml
package.json		package.json
pnpm-workspace.yaml		pnpm-workspace.yaml
render.yaml		render.yaml
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

AT Search

Monorepo layout and tooling

Packages

Root scripts

Architecture

Data flow (conceptual)

Identity model

Why CID alone is insufficient

Strong reference: uri + cid

Key invariants

DHT vs AT Protocol roles

Descriptor derivation

Pointer record structure

Ranking

Setup

Prerequisites

Install

Build

Test

Running the demo

Quick start (local mode, no AT Proto credentials needed)

Deploy publicly (single VM, Docker Compose)

Prereqs

Configure environment

Launch

Notes

Manual start

With AT Proto credentials (poll mode)

Environment variables

Deploy to Render.com

Steps

Notes (Render limitations)

Indexer

Query node

Why Microcosm?

Demo client

Seed data

Limitations

Future work

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Strong reference: `uri + cid`

Packages