Description
Right now there's no way to know what's happening on a running nostream relay without connecting directly to the database. No connection counts, no event throughput, no latency data. You're flying blind.
Proposal
Two things:
A GET /metrics endpoint that returns the standard Prometheus text exposition format. Operators who already run Prometheus + Grafana can point a scrape config at it and start building dashboards without any extra tooling.
Example output:
```
# HELP nostream_connections_active Current number of active WebSocket connections
# TYPE nostream_connections_active gauge
nostream_connections_active 42
# HELP nostream_connections_total Total WebSocket connections since process start
# TYPE nostream_connections_total counter
nostream_connections_total 18340
# HELP nostream_events_received_total Events received from clients
# TYPE nostream_events_received_total counter
nostream_events_received_total{kind="1"} 52341
nostream_events_received_total{kind="7"} 12032
# HELP nostream_events_accepted_total Events written to the database
# TYPE nostream_events_accepted_total counter
nostream_events_accepted_total 48210
# HELP nostream_events_rejected_total Events rejected before storage
# TYPE nostream_events_rejected_total counter
nostream_events_rejected_total{reason="rate-limited"} 2891
nostream_events_rejected_total{reason="invalid"} 340
nostream_events_rejected_total{reason="blocked"} 900
# HELP nostream_subscriptions_active Current active REQ subscriptions
# TYPE nostream_subscriptions_active gauge
nostream_subscriptions_active 156
# HELP nostream_eose_duration_seconds Time from REQ received to EOSE sent
# TYPE nostream_eose_duration_seconds histogram
nostream_eose_duration_seconds_bucket{le="0.01"} 18200
nostream_eose_duration_seconds_bucket{le="0.05"} 31200
nostream_eose_duration_seconds_bucket{le="0.1"} 38900
nostream_eose_duration_seconds_bucket{le="0.5"} 42100
nostream_eose_duration_seconds_bucket{le="1"} 42800
nostream_eose_duration_seconds_bucket{le="5"} 42990
nostream_eose_duration_seconds_bucket{le="+Inf"} 43001
nostream_eose_duration_seconds_sum 4312.94
nostream_eose_duration_seconds_count 43001
# HELP nostream_db_query_duration_seconds Database query latency
# TYPE nostream_db_query_duration_seconds histogram
nostream_db_query_duration_seconds_bucket{le="0.005"} 39100
nostream_db_query_duration_seconds_bucket{le="0.01"} 45210
nostream_db_query_duration_seconds_bucket{le="0.025"} 47300
nostream_db_query_duration_seconds_bucket{le="0.05"} 48100
nostream_db_query_duration_seconds_bucket{le="0.1"} 48900
nostream_db_query_duration_seconds_bucket{le="0.5"} 49150
nostream_db_query_duration_seconds_bucket{le="+Inf"} 49200
nostream_db_query_duration_seconds_sum 621.33
nostream_db_query_duration_seconds_count 49200
```
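The histogram series above follow the Prometheus exposition rules: each `_bucket` count is cumulative, the `+Inf` bucket equals `_count`, and `_sum` is the total of all observed values. A minimal sketch of a histogram that produces output in that shape, assuming a hand-rolled implementation (the `Histogram` class and its methods are hypothetical, not existing nostream code):

```typescript
// Hypothetical hand-rolled histogram for illustration; not nostream's actual code.
class Histogram {
  private readonly name: string
  private readonly help: string
  private readonly bounds: number[] // ascending upper bounds; +Inf is implicit
  private readonly counts: number[] // per-bucket counts, not yet cumulative
  private sum = 0
  private count = 0

  constructor(name: string, help: string, bounds: number[]) {
    this.name = name
    this.help = help
    this.bounds = [...bounds].sort((a, b) => a - b)
    this.counts = this.bounds.map(() => 0)
  }

  observe(seconds: number): void {
    this.sum += seconds
    this.count += 1
    // Count the value in the first bucket whose upper bound contains it;
    // values above the largest bound only land in +Inf (i.e. _count).
    const i = this.bounds.findIndex((le) => seconds <= le)
    if (i !== -1) {
      this.counts[i] += 1
    }
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} histogram`]
    let cumulative = 0
    this.bounds.forEach((le, i) => {
      // Prometheus buckets are cumulative: each includes all smaller buckets.
      cumulative += this.counts[i]
      lines.push(`${this.name}_bucket{le="${le}"} ${cumulative}`)
    })
    lines.push(`${this.name}_bucket{le="+Inf"} ${this.count}`)
    lines.push(`${this.name}_sum ${this.sum}`)
    lines.push(`${this.name}_count ${this.count}`)
    return lines.join('\n') + '\n'
  }
}
```

With the bucket bounds from the example output, the EOSE metric would be created as `new Histogram('nostream_eose_duration_seconds', 'Time from REQ received to EOSE sent', [0.01, 0.05, 0.1, 0.5, 1, 5])`. A maintained client library such as prom-client is an alternative to hand-rolling this.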
Also, a simple GET /stats page for operators who don't run Grafana: a server-rendered HTML page using the same Bootstrap template pattern as the existing /, /invoices, and /terms pages.
Both endpoints are disabled by default and opt-in via settings.yaml.
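One way the opt-in could work, sketched under the assumption of a hypothetical `metrics` section in settings.yaml with separate `enabled` and `statsPage` flags (these key names and the `resolveObservabilityRoute` function are illustrative, not existing nostream settings or code). A missing section means both endpoints stay off:

```typescript
// Hypothetical shape of a new settings.yaml section; key names are illustrative.
interface MetricsSettings {
  enabled: boolean // serves GET /metrics
  statsPage: boolean // serves GET /stats
}

interface Settings {
  metrics?: MetricsSettings
}

type ObservabilityRoute = 'metrics' | 'stats' | 'not-found'

// Both endpoints stay off unless the operator sets the flag explicitly.
const DEFAULT_METRICS_SETTINGS: MetricsSettings = { enabled: false, statsPage: false }

function resolveObservabilityRoute(
  method: string,
  path: string,
  settings: Settings,
): ObservabilityRoute {
  // Values from settings.yaml override the disabled-by-default baseline.
  const metrics = { ...DEFAULT_METRICS_SETTINGS, ...settings.metrics }
  if (method !== 'GET') {
    return 'not-found'
  }
  if (path === '/metrics' && metrics.enabled) {
    return 'metrics'
  }
  if (path === '/stats' && metrics.statsPage) {
    return 'stats'
  }
  return 'not-found'
}
```

Treating a disabled endpoint as not found (rather than forbidden) is one option that keeps a default install's HTTP surface exactly as it is today.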
How it works
A MetricsStore singleton holds all counters and histograms in memory. Each component calls into it directly:
WebSocketServerAdapter increments connection counters on open/close
WebSocketAdapter tracks active subscription count
EventMessageHandler increments event counters per kind and per rejection reason
SubscribeMessageHandler records EOSE latency
EventRepository records query latency
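A minimal sketch of what the MetricsStore singleton could look like for the counters and gauges in the list above, assuming string-keyed metric names and label sets (the method names `inc`, `gaugeAdd`, `get`, and `render` are hypothetical; histograms are left out here):

```typescript
// Hypothetical sketch of the proposed MetricsStore singleton; names are illustrative.
type Labels = Record<string, string>
type SeriesMap = Map<string, Map<string, number>> // metric name -> label set -> value

// Render labels in a stable, escaped form so {kind: '1'} always maps to one series.
function labelKey(labels: Labels): string {
  const parts = Object.keys(labels)
    .sort()
    .map((k) => {
      const value = labels[k].replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n')
      return `${k}="${value}"`
    })
  return parts.length > 0 ? `{${parts.join(',')}}` : ''
}

class MetricsStore {
  private static instance: MetricsStore | undefined
  private readonly counters: SeriesMap = new Map()
  private readonly gauges: SeriesMap = new Map()

  private constructor() {}

  static getInstance(): MetricsStore {
    if (MetricsStore.instance === undefined) {
      MetricsStore.instance = new MetricsStore()
    }
    return MetricsStore.instance
  }

  // Counters only go up, e.g. nostream_events_received_total{kind="1"}.
  inc(name: string, labels: Labels = {}, by = 1): void {
    if (by < 0) {
      throw new RangeError(`counter ${name} cannot decrease`)
    }
    MetricsStore.add(this.counters, name, labels, by)
  }

  // Gauges move both ways, e.g. nostream_connections_active on open/close.
  gaugeAdd(name: string, delta: number, labels: Labels = {}): void {
    MetricsStore.add(this.gauges, name, labels, delta)
  }

  get(name: string, labels: Labels = {}): number {
    const key = labelKey(labels)
    return this.counters.get(name)?.get(key) ?? this.gauges.get(name)?.get(key) ?? 0
  }

  // Emit every series in Prometheus text format (HELP lines omitted for brevity).
  render(): string {
    const lines: string[] = []
    const emit = (type: string, store: SeriesMap): void => {
      store.forEach((series, name) => {
        lines.push(`# TYPE ${name} ${type}`)
        series.forEach((value, key) => lines.push(`${name}${key} ${value}`))
      })
    }
    emit('counter', this.counters)
    emit('gauge', this.gauges)
    return lines.join('\n') + '\n'
  }

  private static add(store: SeriesMap, name: string, labels: Labels, delta: number): void {
    let series = store.get(name)
    if (series === undefined) {
      series = new Map()
      store.set(name, series)
    }
    const key = labelKey(labels)
    series.set(key, (series.get(key) ?? 0) + delta)
  }
}
```

Under this sketch, WebSocketServerAdapter would call `inc('nostream_connections_total')` and `gaugeAdd('nostream_connections_active', 1)` on open and `gaugeAdd('nostream_connections_active', -1)` on close, and EventMessageHandler would call `inc('nostream_events_received_total', { kind: String(event.kind) })`. One thing to watch: clients choose the event kind, so a raw per-kind label is unbounded; folding uncommon kinds into a fixed value such as `other` would keep the number of series bounded.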
Plan
I'll split this into smaller PRs:
PR 1: MetricsStore + settings flag + tests. No routes yet, just the data structure.
PR 2: GET /metrics route + connection and subscription counters wired up. At this point you can actually curl the endpoint.
PR 3: Event counters (received by kind, accepted, and rejected by reason), wired into EventMessageHandler.
PR 4: Latency histograms: EOSE duration in the subscribe handler, query duration in the event repository.
PR 5: GET /stats HTML page for operators who prefer a browser over Prometheus.
PR 6: Docs: metric reference + a docker-compose example with a Prometheus + Grafana sidecar for operators who want to set up the full stack.
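For PR 4, both histograms need a duration measured around an async operation: the repository query, or the part of the subscribe handler that ends by sending EOSE. A minimal sketch, assuming a hypothetical `observeDuration` helper and any histogram-like object with an `observe(seconds)` method. It uses `performance.now()` (a monotonic clock, so wall-clock adjustments don't skew latencies), converts to seconds as Prometheus expects, and records the duration even when the operation rejects, so failed queries still show up in the latency data:

```typescript
// Anything with an observe(seconds) method, e.g. a histogram held by MetricsStore.
interface DurationSink {
  observe(seconds: number): void
}

// Hypothetical helper: times an async operation and records its duration in
// seconds (the Prometheus base unit), whether it resolves or rejects.
async function observeDuration<T>(
  sink: DurationSink,
  operation: () => Promise<T>,
  now: () => number = () => performance.now(), // milliseconds, monotonic
): Promise<T> {
  const start = now()
  try {
    return await operation()
  } finally {
    sink.observe((now() - start) / 1000)
  }
}
```

The injectable `now` parameter exists only so the helper can be unit-tested with a fake clock; production call sites would omit it.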