Length-biased sampling and 5-partnership truncation bias in formation target stats #72

## Context

Identified during PR #71 review while implementing `duration.method = "weibull_strat"`. The issue surfaced in the duration-fitting discussion but is *broader* than duration estimation: it affects the formation target stats (`edges`, `nodefactor_*`, `nodematch_*`, `concurrent`, `absdiff_*`) that come out of `build_netparams()` / `build_netstats()`. Opening as a separate issue per PI request.

## The problem

ARTnet is a **cross-sectional web-based survey** with two non-standard sampling features that neither the legacy univariate fits nor the joint g-computation infrastructure (#61–#63) currently correct for:

### 1. Length-biased sampling

The survey asks about partners that were active within the past 12 months. For **ongoing** partnerships, the probability of being in the sample is proportional to partnership duration (longer partnerships intersect the recall window more often). For **completed** partnerships, a different bias applies: only those that ended within the past 12 months are included, which selectively drops older completed partnerships.

This shows up downstream:
- Respondents with longer main partnerships are overrepresented in the "has a main partner" sample, which biases `md.main` upward.
- Mean degree calculations conditional on age/race carry the same bias, potentially heterogeneously.
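
As a rough illustration of the size of this effect: under pure length-biased sampling, a duration `t` is observed with density proportional to `t * f(t)`, so the observed mean exceeds the true mean. The sketch below is written in Python for illustration only and does not touch the ARTnet code. It assumes Gamma-distributed true durations (an arbitrary choice, not an ARTnet estimate), for which the length-biased distribution is again Gamma with shape increased by one. It compares the naive mean with a mean reweighted by `1 / t`. All function names are hypothetical.

```python
import random


def simulate_length_biased_durations(n, shape=3.0, scale=10.0, seed=1):
    """Draw n durations as seen through length-biased sampling.

    Assumption: true durations are Gamma(shape, scale). Weighting that
    density by t yields Gamma(shape + 1, scale), which we sample directly.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(shape + 1.0, scale) for _ in range(n)]


def naive_mean(durations):
    """Unweighted sample mean; biased upward under length-biased sampling."""
    return sum(durations) / len(durations)


def inverse_duration_weighted_mean(durations):
    """Weighted mean with weights w = 1/t.

    sum(w * t) / sum(w) simplifies to n / sum(1/t).
    """
    return len(durations) / sum(1.0 / t for t in durations)


TRUE_MEAN = 3.0 * 10.0  # shape * scale of the assumed true distribution
sample = simulate_length_biased_durations(50_000)
naive = naive_mean(sample)
corrected = inverse_duration_weighted_mean(sample)

print(f"true mean      : {TRUE_MEAN:.1f}")
print(f"naive mean     : {naive:.1f}")
print(f"reweighted mean: {corrected:.1f}")
```

The reweighted estimator is only well behaved when very short durations are rare; with heavier mass near zero (for example exponential durations) the `1 / t` weights have infinite variance, so this is a diagnostic sketch rather than a recommended estimator.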

### 2. 5-most-recent partner truncation

Respondents are asked about up to 5 most-recent partners per layer. For respondents with more than 5, the excess partners are not observed. Two effects:

- **Right truncation of partnership count**: true degree is underestimated for high-activity respondents.
- **Selection bias on retained partners**: the 5 most-recent tend to skew longer (if "most recent" orders by recency of activity rather than start date), further interacting with the length-bias issue above.
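
To get a feel for how much a cap of 5 can pull down a mean degree, the sketch below computes `E[min(K, 5)]` exactly when the true count `K` is Poisson. The Poisson assumption and the rates shown are illustrative only; they are not ARTnet estimates, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def expected_capped_count(lam, cap=5):
    """E[min(K, cap)] for K ~ Poisson(lam)."""
    # Counts below the cap are observed as-is.
    below = sum(k * poisson_pmf(k, lam) for k in range(cap))
    # Everything at or above the cap is recorded as exactly `cap`.
    p_at_or_above = 1.0 - sum(poisson_pmf(k, lam) for k in range(cap))
    return below + cap * p_at_or_above


for lam in (0.5, 2.0, 4.0, 8.0):
    capped = expected_capped_count(lam)
    print(f"true mean {lam:4.1f} -> mean of capped counts {capped:5.2f}")
```

Under this assumption the cap is nearly invisible at low rates and dominates at high rates, which is consistent with the concern about high-activity respondents.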

## Scope: which target stats are affected

| Target stat | Affected by length-bias | Affected by 5-truncation |
|---|---|---|
| `md.main` / `md.casl` | yes (upward) | yes (downward cap on high-degree) |
| `nf.<attr>` (all) | yes (heterogeneous by attr) | yes |
| `concurrent` | yes | yes (especially) |
| `nf.deg.{main,casl,tot}` | yes | yes |
| `nm.<attr>` (all) | partial (mixing by duration-correlated attrs) | partial |
| `absdiff_*` | partial | partial |
| durations (`durs.*.byage`) | **yes (handled in #63 phase 3 via length-biased Weibull)** | tangential |

## Proposed approach (open for discussion)

1. **Survey the methodology literature**: check whether the ARTnet paper (Weiss et al. 2020) describes how the univariate approach was designed to be robust to these biases (if at all). Related work on sampling corrections includes Vardi 1989, Asgharian et al. 2002, and Krivitsky & Morris 2017 (egocentric inference).
2. **Length-biased correction for Poisson/binomial fits**: the joint Poisson fits in `netparams$<layer>$joint_model` are currently standard MLE. A length-biased version would weight observations by `1 / P(obs | duration)`. This needs partnership-duration covariates on the RHS, which may not be directly available for the ego-level fits.
3. **Truncation correction**: fit a truncated or right-censored Poisson (or a zero-inflated alternative) for count-of-partners models. The boundary (5) would need to appear explicitly in the likelihood.
4. **Reweighting**: alternatively, reweight observations by inverse probability of inclusion, if that probability can be estimated.
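
A minimal sketch of what putting the boundary into the likelihood could look like, written in Python for illustration rather than against the `netparams` fits. It assumes the cap acts as right-censoring (a recorded count of 5 means "5 or more"), uses an intercept-only Poisson model with simulated data, and maximises the log-likelihood with a simple golden-section search. All names and the rate of 4.0 are hypothetical.

```python
import math
import random

CAP = 5  # reporting cap per layer, taken from the issue text


def draw_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def censored_poisson_loglik(lam, counts, cap=CAP):
    """Log-likelihood where a recorded value of `cap` means 'cap or more'."""
    log_pmf = [-lam + k * math.log(lam) - math.lgamma(k + 1) for k in range(cap)]
    tail = max(1.0 - sum(math.exp(v) for v in log_pmf), 1e-300)
    log_tail = math.log(tail)
    return sum(log_tail if y >= cap else log_pmf[y] for y in counts)


def golden_section_max(f, lo, hi, tol=1e-6):
    """Maximise a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


rng = random.Random(7)
TRUE_RATE = 4.0  # illustrative value only
observed = [min(draw_poisson(rng, TRUE_RATE), CAP) for _ in range(5000)]

naive_rate = sum(observed) / len(observed)
corrected_rate = golden_section_max(
    lambda lam: censored_poisson_loglik(lam, observed), 0.05, 30.0
)
print(f"naive mean of capped counts: {naive_rate:.2f}")
print(f"censored-likelihood MLE    : {corrected_rate:.2f}")
```

A regression version would replace the single rate with a per-respondent rate from the linear predictor; whether the cap is better treated as censoring or truncation depends on how respondents with more than 5 partners are actually recorded.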

## Priority / impact

Unknown until we compare ARTnet-derived estimates against a length-bias-corrected baseline. Rough magnitudes so far:
- On duration estimation (already addressed in #63 phase 3): under naive Weibull the bias was catastrophic (`mean.dur` estimates off by ~1000x in heavily-censored strata).
- On formation stats: likely more modest but still material, especially for attributes correlated with partnership duration.

## Related

- Blocked by: completion of #63 (establishes the joint-fit infrastructure that sampling corrections would modify).
- Informs: any model projection onto a target population whose joint attribute distribution differs from ARTnet's (NHBS MSM, AMIS 2022-24 projection) — the sampling corrections would matter more in those settings.
- Discussion: see PR #71 thread, comment chain starting "why the weibull model failed".
