Context
Identified during PR #71 review while implementing duration.method = "weibull_strat". The issue surfaced in the duration-fitting discussion but is broader than duration estimation: it affects the formation target stats (edges, nodefactor_*, nodematch_*, concurrent, absdiff_*) that come out of build_netparams() / build_netstats(). Opening as a separate issue per PI request.
The problem
ARTnet is a cross-sectional web-based survey with two non-standard sampling features that neither the legacy univariate fits nor the joint g-computation infrastructure (#61–#63) currently correct for:
1. Length-biased sampling
The survey asks about partners that were active within the past 12 months. For ongoing partnerships, the probability of being in the sample is proportional to partnership duration (longer partnerships intersect the recall window more often). For completed partnerships, a different bias applies: only those that ended within the past 12 months are included, which selectively drops older completed partnerships.
This shows up downstream:
- Respondents with longer main partnerships are overrepresented in the "has a main partner" sample, which biases md.main upward.
- Mean degree calculations conditional on age/race carry the same bias, potentially heterogeneously.
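To make the mechanism concrete, here is a minimal sketch in Python (illustration only, not the package's R code). It assumes the simplest form of the bias, inclusion probability exactly proportional to duration, and uses a made-up two-point duration distribution. The function names are hypothetical.

```python
from fractions import Fraction


def length_biased(pmf):
    """Length-biased version of a discrete duration distribution.

    pmf maps duration -> population probability. A partnership of
    duration d is assumed to be sampled with probability proportional to d.
    """
    mean = sum(d * p for d, p in pmf.items())
    return {d: d * p / mean for d, p in pmf.items()}


def mean_of(pmf):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(d * p for d, p in pmf.items())


# Hypothetical population: half of partnerships last 1 month, half last 9.
population = {1: Fraction(1, 2), 9: Fraction(1, 2)}
observed = length_biased(population)

true_mean = mean_of(population)     # 5
observed_mean = mean_of(observed)   # 41/5 = 8.2, biased upward

# Weighting each observed partnership by 1/d undoes the bias.
weights = {d: p / d for d, p in observed.items()}
corrected_mean = sum(d * w for d, w in weights.items()) / sum(weights.values())
```

In this toy case the observed mean duration is 8.2 against a true mean of 5, and inverse-duration weighting recovers 5 exactly. The real recall-window design is more complicated than pure proportionality, so this only shows the direction of the effect.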
2. 5-most-recent partner truncation
Respondents are asked about up to 5 most-recent partners per layer. For respondents with more than 5, the excess partners are not observed. Two effects:
- Right truncation of partnership count: true degree is underestimated for high-activity respondents.
- Selection bias on retained partners: the 5 most-recent tend to skew longer (if "most recent" orders by recency of activity rather than start date), further interacting with the length-bias issue above.
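The size of the first effect can be sketched in Python (illustration only, not the package's R code). The sketch assumes true degree is Poisson-distributed and that the recorded count is min(true count, 5). Both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def capped_mean(lam, cap=5):
    """Expected recorded count E[min(X, cap)] for X ~ Poisson(lam)."""
    below = sum(k * poisson_pmf(k, lam) for k in range(cap))
    tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(cap))
    return below + cap * tail


# Compare the true mean with the mean of the capped counts.
for lam in (0.5, 2.0, 6.0):
    print(f"true mean {lam:.1f} -> recorded mean {capped_mean(lam):.3f}")
```

Under these assumptions the cap is negligible for low-activity groups and matters only when a meaningful share of respondents exceed 5 partners, which is why the bias is expected to be heterogeneous across attributes.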
Scope: which target stats are affected
| Target stat | Affected by length-bias | Affected by 5-truncation |
| --- | --- | --- |
| md.main / md.casl | yes (upward) | yes (downward cap on high-degree) |
| nf.<attr> (all) | yes (heterogeneous by attr) | yes |
| concurrent | yes | yes (especially) |
| nf.deg.{main,casl,tot} | yes | yes |
| nm.<attr> (all) | partial (mixing by duration-correlated attrs) | partial |
| absdiff_* | partial | partial |
| durations (durs.*.byage) | yes (handled in #63 phase 3 via length-biased Weibull) | tangential |
Proposed approach (open for discussion)
- Survey the methodology literature: check whether the ARTnet paper (Weiss et al. 2020) describes how the univariate approach was designed to be robust to these biases, if at all. Also review similar literature on egocentric network sampling corrections (Vardi 1989, Asgharian et al. 2002, Krivitsky & Morris 2017 on egocentric inference).
- Length-biased correction for Poisson/binomial fits: the joint Poisson fits in netparams$<layer>$joint_model are currently standard MLE. A length-biased version would weight observations by 1 / P(obs | duration). This needs partnership-duration covariates on the RHS, which may not be directly available for the ego-level fits.
- Truncation correction: fit a truncated Poisson or zero-inflated alternative for the count-of-partners models. The truncation boundary (5) would need to appear explicitly in the likelihood.
- Reweighting: as an alternative, reweight observations by the inverse probability of inclusion, if we can estimate that probability.
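As a starting point for the truncation item, here is a minimal Python sketch (illustration only, not the package's R code) of what putting the boundary into the likelihood could look like. It swaps in a right-censored Poisson rather than a truncated one, on the assumption that respondents with more than 5 partners are recorded as having 5. It is intercept-only with no covariates, and every function name is hypothetical.

```python
import math


def poisson_logpmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam)."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def poisson_logsf(cap, lam):
    """log P(X >= cap) for X ~ Poisson(lam)."""
    cdf_below = sum(math.exp(poisson_logpmf(k, lam)) for k in range(cap))
    return math.log(max(1.0 - cdf_below, 1e-300))


def censored_loglik(lam, counts, cap=5):
    """Log-likelihood where a recorded count of `cap` means 'cap or more'."""
    ll = 0.0
    for y in counts:
        ll += poisson_logsf(cap, lam) if y >= cap else poisson_logpmf(y, lam)
    return ll


def fit_censored_poisson(counts, cap=5, lo=1e-6, hi=50.0, tol=1e-8):
    """Maximise the censored log-likelihood by golden-section search."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - phi * (b - a), a + phi * (b - a)
        if censored_loglik(c, counts, cap) > censored_loglik(d, counts, cap):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


# Made-up recorded counts; the two 5s are treated as "5 or more".
recorded = [0, 1, 2, 5, 5, 1]
naive_rate = sum(recorded) / len(recorded)
censored_rate = fit_censored_poisson(recorded)
```

With no counts at the cap, the estimate reduces to the sample mean. With capped counts present, it sits above the naive mean, which is the direction of correction expected for md.main / md.casl. Whether censoring or truncation is the right model depends on how the survey records respondents with more than 5 partners.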
Priority / impact
Unknown until we compare ARTnet-derived estimates against a length-bias-corrected baseline. Magnitudes:
Related