Building a multi-signal fingerprint scorer that doesn't lie

What goes into a probabilistic match? Walk through the weights, the gotchas, and the test harness we use to keep accuracy honest.

Daniel Okafor, Principal Engineer · Apr 2, 2026 · 11 min read

Fingerprint attribution sounds like dark magic. It is not. It is five fields, a weighted similarity score, and a confidence threshold. The hard part is the discipline: choosing your weights, holding the line on the threshold, and writing the tests that keep both honest.

The five fields that earn their keep

  • IP address: the strongest single signal, but mobile carriers put many devices behind one shared NAT address, so two unrelated phones can present the same IP. Don't weight it above 0.5.
  • User-Agent: fine-grained on browsers, coarse on apps. Useful but noisy.
  • Screen dimensions: a device-class signal. Strong when paired with model hints.
  • Timezone: narrows geography without revealing it.
  • Locale: language + region. Cheap to compare, surprisingly discriminating.
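
// Weighted similarity between a click and an install, from 0 (nothing matches) to 1.
// The weights sum to 1.0, and IP stays under the 0.5 ceiling.
export function score(click: Signals, install: Signals): number {
  const ip = click.ip === install.ip ? 1 : 0; // exact match only
  const ua = uaSimilarity(click.ua, install.ua); // graded, 0..1
  const sc = click.screen === install.screen ? 1 : 0; // === assumes a primitive such as "1170x2532", not an object
  const tz = click.timezone === install.timezone ? 1 : 0;
  const lo = click.locale === install.locale ? 1 : 0;
  return 0.40 * ip + 0.25 * ua + 0.15 * sc + 0.10 * tz + 0.10 * lo;
}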
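
To make the weights concrete, here is a minimal, self-contained sketch that wires the scorer to a threshold and runs it on two made-up installs. Everything beyond the weights is an assumption for illustration: the Signals shape, the token-overlap stand-in for uaSimilarity, the 0.7 MATCH_THRESHOLD, and the sample values are not specified anywhere in this post.

```typescript
// Hypothetical shape for the five signals; field names follow the scorer above.
interface Signals {
  ip: string;
  ua: string;
  screen: string;   // assumed normalized string, e.g. "1170x2532"
  timezone: string; // assumed IANA name, e.g. "Europe/Berlin"
  locale: string;   // assumed language-region tag, e.g. "de-DE"
}

// Split a User-Agent into lowercase, de-duplicated tokens.
function tokenize(s: string): string[] {
  const parts = s.toLowerCase().split(/[\s;()\/,]+/).filter((t) => t.length > 0);
  return parts.filter((t, i) => parts.indexOf(t) === i);
}

// Stand-in for uaSimilarity: Jaccard overlap of tokens, 0..1.
// An assumption for this sketch, not the real implementation.
function uaSimilarity(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  if (ta.length === 0 && tb.length === 0) return 1;
  const shared = ta.filter((t) => tb.indexOf(t) !== -1).length;
  return shared / (ta.length + tb.length - shared);
}

// Same weights as the scorer in the article.
function score(click: Signals, install: Signals): number {
  const ip = click.ip === install.ip ? 1 : 0;
  const ua = uaSimilarity(click.ua, install.ua);
  const sc = click.screen === install.screen ? 1 : 0;
  const tz = click.timezone === install.timezone ? 1 : 0;
  const lo = click.locale === install.locale ? 1 : 0;
  return 0.40 * ip + 0.25 * ua + 0.15 * sc + 0.10 * tz + 0.10 * lo;
}

// Hypothetical threshold; the post does not state a production value.
const MATCH_THRESHOLD = 0.7;

function isMatch(click: Signals, install: Signals, threshold: number = MATCH_THRESHOLD): boolean {
  return score(click, install) >= threshold;
}

// Made-up sample data.
const click: Signals = {
  ip: "203.0.113.7",
  ua: "Mozilla/5.0 (iPhone; CPU iPhone OS 17_4 like Mac OS X)",
  screen: "1170x2532",
  timezone: "Europe/Berlin",
  locale: "de-DE",
};

// Same device a few seconds later: every field agrees.
const sameDevice: Signals = { ...click };

// A different phone behind the same carrier NAT: only the IP agrees.
const natNeighbor: Signals = {
  ip: "203.0.113.7",
  ua: "Dalvik/2.1.0 (Linux; U; Android 14; Pixel 8)",
  screen: "1080x2400",
  timezone: "Europe/London",
  locale: "en-GB",
};

console.log(score(click, sameDevice), isMatch(click, sameDevice));
console.log(score(click, natNeighbor), isMatch(click, natNeighbor));
```

The NAT neighbor scores exactly the IP weight, 0.4, and nothing more. That is the point of capping the IP weight: with any threshold above the cap, a shared IP on its own can never produce a match.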

The test harness

We keep two reference datasets: one of known-attributed installs (deterministic match available, fingerprint also computed) and one of known-organic installs (no preceding click). We compute precision and recall against both on every release and refuse to ship a scoring change that regresses either metric by more than half a percent.
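
A release gate along these lines can be sketched in a few functions. This is an illustration under stated assumptions, not our harness: the function names are hypothetical, the counts are synthetic, and "half a percent" is read as an absolute drop of 0.005 (0.5 percentage points). A relative reading would need a different comparison.

```typescript
interface Metrics {
  precision: number;
  recall: number;
}

// attributedHits[i]: did the fingerprint scorer match known-attributed install i?
// organicHits[i]: did it (wrongly) match known-organic install i to some click?
function evaluate(attributedHits: boolean[], organicHits: boolean[]): Metrics {
  const tp = attributedHits.filter((h) => h).length; // true positives
  const fn = attributedHits.length - tp;             // missed attributions
  const fp = organicHits.filter((h) => h).length;    // organic installs falsely matched
  return {
    // An empty denominator is reported as 0 so an empty run cannot look perfect.
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

// Names of the metrics that dropped by more than maxDrop versus the baseline.
// An empty result means the change may ship.
function regressions(baseline: Metrics, candidate: Metrics, maxDrop: number = 0.005): string[] {
  const failed: string[] = [];
  if (baseline.precision - candidate.precision > maxDrop) failed.push("precision");
  if (baseline.recall - candidate.recall > maxDrop) failed.push("recall");
  return failed;
}

// Helper for synthetic data: `matched` hits out of `total` installs.
function hits(total: number, matched: number): boolean[] {
  const out: boolean[] = [];
  for (let i = 0; i < total; i++) out.push(i < matched);
  return out;
}

// Synthetic numbers, invented for this example.
const baseline = evaluate(hits(1000, 950), hits(1000, 20));
const candidate = evaluate(hits(1000, 960), hits(1000, 40));

console.log(baseline, candidate, regressions(baseline, candidate));
```

In this synthetic run the candidate recovers ten more attributed installs, so recall improves, but it also doubles the false matches on organic traffic. Precision falls from about 0.979 to 0.96, and the gate reports `["precision"]`. Checking both metrics is what stops a weight change from buying recall with false attributions.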

If your fingerprint scorer doesn't have a precision-recall harness, you don't have a scorer — you have a vibe.
Tagged: #Engineering #Fingerprinting #Testing