22 sources · Methodology v1.2 · Revised 2026-05-31
Technical documentation · May 2026

How Yard Registry Works

Yard Registry is a structured intelligence pipeline: raw public listing data flows through ingestion, deduplication, entity resolution, enrichment, and scoring layers to produce a clean, scored, actionable business graph across Jamaica's 14 parishes.

01 Data Ingestion

Yard Registry currently ingests from 17 institutional Jamaican sources: 12 are named publicly as anchor sources on the landing page, and 5 additional regulators, registries and industry bodies also contribute records. Every source is checked against robots.txt before any fetch begins. Only six fields are ever stored per record: name, category, phone, address/parish, website URL, and source attribution. See the full source index for individual dossiers.

Public anchor sources (12 of 17, named on landing page)

Additional institutional sources (5 of 17, methodology-only)

Non-negotiable rules: robots.txt is checked before every fetch. 403 or 429 responses cause immediate termination with no retry. Only the six listed fields are ever stored. No full-page scraping, no cookies, no tracking.
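The fetch rules above can be sketched roughly as follows. This is a minimal illustration only, not Yard Registry's actual crawler: the user agent string, the `SourceBlocked` exception, and both function names are hypothetical. It uses Python's standard `urllib.robotparser` to evaluate an already-downloaded robots.txt body, and treats 403 and 429 as terminal.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "YardRegistryBot"  # hypothetical user agent, not from the source


class SourceBlocked(Exception):
    """Raised when a source answers 403 or 429; the run stops, no retry."""


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt body permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def check_status(status: int) -> None:
    """Terminate immediately on 403/429 instead of retrying."""
    if status in (403, 429):
        raise SourceBlocked(f"source returned HTTP {status}; terminating without retry")
```

A caller would run `is_allowed` before each request and pass every response status through `check_status`, letting `SourceBlocked` end the run for that source.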

02 Normalisation

Before any record is stored, the raw business name passes through a normalisation pipeline: lowercasing, punctuation removal, common suffix stripping (Ltd, Jamaica, Co., etc.), and whitespace collapsing. This produces a normalized_name field used by all downstream deduplication logic.
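As a rough illustration of the steps just described, the sketch below applies them in the stated order. The suffix set contains only the three examples named above (the real list is longer and not published here), and the function name `normalize_name` and the choice to strip only trailing suffixes are assumptions.

```python
import re

# Illustrative only: the source names "Ltd", "Jamaica", "Co." and says "etc."
SUFFIXES = {"ltd", "jamaica", "co"}


def normalize_name(raw: str) -> str:
    """Lowercase, drop punctuation, strip trailing suffixes, collapse whitespace."""
    name = raw.lower()
    name = re.sub(r"[^\w\s]", "", name)  # punctuation removal
    tokens = name.split()                # split() also collapses whitespace
    # Assumption: strip suffixes from the end only, and never empty the name.
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

With these assumptions, `normalize_name("  Island Grill  Jamaica Ltd. ")` yields `"island grill"`.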

03 Entity Resolution & Deduplication

Raw listing records map to canonical business entities. Each canonical entity is the deduplicated, merged representation of a real-world business. Entity resolution uses two strategies:

Pairs scoring ≥ 0.92 similarity are auto-merged. Pairs scoring from 0.70 up to (but not including) 0.92 are queued for human review. Each canonical entity receives a stable Yard Registry identifier of the form JBIP-{PARISH_CODE}-{SEQ:06d}, where the sequence number is zero-padded to six digits (e.g. JBIP-KGN-000042).
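The threshold routing and identifier format can be sketched as below. The similarity function is a stand-in using `difflib.SequenceMatcher` from the standard library; the actual similarity measure is not specified here. Treating pairs below 0.70 as distinct is also an assumption, as are all function names.

```python
from difflib import SequenceMatcher

AUTO_MERGE_THRESHOLD = 0.92
REVIEW_THRESHOLD = 0.70


def similarity(a: str, b: str) -> float:
    """Stand-in similarity on normalised names (not the documented method)."""
    return SequenceMatcher(None, a, b).ratio()


def route_pair(score: float) -> str:
    """Map a similarity score to the action described in the text."""
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "human_review"
    return "distinct"  # assumption: pairs below 0.70 are left unmerged


def format_entity_id(parish_code: str, seq: int) -> str:
    """Build an identifier of the form JBIP-{PARISH_CODE}-{SEQ:06d}."""
    return f"JBIP-{parish_code}-{seq:06d}"
```

For example, `format_entity_id("KGN", 42)` produces `JBIP-KGN-000042`, matching the example above.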

04 Enrichment

After entity resolution, each entity goes through the enrichment pipeline. Three enrichers run:

Enrichment is batched in a background sweep running every 15 minutes. Force-re-enrichment is available via the CLI for entities where signals may have changed.

05 Digital Maturity Scoring

Each enriched entity receives a digital maturity tier based on its online presence profile:

Tier | Label | Criteria
A | Strong presence | Live website + social + phone verified
B | Good presence | Live website + at least one social signal
C | Partial presence | Social only or website with no social
D | Minimal presence | WhatsApp only or listing-only presence
F | No online presence | No detectable website, social, or WhatsApp
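One possible reading of the tier criteria, as a sketch: the `Presence` fields and the function name are hypothetical, and the table does not say how a listing-only business (tier D) is told apart from one with no detectable presence (tier F). In this sketch only a WhatsApp signal yields D, and everything else without a website or social signal falls to F.

```python
from dataclasses import dataclass


@dataclass
class Presence:
    # Hypothetical signal names; the real enrichment fields are not documented here.
    website_live: bool = False
    has_social: bool = False
    phone_verified: bool = False
    has_whatsapp: bool = False


def maturity_tier(p: Presence) -> str:
    """Assign a digital maturity tier (A, B, C, D, or F) from presence signals."""
    if p.website_live and p.has_social:
        # A additionally requires a verified phone; otherwise B.
        return "A" if p.phone_verified else "B"
    if p.website_live or p.has_social:
        return "C"  # social only, or website with no social
    if p.has_whatsapp:
        return "D"  # assumption: listing-only is not distinguished from F here
    return "F"
```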

06 Opportunity Scoring

The opportunity score (0–100) measures the readiness of a business for digital services. Higher scores indicate higher outreach value. Components:

Factor | Points | Logic
No website | +40 | Primary signal: business has no web presence
Phone verified | +20 | Phone number is listed and appears valid
Category demand | +15 | Category has historically high web conversion
Parish market size | +10 | Parish has larger addressable SMB market
Recency | +15 | Business was recently active/verified
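The five components sum to 100 (40 + 20 + 15 + 10 + 15). A minimal sketch of the additive score follows, assuming each factor is all-or-nothing; the source does not say whether partial points are awarded, and the signal key names are hypothetical.

```python
# Point values taken from the table; key names are illustrative.
WEIGHTS = {
    "no_website": 40,
    "phone_verified": 20,
    "category_demand": 15,
    "parish_market_size": 10,
    "recency": 15,
}


def opportunity_score(signals: dict[str, bool]) -> int:
    """Sum the points for every factor whose signal is present (0 to 100)."""
    return sum(points for key, points in WEIGHTS.items() if signals.get(key))
```

Under this assumption, a business with no website and a verified phone, but no other signals, scores 60.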

Scores map to tiers:

Scores are recalculated daily at 06:00 JMT by the background scheduler.

07 Geo Intelligence

Businesses with lat/lon coordinates (from OSM, JTB, or manual entry) are indexed into H3 hexagonal cells at resolutions 9, 7, and 5. Hex-cell aggregates (density, average score, website adoption rate) are recomputed weekly and exposed via the /geo API endpoints and the dashboard Map view.
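The per-cell aggregates could be computed along these lines. This sketch assumes each business has already been assigned a cell ID at a given resolution (the H3 indexing call itself is omitted), and it interprets "density" as the number of businesses per cell, which the source does not define. The record shape and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_cells(records):
    """Aggregate (cell_id, score, has_website) tuples into per-cell statistics."""
    by_cell = defaultdict(list)
    for cell_id, score, has_website in records:
        by_cell[cell_id].append((score, has_website))

    return {
        cell_id: {
            "density": len(rows),  # assumption: count of businesses in the cell
            "average_score": mean(score for score, _ in rows),
            "website_adoption_rate": sum(1 for _, w in rows if w) / len(rows),
        }
        for cell_id, rows in by_cell.items()
    }
```

Running the same function once per resolution would give the three aggregate layers mentioned above.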

08 Data accuracy disclaimer

Business data is aggregated from public sources and may contain errors, outdated listings, or duplicates that escaped the deduplication pipeline. Yard Registry data is intended as a starting point for market research and lead generation, not as a definitive source of record. We recommend verifying contact details before outreach.

Businesses wishing to be removed from the registry can email [email protected]. Removal requests are processed within 14 business days.
