Methodology

How We Verify Economic Relationships

Every relationship in RedGraphs is traceable to primary evidence and subject to governed verification.

For the data schema, relationship taxonomy, and temporal model, see Business Relationship Analytics.

Infrastructure Design

Built for Long-Horizon Coverage and Accuracy

An institutional-grade economic relationship graph requires solving entity resolution, temporal normalization, evidence binding, and human validation simultaneously, across millions of documents, in multiple jurisdictions, over decades of filings.

Compounding Validation

Each relationship that passes governed human review improves entity resolution, strengthens extraction models, and enriches the graph structure. The system becomes more accurate with each relationship added.

Long-Horizon Historical Coverage

The Economic Relationship Graph covers filings from 2005 to the present, enabling point-in-time analysis and temporal comparisons across two decades of economic relationships.

Evidence-Bound Architecture

Every relationship traces to a specific sentence in a specific document. A purpose-built governance layer maintains this provenance chain across extraction, validation, correction, and publication.

Multi-Jurisdiction Normalization

Different regulatory regimes produce different filing formats, disclosure requirements, and fiscal calendars. RedGraphs normalizes these into a single coherent graph through jurisdiction-specific processing pipelines.

01

Verification Lifecycle

Relationships progress through a defined lifecycle with clear state transitions and audit requirements.

CANDIDATE

Extracted from source document. Pending entity resolution.

RESOLVED

Entities matched to canonical records. Awaiting verification.

VERIFIED

Reviewed and confirmed. Evidence citation validated.

PUBLISHED

Available in graph with full provenance chain.
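
The four states above can be read as a small state machine. The sketch below is illustrative only: the state names come from this page, but the `ALLOWED` table and the `advance` helper are hypothetical, and it assumes transitions move strictly forward one step at a time, since no rejection or rollback paths are described here.

```python
from enum import Enum


class State(Enum):
    CANDIDATE = "CANDIDATE"
    RESOLVED = "RESOLVED"
    VERIFIED = "VERIFIED"
    PUBLISHED = "PUBLISHED"


# Assumption: strictly forward, one step at a time. Rejection or rollback
# transitions are not described in the source, so none are modeled.
ALLOWED = {
    State.CANDIDATE: {State.RESOLVED},
    State.RESOLVED: {State.VERIFIED},
    State.VERIFIED: {State.PUBLISHED},
    State.PUBLISHED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the target state if the transition is allowed, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


# Walk one relationship through the full lifecycle.
state = State.CANDIDATE
for nxt in (State.RESOLVED, State.VERIFIED, State.PUBLISHED):
    state = advance(state, nxt)
```

Keeping the allowed transitions in one table means a skipped step, such as publishing a candidate that was never verified, fails loudly instead of silently succeeding.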

02

Extraction Classification

Every relationship is classified by how it was derived from source evidence. The classification determines the relationship's confidence ceiling and audit requirements.

The classification below describes RedGraphs' internal validation methodology. For source data, the statusType field distinguishes between ACTUAL (extracted from disclosure) and ESTIMATED (computed) values.

Extracted

Direct verbatim or near-verbatim match from source document. Value stated explicitly with no transformation.

Tier: Highest

Derived

Calculated from one or more extracted values using a documented formula. Derivation logic is auditable.

Tier: Medium

Inferred

Relationship implied by contextual evidence but not explicitly stated. Lower confidence ceiling.

Tier: Lowest

Classification Rules

  • Relationships default to the lowest-confidence classification supported by evidence
  • Extraction method is immutable once assigned; changes require new relationship creation
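
One way to picture the two rules above is with an immutable record type. This is a sketch under assumptions: the `Method`, `Relationship`, `classify`, and `reclassify` names are hypothetical, and `classify` encodes one reading of the first rule, namely that when evidence supports more than one classification the lowest-confidence one is chosen.

```python
from dataclasses import dataclass, FrozenInstanceError
from enum import IntEnum


class Method(IntEnum):
    # Ordered so that a lower number means a lower confidence tier.
    INFERRED = 1
    DERIVED = 2
    EXTRACTED = 3


@dataclass(frozen=True)
class Relationship:
    relationship_id: str
    method: Method


def classify(supported: set[Method]) -> Method:
    """Pick the lowest-confidence method among those the evidence supports."""
    if not supported:
        raise ValueError("no classification is supported by the evidence")
    return min(supported)


def reclassify(old: Relationship, new_id: str, method: Method) -> Relationship:
    """A changed method yields a new relationship; the old one is untouched."""
    return Relationship(relationship_id=new_id, method=method)


rel = Relationship("rel_1", classify({Method.EXTRACTED, Method.DERIVED}))
try:
    rel.method = Method.EXTRACTED  # frozen dataclass: assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

replacement = reclassify(rel, "rel_2", Method.EXTRACTED)
```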

03

Auditability

Every state change is logged with timestamp, actor, and justification. The full history of any relationship is retrievable.

{
  "audit_trail": [
    {
      "timestamp": "2024-01-15T09:12:00Z",
      "action": "CANDIDATE_CREATED",
      "actor": "extraction_pipeline_v3",
      "details": {
        "source_document": "doc_p0o9i8u7y6t5",
        "confidence": 0.87
      }
    },
    {
      "timestamp": "2024-01-15T11:45:00Z",
      "action": "ENTITIES_RESOLVED",
      "actor": "resolution_service",
      "details": {
        "from_entity": "ent_a1b2c3d4e5f6",
        "to_entity": "ent_z9y8x7w6v5u4"
      }
    },
    {
      "timestamp": "2024-01-15T14:32:00Z",
      "action": "VERIFIED",
      "actor": "analyst_review_queue",
      "details": {
        "reviewer": "reviewer_8h7g6f5e",
        "verification_type": "institutional"
      }
    },
    {
      "timestamp": "2024-01-15T14:33:00Z",
      "action": "PUBLISHED",
      "actor": "publication_service",
      "details": {
        "graph_version": "v2024.01.15.001"
      }
    }
  ]
}

Immutable History

Audit records are append-only. Historical states cannot be modified or deleted.

Attribution

Every action is attributed to a specific actor, whether automated system or human reviewer.
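
A minimal sketch of an append-only, attributed log follows. The `AuditRecord` and `AuditLog` classes are hypothetical and in-memory only; a production system would enforce immutability at the storage layer, which this example does not attempt to show.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class AuditRecord:
    timestamp: datetime
    action: str
    actor: str
    justification: str


class AuditLog:
    """In-memory log that exposes append and read operations only."""

    def __init__(self) -> None:
        self._records: list[AuditRecord] = []

    def append(self, action: str, actor: str, justification: str) -> AuditRecord:
        if not actor:
            raise ValueError("every action must be attributed to an actor")
        record = AuditRecord(datetime.now(timezone.utc), action, actor, justification)
        self._records.append(record)
        return record

    def history(self) -> Tuple[AuditRecord, ...]:
        # Return an immutable snapshot so callers cannot edit or drop entries.
        return tuple(self._records)


log = AuditLog()
log.append("CANDIDATE_CREATED", "extraction_pipeline_v3", "extracted from filing")
log.append("VERIFIED", "analyst_review_queue", "evidence citation validated")
```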

Confidence Model

Confidence scores are bounded by evidence tier. A relationship supported only by lower-tier sources cannot exceed the ceiling for that tier regardless of other factors.

Tier-Based Ceilings

Confidence cannot exceed the ceiling of the highest-quality evidence supporting the relationship. Multiple lower-tier sources do not raise confidence above their tier's ceiling.
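
The ceiling rule amounts to taking a minimum. In the sketch below the tier numbers follow the four tiers listed under Evidence Hierarchy, but the numeric ceilings in `TIER_CEILING` are invented for illustration; this page does not publish the actual values.

```python
# Hypothetical ceilings per evidence tier (1 = regulatory filings, 4 = third-party).
# The numbers are placeholders, not values from the source.
TIER_CEILING = {1: 0.99, 2: 0.95, 3: 0.80, 4: 0.60}


def bounded_confidence(raw_score: float, evidence_tiers: list[int]) -> float:
    """Cap a raw score at the ceiling of the best (lowest-numbered) tier present."""
    if not evidence_tiers:
        raise ValueError("a relationship needs at least one evidence record")
    best_tier = min(evidence_tiers)
    return min(raw_score, TIER_CEILING[best_tier])


# Three tier-4 sources still cap at the tier-4 ceiling.
only_low_tier = bounded_confidence(0.97, [4, 4, 4])
# One tier-1 source lifts the ceiling to the tier-1 value.
with_filing = bounded_confidence(0.97, [1, 4])
```

Because only the best tier is consulted, adding more sources of the same tier leaves the ceiling unchanged.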

Decay and Refresh

Confidence decays over time without corroborating evidence. Relationships without recent evidence are flagged for review.
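
The decay function is not specified on this page. As one possible shape, the sketch below assumes exponential decay with a fixed half-life and a simple age threshold for the review flag; `HALF_LIFE_DAYS` and `REVIEW_AFTER_DAYS` are hypothetical parameters.

```python
from datetime import date

HALF_LIFE_DAYS = 365.0   # hypothetical: confidence halves after one year
REVIEW_AFTER_DAYS = 730  # hypothetical: flag after two years without evidence


def decayed_confidence(confidence: float, last_evidence: date, as_of: date) -> float:
    """Apply exponential decay based on days since the last corroborating evidence."""
    age_days = max((as_of - last_evidence).days, 0)
    return confidence * 0.5 ** (age_days / HALF_LIFE_DAYS)


def needs_review(last_evidence: date, as_of: date) -> bool:
    """Flag relationships whose most recent evidence is older than the threshold."""
    return (as_of - last_evidence).days > REVIEW_AFTER_DAYS


one_year_old = decayed_confidence(0.8, date(2023, 1, 1), date(2024, 1, 1))
```

New corroborating evidence would reset `last_evidence`, which restores the score up to the tier ceiling.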

04

Evidence Hierarchy

Source evidence is ranked by reliability. The evidence tier determines the confidence ceiling for any relationship it supports.

Tier  Source Type                                                      Confidence Level
1     Regulatory filings (SEC, FCA, corporate registries)              Highest
2     Audited financial statements                                     High
3     Corporate disclosures (press releases, investor presentations)   Medium
4     Third-party databases (with citation)                            Lower

For the temporal model and point-in-time reconstruction rules, see Business Relationship Analytics: Temporal Model.

05

Estimating Unknown Values

Public disclosures are incomplete by nature. Companies are required to report material relationships, but the monetary values of many disclosed relationships are not stated explicitly. A graph with missing edge weights is structurally accurate but analytically limited.

RedGraphs addresses this through a constrained estimation framework. Where a relationship is established but its monetary value is not disclosed, the system infers the missing value using observed constraints: total revenue, total cost, and the values of other relationships attached to the same entity. The estimation preserves observed totals and respects disclosed values as fixed inputs.

The approach uses constrained probabilistic inversion and matrix completion techniques. Known values anchor the estimate. Unknown values are inferred to be consistent with the observed financial structure of each entity. The result is a fully weighted graph suitable for downstream analysis: concentration metrics, exposure modeling, and systemic risk assessment.
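
To make the constraints concrete, here is a deliberately simplified stand-in, not the patented method and not matrix completion. It assumes the listed relationships account for the whole observed total, keeps disclosed values fixed, and splits the residual across undisclosed edges in proportion to a hypothetical `prior_weight`. The function name and all inputs are illustrative.

```python
from typing import Optional


def estimate_missing(
    total: float,
    disclosed: dict[str, Optional[float]],
    prior_weight: dict[str, float],
) -> dict[str, tuple[float, str]]:
    """Fill undisclosed edge values so that all edges sum to `total`.

    Disclosed values are kept as-is (ACTUAL); the residual is split across
    undisclosed edges in proportion to `prior_weight` (ESTIMATED).
    """
    known = {k: v for k, v in disclosed.items() if v is not None}
    unknown = [k for k, v in disclosed.items() if v is None]
    residual = total - sum(known.values())
    if residual < 0:
        raise ValueError("disclosed values exceed the observed total")

    result = {k: (v, "ACTUAL") for k, v in known.items()}
    weight_sum = sum(prior_weight[k] for k in unknown)
    if unknown and weight_sum <= 0:
        raise ValueError("prior weights for undisclosed edges must be positive")
    for k in unknown:
        result[k] = (residual * prior_weight[k] / weight_sum, "ESTIMATED")
    return result


# Total revenue of 100 with one disclosed customer and two undisclosed ones.
edges = estimate_missing(
    total=100.0,
    disclosed={"customer_a": 40.0, "customer_b": None, "customer_c": None},
    prior_weight={"customer_b": 1.0, "customer_c": 3.0},
)
```

The two properties the text describes hold by construction: the disclosed 40.0 is untouched, and the three values sum to the observed total.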

Estimated values are labeled explicitly in the data model (statusType: ESTIMATED) and are never conflated with extracted values. Consumers can filter by status to use only disclosed figures or include estimates for analytical completeness.
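
A consumer-side filter might look like the following sketch. Only `statusType` and its ACTUAL and ESTIMATED values come from this page; the other field names and the `select_edges` helper are hypothetical.

```python
# Sample records; every field except statusType is a made-up placeholder.
sample_edges = [
    {"from": "ent_a", "to": "ent_b", "value": 40.0, "statusType": "ACTUAL"},
    {"from": "ent_a", "to": "ent_c", "value": 15.0, "statusType": "ESTIMATED"},
    {"from": "ent_a", "to": "ent_d", "value": 45.0, "statusType": "ESTIMATED"},
]


def select_edges(edges: list[dict], include_estimates: bool) -> list[dict]:
    """Keep disclosed values only, or disclosed plus estimated values."""
    allowed = {"ACTUAL", "ESTIMATED"} if include_estimates else {"ACTUAL"}
    return [e for e in edges if e["statusType"] in allowed]


disclosed_only = select_edges(sample_edges, include_estimates=False)
full_graph = select_edges(sample_edges, include_estimates=True)
```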

This estimation framework uses our patented constrained estimation methods (US 2013/0103553 A1).

06

Data Integrity

The following properties are enforced by system architecture. They are not policies; they are structural constraints of the data pipeline.

Evidence Binding

Every published relationship has at least one evidence record linking to a specific sentence in a specific document.

Audit Coverage

Every state transition is logged with timestamp, actor, and justification. Audit records are append-only and cannot be modified.

Non-Destructive Corrections

Corrections are applied as overlays. The original extraction, reviewer actions, and data lineage are preserved in full.
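
The overlay idea can be sketched as a read-only original plus an ordered list of corrections that are merged only when the current view is requested. The `CorrectableRecord` class and its field names are hypothetical.

```python
from types import MappingProxyType
from typing import Any, Mapping


class CorrectableRecord:
    def __init__(self, extracted: Mapping[str, Any]) -> None:
        # Read-only view over a private copy of the original extraction.
        self.original = MappingProxyType(dict(extracted))
        self.overlays: list[dict[str, Any]] = []

    def correct(self, changes: Mapping[str, Any], reviewer: str, reason: str) -> None:
        """Record a correction without touching the original extraction."""
        self.overlays.append(
            {"changes": dict(changes), "reviewer": reviewer, "reason": reason}
        )

    def current(self) -> dict[str, Any]:
        """Merge overlays over the original, oldest first, into a fresh dict."""
        view = dict(self.original)
        for overlay in self.overlays:
            view.update(overlay["changes"])
        return view


record = CorrectableRecord({"value": 100, "currency": "USD"})
record.correct({"value": 120}, reviewer="reviewer_8h7g6f5e", reason="transcription error")
```

Both the corrected view and the original extraction remain available, along with who made each correction and why.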

Confidence Bounds

Confidence scores are bounded by evidence tier. A relationship supported only by lower-tier sources cannot exceed the ceiling for that tier.

Temporal Integrity

The dual-date model prevents lookahead contamination. Networks reflect only information that was available at the specified point in time.
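
The dual-date model itself is defined in Business Relationship Analytics, not on this page. The sketch below assumes the two dates are the date a relationship took effect and the date it was disclosed; the `Edge` fields and the `network_as_of` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    effective_date: date  # assumed: when the relationship began
    disclosed_date: date  # assumed: when the information became public


def network_as_of(edges: list[Edge], as_of: date) -> list[Edge]:
    """Keep edges that were both in effect and already disclosed at `as_of`."""
    return [
        e for e in edges
        if e.effective_date <= as_of and e.disclosed_date <= as_of
    ]


all_edges = [
    Edge("ent_a", "ent_b", date(2019, 1, 1), date(2019, 3, 15)),
    # In effect during 2019 but not disclosed until 2020: hidden from a 2019 view.
    Edge("ent_a", "ent_c", date(2019, 6, 1), date(2020, 2, 28)),
]
snapshot = network_as_of(all_edges, date(2019, 12, 31))
```

Filtering on the disclosure date as well as the effective date is what keeps later-published information out of an earlier snapshot.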

Estimation Transparency

Estimated values are labeled explicitly (statusType: ESTIMATED) and are never conflated with values extracted from disclosures.