First Enrichment: How to Store Source and Timestamp Without Breaking Your Profile

Last week, we built the Enrichment Brain:

inputs → enrich → normalize → identity resolve → score → action

Now we go deeper.

Architecture alone is not enough.

If your system overwrites data, you lose truth.

And once truth is lost, scoring becomes guesswork.


The Real Problem

Most enrichment systems store data like this:

 
{
  "name": "Daniel Petrov",
  "address": "Sofia"
}
 

Then a new vendor says:

 
{
  "name": "Dan Petrov",
  "address": "Plovdiv"
}
 

What happens?

Many systems overwrite.

Now you have lost:

  • Which source said what

  • When it was observed

  • How confident it was

That is not engineering.

That is silent corruption.


Evidence-First Architecture Pattern

Instead of storing “final truth”, store claims with proof.

Pattern:

 
Field → Value → Evidence object
– source
– timestamp
– confidence
– match_rule
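
One way to make the evidence object concrete is a small immutable record. The sketch below is a minimal illustration, not a prescribed schema: the class name Evidence, the helper make_evidence, and the match_rule value "exact_phone_match" are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a recorded claim is never mutated afterwards
class Evidence:
    field: str         # profile field the claim is about, e.g. "address"
    value: str         # the claimed value
    source: str        # which vendor or input made the claim
    timestamp: str     # ISO 8601 time the claim was observed
    confidence: float  # reported confidence, 0.0 to 1.0
    match_rule: str    # how the claim was tied to this profile


def make_evidence(field, value, source, confidence, match_rule, observed_at=None):
    """Build one claim, stamping the current UTC time if none is given."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    observed_at = observed_at or datetime.now(timezone.utc)
    return Evidence(field, value, source, observed_at.isoformat(), confidence, match_rule)


# "exact_phone_match" is a hypothetical rule name used only for illustration.
claim = make_evidence("address", "Sofia", "phone_api", 0.90, "exact_phone_match")
print(asdict(claim))
```

Freezing the record is a deliberate choice here: a correction becomes a new claim rather than an edit to an old one.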
 

Your /enrich endpoint must return:

  • profile (grouped fields)

  • evidence (per value)

  • conflicts (explicit)

  • score

  • reasons

  • next_actions

Evidence is not optional.

It is the foundation.
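
A cheap way to enforce this contract is a shape check that runs before a response leaves the service. The following is a sketch under assumptions: the function name validate_response and the wording of the problem messages are invented for the example. The required keys come from the list above and from the evidence pattern.

```python
REQUIRED_RESPONSE_KEYS = ("profile", "evidence", "conflicts", "score", "reasons", "next_actions")
REQUIRED_EVIDENCE_KEYS = ("source", "timestamp", "confidence", "match_rule")


def validate_response(response: dict) -> list:
    """Return a list of problems; an empty list means the shape is acceptable."""
    problems = [
        f"missing top-level key: {key}"
        for key in REQUIRED_RESPONSE_KEYS
        if key not in response
    ]
    # Every evidence item must carry its proof fields.
    for index, item in enumerate(response.get("evidence", [])):
        for key in REQUIRED_EVIDENCE_KEYS:
            if key not in item:
                problems.append(f"evidence[{index}] missing {key}")
    return problems


incomplete = {
    "profile": {},
    "evidence": [{"source": "phone_api", "timestamp": "2025-04-24", "confidence": 0.82}],
}
for problem in validate_response(incomplete):
    print(problem)
```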


Example Scenario

User signs up with:

  • Name: Daniel Petrov

  • Phone: +359888123456

Vendors return:

Phone API:

  • Name: Daniel Petrov (0.82)

  • Address: Sofia (0.90)

Email API:

  • Name: Dan Petrov (0.76)

  • Address: Plovdiv (0.60)

Court API:

  • Possible match (0.45)

Now instead of overwriting, store this:


Example Evidence Model

 
{
  "profile": {
    "names": ["Daniel Petrov", "Dan Petrov"],
    "addresses": ["Sofia", "Plovdiv"],
    "conflicts": [
      {
        "field": "address",
        "values": ["Sofia", "Plovdiv"]
      }
    ]
  },
  "evidence": [
    {
      "field": "name",
      "value": "Daniel Petrov",
      "source": "phone_api",
      "confidence": 0.82,
      "timestamp": "2025-04-24"
    },
    {
      "field": "address",
      "value": "Plovdiv",
      "source": "email_api",
      "confidence": 0.60,
      "timestamp": "2025-04-24"
    }
  ],
  "score": 68,
  "reasons": [
    "Address conflict across sources",
    "Low-confidence court candidate"
  ],
  "next_actions": ["manual_review"]
}
 

Now the system is explainable.
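
To show how the grouped profile and the explicit conflict list can be derived from raw claims, here is a minimal sketch using the phone and email claims from the scenario. The function names build_profile and find_conflicts are hypothetical. The rule that only address counts as a single-valued field is also an assumption, chosen so that name variants such as Dan and Daniel are kept without being flagged. Profile keys are singular here for simplicity.

```python
from collections import defaultdict

claims = [
    {"field": "name", "value": "Daniel Petrov", "source": "phone_api", "confidence": 0.82},
    {"field": "address", "value": "Sofia", "source": "phone_api", "confidence": 0.90},
    {"field": "name", "value": "Dan Petrov", "source": "email_api", "confidence": 0.76},
    {"field": "address", "value": "Plovdiv", "source": "email_api", "confidence": 0.60},
]


def build_profile(claims):
    """Group distinct claimed values per field without dropping any."""
    values = defaultdict(list)
    for claim in claims:
        if claim["value"] not in values[claim["field"]]:
            values[claim["field"]].append(claim["value"])
    return dict(values)


def find_conflicts(profile, single_valued=("address",)):
    """Report fields expected to hold one value that currently hold several."""
    return [
        {"field": field, "values": values}
        for field, values in profile.items()
        if field in single_valued and len(values) > 1
    ]


profile = build_profile(claims)
print(profile)
print(find_conflicts(profile))  # [{'field': 'address', 'values': ['Sofia', 'Plovdiv']}]
```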


Decision Logic with Conflicts

Evidence-first scoring rules:

  • Strong agreement across sources → lower risk

  • Conflicts → risk penalty

  • Low-confidence claims → weighted lightly

  • Recent data → weighted higher

Important:

Do not hide conflict resolution inside normalization.

Conflicts are signals.
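
The four rules can be sketched as one small scoring function. Everything numeric below is an assumption for illustration: the 180-day half-life, the penalty and bonus sizes, the base of 50, and the convention that a higher score means more risk. It does not attempt to reproduce the score of 68 shown in the example model.

```python
from datetime import date

HALF_LIFE_DAYS = 180.0   # assumed: a claim loses half its weight every 180 days
CONFLICT_PENALTY = 25.0  # assumed: maximum risk added per conflicting field
AGREEMENT_BONUS = 10.0   # assumed: maximum risk removed per agreed value


def claim_weight(claim, today):
    """Confidence of a claim, decayed exponentially by its age in days."""
    age_days = max((today - date.fromisoformat(claim["timestamp"])).days, 0)
    return claim["confidence"] * 0.5 ** (age_days / HALF_LIFE_DAYS)


def risk_score(claims, today, conflict_fields=("address",), base=50.0):
    """Return (score, reasons); higher means riskier, clamped to 0..100."""
    by_field = {}
    for claim in claims:
        by_field.setdefault(claim["field"], {}).setdefault(claim["value"], []).append(claim)

    score, reasons = base, []
    for field, values in by_field.items():
        # Strongest decayed weight behind each distinct value, highest first.
        strengths = sorted(
            (max(claim_weight(c, today) for c in backers) for backers in values.values()),
            reverse=True,
        )
        if field in conflict_fields and len(strengths) > 1:
            # Scale by the runner-up: a weak or stale dissenting claim costs less.
            score += CONFLICT_PENALTY * strengths[1]
            reasons.append(f"{field} conflict across sources")
        for value, backers in values.items():
            if len({c["source"] for c in backers}) > 1:
                score -= AGREEMENT_BONUS * min(claim_weight(c, today) for c in backers)
                reasons.append(f"{field} agreement on {value}")
    return round(max(0.0, min(100.0, score)), 1), reasons


address_claims = [
    {"field": "address", "value": "Sofia", "source": "phone_api",
     "confidence": 0.90, "timestamp": "2025-04-24"},
    {"field": "address", "value": "Plovdiv", "source": "email_api",
     "confidence": 0.60, "timestamp": "2025-04-24"},
]
print(risk_score(address_claims, date(2025, 4, 24)))
# (65.0, ['address conflict across sources'])
```

The function reads raw claims, not a normalized single value, so the conflict stays visible to the scorer and shows up in the reasons.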


What Can Go Wrong

1. Silent Overwrite

The database column stores only one address.

The second address replaces the first.

The signal is lost forever.
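
One way to avoid this is to store claims as rows in an append-only table instead of as a single column on the profile. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name, column names, and the profile id "p1" are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE evidence (
        id          INTEGER PRIMARY KEY,
        profile_id  TEXT NOT NULL,
        field       TEXT NOT NULL,
        value       TEXT NOT NULL,
        source      TEXT NOT NULL,
        confidence  REAL NOT NULL,
        observed_at TEXT NOT NULL
    )
    """
)


def record_claim(profile_id, field, value, source, confidence, observed_at):
    """Append one claim; this code path never issues UPDATE or DELETE."""
    conn.execute(
        "INSERT INTO evidence (profile_id, field, value, source, confidence, observed_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (profile_id, field, value, source, confidence, observed_at),
    )


record_claim("p1", "address", "Sofia", "phone_api", 0.90, "2025-04-24")
record_claim("p1", "address", "Plovdiv", "email_api", 0.60, "2025-04-24")

rows = conn.execute(
    "SELECT value, source FROM evidence WHERE profile_id = ? AND field = ? ORDER BY id",
    ("p1", "address"),
).fetchall()
print(rows)  # [('Sofia', 'phone_api'), ('Plovdiv', 'email_api')]
```

Both addresses survive, along with who claimed each one and when.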


2. Array Collapse

The system stores arrays during enrichment.

A later pipeline step converts each array into a single value.

The conflict disappears.
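
A guard between pipeline steps can catch this early. The check below is a hypothetical sketch: assert_no_collapse compares a profile before and after a step, and pick_first stands in for the kind of step that causes the problem.

```python
def assert_no_collapse(before: dict, after: dict) -> None:
    """Raise if a step turned a list into a scalar or dropped a value."""
    for field, values in before.items():
        kept = after.get(field)
        if not isinstance(kept, list):
            raise ValueError(f"{field}: list collapsed to {kept!r}")
        lost = [value for value in values if value not in kept]
        if lost:
            raise ValueError(f"{field}: values dropped: {lost}")


def pick_first(profile):
    """An offending step: keeps only the first value of each field."""
    return {field: values[0] for field, values in profile.items()}


profile = {"address": ["Sofia", "Plovdiv"]}
try:
    assert_no_collapse(profile, pick_first(profile))
except ValueError as error:
    print(error)  # address: list collapsed to 'Sofia'
```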


3. No Timestamp

An old court record is treated the same as fresh data.

The risk score becomes inaccurate.


4. Confidence Ignored

All values are treated equally.

A low-confidence guess becomes a strong signal.


Engineering Checklist

If building evidence-first enrichment, ensure:

Profile Design

  • Arrays for every identity field

  • Separate conflict object

  • No destructive updates

Evidence Object Must Include

  • source

  • timestamp

  • confidence

  • match_rule (how it was derived)

Storage Layer

  • Never overwrite raw evidence

  • Version score calculation rules

  • Log score inputs

Scoring

  • Penalize conflicts

  • Boost cross-source agreement

  • Decay old signals

Debugging

  • Be able to reconstruct a decision months later

If you cannot replay the decision, your evidence model is incomplete.
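
Replay becomes straightforward when every decision is logged with its inputs and the version of the rules that produced it. The sketch below is an assumption-heavy illustration: the rule formulas, the version labels v1 and v2, and the in-memory decision_log list (a stand-in for an append-only table) are all invented for the example.

```python
import json

# Versioned scoring rules: old versions stay in place so old decisions can be replayed.
RULES = {
    "v1": lambda inputs: 50 + 25 * inputs["conflicts"],
    "v2": lambda inputs: 50 + 25 * inputs["conflicts"] - 10 * inputs["agreements"],
}

decision_log = []  # stand-in for an append-only table


def decide(profile_id, inputs, rules_version):
    """Score a profile and log everything needed to reproduce the score."""
    score = RULES[rules_version](inputs)
    decision_log.append(json.dumps({
        "profile_id": profile_id,
        "rules_version": rules_version,
        "inputs": inputs,
        "score": score,
    }, sort_keys=True))
    return score


def replay(entry_json):
    """Recompute a logged decision; return (matches_log, recomputed_score)."""
    entry = json.loads(entry_json)
    recomputed = RULES[entry["rules_version"]](entry["inputs"])
    return recomputed == entry["score"], recomputed


decide("p1", {"conflicts": 1, "agreements": 1}, "v2")
print(replay(decision_log[-1]))  # (True, 65)
```

If replay returns False, either the rules were changed without a new version or the logged inputs were incomplete.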


How to Start

If you are testing enrichment APIs, use:

Quick start page:
https://espysys.com/irbis-api-quickstart-15-min/

Full tutorial:
https://espysys.com/api-tutorial/

But remember:

Vendor output is raw input.

Your enrichment layer must store proof.


Next Step

This week:

  1. Audit your database schema.

  2. Check which fields overwrite.

  3. Add evidence objects per field.

  4. Add conflict detection rules.

  5. Add timestamp weighting in scoring.

Do not add new vendors.

Fix structure first.
