Every system migration transforms data. Almost none of them prove the transformation is correct. Here is the mathematics that changes that — and what it looks like applied to real enterprise records.
Every data migration is, at its core, a transformation. A record exists in System A with a certain structure, certain field names, certain value encodings. It needs to exist in System B with a different structure, different field names, different encodings. The migration builds a function that converts A-format records into B-format records.
The question that almost nobody asks is: is that function lossless?
Not "did the records load successfully." Not "do the row counts match." Not "did the validation checks pass." Those are necessary but insufficient. The deeper question is whether the transformation preserved meaning — whether every distinction that existed in the source still exists in the target, and whether the conversion can be perfectly reversed.
This is not a new question in mathematics. It has a precise answer, a formal test, and a name. It is called a bijective proof.
What bijection means in plain terms
A function is bijective if it is both injective (no two different inputs produce the same output) and surjective (every possible output has an input that produces it). In practical terms: nothing collapses, nothing is lost, and the function can be perfectly reversed.
The test is simple. If you have a function f that transforms source records into target records, and an inverse function f⁻¹ that transforms target records back into source records, then the transformation is bijective if and only if:
f⁻¹(f(x)) ≡ x for every record x
Apply the forward transform. Apply the inverse. Compare the result to the original. If they match exactly, the transformation is provably lossless. If they do not match, something was lost — and the point of mismatch tells you exactly what.
This is not statistical. It is not probabilistic. It is a mathematical proof applied to every individual record. The result is certainty, not confidence.
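A minimal sketch of that test in Python. The forward/inverse pair here is a toy example (leading-zero stripping and restoration), not a real migration rule:

```python
def is_lossless(f, f_inv, records) -> bool:
    """The bijective test: f_inv(f(x)) must equal x for every record."""
    return all(f_inv(f(x)) == x for x in records)

# Toy forward/inverse pair: strip leading zeros, restore to 10 digits
forward = lambda s: s.lstrip("0")
inverse = lambda s: s.zfill(10)

print(is_lossless(forward, inverse, ["0000100234", "0000100891"]))  # True
```

Any function pair that passes this check for every record in scope is, by definition, lossless over that record set.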
Why this matters for data migration
Consider what happens without this test. A migration team builds a mapping: source field A maps to target field B, with a value conversion from code X to code Y. They load a sample of records. The load succeeds. The row counts match. The team reports success.
But "the load succeeded" does not mean "the transformation is lossless." Here are three ways a transformation can succeed and still lose information:
Collapse. Two distinct values in the source map to the same value in the target. Both records load successfully. But the distinction between them — which may have been business-critical — has been destroyed. The function is not injective. The mapping is complete. The meaning is not.
Default substitution. A source field contains a null or non-standard value. The transformation substitutes a default. The record loads successfully. But the default may carry operational meaning (such as "immediate payment") that the original null did not intend. The function produced a valid output, but the output does not represent the input.
Precision loss. A numeric or coded value is converted between systems with different precision or encoding. The conversion is approximately correct but not exactly reversible. Like converting 37.5°C to Fahrenheit with whole-degree rounding and back: 37.5°C becomes 99.5°F, rounds to 100°F, and converts back to 37.8°C. The roundtrip does not recover the original. Information has been silently lost.
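The rounding roundtrip can be sketched in a few lines. The conversion formulas are standard; the whole-degree rounding is the deliberate source of loss:

```python
def c_to_f(celsius: float) -> float:
    """Convert to Fahrenheit, rounded to a whole degree (lossy)."""
    return round(celsius * 9 / 5 + 32)

def f_to_c(fahrenheit: float) -> float:
    """Convert back to Celsius, rounded to one decimal place."""
    return round((fahrenheit - 32) * 5 / 9, 1)

original = 37.5
recovered = f_to_c(c_to_f(original))  # 99.5°F rounds to 100, back to 37.8°C
print(original == recovered)          # False: the roundtrip lost information
```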
The bijective proof catches all three. If the inverse function does not recover the exact original, the test fails — and the failure report tells you which field, which value, and which type of loss occurred.
A worked example
Suppose you are migrating a supplier record from one ERP system to another. The source record has these fields:
Source record (System A):
```text
VENDOR_ID     = 0000100234
NAME          = Müller Industrietechnik
COUNTRY       = DE
COMPANY_CODE  = 1000
PAY_TERMS     = NET30
RECON_ACCT    = 160000
PURCH_ORG     = 1000
VENDOR_TYPE   = TRADE
```
The target system uses a different schema, different field names, and in some cases different value encodings. The forward transform f(x) produces:
Target record (System B):
```text
PARTNER_ID    = 100234                   # leading zeros stripped
PARTNER_NAME  = Müller Industrietechnik
COUNTRY       = DE
COMPANY       = 1710                     # value mapped: 1000 → 1710
PAYMENT       = ZN30                     # value mapped: NET30 → ZN30
RECON_ACCT    = 160000
PURCH_ORG     = 1710                     # value mapped: 1000 → 1710
PARTNER_TYPE  = TRADE
```
Note that three fields changed value during the transformation. COMPANY_CODE 1000 became 1710. PAY_TERMS NET30 became ZN30. PURCH_ORG 1000 became 1710. These are not errors — they are intentional value mappings required by the target system's configuration.
Now apply the inverse transform f⁻¹(y):
Recovered record:
```text
VENDOR_ID     = 0000100234               # leading zeros restored
NAME          = Müller Industrietechnik
COUNTRY       = DE
COMPANY_CODE  = 1000                     # inverse: 1710 → 1000
PAY_TERMS     = NET30                    # inverse: ZN30 → NET30
RECON_ACCT    = 160000
PURCH_ORG     = 1000                     # inverse: 1710 → 1000
VENDOR_TYPE   = TRADE
```
Now compare field by field:
```text
Field          Source        Recovered     Match?
─────────────────────────────────────────────────
VENDOR_ID      0000100234    0000100234    ✓
NAME           Müller Ind.   Müller Ind.   ✓
COUNTRY        DE            DE            ✓
COMPANY_CODE   1000          1000          ✓
PAY_TERMS      NET30         NET30         ✓
RECON_ACCT     160000        160000        ✓
PURCH_ORG      1000          1000          ✓
VENDOR_TYPE    TRADE         TRADE         ✓
```
Result: f⁻¹(f(x)) ≡ x — ALL FIELDS MATCH — PROVEN LOSSLESS
Every field matches. The transformation is bijective for this record. Even though three values changed during the forward transform (1000→1710, NET30→ZN30), the inverse perfectly recovered the originals. The value mappings are lossless because they are one-to-one: 1000 always maps to 1710, and 1710 always maps back to 1000. No ambiguity. No collapse.
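The reason these mappings survive the roundtrip is that they are one-to-one lookup tables whose inverses can be derived mechanically. A minimal sketch, using the example codes from the record above (`VALUE_MAP` and the field names are illustrative, not a real ERP configuration):

```python
# Illustrative one-to-one value maps (codes taken from the worked example)
VALUE_MAP = {
    "COMPANY_CODE": {"1000": "1710"},
    "PAY_TERMS": {"NET30": "ZN30"},
}

# The inverse maps are derived mechanically by flipping each pair
INVERSE_MAP = {
    field: {target: source for source, target in mapping.items()}
    for field, mapping in VALUE_MAP.items()
}

# A map is only invertible if no two source values collapse to one target
for field, mapping in VALUE_MAP.items():
    assert len(set(mapping.values())) == len(mapping), f"{field} collapses values"

print(INVERSE_MAP["COMPANY_CODE"]["1710"])  # recovers the original: 1000
```

If two source values ever mapped to the same target code, the dictionary inversion would drop one of them and the assertion would fail: the collapse is caught before any record is transformed.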
What a failure looks like
Now consider a record where the proof fails:
Source record:
```text
VENDOR_ID     = 0000100891
NAME          = Pacific Trading Ltd
COUNTRY       = UK      ← not valid ISO 3166
COMPANY_CODE  = 1000
PAY_TERMS     = ZCUS    ← custom code, no target equivalent
...
```
The forward transform encounters two problems:
- COUNTRY = "UK" is not a valid ISO 3166-1 alpha-2 code. The correct code is "GB". The transform cannot map "UK" to a valid target value without making an assumption.
- PAY_TERMS = "ZCUS" is a custom payment term that exists in the source system but has no equivalent in the target configuration.
This record is untransformable. It cannot be losslessly converted because the source values have no clean inverse mapping. If the transform guesses (mapping "UK" to "GB"), the inverse would recover "GB" not "UK" — and the proof would fail: f⁻¹(f(x)) ≠ x.
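A sketch of why guessing breaks the proof. The mapping table here is hypothetical, standing in for an assumption a transform might silently make:

```python
COUNTRY_GUESS = {"UK": "GB"}  # an assumption the transform might silently make

def forward(country: str) -> str:
    # Substitute the guessed code when the source value is not valid ISO
    return COUNTRY_GUESS.get(country, country)

def inverse(country: str) -> str:
    # "GB" is a legitimate code in its own right, so the inverse has no
    # way to know this value started life as "UK"
    return country

original = "UK"
recovered = inverse(forward(original))
print(recovered == original)  # False: f⁻¹(f(x)) ≠ x, the proof fails
```

The guess produces a valid-looking target record, which is exactly why load testing would miss it, and exactly why the roundtrip does not.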
The proof does not just say "this record failed." It says exactly why:
```text
PROOF FAILURE REPORT
Record: VENDOR_ID 0000100891

  Field:  COUNTRY
  Value:  "UK"
  Issue:  Not valid ISO 3166-1 alpha-2
  Action: Correct to "GB" in source system before migration

  Field:  PAY_TERMS
  Value:  "ZCUS"
  Issue:  No equivalent in target payment terms configuration
  Action: Map to standard term (e.g. ZN60) or create ZCUS in target config
```
This is not a test failure. It is a diagnosed finding with a specific remediation path. The proof did not just find a problem — it explained it, located it, and told you how to fix it.
The architecture of a proof engine
Running bijective proof at scale requires three components that most migration programmes do not build:
Transform/inverse pairs. Every object type needs a forward transform function AND a corresponding inverse function. Most migration tools build the forward transform (source→target mapping) but never build the inverse. Without the inverse, there is no roundtrip. Without the roundtrip, there is no proof.
```python
def forward_transform(source_record):
    target = {}
    target['PARTNER_ID'] = source_record['VENDOR_ID'].lstrip('0')
    target['COMPANY'] = VALUE_MAP['COMPANY_CODE'][source_record['COMPANY_CODE']]
    target['PAYMENT'] = VALUE_MAP['PAY_TERMS'][source_record['PAY_TERMS']]
    # ... remaining fields
    return target

def inverse_transform(target_record):
    source = {}
    source['VENDOR_ID'] = target_record['PARTNER_ID'].zfill(10)
    source['COMPANY_CODE'] = INVERSE_MAP['COMPANY'][target_record['COMPANY']]
    source['PAY_TERMS'] = INVERSE_MAP['PAYMENT'][target_record['PAYMENT']]
    # ... remaining fields
    return source

def prove(source_record):
    target = forward_transform(source_record)
    recovered = inverse_transform(target)
    for field in source_record:
        if source_record[field] != recovered[field]:
            return ProofFailure(field, source_record[field], recovered[field])
    return ProofSuccess()
```
The prove() function is the bijective test. It is embarrassingly simple — and that simplicity is its strength. There is no statistical model. There is no confidence interval. There is no sampling. Just: transform, reverse, compare.
Precondition gate. Before the transform is even attempted, every field of every record should be checked against formal preconditions. Is the country code valid ISO? Is the payment term in the target configuration? Is the unit of measure standard? Records that fail preconditions are quarantined before transformation — they do not enter the pipeline. This prevents the transform from making assumptions (like silently mapping "UK" to "GB") that would create false passes.
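A precondition gate can be sketched as a set of per-field checks run before the transform. The valid sets below are illustrative stand-ins for the real ISO table and the target configuration:

```python
VALID_COUNTRIES = {"DE", "GB", "FR", "US"}   # stand-in for ISO 3166-1 alpha-2
MAPPABLE_PAY_TERMS = {"NET30", "NET60"}      # terms the transform can map

def check_preconditions(record: dict) -> list[str]:
    """Return findings; an empty list means the record may enter the pipeline."""
    findings = []
    if record["COUNTRY"] not in VALID_COUNTRIES:
        findings.append(f"COUNTRY {record['COUNTRY']!r}: not valid ISO 3166-1 alpha-2")
    if record["PAY_TERMS"] not in MAPPABLE_PAY_TERMS:
        findings.append(f"PAY_TERMS {record['PAY_TERMS']!r}: no target equivalent")
    return findings

# The failing record from earlier is quarantined before transformation
print(check_preconditions({"COUNTRY": "UK", "PAY_TERMS": "ZCUS"}))
```

Because the gate runs before the transform, an invalid value can never reach the guessing logic that would turn it into a false pass.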
Dependency ordering. In enterprise systems, objects have structural dependencies. A purchase order references a supplier. A goods receipt references a purchase order. An invoice references a goods receipt. If the supplier fails the proof, every object downstream must also be flagged — not because they failed their own proof, but because their dependency is unresolved. This cascade analysis is as important as the proof itself.
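The cascade can be implemented as a breadth-first walk over the inverted dependency graph. A sketch with hypothetical object IDs:

```python
from collections import defaultdict, deque

# Hypothetical dependency edges: each object lists what it references
DEPENDS_ON = {
    "PO-7001": ["VENDOR-100891"],
    "GR-9001": ["PO-7001"],
    "INV-5001": ["GR-9001"],
}

def cascade(failed: set) -> set:
    """Flag every object whose dependency chain contains a proof failure."""
    # Invert the edges: supplier -> purchase orders that reference it, etc.
    dependents = defaultdict(list)
    for obj, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents[dep].append(obj)
    flagged = set(failed)
    queue = deque(failed)
    while queue:
        for obj in dependents[queue.popleft()]:
            if obj not in flagged:
                flagged.add(obj)
                queue.append(obj)
    return flagged - failed  # objects flagged only because of upstream failures

print(sorted(cascade({"VENDOR-100891"})))  # the PO, GR, and invoice downstream
```

One failed supplier flags the purchase order, goods receipt, and invoice that sit above it, each marked as blocked by an upstream failure rather than a failure of its own.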
What the proof gives you that testing does not
Traditional migration testing loads records and checks whether the load succeeded. This answers the question: can the target system accept this data?
The bijective proof answers a different question: did the transformation preserve meaning?
These are fundamentally different assertions:
| | Load testing | Bijective proof |
|---|---|---|
| What it checks | Target system acceptance | Transformation correctness |
| Coverage | Sample (typically 2-5%) | 100% of records |
| What it catches | Load errors, format violations | Lossy mappings, collapsed values, precision loss |
| What it misses | Silent meaning loss | Nothing — the roundtrip is exhaustive |
| Output | Pass/fail per load batch | Per-record, per-field diagnosis with remediation |
| Trust model | Trust the tester | Trust the mathematics — verify it yourself |
The last row matters most. A load test result requires trust in the team that ran it. A bijective proof requires no trust at all. The inverse function is there. Anyone can run the roundtrip. If f⁻¹(f(x)) ≡ x, the transformation is lossless. If it does not, something was lost. The mathematics is self-verifying.
Beyond migration: where this principle leads
The bijective proof does not retire after migration. Once you have built the forward and inverse functions for every object type, you have created something valuable: a formal model of your data's transformation rules.
That model can verify ongoing data quality — every new record created in the target system can be proven against the same rules. It can detect drift — if business rules change and the inverse no longer recovers the original, the proof surfaces the divergence. It can enforce integrity — new records that would fail the proof are caught before they enter the system, not after they have corrupted downstream processes.
The migration is the moment the model is built. The model's value persists long after the migration is complete.
Getting started
At Migration Proof, we have built this engine. Forward and inverse transforms for every object type in the procurement chain — suppliers, materials, purchase orders, goods receipts, invoices. Formal precondition checks on every field. Dependency chain analysis with cascade impact. And the bijective proof running on every record, producing a per-field diagnosis for every failure and a cryptographically hashed Ownership Ledger entry for every success.
Our first release covers SAP ECC-to-S/4HANA migrations. The mathematics applies to any system-to-system transformation — Oracle, Dynamics, legacy platforms, or any combination.
migrationproof.io is launching shortly. We will be publishing the next article in this series on dependency chain integrity — the structural ordering that determines whether your migrated data can actually function in the target system, or whether it arrives intact but operationally dead.
A note from us
Migration Proof is an AI-native operation. Five specialised AI personas run the chain walk, precondition checks, transformation, proof, and reporting. Behind them, twenty-five years of enterprise system experience shaped every rule they apply.
We are mostly agents — and we are proud of that, because agents prove every record, not a two percent sample. When you write to us, a human replies.
hello@migrationproof.io

We read every message. We reply to every question.