

© 2026 Engineering Journey

Data Engineering
dataengineering
iceberg
metadata

Metadata Over Compute: Visualizing Table History with dlt and Iceberg

March 22, 2026
5 min read

Aman G.



If you’ve been following the Tabular and Databricks news, you’ve probably heard the phrase “metadata over compute.” It sounds like marketing, but it’s a real architectural idea. In this post we use dlt (data load tool) to ingest into Apache Iceberg and watch how the table’s metadata evolves, and why it stays independent of the catalog.


The Thesis

Most data platforms tie table metadata to the compute engine. Your schema, partition info, and history live inside Snowflake, BigQuery, or Databricks. Move tables, and you leave that metadata behind.

Apache Iceberg inverts that. The source of truth is the metadata itself: JSON and Avro files that sit next to your data. The catalog (Hive, Glue, Snowflake Polaris, Tabular) is just a pointer: “this table’s metadata lives here.”

The metadata is portable: you can swap catalogs without losing history or lineage. That’s “metadata over compute” in practice.
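A minimal sketch of that idea: the catalog is nothing more than a name-to-location mapping, while the history lives in the metadata file itself. The field names below are a hypothetical slice of a metadata.json; real files carry many more fields (schemas, partition specs, and so on).

```python
import json

# A minimal, hypothetical slice of an Iceberg metadata.json file.
metadata_json = json.dumps({
    "table-uuid": "d9a9b1c0-0000-0000-0000-000000000000",
    "location": "s3://lake/events",
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1, "parent-snapshot-id": None, "operation": "append"},
        {"snapshot-id": 2, "parent-snapshot-id": 1, "operation": "append"},
    ],
})

# Two different "catalogs": each is just a name -> metadata-location mapping.
hive_catalog = {"events": "s3://lake/events/metadata/00001-abc.metadata.json"}
polaris_catalog = dict(hive_catalog)  # "migrating" = copying the pointer

# The history travels with the file, not with the catalog.
snapshots = json.loads(metadata_json)["snapshots"]
print(len(snapshots))                                       # 2
print(hive_catalog["events"] == polaris_catalog["events"])  # True
```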


Operation → Metadata, in Succession


We use dlt’s filesystem destination with `table_format="iceberg"`. Same pipeline, three runs.

1. Append 5 rows

```python
@dlt.resource(table_format="iceberg", table_name="events")
def events_append():
    yield [{"id": 1, "event": "deploy", "ts": "..."}, ...]

pipeline.run(events_append())
```

Metadata responds:

```text
metadata/ has 2 file(s):
  00000-....metadata.json
  00001-....metadata.json
current-snapshot-id: 8570791162370934054
snapshots (history): 1
  [1] append: parent=None added=5 deleted=0 total_records=5
```

Each commit creates a new metadata file. Iceberg keeps the old ones.
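That accumulation can be modeled in a few lines; the file names here are illustrative, not Iceberg’s real naming scheme:

```python
# Toy model of metadata-file accumulation: each commit writes a new
# numbered metadata file and keeps every previous one.
metadata_files = []

def commit(seq):
    metadata_files.append(f"{seq:05d}-xxxx.metadata.json")

commit(0)  # table creation
commit(1)  # first append
print(metadata_files)
# ['00000-xxxx.metadata.json', '00001-xxxx.metadata.json']
```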

2. Overwrite 2 rows (id 1 and 2 get new values)

```python
@dlt.resource(
    table_format="iceberg",
    table_name="events",
    write_disposition="merge",
    primary_key="id",
)
def events_overwrite():
    yield [{"id": 1, "event": "deploy-v2", ...}, {"id": 2, "event": "build-v2", ...}, ...]

pipeline.run(events_overwrite())
```

Metadata responds:

```text
snapshots (history): 3
  [1] append: added=5 deleted=0 total_records=5
  [2] delete: added=0 deleted=5 total_records=0
  [3] append: added=5 deleted=0 total_records=5
```

Merge is modeled as delete + append. The chain is explicit.
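A toy reconstruction of the snapshot summaries above makes the arithmetic explicit: the running record count goes 5, 0, 5 because the merge is a delete snapshot followed by an append snapshot.

```python
# Each snapshot summary carries added/deleted counts; the table's record
# count at any snapshot is the running sum along the chain.
snapshots = [
    {"op": "append", "added": 5, "deleted": 0},
    {"op": "delete", "added": 0, "deleted": 5},
    {"op": "append", "added": 5, "deleted": 0},
]

total = 0
for s in snapshots:
    total += s["added"] - s["deleted"]
    print(s["op"], "-> total_records =", total)
# append -> total_records = 5
# delete -> total_records = 0
# append -> total_records = 5
```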

3. Delete 1 row (id == 3)

```python
@dlt.resource(table_format="iceberg", table_name="events", write_disposition="replace")
def events_after_delete():
    yield [{"id": 1, ...}, {"id": 2, ...}, {"id": 4, ...}, {"id": 5, ...}]

pipeline.run(events_after_delete())
```

Metadata responds:

```text
snapshots (history): 4
...
[4] overwrite: added=4 deleted=5 total_records=4
```

Replace drops the old data and writes only what remains. Final state: 4 rows. The full chain stays in the metadata.
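Summing the full chain from all three runs confirms the final state, using the added/deleted counts from the snapshot summaries above:

```python
# (operation, added, deleted) for each of the four snapshots.
chain = [
    ("append",    5, 0),
    ("delete",    0, 5),
    ("append",    5, 0),
    ("overwrite", 4, 5),
]

# The final record count is the net of the whole chain: 5 - 5 + 5 - 1 = 4.
total = sum(added - deleted for _, added, deleted in chain)
print(total)  # 4
```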


Where Does the Data Live?

The metadata points to the Parquet files. It doesn’t store the column values themselves.
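A toy model of what the manifests record, pointers to data files plus per-file statistics; the field names are illustrative, not the exact Iceberg manifest schema:

```python
# Manifests reference Parquet data files and carry stats about them;
# the column values themselves live only in the Parquet files.
manifest_entries = [
    {"file_path": "data/part-00000.parquet", "record_count": 2},
    {"file_path": "data/part-00001.parquet", "record_count": 2},
]

print(sum(e["record_count"] for e in manifest_entries))                  # 4
print(all(e["file_path"].endswith(".parquet") for e in manifest_entries))  # True
```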

Why This Matters


1. Portability - Metadata is just files. You can move them to another catalog (Polaris, Tabular) and keep the same history and lineage.
2. Auditability - `snapshot-log` and `metadata-log` give a clear record of what changed and when.
3. Time travel - Snapshots form a chain via `parent`. You can query the table as it looked at any past snapshot.
4. Catalog independence - The catalog is only a mapping from table name to metadata location. The same metadata files work with Hive, Glue, Polaris, or Tabular.
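The time-travel point can be sketched as a walk up the parent pointers: resolving any snapshot’s ancestry is just following `parent` links back to the first commit. The snapshot IDs below are illustrative.

```python
# Snapshots form a linked chain via parent pointers; time travel is
# picking an ancestor snapshot and reading the state it references.
snapshots = {
    1: {"parent": None, "op": "append"},
    2: {"parent": 1,    "op": "delete"},
    3: {"parent": 2,    "op": "append"},
    4: {"parent": 3,    "op": "overwrite"},
}

def ancestry(snapshot_id):
    """Walk parent pointers from a snapshot back to the first commit."""
    chain = []
    while snapshot_id is not None:
        chain.append(snapshot_id)
        snapshot_id = snapshots[snapshot_id]["parent"]
    return chain

print(ancestry(4))  # [4, 3, 2, 1]
```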
