# @dotdo/iceberg

Package exports:

- `@dotdo/iceberg`
- `@dotdo/iceberg/catalog`
- `@dotdo/iceberg/wal`
Apache Iceberg table format for Cloudflare Workers.
TypeScript implementation of the Apache Iceberg specification for building data lakehouses on Cloudflare R2.
```typescript
import { MetadataBuilder, SchemaBuilder } from '@dotdo/iceberg'

// Define schema
const schema = new SchemaBuilder()
  .addField({ name: 'id', type: 'long', required: true })
  .addField({ name: 'name', type: 'string' })
  .addField({ name: 'created_at', type: 'timestamptz' })
  .build()

// Create table metadata
const metadata = new MetadataBuilder({ location: 's3://bucket/table' })
  .setSchema(schema)
  .setPartitionSpec([{ sourceId: 3, transform: 'day', fieldId: 1000 }])
  .build()
```

## Why Iceberg?
You have data that grows. Users need to query it efficiently. You want time-travel.
Traditional approaches:
- Raw JSON files - No schema evolution. No efficient queries. No time-travel.
- SQLite/PGLite - Single-file limits. No column pruning. Backups are snapshots.
- External data warehouses - Latency. Cost. Vendor lock-in.
Iceberg gives you:
- Schema evolution - Add columns, rename fields, widen types
- Time-travel queries - Query any historical snapshot
- Partition pruning - Skip irrelevant data files
- Column pruning - Read only the columns you need
- ACID transactions - Concurrent writes without corruption
## What This Package Provides
This is the type system and builder layer for Iceberg tables. It provides:
| Module | Purpose |
|---|---|
| Types | Complete TypeScript types for Iceberg spec v2 |
| Schema | Schema builder with evolution and validation |
| Metadata | Table metadata builder with snapshot management |
| Manifest | Manifest and manifest list builders |
| Catalog | Catalog interface for table discovery |
| WAL | WAL-to-Iceberg transformation utilities |
This package does NOT include:
- Parquet file reading/writing (use `@dotdo/parquet`)
- Query execution (use `@dotdo/pg-lake`)
- Storage backends (use R2 directly or via `@dotdo/pg-lake`)
## Installation

```bash
npm install @dotdo/iceberg
```

## Quick Start
### Building a Schema

```typescript
import { SchemaBuilder } from '@dotdo/iceberg'

const schema = new SchemaBuilder()
  .addField({ name: 'id', type: 'long', required: true })
  .addField({ name: 'name', type: 'string' })
  .addField({ name: 'email', type: 'string' })
  .addField({ name: 'metadata', type: { type: 'map', keyType: 'string', valueType: 'string' } })
  .addField({ name: 'tags', type: { type: 'list', elementType: 'string' } })
  .addField({ name: 'created_at', type: 'timestamptz', required: true })
  .build()
```

### Schema Evolution
```typescript
import { SchemaEvolution } from '@dotdo/iceberg'

const evolution = new SchemaEvolution(existingSchema)

// Add new columns
evolution.addColumn({ name: 'status', type: 'string' })

// Rename columns (readers with the old schema still work)
evolution.renameColumn('email', 'email_address')

// Widen types (int -> long, float -> double)
evolution.updateColumnType('count', 'long')

const newSchema = evolution.apply()
```
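The widenings above are the lossless promotions the Iceberg spec permits. A minimal sketch of the rule (illustrative, not the library's implementation):

```typescript
// Lossless type promotions allowed by the Iceberg spec:
// int -> long and float -> double (decimal precision widening
// is also allowed; omitted here for brevity).
const allowedPromotions: Record<string, string[]> = {
  int: ['long'],
  float: ['double'],
}

function canPromote(from: string, to: string): boolean {
  // A type can always "promote" to itself; otherwise it must be
  // listed as a permitted widening.
  return from === to || (allowedPromotions[from] ?? []).includes(to)
}

console.log(canPromote('int', 'long'))  // true
console.log(canPromote('long', 'int'))  // false
```

Narrowings are rejected because existing data files may already contain values that would not fit the smaller type.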
### Creating Table Metadata

```typescript
import { MetadataBuilder, createPartitionSpec, createSortOrder } from '@dotdo/iceberg'

const metadata = new MetadataBuilder({
  location: 's3://my-bucket/warehouse/events',
  tableUuid: crypto.randomUUID(),
})
  .setSchema(schema)
  .setPartitionSpec([
    { sourceId: 6, transform: 'day', fieldId: 1000, name: 'created_day' }
  ])
  .setSortOrder([
    { sourceId: 1, direction: 'asc', nullOrder: 'nulls-last', transform: 'identity' }
  ])
  .setProperties({
    'write.format.default': 'parquet',
    'write.parquet.compression-codec': 'zstd',
  })
  .build()
```
### Building Manifests

```typescript
import { ManifestBuilder, ManifestListBuilder } from '@dotdo/iceberg'

// Build a manifest for data files
const manifest = new ManifestBuilder({ schema, partitionSpec })
  .addEntry({
    status: 'added',
    dataFile: {
      content: 'data',
      filePath: 'data/part-00000.parquet',
      fileFormat: 'PARQUET',
      recordCount: 10000,
      fileSizeInBytes: 1024 * 1024,
    }
  })
  .build()

// Build a manifest list for a snapshot
const manifestList = new ManifestListBuilder({ snapshotId: 1n })
  .addManifest({
    manifestPath: 'metadata/manifest-1.avro',
    manifestLength: 4096,
    partitionSpecId: 0,
    addedFilesCount: 1,
    addedRowsCount: 10000,
  })
  .build()
```

### WAL to Iceberg
Transform PostgreSQL WAL records into Iceberg format for time-travel queries:
```typescript
import { walRecordToRow, WAL_TABLE_SCHEMA } from '@dotdo/iceberg/wal'

const walRecord = {
  lsn: 12345678n,
  operation: 'INSERT',
  schema: 'public',
  table: 'users',
  newRow: { id: 1, name: 'Alice' },
  timestamp: Date.now(),
  doId: 'do-abc123',
}

// Convert to Iceberg row format
const row = walRecordToRow(walRecord)

// WAL_TABLE_SCHEMA is the Iceberg schema for WAL tables.
// Use it to create a metadata builder for WAL storage.
```
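The shape of that transformation can be sketched roughly as follows. The output field names in this sketch are illustrative assumptions drawn from the record above, not the actual output of `walRecordToRow` (which is defined by `WAL_TABLE_SCHEMA`):

```typescript
// Hypothetical sketch of a WAL-record-to-row mapping. Output field
// names are assumptions for illustration only.
interface WalRecord {
  lsn: bigint
  operation: string
  schema: string
  table: string
  newRow: Record<string, unknown> | null
  timestamp: number
  doId: string
}

function toRowSketch(r: WalRecord) {
  return {
    lsn: r.lsn,
    operation: r.operation,
    schema_name: r.schema,
    table_name: r.table,
    // Row payloads are serialized so heterogeneous tables can share
    // one WAL table schema.
    new_row: r.newRow === null ? null : JSON.stringify(r.newRow),
    ts: r.timestamp,
    do_id: r.doId,
  }
}
```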
## API Reference

### Schema Module

```typescript
import {
  SchemaBuilder,     // Build schemas incrementally
  SchemaEvolution,   // Evolve existing schemas
  validateSchema,    // Validate schema correctness
  findFieldById,     // Find field by ID
  findFieldByName,   // Find field by name
} from '@dotdo/iceberg'
```
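One rule a schema validator must enforce (a sketch under the assumption that `validateSchema` checks the spec's uniqueness constraint): every field in an Iceberg schema carries a unique ID, since readers resolve columns by ID rather than by name.

```typescript
// Illustrative check, not the library implementation: Iceberg field
// IDs must be unique across the schema, because renames preserve IDs
// and readers match columns by ID.
interface FieldSketch { id: number; name: string }

function hasUniqueFieldIds(fields: FieldSketch[]): boolean {
  return new Set(fields.map(f => f.id)).size === fields.length
}
```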
### Types Module

```typescript
import type {
  // Core types
  IcebergSchema,
  IcebergField,
  IcebergType,
  IcebergPrimitiveType,

  // Partitioning
  IcebergPartitionSpec,
  IcebergPartitionField,
  IcebergTransform,

  // Sorting
  IcebergSortOrder,
  IcebergSortField,

  // Snapshots
  IcebergSnapshot,
  IcebergSnapshotSummary,
  IcebergTableMetadata,

  // Manifests
  IcebergManifestFile,
  IcebergManifestEntry,
  IcebergDataFile,
} from '@dotdo/iceberg'
```
### Metadata Module

```typescript
import {
  MetadataBuilder,           // Build table metadata
  generateMetadataFileName,  // Generate v123.metadata.json names
  generateManifestFileName,  // Generate manifest file names
  generateDataFileName,      // Generate data file names
} from '@dotdo/iceberg'
```
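The version-based naming that `generateMetadataFileName` produces can be sketched as follows (this mirrors only the `v123.metadata.json` pattern noted above; the real function may add padding or a UUID suffix):

```typescript
// Sketch of the v<version>.metadata.json naming pattern. Each commit
// writes a new metadata file with an incremented version number.
function metadataFileNameSketch(version: number): string {
  return `v${version}.metadata.json`
}

console.log(metadataFileNameSketch(123)) // "v123.metadata.json"
```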
### Catalog Module

```typescript
import {
  BaseCatalog,               // Abstract catalog base class
  type ICatalog,             // Catalog interface
  type TableIdentifier,      // { namespace: string[], name: string }
  type NamespaceIdentifier,  // string[]

  // Commit operations
  createAppendCommit,        // Create append commit request
  assertRefSnapshotId,       // Requirement for optimistic locking
} from '@dotdo/iceberg'
```
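`assertRefSnapshotId` expresses optimistic concurrency: a commit applies only if the branch still points at the snapshot the writer based its changes on. The check it encodes reduces to something like this sketch (illustrative, not the library's code):

```typescript
// Sketch of the optimistic-locking requirement: compare the branch's
// current snapshot ID against the one the writer started from. On a
// mismatch, a concurrent commit won the race and the writer must
// re-read the table state and retry.
function refSnapshotMatches(
  currentSnapshotId: bigint | null,   // null for a not-yet-created branch
  expectedSnapshotId: bigint | null,
): boolean {
  return currentSnapshotId === expectedSnapshotId
}

console.log(refSnapshotMatches(42n, 42n))  // true
console.log(refSnapshotMatches(43n, 42n))  // false
```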
## Supported Types

| Primitive | Description |
|---|---|
| `boolean` | True or false |
| `int` | 32-bit signed integer |
| `long` | 64-bit signed integer |
| `float` | 32-bit IEEE 754 floating point |
| `double` | 64-bit IEEE 754 floating point |
| `decimal(P,S)` | Fixed-point decimal with precision P and scale S |
| `date` | Calendar date |
| `time` | Time of day (microsecond precision) |
| `timestamp` | Timestamp without timezone |
| `timestamptz` | Timestamp with timezone |
| `string` | UTF-8 string |
| `uuid` | UUID |
| `fixed(L)` | Fixed-length byte array of length L |
| `binary` | Variable-length byte array |

| Nested | Description |
|---|---|
| `struct` | Tuple of typed fields |
| `list` | Collection of elements |
| `map` | Key-value pairs |
## Supported Transforms

| Transform | Description | Example |
|---|---|---|
| `identity` | Value unchanged | `identity` |
| `bucket[N]` | Hash mod N | `bucket[16]` |
| `truncate[W]` | Truncate to width W | `truncate[10]` |
| `year` | Extract year | `year` |
| `month` | Extract year-month | `month` |
| `day` | Extract year-month-day | `day` |
| `hour` | Extract year-month-day-hour | `hour` |
| `void` | Always null | `void` |
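For intuition, per the Iceberg spec the `day` transform maps a timestamp to whole days since the Unix epoch, and `truncate[W]` rounds integers down to a multiple of W (a sketch of the spec's behavior, not this package's code):

```typescript
// day: timestamp in epoch milliseconds -> whole days since
// 1970-01-01 UTC, so all rows from the same UTC day share a partition.
function dayTransform(tsMillis: number): number {
  return Math.floor(tsMillis / 86_400_000)
}

// truncate[W] on integers: round toward negative infinity to a
// multiple of W, so e.g. truncate[10](17) = 10 and truncate[10](-3) = -10.
function truncateInt(value: number, width: number): number {
  return value - (((value % width) + width) % width)
}

console.log(dayTransform(Date.UTC(2024, 0, 1))) // 19723
console.log(truncateInt(17, 10))                // 10
```

Because transforms produce coarse partition values, a query with a predicate on the source column can skip every file whose partition value cannot match.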
## Part of the postgres.do Ecosystem

| Package | Description |
|---|---|
| `@dotdo/pg-lake` | Data lakehouse on R2 |
| `@dotdo/parquet` | Parquet file support |
| `@dotdo/postgres` | PostgreSQL server |
## License

MIT