# arrow-supercluster
A spatial clustering engine for Apache Arrow tables. Reimplements the Supercluster algorithm to work directly with Arrow columnar memory — no GeoJSON serialization, no intermediate JS objects.
## Why
Supercluster expects GeoJSON in and produces GeoJSON out. If your data is already in Arrow format (e.g. loaded from GeoParquet), that means:
- Iterating the Arrow table to build GeoJSON features
- Supercluster internally converts those back to flat arrays
- `getClusters()` builds new GeoJSON Feature objects on every call
This library skips all of that. It reads coordinate buffers directly from the Arrow geometry column and outputs typed arrays (Float64Array, Uint32Array, Uint8Array) ready for any rendering pipeline.
## Install
```sh
# pnpm
pnpm add arrow-supercluster apache-arrow

# npm
npm install arrow-supercluster apache-arrow

# yarn
yarn add arrow-supercluster apache-arrow
```

`apache-arrow` is a peer dependency — you control the version (>=14 supported).
## Usage
```ts
import { ArrowClusterEngine } from "arrow-supercluster";
import type { Table } from "apache-arrow";

// `table` is an Arrow Table with a GeoArrow Point geometry column
// (FixedSizeList[2] of Float64 — the standard encoding for point data)
const engine = new ArrowClusterEngine({
  radius: 75,   // cluster radius in pixels (default: 40)
  maxZoom: 16,  // max zoom level to cluster (default: 16)
  minZoom: 0,   // min zoom level to cluster (default: 0)
  minPoints: 2, // minimum points to form a cluster (default: 2)
});

engine.load(table, "geometry");

// Query clusters for a bounding box and zoom level
const output = engine.getClusters([-180, -85, 180, 85], 4);

// output.positions — Float64Array [lng0, lat0, lng1, lat1, ...]
// output.pointCounts — Uint32Array [count0, count1, ...]
// output.ids — Float64Array [id0, id1, ...]
// output.isCluster — Uint8Array [1, 0, 1, ...] (1 = cluster, 0 = point)
// output.length — number
```

## API
### `new ArrowClusterEngine(options?)`
| Option | Type | Default | Description |
|---|---|---|---|
| `radius` | `number` | 40 | Cluster radius in pixels |
| `extent` | `number` | 512 | Tile extent (radius is relative to this) |
| `minZoom` | `number` | 0 | Minimum zoom level for clustering |
| `maxZoom` | `number` | 16 | Maximum zoom level for clustering |
| `minPoints` | `number` | 2 | Minimum points to form a cluster |
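To make the `radius`/`extent` relationship concrete, here is a sketch assuming the same semantics as Supercluster (which this library reimplements): the clustering radius in projected world units (the `[0, 1]` Mercator span) at zoom `z` is `radius / (extent * 2^z)`, so doubling `extent` halves the effective radius. The function name is illustrative, not part of the API:

```ts
// Effective clustering radius in projected world units, assuming
// Supercluster-style semantics: radius / (extent * 2^zoom).
function worldRadius(radius: number, extent: number, zoom: number): number {
  return radius / (extent * Math.pow(2, zoom));
}

console.log(worldRadius(40, 512, 0)); // 0.078125 — defaults at zoom 0
console.log(worldRadius(40, 512, 4)); // 0.0048828125 — shrinks as you zoom in
```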
### `engine.load(table, geometryColumn?, idColumn?)`

Index an Arrow Table. The geometry column must use the GeoArrow Point encoding (`FixedSizeList[2]` of `Float64`). Single-chunk tables use a zero-copy fast path.
### `engine.getClusters(bbox, zoom)` → `ClusterOutput`

Query clusters within a bounding box `[minLng, minLat, maxLng, maxLat]` at the given zoom level. Returns typed arrays — no object allocation per result.

The returned arrays are views into reusable internal buffers. They're valid until the next `getClusters()` call. Copy them if you need to retain the data.
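The view-vs-copy distinction can be demonstrated with plain typed arrays (a stand-in for the engine's internal buffer, not the library's actual internals):

```ts
// Stand-in for the engine's reusable buffer (illustrative only).
const internal = new Float64Array(8);
internal.set([-122.4, 37.8, 2.35, 48.86]); // first query's results
const positions = internal.subarray(0, 4); // what a query returns: a view

// Iterate the interleaved [lng, lat, ...] layout with no per-result objects
for (let i = 0; i < positions.length; i += 2) {
  const lng = positions[i];
  const lat = positions[i + 1];
  // ...render (lng, lat)
}

const kept = positions.slice(); // copy to retain across queries
internal.fill(0);               // the next query overwrites the buffer
console.log(positions[0], kept[0]); // 0 -122.4 — the view changed, the copy didn't
```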
### `engine.getChildren(clusterId)` → `ClusterOutput`
Get the immediate children of a cluster.
### `engine.getLeaves(clusterId, limit?, offset?)` → `number[]`

Get all leaf point indices for a cluster. Returns indices into the original Arrow table — use `table.get(index)` to materialize rows.
### `engine.getClusterExpansionZoom(clusterId)` → `number`
Get the zoom level at which a cluster expands into its children.
### `engine.getOriginZoom(clusterId)` → `number`
Decode the zoom level from an encoded cluster ID.
### `engine.getOriginId(clusterId)` → `number`
Decode the origin index from an encoded cluster ID.
### `ClusterOutput`
```ts
interface ClusterOutput {
  positions: Float64Array;  // interleaved [lng, lat, lng, lat, ...]
  pointCounts: Uint32Array; // points per cluster (1 for individual points)
  ids: Float64Array;        // cluster ID or Arrow row index
  isCluster: Uint8Array;    // 1 = cluster, 0 = individual point
  length: number;           // total items
}
```

## Performance
Benchmarked against Supercluster with the same datasets:
| Metric | 200k points | 1M points |
|---|---|---|
| Load time | ~1× (parity) | ~1× (parity) |
| Query time (avg) | ~7.5× faster | ~8× faster |
| Query time (mid-zoom peak) | ~20× faster | ~27× faster |
| Wire size (Arrow IPC vs JSON) | 84% smaller | 84% smaller |
Query speedups come from returning pre-allocated typed arrays instead of GeoJSON Feature objects. The more clustering happening (low/mid zoom), the bigger the win.
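The output pattern behind that speedup can be sketched in isolation (illustrative — buffer size and the `emit` helper are hypothetical, not the library's internals): results are written into preallocated buffers rather than allocating a GeoJSON Feature object per result.

```ts
// Write-into-preallocated-buffers pattern (illustrative sketch).
const CAPACITY = 1024;
const positions = new Float64Array(CAPACITY * 2); // allocated once, reused per query
const pointCounts = new Uint32Array(CAPACITY);
let length = 0;

// Hypothetical helper: append one result with zero object allocation.
function emit(lng: number, lat: number, count: number): void {
  positions[length * 2] = lng;
  positions[length * 2 + 1] = lat;
  pointCounts[length] = count;
  length++;
}

emit(-0.1276, 51.5072, 240); // a cluster of 240 points
emit(13.405, 52.52, 1);      // an individual point
console.log(length, positions[2], pointCounts[0]); // 2 13.405 240
```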
## How It Works
Same algorithm as Supercluster (~400 lines), different I/O:
- Reads the `Float64Array` coordinate buffer directly from the Arrow geometry column
- Converts lng/lat → Mercator, packs into flat arrays
- Builds a KDBush spatial index per zoom level (top-down clustering)
- `getClusters()` does a range query and writes results into reusable typed array buffers
For individual points at high zoom, coordinates are read directly from the original Arrow buffer — no inverse Mercator transform needed.
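The lng/lat → Mercator step is the standard Supercluster projection onto the unit square; a sketch (function names are illustrative, not this library's API):

```ts
// Project longitude onto [0, 1]: linear.
function lngX(lng: number): number {
  return lng / 360 + 0.5;
}

// Project latitude onto [0, 1]: spherical Mercator, clamped at the poles.
function latY(lat: number): number {
  const sin = Math.sin((lat * Math.PI) / 180);
  const y = 0.5 - (0.25 * Math.log((1 + sin) / (1 - sin))) / Math.PI;
  return y < 0 ? 0 : y > 1 ? 1 : y;
}

console.log(lngX(0), latY(0));  // 0.5 0.5 — null island maps to the center
console.log(lngX(-180));        // 0 — western edge of the world
console.log(latY(85));          // small positive value near the top edge
```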
## License
ISC (same as Supercluster)