  • License MIT OR Apache-2.0

WebAssembly library for Arrow/Feather data with zero-copy ownership semantics

Package Exports

  • arrow-rs-wasm
  • arrow-rs-wasm/arrow_rs_wasm.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (arrow-rs-wasm) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

arrow-rs-wasm


A high-performance WebAssembly library for Apache Arrow, Feather, and Parquet data processing with zero-copy semantics, LZ4 compression, and comprehensive TypeScript support.

✨ Features

  • 🚀 Zero-copy data transfer between WebAssembly and JavaScript
  • 📊 Apache Arrow IPC format support with LZ4 compression
  • 🪶 Feather and Parquet file format support
  • 🔧 TypeScript-first API with complete type definitions
  • 🌐 Cross-platform - works in browsers and Node.js
  • ⚡ High performance - built on Rust's arrow-rs ecosystem
  • 🧪 Model-based testing for comprehensive validation
  • 📦 Tree-shaking friendly ES modules

📦 Installation

npm

npm install arrow-rs-wasm

Direct WASM

# Build from a source checkout using wasm-pack directly
wasm-pack build --target web

🚀 Quick Start

Browser Usage

import {
  initialize,
  createTestTable,
  writeTableToIpc,
  readTableFromBytes,
  getTableInfo,
  freeTable
} from 'arrow-rs-wasm';

// Wrap the steps in a function: `return` is not allowed at module top level
async function main() {
  // Initialize the WASM module
  const initResult = await initialize();
  if (!initResult.ok) {
    console.error('Failed to initialize:', initResult.error);
    return;
  }

  // Create a test table with sample data
  const tableResult = createTestTable();
  if (!tableResult.ok) {
    console.error('Failed to create table:', tableResult.error);
    return;
  }

  const tableHandle = tableResult.value;

  // Get table information
  const info = getTableInfo(tableHandle);
  console.log(`Table has ${info.numRows} rows and ${info.numColumns} columns`);

  // Write table to Arrow IPC format with LZ4 compression
  const ipcData = writeTableToIpc(tableHandle, true);
  console.log(`Serialized table to ${ipcData.length} bytes`);

  // Read the data back
  const newTableResult = readTableFromBytes(ipcData);
  if (newTableResult.ok) {
    console.log('Successfully round-tripped data!');
    await freeTable(newTableResult.value);
  } else {
    console.error('Failed to read table back:', newTableResult.error);
  }

  // Clean up the original table whether or not the round trip succeeded
  await freeTable(tableHandle);
}

main();

Node.js Usage

import { readFile } from 'fs/promises';
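import {
  initialize,
  readTableFromBytes,
  getTableSchemaJson,
  exportColumnWithType,
  freeTable
} from 'arrow-rs-wasm';

async function processArrowFile(filePath: string) {
  // Initialize WASM and stop early if it fails
  const initResult = await initialize();
  if (!initResult.ok) {
    throw new Error(`Failed to initialize: ${initResult.error}`);
  }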

  // Read Arrow file
  const fileData = await readFile(filePath);
  const bytes = new Uint8Array(fileData);

  // Parse Arrow data
  const tableResult = readTableFromBytes(bytes);
  if (!tableResult.ok) {
    throw new Error(`Failed to read table: ${tableResult.error}`);
  }

  const tableHandle = tableResult.value;

  // Get schema as JSON
  const schema = getTableSchemaJson(tableHandle);
  console.log('Schema:', JSON.parse(schema));

  // Export a specific column
  const columnData = exportColumnWithType(tableHandle, 'column_name', 'int32');
  console.log('Column data:', columnData);

  // Clean up
  await freeTable(tableHandle);
}

📚 API Reference

Core Functions

initialize(options?: InitOptions): Promise<Result<void>>

Initialize the WASM module. Must be called before using other functions.

interface InitOptions {
  wasmUrl?: string;  // Custom WASM binary URL
  memoryPages?: number;  // Initial memory allocation
}
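
The README does not say what unit `memoryPages` uses. If it counts standard WebAssembly pages (an assumption), each page is 64 KiB, so a byte budget can be converted as in the sketch below. `pagesForBytes` and `initOptionsSketch` are hypothetical names, not part of the package.

```typescript
// Size of one WebAssembly linear-memory page, fixed by the WebAssembly spec.
const WASM_PAGE_BYTES = 64 * 1024;

// Hypothetical helper: smallest number of pages that can hold `bytes` bytes.
function pagesForBytes(bytes: number): number {
  if (!Number.isFinite(bytes) || bytes < 0) {
    throw new RangeError(`bytes must be a non-negative finite number, got ${bytes}`);
  }
  return Math.ceil(bytes / WASM_PAGE_BYTES);
}

// Assumed usage: request roughly 16 MiB of initial memory.
const initOptionsSketch = { memoryPages: pagesForBytes(16 * 1024 * 1024) };
console.log(initOptionsSketch.memoryPages); // 256
```

The resulting object could then be passed as `initialize(initOptionsSketch)`, provided the option really is expressed in 64 KiB pages.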

createTestTable(): Result<TableHandle>

Create a simple test table with sample data for experimentation.

const result = createTestTable();
if (result.ok) {
  const handle = result.value;
  // Use the table...
}
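
The signatures in this reference return `Result<T>`, which is never defined here. Judging only from the `.ok`, `.value`, and `.error` accesses in the examples, it is probably a discriminated union along the following lines. `ResultSketch` and `unwrapResult` are illustrative names, and the real definition (including the type of `error`) may differ.

```typescript
// Assumed shape of Result<T>, inferred from how the examples use it.
type ResultSketch<T, E = unknown> =
  | { ok: true; value: T }
  | { ok: false; error: E };

// Hypothetical helper: return the value or throw, so callers can use try/catch.
function unwrapResult<T, E>(result: ResultSketch<T, E>, context: string): T {
  if (result.ok) {
    return result.value;
  }
  throw new Error(`${context}: ${String(result.error)}`);
}

// Stand-in results, not produced by the library.
const okResult: ResultSketch<number> = { ok: true, value: 42 };
const failedResult: ResultSketch<number, string> = { ok: false, error: "bad magic" };

console.log(unwrapResult(okResult, "demo")); // 42
try {
  unwrapResult(failedResult, "readTableFromBytes");
} catch (err) {
  console.log((err as Error).message); // readTableFromBytes: bad magic
}
```

A helper like this trades the explicit `if (!result.ok)` checks for exceptions, which can be more convenient inside a larger `try`/`finally` block.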

readTableFromBytes(data: Uint8Array): Result<TableHandle>

Read an Arrow IPC format byte array into a table handle.

// arrowData: an ArrayBuffer holding Arrow IPC bytes (for example from fetch or readFile)
const bytes = new Uint8Array(arrowData);
const result = readTableFromBytes(bytes);
if (result.ok) {
  const table = result.value;
  // Process table...
}
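
Before handing bytes to `readTableFromBytes`, it can help to check what kind of file they come from. The sketch below looks at the leading magic bytes: Arrow IPC files (and Feather V2) start with `ARROW1`, Feather V1 with `FEA1`, and Parquet with `PAR1`. Whether the library performs its own detection, and which of these formats `readTableFromBytes` accepts, is not stated here, so `guessContainer` is purely a hypothetical pre-check.

```typescript
type ContainerGuess = "arrow-file" | "feather-v1" | "parquet" | "unknown";

// Check whether `bytes` begins with the ASCII string `magic`.
function startsWithAscii(bytes: Uint8Array, magic: string): boolean {
  if (bytes.length < magic.length) return false;
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic.charCodeAt(i)) return false;
  }
  return true;
}

// Hypothetical helper: guess the container format from its magic bytes.
// Arrow IPC *streams* carry no magic string and are reported as "unknown".
function guessContainer(bytes: Uint8Array): ContainerGuess {
  if (startsWithAscii(bytes, "ARROW1")) return "arrow-file";
  if (startsWithAscii(bytes, "FEA1")) return "feather-v1";
  if (startsWithAscii(bytes, "PAR1")) return "parquet";
  return "unknown";
}

// "ARROW1" followed by two padding bytes, as at the start of an Arrow file.
const sampleHeader = new Uint8Array([0x41, 0x52, 0x52, 0x4f, 0x57, 0x31, 0, 0]);
console.log(guessContainer(sampleHeader)); // arrow-file
```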

writeTableToIpc(handle: TableHandle, enableLz4: boolean): Uint8Array

Write a table to Arrow IPC format with optional LZ4 compression.

const compressed = writeTableToIpc(tableHandle, true);
const uncompressed = writeTableToIpc(tableHandle, false);

Schema and Metadata

getTableSchemaJson(handle: TableHandle): string

Get the table schema as a JSON string.

const schemaJson = getTableSchemaJson(tableHandle);
const schema = JSON.parse(schemaJson);
console.log('Fields:', schema.fields);
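
Since `JSON.parse` returns an untyped value, a small type guard keeps the rest of the code type-safe. The sketch below assumes each field has `name`, `type`, and `nullable` keys, borrowing the shape from the custom-schema example in this README; the JSON actually produced by `getTableSchemaJson` may use different keys, so treat `SchemaFieldSketch` and `parseSchemaFields` as hypothetical.

```typescript
// Assumed field shape; not confirmed by the package documentation.
interface SchemaFieldSketch {
  name: string;
  type: string;
  nullable: boolean;
}

function isSchemaFieldSketch(candidate: unknown): candidate is SchemaFieldSketch {
  if (typeof candidate !== "object" || candidate === null) return false;
  const record = candidate as Record<string, unknown>;
  return (
    typeof record["name"] === "string" &&
    typeof record["type"] === "string" &&
    typeof record["nullable"] === "boolean"
  );
}

// Hypothetical helper: parse the schema JSON and keep only well-formed fields.
function parseSchemaFields(schemaJson: string): SchemaFieldSketch[] {
  const parsed: unknown = JSON.parse(schemaJson);
  if (typeof parsed !== "object" || parsed === null) return [];
  const fields = (parsed as { fields?: unknown }).fields;
  return Array.isArray(fields) ? fields.filter(isSchemaFieldSketch) : [];
}

// Stand-in JSON, not real library output.
const sampleSchemaJson = '{"fields":[{"name":"id","type":"int32","nullable":false},{"bogus":1}]}';
console.log(parseSchemaFields(sampleSchemaJson).map((f) => f.name)); // [ 'id' ]
```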

getTableInfo(handle: TableHandle): TableInfo

Get basic information about a table.

interface TableInfo {
  numRows: number;
  numColumns: number;
  numBatches: number;
}

exportColumnWithType(handle: TableHandle, columnName: string, dataType: string): any[]

Export a specific column's data as a JavaScript array.

const intColumn = exportColumnWithType(tableHandle, 'id', 'int32');
const stringColumn = exportColumnWithType(tableHandle, 'name', 'utf8');
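
Because the return type is `any[]`, callers lose type checking on the exported values. One way to recover it is a runtime guard, sketched below. It assumes missing values arrive as `null`, which this README does not confirm; `isNullableNumberColumn` and `sumColumn` are hypothetical helpers.

```typescript
// Hypothetical guard: true when every element is a number or null.
function isNullableNumberColumn(values: unknown[]): values is (number | null)[] {
  return values.every((v) => v === null || typeof v === "number");
}

// Sum a numeric column, skipping nulls; throws if other value types appear.
function sumColumn(values: unknown[]): number {
  if (!isNullableNumberColumn(values)) {
    throw new TypeError("column contains non-numeric values");
  }
  let total = 0;
  for (const v of values) {
    if (v !== null) total += v;
  }
  return total;
}

// Stand-in for the array returned by exportColumnWithType(handle, 'id', 'int32').
const exportedSketch: unknown[] = [1, 2, null, 4];
console.log(sumColumn(exportedSketch)); // 7
```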

Memory Management

freeTable(handle: TableHandle): Promise<Result<void>>

Release a table handle and free associated memory. Always call this when done with a table.

await freeTable(tableHandle);
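
To make sure the handle is released even when processing throws, the call can be placed in a `finally` block. The helper below is a minimal sketch of that pattern. It takes the free function as a parameter, so it assumes nothing about the package beyond "freeing takes a handle and returns a promise"; `withHandle` is a hypothetical name.

```typescript
// Hypothetical helper: run `work` and always release the handle afterwards.
async function withHandle<H, R>(
  handle: H,
  free: (handle: H) => Promise<unknown>,
  work: (handle: H) => R | Promise<R>,
): Promise<R> {
  try {
    return await work(handle);
  } finally {
    // Runs on success and on failure alike.
    await free(handle);
  }
}

// Stand-in demo with a fake handle and a fake free function.
const releasedHandles: number[] = [];
const fakeFree = async (handle: number): Promise<void> => {
  releasedHandles.push(handle);
};

withHandle(1, fakeFree, (handle) => handle + 1).then((value) => {
  console.log(value, releasedHandles); // 2 [ 1 ]
});
```

With the real API this would presumably read `await withHandle(tableHandle, freeTable, (h) => getTableInfo(h).numRows)`.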

getMemoryStats(): MemoryStats

Get current memory usage statistics.

interface MemoryStats {
  activeTables: number;
  totalRows: number;
  totalBatches: number;
}
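
One use for these statistics is a leak check in tests: take a snapshot before and after a piece of code and compare `activeTables`. The sketch below works on plain objects so it runs without the package; `MemoryStatsSketch` mirrors the interface above and `leakedTables` is a hypothetical helper.

```typescript
// Mirrors the MemoryStats interface documented above.
interface MemoryStatsSketch {
  activeTables: number;
  totalRows: number;
  totalBatches: number;
}

// Hypothetical helper: tables opened but not freed between two snapshots.
function leakedTables(before: MemoryStatsSketch, after: MemoryStatsSketch): number {
  return Math.max(0, after.activeTables - before.activeTables);
}

// Stand-in snapshots; in real code both would come from getMemoryStats().
const beforeSnapshot: MemoryStatsSketch = { activeTables: 1, totalRows: 100, totalBatches: 1 };
const afterSnapshot: MemoryStatsSketch = { activeTables: 3, totalRows: 350, totalBatches: 4 };
console.log(leakedTables(beforeSnapshot, afterSnapshot)); // 2
```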

🔧 Advanced Usage

Custom Schema Creation

import { initialize, createCustomTable } from 'arrow-rs-wasm';

await initialize();

// Define schema
const schema = {
  fields: [
    { name: 'timestamp', type: 'timestamp', nullable: false },
    { name: 'sensor_id', type: 'int64', nullable: false },
    { name: 'temperature', type: 'float64', nullable: true },
    { name: 'location', type: 'utf8', nullable: true }
  ]
};

// Create table with custom data
const data = {
  timestamp: [Date.now(), Date.now() + 1000],
  sensor_id: [1001, 1002],
  temperature: [23.5, 24.1],
  location: ['Building A', 'Building B']
};

const tableHandle = createCustomTable(schema, data);
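
How `createCustomTable` reacts to mismatched input is not described here, so a pre-flight check on the JavaScript side may be worthwhile. The sketch below verifies that every schema field has a column, that all columns have the same length, and that non-nullable columns contain no `null` or `undefined`. `FieldSpecSketch` and `checkColumns` are hypothetical and independent of the package.

```typescript
interface FieldSpecSketch {
  name: string;
  type: string;
  nullable: boolean;
}

// Hypothetical pre-flight check; returns a list of problems (empty means OK).
function checkColumns(
  fields: FieldSpecSketch[],
  columns: Record<string, unknown[]>,
): string[] {
  const problems: string[] = [];
  let expectedLength: number | undefined;
  for (const field of fields) {
    const column = columns[field.name];
    if (column === undefined) {
      problems.push(`missing column "${field.name}"`);
      continue;
    }
    // The first column found sets the length every other column must match.
    if (expectedLength === undefined) expectedLength = column.length;
    if (column.length !== expectedLength) {
      problems.push(`column "${field.name}" has ${column.length} values, expected ${expectedLength}`);
    }
    if (!field.nullable && column.some((v) => v === null || v === undefined)) {
      problems.push(`non-nullable column "${field.name}" contains a null`);
    }
  }
  return problems;
}

const sketchFields: FieldSpecSketch[] = [
  { name: "sensor_id", type: "int64", nullable: false },
  { name: "temperature", type: "float64", nullable: true },
];
console.log(checkColumns(sketchFields, { sensor_id: [1001, 1002], temperature: [23.5, null] }).length); // 0
```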

Compression Comparison

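const tableResult = createTestTable();
if (!tableResult.ok) {
  throw new Error(`Failed to create table: ${tableResult.error}`);
}
const tableHandle = tableResult.value;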

// Compare compression ratios
const uncompressed = writeTableToIpc(tableHandle, false);
const compressed = writeTableToIpc(tableHandle, true);

console.log(`Uncompressed: ${uncompressed.length} bytes`);
console.log(`Compressed: ${compressed.length} bytes`);
console.log(`Ratio: ${(compressed.length / uncompressed.length * 100).toFixed(1)}%`);
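
The ratio printed above is the compressed size as a percentage of the original, so lower is better. A small helper can report the space saved as well, as sketched below with made-up sizes; `compressionSummary` is a hypothetical name. For very small tables the saving may be negative, since compression adds its own overhead.

```typescript
// Hypothetical helper: summarise two serialized sizes (in bytes).
function compressionSummary(uncompressedBytes: number, compressedBytes: number) {
  if (uncompressedBytes <= 0) {
    throw new RangeError("uncompressedBytes must be positive");
  }
  const ratio = compressedBytes / uncompressedBytes;
  return {
    ratioPercent: ratio * 100, // compressed size as a share of the original
    savedPercent: (1 - ratio) * 100, // space saved; negative if the data grew
  };
}

const summarySketch = compressionSummary(2000, 500); // made-up sizes
console.log(summarySketch.ratioPercent.toFixed(1)); // 25.0
console.log(summarySketch.savedPercent.toFixed(1)); // 75.0
```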

Batch Processing

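import { readFile } from 'fs/promises';
import {
  initialize,
  readTableFromBytes,
  getTableInfo,
  freeTable
} from 'arrow-rs-wasm';

async function processBatches(files: string[]) {
  await initialize();

  for (const file of files) {
    const data = await readFile(file);
    const result = readTableFromBytes(new Uint8Array(data));

    if (!result.ok) {
      console.error(`Skipping ${file}:`, result.error);
      continue;
    }

    try {
      const info = getTableInfo(result.value);
      console.log(`Processed ${file}: ${info.numRows} rows`);
    } finally {
      // Free each table before loading the next to keep memory bounded
      await freeTable(result.value);
    }
  }
}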

๐ŸŒ Browser Support

| Browser | Version | Notes        |
|---------|---------|--------------|
| Chrome  | 57+     | Full support |
| Firefox | 52+     | Full support |
| Safari  | 11+     | Full support |
| Edge    | 16+     | Full support |

Browser Setup

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Arrow WASM Demo</title>
</head>
<body>
    <script type="module">
        import init, {
            initialize,
            createTestTable,
            getTableInfo
        } from './pkg/arrow_rs_wasm.js';

        async function demo() {
            await init();
            await initialize();

            const table = createTestTable();
            if (table.ok) {
                const info = getTableInfo(table.value);
                console.log('Table created with', info.numRows, 'rows');
            }
        }

        demo();
    </script>
</body>
</html>

⚡ Performance

Zero-Copy Benefits

The library uses zero-copy memory sharing between WebAssembly and JavaScript, providing significant performance advantages:

  • Memory efficiency: No data duplication between WASM and JS
  • Speed: Direct memory access without serialization overhead
  • Scalability: Handle large datasets efficiently
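
The difference between sharing and copying can be illustrated with plain typed arrays. In the sketch below an `ArrayBuffer` stands in for WebAssembly linear memory (an assumption made so the example runs anywhere); it shows the general JavaScript mechanism, not this library's internals.

```typescript
// A plain ArrayBuffer stands in for WebAssembly linear memory here.
const linearMemorySketch = new ArrayBuffer(16);
const wasmSideWriter = new Uint8Array(linearMemorySketch);
wasmSideWriter.set([10, 20, 30, 40], 4); // pretend the module wrote a column at offset 4

// Zero-copy: a typed-array view over the same bytes (offset 4, length 4).
const sharedView = new Uint8Array(linearMemorySketch, 4, 4);
// Copying: slice() allocates a new buffer with its own bytes.
const copiedBytes = sharedView.slice();

wasmSideWriter[4] = 99; // a later write on the "WASM side"
console.log(sharedView[0]); // 99, the view sees the change
console.log(copiedBytes[0]); // 10, the copy does not
```

The flip side of sharing is lifetime: a view over real WebAssembly memory becomes unusable when that memory grows, so long-lived data is usually copied out. How this library handles that case is not described here.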

Benchmarks

| Operation    | Size        | Time   | Memory |
|--------------|-------------|--------|--------|
| Read IPC     | 10 MB       | ~50 ms | ~10 MB |
| Write IPC    | 10 MB       | ~75 ms | ~20 MB |
| Schema parse | 1000 fields | ~5 ms  | ~1 MB  |
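
To compare these figures with other libraries it can help to convert them to throughput. The sketch below does the arithmetic for the two IPC rows, taking the approximate table values at face value; `throughputMBps` is a hypothetical helper.

```typescript
// Throughput in MB/s from a payload size in MB and a duration in ms.
function throughputMBps(sizeMB: number, timeMs: number): number {
  if (timeMs <= 0) throw new RangeError("timeMs must be positive");
  return (sizeMB * 1000) / timeMs;
}

console.log(throughputMBps(10, 50).toFixed(0)); // 200 (Read IPC)
console.log(throughputMBps(10, 75).toFixed(0)); // 133 (Write IPC)
```

So the quoted numbers correspond to roughly 200 MB/s for reads and 133 MB/s for writes.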

🧪 Testing

The library includes comprehensive model-based testing:

# Run all tests
npm test

# Browser tests
npm run test:browser

# Node.js tests
npm run test:node

# Build and test
npm run build && npm test

🔨 Development

Prerequisites

  • Rust 1.70+
  • Node.js 18+
  • wasm-pack

Building

# Install dependencies
npm install

# Build WASM and TypeScript
npm run build

# Watch mode for development
npm run dev

📄 License

This project is licensed under either of the MIT license or the Apache License, Version 2.0, at your option.

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Guidelines

  1. Follow the existing code style
  2. Add tests for new features
  3. Update documentation as needed
  4. Ensure all tests pass

📈 Roadmap

  • Parquet writer support
  • Additional compression algorithms (Zstd, Snappy)
  • Streaming data support
  • WebAssembly SIMD optimizations
  • Advanced filtering and aggregation APIs

Built with ❤️ using Rust and WebAssembly