JSPM

  • License MIT OR Apache-2.0

WebAssembly library for Arrow/Feather data with zero-copy ownership semantics

Package Exports

  • arrow-rs-wasm
  • arrow-rs-wasm/arrow_rs_wasm.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (arrow-rs-wasm) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

arrow-rs-wasm


A high-performance WebAssembly library for reading, writing, and processing Apache Arrow, Feather, and Parquet data with zero-copy semantics and LZ4 compression support.

Features

  • Zero-Copy Performance: Efficient data processing with minimal memory overhead
  • Multiple File Formats: Native support for Arrow IPC, Feather, and Parquet files
  • LZ4 Compression: Built-in compression support for optimized storage
  • Type Safety: Complete TypeScript definitions with Result-based error handling
  • Universal: Works in browsers and Node.js environments
  • Production Ready: Optimized WebAssembly build with comprehensive testing
  • Model-Based Testing: Comprehensive MBD test suite with cross-browser validation
  • Plugin Architecture: Extensible design for custom data type support
  • Performance Monitoring: Built-in benchmarking and memory leak detection

Installation

npm install arrow-rs-wasm

Quick Start

Basic Usage (Browser)

import {
  initWasm,
  readTableFromArrayBuffer,
  getSchemaSummary,
  getTableRowCount,
  exportColumnByName,
  writeTableToIpc,
  freeTable
} from 'arrow-rs-wasm';

// Initialize the WASM module
const initResult = await initWasm();
if (!initResult.ok) {
  // Top-level module code cannot `return`, so fail loudly instead
  throw new Error(`Failed to initialize WASM: ${initResult.error}`);
}

// Load an Arrow file
const response = await fetch('data/sample.arrow');
const arrayBuffer = await response.arrayBuffer();

// Read the table
const readResult = await readTableFromArrayBuffer(arrayBuffer);
if (!readResult.ok) {
  throw new Error(`Failed to read table: ${readResult.error}`);
}

const tableHandle = readResult.value;

// Inspect the schema
const schemaResult = await getSchemaSummary(tableHandle);
if (schemaResult.ok) {
  console.log('Table Schema:');
  schemaResult.value.columns.forEach(col => {
    console.log(`- ${col.name}: ${col.arrowType} (nullable: ${col.nullable})`);
  });
}

// Get row count
const rowCountResult = await getTableRowCount(tableHandle);
if (rowCountResult.ok) {
  console.log(`Table has ${rowCountResult.value} rows`);
}

// Export a specific column
const columnResult = await exportColumnByName(tableHandle, 'id');
if (columnResult.ok) {
  const column = columnResult.value;
  console.log(`Column type: ${column.arrowType}, length: ${column.length}`);
  
  // For Int32 columns, you can convert to typed array
  if (column.arrowType === 'Int32') {
    const values = new Int32Array(column.data);
    console.log('First few values:', values.slice(0, 5));
  }
}

// Write back with LZ4 compression
const writeResult = await writeTableToIpc(tableHandle, true);
if (writeResult.ok) {
  const compressedData = writeResult.value;
  console.log(`Compressed size: ${compressedData.byteLength} bytes`);
}

// Clean up memory
await freeTable(tableHandle);

Node.js Usage

import { readFileSync } from 'fs';
import {
  initWasm,
  readTableFromArrayBuffer,
  getSchemaSummary,
  freeTable
} from 'arrow-rs-wasm';

// Initialize WASM
await initWasm();

// Read file from disk
const fileData = readFileSync('data/sample.parquet');
const arrayBuffer = fileData.buffer.slice(
  fileData.byteOffset,
  fileData.byteOffset + fileData.byteLength
);

// Process the file
const result = await readTableFromArrayBuffer(arrayBuffer);
if (result.ok) {
  const schemaResult = await getSchemaSummary(result.value);
  if (schemaResult.ok) {
    console.log('Columns:', schemaResult.value.columns.length);
  }

  await freeTable(result.value);
}
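The `byteOffset` slicing in the Node.js example above is not boilerplate: small Node Buffers are often views into a larger shared pool, so `fileData.buffer` can contain unrelated bytes. A self-contained sketch (`toArrayBuffer` is an illustrative helper name, not part of this library's API):

```typescript
// Sketch: copy a typed-array view into a tightly-sized ArrayBuffer.
// A view's backing buffer may be larger than the view itself, so
// slicing by byteOffset/byteLength extracts exactly the view's bytes.
function toArrayBuffer(buf: Uint8Array): ArrayBuffer {
  return buf.buffer.slice(
    buf.byteOffset,
    buf.byteOffset + buf.byteLength
  ) as ArrayBuffer;
}

// Simulate a pooled buffer: a 3-byte view into a 16-byte backing store.
const pool = new ArrayBuffer(16);
const view = new Uint8Array(pool, 4, 3);
view.set([10, 20, 30]);

const tight = toArrayBuffer(view);
console.log(tight.byteLength); // 3, not 16
```

Passing the whole backing buffer instead of the tight slice would hand the reader garbage bytes on either side of the file data.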

File Format Examples

Reading Different Formats

// The library automatically detects file formats
async function processFile(arrayBuffer: ArrayBuffer) {
  const result = await readTableFromArrayBuffer(arrayBuffer);

  if (!result.ok) {
    console.error('Read failed:', result.error);
    return;
  }

  // Works with:
  // - Arrow IPC files (.arrow)
  // - Feather files (.feather, .fea)
  // - Parquet files (.parquet)

  const handle = result.value;
  const schema = await getSchemaSummary(handle);
  if (schema.ok) {
    console.log('Successfully read file with', schema.value.columns.length, 'columns');
  }
  await freeTable(handle);
}

// Usage with different file types
await processFile(await fetch('data.arrow').then(r => r.arrayBuffer()));
await processFile(await fetch('data.feather').then(r => r.arrayBuffer()));
await processFile(await fetch('data.parquet').then(r => r.arrayBuffer()));

Writing with Compression

async function saveWithCompression(tableHandle: number) {
  // Write with LZ4 compression enabled
  const compressedResult = await writeTableToIpc(tableHandle, true);

  // Write without compression
  const uncompressedResult = await writeTableToIpc(tableHandle, false);

  if (compressedResult.ok && uncompressedResult.ok) {
    const compressed = compressedResult.value;
    const uncompressed = uncompressedResult.value;

    console.log(`Compression ratio: ${(compressed.byteLength / uncompressed.byteLength * 100).toFixed(1)}%`);

    // Save compressed version
    const blob = new Blob([compressed], { type: 'application/octet-stream' });
    const url = URL.createObjectURL(blob);

    const link = document.createElement('a');
    link.href = url;
    link.download = 'data_compressed.arrow';
    link.click();
    URL.revokeObjectURL(url); // release the object URL once the download starts
  }
}

Advanced Usage

Memory Management

import {
  getMemoryStats,
  clearAllTables,
  isValidHandle
} from 'arrow-rs-wasm';

async function demonstrateMemoryManagement() {
  // Check memory usage
  const statsResult = await getMemoryStats();
  if (statsResult.ok) {
    const stats = statsResult.value;
    console.log(`Active tables: ${stats.activeTables}`);
    console.log(`Total rows: ${stats.totalRows}`);
    console.log(`Total batches: ${stats.totalBatches}`);
  }

  // Validate handles
  const handle = 123;
  if (isValidHandle(handle)) {
    console.log('Handle is valid');
    await freeTable(handle);
  }

  // Clear all tables (useful for testing)
  clearAllTables();
}

Error Handling Patterns

// The library uses Result types instead of throwing exceptions
async function safeProcessing(data: ArrayBuffer) {
  const readResult = await readTableFromArrayBuffer(data);

  // Pattern 1: Early return on error
  if (!readResult.ok) {
    console.error('Failed to read:', readResult.error);
    return;
  }

  const handle = readResult.value;

  // Pattern 2: try/finally guarantees cleanup on every path
  try {
    const schemaResult = await getSchemaSummary(handle);
    if (!schemaResult.ok) {
      console.error('Schema read failed:', schemaResult.error);
      return;
    }

    // Process columns...
    const writeResult = await writeTableToIpc(handle, true);
    if (!writeResult.ok) {
      console.error('Write failed:', writeResult.error);
      return;
    }

    console.log('Successfully processed and wrote data');
    return writeResult.value;
  } finally {
    // Runs on every exit path, so the handle never leaks
    await freeTable(handle);
  }
}

Batch Processing

async function processManyFiles(files: File[]) {
  const results = [];

  for (const file of files) {
    const arrayBuffer = await file.arrayBuffer();
    const readResult = await readTableFromArrayBuffer(arrayBuffer);

    if (readResult.ok) {
      const handle = readResult.value;
      const schemaResult = await getSchemaSummary(handle);

      if (schemaResult.ok) {
        results.push({
          filename: file.name,
          columns: schemaResult.value.columns.length,
          handle: handle
        });
      } else {
        // Free handles that won't be tracked in results, or they leak
        await freeTable(handle);
      }
    }
  }

  // Process results...
  console.log(`Successfully loaded ${results.length} files`);

  // Clean up all handles
  for (const result of results) {
    await freeTable(result.handle);
  }

  return results;
}

Working with Column Data

async function analyzeColumnData(handle: TableHandle, columnName: string) {
  // Export column data
  const colResult = await exportColumnByName(handle, columnName);
  if (!colResult.ok) {
    console.error('Failed to export column:', colResult.error);
    return;
  }

  const column = colResult.value;
  console.log(`Column '${columnName}' - Type: ${column.arrowType}, Length: ${column.length}`);

  // Handle different data types
  switch (column.arrowType) {
    case 'Int32':
      const int32Values = new Int32Array(column.data);
      console.log('Int32 values:', int32Values.slice(0, 10));
      break;
      
    case 'Float64':
      const float64Values = new Float64Array(column.data);
      console.log('Float64 values:', float64Values.slice(0, 10));
      break;
      
    case 'Utf8':
      // UTF-8 strings require offset buffer + data buffer
      if (column.extraBuffers && column.extraBuffers.length > 0) {
        const offsets = new Int32Array(column.extraBuffers[0]);
        const bytes = new Uint8Array(column.data);
        
        // Extract first few strings
        const strings = [];
        for (let i = 0; i < Math.min(5, offsets.length - 1); i++) {
          const start = offsets[i];
          const end = offsets[i + 1];
          const stringBytes = bytes.slice(start, end);
          strings.push(new TextDecoder().decode(stringBytes));
        }
        console.log('String values:', strings);
      }
      break;
      
    case 'Boolean':
      // Boolean arrays are bit-packed
      const boolBytes = new Uint8Array(column.data);
      const boolValues = [];
      for (let i = 0; i < Math.min(column.length, 40); i++) {
        const byteIndex = Math.floor(i / 8);
        const bitIndex = i % 8;
        const value = (boolBytes[byteIndex] & (1 << bitIndex)) !== 0;
        boolValues.push(value);
      }
      console.log('Boolean values:', boolValues);
      break;
      
    default:
      console.log('Raw data buffer size:', column.data.byteLength);
  }
  
  // Handle null bitmap if present
  if (column.nullBitmap) {
    const nullBytes = new Uint8Array(column.nullBitmap);
    console.log('Has null bitmap, size:', nullBytes.length);
    
    // Count nulls in first 64 values
    let nullCount = 0;
    for (let i = 0; i < Math.min(column.length, 64); i++) {
      const byteIndex = Math.floor(i / 8);
      const bitIndex = i % 8;
      const isNull = (nullBytes[byteIndex] & (1 << bitIndex)) === 0;
      if (isNull) nullCount++;
    }
    console.log(`Null count in first 64 values: ${nullCount}`);
  }
}
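The Utf8 and Boolean branches above can be factored into small standalone helpers. The sketch below runs against synthetic buffers rather than a real `ColumnExport` (the helper names `decodeUtf8` and `unpackBits` are illustrative, not part of the library):

```typescript
// Decode a Utf8 column: Int32 offsets index into one shared UTF-8 byte buffer.
// String i occupies bytes[offsets[i] .. offsets[i + 1]].
function decodeUtf8(offsets: Int32Array, bytes: Uint8Array): string[] {
  const out: string[] = [];
  const dec = new TextDecoder();
  for (let i = 0; i < offsets.length - 1; i++) {
    out.push(dec.decode(bytes.subarray(offsets[i], offsets[i + 1])));
  }
  return out;
}

// Unpack a bit-packed boolean buffer (least-significant bit first, as in Arrow).
function unpackBits(bytes: Uint8Array, length: number): boolean[] {
  const out: boolean[] = [];
  for (let i = 0; i < length; i++) {
    out.push((bytes[i >> 3] & (1 << (i % 8))) !== 0);
  }
  return out;
}

// Synthetic data: the strings ["hi", "arrow"] and the bits [true, false, true].
const offsets = new Int32Array([0, 2, 7]);
const bytes = new TextEncoder().encode('hiarrow');
console.log(decodeUtf8(offsets, bytes));                  // ["hi", "arrow"]
console.log(unpackBits(new Uint8Array([0b00000101]), 3)); // [true, false, true]
```

Note that a Utf8 column with N values has N + 1 offsets, which is why the loop above stops at `offsets.length - 1`.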

API Reference

Core Functions

  • initWasm(wasmBytes?, opts?): Initialize the WASM module. Returns Promise<Result<void>>
  • readTableFromArrayBuffer(data): Read Arrow/Feather/Parquet data. Returns Promise<Result<TableHandle>>
  • getSchemaSummary(handle): Get table schema information. Returns Promise<Result<SchemaSummary>>
  • getTableRowCount(handle): Get the number of rows in a table. Returns Promise<Result<number>>
  • exportColumnByName(handle, name): Export column data as typed arrays. Returns Promise<Result<ColumnExport>>
  • writeTableToIpc(handle, enableLz4): Write the table as Arrow IPC. Returns Promise<Result<ArrayBuffer>>
  • freeTable(handle): Release table memory. Returns Promise<Result<void>>

Utility Functions

  • isWasmInitialized(): Check whether WASM is ready. Returns boolean
  • isValidHandle(handle): Validate a table handle. Returns boolean
  • getMemoryStats(): Get memory usage statistics. Returns Promise<Result<MemoryStats>>
  • clearAllTables(): Clear all tables (intended for testing). Returns void

TypeScript Types

// Result type for error handling
export type Result<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Table handle (opaque reference)
export type TableHandle = number;

// Schema information
export interface SchemaSummary {
  columns: ColumnInfo[];
  metadata: Record<string, string>;
}

export interface ColumnInfo {
  name: string;
  arrowType: string;
  nullable: boolean;
}

// Column export data
export interface ColumnExport {
  arrowType: string;
  length: number;
  nullable: boolean;
  data: ArrayBuffer;
  nullBitmap?: ArrayBuffer;
  extraBuffers?: ArrayBuffer[];
}

// Memory statistics
export interface MemoryStats {
  activeTables: number;
  totalRows: number;
  totalBatches: number;
}

// Initialization options
export interface WasmInitOptions {
  enableConsoleLogs?: boolean;
}

Performance Considerations

Memory Management Best Practices

  1. Always free table handles when done processing
  2. Use getMemoryStats() to monitor memory usage
  3. Process files in batches for large datasets
  4. Enable LZ4 compression for network transfers

Optimization Tips

// Good: Process and immediately free
async function efficientProcessing(data: ArrayBuffer) {
  const result = await readTableFromArrayBuffer(data);
  if (result.ok) {
    const schema = await getSchemaSummary(result.value);
    const output = await writeTableToIpc(result.value, true);

    await freeTable(result.value); // Free immediately
    return output;
  }
}

// Avoid: Accumulating handles without cleanup
const handles = []; // This will leak memory!
for (const file of files) {
  const result = await readTableFromArrayBuffer(await file.arrayBuffer());
  if (result.ok) {
    handles.push(result.value); // Don't do this
  }
}

Browser vs Node.js

Browser-Specific Features

  • Automatic WASM loading from CDN
  • File drag-and-drop processing
  • Blob/URL creation for downloads
  • Worker thread support

Node.js-Specific Features

  • File system integration
  • Stream processing
  • Buffer handling
  • Command-line tools

Testing

This library includes a comprehensive Model-Based Development (MBD) test suite for thorough validation:

# Run the full MBD test suite
cd mbd_tests
npm install
npm run test:models

# Cross-browser testing
npm run test:browsers

# Performance benchmarking
npm run test:performance

# Model-based test generation
npm run models:generate

Test Coverage

  • Behavioral Models: API lifecycle, error handling, memory management
  • Structural Models: Architecture validation, data flow verification
  • Cross-Browser: Chrome, Firefox, Safari compatibility
  • Performance: Baseline establishment and regression detection

Contributing

Contributions are welcome! Please read our contributing guidelines and ensure all tests pass before submitting a pull request.

Development Setup

# Clone the repository
git clone https://github.com/BectorVoom/arrow-rs-wasm.git
cd arrow-rs-wasm

# Install dependencies
npm install

# Build the project
npm run build

# Run tests
npm test

# Run comprehensive MBD tests
cd mbd_tests && npm run validate:all

License

This project is dual-licensed under either:

  • MIT License
  • Apache License, Version 2.0

at your option.


Built with ❤️ using Rust and WebAssembly