Package Exports
- arrow-rs-wasm
- arrow-rs-wasm/pkg
Readme
arrow-rs-wasm
A high-performance WebAssembly library for reading, writing, and processing Apache Arrow, Feather, and Parquet data with zero-copy semantics and LZ4 compression support.
๐ Features
- ๐ฅ Zero-Copy Performance: Efficient data processing with minimal memory overhead
- ๐ Multiple File Formats: Native support for Arrow IPC, Feather, and Parquet files
- ๐๏ธ LZ4 Compression: Built-in compression support for optimized storage
- ๐ Type Safety: Complete TypeScript definitions with Result-based error handling
- ๐ Universal: Works in browsers and Node.js environments
- โก Production Ready: Optimized WebAssembly build with comprehensive testing
- ๐งช Model-Based Testing: Comprehensive MBD test suite with cross-browser validation
- ๐ง Plugin Architecture: Extensible design for custom data type support
- ๐ Performance Monitoring: Built-in benchmarking and memory leak detection
๐ฆ Installation
npm install arrow-rs-wasm๐ Quick Start
Basic Usage (Browser)
import {
initWasm,
readTableFromArrayBuffer,
getSchemaSummary,
writeTableToIpc,
freeTable
} from 'arrow-rs-wasm';
// Initialize the WASM module
const initResult = await initWasm();
if (!initResult.ok) {
console.error('Failed to initialize WASM:', initResult.error);
return;
}
// Load an Arrow file
const response = await fetch('data/sample.arrow');
const arrayBuffer = await response.arrayBuffer();
// Read the table
const readResult = await readTableFromArrayBuffer(arrayBuffer);
if (!readResult.ok) {
console.error('Failed to read table:', readResult.error);
return;
}
const tableHandle = readResult.value;
// Inspect the schema
const schemaResult = await getSchemaSummary(tableHandle);
if (schemaResult.ok) {
console.log('Table Schema:');
schemaResult.value.columns.forEach(col => {
console.log(`- ${col.name}: ${col.arrowType} (nullable: ${col.nullable})`);
});
}
// Write back with LZ4 compression
const writeResult = await writeTableToIpc(tableHandle, true);
if (writeResult.ok) {
const compressedData = writeResult.value;
console.log(`Compressed size: ${compressedData.byteLength} bytes`);
}
// Clean up memory
await freeTable(tableHandle);Node.js Usage
import { readFileSync } from 'fs';
import {
initWasm,
readTableFromArrayBuffer,
getSchemaSummary
} from 'arrow-rs-wasm';
// Initialize WASM
await initWasm();
// Read file from disk
const fileData = readFileSync('data/sample.parquet');
const arrayBuffer = fileData.buffer.slice(
fileData.byteOffset,
fileData.byteOffset + fileData.byteLength
);
// Process the file
const result = await readTableFromArrayBuffer(arrayBuffer);
if (result.ok) {
const schema = await getSchemaSummary(result.value);
console.log('Columns:', schema.value?.columns.length);
await freeTable(result.value);
}๐ File Format Examples
Reading Different Formats
// The library automatically detects file formats
async function processFile(arrayBuffer: ArrayBuffer) {
const result = await readTableFromArrayBuffer(arrayBuffer);
if (!result.ok) {
console.error('Read failed:', result.error);
return;
}
// Works with:
// - Arrow IPC files (.arrow)
// - Feather files (.feather, .fea)
// - Parquet files (.parquet)
const handle = result.value;
const schema = await getSchemaSummary(handle);
console.log('Successfully read file with', schema.value?.columns.length, 'columns');
await freeTable(handle);
}
// Usage with different file types
await processFile(await fetch('data.arrow').then(r => r.arrayBuffer()));
await processFile(await fetch('data.feather').then(r => r.arrayBuffer()));
await processFile(await fetch('data.parquet').then(r => r.arrayBuffer()));Writing with Compression
async function saveWithCompression(tableHandle: number) {
// Write with LZ4 compression enabled
const compressedResult = await writeTableToIpc(tableHandle, true);
// Write without compression
const uncompressedResult = await writeTableToIpc(tableHandle, false);
if (compressedResult.ok && uncompressedResult.ok) {
const compressed = compressedResult.value;
const uncompressed = uncompressedResult.value;
console.log(`Compression ratio: ${(compressed.byteLength / uncompressed.byteLength * 100).toFixed(1)}%`);
// Save compressed version
const blob = new Blob([compressed], { type: 'application/octet-stream' });
const url = URL.createObjectURL(blob);
const link = document.createElement('a');
link.href = url;
link.download = 'data_compressed.arrow';
link.click();
}
}๐ง Advanced Usage
Memory Management
import {
getMemoryStats,
clearAllTables,
isValidHandle
} from 'arrow-rs-wasm';
async function demonstrateMemoryManagement() {
// Check memory usage
const statsResult = await getMemoryStats();
if (statsResult.ok) {
const stats = statsResult.value;
console.log(`Active tables: ${stats.activeTables}`);
console.log(`Total rows: ${stats.totalRows}`);
console.log(`Total batches: ${stats.totalBatches}`);
}
// Validate handles
const handle = 123;
if (isValidHandle(handle)) {
console.log('Handle is valid');
await freeTable(handle);
}
// Clear all tables (useful for testing)
clearAllTables();
}Error Handling Patterns
// The library uses Result types instead of throwing exceptions
async function safeProcessing(data: ArrayBuffer) {
const readResult = await readTableFromArrayBuffer(data);
// Pattern 1: Early return on error
if (!readResult.ok) {
console.error('Failed to read:', readResult.error);
return;
}
const handle = readResult.value;
// Pattern 2: Nested result handling
const schemaResult = await getSchemaSummary(handle);
if (schemaResult.ok) {
const columns = schemaResult.value.columns;
// Process columns...
const writeResult = await writeTableToIpc(handle, true);
if (writeResult.ok) {
console.log('Successfully processed and wrote data');
return writeResult.value;
} else {
console.error('Write failed:', writeResult.error);
}
} else {
console.error('Schema read failed:', schemaResult.error);
}
// Always clean up
await freeTable(handle);
}Batch Processing
async function processManyFiles(files: File[]) {
const results = [];
for (const file of files) {
const arrayBuffer = await file.arrayBuffer();
const readResult = await readTableFromArrayBuffer(arrayBuffer);
if (readResult.ok) {
const handle = readResult.value;
const schemaResult = await getSchemaSummary(handle);
if (schemaResult.ok) {
results.push({
filename: file.name,
columns: schemaResult.value.columns.length,
handle: handle
});
}
}
}
// Process results...
console.log(`Successfully loaded ${results.length} files`);
// Clean up all handles
for (const result of results) {
await freeTable(result.handle);
}
return results;
}๐ API Reference
Core Functions
| Function | Description | Returns |
|---|---|---|
initWasm(wasmBytes?, opts?) |
Initialize the WASM module | Promise<Result<void>> |
readTableFromArrayBuffer(data) |
Read Arrow/Feather/Parquet data | Promise<Result<TableHandle>> |
getSchemaSummary(handle) |
Get table schema information | Promise<Result<SchemaSummary>> |
getTableRowCount(handle) |
Get number of rows in table | Promise<Result<number>> |
exportColumnByName(handle, name) |
Export column data as typed arrays | Promise<Result<ColumnExport>> |
writeTableToIpc(handle, enableLz4) |
Write table as Arrow IPC | Promise<Result<ArrayBuffer>> |
freeTable(handle) |
Release table memory | Promise<Result<void>> |
Utility Functions
| Function | Description | Returns |
|---|---|---|
isWasmInitialized() |
Check if WASM is ready | boolean |
isValidHandle(handle) |
Validate table handle | boolean |
getMemoryStats() |
Get memory usage statistics | Promise<Result<MemoryStats>> |
clearAllTables() |
Clear all tables (testing) | void |
TypeScript Types
// Result type for error handling
export type Result<T> =
| { ok: true; value: T }
| { ok: false; error: string };
// Table handle (opaque reference)
export type TableHandle = number;
// Schema information
export interface SchemaSummary {
columns: ColumnInfo[];
metadata: Record<string, string>;
}
export interface ColumnInfo {
name: string;
arrowType: string;
nullable: boolean;
}
// Column export data
export interface ColumnExport {
arrowType: string;
nullable: boolean;
data: ArrayBuffer;
nullBitmap?: ArrayBuffer;
extraBuffers?: ArrayBuffer[];
}
// Memory statistics
export interface MemoryStats {
activeTables: number;
totalRows: number;
totalBatches: number;
}
// Initialization options
export interface WasmInitOptions {
enableConsoleLogs?: boolean;
}โก Performance Considerations
Memory Management Best Practices
- Always free table handles when done processing
- Use
getMemoryStats()to monitor memory usage - Process files in batches for large datasets
- Enable LZ4 compression for network transfers
Optimization Tips
// Good: Process and immediately free
async function efficientProcessing(data: ArrayBuffer) {
const result = await readTableFromArrayBuffer(data);
if (result.ok) {
const schema = await getSchemaSummary(result.value);
const output = await writeTableToIpc(result.value, true);
await freeTable(result.value); // Free immediately
return output;
}
}
// Avoid: Accumulating handles without cleanup
const handles = []; // This will leak memory!
for (const file of files) {
const result = await readTableFromArrayBuffer(await file.arrayBuffer());
if (result.ok) {
handles.push(result.value); // Don't do this
}
}๐ Browser vs Node.js
Browser-Specific Features
- Automatic WASM loading from CDN
- File drag-and-drop processing
- Blob/URL creation for downloads
- Worker thread support
Node.js-Specific Features
- File system integration
- Stream processing
- Buffer handling
- Command-line tools
๐งช Testing
This library includes a comprehensive Model-Based Development (MBD) test suite for thorough validation:
# Run the full MBD test suite
cd mbd_tests
npm install
npm run test:models
# Cross-browser testing
npm run test:browsers
# Performance benchmarking
npm run test:performance
# Model-based test generation
npm run models:generateTest Coverage
- Behavioral Models: API lifecycle, error handling, memory management
- Structural Models: Architecture validation, data flow verification
- Cross-Browser: Chrome, Firefox, Safari compatibility
- Performance: Baseline establishment and regression detection
๐ค Contributing
Contributions are welcome! Please read our contributing guidelines and ensure all tests pass before submitting a pull request.
Development Setup
# Clone the repository
git clone https://github.com/BectorVoom/arrow-rs-wasm.git
cd arrow-rs-wasm
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Run comprehensive MBD tests
cd mbd_tests && npm run validate:all๐ License
This project is dual-licensed under either:
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
at your option.
๐ Links
- npm package
- GitHub repository
- Apache Arrow project
- WebAssembly
- arrow - The official Rust Arrow implementation
- wasm-bindgen - Rust and WebAssembly integration
- lz4_flex - Pure Rust LZ4 implementation
Built with โค๏ธ using Rust and WebAssembly