arrow-rs-wasm
A high-performance WebAssembly library for reading, writing, and processing Apache Arrow, Feather, and Parquet data with zero-copy semantics and LZ4 compression support.
Features
- Zero-Copy Performance: Efficient data processing with minimal memory overhead
- Multiple File Formats: Native support for Arrow IPC, Feather, and Parquet files
- LZ4 Compression: Built-in compression support for optimized storage
- Type Safety: Complete TypeScript definitions with Result-based error handling
- Universal: Works in browsers and Node.js environments
- Production Ready: Optimized WebAssembly build with comprehensive testing
- Model-Based Testing: Comprehensive MBD test suite with cross-browser validation
- Plugin Architecture: Extensible design for custom data type support
- Performance Monitoring: Built-in benchmarking and memory leak detection
Installation
npm install arrow-rs-wasm
Quick Start
Basic Usage (Browser)
import {
initWasm,
readTableFromArrayBuffer,
getSchemaSummary,
getTableRowCount,
exportColumnByName,
writeTableToIpc,
freeTable
} from 'arrow-rs-wasm';
// Initialize the WASM module
const initResult = await initWasm();
if (!initResult.ok) {
throw new Error(`Failed to initialize WASM: ${initResult.error}`);
}
// Load an Arrow file
const response = await fetch('data/sample.arrow');
const arrayBuffer = await response.arrayBuffer();
// Read the table
const readResult = await readTableFromArrayBuffer(arrayBuffer);
if (!readResult.ok) {
throw new Error(`Failed to read table: ${readResult.error}`);
}
const tableHandle = readResult.value;
// Inspect the schema
const schemaResult = await getSchemaSummary(tableHandle);
if (schemaResult.ok) {
console.log('Table Schema:');
schemaResult.value.columns.forEach(col => {
console.log(`- ${col.name}: ${col.arrowType} (nullable: ${col.nullable})`);
});
}
// Get row count
const rowCountResult = await getTableRowCount(tableHandle);
if (rowCountResult.ok) {
console.log(`Table has ${rowCountResult.value} rows`);
}
// Export a specific column
const columnResult = await exportColumnByName(tableHandle, 'id');
if (columnResult.ok) {
const column = columnResult.value;
console.log(`Column type: ${column.arrowType}, length: ${column.length}`);
// For Int32 columns, you can convert to typed array
if (column.arrowType === 'Int32') {
const values = new Int32Array(column.data);
console.log('First few values:', values.slice(0, 5));
}
}
// Write back with LZ4 compression
const writeResult = await writeTableToIpc(tableHandle, true);
if (writeResult.ok) {
const compressedData = writeResult.value;
console.log(`Compressed size: ${compressedData.byteLength} bytes`);
}
// Clean up memory
await freeTable(tableHandle);
Node.js Usage
import { readFileSync } from 'fs';
import {
initWasm,
readTableFromArrayBuffer,
getSchemaSummary,
freeTable
} from 'arrow-rs-wasm';
// Initialize WASM
await initWasm();
// Read file from disk
const fileData = readFileSync('data/sample.parquet');
const arrayBuffer = fileData.buffer.slice(
fileData.byteOffset,
fileData.byteOffset + fileData.byteLength
);
// Process the file
const result = await readTableFromArrayBuffer(arrayBuffer);
if (result.ok) {
const schema = await getSchemaSummary(result.value);
if (schema.ok) {
console.log('Columns:', schema.value.columns.length);
}
await freeTable(result.value);
}
File Format Examples
Reading Different Formats
// The library automatically detects file formats
async function processFile(arrayBuffer: ArrayBuffer) {
const result = await readTableFromArrayBuffer(arrayBuffer);
if (!result.ok) {
console.error('Read failed:', result.error);
return;
}
// Works with:
// - Arrow IPC files (.arrow)
// - Feather files (.feather, .fea)
// - Parquet files (.parquet)
const handle = result.value;
const schema = await getSchemaSummary(handle);
if (schema.ok) {
console.log('Successfully read file with', schema.value.columns.length, 'columns');
}
await freeTable(handle);
}
// Usage with different file types
await processFile(await fetch('data.arrow').then(r => r.arrayBuffer()));
await processFile(await fetch('data.feather').then(r => r.arrayBuffer()));
await processFile(await fetch('data.parquet').then(r => r.arrayBuffer()));
Writing with Compression
async function saveWithCompression(tableHandle: number) {
// Write with LZ4 compression enabled
const compressedResult = await writeTableToIpc(tableHandle, true);
// Write without compression
const uncompressedResult = await writeTableToIpc(tableHandle, false);
if (compressedResult.ok && uncompressedResult.ok) {
const compressed = compressedResult.value;
const uncompressed = uncompressedResult.value;
console.log(`Compression ratio: ${(compressed.byteLength / uncompressed.byteLength * 100).toFixed(1)}%`);
// Save compressed version
const blob = new Blob([compressed], { type: 'application/octet-stream' });
const url = URL.createObjectURL(blob);
const link = document.createElement('a');
link.href = url;
link.download = 'data_compressed.arrow';
link.click();
URL.revokeObjectURL(url); // Release the object URL once the download starts
}
}
Advanced Usage
Memory Management
import {
getMemoryStats,
clearAllTables,
isValidHandle,
freeTable
} from 'arrow-rs-wasm';
async function demonstrateMemoryManagement() {
// Check memory usage
const statsResult = await getMemoryStats();
if (statsResult.ok) {
const stats = statsResult.value;
console.log(`Active tables: ${stats.activeTables}`);
console.log(`Total rows: ${stats.totalRows}`);
console.log(`Total batches: ${stats.totalBatches}`);
}
// Validate handles
const handle = 123;
if (isValidHandle(handle)) {
console.log('Handle is valid');
await freeTable(handle);
}
// Clear all tables (useful for testing)
clearAllTables();
}
Error Handling Patterns
// The library uses Result types instead of throwing exceptions
async function safeProcessing(data: ArrayBuffer) {
const readResult = await readTableFromArrayBuffer(data);
// Pattern 1: Early return on error
if (!readResult.ok) {
console.error('Failed to read:', readResult.error);
return;
}
const handle = readResult.value;
// Pattern 2: Nested result handling, with guaranteed cleanup
try {
const schemaResult = await getSchemaSummary(handle);
if (schemaResult.ok) {
const columns = schemaResult.value.columns;
// Process columns...
const writeResult = await writeTableToIpc(handle, true);
if (writeResult.ok) {
console.log('Successfully processed and wrote data');
return writeResult.value;
} else {
console.error('Write failed:', writeResult.error);
}
} else {
console.error('Schema read failed:', schemaResult.error);
}
} finally {
// Always clean up, even on the early-return success path above
await freeTable(handle);
}
}
Batch Processing
async function processManyFiles(files: File[]) {
const results = [];
for (const file of files) {
const arrayBuffer = await file.arrayBuffer();
const readResult = await readTableFromArrayBuffer(arrayBuffer);
if (readResult.ok) {
const handle = readResult.value;
const schemaResult = await getSchemaSummary(handle);
if (schemaResult.ok) {
results.push({
filename: file.name,
columns: schemaResult.value.columns.length,
handle: handle
});
} else {
await freeTable(handle); // Don't leak handles when the schema read fails
}
}
}
// Process results...
console.log(`Successfully loaded ${results.length} files`);
// Clean up all handles
for (const result of results) {
await freeTable(result.handle);
}
return results;
}
Working with Column Data
async function analyzeColumnData(handle: TableHandle, columnName: string) {
// Export column data
const colResult = await exportColumnByName(handle, columnName);
if (!colResult.ok) {
console.error('Failed to export column:', colResult.error);
return;
}
const column = colResult.value;
console.log(`Column '${columnName}' - Type: ${column.arrowType}, Length: ${column.length}`);
// Handle different data types
switch (column.arrowType) {
case 'Int32': {
const int32Values = new Int32Array(column.data);
console.log('Int32 values:', int32Values.slice(0, 10));
break;
}
case 'Float64': {
const float64Values = new Float64Array(column.data);
console.log('Float64 values:', float64Values.slice(0, 10));
break;
}
case 'Utf8': {
// UTF-8 strings require offset buffer + data buffer
if (column.extraBuffers && column.extraBuffers.length > 0) {
const offsets = new Int32Array(column.extraBuffers[0]);
const bytes = new Uint8Array(column.data);
// Extract first few strings
const strings = [];
for (let i = 0; i < Math.min(5, offsets.length - 1); i++) {
const start = offsets[i];
const end = offsets[i + 1];
const stringBytes = bytes.slice(start, end);
strings.push(new TextDecoder().decode(stringBytes));
}
console.log('String values:', strings);
}
break;
}
case 'Boolean': {
// Boolean arrays are bit-packed
const boolBytes = new Uint8Array(column.data);
const boolValues = [];
for (let i = 0; i < Math.min(column.length, 40); i++) {
const byteIndex = Math.floor(i / 8);
const bitIndex = i % 8;
const value = (boolBytes[byteIndex] & (1 << bitIndex)) !== 0;
boolValues.push(value);
}
console.log('Boolean values:', boolValues);
break;
}
default:
console.log('Raw data buffer size:', column.data.byteLength);
}
// Handle null bitmap if present
if (column.nullBitmap) {
const nullBytes = new Uint8Array(column.nullBitmap);
console.log('Has null bitmap, size:', nullBytes.length);
// Count nulls in first 64 values
let nullCount = 0;
for (let i = 0; i < Math.min(column.length, 64); i++) {
const byteIndex = Math.floor(i / 8);
const bitIndex = i % 8;
const isNull = (nullBytes[byteIndex] & (1 << bitIndex)) === 0;
if (isNull) nullCount++;
}
console.log(`Null count in first 64 values: ${nullCount}`);
}
}
API Reference
Core Functions
| Function | Description | Returns |
|---|---|---|
| `initWasm(wasmBytes?, opts?)` | Initialize the WASM module | `Promise<Result<void>>` |
| `readTableFromArrayBuffer(data)` | Read Arrow/Feather/Parquet data | `Promise<Result<TableHandle>>` |
| `getSchemaSummary(handle)` | Get table schema information | `Promise<Result<SchemaSummary>>` |
| `getTableRowCount(handle)` | Get number of rows in table | `Promise<Result<number>>` |
| `exportColumnByName(handle, name)` | Export column data as typed arrays | `Promise<Result<ColumnExport>>` |
| `writeTableToIpc(handle, enableLz4)` | Write table as Arrow IPC | `Promise<Result<ArrayBuffer>>` |
| `freeTable(handle)` | Release table memory | `Promise<Result<void>>` |
Utility Functions
Utility Functions
| Function | Description | Returns |
|---|---|---|
| `isWasmInitialized()` | Check if WASM is ready | `boolean` |
| `isValidHandle(handle)` | Validate table handle | `boolean` |
| `getMemoryStats()` | Get memory usage statistics | `Promise<Result<MemoryStats>>` |
| `clearAllTables()` | Clear all tables (testing) | `void` |
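These utilities compose naturally into an idempotent init guard. A minimal sketch (the `ensureWasm` helper is our own name, not a package export):
import { initWasm, isWasmInitialized } from 'arrow-rs-wasm';
// Initialize exactly once; later callers return immediately.
async function ensureWasm(): Promise<void> {
  if (isWasmInitialized()) return;
  const result = await initWasm();
  if (!result.ok) {
    throw new Error(`WASM initialization failed: ${result.error}`);
  }
}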
TypeScript Types
// Result type for error handling
export type Result<T> =
| { ok: true; value: T }
| { ok: false; error: string };
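// A small convenience helper built on this type (our own sketch,
// not exported by the package): unwrap a Result or throw its error.
export function unwrapResult<T>(result: Result<T>): T {
  if (!result.ok) throw new Error(result.error);
  return result.value;
}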
// Table handle (opaque reference)
export type TableHandle = number;
// Schema information
export interface SchemaSummary {
columns: ColumnInfo[];
metadata: Record<string, string>;
}
export interface ColumnInfo {
name: string;
arrowType: string;
nullable: boolean;
}
// Column export data
export interface ColumnExport {
arrowType: string;
nullable: boolean;
data: ArrayBuffer;
nullBitmap?: ArrayBuffer;
extraBuffers?: ArrayBuffer[];
}
// Memory statistics
export interface MemoryStats {
activeTables: number;
totalRows: number;
totalBatches: number;
}
// Initialization options
export interface WasmInitOptions {
enableConsoleLogs?: boolean;
}
Performance Considerations
Memory Management Best Practices
- Always free table handles when done processing (see the try/finally sketch below)
- Use getMemoryStats() to monitor memory usage
- Process files in batches for large datasets
- Enable LZ4 compression for network transfers
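For the first point, tying cleanup to try/finally guarantees the handle is released even if processing throws. A minimal sketch (the `withTable` wrapper is our own, built only on the documented API):
import { readTableFromArrayBuffer, freeTable } from 'arrow-rs-wasm';
async function withTable<T>(
  data: ArrayBuffer,
  fn: (handle: number) => Promise<T>
): Promise<T> {
  const result = await readTableFromArrayBuffer(data);
  if (!result.ok) throw new Error(result.error);
  try {
    return await fn(result.value);
  } finally {
    // Runs on success, early return, and thrown errors alike
    await freeTable(result.value);
  }
}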
Optimization Tips
// Good: Process and immediately free
async function efficientProcessing(data: ArrayBuffer) {
const result = await readTableFromArrayBuffer(data);
if (result.ok) {
const schema = await getSchemaSummary(result.value);
const output = await writeTableToIpc(result.value, true);
await freeTable(result.value); // Free immediately
return output;
}
}
// Avoid: Accumulating handles without cleanup
const handles = []; // This will leak memory!
for (const file of files) {
const result = await readTableFromArrayBuffer(await file.arrayBuffer());
if (result.ok) {
handles.push(result.value); // Don't do this
}
}
Browser vs Node.js
Browser-Specific Features
- Automatic WASM loading from CDN
- File drag-and-drop processing (see the sketch below)
- Blob/URL creation for downloads
- Worker thread support
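A minimal drag-and-drop sketch using only the documented API (the `drop-zone` element ID is illustrative, and initWasm() is assumed to have run already):
const dropZone = document.getElementById('drop-zone')!;
dropZone.addEventListener('dragover', (e) => e.preventDefault());
dropZone.addEventListener('drop', async (e) => {
  e.preventDefault();
  const file = e.dataTransfer?.files[0];
  if (!file) return;
  const result = await readTableFromArrayBuffer(await file.arrayBuffer());
  if (result.ok) {
    console.log(`Loaded ${file.name}`);
    await freeTable(result.value);
  }
});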
Node.js-Specific Features
- File system integration
- Stream processing
- Buffer handling
- Command-line tools (see the sketch below)
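A hedged sketch of a tiny command-line inspector (the script itself is hypothetical; it uses only Node's fs module and the documented API):
// inspect.mjs: print the row count of the file passed as an argument
import { readFileSync } from 'fs';
import { initWasm, readTableFromArrayBuffer, getTableRowCount, freeTable } from 'arrow-rs-wasm';
await initWasm();
const buf = readFileSync(process.argv[2]);
const data = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
const table = await readTableFromArrayBuffer(data);
if (table.ok) {
  const rows = await getTableRowCount(table.value);
  if (rows.ok) console.log(`${process.argv[2]}: ${rows.value} rows`);
  await freeTable(table.value);
}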
Testing
This library includes a comprehensive Model-Based Development (MBD) test suite for thorough validation:
# Run the full MBD test suite
cd mbd_tests
npm install
npm run test:models
# Cross-browser testing
npm run test:browsers
# Performance benchmarking
npm run test:performance
# Model-based test generation
npm run models:generate
Test Coverage
- Behavioral Models: API lifecycle, error handling, memory management
- Structural Models: Architecture validation, data flow verification
- Cross-Browser: Chrome, Firefox, Safari compatibility
- Performance: Baseline establishment and regression detection
Contributing
Contributions are welcome! Please read our contributing guidelines and ensure all tests pass before submitting a pull request.
Development Setup
# Clone the repository
git clone https://github.com/BectorVoom/arrow-rs-wasm.git
cd arrow-rs-wasm
# Install dependencies
npm install
# Build the project
npm run build
# Run tests
npm test
# Run comprehensive MBD tests
cd mbd_tests && npm run validate:all
License
This project is dual-licensed under either:
- MIT License (LICENSE-MIT or http://opensource.org/licenses/MIT)
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
at your option.
Links
- npm package
- GitHub repository
- Apache Arrow project
- WebAssembly
- arrow - The official Rust Arrow implementation
- wasm-bindgen - Rust and WebAssembly integration
- lz4_flex - Pure Rust LZ4 implementation
Built with ❤️ using Rust and WebAssembly