arrow-rs-wasm
A high-performance WebAssembly library for Apache Arrow, Feather, and Parquet data processing with zero-copy semantics, LZ4 compression, and comprehensive TypeScript support.
Features
- Zero-copy data transfer between WebAssembly and JavaScript
- Apache Arrow IPC format support with LZ4 compression
- Feather and Parquet file format support
- TypeScript-first API with complete type definitions
- Cross-platform: works in browsers and Node.js
- High performance: built on Rust's arrow-rs ecosystem
- Model-based testing for comprehensive validation
- Tree-shaking friendly ES modules
Installation
npm
npm install arrow-rs-wasm
Direct WASM
# Using wasm-pack directly
wasm-pack build --target web
Quick Start
Browser Usage
import {
  initialize,
  createTestTable,
  writeTableToIpc,
  readTableFromBytes,
  getTableInfo,
  freeTable
} from 'arrow-rs-wasm';
// Initialize the WASM module
const initResult = await initialize();
if (!initResult.ok) {
  // A bare `return` is not valid at module top level, so fail by throwing
  throw new Error(`Failed to initialize: ${initResult.error}`);
}
// Create a test table with sample data
const tableResult = createTestTable();
if (!tableResult.ok) {
  throw new Error(`Failed to create table: ${tableResult.error}`);
}
const tableHandle = tableResult.value;
// Get table information
const info = getTableInfo(tableHandle);
console.log(`Table has ${info.numRows} rows and ${info.numColumns} columns`);
// Write table to Arrow IPC format with LZ4 compression
const ipcData = writeTableToIpc(tableHandle, true);
console.log(`Serialized table to ${ipcData.length} bytes`);
// Read the data back
const newTableResult = readTableFromBytes(ipcData);
if (newTableResult.ok) {
  console.log('Successfully round-tripped data!');
  await freeTable(newTableResult.value);
}
// Clean up the original table whether or not the round trip succeeded
await freeTable(tableHandle);
Node.js Usage
import { readFile } from 'fs/promises';
import {
  initialize,
  readTableFromBytes,
  getTableSchemaJson,
  exportColumnWithType,
  freeTable
} from 'arrow-rs-wasm';
async function processArrowFile(filePath: string) {
  // Initialize WASM
  await initialize();
  // Read Arrow file
  const fileData = await readFile(filePath);
  const bytes = new Uint8Array(fileData);
  // Parse Arrow data
  const tableResult = readTableFromBytes(bytes);
  if (!tableResult.ok) {
    throw new Error(`Failed to read table: ${tableResult.error}`);
  }
  const tableHandle = tableResult.value;
  // Get schema as JSON
  const schema = getTableSchemaJson(tableHandle);
  console.log('Schema:', JSON.parse(schema));
  // Export a specific column
  const columnData = exportColumnWithType(tableHandle, 'column_name', 'int32');
  console.log('Column data:', columnData);
  // Clean up
  await freeTable(tableHandle);
}
API Reference
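Most of the functions in this reference return a `Result<T>` instead of throwing. The README does not spell that type out, so the following is only a minimal sketch of the shape implied by the Quick Start examples (`ok`, `value`, `error`). The exact definition, the `string` error type, and the `unwrap` helper are assumptions for illustration, not part of the package's documented API.

```typescript
// Assumed shape of Result<T>, inferred from the `ok` / `value` / `error`
// fields used in the Quick Start examples. The package's own type may differ.
type Result<T> =
  | { ok: true; value: T }
  | { ok: false; error: string };

// Hypothetical helper: return the value, or throw with a contextual message.
function unwrap<T>(result: Result<T>, context: string): T {
  if (!result.ok) {
    throw new Error(`${context}: ${result.error}`);
  }
  return result.value;
}

// Stand-in results, so the sketch runs without loading the WASM module.
const goodResult: Result<number> = { ok: true, value: 42 };
const badResult: Result<number> = { ok: false, error: "invalid IPC header" };

console.log(unwrap(goodResult, "read table"));
try {
  unwrap(badResult, "read table");
} catch (e) {
  console.log((e as Error).message);
}
```

A helper like this keeps call sites short when a failure should abort the operation anyway; code that wants to recover can keep checking `result.ok` explicitly.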
Core Functions
initialize(options?: InitOptions): Promise<Result<void>>
Initialize the WASM module. Must be called before using other functions.
interface InitOptions {
  wasmUrl?: string; // Custom WASM binary URL
  memoryPages?: number; // Initial memory allocation
}
createTestTable(): Result<TableHandle>
Create a simple test table with sample data for experimentation.
const result = createTestTable();
if (result.ok) {
  const handle = result.value;
  // Use the table...
}
readTableFromBytes(data: Uint8Array): Result<TableHandle>
Read an Arrow IPC format byte array into a table handle.
const bytes = new Uint8Array(arrowData);
const result = readTableFromBytes(bytes);
if (result.ok) {
  const table = result.value;
  // Process table...
}
writeTableToIpc(handle: TableHandle, enableLz4: boolean): Uint8Array
Write a table to Arrow IPC format with optional LZ4 compression.
const compressed = writeTableToIpc(tableHandle, true);
const uncompressed = writeTableToIpc(tableHandle, false);
Schema and Metadata
getTableSchemaJson(handle: TableHandle): string
Get the table schema as a JSON string.
const schemaJson = getTableSchemaJson(tableHandle);
const schema = JSON.parse(schemaJson);
console.log('Fields:', schema.fields);
getTableInfo(handle: TableHandle): TableInfo
Get basic information about a table.
interface TableInfo {
  numRows: number;
  numColumns: number;
  numBatches: number;
}
exportColumnWithType(handle: TableHandle, columnName: string, dataType: string): any[]
Export a specific column's data as a JavaScript array.
const intColumn = exportColumnWithType(tableHandle, 'id', 'int32');
const stringColumn = exportColumnWithType(tableHandle, 'name', 'utf8');
Memory Management
freeTable(handle: TableHandle): Promise<Result<void>>
Release a table handle and free associated memory. Always call this when done with a table.
await freeTable(tableHandle);
getMemoryStats(): MemoryStats
Get current memory usage statistics.
interface MemoryStats {
  activeTables: number;
  totalRows: number;
  totalBatches: number;
}
Advanced Usage
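Because every table handle has to be released with `freeTable`, it can help to wrap the acquire/use/release cycle once so that a handle is freed even when processing throws. The sketch below is an assumption-laden illustration: `withTable` is a hypothetical helper, not a package export, and the `Result` shape is inferred from the Quick Start examples. The create and free functions are passed in as parameters so the block runs with stand-ins instead of the WASM module.

```typescript
// Assumed Result shape (inferred from the README examples, not confirmed).
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical helper: acquire a handle, run `use`, and always release the handle.
async function withTable<H, R>(
  acquire: () => Result<H>,
  release: (handle: H) => Promise<unknown>,
  use: (handle: H) => R | Promise<R>,
): Promise<R> {
  const acquired = acquire();
  if (!acquired.ok) {
    throw new Error(`Failed to acquire table: ${acquired.error}`);
  }
  try {
    return await use(acquired.value);
  } finally {
    // Runs on success and on error, so the handle is not leaked.
    await release(acquired.value);
  }
}

// Stand-ins for createTestTable / freeTable (illustrative only).
const releasedHandles: number[] = [];
const fakeCreateTable = (): Result<number> => ({ ok: true, value: 7 });
const fakeFreeTable = async (handle: number): Promise<void> => {
  releasedHandles.push(handle);
};

withTable(fakeCreateTable, fakeFreeTable, (handle) => handle * 2).then((doubled) => {
  console.log(doubled, releasedHandles);
});
```

With the real package, and assuming the signatures listed in the API reference, the call would look like `withTable(createTestTable, freeTable, (h) => getTableInfo(h).numRows)`.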
Custom Schema Creation
import { initialize, createCustomTable } from 'arrow-rs-wasm';
await initialize();
// Define schema
const schema = {
  fields: [
    { name: 'timestamp', type: 'timestamp', nullable: false },
    { name: 'sensor_id', type: 'int64', nullable: false },
    { name: 'temperature', type: 'float64', nullable: true },
    { name: 'location', type: 'utf8', nullable: true }
  ]
};
// Create table with custom data
const data = {
  timestamp: [Date.now(), Date.now() + 1000],
  sensor_id: [1001, 1002],
  temperature: [23.5, 24.1],
  location: ['Building A', 'Building B']
};
const tableHandle = createCustomTable(schema, data);
Compression Comparison
const tableResult = createTestTable();
if (!tableResult.ok) {
  throw new Error(`Failed to create table: ${tableResult.error}`);
}
const tableHandle = tableResult.value;
// Compare compression ratios
const uncompressed = writeTableToIpc(tableHandle, false);
const compressed = writeTableToIpc(tableHandle, true);
console.log(`Uncompressed: ${uncompressed.length} bytes`);
console.log(`Compressed: ${compressed.length} bytes`);
console.log(`Ratio: ${(compressed.length / uncompressed.length * 100).toFixed(1)}%`);
await freeTable(tableHandle);
Batch Processing
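import { readFile } from 'fs/promises';
import { initialize, readTableFromBytes, getTableInfo, freeTable } from 'arrow-rs-wasm';
async function processBatches(files: string[]) {
  await initialize();
  for (const file of files) {
    const data = await readFile(file);
    const result = readTableFromBytes(new Uint8Array(data));
    if (result.ok) {
      const info = getTableInfo(result.value);
      console.log(`Processed ${file}: ${info.numRows} rows`);
      // Free each table before moving on so memory use stays bounded
      await freeTable(result.value);
    } else {
      console.error(`Skipping ${file}: ${result.error}`);
    }
  }
}
Browser Support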
| Browser | Version | Notes |
|---|---|---|
| Chrome | 57+ | Full support |
| Firefox | 52+ | Full support |
| Safari | 11+ | Full support |
| Edge | 16+ | Full support |
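The version table can be complemented by a runtime check before loading the module, since embedded web views and locked-down environments may not expose WebAssembly even when the browser engine is recent. The following is a small sketch of such a check; `supportsWebAssembly` is a hypothetical helper, not something the package provides, and it takes the global scope as a parameter only so that it can be exercised with stand-in objects.

```typescript
// Hypothetical helper: report whether a global scope exposes the WebAssembly API.
function supportsWebAssembly(scope: object = globalThis): boolean {
  const wasm = (scope as { WebAssembly?: unknown }).WebAssembly;
  if (typeof wasm !== "object" || wasm === null) {
    return false;
  }
  // Check the two members that module loaders rely on.
  const api = wasm as { instantiate?: unknown; Module?: unknown };
  return typeof api.instantiate === "function" && typeof api.Module === "function";
}

if (supportsWebAssembly()) {
  console.log("WebAssembly is available; the module can be initialized.");
} else {
  console.log("WebAssembly is not available in this environment.");
}
```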
Browser Setup
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Arrow WASM Demo</title>
</head>
<body>
  <script type="module">
    // The package ships its entry point as arrow_rs_wasm.js; adjust the path to your build output
    import init, {
      initialize,
      createTestTable,
      getTableInfo
    } from './pkg/arrow_rs_wasm.js';
    async function demo() {
      await init();
      await initialize();
      const table = createTestTable();
      if (table.ok) {
        const info = getTableInfo(table.value);
        console.log('Table created with', info.numRows, 'rows');
      }
    }
    demo();
  </script>
</body>
</html>
Performance
Zero-Copy Benefits
The library uses zero-copy memory sharing between WebAssembly and JavaScript, providing significant performance advantages:
- Memory efficiency: No data duplication between WASM and JS
- Speed: Direct memory access without serialization overhead
- Scalability: Handle large datasets efficiently
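The difference between sharing and copying memory can be shown with plain typed arrays, which is the mechanism JavaScript uses to look into WebAssembly linear memory. This sketch does not call the library: the `ArrayBuffer` merely stands in for WASM memory, and all variable names are illustrative.

```typescript
// A buffer standing in for WebAssembly linear memory (four 32-bit integers).
const wasmLikeMemory = new ArrayBuffer(16);
const allValues = new Int32Array(wasmLikeMemory);
allValues.set([10, 20, 30, 40]);

// Zero-copy: a view over the same bytes, starting at byte offset 4, two elements long.
const sharedView = new Int32Array(wasmLikeMemory, 4, 2);

// Copy: slice() allocates new storage and duplicates the selected elements.
const copiedValues = allValues.slice(1, 3);

// A later write to the underlying memory is visible through the view, not the copy.
allValues[1] = 99;

console.log(Array.from(sharedView)); // [99, 30]
console.log(Array.from(copiedValues)); // [20, 30]
```

One caveat applies to real WebAssembly memory: when the memory grows, its previous `ArrayBuffer` is detached, so views created earlier stop being usable and need to be re-created.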
Benchmarks
| Operation | Size | Time | Memory |
|---|---|---|---|
| Read IPC | 10 MB | ~50 ms | ~10 MB |
| Write IPC | 10 MB | ~75 ms | ~20 MB |
| Schema parse | 1000 fields | ~5 ms | ~1 MB |
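Figures like these depend on hardware and runtime, so it is worth measuring on the target machine. The following is a minimal timing sketch; `medianTimeMs` is a hypothetical helper rather than part of the package, and the workload is a plain 10 MB buffer copy standing in for an IPC read, so the block runs without the WASM module.

```typescript
// Hypothetical helper: run `fn` several times and return the middle sample in
// milliseconds (the upper median when `runs` is even).
function medianTimeMs(fn: () => void, runs = 5): number {
  if (!Number.isInteger(runs) || runs < 1) {
    throw new RangeError("runs must be a positive integer");
  }
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)] as number;
}

// Stand-in workload: copy a 10 MB buffer.
const payload = new Uint8Array(10 * 1024 * 1024);
const elapsedMs = medianTimeMs(() => {
  payload.slice();
});
console.log(`Median copy time: ${elapsedMs.toFixed(2)} ms`);
```

To time the library itself, the callback could call `readTableFromBytes(bytes)` and release the returned handle with `freeTable` on every iteration, so repeated runs do not accumulate tables in memory.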
Testing
The library includes comprehensive model-based testing:
# Run all tests
npm test
# Browser tests
npm run test:browser
# Node.js tests
npm run test:node
# Build and test
npm run build && npm test
Development
Prerequisites
- Rust 1.70+
- Node.js 18+
- wasm-pack
Building
# Install dependencies
npm install
# Build WASM and TypeScript
npm run build
# Watch mode for development
npm run dev
License
This project is licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Development Guidelines
- Follow the existing code style
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass
Roadmap
- Parquet writer support
- Additional compression algorithms (Zstd, Snappy)
- Streaming data support
- WebAssembly SIMD optimizations
- Advanced filtering and aggregation APIs
Built with ❤️ using Rust and WebAssembly