Package Exports
- @eredzik/calaminejs
- @eredzik/calaminejs/node
- @eredzik/calaminejs/package.json
- @eredzik/calaminejs/web
Readme
@eredzik/calaminejs
A high-performance JavaScript/TypeScript library for reading Excel files (XLS/XLSX) and converting them to Parquet format. Built on the Rust calamine library using WebAssembly for optimal performance.
Features
- 📊 Read Excel files (XLS and XLSX formats)
- 🚀 High performance through WebAssembly
- 🔄 Convert sheets to Parquet format
- 🎯 Smart header row detection
- 📈 Progress tracking for large files
- 🌐 Works in Node.js and browsers
- 📝 Full TypeScript support
- 🔢 Preserves data types (strings, numbers, booleans, dates, etc.)
Installation
npm install @eredzik/calaminejsQuick Start
Node.js
import { Workbook } from '@eredzik/calaminejs';
import { readFileSync } from 'fs';
// Read an Excel file
const buffer = readFileSync('data.xlsx');
const workbook = Workbook.from_bytes(new Uint8Array(buffer));
// Get sheet names
const sheetNames = workbook.sheet_names();
console.log('Sheets:', sheetNames);
// Access a sheet
const sheet = workbook.get_sheet(sheetNames[0]);
console.log(`Rows: ${sheet.row_count()}, Columns: ${sheet.col_count()}`);
// Access cell data
const rows = sheet.rows;
console.log('First row:', rows[0]);Browser
import { Workbook } from '@eredzik/calaminejs/web';
// From file input
const file = document.querySelector('input[type="file"]').files[0];
const arrayBuffer = await file.arrayBuffer();
const workbook = Workbook.from_bytes(new Uint8Array(arrayBuffer));
// Process the workbook
const sheet = workbook.get_sheet_by_index(0);
console.log('Sheet name:', sheet.name);API Reference
Workbook
The main class for working with Excel files.
Static Methods
Workbook.from_bytes(data: Uint8Array): Workbook
Load an Excel file from bytes. Automatically detects XLS or XLSX format.
const workbook = Workbook.from_bytes(new Uint8Array(buffer));Workbook.from_bytes_with_progress(data: Uint8Array, callback?: Function, interval?: number): Workbook
Load an Excel file with progress tracking.
const workbook = Workbook.from_bytes_with_progress(
new Uint8Array(buffer),
(progress) => {
console.log(`Processing sheet ${progress.sheetIndex + 1}/${progress.totalSheets}`);
console.log(`Sheet: ${progress.sheetName}, Row: ${progress.currentRow}`);
},
100 // Report progress every 100 rows
);Progress object properties:
sheetIndex: number- Current sheet index (0-based)totalSheets: number- Total number of sheetssheetName: string- Name of current sheetcurrentRow: number- Current row being processedtotalRows: number | null- Total rows (available when sheet is complete)
Instance Methods
sheet_names(): string[]
Get an array of all sheet names in the workbook.
const names = workbook.sheet_names();get_sheet(name: string): Sheet | undefined
Get a sheet by name.
const sheet = workbook.get_sheet('Sheet1');get_sheet_by_index(index: number): Sheet | undefined
Get a sheet by index (0-based).
const firstSheet = workbook.get_sheet_by_index(0);sheet_count(): number
Get the total number of sheets.
const count = workbook.sheet_count();Sheet
Represents a single worksheet in an Excel file.
Properties
name: string
The name of the sheet.
console.log(sheet.name); // "Sheet1"rows: Array<Array<any>>
A 2D array of cell values. Values are converted to native JavaScript types:
- Strings →
string - Numbers →
number - Booleans →
boolean - Empty cells →
null - Dates →
number(Excel date format) - Errors →
string
const rows = sheet.rows;
rows.forEach((row, rowIndex) => {
row.forEach((cell, colIndex) => {
console.log(`Cell [${rowIndex}, ${colIndex}]:`, cell);
});
});Methods
get_cell(row: number, col: number): CellValue | undefined
Get a specific cell with type information.
const cell = sheet.get_cell(0, 0);
if (cell.is_string) {
console.log('String value:', cell.to_string_value());
}row_count(): number
Get the number of rows in the sheet.
const rowCount = sheet.row_count();col_count(): number
Get the maximum number of columns in the sheet.
const colCount = sheet.col_count();infer_header_row(): HeaderInfo | undefined
Automatically detect which row contains the table header using heuristics:
- Headers typically contain string values in most columns
- Headers are followed by rows with data
- Headers have multiple non-empty cells
- Prioritizes rows in the first 20 rows
const headerInfo = sheet.infer_header_row();
if (headerInfo) {
console.log('Header found at row:', headerInfo.row_index);
console.log('Column names:', headerInfo.column_names);
}to_parquet(): Uint8Array
Convert the sheet to Parquet format. Column types are automatically inferred:
- All booleans → Boolean column
- All integers → Int64 column
- All floats/numbers → Float64 column
- All dates → Datetime column (millisecond precision)
- Mixed or strings → String column
const parquetBytes = sheet.to_parquet();
// Save to file or process furtherto_parquet_with_names(columnNames: string[]): Uint8Array
Convert the sheet to Parquet format with custom column names.
const columnNames = ['ID', 'Name', 'Age', 'Email'];
const parquetBytes = sheet.to_parquet_with_names(columnNames);CellValue
Detailed cell information with type checking and conversion methods.
Type Checking Properties
is_empty: boolean- Check if cell is emptyis_string: boolean- Check if cell contains a stringis_float: boolean- Check if cell contains a floatis_int: boolean- Check if cell contains an integeris_bool: boolean- Check if cell contains a booleanis_error: boolean- Check if cell contains an erroris_datetime: boolean- Check if cell contains a date/timeis_duration: boolean- Check if cell contains a duration
Conversion Methods
to_string_value(): string | undefined
Convert cell to string representation.
const cell = sheet.get_cell(0, 0);
const str = cell.to_string_value();to_float_value(): number | undefined
Convert cell to float (works for numbers, booleans, dates).
const num = cell.to_float_value();to_int_value(): number | undefined
Convert cell to integer (works for integers, floats, booleans).
const int = cell.to_int_value();to_bool_value(): boolean | undefined
Get boolean value (only works for boolean cells).
const bool = cell.to_bool_value();HeaderInfo
Information about detected header row.
Properties
row_index: number- The index of the header row (0-based)column_names: string[]- Array of column names extracted from the header
Examples
Reading and Processing Data
import { Workbook } from '@eredzik/calaminejs';
import { readFileSync } from 'fs';
const buffer = readFileSync('sales.xlsx');
const workbook = Workbook.from_bytes(new Uint8Array(buffer));
const sheet = workbook.get_sheet('Sales Data');
// Detect header
const headerInfo = sheet.infer_header_row();
if (headerInfo) {
console.log('Columns:', headerInfo.column_names);
// Process data rows (skip header)
const dataRows = sheet.rows.slice(headerInfo.row_index + 1);
dataRows.forEach(row => {
console.log('Row data:', row);
});
}Converting to Parquet
import { Workbook } from '@eredzik/calaminejs';
import { writeFileSync, readFileSync } from 'fs';
const buffer = readFileSync('data.xlsx');
const workbook = Workbook.from_bytes(new Uint8Array(buffer));
const sheet = workbook.get_sheet_by_index(0);
// Option 1: Auto-generated column names
const parquet1 = sheet.to_parquet();
writeFileSync('output1.parquet', parquet1);
// Option 2: Custom column names
const headerInfo = sheet.infer_header_row();
if (headerInfo) {
// Skip header row and convert data
const dataSheet = {
...sheet,
rows: sheet.rows.slice(headerInfo.row_index + 1)
};
const parquet2 = dataSheet.to_parquet_with_names(headerInfo.column_names);
writeFileSync('output2.parquet', parquet2);
}Working with Cell Types
const sheet = workbook.get_sheet_by_index(0);
for (let row = 0; row < sheet.row_count(); row++) {
for (let col = 0; col < sheet.col_count(); col++) {
const cell = sheet.get_cell(row, col);
if (cell.is_string) {
console.log(`String: ${cell.to_string_value()}`);
} else if (cell.is_int) {
console.log(`Integer: ${cell.to_int_value()}`);
} else if (cell.is_float) {
console.log(`Float: ${cell.to_float_value()}`);
} else if (cell.is_bool) {
console.log(`Boolean: ${cell.to_bool_value()}`);
} else if (cell.is_datetime) {
const excelDate = cell.to_float_value();
// Convert Excel date to JavaScript Date
const jsDate = new Date((excelDate - 25569) * 86400 * 1000);
console.log(`Date: ${jsDate.toISOString()}`);
}
}
}Progress Tracking for Large Files
import { Workbook } from '@eredzik/calaminejs';
import { readFileSync } from 'fs';
const buffer = readFileSync('large-file.xlsx');
console.log('Loading workbook...');
const workbook = Workbook.from_bytes_with_progress(
new Uint8Array(buffer),
(progress) => {
const percent = ((progress.currentRow / (progress.totalRows || progress.currentRow)) * 100).toFixed(1);
console.log(`[${progress.sheetName}] Processing: ${percent}%`);
},
500 // Report every 500 rows
);
console.log('Workbook loaded successfully!');Browser vs Node.js
The package provides separate builds optimized for each environment:
// Node.js (default)
import { Workbook } from '@eredzik/calaminejs';
// or
import { Workbook } from '@eredzik/calaminejs/node';
// Browser
import { Workbook } from '@eredzik/calaminejs/web';Performance Tips
- Use progress callbacks for large files to provide user feedback and avoid blocking
- Process sheets on-demand instead of loading all sheets at once
- Use
get_cell()for sparse data instead of accessing the fullrowsarray - Infer header once and reuse the result instead of calling it multiple times
- Convert to Parquet for efficient storage and further processing with data tools
Requirements
- Node.js >= 16.0.0 (for Node.js usage)
- Modern browser with WebAssembly support (for browser usage)
Supported File Formats
- XLSX - Excel 2007+ (.xlsx)
- XLS - Excel 97-2003 (.xls)
Supported Data Types
The library preserves Excel data types:
- Empty - Empty cells
- String - Text values
- Float - Floating-point numbers
- Int - Integer numbers
- Bool - Boolean values (TRUE/FALSE)
- Error - Excel error values (#N/A, #REF!, etc.)
- DateTime - Date and time values (stored as Excel serial numbers)
- Duration - Duration values
License
MIT
Repository
GitHub: https://github.com/eredzik/calaminejs
Issues
Report issues: https://github.com/eredzik/calaminejs/issues
Credits
Built on top of the excellent calamine Rust library and Polars for Parquet conversion.