JSPM

csv-super

1.0.0
  • ESM via JSPM
  • ES Module Entrypoint
  • Export Map
  • Keywords
  • License
  • Repository URL
  • TypeScript Types
  • README
  • Created
  • Published
  • Downloads 141
  • Score
    100M100P100Q57330F
  • License MIT

Read a 10GB CSV file without crashing RAM — Streaming CSV parser with fixed 50MB memory footprint

Package Exports

  • csv-super

Readme

csv-super ⚡

Read a 10GB CSV file without crashing RAM.
Fixed ~50MB memory footprint. RFC 4180 compliant. Zero dependencies.

npm version npm downloads GitHub stars TypeScript License: MIT Node.js Zero deps


The Real Problem

// ❌ This CRASHES on a 10GB file — allocates 16+ GB of RAM
const results = [];
fs.createReadStream('data.csv')
  .pipe(csvParser())
  .on('data', row => results.push(row));   // ← ALL rows land in heap

Why it crashes: csv-parser, fast-csv, and papaparse emit rows via EventEmitter 'data' events. With no backpressure applied, the stream reads at disk speed (~500 MB/s) and piles all rows into an array. By the time a 10GB file finishes, the V8 heap holds 15–30GB of objects (JSON has 1.5–3× overhead).

The Solution

import { csvSuper } from 'csv-super';

// ✅ RAM stays at ~50MB regardless of file size
for await (const { rows, totalSoFar } of csvSuper('data.csv', { batch: 1000 })) {
  await db.insertMany(rows);                // process 1000 rows at a time
  console.log(`Processed: ${totalSoFar}`);  // cumulative counter built-in
}

Why it works: csv-super uses an async function* (Async Generator). Each yield suspends the generator until the consumer's await resolves. While suspended, fs.createReadStream is automatically paused — no data accumulates in the heap. This is native JavaScript backpressure.


Performance Comparison

Metric csv-parser (traditional) csv-super
1GB file ~2GB RAM ~50MB RAM
10GB file 💀 OOM crash ~50MB RAM
First row delay After full file read Within seconds
TypeScript Partial 100% typed
RFC 4180 Partial Full compliance
Dependencies 2 0
Node.js ≥ 18

Installation

npm install csv-super

Requirements: Node.js ≥ 18.0.0 (uses native fs.createReadStream + Async Generators)


Quick Start

import { csvSuper } from 'csv-super';

// Basic: 1000 rows per batch (default)
for await (const batch of csvSuper('sales.csv')) {
  console.log(batch.rows);      // CsvRow[] = Record<string, string>[]
  console.log(batch.batchIndex); // 0, 1, 2, ...
  console.log(batch.count);      // rows in this batch (≤ 1000)
  console.log(batch.totalSoFar); // cumulative rows processed
}

API Reference

csvSuper(filePath, options?)AsyncGenerator<BatchResult>

import { csvSuper } from 'csv-super';
import type { CsvSuperOptions, BatchResult } from 'csv-super';

Options

Option Type Default Description
batch number 1000 Rows per yielded batch (1–100000)
delimiter string ',' Field separator (single char)
quote string '"' Quote character
escape string '"' Escape char (RFC 4180: same as quote)
headers boolean true First row = column names
skipEmptyLines boolean true Skip blank lines
encoding 'utf8' | 'utf16le' | 'latin1' | 'auto' 'auto' File encoding (auto = BOM detection)
chunkSize number 65536 Read buffer size in bytes (≥ 1024)
onProgress (info: ProgressInfo) => void null Progress callback

BatchResult

interface BatchResult {
  rows:       CsvRow[];    // The parsed rows
  batchIndex: number;      // 0-based batch counter
  count:      number;      // rows.length (convenience)
  totalSoFar: number;      // cumulative rows across all batches
}

ProgressInfo

interface ProgressInfo {
  bytesRead:             number;  // bytes consumed so far
  totalBytes:            number;  // total file size
  percentage:            number;  // 0.0–100.0
  speedMBps:             number;  // current read speed (sliding window)
  estimatedSecondsLeft:  number;  // ETA in seconds
  rowsProcessed:         number;  // rows parsed so far
}

Examples

Insert into Database

import { csvSuper } from 'csv-super';

for await (const batch of csvSuper('customers.csv', { batch: 5000 })) {
  await db.customers.insertMany(batch.rows);
  console.log(`✅ Inserted ${batch.totalSoFar} customers`);
}

Progress Bar

for await (const batch of csvSuper('large.csv', {
  batch: 5000,
  onProgress: ({ percentage, speedMBps, estimatedSecondsLeft }) => {
    const bar = '█'.repeat(Math.floor(percentage / 5)).padEnd(20, '░');
    process.stdout.write(
      `\r[${bar}] ${percentage.toFixed(1)}% @ ${speedMBps.toFixed(1)} MB/s`
    );
  },
})) {
  await processRows(batch.rows);
}

TSV Files

for await (const batch of csvSuper('data.tsv', { delimiter: '\t' })) {
  // ...
}

Early Termination (stream closes automatically, no leak)

for await (const batch of csvSuper('events.csv')) {
  for (const row of batch.rows) {
    if (row.severity === 'CRITICAL') {
      console.log('Critical event found:', row);
      return; // or break — stream closes cleanly
    }
  }
}

No Headers (index-based access)

for await (const batch of csvSuper('raw.csv', { headers: false })) {
  // rows: { '0': 'value1', '1': 'value2', ... }
  console.log(batch.rows[0]?.['0']);
}

Pro Features ($17/month)

For enterprise workloads that process CSV files daily, csv-super Pro adds:

Multi-Thread Processing (Worker Threads)

import { csvSuperPro } from 'csv-super';

for await (const batch of csvSuperPro('huge-file.csv', {
  licenseKey: process.env.CSV_SUPER_KEY,
  threads: 8,         // Use 8 CPU cores in parallel
  batch: 10_000,
})) {
  await db.insertMany(batch.rows);
}

Speed: ~N× faster on N-core machines. A file that takes 60s on 1 core takes ~8s on 8 cores.

Transform Pipeline

import { csvSuperPro, TransformPipeline } from 'csv-super';

const pipeline = new TransformPipeline()
  .filter(row => row.status === 'active')          // Filter rows
  .trim()                                           // Trim all fields
  .select(['id', 'name', 'email', 'salary'])        // Select columns
  .rename({ 'id': 'employee_id' })                  // Rename columns
  .mapField('salary', v => String(parseInt(v, 10))) // Type coerce
  .pipe(async row => {                              // Async enrichment
    const dept = await getDept(row.employee_id);
    return { ...row, department: dept };
  });

for await (const batch of csvSuperPro('employees.csv', {
  licenseKey: process.env.CSV_SUPER_KEY,
  transform: pipeline.toFn(),
})) {
  await db.employees.insertMany(batch.rows);
}

Pro Pricing

Plan Price Features
Free $0 forever Streaming + Batch + TypeScript + RFC 4180 + Progress
Pro $17/month + Multi-thread + Transform + Priority support
Enterprise Custom + SLA + Custom seat count + Dedicated support

Get Pro →


Architecture

I/O Layer           Parser Layer           Delivery Layer
──────────          ──────────────         ───────────────
fs.createReadStream → CsvParser (FSM)  →  BatchController
     ↓                    ↓                     ↓
  64KB chunks      RFC 4180 State Machine   Async Generator
  (backpressure)   (incremental, chunked)   (yield + await)

Key insight: When the consumer awaits inside for await...of, the Async Generator suspends at the yield. While suspended, getNextChunk() is not called, so readStream.resume() is never called — the stream stays paused. This is mechanical, guaranteed backpressure.

Memory formula:

heap ≈ chunkSize (64KB) + batch_size × avg_row_size
     ≈ 0.064MB  +  1000  ×  0.05MB
     ≈ ~50MB (constant regardless of file size)

RFC 4180 Compliance

Full support for all CSV edge cases:

✅ Fields with commas:    "Smith, John"
✅ Fields with newlines:  "line1\nline2"
✅ Escaped quotes (x2):  "He said ""hi"""
✅ Empty fields:          a,,b
✅ CRLF line endings:     \r\n
✅ No trailing newline    (handled by finalize())
✅ Unicode content:       UTF-8, UTF-16 LE/BE
✅ Custom delimiters:     TSV (\t), PSV (|), SSV (;)

Error Handling

import { csvSuper, CsvSuperError, ParseError } from 'csv-super';

try {
  for await (const batch of csvSuper('data.csv')) {
    await processRows(batch.rows);
  }
} catch (err) {
  if (err instanceof ParseError) {
    console.error(`Parse error at line ${err.lineNumber}: ${err.message}`);
  } else if (err instanceof CsvSuperError) {
    console.error(`csv-super error [${err.code}]: ${err.message}`);
  } else {
    throw err; // re-throw unexpected errors
  }
}

TypeScript

Fully typed out of the box. No @types package needed.

import type {
  CsvRow,           // Record<string, string>
  BatchResult,      // { rows, batchIndex, count, totalSoFar }
  ProgressInfo,     // { percentage, speedMBps, ... }
  CsvSuperOptions,  // full options type
  TransformFn,      // (row: CsvRow) => CsvRow | null | Promise<CsvRow | null>
} from 'csv-super';

Contributing

Issues and PRs are welcome at github.com/csv-super/csv-super.

git clone https://github.com/Brah-Timo/csv-super
cd csv-super
npm install
npm test
npm run bench

License

Core (free tier): MIT License
Pro (multi-thread + transform): Commercial License — see csv-super.dev/pro


Built with ❤️ for data engineers who have felt the pain of a 10GB CSV on a 16GB server.