JSPM

table-header-detective

1.0.0
  • ESM via JSPM
  • ES Module Entrypoint
  • Export Map
  • Keywords
  • License
  • Repository URL
  • TypeScript Types
  • README
  • Created
  • Published
  • Downloads 3
  • Score
    100M100P100Q12852F
  • License MIT

Intelligent detection of header rows in tabular data using multiple heuristics

Package Exports

  • table-header-detective
  • table-header-detective/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (table-header-detective) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

๐Ÿ•ต๏ธ Table Header Detective

npm version License: MIT TypeScript

An intelligent utility for detecting header rows in tabular data using multiple heuristics and adaptive algorithms.

๐Ÿ” Features

  • Detects if tabular data has one or more header rows
  • Analyzes multiple factors to make an intelligent decision:
    • Type differences (strings vs numbers)
    • Capitalization patterns (Title Case, ALL CAPS)
    • Format consistency
    • Text length patterns
    • Special character usage
    • Empty cell analysis
    • Value uniqueness
    • Common header terminology
  • Advanced detection capabilities:
    • Multi-header detection for tables with title rows
    • Adaptive weighting based on table size and characteristics
    • Handling of challenging edge cases and varied formats
  • Returns detailed reasoning for why rows were classified as headers
  • Written in TypeScript with full type safety
  • Zero dependencies
  • Configurable detection parameters
  • Thoroughly tested with extensive test fixtures

๐Ÿ“ฆ Installation

npm install table-header-detective
# or
yarn add table-header-detective

๐Ÿš€ Quick Start

// Import the detector and all default heuristics
import { detectHeaderRows, DEFAULT_HEURISTICS } from "table-header-detective";

// Sample tabular data
const tableData = [
  ["ID", "Name", "Age", "Department", "Salary"],
  [1, "John Doe", 32, "Marketing", 58000],
  [2, "Jane Smith", 28, "Engineering", 72000],
  [3, "Bob Johnson", 45, "Finance", 91000],
];

// Detect header rows using all heuristics
const result = detectHeaderRows(tableData, {}, DEFAULT_HEURISTICS);

console.log(`Found ${result.headerRowIndices.length} header rows`);
console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`);
console.log("Header rows:", result.headerRowIndices);

Multi-header Detection

The library can detect complex header structures with multiple consecutive header rows:

// Table with title and column headers
const multiHeaderTable = [
  ["QUARTERLY FINANCIAL REPORT", "", "", "", "", ""],
  ["Q1", "Q2", "Q3", "Q4", "YTD", "Budget"],
  [125000, 132000, 141000, 138000, 536000, 520000],
  [95000, 97000, 102000, 99000, 393000, 400000],
];

const result = detectHeaderRows(multiHeaderTable, {}, DEFAULT_HEURISTICS);
// Detects rows 0 and 1 as headers with high confidence

Optimizing Bundle Size

For production applications, you can reduce bundle size by only importing the specific heuristics you need:

import { detectHeaderRows } from "table-header-detective";
import { 
  TypeDifferenceHeuristic, 
  CapitalizationPatternHeuristic 
} from "table-header-detective";

// Use only specific heuristics to reduce bundle size
const result = detectHeaderRows(tableData, {}, [
  TypeDifferenceHeuristic,
  CapitalizationPatternHeuristic
]);

๐Ÿ“‹ API Reference

detectHeaderRows(rows, options?, heuristics?)

Main function that analyzes tabular data and detects header rows.

Parameters

  • rows: TableData - The tabular data to analyze as an array of arrays
  • options?: HeaderDetectionOptions - Optional configuration settings
  • heuristics?: HeaderHeuristic[] - Array of heuristics to use for detection. You must provide either specific heuristics or DEFAULT_HEURISTICS.

Returns

Returns a HeaderDetectionResult object containing:

  • headerRowIndices: number[] - Array of row indices identified as headers
  • confidence: number - Confidence level (0-1) that the detected rows are headers
  • message: string - Human-readable result message
  • details: RowDetail[] - Detailed analysis of each row

Available Heuristics

The library provides multiple heuristics, each analyzing different aspects of the table:

  • TypeDifferenceHeuristic - Analyzes data type patterns (strings vs. numbers)
  • CapitalizationPatternHeuristic - Detects typical header capitalization (ALL CAPS, Title Case)
  • FormatConsistencyHeuristic - Identifies format pattern differences between rows
  • TextLengthPatternHeuristic - Analyzes text length patterns (headers tend to be shorter)
  • SpecialCharUsageHeuristic - Examines special character usage in cells
  • EmptyCellAnalysisHeuristic - Analyzes patterns of empty cells
  • ValueUniquenessHeuristic - Checks for unique values typical of headers
  • HeaderTerminologyHeuristic - Detects common header terms and terminology

Import only the heuristics you need to optimize bundle size, or use DEFAULT_HEURISTICS to include all of them.

Adaptive Weighting System

The detector uses an intelligent adaptive weighting system that adjusts heuristic importance based on table characteristics:

  • For small tables (1-3 rows):

    • Upweights terminology-based detection
    • Downweights pattern-based heuristics that need more rows
  • For large tables (10+ rows):

    • Upweights pattern recognition heuristics
    • Better utilizes statistical patterns across larger datasets

This ensures optimal header detection regardless of table size or structure.

Configuration Options

You can customize the detection behavior:

import { detectHeaderRows, DEFAULT_HEURISTICS } from "table-header-detective";

const result = detectHeaderRows(myData, {
  // Minimum score to consider a row a header (0-1)
  headerThreshold: 0.6, // Default is 0.6

  // Maximum number of consecutive header rows to detect
  maxHeaderRows: 3, // Default is 3

  // Whether to apply first row bias (higher score for first row)
  applyFirstRowBias: true, // Default is true

  // Additional custom header terms to look for
  customHeaderTerms: ["code", "reference", "location"],
  
  // Minimum rows needed for sampling patterns
  minSampleRows: 2 // Default is 2
}, DEFAULT_HEURISTICS);

// You can also pass configuration options to specific heuristics
import { HeaderTerminologyHeuristic } from "table-header-detective";

const customTerminologyHeuristic = {
  ...HeaderTerminologyHeuristic,
  analyze: (rows) => HeaderTerminologyHeuristic.analyze(rows, {
    customHeaderTerms: ["product_id", "sku_code", "category_name"],
    matchPartialTerms: true,
    headerTermBoost: 0.5
  })
};

const result = detectHeaderRows(myData, options, [
  customTerminologyHeuristic,
  // ...other heuristics
]);

๐Ÿ“Š Examples

Example 1: CSV Data

import { detectHeaderRows } from "table-header-detective";
import { DEFAULT_HEURISTICS } from "table-header-detective";
import { parse } from "csv-parse/sync";
import fs from "fs";

// Read and parse CSV
const csvData = fs.readFileSync("data.csv", "utf8");
const records = parse(csvData);

// Detect headers
const result = detectHeaderRows(records, {}, DEFAULT_HEURISTICS);

if (result.headerRowIndices.length > 0) {
  // Extract headers
  const headers = result.headerRowIndices.map((idx) => records[idx]);
  console.log("Detected headers:", headers);

  // Extract data (rows after headers)
  const lastHeaderIdx = Math.max(...result.headerRowIndices);
  const data = records.slice(lastHeaderIdx + 1);
  console.log(`${data.length} data rows found`);
}

Example 2: Exploring Detailed Results

import { detectHeaderRows } from "table-header-detective";

const data = [
  /* your table data */
];
const result = detectHeaderRows(data);

// Examine detailed analysis for each row
result.details.forEach((detail) => {
  console.log(`Row ${detail.rowIndex}:`);
  console.log(`  Is header: ${detail.isLikelyHeader}`);
  console.log(`  Header Type: ${detail.headerType}`);
  console.log(`  Confidence: ${(detail.confidence * 100).toFixed(1)}%`);
  console.log(`  Reasons:`);
  detail.reasons.forEach((reason) => {
    console.log(`    - ${reason}`);
  });
  console.log("");
});

๐Ÿ› ๏ธ Advanced Usage

Custom Heuristic Combinations

You can select and configure specific heuristics based on your data characteristics:

import { detectHeaderRows } from "table-header-detective";
import { 
  TypeDifferenceHeuristic, 
  HeaderTerminologyHeuristic,
  EmptyCellAnalysisHeuristic
} from "table-header-detective";

// For tables with mixed data types and specialized terminology
const result = detectHeaderRows(tableData, options, [
  TypeDifferenceHeuristic,
  HeaderTerminologyHeuristic,
  EmptyCellAnalysisHeuristic
]);

Multi-Header Table Detection

The library is specially designed to recognize complex multi-header tables, like:

// Complex header with title rows and column headers
const complexHeaderTable = [
  ['EMPLOYEE INFORMATION', '', '', '', ''],
  ['FISCAL YEAR 2023', '', '', '', ''],
  ['ID', 'Name', 'Age', 'Department', 'Salary'],
  [1, 'John Doe', 32, 'Marketing', 58000],
  // More data rows...
];

const result = detectHeaderRows(complexHeaderTable, {}, DEFAULT_HEURISTICS);
// Will detect all three rows as headers (0, 1, and 2)

Creating Custom Heuristics

You can create your own custom heuristics by implementing the HeaderHeuristic interface:

import { HeaderHeuristic } from "table-header-detective";
import { TableData, HeuristicResult } from "table-header-detective";

// Custom heuristic for multi-line headers
const MultiLineHeaderHeuristic: HeaderHeuristic = {
  name: 'multiLineHeader',
  analyze: (rows: TableData): HeuristicResult => {
    // Your custom analysis logic here
    const rowScores: number[] = [];
    const reasons: string[][] = [];
    
    // Analyze each row and populate rowScores and reasons
    // ...
    
    return { rowScores, reasons };
  }
};

// Use your custom heuristic
const result = detectHeaderRows(tableData, options, [
  TypeDifferenceHeuristic,
  MultiLineHeaderHeuristic
]);

Helper Functions

The package exports additional helper functions for advanced usage:

  • isTitleCase(str) - Checks if a string follows Title Case convention
  • detectCellType(value) - Detects the data type of a cell value
  • getRowCompleteness(row, options) - Analyzes how complete a row is
  • normalizeValue(value, options) - Normalizes values for comparison

๐Ÿงช Testing

The library includes comprehensive test coverage for various table scenarios:

  • Basic header detection for simple tables
  • Complex multi-level header structures
  • Tables with various formatting challenges
  • Large tables with many rows
  • Edge cases and challenging patterns

Each heuristic is individually tested, and integration tests verify the system works correctly as a whole.

๐Ÿ”ฎ Future Enhancements

Future versions will add support for even more complex table structures:

  • Full multi-level header detection with column groups spanning multiple columns
  • Better detection of mid-table category headers
  • Support for merged cell patterns from spreadsheet exports
  • Handling of tables with irregular structures

๐Ÿ“„ License

MIT ยฉ Lyle Underwood