Package Exports
- table-header-detective
- table-header-detective/dist/index.js
This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (table-header-detective) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.
Readme
๐ต๏ธ Table Header Detective
An intelligent utility for detecting header rows in tabular data using multiple heuristics and adaptive algorithms.
๐ Features
- Detects if tabular data has one or more header rows
- Analyzes multiple factors to make an intelligent decision:
- Type differences (strings vs numbers)
- Capitalization patterns (Title Case, ALL CAPS)
- Format consistency
- Text length patterns
- Special character usage
- Empty cell analysis
- Value uniqueness
- Common header terminology
- Advanced detection capabilities:
- Multi-header detection for tables with title rows
- Adaptive weighting based on table size and characteristics
- Handling of challenging edge cases and varied formats
- Returns detailed reasoning for why rows were classified as headers
- Written in TypeScript with full type safety
- Zero dependencies
- Configurable detection parameters
- Thoroughly tested with extensive test fixtures
๐ฆ Installation
npm install table-header-detective
# or
yarn add table-header-detective๐ Quick Start
// Import the detector and all default heuristics
import { detectHeaderRows, DEFAULT_HEURISTICS } from "table-header-detective";
// Sample tabular data
const tableData = [
["ID", "Name", "Age", "Department", "Salary"],
[1, "John Doe", 32, "Marketing", 58000],
[2, "Jane Smith", 28, "Engineering", 72000],
[3, "Bob Johnson", 45, "Finance", 91000],
];
// Detect header rows using all heuristics
const result = detectHeaderRows(tableData, {}, DEFAULT_HEURISTICS);
console.log(`Found ${result.headerRowIndices.length} header rows`);
console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`);
console.log("Header rows:", result.headerRowIndices);Multi-header Detection
The library can detect complex header structures with multiple consecutive header rows:
// Table with title and column headers
const multiHeaderTable = [
["QUARTERLY FINANCIAL REPORT", "", "", "", "", ""],
["Q1", "Q2", "Q3", "Q4", "YTD", "Budget"],
[125000, 132000, 141000, 138000, 536000, 520000],
[95000, 97000, 102000, 99000, 393000, 400000],
];
const result = detectHeaderRows(multiHeaderTable, {}, DEFAULT_HEURISTICS);
// Detects rows 0 and 1 as headers with high confidenceOptimizing Bundle Size
For production applications, you can reduce bundle size by only importing the specific heuristics you need:
import { detectHeaderRows } from "table-header-detective";
import {
TypeDifferenceHeuristic,
CapitalizationPatternHeuristic
} from "table-header-detective";
// Use only specific heuristics to reduce bundle size
const result = detectHeaderRows(tableData, {}, [
TypeDifferenceHeuristic,
CapitalizationPatternHeuristic
]);๐ API Reference
detectHeaderRows(rows, options?, heuristics?)
Main function that analyzes tabular data and detects header rows.
Parameters
rows: TableData- The tabular data to analyze as an array of arraysoptions?: HeaderDetectionOptions- Optional configuration settingsheuristics?: HeaderHeuristic[]- Array of heuristics to use for detection. You must provide either specific heuristics or DEFAULT_HEURISTICS.
Returns
Returns a HeaderDetectionResult object containing:
headerRowIndices: number[]- Array of row indices identified as headersconfidence: number- Confidence level (0-1) that the detected rows are headersmessage: string- Human-readable result messagedetails: RowDetail[]- Detailed analysis of each row
Available Heuristics
The library provides multiple heuristics, each analyzing different aspects of the table:
TypeDifferenceHeuristic- Analyzes data type patterns (strings vs. numbers)CapitalizationPatternHeuristic- Detects typical header capitalization (ALL CAPS, Title Case)FormatConsistencyHeuristic- Identifies format pattern differences between rowsTextLengthPatternHeuristic- Analyzes text length patterns (headers tend to be shorter)SpecialCharUsageHeuristic- Examines special character usage in cellsEmptyCellAnalysisHeuristic- Analyzes patterns of empty cellsValueUniquenessHeuristic- Checks for unique values typical of headersHeaderTerminologyHeuristic- Detects common header terms and terminology
Import only the heuristics you need to optimize bundle size, or use DEFAULT_HEURISTICS to include all of them.
Adaptive Weighting System
The detector uses an intelligent adaptive weighting system that adjusts heuristic importance based on table characteristics:
For small tables (1-3 rows):
- Upweights terminology-based detection
- Downweights pattern-based heuristics that need more rows
For large tables (10+ rows):
- Upweights pattern recognition heuristics
- Better utilizes statistical patterns across larger datasets
This ensures optimal header detection regardless of table size or structure.
Configuration Options
You can customize the detection behavior:
import { detectHeaderRows, DEFAULT_HEURISTICS } from "table-header-detective";
const result = detectHeaderRows(myData, {
// Minimum score to consider a row a header (0-1)
headerThreshold: 0.6, // Default is 0.6
// Maximum number of consecutive header rows to detect
maxHeaderRows: 3, // Default is 3
// Whether to apply first row bias (higher score for first row)
applyFirstRowBias: true, // Default is true
// Additional custom header terms to look for
customHeaderTerms: ["code", "reference", "location"],
// Minimum rows needed for sampling patterns
minSampleRows: 2 // Default is 2
}, DEFAULT_HEURISTICS);
// You can also pass configuration options to specific heuristics
import { HeaderTerminologyHeuristic } from "table-header-detective";
const customTerminologyHeuristic = {
...HeaderTerminologyHeuristic,
analyze: (rows) => HeaderTerminologyHeuristic.analyze(rows, {
customHeaderTerms: ["product_id", "sku_code", "category_name"],
matchPartialTerms: true,
headerTermBoost: 0.5
})
};
const result = detectHeaderRows(myData, options, [
customTerminologyHeuristic,
// ...other heuristics
]);๐ Examples
Example 1: CSV Data
import { detectHeaderRows } from "table-header-detective";
import { DEFAULT_HEURISTICS } from "table-header-detective";
import { parse } from "csv-parse/sync";
import fs from "fs";
// Read and parse CSV
const csvData = fs.readFileSync("data.csv", "utf8");
const records = parse(csvData);
// Detect headers
const result = detectHeaderRows(records, {}, DEFAULT_HEURISTICS);
if (result.headerRowIndices.length > 0) {
// Extract headers
const headers = result.headerRowIndices.map((idx) => records[idx]);
console.log("Detected headers:", headers);
// Extract data (rows after headers)
const lastHeaderIdx = Math.max(...result.headerRowIndices);
const data = records.slice(lastHeaderIdx + 1);
console.log(`${data.length} data rows found`);
}Example 2: Exploring Detailed Results
import { detectHeaderRows } from "table-header-detective";
const data = [
/* your table data */
];
const result = detectHeaderRows(data);
// Examine detailed analysis for each row
result.details.forEach((detail) => {
console.log(`Row ${detail.rowIndex}:`);
console.log(` Is header: ${detail.isLikelyHeader}`);
console.log(` Header Type: ${detail.headerType}`);
console.log(` Confidence: ${(detail.confidence * 100).toFixed(1)}%`);
console.log(` Reasons:`);
detail.reasons.forEach((reason) => {
console.log(` - ${reason}`);
});
console.log("");
});๐ ๏ธ Advanced Usage
Custom Heuristic Combinations
You can select and configure specific heuristics based on your data characteristics:
import { detectHeaderRows } from "table-header-detective";
import {
TypeDifferenceHeuristic,
HeaderTerminologyHeuristic,
EmptyCellAnalysisHeuristic
} from "table-header-detective";
// For tables with mixed data types and specialized terminology
const result = detectHeaderRows(tableData, options, [
TypeDifferenceHeuristic,
HeaderTerminologyHeuristic,
EmptyCellAnalysisHeuristic
]);Multi-Header Table Detection
The library is specially designed to recognize complex multi-header tables, like:
// Complex header with title rows and column headers
const complexHeaderTable = [
['EMPLOYEE INFORMATION', '', '', '', ''],
['FISCAL YEAR 2023', '', '', '', ''],
['ID', 'Name', 'Age', 'Department', 'Salary'],
[1, 'John Doe', 32, 'Marketing', 58000],
// More data rows...
];
const result = detectHeaderRows(complexHeaderTable, {}, DEFAULT_HEURISTICS);
// Will detect all three rows as headers (0, 1, and 2)Creating Custom Heuristics
You can create your own custom heuristics by implementing the HeaderHeuristic interface:
import { HeaderHeuristic } from "table-header-detective";
import { TableData, HeuristicResult } from "table-header-detective";
// Custom heuristic for multi-line headers
const MultiLineHeaderHeuristic: HeaderHeuristic = {
name: 'multiLineHeader',
analyze: (rows: TableData): HeuristicResult => {
// Your custom analysis logic here
const rowScores: number[] = [];
const reasons: string[][] = [];
// Analyze each row and populate rowScores and reasons
// ...
return { rowScores, reasons };
}
};
// Use your custom heuristic
const result = detectHeaderRows(tableData, options, [
TypeDifferenceHeuristic,
MultiLineHeaderHeuristic
]);Helper Functions
The package exports additional helper functions for advanced usage:
isTitleCase(str)- Checks if a string follows Title Case conventiondetectCellType(value)- Detects the data type of a cell valuegetRowCompleteness(row, options)- Analyzes how complete a row isnormalizeValue(value, options)- Normalizes values for comparison
๐งช Testing
The library includes comprehensive test coverage for various table scenarios:
- Basic header detection for simple tables
- Complex multi-level header structures
- Tables with various formatting challenges
- Large tables with many rows
- Edge cases and challenging patterns
Each heuristic is individually tested, and integration tests verify the system works correctly as a whole.
๐ฎ Future Enhancements
Future versions will add support for even more complex table structures:
- Full multi-level header detection with column groups spanning multiple columns
- Better detection of mid-table category headers
- Support for merged cell patterns from spreadsheet exports
- Handling of tables with irregular structures
๐ License
MIT ยฉ Lyle Underwood