@brinda_yawa/file-scanner
A powerful Node.js package for extracting structured invoice data from PDF and image files using AI-powered OCR and vision models.
Features
- AI-Powered Extraction - Uses Google's Gemini AI for intelligent data extraction
- PDF Support - Extract text from PDF documents
- Image Support - OCR for PNG, JPG, JPEG, and WEBP image formats
- Smart Fallback - Automatically switches between text extraction and vision mode
- Structured Output - Returns clean, structured JSON data
- Singleton Pattern - Efficient resource management
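The README does not describe how the fallback chooses between the two modes. Purely as an illustration, the sketch below picks a mode from how much text could be read out of a PDF. `chooseExtractionMode` and the `minChars` threshold are hypothetical and are not part of the package's API.

```javascript
// Hypothetical sketch: NOT the package's actual logic.
// Decide whether extracted PDF text is usable or whether to fall back to vision mode.
function chooseExtractionMode(extension, extractedText = '', minChars = 50) {
  const ext = extension.toLowerCase();
  // Image files have no text layer to read, so they always go to vision/OCR.
  if (ext !== '.pdf') return 'vision';
  // Scanned PDFs often yield little or no text; treat that as the fallback signal.
  const usableChars = extractedText.replace(/\s+/g, '').length;
  return usableChars >= minChars ? 'text' : 'vision';
}
```

Counting non-whitespace characters is only one possible heuristic; a real implementation could just as well retry in vision mode after a failed text-based extraction.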
Installation
npm install @brinda_yawa/file-scanner
Quick Start
import InvoiceExtractor from '@brinda_yawa/file-scanner';
import fs from 'fs';
// Initialize the extractor (singleton)
const extractor = InvoiceExtractor;
extractor.init({ apiKey: 'YOUR API KEY', model: 'gemini-2.5-flash' });
// Read your invoice file
const fileBuffer = fs.readFileSync('invoice.pdf');
// Extract invoice data
const invoiceData = await extractor.extract(fileBuffer, '.pdf');
API Reference
InvoiceExtractor.init(options)
Creates or returns the singleton instance of the extractor.
Parameters:
- options.apiKey (string, required) - Your Google Gemini API key
- options.model (string, optional) - The Gemini model to use. Default: 'gemini-2.5-flash'
Returns: InvoiceExtractor instance
Example:
// Initialize the singleton instance
const extractor = InvoiceExtractor;
extractor.init({
  apiKey: process.env.GEMINI_API_KEY,
  model: 'gemini-2.5-flash'
});
// Read your invoice file
const fileBuffer = fs.readFileSync('invoice.pdf');
const invoiceData = await extractor.extract(fileBuffer, '.pdf');
console.log(invoiceData);
extractor.extract(fileBuffer, extension)
Extracts structured data from an invoice file.
Parameters:
- fileBuffer (Buffer, required) - The file buffer to process
- extension (string, required) - File extension (e.g., '.pdf', '.png', '.jpg')
Returns: Promise<Object|null> - Extracted invoice data or null if extraction fails
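Because the result can be null, callers may want to check it before using any fields. The helper below is a minimal sketch that assumes the result follows the example shape shown in this README. `findInvoiceProblems` is a hypothetical name, and the real output may contain other fields or types.

```javascript
// Hypothetical helper: lists problems with an extraction result instead of throwing.
// Field names follow the example output in this README; other fields may exist.
function findInvoiceProblems(data) {
  if (data === null || typeof data !== 'object') {
    return ['extraction returned no data'];
  }
  const problems = [];
  if (typeof data.invoiceNumber !== 'string' || data.invoiceNumber === '') {
    problems.push('missing invoiceNumber');
  }
  if (typeof data.total !== 'number' || Number.isNaN(data.total)) {
    problems.push('total is not a number');
  }
  if (!Array.isArray(data.items)) {
    problems.push('items is not an array');
  }
  return problems;
}
```

An empty array means the checked fields look usable; anything else can be logged or used to reject the file.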
Example:
const invoiceData = await extractor.extract(fileBuffer, '.pdf');
// Returns:
{
  invoiceNumber: "INV-001",
  date: "2024-01-15",
  supplier: "Acme Corp",
  total: 1250.00,
  items: [
    {
      description: "Product A",
      quantity: 2,
      unitPrice: 500.00,
      total: 1000.00,
      category: "Electronics"
    }
  ]
  // ... more fields
}
Supported File Types
Images
- .png - PNG images
- .jpg, .jpeg - JPEG images
- .gif - GIF images
- .webp - WebP images
- .bmp - Bitmap images
Documents
- .pdf - PDF documents
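Since extract() expects the extension as a separate argument, it can help to normalize and check it first. The sketch below copies the extensions listed above; `SUPPORTED_EXTENSIONS` and `normalizeExtension` are hypothetical names, and the package itself may accept more or fewer formats.

```javascript
// Extensions copied from the list above; the package may accept more or fewer.
const SUPPORTED_EXTENSIONS = new Set([
  '.png', '.jpg', '.jpeg', '.gif', '.webp', '.bmp', '.pdf',
]);

// Hypothetical helper: returns a lowercased extension, or null when unsupported.
// Expects a bare file name, not a full path.
function normalizeExtension(fileName) {
  const dot = fileName.lastIndexOf('.');
  // No dot, or only a leading dot (e.g. ".env"), means there is no usable extension.
  if (dot <= 0) return null;
  const ext = fileName.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(ext) ? ext : null;
}
```

A null result can be used to skip the file before spending an API call on it.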
Environment Variables
Create a .env file in your project root:
GEMINI_API_KEY=your_api_key_here
Recommendation: Use gemini-2.5-flash for development and production.
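Node.js does not read a .env file on its own, so the variable has to be loaded first, for example with the dotenv package or Node's --env-file flag. The sketch below only checks that the value is present before it reaches init(). `requireEnv` is a hypothetical helper, not something the package provides.

```javascript
// Hypothetical helper: read a required variable and fail early with a clear message.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
}

// Usage (assumes the variable was already loaded into process.env):
// extractor.init({ apiKey: requireEnv('GEMINI_API_KEY') });
```

Failing at startup makes a missing key easier to spot than an error on the first extraction call.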
Advanced Usage
Handling Multiple Files
import InvoiceExtractor from '@brinda_yawa/file-scanner';
import fs from 'fs';
import path from 'path';
const extractor = InvoiceExtractor;
extractor.init({
  apiKey: process.env.GEMINI_API_KEY
});
async function processInvoices(folderPath) {
  const files = fs.readdirSync(folderPath);
  const results = [];
  for (const file of files) {
    // Normalize case so files such as INVOICE.PDF are not skipped
    const ext = path.extname(file).toLowerCase();
    if (['.pdf', '.png', '.jpg', '.jpeg'].includes(ext)) {
      const buffer = fs.readFileSync(path.join(folderPath, file));
      const data = await extractor.extract(buffer, ext);
      if (data) {
        results.push({ file, data });
      }
    }
  }
  return results;
}
const invoices = await processInvoices('./invoices');
console.log(`Processed ${invoices.length} invoices`);
Error Handling
const extractor = InvoiceExtractor;
extractor.init({
  apiKey: process.env.GEMINI_API_KEY
});
try {
  const buffer = fs.readFileSync('invoice.pdf');
  const data = await extractor.extract(buffer, '.pdf');
  if (!data) {
    console.error('Failed to extract data from invoice');
  } else {
    console.log('Invoice extracted successfully:', data);
  }
} catch (error) {
  if (error.message.includes('quota')) {
    console.error('API quota exceeded. Please wait or upgrade your plan.');
  } else if (error.message.includes('API key')) {
    console.error('Invalid API key. Please check your credentials.');
  } else {
    console.error('Extraction error:', error.message);
  }
}
Using with Express.js
import express from 'express';
import multer from 'multer';
import InvoiceExtractor from '@brinda_yawa/file-scanner';
const app = express();
const upload = multer({ storage: multer.memoryStorage() });
const extractor = InvoiceExtractor;
extractor.init({
  apiKey: process.env.GEMINI_API_KEY
});
app.post('/extract', upload.single('invoice'), async (req, res) => {
  try {
    if (!req.file) {
      return res.status(400).json({ error: 'No file uploaded' });
    }
    const ext = '.' + req.file.originalname.split('.').pop();
    const data = await extractor.extract(req.file.buffer, ext);
    if (!data) {
      return res.status(500).json({ error: 'Extraction failed' });
    }
    res.json({ success: true, data });
  } catch (error) {
    res.status(500).json({ error: error.message });
  }
});
app.listen(3000, () => {
  console.log('Server running on port 3000');
});
Using with TypeScript
import InvoiceExtractor from '@brinda_yawa/file-scanner';
import * as fs from 'fs';
interface InvoiceData {
  invoiceNumber: string;
  date: string;
  supplier: string;
  total: number;
  items: Array<{
    description: string;
    quantity: number;
    unitPrice: number;
    total: number;
  }>;
}
const extractor = InvoiceExtractor;
extractor.init({
  apiKey: process.env.GEMINI_API_KEY!,
  model: 'gemini-2.5-flash'
});
async function extractInvoice(filePath: string): Promise<InvoiceData | null> {
  const buffer = fs.readFileSync(filePath);
  const ext = filePath.substring(filePath.lastIndexOf('.'));
  // extract() may resolve to null, so keep null in the asserted type
  return await extractor.extract(buffer, ext) as InvoiceData | null;
}
const data = await extractInvoice('invoice.pdf');
if (data) {
  console.log(`Invoice ${data.invoiceNumber} total: ${data.total}`);
}
Testing
import InvoiceExtractor from '@brinda_yawa/file-scanner';
import fs from 'fs';
import assert from 'assert';
// Reset instance before each test
InvoiceExtractor.resetInstance();
const extractor = InvoiceExtractor;
extractor.init({
  apiKey: process.env.GEMINI_API_KEY
});
// Test PDF extraction
const pdfBuffer = fs.readFileSync('test/sample.pdf');
const pdfData = await extractor.extract(pdfBuffer, '.pdf');
assert(pdfData !== null, 'PDF extraction failed');
assert(pdfData.invoiceNumber, 'Invoice number not extracted');
// Test image extraction
const imgBuffer = fs.readFileSync('test/sample.png');
const imgData = await extractor.extract(imgBuffer, '.png');
assert(imgData !== null, 'Image extraction failed');
console.log('All tests passed');
Common Issues
Quota Exceeded Error
Error: You exceeded your current quota
Solutions:
- Wait for quota reset (daily)
- Use a model with higher free tier quota (gemini-1.5-flash)
- Enable billing on your Google Cloud account
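For short-lived rate limits, waiting and retrying in code may be enough; an exhausted daily quota will not recover this way. The sketch below retries with exponential backoff when the error message mentions "quota", mirroring the check used in the Error Handling example. `backoffDelay` and `retryOnQuota` are hypothetical helpers, and the default delays are arbitrary.

```javascript
// Delay before retry number `attempt` (0-based): baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: retries `task` only when the error message mentions "quota".
async function retryOnQuota(task, { retries = 3, baseMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      const isQuota = String(error && error.message).includes('quota');
      // Give up on non-quota errors or once the retry budget is spent.
      if (!isQuota || attempt >= retries) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, baseMs)));
    }
  }
}

// Usage: const data = await retryOnQuota(() => extractor.extract(buffer, '.pdf'));
```

Matching on the message text is fragile; if the underlying error exposes a status code, checking that would be more reliable.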
Invalid API Key
Error: Gemini API key is required
Solutions:
- Check that your API key is set in .env
- Verify the API key is valid in Google AI Studio
- Make sure you're using the correct environment variable name
Extraction Returns Null
Possible causes:
- File is corrupted or unreadable
- File format not supported
- API quota exceeded
- Poor image quality (for OCR)
Solutions:
- Check file integrity
- Try a different file
- Increase image resolution
- Check API quota
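One inexpensive way to check file integrity before calling the API is to compare the first bytes of the buffer with the format's well-known signature. The sketch below covers PDF, PNG, and JPEG only; `SIGNATURES` and `looksLikeFormat` are hypothetical names, and a matching signature does not prove the rest of the file is intact.

```javascript
// Well-known leading bytes ("magic numbers") for a few of the supported formats.
const SIGNATURES = new Map([
  ['.pdf', [0x25, 0x50, 0x44, 0x46]], // "%PDF"
  ['.png', [0x89, 0x50, 0x4e, 0x47]], // "\x89PNG"
  ['.jpg', [0xff, 0xd8, 0xff]],
  ['.jpeg', [0xff, 0xd8, 0xff]],
]);

// Hypothetical pre-flight check: true when the buffer starts with the expected bytes.
// Extensions without a listed signature return true (nothing to compare against).
function looksLikeFormat(buffer, extension) {
  const signature = SIGNATURES.get(extension.toLowerCase());
  if (!signature) return true;
  if (buffer.length < signature.length) return false;
  return signature.every((byte, i) => buffer[i] === byte);
}

// Usage: if (!looksLikeFormat(buffer, '.pdf')) console.error('Not a PDF file');
```

A false result usually points to a wrong extension or a truncated upload rather than an API problem.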
Made with ❤️ by Brinda Yawa