JSPM

@brinda_yawa/file-scanner

1.0.2
    • License ISC

    A Node.js package that scans files in a specified directory, extracts text using OCR, and generates summaries using the Google GenAI API.

    Package Exports

    • @brinda_yawa/file-scanner
    • @brinda_yawa/file-scanner/dist/index.mjs

    This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (@brinda_yawa/file-scanner) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

    Readme

    📄 @brinda_yawa/file-scanner

    A powerful Node.js package for extracting structured invoice data from PDF and image files using AI-powered OCR and vision models.

    ✨ Features

    • 🤖 AI-Powered Extraction - Uses Google's Gemini AI for intelligent data extraction
    • 📄 PDF Support - Extract text from PDF documents
    • 🖼️ Image Support - OCR for PNG, JPG, JPEG, and WEBP image formats
    • 🔄 Smart Fallback - Automatically switches between text extraction and vision mode
    • 🎯 Structured Output - Returns clean, structured JSON data
    • 🔒 Singleton Pattern - Efficient resource management
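The "Smart Fallback" feature is only described at a high level. As a rough illustration of the idea, and not the package's actual implementation, the sketch below tries a text pass first and hands the file to a vision pass when too little text comes back. The names `extractWithFallback`, `extractText`, `analyzeWithVision`, and the `minChars` threshold of 50 are all hypothetical.

```javascript
// Hypothetical sketch of a text-first, vision-second fallback.
// extractText and analyzeWithVision are placeholders supplied by the caller;
// they are NOT functions exported by @brinda_yawa/file-scanner.
async function extractWithFallback(buffer, { extractText, analyzeWithVision, minChars = 50 }) {
    let text = '';
    try {
        text = (await extractText(buffer)) || '';
    } catch {
        // A failed text pass is treated the same as an empty one.
        text = '';
    }

    // Enough embedded text: stay on the text path.
    if (text.trim().length >= minChars) {
        return { mode: 'text', text };
    }

    // Otherwise hand the raw bytes to the vision path (for example, a scanned PDF).
    return { mode: 'vision', result: await analyzeWithVision(buffer) };
}
```

A scanned PDF with no embedded text layer is the typical case where the second branch would be taken.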

    📦 Installation

    npm install @brinda_yawa/file-scanner

    🚀 Quick Start

    import InvoiceExtractor from '@brinda_yawa/file-scanner';
    import fs from 'fs';
    
    // Initialize the extractor (singleton)
    const extractor = InvoiceExtractor;
    extractor.init({ apiKey: 'YOUR API KEY', model: 'gemini-2.5-flash' });
    
    
    // Read your invoice file
    const fileBuffer = fs.readFileSync('invoice.pdf');
    
    // Extract invoice data
    const invoiceData = await extractor.extract(fileBuffer, '.pdf');

    📖 API Reference

    InvoiceExtractor.init(options)

    Creates or returns the singleton instance of the extractor.

    Parameters:

    • options.apiKey (string, required) - Your Google Gemini API key
    • options.model (string, optional) - The Gemini model to use. Default: 'gemini-2.5-flash'

    Returns: InvoiceExtractor instance

    Example:

    // Initialize the singleton instance
    const extractor = InvoiceExtractor;
    extractor.init({
        apiKey: process.env.GEMINI_API_KEY,
        model: 'gemini-2.5-flash' 
    });
    
    // Read your invoice file
    const fileBuffer = fs.readFileSync('invoice.pdf');
    const invoiceData = await extractor.extract(fileBuffer, '.pdf');
    console.log(invoiceData);

    extractor.extract(fileBuffer, extension)

    Extracts structured data from an invoice file.

    Parameters:

    • fileBuffer (Buffer, required) - The file buffer to process
    • extension (string, required) - File extension (e.g., '.pdf', '.png', '.jpg')

    Returns: Promise<Object|null> - Extracted invoice data or null if extraction fails

    Example:

    const invoiceData = await extractor.extract(fileBuffer, '.pdf');
    
    // Returns:
    {
        invoiceNumber: "INV-001",
        date: "2024-01-15", 
        supplier: "Acme Corp",
        total: 1250.00,
        items: [
            {
                description: "Product A",
                quantity: 2,
                unitPrice: 500.00,
                total: 1000.00,
                category: "Electronics"
            }
        ]
        // ... more fields
    }
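Because the data comes from a model, it can be worth sanity-checking the numbers before trusting them. The sketch below is one possible check, assuming the field names shown in the example output (`items`, `quantity`, `unitPrice`, `total`). The helper name `checkInvoiceTotals` and the default tolerance are made up for illustration.

```javascript
// Checks that line-item totals are internally consistent and add up to the
// invoice total. Field names follow the example output above; the tolerance
// value is an arbitrary assumption.
function checkInvoiceTotals(invoice, tolerance = 0.01) {
    const problems = [];
    const items = Array.isArray(invoice.items) ? invoice.items : [];

    let sum = 0;
    items.forEach((item, index) => {
        const expected = item.quantity * item.unitPrice;
        if (Math.abs(expected - item.total) > tolerance) {
            problems.push(`item ${index}: quantity * unitPrice = ${expected}, but total = ${item.total}`);
        }
        sum += item.total;
    });

    if (Math.abs(sum - invoice.total) > tolerance) {
        problems.push(`items sum to ${sum}, but invoice total = ${invoice.total}`);
    }
    return problems;
}
```

For the sample object above, the single item sums to 1000 while the invoice total is 1250, so the function reports one mismatch. Treat such results as warnings rather than errors, since tax, shipping, or omitted line items can explain the difference.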

    🎯 Supported File Types

    Images

    • .png - PNG images
    • .jpg, .jpeg - JPEG images
    • .gif - GIF images
    • .webp - WebP images
    • .bmp - Bitmap images

    Documents

    • .pdf - PDF documents
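A small pre-check can filter out unsupported files before any API call is made. This is a minimal sketch that takes the extension list from the section above. `normalizeExtension` and `isSupportedFile` are hypothetical helpers, and lowercasing the extension assumes the package compares extensions in lowercase, which the README does not state.

```javascript
// Extensions taken from the "Supported File Types" list above.
const SUPPORTED_EXTENSIONS = new Set(['.png', '.jpg', '.jpeg', '.gif', '.webp', '.bmp', '.pdf']);

// Returns a lowercase extension with a leading dot, or null if the name has none.
function normalizeExtension(fileName) {
    // Keep only the last path segment so dots in folder names are ignored.
    const base = String(fileName).split(/[\\/]/).pop();
    const dot = base.lastIndexOf('.');
    // dot <= 0 covers "no dot" and dotfiles such as ".env"; a trailing dot has no extension.
    if (dot <= 0 || dot === base.length - 1) return null;
    return base.slice(dot).toLowerCase();
}

function isSupportedFile(fileName) {
    const ext = normalizeExtension(fileName);
    return ext !== null && SUPPORTED_EXTENSIONS.has(ext);
}
```

The normalized value can then be passed as the second argument to `extractor.extract`.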

    Environment Variables

    Create a .env file in your project root:

    GEMINI_API_KEY=your_api_key_here
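Node.js does not read a .env file on its own. The file has to be loaded, for example by starting the process with `node --env-file=.env` (Node.js 20.6 or later) or by using a loader such as the dotenv package. Whether this package loads the file for you is not stated here. The hypothetical helper below fails fast with a clear message when the variable is missing, instead of passing `undefined` to `init`.

```javascript
// Hypothetical helper: reads a required variable and fails fast with a clear message.
// The env parameter defaults to process.env and exists mainly to make testing easy.
function requireEnv(name, env = process.env) {
    const value = env[name];
    if (typeof value !== 'string' || value.trim() === '') {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value.trim();
}
```

It would be used as `extractor.init({ apiKey: requireEnv('GEMINI_API_KEY') })`.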

    Recommendation: Use gemini-2.5-flash for development and production.

    💡 Advanced Usage

    Handling Multiple Files

    import InvoiceExtractor from '@brinda_yawa/file-scanner';
    import fs from 'fs';
    import path from 'path';
    
    const extractor = InvoiceExtractor;
    extractor.init({
        apiKey: process.env.GEMINI_API_KEY
    });
    
    async function processInvoices(folderPath) {
        const files = fs.readdirSync(folderPath);
        const results = [];
    
        for (const file of files) {
            // Lowercase the extension so files such as "SCAN.PDF" are not skipped
            const ext = path.extname(file).toLowerCase();
            if (['.pdf', '.png', '.jpg', '.jpeg'].includes(ext)) {
                const buffer = fs.readFileSync(path.join(folderPath, file));
                const data = await extractor.extract(buffer, ext);
                
                if (data) {
                    results.push({ file, data });
                }
            }
        }
    
        return results;
    }
    
    const invoices = await processInvoices('./invoices');
    console.log(`Processed ${invoices.length} invoices`);
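In the loop above, one thrown error stops the whole batch. If that is not what you want, a variant along the following lines records each failure and keeps going. It is a sketch: `processAll` is a hypothetical name, and `extractFn` stands in for whatever wrapper you write around `extractor.extract`.

```javascript
// Runs extractFn over every entry and records failures instead of aborting.
// extractFn is any async function (file) => data; it is a parameter so the
// sketch stays self-contained and does not depend on the package.
async function processAll(files, extractFn) {
    const succeeded = [];
    const failed = [];

    // Sequential on purpose: one request at a time is gentler on API quotas.
    for (const file of files) {
        try {
            const data = await extractFn(file);
            if (data) {
                succeeded.push({ file, data });
            } else {
                failed.push({ file, reason: 'extraction returned null' });
            }
        } catch (error) {
            failed.push({ file, reason: error.message });
        }
    }

    return { succeeded, failed };
}
```

The `failed` list makes it possible to report or retry only the files that did not go through.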

    Error Handling

    import InvoiceExtractor from '@brinda_yawa/file-scanner';
    import fs from 'fs';
    
    const extractor = InvoiceExtractor;
    extractor.init({
        apiKey: process.env.GEMINI_API_KEY
    });
    
    try {
        const buffer = fs.readFileSync('invoice.pdf');
        const data = await extractor.extract(buffer, '.pdf');
        
        if (!data) {
            console.error('Failed to extract data from invoice');
        } else {
            console.log('Invoice extracted successfully:', data);
        }
    } catch (error) {
        if (error.message.includes('quota')) {
            console.error('API quota exceeded. Please wait or upgrade your plan.');
        } else if (error.message.includes('API key')) {
            console.error('Invalid API key. Please check your credentials.');
        } else {
            console.error('Extraction error:', error.message);
        }
    }
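For short-lived rate limits, retrying after a pause is sometimes enough. The sketch below wraps any async operation in exponential backoff. `withQuotaRetry` is a hypothetical helper, matching on the word "quota" simply mirrors the check in the example above, and the attempt count and delays are arbitrary. It will not help once a daily quota is exhausted.

```javascript
// Retries an async operation with exponential backoff when the error looks
// like a quota problem. The string match and the default numbers are
// assumptions, not behavior documented by the package or the Gemini API.
async function withQuotaRetry(operation, { attempts = 3, baseDelayMs = 1000 } = {}) {
    for (let attempt = 1; ; attempt++) {
        try {
            return await operation();
        } catch (error) {
            const isQuota = String(error && error.message).includes('quota');
            // Give up on non-quota errors, or once the attempt budget is spent.
            if (!isQuota || attempt >= attempts) throw error;

            // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            const delay = baseDelayMs * 2 ** (attempt - 1);
            await new Promise((resolve) => setTimeout(resolve, delay));
        }
    }
}
```

A call would look like `await withQuotaRetry(() => extractor.extract(buffer, '.pdf'))`.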

    Using with Express.js

    import express from 'express';
    import multer from 'multer';
    import InvoiceExtractor from '@brinda_yawa/file-scanner';
    
    const app = express();
    const upload = multer({ storage: multer.memoryStorage() });
    
    // Use init(), the initializer documented in the API Reference
    const extractor = InvoiceExtractor;
    extractor.init({
        apiKey: process.env.GEMINI_API_KEY
    });
    
    app.post('/extract', upload.single('invoice'), async (req, res) => {
        try {
            if (!req.file) {
                return res.status(400).json({ error: 'No file uploaded' });
            }
    
            const ext = '.' + req.file.originalname.split('.').pop();
            const data = await extractor.extract(req.file.buffer, ext);
    
            if (!data) {
                return res.status(500).json({ error: 'Extraction failed' });
            }
    
            res.json({ success: true, data });
        } catch (error) {
            res.status(500).json({ error: error.message });
        }
    });
    
    app.listen(3000, () => {
        console.log('Server running on port 3000');
    });
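The route above accepts any upload of any size. multer supports a `fileFilter` callback and a `limits.fileSize` option, so unsupported or oversized files can be turned away before extraction. The following is a sketch: the extension list mirrors "Supported File Types", while `invoiceFileFilter`, `uploadOptions`, and the 10 MB cap are illustrative choices.

```javascript
// Extensions mirror the "Supported File Types" section. Checking the name
// alone is a simplification, since the file name is supplied by the client.
const ALLOWED_UPLOAD_EXTENSIONS = ['.pdf', '.png', '.jpg', '.jpeg', '.gif', '.webp', '.bmp'];

// A fileFilter in the shape multer expects: (req, file, cb).
function invoiceFileFilter(req, file, cb) {
    const name = String(file.originalname || '').toLowerCase();
    const ok = ALLOWED_UPLOAD_EXTENSIONS.some((ext) => name.endsWith(ext));
    // cb(null, true) accepts the file; cb(null, false) skips it without an error.
    cb(null, ok);
}

// Options to spread into multer({ storage: multer.memoryStorage(), ...uploadOptions }).
// The 10 MB cap is an arbitrary choice.
const uploadOptions = {
    limits: { fileSize: 10 * 1024 * 1024 },
    fileFilter: invoiceFileFilter,
};
```

When the filter skips a file, `req.file` is left undefined, so the existing "No file uploaded" branch returns the 400 response.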

    Using with TypeScript

    import InvoiceExtractor from '@brinda_yawa/file-scanner';
    import * as fs from 'fs';
    
    interface InvoiceData {
        invoiceNumber: string;
        date: string;
        supplier: string;
        total: number;
        items: Array<{
            description: string;
            quantity: number;
            unitPrice: number;
            total: number;
        }>;
    }
    
    const extractor = InvoiceExtractor;
    
    extractor.init({
        apiKey: process.env.GEMINI_API_KEY!,
        model: 'gemini-2.5-flash'
    });
    
    async function extractInvoice(filePath: string): Promise<InvoiceData | null> {
        const buffer = fs.readFileSync(filePath);
        const ext = filePath.substring(filePath.lastIndexOf('.'));
        // extract() may resolve to null, so keep null in the asserted type
        return (await extractor.extract(buffer, ext)) as InvoiceData | null;
    }
    
    const data = await extractInvoice('invoice.pdf');
    if (data) {
        console.log(`Invoice ${data.invoiceNumber} total: ${data.total}`);
    }
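A type assertion is not checked at runtime, so a malformed response would still pass through `extractInvoice` unnoticed. One option is a runtime guard like the sketch below, written in plain JavaScript so it runs as is. In TypeScript its return type could be declared as `value is InvoiceData`. The function name `isInvoiceData` is hypothetical, and the checked fields are the ones declared in the interface above.

```javascript
// Runtime check that a value has the fields the InvoiceData interface declares.
// It validates types only, not business rules such as totals adding up.
function isInvoiceData(value) {
    if (value === null || typeof value !== 'object') return false;

    const isItem = (item) =>
        item !== null &&
        typeof item === 'object' &&
        typeof item.description === 'string' &&
        typeof item.quantity === 'number' &&
        typeof item.unitPrice === 'number' &&
        typeof item.total === 'number';

    return (
        typeof value.invoiceNumber === 'string' &&
        typeof value.date === 'string' &&
        typeof value.supplier === 'string' &&
        typeof value.total === 'number' &&
        Array.isArray(value.items) &&
        value.items.every(isItem)
    );
}
```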

    🧪 Testing

    import InvoiceExtractor from '@brinda_yawa/file-scanner';
    import fs from 'fs';
    import assert from 'assert';
    
    // Reset instance before each test
    InvoiceExtractor.resetInstance();
    const extractor = InvoiceExtractor;
    
    extractor.init({
        apiKey: process.env.GEMINI_API_KEY
    });
    
    // Test PDF extraction
    const pdfBuffer = fs.readFileSync('test/sample.pdf');
    const pdfData = await extractor.extract(pdfBuffer, '.pdf');
    assert(pdfData !== null, 'PDF extraction failed');
    assert(pdfData.invoiceNumber, 'Invoice number not extracted');
    
    // Test image extraction
    const imgBuffer = fs.readFileSync('test/sample.png');
    const imgData = await extractor.extract(imgBuffer, '.png');
    assert(imgData !== null, 'Image extraction failed');
    
    console.log('✅ All tests passed');
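The tests above call the real API, so they need a key and consume quota. For tests of your own code that merely depends on an extractor, a stand-in with the same `extract(fileBuffer, extension)` shape can be enough. `createFakeExtractor` below is a hypothetical helper and not part of the package.

```javascript
// Hypothetical stand-in with the same extract(fileBuffer, extension) shape as
// the real extractor. It returns canned data keyed by extension and records calls.
function createFakeExtractor(responses) {
    const calls = [];
    return {
        calls,
        async extract(fileBuffer, extension) {
            calls.push({ size: fileBuffer.length, extension });
            // Mirror the documented contract: resolve to null when nothing is extracted.
            return Object.prototype.hasOwnProperty.call(responses, extension)
                ? responses[extension]
                : null;
        },
    };
}
```

Code that receives its extractor as a parameter can then be exercised offline by passing the fake in place of the real one.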

    ⚠️ Common Issues

    Quota Exceeded Error

    Error: You exceeded your current quota

    Solutions:

    1. Wait for quota reset (daily)
    2. Use a model with a higher free-tier quota (gemini-1.5-flash)
    3. Enable billing on your Google Cloud account

    Missing or Invalid API Key

    Error: Gemini API key is required

    Solutions:

    1. Check that your API key is set in .env
    2. Verify the API key is valid in Google AI Studio
    3. Make sure you're using the correct environment variable name

    Extraction Returns Null

    Possible causes:

    1. File is corrupted or unreadable
    2. File format not supported
    3. API quota exceeded
    4. Poor image quality (for OCR)

    Solutions:

    1. Check file integrity
    2. Try a different file
    3. Increase image resolution
    4. Check API quota
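A quick way to check file integrity before spending an API call is to look at the first bytes of the buffer and compare them with the well-known signatures of the supported formats. This is a sketch: `sniffExtension` is a hypothetical helper, and a matching signature only shows that the file starts like the given format, not that the rest of it is intact.

```javascript
// Guesses a file type from its leading bytes ("magic numbers").
// Returns an extension such as '.pdf', or null if nothing matches.
function sniffExtension(buffer) {
    const startsWith = (bytes, offset = 0) =>
        buffer.length >= offset + bytes.length &&
        bytes.every((byte, i) => buffer[offset + i] === byte);
    const ascii = (text) => Array.from(text, (ch) => ch.charCodeAt(0));

    if (startsWith(ascii('%PDF'))) return '.pdf';
    if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return '.png';
    if (startsWith([0xff, 0xd8, 0xff])) return '.jpg';
    if (startsWith(ascii('GIF8'))) return '.gif';
    // WebP is a RIFF container with "WEBP" at byte offset 8.
    if (startsWith(ascii('RIFF')) && startsWith(ascii('WEBP'), 8)) return '.webp';
    if (startsWith(ascii('BM'))) return '.bmp';
    return null;
}
```

If the sniffed type is null, or differs from the extension you were about to pass to `extract`, the file is likely corrupted or mislabeled.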

    Made with ❤️ by Brinda Yawa