JSPM

Found 80 results for document-processing

stopword

A module for node.js and the browser that takes in text and returns text that is stripped of stopwords. Has pre-defined stopword lists for 62 languages and also takes lists with custom stopwords as input.

  • v3.1.5
  • 73.51
  • Published

@mastra/rag

The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.

  • v1.3.0
  • 67.73
  • Published

@bonginkan/maria

🚀 MARIA v4.3.46 - Enterprise AI Development Platform with identity system and character voice implementation. Features 74 production-ready commands with comprehensive fallback implementation, local LLM support, and zero external dependencies. Includes na

  • v4.3.46
  • 50.72
  • Published

@promptbook/documents

Promptbook: Run AI apps in plain human language across multiple models and platforms

  • v0.102.0-18
  • 48.33
  • Published

@paulmeller/docflow

A developer-friendly transformation engine for programmatic document manipulation

  • v0.0.27
  • 48.18
  • Published

sensible-api

JavaScript SDK for Sensible, the developer-first platform for extracting structured data from documents so that you can build document-automation features into your SaaS products

  • v0.0.12
  • 47.55
  • Published

@promptbook/pdf

Promptbook: Run AI apps in plain human language across multiple models and platforms

  • v0.102.0-18
  • 46.94
  • Published

@promptbook/legacy-documents

Promptbook: Run AI apps in plain human language across multiple models and platforms

  • v0.102.0-18
  • 45.50
  • Published

pageindex-mcp

MCP server for PageIndex

  • v1.6.2
  • 40.40
  • Published

docstrange

Official Node.js client for Docstrange API - Extract data from PDFs, images, and documents in multiple formats

  • v1.0.7
  • 39.05
  • Published

n8n-nodes-unicraft

UniCraft N8N custom nodes - Unified AI Model Router with Multi-Modal Support by CloudCraft Labs for OpenAI, Anthropic, Google Gemini, and more

  • v2.1.8
  • 36.13
  • Published

n8n-nodes-solar

Solar LLM and Embeddings nodes for n8n

  • v0.3.23
  • 35.49
  • Published

n8n-nodes-docutray

n8n community nodes for Docutray OCR, document identification, and knowledge base search services

  • v0.5.1
  • 35.16
  • Published

knowledge-mgmt-mcp

Production-ready MCP server for document ingestion and knowledge management with vector search. Supports PDF, DOCX, TXT, MD, CSV, JSON, HTML with ChromaDB and multiple embedding providers.

  • v1.1.0
  • 34.21
  • Published

ignidor-idp-mcp

MCP server for Ignidor IDP B2B API integration - enables Claude to process documents through Ignidor's enterprise document processing pipeline

  • v1.0.3
  • 34.19
  • Published

rag-system-pgvector

A complete Retrieval-Augmented Generation system using pgvector, LangChain, and LangGraph for Node.js applications with buffer, URL processing, and advanced filtering support - fully configurable without environment variables

  • v2.1.1
  • 34.15
  • Published

n8n-nodes-pdf-accessibility

AI-powered PDF accessibility automation for N8N - comprehensive WCAG compliance analysis, intelligent remediation, and professional audit reporting with 5 integrated accessibility tools

  • v3.0.0
  • 33.69
  • Published

@ninjadoc-ai/sdk

TypeScript SDK for document processing with zero-friction framework adapters. Features intelligent coordinate handling, semantic regions, React UI overlays, and automatic route detection for Remix and Next.js. Transform raw bounding boxes into interactive

  • v1.0.9
  • 32.25
  • Published

n8n-nodes-mistral-ocr

n8n node for Mistral OCR API integration with structured annotations

  • v1.0.0
  • 31.57
  • Published

@aidalinfo/pdf-processor

Powerful PDF data extraction library powered by AI vision models. Transform PDFs into structured, validated data using TypeScript, Zod, and AI providers like Scaleway and Ollama.

  • v1.0.18
  • 31.46
  • Published

doc-ops-mcp

MCP Document Converter Server - A Model Context Protocol server for seamless document format conversion and processing

  • v0.3.8
  • 31.27
  • Published

intelligent-text-chunking

An intelligent text chunking library that respects document structure and semantic boundaries

  • v1.0.3
  • 30.80
  • Published

@jmndao/mongoose-ai

AI-powered Mongoose plugin for intelligent document processing with auto-summarization, semantic search, MongoDB Vector Search, and function calling

  • v1.4.0
  • 30.69
  • Published

n8n-nodes-unstract

n8n nodes for Unstract services including LLMWhisperer and Unstract API

  • v0.4.2
  • 28.35
  • Published

@ozritesh/queue-agnostic

Universal queue abstraction library supporting RabbitMQ, AWS SQS, Azure Service Bus, and GCP Pub/Sub with a single unified interface

  • v1.0.2
  • 27.96
  • Published

autollama

Modern JavaScript-first RAG framework with contextual embeddings, professional CLI, and one-command deployment

  • v3.0.10
  • 27.71
  • Published

ppu-paddle-ocr

Blazing-fast and lightweight PaddleOCR library for Node.js and Bun. Perform accurate text detection, recognition, and image deskew with a simple, modern, and type-safe API. Ideal for document processing, data extraction, and computer vision tasks.

  • v3.1.1
  • 27.22
  • Published

rag-lite-ts

Local-first TypeScript retrieval engine for semantic search over static documents

  • v1.0.2
  • 25.75
  • Published

@tfw.in/structura-sdk

TypeScript SDK for Saral Structura, providing Zod schemas and validation for document processing outputs.

  • v0.1.0
  • 25.14
  • Published

context1000

context1000 is a documentation format for software systems, designed for integration with artificial intelligence tools. The key artifacts are ADRs and RFCs, enriched with formalized links between documents.

  • v0.1.8
  • 25.08
  • Published

@aidalinfo/office-to-markdown

Modern TypeScript library for converting Office documents (DOCX) to Markdown format, optimized for Bun runtime with enhanced table support and math equation conversion.

  • v1.0.2
  • 24.99
  • Published

@instafill.ai/instafill

Instafill AI Node.js library for automating PDF form filling using AI-powered technology.

  • v0.3.2
  • 24.75
  • Published

passport-ocr-api

Passport OCR API client for extracting passport data from images and PDF files using OCR technology.

  • v1.1.5
  • 23.53
  • Published

yq-pdf

High-performance PDF manipulation library with native processing capabilities. Supports encryption, decryption, merging, splitting, watermarking, optimization, and comprehensive PDF operations with both file and buffer support.

  • v0.0.2
  • 23.32
  • Published

@project-lakechain/sdk

An SDK providing helpers to create Lakechain middlewares in TypeScript.

  • v0.10.0
  • 22.43
  • Published

@docrouter/mcp

TypeScript MCP server for DocRouter API

  • v0.1.2
  • 22.25
  • Published

n8n-nodes-extract-pdf

n8n node to extract text, images and tables from PDF with multilingual support, language detection and comprehensive test suite

  • v1.0.26
  • 21.94
  • Published

@buildel/ocr

Document processing application with CLI and API interfaces

  • v0.1.1
  • 21.94
  • Published

mcp-upstage-server

MCP server for Upstage AI document processing - Node.js implementation

  • v0.5.0
  • 21.79
  • Published

n8n-nodes-puter-ai

Advanced n8n node for Puter.js AI with RAG agentic capabilities, document processing, audio transcription, Supabase integration, and cost-optimized model priorities

  • v2.0.4
  • 21.76
  • Published

peslac

A Node.js package to interact with the Peslac API for document processing.

  • v1.1.3
  • 21.37
  • Published

doc-to-readable

Universal document-to-markdown and section splitter for HTML, URLs, and PDFs.

  • v1.5.3
  • 19.63
  • Published

odg-processor

A Node.js package for processing ODG (OpenDocument Graphics) files using LibreOffice API

  • v1.0.16
  • 19.49
  • Published

chatbot-test-william

A flexible and customizable React chat component that supports context-aware conversations and document processing

  • v1.0.16
  • 18.43
  • Published

nanonets

Node.js SDK for the Nanonets API: OCR, document extraction, and workflow automation.

  • v2.0.1
  • 18.20
  • Published

mastra-browser-rag

The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.

  • v0.0.9
  • 17.48
  • Published

docuglean-ocr

An SDK for intelligent document processing using state-of-the-art AI models.

  • v1.0.0
  • 17.21
  • Published

llm-gen

A CLI tool to extract text from a static Next.js export and generate llm.txt for LLM ingestion.

  • v1.0.3
  • 16.99
  • Published

@nyazkhan/react-pdf-viewer

A comprehensive React TypeScript component library for viewing and interacting with PDF files using Mozilla PDF.js. Features include text selection, highlighting, search, sidebar, multiple view modes, and complete PDF.js web viewer functionality.

  • v1.1.1
  • 16.20
  • Published

universal-documents-converter

Universal MCP Server for Multi-Rendering PDF Quality Assurance System with AI-powered optimization

  • v1.0.1
  • 15.95
  • Published

uns-mcp-server

Pure JavaScript MCP server for Unstructured.io - No Python required!

  • v2.0.2
  • 15.75
  • Published

markdoc-traverse

A simple and tiny traversal library for MarkDoc AST

  • v1.1.1
  • 15.54
  • Published

n8n-nodes-docx-converter-enhanced

Enhanced n8n community node for DOCX to text conversion with RAG capabilities, page-aware chunking, and metadata extraction. Fork of n8n-nodes-docx-converter with advanced features for AI/ML workflows.

  • v1.0.0
  • 15.06
  • Published

treechunk

Hierarchical markdown chunking for RAG systems with AI-powered context summarization

  • v1.1.0
  • 14.80
  • Published

koncile-js

JavaScript SDK for the Koncile Intelligent Document Processing API

  • v0.1.4
  • 14.54
  • Published

n8n-nodes-docx-genie

n8n node package for DOCX document manipulation and processing

  • v0.1.0
  • 14.54
  • Published

pdf-tax-reader-cl

PDF scraping library for Chilean tax documents. Extract emitter name, economic activities, and address from structured PDF documents like 'CARPETA TRIBUTARIA ELECTRÓNICA PARA SOLICITAR CRÉDITOS'

  • v1.0.0
  • 13.75
  • Published

@jojihatzz/lemmedoc

A comprehensive Model Context Protocol (MCP) server for document processing, PDF manipulation, format conversion, and text extraction with robust error handling. Now includes advanced features like document conversion, image processing, PDF comparison, se

  • v2.0.0
  • 11.86
  • Published

n8n-nodes-agentic-rag-supabase

Advanced n8n node for Agentic RAG with Supabase pgvector - handles structured/unstructured documents with AI-powered query refinement

  • v1.0.0
  • 11.82
  • Published

pdftotext-mcp

A reliable Model Context Protocol server for PDF text extraction using pdftotext from poppler-utils

  • v1.0.0
  • 11.02
  • Published

@caleblawson/rag

The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.

  • v1.0.0
  • 10.55
  • Published

lcp-nodes

Node-RED custom nodes for LCP

  • v1.0.43
  • 10.27
  • Published

hashub-docapp-js

JavaScript/TypeScript SDK for Hashub Document Processing API

  • v1.0.0
  • 5.79
  • Published

langchain-chatbot-react-app

A flexible and customizable React chat component that supports context-aware conversations and document processing

  • v1.0.0
  • 2.59
  • Published

invoicify-json-craft

AI-powered invoice to JSON converter using Mistral AI with dynamic field detection and master schema management

  • v1.0.0
  • 2.56
  • Published

@abhi-arya1/mastra-minirag

Minimal recursive text chunking functionality extracted from @mastra/rag for edge deployments

  • v1.0.1
  • 0.00
  • Published

@entro314labs/mdd

Semantic document layer for AI-to-Office pipeline. Transform markdown into professional PDF/DOCX with preserved document structure.

  • v0.0.7
  • 0.00
  • Published

@docrouter/sdk

TypeScript SDK for DocRouter API

  • v0.1.0
  • 0.00
  • Published

n8n-nodes-doctr

Extract text from images using docTR OCR in n8n workflows

  • v0.1.4
  • 0.00
  • Published