Found 80 results for document-processing

stopword

A module for Node.js and the browser that takes in text and returns it with stopwords removed. It has predefined stopword lists for 62 languages and also accepts lists of custom stopwords as input.

@mastra/rag

The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.

@bonginkan/maria

🚀 MARIA v4.3.46 - Enterprise AI Development Platform with identity system and character voice implementation. Features 74 production-ready commands with comprehensive fallback implementation, local LLM support, and zero external dependencies. Includes na

@promptbook/documents

Promptbook: Run AI apps in plain human language across multiple models and platforms

@paulmeller/docflow

A developer-friendly transformation engine for programmatic document manipulation

sensible-api

JavaScript SDK for Sensible, the developer-first platform for extracting structured data from documents so that you can build document-automation features into your SaaS products

@promptbook/pdf

Promptbook: Run AI apps in plain human language across multiple models and platforms

@promptbook/legacy-documents

Promptbook: Run AI apps in plain human language across multiple models and platforms

pageindex-mcp

MCP server for PageIndex

docstrange

Official Node.js client for Docstrange API - Extract data from PDFs, images, and documents in multiple formats

n8n-nodes-pdf-excel

N8N nodes for processing PDF and Excel files

n8n-nodes-unicraft

UniCraft N8N custom nodes - Unified AI Model Router with Multi-Modal Support by CloudCraft Labs for OpenAI, Anthropic, Google Gemini, and more

n8n-nodes-solar

Solar LLM and Embeddings nodes for n8n

n8n-nodes-docutray

n8n community nodes for Docutray OCR, document identification, and knowledge base search services

knowledge-mgmt-mcp

Production-ready MCP server for document ingestion and knowledge management with vector search. Supports PDF, DOCX, TXT, MD, CSV, JSON, HTML with ChromaDB and multiple embedding providers.

ignidor-idp-mcp

MCP server for Ignidor IDP B2B API integration - enables Claude to process documents through Ignidor's enterprise document processing pipeline

rag-system-pgvector

A complete Retrieval-Augmented Generation system using pgvector, LangChain, and LangGraph for Node.js applications with buffer, URL processing, and advanced filtering support - fully configurable without environment variables

n8n-nodes-pdf-accessibility

AI-powered PDF accessibility automation for N8N - comprehensive WCAG compliance analysis, intelligent remediation, and professional audit reporting with 5 integrated accessibility tools

@trsdn/mistraldocai-mcp-server

MCP server for document-to-Markdown conversion using Mistral AI OCR

pdf-image-extractor

pdf

@ninjadoc-ai/sdk

TypeScript SDK for document processing with zero-friction framework adapters. Features intelligent coordinate handling, semantic regions, React UI overlays, and automatic route detection for Remix and Next.js. Transform raw bounding boxes into interactive

n8n-nodes-mistral-ocr

n8n node for Mistral OCR API integration with structured annotations

@aidalinfo/pdf-processor

Powerful PDF data extraction library powered by AI vision models. Transform PDFs into structured, validated data using TypeScript, Zod, and AI providers like Scaleway and Ollama.

@timangames/vector-grounding-service

A REST wrapper for SAP AI Core Vector API with document grounding capabilities

doc-ops-mcp

MCP Document Converter Server — A Model Context Protocol server for seamless document format conversion and processing

intelligent-text-chunking

An intelligent text chunking library that respects document structure and semantic boundaries

@jmndao/mongoose-ai

AI-powered Mongoose plugin for intelligent document processing with auto-summarization, semantic search, MongoDB Vector Search, and function calling

n8n-nodes-inner-batched-chain-summarization

n8n community node with intelligent batched chain summarization for processing large documents efficiently

n8n-nodes-unstract

n8n nodes for Unstract services including LLMWhisperer and Unstract API

@lumina-ai-inc/chunkr-ai

Node.js client for Chunkr API

@ozritesh/queue-agnostic

Universal queue abstraction library supporting RabbitMQ, AWS SQS, Azure Service Bus, and GCP Pub/Sub with a single unified interface

autollama

Modern JavaScript-first RAG framework with contextual embeddings, professional CLI, and one-command deployment

ppu-paddle-ocr

Blazing-fast and lightweight PaddleOCR library for Node.js and Bun. Perform accurate text detection, recognition, and image deskew with a simple, modern, and type-safe API. Ideal for document processing, data extraction, and computer vision tasks.

rag-lite-ts

Local-first TypeScript retrieval engine for semantic search over static documents

@tfw.in/structura-sdk

TypeScript SDK for Saral Structura, providing Zod schemas and validation for document processing outputs.

context1000

**context1000** is a documentation format for software systems, designed for integration with artificial intelligence tools. The key artifacts are ADRs and RFCs, enriched with formalized links between documents.

@aidalinfo/office-to-markdown

Modern TypeScript library for converting Office documents (DOCX) to Markdown format, optimized for Bun runtime with enhanced table support and math equation conversion.

@instafill.ai/instafill

Instafill AI Node.js library for automating PDF form filling using AI-powered technology.

passport-ocr-api

Passport OCR API client for extracting passport data from images and PDF files using OCR technology.

yq-pdf

High-performance PDF manipulation library with native processing capabilities. Supports encryption, decryption, merging, splitting, watermarking, optimization, and comprehensive PDF operations with both file and buffer support.

@majkapp/majk-chat-document-tools

Document processing tools for majk chat - PDF, Excel, Word, PowerPoint parsing and analysis

@project-lakechain/sdk

An SDK providing helpers to create Lakechain middlewares in TypeScript.

@docrouter/mcp

TypeScript MCP server for DocRouter API

n8n-nodes-extract-pdf

n8n node to extract text, images and tables from PDF with multilingual support, language detection and comprehensive test suite

@buildel/ocr

Document processing application with CLI and API interfaces

mcp-upstage-server

MCP server for Upstage AI document processing - Node.js implementation

n8n-nodes-puter-ai

Advanced n8n node for Puter.js AI with RAG agentic capabilities, document processing, audio transcription, Supabase integration, and cost-optimized model priorities

peslac

A Node.js package to interact with the Peslac API for document processing.

doc-to-readable

Universal document-to-markdown and section splitter for HTML, URLs, and PDFs.

odg-processor

A Node.js package for processing ODG (OpenDocument Graphics) files using LibreOffice API

n8n-nodes-docx-genie-pro

n8n node package for DOCX document manipulation and processing

chatbot-test-william

A flexible and customizable React chat component that supports context-aware conversations and document processing

nanonets

Node.js SDK for the Nanonets API: OCR, document extraction, and workflow automation.

mastra-browser-rag

The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilities.

docuglean-ocr

An SDK for intelligent document processing using state-of-the-art AI models.

llm-gen

A CLI tool to extract text from a static Next.js export and generate llm.txt for LLM ingestion.

@nyazkhan/react-pdf-viewer

A comprehensive React TypeScript component library for viewing and interacting with PDF files using Mozilla PDF.js. Features include text selection, highlighting, search, sidebar, multiple view modes, and complete PDF.js web viewer functionality.

universal-documents-converter

Universal MCP Server for Multi-Rendering PDF Quality Assurance System with AI-powered optimization

uns-mcp-server

Pure JavaScript MCP server for Unstructured.io - No Python required!

markdoc-traverse

A simple and tiny traversal library for the Markdoc AST

n8n-nodes-docx-converter-enhanced

Enhanced n8n community node for DOCX to text conversion with RAG capabilities, page-aware chunking, and metadata extraction. Fork of n8n-nodes-docx-converter with advanced features for AI/ML workflows.

treechunk

Hierarchical markdown chunking for RAG systems with AI-powered context summarization

koncile-js

JavaScript SDK for the Koncile Intelligent Document Processing API

n8n-nodes-docx-genie

n8n node package for DOCX document manipulation and processing

pdf-tax-reader-cl

PDF scraping library for Chilean tax documents. Extract emitter name, economic activities, and address from structured PDF documents like 'CARPETA TRIBUTARIA ELECTRÓNICA PARA SOLICITAR CRÉDITOS'

@jojihatzz/lemmedoc

A comprehensive Model Context Protocol (MCP) server for document processing, PDF manipulation, format conversion, and text extraction with robust error handling. Now includes advanced features like document conversion, image processing, PDF comparison, se