JSPM

Found 232 results for multimodal

@promptbook/utils

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 86.53
  • Published

@mux/ai

AI library for Mux

  • v0.21.1
  • 58.33
  • Published

glin-profanity

Glin-Profanity is a lightweight and efficient npm package designed to detect and filter profane language in text inputs across multiple languages. Whether you’re building a chat application, a comment section, or any platform where user-generated content

  • v3.3.0
  • 57.96
  • Published

@promptbook/anthropic-claude

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 55.21
  • Published

thebird

Anthropic SDK to Gemini streaming bridge — drop-in proxy that translates Anthropic message format and tool calls to Google Gemini

  • v1.2.114
  • 51.93
  • Published

modelfusion

The TypeScript library for building AI applications.

  • v0.137.0
  • 51.92
  • Published

llama-cpp-capacitor

A native Capacitor plugin that embeds llama.cpp directly into mobile apps, enabling offline AI inference with chat-first API design. Complete iOS and Android support: text generation, chat, multimodal, TTS, LoRA, embeddings, and more.

  • v0.1.5
  • 51.89
  • Published

@promptbook/core

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 50.78
  • Published

@promptbook/remote-client

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 49.15
  • Published

@promptbook/remote-server

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 49.00
  • Published

agentkits

Agent Model Layer — One-line LLM access with built-in memory. 29 providers, 18 embeddings, zero lock-in.

  • v2.0.3
  • 48.56
  • Published

@promptbook/types

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 48.10
  • Published

@promptbook/openai

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 47.79
  • Published

@promptbook/pdf

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 47.57
  • Published

@promptbook/markdown-utils

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 47.06
  • Published

@promptbook/legacy-documents

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 46.79
  • Published

@promptbook/fake-llm

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 46.75
  • Published

@promptbook/templates

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 46.62
  • Published

promptbook

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 46.38
  • Published

@promptbook/google

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 46.18
  • Published

@promptbook/vercel

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 45.85
  • Published

@promptbook/node

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 45.81
  • Published

@promptbook/editable

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 45.70
  • Published

@promptbook/documents

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 45.51
  • Published

@promptbook/cli

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 45.48
  • Published

@promptbook/azure-openai

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 45.37
  • Published

@promptbook/ollama

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 45.10
  • Published

@promptbook/color

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 44.77
  • Published

@promptbook/javascript

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 44.77
  • Published

@promptbook/browser

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 44.50
  • Published

@promptbook/markitdown

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 44.45
  • Published

@promptbook/deepseek

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 44.36
  • Published

@promptbook/wizard

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 44.26
  • Published

@promptbook/website-crawler

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 44.19
  • Published

@promptbook/components

Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

  • v0.112.0-62
  • 44.11
  • Published

@space3-npm/cybersoul-client

Cyber-Soul multimodal character interaction SDK by Space3 Digital Media Tech Studio

    • v1.2.9
    • 44.08
    • Published

    ptbk

    Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action

    • v0.112.0-62
    • 44.02
    • Published

    acptoapi

    Anthropic SDK to multi-provider streaming bridge — converts Anthropic message format and tool calls to Gemini, OpenAI-compatible APIs

    • v1.0.39
    • 43.48
    • Published

    @fre4x/gemini

    A Gemini MCP server providing multimodal analysis and image/video generation.

      • v1.0.65
      • 42.16
      • Published

      clawplay

      Unified multimodal CLI for ClawPlay — image generation, vision analysis, and LLM via Ark + Gemini relay

      • v0.2.4
      • 41.88
      • Published

      claude-video-vision

      MCP server that gives Claude Code the ability to watch and understand videos — extracts frames via ffmpeg and processes audio via multiple backends

      • v1.2.1
      • 40.80
      • Published

      @sogni-ai/sogni-client

      Sogni SDK - AI image, video & audio generation plus LLM chat with vision via the Sogni Supernet (Stable Diffusion, Flux, WAN 2.2, LTX-2, Qwen VLM)

      • v4.1.1
      • 40.04
      • Published

      @shift-preflight/runtime

      Multimodal preflight for any AI agent — AI SDK middleware for transparent image optimization

      • v0.9.3
      • 39.58
      • Published

      @lab94/frenchie

      Frenchie — your agent's best friend. MCP-first multimodal Kit, Method workflow toolkit, and stdio MCP server for agents: OCR, transcription, file extraction, image generation, and product development process.

      • v0.5.0
      • 39.40
      • Published

      @mcptoolshop/style-dataset-lab

      Canon-aligned dataset production and generation workbench — define visual rules, build versioned training data, compile production briefs, run local workflows, batch-produce, select winners, and re-ingest into your corpus

      • v3.1.0
      • 37.00
      • Published

      @promptbook/wizzard

      Promptbook: Run AI apps in plain human language across multiple models and platforms

      • v0.94.0
      • 35.57
      • Published

      @dooor-ai/cortexdb

      Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document processing

      • v0.9.12
      • 35.11
      • Published

      @hzttt/multimodal-rag

      OpenClaw plugin for multimodal RAG - semantic indexing and time-aware search for images and audio using local AI models

      • v0.5.3
      • 34.87
      • Published

      @yyc3/core

      YYC³ AI Family core package: unified authentication, MCP protocol, skill system, eight AI family-member agents, multimodal processing

      • v1.4.0
      • 34.60
      • Published

      @harrylabs/llm-knowledge-bases

      Representation-first multimodal Markdown wiki runtime for Obsidian vaults, with standalone CLI, MCP server, and OpenClaw compatibility.

        • v0.4.3
        • 34.48
        • Published

        @kessler/gemma

        Node.js package for Gemma 4 — local multimodal inference and agent system

          • v4.0.1
          • 34.43
          • Published

          @captain-sdk/openclaw-captain

          Captain multimodal search plugin for OpenClaw — search text, images, video, and audio with natural language

          • v2.0.0
          • 34.21
          • Published

          babysea

          Open source TypeScript SDK for the BabySea execution control plane for generative media.

          • v1.4.3
          • 33.96
          • Published

          llm-frames

          Extract video frames for LLM context injection

          • v0.2.2
          • 33.70
          • Published

          n8n-nodes-universal-llm-vision

          n8n nodes for Universal LLM Vision - Includes standalone node and langchain-compatible Vision Chain

          • v0.5.5
          • 33.58
          • Published

          @villutur/gemini-ai-lib

          Reusable TypeScript wrappers and helpers for Google's Gemini API.

          • v0.6.3
          • 33.22
          • Published

          lucid-mcp-server

          Model Context Protocol (MCP) server for Lucid App integration with multimodal AI analysis

          • v0.1.5
          • 33.01
          • Published

          vidclaude

          Multimodal video understanding for Claude Code — extract frames, transcribe audio, build timelines from any video

            • v0.2.4
            • 32.26
            • Published

            @captain-sdk/captain-mcp

            MCP server for Captain — multimodal RAG search and live project search. Usable from Claude Code, Cursor, and any MCP-aware client.

            • v0.1.4
            • 32.03
            • Published

            @goonnguyen/human-mcp

            Human MCP: Bringing Human Capabilities to Coding Agents

              • v2.15.1
              • 31.96
              • Published

              bordair

              Official JavaScript/TypeScript SDK + CLI for the Bordair AI security stack - detect prompt injection programmatically or test any LLM against the 503K-sample multimodal dataset

              • v0.5.0
              • 31.93
              • Published

              @echosaw/mcp-server

              Echosaw MCP Server - Media intelligence for AI assistants. Connect your LLM to Echosaw and analyze media directly within your workflow.

                • v1.4.0
                • 31.36
                • Published

                n8n-nodes-gemini-ai

                n8n community node for Google Gemini AI integration with text generation, file upload & analysis, and TTS (Text-to-Speech) support

                • v0.6.8
                • 31.23
                • Published

                @yummysource/yummycli

                AI-friendly CLI for multimodal model providers (Gemini, and more)

                • v0.1.3
                • 31.05
                • Published

                n8n-nodes-vertex-ai-full

                n8n community node for Google Vertex AI with advanced multimodal capabilities

                • v0.0.3
                • 30.75
                • Published

                @vespermcp/mcp-server

                AI-powered dataset discovery, quality analysis, and preparation MCP server with multimodal support (text, image, audio, video)

                • v1.5.2
                • 30.73
                • Published

                pdfvision

                Extract text, metadata, and page images from PDF files. Designed for AI agents.

                • v0.3.0
                • 30.65
                • Published

                apiz-mcp

                MCP (Model Context Protocol) server for apiz.ai — exposes generate / get_result / search_models / guide / account / speak / parse_video / transfer_url tools

                • v0.2.0
                • 30.48
                • Published

                @marsulta/mailman

                A structured interagent messaging protocol — send validated, typed packets between roles, tools, and services in multi-agent systems.

                • v0.1.2
                • 30.43
                • Published

                @vowel.to/client

                Framework-agnostic voice agent library powered by Google Gemini Live API

                • v0.4.1-beta
                • 30.42
                • Published

                noosphere

                Unified AI creation engine — text, image, video, audio across all providers

                • v0.9.3
                • 29.50
                • Published

                apiz-sdk

                Official TypeScript SDK for the apiz.ai AI generation platform

                • v0.2.0
                • 28.78
                • Published

                overture-mcp

                MCP Server for Claude Code, Cursor, Cline, Copilot, Github Copilot, Windsurf - Visual AI Agent Plan Execution, Approval Workflow, Plan Visualization, Agent Orchestration. See what your AI is thinking before it writes code. Works with Claude, GPT, Gemini,

                • v0.1.8
                • 28.13
                • Published

                node-red-contrib-gemini

                Node-RED nodes for Google Gemini AI integration including text generation, chat, vision, image generation, speech generation, and audio understanding capabilities

                • v1.0.3
                • 28.11
                • Published

                @sami7786/opencode-image-proxy

                OpenCode plugin that proxies images through a vision-capable model, enabling image-incapable models to "see"

                  • v1.0.6
                  • 26.91
                  • Published

                  n8n-nodes-siliconflow

                  n8n community node for SiliconFlow AI models - chat completions, vision language models, embeddings, and reranking

                  • v1.4.2
                  • 26.71
                  • Published

                  @orka-js/multimodal

                  Multimodal utilities and agents for OrkaJS - Vision, Audio, Cross-modal workflows

                    • v3.0.1
                    • 26.53
                    • Published

                    @harrylabs/llm-wiki-karpathy

                    Representation-first multimodal Markdown wiki runtime for Obsidian vaults, with standalone CLI, MCP server, and OpenClaw compatibility.

                      • v0.4.4
                      • 26.11
                      • Published

                      qwen-vision-mcp

                      MCP server for image analysis using Qwen 3.5 via Ollama. Adds vision capabilities to models without native image support.

                      • v0.2.0
                      • 26.01
                      • Published

                      clawdrive

                      Google Drive for AI agents. Multimodal semantic search and 3D file visualization.

                      • v0.1.9
                      • 25.82
                      • Published

                      weiclaw

                      One command to turn WeChat into the entry point for any AI agent

                      • v1.0.13
                      • 25.05
                      • Published

                      2dai-cloud-sdk

                      Official 2DAI SDK - Multimodal AI for autonomous image/video generation, vision analysis, agentic LLM workflows, STT speech-to-text, and TTS text-to-speech

                      • v1.12.0
                      • 25.04
                      • Published

                      n8n-nodes-vertex-advanced

                      n8n community node for Google Vertex AI with advanced multimodal capabilities

                      • v1.0.0
                      • 24.83
                      • Published

                      @intentsolutionsio/003-jeremy-vertex-ai-media-master

                      Comprehensive Google Vertex AI multimodal mastery for Jeremy - video processing (6+ hours), audio generation, image creation with Gemini 2.0/2.5 and Imagen 4. Marketing campaign automation, content ge

                      • v2.0.0
                      • 24.79
                      • Published

                      av-blocks

                      Copy-paste UI blocks for audio-visual model inspection. shadcn-style: no install, copy the component.

                      • v0.1.0
                      • 24.79
                      • Published

                      @intentsolutionsio/zai-cli

                      Z.AI vision, search, reader, and GitHub exploration via CLI and MCP. Analyze images, search the web, read pages as markdown, explore repos.

                      • v1.0.0
                      • 24.78
                      • Published

                      ai-powered

                      Unified AI client and CLI: multi-modal, multi-provider, browser-safe, fully mock-able

                      • v0.3.2
                      • 24.71
                      • Published

                      n8n-nodes-kimi

                      n8n community plugin for Kimi (Moonshot) Chat Models

                      • v0.2.1
                      • 24.63
                      • Published

                      mlx-vlm

                      Native Vision-Language Model support for Node.js. Process images and video with multimodal LLMs using MLX and Apple Silicon.

                        • v0.0.0
                        • 24.61
                        • Published

                        @yyc3/emotion

                        YYC³ AI Family emotion engine: multimodal emotion fusion + mood-music bridging + event bus

                        • v1.0.0
                        • 24.56
                        • Published

                        @mrgoonie/multix

                        AI multimodal CLI — Gemini (incl. Flash TTS), MiniMax, OpenRouter, Leonardo, BytePlus, ElevenLabs, ffmpeg, ImageMagick, doc-to-md

                        • v0.0.7
                        • 24.52
                        • Published

                        jimeng-ai-mcp

                        MCP tool for the Volcano Engine Jimeng AI multimodal generation service

                        • v1.0.14
                        • 24.29
                        • Published

                        vision-mcp

                        MCP server providing multimodal vision capabilities via OpenAI-compatible APIs

                        • v0.1.0
                        • 24.09
                        • Published

                        sight-mcp

                        MCP server for AI-powered image and video analysis - supports OpenAI, Claude, and multimodal vision APIs with local file and URL processing

                          • v1.0.4
                          • 23.53
                          • Published

                          mmir-lib

                          MMIR (Mobile Multimodal Interaction and Relay) library

                          • v7.1.0
                          • 21.96
                          • Published

                          bard-api-node

                          A Node.js library harnessing the power of Bard's Large Language Model (LLM) for seamless chat experiences and streamlined accessibility to Google's Gemini. Empower your applications with advanced conversational AI, leveraging Bard's LLM to answer question

                          • v2.1.0
                          • 21.59
                          • Published

                          @pshkv/os-core

                          SINT OS Core — unified entrypoint for the SINT Operating System. Combines OpenClaw runtime, SINT Protocol governance, avatar face, and multimodal bridge.

                          • v0.1.0
                          • 21.33
                          • Published

                          pplx-zero

                          Fast, minimal Perplexity AI CLI with local RAG. Stream answers, search your notes, analyze files. Zero bloat.

                          • v2.4.2
                          • 21.17
                          • Published

                          claude-gateway

                          OpenAI-compatible gateway for Claude Code subscription - Use your Claude subscription with any OpenAI-compatible tool

                          • v2.2.0
                          • 20.80
                          • Published

                          jimeng4.0-mcp-steve

                          MCP tool for the Volcano Engine Jimeng AI multimodal generation service - full support for all the latest 4.0/3.1/3.0 models

                            • v1.0.9
                            • 20.34
                            • Published

                            @sint-ai/os-core

                            SINT OS Core — unified entrypoint for the SINT Operating System. Combines OpenClaw runtime, SINT Protocol governance, avatar face, and multimodal bridge.

                            • v0.1.0
                            • 20.31
                            • Published

                            omni-llm

                            A unified, zero-dependency interface for multiple LLM providers (OpenAI, Anthropic, Google, Groq, Ollama, xAI, DeepSeek) with streaming, tool calling, vision, and thinking mode support

                            • v1.1.0
                            • 20.22
                            • Published

                            ai-pp3

                            CLI tool combining multimodal AI analysis with RawTherapee's engine to generate optimized PP3 profiles for RAW photography. Features automatic histogram analysis for enhanced AI processing.

                            • v2.1.2
                            • 20.14
                            • Published

                            tekimax-omat

                            Open Multimodal Assessment Toolkit — a type-safe AI SDK for building human-centered, multimodal AI applications.

                            • v0.4.1
                            • 20.02
                            • Published

                            @msbayindir/context-rag

                            A powerful, multimodal RAG engine with contextual retrieval, auto-prompt discovery, and PostgreSQL-native vector search

                            • v1.0.0-beta.10
                            • 19.80
                            • Published

                            knowfun-skills

                            KnowFun AI multimodal content generator - Create AI PPT/presentations, posters, interactive games, and educational films. Lightning-fast REST API for educators, content creators, and learners. Integrate with Claude Code, Cursor, Cline, OpenClaw AI assista

                            • v1.0.15
                            • 19.80
                            • Published

                            @mixpeek/n8n-nodes-mixpeek

                            n8n community node for Mixpeek - multimodal data processing and semantic search API

                            • v1.0.11
                            • 19.74
                            • Published

                            n8n-nodes-rahyana

                            Production-ready n8n community node for Rahyana.ir (chat-first, multimodal, streaming-ready)

                            • v0.6.42
                            • 19.68
                            • Published

                            @perceptron-ai/mcp-server

                            Perceptron MCP server for high-accuracy visual perception powered by fast, efficient vision-language models

                            • v0.2.0
                            • 19.61
                            • Published

                            claude-gemini-multimodal-bridge

                            Enterprise-grade AI integration bridge connecting Claude Code, Gemini CLI, and Google AI Studio with intelligent routing and advanced multimodal processing capabilities

                            • v1.1.0
                            • 19.58
                            • Published

                            geminisst

                            Revolutionary high-accuracy Audio-to-Text library powered by Gemini 2.5 Flash Lite with 1M+ context window.

                            • v2.0.3
                            • 19.52
                            • Published

                            n8n-nodes-openroutercached

                            n8n node for OpenRouter API integration with advanced features: prompt caching, reasoning tokens, vision, streaming, and more

                            • v0.4.3
                            • 19.18
                            • Published

                            @unwarkz/n8n-nodes-qdrant

                            N8N community node for Qdrant vector store — semantic search, embedding storage, and full collection management for AI Agent workflows

                            • v1.0.6
                            • 19.02
                            • Published

                            iflow-mcp-sixhq-overture

                            MCP Server for Claude Code, Cursor, Cline, Copilot, Github Copilot, Windsurf - Visual AI Agent Plan Execution, Approval Workflow, Plan Visualization, Agent Orchestration. See what your AI is thinking before it writes code. Works with Claude, GPT, Gemini,

                            • v0.1.8
                            • 18.99
                            • Published

                            @mixpeek/prebid

                            Mixpeek RTD (Real-Time Data) Adapter for Prebid.js - Privacy-first contextual targeting with sub-100ms performance, ad adjacency awareness, and cookie-free bid enrichment

                            • v1.0.3
                            • 18.88
                            • Published

                            rag-lite-ts

                            Local-first TypeScript retrieval engine with Chameleon Multimodal Architecture for semantic search over text and image content

                            • v2.3.1
                            • 18.88
                            • Published

                            @mintmcqueen/gemini-mcp

                            Enhanced MCP server for Google Gemini 3 with Image Generation, Batch API integration (50% cost, async processing), advanced file handling, and conversation management. Features Gemini 3 Pro (default) and Gemini 3 Pro Image models with state-of-the-art rea

                            • v0.4.0
                            • 18.75
                            • Published

                            noodle-perplexity-mcp

                            A Perplexity API Model Context Protocol (MCP) server that unlocks Perplexity's search-augmented AI capabilities for LLM agents. Features robust error handling, secure input validation, transparent reasoning, and multimodal support with file attachments (P

                            • v1.7.0
                            • 18.56
                            • Published

                            pagebraid-mcp

                            A local PDF MCP server with budget-first multimodal PDF reading.

                            • v0.4.0
                            • 18.09
                            • Published

                            @philtzjp/omni-subagents

                            Claude Code subagents for multimodal capabilities via OpenCode + OpenRouter

                            • v0.1.2
                            • 17.97
                            • Published

                            mixpeek

                            Official Mixpeek TypeScript/JavaScript SDK for multimodal data processing and retrieval

                            • v0.81.1
                            • 17.87
                            • Published

                            @callmedayz/ai-prompt-toolkit

                            Professional AI prompt engineering toolkit with advanced template features, real-time dashboards, conditional logic, template inheritance, live monitoring, OpenRouter integration, and 310+ model support

                            • v2.6.2
                            • 17.74
                            • Published

                            @morphik/mcp

                            MCP server for Morphik multimodal database

                            • v1.0.13
                            • 17.62
                            • Published

                            @mixpeek/openai

                            OpenAI integration for Mixpeek — embedding bridge, function calling adapter, and assistant tools

                            • v1.0.0
                            • 17.11
                            • Published

                            @orga-ai/core

                            Framework-agnostic core for Orga AI SDK - real-time multimodal AI

                              • v1.1.1
                              • 16.86
                              • Published

                              @spark-agents/engram

                              Multimodal memory for OpenClaw agents — powered by Gemini Embedding-2. Hybrid search (BM25 + vectors), cross-encoder reranking, image/audio/PDF memory.

                              • v0.1.2
                              • 16.82
                              • Published

                              aio-llm

                              All-In-One LLM Framework - Multi-provider LLM integration with auto-fallback, priority management, multimodal support, and XML-based tool calling

                              • v1.0.6
                              • 16.77
                              • Published

                              sylix

                              The official Sylix AI SDK for TypeScript and JavaScript. Build intelligent applications with Helix models (helix-1.0, helix-code, helix-r1) featuring tool calling, reasoning, streaming, and OpenAI-compatible APIs. Enterprise-grade AI infrastructure for de

                              • v5.1.0
                              • 16.76
                              • Published

                              @mixpeek/langchain

                              LangChain integration for Mixpeek — retriever, tool, and document loader for LLM-powered applications

                              • v1.0.0
                              • 16.67
                              • Published

                              mcp-mia-narrative

                              MCP server for narrative audio generation - enabling LLMs to create immersive audio experiences

                                • v1.1.2
                                • 16.33
                                • Published

                                n8n-nodes-gemini-file-manager

                                n8n node for managing files with Google Gemini Files API - batch upload, list, and manage files for AI workflows

                                • v0.1.0
                                • 16.14
                                • Published

                                unocr

                                Unified OCR library with multi-driver support for Tesseract.js and AI models, providing structured text extraction using hast-based output format

                                • v0.0.3
                                • 15.76
                                • Published

                                @mixpeek/anthropic

                                Anthropic integration for Mixpeek — tool definitions, content adapters, and MCP server for Claude

                                • v1.0.0
                                • 15.56
                                • Published

                                @ashvardanian/uform

                                Pocket-Sized Multimodal AI for Content Understanding and Generation

                                  • v2.0.2
                                  • 15.52
                                  • Published

                                  @unum-cloud/uform

                                  Pocket-Sized Multimodal AI for Content Understanding and Generation

                                    • v3.1.4
                                    • 15.33
                                    • Published

                                    botrun-pdf-multimodal

                                    PDF multimodal conversion MCP tool for Claude Code and Gemini CLI

                                    • v1.0.2
                                    • 15.15
                                    • Published

                                    channel3-sdk

                                    The official TypeScript/JavaScript SDK for Channel3 AI Shopping API

                                    • v1.0.1
                                    • 15.12
                                    • Published

                                    ai-zhipu-free-sdk

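                                    Quick-access library for Zhipu AI's free models - supports text chat and multimodal image understanding, works in both the browser and Node.js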

                                    • v0.0.1
                                    • 14.95
                                    • Published

                                    @mixpeek/wordpress

                                    WordPress integration for Mixpeek — REST API handlers, post enrichment, and hook management

                                    • v1.0.0
                                    • 14.93
                                    • Published

                                    exocor

                                    Multimodal intent layer for enterprise web applications

                                    • v0.2.1
                                    • 14.81
                                    • Published

                                    zerolabel

                                    Zero-shot multimodal classification SDK - classify text and images with custom labels, no training required

                                    • v1.0.18
                                    • 14.14
                                    • Published

                                    glm4v-cli

                                    CLI tool for the GLM-4V vision model - supports local image analysis

                                    • v1.0.1
                                    • 13.97
                                    • Published

                                    @arclabs561/rank-refine

                                    SIMD-accelerated MaxSim (ColBERT/ColPali), cosine similarity, diversity (MMR/DPP), token alignment/highlighting for vector search and RAG. Supports text and multimodal late interaction.

                                    • v0.7.36
                                    • 13.74
                                    • Published

                                    agentlify-js

                                    Official JavaScript/TypeScript library for the Agentlify API - OpenAI-compatible interface for intelligent model routing

                                    • v1.1.2
                                    • 13.47
                                    • Published

                                    @mixpeek/gcp-functions

                                    Google Cloud Functions integration for Mixpeek — handler wrappers, event routing, and response formatting

                                    • v1.0.0
                                    • 13.10
                                    • Published

                                    @multiface.js/fusion

                                    Multi-modal input fusion engine for simultaneous interaction handling

                                      • v1.0.5
                                      • 13.10
                                      • Published

                                      n8n-nodes-unicraft

                                      UniCraft N8N custom nodes - Unified AI Model Router with Multi-Modal Support by CloudCraft Labs for OpenAI, Anthropic, Google Gemini, and more

                                      • v2.1.8
                                      • 12.80
                                      • Published

                                      @robilabs/lexa

                                      JavaScript SDK for Lexa, the AI Model by Robi Labs. Build with powerful chat, text, and multimodal APIs.

                                      • v2.0.1
                                      • 12.50
                                      • Published

                                      @mixpeek/llamaindex

                                      LlamaIndex integration for Mixpeek — reader, retriever, and tool spec for RAG applications

                                      • v1.0.0
                                      • 12.45
                                      • Published

                                      contextual-agent-sdk

                                      SDK for building AI agents with seamless voice-text context switching

                                      • v1.3.3
                                      • 12.25
                                      • Published

                                      @liustack/modlens

                                      CLI tool to provide visual understanding for non-vision LLMs

                                      • v0.1.0
                                      • 12.05
                                      • Published

                                      @multiface.js/sensors

                                      Device sensor integration for multimodal interactions

                                        • v1.0.5
                                        • 11.86
                                        • Published

                                        jimeng4-mcp

                                        MCP tool for the Volcano Engine Jimeng AI multimodal generation service - full support for all the latest 4.0/3.1/3.0 models

                                          • v1.0.9
                                          • 11.64
                                          • Published

                                          n8n-nodes-gemini-with-grounding-search

                                          n8n community node for Google Gemini AI with support for text, image, audio, video, document processing and grounding search

                                          • v0.1.0
                                          • 11.32
                                          • Published

                                          @multiface.js/context

                                          Context awareness and memory management for multimodal interactions

                                            • v1.0.5
                                            • 11.32
                                            • Published

                                            @mixpeek/ui

                                            Mixpeek component library — shadcn/ui primitives + API-bound components with autonomous feedback loops

                                            • v0.1.0
                                            • 11.32
                                            • Published

                                            @workstudio/ai

                                            Unified interface for AI agents supporting OpenAI, Gemini, Anthropic, and xAI with multimodal and tool-calling support.

                                            • v1.0.1
                                            • 11.08
                                            • Published

                                            @cohesiumai/modules-vlm

                                            Local Vision-Language Model module for browser-ai - image understanding

                                              • v2.1.2
                                              • 11.08
                                              • Published

                                              gemini-multimodal-mcp

                                              MCP server with multimodal capabilities - process documents, images, videos, audio using Gemini Pro with 1M context window

                                              • v1.1.4
                                              • 11.07
                                              • Published

                                              chakra-multi-modal

                                              A Chakra UI Multi Modal - one modal with multiple, switchable sections

                                              • v1.0.1
                                              • 11.07
                                              • Published

                                              receipt-ocr

                                              A TypeScript library for extracting structured product data from receipt images using multimodal LLMs

                                              • v1.0.5
                                              • 11.02
                                              • Published

                                              unimodaly-ingest

                                              A unified data-ingestion CLI that auto-detects and converts text, image, audio and tabular sources into standardized training datasets

                                              • v1.0.0
                                              • 11.02
                                              • Published

                                              awesome-context-scribe

                                              A CLI tool powered by Google Gemini AI to transcribe audio files into structured text conversations with automatic speaker recognition

                                              • v1.0.0
                                              • 10.89
                                              • Published

                                              rate-limiter-multimodal

                                              Rate limiter middleware for Express.js that allows very tight limits while providing a seamless experience to the users.

                                              • v1.0.1
                                              • 10.89
                                              • Published

                                              @mixpeek/huggingface

                                              Hugging Face integration for Mixpeek — model bridging, dataset sync, and pipeline adaptation

                                              • v1.0.0
                                              • 10.73
                                              • Published

                                              icon-generator-mcp

                                              Zero-dependency MCP server for AI-powered SVG icon generation with multimodal LLM support

                                              • v0.5.0
                                              • 10.71
                                              • Published

                                              @mixpeek/gcs

                                              Google Cloud Storage integration for Mixpeek — watch buckets, enrich objects, and parse GCS events

                                              • v1.0.0
                                              • 10.60
                                              • Published

                                              jimeng-ai-v31-mcp

                                              MCP tool for the Volcano Engine Jimeng AI v3.1 multimodal generation service (updated to the text-to-image 3.1 model)

                                              • v1.0.0
                                              • 10.55
                                              • Published

                                              multimodal-search-bar

                                              Multimodal search bar component for React with support for multiline text and images

                                              • v0.1.0
                                              • 10.19
                                              • Published

                                              @mixpeek/aws-s3

                                              AWS S3 integration for Mixpeek — watch buckets for new objects, enrich content, and parse S3 events

                                              • v1.0.0
                                              • 10.19
                                              • Published

                                              llama-latex

                                              Image to LaTeX with Llama 3.2 Vision.

                                              • v0.0.4
                                              • 10.19
                                              • Published

                                              @mixpeek/databricks

                                              Databricks integration for Mixpeek — notebook helpers, Delta Lake integration, and Unity Catalog connector

                                              • v1.0.0
                                              • 10.07
                                              • Published

                                              @overtunned/mcp

                                              MCP server for Morphik multimodal database

                                              • v1.0.14
                                              • 10.04
                                              • Published

                                              @mixpeek/spark

                                              Apache Spark integration for Mixpeek — UDF transformers, batch processing, and schema mapping

                                              • v1.0.0
                                              • 10.04
                                              • Published

                                              @mixpeek/datadog

                                              Datadog integration for Mixpeek — metrics, logs, and distributed tracing for enrichment pipelines

                                              • v1.0.0
                                              • 10.03
                                              • Published

                                              @mixpeek/pagerduty

                                              PagerDuty integration for Mixpeek — incident management, alert routing, and health monitoring

                                              • v1.0.0
                                              • 10.03
                                              • Published

                                              @mixpeek/azure-blob

                                              Azure Blob Storage integration for Mixpeek — watch containers, enrich blobs, and parse Event Grid events

                                              • v1.0.0
                                              • 9.77
                                              • Published

                                              @mixpeek/prebid-server

                                              Mixpeek Server-Side RTD Reference for Prebid Server - Reference implementation and test harness for OpenRTB bid request enrichment with identical field mappings to the Go data provider

                                              • v1.0.1
                                              • 9.77
                                              • Published

                                              banana-nft-sdk-ts

                                              🍌 BananaNFT TypeScript SDK - Revolutionary AI-powered NFT generation with nano-banana intelligence

                                              • v2.0.0
                                              • 9.62
                                              • Published

                                              nexusai-sdk

                                              JavaScript SDK for NexusAI - AI Agent Platform for Businesses

                                              • v1.0.2
                                              • 9.58
                                              • Published

                                              capacitor-gemma-3n

                                              Capacitor plugin for Google Gemma 3n on-device AI via MediaPipe LLM Inference API

                                              • v1.0.0
                                              • 9.58
                                              • Published

                                              @mixpeek/sanity

                                              Sanity.io integration for Mixpeek — webhook handling, document enrichment, and GROQ-powered queries

                                              • v1.0.0
                                              • 9.58
                                              • Published

                                              @mixpeek/strapi

                                              Strapi integration for Mixpeek — lifecycle hooks, content enrichment, and plugin configuration

                                              • v1.0.0
                                              • 9.52
                                              • Published

                                              @mixpeek/google-ad-manager

                                              Mixpeek contextual enrichment for Google Ad Manager - Privacy-safe targeting keys with IAB taxonomy, sentiment, brand safety, and ad adjacency signals

                                              • v1.0.0
                                              • 9.48
                                              • Published

                                              mcp-video-analyser

                                              MCP server for video analysis using Google Gemini and Moonshot Kimi AI

                                              • v0.1.0
                                              • 9.48
                                              • Published

                                              snaxel-query-engine

                                              A robust, modular web scraping library for multimodal content discovery

                                              • v1.0.2
                                              • 9.48
                                              • Published

                                              @mixpeek/snowflake

                                              Snowflake integration for Mixpeek — external functions, stream processing, and data enrichment

                                              • v1.0.0
                                              • 9.47
                                              • Published

                                              llm-mcp-server

                                              MCP server for LLM API with multimodal support (text, image, audio, video)

                                                • v1.0.0
                                                • 9.37
                                                • Published

                                                @mixpeek/grafana

                                                Grafana integration for Mixpeek — dashboard provisioning, annotations, and Prometheus metric export

                                                • v1.0.0
                                                • 9.37
                                                • Published

                                                zelai-cloud-sdk

                                                Official ZelAI SDK - Multimodal AI for autonomous image/video generation, vision analysis, agentic LLM workflows, STT speech-to-text, and TTS text-to-speech

                                                • v1.12.0
                                                • 9.37
                                                • Published

                                                @mixpeek/kafka

                                                Apache Kafka integration for Mixpeek — consume events, produce enrichment results, and transform messages

                                                • v1.0.0
                                                • 9.16
                                                • Published

                                                @mixpeek/shopify

                                                Shopify integration for Mixpeek — webhook handling, product enrichment, and Admin API integration

                                                • v1.0.0
                                                • 9.05
                                                • Published

                                                @mixpeek/aws-lambda

                                                AWS Lambda integration for Mixpeek — handler wrappers, event routing, and response formatting for serverless enrichment

                                                • v1.0.0
                                                • 9.05
                                                • Published

                                                @mixpeek/airflow

                                                Apache Airflow integration for Mixpeek — custom operators, DAG generators, and task builders

                                                • v1.0.0
                                                • 9.05
                                                • Published

                                                @mixpeek/sentry

                                                Sentry integration for Mixpeek — error tracking, performance monitoring, and enrichment pipeline observability

                                                • v1.0.0
                                                • 8.75
                                                • Published

                                                @mixpeek/prometheus

                                                Prometheus metrics exporter for Mixpeek — expose enrichment metrics, latency histograms, and custom collectors

                                                • v1.0.0
                                                • 8.62
                                                • Published

                                                @mixpeek/contentful

                                                Contentful integration for Mixpeek — webhook handling, content enrichment, and management API integration

                                                • v1.0.0
                                                • 8.56
                                                • Published

                                                bella-omni-vue

                                                Vue 3 plugin for Qwen Omni realtime multimodal conversation

                                                  • v1.0.0
                                                  • 8.45
                                                  • Published

                                                  @stylor/embeddings

                                                  JavaScript SDK for Stylor Embedding API - multimodal embeddings for text and images

                                                    • v1.0.0
                                                    • 8.45
                                                    • Published

                                                    gemini-cli-sidx1fork

                                                    Fork of Google's Gemini CLI by sidx1. CLI tool for accessing Gemini AI with enhancements by sidx1.

                                                    • v1.0.4
                                                    • 8.16
                                                    • Published

                                                    llmplug

                                                    A library to easily integrate various LLM models and vendors into applications, with advanced features.

                                                      • v0.1.0
                                                      • 8.16
                                                      • Published

                                                      gemini-executor

                                                      Integration layer combining Google's Gemini CLI with Claude Code for complementary AI collaboration

                                                      • v0.1.0
                                                      • 7.89
                                                      • Published

                                                      n8n-nodes-aurora-ai

                                                      n8n community node for the Aurora AI multimodal API

                                                        • v1.0.0
                                                        • 7.18
                                                        • Published

                                                        google-genai-live-lib

                                                        A library for interacting with Google's Generative AI models in real-time

                                                          • v0.1.4
                                                          • 7.04
                                                          • Published

                                                          celesteai

                                                          Celeste AI Framework - JavaScript/TypeScript SDK (Placeholder)

                                                          • v0.0.1
                                                          • 6.39
                                                          • Published

                                                          visual-primitives-mcp

                                                          Multimodal spatial grounding MCP server based on the DeepSeek Visual Primitives paper, wrapping the vision model's precise coordinate reasoning as standard MCP tools

                                                          • v1.1.0
                                                          • 0.00
                                                          • Published

                                                          @lanmower/agentapi

                                                          Anthropic SDK to multi-provider streaming bridge — converts Anthropic message format and tool calls to Gemini, OpenAI-compatible APIs

                                                          • v1.0.35
                                                          • 0.00
                                                          • Published

                                                          @daguito/sdk

                                                          Official TypeScript SDK for the Daguito conversational AI platform — text, voice, image, and multimodal flows over webhooks and the embeddable widget.

                                                          • v0.1.0
                                                          • 0.00
                                                          • Published

                                                          koishi-plugin-fastgpt-bot

                                                          Koishi chat plugin that connects to the FastGPT Chat Completions API, supporting multi-turn conversation and multimodal image recognition

                                                            • v1.0.1
                                                            • 0.00
                                                            • Published

                                                            @robinpath/google-gemini

                                                            Google Gemini integration — generateContent, multi-turn chat, multimodal vision, embeddings, token counting, and model listing. Uses the encrypted credential vault for API keys.

                                                              • v0.3.0
                                                              • 0.00
                                                              • Published