@promptbook/utils
Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action
Found 232 results for multimodal
AI library for Mux
Glin-Profanity is a lightweight, efficient npm package that detects and filters profane language in text inputs across multiple languages. Whether you're building a chat application, a comment section, or any platform where user-generated content
Anthropic SDK to Gemini streaming bridge — drop-in proxy that translates Anthropic message format and tool calls to Google Gemini
The TypeScript library for building AI applications.
A native Capacitor plugin that embeds llama.cpp directly into mobile apps, enabling offline AI inference with chat-first API design. Complete iOS and Android support: text generation, chat, multimodal, TTS, LoRA, embeddings, and more.
Agent Model Layer — One-line LLM access with built-in memory. 29 providers, 18 embeddings, zero lock-in.
Cyber-Soul multimodal character interaction SDK by Space3 Digital Media Tech Studio
JavaScript client for Speechly Streaming API
Anthropic SDK to multi-provider streaming bridge — converts Anthropic message format and tool calls to Gemini, OpenAI-compatible APIs
A Gemini MCP server providing multimodal analysis and image/video generation.
Unified multimodal CLI for ClawPlay — image generation, vision analysis, and LLM via Ark + Gemini relay
MCP server that gives Claude Code the ability to watch and understand videos — extracts frames via ffmpeg and processes audio via multiple backends
React client for Speechly Streaming API
Polyfill for the Speech Recognition API using Speechly
Sogni SDK - AI image, video & audio generation plus LLM chat with vision via the Sogni Supernet (Stable Diffusion, Flux, WAN 2.2, LTX-2, Qwen VLM)
Multimodal preflight for any AI agent — AI SDK middleware for transparent image optimization
Frenchie — your agent's best friend. MCP-first multimodal Kit, Method workflow toolkit, and stdio MCP server for agents: OCR, transcription, file extraction, image generation, and product development process.
Canon-aligned dataset production and generation workbench — define visual rules, build versioned training data, compile production briefs, run local workflows, batch-produce, select winners, and re-ingest into your corpus
Promptbook: Run AI apps in plain human language across multiple models and platforms
Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document processing
AINative client runtime for building AI-driven UIs
OpenClaw plugin for multimodal RAG - semantic indexing and time-aware search for images and audio using local AI models
YYC³ AI Family core package — unified authentication, MCP protocol, skill system, eight AI family-member agents, and multimodal processing
Representation-first multimodal Markdown wiki runtime for Obsidian vaults, with standalone CLI, MCP server, and OpenClaw compatibility.
Node.js package for Gemma 4 — local multimodal inference and agent system
Experimental ModelFusion features
Captain multimodal search plugin for OpenClaw — search text, images, video, and audio with natural language
Open source TypeScript SDK for the BabySea execution control plane for generative media.
OpenCode plugin — auto-starts the SHIFT image optimization proxy for transparent token savings
Extract video frames for LLM context injection
n8n nodes for Universal LLM Vision - Includes standalone node and langchain-compatible Vision Chain
Reusable TypeScript wrappers and helpers for Google's Gemini API.
Model Context Protocol (MCP) server for Lucid App integration with multimodal AI analysis
Multimodal video understanding for Claude Code — extract frames, transcribe audio, build timelines from any video
MCP server for Captain — multimodal RAG search and live project search. Usable from Claude Code, Cursor, and any MCP-aware client.
Human MCP: Bringing Human Capabilities to Coding Agents
Official JavaScript/TypeScript SDK + CLI for the Bordair AI security stack - detect prompt injection programmatically or test any LLM against the 503K-sample multimodal dataset
Echosaw MCP Server - Media intelligence for AI assistants. Connect your LLM to Echosaw and analyze media directly within your workflow.
n8n community node for Google Gemini AI integration with text generation, file upload & analysis, and TTS (Text-to-Speech) support
AI-friendly CLI for multimodal model providers (Gemini, and more)
n8n community node for Google Vertex AI with advanced multimodal capabilities
AI-powered dataset discovery, quality analysis, and preparation MCP server with multimodal support (text, image, audio, video)
Extract text, metadata, and page images from PDF files. Designed for AI agents.
MCP (Model Context Protocol) server for apiz.ai — exposes generate / get_result / search_models / guide / account / speak / parse_video / transfer_url tools
A structured interagent messaging protocol — send validated, typed packets between roles, tools, and services in multi-agent systems.
Framework-agnostic voice agent library powered by Google Gemini Live API
Unified AI creation engine — text, image, video, audio across all providers
Official TypeScript SDK for the apiz.ai AI generation platform
MCP Server for Claude Code, Cursor, Cline, Copilot, Github Copilot, Windsurf - Visual AI Agent Plan Execution, Approval Workflow, Plan Visualization, Agent Orchestration. See what your AI is thinking before it writes code. Works with Claude, GPT, Gemini,
Node-RED nodes for Google Gemini AI integration including text generation, chat, vision, image generation, speech generation, and audio understanding capabilities
OpenCode plugin that proxies images through a vision-capable model, enabling image-incapable models to "see"
n8n community node for SiliconFlow AI models - chat completions, vision language models, embeddings, and reranking
Multimodal utilities and agents for OrkaJS - Vision, Audio, Cross-modal workflows
MCP server for image analysis using Qwen 3.5 via Ollama. Adds vision capabilities to models without native image support.
Google Drive for AI agents. Multimodal semantic search and 3D file visualization.
One command to turn WeChat into an entry point for any AI agent
Official 2DAI SDK - Multimodal AI for autonomous image/video generation, vision analysis, agentic LLM workflows, STT speech-to-text, and TTS text-to-speech
Comprehensive Google Vertex AI multimodal mastery for Jeremy - video processing (6+ hours), audio generation, image creation with Gemini 2.0/2.5 and Imagen 4. Marketing campaign automation, content ge
Copy-paste UI blocks for audio-visual model inspection. shadcn-style: no install, copy the component.
Z.AI vision, search, reader, and GitHub exploration via CLI and MCP. Analyze images, search the web, read pages as markdown, explore repos.
Unified AI client and CLI: multi-modal, multi-provider, browser-safe, fully mock-able
n8n community plugin for Kimi (Moonshot) Chat Models
Native Vision-Language Model support for Node.js. Process images and video with multimodal LLMs using MLX and Apple Silicon.
YYC³ AI Family emotion engine — multimodal emotion fusion, emotion-to-music bridging, and an event bus
AI multimodal CLI — Gemini (incl. Flash TTS), MiniMax, OpenRouter, Leonardo, BytePlus, ElevenLabs, ffmpeg, ImageMagick, doc-to-md
MCP tool for the Volcano Engine Jimeng AI multimodal generation service
MCP server for Claude Code - collaborative UI generation: Claude orchestrates, Gemini generates components
MCP server providing multimodal vision capabilities via OpenAI-compatible APIs
MCP server for AI-powered image and video analysis - supports OpenAI, Claude, and multimodal vision APIs with local file and URL processing
Draft ChatLuna bridge plugin for preserving selected Lark message resources as model-consumable multimodal content.
Hybrid paper + digital form collection powered by multimodal LLMs
MMIR (Mobile Multimodal Interaction and Relay) library
A Node.js library harnessing the power of Bard's Large Language Model (LLM) for seamless chat experiences and streamlined accessibility to Google's Gemini. Empower your applications with advanced conversational AI, leveraging Bard's LLM to answer question
SINT OS Core — unified entrypoint for the SINT Operating System. Combines OpenClaw runtime, SINT Protocol governance, avatar face, and multimodal bridge.
Fast, minimal Perplexity AI CLI with local RAG. Stream answers, search your notes, analyze files. Zero bloat.
OpenAI-compatible gateway for Claude Code subscription - Use your Claude subscription with any OpenAI-compatible tool
MCP tool for the Volcano Engine Jimeng AI multimodal generation service — full support for all the latest 4.0/3.1/3.0 models
A unified, zero-dependency interface for multiple LLM providers (OpenAI, Anthropic, Google, Groq, Ollama, xAI, DeepSeek) with streaming, tool calling, vision, and thinking mode support
CLI tool combining multimodal AI analysis with RawTherapee's engine to generate optimized PP3 profiles for RAW photography. Features automatic histogram analysis for enhanced AI processing.
Open Multimodal Assessment Toolkit — a type-safe AI SDK for building human-centered, multimodal AI applications.
MCP server for AI vision analysis via OpenRouter
A powerful, multimodal RAG engine with contextual retrieval, auto-prompt discovery, and PostgreSQL-native vector search
KnowFun AI multimodal content generator - Create AI PPT/presentations, posters, interactive games, and educational films. Lightning-fast REST API for educators, content creators, and learners. Integrate with Claude Code, Cursor, Cline, OpenClaw AI assista
n8n community node for Mixpeek - multimodal data processing and semantic search API
Production-ready n8n community node for Rahyana.ir (chat-first, multimodal, streaming-ready)
Perceptron MCP server for high-accuracy visual perception powered by fast, efficient vision-language models
Enterprise-grade AI integration bridge connecting Claude Code, Gemini CLI, and Google AI Studio with intelligent routing and advanced multimodal processing capabilities
Revolutionary high-accuracy Audio-to-Text library powered by Gemini 2.5 Flash Lite with 1M+ context window.
n8n node for OpenRouter API integration with advanced features: prompt caching, reasoning tokens, vision, streaming, and more
📝 AI-driven homework grading MCP — an intelligent homework grading service built on the Qwen3-VL multimodal model
N8N community node for Qdrant vector store — semantic search, embedding storage, and full collection management for AI Agent workflows
Mixpeek RTD (Real-Time Data) Adapter for Prebid.js - Privacy-first contextual targeting with sub-100ms performance, ad adjacency awareness, and cookie-free bid enrichment
Local-first TypeScript retrieval engine with Chameleon Multimodal Architecture for semantic search over text and image content
Enhanced MCP server for Google Gemini 3 with Image Generation, Batch API integration (50% cost, async processing), advanced file handling, and conversation management. Features Gemini 3 Pro (default) and Gemini 3 Pro Image models with state-of-the-art rea
A Perplexity API Model Context Protocol (MCP) server that unlocks Perplexity's search-augmented AI capabilities for LLM agents. Features robust error handling, secure input validation, transparent reasoning, and multimodal support with file attachments (P
A local PDF MCP server with budget-first multimodal PDF reading.
Claude Code subagents for multimodal capabilities via OpenCode + OpenRouter
Official Mixpeek TypeScript/JavaScript SDK for multimodal data processing and retrieval
Professional AI prompt engineering toolkit with advanced template features, real-time dashboards, conditional logic, template inheritance, live monitoring, OpenRouter integration, and 310+ model support
MCP server for Morphik multimodal database
OpenAI integration for Mixpeek — embedding bridge, function calling adapter, and assistant tools
Framework-agnostic core for Orga AI SDK - real-time multimodal AI
Multimodal memory for OpenClaw agents — powered by Gemini Embedding-2. Hybrid search (BM25 + vectors), cross-encoder reranking, image/audio/PDF memory.
All-In-One LLM Framework - Multi-provider LLM integration with auto-fallback, priority management, multimodal support, and XML-based tool calling
The official Sylix AI SDK for TypeScript and JavaScript. Build intelligent applications with Helix models (helix-1.0, helix-code, helix-r1) featuring tool calling, reasoning, streaming, and OpenAI-compatible APIs. Enterprise-grade AI infrastructure for de
LangChain integration for Mixpeek — retriever, tool, and document loader for LLM-powered applications
MCP server for combining text and image descriptions
MCP server for narrative audio generation - enabling LLMs to create immersive audio experiences
n8n node for managing files with Google Gemini Files API - batch upload, list, and manage files for AI workflows
Unified OCR library with multi-driver support for Tesseract.js and AI models, providing structured text extraction using hast-based output format
Anthropic integration for Mixpeek — tool definitions, content adapters, and MCP server for Claude
Pocket-Sized Multimodal AI for Content Understanding and Generation
PDF multimodal conversion MCP tool for Claude Code and Gemini CLI
The official TypeScript/JavaScript SDK for Channel3 AI Shopping API
Quick-access library for Zhipu AI's free models — supports text chat and multimodal image understanding; works in both the browser and Node.js
WordPress integration for Mixpeek — REST API handlers, post enrichment, and hook management
Multimodal intent layer for enterprise web applications
Zero-shot multimodal classification SDK - classify text and images with custom labels, no training required
CLI tool for the GLM-4V vision model — supports local image analysis
SIMD-accelerated MaxSim (ColBERT/ColPali), cosine similarity, diversity (MMR/DPP), token alignment/highlighting for vector search and RAG. Supports text and multimodal late interaction.
Official JavaScript/TypeScript library for the Agentlify API - OpenAI-compatible interface for intelligent model routing
Google Cloud Functions integration for Mixpeek — handler wrappers, event routing, and response formatting
Multi-modal input fusion engine for simultaneous interaction handling
Browser client for Speechly API
A set of react components and hooks to help with multimodal input
UniCraft N8N custom nodes - Unified AI Model Router with Multi-Modal Support by CloudCraft Labs for OpenAI, Anthropic, Google Gemini, and more
JavaScript SDK for Lexa, the AI Model by Robi Labs. Build with powerful chat, text, and multimodal APIs.
LlamaIndex integration for Mixpeek — reader, retriever, and tool spec for RAG applications
SDK for building AI agents with seamless voice-text context switching
CLI tool to provide visual understanding for non-vision LLMs
Device sensor integration for multimodal interactions
React Native specific components and utilities for multimodal UI
n8n community node for Google Gemini AI with support for text, image, audio, video, document processing and grounding search
Context awareness and memory management for multimodal interactions
Mixpeek component library — shadcn/ui primitives + API-bound components with autonomous feedback loops
Unified interface for AI agents supporting OpenAI, Gemini, Anthropic, and xAI with multimodal and tool-calling support.
Local Vision-Language Model module for browser-ai - image understanding
MCP server with multimodal capabilities - process documents, images, videos, audio using Gemini Pro with 1M context window
A Chakra UI Multi Modal - one modal with multiple, switchable sections
A TypeScript library for extracting structured product data from receipt images using multimodal LLMs
A unified data-ingestion CLI that auto-detects and converts text, image, audio and tabular sources into standardized training datasets
A CLI tool powered by Google Gemini AI to transcribe audio files into structured text conversations with automatic speaker recognition
Rate limiter middleware for Express.js that enforces very tight limits while keeping the experience seamless for users.
Hugging Face integration for Mixpeek — model bridging, dataset sync, and pipeline adaptation
Zero-dependency MCP server for AI-powered SVG icon generation with multimodal LLM support
Google Cloud Storage integration for Mixpeek — watch buckets, enrich objects, and parse GCS events
MCP tool for the Volcano Engine Jimeng AI v3.1 multimodal generation service (updated to the text-to-image 3.1 model)
Multimodal search bar component for React with support for multiline text and images
AWS S3 integration for Mixpeek — watch buckets for new objects, enrich content, and parse S3 events
Image to LaTeX with Llama 3.2 Vision.
Databricks integration for Mixpeek — notebook helpers, Delta Lake integration, and Unity Catalog connector
Apache Spark integration for Mixpeek — UDF transformers, batch processing, and schema mapping
Datadog integration for Mixpeek — metrics, logs, and distributed tracing for enrichment pipelines
PagerDuty integration for Mixpeek — incident management, alert routing, and health monitoring
Azure Blob Storage integration for Mixpeek — watch containers, enrich blobs, and parse Event Grid events
Mixpeek Server-Side RTD Reference for Prebid Server - Reference implementation and test harness for OpenRTB bid request enrichment with identical field mappings to the Go data provider
🍌 BananaNFT TypeScript SDK - Revolutionary AI-powered NFT generation with nano-banana intelligence
JavaScript SDK for NexusAI - AI Agent Platform for Businesses
Capacitor plugin for Google Gemma 3n on-device AI via MediaPipe LLM Inference API
Sanity.io integration for Mixpeek — webhook handling, document enrichment, and GROQ-powered queries
Strapi integration for Mixpeek — lifecycle hooks, content enrichment, and plugin configuration
Mixpeek contextual enrichment for Google Ad Manager - Privacy-safe targeting keys with IAB taxonomy, sentiment, brand safety, and ad adjacency signals
MCP server for video analysis using Google Gemini and Moonshot Kimi AI
A robust, modular web scraping library for multimodal content discovery
Snowflake integration for Mixpeek — external functions, stream processing, and data enrichment
MCP server for LLM API with multimodal support (text, image, audio, video)
Grafana integration for Mixpeek — dashboard provisioning, annotations, and Prometheus metric export
Official ZelAI SDK - Multimodal AI for autonomous image/video generation, vision analysis, agentic LLM workflows, STT speech-to-text, and TTS text-to-speech
Apache Kafka integration for Mixpeek — consume events, produce enrichment results, and transform messages
Shopify integration for Mixpeek — webhook handling, product enrichment, and Admin API integration
AWS Lambda integration for Mixpeek — handler wrappers, event routing, and response formatting for serverless enrichment
Apache Airflow integration for Mixpeek — custom operators, DAG generators, and task builders
Sentry integration for Mixpeek — error tracking, performance monitoring, and enrichment pipeline observability
Prometheus metrics exporter for Mixpeek — expose enrichment metrics, latency histograms, and custom collectors
Contentful integration for Mixpeek — webhook handling, content enrichment, and management API integration
Vue 3 plugin for Qwen Omni realtime multimodal conversation
JavaScript SDK for Stylor Embedding API - multimodal embeddings for text and images
Fork of Google's Gemini CLI by sidx1. CLI tool for accessing Gemini AI with enhancements by sidx1.
A library to easily integrate various LLM models and vendors into applications, with advanced features.
Integration layer combining Google's Gemini CLI with Claude Code for complementary AI collaboration
n8n community node for the Aurora AI multimodal API
A library for interacting with Google's Generative AI models in real-time
Celeste AI Framework - JavaScript/TypeScript SDK (Placeholder)
Multimodal spatial-anchoring MCP server based on the DeepSeek Visual Primitives paper — wraps the precise coordinate-reasoning capabilities of vision models as standard MCP tools
Official TypeScript SDK for the Daguito conversational AI platform — text, voice, image, and multimodal flows over webhooks and the embeddable widget.
Koishi chat plugin for the FastGPT Chat Completions API, supporting multi-turn conversation and multimodal image recognition
Google Gemini integration — generateContent, multi-turn chat, multimodal vision, embeddings, token counting, and model listing. Uses the encrypted credential vault for API keys.