@promptbook/utils
Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action
Found 232 results for multimodal
AI library for Mux
Glin-Profanity is a lightweight, efficient npm package that detects and filters profane language in text inputs across multiple languages. Whether you're building a chat application, a comment section, or any platform where user-generated content
Anthropic SDK to Gemini streaming bridge — drop-in proxy that translates Anthropic message format and tool calls to Google Gemini
The TypeScript library for building AI applications.
A native Capacitor plugin that embeds llama.cpp directly into mobile apps, enabling offline AI inference with chat-first API design. Complete iOS and Android support: text generation, chat, multimodal, TTS, LoRA, embeddings, and more.
Agent Model Layer — One-line LLM access with built-in memory. 29 providers, 18 embeddings, zero lock-in.
Cyber-Soul multimodal character interaction SDK by Space3 Digital Media Tech Studio
JavaScript client for Speechly Streaming API
Anthropic SDK to multi-provider streaming bridge — converts Anthropic message format and tool calls to Gemini, OpenAI-compatible APIs
A Gemini MCP server providing multimodal analysis and image/video generation.
Unified multimodal CLI for ClawPlay — image generation, vision analysis, and LLM via Ark + Gemini relay
MCP server that gives Claude Code the ability to watch and understand videos — extracts frames via ffmpeg and processes audio via multiple backends
React client for Speechly Streaming API
Polyfill for the Speech Recognition API using Speechly
Sogni SDK - AI image, video & audio generation plus LLM chat with vision via the Sogni Supernet (Stable Diffusion, Flux, WAN 2.2, LTX-2, Qwen VLM)
Multimodal preflight for any AI agent — AI SDK middleware for transparent image optimization
Frenchie — your agent's best friend. MCP-first multimodal Kit, Method workflow toolkit, and stdio MCP server for agents: OCR, transcription, file extraction, image generation, and product development process.
Canon-aligned dataset production and generation workbench — define visual rules, build versioned training data, compile production briefs, run local workflows, batch-produce, select winners, and re-ingest into your corpus
Promptbook: Run AI apps in plain human language across multiple models and platforms
Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document processing
AINative client runtime for building AI-driven UIs
OpenClaw plugin for multimodal RAG - semantic indexing and time-aware search for images and audio using local AI models
YYC³ AI Family core package — unified authentication, MCP protocol, skill system, eight AI family-member agents, and multimodal processing
Representation-first multimodal Markdown wiki runtime for Obsidian vaults, with standalone CLI, MCP server, and OpenClaw compatibility.
Node.js package for Gemma 4 — local multimodal inference and agent system
Experimental ModelFusion features
Captain multimodal search plugin for OpenClaw — search text, images, video, and audio with natural language
Open source TypeScript SDK for the BabySea execution control plane for generative media.
OpenCode plugin — auto-starts the SHIFT image optimization proxy for transparent token savings
Extract video frames for LLM context injection
n8n nodes for Universal LLM Vision - Includes standalone node and langchain-compatible Vision Chain
Reusable TypeScript wrappers and helpers for Google's Gemini API.
Model Context Protocol (MCP) server for Lucid App integration with multimodal AI analysis
Multimodal video understanding for Claude Code — extract frames, transcribe audio, build timelines from any video
MCP server for Captain — multimodal RAG search and live project search. Usable from Claude Code, Cursor, and any MCP-aware client.
Human MCP: Bringing Human Capabilities to Coding Agents
Official JavaScript/TypeScript SDK + CLI for the Bordair AI security stack - detect prompt injection programmatically or test any LLM against the 503K-sample multimodal dataset
Echosaw MCP Server - Media intelligence for AI assistants. Connect your LLM to Echosaw and analyze media directly within your workflow.
n8n community node for Google Gemini AI integration with text generation, file upload & analysis, and TTS (Text-to-Speech) support
AI-friendly CLI for multimodal model providers (Gemini, and more)
n8n community node for Google Vertex AI with advanced multimodal capabilities
AI-powered dataset discovery, quality analysis, and preparation MCP server with multimodal support (text, image, audio, video)
Extract text, metadata, and page images from PDF files. Designed for AI agents.
MCP (Model Context Protocol) server for apiz.ai — exposes generate / get_result / search_models / guide / account / speak / parse_video / transfer_url tools
A structured interagent messaging protocol — send validated, typed packets between roles, tools, and services in multi-agent systems.
Framework-agnostic voice agent library powered by Google Gemini Live API
Unified AI creation engine — text, image, video, audio across all providers
Official TypeScript SDK for the apiz.ai AI generation platform
MCP Server for Claude Code, Cursor, Cline, Copilot, Github Copilot, Windsurf - Visual AI Agent Plan Execution, Approval Workflow, Plan Visualization, Agent Orchestration. See what your AI is thinking before it writes code. Works with Claude, GPT, Gemini,
Node-RED nodes for Google Gemini AI integration including text generation, chat, vision, image generation, speech generation, and audio understanding capabilities
OpenCode plugin that proxies images through a vision-capable model, enabling image-incapable models to "see"
n8n community node for SiliconFlow AI models - chat completions, vision language models, embeddings, and reranking
Multimodal utilities and agents for OrkaJS - Vision, Audio, Cross-modal workflows
MCP server for image analysis using Qwen 3.5 via Ollama. Adds vision capabilities to models without native image support.
Google Drive for AI agents. Multimodal semantic search and 3D file visualization.
One command to turn WeChat into an entry point for any AI agent
Official 2DAI SDK - Multimodal AI for autonomous image/video generation, vision analysis, agentic LLM workflows, STT speech-to-text, and TTS text-to-speech
Comprehensive Google Vertex AI multimodal mastery for Jeremy - video processing (6+ hours), audio generation, image creation with Gemini 2.0/2.5 and Imagen 4. Marketing campaign automation, content ge
Copy-paste UI blocks for audio-visual model inspection. shadcn-style: no install, copy the component.
Z.AI vision, search, reader, and GitHub exploration via CLI and MCP. Analyze images, search the web, read pages as markdown, explore repos.
Unified AI client and CLI: multi-modal, multi-provider, browser-safe, fully mock-able
n8n community plugin for Kimi (Moonshot) Chat Models
Native Vision-Language Model support for Node.js. Process images and video with multimodal LLMs using MLX and Apple Silicon.
YYC³ AI Family emotion engine — multimodal emotion fusion, emotion-to-music bridging, and an event bus
AI multimodal CLI — Gemini (incl. Flash TTS), MiniMax, OpenRouter, Leonardo, BytePlus, ElevenLabs, ffmpeg, ImageMagick, doc-to-md
MCP tool for the Volcano Engine Jimeng AI multimodal generation service
MCP server for Claude Code - collaborative UI generation: Claude orchestrates, Gemini generates components
MCP server providing multimodal vision capabilities via OpenAI-compatible APIs
MCP server for AI-powered image and video analysis - supports OpenAI, Claude, and multimodal vision APIs with local file and URL processing
Draft ChatLuna bridge plugin for preserving selected Lark message resources as model-consumable multimodal content.
Hybrid paper + digital form collection powered by multimodal LLMs
MMIR (Mobile Multimodal Interaction and Relay) library
A Node.js library harnessing the power of Bard's Large Language Model (LLM) for seamless chat experiences and streamlined accessibility to Google's Gemini. Empower your applications with advanced conversational AI, leveraging Bard's LLM to answer question
SINT OS Core — unified entrypoint for the SINT Operating System. Combines OpenClaw runtime, SINT Protocol governance, avatar face, and multimodal bridge.
Fast, minimal Perplexity AI CLI with local RAG. Stream answers, search your notes, analyze files. Zero bloat.
OpenAI-compatible gateway for Claude Code subscription - Use your Claude subscription with any OpenAI-compatible tool
MCP tool for the Volcano Engine Jimeng AI multimodal generation service — full support for all the latest 4.0/3.1/3.0 models
A unified, zero-dependency interface for multiple LLM providers (OpenAI, Anthropic, Google, Groq, Ollama, xAI, DeepSeek) with streaming, tool calling, vision, and thinking mode support
CLI tool combining multimodal AI analysis with RawTherapee's engine to generate optimized PP3 profiles for RAW photography. Features automatic histogram analysis for enhanced AI processing.
Open Multimodal Assessment Toolkit — a type-safe AI SDK for building human-centered, multimodal AI applications.
MCP server for AI vision analysis via OpenRouter
A powerful, multimodal RAG engine with contextual retrieval, auto-prompt discovery, and PostgreSQL-native vector search
KnowFun AI multimodal content generator - Create AI PPT/presentations, posters, interactive games, and educational films. Lightning-fast REST API for educators, content creators, and learners. Integrate with Claude Code, Cursor, Cline, OpenClaw AI assista
n8n community node for Mixpeek - multimodal data processing and semantic search API
Production-ready n8n community node for Rahyana.ir (chat-first, multimodal, streaming-ready)
Perceptron MCP server for high-accuracy visual perception powered by fast, efficient vision-language models
Enterprise-grade AI integration bridge connecting Claude Code, Gemini CLI, and Google AI Studio with intelligent routing and advanced multimodal processing capabilities
Revolutionary high-accuracy Audio-to-Text library powered by Gemini 2.5 Flash Lite with 1M+ context window.
n8n node for OpenRouter API integration with advanced features: prompt caching, reasoning tokens, vision, streaming, and more
📝 AI-driven homework grading MCP — an intelligent homework grading service built on the Qwen3-VL multimodal model
N8N community node for Qdrant vector store — semantic search, embedding storage, and full collection management for AI Agent workflows
Mixpeek RTD (Real-Time Data) Adapter for Prebid.js - Privacy-first contextual targeting with sub-100ms performance, ad adjacency awareness, and cookie-free bid enrichment
Local-first TypeScript retrieval engine with Chameleon Multimodal Architecture for semantic search over text and image content
Enhanced MCP server for Google Gemini 3 with Image Generation, Batch API integration (50% cost, async processing), advanced file handling, and conversation management. Features Gemini 3 Pro (default) and Gemini 3 Pro Image models with state-of-the-art rea
A Perplexity API Model Context Protocol (MCP) server that unlocks Perplexity's search-augmented AI capabilities for LLM agents. Features robust error handling, secure input validation, transparent reasoning, and multimodal support with file attachments (P
A local PDF MCP server with budget-first multimodal PDF reading.
Claude Code subagents for multimodal capabilities via OpenCode + OpenRouter
Official Mixpeek TypeScript/JavaScript SDK for multimodal data processing and retrieval
Professional AI prompt engineering toolkit with advanced template features, real-time dashboards, conditional logic, template inheritance, live monitoring, OpenRouter integration, and 310+ model support
MCP server for Morphik multimodal database
OpenAI integration for Mixpeek — embedding bridge, function calling adapter, and assistant tools
Framework-agnostic core for Orga AI SDK - real-time multimodal AI
Multimodal memory for OpenClaw agents — powered by Gemini Embedding-2. Hybrid search (BM25 + vectors), cross-encoder reranking, image/audio/PDF memory.
All-In-One LLM Framework - Multi-provider LLM integration with auto-fallback, priority management, multimodal support, and XML-based tool calling
The official Sylix AI SDK for TypeScript and JavaScript. Build intelligent applications with Helix models (helix-1.0, helix-code, helix-r1) featuring tool calling, reasoning, streaming, and OpenAI-compatible APIs. Enterprise-grade AI infrastructure for de
LangChain integration for Mixpeek — retriever, tool, and document loader for LLM-powered applications
MCP server for combining text and image descriptions
MCP server for narrative audio generation - enabling LLMs to create immersive audio experiences
n8n node for managing files with Google Gemini Files API - batch upload, list, and manage files for AI workflows
Unified OCR library with multi-driver support for Tesseract.js and AI models, providing structured text extraction using hast-based output format
Anthropic integration for Mixpeek — tool definitions, content adapters, and MCP server for Claude
Pocket-Sized Multimodal AI for Content Understanding and Generation
PDF multimodal conversion MCP tool for Claude Code and Gemini CLI
The official TypeScript/JavaScript SDK for Channel3 AI Shopping API
Quick-access library for Zhipu AI's free models — supports text chat and multimodal image understanding; works in both the browser and Node.js
WordPress integration for Mixpeek — REST API handlers, post enrichment, and hook management
Multimodal intent layer for enterprise web applications
Zero-shot multimodal classification SDK - classify text and images with custom labels, no training required
CLI tool for the GLM-4V vision model — supports local image analysis
SIMD-accelerated MaxSim (ColBERT/ColPali), cosine similarity, diversity (MMR/DPP), token alignment/highlighting for vector search and RAG. Supports text and multimodal late interaction.
Official JavaScript/TypeScript library for the Agentlify API - OpenAI-compatible interface for intelligent model routing
Google Cloud Functions integration for Mixpeek — handler wrappers, event routing, and response formatting
Multi-modal input fusion engine for simultaneous interaction handling
Browser client for Speechly API
A set of react components and hooks to help with multimodal input
UniCraft N8N custom nodes - Unified AI Model Router with Multi-Modal Support by CloudCraft Labs for OpenAI, Anthropic, Google Gemini, and more
JavaScript SDK for Lexa, the AI Model by Robi Labs. Build with powerful chat, text, and multimodal APIs.
LlamaIndex integration for Mixpeek — reader, retriever, and tool spec for RAG applications
SDK for building AI agents with seamless voice-text context switching
CLI tool to provide visual understanding for non-vision LLMs
Device sensor integration for multimodal interactions
React Native specific components and utilities for multimodal UI
n8n community node for Google Gemini AI with support for text, image, audio, video, document processing and grounding search
Context awareness and memory management for multimodal interactions
Mixpeek component library — shadcn/ui primitives + API-bound components with autonomous feedback loops
Unified interface for AI agents supporting OpenAI, Gemini, Anthropic, and xAI with multimodal and tool-calling support.
Local Vision-Language Model module for browser-ai - image understanding
MCP server with multimodal capabilities - process documents, images, videos, audio using Gemini Pro with 1M context window
A Chakra UI Multi Modal - one modal with multiple, switchable sections
A TypeScript library for extracting structured product data from receipt images using multimodal LLMs
A unified data-ingestion CLI that auto-detects and converts text, image, audio and tabular sources into standardized training datasets
A CLI tool powered by Google Gemini AI to transcribe audio files into structured text conversations with automatic speaker recognition
Rate limiter middleware for Express.js that enforces very tight limits while keeping the experience seamless for users.
Hugging Face integration for Mixpeek — model bridging, dataset sync, and pipeline adaptation
Zero-dependency MCP server for AI-powered SVG icon generation with multimodal LLM support
Google Cloud Storage integration for Mixpeek — watch buckets, enrich objects, and parse GCS events
MCP tool for the Volcano Engine Jimeng AI v3.1 multimodal generation service (updated to the text-to-image 3.1 model)
Multimodal search bar component for React with support for multiline text and images
AWS S3 integration for Mixpeek — watch buckets for new objects, enrich content, and parse S3 events
Image to LaTeX with Llama 3.2 Vision.
Databricks integration for Mixpeek — notebook helpers, Delta Lake integration, and Unity Catalog connector
Apache Spark integration for Mixpeek — UDF transformers, batch processing, and schema mapping
Datadog integration for Mixpeek — metrics, logs, and distributed tracing for enrichment pipelines
PagerDuty integration for Mixpeek — incident management, alert routing, and health monitoring
Azure Blob Storage integration for Mixpeek — watch containers, enrich blobs, and parse Event Grid events
Mixpeek Server-Side RTD Reference for Prebid Server - Reference implementation and test harness for OpenRTB bid request enrichment with identical field mappings to the Go data provider
🍌 BananaNFT TypeScript SDK - Revolutionary AI-powered NFT generation with nano-banana intelligence
JavaScript SDK for NexusAI - AI Agent Platform for Businesses
Capacitor plugin for Google Gemma 3n on-device AI via MediaPipe LLM Inference API
Sanity.io integration for Mixpeek — webhook handling, document enrichment, and GROQ-powered queries
Strapi integration for Mixpeek — lifecycle hooks, content enrichment, and plugin configuration
Mixpeek contextual enrichment for Google Ad Manager - Privacy-safe targeting keys with IAB taxonomy, sentiment, brand safety, and ad adjacency signals
MCP server for video analysis using Google Gemini and Moonshot Kimi AI
A robust, modular web scraping library for multimodal content discovery
Snowflake integration for Mixpeek — external functions, stream processing, and data enrichment
MCP server for LLM API with multimodal support (text, image, audio, video)
Grafana integration for Mixpeek — dashboard provisioning, annotations, and Prometheus metric export
Official ZelAI SDK - Multimodal AI for autonomous image/video generation, vision analysis, agentic LLM workflows, STT speech-to-text, and TTS text-to-speech
Apache Kafka integration for Mixpeek — consume events, produce enrichment results, and transform messages
Shopify integration for Mixpeek — webhook handling, product enrichment, and Admin API integration
AWS Lambda integration for Mixpeek — handler wrappers, event routing, and response formatting for serverless enrichment
Apache Airflow integration for Mixpeek — custom operators, DAG generators, and task builders
Sentry integration for Mixpeek — error tracking, performance monitoring, and enrichment pipeline observability
Prometheus metrics exporter for Mixpeek — expose enrichment metrics, latency histograms, and custom collectors
Contentful integration for Mixpeek — webhook handling, content enrichment, and management API integration
Vue 3 plugin for Qwen Omni realtime multimodal conversation
JavaScript SDK for Stylor Embedding API - multimodal embeddings for text and images
Fork of Google's Gemini CLI by sidx1. CLI tool for accessing Gemini AI with enhancements by sidx1.
A library to easily integrate various LLM models and vendors into applications, with advanced features.
Integration layer combining Google's Gemini CLI with Claude Code for complementary AI collaboration
n8n community node for the Aurora AI multimodal API
A library for interacting with Google's Generative AI models in real-time
Celeste AI Framework - JavaScript/TypeScript SDK (Placeholder)
Multimodal spatial-anchoring MCP server based on the DeepSeek Visual Primitives paper — wraps the precise coordinate-reasoning capabilities of vision models as standard MCP tools
Official TypeScript SDK for the Daguito conversational AI platform — text, voice, image, and multimodal flows over webhooks and the embeddable widget.
Koishi chat plugin for the FastGPT Chat Completions API, supporting multi-turn conversation and multimodal image recognition
Google Gemini integration — generateContent, multi-turn chat, multimodal vision, embeddings, token counting, and model listing. Uses the encrypted credential vault for API keys.