Found 98 results for content-extraction

firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, search, batch processing, structured data extraction, and LLM-powered content analysis.

defuddle

Extract article content and metadata from web pages.

fetcher-mcp

MCP server for fetching web content using Playwright browser

hyperbrowser-mcp

Hyperbrowser Model Context Protocol Server

valyu-js

Search for AIs - DeepSearch and Content API.

@bonginkan/maria

🚀 MARIA v4.4.8 - Enterprise AI Development Platform with identity system and character voice implementation. Features 74 production-ready commands with comprehensive fallback implementation, local LLM support, and zero external dependencies. Includes nat

playread

Web content extraction and automation via Playwright MCP

@michaelvanlaar/n8n-nodes-defuddle

n8n node to extract main content from webpages using Defuddle library

@just-every/mcp-read-website-fast

Markdown Content Preprocessor - Fetch web pages, extract content, convert to clean Markdown

docusaurus-plugin-copy-page-button

Docusaurus plugin that adds a copy page button to extract documentation content as markdown for AI tools like ChatGPT and Claude

@just-every/crawl

Fast, token-efficient web content extraction - fetch web pages and convert to clean Markdown

crawl-mcp-server

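MCP server for scraping WeChat Official Account articles - supports automatic image download, content cleaning, and smart scraping, and can generate complete, locally stored Markdown documents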

agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

n8n-nodes-olyptik

n8n community node for Olyptik web crawling and content extraction API

h2m-parser

LLM-ready HTML to Markdown pipeline with Readability, htmlparser2, and post-processing utilities.

reelflow

Elegant and powerful Instagram video downloader for seamless content extraction

js-harvester

Harvester is a lightweight and highly optimized JavaScript library for extracting data from the DOM tree. It supports extraction of tag texts with specified types and attributes. It's tiny, has no dependencies, and also works with Puppeteer.

mcp-web-content-pick

A tool for extracting structured content from web pages with customizable selectors and crawling options

@clado-ai/mcp-router

Clado Model Context Protocol Server

mcp-jinaai-reader

MCP server for JinaAI reader

@pinkpixel/web-scout-mcp

MCP server for web search and content extraction with multiple URL support and memory optimizations

newpipe-extractor-js

JavaScript/Node.js port of NewPipeExtractor

@langgraph-js/crawler

A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.

mcp-jinaai-search

MCP server for JinaAI search

mcp-jinaai-grounding

MCP server for JinaAI grounding

linkd-mcp

Linkd Model Context Protocol Server

doc-to-readable

Universal document-to-markdown and section splitter for HTML, URLs, and PDFs.

fieldcraft-document-reader

An efficient React Native file reader library designed for comprehensive document handling with support for multiple file types and advanced content extraction capabilities

webscraping-ai-mcp

Model Context Protocol server for WebScraping.AI API. Provides LLM-powered web scraping tools with Chromium JavaScript rendering, rotating proxies, and HTML parsing.

@jtsang/fetcher-mcp

MCP server for fetching web content using Playwright browser

n8n-nodes-smart-web-scraper

Smart web scraper node for n8n with automatic failover and content extraction

@ipfsnut/evermark-metadata-kit

Content extraction and metadata processing SDK for Evermark Protocol

@fastmcp-me/mcp-jinaai-search

MCP server for JinaAI search

iflow-mcp-mcp-read-website-fast

Markdown Content Preprocessor - Fetch web pages, extract content, convert to clean Markdown

@tyronerossjr/blog-scraper

Powerful web scraping SDK for extracting blog articles and content. No LLM required.

@fastmcp-me/linkd-mcp

Linkd Model Context Protocol Server

@febbyrg/pdf-decomposer

A powerful PDF text and image extraction library with universal browser and Node.js support (Dual Licensed: Free for non-commercial, Paid for commercial use)

@tyroneross/blog-scraper

Powerful web scraping SDK for extracting blog articles and content. No LLM required.

tyroneross-blog-scraper

Powerful web scraping SDK for extracting blog articles and content. No LLM required.

ethos-crawler

Web crawler and API for aggregating and serving digital rights organizations' publications.

@supadata/mcp

MCP server for Supadata video & web scraping integration. Features include YouTube, TikTok, Instagram, Twitter, and file video transcription, web scraping, batch processing and structured data extraction.

@iflow-mcp/firecrawl-mcp

MCP server for Firecrawl web scraping integration. Supports both cloud and self-hosted instances. Features include web scraping, batch processing, structured data extraction, and LLM-powered content analysis.

askbudi-context

Provide up-to-date context about any library, built by askbudi.ai

@monostate/browsernative-client

Browser Native client SDK for web scraping and content extraction API

@clado-ai/mcp

Clado Model Context Protocol Server

url-to-json-markdown

A TypeScript library that fetches URLs and converts them to structured JSON and Markdown format.

web-fetch-mcp

MCP server for web content fetching, summarizing, comparing, and extracting information

markdown-crawler

A powerful web crawler that extracts content from web pages and converts them to clean Markdown format, with support for code blocks and GitHub Flavored Markdown

@cyanheads/jinaai-mcp-server

A Model Context Protocol (MCP) server that provides intelligent web reading capabilities using the Jina AI Reader API. It extracts clean, LLM-ready content from any URL.

threads-harvester

A TypeScript library for extracting threaded content from discussion platforms like Reddit, Twitter, and Hacker News

@stellarwp/archivist

A Bun-based tool for archiving web content as LLM context using Pure.md API

@crypblizz/docusaurus-plugin-copy-page-button

Docusaurus plugin that adds a copy page button to extract documentation content as markdown for AI tools like ChatGPT and Claude

url-to-markdown-cli-tool

CLI tool for converting web pages to clean, LLM-friendly markdown. Fetches content from URLs and converts HTML to optimized markdown format perfect for LLM training, RAG systems, and AI applications.

@cladoai/mcp

Clado Model Context Protocol Server

crawl-to-markdown

Crawl-to-markdown is a powerful TypeScript package designed to query search engines for a given keyword, crawl the resulting websites, and deliver the content in clean, readable Markdown format. Additionally, it can directly crawl specified websites for

llm-gen

A CLI tool to extract text from a static Next.js export and generate llm.txt for LLM ingestion.

graby-ts

TypeScript version of Graby content extraction library

scoopit

A tool that generates content files from website routes in multiple formats (text, JSON, markdown)

@mseep/firecrawl-mcp

@eric2788/mcp-jinaai-reader

MCP server for JinaAI reader

a1hul-mcp

MCP server for extracting content from web pages

content-web-extractor

MCP server for extracting content from web pages

html-content-processor

A professional library for processing, cleaning, filtering, and converting HTML content to Markdown. Features advanced customization options, presets, plugin support, fluent API, and TypeScript integration for reliable content extraction.

web-scraper-pro

Professional web scraper with Puppeteer & Mozilla Readability. Extract clean content from any website with full TypeScript support.

@axync/extract-html-main-content

@langgraph-js/crawler-mcp

A powerful web crawler designed specifically for LLM applications, capable of extracting clean, readable content from various web pages and converting it to Markdown format.

@iflow-mcp/agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

ohmyreader

A powerful web content extractor that converts articles to clean markdown

cleanweb-mcp

A lightweight MCP server for extracting clean web content with intelligent content filtering and Markdown conversion

alou-fetch-mcp

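MCP server for fetching web page content, with support for HTML, Markdown, plain text, and JSON formats; specially optimized for scraping WeChat Official Account articles and academic papers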

@mseep/hyperbrowser-mcp

Hyperbrowser Model Context Protocol Server

@leonardocerv/web-browsing-mcp

MCP server for web browsing and content extraction

next-llms-generator

Generate LLM-friendly text files from Next.js applications by crawling sitemaps and extracting content

graby-ts-site-config

Site configuration loader for Graby-TS with dynamic imports

youtube-scrap-mcp

MCP server for extracting YouTube video content with transcript processing.

web-scraper-mcp

MCP server for scraping images and text from websites with comprehensive web content extraction capabilities

@mcpflow.io/mcp-mcp-svelte-docs

🔍 MCP server that lets you search and access the Svelte documentation with built-in caching.

medium-mcp-server

LLM-optimized MCP server for fetching and processing Medium articles

@udx/mcurl

curl but in markdown - fetches content from URLs and converts to markdown

simple-reader-mode

A lightweight alternative to Mozilla's Readability library for extracting readable content from web pages

web-search-extract-mcp

A Model Context Protocol server for web search with content extraction

@mseep/fetcher-mcp

MCP server for fetching web content using Playwright browser

@mseep/agentql-mcp

Model Context Protocol (MCP) server that integrates AgentQL data extraction capabilities.

iftp-client

IFTP Service JavaScript client library for browser integration

defuddler

A command-line interface for extracting main content from web pages and articles

node-merle

A utility for cataloguing the metadata for a URL

vacuumjs

A low-level Node.js web page content extractor based on `parse5`.

web-content-extract-mcp

MCP server for extracting web content using web-content-extract library

@mseep/mcp-jinaai-search

MCP server for JinaAI search

confluence-developer-mcp

MCP Server for fetching Confluence page content with authentication

@mseep/mcp-jinaai-reader

MCP server for JinaAI reader

@philwrenn/fetcher-mcp

MCP server for fetching web content using Playwright browser

@iflow-mcp/webscraping-ai-mcp-server

Model Context Protocol server for WebScraping.AI API. Provides LLM-powered web scraping tools with Chromium JavaScript rendering, rotating proxies, and HTML parsing.

@iflow-mcp/fetcher-mcp

MCP server for fetching web content using Playwright browser

@iflow-mcp/hyperbrowser

Hyperbrowser Model Context Protocol Server

search-agent

Oblien Search SDK - AI-powered web search, content extraction, and website crawling. Full documentation at https://oblien.com/docs/search-api

@mcpflow.io/mcp-svelte-docs

MCP server for Svelte docs

@seokaka/mcp-search

Enhanced MCP Server for intelligent search with real-time data extraction, AI integration, and Vietnamese financial content support