JSPM

website-to-markdown-mcp

1.2.0
  • License MIT

Advanced MCP server for fetching websites and converting to markdown with AI-powered features and stealth capabilities

Package Exports

  • website-to-markdown-mcp
  • website-to-markdown-mcp/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (website-to-markdown-mcp) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

🌐 Website to Markdown MCP Server

A powerful Model Context Protocol (MCP) server designed for fetching website content and converting it to Markdown format, making it easier for AI to understand and process website information.

✨ Key Features

| 🌟 Enhanced Processing | 📊 OpenAPI Support | ⚙️ Smart Analysis | 🎯 Advanced Extraction |
|---|---|---|---|
| AI-powered content cleanup | OpenAPI 3.x/Swagger 2.0 | Reading time calculation | Main content detection |
| Auto ad removal | Professional validation | Word count statistics | Language detection |
| Content summarization | Structured API parsing | Smart retry mechanism | Multi-format support |

🆕 What's New in v1.2.0

🚀 Major Enhancements

| Feature | Status | Description |
|---|---|---|
| 🧠 Enhanced Content Processor | ✅ | AI-powered content cleaning and extraction |
| 📊 Smart Analytics | ✅ | Word count, reading time, content summary |
| 🌍 Language Detection | ✅ | Automatic language identification |
| 🎯 Intelligent Retry | ✅ | Smart retry mechanism with exponential backoff |
| 🔐 Stealth Browser | ✅ | Anti-detection browsing capabilities |
| ⚡ Rate Limiting | ✅ | Built-in rate limiting and concurrency control |
| 🧹 Content Cleanup | ✅ | Remove ads, navigation, and irrelevant content |
| 📝 Enhanced Markdown | ✅ | Support for strikethrough, underline, highlights |

🚀 Quick Start

πŸ’‘ Easiest way: No local installation needed!

Step 1: Create Configuration File 📄

Create a my-websites.json file:

{
  "websites": [
    {
      "name": "your_website",
      "url": "https://your-website.com",
      "description": "Your Project Website"
    },
    {
      "name": "api_docs",
      "url": "https://api.example.com/openapi.json",
      "description": "Your API Specification"
    }
  ]
}
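
The shape of this file can be checked before handing it to the server. The following is a minimal sketch, not the package's own validation: `parseWebsitesConfig` and `requireString` are hypothetical helpers, and treating all three fields as required non-empty strings is an assumption based on the example above. The package lists zod among its dependencies, but this sketch uses plain checks so it stays self-contained.

```typescript
// Shape of one entry in my-websites.json, taken from the example above.
interface WebsiteEntry {
  name: string;
  url: string;
  description: string;
}

interface WebsitesConfig {
  websites: WebsiteEntry[];
}

// Hypothetical helper: reject anything that is not a non-empty string.
function requireString(value: unknown, label: string): string {
  if (typeof value !== "string" || value.length === 0) {
    throw new Error(label + " must be a non-empty string");
  }
  return value;
}

// Hypothetical helper: parse the file text and check its structure.
function parseWebsitesConfig(text: string): WebsitesConfig {
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null) {
    throw new Error("config must be a JSON object");
  }
  const list = (data as { websites?: unknown }).websites;
  if (!Array.isArray(list)) {
    throw new Error('config must contain a "websites" array');
  }
  const websites = list.map((item: unknown, i: number): WebsiteEntry => {
    if (typeof item !== "object" || item === null) {
      throw new Error("websites[" + i + "] must be an object");
    }
    const fields = item as { name?: unknown; url?: unknown; description?: unknown };
    return {
      name: requireString(fields.name, "websites[" + i + "].name"),
      url: requireString(fields.url, "websites[" + i + "].url"),
      description: requireString(fields.description, "websites[" + i + "].description"),
    };
  });
  return { websites };
}
```

Calling `parseWebsitesConfig` on the file contents either returns the typed config or throws an error naming the offending field.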

Step 2: Configure MCP Server ⚙️

Add to .cursor/mcp.json:

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "npx",
      "args": ["-y", "website-to-markdown-mcp"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}

Step 3: Restart and Test πŸ”„

  1. Restart Cursor
  2. Open Chat and use Agent mode
  3. Test command: Please list all configured websites

🎉 Done! No installation required!


🎯 Method 2: Local Installation

πŸ’‘ Best Practice: Use this method for development or customization!

Step 1: Clone and Build

git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp
npm install
npm run build

Step 2: Configure MCP Server

Add to .cursor/mcp.json (this example starts Node through the Windows cmd shell):

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-websites.json"
      }
    }
  }
}

🔥 Enhanced Output Features

📊 Rich Content Analysis

Each fetched page now includes:

  • 📝 Content Summary: AI-generated summary of the main content
  • ⏱️ Reading Time: Estimated reading time based on content length
  • 🔢 Word Count: Accurate word count for both English and Chinese
  • 🌍 Language Detection: Automatic language identification
  • 🎯 Content Quality Score: Assessment of content relevance
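
The word count and reading time in this list could be computed along the following lines. This is a minimal sketch, not the package's implementation: the function names, the rule that each Chinese character counts as one word, and the rate of 250 words per minute are all assumptions. The rate was chosen only because it reproduces the sample figures in this README (1,250 words at 5 minutes, 650 words at 3 minutes).

```typescript
// Count words in mixed English/Chinese text.
// Assumption: each CJK ideograph counts as one word, and each run of
// Latin letters or digits (optionally joined by ' or -) counts as one word.
function countWords(text: string): number {
  const cjkPattern = /[\u4e00-\u9fff]/g;
  const cjk = text.match(cjkPattern) || [];
  const latin = text
    .replace(cjkPattern, " ")
    .match(/[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*/g) || [];
  return cjk.length + latin.length;
}

// Estimate reading time in whole minutes, rounding up, with a 1-minute floor.
// Assumption: 250 words per minute.
function readingTimeMinutes(wordCount: number, wordsPerMinute: number = 250): number {
  return Math.max(1, Math.ceil(wordCount / wordsPerMinute));
}
```

Rounding up keeps short pages from reporting a reading time of zero minutes.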

📋 Enhanced Markdown Output

# 🚀 Example Website

**Source**: https://example.com
**Website**: example_site - Example Website
**📊 Reading Time**: 5 minutes
**🔢 Word Count**: 1,250 words
**🌍 Language**: English
**📝 Summary**: This article discusses the latest developments in web technology...

---

[Enhanced Markdown content with better formatting...]

🆕 Complete OpenAPI/Swagger Support

🔥 Professional API Documentation

| Feature | OpenAPI 3.x | Swagger 2.0 | Description |
|---|---|---|---|
| 🔍 Auto Detection | ✅ | ✅ | Support JSON/YAML formats |
| ✅ Professional Validation | ✅ | ✅ | Using @readme/openapi-parser |
| 📋 Structured Parsing | ✅ | ✅ | Endpoints, parameters, responses |
| 🔗 Reference Resolution | ✅ | ✅ | Auto handle $ref references |
| 📊 Smart Summary | ✅ | ✅ | Generate API overview |
| 📝 Formatted Output | ✅ | ✅ | Readable Markdown |
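
Auto detection can rest on the version field that each specification format defines: OpenAPI 3.x documents carry a top-level `openapi` string such as `3.0.3`, and Swagger 2.0 documents carry `swagger` set to `"2.0"`. The sketch below shows that check on an already-parsed document. `detectSpecKind` is a hypothetical name, and this is not necessarily how the package detects specs. YAML input would need to be parsed first, for example with the `yaml` package listed under Dependencies.

```typescript
type SpecKind = "openapi-3" | "swagger-2" | "unknown";

// Classify a parsed JSON document by its top-level version field.
function detectSpecKind(doc: unknown): SpecKind {
  if (typeof doc !== "object" || doc === null) {
    return "unknown";
  }
  const fields = doc as { openapi?: unknown; swagger?: unknown };
  // OpenAPI 3.x: "openapi": "3.0.3", "3.1.0", ...
  if (typeof fields.openapi === "string" && /^3\./.test(fields.openapi)) {
    return "openapi-3";
  }
  // Swagger 2.0: "swagger": "2.0"
  if (fields.swagger === "2.0") {
    return "swagger-2";
  }
  return "unknown";
}
```

A page that parses as JSON but yields `unknown` can then fall through to the ordinary HTML-to-Markdown path.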

🌟 Pre-configured Example Websites

{
  "websites": [
    {
      "name": "petstore_openapi",
      "url": "https://petstore3.swagger.io/api/v3/openapi.json",
      "description": "🐕 Swagger Petstore OpenAPI 3.0 Spec (Demo)"
    },
    {
      "name": "petstore_swagger",
      "url": "https://petstore.swagger.io/v2/swagger.json",
      "description": "🐱 Swagger Petstore Swagger 2.0 Spec (Demo)"
    },
    {
      "name": "github_api",
      "url": "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json",
      "description": "🐙 GitHub REST API OpenAPI Spec"
    }
  ]
}

📦 Installation & Setup

🛠️ System Requirements

  • Node.js 20.18.1+ (Recommended: v22.15.0 LTS)
  • npm 10.0.0+ or yarn
  • Cursor Editor

⚠️ Important: Some dependencies require Node.js v20.18.1 or higher. Please update your Node.js version if you encounter engine compatibility warnings.

⚡ NPM Package Installation

# Global installation
npm install -g website-to-markdown-mcp

# Or use directly with npx (recommended)
npx website-to-markdown-mcp

🔧 Development Setup

# 1. Clone repository
git clone https://github.com/your-username/website-to-markdown-mcp.git
cd website-to-markdown-mcp

# 2. Install dependencies
npm install

# 3. Build project
npm run build

🎛️ Advanced Configuration Options

Configuration Priority Order

graph TD
    A[🔍 Check Environment Variable<br/>WEBSITES_CONFIG_PATH] --> B{File exists?}
    B -->|Yes| C[✅ Load External Config File]
    B -->|No| D[🔍 Check Environment Variable<br/>WEBSITES_CONFIG]
    D --> E{Valid JSON?}
    E -->|Yes| F[✅ Load Embedded Config]
    E -->|No| G[🔍 Check config.json]
    G --> H{File exists?}
    H -->|Yes| I[✅ Load Local Config]
    H -->|No| J[🔧 Use Default Config]
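
The flowchart above can be read as a four-step fallback. The sketch below encodes that order as a pure function. `resolveConfigSource`, `ResolveInputs`, and the injected `fileExists` callback are hypothetical names used for illustration, not the package's API. The environment values are passed in as arguments so the logic can be exercised without touching the real environment or file system.

```typescript
type ConfigSource = "external-file" | "embedded-json" | "local-config" | "default";

interface ResolveInputs {
  configPath?: string;                    // value of WEBSITES_CONFIG_PATH, if set
  embeddedJson?: string;                  // value of WEBSITES_CONFIG, if set
  fileExists: (path: string) => boolean;  // injected file-system check
}

function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

// Walk the priority order from the flowchart and report which source wins.
function resolveConfigSource(inputs: ResolveInputs): ConfigSource {
  if (inputs.configPath !== undefined && inputs.fileExists(inputs.configPath)) {
    return "external-file";
  }
  if (inputs.embeddedJson !== undefined && isValidJson(inputs.embeddedJson)) {
    return "embedded-json";
  }
  if (inputs.fileExists("config.json")) {
    return "local-config";
  }
  return "default";
}
```

In a Node process, `fileExists` could be backed by `fs.existsSync`, and the two optional fields by the corresponding `process.env` entries.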

🎨 Configuration Method Details

📋 Method 1: External Config File

💡 Advantages: Easy to edit, syntax highlighting, version control friendly

🔧 Detailed Setup Steps
  1. Create Configuration File

    # Can be placed anywhere
    touch my-api-configs.json
  2. Edit Configuration Content

    {
      "websites": [
        {
          "name": "my_docs",
          "url": "https://docs.example.com",
          "description": "📚 My Documentation Website"
        }
      ]
    }
  3. Set Environment Variable

    {
      "env": {
        "WEBSITES_CONFIG_PATH": "./my-api-configs.json"
      }
    }

📋 Method 2: Embedded JSON (Backward Compatible)

🔧 Configuration Example

{
  "mcpServers": {
    "website-to-markdown": {
      "command": "cmd",
      "args": ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
      "disabled": false,
      "env": {
        "WEBSITES_CONFIG": "{\"websites\":[{\"name\":\"example\",\"url\":\"https://example.com\",\"description\":\"Example Website\"}]}"
      }
    }
  }
}
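
Escaping the embedded JSON by hand is error-prone, so it may be easier to generate it. The sketch below serializes the website list once to get the `WEBSITES_CONFIG` string and a second time to produce the surrounding file; the second pass adds the backslash escapes seen in the example above. `buildMcpConfig` is a hypothetical helper, and the `command` and `args` values are simply copied from that example.

```typescript
interface SiteEntry {
  name: string;
  url: string;
  description: string;
}

// Build the text of an mcp.json file with the website list embedded as a string.
function buildMcpConfig(websites: SiteEntry[]): string {
  // First pass: the website list becomes a compact JSON string.
  const embedded = JSON.stringify({ websites: websites });
  const config = {
    mcpServers: {
      "website-to-markdown": {
        command: "cmd",
        args: ["/c", "node", "./website-to-markdown-mcp/dist/index.js"],
        disabled: false,
        env: { WEBSITES_CONFIG: embedded },
      },
    },
  };
  // Second pass: serializing the outer object escapes the inner quotes.
  return JSON.stringify(config, null, 2);
}

const mcpJson = buildMcpConfig([
  { name: "example", url: "https://example.com", description: "Example Website" },
]);
console.log(mcpJson);
```

The printed text can be pasted into .cursor/mcp.json as is.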

📋 Method 3: Local config.json

🔧 Local Configuration

Directly edit config.json in the project root directory:

{
  "websites": [
    {
      "name": "local_site",
      "url": "https://local.example.com",
      "description": "🏠 Local Test Website"
    }
  ]
}

🔧 Available Tools

🌐 General Tools

| Tool Name | Function | Parameters | Example |
|---|---|---|---|
| fetch_website | Fetch any website | url: Website URL | Fetch OpenAPI spec files |
| list_configured_websites | List configured websites | None | View all available websites |

🎯 Dedicated Tools

Each configured website automatically generates corresponding dedicated tools:

  • fetch_petstore_openapi - Fetch Petstore OpenAPI 3.0 spec
  • fetch_petstore_swagger - Fetch Petstore Swagger 2.0 spec
  • fetch_github_api - Fetch GitHub API spec
  • fetch_tailwind_css - Fetch Tailwind CSS documentation
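
Judging by the names in this list, each dedicated tool appears to be named `fetch_` followed by the website's configured `name`. The sketch below applies that inferred rule; the helper names are hypothetical and the package may derive or sanitize names differently.

```typescript
// The two general tools listed in the table above.
const GENERAL_TOOLS: string[] = ["fetch_website", "list_configured_websites"];

// Inferred rule: dedicated tool name = "fetch_" + configured website name.
function dedicatedToolName(site: { name: string }): string {
  return "fetch_" + site.name;
}

// Full tool list for a given set of configured websites.
function allToolNames(sites: { name: string }[]): string[] {
  return GENERAL_TOOLS.concat(sites.map(dedicatedToolName));
}

console.log(allToolNames([{ name: "petstore_openapi" }, { name: "github_api" }]));
```

Under this rule, website names need to be unique, since two entries with the same name would map to the same tool.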

📊 Enhanced Output Format Examples

🌐 General Website Content with Analytics

# Website Title

**Source**: https://example.com
**Website**: example_site - Example Website
**📊 Reading Time**: 3 minutes
**🔢 Word Count**: 650 words
**🌍 Language**: English
**📝 Summary**: This article provides a comprehensive overview of modern web development practices, covering frontend frameworks, backend technologies, and deployment strategies.

---

[Enhanced cleaned Markdown content with ads removed and main content extracted...]

📋 OpenAPI 3.x Specification File

# 🚀 Example API (v2.1.0)

**Source**: https://api.example.com/openapi.json
**OpenAPI Version**: 3.0.3
**Validation Status**: ✅ Valid
**📊 Processing Time**: 1.2 seconds
**🔢 Endpoints**: 25 endpoints
**🌍 Server Locations**: 3 servers

---

## 📋 API Basic Information

- **API Name**: Example API
- **Version**: 2.1.0
- **OpenAPI Version**: 3.0.3
- **Description**: A powerful example API for modern applications

## 🌐 Servers

1. **https://api.example.com**
   - 🏢 Production server
2. **https://staging-api.example.com**
   - 🧪 Testing server

## 🛠️ API Endpoints

Total of **25** endpoints:

### 👥 `/users`
- **GET**: Get user list
- **POST**: Create new user

### 🔍 `/users/{id}`
- **GET**: Get specific user
- **PUT**: Update user information
- **DELETE**: Delete user

## 🧩 Components

- **Schemas**: 12 data models
- **Parameters**: 8 reusable parameters  
- **Responses**: 15 reusable responses
- **Security Schemes**: 3 security mechanisms

🎯 Usage Examples

💻 Basic Usage

Please fetch the content from https://docs.example.com and convert to markdown

🔍 OpenAPI Specification Fetching

Please use the fetch_petstore_openapi tool to fetch Petstore OpenAPI specification

📚 Documentation Website Fetching

Please fetch React official documentation content

🚨 Troubleshooting

📋 Complete Troubleshooting Guide: See TROUBLESHOOTING.md for detailed solutions to common issues.

❓ Quick Solutions

🔧 Node.js Version Issues

Error: npm WARN EBADENGINE Unsupported engine

  • Solution: Update Node.js to v20.18.1 or higher

🌐 Module Not Found Issues

Error: Cannot find module './db.json'

  • Solution 1: Clear npm cache: npm cache clean --force
  • Solution 2: Update Node.js version
  • Solution 3: Use local installation instead of npx

⚙️ Configuration Issues

Q: Configuration changes not taking effect?

  • ✅ Confirm JSON format is correct
  • ✅ Restart Cursor
  • ✅ Check environment variable names

Q: JSON format errors?

  • 🛠️ Use JSON Validator
  • 🛠️ Confirm using double quotes
  • 🛠️ Check for extra commas
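
The last two checks can be automated, because the standard `JSON.parse` rejects both single-quoted strings and trailing commas. The sketch below wraps it in a hypothetical `checkJson` helper that returns the parser's error message instead of throwing. The exact wording of that message varies between JavaScript engines.

```typescript
interface JsonCheck {
  ok: boolean;
  message: string;
}

// Try to parse the text and report the parser's complaint on failure.
function checkJson(text: string): JsonCheck {
  try {
    JSON.parse(text);
    return { ok: true, message: "valid JSON" };
  } catch (e) {
    return { ok: false, message: e instanceof Error ? e.message : String(e) };
  }
}

console.log(checkJson('{"websites": []}').ok);   // well-formed
console.log(checkJson("{'websites': []}").ok);   // single quotes are rejected
console.log(checkJson('{"websites": [],}').ok);  // trailing comma is rejected
```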

🔍 Debug Mode

Detailed logs are output to stderr at startup:

# View debug messages
npm run dev 2> debug.log

📈 Performance & Optimization

⚡ Performance Features

  • 🚀 Smart Retry: Intelligent retry with exponential backoff
  • 💾 Rate Limiting: Built-in rate limiting to prevent overload
  • 🎯 Content Filtering: Remove irrelevant content for faster processing
  • 🧹 Ad Removal: Automatic ad and popup removal
  • 📊 Stealth Mode: Anti-detection browsing capabilities
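
Retry with exponential backoff means doubling the wait after each failed attempt, usually up to a cap. The sketch below shows the general technique rather than the package's code: `backoffDelayMs` and `withRetry` are hypothetical names, and the 500 ms base delay, 8000 ms cap, and four attempts are assumed values.

```typescript
// Delay before the next try: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

function defaultSleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });
}

// Run the task, retrying on failure with growing delays between attempts.
// The sleep function is injectable so callers can skip real waiting in tests.
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts: number = 4,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<T> {
  let lastError: unknown = new Error("withRetry: maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // No point waiting after the final failed attempt.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

With these values the waits between attempts are 500 ms, 1000 ms, and 2000 ms. Production implementations often add random jitter to each delay so that many clients do not retry in lockstep.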

🛡️ Security Considerations

  • 🔒 HTTPS websites only (recommended)
  • 🛠️ Auto filter malicious scripts
  • 📏 Limit output content length
  • 🔐 Stealth browsing to avoid detection

📦 Dependencies

| Package | Version | Purpose |
|---|---|---|
| @modelcontextprotocol/sdk | ^1.0.0 | MCP Core Framework |
| @readme/openapi-parser | ^4.1.0 | Professional OpenAPI Parsing |
| axios | ^1.6.0 | HTTP Request Handling |
| cheerio | ^1.0.0 | HTML Parsing Engine |
| turndown | ^7.1.2 | HTML to Markdown |
| yaml | ^2.8.0 | YAML Format Support |
| zod | ^3.22.0 | Data Validation Framework |
| playwright | ^1.40.0 | Browser automation |

📝 Changelog

🎉 v1.2.0 (Latest)

🚀 Major Feature Updates

  • ✨ Added: Enhanced content processing with AI-powered cleanup
  • ✨ Added: Smart analytics: word count, reading time, content summary
  • ✨ Added: Language detection and multi-language support
  • ✨ Added: Stealth browser capabilities for anti-detection
  • ✨ Added: Built-in rate limiting and retry mechanisms
  • ✨ Added: Advanced content filtering and ad removal
  • 🔧 Enhanced: Markdown processing with more HTML element support
  • 📊 Improved: Output format with rich metadata
  • 🎯 Fixed: Various technical issues and dependencies

🎯 v1.1.0 (Previous)

🚀 Major Feature Updates

  • ✨ Added: Full OpenAPI 3.x/Swagger 2.0 support
  • ✨ Added: JSON/YAML format auto-detection
  • ✨ Added: Professional-grade spec validation and reference resolution
  • ✨ Added: Version auto-adaptation mechanism
  • ✨ Added: Structured API documentation summary
  • 🔧 Pre-configured: Multiple OpenAPI/Swagger examples
  • 📦 Added: NPM package distribution with npx support
  • 🎯 Enhanced: Installation methods for better user experience

🎯 v1.0.0 (Stable)

  • 🎉 Initial Release
  • 🌐 Basic Functions: Website content fetching
  • 📝 Core Functions: Markdown conversion
  • ⚙️ Configuration Support: Multi-website management

🤝 Contributing

πŸ’‘ How to Contribute

  1. 🍴 Fork this project
  2. 🌟 Create feature branch (git checkout -b feature/AmazingFeature)
  3. 📝 Commit changes (git commit -m 'Add some AmazingFeature')
  4. 📤 Push to branch (git push origin feature/AmazingFeature)
  5. πŸ”„ Open Pull Request

🐛 Issue Reporting

Report issues on the Issues page. Please include:

  • 🔍 Issue Description
  • 🔄 Reproduction Steps
  • 💻 Environment Information
  • 📸 Screenshots or Logs

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🌟 If this project helps you, please give it a Star!

💬 Have questions or suggestions? Feel free to open an Issue!


Made by Sun ❤️ for the Developer Community