Query Assistant MCP Server
A Model Context Protocol (MCP) server that helps generate queries using semantic search over training examples. This server provides few-shot learning capabilities by finding similar questions and their corresponding queries to guide query generation for various query languages (Cypher, SPARQL, SQL, etc.).
Features
- Semantic Search: Uses OpenAI embeddings to find similar questions in your training dataset
- Few-Shot Learning: Returns relevant examples to help generate accurate queries
- Training Data Management: Add, list, and manage question-query pairs with duplicate detection
- Vector Storage: Efficient similarity search using HNSW (Hierarchical Navigable Small World) algorithm
- Metadata Support: Organize examples by domain, complexity, and tags
- Multi-Language Support: Works with various query languages (Cypher, SPARQL, SQL, etc.)
Installation
Prerequisites: Node.js 18+ and npm
Build the server:
cd /Users/alkhalili/Documents/Cline/MCP/mcp-query-assistant
npm install
npm run build
Configure OpenAI API Key:
- Get your API key from OpenAI Platform
- Update the MCP settings file at:
/Users/alkhalili/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
- Replace your_openai_api_key_here with your actual API key
Configure Data Directory (Optional):
- By default, data is stored in the data/ folder relative to the project
- To use a custom data directory, you can:
  - Set the DATA_DIR environment variable, or
  - Use the --data-dir command line argument
Server Configuration: The server is already configured in your MCP settings as:
"query-assistant": {
  "command": "npx",
  "args": ["mcp-query-assistant"],
  "env": {
    "OPENAI_API_KEY": "your_openai_api_key_here",
    "DATA_DIR": "/optional/custom/data/path"
  },
  "disabled": false,
  "autoApprove": []
}
Alternative with command line arguments:
"query-assistant": {
  "command": "npx",
  "args": [
    "mcp-query-assistant",
    "--openai-key", "your_openai_api_key_here",
    "--data-dir", "/custom/data/path"
  ],
  "disabled": false,
  "autoApprove": []
}
Available Tools
1. find_similar_queries
Find similar query examples based on a natural language question.
Parameters:
- question (required): The natural language question to find similar examples for
- limit (optional): Maximum number of similar examples to return (default: 3, max: 10)
- threshold (optional): Minimum similarity threshold, from 0 to 1 (default: 0.7)
Example Usage:
Use the find_similar_queries tool to find examples for: "Give me the list of CDEs in the lineage"
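How the limit and threshold parameters might interact can be sketched as follows. This is a minimal illustration, not the server's actual code: the ScoredExample type and selectExamples function are hypothetical names, and treating the threshold as inclusive is an assumption.

```typescript
// Hypothetical shape of a scored match; the server's real types are not shown in this README.
interface ScoredExample {
  question: string;
  query: string;
  similarity: number; // similarity score between 0 and 1
}

// Keep matches at or above the threshold, best first, capped at `limit`.
// The defaults (3 and 0.7) and the maximum of 10 mirror the documented parameter values.
function selectExamples(
  candidates: ScoredExample[],
  limit: number = 3,
  threshold: number = 0.7,
): ScoredExample[] {
  const cappedLimit = Math.min(Math.max(1, Math.floor(limit)), 10);
  return candidates
    .filter((c) => c.similarity >= threshold) // assumption: threshold is inclusive
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, cappedLimit);
}
```

Under this sketch, lowering the threshold widens the candidate pool, while limit only trims the already sorted result.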
2. add_training_example
Add a new question-query pair to the training dataset.
Parameters:
- question (required): The natural language question
- query (required): The corresponding query (Cypher, SPARQL, SQL, etc.)
- metadata (optional): Additional metadata (domain, complexity, tags)
Example Usage:
Add a training example:
Question: "Find users who bought expensive products"
Query: "MATCH (u:User)-[:PURCHASED]->(p:Product) WHERE p.price > 1000 RETURN u"
Metadata: {"domain": "user_analytics", "complexity": "medium"}
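The argument shape described above could be modeled roughly like this. The type and function names are hypothetical, and the validation rules (non-empty strings, metadata as an object) are assumptions rather than the server's documented behavior.

```typescript
// Hypothetical types for the add_training_example arguments.
// The README only shows "simple" and "medium" as complexity values, so it is typed as a plain string.
interface ExampleMetadata {
  domain?: string;
  complexity?: string;
  tags?: string[];
}

interface AddTrainingExampleArgs {
  question: string;
  query: string;
  metadata?: ExampleMetadata;
}

// Check that the two required fields are present before accepting the example.
function validateArgs(input: unknown): AddTrainingExampleArgs {
  if (typeof input !== "object" || input === null) {
    throw new Error("arguments must be an object");
  }
  const { question, query, metadata } = input as Record<string, unknown>;
  if (typeof question !== "string" || question.trim() === "") {
    throw new Error("question is required");
  }
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query is required");
  }
  if (metadata !== undefined && (typeof metadata !== "object" || metadata === null)) {
    throw new Error("metadata must be an object");
  }
  return { question, query, metadata: metadata as ExampleMetadata | undefined };
}
```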
3. list_training_examples
List all training examples in the dataset.
Parameters:
- limit (optional): Maximum number of examples to return (default: 10, max: 100)
- domain (optional): Filter by domain
4. find_duplicates
Find duplicate training examples based on question and query.
Parameters: None
Example Usage:
Use find_duplicates to identify duplicate training examples in your dataset.
5. remove_duplicates
Remove duplicate training examples, keeping only the first occurrence of each unique question-query pair.
Parameters:
- confirm (optional): Set to true to confirm removal of duplicates (default: false)
Example Usage:
Use remove_duplicates with confirm=true to clean up duplicate examples.
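The "keep the first occurrence" rule can be illustrated with a short sketch. It assumes duplicates are detected by exact string equality of both question and query; the server may normalize text differently, and the names below are hypothetical.

```typescript
// Hypothetical minimal record; real entries may carry metadata as well.
interface TrainingExample {
  question: string;
  query: string;
}

// Return the examples with later duplicates dropped, preserving original order.
function removeDuplicates(examples: TrainingExample[]): TrainingExample[] {
  const seen = new Set<string>();
  const unique: TrainingExample[] = [];
  for (const example of examples) {
    // JSON-encode the pair so the key is unambiguous for any question or query text.
    const key = JSON.stringify([example.question, example.query]);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(example);
    }
  }
  return unique;
}
```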
Default Training Examples
The server comes with 1 default example covering data lineage patterns:
- Data Lineage: "Give me the list of CDEs in the lineage"
  - Query: MATCH (cde:CDE) RETURN cde.name, cde.description, cde.layer, cde.fqn ORDER BY cde.name
  - Domain: Data Lineage
  - Complexity: simple
Usage Workflow
- Ask for similar queries: When you need to write a query, use find_similar_queries with your natural language question
- Get few-shot examples: The server returns similar questions and their queries with similarity scores
- Generate your query: Use the examples as guidance to write your specific query
- Add new examples: Use add_training_example to expand your training dataset with new patterns
- Manage duplicates: Use find_duplicates and remove_duplicates to keep your dataset clean
Data Storage
- Training Data: Stored in {DATA_DIR}/training_data.json (default: data/training_data.json)
- Vector Index: Stored in {DATA_DIR}/vector_index.bin (default: data/vector_index.bin)
- Embeddings: Generated using OpenAI's text-embedding-3-small model (1536 dimensions)
- Data Directory: Configurable via the DATA_DIR environment variable or the --data-dir command line argument
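One way the data directory and file paths could be resolved is sketched below. The function names are hypothetical, and the precedence (command line argument over environment variable over the default) is an assumption; this README does not state which setting wins when both are given.

```typescript
// Pick the data directory from CLI arguments, then the environment, then the default.
// Assumption: --data-dir takes precedence over DATA_DIR.
function resolveDataDir(
  argv: string[],
  env: Record<string, string | undefined>,
): string {
  const flagIndex = argv.indexOf("--data-dir");
  if (flagIndex !== -1 && flagIndex + 1 < argv.length) {
    return argv[flagIndex + 1];
  }
  return env["DATA_DIR"] || "data";
}

// Build the two storage paths named in the list above.
function storagePaths(dataDir: string): { trainingData: string; vectorIndex: string } {
  const base = dataDir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    trainingData: `${base}/training_data.json`,
    vectorIndex: `${base}/vector_index.bin`,
  };
}
```

In a Node.js process this would typically be called as resolveDataDir(process.argv.slice(2), process.env).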
Example Interaction
User: "Show me all data elements with their descriptions"
Agent: Let me find similar examples for you.
[Uses find_similar_queries tool]
Server Response:
Found 1 similar examples for: "Show me all data elements with their descriptions"
Example 1 (similarity: 0.823):
Question: Give me the list of CDEs in the lineage
Query: MATCH (cde:CDE) RETURN cde.name, cde.description, cde.layer, cde.fqn ORDER BY cde.name
Domain: Data Lineage
Complexity: simple
Agent: Based on this example, here's a query for showing data elements with descriptions:
MATCH (cde:CDE)
RETURN cde.name, cde.description
ORDER BY cde.name
Troubleshooting
- "OpenAI API key not configured": Make sure you've set your API key in the MCP settings
- "No similar examples found": Try lowering the similarity threshold or adding more training data
- Server not connecting: Check that the build path is correct and the server compiled successfully
Contributing
To add more training examples or improve the server:
- Use the add_training_example tool to add new question-query pairs
- Organize examples with appropriate metadata (domain, complexity, tags)
- Test similarity search with various question phrasings
- Consider adding domain-specific examples for your use case
Technical Details
- Vector Database: HNSW (Hierarchical Navigable Small World) for efficient similarity search
- Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
- Similarity Metric: Cosine similarity
- Storage Format: JSON for training data, binary for vector index
- Max Capacity: 10,000 training examples (configurable)
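For reference, the cosine similarity metric named above can be computed as in this minimal sketch. It is a plain illustration of the formula and says nothing about how the HNSW index evaluates it internally; the function name is illustrative.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero length to avoid dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Vectors pointing in the same direction score 1 and orthogonal vectors score 0, which is why the threshold parameter of find_similar_queries is expressed on a 0-1 scale.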