Package Exports
- @podx/scraper
- @podx/scraper/analyzers
- @podx/scraper/auth
- @podx/scraper/crypto
- @podx/scraper/scrapers
- @podx/scraper/services
- @podx/scraper/signals
@podx/scraper
The scraper package provides comprehensive Twitter/X data collection, analysis, and signal generation capabilities for the PODx ecosystem. It includes advanced scraping algorithms, bot detection, sentiment analysis, cryptocurrency token extraction, and trading signal generation.
Installation
# Install from workspace
bun add @podx/scraper@workspace:*
# Or install from npm (when published)
bun add @podx/scraper
Architecture
The scraper package is organized into several specialized modules:
packages/scraper/src/
├── scrapers/                 # Core scraping functionality
│   ├── baseScraper.ts        # Base scraper with authentication
│   ├── searchScraper.ts      # Search-based tweet scraping
│   ├── commentScraper.ts     # Comment/reply scraping
│   └── index.ts              # Scraper exports
├── services/                 # Service layer
│   └── index.ts              # Main ScraperService
├── auth/                     # Authentication handling
├── analyzers/                # Data analysis modules
│   ├── BotDetector.ts        # Bot detection algorithms
│   ├── SentimentAnalyzer.ts  # Sentiment analysis
│   ├── SignalGenerator.ts    # Trading signal generation
│   ├── TokenExtractor.ts     # Cryptocurrency token extraction
│   └── index.ts              # Analyzer exports
├── crypto/                   # Cryptocurrency analysis
├── signals/                  # Signal processing and generation
├── types/                    # TypeScript type definitions
└── index.ts                  # Main exports
Quick Start
import { ScraperService, SentimentAnalyzer, TokenExtractor } from '@podx/scraper';
// Initialize scraper service
const scraper = new ScraperService();
// Scrape tweets from a user
const tweets = await scraper.scrapeAccount({
targetUsername: 'cryptowhale',
maxTweets: 100,
progressCallback: (progress) => {
console.log(`Scraped ${progress.count}/${progress.max} tweets`);
}
});
// Analyze sentiment
const analyzer = new SentimentAnalyzer();
const sentiment = await analyzer.analyze(tweets);
// Extract cryptocurrency tokens
const extractor = new TokenExtractor();
const tokens = await extractor.extract(tweets);
// Save results
const result = await scraper.saveTweetsToFile(tweets, 'cryptowhale');
console.log(`Saved ${tweets.length} tweets to ${result.filename}`);
Authentication
The scraper supports multiple authentication methods for Twitter/X API access:
Environment Variables Setup
# Required credentials
export XSERVE_USERNAME="your_twitter_username"
export XSERVE_PASSWORD="your_twitter_password"
export XSERVE_EMAIL="your_email@example.com" # Optional, for account recovery
Authentication Flow
import { ScraperService } from '@podx/scraper';
const scraper = new ScraperService();
// Authentication happens automatically on first API call
try {
const tweets = await scraper.scrapeAccount({
targetUsername: 'example',
maxTweets: 10
});
console.log('Authentication successful!');
} catch (error) {
if (error.code === 'AUTHENTICATION_FAILED') {
console.error('Please check your Twitter credentials');
}
}
Core Scraping Features
Account Scraping
import { ScraperService } from '@podx/scraper';
const scraper = new ScraperService();
// Scrape tweets from a specific account
const tweets = await scraper.scrapeAccount({
targetUsername: 'VitalikButerin',
maxTweets: 500,
progressCallback: (progress) => {
const percent = Math.round((progress.count / progress.max) * 100);
console.log(`Progress: ${percent}% (${progress.count}/${progress.max})`);
}
});
// Process scraped tweets
tweets.forEach(tweet => {
console.log(`@${tweet.username}: ${tweet.text}`);
console.log(`Likes: ${tweet.likes}, Retweets: ${tweet.retweets}`);
});
Search-Based Scraping
import { SearchScraper } from '@podx/scraper/scrapers';
// Search for tweets with specific criteria
const searchScraper = new SearchScraper();
const tweets = await searchScraper.search({
query: 'bitcoin OR ethereum',
maxTweets: 1000,
filters: {
language: 'en',
dateRange: {
from: new Date('2024-01-01'),
to: new Date('2024-01-31')
},
minLikes: 10,
minRetweets: 5
}
});
Comment/Reply Scraping
import { CommentScraper } from '@podx/scraper/scrapers';
const commentScraper = new CommentScraper();
// Scrape replies to a specific tweet
const replies = await commentScraper.scrapeComments({
tweetId: '1234567890123456789',
maxReplies: 200,
includeNested: true // Include replies to replies
});
// Analyze conversation threads
const threads = commentScraper.buildConversationThreads(replies);
Advanced Analysis
Bot Detection
import { BotDetector } from '@podx/scraper/analyzers';
const detector = new BotDetector();
// Analyze tweets for bot-like behavior
const analysis = await detector.analyze(tweets);
analysis.results.forEach(result => {
console.log(`@${result.username}: ${result.botProbability}% bot probability`);
console.log(`Reasons: ${result.reasons.join(', ')}`);
});
// Filter out likely bots
const humanTweets = analysis.results
.filter(result => result.botProbability < 30)
.map(result => result.tweet);
Sentiment Analysis
import { SentimentAnalyzer } from '@podx/scraper/analyzers';
const sentimentAnalyzer = new SentimentAnalyzer();
// Analyze sentiment of tweets
const sentimentResults = await sentimentAnalyzer.analyze(tweets);
sentimentResults.forEach(result => {
console.log(`Tweet: ${result.tweet.text}`);
console.log(`Sentiment: ${result.sentiment} (${result.confidence}%)`);
console.log(`Emotions: ${result.emotions.join(', ')}`);
});
// Aggregate sentiment over time
const timeSeries = sentimentAnalyzer.createSentimentTimeSeries(sentimentResults);
Cryptocurrency Token Extraction
import { TokenExtractor } from '@podx/scraper/analyzers';
const tokenExtractor = new TokenExtractor();
// Extract cryptocurrency mentions and addresses
const tokenResults = await tokenExtractor.extract(tweets);
tokenResults.forEach(result => {
console.log(`Found ${result.tokens.length} tokens in tweet`);
result.tokens.forEach(token => {
console.log(`- ${token.symbol}: ${token.address} (${token.blockchain})`);
console.log(` Context: ${token.context}`);
console.log(` Confidence: ${token.confidence}%`);
});
});
// Get trending tokens
const trending = tokenExtractor.getTrendingTokens(tokenResults, {
timeframe: '24h',
minMentions: 5
});
Signal Generation
Trading Signals
import { SignalGenerator } from '@podx/scraper/analyzers';
const signalGenerator = new SignalGenerator();
// Generate trading signals from tweet analysis
const signals = await signalGenerator.generateSignals({
tweets,
sentimentResults,
tokenResults,
marketData: {
btcPrice: 45000,
ethPrice: 2500
}
});
signals.forEach(signal => {
console.log(`Signal: ${signal.type} for ${signal.token}`);
console.log(`Strength: ${signal.strength}/10`);
console.log(`Reason: ${signal.reason}`);
console.log(`Confidence: ${signal.confidence}%`);
console.log(`Timeframe: ${signal.timeframe}`);
});
// Filter high-confidence signals
const strongSignals = signals.filter(s => s.confidence > 80 && s.strength >= 7);
Market Sentiment Signals
// Generate market sentiment signals
const marketSignals = await signalGenerator.generateMarketSignals({
tweets,
sentimentData: sentimentResults,
tokenData: tokenResults,
marketContext: {
overallSentiment: 'bullish',
fearGreedIndex: 75,
volume24h: 1250000000
}
});
marketSignals.forEach(signal => {
console.log(`Market Signal: ${signal.type}`);
console.log(`Direction: ${signal.direction}`);
console.log(`Strength: ${signal.strength}`);
console.log(`Timeframe: ${signal.timeframe}`);
console.log(`Rationale: ${signal.rationale}`);
});
Advanced Configuration
Custom Scraping Options
import { ScraperService } from '@podx/scraper';
// Advanced scraping with custom options
const scraper = new ScraperService();
const tweets = await scraper.scrapeAccount({
targetUsername: 'crypto_influencer',
maxTweets: 1000,
filters: {
minLikes: 10,
minRetweets: 5,
dateRange: {
from: new Date('2024-01-01'),
to: new Date('2024-01-31')
},
language: 'en',
excludeReplies: false,
excludeRetweets: true
},
rateLimit: {
requestsPerMinute: 30,
delayBetweenRequests: 2000
},
retryPolicy: {
maxRetries: 3,
backoffMultiplier: 2,
initialDelay: 1000
}
});
Custom Analysis Pipeline
import {
  SentimentAnalyzer,
  TokenExtractor,
  BotDetector,
  SignalGenerator
} from '@podx/scraper/analyzers';
import type { Tweet, SentimentResult, TokenResult, Signal } from '@podx/scraper'; // see Data Types below
// Combined result shape returned by the pipeline
interface AnalysisResult {
  originalTweetCount: number;
  humanTweets: number;
  sentiment: SentimentResult[];
  tokens: TokenResult[];
  signals: Signal[];
  analysisTimestamp: Date;
}
// Create custom analysis pipeline
class CryptoAnalysisPipeline {
constructor(
private sentimentAnalyzer = new SentimentAnalyzer(),
private tokenExtractor = new TokenExtractor(),
private botDetector = new BotDetector(),
private signalGenerator = new SignalGenerator()
) {}
async analyze(tweets: Tweet[]): Promise<AnalysisResult> {
// Step 1: Filter out bots
const botAnalysis = await this.botDetector.analyze(tweets);
const humanTweets = botAnalysis.results
.filter(r => r.botProbability < 50)
.map(r => r.tweet);
// Step 2: Analyze sentiment
const sentiment = await this.sentimentAnalyzer.analyze(humanTweets);
// Step 3: Extract tokens
const tokens = await this.tokenExtractor.extract(humanTweets);
// Step 4: Generate signals
const signals = await this.signalGenerator.generateSignals({
tweets: humanTweets,
sentimentResults: sentiment,
tokenResults: tokens
});
return {
originalTweetCount: tweets.length,
humanTweets: humanTweets.length,
sentiment,
tokens,
signals,
analysisTimestamp: new Date()
};
}
}
// Use the pipeline
const pipeline = new CryptoAnalysisPipeline();
const result = await pipeline.analyze(tweets);
Data Storage and Export
File Storage
import { ScraperService } from '@podx/scraper';
const scraper = new ScraperService();
// Scrape and save to file automatically
const result = await scraper.scrapeAndSave({
targetUsername: 'cryptopunk',
maxTweets: 200
});
console.log(`Saved ${result.tweets.length} tweets to ${result.filename}`);
// Custom file naming and organization
const customResult = await scraper.saveTweetsToFile(
tweets,
'custom_username',
{
format: 'json',
compress: true,
includeMetadata: true,
splitByDate: true
}
);
Database Integration
import { DatabaseService } from '@podx/core';
import { ScraperService, SentimentAnalyzer, TokenExtractor } from '@podx/scraper';
const db = new DatabaseService(config.database);
const scraper = new ScraperService();
const analyzer = new SentimentAnalyzer();
const tokenExtractor = new TokenExtractor();
// Scrape and store in database
const tweets = await scraper.scrapeAccount({
targetUsername: 'defi_pulse',
maxTweets: 100
});
// Store with analysis
for (const tweet of tweets) {
const analysis = await analyzer.analyze([tweet]);
const tokens = await tokenExtractor.extract([tweet]);
await db.save('analyzed_tweets', {
...tweet,
sentiment: analysis[0]?.sentiment,
tokens: tokens[0]?.tokens || [],
analyzedAt: new Date()
});
}
Analytics and Reporting
Generate Reports
import { AnalyticsEngine } from '@podx/scraper/analytics';
const analytics = new AnalyticsEngine();
// Generate comprehensive report
const report = await analytics.generateReport({
tweets,
sentimentResults,
tokenResults,
signals,
timeframe: {
from: new Date('2024-01-01'),
to: new Date('2024-01-31')
}
});
// Export report
await analytics.exportReport(report, {
format: 'pdf',
includeCharts: true,
includeRawData: false
});
// Get insights
const insights = analytics.extractInsights(report);
console.log('Key Insights:');
insights.forEach(insight => {
console.log(`- ${insight.category}: ${insight.description}`);
});
Real-time Monitoring
import { RealtimeMonitor } from '@podx/scraper/monitoring';
const monitor = new RealtimeMonitor({
targetUsernames: ['cryptowhale', 'defi_pulse'],
keywords: ['bitcoin', 'ethereum', 'defi'],
updateInterval: 30000 // 30 seconds
});
// Monitor with callbacks
monitor.onTweet((tweet) => {
console.log(`New tweet from @${tweet.username}: ${tweet.text}`);
});
monitor.onSignal((signal) => {
console.log(`New signal: ${signal.type} for ${signal.token}`);
// Send notification, update dashboard, etc.
});
// Start monitoring
await monitor.start();
API Reference
ScraperService
scrapeAccount(options: ScrapingOptions): Promise<Tweet[]>
Scrapes tweets from a specific Twitter account.
Parameters:
- targetUsername: string - Twitter username to scrape
- maxTweets: number - Maximum number of tweets to scrape
- progressCallback?: (progress: ScrapingProgress) => void - Progress callback (shape sketched below)
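The ScrapingProgress shape is not spelled out in the Data Types section; a plausible definition, inferred from the progress.count / progress.max usage in the examples above and assumed here purely for illustration, is:
// Assumed shape, inferred from how progressCallback is used in the examples above
interface ScrapingProgress {
  count: number; // tweets scraped so far
  max: number;   // maxTweets requested
}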
scrapeAndSave(options: ScrapingOptions): Promise<{ tweets: Tweet[]; filename: string }>
Scrapes tweets and saves them to file.
saveTweetsToFile(tweets: Tweet[], username: string, options?): Promise<{ filename: string }>
Saves tweets to a JSON file and returns the saved filename; the optional options argument (format, compression, metadata, date-based splitting) is shown in the File Storage section.
Analyzers
SentimentAnalyzer.analyze(tweets: Tweet[]): Promise<SentimentResult[]>
Analyzes sentiment of tweets.
TokenExtractor.extract(tweets: Tweet[]): Promise<TokenResult[]>
Extracts cryptocurrency tokens from tweets.
BotDetector.analyze(tweets: Tweet[]): Promise<BotAnalysis>
Detects bot-like behavior in tweets.
SignalGenerator.generateSignals(params: SignalParams): Promise<Signal[]>
Generates trading signals from tweet analysis.
Data Types
Tweet
interface Tweet {
id: string;
username: string;
text: string;
createdAt: Date;
likes: number;
retweets: number;
replies: number;
isReply: boolean;
isRetweet: boolean;
media?: MediaData[];
hashtags: string[];
mentions: string[];
urls: string[];
}
SentimentResult
interface SentimentResult {
tweet: Tweet;
sentiment: 'positive' | 'negative' | 'neutral';
confidence: number;
emotions: string[];
score: number;
}
TokenResult
interface TokenResult {
tweet: Tweet;
tokens: TokenMention[];
}
interface TokenMention {
symbol: string;
address?: string;
blockchain: string;
context: string;
confidence: number;
}
Signal
interface Signal {
id: string;
type: 'buy' | 'sell' | 'hold' | 'alert';
token: string;
strength: number; // 1-10
confidence: number; // 0-100
reason: string;
timeframe: string;
timestamp: Date;
supportingTweets: Tweet[];
}
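These interfaces compose directly. As a purely illustrative example (assuming the interfaces above are exported from the package root), the hypothetical helper below keeps only actionable buy/sell signals from the Signal Generation section and ranks them by strength weighted by confidence:
import type { Signal } from '@podx/scraper';
// Illustrative helper: rank actionable signals by strength weighted by confidence
function rankActionableSignals(signals: Signal[]): Signal[] {
  return signals
    .filter(signal => signal.type === 'buy' || signal.type === 'sell')
    .sort((a, b) => b.strength * b.confidence - a.strength * a.confidence);
}
const ranked = rankActionableSignals(signals);
ranked.slice(0, 5).forEach(signal => {
  console.log(`${signal.type.toUpperCase()} ${signal.token}: strength ${signal.strength}/10, confidence ${signal.confidence}%`);
});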
Testing
import { describe, test, expect, mock } from 'bun:test';
import { ScraperService } from '@podx/scraper';
describe('ScraperService', () => {
test('should scrape tweets from account', async () => {
const scraper = new ScraperService();
// Mock the scraper
mock.module('agent-twitter-client', () => ({
Scraper: class {
async login() {}
async getTweets() {
return [
{
id: '1',
username: 'testuser',
text: 'Hello world!',
createdAt: new Date(),
likes: 10,
retweets: 5,
replies: 2
}
];
}
}
}));
const tweets = await scraper.scrapeAccount({
targetUsername: 'testuser',
maxTweets: 1
});
expect(tweets).toHaveLength(1);
expect(tweets[0].username).toBe('testuser');
});
test('should handle authentication errors', async () => {
const scraper = new ScraperService();
// Mock authentication failure
mock.module('agent-twitter-client', () => ({
Scraper: class {
async login() {
throw new Error('Invalid credentials');
}
}
}));
await expect(
  scraper.scrapeAccount({
    targetUsername: 'testuser',
    maxTweets: 1
  })
).rejects.toThrow('Invalid credentials');
});
});
Performance Optimization
Rate Limiting
// Configure rate limiting to avoid Twitter API limits
const scraper = new ScraperService({
rateLimit: {
requestsPerMinute: 30,
delayBetweenRequests: 2000
}
});
Caching
// Cache analysis results to improve performance
const cache = new AnalysisCache();
const analyzer = new SentimentAnalyzer({
cache: cache
});
// Results are cached automatically
const result1 = await analyzer.analyze(tweets);
const result2 = await analyzer.analyze(tweets); // Uses cache
Parallel Processing
// Process multiple accounts in parallel
const usernames = ['user1', 'user2', 'user3'];
const results = await Promise.allSettled(
usernames.map(username =>
scraper.scrapeAccount({ targetUsername: username, maxTweets: 100 })
)
);
Security Considerations
- Credential Protection: Never store credentials in code; load them from environment variables or a secrets manager (see the sketch after this list)
- Rate Limiting: Respect Twitter's API limits
- Data Privacy: Handle user data responsibly
- Error Handling: Don't expose sensitive information in errors
- Logging: Be careful with sensitive data in logs
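A minimal sketch of the credential, error-handling, and logging points, assuming the XSERVE_* variables from the Authentication section and a hypothetical redactError helper:
import { ScraperService } from '@podx/scraper';
// Credentials come from the environment, never from source code
const username = process.env.XSERVE_USERNAME;
const password = process.env.XSERVE_PASSWORD;
if (!username || !password) {
  // Fail fast without echoing any secret values
  throw new Error('Missing XSERVE_USERNAME or XSERVE_PASSWORD');
}
// Hypothetical helper: strip credential values before logging or rethrowing
function redactError(error: unknown): string {
  const message = error instanceof Error ? error.message : String(error);
  return message.replaceAll(password, '[redacted]').replaceAll(username, '[redacted]');
}
try {
  const scraper = new ScraperService();
  await scraper.scrapeAccount({ targetUsername: 'example', maxTweets: 10 });
} catch (error) {
  // Log the cause, not the credentials
  console.error(`Scrape failed: ${redactError(error)}`);
}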
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This package is licensed under the ISC License. See the LICENSE file for details.
Related Packages
- @podx/core - Core utilities and types
- @podx/api - REST API server
- @podx/cli - Command-line interface
- podx - Main CLI application
Support
For support and questions:
- Email: support@podx.dev
- Discord: PODx Community
- Documentation: docs.podx.dev/scraper
- Issues: GitHub Issues