Package Exports
- @twick/cloud-transcript
- @twick/cloud-transcript/aws
- @twick/cloud-transcript/platform/aws/Dockerfile
- @twick/cloud-transcript/platform/aws/handler.js
@twick/cloud-transcript
Transcribe audio/video to JSON captions using Google GenAI (Vertex AI) with Gemini models.
Extract text from audio content with precise millisecond timestamps. Perfect for generating caption data from audio files or video URLs.
What Problem Does This Solve?
- AI-powered transcription — Use Google's Gemini models for accurate audio-to-text conversion
- Precise timestamps — Get millisecond-level timing for each caption segment
- Serverless processing — Deploy as AWS Lambda for automatic scaling
- Multiple languages — Support various languages and fonts
Input → Output
Input: Audio URL + optional configuration
```json
{
  "audioUrl": "https://example.com/audio.mp3",
  "language": "english",
  "languageFont": "english"
}
```
Output: JSON captions with timestamps
```json
{
  "captions": [
    {
      "t": "Example phrase 1",
      "s": 0,
      "e": 1500
    },
    {
      "t": "Another short example",
      "s": 1500,
      "e": 2800
    }
  ],
  "rawText": "Full raw response text from the model..."
}
```
Where it runs: AWS Lambda container image (Linux/AMD64)
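Before handing captions to an editor or player, it can help to check them against the output contract. The sketch below is a hypothetical helper (`validateCaptions` is not a package export). It assumes each caption is `{ t, s, e }` with `s` and `e` in milliseconds, listed in playback order, and checks the properties stated under Technical Details: at most 4 words per segment and non-overlapping segments.

```javascript
// Hypothetical helper, not exported by @twick/cloud-transcript.
// Assumes captions look like { t: string, s: number, e: number } (times in ms)
// and are listed in playback order.
function validateCaptions(captions, { maxWords = 4 } = {}) {
  if (!Array.isArray(captions)) {
    return ["captions is not an array"];
  }
  const problems = [];
  captions.forEach((c, i) => {
    if (!c || typeof c !== "object") {
      problems.push(`#${i}: not an object`);
      return;
    }
    if (typeof c.t !== "string" || c.t.trim() === "") {
      problems.push(`#${i}: missing text`);
    } else {
      // Count whitespace-separated words in the segment text.
      const words = c.t.trim().split(/\s+/).length;
      if (words > maxWords) {
        problems.push(`#${i}: ${words} words (max ${maxWords})`);
      }
    }
    const timesOk = Number.isFinite(c.s) && Number.isFinite(c.e) && c.s >= 0;
    if (!timesOk) {
      problems.push(`#${i}: s/e must be non-negative millisecond values`);
      return;
    }
    if (c.s >= c.e) {
      problems.push(`#${i}: start ${c.s} is not before end ${c.e}`);
    }
    // Non-overlapping: a segment may start exactly where the previous one ends.
    const prev = captions[i - 1];
    if (prev && Number.isFinite(prev.e) && c.s < prev.e) {
      problems.push(`#${i}: starts at ${c.s}, before previous end ${prev.e}`);
    }
  });
  return problems;
}
```

An empty array means no problems were found; otherwise each string names the offending segment index.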
Installation
```bash
npm install -D @twick/cloud-transcript
```
Quick Start
1. Scaffold AWS Lambda Template
```bash
npx twick-transcript init
```
2. Build Docker Image
```bash
npx twick-transcript build twick-transcript:latest
```
3. Configure Google Cloud
Required:
- Google Cloud project with Vertex AI API enabled
- Service account with Vertex AI permissions
Environment variables:
- `GOOGLE_CLOUD_PROJECT` (required): Your GCP project ID
- `GOOGLE_CLOUD_LOCATION` (optional): Vertex AI location (default: `"global"`)
- `GOOGLE_VERTEX_MODEL` (optional): Model name (default: `"gemini-2.5-flash"`)
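The documented defaults can be captured in a small resolver. This is a sketch assuming the variables are read from `process.env` exactly as listed; `readTranscriptConfig` is a hypothetical name, not a package export.

```javascript
// Hypothetical helper (not part of @twick/cloud-transcript): resolves the
// documented environment variables and applies their stated defaults.
// Empty strings are treated the same as unset variables.
function readTranscriptConfig(env = process.env) {
  const project = env.GOOGLE_CLOUD_PROJECT;
  if (!project) {
    throw new Error("GOOGLE_CLOUD_PROJECT is required");
  }
  return {
    project,
    location: env.GOOGLE_CLOUD_LOCATION || "global",
    model: env.GOOGLE_VERTEX_MODEL || "gemini-2.5-flash",
  };
}
```

Failing fast on a missing project ID surfaces the misconfiguration at startup rather than on the first transcription request.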
Credentials (choose one):
- File path (recommended): mount the service account JSON file and set `GOOGLE_APPLICATION_CREDENTIALS` to its path
- Environment JSON (alternative): set `GOOGLE_KEY` to the service account JSON string
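The two credential options can be expressed as a single resolution step. This is a hypothetical sketch (`resolveCredentials` is not a package export): the README does not say which option wins if both are set, so the code assumes the file path, the recommended option, takes precedence. The field check relies on the standard layout of a Google service account key (`type`, `client_email`, `private_key`).

```javascript
// Hypothetical sketch of the "choose one" credential rule.
// Assumption: GOOGLE_APPLICATION_CREDENTIALS takes precedence over GOOGLE_KEY.
function resolveCredentials(env = process.env) {
  if (env.GOOGLE_APPLICATION_CREDENTIALS) {
    // File path option: only the path is returned; the file is not read here.
    return { source: "file", path: env.GOOGLE_APPLICATION_CREDENTIALS };
  }
  if (env.GOOGLE_KEY) {
    let key;
    try {
      key = JSON.parse(env.GOOGLE_KEY);
    } catch (err) {
      throw new Error(`GOOGLE_KEY is not valid JSON: ${err.message}`);
    }
    // Service account key files carry these fields.
    if (!key || key.type !== "service_account" || !key.client_email || !key.private_key) {
      throw new Error("GOOGLE_KEY does not look like a service account key");
    }
    return { source: "env", credentials: key };
  }
  throw new Error("Set GOOGLE_APPLICATION_CREDENTIALS or GOOGLE_KEY");
}
```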
4. Deploy to AWS Lambda
```bash
# Login to ECR
npx twick-transcript ecr-login us-east-1 YOUR_ACCOUNT_ID
# Push to ECR
npx twick-transcript push twick-transcript:latest us-east-1 YOUR_ACCOUNT_ID
```
Deployment (High Level)
- Scaffold the Lambda container template
- Configure Google Cloud credentials (file mount or environment variable)
- Set environment variables (GCP project, location, model)
- Build and push Docker image to ECR
- Create Lambda function using the ECR image
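Creating the Lambda function requires the full ECR URI of the pushed image. ECR image URIs follow the pattern `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The sketch below assumes the push command uses the local image name (here `twick-transcript`) as the repository name, which this README does not state explicitly; `ecrImageUri` is a hypothetical helper.

```javascript
// Hypothetical helper: builds the ECR image URI a Lambda function is created from.
// Assumptions: the ECR repository is named after the local image, and the
// account is in the standard AWS partition (amazonaws.com domain).
function ecrImageUri(localImage, region, accountId) {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error("AWS account IDs are 12 digits");
  }
  // "twick-transcript:latest" -> repository "twick-transcript", tag "latest"
  const [repository, tag = "latest"] = localImage.split(":");
  return `${accountId}.dkr.ecr.${region}.amazonaws.com/${repository}:${tag}`;
}
```

The resulting URI is what you supply as the container image when creating the function, for example via `aws lambda create-function --package-type Image --code ImageUri=<uri>` together with a function name and execution role. Regions in other AWS partitions (such as China) use a different domain suffix.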
The Lambda handler contract:
- Event format: `{ audioUrl, language?, languageFont? }`
- Response: JSON with a `captions` array and a `rawText` string
Note: The audio URL must be publicly accessible via HTTP(S). Google Cloud Storage URIs (gs://) are not directly supported—use signed URLs instead.
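A caller can build and sanity-check the event before invoking the function. The sketch below is hypothetical (`buildTranscriptEvent` is not a package export). It enforces the note above by accepting only `http:` and `https:` URLs and rejecting `gs://` URIs, and it passes the URL through unmodified so that signed-URL query strings are not re-encoded.

```javascript
// Hypothetical helper: builds the { audioUrl, language?, languageFont? } event
// described above and rejects URLs the handler cannot fetch.
function buildTranscriptEvent({ audioUrl, language, languageFont } = {}) {
  let url;
  try {
    url = new URL(audioUrl);
  } catch {
    throw new Error(`audioUrl is not a valid URL: ${audioUrl}`);
  }
  if (url.protocol === "gs:") {
    throw new Error("gs:// URIs are not supported; pass a signed HTTPS URL instead");
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  // Keep the caller's string as-is; URL normalization could alter a signed URL.
  const event = { audioUrl };
  // Optional fields are omitted rather than sent as undefined.
  if (language) event.language = language;
  if (languageFont) event.languageFont = languageFont;
  return event;
}
```

`JSON.stringify(event)` then gives the JSON payload for a Lambda invocation.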
Programmatic Usage
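Use the core transcriber directly. The `core/transcriber.js` subpath is not listed under Package Exports above; if Node rejects the import with `ERR_PACKAGE_PATH_NOT_EXPORTED`, check whether the package root (`@twick/cloud-transcript`) exports `transcribeAudioUrl`.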
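```javascript
// ES module syntax: top-level await needs an .mjs file or "type": "module".
// Assumes the same GOOGLE_* project and credential variables as the Lambda are set.
import { transcribeAudioUrl } from '@twick/cloud-transcript/core/transcriber.js';

const result = await transcribeAudioUrl({
  audioUrl: 'https://example.com/audio.mp3',
  language: 'english',
  languageFont: 'english',
});

console.log(result.captions); // Array of { t, s, e } objects (text, start ms, end ms)
console.log(result.rawText); // Raw model response
```
Technical Details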
- Model: Google Gemini (default: `gemini-2.5-flash`, configurable via `GOOGLE_VERTEX_MODEL`)
- Format: Captions segmented into at most 4 words per segment
- Timestamps: Millisecond precision, non-overlapping segments
- API: Google Vertex AI (GenAI)
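Because segments are non-overlapping, finding the caption to display at a given playback time reduces to a binary search. The sketch assumes captions arrive sorted by start time (as in the example output) and treats each segment as covering the half-open range `[s, e)` in milliseconds; `captionAt` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: returns the caption active at timeMs, or null if none.
// Relies on segments being non-overlapping and sorted by start time.
function captionAt(captions, timeMs) {
  let lo = 0;
  let hi = captions.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = captions[mid];
    if (timeMs < c.s) {
      hi = mid - 1; // target lies in an earlier segment
    } else if (timeMs >= c.e) {
      lo = mid + 1; // target lies in a later segment
    } else {
      return c; // c.s <= timeMs < c.e
    }
  }
  return null; // timeMs falls in a gap, or outside all segments
}
```

With the half-open convention, a time equal to one segment's end and the next segment's start (1500 ms in the example output) resolves to the later segment, so adjacent captions never both match.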
For detailed setup instructions, see the complete deployment guide in the repository.