JSPM

  • ESM via JSPM
  • ES Module Entrypoint
  • Export Map
  • Keywords
  • License
  • Repository URL
  • TypeScript Types
  • README
  • Created
  • Published
  • Downloads 21
  • Score
    100M100P100Q44641F
  • License ISC

Revolutionary high-accuracy Audio-to-Text library powered by Gemini 2.5 Flash Lite with 1M+ context window.

Package Exports

  • geminisst
  • geminisst/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (geminisst) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

geminisst ๐ŸŽ™๏ธ

geminisst is a revolutionary, professional-grade Node.js library for high-accuracy Audio-to-Text conversion. Powered by Google's Gemini 2.5 Flash Lite (default) and compatible with the newest Gemini 3 series, it offers a massive 1 Million+ context window and next-gen multimodal understanding.

Unlike traditional STT engines, geminisst leverages the Files API for all requests, ensuring stability and accuracy whether you are processing a 3-second voice note or a multi-hour lecture.


๐Ÿš€ Key Features

  • Gemini 2.5 Flash Lite (Default): Optimized for cost-efficiency and speed, pre-configured with thinkingBudget: -1 (Dynamic Thinking).
  • Gemini 3 Ready: Full support for the latest Gemini 3 Flash and Pro models using thinkingLevel.
  • Universal Files API: All audio is processed via the robust Files API, eliminating size limits and base64 overhead.
  • Intelligent Reasoning:
    • Gemini 2.5: Controls reasoning via thinkingBudget (Default: Dynamic).
    • Gemini 3: Controls reasoning via thinkingLevel (minimal, low, medium, high).
  • Locked Core Logic: Built-in system instructions ensure 100% verbatim transcription (no summaries, no opinions).
  • Automatic Language Detection: Seamlessly handles Hindi, English, Hinglish, and mixed languages.
  • Rich Metadata: Real-time tracking of token usage, thoughts, and processing time.
  • TypeScript Native: Full type safety and IntelliSense.

๐Ÿ“ฆ Installation

npm install geminisst

Note: You need a Google Gemini API Key. Get one for free at Google AI Studio.


๐Ÿงช Testing

We provide dedicated test commands to verify different models safely.

1. Default Test (Gemini 2.5 Flash Lite)

Runs the standard transcription test using the default model.

npm test

2. Gemini 3 Test (Gemini 3 Flash Preview)

Runs the test specifically using the Gemini 3 model with thinkingLevel configuration.

npm run test:gemini3

๐Ÿ› ๏ธ Quick Start

1. Simple Transcription (Default)

Uses Gemini 2.5 Flash Lite with Dynamic Thinking.

import { audioToText } from 'geminisst';

const apiKey = "YOUR_GEMINI_API_KEY";
const result = await audioToText('./meeting.mp3', apiKey);

console.log("Transcript:", result.text);

2. Using Gemini 3 (Advanced)

Switch to Gemini 3 Flash and use thinkingLevel.

import { audioToText } from 'geminisst';

async function transcribeWithGemini3() {
  const result = await audioToText('./interview.wav', 'YOUR_API_KEY', {
    model: "gemini-3-flash-preview", // Switch model
    thinkingLevel: "high",           // Gemini 3 specific config
    verbose: true
  });

  if (result.thoughts) {
    console.log("Gemini 3 Thoughts:", result.thoughts);
  }
  console.log("Transcript:", result.text);
}

transcribeWithGemini3();

3. Reuse File Uploads (Efficiency)

Avoid re-uploading the same file. The API returns a fileUri valid for 48 hours.

// 1. First call uploads the file
const result1 = await audioToText('./large-audio.mp3', apiKey);
console.log("File URI:", result1.fileUri); // e.g., https://...

// 2. Second call reuses the URI (Instant processing, no upload)
const result2 = await audioToText(result1.fileUri, apiKey, {
    prompt: "Summarize this now.",
    model: "gemini-3-flash-preview"
});

๐Ÿ“– API Reference

audioToText(audioInput, apiKey, options?)

Parameter Type Description
audioInput string Absolute path to the local audio file OR an existing https:// File URI.
apiKey string Your Google Gemini API Key.
options SSTOptions (Optional) Configuration for models and thinking.

SSTOptions

Option Type Default Description
prompt string undefined Guidance (e.g., "Transcribe in Hindi").
model string "gemini-2.5-flash-lite" The model to use. Supports both 2.5 and 3 series.
verbose boolean false Log upload and processing steps.
thinkingBudget number -1 (Gemini 2.5 Only) Token budget for reasoning (-1 = dynamic).
thinkingLevel string undefined (Gemini 3 Only) "minimal", "low", "medium", "high".

TranscriptionResult

Property Type Description
text string The verbatim transcript.
thoughts string AI's internal reasoning process.
fileUri string New: The reusable File URI (valid for 48h).
model string The specific model version used.
usage object Stats: inputTokens, outputTokens, thoughtsTokenCount, etc.

๐Ÿ›ก๏ธ Transcription Rules (Locked Logic)

This library is hardcoded with professional transcription rules ensuring:

  1. Verbatim Content: Captures stutters, hesitations, and repetitions.
  2. Noise Suppression: Ignores background noise, focuses on speech.
  3. No Interpretation: No summaries or translations (unless requested via prompt).
  4. Style Integrity: Matches the natural flow and pronunciation.

๐Ÿ“„ License

ISC - Distributed under the ISC License. See LICENSE for more information.

Copyright (c) 2026, Smart Tell Line.