JSPM

sampleshard

0.1.0

SampleShard - Training sample storage format

Package Exports

  • sampleshard
  • sampleshard/dist/index.js

This package does not declare an exports field, so JSPM detected and optimized the exports above automatically. If a package subpath is missing, open an issue on the original package (sampleshard) asking it to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

SampleShard TypeScript/JavaScript

TypeScript/JavaScript implementation of the SampleShard format for storing training samples.

Installation

npm install @sampleshard/core

Quick Start

import { SampleShardWriter, SampleShardReader } from '@sampleshard/core';

// Writing samples
const writer = new SampleShardWriter('train.smpl');
await writer.open();
await writer.addSample(1, { input: [1, 2, 3], label: 0 });
await writer.addSample(2, { input: [4, 5, 6], label: 1 });
await writer.addSample(3, { input: [7, 8, 9], label: 2 });
await writer.close();

// Reading samples
const reader = new SampleShardReader('train.smpl');
await reader.open();

// Get sample count
console.log(`Total samples: ${reader.sampleCount()}`);

// Random access by ID
const sample = await reader.getSample(1);
console.log(sample); // { input: [1, 2, 3], label: 0 }

// Check if sample exists
if (reader.hasSample(2)) {
  console.log('Sample 2 exists!');
}

// Iterate all samples
for await (const [sampleId, sample] of reader) {
  console.log(`Sample ${sampleId}:`, sample);
}

// Batch access
const batch = await reader.getBatch([1, 2, 3]);
const rangeBatch = await reader.getBatchByRange(0, 10);

await reader.close();

Features

  • Fast random access by sample ID (O(1) lookup)
  • Deterministic iteration order
  • Metadata-safe: reserved entries (names starting with __) are excluded from sample counts
  • Async/await API with async iterators
  • CRC32 checksums for data integrity
  • BigInt support for sample IDs

File Format

SampleShard uses the .smpl extension and the Shard v2 binary format:

  • 64-byte header with magic bytes SHRD
  • Role byte = 0x02 (Sample)
  • 48-byte index entries with name hashes
  • JSON-encoded sample data
  • CRC32 checksums per entry
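
As a rough illustration of the two checks a reader of this format would need, the sketch below tests a buffer for the SHRD magic bytes and computes a CRC32. It rests on assumptions the list above does not state: that the magic bytes are ASCII and sit at offset 0 of the header, and that the checksum is the common CRC-32 variant used by zlib and PNG (reflected polynomial 0xEDB88320). Which bytes each checksum covers, and where the role byte sits in the header, are not specified here, so neither is modeled. `hasShardMagic` and `crc32` are hypothetical names.

```typescript
const HEADER_SIZE = 64; // from the format description above
const MAGIC = [0x53, 0x48, 0x52, 0x44]; // ASCII "SHRD"

// Assumption: the magic bytes occupy the first four bytes of the header.
function hasShardMagic(file: Uint8Array): boolean {
  if (file.length < HEADER_SIZE) {
    return false;
  }
  return MAGIC.every((byte, i) => file[i] === byte);
}

// Lookup table for the zlib/PNG CRC-32 variant (assumed, see lead-in).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c >>> 0;
}

// Returns the checksum as an unsigned 32-bit integer.
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}
```

A reader would typically recompute the checksum over the stored bytes and compare it with the recorded value before decoding the JSON payload.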

Interoperability

SampleShard files created with TypeScript can be read by:

  • Go: agentscope/cowrie/ucodec.OpenSampleShard()
  • Python: sampleshard.SampleShardReader()

API Reference

SampleShardWriter

declare class SampleShardWriter {
  constructor(path: string, options?: { alignment?: number; compression?: number });
  open(): Promise<void>;
  addSample(sampleId: number | bigint, sample: unknown): Promise<void>;
  addSampleRaw(sampleId: number | bigint, data: Buffer): Promise<void>;
  close(): Promise<void>;
}
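
Because the writer has an explicit open/close lifecycle, callers may want to guarantee that `close()` runs even when a write fails. The following is a minimal sketch of that pattern. `SampleSink` and `writeAll` are hypothetical names; the interface only mirrors three of the writer methods listed above so the helper can be exercised with a stand-in object. Whether closing after a failed write leaves a readable file is not documented here.

```typescript
// Structural view of three writer methods from the reference above.
// Any object with these methods satisfies it, including a test double.
interface SampleSink {
  open(): Promise<void>;
  addSample(sampleId: number | bigint, sample: unknown): Promise<void>;
  close(): Promise<void>;
}

// Hypothetical helper: writes every [id, sample] pair and always closes
// the sink. Resolves to the number of samples written.
async function writeAll(
  sink: SampleSink,
  samples: Iterable<[number | bigint, unknown]>,
): Promise<number> {
  await sink.open();
  let written = 0;
  try {
    for (const [id, sample] of samples) {
      await sink.addSample(id, sample);
      written += 1;
    }
  } finally {
    // Runs on success and on failure, so the underlying handle is released.
    await sink.close();
  }
  return written;
}
```

With the real class this would presumably be called as `await writeAll(new SampleShardWriter('train.smpl'), pairs)`.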

SampleShardReader

declare class SampleShardReader {
  constructor(path: string);
  open(): Promise<void>;
  sampleCount(): number;
  getSampleIds(): bigint[];
  sampleIdByIndex(index: number): bigint;
  hasSample(sampleId: number | bigint): boolean;
  getSample(sampleId: number | bigint): Promise<unknown>;
  getSampleByIndex(index: number): Promise<unknown>;
  getBatch(sampleIds: (number | bigint)[]): Promise<unknown[]>;
  getBatchByRange(start: number, end: number): Promise<unknown[]>;
  close(): Promise<void>;
  [Symbol.asyncIterator](): AsyncIterableIterator<[bigint, unknown]>;
}
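
For training loops it is common to read samples in fixed-size batches. One way to do that with the methods above is to split the ID list from `getSampleIds()` into chunks and pass each chunk to `getBatch()`. The sketch below shows only the chunking step; `chunkIds` is a hypothetical helper, not part of the package.

```typescript
// Hypothetical helper: split a list of IDs into consecutive batches of at
// most `batchSize` elements. The last batch may be shorter.
function chunkIds<T>(ids: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Intended use with an open reader (not executed here):
//   for (const ids of chunkIds(reader.getSampleIds(), 32)) {
//     const batch = await reader.getBatch(ids);
//     // ...feed batch to the training step
//   }
```

Chunking IDs instead of index ranges keeps the batches valid even when sample IDs are sparse or do not start at zero.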

License

MIT