SampleShard TypeScript/JavaScript
TypeScript/JavaScript implementation of the SampleShard format for storing training samples.
Installation
npm install @sampleshard/core
Quick Start
import { SampleShardWriter, SampleShardReader } from '@sampleshard/core';
// Writing samples
const writer = new SampleShardWriter('train.smpl');
await writer.open();
await writer.addSample(1, { input: [1, 2, 3], label: 0 });
await writer.addSample(2, { input: [4, 5, 6], label: 1 });
await writer.addSample(3, { input: [7, 8, 9], label: 2 });
await writer.close();
// Reading samples
const reader = new SampleShardReader('train.smpl');
await reader.open();
// Get sample count
console.log(`Total samples: ${reader.sampleCount()}`);
// Random access by ID
const sample = await reader.getSample(1);
console.log(sample); // { input: [1, 2, 3], label: 0 }
// Check if sample exists
if (reader.hasSample(2)) {
  console.log('Sample 2 exists!');
}
// Iterate all samples
for await (const [sampleId, sample] of reader) {
  console.log(`Sample ${sampleId}:`, sample);
}
// Batch access
const batch = await reader.getBatch([1, 2, 3]);
const rangeBatch = await reader.getBatchByRange(0, 10);
await reader.close();
Features
- Fast random access by sample ID (O(1) lookup)
- Deterministic iteration order
- Metadata-safe: reserved entries (names starting with __) are excluded from sample counts
- Async/await API with async iterators
- CRC32 checksums for data integrity
- BigInt support for sample IDs
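As a rough illustration of how the async API and deterministic ordering might be used for mini-batch loading, the sketch below splits a list of sample IDs into fixed-size chunks and fetches each chunk in turn. `chunkIds`, `iterateBatches`, and the `BatchSource` interface are hypothetical names introduced here, not part of the package; `BatchSource` only mirrors the `getSampleIds()` and `getBatch()` signatures listed in the API Reference.

```typescript
// Hypothetical structural type: anything exposing these two methods,
// which match the SampleShardReader signatures in the API Reference.
interface BatchSource {
  getSampleIds(): bigint[];
  getBatch(sampleIds: (number | bigint)[]): Promise<unknown[]>;
}

// Split IDs into consecutive chunks of at most `batchSize` items,
// preserving the input order.
function chunkIds<T extends number | bigint>(
  ids: readonly T[],
  batchSize: number,
): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  const chunks: T[][] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    chunks.push(ids.slice(i, i + batchSize));
  }
  return chunks;
}

// Yield one array of samples per chunk of IDs (helper name is an assumption).
async function* iterateBatches(
  source: BatchSource,
  batchSize: number,
): AsyncGenerator<unknown[]> {
  for (const ids of chunkIds(source.getSampleIds(), batchSize)) {
    yield await source.getBatch(ids);
  }
}
```

With an opened reader, this could be consumed as `for await (const batch of iterateBatches(reader, 32)) { ... }`, assuming the reader satisfies `BatchSource` as its documented signatures suggest.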
File Format
SampleShard uses the .smpl extension and the Shard v2 binary format:
- 64-byte header with magic bytes SHRD
- Role byte = 0x02 (Sample)
- 48-byte index entries with name hashes
- JSON-encoded sample data
- CRC32 checksums per entry
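To make the per-entry checksum idea concrete, here is a minimal sketch of a table-driven CRC-32 computed over a JSON-encoded sample. It assumes the common reflected polynomial 0xEDB88320 (the variant used by zlib and PNG) and assumes the checksum covers the UTF-8 bytes of the JSON payload; the README does not state either detail, so treat both as assumptions rather than a description of the Shard v2 format. The `crc32` function name is illustrative only.

```typescript
// Lookup table for the reflected CRC-32 polynomial 0xEDB88320
// (assumed variant; the README only says "CRC32").
const CRC32_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = (c & 1) !== 0 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

// Compute the CRC-32 of a byte array as an unsigned 32-bit number.
function crc32(bytes: Uint8Array): number {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC32_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Encode a sample as JSON, then checksum its UTF-8 bytes
// (assumed payload layout, for illustration only).
const payload = new TextEncoder().encode(
  JSON.stringify({ input: [1, 2, 3], label: 0 }),
);
const checksum = crc32(payload);
console.log(`payload bytes: ${payload.length}, crc32: 0x${checksum.toString(16)}`);
```

A reader could recompute the checksum after loading an entry and compare it with the stored value to detect corruption.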
Interoperability
SampleShard files created with TypeScript can be read by:
- Go: agentscope/cowrie/ucodec.OpenSampleShard()
- Python: sampleshard.SampleShardReader()
API Reference
SampleShardWriter
class SampleShardWriter {
  constructor(path: string, options?: { alignment?: number; compression?: number });
  async open(): Promise<void>;
  async addSample(sampleId: number | bigint, sample: unknown): Promise<void>;
  async addSampleRaw(sampleId: number | bigint, data: Buffer): Promise<void>;
  async close(): Promise<void>;
}
SampleShardReader
class SampleShardReader {
  constructor(path: string);
  async open(): Promise<void>;
  sampleCount(): number;
  getSampleIds(): bigint[];
  sampleIdByIndex(index: number): bigint;
  hasSample(sampleId: number | bigint): boolean;
  async getSample(sampleId: number | bigint): Promise<unknown>;
  async getSampleByIndex(index: number): Promise<unknown>;
  async getBatch(sampleIds: (number | bigint)[]): Promise<unknown[]>;
  async getBatchByRange(start: number, end: number): Promise<unknown[]>;
  async close(): Promise<void>;
  [Symbol.asyncIterator](): AsyncIterableIterator<[bigint, unknown]>;
}
License
MIT