# @dataset.sh/file
A TypeScript library for reading and writing the DatasetFile ZIP-based archive format. It provides an efficient way to store and access structured datasets, with support for collections, type annotations, and binary files.
## Features
- 📦 ZIP-based format - Efficient compression and packaging of datasets
- 📊 Multiple collections - Organize data into named collections (train, test, validation, etc.)
- 🏷️ Type annotations - Include type information for each collection
- 🔤 Typelang support - Define schemas using TypeScript-like syntax for cross-platform compatibility
- 📎 Binary files - Attach model weights, images, or other binary assets
 
## Installation

```bash
pnpm add @dataset.sh/file
```

### Optional: Typelang Compiler

For enhanced type validation and cross-platform type generation, you can also install the Typelang compiler:

```bash
pnpm add @dataset.sh/typelang
```

## Quick Start
### Writing a Dataset
```typescript
import {DatasetFile, DatasetFileWriter} from '@dataset.sh/file';

// Create a new dataset file
const writer = DatasetFile.open('my-dataset.dataset', 'w') as DatasetFileWriter;

// Add metadata
writer.updateMeta({
    author: 'Your Name',
    authorEmail: 'you@example.com',
    description: 'My awesome dataset',
    tags: ['nlp', 'classification'],
    dataset_metadata: {
        version: '1.0.0',
        created: new Date().toISOString()
    }
});

// Add a collection with data and a Typelang schema
const trainData = [
    {id: 1, text: 'Hello world', label: 'greeting'},
    {id: 2, text: 'How are you?', label: 'question'}
];

// Define the schema using Typelang syntax
const typeSchema = `// use TrainItem
type TrainItem = {
  id: int
  text: string
  label: string
}`;

await writer.addCollection('train', trainData, typeSchema);

// Add binary files (optional)
const modelWeights = Buffer.from('...');
writer.addBinaryFile('model.bin', modelWeights);

await writer.close();
```

### Reading a Dataset
```typescript
import {DatasetFile, DatasetFileReader} from '@dataset.sh/file';

// Open an existing dataset
const reader = DatasetFile.open('my-dataset.dataset', 'r') as DatasetFileReader;

// Access metadata
console.log('Author:', reader.meta.author);
console.log('Collections:', reader.collections());

// Read a collection
const trainCollection = reader.collection('train');

// Get the type annotation (raw Typelang schema)
const typeAnnotation = await trainCollection.typeAnnotation();
console.log('Type annotation:', typeAnnotation);

// Generate code from the type annotation
const codeUsage = await trainCollection.generateCode();
if (codeUsage) {
    console.log('Type name:', codeUsage.useClass);
    console.log('Compilation result:', codeUsage.result);
}

// Access data
console.log('First 5 items:', trainCollection.top(5));
console.log('Random sample:', trainCollection.randomSample(3));

// Iterate through data
for (const item of trainCollection) {
    console.log(item);
}

// Convert to array
const allData = trainCollection.toList();

// Access binary files
const modelData = reader.openBinaryFile('model.bin');

reader.close();
```

## API Reference
### DatasetFile
Main entry point for opening dataset files.
#### `DatasetFile.open(filePath: string, mode: 'r' | 'w')`
Opens a dataset file for reading or writing.
- `filePath`: Path to the dataset file
- `mode`: `'r'` for reading, `'w'` for writing
- Returns: `DatasetFileReader` or `DatasetFileWriter`
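Because `open()` can return either a reader or a writer, the examples in this README narrow the result with `as` casts. One common TypeScript pattern for getting compile-time narrowing instead is function overloads keyed on the mode literal. The sketch below uses stand-in types purely for illustration; the real classes come from `@dataset.sh/file`:

```typescript
// Stand-in types for illustration only; the real classes
// come from @dataset.sh/file.
interface Reader { kind: 'reader' }
interface Writer { kind: 'writer' }

// Overloads let the mode literal select the return type at compile time,
// so no `as` cast is needed at the call site.
function openDataset(path: string, mode: 'r'): Reader;
function openDataset(path: string, mode: 'w'): Writer;
function openDataset(path: string, mode: 'r' | 'w'): Reader | Writer {
  return mode === 'r' ? { kind: 'reader' } : { kind: 'writer' };
}

const r = openDataset('my-dataset.dataset', 'r'); // typed as Reader
console.log(r.kind); // "reader"
```

A wrapper like this can live in application code on top of the library's `open()` without changing the library itself.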
### DatasetFileWriter
Used for creating new dataset files.
#### Methods
- `updateMeta(meta: Partial<DatasetFileMeta>)`: Update dataset metadata
- `async addCollection(name: string, data: any[], type_annotation?: string)`: Add a data collection with an optional Typelang schema
- `addBinaryFile(fileName: string, data: Buffer)`: Add a binary file
- `async close()`: Close and save the dataset file
### DatasetFileReader
Used for reading existing dataset files.
#### Properties
- `meta`: Dataset metadata
#### Methods
- `collections()`: Get the list of collection names
- `collection(name: string)`: Get a collection reader
- `coll(name: string)`: Shorthand for `collection()`
- `binaryFiles()`: List binary file names
- `openBinaryFile(fileName: string)`: Read a binary file
- `close()`: Close the dataset file
### CollectionReader
Reader for individual collections within a dataset.
#### Properties
- `length`: Number of items in the collection
#### Methods
- `async typeAnnotation()`: Get the raw Typelang schema string
- `async generateCode()`: Generate code usage information from the type annotation (returns `CodeUsage` with `source`, `useClass`, and the compile result)
- `top(n: number)`: Get the first n items
- `randomSample(n: number)`: Get a random sample of n items
- `toList()`: Convert to an array
- `[Symbol.iterator]()`: Iterate through items
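`randomSample(n)` returns a random subset of the collection. For intuition, one standard way to draw n items uniformly from a sequence is reservoir sampling, sketched below; this is an illustration of the semantics, not a claim about this library's actual implementation:

```typescript
// Reservoir sampling: draw up to n items uniformly at random from a list.
// Illustrative only; not necessarily how randomSample() is implemented.
function reservoirSample<T>(items: T[], n: number): T[] {
  const sample: T[] = [];
  for (let i = 0; i < items.length; i++) {
    if (i < n) {
      // Fill the reservoir with the first n items.
      sample.push(items[i]);
    } else {
      // Replace an existing entry with probability n / (i + 1).
      const j = Math.floor(Math.random() * (i + 1));
      if (j < n) sample[j] = items[i];
    }
  }
  return sample;
}

const data = Array.from({ length: 100 }, (_, i) => i);
const picked = reservoirSample(data, 3);
console.log(picked.length); // 3
```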
## File Format
DatasetFile uses a ZIP archive with the following structure:
```
dataset.dataset/
├── meta.json          # Dataset metadata
├── coll/              # Collections folder
│   ├── train/
│   │   ├── data.jsonl # Data in JSON Lines format
│   │   └── type.tl    # Typelang schema (optional)
│   └── test/
│       ├── data.jsonl
│       └── type.tl
└── bin/               # Binary files folder
    └── model.bin
```
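Each `data.jsonl` file uses the JSON Lines encoding: one JSON object per line. A minimal, self-contained sketch of that encoding, independent of this library (the `Row` type is just for illustration):

```typescript
// JSON Lines: each record is serialized as one JSON object per line.
type Row = { id: number; text: string };

function toJsonl(rows: Row[]): string {
  return rows.map((r) => JSON.stringify(r)).join('\n');
}

function fromJsonl(jsonl: string): Row[] {
  return jsonl
    .split('\n')
    .filter((line) => line.trim().length > 0) // tolerate a trailing newline
    .map((line) => JSON.parse(line) as Row);
}

const rows: Row[] = [
  { id: 1, text: 'Hello world' },
  { id: 2, text: 'How are you?' },
];

const roundTripped = fromJsonl(toJsonl(rows));
console.log(roundTripped.length); // 2
```

Because the payload is plain JSON Lines inside a ZIP, collections written here can be read by any runtime with a ZIP and JSON parser.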
## Typelang Support

This library supports Typelang, a TypeScript-flavored schema definition language for cross-platform type generation.
### Using Typelang Schemas
```typescript
import {DatasetFile, DatasetFileWriter} from '@dataset.sh/file';

const writer = DatasetFile.open('typed-dataset.dataset', 'w') as DatasetFileWriter;

// Define complex types with Typelang
const userSchema = `// use User
type Address = {
  street: string
  city: string
  country: string
  postalCode?: string
}
type User = {
  id: string
  name: string
  email: string
  age: int
  address: Address
  tags: string[]
  status: "active" | "inactive" | "pending"
}`;

const userData = [{
    id: 'u1',
    name: 'Alice',
    email: 'alice@example.com',
    age: 30,
    address: {
        street: '123 Main St',
        city: 'San Francisco',
        country: 'USA'
    },
    tags: ['developer', 'team-lead'],
    status: 'active'
}];

await writer.addCollection('users', userData, userSchema);
await writer.close();
```

### Generic Types
```typescript
const responseSchema = `// use ApiResponse
type Response<T> = {
  success: bool
  data?: T
  error?: string
  timestamp: string
}
type UserData = {
  userId: string
  username: string
}
type ApiResponse = Response<UserData>`;

// Sample data matching the ApiResponse shape
const responseData = [{
    success: true,
    data: {userId: 'u1', username: 'alice'},
    timestamp: new Date().toISOString()
}];

await writer.addCollection('responses', responseData, responseSchema);
```

## Examples
### Working with NLP Datasets
```typescript
const writer = DatasetFile.open('nlp-dataset.dataset', 'w') as DatasetFileWriter;

writer.updateMeta({
    description: 'Sentiment analysis dataset',
    tags: ['nlp', 'sentiment', 'classification']
});

const data = [
    {text: 'This movie is great!', sentiment: 'positive'},
    {text: 'Terrible experience.', sentiment: 'negative'}
];

const sentimentSchema = `// use SentimentItem
type SentimentItem = {
  text: string
  sentiment: "positive" | "negative" | "neutral"
}`;

await writer.addCollection('train', data, sentimentSchema);
await writer.close();
```

### Reading Python-created Datasets
This library is fully compatible with datasets created using the Python `dataset-sh` library, including those with Typelang type annotations:
```typescript
const reader = DatasetFile.open('python-dataset.dataset', 'r') as DatasetFileReader;

// Read collections created in Python
const collection = reader.collection('data');

// Check for a type annotation and generate code
const typeAnnotation = await collection.typeAnnotation();
if (typeAnnotation) {
    console.log('Type annotation:', typeAnnotation);

    const codeUsage = await collection.generateCode();
    if (codeUsage) {
        console.log('Type name:', codeUsage.useClass);
        console.log('Validation errors:', codeUsage.result.errors);
    }
}

// Iterate through data
for (const item of collection) {
    console.log(item);
}

reader.close();
```

## Development
### Building
```bash
pnpm build
```

### Testing

```bash
pnpm test
pnpm test:watch
pnpm test:coverage
```

### Running Examples

```bash
pnpm example
pnpm verify-python
```

## Requirements
- Node.js >= 16.0.0
- TypeScript >= 5.0.0
## License
MIT
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Support
For issues and feature requests, please use the GitHub issue tracker.