# @glogwa/llama-roblox

Complete LLaMA model inference for Roblox using the llama.cpp architecture.

A production-ready implementation of llama.cpp for Roblox, enabling on-device LLM inference with GGUF model support.

## ✨ Features
- 🚀 Full GGUF v3 Support - Load quantized models directly
- 🎯 Multiple Quantization Formats - Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1, Q2_K-Q6_K, F16, F32, BF16
- 🧠 Complete Transformer - Multi-head attention, RoPE, feed-forward networks
- 💬 Chat Templates - ChatML, Llama 2, Alpaca, Vicuna
- 🎲 7 Sampling Strategies - Temperature, Top-K, Top-P, Min-P, Mirostat, and more
- ⚡ Optimized Performance - Cache-blocked matrix multiplication, KV cache
- 📦 Zero Dependencies - Pure TypeScript implementation
## 📥 Installation

```shell
npm install @glogwa/llama-roblox
```

## 🚀 Quick Start
```typescript
import { quickSetup } from "@glogwa/llama-roblox";

// Load your GGUF model (e.g., Qwen 3 0.6B Q4_K_M)
const modelBuffer = loadModelFromStorage();

// Quick setup with sensible defaults
const llm = quickSetup(modelBuffer, {
  n_ctx: 2048,
  temperature: 0.7,
});

// Generate text
const response = llm.generate("Hello, world!", 100);
print(response);

// Clean up
llm.free();
```

## 💬 Chat Example
```typescript
import { createLLM, ChatTemplateType } from "@glogwa/llama-roblox";

const llm = createLLM();

// Load and configure
llm.loadModel(modelBuffer);
llm.createContext({ n_ctx: 2048 });
llm.setupSampler({ temperature: 0.8 });

// Set up chat
llm.setupConversation(ChatTemplateType.CHATML);
llm.setSystemPrompt("You are a helpful AI assistant.");

// Multi-turn conversation
const response1 = llm.chat("What is TypeScript?", 100);
print(response1);

const response2 = llm.chat("How is it different from JavaScript?", 100);
print(response2);

llm.free();
```

## 🎯 Supported Models
Works with any GGUF model, including:
- ✅ Qwen 3 (0.6B, 1.5B, 3B, 7B)
- ✅ LLaMA 2/3 (7B, 13B, 70B)
- ✅ Mistral (7B)
- ✅ Phi-2/3 (2.7B, 3.8B)
- ✅ TinyLlama (1.1B)
- ✅ And many more!
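Before handing a buffer to the loader, it can help to verify that it really is a GGUF file. The sketch below is an illustration, not part of this package's API; it relies only on the documented GGUF header layout (a 4-byte ASCII `GGUF` magic followed by a little-endian uint32 version field). It is written as plain TypeScript; in a Roblox runtime you would read the bytes with Roblox's own buffer APIs instead.

```typescript
// Minimal GGUF header check (illustrative; not part of @glogwa/llama-roblox).
// A GGUF file starts with the ASCII magic "GGUF" followed by a
// little-endian uint32 version field (v3 is what this package targets).
function looksLikeGGUF(buf: Uint8Array): { ok: boolean; version?: number } {
  if (buf.length < 8) {
    return { ok: false };
  }
  const magic = String.fromCharCode(buf[0], buf[1], buf[2], buf[3]);
  if (magic !== "GGUF") {
    return { ok: false };
  }
  const version = new DataView(buf.buffer, buf.byteOffset, buf.byteLength).getUint32(4, true);
  return { ok: true, version };
}

// Example: the first 8 bytes of a GGUF v3 file.
const header = new Uint8Array([0x47, 0x47, 0x55, 0x46, 0x03, 0x00, 0x00, 0x00]);
const check = looksLikeGGUF(header);
```

A check like this lets you fail fast with a clear error before the (much slower) full model load begins.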
## 📊 Quantization Support
| Format | Bits | Description | Size Reduction |
|---|---|---|---|
| F32 | 32 | Full precision | 1x (baseline) |
| F16 | 16 | Half precision | 2x |
| Q8_0 | 8 | 8-bit quantization | 4x |
| Q6_K | 6 | 6-bit K-quants | 5.3x |
| Q5_0/Q5_1 | 5 | 5-bit quantization | 6.4x |
| Q4_0/Q4_1 | 4 | 4-bit quantization | 8x |
| Q4_K_M | 4 | 4-bit K-quants (medium) | 8x |
| Q3_K | 3 | 3-bit K-quants | 10.7x |
| Q2_K | 2 | 2-bit K-quants | 16x |
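The size-reduction column maps directly to bits per weight, which gives a quick way to budget memory before loading a model. The hypothetical helper below ignores the per-block scale overhead that real GGUF quant formats carry (Q4_0, for instance, packs 32 weights into 18 bytes, about 4.5 bits per weight), so treat it as a lower bound:

```typescript
// Rough lower-bound memory estimate for a model at a given quantization
// level (illustrative helper; not part of @glogwa/llama-roblox).
// Real GGUF block formats add per-block scales, so actual files are larger.
function estimateModelBytes(nParams: number, bitsPerWeight: number): number {
  return Math.ceil((nParams * bitsPerWeight) / 8);
}

const GiB = 1024 ** 3;

// A 7B-parameter model at 4 bits/weight needs at least ~3.26 GiB:
const q4GiB = estimateModelBytes(7e9, 4) / GiB;
```

This is why 4-bit formats like Q4_K_M are a common sweet spot for constrained environments: an 8x reduction versus F32 with modest quality loss.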
## 🎲 Sampling Strategies

```typescript
// Greedy (deterministic)
llm.setupSampler({ temperature: 0.0 });

// Balanced
llm.setupSampler({
  temperature: 0.7,
  top_k: 40,
  top_p: 0.95,
});

// Creative
llm.setupSampler({
  temperature: 1.0,
  top_p: 0.98,
  repeat_penalty: 1.1,
});

// Mirostat (perplexity control)
llm.setupSampler({
  mirostat: 2,
  mirostat_tau: 5.0,
  mirostat_eta: 0.1,
});
```

## Building from Source
To build the project from scratch, use:

```shell
npm install
npm run build
```

Or with Rojo:

```shell
rojo build -o "LLM-on-roblox.rbxlx"
```

For development with live sync:

```shell
rojo serve
```

For more help, check out the Rojo documentation.
## Documentation
See the full documentation for detailed usage, API reference, and examples.
## License
ISC License
## Credits
Based on llama.cpp by Georgi Gerganov