# @glogwa/llama-roblox

Complete LLaMA model inference for Roblox using the llama.cpp architecture.

A production-ready implementation of llama.cpp for Roblox, enabling on-device LLM inference with GGUF model support.

## ✨ Features
- 🚀 Full GGUF v3 Support - Load quantized models directly
- 🎯 Multiple Quantization Formats - Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1, Q2_K-Q6_K, F16, F32, BF16
- 🧠 Complete Transformer - Multi-head attention, RoPE, feed-forward networks
- 💬 Chat Templates - ChatML, Llama 2, Alpaca, Vicuna
- 🎲 7 Sampling Strategies - Temperature, Top-K, Top-P, Min-P, Mirostat, and more
- ⚡ Optimized Performance - Cache-blocked matrix multiplication, KV cache
- 📦 Zero Dependencies - Pure TypeScript implementation
## 📥 Installation

```shell
npm install @glogwa/llama-roblox
```

## 🚀 Quick Start
```typescript
import { quickSetup } from "@glogwa/llama-roblox";

// Load your GGUF model (e.g., Qwen 3 0.6B Q4_K_M)
const modelBuffer = loadModelFromStorage();

// Quick setup with sensible defaults
const llm = quickSetup(modelBuffer, {
  n_ctx: 2048,
  temperature: 0.7,
});

// Generate text
const response = llm.generate("Hello, world!", 100);
print(response);

// Clean up
llm.free();
```

## 💬 Chat Example
```typescript
import { createLLM, ChatTemplateType } from "@glogwa/llama-roblox";

const llm = createLLM();

// Load and configure
llm.loadModel(modelBuffer);
llm.createContext({ n_ctx: 2048 });
llm.setupSampler({ temperature: 0.8 });

// Set up chat
llm.setupConversation(ChatTemplateType.CHATML);
llm.setSystemPrompt("You are a helpful AI assistant.");

// Multi-turn conversation
const response1 = llm.chat("What is TypeScript?", 100);
print(response1);

const response2 = llm.chat("How is it different from JavaScript?", 100);
print(response2);

llm.free();
```

## 🎯 Supported Models
Works with any GGUF model, including:
- ✅ Qwen 3 (0.6B, 1.5B, 3B, 7B)
- ✅ LLaMA 2/3 (7B, 13B, 70B)
- ✅ Mistral (7B)
- ✅ Phi-2/3 (2.7B, 3.8B)
- ✅ TinyLlama (1.1B)
- ✅ And many more!
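Before handing a buffer to the loader, it can help to verify that it really is a GGUF file. The sketch below is an illustration, not part of this package's API; it relies only on the documented GGUF header layout (a 4-byte ASCII `GGUF` magic followed by a little-endian uint32 version field). It is written as plain TypeScript; in a Roblox runtime you would read the bytes with Roblox's own buffer APIs instead.

```typescript
// Minimal GGUF header check (illustrative; not part of @glogwa/llama-roblox).
// A GGUF file starts with the ASCII magic "GGUF" followed by a
// little-endian uint32 version field (v3 is what this package targets).
function looksLikeGGUF(buf: Uint8Array): { ok: boolean; version?: number } {
  if (buf.length < 8) {
    return { ok: false };
  }
  const magic = String.fromCharCode(buf[0], buf[1], buf[2], buf[3]);
  if (magic !== "GGUF") {
    return { ok: false };
  }
  const version = new DataView(buf.buffer, buf.byteOffset, buf.byteLength).getUint32(4, true);
  return { ok: true, version };
}

// Example: the first 8 bytes of a GGUF v3 file.
const header = new Uint8Array([0x47, 0x47, 0x55, 0x46, 0x03, 0x00, 0x00, 0x00]);
const check = looksLikeGGUF(header);
```

A check like this lets you fail fast with a clear error before the (much slower) full model load begins.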
## 📊 Quantization Support
| Format | Bits | Description | Size Reduction |
|---|---|---|---|
| F32 | 32 | Full precision | 1x (baseline) |
| F16 | 16 | Half precision | 2x |
| Q8_0 | 8 | 8-bit quantization | 4x |
| Q6_K | 6 | 6-bit K-quants | 5.3x |
| Q5_0/Q5_1 | 5 | 5-bit quantization | 6.4x |
| Q4_0/Q4_1 | 4 | 4-bit quantization | 8x |
| Q4_K_M | 4 | 4-bit K-quants (medium) | 8x |
| Q3_K | 3 | 3-bit K-quants | 10.7x |
| Q2_K | 2 | 2-bit K-quants | 16x |
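The size-reduction column maps directly to bits per weight, which gives a quick way to budget memory before loading a model. The hypothetical helper below ignores the per-block scale overhead that real GGUF quant formats carry (Q4_0, for instance, packs 32 weights into 18 bytes, about 4.5 bits per weight), so treat it as a lower bound:

```typescript
// Rough lower-bound memory estimate for a model at a given quantization
// level (illustrative helper; not part of @glogwa/llama-roblox).
// Real GGUF block formats add per-block scales, so actual files are larger.
function estimateModelBytes(nParams: number, bitsPerWeight: number): number {
  return Math.ceil((nParams * bitsPerWeight) / 8);
}

const GiB = 1024 ** 3;

// A 7B-parameter model at 4 bits/weight needs at least ~3.26 GiB:
const q4GiB = estimateModelBytes(7e9, 4) / GiB;
```

This is why 4-bit formats like Q4_K_M are a common sweet spot for constrained environments: an 8x reduction versus F32 with modest quality loss.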
## 🎲 Sampling Strategies

```typescript
// Greedy (deterministic)
llm.setupSampler({ temperature: 0.0 });

// Balanced
llm.setupSampler({
  temperature: 0.7,
  top_k: 40,
  top_p: 0.95,
});

// Creative
llm.setupSampler({
  temperature: 1.0,
  top_p: 0.98,
  repeat_penalty: 1.1,
});

// Mirostat (perplexity control)
llm.setupSampler({
  mirostat: 2,
  mirostat_tau: 5.0,
  mirostat_eta: 0.1,
});
```

## Building from Source
To build the project from scratch, use:

```shell
npm install
npm run build
```

Or with Rojo:

```shell
rojo build -o "LLM-on-roblox.rbxlx"
```

For development with live sync:

```shell
rojo serve
```

For more help, check out the Rojo documentation.
## Documentation
See the full documentation for detailed usage, API reference, and examples.
## License
ISC License
## Credits
Based on llama.cpp by Georgi Gerganov