JSPM

  • Downloads 154
  • License MIT

Package Exports

  • llama-node
  • llama-node/dist/index.js

This package does not declare an exports field, so the exports above have been automatically detected and optimized by JSPM instead. If any package subpath is missing, it is recommended to post an issue to the original package (llama-node) to support the "exports" field. If that is not possible, create a JSPM override to customize the exports field for this package.

Readme

llama-node

This project is in an early stage; the Node.js API may change in the future, so use it with caution.

LLaMA logo generated by Bing Image Creator


Introduction

This is a Node.js client library for the LLaMA LLM, built on top of llama-rs. It uses napi-rs to exchange messages between the Node.js thread and the LLaMA inference thread.

Currently supported platforms:

  • darwin-x64
  • darwin-arm64
  • linux-x64-gnu
  • win32-x64-msvc

I do not have the hardware to test 13B or larger models, but I have verified that the LLaMA 7B model works with both ggml LLaMA and ggml Alpaca weights.


We provide prebuilt binaries for linux-x64, win32-x64, apple-x64, and apple-silicon. For other platforms, please install a Rust toolchain before installing the npm package so that the native binding can be built from source.

Due to the complexity of cross compilation, it is hard to pre-build a single binary that delivers the best performance on every platform.

If you run into performance issues, I strongly suggest compiling manually. Otherwise you will have to wait for a better pre-compiled native binding. I am investigating how to produce a build matrix with broader multi-platform support.

Manual compilation (from node_modules)

The following steps will compile the binary with the best performance for your platform:

  • Prerequisite: install Rust

  • In the node_modules/@llama-node/core folder, run

    npm run build

Manual compilation (from source)

The following steps will compile the binary with the best performance for your platform:

  • Prerequisite: install Rust

  • In the root folder, run

    npm install && npm run build
  • In the packages/core folder, run

    npm run build
  • You can then use the dist folder under the project root


Install

npm install llama-node

Usage

The current version supports only one inference session per LLama instance at a time.

If you wish to run multiple inference sessions concurrently, you need to create one LLama instance per session, as in the sketch below.
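
Below is a minimal sketch of running two concurrent sessions by creating one LLamaClient per session; the prompts are placeholders, and each instance presumably loads its own copy of the model weights.

import path from "path";
import { LLamaClient } from "llama-node";

const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");

// One client per concurrent session; a single instance serves one inference at a time.
const llamaA = new LLamaClient({ path: model, numCtxTokens: 128 }, true);
const llamaB = new LLamaClient({ path: model, numCtxTokens: 128 }, true);

const params = {
    numPredict: 128,
    temp: 0.2,
    topP: 1,
    topK: 40,
    repeatPenalty: 1,
    repeatLastN: 64,
    seed: 0,
    feedPrompt: true,
};

// Both completions can run at the same time because each uses its own instance.
llamaA.createTextCompletion({ ...params, prompt: "first prompt" }, (res) =>
    process.stdout.write(res.token)
);
llamaB.createTextCompletion({ ...params, prompt: "second prompt" }, (res) =>
    process.stdout.write(res.token)
);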

Inference

import path from "path";
import { LLamaClient } from "llama-node";

const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");

const llama = new LLamaClient(
    {
        path: model,
        numCtxTokens: 128,
    },
    true
);

const template = `how are you`;

const prompt = `Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

${template}

### Response:`;

llama.createTextCompletion(
    {
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true,
    },
    (response) => {
        process.stdout.write(response.token);
    }
);
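
If you want the whole completion as a single string, here is a small sketch that accumulates the streamed tokens in the callback; it assumes the callback also receives a final event with response.completed set, as in the chat example below.

const completeText = (prompt: string): Promise<string> =>
    new Promise((resolve) => {
        let text = "";
        llama.createTextCompletion(
            {
                prompt,
                numPredict: 128,
                temp: 0.2,
                topP: 1,
                topK: 40,
                repeatPenalty: 1,
                repeatLastN: 64,
                seed: 0,
                feedPrompt: true,
            },
            (response) => {
                if (response.completed) {
                    resolve(text); // final event: return the accumulated text
                } else {
                    text += response.token; // streamed token: append it
                }
            }
        );
    });

completeText(prompt).then(console.log);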

Chatting

This works with Alpaca; it simply builds a context from Alpaca-style instructions. Make sure your last message ends with the user role (see the multi-turn sketch after the example below).

import { LLamaClient } from "llama-node";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");

const llama = new LLamaClient(
    {
        path: model,
        numCtxTokens: 128,
    },
    true
);

const content = "how are you?";

llama.createChatCompletion(
    {
        messages: [{ role: "user", content }],
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
    },
    (response) => {
        if (!response.completed) {
            process.stdout.write(response.token);
        }
    }
);
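
For a multi-turn conversation, here is a sketch of the messages array, passed to createChatCompletion exactly as above; the "assistant" role name for model replies is an assumption, since only the "user" role appears in this readme.

const messages = [
    { role: "user", content: "Who are you?" },
    { role: "assistant", content: "I am a helpful assistant." }, // "assistant" role is an assumption
    { role: "user", content: "how are you?" }, // the last message must have the user role
];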

Tokenize

Get the tokenization result from LLaMA.

import { LLamaClient } from "llama-node";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");

const llama = new LLamaClient(
    {
        path: model,
        numCtxTokens: 128,
    },
    true
);

const content = "how are you?";

llama.tokenize(content).then(console.log);
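
For example, a small sketch that prints the token count of a prompt, assuming tokenize resolves to an array of tokens:

llama.tokenize(content).then((tokens) => {
    console.log(tokens); // the raw tokenization result
    console.log(`token count: ${tokens.length}`); // assumes the result is an array
});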

Embedding

This is a preview feature; the embedding end token may change in the future. Do not use it in production!

import { LLamaClient } from "llama-node";
import path from "path";

const model = path.resolve(process.cwd(), "./ggml-alpaca-7b-q4.bin");

const llama = new LLamaClient(
    {
        path: model,
        numCtxTokens: 128,
    },
    true
);

const prompt = `how are you`;

llama
    .getEmbedding({
        prompt,
        numPredict: 128,
        temp: 0.2,
        topP: 1,
        topK: 40,
        repeatPenalty: 1,
        repeatLastN: 64,
        seed: 0,
        feedPrompt: true,
    })
    .then(console.log);
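
As a usage sketch, the embeddings of two prompts can be compared with cosine similarity; this assumes getEmbedding resolves to a plain numeric vector, and the two requests are run sequentially because one instance handles one inference session at a time.

// Cosine similarity between two numeric vectors.
const cosine = (a: number[], b: number[]) => {
    const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
    const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
    return dot / (norm(a) * norm(b));
};

const params = {
    numPredict: 128,
    temp: 0.2,
    topP: 1,
    topK: 40,
    repeatPenalty: 1,
    repeatLastN: 64,
    seed: 0,
    feedPrompt: true,
};

// Run the two embedding requests one after the other on the same instance.
llama
    .getEmbedding({ ...params, prompt: "how are you" })
    .then((a) =>
        llama
            .getEmbedding({ ...params, prompt: "how do you feel" })
            .then((b) => console.log(cosine(a, b)))
    );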

Future plans

  • prompt extensions
  • more platforms and cross compilation (performance related)
  • tweak the embedding API to make the end token configurable