OktoScript Grammar

A decision-driven language for training, evaluating and governing AI models

Table of Contents

- Grammar Overview
- PROJECT Block
- DATASET Block
- MODEL Block
- TRAIN Block
- CONTROL Block
- MONITOR Block
- GUARD Block
- BEHAVIOR Block
- INFERENCE Block
- EXPORT Block
- Complete Example

Grammar Overview

OktoScript is a declarative language with a simple, block-based syntax. Each block configures one aspect of your model's training and inference pipeline. A minimal script looks like this:

PROJECT "MyModel" DESCRIPTION "My AI model" DATASET { train: "dataset/train.jsonl" } MODEL { base: "oktoseek/base-mini" } TRAIN { epochs: 5 batch_size: 32 } EXPORT { format: ["okm"] path: "export/" }
Required blocks: PROJECT, DATASET, MODEL, TRAIN
Optional blocks: ENV, DESCRIPTION, CONTROL, MONITOR, GUARD, BEHAVIOR, INFERENCE, EXPORT, and more

PROJECT Block

Defines the project name and metadata.

PROJECT "PizzaBot" DESCRIPTION "AI specialized in pizza restaurant service" VERSION "1.0" AUTHOR "OktoSeek" TAGS ["food", "restaurant", "chatbot"]

DATASET Block

Defines training, validation, and test datasets.

DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
  test: "dataset/test.jsonl"
  format: "jsonl"
  type: "chat"
  language: "en"
  input_field: "input"
  output_field: "target"
}

Supported formats: jsonl, csv, txt, parquet, image+caption, qa, instruction

Supported types: classification, generation, qa, chat, vision, regression
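
With type: "chat" and the field mapping above, each line of the JSONL file is one JSON object whose keys match input_field and output_field. A hypothetical record (the text is illustrative):

{"input": "Do you have gluten-free pizza?", "target": "Yes, we offer a gluten-free crust in all sizes."}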

MODEL Block

Defines the base model, architecture, and parameters.

MODEL {
  name: "my-model"
  base: "oktoseek/base-mini"
  architecture: "transformer"
  parameters: 120M
  context_window: 2048
  precision: "fp16"
  device: "cuda"

  ADAPTER {
    type: "lora"
    path: "./adapters/my-adapter"
    rank: 16
    alpha: 32
  }
}

TRAIN Block

Defines training hyperparameters and strategy.

TRAIN {
  epochs: 10
  batch_size: 32
  learning_rate: 0.0001
  optimizer: "adamw"
  scheduler: "cosine"
  device: "cuda"
  gradient_accumulation: 2
  early_stopping: true
  checkpoint_steps: 100
  weight_decay: 0.01
  gradient_clip: 1.0
}

Optimizers: adam, adamw, sgd, rmsprop, adafactor, lamb

Schedulers: linear, cosine, cosine_with_restarts, polynomial, constant, step
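
Note that gradient_accumulation multiplies the effective batch size: with batch_size: 32 and gradient_accumulation: 2, gradients from two batches are accumulated before each optimizer step, giving an effective batch size of 32 × 2 = 64. A minimal sketch (the values are illustrative, and the # comments assume the comment syntax used in the versioned example below):

TRAIN {
  epochs: 10
  batch_size: 32            # per-step batch
  gradient_accumulation: 2  # effective batch size = 32 x 2 = 64
  learning_rate: 0.0001
  optimizer: "adamw"
}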

🧠 CONTROL Block — Decision Engine

Enables logical, conditional, and event-based decisions during training and inference.

CONTROL {
  on_epoch_end {
    IF loss > 2.0 {
      SET LR = 0.00005
      LOG "High loss detected"
      WHEN gpu_usage > 90% {
        SET batch_size = 16
      }
      IF val_loss > 3.0 {
        STOP_TRAINING
      }
    }
    IF accuracy > 0.9 {
      SAVE "best_model"
    }
  }
  validate_every: 200
}

Events: on_step_end, on_epoch_end, on_memory_low, on_nan, on_plateau

Directives: IF, WHEN, EVERY, SET, STOP, LOG, SAVE, RETRY, REGENERATE, STOP_TRAINING
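
The other events take the same handler structure as on_epoch_end. A hypothetical on_plateau handler, using only directives demonstrated above (the thresholds and learning rate are illustrative):

CONTROL {
  on_plateau {
    SET LR = 0.00002
    LOG "Plateau detected, lowering learning rate"
    IF val_loss > 3.0 {
      STOP_TRAINING
    }
  }
}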

📊 MONITOR Block

Tracks training and system metrics.

MONITOR {
  metrics: [
    "loss",
    "val_loss",
    "accuracy",
    "gpu_usage",
    "ram_usage",
    "throughput"
  ]
  notify_if {
    loss > 2.0
    gpu_usage > 90%
  }
  log_to: "logs/training.log"
}

🛡️ GUARD Block — Safety

Defines safety rules during generation and training.

GUARD {
  prevent {
    hallucination
    toxicity
    bias
    data_leak
  }
  detect_using: ["classifier", "regex", "embedding"]
  on_violation {
    REPLACE with_message: "Sorry, this request is not allowed."
  }
}

🎭 BEHAVIOR Block

Defines model personality and behavior.

BEHAVIOR {
  mode: "chat"
  personality: "friendly"
  verbosity: "medium"
  language: "en"
  avoid: ["violence", "hate", "politics"]
  fallback: "How can I help you?"
  prompt_style: "User: {input}\nAssistant:"
}

💬 INFERENCE Block

Defines inference configuration and parameters.

INFERENCE {
  mode: "chat"
  format: "User: {input}\nAssistant:"
  exit_command: "/exit"

  params {
    max_length: 120
    temperature: 0.7
    top_p: 0.9
    beams: 2
    do_sample: true
  }

  CONTROL {
    IF confidence < 0.3 {
      RETRY
    }
    IF hallucination_score > 0.5 {
      REPLACE WITH "I'm not certain about that."
    }
  }
}

EXPORT Block

Defines export formats and paths.

EXPORT {
  format: ["gguf", "onnx", "okm", "safetensors"]
  path: "export/"
  quantization: "int8"
}

Supported formats: gguf, onnx, okm, safetensors, tflite
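
The same keys combine for the other targets; a hypothetical mobile-oriented export, assuming quantization applies to tflite output as well:

EXPORT {
  format: ["tflite", "okm"]
  path: "export/mobile/"
  quantization: "int8"
}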

Complete Example

A complete OktoScript v1.2 example:

# okto_version: "1.2"

PROJECT "PizzaBot"
DESCRIPTION "AI specialized in pizza restaurant service"

ENV {
  accelerator: "gpu"
  min_memory: "8GB"
  precision: "fp16"
}

DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
  format: "jsonl"
  type: "chat"
}

MODEL {
  base: "oktoseek/pizza-small"
  device: "cuda"
}

TRAIN {
  epochs: 10
  batch_size: 32
  learning_rate: 0.0001
  optimizer: "adamw"
  device: "cuda"
}

CONTROL {
  on_epoch_end {
    IF loss > 2.0 {
      SET LR = 0.00005
    }
    IF accuracy > 0.9 {
      SAVE "best_model"
    }
  }
}

MONITOR {
  metrics: ["loss", "accuracy", "gpu_usage"]
  notify_if {
    loss > 2.0
  }
}

BEHAVIOR {
  personality: "friendly"
  language: "en"
}

GUARD {
  prevent {
    toxicity
    bias
  }
}

EXPORT {
  format: ["okm", "onnx"]
  path: "export/"
}

For the complete grammar specification, visit:
GitHub - Complete Grammar Documentation