OktoScript Grammar

A decision-driven language for training, evaluating and governing AI models

Table of Contents

- Grammar Overview
- PROJECT Block
- DATASET Block
- MODEL Block
- TRAIN Block
- CONTROL Block
- MONITOR Block
- GUARD Block
- BEHAVIOR Block
- INFERENCE Block
- EXPORT Block
- Complete Example

Grammar Overview

OktoScript is a declarative language with a simple, block-based syntax. Each block configures one aspect of your model's training and inference pipeline. A minimal script looks like this:

PROJECT "MyModel" DESCRIPTION "My AI model" DATASET { train: "dataset/train.jsonl" } MODEL { base: "oktoseek/base-mini" } TRAIN { epochs: 5 batch_size: 32 } EXPORT { format: ["okm"] path: "export/" }
Required blocks: PROJECT, DATASET, MODEL, TRAIN
Optional blocks: ENV, DESCRIPTION, CONTROL, MONITOR, GUARD, BEHAVIOR, INFERENCE, EXPORT, and more

PROJECT Block

Defines the project name and metadata.

PROJECT "PizzaBot" DESCRIPTION "AI specialized in pizza restaurant service" VERSION "1.0" AUTHOR "OktoSeek" TAGS ["food", "restaurant", "chatbot"]

DATASET Block

Defines training, validation, and test datasets.

DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
  test: "dataset/test.jsonl"
  format: "jsonl"
  type: "chat"
  language: "en"
  input_field: "input"
  output_field: "target"
}

Supported formats: jsonl, csv, txt, parquet, image+caption, qa, instruction

Supported types: classification, generation, qa, chat, vision, regression
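
With type: "chat" and the field mapping above, each line of the JSONL file is one JSON object whose keys match input_field and output_field. A hypothetical record (the text is illustrative):

{"input": "Do you have gluten-free pizza?", "target": "Yes, we offer a gluten-free crust in all sizes."}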

MODEL Block

Defines the base model, architecture, and parameters.

MODEL {
  name: "my-model"
  base: "oktoseek/base-mini"
  architecture: "transformer"
  parameters: 120M
  context_window: 2048
  precision: "fp16"
  device: "cuda"

  ADAPTER {
    type: "lora"
    path: "./adapters/my-adapter"
    rank: 16
    alpha: 32
  }
}

TRAIN Block

Defines training hyperparameters and strategy.

TRAIN {
  epochs: 10
  batch_size: 32
  learning_rate: 0.0001
  optimizer: "adamw"
  scheduler: "cosine"
  device: "cuda"
  gradient_accumulation: 2
  early_stopping: true
  checkpoint_steps: 100
  weight_decay: 0.01
  gradient_clip: 1.0
}

Optimizers: adam, adamw, sgd, rmsprop, adafactor, lamb

Schedulers: linear, cosine, cosine_with_restarts, polynomial, constant, step
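
Note that gradient_accumulation multiplies the effective batch size: with batch_size: 32 and gradient_accumulation: 2, gradients from two batches are accumulated before each optimizer step, giving an effective batch size of 32 × 2 = 64. A minimal sketch (the values are illustrative, and the # comments assume the comment syntax used in the versioned example below):

TRAIN {
  epochs: 10
  batch_size: 32            # per-step batch
  gradient_accumulation: 2  # effective batch size = 32 x 2 = 64
  learning_rate: 0.0001
  optimizer: "adamw"
}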

🧠 CONTROL Block — Decision Engine

Enables logical, conditional, and event-based decisions during training and inference.

CONTROL {
  on_epoch_end {
    IF loss > 2.0 {
      SET LR = 0.00005
      LOG "High loss detected"
      WHEN gpu_usage > 90% {
        SET batch_size = 16
      }
      IF val_loss > 3.0 {
        STOP_TRAINING
      }
    }
    IF accuracy > 0.9 {
      SAVE "best_model"
    }
  }
  validate_every: 200
}

Events: on_step_end, on_epoch_end, on_memory_low, on_nan, on_plateau

Directives: IF, WHEN, EVERY, SET, STOP, LOG, SAVE, RETRY, REGENERATE, STOP_TRAINING
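
The other events take the same handler structure as on_epoch_end. A hypothetical on_plateau handler, using only directives demonstrated above (the thresholds and learning rate are illustrative):

CONTROL {
  on_plateau {
    SET LR = 0.00002
    LOG "Plateau detected, lowering learning rate"
    IF val_loss > 3.0 {
      STOP_TRAINING
    }
  }
}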

📊 MONITOR Block

Tracks training and system metrics.

MONITOR {
  metrics: [
    "loss",
    "val_loss",
    "accuracy",
    "gpu_usage",
    "ram_usage",
    "throughput"
  ]
  notify_if {
    loss > 2.0
    gpu_usage > 90%
  }
  log_to: "logs/training.log"
}

🛡️ GUARD Block — Safety

Defines safety rules during generation and training.

GUARD {
  prevent {
    hallucination
    toxicity
    bias
    data_leak
  }
  detect_using: ["classifier", "regex", "embedding"]
  on_violation {
    REPLACE with_message: "Sorry, this request is not allowed."
  }
}

🎭 BEHAVIOR Block

Defines model personality and behavior.

BEHAVIOR {
  mode: "chat"
  personality: "friendly"
  verbosity: "medium"
  language: "en"
  avoid: ["violence", "hate", "politics"]
  fallback: "How can I help you?"
  prompt_style: "User: {input}\nAssistant:"
}

💬 INFERENCE Block

Defines inference configuration and parameters.

INFERENCE {
  mode: "chat"
  format: "User: {input}\nAssistant:"
  exit_command: "/exit"

  params {
    max_length: 120
    temperature: 0.7
    top_p: 0.9
    beams: 2
    do_sample: true
  }

  CONTROL {
    IF confidence < 0.3 {
      RETRY
    }
    IF hallucination_score > 0.5 {
      REPLACE WITH "I'm not certain about that."
    }
  }
}

EXPORT Block

Defines export formats and paths.

EXPORT {
  format: ["gguf", "onnx", "okm", "safetensors"]
  path: "export/"
  quantization: "int8"
}

Supported formats: gguf, onnx, okm, safetensors, tflite
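
The same keys combine for the other targets; a hypothetical mobile-oriented export, assuming quantization applies to tflite output as well:

EXPORT {
  format: ["tflite", "okm"]
  path: "export/mobile/"
  quantization: "int8"
}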

Complete Example

A complete OktoScript v1.2 example:

# okto_version: "1.2"

PROJECT "PizzaBot"
DESCRIPTION "AI specialized in pizza restaurant service"

ENV {
  accelerator: "gpu"
  min_memory: "8GB"
  precision: "fp16"
}

DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
  format: "jsonl"
  type: "chat"
}

MODEL {
  base: "oktoseek/pizza-small"
  device: "cuda"
}

TRAIN {
  epochs: 10
  batch_size: 32
  learning_rate: 0.0001
  optimizer: "adamw"
  device: "cuda"
}

CONTROL {
  on_epoch_end {
    IF loss > 2.0 {
      SET LR = 0.00005
    }
    IF accuracy > 0.9 {
      SAVE "best_model"
    }
  }
}

MONITOR {
  metrics: ["loss", "accuracy", "gpu_usage"]
  notify_if {
    loss > 2.0
  }
}

BEHAVIOR {
  personality: "friendly"
  language: "en"
}

GUARD {
  prevent {
    toxicity
    bias
  }
}

EXPORT {
  format: ["okm", "onnx"]
  path: "export/"
}

For the complete grammar specification, visit:
GitHub - Complete Grammar Documentation