Grammar Overview
OktoScript is a declarative language with a simple, block-based syntax. Each block defines a specific aspect of your AI model training and inference pipeline.
PROJECT "MyModel"
DESCRIPTION "My AI model"

DATASET {
  train: "dataset/train.jsonl"
}

MODEL {
  base: "oktoseek/base-mini"
}

TRAIN {
  epochs: 5
  batch_size: 32
}

EXPORT {
  format: ["okm"]
  path: "export/"
}
Required blocks: PROJECT, DATASET, MODEL, TRAIN
Optional blocks: ENV, DESCRIPTION, CONTROL, MONITOR, GUARD, BEHAVIOR, INFERENCE, EXPORT, and more
PROJECT Block
Defines the project name and metadata.
PROJECT "PizzaBot"
DESCRIPTION "AI specialized in pizza restaurant service"
VERSION "1.0"
AUTHOR "OktoSeek"
TAGS ["food", "restaurant", "chatbot"]
DATASET Block
Defines training, validation, and test datasets.
DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
  test: "dataset/test.jsonl"
  format: "jsonl"
  type: "chat"
  language: "en"
  input_field: "input"
  output_field: "target"
}
Supported formats: jsonl, csv, txt, parquet, image+caption, qa, instruction
Supported types: classification, generation, qa, chat, vision, regression
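For the `jsonl` format with `type: "chat"`, each line of the dataset file is one self-contained JSON object. A minimal sketch in Python, assuming the `input_field`/`output_field` mapping shown above (the example text itself is hypothetical):

```python
import json

# Hypothetical example of a single training record in a "chat"-type
# JSONL dataset; the keys match input_field ("input") and
# output_field ("target") from the DATASET block above.
record = {
    "input": "What pizzas do you have?",
    "target": "We offer margherita, pepperoni, and four-cheese.",
}

# Each line of dataset/train.jsonl is one JSON object like this.
line = json.dumps(record)
parsed = json.loads(line)
print(parsed["input"])
```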
MODEL Block
Defines the base model, architecture, and parameters.
MODEL {
  name: "my-model"
  base: "oktoseek/base-mini"
  architecture: "transformer"
  parameters: 120M
  context_window: 2048
  precision: "fp16"
  device: "cuda"

  ADAPTER {
    type: "lora"
    path: "./adapters/my-adapter"
    rank: 16
    alpha: 32
  }
}
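To see what the adapter's `rank` and `alpha` values do, here is a sketch of the standard LoRA convention (this is the usual LoRA math, not OktoSeek-specific code; the matrix size is an arbitrary small illustration):

```python
import numpy as np

# Illustrative LoRA update for one d x d weight matrix, using the
# rank/alpha values from the ADAPTER block. Standard LoRA convention:
#   W_eff = W + (alpha / rank) * B @ A
# with A of shape (rank, d) and B of shape (d, rank).
d, rank, alpha = 512, 16, 32
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))
A = rng.standard_normal((rank, d)) * 0.01
B = np.zeros((d, rank))  # B starts at zero, so W_eff == W before training

W_eff = W + (alpha / rank) * B @ A

# Trainable parameters per matrix drop from d*d to 2*d*rank.
full, lora = d * d, 2 * d * rank
print(full, lora)  # 262144 vs 16384
```

The `alpha / rank` ratio (here 32 / 16 = 2) scales how strongly the low-rank update contributes to the effective weights.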
TRAIN Block
Defines training hyperparameters and strategy.
TRAIN {
  epochs: 10
  batch_size: 32
  learning_rate: 0.0001
  optimizer: "adamw"
  scheduler: "cosine"
  device: "cuda"
  gradient_accumulation: 2
  early_stopping: true
  checkpoint_steps: 100
  weight_decay: 0.01
  gradient_clip: 1.0
}
Optimizers: adam, adamw, sgd, rmsprop, adafactor, lamb
Schedulers: linear, cosine, cosine_with_restarts, polynomial, constant, step
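Two of these settings are worth unpacking. A minimal sketch, assuming the common definitions of cosine decay and gradient accumulation (the exact schedule OktoScript uses may differ):

```python
import math

# Typical "cosine" scheduler: LR decays smoothly from the configured
# learning_rate toward 0 over the run (no warmup shown).
def cosine_lr(step, total_steps, base_lr=0.0001):
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))

# With gradient_accumulation: 2, gradients from 2 forward/backward
# passes are summed before each optimizer step, so the effective
# batch size is batch_size * gradient_accumulation.
effective_batch = 32 * 2

print(cosine_lr(0, 1000), cosine_lr(1000, 1000), effective_batch)
```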
🧠 CONTROL Block — Decision Engine
Enables logical, conditional, and event-based decisions during training and inference.
CONTROL {
  on_epoch_end {
    IF loss > 2.0 {
      SET LR = 0.00005
      LOG "High loss detected"

      WHEN gpu_usage > 90% {
        SET batch_size = 16
      }

      IF val_loss > 3.0 {
        STOP_TRAINING
      }
    }

    IF accuracy > 0.9 {
      SAVE "best_model"
    }
  }

  validate_every: 200
}
Events: on_step_end, on_epoch_end, on_memory_low, on_nan, on_plateau
Directives: IF, WHEN, EVERY, SET, STOP, LOG, SAVE, RETRY, REGENERATE, STOP_TRAINING
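The event/directive model amounts to rules evaluated against current metrics when an event fires. A hypothetical Python sketch of those semantics (this is not the actual OktoScript runtime, just an illustration of the condition-action pattern):

```python
# Hypothetical sketch of CONTROL semantics: each event name maps to a
# list of (condition, action) rules checked against the latest metrics.
rules = {
    "on_epoch_end": [
        (lambda m: m["loss"] > 2.0, {"SET": {"lr": 0.00005}}),
        (lambda m: m["accuracy"] > 0.9, {"SAVE": "best_model"}),
    ],
}

def fire(event, metrics):
    """Return the actions whose conditions hold for this event."""
    return [action for cond, action in rules.get(event, []) if cond(metrics)]

actions = fire("on_epoch_end", {"loss": 2.5, "accuracy": 0.95})
print(actions)  # both conditions hold, so both actions fire
```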
📊 MONITOR Block
Tracks training and system metrics.
MONITOR {
  metrics: [
    "loss",
    "val_loss",
    "accuracy",
    "gpu_usage",
    "ram_usage",
    "throughput"
  ]

  notify_if {
    loss > 2.0
    gpu_usage > 90%
  }

  log_to: "logs/training.log"
}
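The `notify_if` rules reduce to simple threshold checks. A sketch of that idea, with hypothetical metric values:

```python
# Sketch of notify_if semantics: compare monitored metrics against the
# configured thresholds and collect the names of any that breach them.
thresholds = {"loss": 2.0, "gpu_usage": 90.0}

def breaches(metrics):
    return sorted(k for k, limit in thresholds.items()
                  if metrics.get(k, 0) > limit)

print(breaches({"loss": 2.4, "gpu_usage": 85.0, "val_loss": 1.1}))
```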
🛡️ GUARD Block — Safety
Defines safety rules during generation and training.
GUARD {
  prevent {
    hallucination
    toxicity
    bias
    data_leak
  }

  detect_using: ["classifier", "regex", "embedding"]

  on_violation {
    REPLACE
    with_message: "Sorry, this request is not allowed."
  }
}
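Of the three detectors, `"regex"` is the simplest to picture: if a pattern matches, the reply is swapped for the configured `with_message` text. A minimal sketch with a hypothetical blocklist pattern:

```python
import re

# Sketch of the "regex" detector from detect_using. The pattern here is
# a made-up example; real guard patterns would be project-specific.
BLOCKLIST = re.compile(r"\b(password|credit card number)\b", re.IGNORECASE)
FALLBACK = "Sorry, this request is not allowed."

def guard(reply):
    # On violation, REPLACE the reply with the with_message text.
    return FALLBACK if BLOCKLIST.search(reply) else reply

print(guard("Your credit card number is 1234"))
print(guard("Your pizza is on the way"))
```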
🎭 BEHAVIOR Block
Defines model personality and behavior.
BEHAVIOR {
  mode: "chat"
  personality: "friendly"
  verbosity: "medium"
  language: "en"
  avoid: ["violence", "hate", "politics"]
  fallback: "How can I help you?"
  prompt_style: "User: {input}\nAssistant:"
}
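The `prompt_style` value is a plain placeholder template: `{input}` is substituted with the user's message before generation, as this small sketch shows:

```python
# The prompt_style template behaves like an ordinary format string.
prompt_style = "User: {input}\nAssistant:"
prompt = prompt_style.format(input="Do you have vegan options?")
print(prompt)
```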
💬 INFERENCE Block
Defines the inference mode, prompt format, and generation parameters.
INFERENCE {
  mode: "chat"
  format: "User: {input}\nAssistant:"
  exit_command: "/exit"

  params {
    max_length: 120
    temperature: 0.7
    top_p: 0.9
    beams: 2
    do_sample: true
  }

  CONTROL {
    IF confidence < 0.3 { RETRY }
    IF hallucination_score > 0.5 {
      REPLACE WITH "I'm not certain about that."
    }
  }
}
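Among the sampling parameters, `top_p` (nucleus sampling) is the least obvious: keep the smallest set of highest-probability tokens whose cumulative probability reaches `top_p`, then renormalize and sample only from that set. A minimal sketch of that filtering step, with a made-up toy distribution:

```python
# Sketch of nucleus (top_p) filtering over a toy token distribution.
def top_p_filter(probs, top_p=0.9):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for tok, p in ranked:
        kept[tok] = p
        total += p
        if total >= top_p:  # nucleus reached: stop adding tokens
            break
    # Renormalize the kept tokens so they form a valid distribution.
    return {tok: p / total for tok, p in kept.items()}

print(top_p_filter({"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}))
```

With `top_p: 0.9`, the lowest-probability tail (`"d"` here) is excluded before sampling.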
EXPORT Block
Defines export formats and paths.
EXPORT {
  format: ["gguf", "onnx", "okm", "safetensors"]
  path: "export/"
  quantization: "int8"
}
Supported formats: gguf, onnx, okm, safetensors, tflite
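To illustrate what `quantization: "int8"` implies, here is a sketch of symmetric int8 quantization, the common scheme for this setting (the exact scheme OktoScript applies per export format may differ):

```python
import numpy as np

# Symmetric int8 quantization sketch: weights are scaled into
# [-127, 127], stored as int8, and rescaled back on load.
w = np.array([-0.8, -0.1, 0.0, 0.4, 1.2], dtype=np.float32)
scale = np.abs(w).max() / 127.0
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_restored = q.astype(np.float32) * scale

# Storage drops to one byte per weight; the rounding error per weight
# is bounded by half the scale.
print(q.dtype, float(np.abs(w - w_restored).max()))
```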
Complete Example
A complete OktoScript v1.2 example:
# okto_version: "1.2"
PROJECT "PizzaBot"
DESCRIPTION "AI specialized in pizza restaurant service"

ENV {
  accelerator: "gpu"
  min_memory: "8GB"
  precision: "fp16"
}

DATASET {
  train: "dataset/train.jsonl"
  validation: "dataset/val.jsonl"
  format: "jsonl"
  type: "chat"
}

MODEL {
  base: "oktoseek/pizza-small"
  device: "cuda"
}

TRAIN {
  epochs: 10
  batch_size: 32
  learning_rate: 0.0001
  optimizer: "adamw"
  device: "cuda"
}

CONTROL {
  on_epoch_end {
    IF loss > 2.0 {
      SET LR = 0.00005
    }
    IF accuracy > 0.9 {
      SAVE "best_model"
    }
  }
}

MONITOR {
  metrics: ["loss", "accuracy", "gpu_usage"]
  notify_if {
    loss > 2.0
  }
}

BEHAVIOR {
  personality: "friendly"
  language: "en"
}

GUARD {
  prevent {
    toxicity
    bias
  }
}

EXPORT {
  format: ["okm", "onnx"]
  path: "export/"
}
For the complete grammar specification, see:
GitHub - Complete Grammar Documentation