Browser diagnostics
Confirms that the OCR engine and the local LLM run correctly in this browser. Everything on this page stays client-side; nothing is uploaded.
Client-side capability check
Runtime acceleration backend
Surfaces what wllama actually picks at runtime, so you can tell a single-threaded WASM fallback or a missing WebGPU adapter apart from a healthy multi-thread + GPU setup without opening devtools. The pre-load row probes the browser; the post-load row reports what wllama settled on after the model is loaded.
Pre-load snapshot (JSON)
(pending)
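As a rough illustration of what a pre-load probe can look at, the sketch below gathers a few browser facts and turns them into a prediction. The names `PreloadFacts`, `classifyPreload`, and `gatherPreloadFacts` are hypothetical, and the rule that multi-threading needs cross-origin isolation plus `SharedArrayBuffer` is the general browser requirement for threaded WASM; whether wllama applies exactly these conditions is an assumption. The post-load row remains the authority on what was actually picked.

```typescript
// Hypothetical snapshot shape; the page's real JSON fields may differ.
interface PreloadFacts {
  crossOriginIsolated: boolean;
  hasSharedArrayBuffer: boolean;
  hardwareConcurrency: number;
  hasWebGpuAdapter: boolean;
}

type ExpectedBackend =
  | "multi-thread + GPU"
  | "multi-thread, CPU only"
  | "single-thread + GPU"
  | "single-thread, CPU only";

// Pure prediction from the gathered facts (an assumption about the runtime,
// not a report of what it chose).
function classifyPreload(facts: PreloadFacts): ExpectedBackend {
  const multiThread =
    facts.crossOriginIsolated &&
    facts.hasSharedArrayBuffer &&
    facts.hardwareConcurrency > 1;
  if (multiThread) {
    return facts.hasWebGpuAdapter ? "multi-thread + GPU" : "multi-thread, CPU only";
  }
  return facts.hasWebGpuAdapter ? "single-thread + GPU" : "single-thread, CPU only";
}

// Reads the facts from the current global scope. Outside a browser most of
// these are simply absent and fall back to conservative defaults.
async function gatherPreloadFacts(): Promise<PreloadFacts> {
  const g = globalThis as any;
  let hasWebGpuAdapter = false;
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    const adapter = await g.navigator?.gpu?.requestAdapter();
    hasWebGpuAdapter = adapter != null;
  } catch {
    hasWebGpuAdapter = false;
  }
  return {
    crossOriginIsolated: g.crossOriginIsolated === true,
    hasSharedArrayBuffer: typeof g.SharedArrayBuffer !== "undefined",
    hardwareConcurrency: g.navigator?.hardwareConcurrency ?? 1,
    hasWebGpuAdapter,
  };
}
```

Calling `classifyPreload(await gatherPreloadFacts())` on a page served without cross-origin isolation headers would be expected to report a single-thread result even on a many-core machine.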
Console log capture
Mirrors the devtools console (wllama wrapper, llama.cpp native log from suppressNativeLog: false, runtime diagnostics) so you can copy it without opening devtools. Capture starts when the page loads, so reload before running a check if you want a clean trace.
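One common way to mirror a console is to wrap its methods so each call is recorded and then forwarded unchanged. The sketch below shows that pattern; `installConsoleCapture`, `CapturedEntry`, and the choice of levels are illustrative assumptions, not the page's actual implementation.

```typescript
type Level = "log" | "info" | "warn" | "error" | "debug";

interface CapturedEntry {
  level: Level;
  time: number; // milliseconds since epoch, from Date.now()
  text: string;
}

type ConsoleLike = Record<Level, (...args: unknown[]) => void>;

// Render one console argument as copyable text.
function formatArg(arg: unknown): string {
  if (typeof arg === "string") return arg;
  if (arg instanceof Error) return arg.stack ?? arg.message;
  try {
    const json = JSON.stringify(arg);
    return typeof json === "string" ? json : String(arg);
  } catch {
    // Circular structures and similar values fall back to String().
    return String(arg);
  }
}

// Wraps each level on `target`, pushing entries into `sink`.
// Returns a function that restores the original methods.
function installConsoleCapture(target: ConsoleLike, sink: CapturedEntry[]): () => void {
  const levels: Level[] = ["log", "info", "warn", "error", "debug"];
  const originals = {} as ConsoleLike;
  levels.forEach((level) => {
    const original = target[level];
    originals[level] = original;
    target[level] = (...args: unknown[]) => {
      sink.push({ level, time: Date.now(), text: args.map(formatArg).join(" ") });
      original.apply(target, args); // still show the message in devtools
    };
  });
  return () => {
    levels.forEach((level) => {
      target[level] = originals[level];
    });
  };
}
```

Because a wrapper like this only sees calls made after it is installed, anything logged earlier is missed, which is why capture is tied to page load.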
LLM diagnostic
Loads the selected model, generates a few tokens, validates structured output (JSON-schema tool call), then OCRs a synthetic image to confirm multimodal extraction works end to end. Run each check individually or use Run all. Each run appends a row to the results table.
| Step | Result | Time | Detail |
|---|---|---|---|
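To make the structured-output step concrete, here is a minimal sketch of checking that a model reply parses as a tool call of the expected shape. It is a hand-rolled check, not a full JSON Schema validator, and the `{ name, arguments }` layout, `ToolSpec`, and `validateToolCall` are assumptions made for illustration.

```typescript
type FieldType = "string" | "number" | "boolean";

// Hypothetical, simplified stand-in for a JSON schema: required argument
// names mapped to their primitive types.
interface ToolSpec {
  name: string;
  required: Record<string, FieldType>;
}

interface CheckResult {
  ok: boolean;
  detail: string;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function validateToolCall(raw: string, spec: ToolSpec): CheckResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, detail: "output is not valid JSON" };
  }
  if (!isPlainObject(parsed)) {
    return { ok: false, detail: "top-level value is not an object" };
  }
  if (parsed.name !== spec.name) {
    return { ok: false, detail: `expected tool "${spec.name}"` };
  }
  const args = parsed.arguments;
  if (!isPlainObject(args)) {
    return { ok: false, detail: "arguments is not an object" };
  }
  for (const key of Object.keys(spec.required)) {
    if (typeof args[key] !== spec.required[key]) {
      return { ok: false, detail: `argument "${key}" should be a ${spec.required[key]}` };
    }
  }
  return { ok: true, detail: "tool call matches the expected shape" };
}
```

The `ok` and `detail` fields map naturally onto the Result and Detail columns of a results row.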
Live token stream (drag the bottom-right corner to resize; previous step runs stay visible until Clear results):
LLM benchmarking
Sweeps thread count (1, 3, and the value the user pipeline picks on this device) against compute offload (all GPU, all CPU, GPU with the vision encoder forced to CPU). For each combination, the model is reloaded, then a short text generation and a synthetic multimodal OCR run, each stopped after 10 generated tokens (thinking tokens included). TTFT (time to first token) and tok/s are recorded per task. The final table is sorted by multimodal tok/s, descending, since OCR is the production workload. The sweep uses the Model picker and the Model / Completion options textareas above. Cancel takes effect once the current combination's load finishes.
| Combination | Text TTFT | Text tok/s | OCR TTFT | OCR tok/s |
|---|---|---|---|---|
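The sweep and the final ordering can be sketched as follows. All names here (`buildCombinations`, `measure`, `sortByOcrThroughput`) are hypothetical, and the tok/s definition used, tokens after the first divided by the time elapsed since the first token, is one reasonable assumption rather than the page's confirmed formula.

```typescript
type Offload = "all-gpu" | "all-cpu" | "gpu-vision-on-cpu";

interface Combination {
  threads: number;
  offload: Offload;
}

interface TaskTiming {
  ttftMs: number;
  tokensPerSecond: number;
}

interface BenchRow {
  combo: Combination;
  text: TaskTiming;
  ocr: TaskTiming;
}

// Thread counts 1, 3, and the pipeline's own pick (deduplicated), crossed
// with the three offload modes.
function buildCombinations(pipelineThreads: number): Combination[] {
  const threads = [1, 3, pipelineThreads].filter((v, i, all) => all.indexOf(v) === i);
  const offloads: Offload[] = ["all-gpu", "all-cpu", "gpu-vision-on-cpu"];
  const combos: Combination[] = [];
  for (const t of threads) {
    for (const o of offloads) {
      combos.push({ threads: t, offload: o });
    }
  }
  return combos;
}

// Timestamps are in milliseconds. Assumed definition: decode rate excludes
// the wait for the first token.
function measure(startedAt: number, firstTokenAt: number, finishedAt: number, tokens: number): TaskTiming {
  const decodeSeconds = (finishedAt - firstTokenAt) / 1000;
  const tokensPerSecond = decodeSeconds > 0 && tokens > 1 ? (tokens - 1) / decodeSeconds : 0;
  return { ttftMs: firstTokenAt - startedAt, tokensPerSecond };
}

// Returns a new array sorted by OCR tok/s, fastest first.
function sortByOcrThroughput(rows: BenchRow[]): BenchRow[] {
  return [...rows].sort((a, b) => b.ocr.tokensPerSecond - a.ocr.tokensPerSecond);
}
```

With this grid, a device whose pipeline picks 3 threads yields six combinations instead of nine, since the duplicate thread count is dropped.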