AiFormParser

Browser diagnostics

Confirms that the OCR engine and the local LLM run correctly in this browser. Everything on this page stays client-side; nothing is uploaded.

Client-side capability check

Checking browser capabilities...

  

Runtime acceleration backend

Surfaces what wllama actually picks at runtime, so you can tell a single-threaded WASM fallback or a missing WebGPU adapter apart from a healthy multi-thread + GPU setup without opening devtools. The pre-load row probes the browser; the post-load row reports what wllama settled on after the model is loaded.
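As a rough illustration of what a pre-load probe can look at, the TypeScript sketch below checks for a WebGPU adapter and for the conditions multi-threaded WebAssembly needs (cross-origin isolation plus SharedArrayBuffer). It is not this page's actual code: `probePreload`, `describeBackend`, and the snapshot field names are hypothetical, and the post-load row would come from wllama itself, which is not modeled here.

```typescript
// Hypothetical snapshot shape; the page's real JSON fields are not shown here.
interface PreloadSnapshot {
  webgpuAdapter: boolean;       // navigator.gpu.requestAdapter() returned an adapter
  crossOriginIsolated: boolean; // required before SharedArrayBuffer is usable
  sharedArrayBuffer: boolean;   // required for multi-threaded WASM
  hardwareConcurrency: number;  // logical cores reported by the browser
}

// `env` defaults to globalThis; passing a stub lets the probe run outside a browser.
async function probePreload(env: any = globalThis): Promise<PreloadSnapshot> {
  let webgpuAdapter = false;
  const gpu = env.navigator?.gpu;
  if (gpu && typeof gpu.requestAdapter === "function") {
    try {
      // requestAdapter() resolves to null when no suitable adapter exists.
      webgpuAdapter = (await gpu.requestAdapter()) != null;
    } catch {
      webgpuAdapter = false;
    }
  }
  return {
    webgpuAdapter,
    crossOriginIsolated: env.crossOriginIsolated === true,
    sharedArrayBuffer: typeof env.SharedArrayBuffer === "function",
    hardwareConcurrency: env.navigator?.hardwareConcurrency ?? 1,
  };
}

// Turns a snapshot into a one-line summary (wording is an assumption).
function describeBackend(s: PreloadSnapshot): string {
  const canThread =
    s.crossOriginIsolated && s.sharedArrayBuffer && s.hardwareConcurrency > 1;
  const threads = canThread
    ? "multi-thread WASM possible"
    : "single-thread WASM fallback likely";
  const gpu = s.webgpuAdapter ? "WebGPU adapter found" : "no WebGPU adapter";
  return `${threads}; ${gpu}`;
}
```

In a browser, `probePreload().then(describeBackend)` would yield the summary string. A pre-load probe of this kind only reports what the browser offers, which is why a separate post-load row is still needed to see what the runtime actually chose.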

Probing browser capabilities...
Backend (post-load): load a model to populate.
Pre-load snapshot (JSON)
(pending)

Console log capture

Mirrors the devtools console (the wllama wrapper, the llama.cpp native log enabled by suppressNativeLog: false, and runtime diagnostics) so you can copy it without opening devtools. Capture starts when the page loads, so reload before running a check if you want a clean trace.
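One common way to mirror a console is to wrap its methods so each call is recorded and then forwarded unchanged. The sketch below shows that pattern under stated assumptions: `startConsoleCapture`, `Sink`, and `CapturedLine` are hypothetical names, and how this page actually hooks the wllama and llama.cpp logs is not shown in the source.

```typescript
type Level = "log" | "info" | "warn" | "error" | "debug";
type LogFn = (...args: unknown[]) => void;
type Sink = Record<Level, LogFn>; // the global console satisfies this shape

interface CapturedLine {
  level: Level;
  time: number; // ms since epoch, from Date.now()
  text: string;
}

function formatArg(arg: unknown): string {
  if (typeof arg === "string") return arg;
  try {
    // JSON.stringify returns undefined for values such as functions.
    const json = JSON.stringify(arg) as string | undefined;
    return json === undefined ? String(arg) : json;
  } catch {
    return String(arg); // e.g. circular structures
  }
}

function startConsoleCapture(sink: Sink) {
  const levels: Level[] = ["log", "info", "warn", "error", "debug"];
  const lines: CapturedLine[] = [];
  const originals: Partial<Sink> = {};
  for (const level of levels) {
    const original = sink[level];
    originals[level] = original;
    sink[level] = (...args: unknown[]) => {
      lines.push({ level, time: Date.now(), text: args.map(formatArg).join(" ") });
      original.apply(sink, args); // still forward to the real console
    };
  }
  // Restores the original methods; already captured lines are kept.
  const stop = () => {
    for (const level of levels) sink[level] = originals[level] as LogFn;
  };
  return { lines, stop };
}
```

Calling `startConsoleCapture(console)` as early as possible matches the note above: anything logged before the wrapper is installed is not captured, so a reload gives the cleanest trace.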


  

LLM diagnostic

Loads the selected model, generates a few tokens, validates structured output (JSON-schema tool call), then OCRs a synthetic image to confirm multimodal extraction works end to end. Run each check individually or use Run all. Each run appends a row to the results table.
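To make the structured-output step concrete, here is a minimal sketch of checking a model's tool call against a list of required fields. It is a hand-rolled stand-in for a real JSON-schema validator, not the check this page runs: the `{ name, arguments }` call shape, the `extract_form` tool name, and `validateToolCall` itself are all assumptions made for illustration.

```typescript
type FieldType = "string" | "number" | "boolean";

// Hypothetical, much-reduced stand-in for a JSON schema.
interface ToolSchema {
  name: string;
  required: Record<string, FieldType>;
}

interface Validation {
  ok: boolean;
  reason?: string; // set when ok is false
  args?: Record<string, unknown>; // set when ok is true
}

function validateToolCall(raw: string, schema: ToolSchema): Validation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, reason: "not an object" };
  }
  const call = parsed as { name?: unknown; arguments?: unknown };
  if (call.name !== schema.name) {
    return { ok: false, reason: "unexpected tool name" };
  }
  const rawArgs = call.arguments;
  if (typeof rawArgs !== "object" || rawArgs === null || Array.isArray(rawArgs)) {
    return { ok: false, reason: "arguments must be an object" };
  }
  const args = rawArgs as Record<string, unknown>;
  // Every required field must be present with the expected primitive type.
  for (const key of Object.keys(schema.required)) {
    if (typeof args[key] !== schema.required[key]) {
      return { ok: false, reason: `field "${key}" should be ${schema.required[key]}` };
    }
  }
  return { ok: true, args };
}
```

A check like this distinguishes output that merely parses as JSON from output that carries the fields a downstream form parser expects, which is the distinction the structured-output step is meant to exercise.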

Diagnostic only. Both controls trigger a model reload on the next run. If the Model options YAML below explicitly sets the same key, the YAML value wins and the matching picker option is disabled.
Idle. Pick a model and press Run.
Step | Result | Time | Detail

Live token stream (drag the bottom-right corner to resize; previous step runs stay visible until Clear results):

LLM benchmarking

Sweeps thread count (1, 3, and the value the user pipeline picks on this device) against compute offload (all GPU, all CPU, GPU with the vision encoder forced to CPU). For each combination, the model is reloaded, then a short text generation and a synthetic multimodal OCR each run until 10 tokens have been generated (thinking tokens included). Time to first token (TTFT) and tokens per second (tok/s) are recorded per task. The final table is sorted by multimodal tok/s, descending, since OCR is the production workload. The sweep uses the Model picker and the Model / Completion options textareas above. Cancel interrupts the sweep after the current combination's load.
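The sweep described above amounts to a small grid of thread counts and offload modes, followed by a sort on OCR throughput. A sketch of that bookkeeping follows; the type and function names are hypothetical, and dropping duplicate thread counts (when the pipeline's pick is already 1 or 3) is an assumption rather than confirmed behavior.

```typescript
type Offload = "all-gpu" | "all-cpu" | "gpu-vision-on-cpu";

interface Combo {
  threads: number;
  offload: Offload;
}

interface BenchRow extends Combo {
  textTtftMs: number;
  textTokPerSec: number;
  ocrTtftMs: number;
  ocrTokPerSec: number;
}

// Thread counts 1, 3, and the pipeline's pick, crossed with the three offload modes.
function buildCombos(pipelineThreads: number): Combo[] {
  // Assumption: duplicates are dropped so no combination is measured twice.
  const threadCounts = [1, 3, pipelineThreads].filter(
    (t, i, all) => all.indexOf(t) === i
  );
  const offloads: Offload[] = ["all-gpu", "all-cpu", "gpu-vision-on-cpu"];
  const combos: Combo[] = [];
  for (const threads of threadCounts) {
    for (const offload of offloads) {
      combos.push({ threads, offload });
    }
  }
  return combos;
}

// Sorts a copy by multimodal (OCR) tok/s, highest first; the input is left untouched.
function sortByOcrThroughput(rows: BenchRow[]): BenchRow[] {
  return rows.slice().sort((a, b) => b.ocrTokPerSec - a.ocrTokPerSec);
}
```

With a pipeline pick of 8 threads this gives nine combinations, each of which costs a full model reload, so the grid is deliberately small.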

Idle. Press Run to sweep thread count x compute offload.
Combination | Text TTFT | Text tok/s | OCR TTFT | OCR tok/s