Generated 2026-03-21T17:11:28.8457513+00:00
Benchmark anchor model: TinyLlama-1.1B
| Model | Training | Efficacy | Exact-match accuracy | Expected-token recall |
|---|---|---|---|---|
| bitnet-b1.58-sharp | Completed (6 examples, 3 epochs) | 100.0 % | 33.3 % | 93.1 % |
| traditional-local | Completed (6 examples, 24 epochs) | 100.0 % | 33.3 % | 93.1 % |
Perplexity dataset: WikiText2
| Model | Response mean | Response tokens/sec | Training mean | Perplexity | Response allocated | Estimated resident model memory |
|---|---|---|---|---|---|---|
| bitnet-b1.58-sharp | 82.86 ms | 181.03 | 967.7 ms | 72.62 | 20.57 MB | 4.09 MB |
| traditional-local | 104.71 ms | 143.25 | 110.6 ms | 3218.57 | 1.82 MB | 0.11 MB |
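The latency and throughput columns above are internally consistent: mean response time multiplied by tokens/sec recovers roughly 15 generated tokens per response for both models, and the reported perplexity is the exponentiated mean token negative log-likelihood. A minimal sketch (the `perplexity` helper and its inputs are illustrative, not the benchmark's actual code):

```python
import math

def perplexity(token_logprobs):
    """exp(-mean(log p)): the quantity reported in the Perplexity column."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Cross-check the table: mean latency (s) x tokens/sec ~= tokens per response.
tokens_per_response = {
    "bitnet-b1.58-sharp": 0.08286 * 181.03,  # 82.86 ms * 181.03 tok/s
    "traditional-local": 0.10471 * 143.25,   # 104.71 ms * 143.25 tok/s
}
# Both models come out at roughly 15 tokens per response.
```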
| Delta | Value |
|---|---|
| BitNet speedup vs traditional | 1.26x |
| BitNet memory reduction vs traditional | -1030.22% (BitNet allocates ~11.3x more per response) |
| BitNet resident model memory increase vs traditional | 3543.08% |
| BitNet quality improvement vs traditional (relative WikiText2 perplexity reduction) | 97.74% |
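The delta rows are plain ratios over the measurements reported above; a quick reproduction (values copied from this report):

```python
# Raw figures from the benchmark tables in this report.
bitnet_ms, trad_ms = 82.86, 104.71      # mean response latency (ms)
bitnet_mb, trad_mb = 20.57, 1.82        # response allocated memory (MB)
bitnet_ppl, trad_ppl = 72.62, 3218.57   # WikiText2 perplexity

speedup = trad_ms / bitnet_ms                            # 1.26x
memory_reduction = (1 - bitnet_mb / trad_mb) * 100       # -1030.22 % (an increase)
quality_improvement = (1 - bitnet_ppl / trad_ppl) * 100  # 97.74 %
```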
Response tokens per second
| Model | Value |
|---|---|
| bitnet-b1.58-sharp | 181.03 |
| traditional-local | 143.25 |
Response allocated memory
| Model | Value |
|---|---|
| bitnet-b1.58-sharp | 20.57 MB |
| traditional-local | 1.82 MB |
Estimated resident model memory
| Model | Value |
|---|---|
| bitnet-b1.58-sharp | 4.09 MB |
| traditional-local | 0.11 MB |
WikiText2 perplexity (lower is better)
| Model | Value |
|---|---|
| bitnet-b1.58-sharp | 72.62 |
| traditional-local | 3218.57 |
| Model | Prompt | Expected-token recall | Exact match | Response |
|---|---|---|---|---|
| bitnet-b1.58-sharp | which model anchors this benchmark | 90.9 % | No | tinyllama- 1. 1 b anchors the shared benchmark training slice for both local models. |
| bitnet-b1.58-sharp | how do I compare perplexity | 81.8 % | No | use the benchmark report to compare wikitext 2 perplexity after tinyllama- 1. 1 b training. |
| bitnet-b1.58-sharp | what does the paper model train on | 93.3 % | No | the paper aligned bitnet model fine tunes ternary output weights on the tinyllama- 1. 1 b benchmark slice. |
| bitnet-b1.58-sharp | what does the traditional model train on | 92.3 % | No | the traditional local model optimizes tensor softmax logits on the same tinyllama- 1. 1 b slice. |
| bitnet-b1.58-sharp | how are you hosted | 100.0 % | Yes | both benchmark models stay in process with microsoft agent framework hosting and local diagnostics. |
| bitnet-b1.58-sharp | what language do you use | 100.0 % | Yes | benchmark prompts and diagnostics stay in clear american english. |
| traditional-local | which model anchors this benchmark | 90.9 % | No | tinyllama- 1. 1 b anchors the shared benchmark training slice for both local models. |
| traditional-local | how do I compare perplexity | 81.8 % | No | use the benchmark report to compare wikitext 2 perplexity after tinyllama- 1. 1 b training. |
| traditional-local | what does the paper model train on | 93.3 % | No | the paper aligned bitnet model fine tunes ternary output weights on the tinyllama- 1. 1 b benchmark slice. |
| traditional-local | what does the traditional model train on | 92.3 % | No | the traditional local model optimizes tensor softmax logits on the same tinyllama- 1. 1 b slice. |
| traditional-local | how are you hosted | 100.0 % | Yes | both benchmark models stay in process with microsoft agent framework hosting and local diagnostics. |
| traditional-local | what language do you use | 100.0 % | Yes | benchmark prompts and diagnostics stay in clear american english. |
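The 33.3 % exact-match accuracy and 93.1 % expected-token recall in the summary table are aggregates of these six per-prompt rows; a quick check (assuming recall is averaged per prompt, which matches the summary to rounding):

```python
# Per-prompt results copied from the table above (identical for both models).
recalls = [90.9, 81.8, 93.3, 92.3, 100.0, 100.0]   # expected-token recall, %
exact = [False, False, False, False, True, True]   # exact-match flags

exact_match_accuracy = 100 * sum(exact) / len(exact)  # ~33.3 %
mean_recall = sum(recalls) / len(recalls)             # ~93.1 %
```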
| Model | Passed | Pending | Failed |
|---|---|---|---|
| bitnet-b1.58-sharp | 10 | 0 | 1 |
| Model | Area | Status | Requirement | Details |
|---|---|---|---|---|
| bitnet-b1.58-sharp | Architecture | Passed | Decoder-only transformer topology matches the paper-aligned BitNet surface. | Layers=4/4, BitLinear projections=29/29, output head=256->68. |
| bitnet-b1.58-sharp | Architecture | Passed | BitLinear projections stay ternary, bias-free, and use signed 8-bit activation quantization. | Bias-free projections=True, activation quantization=±127 (8-bit), ternary weights=1579046/1052835/1579831, empirical entropy=1.561 bits/weight, theoretical ternary limit=1.585. |
| bitnet-b1.58-sharp | Architecture | Passed | RMSNorm layers stay bias-free and use the paper epsilon. | Norm count=9, epsilon=0.00001, learnable scale only=True. |
| bitnet-b1.58-sharp | Architecture | Passed | Attention uses Q/K/V/O BitLinear projections, RoPE on Q/K, and causal masking. | Attention layers=4, head count=8, head dimension=32, scaled-dot-product factor=0.1768. |
| bitnet-b1.58-sharp | Architecture | Passed | Feed-forward blocks use paper-style SwiGLU gate/up/down BitLinear projections. | Feed-forward layers=4, hidden dimension=1024, SwiGLU activation=True. |
| bitnet-b1.58-sharp | Architecture | Passed | Seeded inference is deterministic for repeated prompts. | Prompt='how are you hosted', first response='both benchmark models stay', second response='both benchmark models stay'. |
| bitnet-b1.58-sharp | Memory | Failed | BitNet resident parameter storage exceeds the traditional comparison model; investigate weight or embedding configuration. | BitNet resident parameters=4.09 MB versus traditional-local=115.02 KB (36.43x). The 29 BitLinear projections consume 4.02 MB storing only ternary sbyte weights plus a single float32 gamma scalar per layer (~8 bits/weight before any sparse packing). Token embeddings add 68 KB and RMSNorm scales add 9 KB. |
| bitnet-b1.58-sharp | Runtime | Passed | Paper-model fine-tuning is available from the supported runtime surface. | Validated cloned-model training on 6 default examples for 3 epochs; average loss=4.72. |
| bitnet-b1.58-sharp | Benchmark pipeline | Passed | Perplexity measurements are implemented and reported for named benchmark fixture slices. | WikiText2=72.62 ppl (2 samples), C4=74.03 ppl (2 samples), RedPajama=68.82 ppl (2 samples). |
| bitnet-b1.58-sharp | Benchmark pipeline | Passed | Zero-shot benchmark fixtures are implemented and reported. | ARC-Easy=0/1 (0 %), HellaSwag=0/1 (0 %), WinoGrande=0/1 (0 %), PIQA=0/1 (0 %), StoryCloze=0/1 (0 %). |
| bitnet-b1.58-sharp | Runtime | Passed | Repository checkpoint export/import round-trips through the paper model. | Prompt='how are you hosted', original='both benchmark models stay', reloaded='both benchmark models stay'. |
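Several Details figures in the architecture and memory checks above can be reproduced independently: the empirical entropy of the ternary weight counts, the log2(3) ternary limit, the 1/sqrt(32) scaled-dot-product factor, and the ~4.02 MB of sbyte weight storage flagged by the failed memory check. The `absmean_ternary` helper is a sketch of the BitNet b1.58 absmean quantization recipe, not this project's actual implementation:

```python
import math

# Ternary weight counts (-1 / 0 / +1) from the BitLinear architecture check.
counts = [1579046, 1052835, 1579831]
total = sum(counts)
entropy = -sum(c / total * math.log2(c / total) for c in counts)  # ~1.561 bits/weight
ternary_limit = math.log2(3)                                      # ~1.585 bits/weight

# Scaled dot-product attention factor for head dimension 32.
sdp_factor = 1 / math.sqrt(32)                                    # ~0.1768

# One sbyte per ternary weight explains the memory check's 4.02 MB figure.
bitlinear_mb = total / (1024 * 1024)                              # ~4.02 MB

def absmean_ternary(w):
    """Sketch of absmean quantization to {-1, 0, +1} a la BitNet b1.58."""
    gamma = sum(abs(x) for x in w) / len(w)  # mean absolute weight
    return [max(-1, min(1, round(x / (gamma + 1e-8)))) for x in w]
```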
| Operation | Model | Mean | StdDev | Allocated | Reports |
|---|---|---|---|---|---|
| SpecFlow: Build the agent host for the selected model | bitnet-b1.58-sharp | 84.463 ms | 0.7225 ms | 21057.15 KB | HTML · CSV · Markdown |
| SpecFlow: Build the agent host for the selected model | traditional-local | 2.498 ms | 0.0305 ms | 419.06 KB | HTML · CSV · Markdown |
| SpecFlow: Generate a response for a prompt | bitnet-b1.58-sharp | 82.86 ms | 0.468 ms | 20.57 MB | HTML · CSV · Markdown |
| SpecFlow: Generate a response for a prompt | traditional-local | 104.71 ms | 0.775 ms | 1.82 MB | HTML · CSV · Markdown |
| SpecFlow: Stream a response for a prompt | bitnet-b1.58-sharp | 82.92 ms | 0.404 ms | 20.58 MB | HTML · CSV · Markdown |
| SpecFlow: Stream a response for a prompt | traditional-local | 110.32 ms | 1.155 ms | 1.83 MB | HTML · CSV · Markdown |
| SpecFlow: Train the selected model on the TinyLlama-1.1B benchmark dataset | bitnet-b1.58-sharp | 967.7 ms | 2.70 ms | 37.58 MB | HTML · CSV · Markdown |
| SpecFlow: Train the selected model on the TinyLlama-1.1B benchmark dataset | traditional-local | 110.6 ms | 1.91 ms | 1.72 MB | HTML · CSV · Markdown |