BitNet benchmark comparison report

Generated 2026-03-21T17:11:28.8457513+00:00

Shared integration inputs

Efficacy and accuracy summary

| Model | Training | Efficacy | Exact-match accuracy | Expected-token recall |
| --- | --- | --- | --- | --- |
| bitnet-b1.58-sharp | Completed (6 examples, 3 epochs) | 100.0 % | 33.3 % | 93.1 % |
| traditional-local | Completed (6 examples, 24 epochs) | 100.0 % | 33.3 % | 93.1 % |
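
The report does not spell out how the two accuracy metrics are computed. A minimal sketch, assuming the conventional definitions (exact match as normalized string equality, expected-token recall as the fraction of expected tokens that appear in the response):

```python
# Hypothetical metric definitions; the exact normalization used by the
# benchmark harness is an assumption, not taken from the report.
def exact_match(response: str, expected: str) -> bool:
    # Case- and whitespace-insensitive string equality.
    return response.strip().lower() == expected.strip().lower()

def expected_token_recall(response: str, expected: str) -> float:
    # Fraction of expected tokens that occur anywhere in the response.
    response_tokens = set(response.lower().split())
    expected_tokens = expected.lower().split()
    if not expected_tokens:
        return 1.0
    hits = sum(1 for token in expected_tokens if token in response_tokens)
    return hits / len(expected_tokens)
```

Under these definitions a response can score high recall (most expected tokens present) while still failing exact match, which is consistent with the 93.1 % recall versus 33.3 % exact-match split above.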

BitNet vs traditional comparison summary

Perplexity dataset: WikiText2

| Model | Response mean | Response tokens/sec | Training mean | Perplexity | Response allocated | Estimated resident model memory |
| --- | --- | --- | --- | --- | --- | --- |
| bitnet-b1.58-sharp | 82.86 ms | 181.03 | 967.7 ms | 72.62 | 20.57 MB | 4.09 MB |
| traditional-local | 104.71 ms | 143.25 | 110.6 ms | 3218.57 | 1.82 MB | 0.11 MB |

| Delta | Value |
| --- | --- |
| BitNet speedup vs traditional | 1.26x |
| BitNet memory reduction vs traditional | -1030.22% |
| BitNet resident model memory increase vs traditional | 3543.08% |
| BitNet quality improvement vs traditional | 97.74% |
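
The delta rows follow directly from the summary table. A quick sketch reproducing them from the displayed (rounded) figures; the resident-memory ratio uses the 115.02 KB traditional figure reported in the paper-alignment audit, so results match the report to rounding:

```python
# Figures copied from the comparison summary table above.
bitnet_ms, trad_ms = 82.86, 104.71                  # response mean
bitnet_alloc, trad_alloc = 20.57, 1.82              # response allocated, MB
bitnet_ppl, trad_ppl = 72.62, 3218.57               # WikiText2 perplexity
bitnet_res_kb, trad_res_kb = 4.09 * 1024, 115.02    # resident memory, KB (audit)

speedup = trad_ms / bitnet_ms                           # 1.26x
mem_reduction = (1 - bitnet_alloc / trad_alloc) * 100   # negative: BitNet allocates more
res_increase = (bitnet_res_kb / trad_res_kb - 1) * 100  # ~3543 % (rounding in the MB figure)
quality = (1 - bitnet_ppl / trad_ppl) * 100             # lower perplexity is better
```

Note that the "memory reduction" delta is negative because BitNet allocates roughly 11x more per response than the traditional model, so the sign conventions in the delta table read correctly once the formulas are made explicit.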

Comparison charts

Response tokens/sec

| Model | Value |
| --- | --- |
| bitnet-b1.58-sharp | 181.03 |
| traditional-local | 143.25 |

Response allocated (MB)

| Model | Value |
| --- | --- |
| bitnet-b1.58-sharp | 20.57 MB |
| traditional-local | 1.82 MB |

Estimated resident model memory (MB)

| Model | Value |
| --- | --- |
| bitnet-b1.58-sharp | 4.09 MB |
| traditional-local | 0.11 MB |

Perplexity

| Model | Value |
| --- | --- |
| bitnet-b1.58-sharp | 72.62 |
| traditional-local | 3218.57 |
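
The report does not state how perplexity is computed; a sketch under the standard definition (exponential of the mean per-token negative log-likelihood over the fixture slice):

```python
import math

# Standard perplexity definition; the harness's exact averaging over
# WikiText2 samples is an assumption, not taken from the report.
def perplexity(token_nlls: list[float]) -> float:
    # token_nlls: per-token negative log-likelihoods in nats.
    return math.exp(sum(token_nlls) / len(token_nlls))
```

A model that is uniformly uncertain over k candidate tokens has perplexity k, which is why the traditional model's 3218.57 against BitNet's 72.62 indicates a much flatter, less predictive output distribution.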

Query script results

| Model | Prompt | Expected-token recall | Exact match | Response |
| --- | --- | --- | --- | --- |
| bitnet-b1.58-sharp | which model anchors this benchmark | 90.9 % | No | tinyllama- 1. 1 b anchors the shared benchmark training slice for both local models. |
| bitnet-b1.58-sharp | how do I compare perplexity | 81.8 % | No | use the benchmark report to compare wikitext 2 perplexity after tinyllama- 1. 1 b training. |
| bitnet-b1.58-sharp | what does the paper model train on | 93.3 % | No | the paper aligned bitnet model fine tunes ternary output weights on the tinyllama- 1. 1 b benchmark slice. |
| bitnet-b1.58-sharp | what does the traditional model train on | 92.3 % | No | the traditional local model optimizes tensor softmax logits on the same tinyllama- 1. 1 b slice. |
| bitnet-b1.58-sharp | how are you hosted | 100.0 % | Yes | both benchmark models stay in process with microsoft agent framework hosting and local diagnostics. |
| bitnet-b1.58-sharp | what language do you use | 100.0 % | Yes | benchmark prompts and diagnostics stay in clear american english. |
| traditional-local | which model anchors this benchmark | 90.9 % | No | tinyllama- 1. 1 b anchors the shared benchmark training slice for both local models. |
| traditional-local | how do I compare perplexity | 81.8 % | No | use the benchmark report to compare wikitext 2 perplexity after tinyllama- 1. 1 b training. |
| traditional-local | what does the paper model train on | 93.3 % | No | the paper aligned bitnet model fine tunes ternary output weights on the tinyllama- 1. 1 b benchmark slice. |
| traditional-local | what does the traditional model train on | 92.3 % | No | the traditional local model optimizes tensor softmax logits on the same tinyllama- 1. 1 b slice. |
| traditional-local | how are you hosted | 100.0 % | Yes | both benchmark models stay in process with microsoft agent framework hosting and local diagnostics. |
| traditional-local | what language do you use | 100.0 % | Yes | benchmark prompts and diagnostics stay in clear american english. |

Paper-alignment audit

| Model | Passed | Pending | Failed |
| --- | --- | --- | --- |
| bitnet-b1.58-sharp | 10 | 0 | 1 |
| Model | Area | Status | Requirement | Details |
| --- | --- | --- | --- | --- |
| bitnet-b1.58-sharp | Architecture | Passed | Decoder-only transformer topology matches the paper-aligned BitNet surface. | Layers=4/4, BitLinear projections=29/29, output head=256->68. |
| bitnet-b1.58-sharp | Architecture | Passed | BitLinear projections stay ternary, bias-free, and use signed 8-bit activation quantization. | Bias-free projections=True, activation quantization=±127 (8-bit), ternary weights=1579046/1052835/1579831, empirical entropy=1.561 bits/weight, theoretical ternary limit=1.585. |
| bitnet-b1.58-sharp | Architecture | Passed | RMSNorm layers stay bias-free and use the paper epsilon. | Norm count=9, epsilon=0.00001, learnable scale only=True. |
| bitnet-b1.58-sharp | Architecture | Passed | Attention uses Q/K/V/O BitLinear projections, RoPE on Q/K, and causal masking. | Attention layers=4, head count=8, head dimension=32, scaled-dot-product factor=0.1768. |
| bitnet-b1.58-sharp | Architecture | Passed | Feed-forward blocks use paper-style SwiGLU gate/up/down BitLinear projections. | Feed-forward layers=4, hidden dimension=1024, SwiGLU activation=True. |
| bitnet-b1.58-sharp | Architecture | Passed | Seeded inference is deterministic for repeated prompts. | Prompt='how are you hosted', first response='both benchmark models stay', second response='both benchmark models stay'. |
| bitnet-b1.58-sharp | Memory | Failed | BitNet resident parameter storage exceeds the traditional comparison model; investigate weight or embedding configuration. | BitNet resident parameters=4.09 MB versus traditional-local=115.02 KB (36.43x). The 29 BitLinear projections consume 4.02 MB storing only ternary sbyte weights plus a single float32 gamma scalar per layer (~8 bits/weight before any sparse packing). Token embeddings add 68 KB and RMSNorm scales add 9 KB. |
| bitnet-b1.58-sharp | Runtime | Passed | Paper-model fine-tuning is available from the supported runtime surface. | Validated cloned-model training on 6 default examples for 3 epochs; average loss=4.72. |
| bitnet-b1.58-sharp | Benchmark pipeline | Passed | Perplexity measurements are implemented and reported for named benchmark fixture slices. | WikiText2=72.62 ppl (2 samples), C4=74.03 ppl (2 samples), RedPajama=68.82 ppl (2 samples). |
| bitnet-b1.58-sharp | Benchmark pipeline | Passed | Zero-shot benchmark fixtures are implemented and reported. | ARC-Easy=0/1 (0 %), HellaSwag=0/1 (0 %), WinoGrande=0/1 (0 %), PIQA=0/1 (0 %), StoryCloze=0/1 (0 %). |
| bitnet-b1.58-sharp | Runtime | Passed | Repository checkpoint export/import round-trips through the paper model. | Prompt='how are you hosted', original='both benchmark models stay', reloaded='both benchmark models stay'. |
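
The failed memory check can be cross-checked from the audit's own numbers. A sketch, using the ternary weight counts (-1/0/+1) from the audit row, that reproduces the ~1.56 bits/weight empirical entropy, the log2(3) ≈ 1.585 theoretical limit, and the ~4.02 MB cost of storing one sbyte (8 bits) per weight before any packing:

```python
import math

# Ternary weight counts for -1 / 0 / +1, copied from the audit above.
neg, zero, pos = 1_579_046, 1_052_835, 1_579_831
total = neg + zero + pos

# Empirical entropy of the ternary distribution, in bits per weight.
entropy = -sum(p * math.log2(p) for p in (neg / total, zero / total, pos / total))

# Theoretical ternary limit: log2(3) bits per weight.
limit = math.log2(3)

# One sbyte per weight stores 8 bits/weight, far above the ~1.56-bit
# information content, which is why resident memory comes out at ~4.02 MB.
stored_mb = total / 1024**2
```

The gap between 8 stored bits and ~1.56 information bits per weight suggests roughly a 5x reduction is available from packed ternary encodings, which is consistent with the audit's "before any sparse packing" caveat.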

BenchmarkDotNet performance summary

| Operation | Model | Mean | StdDev | Allocated | Reports |
| --- | --- | --- | --- | --- | --- |
| SpecFlow: Build the agent host for the selected model | bitnet-b1.58-sharp | 84.463 ms | 0.7225 ms | 21057.15 KB | HTML · CSV · Markdown |
| SpecFlow: Build the agent host for the selected model | traditional-local | 2.498 ms | 0.0305 ms | 419.06 KB | HTML · CSV · Markdown |
| SpecFlow: Generate a response for a prompt | bitnet-b1.58-sharp | 82.86 ms | 0.468 ms | 20.57 MB | HTML · CSV · Markdown |
| SpecFlow: Generate a response for a prompt | traditional-local | 104.71 ms | 0.775 ms | 1.82 MB | HTML · CSV · Markdown |
| SpecFlow: Stream a response for a prompt | bitnet-b1.58-sharp | 82.92 ms | 0.404 ms | 20.58 MB | HTML · CSV · Markdown |
| SpecFlow: Stream a response for a prompt | traditional-local | 110.32 ms | 1.155 ms | 1.83 MB | HTML · CSV · Markdown |
| SpecFlow: Train the selected model on the TinyLlama-1.1B benchmark dataset | bitnet-b1.58-sharp | 967.7 ms | 2.70 ms | 37.58 MB | HTML · CSV · Markdown |
| SpecFlow: Train the selected model on the TinyLlama-1.1B benchmark dataset | traditional-local | 110.6 ms | 1.91 ms | 1.72 MB | HTML · CSV · Markdown |

Download the Markdown report · Download the JSON report · Download the stamped JSON report