BitNet benchmark comparison report

Generated 2026-03-21T17:11:28.8457513+00:00

Shared integration inputs

Efficacy and accuracy summary

| Model | Training | Efficacy | Exact-match accuracy | Expected-token recall |
| --- | --- | --- | --- | --- |
| bitnet-b1.58-sharp | Completed (6 examples, 3 epochs) | 100.0 % | 33.3 % | 93.1 % |
| traditional-local | Completed (6 examples, 24 epochs) | 100.0 % | 33.3 % | 93.1 % |
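
The report does not spell out how the two accuracy metrics are computed. A minimal sketch, assuming the conventional definitions (exact match as normalized string equality, expected-token recall as the fraction of expected tokens that appear in the response):

```python
# Hypothetical metric definitions; the exact normalization used by the
# benchmark harness is an assumption, not taken from the report.
def exact_match(response: str, expected: str) -> bool:
    # Case- and whitespace-insensitive string equality.
    return response.strip().lower() == expected.strip().lower()

def expected_token_recall(response: str, expected: str) -> float:
    # Fraction of expected tokens that occur anywhere in the response.
    response_tokens = set(response.lower().split())
    expected_tokens = expected.lower().split()
    if not expected_tokens:
        return 1.0
    hits = sum(1 for token in expected_tokens if token in response_tokens)
    return hits / len(expected_tokens)
```

Under these definitions a response can score high recall (most expected tokens present) while still failing exact match, which is consistent with the 93.1 % recall versus 33.3 % exact-match split above.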

BitNet vs traditional comparison summary

Perplexity dataset: WikiText2

| Model | Response mean | Response tokens/sec | Training mean | Perplexity | Response allocated | Estimated resident model memory |
| --- | --- | --- | --- | --- | --- | --- |
| bitnet-b1.58-sharp | 82.86 ms | 181.03 | 967.7 ms | 72.62 | 20.57 MB | 4.09 MB |
| traditional-local | 104.71 ms | 143.25 | 110.6 ms | 3218.57 | 1.82 MB | 0.11 MB |

| Delta | Value |
| --- | --- |
| BitNet speedup vs traditional | 1.26x |
| BitNet memory reduction vs traditional | -1030.22% |
| BitNet resident model memory increase vs traditional | 3543.08% |
| BitNet quality improvement vs traditional | 97.74% |
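
The delta rows follow directly from the summary table. A quick sketch reproducing them from the displayed (rounded) figures; the resident-memory ratio uses the 115.02 KB traditional figure reported in the paper-alignment audit, so results match the report to rounding:

```python
# Figures copied from the comparison summary table above.
bitnet_ms, trad_ms = 82.86, 104.71                  # response mean
bitnet_alloc, trad_alloc = 20.57, 1.82              # response allocated, MB
bitnet_ppl, trad_ppl = 72.62, 3218.57               # WikiText2 perplexity
bitnet_res_kb, trad_res_kb = 4.09 * 1024, 115.02    # resident memory, KB (audit)

speedup = trad_ms / bitnet_ms                           # 1.26x
mem_reduction = (1 - bitnet_alloc / trad_alloc) * 100   # negative: BitNet allocates more
res_increase = (bitnet_res_kb / trad_res_kb - 1) * 100  # ~3543 % (rounding in the MB figure)
quality = (1 - bitnet_ppl / trad_ppl) * 100             # lower perplexity is better
```

Note that the "memory reduction" delta is negative because BitNet allocates roughly 11x more per response than the traditional model, so the sign conventions in the delta table read correctly once the formulas are made explicit.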

Comparison charts

Response tokens/sec

| Model | Value |
| --- | --- |
| bitnet-b1.58-sharp | 181.03 |
| traditional-local | 143.25 |

Response allocated (MB)

| Model | Value |
| --- | --- |
| bitnet-b1.58-sharp | 20.57 MB |
| traditional-local | 1.82 MB |

Estimated resident model memory (MB)

| Model | Value |
| --- | --- |
| bitnet-b1.58-sharp | 4.09 MB |
| traditional-local | 0.11 MB |

Perplexity

| Model | Value |
| --- | --- |
| bitnet-b1.58-sharp | 72.62 |
| traditional-local | 3218.57 |
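
The report does not state how perplexity is computed; a sketch under the standard definition (exponential of the mean per-token negative log-likelihood over the fixture slice):

```python
import math

# Standard perplexity definition; the harness's exact averaging over
# WikiText2 samples is an assumption, not taken from the report.
def perplexity(token_nlls: list[float]) -> float:
    # token_nlls: per-token negative log-likelihoods in nats.
    return math.exp(sum(token_nlls) / len(token_nlls))
```

A model that is uniformly uncertain over k candidate tokens has perplexity k, which is why the traditional model's 3218.57 against BitNet's 72.62 indicates a much flatter, less predictive output distribution.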

Query script results

| Model | Prompt | Expected-token recall | Exact match | Response |
| --- | --- | --- | --- | --- |
| bitnet-b1.58-sharp | which model anchors this benchmark | 90.9 % | No | tinyllama- 1. 1 b anchors the shared benchmark training slice for both local models. |
| bitnet-b1.58-sharp | how do I compare perplexity | 81.8 % | No | use the benchmark report to compare wikitext 2 perplexity after tinyllama- 1. 1 b training. |
| bitnet-b1.58-sharp | what does the paper model train on | 93.3 % | No | the paper aligned bitnet model fine tunes ternary output weights on the tinyllama- 1. 1 b benchmark slice. |
| bitnet-b1.58-sharp | what does the traditional model train on | 92.3 % | No | the traditional local model optimizes tensor softmax logits on the same tinyllama- 1. 1 b slice. |
| bitnet-b1.58-sharp | how are you hosted | 100.0 % | Yes | both benchmark models stay in process with microsoft agent framework hosting and local diagnostics. |
| bitnet-b1.58-sharp | what language do you use | 100.0 % | Yes | benchmark prompts and diagnostics stay in clear american english. |
| traditional-local | which model anchors this benchmark | 90.9 % | No | tinyllama- 1. 1 b anchors the shared benchmark training slice for both local models. |
| traditional-local | how do I compare perplexity | 81.8 % | No | use the benchmark report to compare wikitext 2 perplexity after tinyllama- 1. 1 b training. |
| traditional-local | what does the paper model train on | 93.3 % | No | the paper aligned bitnet model fine tunes ternary output weights on the tinyllama- 1. 1 b benchmark slice. |
| traditional-local | what does the traditional model train on | 92.3 % | No | the traditional local model optimizes tensor softmax logits on the same tinyllama- 1. 1 b slice. |
| traditional-local | how are you hosted | 100.0 % | Yes | both benchmark models stay in process with microsoft agent framework hosting and local diagnostics. |
| traditional-local | what language do you use | 100.0 % | Yes | benchmark prompts and diagnostics stay in clear american english. |

Paper-alignment audit

| Model | Passed | Pending | Failed |
| --- | --- | --- | --- |
| bitnet-b1.58-sharp | 10 | 0 | 1 |
| Model | Area | Status | Requirement | Details |
| --- | --- | --- | --- | --- |
| bitnet-b1.58-sharp | Architecture | Passed | Decoder-only transformer topology matches the paper-aligned BitNet surface. | Layers=4/4, BitLinear projections=29/29, output head=256->68. |
| bitnet-b1.58-sharp | Architecture | Passed | BitLinear projections stay ternary, bias-free, and use signed 8-bit activation quantization. | Bias-free projections=True, activation quantization=±127 (8-bit), ternary weights=1579046/1052835/1579831, empirical entropy=1.561 bits/weight, theoretical ternary limit=1.585. |
| bitnet-b1.58-sharp | Architecture | Passed | RMSNorm layers stay bias-free and use the paper epsilon. | Norm count=9, epsilon=0.00001, learnable scale only=True. |
| bitnet-b1.58-sharp | Architecture | Passed | Attention uses Q/K/V/O BitLinear projections, RoPE on Q/K, and causal masking. | Attention layers=4, head count=8, head dimension=32, scaled-dot-product factor=0.1768. |
| bitnet-b1.58-sharp | Architecture | Passed | Feed-forward blocks use paper-style SwiGLU gate/up/down BitLinear projections. | Feed-forward layers=4, hidden dimension=1024, SwiGLU activation=True. |
| bitnet-b1.58-sharp | Architecture | Passed | Seeded inference is deterministic for repeated prompts. | Prompt='how are you hosted', first response='both benchmark models stay', second response='both benchmark models stay'. |
| bitnet-b1.58-sharp | Memory | Failed | BitNet resident parameter storage exceeds the traditional comparison model; investigate weight or embedding configuration. | BitNet resident parameters=4.09 MB versus traditional-local=115.02 KB (36.43x). The 29 BitLinear projections consume 4.02 MB storing only ternary sbyte weights plus a single float32 gamma scalar per layer (~8 bits/weight before any sparse packing). Token embeddings add 68 KB and RMSNorm scales add 9 KB. |
| bitnet-b1.58-sharp | Runtime | Passed | Paper-model fine-tuning is available from the supported runtime surface. | Validated cloned-model training on 6 default examples for 3 epochs; average loss=4.72. |
| bitnet-b1.58-sharp | Benchmark pipeline | Passed | Perplexity measurements are implemented and reported for named benchmark fixture slices. | WikiText2=72.62 ppl (2 samples), C4=74.03 ppl (2 samples), RedPajama=68.82 ppl (2 samples). |
| bitnet-b1.58-sharp | Benchmark pipeline | Passed | Zero-shot benchmark fixtures are implemented and reported. | ARC-Easy=0/1 (0 %), HellaSwag=0/1 (0 %), WinoGrande=0/1 (0 %), PIQA=0/1 (0 %), StoryCloze=0/1 (0 %). |
| bitnet-b1.58-sharp | Runtime | Passed | Repository checkpoint export/import round-trips through the paper model. | Prompt='how are you hosted', original='both benchmark models stay', reloaded='both benchmark models stay'. |
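
The failed memory check can be cross-checked from the audit's own numbers. A sketch, using the ternary weight counts (-1/0/+1) from the audit row, that reproduces the ~1.56 bits/weight empirical entropy, the log2(3) ≈ 1.585 theoretical limit, and the ~4.02 MB cost of storing one sbyte (8 bits) per weight before any packing:

```python
import math

# Ternary weight counts for -1 / 0 / +1, copied from the audit above.
neg, zero, pos = 1_579_046, 1_052_835, 1_579_831
total = neg + zero + pos

# Empirical entropy of the ternary distribution, in bits per weight.
entropy = -sum(p * math.log2(p) for p in (neg / total, zero / total, pos / total))

# Theoretical ternary limit: log2(3) bits per weight.
limit = math.log2(3)

# One sbyte per weight stores 8 bits/weight, far above the ~1.56-bit
# information content, which is why resident memory comes out at ~4.02 MB.
stored_mb = total / 1024**2
```

The gap between 8 stored bits and ~1.56 information bits per weight suggests roughly a 5x reduction is available from packed ternary encodings, which is consistent with the audit's "before any sparse packing" caveat.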

BenchmarkDotNet performance summary

| Operation | Model | Mean | StdDev | Allocated | Reports |
| --- | --- | --- | --- | --- | --- |
| SpecFlow: Build the agent host for the selected model | bitnet-b1.58-sharp | 84.463 ms | 0.7225 ms | 21057.15 KB | HTML · CSV · Markdown |
| SpecFlow: Build the agent host for the selected model | traditional-local | 2.498 ms | 0.0305 ms | 419.06 KB | HTML · CSV · Markdown |
| SpecFlow: Generate a response for a prompt | bitnet-b1.58-sharp | 82.86 ms | 0.468 ms | 20.57 MB | HTML · CSV · Markdown |
| SpecFlow: Generate a response for a prompt | traditional-local | 104.71 ms | 0.775 ms | 1.82 MB | HTML · CSV · Markdown |
| SpecFlow: Stream a response for a prompt | bitnet-b1.58-sharp | 82.92 ms | 0.404 ms | 20.58 MB | HTML · CSV · Markdown |
| SpecFlow: Stream a response for a prompt | traditional-local | 110.32 ms | 1.155 ms | 1.83 MB | HTML · CSV · Markdown |
| SpecFlow: Train the selected model on the TinyLlama-1.1B benchmark dataset | bitnet-b1.58-sharp | 967.7 ms | 2.70 ms | 37.58 MB | HTML · CSV · Markdown |
| SpecFlow: Train the selected model on the TinyLlama-1.1B benchmark dataset | traditional-local | 110.6 ms | 1.91 ms | 1.72 MB | HTML · CSV · Markdown |

Download the Markdown report · Download the JSON report · Download the stamped JSON report