AI & Machine Learning

Last updated: February 17, 2026

Speculative Decoding

Speculative Decoding is a speed trick for LLM inference: a small, fast model drafts several tokens ahead, then the large model checks them all at once. It's like having an intern write a rough draft that the expert just approves or corrects. It can make inference roughly 2-3x faster, with the actual gain depending on how often the large model accepts the draft.
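Where does a figure like 2-3x come from? A common back-of-the-envelope estimate assumes, as a simplification, that each drafted token is accepted independently with probability alpha. A pass that drafts gamma tokens then yields on average (1 - alpha^(gamma+1)) / (1 - alpha) tokens, counting the one token the large model always contributes. Dividing by the cost of a pass, gamma draft steps at relative cost c each plus one large-model step, gives the expected speedup. The function names and the numbers below are hypothetical and illustrative, not measurements.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens gained per large-model pass: accepted drafts plus the one
    token the large model always contributes (a correction or a bonus token).
    Assumes each draft is accepted independently with probability alpha."""
    if alpha >= 1.0:
        return gamma + 1.0  # every draft accepted, plus the bonus token
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def expected_speedup(alpha: float, gamma: int, c: float) -> float:
    """Wall-time speedup over plain decoding, assuming one pass costs gamma
    draft-model steps (each c times a large-model step) plus one large-model step."""
    return expected_tokens_per_pass(alpha, gamma) / (gamma * c + 1)


# Illustrative values only: 80% acceptance, 4 drafted tokens,
# draft model assumed 20x cheaper per step than the large model.
print(round(expected_speedup(alpha=0.8, gamma=4, c=0.05), 2))  # → 2.8
# Worst case: the large model rejects every draft.
print(round(expected_speedup(alpha=0.0, gamma=4, c=0.05), 2))  # → 0.83
```

The second call shows the failure mode: if the draft is never accepted, each pass still pays for the draft steps, so decoding becomes slower than running the large model alone.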

Definition

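A speed trick where a small, fast draft model proposes the next several tokens and the large target model then scores all of them in a single parallel forward pass, instead of generating them one at a time. The large model keeps the drafted tokens up to the first one it disagrees with, substitutes its own token at that point (or appends one extra token of its own if it accepts them all), and the cycle repeats, so every pass yields at least one new token. Because every token is still checked by the large model, the output matches what the large model would have generated alone: exactly under greedy decoding, and in distribution when sampling with a rejection-sampling acceptance rule.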
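The draft-then-verify loop can be sketched in Python for the greedy case. This is a minimal illustration, not a production implementation: it assumes toy "models" that are plain functions returning their single most likely next token, and `speculative_decode`, `target_model`, and `draft_model` are hypothetical names. It omits the rejection-sampling rule needed when sampling with temperature, and it calls the target once per prefix where a real system would use one batched forward pass.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps the tokens so far to its
# single most likely next token (greedy decoding).
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep what the target agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target predicts the next token after each drafted prefix.
        #    A real system does this in ONE batched forward pass; this toy
        #    version calls the target once per prefix for clarity.
        checks = [target(tokens + proposal[:i]) for i in range(k + 1)]
        # 3. Accept drafted tokens up to the first disagreement.
        n_ok = 0
        while n_ok < k and proposal[n_ok] == checks[n_ok]:
            n_ok += 1
        # 4. Keep the accepted prefix plus one target token: its correction at
        #    the first mismatch, or a bonus token if every draft was accepted.
        tokens += proposal[:n_ok] + [checks[n_ok]]
    return tokens[:goal]  # the last pass may overshoot the requested length


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(model(tokens))
    return tokens


# Toy models over a 10-token vocabulary (hypothetical, for illustration only).
def target_model(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 10


def draft_model(seq: List[int]) -> int:
    # Agrees with the target except after token 9, where it guesses wrong.
    return 0 if seq[-1] == 9 else (seq[-1] * 3 + 1) % 10


out = speculative_decode(target_model, draft_model, prompt=[5], max_new=10)
assert out == greedy_decode(target_model, [5], 10)  # same tokens as the large model alone
print(out)
```

The final assertion illustrates the key property: a wrong draft only costs time, never correctness, because the target's own prediction replaces the first rejected token. In this toy run the target is consulted in 4 verification passes for 10 new tokens instead of 10 sequential steps.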

Related Terms

Inference

Model Distillation

Tokens

More AI & Machine Learning Terms

LLM (Large Language Model)

RAG (Retrieval Augmented Generation)

Embeddings

Fine-tuning

Prompt Engineering
