Autoregressive Decoder

DeepSeek Releases DSpark: Speculative Decoding Makes V4 Up to 85 Percent Faster

DeepSeek's speculative decoding framework, DSpark, went live June 27 on V4-Flash and V4-Pro, reporting up to 85 percent faster ...

Developer Tech

NVIDIA: DFlash block diffusion accelerates autoregressive LLMs

Deploying DFlash block diffusion on NVIDIA hardware accelerates autoregressive LLMs during latency-sensitive inference.

VentureBeat

OpenAI launches Privacy Filter, an open source, on-device data sanitization model that removes personal information from enterprise datasets

In a significant shift toward local-first privacy infrastructure, OpenAI has released Privacy Filter, a ...

Seeking Alpha

Read This Before Nvidia GTC 2026: Agentic AI And LPU

NVIDIA Corporation CEO Jensen Huang has been deliberately emphasizing the agentic AI inflection in his recent commentary, which likely sets the tone for upcoming GTC 2026 revelations. A Groq-based LPU ...

winbuzzer.com

AI: Memory Bottleneck Emerges as Main LLM Inference Challenge

While chip makers race to build faster GPUs, Google researchers revealed January 8 that memory and interconnect are the real bottlenecks holding back large language model performance. A new research ...

Semiconductor Engineering

Four Architectural Opportunities for LLM Inference Hardware (Google)

“Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI ...

Semiconductor Engineering

Hardware-Oriented Analysis of Multi-Head Latent Attention (MLA) in DeepSeek-V3 (KU Leuven)

A new technical paper titled “Hardware-Centric Analysis of DeepSeek’s Multi-Head Latent Attention” was published by researchers at KU Leuven. “Multi-Head Latent Attention (MLA), introduced in DeepSeek ...

GitHub

TDT vs Transformer Decoder for Autoregressive ASR

Hello, I just read the TDT paper and I was wondering: in what ways is it superior to a transformer decoder, and in what ways isn't it? From my understanding, it's less computationally intensive than a ...
