Cache Memory Optimization

Vietnam Investment Review on MSN

Dnotitia's STAR KV cuts KV cache by up to 20x earns ICML 2026 spotlight selection

SEOUL, South Korea, July 2, 2026 /PRNewswire/ -- Dnotitia Inc. (Dnotitia), a company specializing in long-term memory AI and semiconductor-based AI infrastructure technologies, has released the paper ...

The Manila Times

Dnotitia Unveils STAR-KV, Achieving UP to 20x KV Cache Compression, Selected as an ICML 2026 Spotlight Paper

Introduces a low-rank-based approach to KV cache compression, one of the key bottlenecks in long-context AISpeeds up attention computation by up to 6.9x and overall generation throughput by up to 3.1x ...

14h

OpenAI engineers cut ChatGPT guest traffic to a few hundred Nvidia GPUs, with no new hardware deployed.

OpenAI inference cost reduction cut ChatGPT guest traffic from tens of thousands of Nvidia GPUs to just a couple hundred, using software optimization alone. Engineers achieved more than 50% savings ...

Tech Times

OpenAI Halves Inference Costs With Software Alone: GPUs Drop to Hundreds

OpenAI inference cost reduction cut ChatGPT guest traffic from tens of thousands of Nvidia GPUs to just a couple hundred, ...

Meituan open sources LongCat-2.0, the 1.6T, near-frontier agentic coding model that's been leading OpenRouter — trained entirely on Chinese chips

By registering the LongCat-2.0 repository under the open-source MIT License, Meituan positions the architecture with maximum ...

AI agents need context everywhere they run, even where the cloud can't follow

Couchbase AI Data Plane combines persistent agent memory, vector search and an enterprise MCP server that runs on-device when ...

13hon MSN

The only AI glossary you’ll need this year

The rise of AI has brought an avalanche of new terms and slang. Here is a glossary with definitions of some of the most ...

DIGITIMES

AIC Collaborates with NVIDIA, VAST Data for Next-Gen AI Storage

On the opening day of COMPUTEX 2026, AIC Inc. hosted a high-level strategic panel session at its booth, focusing on ...

Semiconductor Engineering

Blog Review: July 1

Ethernet auto-negotiation; multiphysics to avoid overdesign; PCB design reuse; mobile LLM quantization; modeling BSPDNs.

Communications of the ACM

But Nothing Has Changed on Our Side!

Each team representative takes their turn declaring that “nothing has changed on our side,” somewhat suggesting that the ...

How does an On-device AI work?

Curious about the working of an on-device AI? Here is how an on-device AI works and what you can take from it for yourself.

Tech Times

DeepSeek Releases DSpark: Speculative Decoding Makes V4 Up to 85 Percent Faster

DeepSeek speculative decoding framework DSpark went live June 27 on V4-Flash and V4-Pro, reporting up to 85 percent faster ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results