Hacker News — vinext + Cloudflare Workers

KVarN: Native vLLM backend for KV-cache quantization by Huawei (github.com)

66 points by theanonymousone 3 hours ago | 7 comments

throwa356262 2 hours ago [-]

Better performance than TQ and better quality than FP16?

Am I reading this right??

qeternity 1 hour ago [-]

It's not better quality: 59.3% vs 59.4% fp16 on AIME 25

thefox96 1 hour ago [-]

Faster than FP16, not better quality, I guess.

v3ss0n 2 hours ago [-]

Why is this not a PR for vLLM?

esafak 2 hours ago [-]

It's the output of a research paper; the authors are not trying to build up vLLM, and they probably have no incentive to do so. You can submit a PR, though! It's easier now while the divergence is low, so don't wait. Since there are six authors, I bet you could get help with the inevitable review chores if you just take the step of creating the PR.

edit: It might not be clear that it is based on vLLM 0.22, which is the current version: https://github.com/huawei-csl/KVarN/commit/d6290e99098d7426d.... All you have to do is create a diff off it; it's fairly straightforward.

jmalicki 2 hours ago [-]

And with the help of AI, pointing an AI at this paper and saying "make a vLLM PR from this paper" tends to work surprisingly well, even if you need to nudge it a little along the way.

thefox96 42 minutes ago [-]

it should be easy to do btw
