☆ Yσɠƚԋσʂ ☆@lemmy.ml to Technology@lemmy.ml · English · 29 days ago

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers (github.com)

Cross-posted to: Aii@programming.dev, technology@lemmygrad.ml, hackernews@lemmy.bestiver.se, localllama@sh.itjust.works