A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers (github.com)
Posted by cm0002@literature.cafe to AI - Artificial intelligence@programming.dev · English · 1 month ago
Cross-posted to: technology@lemmy.ml, technology@lemmygrad.ml, hackernews@lemmy.bestiver.se, localllama@sh.itjust.works