
cm0002@literature.cafe to AI - Artificial intelligence@programming.dev · English · 1 month ago

GitHub - intel/auto-round: A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

github.com


