Company accidentally spent $500 million on Claude AI in one month after forgetting usage limits

codeinabox@programming.dev · 2 days ago

mindbleach@sh.itjust.works · 1 day ago

Distillation works better than quantization, to the point that Qwen's 27B model recently out-benchmarked its 397B model, with the two released only two months apart. Arguably the only reason to train comically large models is that doing so is a decent strategy for finding very small ones.