• mindbleach@sh.itjust.works
    4 days ago

    These models run on normal computers, and they are giving them away.

    Does your company not have computers?

    • dan@upvote.au
      4 days ago

      Not really. The state-of-the-art models are huge, even the open-weight ones. You really don’t want to quantize below 4-bit, and even that’s a bit of a stretch. Ideally you’d use at least 8-bit to get good results when using these models for coding.

      GLM-5.1 needs around 400GB of VRAM at 4-bit quantization. Apple isn’t making the Mac Studio with 512GB of unified RAM any more, so you’d need something like 5 × Nvidia A100 80GB (5 × 80GB = 400GB) to run a model like this.

      Kimi K2.6 is around the same size.
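
      For anyone who wants to redo that arithmetic, here is a rough sketch of the weights-only estimate: parameters × bits per weight ÷ 8 gives bytes. The function names and the 800B parameter count are hypothetical, picked only because that count works out to 400GB at 4-bit; it is not a published figure for any model mentioned here. It also ignores KV cache, activations and runtime overhead, so real usage will be higher.

```python
import math


def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes).

    params_billions * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    """
    return params_billions * bits_per_weight / 8


def gpus_needed(memory_gb: float, gpu_memory_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined memory covers memory_gb."""
    return math.ceil(memory_gb / gpu_memory_gb)


# Hypothetical parameter count, used for illustration only.
ASSUMED_PARAMS_BILLIONS = 800

for bits in (4, 8):
    gb = weight_memory_gb(ASSUMED_PARAMS_BILLIONS, bits)
    print(f"{bits}-bit: ~{gb:.0f} GB of weights, {gpus_needed(gb)} x 80GB GPUs")
```

      Doubling the bit width doubles the footprint, which is why 8-bit is so much harder to host than 4-bit for a model this size.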

      • mindbleach@sh.itjust.works
        4 days ago

        Distillation works better than quantization, to the point that Qwen recently out-benchmarked its own 397B model with a 27B model, two months apart. Arguably the only reason to train comically large models is that doing so is a decent strategy for finding very small models.