• someguy3@lemmy.world
    21 days ago

    Whether cloud or local, it takes CPU/GPU use. That’s what takes power. It’s not magically less because it’s on a personal PC rather than a data center.

    • turdas@suppo.fi
      21 days ago

      Yes it is. Small models like this are on the order of 100x more efficient than the big models backing ChatGPT or Gemini proper.

      • someguy3@lemmy.world
        21 days ago

        Press X to doubt.

        But in any case allow me to amend my statement:

        Whether cloud or local, it takes CPU/GPU use. It’s not magically free because it’s on a personal PC rather than a data center.

        That’s still what takes power. This is AI use that’s not needed. And multiply by hundreds of millions of devices, it’s a shit ton of energy.

        • turdas@suppo.fi
          21 days ago

          “And multiply by hundreds of millions of devices, it’s a shit ton of energy.”

          No it’s not. You clearly have zero perspective on energy consumption.

          The power draw on a phone with an NPU (where Gemini Nano is mostly used) is comparable to watching a video on your phone, maybe a couple of watts. On devices without NPUs (e.g. PCs) it will be more, but not dramatically so. The power use of this is absolutely zilch in the grand scheme of things.

          To be extremely generous, let’s say the average power draw is 50 watts, and that the model generates on average 10 tok/s, and that the average user has it generate 500 tokens per day (about 400 words). That’s 50 seconds at 50 watts for every user (2,500 joules, or about 0.7 Wh), and let’s say this is done by a billion users. This is a very generous estimate: in reality the average power draw is lower, the average number of tokens generated is likely lower (the intended use is generating short snippets like, say, email titles based on the email’s content), and this definitely won’t be used by a billion people.

          WolframAlpha tells us that this comes to 694 MWh of energy per day, and helpfully mentions that this is 74% of the fuel energy of an Airbus A330-300; indeed, this energy use is roughly in the ballpark of one transatlantic flight. There are about 500 transatlantic flights every day. Two offshore wind turbines will generate this much energy on a windy day.
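
The napkin math above can be checked with a few lines of Python. This is only a sketch of the same estimate: the function name `daily_energy_mwh` and its parameter names are made up for illustration, and the inputs are the deliberately generous figures from the comment (50 W, 10 tok/s, 500 tokens per user per day, a billion users), not measurements.

```python
def daily_energy_mwh(power_watts, tokens_per_second, tokens_per_user_per_day, users):
    """Total energy per day, in MWh, for on-device generation (rough estimate)."""
    # Time each user spends generating tokens per day, in seconds.
    seconds_per_user = tokens_per_user_per_day / tokens_per_second
    # Energy = power x time; watts x seconds gives joules.
    joules_per_user = power_watts * seconds_per_user
    total_joules = joules_per_user * users
    # 1 MWh = 1e6 W x 3600 s = 3.6e9 J.
    return total_joules / 3.6e9


# Assumed inputs, taken from the generous estimate in the comment.
estimate = daily_energy_mwh(
    power_watts=50,
    tokens_per_second=10,
    tokens_per_user_per_day=500,
    users=1_000_000_000,
)
print(f"{estimate:.0f} MWh per day")  # prints "694 MWh per day"
```

Halving the power draw or the user count halves the result, so anything closer to realistic numbers lands well below the 694 MWh figure.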

          In all likelihood an order of magnitude more energy is spent every day watching short form videos. I’m not going to do the napkin math on that though.

          edit: in reality, local models like this will likely reduce net power consumption as fewer API calls are made to cloud LLMs, which are both less power efficient and have overhead from the whole internet thing.

    • Zetta@mander.xyz
      21 days ago

      Like the other guy said, it is magically more efficient because it’s magically significantly smaller. This model is likely a few billion parameters and frontier models are in the 1 - 3+ trillion parameter range.
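
For a sense of scale, here is the size gap as plain arithmetic. This is a sketch only: the 3 billion figure is an assumed stand-in for "a few billion", the helper name `size_ratio` is hypothetical, and parameter count is a crude proxy that does not translate directly into energy per token.

```python
def size_ratio(large_params, small_params):
    """How many times more parameters the larger model has."""
    return large_params / small_params


# Assumption: "a few billion" is taken as 3 billion for illustration.
small_model = 3e9

# The 1 to 3 trillion range quoted for frontier models.
for frontier_model in (1e12, 3e12):
    ratio = size_ratio(frontier_model, small_model)
    print(f"{frontier_model:.0e} vs {small_model:.0e} parameters: about {ratio:.0f}x")
```

With these assumed numbers the gap comes out at a few hundred to a thousand times, which is at least consistent with the "order of 100x" efficiency claim made earlier in the thread.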

      Yeah, people’s mobile phones that run this model might die slightly faster, but playing a mobile game or running any other hardware-intensive process will kill your battery faster. It’s no different.