
  • mercano@lemmy.world
    4 days ago

    AI doesn’t see a word as a sequence of letters; it just sees it as a pointer to an entry in a Words table.
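
    A rough sketch of that idea, assuming a made-up three-word table (the `vocab` dict and `encode` helper are hypothetical, not any real tokenizer). Once the text is encoded, all that is left is a list of integers, and nothing in those integers says which letters the words contained:

```python
# Toy illustration only: a hypothetical 3-entry "Words table", not a real tokenizer.
vocab = {"the": 0, "cat": 1, "sat": 2}

def encode(words):
    """Replace each word with its row index in the hypothetical table."""
    return [vocab[w] for w in words]

ids = encode(["the", "cat", "sat"])
print(ids)  # [0, 1, 2]

# The model downstream only receives these integers; the letters are gone.
assert all(isinstance(i, int) for i in ids)
```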

      • Lambda@lemmy.ca
        4 days ago

        Yeah, if words were actually encoded as 1-hot vectors this would be pretty trivial, but the rest of LLM training would be somewhere between infeasible and impossible. The actual embedding vectors obscure spelling even more.
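
        For anyone who wants to see how a one-hot index and an embedding vector relate, here is a minimal sketch. Everything in it is an assumption for illustration: the three-word `vocab`, the tiny `dim`, and the random table standing in for learned weights. Looking up row `i` of the table gives the same result as multiplying the one-hot vector for `i` by the table, and in this toy the rows for "cat" and "cot" are just unrelated random numbers with no trace of the shared letters:

```python
import random

vocab = ["cat", "cot", "dog"]  # hypothetical toy vocabulary
dim = 4                        # hypothetical embedding size, chosen small for readability

def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

random.seed(0)
# One row per vocabulary entry. Random here; a real model learns these values.
embedding_table = [[random.uniform(-1, 1) for _ in range(dim)] for _ in vocab]

def embed(index):
    """Embedding lookup: simply pick row `index` of the table."""
    return embedding_table[index]

def vec_times_table(vec, table):
    """Plain vector-matrix product, written out without any libraries."""
    return [sum(vec[r] * table[r][c] for r in range(len(table)))
            for c in range(len(table[0]))]

cat = vocab.index("cat")
# Multiplying the one-hot vector by the table selects the same row as the lookup.
assert vec_times_table(one_hot(cat, len(vocab)), embedding_table) == embed(cat)
```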

        Side note: last time I checked, current embedding vectors were approximately 40-dimensional… Has that gone up significantly in the last couple of years?

    • chicken@lemmy.dbzer0.com
      4 days ago

      Shouldn’t it help that it separated them out with underlines? How does this text break down in terms of tokens?