No, you don’t trust the output. You shouldn’t trust the output of search either. This is just search with summarization.
That’s why there are linked sources, so that you can verify for yourself. The person’s contention was that you can’t trust citations because they can be hallucinated. That’s not how these systems work: the citations are not handled by LLMs at all except as references; the actual source list comes entirely from a regular search program.
The LLM’s summarization and sources are like the Google results page: they’re not information you should trust by themselves, they’re simply links to take you to information that’s responsive to your search. The LLM provides a high-level summary so you can make a more informed decision about which sources to look at.
Anyone treating LLMs like they’re reliable is asking for trouble, just like anyone who believes everything they read on Facebook or cites Wikipedia directly.
Search never used to give “output”. It used to give links to a wide variety of sources, such as detailed and exact official documentation. There was nothing to “trust”.
Now it’s all slop bullshit that needs to be double-checked, a process that frankly takes just as long as finding the information yourself using the old system, and even that still can’t be trusted in case it missed something.
Search never used to give “output”. It used to give links to a wide variety of sources, such as detailed and exact official documentation. There was nothing to “trust”.
If you search on Google, the results are an output. There’s nothing AI about the term output.
You get the same output here and, as you can see, the sources are just as easily accessible as in a Google search and are handled by non-LLM systems, so they cannot be hallucinations.
The topic here is hallucinated sources, and my entire position is that this doesn’t happen unless you’re intentionally using LLMs for things that they are not good at. You can see that systems like this do not use the LLM to handle source retrieval or citation.
Now it’s all slop bullshit that needs to be double-checked, a process that frankly takes just as long as finding the information yourself using the old system, and even that still can’t be trusted in case it missed something.
This is true of Google too. If you’re operating on the premise that you can trust Google’s search results, then you should know about Search Engine Optimization (https://en.wikipedia.org/wiki/Search_engine_optimization), an entire industry that exists specifically to manipulate Google’s search results. If you trust Google more than AI systems built on search, then you’re just committing the same error.
Yes, you shouldn’t trust things you read on the Internet until you’ve confirmed them from primary sources. This is true of Google searches and of AI-summarized results of Google searches.
I’m not saying that you should cite LLM output as fact; I’m saying that the argument that ‘AIs hallucinate sources’ isn’t true of these systems, which are designed to keep LLMs out of the workflow that retrieves and cites data.
It’s like complaining that live ducks make poor pool toys… if you’re using them for that, the problem isn’t the ducks, it’s the person who has no idea what they’re doing.
so I fail to see why I should be using an LLM at all then. If I am going to the webpages anyway, why shouldn’t I just use startpage/searx/yacy/whatever?
Yeah, if you already know where you’re going then sure, add it to Dashy or make a bookmark in your browser.
But if you’re going to search for something anyway, why would you use regular search and skim the tiny snippets of random text that get returned with Google’s results? In the same amount of time, you could dump the entire contents of the pages into an LLM’s context window and have it tailor the response to your question based on that text.
You still have to actually click on some links to get to the real information, but a summary generated from the contents of the results is more likely to be relevant than the text presented on Google’s results page. In both cases you still have a list of links, generated by a search engine and not AI, which are responsive to your query.
see, the problem is that I am not going to be reading that text because I know it is unreliable and ai text makes my eyes glaze over, so I will be clicking on all those links until I find something that is reliable. On a search engine I can just click through every link or refine my search with something like site:reddit.com site:wikipedia.org or format:pdf or something similar. With a chatbot, I need to write out the entire question, look at the four or so links it provided and then reprompt it if it doesn’t contain what I’m looking for. I also get a limited amount of searches per day because I am not paying for a chatbot subscription. This is completely pointless to me.
I’m not sure by what standard you’re calling it unreliable.
You can see in the example that I provided that it correctly answered the question and also correctly cited the place where the answer came from, in the exact same amount of time as it would take to type the query into Google.
Yes, LLMs by themselves can hallucinate, and they do so at a high enough rate that they’re unreliable sources of information. That is 100% true. It will never be fixed, because LLMs are trained as autocomplete to produce syntactically correct language. You should never depend on raw LLM-generated text from an empty context, like from a chatbot.
The study of this in academia (example: https://arxiv.org/html/2312.10997v5) has found that LLM hallucination rate can be dropped to almost nothing (less than a human) if the model is given text containing the information it is being asked about. So, if you paste a document into the chat and ask it a question about that document, the hallucination rate drops significantly.
This finding led to a technique called Retrieval-Augmented Generation, where you use some non-AI means of finding data, like a search engine, and then put the retrieved documents into the context window along with the question. This lets you build systems that use LLMs for the tasks they’re accurate and fast at (like summarizing text that is already in the context window) and non-AI tools for the things that require accuracy (like searching databases for facts and tracking citations).
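As a rough illustration of that split (a minimal sketch with hypothetical `search_engine` and `llm` objects, not any particular product’s API), the source list is built by ordinary code before the model is ever called:

```python
# Minimal RAG sketch. The search engine and the citation list are plain code;
# the LLM only summarizes text that has already been placed in its context.
# `search_engine` and `llm` are hypothetical stand-ins, not a real library API.

def rag_answer(question, search_engine, llm, limit=5):
    # 1. Non-AI retrieval: a regular search index returns documents with URLs.
    hits = search_engine.search(question, limit=limit)

    # 2. Citations come straight from the search results, never from the model.
    citations = [{"id": i + 1, "title": h.title, "url": h.url}
                 for i, h in enumerate(hits)]

    # 3. The retrieved text itself goes into the prompt, numbered to match.
    context = "\n\n".join(f"[{c['id']}] {h.text}" for c, h in zip(citations, hits))
    prompt = (
        "Answer the question using only the numbered sources below, "
        "citing them by [number].\n\n"
        f"{context}\n\nQuestion: {question}"
    )

    # 4. The model's summary is returned alongside the independently built
    #    source list, so every link can be followed and verified by hand.
    return {"summary": llm.generate(prompt), "sources": citations}
```

In this sketch the model can only restate (or misstate) what is in `context`; it has no way to add a link that the search step didn’t return.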
You can see in the images I posted that it both answered the question and also correctly cited the source which was the entire point of contention.
you are linking to an arxiv preprint. I do not know these researchers. there is nothing that indicates to me that this source is any more credible than a blog post.
has found that LLM hallucination rate can be dropped to almost nothing
where? It doesn’t seem to be in this preprint, which is mostly a history of RAG and mentions hallucinations only as a problem affecting certain types of RAG more than other types. It makes some relative claims about accuracy that suggest including irrelevant data might make models more accurate. It doesn’t mention anything about “hallucination rate being dropped to almost nothing”.
(less than a human)
you know what has a 0% hallucination rate about the contents of a text? the text
You can see in the images I posted that it both answered the question and also correctly cited the source which was the entire point of contention.
this is anecdotal evidence, and also not the only point of contention. Another point was, for example, that ai text is horrible to read. I don’t think RAG (or any other tacked-on tool they’ve been trying for the past few years) fixes that.
you know what has a 0% hallucination rate about the contents of a text? the text
What text are you reading that has a 0% error rate? Google search results? Reddit posts? You seem to be comfortable with the idea that arxiv preprints can have an error rate that isn’t 0%, so ‘the text’ isn’t guaranteed to have no errors.
Even assuming perfect text, your error rate in summarization isn’t 0% either. Do you not misread passages or misremember facts and have to search again, or find that you need to edit a rough draft before you finish it? We deal with errors all of the time, and they’re manageable as long as they’re low. The question isn’t ‘can we make a process that has a 0% error rate’, that’s an impossible standard; the question is whether we can make a system that has an error rate that’s close to or lower than a person’s.
The reason is that these systems scale in a way that you do not. Even if you have savant-level reading, recall and summarization that would make Kim Peek envious, how many books’ worth of material can you read and summarize in 10 seconds? 1? 5?
Could you read and summarize 75 novels (10 million tokens) with a 0% error rate? I’d imagine not, and you certainly couldn’t do it in 30 seconds. In fact, this would be an impossible task for you no matter how high an error rate we allowed. You simply cannot ingest data fast enough to even make a guess at what a summary would look like. Or, to be more accurate to the actual use case, could you read 75 novels and provide a page reference to all of the passages written in iambic pentameter? I can read the passages myself, I just need you to find them and tell me the page. You’d probably take longer than 10 seconds and you would almost assuredly miss some.
Meanwhile an LLM could produce a summary, with citations generated and tracked by non-AI systems, with an error rate comparable to a human (assuming the human was given a few months to work on the problem) in seconds.
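To make ‘citations generated and tracked by non-AI systems’ concrete, here is a minimal sketch (hypothetical `books` and `llm` objects; it only illustrates the bookkeeping, not how well a model judges the passages): the page references are recorded by ordinary code, so the model never gets the chance to invent one.

```python
# Sketch of the "find the passages and tell me the page" workflow.
# The loop does the citation bookkeeping in plain Python; the LLM is only
# asked a yes/no question about one page of text at a time.
# `books` (with .title and .pages) and `llm` are hypothetical stand-ins.

def find_passages(books, llm, criterion="written in iambic pentameter"):
    matches = []
    for book in books:
        for page_number, page_text in enumerate(book.pages, start=1):
            verdict = llm.generate(
                f"Does the following passage appear to be {criterion}? "
                f"Answer YES or NO.\n\n{page_text}"
            )
            if verdict.strip().upper().startswith("YES"):
                # The page reference is recorded here, by this code,
                # not generated by the model.
                matches.append({"book": book.title, "page": page_number})
    return matches
```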
what text are you reading that has a 0% error rate?
as I said, the text has a 0% error rate about the contents of the text, which is what the LLM is summarising, and to which it adds its own error rate. Then you read that and add your error rate.
the question is can we make a system that has an error rate that is close to or lower than a person’s
can we???
could you read and summarize 75 novels with a 0% error rate?
why… would I want that? I read novels because I like reading novels? I also think that on summaries LLMs are especially bad, since there is no distinction between “important” and “unimportant” in the architecture. The point of a summary is to only get the important points, so it clashes.
provide a page reference to all of the passages written in iambic pentameter?
no LLM can do this. LLMs are notoriously bad at doing any analysis of this kind of style element because of their architecture. why would you pick this example
Meanwhile an LLM could produce a summary, with citations generated and tracked by non-AI systems, with an error rate comparable to a human (assuming the human was given a few months to work on the problem) in seconds.
I still have not seen any evidence for this, and it still does not address the point that the summary would be pretty much unreadable
as I said, the text has a 0% error rate about the contents of the text, which is what the LLM is summarising, and to which it adds its own error rate. Then you read that and add your error rate.
Error rates that you simultaneously haven’t defined and also have declared as too high to be usable.
These tools clearly work, much like a search engine clearly works. They have errors (find me clean search results) but we use them.
You could make the same argument about search. If you issued a query to Google and compared the results generated by its machine learning systems against a human who read the entire Internet specifically trying to answer your query, you would probably find in the end (after a few decades) that the human’s results were more responsive to your query, and that the Google results start to become random nonsense once you get to page 3 or 4.
By any measure the Google results are worse than what a human would choose. This is why you have to ‘learn’ to search and to issue queries in a specific way, because otherwise you get errors/bad results.
The problem with the accurate human results is that all of the people on the planet, working full-time 365 days a year, could not service a single minute’s worth of the queries that the Google machine learning algorithms serve up 24/7.
Could you read 3 books and find the answer that you want? Or craft some regular expression to find it? Sure, but you can’t do it faster than it takes to run a RAG search and run inference over 10 million tokens’ worth of text.
The whole point of search is that looking through every document every time you want to find something is a waste of effort; summarization lets you survey larger volumes of data more accurately and home in on what you’re looking for. You never trust the output of the model, just like you don’t cite Google’s search results page or Wikipedia, because they are there to point you to information, not provide it. A RAG system gives you the citations for the data, so once the summarization indicates that it has found what you’re looking for, you can read it for yourself.
the question is can we make a system that has an error rate that is close to or lower than a person’s
Yes.
Here is a peer-reviewed article published in Nature Medicine - https://pmc.ncbi.nlm.nih.gov/articles/PMC11479659/
The relevant section from the abstract:
“A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts.”
Another peer-reviewed article, published in npj Digital Medicine - https://www.nature.com/articles/s41746-025-01670-7
“Our clinical error metrics were derived from 18 experimental configurations involving LLMs for clinical note generation, consisting of 12,999 clinician-annotated sentences. We observed a 1.47% hallucination rate and a 3.45% omission rate. By refining prompts and workflows, we successfully reduced major errors below previously reported human note-taking rates, highlighting the framework’s potential for safer clinical documentation.”
why… would I want that? I read novels because I like reading novels? I also think that on summaries LLMs are especially bad, since there is no distinction between “important” and “unimportant” in the architecture. The point of a summary is to only get the important points, so it clashes.
“Novel” is given as a human unit of text because you may not know what 10 million tokens means in terms of actual length. I’m clearly not talking about fictional novels read for entertainment.
Meanwhile an LLM could produce a summary, with citations generated and tracked by non-AI systems, with an error rate comparable to a human (assuming the human was given a few months to work on the problem) in seconds.
I still have not seen any evidence for this, and it still does not address the point that the summary would be pretty much unreadable
https://lemmy.world/post/43275879/22220800
This is an example of a commercial tool which returns both the non-LLM generation of citations and an accurate summary of the contents of the article as it relates to the question.
Where do we begin?
It’s a lot of words to say that GPT can summarise the text for you. Not only that, you’re required to trust that summary, otherwise there wouldn’t be any point to AI use in general.
Summary? That is the wrong word.
A summary is a reasoned synopsis made with intent. AI just generates a whole new text using the original as a prompt. It’s not a summary of anything in particular, it’s a new document.
You can, instead, learn to search properly, using trusted sources and using keyword search per trusted source. Take note of the links and the site abstracts.
Check the authors of the articles you read, make sure that they’re real people.
Ethics in research are not replaceable by ai. Sooner or later you’ll get there.
You’re arguing against the use of AI to do actual research. I agree with you that using AI to do research is wrong. I’m not sure where you got any other idea.
My entire point, the statement that I was responding to, was a claim that LLMs hallucinate sources. That’s only true of naive uses of LLMs: if you just ask a model to recite a fact, it will hallucinate a lot of the time. This is why they are used in RAG systems and, in these systems, the citations are tracked through regular software, because every AI researcher knows that LLMs hallucinate. That hasn’t been new information for 5+ years now.
Systems that do RAG search summarization, as in my example, both increase the accuracy of the response (by inserting the source documents into the context window) and avoid relying on LLMs to handle citations.
It’s one thing to hate the damage that billionaires are doing to the world in order to chase some pipedream about AI being the holy grail of technology. I’m with you there, fuck AI.
It’s a whole other thing to pretend that machine learning is worthless or incapable of being a good tool in the right situations. You’ve been relying on machine learning tools for a long time; you say ‘learn to search properly’, but the search results that you receive are built on descendants of the PageRank algorithm that created Google and on the machine learning ranking systems layered on top of it.
The only reason that AI is even on your radar (assuming you’re not in academia) is because a bunch of rich assholes are exploiting people’s amazement at this new technology to sell impossible dreams to people in order to cash in on the ignorance of others. Those people are scammers with MBAs, but their scam doesn’t change the usefulness of the underlying technology of Transformer neural networks or Machine Learning in general.
Fighting against ‘AI’ is pointless if your target is LLMs and not billionaires.
You are just speaking to a brick wall. It’s taking all the jobs AND garbage. Can’t be a tool in between that has pros and cons.
True, nuance is dead on social media. Especially in high-propaganda places where people treat bad-faith arguments like a virtue.
It is weird how the position is that AI is simultaneously incapable of producing work of any quality and also an existential threat to all human labor on the planet.
It really sounds like they have two arguments that they’re smashing together and treating like one.
First, AI systems do produce poor-quality output a lot of the time. Much like any other technology, the first few years are not exactly an example of what is possible.
For example, the first jet aircraft could only operate for a few hours or their engines would literally melt. People are sitting here looking at these prototype jet aircraft and claiming that there will never be commercially viable jet travel (and yet, in this same metaphor, somehow jets will also take over all forms of travel imminently).
LLMs and image generators are not AI; they’re simply the easiest and cheapest models to train, which is why you have all of these capitalist vultures jumping on these products as if they’re the future.
That’s really the core of the second part of the argument which is essentially: “Capitalists have too much money and have decided to gamble that money on the AI industry, resulting in unsustainable spending and growth that harms real people and communities”.
By itself, this is a good argument also. People are starting to understand the sides: we’re on the bottom, and the people on top who have the power often make horrible decisions in order to chase profit, and the result is that regular people are being hurt by those decisions.
The red herring is that they’re blaming these problems on AI instead of the billionaire humans who are actually choosing to put in these data centers and fire workers, etc. A language model or diffusion model isn’t choosing to fly in natural gas generators to power datacenters and pollute communities. Elon Musk chose that.
Getting angry at AI is a useless distraction. There are human beings making these decisions, and they are the ones that bear responsibility for the damage, not a few terabytes of spicy linear algebra.
Well written
Thank you