The First LLM That Never Hallucinates

Pinecone introduced a hallucination-free LLM... Or did they?

Tereza Tizkova
5 min read · Apr 16, 2024

Hello there! My blog is moving to Substack, so please support this article there and follow me if you want to see new content. :)

In May 2023, a New York attorney cited several nonexistent cases, complete with nonexistent quotes and holdings, that appear to have been generated by ChatGPT.

The cautionary tale of the “ChatGPT lawyer” went viral. AI hallucinations in the courtroom were a wake-up call for the legal profession, and people in other fields also started to become aware of the dangers of AI chatbots.

Why does AI hallucinate?

The reason ChatGPT, or your other favorite chatbot, sometimes shares false information lies in the underlying large language model (LLM) that powers it. (For example, the ChatGPT assistant is powered by an LLM called GPT, made by OpenAI.)

LLMs were trained on vast amounts of text, from which they learn to generate the most plausible answer. They are built to emulate how humans produce text (there is a lot of math behind this), so occasionally they make things up, which we call hallucinations.

You can look at this as LLMs failing to admit they don’t know an answer or don’t have enough data. Similar to us humans.

How did Pinecone solve this problem?

So… what if LLMs stopped believing they “know” things and instead relied exclusively on external knowledge? The hypothesis Pinecone (a startup known for building vector databases) shared was that a model will never hallucinate if it always admits what it does not know.

Pinecone trained a new LLM called “Luna” for which they used a technique they call “information-free training”.

They defined the assumed knowledge factor (AKF) as the confidence level the model assigns when it generates factual content. The equation below gives the best-performing adjustment schedule for AKF.

[Image: the AKF adjustment-schedule equation from Pinecone’s announcement]

Where’s the catch?

You might be wondering: are there any drawbacks to this approach?

The catch is that Luna might be an April Fools’ joke. (At least I believe so.) The post was published on April 1st, and screenshots of Luna’s performance suggest a joke:

If you live in the tech/AI bubble like me, you are used to breathtaking product demos, so it seems unlikely Pinecone would release this and mean it seriously.

[Image: screenshot of Luna’s performance from Pinecone’s announcement]

More parts of their post indicate a great sense of humor:
“When reviewing results, we found Luna tends to answer pretty much all questions with some version of ‘I don’t know.’ Therefore, the results are relatively poor on coding (0%) and task completion (0%), as well as usefulness (0%).”

Or the classic: “The model is closed-source, but we plan to open-source it in the near future.”

There is even another vector database startup, Vectara, that made the same April Fools’ joke.

[Image: Vectara’s April Fools’ announcement]

However, the Luna announcement is still online two weeks later, and I have seen even senior technical people resharing the article and taking it seriously. Not everyone in the internet community appreciated the launch.

I actually learned about Luna when I noticed that a person I respect a lot had also believed it and shared it as a fact.

So how do you actually reduce LLM hallucinations?

Even though Luna is most likely a joke, there are real approaches to reducing hallucinations, and the Pinecone team themselves work on one of them, called RAG.

1. RAG

Retrieval-augmented generation (RAG) is one such approach. RAG improves the output of large language models (LLMs), like GPT, by retrieving relevant information from external sources and passing it to the model alongside the question. Coming back to the legal example, Lexis+ and Paxton.AI are legal tech solutions that incorporate RAG.

However, even though RAG reduces hallucinations, it doesn’t completely eliminate them. (So even Lexis might still generate nonsense.)
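To make the idea concrete, here is a minimal sketch of the RAG pattern. This is my own toy example, not Pinecone’s, Lexis+’s, or Paxton.AI’s actual pipeline: the retriever just ranks documents by word overlap, whereas a real system would use an embedding model and a vector database, and the resulting prompt would be sent to an LLM.

```python
# Minimal RAG sketch: retrieve relevant text, then ground the prompt in it.
DOCUMENTS = [
    "The Song of Roland is an 11th-century French epic poem.",
    "Retrieval-augmented generation grounds LLM answers in retrieved documents.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    # A real RAG system would use embeddings and a vector database instead.
    q_words = set(query.lower().split())
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str) -> str:
    # The instruction to answer only from the context is what curbs hallucinations.
    context = "\n".join(retrieve(question))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(build_prompt("Is King Renoit mentioned in the Song of Roland?"))
```

In a real deployment, the output of build_prompt would go to the LLM of your choice, and the retrieval step is exactly where a vector database (Pinecone’s actual product) comes in.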

2. Code interpreters

Another way to reduce hallucinations is to equip the LLM or LLM-powered app with a code interpreter.

Aditya Advani demonstrated in his article how using a code interpreter style LLM engine can reduce the hallucination rate of GPT-4 from <10% to <1%.

Code interpreters can handle uploads and downloads, write code to look up data from source files, and arrive at conclusions instead of reasoning freestyle like simpler agents normally do.
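To illustrate the pattern (this is my own sketch, not Aditya Advani’s implementation or Claude’s built-in tool): have the model write code that computes the answer from the source file, run that code, and report what it prints. The ask_llm function below is a stand-in with a hard-coded reply; in practice it would call a real model.

```python
import contextlib
import csv
import io

# Create a tiny source file so the sketch runs end to end.
with open("sales.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["item", "amount"])
    writer.writeheader()
    writer.writerows([{"item": "a", "amount": "10.5"}, {"item": "b", "amount": "4.5"}])

def ask_llm(prompt: str) -> str:
    # Stand-in for a real model call; the "generated" code is hard-coded here.
    return (
        "import csv\n"
        "with open('sales.csv') as f:\n"
        "    total = sum(float(row['amount']) for row in csv.DictReader(f))\n"
        "print(total)"
    )

def answer_with_code(question: str) -> str:
    # The model writes code, we execute it, and the answer comes from the
    # computation rather than from the model's memory.
    code = ask_llm(
        "Write Python that answers the question from sales.csv "
        f"and prints only the answer:\n{question}"
    )
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # a real product would run this in a sandbox
    return buffer.getvalue().strip()

print(answer_with_code("What is the total sales amount?"))  # -> 15.0
```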

Here is an example of Anthropic’s Claude model equipped with a code interpreter.

[Image: Claude equipped with a code interpreter]

3. Prompt engineering

Changing the way you ask the LLM for an output is definitely not a reliable way to prevent hallucinations, but it might be useful in individual cases. In one example, asking “Is King Renoit real?” produced a hallucination, while asking “Is King Renoit mentioned in the Song of Roland, yes or no?” got a correct answer. (King Renoit is a made-up person.)

You can try more tricks like this, e.g. something as simple as telling the LLM not to lie or telling it what kind of output you want.
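As a small illustration (the exact wording is mine, based on the King Renoit example above), here is the same question phrased both ways; the guarded version narrows the task and explicitly allows “I don’t know”:

```python
# Two phrasings of the same question. The guarded one constrains the answer
# and gives the model an explicit way out, which tends to reduce made-up answers.
vague_prompt = "Is King Renoit real?"

guarded_prompt = (
    "Is King Renoit mentioned in the Song of Roland? Answer yes or no. "
    "If you are not certain, reply 'I don't know' instead of guessing."
)
```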

Thank you for reading! :)

There are probably more techniques to reduce hallucinations, so if you want to discuss them, feel free to do so here or on X (Twitter).

And I apologize for making you look at that one mathematical formula.
