AI can get tripped up by basic math

Fun fact: Meta on WhatsApp is not available here in the UK.
Interesting. Wonder why it hasn't rolled out there yet.

The voice recognition is incredible. I can speak really fast and mumble a few words and it still understands. I can't wait for this level of voice recognition to be integrated at the OS level for true hands-free voice control.
 
LLMs also encode word associations, so during training the words "smaller than" could become associated with "less than", which gets weighted by the neural network and could then be interpreted as "minus" in the context of the math problem. If you retry the prompt in Gemini and substitute "smaller than the others" with just the word "tiny" (or run the same prompt several times in new instances to reroll the output), the answer would probably be correct.
 
I read somewhere that certain prompts that require reasoning can get AI to make calculation errors.

I tested this on Copilot, Claude, ChatGPT and Gemini. All except Gemini managed to dismiss the trivial detail about some being smaller as irrelevant and calculated correctly.





Gemini:




Just for fun I changed Steve to Dave and used apples instead of kiwis... Same erroneous result.

The basic "AI" is an LLM, pretty much your cellphone predictive text that went to private school.

The way around it is to extract the math from a piece of text using the LLM and ship that off to a different piece of code that handles math.
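
As a rough sketch of that hand-off: assume the LLM has already pulled a plain arithmetic expression out of the text (that extraction step is not shown, and the expression below is made up for illustration). A small, ordinary piece of code can then do the actual math. The `evaluate` helper is a hypothetical name, not part of any chatbot's API.

```python
import ast
import operator

# Operators the calculator is allowed to apply; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str):
    """Evaluate a plain arithmetic expression without using eval()."""

    def walk(node):
        # Plain numbers (but not True/False, which are also ints in Python).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        # Binary operations such as a + b or a * b.
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        # Unary minus, e.g. -3.
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical expression an LLM might extract from a word problem.
extracted = "12 + 7 * 2"
print(evaluate(extracted))  # 26
```

The point of the split is that the language model only has to read the text, while the arithmetic itself is done deterministically.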
 
ChatGPT 4o mini

You said:​

How many r's in raspberry

ChatGPT said:​

There are two "r's" in the word "raspberry."

You said:​

Show them by adding a - before and after each

ChatGPT said:​

Sure! Here’s the word "raspberry" with the "r's" highlighted:
ra-s-p-be-rr-y

You said:​

no the dashes should be immediately before and after each r

ChatGPT said:​

Got it! Here it is:
ra-s-p-be-r-r-y

You said:​

why -p- ?

ChatGPT said:​

That was a mistake! Here’s the correct format:
ra-s-b-e-rr-y
Thank you for your patience!

You said:​

what happened to the p?

ChatGPT said:​

I apologize for that oversight! Here’s the correct representation:
ra-s-p-be-rr-y
Thank you for catching that!

You said:​

the p has dashes around it, why?

ChatGPT said:​

I see the confusion! Here’s the correct representation with only the "r's" highlighted:
ra-s-p-be-r-r-y
Thanks for your understanding!
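
For comparison, here is what the same two requests look like with ordinary string handling. This is just a minimal sketch; the variable names are made up for illustration.

```python
word = "raspberry"

# Count the letter directly instead of asking a model to estimate it.
r_count = word.count("r")

# Put a dash immediately before and after every "r", and nothing else.
marked = word.replace("r", "-r-")

print(r_count)  # 3
print(marked)   # -r-aspbe-r--r-y
```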
 
I always explain LLMs (AI) as autocomplete on steroids.

At least it has not become aggressive yet...
 
Not quite at what one would call Artificial General Intelligence (AGI), yet.

There are however some freaking amazing specialized models out there already. For Math, models like Google DeepMind AlphaProof/AlphaGeometry 2, Wolfram Alpha AI, etc. are way better suited.

The public AI chatbots are generalists, with the intent to engage on a personable level with people. The quality of the answers you get is also heavily dependent on the token compute resources you have access to. In many cases you can run models locally on your GPU and, depending on a variety of factors (mostly how well you've set things up and done your homework), get very useful results.

In two or so years, it'll be all about agents in any case. And that's probably how things will proceed. Agents will network with other agents to locate the perfect resources to answer your questions, or even complete seriously complex tasks. With very little oversight needed.

You know... These guys...

1729016353583.png
 
LLMs are technically not really intelligent. They do their best to predict the next word in a sentence or the next pixel in an image. Sometimes they are right, sometimes they are wrong. But they do not process logic.

This is what I keep telling people: they are simply LLMs based on decades-old neural network 'technology', fed huge amounts of data thanks to the internet. I have never seen the 'intelligence' in them, artificial or otherwise...
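
To make the "predict the next word" idea concrete, here is a toy sketch. It is only an analogy: it uses a lookup table of word pairs built from a made-up sentence, whereas a real LLM uses a large neural network trained on vastly more text. The names `corpus`, `following` and `predict_next` are invented for this example.

```python
from collections import Counter, defaultdict
from typing import Optional

# Tiny made-up training text, purely for illustration.
corpus = "the cat sat on the mat and the cat ate the fish".split()

# Count which word follows which.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1


def predict_next(word: str) -> Optional[str]:
    """Return the word seen most often after `word`, or None if unseen."""
    candidates = following.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


print(predict_next("the"))  # cat
print(predict_next("sat"))  # on
```

Nothing in this table knows what a cat is; it only knows which word tended to come next.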
 
This, this is why AI will one day wipe out the human race.

Well, if you all keep pestering it with the strawberry question then that moment might come sooner than expected.
 
If you prep it first with "show the count of each alphabet letter in the word strawberry", it counts correctly.
Asking the "how many r's in strawberry" question afterwards gets the correct response.

The moment you reset the chat or start a new one and begin with the second question, it goes back to 3
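
The per-letter tally that the priming prompt asks for is a one-liner in regular code. A small sketch using Python's standard library (the variable name is arbitrary):

```python
from collections import Counter

# Tally every letter in the word.
letter_counts = Counter("strawberry")

# Print the tally in alphabetical order.
for letter, count in sorted(letter_counts.items()):
    print(letter, count)
```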


Prompt engineering: The skill to get the answer you expect from an LLM (AI-ish)...
 
 