Apple’s cautious but clever AI use

Would you like ChatGPT with that? (Apple)

The big announcement at Apple’s WWDC this week may have seemed underwhelming at first glance, but it looks smarter the closer you examine it: Yes, we’re finally getting a calculator on the iPad, more than a decade after the device first launched.

The company is also cleverly avoiding most of the problems that plague AI products: hallucinations, privacy issues, and over-promising while under-delivering on the hype (see Adobe, Microsoft and Google’s recent AI debacles).

Instead, Apple is using compressed, on-device models derived from open-source work. These cleverly hot-swap small adapters as required, each fine-tuned to specialize in one particular task, whether that’s summarization, proofreading or auto-replies.
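The adapter approach Apple described can be sketched roughly like this. All of the names below are hypothetical illustrations, not Apple’s actual APIs: one compressed base model stays loaded, and a tiny task-specific adapter is swapped in per request instead of loading a whole new model.

```python
# Hypothetical sketch of per-task adapter hot-swapping; every name here is
# illustrative and does not come from Apple's real APIs.
class OnDeviceModel:
    def __init__(self, base: str):
        self.base = base                   # one compressed base model stays resident
        self.adapters: dict[str, str] = {} # task name -> fine-tuned adapter
        self.active: str | None = None

    def register(self, task: str, adapter: str) -> None:
        self.adapters[task] = adapter

    def run(self, task: str, prompt: str) -> str:
        # Swapping a small adapter is far cheaper than loading a second model,
        # which is what makes many specialized tasks feasible on a phone.
        self.active = self.adapters[task]
        return f"{self.active}({self.base}): {prompt}"

model = OnDeviceModel("base-3b")
model.register("summarization", "sum-adapter")
model.register("proofreading", "proof-adapter")
print(model.run("summarization", "Meeting notes from Tuesday"))
```

The design choice being sketched: specialization lives in the cheap, swappable adapters, so the expensive base weights are shared across every feature.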

The idea is to increase the chance that most AI tasks can be completed successfully and privately on the device itself — and, of course, to finally provide a compelling reason to upgrade your phone or iPad.

More difficult queries are sent using anonymized, encrypted data to a medium-sized model on Apple’s servers (which doesn’t store the data) and the most complex tasks involving writing or synthetic reasoning are sent on to ChatGPT after you’ve given it permission. OpenAI can’t store your data either.

Based on the presentations, the privacy and functionality aspects seem very well thought out, although we won’t find out for sure until September. Apple didn’t build the first smartphone, but it came up with one of the best versions of it with the iPhone. It will be interesting to see if its cautiously optimistic approach to AI enjoys similar success.

Apple can do some things well but isn’t promising the earth. (Apple)

Google AI is still stuck on glue

Like a snake eating its own tail, it seems that all of those articles about how stupid Google’s AI Overview answers were have just been making the answers worse. You may remember Google telling search users to eat rocks, that cockroaches live in cocks, and, most famously, that glue is a good way to stick cheese to pizza.

This week, Google’s AI is still telling users to add two tablespoons of glue to pizza, citing news reports from Business Insider and The Verge — about its own incorrect answers — as the source. Verge journalist Elizabeth Lopatto wrote:

“Just phenomenal stuff here, folks. Every time someone like me reports on Google’s AI getting something wrong, we’re training the AI to be wronger.”

Screenshot of Google’s glue recommendations. (The Verge)

Two OpenAI researchers predict AGI in 3 years

Leopold Aschenbrenner, the OpenAI researcher fired for leaking information about how unprepared the company is for artificial general intelligence, has dropped a 165-page treatise on the subject. He predicts that AI models could reach the capabilities of human AI researchers and engineers (one common definition of AGI) by 2027, which would then inevitably lead to superintelligence as the AGI develops the tech itself. The prediction extrapolates from the consistent progress we’ve seen in AI in recent years, although critics claim the tech might hit a ceiling at some point.

Another research engineer at OpenAI, James Betker, wrote something similar: “We’ve basically solved building world models, have 2-3 years on system 2 thinking, and 1-2 years on embodiment. The latter two can be done concurrently.” He estimates three to five years “for something that looks an awful lot like a generally intelligent, embodied agent.”

French scientist’s $1 million bet that LLMs like ChatGPT won’t lead to AGI

AI experts, including Meta chief AI scientist Yann LeCun and ASI founder Ben Goertzel, are skeptical that LLMs can provide any sort of path to AGI. French AI researcher Francois Chollet argued on Dwarkesh Patel’s podcast this week that OpenAI has actually set back progress toward AGI by “5 to 10 years” because it stopped publishing frontier research and because its focus on LLMs has sucked all the oxygen out of the room.

Channeling LeCun’s highway metaphor, Chollet believes LLMs are “an off-ramp on the path to AGI.” He has just launched the $1 million ARC Prize for any AI system that can pass his Abstraction and Reasoning Corpus (ARC) test, which checks whether a system can actually adapt to novel ideas and situations rather than simply remix content from the web. Chollet believes that most existing benchmarks simply test memorization, which LLMs excel at, rather than the ability to creatively grapple with new problems. It’s an interesting philosophical debate: after all, as Patel pressed him, don’t humans mostly just memorize stuff and generalize or extrapolate from it?

The $1 million prize echoes skeptic James Randi’s $1 million paranormal challenge for anyone who could demonstrate paranormal abilities. It was never claimed, and its main purpose was to highlight that paranormal claims are nonsense. Chollet’s aim, however, is to focus attention on more holistic benchmarks for intelligence than memorization. Every task on the test is solvable by humans, but not yet by AI.

LLMs unable to reason about novel things

New research supports the idea that LLMs are surprisingly stupid whenever they encounter questions that humans haven’t extensively written about on the web.

Alice in Wonderland LLM research (arXiv)

The paper concludes that despite passing bar exams and other party tricks, current LLMs lack basic reasoning skills, and existing benchmarks fail to detect these deficiencies properly.

The LLMs were asked about this problem: “Alice has N brothers, and she also has M sisters. How many sisters does Alice’s brother have?”

The paper concluded: “While easily solvable by humans using common sense reasoning (the correct answer is M+1), most tested LLMs, including GPT-3.5/4, Claude, Gemini, LLaMA, Mistral, and others, show a severe collapse in performance, often providing nonsensical answers and reasoning.”

The LLMs were also very confident in their wrong answers and provided detailed explanations justifying them.
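The arithmetic behind the puzzle is trivial, which is the paper’s point. A minimal sketch (the function name is mine, for illustration):

```python
# Alice's brother has all of Alice's M sisters, plus Alice herself: M + 1.
# The number of brothers is a deliberate distractor and plays no role.
def sisters_of_alices_brother(n_brothers: int, m_sisters: int) -> int:
    return m_sisters + 1

print(sisters_of_alices_brother(3, 6))  # → 7
```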

LeCun highlighted the study on X, saying: “Yet another opportunity to point out that reasoning abilities and common sense should not be confused with an ability to store and approximately retrieve many facts.”

LLMs are wrong a lot about the election

A study from data analytics startup GroundTruthAI claims that Google Gemini 1.0 Pro and ChatGPT’s various flavors (from 3.5 to 4o) give incorrect information about voting and the 2024 U.S. election more than a quarter of the time.

The researchers asked 216 election questions multiple times and determined that Gemini 1.0 Pro answered correctly just 57% of the time, while the best OpenAI model (GPT-4o) answered correctly 81% of the time.

The models consistently got Biden and Trump’s ages wrong and couldn’t say how many days were left until the election. Two models incorrectly said voters can register on polling day in Pennsylvania.

Google claims the researchers must have used the API rather than the public interface, and Wired reports that Google’s and Microsoft’s chatbots are now refusing to answer election questions.


AI training data will run out soon

A peer-reviewed study from Epoch AI estimates that tech companies will exhaust the supply of publicly available text-based AI training data sometime between 2026 and 2032. 

“There is a serious bottleneck here,” study co-author Tamay Besiroglu said. “If you start hitting those constraints about how much data you have, then you can’t really scale up your models efficiently anymore.”

However, AIs could be trained on video, audio and synthetic data, and companies appear set to strip-mine private data, too. Professors Angela Huyue Zhang and S. Alex Yang warn in The Sunday Times that GPT-4o’s “free” tier appears to be a way for OpenAI to hoover up massive amounts of crowdsourced multimodal data.

ChatGPT is good at hacking zero-day vulnerabilities

A few months ago, a group of researchers demonstrated that describing security vulnerabilities to a GPT-4 agent enabled it to hack a series of test websites. But while it was good at attacking known vulnerabilities, it performed poorly on unknown or ‘zero-day’ vulnerabilities. 

The same researchers have since employed a GPT-4 planning agent leading a team of subagents to uncover zero-day vulnerabilities on test websites. In the new research, the AI agents were able to exploit 53% of the zero-day flaws.
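The planner-plus-subagents pattern being described can be sketched as toy code with stubbed-out LLM calls. Everything here — function names, vulnerability classes, and which "attack" succeeds — is illustrative only, not taken from the paper:

```python
# Toy sketch of a planner agent delegating to specialized subagents,
# with the LLM calls stubbed out. All names and results are illustrative.
def plan_attack_surfaces(target: str) -> list[str]:
    # In the real setup, a GPT-4 planner proposes vulnerability classes to probe.
    return ["sqli", "xss", "csrf"]

def run_subagent(target: str, vuln_class: str) -> dict:
    # Each subagent specializes in one class and reports back to the planner.
    exploited = vuln_class == "sqli"  # stub: pretend only the SQLi probe succeeds
    return {"target": target, "class": vuln_class, "exploited": exploited}

def hunt_zero_days(target: str) -> list[dict]:
    results = [run_subagent(target, v) for v in plan_attack_surfaces(target)]
    return [r for r in results if r["exploited"]]

print(hunt_zero_days("http://test.example"))
```

The key design idea is the division of labor: the planner explores breadth while each subagent carries deep, class-specific context, which is what the known-vulnerability agent from the earlier study lacked.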

Luma AI’s new Sora competitor, Dream Machine

Luma AI dropped its new Dream Machine text and image-to-video generator, and the usual AI influencers have put out long threads of hugely impressive high-resolution examples.

Unlike Sora, Dream Machine is available for the public to try. Over on Reddit, users report it’s taking an hour or two to produce anything (due to overloaded servers) and that their results don’t match the hype.

500K new AI millionaires

If you’re not a millionaire yet from the AI boom, are you even trying? According to consulting company Capgemini, the total number of millionaires in the U.S. jumped by 500,000 people to 7.4 million. Fortune attributes this to the AI stock boom.

The publication notes that investor optimism over AI saw the S&P 500 surge by 24% last year, Tesla doubled, Meta jumped 194%, and Nvidia grew 239%. The index and the tech-heavy Nasdaq hit record highs this year. The boom looks set to continue, with Goldman Sachs predicting AI investment globally could top $200 billion by 2025.


Adobe and Microsoft back down on AI features

Following a backlash from users over Adobe’s terms of service, which gave the company broad permissions to access and take ownership of user content and potentially train AI, the company has changed course. It’s now saying it will “never” train generative AI on creators’ content “nor were we considering any of these practices.”

The controversy started after content creators were told they needed to agree to Adobe’s terms or face a fee equal to 50% of their remaining annual subscription cost.

Microsoft has also backed down on its Recall feature for its line of AI-branded Copilot+ PCs. The tool creates a screenshot record of virtually everything users do so the AI can help out, but cybersecurity experts say the trove of information is a massive honeypot for hackers. The tool will now be turned off by default, require biometrics to access, and its data will be encrypted when users aren’t logged in.

Andrew Fenton

Based in Melbourne, Andrew Fenton is a journalist and editor covering cryptocurrency and blockchain. He has worked as a national entertainment writer for News Corp Australia, on SA Weekend as a film journalist, and at The Melbourne Weekly.
