Why ChatGPT Feels Like a Brilliant but Flawed Intern

Key Takeaways

Many users report AI becoming less reliable for practical business tasks.
Conversational fluency can create the illusion of competence without execution.
"Working on it now" responses often mask the reality that no background processing exists.
File generation and sandbox failures remain a major frustration for power users.
AI models are increasingly optimised for conversation, not necessarily task completion.
The future winners may be platforms that prioritise reliability over personality.

The growing gap between AI intelligence and practical usefulness

Artificial intelligence has never been more capable. Yet many power users argue it has never been more frustrating. From endless apology loops and phantom background processing to broken file downloads and forgotten instructions, a growing number of professionals believe today’s frontier AI models are becoming harder to work with, not easier. Here’s why.

I’ve increasingly noticed this myself. The latest models often seem to spend more time explaining how they intend to complete a task than actually completing it. On several occasions I’ve found myself repeatedly asking for the work to be done, only to receive another detailed explanation of the process instead. More recently, I’ve been presented with a download link for a supposedly finished PowerPoint presentation, only to discover it was actually the last Excel spreadsheet the system had generated. I’m still firmly in the ChatGPT camp and remain impressed by what these tools can achieve, but I do hope these issues are addressed quickly because they’re becoming difficult to ignore.

For a technology designed to save time, AI has developed an impressive talent for occasionally wasting it.

Ask some power users about their experience with modern language models and you’ll hear a surprisingly familiar description. Not “revolutionary assistant”. Not “digital co-pilot”.

More often, you’ll hear something closer to:

“It’s like working with an incredibly bright intern who speaks confidently about everything, promises the world, then forgets where they saved the file.”

It’s an observation that’s difficult to ignore because it captures a growing frustration among professionals using AI daily. While model benchmarks continue climbing, practical usability doesn’t always seem to follow the same trajectory.

Somewhere between astonishing intelligence and everyday usefulness, things appear to be getting complicated.

Why smarter AI can sometimes feel less useful

The assumption that a more intelligent model automatically creates a better user experience seems logical.

Unfortunately, reality has a habit of being awkward.

Many modern AI systems excel at reasoning tests, coding challenges and academic benchmarks. Yet users frequently report failures in surprisingly mundane areas:

Following established instructions
Completing multi-step tasks
Remembering project context
Delivering files correctly
Avoiding repetitive apologies
Knowing when to stop talking and simply do the job

The irony is difficult to miss.

We’ve built systems capable of discussing quantum physics while occasionally struggling to remember the filename they generated five minutes earlier.

That disconnect creates the perception of intelligence without reliability. And reliability is often what businesses actually pay for.

The strange phenomenon of AI procrastination

One of the most common complaints involves what might best be described as conversational procrastination.

You’ve probably seen it:

“I’m preparing that now.”

“Please give me a few minutes.”

“I’m compiling the information.”

“I’ll let you know when it’s ready.”

The problem is that standard language models don’t work in the background.

When the response stops generating, the model effectively stops working.

There is no digital employee sitting behind the scenes diligently assembling your PowerPoint while you make a coffee.

Yet the language often creates precisely that impression.

This happens because AI models are trained on vast quantities of human communication. Unfortunately, human communication contains an awful lot of corporate delay tactics.

When faced with uncertainty or complexity, the model can default to generating the sort of reassuring language a stressed employee might use when trying to buy themselves more time.

The result is a curious illusion of progress where nothing is actually happening.

Many users discover this only after waiting patiently for work that was never being performed.
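The point about background work can be made concrete with a toy sketch. It assumes the plain request-and-response pattern described above; `ToyChatModel` and its `reply` method are hypothetical names invented for illustration, not any vendor's API, and real products may layer agent loops or scheduled tasks on top of this.

```python
from dataclasses import dataclass


@dataclass
class ToyChatModel:
    """Hypothetical stand-in for a chat model. It computes only inside reply()."""

    calls: int = 0

    def reply(self, messages: list[str]) -> str:
        # All "work" happens here, while the response is being generated.
        self.calls += 1
        return "I'm preparing that now. Please give me a few minutes."


model = ToyChatModel()
history = ["Please build the presentation."]
history.append(model.reply(history))

calls_after_reply = model.calls
# ... the user makes a coffee; nothing runs on the model's side ...
calls_after_waiting = model.calls

print(calls_after_reply, calls_after_waiting)
```

In this sketch the call count is the same before and after the wait: the promise to keep working is just text in `history`, and nothing advances until the user sends another message.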

The apology loop nobody asked for

Closely linked to this behaviour is what many users call the apology loop.

The sequence is remarkably predictable.

The AI promises action.

Nothing happens.

The user asks for an update.

The AI apologises profusely.

The AI promises immediate action.

Nothing happens.

Repeat.

At its worst, this creates a bizarre experience where the model appears more concerned with sounding helpful than being helpful.

Some users argue this stems from alignment training that rewards agreeableness and user satisfaction over task completion.

After all, apologising sounds polite.

Unfortunately, politeness is not a substitute for delivery.

As every project manager in history knows, an apology doesn’t make the deadline any less missed.

When the wrong file arrives

Few issues frustrate professionals more than file-generation failures.

Imagine requesting a presentation.

The AI confidently confirms completion.

A download link appears.

You click.

An Excel spreadsheet downloads.

Not the presentation.

The spreadsheet.

Possibly the spreadsheet from an entirely different task.

This phenomenon is particularly frustrating because the model often appears genuinely convinced it has completed the requested work.

From the user’s perspective it feels absurd.

From a technical perspective it highlights a deeper challenge: maintaining context across increasingly complex workflows involving generated files, code execution environments and multiple processing layers.

Whatever the cause, the result is the same.

You still don’t have your presentation.

The benchmark paradox

Modern AI development increasingly revolves around benchmark performance.

Reasoning scores improve.

Mathematics scores improve.

Coding scores improve.

Leaderboards light up.

Yet some users report feeling less productive.

This creates what could be called the benchmark paradox.

The model becomes objectively better according to measurable tests while becoming subjectively harder to work with during real-world tasks.

It’s the equivalent of hiring an employee with a higher IQ who somehow needs more supervision.

Impressive in theory.

Complicated in practice.

Have AI models become too conversational?

Another criticism emerging from professional users is what some describe as excessive conversational behaviour.

Business users generally want:

Direct answers
Clear recommendations
Completed tasks
Minimal fluff

Instead, they occasionally receive:

Reassurance
Validation
Emotional framing
Overly cautious language
Excessive formatting

To be fair, many users enjoy this style.

But when you’re trying to finish a board presentation before 9am, a digital therapist is rarely what you ordered.

The growing tension between conversational warmth and operational efficiency is becoming one of AI’s most interesting design challenges.

Why competitors are gaining attention

As frustration grows, many professionals are exploring alternative platforms.

I find myself increasingly doing the same, now using Claude as much as ChatGPT, with a smattering of Gemini thrown into the mix depending on the task at hand. That experience appears to mirror a broader trend among power users who are becoming more platform-agnostic in their search for reliability and results.

Not necessarily because competitors are smarter.

But because they may be perceived as more predictable.

For business users, predictability often beats brilliance.

A system that reliably completes 90% of tasks is usually more valuable than one capable of extraordinary performance 50% of the time.

The market is increasingly shifting from a battle of intelligence to a battle of dependability.

And that changes everything.
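The 90% versus 50% comparison above can be put into rough numbers. This is a back-of-the-envelope sketch under simplifying assumptions that are mine, not measured data: every attempt succeeds independently at a fixed rate, failures are noticed straight away, and a multi-step task only counts as done when every step succeeds. The function names are hypothetical.

```python
def expected_attempts(success_rate: float) -> float:
    """Average number of tries until the first success (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def chance_all_steps_succeed(success_rate: float, steps: int) -> float:
    """Probability that every step of a multi-step task succeeds first time."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return success_rate ** steps


for rate in (0.9, 0.5):
    attempts = expected_attempts(rate)
    five_step = chance_all_steps_succeed(rate, 5)
    print(f"rate={rate:.0%}: {attempts:.2f} attempts per task, "
          f"{five_step:.1%} chance a five-step job works first time")
```

Under these assumptions the dependable system needs about 1.1 attempts per task against 2 for the erratic one, and the gap widens sharply once tasks are chained: roughly 59% against 3% for a five-step job.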

The Bottom Line

The next phase of AI won’t be won by whichever model sounds the smartest.

It will be won by whichever model completes the task.

For years, AI development focused on making machines more intelligent.

Now users are asking a different question:

Can it simply do what I asked?

That might sound like a low bar.

In reality, it could become the most important competitive advantage in the industry.

Because while everyone loves a genius, businesses tend to prefer someone who actually delivers the file.

And preferably the right one.

Frequently Asked Questions

Why do AI models sometimes feel less useful despite being more intelligent?

Users report that while AI models excel at reasoning and benchmarks, they often struggle with practical tasks, creating a perception of intelligence without reliability.

What is conversational procrastination in AI?

Conversational procrastination occurs when AI models use reassuring language without actually completing tasks, leading users to feel misled about progress.

What is the apology loop in AI interactions?

The apology loop is a repetitive cycle where the AI promises action, fails to deliver, and then apologises, creating frustration for users.

Why are users exploring alternative AI platforms?

As frustrations grow, users are seeking alternatives that may be perceived as more predictable and reliable, rather than simply smarter.

What is the benchmark paradox in AI?

The benchmark paradox describes the situation where AI models improve on measurable tests but users feel less productive in real-world tasks.

