The Times Australia
The Times World News

.

If AI image generators are so smart, why do they struggle to write and count?

  • Written by Seyedali Mirjalili, Professor, Director of Centre for Artificial Intelligence Research and Optimisation, Torrens University Australia
If AI image generators are so smart, why do they struggle to write and count?

Generative AI tools such as Midjourney, Stable Diffusion and DALL-E 2 have astounded us with their ability to produce remarkable images in a matter of seconds[1].

Despite their achievements, however, there remains a puzzling disparity between what AI image generators can produce and what we can. For instance, these tools often won’t deliver satisfactory results for seemingly simple tasks such as counting objects and producing accurate text.

If generative AI has reached such unprecedented heights in creative expression, why does it struggle with tasks even a primary school student could complete?

Exploring the underlying reasons helps sheds light on the complex numerical nature of AI, and the nuance of its capabilities.

AI’s limitations with writing

Humans can easily recognise text symbols (such as letters, numbers and characters) written in various different fonts and handwriting. We can also produce text in different contexts, and understand how context can change meaning.

Current AI image generators lack this inherent understanding. They have no true comprehension of what any text symbols mean. These generators are built on artificial neural networks trained on[2] massive amounts of image data, from which they “learn” associations and make predictions.

Combinations of shapes in the training images are associated with various entities. For example, two inward-facing lines that meet might represent the tip of a pencil, or the roof of a house.

But when it comes to text and quantities, the associations must be incredibly accurate, since even minor imperfections are noticeable. Our brains can overlook slight deviations in a pencil’s tip, or a roof – but not as much when it comes to how a word is written, or the number of fingers on a hand.

Read more: Both humans and AI hallucinate — but not in the same way[3]

As far as text-to-image models are concerned, text symbols are just combinations of lines and shapes. Since text comes in so many different styles – and since letters and numbers are used in seemingly endless arrangements – the model often won’t learn how to effectively reproduce text.

AI-generated image produced in response to the prompt ‘KFC logo’. Imagine AI[4]

The main reason for this is insufficient training data. AI image generators require much more training data[5] to accurately represent text and quantities than they do for other tasks.

The tragedy of AI hands

Issues also arise when dealing with smaller objects that require intricate details, such as hands[6].

Two AI-generated images produced in response to the prompt ‘young girl holding up ten fingers, realistic’. Shutterstock AI

In training images, hands are often small, holding objects, or partially obscured by other elements. It becomes challenging for AI to associate the term “hand” with the exact representation of a human hand with five fingers.

Consequently, AI-generated hands often look misshapen[7], have additional or fewer fingers, or have hands partially covered by objects such as sleeves or purses.

We see a similar issue when it comes to quantities. AI models lack a clear understanding of quantities, such as the abstract concept of “four”.

As such, an image generator may respond to a prompt for “four apples” by drawing on learning from myriad images featuring many quantities of apples – and return an output with the incorrect amount.

In other words, the huge diversity of associations within the training data impacts the accuracy of quantities in outputs.

Three AI-generated images produced in response to the prompt ‘5 soda cans on a table’. Shutterstock AI

Will AI ever be able to write and count?

It’s important to remember text-to-image and text-to-video conversion is a relatively new concept in AI. Current generative platforms are “low-resolution” versions of what we can expect in the future.

With advancements being made[8] in training processes and AI technology, future AI image generators will likely be much more capable of producing accurate visualisations.

It’s also worth noting most publicly accessible AI platforms don’t offer the highest level of capability. Generating accurate text and quantities demands highly optimised and tailored networks, so paid subscriptions to more advanced platforms will likely deliver better results.

References

  1. ^ a matter of seconds (www.zdnet.com)
  2. ^ trained on (www.assemblyai.com)
  3. ^ Both humans and AI hallucinate — but not in the same way (theconversation.com)
  4. ^ Imagine AI (www.imagine.art)
  5. ^ more training data (decrypt.co)
  6. ^ such as hands (www.buzzfeednews.com)
  7. ^ often look misshapen (twitter.com)
  8. ^ advancements being made (theconversation.com)

Read more https://theconversation.com/if-ai-image-generators-are-so-smart-why-do-they-struggle-to-write-and-count-208485

Times Magazine

Building a Strong Online Presence with Katoomba Web Design

Katoomba web design is more than just creating a website that looks good—it’s about building an online presence that reflects your brand, engages your audience, and drives results. For local businesses in the Blue Mountains, a well-designed website a...

September Sunset Polo

International Polo Tour To Bridge Historic Sport, Life-Changing Philanthropy, and Breath-Taking Beauty On Saturday, September 6th, history will be made as the International Polo Tour (IPT), a sports leader headquartered here in South Florida...

5 Ways Microsoft Fabric Simplifies Your Data Analytics Workflow

In today's data-driven world, businesses are constantly seeking ways to streamline their data analytics processes. The sheer volume and complexity of data can be overwhelming, often leading to bottlenecks and inefficiencies. Enter the innovative da...

7 Questions to Ask Before You Sign IT Support Companies in Sydney

Choosing an IT partner can feel like buying an insurance policy you hope you never need. The right choice keeps your team productive, your data safe, and your budget predictable. The wrong choice shows up as slow tickets, surprise bills, and risky sh...

Choosing the Right Legal Aid Lawyer in Sutherland Shire: Key Considerations

Legal aid services play an essential role in ensuring access to justice for all. For people in the Sutherland Shire who may not have the financial means to pay for private legal assistance, legal aid ensures that everyone has access to representa...

Watercolor vs. Oil vs. Digital: Which Medium Fits Your Pet's Personality?

When it comes to immortalizing your pet’s unique personality in art, choosing the right medium is essential. Each artistic medium, whether watercolor, oil, or digital, has distinct qualities that can bring out the spirit of your furry friend in dif...

The Times Features

How much money do you need to be happy? Here’s what the research says

Over the next decade, Elon Musk could become the world’s first trillionaire[1]. The Tesla board recently proposed a US$1 trillion (A$1.5 trillion) compensation plan, if Musk ca...

NSW has a new fashion sector strategy – but a sustainable industry needs a federally legislated response

The New South Wales government recently announced the launch of the NSW Fashion Sector Strategy, 2025–28[1]. The strategy, developed in partnership with the Australian Fashion ...

From Garden to Gift: Why Roses Make the Perfect Present

Think back to the last time you gave or received flowers. Chances are, roses were part of the bunch, or maybe they were the whole bunch.   Roses tend to leave an impression. Even ...

Do I have insomnia? 5 reasons why you might not

Even a single night of sleep trouble can feel distressing and lonely. You toss and turn, stare at the ceiling, and wonder how you’ll cope tomorrow. No wonder many people star...

Wedding Photography Trends You Need to Know (Before You Regret Your Album)

Your wedding album should be a timeless keepsake, not something you cringe at years later. Trends may come and go, but choosing the right wedding photography approach ensures your ...

Can you say no to your doctor using an AI scribe?

Doctors’ offices were once private. But increasingly, artificial intelligence (AI) scribes (also known as digital scribes) are listening in. These tools can record and trans...