The Times Australia

If AI image generators are so smart, why do they struggle to write and count?

  • Written by Seyedali Mirjalili, Professor, Director of Centre for Artificial Intelligence Research and Optimisation, Torrens University Australia
Generative AI tools such as Midjourney, Stable Diffusion and DALL-E 2 have astounded us with their ability to produce remarkable images in a matter of seconds[1].

Despite their achievements, however, there remains a puzzling disparity between what AI image generators can produce and what we can. For instance, these tools often won’t deliver satisfactory results for seemingly simple tasks such as counting objects and producing accurate text.

If generative AI has reached such unprecedented heights in creative expression, why does it struggle with tasks even a primary school student could complete?

Exploring the underlying reasons helps shed light on the complex numerical nature of AI, and the nuance of its capabilities.

AI’s limitations with writing

Humans can easily recognise text symbols (such as letters, numbers and characters) written in a wide range of fonts and handwriting. We can also produce text in different contexts, and understand how context can change meaning.

Current AI image generators lack this inherent understanding. They have no true comprehension of what any text symbols mean. These generators are built on artificial neural networks trained on[2] massive amounts of image data, from which they “learn” associations and make predictions.

Combinations of shapes in the training images are associated with various entities. For example, two inward-facing lines that meet might represent the tip of a pencil, or the roof of a house.

But when it comes to text and quantities, the associations must be incredibly accurate, since even minor imperfections are noticeable. Our brains can overlook slight deviations in a pencil’s tip, or a roof – but not as much when it comes to how a word is written, or the number of fingers on a hand.

Read more: Both humans and AI hallucinate — but not in the same way[3]

As far as text-to-image models are concerned, text symbols are just combinations of lines and shapes. Since text comes in so many different styles – and since letters and numbers are used in seemingly endless arrangements – the model often won’t learn how to effectively reproduce text.

AI-generated image produced in response to the prompt ‘KFC logo’. Imagine AI[4]

The main reason for this is insufficient training data. AI image generators require much more training data[5] to accurately represent text and quantities than they do for other tasks.

The tragedy of AI hands

Issues also arise when dealing with smaller objects that require intricate details, such as hands[6].

Two AI-generated images produced in response to the prompt ‘young girl holding up ten fingers, realistic’. Shutterstock AI

In training images, hands are often small, holding objects, or partially obscured by other elements. It becomes challenging for AI to associate the term “hand” with the exact representation of a human hand with five fingers.

Consequently, AI-generated hands often look misshapen[7], have too many or too few fingers, or are partially covered by objects such as sleeves or purses.

We see a similar issue when it comes to quantities. AI models lack a clear understanding of quantities, such as the abstract concept of “four”.

As such, an image generator may respond to a prompt for “four apples” by drawing on what it has learned from myriad images featuring many different quantities of apples, and return an output with the wrong number.

In other words, the huge diversity of associations within the training data impacts the accuracy of quantities in outputs.

Three AI-generated images produced in response to the prompt ‘5 soda cans on a table’. Shutterstock AI

Will AI ever be able to write and count?

It’s important to remember text-to-image and text-to-video conversion is a relatively new concept in AI. Current generative platforms are “low-resolution” versions of what we can expect in the future.

With advancements being made[8] in training processes and AI technology, future AI image generators will likely be much more capable of producing accurate visualisations.

It’s also worth noting most publicly accessible AI platforms don’t offer the highest level of capability. Generating accurate text and quantities demands highly optimised and tailored networks, so paid subscriptions to more advanced platforms will likely deliver better results.

References

  1. ^ a matter of seconds (www.zdnet.com)
  2. ^ trained on (www.assemblyai.com)
  3. ^ Both humans and AI hallucinate — but not in the same way (theconversation.com)
  4. ^ Imagine AI (www.imagine.art)
  5. ^ more training data (decrypt.co)
  6. ^ such as hands (www.buzzfeednews.com)
  7. ^ often look misshapen (twitter.com)
  8. ^ advancements being made (theconversation.com)

Read more https://theconversation.com/if-ai-image-generators-are-so-smart-why-do-they-struggle-to-write-and-count-208485
