The Times Australia
The Times World News


Evolution not revolution: why GPT-4 is notable, but not groundbreaking

  • Written by Marcel Scharth, Lecturer in Business Analytics, University of Sydney

OpenAI, the artificial intelligence (AI) research company behind ChatGPT and the DALL-E 2 art generator, has unveiled the highly anticipated GPT-4 model. Excitingly, the company also made it immediately available[1] to the public through a paid service.

GPT-4 is a large language model (LLM), a neural network trained on massive amounts of data to understand and generate text. It’s the successor to GPT-3.5, the model behind ChatGPT.

The GPT-4 model introduces a range of enhancements over its predecessors. These include more creativity, more advanced reasoning, stronger performance across multiple languages, the ability to accept visual input, and the capacity to handle significantly more text.

More powerful than the wildly popular ChatGPT, GPT-4 is bound to inspire an in-depth exploration of its capabilities and further accelerate the adoption of generative AI.

Improved capabilities

Among the many results[2] highlighted by OpenAI, what immediately stands out is GPT-4’s performance on a range of standardised tests. For example, GPT-4 scores in the top 10% of test takers on a simulated US bar exam, whereas GPT-3.5 scores in the bottom 10%.

This table from the OpenAI technical report shows the performance of the model on a range of simulated standardised tests. GPT-4 often performs in the top 20% range. (Source: OpenAI)

GPT-4 also outperforms GPT-3.5 on a range of writing, reasoning and coding tasks. Published examples[3] illustrate how GPT-4 displays more reliable commonsense reasoning than GPT-3.5.

An AI model that sees the world

Another significant development is that GPT-4 is multimodal, unlike previous GPT models. This means it accepts both text and image inputs.

Samples provided by OpenAI reveal GPT-4 is capable of interpreting images, explaining visual humour and providing reasoning based on visual inputs. Such skills are beyond the scope of previous models.

GPT-4 can explain the meaning behind funny memes. (Source: OpenAI)

This ability to “see” could give GPT-4 a more comprehensive picture of how the world works, just as humans acquire enhanced knowledge through observation. This is thought to be an important ingredient for developing sophisticated AI that could bridge the gap between current models and human-level intelligence.

However, GPT-4 isn’t the first language model with these capabilities. A few weeks ago, Microsoft released Kosmos-1[4], a language model that accepts visual inputs the same way GPT-4 does. Google also recently expanded its PaLM[5] language model to take in image data and sensor data collected from robots. Multimodality is a growing trend in AI research.

Longer texts

GPT-4 can take in and generate up to 25,000 words of text, much more than ChatGPT’s limit of about 3,000 words. It can handle more complex and detailed prompts and generate more extensive pieces of writing. This allows for richer storytelling, more in-depth analysis, summaries of long pieces of text and deeper conversational interactions.

In one test, I gave the new ChatGPT (which uses GPT-4) the entire Wikipedia article about artificial intelligence and asked it a specific question, which it answered accurately.

GPT-4 answers a question relating to a Wikipedia article on artificial intelligence. (Source: author provided)

Limitations

Even though the GPT-4 technical report[6] controversially provides no details about how the model was developed, all signs indicate it’s essentially a scaled-up version of GPT-3.5 with safety improvements. In other words, it’s not a new paradigm in AI research.

OpenAI has itself said GPT-4 is subject to the same limitations[7] as previous language models, such as being prone to reasoning errors and biases, and making up false information. That said, OpenAI’s results on GPT-4 suggest it’s at least more reliable than previous GPT models.

OpenAI used human feedback to fine-tune GPT-4 to produce more helpful and less problematic outputs. GPT-4 is much better at declining inappropriate requests and avoiding harmful content than the initial ChatGPT release.

Its arrival will continue a crucial debate among critics[8]: whether alternative approaches are required to fundamentally solve issues of truthfulness and reliability, or whether[9] throwing more data and resources at language models will eventually do the job.

One could argue GPT-4 represents only an incremental improvement over its predecessors in many practical scenarios. Human judges preferred GPT-4 outputs over those of the most advanced variant of GPT-3.5 only about 61% of the time. GPT-4 also shows no improvement over GPT-3.5 in some tests, including English language and art history exams.

Bing AI

Soon after GPT-4’s launch, Microsoft revealed[10] its highly controversial Bing chatbot had been running on GPT-4 all along. The announcement confirmed speculation[11] by commentators who noticed it was more powerful[12] than ChatGPT. This means Bing provides an alternative way[13] to leverage GPT-4, since it’s a search engine rather than just a chatbot.

Read more: Gaslighting, love bombing and narcissism: why is Microsoft's Bing AI so unhinged?[14]

However, as anyone following AI news knows, Bing started to go a bit crazy. But I don’t think the new ChatGPT will follow suit, since it seems to have been heavily fine-tuned using human feedback. In its technical report, OpenAI shows how GPT-4 can indeed go completely off the rails without this human feedback training.
Commercial applications

One notable aspect of GPT-4’s release has been that, in addition to Bing, it’s already being used by companies and organisations such as Duolingo[15], Khan Academy[16], Morgan Stanley[17], Stripe[18] and the Icelandic government[19] to build new services and tools. Its commercial deployment will further heat up competition between major AI labs, and fuel investors’ appetite[20] for generative technologies.

References

[1] immediately available (help.openai.com)
[2] results (openai.com)
[3] examples (cs.nyu.edu)
[4] Kosmos-1 (dailynous.com)
[5] PaLM (ai.googleblog.com)
[6] technical report (cdn.openai.com)
[7] same limitations (www.theguardian.com)
[8] crucial debate among critics (garymarcus.substack.com)
[9] whether (lastweekin.ai)
[10] revealed (blogs.bing.com)
[11] confirmed speculation (www.nytimes.com)
[12] more powerful (oneusefulthing.substack.com)
[13] alternative way (oneusefulthing.substack.com)
[14] Gaslighting, love bombing and narcissism: why is Microsoft's Bing AI so unhinged? (theconversation.com)
[15] Duolingo (blog.duolingo.com)
[16] Khan Academy (blog.khanacademy.org)
[17] Morgan Stanley (openai.com)
[18] Stripe (openai.com)
[19] Icelandic government (openai.com)
[20] investors’ appetite (www.economist.com)

Read more https://theconversation.com/evolution-not-revolution-why-gpt-4-is-notable-but-not-groundbreaking-201858
