
Thought Leadership

July 22nd, 2025

The state of AI image generation: Learning from over a billion images

After generating over a billion images and hitting 5 million per day on peak days, Gamma's AI team has learned a lot about which AI models actually deliver—and where they still fall short.

Some top-level findings:

Overall image quality has reached professional standards.
Across the board, we see significant advancements in image-prompt alignment, photorealism, and text rendering quality, setting a new standard that makes previous generations look amateur by comparison.
Text rendering has improved immensely.
Models that used to produce garbled letters can now often create pixel-perfect signage, posters, and branded materials.
Unsolicited text generation has become a persistent issue.
Models that excel at rendering text when asked to now sometimes add unwanted text elements, even when a prompt doesn't mention any text at all.

Over the past year at Gamma, our users have generated more than one billion images through our platform, with peak days hitting 5 million generations. That's not just a milestone—it's a massive dataset that reveals exactly how well AI image models perform in the real world.

Gamma is uniquely positioned to compare the models, because we give users the freedom to choose between models (or we set the right default for them) based on their specific image-generation needs.

And here's what that data shows: the AI image generation space is moving fast. Models that struggled with basic text rendering just months ago are now creating professional-grade marketing materials. But with rapid progress come predictable challenges, and some old problems persist despite the advances.

Model-by-model comparison

Based on millions of user interactions and feedback reports, here's how the leading models stack up across the categories our users give feedback on most often. The models are Imagen 4 Ultra, Flux Kontext Max, GPT Image, Recraft v3, and Ideogram 3.0.

(Note: We used the most premium versions of each model available within the Gamma product. We generated 3 images per model for each prompt and chose the best one.)

Prompt adherence

For the first comparison, we measured how well the models interpret a complex prompt. We used the following:

A diverse team of four young professionals brainstorms around a colorful project board in a converted loft space. The team leader, wearing rolled sleeves and bright yellow sneakers, points to innovative sketches while a colleague adds sticky notes to their shared vision. Sunlight streams through industrial windows, illuminating their workspace filled with prototype models, creative inspiration boards, and a few quirky desk plants. Coffee cups and half-eaten snacks hint at their energetic session. The atmosphere balances professionalism with creative energy as they collaborate on their breakthrough concept. Photorealistic, warm lighting, natural expressions.

A comparison of five different image models' ability to follow a complex prompt.

In this case, we had a clear winner.

Imagen 4 Ultra nails nearly every detail in the prompt: the loft setting, the sunlight and warm lighting, the prototype models, and the "professionalism with creative energy" atmosphere.

It also delivered the most convincing photorealism of the group; the other models produced noticeably distorted or inconsistent faces.

Text rendering

Our next test took on a longstanding challenge for image models: the accurate rendering of complex text.

We fed the models the following prompt:

A bright blue infographic with white text titled "5 STEPS FOR CALCULATING COAST FIRE" in large bold letters at the top. Layout shows 5 numbered steps in a grid with step 5 spanning the bottom. Each step has a white circle with blue number (1-5) and accompanying icon. Step 1: money bag icon, text "ADD UP YOUR RETIREMENT ACCOUNTS : 401K, 403B, IRA, ETC". Step 2: calculator and chart icon, text "USE A COMPOUND INTEREST CALCULATOR : 5-7% FOR HOWEVER MANY YEARS TO RETIREMENT". Step 3: calculator with dollar sign icon, text "USE THE 4% RULE TO SEE HOW MUCH YOU CAN LIVE ON ANNUALLY : EX. $1,500,000 X 4% = $60,000 PER YEAR". Step 4: beach umbrella and chair icon, text "CALCULATE YOUR COMFORTABLE ANNUAL LIVING EXPENSES : MULTIPLY BY 80%-100%". Step 5: stack of money icon, text "IF YOUR PROJECTED RETIREMENT INCOME EXCEEDS YOUR CURRENT ANNUAL LIVING EXPENSES, YOU'VE ACHIEVED COAST FIRE!".

A comparison of five image models' ability to render complex text.

Here, we called it a tie between Imagen, GPT, and Flux. (The other two models were far behind these front-runners.) All three models correctly interpreted the required format:

Title placement at the top in large, bold text.
Five distinct steps, clearly numbered (1 to 5) using white numbers in blue circles.
Grid-style layout, with Step 5 spanning across the bottom or visually anchored as a concluding step.

These models not only parsed the prompt accurately, but also adhered to the visual hierarchy and organizational clarity demanded by the infographic category.

Icons were used in a consistent way that corresponded to each step, and each infographic struck a visual balance between decorative elements and informative content.

That said, none of these outputs were typo-free, and none were able to include the entirety of the text we provided. (Math formatting was also a challenge for GPT and Flux.) But in all three cases, the majority of the text is legible and aligns closely with the intended message.

Photorealism

Next, we assessed the models' ability to deliver photorealism, which demands extremely fine visual fidelity.

We used the following prompt:

Ultra-realistic close-up photograph of a 35-year-old woman with heterochromia, left eye green, right eye hazel, sitting in a sunlit coffee shop, shot with 85mm lens at f/1.4 aperture. Natural skin texture shows subtle freckles across her nose bridge, individual eyelashes catching golden hour light streaming through rain-streaked window behind her. Exact focus on her eyes where you can see the reflection of the photographer and camera equipment in her pupils.

A comparison of five image models' ability to render photorealistic images.

Again, Imagen 4 Ultra struck us as the clear standout, despite a slightly odd composition and an unnecessary reflection at the top right.

In terms of visual fidelity—heterochromia, glass reflections, freckles, depth of field, and even camera lens effects—Imagen was the only model that delivered all of these elements convincingly.

For example, Imagen was able to accurately depict the contrast between green and hazel eyes, with the left eye distinctly greener. Flux, GPT, and Recraft all defaulted to uniformly colored eyes.

In other comparisons, we saw Imagen excel at group images, where it was able to maintain facial clarity across a group of people.

"Do not" instructions

We also compared the models' ability to follow "do not" instructions, which have historically been a challenge for image models.

We used the following prompt:

Children's book illustration of a vibrant circus with red and white striped big top tent, colorful spinning carousel, juggling balls frozen mid-air, rainbow-striped tightrope stretching between tall poles, and scattered popcorn boxes and cotton candy stands. Bright primary colors, soft rounded shapes, and whimsical details like floating balloons, spinning pinwheels, and a tiny toy train circling the tent. DO NOT include any animals, people, or human figures in the scene.

A comparison of five image models' ability to follow "do not" instructions in a prompt.

In this case, Recraft was the winner.

It adhered most faithfully to our "do not" instructions. For example, it strictly avoided both anthropomorphic and human figures, while GPT included a person juggling and Ideogram showed acrobats in mid-air.

It also caught a subtler trap. Imagen, Flux, and GPT all depicted carousel horses, which violate the "no animals" clause. Recraft skipped the carousel entirely, erring on the side of caution while still capturing the fun.

Diagrams and flowcharts

Finally, we tested the models' ability to render complex diagrams and flowcharts. We used the following prompt:

Clean flowchart diagram showing SaaS product launch process with rectangular boxes connected by arrows, starting from "Market Research" through "MVP Development," "Beta Testing," "Marketing Campaign," "Product Launch," and ending at "Post-Launch Analytics." Use teal and lavender color scheme with clear labels, decision diamonds for key approval gates, and timeline indicators showing 3-6 month phases beneath each major milestone.

A comparison of five image models' ability to render diagrams and flowcharts.

In this case, GPT took the gold.

While each model brought a unique visual interpretation, GPT was the only one to deliver a cohesive, readable, and visually communicative flowchart that honored all key instructions from the prompt.

GPT's flow is clear and uninterrupted, with consistent arrow usage that avoids ambiguity or misdirection (unlike Flux, which includes repeated or looping nodes).

It also balances icons with text cleanly, and the illustrations support the message appropriately.

And despite a couple of typos, GPT's text is fully legible and semantically sound.

Bottom line

AI image generation models aren't just improving; they're specializing. Each model is finding its strengths, which means the right choice depends entirely on what you're trying to accomplish.

In our usage, Imagen 4 Ultra excels at photorealism, GPT shines with structured diagrams, and Recraft follows "do not" instructions best. This specialization is the natural evolution of a maturing technology.

Six months ago, we (and our users) were managing the limitations of AI image generation. Now we're navigating the strengths. That's a much better problem to have.

The models are evolving rapidly, and at Gamma, we see the changes in real time. We'll revisit these findings in the coming weeks, with inevitable changes to the leaderboard.

