Whenever a new AI tool comes out, thousands of users rush to test it, trip it up, and figure out what it’s capable of. This week, it’s Google’s turn, as people test out the various tiers and features of Gemini, its recently rebranded and upgraded group of AI products. Reviewers have been impressed, comparing Google’s most advanced model to OpenAI’s. It can ingest and accurately answer questions about textbooks; it can play along with games that have hundreds of pages of rules; it seems pretty good at analyzing code; across a variety of tasks, however, it’s still simultaneously cautious and error-prone.
It’s also, according to one very vocal group of users, complicit in a conspiracy against white people:
New game: Try to get Google Gemini to make an image of a Caucasian male. I have not been successful so far. pic.twitter.com/1LAzZM2pXF
— Frank J. Fleming (@IMAO_) February 21, 2024
The absolute state of Google's Gemini AI. It's wokeness knows no bounds. pic.twitter.com/dSo90P8kd3
— m o d e r n i t y (@ModernityNews) February 21, 2024
This might sound sort of familiar. Last year, OpenAI’s DALL-E image generator was criticized on similar grounds for its insistent and occasionally absurd attempts to diversify images generated from prompts. And if you take Elon Musk’s word for it (be careful with that), ChatGPT’s “wokeness” was a factor behind the creation of Grok. In Gemini, critics have been given some pretty rich material: a suite of AI tools from one of the largest tech companies in the world that is making obvious racial interventions in user requests without explaining how or why. Behold, the racially diverse Nazi army.
In 2023, researchers described one step that DALL-E 3 was taking behind the scenes:
Dalle-3 passes your prompt to a language model (likely GPT-4) to revise and enhance prompts. Even if your prompt is well structured and thought out, it will always rephrase it. This is so OpenAI can strictly enforce their content guidelines to avoid any unsavory content generation … What’s really interesting is when you prompt it with something highly general, i.e., “people strolling in the park.” This almost always triggers it to assign the people specific genders and races within the prompt.
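What follows is a minimal sketch of what such a rewriting layer could look like in principle. It is not OpenAI’s or Google’s actual code: the attribute lists, rules, and function names here are hypothetical, and a real pipeline would delegate the rewrite to a language model rather than string matching. But it shows where the intervention happens, between the prompt a user types and the prompt the image model actually receives.

```python
# Illustrative sketch only: hypothetical rules and names, not any company's
# real pipeline. A production system would use an LLM, not string matching.
import random

# Hypothetical attribute pools a rewriting layer might draw from when a
# prompt mentions people without describing them.
DESCENTS = ["South Asian", "East Asian", "Black", "Hispanic", "white"]
GENDERS = ["man", "woman"]


def is_underspecified(prompt: str) -> bool:
    """Crude stand-in for the judgment a real system would delegate to an LLM:
    does the prompt mention people without saying anything about them?"""
    lowered = prompt.lower()
    mentions_people = any(w in lowered for w in ["person", "people", "man", "woman"])
    already_specific = any(d.lower() in lowered for d in DESCENTS)
    return mentions_people and not already_specific


def rewrite_prompt(prompt: str) -> str:
    """Expand a generic prompt into a specific one, assigning a gender and
    ethnicity the user never asked for -- the behavior users observed."""
    if not is_underspecified(prompt):
        return prompt
    subject = f"a {random.choice(DESCENTS)} {random.choice(GENDERS)}"
    return f"{prompt}, prominently featuring {subject}"


def generate_image(prompt: str) -> None:
    """Placeholder for the actual image-model call."""
    print(f"[image model receives] {prompt}")


generate_image(rewrite_prompt("people strolling in the park"))
# Possible output:
# [image model receives] people strolling in the park, prominently featuring a Hispanic woman
# The user only ever sees the image; the rewritten prompt stays hidden.
```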
It’s clear that Google is doing something similar with Gemini, and with inconsistent rules: As some users have found, the image generator might render “a Roman legion” as racially diverse while depicting “Zulu warriors” as Black (in both outputs, it should be said, the entire rendered scenes are cartoonish, anachronistic, and otherwise historically weird). As one of Gemini’s product leads writes, Google will continue to calibrate results where the company believes it makes sense:
We are aware that Gemini is offering inaccuracies in some historical image generation depictions, and we are working to fix this immediately. As part of our AI principles https://t.co/BK786xbkey, we design our image generation capabilities to reflect our global user base, and we…
— Jack Krawczyk (@JackK) February 21, 2024
What we’re seeing here is the collapse of a long and interesting discussion about AI bias into a tense, obfuscating exchange between anti-woke culture warriors and a hypercautious tech giant with a large comms team. Left mostly unspoken is the underlying problem companies like Google and OpenAI are trying to solve — or, rather, paper over. As the Washington Post reported last year, image generators trained on billions of pieces of scraped public and semi-public data tend to reproduce some fairly predictable biases:
The Post was able to generate tropes about race, class, gender, wealth, intelligence, religion and other cultures by requesting depictions of routine activities, common personality traits or the name of another country. In many instances, the racial disparities depicted in these images are more extreme than in the real world.
For example, in 2020, 63 percent of food stamp recipients were White and 27 percent were Black, according to the latest data from the Census Bureau’s Survey of Income and Program Participation. Yet, when we prompted the technology to generate a photo of a person receiving social services, it generated only non-White and primarily darker-skinned people. Results for a “productive person,” meanwhile, were uniformly male, majority White, and dressed in suits for corporate jobs.
This sort of issue is a major problem for all LLM-based tools and one that image generators exemplify in a visceral way: They’re basically interfaces for summoning stereotypes to create approximate visual responses to text prompts. Midjourney, an early AI image generator, was initially trained on a great deal of digital art (including fan art and hobbyist illustrations) scraped from the web, which meant that generic requests would produce extremely specific results: If you asked it for an image of a “woman” without further description, it would tend to produce a pouting, slightly sexualized, fairly consistent portrait of a white person. Similarly, requests to portray people who weren’t as widely represented in the training data would often result in wildly stereotypical and cartoonish outputs. As tools for producing images of or about the world, in other words, these generators were obviously deficient in a bunch of different ways that were hard to solve and awkward to acknowledge in full, though OpenAI and Google have occasionally tried. To attempt to mitigate this in the meantime — and to avoid bad PR — Google decided to nudge its models around at the prompt level. So you can sort of see how we got here, then:
Joe Biden's America... pic.twitter.com/7CLTCafNwM
— House Republicans (@HouseGOP) February 22, 2024
Google really is concealing its model’s biases — and, by extension, AI companies’ lack of candor about how their models work and the data on which they’re trained — with clumsy, automated attempts at representation. This gives anti-woke crusaders a valuable chance to go on the offensive, casting Google as a censor of its own model, and said model as a repository of forbidden truths about the world rather than a big lumpy pile of visual stereotypes gleaned from whatever JPEG files Google could scrape together without getting sued.
The best defense the AI firms have — our products aren’t as good as we’ve implied, they reproduce and exaggerate problems that exist in the real world, they might be irresolvably strange as a concept, and no matter how we configure them, we’re going to be making unilateral and arbitrary decisions about how they should represent the world — is one that they can’t really articulate, not that it would matter much if they could. Image generators are profoundly strange pieces of software that synthesize averaged-out content from troves of existing media at the behest of users who want and expect countless different things. They’re marketed as software that can produce photos and illustrations — as both documentary and creative tools — when, really, they’re doing something less than that. That leaves their creators in a fitting predicament: In rushing general-purpose tools to market, AI firms have inadvertently generated and taken ownership of a heightened, fuzzy, and somehow dumber copy of corporate America’s fraught and disingenuous racial politics, for the price of billions of dollars, in service of a business plan to be determined, at the expense of pretty much everyone who uses the internet. They’re practically asking for it.