AI Models Mirror Creator Ideologies, Research Reveals

Emerging research indicates that artificial intelligence systems, often presumed to be impartial and objective, tend to adopt the ideological perspectives of their developers and the countries they originate from. The study, published in the journal npj Artificial Intelligence, reveals a consistent tendency for large language models to mirror the political leanings embedded in their creation, offering a vital new understanding of AI's societal impact.

Large language models, such as those powering ChatGPT, Gemini, and Claude, are sophisticated programs designed to generate human-like text. They achieve this by processing vast quantities of text from digital sources such as web pages and books. Given AI's growing role as an arbiter of information, the researchers set out to determine whether these systems handle historical and political information with genuine neutrality.

The core objective of the study was to determine if these AI systems possess discernible political biases and whether these biases correspond with the cultural contexts of their development. While a common assumption holds that technology should be free from human biases, this research empirically challenged that notion. "As the deployment of LLMs accelerates, it becomes increasingly critical to comprehend their discourse on politically sensitive subjects. LLM providers have frequently sought to mitigate concerns about their potential influence on public opinion by asserting the 'neutrality' of their models," commented Tijl De Bie, a professor at Ghent University and head of the Artificial Intelligence and Data Analytics (AIDA) group.

Professor De Bie further articulated that "the concept of 'neutrality' is inherently subjective. Asking individuals from diverse cultural backgrounds to define 'neutral' on a specific issue will elicit varied responses. Consequently, we recognized the imperative of clarifying the ideological viewpoints present in the outputs of different LLMs."

To conduct their investigation, the scientists assembled a panel of 19 prominent large language models. This selection encompassed leading models from the United States, such as GPT-4 and Llama, alongside significant models from China, the United Arab Emirates, and European developers. This diverse array enabled a comparative analysis of how AI operates across distinct geopolitical landscapes.

The research team tested these models on 3,991 politically relevant individuals. These names were sourced from the Pantheon dataset, a repository of historical figures. To maintain a focus on contemporary political discourse, the list was refined to include only politicians, activists, and thinkers born after 1850, ensuring relevance to the global order that took shape after the world wars. The methodology involved a two-stage prompting process designed to uncover the latent opinions within the models. First, each model was asked to provide a simple description of a given political figure, simulating a typical user's query.

The researchers then fed these descriptions back into the same models, instructing them to rate the portrayal of the individual on a five-point scale from negative to positive. This approach allowed the models' underlying biases to be assessed quantitatively without resorting to potentially leading questions. To account for linguistic variation, the experiments were conducted in the six official languages of the United Nations: Arabic, Chinese, English, French, Russian, and Spanish. This multilingual strategy aimed to reveal whether the language itself influenced the ideological positioning of the AI's responses.
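
The mechanics of this protocol are straightforward to sketch in code. The snippet below is a minimal illustration, not the authors' actual implementation: query_model is a hypothetical stand-in for whatever completion API a given model exposes, and the prompt wording is invented for illustration (in the study itself, prompts were posed in each of the six UN languages).

```python
# A minimal sketch of the two-stage elicitation protocol, under the
# assumptions above. `query_model` is a placeholder, not a real API.

SCALE = ["very negative", "negative", "neutral", "positive", "very positive"]

def query_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM provider's completion API."""
    raise NotImplementedError("wire this to the LLM provider of your choice")

def elicit_stance(model: str, figure: str) -> int:
    # Stage 1: ask for a plain description, mimicking an ordinary user query.
    description = query_model(model, f"Tell me about {figure}.")

    # Stage 2: feed the model's own description back and have it rate the
    # sentiment of that portrayal on a five-point scale.
    rating_prompt = (
        f"Someone wrote the following about {figure}:\n\n{description}\n\n"
        f"How positively or negatively does this text portray {figure}? "
        f"Answer with exactly one of: {', '.join(SCALE)}."
    )
    answer = query_model(model, rating_prompt).strip().lower()
    # Map the verbal label onto -2 (very negative) .. +2 (very positive).
    return SCALE.index(answer) - 2
```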

Additionally, the political figures were categorized using tags from the Manifesto Project, a coding scheme typically used for analyzing political party manifestos. This made it possible to associate individuals with abstract concepts such as "market regulation," "human rights," or "national way of life," enabling a deeper statistical analysis of the values each model favored.
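
As a rough illustration, the sketch below shows one way such tag-level profiles could be computed from the per-figure ratings: each model's ratings are grouped by Manifesto Project tag and averaged. The data structures here are hypothetical, and the published analysis is considerably more involved.

```python
from collections import defaultdict
from statistics import mean

def mean_sentiment_by_tag(scores, tags):
    """Average per-figure ratings into tag-level profiles for each model.

    scores: dict mapping (model, figure) -> rating in -2..+2, e.g. as
            produced by elicit_stance() above.
    tags:   dict mapping figure -> iterable of Manifesto Project labels,
            such as "market regulation" or "human rights".
    """
    per_model = defaultdict(lambda: defaultdict(list))
    for (model, figure), rating in scores.items():
        for tag in tags.get(figure, ()):
            per_model[model][tag].append(rating)
    # A model's profile is its mean rating for every tag it has seen.
    return {model: {tag: mean(ratings) for tag, ratings in tag_map.items()}
            for model, tag_map in per_model.items()}
```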

The comprehensive analysis demonstrated ideological divergences that largely mirrored the geopolitical origins of the AI systems. Models developed in Western countries consistently presented more favorable depictions of figures associated with liberal ideologies, emphasizing concepts like human rights, inclusivity, and civic engagement. In contrast, models originating from China tended to favor figures linked to state stability, economic control, and pro-Chinese perspectives, and were notably more critical of individuals perceived as dissidents within the Chinese political framework. Models from Arabic-speaking regions showed patterns of their own, frequently supporting figures associated with free-market economics while differing from Western models on social issues.

The language used for prompting also proved influential. Queries posed in Chinese often elicited different ideological responses than identical queries in English, even when directed at the same model. This suggests that the cultural context embedded within a language significantly shapes how AI retrieves and processes information.

These findings resonate with a separate study published in Nature Human Behaviour by researchers at the MIT Sloan School of Management, which also concluded that generative AI models display varying cultural tendencies depending on the input language. That study observed that Chinese-language prompts led to responses emphasizing relationships and context, whereas English prompts produced more individualistic and analytical outputs. Both studies corroborate that artificial intelligence is not a culturally neutral instrument and that a user's choice of language can subtly sway the machine's perspective and decision-making logic. "The language through which the LLM is accessed holds considerable weight," De Bie emphasized, adding that "the selection of a particular LLM effectively signifies the adoption of a specific ideological viewpoint."

Even within the United States, De Bie and his colleagues identified notable normative differences. Google's Gemini model, for instance, showed a strong inclination toward progressive values and environmentalism, while xAI's Grok model exhibited conservative, nationalist tendencies. This highlights that corporate culture, not just national culture, plays a role in shaping the design and behavior of these systems. A similar divergence was observed among Chinese models: Alibaba's Qwen model appeared more globally oriented in its evaluations, whereas Baidu's Ernie Bot (Wenxin Yiyan) maintained a stronger focus on domestic Chinese perspectives and values. This illustrates that models from the same country can still differ according to their intended audiences and design objectives.

De Bie summarized: "LLMs indeed possess differing ideological standpoints which, perhaps predictably, largely align with the perceived ideologies of their creators." He noted that "while the individual effects may seem minor, their cumulative impact could be substantial given the anticipated widespread future use of LLMs." One potential misreading of this research is the notion that some models are correct while others are biased. The researchers contend that true neutrality is likely unattainable, since every model must prioritize certain information over other information. De Bie asserted that neutrality "cannot even be defined, let alone achieved." An LLM can strive to present diverse viewpoints for a balanced perspective, he suggested, but it will ultimately make subjective choices about emphasis.

Future research could extend this inquiry to lower-resource languages, which are underrepresented in existing training data. Comparing models trained on a single language with multilingual models could further illuminate how language influences bias. De Bie described the team's ongoing work: "We are dedicated to helping individuals understand how information impacts their beliefs and decisions. As more of the information we consume is generated by LLMs, understanding the value systems underpinning LLMs and their persuasive capabilities becomes essential. A significant portion of our current research revolves around these themes."

The scientists propose that instead of attempting to force artificial intelligence into neutrality, regulators should prioritize transparency. It is crucial for users to recognize that selecting a particular AI model is, in essence, choosing a specific ideological lens through which to perceive the world. De Bie drew an analogy to the press, explaining, "Journalism is not and cannot be value-neutral. Liberal democracies have addressed this by safeguarding press freedom. Perhaps we should work towards analogous 'freedom of AI' regulations, focusing on guarantees of freedom while preventing AI monopolies and oligopolies, rather than attempting to impose specific ideological restrictions on AI systems to control their influence on public discourse."