Several prominent artificial intelligence (AI) models are failing to meet key European regulations, particularly in areas such as cybersecurity resilience and discriminatory output, according to data reported by Reuters. The European Union has long debated new AI regulations, which gained traction following the release of OpenAI’s ChatGPT in late 2022.
Background: EU AI Act
The European Union has been working on implementing new regulations for AI usage within its member states. The recent development of OpenAI’s ChatGPT sparked increased concerns and debates about the potential risks associated with AI, leading to a renewed focus on drafting specific rules for ‘general-purpose’ AIs (GPAI).
New Compliance Tool for AI Models
A new tool designed to evaluate the compliance of AI models with the upcoming EU AI Act has been welcomed by EU officials. Created by Swiss startup LatticeFlow AI in collaboration with research institutes ETH Zurich and Bulgaria’s INSAIT, the tool tests models developed by tech giants such as Meta and OpenAI across a wide range of categories.
Categories Tested
The tool assesses various aspects of an AI model’s performance, including:
- Technical robustness and safety
- Bias mitigation and discriminatory output
- Cybersecurity resilience
Each category is given a score between 0 and 1 based on the model’s performance. The higher the score, the more compliant the model is with EU regulations.
Performance Scores and Leaderboard
LatticeFlow published a leaderboard on Wednesday showing that models from Alibaba, Anthropic, OpenAI, Meta, and Mistral all received average scores of 0.75 or higher. However, the tool also revealed critical shortcomings in some models, indicating areas where companies may need to focus additional resources to ensure compliance with the EU’s regulations.
Examples of Compliance Issues
The Large Language Model (LLM) Checker, developed by LatticeFlow, exposed specific issues across several models. For example:
- OpenAI’s GPT-3.5 Turbo scored 0.46 in the discriminatory output category, highlighting challenges around biases related to gender, race, and other factors.
- Alibaba’s Qwen1.5 72B Chat received an even lower score of 0.37 for the same category.
- Meta’s Llama 2 13B Chat scored 0.42 in the prompt hijacking category, which refers to a type of cyberattack that tricks AI models into revealing sensitive information.
Top Performing Models
Among the models tested, Anthropic’s Claude 3 Opus—a Google-backed model—received the highest overall score, 0.89, indicating stronger compliance with the current standards set out by the AI Act.
Enforcement and Future Implications
The EU AI Act is expected to come into full effect over the next two years, and the LLM Checker serves as an early indicator of areas where AI models may fall short of the law. Companies failing to comply with the AI Act could face fines of €35 million ($38 million) or 7% of their global annual revenue.
LatticeFlow’s CEO, Petar Tsankov, stated that while the test results were overall positive, they also highlighted gaps that need to be addressed. Tsankov emphasized that with a stronger focus on compliance optimization, companies could better prepare for the upcoming regulatory requirements.
EU’s Reaction
While the European Commission cannot officially verify external tools, it has been kept informed throughout the development of the LLM Checker and views the tool as a crucial early step in translating the AI Act into actionable technical requirements. A Commission spokesperson stated:
"The Commission welcomes this study and AI model evaluation platform as a first step in translating the EU AI Act into technical requirements."
What This Means for the AI Industry
The introduction of LatticeFlow’s LLM Checker represents a major step forward in the enforcement of the EU AI Act, offering tech companies an early glimpse into where their models might be non-compliant. As the Act begins to take effect, companies will need to prioritize areas like cybersecurity resilience and bias mitigation to avoid hefty fines and meet the new standards.
This tool not only provides developers with a roadmap to improve their models but also signals a shift toward greater transparency and accountability in the AI industry. With the EU setting a global precedent, the findings from the LLM Checker could push companies to invest heavily in ensuring their models meet regulatory requirements, driving further innovation in AI safety and ethical development.
The introduction of this tool represents a significant step forward in promoting compliance with the EU’s regulations. By providing developers with an early insight into potential non-compliance areas, it enables them to take proactive measures to address these issues before they become major concerns.