What Our AI Football Prediction Challenge Teaches Us About Humans, Data and Uncertainty

Football prediction is difficult. That is exactly why it makes such an interesting test case for artificial intelligence, data analysis and AI engineering.

At KnowNow Information, we are using AI football data analysis to run an AI World Cup Prediction Challenge. It is a light-hearted but useful way to explore how different AI models approach the same problem, how humans reason about uncertainty, and how the quality of underlying data affects the answer.

The challenge compares predictions from human participants, large language models, a betting market view, a purpose-built KnowNow model, and a deliberately unpredictable model we call the Chaos Engine.

The aim is not to prove definitively that one model is better than another. The sample size is too small for that, and this is not a formal statistical forecasting exercise. We are not running thousands of Monte Carlo simulations, building a full expected-goals model, or claiming that a single set of score predictions can establish which AI system is “best”.

Instead, the challenge is designed as a practical learning exercise.

It helps us explore how different systems behave when they are given different levels of context, structure and data. It also gives us a visible, entertaining way to compare how human judgement, general-purpose AI models, retrieval-augmented systems, betting markets and randomised statistical methods respond to uncertainty.

The challenge is also a way to explore one of the most important lessons in AI engineering: the model matters, but so does the data you give it.

Why use football as an AI data challenge?

The AI World Cup Prediction Challenge gives us a way to ask some useful questions:

Do different AI models reach similar conclusions when given the same match context?
How much does structured background data influence a prediction?
Do models with deeper contextual information behave differently from models given only a simple prompt?
How often do AI predictions cluster around the betting market?
Can a deliberately simple randomised model occasionally compete with more sophisticated approaches?
What happens when the outcome is genuinely uncertain?

Football is a good test environment because it combines data, context, randomness and human interpretation. A team can dominate a match and still draw. A weaker team can score first and change the entire shape of the game. Injuries, weather, team selection, tactics, pressure and luck all matter.

That makes it a useful reminder that prediction is not the same as certainty.

A learning exercise, not a definitive benchmark

It is important to be clear about what this challenge is and what it is not.

It is not a scientific benchmark of AI model performance. It does not contain enough matches to make statistically robust claims. It does not control every variable. It does not use a full probabilistic forecasting framework. It does not attempt to estimate the complete distribution of possible outcomes.

That is deliberate.

The purpose is to observe behaviour, not to declare a winner in any definitive technical sense.

We are interested in the variation between models. We want to see whether some models are more conservative, whether some overreact to team strength, whether some favour favourites too heavily, and whether additional context changes the way a model reasons about a match.

The challenge also makes artificial intelligence easier to explain. Rather than talking abstractly about prompts, retrieval, context windows, foundation models and uncertainty, we can show how those ideas appear in a familiar setting: football score predictions.

The role of data in AI football analysis

One of the most interesting parts of the challenge is that not every participant receives the same level of data.

Some AI models receive detailed team dossiers and match-specific context files. Others are asked for a simple score prediction. The KnowNow model uses a Llama 3 8B foundation model with Retrieval-Augmented Generation. The Chaos Engine uses FIFA rankings, expected goals assumptions and randomised score sampling. The betting market uses correct-score prices.

That difference is intentional.

In real-world AI projects, the quality and structure of the underlying data often matter as much as the model itself. A highly capable model with poor context may perform worse than a smaller model with better structured information. A simple model may sometimes perform surprisingly well if the task is uncertain and the outcome space is narrow.

For the AI World Cup, the data layer has two main parts: team dossiers, which describe the underlying strength and profile of each team, and match context files, which describe the specific conditions around each fixture.

This is one of the core lessons we want to explore through the challenge.

Behind the football, there is a practical AI engineering question: how much difference does structured football data make to the way different models reason?

The team dossiers behind the predictions

A key part of the challenge is the use of team dossiers.

For each team, I have been building a structured dossier that can be supplied to the AI models before they make their predictions. These dossiers are designed to give the models more than just a team name and a fixture list. They include background information on the squad, likely tactical approach, key players, recent form, experience, player quality, squad value, strengths, weaknesses and overall tournament outlook.

The aim is to give the models structured context in a form they can reason from.

For example, the Haiti dossier looks at Haiti’s qualification story, tactical profile, squad value, likely playing style, key attacking players, limitations, and expected tournament performance. It also includes structured player-level information, so that the models can assess not only the team as a whole but also the likely quality and experience of the starting XI.

See the Haiti dossier on GitHub.

Haiti is a useful example because it shows both the value and the limits of the data. I have built up my own football data and notes over the last 20 years or so, and that experience informs the way I think about teams, players and tournament prediction. However, I do not have the same depth of personal historical data on every nation. Haiti is not a team where I have decades of detailed private notes, so the dossier relies much more heavily on structured public information, recent tournament data and the framework I have created for comparing teams consistently.

That is important because it reflects a real-world AI problem.

AI systems do not work in a vacuum. Their outputs are shaped by the quality, relevance and completeness of the information available to them. Where the underlying data is stronger, the model has more to work with. Where the underlying data is thinner, the model may still produce a plausible answer, but that answer should be treated with more caution.

This is one of the reasons the challenge is interesting. It is not only comparing the models themselves. It is also testing how different models respond to different levels of underlying data.

Match context: the second layer of information

Alongside the team dossiers, I also create match context files for individual fixtures.

The team dossiers provide the background view of each nation. The match context files provide the more immediate, fixture-specific layer of information. They are typically created the night before each game, by which point we usually have a clearer idea of the likely weather conditions, the refereeing team, team availability and the practical circumstances around the match.

These files are designed to capture the factors that are specific to a particular fixture rather than the general strength of a team. For example, the Mexico v South Africa match context file included information about the venue, kick-off time, local conditions, referee, travel logistics, tactical setup, tournament situation and data confidence.

That matters because football matches are not played in a vacuum.

A team’s overall quality is important, but so is the environment in which the match is played. The Mexico v South Africa file, for example, highlighted that the game was being played at Estadio Azteca in Mexico City, at around 2,240 metres above sea level. It also noted that Mexico were the host nation, that South Africa were based in Pachuca at a similar altitude, and that both teams would be preparing in high-altitude conditions.

Those details may or may not change the final prediction, but they give the models more relevant information to consider.

The match context files typically include:

Match metadata, including date, kick-off time, venue and referee
Venue information, including altitude, pitch type and whether the host nation is involved
Weather conditions based on the latest available forecast
Referee information and officiating team details, where available
Team logistics, including base camp location, travel distance and time-zone adjustment
Team situation, including injuries, suspensions and availability where known
Tactical information
Tournament context
Data confidence and outstanding issues

The Mexico v South Africa example also included an embedding summary. This is a concise narrative version of the match context, written so that an AI system can retrieve and reason from the key information more easily.
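As a rough illustration of how a match context file and its embedding summary might fit together, here is a minimal Python sketch. The class name, field names and summary wording are assumptions made for this example, not the format of the actual files. The only values filled in are those mentioned above for Mexico v South Africa; everything else is left empty rather than guessed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MatchContext:
    """Hypothetical structure for one fixture's context file."""
    home_team: str
    away_team: str
    venue: str
    altitude_m: Optional[int] = None
    host_nation_involved: bool = False
    kickoff: Optional[str] = None
    referee: Optional[str] = None
    weather: Optional[str] = None
    base_camps: dict = field(default_factory=dict)  # team name -> base camp location

    def embedding_summary(self) -> str:
        """Render the known facts as short narrative text for retrieval."""
        parts = [f"{self.home_team} v {self.away_team} at {self.venue}."]
        if self.altitude_m is not None:
            parts.append(f"The venue is about {self.altitude_m} metres above sea level.")
        if self.host_nation_involved:
            parts.append("The host nation is involved.")
        for team, camp in self.base_camps.items():
            parts.append(f"{team} are based in {camp}.")
        # Say what is still unknown so the model can treat the file with caution.
        missing = [name for name in ("kickoff", "referee", "weather")
                   if getattr(self, name) is None]
        if missing:
            parts.append("Not yet confirmed: " + ", ".join(missing) + ".")
        return " ".join(parts)


ctx = MatchContext(
    home_team="Mexico",
    away_team="South Africa",
    venue="Estadio Azteca",
    altitude_m=2240,
    host_nation_involved=True,
    base_camps={"South Africa": "Pachuca"},
)
print(ctx.embedding_summary())
```

Listing the unconfirmed fields in the summary is one simple way to carry the "data confidence and outstanding issues" idea into the text the model actually reads.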

This is an important part of the experiment. Some models are being asked to make predictions with access to both team dossiers and match-specific context. Others are being tested with less information. That helps us see whether richer context changes the predictions, whether models make use of the additional information, and whether better structured data leads to more consistent reasoning.

The participants and their methodologies

David – Human predictor. Predictions can be made or amended at any point up until kick-off.

Emmanuel – Human predictor. All group-stage predictions were submitted before the tournament began but may be amended up until kick-off.

Grok – AI model from xAI. Grok receives team dossiers and match-specific context through a chat interface before making a prediction.

DeepSeek – AI model that is asked only to predict the final scoreline of the match.

Claude – AI model from Anthropic. Claude uses team dossiers loaded before the tournament and match-specific context supplied before each fixture.

Mistral – AI model from Mistral AI. Mistral receives team dossiers and match-specific context through a chat interface before making a prediction.

Betting Market – The betting market prediction is the correct-score selection with the lowest price (shortest odds) on Bet365, in other words the scoreline the bookmaker rates as most likely. If multiple scorelines share the same lowest price, the corresponding Betfair Exchange market is used to determine the selection.
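The betting market selection rule can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the prices are made-up decimal odds rather than real quotes, and it assumes the tie is broken by taking the lowest price in the second market.

```python
def market_pick(primary_prices, tiebreak_prices):
    """Return the lowest-priced scoreline, using a second market to break ties."""
    lowest = min(primary_prices.values())
    tied = [score for score, price in primary_prices.items() if price == lowest]
    if len(tied) == 1:
        return tied[0]
    # Among tied scorelines, take the lowest price in the tie-break market.
    # Scorelines missing from that market sort last.
    return min(tied, key=lambda score: tiebreak_prices.get(score, float("inf")))


# Illustrative decimal prices only, not real odds from any bookmaker.
primary = {"1-0": 6.5, "2-0": 6.5, "1-1": 7.0}
exchange = {"1-0": 7.2, "2-0": 6.8, "1-1": 7.4}
print(market_pick(primary, exchange))  # 2-0
```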

ChatGPT Instant – Uses ChatGPT in Instant mode. It receives team dossiers and match-specific context through a chat interface before making a prediction.

ChatGPT Pro – Uses ChatGPT in Pro mode. It receives team dossiers and match-specific context through a chat interface before making a prediction.

Chaos Engine – The Chaos Engine uses FIFA rankings as a proxy for relative team strength, converts the ranking gap into expected goals for each team, then samples plausible scorelines from Poisson distributions. A controlled randomisation layer is added to avoid deterministic outputs while keeping results within realistic football score ranges.

The Chaos Engine is deliberately not trying to be the most intelligent predictor. It exists to test whether a simple, randomised statistical approach can sometimes compete with humans, betting markets and more sophisticated AI models.
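The steps described for the Chaos Engine can be sketched as follows. This is a minimal illustration of the general technique, not the actual implementation: the exponential mapping from ranking gap to expected goals, the constants (base, scale, jitter range, goal cap) and all function names are assumptions chosen for the example.

```python
import math
import random


def expected_goals(rank_a, rank_b, base=1.35, scale=0.012, jitter=0.0):
    """Map a FIFA ranking gap to expected goals for each side (assumed mapping)."""
    gap = rank_b - rank_a  # positive when team A has the better (lower) ranking
    shift = scale * gap + jitter
    xg_a = max(0.2, base * math.exp(shift))
    xg_b = max(0.2, base * math.exp(-shift))
    return xg_a, xg_b


def sample_poisson(lam, rng):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def chaos_prediction(rank_a, rank_b, rng, max_goals=5):
    """Sample one scoreline, with a small random nudge and a cap on goals."""
    jitter = rng.uniform(-0.15, 0.15)  # randomisation layer (assumed size)
    xg_a, xg_b = expected_goals(rank_a, rank_b, jitter=jitter)
    goals_a = min(sample_poisson(xg_a, rng), max_goals)
    goals_b = min(sample_poisson(xg_b, rng), max_goals)
    return goals_a, goals_b


rng = random.Random(2026)  # fixed seed so a run can be reproduced
print(chaos_prediction(rank_a=15, rank_b=60, rng=rng))
```

Capping each side at a maximum number of goals is one simple way to keep sampled scorelines within a realistic range; the real Chaos Engine may constrain its outputs differently.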

KnowNow Model – The KnowNow Model is built on a Llama 3 8B foundation model and enhanced using Retrieval-Augmented Generation. Team dossiers are embedded within OpenWebUI’s knowledge base, while match-specific context is added before each fixture.

This gives us a way to test how a smaller foundation model performs when it has access to structured background knowledge.
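To make the retrieval step more concrete, here is a toy Python sketch of the general Retrieval-Augmented Generation pattern: score stored dossier snippets against the fixture, keep the best matches, and place them in the prompt. It is not how OpenWebUI works internally and it does not call Llama 3. A real system would use an embedding model rather than the word-count similarity used here, and the snippets, function names and prompt wording are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(fixture, match_context, chunks, k=2):
    """Combine retrieved background with the match context into one prompt."""
    background = retrieve(fixture + " " + match_context, chunks, k)
    lines = ["Background:"] + [f"- {c}" for c in background]
    lines += ["Match context:", match_context, f"Predict the final score of {fixture}."]
    return "\n".join(lines)


# Illustrative snippets paraphrased from this article, not real dossier text.
dossier_chunks = [
    "Haiti dossier: qualification story, tactical profile and squad value.",
    "Mexico dossier: host nation, playing at Estadio Azteca in Mexico City.",
    "South Africa dossier: based in Pachuca at high altitude.",
]
print(build_prompt("Mexico v South Africa",
                   "Played at Estadio Azteca at around 2,240 metres.",
                   dossier_chunks))
```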

What we are learning so far

Even in the early stages of the challenge, some interesting patterns are emerging.

Some models cluster around similar scorelines, especially when there is a clear favourite. Others are more willing to predict draws or upsets. The betting market tends to be more conservative. The Chaos Engine occasionally produces surprising outcomes, but because it is constrained to realistic football score ranges, its predictions are not absurd.

That is part of the point.

In many real-world uses of AI, the value is not simply in producing a single answer. It is in understanding how the answer was reached, what data shaped it, what uncertainty remains, and how different systems behave when faced with the same decision.

The AI World Cup gives us a simple and engaging way to explore those questions.

Why this matters beyond football

Although this challenge is built around football predictions, the underlying lessons apply much more widely.

Businesses using AI need to understand that different models can produce different outputs, even when asked the same question. They also need to understand that better data, clearer context and more structured retrieval can significantly affect performance.

A model’s answer is shaped by the information available to it, the way the prompt is written, the design of the surrounding system, and the level of uncertainty in the task itself.

That is why we see this challenge as more than a bit of football fun. It is also a practical demonstration of how AI systems behave in the real world.

For us, it connects directly to the work we do in data strategy, AI engineering and structured knowledge systems. The same principles apply whether the task is predicting a football score, automating a business process, supporting a decision, or helping an organisation make better use of its data.

Follow the challenge

We will continue sharing the results, league table and daily predictions as the tournament progresses.

The aim is to keep learning, keep testing, and keep showing how different AI systems respond when asked to make decisions under uncertainty.

Football may be unpredictable, but that makes it a very useful classroom for understanding AI.

Want to explore what better data and AI engineering could do for your organisation?

At KnowNow Information, we help organisations turn complex data, knowledge and operational challenges into practical AI-enabled solutions.

The AI World Cup is a light-hearted example, but the lesson is serious: better data, clearer context and well-designed AI systems can make a real difference to how decisions are supported.

Whether you are exploring AI automation, improving the way your organisation manages data, building a knowledge base, or developing a new digital service, we would be delighted to discuss how we can help.

Use the Get in Touch button below to start the conversation.

Author: KnowNow Information

Helping organisations navigate AI, data management, and ethical technology, delivering systems that prioritise trust, transparency, and human-centred design.

Get in touch