World Majority
Do We Have the Ability to Predict?

On October 6, 1973, Egyptian aircraft launched a surprise attack on Israeli troop positions in Sinai. Thus began the Yom Kippur War, in which Egypt, with the support of its allies, posed a serious threat to Israel for the first time in the long history of the conflict, defeating its forces along the Suez Canal. This success was largely due to the element of surprise. Although the Israeli forces, having regrouped, were eventually able to achieve a military victory, their aura of invincibility was undermined.

After the conflict ended, Israel conducted a detailed analysis of how the enemy had managed to pull off such a painful “strategic surprise”. Its consequences led to the fall of the government and a profound crisis of confidence in the country’s leadership. The investigation revealed that Israeli intelligence had numerous indications of an impending attack, but these signals were disregarded, the specialists who reported them were silenced, and reassuring reports were sent to higher-ups.

This failure was committed by the very same agencies that had established themselves for decades as some of the most highly qualified in the world and had promptly warned of a similar threat six years earlier. In 1967, Israel, using the information it received, launched a pre-emptive strike that led to victory in the Six-Day War. In 1973, its failure to correctly assess the situation brought it to the brink of defeat.

In international relations, neither practitioners nor theorists like to make predictions. Both are inclined to cite the diversity of circumstances influencing events, the indeterminacy of incentives, and the influence of the free will of political leaders. The fact remains that foreign policy, like any other social activity, is impossible without assumptions about the future. Likewise, the value of scientific constructs cannot be measured without the ability to draw predictive consequences from them.

The thesis of fundamental unpredictability has been repeatedly advanced in relation to various phenomena. For example, in the 19th century, it was regularly applied to weather forecasts. Although it is hard to resist ridiculing meteorologists after getting drenched in an unexpected downpour, the progress in weather forecasting in recent decades alone has been astonishing – by the turn of the 2020s, five-day forecasts had reached a level of accuracy that even one-day forecasts could not attain in the 1980s. The history of weather forecasting shows that even in systems with high sensitivity to initial conditions (the “butterfly effect”), forecast quality can be improved.

This is largely the result of conscious efforts to improve forecasting methods and, not least, consistent error correction. Until recently, these objectives were virtually absent from international analysis. No attempt was even made to diagnose the scale of the problem. Notions that international relations are unpredictable were based on anecdotal examples, such as Israel’s unpreparedness for the Yom Kippur War.

To rectify this situation, in the autumn of 2022, a group of MGIMO researchers initiated a project to study the accuracy of expert forecasting. The focus was not on assessing long-term trends or general directions of development in the international system, but on the specific events that generate those “strategic surprises” that pose significant risks to states. This choice was partly due to the greater verifiability of such forecasts compared to more vague assumptions about the future.

The study posed two key questions: “What qualities distinguish more successful forecasters from less successful ones?” and “What types of phenomena are better predicted than others?” The primary data source for the analysis was the results of monthly surveys of members of the domestic expert community. In addition, in-depth interviews are conducted with a number of regular respondents to understand the rationale behind their judgments.

A wide range of specialists are invited to participate in the surveys, including experts from leading universities, academic institutions, and think tanks, as well as practitioners. A total of 299 respondents, covering various regions and subject areas, have already answered at least one of the questions. They vary in age, education, and work experience, which increases the representativeness of the study.

All respondents have participated without any financial incentives, and confidentiality is assured: the survey organisers do not disclose the identities of respondents or their individual survey results without their consent.

A typical example of a question asked in the questionnaire is: “What is the probability that Recep Tayyip Erdogan will be president of Turkey on September 1, 2023?” Such specific wording allows for the subsequent verification of responses to determine whether they correspond to reality. This is also achieved by introducing precise timelines for potential events. Respondents are not required to make binary predictions (“will come true/will not come true”), but are expected to estimate the probability as a percentage.

When formulating questions, the research team primarily focuses on topics significant to Russia’s foreign policy. At the same time, they strive to ensure diversity in terms of geography, time horizon, and type of event. Each survey wave (except the very first) includes no more than nine questions due to the limited time respondents are willing to devote to completing the questionnaire. Participants are not required to answer all questions in a particular wave or participate in all waves.

Individual questions are repeated to compare expert assessments across different contexts. By September 2025, 318 questions, 256 of which were unique, were asked in 35 waves. Nearly 24,000 responses were received. Respondents were asked to answer not only questions in their areas of expertise but also questions in which they are not professionally involved. Therefore, based on their self-identification, participants are classified as experts in the former case and as non-experts in the latter.

The primary metric for assessing forecast accuracy in the study is the Brier score. This indicator was chosen for its simplicity, the fact that it can be applied to the bulk of the collected data, and its use in other similar projects. It is calculated as the mean squared deviation of the forecast probability of an event from its actual outcome:

BS = (1/N) Σᵢ (pᵢ − oᵢ)²

where pᵢ is the forecast probability of an event, taking values from 0 to 1; oᵢ is the outcome, taking the value 0 if the event did not occur and 1 if it occurred; and N is the number of forecasts.

The higher the forecast accuracy, the smaller the deviation of the expected probability from the actual outcome, and, accordingly, the closer the Brier score is to zero. Conversely, the worse the estimates, the closer the indicator is to one. However, this metric, like any other, can be an artefact of the nature of the questions asked rather than of the quality of the forecasts – the more complex the questions, the lower the accuracy. In this regard, it is most informative to consider the indicators in a comparative perspective. The Brier score is convenient because it can be calculated for individual respondents, groups of respondents, and other sets of responses selected by various criteria. As of September 2025, the results of 203 questions – whether the predicted events occurred or not – are known. This set provides the basis for a number of preliminary, cautious conclusions within the study, which simultaneously serve as hypotheses for further testing.
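As an illustration, the score is straightforward to compute. The sketch below uses invented forecast and outcome values, not data from the survey:

```python
# Brier score: mean squared deviation of forecast probability from outcome.
# Forecasts are probabilities in [0, 1]; outcomes are 1 (event occurred)
# or 0 (it did not). The numbers below are illustrative only.

def brier_score(forecasts, outcomes):
    """Return the mean squared deviation of forecasts from outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have equal length")
    n = len(forecasts)
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / n

# A confident, correct forecaster scores close to zero ...
print(round(brier_score([0.9, 0.1, 0.8], [1, 0, 1]), 3))  # → 0.02
# ... while a confident, wrong one scores close to one.
print(round(brier_score([0.9, 0.1, 0.8], [0, 1, 0]), 3))  # → 0.753
```

The same function can be applied to any subset of responses – one respondent, one group, or one category of questions – which is what makes the comparative analysis described below possible.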

First and foremost, the accuracy of expert assessments in various subject areas is similar to that of non-experts.

In 112 cases, the former provided, on average, more accurate answers than the latter, while in 91 cases, they were less accurate. The Brier score of experts (around 0.206) is slightly lower – that is, slightly better – than that of non-experts (around 0.212). The difference between these values is not statistically significant (it does not pass standard tests) and may therefore be due to chance.
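The study does not specify which tests were used, but one simple way to check whether such a gap could be due to chance is a permutation test on per-respondent scores. The sketch below uses hypothetical Brier scores as stand-ins for the survey data:

```python
# A permutation test for the difference in mean Brier score between two
# groups. The per-respondent scores below are hypothetical, not the
# survey's actual data; the method is one possible "standard test".
import random

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided p-value for the observed difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of "expert"/"non-expert"
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            hits += 1
    return hits / n_perm

experts = [0.19, 0.22, 0.20, 0.21, 0.18, 0.23]      # hypothetical scores
non_experts = [0.21, 0.20, 0.23, 0.22, 0.19, 0.24]
print(permutation_p_value(experts, non_experts))    # large p-value: the
# observed gap arises easily under random relabelling, so there is no
# evidence of a real difference
```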

When broken down by time horizon (see Figure 1), it turns out that over short horizons (up to a year), there is virtually no difference in forecast quality. This difference emerges when forecasting over longer periods, but there is significantly less data available, making the results less reliable. Over extremely short horizons (within one quarter), non-experts are even more accurate on average than experts, but in this case, again, the lack of observations may be a factor.

[Figure 1. Brier scores by forecast time horizon]

These results lead to the counterintuitive conclusion that domain expertise does not provide a significant advantage in forecasting. It should be noted that in some cases, experts provide very accurate estimates in their area of expertise. At the same time, some non-experts also provide high-quality forecasts on the same issues, while other experts can make significant mistakes. In other words, forecast accuracy is highly dependent on other personal characteristics of respondents that have yet to be determined.

For example, in November 2024, the question was asked: “What is the likelihood that Syrian President Bashar al-Assad will lose power before October 1, 2025?” Even before this survey was completed, an offensive by anti-government forces began on November 27, resulting in the long-time Syrian leader fleeing the country. His government fell on December 8, meaning the anticipated event occurred well before the stated deadline. Forty-seven respondents answered this question, giving an average estimate of the probability of a change of power in the country of 18.1%. Only one respondent predicted that the chances of the Syrian president being removed were higher than those of his retention. He estimated the probability at 75%. This most insightful respondent turned out to be a regional expert (though not specifically on Syria). Nevertheless, the average estimate of Middle East experts, at 20.4%, differed little from the 15.7% of non-experts. Moreover, second and third place (with a 50% prediction) were shared by non-experts.

Paradoxically, in more than half the cases (109 out of 203), the spread of expert estimates was greater than that of non-experts. This contradicts the intuitive assumption that specialists in a given field, united by a body of shared knowledge, will have a similar set of expectations regarding future developments. This paradox is likely due to the fact that non-experts draw their ideas from information in general news reports, while experts, having more extensive data, assess the significance of various facts differently.

This means that relying on the opinion of an individual specialist increases the chance of receiving not only a more accurate answer than from an average generalist, but also a very inaccurate one. Meanwhile, in the case of non-experts (even if they belong to the broad cohort of international relations specialists), the estimates gravitate towards the average, and the risk of making a significant error is, unexpectedly, lower.

If subject-matter knowledge does not provide a significant advantage in forecasting, then what other respondent characteristics are significant? One of them, based on the current results, is age.

Moreover, in this case, the effect is nonlinear. The highest accuracy is demonstrated by the middle-aged group – using the Rosstat classification – from 35 to 44 years old (BS=0.208). Both younger and older respondents performed worse, although not dramatically so (BS=0.22 for those aged 25 to 34, BS=0.219 for those aged 45 to 59, and BS=0.222 for those over 60).

An important common characteristic shared by most respondents was their tendency to overestimate political inertia and underestimate the potential for change. Consequently, they are much more accurate in predicting non-events – instances where the situation remains unchanged from previous periods (BS=0.176) – than in predicting events that do occur (BS=0.39). It can be concluded that international relations experts often lack the audacity to make bold predictions.
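This kind of asymmetry can be made visible by splitting the Brier score by outcome. The sketch below uses invented (forecast, outcome) pairs to show the pattern the study describes: low probabilities are rewarded when nothing happens but are costly when an event does occur:

```python
# Splitting forecast accuracy by outcome: separate mean Brier scores for
# non-events (o=0) and events (o=1). The records are invented examples,
# not the survey's data.

def brier_by_outcome(records):
    """Return {0: mean score for non-events, 1: mean score for events}."""
    groups = {0: [], 1: []}
    for p, o in records:
        groups[o].append((p - o) ** 2)
    return {o: sum(v) / len(v) for o, v in groups.items() if v}

# (forecast probability, outcome) pairs from a cautious forecaster who
# systematically expects inertia.
records = [(0.2, 0), (0.1, 0), (0.3, 0), (0.2, 1), (0.4, 1)]
scores = brier_by_outcome(records)
print(round(scores[0], 3), round(scores[1], 3))  # → 0.047 0.5
```

In this toy example the forecaster looks excellent on non-events but poor on events, mirroring the gap between BS=0.176 and BS=0.39 reported in the study.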

When classifying event types, it turned out that survey participants handled questions about confrontation significantly better (BS=0.17), while cooperation remained less predictable (BS=0.24). It should be noted that this is not a case of simple alarmism: questions about confrontation include both cases in which the expected escalation occurred and those in which it did not, and the cooperation category likewise includes examples of both events and non-events.

It turns out that the prospects for diplomatic achievements and failures are more difficult to assess than pressure patterns and escalation dynamics. This asymmetry is poorly consistent with the realist notion that friendship in international relations is a direct continuation of enmity, and vice versa. There appears to be a qualitative difference between these two forms of interaction between states.

Finally, somewhat better results were obtained for questions concerning non-Western regions (BS=0.189), while the West remained less predictable (BS=0.219). In this case, the study’s results again contradict the common notion that “the East is a delicate matter”. For Russian specialists, it is still less mysterious than the upheavals in Western Europe and North America.

Thus, the ongoing study, although far from complete, has already yielded a number of non-trivial results. While significant differences are observed between individual forecasters and between questions within the identified categories, some consistent trends are evident. These trends can be used to formulate preliminary recommendations for forecasting practice. Further work involves not only re-examining and refining the identified patterns but also seeking more substantiated explanations for their causes.

It would be naive to expect that we will completely eliminate uncertainty from discussions of international relations. Nevertheless, finding answers to these questions will help improve the accuracy of expert assessments, enable the selection and training of more successful forecasters, and reduce the risk of “strategic surprises” for Russian foreign policy. It will bring us several steps closer to moving assumptions about the future from the realm of prophecy to the realm of valid prediction.


Views expressed are of individual Members and Contributors, rather than the Club's, unless explicitly stated otherwise.