Conference Report
A two-day conference in Oct 2024 bringing together 30 leading economists, AI policy experts, and professional forecasters to rapidly evaluate the economic impacts of frontier AI technologies by 2030.
Part 3: Forecasting
Methodology
We conducted forecasting exercises, led by Metaculus, throughout both days of the conference. On the afternoon of Day 1, we introduced attendees to forecasting and led a series of exercises to elicit quantitative forecasts on key economic outcomes. Unlike our worldbuilding exercises, which were based on three distinct, pre-written scenarios, these exercises asked attendees to share their predictions for 2030 based on their own expected trajectories.
The first session began with a demonstration of effective forecasting techniques and an example question to acclimate participants to the format and its key considerations, such as the importance of Resolution Criteria. Next, we had attendees asynchronously complete a series of ten forecasting exercises hosted on the Metaculus platform, with roughly five minutes per question, for a total of 50 minutes. Using Metaculus' prediction software, attendees set their 25th percentile, median, and 75th percentile expectations for each question.
Attendees were asked to answer these questions using their own prior expectations and judgements, not the scenarios described earlier in the conference. See the Metaculus prediction page for the full set of questions: Threshold 2030 Conference Predictions.
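To illustrate how a three-quantile elicitation of this kind can be turned into a full forecast distribution, the sketch below fits a lognormal distribution to a set of elicited quartiles. This is only a minimal illustration under the assumption of a lognormal shape; it is not how Metaculus aggregates or stores forecasts.

```python
# Minimal sketch (not Metaculus' internal model): turn three elicited
# quantiles (25th percentile, median, 75th percentile) into a full forecast
# distribution, assuming a lognormal shape for a positive, right-skewed quantity.
import numpy as np
from scipy.stats import lognorm, norm

def lognormal_from_quartiles(q25, median, q75):
    """Fit a lognormal whose quartiles approximately match the elicited values
    (three quantiles over-determine the two-parameter family, so we average)."""
    mu = np.log(median)                  # lognormal median = exp(mu)
    z75 = norm.ppf(0.75)                 # ~0.6745
    sigma = (np.log(q75 / median) + np.log(median / q25)) / (2 * z75)
    return lognorm(s=sigma, scale=np.exp(mu))

# Example using the humanoid-robot quartiles reported in the results (units sold):
dist = lognormal_from_quartiles(1.61e6, 4.7e6, 11.3e6)
print(dist.ppf([0.05, 0.25, 0.50, 0.75, 0.95]))  # implied 5th-95th percentile spread
```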
At the end of the day, we asked attendees to generate their own questions that they believed would be valuable to gather forecasts on (All Forecasting Questions Proposed by Attendees). These varied widely in focus, from topics such as likely interest rates in 2030, to the probability of human extinction.
In each of these exercises, we presented questions and asked the attendees to predict not only their own expectations, but also the median responses they expected from other attendees. This was inspired by Keynes’ beauty contest, which we elaborate on in Modeling the Predictions of Other Forecasters, and was designed to help identify group biases and differences between attendees’ mental models. This was especially insightful when predictions were significantly divergent, prompting lively debate.
On Day 2, we had attendees participate in forecasting debates. We identified the three topics from Day 1's forecasting exercises with the highest disagreement among experts: labor share of GDP, unemployment levels, and median income. Attendees each chose one of these topics and split into two groups per topic, one group arguing that the key variable (e.g., labor share of GDP) would increase, and the other arguing the opposite.
Groups then identified short-term proxies (immediate, easily measurable indicators that provide an early signal on long-term outcomes) for their respective topics; one example was the number of programmers working in a specific sector. This led into structured, formal debates on the topics with the highest divergence in predictions. Each group sent two attendees to synthesize and present the arguments for the group's position to the conference, with time allocated for initial arguments and rebuttals.
Finally, we had attendees develop more forecasting questions that they believed would be valuable, drawing these questions from the economic models developed in previous sessions (All Forecasting Questions Proposed by Attendees). We then had them identify which forecasting questions had the highest value of information (VOI) – that is, which questions would provide the most clarity on which of the described scenarios was most likely to occur.
Resolution Criteria
Resolution criteria are the conditions under which we can decide whether, and to what extent, a prediction regarding a forecasting question is successful. It’s important to design resolution criteria that are precise while still preserving the core aspects of the outcome that the question aims to capture. Too little precision leads to ambiguity and edge cases, while focusing on the wrong details risks losing sight of the question's intent.
We recommended the following approach to attendees, based on established best practices from the Metaculus forecasting platform, emphasizing five factors:
For example, consider the question Will AI outperform superforecasters before 2030? If someone forecasts “yes”, how do we determine whether they were correct? To render this forecasting question meaningful, we could introduce the following criteria:
An Example of Effective Resolution Criteria
Our forecasting questions were designed with precise resolution criteria, based on specific reliable data sources, and included fallback conditions if primary sources were unavailable. Attendees were encouraged to adopt this system within their own forecasting practices.
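As a hypothetical illustration of what such a specification might look like, the snippet below structures resolution criteria for one of the conference questions as data, with a primary source, a fallback source, and edge-case handling. The specific sources, dates, and clauses are illustrative assumptions, not the criteria actually used.

```python
# Hypothetical sketch of a resolution-criteria specification expressed as data.
# The sources, dates, and edge-case clauses are illustrative assumptions,
# not the criteria actually used at the conference.
resolution_criteria = {
    "question": "What will the total world unemployment rate be in 2030?",
    "metric": "annual global unemployment rate for calendar year 2030, in percent",
    "resolution_date": "2031-06-30",  # leaves time for official data releases
    "primary_source": "World Bank, unemployment total (modeled ILO estimate)",
    "fallback_source": "ILOSTAT global unemployment rate",
    "edge_cases": [
        "If the primary source revises its methodology, use the 2030 figure "
        "as first published.",
        "If neither source publishes a 2030 figure by the resolution date, "
        "the question resolves ambiguous.",
    ],
}
```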
Modeling the Predictions of Other Forecasters
In addition to prompting attendees for their predictions, we also asked them to predict the median responses given by others. This exercise is modeled after Keynes' beauty contest analogy, and helps attendees to:
For example, suppose that we do not know the capital of Pennsylvania. Consider these two questions: "What is the capital of Pennsylvania?" and "What will most people think the capital of Pennsylvania is?"
Now, consider a few distinct responses to this pair of questions:
Answers to this pair of questions can be used to identify respondents who both have factual knowledge and demonstrate a comprehensive world model: that is, individuals who correctly identify Harrisburg as the capital while anticipating that most respondents might mistakenly choose Philadelphia. Correctly modeling the viewpoints of others can signal that one is better able to account for societal biases.
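A minimal sketch of how this elicitation can be examined is shown below, using made-up numbers rather than actual conference data: for each attendee we compare their own forecast, the group median they predicted, and the group median that was actually realized.

```python
# Made-up numbers, not conference data: for each attendee, compare their own
# forecast, the group median they predicted, and the realized group median.
import statistics

# (own forecast, predicted group median) for some numeric question
responses = [(5.0, 7.0), (9.0, 6.5), (4.0, 6.0), (12.0, 8.0), (6.0, 6.0)]

own_forecasts = [own for own, _ in responses]
realized_median = statistics.median(own_forecasts)

for i, (own, predicted) in enumerate(responses, start=1):
    # A large gap between one's own forecast and the median one expects from
    # the group signals a self-consciously contrarian model; a large gap
    # between the predicted and realized medians signals a miscalibrated
    # model of the other forecasters.
    print(f"attendee {i}: own={own:5.1f}  predicted median={predicted:5.1f}  "
          f"realized median={realized_median:5.1f}  "
          f"gap to group={abs(predicted - realized_median):4.1f}")
```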
Results
Forecasting Exercise Summary
In the forecasting exercise, attendees were provided with 10 questions and prompted to use Metaculus' prediction software to set their 25th percentile, median, and 75th percentile expectations for each question. Please see the Metaculus landing page for more background details and commentary on these questions: https://www.metaculus.com/tournament/threshold2030/.
In summary, while participants described transformative societal and economic changes from AI in their worldbuilding exercises, their quantitative forecasts for major global economic variables in 2030, based on their existing priors, remained relatively conservative and aligned with conventional estimates. By 2030, attendees predicted only a slight increase (6%) in the number of physicians, a slight decrease (9%) in the number of programmers, a moderate increase (roughly 1.5 percentage points) in global unemployment, a stable global labor share of GDP, and an increase in global median income (about $2.56 per day) largely commensurate with current trends.
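These headline figures follow directly from the question-level medians reported below; the short check here simply reproduces the arithmetic.

```python
# Question-level medians reported in the results below, used to reproduce
# the headline changes quoted above.
physicians_2030 = 106.3      # % of 2024 level (question 2)
programmers_2030 = 91.5      # % of 2024 level (question 3)
unemployment_2030 = 6.39     # % (question 8); latest World Bank value: 4.9% (2023)
median_income_2030 = 10.76   # USD/day, 2017 PPP (question 10); ~$8.20 in 2024

print(f"physicians:    {physicians_2030 - 100:+.1f}%")             # about +6%
print(f"programmers:   {programmers_2030 - 100:+.1f}%")            # about -9%
print(f"unemployment:  {unemployment_2030 - 4.9:+.2f} pp")         # about +1.5 points
print(f"median income: {median_income_2030 - 8.20:+.2f} USD/day")  # about +$2.56
```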
The global economy's diverse nature and uneven digital infrastructure make it resilient against rapid technological disruption. A close parallel might be the development of the internet: despite being the defining innovation of the Information Era, a 2011 McKinsey report found that it accounted for just 3.4% of GDP in developed economies, well below sectors such as real estate, financial services, healthcare, and construction. These predictions suggest that, even while viewing AI as a revolutionary economic factor, our economists and AI researchers believe in the inertia and resilience of the global economy over short timescales.
1. How many units of humanoid robots will be cumulatively sold globally for performing domestic tasks before 2031?
17 attendees shared a median prediction of 4.7 million units, with a 25th-75th percentile range from 1.61 million to 11.3 million units.
Discussion among participants focused on key constraints that could limit humanoid robot adoption, such as lengthy development and testing timelines, high manufacturing complexity, and significant cost barriers relative to human labor. Some participants emphasized that safety requirements and high costs would likely slow deployment, particularly in markets that currently employ domestic workers. There were also questions raised about the precise definition of "humanoid".
The overall sentiment leaned conservative, with participants citing the slower development curve for robotics relative to language models, high safety thresholds for home use, and cost constraints relative to human labor.
2. What will be the global number of physicians working in 2030 relative to the number working in 2024, in percent?
17 attendees shared a median prediction of 106.3%, with a 25th-75th percentile range from 94.9% to 119.7%.
The discussion emphasized that physicians' protected professional status and high credential requirements would likely maintain workforce stability even as AI capabilities advance. Some participants noted that medical education could become more accessible and deployment in developing countries might increase. The overall sentiment was cautiously optimistic, with participants expecting either maintenance of current per-capita physician levels or slight increases, supported by the profession's resilience to automation and potential expansion of medical education opportunities.
3. What will be the global number of full time programmers working in 2030 relative to the number working in 2024, in percent?
16 attendees provided a median estimate of 91.5% of current levels (suggesting a moderate decline), with a wide uncertainty range from 56.9% to 122%.
Participants noted that while AI tools might automate many coding tasks, they could also increase programmer productivity and create new opportunities. Particular attention was paid to definitional questions, such as whether professionals using low-code or no-code tools should be counted as programmers, and how the role might evolve with AI integration. The overall sentiment suggested a transformation of the profession rather than wholesale decline, with an emphasis on upskilling and adaptation to AI-augmented development practices.
4. Will AI outperform superforecasters before 2030?
Among 19 attendees, the median prediction was 42% probability, with a confidence interval ranging from 38% to 69%.
Discussion centered on distinguishing between AI's proven success in narrow, data-rich domains (like weather and financial forecasting) versus the broader, more complex reasoning required for general forecasting. Participants noted that while AI systems have achieved impressive results in specific areas, superforecasters excel at integrating diverse information and adapting to novel situations. A key consideration was whether superforecasters would be permitted to use AI tools as inputs - in which case, AI systems would need to demonstrate value beyond human-AI collaboration to prove superior performance.
5. What percent of U.S. remote workers will delegate complex (>1hr independent tasks) to AI systems at least 3 times per week in the year 2030?
A total of 19 attendees provided predictions, with a median estimate of 84% and a confidence interval ranging from 66.5% to 92.5%.
Most participants argued for high adoption rates based on current trends in AI tool usage, noting that people already regularly use AI for tasks they wouldn't otherwise do, such as grammar correction and content summarization. Some suggested the figure might be lower due to structural factors, such as task delegation potentially being concentrated among certain types of workers.
6. How many of the 10 most important advancements in machine learning or artificial intelligence of 2025-2030 will have been discovered by an AI system?
The 19 attendees provided a median prediction of 3.84 advancements out of 10, with a wide confidence interval ranging from 1.35 to 7.74. The final distribution was highly bimodal, with attendees clustering their predictions toward 0 and 10 advancements.
The discussion highlighted a key definitional challenge around distinguishing between advancements discovered "by" versus "with" AI systems, using examples like AlphaFold where the distinction becomes blurred. Some participants argued that major architectural advances would likely still require significant human input, while AI systems might excel at more narrow algorithmic improvements. The discussion also raised the challenge of quantifying what constitutes a single "advancement" when AI systems might make numerous small, interrelated improvements that collectively lead to significant progress.
7. What % of current workers will be replaced by AI systems performing end-to-end labor in 2030?
Among 15 attendees, the median prediction was 9.75% of workers being replaced, with a confidence interval ranging from 4.28% to 19.9%.
Participants noted that while AI capabilities may advance rapidly, actual implementation across the global economy will likely lag due to institutional inertia and the availability of low-cost labor in many regions. Several forecasters also highlighted that the requirement for "end-to-end" labor replacement sets a very high bar compared to partial automation or efficiency improvements that reduce worker headcount. The overall consensus suggested that while AI will significantly transform many jobs, complete replacement of workers will be limited by 2030 due to practical implementation challenges.
8. What will the total world unemployment rate be in 2030?
The 19 participants provided a median prediction of 6.39% global unemployment, with a confidence interval ranging from 4.21% to 11%. The latest value provided by the World Bank was 4.9% in 2023, with a sharp downward trend since COVID.
This moderate increase from current levels suggests forecasters expect some disruption from AI and automation, but not catastrophic labor market effects by 2030. Notably, this prediction fits with other forecasts from the conference which generally anticipate gradual rather than sudden changes in fundamental economic indicators through 2030. No comments were provided on this forecast.
9. What will the global labor share of gross domestic product be in 2030?
21 attendees provided a median prediction of 50.7% for the global labor share of GDP in 2030, with a confidence interval ranging from 44.6% to 55.2%. The latest measurement provided by Our World in Data in 2020 was 53.8%, with a relatively flat trend over 20 years.
Participants noted that while AI could exert downward pressure on labor share through increased returns to capital and potential reduction in knowledge worker compensation, the relatively short timeline to 2030 suggests changes may be moderate. Several forecasters emphasized that historically, labor share metrics tend to shift gradually rather than dramatically, even during periods of technological change. They noted that while AI might significantly impact knowledge work by 2030, effects on manual labor sectors like transportation may take longer to materialize.
10. What will the global median income or consumption per day be in 2030 in 2017 USD adjusted for purchasing power?
21 attendees provided a median prediction of $10.76 for the global median income per day in 2030, with a confidence interval ranging from $7.78 to $15.42. The value in 2024 is close to $8.20 according to Our World in Data, with a steady upward trend.
This moderate increase, roughly in line with current trends, suggests that our forecasters expect minimal counterfactual impact of AI on global median income. No comments were provided on this forecast.
Debates on Labor Share & Unemployment
Forecasting questions where there is high divergence between people’s models of others’ forecasts represent useful opportunities for large updates. As a result, we selected the highest-divergence questions from the Day 1 forecasting exercises and suggested a debate between attendees on the following questions:
Nearly all attendees chose to debate the topic of the global labor share of GDP in 2030, so we focused the rest of the debate on this question. Opinions were evenly split, with roughly one-third each predicting it would stay stable, increase, or decrease relative to today. What stood out, however, was that most participants believed 60–80% of the room would agree with their view.
Arguments in favor of the labor share increasing by 2030
The debate on the labor share of GDP highlighted several key arguments. Proponents of an increase argued that diffusion of AI and related changes would take time, with significant impacts unlikely by 2030. Current trends in labor share are relatively stable, and AI could increase productivity and make consumer goods cheaper, which could raise wages. Some argued that AI might reduce the value of intangible assets (e.g., consulting or infrastructure) and shift expenditures towards tools like ChatGPT, indirectly benefiting workers. Others suggested that AI would enable human capital by increasing productivity and fostering competition, as smaller firms challenge large corporations. Additionally, demographic and labor market factors in developed countries, which are trending towards higher labor shares, are expected to remain more significant than AI impacts in the short term.
Skeptics of a significant increase pointed out that labor share has historically been flat, even amid major technological shifts such as the internet revolution, and argued that similar patterns are likely to persist. While AI may make people more productive, its true impact on labor share may not be fully captured by traditional metrics. Instead, the "true labor share" could rise as the cost of goods and services decreases, benefiting consumers. Overall, the discussion emphasized the importance of balancing long-term scenarios with base rates and recognizing the complexity of forecasting AI’s economic impact within a relatively short timeframe.
Arguments in favor of the labor share decreasing by 2030
The argument for a decreasing labor share hinges on whether AI and automation replace or complement human labor. A decline could occur if GDP rises while labor income remains flat, or if labor income itself decreases. Although significant changes by 2030 may be unlikely, attendees suggested a more substantial impact beyond this timeframe as job displacement from AI accelerates.
Automation is already affecting knowledge work, including call centers, content creation, and transportation. However, labor-intensive industries like agriculture may remain less automatable. Attendees argued that human skill acquisition cannot keep pace with AI’s rapid advancement, making retraining and repurposing workers increasingly challenging. This shift from complementarity to substitutability is expected to result in more income flowing to capital holders and a growing reliance on "superstar" firms that contribute disproportionately to GDP while employing relatively few workers.
Historical comparisons, such as the displacement of horses by automobiles, highlight the potential for dramatic change. It’s possible that AI could create a discontinuity in labor trends. While some rebuttals suggest that automation lowers costs and benefits consumers, the underlying concern is that AI’s rapid development will fundamentally alter the labor market, outpacing human adaptability and reducing the labor share.
Following these debates, 8 of the 25 attendees, drawn from both sides, reported shifting their views toward a stable labor share of GDP in 2030.
Forecasting Questions Generated by Attendees
Over the course of the two days, attendees generated over eighty forecasting questions they believed to be worthy of further consideration, in the format used by Metaculus. For the full list, see the appendix: All Forecasting Questions Proposed by Attendees.
These questions varied greatly in subject, scope, and specificity, but several important themes emerged:
Forecasting Question Theme: Economic Variables
This was the most common category, with many attendees independently suggesting forecasting questions on similar economic indicators, including the impact of AI on the job market, the labor and capital shares of GDP, the cost and supply of energy, and inequality. For example, attendees offered the following:
Forecasting Question Theme: International Impacts & Geopolitics
Many attendees focused on the large-scale impact of AI on international economics:
Forecasting Question Theme: AI Capabilities & Diffusion
Rather than measure economic impacts, some attendees focused on how AI capabilities will change by 2030 and how widely and deeply these new capabilities will be utilized:
Forecasting Question Theme: Military Capabilities & Existential Risk
Various attendees brought up questions around the role of military capabilities in accelerating existential risks from AI systems:
Identifying Forecasting Variables with High Value of Information (VOI)
Given the range of economic metrics one could potentially forecast, it is necessary to identify variables that will be the most informative about the likely future – that is, variables that provide the strongest measurable signal regarding which of the three 2030 scenarios is most plausible. We’ll call these high “value-of-information” (VOI) variables.
Identifying high-VOI variables helps forecasters decide what to measure while developing robust world models. Of course, the variables with the highest VOI, such as total GDP or employment statistics, may not be easy to forecast directly. In that case, there may be simpler secondary variables that are easier to forecast and still provide strong signals about the most important variables. For example, short-term proxies for long-term or complex forecasts are typically a highly useful category of secondary variables.
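As a toy illustration of what "value of information" means here, the sketch below computes how much observing a binary short-term proxy would be expected to reduce uncertainty over three candidate 2030 scenarios. All probabilities are invented for illustration and are not conference estimates.

```python
# Toy calculation, with invented probabilities: how much would observing a
# binary short-term proxy be expected to reduce uncertainty over three
# candidate 2030 scenarios?
import math

scenarios = ["A", "B", "C"]
prior = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}

# P(proxy reads "high" | scenario): a useful proxy has likelihoods that
# differ sharply across scenarios.
p_high = {"A": 0.9, "B": 0.5, "C": 0.1}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(observed_high):
    unnorm = {s: prior[s] * (p_high[s] if observed_high else 1 - p_high[s])
              for s in scenarios}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}

p_obs_high = sum(prior[s] * p_high[s] for s in scenarios)
expected_posterior_entropy = (
    p_obs_high * entropy(posterior(True))
    + (1 - p_obs_high) * entropy(posterior(False))
)
# Expected information gain (in bits) from observing the proxy once.
voi_bits = entropy(prior) - expected_posterior_entropy
print(f"expected information gain: {voi_bits:.2f} bits")
```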
During the conference, we asked attendees to identify a list of the highest-VOI, big-picture forecasting questions. The results were:
Additionally, we asked attendees to identify a list of short-term proxy forecasting questions that might most effectively correspond to these high-VOI variables. The results were: