Conference Report
A two-day conference in Oct 2024 bringing together 30 leading economists, AI policy experts, and professional forecasters to rapidly evaluate the economic impacts of frontier AI technologies by 2030.
Part 3: Forecasting
Methodology
We conducted forecasting exercises, led by Metaculus, throughout both days of the conference. On the afternoon of Day 1, we introduced attendees to forecasting and led a series of exercises to elicit quantitative forecasts on key economic outcomes. Unlike our worldbuilding exercises, which were based on three distinct, pre-written scenarios, these exercises asked attendees to share their predictions for 2030 based on their own expected trajectories.
The first session began with a demonstration of effective forecasting techniques and an example question to acclimate participants to the format and its key considerations, such as the importance of Resolution Criteria. Next, we had attendees asynchronously complete a series of ten forecasting exercises hosted on the Metaculus platform, with roughly five minutes per question, for a total of 50 minutes. Using Metaculus' prediction software, attendees set their 25th percentile, median, and 75th percentile expectations for each question.
Attendees were asked to answer these questions using their own prior expectations and judgements, not the scenarios described earlier in the conference. See the Metaculus prediction page for the full set of questions: Threshold 2030 Conference Predictions.
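To illustrate how a three-quantile elicitation of this kind can be turned into a full forecast distribution, the sketch below fits a lognormal distribution to a set of elicited quartiles. This is only a minimal illustration under the assumption of a lognormal shape; it is not how Metaculus aggregates or stores forecasts.

```python
# Minimal sketch (not Metaculus' internal model): turn three elicited
# quantiles (25th percentile, median, 75th percentile) into a full forecast
# distribution, assuming a lognormal shape for a positive, right-skewed quantity.
import numpy as np
from scipy.stats import lognorm, norm

def lognormal_from_quartiles(q25, median, q75):
    """Fit a lognormal whose quartiles approximately match the elicited values
    (three quantiles over-determine the two-parameter family, so we average)."""
    mu = np.log(median)                  # lognormal median = exp(mu)
    z75 = norm.ppf(0.75)                 # ~0.6745
    sigma = (np.log(q75 / median) + np.log(median / q25)) / (2 * z75)
    return lognorm(s=sigma, scale=np.exp(mu))

# Example using the humanoid-robot quartiles reported in the results (units sold):
dist = lognormal_from_quartiles(1.61e6, 4.7e6, 11.3e6)
print(dist.ppf([0.05, 0.25, 0.50, 0.75, 0.95]))  # implied 5th-95th percentile spread
```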
At the end of the day, we asked attendees to generate their own questions that they believed would be valuable to gather forecasts on (All Forecasting Questions Proposed by Attendees). These varied widely in focus, from topics such as likely interest rates in 2030, to the probability of human extinction.
In each of these exercises, we presented questions and asked the attendees to predict not only their own expectations, but also the median responses they expected from other attendees. This was inspired by Keynes’ beauty contest, which we elaborate on in Modeling the Predictions of Other Forecasters, and was designed to help identify group biases and differences between attendees’ mental models. This was especially insightful when predictions were significantly divergent, prompting lively debate.
On Day 2, we had attendees participate in forecasting debates. We identified the three topics from Day 1's forecasting exercises with the highest disagreement among experts: labor share of GDP, unemployment levels, and median income. Attendees each chose one of these topics and split into two groups per topic, one group arguing that the key variable (e.g., labor share of GDP) would increase, and the other arguing the opposite.
Groups then identified short-term proxies (immediate, easily measurable indicators that provide an early signal on long-term outcomes) for their respective topics; one example was the number of programmers working in a specific sector. This led into structured, formal debates on the topics with the highest divergence in predictions. Each group sent two attendees to synthesize and present the arguments for the group's position to the conference, with time allocated for initial arguments and rebuttals.
Finally, we had attendees develop more forecasting questions that they believed would be valuable, drawing these questions from the economic models developed in previous sessions (All Forecasting Questions Proposed by Attendees). We then had them identify which forecasting questions had the highest value of information (VOI) – that is, which questions would provide the most clarity on which of the described scenarios was most likely to occur.
Resolution Criteria
Resolution criteria are the conditions under which we can decide whether, and to what extent, a prediction regarding a forecasting question is successful. It’s important to design resolution criteria that are precise while still preserving the core aspects of the outcome that the question aims to capture. Too little precision leads to ambiguity and edge cases, while focusing on the wrong details risks losing sight of the question's intent.
We recommended the following approach to attendees, based on established best practices from the Metaculus forecasting platform, emphasizing five factors:
For example, consider the question Will AI outperform superforecasters before 2030? If someone forecasts “yes”, how do we determine whether they were correct? To render this forecasting question meaningful, we could introduce the following criteria:
An Example of Effective Resolution Criteria
Our forecasting questions were designed with precise resolution criteria, based on specific reliable data sources, and included fallback conditions if primary sources were unavailable. Attendees were encouraged to adopt this system within their own forecasting practices.
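As a hypothetical illustration of what such a specification might look like, the snippet below structures resolution criteria for one of the conference questions as data, with a primary source, a fallback source, and edge-case handling. The specific sources, dates, and clauses are illustrative assumptions, not the criteria actually used.

```python
# Hypothetical sketch of a resolution-criteria specification expressed as data.
# The sources, dates, and edge-case clauses are illustrative assumptions,
# not the criteria actually used at the conference.
resolution_criteria = {
    "question": "What will the total world unemployment rate be in 2030?",
    "metric": "annual global unemployment rate for calendar year 2030, in percent",
    "resolution_date": "2031-06-30",  # leaves time for official data releases
    "primary_source": "World Bank, unemployment total (modeled ILO estimate)",
    "fallback_source": "ILOSTAT global unemployment rate",
    "edge_cases": [
        "If the primary source revises its methodology, use the 2030 figure "
        "as first published.",
        "If neither source publishes a 2030 figure by the resolution date, "
        "the question resolves ambiguous.",
    ],
}
```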
Modeling the Predictions of Other Forecasters
In addition to prompting attendees for their predictions, we also asked them to predict the median responses given by others. This exercise is modeled after Keynes' beauty contest analogy, and helps attendees to:
For example, suppose that we do not know the capital of Pennsylvania. Consider these two questions: "What is the capital of Pennsylvania?" and "What will most people think the capital of Pennsylvania is?"
Now, consider a few distinct responses to this pair of questions:
Answers to this pair of questions can be used to identify respondents who both have factual knowledge and demonstrate a comprehensive world model: that is, individuals who correctly identify Harrisburg as the capital while anticipating that most respondents might mistakenly choose Philadelphia. Correctly modeling the viewpoints of others can signal that one is better able to account for societal biases.
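A minimal sketch of how this elicitation can be examined is shown below, using made-up numbers rather than actual conference data: for each attendee we compare their own forecast, the group median they predicted, and the group median that was actually realized.

```python
# Made-up numbers, not conference data: for each attendee, compare their own
# forecast, the group median they predicted, and the realized group median.
import statistics

# (own forecast, predicted group median) for some numeric question
responses = [(5.0, 7.0), (9.0, 6.5), (4.0, 6.0), (12.0, 8.0), (6.0, 6.0)]

own_forecasts = [own for own, _ in responses]
realized_median = statistics.median(own_forecasts)

for i, (own, predicted) in enumerate(responses, start=1):
    # A large gap between one's own forecast and the median one expects from
    # the group signals a self-consciously contrarian model; a large gap
    # between the predicted and realized medians signals a miscalibrated
    # model of the other forecasters.
    print(f"attendee {i}: own={own:5.1f}  predicted median={predicted:5.1f}  "
          f"realized median={realized_median:5.1f}  "
          f"gap to group={abs(predicted - realized_median):4.1f}")
```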
Results
Forecasting Exercise Summary
In the forecasting exercise, attendees were provided with 10 questions and prompted to use Metaculus' prediction software to set their 25th percentile, median, and 75th percentile expectations for each question. Please see the Metaculus landing page for more background details and commentary on these questions: https://www.metaculus.com/tournament/threshold2030/.
In summary, while participants described transformative societal and economic changes from AI in their worldbuilding exercises, their quantitative forecasts for major global economic variables in 2030, based on their existing priors, remained relatively conservative and aligned with conventional estimates. By 2030, attendees predicted only a slight increase (6%) in the number of physicians, a slight decrease (9%) in the number of programmers, a moderate increase (roughly 1.5 percentage points) in global unemployment, a stable global labor share of GDP, and an increase in global median income (about $2.56 per day) largely commensurate with current trends.
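These headline figures follow directly from the question-level medians reported below; the short check here simply reproduces the arithmetic.

```python
# Question-level medians reported in the results below, used to reproduce
# the headline changes quoted above.
physicians_2030 = 106.3      # % of 2024 level (question 2)
programmers_2030 = 91.5      # % of 2024 level (question 3)
unemployment_2030 = 6.39     # % (question 8); latest World Bank value: 4.9% (2023)
median_income_2030 = 10.76   # USD/day, 2017 PPP (question 10); ~$8.20 in 2024

print(f"physicians:    {physicians_2030 - 100:+.1f}%")             # about +6%
print(f"programmers:   {programmers_2030 - 100:+.1f}%")            # about -9%
print(f"unemployment:  {unemployment_2030 - 4.9:+.2f} pp")         # about +1.5 points
print(f"median income: {median_income_2030 - 8.20:+.2f} USD/day")  # about +$2.56
```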
The global economy's diverse nature and uneven digital infrastructure make it resilient against rapid technological disruption. A close parallel might be the development of the internet: despite being the defining innovation of the Information Era, a 2011 McKinsey report found that it accounted for just 3.4% of GDP in developed economies, well below sectors such as real estate, financial services, healthcare, and construction. These predictions suggest that, even while viewing AI as a revolutionary economic factor, our economists and AI researchers believe in the inertia and resilience of the global economy over short timescales.
1. How many units of humanoid robots will be cumulatively sold globally for performing domestic tasks before 2031?
17 attendees shared a median prediction of 4.7 million units, with a 25th-75th percentile range from 1.61 million to 11.3 million units.
Discussion among participants focused on key constraints that could limit humanoid robot adoption, such as lengthy development and testing timelines, high manufacturing complexity, and significant cost barriers relative to human labor. Some participants emphasized that safety requirements and high costs would likely slow deployment, particularly in markets that currently employ domestic workers. There were also questions raised about the precise definition of "humanoid".
The overall sentiment leaned conservative, with participants citing the slower development curve for robotics relative to language models, high safety thresholds for home use, and cost constraints relative to human labor.
2. What will be the global number of physicians working in 2030 relative to the number working in 2024, in percent?
17 attendees shared a median prediction of 106.3%, with a 25th-75th percentile range from 94.9% to 119.7%.
The discussion emphasized that physicians' protected professional status and high credential requirements would likely maintain workforce stability even as AI capabilities advance. Some participants noted that medical education could become more accessible and deployment in developing countries might increase. The overall sentiment was cautiously optimistic, with participants expecting either maintenance of current per-capita physician levels or slight increases, supported by the profession's resilience to automation and potential expansion of medical education opportunities.
3. What will be the global number of full time programmers working in 2030 relative to the number working in 2024, in percent?
16 attendees provided a median estimate of 91.5% of current levels (suggesting a moderate decline), with a wide uncertainty range from 56.9% to 122%.
Participants noted that while AI tools might automate many coding tasks, they could also increase programmer productivity and create new opportunities. Particular attention was paid to definitional questions, such as whether professionals using low-code or no-code tools should be counted as programmers, and how the role might evolve with AI integration. The overall sentiment suggested a transformation of the profession rather than wholesale decline, with an emphasis on upskilling and adaptation to AI-augmented development practices.
4. Will AI outperform superforecasters before 2030?
Among 19 attendees, the median prediction was 42% probability, with a confidence interval ranging from 38% to 69%.
Discussion centered on distinguishing between AI's proven success in narrow, data-rich domains (like weather and financial forecasting) versus the broader, more complex reasoning required for general forecasting. Participants noted that while AI systems have achieved impressive results in specific areas, superforecasters excel at integrating diverse information and adapting to novel situations. A key consideration was whether superforecasters would be permitted to use AI tools as inputs - in which case, AI systems would need to demonstrate value beyond human-AI collaboration to prove superior performance.
5. What percent of U.S. remote workers will delegate complex (>1hr independent tasks) to AI systems at least 3 times per week in the year 2030?
A total of 19 attendees provided predictions, with a median estimate of 84% and a confidence interval ranging from 66.5% to 92.5%.
Most participants argued for high adoption rates based on current trends in AI tool usage, noting that people already regularly use AI for tasks they wouldn't otherwise do, such as grammar correction and content summarization. Some suggested the figure might be lower due to structural factors, such as task delegation potentially being concentrated among certain types of workers.
6. How many of the 10 most important advancements in machine learning or artificial intelligence of 2025-2030 will have been discovered by an AI system?
The 19 attendees provided a median prediction of 3.84 advancements out of 10, with a wide confidence interval ranging from 1.35 to 7.74. The final distribution was highly bimodal, with attendees clustering their predictions toward 0 and 10 advancements.
The discussion highlighted a key definitional challenge around distinguishing between advancements discovered "by" versus "with" AI systems, using examples like AlphaFold where the distinction becomes blurred. Some participants argued that major architectural advances would likely still require significant human input, while AI systems might excel at more narrow algorithmic improvements. The discussion also raised the challenge of quantifying what constitutes a single "advancement" when AI systems might make numerous small, interrelated improvements that collectively lead to significant progress.
7. What % of current workers will be replaced by AI systems performing end-to-end labor in 2030?
Among 15 attendees, the median prediction was 9.75% of workers being replaced, with a confidence interval ranging from 4.28% to 19.9%.
Participants noted that while AI capabilities may advance rapidly, actual implementation across the global economy will likely lag due to institutional inertia and the availability of low-cost labor in many regions. Several forecasters also highlighted that the requirement for "end-to-end" labor replacement sets a very high bar compared to partial automation or efficiency improvements that reduce worker headcount. The overall consensus suggested that while AI will significantly transform many jobs, complete replacement of workers will be limited by 2030 due to practical implementation challenges.
8. What will the total world unemployment rate be in 2030?
The 19 participants provided a median prediction of 6.39% global unemployment, with a confidence interval ranging from 4.21% to 11%. The latest value provided by the World Bank was 4.9% in 2023, with a sharp downward trend since COVID.
This moderate increase from current levels suggests forecasters expect some disruption from AI and automation, but not catastrophic labor market effects by 2030. Notably, this prediction fits with other forecasts from the conference which generally anticipate gradual rather than sudden changes in fundamental economic indicators through 2030. No comments were provided on this forecast.
9. What will the global labor share of gross domestic product be in 2030?
21 attendees provided a median prediction of 50.7% for the global labor share of GDP in 2030, with a confidence interval ranging from 44.6% to 55.2%. The latest measurement provided by Our World in Data in 2020 was 53.8%, with a relatively flat trend over 20 years.
Participants noted that while AI could exert downward pressure on labor share through increased returns to capital and potential reduction in knowledge worker compensation, the relatively short timeline to 2030 suggests changes may be moderate. Several forecasters emphasized that historically, labor share metrics tend to shift gradually rather than dramatically, even during periods of technological change. They noted that while AI might significantly impact knowledge work by 2030, effects on manual labor sectors like transportation may take longer to materialize.
10. What will the global median income or consumption per day be in 2030 in 2017 USD adjusted for purchasing power?
21 attendees provided a median prediction of $10.76 for the global median income per day in 2030, with a confidence interval ranging from $7.78 to $15.42. The value in 2024 is close to $8.20 according to Our World in Data, with a steady upward trend.
This moderate increase, roughly in line with current trends, suggests that our forecasters expect minimal counterfactual impact of AI on global median income. No comments were provided on this forecast.
Debates on Labor Share & Unemployment
Forecasting questions where there is high divergence between people’s models of others’ forecasts represent useful opportunities for large updates. As a result, we selected the highest-divergence questions from the Day 1 forecasting exercises and suggested a debate between attendees on the following questions:
Nearly all attendees chose to debate the topic of the global labor share of GDP in 2030, so we focused the rest of the debate on this question. Opinions were evenly split, with roughly one-third each predicting it would stay stable, increase, or decrease relative to today. What stood out, however, was that most participants believed 60–80% of the room would agree with their view.
Arguments in favor of the labor share increasing by 2030
The debate on the labor share of GDP highlighted several key arguments. Proponents of an increase argued that diffusion of AI and related changes would take time, with significant impacts unlikely by 2030. Current trends in labor share are relatively stable, and AI could increase productivity and make consumer goods cheaper, which could raise wages. Some argued that AI might reduce the value of intangible assets (e.g., consulting or infrastructure) and shift expenditures towards tools like ChatGPT, indirectly benefiting workers. Others suggested that AI would enable human capital by increasing productivity and fostering competition, as smaller firms challenge large corporations. Additionally, demographic and labor market factors in developed countries, which are trending towards higher labor shares, are expected to remain more significant than AI impacts in the short term.
Skeptics of a significant increase pointed out that labor share has historically been flat, even amid major technological shifts such as the internet revolution, and argued that similar patterns are likely to persist. While AI may make people more productive, its true impact on labor share may not be fully captured by traditional metrics. Instead, the "true labor share" could rise as the cost of goods and services decreases, benefiting consumers. Overall, the discussion emphasized the importance of balancing long-term scenarios with base rates and recognizing the complexity of forecasting AI’s economic impact within a relatively short timeframe.
Arguments in favor of the labor share decreasing by 2030
The argument for a decreasing labor share hinges on whether AI and automation replace or complement human labor. A decline could occur if GDP rises while labor income remains flat, or if labor income itself decreases. Although significant changes by 2030 may be unlikely, attendees suggested a more substantial impact beyond this timeframe as job displacement from AI accelerates.
Automation is already affecting knowledge work, including call centers, content creation, and transportation. However, labor-intensive industries like agriculture may remain less automatable. Attendees argued that human skill acquisition cannot keep pace with AI’s rapid advancement, making retraining and repurposing workers increasingly challenging. This shift from complementarity to substitutability is expected to result in more income flowing to capital holders and a growing reliance on "superstar" firms that contribute disproportionately to GDP while employing relatively few workers.
Historical comparisons, such as the displacement of horses by automobiles, highlight the potential for dramatic change. It’s possible that AI could create a discontinuity in labor trends. While some rebuttals suggest that automation lowers costs and benefits consumers, the underlying concern is that AI’s rapid development will fundamentally alter the labor market, outpacing human adaptability and reducing the labor share.
Following these debates, 8 of the 25 attendees, drawn from both sides, reported shifting their views toward a stable labor share of GDP in 2030.
Forecasting Questions Generated by Attendees
Over the course of the two days, attendees generated over eighty forecasting questions they believed to be worthy of further consideration, in the format used by Metaculus. For the full list, see the appendix: All Forecasting Questions Proposed by Attendees.
These questions varied greatly in subject, scope, and specificity, but several important themes emerged:
Forecasting Question Theme: Economic Variables
This was the most common category, with many attendees independently suggesting forecasting questions on similar economic indicators, including the impact of AI on the job market, the labor and capital shares of GDP, the cost and supply of energy, and inequality. For example, attendees offered the following:
Forecasting Question Theme: International Impacts & Geopolitics
Many attendees focused on the large-scale impact of AI on international economics:
Forecasting Question Theme: AI Capabilities & Diffusion
Rather than measure economic impacts, some attendees focused on how AI capabilities will change by 2030 and how widely and deeply these new capabilities will be utilized:
Forecasting Question Theme: Military Capabilities & Existential Risk
Various attendees brought up questions around the role of military capabilities in accelerating existential risks from AI systems:
Identifying Forecasting Variables with High Value of Information (VOI)
Given the range of economic metrics one could potentially forecast, it is necessary to identify variables that will be the most informative about the likely future – that is, variables that provide the strongest measurable signal regarding which of the three 2030 scenarios is most plausible. We’ll call these high “value-of-information” (VOI) variables.
Identifying high-VOI variables helps forecasters decide what to measure while developing robust world models. Of course, the variables with the highest VOI, such as total GDP or employment statistics, may not be easy to forecast directly. In that case, there may be simpler secondary variables that are easier to forecast and still provide strong signals about the most important variables. For example, short-term proxies for long-term or complex forecasts are typically a highly useful category of secondary variables.
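As a toy illustration of what "value of information" means here, the sketch below computes how much observing a binary short-term proxy would be expected to reduce uncertainty over three candidate 2030 scenarios. All probabilities are invented for illustration and are not conference estimates.

```python
# Toy calculation, with invented probabilities: how much would observing a
# binary short-term proxy be expected to reduce uncertainty over three
# candidate 2030 scenarios?
import math

scenarios = ["A", "B", "C"]
prior = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}

# P(proxy reads "high" | scenario): a useful proxy has likelihoods that
# differ sharply across scenarios.
p_high = {"A": 0.9, "B": 0.5, "C": 0.1}

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(observed_high):
    unnorm = {s: prior[s] * (p_high[s] if observed_high else 1 - p_high[s])
              for s in scenarios}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}

p_obs_high = sum(prior[s] * p_high[s] for s in scenarios)
expected_posterior_entropy = (
    p_obs_high * entropy(posterior(True))
    + (1 - p_obs_high) * entropy(posterior(False))
)
# Expected information gain (in bits) from observing the proxy once.
voi_bits = entropy(prior) - expected_posterior_entropy
print(f"expected information gain: {voi_bits:.2f} bits")
```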
During the conference, we asked attendees to identify a list of the highest-VOI, big-picture forecasting questions. The results were:
Additionally, we asked attendees to identify a list of short-term proxy forecasting questions that might most effectively correspond to these high-VOI variables. The results were: