POLARIS: A New Method For Election Prediction

Reform UK is on track to win over 70 seats, given local election results. Meanwhile, Labour stand to lose 155 seats across the country, leaving them without a majority.

Report by Callum Hunter, Senior Data Scientist at JLP

28th December 2024

Overview

POLARIS (Political Analysis through Regional and Local Insights System) is a new model that uses the wealth of data provided by council by-election results to predict election outcomes mid-cycle. The method uses demographic indicators of the council wards in which these elections take place, such as the proportion of degree-educated individuals and the proportion of those unemployed, to predict vote share changes for each of the major parties. These models are then used to project those changes to constituency level using the same demographic indicators. Regression towards the mean, together with the fact that constituencies are much larger aggregates than wards, ensures that the large swings seen at ward level are tempered. We also apply a “swing tightening” method that further tempers any extreme swings the models may predict. Finally, we apply small uniform national swing (UNS) shifts to the national projected vote share so that it matches our latest national polling. This ensures that the predictions are more than just a national projection of local by-election results. More detail on the method can be found below.

This method currently projects a very different House of Commons, with the Labour Party losing its majority and no party in a position to form a majority government. The graphic below shows the current forecasted results.

The electoral map of the country would also change drastically. Labour would lose coastal seats across the North East to Reform UK, including Easington and Hartlepool. Reform would inflict further losses in the Thames Estuary and large parts of the North West. Meanwhile, the Conservatives would regain ground in their Southern heartlands and parts of Scotland. This dynamic would make governing almost impossible for any of the parties, sending the country into an uncertain future.


Introduction and Methodology

Since the 2024 General Election there have been 157 different Local Council By-Elections in wards across the country with 60 of these elections being wins for Labour and 44 being wins for the Conservative Party. These elections represent an exceptional wealth of data that contain real voter trends representing over 280,000 cast ballots. Whilst these voters are much more likely to be the highly politically engaged, as well as more likely to be registering protest votes against the current government, the sheer number of people makes exploiting the data worthwhile. Carefully dealing with these issues can make this data invaluable in understanding the ongoing public mood.

In order to leverage this data, we have built POLARIS (Political Analysis through Regional and Local Insights System) which aims to extrapolate this Council By-Election data to the national stage so we can track seat movements at constituency level. The methodology is based on the hierarchical nature of geographic division in the UK — a similar method to the usual SRP/MRPs used during general election campaigns.

To do this we gather a database of demographics for each ward in the country (in England and Wales we use bivariate indicators, such as education by age, for more granularity) and then regress the change in vote share since the last local election for each party in each ward against the demographic predictors. This provides a model that can impute the vote share change for each party, given a set of demographic indicators for a geographic region. The innate regression towards the mean, and the fact that constituency demographics average over many more people than ward demographics, help to temper the extreme swings that can take place at ward level. We also include a geometric time series weighting so that by-elections that occurred earlier are weighted less than recent ones. This means that elections that occurred around one month ago affect the model one quarter less than those that occurred today.
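The time weighting described above can be sketched as follows. This is a minimal illustration only: it assumes a decay factor of 0.75 per 30-day month, which is one reading of "one quarter less" after a month. The names `time_weight`, `DECAY_PER_MONTH` and `DAYS_PER_MONTH` are hypothetical and are not taken from the POLARIS implementation.

```python
from datetime import date

DECAY_PER_MONTH = 0.75  # assumption: weight falls by one quarter per month
DAYS_PER_MONTH = 30.0   # assumption: a "month" is treated as 30 days


def time_weight(election_date: date, today: date) -> float:
    """Geometric weight for a by-election held on election_date."""
    age_days = (today - election_date).days
    if age_days < 0:
        raise ValueError("election_date is after today")
    # Weight shrinks by a constant factor for every month of age.
    return DECAY_PER_MONTH ** (age_days / DAYS_PER_MONTH)


if __name__ == "__main__":
    today = date(2024, 12, 28)
    for held in (date(2024, 12, 28), date(2024, 11, 28), date(2024, 10, 29)):
        print(held, round(time_weight(held, today), 4))
```

Weights of this kind could then be passed to the regression as per-ward sample weights, so that older by-elections pull less on the fit.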

To include some political data, we use Britain Elects’ ward-level vote share estimates from the 2024 General Election. Including this data is important because it allows the model to learn a crucial political link between the ward level and the constituency level. Currently the model relies on Random Forest Regression, a machine learning technique we have had immense success with this year, including in our SRP, undecided modelling and turnout modelling. This kind of model mimics the sort of hierarchy that this kind of data naturally forms, with constituencies being composed of electoral wards.

Once these models are trained, we predict vote share changes at the constituency level using a similar database of constituency-level data. In this way we poststratify ward-level results onto constituencies; this is the key conceptual point. To be clear, mathematically and philosophically, this is exactly the same technique as SRP/MRP, since it is all hierarchical data. Typically, the underlying hierarchical structure is individuals layered beneath constituencies; in POLARIS we replace the individuals with wards, but the hierarchical structure is otherwise the same. Below we show some example correlations between a handful of constituencies and 10 randomly selected wards from across the country. It is clear that Newark and Newton Aycliffe & Spennymoor correlate with different wards, with the former being most correlated with Oakham South and the latter with Lydney North. It is these correlations that the model uses to infer vote share changes.

After these estimates are calculated we enforce the conservation of vote share (that is, the changes in vote share across all parties must sum to zero) and apply a “swing tightening” method. This swing tightening draws extreme values of swing back towards the mean in order to temper some of the extreme swings that can occur at ward level. It changes the overall constituency seat counts very little but prevents extreme swings at constituency level, giving us more accurate vote shares. Finally, we compare the national projected vote share to the latest modelled omnibus numbers and use a uniform national swing (UNS) adjustment to make sure they match. This then gives the final estimates and seat shares.
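The exact form of the conservation and tightening steps is not specified here, so the following is only a sketch under stated assumptions. It assumes tightening is applied per party across constituencies by compressing the part of any deviation from the mean that exceeds a threshold, and that conservation is enforced per constituency by shifting every party's change equally. The function names and the `threshold` and `shrink` values are hypothetical.

```python
from statistics import mean


def tighten_swings(swings, threshold=10.0, shrink=0.5):
    """Pull swings lying more than `threshold` points from the mean back towards it.

    `threshold` and `shrink` are hypothetical tuning values, not from the report.
    """
    centre = mean(swings)
    tightened = []
    for swing in swings:
        dev = swing - centre
        excess = abs(dev) - threshold
        if excess > 0:
            # Keep the first `threshold` points of deviation, compress the rest.
            dev = (threshold + shrink * excess) * (1 if dev > 0 else -1)
        tightened.append(centre + dev)
    return tightened


def conserve_vote_share(changes):
    """Shift every party's change equally so the changes sum to zero."""
    residual = sum(changes.values()) / len(changes)
    return {party: delta - residual for party, delta in changes.items()}


if __name__ == "__main__":
    # Illustrative numbers only, not model output.
    print(tighten_swings([0.0, 0.0, 0.0, 40.0]))
    print(conserve_vote_share({"LAB": -8.0, "CON": 3.0, "REF": 7.0}))
```

An equal shift is the simplest way to restore a zero sum; a production version would also need to guard against pushing any party's resulting vote share below zero.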


Back Testing to 2019 and 2024

To establish the validity of this method we backtested the model on the last two general elections, in England and Wales, with mixed success. These backtests use by-election data from the year the election took place. The choice of just England and Wales is due to the ease of generating the required tables from the census website — building the same tables for the Scottish seats requires much more time and the number of seats in England and Wales gives a large enough test sample to assess the performance of the model. 

Starting with the 2019 general election, we found the model fared very well, calling 92% of constituencies for the correct party in England and Wales. The graphic below shows that, whilst the model underestimated the Conservatives and overestimated the Liberal Democrats, the seat counts fall within a reasonable range that any MRP would do well to produce. We also tested vote shares in order to assess how reliable the predicted values are. Most parties saw a root-mean-square error (RMSE) of around 4 to 5 points, with the Conservatives having the largest RMSE at 4.8 points and Plaid Cymru the smallest at just 0.9 points. The 95% confidence intervals show that the majority of vote share predictions at the constituency level fell within around 10 points of the actual result; this is around the same performance as the best MRPs in 2024 and 2019.
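For reference, the two backtest metrics quoted above (share of constituencies called correctly and per-party RMSE) can be computed as in the sketch below. The function names and the example inputs are illustrative and are not the backtest data.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual vote shares (points)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))


def call_accuracy(predicted_winners, actual_winners):
    """Fraction of constituencies where the predicted winner matches the result."""
    if len(predicted_winners) != len(actual_winners) or not predicted_winners:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == a for p, a in zip(predicted_winners, actual_winners))
    return hits / len(predicted_winners)


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(rmse([42.0, 35.5, 28.0], [45.0, 33.0, 30.5]))
    print(call_accuracy(["LAB", "CON", "LD", "REF"], ["LAB", "CON", "CON", "REF"]))
```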

Such good news was not the case for the 2024 general election backtest. The model drastically overestimated the Conservative position and drastically underestimated Labour’s position: Labour were predicted to win a slim majority. The main reason for this is the insurgent nature of Reform UK. The party had very little success in local elections in the run-up to the general election, despite having major support in national polls. This could be because local election voters in the run-up to the 2024 general election were highly politically engaged; such voters are unlikely to be the low-propensity voters that make up much of Reform UK’s base. Since the general election, Reform UK appear to be doing much better in local elections, gaining 5 seats in council by-elections and winning 7% of the vote share across all wards.

The presence of this data is enough for the model to start to learn about the demographic composition of pro-Reform UK wards and then extrapolate that data. It is this fact that gives us confidence in our predictions. Reform UK’s local position now is just enough for the model to learn the underlying patterns — this was not the case pre-election. The backtesting shows that once the model can learn these patterns it can become very accurate, matching the performance metrics of the best SRP/MRPs.


Results

Before presenting the results, it is worth emphasising an assumption about Local Council Elections: typically, these elections favour smaller parties and so we would expect the smaller parties to drastically outperform their current polling expectations. As we will see, this fact does not distort the predictions anywhere near as much as may naively be expected and the projected national vote shares are exceptionally close to our modelled omnibus numbers. 

The UNS shifts that have to be applied to each party are shown below. The SNP, Plaid Cymru and Conservative model predictions are within 1 point of our November omnibus figures, whilst Labour sits within 2.5 points. This result is somewhat remarkable given the assumed bias of Local Elections and the considerable assumptions that underpin this kind of poststratification procedure. This method, whilst not perfectly recreating the omnibus numbers, does produce estimates which are within the mutual error margin of the model and the omnibus.
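A uniform national swing adjustment of this kind adds the same per-party shift to every constituency. The sketch below illustrates the idea with made-up figures. The function names are hypothetical, and flooring shares at zero is an added assumption rather than something stated in the method.

```python
def uns_shifts(projected_national, target_national):
    """Per-party shift (in points) that moves the projection onto the target."""
    return {party: target_national[party] - share
            for party, share in projected_national.items()}


def apply_uns(constituency_shares, shifts):
    """Add the same national shift to every constituency.

    Shares are floored at zero (an assumption made for this sketch).
    """
    return {
        seat: {party: max(0.0, share + shifts.get(party, 0.0))
               for party, share in shares.items()}
        for seat, shares in constituency_shares.items()
    }


if __name__ == "__main__":
    # Illustrative figures only, not the model's or the omnibus numbers.
    shifts = uns_shifts({"LAB": 27.5, "CON": 25.0}, {"LAB": 30.0, "CON": 26.0})
    print(shifts)
    print(apply_uns({"Seat A": {"LAB": 40.0, "CON": 20.0}}, shifts))
```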

With this simple aggregated test passed it is now time to turn attention to the projected swing profiles of each party. These swing profiles are paramount to assessing any model’s ability to capture complex underlying dynamics. Previous investigations by the author have shown that upwards of 80% of party results from the past 60 years have demonstrated some form of proportional swing profile. In fact, those investigations showed that some parties can experience more extreme patterns of swing such as quadratic and cubic swing profiles.

The graphic below shows that such patterns are also captured by POLARIS. These trends are found using Akaike Information Criterion (AIC) searches, which find the most parsimonious polynomial fit to the data. Whilst the Labour profile is best described by a cubic, the ideal form is likely an exponential that tends to around -8 points for all 2024 vote shares above 30%. Meanwhile, the projected Reform UK swing is described by a quadratic curve, but with the wrong sign to allow for efficient vote shares.
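For readers unfamiliar with AIC, one common form of the criterion for comparing polynomial fits is given below. It assumes least-squares fitting with Gaussian errors and drops additive constants; the exact variant used in the search is not stated here, and the symbols are introduced purely for illustration. Here $v_i$ is a party's 2024 vote share in constituency $i$, $s_i$ its projected swing, $\hat{p}_d$ the fitted polynomial of degree $d$, and $n$ the number of constituencies.

```latex
\mathrm{AIC}_d = n \ln\!\left(\frac{\mathrm{RSS}_d}{n}\right) + 2k_d,
\qquad
\mathrm{RSS}_d = \sum_{i=1}^{n} \bigl(s_i - \hat{p}_d(v_i)\bigr)^2,
\qquad
k_d = d + 2
```

The parameter count $k_d$ covers the $d + 1$ polynomial coefficients plus the error variance. The degree with the lowest AIC is preferred, so a cubic is only chosen over a quadratic when the drop in residual error outweighs the penalty for the extra coefficient.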

Previous work on swing relationships from 1966 onwards shows that Labour’s most efficient election wins produce a negative quadratic swing curve that increases their vote share most in marginal seats. Reform UK’s curve will increase their vote share most in seats they are already performing well in. This may be due to Reform being close in relatively few marginal seats. As the party improves this may no longer be the case. Capturing these non-linear swing dynamics is not a guarantee of the model’s efficacy but it does point to the fact that it is able to capture the complex relationships other methods have failed to capture in the past.

The actual seat counts make for grim reading for the Labour Party, which could lose 155 seats, with 64 passing to Reform UK and 81 passing to the Conservatives (see below). According to this analysis, it is likely that Labour would still be the largest party and would be forced into a minority coalition with the Liberal Democrats, seeking the support of other smaller parties to pass legislation. This would echo the 1974 general elections, bouncing the country back into another general election a few months later.

This would drastically change the electoral map of Britain (see top of page), with Reform UK picking up seats across the country, including a number around Rochdale as well as in the Thames Estuary. Labour would be beaten back by the Conservatives in Southern seats that Labour won from them in 2024 and by Reform UK in the North. Labour would still hold on North of the Border, with Scotland being the lynchpin in the electoral system. If Scotland were to fall back to the SNP, then coalition government would become even more complex or downright impossible.


Conclusions

This method presents a novel way to mine a rich data set to produce a current seat projection. There are of course pitfalls. Council By-Elections are often seen as a referendum on the governing party, making these events systematically biased against the government. These elections are also very atomized: there is very little consideration of the wider political environment, which can affect voters’ behaviour. That being said, this method likely predicts the worst-case reasonable scenario for Labour right now. It is likely that Labour would outperform these numbers, given the public’s typical aversion to coalition governments. POLARIS presents an opportunity to track changes in seat count without performing large-scale and expensive surveys. The mathematical underpinnings are sound, and the method itself is not novel; its application here is.