Julian Bajkowski, Leigh warns Brits to keep big data on a short policy leash, The Mandarin, 1 October 2024

Government and corporate infatuation with solutions generated by big data and artificial intelligence has been given a very public reality check by federal minister Andrew Leigh in a touchstone speech to Oxford University, bookended by an address to the UK Evaluation Task Force at Number 9 Downing Street.
In his most direct call yet for policymakers and scientific researchers not to be run down by the stampede towards all things seemingly artificial and generative, the competition, charities, treasury and employment assistant minister argues that rigour and scientific discipline need to be maintained to create positive, provable and repeatable policies and outcomes.
Leigh, for those not yet acquainted with his commitment to generating scientific-quality evidence to inform and evaluate policy, is a committed ‘randomista’ or supporter of randomised trials to generate objective results and eliminate biases, especially around health, social policy and government funding.
The doctor of economics also appears to be one of a gradually growing number of heretical sceptics of the prevailing market view that AI can and shortly will relieve government of many of its historical challenges by boiling down giant databases to spit out the answers needed.
Part of the tussle between the promulgators of AI — essentially major software and cloud infrastructure suppliers — and the pro-science policy community is that many applications of AI seek to replace or augment existing evaluation approaches in favour of feeding and milking a large data model.
“In place of randomised trials, some put their faith in ‘big data’. Between large-scale surveys and extensive administrative datasets, the world is awash in data as never before. Each day, hundreds of exabytes of data are produced,” Leigh said.
“Big data has improved the accuracy of weather forecasts, permitted researchers to study social interactions across racial and ethnic lines, enabled the analysis of income mobility at a fine geographic scale and much more. Yet a clue to the value of randomised trials comes from the behaviour of the biggest big data company of them all, Google.”
Leigh said that since its foundation in 1998, Google had conducted “thousands of randomised trials to refine its products” and the platform still regularly conducts randomised trials, “often dubbed A/B testing” to determine search results display preferences “as measured by click-through rates”.
Moreover, Google did not simply boil down its data to arrive at a result and then put it into action.
“The company uses randomised trials to determine which features should be added to Google Maps, Google Docs and Gmail, balancing functionality against complexity. It runs randomised trials of its ad auctions, the way privacy settings are displayed, and recommendation algorithms,” Leigh said.
“Among its employees, Google has conducted randomised trials to determine the optimal length of meetings, the impact of remote work, employee wellness programs and productivity tools.
“Why would Google conduct randomised trials rather than use big data? Because it is keen to uncover causal effects. To see this, suppose that the company instead decided to determine the impact of product tweaks by looking at patterns in the data.”
Leigh then broke down that all-time favourite of long-suffering productivity software users, the dreaded new feature or function that doesn’t really improve anything or gets in the way of things.
Remember Windows Vista? Or that over-friendly irritation known as Clippy?
Sensibly, with the prime minister having recently signed up the government to a so-called trial of Microsoft after a photo op with Satya Nadella, Leigh stuck to G Suite and all things Mountain View, noting, “for example, it could offer a new function in Google Sheets, and compare the productivity of users who took it up with the productivity of users who did not take it up.
“Such an analysis might also hold constant other observed factors about the two groups of users, such as how often they use the product,” Leigh said.
“The problem with such an analysis is that what isn’t observed can have a major impact on productivity. If users who like new functions are increasing their productivity at a more rapid rate, then this will bias the estimate upwards.”
“Conversely, if users who like new functions are procrastinating, it will bias the estimate downwards. Google doesn’t know the true answer, so it opts for a randomised trial. In conducting its randomised trials, big data is a massive asset for Google. But big data doesn’t preclude the need to do randomised trials.”
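To make the point about unobserved factors concrete, here is a minimal, purely illustrative simulation (not drawn from Leigh’s speech; every name and number in it is hypothetical). It assumes a new feature with no real productivity effect, but whose keenest adopters were already improving faster anyway. A naive observational comparison then shows a spurious benefit, while random assignment lands near the true answer of zero.

```python
import random

random.seed(0)

# Hypothetical population of 100,000 spreadsheet users.
# "trend" is an unobserved trait: some users were already becoming
# more productive, feature or not.
users = [{"trend": random.gauss(0, 1)} for _ in range(100_000)]

TRUE_EFFECT = 0.0  # assume the new feature does nothing at all


def productivity(user, has_feature):
    # Productivity = unobserved trend + (true effect if feature) + noise.
    return user["trend"] + (TRUE_EFFECT if has_feature else 0.0) + random.gauss(0, 1)


# 1) Observational "big data" comparison: users opt in themselves,
#    and those already on an upward trend are more likely to try new features.
adopters, non_adopters = [], []
for u in users:
    opted_in = random.random() < 0.5 + 0.2 * (u["trend"] > 0)
    (adopters if opted_in else non_adopters).append(productivity(u, opted_in))

# 2) Randomised trial (A/B test): a coin flip decides who gets the feature.
treated, control = [], []
for u in users:
    assigned = random.random() < 0.5
    (treated if assigned else control).append(productivity(u, assigned))

mean = lambda xs: sum(xs) / len(xs)
print("Observational estimate:", round(mean(adopters) - mean(non_adopters), 3))
print("Randomised estimate:   ", round(mean(treated) - mean(control), 3))
# The observational estimate is biased upwards by the unobserved trend;
# the randomised estimate sits close to the true effect of zero.
```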
Leigh’s latest speeches are important for a number of reasons, not least that a relatively fresh Labour government in the UK will likely be looking to similar jurisdictions to gauge, as Leigh and friends would put it, “what works” at a policy and program level.
While Britain’s challenges differ from Australia’s in many ways, a decent chunk of Australia’s digital transformation playbook was taken from the UK’s Cabinet Office, which also relied on A/B testing of user experiences to validate what processes worked under what circumstances and for which groups.
The Albanese government has also eschewed the usual political trait of rejecting ideas and learnings from those of a different cloth, taking on board many of the lessons that made Service NSW a success.
But it’s also an important signal that AI is not a policy and political cure-all for what Evgeny Morozov describes as the affliction of “solutionism”. And there are indications that AI is nearing what research house Gartner calls ‘the peak of inflated expectations’.
Last week, sceptical views on AI from Goldman Sachs’ head of stock research Jim Covello made their way into The New York Times and caused a few big tech heart murmurs when he observed of the billboards on Highway 101 from San Jose to San Francisco: “They’re all AI”, but “not that long ago, they were all crypto”.
“Overbuilding things the world doesn’t have use for or is not ready for typically ends badly,” Covello is quoted in the NYT as telling a markets research publication called Top of Mind.
One example of Goldman Sachs’ own internal deployment of AI, updating analyst spreadsheets with company financial results, saved 20 minutes per company, the NYT said, but cost six times as much.
When AI does get a proper start in the federal government, it won’t be such a bad thing if the Australian Centre for Evaluation and Leigh’s merry band of randomistas are all over it like a rash.