Lara Groves is a senior researcher on AI accountability at the Ada Lovelace Institute. We discussed her recent
research on the New York City algorithmic bias audit regime, co-authored with Jacob Metcalf (Data & Society), Alayna Kennedy (independent researcher), Briana Vecchione (Data & Society), and Andrew Strait (Ada Lovelace Institute). The research received a Best Paper Award at the ACM FAccT Conference 2024.
What is the New York City algorithmic bias audit regime?
Lara: The regime is part of Local Law 144 (LL 144), the first legislation globally to impose algorithmic fairness requirements on commercial companies. The law mandates that all employers in New York City using automated employment decision tools (AEDTs) for hiring undergo annual independent bias audits and publish the results. The lawmakers behind LL 144 were driven by social justice goals, hoping to curb unjust AI practices in hiring. Algorithm audits are designed to enable an independent, impartial assessment of a particular legal or ethical risk, and to create accountability relationships between the developers or deployers of AI systems and the people affected by them.
In practice, LL 144 functions as more of a transparency regime, informing potential candidates about employers' use of AEDTs and leaving them to make a judgement about whether to proceed with an application. A more implicit, secondary objective is that the publicly available audit reports would encourage employers to adopt responsible AI practices and methods.
Our research focused on the auditors contracted by employers to conduct bias audits and prepare audit reports, and on their experiences. We conducted interviews to explore the range of audit services available, the methodologies used, client instructions, and the challenges auditors faced – shining a light on processes that might otherwise not be in the public domain.
What are the main findings regarding the effectiveness of the audit regime?
Lara: LL 144 is drafted broadly, giving both auditors and employers considerable discretion in implementation. There’s a presumption that algorithmic bias audits function similarly to financial services audits. In financial services, the auditor is an independent entity that audits a system used within a company and provides assurance against a predetermined industry standard. However, algorithmic bias auditing is still nascent and lacks professional standards for methods, suitable benchmarks, and auditor accreditation. Moreover, the regime allows employers to shop around and choose their preferred auditor, which is a feature of second-party auditing rather than the third-party auditing used in financial services. Consequently, a diversity of auditor-client relationships, services, and audit methods has emerged under this regime.
Interviewees disagreed about who qualifies as a legitimate auditor and what constitutes legitimate auditing practice. We found that some auditors maintain a distant relationship with their clients, delivering just the simple audit test the law requires. Others argued that a more robust audit requires a closer relationship, to gain sustained access to company data. In addition, some auditors offer the audit plus consulting services to improve client practices, like advising on how to mitigate bias in future. One organisation decided to provide a quality-checking service that assesses the robustness of completed audit reports. LL 144 allows this range of client relationships as long as auditors meet the independence criteria – which under the law require only that the auditor have no financial stake in the company it is auditing.
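For readers unfamiliar with the mechanics, the 'simple audit test' here is the impact-ratio calculation LL 144 requires: the selection rate of each demographic category divided by the selection rate of the most-selected category, computed for sex and race/ethnicity categories and their intersections (the rules define an analogous ratio over scoring rates for tools that score rather than select, which is omitted here). Below is a minimal sketch of that calculation in Python; the function name, input format, and group labels are illustrative, not drawn from the law or the paper.

```python
from collections import defaultdict

def impact_ratios(outcomes):
    """Compute LL 144-style impact ratios from (category, selected) records.

    A category's impact ratio is its selection rate divided by the
    selection rate of the most-selected category.
    """
    selected = defaultdict(int)
    total = defaultdict(int)
    for category, was_selected in outcomes:
        total[category] += 1
        selected[category] += int(was_selected)

    rates = {c: selected[c] / total[c] for c in total}
    best = max(rates.values())
    if best == 0:  # no candidates selected at all; ratios are undefined
        return {c: 0.0 for c in rates}
    return {c: rate / best for c, rate in rates.items()}

# Illustrative data: (demographic category, advanced by the AEDT?)
outcomes = [
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
]
print(impact_ratios(outcomes))  # {'group_a': 1.0, 'group_b': 0.5}
```

Auditors often read these ratios against the EEOC's four-fifths rule of thumb (a ratio below 0.8 suggesting adverse impact), but LL 144 itself sets no pass/fail threshold – a point Lara returns to below.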
LL 144 narrowly defines AEDTs as tools used to 'substantially assist' a hiring decision. This creates a loophole: many employers believed they fell outside the scope of the regime, either because they used AI only during screening, without it directly producing a decision, or because they used AEDTs only as decision support. Some of these employers underwent an audit but did not publish the audit report if they thought they could argue their way out of scope. This very much undermines LL 144's objective of promoting greater transparency. A complementary study by my co-authors, Jacob Metcalf and Briana Vecchione at Data & Society, found that only eighteen audit reports have been published to date – out of potentially thousands.
The regime also conceptualises bias narrowly, focusing on gender, race, and their intersection, which misses potential biases related to other protected characteristics such as age, disability, and socioeconomic status. Nor did LL 144 create a sufficient participatory role for impacted persons, such as employment candidates and trade unions.
Ultimately, LL 144 is very procedural and lacks any mechanism to prevent companies from continuing to use biased systems. An employer will 'pass' simply by publishing the audit report, even if the report shows that the AEDT is biased. This allows AI hiring tools to continue being used and sold despite causing unfair outcomes for certain groups, undermining the regime's overarching objective. This contrasts sharply with a regime like the US Food and Drug Administration's pre- and post-market approval and monitoring mechanisms, which prevent unsafe or risky systems from being sold.
Encouragingly, many interviewees said clients were eager to contract bias audits to get ahead of the curve, particularly as further legislation requiring responsible AI practices is coming down the track.
What are the most important takeaways for policymakers designing audit regimes?
Lara: Policymakers should closely consider an audit regime's objectives and ensure there are incentives in place to help obligated parties meet them. LL 144 has left too much discretion over implementation to auditors and clients, which has undermined its overall effectiveness and risks lowering audit standards more widely. Jacob Metcalf, one of my co-authors, is leading additional research on key terms in LL 144, like 'independence' and 'substantially assist', to investigate how these policy choices have affected the regime's operationalisation.
There is also the risk that companies misrepresent the outcomes of audits, suggesting that their tool was found to be free from bias when it was not. This points to the need for some control mechanism over how audit reports are delivered and communicated, so that companies cannot spin an audit's outcome to suit their interests.
Beyond this, we echo calls from the audit community for auditor accreditation, protections for external researchers from legal repercussions (especially concerning larger platforms), and support with good data provenance and data cleaning. Enforcement and sufficient resourcing of regulators are also essential. In New York, we found that the responsible regulator lacks the resources to properly enforce the bias audit regime or impose sanctions for non-compliance.