Claim replicability may help prevent harms caused by ML

Machine learning (ML) scientists traditionally justify the validity of their research claims by showing that a model's performance can be replicated. But according to one graduate student in the College of Information Sciences and Technology (IST), model performance replicability may not be enough to align the field with the goal of machine learning for social good. 

Tianqi Kou, a student in IST’s informatics doctoral program, proposed that adding, and prioritizing, claim replicability may improve the current accountability mechanism. His paper, “From Model Performance to Claim: How a Change of Focus in Machine Learning Replicability Can Help Bridge the Responsibility Gap,” was recently accepted to the Association for Computing Machinery Conference on Fairness, Accountability and Transparency (ACM FAccT), which was held June 3-6 in Rio de Janeiro. 

In recent years, in light of the responsibility gap (the challenge of discerning clear lines of responsibility for ML harms among the diverse parties involved), the artificial intelligence (AI) ethics community has taken an interest in understanding ML scientists’ responsibility, according to Kou. However, he said, “motivating engagement in social considerations using impact statements and checklists” falls short in practice. 

According to Kou, model performance replicability focuses on numerical evaluation but neglects “the importance of scrutinizing the validity of what is being said about those numbers.” Claim replicability, by contrast, draws attention to the qualitative claims made in ML papers. 

“The rhetorical power of implicit or explicit research claims tends to be overlooked,” Kou said. “Numbers cannot by themselves legitimize the traveling of a method beyond the laboratory; rather, their circulation is enabled by claims about the benefits or functionalities the model can bring.” 

Kou offered Northpointe’s recidivism prediction algorithm, COMPAS, as a prominent example. Under model performance replicability, COMPAS’s validity is granted when the model’s accuracy is replicated. In practice, however, the claim that COMPAS enhances equality and reduces human bias failed to replicate: racial discrimination and human prejudice remain present. What model performance replicability overlooks, claim replicability calls into question. 

“Because upholding replicability is central to the scientific self,” Kou said, “claim replicability — which necessitates sound reasoning about social contexts — generates a stronger obligation to engage in social reflections.” 

According to Kou, initial steps toward actualizing claim replicability include presenting the complete evidential structure for implicit and explicit claims, increasing evidential diversity, avoiding open-ended expressions and making humble claims when writing papers. 

As a Science and Technology Studies (STS) scholar, Kou believes implementing claim replicability requires an understanding of how ML research claims are formulated, expressed and travel. According to Kou, ML research claims generate different consequences depending on their audiences: reviewers, engineers, policymakers, auditors, curious laypersons, impacted communities, critical computing scholars and scientists from other disciplines. Such understanding is crucial for regulating pernicious knowledge claims that precede concrete harms from technological innovations. 

Advancing claim replicability as a new communication and evaluation norm must not fall only on the ML community. Conceptual understanding and tool-building require joint efforts from communities such as AI ethics, human-computer interaction and STS, according to Kou, who feels “lucky to be in an information school where interdisciplinary expertise and conversations are vibrant.”