Curating a Releasable Large-Scale Dataset for Common Logical Fallacies in Online Conversations

Sponsoring Agency
Meta Platforms, Inc.


Many datasets have been devoted to enabling AI systems to identify misinformation, fake news, hate speech, and trolling posts in online conversations. While this work is important, the detection and comprehension of the underlying logical fallacies that fundamentally undermine these texts' logical validity remain unexplored. One bottleneck is the lack of benchmark datasets, as annotating logical fallacies is challenging. We propose constructing LOGIFALLA, the first public benchmark dataset for common logical fallacies in online conversations. The core idea is to hire a group of crowd workers to simulate an online discussion thread about a given news article, where we secretly ask a subset of these workers to respond using specified types of logical fallacies (e.g., straw man, slippery slope). LOGIFALLA will include 6,000 discussion threads with a total of 36,000 posts, an average of six posts per thread. Each post will be annotated with a logical fallacy label (Yes/No) and, if Yes, a fallacy type; the source news articles will also be included. LOGIFALLA can move the AI community from spotting problematic language towards understanding the problematic reasoning behind it. We believe this will help create online discussion spaces that are more transparent, healthy, and safe.