Karthik Prasanna N, supervised by Dr. Ashwin Jayanti, received his Master of Science – Dual Degree in Computing & Human Sciences (CHD). Here is a summary of his research work, "Do Large Language Models Reason by Following Rules?"
Training and evaluating large language models (LLMs) on deductive reasoning tasks has attracted considerable attention in recent years. Some of these efforts report results suggesting that LLMs behave like humans by displaying content effects: they reason better on tasks whose rules align with our everyday beliefs and poorly when the rules violate those beliefs. Others, using chain-of-thought prompting, a technique that prompts an LLM to generate intermediate steps before arriving at a final conclusion, suggest that LLMs may emulate human-like reasoning processes. On the other hand, some studies conclude that language models do not yet demonstrate reliable deductive reasoning, since their performance degrades under perturbations such as synonym substitution, and attribute their apparent reasoning to artefacts of the training data.

In this study, we examine these three distinct approaches to evaluating the deductive competence of LLMs, each employing a different criterion: content-based reasoning (Dasgupta et al., 2023 [1]), chain-of-thought prompting (Wei et al., 2023 [2]), and the introduction of perturbations (Yuan et al., 2023 [3]). To make sense of these claims about genuine reasoning, we introduce a framework developed by Diane Proudfoot that uses externalist criteria for machine cognition (Proudfoot, 2004 [4]) and is grounded in Wittgenstein's argument that deductive reasoning involves rule-following, which is normative in nature. On the view that deductive competence consists in following normative rules, the framework proposes the Wittgensteinian distinction between rule-following and quasi rule-following as a method for separating genuine deductive competence from quasi-competence. We adapt the criteria distinguishing rule-following from quasi rule-following to analyse whether these LLMs can be said to reason genuinely or are merely imitating reasoning-like behaviour.

We propose this use of Proudfoot's rule-following criteria as a framework for distinguishing genuine deductive competence from quasi deductive competence. In doing so, we also draw attention to the limitations and implications of Proudfoot's claims about machine cognition by introducing a thought experiment. The thought experiment enables us to think through Proudfoot's argument that it is for pragmatic reasons, and not in principle, that LLMs are unlikely to possess genuine reasoning.
July 2025

