Rishabh Maheshwary received his Master of Science in Computer Science and Engineering (CSE). His research was supervised by Prof. Vikram Pudi. Here's a summary of his thesis work on Evaluating the Robustness of Deep Learning Models for NLP:
The significance of deep neural networks (DNNs) has been well established through their success in a variety of tasks. However, recent studies have shown that DNNs are vulnerable to adversarial examples: inputs crafted by adding small perturbations to the original input. Such perturbations are almost imperceptible to humans but deceive DNNs, raising major concerns about their utility in real-world applications. Although existing adversarial attack methods in NLP achieve a high success rate against DNNs, they either require detailed information about the target model or its training data, or need a large number of queries to generate attacks. Such attack methods are unrealistic, because they do not reflect the kinds of attacks encountered in the real world, and less effective, because attacks that rely on model information and excessive queries can be easily defended against. In this thesis, we address these drawbacks by proposing two realistic attack settings: the hard label black box setting and the limited query setting. We then propose two novel attack methods that craft plausible and semantically similar adversarial examples in these settings. The first attack method uses a population-based optimization procedure to craft adversarial examples in the hard label black box setting. The second is a query-efficient method that leverages word attention scores and locality sensitive hashing to find important words for substitution in the limited query setting. We benchmark our results on the same search space used by prior attacks to ensure a fair and consistent comparison. To improve the quality of the generated adversarial examples, we also propose an alternative method that uses masked language models to find candidate words for substitution, considering both the original word and its surrounding context. We demonstrate the efficacy of each proposed approach by attacking NLP models for text classification and natural language inference tasks. In addition, we use adversarial examples to evaluate the robustness and generalization of recent math word problem solvers. Our results show that DNNs for the above tasks are not robust, as they can be deceived by our proposed attack methods even in highly restricted settings. Finally, we conduct human evaluation studies to verify the validity and quality of the generated adversarial examples.
Contributions
In this thesis, we address the above drawbacks by introducing two realistic attack settings for evaluating NLP models: the hard label black box setting and the limited query setting. We then propose two novel attack methods that generate plausible and semantically similar adversarial examples in these settings. In addition, we evaluate the robustness and generalization of math word problem solvers using adversarial examples. We also conduct human evaluation to ensure that the generated adversarial examples are valid, semantically similar, and grammatically correct.
- Hard Label Black Box Setting: First, we propose a novel attack setting, the hard label black box setting, which requires no information about the model architecture, parameters, confidence scores, or training data; it assumes access only to the top predicted label. Next, we propose a mechanism that uses a population-based optimization procedure to successfully generate adversarial examples in this setting, where all prior methods fail. Compared to previous attack strategies, our attack achieves a higher success rate (more than 90%) and a lower perturbation rate, despite operating in a far more restricted setting. The hard label black box setting has recently been explored in vision, but to the best of our knowledge we are the first to explore it in the NLP domain. (A sketch of the attack loop appears after this list.)
- Limited Query Setting: First, we propose a novel ranking mechanism that jointly leverages word attention scores and locality sensitive hashing (LSH) to rank the input words. It requires significantly fewer queries than prior black box methods and works best in a limited query setting. Next, we benchmark our attack method on the same search space used in the respective baselines, and we further evaluate its effectiveness against four baselines across three different search spaces. On average, our method is 50% faster, taking 75% fewer queries than prior attacks while giving up less than 2% in attack success rate. (The query-saving idea is sketched after this list.)
- Context-Aware Adversarial Attack: We propose an attack that generates candidate words using the influence of both the original word (to be replaced) and its surrounding context. It uses BERT, a masked language model (MLM) trained on the masked word and next sentence prediction tasks, to generate candidates for each word to be replaced in the input. (See the third sketch below.)
- Evaluating Math Word Problem Solvers: Although adversarial examples are commonly used for NLP tasks such as question answering, machine translation, dialogue systems, text classification, and natural language inference, there was no prior work that used adversarial examples to evaluate math word problem solvers. A Math Word Problem (MWP) consists of a natural language text describing a world state involving known and unknown quantities; the task is to parse the text and generate equations that find the values of the unknown quantities. Standard accuracy metrics show that MWP solvers achieve high performance on benchmark datasets, yet the extent to which existing solvers truly understand language and its relation to numbers remains unclear. In Chapter 6, we bridge this gap and evaluate the robustness of state-of-the-art MWP solvers against adversarial examples. Generating adversarial attacks for MWPs is challenging because, apart from textual semantics, numerical values must also be preserved: the text must make mathematical sense, and the sequence of events must be maintained so that humans would derive the same equations from the problem text. We propose two methods, Question Reordering and Sentence Paraphrasing, and use them to generate adversarial examples against three MWP solvers across two benchmark datasets. On average, the generated adversarial examples reduce the accuracy of MWP solvers by over 40%. Further, we experiment with different types of input embeddings and perform adversarial training using our proposed methods. (The final sketch below illustrates Question Reordering.)
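To make the hard label setting concrete, here is a minimal, self-contained sketch of a population-based hard-label attack loop. Everything here is a toy assumption for illustration: `predict_label` stands in for a real model exposing only its top label, and `SYNONYMS` stands in for the counter-fitted synonym search space; the actual optimization procedure in the thesis is more involved.

```python
import random

random.seed(0)

# Toy "hard label" classifier: the attacker observes only the predicted
# label, never confidence scores. Hypothetical stand-in for a real model.
POSITIVE = {"good", "great", "excellent"}

def predict_label(tokens):
    return "pos" if sum(t in POSITIVE for t in tokens) >= 2 else "neg"

# Hypothetical synonym search space (a real attack would use a
# counter-fitted embedding space or similar).
SYNONYMS = {
    "good": ["fine", "decent"],
    "great": ["notable", "fine"],
    "excellent": ["solid", "fine"],
}

def perturb(tokens):
    """Substitute one random replaceable word with a random synonym."""
    idxs = [i for i, t in enumerate(tokens) if t in SYNONYMS]
    if not idxs:
        return tokens
    out = list(tokens)
    i = random.choice(idxs)
    out[i] = random.choice(SYNONYMS[out[i]])
    return out

def n_changes(orig, cand):
    return sum(a != b for a, b in zip(orig, cand))

def hard_label_attack(orig, pop_size=6, generations=10):
    """Population-based search: seed with label-flipping perturbations,
    then evolve toward fewer substitutions while staying adversarial."""
    orig_label = predict_label(orig)
    population = []
    while len(population) < pop_size:
        cand = orig
        for _ in range(len(orig)):
            cand = perturb(cand)
            if predict_label(cand) != orig_label:
                population.append(cand)
                break
    for _ in range(generations):
        population.sort(key=lambda c: n_changes(orig, c))
        parents = population[: pop_size // 2]
        children = []
        for p in parents:
            # Mutation: try reverting one changed word to the original;
            # keep the child only if it still flips the label.
            changed = [i for i in range(len(orig)) if p[i] != orig[i]]
            child = list(p)
            if changed:
                i = random.choice(changed)
                child[i] = orig[i]
            children.append(child if predict_label(child) != orig_label else p)
        population = parents + children
    return min(population, key=lambda c: n_changes(orig, c))

tokens = "the movie was good and the cast was great".split()
adv = hard_label_attack(tokens)
print(" ".join(adv), "->", predict_label(adv))
```

The key constraint the sketch respects is that every fitness decision is made from label flips alone, never from scores, which is what distinguishes the hard label setting from score-based black box attacks.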
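The second sketch illustrates only the query-saving intuition behind the LSH-based ranking: words whose embeddings collide under random-hyperplane LSH are treated as near-duplicates, so the target model is consulted once per bucket instead of once per word. The embedding and scoring functions below are placeholders, not the attention mechanism used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_PLANES = 16, 8

# Placeholder word embeddings; a real attack would use pretrained vectors.
def embed(word):
    seed = abs(hash(word)) % (2**32)
    return np.random.default_rng(seed).standard_normal(DIM)

def lsh_signature(vec, planes):
    """Random-hyperplane LSH: hash a vector to a tuple of sign bits."""
    return tuple(bool(b) for b in (planes @ vec > 0))

def rank_words(words, score_word):
    """Rank input words by importance, spending one query per LSH bucket
    instead of one per word: colliding words share a score."""
    planes = rng.standard_normal((N_PLANES, DIM))
    buckets = {}
    for w in words:
        buckets.setdefault(lsh_signature(embed(w), planes), []).append(w)
    ranked, queries = [], 0
    for bucket in buckets.values():
        queries += 1                  # single query for the whole bucket
        s = score_word(bucket[0])     # e.g. an attention-based importance
        ranked.extend((w, s) for w in bucket)
    ranked.sort(key=lambda ws: -ws[1])
    return ranked, queries

words = "the plot was dull but the acting was superb".split()
ranked, n_queries = rank_words(words, score_word=len)  # toy importance
print(n_queries, "queries for", len(words), "words")
print([w for w, _ in ranked][:3])
```

Fewer buckets than words means fewer model queries, which is where the query savings in the limited query setting come from.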
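The third sketch shows one way to obtain context-aware candidates from BERT's MLM head, using the Hugging Face transformers fill-mask pipeline. This library choice is an assumption for illustration; note also that plain masking conditions only on the surrounding context, whereas the thesis's method additionally accounts for the original word being replaced.

```python
from transformers import pipeline

# BERT's masked-language-model head via the Hugging Face fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def candidate_words(tokens, position, top_k=5):
    """Mask the word at `position` and let BERT propose replacements
    that fit the surrounding context."""
    masked = list(tokens)
    masked[position] = fill_mask.tokenizer.mask_token  # "[MASK]"
    preds = fill_mask(" ".join(masked), top_k=top_k)
    # Each prediction carries the proposed token and its MLM probability.
    return [(p["token_str"], p["score"]) for p in preds]

tokens = "the film was a great success".split()
print(candidate_words(tokens, tokens.index("great")))
```

Because the candidates are scored by the language model in context, they tend to be grammatical and plausible, which is exactly the quality improvement this contribution targets.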
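Finally, here is one plausible reading of the Question Reordering transformation, assuming the question is the final sentence of the problem. This is an illustrative sketch, not the exact implementation from the thesis.

```python
import re

def reorder_question(problem):
    """Question Reordering: move the trailing question sentence to the
    front of the problem text. The described world state is unchanged,
    so a human would still derive the same equation."""
    sentences = re.split(r"(?<=[.!?])\s+", problem.strip())
    *body, question = sentences
    return " ".join([question] + body)

mwp = ("John had 5 apples. He bought 3 more apples. "
       "How many apples does John have now?")
print(reorder_question(mwp))
# How many apples does John have now? John had 5 apples. He bought 3 more apples.
```

A solver that truly parses the problem should produce the same equation for both orderings; the reported 40%+ accuracy drop indicates that current solvers rely heavily on surface position cues.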
Publications
- R. Maheshwary, S. Maheshwary, and V. Pudi, “Generating natural language attacks in a hard label black box setting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 13525–13533, 2021
- R. Maheshwary, S. Maheshwary, and V. Pudi, “A strong baseline for query efficient attacks in a black box setting,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, (Online and Punta Cana, Dominican Republic), pp. 8396–8409, Association for Computational Linguistics, Nov. 2021
- R. Maheshwary, S. Maheshwary, and V. Pudi, “A context aware approach for generating natural language attacks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 15839–15840, 2021
- V. Kumar, R. Maheshwary, and V. Pudi, “Adversarial examples for evaluating math word problem solvers,” in Findings of the Association for Computational Linguistics: EMNLP 2021, (Punta Cana, Dominican Republic), pp. 2705–2712, Association for Computational Linguistics, Nov. 2021
April 2023