The rapid growth in natural language processing (NLP) over the last couple of years has generated student interest and excitement in learning more about the field. In this paper, we present two types of students that NLP courses might want to train. First, an “NLP engineer” who is able to flexibly design, build and apply new technologies in NLP for a wide range of tasks. Second, an “NLP scholar” who is able to pose, refine and answer questions in NLP and how it relates to society, while also learning to effectively communicate these answers to a broader audience. While these two types of skills are not mutually exclusive — NLP engineers should be able to think critically, and NLP scholars should be able to build systems — we think that courses can differ in the balance of these skills. As educators at small liberal arts colleges, the strengths of our students and our institutions favor an approach that is better suited to training NLP scholars. In this paper we articulate what kinds of skills an NLP scholar should have, and then adopt a backwards design to propose course components that can aid the acquisition of these skills.
2023
Can Language Models Be Tricked by Language Illusions? Easier with Syntax, Harder with Semantics
Language models (LMs) have been argued to overlap substantially with human beings in grammaticality judgment tasks. But when humans systematically make errors in language processing, should we expect LMs to behave like cognitive models of language and mimic human behavior? We answer this question by investigating LMs’ more subtle judgments associated with “language illusions” – sentences that are vague in meaning, implausible, or ungrammatical but receive unexpectedly high acceptability judgments by humans. We looked at three illusions: the comparative illusion (e.g. “More people have been to Russia than I have”), the depth-charge illusion (e.g. “No head injury is too trivial to be ignored”), and the negative polarity item (NPI) illusion (e.g. “The hunter who no villager believed to be trustworthy will ever shoot a bear”). We found that probabilities represented by LMs were more likely to align with human judgments of being “tricked” by the NPI illusion, which examines a structural dependency, than by the comparative and depth-charge illusions, which require sophisticated semantic understanding. No single LM or metric yielded results that were entirely consistent with human behavior. Ultimately, we show that LMs are limited both in their construal as cognitive models of human language processing and in their capacity to recognize nuanced but critical information in complicated language materials.
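To make the evaluation setup concrete, here is a minimal sketch (not the code, models, or stimuli used in the paper) of scoring a whole sentence's probability under a pretrained LM so that an illusion item can be compared against a control. The model choice (GPT-2 via Hugging Face transformers) and the example sentences are illustrative assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of the sentence's tokens under the LM (higher = more probable)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the returned loss is the mean cross-entropy over predicted tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

# Illustrative comparative-illusion item and a grammatical control (not the paper's stimuli).
illusion = "More people have been to Russia than I have."
control = "More people have been to Russia than linguists have."
print(sentence_logprob(illusion), sentence_logprob(control))
```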
2022
On the Limitations of Data: Mismatches between Neural Models of Language and Humans
The majority of work at the intersection of computational linguistics and natural language processing aims to show, process by process, that human linguistic behavior (and knowledge) is reducible to a simple learning objective (e.g., predicting the next word) applied to unstructured linguistic data (e.g., written data). This dissertation uses three test cases to show concrete instances where current reductionist approaches fall short of human linguistic knowledge. In the first case study, implicit causality, competition among multiple linguistic processes is shown to obscure human-like behavior in models. This challenges existing methodologies that rely on the investigation of individual linguistic processes in isolation and points to a mismatch between human linguistic systems and those built solely on the basis of linguistic data. In the second case study, ambiguous relative clause attachment, models of Spanish and English are compared to show that, while models appear to mimic humans in English, they fail to do so in Spanish. The failure of computational models of Spanish follows from a mismatch between data produced by speakers and speakers’ interpretation preferences, and it is argued that this reflects fundamental limitations of text data. In the third case study, Principle B and incremental processing, it is demonstrated that, while humans use hard constraints to restrict their online processing of pronouns, computational models do not. The inability of models to process language incrementally like humans indicates a mismatch between linguistic data and the human parser. This dissertation argues that data are not sufficient to instruct models about fundamental aspects of human language. Ultimately, in using techniques from psycholinguistics and careful cross-linguistic comparison, it is argued that neural models can reveal specific areas of linguistic knowledge where data is not enough, suggesting in turn what the human mind itself must contribute.
Incremental Processing of Principle B: Mismatches Between Neural Models and Humans
Forrest Davis
In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), Dec 2022
Despite neural language models qualitatively capturing many human linguistic behaviors, recent work has demonstrated that they underestimate the true processing costs of ungrammatical structures. We extend these more fine-grained comparisons between humans and models by investigating the interaction between Principle B and coreference processing. While humans use Principle B to block certain structural positions from affecting their incremental processing, we find that GPT-based language models are influenced by ungrammatical positions. We conclude by relating the mismatch between neural models and humans to properties of training data and suggest that certain aspects of human processing behavior do not directly follow from linguistic data.
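As an illustration of the kind of incremental measure involved in such comparisons, the following is a minimal sketch of computing word-by-word surprisal, a standard proxy for incremental processing cost. It assumes GPT-2 via Hugging Face transformers and a made-up pronoun sentence, not the models or items reported in the paper.

```python
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_surprisals(sentence: str):
    """Surprisal (-log2 p) of each token given its left context."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Position t predicts token t+1, so align logits[:-1] with ids[1:].
    logprobs = F.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nats = -logprobs[torch.arange(targets.size(0)), targets]
    bits = nats / torch.log(torch.tensor(2.0))
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()), bits.tolist()))

# Illustrative pronoun sentence, not an item from the paper.
for tok, s in token_surprisals("The boy said that the girl admired him yesterday."):
    print(f"{tok:>12}  {s:6.2f} bits")
```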
2021
Uncovering Constraint-Based Behavior in Neural Models via Targeted Fine-Tuning
In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Aug 2021
A growing body of literature has focused on detailing the linguistic knowledge embedded in large, pretrained language models. Existing work has shown that non-linguistic biases in models can drive model behavior away from linguistic generalizations. We hypothesized that competing linguistic processes within a language, rather than just non-linguistic model biases, could obscure underlying linguistic knowledge. We tested this claim by exploring a single phenomenon in four languages: English, Chinese, Spanish, and Italian. While human behavior has been found to be similar across languages, we find cross-linguistic variation in model behavior. We show that competing processes in a language act as constraints on model behavior and demonstrate that targeted fine-tuning can re-weight the learned constraints, uncovering otherwise dormant linguistic knowledge in models. Our results suggest that models need to learn both the linguistic constraints in a language and their relative ranking, with mismatches in either producing non-human-like behavior.
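For readers unfamiliar with the procedure, this is a minimal sketch of the general shape of targeted fine-tuning: continue training the LM on a small set of sentences that exemplify one constraint, then re-run the behavioral evaluation. The model (GPT-2), the toy items, and the hyperparameters below are assumptions for illustration, not the setup reported in the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.train()

# Hypothetical targeted items exemplifying one constraint; the paper uses
# controlled psycholinguistic stimuli in each language.
targeted = [
    "John frightened Mary because he was so menacing.",
    "Mary admired John because he was so talented.",
]
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for epoch in range(3):            # a handful of passes: small, targeted exposure,
    for sentence in targeted:     # not full retraining
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.eval()  # afterwards, re-test behavior on held-out evaluation items
```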
Finding Event Structure in Time: What Recurrent Neural Networks can tell us about Event Structure in Mind
Under a theory of event representations that defines events as dynamic changes in objects across both time and space, as in the proposal of Intersecting Object Histories (Altmann & Ekves, 2019), the encoding of changes in state is a fundamental first step in building richer representations of events. In other words, there is an inherent dynamic that is captured by our knowledge of events. In the present study, we evaluated the degree to which this dynamic was inferable from just the linguistic signal, without access to visual, sensory, and embodied experience, using recurrent neural networks (RNNs). Recent literature exploring RNNs has largely focused on syntactic and semantic knowledge. We extend this domain of investigation to representations of events within RNNs. In three studies, we find preliminary evidence that RNNs capture, in their internal representations, the extent to which objects change states; for example, that chopping an onion changes the onion by more than just peeling the onion. Moreover, the temporal relationship between state changes is encoded to some extent. We found that RNNs are sensitive to how chopping an onion and then weighing it, or first weighing it, entails the onion that is being weighed being in a different state depending on the adverb. Our final study explored what factors influence the propagation of these rudimentary event representations forward into subsequent sentences. We conclude that while there is much still to be learned about the abilities of RNNs (especially with respect to the extent to which they encode objects as specific tokens), we still do not know what the equivalent representational dynamics in humans are. That is, we take the perspective that the exploration of computational models points us to important questions about the nature of the human mind.
2020
Discourse structure interacts with reference but not syntax in neural language models
Language models (LMs) trained on large quantities of text have been claimed to acquire abstract linguistic representations. Our work tests the robustness of these abstractions by focusing on the ability of LMs to learn interactions between different linguistic representations. In particular, we utilized stimuli from psycholinguistic studies showing that humans can condition reference (i.e. coreference resolution) and syntactic processing on the same discourse structure (implicit causality). We compared both transformer and long short-term memory LMs to find that, contrary to humans, implicit causality only influences LM behavior for reference, not syntax, despite model representations that encode the necessary discourse information. Our results further suggest that LM behavior can contradict not only learned representations of discourse but also syntactic agreement, pointing to shortcomings of standard language modeling.
A standard approach to evaluating language models analyzes how models assign probabilities to valid versus invalid syntactic constructions (i.e. is a grammatical sentence more probable than an ungrammatical sentence). Our work uses ambiguous relative clause attachment to extend such evaluations to cases of multiple simultaneous valid interpretations, where stark grammaticality differences are absent. We compare model performance in English and Spanish to show that non-linguistic biases in RNN LMs advantageously overlap with syntactic structure in English but not Spanish. Thus, English models may appear to acquire human-like syntactic preferences, while models trained on Spanish fail to acquire comparable human-like preferences. We conclude by relating these results to broader concerns about the relationship between comprehension (i.e. typical language model use cases) and production (which generates the training data for language models), suggesting that necessary linguistic biases are not present in the training signal at all.
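A minimal sketch of the general evaluation idea follows: scoring two disambiguating continuations of a shared ambiguous prefix as a proxy for attachment preference. The model (GPT-2 via Hugging Face transformers) and the item below are illustrative assumptions, not the RNN LMs or controlled English and Spanish materials used in the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Log-probability of the continuation tokens given the prefix."""
    prefix_len = tokenizer(prefix, return_tensors="pt").input_ids.size(1)
    full_ids = tokenizer(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    scores = logprobs[torch.arange(targets.size(0)), targets]
    # Keep only the scores for tokens in the continuation region.
    return scores[prefix_len - 1:].sum().item()

# Illustrative item; number agreement on the verb disambiguates the attachment site.
prefix = "Andrew had dinner with the nephew of the teachers"
high = " who was divorced."    # "was" agrees with "nephew"   (high attachment)
low  = " who were divorced."   # "were" agrees with "teachers" (low attachment)
print(continuation_logprob(prefix, high), continuation_logprob(prefix, low))
```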
Interaction with Context During Recurrent Neural Network Sentence Processing
Syntactic ambiguities in isolated sentences can lead to increased difficulty in incremental sentence processing, a phenomenon known as a garden-path effect. This difficulty, however, can be alleviated for humans when they are presented with supporting discourse contexts. We tested whether recurrent neural network (RNN) language models (LMs) could learn linguistic representations that are similarly influenced by discourse context. RNN LMs have been claimed to learn a variety of syntactic constructions. However, recent work has suggested that pragmatically conditioned syntactic phenomena are not acquired by RNNs. In comparing model behavior to human behavior, we show that our models can, in fact, learn pragmatic constraints that alleviate garden-path effects given the correct training and testing conditions. This suggests that some aspects of linguistically relevant pragmatic knowledge can be learned from distributional information alone.
2017
Linguistically Rich Vector Representations of Supertags for TAG Parsing
Dan Friedman, Jungo Kasai, R. Thomas McCoy, and 3 more authors
In Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms, Sep 2017