The purpose of this lab is to give you hands on experience using the
NLPScholar toolkit to answer a question about the
linguistic knowledge of a pre-trained transformer language model. By
completing this lab, you will demonstrate that you can:
This lab assumes that you have already cloned the
NLPScholar repository and have installed the
nlp environment by following the instructions in
Install.md.
This lab has one part:
sample_results.tsvBefore starting each lab, get the latest version of the
NLPScholar repo by first navigating to the folder on
terminal and then executing:
git pull
Additionally, a package is missing that you need for today. With the nlp environment activated run:
pip install seaborn
Consider the following motivating examples:
Technically, the pronoun in both 1 and 2 is ambiguous. However,
speakers report strong preferences for who she should refer
to in these sentences. Take a minute to check your judgments.
The core insight is that speakers prefer she to refer to
the subject Sally in 1 and the object Mary in
2. The sentences are otherwise the same, so it must be the verbs
frightened and feared which modulate
preferences. That is, these sentences form a minimal
pair where the main verb (frightened or
feared) is varied.
In fact, many (possibly all) languages have verbs like this (Harshorne et al.,
2013). These verbs are called implicit causality verbs.
There are two types: subject implicit causality verbs like
frightened and object implicit causality verbs
like feared. Our research question today is Do
transformer-based language models learn implicit causality? We
will narrow this to a sub-question: Does distilgpt2 learn the
implicit causality bias of verbs? Your tasks in this lab is to
answer this question.
In this first part, think through with your group how can answer this question using the toolkit. Here’s some things to keep in mind to help get you started:
distilbert/distilgpt2 on HuggingFace. Try out some
sentences using interact to see if you can use probability
as a depedent measure.| Subject IC | Object IC |
|---|---|
| frightened | feared |
| bored | believed |
| frustrated | encouraged |
| betrayed | cherished |
| amazed | blamed |
| confused | divorced |
| amused | revered |
| worried | trusted |
| haunted | liked |
| upset | valued |
In this part, you should use the interact mode to test
some initial ideas for how to evaluate the model’s knowledge. To help
scaffold you here, consider this google sheet.
It includes the format you should use to organize your experiment on the
sheet labeled data. There are columns included to help you think through
what information you should included. See the MinimalPairAnalysis
document for more details on these column names.
Using interact mode you should fill in the results table
with your initial explorations. You should develop by the end of this
sentences and an initial result by aggregating over your results table
(in the results sheet)