In Lab 3, we worked with the Standard American English dialect. Some of the sentences that were ungrammatical in this dialect can be grammatical in other dialects. For example, the following sentences are considered grammatical by (at least some) speakers of Indian English in specific situations.
The following sentences, however, are considered ungrammatical.
Note that these sentences are not a comprehensive set of grammatical and ungrammatical sentences; they are just illustrative examples.
Your goal in this homework is to evaluate whether the pretrained LM distilbert/distilgpt2 treats sentences from Indian English as being grammatical.
HW1.py with the main function implemented, which tests your grammar and creates training and validation data
hw1_grammar.txt with the rules for Indian English implemented, along with the rules from Lab 3 implementing the adjectives and adverbs.
Make sure that your grammar from Lab 3 works as intended before trying to modify it.
Based on the examples above, what is the difference between Standard American English and Indian English dialects? Modify your grammar so it can accept sentences from Indian English. Write test cases to test your grammar.
In this part your goal is to evaluate whether the pretrained LM distilbert/distilgpt2 treats sentences from Indian English as being grammatical. One approach is to embed the sentence in a fronted sentential complement.
For example, if you wanted to verify that a sentence like "the panda gave/sent/lent the sandwich" is treated as grammatical, you could compare the following minimal pairs.
the fact that the panda gave/sent/lent the sandwich annoyed my friend
the fact that the panda gave/sent/lent the sandwich to the pajamas annoyed my friend
You could swap out "annoyed" with verbs like "perplexed" and "surprised".
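The minimal pairs above can be generated programmatically rather than typed by hand. The following is a minimal sketch, not part of the provided starter code: the function name `make_minimal_pairs`, its parameters, and the dictionary keys are hypothetical, and the word lists are taken only from the examples above.

```python
from itertools import product

# Word lists copied from the examples above; all names here are illustrative.
EMBEDDED_VERBS = ["gave", "sent", "lent"]
MATRIX_VERBS = ["annoyed", "perplexed", "surprised"]


def make_minimal_pairs(subject="the panda", theme="the sandwich",
                       pp="to the pajamas", matrix_object="my friend"):
    """Return one dict per (embedded verb, matrix verb) combination.

    Each dict holds the two members of a minimal pair: the embedded
    clause without the prepositional phrase and the same clause with it.
    """
    pairs = []
    for verb, matrix in product(EMBEDDED_VERBS, MATRIX_VERBS):
        clause = f"{subject} {verb} {theme}"
        pairs.append({
            "verb": verb,
            "matrix_verb": matrix,
            "without_pp": f"the fact that {clause} {matrix} {matrix_object}",
            "with_pp": f"the fact that {clause} {pp} {matrix} {matrix_object}",
        })
    return pairs


if __name__ == "__main__":
    for pair in make_minimal_pairs():
        print(pair["without_pp"])
        print(pair["with_pp"])
```

With three embedded verbs and three matrix verbs this produces nine pairs. How the pairs are then written out depends on the input format the evaluation expects, which this sketch does not assume.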
In the google doc template, answer the following questions:
(ROI)
If distilbert/distilgpt2 considered sentences from Indian English to be grammatical, what patterns would you expect to see in the microdiff column or the accuracy column of your results file? Why?
In this part you should use the NLP Scholar pipeline to more systematically evaluate whether distilbert/distilgpt2 treats Indian English sentences as being grammatical.
Here are some things you should figure out before you run the pipeline:
sentid, pairid, contextid and condition?
lemmas can you use?
In the google doc template answer the following questions:
Generate 10000 sentences from your grammar with a maximum depth of 6.
Finetune the distilbert/distilgpt2 model on this data. Use 90% of the sentences for training and 10% for validation.
Evaluate your finetuned model on the same sentences from Indian English.
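The 90%/10% split described above could be done along the following lines. This is only a sketch: the function name `split_train_valid`, the fixed seed, and the placeholder sentences are assumptions, and the main function in HW1.py may organize this step differently.

```python
import random


def split_train_valid(sentences, valid_fraction=0.1, seed=0):
    """Shuffle a copy of `sentences` and split it into (train, valid).

    A fixed seed makes the split reproducible across runs.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


if __name__ == "__main__":
    # Placeholder strings standing in for the 10000 generated sentences.
    sentences = [f"sentence {i}" for i in range(10000)]
    train, valid = split_train_valid(sentences)
    print(len(train), len(valid))
```

Shuffling before splitting avoids putting sentences generated late in the run (which may differ systematically from early ones) all in the validation set.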
In the google doc template answer the following questions:
What are the limitations of the experiment you ran? What are some changes you would make to the experimental setup if you wanted to more robustly study the following questions: