The due date for this codelet is Monday, Jan 27 at 11:59PM.
The following exercises reintroduce you to Python. By the end of this codelet, you should have demonstrated (re)familiarity with key aspects of the language.
Your task is to download codelet0.zip from the course website, open it, and complete the exercises in codelet0.py. In the archive you will find these instructions, some relevant data files, and codelet0.py, which has scaffolding for you.
When your code is assessed, satisfactory achievement is demonstrated, in part, by:
doctest is a useful testing module that ships with Python’s standard library; its full documentation is part of the official Python documentation. What’s relevant here is that it lets you specify test cases in docstrings. Consider the example below:
import doctest

def add(a: int, b: int) -> int:
    """ Returns the sum

    Args:
        a (int): First number
        b (int): Second number

    Returns:
        int: Sum of a and b

    >>> add(3, 4)
    7
    >>> add(0, 1)
    1
    """
    return a + b

if __name__ == '__main__':
    doctest.testmod(verbose=True)
Here, we add two simple test cases to the add function. Notice that the expected output goes on the line directly under the >>> line, with no extra indentation. If our function returned the wrong value, running doctest.testmod(verbose=True) would tell us which case failed.
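doctest.testmod(verbose=True) is all the codelet needs, but it can be instructive to watch a doctest fail on purpose. The minimal sketch below uses only the standard library: doctest.DocTestFinder and doctest.DocTestRunner run the examples in a single function’s docstring and return the counts of failed and attempted examples. The names buggy_add and run_doctests are hypothetical, made up for illustration.

```python
import doctest


def add(a: int, b: int) -> int:
    """Return the sum of a and b.

    >>> add(3, 4)
    7
    >>> add(0, 1)
    1
    """
    return a + b


def buggy_add(a: int, b: int) -> int:
    """Return the sum of a and b (but contains a deliberate bug).

    >>> buggy_add(3, 4)
    7
    >>> buggy_add(0, 0)
    0
    """
    return a - b  # Deliberate bug: should be a + b


def run_doctests(func) -> doctest.TestResults:
    """Run only the examples in func's docstring; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Give the examples a namespace containing just the function under test.
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    # summarize() prints a short failure summary (if any) and returns the counts.
    return runner.summarize(verbose=False)


if __name__ == '__main__':
    print(run_doctests(add))        # no failures
    print(run_doctests(buggy_add))  # reports the failing buggy_add(3, 4) example
```

Running the script should report no failures for add. For buggy_add, it prints a report showing that buggy_add(3, 4) was expected to give 7 but got -1, and it counts 1 failure out of 2 attempted examples. doctest keeps going after a failure, which is why the second example is still attempted.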
For the exercises not mentioned below, you should find sufficient detail in their docstrings.
The final exercise is meant to introduce you to classes in Python; you may have only seen classes in Java. To show you how to format a basic class, I’ve provided an example class called Word below:
class Word:
    """
    A class for manipulating words.

    Attributes:
        vowels (set): (class attribute) A set of all vowels
        text (str): Text of the word
    """
    # A variable shared by all instances of the class
    vowels = set(['a', 'e', 'i', 'o', 'u'])

    def __init__(self, text):
        # self identifies the instance of the class and can be used
        # to reference the variables of a specific instance of
        # a class. So for an instance W of Word, W.text will
        # return the text of that W. Within the class methods,
        # self.text can be used to pick out the text of the instance.
        self.text = text

    def whereAreTheVowels(self):
        """
        A function that returns the positions of the vowels in the word.

        Returns:
            list: A list of the positions of the vowels in the word.
        """
        where = []
        for position, letter in enumerate(self.text):
            # Checks if the letter is contained in the set
            # vowels; if so, we append the position to where
            if letter in Word.vowels:
                where.append(position)
        return where
Classes have some special methods. The __init__ method is one of them, and it serves as the class constructor. Moreover, any arguments to __init__ beyond self (we return to self in a moment) are required when instantiating the class. In the case of Word, that means we need to pass in a value for text.
t = 'Fig'
w = Word(t)
Above, we create an instance of Word whose text attribute is “Fig”. Finally, a class can define methods that are called on its instances. For example, we can call the method whereAreTheVowels on w with
w.whereAreTheVowels()
Notice that we don’t pass any arguments, even though self appears in the method definition: Python automatically passes the instance w in as self.
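The calls above can be checked with a short script. This sketch repeats a trimmed copy of the Word class so it runs on its own. It shows three things: w.whereAreTheVowels() is shorthand for Word.whereAreTheVowels(w), the class attribute vowels is shared by all instances, and leaving out text raises a TypeError. The variable name other is only for illustration.

```python
class Word:
    """A class for manipulating words (trimmed copy of the example above)."""

    # Class attribute: one set shared by every Word instance
    vowels = set(['a', 'e', 'i', 'o', 'u'])

    def __init__(self, text):
        self.text = text  # Instance attribute: separate for each Word

    def whereAreTheVowels(self):
        """Return the positions of the (lowercase) vowels in the word."""
        where = []
        for position, letter in enumerate(self.text):
            if letter in Word.vowels:
                where.append(position)
        return where


w = Word('Fig')
print(w.whereAreTheVowels())        # [1]
# Calling the method on the instance is shorthand for passing it as self:
print(Word.whereAreTheVowels(w))    # [1]
# Each instance has its own text, but they share the class attribute vowels.
other = Word('banana')
print(other.whereAreTheVowels())    # [1, 3, 5]
print(w.vowels is other.vowels)     # True
# __init__ requires text, so leaving it out raises a TypeError.
try:
    Word()
except TypeError as err:
    print('TypeError:', err)
```

The vowel check is case-sensitive, so Word('Apple').whereAreTheVowels() reports only the e at position 4, not the capital A.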
For this exercise, follow the description provided and create a class called Linguist with two attributes, name and advisor, and a method addUniversity, which takes one argument and stores it on the instance (via self.).
As a final refresher on Python, conllu2sents.py lays out a task of reading in a file and transforming its data. To help you understand the goals of the function, example input/output pairs are given. As the type hints in its signature indicate, conllu2sent expects the name of a file and returns a list.
The “conllu” in conllu2sent refers to CoNLL-U, a file format used for dependency parsing, among other things (the format is documented by the Universal Dependencies project). It consists of tab-separated columns, each corresponding to a type of linguistic annotation. For example, the second column holds the form of a word (as it appears in the sentence), and the third column holds its lemma or stem; a line with the form “Is”, for instance, would have the lemma “be”. Blank lines mark sentence boundaries, and lines beginning with # are comments (just as # begins a comment in Python).
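As a minimal sketch of the line format just described (not a solution to conllu2sent), the function below classifies a single line as a comment, a sentence boundary, or a token. For token lines, it pulls out the form (second column) and lemma (third column) by splitting on tabs. The name classify_line and the sample lines are hypothetical; the sample token line fills the ten standard CoNLL-U columns with placeholder values.

```python
from typing import Optional, Tuple


def classify_line(line: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Classify one CoNLL-U line.

    Returns (kind, form, lemma), where kind is 'comment', 'boundary', or
    'token'. form and lemma are only filled in for token lines.
    """
    line = line.rstrip('\n')
    if line.startswith('#'):
        return ('comment', None, None)
    if line.strip() == '':
        return ('boundary', None, None)  # Blank line = end of a sentence
    columns = line.split('\t')
    # Python indexes from 0, so the second column is columns[1]
    # and the third column is columns[2].
    return ('token', columns[1], columns[2])


sample = [
    '# this is a comment',
    '1\tIs\tbe\tAUX\t_\t_\t0\troot\t_\t_',
    '',
]
for line in sample:
    print(classify_line(line))
```

When reading a real file, you could iterate over open(filename, encoding='utf-8') and feed each line in. The CoNLL-U format also allows ID ranges such as 3-4 for multiword tokens, so it is worth checking whether fr_easy.conllu or fr_hard.conllu contain any before relying on the form column alone.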
Your task is to read in a CoNLL-U file and output a list containing each sentence in the file. To do this, you’ll want to loop over the lines in the file, paying particular attention to the sentence boundaries and to specific columns. One final thing to keep in mind: each sentence should keep normal spacing, so punctuation should directly follow the word before it. One of the hints in the file should help.
I recommend approaching this in two parts. First, try to get the sentences from the file fr_easy.conllu. Once you can do that, move on to fr_hard.conllu, which has punctuation.