Codelet 2: Linear Regression using PyTorch

The due date for this codelet is Friday, Sep 19 at 11:59PM.

Introduction

The aim of this codelet is to build your PyTorch skills through a concrete implementation of the core pieces of linear regression. You should draw on Chapter 3 of the textbook and the PyTorch documentation.

Important: Only use PyTorch and built-in Python. The use of any other library, including the book's d2l library, results in an automatic unsatisfactory grade for this assignment. A core goal of this class is to build your competencies as a machine learning engineer; I want to minimize abstractions from other libraries so that you build these skills yourself.

Outline

Your assignment

Your task is to:

  1. Download codelet2.zip from the course website and extract it. You will find these instructions and codelet2.py, which contains scaffolding for your implementation.
  2. Complete each of the three broad tasks below in codelet2.py and include a file called codelet2.pdf with your answers to the written questions.

Grading

When assessing your work, satisfactory achievement is demonstrated, in part, by:

Linear Regression

You will complete an implementation of linear regression trained using gradient descent. You should build on the scaffolding in codelet2.py. You have two tasks:

  1. Complete the model class
  2. Implement mini-batch and stochastic gradient descent

Linear Regression Model

The basic structure of your model is provided in the class LinearRegression. Your task is to complete this class with any relevant methods. Do not add methods that serve no substantive purpose or that your code never calls (that is, draw from the book with care); for example, you should not add a plotting method. Note that the loss function is correctly implemented and should not be modified. Your code must use PyTorch's Linear layer to hold your model's weights and to compute your model's output. As noted in the introduction to this codelet, you may only use PyTorch in completing your implementation. A sketch of the overall shape appears below.
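For orientation only, here is a minimal sketch of what the completed class might look like. The constructor arguments, the attribute name net, and the omitted loss method (which the scaffold provides) are all assumptions; the signatures in codelet2.py take precedence.

import torch
from torch import nn

class LinearRegression(nn.Module):
    def __init__(self, num_inputs, lr):
        super().__init__()
        self.lr = lr                         # learning rate used by the training loops
        self.net = nn.Linear(num_inputs, 1)  # holds both w and b

    def forward(self, X):
        # nn.Linear computes X @ w.T + b
        return self.net(X)

    # loss(...) is provided in the scaffold and must not be modified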

In addition to completing your model, you must write code that reports the error in your model's estimates of w and b. See the sample output below for the formatting (it draws heavily on the example provided in the book).
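One way to compute these errors, following the book's example, is to compare the learned parameters against the ground-truth values used to generate the data. In the sketch below, the names true_w, true_b, model, and the attribute net are all assumptions.

import torch

with torch.no_grad():
    w = model.net.weight   # shape (1, num_inputs)
    b = model.net.bias     # shape (1,)
    print(f'error in estimating w: {true_w - w.reshape(true_w.shape)}')
    print(f'error in estimating b: {true_b - b}')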

To evidence completion of this task, and to check your own progress, please train your model using the train method, which should work without any tweaks; do not modify that function. Add a screenshot of your code's output to the PDF accompanying your submission. Mine, for example, looks as follows:

0 39.86579513549805
10 12.93497085571289
20 2.6193177700042725
error in estimating w: tensor([-0.5257,  0.3273,  0.6399])
error in estimating b: tensor([0.4122])
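(For context: the errors above are measured against the ground-truth parameters used to synthesize the training data. In the book's example, that data is generated roughly as follows; all names and values here are illustrative.)

import torch

true_w = torch.tensor([2.0, -3.4, 1.7])
true_b = torch.tensor([4.2])
X = torch.randn(1000, len(true_w))
y = (X @ true_w + true_b + 0.01 * torch.randn(1000)).reshape(-1, 1)  # linear signal plus small noise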

Training

Your second task is to adapt the train function to (1) implement mini-batch gradient descent in the function minibatch_train, and (2) implement stochastic gradient descent in the function sgd_train.

To help you think through this process, you must copy the code from batch_train into your PDF and annotate the function, noting what each line does. In particular, make sure to identify which line calculates the gradient and which line updates the weights of your model. A sketch of the general pattern appears after this paragraph.
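Your scaffold's batch_train may differ in its details, but the pattern you are annotating generally looks like the sketch below; the comments mark the two lines the assignment asks you to identify.

import torch

def batch_train(model, X, y, num_epochs=30):
    for epoch in range(num_epochs):
        y_hat = model(X)              # forward pass over the full dataset
        l = model.loss(y_hat, y)      # scalar loss for this epoch
        model.zero_grad()             # clear gradients from the previous step
        l.backward()                  # <- calculates the gradient of l w.r.t. w and b
        with torch.no_grad():         # parameter updates must not be tracked by autograd
            for param in model.parameters():
                param -= model.lr * param.grad   # <- updates the weights
        if epoch % 10 == 0:
            print(epoch, l.item())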

In implementing mini-batch gradient descent, here are your decision specifications:

In implementing stochastic gradient descent, here are your decision specifications:
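Whatever those specifications require, the sampling pattern is the main difference between the three methods. Purely as an illustration (not a substitute for the specifications above), a mini-batch loop might look like the sketch below; stochastic gradient descent is the batch_size=1 special case of the same pattern.

import torch

def minibatch_train(model, X, y, batch_size=32, num_epochs=2):
    n = X.shape[0]
    for epoch in range(num_epochs):
        total_loss = 0.0
        indices = torch.randperm(n)                  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = indices[start:start + batch_size]  # indices for one mini-batch
            l = model.loss(model(X[idx]), y[idx])
            model.zero_grad()
            l.backward()                             # gradient on this batch only
            with torch.no_grad():
                for param in model.parameters():
                    param -= model.lr * param.grad
            total_loss += l.item() * len(idx)
        print('Epoch', epoch, 'Total Loss', total_loss)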

To evidence completion of this task, and to check your own progress, please train your model using your two new train functions and print the error of each parameter, as before. Add a screenshot of your code's output to the accompanying PDF. Mine, for example, looks as follows:

Epoch 0 Total Loss 1479.2000402931476
Epoch 1 Total Loss 0.05384931225699052
Stochastic gradient descent error
error in estimating w: tensor([-6.6924e-04, -3.6597e-05, -5.9795e-04, -9.9182e-05])
error in estimating b: tensor([0.0003])
----------------------------------------------------------------------------------
0 10.669316709041595
1 5.0789901435375215
2 1.4392380893230439
3 0.3736846551299095
4 0.15271076932549477
5 0.05610000006854534
6 0.018682969850488007
7 0.008071314846165478
8 0.003019134764326736
9 0.0008185547601897269
10 0.00028801323351217433
11 0.00015633701550541446
12 8.788212180661503e-05
13 6.127286615082994e-05
14 5.611547276203055e-05
15 5.720405024476349e-05
16 5.878164884052239e-05
17 5.920631556364242e-05
18 5.879780437680893e-05
19 5.825129701406695e-05
Mini-batch gradient descent error
error in estimating w: tensor([ 0.0003,  0.0014, -0.0008, -0.0025])
error in estimating b: tensor([0.0002])

Reflection

Finally, you should reflect on the performance of the training methods by answering the following questions concretely in the PDF accompanying your code:

  1. How many times does your model see each sample in batch_train, minibatch_train, and sgd_train?
  2. Change the learning rate for sgd_train to 5. What happens to your loss and error? Why might this be the case?