I am trying to understand whether it makes sense to apply low-rank approximations to the learnable parameters in a module. The goal is to reduce the parameter count.
I have the following custom module:
import torch
import torch.nn as nn
import torch.nn.init as init

class CustomPara(nn.Module):
    def __init__(self, num_blocks, in_planes, out_planes, kernel_size):
        super(CustomPara, self).__init__()
        self.coefficient_shape = (num_blocks, 1, 1, 1, 1)
        blocks = [torch.Tensor(out_planes, in_planes, kernel_size, kernel_size) for _ in range(num_blocks)]
        for i in range(num_blocks):
            init.kaiming_normal_(blocks[i])
        self.blocks = nn.Parameter(torch.stack(blocks))  # this is what we will freeze later

    def forward(self, coefficients):
        # weighted sum of the blocks, collapsing the num_blocks dimension
        final_blocks = (self.blocks * coefficients).sum(0)
        return final_blocks
Is it possible to reduce the number of learnable parameters here by applying low-rank adaptation to the blocks parameter?
Yes. The idea is introduced in the paper LoRA: Low-Rank Adaptation of Large Language Models.
The key idea is that any matrix W of shape n × m with rank r can be written as the product of two matrices A and B, where A has shape n × r and B has shape r × m.
By introducing A and B as the trainable matrices instead of W itself (and choosing a small r), you reduce the number of trainable parameters from n·m to r·(n + m), typically at the cost of a small drop in model performance.
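In your module, the blocks parameter holds num_blocks filters of shape (out_planes, in_planes, kernel_size, kernel_size). You can apply the same trick by flattening each filter into a matrix of shape (out_planes, in_planes * kernel_size * kernel_size) and storing it as a rank-r product A @ B. Below is a minimal sketch of that idea; the class name LowRankCustomPara, the rank argument, and the choice of initialization are my own assumptions, not something the paper prescribes for this setup:

import torch
import torch.nn as nn
import torch.nn.init as init

class LowRankCustomPara(nn.Module):
    # Hypothetical low-rank variant of CustomPara: each flattened block
    # W of shape (out_planes, in_planes*k*k) is stored as a product A @ B of rank r.
    def __init__(self, num_blocks, in_planes, out_planes, kernel_size, rank):
        super().__init__()
        self.coefficient_shape = (num_blocks, 1, 1, 1, 1)
        self.block_shape = (out_planes, in_planes, kernel_size, kernel_size)
        # A: (num_blocks, out_planes, rank); B: (num_blocks, rank, in_planes*k*k)
        self.A = nn.Parameter(torch.empty(num_blocks, out_planes, rank))
        self.B = nn.Parameter(torch.empty(num_blocks, rank, in_planes * kernel_size * kernel_size))
        init.kaiming_normal_(self.A)  # initialization is a design choice, not taken from the paper
        init.kaiming_normal_(self.B)

    def forward(self, coefficients):
        # rebuild the full 5-D blocks tensor on the fly from the low-rank factors
        blocks = torch.bmm(self.A, self.B).view(-1, *self.block_shape)
        return (blocks * coefficients).sum(0)

With hypothetical sizes of 8 blocks of 64×64×3×3 filters, the full parameterization has 8 · 64 · 64 · 3 · 3 = 294,912 parameters, while rank r = 8 needs 8 · (64 · 8 + 8 · 64 · 3 · 3) = 40,960. Note that if you intend to freeze blocks and only train the coefficients (as your comment suggests), the original LoRA recipe would instead keep the frozen blocks and add a trainable low-rank delta A @ B on top of them.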