Can we vectorize nested loops that update matrix values?

I wrote a piece of code, but I am not sure whether the loops can be removed and the code vectorized to make it faster. Can you please give suggestions? I am just updating the co-occurrence matrix.

M = np.zeros((num_words, num_words))
word2Ind = {words[i]: i for i in range(len(words))}

for document in corpus:
    for i, word in enumerate(document):
        for j in range(i - window_size, i + window_size + 1):
            if i != j and j >= 0 and j <= len(document) - 1:
                M[word2Ind[document[i]], word2Ind[document[j]]] += 1

Best answer, by chrslg

Since the only thing you use word2Ind for is expressions of the form word2Ind[document[?]], you could at least start by computing the indices for each document once, and then work from those indices:

M = np.zeros((num_words,num_words))
word2Ind = {words[i]:i  for i in range(len(words))}

for document in corpus:
    IX=[word2Ind[d] for d in document]
    for i,word in enumerate(document):
        for j in range(i - window_size ,i + window_size + 1):
            if i != j and j >= 0 and j <= len(document) - 1:
                M[IX[i], IX[j]] += 1

It then becomes easier to vectorize slightly:

M = np.zeros((num_words, num_words))
word2Ind = {words[i]: i for i in range(len(words))}

for document in corpus:
    IX = np.array([word2Ind[d] for d in document], dtype=np.intp)
    for j in range(1, window_size + 1):
        # np.add.at accumulates repeated (row, col) pairs; a plain
        # M[rows, cols] += 1 would count each distinct pair only once.
        np.add.at(M, (IX[:-j], IX[j:]), 1)
        np.add.at(M, (IX[j:], IX[:-j]), 1)
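
One thing to watch with index-array updates: when the same (row, column) pair appears more than once in the index arrays, `M[rows, cols] += 1` increments that cell only once, whereas `np.add.at` accumulates every occurrence. Below is a minimal sketch that checks an offset-based version against the triple loop. The function names and the toy corpus are made up for illustration and are not part of the original code.

```python
import numpy as np

# Small demonstration of the duplicate-index behaviour.
buffered = np.zeros(3)
idx = np.array([0, 0, 2])
buffered[idx] += 1                 # index 0 is incremented only once
unbuffered = np.zeros(3)
np.add.at(unbuffered, idx, 1)      # index 0 is incremented twice

def cooccurrence_loops(corpus, word2Ind, window_size):
    """Reference version: explicit loops over documents, positions, and window."""
    n = len(word2Ind)
    M = np.zeros((n, n))
    for document in corpus:
        for i in range(len(document)):
            for j in range(i - window_size, i + window_size + 1):
                if i != j and 0 <= j < len(document):
                    M[word2Ind[document[i]], word2Ind[document[j]]] += 1
    return M

def cooccurrence_add_at(corpus, word2Ind, window_size):
    """Offset-based version that accumulates repeated pairs with np.add.at."""
    n = len(word2Ind)
    M = np.zeros((n, n))
    for document in corpus:
        IX = np.array([word2Ind[w] for w in document], dtype=np.intp)
        for j in range(1, window_size + 1):
            # Pair each word with the word j positions to its right,
            # and record the pair in both directions.
            np.add.at(M, (IX[:-j], IX[j:]), 1)
            np.add.at(M, (IX[j:], IX[:-j]), 1)
    return M

# Hypothetical toy corpus; the pair ("a", "b") repeats in the first document.
corpus = [["a", "b", "a", "b", "c"], ["c", "a", "c"]]
words = sorted({w for doc in corpus for w in doc})
word2Ind = {w: i for i, w in enumerate(words)}

M_loops = cooccurrence_loops(corpus, word2Ind, window_size=2)
M_fast = cooccurrence_add_at(corpus, word2Ind, window_size=2)
assert np.array_equal(M_loops, M_fast)
```

`np.add.at` is unbuffered and can be slower than plain indexing, so it is worth timing both versions on your real corpus; the remaining Python loops run over documents and window offsets only, not over every word position.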