Does CountVectorizer().fit_transform() preserve order of input?

480 Views Asked by rookinn At 03 May 2022 at 11:50

I'm wondering if, when I use CountVectorizer().fit_transform(), the output preserves the order of the input.

My input is a list of documents. I know that the output matches the input in terms of the length, but I'm not sure if they are ordered the same way.

I understand that I might not be explaining it very well, so here's an example.

Say I have:

from sklearn.feature_extraction.text import CountVectorizer
input = ["<text_1>", "<text_2>", "<text_3>"]
a = CountVectorizer().fit_transform(input)

Will the indexes correspond as though order is preserved?

For example, in:

  (0, 33)   1
...
  (0, 42)   8
...
  (385, 58) 1
  (385, 51) 6

Is (0, 33) 1 equivalent to input[0], or (385, 58) 1 to input[385]?


There is 1 best solution below

Arne On 03 May 2022 at 12:08 BEST ANSWER

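Yes, the row order is preserved: row i of the output matrix corresponds to input[i]. In the printed sparse matrix, each entry (i, j) value gives the document (row) index i, the vocabulary (column) index j, and the count of that term in that document. This has to hold for scikit-learn transformers in general, because a common workflow is to split your data into a feature matrix X and a target vector y, where each row of the matrix corresponds to one element of the vector. When you transform X, you must still be able to train the model on the transformed X paired with y, so the order must be preserved.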
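If you want to check this yourself, here is a minimal sketch. The three toy documents and the names docs, vectorizer, X, and terms are made up for illustration and are not from the question; the idea is just to pick documents whose word counts are easy to verify by eye, then compare each output row against its input.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents (hypothetical), chosen so the counts are easy to check by hand.
docs = ["apple apple banana", "cherry", "banana cherry cherry"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# Terms ordered by their column index in X.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Row i of X holds the term counts for docs[i].
for i, doc in enumerate(docs):
    row = X[i].toarray().ravel()
    counts = {term: int(n) for term, n in zip(terms, row) if n > 0}
    print(i, repr(doc), counts)
```

Each printed row should show the counts for the document at the same position in docs, which is what lets you keep pairing the transformed matrix with a label vector y.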
