Understanding train_test_split's documentation

My question is about scikit-learn's sklearn.model_selection.train_test_split function (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html), or more precisely, its return value.

From reading this documentation, how do I know exactly what is returned? (It just says "splitting: list, length=2 * len(arrays), List containing train-test split of inputs.")

Without looking at the examples further down the page, I would not have known that it makes sense to call it like X_train, X_test, y_train, y_test = train_test_split(...). How would I have known that?

Furthermore, I was quite surprised that after passing in a DataFrame, the result is no longer a DataFrame and the column names are gone.

Do you also have general advice on how to read Python documentation?

There is 1 best solution below

I don't think the function is limited to what the examples show. It accepts an arbitrary number of indexables (lists, NumPy arrays, DataFrames, and so on) and returns twice as many objects as you pass in, as the Returns section states:

Returns: splitting : list, length=2 * len(arrays)

So the output holds one train/test pair per input, in the order the inputs were passed, and can therefore be unpacked into any even number of names, e.g.

a, b, c, d, e, f = train_test_split(X, y, z)

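also runs successfully. Regarding your other point: scikit-learn does its computations on NumPy arrays, so many of its estimators and transformers return plain arrays and drop the column names. train_test_split itself should not be the culprit in current versions, since its documentation notes that the output type matches the input type, so the conversion probably happens in another step of your pipeline. I agree, though, that the documentation could describe the return value more explicitly.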
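To make the "2 * len(arrays)" rule concrete, here is a small sketch that checks it. It assumes pandas and a reasonably recent scikit-learn are installed; the toy data and the names X, y, z are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: three inputs with 10 aligned rows each.
X = pd.DataFrame({"a": np.arange(10), "b": np.arange(10, 20)})
y = pd.Series(np.arange(10) % 2, name="label")
z = np.arange(100, 110)  # z[i] == X["a"][i] + 100, handy for checking alignment

# test_size=3 holds out exactly 3 rows; random_state makes the shuffle repeatable.
splits = train_test_split(X, y, z, test_size=3, random_state=0)

# Three inputs give six outputs: one (train, test) pair per input, in input order.
X_train, X_test, y_train, y_test, z_train, z_test = splits

print(len(splits))                      # 2 * 3 inputs
print(type(X_train).__name__)           # container type of the first input's train part
print(list(X_train.columns))            # column names, if the type was preserved
print(type(z_train).__name__)           # container type of the third input's train part
```

Every input is shuffled with the same row permutation, so X_train, y_train, and z_train still line up row by row. If the printed types differ from the input types on your machine, it is worth checking which scikit-learn version you are running.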