How can I use the mask command to include more than one parameter?


I'm currently working on a (very basic) machine learning project using baseball data from 1871-2015. I want to hold out a specific set of years to test my predictions on. I'm using the dfply package and its mask function to select a single year, but I need to select more than one year. How can I go about this?

Thank you in advance.

I've tried to use "or" and "|" as well as adding () and [].

import pandas as pd
import numpy as np
import sklearn
from sklearn import linear_model
from sklearn.utils import shuffle
import matplotlib.pyplot as pyplot
import pickle
from matplotlib import style
from dfply import *
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.read_csv("team.csv")

data_test = (data >> mask(X.year == 1997))

I want X.year to cover every year from 1997 through 2015.

AnsFourtyTwo On BEST ANSWER

Assuming you have a column year in your pandas.DataFrame, this should work:

data_test = data[data.year == 1997]
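
The line above selects a single year only. To cover 1997 through 2015, as the question asks, a range condition is needed. Below is a minimal sketch in plain pandas. The small DataFrame is a made-up stand-in for team.csv, and the "w" column is hypothetical; only the year column is taken from the question. Attempts with "|" or "&" often fail because those operators bind tighter than == and >=, so each comparison has to be wrapped in its own parentheses.

```python
import pandas as pd

# Hypothetical stand-in for team.csv; only the "year" column comes from the question.
data = pd.DataFrame({
    "year": [1995, 1996, 1997, 2005, 2015, 2016],
    "w": [70, 80, 90, 85, 95, 60],  # made-up values for illustration
})

# Option 1: inclusive range check with Series.between.
data_test = data[data["year"].between(1997, 2015)]

# Option 2: explicit comparisons combined with &.
# Each comparison needs its own parentheses because & binds tighter than >= and <=.
data_test_alt = data[(data["year"] >= 1997) & (data["year"] <= 2015)]

# Option 3: membership in an explicit collection of years
# (also works when the years are not contiguous).
data_test_isin = data[data["year"].isin(list(range(1997, 2016)))]

# The remaining rows could serve as the training set.
data_train = data[~data["year"].between(1997, 2015)]
```

If you would rather stay with dfply, its mask accepts several conditions and keeps rows where all of them hold, so something along the lines of data >> mask(X.year >= 1997, X.year <= 2015) should give the same rows; treat that as an untested suggestion here.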