Subsetting a data frame by the sum of each row in R


Hi, I have some data that I am reading in from a CSV file. It is laid out in binary form:

   1 2 3 4...N
1  0 1 0 1...1
2  1 1 0 1...1
3  0 0 0 0...0
4  1 0 1 1...1
.  1 1 1 0...1
.  1 0 0 0...1
N  0 0 1 1...0

[Screenshot of the str(data) output]

I want to take the subset of rows whose row sum is greater than some threshold, say 10 (or, more generally, x). The first column is a placeholder for the customer ID, so it must be excluded from the sum. Does anyone have suggestions on how to do this?

I've tried various things, such as df=subset(), but I haven't been able to get the syntax right.

Thanks in advance.

Best answer

We can do this with rowSums, using the comparison as a logical row index:

df1[rowSums(df1) > 10, , drop = FALSE]
#  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
#7  0  0  0  1  0  0  1  1  0   1   1   1   1   1   0   0   0   1   1   1
#9  1  1  1  1  0  0  1  0  0   0   0   1   1   0   0   1   1   1   0   1
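A minimal sketch of what the one-liner does step by step, using a small made-up data frame (the name df and its values are hypothetical, not the OP's data). rowSums returns one total per row, the comparison turns those totals into a logical vector, and that vector selects the rows. The drop = FALSE argument guards against the result collapsing to a plain vector when the data frame has only one column.

```r
# Hypothetical 3-row binary data frame (illustration only)
df <- data.frame(a = c(1, 0, 1),
                 b = c(1, 0, 1),
                 c = c(1, 1, 0))

sums <- rowSums(df)   # one total per row: 3, 1, 2
keep <- sums > 1      # logical row index: TRUE, FALSE, TRUE

# Keep the rows where keep is TRUE; drop = FALSE keeps a data frame
res <- df[keep, , drop = FALSE]
print(res)
```

The row names of res ("1" and "3") show which original rows survived the filter.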

Update

In the OP's dataset, the first column, 'X', is not binary: it holds the customer (index) ID, which takes larger values. Including it adds the ID to every row total, so a row can exceed 10 because of its ID rather than its binary entries. Since the ID should not be part of the calculation, remove it with df1[-1] inside rowSums and the subset works as intended:

df1[rowSums(df1[-1]) > 10, ]
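The same idea written for a general threshold x, alongside the subset() form the question was reaching for. This is a sketch under assumptions: the data frame, its id column, and the value of x are hypothetical; the first column plays the role of the OP's customer ID and is dropped with df[-1] before summing.

```r
# Hypothetical data: first column is a customer ID, the rest are binary flags
df <- data.frame(id = c(101, 102, 103, 104),
                 f1 = c(1, 0, 1, 1),
                 f2 = c(1, 0, 0, 1),
                 f3 = c(1, 1, 0, 1))
x <- 2  # threshold (the OP's "x"); flag sums are 3, 1, 1, 3

# Base indexing: drop the ID column before summing
res_index <- df[rowSums(df[-1]) > x, ]

# Equivalent with subset(): the condition is evaluated with df's
# columns in scope, and df and x are found in the calling environment
res_subset <- subset(df, rowSums(df[-1]) > x)

print(res_index)
print(res_subset)
```

Both forms keep the customers with IDs 101 and 104. One caveat with subset(): if the data frame had a column literally named x, it would shadow the threshold variable inside the condition.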

data

set.seed(24)  # for reproducibility
# 10 rows x 20 binary (0/1) columns, named V1 to V20
df1 <- as.data.frame(matrix(sample(0:1, 10 * 20, replace = TRUE), ncol = 20))
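To mimic the OP's situation more closely, this sketch reads a tiny made-up CSV whose first header field is blank, so read.csv names that column X, as in the OP's data. It shows how including X inflates the row sums and how df[-1] fixes it. The CSV contents and the threshold of 1 are assumptions chosen for illustration.

```r
# Hypothetical CSV text: blank first header field, then an ID column and binary flags
csv_text <- ",V1,V2,V3
1,0,1,1
2,1,1,0
3,0,0,0"

df <- read.csv(text = csv_text)
names(df)   # "X" "V1" "V2" "V3"

# Wrong: the ID column X is added into every row sum (totals 3, 4, 3),
# so even the all-zero third row passes the > 1 test
wrong <- df[rowSums(df) > 1, ]

# Right: drop X before summing (totals 2, 2, 0)
right <- df[rowSums(df[-1]) > 1, ]

print(wrong)
print(right)
```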