Subsetting a data frame using the sum of each row vector R

104 Views Asked by Andrew Buchanan At 28 June 2025 at 18:00

Hi, I have some data that I am reading in from a CSV file, laid out in binary form:

   1 2 3 4...N
1  0 1 0 1...1
2  1 1 0 1...1
3  0 0 0 0...0
4  1 0 1 1...1
.  1 1 1 0...1
.  1 0 0 0...1
N  0 0 1 1...0

[Screenshot of str(data)]

I want to take a subset of this data where the sum of each row vector is greater than some number, say 10, or x. The first column is a placeholder column for the customer ID, so it needs to be excluded from the sum. Do you have any suggestions for how I could do this?

I've been trying various things like df=subset() but I've not been able to get the syntax correct.

Thanks in advance.


akrun On 03 April 2018 at 14:39 BEST ANSWER

We can do this with rowSums:

df1[rowSums(df1) > 10, , drop = FALSE]
#  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
#7  0  0  0  1  0  0  1  1  0   1   1   1   1   1   0   0   0   1   1   1
#9  1  1  1  1  0  0  1  0  0   0   0   1   1   0   0   1   1   1   0   1

Update

In the OP's dataset, the first column 'X' is not binary and holds larger numbers, so including it pushes the row sums above 10. It is the index ID and should not be used in the calculation. Dropping that column inside rowSums gives the intended subset:

df1[rowSums(df1[-1]) > 10, ]
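
As a minimal sketch of the same idea on a tiny data frame with an explicit ID column and a threshold variable: the names `toy`, `id`, `v1` to `v4`, and `x` are made up for illustration and are not taken from the OP's data.

```r
# Hypothetical toy data: an ID column followed by four binary columns
toy <- data.frame(
  id = c(101, 102, 103),
  v1 = c(1, 0, 1),
  v2 = c(1, 0, 1),
  v3 = c(1, 1, 0),
  v4 = c(0, 0, 1)
)

x <- 2  # threshold on the row sum (assumed value)

# Sum only the binary columns (drop column 1), then keep rows above x
keep <- rowSums(toy[-1]) > x
res  <- toy[keep, , drop = FALSE]
res
```

Here the row sums over `v1` to `v4` are 3, 1 and 3, so the rows with `id` 101 and 103 are kept. The ID values never enter the sum, yet the ID column is still present in the result.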

data

set.seed(24)
df1 <- as.data.frame(matrix(sample(0:1, 10 * 20, replace = TRUE), ncol = 20))
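
Since the question mentions subset(), the same filter can probably be phrased that way too. The sketch below is an alternative, not part of the original answer. It builds data like the sample above and prepends a hypothetical `id` column (the values 1001 to 1010 and the names `bin`, `dat` and `threshold` are assumptions), so the first column has to be dropped before summing.

```r
set.seed(24)
bin <- as.data.frame(matrix(sample(0:1, 10 * 20, replace = TRUE), ncol = 20))
dat <- cbind(id = 1001:1010, bin)  # hypothetical customer IDs in column 1

threshold <- 10  # assumed cut-off on the row sum

# subset() form: the condition is a logical vector with one entry per row
via_subset <- subset(dat, rowSums(dat[-1]) > threshold)

# Equivalent bracket form, as used in the answer
via_bracket <- dat[rowSums(dat[-1]) > threshold, , drop = FALSE]

all.equal(via_subset, via_bracket)
```

Both forms select the same rows. subset() is mostly a convenience for interactive use, while the bracket form is the more usual choice inside functions.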
