Is there a way in R or Excel to check that values alternate?

84 Views Asked by At

I have a data frame with 60000 rows. In one of the column, the values alternate between "landing" and "take off", as such.

Movements Flight number Destination
take-off 011 Paris
landing 011 Paris
take-off 053 Ibiza
landing 053 Ibiza
take-off 067 Mans
take-off 123 Geneva
landing 123 Geneva

But there are some mistakes, like here in row 5. In this instance, I have the take-off row, but not the landing one.

Is there any way in R or Excel to highlight or select those rows? To be specific: flight numbers are not unique and appear multiple times throughout it.

Thanks!

I tried this:

data() <- read.csv("Flight data.csv")

# Tried to find the positions where values don't alternate correctly
incorrect_positions <- which(data$event[-1] == data$event[-length(data$event)]) + 1

if (length(incorrect_positions) == 0) {
  print("The values in the 'event' column alternate correctly.\n")
} else {
  print("The values in the 'event' column do not alternate correctly at positions:", incorrect_positions, "\n")
}

It did not give me an error, but apparently doesn't correspond to what I want - it kept printing that all the values alternate correctly, when I know for a fact that they don't.

2

There are 2 best solutions below

0
On BEST ANSWER

Using as.factor then as.integer to strip of the levels. differences should be -1 or 1; so taking abs will help to identify which are bad ones.

> f <- \(x) c(1L, abs(diff(as.integer(as.factor(x)))))
> f(dat$Movements)
[1] 1 1 1 1 1 0 1
> which(f(dat$Movements) != 1)
[1] 6

Data:

> dput(dat)
structure(list(Movements = c("take-off", "landing", "take-off", 
"landing", "take-off", "take-off", "landing"), Flight.number = c(11L, 
11L, 53L, 53L, 67L, 123L, 123L), Destination = c("Paris", "Paris", 
"Ibiza", "Ibiza", "Mans", "Geneva", "Geneva")), class = "data.frame", row.names = c(NA, 
-7L))
0
On

There are several ways to check whether the Movements column alternates. You could check whether all odd entries match entry 1, and all even entries match entry 2. I think you also want the flight number and destination to be constant across those pairs. This code does all the tests:

odds <- seq(1, nrow(dat), by = 2)
if (max(odds) == nrow(dat)) {
  warning("You have an odd number of entries!  Last entry not checked.")
  odds <- odds[-length(odds)]
}
evens <- odds + 1

bad <- dat$Movements[odds] != dat$Movements[1]
if (any(bad))
  warning("These odd entries are wrong:", paste(odds[bad], collapse=","))

bad <- dat$Movements[evens] != dat$Movements[2]
if (any(bad))
  warning("These even entries are wrong:", paste(evens[bad], collapse=",")

bad <- dat$FlightNumber[odds] != dat$FlightNumber[evens]
if (any(bad))
  warning("These flight numbers don't match:", paste("(", odds[bad], ",", evens[bad], ")", collapse = ",")

bad <- dat$Destination[odds] != dat$Destination[evens]
if (any(bad))
  warning("These destinations don't match:", paste("(", odds[bad], ",", evens[bad], ")", collapse = ",")