How to add leading zero to select rows of a column of integers in R


I've hit a problem which I'm assuming needs an easy solution, but I've been going in circles trying to figure it out with no luck.

I have two dfs, each with a "Facility.ID" column. In df1, Facility.IDs range from 5 integer digits to 6 integer digits. In df2, all IDs are 6 integer digits, where 0's were placed in front of the previously 5 integer digits to make them all 6 digits (For example, one Facility.ID in df1 is 10001, but in df2 the same site's is 010001).

I need to merge these two dfs by their common Facility.ID column, but df1 lacks 0s for the first 536 rows of Facility.IDs. These are expressed as integers in my df, not characters.

For context:

  • The same Facility is referenced as 10001 in df1, but 010001 in df2. I essentially just need to add a 0 in front of the first 536 Facility.ID's of df1 to merge the two dfs correctly.

I tried using sprintf, but got an error that seemed to suggest I should convert the column to character instead of integer. That didn't work either:

final2019 <- transform(final2019, Facility.ID = as.character(Facility.ID)) %>% sprintf("%06d", final2019$Facility.ID)

Error in sprintf(., "%06d", final2019$Facility.ID) : 'fmt' is not a character vector

r2evans On

Up front: two problems here:

  • the known problem is that %>% passes the return value of transform as the first argument to sprintf; in this case, it passes a data.frame where sprintf expects a character vector of format strings; and
  • even once we fix that, the sprintf format won't work.

Follow through to the end ... I walk you through it a little, then wrap up with a likely-correct and more efficient solution.


transform passes a data.frame to the next function in the pipe, so try

final2019 <- transform(final2019, Facility.ID = as.character(Facility.ID)) %>% 
  transform(Facility.ID = sprintf("%06d", Facility.ID))

The way %>% works is that it passes the result of its left-hand side as the first argument of the next function. In your case, that means the flow looked somewhat like this:

tmp1 <- transform(final2019, Facility.ID = as.character(Facility.ID))
sprintf(fmt = tmp1, "%06d", final2019$Facility.ID)

(I added the fmt= there to identify that the first argument will be passed the frame.)

where fmt is supposed to be

     fmt: a character vector of format strings, each of up to 8192
          bytes.

Clearly this is not what you wanted/intended.
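
To see the same failure without the pipe, here is a minimal base-R sketch. The small frame `demo` is made-up illustration data, not your final2019. It calls sprintf with a data.frame in the fmt position, which is effectively what the piped call did, and then shows the call with the format string first:

```r
# Hypothetical stand-in for final2019 (illustration only)
demo <- data.frame(Facility.ID = c(10001L, 123456L))

# What the piped call amounts to: the frame lands in the `fmt` slot.
# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch(
  sprintf(demo, "%06d", demo$Facility.ID),
  error = function(e) conditionMessage(e)
)
print(msg)

# With the format string first and the integer column second, it works
padded <- sprintf("%06d", demo$Facility.ID)
print(padded)
```

The captured message should mention 'fmt', matching the error in the question.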

Even if you explicitly name fmt="%06d", the frame will then be the first unnamed argument, which is going to error with "unsupported type", since sprintf mostly works on "simple" things, not lists/frames, etc.


Heads up, though, you're still in trouble: %06d expects an integer, not a string, so you're still going to err ... while this is unrelated to the %>%-mixup, it's still something to resolve:

final2019 <- transform(final2019, 
  Facility.ID = sprintf("%06d", as.integer(as.character(Facility.ID))))

The as.integer(as.character(.)) combination matters if the column is (or ever becomes) a factor, since on a factor, as.integer alone returns the underlying index integers, not the strings to which they refer. And that is a really confusing problem to troubleshoot:

vec <- factor(c("1","3","22"))
as.integer(vec)
# [1] 1 3 2
as.integer(as.character(vec))
# [1]  1  3 22
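
Putting it together, here is a minimal sketch of the pad-then-merge step. It assumes that Facility.ID in df2 is stored as character (otherwise it could not hold the leading zeros); the frames and the `value` and `name` columns are made up for illustration:

```r
# Hypothetical data: df1 holds integer IDs, df2 holds zero-padded strings
df1 <- data.frame(Facility.ID = c(10001L, 10005L, 123456L),
                  value = c(1.5, 2.5, 3.5))
df2 <- data.frame(Facility.ID = c("010001", "010005", "123456"),
                  name = c("A", "B", "C"))

# Pad every ID to width 6; IDs that already have 6 digits are unchanged,
# so there is no need to single out the first 536 rows
df1$Facility.ID <- sprintf("%06d", as.integer(df1$Facility.ID))

# Both key columns are now character, so the merge lines up
merged <- merge(df1, df2, by = "Facility.ID")
print(merged)
```

Because the padding applies to the whole column, it does not depend on how the rows are ordered.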
Chris Ruehlemann On

You can transform the vector with regex:

library(dplyr)    # mutate(), %>%
library(stringr)  # str_count(), str_c()

df1 %>%
  mutate(Facility.ID = 
    # if the number of `d`igits is < 6, ...
    ifelse(str_count(as.character(Facility.ID), "\\d") < 6,
           # ... add leading 0: 
           str_c("0", Facility.ID),
           # else leave as is:
           Facility.ID))
  Facility.ID
1      011111
2      022222
3      333333
4      044444

Data:

df1 <- data.frame(
  Facility.ID = c(11111,22222,333333,44444)
)
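
If you would rather avoid extra packages, a base-R alternative (swapped in here as a sketch, not part of the approach above) is formatC, which pads to a fixed width in one call. It uses the same sample data; format = "d" treats the values as whole numbers:

```r
# Same sample data as above
df1 <- data.frame(
  Facility.ID = c(11111, 22222, 333333, 44444)
)

# width = 6 with flag = "0" left-pads with zeros up to six characters;
# values that already have six digits come back unchanged
df1$Facility.ID <- formatC(df1$Facility.ID, width = 6, format = "d", flag = "0")
print(df1)
```

The column comes back as character, which is what a merge against zero-padded string IDs needs.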