I have a text column named 'OBSERVA'. Somewhere in this text there may be a sequence of 8 digits corresponding to a code that I would like to extract in order to fill another column. For example, one of the rows in the OBSERVA column contains the following record: 'DO 29932940-2 OCCUPATION: RETIRED INFLUENZA UNDER ANALYSIS GAL.' In this case, I need to extract the number 29932940. I used the 'str_extract' function from the 'stringr' package, but the result is not satisfactory: the 8-digit sequence is not identified and I only get NAs.
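library(stringr)
# Fill Teste with the 8-digit code only where NU_DO is 0 and OBSERVA is not missing
dados_sivep_tratados$Teste <- ifelse(
  dados_sivep_tratados$NU_DO == 0 & !is.na(dados_sivep_tratados$OBSERVA),
  # \\b\\d{8}\\b: exactly 8 digits delimited by word boundaries
  str_extract(dados_sivep_tratados$OBSERVA, "\\b\\d{8}\\b"),
  NA_character_
)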
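To narrow down where the NAs come from, it may help to test the pattern and the condition separately. The sketch below uses a hypothetical toy data frame `dados` (the real data is not shown, so its values are assumptions) and illustrates two possible causes, neither confirmed for the actual data: `NU_DO` being `NA` rather than 0, which makes `ifelse()` return `NA`, and the code being glued to a preceding letter, which makes `\\b` fail. The lookaround pattern `(?<!\\d)\\d{8}(?!\\d)` is an alternative that only requires that the 8 digits are not adjacent to another digit.

```r
library(stringr)

# Hypothetical toy data mimicking the structure described above (assumed values)
dados <- data.frame(
  NU_DO = c(0, NA, 0),
  OBSERVA = c(
    "DO 29932940-2 OCCUPATION: RETIRED INFLUENZA UNDER ANALYSIS GAL.",
    "DO 29932940-2 OCCUPATION: RETIRED",
    "DO29932940-2 OCCUPATION: RETIRED"  # no space between "DO" and the code
  ),
  stringsAsFactors = FALSE
)

# 1) Check the pattern on its own, without any condition
com_borda   <- str_extract(dados$OBSERVA, "\\b\\d{8}\\b")
sem_vizinho <- str_extract(dados$OBSERVA, "(?<!\\d)\\d{8}(?!\\d)")

# 2) Original condition: a missing NU_DO propagates NA through ifelse()
cond_original <- dados$NU_DO == 0 & !is.na(dados$OBSERVA)
teste_original <- ifelse(cond_original, com_borda, NA_character_)

# 3) Variant (an assumption about the intent): treat a missing NU_DO like 0
cond_ajustada <- is.na(dados$NU_DO) | dados$NU_DO == 0
dados$Teste <- ifelse(cond_ajustada, sem_vizinho, NA_character_)

print(data.frame(com_borda, sem_vizinho, teste_original, Teste = dados$Teste))
```

If `com_borda` already contains the codes for the real data, the pattern is fine and the condition on `NU_DO` is the more likely place to look.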
Example with different lengths of the number before