Extracting row, text and numbers from character with R


I need to extract data from the following text (this is just a sample):

text <- c("    9 A                  1427107                -", 
              "    99 (B)                3997915                -", 
              "    999 (SOCIO)            7161315                -", 
              "    9999 @M                 4035115                -", 
              "    99999 01 Z               2136481035115         8,621" 
              )

So far I have tried the following, but I could not come up with a pattern that covers all columns:

as.numeric(gsub("([0-9]+).*$", "\\1",text))

I want my data frame output to look like this:

row_names   Text        ID              Amount
9           A           1427107         - 
99          (B)         3997915         - 
999         (SOCIO)     7161315         -
9999        @M          4035115         - 
99999       01 Z        2136481035115   8,621

row_names is always a number. Text can contain both digits and letters. ID is a number with 7 to 13 digits. Amount is either "-" or a number with a thousands separator (,).


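We can use read.table to read the data into a data.frame. Note that it splits each line on whitespace, so a Text value that itself contains a space, such as "01 Z", is spread over two columns.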

df1 <- read.table(text =  text, header = FALSE, fill = TRUE)
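A quick way to see how read.table will split the sample is to count the whitespace-separated fields per line. The sketch below repeats the sample vector so it runs on its own; n_fields and con are illustrative names, not part of the original answer.

```r
text <- c("    9 A                  1427107                -",
          "    99 (B)                3997915                -",
          "    999 (SOCIO)            7161315                -",
          "    9999 @M                 4035115                -",
          "    99999 01 Z               2136481035115         8,621")

# Count the whitespace-separated fields on each line
con <- textConnection(text)
n_fields <- count.fields(con)
close(con)
n_fields

# With fill = TRUE the shorter lines are padded to the widest line,
# so the result has one column more than the four that were asked for
df1 <- read.table(text = text, header = FALSE, fill = TRUE)
ncol(df1)
```

The last line has five fields because "01 Z" counts as two, which is why a pattern-based approach is the safer route when Text may contain spaces.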

Or use extract from tidyr, with one capture group per output column:

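library(tibble)
library(tidyr)
# Lazy (.*?) stops Text at the first whitespace run that is followed by the
# ID digits, so the padding between the two columns is not captured in Text
tibble(col1 = trimws(text)) %>% 
    extract(col1, into = c('rn', 'Text', 'ID', 'Amount'),
        '^(\\d+)\\s+(.*?)\\s+(\\d+)\\s+([-0-9,]+)', convert = TRUE)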

In base R, we can use strcapture, supplying the pattern and a prototype that gives the name and type of each column to extract.

strcapture('\\s+(\\d+)\\s(.*?)\\s+(\\d+)\\s(.*)', text, 
           proto=list(row_names=integer(), Text=character(), 
                      ID = numeric(), Amount = character()))

#  row_names    Text            ID           Amount
#1         9       A       1427107                -
#2        99     (B)       3997915                -
#3       999 (SOCIO)       7161315                -
#4      9999      @M       4035115                -
#5     99999    01 Z 2136481035115            8,621
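In the pattern above the last group is preceded by a single \\s, so Amount appears to keep the remaining padding in front of the value (the wide Amount column in the printed output is consistent with that). A minimal variant, assuming that padding is not wanted, anchors the pattern and uses \\s+ before the last group. The names pattern, df2 and Amount_num are illustrative, and perl = TRUE is a choice made here to get PCRE matching, not something the original answer used.

```r
text <- c("    9 A                  1427107                -",
          "    99 (B)                3997915                -",
          "    999 (SOCIO)            7161315                -",
          "    9999 @M                 4035115                -",
          "    99999 01 Z               2136481035115         8,621")

# One capture group per column; \\s+ between groups absorbs all padding,
# and the lazy (.*?) lets Text contain single spaces such as "01 Z"
pattern <- "^\\s*(\\d+)\\s+(.*?)\\s+(\\d+)\\s+(\\S+)\\s*$"

df2 <- strcapture(pattern, text,
                  proto = data.frame(row_names = integer(),
                                     Text = character(),
                                     ID = numeric(),
                                     Amount = character(),
                                     stringsAsFactors = FALSE),
                  perl = TRUE)

# Optional (assumption: a numeric amount is useful downstream):
# treat "-" as missing and drop the thousands separator
df2$Amount_num <- as.numeric(gsub(",", "",
                                  ifelse(df2$Amount == "-", NA, df2$Amount)))
```

ID is kept as numeric rather than integer because a 13-digit value does not fit in R's 32-bit integer type.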