I am trying to import data from a large text file that does have a common delineator between the values. There is anywhere from 1 up to about 5 or so spaces between each of the values, but there are also single spaces within some of the values, which causes the values to be separated when using sep="".
I tried to copy and paste some of the data, but it is so large I am not sure how it will come across on this platform, so I have also included some pics of the text file.
Beginning of the data set (many more rows and columns): 
Middle of the data set:

No CollarID UTC Date UTC Time LMT Date LMT Time Origin SCTS Date SCTS Time ECEF X ECEF Y ECEF Z Latitude Longitude Height DOP FixType 3D Error Sats Sat C/N Sat C/N Sat C/N Sat C/N Sat C/N Sat C/N Sat C/N Sat C/N Sat C/N Sat C/N Sat C/N Sat C/N Mort. Status Activity Main Beacon Temp Easting Northing AnimalID GroupID
m/d/yyyy h:mm:ss AMPM m/d/yyyy h:mm:ss AMPM m/d/yyyy h:mm:ss AMPM [m] [m] [m] [°] [°] [m] [m] used No db No db No db No db No db No db No db No db No db No db No db No db [V] [V] [°C] sskkkmmm.mmm ±kkkkmmm.mmm
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 13642 2/24/2020 5:00:38 PM 2/24/2020 5:00:38 PM Collar 9/1/2022 4:42:09 PM 3789035 911471 5032242 52.4308439 13.5257923 83.00 1.8 val. GPS-3D 3.0 7 2 41 12 44 21 47 25 44 26 41 29 44 31 41 0 0 0 0 0 0 0 0 0 0 normal 0 3.14 3.46 6 33399772.133 5809981.769 N/A N/A
2 13642 2/24/2020 6:00:17 PM 2/24/2020 6:00:17 PM Collar 9/1/2022 4:42:09 PM 3789026 911466 5032229 52.4308433 13.5257518 66.65 4.4 val. GPS-3D 7.0 5 21 47 25 44 26 41 29 41 31 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.20 3.46 5 33399769.376 5809981.762 N/A N/A
3 13642 2/24/2020 7:00:39 PM 2/24/2020 7:00:39 PM Collar 9/1/2022 4:42:09 PM 3789033 911470 5032242 52.4308594 13.5257849 81.67 1.4 val. GPS-3D 3.2 8 5 41 16 41 20 47 21 41 26 44 27 38 29 44 31 29 0 0 0 0 0 0 0 0 normal 0 3.18 3.42 4 33399771.664 5809983.505 N/A N/A
4 13642 2/24/2020 8:00:17 PM 2/24/2020 8:00:17 PM Collar 9/1/2022 4:42:09 PM 3789038 911470 5032239 52.4308083 13.5257677 82.26 2.6 val. GPS-3D 7.0 6 16 44 20 50 21 41 26 44 27 44 29 32 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.18 3.44 4 33399770.379 5809977.850 N/A N/A
5 13642 2/24/2020 9:00:09 PM 2/24/2020 9:00:09 PM Collar 9/1/2022 4:42:09 PM 3789032 911468 5032235 52.4308313 13.5257597 75.24 5.0 val. GPS-3D 7.4 5 16 44 20 44 21 41 26 41 27 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.22 3.46 4 33399769.890 5809980.415 N/A N/A
6 13642 2/24/2020 10:00:38 PM 2/24/2020 10:00:38 PM Collar 9/1/2022 4:42:09 PM 3789041 911468 5032243 52.4308128 13.5257288 86.92 1.2 val. GPS-3D 3.4 9 1 35 8 44 10 47 11 44 16 44 20 41 21 35 27 44 30 38 0 0 0 0 0 0 normal 0 3.20 3.40 5 33399767.744 5809978.401 N/A N/A
7 13642 2/24/2020 11:00:17 PM 2/24/2020 11:00:17 PM Collar 9/1/2022 4:42:09 PM 3789032 911467 5032229 52.4308001 13.5257454 70.34 2.2 val. GPS-3D 4.4 6 1 38 8 41 10 44 11 44 20 35 27 47 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.20 3.42 6 33399768.848 5809976.964 N/A N/A
8 13642 2/25/2020 12:00:08 AM 2/25/2020 12:00:08 AM Collar 9/1/2022 4:42:09 PM 3789035 911467 5032237 52.4308231 13.5257351 78.46 2.8 val. GPS-3D 5.2 5 1 44 8 44 10 44 11 38 27 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.22 3.48 7 33399768.199 5809979.543 N/A N/A
9 13642 2/25/2020 1:00:39 AM 2/25/2020 1:00:39 AM Collar 9/1/2022 4:42:09 PM 3789033 911470 5032236 52.4308265 13.5257849 76.91 1.6 val. GPS-3D 2.4 8 1 44 3 44 8 41 11 47 14 44 17 38 22 44 28 41 0 0 0 0 0 0 0 0 normal 0 3.22 3.44 8 33399771.589 5809979.849 N/A N/A
10 13642 2/25/2020 2:00:16 AM 2/25/2020 2:00:16 AM Collar 9/1/2022 4:42:09 PM 3789034 911470 5032239 52.4308360 13.5257814 79.88 1.8 val. GPS-3D 3.8 7 1 44 3 44 11 47 14 41 17 41 19 38 22 44 0 0 0 0 0 0 0 0 0 0 normal 0 3.24 3.48 8 33399771.377 5809980.912 N/A N/A
11 13642 2/25/2020 3:00:09 AM 2/25/2020 3:00:09 AM Collar 9/1/2022 4:42:09 PM 3789030 911470 5032236 52.4308473 13.5257952 75.14 2.2 val. GPS-3D 6.6 6 1 44 3 44 14 32 17 44 19 41 22 44 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.24 3.46 8 33399772.337 5809982.145 N/A N/A
12 13642 2/25/2020 4:00:38 AM 2/25/2020 4:00:38 AM Collar 9/1/2022 4:42:09 PM 3789036 911469 5032236 52.4308074 13.5257603 78.55 1.6 val. GPS-3D 3.2 8 1 38 2 38 6 44 9 44 19 44 22 47 23 44 31 38 0 0 0 0 0 0 0 0 normal 0 3.22 3.44 7 33399769.873 5809977.757 N/A N/A
13 13642 2/25/2020 4:54:41 AM 2/25/2020 4:54:41 AM Collar 9/1/2022 4:42:09 PM 3789043 911469 5032240 52.4307808 13.5257362 85.87 2.4 val. GPS-3D 4.6 7 2 41 3 47 4 38 6 44 9 44 22 35 23 41 0 0 0 0 0 0 0 0 0 0 Mortality no radius 0 3.22 3.46 7 33399768.176 5809974.837 N/A N/A
14 13642 2/25/2020 5:21:45 AM 2/25/2020 5:21:45 AM Collar 9/1/2022 4:42:09 PM 3789044 911474 5032243 52.4307820 13.5258042 89.55 2.8 val. GPS-3D 5.6 6 2 44 3 44 4 41 6 44 9 44 23 41 0 0 0 0 0 0 0 0 0 0 0 0 Mortality no radius 0 3.22 3.48 7 33399772.804 5809974.874 N/A N/A
44 13642 2/26/2020 9:00:16 AM 2/26/2020 9:00:16 AM Collar 9/1/2022 4:42:09 PM 3789037 911470 5032239 52.4308152 13.5257711 81.66 2.2 val. GPS-3D 4.6 7 5 47 7 44 13 44 15 41 27 29 28 47 30 41 0 0 0 0 0 0 0 0 0 0 Mortality no radius 0 3.22 3.44 4 33399770.628 5809978.615 N/A N/A
45 13642 2/26/2020 10:00:09 AM 2/26/2020 10:00:09 AM Collar 9/1/2022 4:42:09 PM 3789031 911468 5032232 52.4308218 13.5257632 72.27 3.4 val. GPS-3D 5.4 5 5 38 13 44 15 44 28 44 30 44 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Mortality no radius 0 3.22 3.44 3 33399770.102 5809979.352 N/A N/A
46 13642 3/10/2020 9:01:37 PM 3/10/2020 9:01:37 PM Collar 9/1/2022 4:42:09 PM N/A N/A N/A N/A N/A N/A 48.6 No Fix N/A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.22 3.50 19 31166021.445 0.000 N/A N/A
47 13642 3/11/2020 1:01:37 AM 3/11/2020 1:01:37 AM Collar 9/1/2022 4:42:09 PM N/A N/A N/A N/A N/A N/A 48.6 No Fix N/A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.22 3.52 19 31166021.445 0.000 N/A N/A
48 13642 3/11/2020 3:01:37 AM 3/11/2020 3:01:37 AM Collar 9/1/2022 4:42:09 PM N/A N/A N/A N/A N/A N/A 48.6 No Fix N/A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 normal 0 3.22 3.52 19 31166021.445 0.000 N/A N/A
I tried the following code, which I think would have worked if there weren't some places where the values are only separated by a single space (see pic of middle of the data set)
headers <- read.table(text = gsub("\\s{2,}", "\t", readLines(file_name)), sep="\t", header = FALSE, nrows = 1, as.is = TRUE)
Vec_data <- read.table(text = gsub("\\s{2,}", "\t", readLines(file_name)), sep="\t", skip = 3, header = FALSE)
colnames(Vec_data)= headers
That's a "fixed width format" (FWF) file, so we need
read.fwfand need to determine each column's widths.I took one of the lines and counted to the end of each value. For instance:
(etc).
There are two ways to do this:
We can
readLines()the whole file and then subset[-(2:3)]the second/third rows. If the file is huge, this may be less efficient (given R's global string hash), but if not then this is likely the easiest way to go.Unfortunately, it does not work with this file :-(
(Note, sometimes this means that one or more of the columns are off by 1 or so ... "+1" in one and "-1" in another may fix it. I just didn't work on it here.)
Sometimes with FWF files, the column names don't perfectly align with the data columns. In this case, we need to skip some rows and then either find another set of
widthsto use, or just set them manually.For names, oddly enough we can read them in like this:
Unfortunately, note that many of them are repeating (
"Sat"and"C/N"), so you'd need to find a way to unique-ify them anyway.After that, I suggest some basic cleanup, including removal of leading spaces (most fields), converting
"N/A"to R'sNA, and then after all of that convert them into classes that R knows about (since any column that sees"N/A"will be a string column even if it really is numeric).(See here for the
quux[] <-thing.)From here, now you'll have the challenges of converting dates and times to
POSIXt