Good afternoon Community,
I have .dat file (a sample included) that originally, I had made a previous post thinking that I just needed help to shift rows that were misaligned. However, in further examining of the data, I see the problem is a bit more complex (to me) and I am having a difficult time figuring out how to approach this data and problem.
I am trying to clean this file and preferably get it in a dataframe so that I can use specific information from this file to perform a comparative analysis with other csv files that I have managed to clean up already.
**TL;DR ** I have a .dat file that contains multiple tables of various header that I am trying to clean and preferably put in a DataFrame. The complexity (to me) is that this file contains two primary headers with different lengths that are not only at the beginning but throughout the file. There is also some misalignment for specific columns and it's important that the 'structure' is maintained as each table in the file correlates to a route.
The details..
Firstly, the file itself consists of many table joins which also contains headers throughout the data.
These headers follow two primary types and patterns:
The 1st header contains Time | Sim_time, etc which are separated by ' | ' and the data that immediately follows this have the same corresponding ' | '. This is represented in just two rows in the data.
The 2nd header is identified by the index (idx) and is space separated. It's data also follows respectively.
Throughout the file you will notice that the end of one set of data and the beginning of the next can be determined when the Time (date/time e.g. year-month-day) is found. This represents the 1st header with Idx correlating to the 2nd header and the corresponding data proceeding immediately after.
Furthermore, when looking at the data and putting it in a delimited format, I observed that the fix_nam column under the 2nd header is misaligned showing time when there is no fix_nam which is identified as letters. everything here should be shifted right leaving a blank under fix_nam if there is no string character. Leading me to the original problem I had posted Here.
I am a bit stuck as all my scripts seem to produce various outputs but not one in a proper DataFrame to encapsulate what I am asking. I did include one of my latest scripts so if anyone has any thoughts on how to approach this, I would be very appreciative.
If there is more context or information that I can provide to help please let me know.
Thank you.
import pandas as pd
# Initialize lists to store headers and data
headers_list = []
data_list = []
second_header_list = []
# Open the file and read line by line
with open('datashort.dat', 'r') as file:
for line in file:
# Strip whitespace characters from the line
line = line.strip()
if line:
# Split the line into fields using '|' delimiter
fields = line.split('|')
if len(fields) > 1:
# Check if the headers are from the first set or the second set
if len(headers_list) == 0:
headers_list.append(fields)
else:
second_header_list.append(fields)
else:
data_list.append(fields[0].split())
headers_df = pd.DataFrame(headers_list)
second_header_df = pd.DataFrame(second_header_list)
data_df = pd.DataFrame(data_list)
headers_combined = headers_df.apply(lambda x: '|'.join(x.dropna()), axis=1)
result_df = pd.concat([headers_combined, data_df], ignore_index=True, axis=1)
result_df.columns = result_df.iloc[0]
result_df = result_df[1:]
output_path = 'output.csv'
result_df.to_csv(output_path, index=False)
Here is some sample data (apologies if I added this incorrectly)
Time|Sim_Time|Record_Code|Record_Type|Channel_Id|Channel_Mode|Rec_Id|App_Id|Process_Id|Host_Name|Acid|Cfpid|Tail_No|NAS_Cid|Dept|Arr|FixNbr|Profiles|Parse_Time|System_Mode|Release_Name$$$
2023-07-31T23:30:00.442Z|2023-07-31T23:30:00.442Z|32|28003|A|Active|420|420|6049|fdpsa101|XXX1234|1891|C857TJ|538|LALA|AHAH|75|
Idx fix_loc fix_nam ETA Spd LL UL clr_0 clr_1 Post Center fixOvr fixInd fixInd2 FixTypes
00 183100N0682100W LALA 22:16:22 M078 F320 F320 F320 F320 NY DOMES CERDA x00 x00 x00 RADAR_MON_IN/OUT_XFER/
01 183100N0682100W LALA 22:16:22 M078 F320 F320 F320 F320 CERDA x0100 x04 x00 DISPLAY/OUT_XFER/AIRPORT/DCT/
02 184531N0683420W 22:18:47 M078 F320 F320 F320 F320 CERDA x00 x04 x00 RADAR_MON_OUT/OUT_XFER/DCT/
03 190919N0685619W 22:22:45 M078 F320 F320 F320 F320 CERDA x00 x04 x00 RADAR_MON_IN/OUT_XFER/DCT/
04 193800N0692300W 22:27:35 M078 F320 F320 F320 F320 CERDA x00 x04 x00 DISPLAY/AS_FILED/OUT_XFER/DCT/
05 194748N0692411W 22:28:49 M078 F320 F320 F320 F320 CERDA x00 x04 x00 RADAR_IN/OUT_XFER/DCT/
06 194748N0692411W 22:28:49 M078 F320 F320 F320 F320 CERDA x00 x04 x00 RCADEV_IN/OUT_XFER/DCT/
07 230754N0694847W 22:54:09 M078 F320 F320 F320 F320 CERDA x00 x4001 x00 KEEP/OUT_XFER/REPORTED/ADS_REP/
08 241119N0695649W 23:01:52 M078 F320 F320 F320 F320 CERDA x00 x4001 x00 KEEP/OUT_XFER/REPORTED/ADS_REP/
09 242117N0695806W 23:03:05 M078 F320 F320 F320 F320 CERDA x00 x0200 x00 MOBPST/OUT_XFER/LVCHG/
10 242637N0695847W 23:03:44 M078 F320 F320 F320 F320 CERDA x00 xffffc001 x00 KEEP/OUT_XFER/REPORTED/ADS_REP/RADAR_REP/
11 244025N0700034W 23:05:25 M078 F320 F320 F320 F320 CERDA CERDA x00 x04 x00 XCENTER_IN/OUT_XFER/DCT/
12 244025N0700034W 23:05:25 M078 F320 F320 F320 F320 CERDA CERDA x00 x04 x00 DISPLAY/CENTER_IN/OUT_XFER/DCT/
13 245058N0700155W 23:06:42 M078 F320 F320 F320 F320 90 CERDA x00 x04 x00 POSTING_IN/OUT_XFER/DCT/
14 245213N0700205W 23:06:52 M078 F320 F320 F320 F320 CERDA x00 x4001 x00 KEEP/REPORTED/ADS_REP/
15 250000N0700306W 23:07:49 M078 F320 F320 F320 F320 CERDA CERDA x00 x04 x00 CENTER_OUT/DCT/
16 250000N0700306W 23:07:49 M078 F320 F320 F320 F320 90 x00 x04 x00 SECTOR_IN/IN_FIR/DCT/
17 250000N0700306W LAMER 23:07:49 M078 F320 F320 F320 F320 90 x00 x4000 x00 DISPLAY/AS_FILED/IN_FIR/ADS_REP/
18 250000N0700306W 23:07:49 M078 F320 F320 F320 F320 90 x00 x04 x00 RCADEV_OUT/IN_FIR/DCT/
19 250000N0700306W 23:07:49 M078 F320 F320 F320 F320 90 x00 x04 x00 RADAR_OUT/IN_FIR/DCT/
20 250829N0700503W 23:08:53 M078 F320 F320 F320 F320 90 x00 x4001 x00 KEEP/IN_FIR/REPORTED/ADS_REP/
21 251131N0700544W 23:09:19 M078 F320 F320 F320 F320 90 x00 xffff8001 x00 KEEP/IN_FIR/REPORTED/RADAR_REP/
22 251436N0700627W 23:09:42 M078 F320 F320 F320 F320 90 x00 xffff8001 x00 KEEP/IN_FIR/REPORTED/RADAR_REP/
23 251743N0700710W 23:10:05 M078 F320 F320 F320 F320 90 x400000 x00 x00 IN_FIR/MOBPST/DEVIAT_START/
24 251920N0700733W 23:10:17 M078 F320 F320 F320 F320 90 x800000 xffff8001 x00 KEEP/IN_FIR/DEVIAT/REPORTED/RADAR_REP/
25 252232N0700817W 23:10:40 M078 F320 F320 F320 F320 90 x800000 x4001 x00 KEEP/IN_FIR/DEVIAT/REPORTED/ADS_REP/
26 252515N0700855W 23:11:00 M078 F320 F320 F320 F320 CERDA 90 x800000 x04 x00 XCENTER_OUT/IN_FIR/DEVIAT/DCT/
27 252949N0700958W 23:11:34 M078 F320 F320 F320 F320 90 x800000 x04 x00 RADAR_MON_OUT/IN_FIR/DEVIAT/DCT/
28 253857N0701205W 23:12:45 M078 F320 F320 F320 F320 90 x800000 x4001 x00 KEEP/IN_FIR/DEVIAT/REPORTED/ADS_REP/
29 254222N0701253W 23:13:11 M078 F320 F320 F320 F320 90 x800000 x4001 x00 KEEP/IN_FIR/DEVIAT/REPORTED/ADS_REP/
30 254339N0701311W 23:13:21 M078 F320 F320 F320 F320 90 x800000 x4001 x00 KEEP/IN_FIR/DEVIAT/REPORTED/ADS_REP/
31 262034N0702148W 23:18:03 M078 F320 F320 F320 F320 90 x800000 x00 x00 IN_FIR/MOBPST/DEVIAT/
32 265903N0703054W 23:22:57 M078 F320 F320 F320 F320 90 x800000 x4001 x00 KEEP/IN_FIR/DEVIAT/REPORTED/ADS_REP/
33 270119N0703127W 23:23:15 M078 F320 F320 F320 F320 90 x800000 x00 x00 IN_FIR/MOBPST/DEVIAT/
34 275117N0704325W 23:29:50 M078 F320 F320 F320 F320 90 x800000 x4001 x00 KEEP/IN_FIR/DEVIAT/REPORTED/ADS_REP/
35 285101N0705759W 23:37:29 M078 F320 F320 F320 F320 90 x800000 x6000 x00 KEEP/IN_FIR/DEVIAT/PILOT_EST_REP/ADS_REP/
36 285400N0705844W BOREX 23:37:51 M078 F320 F320 F320 F320 90 x800000 x4000 x00 DISPLAY/AS_FILED/IN_FIR/DEVIAT/ADS_REP/
37 302440N0712125W 23:49:14 M078 F320 F320 F320 F320 87 90 x800000 x04 x00 POSTING_IN/IN_FIR/DEVIAT/DCT/
38 302440N0712125W 23:49:14 M078 F320 F320 F320 F320 90 x800000 x04 x00 SECTOR_OUT/IN_FIR/DEVIAT/DCT/
39 302440N0712125W 23:49:14 M078 F320 F320 F320 F320 87 x800000 x04 x00 SECTOR_IN/IN_FIR/DEVIAT/DCT/
40 302440N0712125W 23:49:14 M078 F320 F320 F320 F320 90 87 x800000 x04 x00 POSTING_OUT/IN_FIR/DEVIAT/DCT/
41 320244N0714648W ALOBI 00:01:34 M078 F320 F320 F320 F320 87 x800000 x00 x05 AS_FILED/IN_FIR/DEVIAT/NON_COMPULSORY/MIDAIRWAY_POINT/
42 330141N0720230W LSIER 00:09:05 M078 F320 F320 F320 F320 87 x800000 x00 x01 KEEP/IN_FIR/DEVIAT/NON_COMPULSORY/
43 335855N0721806W ONGOT 00:16:27 M078 F320 F320 F320 F320 87 x800000 x00 x00 DISPLAY/AS_FILED/IN_FIR/DEVIAT/
44 340000N0721813W 00:16:35 M078 F320 F320 F320 F320 87 x800000 x04 x00 RADAR_MON_IN/IN_FIR/DEVIAT/DCT/
45 341219N0721932W 00:18:07 M078 F320 F320 F320 F320 82 87 x800000 x04 x00 XCENTER_IN/IN_FIR/DEVIAT/DCT/
46 341915N0722017W 00:19:00 M078 F320 F320 F320 F320 87 x01000000 x00 x00 MOBILE/IN_FIR/DEVIAT_END/
47 344317N0722254W 00:22:00 M078 F320 F320 F320 F320 87 x00 x04 x00 SECTOR_OUT/IN_FIR/DCT/
48 344317N0722254W 00:22:00 M078 F320 F320 F320 F320 82 82 x00 x04 x00 CENTER_IN/DCT/
49 344317N0722254W 00:22:00 M078 F320 F320 F320 F320 82 x00 x04 x00 RADAR_IN/DCT/
50 344317N0722254W 00:22:00 M078 F320 F320 F320 F320 82 x00 x04 x00 RCADEV_IN/DCT/
51 344317N0722254W 00:22:00 M078 F320 F320 F320 F320 87 82 x00 x04 x00 POSTING_OUT/DCT/
52 344328N0722255W SAUCR 00:22:02 M078 F320 F320 F320 F320 82 x00 x00 x00 DISPLAY/AS_FILED/
53 345804N0722032W STERN 00:23:50 M078 F320 F320 F320 F320 82 x00 x00 x00 DISPLAY/AS_FILED/
54 360224N0721936W CHUBY 00:31:41 M078 F320 F320 F320 F320 82 x00 x00 x00 DISPLAY/AS_FILED/
55 364207N0723231W HOBOH 00:36:45 M078 F320 F320 F320 F320 82 x00 x00 x00 DISPLAY/AS_FILED/
56 364303N0723232W 00:36:51 M078 F320 F320 F320 F320 86 82 x00 x04 x00 XCENTER_IN/DCT/
57 364303N0723232W 00:36:51 M078 F320 F320 F320 F320 82 82 x00 x04 x00 CENTER_OUT/DCT/
58 364303N0723232W 00:36:51 M078 F320 F320 F320 F320 86 86 x00 x04 x00 CENTER_IN/DCT/
59 364303N0723232W 00:36:51 M078 F320 F320 F320 F320 82 86 x00 x04 x00 XCENTER_OUT/DCT/
60 371340N0723312W SILLY 00:40:31 M078 F320 F320 F320 F320 86 x00 x00 x04 DISPLAY/AS_FILED/MIDAIRWAY_POINT/
61 374940N0721716W STINK 00:44:53 M078 F320 F320 F320 F320 86 x00 x00 x04 DISPLAY/AS_FILED/MIDAIRWAY_POINT/
62 381653N0715713W YAALE 00:48:25 M078 F320 F320 F320 F320 86 x00 x00 x00 DISPLAY/AS_FILED/
63 383655N0715700W YETTI 00:50:46 M078 F320 F320 F320 F320 86 x00 x00 x00 DISPLAY/AS_FILED/
64 385917N0715643W MOUGH 00:53:23 M078 F320 F320 F320 F320 86 x00 x00 x00 DISPLAY/AS_FILED/
65 394805N0724825W 01:01:45 M078 F320 F320 F320 F320 86 86 x00 x04 x00 DISPLAY/CENTER_OUT/DCT/
66 394805N0724825W 01:01:45 M078 F320 F320 F320 F320 86 86 x00 x04 x00 XCENTER_OUT/DCT/
67 394924N0724950W OWENZ 01:01:59 M078 F320 F320 F320 F320 86 x00 x00 x00 DISPLAY/AS_FILED/
68 394841N0731541W PREPI 01:05:00 M078 F320 F320 F320 F320 86 x00 x00 x00 DISPLAY/AS_FILED/
69 395537N0733724W LEECY 01:07:37 M078 F320 F320 F320 F320 86 x00 x00 x00 DISPLAY/AS_FILED/
70 400102N0735140W CAMRN 01:09:23 M078 F320 F320 F320 F320 86 x00 x04 x00 DISPLAY/AS_FILED/DCT/
71 400331N0735121W 01:09:41 M078 F320 F320 F320 F320 86 x00 x04 x00 RADAR_MON_OUT/DCT/
72 403823N0734644W AHAH 01:13:57 M078 F320 F320 F320 F320 86 x0100 x04 x00 DISPLAY/AIRPORT/DCT/
73 403823N0734644W AHAH 01:13:57 M078 F320 F320 F320 F320 86 x00 x00 x00 RCADEV_OUT/
74 403823N0734644W AHAH 01:13:57 M078 F320 F320 F320 F320 NY DOMES 86 x00 x00 x00 RADAR_OUT/|2023-07-31T23:36:18.000Z|L|ocean21.t2802000.na01851b$$$
2023-07-31T23:30:00.690Z|2023-07-31T23:30:00.690Z|32|28003|A|Active|420|420|6049|fdpsa101|FET2102|1979|VLRRI|647|LALA|EGGG|94|
Idx fix_loc fix_nam ETA Spd LL UL clr_0 clr_1 Post Center fixOvr fixInd fixInd2 FixTypes
00 191320N0682332W CHUMA 21:40:00 M084 F340 F340 F340 F340 SANJUAN SANJUAN x00 x00 x00 XCENTER_IN/
01 191322N0682330W 21:40:00 M084 F340 F340 F340 F340 SANJUAN SANJUAN x00 x04 x00 CENTER_IN/DCT/
02 191322N0682330W 21:40:00 M084 F340 F340 F340 F340 SANJUAN x00 x04 x00 RADAR_IN/DCT/
03 191320N0682332W CHUMA 21:40:00 M084 F340 F340 F340 F340 NY DOMES SANJUAN x00 x00 x00 RADAR_MON_IN/
04 191320N0682332W CHUMA 21:40:00 M084 F340 F340 F340 F340 SANJUAN x00 x00 x00 RCADEV_IN/
05 191320N0682332W CHUMA 21:40:00 M084 F340 F340 F340 F340 SANJUAN x00 x00 x00 DISPLAY/AS_FILED/
06 200011N0672752W DOZGO 21:48:20 M084 F340 F340 F340 F340 SANJUAN x00 x04 x00 DISPLAY/AS_FILED/DCT/
07 203222N0671039W JANMA 21:52:36 M084 F340 F340 F340 F340 SANJUAN x00 x00 x00 DISPLAY/AS_FILED/
08 211556N0670734W LENNT 21:57:45 M084 F340 F340 F340 F340 SANJUAN x00 x00 x01 KEEP/NON_COMPULSORY/
09 211924N0670816W 21:58:10 M084 F340 F340 F340 F340 SANJUAN x00 x0200 x00 MOBPST/LVCHG/
10 213716N0671152W 22:00:19 M084 F340 F340 F340 F340 90 SANJUAN x00 x04 x00 POSTING_IN/DCT/
11 213716N0671152W 22:00:19 M084 F340 F340 F340 F340 SANJUAN SANJUAN x00 x04 x00 CENTER_OUT/DCT/
12 213716N0671152W 22:00:19 M084 F340 F340 F340 F340 90 x00 x04 x00 SECTOR_IN/IN_FIR/DCT/
13 213717N0671152W KINCH 22:00:19 M084 F340 F340 F340 F340 90 x00 x4001 x00 DISPLAY/AS_FILED/IN_FIR/REPORTED/ADS_REP/