I'm working on processing Lidar data with Python. The test data has about 150 000 data points, but the actual data will contain hundreds of millions. Initially it was exported as a .dwg file; however, since I couldn't find a way to process that, I decided to convert it to *.dxf and work from there. I'm now trying to extract the point coordinates and layer and save them as a *.csv file for further processing. Here is the code:
import pandas as pd

PointCloud = pd.DataFrame(columns=['X', 'Y', 'Z', 'Layer'])
filename = "template"

# Read the whole file into memory using readlines()
with open(filename + ".dxf", "r") as f2:
    input = list(f2.readlines())

# Restrict the scan to the ENTITIES section to speed things up (see the .dxf documentation)
i = input.index('ENTITIES\n')  # find the beginning of the ENTITIES section
length = input.index('OBJECTS\n')  # find the beginning of the OBJECTS section, where the scan stops

while i < length:
    line = input[i]
    if i % 1000 == 0:
        print("Completed: " + str(round(i / length * 100, 2)) + "%")
    if line.startswith("AcDbPoi"):
        x = float(input[i + 2].strip())
        y = float(input[i + 4].strip())
        z = float(input[i + 6].strip())
        layer = input[i - 2].strip()  # strips the newline character
        point = {'X': x, 'Y': y, 'Z': z, 'Layer': layer}
        PointCloud.loc[PointCloud.shape[0]] = [x, y, z, layer]
        i += 14  # skip ahead past the lines already consumed for this point
    else:
        i += 1

PointCloud.to_csv(filename + '.csv', sep='\t', encoding='utf-8')
While it works, going line by line is not the most efficient way, so I'm looking for ways to optimize it. Here is the *.dxf point structure that I'm interested in extracting:
AcDbEntity
8
SU-SU-Point cloud-Z
100
AcDbPoint
10
4.0973
20
2.1156
30
-0.6154000000000001
0
POINT
5
3130F
330
2F8CD
100
AcDbEntity
Here the group codes 10, 20, and 30 are followed by the X, Y, and Z coordinates, and 8 is followed by the layer name. Any ideas on how to improve it would be greatly appreciated.
The slowest part is file IO and I don't think this can be sped up much.
But it could be more memory efficient by actually reading the (very large) DXF file line by line instead of loading it all at once. The code could also be more robust by parsing only the absolute minimum data from the POINT entities; this way the function can parse newer DXF versions as well as DXF R12 and older.
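A minimal sketch of that idea, not a definitive implementation: the file is read as (group code, value) pairs, two lines at a time, and only the tags 8, 10, 20 and 30 of POINT entities inside the ENTITIES section are kept. The names `iter_tags`, `iter_points` and `dxf_points_to_csv` are made up for this example. It assumes a missing layer defaults to "0", a missing coordinate defaults to 0.0, and that the file can be decoded as UTF-8 (older DXF versions may need a different `encoding`).

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def iter_tags(stream: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (group code, value) pairs, reading two lines at a time."""
    while True:
        code = stream.readline()
        value = stream.readline()
        if not value:  # end of file (or a dangling group code)
            return
        yield int(code), value.strip()


def iter_points(stream: TextIO) -> Iterator[Tuple[float, float, float, str]]:
    """Yield (x, y, z, layer) for each POINT entity in the ENTITIES section."""
    in_entities = False
    expect_section_name = False
    point = None  # collected values while inside a POINT entity
    for code, value in iter_tags(stream):
        if code == 0:
            # Group code 0 starts a new entity, so the previous one is complete.
            if point is not None:
                yield point["x"], point["y"], point["z"], point["layer"]
                point = None
            if value == "ENDSEC":
                in_entities = False
            elif value == "POINT" and in_entities:
                # Assumed defaults: layer "0", missing coordinates 0.0
                point = {"x": 0.0, "y": 0.0, "z": 0.0, "layer": "0"}
            expect_section_name = value == "SECTION"
        elif code == 2 and expect_section_name:
            in_entities = value == "ENTITIES"
            expect_section_name = False
        elif point is not None:
            if code == 8:
                point["layer"] = value
            elif code == 10:
                point["x"] = float(value)
            elif code == 20:
                point["y"] = float(value)
            elif code == 30:
                point["z"] = float(value)
            # every other group code (handles, subclass markers, ...) is ignored


def dxf_points_to_csv(dxf_path: str, csv_path: str, encoding: str = "utf-8") -> int:
    """Stream the points to a tab-separated file and return the point count."""
    count = 0
    with open(dxf_path, "r", encoding=encoding, errors="replace") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t")
        writer.writerow(["X", "Y", "Z", "Layer"])
        for row in iter_points(src):
            writer.writerow(row)
            count += 1
    return count


# Small in-memory demo; for a real file use dxf_points_to_csv("template.dxf", "template.csv")
demo = "0\nSECTION\n2\nENTITIES\n0\nPOINT\n8\nLayerA\n10\n1.5\n20\n2.5\n30\n3.5\n0\nENDSEC\n0\nEOF\n"
for x, y, z, layer in iter_points(io.StringIO(demo)):
    print(x, y, z, layer)
```

Because the rows are written as they are found, memory use stays roughly constant regardless of file size, and nothing depends on fixed line offsets or on the `AcDbPoint` subclass marker, which DXF R12 files do not contain.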
FYI: This is the simplest valid DXF R12 file containing only POINT entities: