I've reached a point where I have no clear idea of how to resolve my problem. I am working with a text file (*.inp, an Abaqus job file) and I want to extract some basic information from it. So far I have identified two major problems:
- Such files are quite big, e.g. 500,000 lines.
- Their structure is not always CSV-like.
Ad 1. Because of the huge amount of data, I wanted to use the pandas library to speed up the operations (which will be repeated in an optimization loop).
Ad 2. An example *.inp file with its "strange" structure (please note that "node" and "element" are actual names used in the code, and each element is built up from several nodes, e.g. a cube = element, each of the cube's vertices = node):
*NODE
1, 0.0, 0.0, 3.0
2, -17.0, 5.5, 2.3
3, 51.0, 0.0, 639.8
5, 0.0, 5.5 , 31.0
...
145000, 31.3, 21.5, 99.8
*ELEMENT, ELSET=Name1, TYPE=Type1
1527450, 265156, 273237, 265019, 265021, 275728, 273221, 265599,
265146, 273583, 265020
1527449, 269279, 272869, 269277, 269479, 273130, 272862, 269278,
269489, 275729, 269627
1527448, 272250, 272858, 275350, 273327, 272851, 275730, 275731,
273346, 275732, 275733
...
1126546, 265180, 275352, 273263, 273237, 275736, 275737, 275738,
275739, 275740, 273246
*ELEMENT, ELSET=Name2, TYPE=Type2
...
*SURFACE, NAME=Surf1
12345, S5
34567, S3
...
*STEP
*STATIC
1.0,,,1.0
*BOUNDARY
bc_1,1,3,0.0
bc_2,6,6,0.0
...
...
Values listed under the "*NODE" keyword have the following sequence: node_id, coord_x, coord_y, coord_z.
This is the biggest set of data in the model, which is why I wanted to use pandas for it (reading it like a CSV). For this part I see no major issues.
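Roughly what I do for the nodes now; the file name and the skiprows/nrows values are placeholders, since in reality the section boundaries have to be located first:

```python
import pandas as pd

# Placeholder offsets: this assumes the *NODE keyword sits on the first
# line and that the section holds n_nodes records.
n_nodes = 145000
A = pd.read_csv("model.inp", skiprows=1, nrows=n_nodes, header=None,
                names=["node_id", "coord_x", "coord_y", "coord_z"])
```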
Values listed under the "*ELEMENT" keyword are a bit more complicated:
line n: element_id, node1_id, node2_id, node3_id, node4_id, node5_id, node6_id, node7_id
line n+1: node8_id, node9_id, node10_id
In this case, pandas imports this part as two separate rows (obviously), with N/A items in the last 7 columns of the n+1 rows; I use pd.read_csv for it. Please be aware that node ids 1 to 10 together form one element (with the element id given as the first value in row n).
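To illustrate on the first element from the sample above (io.StringIO only stands in for the real file here):

```python
import io
import pandas as pd

block = io.StringIO(
    "1527450, 265156, 273237, 265019, 265021, 275728, 273221, 265599,\n"
    "265146, 273583, 265020\n"
)
# One element is split over two physical lines, so read_csv sees two
# rows; the shorter continuation row is padded with NaN.
df = pd.read_csv(block, header=None)
print(df.shape)  # (2, 9) instead of one 11-column row
```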
And now to state the problem :):
- How do I properly import the data lying between *ELEMENT, ELSET=Name1 and *ELEMENT, ELSET=Name2, when my aim is to have a matrix in which each element occupies exactly 1 row, with a total of 11 columns (1st: element_id, 2-11: node1_id ... node10_id)?
- So far I have divided this *.inp file into separate files to be able to work on them... Now I want to do it all in one script, i.e. create matrix A = [(node_id, coord_x, coord_y, coord_z), ...] and matrix B = [(element_id, node1_id, ..., node10_id), ...] at once. How can I do so, if a simple pd.read_csv doesn't perform well in this case? There are plenty of purely textual rows which should either not be imported or be excluded, to speed up the script.
My idea was to read the *.inp file in Python with the built-in open function, then add some kind of tags/triggers to mark which lines should be used further (and maybe processed with pandas), as in the sketch below; but in that case I am not using pandas for the import itself...
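Just the trigger part as a rough sketch; the file name is a placeholder and nothing is parsed yet:

```python
# Rough sketch of the tags/triggers idea: record the line number at which
# each keyword section starts, so the data blocks can be sliced out later.
section_starts = {}
with open("model.inp") as f:
    for i, line in enumerate(f):
        if line.startswith("*"):
            section_starts.setdefault(line.strip(), i)
```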
I believe my problem is quite dull for most of you, but I am not strictly a developer :) I do not expect a direct, ready-made solution, but rather your advice on where to look for potential answers or tools.
Thank you all in advance and I wish you a nice day, prz
Interesting challenge!
So long as the files you need to process roughly follow that structure, something like this might work for you. See below for the output.
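A minimal sketch of the idea; the helper name complete_records and the fixed field counts in widths are my own choices, and I've trimmed your sample data for brevity:

```python
import io
import pandas as pd

# Stand-in for open("data.inp"): a trimmed version of the sample file.
DATA = io.StringIO("""\
*NODE
1, 0.0, 0.0, 3.0
2, -17.0, 5.5, 2.3
3, 51.0, 0.0, 639.8
5, 0.0, 5.5, 31.0
*ELEMENT, ELSET=Name1, TYPE=Type1
1527450, 265156, 273237, 265019, 265021, 275728, 273221, 265599,
265146, 273583, 265020
1527449, 269279, 272869, 269277, 269479, 273130, 272862, 269278,
269489, 275729, 269627
*SURFACE, NAME=Surf1
12345, S5
34567, S3
""")

def complete_records(stream, widths):
    """Yield (keyword, record) pairs, gluing continuation lines together
    until a record has the expected number of fields for its section."""
    keyword, width, pending = None, 0, []
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):              # a new keyword section begins
            keyword = line.split(",")[0].upper()
            width = widths.get(keyword, 0)    # width 0 = ignore this section
            pending = []
            continue
        if width:
            pending.extend(f.strip() for f in line.split(",") if f.strip())
            while len(pending) >= width:      # a full record has accumulated
                yield keyword, pending[:width]
                pending = pending[width:]

widths = {"*NODE": 4, "*ELEMENT": 11}
nodes, elements = [], []
for keyword, record in complete_records(DATA, widths):
    (nodes if keyword == "*NODE" else elements).append(record)

A = pd.DataFrame(nodes, columns=["node_id", "coord_x", "coord_y", "coord_z"]).astype(float)
B = pd.DataFrame(
    elements,
    columns=["element_id"] + [f"node{i}_id" for i in range(1, 11)],
).astype(int)
print(A)
print(B)
```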
I've used io.StringIO() to make this self-sufficient, but it could just as well be an open("data.inp") file stream. The yield magic may seem a little arcane, sorry about that. :)

The output is frame A with one row per node (node_id plus the three coordinates) and frame B with one row per element (element_id plus the ten node ids), i.e. the 11-column matrix you asked for.