I need to improve the performance of a .NET desktop application which reads a database and creates XML files based on XBRL (eXtensible Business Reporting Language). It uses UBMatrix to create the XBRL taxonomies.
The application works fine when the data set is small, but it takes more than 30 minutes to generate the files when the data is large — and the client's data is always large, so the application always needs that much time.
My task is to optimise the application in order to reduce the time taken to create the XML files. When I checked the application, I found that it works this way:
Start
- Create a connection to the database
- Fetch the first set of data (table1, which is very large; the query returns around 15-30 K rows into a DataTable)
- For each row of the DataTable (for loop from 0 to dataTable.Rows.Count):
- Check some conditions
- Fetch data from the database (table2, which is even larger than table1)
- Send the data to form the XBRL and write it to XML (this is done by a third-party component called UBMatrix; it is not possible to edit the code which creates the XBRL XML file)
Similarly, 3 to 4 more sets of data are processed the same way.
In my observation, we can avoid the database calls inside the for loop by fetching all the data before the loop. When I checked the queries, I also found subqueries such as NOT EXISTS (SELECT * FROM table) that could be replaced with joins or with NOT EXISTS (SELECT 1 FROM table).
But the application still needs to process the rows in a loop. I am also thinking of using threading, so that I can create threads based on the size of the data and process the chunks simultaneously.
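The "fetch before the loop" idea could look roughly like this — a minimal sketch, where the table names, the `Key` column, and the connection are placeholders, not the application's real schema:

```csharp
// Sketch only: fetch table2 once up front and index it by key, so the
// loop does an in-memory lookup instead of one query per table1 row.
var lookup = new Dictionary<int, DataRow>();

using (var adapter = new SqlDataAdapter(
    "SELECT [Key], Col1, Col2 FROM table2", connection))
{
    var table2 = new DataTable();
    adapter.Fill(table2);                      // single round-trip to the DB
    foreach (DataRow row in table2.Rows)
        lookup[(int)row["Key"]] = row;         // O(1) lookups in the loop
}

foreach (DataRow row in table1.Rows)
{
    if (lookup.TryGetValue((int)row["Key"], out var detail))
    {
        // pass row + detail on to the XBRL/UBMatrix step
    }
}
```

Whether this pays off depends on how much of table2 the loop actually touches; if only a small fraction is needed, a single joined query that returns just those rows may be better than loading the whole table.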
For example:
- if there are 100 rows, there will be 100 entries in the XML (XBRL) file
- so I would split them 50/50 and run two threads, each generating its own XML file; at the end I would combine the two into one XML file
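The split idea above could be sketched like this — assuming, and this is a big assumption, that the XBRL-writing step can safely run on two threads at once (`WriteXbrl` is a hypothetical wrapper around the per-row processing, not a real UBMatrix call):

```csharp
// Sketch of the 50/50 split: two tasks, each writing half the rows to
// its own output file. Only valid if WriteXbrl is thread-safe.
var rows = table1.Rows.Cast<DataRow>().ToList();
int half = rows.Count / 2;

var first  = Task.Run(() => WriteXbrl(rows.Take(half), "part1.xml"));
var second = Task.Run(() => WriteXbrl(rows.Skip(half), "part2.xml"));
Task.WaitAll(first, second);

// afterwards: merge part1.xml and part2.xml into the final document
```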
That way, processing of the 0th row and the 50th row can start at the same time, whereas in the current for loop the 99th row is only processed at the very end. I am not sure about this idea. Can anyone suggest or share their ideas? Any help will be appreciated. Thanks in advance.
Not really an answer, just a really large comment:
I would remove multi-threading from your plans unless the UBMatrix API states that it is thread-safe — think of all the disk I/O involved in generating the XBRL.
Have you profiled your app for memory usage? I am thinking of the 15-30 K rows of data being loaded, then possibly transferred into an object model prior to processing and writing to file. If you start to reach the 2 GB limit (32-bit process), your process will be doing a lot of paging, which is very slow.
Would this alternative be a possibility? Pre-generate the data to a file, possibly in XML format. Then, hoping UBMatrix has an API which accepts a file path and streams the data, you could just pass it the path to your data file. (This is aimed mainly at a possible memory issue, but it could still speed things up if the data queries are long-running.)
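The pre-generation step could be done with a streaming writer so the full result set never sits in memory at once — a rough sketch, where the file name, element names, and the `command` that selects the rows are all placeholders:

```csharp
// Sketch: stream query results straight to an XML file, one row at a
// time, instead of materialising everything in a DataTable first.
var settings = new XmlWriterSettings { Indent = true };

using (var writer = XmlWriter.Create("table1-data.xml", settings))
using (var reader = command.ExecuteReader())   // command selects from table1
{
    writer.WriteStartElement("rows");
    while (reader.Read())                      // only one row in memory
    {
        writer.WriteStartElement("row");
        writer.WriteElementString("Key", reader["Key"].ToString());
        writer.WriteEndElement();
    }
    writer.WriteEndElement();
}
// the resulting file path could then be handed to UBMatrix,
// assuming its API accepts a file path
```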