I am building a website that generates randomised test data; part of this is a random name generator that I want to scale up to produce circa a million names. (Written in .NET 4.5 / C#.)
My initial solution was to create the names on the web server thread, which was obviously a bad idea and very slow. That has gradually evolved into an offline batch processor that populates an Azure table with precompiled names, which are then downloaded by the web server (or by a worker role that does the final data compilation).
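For context, the batch processor writes the precompiled names roughly like this, 100 at a time (the Azure Table batch limit); the entity shape and property names below are just placeholders for the example:

    using System.Collections.Generic;
    using System.Linq;
    using Microsoft.WindowsAzure.Storage.Table;

    public class NameEntity : TableEntity
    {
        public NameEntity() { }
        public NameEntity(string partitionKey, string rowKey)
        {
            PartitionKey = partitionKey;
            RowKey = rowKey;
        }
        public string FirstName { get; set; }
        public string Surname { get; set; }
    }

    // Offline batch processor: insert precompiled names 100 at a time.
    // A table batch is limited to 100 entities and they must all share the
    // same PartitionKey, so this assumes 'names' is already grouped per partition.
    static void UploadNames(CloudTable table, IEnumerable<NameEntity> names)
    {
        foreach (var chunk in names
            .Select((entity, index) => new { entity, index })
            .GroupBy(x => x.index / 100, x => x.entity))
        {
            var batch = new TableBatchOperation();
            foreach (var entity in chunk)
                batch.Insert(entity);
            table.ExecuteBatch(batch);
        }
    }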
However, this also seems quite slow: even running with parallel processes, it takes minutes to download the data.
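The download side looks roughly like this: one query per partition, run in parallel (again simplified; the three-digit partition key scheme is just an example of how the data could be split):

    using System.Collections.Concurrent;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage.Table;

    // Web server / worker role side: pull every partition in parallel.
    // ExecuteQuery follows continuation tokens internally, fetching up to
    // 1,000 entities per round trip.
    static ConcurrentBag<NameEntity> DownloadNames(CloudTable table, int partitionCount)
    {
        var results = new ConcurrentBag<NameEntity>();

        Parallel.For(0, partitionCount, p =>
        {
            string partitionKey = p.ToString("D3");   // e.g. "000" .. "099"
            var query = new TableQuery<NameEntity>().Where(
                TableQuery.GenerateFilterCondition(
                    "PartitionKey", QueryComparisons.Equal, partitionKey));

            foreach (var entity in table.ExecuteQuery(query))
                results.Add(entity);
        });

        return results;
    }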
So I am looking for the best architecture to speed this up.
I have considered having a worker role do this processing, keep the results in memory, and wait for the web server to request them. However, I'm not sure this is the best approach, or whether it will even solve the problem! (Mostly because I don't know how to transfer the data out.)
So I'm hoping for a little architectural advice on the best way to bring this data in. I'm not sure whether it is simply the case that processing that many records is always going to take a couple of minutes.
Edit (additional detail):
The code is running on an Azure web instance, pulling data out of an Azure Storage Table in the same region.
I have profiled the app, and most of the time is spent downloading data from the table.
The data that the random name generator seeds from is a few hundred thousand records in another Azure Table.
I'm now wondering if maybe I'm asking the wrong question! Maybe the simpler question is: given a source of a few hundred thousand first names / surnames, what is the best way to compile a million full names that can then be pulled into a web query?
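To illustrate what I mean by "compile": once the seed lists are actually in memory, the combination step itself is trivial, something along these lines (simplified sketch):

    using System;
    using System.Collections.Generic;

    // Given in-memory seed lists, random pairing is cheap; the expensive
    // part is getting the seed data (or the precompiled names) loaded.
    static List<string> CompileNames(IList<string> firstNames,
                                     IList<string> surnames,
                                     int count)
    {
        var rng = new Random();
        var names = new List<string>(count);

        for (int i = 0; i < count; i++)
        {
            names.Add(firstNames[rng.Next(firstNames.Count)] + " " +
                      surnames[rng.Next(surnames.Count)]);
        }

        return names;
    }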
P.S. I am not a C# guy by any stretch (more of a sysadmin); my C# generally follows the script-kiddie approach of finding something online that is vaguely close and assimilating it. I just can't find anything that is vaguely close in this case (which makes me think I'm missing something obvious).