I'm refactoring my project and researching the best ways to increase the application's performance.
Question 1. SpinLock vs Interlocked
To create a counter, which way has better performance?
Interlocked.Increment(ref counter);
Or
SpinLock _spinlock = new SpinLock();
bool lockTaken = false;
try
{
_spinlock.Enter(ref lockTaken);
counter = counter + 1;
}
finally
{
if (lockTaken) _spinlock.Exit(false);
}
And if we need to increment another counter, like counter2, should we declare another SpinLock object, or is it enough to use another boolean variable?
Question 2. Handling nested tasks, or a better replacement
In the current version of my application, I used tasks, adding each new task to an array and then calling Task.WaitAll().
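Simplified, the old version looked roughly like this (urls here is just a placeholder for whatever list I start from):

var tasks = new List<Task>();
foreach (var url in urls)
{
    // One task per URL, all collected and then waited on together.
    tasks.Add(Task.Run(() => crawl(url)));
}
Task.WaitAll(tasks.ToArray());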
After a lot of research I figured out that Parallel.ForEach has better performance. But how can I control the total number of concurrent threads? I know I can specify MaxDegreeOfParallelism in a ParallelOptions parameter, but the problem is that every time the crawl(url) method runs, it creates another limited batch of threads. I mean, if I set MaxDegreeOfParallelism to 10, every time crawl(url) runs, another +10 will be created, am I right? So how can I prevent this? Should I use a semaphore and threads instead of Parallel, or is there a better way?
public void Start() {
    Parallel.Invoke(() => crawl(url));
}

void crawl(string url) {
    var response = getresponse(url);
    Parallel.ForEach(response.links, ParallelOption, link => {
        crawl(link);
    });
}
Question 3. Notify when all jobs (and nested jobs) have finished.
And my last question: how can I know when all of my jobs (and their nested jobs) have finished?
There are a lot of misconceptions here; I'll point out just a few.
"Which way has better performance?" They both do, depending on your exact situation.
"Using Parallel.ForEach has better performance." This is also very suspect, and as a blanket statement actually just wrong. Once again, it depends on what you want to do.
"Every time crawl(url) runs, another +10 will be created, am I right?" Once again, this is wrong; that is your own implementation detail and depends on how you do it. Also, in TPL, MaxDegreeOfParallelism is only an upper bound; the scheduler will still use whatever number of workers it heuristically thinks is best for you within that limit. And as for "is there a better way?", the answer is a resounding yes.
OK, let's have a look at what you are doing. You say you are making a crawler. A crawler accesses the internet, and each time you access the internet, a network resource or the file system, you are (simplistically speaking) waiting around for an IO completion port callback. This is what's known as an IO-bound workload.
With IO-bound tasks we don't want to tie up the thread pool with threads waiting on IO completion ports. It's inefficient; you are using up valuable resources waiting for callbacks on threads that are effectively paused.
So for IO-bound work, we don't want to spin up new tasks, and we don't want to use Parallel.ForEach to tie up threads waiting for events to happen. The most appropriate modern pattern for IO-bound work is the async and await pattern. For CPU-bound work (if you want to use as much CPU as you can), it's the opposite: smash the thread pool, use TPL Parallel or as many tasks as is effective.
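As a quick illustration of the CPU-bound side (the loop body is just made-up number crunching standing in for real work):

// CPU-bound: Parallel.For keeps the cores busy with actual computation.
var cpuOptions = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
Parallel.For(0, 1_000_000, cpuOptions, i =>
{
    // Stand-in for genuinely CPU-heavy work (hashing, image processing, ...).
    double value = Math.Sqrt(i) * Math.Sin(i);
});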
The async and await pattern works well with completion ports, because instead of waiting around idly for a callback it gives the thread back to the pool and allows it to be reused.
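A minimal sketch of what that looks like for a single download (HttpClient.GetStringAsync here stands in for your getresponse):

using System.Net.Http;
using System.Threading.Tasks;

class Fetcher
{
    // Reuse one HttpClient instead of creating one per request.
    private static readonly HttpClient http = new HttpClient();

    public static async Task<string> FetchAsync(string url)
    {
        // await releases the thread while the download is in flight;
        // execution resumes on a pool thread when the IO completes.
        return await http.GetStringAsync(url);
    }
}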
However, what I suggest is another approach, one that lets you take advantage of async and await and also control the degree of parallelism. This enables you to be kind to your thread pool, not using up resources waiting for callbacks, and letting IO be IO. I give you TPL Dataflow, specifically ActionBlock and TransformManyBlock. This subject is a little above a simple working example, but I can assure you it's an appropriate path for what you are doing. What I suggest is that you have a look at the following links.
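To make that concrete, here is a minimal sketch (not production code) of an ActionBlock-based crawler. The HttpClient call, the regex link extraction and the example.com start URL are my stand-ins for your getresponse/response.links logic, and the pending counter (a plain Interlocked counter) is one common way to decide when to call Complete(), which also answers your Question 3:

using System;
using System.Collections.Concurrent;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package: System.Threading.Tasks.Dataflow

class Crawler
{
    static readonly HttpClient http = new HttpClient();
    static readonly ConcurrentDictionary<string, bool> seen = new ConcurrentDictionary<string, bool>();
    static ActionBlock<string> crawlBlock;
    static int pending; // URLs queued or in flight

    static async Task Main()
    {
        crawlBlock = new ActionBlock<string>(CrawlAsync,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 10 // at most 10 URLs processed concurrently
            });

        Enqueue("http://example.com"); // hypothetical start URL
        await crawlBlock.Completion;   // Question 3: completes once every queued URL is done
        Console.WriteLine("All jobs (and nested jobs) finished.");
    }

    static void Enqueue(string url)
    {
        if (!seen.TryAdd(url, true)) return; // skip URLs we've already queued
        Interlocked.Increment(ref pending);
        crawlBlock.Post(url);
    }

    static async Task CrawlAsync(string url)
    {
        try
        {
            // IO-bound: the thread is released while the download is in flight.
            string html = await http.GetStringAsync(url);

            // Very naive link extraction, purely for the sketch.
            foreach (Match m in Regex.Matches(html, "href=\"(http[^\"]+)\""))
                Enqueue(m.Groups[1].Value);
        }
        catch
        {
            // Swallow failures in the sketch; log them in real code.
        }
        finally
        {
            // When the last in-flight URL finishes and nothing new was queued, we're done.
            if (Interlocked.Decrement(ref pending) == 0)
                crawlBlock.Complete();
        }
    }
}

In a fuller pipeline you would typically put the download in a TransformManyBlock that outputs the discovered links and feed those back into the pipeline, but a single ActionBlock keeps the sketch short.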
In summary, there are many ways to do what you want to do, and there are many technologies for it. But the main thing is that you have some very skewed ideas about parallel programming. You need to hit the books, hit the blogs, and start building really solid design principles from the ground up, and stop trying to figure this all out for yourself by nitpicking small bits of information.