If I have an unsynchronized Java collection in a multithreaded environment, and I don't want to force readers of the collection to synchronize[1], is a solution where I synchronize the writers and use the atomicity of reference assignment feasible? Something like:
private Collection global = new HashSet(); // start threading after this
void allUpdatesGoThroughHere(Object exampleOperand) {
    // My hypothesis is that this prevents operations in the block being re-ordered
    synchronized(global) {
        Collection copy = new HashSet(global);
        copy.remove(exampleOperand);
        // Given my hypothesis, we should have a fully constructed object here. So a
        // reader will either get the old or the new Collection, but never an
        // inconsistent one.
        global = copy;
    }
}
// Do multithreaded reads here. All reads are done through a reference copy like:
// Collection copy = global;
// for (Object elm: copy) {...
// so the global reference being updated half way through should have no impact
Rolling your own solution often seems to fail in this type of situation, so I'd be interested in knowing other patterns, collections, or libraries I could use to avoid object creation and blocking for my data consumers.
[1] The reasons: a much larger proportion of time is spent in reads than in writes, and making readers synchronize would add the risk of introducing deadlocks.
Edit: There is a lot of good information in several of the answers and comments. Some important points:
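- There was a bug in the code I posted. Synchronizing on global (a badly named variable) can fail to protect the synchronized block after a swap, because the lock is taken on whichever object global referred to at that moment, not on the field itself.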
- You could fix this by synchronizing on something that is never reassigned, such as the enclosing object (moving the synchronized keyword to the method declaration, which locks this), but there may be other bugs. A safer and more maintainable solution is to use something from java.util.concurrent.
- There is no "eventual consistency guarantee" in the code I posted; one way to make sure that readers do see the writers' updates is to declare the global reference volatile.
- On reflection, the general problem that motivated this question was implementing lock-free reads with locked writes in Java. My (solved) problem happened to involve a collection, which may be unnecessarily confusing for future readers. So, in case it is not obvious: the code I posted works by allowing one writer at a time to edit a private copy of "some object" that multiple reader threads read without any locking. The edit is committed through an atomic reference assignment, so readers can only get the pre-edit or the post-edit "object". When (or if) a reader thread gets the update, it cannot take effect in the middle of a read, because the read is operating on the old copy of the "object". It is a simple solution that was probably discovered, and shown to be broken in some way, long before Java had better concurrency support.
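Putting the first three points together, a corrected version of the copy-on-write pattern might look like the following sketch. The class and method names (CopyOnWriteHolder, snapshot, add, remove) are hypothetical, and it assumes Java 7 or later. It locks a dedicated final object instead of global, publishes each new copy through a volatile field, and wraps the copy as unmodifiable so a published snapshot can never change. If an off-the-shelf class is preferred, java.util.concurrent.CopyOnWriteArraySet is built on the same copy-on-write idea.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical copy-on-write holder: one writer at a time builds a new copy,
// readers never lock and always see either the old or the new set.
class CopyOnWriteHolder<E> {
    // Dedicated lock that is never reassigned, so every writer contends for the
    // same monitor (unlike synchronized(global), whose target changes on a swap).
    private final Object writeLock = new Object();

    // volatile: the write that publishes a new copy happens-before any read that
    // sees it, so a reader that sees the new reference also sees its full contents.
    private volatile Set<E> current = Collections.emptySet();

    // Readers grab one snapshot and iterate over it without locking.
    Set<E> snapshot() {
        return current;
    }

    void add(E element) {
        synchronized (writeLock) {
            Set<E> copy = new HashSet<>(current);
            copy.add(element);
            // Publish an unmodifiable view so nobody can mutate a shared snapshot.
            current = Collections.unmodifiableSet(copy);
        }
    }

    void remove(Object element) {
        synchronized (writeLock) {
            Set<E> copy = new HashSet<>(current);
            copy.remove(element);
            current = Collections.unmodifiableSet(copy);
        }
    }
}
```

Every write still copies the whole set, so this only pays off when writes are rare compared with reads, which is the situation described in footnote [1].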
Rather than trying to roll your own solution, why not use a ConcurrentHashMap as your set and just set all the values to some standard value? (A constant like Boolean.TRUE would work well.) I think this implementation works well in the many-readers-few-writers scenario. There's even a constructor that lets you set the expected "concurrency level" (since Java 8, that argument is only used as a hint for initial sizing).
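A minimal sketch of that suggestion, assuming String members; the class name ConcurrentMapAsSet and the constructor arguments (16, 0.75f, 4) are illustrative, not recommendations. putIfAbsent, remove, containsKey and keySet are standard ConcurrentHashMap methods, and its retrieval operations generally do not block.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical wrapper: a ConcurrentHashMap used as a set by mapping every
// member to the same constant value, Boolean.TRUE.
class ConcurrentMapAsSet {
    // initialCapacity, loadFactor, concurrencyLevel (the last is a sizing hint).
    private final ConcurrentHashMap<String, Boolean> map =
            new ConcurrentHashMap<>(16, 0.75f, 4);

    boolean add(String s) {
        // putIfAbsent returns null when s was not already a member.
        return map.putIfAbsent(s, Boolean.TRUE) == null;
    }

    boolean remove(String s) {
        return map.remove(s) != null;
    }

    boolean contains(String s) {
        // Lookups do not take a lock that writers hold.
        return map.containsKey(s);
    }

    Iterable<String> members() {
        // keySet() iteration is weakly consistent and never throws
        // ConcurrentModificationException.
        return map.keySet();
    }
}
```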
Update: Veer has suggested using the Collections.newSetFromMap utility method to turn the ConcurrentHashMap into a Set. Since the method takes a Map<E,Boolean>, my guess is that it does the same thing behind the scenes, setting all the values to Boolean.TRUE.
Update: Addressing the poster's example
Your minimalist solution would work just fine with a bit of tweaking. My worry is that, although it's minimal now, it might get more complicated in the future. It's hard to remember all of the conditions you assume when making something thread-safe, especially if you're coming back to the code weeks, months, or years later to make a seemingly insignificant tweak. If the ConcurrentHashMap does everything you need with sufficient performance, then why not use that instead? All the nasty concurrency details are encapsulated away, and even six months from now you will have a hard time messing it up!
You do need at least one tweak before your current solution will work. As has already been pointed out, you should probably add the volatile modifier to global's declaration. I don't know if you have a C/C++ background, but I was very surprised when I learned that the semantics of volatile in Java are actually much more complicated than in C. If you're planning on doing a lot of concurrent programming in Java, then it'd be a good idea to familiarize yourself with the basics of the Java memory model. If you don't make the reference to global a volatile reference, then it's possible that no thread will ever see any changes to the value of global until they try to update it, at which point entering the synchronized block will flush the local cache and get the updated reference value.
However, even with the addition of volatile there's still a huge problem. Here's a problem scenario with two threads:
1. global={}. Threads A and B both have this value in their thread-local cached memory.
2. A obtains the synchronized lock on global and starts the update by making a copy of global and adding the new key to the set.
3. While A is still inside the synchronized block, Thread B reads its local value of global onto the stack and tries to enter the synchronized block. Since Thread A is currently inside the monitor, Thread B blocks.
4. A completes the update by setting the reference and exiting the monitor, resulting in global={1}.
5. B is now able to enter the monitor and makes a copy of the global={1} set.
6. A decides to make another update, reads in its local global reference, and tries to enter the synchronized block. Since Thread B currently holds the lock on {}, there is no lock on {1}, and Thread A successfully enters the monitor!
7. A also makes a copy of {1} for purposes of updating.
Now Threads A and B are both inside the synchronized block, and they have identical copies of the global={1} set. This means that one of their updates will be lost! This situation is caused by the fact that you're synchronizing on an object stored in a reference that you're updating inside your synchronized block. You should always be very careful about which objects you use to synchronize. You can fix this problem by adding a separate lock variable, declared final so that it is never reassigned, and synchronizing on that instead of on global.
This bug was insidious enough that none of the other answers have addressed it yet. It's these kinds of crazy concurrency details that cause me to recommend using something from the already-debugged java.util.concurrent library rather than trying to put something together yourself. I think the separate-lock version would work, but how easy would it be to screw it up again? It would be so much easier to declare global as a final Set backed by a ConcurrentHashMap (for example, created with Collections.newSetFromMap) and update it in place.
Since the reference is final, you don't need to worry about threads using stale references, and since the ConcurrentHashMap handles all the nasty memory model issues internally, you don't have to worry about monitors and memory barriers at all!
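The ConcurrentHashMap-based version might look roughly like this sketch. It keeps allUpdatesGoThroughHere from the question, but the class name SharedMembers and the add and countMembers methods are hypothetical. On Java 8 and later, ConcurrentHashMap.newKeySet() is an equivalent shortcut for Collections.newSetFromMap(new ConcurrentHashMap<>()).

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SharedMembers {
    // final: the reference is assigned once during construction, so no thread
    // can see a stale reference and no monitor is needed around it.
    private final Set<Object> global =
            Collections.newSetFromMap(new ConcurrentHashMap<Object, Boolean>());

    void allUpdatesGoThroughHere(Object exampleOperand) {
        // Thread-safe without external locking and without copying the set.
        global.remove(exampleOperand);
    }

    void add(Object o) {
        global.add(o);
    }

    // Readers iterate directly. Iteration is weakly consistent: it may or may
    // not reflect concurrent updates, but never throws
    // ConcurrentModificationException.
    int countMembers() {
        int n = 0;
        for (Object elm : global) {
            n++;
        }
        return n;
    }
}
```

One design difference from the copy-on-write approach in the question: a reader iterating this set sees a weakly consistent view, which may include some concurrent updates, rather than an atomic snapshot of the pre-edit or post-edit set. If readers need that all-or-nothing guarantee, a copy-on-write structure such as CopyOnWriteArraySet is the closer fit.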