For an ownCloud (or Nextcloud) project we need to add a large amount of storage. I've been checking options such as Ceph, OpenStack Swift/Cinder, GlusterFS, SDFS and Tahoe-LAFS.
With this service we expect users to upload many of the same files, which is why deduplication is quite important for us. So far the only solutions I've found that deduplicate clustered storage data are SDFS and Tahoe-LAFS. However, our concern is that these two are written in Java and Python and will hurt the CPU too much. (*Yes, deduplication will likely mean more RAM and CPU as well.)
Perhaps one of you has a better solution? *A deduplicating filesystem (e.g. ZFS) will not work, as the data is stored on multiple machines (HA cluster).
This is not the complete solution I think you are looking for, but here is an open source deduplication library for Node.js, with a native binding written in C++ and a reference implementation written in JavaScript:
https://github.com/ronomon/deduplication
It should be fast enough if you can implement the indexing yourself using an LSM-tree-backed key-value store.
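
To give a rough idea of what "implementing the indexing yourself" could look like, here is a minimal sketch in Python. It is not the library's API: the `DedupStore` class and its `put`/`get` methods are made up for illustration, plain dicts stand in for the LSM-tree-backed key-value store, and it uses fixed-size chunks with SHA-256 for brevity rather than content-dependent chunking.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy content-addressed chunk store (hypothetical, for illustration only).

    The two dicts stand in for a persistent KV store; a real deployment
    would replace them with an LSM-tree-backed database.
    """

    def __init__(self, chunk_size: int = 64 * 1024) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[bytes, bytes] = {}     # SHA-256 digest -> chunk bytes
        self.files: Dict[str, List[bytes]] = {}  # file name -> ordered digests

    def put(self, name: str, data: bytes) -> int:
        """Store a file and return how many new bytes were actually written."""
        written = 0
        recipe: List[bytes] = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in self.chunks:
                # First time this content is seen: store it once.
                self.chunks[digest] = chunk
                written += len(chunk)
            recipe.append(digest)
        self.files[name] = recipe
        return written

    def get(self, name: str) -> bytes:
        """Rebuild a file from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    print(store.put("user1/report.txt", b"abcdabcdxyz"))  # new bytes written
    print(store.put("user2/report.txt", b"abcdabcdxyz"))  # duplicate upload
```

The second upload of the same content writes no new chunk data; only the per-file list of digests grows. Fixed-size chunks stop matching as soon as bytes are inserted near the start of a file, which is the problem content-dependent chunking is meant to address.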