I'm looking to build a distributed torrent search engine.
I'm aware of Distributed Hash Tables for addressing nodes in peer to peer networks. I don't fully understand how each node acquires a globally unique ID though.
What algorithms and data-structures I need to build the distributed database though, I'm not sure. It would need to have a high degree of redundancy obviously, and be as efficiently searchable as possible.
What I really need is a pointer in the direction of some resources and preferably some code examples.
I'd say that's not really relevant to the title of your question and implementation-specific anyway. But generally it's either done at random or based on a hash of their public IP + some random sub-part modulo some adjustments for subnets. Have a look at bittorrent's secure node ID generation algorithm for example.
This is a non-trivial topic that I don't think can be answered within a few paragraphjs. DHTs at their base do not allow enumeration of stored values or any complex operations coordinated by multiple nodes, a direct key-value lookup is all they do. To implement keyword search on top of that you'll have to do quite some algorithmic and language processing gymnastics and add extensions to the basic DHT protocol to accommodate these requirements.
Here's an incomplete list of several problems to solve:
I'm not sure if a DHT is even the right approach here. I vaguely recall other metrics based on language / the keywords themselves where nodes move themselves in the keyspace to gravitate towards words in use and thus provide the necessary network capacity.
I recommend hitting google scholar looking for keyword search related modifications or alternative overlays to DHTs.