I'm trying to build a simple recommender system using Neural Structured Learning (NSL). The idea is to use the similarity between users and between businesses (items) to predict how likely a user is to rate a previously unrated item, based on that user's similarity (and weights) to other users and items. I have my train and test data, and I've created similarity matrices for the users and for the items. These are square user-user and item-item matrices whose entries are ratios from 0 to 1: a value at or near 0 means the two users rated few or no items in common (or, for two items, few or no users rated both), and a value close to 1 means they share many rated items (or raters).
Here's an example of the user-user similarity matrix:
[image: user-user similarity matrix]
In the tutorial that TensorFlow provides for Neural Structured Learning, they feed graph inputs into the framework to add interaction terms such as similarity. (The base example is here: https://medium.com/tensorflow/introducing-neural-structured-learning-in-tensorflow-5a802efd7afd)
I'm very much a beginner, and while I think I understand the general logic of how it works, I don't know how to actually 'feed' the similarity matrices into the training data. I have turned them into graphs using igraph, but I don't understand what information the example graphs contain, so I don't know how to format or reshape my data to act as the input. I'm also confused because, looking at the example, I don't see where their graph.tsv is used in the model at all. I assume that's because the model code is just a skeleton, but where do I put it?
Any direction would be much appreciated!!
Thanks for the question, and I apologize for the delayed reply.
In order to "feed" your similarity graph to NSL, you need to use our pack_nbrs tool. The API for the tool is described here:
https://www.tensorflow.org/neural_structured_learning/api_docs/python/nsl/tools/pack_nbrs
And here's the relevant section of one of our on-line tutorials showing how it is used:
https://www.tensorflow.org/neural_structured_learning/tutorials/graph_keras_lstm_imdb#augment_training_data_with_graph_neighbors
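The pack_nbrs tool/function reads in three files:
(1) a TFRecord file of labeled training examples (tf.train.Example records), each carrying an ID feature whose name is given by the id_feature_name argument ("id" by default);
(2) an optional TFRecord file of unlabeled examples in the same format (pass an empty string if you have none);
(3) a TSV file describing the similarity graph.
It then writes a new TFRecord file of training examples augmented with their graph neighbors.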
Each TAB-separated line of the TSV file should specify a single edge in the similarity graph, and has this form:
source_id<TAB>target_id[<TAB>edge_weight]
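To make the line format concrete, here is a small sketch that parses one line of the graph TSV, applying the 1.0 default when the weight column is absent. The helper name parse_edge is hypothetical and is not part of NSL; pack_nbrs does its own parsing, so this is only useful for sanity-checking a file you generated.

```python
def parse_edge(line):
    """Parse 'source_id<TAB>target_id[<TAB>edge_weight]' into a tuple.

    Hypothetical helper for checking a graph TSV; not an NSL API.
    The weight defaults to 1.0 when the third column is missing.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (2, 3):
        raise ValueError(
            f"expected 2 or 3 TAB-separated fields, got {len(fields)}: {line!r}")
    source_id, target_id = fields[0], fields[1]
    weight = float(fields[2]) if len(fields) == 3 else 1.0
    return source_id, target_id, weight
```

For example, parse_edge("user_0\tuser_1\t0.8") gives ("user_0", "user_1", 0.8), while parse_edge("user_0\tuser_1") gives a weight of 1.0.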
The edge_weight is optional (it defaults to 1.0 if not supplied), but since you seem to have degrees of similarity, you will probably want to specify it. The source_id and target_id can be arbitrary strings, but they must agree with the node IDs you specify in files (1) and (2) (so the edges "match up" with the node features). Using your example similarity matrix, we might have:
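Because the IDs in the TSV have to agree with the ID feature of the examples in files (1) and (2), a quick check before running pack_nbrs can save some debugging. The sketch below assumes you have the example IDs as a list of strings and the edges as (source_id, target_id, weight) tuples; find_unmatched_ids is a hypothetical helper, and how pack_nbrs itself treats an unmatched ID is not covered here.

```python
def find_unmatched_ids(example_ids, edges):
    """Return the sorted node IDs used in edges that have no matching example.

    Hypothetical helper. example_ids: iterable of ID strings taken from the
    examples' ID feature. edges: iterable of (source_id, target_id, weight).
    """
    known = set(example_ids)
    missing = set()
    for source_id, target_id, _weight in edges:
        for node_id in (source_id, target_id):
            if node_id not in known:
                missing.add(node_id)
    return sorted(missing)
```

An empty result means every edge endpoint corresponds to some example.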
Note that because your similarity matrix is symmetric, you don't need to specify the edges in both directions. Instead, you can specify add_undirected_edges=True when invoking pack_nbrs. Also note that you probably want to apply some amount of thresholding so you don't end up creating an edge between every pair of nodes in the graph. In my example above, I included only those pairs with a similarity > 0.5, but you may want to apply a larger threshold based on your data. You want to be sure that edges in your similarity graph denote true similarity between instances, or else the benefits of graph-based regularization will be reduced.
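Putting the symmetry and thresholding advice together, here is a minimal sketch that converts a square similarity matrix (a plain list of lists, assumed symmetric) into graph TSV text. It scans only the upper triangle, so each pair appears once and the diagonal is skipped; that is why add_undirected_edges=True is needed when calling pack_nbrs. The function names are hypothetical, and the default threshold of 0.5 simply mirrors the cutoff mentioned above.

```python
def similarity_matrix_to_edges(node_ids, matrix, threshold=0.5):
    """Turn a symmetric similarity matrix into (source, target, weight) edges.

    Only the upper triangle (i < j) is scanned, so each undirected pair is
    emitted once and self-similarity on the diagonal is ignored. Pairs whose
    similarity is not strictly greater than `threshold` are dropped.
    """
    n = len(node_ids)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and match node_ids")
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            weight = matrix[i][j]
            if weight > threshold:
                edges.append((node_ids[i], node_ids[j], weight))
    return edges


def edges_to_tsv(edges):
    """Render edges as source_id<TAB>target_id<TAB>edge_weight lines."""
    return "".join(f"{s}\t{t}\t{w}\n" for s, t, w in edges)
```

With made-up values (not taken from the question's matrix), IDs ["user_0", "user_1", "user_2"] and rows [1.0, 0.8, 0.2], [0.8, 1.0, 0.6], [0.2, 0.6, 1.0] produce two edges, user_0 to user_1 (0.8) and user_1 to user_2 (0.6); the 0.2 pair falls below the threshold. You can write the returned string to a .tsv file and pass that path as the graph argument to pack_nbrs.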
I hope that answers your question. If not, or if you have any follow-on questions, please let us know. I recommend that you work through our on-line tutorials, which should provide a strong foundation for working with the NSL libraries and tools.
Best,
Allan