Flink Gelly Path/Trail Usecase

107 Views Asked by At

our team is new to Gelly api. We are looking to implement a simple use case that will list all paths originating from an initial vertice - for e.g.

input edge csv file is 1,2\n2,3\n3,4\n1,5\n5,6

the required output will be (the full path that starts from 1) 1,2,3,4\n1,5,6

Can someone please help.

1

There are 1 best solutions below

0
On

You can use one of Gelly's iteration abstractions, e.g. vertex-centric iterations. Starting from the source vertex, you can iteratively extend the paths, one hop per superstep. Upon receiving a path, a vertex appends its ID to the path and propagates it to its outgoing neighbors. If a vertex has no outgoing neighbors, then it prints / stores the path and does not propagate it further. To avoid loops a vertex could also check if its ID exists in the path, before propagating. The compute function could look like this:

public static final class ComputePaths extends ComputeFunction<Integer, Boolean, NullValue, ArrayList<Integer>> {

    @Override
    public void compute(Vertex<Integer, Boolean> vertex, MessageIterator<ArrayList<Integer>> paths) {
        if (getSuperstepNumber() == 1) {
            // the source propagates its ID
            if (vertex.getId().equals(1)) {
                ArrayList<Integer> msg = new ArrayList<>();
                msg.add(1);
                sendMessageToAllNeighbors(msg);
            }
        }
        else {
            // go through received messages
            for (ArrayList<Integer> p : paths) {
                if (!p.contains(vertex.getId())) {
                    // if no cycle => append ID and forward to neighbors
                    p.add(vertex.getId());
                    if (!vertex.getValue()) {
                        sendMessageToAllNeighbors(p);
                    }
                    else {
                        // no out-neighbors: print p
                        System.out.println(p);
                    }
                }
                else {
                    // found a cycle => print the path and don't propagate further
                    System.out.println(p);
                }
            }
        }
    }
}

In this code I have assumed that you have pre-processed vertices to mark the ones that have no out-neighbors with a "true" value. You could e.g. use graph.outDegrees() to find those.

Have in mind that enumerating all paths in a large and dense graph is expensive to compute. The intermediate paths state can explode quite quickly. You could look into using a more compact way for representing paths than an using ArrayList of ints, but beware of the cost if you have a dense graph with large diameter. If you don't need the paths themselves but you're only interested in reachability or shortest paths, then there exist more efficient algorithms.