I have to ensure that a graph in our application is a DAG with a unique source and a unique sink.
Specifically I have to ensure that for a given start node and end node (both of which are known at the outset) that every node in the graph lies on a path from the start node to the end node.
I already have an implementation of Tarjan's algorithm which I use to identify cycles, and a a topological sorting algorithm that I can run once Tarjan's algorithm reports the graph is a DAG.
What is the most efficient way to make sure the graph meets this criterion?
First, do a DFS on the graph, starting from the source node, which, as you say, is known in advance. If you encounter a back edge[1], then you have a cycle and you can exit with an error. During this traversal you can identify if there are nodes, not reachable from the source[2] and take the appropriate action.
Once you have determined the graph is a DAG, you can ensure that every node lies on a path from the source to the sink by another DFS, starting from the source, as follows:
The procedure
have_path
sets a booleanflag
in each node, for which there exists some path from that node to the sink. In the same time the procedure traverses only nodes reachable from the source and it traverses all nodes reachable from the source. If the procedure returns true, then all the nodes, reachable from the source lie on a path to the sink. Unreachable nodes were already handled in the first phase.[1] an edge, linking a node with a greater DFS number to an node with a lesser DFS number, that is not already completely processed, i.e. is still on the DFS stack
[2] e.g. they would not have an assigned DFS number