Way to visualize Beam pipeline run with DirectRunner

1.1k Views Asked by At

In GCP we can see the pipeline execution graph. Is the same possible when running locally via DirectRunner?

2

There are 2 best solutions below

0
On BEST ANSWER

You can use pipeline_graph and the InteractiveRunner to get a graphviz representation of your pipeline locally.

An example for the word count pipeline used in the Beam documentation:

import apache_beam as beam
from apache_beam.runners.interactive.display import pipeline_graph
from apache_beam.runners.interactive.interactive_runner import InteractiveRunner
import re

pipeline = beam.Pipeline(InteractiveRunner())
lines = pipeline | beam.Create([f"some_file_{i}.txt" for i in range(10)])

# Count the occurrences of each word.
counts = (
    lines
    | 'Split' >> (
        beam.FlatMap(
            lambda x: re.findall(r'[A-Za-z\']+', x)).with_output_types(str))
    | 'PairWithOne' >> beam.Map(lambda x: (x, 1))
    | 'GroupAndSum' >> beam.CombinePerKey(sum))

# Format the counts into a PCollection of strings.
def format_result(word_count):
    (word, count) = word_count
    return f'{word}: {count}'

output = counts | 'Format' >> beam.Map(format_result)

# Write the output using a "Write" transform that has side effects.
# pylint: disable=expression-not-assigned
output | beam.io.WriteToText("some_file.txt")

print(pipeline_graph.PipelineGraph(pipeline).get_dot())

This prints

digraph G {
node [color=blue, fontcolor=blue, shape=box];
"Create";
lines [shape=circle];
"Split";
pcoll4978 [label="", shape=circle];
"PairWithOne";
pcoll8859 [label="", shape=circle];
"GroupAndSum";
counts [shape=circle];
"Format";
output [shape=circle];
"WriteToText";
pcoll6409 [label="", shape=circle];
"Create" -> lines;
lines -> "Split";
"Split" -> pcoll4978;
pcoll4978 -> "PairWithOne";
"PairWithOne" -> pcoll8859;
pcoll8859 -> "GroupAndSum";
"GroupAndSum" -> counts;
counts -> "Format";
"Format" -> output;
output -> "WriteToText";
"WriteToText" -> pcoll6409;
}

Putting this into https://edotor.net results in:

beam pipeline

You can work with GraphViz in Python to produce a prettier output if needed (graphviz for example).

0
On

You can also use Python's RenderRunner, e.g.

python -m apache_beam.examples.wordcount --output out.txt \
    --runner=apache_beam.runners.render.RenderRunner \
    --render_output=pipeline.svg

This also has an interactive mode, triggered by passing --port=N (where 0 can be used to pick an unused port) which vends the graph as a local web service. This allows one to expand/collapse composites for easier exploration. Any --render_output arguments that are passed will get re-rendered as you edit the graph. (It uses graphviz under the hood, so can render any of those supported formats.)

Rendered Graph

For rendering non-Python pipelines, one can start this up as a local portable "runner."

python -m apache_beam.runners.render

and then "submit" this job from your other SDK over the provided jobs API endpoint via a portable runner to view it.