I want to take a Maven dependency tree as input and parse it to determine the groupId, artifactId, and version of each dependency, along with its children (if any) and their groupId, artifactId, and version, and so on recursively for any further descendants. I'm not sure whether it makes the most sense to parse the mvn dependency tree and store the info as a nested dictionary before preparing the data for Neo4j.
I'm also unsure of the best way to parse the entire mvn dependency tree. The code below is the furthest I've gotten: it strips the unnecessary info at the front of each line and labels each entry as a child or a parent.
tree=
[INFO] +- org.antlr:antlr4:jar:4.7.1:compile
[INFO] | +- org.antlr:antlr4-runtime:jar:4.7.1:compile
[INFO] | +- org.antlr:antlr-runtime:jar:3.5.2:compile
[INFO] | \- com.ibm.icu:icu4j:jar:58.2:compile
[INFO] +- commons-io:commons-io:jar:1.3.2:compile
[INFO] +- brs:dxprog-lang:jar:3.3-SNAPSHOT:compile
[INFO] | +- brs:libutil:jar:2.51:compile
[INFO] | | +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] | | +- org.apache.commons:commons-collections4:jar:4.1:compile
[INFO] | | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile
[INFO] | | | \- com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile
.
.
.
fileObj = open("tree", "r")
for line in fileObj.readlines():
    for word in line.split():
        if "[INFO]" in line.split():
            line = line.replace(line.split().__getitem__(0), "")
            print(line)
        if "|" in line.split():
            line = line.replace(line.split().__getitem__(0), "child")
            print(line)
        if "+-" in line.split() and "|" not in line.split():
            line = line.replace(line.split().__getitem__(0), "")
            line = line.replace(line.split().__getitem__(0), "parent")
            print(line, '\n\n')
Output:
| | \- com.google.protobuf:protobuf-java:jar:3.5.1:compile
child child \- com.google.protobuf:protobuf-java:jar:3.5.1:compile
| +- com.h2database:h2:jar:1.4.195:compile
child +- com.h2database:h2:jar:1.4.195:compile
parent com.h2database:h2:jar:1.4.195:compile
I would appreciate any insight on the best way to parse & return data in an organized way given that I'm relatively unfamiliar with the capabilities of python. Thank you in advance!
I don't know what your programming experience is, but that's not a trivial task.
First, you can see that the nesting level of a dependency is indicated by the | symbols in front of it. The simplest thing you can do is build a stack that stores the dependency path from root to children, grandchildren, and so on:
Output:
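As a minimal sketch of the stack idea (not necessarily the exact code this step had in mind), the function below keeps one stack entry per nesting level and records the root-to-node path for every dependency. The names `dependency_paths`, `MARKER`, `PREFIX`, and `SAMPLE` are made up for illustration. It also makes two assumptions: the text uses Maven's usual three-character indentation per level (`|  ` or three blanks), and depth is read from the column of the `+-` or `\-` marker instead of counting `|`, because Maven prints blanks in place of `|` beneath the last child of a node.

```python
import re

PREFIX = "[INFO] "
# Leading run of "|" and blanks, followed by the "+- " or "\- " marker.
MARKER = re.compile(r"([| ]*)[+\\]- ")


def dependency_paths(text, indent=3):
    """Return the root-to-node path (a tuple of coordinates) for each dependency line.

    Assumption: every nesting level is `indent` characters wide.
    """
    stack = []   # stack[i] is the current ancestor at depth i
    paths = []
    for line in text.splitlines():
        if line.startswith(PREFIX):
            line = line[len(PREFIX):]
        match = MARKER.match(line)
        if match is None:
            continue  # blank line, project root line, or unrelated log output
        depth = len(match.group(1)) // indent
        del stack[depth:]                          # pop back up to the parent
        stack.append(line[match.end():].strip())   # push this dependency
        paths.append(tuple(stack))
    return paths


# Excerpt of the tree from the question, written with Maven's usual spacing.
SAMPLE = r"""
[INFO] +- org.antlr:antlr4:jar:4.7.1:compile
[INFO] |  +- org.antlr:antlr4-runtime:jar:4.7.1:compile
[INFO] |  \- com.ibm.icu:icu4j:jar:58.2:compile
[INFO] +- commons-io:commons-io:jar:1.3.2:compile
[INFO] +- brs:dxprog-lang:jar:3.3-SNAPSHOT:compile
[INFO] |  +- brs:libutil:jar:2.51:compile
[INFO] |  |  +- commons-collections:commons-collections:jar:3.2.2:compile
"""

for path in dependency_paths(SAMPLE):
    print(" > ".join(path))
```

Each printed line is one path, so the entry for commons-collections shows all three levels (dxprog-lang, then libutil, then commons-collections). If the whitespace in your file has been collapsed, the column arithmetic no longer holds and the `indent` assumption would need revisiting.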
Second, you can use this stack to build a tree based on nested dicts:
Output:
An empty dict represents a leaf in the tree.
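Here is one way the nested-dict step could look, as a sketch and not a definitive implementation. It assumes the previous step produced a list of root-to-node paths (tuples of coordinate strings); `build_tree` and `PATHS` are hypothetical names, and the sample paths are taken from the question's tree.

```python
import json


def build_tree(paths):
    """Fold root-to-node paths into nested dicts keyed by coordinate string."""
    tree = {}
    for path in paths:
        node = tree
        for coordinate in path:
            # Create the child dict on first sight, then descend into it.
            node = node.setdefault(coordinate, {})
    return tree


# Hypothetical input: paths as the stack step would have recorded them.
PATHS = [
    ("org.antlr:antlr4:jar:4.7.1:compile",),
    ("org.antlr:antlr4:jar:4.7.1:compile", "org.antlr:antlr4-runtime:jar:4.7.1:compile"),
    ("org.antlr:antlr4:jar:4.7.1:compile", "com.ibm.icu:icu4j:jar:58.2:compile"),
    ("commons-io:commons-io:jar:1.3.2:compile",),
]

print(json.dumps(build_tree(PATHS), indent=2))
```

Using `dict.setdefault` keeps the loop short: an ancestor that already exists is reused, and a dependency with no children simply keeps its empty dict.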
Third, you need to format the tree, i.e. (1) extract the data and (2) group the children in lists. This is a simple tree traversal (DFS here):
Output:
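A possible sketch of this formatting pass, assuming the tree is a nested dict keyed by coordinate strings and that each coordinate has the form `groupId:artifactId:type[:classifier]:version:scope`, as in the question's output. The names `parse_coordinate`, `format_tree`, and `TREE`, and the `children` key, are my own choices for illustration.

```python
def parse_coordinate(coordinate):
    """Split 'groupId:artifactId:type[:classifier]:version:scope' into fields.

    Assumption: the scope is always the last field, so the version is the
    second-to-last one whether or not a classifier is present.
    """
    parts = coordinate.split(":")
    return {"groupId": parts[0], "artifactId": parts[1], "version": parts[-2]}


def format_tree(tree):
    """Depth-first traversal: turn {coordinate: subtree} into a list of records."""
    records = []
    for coordinate, subtree in tree.items():
        record = parse_coordinate(coordinate)
        record["children"] = format_tree(subtree)  # empty list for a leaf
        records.append(record)
    return records


# Hypothetical input: a nested-dict tree built from the question's data.
TREE = {
    "org.antlr:antlr4:jar:4.7.1:compile": {
        "org.antlr:antlr4-runtime:jar:4.7.1:compile": {},
        "com.ibm.icu:icu4j:jar:58.2:compile": {},
    },
    "commons-io:commons-io:jar:1.3.2:compile": {},
}

for record in format_tree(TREE):
    print(record)
```

The recursion depth equals the depth of the dependency tree, which is normally small, so plain recursion should be fine here. The resulting list of records with `children` lists is a convenient shape to walk when creating nodes and relationships in Neo4j.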
You could probably combine some of these steps.