I want to use boilerpipe https://github.com/kohlschutter/boilerpipe in my Android app, but I'm unable to build the app with this dependency.
My build.gradle contains:
implementation group: 'com.syncthemall', name: 'boilerpipe', version: '1.2.2'
The code which uses boilerpipe:
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.CommonExtractors;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;
import de.l3s.boilerpipe.sax.HTMLDocument;
import de.l3s.boilerpipe.sax.HTMLFetcher;
import org.xml.sax.SAXException;
...
final HTMLDocument htmlDoc = HTMLFetcher.fetch(new URL("https://dzone.com/articles/database-connection-pooling-in-java-with-hikaricp"));
final TextDocument doc = new BoilerpipeSAXInput(htmlDoc.toInputSource()).getTextDocument();
String content = CommonExtractors.ARTICLE_EXTRACTOR.getText(doc);
...
The code above works when I run it from a JUnit test. But when I build the app to run it on a device, the build fails with the following errors:
Duplicate class org.cyberneko.html.HTMLElements found in modules jetified-boilerpipe-1.2.2 (com.syncthemall:boilerpipe:1.2.2) and jetified-nekohtml-1.9.20 (net.sourceforge.nekohtml:nekohtml:1.9.20)
Duplicate class org.cyberneko.html.HTMLElements$Element found in modules jetified-boilerpipe-1.2.2 (com.syncthemall:boilerpipe:1.2.2) and jetified-nekohtml-1.9.20 (net.sourceforge.nekohtml:nekohtml:1.9.20)
Duplicate class org.cyberneko.html.HTMLElements$ElementList found in modules jetified-boilerpipe-1.2.2 (com.syncthemall:boilerpipe:1.2.2) and jetified-nekohtml-1.9.20 (net.sourceforge.nekohtml:nekohtml:1.9.20)
Duplicate class org.cyberneko.html.HTMLTagBalancer found in modules jetified-boilerpipe-1.2.2 (com.syncthemall:boilerpipe:1.2.2) and jetified-nekohtml-1.9.20 (net.sourceforge.nekohtml:nekohtml:1.9.20)
Duplicate class org.cyberneko.html.HTMLTagBalancer$ElementEntry found in modules jetified-boilerpipe-1.2.2 (com.syncthemall:boilerpipe:1.2.2) and jetified-nekohtml-1.9.20 (net.sourceforge.nekohtml:nekohtml:1.9.20)
Duplicate class org.cyberneko.html.HTMLTagBalancer$Info found in modules jetified-boilerpipe-1.2.2 (com.syncthemall:boilerpipe:1.2.2) and jetified-nekohtml-1.9.20 (net.sourceforge.nekohtml:nekohtml:1.9.20)
Duplicate class org.cyberneko.html.HTMLTagBalancer$InfoStack found in modules jetified-boilerpipe-1.2.2 (com.syncthemall:boilerpipe:1.2.2) and jetified-nekohtml-1.9.20 (net.sourceforge.nekohtml:nekohtml:1.9.20)
I analyzed the app's dependencies and found that nekohtml is pulled in only by boilerpipe:
+--- com.syncthemall:boilerpipe:1.2.2
|    +--- net.sourceforge.nekohtml:nekohtml:1.9.20
|    |    \--- xerces:xercesImpl:2.10.0 -> 2.11.0
|    |         \--- xml-apis:xml-apis:1.4.01
|    \--- xerces:xercesImpl:2.11.0 (*)
So why are there some class collisions?
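The error message names the boilerpipe jar itself as one of the two sources of each duplicate, so one way to check whether that jar bundles its own copy of the org.cyberneko.html classes is to compare the class entries of the two jars directly. Below is a minimal sketch using only the JDK; the class name JarOverlap is hypothetical, and the two jar paths are assumed to be passed on the command line (for example, the files found in the local Gradle cache).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, not part of boilerpipe or the Android tooling.
public class JarOverlap {

    // Collects the names of all .class entries in a jar, sorted.
    static Set<String> classEntries(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    .collect(Collectors.toCollection(TreeSet::new));
        }
    }

    // Returns the .class entries that appear in both jars.
    static Set<String> commonClasses(Path first, Path second) throws IOException {
        Set<String> common = classEntries(first);
        common.retainAll(classEntries(second));
        return common;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarOverlap <first.jar> <second.jar>");
            return;
        }
        // Assumed inputs: paths to boilerpipe-1.2.2.jar and nekohtml-1.9.20.jar.
        commonClasses(Paths.get(args[0]), Paths.get(args[1]))
                .forEach(System.out::println);
    }
}
```

If this prints entries under org/cyberneko/html/, both jars ship those classes, which would match the duplicates reported above.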
After that I created a plain Java, Maven-based project with the same dependency and used boilerpipe there:
<dependency>
    <groupId>com.syncthemall</groupId>
    <artifactId>boilerpipe</artifactId>
    <version>1.2.2</version>
</dependency>
No class collisions were reported. What does Android do differently? Does it have something to do with the jetified- prefix?