I see fewer than 500 questions about Deeplearning4j here, most of them years old, so a different question first: is DL4J dead? Do I really have to deal with horrible, horrible Python just to build my AI? I don't want to!
Now the real question. I feel a bit stupid, but the documentation and what googling turns up are both lacking (see the question above). For the past few days I have been reading up on building a simple document classifier with DL4J, which seems straightforward enough, although the follow-up material is again frighteningly sparse.
I build a ParagraphVectors model, add some labels, pass in the training data and train. I also figured out that the data is passed in as a LabelAwareIterator. For data laid out in a file structure, I even found this documentation by DL4J on how to structure the data. But what if I want to read the data from, say, an API rather than from a directory of files? I am guessing I need a LabelAwareDocumentIterator, but how is the data supposed to be structured, and how do I feed it in? I read about structuring it as a table with text and label columns, but that seems rather sketchy and very imprecise.
Help would be much appreciated, as would better resources than what I have found so far. Thanks!
--UPDATE
From reading the source code (usually a good idea to just check the implementation), it looks like what I really want is the SimpleLabelAwareIterator. That code is nicely readable. I don't really understand yet what the LabelAwareDocumentIterator is for. Anyway, the simple one just needs a list of LabelledDocuments. A LabelledDocument just has a string content and a list of labels. So far so good; I will try an implementation this evening. If it works out, I will post this as an answer.
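To make the "string content plus list of labels" shape concrete, here is a minimal sketch of turning the text/label table mentioned above into that structure. It deliberately avoids DL4J so it runs on plain Java 17: `Doc` is a stand-in for LabelledDocument, and `Row`, `toDocs` and the sample rows are hypothetical names made up for illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LabelledRows {

    // Stand-in for DL4J's LabelledDocument: one text plus its labels.
    record Doc(String content, List<String> labels) {}

    // Hypothetical row shape, e.g. one (text, label) entry from an API response.
    record Row(String text, String label) {}

    // Collapse (text, label) rows into one Doc per distinct text,
    // keeping first-seen order and collecting every label seen for that text.
    static List<Doc> toDocs(List<Row> rows) {
        Map<String, List<String>> byText = new LinkedHashMap<>();
        for (Row r : rows) {
            // Skip rows that have no usable text or no label.
            if (r.text() == null || r.text().isBlank() || r.label() == null) {
                continue;
            }
            List<String> labels = byText.computeIfAbsent(r.text(), k -> new ArrayList<>());
            if (!labels.contains(r.label())) {
                labels.add(r.label());
            }
        }
        List<Doc> docs = new ArrayList<>();
        byText.forEach((text, labels) -> docs.add(new Doc(text, List.copyOf(labels))));
        return docs;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("the market rallied today", "finance"),
            new Row("the striker scored twice", "sports"),
            new Row("the market rallied today", "news"));
        for (Doc d : toDocs(rows)) {
            System.out.println(d.labels() + " <- " + d.content());
        }
    }
}
```

With DL4J on the classpath, the idea would be to create one LabelledDocument per `Doc` (to my knowledge via `setContent` and `addLabel`, but check this against the version you use) and pass the resulting list to the SimpleLabelAwareIterator constructor.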
The approach in the update worked out. I am now using a SimpleLabelAwareIterator that I fill with a list of LabelledDocuments. Short code sample: