I've got a bunch of images (~3000) which have been manually classified (approved/rejected) based on some business criteria. I've processed these images with Google Cloud Platform obtaining annotations and SafeSearch results, for example (csv format):
file name; approved/rejected; adult; spoof; medical; violence; annotations A.jpg;approved;VERY_UNLIKELY;VERY_UNLIKELY;VERY_UNLIKELY;UNLIKELY;boat|0.9,vehicle|0.8 B.jpg;rejected;VERY_UNLIKELY;VERY_UNLIKELY;VERY_UNLIKELY;UNLIKELY;text|0.9,font|0.8
I want to use machine learning to be able to predict if a new image should be approved or rejected (second column in the csv file).
Which algorithm should I use?
How should I format the data, especially the annotations column? Should I obtain first all the available annotation types and use them as a feature with the numerical value (0 if it doesn't apply)? Or would it be better to just process the annotation column as text?
I would suggest you try convolutional neural networks.
Maybe the fastest way to test your idea if it will work or not (problem could be the number of images you have, which is quite low), is to use transfer learning with Tensorflow. There are great tutorials made by Magnus Erik Hvass Pedersen, who published them on youtube.
I suggest you go through all the videos, but the important ones are #7 and #8.
Using transfer learning allows you to use the models they build at google to classify images. But with transfer learning, you are able to use your own data with your own labels.
Using this approach you will be able to see if this is suitable for your problem. Then you can dive into convolutional neural networks and create the pipeline that will work the best for your problem.