I have a problem with OpenRefine. I am adding a new column based on a url and from there calling an API for getting some terms from a controlled vocabulary (AAT). I parse the results and I obtain a multivalued cells such as:
http://vocab.getty.edu/aat/300041366||aquatints (prints)::http://vocab.getty.edu/aat/300053242||aquatint (printing process)::http://vocab.getty.edu/aat/300191265||dust bags::http://vocab.getty.edu/aat/300191278||dust boxes::http://vocab.getty.edu/aat/300191278||dust boxes::http://vocab.getty.edu/aat/300191278||dust boxes::http://vocab.getty.edu/aat/300249564||aquatinters::http://vocab.getty.edu/aat/300249564||aquatinters::http://vocab.getty.edu/aat/300249564||aquatinters::http://vocab.getty.edu/aat/300249564||aquatinters::http://vocab.getty.edu/aat/300053242||aquatint (printing process)::http://vocab.getty.edu/aat/300041366||aquatints (prints)::http://vocab.getty.edu/aat/300041368||sandpaper aquatints::http://vocab.getty.edu/aat/300041368||sandpaper aquatints
Where I have the current structure:
URI||Corresponding_TERM::URI||Corresponding_TERM
etc.
I now need to choose one of those records. My solution is to use
value.split("::")[0]
in order to choose which element I want.
Unfortunately this solutions has very evident drawbacks, because the order of the elements in the array is not constant, so if the first element [0] would be the right one for one record, it would probably not be for the next one.
For explain myself better, I now have this kind of structure
-----------------------------------------------------------
|ID | Classification | Term_From_Thesaurus |
| 1 | Aquatints | uri||term1::uri||term2:: |
| 1 | Aquatints | uri||term1::uri||term2:: |
| 2 | Drypoints | uri||term3::uri||term4:: |
| 3 | Woodcut | uri||term5::uri||term6::uri||term7 |
-----------------------------------------------------------
And I need to associate term1 with Aquatints, term 4 with Drypoints and term 7 with Woodcut.
How can I do that? A solution could be using facet and a lot of manual work, but maybe there is a better one? What about going to each record and if ID = 1 they should use term1, if ID=2 should use term 4 etc. Would it be possible? I sincerely do not know how to use the value of another column as variable to determine the result of an operation. cell.cross would help, but in case I need to split the data into two files, and doesn’t seems to me a proper solution..
So I'm not sure if I've understood your question correctly, but it is possible to "select value based on a variable in another column".
If you have:
Then if you split the 'Term_From_Thesaurus' column into an array, then you can use the number in the 'ID' column to select the relevant entry in the array. However, note that for this to work you need to have the number in the ID column to be converted into a Number type (if it isn't already). In this example I'll assume that the number in the ID column starts off as a String rather than Number.
So the formula:
Will find the first value in the first and second row, the second value in the third row and the 4th item in the 4th row. This is illustrated here:
The formula breaks down as follows:
Hope this is clear