How to specify query parameter in Lucene search syntax?


I want to make a GET request to the MusicBrainz search API (https://musicbrainz.org/doc/MusicBrainz_API/Search). I want it to search by album name and release format, and the release format should be vinyl. Both can be specified in the query part of the request. The search works fine if I don't specify a format, but when I do specify one it is ignored, and the results still include other release formats such as CD and Digital.

This is the URL I'm using for my request:

https://musicbrainz.org/ws/2/release?query=depeche%20mode%20music%20for%20the%20massesANDformat%3AVinyl&fmt=json&limit=10

Does anybody know how I have to change my URL so that it only returns vinyl releases?


There is 1 best solution below


It looks as if the Format field is based on a constrained list of pre-defined values - as shown in the release format listing page.

It is therefore possible that the Lucene index has defined this field as a StringField rather than a TextField.

A StringField is defined as:

A field that is indexed but not tokenized: the entire String value is indexed as a single token.

This means that you cannot search for vinyl. You need to use the exact value, which can be one of:

7" Vinyl
10" Vinyl
12" Vinyl
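To make the difference concrete, here is a toy Python model of the two indexing styles. It is not Lucene and says nothing certain about how MusicBrainz actually indexes the field; the function names are made up for illustration, and lowercasing the stored value is an assumption.

```python
# Toy model only: this is NOT Lucene, just an illustration of the idea.
FORMATS = ['7" Vinyl', '10" Vinyl', '12" Vinyl', 'CD']

def tokenized_terms(value):
    # Roughly what a tokenized (TextField-style) field stores: lowercased words.
    return set(value.lower().split())

def untokenized_terms(value):
    # Roughly what an untokenized (StringField-style) field stores:
    # the whole value as one term. Lowercasing here is an assumption.
    return {value.lower()}

def matches(term, values, index_fn):
    # Return the values whose indexed terms contain the search term exactly.
    return [v for v in values if term in index_fn(v)]

tokenized_hits = matches("vinyl", FORMATS, tokenized_terms)      # all three vinyl formats
untokenized_hits = matches("vinyl", FORMATS, untokenized_terms)  # nothing matches
exact_hits = matches('12" vinyl', FORMATS, untokenized_terms)    # only the exact value
```

In this model the bare term vinyl only finds something when the field is tokenized, which matches the behavior described in the question.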

So, to account for this, you can build that part of the Lucene query as follows:

AND (format:"7\" vinyl" OR format:"10\" vinyl" OR format:"12\" vinyl")

The text values are surrounded by double quotes to ensure that each one is treated as a single term in the query (so that it exactly matches the single token in the index).

The backslashes are used to escape the " in the text.

The overall Lucene query therefore becomes this:

title:"music for the masses" AND artist:"depeche mode" AND (format:"7\" vinyl" OR format:"10\" vinyl" OR format:"12\" vinyl")
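If you build this query in code rather than by hand, a small helper can take care of the quoting and escaping. This is only a sketch in Python; quote_phrase and build_release_query are hypothetical names, not part of any MusicBrainz or Lucene library.

```python
def quote_phrase(text):
    # Escape backslashes first, then double quotes, and wrap the result in
    # double quotes so the whole value is sent as one term.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def build_release_query(title, artist, formats):
    # Join the format alternatives with OR and group them in parentheses.
    format_clause = " OR ".join(f"format:{quote_phrase(f)}" for f in formats)
    return (
        f"title:{quote_phrase(title)} AND artist:{quote_phrase(artist)} "
        f"AND ({format_clause})"
    )

query = build_release_query(
    "music for the masses",
    "depeche mode",
    ['7" vinyl', '10" vinyl', '12" vinyl'],
)
print(query)
```

The printed string is the same query as the one written out by hand above.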

And when added to the URL, it becomes this:

https://musicbrainz.org/ws/2/release?query=title:"music for the masses" AND artist:"depeche mode" AND (format:"7\" vinyl" OR format:"10\" vinyl" OR format:"12\" vinyl")&fmt=json

I pasted the above into my browser's address bar, and I got 8 release objects returned in the JSON response.

When the URL is URL-encoded, it ends up as follows:

https://musicbrainz.org/ws/2/release?query=title:%22music%20for%20the%20masses%22%20AND%20artist:%22depeche%20mode%22%20AND%20(format:%227\%22%20vinyl%22%20OR%20format:%2210\%22%20vinyl%22%20OR%20format:%2212\%22%20vinyl%22)&fmt=json
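If you are making the request from code, you can let a library do the encoding instead of writing it by hand. Below is a sketch using Python's standard urllib.parse; build_search_url is a hypothetical helper name. Note that strict encoding also escapes the colon, parentheses, and backslash (%3A, %28, %29, %5C), so the result looks a little different from the browser-style URL above, although it should decode to the same query.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://musicbrainz.org/ws/2/release"

def build_search_url(lucene_query, fmt="json"):
    params = {"query": lucene_query, "fmt": fmt}
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use "+".
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"

# A shortened query, to keep the example readable.
lucene_query = 'title:"music for the masses" AND (format:"12\\" vinyl")'
url = build_search_url(lucene_query)
print(url)
```

The same function works for the full query with all three vinyl formats.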

I said at the beginning that the format field (and probably several others) is possibly indexed as a string field. I do not know this for a fact, but it is the only way I can explain why my query works and yours does not. So I think it's a reasonable assumption, but I could be wrong.