I want to get images of Discogs releases. Can I do it without the Discogs API? They don't have links to the images in their database dumps.
How to get images of Discogs releases?
2.6k views. Asked by user5869792.
There are 2 best solutions below

This is how to do it with Java and the Jsoup library:
- get the HTML page of the release
- parse the HTML, find the <meta property="og:image" content=".." /> tag, and read its content value
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DiscogRelease {

    private final String url;

    public DiscogRelease(String url) {
        this.url = url;
    }

    // Returns the og:image URL of the release page, or null if it cannot be found.
    public String getImageUrl() {
        try {
            // Download and parse the release page.
            Document doc = Jsoup.connect(this.url).get();
            // The image URL is stored in the og:image meta tag in the page head.
            Elements metas = doc.head().select("meta[property=\"og:image\"]");
            if (!metas.isEmpty()) {
                Element element = metas.get(0);
                return element.attr("content");
            }
        } catch (IOException ex) {
            Logger.getLogger(DiscogRelease.class.getName()).log(Level.SEVERE, null, ex);
        }
        return null;
    }
}
To do this without the API, you would have to load a web page and extract the image URL from the HTML source code. You can find the relevant page by loading https://www.discogs.com/release/xxxx where xxxx is the release number. Since HTML is just text, you can then extract the JPEG URL. I don't know what your programming language is, but I'm sure it can handle string functions like indexOf and substring. You could extract the content of the HTML's og:image tag to get the picture.
Taking https://www.discogs.com/release/8140515 as an example:
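- Call .indexOf("og:image\" content=\"") on the HTML text and save the result as startPos in some integer.
- Call .indexOf(".jpg", startPos + 19) and save the result as endPos. This gets the first occurrence of .jpg at or after index startPos + 19. The 19 is the length of the search string og:image" content=", so the search begins at the first character of the URL.
- Now extract a substring from the HTML text, adding 4 to endPos so that the .jpg extension itself is included:
img_URL = myHtmlStr.substring(startPos + 19, endPos + 4);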
You should end up with a string like the one below (the extracted URL):
https://img.discogs.com/_zHBK73yJ5oON197YTDXM7JoBjA=/fit-in/600x600/filters:strip_icc():format(jpeg):mode_rgb():quality(90)/discogs-images/R-8140515-1460073064-5890.jpeg.jpg
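The indexOf and substring steps above could be sketched in Java roughly as follows. This is only an illustration: the class name OgImageIndexOf, the method name extractImageUrl, and the sample HTML in main are made up for the example, and the sketch assumes the page writes the tag exactly as og:image" content="..." with a URL ending in .jpg. Loading the page itself is not shown.

```java
import java.util.Optional;

// Hypothetical helper illustrating the indexOf/substring approach.
final class OgImageIndexOf {

    // Text that directly precedes the URL inside the og:image meta tag (19 characters).
    private static final String MARKER = "og:image\" content=\"";
    private static final String SUFFIX = ".jpg";

    // Returns the URL that follows the og:image marker, or empty if it is not found.
    static Optional<String> extractImageUrl(String html) {
        int startPos = html.indexOf(MARKER);
        if (startPos < 0) {
            return Optional.empty();
        }
        // Skip over the marker so the search starts at the first character of the URL.
        int urlStart = startPos + MARKER.length();
        int endPos = html.indexOf(SUFFIX, urlStart);
        if (endPos < 0) {
            return Optional.empty();
        }
        // indexOf points at the '.' of ".jpg", so add the suffix length to keep the extension.
        return Optional.of(html.substring(urlStart, endPos + SUFFIX.length()));
    }

    public static void main(String[] args) {
        // Made-up, shortened stand-in for the HTML of a release page.
        String html = "<head><meta property=\"og:image\" "
                + "content=\"https://img.example.invalid/R-1.jpeg.jpg\" /></head>";
        System.out.println(extractImageUrl(html).orElse("no image found"));
    }
}
```

Returning an Optional rather than a raw string makes the "tag not found" case explicit. A page whose image is not a .jpg, or whose attributes appear in a different order, would need a different search.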
The process can be shortened to finding the startPos index of https://img. and then finding the first occurrence of .jpg when searching from that startPos index onward. Extract the text within that range. This works because the only place the HTML source mentions https://img. is the image URL.
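A minimal Java sketch of this shortened search is shown here. The names ImgPrefixScan and findFirstImageUrl are invented for the illustration, and the sketch relies on the assumption stated above that the first https://img. in the page belongs to the release image.

```java
import java.util.Optional;

// Hypothetical helper illustrating the shortened "https://img." search.
final class ImgPrefixScan {

    private static final String PREFIX = "https://img.";
    private static final String SUFFIX = ".jpg";

    // Returns the text from the first "https://img." up to and including the next ".jpg".
    static Optional<String> findFirstImageUrl(String html) {
        int startPos = html.indexOf(PREFIX);
        if (startPos < 0) {
            return Optional.empty();
        }
        int endPos = html.indexOf(SUFFIX, startPos);
        if (endPos < 0) {
            return Optional.empty();
        }
        // Here the prefix is part of the URL, so the substring starts at startPos itself.
        return Optional.of(html.substring(startPos, endPos + SUFFIX.length()));
    }

    public static void main(String[] args) {
        // Made-up, shortened stand-in for the HTML of a release page.
        String html = "<meta property=\"og:image\" "
                + "content=\"https://img.example.invalid/R-2.jpeg.jpg\" />";
        System.out.println(findFirstImageUrl(html).orElse("no image found"));
    }
}
```

Unlike the og:image version, no offset is added to startPos, because the search string is the beginning of the URL rather than the text in front of it.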
Compare the page at https://www.discogs.com/release/8140515 with the image at the extracted URL above.