[Bandcamp] Add utility methods to get multiple images

Bandcamp images work with image IDs, which provide different resolutions.

Images on Bandcamp are not always square, and some image IDs preserve the aspect
ratio whereas others do not.

The extractor only uses the IDs which preserve the aspect ratio, and does not
provide original images, for performance and size reasons.

Because of this aspect ratio preservation constraint, only one dimension will
be known at a time.

The image IDs used, with their respective known dimensions, are:

- 10: 1200w;
- 101: 90h;
- 170: 422h;
- 171: 646h;
- 20: 1024w;
- 200: 420h;
- 201: 280h;
- 202: 140h;
- 204: 360h;
- 205: 240h;
- 206: 180h;
- 207: 120h;
- 43: 100h;
- 44: 200h.

(where w is the width of the image and h is its height)
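
To make the notation above concrete, here is a minimal sketch that turns an
entry such as "1200w" or "90h" into a height/width pair in which the other
dimension is marked as unknown. The class name DimensionNotationSketch and the
-1 sentinel are assumptions made for this illustration; the extractor itself
uses its own HEIGHT_UNKNOWN and WIDTH_UNKNOWN constants.

```java
/** Hypothetical helper for illustration only; not part of this commit. */
final class DimensionNotationSketch {

    /** Sentinel chosen for this sketch to mark a dimension that is not known. */
    static final int UNKNOWN = -1;

    private DimensionNotationSketch() {
    }

    /**
     * Parses a notation such as "1200w" or "90h".
     *
     * @return an array {height, width} where the dimension that is not
     *         constrained by the image ID is set to UNKNOWN
     */
    static int[] parse(final String notation) {
        final int lastIndex = notation.length() - 1;
        final int value = Integer.parseInt(notation.substring(0, lastIndex));
        final char axis = notation.charAt(lastIndex);
        switch (axis) {
            case 'w':
                // Width is constrained, height depends on the aspect ratio
                return new int[] {UNKNOWN, value};
            case 'h':
                // Height is constrained, width depends on the aspect ratio
                return new int[] {value, UNKNOWN};
            default:
                throw new IllegalArgumentException("Unknown axis in: " + notation);
        }
    }
}
```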

Note that these dimensions are theoretical: if the image is smaller than the
dimension of the image ID, it is not upscaled but kept at its original size.
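
The no-upscaling rule can be illustrated with a small sketch. It assumes that
the served size along the constrained axis is simply the smaller of the
original size and the bound of the image ID, and that the other axis is scaled
proportionally; the rounding used here is a guess, and the class and method
names are hypothetical.

```java
/** Hypothetical illustration of the "theoretical dimension" rule. */
final class TheoreticalDimensionSketch {

    private TheoreticalDimensionSketch() {
    }

    /**
     * Size actually served along the constrained axis, assuming the image is
     * never upscaled beyond its original size.
     */
    static int effectiveSize(final int originalSize, final int boundOfImageId) {
        return Math.min(originalSize, boundOfImageId);
    }

    /**
     * Size along the other axis, assuming the aspect ratio is preserved.
     * The rounding mode is an assumption of this sketch.
     */
    static int otherAxisSize(final int originalConstrained,
                             final int originalOther,
                             final int boundOfImageId) {
        final int effective = effectiveSize(originalConstrained, boundOfImageId);
        return (int) Math.round((double) originalOther * effective / originalConstrained);
    }
}
```

With image ID 10 (1200w), a 3000x2000 original would come out 1200 wide and
800 high under these assumptions, while an 800x600 original would stay at
800x600. Since the extractor does not know the original size, it can only
report the bound of the constrained axis.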

All these resolutions are stored in a private static list of ThumbnailSuffixes
in BandcampExtractorHelper, to which the methods to get multiple images have
been added:

- getImagesFromImageUrl(String): public method to get images from an image URL;
- getImagesFromImageId(long, boolean): public method to get images from an
  image ID;
- getImagesFromImageBaseUrl(String): private utility method to get images from
  the static list of ThumbnailSuffixes for a given image base URL, which contains
  the path to the image, the letter "a" if it comes from an album, its ID and an
  underscore.
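
How the three methods relate can be sketched as follows. This is a simplified
illustration rather than the code of this commit: it returns plain URL strings
instead of Image objects and lists only three of the suffixes. The class name
BandcampImageUrlSketch and the image ID 123 used in the note below are made up
for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified model of the base URL + suffix expansion. */
final class BandcampImageUrlSketch {

    // Subset of the suffixes listed in the commit message, kept short on purpose
    private static final List<String> SUFFIXES = List.of("10.jpg", "101.jpg", "20.jpg");

    private static final String IMAGES_DOMAIN_AND_PATH = "https://f4.bcbits.com/img/";

    // Matches the trailing "_<image ID>.<extension>" part of an image URL
    private static final String APPENDIX_AND_EXTENSION_REGEX = "_\\d+\\.\\w+";

    private BandcampImageUrlSketch() {
    }

    /** Builds a base URL (path, optional "a", ID, underscore) from an image ID. */
    static String baseUrlFromImageId(final long id, final boolean isAlbum) {
        return IMAGES_DOMAIN_AND_PATH + (isAlbum ? "a" : "") + id + "_";
    }

    /** Builds a base URL by stripping the resolution ID and extension from a full URL. */
    static String baseUrlFromImageUrl(final String imageUrl) {
        return imageUrl.replaceFirst(APPENDIX_AND_EXTENSION_REGEX, "_");
    }

    /** Appends every known suffix to the base URL, one URL per resolution. */
    static List<String> expand(final String baseUrl) {
        return SUFFIXES.stream()
                .map(suffix -> baseUrl + suffix)
                .collect(Collectors.toList());
    }
}
```

Both entry points end up with the same base URL: for the made-up album image
ID 123, baseUrlFromImageId(123, true) and
baseUrlFromImageUrl("https://f4.bcbits.com/img/a123_7.jpg") each give
"https://f4.bcbits.com/img/a123_", which is then expanded with every suffix.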

Some existing methods have also been edited:

- the documentation of getImageUrl(long, boolean) has been changed to reflect
  these findings about Bandcamp images;
- getThumbnailUrlFromSearchResult has been renamed to
  getImagesFromSearchResult, and documentation has been added to this method.

The method replaceHttpWithHttps of the Utils class has also been used in
BandcampExtractorHelper instead of manually doing what the method does.
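
For readers unfamiliar with this kind of helper, the sketch below shows the
behaviour one would expect from it: an "http://" prefix is swapped for
"https://" and anything else is returned unchanged. This is an assumption about
the behaviour, written from scratch for illustration, and not a copy of
Utils.replaceHttpWithHttps; the class name HttpsUpgradeSketch is hypothetical.

```java
/** Hypothetical stand-in illustrating an HTTP to HTTPS URL upgrade. */
final class HttpsUpgradeSketch {

    private static final String HTTP = "http://";
    private static final String HTTPS = "https://";

    private HttpsUpgradeSketch() {
    }

    /** Returns the URL with a leading "http://" replaced by "https://". */
    static String replaceHttpWithHttps(final String url) {
        if (url == null) {
            // Nothing to upgrade
            return null;
        }
        if (url.startsWith(HTTP)) {
            return HTTPS + url.substring(HTTP.length());
        }
        // Already HTTPS, or not an HTTP URL at all: leave untouched
        return url;
    }
}
```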
AudricV 2022-07-30 17:02:19 +02:00
parent 4e6fb368bc
commit 4b80d737a4
No known key found for this signature in database
GPG Key ID: DA92EC7905614198
1 changed file with 163 additions and 17 deletions

@@ -6,25 +6,81 @@ import com.grack.nanojson.JsonObject;
 import com.grack.nanojson.JsonParser;
 import com.grack.nanojson.JsonParserException;
 import com.grack.nanojson.JsonWriter;
 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Element;
+import org.schabi.newpipe.extractor.Image;
+import org.schabi.newpipe.extractor.Image.ResolutionLevel;
 import org.schabi.newpipe.extractor.NewPipe;
 import org.schabi.newpipe.extractor.exceptions.ParsingException;
 import org.schabi.newpipe.extractor.exceptions.ReCaptchaException;
 import org.schabi.newpipe.extractor.localization.DateWrapper;
-import org.schabi.newpipe.extractor.utils.Utils;
-import javax.annotation.Nullable;
+import org.schabi.newpipe.extractor.utils.ImageSuffix;
 import java.io.IOException;
 import java.nio.charset.StandardCharsets;
 import java.time.DateTimeException;
 import java.time.ZonedDateTime;
 import java.time.format.DateTimeFormatter;
 import java.util.Collections;
+import java.util.List;
 import java.util.Locale;
+import java.util.stream.Collectors;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import static org.schabi.newpipe.extractor.Image.HEIGHT_UNKNOWN;
+import static org.schabi.newpipe.extractor.Image.WIDTH_UNKNOWN;
+import static org.schabi.newpipe.extractor.utils.Utils.isNullOrEmpty;
+import static org.schabi.newpipe.extractor.utils.Utils.replaceHttpWithHttps;
 public final class BandcampExtractorHelper {
+    /**
+     * List of image IDs which preserve aspect ratio with their theoretical dimension known.
+     *
+     * <p>
+     * Bandcamp images are not always squares, so images which preserve aspect ratio are only used.
+     * </p>
+     *
+     * <p>
+     * One of the direct consequences of this specificity is that only one dimension of images is
+     * known at time, depending of the image ID.
+     * </p>
+     *
+     * <p>
+     * Note also that dimensions are only theoretical because if the image size is less than the
+     * dimensions of the image ID, it will be not upscaled but kept to its original size.
+     * </p>
+     *
+     * <p>
+     * IDs come from <a href="https://gist.github.com/f2k1de/06f5fd0ae9c919a7c3693a44ee522213">the
+     * GitHub Gist "Bandcamp File Format Parameters" by f2k1de</a>
+     * </p>
+     */
+    private static final List<ImageSuffix> IMAGE_URL_SUFFIXES_AND_RESOLUTIONS = List.of(
+            // ID | HEIGHT | WIDTH
+            new ImageSuffix("10.jpg", HEIGHT_UNKNOWN, 1200, ResolutionLevel.HIGH),
+            new ImageSuffix("101.jpg", 90, WIDTH_UNKNOWN, ResolutionLevel.LOW),
+            new ImageSuffix("170.jpg", 422, WIDTH_UNKNOWN, ResolutionLevel.MEDIUM),
+            // 180 returns the same image aspect ratio and size as 171
+            new ImageSuffix("171.jpg", 646, WIDTH_UNKNOWN, ResolutionLevel.MEDIUM),
+            new ImageSuffix("20.jpg", HEIGHT_UNKNOWN, 1024, ResolutionLevel.HIGH),
+            // 203 returns the same image aspect ratio and size as 200
+            new ImageSuffix("200.jpg", 420, WIDTH_UNKNOWN, ResolutionLevel.MEDIUM),
+            new ImageSuffix("201.jpg", 280, WIDTH_UNKNOWN, ResolutionLevel.MEDIUM),
+            new ImageSuffix("202.jpg", 140, WIDTH_UNKNOWN, ResolutionLevel.LOW),
+            new ImageSuffix("204.jpg", 360, WIDTH_UNKNOWN, ResolutionLevel.MEDIUM),
+            new ImageSuffix("205.jpg", 240, WIDTH_UNKNOWN, ResolutionLevel.MEDIUM),
+            new ImageSuffix("206.jpg", 180, WIDTH_UNKNOWN, ResolutionLevel.MEDIUM),
+            new ImageSuffix("207.jpg", 120, WIDTH_UNKNOWN, ResolutionLevel.LOW),
+            new ImageSuffix("43.jpg", 100, WIDTH_UNKNOWN, ResolutionLevel.LOW),
+            new ImageSuffix("44.jpg", 200, WIDTH_UNKNOWN, ResolutionLevel.MEDIUM));
+    private static final String IMAGE_URL_APPENDIX_AND_EXTENSION_REGEX = "_\\d+\\.\\w+";
+    private static final String IMAGES_DOMAIN_AND_PATH = "https://f4.bcbits.com/img/";
     public static final String BASE_URL = "https://bandcamp.com";
     public static final String BASE_API_URL = BASE_URL + "/api";
@@ -44,7 +100,7 @@ public final class BandcampExtractorHelper {
                             + "&tralbum_id=" + itemId + "&tralbum_type=" + itemType.charAt(0))
                     .responseBody();
-            return Utils.replaceHttpWithHttps(JsonParser.object().from(jsonString)
+            return replaceHttpWithHttps(JsonParser.object().from(jsonString)
                     .getString("bandcamp_url"));
         } catch (final JsonParserException | ReCaptchaException | IOException e) {
@@ -76,17 +132,26 @@ public final class BandcampExtractorHelper {
     }
     /**
-     * Generate image url from image ID.
-     * <p>
-     * The appendix "_10" was chosen because it provides images sized 1200x1200. Other integer
-     * values are possible as well (e.g. 0 is a very large resolution, possibly the original).
+     * Generate an image url from an image ID.
      *
-     * @param id The image ID
-     * @param album True if this is the cover of an album or track
-     * @return URL of image with this ID sized 1200x1200
+     * <p>
+     * The image ID {@code 10} was chosen because it provides images wide up to 1200px (when
+     * the original image width is more than or equal this resolution).
+     * </p>
+     *
+     * <p>
+     * Other integer values are possible as well (e.g. 0 is a very large resolution, possibly the
+     * original); see {@link #IMAGE_URL_SUFFIXES_AND_RESOLUTIONS} for more details about image
+     * resolution IDs.
+     * </p>
+     *
+     * @param id the image ID
+     * @param isAlbum whether the image is the cover of an album or a track
+     * @return a URL of the image with this ID with a width up to 1200px
      */
-    public static String getImageUrl(final long id, final boolean album) {
-        return "https://f4.bcbits.com/img/" + (album ? 'a' : "") + id + "_10.jpg";
+    @Nonnull
+    public static String getImageUrl(final long id, final boolean isAlbum) {
+        return IMAGES_DOMAIN_AND_PATH + (isAlbum ? 'a' : "") + id + "_10.jpg";
     }
     /**
@@ -136,13 +201,94 @@ public final class BandcampExtractorHelper {
         }
     }
-    @Nullable
-    public static String getThumbnailUrlFromSearchResult(final Element searchResult) {
-        return searchResult.getElementsByClass("art").stream()
+    /**
+     * Get a list of images from a search result {@link Element}.
+     *
+     * <p>
+     * This method will call {@link #getImagesFromImageUrl(String)} using the first non null and
+     * non empty image URL found from the {@code src} attribute of {@code img} HTML elements, or an
+     * empty string if no valid image URL was found.
+     * </p>
+     *
+     * @param searchResult a search result {@link Element}
+     * @return an unmodifiable list of {@link Image}s, which is never null but can be empty, in the
+     * case where no valid image URL was found
+     */
+    @Nonnull
+    public static List<Image> getImagesFromSearchResult(@Nonnull final Element searchResult) {
+        return getImagesFromImageUrl(searchResult.getElementsByClass("art")
+                .stream()
                 .flatMap(element -> element.getElementsByTag("img").stream())
                 .map(element -> element.attr("src"))
-                .filter(string -> !string.isEmpty())
+                .filter(imageUrl -> !isNullOrEmpty(imageUrl))
                 .findFirst()
-                .orElse(null);
+                .orElse(""));
+    }
+    /**
+     * Get all images which have resolutions preserving aspect ratio from an image URL.
+     *
+     * <p>
+     * This method will remove the image ID and its extension from the end of the URL and then call
+     * {@link #getImagesFromImageBaseUrl(String)}.
+     * </p>
+     *
+     * @param imageUrl the full URL of an image provided by Bandcamp, such as in its HTML code
+     * @return an unmodifiable list of {@link Image}s, which is never null but can be empty, in the
+     * case where the image URL has been not extracted (and so is null or empty)
+     */
+    @Nonnull
+    public static List<Image> getImagesFromImageUrl(@Nullable final String imageUrl) {
+        if (isNullOrEmpty(imageUrl)) {
+            return List.of();
+        }
+        return getImagesFromImageBaseUrl(
+                imageUrl.replaceFirst(IMAGE_URL_APPENDIX_AND_EXTENSION_REGEX, "_"));
+    }
+    /**
+     * Get all images which have resolutions preserving aspect ratio from an image ID.
+     *
+     * <p>
+     * This method will call {@link #getImagesFromImageBaseUrl(String)}.
+     * </p>
+     *
+     * @param id the id of an image provided by Bandcamp
+     * @param isAlbum whether the image is the cover of an album
+     * @return an unmodifiable list of {@link Image}s, which is never null but can be empty, in the
+     * case where the image ID has been not extracted (and so equal to 0)
+     */
+    @Nonnull
+    public static List<Image> getImagesFromImageId(final long id, final boolean isAlbum) {
+        if (id == 0) {
+            return List.of();
+        }
+        return getImagesFromImageBaseUrl(IMAGES_DOMAIN_AND_PATH + (isAlbum ? 'a' : "") + id + "_");
+    }
+    /**
+     * Get all images resolutions preserving aspect ratio from a base image URL.
+     *
+     * <p>
+     * Base image URLs are images containing the image path, a {@code a} letter if it comes from an
+     * album, its ID and an underscore.
+     * </p>
+     *
+     * <p>
+     * Images resolutions returned are the ones of {@link #IMAGE_URL_SUFFIXES_AND_RESOLUTIONS}.
+     * </p>
+     *
+     * @param baseUrl the base URL of the image
+     * @return an unmodifiable and non-empty list of {@link Image}s
+     */
+    @Nonnull
+    private static List<Image> getImagesFromImageBaseUrl(@Nonnull final String baseUrl) {
+        return IMAGE_URL_SUFFIXES_AND_RESOLUTIONS.stream()
+                .map(imageSuffix -> new Image(baseUrl + imageSuffix.getSuffix(),
+                        imageSuffix.getHeight(), imageSuffix.getWidth(),
+                        imageSuffix.getResolutionLevel()))
+                .collect(Collectors.toUnmodifiableList());
     }
 }