Implement a max number of requests per minute to prevent hitting reate limits and triggering ReCaptchaExceptions. This slows down the RecordingDownloader significantly and can be adjusted if needed. A request ist retried once when facing a ReCaptchaException.
Extractor.pageFetched -> Extractor.isPageFetched
Stream.equalStats(Stream) is renamed to Stream.areStatsEqual(Stream)
Stream.getBitrate() is renamed to Stream.getBitRate()
This commit adds two new regular expressions to parse the n parameter function.
It also improves existing regular expressions by using the constant representing
multiple characters instead of adding the one or multiple characters
token manually in each regex for everything and not only function names.
It isn't required anymore and not used by extractor anymore since commit
5a6da5f43e, as the wrong page ID is used as a
visitor data (the VerifiedStatus value as a string).
The pattern to detect the channel ID was faulty and e.g. the ID detected for "https://kolektiva.media/video-channels/documentary_channel" was "a/video-channels" which is wrong. A new pattern is added to distinguish between URLs and potential IDs because IDs must not start with a "/" while IDs inside an URL must.
FixesTeamNewPipe/NewPipe#11369
As there are multiple show UI elements which share a lot of common data, a base
implementation, an abstract class named YoutubeBaseShowInfoItemExtractor, has
been created to handle common cases.
Also throw an exception when we cannot get the verified status of a channel in
YoutubeChannelExtractor due to a missing channelHeader, if the channel has no
channelAgeGateRenderer.
The runs object was computed twice in getTextFromObject and getUrlFromObject
methods, leading to unneeded search costs. This has been avoided by storing the
array in method variables.
This param used to throttle bandwidth of streaming URLs which have this
parameter when the correct value is not provided but it is not the case
anymore, as the streaming URLs return now an HTTP response code 403 in
this case.
This commits fixes extraction of the function name decoding the n parameter for
HTML5 clients' streaming URLs for YouTube base JavaScript player 3400486c.
Two new regexes have been added to the existing ones. All regexes and what they
extract has been documented.
- Fix typo in folder name of DescriptionTestPewdiepie test;
- Fix constant usage of DownloaderTestImpl as download implementation for
UnlistedTest and CCLicensed tests.
These changes work around an anti-bot token, for which its requirement is A/B
tested on the WEB client. In this test, streaming URLs of this client return
HTTP errors 403 if the token is not provided after some time.
It also allows to not fetch the JavaScript player for non-age restricted
videos, reducing data usage.
The TVHTML5 embed client is now only fetched in the case of age-restricted
videos.
The methods forceFetchAndroidClient and forceFetchIosClient of
YoutubeStreamExtractor have been removed, as they are now not needed anymore.
These changes also break the extraction of appropriate error states for private
and deleted videos and invalid video IDs.
Previously, if no URL was provided due to the track being a preorder or
unpaid track that doesn't have its own public page, we would catenate
the artist URL and null to a nonsensical string.
This invalid URL would also be used to try fetching a thumbnail URL.
The downstream behavior of the NewPipe client is only marginally
improved, as it now no longer throws visible exceptions due to the
missing thumbnails, but still gives an unfriendly error upon clicking
the item that has a `null` URL.
YouTube now returns a Shorts tab for InteractiveTabbedHeader gaming channels,
which contains Shorts about the game of the topic channel but are not uploaded
on the game topic channel.
As this tab is already supported by the extractor, fetching a gaming topic
channel now returns a tab instead of none.
Channel name and channel URL of these Shorts needs to be set to null in a
separate commit, as Shorts on this tab do not have the topic channel as their
uploader.
The new action data can return multiple contact actions instead of only one,
which will be concatenated by a new line return.
This commit fixes tests of the CrisisResources test class of
YoutubeSearchExtractorTest.
YouTube disabled the effectiveness of the parameters which were used (the
player response we get redirects to another video), but new parameters which
work around Android's client integrity checks have been found.
The API keys are not used anymore by official clients in almost all cases
(still used by the Android app until it gets a configuration) for all requests
we made.
Clients and device OS versions have been bumped to their latest stable version
known.
Methods and fields related to API keys have been renamed or deleted if they're
no longer relevant.
It’s not obvious that the function will fail in some cases and throw
an `IllegalArgumentException`.
So instead of just failing if parsing fails, return an Optional that
all callers have to decide what to do (e.g. the YoutubeExtractor can
just ignore the locale in that case, like it does with most other
fields in the json if they are unexpected).
i.e. without needing to pass through the conference/channel extractor
This was needed because clients (like NewPipe) might rely on link handlers to hold as little data as possible, since they might be kept around for long or passed around in system transactions, so this commit allows obtaining a standalone link handler that does not hold a JsonObject within itself.
YouTube provides that meta info panel when users search for really sensitive content like suicide (e.g. "blue whale").
It contains:
- an encouragement as title (e.g. "We are with you")
- a phone number as action
- details about how to call the phone number (e.g. availability)
- an url pointing to the website of an association
Also add a test that just checks if a meta info is properly extracted
This test only tests that search results are returned, when no content filters
are provided and crisis resources blocking search results should be returned.
Searches with blocking crisis resources and content filters should work too, as
the bypass has been implemented for them.
As search parameters to bypass crisis resources blocking search results have
been implemented, they need to be added to search tests, in order to pass
them.
The subscriber count is now lower than the expected count as some people
unsubscribed to the Sports system channel. The expected count has been so
lowered.
YouTube doesn't return anymore a suggestion for the query "algorythm", but does
for the query "on board ing" ("on boarding"). This search query is now used and
had to be URL-encoded.
URL encoding in the complete YoutubeSearchExtractorTest test class uses now
extractor's Util class instead of Java's URLDecoder class directly.
YouTube is rolling out or A/B testing a new date format returned inside player
responses, which are precise to the second instead of the day.
This commit makes the StreamExtractor tests use these more precise dates.
This commit fixes the testRelatedItems test method by:
- accepting consent in the test class, in order to extract mixes in
recommendations;
- removing assertion of a music mix inside the recommendations, as YouTube
doesn't seem to return such mixes anymore, at least for the video used in the
test class.
Replace the video used in this test class with another one publicly available
and update the corresponding expected test values.
The test class's mocks will be updated in a different commit.
- Change CarouselHeader test channel to Sports system one, as the Coachella one
doesn't return this channel header anymore;
- Fix InteractiveTabbedHeader test by checking whether the test's channel
description is not empty instead of containing some words, as it is changing
frequently.
Video's title and tags have been changed by its uploader, so they have to be
updated.
Also make some package-private constants private, as they are not used outside
of the class, and remove unneeded test overrides.
These crisis resources are preventing search results to be returned. See
https://support.google.com/youtube/answer/10726080?hl=en for more info on them.
This commit changes search parameters to include the property allowing to show
search results.
YouTube returns sometimes videos inside channel search results. As we only want
results corresponding to the type we requested, this commits makes
YoutubeSearchExtractor ignoring non-requested search results we get, using the
extractor LinkHandler's first content filter value.
Also remove an unneeded exception throwing declaration in
YoutubeSearchExtractor.
This query parameter for which its value is set to false was not added to two
requests made in test classes of YoutubeMixPlaylistExtractorTest.
Also remove an unneeded ParsingException exception throwing declaration in a
test method.
This should make returned dates consistent between timezones and countries on
which the extractor is ran.
It was previously only set on YouTube Music search continuations.
For every InnerTube request:
- Always add a `request` object with the following properties:
- "internalExperimentFlags" set to an empty array;
- "useSsl" set to "true";
- "lockedSafetyMode" set to "false".
- Use proper TODO comment to provide a way to enable restricted mode on every
request and add it on requests on which it wasn't present.
For YouTube Music:
- Remove alt query parameter, as it is not used anymore by the website;
- Add prettyPrint query parameter with false value on YouTube Music search
continuations.
Default image qualities were removed in image URLs with the jpg extension,
causing the addition of the image suffix to full non-JPG images URLs and so to
invalid image URLs.
Only the image quality name with its leading "-" character and the "."
character after the name is now removed and replaced by a string format
replaced itself with the image quality name for each quality.
As the image suffixes do not contain the image extension, the name of image
qualities lists has been adapted with these changes and some related comments
have been also improved.
Some services may provide different image formats using the same suffix,
without we know what format the service provide. Enforcing an image extension
could so lead to provide invalid image URLs, like for SoundCloud PNG images
currently.
With this documentation change, it is now clear that users of this class decide
of whether they want to include image extensions in the suffix. The previous
behavior described in the Javadoc was not enforced.
The signature timestamp is used as a number by HTML5 clients, so it should be
used in the same way by the extractor too instead of being a string.
As the timestamp doesn't seem to exceed 5 digits, an integer is used to store
its value.
This commit is introducing breaking changes.
For clients, everything is managed in a new class called
YoutubeJavaScriptPlayerManager:
- caching JavaScript base player code and its extracted code (functions and
variables);
- getting player signature timestamp;
- getting deobfuscated signatures of streaming URLs;
- getting streaming URLs with a throttling parameter deobfuscated, if
applicable.
The class delegates the extraction parts to external package-private classes:
- YoutubeJavaScriptExtractor, to extract and download YouTube's JavaScript base
player code: it always already present before and has been edited to mainly
remove the previous caching system and made it package-private;
- YoutubeSignatureUtils, for player signature timestamp and signature
deobfuscation function of streaming URLs, added in a recent commit;
- YoutubeThrottlingParameterUtils, which was originally
YoutubeThrottlingDecrypter, for throttling parameter of streaming URLs
deobfuscation function and checking whether this parameter is in a streaming
URL.
YoutubeJavaScriptPlayerManager caches and then runs the extracted code if it
has been executed successfully. The cache system of throttling parameters
deobfuscated values has been kept, its size can be get using the
getThrottlingParametersCacheSize method and can be cleared independently using
the clearThrottlingParametersCache method.
If an exception occurs during the extraction or the parsing of a function
property which is not related to JavaScript base player code fetching, it is
stored until caches are cleared, making subsequent failing extraction calls of
the requested function or property faster and consuming less resources, as the
result should be the same until the base player code changes.
All caches can be reset using the clearAllCaches method of
YoutubeJavaScriptPlayerManager.
Classes using JavaScript base player code and utilities directly (in the code
and its tests) have been also updated in this commit.
The goal of this class is to decouple the extraction of signature timestamp and
signature deobfuscation function from YoutubeStreamExtractor.
The extraction of the signature deobfuscation function has been also adapted to
support the latest YouTube player versions.
This new class, YoutubeSignatureUtils, doens't store anything temporary such as
a copy of the player code, which has to be passed where required. It is not
public, as it will be used by a JavaScript player manager class in the future,
in order to handle in a better way fetching, caching and resetting cache of the
player code.
Also remove some public test methods modifiers, add missing Test annotations on
old Junit 4 tests (and update them if needed), and use final in some places
where it was possible.
BandcampChannelExtractorTest.testLength has been removed as the test is always
true.
This method, testImages(Collection<Image>), will use first the default image
collection test in DefaultTests and then will check that each image URL
contains f4.bcbits.com/img and ends with .jpg or .png.
To do so, a new non-instantiable final class has been added: BandcampTestUtils.
Also remove some public test methods modifiers, add missing Test annotations on
old Junit 4 tests (and update them if needed), and use final in some places
where it was possible.
This method, testImages(Collection<Image>), will use first the default image
collection test in DefaultTests and then will check that each image URL
contains the string yt.
The JavaDoc of the class has been also updated to reflect the changes made in
it (it is now more general).
Two new methods have been added in ExtractorAsserts to check if a collection is
empty:
- assertNotEmpty(String, Collection<?>), checking:
- the non nullity of the collection;
- its non emptiness (if that's not case, an exception will be thrown using
the provided message).
- assertNotEmpty(Collection<?>), calling assertNotEmpty(String, Collection<?>)
with null as the value of the string argument.
A new one has been added to this assertion class to check the contrary:
assertEmpty(Collection<?>), checking emptiness of the collection only if it is
not null.
Three new methods have been added in ExtractorAsserts as utility test methods
for image collections:
- assertContainsImageUrlInImageCollection(String, Collection<Image>), checking
that:
- the provided URL and image collection are not null;
- the image collection contains at least one image which has the provided
string value as its URL (which is a string) property.
- assertContainsOnlyEquivalentImages(Collection<Image>, Collection<Image>),
checking that:
- both collections are not null;
- they have the same size;
- each image of the first collection has its equivalent in the second one.
This means that the properties of an image in the first collection must be
equal in an image of the second one.
- assertNotOnlyContainsEquivalentImages(Collection<Image>, Collection<Image>),
checking that:
- both collections are not null;
- one of the following conditions is met:
- they have different sizes;
- an image of the first collection has not its equivalent in the second one.
This means that the properties of an image in the first collection must
be not equal in an image of the second one.
These methods will be used by services extractors tests (and default ones) to
test image collections.
This new method, defaultTestImageList(List<Image), will check that the image
list is not null.
For each image, it will test that its URL is secure and its height and width
are more than or equal to their relevant unknown constants in the Image class
(HEIGHT_UNKNOWN and WIDTH_UNKNOWN).
These three new methods, added in MediaCCCParsingHelper,
getImageListFromImageUrl(String), getThumbnailsFromStreamItem(JsonObject) and
getThumbnailsFromLiveStreamItem(JsonObject) (the last two are based on a common
method, getThumbnailsFromObject(JsonObject, String, String)), return an empty
list if the case no image URL could be extracted.
Images returned have their height and width unknown and a resolution level
depending on the image key of the JSON API response.
Bandcamp images work with image IDs, which provide different resolutions.
Images on Bandcamp are not always squares, and some IDs respect aspect ratios
where some others not.
The extractor will only use the ones which preserve aspect ratio and will not
provide original images, for performance and size purposes.
Because of this aspect ratio preservation constraint, only one dimension will
be known at a time.
The image IDs with their respective dimension used are:
- 10: 1200w;
- 101: 90h;
- 170: 422h;
- 171: 646h;
- 20: 1024w;
- 200: 420h;
- 201: 280h;
- 202: 140h;
- 204: 360h;
- 205: 240h;
- 206: 180h;
- 207: 120h;
- 43: 100h;
- 44: 200h.
(Where w represents the width of the image and h the height of the image)
Note that these dimensions are theoretical because if the image size is less
than the dimensions of the image ID, it will be not upscaled but kept to its
original size.
All these resolutions are stored in a private static list of ThumbnailSuffixes
in BandcampExtractorHelper, in which the methods to get mutliple images have
been added:
- getImagesFromImageUrl(String): public method to get images from an image URL;
- getImagesFromImageId(long, boolean): public method to get images from an
image ID;
- getImagesFromImageBaseUrl(String): private utility method to get images from
the static list of ThumbnailSuffixes from a given image base URL, containing
the path to the image, a "a" letter if it comes from an album, its ID and an
underscore.
Some existing methods have been also edited:
- the documentation of getImageUrl(long, boolean) has been changed to reflect
the Bandcamp images findings;
- getThumbnailUrlFromSearchResult has been renamed to
getImagesFromSearchResult, and a documentation has been added to this method.
The method replaceHttpWithHttps of the Utils class has been also used in
BandcampExtractorHelper instead of doing manually what the method does.
The default avatar picture was used when no profile picture was found, but it
was removed and split in multiple images.
Thumbnails' size is not known, as this data is not provided by the API.
This method, getThumbnailsFromPlaylistOrVideoItem, has been added in
PeertubeParsingHelper and returns the two image variants for playlists and
videos.
Four new static methods have been added in PeertubeParsingHelper to do so:
- two public methods to get the corresponding image type:
getAvatarsFromOwnerAccountOrVideoChannelObject(String, JsonObject) and
getBannersFromAccountOrVideoChannelObject(String, JsonObject);
- two private methods as helper methods: getImagesFromAvatarsOrBanners(String,
JsonObject, String, String) and getImagesFromAvatarOrBannerArray(String,
JsonArray).
These new public and static methods, added in SoundcloudParsingHelper,
getAllImagesFromArtworkOrAvatarUrl(String) and
getAllImagesFromVisualUrl(String) (which call a common private method,
getAllImagesFromImageUrlReturned(String, List<ImageSuffix>, List<Image>)),
return an unmodifiable list of JPEG images containing almost every image
resolution provided by SoundCloud except the original size and the tiny
resolution (for artworks and avatars, as the image size is 20x20 for artworks
and 18x18 for avatars, so very close to or equal to the t20x20 resolution):
- for artworks and avatars:
- mini: 16x16;
- t20x20: 20x20;
- small: 32x32;
- badge: 47x47;
- t50x50: 50x50;
- t60x60: 60x60;
- t67x67: 67x67;
- large: 100x100;
- t120x120: 120x120;
- t200x200: 200x200;
- t240x240: 240x240;
- t250x250: 250x250;
- t300x300: 300x300;
- t500x500: 500x500.
- for visuals/user banners:
- t1240x260: 1240x260;
- t2480x520: 2480x520.
Duplicated code in two methods of SoundcloudParsingHelper
(getUsersFromApi(ChannelInfoItemsCollector, String) and
getStreamsFromApi(StreamInfoItemsCollector, String, boolean)) has been merged
into one common private method, getNextPageUrlFromResponseObject(JsonObject).
Splitting YoutubeMusicSearchExtractor's InfoItemExtractors into separate
classes (YoutubeMusicSongOrVideoInfoItemExtractor,
YoutubeMusicAlbumOrPlaylistInfoItemExtractor and
YoutubeMusicArtistInfoItemExtractor) allows to simplify
YoutubeMusicSearchExtractor,improves reading and applying changes to InfoItems
(no more losing at least quarter of a line due to indentations).
These InfoItems, in which the image changes have been applied, don't extend the
YouTube ones anymore, as most methods were overridden and the few ones that are
not don't apply in YouTube Music items responses, so it was useless to extend
them.
The code of YoutubeMusicSearchExtractor have been also improved a bit.
Unmodifiable lists of Images are returned, parsed from a given YouTube
"thumbnails" JSON array.
These methods will be used in all YouTube extractors and InfoItems, as the
structures between content types (videos, channels, playlists, ...) are common.
The goal of this utility class is to simply store suffixes which need to be
appended to image URLs, in order to get images at the suffix resolution.
This class contains four properties: the suffix (as a string), the height,
the width (as integers) and the estimated resolution level of the image
corresponding to the one represented by the suffix.
Objects of this serializable class contains four properties: a URL (as a
string), a width, a height (represented as integers) and an estimated
resolution level, which can be constructed from a given height.
Possible resolution levels are:
- UNKNOWN: for unknown heights or heights <= 0;
- LOW: for heights > 0 & < 175;
- MEDIUM: for heights >= 175 & < 720;
- HIGH: for heights >= 720.
Getters of these properties are available and the constructor needs these four
properties.
The addition of this support required to turn the isCarouselHeader boolean into
an enum containing all supported channel headers named HeaderType.
Also assert that the page has been fetched where needed to avoid
NullPointerExceptions when the channel page has been not fetched and remove the
getChannelHeaderJson method in YoutubeChannelExtractor, method for which its
code has been moved to its sole usage after the new headers support changes.
The non-null assertion was made on the exception message instead of the string
to check, causing a NullPointerException if the string to check was null.
Note that this introduces a "Raw use of parameterized class 'InfoItemsPage'" warning, but it can be ignored since the type missing would be <InfoItem>, and StreamInfoItem extends InfoItem
Only the first content filter of the ListLinkHandler instances provided is
used when collecting all channel tabs of the ListLinkHandler list, as channel
tabs implementations only use one content filter per ListLinkHandler instance.
Co-authored-by: ThetaDev <t.testboy@gmail.com>
These changes should help to detect tests as tests, when running a subset of
tests or all tests.
They should be also implemented in these interfaces' implementations (new and
existing ones).
Co-authored-by: ThetaDev <t.testboy@gmail.com>
Support of tags and videos, shorts, live, playlists and channels tabs has been
added for non-age restricted channels.
Age-restricted channels are now also supported and always returned the videos,
shorts and live tabs, accessible using system playlists. These tabs are the
only ones which can be accessed using YouTube's desktop website without being
logged-in.
The videos channel tab parameter has been updated to the one used by the
desktop website and when a channel extraction is fetched, this tab is returned
in the list of tabs as a cached one in the corresponding link handler.
Visitor data support per request has been added, as a valid visitor data is
required to fetch continuations with contents on the shorts tab. It is only
used in this case to enhance privacy.
A dedicated shorts UI elements (reelItemRenderers) extractor has been added,
YoutubeReelInfoItemExtractor. These elements do not provide the exact view
count, any uploader info (name, URL, avatar, verified status) and the upload
date.
All service's LinkHandlers are now using the singleton pattern and some code
has been also improved on the files changed.
Co-authored-by: ThetaDev <t.testboy@gmail.com>
Co-authored-by: Stypox <stypox@pm.me>
This is required to parse duration of YouTube's reelItemRenderers, returned
only inside accessibility data.
Co-authored-by: ThetaDev <t.testboy@gmail.com>
Support of tracks, playlists and albums has been added for users.
Also add the declaration of the UnsupportedOperationException exception to the
service's LinkHandlers.
Co-authored-by: ThetaDev <t.testboy@gmail.com>
Co-authored-by: Stypox <stypox@pm.me>
Support of tracks and albums has been added for artists.
Also use the singleton pattern and add the declaration of the
UnsupportedOperationException exception to the service's LinkHandlers and
improved some code in the files changed.
Co-authored-by: ThetaDev <t.testboy@gmail.com>
Co-authored-by: Stypox <stypox@pm.me>
Support of channels and videos has been added for accounts and support of
videos and playlists has been added for video channels.
The following changes have been also done:
- collectStreamsFrom method in PeertubeParsingHelper has been renamed to
collectItemsFrom;
- PeertubeChannelInfoItemExtractor.getStreamCount method has been fixed due to
ChannelExtractor's new inheritance;
- the declaration of the UnsupportedOperationException exception thrown has
been added to the service's LinkHandlers;
- a channel tab LinkHandlerFactory has been added,
PeertubeChannelTabLinkHandlerFactory;
- all service's LinkHandlers are now using properly the singleton pattern.
Co-authored-by: ThetaDev <t.testboy@gmail.com>
Co-authored-by: Stypox <stypox@pm.me>
This new ListLinkHandler, ReadyChannelTabListLinkHandler, should help saving
clients data, energy and time by helping to reduce duplicate requests.
Co-authored-by: Stypox <stypox@pm.me>
This change advertise to clients that channel tabs' link handler factories can
return an UnsupportedOperationException when a tab provided to them is
unsupported.
This commit introduces the following breaking changes:
- Three new classes have been added:
- ChannelTabExtractor, class extending ListExtractor<InfoItem>, which
extracts InfoItems from a channel tab;
- ChannelTabInfo extending ListInfo<InfoItem>, which extracts InfoItems from
a ChannelTabExtractor and returns them as a ChannelTabInfo;
- ChannelTabs, an immutable class containing all supported channel tabs.
- StreamingService implementations must implement new methods returning a
channel tab LinkHandlerFactory (getChannelTabsLHFactory) and a
ChannelTabExtractor (getChannelTabExtractor);
- ChannelExtractor inherits Extractor instead of ListExtractor<StreamInfoItem>
and ChannelInfo inherits Info instead of ListInfo<StreamInfoItem>;
- ChannelExtractor and ChannelInfo have now getters and/or setters of tabs.
Co-authored-by: ThetaDev <t.testboy@gmail.com>
Co-authored-by: Stypox <stypox@pm.me>
- Enhance documentation;
- Fix the regular expression fallback on HTML embed watch page;
- Use HTML scripts tag search first instead of the regular expression approach,
now used as a last resort;
- Compile regular expressions only once, in order to improve the performance of
subsequent extraction calls when clearing the cache;
- Provide original exceptions when fetching or parsing pages on which the base
JavaScript's player could be found failed, allowing clients to detect network
errors when they are the cause of the failures for instance;
- Remove delegate method which was not taking a video ID and hardcoding one, as
we can provide the video ID in all cases or do not provide a video ID at worse;
- Rename and make extraction methods package-private, as they are not intended
to be used publicly.
These breaking internal changes have been applied where needed, in
YoutubeJavaScriptExtractorTest and YoutubeStreamExtractor (in which an unneeded
initStsFromPlayerJsIfNeeded call have been removed).
- Fix testCheckAudioStreams test of
YoutubeStreamExtractorDefaultTest.AudioTrackLanguage test class, by updating
the excepted audio track name test to use the updated English audio track name
(audio track type info has been added on the video tested);
- Fix YoutubeStreamExtractorDefaultTest.PublicBroadcasterTest test class by
using a different video from a French and German public broadcast channel, as
the channel Dinge Erklärt – Kurzgesagt is not affiliated with a public
broadcast channel anymore;
- Fix YoutubeStreamExtractorLivestreamTest test class, by updating the excepted
name of the livestream to the current one.
These parameters are the only ones currently known to bypass 403 HTTP issues
related to failure of passing Android client integrity checks, as the ones of
stories (and the base of the shorts ones) do not work anymore, which may be
related to end of this format on the service.
- Remove useless concatenation on the downloader path;
- Remove unneeded public test modifier;
- Update license header;
- Specify the service class tested instead of the generic class.
- Switch to the new domain used by YouTube for search suggestions,
suggestqueries-clients6.youtube.com, and add the xhr query parameter with the
t value, to allow getting responses without requiring trim;
- Use the Java 8 Stream API to collect search suggestions and improve invalid
response detection by checking whether the content type of the response
returned is JSON;
- Move the licence header at the top of the file.
When a description is missing, no description should be returned, even the ones
indicating there is no description. This behavior is represented by a null
return instead.
Also update PeertubeAccountExtractorTest to reflect these changes.
The tested comment has been removed, so it couldn't be found in the comments
list.
This comment has been replaced by a new one from the current comments of the
video.
Also, in the parent class PeertubeCommentsExtractorTest, final has been used as
much as possible and for-each loops of lists have been replaced by their
forEach method or the Stream API, in order to simplify code.
As the "No views" string is returned in the case there is no view on a video, a
number cannot be parsed in this case, so -1 was returned.
This string is now detected in all methods to get the view count of a stream.
Getting audio tracks locales by parsing their ID or their label, should not be
done by clients, but by the extractor.
This commit adds the ability to store the Locale of an AudioStream, which is
used to compare similar AudioStreams (in the equalStats method).
webCommandMetadata object is contained inside a commandMetadata one, so it is
not accessible from the root of the navigationEndpoint object.
The corresponding statement has been moved at the bottom of the specific
endpoints parsing, as the webCommandMetadata object is present almost
everywhere, otherwise URLs of some endpoints would have be changed, such as
uploader URLs (from channel IDs to handles).
As no ParsingException is now thrown by getUrlFromNavigationEndpoint, and so by
getTextFromObject, getUrlFromObject and getTextAtKey, the methods which were
catching ParsingExceptions thrown by these methods had to be updated.
URLs got in the HTML version of getTextFromObject are now escaped properly to
provide valid HTML to clients. This has been also done for attribute
descriptions, with the description text for this type of descriptions.
As YouTube descriptions are in HTML format (except for the fallback on the JSON
player response, which is plain text and only happens when there is no visual
metadata or a breaking change), all URLs returned are escaped, so tests which
are testing presence of URLs with escaped characters had to be updated (it was
only the case for YoutubeStreamExtractorDefaultTest.DescriptionTestUnboxing).
As like count is now returned by the extractor, we need to assert a positive
minimum like count, which is close to the actual value, in order to avoid test
failures due to lower like counts than the ones excepted.
- Remove unused imports;
- Replace wildcard imports by single class imports;
- Suppress "HTTP links are not secured" warnings from IDEA IDEs;
- Replace removed video jZViOEv90dI by an existing video, 9Dpqou5cI08 (the
corresponding test has been of course renamed).
- Move license header at the top;
- Use an unmodifiable set for the subpaths instead of a modifiable list;
- Add missing Nonnull and Nullable annotations;
- Improve exception messages.
The yt:channelId element doesn't provide the channel ID anymore and is empty,
like the id element, so we need now to extract it from the channel URL provided
in two elements: author -> uri and feed -> link.
Also avoid a NullPointerException in getUrl and getName methods.
- Return duration of video premieres;
- Add another non-localized method to determine whether a stream is a running
livestream;
- Return view count and upload date of videos in playlists;
- Store isPremiere result;
- Remove shorts workaround code, as it was only useful on channels and shorts
have been moved into a separated channel tab;
- Improve some other code.
This header was not sent partially before and was added and guessed by OkHttp. This can create issues when using other HTTP clients than OkHttp, such as Cronet.
Some code in the modified classes has been improved and / or deduplicated, and usages of the UTF_8 constant of the Utils class has been replaced by StandardCharsets.UTF_8 where possible.
Note that this header has been not added in except in YoutubeDashManifestCreatorsUtils, as an empty body is sent in the POST requests made by this class.
This header was not sent before and was added and guessed by OkHttp. This can create issues when using other HTTP clients than OkHttp, such as Cronet.
Also make use of StandardCharsets.UTF_8 when getting bytes of bodies instead of the platform default's charset, to make sure to prevent some encoding issues on some JVMs.
The video "Makani’s first commercial-scale energy kite" (video ID:
An8vtD1FDqs), which has this behavior, is used for the new test,
NoVisualMetadataVideoTest, added in YoutubeStreamExtractorDefaultTest.
Tests of elements who throw an exception in this case (subscriber count, like
count, uploader avatar URL) test if the ParsingException exception is thrown by
YoutubeStreamExtractor.
description:Create a bug report to help us improve
labels:[bug]
body:
- type:markdown
attributes:
value:|
Thank you for helping to make the NewPipe Extractor better by reporting a bug. :hugs:
Please fill in as much information as possible about your bug so that we don't have to play "information ping-pong" and can help you immediately.
- type:checkboxes
id:checklist
attributes:
label:"Checklist"
options:
- label:"I am able to reproduce the bug with the latest version given here: [CLICK THIS LINK](https://github.com/TeamNewPipe/NewPipeExtractor/releases/latest)."
required:true
- label:"I am aware that this issue is being opened for the NewPipe Extractor, NOT the [app](https://github.com/TeamNewPipe/NewPipe), and my bug report will be dismissed otherwise."
required:true
- label:"I made sure that there are *no existing issues* - [open](https://github.com/TeamNewPipe/NewPipeExtractor/issues) or [closed](https://github.com/TeamNewPipe/NewPipeExtractor/issues?q=is%3Aissue+is%3Aclosed) - which I could contribute my information to."
required:true
- label:"I have taken the time to fill in all the required details. I understand that the bug report will be dismissed otherwise."
required:true
- label:"This issue contains only one bug."
required:true
- label:"I have read and understood the [contribution guidelines](https://github.com/TeamNewPipe/NewPipe/blob/dev/.github/CONTRIBUTING.md)."
required:true
- type:input
id:extractor-version
attributes:
label:Affected version
description:"In which NewPipe Extractor version did you encounter the bug?"
placeholder:"x.xx.x"
validations:
required:true
- type:textarea
id:steps-to-reproduce
attributes:
label:Steps to reproduce the bug
description:|
What did you do for the bug to show up?
If you can't cause the bug to show up again reliably (and hence don't have a proper set of steps to give us), please still try to give as many details as possible on how you think you encountered the bug.
placeholder:|
1. Init NewPipe with 'NewPipe.init(...)'
2. Create a StreamExtractor for xyz:'StreamExtractor e = YouTube.getStreamExtractor(xyz.com)'
3. Get the description 'e.getDescription()'
validations:
required:true
- type:textarea
id:expected-behavior
attributes:
label:Expected behavior
description:|
Tell us what you expect to happen.
- type:textarea
id:actual-behavior
attributes:
label:Actual behavior
description:|
Tell us what happens with the steps given above.
- type:textarea
id:screen-media
attributes:
label:Screenshots/Screen recordings
description:|
A picture or video is worth a thousand words.
If applicable, add screenshots or a screen recording to help explain your problem.
GitHub supports uploading them directly in the text box.
If your file is too big for Github to accept, try to compress it (ZIP-file) or feel free to paste a link to an image/video hoster here instead.
- type:textarea
id:logs
attributes:
label:Logs
description:|
If your bug includes a log you think we need to see, paste it here.
- type:textarea
id:additional-information
attributes:
label:Additional information
description:|
Any other information you'd like to include, for instance that
Thank you for helping to make the NewPipe Extractor better by suggesting a feature. :hugs:
Your ideas are highly welcome! The Extractor is made for you, the downstream developers, after all.
- type:checkboxes
id:checklist
attributes:
label:"Checklist"
options:
- label:"I am aware that this issue is being opened for the NewPipe Extractor, NOT the [app](https://github.com/TeamNewPipe/NewPipe), and my feature request will be dismissed otherwise."
required:true
- label:"I made sure that there are *no existing issues* - [open](https://github.com/TeamNewPipe/NewPipeExtractor/issues) or [closed](https://github.com/TeamNewPipe/NewPipeExtractor/issues?q=is%3Aissue+is%3Aclosed) - which I could contribute my information to."
required:true
- label:"I have taken the time to fill in all the required details. I understand that the feature request will be dismissed otherwise."
required:true
- label:"This issue contains only one feature request."
required:true
- label:"I have read and understood the [contribution guidelines](https://github.com/TeamNewPipe/NewPipe/blob/dev/.github/CONTRIBUTING.md)."
required:true
- type:textarea
id:feature-description
attributes:
label:Feature description
description:|
Explain how you want the Extractor's behavior to change to suit your needs.
validations:
required:true
- type:textarea
id:why-is-the-feature-requested
attributes:
label:Why do you want this feature?
description:|
Describe any problem or limitation you come across while using the Extractor which would be solved by this feature.
validations:
required:true
- type:textarea
id:additional-information
attributes:
label:Additional information
description:Any other information you'd like to include, for instance sketches, mockups, pictures of cats, etc.
NewPipe Extractor is a library for extracting things from streaming sites. It is a core component of [NewPipe](https://github.com/TeamNewPipe/NewPipe), but could be used independently.
@ -11,9 +11,17 @@ NewPipe Extractor is available at JitPack's Maven repo.
If you're using Gradle, you could add NewPipe Extractor as a dependency with the following steps:
1. Add `maven { url 'https://jitpack.io' }` to the `repositories` in your `build.gradle`.
2. Add `implementation 'com.github.TeamNewPipe:NewPipeExtractor:INSERT_VERSION_HERE'` to the `dependencies` in your `build.gradle`. Replace `INSERT_VERSION_HERE` with the [latest release](https://github.com/TeamNewPipe/NewPipeExtractor/releases/latest).
2. Add `implementation 'com.github.teamnewpipe:NewPipeExtractor:INSERT_VERSION_HERE'` to the `dependencies` in your `build.gradle`. Replace `INSERT_VERSION_HERE` with the [latest release](https://github.com/TeamNewPipe/NewPipeExtractor/releases/latest).
3. If you are using tools to minimize your project, make sure to keep the files below, by e.g. adding the following lines to your proguard file:
```
## Rules for NewPipeExtractor
-keep class org.schabi.newpipe.extractor.timeago.patterns.** { *; }
-keep class org.mozilla.javascript.** { *; }
-keep class org.mozilla.classfile.ClassFileWriter
-dontwarn org.mozilla.javascript.tools.**
```
**Note:** To use NewPipe Extractor in projects with a `minSdkVersion` below 26, [API desugaring](https://developer.android.com/studio/write/java8-support#library-desugaring) is required.
**Note:** To use NewPipe Extractor in Android projects with a `minSdk` below 33, [core library desugaring](https://developer.android.com/studio/write/java8-support#library-desugaring) with the `desugar_jdk_libs_nio` artifact is required.
### Testing changes
@ -22,7 +30,7 @@ To test changes quickly you can build the library locally. A good approach would
```groovy
includeBuild('../NewPipeExtractor') {
dependencySubstitution {
substitute module('com.github.TeamNewPipe:NewPipeExtractor') with project(':extractor')
substitute module('com.github.teamnewpipe:NewPipeExtractor') with project(':extractor')
}
}
```
@ -32,7 +40,7 @@ Another approach would be to use the local Maven repository, here's a gist of ho
1. Add `mavenLocal()` in your project `repositories` list (usually as the first entry to give priority above the others).
2. It's _recommended_ that you change the `version` of this library (e.g. `LOCAL_SNAPSHOT`).
3. Run gradle's `ìnstall` task to deploy this library to your local repository (using the wrapper, present in the root of this project: `./gradlew install`)
4. Change the dependency version used in your project to match the one you chose in step 2 (`implementation 'com.github.TeamNewPipe:NewPipeExtractor:LOCAL_SNAPSHOT'`)
4. Change the dependency version used in your project to match the one you chose in step 2 (`implementation 'com.github.teamnewpipe:NewPipeExtractor:LOCAL_SNAPSHOT'`)
> Tip for Android Studio users: After you make changes and run the `install` task, use the menu option `File → "Sync with File System"` to refresh the library in your project.
@ -50,7 +58,7 @@ The following sites are currently supported: