diff --git a/assets/InfoItemsCollector_objectdiagram.dia b/assets/InfoItemsCollector_objectdiagram.dia
index f99ccaf..5b0c0a4 100644
Binary files a/assets/InfoItemsCollector_objectdiagram.dia and b/assets/InfoItemsCollector_objectdiagram.dia differ
diff --git a/docs/01_Concept_of_the_extractor.md b/docs/01_Concept_of_the_extractor.md
index eb3b7b1..4862404 100644
--- a/docs/01_Concept_of_the_extractor.md
+++ b/docs/01_Concept_of_the_extractor.md
@@ -55,25 +55,68 @@ class MyExtractor extends FutureExtractor {
 
 ## Collector/Extractor pattern for lists
 
-Sometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called
-[InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html). In order
-to get such items a [InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html)
-is used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
+Sometimes information can be represented as a list. In NewPipe a list is represented by an
+[InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html).
+An InfoItemsCollector will collect and assemble a list of [InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html)s.
+For each item that should be extracted, a new Extractor must be created and given to the InfoItemsCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
 ![InfoItemsCollector_objectdiagram.svg](img/InfoItemsCollector_objectdiagram.svg)
 
-When a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail
+If you are implementing a list for your service you need to extend InfoItem to hold the extracted information,
+and implement an [InfoItemExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html)
+that will return the data of one InfoItem.
+
+A common implementation would look like this:
+```
+private MyInfoItemCollector collectInfoItemsFromElement(Element element) {
+    MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());
+
+    for(final Element li : element.children()) {
+        collector.commit(new InfoItemExtractor() {
+            @Override
+            public String getName() throws ParsingException {
+                ...
+            }
+
+            @Override
+            public String getUrl() throws ParsingException {
+                ...
+            }
+
+            ...
+        });
+    }
+    return collector;
+}
+```
+
+## InfoItems encapsulated in pages
+
+When a streaming site shows a list of items it usually offers some additional information about that list, like its title, a thumbnail,
 or its creator. Such info can be called __list header__.
 
-Also if you open a list in a web browser the website usually does not load the whole list, but only a part
-of it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in
-NewPipe is coped down into [InfoItemPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemPage.html)s. Each Page has its own URL, and needs to be extracted separately.
-
-List header information and extracting multiple pages of an InfoItem list can be handled by a
-[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html)
-
-
+When a website shows a long list of items it usually does not load the whole list, but only a part of it.
+In order to get more items you may have to click on a next page button, or scroll down.
+This is why lists in NewPipe are chopped down into smaller lists called [InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html)s. Each page has its own URL, and needs to be extracted separately.
+
+Additional meta information about the list, such as its title, a thumbnail,
+or its creator, as well as the extraction of multiple pages, can be handled by a
+[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html),
+and its [ListExtractor.InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html).
+
+For extracting list header information it behaves like a regular extractor. For handling `InfoItemsPage`s it adds methods
+such as:
+
+ - [getInitialPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getInitialPage--)
+   which will return the first page of InfoItems.
+ - [getNextPageUrl()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getNextPageUrl--)
+   If a second page of InfoItems is available, this will return the URL pointing to it.
+ - [getPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getPage-java.lang.String-)
+   returns a ListExtractor.InfoItemsPage by its URL, which was retrieved by the `getNextPageUrl()` method of the previous page.
+
+
+The first page is handled specially because many websites, such as YouTube, will load the first page of
+items like a regular webpage, but all the others as AJAX requests.
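The paging flow described in this hunk (first page, then following the next-page URL until none is left) can be sketched as below. This is a minimal illustration, not the NewPipeExtractor API: `Page` and `FakeListExtractor` are hypothetical stand-ins backed by an in-memory map, items are plain strings, and using `null` to signal "no further page" is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PagingSketch {

    /** Hypothetical stand-in for ListExtractor.InfoItemsPage. */
    static final class Page {
        final List<String> items;
        final String nextPageUrl; // assumption: null means there is no further page

        Page(List<String> items, String nextPageUrl) {
            this.items = items;
            this.nextPageUrl = nextPageUrl;
        }
    }

    /** Hypothetical stand-in for a ListExtractor, backed by an in-memory "website". */
    static final class FakeListExtractor {
        private final Map<String, Page> pagesByUrl = new HashMap<>();

        FakeListExtractor() {
            pagesByUrl.put("page2", new Page(Arrays.asList("c", "d"), "page3"));
            pagesByUrl.put("page3", new Page(Arrays.asList("e"), null));
        }

        /** The first page is delivered together with the regular webpage. */
        Page getInitialPage() {
            return new Page(Arrays.asList("a", "b"), "page2");
        }

        /** URL of the second page, or null if the list fits on one page. */
        String getNextPageUrl() {
            return "page2";
        }

        /** Every later page is fetched separately by its URL. */
        Page getPage(String url) {
            return pagesByUrl.get(url);
        }
    }

    /** Walks all pages and concatenates their items. */
    static List<String> collectAll(FakeListExtractor extractor) {
        List<String> all = new ArrayList<>(extractor.getInitialPage().items);
        String nextUrl = extractor.getNextPageUrl();
        while (nextUrl != null) {
            Page page = extractor.getPage(nextUrl);
            all.addAll(page.items);
            nextUrl = page.nextPageUrl; // URL handed over by the previous page
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(collectAll(new FakeListExtractor())); // prints [a, b, c, d, e]
    }
}
```

A real client would usually not drain every page up front; it would request the next page only when the user scrolls or asks for more.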
diff --git a/docs/img/InfoItemsCollector_objectdiagram.svg b/docs/img/InfoItemsCollector_objectdiagram.svg
index 2f986a9..d661de9 100644
--- a/docs/img/InfoItemsCollector_objectdiagram.svg
+++ b/docs/img/InfoItemsCollector_objectdiagram.svg
@@ -1,39 +1,39 @@
 [SVG markup lines omitted; only the diagram's text labels are shown]
 :InfoItemsCollector
-itemExtractor1:Extractor
+itemExtractor1:InfoItemExtractor
-itemExtractor2:Extractor
+itemExtractor2:InfoItemExtractor
-itemExtractor3:Extractor
+itemExtractor3:InfoItemExtractor
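The object diagram changed above shows one collector holding several item extractors. A self-contained sketch of that relationship follows. It does not use the NewPipeExtractor classes: `ItemExtractor`, `Item`, and `Collector` are hypothetical stand-ins, the raw entries are plain `String[]` pairs instead of HTML elements, and the collector is assumed to read each extractor eagerly inside `commit()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectorSketch {

    /** Hypothetical stand-in for InfoItemExtractor: knows how to read one item. */
    interface ItemExtractor {
        String getName();
        String getUrl();
    }

    /** Hypothetical stand-in for InfoItem: the plain extracted data. */
    static final class Item {
        final String name;
        final String url;

        Item(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    /** Hypothetical stand-in for InfoItemsCollector. */
    static final class Collector {
        private final List<Item> items = new ArrayList<>();

        /** Assumption: the collector reads the extractor right away and stores the result. */
        void commit(ItemExtractor extractor) {
            items.add(new Item(extractor.getName(), extractor.getUrl()));
        }

        List<Item> getItems() {
            return items;
        }
    }

    /** Creates one extractor per raw entry ({name, url}) and commits it to the collector. */
    static Collector collect(List<String[]> rawEntries) {
        Collector collector = new Collector();
        for (final String[] entry : rawEntries) {
            collector.commit(new ItemExtractor() {
                @Override
                public String getName() {
                    return entry[0];
                }

                @Override
                public String getUrl() {
                    return entry[1];
                }
            });
        }
        return collector;
    }

    public static void main(String[] args) {
        List<String[]> raw = Arrays.asList(
                new String[] {"First", "https://example.com/1"},
                new String[] {"Second", "https://example.com/2"});
        for (Item item : collect(raw).getItems()) {
            System.out.println(item.name + " -> " + item.url);
        }
    }
}
```

Keeping the per-item parsing inside a small extractor object means the collector never needs to know how a particular service lays out its pages.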