add more details about list extraction

2019-07-02 14:23:20 +02:00 · 2019-07-02 14:23:20 +02:00 · 875f536656
parent b2e5672aec
commit 875f536656
2 changed files with 61 additions and 19 deletions
--- a/docs/00_Prepare_everything.md
+++ b/docs/00_Prepare_everything.md
@ -1,9 +1,8 @@
 # Before You Start
-These documents will guide you through the process of creating your own Extractor
+These documents will guide you through the process of understanding or creating your own Extractor
-service of which will enable NewPipe to access additional streaming services, such as the currently supported YouTube and SoundCloud.
+service of which will enable NewPipe to access additional streaming services, such as the currently supported YouTube, SoundCloud and MediaCCC.
-The whole documentation consists of this page, which explains the general concept of the NewPipeExtractor, as well as our
+The whole documentation consists of this page and [Jdoc](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/) setup, which explains the general concept of the NewPipeExtractor.
 [Jdoc](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/) setup.
 __IMPORTANT!!!__ This is likely to be the worst documentation you have ever read, so do not hesitate to
 [report](https://github.com/teamnewpipe/documentation/issues) if
--- a/docs/01_Concept_of_the_extractor.md
+++ b/docs/01_Concept_of_the_extractor.md
@ -57,19 +57,20 @@ class MyExtractor extends FutureExtractor {
 Information can be represented as a list. In NewPipe, a list is represented by a
 [InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html).
-A InfoItemCollector will collect and assemble a list of [InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html).
+A InfoItemsCollector will collect and assemble a list of [InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html).
-For each item that should be extracted, a new Extractor must be created, and given to the InfoItemCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
+For each item that should be extracted, a new Extractor must be created, and given to the InfoItemsCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
 ![InfoItemsCollector_objectdiagram.svg](img/InfoItemsCollector_objectdiagram.svg)
-If you are implementing a list for your service you need to extend InfoItem containing the extracted information
+If you are implementing a list in your service you need to implement an [InfoItemExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html),
-and implement an [InfoItemExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html),
+that will be able to retreve data for one and only one InfoItem. This extractor will then be _comitted_ to the __InfoItemsCollector__ that can collect the type of InfoItems you want to generate.
 that will return the data of one InfoItem.
 A common implementation would look like this:
 ```
-private MyInfoItemCollector collectInfoItemsFromElement(Element e) {
+private SomeInfoItemCollector collectInfoItemsFromElement(Element e) {
-    MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());
+    // See *Some* as something like Stream or Channel
    // e.g. StreamInfoItemsCollector, and ChannelInfoItemsCollector are provided by NP
    SomeInfoItemCollector collector = new SomeInfoItemCollector(getServiceId());
    for(final Element li : element.children()) {
        collector.commit(new InfoItemExtractor() {
@ -90,20 +91,21 @@ private MyInfoItemCollector collectInfoItemsFromElement(Element e) {
 ```
-## InfoItems Encapsulated in Pages
+## ListExtractor
-When a streaming site shows a list of items, it usually offers some additional information about that list like its title, a thumbnail,
+There is more to know about lists:
 1. When a streaming site shows a list of items, it usually offers some additional information about that list like its title, a thumbnail,
 and its creator. Such info can be called __list header__.
-When a website shows a long list of items it usually does not load the whole list, but only a part of it. In order to get more items you may have to click on a next page button, or scroll down. 
+2. When a website shows a long list of items it usually does not load the whole list, but only a part of it. In order to get more items you may have to click on a next page button, or scroll down.
-This is why a list in NewPipe lists are chopped down into smaller lists called [InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html)s. Each page has its own URL, and needs to be extracted separately.
+Both of these Problems are fixed by the [ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html) which takes care about extracting additional metadata about the liast,
 and by chopping down lists into several pages, so called [InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html)s.
 Each page has its own URL, and needs to be extracted separately.
 Additional metadata about the list and extracting multiple pages can be handled by a
 [ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html),
 and its [ListExtractor.InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html).
-For extracting list header information it behaves like a regular extractor. For handling `InfoItemsPages` it adds methods
+For extracting list header information a `ListExtractor` behaves like a regular extractor. For handling `InfoItemsPages` it adds methods
 such as:
 - [getInitialPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getInitialPage--)
@ -117,5 +119,46 @@ such as:
 The reason why the first page is handled special is because many Websites such as YouTube will load the first page of
 items like a regular web page, but all the others as an AJAX request.
 An InfoItemsPage itself has two constructors which take these parameters:
 - The __InfoitemsCollector__ of the list that the page should represent
 - A __nextPageUrl__ which represents the url of the following page (may be null if not page follows).
 - Optionally __errors__ which is a list of Exceptions that may have happened during extracton.
 Here is a simplified reference implementation of a list extractor that only extracts pages, but not metadata:
 ```
 class MyListExtractor extends ListExtractor {
    ...
    private Document document;
    ...
    public InfoItemsPage<SomeInfoItem> getPage(pageUrl)
        throws ExtractionException {
        SomeInfoItemCollector collector = new SomeInfoItemCollector(getServiceId());
        document = myFunctionToGetThePageHTMLWhatever(pageUrl);
        //remember this part from the simple list extraction
        for(final Element li : document.children()) {
            collector.commit(new InfoItemExtractor() {
                @Override
                public String getName() throws ParsingException {
                    ...
                }
                @Override
                public String getUrl() throws ParsingException {
                    ...
                }
                ...
        }
        return new InfoItemsPage<SomeInfoItem>(collector, myFunctionToGetTheNextPageUrl(document));
    }
    public InfoItemsPage<SomeInfoItem> getInitialPage() {
        //document here got initialzied by the fetch() function.
        return getPage(getTheCurrentPageUrl(document));
    }
    ... 
 }
 ```