Deployed 0917de4 with MkDocs version: 0.17.2

2018-03-26 07:47:05 +01:00 · 2018-03-26 07:47:05 +01:00 · dcb4fb81ee
parent 8551991e20
commit dcb4fb81ee
5 changed files with 83 additions and 36 deletions
--- a/01_Concept_of_the_extractor/index.html
+++ b/01_Concept_of_the_extractor/index.html
@ -70,6 +70,8 @@
        
            <li><a class="toctree-l3" href="#collectorextractor-pattern-for-lists">Collector/Extractor pattern for lists</a></li>
        
+            <li><a class="toctree-l3" href="#infoitems-encapsulated-in-pages">InfoItems encapsulated in pages</a></li>
+        
        </ul>
    

@ -157,18 +159,58 @@ try {
 </code></pre>

 <h2 id="collectorextractor-pattern-for-lists">Collector/Extractor pattern for lists</h2>
-<p>Sometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called
-<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html">InfoItem</a>. In order
-to get such items a <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html">InfoItemsCollector</a>
-is used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-">commit()</a>.</p>
+<p>Sometimes information can be represented as a list. In NewPipe, a list is represented by an
+<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html">InfoItemsCollector</a>.
+An InfoItemsCollector collects and assembles a list of <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html">InfoItem</a>s.
+For each item that should be extracted, a new InfoItemExtractor must be created and given to the InfoItemsCollector via <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-">commit()</a>.</p>
 <p><img alt="InfoItemsCollector_objectdiagram.svg" src="../img/InfoItemsCollector_objectdiagram.svg" /></p>
-<p>When a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail
+<p>If you are implementing a list for your service, you need to extend InfoItem to hold the extracted information,
+and implement an <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html">InfoItemExtractor</a>
+that returns the data of one InfoItem.</p>
+<p>A common implementation would look like this:</p>
+<pre><code>private MyInfoItemCollector collectInfoItemsFromElement(Element element) {
+    MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());
+
+    // Commit one extractor per child element of the list.
+    for(final Element li : element.children()) {
+        collector.commit(new InfoItemExtractor() {
+            @Override
+            public String getName() throws ParsingException {
+                ...
+            }
+
+            @Override
+            public String getUrl() throws ParsingException {
+                ...
+            }
+
+            ...
+        });
+    }
+    return collector;
+}
+
+</code></pre>
+
+<h2 id="infoitems-encapsulated-in-pages">InfoItems encapsulated in pages</h2>
+<p>When a streaming site shows a list of items, it usually offers some additional information about that list, like its title, a thumbnail,
 or its creator. Such info can be called <strong>list header</strong>.</p>
-<p>Also if you open a list in a web browser the website usually does not load the whole list, but only a part
-of it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in
-NewPipe is coped down into <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemPage.html">InfoItemPage</a>s. Each Page has its own URL, and needs to be extracted separately.</p>
-<p>List header information and extracting multiple pages of an InfoItem list can be handled by a
-<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html">ListExtractor</a></p>
+<p>When a website shows a long list of items, it usually does not load the whole list, but only a part of it. In order to get more items, you may have to click on a next page button or scroll down.</p>
+<p>This is why lists in NewPipe are chopped down into smaller lists called <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html">InfoItemsPage</a>s. Each page has its own URL and needs to be extracted separately.</p>
+<p>Additional meta information about the list, such as its title, a thumbnail,
+or its creator, as well as the extraction of multiple pages, can be handled by a
+<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html">ListExtractor</a>
+and its <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html">ListExtractor.InfoItemsPage</a>.</p>
+<p>For extracting list header information, a ListExtractor behaves like a regular extractor. For handling <code>InfoItemsPages</code>, it adds methods
+such as:</p>
+<ul>
+<li><a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getInitialPage--">getInitialPage()</a>
+   returns the first page of InfoItems.</li>
+<li><a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getNextPageUrl--">getNextPageUrl()</a>
+   returns the URL pointing to the second page of InfoItems, if one is available.</li>
+<li><a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getPage-java.lang.String-">getPage()</a>
+   returns the ListExtractor.InfoItemsPage for a given URL, which was retrieved via the <code>getNextPageUrl()</code> method of the previous page.</li>
+</ul>
+<p>The reason the first page is handled specially is that many websites, such as YouTube, load the first page of
+items like a regular web page, but all the others as AJAX requests.</p>
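+<p>The paging flow described above can be sketched as a simple loop. This is only an illustration under stated assumptions: <code>Page</code>, <code>PagedSource</code> and <code>fakeSource()</code> are hypothetical, simplified stand-ins written for this example and are not the real NewPipeExtractor classes. Only the method names <code>getInitialPage()</code>, <code>getPage()</code> and the idea of a next page URL are taken from the text.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PagingSketch {

    /** Hypothetical stand-in for an InfoItemsPage: some items plus the URL of the next page. */
    public static final class Page {
        public final List<String> items;
        public final String nextPageUrl; // null when there is no further page

        public Page(List<String> items, String nextPageUrl) {
            this.items = items;
            this.nextPageUrl = nextPageUrl;
        }
    }

    /** Hypothetical stand-in for the paging part of a ListExtractor. */
    public interface PagedSource {
        Page getInitialPage();        // first page, loaded like a regular web page
        Page getPage(String pageUrl); // any later page, addressed by its URL
    }

    /** Walks all pages: start with the initial page, then follow the next page URLs. */
    public static List<String> collectAll(PagedSource source) {
        List<String> all = new ArrayList<>();
        Page page = source.getInitialPage();
        all.addAll(page.items);
        while (page.nextPageUrl != null) {
            page = source.getPage(page.nextPageUrl);
            all.addAll(page.items);
        }
        return all;
    }

    /** In-memory fake with three pages, used only to make the sketch runnable. */
    public static PagedSource fakeSource() {
        final Map<String, Page> laterPages = new HashMap<>();
        laterPages.put("page2", new Page(Arrays.asList("c", "d"), "page3"));
        laterPages.put("page3", new Page(Arrays.asList("e"), null));
        return new PagedSource() {
            @Override
            public Page getInitialPage() {
                return new Page(Arrays.asList("a", "b"), "page2");
            }

            @Override
            public Page getPage(String pageUrl) {
                return laterPages.get(pageUrl);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(collectAll(fakeSource())); // prints [a, b, c, d, e]
    }
}
```

+<p>In this sketch the end of the list is signalled by a <code>null</code> next page URL. That is an assumption of the example; a real service may signal the last page differently.</p>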
              
            </div>
          </div>
--- a/img/InfoItemsCollector_objectdiagram.svg
+++ b/img/InfoItemsCollector_objectdiagram.svg
@ -1,39 +1,39 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
-<svg width="20cm" height="8cm" viewBox="199 199 382 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<svg width="22cm" height="8cm" viewBox="199 199 435 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
  <g>
    <rect style="fill: #ffffff" x="200" y="260" width="141.3" height="36"/>
    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="200" y="260" width="141.3" height="36"/>
-    <text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
+    <text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
      <tspan x="270.65" y="281.9">:InfoItemsCollector</tspan>
    </text>
    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="210" y1="284.9" x2="331.3" y2="284.9"/>
  </g>
  <g>
-    <rect style="fill: #ffffff" x="400" y="200" width="179.25" height="36"/>
-    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="179.25" height="36"/>
-    <text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="221.9">
-      <tspan x="489.625" y="221.9">itemExtractor1:Extractor</tspan>
+    <rect style="fill: #ffffff" x="400" y="200" width="232.65" height="36"/>
+    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="232.65" height="36"/>
+    <text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="221.9">
+      <tspan x="516.325" y="221.9">itemExtractor1:InfoItemExtractor</tspan>
    </text>
-    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="569.25" y2="224.9"/>
+    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="622.65" y2="224.9"/>
  </g>
  <g>
-    <rect style="fill: #ffffff" x="400" y="260" width="179.25" height="36"/>
-    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="179.25" height="36"/>
-    <text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="281.9">
-      <tspan x="489.625" y="281.9">itemExtractor2:Extractor</tspan>
+    <rect style="fill: #ffffff" x="400" y="260" width="232.65" height="36"/>
+    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="232.65" height="36"/>
+    <text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="281.9">
+      <tspan x="516.325" y="281.9">itemExtractor2:InfoItemExtractor</tspan>
    </text>
-    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="569.25" y2="284.9"/>
+    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="622.65" y2="284.9"/>
  </g>
  <g>
-    <rect style="fill: #ffffff" x="400" y="320" width="179.25" height="36"/>
-    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="179.25" height="36"/>
-    <text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="341.9">
-      <tspan x="489.625" y="341.9">itemExtractor3:Extractor</tspan>
+    <rect style="fill: #ffffff" x="400" y="320" width="232.65" height="36"/>
+    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="232.65" height="36"/>
+    <text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="341.9">
+      <tspan x="516.325" y="341.9">itemExtractor3:InfoItemExtractor</tspan>
    </text>
-    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="569.25" y2="344.9"/>
+    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="622.65" y2="344.9"/>
  </g>
-  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,218 398.994,218 "/>
-  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.994,278 398.994,278 "/>
-  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,338 398.994,338 "/>
+  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.651,278 370.651,218 398.993,218 "/>
+  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.993,278 398.993,278 "/>
+  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.651,278 370.651,338 398.993,338 "/>
 </svg>
--- a/index.html
+++ b/index.html
@ -161,5 +161,5 @@ This however is not the <a href="https://teamnewpipe.github.io/NewPipeExtractor/

 <!--
 MkDocs version : 0.17.2
-Build Date UTC : 2018-02-24 22:20:22
+Build Date UTC : 2018-03-26 06:47:05
 -->
--- a/search/search_index.json
+++ b/search/search_index.json
@ -42,7 +42,7 @@
        },
        {
            "location": "/01_Concept_of_the_extractor/",
-            "text": "Concept of the Extractor\n\n\nCollector/Extractor pattern\n\n\nBefore we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern\nyou will find all over the code. It is called the \nextractor/collector\n pattern. The idea behind this pattern is that\nthe \nextractor\n\nwould produce single peaces of data, and the collector would take it and form usable data for the front end out of it.\nThe collector also controls the parsing process, and takes care about error handling. So if the extractor fails at any\npoint the collector will decide whether it should continue parsing or not. This requires the extractor to be made out of\nmany small methods. One method for every data field the collector wants to have. The collectors are provided by NewPipe.\nYou need to take care of the extractors.\n\n\nUsage in the front end\n\n\nSo typical call for retrieving data from a website would look like this:\n\n\nInfo info;\ntry {\n    // Create a new Extractor with a given context provided as parameter.\n    Extractor extractor = new Extractor(some_meta_info);\n    // Retrieves the data form extractor and builds info package.\n    info = Info.getInfo(extractor);\n} catch(Exception e) {\n    // handle errors when collector decided to break up extraction\n}\n\n\n\n\nTypical implementation of a single data extractor\n\n\nThe typical implementation of a single data extractor on the other hand would look like this:\n\n\nclass MyExtractor extends FutureExtractor {\n\n    public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {\n        super(requiredInfo, forExtraction);\n\n        ...\n    }\n\n    @Override\n    public void fetch() {\n        // Actually fetch the page data here\n    }\n\n    @Override\n    public String someDataFiled() \n        throws ExtractionException {    //The exception needs to be thrown if someting failed\n        // get piece of information and return it\n    }\n\n    
...                                 // More datafields\n}\n\n\n\n\nCollector/Extractor pattern for lists\n\n\nSometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called\n\nInfoItem\n. In order\nto get such items a \nInfoItemsCollector\n\nis used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via \ncommit()\n.\n\n\n\n\nWhen a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail\nor its creator. Such info can be called \nlist header\n.\n\n\nAlso if you open a list in a web browser the website usually does not load the whole list, but only a part\nof it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in\nNewPipe is coped down into \nInfoItemPage\ns. Each Page has its own URL, and needs to be extracted separately.\n\n\nList header information and extracting multiple pages of an InfoItem list can be handled by a\n\nListExtractor",
+            "text": "Concept of the Extractor\n\n\nCollector/Extractor pattern\n\n\nBefore we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern\nyou will find all over the code. It is called the \nextractor/collector\n pattern. The idea behind this pattern is that\nthe \nextractor\n\nwould produce single peaces of data, and the collector would take it and form usable data for the front end out of it.\nThe collector also controls the parsing process, and takes care about error handling. So if the extractor fails at any\npoint the collector will decide whether it should continue parsing or not. This requires the extractor to be made out of\nmany small methods. One method for every data field the collector wants to have. The collectors are provided by NewPipe.\nYou need to take care of the extractors.\n\n\nUsage in the front end\n\n\nSo typical call for retrieving data from a website would look like this:\n\n\nInfo info;\ntry {\n    // Create a new Extractor with a given context provided as parameter.\n    Extractor extractor = new Extractor(some_meta_info);\n    // Retrieves the data form extractor and builds info package.\n    info = Info.getInfo(extractor);\n} catch(Exception e) {\n    // handle errors when collector decided to break up extraction\n}\n\n\n\n\nTypical implementation of a single data extractor\n\n\nThe typical implementation of a single data extractor on the other hand would look like this:\n\n\nclass MyExtractor extends FutureExtractor {\n\n    public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {\n        super(requiredInfo, forExtraction);\n\n        ...\n    }\n\n    @Override\n    public void fetch() {\n        // Actually fetch the page data here\n    }\n\n    @Override\n    public String someDataFiled() \n        throws ExtractionException {    //The exception needs to be thrown if someting failed\n        // get piece of information and return it\n    }\n\n    
...                                 // More datafields\n}\n\n\n\n\nCollector/Extractor pattern for lists\n\n\nSometimes information can be represented as a list. In NewPipe, a list is represented by an\n\nInfoItemsCollector\n.\nAn InfoItemsCollector collects and assembles a list of \nInfoItem\ns.\nFor each item that should be extracted, a new InfoItemExtractor must be created and given to the InfoItemsCollector via \ncommit()\n.\n\n\n\n\nIf you are implementing a list for your service, you need to extend InfoItem to hold the extracted information,\nand implement an \nInfoItemExtractor\n\nthat returns the data of one InfoItem.\n\n\nA common implementation would look like this:\n\n\nprivate MyInfoItemCollector collectInfoItemsFromElement(Element element) {\n    MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());\n\n    // Commit one extractor per child element of the list.\n    for(final Element li : element.children()) {\n        collector.commit(new InfoItemExtractor() {\n            @Override\n            public String getName() throws ParsingException {\n                ...\n            }\n\n            @Override\n            public String getUrl() throws ParsingException {\n                ...\n            }\n\n            ...\n        });\n    }\n    return collector;\n}\n\n\n\n\n\nInfoItems encapsulated in pages\n\n\nWhen a streaming site shows a list of items, it usually offers some additional information about that list, like its title, a thumbnail,\nor its creator. Such info can be called \nlist header\n.\n\n\nWhen a website shows a long list of items, it usually does not load the whole list, but only a part of it. In order to get more items, you may have to click on a next page button or scroll down.\n\n\nThis is why lists in NewPipe are chopped down into smaller lists called \nInfoItemsPage\ns. 
Each page has its own URL and needs to be extracted separately.\n\n\nAdditional meta information about the list, such as its title, a thumbnail,\nor its creator, as well as the extraction of multiple pages, can be handled by a\n\nListExtractor\n\nand its \nListExtractor.InfoItemsPage\n.\n\n\nFor extracting list header information, a ListExtractor behaves like a regular extractor. For handling \nInfoItemsPages\n, it adds methods\nsuch as:\n\n\n\n\ngetInitialPage()\n\n   returns the first page of InfoItems.\n\n\ngetNextPageUrl()\n\n   returns the URL pointing to the second page of InfoItems, if one is available.\n\n\ngetPage()\n\n   returns the ListExtractor.InfoItemsPage for a given URL, which was retrieved via the \ngetNextPageUrl()\n method of the previous page.\n\n\n\n\nThe reason the first page is handled specially is that many websites, such as YouTube, load the first page of\nitems like a regular web page, but all the others as AJAX requests.",
            "title": "Concept of the Extractor"
        },
        {
@ -67,8 +67,13 @@
        },
        {
            "location": "/01_Concept_of_the_extractor/#collectorextractor-pattern-for-lists",
-            "text": "Sometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called InfoItem . In order\nto get such items a  InfoItemsCollector \nis used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via  commit() .   When a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail\nor its creator. Such info can be called  list header .  Also if you open a list in a web browser the website usually does not load the whole list, but only a part\nof it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in\nNewPipe is coped down into  InfoItemPage s. Each Page has its own URL, and needs to be extracted separately.  List header information and extracting multiple pages of an InfoItem list can be handled by a ListExtractor",
+            "text": "Sometimes information can be represented as a list. In NewPipe, a list is represented by an InfoItemsCollector .\nAn InfoItemsCollector collects and assembles a list of  InfoItem s.\nFor each item that should be extracted, a new InfoItemExtractor must be created and given to the InfoItemsCollector via  commit() .   If you are implementing a list for your service, you need to extend InfoItem to hold the extracted information,\nand implement an  InfoItemExtractor \nthat returns the data of one InfoItem.  A common implementation would look like this:  private MyInfoItemCollector collectInfoItemsFromElement(Element element) {\n    MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());\n\n    // Commit one extractor per child element of the list.\n    for(final Element li : element.children()) {\n        collector.commit(new InfoItemExtractor() {\n            @Override\n            public String getName() throws ParsingException {\n                ...\n            }\n\n            @Override\n            public String getUrl() throws ParsingException {\n                ...\n            }\n\n            ...\n        });\n    }\n    return collector;\n}",
            "title": "Collector/Extractor pattern for lists"
+        },
+        {
+            "location": "/01_Concept_of_the_extractor/#infoitems-encapsulated-in-pages",
+            "text": "When a streaming site shows a list of items, it usually offers some additional information about that list, like its title, a thumbnail,\nor its creator. Such info can be called  list header .  When a website shows a long list of items, it usually does not load the whole list, but only a part of it. In order to get more items, you may have to click on a next page button or scroll down.   This is why lists in NewPipe are chopped down into smaller lists called  InfoItemsPage s. Each page has its own URL and needs to be extracted separately.  Additional meta information about the list, such as its title, a thumbnail,\nor its creator, as well as the extraction of multiple pages, can be handled by a ListExtractor \nand its  ListExtractor.InfoItemsPage .  For extracting list header information, a ListExtractor behaves like a regular extractor. For handling  InfoItemsPages , it adds methods\nsuch as:   getInitialPage() \n   returns the first page of InfoItems.  getNextPageUrl() \n   returns the URL pointing to the second page of InfoItems, if one is available.  getPage() \n   returns the ListExtractor.InfoItemsPage for a given URL, which was retrieved via the  getNextPageUrl()  method of the previous page.   The reason the first page is handled specially is that many websites, such as YouTube, load the first page of\nitems like a regular web page, but all the others as AJAX requests.",
+            "title": "InfoItems encapsulated in pages"
        }
    ]
 }
--- a/sitemap.xml
+++ b/sitemap.xml
@ -4,7 +4,7 @@
    
    <url>
     <loc>/</loc>
-     <lastmod>2018-02-24</lastmod>
+     <lastmod>2018-03-26</lastmod>
     <changefreq>daily</changefreq>
    </url>
    
@ -12,7 +12,7 @@
    
    <url>
     <loc>/00_Prepare_everything/</loc>
-     <lastmod>2018-02-24</lastmod>
+     <lastmod>2018-03-26</lastmod>
     <changefreq>daily</changefreq>
    </url>
    
@ -20,7 +20,7 @@
    
    <url>
     <loc>/01_Concept_of_the_extractor/</loc>
-     <lastmod>2018-02-24</lastmod>
+     <lastmod>2018-03-26</lastmod>
     <changefreq>daily</changefreq>
    </url>