Deployed 0917de4 with MkDocs version: 0.17.2

Christian Schabesberger 2018-03-26 07:47:05 +01:00
parent 8551991e20
commit dcb4fb81ee
5 changed files with 83 additions and 36 deletions

View File

@ -70,6 +70,8 @@
<li><a class="toctree-l3" href="#collectorextractor-pattern-for-lists">Collector/Extractor pattern for lists</a></li>
<li><a class="toctree-l3" href="#infoitems-encapsulated-in-pages">InfoItems encapsulated in pages</a></li>
</ul>
@ -157,18 +159,58 @@ try {
</code></pre>
<h2 id="collectorextractor-pattern-for-lists">Collector/Extractor pattern for lists</h2>
<p>Sometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called
<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html">InfoItem</a>. In order
to get such items a <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html">InfoItemsCollector</a>
is used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-">commit()</a>.</p>
<p>Sometimes information can be represented as a list. In NewPipe a list is represented by an
<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html">InfoItemsCollector</a>.
An InfoItemsCollector will collect and assemble a list of <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html">InfoItem</a>s.
For each item that should be extracted, a new Extractor must be created and given to the InfoItemsCollector via <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-">commit()</a>.</p>
<p><img alt="InfoItemsCollector_objectdiagram.svg" src="../img/InfoItemsCollector_objectdiagram.svg" /></p>
<p>When a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail
<p>If you are implementing a list for your service you need to extend InfoItem to hold the extracted information,
and implement an <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html">InfoItemExtractor</a>
that returns the data of one InfoItem.</p>
<p>A common implementation would look like this:</p>
<pre><code>private MyInfoItemCollector collectInfoItemsFromElement(Element element) {
    MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());

    // Commit one extractor per child element of the list container.
    for(final Element li : element.children()) {
        collector.commit(new InfoItemExtractor() {
            @Override
            public String getName() throws ParsingException {
                ...
            }

            @Override
            public String getUrl() throws ParsingException {
                ...
            }

            ...
        });
    }
    return collector;
}
</code></pre>
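<p>To illustrate what happens on the other side of <code>commit()</code>, here is a minimal, self-contained sketch of a collector.
All names in it (<code>CollectorSketch</code>, <code>Collector</code>, <code>ItemExtractor</code>, <code>extractorFor()</code>, the <code>fake://</code> URLs) are hypothetical
stand-ins and not the real NewPipeExtractor classes. Skipping an item that fails to parse is an assumption made for this sketch; the real collector may decide differently.</p>
<pre><code>// Hypothetical stand-ins for illustration; NOT the real NewPipeExtractor classes.
public class CollectorSketch {

    static class ParsingException extends Exception {
        ParsingException(String message) {
            super(message);
        }
    }

    /** One small method per data field, as the pattern requires. */
    interface ItemExtractor {
        String getName() throws ParsingException;
        String getUrl() throws ParsingException;
    }

    static class Collector {
        private final StringBuilder items = new StringBuilder();
        private int errorCount = 0;

        /** Reads every field; an item that fails to parse is skipped (an assumption). */
        void commit(ItemExtractor extractor) {
            try {
                String name = extractor.getName();
                String url = extractor.getUrl();
                items.append(name).append(" | ").append(url).append('\n');
            } catch (ParsingException e) {
                errorCount++;
            }
        }

        String getItems() {
            return items.toString();
        }

        int getErrorCount() {
            return errorCount;
        }
    }

    /** Builds an extractor for one entry; a null name simulates a parsing failure. */
    static ItemExtractor extractorFor(final String name, final String url) {
        return new ItemExtractor() {
            @Override
            public String getName() throws ParsingException {
                if (name == null) {
                    throw new ParsingException("name not found");
                }
                return name;
            }

            @Override
            public String getUrl() {
                return url;
            }
        };
    }

    public static void main(String[] args) {
        Collector collector = new Collector();
        collector.commit(extractorFor("Video A", "fake://a"));
        collector.commit(extractorFor(null, "fake://broken"));
        collector.commit(extractorFor("Video B", "fake://b"));
        System.out.print(collector.getItems());
        System.out.println("skipped items: " + collector.getErrorCount());
    }
}
</code></pre>
<p>Running <code>main</code> prints the two valid items followed by <code>skipped items: 1</code>. The point of the sketch is that the extractor only answers small questions, while the collector drives the parsing and decides how to react to errors.</p>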
<h2 id="infoitems-encapsulated-in-pages">InfoItems encapsulated in pages</h2>
<p>When a streaming site shows a list of items it usually offers some additional information about that list, like its title, a thumbnail
or its creator. Such info can be called <strong>list header</strong>.</p>
<p>Also if you open a list in a web browser the website usually does not load the whole list, but only a part
of it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in
NewPipe is coped down into <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemPage.html">InfoItemPage</a>s. Each Page has its own URL, and needs to be extracted separately.</p>
<p>List header information and extracting multiple pages of an InfoItem list can be handled by a
<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html">ListExtractor</a></p>
<p>When a website shows a long list of items it usually does not load the whole list, but only a part of it. In order to get more items you may have to click on a next page button, or scroll down. </p>
<p>This is why lists in NewPipe are chopped into smaller lists called <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html">InfoItemsPage</a>s. Each page has its own URL, and needs to be extracted separately.</p>
<p>Additional meta information about the list, such as its title, a thumbnail
or its creator, as well as the extraction of multiple pages, is handled by a
<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html">ListExtractor</a>
and its <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html">ListExtractor.InfoItemsPage</a>.</p>
<p>For extracting list header information it behaves like a regular extractor. For handling <code>InfoItemsPages</code> it adds methods
such as:</p>
<ul>
<li><a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getInitialPage--">getInitialPage()</a>
which will return the first page of InfoItems.</li>
<li><a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getNextPageUrl--">getNextPageUrl()</a>
If a second Page of InfoItems is available this will return the URL pointing to them.</li>
<li><a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getPage-java.lang.String-">getPage()</a>
returns a ListExtractor.InfoItemsPage by its URL which was retrieved by the <code>getNextPageUrl()</code> method of the previous page.</li>
</ul>
<p>The first page is handled specially because many websites, such as YouTube, load the first page of
items like a regular webpage, but all the others as AJAX requests.</p>
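<p>The way a front end might walk through all pages with the methods listed above can be sketched as follows.
This is only a sketch under assumptions: <code>PagingSketch</code>, <code>Page</code>, <code>PagedList</code> and <code>FakeList</code> are hypothetical stand-ins, not the real NewPipeExtractor API,
and the next-page URL is stored directly on the page object to keep the example small.</p>
<pre><code>// Hypothetical stand-ins for illustration; NOT the real NewPipeExtractor classes.
public class PagingSketch {

    /** One page of a list: the item names on it and the URL of the following page. */
    static class Page {
        final String[] items;
        final String nextPageUrl; // null when there is no further page

        Page(String[] items, String nextPageUrl) {
            this.items = items;
            this.nextPageUrl = nextPageUrl;
        }
    }

    /** Mirrors the two page-fetching methods described above. */
    interface PagedList {
        Page getInitialPage();
        Page getPage(String pageUrl);
    }

    /** Walks every page in order and joins all item names with commas. */
    static String collectAll(PagedList list) {
        StringBuilder names = new StringBuilder();
        // The first page is fetched differently from all later pages.
        Page page = list.getInitialPage();
        while (page != null) {
            for (String item : page.items) {
                if (names.length() != 0) {
                    names.append(',');
                }
                names.append(item);
            }
            // Follow the next-page URL until the source reports none.
            page = (page.nextPageUrl == null) ? null : list.getPage(page.nextPageUrl);
        }
        return names.toString();
    }

    /** In-memory fake with two pages, standing in for a real website. */
    static class FakeList implements PagedList {
        public Page getInitialPage() {
            return new Page(new String[] {"first", "second"}, "fake://page2");
        }

        public Page getPage(String pageUrl) {
            if ("fake://page2".equals(pageUrl)) {
                return new Page(new String[] {"third"}, null);
            }
            throw new IllegalArgumentException("unknown page: " + pageUrl);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectAll(new FakeList()));
    }
}
</code></pre>
<p>With the fake list above, <code>main</code> prints <code>first,second,third</code>. A real front end would more likely request the next page only when the user scrolls, instead of looping over all pages at once.</p>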
</div>
</div>

View File

@ -1,39 +1,39 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
<svg width="20cm" height="8cm" viewBox="199 199 382 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<svg width="22cm" height="8cm" viewBox="199 199 435 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g>
<rect style="fill: #ffffff" x="200" y="260" width="141.3" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="200" y="260" width="141.3" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
<tspan x="270.65" y="281.9">:InfoItemsCollector</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="210" y1="284.9" x2="331.3" y2="284.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="200" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="221.9">
<tspan x="489.625" y="221.9">itemExtractor1:Extractor</tspan>
<rect style="fill: #ffffff" x="400" y="200" width="232.65" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="232.65" height="36"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="221.9">
<tspan x="516.325" y="221.9">itemExtractor1:InfoItemExtractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="569.25" y2="224.9"/>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="622.65" y2="224.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="260" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="281.9">
<tspan x="489.625" y="281.9">itemExtractor2:Extractor</tspan>
<rect style="fill: #ffffff" x="400" y="260" width="232.65" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="232.65" height="36"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="281.9">
<tspan x="516.325" y="281.9">itemExtractor2:InfoItemExtractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="569.25" y2="284.9"/>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="622.65" y2="284.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="320" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="341.9">
<tspan x="489.625" y="341.9">itemExtractor3:Extractor</tspan>
<rect style="fill: #ffffff" x="400" y="320" width="232.65" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="232.65" height="36"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="341.9">
<tspan x="516.325" y="341.9">itemExtractor3:InfoItemExtractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="569.25" y2="344.9"/>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="622.65" y2="344.9"/>
</g>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,218 398.994,218 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.994,278 398.994,278 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,338 398.994,338 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.651,278 370.651,218 398.993,218 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.993,278 398.993,278 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.651,278 370.651,338 398.993,338 "/>
</svg>


View File

@ -161,5 +161,5 @@ This however is not the <a href="https://teamnewpipe.github.io/NewPipeExtractor/
<!--
MkDocs version : 0.17.2
Build Date UTC : 2018-02-24 22:20:22
Build Date UTC : 2018-03-26 06:47:05
-->

View File

@ -42,7 +42,7 @@
},
{
"location": "/01_Concept_of_the_extractor/",
"text": "Concept of the Extractor\n\n\nCollector/Extractor pattern\n\n\nBefore we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern\nyou will find all over the code. It is called the \nextractor/collector\n pattern. The idea behind this pattern is that\nthe \nextractor\n\nwould produce single peaces of data, and the collector would take it and form usable data for the front end out of it.\nThe collector also controls the parsing process, and takes care about error handling. So if the extractor fails at any\npoint the collector will decide whether it should continue parsing or not. This requires the extractor to be made out of\nmany small methods. One method for every data field the collector wants to have. The collectors are provided by NewPipe.\nYou need to take care of the extractors.\n\n\nUsage in the front end\n\n\nSo typical call for retrieving data from a website would look like this:\n\n\nInfo info;\ntry {\n // Create a new Extractor with a given context provided as parameter.\n Extractor extractor = new Extractor(some_meta_info);\n // Retrieves the data form extractor and builds info package.\n info = Info.getInfo(extractor);\n} catch(Exception e) {\n // handle errors when collector decided to break up extraction\n}\n\n\n\n\nTypical implementation of a single data extractor\n\n\nThe typical implementation of a single data extractor on the other hand would look like this:\n\n\nclass MyExtractor extends FutureExtractor {\n\n public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {\n super(requiredInfo, forExtraction);\n\n ...\n }\n\n @Override\n public void fetch() {\n // Actually fetch the page data here\n }\n\n @Override\n public String someDataFiled() \n throws ExtractionException { //The exception needs to be thrown if someting failed\n // get piece of information and return it\n }\n\n ... 
// More datafields\n}\n\n\n\n\nCollector/Extractor pattern for lists\n\n\nSometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called\n\nInfoItem\n. In order\nto get such items a \nInfoItemsCollector\n\nis used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via \ncommit()\n.\n\n\n\n\nWhen a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail\nor its creator. Such info can be called \nlist header\n.\n\n\nAlso if you open a list in a web browser the website usually does not load the whole list, but only a part\nof it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in\nNewPipe is coped down into \nInfoItemPage\ns. Each Page has its own URL, and needs to be extracted separately.\n\n\nList header information and extracting multiple pages of an InfoItem list can be handled by a\n\nListExtractor",
"text": "Concept of the Extractor\n\n\nCollector/Extractor pattern\n\n\nBefore we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern\nyou will find all over the code. It is called the \nextractor/collector\n pattern. The idea behind this pattern is that\nthe \nextractor\n\nwould produce single peaces of data, and the collector would take it and form usable data for the front end out of it.\nThe collector also controls the parsing process, and takes care about error handling. So if the extractor fails at any\npoint the collector will decide whether it should continue parsing or not. This requires the extractor to be made out of\nmany small methods. One method for every data field the collector wants to have. The collectors are provided by NewPipe.\nYou need to take care of the extractors.\n\n\nUsage in the front end\n\n\nSo typical call for retrieving data from a website would look like this:\n\n\nInfo info;\ntry {\n // Create a new Extractor with a given context provided as parameter.\n Extractor extractor = new Extractor(some_meta_info);\n // Retrieves the data form extractor and builds info package.\n info = Info.getInfo(extractor);\n} catch(Exception e) {\n // handle errors when collector decided to break up extraction\n}\n\n\n\n\nTypical implementation of a single data extractor\n\n\nThe typical implementation of a single data extractor on the other hand would look like this:\n\n\nclass MyExtractor extends FutureExtractor {\n\n public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {\n super(requiredInfo, forExtraction);\n\n ...\n }\n\n @Override\n public void fetch() {\n // Actually fetch the page data here\n }\n\n @Override\n public String someDataFiled() \n throws ExtractionException { //The exception needs to be thrown if someting failed\n // get piece of information and return it\n }\n\n ... 
// More datafields\n}\n\n\n\n\nCollector/Extractor pattern for lists\n\n\nSometimes information can be represented as a list. In NewPipe a list is represented by a\n\nInfoItemsCollector\n.\nA InfoItemCollector will collect and assemble a list of \nInfoItem\n.\nFor each item that should be extracted a new Extractor must be created, and given to the InfoItemCollector via \ncommit()\n.\n\n\n\n\nIf you are implementing a list for your service you need to extend InfoItem containing the extracted information,\nand implement an \nInfoItemExtractor\n\nthat will return the data of one InfoItem.\n\n\nA common Implementation would look like this:\n\n\nprivate MyInfoItemCollector collectInfoItemsFromElement(Element e) {\n MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());\n\n for(final Element li : element.children()) {\n collector.commit(new InfoItemExtractor() {\n @Override\n public String getName() throws ParsingException {\n ...\n }\n\n @Override\n public String getUrl() throws ParsingException {\n ...\n }\n\n ...\n }\n return collector;\n}\n\n\n\n\n\nInfoItems encapsulated in pages\n\n\nWhen a streaming site shows a list of items it usually offers some additional information about that list, like it's title a thumbnail\nor its creator. Such info can be called \nlist header\n.\n\n\nWhen a website shows a long list of items it usually does not load the whole list, but only a part of it. In order to get more items you may have to click on a next page button, or scroll down. \n\n\nThis is why a list in NewPipe lists are chopped down into smaller lists called \nInfoItemsPage\ns. Each page has its own URL, and needs to be extracted separately.\n\n\nAdditional metainformation about the list such as it's title a thumbnail\nor its creator, and extracting multiple pages can be handled by a\n\nListExtractor\n,\nand it's \nListExtractor.InfoItemsPage\n.\n\n\nFor extracting list header information it behaves like a regular extractor. 
For handling \nInfoItemsPages\n it adds methods\nsuch as:\n\n\n\n\ngetInitialPage()\n\n which will return the first page of InfoItems.\n\n\ngetNextPageUrl()\n\n If a second Page of InfoItems is available this will return the URL pointing to them.\n\n\ngetPage()\n\n returns a ListExtractor.InfoItemsPage by its URL which was retrieved by the \ngetNextPageUrl()\n method of the previous page.\n\n\n\n\nThe reason why the first page is handled speciall is because many Websites such as Youtube will load the first page of\nitems like a regular webpage, but all the others as AJAX request.",
"title": "Concept of the Extractor"
},
{
@ -67,8 +67,13 @@
},
{
"location": "/01_Concept_of_the_extractor/#collectorextractor-pattern-for-lists",
"text": "Sometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called InfoItem . In order\nto get such items a InfoItemsCollector \nis used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via commit() . When a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail\nor its creator. Such info can be called list header . Also if you open a list in a web browser the website usually does not load the whole list, but only a part\nof it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in\nNewPipe is coped down into InfoItemPage s. Each Page has its own URL, and needs to be extracted separately. List header information and extracting multiple pages of an InfoItem list can be handled by a ListExtractor",
"text": "Sometimes information can be represented as a list. In NewPipe a list is represented by a InfoItemsCollector .\nA InfoItemCollector will collect and assemble a list of InfoItem .\nFor each item that should be extracted a new Extractor must be created, and given to the InfoItemCollector via commit() . If you are implementing a list for your service you need to extend InfoItem containing the extracted information,\nand implement an InfoItemExtractor \nthat will return the data of one InfoItem. A common Implementation would look like this: private MyInfoItemCollector collectInfoItemsFromElement(Element e) {\n MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());\n\n for(final Element li : element.children()) {\n collector.commit(new InfoItemExtractor() {\n @Override\n public String getName() throws ParsingException {\n ...\n }\n\n @Override\n public String getUrl() throws ParsingException {\n ...\n }\n\n ...\n }\n return collector;\n}",
"title": "Collector/Extractor pattern for lists"
},
{
"location": "/01_Concept_of_the_extractor/#infoitems-encapsulated-in-pages",
"text": "When a streaming site shows a list of items it usually offers some additional information about that list, like it's title a thumbnail\nor its creator. Such info can be called list header . When a website shows a long list of items it usually does not load the whole list, but only a part of it. In order to get more items you may have to click on a next page button, or scroll down. This is why a list in NewPipe lists are chopped down into smaller lists called InfoItemsPage s. Each page has its own URL, and needs to be extracted separately. Additional metainformation about the list such as it's title a thumbnail\nor its creator, and extracting multiple pages can be handled by a ListExtractor ,\nand it's ListExtractor.InfoItemsPage . For extracting list header information it behaves like a regular extractor. For handling InfoItemsPages it adds methods\nsuch as: getInitialPage() \n which will return the first page of InfoItems. getNextPageUrl() \n If a second Page of InfoItems is available this will return the URL pointing to them. getPage() \n returns a ListExtractor.InfoItemsPage by its URL which was retrieved by the getNextPageUrl() method of the previous page. The reason why the first page is handled speciall is because many Websites such as Youtube will load the first page of\nitems like a regular webpage, but all the others as AJAX request.",
"title": "InfoItems encapsulated in pages"
}
]
}

View File

@ -4,7 +4,7 @@
<url>
<loc>/</loc>
<lastmod>2018-02-24</lastmod>
<lastmod>2018-03-26</lastmod>
<changefreq>daily</changefreq>
</url>
@ -12,7 +12,7 @@
<url>
<loc>/00_Prepare_everything/</loc>
<lastmod>2018-02-24</lastmod>
<lastmod>2018-03-26</lastmod>
<changefreq>daily</changefreq>
</url>
@ -20,7 +20,7 @@
<url>
<loc>/01_Concept_of_the_extractor/</loc>
<lastmod>2018-02-24</lastmod>
<lastmod>2018-03-26</lastmod>
<changefreq>daily</changefreq>
</url>