Deployed b864491
with MkDocs version: 0.17.2
This commit is contained in:
parent
0048ae5687
commit
ec3fc6b143
|
@ -71,7 +71,7 @@
|
|||
|
||||
<li class="toctree-l1">
|
||||
|
||||
<a class="" href="../01_Basic_concept_of_the_extractor/">01 Basic concept of the extractor</a>
|
||||
<a class="" href="../01_Concept_of_the_extractor/">01 Concept of the extractor</a>
|
||||
</li>
|
||||
|
||||
</ul>
|
||||
|
@ -146,7 +146,7 @@ If all the checks are green you did everything right, and you are good to go to
|
|||
|
||||
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
|
||||
|
||||
<a href="../01_Basic_concept_of_the_extractor/" class="btn btn-neutral float-right" title="01 Basic concept of the extractor">Next <span class="icon icon-circle-arrow-right"></span></a>
|
||||
<a href="../01_Concept_of_the_extractor/" class="btn btn-neutral float-right" title="01 Concept of the extractor">Next <span class="icon icon-circle-arrow-right"></span></a>
|
||||
|
||||
|
||||
<a href=".." class="btn btn-neutral" title="Welcome to NewPipe Tutorial"><span class="icon icon-circle-arrow-left"></span> Previous</a>
|
||||
|
@ -178,7 +178,7 @@ If all the checks are green you did everything right, and you are good to go to
|
|||
<span><a href=".." style="color: #fcfcfc;">« Previous</a></span>
|
||||
|
||||
|
||||
<span style="margin-left: 15px"><a href="../01_Basic_concept_of_the_extractor/" style="color: #fcfcfc">Next »</a></span>
|
||||
<span style="margin-left: 15px"><a href="../01_Concept_of_the_extractor/" style="color: #fcfcfc">Next »</a></span>
|
||||
|
||||
</span>
|
||||
</div>
|
||||
|
|
|
@ -8,7 +8,7 @@
|
|||
|
||||
|
||||
<link rel="shortcut icon" href="../img/favicon.ico">
|
||||
<title>Basic Concept of the Extractor - NewPipe Tutorial</title>
|
||||
<title>Concept of the Extractor - NewPipe Tutorial</title>
|
||||
<link rel="stylesheet" href="../css/local_fonts.css" type="text/css" />
|
||||
|
||||
<link rel="stylesheet" href="../css/theme.css" type="text/css" />
|
||||
|
@ -17,9 +17,9 @@
|
|||
|
||||
<script>
|
||||
// Current page data
|
||||
var mkdocs_page_name = "Basic Concept of the Extractor";
|
||||
var mkdocs_page_input_path = "01_Basic_concept_of_the_extractor.md";
|
||||
var mkdocs_page_url = "/01_Basic_concept_of_the_extractor/";
|
||||
var mkdocs_page_name = "Concept of the Extractor";
|
||||
var mkdocs_page_input_path = "01_Concept_of_the_extractor.md";
|
||||
var mkdocs_page_url = "/01_Concept_of_the_extractor/";
|
||||
</script>
|
||||
|
||||
<script src="../js/jquery-2.1.1.min.js"></script>
|
||||
|
@ -59,15 +59,17 @@
|
|||
|
||||
<li class="toctree-l1 current">
|
||||
|
||||
<a class="current" href="./">Basic Concept of the Extractor</a>
|
||||
<a class="current" href="./">Concept of the Extractor</a>
|
||||
<ul class="subnav">
|
||||
|
||||
<li class="toctree-l2"><a href="#basic-concept-of-the-extractor">Basic Concept of the Extractor</a></li>
|
||||
<li class="toctree-l2"><a href="#concept-of-the-extractor">Concept of the Extractor</a></li>
|
||||
|
||||
<ul>
|
||||
|
||||
<li><a class="toctree-l3" href="#collectorextractor-pattern">Collector/Extractor pattern</a></li>
|
||||
|
||||
<li><a class="toctree-l3" href="#collectorextractor-pattern-for-lists">Collector/Extractor pattern for lists</a></li>
|
||||
|
||||
</ul>
|
||||
|
||||
|
||||
|
@ -96,7 +98,7 @@
|
|||
|
||||
|
||||
|
||||
<li>Basic Concept of the Extractor</li>
|
||||
<li>Concept of the Extractor</li>
|
||||
<li class="wy-breadcrumbs-aside">
|
||||
|
||||
</li>
|
||||
|
@ -106,7 +108,7 @@
|
|||
<div role="main">
|
||||
<div class="section">
|
||||
|
||||
<h1 id="basic-concept-of-the-extractor">Basic Concept of the Extractor</h1>
|
||||
<h1 id="concept-of-the-extractor">Concept of the Extractor</h1>
|
||||
<h2 id="collectorextractor-pattern">Collector/Extractor pattern</h2>
|
||||
<p>Before we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern
|
||||
you will find all over the code. It is called the <strong>extractor/collector</strong> pattern. The idea behind this pattern is that
|
||||
|
@ -121,13 +123,52 @@ You need to take care of the extractors.</p>
|
|||
<pre><code class="java">Info info;
|
||||
try {
|
||||
// Create a new Extractor with a given context provided as parameter.
|
||||
Extractor extractor = new Extractor(ome_meta_info);
|
||||
Extractor extractor = new Extractor(some_meta_info);
|
||||
// Retrieve the data from the extractor and build the info package.
|
||||
info = Info.getInfo(extractor);
|
||||
} catch(Exception e) {
|
||||
// Handle errors when the collector decided to abort the extraction
|
||||
}
|
||||
</code></pre>
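To make the division of labour concrete, here is a minimal sketch of what the collector side of such a call might do internally. It is not the actual NewPipe code: `VideoExtractor`, `VideoInfo`, `FakeVideoExtractor`, and this `ExtractionException` are hypothetical stand-ins, and the choice of which field is mandatory is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CollectorSketch {
    public static void main(String[] args) throws ExtractionException {
        VideoInfo info = VideoInfo.getInfo(new FakeVideoExtractor());
        System.out.println(info.title + ", errors: " + info.errors.size());
    }
}

// Hypothetical exception type, standing in for the one an extractor would throw.
class ExtractionException extends Exception {
    ExtractionException(String message) { super(message); }
}

// Hypothetical extractor interface: one small method per data field.
interface VideoExtractor {
    String getTitle() throws ExtractionException;       // treated as mandatory below
    String getDescription() throws ExtractionException; // treated as optional below
}

// Hypothetical info class playing the role of the collector.
class VideoInfo {
    String title;
    String description;
    final List<Throwable> errors = new ArrayList<>();

    static VideoInfo getInfo(VideoExtractor extractor) throws ExtractionException {
        VideoInfo info = new VideoInfo();
        // Mandatory field: a failure propagates and aborts the whole extraction.
        info.title = extractor.getTitle();
        // Optional field: the collector records the failure and keeps going.
        try {
            info.description = extractor.getDescription();
        } catch (ExtractionException e) {
            info.errors.add(e);
        }
        return info;
    }
}

// Fake extractor for the demo: the title works, the description fails.
class FakeVideoExtractor implements VideoExtractor {
    @Override
    public String getTitle() { return "Example title"; }

    @Override
    public String getDescription() throws ExtractionException {
        throw new ExtractionException("description not found");
    }
}
```

Because every data field has its own method, the collector can decide field by field whether a failure is fatal, which is the reason the pattern asks for many small extractor methods.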
|
||||
|
||||
<h3 id="typical-implementation-of-a-single-data-extractor">Typical implementation of a single data extractor</h3>
|
||||
<p>A typical implementation of a single data extractor, on the other hand, would look like this:</p>
|
||||
<pre><code class="java">class MyExtractor extends FutureExtractor {
|
||||
|
||||
public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {
|
||||
super(requiredInfo, forExtraction);
|
||||
|
||||
...
|
||||
}
|
||||
|
||||
@Override
|
||||
public void fetch() {
|
||||
// Actually fetch the page data here
|
||||
}
|
||||
|
||||
@Override
|
||||
public String someDataField()
|
||||
throws ExtractionException { // The exception needs to be thrown if something failed
|
||||
// get piece of information and return it
|
||||
}
|
||||
|
||||
... // More data fields
|
||||
}
|
||||
</code></pre>
|
||||
|
||||
<h2 id="collectorextractor-pattern-for-lists">Collector/Extractor pattern for lists</h2>
|
||||
<p>Sometimes information cannot be represented as a single structure, but only as a list. In NewPipe, an item of a list is called an
|
||||
<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html">InfoItem</a>. In order
|
||||
to get such items an <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemCollector.html">InfoItemsCollector</a>
|
||||
is used. For each item that should be extracted, a new Extractor is handed to the InfoItemsCollector via <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemCollector.html#commit-E-">commit()</a>.</p>
|
||||
<p><img alt="InfoItemsCollector_objectdiagram.svg" src="../img/InfoItemsCollector_objectdiagram.svg" /></p>
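The relationship in the diagram can be sketched as follows. This is only an illustration under assumptions: `ItemsCollector`, `ItemExtractor`, `Item`, and `ParsingException` are hypothetical names, and only the idea of handing one extractor per item to the collector through `commit()` is taken from the text above.

```java
import java.util.ArrayList;
import java.util.List;

class ListCollectorSketch {
    public static void main(String[] args) {
        ItemsCollector collector = new ItemsCollector();
        // One extractor per list item is committed to the collector.
        collector.commit(() -> "First video");
        collector.commit(() -> { throw new ParsingException("broken item"); });
        collector.commit(() -> "Third video");
        System.out.println(collector.getItems().size() + " items, "
                + collector.getErrors().size() + " error(s)");
    }
}

// Hypothetical exception thrown when a single item cannot be parsed.
class ParsingException extends Exception {
    ParsingException(String message) { super(message); }
}

// Hypothetical per-item extractor with a single data field.
interface ItemExtractor {
    String getName() throws ParsingException;
}

// Hypothetical list item built by the collector.
class Item {
    final String name;
    Item(String name) { this.name = name; }
}

// Hypothetical collector: builds items and handles per-item errors.
class ItemsCollector {
    private final List<Item> items = new ArrayList<>();
    private final List<Throwable> errors = new ArrayList<>();

    void commit(ItemExtractor extractor) {
        try {
            items.add(new Item(extractor.getName()));
        } catch (ParsingException e) {
            // A broken item is recorded and skipped; the rest of the list survives.
            errors.add(e);
        }
    }

    List<Item> getItems() { return items; }
    List<Throwable> getErrors() { return errors; }
}
```

In this sketch a single unparsable item does not abort the list, which mirrors the collector's role of deciding whether parsing should continue.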
|
||||
<p>When a streaming site shows a list, it usually offers some additional information about that list, like its title, a thumbnail
|
||||
or its creator. Such info can be called <strong>list header</strong>.</p>
|
||||
<p>Also if you open a list in a web browser the website usually does not load the whole list, but only a part
|
||||
of it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in
|
||||
NewPipe is broken down into <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.NextItemsResult.html">InfoItemPage</a>s. Each page has its own URL and needs to be extracted separately.</p>
|
||||
<p>List header information and extracting multiple pages of an InfoItem list can be handled by a
|
||||
<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html">ListExtractor</a>.</p>
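Page-by-page extraction could look roughly like the sketch below. All names (`PagedExtractor`, `ItemPage`, `FakeListExtractor`, the `page/1` style URLs) are hypothetical and do not reflect the real ListExtractor API. The sketch only assumes, as described above, that each page has its own URL and that a page can point to the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PagedListSketch {
    public static void main(String[] args) {
        System.out.println(collectAll(new FakeListExtractor()));
    }

    // Follows next-page URLs until a page reports that there is no further page.
    static List<String> collectAll(PagedExtractor extractor) {
        List<String> all = new ArrayList<>();
        String url = extractor.firstPageUrl();
        while (url != null) {
            ItemPage page = extractor.getPage(url);
            all.addAll(page.items);
            url = page.nextPageUrl;
        }
        return all;
    }
}

// Hypothetical page of items; nextPageUrl is null on the last page.
class ItemPage {
    final List<String> items;
    final String nextPageUrl;

    ItemPage(List<String> items, String nextPageUrl) {
        this.items = items;
        this.nextPageUrl = nextPageUrl;
    }
}

// Hypothetical list extractor: every page is fetched separately by its URL.
interface PagedExtractor {
    String firstPageUrl();
    ItemPage getPage(String url);
}

// In-memory stand-in for a website that serves a list in two pages.
class FakeListExtractor implements PagedExtractor {
    private final Map<String, ItemPage> pages = new HashMap<>();

    FakeListExtractor() {
        pages.put("page/1", new ItemPage(Arrays.asList("a", "b"), "page/2"));
        pages.put("page/2", new ItemPage(Arrays.asList("c"), null));
    }

    @Override
    public String firstPageUrl() { return "page/1"; }

    @Override
    public ItemPage getPage(String url) { return pages.get(url); }
}
```

A front end would normally request one page at a time, for example when the user scrolls down, rather than draining the whole list in a loop as `collectAll` does here.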
|
||||
|
||||
</div>
|
||||
</div>
|
2
404.html
|
@ -52,7 +52,7 @@
|
|||
|
||||
<li class="toctree-l1">
|
||||
|
||||
<a class="" href="/01_Basic_concept_of_the_extractor/">01 Basic concept of the extractor</a>
|
||||
<a class="" href="/01_Concept_of_the_extractor/">01 Concept of the extractor</a>
|
||||
</li>
|
||||
|
||||
</ul>
|
||||
|
|
|
@ -0,0 +1,39 @@
|
|||
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
|
||||
<svg width="20cm" height="8cm" viewBox="199 199 382 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||
<g>
|
||||
<rect style="fill: #ffffff" x="200" y="260" width="141.3" height="36"/>
|
||||
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="200" y="260" width="141.3" height="36"/>
|
||||
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
|
||||
<tspan x="270.65" y="281.9">:InfoItemsCollector</tspan>
|
||||
</text>
|
||||
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="210" y1="284.9" x2="331.3" y2="284.9"/>
|
||||
</g>
|
||||
<g>
|
||||
<rect style="fill: #ffffff" x="400" y="200" width="179.25" height="36"/>
|
||||
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="179.25" height="36"/>
|
||||
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="221.9">
|
||||
<tspan x="489.625" y="221.9">itemExtractor1:Extractor</tspan>
|
||||
</text>
|
||||
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="569.25" y2="224.9"/>
|
||||
</g>
|
||||
<g>
|
||||
<rect style="fill: #ffffff" x="400" y="260" width="179.25" height="36"/>
|
||||
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="179.25" height="36"/>
|
||||
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="281.9">
|
||||
<tspan x="489.625" y="281.9">itemExtractor2:Extractor</tspan>
|
||||
</text>
|
||||
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="569.25" y2="284.9"/>
|
||||
</g>
|
||||
<g>
|
||||
<rect style="fill: #ffffff" x="400" y="320" width="179.25" height="36"/>
|
||||
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="179.25" height="36"/>
|
||||
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="341.9">
|
||||
<tspan x="489.625" y="341.9">itemExtractor3:Extractor</tspan>
|
||||
</text>
|
||||
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="569.25" y2="344.9"/>
|
||||
</g>
|
||||
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,218 398.994,218 "/>
|
||||
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.994,278 398.994,278 "/>
|
||||
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,338 398.994,338 "/>
|
||||
</svg>
|
|
@ -71,7 +71,7 @@
|
|||
|
||||
<li class="toctree-l1">
|
||||
|
||||
<a class="" href="01_Basic_concept_of_the_extractor/">01 Basic concept of the extractor</a>
|
||||
<a class="" href="01_Concept_of_the_extractor/">01 Concept of the extractor</a>
|
||||
</li>
|
||||
|
||||
</ul>
|
||||
|
@ -161,5 +161,5 @@ This however is not the <a href="https://teamnewpipe.github.io/NewPipeExtractor/
|
|||
|
||||
<!--
|
||||
MkDocs version : 0.17.2
|
||||
Build Date UTC : 2018-02-23 20:18:58
|
||||
Build Date UTC : 2018-02-24 21:17:40
|
||||
-->
|
||||
|
|
|
@ -52,7 +52,7 @@
|
|||
|
||||
<li class="toctree-l1">
|
||||
|
||||
<a class="" href="01_Basic_concept_of_the_extractor/">01 Basic concept of the extractor</a>
|
||||
<a class="" href="01_Concept_of_the_extractor/">01 Concept of the extractor</a>
|
||||
</li>
|
||||
|
||||
</ul>
|
||||
|
|
|
@ -41,24 +41,34 @@
|
|||
"title": "What you need to have"
|
||||
},
|
||||
{
|
||||
"location": "/01_Basic_concept_of_the_extractor/",
|
||||
"text": "Basic Concept of the Extractor\n\n\nCollector/Extractor pattern\n\n\nBefore we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern\nyou will find all over the code. It is called the \nextractor/collector\n pattern. The idea behind this pattern is that\nthe \nextractor\n\nwould produce single peaces of data, and the collector would take it and form usable data for the front end out of it.\nThe collector also controls the parsing process, and takes care about error handling. So if the extractor fails at any\npoint the collector will decide whether it should continue parsing or not. This requires the extractor to be made out of\nmany small methods. One method for every data field the collector wants to have. The collectors are provided by NewPipe.\nYou need to take care of the extractors.\n\n\nUsage in the front end\n\n\nSo typical call for retrieving data from a website would look like this:\n\n\nInfo info;\ntry {\n // Create a new Extractor with a given context provided as parameter.\n Extractor extractor = new Extractor(ome_meta_info);\n // Retrieves the data form extractor and builds info package.\n info = Info.getInfo(extractor);\n} catch(Exception e) {\n // handle errors when collector decided to break up extraction\n}",
|
||||
"title": "Basic Concept of the Extractor"
|
||||
"location": "/01_Concept_of_the_extractor/",
|
||||
"text": "Concept of the Extractor\n\n\nCollector/Extractor pattern\n\n\nBefore we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern\nyou will find all over the code. It is called the \nextractor/collector\n pattern. The idea behind this pattern is that\nthe \nextractor\n\nwould produce single peaces of data, and the collector would take it and form usable data for the front end out of it.\nThe collector also controls the parsing process, and takes care about error handling. So if the extractor fails at any\npoint the collector will decide whether it should continue parsing or not. This requires the extractor to be made out of\nmany small methods. One method for every data field the collector wants to have. The collectors are provided by NewPipe.\nYou need to take care of the extractors.\n\n\nUsage in the front end\n\n\nSo typical call for retrieving data from a website would look like this:\n\n\nInfo info;\ntry {\n // Create a new Extractor with a given context provided as parameter.\n Extractor extractor = new Extractor(some_meta_info);\n // Retrieves the data form extractor and builds info package.\n info = Info.getInfo(extractor);\n} catch(Exception e) {\n // handle errors when collector decided to break up extraction\n}\n\n\n\n\nTypical implementation of a single data extractor\n\n\nThe typical implementation of a single data extractor on the other hand would look like this:\n\n\nclass MyExtractor extends FutureExtractor {\n\n public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {\n super(requiredInfo, forExtraction);\n\n ...\n }\n\n @Override\n public void fetch() {\n // Actually fetch the page data here\n }\n\n @Override\n public String someDataFiled() \n throws ExtractionException { //The exception needs to be thrown if someting failed\n // get piece of information and return it\n }\n\n ... 
// More datafields\n}\n\n\n\n\nCollector/Extractor pattern for lists\n\n\nSometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called\n\nInfoItem\n. In order\nto get such items a \nInfoItemsCollector\n\nis used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via \ncommit()\n.\n\n\n\n\nWhen a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail\nor its creator. Such info can be called \nlist header\n.\n\n\nAlso if you open a list in a web browser the website usually does not load the whole list, but only a part\nof it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in\nNewPipe is coped down into \nInfoItemPage\ns. Each Page has its own URL, and needs to be extracted separately.\n\n\nList header information and extracting multiple pages of an InfoItem list can be handled by a\n\nListExtractor",
|
||||
"title": "Concept of the Extractor"
|
||||
},
|
||||
{
|
||||
"location": "/01_Basic_concept_of_the_extractor/#basic-concept-of-the-extractor",
|
||||
"location": "/01_Concept_of_the_extractor/#concept-of-the-extractor",
|
||||
"text": "",
|
||||
"title": "Basic Concept of the Extractor"
|
||||
"title": "Concept of the Extractor"
|
||||
},
|
||||
{
|
||||
"location": "/01_Basic_concept_of_the_extractor/#collectorextractor-pattern",
|
||||
"location": "/01_Concept_of_the_extractor/#collectorextractor-pattern",
|
||||
"text": "Before we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern\nyou will find all over the code. It is called the extractor/collector pattern. The idea behind this pattern is that\nthe extractor \nwould produce single peaces of data, and the collector would take it and form usable data for the front end out of it.\nThe collector also controls the parsing process, and takes care about error handling. So if the extractor fails at any\npoint the collector will decide whether it should continue parsing or not. This requires the extractor to be made out of\nmany small methods. One method for every data field the collector wants to have. The collectors are provided by NewPipe.\nYou need to take care of the extractors.",
|
||||
"title": "Collector/Extractor pattern"
|
||||
},
|
||||
{
|
||||
"location": "/01_Basic_concept_of_the_extractor/#usage-in-the-front-end",
|
||||
"text": "So typical call for retrieving data from a website would look like this: Info info;\ntry {\n // Create a new Extractor with a given context provided as parameter.\n Extractor extractor = new Extractor(ome_meta_info);\n // Retrieves the data form extractor and builds info package.\n info = Info.getInfo(extractor);\n} catch(Exception e) {\n // handle errors when collector decided to break up extraction\n}",
|
||||
"location": "/01_Concept_of_the_extractor/#usage-in-the-front-end",
|
||||
"text": "So typical call for retrieving data from a website would look like this: Info info;\ntry {\n // Create a new Extractor with a given context provided as parameter.\n Extractor extractor = new Extractor(some_meta_info);\n // Retrieves the data form extractor and builds info package.\n info = Info.getInfo(extractor);\n} catch(Exception e) {\n // handle errors when collector decided to break up extraction\n}",
|
||||
"title": "Usage in the front end"
|
||||
},
|
||||
{
|
||||
"location": "/01_Concept_of_the_extractor/#typical-implementation-of-a-single-data-extractor",
|
||||
"text": "The typical implementation of a single data extractor on the other hand would look like this: class MyExtractor extends FutureExtractor {\n\n public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {\n super(requiredInfo, forExtraction);\n\n ...\n }\n\n @Override\n public void fetch() {\n // Actually fetch the page data here\n }\n\n @Override\n public String someDataFiled() \n throws ExtractionException { //The exception needs to be thrown if someting failed\n // get piece of information and return it\n }\n\n ... // More datafields\n}",
|
||||
"title": "Typical implementation of a single data extractor"
|
||||
},
|
||||
{
|
||||
"location": "/01_Concept_of_the_extractor/#collectorextractor-pattern-for-lists",
|
||||
"text": "Sometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called InfoItem . In order\nto get such items a InfoItemsCollector \nis used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via commit() . When a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail\nor its creator. Such info can be called list header . Also if you open a list in a web browser the website usually does not load the whole list, but only a part\nof it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in\nNewPipe is coped down into InfoItemPage s. Each Page has its own URL, and needs to be extracted separately. List header information and extracting multiple pages of an InfoItem list can be handled by a ListExtractor",
|
||||
"title": "Collector/Extractor pattern for lists"
|
||||
}
|
||||
]
|
||||
}
|
|
@ -4,7 +4,7 @@
|
|||
|
||||
<url>
|
||||
<loc>/</loc>
|
||||
<lastmod>2018-02-23</lastmod>
|
||||
<lastmod>2018-02-24</lastmod>
|
||||
<changefreq>daily</changefreq>
|
||||
</url>
|
||||
|
||||
|
@ -12,15 +12,15 @@
|
|||
|
||||
<url>
|
||||
<loc>/00_Prepare_everything/</loc>
|
||||
<lastmod>2018-02-23</lastmod>
|
||||
<lastmod>2018-02-24</lastmod>
|
||||
<changefreq>daily</changefreq>
|
||||
</url>
|
||||
|
||||
|
||||
|
||||
<url>
|
||||
<loc>/01_Basic_concept_of_the_extractor/</loc>
|
||||
<lastmod>2018-02-23</lastmod>
|
||||
<loc>/01_Concept_of_the_extractor/</loc>
|
||||
<lastmod>2018-02-24</lastmod>
|
||||
<changefreq>daily</changefreq>
|
||||
</url>
|
||||
|
||||
|
|