add more information about list collector/extractor pattern

2018-02-24 22:18:12 +01:00 · 2018-02-24 22:18:12 +01:00 · ce698b4d9c
parent b86449162a
commit ce698b4d9c
4 changed files with 118 additions and 35 deletions
--- a/assets/InfoItemsCollector_objectdiagram.dia
+++ b/assets/InfoItemsCollector_objectdiagram.dia
--- a/docs/01_Basic_concept_of_the_extractor.md
+++ b/docs/01_Basic_concept_of_the_extractor.md
@@ -1,35 +0,0 @@
-# Basic Concept of the Extractor
-
-## Collector/Extractor pattern
-
-Before we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern
-you will find all over the code. It is called the __extractor/collector__ pattern. The idea behind this pattern is that
-the [extractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html)
-would produce single peaces of data, and the collector would take it and form usable data for the front end out of it.
-The collector also controls the parsing process, and takes care about error handling. So if the extractor fails at any
-point the collector will decide whether it should continue parsing or not. This requires the extractor to be made out of
-many small methods. One method for every data field the collector wants to have. The collectors are provided by NewPipe.
-You need to take care of the extractors.
-
-### Usage in the front end
-
-So typical call for retrieving data from a website would look like this:
-```java
-Info info;
-try {
-    // Create a new Extractor with a given context provided as parameter.
-    Extractor extractor = new Extractor(ome_meta_info);
-    // Retrieves the data form extractor and builds info package.
-    info = Info.getInfo(extractor);
-} catch(Exception e) {
-    // handle errors when collector decided to break up extraction
-}
-```
-
-
-
-
-
-
-
-
--- a/docs/01_Concept_of_the_extractor.md
+++ b/docs/01_Concept_of_the_extractor.md
@@ -0,0 +1,79 @@
+# Concept of the Extractor
+
+## Collector/Extractor pattern
+
+Before we can start coding our own service we need to understand the basic concept of the extractor. There is a pattern
+you will find all over the code. It is called the __extractor/collector__ pattern. The idea behind this pattern is that
+the [extractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html)
+produces single pieces of data, and the collector takes them and forms usable data for the front end out of them.
+The collector also controls the parsing process and takes care of error handling. So if the extractor fails at any
+point, the collector decides whether it should continue parsing or not. This requires the extractor to be made out of
+many small methods, one method for every data field the collector wants to have. The collectors are provided by NewPipe.
+You need to take care of the extractors.
+
+### Usage in the front end
+
+A typical call for retrieving data from a website would look like this:
+```java
+Info info;
+try {
+    // Create a new Extractor with a given context provided as parameter.
+    Extractor extractor = new Extractor(some_meta_info);
+    // Retrieve the data from the extractor and build the info package.
+    info = Info.getInfo(extractor);
+} catch(Exception e) {
+    // Handle errors when the collector decided to abort the extraction.
+}
+```
+
+### Typical implementation of a single data extractor
+
+The typical implementation of a single data extractor, on the other hand, would look like this:
+```java
+class MyExtractor extends FutureExtractor {
+
+    public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {
+        super(requiredInfo, forExtraction);
+
+        ...
+    }
+
+    @Override
+    public void fetch() {
+        // Actually fetch the page data here
+    }
+
+    @Override
+    public String someDataField()
+        throws ExtractionException {    // The exception needs to be thrown if something failed
+        // get piece of information and return it
+    }
+
+    ...                                 // More data fields
+}
+```
+
+## Collector/Extractor pattern for lists
+
+Sometimes information cannot be represented as a single structure, but only as a list. In NewPipe an item of a list is called
+[InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html). In order
+to get such items an [InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemCollector.html)
+is used. For each item that should be extracted, a new Extractor is given to the InfoItemCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemCollector.html#commit-E-).
+
+![InfoItemsCollector_objectdiagram.svg](img/InfoItemsCollector_objectdiagram.svg)
+
+When a streaming site shows a list, it usually offers some additional information about that list, like its title, a thumbnail
+or its creator. Such information can be called the __list header__.
+
+Also, if you open a list in a web browser, the website usually does not load the whole list, but only a part
+of it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in
+NewPipe is chopped down into [InfoItemPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.NextItemsResult.html)s. Each page has its own URL, and needs to be extracted separately.
+
+The list header information and the extraction of multiple pages of an InfoItem list can be handled by a
+[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html).
+
+
+
+
+
+
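
Note on the list pattern documented in `docs/01_Concept_of_the_extractor.md`: the interplay between a collector and its item extractors may be easier to follow with a small, self-contained sketch. Everything in it is hypothetical and named with a `Sketch` prefix to make that obvious: `SketchItemExtractor`, `SketchCollector`, `SketchItem`, `SketchExtractionException` and `fixedExtractor` are illustration-only stand-ins, not the NewPipeExtractor classes, and the real `commit()` may behave differently. The sketch only assumes what the text states: one extractor per item, one small method per data field, and a collector that decides whether to continue after a failure.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {

    /** Hypothetical stand-in for an extraction failure; not the NewPipe class. */
    static class SketchExtractionException extends Exception {
        SketchExtractionException(String message) {
            super(message);
        }
    }

    /** Hypothetical item extractor: one small method per data field. */
    interface SketchItemExtractor {
        String getName() throws SketchExtractionException;
        String getUrl() throws SketchExtractionException;
    }

    /** Plain data object that a front end would consume. */
    static class SketchItem {
        final String name;
        final String url;

        SketchItem(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    /** Hypothetical collector: drives the extractors and owns the error handling. */
    static class SketchCollector {
        private final List<SketchItem> items = new ArrayList<>();
        private final List<Throwable> errors = new ArrayList<>();

        /** Pulls every field from one extractor; a failing item is recorded and skipped. */
        void commit(SketchItemExtractor extractor) {
            try {
                items.add(new SketchItem(extractor.getName(), extractor.getUrl()));
            } catch (SketchExtractionException e) {
                errors.add(e); // the collector decides to continue with the next item
            }
        }

        List<SketchItem> getItems() {
            return items;
        }

        List<Throwable> getErrors() {
            return errors;
        }
    }

    /** Builds an extractor over fixed values; a null name simulates a parsing failure. */
    static SketchItemExtractor fixedExtractor(final String name, final String url) {
        return new SketchItemExtractor() {
            @Override
            public String getName() throws SketchExtractionException {
                if (name == null) {
                    throw new SketchExtractionException("name not found");
                }
                return name;
            }

            @Override
            public String getUrl() {
                return url;
            }
        };
    }

    public static void main(String[] args) {
        SketchCollector collector = new SketchCollector();
        collector.commit(fixedExtractor("First video", "https://example.com/1"));
        collector.commit(fixedExtractor(null, "https://example.com/2")); // this item fails
        collector.commit(fixedExtractor("Third video", "https://example.com/3"));
        System.out.println(collector.getItems().size() + " items, "
                + collector.getErrors().size() + " errors");
    }
}
```

In this sketch the collector skips a failing item and keeps the error for later reporting. That is only one possible policy; a collector could just as well abort on the first failure.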
--- a/docs/img/InfoItemsCollector_objectdiagram.svg
+++ b/docs/img/InfoItemsCollector_objectdiagram.svg
@@ -0,0 +1,39 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
+<svg width="20cm" height="8cm" viewBox="199 199 382 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+  <g>
+    <rect style="fill: #ffffff" x="200" y="260" width="141.3" height="36"/>
+    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="200" y="260" width="141.3" height="36"/>
+    <text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
+      <tspan x="270.65" y="281.9">:InfoItemsCollector</tspan>
+    </text>
+    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="210" y1="284.9" x2="331.3" y2="284.9"/>
+  </g>
+  <g>
+    <rect style="fill: #ffffff" x="400" y="200" width="179.25" height="36"/>
+    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="179.25" height="36"/>
+    <text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="221.9">
+      <tspan x="489.625" y="221.9">itemExtractor1:Extractor</tspan>
+    </text>
+    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="569.25" y2="224.9"/>
+  </g>
+  <g>
+    <rect style="fill: #ffffff" x="400" y="260" width="179.25" height="36"/>
+    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="179.25" height="36"/>
+    <text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="281.9">
+      <tspan x="489.625" y="281.9">itemExtractor2:Extractor</tspan>
+    </text>
+    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="569.25" y2="284.9"/>
+  </g>
+  <g>
+    <rect style="fill: #ffffff" x="400" y="320" width="179.25" height="36"/>
+    <rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="179.25" height="36"/>
+    <text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="341.9">
+      <tspan x="489.625" y="341.9">itemExtractor3:Extractor</tspan>
+    </text>
+    <line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="569.25" y2="344.9"/>
+  </g>
+  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,218 398.994,218 "/>
+  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.994,278 398.994,278 "/>
+  <polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,338 398.994,338 "/>
+</svg>