add more information about list collector/extractor pattern

Christian Schabesberger 2018-02-24 22:18:12 +01:00
parent b86449162a
commit ce698b4d9c
4 changed files with 118 additions and 35 deletions

Binary file not shown.

@ -1,35 +0,0 @@
# Basic Concept of the Extractor
## Collector/Extractor pattern
Before we can start coding our own service, we need to understand the basic concept of the extractor. There is a pattern
you will find all over the code. It is called the __extractor/collector__ pattern. The idea behind this pattern is that
the [extractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html)
produces single pieces of data, and the collector takes them and forms usable data for the front end out of them.
The collector also controls the parsing process and takes care of error handling. So if the extractor fails at any
point, the collector decides whether it should continue parsing or not. This requires the extractor to be made out of
many small methods, one for every data field the collector wants to have. The collectors are provided by NewPipe.
You need to take care of the extractors.
### Usage in the front end
A typical call for retrieving data from a website would look like this:
```java
Info info;
try {
    // Create a new Extractor with a given context provided as parameter.
    Extractor extractor = new Extractor(some_meta_info);
    // Retrieve the data from the extractor and build the info package.
    info = Info.getInfo(extractor);
} catch(Exception e) {
    // Handle errors when the collector decided to abort the extraction.
}
```

@ -0,0 +1,79 @@
# Concept of the Extractor
## Collector/Extractor pattern
Before we can start coding our own service, we need to understand the basic concept of the extractor. There is a pattern
you will find all over the code. It is called the __extractor/collector__ pattern. The idea behind this pattern is that
the [extractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html)
produces single pieces of data, and the collector takes them and forms usable data for the front end out of them.
The collector also controls the parsing process and takes care of error handling. So if the extractor fails at any
point, the collector decides whether it should continue parsing or not. This requires the extractor to be made out of
many small methods, one for every data field the collector wants to have. The collectors are provided by NewPipe.
You need to take care of the extractors.
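The division of labor described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: `VideoExtractor`, `VideoInfo`, and the local `ExtractionException` are hypothetical stand-ins and not the NewPipeExtractor classes, and the choice of which field is mandatory is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's exception type.
class ExtractionException extends Exception {
    ExtractionException(String message) {
        super(message);
    }
}

// Hypothetical extractor: one small method per data field.
interface VideoExtractor {
    String getTitle() throws ExtractionException;       // assumed mandatory
    String getDescription() throws ExtractionException; // assumed optional
}

// Hypothetical info package; getInfo() plays the role of the collector.
class VideoInfo {
    String title;
    String description;
    final List<Throwable> errors = new ArrayList<>();

    static VideoInfo getInfo(VideoExtractor extractor) throws ExtractionException {
        VideoInfo info = new VideoInfo();
        // Mandatory field: a failure propagates and aborts the extraction.
        info.title = extractor.getTitle();
        // Optional field: the failure is recorded and parsing continues.
        try {
            info.description = extractor.getDescription();
        } catch (ExtractionException e) {
            info.errors.add(e);
        }
        return info;
    }
}
```

Because every field has its own method, the collector can decide field by field whether a failure is fatal. With a single monolithic `parse()` method it could only accept or reject the whole result.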
### Usage in the front end
A typical call for retrieving data from a website would look like this:
```java
Info info;
try {
    // Create a new Extractor with a given context provided as parameter.
    Extractor extractor = new Extractor(some_meta_info);
    // Retrieve the data from the extractor and build the info package.
    info = Info.getInfo(extractor);
} catch(Exception e) {
    // Handle errors when the collector decided to abort the extraction.
}
```
### Typical implementation of a single data extractor
The typical implementation of a single data extractor, on the other hand, would look like this:
```java
class MyExtractor extends FutureExtractor {
    public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {
        super(requiredInfo, forExtraction);
        ...
    }
    @Override
    public void fetch() {
        // Actually fetch the page data here
    }
    @Override
    public String someDataField()
            throws ExtractionException { // The exception needs to be thrown if something failed
        // get piece of information and return it
    }
    ... // More data fields
}
```
## Collector/Extractor pattern for lists
Sometimes information cannot be represented as a structure, but only as a list. In NewPipe, an item of a list is called
[InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html). In order
to get such items, an [InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemCollector.html)
is used. For each item that should be extracted, a new Extractor is given to the InfoItemsCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemCollector.html#commit-E-).
![InfoItemsCollector_objectdiagram.svg](img/InfoItemsCollector_objectdiagram.svg)
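The relationship in the diagram, one collector fed by one extractor per list entry, might look roughly like the following. It is a minimal sketch, not the library's implementation: `ItemExtractor`, `Item`, `SimpleItemsCollector`, and `ItemExtractionException` are hypothetical names, and skipping a broken entry instead of failing the whole list is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical exception type for this sketch.
class ItemExtractionException extends Exception {
    ItemExtractionException(String message) {
        super(message);
    }
}

// Hypothetical per-item extractor: one instance per list entry.
interface ItemExtractor {
    String getName() throws ItemExtractionException;
    String getUrl() throws ItemExtractionException;
}

// Hypothetical item type holding the collected fields.
class Item {
    final String name;
    final String url;

    Item(String name, String url) {
        this.name = name;
        this.url = url;
    }
}

// Hypothetical collector: commit() is called once per item extractor.
class SimpleItemsCollector {
    private final List<Item> items = new ArrayList<>();
    private final List<Throwable> errors = new ArrayList<>();

    void commit(ItemExtractor extractor) {
        try {
            items.add(new Item(extractor.getName(), extractor.getUrl()));
        } catch (ItemExtractionException e) {
            // Assumption: a single broken entry is recorded and skipped.
            errors.add(e);
        }
    }

    List<Item> getItems() {
        return items;
    }

    List<Throwable> getErrors() {
        return errors;
    }
}
```

A service would then loop over the entries it finds on the page, wrap each one in its own item extractor, and pass that extractor to `commit()`.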
When a streaming site shows a list, it usually offers some additional information about that list, like its title, a thumbnail,
or its creator. Such info can be called the __list header__.
Also, if you open a list in a web browser, the website usually does not load the whole list, but only a part
of it. In order to get more, you may have to click on a next page button or scroll down. This is why a list in
NewPipe is split into [InfoItemPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.NextItemsResult.html)s. Each page has its own URL and needs to be extracted separately.
Both the list header information and the extraction of multiple pages of an InfoItem list can be handled by a
[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html).
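To make the paging idea concrete, here is a small sketch of how a caller could walk through all pages of a list. The names `Page`, `PagedListExtractor`, and `PagedLists` are hypothetical and chosen for this example only. The real ListExtractor API may look different, and the sketch assumes that the last page reports no next URL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical page: a chunk of items plus the URL of the following page.
class Page {
    final List<String> items;
    final String nextPageUrl; // null when there is no further page (assumption)

    Page(List<String> items, String nextPageUrl) {
        this.items = items;
        this.nextPageUrl = nextPageUrl;
    }
}

// Hypothetical list extractor: header information plus page-wise access.
interface PagedListExtractor {
    String getListTitle();        // part of the list header
    Page getPage(String pageUrl); // every page is extracted separately
}

class PagedLists {
    // Follows the next-page URLs until the last page is reached.
    static List<String> collectAll(PagedListExtractor extractor, String firstPageUrl) {
        List<String> all = new ArrayList<>();
        String url = firstPageUrl;
        while (url != null) {
            Page page = extractor.getPage(url);
            all.addAll(page.items);
            url = page.nextPageUrl;
        }
        return all;
    }
}
```

A front end would typically not collect everything at once, but request the next page only when the user scrolls to the end of the part that is already loaded.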

@ -0,0 +1,39 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
<svg width="20cm" height="8cm" viewBox="199 199 382 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g>
<rect style="fill: #ffffff" x="200" y="260" width="141.3" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="200" y="260" width="141.3" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
<tspan x="270.65" y="281.9">:InfoItemsCollector</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="210" y1="284.9" x2="331.3" y2="284.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="200" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="221.9">
<tspan x="489.625" y="221.9">itemExtractor1:Extractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="569.25" y2="224.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="260" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="281.9">
<tspan x="489.625" y="281.9">itemExtractor2:Extractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="569.25" y2="284.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="320" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="341.9">
<tspan x="489.625" y="341.9">itemExtractor3:Extractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="569.25" y2="344.9"/>
</g>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,218 398.994,218 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.994,278 398.994,278 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,338 398.994,338 "/>
</svg>
