enhance list extractor description
parent
0917de4a70
commit
49e343d82d
Binary file not shown.
@@ -55,25 +55,68 @@ class MyExtractor extends FutureExtractor {
## Collector/Extractor pattern for lists

Sometimes information cannot be represented as a structure, but as a list. In NewPipe an item of a list is called
[InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html). In order
to get such items an [InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html)
is used. For each item that should be extracted a new Extractor will be given to the InfoItemsCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
Sometimes information can be represented as a list. In NewPipe a list is represented by an
[InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html).
An InfoItemsCollector will collect and assemble a list of [InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html).
For each item that should be extracted a new Extractor must be created and given to the InfoItemsCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).

![InfoItemsCollector_objectdiagram.svg](img/InfoItemsCollector_objectdiagram.svg)

When a streaming site shows a list it usually offers some additional information about that list, like its title, a thumbnail
If you are implementing a list for your service you need to extend InfoItem to hold the extracted information,
and implement an [InfoItemExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html)
that will return the data of one InfoItem.

A common implementation would look like this:
```
private MyInfoItemCollector collectInfoItemsFromElement(Element element) {
    MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());

    // Commit one extractor per child element; each extractor returns the data of one InfoItem.
    for(final Element li : element.children()) {
        collector.commit(new InfoItemExtractor() {
            @Override
            public String getName() throws ParsingException {
                ...
            }

            @Override
            public String getUrl() throws ParsingException {
                ...
            }

            ...
        });
    }
    return collector;
}
```
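To make the flow of the pattern easier to follow end to end, here is a minimal, self-contained sketch in plain Java. It does not use the NewPipeExtractor classes: `SketchItemExtractor`, `SketchItem`, `SketchCollector` and `CollectorSketch` are hypothetical stand-ins, the rows of strings stand in for parsed page elements, and collecting failures in an error list is an assumption of this sketch, not a statement about the real collector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an item extractor: returns the data of one item on demand.
interface SketchItemExtractor {
    String getName() throws Exception;
    String getUrl() throws Exception;
}

// Hypothetical stand-in for an InfoItem: the assembled, plain data of one list entry.
final class SketchItem {
    final String name;
    final String url;

    SketchItem(String name, String url) {
        this.name = name;
        this.url = url;
    }
}

// Hypothetical stand-in for a collector: turns committed extractors into items.
final class SketchCollector {
    private final List<SketchItem> items = new ArrayList<>();
    private final List<Exception> errors = new ArrayList<>();

    // Query the extractor for every field and assemble one item from the answers.
    void commit(SketchItemExtractor extractor) {
        try {
            items.add(new SketchItem(extractor.getName(), extractor.getUrl()));
        } catch (Exception e) {
            // Assumption of this sketch: a broken entry is recorded and skipped.
            errors.add(e);
        }
    }

    List<SketchItem> getItems() { return items; }
    List<Exception> getErrors() { return errors; }
}

final class CollectorSketch {
    // Each row is {name, url}; it plays the role of one list element on a page.
    static SketchCollector collect(List<String[]> rows) {
        SketchCollector collector = new SketchCollector();
        for (final String[] row : rows) {
            collector.commit(new SketchItemExtractor() {
                @Override
                public String getName() { return row[0]; }

                @Override
                public String getUrl() throws Exception {
                    if (row[1] == null) {
                        throw new Exception("missing URL for " + row[0]);
                    }
                    return row[1];
                }
            });
        }
        return collector;
    }
}
```

In this sketch the caller only creates extractors and commits them; assembling the items is left entirely to the collector, which is the division of work described above.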
## InfoItems encapsulated in pages

When a streaming site shows a list of items it usually offers some additional information about that list, like its title, a thumbnail
or its creator. Such info can be called __list header__.

Also if you open a list in a web browser the website usually does not load the whole list, but only a part
of it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in
NewPipe is chopped down into [InfoItemPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemPage.html)s. Each page has its own URL, and needs to be extracted separately.

List header information and extracting multiple pages of an InfoItem list can be handled by a
[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html)

When a website shows a long list of items it usually does not load the whole list, but only a part of it. In order to get more items you may have to click on a next page button, or scroll down.

This is why lists in NewPipe are chopped down into smaller lists called [InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html)s. Each page has its own URL, and needs to be extracted separately.

Additional metainformation about the list, such as its title, a thumbnail
or its creator, and extracting multiple pages can be handled by a
[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html),
and its [ListExtractor.InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html).

For extracting list header information it behaves like a regular extractor. For handling `InfoItemsPages` it adds methods
such as:

- [getInitialPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getInitialPage--)
  which will return the first page of InfoItems.
- [getNextPageUrl()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getNextPageUrl--)
  If a second page of InfoItems is available this will return the URL pointing to them.
- [getPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getPage-java.lang.String-)
  returns a ListExtractor.InfoItemsPage by its URL which was retrieved by the `getNextPageUrl()` method of the previous page.

The reason why the first page is handled specially is that many websites such as YouTube will load the first page of
items like a regular webpage, but all the others as AJAX requests.
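The way these page methods fit together can be sketched as a simple loop. The following is a minimal, self-contained illustration in plain Java and not the NewPipeExtractor API: `SketchPage`, `SketchListExtractor`, `InMemoryListExtractor` and `PagingSketch` are hypothetical stand-ins, items are plain strings, and using `null` to signal "no further page" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one page of items plus the URL of the page after it.
final class SketchPage {
    private final List<String> items;
    private final String nextPageUrl; // assumption: null means there is no further page

    SketchPage(List<String> items, String nextPageUrl) {
        this.items = items;
        this.nextPageUrl = nextPageUrl;
    }

    List<String> getItems() { return items; }
    String getNextPageUrl() { return nextPageUrl; }
}

// Hypothetical stand-in for the paging part of a list extractor.
interface SketchListExtractor {
    SketchPage getInitialPage();        // first page, loaded like a regular webpage
    SketchPage getPage(String pageUrl); // any later page, looked up by its URL
}

// Serves pages from memory so the sketch runs without network access.
final class InMemoryListExtractor implements SketchListExtractor {
    private final SketchPage first;
    private final Map<String, SketchPage> laterPages;

    InMemoryListExtractor(SketchPage first, Map<String, SketchPage> laterPages) {
        this.first = first;
        this.laterPages = laterPages;
    }

    @Override
    public SketchPage getInitialPage() { return first; }

    @Override
    public SketchPage getPage(String pageUrl) {
        SketchPage page = laterPages.get(pageUrl);
        if (page == null) {
            throw new IllegalArgumentException("unknown page URL: " + pageUrl);
        }
        return page;
    }
}

final class PagingSketch {
    // Start with the initial page, then follow next-page URLs until none is left.
    static List<String> collectAll(SketchListExtractor extractor) {
        List<String> all = new ArrayList<>();
        SketchPage page = extractor.getInitialPage();
        all.addAll(page.getItems());
        while (page.getNextPageUrl() != null) {
            page = extractor.getPage(page.getNextPageUrl());
            all.addAll(page.getItems());
        }
        return all;
    }
}
```

Keeping the first page behind its own method mirrors the point made above: the first page and the later pages may be fetched in different ways, while the calling loop stays the same.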
@@ -1,39 +1,39 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
<svg width="20cm" height="8cm" viewBox="199 199 382 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<svg width="22cm" height="8cm" viewBox="199 199 435 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g>
<rect style="fill: #ffffff" x="200" y="260" width="141.3" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="200" y="260" width="141.3" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
<tspan x="270.65" y="281.9">:InfoItemsCollector</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="210" y1="284.9" x2="331.3" y2="284.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="200" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="221.9">
<tspan x="489.625" y="221.9">itemExtractor1:Extractor</tspan>
<rect style="fill: #ffffff" x="400" y="200" width="232.65" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="232.65" height="36"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="221.9">
<tspan x="516.325" y="221.9">itemExtractor1:InfoItemExtractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="569.25" y2="224.9"/>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="622.65" y2="224.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="260" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="281.9">
<tspan x="489.625" y="281.9">itemExtractor2:Extractor</tspan>
<rect style="fill: #ffffff" x="400" y="260" width="232.65" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="232.65" height="36"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="281.9">
<tspan x="516.325" y="281.9">itemExtractor2:InfoItemExtractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="569.25" y2="284.9"/>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="622.65" y2="284.9"/>
</g>
<g>
<rect style="fill: #ffffff" x="400" y="320" width="179.25" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="179.25" height="36"/>
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="341.9">
<tspan x="489.625" y="341.9">itemExtractor3:Extractor</tspan>
<rect style="fill: #ffffff" x="400" y="320" width="232.65" height="36"/>
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="232.65" height="36"/>
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="341.9">
<tspan x="516.325" y="341.9">itemExtractor3:InfoItemExtractor</tspan>
</text>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="569.25" y2="344.9"/>
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="622.65" y2="344.9"/>
</g>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,218 398.994,218 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.994,278 398.994,278 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,338 398.994,338 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.651,278 370.651,218 398.993,218 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.993,278 398.993,278 "/>
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.651,278 370.651,338 398.993,338 "/>
</svg>