enhance list extractor description
parent
0917de4a70
commit
49e343d82d
Binary file not shown.
|
@ -55,25 +55,68 @@ class MyExtractor extends FutureExtractor {
|
||||||
|
|
||||||
## Collector/Extractor pattern for lists
|
## Collector/Extractor pattern for lists
|
||||||
|
|
||||||
Sometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called
|
Sometimes information can be represented as a list. In NewPipe a list is represented by a
|
||||||
[InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html). In order
|
[InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html).
|
||||||
to get such items a [InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html)
|
An InfoItemsCollector collects and assembles a list of [InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html) objects.
|
||||||
is used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
|
For each item that should be extracted, a new Extractor must be created and given to the InfoItemsCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
|
||||||
|
|
||||||
![InfoItemsCollector_objectdiagram.svg](img/InfoItemsCollector_objectdiagram.svg)
|
![InfoItemsCollector_objectdiagram.svg](img/InfoItemsCollector_objectdiagram.svg)
|
||||||
|
|
||||||
When a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail
|
If you are implementing a list for your service, you need to extend InfoItem to contain the extracted information,
|
||||||
|
and implement an [InfoItemExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html)
|
||||||
|
that will return the data of one InfoItem.
|
||||||
|
|
||||||
|
A common implementation would look like this:
|
||||||
|
```
|
||||||
|
private MyInfoItemCollector collectInfoItemsFromElement(Element element) {
|
||||||
|
MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());
|
||||||
|
|
||||||
|
for(final Element li : element.children()) {
|
||||||
|
collector.commit(new InfoItemExtractor() {
|
||||||
|
@Override
|
||||||
|
public String getName() throws ParsingException {
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public String getUrl() throws ParsingException {
|
||||||
|
...
|
||||||
|
}
|
||||||
|
|
||||||
|
...
|
||||||
|
});
}
|
||||||
|
return collector;
|
||||||
|
}
|
||||||
|
|
||||||
|
```
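
The snippet above leaves out the collector and item types. As a rough, self-contained sketch of the same commit pattern, the following uses small stand-in classes. `SimpleItem`, `SimpleItemExtractor`, `SimpleCollector`, and `collect()` are hypothetical names invented for this illustration and are not part of NewPipeExtractor. It also assumes that a failing extractor is recorded as an error instead of aborting the whole list.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {
    /** Hypothetical stand-in for an InfoItem: a plain data holder. */
    static final class SimpleItem {
        final String name;
        final String url;

        SimpleItem(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    /** Hypothetical stand-in for an InfoItemExtractor: reads the data of one item. */
    interface SimpleItemExtractor {
        String getName() throws Exception;

        String getUrl() throws Exception;
    }

    /** Hypothetical stand-in for an InfoItemsCollector. */
    static final class SimpleCollector {
        private final List<SimpleItem> items = new ArrayList<>();
        private final List<Throwable> errors = new ArrayList<>();

        /** Runs the extractor once and stores either the item or the error. */
        void commit(SimpleItemExtractor extractor) {
            try {
                items.add(new SimpleItem(extractor.getName(), extractor.getUrl()));
            } catch (Exception e) {
                // Assumption of this sketch: one broken item must not break the list.
                errors.add(e);
            }
        }

        List<SimpleItem> getItems() {
            return items;
        }

        List<Throwable> getErrors() {
            return errors;
        }
    }

    /** Commits one extractor per row; each row is {name, url}. */
    static SimpleCollector collect(List<String[]> rows) {
        SimpleCollector collector = new SimpleCollector();
        for (final String[] row : rows) {
            collector.commit(new SimpleItemExtractor() {
                @Override
                public String getName() {
                    return row[0];
                }

                @Override
                public String getUrl() throws Exception {
                    if (row[1] == null) {
                        throw new Exception("missing url");
                    }
                    return row[1];
                }
            });
        }
        return collector;
    }
}
```

The caller only ever sees finished items and a list of errors; how each field is parsed stays inside the per-item extractor.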
|
||||||
|
|
||||||
|
## InfoItems encapsulated in pages
|
||||||
|
|
||||||
|
When a streaming site shows a list of items, it usually offers some additional information about that list, like its title, a thumbnail,
|
||||||
or its creator. Such info can be called __list header__.
|
or its creator. Such info can be called __list header__.
|
||||||
|
|
||||||
Also if you open a list in a web browser the website usually does not load the whole list, but only a part
|
When a website shows a long list of items it usually does not load the whole list, but only a part of it. In order to get more items you may have to click on a next page button, or scroll down.
|
||||||
of it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in
|
|
||||||
NewPipe is coped down into [InfoItemPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemPage.html)s. Each Page has its own URL, and needs to be extracted separately.
|
|
||||||
|
|
||||||
List header information and extracting multiple pages of an InfoItem list can be handled by a
|
|
||||||
[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html)
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
This is why lists in NewPipe are chopped down into smaller lists called [InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html)s. Each page has its own URL and needs to be extracted separately.
|
||||||
|
|
||||||
|
Additional meta information about the list, such as its title, a thumbnail,
|
||||||
|
or its creator, and extracting multiple pages can be handled by a
|
||||||
|
[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html),
|
||||||
|
and its [ListExtractor.InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html).
|
||||||
|
|
||||||
|
For extracting list header information it behaves like a regular extractor. For handling `InfoItemsPages` it adds methods
|
||||||
|
such as:
|
||||||
|
|
||||||
|
- [getInitialPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getInitialPage--)
|
||||||
|
which will return the first page of InfoItems.
|
||||||
|
- [getNextPageUrl()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getNextPageUrl--)
|
||||||
|
If a second page of InfoItems is available, this will return the URL pointing to it.
|
||||||
|
- [getPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getPage-java.lang.String-)
|
||||||
|
returns a ListExtractor.InfoItemsPage by its URL which was retrieved by the `getNextPageUrl()` method of the previous page.
|
||||||
|
|
||||||
|
|
||||||
|
The first page is handled specially because many websites, such as YouTube, load the first page of
|
||||||
|
items like a regular web page, but all the others as AJAX requests.
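
The way the three paging methods fit together can be sketched as a simple loop. This is only an illustration under stated assumptions: `Page`, `PagedList`, `inMemory()`, and `collectAll()` are hypothetical stand-ins written for this example, not the real `ListExtractor` API, and a `null` next-page URL is assumed to mean that there are no more pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PagingSketch {
    /** Hypothetical page: some items plus the URL of the following page (null if none). */
    static final class Page {
        final List<String> items;
        final String nextPageUrl;

        Page(List<String> items, String nextPageUrl) {
            this.items = items;
            this.nextPageUrl = nextPageUrl;
        }
    }

    /** Hypothetical list source modelled on the three methods described above. */
    interface PagedList {
        Page getInitialPage();       // first page, loaded like a regular web page

        String getNextPageUrl();     // URL of the page after the initial one, or null

        Page getPage(String pageUrl); // any later page, fetched by its URL
    }

    /** Builds an in-memory PagedList so the loop can be tried without a network. */
    static PagedList inMemory(final List<String> first, final String firstNext,
                              final Map<String, Page> laterPages) {
        return new PagedList() {
            @Override
            public Page getInitialPage() {
                return new Page(first, firstNext);
            }

            @Override
            public String getNextPageUrl() {
                return firstNext;
            }

            @Override
            public Page getPage(String pageUrl) {
                return laterPages.get(pageUrl);
            }
        };
    }

    /** Walks from the initial page through every following page and gathers all items. */
    static List<String> collectAll(PagedList list) {
        List<String> all = new ArrayList<>(list.getInitialPage().items);
        String next = list.getNextPageUrl();
        while (next != null) {
            Page page = list.getPage(next);
            all.addAll(page.items);
            next = page.nextPageUrl;
        }
        return all;
    }
}
```

In a real client the loop would normally fetch one page at a time on demand (for example when the user scrolls) instead of draining the whole list at once.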
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
|
@ -1,39 +1,39 @@
|
||||||
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
|
||||||
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
|
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/PR-SVG-20010719/DTD/svg10.dtd">
|
||||||
<svg width="20cm" height="8cm" viewBox="199 199 382 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
<svg width="22cm" height="8cm" viewBox="199 199 435 159" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||||||
<g>
|
<g>
|
||||||
<rect style="fill: #ffffff" x="200" y="260" width="141.3" height="36"/>
|
<rect style="fill: #ffffff" x="200" y="260" width="141.3" height="36"/>
|
||||||
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="200" y="260" width="141.3" height="36"/>
|
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="200" y="260" width="141.3" height="36"/>
|
||||||
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
|
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="270.65" y="281.9">
|
||||||
<tspan x="270.65" y="281.9">:InfoItemsCollector</tspan>
|
<tspan x="270.65" y="281.9">:InfoItemsCollector</tspan>
|
||||||
</text>
|
</text>
|
||||||
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="210" y1="284.9" x2="331.3" y2="284.9"/>
|
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="210" y1="284.9" x2="331.3" y2="284.9"/>
|
||||||
</g>
|
</g>
|
||||||
<g>
|
<g>
|
||||||
<rect style="fill: #ffffff" x="400" y="200" width="179.25" height="36"/>
|
<rect style="fill: #ffffff" x="400" y="200" width="232.65" height="36"/>
|
||||||
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="179.25" height="36"/>
|
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="200" width="232.65" height="36"/>
|
||||||
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="221.9">
|
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="221.9">
|
||||||
<tspan x="489.625" y="221.9">itemExtractor1:Extractor</tspan>
|
<tspan x="516.325" y="221.9">itemExtractor1:InfoItemExtractor</tspan>
|
||||||
</text>
|
</text>
|
||||||
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="569.25" y2="224.9"/>
|
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="224.9" x2="622.65" y2="224.9"/>
|
||||||
</g>
|
</g>
|
||||||
<g>
|
<g>
|
||||||
<rect style="fill: #ffffff" x="400" y="260" width="179.25" height="36"/>
|
<rect style="fill: #ffffff" x="400" y="260" width="232.65" height="36"/>
|
||||||
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="179.25" height="36"/>
|
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="260" width="232.65" height="36"/>
|
||||||
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="281.9">
|
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="281.9">
|
||||||
<tspan x="489.625" y="281.9">itemExtractor2:Extractor</tspan>
|
<tspan x="516.325" y="281.9">itemExtractor2:InfoItemExtractor</tspan>
|
||||||
</text>
|
</text>
|
||||||
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="569.25" y2="284.9"/>
|
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="284.9" x2="622.65" y2="284.9"/>
|
||||||
</g>
|
</g>
|
||||||
<g>
|
<g>
|
||||||
<rect style="fill: #ffffff" x="400" y="320" width="179.25" height="36"/>
|
<rect style="fill: #ffffff" x="400" y="320" width="232.65" height="36"/>
|
||||||
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="179.25" height="36"/>
|
<rect style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" x="400" y="320" width="232.65" height="36"/>
|
||||||
<text font-size="12.7998" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="489.625" y="341.9">
|
<text font-size="12.8" style="fill: #000000;text-anchor:middle;font-family:sans-serif;font-style:normal;font-weight:normal" x="516.325" y="341.9">
|
||||||
<tspan x="489.625" y="341.9">itemExtractor3:Extractor</tspan>
|
<tspan x="516.325" y="341.9">itemExtractor3:InfoItemExtractor</tspan>
|
||||||
</text>
|
</text>
|
||||||
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="569.25" y2="344.9"/>
|
<line style="fill: none; fill-opacity:0; stroke-width: 1; stroke: #000000" x1="410" y1="344.9" x2="622.65" y2="344.9"/>
|
||||||
</g>
|
</g>
|
||||||
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,218 398.994,218 "/>
|
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.651,278 370.651,218 398.993,218 "/>
|
||||||
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.994,278 398.994,278 "/>
|
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 343.309,278 397.993,278 398.993,278 "/>
|
||||||
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.652,278 370.652,338 398.994,338 "/>
|
<polyline style="fill: none; fill-opacity:0; stroke-width: 2; stroke: #000000" points="342.309,278 370.651,278 370.651,338 398.993,338 "/>
|
||||||
</svg>
|
</svg>
|
||||||
|
|
Before Width: | Height: | Size: 2.9 KiB After Width: | Height: | Size: 2.9 KiB |