From 49e343d82d59c70d272c8fb98cfcdd461dda0bee Mon Sep 17 00:00:00 2001
From: Christian Schabesberger
Date: Mon, 26 Mar 2018 08:47:15 +0200
Subject: [PATCH] enhance list extractor description

---
 assets/InfoItemsCollector_objectdiagram.dia | Bin 1263 -> 1262 bytes
 docs/01_Concept_of_the_extractor.md | 69 ++++++++++++++----
 docs/img/InfoItemsCollector_objectdiagram.svg | 40 +++++-----
 3 files changed, 76 insertions(+), 33 deletions(-)

diff --git a/assets/InfoItemsCollector_objectdiagram.dia b/assets/InfoItemsCollector_objectdiagram.dia
index f99ccafc4a963b5cd2c847d2a32532b818a84d84..5b0c0a4ef25e5932ac092e879ab2b8337bc9bbd9 100644
GIT binary patch
delta 1150
zcmV-^1cCeS3GNAhABzY8000000t4-vNspsA6oBvbD-d<9CM0>&tsMGQ&-@ZngVf%#95k|=QHR4S936HKVI26!w#
zE|Nq+|>g=mIh)$CD!z^i^=e4pNWCby~{a
zaF%i;FlT7JzdLdYnf-usmaX>N-(RE;(${22KC~db0YUCq{;KAERa_lc#nWKPSu1kb
zQRX0(
zQxz?5OV!#LV{pr+ak7bMc`xKUR8{S;o_&A%^h_vS_(3CO-|BA=i~|<
z_w(mqK|V9+hrbhyc?ux3rUgGyx9VI)0#;=^9+wutM|j7l2!c?SLbi+ZE6yT(h`ff&aQ?WJ6h;ypI41pN}GX!P`%n+DyZOqWG
zff+{cn34XL@yZ#)yXOo;aE9Ov!5KNu&;@6_A!mr6GX!S{&Jdg-I74v8^*O`124`6P
zbB39Jycs$I4lM(YVdpa%$-#-?lV&JRpWG3jJ&4Wl$TeSL3UM4OG1{`TCX_9w$vT>^
zu0qbv{@eLdIrWSjU*IrapxWB2@%jiw`7l0z6~d>gb^UXVN?OEwND{GpEjoAV@sx!*
z2?D-R_?8Y!g>wImwB(_X6&gm8b4k+S+p%ny7FzMFmOZQ0KD$qIg$Z3VSj~G2R^xpJ
zE0?n_k#iJ#Kk~CM^LLoOG387`=NJWi%a*loCbNNL5@zd7F(i+kbkzg!qiYiMm
zm=hNT-dO(j@1GxR`P1a#ql>^Z|M4jBCH_PZ#&=_R$=K?Dc`(>)HVWQ_fDx)-6e%Gb
z{0A@w11>a>Cl8WztiS~f#ADegV2q+!#Gn)aACBc2nE&)Bi2_$nr82oW!Gua{fXDLV
zBDv&2re=`Sb3*qDcyLA`_*t)7<(K2MQN4n+sMcSR5OEQGMQA!g#vH#
z_?!8dPAMCIIOwXHXsr?E1L~omY=`>>r;gASTi12&49l?{b!3jTy-%8(o88tNzO6ZW
zTXQI!t_WomAXYXxBLqVbq*fV?V0$~^9ALi1PA#j$Di(+_a%TPufWu1`kPg2_Thqw(
zC~{8+Zcd^`%Lch@IoXoT2sp?#L-*_5;pYw%Tuhe~~^&Uy~vE(1P#=1i54RtD5&!adli3PlF|At;k_V
zsZW}JQCxjFcho7L&5F7ypC|}C3{(A%;#g0b&OcwYk)mn*##C&pokbrqAfMUiF5-@A
zr5$4EPv?XN<@Vz`4oeIqIjZcmeSR7#>d|kTg_dgcE)%Ggy$Bf
zDq7x_sa{hC;RAdD|Ck#FIfJL+7t6lCg}oTQD;$rU>8
z=g+}{d}h!OeVzgBLqNtmg$%CI|6|K2i19*ue*IPSBYpQH
zx~Z79rpIYjkcJ=)K^lTI1ZfD;xHf6%*B}j}chX3|&Ui%)^X^f@5U3$gL!d?-YUlzr
z-T*bkml^^!1ZoJ>5U3$guu2D&gcn?V;maj$Ujz6BVFegF4Hwxdr4qboeW0
zw9-N=p4GBvwc2O*X|6D#YX+-zZ^3H3&tT#i8z`rMF0TEzd{-S

diff --git a/docs/01_Concept_of_the_extractor.md b/docs/01_Concept_of_the_extractor.md
index eb3b7b1..4862404 100644
--- a/docs/01_Concept_of_the_extractor.md
+++ b/docs/01_Concept_of_the_extractor.md
@@ -55,25 +55,68 @@ class MyExtractor extends FutureExtractor {

 ## Collector/Extractor pattern for lists

-Sometimes information can not be represented as a structure, but as a list. In NewPipe an item of a list is called
-[InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html). In order
-to get such items a [InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html)
-is used. For each item that should be extracted a new Extractor will be given to the InfoItemCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
+Sometimes information can be represented as a list. In NewPipe, a list is represented by an
+[InfoItemsCollector](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html).
+An InfoItemsCollector collects and assembles a list of [InfoItem](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html)s.
+For each item that should be extracted, a new Extractor must be created and passed to the InfoItemsCollector via [commit()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-).
 ![InfoItemsCollector_objectdiagram.svg](img/InfoItemsCollector_objectdiagram.svg)

-When a streaming site shows a list it usually offers some additional information about that list, like it's title, a thumbnail
+If you are implementing a list for your service, you need to extend InfoItem so that it holds the extracted information,
+and implement an [InfoItemExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html)
+that returns the data of one InfoItem.
+
+A common implementation would look like this:
+```java
+// Element is a jsoup element; getServiceId() comes from the enclosing extractor.
+private MyInfoItemCollector collectInfoItemsFromElement(Element element) {
+    MyInfoItemCollector collector = new MyInfoItemCollector(getServiceId());
+
+    // One InfoItemExtractor per child element, each committed to the collector.
+    for (final Element li : element.children()) {
+        collector.commit(new InfoItemExtractor() {
+            @Override
+            public String getName() throws ParsingException {
+                // read the item's name from li
+                ...
+            }
+
+            @Override
+            public String getUrl() throws ParsingException {
+                // read the item's URL from li
+                ...
+            }
+
+            ...
+        });
+    }
+    return collector;
+}
+
+```
+
+## InfoItems encapsulated in pages
+
+When a streaming site shows a list of items, it usually offers some additional information about that list, like its title, a thumbnail,
+or its creator. Such information can be called the __list header__.
-Also if you open a list in a web browser the website usually does not load the whole list, but only a part
-of it. In order to get more you may have to click on a next page button, or scroll down. This is why a list in
-NewPipe is coped down into [InfoItemPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemPage.html)s. Each Page has its own URL, and needs to be extracted separately.
-
-List header information and extracting multiple pages of an InfoItem list can be handled by a
-[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html)
-
-
+When a website shows a long list of items, it usually does not load the whole list, only a part of it.
+To get more items, you may have to click a "next page" button or scroll down.
+This is why lists in NewPipe are chopped into smaller lists called [InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html)s. Each page has its own URL and needs to be extracted separately.
+
+Additional metadata about the list, such as its title, a thumbnail,
+or its creator, as well as the extraction of multiple pages, can be handled by a
+[ListExtractor](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html)
+and its [ListExtractor.InfoItemsPage](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemsPage.html).
+
+For extracting list header information, a ListExtractor behaves like a regular extractor. For handling `InfoItemsPage`s, it adds methods
+such as:
+
+ - [getInitialPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getInitialPage--)
+   returns the first page of InfoItems.
+ - [getNextPageUrl()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getNextPageUrl--)
+   returns the URL pointing to the second page of InfoItems, if one is available.
+ - [getPage()](https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html#getPage-java.lang.String-)
+   returns a ListExtractor.InfoItemsPage by its URL, which was retrieved from the `getNextPageUrl()` method of the previous page.
+
+
+The first page is handled specially because many websites, such as YouTube, load the first page of
+items like a regular webpage, but all following pages as AJAX requests.
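To make the `commit()` flow described in the documentation change above concrete, here is a self-contained, runnable sketch. It is not NewPipe code: `DemoInfoItem`, `DemoInfoItemExtractor`, `DemoInfoItemsCollector`, `collect()` and the `name|url` input format are hypothetical stand-ins, and recording a failed entry as an error instead of aborting the whole list is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for InfoItem / InfoItemExtractor / InfoItemsCollector.
// They only illustrate the commit() pattern and are NOT NewPipe's actual classes.
public class CollectorSketch {

    /** Plain data object: the "InfoItem" produced for one list entry. */
    static final class DemoInfoItem {
        final String name;
        final String url;

        DemoInfoItem(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    /** One extractor per list entry; it knows how to read that entry's fields. */
    interface DemoInfoItemExtractor {
        String getName() throws Exception;
        String getUrl() throws Exception;
    }

    /** Turns each committed extractor into one DemoInfoItem. */
    static final class DemoInfoItemsCollector {
        private final List<DemoInfoItem> items = new ArrayList<>();
        private final List<Exception> errors = new ArrayList<>();

        void commit(DemoInfoItemExtractor extractor) {
            try {
                items.add(new DemoInfoItem(extractor.getName(), extractor.getUrl()));
            } catch (Exception e) {
                // Assumption: a broken entry is recorded instead of aborting the whole list.
                errors.add(e);
            }
        }

        List<DemoInfoItem> getItems() { return Collections.unmodifiableList(items); }
        List<Exception> getErrors() { return Collections.unmodifiableList(errors); }
    }

    /** Stand-in for parsing a page: each raw "name|url" string gets its own extractor. */
    static DemoInfoItemsCollector collect(List<String> rawEntries) {
        DemoInfoItemsCollector collector = new DemoInfoItemsCollector();
        for (final String raw : rawEntries) {
            collector.commit(new DemoInfoItemExtractor() {
                @Override
                public String getName() throws Exception {
                    return field(raw, 0);
                }

                @Override
                public String getUrl() throws Exception {
                    return field(raw, 1);
                }
            });
        }
        return collector;
    }

    /** Splits "name|url" and returns one part, failing on malformed input. */
    private static String field(String raw, int index) throws Exception {
        String[] parts = raw.split("\\|");
        if (parts.length != 2) {
            throw new Exception("malformed entry: " + raw);
        }
        return parts[index];
    }

    public static void main(String[] args) {
        DemoInfoItemsCollector c = collect(Arrays.asList(
                "First|https://example.com/1", "broken", "Second|https://example.com/2"));
        System.out.println(c.getItems().size() + " items, " + c.getErrors().size() + " error(s)");
    }
}
```

Running `main` prints `2 items, 1 error(s)`: the malformed `broken` entry ends up in the error list while the other two entries still become items.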
diff --git a/docs/img/InfoItemsCollector_objectdiagram.svg b/docs/img/InfoItemsCollector_objectdiagram.svg
index 2f986a9..d661de9 100644
--- a/docs/img/InfoItemsCollector_objectdiagram.svg
+++ b/docs/img/InfoItemsCollector_objectdiagram.svg
@@ -1,39 +1,39 @@
(SVG markup not recoverable. Object diagram `:InfoItemsCollector`: the labels `itemExtractor1:Extractor`, `itemExtractor2:Extractor`, `itemExtractor3:Extractor` become `itemExtractor1:InfoItemExtractor`, `itemExtractor2:InfoItemExtractor`, `itemExtractor3:InfoItemExtractor`.)
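The three paging methods described in the documentation change (`getInitialPage()`, `getNextPageUrl()` and `getPage()`) suggest a simple client loop: take the first page, then follow next-page URLs until none is left. The sketch below is a hypothetical, in-memory mirror of that contract rather than NewPipe's real classes. `DemoPage`, `DemoListExtractor`, `fetchAll()`, the fake three-page service, and the convention that a `null` or empty URL marks the last page are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified mirror of the paging contract described above
// (getInitialPage / getNextPageUrl / getPage). These are NOT NewPipe's classes.
public class PagingSketch {

    /** One page of items plus the URL of the following page (assumed: null or "" = none). */
    static final class DemoPage {
        final List<String> items;
        final String nextPageUrl;

        DemoPage(List<String> items, String nextPageUrl) {
            this.items = items;
            this.nextPageUrl = nextPageUrl;
        }
    }

    interface DemoListExtractor {
        DemoPage getInitialPage() throws Exception;    // first page, e.g. parsed from the regular webpage
        String getNextPageUrl() throws Exception;      // URL of the second page, if there is one
        DemoPage getPage(String url) throws Exception; // any later page, e.g. loaded via AJAX
    }

    private static boolean hasUrl(String url) {
        return url != null && !url.isEmpty();
    }

    /** Collects every item: the initial page first, then each following page by its URL. */
    static List<String> fetchAll(DemoListExtractor extractor) throws Exception {
        List<String> all = new ArrayList<>(extractor.getInitialPage().items);
        String url = extractor.getNextPageUrl();
        while (hasUrl(url)) {
            DemoPage page = extractor.getPage(url);
            all.addAll(page.items);
            url = page.nextPageUrl; // URL of the page after this one
        }
        return all;
    }

    /** In-memory fake service with three pages, for demonstration only. */
    static DemoListExtractor fakeExtractor() {
        final Map<String, DemoPage> laterPages = new HashMap<>();
        laterPages.put("page2", new DemoPage(Arrays.asList("c", "d"), "page3"));
        laterPages.put("page3", new DemoPage(Arrays.asList("e"), ""));

        return new DemoListExtractor() {
            @Override
            public DemoPage getInitialPage() {
                return new DemoPage(Arrays.asList("a", "b"), "page2");
            }

            @Override
            public String getNextPageUrl() {
                return "page2";
            }

            @Override
            public DemoPage getPage(String url) {
                DemoPage page = laterPages.get(url);
                if (page == null) {
                    throw new IllegalArgumentException("unknown page URL: " + url);
                }
                return page;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchAll(fakeExtractor())); // prints [a, b, c, d, e]
    }
}
```

Keeping `getInitialPage()` separate from `getPage(url)` reflects the reason the documentation gives: the first page often comes from the regular webpage, while later pages arrive as AJAX requests.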