-Basic Concept of the Extractor
+Concept of the Extractor
Collector/Extractor pattern
Before we can start coding our own service, we need to understand the basic concept of the extractor. There is a pattern you will find all over the code, called the extractor/collector pattern. The idea behind this pattern is that
@@ -121,13 +123,52 @@ You need to take care of the extractors.
Info info;
try {
    // Create a new Extractor with a given context provided as parameter.
-    Extractor extractor = new Extractor(ome_meta_info);
+    Extractor extractor = new Extractor(some_meta_info);
    // Retrieve the data from the extractor and build the info package.
    info = Info.getInfo(extractor);
} catch (Exception e) {
    // Handle errors when the collector decided to abort the extraction.
}
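The snippet above only shows the calling side. Below is a minimal sketch of what a collector such as `Info.getInfo()` might do internally. Every class in it (the `Extractor` interface, `Info`, `ExtractionException`) is a hypothetical stand-in, not the real NewPipe API. The split into a required field, whose failure aborts the extraction, and an optional field, whose failure is only recorded, is also an assumption made for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {

    /** Hypothetical checked exception signalling that a field could not be extracted. */
    static class ExtractionException extends Exception {
        ExtractionException(String message) {
            super(message);
        }
    }

    /** Hypothetical extractor: each data field has its own getter. */
    interface Extractor {
        String getName() throws ExtractionException;        // assumed required field
        String getDescription() throws ExtractionException; // assumed optional field
    }

    /** Hypothetical info package, filled by the collector logic in getInfo(). */
    static class Info {
        String name;
        String description;
        final List<Throwable> errors = new ArrayList<>();

        static Info getInfo(Extractor extractor) throws ExtractionException {
            Info info = new Info();
            // Required field: an exception here propagates and aborts the whole extraction.
            info.name = extractor.getName();
            // Optional field: a failure is recorded and the extraction continues.
            try {
                info.description = extractor.getDescription();
            } catch (ExtractionException e) {
                info.errors.add(e);
            }
            return info;
        }
    }

    public static void main(String[] args) throws ExtractionException {
        Extractor extractor = new Extractor() {
            public String getName() {
                return "Some video";
            }

            public String getDescription() throws ExtractionException {
                throw new ExtractionException("description not found");
            }
        };
        Info info = Info.getInfo(extractor);
        System.out.println(info.name + ", errors: " + info.errors.size());
    }
}
```

Under this assumption, the `catch` block in the calling code only fires when a required field such as the name cannot be extracted; problems with optional fields end up in the info package's error list instead.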
+
+Typical implementation of a single data extractor
+A typical implementation of a single data extractor, on the other hand, would look like this:
+class MyExtractor extends FutureExtractor {
+
+    public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {
+        super(requiredInfo, forExtraction);
+
+        ...
+    }
+
+    @Override
+    public void fetch() {
+        // Actually fetch the page data here
+    }
+
+    @Override
+    public String someDataField()
+            throws ExtractionException { // Thrown if something failed
+        // Get a piece of information and return it
+    }
+
+    ... // More data fields
+}
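The class above is a skeleton. Below is a runnable sketch of the same shape. It assumes a hypothetical `FutureExtractor` base with an abstract `fetch()`, a constructor that takes a plain URL instead of `RequiredInfo`/`ForExtraction`, and a hardcoded HTML string in place of a real download. It illustrates the two rules the skeleton implies: the data is fetched once in `fetch()`, and a getter throws `ExtractionException` instead of returning garbage when something is missing:

```java
public class ExtractorSketch {

    /** Hypothetical checked exception signalling that a data field could not be extracted. */
    static class ExtractionException extends Exception {
        ExtractionException(String message) {
            super(message);
        }
    }

    /** Hypothetical base class: subclasses download their data in fetch(). */
    abstract static class FutureExtractor {
        public abstract void fetch();
    }

    static class MyExtractor extends FutureExtractor {
        private final String url;
        private String page; // stays null until fetch() has run

        MyExtractor(String url) {
            this.url = url;
        }

        @Override
        public void fetch() {
            // Assumption: a real extractor would download `url` here.
            page = "<html><title>Example</title></html>";
        }

        /** One data field: reads the page title, failing loudly if it cannot. */
        public String getTitle() throws ExtractionException {
            if (page == null) {
                throw new ExtractionException("fetch() has not been called for " + url);
            }
            int start = page.indexOf("<title>");
            int end = page.indexOf("</title>");
            if (start < 0 || end < start) {
                throw new ExtractionException("no title found on " + url);
            }
            return page.substring(start + "<title>".length(), end);
        }
    }

    public static void main(String[] args) throws ExtractionException {
        MyExtractor extractor = new MyExtractor("https://example.com/video");
        extractor.fetch();
        System.out.println(extractor.getTitle()); // prints "Example"
    }
}
```

Keeping the download in `fetch()` and the parsing in the getters means the network is hit once, while each data field can still fail independently.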
+
+
+Collector/Extractor pattern for lists
+Sometimes information cannot be represented as a structure, only as a list. In NewPipe, an item of a list is called an InfoItem. To get such items, an InfoItemsCollector is used. For each item that should be extracted, a new Extractor is given to the InfoItemsCollector via commit().
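A minimal sketch of the `commit()` flow follows. It assumes hypothetical `InfoItemExtractor`, `InfoItem` and `InfoItemsCollector` classes, which are not the actual NewPipe signatures. One extractor is committed per list entry, and an entry that fails to extract is recorded as an error and skipped; that error policy is an assumption, not confirmed behavior:

```java
import java.util.ArrayList;
import java.util.List;

public class ItemsCollectorSketch {

    /** Hypothetical checked exception for a field that could not be extracted. */
    static class ExtractionException extends Exception {
        ExtractionException(String message) {
            super(message);
        }
    }

    /** Hypothetical per-item extractor: reads the fields of one list entry. */
    interface InfoItemExtractor {
        String getName() throws ExtractionException;
        String getUrl() throws ExtractionException;
    }

    /** Hypothetical list entry produced by the collector. */
    static class InfoItem {
        final String name;
        final String url;

        InfoItem(String name, String url) {
            this.name = name;
            this.url = url;
        }
    }

    /** Hypothetical collector: one commit() call per list entry. */
    static class InfoItemsCollector {
        private final List<InfoItem> items = new ArrayList<>();
        private final List<Throwable> errors = new ArrayList<>();

        void commit(InfoItemExtractor extractor) {
            try {
                items.add(new InfoItem(extractor.getName(), extractor.getUrl()));
            } catch (ExtractionException e) {
                // A broken entry is skipped; the rest of the list is still collected.
                errors.add(e);
            }
        }

        List<InfoItem> getItems() {
            return items;
        }

        List<Throwable> getErrors() {
            return errors;
        }
    }

    /** Builds an item extractor over fixed values; a null name simulates a parse failure. */
    static InfoItemExtractor entry(String name, String url) {
        return new InfoItemExtractor() {
            public String getName() throws ExtractionException {
                if (name == null) {
                    throw new ExtractionException("no name for " + url);
                }
                return name;
            }

            public String getUrl() {
                return url;
            }
        };
    }

    public static void main(String[] args) {
        InfoItemsCollector collector = new InfoItemsCollector();
        collector.commit(entry("First video", "https://example.com/1"));
        collector.commit(entry(null, "https://example.com/2"));
        collector.commit(entry("Third video", "https://example.com/3"));
        System.out.println(collector.getItems().size() + " items, "
                + collector.getErrors().size() + " error(s)");
    }
}
```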
+When a streaming site shows a list, it usually offers some additional information about that list, such as its title, a thumbnail, or its creator. Such information can be called the list header.
+Also, if you open a list in a web browser, the website usually does not load the whole list, only a part of it. To get more, you may have to click a "next page" button or scroll down. This is why a list in NewPipe is split into InfoItemPages. Each page has its own URL and needs to be extracted separately.
+List header information and extracting multiple pages of an InfoItem list can be handled by a ListExtractor
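The following sketch shows how header information and paging can live in one class. It uses a hypothetical `ListExtractor` with a header getter, `getInitialPage()` and `getPage(url)`. The `InfoItemsPage` class, the convention that the next-page URL is `null` on the last page, and the in-memory fake playlist are all assumptions standing in for the real API and real network requests:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ListExtractorSketch {

    /** Hypothetical page of a list: its items plus the URL of the next page (null on the last page). */
    static class InfoItemsPage {
        final List<String> items;
        final String nextPageUrl;

        InfoItemsPage(List<String> items, String nextPageUrl) {
            this.items = items;
            this.nextPageUrl = nextPageUrl;
        }

        boolean hasNextPage() {
            return nextPageUrl != null;
        }
    }

    /** Hypothetical list extractor: list header fields plus page-wise access to the items. */
    abstract static class ListExtractor {
        abstract String getName();                  // list header, e.g. a playlist title
        abstract InfoItemsPage getInitialPage();    // first page, loaded together with the list
        abstract InfoItemsPage getPage(String url); // any further page, fetched by its own URL
    }

    /** Fake playlist whose pages come from an in-memory map instead of the network. */
    static class FakePlaylistExtractor extends ListExtractor {
        private final Map<String, InfoItemsPage> pages = Map.of(
                "page2", new InfoItemsPage(List.of("c", "d"), "page3"),
                "page3", new InfoItemsPage(List.of("e"), null));

        String getName() {
            return "My playlist";
        }

        InfoItemsPage getInitialPage() {
            return new InfoItemsPage(List.of("a", "b"), "page2");
        }

        InfoItemsPage getPage(String url) {
            return pages.get(url);
        }
    }

    /** Follows next-page URLs until the last page and gathers every item. */
    static List<String> collectAll(ListExtractor extractor) {
        List<String> all = new ArrayList<>();
        InfoItemsPage page = extractor.getInitialPage();
        all.addAll(page.items);
        while (page.hasNextPage()) {
            page = extractor.getPage(page.nextPageUrl);
            all.addAll(page.items);
        }
        return all;
    }

    public static void main(String[] args) {
        ListExtractor extractor = new FakePlaylistExtractor();
        System.out.println(extractor.getName() + ": " + collectAll(extractor));
    }
}
```

Because each page carries the URL of the next one, a caller can also load pages lazily, for example only when the user scrolls down, instead of walking the whole list up front as `collectAll()` does here.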