-

Basic Concept of the Extractor

+

Concept of the Extractor

Collector/Extractor pattern

Before we can start coding our own service, we need to understand the basic concept of the extractor. There is a pattern you will find all over the code. It is called the extractor/collector pattern. The idea behind this pattern is that

@@ -121,13 +123,52 @@ You need to take care of the extractors.

Info info;
 try {
     // Create a new Extractor with a given context provided as parameter.
-    Extractor extractor = new Extractor(ome_meta_info);
+    Extractor extractor = new Extractor(some_meta_info);
     // Retrieves the data from the extractor and builds the info package.
     info = Info.getInfo(extractor);
 } catch(Exception e) {
     // Handle errors raised when the collector decides to abort the extraction.
 }
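
The collector side of the snippet above can be sketched as follows. This is a minimal illustration, not the actual NewPipe code: `SketchExtractor`, `SketchInfo`, and the field names are hypothetical stand-ins, and the split into a mandatory field (failure aborts) and an optional field (failure is only recorded) is an assumption about how a collector might decide to break off an extraction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an extractor: one method per data field.
interface SketchExtractor {
    String getTitle() throws Exception;       // treated as mandatory in this sketch
    String getDescription() throws Exception; // treated as optional in this sketch
}

// Hypothetical stand-in for the info package handed to the front end.
final class SketchInfo {
    final String title;
    final String description;     // null if the optional field could not be extracted
    final List<Throwable> errors; // problems that did not abort the extraction

    private SketchInfo(String title, String description, List<Throwable> errors) {
        this.title = title;
        this.description = description;
        this.errors = errors;
    }

    // The collector: asks the extractor for each field and decides what a failure means.
    static SketchInfo getInfo(SketchExtractor extractor) throws Exception {
        // Mandatory field: an exception propagates to the caller and aborts the extraction.
        String title = extractor.getTitle();

        // Optional field (assumption): the failure is recorded and extraction continues.
        String description = null;
        List<Throwable> errors = new ArrayList<>();
        try {
            description = extractor.getDescription();
        } catch (Exception e) {
            errors.add(e);
        }
        return new SketchInfo(title, description, errors);
    }
}
```

With this split, the front end only needs the single `try`/`catch` shown above: it either receives a complete info object, possibly with recorded errors, or an exception from a mandatory field.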
 
+ +

Typical implementation of a single data extractor

+

A typical implementation of a single data extractor, on the other hand, would look like this:

+
class MyExtractor extends FutureExtractor {
+
+    public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {
+        super(requiredInfo, forExtraction);
+
+        ...
+    }
+
+    @Override
+    public void fetch() {
+        // Actually fetch the page data here
+    }
+
+    @Override
+    public String someDataField()
+        throws ExtractionException {    // The exception needs to be thrown if something failed
+        // get piece of information and return it
+    }
+
+    ...                                 // More datafields
+}
+
+ +

Collector/Extractor pattern for lists

+

Sometimes information cannot be represented as a structure, but only as a list. In NewPipe, an item of a list is called an InfoItem. To get such items, an InfoItemsCollector is used. For each item that should be extracted, a new Extractor is given to the InfoItemsCollector via commit().

+

InfoItemsCollector_objectdiagram.svg
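
The commit-per-item idea can be sketched as follows. All names here (`SketchItemExtractor`, `SketchItem`, `SketchItemsCollector`) are hypothetical stand-ins rather than the real NewPipe classes, and skipping a broken entry instead of failing the whole list is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-item extractor: wraps the raw data of ONE list entry.
interface SketchItemExtractor {
    String getName() throws Exception;
    String getUrl() throws Exception;
}

// Hypothetical counterpart of an InfoItem: a plain, already extracted list entry.
final class SketchItem {
    final String name;
    final String url;

    SketchItem(String name, String url) {
        this.name = name;
        this.url = url;
    }
}

// Hypothetical counterpart of an InfoItemsCollector.
final class SketchItemsCollector {
    private final List<SketchItem> items = new ArrayList<>();
    private final List<Throwable> errors = new ArrayList<>();

    // Receives one extractor per list entry and turns it into an item.
    void commit(SketchItemExtractor extractor) {
        try {
            items.add(new SketchItem(extractor.getName(), extractor.getUrl()));
        } catch (Exception e) {
            // Assumption: a broken entry is skipped and reported, not fatal for the list.
            errors.add(e);
        }
    }

    List<SketchItem> getItems() {
        return items;
    }

    List<Throwable> getErrors() {
        return errors;
    }
}
```

A service would loop over the entries it found on the page, wrap each one in its own item extractor, and call `commit()` once per entry. The collector then owns both the resulting items and the list of entries that failed.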

+

When a streaming site shows a list, it usually offers some additional information about that list, such as its title, a thumbnail, or its creator. Such information can be called the list header.

+

Also, if you open a list in a web browser, the website usually does not load the whole list, but only a part of it. To get more, you may have to click a next page button or scroll down. This is why a list in NewPipe is chopped down into InfoItemPages. Each page has its own URL and needs to be extracted separately.

+

List header information and extracting multiple pages of an InfoItem list can be handled by a ListExtractor.
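
To make the header-plus-pages idea concrete, here is a minimal sketch. `SketchPage` and `SketchListExtractor` are hypothetical names, not the real NewPipe API. The in-memory `Map` stands in for the remote website, and representing "no further page" as a `null` next-page URL is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical page: the items of one chunk plus the URL of the next chunk.
final class SketchPage {
    final List<String> items;
    final String nextPageUrl; // null on the last page (assumption of this sketch)

    SketchPage(List<String> items, String nextPageUrl) {
        this.items = items;
        this.nextPageUrl = nextPageUrl;
    }
}

// Hypothetical list extractor: exposes the list header and page-by-page access.
final class SketchListExtractor {
    private final String title;
    private final String firstPageUrl;
    private final Map<String, SketchPage> site; // stands in for the remote website

    SketchListExtractor(String title, String firstPageUrl, Map<String, SketchPage> site) {
        this.title = title;
        this.firstPageUrl = firstPageUrl;
        this.site = site;
    }

    // List header information.
    String getTitle() {
        return title;
    }

    SketchPage getInitialPage() {
        return getPage(firstPageUrl);
    }

    // Each page has its own URL and is extracted separately.
    SketchPage getPage(String pageUrl) {
        SketchPage page = site.get(pageUrl);
        if (page == null) {
            throw new IllegalArgumentException("Unknown page: " + pageUrl);
        }
        return page;
    }

    // Convenience helper: follows the next-page URLs until the list is exhausted.
    List<String> getAllItems() {
        List<String> all = new ArrayList<>();
        SketchPage page = getInitialPage();
        while (true) {
            all.addAll(page.items);
            if (page.nextPageUrl == null) {
                break;
            }
            page = getPage(page.nextPageUrl);
        }
        return all;
    }
}
```

A front end would normally call `getInitialPage()` first and request further pages only when the user scrolls or asks for more. `getAllItems()` is included only to show how the pages chain together.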