<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  
  
  <link rel="shortcut icon" href="../img/favicon.ico">
  <title>Concept of the Extractor - NewPipe Tutorial</title>
  <link rel="stylesheet" href="../css/local_fonts.css" type="text/css" />

  <link rel="stylesheet" href="../css/theme.css" type="text/css" />
  <link rel="stylesheet" href="../css/theme_extra.css" type="text/css" />
  <link rel="stylesheet" href="../css/highlight.css" type="text/css" />
  
  <script>
    // Current page data
    var mkdocs_page_name = "Concept of the Extractor";
    var mkdocs_page_input_path = "01_Concept_of_the_extractor.md";
    var mkdocs_page_url = "/01_Concept_of_the_extractor/";
  </script>
  
  <script src="../js/jquery-2.1.1.min.js"></script>
  <script src="../js/modernizr-2.8.3.min.js"></script>
  <script type="text/javascript" src="../js/highlight.pack.js"></script> 
  
</head>

<body class="wy-body-for-nav" role="document">

  <div class="wy-grid-for-nav">

    
    <nav data-toggle="wy-nav-shift" class="wy-nav-side stickynav">
      <div class="wy-side-nav-search">
        <a href=".." class="icon icon-home"> NewPipe Tutorial</a>
        <div role="search">
  <form id ="rtd-search-form" class="wy-form" action="../search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" />
  </form>
</div>
      </div>

      <div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
	<ul class="current">
	  
          
            <li class="toctree-l1">
		
    <a class="" href="..">Welcome to NewPipe Tutorial</a>
	    </li>
          
            <li class="toctree-l1">
		
    <a class="" href="../00_Prepare_everything/">Prepare everything</a>
	    </li>
          
            <li class="toctree-l1 current">
		
    <a class="current" href="./">Concept of the Extractor</a>
    <ul class="subnav">
            
    <li class="toctree-l2"><a href="#concept-of-the-extractor">Concept of the Extractor</a></li>
    
        <ul>
        
            <li><a class="toctree-l3" href="#collectorextractor-pattern">Collector/Extractor pattern</a></li>
        
            <li><a class="toctree-l3" href="#collectorextractor-pattern-for-lists">Collector/Extractor pattern for lists</a></li>
        
        </ul>
    

    </ul>
	    </li>
          
        </ul>
      </div>
      &nbsp;
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">

      
      <nav class="wy-nav-top" role="navigation" aria-label="top navigation">
        <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
        <a href="..">NewPipe Tutorial</a>
      </nav>

      
      <div class="wy-nav-content">
        <div class="rst-content">
          <div role="navigation" aria-label="breadcrumbs navigation">
  <ul class="wy-breadcrumbs">
    <li><a href="..">Docs</a> &raquo;</li>
    
      
    
    <li>Concept of the Extractor</li>
    <li class="wy-breadcrumbs-aside">
      
    </li>
  </ul>
  <hr/>
</div>
          <div role="main">
            <div class="section">
              
                <h1 id="concept-of-the-extractor">Concept of the Extractor</h1>
<h2 id="collectorextractor-pattern">Collector/Extractor pattern</h2>
<p>Before we can start coding our own service, we need to understand the basic concept of the extractor. There is a pattern
you will find all over the code: the <strong>extractor/collector</strong> pattern. The idea behind it is that
the <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/Extractor.html">extractor</a>
produces single pieces of data, and the collector takes them and turns them into usable data for the front end.
The collector also controls the parsing process and takes care of error handling: if the extractor fails at any
point, the collector decides whether to continue parsing or not. This requires the extractor to be made up of
many small methods, one for every data field the collector wants. The collectors are provided by NewPipe;
you need to take care of the extractors.</p>
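<p>To make this division of labour more concrete, here is a minimal, self-contained sketch of the pattern. It is not NewPipe code: <code>VideoExtractor</code>, <code>VideoInfo</code>, <code>VideoInfoCollector</code> and the local <code>ExtractionException</code> are hypothetical names made up for this example. It only assumes what is described above: the extractor offers one small method per data field, and the collector decides which failures are fatal. Here a missing title aborts the extraction, while a missing view count is merely recorded.</p>
<pre><code class="java">// Hypothetical stand-in for NewPipe's own ExtractionException.
class ExtractionException extends Exception {
    ExtractionException(String message) {
        super(message);
    }
}

// Extractor side (hypothetical): one small method per data field.
interface VideoExtractor {
    String getTitle() throws ExtractionException;
    long getViewCount() throws ExtractionException;
}

// The usable data package handed to the front end (hypothetical).
class VideoInfo {
    String title;
    long viewCount = -1;    // -1 means "not available"
    int failedFields = 0;   // number of optional fields that could not be extracted
}

// Collector side (hypothetical): drives the extraction and decides what to do on errors.
class VideoInfoCollector {
    static VideoInfo collect(VideoExtractor extractor) throws ExtractionException {
        VideoInfo info = new VideoInfo();
        // The title is required: if it fails, the exception aborts the whole extraction.
        info.title = extractor.getTitle();
        // The view count is optional: record the failure and keep going.
        try {
            info.viewCount = extractor.getViewCount();
        } catch (ExtractionException e) {
            info.failedFields++;
        }
        return info;
    }
}
</code></pre>
<p>If <code>getViewCount()</code> throws, <code>collect()</code> still returns an info object with the title set and <code>failedFields</code> equal to 1; if <code>getTitle()</code> throws, the exception reaches the caller, which is the situation the <code>catch</code> block of a front-end call has to handle.</p>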
<h3 id="usage-in-the-front-end">Usage in the front end</h3>
<p>A typical call for retrieving data from a website would look like this:</p>
<pre><code class="java">Info info;
try {
    // Create a new extractor, passing it the context it needs as a parameter.
    // "Extractor", "Info" and some_meta_info are placeholders for a concrete
    // extractor class, its matching info class and their input.
    Extractor extractor = new Extractor(some_meta_info);
    // Retrieve the data from the extractor and build the info package.
    info = Info.getInfo(extractor);
} catch (Exception e) {
    // Handle the error when the collector decided to abort the extraction.
}
</code></pre>

<h3 id="typical-implementation-of-a-single-data-extractor">Typical implementation of a single data extractor</h3>
<p>The typical implementation of a single data extractor, on the other hand, would look like this:</p>
<pre><code class="java">class MyExtractor extends FutureExtractor {

    public MyExtractor(RequiredInfo requiredInfo, ForExtraction forExtraction) {
        super(requiredInfo, forExtraction);

        ...
    }

    @Override
    public void fetch() {
        // Actually fetch the page data here
    }

    @Override
    public String someDataField()
            throws ExtractionException {    // Throw this exception if something failed
        // Get a piece of information and return it
    }

    ...                                     // More data fields
}
</code></pre>
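<p>The following sketch shows the same life cycle in a form you can compile and run without NewPipe: the constructor only stores its input, <code>fetch()</code> downloads the page once and caches it, and every data-field method reads from that cache and throws an <code>ExtractionException</code> when its piece of data is missing or malformed. The base class <code>SketchExtractor</code>, the <code>PageSource</code> downloader and the <code>key=value</code> page format are hypothetical simplifications; a real service extends NewPipe's own extractor classes and parses real web pages or API responses.</p>
<pre><code class="java">// Hypothetical stand-in for NewPipe's own ExtractionException.
class ExtractionException extends Exception {
    ExtractionException(String message) {
        super(message);
    }
}

// Hypothetical "downloader": returns the raw page for a URL.
interface PageSource {
    String download(String url);
}

// Hypothetical base class: holds the URL and forces a separate fetch() step.
abstract class SketchExtractor {
    protected final String url;

    SketchExtractor(String url) {
        this.url = url;
    }

    public abstract void fetch();
}

class MyExtractor extends SketchExtractor {
    private final PageSource source;
    private String page;    // cached page data, set by fetch()

    MyExtractor(String url, PageSource source) {
        super(url);
        this.source = source;
    }

    @Override
    public void fetch() {
        // Download once; every data-field method reads from this cache.
        page = source.download(url);
    }

    public String getTitle() throws ExtractionException {
        return field("title");
    }

    public long getViewCount() throws ExtractionException {
        String raw = field("views");
        try {
            return Long.parseLong(raw);
        } catch (NumberFormatException e) {
            throw new ExtractionException("not a number: " + raw);
        }
    }

    // Looks up a "key=value" line in the cached page (made-up format for this sketch).
    private String field(String key) throws ExtractionException {
        if (page == null) {
            throw new ExtractionException("fetch() must be called first");
        }
        for (String line : page.split("\n")) {
            if (line.startsWith(key + "=")) {
                return line.substring(key.length() + 1);
            }
        }
        throw new ExtractionException("field not found: " + key);
    }
}
</code></pre>
<p>Because parsing only happens inside the field methods, one malformed field (for example <code>views=lots</code>) makes only <code>getViewCount()</code> fail, and the collector can still use the title.</p>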

<h2 id="collectorextractor-pattern-for-lists">Collector/Extractor pattern for lists</h2>
<p>Sometimes information cannot be represented as a single structure, but only as a list. In NewPipe, an item of a list is called an
<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItem.html">InfoItem</a>. To
get such items, an <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html">InfoItemsCollector</a>
is used. For each item that should be extracted, a new Extractor is handed to the InfoItemsCollector via <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/InfoItemsCollector.html#commit-E-">commit()</a>.</p>
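<p>A minimal sketch of how such a collector might work, assuming, as described for the collector above, that the collector and not the extractor decides what a failure means: every <code>commit()</code> receives the extractor for one list entry and turns it into an item, and a broken entry is counted and skipped instead of aborting the whole list. <code>StreamItemExtractor</code>, <code>StreamItem</code>, <code>StreamItemsCollector</code> and the local <code>ExtractionException</code> are hypothetical names, not the real NewPipe classes.</p>
<pre><code class="java">// Hypothetical stand-in for NewPipe's own ExtractionException.
class ExtractionException extends Exception {
    ExtractionException(String message) {
        super(message);
    }
}

// Hypothetical extractor for ONE entry of a list.
interface StreamItemExtractor {
    String getName() throws ExtractionException;
    String getUrl() throws ExtractionException;
}

// One InfoItem-like entry of the list (hypothetical).
class StreamItem {
    final String name;
    final String url;

    StreamItem(String name, String url) {
        this.name = name;
        this.url = url;
    }
}

// Hypothetical collector: builds one item per commit() call; a broken entry is not fatal.
class StreamItemsCollector {
    private StreamItem[] items = new StreamItem[0];
    private int errorCount = 0;

    void commit(StreamItemExtractor extractor) {
        try {
            StreamItem item = new StreamItem(extractor.getName(), extractor.getUrl());
            items = java.util.Arrays.copyOf(items, items.length + 1);
            items[items.length - 1] = item;
        } catch (ExtractionException e) {
            errorCount++;   // skip this entry, keep the rest of the list
        }
    }

    StreamItem[] getItems() {
        return items.clone();
    }

    int getErrorCount() {
        return errorCount;
    }
}
</code></pre>
<p>Committing three entries of which the second one is broken leaves two items in <code>getItems()</code> and an error count of 1.</p>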
<p><img alt="InfoItemsCollector_objectdiagram.svg" src="../img/InfoItemsCollector_objectdiagram.svg" /></p>
<p>When a streaming site shows a list, it usually offers some additional information about that list, like its title, a thumbnail
or its creator. This information is called the <strong>list header</strong>.</p>
<p>Also, when you open a list in a web browser, the website usually does not load the whole list, but only a part
of it. To get more, you may have to click a "next page" button or scroll down. This is why a list in
NewPipe is chopped up into <a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.InfoItemPage.html">InfoItemPage</a>s. Each page has its own URL and needs to be extracted separately.</p>
<p>Both the list header information and the extraction of the individual pages of an InfoItem list can be handled by a
<a href="https://teamnewpipe.github.io/NewPipeExtractor/javadoc/org/schabi/newpipe/extractor/ListExtractor.html">ListExtractor</a>.</p>
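<p>Putting the list header and the pages together, a list extractor can be sketched as an abstract class with header methods, a method for the first page and a method that extracts any further page from its own URL. The front end then reads the header and follows the next-page URLs until there are none left. <code>SketchListExtractor</code>, <code>ItemsPage</code>, <code>PlaylistLoader</code> and their methods are hypothetical and only loosely follow the description above; they are not the real ListExtractor API.</p>
<pre><code class="java">// One page of a list: its items plus the URL of the next page (null on the last page).
class ItemsPage {
    final String[] items;
    final String nextPageUrl;

    ItemsPage(String[] items, String nextPageUrl) {
        this.items = items;
        this.nextPageUrl = nextPageUrl;
    }

    boolean hasNextPage() {
        return nextPageUrl != null;
    }
}

// Hypothetical list extractor: list header fields plus page-wise extraction.
abstract class SketchListExtractor {
    // List header information
    abstract String getListName();
    abstract String getUploaderName();

    // The first page, reached through the list's own URL.
    abstract ItemsPage getInitialPage();

    // Any further page, extracted separately from its own URL.
    abstract ItemsPage getPage(String pageUrl);
}

// Front-end side (hypothetical helper): read the header, then follow next-page URLs.
class PlaylistLoader {
    static String describe(SketchListExtractor extractor) {
        ItemsPage page = extractor.getInitialPage();
        int count = page.items.length;
        // Each further page has its own URL and is extracted separately.
        while (page.hasNextPage()) {
            page = extractor.getPage(page.nextPageUrl);
            count += page.items.length;
        }
        return extractor.getListName() + " by " + extractor.getUploaderName()
                + ": " + count + " items";
    }
}
</code></pre>
<p>For a list named "Mix" by "Alice" whose first page holds two items and links to two more pages with one and two items, <code>describe()</code> returns <code>Mix by Alice: 5 items</code>.</p>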
              
            </div>
          </div>
          <footer>
  
    <div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
      
      
        <a href="../00_Prepare_everything/" class="btn btn-neutral" title="Prepare everything"><span class="icon icon-circle-arrow-left"></span> Previous</a>
      
    </div>
  

  <hr/>

  <div role="contentinfo">
    <!-- Copyright etc -->
    
  </div>

  Built with <a href="http://www.mkdocs.org">MkDocs</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
      
        </div>
      </div>

    </section>

  </div>

  <div class="rst-versions" role="note" style="cursor: pointer">
    <span class="rst-current-version" data-toggle="rst-current-version">
      
      
        <span><a href="../00_Prepare_everything/" style="color: #fcfcfc;">&laquo; Previous</a></span>
      
      
    </span>
</div>
    <script>var base_url = '..';</script>
    <script src="../js/theme.js"></script>
      <script src="../search/require.js"></script>
      <script src="../search/search.js"></script>

</body>
</html>