<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="API documentation for the Rust `regex_syntax` crate."><meta name="keywords" content="rust, rustlang, rust-lang, regex_syntax"><title>regex_syntax - Rust</title><link rel="stylesheet" type="text/css" href="../normalize.css"><link rel="stylesheet" type="text/css" href="../rustdoc.css" id="mainThemeStyle"><link rel="stylesheet" type="text/css" href="../dark.css"><link rel="stylesheet" type="text/css" href="../light.css" id="themeStyle"><script src="../storage.js"></script><noscript><link rel="stylesheet" href="../noscript.css"></noscript><link rel="shortcut icon" href="../favicon.ico"><style type="text/css">#crate-search{background-image:url("../down-arrow.svg");}</style></head><body class="rustdoc mod"><!--[if lte IE 8]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="sidebar"><div class="sidebar-menu">☰</div><a href='../regex_syntax/index.html'><div class='logo-container'><img src='../rust-logo.png' alt='logo'></div></a><p class='location'>Crate regex_syntax</p><div class="sidebar-elems"><a id='all-types' href='all.html'><p>See all regex_syntax's items</p></a><div class="block items"><ul><li><a href="#modules">Modules</a></li><li><a href="#structs">Structs</a></li><li><a href="#enums">Enums</a></li><li><a href="#functions">Functions</a></li><li><a href="#types">Type Definitions</a></li></ul></div><p class='location'></p><script>window.sidebarCurrent = {name: 'regex_syntax', ty: 'mod', relpath: '../'};</script></div></nav><div class="theme-picker"><button id="theme-picker" aria-label="Pick another theme!"><img src="../brush.svg" width="18" alt="Pick another theme!"></button><div id="theme-choices"></div></div><script src="../theme.js"></script><nav class="sub"><form class="search-form 
js-only"><div class="search-container"><div><select id="crate-search"><option value="All crates">All crates</option></select><input class="search-input" name="search" autocomplete="off" spellcheck="false" placeholder="Click or press ‘S’ to search, ‘?’ for more options…" type="search"></div><a id="settings-menu" href="../settings.html"><img src="../wheel.svg" width="18" alt="Change settings"></a></div></form></nav><section id="main" class="content"><h1 class='fqn'><span class='out-of-band'><span id='render-detail'><a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">[<span class='inner'>−</span>]</a></span><a class='srclink' href='../src/regex_syntax/lib.rs.html#11-228' title='goto source code'>[src]</a></span><span class='in-band'>Crate <a class="mod" href=''>regex_syntax</a></span></h1><div class='docblock'><p>This crate provides a robust regular expression parser.</p>
<p>This crate defines two primary types:</p>
<ul>
<li><a href="ast/enum.Ast.html"><code>Ast</code></a> is the abstract syntax of a regular expression.
An abstract syntax corresponds to a <em>structured representation</em> of the
concrete syntax of a regular expression, where the concrete syntax is the
pattern string itself (e.g., <code>foo(bar)+</code>). Given some abstract syntax, it
can be converted back to the original concrete syntax (modulo some details,
like whitespace). To a first approximation, the abstract syntax is complex
and difficult to analyze.</li>
<li><a href="hir/struct.Hir.html"><code>Hir</code></a> is the high-level intermediate representation
("HIR" or "high-level IR" for short) of a regular expression. It corresponds to
an intermediate state of a regular expression that sits between the abstract
syntax and the low-level compiled opcodes that are eventually responsible for
executing a regular expression search. Given some high-level IR, it is not
possible to produce the original concrete syntax (it is possible to
produce an equivalent concrete syntax, but it will likely scarcely resemble
the original pattern). To a first approximation, the high-level IR is simple
and easy to analyze.</li>
</ul>
<p>These two types come with conversion routines:</p>
<ul>
<li>An <a href="ast/parse/struct.Parser.html"><code>ast::parse::Parser</code></a> converts concrete
syntax (a <code>&amp;str</code>) to an <a href="ast/enum.Ast.html"><code>Ast</code></a>.</li>
<li>A <a href="hir/translate/struct.Translator.html"><code>hir::translate::Translator</code></a>
converts an <a href="ast/enum.Ast.html"><code>Ast</code></a> to a <a href="hir/struct.Hir.html"><code>Hir</code></a>.</li>
</ul>
<p>As a convenience, the above two conversion routines are combined into one via
the top-level <a href="struct.Parser.html"><code>Parser</code></a> type. This <code>Parser</code> will first
convert your pattern to an <code>Ast</code> and then convert the <code>Ast</code> to an <code>Hir</code>.</p>
<h1 id="example" class="section-header"><a href="#example">Example</a></h1>
<p>This example shows how to parse a pattern string into its HIR:</p>

<div class="example-wrap"><pre class="rust rust-example-rendered">
<span class="kw">use</span> <span class="ident">regex_syntax</span>::<span class="ident">Parser</span>;
<span class="kw">use</span> <span class="ident">regex_syntax</span>::<span class="ident">hir</span>::{<span class="self">self</span>, <span class="ident">Hir</span>};

<span class="kw">let</span> <span class="ident">hir</span> <span class="op">=</span> <span class="ident">Parser</span>::<span class="ident">new</span>().<span class="ident">parse</span>(<span class="string">"a|b"</span>).<span class="ident">unwrap</span>();
<span class="macro">assert_eq</span><span class="macro">!</span>(<span class="ident">hir</span>, <span class="ident">Hir</span>::<span class="ident">alternation</span>(<span class="macro">vec</span><span class="macro">!</span>[
    <span class="ident">Hir</span>::<span class="ident">literal</span>(<span class="ident">hir</span>::<span class="ident">Literal</span>::<span class="ident">Unicode</span>(<span class="string">'a'</span>)),
    <span class="ident">Hir</span>::<span class="ident">literal</span>(<span class="ident">hir</span>::<span class="ident">Literal</span>::<span class="ident">Unicode</span>(<span class="string">'b'</span>)),
]));</pre></div>
<h1 id="concrete-syntax-supported" class="section-header"><a href="#concrete-syntax-supported">Concrete syntax supported</a></h1>
<p>The concrete syntax is documented as part of the public API of the
<a href="https://docs.rs/regex/%2A/regex/#syntax"><code>regex</code> crate</a>.</p>
<h1 id="input-safety" class="section-header"><a href="#input-safety">Input safety</a></h1>
<p>A key feature of this library is that it is safe to use with end-user-facing
input. This plays a significant role in the internal implementation. In
particular:</p>
<ol>
<li>Parsers provide a <code>nest_limit</code> option that permits callers to control how
deeply nested a regular expression is allowed to be. This makes it possible
to do case analysis over an <code>Ast</code> or an <code>Hir</code> using recursion without
worrying about stack overflow.</li>
<li>Since relying on a particular stack size is brittle, this crate goes to
great lengths to ensure that all interactions with both the <code>Ast</code> and the
<code>Hir</code> do not use recursion. Namely, they use constant stack space and heap
space proportional to the size of the original pattern string (in bytes).
This includes the types' corresponding destructors. (One exception to this
is literal extraction, but this will eventually get fixed.)</li>
</ol>
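<p>The two points above can be illustrated with a minimal, self-contained sketch. It does not use this crate's types: <code>Node</code>, <code>max_depth</code> and <code>within_nest_limit</code> are hypothetical names invented for illustration. The sketch walks a toy expression tree with an explicit heap-allocated stack instead of recursion, and then compares the measured depth against a caller-chosen limit in the spirit of <code>nest_limit</code>.</p>

```rust
// Hypothetical toy tree for illustration only; this is NOT the crate's
// `Ast` or `Hir` type.
enum Node {
    Literal(char),
    Concat(Vec<Node>),
    Group(Box<Node>),
}

/// Computes the maximum nesting depth without recursion. Pending nodes are
/// kept in a heap-allocated `Vec`, so call-stack usage stays constant no
/// matter how deeply the tree is nested.
fn max_depth(root: &Node) -> usize {
    let mut deepest = 0;
    let mut stack = vec![(root, 1usize)];
    while let Some((node, depth)) = stack.pop() {
        deepest = deepest.max(depth);
        match node {
            Node::Literal(_) => {}
            Node::Concat(children) => {
                stack.extend(children.iter().map(|child| (child, depth + 1)));
            }
            Node::Group(inner) => stack.push((&**inner, depth + 1)),
        }
    }
    deepest
}

/// Hypothetical check in the spirit of a nest limit: accept the tree only if
/// it is nested no deeper than `limit`.
fn within_nest_limit(root: &Node, limit: usize) -> bool {
    max_depth(root) <= limit
}
```

<p>Note that this toy <code>Node</code> still relies on Rust's default, recursive destructor; a type meant to hold untrusted, arbitrarily deep input would also need a hand-written iterative <code>Drop</code>, which is omitted here.</p>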
<h1 id="error-reporting" class="section-header"><a href="#error-reporting">Error reporting</a></h1>
<p>The <code>Display</code> implementations on all <code>Error</code> types exposed in this library
provide human-readable error messages that are suitable for showing to end users
in a monospace font.</p>
<h1 id="literal-extraction" class="section-header"><a href="#literal-extraction">Literal extraction</a></h1>
<p>This crate provides limited support for
<a href="hir/literal/struct.Literals.html">literal extraction from <code>Hir</code> values</a>.
Be warned that literal extraction currently uses recursion, and therefore uses
stack space proportional to the size of the <code>Hir</code>.</p>
<p>The purpose of literal extraction is to speed up searches. That is, if you
know a regular expression must match a prefix or suffix literal, then it is
often quicker to search for instances of that literal, and then confirm or deny
the match using the full regular expression engine. These optimizations are
done automatically in the <code>regex</code> crate.</p>
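<p>As a rough illustration of the idea (not of how the <code>regex</code> crate implements it), the sketch below scans for a required prefix literal with <code>str::find</code> and only runs a full matcher at each candidate position. The names <code>find_with_prefix_literal</code> and <code>matches_foo_digits</code> are hypothetical, and the "full engine" is a toy hand-written matcher for the pattern <code>foo[0-9]+</code>.</p>

```rust
/// Returns the start of the first match, using a required prefix literal as a
/// cheap filter. `full_match_at` is a hypothetical stand-in for a real regex
/// engine that confirms or denies a match beginning at the given byte offset.
fn find_with_prefix_literal<F>(haystack: &str, prefix: &str, full_match_at: F) -> Option<usize>
where
    F: Fn(&str, usize) -> bool,
{
    let mut from = 0;
    // `get` returns `None` once `from` runs past the end, which ends the loop.
    while let Some(offset) = haystack.get(from..).and_then(|rest| rest.find(prefix)) {
        let start = from + offset;
        if full_match_at(haystack, start) {
            return Some(start);
        }
        // Candidate rejected: resume just past its first character.
        from = start + haystack[start..].chars().next().map_or(1, char::len_utf8);
    }
    None
}

/// Toy "full engine" for the pattern `foo[0-9]+`, anchored at `start`.
fn matches_foo_digits(haystack: &str, start: usize) -> bool {
    haystack[start..]
        .strip_prefix("foo")
        .and_then(|rest| rest.chars().next())
        .map_or(false, |c| c.is_ascii_digit())
}
```

<p>For instance, calling <code>find_with_prefix_literal("foo bar foo42", "foo", matches_foo_digits)</code> rejects the candidate at offset 0 and confirms the one at offset 8.</p>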
</div><h2 id='modules' class='section-header'><a href="#modules">Modules</a></h2>
<table><tr class='module-item'><td><a class="mod" href="ast/index.html" title='regex_syntax::ast mod'>ast</a></td><td class='docblock-short'><p>Defines an abstract syntax for regular expressions.</p>
</td></tr><tr class='module-item'><td><a class="mod" href="hir/index.html" title='regex_syntax::hir mod'>hir</a></td><td class='docblock-short'><p>Defines a high-level intermediate representation for regular expressions.</p>
</td></tr></table><h2 id='structs' class='section-header'><a href="#structs">Structs</a></h2>
<table><tr class='module-item'><td><a class="struct" href="struct.Parser.html" title='regex_syntax::Parser struct'>Parser</a></td><td class='docblock-short'><p>A convenience parser for regular expressions.</p>
</td></tr><tr class='module-item'><td><a class="struct" href="struct.ParserBuilder.html" title='regex_syntax::ParserBuilder struct'>ParserBuilder</a></td><td class='docblock-short'><p>A builder for a regular expression parser.</p>
</td></tr></table><h2 id='enums' class='section-header'><a href="#enums">Enums</a></h2>
<table><tr class='module-item'><td><a class="enum" href="enum.Error.html" title='regex_syntax::Error enum'>Error</a></td><td class='docblock-short'><p>This error type encompasses any error that can be returned by this crate.</p>
</td></tr></table><h2 id='functions' class='section-header'><a href="#functions">Functions</a></h2>
<table><tr class='module-item'><td><a class="fn" href="fn.escape.html" title='regex_syntax::escape fn'>escape</a></td><td class='docblock-short'><p>Escapes all regular expression meta characters in <code>text</code>.</p>
</td></tr><tr class='module-item'><td><a class="fn" href="fn.escape_into.html" title='regex_syntax::escape_into fn'>escape_into</a></td><td class='docblock-short'><p>Escapes all meta characters in <code>text</code> and writes the result into <code>buf</code>.</p>
</td></tr><tr class='module-item'><td><a class="fn" href="fn.is_meta_character.html" title='regex_syntax::is_meta_character fn'>is_meta_character</a></td><td class='docblock-short'><p>Returns true if the given character has significance in a regex.</p>
</td></tr><tr class='module-item'><td><a class="fn" href="fn.is_word_byte.html" title='regex_syntax::is_word_byte fn'>is_word_byte</a></td><td class='docblock-short'><p>Returns true if and only if the given character is an ASCII word character.</p>
</td></tr><tr class='module-item'><td><a class="fn" href="fn.is_word_character.html" title='regex_syntax::is_word_character fn'>is_word_character</a></td><td class='docblock-short'><p>Returns true if and only if the given character is a Unicode word
character.</p>
</td></tr></table><h2 id='types' class='section-header'><a href="#types">Type Definitions</a></h2>
<table><tr class='module-item'><td><a class="type" href="type.Result.html" title='regex_syntax::Result type'>Result</a></td><td class='docblock-short'><p>A type alias for dealing with errors returned by this crate.</p>
</td></tr></table></section><section id="search" class="content hidden"></section><section class="footer"></section><aside id="help" class="hidden"><div><h1 class="hidden">Help</h1><div class="shortcuts"><h2>Keyboard Shortcuts</h2><dl><dt><kbd>?</kbd></dt><dd>Show this help dialog</dd><dt><kbd>S</kbd></dt><dd>Focus the search field</dd><dt><kbd>↑</kbd></dt><dd>Move up in search results</dd><dt><kbd>↓</kbd></dt><dd>Move down in search results</dd><dt><kbd>↹</kbd></dt><dd>Switch tab</dd><dt><kbd>⏎</kbd></dt><dd>Go to active search result</dd><dt><kbd>+</kbd></dt><dd>Expand all sections</dd><dt><kbd>-</kbd></dt><dd>Collapse all sections</dd></dl></div><div class="infos"><h2>Search Tricks</h2><p>Prefix searches with a type followed by a colon (e.g., <code>fn:</code>) to restrict the search to a given type.</p><p>Accepted types are: <code>fn</code>, <code>mod</code>, <code>struct</code>, <code>enum</code>, <code>trait</code>, <code>type</code>, <code>macro</code>, and <code>const</code>.</p><p>Search functions by type signature (e.g., <code>vec -> usize</code> or <code>* -> vec</code>)</p><p>Search multiple things at once by splitting your query with comma (e.g., <code>str,u8</code> or <code>String,struct:Vec,test</code>)</p></div></div></aside><script>window.rootPath = "../";window.currentCrate = "regex_syntax";</script><script src="../aliases.js"></script><script src="../main.js"></script><script defer src="../search-index.js"></script></body></html>