<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Software and Opinions &#187; fnmatch</title>
	<atom:link href="http://ianloic.com/tag/fnmatch/feed/" rel="self" type="application/rss+xml" />
	<link>http://ianloic.com</link>
	<description>from Ian McKellar</description>
	<lastBuildDate>Wed, 07 Sep 2011 21:48:48 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<atom:link rel='hub' href='http://ianloic.com/?pushpress=hub'/>
		<item>
		<title>Implementing text pattern matching languages in LLVM</title>
		<link>http://ianloic.com/2009/06/05/implementing-text-pattern-matching-languages-in-llvm/</link>
		<comments>http://ianloic.com/2009/06/05/implementing-text-pattern-matching-languages-in-llvm/#comments</comments>
		<pubDate>Fri, 05 Jun 2009 20:07:39 +0000</pubDate>
		<dc:creator>Ian McKellar</dc:creator>
				<category><![CDATA[Default]]></category>
		<category><![CDATA[fnmatch]]></category>
		<category><![CDATA[llvm]]></category>
		<category><![CDATA[regex]]></category>

		<guid isPermaLink="false">http://ianloic.com/?p=123</guid>
		<description><![CDATA[We use pattern matching languages all day long, from shell filename matching rules (fnmatch) in our shells and in utilities like find and locate, to regular expression matching in programming languages, configuration files and utilities like grep and sed. &#8230; <a href="http://ianloic.com/2009/06/05/implementing-text-pattern-matching-languages-in-llvm/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We use pattern matching languages all day long, from shell filename matching rules (<a href="http://www.opengroup.org/onlinepubs/9699919799/functions/fnmatch.html">fnmatch</a>) in our shells and in utilities like find and locate, to <a href="http://en.wikipedia.org/wiki/Regular_expression">regular expression</a> matching in programming languages, configuration files and utilities like <a href="http://en.wikipedia.org/wiki/Grep">grep</a> and <a href="http://en.wikipedia.org/wiki/Sed">sed</a>. These have typically been implemented by parsing the pattern into data structures and walking those data structures as input is processed. These implementations work fairly well (we couldn&#8217;t live without them), but can we do better? I&#8217;m thinking particularly about patterns that are matched many times, like the input to <a href="http://en.wikipedia.org/wiki/GNU_locate">GNU locate</a> (matched against every path on your system) or the routing table of your web application (matched for every request).</p>
<p>Pattern matching languages are programming languages. If we wanted to speed up a long-running conventional program, an obvious approach would be to use a virtual machine with a JIT compiler, which produces efficient native code at the cost of slower startup. The same trade-off should apply to a pattern that is compiled once and then matched many times.</p>
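<p>To make the compile-once, match-many idea concrete, here is a minimal sketch in plain Python. It is not LLVM and not the code from my repository: it turns a simplified glob pattern (literal characters, <code>?</code> and <code>*</code> only, no bracket expressions and no fnmatch flags) into a chain of closures once, and then reuses that chain for every input. The names <code>compile_glob</code> and <code>_step</code> are hypothetical and exist only for this illustration.</p>

```python
def _step(ch, rest):
    """Build a matcher for one pattern character, followed by `rest`."""
    if ch == "*":
        def star(text, i):
            # Try every possible length for the star, shortest first.
            return any(rest(text, j) for j in range(i, len(text) + 1))
        return star
    if ch == "?":
        def any_char(text, i):
            # Consume exactly one character, whatever it is.
            return i < len(text) and rest(text, i + 1)
        return any_char

    def literal(text, i):
        # Consume one character that must equal the pattern character.
        return i < len(text) and text[i] == ch and rest(text, i + 1)
    return literal


def compile_glob(pattern):
    """Compile a simplified glob once; return a function text -> bool."""
    def at_end(text, i):
        # The whole pattern is used up, so the whole text must be too.
        return i == len(text)

    rest = at_end
    # Build from the last pattern character backwards so that each
    # closure already knows how to match everything after it.
    for ch in reversed(pattern):
        rest = _step(ch, rest)
    return lambda text: rest(text, 0)


# Compile once, then match against many inputs.
is_text_file = compile_glob("*.txt")
matches = [name for name in ["notes.txt", "a.py", ".txt"] if is_text_file(name)]
```

<p>The pattern is inspected only inside <code>compile_glob</code>; the per-input work is just the closure calls. A JIT takes the same step further by emitting native code instead of closures. This sketch backtracks on <code>*</code>, so it can be slow on patterns with many stars.</p>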
<p>I&#8217;ve been experimenting with this idea using the <a href="http://www.llvm.org/">LLVM</a> compiler infrastructure project. LLVM is a set of libraries and UNIX utilities that provide an assembler, optimizer, interpreter, and native compiler for a high-level <a href="http://en.wikipedia.org/wiki/SSA_(compilers)">SSA</a> intermediate form. So far I&#8217;ve implemented a simplified subset of the POSIX fnmatch function&#8217;s functionality as a proof of concept. It&#8217;s pretty hacky, but it&#8217;s <a href="http://github.com/ianloic/llvm-fnmatch/tree/master">up on GitHub</a>.</p>
<p>The performance isn&#8217;t great: it&#8217;s 20% slower than <a href="http://en.wikipedia.org/wiki/GNU_Libc">glibc</a> in my trivial test case, but so far all I&#8217;ve got is a really naive implementation. I&#8217;m going to implement an <a href="http://en.wikipedia.org/wiki/Nondeterministic_finite_state_machine">NFA</a> / <a href="http://en.wikipedia.org/wiki/Deterministic_finite_state_machine">DFA</a> based algorithm, which should be more efficient and easier to extend to full regular expressions. Hopefully it&#8217;ll look more useful then.</p>
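<p>As a rough illustration of what an NFA-based matcher could look like, here is a small Python sketch. It is only an assumption about the general shape of such an algorithm, not the planned LLVM implementation. It again handles just literals, <code>?</code> and <code>*</code>. Each NFA state is an index into the pattern, the accepting state is <code>len(pattern)</code>, and the function name <code>glob_match_nfa</code> is hypothetical.</p>

```python
def glob_match_nfa(pattern, text):
    """Match a simplified glob by tracking the set of live NFA states."""
    accept = len(pattern)

    def closure(states):
        # Follow empty moves: a '*' may match nothing, so being at a
        # '*' also means being at the state just after it.
        result = set(states)
        stack = list(states)
        while stack:
            s = stack.pop()
            if s < accept and pattern[s] == "*" and s + 1 not in result:
                result.add(s + 1)
                stack.append(s + 1)
        return result

    current = closure({0})
    for ch in text:
        following = set()
        for s in current:
            if s == accept:
                continue  # nothing left in the pattern to consume ch
            p = pattern[s]
            if p == "*":
                following.add(s)       # the star absorbs this character
            elif p == "?" or p == ch:
                following.add(s + 1)   # consume one character and advance
        current = closure(following)
        if not current:
            return False  # no live states, so no match is possible
    return accept in current
```

<p>Each input character is processed once against at most <code>len(pattern) + 1</code> states, so there is no backtracking. Precomputing the reachable state sets ahead of time would turn this into a DFA, which is the kind of structure that could then be lowered to native code.</p>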
]]></content:encoded>
			<wfw:commentRss>http://ianloic.com/2009/06/05/implementing-text-pattern-matching-languages-in-llvm/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

