<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments for Software and Opinions</title>
	<atom:link href="http://ianloic.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://ianloic.com</link>
	<description>from Ian McKellar</description>
	<lastBuildDate>Fri, 05 Mar 2010 07:19:46 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on jQuery selector escaping by Kayhadrin</title>
		<link>http://ianloic.com/2009/12/27/jquery-selector-escaping/comment-page-1/#comment-1869</link>
		<dc:creator>Kayhadrin</dc:creator>
		<pubDate>Fri, 05 Mar 2010 07:19:46 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=161#comment-1869</guid>
		<description>Hi there,

Thanks for publishing your plugin. 
Just sharing a modified version that I made to avoid using recursive functions.

The comparison method seems a bit lengthy but I think it&#039;s faster than doing regular expressions continuously.

(function($) {
	if ($) {
		var escape_re = /[#;&amp;,\.\+\*~&#039;:&quot;!\^\$\[\]\(\)=&gt;&#124;\/\\]/,
			escapeCharacters = {
			&#039;#&#039;: 1,
			&#039;;&#039;: 1,
			&#039;&amp;&#039;: 1,
			&#039;,&#039;: 1,
			&#039;.&#039;: 1,
			&#039;+&#039;: 1,
			&#039;*&#039;: 1,
			&#039;~&#039;: 1,
			&#039;\&#039;&#039;: 1,
			&#039;:&#039;: 1,
			&#039;&quot;&#039;: 1,
			&#039;!&#039;: 1,
			&#039;^&#039;: 1,
			&#039;$&#039;: 1,
			&#039;[&#039;: 1,
			&#039;]&#039;: 1,
			&#039;(&#039;: 1,
			&#039;)&#039;: 1,
			&#039;=&#039;: 1,
			&#039;&gt;&#039;: 1,
			&#039;&#124;&#039;: 1,
			&#039;/&#039;: 1,
			&#039;\\&#039;: 1
		};
		$.escape = function(s){
			var ret = s || &#039;&#039;, offset; // default to the input so strings with nothing to escape are returned unchanged
			if (s &amp;&amp; ((offset = s.search(escape_re)) !== -1)) { // look for an occurrence of a special character
				ret = s.substr(0, offset) + &#039;\\&#039; + s[offset];
				for(var i=offset + 1, len=s.length, ch; i &lt; len; i++){ // assume that another special character may occur so we just loop through the rest of the string
					ch = s[i];
					ret += (escapeCharacters[ch]? &#039;\\&#039;: &#039;&#039;) + ch;
				}
			}
			return ret;
		};
	}
})(window.jQuery);

Cheers,

Kayhadrin</description>
		<content:encoded><![CDATA[<p>Hi there,</p>
<p>Thanks for publishing your plugin.<br />
Just sharing a modified version that I made to avoid using recursive functions.</p>
<p>The comparison method seems a bit lengthy but I think it&#8217;s faster than doing regular expressions continuously.</p>
<pre><code>(function($) {
	if ($) {
		var escape_re = /[#;&amp;,\.\+\*~':"!\^\$\[\]\(\)=&gt;|\/\\]/,
			escapeCharacters = {
			'#': 1,
			';': 1,
			'&amp;': 1,
			',': 1,
			'.': 1,
			'+': 1,
			'*': 1,
			'~': 1,
			'\'': 1,
			':': 1,
			'"': 1,
			'!': 1,
			'^': 1,
			'$': 1,
			'[': 1,
			']': 1,
			'(': 1,
			')': 1,
			'=': 1,
			'&gt;': 1,
			'|': 1,
			'/': 1,
			'\\': 1
		};
		$.escape = function(s){
			var ret = s || '', offset; // default to the input so strings with nothing to escape are returned unchanged
			if (s &amp;&amp; ((offset = s.search(escape_re)) !== -1)) { // look for an occurrence of a special character
				ret = s.substr(0, offset) + '\\' + s[offset];
				for(var i=offset + 1, len=s.length, ch; i &lt; len; i++){ // assume that another special character may occur so we just loop through the rest of the string
					ch = s[i];
					ret += (escapeCharacters[ch]? '\\': '') + ch;
				}
			}
			return ret;
		};
	}
})(window.jQuery);</code></pre>
<p>Cheers,</p>
<p>Kayhadrin</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Implementing Information Retrieval Systems by Ian McKellar</title>
		<link>http://ianloic.com/2010/02/05/implementing-information-retrieval-systems/comment-page-1/#comment-1838</link>
		<dc:creator>Ian McKellar</dc:creator>
		<pubDate>Sun, 07 Feb 2010 14:42:58 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=198#comment-1838</guid>
		<description>I definitely understand the advantages of a powerful, well-tested platform like [C]Lucene, but many search applications I&#039;ve seen don&#039;t need what it has to offer. I&#039;m becoming less and less convinced by one-size-fits-all solutions as the universal answer, especially for applications that only require a subset of the functionality offered by comprehensive packages like Lucene.

As for our CLucene index corruption issues, as far as I remember we talked to developers on IRC and to Ben in person. We were never able to reproduce the issues in a controlled environment and when we did get corrupted indexes (for example by sending users external hard disks to copy their corrupt indexes to) we couldn&#039;t work out what was wrong.</description>
		<content:encoded><![CDATA[<p>I definitely understand the advantages of a powerful, well-tested platform like [C]Lucene, but many search applications I&#8217;ve seen don&#8217;t need what it has to offer. I&#8217;m becoming less and less convinced by one-size-fits-all solutions as the universal answer, especially for applications that only require a subset of the functionality offered by comprehensive packages like Lucene.</p>
<p>As for our CLucene index corruption issues, as far as I remember we talked to developers on IRC and to Ben in person. We were never able to reproduce the issues in a controlled environment and when we did get corrupted indexes (for example by sending users external hard disks to copy their corrupt indexes to) we couldn&#8217;t work out what was wrong.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Detecting advanced CSS features by Ian McKellar</title>
		<link>http://ianloic.com/2010/02/06/detecting-advanced-css-features/comment-page-1/#comment-1837</link>
		<dc:creator>Ian McKellar</dc:creator>
		<pubDate>Sun, 07 Feb 2010 14:32:50 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=207#comment-1837</guid>
		<description>It&#039;s really not so bad unless you want to do crazy shit. The defaults are usually pretty spot on and CSS is pretty painless when you get your head around it. The trickery comes when you want to do things that aren&#039;t widely supported.

I maintain my resume in HTML these days. At one point I kept it in a custom XML schema with tools to convert to plain-text and HTML, but it&#039;s not worth it anymore. Pretty much everyone&#039;s happy receiving HTML, and &quot;lynx -dump&quot; (or similar) gives a reasonable text approximation for the few who aren&#039;t. Now that the web is the primary publishing format of almost everything, people are even starting to author full books in HTML and then doing a little conversion for print. Word loads HTML better than it saves it anyway.</description>
		<content:encoded><![CDATA[<p>It&#8217;s really not so bad unless you want to do crazy shit. The defaults are usually pretty spot on and CSS is pretty painless when you get your head around it. The trickery comes when you want to do things that aren&#8217;t widely supported.</p>
<p>I maintain my resume in HTML these days. At one point I kept it in a custom XML schema with tools to convert to plain-text and HTML, but it&#8217;s not worth it anymore. Pretty much everyone&#8217;s happy receiving HTML, and &#8220;lynx -dump&#8221; (or similar) gives a reasonable text approximation for the few who aren&#8217;t. Now that the web is the primary publishing format of almost everything, people are even starting to author full books in HTML and then doing a little conversion for print. Word loads HTML better than it saves it anyway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Detecting advanced CSS features by Duncan Sargeant</title>
		<link>http://ianloic.com/2010/02/06/detecting-advanced-css-features/comment-page-1/#comment-1827</link>
		<dc:creator>Duncan Sargeant</dc:creator>
		<pubDate>Sun, 07 Feb 2010 01:25:16 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=207#comment-1827</guid>
		<description>A rare insight into the brokenness for me. Glad I got out while I could.

Do you maintain your resume in HTML or is it converted from something else?  As soon as I asked the question, I knew the answer... you would have it in the format that gives you personally the most control over the typesetting - HTML.</description>
		<content:encoded><![CDATA[<p>A rare insight into the brokenness for me. Glad I got out while I could.</p>
<p>Do you maintain your resume in HTML or is it converted from something else?  As soon as I asked the question, I knew the answer&#8230; you would have it in the format that gives you personally the most control over the typesetting &#8211; HTML.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Implementing Information Retrieval Systems by Itamar Syn-Hershko</title>
		<link>http://ianloic.com/2010/02/05/implementing-information-retrieval-systems/comment-page-1/#comment-1826</link>
		<dc:creator>Itamar Syn-Hershko</dc:creator>
		<pubDate>Sun, 07 Feb 2010 00:04:11 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=198#comment-1826</guid>
		<description>Writing your own implementation of anything hardcore is a great learning exercise, I couldn&#039;t agree more. I wish I had the time to do anything of the sort myself. What you were implying in your original post, and again in your reply, is that code you wrote from scratch (or will write whenever the need arises) may be better for some real-world scenarios. This is the point I strongly object to.

The Lucene index is actually very generic, and most of the code is meant for dealing with this generality, to provide you with the ability to use the library in a very broad set of usages. The library size you were complaining about is actually one of Lucene&#039;s strongest points. It has a record of 7+ years (CLucene has 7, JLucene even more) where many people have tried it under different circumstances, with different hardware and for different use-cases. They have both proof-tested it and provided fixes or extensions. No new library or code written from scratch can match this.

But it is not just about stability. And not even about re-using code. I think in the open-source world, there&#039;s great value in joining forces and working on something together; re-writing something from scratch, for internal use or releasing it to the open, is something one should do only if there&#039;s a compelling reason to do so. There are mainly two reasons for that - you and others. You don&#039;t have to write all the code for yourself and can keep focused on your own business logic, relying on others to do this work for you, and along the way you help improve the original code base and by that you help others.
I&#039;m saying all this because I don&#039;t recall seeing any report regarding a corrupted index in CLucene. My memory may be fooling me, or I may be new to the project, but it appears to be quite common for developers using open-source projects not to provide feedback or their own patches to the original project or its developers. I think we are all losing lots of good stuff.

You are right about the learning curve for all non-simple usage, although it is not as steep as it looks. Developing tools (Analyzer, Filter, Scorer, etc.) for a very customized search pattern is indeed not a task one will do with only a basic understanding of the code, but it is not too hard a task to learn. If there&#039;s something I learned from your post, it is how important in-depth documentation, articles and tutorials are. Hopefully we&#039;ll get the time to write many of them soon; right now we are focused on making some really cool code improvements.

You definitely had a very cold epilogue to your journey. It snowed last night up north and in the Jerusalem area. In Israel, that happens like once or twice a year. A real celebration...</description>
		<content:encoded><![CDATA[<p>Writing your own implementation of anything hardcore is a great learning exercise, I couldn&#8217;t agree more. I wish I had the time to do anything of the sort myself. What you were implying in your original post, and again in your reply, is that code you wrote from scratch (or will write whenever the need arises) may be better for some real-world scenarios. This is the point I strongly object to.</p>
<p>The Lucene index is actually very generic, and most of the code is meant for dealing with this generality, to provide you with the ability to use the library in a very broad set of usages. The library size you were complaining about is actually one of Lucene&#8217;s strongest points. It has a record of 7+ years (CLucene has 7, JLucene even more) where many people have tried it under different circumstances, with different hardware and for different use-cases. They have both proof-tested it and provided fixes or extensions. No new library or code written from scratch can match this.</p>
<p>But it is not just about stability. And not even about re-using code. I think in the open-source world, there&#8217;s great value in joining forces and working on something together; re-writing something from scratch, for internal use or releasing it to the open, is something one should do only if there&#8217;s a compelling reason to do so. There are mainly two reasons for that &#8211; you and others. You don&#8217;t have to write all the code for yourself and can keep focused on your own business logic, relying on others to do this work for you, and along the way you help improve the original code base and by that you help others.<br />
I&#8217;m saying all this because I don&#8217;t recall seeing any report regarding a corrupted index in CLucene. My memory may be fooling me, or I may be new to the project, but it appears to be quite common for developers using open-source projects not to provide feedback or their own patches to the original project or its developers. I think we are all losing lots of good stuff.</p>
<p>You are right about the learning curve for all non-simple usage, although it is not as steep as it looks. Developing tools (Analyzer, Filter, Scorer, etc.) for a very customized search pattern is indeed not a task one will do with only a basic understanding of the code, but it is not too hard a task to learn. If there&#8217;s something I learned from your post, it is how important in-depth documentation, articles and tutorials are. Hopefully we&#8217;ll get the time to write many of them soon; right now we are focused on making some really cool code improvements.</p>
<p>You definitely had a very cold epilogue to your journey. It snowed last night up north and in the Jerusalem area. In Israel, that happens like once or twice a year. A real celebration&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Implementing Information Retrieval Systems by Ian McKellar</title>
		<link>http://ianloic.com/2010/02/05/implementing-information-retrieval-systems/comment-page-1/#comment-1825</link>
		<dc:creator>Ian McKellar</dc:creator>
		<pubDate>Sat, 06 Feb 2010 22:29:41 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=198#comment-1825</guid>
		<description>I &lt;em&gt;have&lt;/em&gt; read Lucene In Action (not cover to cover, but a lot of it), and I have looked through the Lucene sources (often to try to understand CLucene better), but I feel like implementing something yourself has some value - at least as a learning exercise.

Perhaps Lucene isn&#039;t complicated, but it is large. The compressed tar file is 12MB. That&#039;s daunting when you&#039;re trying to learn, and even when you&#039;re trying to debug why the hell your indexes keep getting corrupted (a recurring problem with CLucene that I never resolved during my tenure at Flock). In some ways large is worse than complicated because it just becomes hard to keep things straight.

Anyway, my intention isn&#039;t to reinvent the wheel. It&#039;s ultimately to understand the problem space better. I wasn&#039;t interested in building something &quot;enterprise-level&quot; because I don&#039;t have an enterprise to serve right now.

What I found interesting while building my 250-line inverted index was that I could build application-specific search systems rather than trying to customize general-purpose ones like Lucene. A lot of the complexity in a system like Lucene is all of the support for specific use cases. It&#039;s necessary for a general-purpose library like Lucene to support these specific use cases because every real-world system will involve the general inverted index plus something specific to be useful. If the general system can be implemented relatively easily, perhaps we should just be implementing specific full-text search systems for specific applications rather than using Lucene&#039;s various building blocks. I know this is counter to the computer science mantras of abstraction and reuse, but I think that those are often applied too readily. Who knows. I know that next time I need to build a search system for a real application I&#039;ll try Lucene and a few of its friends before building my own :)

I don&#039;t know when I&#039;ll make it back to Jerusalem. My two months in Israel end on Tuesday when I fly to Ghana. I was visiting Jerusalem from Tel Aviv every week, but I think it&#039;ll be a couple of years at least before I&#039;m back for a visit. We were in Abu Ghosh and Latrun today; I can&#039;t imagine how cold it&#039;s gotten up in Jerusalem now! Brrr!</description>
		<content:encoded><![CDATA[<p>I <em>have</em> read Lucene In Action (not cover to cover, but a lot of it), and I have looked through the Lucene sources (often to try to understand CLucene better), but I feel like implementing something yourself has some value &#8211; at least as a learning exercise.</p>
<p>Perhaps Lucene isn&#8217;t complicated, but it is large. The compressed tar file is 12MB. That&#8217;s daunting when you&#8217;re trying to learn, and even when you&#8217;re trying to debug why the hell your indexes keep getting corrupted (a recurring problem with CLucene that I never resolved during my tenure at Flock). In some ways large is worse than complicated because it just becomes hard to keep things straight.</p>
<p>Anyway, my intention isn&#8217;t to reinvent the wheel. It&#8217;s ultimately to understand the problem space better. I wasn&#8217;t interested in building something &#8220;enterprise-level&#8221; because I don&#8217;t have an enterprise to serve right now.</p>
<p>What I found interesting while building my 250-line inverted index was that I could build application-specific search systems rather than trying to customize general-purpose ones like Lucene. A lot of the complexity in a system like Lucene is all of the support for specific use cases. It&#8217;s necessary for a general-purpose library like Lucene to support these specific use cases because every real-world system will involve the general inverted index plus something specific to be useful. If the general system can be implemented relatively easily, perhaps we should just be implementing specific full-text search systems for specific applications rather than using Lucene&#8217;s various building blocks. I know this is counter to the computer science mantras of abstraction and reuse, but I think that those are often applied too readily. Who knows. I know that next time I need to build a search system for a real application I&#8217;ll try Lucene and a few of its friends before building my own <img src='http://ianloic.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I don&#8217;t know when I&#8217;ll make it back to Jerusalem. My two months in Israel end on Tuesday when I fly to Ghana. I was visiting Jerusalem from Tel Aviv every week, but I think it&#8217;ll be a couple of years at least before I&#8217;m back for a visit. We were in Abu Ghosh and Latrun today; I can&#8217;t imagine how cold it&#8217;s gotten up in Jerusalem now! Brrr!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Implementing Information Retrieval Systems by Itamar Syn-Hershko</title>
		<link>http://ianloic.com/2010/02/05/implementing-information-retrieval-systems/comment-page-1/#comment-1823</link>
		<dc:creator>Itamar Syn-Hershko</dc:creator>
		<pubDate>Sat, 06 Feb 2010 18:43:58 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=198#comment-1823</guid>
		<description>By the time you let your thoughts roll and sat down to code this in Python, you could have just read Lucene In Action and understood the concepts behind Lucene, and how its classes implement them. Understanding Lucene is also easier by reading the Java Lucene sources than CLucene&#039;s.

Lucene is not all that complicated, really. It is very well-structured, and ready for enterprise-level usage (can you allow for multi-searchers / indexers, or a distributed 4GB index with your code?).

IMHO, instead of reinventing the wheel, join existing developers and help their efforts in making it roll faster and more efficiently. If you don&#039;t like reading docs, call me next time you&#039;re in Jerusalem :)</description>
		<content:encoded><![CDATA[<p>By the time you let your thoughts roll and sat down to code this in Python, you could have just read Lucene In Action and understood the concepts behind Lucene, and how its classes implement them. Understanding Lucene is also easier by reading the Java Lucene sources than CLucene&#8217;s.</p>
<p>Lucene is not all that complicated, really. It is very well-structured, and ready for enterprise-level usage (can you allow for multi-searchers / indexers, or a distributed 4GB index with your code?).</p>
<p>IMHO, instead of reinventing the wheel, join existing developers and help their efforts in making it roll faster and more efficiently. If you don&#8217;t like reading docs, call me next time you&#8217;re in Jerusalem <img src='http://ianloic.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Implementing Information Retrieval Systems by pvh</title>
		<link>http://ianloic.com/2010/02/05/implementing-information-retrieval-systems/comment-page-1/#comment-1821</link>
		<dc:creator>pvh</dc:creator>
		<pubDate>Fri, 05 Feb 2010 18:53:41 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=198#comment-1821</guid>
		<description>Complication is an unfortunate consequence of time. It&#039;s much harder to keep something simple than to let it grow.</description>
		<content:encoded><![CDATA[<p>Complication is an unfortunate consequence of time. It&#8217;s much harder to keep something simple than to let it grow.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Implementing Information Retrieval Systems by Ian McKellar</title>
		<link>http://ianloic.com/2010/02/05/implementing-information-retrieval-systems/comment-page-1/#comment-1819</link>
		<dc:creator>Ian McKellar</dc:creator>
		<pubDate>Fri, 05 Feb 2010 12:27:02 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=198#comment-1819</guid>
		<description>Yeah, but can it &lt;em&gt;scale&lt;/em&gt;? :)

That&#039;s when all that silly inverted index shit becomes useful.</description>
		<content:encoded><![CDATA[<p>Yeah, but can it <em>scale</em>? <img src='http://ianloic.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>That&#8217;s when all that silly inverted index shit becomes useful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Implementing Information Retrieval Systems by Joel</title>
		<link>http://ianloic.com/2010/02/05/implementing-information-retrieval-systems/comment-page-1/#comment-1818</link>
		<dc:creator>Joel</dc:creator>
		<pubDate>Fri, 05 Feb 2010 12:11:28 +0000</pubDate>
		<guid isPermaLink="false">http://ianloic.com/?p=198#comment-1818</guid>
		<description>You should see the really really full text search I did for the help system on the NIC.  No simplified words.  No phrases.  No trie.  Still found relevant help topics way better than Windows or Linux.  Wait, there&#039;s help topics on Linux?</description>
		<content:encoded><![CDATA[<p>You should see the really really full text search I did for the help system on the NIC.  No simplified words.  No phrases.  No trie.  Still found relevant help topics way better than Windows or Linux.  Wait, there&#8217;s help topics on Linux?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
