Xapian and Chinese Indexing & Searching
Xapian is an Open Source Search Engine Library, released under the GPL. It's written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby (so far!)
Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
If you're after a packaged search engine for your website, you should take a look at Omega: an application we supply built upon Xapian. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow.
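In short: an open-source search engine library, released under the GPL and written in C++. Through SWIG it can be bound to Perl, Python, PHP, and other languages.
I have spent two days studying it, and it is quite interesting. Its approach to indexing and searching differs somewhat from Lucene's: Xapian is based on a probabilistic model, while Lucene is based on a vector model. Out of the box, Xapian only supports English, Danish, French, Spanish, and similar languages; it does not support Chinese written in Han characters. There is still very little material about Xapian in Chinese, and I have not yet figured out how to write my own stemmer. For now, the only option is to segment the Chinese text into words beforehand, separate the words with spaces, treat each one as if it were an English word, and index them with the English stemmer. Even so, this does get the text indexed.
Here is an article about using Xapian for Chinese indexing and searching: Chinese Xapian Indexing and Searching.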
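The pre-segmentation workaround described above can be sketched roughly as follows. This is only an illustration, not the method from the linked article: the post does not say which word segmenter was used, so overlapping character bigrams stand in for real word segmentation here, and the function names (is_cjk, cjk_bigrams, segment_for_indexing) are made up for this example. It uses only the Python standard library.

```python
import itertools


def is_cjk(ch):
    """Return True for characters in the main CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def cjk_bigrams(run):
    """Split a run of Han characters into overlapping two-character tokens.

    Assumption: bigrams are a stand-in for a real word segmenter.
    A single character is returned as a one-character token.
    """
    if len(run) == 1:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def segment_for_indexing(text):
    """Return text with Chinese runs rewritten as space-separated tokens.

    Non-Chinese chunks (Latin words, digits, punctuation) are only split
    on whitespace and otherwise passed through unchanged.
    """
    tokens = []
    # groupby yields alternating runs of CJK and non-CJK characters.
    for cjk, group in itertools.groupby(text, key=is_cjk):
        chunk = "".join(group)
        if cjk:
            tokens.extend(cjk_bigrams(chunk))
        else:
            tokens.extend(chunk.split())
    return " ".join(tokens)


if __name__ == "__main__":
    print(segment_for_indexing("Xapian是开源搜索引擎库"))
```

The space-separated string is what would then be handed to the indexer as if it were ordinary English text, for example through the index_text method of a TermGenerator in the Xapian Python bindings. The same segmentation would presumably have to be applied to query strings, so that query terms match the indexed ones.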
Else:
Today I watched a few video clips from The Forbidden Kingdom (功夫之王), and I loved them. I can't wait!
Directed by Rob Minkoff, with Jet Li, Jackie Chan, Michael Angarano. A discovery made by a kung fu obsessed American teen sends him on an adventure to China.
--http://www.forbiddenkingdommovie.com
Li Bingbing, Liu Yifei, Jackie Chan, and Jet Li are all actors I really like.