Xapian is an Open Source Search Engine Library, released under the GPL. It's written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby (so far!).
Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
If you're after a packaged search engine for your website, you should take a look at Omega: an application we supply built upon Xapian. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow.
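In short: an open source search engine library, released under the GPL and written in C++. Through SWIG it can be bound to Perl, Python, PHP and other languages.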
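I have spent two days looking into it, and it is quite interesting. Its approach to operation and indexing differs somewhat from Lucene's: Xapian is based on a probabilistic model, while Lucene is based on a vector space model. Out of the box, Xapian only supports English, Danish, French, Spanish and similar languages; it does not support Chinese, which is written in Chinese characters. There is still very little Chinese-language documentation, and I have not yet worked out how to write my own stemmer. For now, the only option is to segment the Chinese text into words beforehand, separate the words with spaces, treat each one as if it were an English word, and index them with the English stemmer. Even so, this does get the text indexed.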
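To make the workaround concrete, here is a minimal sketch of the pre-segmentation step in Python. It uses forward maximum matching against a tiny word list, which is a technique I have swapped in for illustration, not something Xapian provides. The names `VOCAB`, `segment` and `to_indexable` and the vocabulary itself are assumptions; a real setup would use a proper Chinese word segmenter.

```python
# Hypothetical word list; a real segmenter would use a large dictionary.
VOCAB = {"开源", "搜索", "引擎", "搜索引擎"}


def segment(text, vocabulary, max_len=4):
    """Split text into words by forward maximum matching.

    At each position the longest vocabulary entry (up to max_len
    characters) is taken; if none matches, a single character is emitted.
    """
    words = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            # Existing whitespace already separates words; skip it.
            i += 1
            continue
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


def to_indexable(text, vocabulary):
    """Join the segmented words with spaces so an indexer built for
    space-delimited languages sees one token per word."""
    return " ".join(segment(text, vocabulary))


if __name__ == "__main__":
    print(to_indexable("开源搜索引擎库", VOCAB))
```

The space-separated string can then be handed to the indexer as if it were English text, for example through `index_text` on a `xapian.TermGenerator` in the Python bindings. The same segmentation would have to be applied to queries so that search terms line up with the indexed terms.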
Here is an article on using Xapian for Chinese indexing and searching: Chinese Xapian Indexing and Searching.

