Xapian and Chinese Indexing & Searching

Posted by Cofyc, on April 27, 2008, 7:04 pm

Xapian is an Open Source Search Engine Library, released under the GPL. It's written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C# and Ruby (so far!)

Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.

If you're after a packaged search engine for your website, you should take a look at Omega: an application we supply built upon Xapian. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow.

-- www.xapian.org

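An open source search engine library. It is released under the GPL and written in C++. Bindings for Perl, Python, PHP, and other languages are available through SWIG.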

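I spent two days studying it, and it is quite interesting. Its approach to operation and indexing differs somewhat from Lucene's: Xapian is based on a probabilistic model, while Lucene is based on a vector space model. Out of the box, Xapian only supports English, Danish, French, Spanish, and similar languages; it does not support Chinese written in Chinese characters. Chinese-language material on Xapian is still very scarce, and I have not yet figured out how to write my own stemmer. For now, the only option is to segment the Chinese text into words in advance, separate the words with spaces, treat each one as if it were an English word, and index with the English stemmer. Even so, this is enough to get a working index.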
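The pre-segmentation step described above can be sketched roughly as follows. This is only an illustration, not what I actually ran: the tiny dictionary `TOY_DICT` and the helper names `segment` and `to_indexable` are made up for the example, and forward maximum matching is just one simple segmentation technique that a real setup would replace with a proper Chinese word segmenter.

```python
# Hypothetical toy dictionary; a real setup would use a full word list
# or a dedicated Chinese segmentation library.
TOY_DICT = {"中文", "搜索", "引擎", "搜索引擎", "索引", "开源"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICT)


def segment(text, dictionary=TOY_DICT, max_len=MAX_WORD_LEN):
    """Split text with forward maximum matching.

    At each position the longest dictionary word is taken; if nothing
    matches, a single character becomes its own token. Runs of ASCII
    letters and digits are kept together as one token.
    """
    words = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch.isascii() and ch.isalnum():
            # Keep Latin words and numbers intact.
            j = i
            while j < len(text) and text[j].isascii() and text[j].isalnum():
                j += 1
            words.append(text[i:j])
            i = j
            continue
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def to_indexable(text):
    """Join the segmented words with spaces so that an indexer which
    splits on whitespace sees each Chinese word as a separate 'word'."""
    return " ".join(segment(text))


if __name__ == "__main__":
    print(to_indexable("Xapian开源搜索引擎"))
```

The space-separated string is what would then be handed to the indexer (with the Xapian Python bindings, presumably through `TermGenerator.index_text`). Query strings would need to go through the same segmentation, otherwise the query terms will not match the indexed ones.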

Here is an article about using Xapian for Chinese indexing and searching: Chinese Xapian Indexing and Searching

 

Other:

Today I watched a few video clips from The Forbidden Kingdom (功夫之王), and I loved them. I am really looking forward to it!

Directed by Rob Minkoff, with Jet Li, Jackie Chan, Michael Angarano. A discovery made by a kung fu-obsessed American teen sends him on an adventure to China.

--http://www.forbiddenkingdommovie.com

Li Bingbing (李冰冰), Liu Yifei (刘亦菲), Jackie Chan, and Jet Li are all actors I like.

Tags: xapian, chinese, forbidden, kingdom, 功夫之王, jet li, jackie chan, michael angarano, 李冰冰, 刘亦菲

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 3.0 License