Phil's Blog: Search indexer on local desktop

Sunday, March 13, 2011

Search indexer on local desktop

Hi Foxes!

Have you ever wanted something like Windows Search or Google Desktop, but lighter, fast, open source and with a small memory footprint? I'm tired of Windows Search, which I don't use very often, although it is pretty cool when you need it.

So today I was looking for two kinds of tools:

an open source indexer / search engine that works on filenames only, which will be the one I use most,
one that indexes file contents, such as PDF, HTML, DOC, PPT, DjVu and so on.

Comparison links about desktop search engines

Indexer based only on the filename

I have found:

everything is freeware, light and fast. It only works on Windows, with an NTFS filesystem.
locate32 works like the locate and updatedb Linux commands. It is open source.
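
To make the locate / updatedb idea concrete, here is a minimal Python sketch of a filename-only index. It is my own illustration, not the way everything or locate32 is implemented, and the names build_index and search are made up for the example. One pass over the disk builds the index (the "updatedb" step), and every later query only scans that list (the "locate" step). I chose to match on the file name only, case-insensitively.

```python
import os

def build_index(root):
    """Walk `root` once and return the list of every file path found."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths

def search(index, pattern):
    """Return indexed paths whose file name contains `pattern` (case-insensitive)."""
    needle = pattern.lower()
    return [p for p in index if needle in os.path.basename(p).lower()]
```

The index goes stale as soon as files change, which is why this kind of tool has to refresh it regularly or hook into the filesystem.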

Search engine based on the file content

Hot, worth a look!

indri
lucene, which is developed in pure Java. There are many implementations in other languages too, such as CLucene in C++. Open source and widely used.
swish-e
zettair only supports HTML and plain text. Basically no PDF. You have to convert everything to PS, then run the ps2ascii tool before indexing. Pretty tedious.
doc fetcher, which in the background only records that a file has been updated. It parses the file again only when you run the application. It uses a GUI. But if you don't have much RAM, you may want to avoid running the JVM.
strigi (old homepage) must be used with a backend engine like CLucene. It can work on WinXP, but you have to compile everything with Cygwin.
datapark search is open source
mind retrieve only indexes the web pages you have visited. It can be useful.
mendeley, a collaborative indexer for PDFs and other documents (wikipedia). Very useful if you want to write papers with many references. Mendeley is a free reference manager and academic social network.
sphinx written in C++, works on WinXP
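
The "note that a file changed now, parse it later" approach described for doc fetcher can be sketched with modification times. This is a generic Python illustration under my own assumptions, not Doc Fetcher's actual mechanism; snapshot and stale_files are hypothetical names.

```python
import os

def snapshot(paths):
    """Map each existing path to its last-modification time."""
    return {p: os.path.getmtime(p) for p in paths if os.path.exists(p)}

def stale_files(old, new):
    """Return the paths that are new, or whose mtime differs from the `old` snapshot."""
    return sorted(p for p, mtime in new.items() if old.get(p) != mtime)
```

Only the files returned by stale_files need to be parsed again, so the expensive step is paid when you actually open the search tool.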

Other results

Basilic, server side
hyper estraider
refdb, in C
refbase in PHP
BM25 ranking (wikipedia) is present in lucene
sino. It has no dependencies. You can compile it as is, even under Windows. Simple to use, but when I tried to index a folder, it used around 500 MB of RAM! Too much for me.
Wilma (new) and Wilbur (old version)
regain, in Java, based on lucene
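
Since BM25 ranking comes up in the list above, here is a small Python sketch of the usual Okapi BM25 formula, just to show what such a ranking computes. It is not Lucene's code. The function name bm25_scores, the defaults k1=1.2 and b=0.75, and the "+1" inside the logarithm (which keeps the IDF non-negative) are my choices for the illustration. Documents and the query are assumed to be already split into lowercase tokens.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenised document in `docs` for the tokenised `query`."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency inside this document
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates with k1; b controls length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

A document that matches more query terms, or matches rarer ones, scores higher, and long documents are penalised a little so they do not win just by containing more words.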

My combo winners are...

everything is impressive, amazing!
swish-e. It is used as a command-line tool, and you have to combine it with additional software such as pdftohtml, pdf2txt, catdoc, src2dest filters and so on. It is very fast, the database size looks quite reasonable, and it doesn't use too much memory when indexing (around 60 MB on my PC running WinXP).
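
The idea of feeding an indexer through external converters can be sketched like this in Python. It does not reproduce swish-e's own filter configuration; it only shows the principle of picking a converter by file extension and handing plain text to the indexer. The CONVERTERS table and both function names are hypothetical. catdoc is one of the tools named above, while using pdftotext for PDF (with "-" to write to standard output) is my assumption, and both must be installed separately.

```python
import os
import subprocess

# Hypothetical mapping from file extension to an external text extractor.
# "{path}" is replaced by the file being converted.
CONVERTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],  # assumption: "-" sends the text to stdout
    ".doc": ["catdoc", "{path}"],          # catdoc prints the text to stdout
}

def converter_command(path):
    """Return the argv list that extracts text from `path`, or None if no converter is known."""
    template = CONVERTERS.get(os.path.splitext(path)[1].lower())
    if template is None:
        return None
    return [arg.format(path=path) for arg in template]

def extract_text(path):
    """Return the text of `path`, running an external converter when one is configured."""
    cmd = converter_command(path)
    if cmd is None:
        # No converter known: treat the file as plain text.
        with open(path, encoding="utf-8", errors="replace") as f:
            return f.read()
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Keeping the converters outside the indexer is what keeps the indexer itself small: supporting a new format means adding one line to the table.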

Labels: clucene, dataparksearch, engine, everything, grep, index, indexer, indri, information retrieval, lucene, mendeley, retrieval, search, sino, strigi, swish-e, zettair

# posted by Phil @ 5:13 PM
