
Sunday, March 13, 2011

Search indexers for the local desktop

Hi Foxes!

Have you ever wanted something like Windows Search or Google Desktop, but lighter, faster, open source and with a small memory footprint? I'm tired of Windows Search. I don't use it very often, but when you need it, it is pretty cool.

So today I was looking for two kinds of tools:
  • an open source indexer/search engine working only on file names, which will be the one I use most,
  • one that indexes file contents, such as PDF, HTML, DOC, PPT, DjVu and so on.





Comparisons of desktop search engines

Indexers based only on the file name

I found:
  • Everything is freeware, light and fast. It works only on Windows, with NTFS file systems.
  • Locate32 works like the Linux locate and updatedb commands. It is open source.
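The locate/updatedb idea behind Locate32 can be sketched in a few lines of Python. An offline pass records every path in a flat database file, and each search then scans that file instead of walking the disk again. This is a minimal sketch under my own assumptions, not Locate32's actual code. The function names updatedb and locate only echo the Linux commands, and the one-path-per-line database format is hypothetical. Unlike the real locate, this version matches only the file name, not the whole path.

```python
import fnmatch
import os


def updatedb(root, db_path):
    """Walk `root` once and write every file path, one per line, to `db_path`.

    The slow filesystem walk happens ahead of time, not at every search.
    """
    with open(db_path, "w", encoding="utf-8") as db:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                db.write(os.path.join(dirpath, name) + "\n")


def locate(db_path, pattern):
    """Return stored paths whose file name matches `pattern`.

    A plain word is matched as a case-insensitive substring. A pattern
    containing shell wildcards (* ? [) is matched with fnmatch instead.
    """
    wildcard = any(c in pattern for c in "*?[")
    needle = pattern.lower()
    hits = []
    with open(db_path, encoding="utf-8") as db:
        for line in db:
            path = line.rstrip("\n")
            name = os.path.basename(path).lower()
            if wildcard:
                if fnmatch.fnmatch(name, needle):
                    hits.append(path)
            elif needle in name:
                hits.append(path)
    return hits
```

For example, run updatedb("C:/Users/me", "files.db") at night, then locate("files.db", "*.pdf") returns results instantly. The database file should live outside the indexed tree, or it will index itself.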

Search engines based on file content

Hot stuff, have a look!

  • Indri
  • Lucene, developed in pure Java. There are ports to other languages too, such as CLucene for C++. Open source and widely used.
  • Swish-e
  • Zettair only supports HTML and plain text, so basically no PDF. You have to convert everything to PostScript, then run the ps2ascii tool before indexing. Pretty tedious.
  • DocFetcher only records, in the background, that a file has been updated. It re-parses the file only when you run the application. It has a GUI. But if you don't have much RAM, you may want to avoid running the JVM.
  • Strigi (old homepage) must be used with a backend engine such as CLucene. It can work on Windows XP, but you have to compile everything with Cygwin.
  • DataparkSearch is open source.
  • MindRetrieve indexes only the web pages you have visited. It can be useful.
  • Mendeley, a collaborative indexer for PDFs and other documents (wikipedia). Very useful if you write papers with many references. Mendeley is a free reference manager and academic social network.
  • Sphinx, written in C++, works on Windows XP.
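Most of the content engines above (Lucene and its CLucene port, for instance) are built around an inverted index: a map from each word to the documents that contain it. The toy Python sketch below shows that idea together with a lazy update strategy like the one described for DocFetcher, where a file is re-parsed only if its modification time changed since the last refresh. The class TinyIndex and its methods are hypothetical names, and the sketch reads plain text files only. Real engines also store word positions and relevance scores, and keep compressed postings on disk.

```python
import os
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class TinyIndex:
    """Toy inverted index: word -> set of file paths (hypothetical example)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> paths containing it
        self.words_of = {}                # path -> words indexed for it
        self.mtime_of = {}                # path -> mtime when last indexed

    def _forget(self, path):
        # Remove every posting for `path`, dropping words left with no files.
        for w in self.words_of.pop(path, ()):
            self.postings[w].discard(path)
            if not self.postings[w]:
                del self.postings[w]
        self.mtime_of.pop(path, None)

    def _index_file(self, path, mtime):
        self._forget(path)
        with open(path, encoding="utf-8", errors="ignore") as f:
            words = {w.lower() for w in TOKEN.findall(f.read())}
        for w in words:
            self.postings[w].add(path)
        self.words_of[path] = words
        self.mtime_of[path] = mtime

    def refresh(self, paths):
        """Sync the index with the current file list `paths`.

        Files no longer listed are dropped. New or modified files (by
        mtime) are re-parsed. Returns the re-parsed paths, sorted.
        """
        current = set(paths)
        for old in list(self.mtime_of):
            if old not in current:
                self._forget(old)
        changed = []
        for path in sorted(current):
            mtime = os.stat(path).st_mtime_ns
            if self.mtime_of.get(path) != mtime:
                self._index_file(path, mtime)
                changed.append(path)
        return changed

    def search(self, query):
        """Return the files containing every word of `query` (AND search)."""
        words = [w.lower() for w in TOKEN.findall(query)]
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for w in words[1:]:
            result &= self.postings.get(w, set())
        return result
```

Comparing modification times is cheap. A background watcher therefore only has to remember which files changed, and the expensive parsing waits until the next refresh.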

Other results

My winning combo is...

  • Everything: impressive, amazing!
  • Swish-e. It is a command-line tool, and you have to combine it with additional software such as pdftohtml, pdf2txt, catdoc, src2dest filters and so on. It is very fast, the index size looks reasonable, and it doesn't use much memory when indexing (around 60 MB on my PC running Windows XP).
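Swish-e (and Zettair, with its PostScript detour) relies on external converters to turn binary formats into plain text before indexing. Below is a minimal Python sketch of that dispatch step, assuming the converters are installed and on the PATH. The FILTERS table and the function names are hypothetical, and this is not Swish-e's own filter configuration syntax. Only the command-line conventions of pdftotext, catdoc and ps2ascii, which all write text to standard output, come from those tools.

```python
import os
import shutil
import subprocess

# Hypothetical mapping: file extension -> extractor command writing text to stdout.
# "{path}" is replaced by the file to convert.
FILTERS = {
    ".pdf": ["pdftotext", "{path}", "-"],  # "-" tells pdftotext to use stdout
    ".doc": ["catdoc", "{path}"],          # catdoc prints the text to stdout
    ".ps": ["ps2ascii", "{path}"],         # ps2ascii uses stdout if no output file
}


def filter_command(path):
    """Return the extractor command for `path`, or None when the file has no
    known filter and is read as-is (e.g. .txt, .html)."""
    template = FILTERS.get(os.path.splitext(path)[1].lower())
    if template is None:
        return None
    return [arg.replace("{path}", path) for arg in template]


def extract_text(path):
    """Run the matching filter, if any, and return the document's text."""
    cmd = filter_command(path)
    if cmd is None:
        with open(path, encoding="utf-8", errors="ignore") as f:
            return f.read()
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"filter not installed: {cmd[0]}")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

The text returned by extract_text can then be fed to any indexer. Keeping the converters as separate programs is what keeps the indexer itself small, at the cost of installing and configuring each filter.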
