MySQL Internals: Full-Text investigations: Discovery of Sphinx and others

I have written several people, forums and a Usenet group about the issue of "MySQL reading MYD when not necessary", but I haven't still gathered replies enough that gives me some light to solve this problem. So I will carry on and be patient.

Meanwhile, while trying to find out something on my own about this subject, I have discovered some important projects I didn't know till now. This projects change the focus of this text-full investigations.

Remember that the main goal in these investigations is to achieve a MySQL search engine performance acceptable when using huge databases. I thought MySQL could be tweaked or patched in some way that resolves search queries in 0.1 seconds instead of 1 minute or 10 minutes. These are real benchmarks when a table holds 1 million records and the server has nor enough RAM to cache the whole data file.

I still think it can be done if we could (I don't know how) create a weighted index and could constrain the query to be fulfilled by reading uniquely the index MYI file. This way, it would be necessary to read and process the index value for every keyword of the full-text query and read only the 10 (for example) records in data MYD file that best match the search. This reading I/O process could take 0.01 seconds for each record, so the query could be resolved in 0.12 seconds taking into account that the main bottleneck is I/O reading.

My recent discoveries are the existence of Sphinx project driven by Andrew Aksyonoff, and also other fast full-text engines like Mnogosearch, Lucene/DBSight, tbgSearch, Sienna, Zettair, Sphider, TSEP, PHPDig, iSearch, RiSearch, SiteSearch, Simple Web Search, IndexServer and Xapian.

There is a good review of them at:

http://www.onlamp.com/pub/a/php/2006/02/16/search-engine-showdown.html

MySQL Internals: Full-Text investigations

Friday, October 26, 2007

Discovery of Sphinx and others

No comments: