Tuesday, October 23, 2007

First conclusion: Good performance under small corpus

My first serious full-text search test was obtained by inserting 100 records into MySQL. Each record corresponds to a Wikipedia article. The mean length of one of these articles is 5,000 characters, so the whole data dump should occupy about 500 KB.

On the other hand, if we measure the index size and the data size of the raw MySQL MyISAM files, this is the result:

# ls -laSh /var/lib/mysql/full_text_investigations/
total 992K
-rw-rw---- 1 mysql mysql 505K Oct 23 13:37 articles.MYD
-rw-rw---- 1 mysql mysql 454K Oct 23 13:37 articles.MYI
-rw-rw---- 1 mysql mysql 8.4K Oct 23 13:34 articles.frm

Here we have a first rough conclusion: "Index size = Data size = Dump size", roughly 500 KB each in this case.
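The arithmetic behind this rough conclusion can be checked with a short sketch. It assumes one byte per character (not necessarily true for Wikipedia text with non-ASCII characters) and takes the file sizes from the ls listing above; the variable names are only illustrative.

```python
# Rough size check for the 100-article corpus.
# Assumption: 1 byte per character; real article text may use multi-byte encodings.
ARTICLES = 100
MEAN_CHARS = 5000

dump_kb = ARTICLES * MEAN_CHARS / 1024   # estimated dump size in KB

# Sizes reported by `ls -laSh` above, in KB.
data_kb = 505    # articles.MYD
index_kb = 454   # articles.MYI

print(f"estimated dump: {dump_kb:.0f} KB")
print(f"data / dump:    {data_kb / dump_kb:.2f}")
print(f"index / data:   {index_kb / data_kb:.2f}")
```

Both ratios come out close to 1, which is all the "Index size = Data size = Dump size" rule claims.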

I haven't applied any tweaking to MySQL, so my.cnf is the default file, as follows:

# cat /etc/my.cnf



Here is a brief description of the server:

# free
             total       used       free     shared    buffers     cached
Mem:        255504     213016      42488          0      55620      81948
-/+ buffers/cache:      75448     180056
Swap:            0          0          0

That means the server has 256 MB of RAM, of which only about 40 MB is idle (about 175 MB counting buffers and cache, which the kernel can reclaim).
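The "-/+ buffers/cache" line can be reproduced from the "Mem:" line, which shows how much memory is really available to applications. A minimal sketch, with the numbers copied from the free output above:

```python
# Values copied from the `free` output above, in KB.
total, used, free = 255504, 213016, 42488
buffers, cached = 55620, 81948

# Memory held by buffers and page cache can be reclaimed by the kernel,
# so it is counted here as available to applications.
available = free + buffers + cached
really_used = used - buffers - cached

print(f"total:     {total / 1024:.0f} MB")
print(f"idle:      {free / 1024:.0f} MB")
print(f"available: {available / 1024:.0f} MB")
print(f"in use:    {really_used / 1024:.0f} MB")
```

The two computed values match the 180056 and 75448 shown by free.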

# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU T5500 @ 1.66GHz
stepping : 8
cpu MHz : 1662.696
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss pni ds_cpl
bogomips : 3329.22

The CPU is an Intel Core 2 Duo T5500 at 1.66 GHz.

...to server version: 3.23.58

This is the MySQL server version.

# cat /etc/*release*
Fedora Core release 1 (Yarrow)

The Linux distribution is Fedora Core 1 (FC1).

# hdparm -tT /dev/sda1

Timing buffer-cache reads: 3968 MB in 2.00 seconds = 1984.00 MB/sec
Timing buffered disk reads: 62 MB in 3.00 seconds = 20.67 MB/sec

This is a quick way to test the hard disk speed of this SATA drive. At about 20 MB/sec for buffered disk reads, it is not slow, but it isn't very fast either: today, some SATA drives reach 70 MB/sec.

Note that with 256 MB of RAM, the whole database can be cached in RAM (index plus data take only about 1 MB).
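To put numbers on this, the sketch below compares the database size with the current page cache and estimates how long a cold read from disk would take. It assumes a purely sequential read at the hdparm rate, which is an optimistic simplification.

```python
# Sizes from the `ls` listing (KB) and throughput from `hdparm -t` (MB/sec).
data_kb, index_kb = 505, 454
disk_mb_per_sec = 20.67
cache_kb = 81948          # "cached" column of `free`

db_kb = data_kb + index_kb
fits_in_cache = db_kb < cache_kb

# Assumption: a cold start reads both files sequentially at the hdparm rate.
cold_read_sec = (db_kb / 1024) / disk_mb_per_sec

print(f"database size: {db_kb} KB")
print(f"fits in page cache: {fits_in_cache}")
print(f"cold sequential read: {cold_read_sec * 1000:.0f} ms")
```

Even a cold read takes only a few hundredths of a second, so the disk is not a factor at this corpus size.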

The queries are extremely fast, as expected:

mysql> SELECT id, title, LEFT(body, 64) FROM articles WHERE MATCH (title,body) AGAINST ('keyword');
| id | title | LEFT(body, 64) |
| .. | ... | ... |
5 rows in set (0.00 sec)

It can't get any quicker: the time is below what the client's two-decimal display can show.
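To compare queries this fast, one option is to time many repetitions from a client program and take the mean. The sketch below is not tied to any MySQL driver: run_query is a hypothetical callable that would execute the MATCH ... AGAINST statement, replaced here by a stand-in so the example runs on its own.

```python
import time

def mean_query_time(run_query, repeats=1000):
    """Return the mean wall-clock time of one call to run_query, in seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_query()
    return (time.perf_counter() - start) / repeats

# Stand-in for a real full-text query; replace with a call to a MySQL driver.
def fake_query():
    return sum(range(100))

mean_sec = mean_query_time(fake_query, repeats=1000)
print(f"mean time per query: {mean_sec * 1e6:.1f} microseconds")
```

Repeating the same query mostly measures the cached case, which is the situation described here anyway.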

So the second big conclusion is: "Good performance under small corpus".

Here the corpus, as said, is 100 articles of about 5,000 characters each.

Next experiment: what happens if the corpus is increased to 10,000 articles?
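As a rough expectation before running that experiment, the sizes measured here can be scaled up. This is only a sketch: it assumes data and index grow linearly with the number of articles, which a full-text index need not do.

```python
# Linear extrapolation from the 100-article measurement.
# Assumption: data and index grow in proportion to the article count;
# a full-text index need not scale exactly linearly.
measured_articles = 100
data_kb, index_kb = 505, 454          # from the `ls` listing
ram_kb = 255504                       # "total" from `free`

target_articles = 10000
scale = target_articles / measured_articles

est_data_mb = data_kb * scale / 1024
est_index_mb = index_kb * scale / 1024
est_total_mb = est_data_mb + est_index_mb

print(f"estimated data:  {est_data_mb:.0f} MB")
print(f"estimated index: {est_index_mb:.0f} MB")
print(f"estimated total: {est_total_mb:.0f} MB of {ram_kb / 1024:.0f} MB RAM")
```

Under that assumption the database would take up more than half of the roughly 176 MB that free memory, buffers, and cache currently add up to, so caching can no longer be taken for granted.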
