Wednesday, October 24, 2007

Rebooting to avoid cache biasing in the benchmarks

When the server is booted all the filesystem caches are clean. Therefore any data that MySQL needs to read has to be taken out directly from the hard disk. This way we are able to simulate a RAM lacking situation; and this will be very common when working with huge databases.

In the following experiments our database holds 30.000 registers (15 million words). Here, the size measures:

# ls -laSh /var/lib/mysql/full_text_investigations/
total 178M
-rw-rw---- 1 mysql mysql 97M Oct 23 21:07 articles.MYD
-rw-rw---- 1 mysql mysql 81M Oct 23 21:07 articles.MYI
-rw-rw---- 1 mysql mysql 8.4K Oct 23 16:25 articles.frm

The SQL dump file occupies 96Mb. Again the rule "Index size = Data size = Dump size" works reliably. Regarding that our server only has 256 Mb RAM we can expect that most of it will be consumed in caching data and index files. Hence, MySQL is going to respond fast, but it will reach the maximum limit of data for this situation.

Now, we reboot the server and take some measures:

# ps axv | grep sql
1609 ? S 0:00 276 566 5365 1160 0.4 /bin/sh /usr/bin/safe_mysqld
1633 ? S 0:00 767 3094 27097 5140 2.0 /usr/libexec/mysqld


# free
total used free shared buffers cached
Mem: 255504 50424 205080 0 6152 27052
-/+ buffers/cache: 17220 238284
Swap: 0 0 0


65 rows in set (1.68 sec)
165 rows in set (2.95 sec)
256 rows in set (3.65 sec)
27 rows in set (0.63 sec)
5 rows in set (0.16 sec)
Empty set (0.00 sec)
28 rows in set (0.55 sec)
236 rows in set (3.34 sec)
30 rows in set (0.54 sec)
75 rows in set (1.38 sec)

ps axv | grep sql
1609 ? S 0:00 276 566 5365 1160 0.4 /bin/sh /usr/bin/safe_mysqld
1633 ? S 0:00 767 3094 27225 5600 2.1 /usr/libexec/mysqld

# free
total used free shared buffers cached
Mem: 255504 83804 171700 0 6624 58708
-/+ buffers/cache: 18472 237032
Swap: 0 0 0

The command "ps axv | grep sql" let us know how much memory is consuming MySQL for its own. We see that before querying as after doing it, MySQL expends just 27 Mb.

Nevertheless, the cache size has increased considerably jumping from 27Mb at first to 58 Mb after querying.

We may conclude the filesystem has not cached completely the whole database, but only some chunks of data, precisely those needed to respond the ten predefined queries.

What about performance? Watch the timings. When there is no cache, we confirm the rough rule of thumb "0.01 seconds to obtain 1 register".

Now we measure times once data is cached:

65 rows in set (0.01 sec)
165 rows in set (0.01 sec)
256 rows in set (0.01 sec)
27 rows in set (0.00 sec)
5 rows in set (0.00 sec)
Empty set (0.00 sec)
28 rows in set (0.01 sec)
236 rows in set (0.01 sec)
30 rows in set (0.00 sec)
75 rows in set (0.00 sec)


Obvious benchmarks are these. When data is cached, respond times are nearly zero.

In the next experiment we will manipulate cache filesystem to force whole database to be cached.

No comments: