How to Optimize mySQL with @Mail
We recommend the use of mySQL for the database backend of @Mail. Our
tests show it is faster and easier to setup then any other SQL database
backend for @Mail. The @Mail software was designed with mySQL in mind,
although since @Mail uses the standard Perl DBI library the software is
compatible with any other SQL server.
The most common question regarding @Mail and mySQL is how to I optimize
the software to perform at it's best? Learn more by reading the following
FAQ!
What one can and should optimize
- Hardware
- OS / libraries
- SQL server (setup and queries)
- API
- Application
Optimizing hardware for MySQL
- If you need big tables ( > 2G), you should consider using 64 bit hardware
like Alpha, Sparc or the upcoming IA64. As MySQL uses a lot of 64 bit
integers internally, 64 bit CPUs will give much better performance.
- For large databases, the optimization order is normally RAM, Fast
disks, CPU power.
- More RAM can speed up key updates by keeping most of the used key
pages in RAM.
- If you are not using transaction-safe tables or have big disks and
want to avoid long file checks, a UPS is good idea to be able to take
the system down nicely in case of a power failure.
- For systems where the database is on a dedicated server, one should
look at 1G Ethernet. Latency is as important as throughput.
Optimizing disks
- Have one dedicated disk for the system, programs and for temporary
files. If you do very many changes, put the update logs and transactions
logs on dedicated disks.
- Low seek time is important for the database disk; For big tables you
can estimate that you will need: log(row_count) / log(index_block_length/3*2/(key_length
+ data_ptr_length))+1 seeks to find a row. For a table with 500,000
rows indexing a medium int: log(500,000)/log(1024/3*2/(3+4)) +1
= 4 seeks The above index would require: 500,000 * 7 * 3/2
= 5.2M. In real life, most of the blocks will be buffered, so probably
only 1-2 seeks are needed.
- For writes you will need (as above) 4 seek requests, however, to find
where to place the new key, and normally 2 seeks to update the index
and write the row.
- For REALLY big databases, your application will be bound by the speed
of your disk seeks, which increase by N log N as you get more
data.
- Split databases and tables over different disks. In MySQL you can
use symbolic links for this.
- Striping disks (RAID 0) will increase both read and write throughput.
- Striping with mirroring (RAID 0+1) will give you safety and increase
the read speed. Write speed will be slightly lower.
- Don't use mirroring or RAID (except RAID 0) on the disk for temporary
files or for data that can be easily re-generated..
- On Linux use
hdparm -m16 -d1 on the disks on boot to
enable reading/writing of multiple sectors at a time, and DMA. This
may increase the response time by 5-50 %.
- On Linux, mount the disks with
async (default) and noatime.
- For some specific application, one may want to have a ram disk for
some very specific tables, but normally this is not needed.
Optimizing OS
- No swap; If you have memory problems, add more RAM instead or configure
your system to use less memory.
- Don't use NFS disks for data (you will have problems with NFS locking).
- Increase number of open files for system and for the SQL server. (add
ulimit -n # in the safe_mysqld script).
- Increase the number of processes and threads for the system.
- If you have relatively few big tables, tell your file system to not
break up the file on different cylinders (Solaris).
- Use file systems that support big files (Solaris).
- Choose which file system to use; Reiserfs on Linux is very fast for
open, read and write. File checks take just a couple of seconds.
If you need more speed, you should:
- Find the bottleneck (CPU, disk, memory, SQL server, OS, API, or application)
and concentrate on solving this.
- Use extensions that give you more speed / flexibility.
- Get to know your SQL server so that you can use the fastest possible
SQL constructs for your problem and avoid bottlenecks.
- Optimize your table layouts and queries.
- Use replication to get more select speed.
- If you have a slow net connection to the database, use the compressed
client/server protocol.
Don't be afraid to make the first version of your application not perfectly
portable; when you have solved your problem, you can always optimize it
later.
Compiling and installing MySQL
- By choosing the best possible compiler for your system, you can usually
get 10-30 % better performance.
- On Linux/Intel, compile MySQL with pgcc. (The Pentium optimized version
of gcc). The binary will only work with Intel Pentium CPUs, however.
- Use the optimize options that are recommended in the MySQL manual
for a particular platform.
- Normally a native compiler for a specific CPU (like Sun Workshop for
Sparc) should give better performance than gcc, but this is not always
the case.
- Compile MySQL with only the character sets you are going to use.
- Compile the mysqld executable statically (with
--with-mysqld-ldflags=-all-static)
and strip the final executable with strip sql/mysqld.
- Note that as MySQL doesn't use C++ exceptions, compiling MySQL without
exceptions support will give a big performance win!
- Use native threads (instead of the included mit-pthreads) if your
OS supports native threads.
- Test the resulting binary with the MySQL benchmark test.
Maintenance
- If possible, run
OPTIMIZE table once in a while. This
is especially important on variable size rows that are updated a lot.
- Update the key distribution statistics in your tables once in a while
with
myisamchk -a; Remember to take down MySQL before doing
this!
- If you get fragmented files, it may be worth it to copy all files
to another disk, clear the old disk and copy the files back.
- If you have problems, check your tables with myisamchk or
CHECK
table.
- Monitor MySQL status with:
mysqladmin -i10 processlist extended-status
- With the MySQL GUI client you can monitor the process list and the
status in different windows.
- Use
mysqladmin debug to get information about locks and
performance.
Speed difference between different SQL servers (times in
seconds)
| Reading 2000000 rows by key: |
NT |
Linux |
| mysql |
367 |
249 |
| mysql_odbc |
464 |
|
| db2_odbc |
1206 |
|
| informix_odbc |
121126 |
|
| ms-sql_odbc |
1634 |
|
| oracle_odbc |
20800 |
|
| solid_odbc |
877 |
|
| sybase_odbc |
17614 |
|
| |
| Inserting (350768) rows: |
NT |
Linux |
| mysql |
381 |
206 |
| mysql_odbc |
619 |
|
| db2_odbc |
3460 |
|
| informix_odbc |
2692 |
|
| ms-sql_odbc |
4012 |
|
| oracle_odbc |
11291 |
|
| solid_odbc |
1801 |
|
| sybase_odbc |
4802 |
In the above test, MySQL was run with a 8M cache; the other databases
were run with installations defaults.
Important MySQL startup options
back_log |
Change if you do a lot of new connections. |
thread_cache_size |
Change if you do a lot of new connections. |
key_buffer_size |
Pool for index pages; Can be made very big |
bdb_cache_size |
Record and key cache used by BDB tables. |
table_cache |
Change if you have many tables or simultaneous connections |
delay_key_write |
Set if you need to buffer all key writes |
log_slow_queries |
Find queries that takes a lot of time |
max_heap_table_size |
Used with GROUP BY |
sort_buffer |
Used with ORDER BY and GROUP BY |
myisam_sort_buffer_size |
Used with REPAIR TABLE |
join_buffer_size |
When doing a join without keys |
Optimizing tables
- MySQL has a rich set of different types. You should try to use the
most efficient type for each column.
- The
ANALYSE procedure can help you find the optimal types
for a table: SELECT * FROM table_name PROCEDURE ANALYSE()
- Use
NOT NULL for columns which will not store null values.
This is particularly important for columns which you index.
- Change your ISAM tables to MyISAM.
- If possible, create your tables with a fixed table format.
- Don't create indexes you are not going to use.
- Use the fact that MySQL can search on a prefix of an index; If you
have and
INDEX (a,b), you don't need an index on (a).
- Instead of creating an index on long
CHAR/VARCHAR
column, index just a prefix of the column to save space. CREATE
TABLE table_name (hostname CHAR(255) not null, index(hostname(10)))
- Use the most efficient table type for each table.
- Columns with identical information in different tables should be declared
identically and have identical names.
How MySQL stores data
- Databases are stored as directories.
- Tables are stored as files.
- Columns are stored in the files in dynamic length or fixed size format.
In BDB tables the data is stored in pages.
- Memory-based tables are supported.
- Databases and tables can be symbolically linked from different disks.
- On Windows MySQL supports internal symbolic links to databases with
.sym files.
MySQL table types
- HEAP tables; Fixed row size tables that are only stored in memory
and indexed with a HASH index.
- ISAM tables; The old B-tree table format in MySQL 3.22.
- MyISAM tables; New version of the ISAM tables with a lot of extensions:
- Binary portability.
- Index on NULL columns.
- Less fragmentation for dynamic-size rows than ISAM tables.
- Support for big files.
- Better index compression.
- Better key statistics.
- Better and faster auto_increment handling.
- Berkeley DB (BDB) tables from Sleepycat: Transaction-safe (with
BEGIN
WORK / COMMIT | ROLLBACK).
MySQL row types (only relevant for ISAM/MyISAM tables)
- MySQL will create the table in fixed size table format if all columns
are of fixed length format (no
VARCHAR, BLOB
or TEXT columns). If not, the table is created in dynamic-size
format.
- Fixed-size format is much faster and more secure than the dynamic
format.
- The dynamic-size row format normally takes up less space but may be
fragmented over time if the table is updated a lot.
- In some cases it's worth it to move all
VARCHAR, BLOB
and TEXT columns to another table just to get more speed
on the main table.
- With
myisampack (pack_isam for ISAM) one
can create a read-only, packed table. This minimizes disk usage which
is very nice when using slow disks. The packed tables are perfect to
use on log tables which one will not update anymore.
MySQL caches (shared between all threads, allocated once)
- Key cache ; key_buffer_size, default 8M
- Table cache ; table_cache, default 64
- Thread cache ; thread_cache_size, default 0.
- Hostname cache ; Changeable at compile time, default 128.
- Memory mapped tables ; Currently only used for compressed tables.
Note that MySQL doesn't have a row cache, but lets the OS handle this!
MySQL buffer variables (not shared, allocated on demand)
- sort_buffer ;
ORDER BY / GROUP BY
- record_buffer ; Scanning tables
- join_buffer_size ; Joining without keys
- myisam_sort_buffer_size ;
REPAIR TABLE
- net_buffer_length ; For reading the SQL statement and buffering the
result.
- tmp_table_size ; HEAP-table-size for temporary results.
How the MySQL table cache works
- Each open instance of a MyISAM table uses an index file and a data
file. If a table is used by two threads or used twice in the same query,
MyISAM will share the index file but will open another instance of the
data file.
- The cache will temporarily grow larger than the table cache size if
all tables in the cache are in use. If this happens, the next table
that is released will be closed.
- You can check if your table cache is too small by checking the mysqld
variable Opened_tables. If this value is high you should increase your
table cache!
Learn to use EXPLAIN Use
EXPLAIN on every
query that you think is too slow!
mysql> explain select t3.DateOfAction, t1.TransactionID
-> from t1 join t2 join t3
-> where t2.ID = t1.TransactionID and t3.ID = t2.GroupID
-> order by t3.DateOfAction, t1.TransactionID;
+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+
| t1 | ALL | NULL | NULL | NULL | NULL | 11 | Using temporary; Using filesort |
| t2 | ref | ID | ID | 4 | t1.TransactionID | 13 | |
| t3 | eq_ref | PRIMARY | PRIMARY | 4 | t2.GroupID | 1 | |
+-------+--------+---------------+---------+---------+------------------+------+---------------------------------+
Types
ALL and
range signal a potential
problem.
Learn to use SHOW PROCESSLIST Use
SHOW processlist
to find out what is going on:
+----+-------+-----------+----+---------+------+--------------+-------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+-----------+----+---------+------+--------------+-------------------------------------+
| 6 | monty | localhost | bp | Query | 15 | Sending data | select * from station,station as s1 |
| 8 | monty | localhost | | Query | 0 | | show processlist |
+----+-------+-----------+----+---------+------+--------------+-------------------------------------+
Use
KILL in
mysql or
mysqladmin
to kill off runaway threads.
How to find out how MySQL solves a query Run the following
commands and try to understand the output:
SHOW VARIABLES;
SHOW COLUMNS FROM ...\G
EXPLAIN SELECT ...\G
FLUSH STATUS;
SELECT ...;
SHOW STATUS;
MySQL is extremely good
- For logging.
- When you do many connects; connect is very fast.
- Where you use
SELECT and INSERT at the same
time.
- When you don't combine updates with selects that take a long time.
- When most selects/updates are using unique keys.
- When you use many tables without long conflicting locks.
- When you have big tables (MySQL uses a very compact table format).
Things to avoid with MySQL
- Updates to a table or
INSERT on a table with deleted
rows, combined with SELECTS that take a long time.
HAVING on things you can have in a WHERE
clause.
JOINS without using keys or keys which are not unique
enough.
JOINS on columns that have different column types.
- Using HEAP tables when not using a full key match with
=
Tricks to give MySQL more information to solve things better
Note that you can always comment out a MySQL feature to make the query portable:
SELECT /*! SQL_BUFFER_RESULTS */ ...
SELECT SQL_BUFFER_RESULTS ...
Will force MySQL to make a temporary result set. As soon as the temporary
set is done, all locks on the tables are released. This can help when
you get a problem with table locks or when it takes a long time to transfer
the result to the client.
SELECT SQL_SMALL_RESULT ... GROUP BY ...
To tell the optimizer that the result set will only contain a few rows.
SELECT SQL_BIG_RESULT ... GROUP BY ...
To tell the optimizer that the result set will contain many rows.
SELECT STRAIGHT_JOIN ...
Forces the optimizer to join the tables in the order in which they are
listed in the FROM clause.
SELECT ... FROM table_name [USE INDEX (index_list) | IGNORE
INDEX (index_list)] table_name2
Forces MySQL to use/ignore the listed indexes.
General tips
- Use short primary keys. Use numbers, not strings, when joining tables.
- When using multi-part keys, the first part should be the most-used
key.
- When in doubt, use columns with more duplicates first to get better
key compression.
- If you run the client and MySQL server on the same machine, use sockets
instead of TCP/IP when connecting to MySQL (this can give you up to
a 7.5 % improvement). You can do this by specifying no hostname or
localhost
when connecting to the MySQL server.
- Use
--skip-locking (default on some OSes) if possible.
This will turn off external locking and will give better performance.
- Use application-level hashed values instead of using long keys:
SELECT * FROM table_name WHERE hash=MD5(concat(col1,col2)) AND
col_1='constant' AND col_2='constant'
- Store BLOB's that you need to access as files in files. Store only
the file name in the database.
- It is faster to remove all rows than to remove a large part of the
rows.
- If SQL is not fast enough, take a look at the lower level interfaces
to access the data.