Sunday, October 23, 2016

TokuDB Compression



When i am working on a project, research, simple assignment, or simply just reading something online, i always write down stuff in a text editor or our trac system (as a ticket).

This blog is taken from my study and test that i wrote in our trac system as a ticket, and thats why its format kind of looks this way. I also did not have time to clean it up, so it may appear as if its written for my team to read.


Explore the 4 TokuDB compression algorithms [1]:
  • compression
  • quicklz
  • zlib
  • lzma
REFERENCES
[1]https://github.com/percona/tokudb-engine/wiki/Patch-for-TokuDB-compression-algorithms
Last edited 9 months ago by george.chilumbu (previous) (diff)

comment:3Changed 9 months ago by george.chilumbu

mysql> CREATE TABLE tb_uncompressed(id int not null auto_increment, name varchar(20) not null, primary key(id))ENGINE=TokuDB, CHARSET=utf8mb4, COLLATE=utf8mb4_unicode_ci ROW_FORMAT=TOKUDB_UNCOMPRESSED;                                                                       
Query OK, 0 rows affected (0.03 sec)

*************************** 1. row ***************************
       Table: tb_uncompressed
Create Table: CREATE TABLE `tb_uncompressed` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=TOKUDB_UNCOMPRESSED
1 row in set (0.00 sec)

mysql> CREATE TABLE tb_quicklz(id int not null auto_increment, name varchar(20) not null, primary key(id))ENGINE=TokuDB, CHARSET=utf8mb4, COLLATE=utf8mb4_unicode_ci ROW_FORMAT=TOKUDB_QUICKLZ;                                                                                 
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW CREATE TABLE tb_quicklz\G
*************************** 1. row ***************************
       Table: tb_quicklz
Create Table: CREATE TABLE `tb_quicklz` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=TOKUDB_QUICKLZ
1 row in set (0.00 sec)

mysql> 
mysql> CREATE TABLE tb_zlib(id int not null auto_increment, name varchar(20) not null, primary key(id))ENGINE=TokuDB, CHARSET=utf8mb4, COLLATE=utf8mb4_unicode_ci ROW_FORMAT=TOKUDB_ZLIB;                                                                                       
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW CREATE TABLE tb_zlib\G
*************************** 1. row ***************************
       Table: tb_zlib
Create Table: CREATE TABLE `tb_zlib` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
1 row in set (0.01 sec)

mysql> CREATE TABLE tb_lzma(id int not null auto_increment, name varchar(20) not null, primary key(id))ENGINE=TokuDB, CHARSET=utf8mb4, COLLATE=utf8mb4_unicode_ci ROW_FORMAT=TOKUDB_LZMA;                                                                                       
Query OK, 0 rows affected (0.02 sec)

mysql> SHOW CREATE TABLE tb_lzma\G
*************************** 1. row ***************************
       Table: tb_lzma
Create Table: CREATE TABLE `tb_lzma` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=TOKUDB_LZMA
1 row in set (0.00 sec)

REFERENCES
https://github.com/percona/tokudb-engine/wiki/Compression
Last edited 9 months ago by george.chilumbu (previous) (diff)

comment:4Changed 9 months ago by george.chilumbu

I was reading a block about TokuDB compression by Percona which said
"Compression is an always-on feature of TokuDB. There are no server/session variables to enable compression or change the compression level (one goal of TokuDB is to have as few tuning parameters as possible) [1]".
But just like InnoDB has the server variable,innodb_file_format, for compression, TokuDB also has a server variable, tokudb_row_format, that you can actually tune.

Its possible the article is old and maybe at the time it was written, there were really no server variables for TokuDB compression that you could tune.

So to change the compression level, you can do do something like this:
mysql> SHOW VARIABLES LIKE 'tokudb_row_format';                                                                                                                                                                                                                                 
+-------------------+-------------+
| Variable_name     | Value       |
+-------------------+-------------+
| tokudb_row_format | tokudb_zlib |
+-------------------+-------------+
1 row in set (0.00 sec)

mysql> set tokudb_row_format='tokudb_quicklz';
Query OK, 0 rows affected (0.01 sec)

mysql> show variables like 'tokudb_row_format';
+-------------------+----------------+
| Variable_name     | Value          |
+-------------------+----------------+
| tokudb_row_format | tokudb_quicklz |
+-------------------+----------------+
1 row in set (0.00 sec)

REFERENCES
[1] https://www.percona.com/blog/2011/09/15/compression-benchmarking-size-vs-speed-i-want-both/
Last edited 8 months ago by george.chilumbu (previous) (diff)

comment:5Changed 9 months ago by george.chilumbu

Add a subticket #143725. (Benchmark compressions)

comment:6Changed 9 months ago by george.chilumbu

Test converting compression from the one set for InnoDB to one of the compression types for TokuDB.
mysql> CREATE TABLE tb1(id int not null auto_increment, name varchar(20) not null, primary key(id)) ENGINE=INNODB, CHARSET=utf8mb4, COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED;                                                                                        
Query OK, 0 rows affected, 2 warnings (0.02 sec)

mysql> show create table tb1\G
*************************** 1. row ***************************
       Table: inn_tb
Create Table: CREATE TABLE `tb1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
1 row in set (0.00 sec)
So to convert this table + the compression to TukuDb? and say,TOKUDB_QUICKLZ compression, we can first output the update statements to a file:
SELECT CONCAT("ALTER TABLE ", TABLE_SCHEMA,".",TABLE_NAME, " ENGINE=TokuDB, ROW_FORMAT=TOKUDB_QUICKLZ;") 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_SCHEMA = 'sbtest' AND ENGINE='InnoDB' 
INTO OUTFILE '/net/account/admin/george.chilumbu/BK/tokudb.sql';
The file content will look like this
ALTER TABLE sbtest.tb1 ENGINE=TokuDB, ROW_FORMAT=TOKUDB_QUICKLZ;
BK/tokudb2.sql (END)
Now to update the table storage engine and compression, we can just run the statement in the file:
mysql> SOURCE /net/account/admin/george.chilumbu/BK/tokudb.sql;
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE tb1\G
*************************** 1. row ***************************
       Table: tb1
Create Table: CREATE TABLE `tb1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=TOKUDB_QUICKLZ
1 row in set (0.00 sec)

Last edited 9 months ago by george.chilumbu (previous) (diff)

comment:7Changed 9 months ago by george.chilumbu

Changing the compression of a table only affects newly written data (dirtied blocks). After changing a table’s compression you can run OPTIMZE TABLE to rewrite all blocks of the table and its indexes.

So we can create OPTIMIZE TABLE statements in a file just like we did with the ALTER statements for the storage engines conversion.
SELECT CONCAT("OPTIMIZE TABLE ", TABLE_SCHEMA,".",TABLE_NAME, " ;") 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_SCHEMA = 'test' 
INTO OUTFILE '/net/account/admin/george.chilumbu/BK/optimize_tables.sql';
To run these commands, we can use this command:
mysql>SOURCE /net/account/admin/george.chilumbu/BK/optimize_tables.sql

comment:8Changed 8 months ago by george.chilumbu

Compression is set on a per-table basis and is controlled by setting row format during a CREATE TABLE or ALTER TABLE. For example:
  CREATE TABLE tb1 (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(20) NOT NULL,
  PRIMARY KEY(id)) ENGINE=TokuDB ROW_FORMAT=row_format;
If the compression value of =row_format is in a create table as demonstrated above, this defaults to the zlib algorithm. However, if no row format is specified in a create table, the table is compressed using whichever row format is specified in the session variable tokudb_row_format as configured in the my.cnf file. If no row format is set nor is tokudb_row_format, the zlib compressor is used as default.

When configured in the my.cnf file, row_format and tokudb_row_format variables accept the following values:
TOKUDB_DEFAULT: This sets the compression to zlib library. 
TOKUDB_FAST: This sets the compression to use the quicklz library.
TOKUDB_SMALL: This sets the compression to use the lzma library.
Alternatively, you can set the compression value directly to override the row_format or tokudb_row_format set in the my.cnf file. So using the table above as an example, you can set the compression values to the following:
TOKUDB_ZLIB: Compress using the zlib library, which provides mid-range compression and CPU utilization.
TOKUDB_QUICKLZ: Compress using the quicklz library, which provides light compression and low CPU utilization.
TOKUDB_LZMA: Compress using the lzma library, which provides the highest compression and high CPU utilization.
TOKUDB_SNAPPY - This compression is using snappy library and aims for very high speeds and reasonable compression.
TOKUDB_UNCOMPRESSED: This setting turns off compression and is useful for tables with data that cannot be compressed.
So for example, if you want to use TOKUDB_LZMA, you can set that compression value in the create table as demonstrated below:
  CREATE TABLE tb1 (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(20) NOT NULL,
  PRIMARY KEY(id)) ENGINE=TokuDB ROW_FORMAT=TOKUDB_LZMA;

comment:9Changed 8 months ago by george.chilumbu

I noticed that when a table has no compression directly specified in neither the create table nor in the my.cnf file, using the alter table to add compression is fairly fast and completes in fraction of a second. This is not the case with InnoDB, which can take a long time depending on the data size.

I also tried to change from compression Quicklz to LZMA, it also takes a fraction of a second, even on a table with 50 millions records like the one i was testing on. 
Last edited 8 months ago by george.chilumbu (previous) (diff)

comment:10Changed 8 months ago by george.chilumbu

One of the benefits of TOkuDB is that it uses significantly less disk space than InnoDB.

Let me try to explain why this is - and its kind of the same reason why TokuDB's fractal trees give the engine better performance. Its documented that Fractal Trees have much larger nodes of about 4MB compared to B-Tree's 8KB size nodes. Having large node size leads to high compression ratio. Compression controls disk space.

So putting it all together in an equation, large data node size = greater compression ratio = less disk space. But it is important to note that this does not necessarily mean better performance in terms of efficiency in processing transactions.

comment:11Changed 8 months ago by george.chilumbu

Summary

To provide a summary of the different compression algorithms available in TOkuDB, LZMA provides better compression, with about 20% - 40% additional compression over Quicklz [1]. However, LZMA uses more CPU. Quicklz and zlib use less CPU but generally compress at a lower level.

So between high compression LZMA and lower-level or standard compression Quicklz or zlib, the trade off is really between the amount of CPU usage in favor of Quicklz and compression level in favor of LZMA. As a rule of thumb, it is recommended that users use standard compression (Quicklz or zlib) on machines with six or fewer cores, and high compression (LZMA) on machines with more than six cores.

REFERENCES
[1] https://www.percona.com/doc/percona-server/5.6/tokudb/using_tokudb.html#compression-details

No comments:

Post a Comment