Saturday, October 22, 2016

Exploring Fractal Tree Indexes

While doing research and testing TokuDB, i also explored Fractal Trees, which are a type of index that TokuDB uses just like InnoDB use Btree and Fulltext.

According to Wikipedia, "a Fractal Tree index is a tree data structure that keeps data sorted and allows searches and sequential access in the same time as a B-tree but with insertions and deletions that are asymptotically faster than a B-tree."

It is important to mention that Fractal Tree indexes are to some extent based on B-Tree, but with key improvements to enhance performance. So how do Fractal Tree indexes achieve this better performance than B-Tree indexes? The answer lies in I/O. It is well known that for traditional storage engines using B-Tree indexes, their performance is great when their working data set fits in memory. In fact, TokuDB's fractal trees do not have any inherent advantage over InnoDB or other B-Tree based storage engines if data fits in memory.

With B-Tree indexing storage engines like InnoDB, the bottleneck hits when the data is bigger than can fit in main memory. However with TokuDB, performance remains consistence even when the data does not fit main memory anymore. This is basically where Fractal Tree indexing gives TokuDB a performance advantage over InnoDB and other engines using B-Tree indexing. According to the Percona documentation [1], "The innovation of Fractal Tree indexes is that as your working set grows larger than main memory, write performance stays consistent. With each I/O servicing many more writes, Fractal Tree indexes reduce the amount of I/O done by a LARGE factor, thereby eliminating the I/O bottleneck that B-Trees have". Because of this I/O reduction, Fractal Tree indexes don’t require indexes to fit in memory, and TokuDB is able to achieve such high sustained write performance on data that does not fit in memory. This innovation is why Fractal Tree indexes perform so well on write-heavy benchmarks [1].

It is important to note that TokuDB will not take some well running existing system and enhance its performance. TokuDB will ensure that systems will continue to perform well as data grows and eventually no-longer fits into memory.

Below are more results of my findings:

Each Fractal Tree Index is stored in it's own file. The good side about about this is that space allocated to an index is reclaimed back to the filesystem almost instantly when an index is dropped. The bad side about it is that you end up with too many created files in your datadir as demonstrated below. The datadir looks overwhelming compared to what we are used to.

george.chilumbu@db-kdb-slave-5:~$ sudo ls -l /var/lib/mysql/
total 262400
-rw-rw---- 1 mysql mysql       56 Feb  3 13:14 auto.cnf
-rw-r----- 1 mysql root     38995 Feb  5 11:38 db-kdb-slave-5.err
-rw-rw---- 1 mysql mysql        6 Feb  5 11:38 db-kdb-slave-5.pid
-rw-rw---- 1 mysql mysql     2224 Feb  5 11:38 db-kdb-slave-5-slow.log
-rw-r--r-- 1 root  root         0 Feb  3 13:13 debian-5.6.flag
-rw-rw---- 1 mysql mysql 67108864 Feb  5 11:38 ibdata1
-rw-rw---- 1 mysql mysql 67108864 Feb  5 11:38 ibdata2
-rw-rw---- 1 mysql mysql 67108864 Feb  5 11:38 ib_logfile0
-rw-rw---- 1 mysql mysql 67108864 Feb  5 11:38 ib_logfile1
-rw------- 1 mysql mysql    71688 Feb  5 13:02 log000000000011.tokulog29
drwx------ 2 mysql mysql     4096 Feb  3 13:14 mysql
-rw-rw---- 1 mysql mysql      174 Feb  5 10:39 mysqld-bin.000001
-rw-rw---- 1 mysql mysql      174 Feb  5 10:46 mysqld-bin.000002
-rw-rw---- 1 mysql mysql      174 Feb  5 11:36 mysqld-bin.000003
-rw-rw---- 1 mysql mysql     1435 Feb  5 13:01 mysqld-bin.000004
-rw-rw---- 1 mysql mysql       80 Feb  5 11:38 mysqld-bin.index
-rw------- 1 mysql mysql       11 Feb  3 13:14 mysql_upgrade_info
drwx------ 2 mysql mysql     4096 Feb  3 13:14 performance_schema
drwx------ 2 mysql mysql     4096 Feb  5 13:01 test
-rw-rw---- 1 mysql mysql    32768 Feb  5 13:02 tokudb.directory
-rw-rw---- 1 mysql mysql    16384 Feb  3 13:37 tokudb.environment
-rw------- 1 mysql mysql        0 Feb  3 13:37 __tokudb_lock_dont_delete_me_data
-rw------- 1 mysql mysql        0 Feb  3 13:37 __tokudb_lock_dont_delete_me_environment
-rw------- 1 mysql mysql        0 Feb  3 13:37 __tokudb_lock_dont_delete_me_logs
-rw------- 1 mysql mysql        0 Feb  3 13:37 __tokudb_lock_dont_delete_me_recovery
-rw------- 1 mysql mysql        0 Feb  3 13:37 __tokudb_lock_dont_delete_me_temp
-rw-rw---- 1 mysql mysql    32768 Feb  5 13:02 tokudb.rollback
george.chilumbu@db-kdb-slave-5:~$
To avoid all these extra files created by TOkuDB and fractal Tree indexes, i have created a second datadir just for files as a result of Fractal indexes. So in the test database, i create a table called tb1 with indexes as demonstrated below and try to see what files will be created by the Fractal Tree indexes.
CREATE TABLE tb1(
    id NOT NULL AUTO_INCREMENT,
    name VARCHAR(20) NOT NULL,
    email VARCHAR(20) NOT NULL,
    PRIMARY KEY(id),
    KEY name(name),
    CLUSTERING KEY email(email)
   ) ENGINE=TokuDB CHARSET=utf8mb4;
And the new toku_datadir looks like this:
george.chilumbu@db-kdb-slave-5:~$ ls /var/lib/tokudb/
total 136
drwxr-xr-x  2 mysql  4096 Feb  5 13:01 ./
drwxr-xr-x 54 root   4096 Feb  4 19:48 ../
-rw-rw----  1 mysql 16896 Feb  5 13:01 _test_tb1_key_email_22_3_1d_B_0.tokudb
-rw-rw----  1 mysql 16896 Feb  5 13:01 _test_tb1_key_name_1e_3_1d_B_0.tokudb
-rw-rw----  1 mysql 32768 Feb  5 13:00 _test_tb1_main_12_2_1d.tokudb
-rw-rw----  1 mysql 65536 Feb  5 13:01 _test_tb1_status_12_1_1d.tokudb
-rw-------  1 mysql     0 Feb  4 20:02 __tokudb_lock_dont_delete_me_data
-rw-------  1 mysql     0 Feb  4 20:02 __tokudb_lock_dont_delete_me_temp
george.chilumbu@db-kdb-slave-5:~$
And when you delete the table, the toku_datadir
george.chilumbu@db-kdb-slave-5:~$ ls /var/lib/tokudb/
total 8
drwxr-xr-x  2 mysql 4096 Feb  5 13:40 ./
drwxr-xr-x 54 root  4096 Feb  4 19:48 ../
-rw-------  1 mysql    0 Feb  4 20:02 __tokudb_lock_dont_delete_me_data
-rw-------  1 mysql    0 Feb  4 20:02 __tokudb_lock_dont_delete_me_temp

comment:5Changed 8 months ago by george.chilumbu

One of the keys to exploiting TokuDB’s strength in indexing is to make use of clustering secondary indexes. Clustering index store data by key and include the entire document as the payload (rather than pointer to the document). Percona Server parser and query optimizer support Multiple Clustering Keys when TokuDB engine is used. This means that the query optimizer will avoid primary clustered index reads and replace them by secondary clustered index reads in certain scenarios.

To create a primary key on a TOkuDB table, you can do
mysql> CREATE TABLE tb1 (
    ->   id INT NOT NULL auto_increment,
    ->   name VARCHAR(20) NOT NULL,
    ->   email VARCHAR(64) NOT NULL,
    ->   PRIMARY KEY(id),
    ->   CLUSTERING KEY (email)) ENGINE = TokuDB;
Query OK, 0 rows affected (0.02 sec)

mysql> show create table tb1\G
*************************** 1. row ***************************
       Table: tb1
Create Table: CREATE TABLE `tb1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  `email` varchar(64) NOT NULL,
  PRIMARY KEY (`id`),
  CLUSTERING KEY `email` (`email`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
The primary table is indexed on column id. But as we see, there is a secondary clustering index on column email. Unlike non-clustered indexes, clustering indexes include all the columns of a table and can be used as covering indexes. So the query below would run much faster on a clustering index on column email:
SELECT name FROM tb1 WHERE email IN ('johndoe@gmail.com','anndoe@yahoo.com','pgriffin@hotmail.com');

Since the WHERE clause is on the clustering index, and includes the column 'name', the query will perform super fast as it avoids lookups in the primary table to satisfy the query. TokuDB makes clustering indexes feasible because of its excellent compression and very high indexing rates. Some benchmarks to see if clustering indexes + high compression would increase server performance as far as reads are concerned would be something to consider.

We already know that Fractal Tree are all about start fast, stay fast as opposed to B-Tree which start fast, and slow down as data grows, especially if it grows more than the size of available memory. We also know that Fractal-Tree reduce IO significantly.
Last edited 8 months ago by george.chilumbu (previous) (diff)

comment:6Changed 8 months ago by george.chilumbu

One cool feature with TOkuDB is the ability to add indexes to an existing table without locking. This feature is referred to as Hot Index creation. With Hot Index creation, inserts and queries would continue to work on that table as normal while the index is being created. But for Hot Index creation to work, the tokudb_create_index_online variables has to be enabled in the my.cnf file, or temporarily using the following command:
mysql> SET tokudb_create_index_online=on;
Hot index creation is invoked using the CREATE INDEX command after setting tokudb_create_index_online as demonstrated below:
mysql> show create table tb1\G
*************************** 1. row ***************************
       Table: tb1
Create Table: CREATE TABLE `tb1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  `email` varchar(64) NOT NULL,
  PRIMARY KEY (`id`),
  CLUSTERING KEY `email` (`email`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> CREATE INDEX name ON tb1 (name);
Query OK, 0 rows affected (0.63 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table tb1\G
*************************** 1. row ***************************
       Table: tb1
Create Table: CREATE TABLE `tb1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(20) NOT NULL,
  `email` varchar(64) NOT NULL,
  PRIMARY KEY (`id`),
  CLUSTERING KEY `email` (`email`),
  KEY `name` (`name`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Using the alter table statement puts a lock on the table while creating an index, and would make a table unavailable for a long period of time if the table has a large data set.
How fast it takes adding an index using the Hot creating an index depends on how busy the mysqld server is with other tasks. You can use the SHOW PROCESSLIST command to see the progress of the hot index creation.

If more than one hot CREATE INDEX is issued for a particular table, the indexes will be created serially. An index creation that is waiting for another to complete will be shown as Locked in SHOW PROCESSLIST. It is recommended that each CREATE INDEX be allowed to complete before the next one is started [1].

CONCLUSION:
using pt-online-schema-change also does not block the table. The above was just to highlight that an internal feature exists within TokuDB that allows adding an index without blocking the table. But with InnoDB tables, i always use pt-online-schema-change and it works great for altering tables, especially large tables.

I noticed that TokuDB does not pre-allocate memory to buffer pools as InnoDB does. The reason is that InnoDB nodes/pages are fixed size and map directly to disk block of the same size within the InnoDB data files. An InnoDB node/page never moves within the data file and can be very easily calculated exactly where it will reside within the file.

TokuDB is quite different in that its nodes can vary in their physical size and may even (temporarily) exceed the defined node size for the indices. The TokuDB data files operate much like dynamic memory complete with dead/unused space within and nodes have no permanent location within the file. The TokuDB cache is allocated as needed as TokuDB nodes and node partitions (message buffers, basement nodes) are loaded into memory and evicted.

REFERENCES:
[1] https://www.percona.com/doc/percona-server/5.6/tokudb/using_tokudb.html#compression-details
[2] https://www.percona.com/blog/2013/07/02/tokumx-fractal-treer-indexes-what-are-they/

No comments:

Post a Comment