Wednesday, October 19, 2016

TokuDB Variables

These are some of the important TokuDB system variables:

tokudb_analyze_time  #This is an upper bound on the number of seconds ANALYZE TABLE will spend on each index of each partition of a TokuDB table.
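As a quick illustration (the table name t1 here is made up), you can cap the per-index analysis time before running ANALYZE TABLE:

    mysql> SET SESSION tokudb_analyze_time = 5;  -- at most 5 seconds per index
    mysql> ANALYZE TABLE t1;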


# The recommendation is to set tokudb_cache_size to 75-80% of RAM and tokudb_directio = 1 if the compressed data does not fit in RAM. Otherwise leave these values at their defaults.
tokudb_cache_size    #Like innodb_buffer_pool_size, this variable sets the amount of memory TokuDB allocates for caching pages.
tokudb_directio      #Setting tokudb_directio = 1 is like specifying innodb_flush_method = O_DIRECT, which means the OS should not cache pages requested by TokuDB.
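A minimal my.cnf sketch along those lines (the 12G figure is only an example for a hypothetical 16G host; size it to your own hardware):

    [mysqld]
    # roughly 75% of RAM on a 16G server
    tokudb_cache_size = 12G
    # bypass the OS page cache, analogous to innodb_flush_method = O_DIRECT
    tokudb_directio   = 1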

# These two variables are similar in essence to innodb_flush_log_at_trx_commit. With tokudb_commit_sync = 1 (the default) the transaction log is synced to disk on every commit. When tokudb_commit_sync = 0, tokudb_fsync_log_period dictates the interval between flushes.
tokudb_commit_sync = 0
tokudb_fsync_log_period = 1000  #in milliseconds, i.e. once per second
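Both are dynamic, so the durability/throughput trade-off can also be made at runtime (a sketch; whether relaxing per-commit syncs is acceptable depends on your durability requirements):

    mysql> SET GLOBAL tokudb_commit_sync = 0;
    mysql> SET GLOBAL tokudb_fsync_log_period = 1000;  -- flush the log once per second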

tokudb_load_save_space = 1  #This parameter decides whether the temporary files created during bulk load operations (e.g. ALTER TABLE) are compressed or uncompressed. Setting it to 1 (compressed) is recommended.
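For example (the table and index names are hypothetical; check the variable's scope on your version, it is shown here as a global setting), you might enable it before rebuilding a large table:

    mysql> SET GLOBAL tokudb_load_save_space = 1;  -- compress bulk-load temp files
    mysql> ALTER TABLE t1 ADD INDEX idx_created (created_at);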

sql_log_bin = 0  #Disables binary logging for the session. Default is 1. Note this is a general MySQL variable, not a TokuDB-specific one.
open-files-limit = 40000  #The effective value is bounded by the operating system's open file limit (ulimit -n); MySQL also derives a minimum from settings such as max_connections and the table cache.


tokudb_alter_print_error = 0  #Whether to print errors for ALTER TABLE operations. Dynamic.
tokudb_block_size = 4MB  #Uncompressed size of internal Fractal Tree and leaf nodes. Dynamic.
tokudb_bulk_fetch = 1  #If set to 1 (the default), the bulk fetch algorithm is used for SELECTs and DELETEs, including related statements such as INSERT INTO ... SELECT.

tokudb_data_dir = /path/to/datadir  #If not specified, the MySQL datadir will be used.

tokudb_disable_slow_alter = 1  #This variable determines whether slow ALTER TABLE statements, i.e. those that cannot take advantage of TokuDB's hot schema change feature, are permitted to run.
tokudb_last_lock_timeout = empty  #When a deadlock is detected or a lock request times out, this is set to a JSON document describing the most recent lock conflict. Empty by default.
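After a lock wait timeout you can inspect that JSON document directly (a sketch; run it from the session that hit the timeout):

    mysql> SHOW VARIABLES LIKE 'tokudb_last_lock_timeout';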
It's also important to note that most of the TokuDB variables are dynamic.

You can view other TokuDB variables by running the following command from the MySQL client:

    mysql> show variables like 'tokudb%';

A good resource about TokuDB variables is the Percona documentation:

    https://www.percona.com/doc/percona-tokudb/variables.html

In most cases it is advisable to keep the default settings for TokuDB variables. Since most of them are dynamic, they can be adjusted at runtime, without a server restart, if the workload calls for it.
One thing I think is missing from TokuDB is a facility similar to the InnoDB buffer pool dump/load. With the InnoDB dump/load, you can easily warm up the buffer pool after a server restart. TokuDB has no such provision.

I know there is an alternative of warming up data using SELECT COUNT(*) FROM table_name. The problem with this method is that it only loads nodes from the PRIMARY KEY index, not from any secondary indexes that might exist on the table. It also traverses the entire PRIMARY KEY index rather than just the regions that were hot at shutdown, so it is likely of little use for reloading the 'hot' Fractal Tree nodes that were in memory before the restart. And if you have a lot more data than memory, nodes will begin to be evicted as soon as the cachetable fills up, leaving you with only whichever nodes were loaded last and not yet evicted.
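If you do use this approach, you can at least touch the secondary indexes too by forcing each one in turn (the table and index names below are made up):

    mysql> SELECT COUNT(*) FROM orders;  -- walks the PRIMARY KEY index
    mysql> SELECT COUNT(customer_id) FROM orders FORCE INDEX (idx_customer);  -- walks a secondary index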

So in truth, even the InnoDB dump/load mechanism would have much the same effect as the SELECT COUNT(*) method if you have more data than memory.
It would be interesting to find a better alternative for warming data, such as using queries from the performance_schema, i.e. the query groups that are normally run on the database. Since the applications run almost the same queries on kdb, we could preload these queries into a file where they form a single transaction. Then, whenever we restart the server and need to warm the data, we can simply run this transaction of repeated query groups from the command line.
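A minimal sketch of such a warm-up file (the file name warmup.sql and the queries themselves are entirely made up; the real ones would come from performance_schema or the application):

    -- warmup.sql: replay representative read queries to pull hot nodes into the cachetable
    START TRANSACTION;
    SELECT COUNT(*) FROM orders WHERE created_at > NOW() - INTERVAL 1 DAY;
    SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id LIMIT 1000;
    COMMIT;

After a restart this could be replayed with something like mysql mydb < warmup.sql.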

Another alternative would be to turn on the slow query log for half a day or so with the time limit set to 0 seconds, so that all running queries are logged, not just the slow ones. We can then retrieve these queries from the log and set them up in a new file as a transaction to run whenever we need to warm up the database.
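Capturing everything comes down to something like the following (the log file path is just an example; remember to revert these settings afterwards, since logging every query adds overhead):

    mysql> SET GLOBAL slow_query_log_file = '/var/log/mysql/all-queries.log';
    mysql> SET GLOBAL long_query_time = 0;  -- log all queries, not just slow ones
    mysql> SET GLOBAL slow_query_log = 'ON';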

