Tuesday, October 18, 2016

MySQL Performance Schema

Performance_schema


performance_schema.events_statements_summary_by_digest

"The performance schema includes a set of tables that give information on how statements are performing" [3].

This post focuses on the events_statements_summary_by_digest table, so thats what we'll do. The events_statements_summary_by_digest provides aggregated statement digest information. It normalizes and groups statements with the same “signature” and allow questions to be answered about the types of statements the mysql server is executing and also their frequency. Normalized statements in this case means Stripping whitespace in queries, stripping comments, relacing literals (integer and string inputs) with a “?” placeholder etc. Some examples of normalized statements when used in this contest include [1]:


  • Replacing literals (integer and string inputs) with a “?” placeholder
    • SELECT foo, bar FROM foobar WHERE foo = 100; becomes SELECT foo, bar FROM foobar WHERE foo = ?
    • For a single value INSERT statement, INSERT INTO foobar VALUES (100); becomes INSERT INTO foobar VALUES (?);
  • Lists of values are folded
    • Folding lists of IN values
      • SELECT foo FROM foobar WHERE bar IN (1, 2, 3) becomes SELECT foo FROM foobar WHERE bar IN (…)

The schema_name/digest fields are the grouping fields for this table and determine the summary of events for each record. Though not directly defined in the schema, the schema_name and digest fields act more like composite primary keys for records on the table. So the combination of these two fields makes each record in the table unique.

The DIGEST field contains a MD5 hashed value, while the DIGEST_TEXT is the corresponding normalized statement text string.

On a specific schema, the DIGEST_TEXT is distinct (hence the term grouping) since the use of the same query will be recorded only once, with the COUNT_STAR field providing a counter value of how often the statement occurs, or in other words, how many digests are recorded in a record. Extra info indicating when the query was first and last used is also provided by the FIRST_SEEN and LAST_SEEN fields.

Times units in Performance Schema are displayed in picosecond granularity. But you can try to change to readable values.

Limitations

The events_statements_summary_by_digest table is limited in size by the performance_schema_digests_size system variable. The size set on our kdb servers (one of our main databases) is 10,000 i.e, unto 10,000 rows can be stored in this table. I think 10,000 is the default for Percona Servers. You can use the parameter performance_schema_digests_size to check the size limit on your server [2]. You can run the following command in mysql to check the size:

The events_statements_summary_by_digest table does not record anything about prepared statements. The only recorded statements recorded are those of full text regular SQL.

     mysql> show variables like 'performance_schema_digests_size';


To change this variable, e.g, to 20,000, you can add the parameter in the my.cnf file under [mysqld] like:

      performance_schema_digests_size = 20000



This is a read-only variable and therefore cannot be added using the set command dynamically.

If you want to reset your statistics in the performance_schema, you can run the command below:

     mysql> TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;


## List normalized queries with the highest latency in kkox

SELECT IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
       COUNT_STAR AS exec_count,
       SUM_NO_INDEX_USED AS no_index_used_count,
       SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
       SEC_TO_TIME(SUM_TIMER_WAIT/1000000000000) AS exec_time_total,
       SEC_TO_TIME(MAX_TIMER_WAIT/1000000000000) AS exec_time_max,
       (AVG_TIMER_WAIT/1000000000) AS exec_time_avg_ms,
       SUM_ROWS_SENT AS rows_sent,
       ROUND(SUM_ROWS_SENT / COUNT_STAR) AS rows_sent_avg,
       SUM_ROWS_EXAMINED AS rows_scanned,
       FIRST_SEEN AS first_executed,
       LAST_SEEN AS last_executed,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest WHERE SCHEMA_NAME='kkbox'
  ORDER BY SUM_TIMER_WAIT DESC LIMIT 1\G;

## List all normalized statements that use temporary tables ordered by number of on disk temporary tables descending first, then by the number of memory tables.
SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_CREATED_TMP_TABLES AS memory_tmp_tables,
       SUM_CREATED_TMP_DISK_TABLES AS disk_tmp_tables,
       ROUND(SUM_CREATED_TMP_TABLES / COUNT_STAR) AS avg_tmp_tables_per_query,
       ROUND((SUM_CREATED_TMP_DISK_TABLES / SUM_CREATED_TMP_TABLES) * 100) AS tmp_tables_to_disk_pct,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_CREATED_TMP_TABLES > 0 AND SCHEMA_NAME='kkbox'
ORDER BY SUM_CREATED_TMP_DISK_TABLES DESC, SUM_CREATED_TMP_TABLES DESC LIMIT 5;

## List all normalized statements that have done sorts, ordered by sort_merge_passes, sort_scans and sort_rows, all descending.
SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_SORT_MERGE_PASSES AS sort_merge_passes,
       ROUND(SUM_SORT_MERGE_PASSES / COUNT_STAR) AS avg_sort_merges,
       SUM_SORT_SCAN AS sorts_using_scans,
       SUM_SORT_RANGE AS sort_using_range,
       SUM_SORT_ROWS AS rows_sorted,
       ROUND(SUM_SORT_ROWS / COUNT_STAR) AS avg_rows_sorted,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_SORT_ROWS > 0 AND SCHEMA_NAME='kkbox'
 ORDER BY SUM_SORT_MERGE_PASSES DESC, SUM_SORT_SCAN DESC, SUM_SORT_ROWS DESC LIMIT 5;

## List all normalized statements that use have done a full table scan ordered by the percentage of times a full scan was done, then by the number of times the statement executed

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_NO_INDEX_USED AS no_index_used_count,
       SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
       ROUND((SUM_NO_INDEX_USED / COUNT_STAR) * 100) no_index_used_pct,
       DIGEST AS digest
  FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_NO_INDEX_USED > 0
    OR SUM_NO_GOOD_INDEX_USED > 0 AND SCHEMA_NAME='KKBOX'
ORDER BY no_index_used_pct DESC, exec_count DESC LIMIT 5\G;

## List all normalized statements that have raised errors or warnings.

SELECT IF(LENGTH(DIGEST_TEXT) > 64, CONCAT(LEFT(DIGEST_TEXT, 30), ' ... ', RIGHT(DIGEST_TEXT, 30)), DIGEST_TEXT) AS query,
       COUNT_STAR AS exec_count,
       SUM_ERRORS AS errors,
       (SUM_ERRORS / COUNT_STAR) * 100 as error_pct,
       SUM_WARNINGS AS warnings,
       (SUM_WARNINGS / COUNT_STAR) * 100 as warning_pct,
       DIGEST AS digest
FROM performance_schema.events_statements_summary_by_digest
 WHERE SUM_ERRORS > 0
    OR SUM_WARNINGS > 0 AND SCHEMA_NAME='KKBOX'
ORDER BY SUM_ERRORS DESC, SUM_WARNINGS DESC;


##First, i would like to list queries with the largest number of executions i.e, that has been run the most.

SELECT
      DIGEST AS digest,
      SCHEMA_NAME as db,
      DIGEST_TEXT AS query,
   COUNT_STAR AS execution_count,
   FIRST_SEEN AS first_executed,
   LAST_SEEN AS last_executed
FROM performance_schema.events_statements_summary_by_digest WHERE SCHEMA_NAME='kkbox'
ORDER BY COUNT_STAR DESC LIMIT 5\G;

But now i would like to know which statements are executed the most, but do not utilize indexes?

SELECT IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
    DIGEST AS digest,
    SCHEMA_NAME as db,
    DIGEST_TEXT AS query,
    COUNT_STAR AS execution_count,
    SUM_NO_INDEX_USED AS no_index_used_count,
    SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
    SUM_ROWS_EXAMINED AS rows_scanned,
    FIRST_SEEN AS first_executed,
    LAST_SEEN AS last_executed
FROM performance_schema.events_statements_summary_by_digest WHERE SCHEMA_NAME='kkbox' AND (SUM_NO_INDEX_USED > 0 OR SUM_NO_GOOD_INDEX_USED > 0)
ORDER BY COUNT_STAR DESC LIMIT 5\G;

But now i would like to know which statements scan the most rows in a table, yet run the least number of times on the server. I would also like to know if these same statements utilize indexes.

SELECT IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
    DIGEST AS digest,
    SCHEMA_NAME as db,
    DIGEST_TEXT AS query,
    COUNT_STAR AS execution_count,
    SUM_NO_INDEX_USED AS no_index_used_count,
    SUM_NO_GOOD_INDEX_USED AS no_good_index_used_count,
    SUM_ROWS_EXAMINED AS rows_scanned,
    (SUM_ROWS_EXAMINED/COUNT_STAR) as avg_rows_scanned_per_exec,
    FIRST_SEEN AS first_executed,
    LAST_SEEN AS last_executed
FROM performance_schema.events_statements_summary_by_digest WHERE SCHEMA_NAME='kkbox' AND (SUM_NO_INDEX_USED > 0 OR SUM_NO_GOOD_INDEX_USED > 0)
ORDER BY SUM_ROWS_EXAMINED DESC, COUNT_STAR LIMIT 5\G;

Now, i would like to list all statements that use temporary tables, ordered by number of disk temporary tables first, then the number of memory tables.

SELECT IF(SUM_NO_GOOD_INDEX_USED > 0 OR SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
       DIGEST AS digest,
       SCHEMA_NAME AS db,
       DIGEST_TEXT AS query,
       COUNT_STAR AS execution_count,
       SUM_CREATED_TMP_TABLES AS memory_tmp_tables,
       SUM_CREATED_TMP_DISK_TABLES AS disk_tmp_tables,
       ROUND(SUM_CREATED_TMP_TABLES / COUNT_STAR) AS avg_tmp_tables_per_query,
       ROUND((SUM_CREATED_TMP_DISK_TABLES / SUM_CREATED_TMP_TABLES) * 100) AS tmp_tables_to_disk_pct,
       FIRST_SEEN AS first_executed,
       LAST_SEEN AS last_executed    
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_CREATED_TMP_TABLES > 0 AND SCHEMA_NAME='kkbox'
ORDER BY SUM_CREATED_TMP_DISK_TABLES DESC, SUM_CREATED_TMP_TABLES DESC LIMIT 5\G;

I would now also like to list statements that produce errors.

SELECT DIGEST AS digest,
       SCHEMA_NAME AS db,
       DIGEST_TEXT as query,
       COUNT_STAR AS execution_count,
       SUM_ERRORS AS errors_total,
       (SUM_ERRORS / COUNT_STAR) * 100 as error_pct,
        FIRST_SEEN AS first_executed,
       LAST_SEEN AS last_executed    
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_ERRORS > 0 AND SCHEMA_NAME='kkbox'
ORDER BY SUM_ERRORS DESC LIMIT 5\G;


Unnormalized statements
## List top 10 queries running the most
SELECT
d.COUNT_STAR AS execution_count, d.DIGEST, c.SQL_TEXT AS original_query, c.DIGEST_TEXT as normalized_query, c.SOURCE AS source, c.CURRENT_SCHEMA as db FROM events_statements_current as c
JOIN events_statements_summary_by_digest as d  WHERE d.DIGEST=c.DIGEST AND c.SQL_TEXT IS NOT NULL
ORDER BY d.COUNT_STAR DESC limit 10\G

## List top 10 queries running the most, but not utilizing indexes.
SELECT IF(d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
d.COUNT_STAR AS execution_count, d.DIGEST, c.SQL_TEXT AS original_query, c.DIGEST_TEXT as normalized_query, c.SOURCE AS source, c.CURRENT_SCHEMA as db, d.SUM_NO_GOOD_INDEX_USED AS total_bad_index, d.SUM_NO_INDEX_USED AS total_no_index_used
 FROM events_statements_current as c
JOIN events_statements_summary_by_digest as d  WHERE d.DIGEST=c.DIGEST AND (d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0)
AND c.SQL_TEXT IS NOT NULL
ORDER BY d.COUNT_STAR DESC limit 10\G

## List top 10 queries scanning the most records without utilizing indexes.
SELECT IF(d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
d.COUNT_STAR AS execution_count, d.DIGEST, c.SQL_TEXT AS original_query, c.DIGEST_TEXT as normalized_query, c.SOURCE AS source, c.CURRENT_SCHEMA as db, d.SUM_NO_GOOD_INDEX_USED AS total_bad_index, d.SUM_NO_INDEX_USED AS total_no_index_used, d.SUM_ROWS_EXAMINED AS total_rows_scanned
 FROM events_statements_current as c
JOIN events_statements_summary_by_digest as d  WHERE d.DIGEST=c.DIGEST AND (d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0)
AND c.SQL_TEXT IS NOT NULL
ORDER BY d.SUM_ROWS_EXAMINED DESC, COUNT_STAR DESC limit 10\G

## List top 10 queries scanning the most records without utilizing indexes.
SELECT IF(d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0, '*', '') AS full_scan,
d.COUNT_STAR AS execution_count, d.DIGEST, c.SQL_TEXT AS original_query, c.DIGEST_TEXT as normalized_query, c.SOURCE AS source, c.CURRENT_SCHEMA as db, d.SUM_NO_GOOD_INDEX_USED AS total_bad_index, d.SUM_NO_INDEX_USED AS total_no_index_used, d.SUM_ROWS_EXAMINED AS total_rows_scanned
 FROM events_statements_current as c
JOIN events_statements_summary_by_digest as d  WHERE d.DIGEST=c.DIGEST AND (d.SUM_NO_GOOD_INDEX_USED > 0 OR d.SUM_NO_INDEX_USED > 0)
AND c.SQL_TEXT IS NOT NULL
ORDER BY d.SUM_ROWS_EXAMINED DESC,  DESC limit 10\G

##List the top 10 most update statements
SELECT
  digest_text,
  ((count_star/update_total) * 100) as percentage_of_all
FROM events_statements_summary_by_digest,
(SELECT sum(count_star) update_total
  FROM events_statements_summary_by_digest
  WHERE digest_text LIKE 'UPDATE%'
) update_totals
WHERE digest_text LIKE 'UPDATE%'
ORDER BY percentage_of_all DESC
LIMIT 10



CONCLUSION

Using the Performance Schema, it is very easy to see what is actually running on your MySQL server. Though tools like mytop and commands like SHOW PROCESSLIST can help us to see what is happening on our mysql server in real time, the Performance Schema helps us to understand the different statements that run on our MySQL server over time. This makes the performance schema more helpful in planning  profiling and optimizing statements. A more detail review of the data provided by the performance schema would help us understand patterns of statement execution, and in return help us make decisions that would bring improvement on system scalability.

REFERENCES:
[1] http://www.markleith.co.uk/2012/07/04/mysql-performance-schema-statement-digests/
[2] https://mariadb.com/kb/en/mariadb/performance-schema-events_statements_summary_by_digest-table/
[3] https://www.vividcortex.com/blog/2014/11/03/mysql-query-performance-statistics-in-the-performance-schema/

No comments:

Post a Comment