PERFORMANCE: SELECT COUNT() #30

kfo2010 · 2018-09-06T23:02:40Z

PROBLEM: The use of select count() in mysql is extremely inefficient and will not scale well with tables full of millions+ or even billions+ of rows.

PROOF:
sql: EXPLAIN SELECT COUNT(1) FROM transactions; # this shows that it did a full index/table scan of every row in the table. 1.5M rows. Full index/table scans are bad. They are SLOOOW and will not scale well.

sql: EXPLAIN SELECT COUNT(id) FROM transactions; # still full scan 1.5M rows

sql: EXPLAIN SELECT id FROM transactions ORDER BY id LIMIT 0,1; # still full scan 1.5M rows

sql: EXPLAIN SELECT id FROM transactions ORDER BY id LIMIT 1; # scanned ONE row.

sql: SHOW TABLE STATUS LIKE 'transactions'; # this took 80 milliseconds

RESULT: The difference is 2+ seconds vs 80 milliseconds. This is huge and the 2+ seconds will increase over time as data grows. The 80 milliseconds will not.

SOLUTION:

If you need to check if at least one result is returned use ORDER BY id LIMIT 0,1; instead of SELECT COUNT. Avoid using COUNT if at all possible. This will not scale.
SQL queries in code should be tested with EXPLAIN. If they are doing full table scans, they need to be removed completely from code and other methods should be used that can scale to billions of rows.
NOTE: InnoDB (MyISAM is dead) do not store row counts and will count() all rows each time.
if you need table counts try "SHOW TABLE STATUS LIKE ..." NOTE this is an approximation but should be fine for large tables when you need to do some sort of paging.
This should also speed up PAGING on the block explorer. I am seeing that the explorer is starting to get slower. This is one reason why.

angelexevior · 2020-03-24T15:42:33Z

I know this is old, but i'm throwing this out there.
Just a thought process, and a wicked technique i used in the past to optimize queries for a db having millions of rows. Not sure it will apply as we don't have auto-increment fields in db.

Added where count was necessary, a primary key, autoincrement id field (If not already). For this to work, no voids should exists between AI ids. (Let's call this AIid)
Last id, is your total table count.... -1 if you want to count from 0 and not from 1.
If looking for the count on specific set of results, SELECT (MAX(AIid)-MIN(AIid)) as counttx FROM transactions WHERE.... WHATEVER...
Of course an autoincrement index comes with it's own performance impact. Will have to test!

Thoughts?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERFORMANCE: SELECT COUNT() #30

PERFORMANCE: SELECT COUNT() #30

kfo2010 commented Sep 6, 2018

angelexevior commented Mar 24, 2020 •

edited

Loading

PERFORMANCE: SELECT COUNT() #30

PERFORMANCE: SELECT COUNT() #30

Comments

kfo2010 commented Sep 6, 2018

angelexevior commented Mar 24, 2020 • edited Loading

angelexevior commented Mar 24, 2020 •

edited

Loading