• Hi,

    we’ve been using the mysql slow-query log to analyze WP performance on our site, and we’ve figured out that some of our regular slow queries include this sort of SQL construct:

    SELECT DISTINCT * from (table, join, etc.) …

    today i did some empirical tests and found that that DISTINCT clause was slowing us down significantly, while not changing the results we get. right now we’re running without, in a “watch carefully” mode.

    now, i try not to understand things until i have to, or to any greater depth than necessary. =) but today i did the research, and it seems to me that to put the three strings together ‘SELECT DISTINCT *’ and hand them to SQL is probably a mistake.

    if i understand SQL right (and maybe i don’t), if you’re getting every single column back from every table in your possibly multi-table query (as specified by SELECT *), your results will never include duplicate rows. at least not in MySQL, in the way it’s documented to behave. at least when you’re selecting on tables that contain unique keys, as all the wp_tables do.

    so with no duplicate rows, the DISTINCT clause, at best, does nothing. but at worst — in mysql at least — it seems to cause the query to be kicked into a ‘temporary table’ aka a file, aka a massive slowdown.

    at least that’s what we were seeing with the get-recent-comments plugin. we took out the DISTINCT and our performance problems literally vanished. so far we’ve seen no duplicate comments displayed. (but i’ll let you know if we get bitten.)

    i would like the opinion of more wizened wp/sql hackers about this, because it appears that another use of ‘SELECT DISTINCT *’ is sitting right in the get_post function. is this a performance improvement waiting to be seized, or a dumb n00b idea? please advise.

    -mykle-

Viewing 8 replies - 1 through 8 (of 8 total)
  • Ugh…. are you serious…. there’s SELECT DISTINCT * code in there? If so, I may have to go spellunking though the code and eradicate them. In general, it’s not a good idea to use DISTINCT anything…. unless absolutely necessary. As a matter of practice, I make sure that my where rules out as many dupes as possible…. in those rare caseswhere it’s not sufficient… I find Group Bys to me more effective.

    -tg

    Thread Starter mykle

    (@mykle)

    i have a wp 2.2 install where i found it in wp-includes/functions.php … i’m not sure how current that is. i did a find/exec/grep through every .php file looking for it, and that’s the only one i found. but in the get-recent-comments plugin (in a different, 2.0 install) it’s like night and day!

    having thought about it more, i can see how if you have duplicate rows in your table or one of the tables in your join, SELECT DISTINCT * could be useful. so i guess the answer to my question in the subject line is “occasionally”. but for the cases where you join between tables that all have primary not-null keys, it should never be necessary.

    we are continuing to enjoy the significant performance boost that we got from removing that DISTINCT clause in just two spots in that plugin. we are using it on our front page, so it gets hit constantly.

    Moderator Samuel Wood (Otto)

    (@otto42)

    www.ads-software.com Admin

    There is only one use of SELECT DISTINCT * code in WordPress, and that is in the get_posts() function, where it’s selecting the posts from the posts table.

    In that case, you could probably remove the DISTINCT without any issues.

    DISTINCT within SQL is quite often mis-used, and will slow down SELECT jobs as it forces sorts in order to resolve duplicates.

    If you remove the DISTINCT from whichever bit of code you’re looking at and suddenly experience problems with duplicate results or similar, you could always put it back again – but I’d say any piece of SQL that is relying on a DISTINCT to bring the correct row back is skating on thin ice anyway.

    Moderator Samuel Wood (Otto)

    (@otto42)

    www.ads-software.com Admin

    I removed the “DISTINCT” from my own wp-includes/post.php file last week, in the get_posts() function, and have not noticed any significant differences. The results all seem the same, and I can’t see any obvious measurable speed increase, without further testing. Note that it’s also using a GROUP BY the posts->ID, which is the primary key for that table. I’m not sure that’s strictly needed either.

    There are valid uses of DISTINCT in WordPress to eliminate duplicate results, such as in the wp_get_archives() function, to get the various years/months that older posts were made on. These are limited cases though. The one in the get_posts() function does seem unnecessary, although there may be unusual cases that I’m missing.

    I put in a trac ticket for this one, to get that query optimized some more: https://trac.www.ads-software.com/ticket/4665

    I would be very interested to hear what Matt says about this. Apparently the mysql guy himself has been through all the db interactions in WP.

    Can some one help me out to create new user and database in MYSQL server

    Thanks,
    Saurabh

    @sabxin: You need to search. If that fails start a new thread please.

Viewing 8 replies - 1 through 8 (of 8 total)
  • The topic ‘is “SELECT DISTINCT *” ever a good idea?’ is closed to new replies.