• At my workplace, I do a lot of work on importing / exporting WP data and we pull in batches of stories from our external content management site via RSS. We are currently working on trying to go live with a few WP sites with over 500K posts.

    My question / complaint / suggestion is about how long it takes on a large site like these to do term counting. We can’t have our DB bogged down for 30 minutes every time a recount needs to happen. I’m not sure why such an expensive query needs to run for each term, but I’d like to propose the beginnings of a better way.

    Here is the current query which runs for each term_taxonomy_id:
    SELECT COUNT(*) FROM wp_term_relationships, wp_posts WHERE wp_posts.ID = wp_term_relationships.object_id AND post_status = 'publish' AND post_type IN ('post') AND term_taxonomy_id = 375;

    Contrast that with this query which returns all the term_taxonomy_ids with their counts:

    select tr.term_taxonomy_id, count(p.ID) as numposts
    from wp_posts p 
    inner join wp_term_relationships tr on p.ID = tr.object_id 
    where p.post_status = 'publish' and p.post_type in ('post') 
    group by tr.term_taxonomy_id;
    +------------------+----------+
    | term_taxonomy_id | numposts |
    +------------------+----------+
    |                1 |      166 |
    |              223 |     3426 |
    |              230 |    13742 |
    |              263 |       24 |
    |              265 |     2076 |
    |              287 |   530851 |
    |              288 |     1511 |
    |              289 |    39231 |
    |              290 |    22945 |
    |              291 |      912 |
    |              293 |   346980 |
    |              295 |    17818 |
    |              296 |       74 |
      --- snipped ---
    |              952 |        1 |
    |              953 |        1 |
    |              954 |        1 |
    +------------------+----------+
    665 rows in set (1.72 sec)

    Sure it’s also an expensive query, but you get all the data at once in seconds instead of minutes.

  • The topic ‘term counting is inefficient’ is closed to new replies.