HTML compression breaks CDATAs
-
The HTML compression seems to be changing CDATAs so something like this :
<script type=”text/javascript”>
/*<![CDATA[*/
…
/*]]>*/
</script>becomes this :
<script type=”text/javascript”>
/**/
…
/**/
</script>
-
ow, that indeed would constitute a (minor) bug wildmice, it should simply be
<script type="text/javascript"> ... </script>
I’ll have a look and fix on the GitHub version today or tomorrow.
thanks for bringing this to my attention!
frank
Yes, it would be better to not lose the JS code. I guess people could live without the xHTML compliance, although it would be nice to retain the CDATA too. However, i can imagine the complexity of parsing something like that ??
Yes, it would be better to not lose the JS code.
that I don’t understand, there’s no JS lost, is there?
Apologies, there were two things i found today that were not handling this case, and it was the other one that was making the entire JS code disappear. Sorry for my lapse…
The way it’s posted above is correct, where “…” is the JS code.
that’s a relief ??
OK, the fix; open up wp-content/plugins/autoptimize/classes/external/php/minify-html.php and on line 41 replace
protected function _removeCdata($str) { return (false !== strpos($str, '<![CDATA[')) ? str_replace(array('<![CDATA[', ']]>'), '', $str) : $str; }
with
protected function _removeCdata($str) { return (false !== strpos($str, '<![CDATA[')) ? str_replace(array('/*<![CDATA[*/','/*]]>*/','<![CDATA[', ']]>'), '', $str) : $str; }
Have a go at this and if OK I’ll commit to GitHub and make sure this goes in the next version.
frank
I made that change, but it looks like the fix does not work. The CDATA and comments around it get completely removed. …Or maybe that was what you intended?
Here is the code before i enable HTML compression :
<script>
/*<![CDATA[*/
(function(i,s,o,g,r,a,m){i[‘GoogleAnalyticsObject’]=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,’script’,’//www.google-analytics.com/analytics.js’,’ga’);
ga(‘create’, ‘UA-65746333-1’, ‘auto’);
ga(‘send’, ‘pageview’);
/*]]>*/
</script>And here is what it looks like with HTML compression turned on :
<script>(function(i,s,o,g,r,a,m){i[‘GoogleAnalyticsObject’]=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,’script’,’//www.google-analytics.com/analytics.js’,’ga’);
ga(‘create’, ‘UA-65746333-1’, ‘auto’);
ga(‘send’, ‘pageview’);</script>For JS code inside a script tag to be valid XML/XHTML, we need the CDATA there.
The CDATA and comments around it get completely removed. …Or maybe that was what you intended?
It was ??
For JS code inside a script tag to be valid XML/XHTML, we need the CDATA there.
If that is important to you, I have some good news; if the HTML minifier detects it is dealing with XHTML and if the JS contains characters that would need to be either escaped or CDATA’ed (typically < and &), it will indeed keep the CDATA-sections.
If that does not work to your satisfaction, you can always wrap the entire JS-block in noptimize-tags and it will be left alone.
Have a fun weekend,
frankDid not realize that nooptimize would work for JS as well…
That is a nice (and smart) solution, as long as you are not relying on the doctype to determine whether it’s xhtml, because HTML5 can be either xhtml or html4-style html.
Practically speaking, this is really only a concern for purists, and just something i passed along because i noticed it, not because it broke anything functional. Although i do use XHTML everywhere…
Well… i went to put nooptimize tags around that JS and realized it is already being excluded by AO without my doing anything. Isn’t all inline JS supposed to be combined too, unless it’s inside nooptimize tags?
That is a nice (and smart) solution, as long as you are not relying on the doctype to determine whether it’s xhtml, because HTML5 can be either xhtml or html4-style html.
It (the HTML minifier that’s used by AO) does. I could however add a filter to AO which passes the xhtml=true argument to the minifier.
Isn’t all inline JS supposed to be combined too, unless it’s inside nooptimize tags?
Inline JS is not aggregated if;
* a function hooks into the filter “autoptimize_filter_js_inline” returning false
* the JS optimization-exclusion list contains a word that is found in the inline JS
* the JS is between noptimize-tags
* the JS contains one of the hardcoded exclusion strings (e.g. post_id or nonce -> needed to exclude inline JS that would cause the cache to be busted too frequently)Specifically to prevent the CDATA-removal, you’d have to noptimize anyway, wrapping the entire script (including opening and closing-script tags) in them.
frank
I went ahead and added the filter to enforce the HTML minifier to treat the HTML as XHTML on github, cfr. this commit. Still to be tested though ??
frank
Ok, thanks. This is the standard Google Analytics JS, and apparently something in it matches something in your default exclusions. But i’m not especially worried about inline JS that’s at the bottom of the file. There’s not much of a performance problem there i don’t think.
As for how to detect XHTML, that may be a can of worms if you don’t use the doctype. Do you really want to go there…? Even if a theme developer uses xhtml, there would be little control over html produced by plugins, so WordPress is basically html soup with assorted standards. I don’t see any way to have a pure xhtml WP site. So does this really matter? I think the only important thing is that valid xhtml should not be changed.
Well that’s my opinion, but i’m ok with using nooptimize tags to preserve CDATAs.
Anyway, thanks for looking into this. The more i think about it the less i think that html standards matter any more in html5, as long as it’s html of some sort.
I will try to test that Github version tomorrow.
- The topic ‘HTML compression breaks CDATAs’ is closed to new replies.