Special characters in URL
-
Hi,
I use your extension and it’s very useful. Thanks for your work.
I’m from France, and here we use a lot of special characters such as: é è à ? …
Obviously, some contributors upload files whose names contain some of these characters. I have two questions:
1. The URLs of these files are marked as broken, because Broken Link Checker stops reading them when it meets a special character.
For example, the file URL “http://www.mydomain.com/wp-content/uploads/2015/02/réveil.jpg”
is read by the extension as “http://www.mydomain.com/wp-content/uploads/2015/02/r”
Is that normal?
2. Is the only way to correct them to paste the true name (with special characters) and modify the URL?
I have already added a function to clean these special characters when contributors upload files, so I should be fine for future uploads.
Thanks in advance for your answer.
Matt
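The upload-time cleanup mentioned above could look roughly like the sketch below. It is written in Python purely for illustration (the site runs WordPress, so the real function would be PHP), and the name `ascii_filename` is hypothetical; it is not part of the plugin or of WordPress.

```python
import unicodedata


def ascii_filename(name: str) -> str:
    """Fold accented letters to plain ASCII (hypothetical helper)."""
    # NFKD splits a letter like 'é' into 'e' plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining accents, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Discard anything that still is not ASCII (for example '€').
    return stripped.encode("ascii", "ignore").decode("ascii")


print(ascii_filename("r\u00e9veil.jpg"))
```

Characters with no ASCII base letter are simply removed, so a real implementation might prefer to substitute a placeholder instead.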
-
1. The URLs of these files are marked as broken, because Broken Link Checker stops reading them when it meets a special character.
For example, the file URL “http://www.mydomain.com/wp-content/uploads/2015/02/réveil.jpg”
is read by the extension as “http://www.mydomain.com/wp-content/uploads/2015/02/r”
Is that normal?
That’s not normal. A few other users have reported similar issues in the past. However, I’ve never been able to reproduce the problem. This makes it very difficult to determine what might be causing it.
For example, if I add your example URL to one of my test sites, the plugin parses it correctly, including the special character. I even have a test post with 20+ links with different special characters in different parts of the URL, and somehow all of them are also parsed correctly.
This suggests that the problem might have something to do with server configuration since it only shows up on some sites and not others. Does your site use any unusual settings, like, say, a custom database character set?
2. Is the only way to correct them to paste the true name (with special characters) and modify the URL?
Sorry, I don’t know the answer to that. Depending on what exactly caused the plugin to stop at the special character, this approach might work, or it might replace only part of the URL.
Thanks for the fast answer.
I looked further. In fact, my special characters like “é” are stored in the post_content column of the wp_posts table as “é”.
I think that’s the type of character your plugin can’t decode.
I ran an SQL query on the database to replace every “é” with “é”,
and after that your plugin seems able to read the URLs with special characters. (I will confirm, because it’s still scanning.)
Unfortunately, when I do that, my images (with special characters) no longer display on the website, even though the browser can display them when I open them on their own.
In my case, it seems I don’t have any good solution…
I looked further. In fact, my special characters like “é” are stored in the post_content column of the wp_posts table as “é”.
I think that’s the type of character your plugin can’t decode.
Those characters appear identical to me. What is the difference? Are they different Unicode codepoints or some such?
Lol! Sorry, it was transformed when I published the comment :-/
I’m going to put an underscore between each character:
For example: é is stored in my database as &_#_2_3_3_;
I see, so one is a numeric HTML entity and the other is a “raw” UTF-8 character.
Try the development version of the plugin. It might help with parsing HTML entities.
I updated it. Thanks.
I just started a global recheck of the links.
It will take a few hours.
I will tell you tomorrow whether my links with raw characters are still marked as broken.
Thanks again for your help.
Unfortunately, it doesn’t seem to work…
For example, this URL: https://www.mafamillezen.com/wp-content/uploads/2010/04/fessée.jpg is still marked as broken (see capture): https://i.imgur.com/eWXRIT6.png
The URL recorded by the plugin is cut off where the special character is.
Just to verify:
- That URL contains an HTML entity and not a plain UTF-8 character, correct?
- You started the global recheck after installing the development version, yes? As a test, you can force the plugin to re-parse a specific post by making some minor change, like editing a single word or adding a line break.
- You looked at the database before. Could you also verify that the “wp_blc_links” table uses the “utf8” character set? The “url” column is especially important in this case and should also use “utf8”.
- Take a look at the “wp_options” table. What is the “blog_charset” option set to?
- Does the character set of the “wp_blc_links” table match the character set of the “wp_posts” table?
My answers:
– That URL contains an HTML entity and not a plain UTF-8 character, correct?
=> Here is a URL marked as broken, copied from a post:
https://www.mafamillezen.com/wp-content/uploads/2010/04/fess&_#_2_3_3_;e.jpg
(I only added the underscores for this reply; otherwise you would see a normal é.)
– You started the global recheck after installing the development version, yes?
=> Yes, I pushed the button to force a recheck after the update yesterday.
– As a test, you can force the plugin to re-parse a specific post by making some minor change, like editing a single word or adding a line break.
=> Where is this option? What exactly do you want me to add or modify in the post?
– You looked at the database before. Could you also verify that the “wp_blc_links” table uses the “utf8” character set?
=> utf8_general_ci
– The “url” column is especially important in this case and should also use “utf8”.
=> utf8_bin
– Take a look at the “wp_options” table. What is the “blog_charset” option set to?
=> UTF-8
– Does the character set of the “wp_blc_links” table match the character set of the “wp_posts” table?
=> I can’t read it directly in the database because the column is in BLOB format, but if I open the BLOB from the “url” column for the example I gave in my previous post, I get this URL: https://www.mafamillezen.com/wp-content/uploads/2010/04/fess
That means the HTML entity is missing and the URL is cut off, just as I see it in the back office.
Thanks again!
Well, I’m almost out of ideas. Everything you posted above looks correct. All of the character sets are what they should be.
The only remaining thing that I can suggest is this: download the new development version and enable logging in Settings -> Link Checker -> Advanced. The plugin will log various debugging information about each link it finds to the specified file – including the original URL and the URL that was stored in the database. Maybe that will help figure out when the URL gets truncated.
OK, I enabled logging.
I’ll see what it says.
One last question.
When I replace all the HTML entities in the database (for example, &_#_2_3_3_; with é), your plugin works well and doesn’t mark these links as broken.
The only problem is that, if I do that, the images no longer display on the website (although I can display them on their own in the browser, by opening the image in a new window).
I know this is outside the scope of your plugin, but would you have any idea how to make them display properly in the post? :-/
Thanks again for your time.
When I replace all the HTML entities in the database (for example, &_#_2_3_3_; with é), your plugin works well and doesn’t mark these links as broken.
The only problem is that, if I do that, the images no longer display on the website (although I can display them on their own in the browser, by opening the image in a new window).
In theory, the plugin and the browser should treat &#233; and é as the same thing. They’re just two ways to represent the same Unicode character.
As far as I know, the only way for that to go wrong would be to use the wrong character set or encoding somewhere. However, everything you’ve posted so far indicates that your site is using the right encoding.
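To illustrate the equivalence described above, here is a small Python sketch (not the plugin’s own code, which is PHP). It decodes the numeric entity in the example URL from this thread and compares the result with the raw UTF-8 form; the variable names are only illustrative.

```python
import html
from urllib.parse import quote

# The same image URL written two ways: with a numeric HTML entity
# and with the raw character (U+00E9, "e" with an acute accent).
entity_url = "https://www.mafamillezen.com/wp-content/uploads/2010/04/fess&#233;e.jpg"
raw_url = "https://www.mafamillezen.com/wp-content/uploads/2010/04/fess\u00e9e.jpg"

# html.unescape turns "&#233;" into the character it stands for.
decoded = html.unescape(entity_url)
print(decoded == raw_url)  # True

# Percent-encoded form that is actually sent over HTTP (UTF-8 bytes C3 A9).
print(quote(decoded, safe=":/"))
```

If a parser stops at the `&` instead of decoding the entity, it ends up with the truncated “…/fess” form reported earlier in the thread.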
I think I’m going to run my SQL queries, and I will deal with the image display problem afterwards.
After that, I will confirm that I have no more broken links.
Thanks again for your valuable help.
Here are all the special characters I had in URLs and replaced in the database:
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#233;', 'é');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#234;', 'ê');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#232;', 'è');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#224;', 'à');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#200;', 'è');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#235;', '?');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#238;', '?');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#169;', '?');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#239;', '?');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#231;', '?');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#201;', 'é');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#252;', 'ü');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#226;', 'a');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#8364;', '€');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#199;', '?');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#246;', '?');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#251;', '?');
UPDATE wp_posts SET post_content = REPLACE(post_content, '&#185;', '1');
The main ones are read correctly by your plugin (é, à, è…).
It seems that some are not (€, ?…), but those aren’t really allowed characters in URLs, so I guess that’s completely normal.
Thanks again for your help and time.
I’m marking the topic as resolved.
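As a possible alternative to one REPLACE per character, the same substitution could be done generically and limited to link and image URLs. The Python sketch below is only an illustration under assumptions: the function name `decode_entities_in_urls` is hypothetical, it handles only double-quoted `src` and `href` attributes, and it works on a string of post content rather than on the database. Since images reportedly stopped displaying after the replacement in this thread, it would be wise to try it on a copy of the data first.

```python
import re

# Double-quoted src="..." or href="..." attribute values only (an assumption).
URL_ATTR = re.compile(r'((?:src|href)=")([^"]*)(")')
# Decimal numeric entities such as &#233;
NUMERIC_ENTITY = re.compile(r"&#(\d+);")


def _decode_entity(match: re.Match) -> str:
    code = int(match.group(1))
    # Leave ASCII entities (such as &#34;) and out-of-range values untouched.
    if 128 <= code <= 0x10FFFF:
        return chr(code)
    return match.group(0)


def decode_entities_in_urls(post_content: str) -> str:
    """Replace non-ASCII numeric entities inside src/href values only."""
    def fix(match: re.Match) -> str:
        before, url, after = match.groups()
        return before + NUMERIC_ENTITY.sub(_decode_entity, url) + after

    return URL_ATTR.sub(fix, post_content)


sample = '<img src="/wp-content/uploads/2010/04/fess&#233;e.jpg" alt="fess&#233;e">'
print(decode_entities_in_urls(sample))
```

Entities outside URL attributes (the `alt` text in the sample) are left as they are, so ordinary post text is not rewritten.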
- The topic ‘Special characters in URL’ is closed to new replies.