• Some browsers seem to pass the named anchor with the HTTP request (i.e. the server access log shows “GET /blog/#respond HTTP/1.1”). When the browser does this, WordPress refuses to ignore the #respond and returns a 404 instead. This is difficult to test, since most browsers quietly drop the named anchor (#respond) before sending the request to the server.

    I see the problem in my access logs and I have successfully reproduced it using the tool ‘curl’ (on Linux).

    Feel free to test it out (https://foobert.ath.cx/blog/#respond), but take note of the browser hiding this issue as listed above — something you won’t know without looking at the server access log.

    I’m running Apache 2.0.53, PHP 4.3.11, and WP 2.0.4.
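To make the difference concrete, here is a minimal Python sketch (not WordPress code) of how a client turns a URL into the request line that ends up in the access log. The function name `request_line` and the `strip_fragment` flag are made up for illustration; the second call mimics a client that forwards the named anchor instead of keeping it to itself.

```python
from urllib.parse import urlsplit


def request_line(url: str, strip_fragment: bool = True) -> str:
    """Build the HTTP/1.1 request line a client would send for `url`."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # A well-behaved client keeps the fragment (#respond) local.
    # With strip_fragment=False we imitate a client that sends it anyway.
    if not strip_fragment and parts.fragment:
        target += "#" + parts.fragment
    return f"GET {target} HTTP/1.1"


print(request_line("https://foobert.ath.cx/blog/#respond"))
print(request_line("https://foobert.ath.cx/blog/#respond", strip_fragment=False))
```

The first line printed is what most browsers send; the second is the form that shows up next to the 404 in the log.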

Viewing 8 replies - 1 through 8 (of 8 total)
  • Why should this work?
    https://foobert.ath.cx/blog/#respond
    Which post is it supposed to pick to respond to?

    It works fine on your comment links:
    https://foobert.ath.cx/blog/2006/10/16/mooney-homecoming/#respond

    Thread Starter foobert

    (@foobert)

    You seem to have a fairly recent version of Firefox, which doesn’t pass the #respond to the web server. Here’s your (I think) entry in Apache’s access log:

    70.136.64.147 - - [18/Oct/2006:14:02:23 -0700] "GET /blog/2006/10/16/mooney-homecoming/ HTTP/1.1" 200 4048 "https://foobert.ath.cx/blog/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"

    I do not have a list of browsers/versions that pass the #respond, but here’s an entry that got the 404:

    203.28.159.168 - - [18/Oct/2006:09:18:31 -0700] "GET /blog/2006/09/02/blue-border-around-mplayer-videos/#respond HTTP/1.1" 404 5337 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
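For anyone who wants to pull these entries out of their own logs, here is a rough Python sketch that scans Apache "combined" format lines for 404s whose request target contains a `#`. The regular expression and the helper name `fragment_404s` are assumptions written to fit the two entries quoted above; a differently configured log format would need a different pattern.

```python
import re

# Pattern for the Apache "combined" log format, matching the entries above.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def fragment_404s(lines):
    """Return (host, target, referer) for 404s whose target contains '#'."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m and m["status"] == "404" and "#" in m["target"]:
            hits.append((m["host"], m["target"], m["referer"]))
    return hits


sample = [
    '70.136.64.147 - - [18/Oct/2006:14:02:23 -0700] '
    '"GET /blog/2006/10/16/mooney-homecoming/ HTTP/1.1" 200 4048 '
    '"https://foobert.ath.cx/blog/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; '
    'en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7"',
    '203.28.159.168 - - [18/Oct/2006:09:18:31 -0700] '
    '"GET /blog/2006/09/02/blue-border-around-mplayer-videos/#respond HTTP/1.1" '
    '404 5337 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"',
]

for hit in fragment_404s(sample):
    print(hit)
```

Only the second sample entry is reported. Including the referer in the result makes it easy to see whether such requests arrive without one.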

    Thread Starter foobert

    (@foobert)

    Samboll — I should mention that I used /blog/#respond as an example to clarify what I meant by a “named anchor”. Although that anchor doesn’t exist in the main page, your second link is a perfectly valid case that DOES exist.

    The point is: passing the named anchor to the webserver shouldn’t cause a 404.

    Additional information: this is not an Apache problem. If I do a test with a simple file.html page and pass bogus #anchors, Apache doesn’t care and still serves the file up just fine. I think the problem is in how WP attempts to decipher which post to fetch, and that it includes the #.* string in this process. Looking through xmlrpc.php is what leads me to believe this, but I’ve got a lot to learn about PHP before I’d be able to begin debugging it.

    foolip

    (@foolip)

    I see this problem in my logs as well, and I believe it is a bug in some versions of IE 6, since it only happens for that User-Agent. I’m just guessing, but I think that if the # comes directly after a slash, IE doesn’t interpret it as an anchor but instead sends it to the server as part of the URI. One solution would be to ensure this kind of URI is never used, i.e. not putting / at the end of permalinks. The other solution might be to filter out the #anchor of the URL when it arrives at WordPress. This would at least serve the correct page despite the stupid URL, although it would probably not jump to the proper anchor.

    foolip

    (@foolip)

    Adding the following to wp-settings.php fixed the problem, although I am not convinced that it is a good solution.

    $_SERVER['REQUEST_URI'] = preg_replace('/#respond$/', '', $_SERVER['REQUEST_URI']);

    Using a wildcard to remove all trailing #anchors is probably not a good idea, as it would just hide potential future unrelated issues instead of letting them show up as 404s in the logs.
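The trade-off between the narrow filter and the wildcard can be sketched outside WordPress. The following is a Python illustration only, not a drop-in replacement for the PHP line above; the function names `strip_respond` and `strip_any_fragment` are hypothetical.

```python
import re


def strip_respond(request_uri: str) -> str:
    """Narrow filter: remove only a trailing '#respond'."""
    return re.sub(r"#respond$", "", request_uri)


def strip_any_fragment(request_uri: str) -> str:
    """Wildcard filter: drop everything from the first '#' onward."""
    return request_uri.split("#", 1)[0]


uri = "/blog/2006/10/16/mooney-homecoming/#respond"
other = "/blog/2006/10/16/mooney-homecoming/#something-else"

print(strip_respond(uri))         # trailing #respond removed
print(strip_respond(other))       # left untouched, so it would still 404
print(strip_any_fragment(other))  # fragment silently removed
```

With the narrow filter, any other stray anchor still reaches the permalink lookup unchanged and keeps producing a visible 404, which is the behavior argued for above.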

    foolip

    (@foolip)

    Inspecting the logs a bit closer leads me to believe that this is actually just spambots posing as IE that are too stupid to strip the #respond out of the URL. These are all one-off requests to my server, without any referer and without previously loading any images or anything else. So this shouldn’t be fixed; it should be left as is for stupid bots to trip on.

    Thread Starter foobert

    (@foobert)

    I agree with foolip about this being a “problem” that appears to only surface as the result of poorly implemented spambots. That being said, I’ve generally stopped worrying about it.

    However, that doesn’t answer my principal question on the matter: Is passing the #anchor from the browser to the web server legal in the HTTP protocol or not? And if it is legal, then I think there is a bug in WordPress, even if it’s a mostly beneficial bug!

    Of course, since I’m too lazy to even find the HTTP protocol spec to answer what amounts to a passing curiosity at this point in time, much less spend any time wading through it, I’m not the least bit surprised that no one else has bothered either.

    Cheers,
    ~foobert

    I agree there is an issue here with named anchor tags. For example right now if I go to the WordPress blog itself:
    https://www.ads-software.com/development/

    And scroll down to the March 10, 2007 entry about SXSW, check out the “Read on for more »” link at the bottom of that post. The link points to this URL:
    https://www.ads-software.com/development/2007/03/wordpress-at-sxsw/#more-200

    So far so good, it looks fine. Let’s look at the HTTP status code returned:

    HTTP/1.1 301 Moved Permanently
    X-Powered-By: PHP/4.4.4
    X-Pingback: https://www.ads-software.com/development/xmlrpc.php
    Expires: Wed, 11 Jan 1984 05:00:00 GMT
    Last-Modified: Sun, 18 Mar 2007 01:43:04 GMT
    Cache-Control: no-cache, must-revalidate, max-age=0
    Pragma: no-cache
    Location: https://www.ads-software.com/development/2007/03/wordpress-at-sxsw/#more-200/
    Content-type: text/html; charset=utf-8
    Server: LiteSpeed
    Date: Sun, 18 Mar 2007 01:43:04 GMT
    Connection: close

    Okay, so it issues a 301 and gives the new location. That should be fine. Here is the new URL it says to look at:
    https://www.ads-software.com/development/2007/03/wordpress-at-sxsw/#more-200/

    Let’s check out the HTTP status code from that request:

    HTTP/1.1 404 Not Found
    X-Powered-By: PHP/4.4.4
    X-Pingback: https://www.ads-software.com/development/xmlrpc.php
    Expires: Wed, 11 Jan 1984 05:00:00 GMT
    Last-Modified: Sun, 18 Mar 2007 01:43:38 GMT
    Cache-Control: no-cache, must-revalidate, max-age=0
    Pragma: no-cache
    Content-type: text/html; charset=utf-8
    Server: LiteSpeed
    Date: Sun, 18 Mar 2007 01:43:38 GMT
    Connection: close

    That isn’t good: it is issuing a 404 for that request. My browser seems to handle the 404 by then removing the named anchor, but I’m not sure getting a 404 back is desirable. I track broken links via my server logs, and there are thousands of 404s now from “/#more-” links generated by the_content().
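One guess at how a Location header like the one above could arise is that a trailing slash gets appended to the whole request string, anchor included. The Python sketch below only illustrates that guess and is not taken from the WordPress source; both function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def naive_add_slash(url: str) -> str:
    """Append '/' to the raw string, which lands after any '#anchor'."""
    return url if url.endswith("/") else url + "/"


def add_slash_to_path(url: str) -> str:
    """Append '/' to the path component only, leaving the anchor intact."""
    parts = urlsplit(url)
    path = parts.path if parts.path.endswith("/") else parts.path + "/"
    return urlunsplit(parts._replace(path=path))


url = "https://www.ads-software.com/development/2007/03/wordpress-at-sxsw/#more-200"

print(naive_add_slash(url))    # ends in '#more-200/', like the 301 target
print(add_slash_to_path(url))  # unchanged: the path already ends in '/'
```

Treating the anchor separately from the path would leave this URL as it is, so no redirect to a nonexistent address would be needed.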

  • The topic ‘Named anchors causing 404’ is closed to new replies.