I spoke with my developer about this in more detail. Here’s what he said:
It’s true that our URL validation (or any URL validation, for that matter) isn’t perfect, but it’s not true that we only check for “http” when validating URLs. That’s what the validation error message says most of the time, but only as a reminder not to forget the scheme. It doesn’t mean that’s the only thing we look for when validating input.
We use PHP’s built-in filter_var() function for URL validation. According to PHP’s documentation, this function validates URLs against the URI syntax defined in RFC 2396 (“Uniform Resource Identifiers (URI): Generic Syntax”). It doesn’t cover every case, but it’s probably better than anything we could hand-roll.
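As a rough illustration (the variable and the messages here are made up for the example, not our actual form-handling code), the check boils down to something like this:

```php
<?php
// Minimal sketch of the syntax check described above; the input value and
// messages are illustrative only.
$url = 'https://example.com/path?query=1';

if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
    echo "Accepted: well-formed URL\n";
} else {
    echo "Rejected: not a well-formed URL\n";
}
```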
The client suggests using FILTER_FLAG_HOST_REQUIRED, but that flag is redundant on modern versions of PHP (5.5+). It might make a difference on older PHP setups, but adding it to our code would be a mistake and would only remain useful for a short while longer (and only on certain PHP setups). What that flag used to do is now the default behavior of filter_var() when validating URLs.
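You can see that default behavior directly: a value with a scheme but no host is already rejected without passing any extra flags.

```php
<?php
// Host-less URLs fail validation by default; no extra flag is needed.
var_dump(filter_var('http://', FILTER_VALIDATE_URL));            // bool(false)
var_dump(filter_var('http://example.com', FILTER_VALIDATE_URL)); // string(18) "http://example.com"
```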
His example isn’t wrong about reachability (https://www.qwerty probably doesn’t exist, while https://www.qwerty.com does), but it’s not the whole story: https://www.qwerty is still a valid URL in the sense that its grammar is properly formed. New TLDs are being added all the time, and the name could also refer to a hostname on a local network. Switching to that stricter validation would start rejecting some legitimate URLs, depending on what users enter.
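A few concrete inputs make the distinction clearer (illustrative values run through the same filter_var() call):

```php
<?php
// filter_var() checks the grammar of the URL, not whether the host resolves.
var_dump(filter_var('https://www.qwerty', FILTER_VALIDATE_URL));     // valid: well-formed even if it never resolves
var_dump(filter_var('https://www.qwerty.com', FILTER_VALIDATE_URL)); // valid
var_dump(filter_var('http://intranet', FILTER_VALIDATE_URL));        // valid: could be a local network hostname
var_dump(filter_var('www.qwerty.com', FILTER_VALIDATE_URL));         // bool(false): missing scheme
```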
Short of actually trying to contact or ping the URL during validation, the best you can do is check that the syntax is correct. That’s what we do today.
So with that in mind, the URL validation probably won’t change after all. It may not protect against every invalid URL, but adding behavior that only accepts URLs that actually resolve on the internet would be expensive and flaky. (What happens when a DNS node goes down, someone types in Google.com, and it fails to validate? Do you want to prevent the user from submitting then?)
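To make the trade-off concrete, a resolution check would look roughly like the sketch below. The function name is made up for illustration, and this is the approach we’re choosing not to take:

```php
<?php
// Hedged sketch of the alternative we decided against: requiring the host
// to actually resolve in DNS before accepting the URL.
function url_host_resolves($url)
{
    $host = parse_url($url, PHP_URL_HOST);
    if ($host === null || $host === false || $host === '') {
        return false;
    }

    // This triggers a network lookup on every submit; a transient DNS
    // failure would block a perfectly good URL.
    return checkdnsrr($host, 'A') || checkdnsrr($host, 'AAAA');
}
```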
In this case, we’re going for the most common use case, which is to validate that a properly formed URL is present. It’s true users can still flub that, but in reality most won’t (we haven’t seen it happen on sites, and I don’t get complaints about invalid user input on this field).
Since we’re relying on the behavior of filter_var(), we can’t force users to type https:// themselves, but if the client wants it, we can look at automatically adding it when it’s missing.
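A rough sketch of that idea (the function name, default scheme, and pattern are illustrative, not code that ships today):

```php
<?php
// Hedged sketch: if the submitted value has no scheme, prepend a default one
// before running it through the same filter_var() validation as before.
function normalize_submitted_url($input, $defaultScheme = 'https')
{
    $value = trim($input);

    // If there is no "scheme://" prefix, assume the default scheme.
    if ($value !== '' && !preg_match('#^[a-z][a-z0-9+.\-]*://#i', $value)) {
        $value = $defaultScheme . '://' . $value;
    }

    // Returns the normalized URL on success, false if it is still malformed.
    return filter_var($value, FILTER_VALIDATE_URL);
}

var_dump(normalize_submitted_url('www.qwerty.com'));     // "https://www.qwerty.com"
var_dump(normalize_submitted_url('http://example.com')); // left as-is, still valid
```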