Anne van Kesteren

Browser differences in IDNA ToASCII processing between ASCII and non-ASCII input

13 February 2017

At the moment the URL Standard passes the domain of certain schemes through the ToASCII operation for further processing. I believe this to be in line with how the ToASCII operation is defined. It expects a domain, whether ASCII or non-ASCII, and either returns it normalized or errors out.

Unfortunately, the web seems to depend on ToASCII effectively being a no-op when applied to ASCII-only input (at least in some cases), which is how browsers appear to behave in these tests:

| Input | Description | ToASCII Expected | Chrome 58 dev | Edge 14.14393 | Firefox 54.0a1 | Safari TP 23 |
| --- | --- | --- | --- | --- | --- | --- |
| `x01234567890123456789012345678901234567890123456789012345678901x` | A domain that is longer than 63 code points. | Error, unless VerifyDnsLength is false. | No error. | No error. | No error. | No error. |
| `x01234567890123456789012345678901234567890123456789012345678901†` | A domain that is longer than 63 code points. | Error, unless VerifyDnsLength is false. | Error. | Error. | Error. | Error. |
| `aa--` | A domain that contains hyphens in the third and fourth positions. | Error. | No error. | No error. | No error. | No error. |
| `a†--` | A domain that contains hyphens in the third and fourth positions. | Error. | Error. | No error, returns input. | No error, returns `xn--a---kp0a`. | Error. |
| `-x` | A domain that begins with a hyphen. | Error. | No error. | No error. | No error. | No error. |
| `-†` | A domain that begins with a hyphen. | Error. | Error. | No error, returns input. | No error, returns `xn----xhn`. | Error. |
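
The pattern in the table can be sketched in a few lines of Python. This is only an illustrative model, not an implementation of ToASCII: the function names are hypothetical, only the three checks exercised above are covered, and length is measured in code points of the input rather than on the Punycode-encoded label. The "lenient" variant assumes browsers skip validation entirely for ASCII-only input, which roughly matches the Chrome and Safari columns.

```python
from typing import List

MAX_LABEL_LENGTH = 63  # per-label limit enforced when VerifyDnsLength is on


def label_errors(label: str) -> List[str]:
    """Return the strict-check failures for one label (illustrative subset)."""
    errors = []
    if len(label) > MAX_LABEL_LENGTH:
        errors.append("too-long")
    if label[2:4] == "--":
        errors.append("hyphen-3-4")  # hyphens in third and fourth positions
    if label.startswith("-"):
        errors.append("leading-hyphen")
    return errors


def strict_errors(domain: str) -> List[str]:
    """Apply the checks to every label, as the 'ToASCII Expected' column does."""
    return [err for label in domain.split(".") for err in label_errors(label)]


def ascii_lenient_errors(domain: str) -> List[str]:
    """Hypothetical model of observed behavior: ASCII-only input is a no-op."""
    if domain.isascii():
        return []
    return strict_errors(domain)


if __name__ == "__main__":
    long_label = "x" + "0123456789" * 6 + "01"  # 63 code points so far
    for domain in (long_label + "x", long_label + "†", "aa--", "a†--", "-x", "-†"):
        print(repr(domain), strict_errors(domain), ascii_lenient_errors(domain))
```

Under this model every row fails the strict checks, while only the rows containing `†` fail the lenient ones.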

Error handling also differs slightly: rather than returning the input unchanged, Chrome returns it percent-encoded.
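
To give a rough idea of what percent-encoded input looks like, here is a minimal sketch using Python's standard library. It assumes the input is encoded as UTF-8 and that only bytes outside the unreserved ASCII set are escaped; the exact set of bytes Chrome escapes is not something these tests establish, and `percent_encode_host` is a hypothetical helper name.

```python
from urllib.parse import quote


def percent_encode_host(host: str) -> str:
    """Percent-encode a host as UTF-8 (assumed encoding), keeping unreserved ASCII."""
    return quote(host, safe="")


if __name__ == "__main__":
    for host in ("-†", "a†--"):
        print(host, "->", percent_encode_host(host))
```

Here `†` (U+2020) becomes `%E2%80%A0`, while the ASCII letters and hyphens are left as they are.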

(I used the Live URL Viewer and Live DOM Viewer to get these results, typically prefixing the input with https://.)