Ajax and UTF8

21 08 2007

I have suddenly found out that the faith I had placed in the JavaScript escape(), encodeURI() and encodeURIComponent() functions to encode correctly was misplaced. Here is the problem: a traditional form submits UTF-8 perfectly and everything works, but an AJAX form only works if there are no non-ASCII characters, and it only fails for certain characters that are high up in the range. It turns out that the %20%u5FFD encoding, which is what escape() produces, doesn't work when submitted as application/x-www-form-urlencoded to Tomcat, even with charset=utf-8 or Character-Encoding: UTF-8. The encoding has to be +%E5%BF%BD, that is, "+" for the space followed by the percent-encoded UTF-8 bytes of the character, to make it work. If you bring up tcpdump and look at the raw TCP packets you will see that Firefox uses the latter for direct posts.
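To make the difference concrete, here is a small sketch of what each function emits for a space followed by U+5FFD. The variable names are only illustrative. Only escape() produces the %uXXXX form; encodeURI() and encodeURIComponent() emit one %XX per UTF-8 byte, and the only thing missing for form encoding is the "+" for the space.

```javascript
// Sample input: a space followed by the character U+5FFD.
var sample = " \u5FFD";

// escape() is the only one of the three that emits the non-standard %uXXXX form.
var escaped = escape(sample);                       // "%20%u5FFD"

// encodeURI() and encodeURIComponent() emit the UTF-8 bytes, one %XX per byte.
var uriEncoded = encodeURI(sample);                 // "%20%E5%BF%BD"
var componentEncoded = encodeURIComponent(sample);  // "%20%E5%BF%BD"

// Form encoding additionally writes the space as "+".
var formEncoded = componentEncoded.replace(/%20/g, "+");  // "+%E5%BF%BD"

console.log(escaped, uriEncoded, componentEncoded, formEncoded);
```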

Unfortunately there is no built-in JavaScript encoder to do this :(, but it's not that hard.

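// Encode one form value for an application/x-www-form-urlencoded body.
// (The function name is only illustrative.)
function encodeFormValue(formVar) {
   var result = encodeURIComponent(formVar);
   // Form encoding writes a space as "+".
   result = result.replace(/%20/g,"+");
   // Split any %uXXXX escape into %XX%XX. encodeURIComponent never emits
   // %uXXXX (only escape() does), so this loop is normally a no-op.
   for ( var p = result.indexOf("%u"); p != -1; p = result.indexOf("%u")  ) {
      var code = result.substr(p,6);
      var rep = '%' + code.substr(2,2) + '%' + code.substr(4,2);
      result = result.replace(code,rep);
   }
   // Upper-case the hex digits of every %XX escape.
   for ( p = result.indexOf("%"); p != -1; p = result.indexOf("%",p+1)  ) {
      code = result.substr(p,3);
      result = result.replace(code,code.toUpperCase());
   }
   return result;
}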

It's not exactly the perfect way, but it works.
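For what it's worth, here is a minimal sketch of how an encoder like this might be used to build a whole request body and post it. The helper names (encodeFormComponent, buildFormBody, postForm) are hypothetical, and postForm assumes a browser that provides XMLHttpRequest; it is not how any particular library does it.

```javascript
// Hypothetical helper: encode one name or value for a form-encoded body.
function encodeFormComponent(text) {
    // encodeURIComponent emits percent-encoded UTF-8; form encoding uses "+" for spaces.
    return encodeURIComponent(text).replace(/%20/g, "+");
}

// Hypothetical helper: join the fields of a plain object into "a=1&b=2" form.
function buildFormBody(fields) {
    var pairs = [];
    for (var name in fields) {
        if (Object.prototype.hasOwnProperty.call(fields, name)) {
            pairs.push(encodeFormComponent(name) + "=" +
                       encodeFormComponent(String(fields[name])));
        }
    }
    return pairs.join("&");
}

// Hypothetical helper: POST the fields with XMLHttpRequest (browser only).
function postForm(url, fields, onDone) {
    var xhr = new XMLHttpRequest();
    xhr.open("POST", url, true);
    xhr.setRequestHeader("Content-Type",
                         "application/x-www-form-urlencoded; charset=UTF-8");
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4) { onDone(xhr.status, xhr.responseText); }
    };
    xhr.send(buildFormBody(fields));
}

var body = buildFormBody({ q: "hello \u5FFD", lang: "en" });
console.log(body); // q=hello+%E5%BF%BD&lang=en
```

Field names go through the same encoder as values, so a name containing a space or "&" cannot break the body apart.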



One response

2 12 2009

Great job! It really does work 🙂 You’ve saved me lots of time, thank you.
