Friday, February 17, 2012

Net:::HTTP, Encoding, \x8B

Last night coding session, that I kind of waited for the whole evening while putting my kids to bed, quickly turned out to be the "what the heck is going on" session.

I have this small home project where I am building a Ruby library wrapper around a public JSON API. It's half-fun-half-learning exercise so I am trying to stay down to bare metal Ruby without unnecessarily introducing gems that I don't really need. So I am using net/http to interact with the API endpoint and it's been all fine until yesterday.

My code fires up the request, stucks the response right into JSON.parse() and off it goes. As I build this new class to represent a new entity my tests start to fail with :

JSON::ParserError: 743: unexpected token at '▼'

The saying goes that there are two categories of developers when it comes to debugging. Faced with an error, one would fire up the debugger and dive right into it, while another one would sprinkle puts statements and just follow the trails. I am a member of the second group. If you think it's not right then I would encourage you to listed to the Debugging in Ruby episode on the Ruby Rogues podcast. There are cases when debugger is the way to go, but I am convinced that working on your own code with a good test coverage (like it should be!) you're better off finding the problem with your brain and some help from the puts method.

As I was inspecting the response object that JSON parser choked on I got to learn the Ruby 1.9 String#encoding and some other very interesting things. Somehow I remembered early 2000s when I was actively writing Java for the web and dealing with encoding to support multilingual was a black belt art. The database, the IO streams, the XSLT transformations, HTTP headers and HTML meta tags, and what have you. Dejavu. Only it wasn't.

The next thing I did was searching for what Web the almighty had to say. I very quickly figured that if JSON library couldn't figure the encoding then probably no one would. There were receipts suggesting to go through the JSON library only to get the encoding straight! I also came across an enlightening conversation on the ruby lang about whether net/http should set the encoding based on the HTTP headers or not. Good stuff. I still thought it was the encoding issue and for a second I even considered resorting to Mechanize as was suggested in that thread on the ruby lang, even though it reportedly had some similar issues as well

Well, if I did move to Mechanize it wouldn't have helped me. It wasn't the encoding issue. It turned out to be exactly what my code was asking for and all I had to do was to teach my code not to ask for what it couldn't chew on.

Sending JSON requests via net/http I was pretending to be a browser. Not that I needed it, but since I had to send some state in cookie headers anyway I figured I would just pretend I am a full blown browser client. Among other harmless things that I carried over from the Firebug Net log was the following rather important "expectation":

Accept-Encoding=gzip, deflate;

I was telling the server that I am ready to process zipped content and I wasn't. All previous JSON requests that I sent were small enough for the server not to bother zipping it so I had no issues. And then I asked for something big enough and I said I am ok with the things zipped. The server was built by some good guys who knew their stuff and they did just like I asked. Removing that single line and not telling the server I am ok with the things zipped did the trick... Then I thought how silly it was of me not to get puzzled from the byte stream that I saw when I first time dumped the response object to the STDOUT. it must have been too late already :)

Even though I didn't have good time coding the way I planned it, I had a terrific time troubleshooting. It's through exercises like this you learn to be a better software engineer - I got exposed to so many new things thanks to this little trap I got myself into. It's basically like bugs-driven-learning :)

1 comment:

iain said...

Thanks for writing this, I had exactly the same problem and was trying all sort of encoding mangling to get it to work!

For those that get problems using Zlib to inflate the strings, this StackOverflow question has a lot of good answers