Pavel Veller's blog

Sunday, March 13, 2016

Moved To www.pveller.com

I moved all my blogging to www.pveller.com. Please follow me there or just @pveller on Twitter. I always drop a line there when I publish something new. If you prefer RSS then you will be pleased to know that my new blog has it as well.

Wednesday, January 28, 2015

A Year In Restrospect

2014 was a good year. Fast. Challenging.

A year ago I made a change. I left a company I was with for over 10 years, a company that was about 200 people when I started and 10k+ when I left. Great experience. Probably one in a lifetime.

I felt like having more than one lifetime experience and off I went. From mostly hands off solution architecture and management back into hands on software engineering, from Java and Linux to .NET and Windows, from Adobe CQ to Sitecore, from a big public company to a small feels-like-family boutique. How do I feel about it? Loving it and would do it again if given a second chance. Experience of a lifetime indeed.

Last year:

Got back to blogging and published 25 technical articles (all Sitecore)
Joined Sitecore.FakeDb open source project and contributed 40 commits
Presented at two Atlanta Sitecore User Groups (one, two)
Live coded for about 45 minutes on a Sitecore Virtual User Group. It was scary but TDD with NCrunch is so fun to do that I actually very much enjoyed it
Co-presented on the Sitecore Virtual User Summit
Earned Sitecore MVP 2015 award

This year:

No resolutions but I will sure continue to craft code in various languages, blog about it, present at user groups, produce screencasts, and contribute to open source. These days I am more active on twitter than on any other social outlet so please feel free to connect @pveller.

Happy New Year!

Monday, February 17, 2014

Moving On

I am actually not sure anybody is reading this as I haven't blogged for more than a year. But if you're still out there then you may want to stick around. A little less than a month ago I parted with the company who I had been with for almost eleven years. We remain (and I very much hope will remain) friends and I wish them all the luck. I will sure write a larger post about what made me decide and how I feel about the change and what I think about it too. I will just give it a little more time. I am still breathing it all in.

On that one, I also made a technological change. It's a much smaller change, of course, as I fully agree with Scott Hanselman in his recent post - Technology is changing so fast it's hard to be a "professional" at anything. Ultimately, we're all amateurs. Out from Java and Linux and some Ruby and Python and into .NET. From Adobe CQ to Sitecore. It's still all for the Web so the transition has been really smooth and very much enjoyable. Learning and mastering new tech is fun and probably the ultimate hobby of mine. And I still work on my Mac with the Windows inside the VM so what's not to like!

As I am actively diving in into the new tech and the new environment and the new everything I also want to write more. I already started actually. If you have even remote interest in Sitecore or just want to follow me in my ventures you are more than welcome to read http://jockstothecore.com. No commitment but my backlog of things to write about is pretty full for the next month or two. Stay tuned! Oh, and drop me a line if you are reading this :)

Thursday, May 24, 2012

SAXDVR

Today I have shipped my first ever open source library. It's so small and tiny that I am not sure it qualifies to be called a library, it doesn't even have a built script yet but it's definitely open and it's definitely shipped. It's out there on my github page. Meet SAXDVR. In my defense, it has a documentation page and it has unit tests. And it has a purpose.

The question at StackOverflow made me sit down and play with SAX filters. Couple hours into it and I had it working so I thought why not make one more step and put it out there. It felt great! Don't get me wrong, it will hardly be used by anybody other than as an illustration to my answer, but it did feel like a big deal to me. Ship!, doesn't matter small or large, and you will likely feel the same.

Monday, April 09, 2012

Functional JavaScript

A good old friend of mine challenged me with a small JavaScript assignment. Apparently he used it in his job interviews to see how good the candidates were. Here is a code snippet:

var add = function (a,b) { return a+b; };
var mul = function (a,b) { return a*b; };

var x = make(1)(2)(3)(4);
var y = x (5);

x (add) // 10
y (mul) // 120

The assignment is to implement make.

It didn't take me too long to produce a dirty version that worked:

var make = function() {
    var args = Array.prototype.slice.call(arguments, 0);
    if (args[0] instanceof Function) {
        var result = args[1];
        for (i = 2; i < args.length; i++) {
            result = args[0].apply(this, [result, args[i]]);
        }
        return result;
    } else {
        var curry_args = args;
        return function() {
            var args = Array.prototype.slice.call(arguments, 0);
            return make.apply(this, args.concat(curry_args));
        }
    }
}

I realize that it's a happy path with no error checks but it works. I figured I was relying on the commutative property of assignment and multiplication so I decided to reverse the collected arguments:

...
   for (i = args.length - 1; i > 1; i--) {
            result = args[0].apply(this, [result, args[i]]);
        }
...

The friend of mine didn't accept the solution and asked me to keep working on it to get rid of the loop. Iteration can easily be converted into a recursion and the result didn't take long to produce. My friend, however, didn't accept it either and asked me to get rid of arrays and get rid of working with the arguments (which isn't really an array but that's another story). The map function, he said, has arity of one so implement it as such, don't let the arguments collect behind the scenes. Well, I have to admit I was stuck. I knew I needed to somehow stack the closures and return functions of functions but the solution was out of my reach. I gave up. Here's his pure functional solution that he gladly shared with me after I admitted I couldn't get it:

var make = function (value) {
    return (function (f, g) {
        return f(g, f);
    })(function (cont, loop) {
        return function (v) {
            return (
                v instanceof Function ?
                    cont(v) :
                    loop(function (reducer) {
                        return reducer(v, cont(reducer));
                    }, loop));
        };
    }, function () {
        return value;
    });
};

Pretty, isn't it? Cryptic, sure, but pretty. He also broke it down into a more verbose but somewhat easier to navigate version:

var konst = function (v) {
    return function () {
        return v;
    };
};

var make2 = function (value) {
    var reducer;

    var reduce = function (v, cont) {
        return function () {
            return reducer(v, cont());
        };
    };

    var gen = function (cont, loop) {
        return function (v) {
            return v instanceof Function ? (reducer = v, cont()) : loop(reduce(v, cont), loop);
        };
    };

    return gen(konst(value), gen);
};

I would argue that for the benefit of the reader (and we all know how code is read much more often than it is written) my original version is "better", but his solution is just so much more awesome that I can't really support my argument :) The memoizing described in Douglas Crockford's "Javascript: The Good Parts" uses arrays, not exactly like I had it but close. I found another one on the web - Y Combinator in JavaScript and it is mind blowing. Do you see your school math or a recursion in:

x = x² - 2

I see my school math. And to really get the functional solution to the make challenge I'd better had seen a recursion in it. My friend recommended the following reading to up my familiarity and awareness of those pure functional concepts:

Structure and interpretation of computer programs. This one is famous. MIT has an online version of the book and the lectures recorded in 86 I believe.
The Little Schemer
Sketchy LISP

Let's see if working through the SICP book gets me closer to the functional nirvana. The book is coming any day now.

Monday, March 19, 2012

Cloning into a copy of GIT repo

I needed to get the files from under a file copy of a GIT repo. For some reason cloning into a remote repo didn't work, the remote machine kept hanging up on me. But I was able to scp into the machine and just grab the repo files. All I needed really was a quick look into someone else's work and figuring out how to properly clone the remote repo would take me longer so I figured what's the heck.

Here's my recipe:

copy the repo to some local location. let it be <REPO>
create a folder where you want your "true" local clone to reside. let it be <LOCAL>
fire up git bash and CD into the <LOCAL>

the following command sequence will get you where you want to be:

$ git init
$ git remote add main <REPO>
$ git fetch main
$ git checkout master

That's it. I now have my repo copy "unzipped", pretty much like if the original attempt to git clone it worked

Enjoy. Drop me a line if you know of a better way to unzip a copy of a git repo.

Monday, March 05, 2012

cover_me for standalone Ruby

I am working on this small home project of my own exploring various aspects of Ruby language while building something useful for myself. I will tell more about what it is once (if :) it's ready.

Part of the techniques and things that I am trying is TDD. Like I heard the other day on the Ruby Rogues podcast, the right response to "testing is hard" is not "let's don't test", it's "let's get better at it". I also wanted to see if I really can put my mind in a position to think "backwards" when designing an API. When you start with a test you basically first define how you use it and only then go build it. In theory it should lead to a cleaner easier to use API as well as help with the whole YAGNI thing. Instead of doing all those extra behaviors the users might never need you would focus on the critical things that make it useful.

I like my TDD experience so far. I find that it disciplines me to focus on things that matter. Without writing the test first I am more likely to think wild and do some yak shaving, you know. It's likely much less apparent that you're building a not needed feature or doing a premature optimization than it is that you're writing a meaningless test. Starting with the test makes me think of what I need and then I am obligated to build that thing that I need or my tests suite fails. The whole red-green-refactor cycle. It's fun. You should try it.

Having done it for a little while I thought I would measure my test coverage. Enough was said about how it doesn't necessarily mean a lot to have your code "covered", how striving for coverage alone you get your code exercised but not necessarily tested, how there are different levels of coverage, etc. Still I wanted to measure.

Ruby 1.9 comes with a built-in experimental coverage feature that does runtime instrumentation and collects the metrics. There are simple gems out there that leverage it to get the data out and build coverage reports and cover_me is one of them. You basically require 'cover_me' in your test suite and off you go. Almost.

If you do just that you will have the coverage.data file but no reports. Here's their github issue #20 that talks just about that. Running CoverMe.complete! from irb or pry is one of the recommended solution, as well as doing rake cover_me:complete. I did like neither as I wanted it all "out of the box" and working even when I simply do ruby test/test_all.rb. And here comes the beauty of Ruby being dynamic and interpreted language. Any external library you have in your gems can be opened up and looked into. A few minutes (or hours, or days :) of looking through the code and you know what it does and what it needs to do what you want. I first posted a dirty way of doing it and then came up with a much cleaner solution that doesn't require tapping into the cover_me sources. Maybe someone will find it useful. I guess the best way would be to wrap it all into a rake task and just run my test suite along with the coverage steps as one rake command. Next time.

p.s. I know that I shouldn't be using Windows to do Ruby development.

Tuesday, February 28, 2012

Metaprogramming in Ruby

Just yesterday I started the what I learned today series and I'm already thinking it should have been a weekly digest. Not that I didn't learn anything new (I, in fact, did learn a few things that I will share in a second), I just figured it would be too much noise to publish a digest daily. So weekly it is.

Today I instead want to talk about metaprogramming in Ruby. I actually not so much want to talk about it as I want to put a summary for my future self. Like a collection of links that finally made me get it. The links are in no particular order:

A great episode of the Ruby Rouges podcast about what metaprogramming is, how it is different from monkey patching, when and why and how you would do it, and beyond. Great stuff. I would actually highly recommend this podcast even to folks not doing Ruby coding on a daily basis (I am not, for example). I have recently picked it up and my commute just got so much better. Haven't yet caught up on all of the episodes.
Great post from Yehuda Katz about how it's all about self
This question on SO and the answer to it upvoted 66 times
Alternatives for redefining methods with pros and cons
define_method, method_missing, and instance_eval with examples of how Chef DSL is made

I am sure there are more decent blog posts out there on what metaprogramming in Ruby is about and how to do it right and when it is right to do it so please add it to my collection.

UPDATE. As I was wrapping up this one I came across two more very interesting and very relevant posts:

awesome writeup on include and included. It goes into showing off bizarre examples of massive metaprogramming in ActiveRecord as well as some conscious evolution of rails in doing it more right than it was doing it before
A response to the post I just mentioned that brings up an interesting point of namespace pollution. While I am ok protecting the global namespace in JavaScript with the "closure everywhere" (function(...){})(); pattern, I am not sure I see the point in giving up syntax convenience and protecting my class namespace in the same way.

What I Learned Today (02/27/12)

There shouldn't be a day in your life when you don't learn something new.

I believe we all learn every day and just not always think about it. I thought I would start a daily digest of things I learned "today" to help me better remember it and to also share it with you as some may find it useful.

Some of it comes from the things I did while working on a problem in the office, some comes from reading or listening to other people, some comes from proof reading my son's homework. There's clearly different depth to the whole notion of "I learned it" so I by no means say that I am now an expert at the things I learned.

Too much "learned" already so with no further due today I have learned:

Groovy way of grabbing Maven dependencies via Grape and @Grab import annotations isn't searching in your local .m2 repo. You will need to make an explicit configuration in your .groovy/grapeConfig.xml to make it do a trip "home".
Cassandra's secondary index is a hash index so all you can ask it is the equality query. you can't do a range query on it and bitmap secondary indexes are not yet on the roadmap
The way you bulk load data into Cassandra is via building SSTable files directly and pushing them to the cluster. Nothing new really, all "databases" do direct datafiles writes for fast bulk inserts but here you literally create datafiles "increments"
Ruby has a define operator that you can use to find out if super is defined for the method you're in so you can do things like super if defined?(super). Pretty neat
Maven requires you to specify a version of your dependency and would otherwise refuse to build. There's a very popular question on SO why it's this way and how to make it go for the "latest" version.
There are 8 planets in our solar systems. There was 9 when I went to school and Pluto was since downgraded to be a dwarf planet with two more dwarfs discovered so it didn't feel lonely

Friday, February 17, 2012

Net:::HTTP, Encoding, \x8B

Last night coding session, that I kind of waited for the whole evening while putting my kids to bed, quickly turned out to be the "what the heck is going on" session.

I have this small home project where I am building a Ruby library wrapper around a public JSON API. It's half-fun-half-learning exercise so I am trying to stay down to bare metal Ruby without unnecessarily introducing gems that I don't really need. So I am using net/http to interact with the API endpoint and it's been all fine until yesterday.

My code fires up the request, stucks the response right into JSON.parse() and off it goes. As I build this new class to represent a new entity my tests start to fail with :

JSON::ParserError: 743: unexpected token at '▼'

The saying goes that there are two categories of developers when it comes to debugging. Faced with an error, one would fire up the debugger and dive right into it, while another one would sprinkle puts statements and just follow the trails. I am a member of the second group. If you think it's not right then I would encourage you to listed to the Debugging in Ruby episode on the Ruby Rogues podcast. There are cases when debugger is the way to go, but I am convinced that working on your own code with a good test coverage (like it should be!) you're better off finding the problem with your brain and some help from the puts method.

As I was inspecting the response object that JSON parser choked on I got to learn the Ruby 1.9 String#encoding and some other very interesting things. Somehow I remembered early 2000s when I was actively writing Java for the web and dealing with encoding to support multilingual was a black belt art. The database, the IO streams, the XSLT transformations, HTTP headers and HTML meta tags, and what have you. Dejavu. Only it wasn't.

The next thing I did was searching for what Web the almighty had to say. I very quickly figured that if JSON library couldn't figure the encoding then probably no one would. There were receipts suggesting to go through the JSON library only to get the encoding straight! I also came across an enlightening conversation on the ruby lang about whether net/http should set the encoding based on the HTTP headers or not. Good stuff. I still thought it was the encoding issue and for a second I even considered resorting to Mechanize as was suggested in that thread on the ruby lang, even though it reportedly had some similar issues as well

Well, if I did move to Mechanize it wouldn't have helped me. It wasn't the encoding issue. It turned out to be exactly what my code was asking for and all I had to do was to teach my code not to ask for what it couldn't chew on.

Sending JSON requests via net/http I was pretending to be a browser. Not that I needed it, but since I had to send some state in cookie headers anyway I figured I would just pretend I am a full blown browser client. Among other harmless things that I carried over from the Firebug Net log was the following rather important "expectation":

Accept-Encoding=gzip, deflate;

I was telling the server that I am ready to process zipped content and I wasn't. All previous JSON requests that I sent were small enough for the server not to bother zipping it so I had no issues. And then I asked for something big enough and I said I am ok with the things zipped. The server was built by some good guys who knew their stuff and they did just like I asked. Removing that single line and not telling the server I am ok with the things zipped did the trick... Then I thought how silly it was of me not to get puzzled from the byte stream that I saw when I first time dumped the response object to the STDOUT. it must have been too late already :)

Even though I didn't have good time coding the way I planned it, I had a terrific time troubleshooting. It's through exercises like this you learn to be a better software engineer - I got exposed to so many new things thanks to this little trap I got myself into. It's basically like bugs-driven-learning :)

Wednesday, January 25, 2012

Neat JavaScript syntax shortcuts

Every language has a few nice features that are natural result of the syntax rules. It's just many really neat ones are not part of the mainstream code that is being written. Some examples:

Java's double brace initialization

final List<String> list = new ArrayList<String>() {{ 
    add("a");
    add("b");
    add("c");
}};

Ruby's idiomatic default assignment

my_data ||= ""

I came acorss a few nice JavaScript syntax "shortcuts" today that I haven't seen before. Maybe I just didn't pay attention?

Anonymous function call

If you write unobtrusive javascript or just unconsciously use a lot of jQuery then you do a lot of this:

(function($) {
...
})(jQuery);

well, if you're tired from wrapping all your anonymous functions into extra ( ) here's a shortcut for you:

!function($) {
...
}(jQuery);

Getting boolean out of virtually anything

Say you have something passed to your function or there's something you receive from another function and all you need to know is that it's "there". You basically need "true" if it exists or "false" otherwise. Here's a neat shortcut to do so:

myFlag = !!varToVerify;

The double negate will yield true for anything but a boolean false, NaN, or undefined (which will throw an error). Infinity will yield true. You can test it on jconsole:

!!parseInt('a'); // prints false
    a = !!1/0; // prints Infinity
    a ? 1 : 2; // prints 1
    !!b; // errors out with 'b' is not defined
    !!Object.prototype.toString; // prints true
    !!{}; // prints true

What else? what other neat JavaScript syntax "shortcuts" you can share?

Tuesday, January 24, 2012

Spotify Puzzle in Java, Ruby, Prolog and ... Common Sense

I like music and rarely work without a nice track playing in the background. Pandora and MOG are always with me, on my laptop, on my iPhone, on my Kindle. I believe my TV has a Pandora app and I saw one running on my neighbor's fridge (fancy that). Last week I figured I would check out Spotify just to make sure I wasn't missing out on something even better. Facebooking is not something I do so that's how I missed all the buzz I guess. Anyway, long story short, I didn't chose Spotify over MOG but I did find myself a puzzle to spare a few night hours with.

These guys posted a few puzzles for their candidates to try before one would consider applying for an engineering job at Spotify. As of time of this writing there are three, each labeled with a music genre - reggae, funk, and heavy metal. I would almost always prefer heavy metal over reggae and funk given no other choice so I knew which one I would tickle my brain with.

The Bilateral Projects is about finding an optimal solution to what sounds like an easy problem. You have project teams of exactly two individuals each from one of two different offices and you need to invite the smallest amount of people to a company conference to save on expenses while having all projects represented. Same people can work on more than one project hence a potential for savings. Should the problem have more than one optimal solution you would pick one that would invite your friend otherwise you wouldn't really care. Oh, and a very important detail as we will soon learn. The input may ask you to optimize for as many as 10,000 project teams.

Java

First, let's do a careful, defensive-style object oriented brute force solution in Java. We would find all possible solutions, reduce the list to only optimal ones, and finally pick one that ideally has our friend in it. Clearly not a solution for large data sets but a good way to test the waters and see what we've got here.

Note: if you're here for a solution then you should scroll down to the Dominating Set chapter, the Java part won't be of much interest to you. O(2^n) isn't really a solution for anything larger than a handful of teams.

The full code is on github and here's the important pieces. Input data is parsed into a collection of Team objects each referencing two Employee objects (we could clearly work with primitive arrays but that's something I left for Ruby). It's now time to build a list of all possible solutions to the puzzle. There are algorithms to build all combinations and permutations but we need a slightly different thing that can be expressed in a much simpler way. We only need one of the two team members from each team which means 2^n choices that we can visualize as a binary tree. A tree as deep as many teams we have branching out and doubling the nodes on each level to have 2^n on the bottom leaf level. Take a look.

Each line in the dataset on the left represents a team of two employees referenced by their IDs. Full solutions tree would look like the one to the right:

And the code does exactly that. It loops through all teams, accumulates a List<Set<Employee>> (a path from the root down to the end leaf node is a set of nodes you need to walk through, and outer list as a collection of all possible ways down), creates a new solution branch on each step adding employees from the team as leaf "nodes":

private List<Set<Employee>> generateAllSolutionsList() {
    final List<Set<Employee>> accummulatedSolutions = new ArrayList<Set<Employee>>();
    accummulatedSolutions.add(new HashSet<Employee>()); // root of the solutions tree
    
    for (final Team team : projects) {
        final List<Set<Employee>> newBranch = new ArrayList<Set<Employee>>();
                
        for (Set<Employee> solution : accummulatedSolutions){
            final Set<Employee> solutionSibling = new HashSet<Employee>(solution);

            solution.add(team.getLondonEmployee());  // "left" leaf node (existing branch)
            solutionSibling.add(team.getStockholmEmployee()); // "right" leaf node (new branch)
            
            newBranch.add(solutionSibling);
        }
        
        accummulatedSolutions.addAll(newBranch);
    }
    
    return accummulatedSolutions;
}

Having a superset of all solutions we now just need to reduce it a few times to get to the bottom of it. First, let's sort them all and get that minimum size we were looking for:

final List<Set<Employee>> allPossibleSolutions = generateAllSolutionsList();
Collections.sort(allPossibleSolutions, 
        new Comparator<Set<Employee>>() {
            @Override
            public int compare(Set<Employee> set1, Set<Employee> set2) {
                return set1.size() - set2.size();
            }
        });

final int smallest = allPossibleSolutions.get(0).size();

Now reduce it to only optimal solutions:

// reduce to only optimal solutions
final List<Set<Employee>> optimalSolutions = new ArrayList<Set<Employee>>();
for (Set<Employee> s : allPossibleSolutions) {
    if (s.size() > smallest) {
        break; // sorted list so we know all others are even larger in size
    } else {
        optimalSolutions.add(s);
    }
}

Finally, try to find that best solution that has our friend in it:

// SPEC: If possible (subject to the set of people being smallest possible), 
//       the list of invitees should include your friend
final Employee myFriend = new Employee(MY_FRIEND_ID);

// reduce to those having our "friend"
final List<Set<Employee>> bestSolutions = new ArrayList<Set<Employee>>();
for (Set<Employee> s : optimalSolutions) {
    if (s.contains(myFriend)) {
        bestSolutions.add(s);
    }
}

final List<Set<Employee>> solutions = !bestSolutions.isEmpty() ? bestSolutions : optimalSolutions; 
final Set<Employee> solution = solutions.get(0);

Given the data set we used to illustrate the solutions tree the code would find a 5 people solution: 2002, 2021, 2038, 2024, 2031.

Dominating Set

Before we move on to other languages and potentially better ways to solve the heavy metal puzzle let's see what we're dealing with. If we plot employees on a chart and cross-link people who work together we'll end up with a graph (or a set of graphs) with employees as vertexes and projects and edges. Here's one for the dataset we've already looked at:

I used orange coloring for the employee nodes that represent the optimal solution. You see how all other nodes not colored are connected to at least one colored node by an edge? It make sense, right? We want to pick employees to represent all projects so we should have all edges "covered" by the highlighted set. In graph theory a set like this is said to be a dominating set. And there's apparently a dominating number that represents the number of vertices in a smallest dominating set... As wikipedia says, the dominating set problem [..] is a classical NP-complete decision problem [..] therefore it is believed that there is no efficient algorithm that finds a smallest dominating set for a given graph.

Right. Why would otherwise folks over at Spotify have called this one a "heavy metal" puzzle? :)

Ruby

My original idea was to illustrate that one would use much less words to express same thing in Ruby but rewriting a brute force solution in Ruby is no fun so let's optimize it a bit. Instead of building a list of all potential solutions let's just try to get straight to that optimal one.

The idea is to:

Try to figure out who is more likely to make it to the final list of attendees
Add that likely candidate to the attendees list and reduce the list of projects removing all that the selected candidate works on.
As we reduce the projects list we will keep notes of other people who participate in those being removed. This way we will know we don't need them unless they represent other projects
For the teams that don't have a clean "winner" we would take either one giving a preference to our friend (if he or she is one of the two) otherwise we'll try to balance between the two offices

Our Solver will keep track of projects, employees, and selected attendees (oh, and full listing is of course available over on github):

class Solver
 attr_accessor :projects, :employees, :attendees

end

When solver initializes itself from the input data it builds a list of projects (a project is a simple two elements array and a list of projects is an array of project arrays), and also calculates how many projects each employee participates in. The latter we would keep in a hash with employee ID as a key and a number of projects as a value:

def initialize(fileName)
    # a project is a team of two. collect all projects into an array
    @projects = File.readlines(fileName).grep(/^\d+\s\d+$/).collect {|line| line.split(/\s/)}

    # employee ID as a key and project participation index as a value, default to 0
    @employees = Hash.new(0)

    # calculate participation index
    @projects.flatten.each {|e| @employees[e] += 1 }
    @attendees = []
end

well, it's time to optimize it:

def optimize()
    # project with the most "active" employees and ideally a project with a clean winnder
    project = @projects.sort_by! {|p| (@employees[p.first] - @employees[p.last]).abs}.last
    
    @attendees <<
        if (@employees[project.first] != @employees[project.last])
            # clean win
            (@employees[project.first] > @employees[project.last]) ? project.first : project.last
        else
            # a tie with a prefernece to "a friend"
            project.include?("1099") ? "1099" : project[rand(2)]
        end
    
    # discard projects that the attendee participates in 
    # and reduce  participation index of the individuals on those teams
    @projects.reject! {|p| p.include?(@attendees.last) && p.each {|e| @employees[e] -= 1}} 
    
    optimize() if @projects.length > 0
end

That's it. A bit of "side effect" programming and we're done. It generates a slightly different though still optimal solution: 2002, 2021, 2024, 1003, 2031.

Note: project list is sorted in place using the ! version of the sort_by to keep subsequent sorting close to a best case O(n). There's still room for improvement though: we could stop recursing and re-sorting once we know the rest of the list are teams that don't have "shared" resources on them. We could also do a more intelligent sorting to push project teams where people are equally shared across more than one projects first when there's no more "unbalanced" teams. I am jumping ahead of myself a little bit but my tests didn't show any noticeable difference from those improvements so I voted for smaller and cleaner code, not the code that pretends to be "smarter" when it's not.

Prolog

I have to admit I spent a few evenings "learning" it before I could express a solution to the puzzle in a declarative way. I thought the experience I had with inference systems in a comfortable environment of imperative languages (JBoss Drolls, for example) will be enough to just "get" it and boy, was I wrong. It's predicates, not functions; it's unification, not assignments; it's terms and atoms, not variables; and on and on it goes. If you never programmed the logical way I dare you to try! Like Bruce Tate wrote about Prolog in his Seven Languages in Seven Weeks: "if this is your first exposure, I guarantee either you will change the way you think or you'll fail". A good prime on predicate calculus might also help.

Prolog's approach to solving any kind of problem you would throw at it is all about permutations. It's optimized to proof search and backtrack the decision tree so we know the puzzle will be solved with the "brute force"'s O(2^n) "complexity". Let's see:

Firs, we define the attendees rule. It starts with a terminate empty goal to make sure the recursion has where to stop. Then we say that either one of the two employees is a good solution for a one team puzzle, and then we recursively solve it for a list of teams. attendees works on a list of tuples and in this form would report the entire solutions tree (just like that one we build in Java). The sort predicate at the end will make sure the solution set is free of duplicate employees (those shared across more than one project teams thus listed more than once).

attendees([], []).
attendees((E1, _), [E1]).
attendees((_, E2), [E2]).
attendees([H|T], Solution) :- 
    attendees(H, AttendeeFromFirstTeam), 
    attendees(T, AttendeesFromOtherTeams), 
    append(AttendeeFromFirstTeam, AttendeesFromOtherTeams, Accumulator), 
    sort(Accumulator, Solution).

It's now time to find that optimal solution that ideally has our best friend in it. Let's collect the list of all solutions, sort it by length (assuming lists are free of duplicates), push lists with our friend to the front among same length lists, and then just pick the first one. We are going to need a custom sorting predicate:

optimize(Result, ListA, ListB) :- 
    length(ListA, X), 
    length(ListB, Y),
    (\=(X,Y) -> compare(Result, X, Y) ; (member(1099,ListA) -> Result = (<) ; Result = (>))).

optimize compares by length when two lists are different in size and, when two lists represent an equally optimal solution, returns the one with our friend in it (if found) or the first one. Here's the solve goal:

solve(AllTeams, OptimalSolution) :- 
    setof(Solution, attendees(AllTeams, Solution), UniqueSolutions),
    predsort(optimize, UniqueSolutions, [OptimalSolution|_]),
    length(OptimalSolution, L),
    print(L).

It collects all permutations as delivered by the attendees goal, sorts them with the optimize, picks the head of that list and prints its length. When solver.pl is loaded you can ask Prolog a question like this:

solve([(1009,2002),(1009,2021),(1002,2002),(1003,2038),(1003,2002),(1021,2021),(1022,2024),(1088,2031)], Solution).

and it will gladly respond:

5
Solution = [2002, 2021, 2024, 2031, 2038].

You can find projects.pl on github. Note: Prolog dialects vary. I used SWI-Prolog and then tried to run it in GNU Prolog and it didn't work. GNU version didn't like the predsort/3 which I believe is SWI-Prolog "extension".

Let's Rock!

There's not much examples on the Spotify website so I had to generate a few data sets to see how well we can cope with the puzzle, at what point each solution breaks, and whether it finds that optimal solution. The generator that I quickly put together is capable of building various setups based on a few parameters: number of teams/projects, how many employees from each office to consider (a density factor basically and a way to do balanced and unbalanced participation), and finally whether to "pad" it with our friend a few extra times. It produces data files with the names like this - dataset_A_B_C_D - where A is how many teams we asked it to generate, B and C stand for how many Stockholm and London employees it randomized project participation across, and D is for up to how many times it tried to squeeze our friend into the project teams. You can find the generator on github as well.

Let's start with a few simple tests. 20_7_4_0 is a slightly unbalanced setup but it's a good test. We know right of the bat that the optimial solution can't require more than four employees. It can be less if rand(4) didn't spit out all the options though it's unlikely for 20 runs. So let's see: Java solution said it's 4 very quickly, so did Ruby version, so did Prolog.

30_10_10_2 is already large enough to keep Java spinning and eventually run out of heap space. I gave it some more and it ate it all up and was still not satisfied. That was to be expected with exponential complexity and all those Set objects. Prolog ran out of local stack recursing into the solutions tree. Poor guy. Ruby version was very fast telling me it needed only 9 employees. I would expect it at max to be 10 so 9 is believable. 100_30_20_0 reports back it needs 20. 100_80_60_0 says 51. (your experience may vary as data sets are essentially randomized). 100_500_500_0 says 85. Again, I would expect the number to be less than 100. So far so good.

500_50_50_0 reports... 53. 5,000_500_500_0 reports ... 563. And the edge case of 40,000_999_999_0 took a few seconds to generate and about 20 seconds to solve. It reported... 1014. Well, now we know we didn't crack it completely... We'll get back to this one in a minute.

To wrap it up on the algorithmic part, Java and Prolog solutions have at least O(2^n) complexity to collect the solution tree plus n*log(n) sorting (they all do variations of merge sort) and some more overhead. The Ruby version in the worst case would loop n times and each time do n*log(n) of sorting and another n iterations to "reject" the processed project(s). That gives us O(n^2*log(n)) if I am not mistaken. No surprise it can get to the bottom of a 40,000 set. Windows calculator in a scientific view reported "overflow" when I tried to look at 2^40,000. And n^2*log(n) for 40,000 is a little shy of 10 billions which isn't really a big deal to iterate through, especially with the tail recursion keeping it all "flat". I have to admit though that I liked the way Prolog solution "sounded".

So what's up with those large sets producing suboptimal results? Well, I wouldn't expect I could find a fast algorithm to what folks with PhD in computer science believe to be a classical NP-complete problem :) Here comes the last optimization.

Common Sense

If the "optimized" answer after 20 seconds of crunching the numbers is larger than the number of employees from one of the two offices participating in all projects then do the following. If you're still up for some savings and can afford having one of the two offices miss the Barbados "thing" and still be there when you come back then just take the smallest office with you. If you don't think your talent will stick around when you dump them like that then I see another option. Understand that people is the most important asset in your business and if you truly can't take everybody to Barbados then take them all to either London or Stockholm and have some real fun together :)

Wednesday, September 07, 2011

Very simple Java puzzle

I spent a good minute figuring out what was going on. See if you like solving this one:

The piece of code I will be showing in a second receives a String that might have a single digit at the end. I need to extract that digit and increment by one. If you wondered, the code is actually a content automation tool that runs inside the Apache Jackrabbit (it's actually Adobe Day CQ/CRX) and it's messing with the content nodes and their attributes. Sibling nodes have unique names and CQ simply adds the _x, where x increments with every new node of the same original name. My script needs to do the same and so I am looking at an existing node and trying to parse out it's "x" to further increment it. It goes like this:

if (Character.isDigit(nodeName.charAt(nodeName.length() - 1))) {
    colNumber = Math.max(colNumber , Integer.valueOf(nodeName.charAt(nodeName.length() - 1)) + 1);
} else {
    colNumber++;
}

and for a nodeName of colctrl_1 is says the next number is ... 50.

if I do like this:

if (Character.isDigit(nodeName.charAt(nodeName.length() - 1))) {
    colNumber = Math.max(colNumber , Integer.parseInt(nodeName.charAt(nodeName.length() - 1)) + 1);
} else {
    colNumber++;
}

I will get a compilation error because there's not parseInt(char). so I change it to look like this:

if (Character.isDigit(nodeName.charAt(nodeName.length() - 1))) {
    colNumber = Math.max(colNumber , Integer.parseInt("" + nodeName.charAt(nodeName.length() - 1)) + 1);
} else {
    colNumber++;
}

now it reports the next number is 2.

I am sure you know why, I now do too. Still, will you tell me?

p.s. Here's a nice read from Alex Miller on the Integer.valueOf(): http://tech.puredanger.com/2007/02/01/valueof

Tuesday, August 23, 2011

Just a day in life: "In and Out"

4:00am -	alarm goes off on my iPhone.
4:10am -	my cab is already upfront. first time on time
4:20am -	in the cab. I am sure Frank is the only white driver in Atlanta, really nice guy, works all nigths. He moved here from Arizona more than 10 years ago. He tells me it's a paradize there. I know it's not, I've been there three times when I had a project with a client down there. It was 110 early in June! That's above 40 in centigrades! It's very dry down there so you don't have that humid air like you do in south east, but the heat still kills you. The brain starts melting afer a short five minutes walk. You basically spend your summer (and it's not a three months summer, believe me) indoors to enjoy the rest of the year outside. I guess some people just tend to like their home places most of all...
4:55am -	at the airpot. I-75 goes straight through thre downtown. nice ride in the morning
5:15am -	past the security gate (ATL is a way better than PHL on Monday mornings. I would spend good 40 minutes back there)
5:30am -	Quiznoss, as a business, was once reported to do much worse than Subway but their subs are much better still. breakfest time.
5:40am -	one thing I like about morning flights is that the pllane is always at the gate
5:55am -	no one is about to start boarding us yet
6:00am -	here they come
6:30am -	taxing out of the gate. on time.
7:50am -	wow, a good one hour of sleep. It rarely happens to me on the plane and I always wish it did
8:20am -	landed. we're 20 minutes earlier so now need to wait for the gate to become available.
8:50am -	immigration control. flying to Canada feels much like flying domestic US except immigration. here you remember you actually are about to cross the border.
9:10am -	in the cab on my way to the office. flying without luggage with same day return is the fastest. No need to wait at the carousel at a baggage claim and then again at the rental car counter... it also feels better in the back seat of a limo than in the front seat driving a Hynday Accent "compact"
9:40am -	Morning cofee and I am back online (mentally too). Flying short distances feels a lot like riding a bus or a train. Just a lot more expensive :) One guy once told me why it should be so. He said (rising his hand up above his head with the palm flat facing down illustrating a flying airplane) - "magic", then he lowered the hand below his waist with the palm this time illustrating driving on the surface of the earth - "no magic", back up again - expensive, back down again - cheap. Magic, no magic, expensive, cheep.
10:00am -	meeting, more like a working session, nothing special
12:00pm -	done with the meeting. btw, don't you think the "pm" after 12 is a little weird? 11pm is almost midnight, then it's 0:00am the next day. 11am is almost noon and then comes the 12... pm followed by 1pm and all the way up to 11pm. weird...
12:40pm -	over with the lunch and back to work. follow-up emails, skype calls, full recursive check-out of the SVN HEAD revision...
2:00pm -	meeting. it started more like a working session but then slightly transitioned into a fight. I am currently right in the middle of a rather large content-centric project that I am delivering with the team of about fourty. That's only on our end. There are other partners and vendors and things balance on the edge: from very tough to extremely challenging. The whole thing is very fragile and as things heat up so does the temperature int he room. What was scheduled to last for two hours lasted for about three and a half and we only have gone through half the agenda.
5:30pm -	in a cab. on my way back to the airport. it doesn't look like these guys accept credit cards so I will have to pay cash. You can pay US dollars but you should only expect canadians as a change. Will spend it here next timme, no big deal. My Mondays will likely all become like this one for a few months to come.
5:50pm -	dragging through traffic on 401 West. I should be fine, my flight will only start boarding at about 7:50pm. Hope to grab a bite at the aitport and do some followup emails. Unlike Atlanata, Toronto offers free wi-fi in the airport. I first time encountered free wi-fi in Ireland's Shanon back in 2004. It's 2011 and not all major airports do that. You can pay $9.99 a month and have it with Bingo, they have hotspots everywhere and ten bucks a month is nothing... still, wi-fi at the airport should be free. They charge us, passenmgers, enough through the fees they charge airlines which we end up paying anyway.
6:05pm -	through with the traffic. should be at the gate in five
6:50pm -	done with the immigration and security control. It would indeed feel like a bus ride if not these two things on the way in and out.
6:55pm -	there's a nice pub two gates away. will grab a bite, get some beer, do some emails. I have a good hour before my flight starts boarding. I wish they had chargers under every dining table.
8:10pm -	boarding
10:20pm -	landed in Atlanta
11:30pm -	made it home. it's when I opened the door I actually felt how much I missed my family. Everybody is asleep by now. Will see them tomorrow in the morning. Dennis started his 4th school year on Aug 15th and his bus is picking him up at 7:08am. We will be up by 6:20 tomorrow. Have a good night you all.

Sunday, March 20, 2011

A train set and a two story bed

What would you name if I asked you to recall a few things you had badly wanted as a kid but had never had?

Mine are a train set toy and a two story bed.

Train set. A metal one, from the East Germany, the ultimate gift I, a six year old boy in the USSR, could have wished for. It was expensive, would likely cost a good part of what my dad was making a month, if you could find it. No store had it. Your family should have had connections to get it delivered from the place of origin.

Two story bed. I was growing up with my younger brother and we were sharing the room for as long as I remember myself, up to the point when I left to start my own family. As a kid I saw a two story bed once or twice and it must have penetrated my memories so deep I still cannot get it out.

My son is eight and a half. He had number of different train sets made of metal and wood and plastic and stopped playing them all a while ago. His younger sister is two and a half and they rarely share a bedroom but he enjoys the second story of his two story bed. I told him how much I once wanted to have one when I was a kid so he offered me the second story on the night it was brought in and assembled. I let him enjoy it and kept my memories :)

Last winter I took my whole family skiing up north in New Hampshire. We shared a nice big house right on the mountain with two other friend families, our rooms where on the first floor and one had a two story bed in it. The bed was higher than usual, my son found it uncomfortable to be too close to the ceiling so I took it. Next night I sprained my ankle getting down at night, must have missed a rung or something. Did not enjoy much climbing it back and forth in the upcoming few days but still love my memories of not having it as a kid :)

Care to share your story?

Monday, November 29, 2010

My Phusion Passenger Checklist

I spent some considerable time trying to get the "hello world" rails app (the one you get when you run rails new myapp) to run under Apache + Passenger the same way it does under rails server.

I tried both root and sub URI setup and was basically struggling with variations of the following three errors:

Apache would serve static content out of the document root and then send 404 for the default /rails/info/properties route (Apache
s 404, not RoR's 404)
The Passenger would engage but then fail to start the rails app
The Passenger engages and rails app starts but the default /rails/info/properties would report a "No route" error even for local request (telnet from within the SSH)

Without going into very much details (drop me a line if you're interested and I will elaborate) here are the things that you should check:

Permissions for the folder the document root is pointing out. It has got to be accessible and executable by the Apache's user
Have the mod_rewrite loaded. The default Apache installation that you get with "sudo apt-get install apache2" doesn't have it and the Passenger's doc (at least the pieces I skimmed through) doesn't explicitly tells you to have it on.
The default /rails/info/properties controller works for local requests only and only in the development environment. Passenger runs your rails app in production environment by default. So make sure to have the config.consider_all_requests_local set to true and also use the RailsEnv development directive in the Virtual Host configuration
You don't need the config.ru in your application root. Passenger might treat your app as a Rack app and it's not. Not sure if this one helped me but I googled it up before I figured I was missing the rewrite mode so I followed the advise

Right now both built-in rails server and the kosher Apache show exactly the same result and render the well known "You're riding Ruby on Rails" including the little environment details AJAX piece.

--

I would have been proud if I figured it all in "no time" as the documentation suggested :)

Sunday, November 28, 2010

Continuations

Today in the morning I made potato pancakes for breakfast. My family loves it and I usually make more than we together can eat so we have some leftovers as snack.

I must have made a little more than usual today and there was something left when in the afternoon we went out to finish our holiday shopping. Great, we packed it to-go so that kids have something to eat on the way.

My little daughter took one out of the bag and... fall asleep halfway through it. It actually was her time to take a nap anyway. She slept for good forty minutes as we drove around making our scheduled stops. All this time she kept the pancake safe in her hand.

We got back home and she woke up the very minute I turned off the engine. Looked at me. Smiled. Looked down to her hand. Saw the pancake. And she just resumed eating it as if nothing happened.

I couldn't help it and smiled with a very clear thought - continuation. A great example of one in our real life :)

--

Just about fifteen minutes before, at our last stop, I was catching up on my RSS subscriptions watching Alice asleep while another half of my family was finishing the grocery gshopping list. It's what I was reading that fifteen minutes later made me think continuation and not "how cute".

The latest installment from Eric Lippert about async in the upcoming C#5.0. Even if you don't program in C# or work on .NET platform (I do neither) it's absolutely worth it.

Read it and tell me if you still think I should have first noticed how cute that move was :)

Tuesday, November 16, 2010

Nice weather up here

Online at 30,000 feet is nothing special these days though I still find it amazing. For some $10 you can get a cross country flight long WiFi access. If only the coach cabin had a little more leg room for a full size laptop experience to be a pleasure and not a struggle. Diverting to iPad might be great for reading and movies and emails but if you need to work you're still stuck with a laptop. My usual in-flight setup includes a laptop to catch up on emails and do some work, a Kindle to read a book, an iPhone to listen to music and catch up on pre-cached RSS feeds, and some printouts to read during takeoff and landing (apparently, they let you use WiFi past 3,000 feet but require no electronic device on during take off and landing).

It's a nice weather up here:

Tuesday, October 19, 2010

My first question on stackoverflow

Here it is:
http://stackoverflow.com/questions/3965331/weird-behavior-around-same-erasure-compilation-error

Playing with the code on the edge of compile and runtime errors is fascinating. Compilers are software components and may have defects and deficiencies. Language specs may have wholes and ambiguities too. But these are thoroughly tested in labs and then applied daily by hundreds of thousands (if not millions) of programmers so they must be almost perfectly clean. Almost. But still not perfect.

How far from this "still not perfect" is the software you write?

Wednesday, September 29, 2010

VirtualBox vs. Virtual PC as a host for Ubuntu

Having worked with Ubuntu on Virtual PC for a little while I gave it a test inside a VirtualBox. A waaaaaaay better. Squeezing a Linux to run in a Windows virtualization host designed to only accept Windows guests might have been fun in and of itself, but using it there is not necessarily the experience you would enjoy.

VirtualBox over Virtual PC as a host for Ubuntu:

No hassle installation. Just point the CD drive to the Ubuntu installation ISO and enjoy the ride. No tweaking with the VGA loading modes and GRUB.
Out of the box networking with the host. No need to manually install loopback adapters as it comes with a network "device" to support it
No-capture mouse integration. Virtual PC has a hot key to release a mouse pointer to the host but it conflicts with my Intel video. When I do Ctrl+Alt+Left my screen turns 90 degree. Not fun. VirtualBox with guest additions captures your mouse when it's inside a guest OS window and releases it once you past the window's border. Natural. The only thing that I liked better about VirtualPC here is that it seems to propagate Alt+Tab to the host. So basically when you're in a guest Ubuntu window pressing Alt+Tab brings up the familiar "where to" selector. In VirtualBox you have to start with the "host" key.
Visually VirtualBox has a nicer response from the Ubuntu guest GUI compared to Virtual PC. At least no hiccups. Ubuntu in Virtual PC ran ok until I hooked it up with the host over a loopback adapter. That was when it became noticeably slow.
Audio support. I don't need it in a guest system. I actually don't need a desktop Ubuntu but it was nice to hear what Ubuntu plays when it boots up :)

There's got to be more. Unlike Virtual PC VirtualBox officially supports Ubuntu and many other guest OSes.