So You Think You Can Code, Eh?

My team at kijiji.ca recently launched a coding contest. You can find this contest on Github at https://github.com/kijiji-ca/2013-coding-contest and also see our official announcement at http://kijijiblog.ca/so-you-think-you-can-code-eh.

Early feedback suggests that people have a lot of fun solving this problem. It can certainly be rewarding as an achievement in itself, but to make it even better we’ll be offering some sweet prizes for the best solutions, including a top-notch MacBook Pro for the winner. There is still plenty of time to participate: the deadline is July 31st.


The instructions come with a Java unit test to fix, but the implementation can be written in any JVM language, as long as it’s properly integrated with the provided test case.

It’s great to see this contest received with enthusiasm by the developer community. I’m definitely looking forward to reviewing a lot of brilliant solutions and I’m sure I will learn interesting things along the way. I will be happy to blog about the results when the contest closes.

If Java was a Haskell – The Type System

In my previous article, If Java was a Haskell, I tried to explain the pure functional paradigm offered by Haskell in the words of a Java developer. In this post I will focus on the other strength of Haskell: its strong type system.

Types

What would a Java Class become in the Haskell world?

In my previous post, we ended with static methods working against immutable data structures. In this context, there is no room for Objects as Java developers know them. The first thing that becomes clear is that these static methods are not tied to the data structure they work on, so they don’t need to be in a class. They only need some sort of namespace, a file to keep our code organized. So imagine you can create a method directly at the root of a file. Is there anything left to make a class? Yes, the data. That’s why Haskell uses the keyword data instead of class.

Since the data structures are immutable, they need to be initialized with constructors. Here is an example of a basic class:

class Animal {
  String name;
  Color color;

  Animal(String name, Color color) {
    this.name = name;
    this.color = color;
  }

  String getName() {
    return name;
  }

  Color getColor() {
    return color;
  }
}

Oops … a problem with the getters above is that they’re instance methods. There is no such thing in a functional language. That’s easy to fix by extracting these methods from the class and making them static, like we discussed in the previous article:

class Animal {
  String name;
  Color color;

  Animal(String name, Color color) {
    this.name = name;
    this.color = color;
  }
}

String getName(Animal a) {
  return a.name;
}

Color getColor(Animal a) {
  return a.color;
}

A lot of boilerplate can be removed. When you have named parameters, all the information you may need about an object is available in the constructor. So the language can leverage that and our class becomes simply:

class Animal {
  Animal(String name, Color color);
}

The parameter names are used by the compiler to generate the “getter” functions. This style of constructor is called record syntax in Haskell. Here is an example of how it could be used in a function which tells us whether an animal is dangerous:

boolean isDangerous(Animal a) {
  if (getColor(a) == Color.PINK) {
    return false;
  }
  return true;
}
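For reference, here is roughly what this looks like in actual Haskell record syntax. This is just a sketch: the Color type and its constructors are made up for the example.

-- illustrative sketch: Animal defined with record syntax
data Color = Pink | Orange | Black deriving (Eq)

data Animal = Animal { name :: String, color :: Color }

-- name and color are the generated "getter" functions
isDangerous :: Animal -> Bool
isDangerous a = color a /= Pink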

There can be many constructors with different parameters, but also, and that’s a bit more surprising, different names. In Java, we can’t have two constructors with the same number and types of parameters, simply because they must have the same name. When you remove this limitation, you can do something like:

class Animal {
  Cat(String name, Color color);
  Dog(String name, Color color);
}

The name of the constructor used is a bit like an additional field, used as a discriminator. With a little bit of destructuring, we can use this information. Here is a method which makes an animal speak its language:

String speak(Animal a) {
  switch a {
    case (Cat name _) : return name + " says Miaow!";
    case (Dog name _) : return name + " says Woof!";
  }
}

Notice how destructuring also gives us the field name, so we don’t need to access the generated getter. We use “_” for the color parameter, which means we don’t care about it. For a simple class like Animal, we don’t even need to give names to the constructor parameters; we won’t have getters, but that’s fine thanks to destructuring. So we can simplify Animal into:

class Animal {
  Cat(String, Color);
  Dog(String, Color);
}

In Haskell, the signature of a function is separated from its implementation, which brings the benefit of having multiple implementations selected by pattern matching. That completely removes the need for the boilerplate switch statement. While we’re at it, the return statement can go too: the last expression is automatically returned. Our speak function becomes:

String speak(Animal a)
speak(Cat name _) { name + " says Miaow!" }
speak(Dog name _) { name + " says Woof!" }
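And here is roughly how this final version reads in real Haskell, where the multiple implementations are selected by pattern matching (again a sketch, with a made-up Color type):

-- illustrative sketch: two constructors and a pattern-matched speak function
data Color = Pink | Orange

data Animal = Cat String Color
            | Dog String Color

speak :: Animal -> String
speak (Cat name _) = name ++ " says Miaow!"
speak (Dog name _) = name ++ " says Woof!"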

One last note on types. What happens when the constructors of a class don’t have any parameters? We obtain something very familiar to Java developers: enums. But we don’t need a different name for them; they’re just classes. We can even mix enum constants and parameterized constructors. For instance the class Color could have Pink and Color(int, int, int) as constructors.
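In real Haskell this mix is perfectly legal. A sketch, assuming the parameterized constructor holds RGB components:

-- illustrative: an enum-like constructor and a parameterized one in the same type
data Color = Pink
           | Color Int Int Int   -- e.g. red, green and blue components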

Interfaces (Typeclasses)

When types are not used to define behaviors, the concept of inheritance goes away. However, polymorphism is still useful and is provided in Haskell by typeclasses, which are similar to Java interfaces, except that a typeclass defines a set of functions which can be applied to a data structure. Let’s keep the keyword interface; here is an example:

interface Showable {
  String toString(Showable s);
}

Then, for each concrete type implementing this interface, you define the desired behavior:

class Animal {
  Cat(String, Color);
  Dog(String, Color);
}

Animal implements Showable {
  String toString(Animal a) {
    speak(a); // see function speak above
  }
}

After this, any function requiring a Showable argument can be called with an Animal.
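Here is a sketch of the same thing in real Haskell, keeping the Showable and toString names from the example (the built-in equivalent is the Show class with its show function):

-- illustrative sketch, reusing the Animal and speak definitions from above
data Color  = Pink | Orange
data Animal = Cat String Color | Dog String Color

speak :: Animal -> String
speak (Cat name _) = name ++ " says Miaow!"
speak (Dog name _) = name ++ " says Woof!"

-- the "interface": any type a providing toString is Showable
class Showable a where
  toString :: a -> String

-- the "implements" declaration for Animal
instance Showable Animal where
  toString = speak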

Type Inference

Once we have data types and interfaces, we should be able to rely on the language to deduce the type of variables as much as possible. This is an area where Haskell shines while Java clearly sucks. Here is a trivial example:

myCat = Cat("Puss", ORANGE);
message = speak(myCat);

There is no need to say that myCat is of type Animal: the compiler already knows that Cat is a constructor for Animal. The compiler also infers that message is of type String because that’s what the function speak returns.

The joy of type inference becomes more obvious with higher-order functions. For instance, imagine we have a list of Animal named myPets and we want to transform this list into a list of String containing the names of the animals. In Java, using the Guava collections API, we get the following:

List<String> names = transform(myPets, new Function<Animal, String>() {
  @Override
  public String apply(Animal a) {
    return a.getName();
  }
});

In the Haskellized version, we can do:

names = transform(getName, myPets);

That’s it! The compiler knows that myPets is a list of Animal and that transform turns a list into another list by applying the given function to all the elements. If we pass a function that doesn’t take an animal as argument, the compiler will complain. When passing getName, not only does the compiler not complain, it also deduces that the output will be a list of String.

The Java version needed three String, two Animal and one List. Type inference allowed the haskellized version to get rid of these six type declarations without losing any type safety.
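For comparison, here is a self-contained Haskell sketch of the same idea. The map function is the standard counterpart of Guava’s transform; the data definitions just repeat the earlier example.

data Color  = Pink | Orange
data Animal = Cat String Color | Dog String Color

getName :: Animal -> String
getName (Cat n _) = n
getName (Dog n _) = n

myPets :: [Animal]
myPets = [Cat "Puss" Orange, Dog "Rex" Pink]

-- no annotations needed: the compiler infers Animal for myCat and [String] for names
myCat = Cat "Puss" Orange
names = map getName myPets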

Some special types

Haskell offers a few standard types which are made special by syntactic sugar. Like any language, it of course supports booleans, characters and various kinds of numbers. More interesting and worth mentioning are:

  • List, which is much closer to a Java array than to a Java List and can be defined with the square-bracket syntax, like [1, 2, 3]. A list can even be infinite since it is evaluated lazily (like everything in Haskell). For instance an infinite list of even numbers can be defined as [2, 4 ..].
  • String, which is simply a list of characters, so "hello" is the same as ['h', 'e', 'l', 'l', 'o'].
  • Tuple, which is a special small list containing values of different types. It’s basically a container for simple data structures which are not worth creating a type for. An example would be (42, "Mo", "Molybdenum") (see the short sketch after this list).
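Here are a few illustrative values of these built-in types (the bindings are made up for the example):

evens :: [Int]
evens = [2, 4 ..]                    -- an infinite, lazily evaluated list of even numbers

greeting :: String
greeting = ['h', 'e', 'l', 'l', 'o'] -- exactly the same value as "hello"

element :: (Int, String, String)
element = (42, "Mo", "Molybdenum")   -- a tuple mixing three types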

A very special interface: Monad

Monads are not specific to Haskell; there is even a monad implementation in Java. However, monads are one of Haskell’s most famous trademarks because Haskell offers syntactic sugar that makes them natural to use: the “do” notation.

A monad is a typeclass (an interface in our world). It basically defines a container which can hold a value and be chained with other containers of the same type according to a specified mechanism. To make something useful with this, we don’t chain those containers directly, but rather computations which return them.

A typical example of a monad is the Maybe monad, which can contain either a value or nothing. The Maybe container is parameterized and has two constructors. In our haskellized Java, a Maybe of String would be:

class Maybe<String> {
  Just(String);
  Nothing;
}
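For reference, the real Haskell definition of this parameterized type, found in the standard Prelude, is simply:

-- Maybe a either holds nothing at all, or wraps a value of type a
data Maybe a = Nothing
             | Just a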

The chaining mechanism for Maybe ensures that an input of nothing always yields an output of nothing. In other words, as soon as a nothing happens somewhere in the chain, the output of the chain is nothing.

When the language is not designed for monads, this mechanism gets very clunky. But with the do notation, the invocations of the chaining method are generated by the compiler, making the whole thing feel a bit magical.

// utility function which returns a Maybe
Maybe<String> changeSex(String sex) {
  switch sex {
    case "XX" : Just("XY");
    case "XY" : Just("XX");
    default   : Nothing;
  }
}

changeSexFiveTimes() {
  do {
    s1 <- changeSex("ZY"); // the "<-" is used to extract the value from a monad
                           // (we say bind) but oops ... wrong input yields Nothing

    s2 <- changeSex(s1);   // this changeSex is not called because the chaining
                           // mechanism of Maybe yields Nothing

    s3 <- changeSex(s2);   // this one is skipped too
    s4 <- changeSex(s3);   // and so is this one
    changeSex(s4);  // returns Nothing as the result of the whole block
  }
}

The magical chaining behavior is defined in the Maybe class. Type inference is used by the compiler to figure out which chaining mechanism to apply (i.e. the first value is a Maybe, so the compiler knows the chaining of Maybe applies).
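Here is a rough Haskell transcription of the same example, using the do notation (changeSex is the same made-up helper as above):

-- illustrative helper: flips a chromosome pair, yields Nothing for anything else
changeSex :: String -> Maybe String
changeSex "XX" = Just "XY"
changeSex "XY" = Just "XX"
changeSex _    = Nothing

changeSexFiveTimes :: Maybe String
changeSexFiveTimes = do
  s1 <- changeSex "ZY"   -- wrong input: this yields Nothing...
  s2 <- changeSex s1     -- ...so none of the remaining calls are evaluated
  s3 <- changeSex s2
  s4 <- changeSex s3
  changeSex s4           -- the whole block evaluates to Nothing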

There is a lot more to say about monads, and a lot of literature on the web. Let’s just mention that Haskell uses monads for things which would otherwise be impossible in a pure functional world, namely I/O and state management.

Conclusion

These two articles scratched the surface of Haskell with the eyes and the words of a Java developer. If you know nothing about Haskell, I hope I was able to whet your appetite, and I recommend you take a look at Learn You a Haskell for Great Good to discover the wonders this other world has to offer.

If Java was a Haskell

Haskell has the reputation of being one of the most advanced programming languages of our time. Its main strengths are its functional paradigm, its purity which implies immutability, and a strong type system. I won’t expand on why these properties make it a powerful language, but in a nutshell they lead to short and expressive code, with a pretty darn good amount of reusability, without sacrificing the robustness of the strong and static typing which is familiar to Java developers.

If you’d like to learn Haskell, I recommend starting with the wonderful Learn You a Haskell for Great Good. This introduction to Haskell is really pleasant to read, thanks to the detailed explanations and the cheeky humor, and it’s available online for free.

Very often, people recommend that OO developers forget everything they know in order to grasp the concepts of functional programming. I’ve even seen the term “unlearn” in some places. That may be wise advice, but as I learned about Haskell I couldn’t help comparing it with Java. Let’s see how far we would need to stretch the Java language in order to make it a Haskell.

Variables

The first thing that prevents Java from becoming a Haskell is its ever-present mutability. It’s relatively easy to address: just imagine that every single variable has to be final. In other words, every variable becomes a constant in a given scope. This scope can be a class, an object or just local to a method.

From there, all a programmer can do with a variable is bind it to a value, only once, then read it as many times as necessary. An interesting consequence is that objects can then be created but never modified; they’re immutable. In order to make Java a Haskell, just imagine immutability is the default and there is no other way. And that’s true even for local variables, so no more i++, folks.
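As a tiny illustration of what this feels like in Haskell (a made-up snippet):

-- every binding is final: y is bound once, then only read
double :: Int -> Int
double x = let y = x + x in y
-- there is no y++ or y = y + 1; "updating" a value means introducing a new binding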

Functions

Haskell is a pure functional language, which means everything can be thought of in terms of inputs and outputs, like with mathematical functions. The first thing that comes to a Java developer’s mind is the similarity with static methods. If you take for granted the total immutability mentioned above, static methods actually become pure functions (as long as they don’t contain I/O side effects, which is another story).

Now we need to beef up these static methods a bit. You may be aware of Project Lambda, which will bring lambda expressions to Java 8. It will basically allow methods to take other methods as parameters. In other words, instead of taking only input values, a method will be able to take as a parameter some code to execute. Note that formally speaking a lambda expression is an anonymous function; however, Java 8 will offer a special construct allowing a real named method to be passed as a parameter to another method. So we’ll get more than just lambda expressions.

To bridge the gap with higher-order functions that Haskell and other functional languages offer, we also need the ability to return a method. As far as I know, this will also be possible in Java 8 by encapsulating the returned code into a Runnable or Callable. So Java 8 will pretty much allow for higher-order functions.
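For a taste of what this looks like in Haskell, here is a small made-up example of a higher-order function:

-- applyTwice takes a function as an argument and applies it two times
applyTwice :: (a -> a) -> a -> a
applyTwice f x = f (f x)

-- addTen partially applies (+) and is effectively a returned function
addTen :: Int -> Int
addTen = (+) 10

-- applyTwice addTen 1 evaluates to 21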

Example

Believe it or not, with total immutability and higher-order functions, Java would become a functional language. Objects would merely become data structures – I’ll cover this in another post – and imperative constructs such as loops would become useless. Think about it: even if we keep for in the language, how much can you do with it if you can’t mutate anything? Nada! Instead you’ll need to go the functional way, meaning recursion. For instance, to sum the integers in an array without mutating a variable, we’d need to do something like this:

int sum(int[] values) {
    return doSum(values, 0, 0);
}

int doSum(int[] values, int index, int accumulator) {
    if (index < values.length) {
        return doSum(values, index + 1, accumulator + values[index]);
    }
    return accumulator;
}

The code above requires passing around the current index to look up. To avoid that, we would need a way to extract the tail of the array. One way would be to wrap the array into an object also containing the current index, and use this object instead of the plain array. That’s how the rest function (the Lisp name for tail) is implemented internally in Clojure. But in order to bring more Haskell flavor into our mutant Java, we will introduce some syntactic sugar called destructuring, which allows an argument to be automatically split into multiple variables. Here is how it can transform our code:

int sum(int[] values) {
    return doSum(values, 0);
}

// notice how values is decomposed into head and tail
int doSum(int:int[] head:tail, int accumulator) {
    if (tail.length > 0) {
        return doSum(tail, accumulator + head);
    }
    // the tail is empty: head holds the last value, add it and stop recursing
    return accumulator + head;
}
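For comparison, here is a sketch of the same thing in real Haskell, where the (x:xs) pattern is the head:tail destructuring:

-- illustrative: sum a list by destructuring it into its head x and tail xs
sumInts :: [Int] -> Int
sumInts values = doSum values 0
  where
    doSum []     acc = acc                  -- empty list: return the accumulator
    doSum (x:xs) acc = doSum xs (acc + x)   -- otherwise recurse on the tail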

In the doSum code above, the function applied between the accumulator and each value is hard-coded: it’s ‘+’. Thanks to higher-order functions, we can replace it with any two-argument function that we send as a parameter. Let’s do this and rename doSum into comprehend.

int sum(int[] values) {
    return comprehend(values, 0, (int x, int y) => x + y);
}

int comprehend(int:int[] head:tail, int accumulator, Callable function) {
    if (tail.length > 0) {
        // Note that I couldn't find how Java 8 will allow invoking a Callable
        // with some arguments so I made up the call method below
        int newAccumulator = function.call(accumulator, head);
        return comprehend(tail, newAccumulator, function);
    }
    // the tail is empty: apply the function one last time to the head
    return function.call(accumulator, head);
}

The next step would be to parameterize the comprehend method to allow for any type of accumulator, not just int. This way we can re-use comprehend to do other things, such as filtering a list. Here I will assume that primitives can be used as parameterized types in our new language.

int sum(int[] values) {
    return <int, int>comprehend(values, 0, (int x, int y) => x + y);
}

String[] filterStartsWith(String[] values, String prefix) {
    // the function below introduces a new syntactic sugar ++ which allows instantiating
    // an array which is the exact copy of x, with y appended to it
    Callable filter = (String[] x, String y) => (y.startsWith(prefix)) ? x++y : x;
    return <String, String[]>comprehend(values, new String[0], filter);
}

<E,T> T comprehend(E:E[] head:tail, T accumulator, Callable function) {
    if (tail.length > 0) {
        T newAccumulator = function.call(accumulator, head);
        return comprehend(tail, newAccumulator, function);
    }
    // the tail is empty: apply the function one last time to the head
    return function.call(accumulator, head);
}

You can see how we reused our comprehend method for two very different things: summing the integers in a list, and keeping only the strings which start with a given prefix in a list of strings.
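In Haskell, this generic comprehend is essentially the standard foldl function. Here is a hand-rolled sketch covering both use cases, with names mirroring the examples above:

import Data.List (isPrefixOf)

-- comprehend folds a combining function over a list (this is what foldl does)
comprehend :: (t -> e -> t) -> t -> [e] -> t
comprehend _ acc []     = acc
comprehend f acc (x:xs) = comprehend f (f acc x) xs

sumInts :: [Int] -> Int
sumInts values = comprehend (+) 0 values

filterStartsWith :: [String] -> String -> [String]
filterStartsWith values prefix =
  comprehend (\acc s -> if prefix `isPrefixOf` s then acc ++ [s] else acc) [] values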

There is more…

At this point, we have a functional Java; that’s already a big step. Haskell provides more functional goodness that I won’t expand on. Here is a quick overview (a few of these are sketched in Haskell after the list):

  • Laziness: functions and variables are not evaluated before they’re actually used. This even allows for infinite lists.
  • Function composition: like in mathematics you can build a function which is a chain of other functions.
  • Pattern matching and Guards to automatically branch code based on an input value (or a destructured input value)
  • where and let bindings to structure the code with temporary variables
  • List comprehension: this is such a common use case that Haskell has syntactic sugar for it, so there is no need for a comprehend function like the one we made
  • Currying: internally, every function takes a single argument; when there are two or more in your code, the compiler decomposes it into two or more functions. It’s easier to understand with an example: the function a + b can be decomposed into a function which takes an argument a and returns a function which adds a to its argument; then we pass b to this new function. The cool benefit is the ability to pass around partially applied functions.
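A few of these features, sketched in a couple of lines each (made-up bindings, for flavor only):

import Data.Char (toUpper)

firstEvens :: [Int]
firstEvens = take 5 [2, 4 ..]                   -- laziness: take 5 elements of an infinite list

shout :: String -> String
shout = map toUpper . (++ "!")                  -- function composition: append "!", then upcase

evenSquares :: [Int]
evenSquares = [x * x | x <- [1 .. 10], even x]  -- list comprehension

addFive :: Int -> Int
addFive = (+) 5                                 -- currying / partial application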

In my humble opinion, those are just nice secondary features of Haskell. Its true essence is made of the functional paradigm that I tried to cover in this post, and of its type system, which deserves another post.