Archive for the ‘Java’ Category.

Revenge of Hello Terracotta

When my manager Alex first started at Terracotta, he blogged a simple Hello Terracotta example to demonstrate POJO clustering in action. When I first started last month, he walked me through this same example and went into quite a bit more depth, decompiling some code and showing me a glimpse of how client POJO’s are instrumented by Terracotta and thus plug into our clustering framework. Here’s what we did.

First, if you haven’t yet, please go and run through Alex’s Hello Terracotta example. I’ll be starting right where he left off.

Now, as Alex pointed out to me, I need to make a slight correction to the example POJO code he is using. Here is the code:

The correction is the commented out bit. The compiler complains about that code, since the root field is final. And anyway, it is not necessary – if an L1 node starts up and Terracotta sees that the clustered root already exists, Terracotta will ignore the assignment in the source code (Item 2 above) and instead set that field to the pre-existing root. That is why, if you run this sample program twice in a row without bouncing the Terracotta server, you will see that the counter continues to increment from where it left off the first time. Terracotta clustered objects persist.

To continue, today when I run this example I am adding the special super-secret -Dtc.classloader.writeToDisk=true flag to the vm flags when I start up the dso process. My full command looks like this:

dso-java.sh -Dtc.config=config/tc-config.xml -Dtc.classloader.writeToDisk=true -classpath bin test.HelloTerracotta

This additional flag causes Terracotta to dump the modified classes under ~/adapted/. Under that folder you should find a test/ directory containing the instrumented HelloTerracotta.class file.

At this point you can examine the raw bytecode to see what we’ve added – Terracotta uses ASM to instrument bytecode on the fly in order to cluster arbitrary client code.

Or, you can decompile the file like I did using Jad. This likely won’t be able to decompile everything all the way, but it’s close enough that you can see what’s going on.

When I decompile HelloTerracotta using Jad, after cleaning up the go() method by hand, this is what I see:

First thing to notice, the instrumented version of the class now implements two interfaces, Manageable and TransparentAccess. These are both in the dso-l1-api project if you download the Terracotta source code. The api is our internal api for dealing with all clustered objects in the L1 nodes.

Next, notice that all direct reads of the root field have been replaced with calls to a new dynamically-generated Set __tc_getroot() method, and the one spot in the constructor where we were setting the root field directly is now replaced by a call to a new __tc_setroot(Set) method. And you can see that those getter/setter methods are doing the work of ensuring any pre-existing root is set on this object, or else creating the new clustered root.

You can see that there are a lot of calls into ManagerUtil – this is a dynamically clustered object’s hook into the Terracotta runtime. ManagerUtil can be thought of as a facade of static methods, encapsulating a more interesting system which I won’t go into here.

The go() method didn’t decompile entirely correctly, so I’ve had to tweak it by hand into what you see here. What I find most interesting is that now, in addition to the synchronized block, there are calls (in a try-finally block) to ManagerUtil to acquire/release a clustered lock. Remember – since the root field is actually a clustered object, modifying it requires obtaining a cluster-wide exclusive write lock. It’s also interesting to note that a chunk of that tc-config.xml file, specifically the lock information including the method expression, is being passed to the monitorEnterWithContextInfo method.

In conclusion, this example illustrates the pattern for any POJO being clustered by Terracotta. Terracotta dynamically instruments a class to adhere to the dso api, an api for handling all of the distributed objects and locks within Terracotta. Of course that barely scratches the surface of all that Terracotta does, but at least you can see that conceptually what Terracotta is doing behind the scenes to cluster your code is really not all that complicated.

For further reading, the Terracotta website offers some very straightforward articles about how Terracotta works, and how Terracotta scales. Additionally, there are quite a few other Terracotta employees who blog, and their blogs are listed here.

Weekly Summary

Here’s what I’ve been working on this week at Terracotta.

The Terracotta 2.6 stable 0 release was released this last week. Personally, my contribution was to help implement the dynamic String compression feature. This week, emphasis shifted from achieving feature-freeze to testing.

We wanted to add some tests around serializing Terracotta-instrumented objects between instrumented and uninstrumented code, particularly our new compressed Strings. We needed to make sure that we hadn’t broken serialization with our bytecode manipulation – particularly for String, which is a special case (although this wasn’t apparent to me until I read the source code for java.io.ObjectInputStream#readString).

I mentioned that we have a system test framework, built on JUnit, which is able to start up and run multiple L1 nodes in a test fixture, complete with instrumented classes. My first attempt was to try to test serialization in-process. That is, for each Class I was interested in testing serialization for, I wanted to load two different Class instances (from two different ClassLoaders) and serialize a sample object back and forth using both Classes. The first ClassLoader would be the normal Terracotta ClassLoader, which loads the instrumented version of a Class (either from the Terracotta bootjar or using dynamic instrumentation – in this case I was just trying to test the bootjar classes). I wrote a second, custom (nondelegating) ClassLoader which would load a non-instrumented Class for the same classname – it did this by loading from the unmodified bootclasspath (without Terracotta bootjar prepended to it). This was tricky, since (as my teammate Tim showed me) ClassLoader is hardcoded to not allow loading of any class beginning with “java.”, except from the bootstrap loader only. However, galvanized with the spirit of reckless abandon that I’ve come to embrace since working here, I quickly circumvented this with reflection:

Don’t try this at home. Anyway, this actually worked – I was able to get a test running which loaded different Class objects for each Class I was interested in, including the JDK classes.

However, I was thwarted in my attempts to test Java Serialization in process. I haven’t taken the time to track down the exact cause, but no matter what I tried I either got java.lang.LinkageErrors or some other problem I can’t remember. I’m sure it’s something to do with the fact that my hacked ClassLoader doesn’t play well with serialization, somewhere deep in the bowels of ObjectInputStream or ObjectOutputStream.

As a compromise, I honed in on compressed Strings specifically. At Alex’s suggestion, I manually serialized a sample large String to a file, using a regular non-instrumented String. Then I wrote a test which took the same String, compressed it, and then Serialized it, and I compared the two resulting byte[] arrays. Luckily, the test passed.

I also added some more unit tests to our StringCompressionUtil class, which is the class that actually does the compression of the String for us. It has two responsibilities: (1) actually compressing the byte[] array which we get from the original String, and (2) encoding that compressed byte[] array as a char[] array which can be stuck back into the same String instance. In the course of adding more tests I did a little refactoring and was pleased to find that even with such low-level code I was able to eliminate some duplication and extract some methods which made it a little more clear what our algorithm was intended to do.

At the end of the week I was just beginning to look into some monkey failures. The monkey machines are our continuous integration machines – read all about them here at Hung’s blog. Basically we do continuous integration on a whole bunch of different OS’s using different JDK’s and versions, all the time. This particular failure was using the IBM JDK on a linux machine.

Hacking on java.lang.String

This week at Terracotta we accomplished something that two weeks ago I thought was both impossible and dangerous – we instrumented java.lang.String to be compressible.

What, you ask, the heck is going on? String doesn’t implement any interface called JavaLangString. It doesn’t have a __tc_decompress() method. String is final and immutable! It has to be for thread-safety! Are you mad?

I offered these same objections to my boss two weeks ago, but the fact is that something like the above code will be in an upcoming Terracotta release. What makes this all possible is that we instrument Java bytecode on the fly.

Terracotta already does some large String compression when clustering Strings across cluster nodes, but we wanted to improve on that. What if, when we decoded a compressed String data from across the cluster and constructed a String instance on a remote node, we actually kept the String instance compressed until it absolutely needed to decompress to be read?

Through the voodoo black magic of bytecode instrumentation, we have accomplished this and made it completely transparent to the application’s use of String. Here’s a basic outline of what happens:

  • data about clustered compressed String is sent over the wire (we call it “hydration”) and decoded at one of the cluster nodes. The following data is encoded:
    • actual compressed String data, in byte[] array form
    • uncompressed String length (int)
    • String hashcode (int)
  • when decoded, the compressed byte[] array is encoded into a char[] array – basically two bytes can fit into a single char.
  • a String instance is constructed with that char[] array, and also the uncompressed length and original hashcode

So, if the resulting String were actually read, it would be gibberish (if it were displayable at all). (By “read” I mean, if it’s internal character content needed to be accessed.) That’s why we need to instrument the String class – if and when the String needs to be read, we have instrumented it so that it knows how to decompress itself. We have basically added a new private boolean field (indicating if compressed or not) a new constructor and some additional methods.

There’s a lot going on here so l’ll point a few things out. First, we intercept any field-level access of the private internal char[] value and we route it through this __tc_getvalue() getter method. This is how we transparently decompress the contents no matter how the String instance is accessed.

Secondly, there are important concurrency issues in play here. String is thread-safe because it is immutable, or was until we got our grubby mits on it. We need for String to remain both thread-safe and also highly concurrent, so we wanted to avoid synchronized blocks. Our solution?

We have instrumented value to be volatile, as well as our new $__tc_compressed field. Now if you once more examine the __tc_getvalue() method above, you’ll see that there is a benign race condition. It’s possible two threads could both decompress and set the value field. That’s fine, since the decompression is deterministic and outputs the same result each time. Because the fields are volatile, once they are set those changes should be visible to all other threads. What should not ever happen is, no thread should attempt to decompress the characters once they are already decompressed – the call to StringCompressionUtil inspects the data to see if it has already been decompressed, and if it has it returns null.

The above methods don’t exist anywhere in source code per se. Rather, we use ASM to add the bytecode for these new methods dynamically to the String class as it is loaded at runtime. So what we actually have somewhere is code that looks like this:

This creates the bytecode to implement the __tc_decompress() method.

When does an instrumented String instance need to decompress itself? Basically, when it’s internal char[] array needs to be accessed. It does not need to decompress when length() method is called, because we have set the uncompressed length when we constructed the String. It does not need to decompress when hashCode() is called, since we have preset the hash code, so that means a compressed String can sit around in a Map all day long without needing to decompress. We even have plans to muck with the equals() method – if two Strings are unequal due to hash code we can exit early and avoid decompressing them to compare character content.

Let’s look once more at the code way at the top:

Because we have instrumented String, at runtime instances of String do implement the interface JavaLangString. So, within Terracotta we can cast an instance of String to JavaLangString, which is an interface containing the __tc_decompress() method. In this way we avoid using reflection. One interesting thing to note is that we first have to upcast to Object to trick the compiler – at compile time the compiler knows (or thinks it knows) that String does not implement this interface, so it errors.

In conclusion, a bunch of us were talking after Alex’s Terracotta talk last night, and someone made the point that, with any good technology, your users will inevitably take advantage of it in a way that you never dreamed of. And that, I think, is what Terracotta is doing.

Bit By Byte

Quick, what’s wrong with this code?

Looks good, right? Well this innocent looking routine caused Eclipse to go rampant and brought my MacBook to it’s knees, twice. I had to lean on the power button both times.

I stared and stared at this code and finally started stepping through it in the debugger! It appeared that both loops were working fine, but that a was never incrementing. It just stayed at -128 while b incremented and wrapped around again and again, infinitely...

Have you spotted the problem yet?

The problem is, neither of the loops can terminate. Neither for condition will ever become false and cause the loop to terminate, because neither a nor b can ever grow larger than Byte.MAX_VALUE. Of course! >Audible forehead slap.<

Feeling rather sheepish, I replaced the above code with this:

I pondered whether this would be a good fit for the elusive do...while construct, but I'm a lazy lazy man.

Taking the Steve Yegge Challenge (Part 1: Bits and Bytes)

A few months ago I read Steve Yegge’s old post about phone screen tips, which included five categories that at a minimum should be covered during a phone screen. I found that I wasn’t as up to snuff as I thought I was in a couple of those categories, including “Bits and Bytes”. I took this as a challenge.

In particular I discovered I couldn’t even name the size in memory of each of the Java primitives. About a month ago I looked them all up, as I was preparing to interview for a job at Terracotta. The sad thing is, I’ve already forgotten them all again. So I am hoping now, once and for all, to burn them into my brain.

Java primitives:

primitive size (bits) min value max value comments
byte 8 -27 = -128 27-1 = 127 signed 2's complement
short 16 -215 = -32768 215-1 = 32767 signed 2's complement
int 32 -231 = -2,147,483,648 231-1 = 2,147,483,647 signed 2's complement
long 64 -263 = -9,223,372,036,854,775,808 263-1 = 9,223,372,036,854,775,807 signed 2's complement
float 32     32-bit IEEE 754 floating point
double 64     64-bit IEEE 754 floating point
char 16 '\u0000' (or 0) '\uffff' (or 65,535) Unicode
boolean       size isn't precisely defined in JLS

Java bitwise and bit shift operators (legal on Java integral types)

operator explanation comment
~ bitwise complement inverts a bit pattern
<< signed left shift keeps sign (high bit) of first operand
>> signed right shift keeps sign (high bit) of first operand
>>> unsigned right shift shifts a zero into high (leftmost) bit
& bitwise and see a clever real-world use of this
| bitwise or  
^ bitwise exclusive or  

Notes on 2’s complement:

  • high (leftmost) bit indicates sign (0 indicates positive, 1 indicates negative)
  • largest value for n bits is 2(n-1)-1, smallest value is -2(n-1)
  • processors can treat (add) all numbers uniformly (no need to check for sign and/or subtract)
  • To negate a two’s-complement number, invert all the bits (bitwise NOT) then add 1 to the result.
  • More conceptually, 2’s compliment of positive number obtained by subtracting that from 2n. For example, for 8 bits, “-3″ is gotten by subtracting “3″ (00000011) from 28 (100000000) to get (11111101).
  • One interesting side effect of 2’s complement is that the smallest negative number is a weird number – it violates the aforementioned algorithm to flip the bit, which all other numbers adhere to. This leads to an exciting “feature” of Math.abs(), where it’s possible for that method to return a negative number. For example, using the smallest negative int, Math.abs(-2147483648) returns -2147483648!