Hacking on java.lang.String
This week at Terracotta we accomplished something that two weeks ago I thought was both impossible and dangerous - we instrumented java.lang.String to be compressible.
What, you ask, the heck is going on? String doesn’t implement any interface called JavaLangString. It doesn’t have a __tc_decompress() method. String is final and immutable! It has to be for thread-safety! Are you mad?
I offered these same objections to my boss two weeks ago, but the fact is that something like the above code will be in an upcoming Terracotta release. What makes this all possible is that we instrument Java bytecode on the fly.
Terracotta already does some large String compression when clustering Strings across cluster nodes, but we wanted to improve on that. What if, when we decoded a compressed String data from across the cluster and constructed a String instance on a remote node, we actually kept the String instance compressed until it absolutely needed to decompress to be read?
Through the voodoo black magic of bytecode instrumentation, we have accomplished this and made it completely transparent to the application’s use of String. Here’s a basic outline of what happens:
- data about clustered compressed String is sent over the wire (we call it “hydration”) and decoded at one of the cluster nodes. The following data is encoded:
- actual compressed String data, in byte[] array form
- uncompressed String length (int)
- String hashcode (int)
- when decoded, the compressed byte[] array is encoded into a char[] array - basically two bytes can fit into a single char.
- a String instance is constructed with that char[] array, and also the uncompressed length and original hashcode
So, if the resulting String were actually read, it would be gibberish (if it were displayable at all). (By “read” I mean, if it’s internal character content needed to be accessed.) That’s why we need to instrument the String class - if and when the String needs to be read, we have instrumented it so that it knows how to decompress itself. We have basically added a new private boolean field (indicating if compressed or not) a new constructor and some additional methods.
There’s a lot going on here so l’ll point a few things out. First, we intercept any field-level access of the private internal char[] value and we route it through this __tc_getvalue() getter method. This is how we transparently decompress the contents no matter how the String instance is accessed.
Secondly, there are important concurrency issues in play here. String is thread-safe because it is immutable, or was until we got our grubby mits on it. We need for String to remain both thread-safe and also highly concurrent, so we wanted to avoid synchronized blocks. Our solution?
We have instrumented value to be volatile, as well as our new $__tc_compressed field. Now if you once more examine the __tc_getvalue() method above, you’ll see that there is a benign race condition. It’s possible two threads could both decompress and set the value field. That’s fine, since the decompression is deterministic and outputs the same result each time. Because the fields are volatile, once they are set those changes should be visible to all other threads. What should not ever happen is, no thread should attempt to decompress the characters once they are already decompressed - the call to StringCompressionUtil inspects the data to see if it has already been decompressed, and if it has it returns null.
The above methods don’t exist anywhere in source code per se. Rather, we use ASM to add the bytecode for these new methods dynamically to the String class as it is loaded at runtime. So what we actually have somewhere is code that looks like this:
This creates the bytecode to implement the __tc_decompress() method.
When does an instrumented String instance need to decompress itself? Basically, when it’s internal char[] array needs to be accessed. It does not need to decompress when length() method is called, because we have set the uncompressed length when we constructed the String. It does not need to decompress when hashCode() is called, since we have preset the hash code, so that means a compressed String can sit around in a Map all day long without needing to decompress. We even have plans to muck with the equals() method - if two Strings are unequal due to hash code we can exit early and avoid decompressing them to compare character content.
Let’s look once more at the code way at the top:
Because we have instrumented String, at runtime instances of String do implement the interface JavaLangString. So, within Terracotta we can cast an instance of String to JavaLangString, which is an interface containing the __tc_decompress() method. In this way we avoid using reflection. One interesting thing to note is that we first have to upcast to Object to trick the compiler - at compile time the compiler knows (or thinks it knows) that String does not implement this interface, so it errors.
In conclusion, a bunch of us were talking after Alex’s Terracotta talk last night, and someone made the point that, with any good technology, your users will inevitably take advantage of it in a way that you never dreamed of. And that, I think, is what Terracotta is doing.
php code and scripts » Blog Archive » Hacking on java.lang.String:
[…] unknown wrote an interesting post today onHere’s a quick excerptWhat makes this all possible is that we instrument Java bytecode on the fly. Terracotta already does some large String compression when clustering Strings across cluster nodes, but we wanted to improve on that. … […]
14 March 2008, 3:07 pmAlex Miller:
One gotcha on the IBM JDK is that they named the hash field hashCode instead so we had to work around that. I also noticed that they actually do have fail-fast code in their equals() method that checks the hash code. (Sun does not.)
14 March 2008, 8:01 pmGuy:
Can you extend more on how you were able to instrument java.lang.String?
15 March 2008, 4:33 pmScott:
@Guy - as soon as time permits I’d like to write a whole nother post about just what Terracotta’s dso-java process does, including how instrumentation happens during Class loading. Stay tuned
15 March 2008, 9:07 pmGuy:
Thanks that would be great, I’m trying to build a monitoring system that required this capability.
16 March 2008, 12:12 amAnd I couldn’t find a way to instrument the classes that were loaded before my javaagent.
Tim:
I’ve looked at String compression before to speed up network communications. It seems that unless the network is a significant bottleneck, the added cost of CPU cycles invalidates the benefits gained by the compression.
In short, doesn’t compressing kill your scalability due to CPU loads? The network is not usually the bottleneck.
18 March 2008, 4:24 pmSteve:
The short answer is depends.
If one is using large strings then it doesn’t take long before the network is the bottleneck. Also, having the strings compressed means less processing by the server, less GC on the server and less data to write to disk. Another advantage of doing the compression on the client is that actually, the client is rarely CPU bound and has horsepower to spare. Another cool thing about this approach is that nodes that receive the compressed strings may never have to decompress them or not have to decompress them for some time saving considerable memory. Keep in mind that it is important to only compress strings over a certain size and that size is configurable. In the future we may also do tricks like proxifying strings.
18 March 2008, 5:34 pmScott:
@Tim - good points, and it would probably take me (or better yet one of my smarter teammates) at least an entire post to properly address them all. But a couple things: the optimizations I’m talking about in this blog are more concerned with conserving heap space on a cluster (L1) node. Imagine a clustered Map cache with large String values (say, XML documents). The heaps of all L1 nodes are impacted by a large String cache. With these optimizations, even though each L1 node takes a CPU hit each time it compresses a String, the tradeoff is less heap usage for all L1 nodes (assuming each L1 nodes doesn’t need to access every single value in the clustered cache). And of course this is all configurable - you can turn String compression on/off and set the String size at which it kicks in. So you could say we’re tackling use cases where the L1 heap is the bottleneck. It’s definitely a balancing act, and one that we are continuing to test and profile.
18 March 2008, 5:53 pmPeter Lawrey:
I like ASM and the idea of dynamic instrumentation however I believe in this case a simpler approach would be to just create a customised version of the String class and use that.
The linked example compresses Strings longer than 1024 characters.
http://www.freshvanilla.org:8080/display/www/Compressing+java.lang.String
The included unit test outputs
19 March 2008, 4:08 pmString of length=38890 uses 18527 bytes.
Scott:
@Peter - thanks for the comment and link. Yes, yours is a much more straightforward way to accomplish the same thing. At Terracotta it made sense to use ASM since we are already doing quite a lot of dynamic bytecode instrumentation (we have to in order to cluster any arbitrary client code) and have a framework for it. ASM makes it straightforward to, say, intercept field-level reads of “value” char[] field and route them through our __tc_getvalue() method, although you can certainly code that by hand. Also, our dynamic bootjar generation shields the users from slight variations of String source code in sun, ibm and azul JDKs.
19 March 2008, 10:20 pmpeter lawrey:
I take your point re: different JDKs. You could have changes between versions of the same JDK.
The approach I have taken to writing ASM code is to write basically what I want in Java, use the ASMifier to dump the code required and generalise it.
{: Its occurred to me you could override Object to change the behaviour of primitive arrays. :}
20 March 2008, 3:37 amSteve:
Interesting and creative idea but I don’t think it would work. I think arrays rely on special bytecode instructions for setting and getting it’s indexed fields instead of methods.
20 March 2008, 5:53 am