Archive for the ‘Terracotta developer’ Category.

Weekly Summary - web app

It was a better week. As I said last week, I was looking to cut down on distractions and improve my focus. I stuck to a morning routine this week: every day I got up around six, took a 30-minute walk on the beach while listening to Stack Overflow podcasts, then got showered and online by 7 to 7:15. I usually got over an hour of work done before anybody else even woke up. I kept twittering, e-mail and such to a minimum.

I’ve definitely enjoyed working at our condo here in Destin, FL. It was quite a novelty to work on a balcony overlooking the Gulf of Mexico, and I made sure to rub my teammates’ faces in it. But honestly by the end of the week I couldn’t stand to work out there with all the noise and glare. I spent most of my remaining time on the north balcony which overlooks a golf course and lake, and is much quieter.

This week I got started on the reference web app that we are working on as part of Terracotta’s strategy to really nail the clustered “user session” use case. Alex has blogged in detail about that strategy, and more recently on the technology stack we settled on for the web app. Geert Bevin, who is himself the author of the Rife web framework, had already spent a week or two learning Spring MVC and laying down some architectural groundwork. By the time I got started on Monday he was up and running and had stubbed out some basic pages.

My week was divided into two parts. Part one was two and a half painful days of just setting up my workspace. This involved upgrading to Eclipse 3.4 Ganymede, which includes the WTP that Geert was already using to launch the app in Tomcat. Then upgrading all of the plugins I needed. Then endlessly debugging because I got exception after exception when I tried to do seemingly anything.

When I was interviewing for Terracotta, one phrase I heard was “just in time learning”, and that was certainly true this week. Here is a probably-not-complete list of technologies that I’ve had to learn at least some of (or some more of) this week, quickly: Maven and its Eclipse plugin, Subversion and the Subclipse plugin, WTP, Spring MVC, JPA annotations with Hibernate as ORM provider, SOJO (for JSON parsing in Java), Crosscheck (being evaluated for JavaScript unit testing in the Java VM), jQuery, MySQL, and probably some other stuff I’m forgetting.

During the second half of the week, everything started clicking and I was finally committing changes. I began working on the exam-creation page. Geert recommended that we approach the page by using JavaScript to allow the user to build up an entire exam and submit it as a single request, passing JSON to the Controller. I’m really excited about this because I’ve grown to like coding JavaScript. I’m almost disappointed that I’ll be on vacation this week. By week’s end I had the controller basically working, accepting and parsing JSON. But I did not get any JavaScript in place yet, although I spent a good bit of time reading up on jQuery and Crosscheck.

My Office This Week

I’m working remotely this week in Destin, FL. It’s rough, but that’s life working for Terracotta.

(Photos: destin.jpg, s100_1479.JPG)

Weekly Summary

I struggled last week. It was just one of those weeks where nothing seemed to be easy. The week started off nicely with me adding some more functionality to our distributed cache testing framework. That’s a peripheral and relatively new code base that I’m pretty familiar and comfortable with by now. But after that things started queuing up and I couldn’t close anything out.

First, I was asked to take a look at a customer issue involving Serialization of Terracotta-instrumented ConcurrentHashMap instances. The instrumenting we do on that class breaks Serialization of CHM instances between non-Terracotta processes and Terracotta processes. I’d like to blog about this separately, and actually I think someone smarter than me could do a PhD dissertation on how Terracotta instruments CHM (there’s a lot of it) and why it’s necessary. So in two days I was really only able to evaluate the bug, get it reproduced in a test case, and decide on an approach, but was not able to actually make a fix. That was disappointing.

Next, I was asked to see about removing the JSR 107 dependency from our ehcache TIM. By this time it was later in the week and I was packing for our trip to Florida, and I only finally found what I think is the problem while en route to Florida without an internet connection.

Thirdly, a teammate in India and I have been trying to do some followup testing on the initial cache testing I last blogged about, but it has been going slower than we’d like due to various roadblocks too boring to mention. Just a little while ago I committed some more functionality to our distributed testing framework which will allow us to vary the number of Segments used in CHM instances when we test, and that should’ve been done last week.

This week I’m going to make a concerted effort to eliminate potential distractions, and to better manage all the various feeds that bombard me. IM has to be on pretty much all the time, so my coworkers can reach me, but otherwise this week I’m going to turn off Twitter, RSS feeds and all e-mail for long stretches at a time and get…things…done! I’m also going to go to bed early (soon) and try to get up very early, take a walk to wake myself up, and get to work early. I’d like to do that all week and see if it helps.

My family and I are in Destin, FL for two weeks. I’m going to work this week and take next week as vacation. I’m excited to be down here and have a change of scenery, but at the same time I’m surrounded by many more potential distractions.

Weekly Summary - Clustered Performance Testing

I haven’t been writing here as much as I’d like to, and truthfully there is a ton of stuff I want to write about! But it’s hard to make the time. Especially when, given extra time, I’d rather just keep working on what I’m doing :)

Over the last couple weeks I’ve been running a set of clustered performance tests using our homemade distributed testing framework, nicknamed “Droid”. I went into some detail about this in my last post. The testing I did was to measure the cluster-wide throughput (transactions per second) given one, two, four and eight nodes in a cluster (not counting the Terracotta server as a node). We repeated these tests using both ConcurrentHashMap and Ehcache as our distributed cache, and we repeated all of the tests with a new (2.6.2) and older (2.5.4) version of Terracotta.

I had a one-on-one with my boss Steve, also one of the co-founders and originally an engineer himself (now head of engineering). We had an interesting discussion about the testing results, and concurrent testing in general. He reminded me to always be aware of unexpected bottlenecks when testing, and always make sure you’re measuring what you think you’re measuring. For example, we designed the test so that none of the machines would be memory or CPU bound - but did I verify that that was in fact the case? Not really. I just set the JVM memory high enough and hoped for the best. We were really trying to get a feel for how Terracotta distributed lock contention would bog down linear scalability as more L1 nodes were added, so Steve’s point was that we don’t want other unexpected resource constraints to mar the measurements.

Late last week, continuing into this week, I started taking a first swipe at collecting cluster-wide statistics in Droid. We already have single-node statistics, but it would save us (primarily Alex) some time if the framework did the number crunching for us. Of course this means we have to use Terracotta and create another distributed object for doing such collecting and processing.
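To make the number crunching concrete, here is a minimal sketch of rolling per-node samples up into a cluster-wide TPS figure. It is not Droid's code: the NodeSample and ClusterThroughput names are hypothetical, the samples live in plain memory rather than in a Terracotta-clustered object, and it assumes cluster-wide TPS is simply the sum of the per-node rates.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical per-node sample: transactions completed and the time they took. */
final class NodeSample {
    final String nodeName;
    final long transactions;
    final long elapsedMillis;

    NodeSample(String nodeName, long transactions, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        this.nodeName = nodeName;
        this.transactions = transactions;
        this.elapsedMillis = elapsedMillis;
    }

    /** Transactions per second for this node alone. */
    double tps() {
        return transactions * 1000.0 / elapsedMillis;
    }
}

final class ClusterThroughput {
    /** Cluster-wide TPS, taken here (an assumption) as the sum of per-node rates. */
    static double clusterTps(List<NodeSample> samples) {
        double total = 0.0;
        for (NodeSample s : samples) {
            total += s.tps();
        }
        return total;
    }

    public static void main(String[] args) {
        List<NodeSample> samples = Arrays.asList(
                new NodeSample("node-1", 1000, 2000),
                new NodeSample("node-2", 3000, 2000));
        System.out.println("cluster TPS = " + clusterTps(samples));
    }
}
```

In a real cluster the list of samples would have to be shared between nodes, which is where the extra distributed object mentioned above would come in.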

I also spent some time with a new engineer in India, Himadri, trying to get him started on Droid. His development machine runs Windows and I have an MBP, so there’s been some pain there. In particular, we ran into what turned out to be a known (but not by me) issue in our build process that occurs only on Windows.

In other news, so far I absolutely love working out of my home, but on Thursday two weeks ago I experienced the downside. My internet connection went out. Grrr. So I packed up and drove to my parents’ house, but it was out there, too. Stupid of me - my parents and I both have Charter cable, and it turned out that Charter had an area-wide outage that day. At the time I was very irritated - it’s so easy to hate Charter. I finally decided to go to McAlister’s Deli, which has free wifi. I started my working day at 11 o’clock that day. But it ended up being a great experience: the wifi worked fine, and McAlister’s has sweet tea, which to me is like crack cocaine. My boss Alex and I even ended up meeting there last week for a working lunch. Incidentally, Alex has DSL at his house, so chances are good that we won’t both have an outage at the same time.

This Friday I’m leaving with my family for Florida for two weeks. I’m going to work down there the first week.

Weekly Summary

It was a good week. I finally got all of the automated TC Spring tests to pass for Spring 2.5.4, so I was able to mark that issue done. Terracotta now clusters Spring 2.0.x through 2.5.x. That code base is due for a refactoring, though. Our code for clustering Spring uses AspectWerkz to define join points all over the Spring source code, not just the public API. What this means, as I’ve ranted about before, is that even minor changes to Spring’s source code (as occur even between minor releases such as 2.0.5 and 2.0.8) have broken our clustering code. What I’d like to do, when time permits, is see if we can rewrite our aspects to only use methods of the public Spring API as join points. That should give us a whole lot more stability.

My boss Alex is prepping me to help him do some more performance testing. He recently wrote some great blog entries about that here and here. We met with the product management team this week to brainstorm what sort of testing we want to do, what sort of data they might want to have from a marketing/sales perspective, etc. As Alex pointed out, it’s a tricky thing - this sort of testing always leads to finding bugs, which leads to bug fixes, which invalidates any prior testing and so you have to start over. Luckily, we already have a very capable distributed testing framework, developed in-house by Alex, in which we can pretty easily script tests with Groovy. We can have agents on multiple machines (i.e. L1 nodes, talking to a TC L2 server) and have the agents start workers to run tests. The agents can do things like kill and restart workers, to test having to repartition a distributed cache. Sounds like the first thing we’re going to measure is the load time and then the TPS (transactions per second) for a couple different kinds of distributed caches: ConcurrentHashMap and Ehcache.

We found out this week our next big company-wide gathering in San Francisco will be the week of Oct. 13-18. I’ve already booked my flight and hotel room. I’m excited - these trips have so far been a lot of fun.

I did a phone interview for a candidate to join my team. Probably shouldn’t elaborate on that yet, but I will say that Terracotta is very thorough with candidates. When I interviewed back in January, I did five phone interviews, four of them with other engineers, before being invited to come out in person. When I did fly out, I was interviewed by another five people, including the CEO and CTO! Honestly, although it was exhausting, I had a great time! I loved being challenged by, and having conversations with, some very smart and talented people who have produced some amazing software.

New software this week: OmniGraffle, which I’ve heard from everyone is the only graphics editing software you need on a Mac. I’ve got a copy now which I will hopefully be using in the not-too-distant future to write some more technical blog entries about Terracotta. Also, Alex encouraged us to try out FindBugs, including its Eclipse plugin here (update site). I’ve added both of these to my list of essential Mac software for the Terracotta developer.

Bash and TC Build Hacks I Learned in the Last Two Hours

There’s very good documentation about Terracotta’s in-house TC Build system already. But I’ve been doing some intense debugging with Hung, and have learned some things that I want to write down before I forget.

run without ivy: tcbuild blah blah --no-ivy - I’m assuming this runs faster because it skips using Ivy to check that all dependencies are in place.

run without compiling: tcbuild --no-compile blah... - handy when just shuffling some runtime dependency or something.

put environment stuff in .bashrc

check trunk/buildsystem to find things like jruby

For our automated container tests, individual jar files are placed in one huge WAR file. This is not true for ordinary unit tests.

Doing something like ./tcbuild check_one CustomScopedBeanTest --no-ivy > log.txt 2>&1 puts output in a file; the last part (2>&1) redirects the error stream to the output stream, so errors land in the same file.

Important shared stuff at /shares/terra/jdk/ such as Java, ant, etc

Grep trick 1: ps -ef | grep java to see details about Java processes running

Grep trick 2: env | grep JAVA to see environment variables I should have set up to run tcbuild

Grep trick 3: find <path> -name <filenamepattern> | xargs grep <searchstring> - finds all files matching filenamepattern that also have the search string within them

find trick: rm -rf `find . -type d -name .svn` - removes all .svn directories recursively

~/.tc/appserver is where tomcat is stored during automated tests - may want to remove as sanity check sometimes.

~/.ivy* is where ivy stuff is stored - may want to remove prior to doing total clean rebuild.

Weekly Summary - TC Spring again

This weekly summary actually encompasses the last three weeks. Sigh.

Lots of activity throughout dev is centered around the Terracotta 2.6 and 2.6.1 releases, as well as the upcoming 2.6.2 release.

Primarily I’ve been working on updating Terracotta’s Spring support to 2.5.x. Currently we only support up to 2.0.5. I had thought I had gotten it working up through Spring 2.0.8, but late last week we fixed a bug in our build process which then revealed three failing automated TC Spring tests which were previously (incorrectly) passing. So Spring 2.0.8 is not quite there…but close. Meanwhile, my compadre Nitin had made some changes that got TC working with Spring 2.5, but those changes are not backwards compatible with Spring 2.0.x, so I’m investigating whether they can be merged together somehow. Since we are dependent on the Spring source code in order to instrument their code (by using AspectWerkz), we are subject to the whims of whatever source code changes occur between even minor releases (such as differences between Spring 2.0.5 and 2.0.8).

The other thing of note that I got accomplished was to respond to this post on our forums about a deadlock occurring in Terracotta L1. The poster had nicely laid it all out for us, with a stack trace excerpt clearly showing the deadlock. My teammates and I reviewed the pertinent class, and I cleaned up a number of synchronization bugs, including missing synchronization. The deadlock itself was cleaned up by moving to a CopyOnWriteArrayList for a collection, which previously was being locked while iterating through it (read-only) and doing expensive stuff. The fix will be in the 2.6.2 release.
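For anyone who has not used it, here is a small sketch of why CopyOnWriteArrayList helps in this kind of situation. It is not the Terracotta class that was fixed; ListenerRegistry is a made-up example. The point is that iteration works on a snapshot of the list, so readers need no lock while they do slow work, and a writer can modify the list in the middle of an iteration without a ConcurrentModificationException.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical registry whose callbacks may be slow or may touch the registry. */
final class ListenerRegistry {
    // Writes copy the backing array; iterators see the array as it was
    // when the iteration started.
    private final List<Runnable> listeners = new CopyOnWriteArrayList<>();

    void add(Runnable listener) {
        listeners.add(listener);
    }

    /** Runs every listener without holding any lock; returns how many ran. */
    int fireAll() {
        int ran = 0;
        for (Runnable listener : listeners) {
            listener.run(); // may be expensive, may even call add()
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        ListenerRegistry registry = new ListenerRegistry();
        // This listener registers another listener while the list is being iterated.
        registry.add(() -> registry.add(() -> { }));
        System.out.println("first pass ran " + registry.fireAll());
        System.out.println("second pass ran " + registry.fireAll());
    }
}
```

The trade-off is that every write copies the whole array, so this only makes sense for collections that are read far more often than they are modified.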

I was without internet connection at my house a couple weeks ago for a few days. I had to do bloody battle with Charter to get that fixed. Ultimately a technician came and found that the line to my house had been put on a splitter at some undetermined point in the past, and so my signal strength was no longer strong enough. Meanwhile, luckily, I was able to go to my parents’ house and get some work done there. Have I mentioned that I love my MacBook Pro, and wireless internet?

Weekly Summary

Last week was shortened by jury duty on Monday. Fortunately, I was never selected from the pool, and on Tuesday I was back to work.

There are (still) a number of monkey failures (such as this one) that I need to get working on.

However, I discovered I could procrastinate tackling those by checking the forums. I decided to try to answer this post (which has since been addressed by a couple of my teammates). Almost two days later, I conceded that it’s really really hard to try to cluster the underlying javax.swing.text.AbstractDocument of a JTextField. I still haven’t got it. (See gkeim’s response for a clever workaround.)

I finished up the week by working on droid. One of my peers was having trouble running a test in which he wanted one of the spawned workers to have a different tc-config than all the others. There are some weird subtleties in passing vm arguments through the agent which are intended for the worker, but as it turned out, I believe the functionality is already there and didn’t require any changes on my part, just an explanation of how to do it.

Weekly Summary

I spent last week entirely on one issue: CDV-244 (and its duplicate CDV-736, which someone raised recently on a forum). I got a lot of coding done, but ultimately did not yet fix the issue. This week I need to put that aside and do some performance testing; my team owes marketing some comparative testing between versions 2.5 and 2.6 of Terracotta. So, it was another frustrating week of not actually finishing anything.

The issue itself is interesting. Serializing an object in the L1 which is not fully faulted into memory can fail if the serialization uses sun.misc.Unsafe.getObject(Object, long), which is a native method and which bypasses all of our bytecode instrumentation magic. Our solution, which is becoming somewhat of a pattern with native methods, is twofold: (1) use instrumentation to create a new wrapper method within Unsafe, called __tc_getObject, which does all of the necessary resolution of the Object arg if it is clustered, and (2) instrument any and all instrumented code which calls the original native method to call this instead. (This is similar to the way we tackled String.intern(), another native method.)
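The shape of that pattern can be shown with a toy model. This is only an illustration under loud assumptions: LazyHolder and RawAccess are invented names, there is no bytecode instrumentation or sun.misc.Unsafe involved, and "faulting in" is reduced to calling a loader. What it shows is a raw read that sees an unresolved field, next to a wrapper that resolves first and then delegates to the same raw read.

```java
import java.util.function.Supplier;

/** Toy stand-in for a clustered object whose field is loaded lazily. */
final class LazyHolder {
    Object value; // stays null until the object is "faulted in"
    private final Supplier<Object> loader;

    LazyHolder(Supplier<Object> loader) {
        this.loader = loader;
    }

    /** Loads the field on first use; later calls are no-ops. */
    void resolve() {
        if (value == null) {
            value = loader.get();
        }
    }
}

final class RawAccess {
    /** Plays the role of the native read: looks at the field directly, no resolution. */
    static Object getObject(LazyHolder holder) {
        return holder.value;
    }

    /** Wrapper in the spirit of __tc_getObject: resolve first, then delegate. */
    static Object resolvingGetObject(LazyHolder holder) {
        holder.resolve();
        return getObject(holder);
    }

    public static void main(String[] args) {
        LazyHolder holder = new LazyHolder(() -> "payload");
        System.out.println("raw read before resolve: " + getObject(holder));
        System.out.println("wrapped read: " + resolvingGetObject(holder));
    }
}
```

Step (2) of the real solution corresponds to changing every caller of getObject to call resolvingGetObject instead, which in Terracotta's case is done by instrumentation rather than by hand.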

Our product manager Taylor wrote a nice simple TIM which reproduced the issue. I took that and modified it so that I could step through using Eclipse’s debugger. I also came up with one of our automated system tests which currently fails, and which I hope will prove the fix works, once the fix is done. At this point I’ve got a fix that I think should work, but the boot jar tool complains about the boot jar being invalid during startup. I’ve decompiled and tested the instrumented copy of Unsafe from within the boot jar and it is fine, so I’m not yet sure what the problem is.

Friday was the last day for my teammate Antonio, who has accepted a job with NASA. We are extremely sorry to see him go. He has been on the team for some time, he knows a lot, and has done tons of great coding. He is very smart, works hard and is a nice guy. Good luck in space, Antonio!

Weekly Summary - TC Spring

This week is already underway, but here’s what was up last week.

Mostly I worked on this issue: http://jira.terracotta.org/jira/browse/CDV-569. The gist is that Terracotta Spring had a bug when running as the root web app inside Tomcat: it didn’t properly parse the application name, or in this case use the reserved ROOT app name. (When running TC Spring as the root web app, you would want to use ROOT, in all caps, as your application name within your tc-config.xml file.)
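The naming rule is easy to sketch. The code below is not the actual Terracotta fix; AppNames and fromContextPath are hypothetical names. It only assumes two things: the root web app has an empty context path (which is what the Servlet API's getContextPath() returns for the root context), and every other app is named by its context path without the leading slash.

```java
/** Hypothetical helper that maps a servlet context path to an application name. */
final class AppNames {
    static final String ROOT = "ROOT";

    static String fromContextPath(String contextPath) {
        // The root web app has an empty context path; "/" is handled defensively.
        if (contextPath == null || contextPath.isEmpty() || contextPath.equals("/")) {
            return ROOT;
        }
        // Otherwise drop the leading slash: "/shop" becomes "shop".
        return contextPath.startsWith("/") ? contextPath.substring(1) : contextPath;
    }

    public static void main(String[] args) {
        System.out.println(fromContextPath(""));
        System.out.println(fromContextPath("/shop"));
    }
}
```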

I’ve still got a lot to learn about Terracotta Spring integration. It’s particularly hard to debug, because clustering of Spring beans happens through aspects (using AspectWerkz mixins).

Incidentally, here are three good links to information about Terracotta Spring support.