Archive for October 2008

Kyle Cordes presentation on Git tomorrow at St. Louis JUG

Just a quick post – tomorrow evening at the St. Louis Java User’s Group, Kyle Cordes will be giving a presentation on Git. I’m looking forward to this both because Kyle is an accomplished speaker on technical topics and because I’m eager to learn more about this distributed source control tool. Hope to see you there!

Caching the Hot Stuff with Terracotta

As I’ve been blogging about recently, we have been developing an exam-taking web application at Terracotta to demonstrate Terracotta’s Session Clustering capabilities. Since one of the requirements of this web app is that we support 40,000 concurrent users, we thought we’d better cache the hottest exams (using Ehcache) rather than load them through Hibernate each time. Since modifying an exam should occur far, far less frequently than taking one, and since Terracotta already supports Ehcache, this was a no-brainer.

There is an ExamService service, configured as a Spring bean, with DaoExamService being the default implementation:

The findById method is expected to be the most frequently invoked method. The other two methods shown are administrative functions that we don’t expect to be called often. We want to cache the results of all three methods in a single, clustered Exam cache.

My approach was to write a new service, CachingWrapperExamService, a proxy which owned the cache and which delegated to another ExamService instance:
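A minimal sketch of such a wrapper might look like the following. It is not the project's actual code. Only findById is named in this post, so saveExam and deleteExam are hypothetical placeholders for the two administrative methods, the Exam class is reduced to two fields, and a plain ConcurrentHashMap stands in for the clustered Ehcache instance so that the example runs without any dependencies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, stripped-down domain class; the real Exam has more state.
class Exam {
    private final long id;
    private final String title;

    Exam(long id, String title) {
        this.id = id;
        this.title = title;
    }

    long getId() { return id; }
    String getTitle() { return title; }
}

// Assumed shape of the service. Only findById comes from the post;
// the two administrative methods are placeholders.
interface ExamService {
    Exam findById(long id);
    Exam saveExam(Exam exam);
    void deleteExam(long id);
}

// Proxy that owns the cache and delegates to another ExamService.
class CachingWrapperExamService implements ExamService {
    private final ExamService delegate;
    // Stand-in for the clustered Ehcache cache (assumption for this sketch).
    private final Map<Long, Exam> cache = new ConcurrentHashMap<>();

    CachingWrapperExamService(ExamService delegate) {
        this.delegate = delegate;
    }

    @Override
    public Exam findById(long id) {
        Exam cached = cache.get(id);
        if (cached != null) {
            return cached; // cache hit: the delegate is not touched
        }
        Exam loaded = delegate.findById(id);
        if (loaded != null) {
            cache.put(id, loaded); // remember it for the next caller
        }
        return loaded;
    }

    @Override
    public Exam saveExam(Exam exam) {
        Exam saved = delegate.saveExam(exam);
        cache.put(saved.getId(), saved); // keep the cache in step with the write
        return saved;
    }

    @Override
    public void deleteExam(long id) {
        delegate.deleteExam(id);
        cache.remove(id); // evict so a stale exam is never served
    }
}
```

Because the wrapper implements the same interface as the service it wraps, callers cannot tell the two apart, and a test can hand it a counting mock to confirm that repeated lookups reach the delegate only once.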

The following were straightforward changes:

  • Once again, our use of interfaces has paid dividends: introducing this new caching ExamService was as easy as tweaking the Spring XML file – the change was completely transparent to all users of the ExamService bean. It also made unit testing easy, as I could create a mock ExamService to test the caching against.
  • The Maven pom.xml had to be changed to note that Ehcache is now a compile-time dependency, not just a runtime dependency.
  • The Terracotta config file tc-config.xml did not have to be modified, as we were already using the Ehcache TIM, and so our CacheManager was automatically clustered.

And just like that, we have a clustered cache of the hottest exams being used.

ORM can lead to inflexibility; Terracotta can help

Okay, granted, I’m biased, I work for Terracotta. Be that as it may, I’d like to share some experiences my teammates and I have had using Hibernate recently while developing a web app.

First, some brief background. We are developing a “reference” web app at Terracotta to promote and explore the Sessions Clustering use case, which we are working to nail. The app is an online exam-taking application, with the goal of supporting 40,000 concurrent users. I’ve blogged about this before, and you can read about the technology stack we settled on. Development has been done primarily by me and my teammates Geert Bevin and Abhishek Sanoujam, along with our supervisor Alex Miller.

Hibernate is wonderful, and it is an integral part of our web app. It feels to me like we got moving pretty quickly using Hibernate for persistence of our domain objects. For ORM, it’s unbeatable.

But the thing I noticed is that there’s just no avoiding it: whatever domain POJOs you need to persist, chances are good that the use of ORM will impose some constraints on how you must write those POJOs. I have two examples of this to share.

Example One – Generics

First, we have an exam Section class which, conceptually, is a container for either multiple sub sections, or Questions, but not both. The ideal solution would be to define Section as this (JPA annotations omitted):

where TestContent is an interface implemented by both Section and Question. Thus, an instance of Section could be declared as having type of either Section<Section> or Section<Question>, which satisfies our constraint.
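The generic design described above could be sketched roughly as follows, with JPA annotations omitted. The field names, constructors, and accessor methods are assumptions made for illustration. Only Section, Question, and TestContent are names taken from the post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Implemented by anything a Section is allowed to contain.
interface TestContent {
}

// Simplified Question; the fields are assumptions for this sketch.
class Question implements TestContent {
    private final String text;

    Question(String text) {
        this.text = text;
    }

    String getText() { return text; }
}

// The type parameter is fixed per instance, so a given Section holds
// either sub-sections or questions, and the compiler rejects a mix.
class Section<T extends TestContent> implements TestContent {
    private final String name;
    private final List<T> children = new ArrayList<>();

    Section(String name) {
        this.name = name;
    }

    void add(T child) {
        children.add(child);
    }

    List<T> getChildren() {
        return Collections.unmodifiableList(children);
    }

    String getName() { return name; }
}
```

With this shape, a `Section<Question>` accepts only questions, and a `Section<Section<Question>>` accepts only sections of questions. Calling `add(new Question(...))` on the latter fails to compile, which is exactly the constraint we wanted the API to express.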

However, at runtime (when starting Tomcat), Hibernate (the JPA provider) threw an exception pointing out that Section had an unbounded type (or something like that). After a little digging around on the internet, I found a forum where someone explained that an Entity cannot have a generic type, because it’s not known until instantiation time what the linked Entity will be.

Therefore I had to compromise. I modified Section, removing its generic type and adding two explicit collections, one for Questions and one for Sections.

This is less than ideal because the Section API itself doesn’t naturally prevent a single Section instance from having both sub-sections and questions, even though we don’t want to allow this.
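A rough sketch of the compromise, again with JPA annotations omitted and with assumed field and method names, shows the weakness. The isWellFormed method is a hypothetical helper, not something from our code base. It is here only to illustrate that the rule now has to be checked at runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified Question; the fields are assumptions for this sketch.
class Question {
    private final String text;

    Question(String text) {
        this.text = text;
    }

    String getText() { return text; }
}

// Non-generic Section with one collection per kind of content.
class Section {
    private final String name;
    private final List<Section> subSections = new ArrayList<>();
    private final List<Question> questions = new ArrayList<>();

    Section(String name) {
        this.name = name;
    }

    // Nothing in these signatures stops a caller from using both.
    void addSubSection(Section section) {
        subSections.add(section);
    }

    void addQuestion(Question question) {
        questions.add(question);
    }

    List<Section> getSubSections() { return subSections; }
    List<Question> getQuestions() { return questions; }
    String getName() { return name; }

    // Hypothetical runtime check: true when at most one collection is in use.
    boolean isWellFormed() {
        return subSections.isEmpty() || questions.isEmpty();
    }
}
```

The compiler happily accepts a Section that has both a sub-section and a question added to it. The mistake only surfaces if someone remembers to call a check like isWellFormed.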

Example Two – complex object tree

Similarly, for my other example, one of the constraints is that a Question must have exactly one correct choice (from among its two or more choices). So our first inclination was to structure the Question class like this:
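The listing below is a reconstruction under assumptions rather than our exact class: JPA annotations are omitted, the field names and constructor checks are guesses at a natural shape, and a real entity would also need a no-argument constructor. The essential idea is a single correctChoice reference held by the Question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified Choice; the field is an assumption for this sketch.
class Choice {
    private final String text;

    Choice(String text) {
        this.text = text;
    }

    String getText() { return text; }
}

// A Question that points at its one correct choice.
class Question {
    private final String text;
    private final List<Choice> choices;
    private final Choice correctChoice;

    Question(String text, List<Choice> choices, Choice correctChoice) {
        if (choices.size() < 2) {
            throw new IllegalArgumentException("a question needs at least two choices");
        }
        if (!choices.contains(correctChoice)) {
            throw new IllegalArgumentException("the correct choice must be one of the choices");
        }
        this.text = text;
        this.choices = new ArrayList<>(choices); // defensive copy
        this.correctChoice = correctChoice;
    }

    String getText() { return text; }
    List<Choice> getChoices() { return Collections.unmodifiableList(choices); }
    Choice getCorrectChoice() { return correctChoice; }
}
```

A single reference cannot point at two choices, so "exactly one correct choice" is built into the structure of the class itself.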

But this caused problems when saving an edited Exam which had had a Question added to it. I no longer have the stack trace handy, but the gist of the Hibernate exception was that a transient (unsaved) object was detected in the object graph being merged (updated).

Alex and I dug in and finally examined the generated database schema. What we saw was that the QUESTION table had a CORRECT_CHOICE column which was a foreign key into another table, QUESTION_CHOICE I think it was. Alex and I theorized that there was a possible ordering problem in updating an Exam with a new Question and Choices – what if Hibernate attempted to set the CORRECT_CHOICE foreign key before inserting the new choices for the question?

I’m not 100% positive that’s the correct explanation, but in any case Alex made the executive decision to simplify our domain model and not spend any more time debugging. We added a boolean “isCorrect” property to Choice, and removed the “correctChoice” reference from Question:

Problem solved – we no longer got the Hibernate exception. But, as Abhishek pointed out, our domain objects no longer enforced the constraint that a question could have only one correct choice. With the updated classes, nothing would prevent instantiating a question with multiple choices marked as correct. This put the burden on additional validation code to enforce this constraint, and overall is just less than ideal.
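To make that trade-off concrete, here is a sketch of the simplified classes together with the kind of extra validation they now require. JPA annotations are omitted, the field and method names are assumptions, and QuestionValidator is a hypothetical helper invented for this example rather than a class from our code base.

```java
import java.util.ArrayList;
import java.util.List;

// Choice now carries its own "correct" flag.
class Choice {
    private final String text;
    private final boolean correct;

    Choice(String text, boolean correct) {
        this.text = text;
        this.correct = correct;
    }

    String getText() { return text; }
    boolean isCorrect() { return correct; }
}

// Question no longer references a correct choice.
class Question {
    private final String text;
    private final List<Choice> choices = new ArrayList<>();

    Question(String text) {
        this.text = text;
    }

    void addChoice(Choice choice) {
        choices.add(choice);
    }

    String getText() { return text; }
    List<Choice> getChoices() { return choices; }
}

// Hypothetical validation that the domain model used to make unnecessary.
class QuestionValidator {
    static boolean hasExactlyOneCorrectChoice(Question question) {
        int correctCount = 0;
        for (Choice choice : question.getChoices()) {
            if (choice.isCorrect()) {
                correctCount++;
            }
        }
        // Two or more choices, exactly one of them marked correct.
        return question.getChoices().size() >= 2 && correctCount == 1;
    }
}
```

Nothing stops a caller from adding two choices that are both flagged correct. The only defense is remembering to run a check like this before the question is saved or shown to a test taker.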

How Terracotta Can Help

The point I am agonizingly slowly building to is this: I think it’s acceptable to have these constraints on our persistent domain objects, but only on the ones that really should be persisted. An anti-pattern that we at Terracotta have seen again and again is the misuse of the database and ORM to persist state that does not belong in the System of Record (SOR), but is instead transient state that gets written to the database only so that the application can stay stateless and scale. One of the Terracotta co-founders, Orion, coined the term “State Monster” to describe this abuse of the db, and recently Wille Faler wrote a very good blog post describing this.

Terracotta can help by providing an alternative to making apps stateless for scalability purposes. With Terracotta, go ahead and write your application in the most natural way, including shared state that is only transient. Consider this helpful graph about data lifetimes when deciding what state belongs in the SOR and what state is merely transient or pending. Then, use Terracotta to both cluster and persist the POJOs that don’t belong in the SOR. The advantage is that Terracotta does not impose any constraints on the API of the sorts I have written about here – generics are fine, arbitrarily complex object graphs are no problem. Terracotta clustered objects don’t even have to implement Serializable.

Me Meme

[Photo: unflattering]

Via Mario

  1. Take a picture of yourself right now.
  2. Don’t change your clothes, don’t fix your hair…just take a picture.
  3. Post that picture with NO editing.
  4. Post these instructions with your picture.