SpiderLogic.com

O/R Mapping with Castor JDO in the Real World

David Colwell

12/09/2002

Castor JDO is an open source object-to-relational binding framework for Java™. Using XML mapping, an object query language, and a simple API, it bridges Java objects with relational databases. Castor JDO has been under development for over three years and is available at http://castor.exolab.org. This article reviews what Castor JDO is and what it isn't, and describes some best practices for using it in a multi-tier environment. Many articles already go into detail on using Castor in simple application scenarios. Here I describe the challenges and solutions I encountered using Castor on a real-world application with complex data structures.

O/R mapping is fairly well understood by the software development community. This includes not only mapping from object attributes to fields in database tables but also runtime issues such as concurrency and lazy loading. However, to date, most solutions either have a hefty price tag or require significant development investment. TopLink and other pricey tool sets provide a friendly IDE-type environment with visual mapping tools. Enterprise JavaBeans container-managed persistence (CMP) defines a standard, which is implemented by a number of vendors, but marries persistence to the EJB environment. The new Sun JDO specification appears to solve the same issues as Castor JDO. However, it is young and, being a direct competitor to CMP, it's not yet clear how well it will be supported.

Brief Look at Castor Basics

The first thing to recognize about Castor JDO is that it is not based on the Sun JDO specification. It was around before Sun JDO was even a JSR. As an open source project, Castor's success comes largely from the developer community that uses it. Subscribing to the developer mailing list typically yields over twenty postings daily, ranging from newbie issues to in-depth discussions with the main contributors. The website documentation describes simple use cases for the API, XML mapping, and object query language. However, nothing describes Castor's implementation, and in-code documentation is sparse. Any sort of real application development quickly runs into issues that are best solved by searching the list archives. Currently, there are only two main contributors, who acknowledge many items on the project wish list and often put a good-humored request out to the community for help.

Castor is designed around the following simple scenario:

In this situation, Castor works very well. The XML based mapping is straightforward in describing the mapping from object attributes to table values.

For example:

<mapping>
  <class name="com.myproject.BusinessAccount">
    <map-to table="Account"/>
    <field name="id" type="string">
      <sql name="id" type="char"/>
    </field>
    <field name="mainContact" type="com.myproject.Contact">
      <sql name="mainContactId"/>
    </field>
  </class>
</mapping>

This describes the mapping from a BusinessAccount class to an Account table, the conversion of the id field, and a relationship with a main contact of class Contact.
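As a rough illustration, the Java side of this mapping could be nothing more than plain classes with accessor methods. The sketch below is an assumption about what such classes might look like: the Contact fields (id and name) are hypothetical, and in a real project each class would be public and live in its own file under com.myproject.

```java
// Hypothetical plain Java classes matching the mapping above.
// Property names mirror the <field> elements; nothing here depends on Castor.
class Contact {
    private String id;    // assumed identity field, not shown in the mapping
    private String name;  // assumed field, used for illustration only

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class BusinessAccount {
    private String id;            // mapped to the "id" column
    private Contact mainContact;  // mapped through the "mainContactId" column

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public Contact getMainContact() { return mainContact; }
    public void setMainContact(Contact mainContact) { this.mainContact = mainContact; }
}
```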

The object query language is based on a subset of the ODMG 3.0 OQL syntax specification. It allows the developer to describe a query using object relationships and field comparisons and derives the SQL joins and other clauses behind the scenes. For example:

SELECT a FROM com.myproject.BusinessAccount a WHERE a.mainContact.name = $1

A value is bound to the $1 parameter, and the query is then executed to return a collection of results.

Castor does not require any special inheritance or interface implementation for Java objects to be persistable, nor does it do any sort of post-compilation processing. Castor transactions consider every object to be in one of two states, transient or persistent. Castor does not recognize transient objects. Persistent objects are those that have been queried or created through Castor in the current transaction. Only persistent objects are considered for the commit, and they become transient afterward.

The Castor project also defines an XML data binding framework that fits in nicely with the Java-to-relational database mapping. Our project needs to asynchronously stream data from a limited-connection application to a backend server. The XML binding used in conjunction with JMS messaging has proven to be a good solution.

Most enterprise applications will have the need to query objects on a server, send these objects somewhere for processing by something like a client or other server, send them back and update the database. Of course, you won't want to leave a database transaction open during this sort of processing so Castor defines an update method to support what it calls a long transaction. Objects that were persistable in previous transactions can be marked for update in another transaction. Castor maintains a configurable cache of objects that pass through its doors and is able to recognize incoming update objects by using the Object.equals method.

Any object that participates in an update is required to implement the Timestampable interface which simply has get and set accessor methods for a timestamp. This allows Castor to enforce optimistic locking. Using a cache of known objects, it recalls the timestamp on an object the last time it passed through and will throw an exception if the updating object's timestamp does not match.
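The timestamp check can be pictured with a small stand-alone model. This is only a sketch of the general idea, not Castor's implementation: the Stamped interface, the Account class, and OptimisticLockCheck are hypothetical names, and a simple counter stands in for a real timestamp.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the timestamp accessor contract described above.
interface Stamped {
    String getId();
    long getTimestamp();
    void setTimestamp(long timestamp);
}

class Account implements Stamped {
    private final String id;
    private long timestamp;

    Account(String id) { this.id = id; }

    public String getId() { return id; }
    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
}

// Remembers the last timestamp handed out per id and rejects stale updates.
class OptimisticLockCheck {
    private final Map<String, Long> lastSeen = new HashMap<>();
    private long clock = 0; // counter used in place of a wall-clock timestamp

    // On load: stamp the object with the currently known timestamp for its id.
    void onLoad(Stamped obj) {
        Long known = lastSeen.get(obj.getId());
        if (known == null) {
            known = ++clock;
            lastSeen.put(obj.getId(), known);
        }
        obj.setTimestamp(known);
    }

    // On update: the incoming timestamp must match the remembered one.
    void onUpdate(Stamped obj) {
        Long known = lastSeen.get(obj.getId());
        if (known == null || known.longValue() != obj.getTimestamp()) {
            throw new IllegalStateException("Timestamp mismatch for id " + obj.getId());
        }
        long next = ++clock;
        lastSeen.put(obj.getId(), next);
        obj.setTimestamp(next);
    }
}
```

If two copies of the same record are loaded and the first is updated, the second copy still carries the old timestamp, so its update is rejected. Note that in this model an id that is no longer remembered is rejected in the same way as a stale one.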

Castor in the Real World

Castor holds up to its promises in simple testing and trial runs. However, it has proven to fall short in some practical issues with our application of about twenty-five data classes and as many tables. Most of our problems come from the need to hold onto objects across transactions and perform complex updates. The following describes some of the issues we encountered and our current solutions.

When creating and updating objects, Castor can be told to attempt to iterate through all child objects reachable from the root object and call create or update on each. Call this a deep update/create. A shallow update/create is one which persists only the root object. In our initial approach to Castor, we were excited to use the deep update functionality. As with most n-tier applications we need to load a bunch of data, present it to the user, allow the user to update it or create more data, and save it out to the database. This worked fine in initial testing but quickly proved challenging in real development.

Castor has problems when a single value is represented by multiple object instances in the same object tree update. It will return a reference to the same object every time it is queried during a single transaction. However, if the same data record is retrieved multiple times from different transactions, you will get multiple object instances. It is therefore possible, and not uncommon, to have an object graph with more than one object instance referencing the same data record. For example, you might have a table representing the fifty US states. Address records may then reference the State table. Consider an Employee class with both a home Address and a work Address. Both Address objects hold a reference to a State object. If they are loaded or populated and reference the same State, but from different transactions, Castor will throw a DuplicateIdentityException when updating the Employee. The deep update algorithm iterates through the object graph and marks individual objects for update during the commit phase. Castor fails because it uses the == operator for comparison and doesn't recognize the two objects as representing the same data until it's too late. Our solution was to implement a globally unique hashCode on all value objects and modify the Castor code to use this for comparison in addition to the == operator. Email me if you'd like the modified Castor code. We plan to give it back to Castor as a bug fix.
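The difference between comparing by reference and comparing by record can be shown in isolation. The sketch below is not the modified Castor code. It swaps in a simpler technique, a key built from the class and the record id, and every name in it (State, RecordKey, UpdateTracker) is hypothetical.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Objects;
import java.util.Set;

class State {
    final String code;
    State(String code) { this.code = code; }
}

// Identifies a data record by its class and id rather than by object reference.
final class RecordKey {
    private final Class<?> type;
    private final Object id;

    RecordKey(Class<?> type, Object id) {
        this.type = type;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RecordKey)) return false;
        RecordKey other = (RecordKey) o;
        return type.equals(other.type) && id.equals(other.id);
    }

    @Override
    public int hashCode() { return Objects.hash(type, id); }
}

// Tracks objects marked for update both ways, to make the mismatch visible.
class UpdateTracker {
    // Reference-based tracking, equivalent to comparing with ==.
    private final Set<Object> byReference =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    // Record-based tracking, using class plus id.
    private final Set<RecordKey> byRecord = new HashSet<>();

    // Returns true if the record was not yet marked, false if it is a duplicate.
    boolean mark(Object instance, Object id) {
        byReference.add(instance);
        return byRecord.add(new RecordKey(instance.getClass(), id));
    }

    int distinctInstances() { return byReference.size(); }
    int distinctRecords() { return byRecord.size(); }
}
```

Marking two State instances that both carry the same code yields two distinct instances but only one distinct record, which is exactly the case a reference comparison alone cannot detect.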

Castor's exceptions are vague, and some describe the situation incorrectly. For example, attempting to update an object that has been bumped off the end of Castor's limited-size cache of known objects yields an ObjectModifiedException complaining of a timestamp mismatch. This is a perplexing message, especially when it doesn't include the class type or data details it's upset about. Although frustrating, I should mention my appreciation for open source software. This type of issue would be much more difficult to debug without going into the Castor code with a debugger. We modified the source of the exception to include more information and incorporated it into our regular build.

Castor defines a logging mechanism that can hook into Log4J or stream to the console. It will output information for every action it takes and describe the derived SQL code. However, even valid exceptions such as those describing concurrency issues are difficult to track down. Our class dependency tree runs deep and, given some many-to-many relationships, updates can affect hundreds of objects. Stepping into code and looking at debug logging is tedious at best and leaves one wanting more to go on.

Given the many issues with deep update functionality we decided to use Castor for shallow updates with deep update functionality driven by our own manager classes. Every Castor mapped data object in our system has a corresponding manager class. We also have a manager super class with common functionality such as simple exists, save, load, and create that works on single objects. A manager subclass such as the EmployeeManager is able to then define deep updates for specific scenarios. For example, EmployeeManager can define a deep Employee update by calling update on the passed in Employee object and call update on the AddressManager for both the home Address and business Address objects. This approach takes back control of the update environment, getting rid of most of the previously mentioned issues. Those that remain are valid development issues that are much more approachable and can be quickly solved using a debugger and stepping into the code.
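The manager arrangement described above might be sketched roughly as follows. This is an illustration under assumptions, not the project's actual classes: ShallowStore is a hypothetical stand-in for a call that persists a single object, and the Employee and Address fields are invented for the example.

```java
// Hypothetical stand-in for a persistence call that updates one object only.
interface ShallowStore {
    void update(Object obj);
}

class Address {
    final String street;
    Address(String street) { this.street = street; }
}

class Employee {
    Address home;
    Address work;
}

// Common base: every operation here is shallow and works on a single object.
abstract class Manager<T> {
    protected final ShallowStore store;

    Manager(ShallowStore store) { this.store = store; }

    void update(T obj) { store.update(obj); }
}

class AddressManager extends Manager<Address> {
    AddressManager(ShallowStore store) { super(store); }
}

class EmployeeManager extends Manager<Employee> {
    private final AddressManager addresses;

    EmployeeManager(ShallowStore store, AddressManager addresses) {
        super(store);
        this.addresses = addresses;
    }

    // Deep update for one specific scenario: the employee plus both addresses.
    // The traversal is explicit, so it is easy to step through in a debugger.
    void updateWithAddresses(Employee employee) {
        update(employee);
        if (employee.home != null) addresses.update(employee.home);
        if (employee.work != null) addresses.update(employee.work);
    }
}
```

The design choice is that the order and extent of the traversal are decided by application code for each scenario instead of by a generic graph walk.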

A hurdle you'll face with shallow updates/creates is that Castor will not persist foreign keys to data objects that are not persistable at the time of commit. Instead, it writes out null. Recall that an object is only considered persistable if it was created/loaded/updated during the current transaction. For example, consider an Employee table with an AddressId field that is a foreign key to an Address table. If I load an Employee during one transaction and call update on the object in another without making the Address persistable, the Employee record will be written out with null in the AddressId. The solution is to call update on both the Employee and the Address objects in the second transaction. This alerts Castor to mark both for consideration during the commit phase. We define a template method named markRelationshipsForPersistence in the base manager class. The base update and create methods are coded to call this during their transactions. This gives manager classes such as EmployeeManager an opportunity to mark object references for persistence by calling their update methods. This will not hit the database unless the object is recognized as dirty. A better solution might be to delve into the Castor code itself and modify it to look for all referenced, but non-persistent objects in its cache of updateable objects.
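A toy model may help make the null foreign key behavior and the template method concrete. Everything here is an assumption made for illustration: Tx is a hypothetical stand-in for a transaction that only remembers which objects were marked, and the hook marks the reference directly on the transaction instead of going through another manager.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class Address {
    final Integer id;
    Address(Integer id) { this.id = id; }
}

class Employee {
    Address address;
}

// Toy transaction: remembers which objects were marked for update.
class Tx {
    private final Set<Object> marked =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());

    void update(Object obj) { marked.add(obj); }

    // Models the behavior described above: the foreign key is written as
    // null unless the referenced object was marked in this transaction.
    Integer foreignKeyFor(Address target) {
        return (target != null && marked.contains(target)) ? target.id : null;
    }
}

abstract class BaseManager<T> {
    // Template method: mark the object, then let subclasses mark references.
    final void update(Tx tx, T obj) {
        tx.update(obj);
        markRelationshipsForPersistence(tx, obj);
    }

    // Hook with a no-op default; subclasses override it where needed.
    protected void markRelationshipsForPersistence(Tx tx, T obj) { }
}

class EmployeeManager extends BaseManager<Employee> {
    @Override
    protected void markRelationshipsForPersistence(Tx tx, Employee employee) {
        if (employee.address != null) tx.update(employee.address);
    }
}
```

In this model, updating the Employee alone leaves the foreign key null, while going through EmployeeManager marks the Address as well and the key is preserved.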

Lazy loading is still immature in Castor. In the XML mapping file, you can currently mark Collections for lazy loading. At runtime, during a single transaction, the lazy loaded collection can be iterated over, loading objects only as they are requested. However, Castor fails if you attempt to load the collection from a second transaction. This has limited benefit given that our environment needs to lazy load collections to be considered at the user's leisure. Again, we don't want to leave a single transaction open for too long. Our solution is to roll our own lazy loading using Castor's lazy loading as a base. We need to map the relationship to collections of objects for our XML binding, but don't want these to be loaded from the database until requested. Therefore, we mark the collections as lazy so they're not initially loaded. We define special accessor methods on the data objects for the collections that know how to go back to Castor, reload the parent object, and load the collection to be returned.
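One way to picture such an accessor is shown below. It is a sketch under assumptions: the Customer class and its order ids are invented, and the loader function is a hypothetical placeholder for code that would open a short transaction, reload the parent, and read the collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class Customer {
    private final String id;
    // Hypothetical loader: given the parent id, fetches the child collection
    // in its own short transaction.
    private final Function<String, List<String>> orderLoader;
    // Stays null until the collection is first requested.
    private List<String> orderIds;

    Customer(String id, Function<String, List<String>> orderLoader) {
        this.id = id;
        this.orderLoader = orderLoader;
    }

    String getId() { return id; }

    // Special accessor: loads on first use, then returns the cached copy.
    List<String> getOrderIds() {
        if (orderIds == null) {
            orderIds = new ArrayList<>(orderLoader.apply(id));
        }
        return orderIds;
    }
}
```

Because the loader runs only on first access, no transaction has to stay open while the user decides whether to look at the collection.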

Our team sponsors an ongoing open source project called Arch4J, which is our attempt at implementing all sorts of best practice abstractions for enterprise architecture. For our use of Castor, we abstracted out an access layer so that application code has no direct dependency on it. This way it won't be too painful to swap in something else if we decide to do so down the road. The interface is defined in org.arch4j.persistence with Castor-specific implementations in org.arch4j.persistence.castor.

Conclusion

In conclusion, Castor works well in the simple situations for which it was designed. It falls short in a real application environment involving complex data structures that may be updated from multiple interactions. Its benefits come from the convenient XML mapping that hides SQL and defines structure around the XML binding. With some insight into workarounds and a bit of extra development effort, you can take advantage of what Castor has to offer.

In a perfect world I'd be independently wealthy and have boatloads of free time to spend fixing these issues and submitting patches to Castor instead of complaining. But for the time being I must search out the best value for our clients. A popular saying claims “open source software is only free if your time is worthless.” Although Castor is a better alternative to growing your own O/R mapping framework, one must seriously consider its true cost and what that buys you.
