JPA implementation patterns

Sky-Tiger · 发表于 2009-7-16 22:40

This blog assumes that you are familiar with the Order/OrderLine example I introduced in the first two blogs of this series. If you are not, please review the example.

Consider the following code:

OrderLine orderLineToRemove = orderLineDao.findById(30);
orderLineToRemove.setOrder(null);

The intention of this code is to unassociate the OrderLine with the Order it was previously associated with. You might imagine doing this prior to removing the OrderLine object (although you can also use the @PreRemove annotation to have this done automatically) or when you want to attach the OrderLine to a different Order entity.

If you run this code you will find that the following entities will be loaded:

1. The OrderLine with id 30.
2. The Order associated with the OrderLine. This happens because the OrderLine.setOrder method invokes the Order.internalRemoveOrderLine method to remove the OrderLine from its parent Order object.
3. All the other OrderLines that are associated with that Order! The Order.orderLines set is loaded when the OrderLine object with id 30 is removed from it.

Depending on the JPA provider, this might take two to three queries. Hibernate uses three queries; one for each of the lines mentioned here. OpenJPA only needs two queries because it loads the Order with the OrderLine using an outer join.

However the interesting thing here is that all OrderLine entities are loaded. If there are a lot of OrderLines this can be a very expensive operation. Because this happens even when you are not interested in the actual contents of that collection, you are be paying a high price for keeping the bidirectional associations intact.

Sky-Tiger · 发表于 2009-7-16 22:40

So far I have discovered three different solutions to this problem:

1. Don't make the association bidirectional; only keep the reference from the child object to the parent object. In this case that would mean removing the orderLines set in the Order object. To retrieve the OrderLines that go with an Order you would invoke the findOrderLinesByOrder method on the OrderLineDao. Since that would retrieve all the child objects and we got into this problem because there are a lot of those, you would need to write more specific queries to find the subset of child objects you need. A disadvantage of this approach is that it means an Order can't access its OrderLines without having to go through a service layer (a problem we will address in a later blog) or getting them passed in by a calling method.
2. Use the Hibernate specific @LazyCollection annotation to cause a collection to be loaded "extra lazily" like so:

   @OneToMany(mappedBy = "order", cascade = CascadeType.PERSIST, fetch = FetchType.LAZY)
   @org.hibernate.annotations.LazyCollection(org.hibernate.annotations.LazyCollectionOption.EXTRA)
   public Set<OrderLine> orderLines = new HashSet<OrderLine>();

   This feature should cause Hibernate to be able to handle very large collections. For example, when you request the size of the set, Hibernate won't load all the elements of the collections. Instead it will execute a SELECT COUNT(*) FROM ... query. But even more interestingly: modifications to the collection are queued instead of being directly applied. If any modifications are pending when the collection is accessed, the session is flushed before further work is done.

   While this works fine for the size() method, it doesn't work when you try and iterate over the elements of the set (see JIRA issue HHH-2087 which has been open for two and a half years). The extra lazy loading of the size also has at least two open bugs: HHH-1491 and HHH-3319. All this leads me to believe the extra lazy loading feature of Hibernate is a nice idea but not fully mature (yet?).
3. Inspired by the Hibernate mechanism of postponing operations on the collection until you really need them to be executed, I have modified the Order class to do something similar. First an operation queue has been added as a transient field to the Order class:

   private transient Queue<Runnable> queuedOperations = new LinkedList<Runnable>();

   Then the internalAddOrderLine and internalRemoveOrderLine methods have been changed so that they do not directly modify the orderLines set. Instead they create an instance of the appropriate subclass of the QueuedOperation class. That instance is initialized with the OrderLine object to add or remove and then placed on the queuesOperations queue:

   public void internalAddOrderLine(final OrderLine line) {
      queuedOperations.offer(new Runnable() {
      public void run() { orderLines.add(line); }
      });
   }

   public void internalRemoveOrderLine(final OrderLine line) {
      queuedOperations.offer(new Runnable() {
      public void run() { orderLines.remove(line); }
      });
   }

   Finally the getOrderLines method is changed so that it executes any queued operations before returning the set:

   public Set<? extends OrderLine> getOrderLines() {
      executeQueuedOperations();
      return Collections.unmodifiableSet(orderLines);
   }

   private void executeQueuedOperations() {
      for (;

{
      Runnable op = queuedOperations.poll();
      if (op == null)
      break;
      op.run();
      }
   }

   If there were more methods that need the set to be fully up to date, they would invoke the executeQueuedOperations method in a similar manner.

   The downside here is that your domain objects get cluttered with even more "link management code" than we already had managing bidirectional associations. Abstracting out this logic to a separate class is left as an exercise for the reader. ;-)

Of course this problem not only occurs when you have bidirectional associations. It surfaces any time you are manipulating large collections mapped with @OneToMany or @ManyToMany. Bidirectional associations just makes the cause less obvious because you think you are only manipulating a single entity.

Sky-Tiger · 发表于 2009-7-16 22:40

The default way in JPA for primary keys is to use the @GeneratedValue annotation with the strategy attribute set to one of AUTO, IDENTITY, SEQUENCE, or TABLE. You pick the most appropriate strategy for your situation and that's it.
But you can also choose to generate the primary key yourself.

Using UUIDs for this is ideal and has some great benefits. In our current project we've used this strategy by creating an abstract base class our entities to inherit from which takes care of dealing with primary keys.

@MappedSuperclass
public abstract class AbstractBaseEntity implements Serializable {
private static final long serialVersionUID = 1L;

@Id
private String id;

public AbstractBaseEntity() {
this.id = UUID.randomUUID().toString();
}

@Override
public int hashCode() {
return id.hashCode();
}

@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (!(obj instanceof AbstractBaseEntity)) {
return false;
}
AbstractBaseEntity other = (AbstractBaseEntity) obj;
return getId().equals(other.getId());
}
}

Using UUIDs versus sequences in general has been widely discussed on the web, so I won't go into too much detail here. But here are some pro's and con's:

Sky-Tiger · 发表于 2009-7-16 22:40

Pros

* Write this base class once and every entity gets an Id for free. If you implement equals and hashcode as above you also throw that one in as a bonus.
* UUID are Universal Unique(what's in a name). This means that you get great flexibility if you need to copy/merge records from location a to b without having to re-generate keys. (or use complex sequence strategies).
* UUIDs are not guessable. This means it can be safer to expose them to the outside world in let's say urls. If this would be good practice is another thing.

Cons

* Performance can be an issue. See http://johannburkard.de/blog/pro ... ators-compared.html, Some databases don't perform well with UUIDs (at least when they are stored as strings) for indexes.
* Ordering, Sorting. Speaks for itself.
* JPA specific: you cannot test if a record has already been persisted by checking if the Id field has been set. One might argue if you need such checks anyway.

Conclusion

UUIDs are easy to use, but wouldn't it be nice if the JPA spec would open up to include UUID as a strategy? Some JPA implementations, like Hibernate, already have support for this:

@Id @GeneratedValue(generator="system-uuid")
@GenericGenerator(name="system-uuid",
strategy = "uuid")

Sky-Tiger · 发表于 2009-7-16 22:40

he JPA specification allows two ways for the persistence provider to access the persistent state of an entity. The persistence provider can either invoke JavaBeans style property accessors (getters and setters) or access the instance fields of the entity directly. Which method is used depends on whether you have annotated the properties or the fields of an entity.

The JPA 1.0 specification does not allow you to mix access types within an entity or even an entity hierarchy. If you have annotated both fields and properties, the behaviour is undefined. The JPA 2.0 specification has the @Access annotation that makes it possible mix access types within an entity or entity hierarchy.

But the interesting question remains; which access type to use? A question that has been discussed before, but one I couldn't resist commenting on too. ;-)

* Encapsulation - Property access is said to provide better encapsulation, because directly accessing fields is bad, right? Well actually, using property access obliges you to write getters and setters for all your persistent properties. These methods not only allow the JPA provider to set the fields, they also allow any other user of your entity class to do this! Using field access allows you to write only the getters and setters you want to (they're evil, remember?) and also write them as you want them, for example by doing validation in your setters or some calculation in your getters. In contrast, making these methods smarter when using property access is just asking for trouble.
* Performance - Some people prefer field access because it supposedly offers better performance than property access. But that is a very bad reason to choose field access. Modern optimizing JVMs will make property access perform just as fast as field access and in any case database calls are orders of magnitude slower than either field access or method invocations.
* Lazy loading in Hibernate - Hibernate's lazy loading implementation always initializes a lazy proxy when any method on that proxy is invoked. The only exception to this is the method annotated with the @Id annotation when you use property access. But when you use field access there is no such method and Hibernate initializes the proxy even when invoking the method that returns the identity of the entity. While some propose to use property access until this bug is fixed, I am not in favour of basing design decisions on framework bugs. If this bug really hurts your performance you might want to try and get the id of entity with the following code:


   Serializable id = ((HibernateProxy) entity).getHibernateLazyInitializer().getIdentifier()


   It's nasty, but at least this code will be localized to where you really need it.
* Field access in Hibernate - It is good to know that while field access is OK for Hibernate to populate your entities, your code should still access those values through methods. Otherwise you will fall into the first of the Hibernate proxy pitfalls mentioned by my colleague Maarten Winkels.

Sky-Tiger · 发表于 2009-7-16 22:41

JPA provides three ways to map Java inheritance hierarchies to database tables:

1. InheritanceType.SINGLE_TABLE - The whole inheritance hierarchy is mapped to one table. An object is stored in exactly one row in that table and the discriminator value stored in the discriminator column specifies the type of the object. Any fields not used in a superclass or a different branch of the hierarchy are set to NULL. This is the default inheritance mapping strategy used by JPA.
2. InheritanceType.TABLE_PER_CLASS - Every concrete entity class in the hierarchy is mapped to a separate table. An object is stored in exactly one row in the specific table for its type. That specific table contains column for all the fields of the concrete class, including any inherited fields. This means that siblings in an inheritance hierarchy will each have their own copy of the fields they inherit from their superclass. A UNION of the separate tables is performed when querying on the superclass.
3. InheritanceType.JOINED - Every class in the hierarchy is represented as a separate table, causing no field duplication to occur. An object is stored spread out over multiple tables; one row in each of the tables that make up its class inheritance hierarchy. The is-a relation between a subclass and its superclass is represented as a foreign key relation from the "subtable" to the "supertable" and the mapped tables are JOINed to load all the fields of an entity.

A nice comparison of the JPA inheritance mapping options with pictures, and including a description of the @MappedSuperclass option, can be found in the DataNucleus documentation.

Now the interesting question is: which method works best in what circumstances?

Sky-Tiger · 发表于 2009-7-16 22:41

SINGLE_TABLE - Single table per class hierarchy

The SINGLE_TABLE strategy has the advantage of being simple. Loading entities requires querying only one table, with the discriminator column being used to determine the type of the entity. This simplicity also helps when manually inspecting or modifying the entities stored in the database.

A disadvantage of this strategy is that the single table becomes very large when there are a lot of classes in the hierarchy. Also, columns that are mapped to a subclass in the hierarchy should be nullable, which is especially annoying with large inheritance hierarchies. Finally, a change to any one class in the hierarchy requires the single table to be altered, making the SINGLE_TABLE strategy only suitable for small inheritance hierarchies.

Sky-Tiger · 发表于 2009-7-16 22:41

TABLE_PER_CLASS - Table per concrete class

The TABLE_PER_CLASS strategy does not require columns to be made nullable, and results in a database schema that is relatively simple to understand. As a result it is also easy to inspect or modify manually.

A downside is that polymorphically loading entities requires a UNION of all the mapped tables, which may impact performance. Finally, the duplication of column corresponding to superclass fields causes the database design to not be normalized. This makes it hard to perform aggregate (SQL) queries on the duplicated columns. As such this strategy is best suited to wide, but not deep, inheritance hierarchies in which the superclass fields are not ones you want to query on.

Sky-Tiger · 发表于 2009-7-16 22:41

JOINED - Table per class

The JOINED strategy gives you a nicely normalized database schema without any duplicate columns or unwanted nullable columns. As such it is best suited to large inheritance hierarchies, be the deep or wide.

This strategy does make the data harder to inspect or modify manually. Also, the JOIN operation needed to load entities can become a performance problem or a downright barrier to the size of your inheritance strategy. Also note that Hibernate does not correctly handle discriminator columns when using the JOINED strategy.

BTW, when using Hibernate proxies, be aware that lazily loading a class mapped with any of the three strategies above always returns a proxy that is an instanceof the superclass.

Sky-Tiger · 发表于 2009-7-16 22:41

Are those all the options?

So to summarize you could say the following rules apply when choosing from JPA's standard inheritance mapping options:

* Small inheritance hierarchy -> SINGLE_TABLE.
* Wide inheritance hierarchy -> TABLE_PER_CLASS.
* Deep inheritance hierarchy -> JOINED.

But what if your inheritance hierarchy is very wide or very deep? And what if the classes in your system are modified often? As we found while building a persisted command framework and a flexible CMDB for our Java EE deployment automation product Deployit, the concrete classes at the bottom of a large inheritance hierarchy can change often. So these two questions often get a positive answer at the same time. Luckily there is one solution to both problems!

JPA implementation patterns

浏览过的版块