Scalability and Performance of jBPM Workflow Engine in a JBoss Cluster

Sky-Tiger · posted 2009-3-12 22:36

Using JBoss TreeCache as Hibernate's 2nd level cache
Preparing TreeCache for use in your application takes a few rather lengthy steps, but fear not: after all, it is not very complicated. First, create a cache MBean and save its definition in a file named jboss-service.xml (inside the file's <server> root element):

<mbean code="org.jboss.cache.TreeCache" name="jboss.cache:service=TreeCache">
  <depends>jboss:service=Naming</depends>
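  <depends>jboss:service=TransactionManager</depends>

  <!-- Lets the cache take part in JTA transactions managed by JBoss -->
  <attribute name="TransactionManagerLookupClass">org.jboss.cache.JBossTransactionManagerLookup</attribute>

  <!-- Node locking scheme: OPTIMISTIC or PESSIMISTIC -->
  <attribute name="NodeLockingScheme">OPTIMISTIC</attribute>

  <!-- Isolation level; used with PESSIMISTIC locking -->
  <attribute name="IsolationLevel">REPEATABLE_READ</attribute>

  <!-- Replicate changes to the other cluster members asynchronously -->
  <attribute name="CacheMode">REPL_ASYNC</attribute>

  <!-- Register interceptor MBeans (cache statistics) with JMX -->
  <attribute name="UseInterceptorMbeans">true</attribute>

  <!-- Caches with the same cluster name form one replication group -->
  <attribute name="ClusterName">Cache-Cluster</attribute>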

  <attribute name="ClusterConfig">
<config>
   <!-- JGroups protocol stack used for replication between cluster members -->
   <UDP mcast_addr="228.1.2.3" mcast_port="45566" ip_ttl="64" ip_mcast="true"
         mcast_send_buf_size="150000" mcast_recv_buf_size="80000" ucast_send_buf_size="150000"
         ucast_recv_buf_size="80000" loopback="false"/>
   <PING timeout="2000" num_initial_members="3" up_thread="false" down_thread="false"/>
   <MERGE2 min_interval="10000" max_interval="20000"/>
   <FD shun="true" up_thread="true" down_thread="true"/>
   <VERIFY_SUSPECT timeout="1500" up_thread="false" down_thread="false"/>
   <pbcast.NAKACK gc_lag="50" max_xmit_size="8192" retransmit_timeout="600,1200,2400,4800" up_thread="false"
                  down_thread="false"/>
   <UNICAST timeout="600,1200,2400" window_size="100" min_threshold="10" down_thread="false"/>
   <pbcast.STABLE desired_avg_gossip="20000" up_thread="false" down_thread="false"/>
   <FRAG frag_size="8192" down_thread="false" up_thread="false"/>
   <pbcast.GMS join_timeout="5000" join_retry_timeout="2000" shun="true" print_local_addr="true"/>
   <pbcast.STATE_TRANSFER up_thread="false" down_thread="false"/>
</config>
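  </attribute>

  <!-- Max time (ms) to wait for the initial cache state when joining the cluster -->
  <attribute name="InitialStateRetrievalTimeout">5000</attribute>

  <!-- Max time (ms) to wait for replication acknowledgements; applies to REPL_SYNC only -->
  <attribute name="SyncReplTimeout">10000</attribute>

  <!-- Max time (ms) to wait for a lock on a cache node -->
  <attribute name="LockAcquisitionTimeout">15000</attribute>

  <!-- Evict the least recently used nodes -->
  <attribute name="EvictionPolicyClass">org.jboss.cache.eviction.LRUPolicy</attribute>

  <!-- Eviction limits per cache region -->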
  <attribute name="EvictionPolicyConfig">
<config>
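   <!-- How often (s) the eviction thread runs -->
   <attribute name="wakeUpIntervalSeconds">5</attribute>
   <!-- Limits for all regions not configured explicitly -->
   <region name="/_default_">
      <!-- Max number of nodes kept in the region -->
      <attribute name="maxNodes">5000</attribute>
      <!-- Evict nodes idle for longer than this (s) -->
      <attribute name="timeToLiveSeconds">1000</attribute>
      <!-- Evict nodes older than this (s), however recently used -->
      <attribute name="maxAgeSeconds">120</attribute>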
   </region>

   <region name="/org/jboss/data">
      <attribute name="maxNodes">5000</attribute>
      <attribute name="timeToLiveSeconds">1000</attribute>
   </region>

   <region name="/org/jboss/test/data">
      <attribute name="maxNodes">5</attribute>
      <attribute name="timeToLiveSeconds">4</attribute>
   </region>
</config>
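  </attribute>

  <!-- Persist cache nodes to a database table via a JDBC cache loader -->
  <attribute name="CacheLoaderConfiguration">
<config>
   <!-- false: write every change to the store, not only evicted nodes -->
   <passivation>false</passivation>
   <!-- false: each member writes changes to the store itself (store not treated as shared) -->
   <shared>false</shared>

   <cacheloader>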
      <class>org.jboss.cache.loader.JDBCCacheLoader</class>
      <properties>
      cache.jdbc.table.name=jbosscache
      cache.jdbc.table.create=true
      cache.jdbc.table.drop=true
      cache.jdbc.table.primarykey=jbosscache_pk
      cache.jdbc.fqn.column=fqn
      cache.jdbc.fqn.type=varchar(255)
      cache.jdbc.node.column=node
      cache.jdbc.node.type=blob
      cache.jdbc.parent.column=parent
      cache.jdbc.driver=oracle.jdbc.driver.OracleDriver
      cache.jdbc.url=jdbc:oracle:thin:@[hostname]:1521:orcl
      cache.jdbc.user=[username]
      cache.jdbc.password=[password]
      </properties>
      <async>true</async>
      <fetchPersistentState>false</fetchPersistentState>
      <ignoreModifications>false</ignoreModifications>
      <purgeOnStartup>false</purgeOnStartup>
   </cacheloader>
</config>
  </attribute>
</mbean>
Then package this file into a SAR archive (essentially a ZIP file with a different extension) with the following structure:

jbosscache.sar/
- META-INF/
   - jboss-service.xml
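Any ZIP or JAR tool can produce this archive. If you build with Ant rather than Maven, a jar task along these lines should do. This is a sketch only, and the target name and the src/sar and build directories are hypothetical:

```xml
<!-- Hypothetical Ant target; adjust the directories to your project layout -->
<target name="sar">
  <mkdir dir="build"/>
  <jar destfile="build/jbosscache.sar">
    <!-- Files matched by <metainf> are placed under META-INF/ in the archive -->
    <metainf dir="src/sar" includes="jboss-service.xml"/>
  </jar>
</target>
```

The jar task also adds a default META-INF/MANIFEST.MF to the archive.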
Place the SAR at the root level of your EAR and add the following lines to your EAR's META-INF/jboss-app.xml:

<module>
<service>jbosscache.sar</service>
</module>
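For orientation, here is how this entry sits in a minimal jboss-app.xml. This is a sketch, and depending on your JBoss version you may also want to add the matching DOCTYPE declaration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<jboss-app>
  <!-- Deploy the cache service (the SAR at the EAR root) together with the application -->
  <module>
    <service>jbosscache.sar</service>
  </module>
</jboss-app>
```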
Should you encounter any serialization problems during startup or later use, you can switch back from JBoss serialization to standard Java serialization by adding the following JVM option:

-Dserialization.jboss=false
To use JBoss Cache as the cache provider, your application needs the following dependencies (assuming you build with Maven):

<dependencies>
  <dependency>
    <groupId>org.jboss.cluster</groupId>
    <artifactId>hibernate-jbc-cacheprovider</artifactId>
    <version>1.0.1.GA</version>
    <exclusions>
      <exclusion>
        <groupId>hibernate</groupId>
        <artifactId>hibernate3</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-common</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-jmx</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-system</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-j2ee</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-transaction</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-jbosscache</artifactId>
    <version>3.3.1.GA</version>
    <exclusions>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-cache</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-system</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-common</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-minimal</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jboss</groupId>
        <artifactId>jboss-j2se</artifactId>
      </exclusion>
      <exclusion>
        <groupId>concurrent</groupId>
        <artifactId>concurrent</artifactId>
      </exclusion>
      <exclusion>
        <groupId>jgroups</groupId>
        <artifactId>jgroups-all</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
</dependencies>
Note: the exclusions are here to prevent version mismatches with the libraries already included in our project or provided by JBoss itself. You may have to adjust them manually for your application.

If you don't use Maven, you have to download the mentioned libraries manually and include them on your classpath.

Now it's time to tell Hibernate about our cache. A few properties are all it takes:

hibernate.cache.provider_class=org.jboss.hibernate.jbc.cacheprovider.JmxBoundTreeCacheProvider
hibernate.treecache.mbean.object_name=jboss.cache:service=TreeCache
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=false
hibernate.transaction.manager_lookup_class=<your_transaction_manager_lookup_class>
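If your Hibernate session factory is defined in Spring (like the hibernateSessionFactory bean referenced later in this article), you can pass the same settings through its hibernateProperties. The sketch below assumes Spring's Hibernate 3 LocalSessionFactoryBean. The configLocation value and the JBoss-specific lookup class are assumptions, so adapt them to your setup:

```xml
<bean id="hibernateSessionFactory"
      class="org.springframework.orm.hibernate3.LocalSessionFactoryBean">
  <!-- Assumed location of the Hibernate configuration holding the jBPM mappings -->
  <property name="configLocation" value="classpath:hibernate.cfg.xml"/>
  <property name="hibernateProperties">
    <props>
      <prop key="hibernate.cache.provider_class">org.jboss.hibernate.jbc.cacheprovider.JmxBoundTreeCacheProvider</prop>
      <prop key="hibernate.treecache.mbean.object_name">jboss.cache:service=TreeCache</prop>
      <prop key="hibernate.cache.use_second_level_cache">true</prop>
      <prop key="hibernate.cache.use_query_cache">false</prop>
      <!-- Assumption: running inside JBoss AS with its JTA transaction manager -->
      <prop key="hibernate.transaction.manager_lookup_class">org.hibernate.transaction.JBossTransactionManagerLookup</prop>
    </props>
  </property>
</bean>
```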
With all this in place, you can cache your entity classes by marking them with the @Cache annotation. Remember that a clustered TreeCache supports only the read-only and transactional concurrency strategies.
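jBPM's own entities are mapped with Hibernate XML mapping files rather than annotations, so for them you can declare the cache usage in hibernate.cfg.xml instead. The sketch below is minimal, and caching org.jbpm.graph.def.ProcessDefinition is only an illustrative choice:

```xml
<hibernate-configuration>
  <session-factory>
    <!-- ... existing properties and jBPM <mapping> entries stay as they are ... -->

    <!-- class-cache entries must follow the <mapping> elements;
         usage must be read-only or transactional for a clustered TreeCache -->
    <class-cache class="org.jbpm.graph.def.ProcessDefinition" usage="transactional"/>
  </session-factory>
</hibernate-configuration>
```

If a mapping file already declares a cache element for a class, make sure its usage is one of the two strategies the clustered TreeCache supports.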

You can monitor the new cache in two ways: via the Hibernate statistics module or via the TreeCache JMX MBean we have already created.

To use Hibernate statistics, an additional dependency is needed in your POM:

<dependency>
  <groupId>org.hibernate</groupId>
  <artifactId>hibernate-jmx</artifactId>
  <version>3.3.1.GA</version>
</dependency>
To enable statistics gathering and export the statistics via JMX, put the following in your Spring context file:

<!-- Exports the statistics bean to the MBean server under the given object name -->
<bean id="jmxExporter"
      class="org.springframework.jmx.export.MBeanExporter">
  <property name="beans">
    <map>
      <entry key="Hibernate:name=statistics">
        <ref local="statisticsBean"/>
      </entry>
    </map>
  </property>
</bean>

<!-- Hibernate's statistics MBean, bound to the application's session factory -->
<bean id="statisticsBean" class="org.hibernate.jmx.StatisticsService">
  <property name="statisticsEnabled">
    <value>true</value>
  </property>
  <property name="sessionFactory">
    <ref local="hibernateSessionFactory"/>
  </property>
</bean>
where "hibernateSessionFactory" is the ID of your session factory Spring bean. With this change, the Hibernate statistics module is available via JMX.

In the statistics you can monitor the fully qualified names of the cached entities (labelled "Second level cache regions") and the ratio of put, hit and miss counts, which lets you verify that the cache is working as expected. After running for a while, a correctly cached jBPM should show a very high hit/miss ratio, as in this screenshot from JConsole:

Tuning jBPM performance
In its default configuration, jBPM scales well but delivers only a fraction of its potential performance. The following graph shows how escalation times fall as nodes are added to the cluster. The scenario I tested consists of 1000 Calls, each escalated automatically twice, which results in 2000 escalations in total. Each escalation results in a database update. All results are illustrative and subject to some fluctuation under different test conditions.

We are aiming for near-linear scalability. Linear scalability means that, under a constant load, performance improves in proportion to the server resources added.
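Here is the same idea written as formulas. The notation is introduced here for illustration: let T_n be the time needed to process all 2000 escalations on n nodes, and A_n the resulting acceleration. The theoretical curves in the charts then correspond to:

```latex
T_n^{\mathrm{lin}} = \frac{T_1}{n}, \qquad
A_n = \frac{T_1}{T_n}, \qquad
A_n^{\mathrm{lin}} = \frac{T_1}{T_n^{\mathrm{lin}}} = n
```

For example, with 4 nodes the linear target is a quarter of the single-node time, which is an acceleration of 4.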

The left chart compares the real escalation time with the theoretical time based on linear acceleration; the right chart compares the real acceleration with the theoretical linear acceleration.

These results were collected with the standard jBPM configuration (1 JobExecutor thread, a 10-second idleInterval and a 1-hour maxIdleInterval) and are only meant to show how jBPM scales in its default setup. That's not bad, but the acceleration factor could be higher.

Now let's play with the configuration a little. Two jBPM options are of interest to us here: idleInterval and maxIdleInterval. When the JobExecutor catches an exception, it pauses for idleInterval; each further consecutive exception doubles the pause until it reaches maxIdleInterval. Unfortunately for us, a StaleObjectStateException is thrown every time an optimistic locking clash is detected, and this happens quite often when many concurrent JobExecutors try to acquire a job. Reducing both values is therefore crucial for achieving high concurrency. Here are the results of reducing idleInterval to 500 milliseconds and maxIdleInterval to 1000 milliseconds:

The left chart compares the real escalation time with the theoretical time based on linear acceleration; the right chart compares the real acceleration with the theoretical linear acceleration.

You can see that the acceleration curve is very close to the optimum now. Let's see if we can shift the whole time curve downwards.
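The back-off rule described earlier shows why these two settings matter so much. The symbols here are introduced for illustration: i is idleInterval, m is maxIdleInterval, and w_k is the pause after the k-th consecutive exception. The pauses are then:

```latex
\begin{aligned}
w_k &= \min\!\left(i \cdot 2^{k-1},\ m\right), \qquad k = 1, 2, 3, \dots \\[4pt]
\text{default } (i = 10\ \mathrm{s},\ m = 3600\ \mathrm{s}):\quad w_k &= 10,\ 20,\ 40,\ \dots,\ 2560,\ 3600,\ 3600,\ \dots\ \mathrm{s} \\[4pt]
\text{tuned } (i = 0.5\ \mathrm{s},\ m = 1\ \mathrm{s}):\quad w_k &= 0.5,\ 1,\ 1,\ \dots\ \mathrm{s}
\end{aligned}
```

With the defaults, nine consecutive optimistic locking clashes already add up to 10 × (2^9 - 1) = 5110 s of pauses for a single thread. With the tuned values, a thread never waits longer than one second.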

To increase the throughput of the workflow engine, you can change the number of JobExecutor threads per machine. I ran the same performance tests as before, but this time on 4 nodes only, increasing the number of threads and switching the cache on and off. Caching increased throughput by 15-23%, but the most significant gain came from increasing the number of threads:

At 20 threads per machine we reached the saturation point: increasing the value further does not yield significantly better results. Caching would probably make an even bigger difference if the database were under constant load from other parts of the application.
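In jBPM 3.x, the knobs discussed in this section are fields of the JobExecutor bean in jbpm.cfg.xml. The sketch below assumes jBPM 3.2, with bean and field names taken from that version's default configuration (org/jbpm/default.jbpm.cfg.xml inside the jBPM jar); verify them against your version. It uses the values from these tests: 20 threads, a 500 ms idleInterval and a 1000 ms maxIdleInterval.

```xml
<jbpm-configuration>
  <!-- Overrides the default JobExecutor bean; copy any other fields of the
       default definition (e.g. the lock-related ones) unchanged -->
  <bean name="jbpm.job.executor" class="org.jbpm.job.executor.JobExecutor">
    <field name="jbpmConfiguration"><ref bean="jbpmConfiguration"/></field>
    <field name="name"><string value="JbpmJobExecutor"/></field>
    <!-- JobExecutor threads per machine (1 in the default setup tested above) -->
    <field name="nbrOfThreads"><int value="20"/></field>
    <!-- First pause (ms) after an exception (10 s in the default setup tested above) -->
    <field name="idleInterval"><int value="500"/></field>
    <!-- Upper bound (ms) for the doubled pause (1 hour in the default setup tested above) -->
    <field name="maxIdleInterval"><int value="1000"/></field>
  </bean>
</jbpm-configuration>
```

Keep in mind that a smaller idleInterval may also make idle threads poll the database for new jobs more often.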

Conclusion
You have seen a thorough study of jBPM clustering and tuning. The verdict is that jBPM is a very efficient workflow engine; it only requires turning the right knobs to get the most out of it. By adding 3 servers and tweaking the jBPM configuration, we increased throughput more than 16 times compared with a single server in the default setup. If you need to increase the throughput of your workflow, I suggest making the following modifications, in this order:

1. Increase the number of JobExecutor threads.
2. Add a cache to decrease the database load.
3. Add more servers as necessary.
One thing to remember, though: the database will always become a bottleneck at some point. After all, jBPM is largely based on Hibernate. So if your efforts don't bring the expected results, consider tuning or clustering your database.

Links