Friday, July 24, 2015

Identity generation in Hibernate (3 of 3)

Hibernate may be configured to assign unique identifiers to newly created entities. This feature is easy to use and needs no tuning in a single-threaded, single-node environment. Providing an efficient configuration for a distributed system, on the other hand, requires careful analysis, design, and some development effort. This note extends the documents "Maximize insert throughput in Oracle RAC system" and "Identity generation in load balanced WebLogic/RAC environment" to Hibernate.
Hibernate identity generators
There are two types of generators in Hibernate: stateless and stateful. Stateless generators do not keep any state in Java variables; they rely on external sources of unique values, such as a combination of IP address and system time for UUIDHexGenerator, or a database sequence for SequenceGenerator. More sophisticated and faster generators are stateful: they keep the current identifier value in Java variables. SequenceHiLoGenerator is an example of such an implementation. A stateful generator is typically built around a synchronized generate method that increments a private instance variable on each request for a new identifier.
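To make the distinction concrete, here is a minimal sketch in plain Java. The interface and class names (IdGenerator, StatelessUuidGenerator, StatefulCounterGenerator) are hypothetical and are not Hibernate classes; java.util.UUID merely stands in for an external source of uniqueness.

```java
import java.util.UUID;

/**
 * Illustrative sketch only: these are NOT Hibernate classes. They contrast a
 * stateless generator (uniqueness comes from an external source) with a
 * stateful one (uniqueness comes from a counter kept in a Java field).
 */
public class GeneratorKinds {

    /** Hypothetical minimal generator contract used by this sketch. */
    interface IdGenerator {
        Object generate();
    }

    /** Stateless: no fields; a random UUID is the external source of uniqueness. */
    static final class StatelessUuidGenerator implements IdGenerator {
        @Override
        public Object generate() {
            // UUID.randomUUID() stands in for "IP address + system time" style sources.
            return UUID.randomUUID().toString().replace("-", "");
        }
    }

    /** Stateful: the next value lives in an instance field guarded by synchronized. */
    static final class StatefulCounterGenerator implements IdGenerator {
        private long next;

        StatefulCounterGenerator(long start) {
            this.next = start;
        }

        @Override
        public synchronized Object generate() {
            return next++; // read and increment happen under the instance lock
        }
    }

    public static void main(String[] args) {
        IdGenerator stateless = new StatelessUuidGenerator();
        IdGenerator stateful = new StatefulCounterGenerator(100);
        System.out.println(stateless.generate()); // 32 hex characters, different each run
        System.out.println(stateful.generate());  // 100
        System.out.println(stateful.generate());  // 101
    }
}
```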
Important! The identity generator instance is a singleton managed by the SessionFactory [1]. This is not obvious, but the generator code contains a synchronized generate method and no static class-level variables. A synchronized instance method that increments instance-level variables can only guarantee unique values if all threads share the same instance, which confirms the singleton nature of the generator.
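The singleton argument can be checked with a small, hypothetical experiment: one shared generator instance with a synchronized method, called from several threads, should never hand out the same value twice. CounterGenerator and drawConcurrently are illustrative names, not Hibernate API.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedGeneratorDemo {

    /** Hypothetical stateful generator; one instance is shared, as a SessionFactory would share it. */
    static final class CounterGenerator {
        private long next = 1;

        synchronized long generate() {
            return next++;
        }
    }

    /** Runs `threads` workers that each draw `perThread` ids from the same generator instance. */
    static Set<Long> drawConcurrently(CounterGenerator shared, int threads, int perThread) {
        Set<Long> ids = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    ids.add(shared.generate());
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<Long> ids = drawConcurrently(new CounterGenerator(), 8, 10_000);
        // Without synchronized, lost updates would produce duplicates and the set
        // would usually hold fewer than 80,000 distinct values.
        System.out.println(ids.size()); // 80000
    }
}
```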

Hibernate generators based on database sequence
Generators based on a database sequence (e.g. SequenceGenerator.java) fetch the next value with getBatcher().prepareSelectStatement(sql), where sql is the sequence select statement. According to the Hibernate documentation, this code uses the same database connection that will later be used to flush the object into the database. The documentation says that, in an open Hibernate session, a JDBC connection is obtained from the configured ConnectionProvider as needed to perform the requested work [2]. Looking at the simplest Hibernate snippet, the first point at which a database connection must be assigned to the session context is the start of the transaction (line 2).

1. Session session = HibernateUtil.getSessionFactory().openSession();
2. session.beginTransaction();
3. DBUser user = new DBUser();
4. user.setUsername("superman");
5. user.setCreatedBy("system");
6. session.save(user);
7. session.getTransaction().commit();

Interestingly, running the code above shows that the database connection is already opened at line (1), but let's assume that starting the transaction is the mandatory point of obtaining the connection. After that point, the object to be persisted is created at line (3); its identifier is assigned by the generator when session.save() is called at line (6), and the final flush to the database happens at transaction commit at line (7). From this it follows that the logic between lines (2) and (7) must use a single database connection. The lifecycle and configuration of database connections are described in the Hibernate documentation [3], though without precise detail; the statements above are not direct quotations but are deduced from the documentation.

In a managed application server environment, the transaction is started automatically by the container, so the same database connection is used throughout the execution of a method defined as a single transaction. A RequiresNew definition, which opens a new transaction, does not break this concept: the connection taken for the new transaction is the one used by the identity generator for code running inside that transaction.

Most importantly, this behavior makes it possible to design an identity generator that uses the RAC node identifier: it may simply take a sequence value on a given node, or build a smart identifier prefixed with the RAC node number.
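As an illustration of the "smart identifier" idea, the sketch below composes a node number and a per-node sequence value into one long. The decimal layout (node number times 10^12 plus the sequence value), the 1..999 node range, and all names are assumptions made for this example. In Oracle, the node number for the session's connection could be read, for instance, with SELECT SYS_CONTEXT('USERENV', 'INSTANCE') FROM dual; that lookup is left out so the code runs without a database.

```java
public class NodePrefixedIds {

    // Hypothetical layout: node numbers 1..999, 12 decimal digits for the per-node sequence.
    static final long NODE_MULTIPLIER = 1_000_000_000_000L; // 10^12

    /** Combines a RAC node number with a sequence value taken on that node. */
    static long compose(int nodeNumber, long sequenceValue) {
        if (nodeNumber < 1 || nodeNumber > 999) {
            throw new IllegalArgumentException("node number out of range: " + nodeNumber);
        }
        if (sequenceValue < 0 || sequenceValue >= NODE_MULTIPLIER) {
            throw new IllegalArgumentException("sequence value out of range: " + sequenceValue);
        }
        return nodeNumber * NODE_MULTIPLIER + sequenceValue;
    }

    /** Recovers the node prefix from a composed identifier. */
    static int nodeOf(long id) {
        return (int) (id / NODE_MULTIPLIER);
    }

    /** Recovers the per-node sequence part from a composed identifier. */
    static long sequenceOf(long id) {
        return id % NODE_MULTIPLIER;
    }

    public static void main(String[] args) {
        long id = compose(2, 17);
        System.out.println(id);             // 2000000000017
        System.out.println(nodeOf(id));     // 2
        System.out.println(sequenceOf(id)); // 17
    }
}
```

Because each node draws from its own range, two nodes can never produce the same identifier, and the node that performed the insert stays visible in the key.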

Hibernate makes it possible to execute expensive, one-time operations in the configure() method of the Configurable interface, which generators such as SequenceGenerator implement in addition to IdentifierGenerator. This method is called by the Hibernate SessionFactory when it creates the generator instance.
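The split between one-time configuration and per-call generation can be sketched as follows. SequenceBackedGenerator is hypothetical and its configure(Properties) method is deliberately simpler than Hibernate's real signature; a local counter replaces the database sequence so the example runs stand-alone. The Oracle-style "select ... nextval from dual" text is built once and then reused.

```java
import java.util.Properties;

public class ConfigureOnceDemo {

    /**
     * Hypothetical generator illustrating the configure()/generate() split.
     * Not Hibernate's API: the real configure() takes additional arguments.
     */
    static final class SequenceBackedGenerator {
        private String nextValSql;  // built once in configure()
        private int configureCount; // counts the one-time work, for the demo
        private long fakeSequence;  // stands in for the database sequence

        /** One-time, potentially expensive setup: read parameters, build SQL text. */
        void configure(Properties params) {
            String sequenceName = params.getProperty("sequence", "hibernate_sequence");
            nextValSql = "select " + sequenceName + ".nextval from dual";
            configureCount++;
        }

        /** Per-entity work: must stay cheap. */
        synchronized long generate() {
            // A real generator would execute nextValSql on the session's connection;
            // this sketch only increments a local counter.
            return ++fakeSequence;
        }

        String nextValSql() {
            return nextValSql;
        }

        int configureCount() {
            return configureCount;
        }
    }

    public static void main(String[] args) {
        Properties params = new Properties();
        params.setProperty("sequence", "user_seq");

        SequenceBackedGenerator generator = new SequenceBackedGenerator();
        generator.configure(params); // done once, when the generator is created
        for (int i = 0; i < 3; i++) {
            System.out.println(generator.generate()); // 1, 2, 3
        }
        System.out.println(generator.nextValSql());     // select user_seq.nextval from dual
        System.out.println(generator.configureCount()); // 1
    }
}
```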

Hibernate HiLo generator
A typical database system uses one named sequence per table to create identifiers. To speed up identifier generation, a sequence may be configured with the CACHE option, which preallocates a range of values in memory (in RAC, on each instance). This technique minimizes sequence generation latency, but it is not ideal for applications running in the middleware layer, because each value still requires a call to the database. In a Java/Hibernate environment, identifiers can be generated efficiently with the HiLo generator. The HiLo generator works as a singleton: newly created sessions reuse the already created generator. Being a singleton, it shares a single sequence across all threads and returns subsequent identifier values for requests coming from any thread. The HiLo algorithm implements the CACHE concept known from Oracle sequences, but does so in Java space. The important point is that most identifier generations happen in Java space; a remote call to the database is required only to get the starting point of a new range (the hi value).
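The saving can be put into numbers. Assume, as a simplification, that every sequence value h ≥ 1 opens a range of R consecutive identifiers held in Java memory (for Hibernate's HiLo generator, R = maxLo + 1). Consecutive ranges never overlap, since hR + R − 1 < (h + 1)R, and n identifiers cost only ⌈n / R⌉ round trips to the database:

```latex
\[
  \mathrm{ids}(h) = \{\, hR,\; hR + 1,\; \dots,\; hR + R - 1 \,\},
  \qquad
  \text{round trips}(n) = \left\lceil \frac{n}{R} \right\rceil
\]
```

For example, with R = 100, inserting n = 10 000 rows needs 100 sequence calls instead of 10 000.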

Picture. Graphical description of HiLo logic 

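The variable maxLo (lines 4 and 8), defined as a generator initialization parameter, is the equivalent of Oracle's CACHE value: it determines the size of the identifier range handled purely in Java space (strictly, a range holds maxLo + 1 values). The variable hi holds the beginning of the range, and lo is the offset inside the range. Range exhaustion is checked at line (4) by testing whether the offset (lo) is bigger than maxLo. If it is, a new value is taken from the Oracle sequence (line 6) and turned into the starting value of the new range, hi (line 8). Line (7) resets the offset, skipping 0 when the sequence returns 0 so that identifier 0 is never generated.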
    1. public synchronized Serializable generate(SessionImplementor session, Object obj)
    2.     throws HibernateException
    3. {
    4.    if(lo > maxLo)
    5.    {
    6.        long hival = ((Number)super.generate(session, obj)).longValue();
    7.        lo = hival != 0L ? 0 : 1;
    8.        hi = hival * (long)(maxLo + 1);
    9.    }
    10.   return IdentifierGeneratorFactory.createNumber(hi + (long)(lo++), returnClass);
    11.}

Note that the class is instantiated as a singleton, so the synchronized method changes the values of lo and hi, which are shared by all threads.
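To see the numbers the algorithm produces, the listing above can be restated in plain Java with the database sequence replaced by a LongSupplier. This is a sketch for experimentation, not Hibernate's class: HiLoSketch and HiLoGenerator are hypothetical names, and initializing lo to maxLo + 1 (to force a fetch on the first call) is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

public class HiLoSketch {

    /** Plain-Java re-statement of the HiLo arithmetic; the sequence is a LongSupplier. */
    static final class HiLoGenerator {
        private final int maxLo;
        private final LongSupplier sequence; // stands in for super.generate(session, obj)
        private int lo;
        private long hi;

        HiLoGenerator(int maxLo, LongSupplier sequence) {
            if (maxLo < 1) throw new IllegalArgumentException("maxLo must be >= 1");
            this.maxLo = maxLo;
            this.sequence = sequence;
            this.lo = maxLo + 1; // forces a sequence fetch on the first call
        }

        synchronized long generate() {
            if (lo > maxLo) {                      // range exhausted
                long hival = sequence.getAsLong(); // one database round trip
                lo = (hival == 0L) ? 1 : 0;        // never hand out identifier 0
                hi = hival * (maxLo + 1);          // first identifier of the new range
            }
            return hi + lo++;
        }
    }

    public static void main(String[] args) {
        long[] seq = {1};    // fake sequence starting at 1
        int[] fetches = {0}; // counts simulated round trips
        HiLoGenerator gen = new HiLoGenerator(4, () -> { fetches[0]++; return seq[0]++; });
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            ids.add(gen.generate());
        }
        System.out.println(ids);        // [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
        System.out.println(fetches[0]); // 3
    }
}
```

With maxLo = 4 each sequence value covers five identifiers, so twelve identifiers cost three sequence calls; a sequence that starts at 0 yields a first range beginning at 1.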



###





References
1. Question on Stack Overflow, Styczynski, 2015
2. Hibernate 3.5 JavaDoc, Red Hat, 2004
3. JDBC connections, Red Hat, 2004
