Friday, July 24, 2015

Identity generation in load balanced WebLogic/RAC environment (2of3)

This note specifies technical aspects of identity generation adjusted to multi node environment utilizing load balanced channels to Oracle RAC database. It's detailed extension of document "Maximize insert throughput in Oracle RAC system".
Introduction
Data layout design in distributed environment should aim to minimize data movements and contention on resources. Ideally each processing thread should operate on own subset of resources. In the easiest situation, single process writing to one database has exclusive access to all resources. Adding more database processing nodes adds complexity. Enabling load balancing on database connections, makes things even more complex, as single writing thread may be directed to any node of the distributed database layer. Assuming writes hitting the same physical blocks, database is forced to perform extensive synchronization work to keep all nodes in coherent state. Typical system is not a single process - it's a multi node, and multi threaded software, what add so high level of complexity, that without proper design database is more busy with taking care of system coherent state than processing app's requests.

There are few technical solutions of above problem. All options are presented in document "Maximize insert throughput in Oracle RAC system". Each solution however has it's own side effects. One increases latency, another may lead to I/O problems. The ideal solution is based on the right design of data model, and data access strategies. Design created with care of application to data affinity and understanding of physical data structures. 

Proposed solution aims to improve performance of data insert. Solution is based on ordered nature of B-Tree structure used to store indices, and knowledge when RAC Cache Fusion needs to pass blocks over Interconnect. Prefixing primary key by, in short, thread id and database id, mitigates all problems related to both inter node cache synchronization and index leaf block contention. There is one limitation of the solution: prefixing indexed column is possible only for surrogate keys; all natural keys must be optimized using purely technical means. 

Picture. Prefixed identifier layout


Prefixing indexed column assigns primary key ranges to each writing thread. After writing few thousands records, B-Tree structure splits into groups of blocks, exclusively associated with writhers. Providing compound prefix consisting of WebLogic and RAC node guarantees flawless data insert in load balanced environment. Note that in the system with configured database connectivity in failover mode, only WebLogic node identifier is required.

Picture. Prefixed identifier mapped to WebLogic, RAC node cache, and B-Tree blocks


It's surprising that, Oracle's B-Tree on a leaf block level is a flat data structure called bidirectional list. With a big enough index, consisting of minimum hundreds of blocks, prefix's ranges guarantees exclusive access to data blocks for writers. Initially blocks are shared between ranges, but after number of block splits, access becomes exclusive. From that moment no two threads are writing to the same blocks.

Picture. Prefixed sequence and B-Tree block clustering

Key to an efficient identifier generation is a knowledge how to get elements of a compound key. This document describes how to get: (1) WebLogic identifier, (2) identifier of Oracle RAC node servicing Hibernate session, and (3) thread identifier. Next document in series "Identity generation in Hibernate" describes how to work with (4) ascending sequence using Hibernate HiLo generator. 

Requirements
All elaboration and analysis done so far makes it possible to prepare list of requirements describing ideal identifier working effectively in load balanced Oracle WebLogic / Oracle RAC environment.

Enterprise level identity generator used by application deployed on WebLogic, connected to RAC database via load balanced multi data source should stick to following requirements:
1.  app nodes uses distant identifiers
2. database nodes uses distant identifiers
3. threads uses distant identifiers
4. same thread uses subsequent or in proximity identifiers
5. identifiers are unique per entity 
6. identifiers are optimally stored in database data type e.g. NUMBER
7. identifiers must fit in Java long type
8. space for sequence value cannot be limited

Unique sequence must be prefixed by app node, and thread identifier to fulfill requirements (1) and (3). Moreover in load balanced environment node identifier must be extended by database id (2). The combination of node, thread, and sequence guarantee uniqueness of the identifier (5). As existing application uses Java long data type to store identifiers, whole prefixed identifier must fit in 19 digits (7). To save storage space (6), the sequence itself may be reversed to eliminate leading zeros. This operation removes zeros from the middle of the number, for the price of breaking proximity of subsequent sequence values (4). 

Notes:
  1. (6) is in opposition to (4). It should be decided what is more important - saving storage space or optimizing use of index blocks.
  2. Prefixing sequence by (8) consumes number of digits. With full prefix, 5 digits are taken, leaving 14 for sequence in case of Java long data type.

Identity generation in WebLogic connected to load balanced RAC
Oracle RAC system, based on "shared everything" paradigm, and coherent, distributed cache, makes it mandatory to generate identifier associated with RAC node used to service database operations. It's easy to achieve in case of having stable connection to a database instance, e.g. created by failover mode of Oracle client. In such mode, each client connects to one primary instance, with ability to switch to another in case of primary instance failure. In the opposite configuration, based on load balanced connections,  application uses potentially different physical connection, each time the connection is requested. It's a very good approach compatible with idea of resource sharing, and horizontal scalability, however being a source of issues when not properly used. In such situation, using generators sharing the same continuum of identifiers among all database connections, leads to known problems of (1) index leaf block contention and (2) increased use of interconnect.

Load balanced environments adds more factors to identity generation logic, as system must control affinity and exclusive use of cache blocks to eliminate contention and global cache synchronization. In the load balanced configuration, with connections pointing to different RAC instances, node identifier must be constructed using both WLS and RAC node identifier. This compound node identification splits B-Tree into areas exclusively used by each combination of client, and database instance. Using just one of them in load balanced environment is not optimal due to following issues:
  1. using WLS node identifier alone leads to index blocks movements between instances over interconnect, as WLS node may be directed to different RAC node for subsequent operations,
  2. using RAC node identifier alone leads to block contention as multiple WLS may write to the same index blocks.

Load balanced environment makes it impossible to create static node identifier. On one hand, WLS identifier is by definition stable. On the other one, dynamic database connection requires discovery of RAC node identifier from currently used physical connection. To achieve this it's mandatory to use stateless generator, going to a database instance each and every time before constructing unique identifier. 

Fault! Potential exception from above is possibility of pinning JDBC connection to a thread. This WebLogic performance optimization, makes it possible to eliminate load balancing, as physical connection is assigned to a thread. Unfortunately documentation says 15 that: "Pinned-To-Thread feature does not work with multi data sources, Oracle RAC, and IdentityPool".

Identification of Oracle RAC node
There are two possibilities to get node identification in Oracle RAC. The fundamental technique is based on executing SELECT command asking for instance environment. 

select userenv('instance') from dual;

This is a proper technique, but requiring remote operation on a database each time identifier is generated. In a real system such remote call must be eliminated, to speed up business object handling. Oracle JDBC driver may speed up subsequent executions of the same statement by using result set cache. It may be a good technique, however it should be verified if cache is kept on client side. 

Another option available starting from Oracle 11g,  is based on getting system environment by using getServerSessionInfo(). With this method, it's no longer needed to go to a database to discover node information. 

        DataSource ds = (DataSource) ic.lookup("jdbc/db");
        java.sql.Connection conn = ds.getConnection();
        oracle.jdbc.OracleConnection vendorCon = (oracle.jdbc.OracleConnection)
((weblogic.jdbc.extensions.WLConnection)conn).getVendorConnectionSafe();    
        oracle.jdbc.internal.OracleConnection orclConn = 
(oracle.jdbc.internal.OracleConnection)vendorCon;
        String instanceName = 
orclConn.getServerSessionInfo().getProperty("INSTANCE_NAME");
        Short instanceNo = InstanceName2InstanceNo(instanceName)

One more way of discovering node information is based on getting a connection string from Oracle connection object. Assuming that the connection string is stable pointing to only one instance (load balancing is configured in WLS not in Oracle client), this information may be used to get instance id, without going to a database.

        DataSource ds = (DataSource) ic.lookup("jdbc/db");
        java.sql.Connection conn = ds.getConnection();
        oracle.jdbc.OracleConnection vendorCon = (oracle.jdbc.OracleConnection)
((weblogic.jdbc.extensions.WLConnection)conn).getVendorConnectionSafe();    
        oracle.jdbc.internal.OracleConnection orclConn = 
(oracle.jdbc.internal.OracleConnection)vendorCon;
        String instanceURL = orclConn.getURL();
        Short instanceNo = InstanceURL2InstanceNo(instanceURL)

In both situations, transformation of received information into a numeric node's identifier must be performed.

Preparation of prefixed key
Prefixed key consists of four elements: WebLogic node id, RAC node id, thread id, and a sequence. Generation of a prefixed key should be a rapid operation - ideally operating only in Java space without a need of going to a database or a file system. 

Prefixed key elements should be prepared taking into consideration following comments:

Node id is converted from WebLogic name using configuration file. File is shared between WLS nodes. Id may be deducted from name by Java code, if it's possible to compute 1 digit long identifier from the name.

Database id is converted from RAC instance name, using hash structure initially filled from configuration file or a table, shared between all RAC instances. Id may be deducted from name by Java code, if it's possible to compute 1 digit long identifier from the name.

Thread id as a local variable per node, may be any number. For simplification of discovery and to save key space, each thread is automatically assigned to integer numbers. Thread.currentThread.getId() is stored in a hash structure, with a value of number of elements in the structure. Thread id may be, by definition, changed after restart. 

Sequence is taken from Java side generator handling ranges with Oracle sequence used to calculate minimal used value, what is required to survive restarts. Oracle sequence may be configured with CACHE NOORDER, but it's not mandatory, as db level sequence will be used not so often due the way how Hibernate's HiLo algorithm works 12.

Preparation of a prefixed key is based on simple mathematical operation: shifting digits left to proper positions. Operation may be done by multiplying dbId by 10^18, wlsId by 10^17, and threadId by 10^14. Optimization of node, thread conversion to number is done using hash maps, however it may be faster to extract number from string value representing instance or thread. Node id discovery should be performed only once in identifier generator's configure() method. Note that thread must be converted do id each time sequence is generated - caching it's identifier in ThreadLocal sounds like a good idea. In ideal situation each thread should have unique value to eliminate index leaf block contention.

Prefixed key must always have 19 digits, as fields are determined by its positions. Due to this requirement, WLS node cannot start from zero, but must start with 1. Due to maximum Long value of 9223372036854775807, WLS node must finish at 8 - 9 cannot be used as must be followed by max digit 2. It may be a good decision to change places of WLS and DB identifiers, as typically we have more app servers than db instances. Having WLS on second position we may address 10 instances by using numbers 0...9.

Optimized solution
Prefixing by client node, thread, and database node is the most intuitive way. There is however more compact possibility based on database session identifier. Session is established between client node, thread, and RAC node and stored on a RAC instance. This identifier enables possibility to replace WLS id and thread id. As session id is not unique in RAC cluster - must be prefixed by RAC instance. Assuming max. 999 sessions per database instance, prefix will consume 4 bytes, leaving 15 for sequence. It gives 10 times more space than with previous prefix. In case of sequence overflow it's possible to limit session identifier, even to one digit - it may add enough entropy to inserting stream to eliminate index leaf block contention. 
Picture. Prefixed identifier layout


Verification of correctness of prefixed key based on database instance id and session id is presented on below diagrams.

Picture. Prefixed identifier mapped to block clusters

Both RAC node id and session id may be taken from Oracle connection object. Both values are stored on client side - there is no need to execute expensive database call. Below code presents simple JDBC code. In case of working in WLS environment remember to use getVendorConnectionSafe(), as we know that reading properties does not change state of the connection. In other case WLS will assume that connection object is "infected" and may remove it from connection pool.

        DataSource ds = (DataSource) ic.lookup("jdbc/db");
        java.sql.Connection conn = ds.getConnection();
        oracle.jdbc.OracleConnection vendorCon = (oracle.jdbc.OracleConnection)
((weblogic.jdbc.extensions.WLConnection)conn).getVendorConnectionSafe();    
        oracle.jdbc.internal.OracleConnection orclConn = 
(oracle.jdbc.internal.OracleConnection)vendorCon;
        String instanceId = 
orclConn.getServerSessionInfo().getProperty("AUTH_INSTANCE_NO");
        String sessionId = 
orclConn.getServerSessionInfo().getProperty("AUTH_SESSION_ID");
         
Storage consumption
Oracle database stores NUMBER in a variable size structure. It's a floating point storage similar to IEEE754 format, but based on Base 100 instead of 2 - Oracle NUMBER type does not comply with the IEEE 754 14. Number's digits are stored by two, starting from right side of the number, each pair in one storage's byte. Each storage byte represents 2 digits of positive integer represented in standard Base10 format. Number of zeros is represented in first byte of the stored value, where exponent is stored 13.  Reading referenced document, and making few exercises with dump() function makes it easier to understand how it works. 

DUMP(9,16) Typ=2 Len=2: c1,a 
DUMP(9,10) Typ=2 Len=2: 193,10 
=> (10-1)*100^(1-1) = 9

DUMP(98,16) Typ=2 Len=2: c1,63 
DUMP(98,10) Typ=2 Len=2: 193,99 
=> (99-1)*100^(1-1) = 98

DUMP(9798,16) Typ=2 Len=3: c2,62,63 
DUMP(9798,10) Typ=2 Len=3: 194,98,99 
=> (98-1)*100^(2-1)+ (99-1)*100^(1-1)= 
= 9700 + 98 = 9787

DUMP(99798,16) Typ=2 Len=4: c3,a,62,63 
DUMP(99798,10) Typ=2 Len=4: 195,10,98,99 
=> (10-1)*100^(3-1)+ (98-1)*100^(2-1) + (99-1)*100^(1-1)=
= 90000 + 9700 + 98 = 99798

NUMBER guarantees storage of up to 38 significant digits without loosing precision. Oracle uses only required number of bytes required to store the value. Unfortunately prefixed key uses all positions of the number, thus maximum number of bytes is always used to store the value. Assuming Java's long used to store entity identifier, prefixed value consumes always 11 bytes, as full long uses 19 positions (11 = (19+1)/2+1).

DROP TABLE ORCL_NUMBER_CHECK;
CREATE TABLE ORCL_NUMBER_CHECK (description VARCHAR2(20), value NUMBER(38,0));

INSERT INTO ORCL_NUMBER_CHECK VALUES ('Long.MAX_VALUE', 9223372036854775807);
INSERT INTO ORCL_NUMBER_CHECK VALUES ('seq 1 not shifted', 1100100000000000001);
INSERT INTO ORCL_NUMBER_CHECK VALUES ('seq 1 not shifted', 2000100000000000001);
INSERT INTO ORCL_NUMBER_CHECK VALUES ('seq 1234234123', 2000112342341230000);

INSERT INTO ORCL_NUMBER_CHECK VALUES ('seq 14 digits', 1100000000000000001);
INSERT INTO ORCL_NUMBER_CHECK VALUES ('seq 13 digits', 110000000000000001);
INSERT INTO ORCL_NUMBER_CHECK VALUES ('seq 12 digits', 11000000000000001);

INSERT INTO ORCL_NUMBER_CHECK VALUES ('seq 20001001', 20001001);
INSERT INTO ORCL_NUMBER_CHECK VALUES ('seq 20001100', 20001100);

col value for 9999999999999999999999999999999999999;
select description, value, vsize(value) from ORCL_NUMBER_CHECK;

Creates following response:

DESCRIPTION   VALUE         VSIZE(VALUE)
-------------------- ---------------------------- ------------
Long.MAX_VALUE 9223372036854775807     11
seq 1 not shifted 1100100000000000001     11
seq 1 not shifted 2000100000000000001   11
seq 1234234123 2000112342341230000     9
seq 14 digits 1100000000000000001     11
seq 13 digits 110000000000000001     10
seq 12 digits 11000000000000001     10
seq 20001001 20001001     5
seq 20001100 20001100   4

9 rows selected.

Storage compression
It's tempting to compress somehow sentence value to leverage Oracle NUMBER physical capabilities. NUMBER stores optimally huge numbers suffixed by zeros. Identifier, represented by a sequence starts, at beginning, with low numbers. Proposed sequence prefixing, system always uses full 19 digits to store the identifier. Initial sequence value of 1 (one), prefixed by node id of 1 creates 1000000000000000001, which takes 11 bytes. System will most probably spend more time on sequences kept in lower area of sequence space. Having only 14 significant digits (as 5 are consumed by prefix), system may write data during 321 years of operation with assumption of writing 10.000 rows per second. It means that during normal system lifetime there will be a lot of leading zeros consuming storage. Of course it's possible to reserve less places for identifier, but it creates possibility of overflow. We would like to achieve two contradicting objectives: (1) save identifier storage space, and (2) delay risk of overflowing max possible integer value to be stored. It's, however, mandatory to choose one them, as both cannot be fulfilled in the same time. Find below explanation of failed attempts to save some space. I recommend to reserve full 11 bytes to store Java long identifier. 

Really bad idea Tempting idea is to shift number left to move leading zeros to right side of the number.

20001.00000000000100 -> 20001.10000000000000
20001.00000000000200 -> 20001.20000000000000
20001.00000000000300 -> 20001.30000000000000

The idea sounds interesting, as 20001.00000000000001 needs 11 bytes, and 20001.10000000000000 only 5. The problem is that it can't be done, as it eliminates possibility to use sequence of 10000000000000, which cannot be shifted anymore, but was already used by shifted e.g. 00000000000001. 

20001.099 -> 20001.99999999999999
20001.100 -> 20001.10000000000000
<- cannot be shifted. Conflict with already shifted 
20001.00000000000001 -> 20001.1000000000000

The problem is visible even earlier trying to shift 00000000000010. It's a completely wrong approach. 

Almost good idea is based on reversing the sequence. It will guarantee unique value of the number with moved zeros from beginning to the end. Saving 6 bytes per row.

20001.00000000000100 -> 20001.00100000000000
20001.00000000000101 -> 20001.10100000000000
20001.00000000000102 -> 20001.20100000000000

Such technique looks interesting, saving few bytes per column at beginning of data insertions. The problem is that after reaching middle point of sequence, storage consumption will start decreasing as reversed value will be prefixed by zeros. This solution may be considered when those 6 bytes are making significant difference in storage consumption. Reversing sequence breaks requirement (4) and leads to distribution of inserts into many blocks, even for single thread writing set of subsequent records. In corner case it leads to increased use on redo, as more blocks are updated, and to I/O transfers, when index cannot fit into memory.

Exemplary Prefixed Key computation
Find below exemplary computation of prefixed key value. Note that presented Java logic is presented just as example - it may be not optimal.

wls node: wls#1
thread: 4
database node: prod_instance#2
sequence: 989780816
identifier data type: long

Configuration - node2id mapping file
It's mandatory to map WLS node name to a number. If number is not place in node name, it's mandatory to use a file with mapping. File may be stored in domain config directory. 

prod_instance#1 1
prod_instance#2 2
wls#1 0
wls#2 1
wls#3 2
wls#4 3

Configuration  - code for Node2id mapping 
if instanceMap.isEmpty
instanceList = getInstance2idConfiguration()
instanceMap.putAll(instanceList)

Node2id mapping logic
instanceId = instanceMap(instance_name)
return instanceId // returns 2 of db, and 0 for wls

Thread2id mapping
synchronized
threadId=threadMap.get(4)
if ( threadId == null)
threadId = threadMap.size(); //234
threadMap.put(4, threadId)
return threadId //returns 234

Sequence
Taken from Hibernate HiLo generator - sequence generator based on configurable ranges with fast Java internal counter.

Computations
Above conversions return wlsId=0, DBId=2, threadId=234, and sequence=989780816, which must be packed into below long identifier

|19        |10
98 765 43210987654321
99 999 99999999999999

DBId must be multiplied by 10^(18) to move to location followed by 18 zeros.

20 000 00000000000000

WLSId must be multiplied by 10^(17) to move to location followed by 17 zeros.

00 000 00000000000000

Thread id must moved into it's position by multiplication by 10^(14) to move to location followed by 14 zeros.

00 234 00000000000000

Finally all four numbers must be added together to build final 19 digit long number.

20 234 00000989780816
98 765 43210987654321
|19        |10

Option A - reversed sequence
To save few storage bytes, sequence 00000989780816 may be reversed to 61808978900000.

00 000 61808978900000

Finally all four numbers must be added together to build final 19 digit long number with zeros at the end.

20 234 61808978900000
98 765 43210987654321
|19        |10

Prefixed key capacity
Oracle sequence supports maximum 28 digits.  With assumption of adding 10k rows/s this number has capacity to identify object during 3*10^16 years. Oracle NUMBER stores 38 significant digits. Prefixing sequence by thread (3 digits for max 999 threads) and node identifiers (2 digit for max 10 WLS, and 8 database nodes), leaves 33 digits for sequence. It's a space for 3*10^21 years of constant flow of data. The age of earth is 5*10^9 years. Such space may be used in newly created application with identifiers stored in BigDecimal. 

Applications already using Java long data type for object identification gets smaller id space. Assuming 5 digits consumed by prefix, sequence may use up to 14 digits. Constant flow of data with speed 10.000/s will fill available space in 321 years. 

Sequence overflow
In case of sequence overflow, the only possible thing is to limit number of threads to reuse one digits from this space. Thread identification logic, should be able to work in both 3 and 2 id numbers. In the latter case, some of threads will use the same id, what may lead to index leaf block contention problem. Database NUMBER datatype has a lot of space to extend size of prefixed key. Unfortunately in Java we are limited, when using long data type. It's different with BigDecimal. New systems should use identifiers based on BigDecimal.

###

References
1. Question on StackOverflow, Styczynski, 2015
2. Hibernate 3.5 JavaDoc, RedHat, 2004
3. JDBC connections, RedHat, 2004
4. Understanding Database Instances, Oracle, 11g
5. Predefined Parameters of Namespace USERENV, Oracle, 11g
6. NUMBER stores 38 significant digits
7. NUMBER uses base 10 to store numbers (it's not base 2 IEEE754)
8. NUMBER packs both leading and finishing zeros
9. Sequence handles up to 28 digits
10. Fast way of checking number of digits in number - divide and conquer
11. Sequence in RAC, Goodman, 2008
12. Hibernate/JPA Identity Generators, Lupu, 2011
13. Oracle NUMBER uses Base 100 - in fact.
14. Oracle NUMBER type does not comply with the IEEE 754
15. Using Pinned-To-Thread Property to Increase Performance


No comments:

Post a Comment