Using partitioning for multi-tenancy with dynamic tenants
I am writing a web application that needs to be multi-tenant. I am using JPA for the persistence layer and I am evaluating EclipseLink with interest.
The multi-tenant strategy I want to use is one schema per client. Hibernate supports a strategy like this ( http://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch16.html#d5e4771 ) and I've already used it with success. However, AFAIK Hibernate only supports it when using the native Hibernate API, while I want to use JPA.
EclipseLink, on the other hand, supports multi-tenancy strategies based on a single shared table or on one table per tenant. However, it also supports partitioning, and with a simple custom partitioning policy I can easily configure one partition for each client.
The first question might be whether using partitioning is appropriate for this use case or not.
The main problem is that the client base can (hopefully) grow over time, so I have to make EclipseLink "know" about new clients dynamically (i.e. without restarting the webapp). As I understand it, in order to set up partitioning in EclipseLink, I need to set up my persistence unit with different connection pools (or "nodes"): each node has its own configured data source and name. The partitioning policy, in turn, identifies the node to use by its name. So far so good, but I plan on setting up my persistence unit using Spring's LocalContainerEntityManagerFactoryBean. I can discover clients dynamically at startup, when LocalContainerEntityManagerFactoryBean is processed, so I can pass all the required properties for all nodes / clients by then, but what happens if a new client is added after that? I don't think dynamically changing persistence properties will have any effect on an already built EntityManagerFactory singleton instance... and I'm afraid EclipseLink will complain if I ask for a partition for which no matching node was found at EntityManagerFactory creation time. Correct me if I am wrong.
I think that declaring LocalContainerEntityManagerFactoryBean as a "prototype" scoped bean would be a very bad idea, and I think it won't work. On the other hand, since client interaction is tied to a specific HTTP session, I could alternatively take a "middle" approach by declaring LocalContainerEntityManagerFactoryBean as "session" scoped, but I think that in this case I would have to deal with issues like increased memory consumption and no shared caching between the multiple EntityManagerFactories (one for each client using the application at a given time).
If I cannot get this strategy to work, I think I will have to give up partitioning altogether and fall back to "dynamic data source routing", but in that case I am concerned about EclipseLink's shared cache (I think I would have to disable it completely, and that would be a real disadvantage).
Thanks in advance for any feedback on this.
To be honest, I haven't tried Chris's suggestion and opted for a more subtle solution instead. Here is what I did.
- in my case, tenant = client; each client's data lives in its own database schema, potentially located in a separate DBMS instance (of any vendor); in other words, I have one data source for each client
- since I am using partitioning, this means each client has its own partition; each partition is identified by the corresponding unique client ID
- each user entering the application belongs to a specific client; I am using Spring Security for authentication and authorization, so I can get information about a user (including the client they belong to) by asking SecurityContextHolder
- I have defined my own EclipseLink PartitioningPolicy that determines the client of the current user as described in the previous point and then returns a list containing only the Accessor that identifies that client's partition
- all my tables need to be partitioned and I don't want to specify that on EVERY entity with annotations, so I register this partitioning policy in EclipseLink at startup and set it as the default; briefly:
JpaEntityManagerFactory jpaEmf = entityManagerFactory.unwrap(JpaEntityManagerFactory.class);
ServerSession serverSession = jpaEmf.getServerSession();
serverSession.getProject().addPartitioningPolicy(myCustomerPolicy);
serverSession.setPartitioningPolicy(myCustomerPolicy);
Then, to dynamically add data sources to EclipseLink (called connection pools in EclipseLink terminology) so that the client ID specified by the above policy is mapped to a known "connection pool" in EclipseLink, I do the following:
- a listener intercepts any successful user login
- this listener asks EclipseLink whether it already knows a connection pool identified by the user's client ID; if so, we're done, EclipseLink can handle the partition correctly; otherwise, a new connection pool is created and added to EclipseLink; proof of concept:
String customerId = principal.getCustomerId();
JpaEntityManagerFactory jpaEmf = entityManagerFactory.unwrap(JpaEntityManagerFactory.class);
ServerSession serverSession = jpaEmf.getServerSession();
if (!serverSession.getConnectionPools().containsKey(customerId)) {
    DataSource customerDataSource = createDataSourceForCustomer(customerId);
    DatabaseLogin login = new DatabaseLogin();
    login.useDataSource(customerId);
    login.setConnector(new JNDIConnector(customerDataSource));
    Class<? extends DatabasePlatform> databasePlatformClass = determineDbVendorPlatform(customerId);
    login.usePlatform(databasePlatformClass.newInstance());
    ConnectionPool connectionPool = new ExternalConnectionPool(customerId, login, serverSession);
    connectionPool.startUp();
    serverSession.addConnectionPool(connectionPool);
}
The login is of course performed against the central database (or any other authentication source), so the code above runs before any JPA query for a specific client is made (and hence the client's connection pool is added to EclipseLink before the partitioning policy ever refers to it).
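One thing to watch in the proof of concept is the check-then-add sequence: if two users of the same client log in at the same moment, both could pass the containsKey check before either pool is added. A minimal sketch of one way to guard against that is shown below. TenantPool and TenantPoolRegistry are hypothetical stand-ins written for this illustration, not EclipseLink types; in the real listener the factory function would wrap the pool creation code above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Stand-in for a per-client connection pool; NOT an EclipseLink type. */
final class TenantPool {
    final String customerId;

    TenantPool(String customerId) {
        this.customerId = customerId;
    }
}

/** Hypothetical registry that creates each client's pool at most once. */
final class TenantPoolRegistry {
    private final Map<String, TenantPool> pools = new ConcurrentHashMap<>();
    private final Function<String, TenantPool> poolFactory;

    TenantPoolRegistry(Function<String, TenantPool> poolFactory) {
        this.poolFactory = poolFactory;
    }

    /**
     * Meant to be called from the login listener. computeIfAbsent runs
     * the factory atomically per key, so concurrent logins for the same
     * client cannot create two pools.
     */
    TenantPool ensurePool(String customerId) {
        return pools.computeIfAbsent(customerId, poolFactory);
    }

    int size() {
        return pools.size();
    }
}
```

With this shape, the listener only calls ensurePool(principal.getCustomerId()) and no longer needs its own existence check.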
However, there is one important aspect to consider. In EclipseLink, data partitioning means that an identifiable piece of data (= an entity instance) either lives in a single partition or is replicated identically across multiple partitions. The identity of an entity instance is defined by its id (= primary key). This means that there must not be two different instances of an entity of type E with the same id = x for two different tenants T1 and T2, otherwise EclipseLink may assume they are the same entity instance. This can result in mixed data from different clients being read / written during the same JPA session => disaster. Possible solutions:
- in this scenario, the partition used is determined by the current user; this means that it will be the same for every query made within an HTTP session; since I use transaction-scoped entity managers whose lifespan is at most that of a request (which itself falls entirely within one HTTP session), simply disabling the EclipseLink shared cache avoids mixing data from different clients; however, this is still undesirable
- the best option I could find is to make sure that all ids (= primary keys) are generated and that the generation is handled by EclipseLink centrally, across all clients, so that id = x for entity E is guaranteed to be assigned to exactly one instance belonging to exactly one client; this effectively means decoupling id allocation from the client databases and rules out MySQL auto-increment columns (i.e. the IDENTITY generation type); so I decided to use the TABLE generation type for entity ids and put this table in the central database where user and client data is stored
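To illustrate why central, block-wise id allocation prevents collisions, here is a toy model in plain Java. It is only a sketch of the idea behind table generation with an allocation size, not how EclipseLink implements it internally; CentralSequence and BlockIdAllocator are hypothetical names introduced for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Stand-in for one row of the central sequence table. */
final class CentralSequence {
    private final AtomicLong next = new AtomicLong(1);

    /** Atomically reserves `size` consecutive ids, returns the first. */
    long reserveBlock(int size) {
        return next.getAndAdd(size);
    }
}

/** Hands out ids from a block reserved in the central sequence. */
final class BlockIdAllocator {
    private final CentralSequence central;
    private final int allocationSize;
    private long current = 0; // next id to hand out
    private long limit = 0;   // exclusive end of the reserved block

    BlockIdAllocator(CentralSequence central, int allocationSize) {
        this.central = central;
        this.allocationSize = allocationSize;
    }

    synchronized long nextId() {
        if (current == limit) {
            // Block exhausted (or never reserved): go back to the
            // central sequence, the only place where ids originate.
            current = central.reserveBlock(allocationSize);
            limit = current + allocationSize;
        }
        return current++;
    }
}
```

Because every block comes from the same central counter, any number of allocators sharing it never hand out the same id, regardless of which client's partition the row ends up in. A database-generated auto-increment column cannot give that guarantee, since each client database would count independently.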
The last little problem in properly implementing option 2 is that, even though the EclipseLink documentation states that the connection pool (= data source) to be used for table sequencing can be specified with the config property eclipselink.connection-pool.sequence, this seems to be ignored when a default partitioning policy is installed as described above. In fact, my client partitioning policy is called for EVERY query, even those used to allocate ids. For this reason, the policy must intercept these queries and route them to the central data source. I couldn't find a definitive solution to this problem, but the best options I could think of are the following:
- if the SQL string of the query begins with "UPDATE SEQUENCE", it is a query to allocate an id, provided that the table dedicated to sequence allocation is called SEQUENCE (this is the default)
- if you adopt the convention of adding a SEQUENCE suffix to your generator names, then a query whose name ends with "SEQUENCE" is a query to allocate an id
I chose option 2, defining my generator mappings accordingly:
@Entity
public class MyEntity {
    @Id
    @TableGenerator(name = "MyEntity_SEQUENCE", allocationSize = 10)
    @GeneratedValue(generator = "MyEntity_SEQUENCE")
    private Long id;
}
This makes EclipseLink use a table named SEQUENCE containing a row whose SEQ_NAME column value is MyEntity_SEQUENCE. The query used to update this sequence to allocate an id will be named MyEntity_SEQUENCE, and we're done. However, I have made my partitioning policy configurable so that I can switch from one sequence-query identification strategy to another at any time, in case something changes in the EclipseLink implementation that breaks this "heuristic".
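The routing decision with its two interchangeable heuristics can be sketched in plain Java as follows. This is an illustration only: QueryInfo, PoolRouter and the pool name "central" are hypothetical, and in the real policy the query name and SQL string would be read from the query object that EclipseLink passes to the partitioning policy.

```java
import java.util.Locale;
import java.util.function.Predicate;

/** Hypothetical holder for the two query attributes the heuristics use. */
final class QueryInfo {
    final String name;
    final String sql;

    QueryInfo(String name, String sql) {
        this.name = name;
        this.sql = sql;
    }
}

/** Chooses the connection pool name for a query. */
final class PoolRouter {
    /** Assumed name of the pool bound to the central database. */
    static final String CENTRAL_POOL = "central";

    /** Option 1: recognise the sequence update by its SQL text. */
    static final Predicate<QueryInfo> BY_SQL = q -> q.sql != null
            && q.sql.trim().toUpperCase(Locale.ROOT).startsWith("UPDATE SEQUENCE");

    /** Option 2: recognise it by the generator naming convention. */
    static final Predicate<QueryInfo> BY_NAME = q -> q.name != null
            && q.name.endsWith("SEQUENCE");

    private final Predicate<QueryInfo> isSequenceQuery;

    PoolRouter(Predicate<QueryInfo> isSequenceQuery) {
        this.isSequenceQuery = isSequenceQuery;
    }

    /** Sequence queries go to the central pool, all others to the client's pool. */
    String poolFor(QueryInfo query, String currentCustomerId) {
        return isSequenceQuery.test(query) ? CENTRAL_POOL : currentCustomerId;
    }
}
```

Passing the heuristic in as a predicate is what keeps the strategy switchable: changing from the name-based to the SQL-based check is a one-argument change where the router is constructed.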
This is essentially the whole picture. It works well for now. Feedback, improvements, suggestions are welcome.
Check refreshMetadata in the EclipseLink EntityManagerFactory class described here: http://wiki.eclipse.org/EclipseLink/DesignDocs/340192#EntityManagerFactory which will force the singleton to reload its configuration data. This will not affect EntityManager instances that are already running, but any new EntityManager will pick up the new configuration data, which seems to match your use case.
The EntityManagerFactory needs to be unwrapped to access the interface http://javadox.com/org.eclipse.persistence/eclipselink/2.5.0/org/eclipse/persistence/jpa/JpaEntityManagerFactory.html :
JpaHelper.getEntityManagerFactory(em).refreshMetadata(properties);