Spring JPA: What is the cost of saveAndFlush vs save?

I have an application built from a set of microservices. One service receives the data, persists it through Spring Data JPA (EclipseLink as the provider), and then sends an alert over AMQP to a second service.

Depending on specific conditions, the second service then calls a RESTful web service to retrieve the stored information.

I noticed that the RESTful service sometimes returns a null dataset even though the data was previously saved. Looking at the code of the persisting service, save was used instead of saveAndFlush, so my guess is that the data is not flushed fast enough for the downstream service to query it.

  • Is there a cost to saveAndFlush that I should be wary of, or can I reasonably stick with the default save?
  • Would it make the data available to downstream applications more quickly?

I should add that the original save call is wrapped in @Transactional.



1 answer


Possible diagnosis of the problem

I think the problem here has nothing to do with save vs saveAndFlush. The problem seems to stem from the nature of Spring's @Transactional methods, and the misuse of these transactions in a distributed environment that involves both your database and your AMQP broker; and perhaps, added to that poisonous mixture, a few misunderstandings of how the JPA context works.

In your explanation you seem to imply that you start a JPA transaction within a @Transactional method, and during that transaction (but before it is committed) you send messages to the AMQP broker; later, on the other side of the queue, the consumer application receives the messages and makes a call against the REST service. At that point you notice that the transactional changes from the publisher side have not yet been committed to the database and are therefore not visible to the consumer side.

The problem seems to be that you are propagating those AMQP messages from within your JPA transaction, before it is committed to disk. By the time the consumer reads a message and processes it, the transaction on the publishing side may not have completed yet, so those changes are not visible to the consumer application.
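To make that concrete, here is a minimal sketch of the publisher-side pattern I am describing (the service, repository and routing-key names are purely illustrative assumptions, not taken from your application):

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class IngestService {

    private final DataRecordRepository repository;   // hypothetical Spring Data JPA repository
    private final RabbitTemplate rabbitTemplate;     // Spring AMQP template

    public IngestService(DataRecordRepository repository, RabbitTemplate rabbitTemplate) {
        this.repository = repository;
        this.rabbitTemplate = rabbitTemplate;
    }

    @Transactional
    public void ingest(DataRecord record) {
        repository.save(record);
        // Without a transacted channel, the message leaves for the broker here,
        // while the surrounding JPA transaction is still open. A consumer that
        // reacts immediately can hit the REST service before the commit happens.
        rabbitTemplate.convertAndSend("data.saved", record.getId());
    }   // <-- the database transaction commits only when this method returns
}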

If your AMQP implementation is RabbitMQ, then I have seen this problem before: it happens when you start a @Transactional method that uses the database transaction manager, and within that method you use a RabbitTemplate to send the corresponding message.

If your RabbitTemplate is not using a transacted channel (i.e. channelTransacted=true), your message is delivered before the database transaction has committed. I believe that by enabling transacted channels (disabled by default) in your RabbitTemplate, you solve part of the problem.

<rabbit:template id="rabbitTemplate" 
                 connection-factory="connectionFactory" 
                 channel-transacted="true"/>

      

When the channel is transacted, the RabbitTemplate "joins" the current database transaction (which is apparently a JPA transaction here). Once your JPA transaction commits, it runs some epilogue code that also commits the changes in your Rabbit channel, which forces the actual "sending" of the message.

About save vs saveAndFlush

You might think that flushing the changes in your JPA context should solve the problem, but you would be wrong. Flushing your JPA context just forces the changes in your entities (held only in memory at that point) to be written to the database, but they are still written within the corresponding database transaction, which will not be committed until the JPA transaction commits. That happens at the end of your @Transactional method (and, unfortunately, some time after you have already sent your AMQP messages, unless you use transacted channels as described above).

So even if you flush your JPA context, your consumer application will not see those changes (per the classical database isolation levels) until your @Transactional method completes in the publisher application.

When you invoke save(entity), the EntityManager does not need to synchronize the changes right away. Most JPA implementations simply mark the entity as dirty in memory and wait until the last possible moment to synchronize all changes with the database and commit them at the database level.

Note: there are cases where you want some of those changes to reach the database immediately, before the whimsical EntityManager decides to do so on its own. A classic example is when a database table has a trigger that you need to fire in order to create additional records that you will need later during the same transaction. In that case you force a flush to the database so that the trigger runs.

Flushing the context only forces the in-memory changes to be synchronized with the database; it does not mean those changes are committed. So the changes you flush are not necessarily visible to other transactions; under the traditional database isolation levels, they most likely will not be.
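A minimal sketch of that difference, assuming a hypothetical Spring Data JPA repository and entity (OrderRepository and Order are illustrative names, not from your code):

import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderService {

    private final OrderRepository orderRepository;   // hypothetical Spring Data JPA repository

    public OrderService(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    @Transactional
    public void storeDeferred(Order order) {
        // save(): the entity becomes managed and is marked dirty; the actual
        // INSERT may be deferred until the persistence context flushes,
        // typically right before this transaction commits.
        orderRepository.save(order);
    }

    @Transactional
    public void storeFlushed(Order order) {
        // saveAndFlush(): the INSERT is sent to the database immediately
        // (useful, for instance, to fire a trigger mid-transaction), but it
        // still runs inside this open, uncommitted transaction, so other
        // sessions do not see the row until the method returns and commits.
        orderRepository.saveAndFlush(order);
    }
}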



The 2PC problem

Another classic problem is that your database and your AMQP broker are two independent systems. If the broker is RabbitMQ, you do not have 2PC (two-phase commit).

So you can run into interesting scenarios, e.g. your database transaction commits successfully, but Rabbit then fails to deliver your message, in which case you would have to redo the whole transaction, possibly skipping the database side effects and just retrying the delivery of the message to Rabbit.

You should probably read this article on Distributed Transactions in Spring, With and Without XA; the section on chained transactions is especially helpful for addressing this problem.

There they suggest a more complex transaction manager definition. For example:

<bean id="jdbcTransactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="dataSource"/>
</bean>

<bean id="rabbitTransactionManager" class="org.springframework.amqp.rabbit.transaction.RabbitTransactionManager">
    <property name="connectionFactory" ref="connectionFactory"/>
</bean>

<bean id="chainedTransactionManager" class="org.springframework.data.transaction.ChainedTransactionManager">
    <constructor-arg name="transactionManagers">
        <array>
            <ref bean="rabbitTransactionManager"/>
            <ref bean="jdbcTransactionManager"/>
        </array>
    </constructor-arg>
</bean>
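For completeness, a Java-configuration sketch of the same wiring (assuming Spring Data's ChainedTransactionManager and Spring AMQP's RabbitTransactionManager; bean and parameter names are illustrative):

import javax.sql.DataSource;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.transaction.RabbitTransactionManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;

@Configuration
public class TransactionConfig {

    @Bean
    public DataSourceTransactionManager jdbcTransactionManager(DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }

    @Bean
    public RabbitTransactionManager rabbitTransactionManager(ConnectionFactory connectionFactory) {
        return new RabbitTransactionManager(connectionFactory);
    }

    @Bean
    public ChainedTransactionManager chainedTransactionManager(
            RabbitTransactionManager rabbitTransactionManager,
            DataSourceTransactionManager jdbcTransactionManager) {
        // Transactions start in the order given and commit in reverse order,
        // so the database commit happens before the Rabbit commit, mirroring
        // the XML definition above.
        return new ChainedTransactionManager(rabbitTransactionManager, jdbcTransactionManager);
    }
}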

      

And then, in your code, you simply use this chained transaction manager to coordinate both the database-transactional part and the Rabbit-transactional part.

Now, there is still the possibility that the database part of your transaction commits but the Rabbit part of it fails.

So, imagine something like this:

@Retry
@Transactional("chainedTransactionManager")
public void myServiceOperation() {
    if (workNotDone()) {
        // Idempotency guard: on a retry, skip database work that already
        // committed on a previous attempt.
        doDatabaseTransactionWork();
    }
    // If this fails, the whole chained transaction is retried, but the guard
    // above prevents the database side effects from being repeated.
    sendMessagesToRabbit();
}

      

This way, if the Rabbit part of your transaction fails for whatever reason and you are forced to retry the whole chained transaction, you do not repeat the database side effects; you simply resend the failed message to Rabbit.

At the same time, if the database part fails, you never sent the message to Rabbit in the first place, and there is no problem.

Alternatively, if your database side effects are idempotent, you can skip the check, simply reapply the database changes, and just retry sending the message to Rabbit.

The truth is that, at first, what you are trying to do seems deceptively easy, but once you dig into the different problems and understand them, you realize it is a tricky business to do right.
