PostgreSQL Table Partitioning + Hibernate
If you want to partition a PostgreSQL database used by a web application based on Hibernate following the steps described at http://www.postgresql.org/docs/9.2/static, you may get an exception when inserting a new row into a partitioned table:
ERROR [org.Hibernate.event.def.AbstractFlushingEventListener] - org.Hibernate.StaleStateException: Batch update returned unexpected row count from update ; actual row count: 0; expected: 1
A typical solution would be to tell Hibernate not to do the result check after the insert operation. For example, you could use:
@SQLInsert(sql = "INSERT INTO my_table(col_1, col_2, ...) VALUES (?,?, ...)", check=ResultCheckStyle.NONE)
But it’s fairly common to introduce table partitioning in a late phase of the project, so you may already have a big project using Hibernate over the whole application. This can make it difficult to find all the places you need to fix.
Read on to find out how to use PostgreSQL + Partitioning + Hibernate without touching the application based on Hibernate.
Hibernate’s update count=0 problem can be solved on two sides: Application and PostgreSQL. As we don’t want to change the application, we have to do it on the PostgreSQL side.
First we need to understand the basic problem:
The default trigger-based solution uses table inheritence and lets you create a trigger on the master table to delegate inserts from the master to partition tables. So what you have is:
measurement + TRIGGER measurement_y2008_m01 measurement_y2008_m02 measurement_y2008_m03 ...
In a very simplified form, the trigger delegating the inserts looks like:
CREATE OR REPLACE FUNCTION measurement_insert_trigger() RETURNS TRIGGER AS $$ BEGIN INSERT INTO measurement_y2008m01 VALUES (NEW.*); RETURN NULL; END; $$ LANGUAGE plpgsql;
From the point of view of PostgreSQL, everything is correct. The trigger inserts to the partition table instead of the master table. But for Hibernate it’s wrong, because the INSERT INTO measurement statement does not return the correct update count = 1, but 0 instead. This could be avoided by using RETURN NEW instead of RETURN NULL. But then the row returned by the trigger would also be inserted into the master table. Also, this could be solved by adding another ON DELETE trigger on the master table. But then, this would perform 2 inserts and 1 delete for every single insert, which is a mess. This solution would impede the write performance by a factor of 3.
What we really want to do is:
INSERT INTO measurement
instead of this INSERT INTO measurement_partitioned_table
RETURN update count = 1
DO ONLY ONE SINGLE INSERT
Fortunately, we can do this by using a RULE instead of a TRIGGER. Please note that RULEs are slower then TRIGGERs for one single INSERT, but it is negligible compared to the 2 inserts + 1 delete approach. See also http://www.postgresql.org/docs/9.2/static/rules-triggers.html for more information.
Back to the RULE:
CREATE RULE measurement_insert_rule AS ON INSERT DO INSTEAD INSERT INTO measurement_current VALUES (NEW.*)
This is it! It inserts into measurement_current instead of measurement table. It does only one insert. And it returns the correct update count — this is the main issue to make hibernate happy.
Now we can build the complete partitioning solution around this. As you may have noticed, a current table is used instead of month-specific variants like y2008_m1. We also cannot make it conditional like we could in the trigger.
What we have now is:
measurement + RULE measurement_current measurement_y2008_m02 measurement_y2008_m01 ...
Because all data goes to the current table, if we’re partitioning by month and the current date is 2008-04-01, we need an additional cron job (linux cron job is not a must), which would rename current to y2008_m3, create a new empty current table and move all data from April to current.
We need to move the data, because even if you do the rename at 2008-04-01 00:00:00, you may still can have data from april in the current table. After renaming and moving the data, we are safe that no data from March can be inserted to current, and no April data is left in the y2008_m3 table, so we can set the CHECKs the same way like the TRIGGER solution.
Lets look at this in a more programmatic way:
ALTER TABLE measurement_current RENAME TO measurement_y2008_m3 -- measurement_y2008_m3 can contain data from April as well
CREATE TABLE measurement_current (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS) INHERIT measurement; -- current is empty and ready for new data, all inserts go to this table
INSERT INTO measurement_current SELECT * FROM measurement_y2008_m3 WHERE logdate >= '2008-04-01'; -- moves all data from march table to current. -- Please note depending on execution time, this could become a long running operation
ALTER TABLE measurement_y2008m03 ADD CONSTRAINT y2008m03 CHECK ( logdate >= DATE '2008-03-01' AND logdate < DATE '2008-04-01' );
ALTER TABLE measurement_current ADD CONSTRAINT current CHECK ( logdate >= DATE '2008-04-01');
-- don't forget to set the CHECKs properly, this enables query planner to find the right table on SELECTs
The only difference between this solution and the TRIGGER approach is how inserts are delegated and how partition tables are created. You still have all the maintainability advantages, your queries get faster the same way and you don’t need to change your application at all.
Please note again that you are only able to insert current data and not data in the past — or in other words: all inserts always go into a single table. If you want to be able to insert any data, you will need to extend the moving data INSERT statement when creating a new partition table.
Just one more general note: only INSERTs are a problem. UPDATEs, DELETEs and SELECTs already automatically work by executing the query plan, which is influenced by the CHECK constraints to choose the right table.
Adjust it to your needs. Have fun.