db2 locking and MVCC

I had an interesting discussion about locking in db2 a while back. It was interesting because it challenged some long held assumptions I had about db2 and how it handles locking. As usual, when I started digging deeper it turns out it is much more complicated that it would seem on the surface.

First off, some background: I was having a conversation with a colleague about locking in various DBMS's and I made the statement that DB2 doesn't support MVCC. Thus, I contended, it is not possible for someone to read the previous version of a row that I have updated while I'm in a transaction that has altered it. At this point the fellow I was talking to looked at me as if I had just grown an arm out of my forehead. He stated (correctly, it turns out) that DB2 has supported this almost forever.

I was, however, VERY confident that I was correct and subsequently dug up the documentation. Oddly enough, the documentation seemed to support the notion that I was mistaken (gasp!). Well, at this point I HAD to get to the bottom of this.

So, I fired up an amazon ec2 instance with ubuntu and db2 9 udb and started my test.

First, I created a table
create tablespace test1
create table test (id integer)
keep forgetting that the more confident I am that I'm 100% correct,


Then fired up squirrel sql with two different connections turning off autocommit.
First I seeded some data:
insert into test values (1)
insert into test values (2)
commit
On connection 1 I entered
update test set id = 12 where id = 2
and on connection two I entered
select * from test where id = 1

When I issued the second select, in my head, I should have blocked for the first insert to finish,but it came back immediately. So now I had to sit back and wonder: "Am I imagining this behavior?" When I stop to think about it, my position seems suspect not matter how you slice it.

So I reread the definitions of db2's lock levels http://www.ibm.com/developerworks/data/library/techarticle/dm-0509schuetz/ as well as some ways things can go haywire http://www.ibm.com/developerworks/data/library/techarticle/dm-0509schuetz/ and got thinking back to the situation that caused me to think this.

The key thing that I was missing is that this ONLY applies to readers, if I change the second statement to an update, it WILL block (by default). So for example, if we run THIS sequence:
On connection 1 I entered
update test set id = 12 where id = 2
and on connection two I entered
update test set id = 100 where id = 2

The second update will wait for the first one to finish. By comparison, in an MVCC database, the second one can continue to operate with the version of the row as it was when its transaction began. This is they key thing that kept confusing me. DB2 treats readers and writers as unequal partners in the database instead of putting them on equal ground. While readers typically don't block, it's the writers that will cause problems.

In honesty, it sounds like there IS a setting in newer versions of db2 to enable MVCC-like behavior, but is NOT the default. In addition, there is certainly overhead to maintaining versions of data just to keep writers running concurrently. Certainly for read intensive operations, it might not be worth the overhead.

For a nice article, take a peek here: http://www.rtcmagazine.com/articles/view/101612

Mike.

Comments

Popular posts from this blog

Push versus pull deployment models

the myth of asynchronous JDBC

Installing virtualbox guest additions on Centos 7 minimal image