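During a performance stress test the system failed to meet its targets. I pulled a one-hour AWR report from the site and found a large number of wait events. The database is RAC, version 11.2.0.4.0.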

            Snap Id  Snap Time           Sessions  Cursors/Session  Instances
Begin Snap:    1607  21-Oct-14 20:00:03       560             67.9          2
End Snap:      1608  21-Oct-14 21:00:11       573             12.4          2
Elapsed:             60.13 (mins)
DB Time:             2,090.75 (mins)

Event                  Waits  Total Wait Time (sec)  Wait Avg(ms)  % DB time  Wait Class
rdbms ipc reply   32,876,281                  44.9K             1       35.8  Other
DB CPU                                        21.3K                     17.0
direct path read     435,808                  18.8K            43       15.0  User I/O
DFS lock handle    4,204,866                 7977.9             2        6.4  Other
log file sync          8,541                  252.7            30         .2  Commit

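1. The top wait event is rdbms ipc reply, which is explained as: "The rdbms ipc reply metric event is used to wait for a reply from one of the background processes." In other words, sessions are waiting for a background process such as LGWR or DBWR to answer a request. The event in which LGWR, DBWR and the other background processes sit idle waiting for foreground processes to give them work is a different one, rdbms ipc message. The DFS lock handle wait event looks suspicious. The official explanation is: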

The session waits for the lock handle of a global lock request. The lock handle identifies a global lock. With this lock handle, other operations can be performed on this global lock (to identify the global lock in future operations such as conversions or release). The global lock is maintained by the DLM.

Roughly, this is the wait recorded while a session cannot yet obtain the handle of a global lock.
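Because rdbms ipc reply accounts for the largest share of DB time, a reasonable follow-up is to check which background process the waiting sessions are blocked on. The sketch below assumes two things. First, P1 of rdbms ipc reply (listed as from_process in v$event_name) holds the Oracle process number of the background process. Second, that number matches GV$PROCESS.PID on the same instance. Confirm the parameter names on your own system with select parameter1, parameter2 from v$event_name where name = 'rdbms ipc reply' before relying on the join.

```sql
-- Sketch: which background process are 'rdbms ipc reply' waiters waiting for?
-- Assumption: P1 (from_process) is the Oracle PID of the background process,
-- i.e. it matches GV$PROCESS.PID on the same instance.
select s.inst_id,
       p.program  as background_process,
       count(*)   as waiting_sessions
from   gv$session s
join   gv$process p
       on  p.inst_id = s.inst_id
       and p.pid     = s.p1
where  s.event = 'rdbms ipc reply'
and    s.state = 'WAITING'          -- only sessions currently in this wait
group  by s.inst_id, p.program
order  by waiting_sessions desc;
```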

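2. I looked at how others have dealt with this online. The usual causes cited are a sequence CACHE that is too small and excessive CPU usage on the database servers. I made the corresponding adjustments and monitored both, but neither solved the problem. While the performance test was running, I ran the following query: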

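select chr(bitand(p1, -16777216) / 16777216) ||   -- highest byte of P1: 1st letter of the lock type
       chr(bitand(p1, 16711680) / 65536) "Lock",  -- next byte of P1: 2nd letter of the lock type
       to_char(bitand(p1, 65535)) "Mode",         -- low 16 bits of P1: requested lock mode
       p2, p3, seconds_in_wait
from v$session_wait
where event = 'DFS lock handle';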
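The query above reads v$session_wait, which only covers the instance you are connected to. On a two-node RAC it can help to apply the same P1 decoding to gv$session, so that waiters from both instances appear side by side. The following is a minimal sketch, and the column aliases are illustrative names of my own.

```sql
-- P1 of 'DFS lock handle' packs the lock type and the requested mode:
--   highest byte and next byte = the two letters of the lock type (e.g. 'BB')
--   low 16 bits                = requested lock mode
select inst_id,
       chr(bitand(p1, -16777216) / 16777216) ||
       chr(bitand(p1, 16711680) / 65536)        as lock_type,
       bitand(p1, 65535)                        as lock_mode,
       count(*)                                 as waiting_sessions
from   gv$session
where  event = 'DFS lock handle'
and    state = 'WAITING'
group  by inst_id,
          chr(bitand(p1, -16777216) / 16777216) ||
          chr(bitand(p1, 16711680) / 65536),
          bitand(p1, 65535)
order  by inst_id, waiting_sessions desc;
```

If the same lock type shows waiters under both INST_ID values, the contention is crossing the interconnect rather than staying local to one node.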

Among the waiters was the BB lock, described as "2PC distributed transaction branch across RAC instances". The related DX lock "serializes tightly coupled distributed transaction branches".

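In other words, branches of a distributed (two-phase commit) transaction were running on both RAC instances. I immediately changed the WebLogic connections to point at only one RAC node and reran the test. The results were: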

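            Snap Id  Snap Time           Sessions  Cursors/Session  Instances
Begin Snap:    1680  24-Oct-14 12:00:13       864              9.5          2
End Snap:      1681  24-Oct-14 13:00:17       863              9.9          2
Elapsed:             60.07 (mins)
DB Time:             80.28 (mins)

Event                        Waits  Total Wait Time (sec)  Wait Avg(ms)  % DB time  Wait Class
DB CPU                                             2335.6                     48.5
rdbms ipc reply          5,326,201                  645.6             0       13.4  Other
gc buffer busy acquire      39,052                  226.7             6        4.7  Cluster
DFS lock handle            672,757                  225.8             0        4.7  Other

DFS lock handle waits dropped sharply but did not disappear. The first run had 4,204,866 waits and 7977.9 s of wait time; this run had 672,757 waits and 225.8 s. The performance test results, however, were much better: DB Time over the one-hour window fell from 2,090.75 to 80.28 minutes.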
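Some DFS lock handle waits remain even with WebLogic pointed at a single node. One way to check whether any XA transaction still has branches on both instances is to group gv$global_transaction by global transaction ID. This is a sketch that assumes the connected user can read the GV$ views. Any row it returns is a global transaction that still spans more than one instance.

```sql
-- Global (XA) transactions whose branches live on more than one RAC instance.
-- An empty result means every active global transaction is local to one node.
select formatid,
       globalid,
       count(distinct inst_id) as instance_count,
       count(*)                as branch_count
from   gv$global_transaction
group  by formatid, globalid
having count(distinct inst_id) > 1;
```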

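3. How can this be solved completely? First, some background on DFS lock handle. Put simply, it shows up when DML is run against the same object from different instances. The MetaLink notes about DFS lock handle all describe bugs, and it is not yet clear whether upgrading the database would improve things.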
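To narrow down which statements and objects are involved, Active Session History can be grouped by instance, SQL_ID and CURRENT_OBJ# for the DFS lock handle event. This is only a sketch, with three caveats:

- Querying gv$active_session_history requires the Diagnostics Pack license.
- The one-hour window is an arbitrary choice.
- CURRENT_OBJ# may be -1 or unrelated for this kind of wait, so treat the object columns as a hint rather than proof.

```sql
-- Top SQL / objects behind 'DFS lock handle' in the last hour, per instance.
-- Requires the Diagnostics Pack (ASH). The one-hour window is arbitrary.
select h.inst_id,
       h.sql_id,
       h.current_obj#,
       o.owner,
       o.object_name,
       count(*) as ash_samples
from   gv$active_session_history h
left   join dba_objects o
       on o.object_id = h.current_obj#
where  h.event = 'DFS lock handle'
and    h.sample_time > systimestamp - interval '1' hour
group  by h.inst_id, h.sql_id, h.current_obj#, o.owner, o.object_name
order  by ash_samples desc;
```

The same SQL_ID or object might appear with a high sample count under both INST_ID values. That would fit the explanation above: the same object is being modified by DML from both instances.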