A note to the orthodox DBA

DBA communities embraced Statspack and AWR very warmly, but, as usually happens, many other features remain ignored. To name a few:

1. SQL Monitoring
2. AWR baselines and the AWR comparison report
3. Real Application Testing
4. Incident management
5. ASH, still ignored by a lot of people
6. Queries against the AWR history (DBA_HIST_*) tables
7. SQL Tuning Advisor on a SQL Tuning Set
8. Stored Outlines (released long ago)
9. SQL Plan Baselines
10. Gathering plans with the GATHER_PLAN_STATISTICS hint
11. Execution plans stored in AWR (certainly not just V$SQL_PLAN)
12. ORION for I/O calibration before database setup, or the I/O calibration package (DBMS_RESOURCE_MANAGER.CALIBRATE_IO) in 11g after database creation
13. UTL_MAIL to send email from the Oracle Database


Similarly, there are many features we accept without considering their impact. To name a few:
 
-Automatic Shared Memory Management
-Automatic Memory Management

Configure Huge Pages in Linux for efficient memory utilization when tuning Oracle Database performance

The traditional virtual memory page size is only 4 KB, and since Linux does not have an intimate shared memory model (unlike Solaris), the page table in Linux can grow so large that it becomes a major slowdown factor as the number of dedicated connections (or shared servers, which is the rarer case) grows and the SGA is large; the system bogs down, burning system CPU just to scan the huge page table. Also, as we know, the processor has only a very small TLB (the portion of the page table that resides on the CPU; the rest is linked from memory), so it is important to shrink the page table if it is slowing the system down. Another good reason to tune the page table is the heavy overhead incurred on context switches, when the CPU's cached portion of the page table is flushed. System CPU usage on a Linux box can be tracked via the sys% column of vmstat, and the page size checked with grep -i pagesize /proc/meminfo. If sys% is significantly high, check the size of the page table during your typical or peak workload hours on the Linux DB server.

Linux has the option of configuring a large page size of 2 MB, so if, say, the page table for an 8 GB SGA with 100 connections is 1 GB, it shrinks to less than 100 MB with huge pages. RHEL 6 also configures transparent huge pages, but Oracle does not get along well with that feature and recommends disabling transparent huge pages. Oracle's own flavor, OEL, ships with it disabled in the UEK kernel.
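The savings above can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes 8-byte page table entries and that every dedicated server process maps the whole SGA with no page-table sharing; the figures are illustrative, not a measurement:

```shell
# Rough page-table footprint for an 8 GB SGA mapped by 100 processes,
# assuming 8-byte PTEs and no page-table sharing (illustrative only)
SGA_BYTES=$((8 * 1024 * 1024 * 1024))
CONNS=100
for PAGE in 4096 $((2 * 1024 * 1024)); do          # 4 KB vs 2 MB pages
  ENTRIES=$((SGA_BYTES / PAGE))                    # PTEs needed per process
  TOTAL_MB=$((ENTRIES * 8 * CONNS / 1024 / 1024))  # across all processes
  echo "page size ${PAGE} B -> ~${TOTAL_MB} MB of page tables"
done
```

With 4 KB pages this comes to roughly 1.6 GB of page tables; with 2 MB pages, only a few MB, which matches the order of magnitude quoted above.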

Do you need Huge Pages?

Check system CPU usage with vmstat or sar; it should not be high. Then check the page table size, which should also not be excessive:

 grep -i PageTables /proc/meminfo

If both system CPU usage and the page table size are significantly high, or the page table alone is too large, go for huge pages. You can better assess the pattern of system CPU usage from the sar history and compare it with the number of concurrent sessions in the same window from an AWR report.
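As a sketch, the PageTables figure can be pulled out of /proc/meminfo and compared against a threshold of your choosing. The sample line and the 512 MB threshold below are illustrative assumptions, not fixed rules:

```shell
# On a live system you would use: grep PageTables /proc/meminfo
# A sample line stands in for real output here (illustrative value):
line="PageTables:  1048576 kB"
kb=$(echo "$line" | awk '{print $2}')
# Flag anything over an arbitrary 512 MB threshold
if [ "$kb" -gt $((512 * 1024)) ]; then
  echo "page tables at ${kb} kB - consider huge pages"
fi
```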

Advantages:
  • Configuring huge pages reduces page table scan time and saves CPU resources.
  • The memory freed by the smaller page table can be given to the Oracle SGA and PGA, so memory is used more effectively.
  • The SGA is allocated from the huge pages pool, so it is locked in memory and never swapped (you do not need to set LOCK_SGA=TRUE). Make sure the whole SGA fits in the huge pages pool: if part of the SGA is allocated from the huge pages pool and part from normal memory, the system slows down and may even restart (I observed this on one of my machines, RHEL 4.4 with too few huge pages, running Oracle 10.2.0.5), or Oracle may not be able to start the instance at all.
  • If your SGA is small and you do not have a large number of dedicated connections, the benefit of configuring huge pages may be insignificant.
NOTE OF CAUTION : 

1) From Oracle 11g onward, the alert log begins with information about whether the SGA was allocated from huge pages on the Linux system: how much of the huge pages pool is used by the SGA, and how much of the SGA was allocated from the normal (non-huge) pool. Oracle 11g also introduces the parameter USE_LARGE_PAGES; set it to 'ONLY' so that the instance does not start if the system lacks the required number of huge pages. That prevents the instance memory from being split in two, one portion in the huge pages pool and another in the normal pool, which slows down the database. Do not be tempted to set PRE_PAGE_SGA=TRUE; this parameter is not good for an OLTP database, as it can slow down connection setup time. I tested it with a large number of concurrent connections and found connection setup was 25% slower. It can be worse with fewer connections, due to the cache effect of my heavily concurrent connection setup.

2) The huge pages pool cannot be used with AMM (Automatic Memory Management, introduced in 11g), but it can be used with ASMM (Automatic Shared Memory Management, a 10g feature). I personally do not like either of the two; I prefer to size the SGA and PGA according to typical need, to avoid bugs (10gR2 had plenty of shared pool memory leak bugs) and the overhead of shrinking and growing SGA components at run time.

Configuring Huge Pages: it is simple. The changes are needed in OS settings only, so it is transparent to the database. Optionally, you may also want to set the USE_LARGE_PAGES parameter in 11g.

  1. To configure huge pages, check the size of the SGA shared memory segment shown by ipcs -m, divide it by the huge page size (2 MB by default), add a few pages for safety, and set the result as vm.nr_hugepages=X in /etc/sysctl.conf. Say your SGA is 8 GB and ipcs shows it as 8194 MB: divide by 2 to get 4097, then add 5 (to be safe), and this 4102 is the number of huge pages required, the X to set in the parameter above. The parameter can be applied dynamically with sysctl -p, but it is better to restart the system. Another way to set huge pages temporarily is echo 4102 > /proc/sys/vm/nr_hugepages, but the kernel may not be able to honor it, since huge pages need contiguous free memory and your system may not have that much contiguous free space. The recommended approach is to set it in sysctl.conf followed by a reboot, as even sysctl -p can hit the same issue.
  2. Now set oracle soft memlock Y and oracle hard memlock Y in /etc/security/limits.conf, where Y is the size of the huge pages memory pool in KB. So for the 8192 MB SGA whose huge page count came to 4102 in step 1, it would be 2*1024*4102 = 8400896. Add the parameters as below and check their effectiveness with ulimit -l after logging off and back in.
#### snippet of file /etc/security/limits.conf #######

oracle soft memlock  8400896
oracle hard memlock  8400896
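The sizing arithmetic in steps 1 and 2 can be sketched as a small script. The 8194 MB segment size is the example figure from above; the +5 safety margin is the same heuristic:

```shell
SGA_MB=8194        # SGA segment size as reported by ipcs -m (example value)
HUGEPAGE_MB=2      # default huge page size on x86 Linux
SAFETY=5           # a few spare pages, to be safe

# Round the page count up, then add the safety margin
NR_HUGEPAGES=$(( (SGA_MB + HUGEPAGE_MB - 1) / HUGEPAGE_MB + SAFETY ))
echo "vm.nr_hugepages = ${NR_HUGEPAGES}"

# memlock (in KB) must cover the whole huge pages pool
MEMLOCK_KB=$(( NR_HUGEPAGES * HUGEPAGE_MB * 1024 ))
echo "oracle soft memlock ${MEMLOCK_KB}"
echo "oracle hard memlock ${MEMLOCK_KB}"
```

For the example values this prints 4102 pages and a memlock of 8400896 KB, matching the limits.conf snippet above.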

Check that huge pages are being used by Oracle: the total number of huge pages should no longer be 0, and the number of free huge pages should not be high. Check with the command: grep -i Huge /proc/meminfo
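Interpreting the HugePages_* counters takes a little care: HugePages_Free can look high right after instance startup because pages are reserved but not yet faulted in, so HugePages_Rsvd should be counted too. A small parsing sketch over sample values (the numbers are illustrative):

```shell
# Sample /proc/meminfo fragment (illustrative values; on a live box use
# grep -i Huge /proc/meminfo)
meminfo="HugePages_Total:    4102
HugePages_Free:      210
HugePages_Rsvd:      180"

total=$(echo "$meminfo" | awk '/HugePages_Total/ {print $2}')
free=$(echo "$meminfo"  | awk '/HugePages_Free/  {print $2}')
rsvd=$(echo "$meminfo"  | awk '/HugePages_Rsvd/  {print $2}')

# Pages committed to the SGA = in use + reserved but not yet touched
echo "committed: $(( total - free + rsvd )) of ${total}"
```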

--------------You are done ------------

You will also notice the reduced page table size in the output of the command above. If your system allows, benchmark before and after configuring huge pages.


Another note of caution:
If your system is very low on memory, configuring huge pages permanently grabs a defined portion of memory, so free memory may run low. Your PGA or other processes may then be driven into swap (the SGA itself will never be swapped). So make sure there is enough free memory left in the system after the huge pages have been allocated.


I first used huge pages in 2009 on a 10.2.0.4 database. It was a data warehouse with a 30 GB SGA and only about 25 concurrent sessions, but the memory footprint of the page table was much smaller than on other DB servers with a similar configuration, and I was happy, though I had no benchmark, as this 24x7 production system was built from scratch and there was no outage window.

If you are in a virtualized environment and want more free memory, you can configure huge pages even for a small SGA and few connections. I tested it with a 1.4 GB SGA and 32 connections in each of 3 RAC instances, and it saved me approximately 70 MB on each node.

Are you wondering whether Oracle Exadata comes with huge pages preconfigured? No, it does not, because different clients have different SGA sizing needs depending on the nature of the database (OLTP or data warehouse).

When do you not want huge pages? If the SGA is small, there are very few dedicated connections, or many instances share the same host.

Best Linux I/O Scheduler (elevator) for Oracle Database

There is no single answer to this question, but it is easy to say that it depends on your workload. Beginning with RHEL 4, the CFQ (Completely Fair Queuing) scheduler is the default for all RHEL flavors, replacing the anticipatory scheduler, and it is a good change indeed, as the anticipatory scheduler was good only for slower disks and did not interact well with swapping.

  • But Oracle Enterprise Linux has chosen the deadline scheduler as the default for its kernel. The deadline scheduler tends to favor reads over writes, unlike CFQ, which treats reads and writes with equal priority. It is considered good for heavy read-intensive workloads such as data warehouse databases, but it can work well for OLTP databases too. I would recommend tweaking its write expiry attribute and setting read/write priority equal.

  • If you are using a virtual machine, the NOOP (no operation) scheduler is best for you, as reordering and merging of disk I/O by disk sector is, in most cases, handled by your virtualization host. For the same reason, NOOP is best for SAN storage. Another advantage is that NOOP uses a basic FIFO elevator algorithm and consumes the least CPU.

  • If you are using solid state disks, NOOP is also the only sensible option if you are on versions prior to RHEL 6. [I did not get the opportunity to test this.]

I tested my 3-node RAC database test workload on virtual machines with three VDI disks on three different physical disks (equal read and write), and I found CFQ and NOOP doing better than deadline, and NOOP better than CFQ. But the performance of CFQ was more consistent than NOOP. [Of course, all three RAC nodes used the same scheduler.]

It is very easy to change the scheduler: just add elevator=cfq, elevator=deadline, elevator=noop, or elevator=as (anticipatory) to the kernel line you want to boot with in /etc/grub.conf and reboot the system.

#### snippet of grub.conf to  add elevator=deadline to use deadline I/O scheduler for all disks ###

title Red Hat Enterprise Linux Server (2.6.18-8.el5)
        root (hd0,0) 
        kernel /vmlinuz-2.6.18-8.el5 ro root=/dev/sda2 elevator=deadline
        initrd /initrd-2.6.18-8.el5.img
   Another option, from RHEL 5 onward, is that you can change the scheduler on the fly, and for specific disks. Cool. So depending on different needs or disk types, you can choose a different scheduler for different disks. For example, if your Linux server runs both Apache and databases, placed on different disks, you might use CFQ for Apache, deadline for the Oracle database, and NOOP for SSD disks.

For example, if you want noop to be used for disk sdc, you can do this on the fly as below:

 echo noop > /sys/block/sdc/queue/scheduler 

Similarly, you can read the same file to see which elevator is being used for a specific disk device.
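When you read that file, the kernel lists all available schedulers and marks the active one in brackets. A sketch of extracting the active scheduler; the sample string below stands in for a real read of /sys/block/sdc/queue/scheduler:

```shell
# On a live system: line=$(cat /sys/block/sdc/queue/scheduler)
# Sample value used here for illustration:
line="noop anticipatory [deadline] cfq"
active=$(echo "$line" | grep -o '\[[^]]*\]' | tr -d '[]')
echo "active scheduler: ${active}"
```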

In a nutshell, begin with the theoretically good scheduler setting for each disk given your workload, test the system, and decide which scheduler gives you the best performance.

ORACLE Instance (pid = ) - Error 1578 encountered while recovering transaction (, ) on : ORA-01578: ORACLE data block corrupted (file # , block # ) - block corruption after a crashed database is recovered

There are many cases when automatic crash recovery fails and the Oracle database has to be salvaged by manual recovery (without restore and without archive logs), but the DB incurs data block corruption (fractured blocks) that cannot be fixed just by manually applying the current or active redo logs. This happens due to an abrupt outage of all database instances during heavy DML, generally caused by power failure. In the case below, the DB hit the same scenario: crash recovery failed and I had to issue the recover command manually. I first tried SQL*Plus but abandoned it, as it would require supplying the individual redo log file names at the recovery prompt, so I switched to RMAN. See the bottom of the post for the alert log snippet and the block corruption error. Notice that ORA-01578 appears every minute; normally this error is not detected until the affected blocks are accessed, but in this case Oracle was trying to roll back the block changes of the failed transaction (the session terminated by the outage), and since the block is fractured, ORA-01578 keeps popping up even though no user session is connected.
  
RMAN> recover database ;

Starting recover at 18-JUN-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=146 instance=RAC1 devtype=DISK

starting media recovery

archive log thread 1 sequence 874 is already on disk as file +DATA4/redo08.log
archive log thread 2 sequence 1628 is already on disk as file +DATA4/redo14.log
archive log thread 2 sequence 1629 is already on disk as file +DATA4/redo11.log
archive log thread 2 sequence 1630 is already on disk as file +DATA4/redo12.log
archive log thread 3 sequence 368 is already on disk as file +DATA4/redo18.log
archive log thread 3 sequence 369 is already on disk as file +DATA4/redo15.log
archive log thread 3 sequence 370 is already on disk as file +DATA4/redo16.log
archive log filename=+DATA4/redo14.log thread=2 sequence=1628
archive log filename=+DATA4/redo08.log thread=1 sequence=874
archive log filename=+DATA4/redo18.log thread=3 sequence=368
archive log filename=+DATA4/redo15.log thread=3 sequence=369
archive log filename=+DATA4/redo11.log thread=2 sequence=1629
archive log filename=+DATA4/redo16.log thread=3 sequence=370
archive log filename=+DATA4/redo12.log thread=2 sequence=1630
media recovery complete, elapsed time: 00:03:14
Finished recover at 18-JUN-14


RMAN> open resetlogs database ;

database opened

RMAN> exit

[oracle@node1 bdump]$ tail -200f alert_RAC1.log
Wed Jun 18 01:52:17 IST 2014
ALTER DATABASE RECOVER    LOGFILE '+DATA4/redo13.log'
Wed Jun 18 01:52:17 IST 2014
Media Recovery Log +DATA4/redo13.log
Wed Jun 18 01:53:45 IST 2014
Hex dump of (file 7, block 65672) in trace file /u01/app/oracle/admin/RAC/udump/rac1_ora_21662.trc
Corrupt block relative dba: 0x01c10088 (file 7, block 65672)
Fractured block found during media recovery
Data in bad block:
 type: 6 format: 2 rdba: 0x01c10088
 last change scn: 0x0000.01aea031 seq: 0x1 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0x9c190601
 check value in block header: 0x30e7
 computed block checksum: 0x3c28
Reread of rdba: 0x01c10088 (file 7, block 65672) found same corrupted data
ORA-279 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA4/redo13.log'  ...
Wed Jun 18 01:53:52 IST 2014
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Wed Jun 18 01:53:52 IST 2014
Media Recovery Log /u01/app/oracle/product/10.2.0/db_1/dbs/arch2_1628_850308178.dbf
Errors with log /u01/app/oracle/product/10.2.0/db_1/dbs/arch2_1628_850308178.dbf
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
Wed Jun 18 01:53:52 IST 2014
ALTER DATABASE RECOVER CANCEL
ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ...
Wed Jun 18 01:53:57 IST 2014
SUCCESS: diskgroup DATA2 was dismounted
SUCCESS: diskgroup DATA3 was dismounted
SUCCESS: diskgroup DATA4 was dismounted
Wed Jun 18 01:54:22 IST 2014
ALTER DATABASE RECOVER  database until cancel using backup controlfile
Wed Jun 18 01:54:22 IST 2014
Media Recovery Start
Wed Jun 18 01:54:23 IST 2014
SUCCESS: diskgroup DATA2 was mounted
SUCCESS: diskgroup DATA3 was mounted
Wed Jun 18 01:54:23 IST 2014
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 3 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 4 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 5 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 6 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 7 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 8 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel using backup controlfile   ...
Wed Jun 18 01:54:36 IST 2014
ALTER DATABASE RECOVER    LOGFILE '+DATA4/redo14.log'
Wed Jun 18 01:54:36 IST 2014
Media Recovery Log +DATA4/redo14.log
SUCCESS: diskgroup DATA4 was mounted
ORA-279 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA4/redo14.log'  ...
Wed Jun 18 01:54:44 IST 2014
ALTER DATABASE RECOVER    LOGFILE '+DATA4/redo08.log'
Wed Jun 18 01:54:44 IST 2014
Media Recovery Log +DATA4/redo08.log
ORA-279 signalled during: ALTER DATABASE RECOVER    LOGFILE '+DATA4/redo08.log'  ...
Wed Jun 18 02:00:26 IST 2014
ALTER DATABASE RECOVER    CONTINUE DEFAULT
Wed Jun 18 02:00:26 IST 2014
Media Recovery Log /u01/app/oracle/product/10.2.0/db_1/dbs/arch3_368_850308178.dbf
Errors with log /u01/app/oracle/product/10.2.0/db_1/dbs/arch3_368_850308178.dbf
ORA-308 signalled during: ALTER DATABASE RECOVER    CONTINUE DEFAULT  ...
Wed Jun 18 02:00:26 IST 2014
ALTER DATABASE RECOVER CANCEL
ORA-1547 signalled during: ALTER DATABASE RECOVER CANCEL ...
Wed Jun 18 02:00:35 IST 2014
SUCCESS: diskgroup DATA2 was dismounted
SUCCESS: diskgroup DATA3 was dismounted
SUCCESS: diskgroup DATA4 was dismounted
Wed Jun 18 02:00:41 IST 2014
ALTER DATABASE RECOVER  database until cancel using backup controlfile
Wed Jun 18 02:00:41 IST 2014
Media Recovery Start
Wed Jun 18 02:00:41 IST 2014
SUCCESS: diskgroup DATA2 was mounted
SUCCESS: diskgroup DATA3 was mounted
Wed Jun 18 02:00:41 IST 2014
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 3 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 4 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 5 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 6 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 7 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 8 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
ORA-279 signalled during: ALTER DATABASE RECOVER  database until cancel using backup controlfile   ...
Wed Jun 18 02:00:57 IST 2014
ALTER DATABASE RECOVER    CANCEL
Wed Jun 18 02:00:57 IST 2014
ORA-1547 signalled during: ALTER DATABASE RECOVER    CANCEL  ...
Wed Jun 18 02:00:57 IST 2014
ALTER DATABASE RECOVER CANCEL
ORA-1112 signalled during: ALTER DATABASE RECOVER CANCEL ...
Wed Jun 18 02:01:02 IST 2014
SUCCESS: diskgroup DATA2 was dismounted
SUCCESS: diskgroup DATA3 was dismounted
Wed Jun 18 02:01:13 IST 2014
SUCCESS: diskgroup DATA2 was mounted
SUCCESS: diskgroup DATA3 was mounted
Wed Jun 18 02:01:14 IST 2014
SUCCESS: diskgroup DATA4 was mounted
Wed Jun 18 02:01:14 IST 2014
SUCCESS: diskgroup DATA2 was dismounted
SUCCESS: diskgroup DATA3 was dismounted
Wed Jun 18 02:01:14 IST 2014
SUCCESS: diskgroup DATA2 was mounted
Wed Jun 18 02:01:14 IST 2014
SUCCESS: diskgroup DATA2 was dismounted
SUCCESS: diskgroup DATA4 was dismounted
Wed Jun 18 02:01:14 IST 2014
SUCCESS: diskgroup DATA3 was mounted
SUCCESS: diskgroup DATA2 was mounted
Wed Jun 18 02:01:15 IST 2014
alter database recover datafile list clear
Wed Jun 18 02:01:15 IST 2014
Completed: alter database recover datafile list clear
Wed Jun 18 02:01:15 IST 2014
alter database recover datafile list
 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8
Completed: alter database recover datafile list
 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8
Wed Jun 18 02:01:15 IST 2014
alter database recover if needed
 start until cancel using backup controlfile
Media Recovery Start
WARNING! Recovering data file 1 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 2 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 3 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 4 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 5 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 6 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 7 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
WARNING! Recovering data file 8 from a fuzzy file. If not the current file
it might be an online backup taken without entering the begin backup command.
ORA-279 signalled during: alter database recover if needed
 start until cancel using backup controlfile
...
SUCCESS: diskgroup DATA4 was mounted
Wed Jun 18 02:01:15 IST 2014
alter database recover logfile '+DATA4/redo14.log'
Wed Jun 18 02:01:15 IST 2014
Media Recovery Log +DATA4/redo14.log
ORA-279 signalled during: alter database recover logfile '+DATA4/redo14.log'...
Wed Jun 18 02:01:15 IST 2014
alter database recover logfile '+DATA4/redo08.log'
Wed Jun 18 02:01:15 IST 2014
Media Recovery Log +DATA4/redo08.log
ORA-279 signalled during: alter database recover logfile '+DATA4/redo08.log'...
Wed Jun 18 02:01:16 IST 2014
alter database recover logfile '+DATA4/redo18.log'
Wed Jun 18 02:01:16 IST 2014
Media Recovery Log +DATA4/redo18.log
Wed Jun 18 02:03:19 IST 2014
ORA-279 signalled during: alter database recover logfile '+DATA4/redo18.log'...
Wed Jun 18 02:03:19 IST 2014
alter database recover logfile '+DATA4/redo15.log'
Wed Jun 18 02:03:19 IST 2014
Media Recovery Log +DATA4/redo15.log
ORA-279 signalled during: alter database recover logfile '+DATA4/redo15.log'...
Wed Jun 18 02:03:27 IST 2014
alter database recover logfile '+DATA4/redo11.log'
Wed Jun 18 02:03:27 IST 2014
Media Recovery Log +DATA4/redo11.log
Wed Jun 18 02:03:51 IST 2014
ORA-279 signalled during: alter database recover logfile '+DATA4/redo11.log'...
Wed Jun 18 02:03:51 IST 2014
alter database recover logfile '+DATA4/redo16.log'
Wed Jun 18 02:03:51 IST 2014
Media Recovery Log +DATA4/redo16.log
Wed Jun 18 02:04:03 IST 2014
ORA-279 signalled during: alter database recover logfile '+DATA4/redo16.log'...
Wed Jun 18 02:04:03 IST 2014
alter database recover logfile '+DATA4/redo12.log'
Wed Jun 18 02:04:03 IST 2014
Media Recovery Log +DATA4/redo12.log
Wed Jun 18 02:04:29 IST 2014
Incomplete recovery applied all redo ever generated.
Recovery completed through change 28235622
Wed Jun 18 02:04:29 IST 2014
Media Recovery Complete (RAC1)
Completed: alter database recover logfile '+DATA4/redo12.log'
Wed Jun 18 02:04:34 IST 2014
SUCCESS: diskgroup DATA3 was dismounted
SUCCESS: diskgroup DATA2 was dismounted
SUCCESS: diskgroup DATA4 was dismounted
Wed Jun 18 02:04:38 IST 2014
alter database open
Wed Jun 18 02:04:38 IST 2014
This instance was first to open
Wed Jun 18 02:04:38 IST 2014
SUCCESS: diskgroup DATA2 was mounted
SUCCESS: diskgroup DATA3 was mounted
Wed Jun 18 02:04:38 IST 2014
SUCCESS: diskgroup DATA2 was dismounted
SUCCESS: diskgroup DATA3 was dismounted
Wed Jun 18 02:04:38 IST 2014
ORA-1589 signalled during: alter database open...
Wed Jun 18 02:04:46 IST 2014
alter database open resetlogs
Wed Jun 18 02:04:46 IST 2014
SUCCESS: diskgroup DATA2 was mounted
SUCCESS: diskgroup DATA3 was mounted
Wed Jun 18 02:04:46 IST 2014
RESETLOGS after complete recovery through change 28235622
Resetting resetlogs activation ID 2438300125 (0x915581dd)
SUCCESS: diskgroup DATA4 was mounted
Wed Jun 18 02:05:13 IST 2014
Setting recovery target incarnation to 3
Wed Jun 18 02:05:13 IST 2014
This instance was first to open
Wed Jun 18 02:05:13 IST 2014
SUCCESS: diskgroup DATA4 was dismounted
Wed Jun 18 02:05:13 IST 2014
Picked broadcast on commit scheme to generate SCNs
Wed Jun 18 02:05:13 IST 2014
Assigning activation ID 2438538385 (0x91592491)
SUCCESS: diskgroup DATA4 was mounted
Thread 1 opened at log sequence 1
  Current log# 7 seq# 1 mem# 0: +DATA4/redo07.log
Successful open of redo thread 1
Wed Jun 18 02:05:14 IST 2014
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Jun 18 02:05:14 IST 2014
SMON: enabling cache recovery
Wed Jun 18 02:05:16 IST 2014
Successfully onlined Undo Tablespace 1.
Dictionary check beginning
Dictionary check complete
Wed Jun 18 02:05:16 IST 2014
SMON: enabling tx recovery
Wed Jun 18 02:05:16 IST 2014
Database Characterset is WE8ISO8859P1
Wed Jun 18 02:05:16 IST 2014
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Wed Jun 18 02:05:17 IST 2014
Opening with internal Resource Manager plan
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=27, OS id=30068
Wed Jun 18 02:05:21 IST 2014
LOGSTDBY: Validating controlfile with logical metadata
Wed Jun 18 02:05:22 IST 2014
LOGSTDBY: Validation complete
Completed: alter database open resetlogs
Wed Jun 18 02:05:37 IST 2014
SMON: Restarting fast_start parallel rollback
SMON: ignoring slave err,downgrading to serial rollback
ORACLE Instance RAC1 (pid = 12) - Error 1578 encountered while recovering transaction (13, 35) on object 58984.
Wed Jun 18 02:05:37 IST 2014
Errors in file /u01/app/oracle/admin/RAC/bdump/rac1_smon_21583.trc:
ORA-01578: ORACLE data block corrupted (file # 7, block # 65672)
ORA-01110: data file 7: '+DATA/data01.dbf'
Wed Jun 18 02:05:38 IST 2014
Trace dumping is performing id=[cdmp_20140618020538]
Wed Jun 18 02:05:41 IST 2014
ORACLE Instance RAC1 (pid = 12) - Error 1578 encountered while recovering transaction (13, 35) on object 58984.
Wed Jun 18 02:05:41 IST 2014
Errors in file /u01/app/oracle/admin/RAC/bdump/rac1_smon_21583.trc:
ORA-01578: ORACLE data block corrupted (file # 7, block # 65672)
ORA-01110: data file 7: '+DATA/data01.dbf'
Wed Jun 18 02:05:42 IST 2014
Trace dumping is performing id=[cdmp_20140618020542]

In my case, as the DB was in noarchivelog mode and it was a test database, I dropped the entire tablespace containing the corrupted block. Another option would be to drop only the affected segment, provided the corrupted block is not the segment header block.

ORA-00333: redo log read error block 9167 count 1073

If the database cannot start after automatic crash recovery fails, and you receive the error ORA-00333: redo log read error block xxxx count xxxx, then you should try to recover the database manually.

[oracle@11gRAC1 11g]$ srvctl start database -d ORCL
PRCR-1079 : Failed to start resource ora.orcl.db
CRS-5017: The resource action "ora.orcl.db start" encountered the following erro
ORA-00333: redo log read error block 9167 count 8184
. For details refer to "(:CLSN00107:)" in "/u01/11.2.0/grid/log/11grac1/agent/cr

CRS-2674: Start of 'ora.orcl.db' on '11grac1' failed
CRS-2632: There are no more servers to try to place resource 'ora.orcl.db' on th
CRS-5017: The resource action "ora.orcl.db start" encountered the following erro
ORA-00333: redo log read error block 9167 count 8184
. For details refer to "(:CLSN00107:)" in "/u01/11.2.0/grid/log/11grac2/agent/cr

CRS-2674: Start of 'ora.orcl.db' on '11grac2' failed
[oracle@11gRAC1 11g]$ sqlplus / as sysdba
SQL> startup mount
ORACLE instance started.

Total System Global Area  534519808 bytes
Fixed Size                  2229944 bytes
Variable Size             314575176 bytes
Database Buffers          209715200 bytes
Redo Buffers                7999488 bytes
Database mounted.
SQL> select * from v$log ;

    GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARC STATUS           FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ ---------
         1          1        195  241172480        512          1 NO  CURRENT               64028116 18-MAY-14   2.8147E+14
         2          1        194  241172480        512          1 NO  INACTIVE              63987815 16-MAY-14     64028116 18-MAY-14
         3          2        151  241172480        512          1 NO  CURRENT               64007840 16-MAY-14   2.8147E+14
         4          2        150  241172480        512          1 NO  INACTIVE              63941096 16-MAY-14     64007840 16-MAY-14

         SQL> select * from v$logfile;

    GROUP# STATUS  TYPE    MEMBER                                             IS_
---------- ------- ------- -------------------------------------------------- ---
         1         ONLINE  +DATA2/orcl/redo01.log                             NO
         2         ONLINE  +DATA2/orcl/redo02.log                             NO
         3         ONLINE  +DATA2/orcl/redo03.log                             NO
         4         ONLINE  +DATA2/orcl/redo04.log                             NO

SQL> recover database ;
ORA-00283: recovery session canceled due to errors
ORA-00333: redo log read error block 8192 count 2048
SQL> select name,state from v$asm_diskgroup ;

NAME                           STATE
------------------------------ -----------
DATA2                          CONNECTED
DATA3                          CONNECTED
DATA                           CONNECTED

SQL> select FILE#,STATUS,ERROR,RECOVER,FUZZY,CHECKPOINT_CHANGE# from v$datafile_header;

Disk group DATA2, on which the redo logs reside, is mounted, and the DB has successfully connected to it.

     FILE# STATUS  ERROR                                                             REC FUZ CHECKPOINT_CHANGE#
---------- ------- ----------------------------------------------------------------- --- --- ------------------
         1 ONLINE                                                                    NO  YES           63941096
         2 ONLINE                                                                    NO  YES           63941096
         3 ONLINE                                                                    NO  YES           63941096
         4 ONLINE                                                                    NO  YES           63941096
         5 ONLINE                                                                    NO  YES           63941096
         6 ONLINE                                                                    NO  YES           63941096
         7 ONLINE                                                                    NO  YES           63941096

7 rows selected.


# Rebooted CRS, but the database did not start and both of its instances terminated with the following error:

SQL> startup mount
ORACLE instance started.

Total System Global Area  534519808 bytes
Fixed Size                  2229944 bytes
Variable Size             314575176 bytes
Database Buffers          209715200 bytes
Redo Buffers                7999488 bytes
Database mounted.
SQL> recover database ;
ORA-00283: recovery session canceled due to errors
ORA-00333: redo log read error block 9167 count 1073

This time the block number and the count are different:

SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area  534519808 bytes
Fixed Size                  2229944 bytes
Variable Size             314575176 bytes
Database Buffers          209715200 bytes
Redo Buffers                7999488 bytes
Database mounted.
SQL> recover database until cancel using backup controlfile ;
ORA-00279: change 64028117 generated at 05/18/2014 11:34:34 needed for thread 1
ORA-00289: suggestion : /u01/app/oracle/product/11.2.0/db_1/dbs/arch1_195_847218868.dbf
ORA-00280: change 64028117 for thread 1 is in sequence #195

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+DATA2/orcl/redo01.log
ORA-00279: change 64028117 generated at 05/16/2014 13:06:28 needed for thread 2
ORA-00289: suggestion : /u01/app/oracle/product/11.2.0/db_1/dbs/arch2_151_847218868.dbf
ORA-00280: change 64028117 for thread 2 is in sequence #151

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
+DATA2/orcl/redo03.log
Log applied.
Media recovery complete.




SQL> select distinct FUZZY from v$datafile_header;

FUZ
----
NO

SQL> alter database open resetlogs ;
Database altered.


The DB is open and you are done. You may still have some corruption in undo tablespaces, data files or index tablespaces, so always check the alert log even after crash recovery or instance recovery completes. But with the database open you can tackle the remaining errors using an available backup, or by dropping and recreating the undo tablespace (if it has no active undo; if it does contain active undo segments, they can be marked for drop via a hidden parameter; a post about this will follow).
-------------------------------------------------------------------------------------------------------
 
####### From alert log

Incomplete read from log member '+DATA2/orcl/redo03.log'. Trying next member.
Abort recovery for domain 0
Aborting crash recovery due to error 333
Errors in file /u01/app/oracle/diag/rdbms/orcl/ORCL1/trace/ORCL1_ora_5083.trc:
ORA-00333: redo log read error block 9167 count 8184
Abort recovery for domain 0
Errors in file /u01/app/oracle/diag/rdbms/orcl/ORCL1/trace/ORCL1_ora_5083.trc:
ORA-00333: redo log read error block 9167 count 8184
ORA-333 signalled during: ALTER DATABASE OPEN /* db agent *//* {1:22317:292} */...
Mon May 19 11:32:05 2014
Shutting down instance (abort)

snippet of trace file : /u01/app/oracle/diag/rdbms/orcl/ORCL1/trace/ORCL1_ora_5083.trc

Successfully allocated 3 recovery slaves
Using 45 overflow buffers per recovery slave
Thread 2 checkpoint: logseq 151, block 2, scn 64033559
  cache-low rba: logseq 151, block 9159
    on-disk rba: logseq 151, block 9190, scn 64078136
  start recovery at logseq 151, block 9159, scn 0
Thread 1 checkpoint: logseq 195, block 2, scn 64028117
  cache-low rba: logseq 195, block 75202
    on-disk rba: logseq 195, block 75256, scn 64077704
  start recovery at logseq 195, block 75202, scn 0
Incomplete read from log member '+DATA2/orcl/redo03.log'. Trying next member.
2014-05-19 11:32:05.362112 : Abort recovery for domain 0, flags = 0x4
2014-05-19 11:32:05.362192 : kjb_abort_recovery: abort recovery for domain 0 @ inc 3 (0)
2014-05-19 11:32:05.362209 : kjb_abort_recovery: domain flags=0x0, valid=0
ORA-00333: redo log read error block 9167 count 8184
2014-05-19 11:32:05.391695 : Abort recovery for domain 0, flags = 0x4
2014-05-19 11:32:05.391717 : aborting recovery of 0 (0) with cluster inc 3 (0) recovery 1
2014-05-19 11:32:05.391730 : kjb_abort_recovery: domain flags=0x0, valid=0
ORA-00333: redo log read error block 9167 count 8184


----------------------------

Adding a processor/core to a physical or virtual Linux machine

This looks trivial, and it is simple, but it is not obvious to a non-Linux expert, so here is my experience. If your Linux machine had only a single CPU (or a CPU with a single core), or you assigned it only 1 CPU and 1 core while installing Linux in a virtual machine, then the installer installs a non-SMP kernel. So if you later want to add a new processor (or add a core to the virtual machine), you must install the SMP kernel and configure the GRUB boot loader to use it.

1) add processor
2) install kernel-smp rpm
3) Installing the RPM adds an SMP entry to /etc/grub.conf, as shown below. On the kernel line of the .ELsmp entry, replace 'nosmp' with 'smp' and 'noapic' with 'apic'.
4) Optionally change default=1 to default=0 so that Linux boots the kernel-smp entry by default.
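Step 3 can be scripted; below is a minimal sed sketch, demonstrated on a sample kernel line rather than on a live /etc/grub.conf (back up the file and check the result before rebooting):

```shell
# Demonstrate the substitution on a sample ELsmp kernel line; on a real
# system you would apply the same edit in place, e.g.:
#   sed -i.bak -e 's/nosmp/smp/' -e 's/noapic/apic/' /etc/grub.conf
line='kernel /vmlinuz-2.6.9-55.0.0.0.2.ELsmp ro root=LABEL=/ rhgb quiet nosmp noapic nolapic'
echo "$line" | sed -e 's/nosmp/smp/' -e 's/noapic/apic/'
# prints: kernel /vmlinuz-2.6.9-55.0.0.0.2.ELsmp ro root=LABEL=/ rhgb quiet smp apic nolapic
```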

[root@node2 ~]# uname -a
Linux node2.localdomain 2.6.9-55.0.0.0.2.ELsmp #1 SMP Wed May 2 14:59:56 PDT 2007 i686 i686 i386 GNU/Linux
[root@node2 ~]# cat /proc/cpuinfo
processor       : 0


[root@node2 ~]# cat /etc/grub.conf
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/sda5
#          initrd /initrd-version.img
#boot=/dev/sda
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Enterprise Linux Enterprise Linux AS (2.6.9-55.0.0.0.2.ELsmp)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-55.0.0.0.2.ELsmp ro root=LABEL=/ rhgb quiet divider=10 clock=pit nosmp noapic nolapic
        initrd /initrd-2.6.9-55.0.0.0.2.ELsmp.img
title Enterprise (2.6.9-55.0.0.0.2.EL)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-55.0.0.0.2.EL ro root=LABEL=/ rhgb quiet divider=10 clock=pit nosmp noapic nolapic
        initrd /initrd-2.6.9-55.0.0.0.2.EL.img
[root@node2 ~]# vi /etc/grub.conf
[root@node2 ~]# init 6

[root@node2 ~]#
login as: root
root@192.168.174.103's password:
Last login: Wed Jun  4 10:48:07 2014 from 192.168.174.1
[root@node2 ~]#  cat /proc/cpuinfo
processor       : 0
..

processor       : 1
..
processor       : 2
..
processor       : 3
..

[root@node2 ~]# uname -a
Linux node2.localdomain 2.6.9-55.0.0.0.2.ELsmp #1 SMP Wed May 2 14:59:56 PDT 2007 i686 i686 i386 GNU/Linux

ORA-01172: recovery of thread 2 stuck at block 224 of file 4

Error ORA-01172 should not occur in a healthy database. It appears when the database crashes (all RAC instances crash) and automatic crash recovery fails. Oracle first tries parallel recovery with slave processes (the default degree of parallel recovery is low); if that fails, it falls back to serial recovery, and if that also fails it raises the error below. The instance is not terminated at the end of the failed recovery. The remedy is to run the RECOVER command manually so that Oracle can apply the ACTIVE and CURRENT redo logs of all threads. In the case below, a simple "recover database" did the job (it did not prompt for any redo log file to apply). If you are unlucky (your database is not in archivelog mode and you do not have recent archived logs), then "recover database" or "recover database until cancel" won't help you, because Oracle wrongly asks for an older log sequence that is no longer visible in v$log. Below is the lucky case; the unlucky one will be posted later.


SQL> alter database open ;
alter database open
*
ERROR at line 1:
ORA-01172: recovery of thread 2 stuck at block 224 of file 4
ORA-01151: use media recovery to recover block, restore backup if needed


SQL>  select * from v$recover_file;

no rows selected




SQL> recover database ;
Media recovery complete.
SQL> alter database open ;

Database altered.


From alert log , Thu May 29 21:44:40 2014
Hex dump of (file 4, block 224) in trace file /home/oracle/trace/diag/rdbms/orcl/ORCL1/trace/ORCL1_p002_4479.trc
Reading datafile '+DATA/orcl/datafile/undotbs2.274.847998769' for corruption at rdba: 0x010000e0 (file 4, block 224)
Reread (file 4, block 224) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 2 STUCK AT BLOCK 224 OF FILE 4
Slave exiting with ORA-1172 exception
Errors in file /home/oracle/trace/diag/rdbms/orcl/ORCL1/trace/ORCL1_p002_4479.trc:
ORA-01172: recovery of thread 2 stuck at block 224 of file 4
ORA-01151: use media recovery to recover block, restore backup if needed
Thu May 29 21:44:49 2014
Abort recovery for domain 0
Aborting crash recovery due to slave death, attempting serial crash recovery
Beginning crash recovery of 2 threads
Started redo scan
kcrfr_rnenq: use log nab 201057
kcrfr_rnenq: use log nab 43466
Completed redo scan
 read 351699 KB redo, 13118 data blocks need recovery
Thu May 29 21:45:07 2014
Started redo application at
 Thread 1: logseq 189, block 23667
 Thread 2: logseq 102, block 208354
Recovery of Online Redo Log: Thread 1 Group 5 Seq 189 Reading mem 0
  Mem# 0: +DATA2/orcl/redo05.log
Recovery of Online Redo Log: Thread 2 Group 8 Seq 102 Reading mem 0
  Mem# 0: +DATA2/orcl/redo08.log
Recovery of Online Redo Log: Thread 2 Group 7 Seq 103 Reading mem 0
  Mem# 0: +DATA2/orcl/redo07.log
Hex dump of (file 4, block 224) in trace file /home/oracle/trace/diag/rdbms/orcl/ORCL1/trace/ORCL1_ora_4415.trc
Reading datafile '+DATA/orcl/datafile/undotbs2.274.847998769' for corruption at rdba: 0x010000e0 (file 4, block 224)
Reread (file 4, block 224) found same corrupt data (logically corrupt)
RECOVERY OF THREAD 2 STUCK AT BLOCK 224 OF FILE 4
Abort recovery for domain 0
Aborting crash recovery due to error 1172
Errors in file /home/oracle/trace/diag/rdbms/orcl/ORCL1/trace/ORCL1_ora_4415.trc:
ORA-01172: recovery of thread 2 stuck at block 224 of file 4
ORA-01151: use media recovery to recover block, restore backup if needed
Abort recovery for domain 0
Errors in file /home/oracle/trace/diag/rdbms/orcl/ORCL1/trace/ORCL1_ora_4415.trc:
ORA-01172: recovery of thread 2 stuck at block 224 of file 4
ORA-01151: use media recovery to recover block, restore backup if needed
ORA-1172 signalled during: ALTER DATABASE OPEN /* db agent *//* {1:56922:2} */...
----------------end alert log snippet ---------



-----------------------------------
[oracle@11gRAC1 trace]$ rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Thu May 29 21:51:38 2014

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1375779469, not open)

RMAN> backup check logical validate database ;

Starting backup at 29-MAY-14
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=135 instance=ORCL1 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=196 instance=ORCL1 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=+DATA/orcl/data01.dbf
input datafile file number=00002 name=+DATA/orcl/datafile/sysaux.270.847998745
input datafile file number=00005 name=+DATA/orcl/datafile/users.263.847998771
input datafile file number=00004 name=+DATA/orcl/datafile/undotbs2.274.847998769
channel ORA_DISK_2: starting full datafile backup set
channel ORA_DISK_2: specifying datafile(s) in backup set
input datafile file number=00006 name=+DATA3/orcl/indexes01.dbf
input datafile file number=00001 name=+DATA/orcl/datafile/system.269.847998737
input datafile file number=00003 name=+DATA/orcl/datafile/undotbs1.261.847998757
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:34
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              12879        76876           30261376
  File Name: +DATA/orcl/datafile/sysaux.270.847998745
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              21416
  Index      0              17867
  Other      0              24638

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              1            38401           30253503
  File Name: +DATA/orcl/datafile/undotbs2.274.847998769
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              38399

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              1            641             19721
  File Name: +DATA/orcl/datafile/users.263.847998771
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              639

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    OK     0              1            131072          30266334
  File Name: +DATA/orcl/data01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              97510
  Index      0              31833
  Other      0              1728

channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
Control File OK     0              1128
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current SPFILE in backup set
channel ORA_DISK_2: backup set complete, elapsed time: 00:00:22
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              12551        89606           30266212
  File Name: +DATA/orcl/datafile/system.269.847998737
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              47836
  Index      0              9749
  Other      0              19464

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              1            48641           30248852
  File Name: +DATA/orcl/datafile/undotbs1.261.847998757
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              48639

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
6    OK     0              438          98561           30265514
  File Name: +DATA3/orcl/indexes01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              89706
  Other      0              8416

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:00
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Finished backup at 29-MAY-14

-------------------------------------------------

Which to choose for shared storage in RAC: file system (OCFS), raw devices, or ASM?

This is a very important question, with a trade-off between manageability and performance. My view, based on practical experience, is below.
  • OCFS is the best for manageability: you get a transparent, visible file system just like NTFS or ext3/ext4 on Linux. However, OCFS has had bugs, and its performance deteriorates under very heavy I/O load. Another reason to choose OCFS is that it allows a shared Oracle Home, which makes patching quicker. In my first production RAC experience (9iR2 on Windows 2000) we used OCFS, and there were a few bugs: OCFSTOOL would give false warnings, and in one case OCFS did not free up space after files were deleted. Later I used OCFS2 on Linux for test purposes and did not hit the same issues, though it was a very limited test. Oracle 11.2.0.1 had an issue accessing an OCFS file system directly, so it had to be mounted via NFS. OCFS has matured a lot, but Oracle no longer seems keen to promote it.
  • I used pure raw devices on a single-instance test database; performance was slightly better than on a file system, but they were difficult to manage. Early in my career I got the opportunity to recover an Oracle 8i production OPS (the earlier name for RAC) database, where raw devices were the only shared-storage option. It was my first recovery attempt on a clustered database and I was not practically familiar with raw devices at the time, but the Oracle documentation was good enough to guide me through partitioning the replacement disk and copying the files from backup onto the raw devices.

  • ASM (over raw devices, ASMLIB or UDEV): ASM sits between raw devices and a file system. Its performance benefit is slightly reduced compared with raw devices but better than a file system, with much better manageability as the advantage in the comparison. ASM can run on top of raw devices or via ASMLIB (two RPMs provided by Oracle Corp and one kernel driver from the OS vendor, say RHEL). From 11gR2, ASM on raw devices is supported only for systems upgraded from 11g. My preferred way, from a performance point of view, is ASM with a UDEV rule, as it does not add the extra ASMLIB layer on top of the kernel. The only advantage I see in ASMLIB is easier maintenance of shared disks. It takes some time to become familiar with UDEV rules, and they are not uniform across all shared-disk types, but I do not think you add/remove disks often. You may be surprised to know that although Oracle recommends ASMLIB for better performance, Oracle itself does not use it in the Oracle RAC Oracle VM templates. If you are using RHEL 6 rather than OEL 6, there is the compelling point that you must subscribe to RHN to get the ASMLIB kernel driver; otherwise your only option is a UDEV rule.
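For reference, an ASM disk UDEV rule typically looks like the sketch below (a hedged example: the scsi_id output string, device name, and owner/group are placeholders that depend on your environment and Linux version):

```
# /etc/udev/rules.d/99-oracle-asmdevices.rules  (illustrative, RHEL 6 style)
KERNEL=="sd?1", BUS=="scsi", \
  PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$parent", \
  RESULT=="SATA_VBOX_HARDDISK_VB1234567890", NAME="asm-disk1", \
  OWNER="grid", GROUP="asmadmin", MODE="0660"
```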

After recovering the Oracle 8i OPS database for STP Noida [ a night-long task on a voluntary SOS call from my friend, so I took no money :) ] - year 2006



See my other post with label ASM for configuring UDEV.

tag: OCFS vs ASM, ASMLIB vs UDEV





BACKUP, RECOVERY and archive log handling in RAC

Here are a few basic points to consider for safe backup and recovery in a RAC environment:

  • If the DB runs in archivelog mode, archived logs should be placed in a single location (say one ASM directory) on shared disk, so that if you ever need to recover you do not have to copy all archived logs to one place or supply file names manually for each log sequence#. Configuring a Flash Recovery Area is a very good option for RAC databases.
  • Consider archiving logs to the local disks of the nodes only if you have little shared storage available or you need faster archiving.
  • If all instances crash (say a power failure in the data centre), bring the nodes up one by one. This way not only does the clusterware recover safely, but the database crash recovery is also safe.
  • Configure the RMAN control file autobackup feature to shared disk (say an OCFS file system, if applicable), as this allows the control file, and hence recovery, to be started from any node.
  • As RAC instances have more aggregate shared pool than a single instance, you can increase the large pool size to give RMAN more buffers for faster backup; you may then multiplex more files per backup set and parallelize more if enough CPUs are available, but remember that higher multiplexing can slow down restore.

  • To benefit from the additional CPUs available in RAC compared with a single-instance database, backup channels can be allocated across all RAC nodes; combined with compressed backup sets, this speeds up backup.
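The autobackup and multi-node channel points above can be sketched as persistent RMAN configuration (a hedged example; the +FRA disk group, connect strings and password are placeholders for your environment):

```
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '+FRA/%F';
CONFIGURE DEVICE TYPE DISK PARALLELISM 2;
CONFIGURE CHANNEL 1 DEVICE TYPE DISK CONNECT 'sys/password@ORCL1';
CONFIGURE CHANNEL 2 DEVICE TYPE DISK CONNECT 'sys/password@ORCL2';
```

With per-instance connect strings, the two channels run on different nodes, spreading the backup load across the cluster.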

ORA-00314: log 3274 of thread 1, expected sequence# 3276 doesn't match 3276


I had a power failure while heavy DML was running on my database, and after restarting the machine the database (running in noarchivelog mode) did not start: automatic instance recovery failed with "ORA-00314: log %s of thread %s, expected sequence# %s doesn't match %s".

The strange part was that the lower log sequence number mentioned in the error message had already been applied fully to the database; it did not even appear in v$log as inactive, it was that old. Another log group was still active, though, so recovery was vital.

So I first tried the broader recovery command that safe practice says to start with, "recover database", but it did not succeed. I then tried the same via RMAN, and then the backup controlfile syntax. Recovery prompted for a couple of log sequences, which I supplied after joining v$log and v$logfile on the corresponding thread# and sequence#, but recovery terminated saying "datafile 1 may need recovery". That just means I was unlucky, as in most cases this error is solved by "recover database until cancel using backup controlfile". I then tried recreating the controlfile and recovering again, and then clearing the logfiles, but neither brought any luck. Finally I had to restore from a cold backup. Luckily it was just a test database.

 
 
oerr output:
 
ORA-00314: log %s of thread %s, expected sequence# %s doesn't match %s
 *Cause:  The online log is corrupted or is an old version.
 *Action: Find and install correct version of log or reset logs.

ORA-00443: background process "LCK0" did not start

My 12c RAC clusterware (Grid Infrastructure) stack did not start. On checking I found that CSS was online (so the shared disk was fine) but CRSD would not start, because its prerequisite, the ASM instance service, had not started. I tried starting it manually with crsctl and then with SQL*Plus, but got the error ORA-00443: background process "LCK0" did not start.

The LCK0 trace was not informative, but the trace of the last user process that terminated the instance hinted at a shared-memory issue. I remembered that I had not decreased the ASM instance size, and the machine had 1.7 GB of memory, so I wondered what the issue could be. I tried increasing tmpfs from the default 700 MB to 1 GB, but it did not help. On second thought it occurred to me that this was 12c and might simply require more memory. I had earlier reduced the memory allocated to this VM from 3 GB to 1.7 GB while troubleshooting a too-long wait for PuTTY login (caused by a NAT DNS entry in resolv.conf) and a machine freeze caused by ATA disk waits on the order of 60 seconds. After I increased the memory allocated to the VM, the ASM service started fine.
    ------------------------------------ error stack----------------------------------

CRS-2672: Attempting to start 'ora.asm' on '12crac1'
CRS-5017: The resource action "ora.asm start" encountered the following error:
ORA-00443: background process "LCK0" did not start
. For details refer to "(:CLSN00107:)" in "/u01/12.1.0/grid/log/12crac1/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on '12crac1' failed

Note: of course, error ORA-00443 can occur for n reasons, and yours may be different from mine.

Performance Tuning Linux virtual machines on VMware, Oracle VirtualBox and Oracle VM for performance testing

Performance testing is generally not recommended on virtualized machines. But if we allocate resources properly, create test cases properly and evaluate performance accordingly, then virtualized machines are fairly good to go. Since virtualization is slow and we may not always have the best hardware (cost matters), we may need to tune the virtual machines and their operating systems heavily.

If you want to build a virtual performance-testing setup, here are checklists and steps for tuning Oracle VirtualBox, VMware and Oracle VM so that you get the most out of the available hardware and software. I have tested most of these suggestions in practice.

Checklist:

1. If using a hardware-virtualized machine (HVM), or HVM with paravirtualized drivers, then:

A) Does your processor support hardware virtualization (the VT-x feature; most do), and is it enabled in the BIOS?

B) Does your processor support hardware-virtualized I/O (the VT-d feature, quite new, which improves I/O tremendously), and is it enabled?

C) Does your processor support nested page tables (most do), and is your hypervisor using them?

D) Have you installed the necessary guest add-ons, such as the VirtualBox Guest Additions or VMware Tools?

E) Are you using paravirtualized Ethernet adapters, such as VMXNET on VMware, or the paravirtualized network adapter (virtio-net) instead of Intel PRO on Oracle VirtualBox?

F) Have you checked which I/O scheduler is best for you? Most hypervisors, such as VMware ESXi, take care of disk I/O submission themselves, so NOOP usually gives the best performance with the least CPU overhead.

G) Make sure you are not overcommitting memory or processors beyond what is physically available, which some hypervisors such as VMware allow.

2) A) If using a purely paravirtualized machine, is your kernel up to date?

B) If required, have you disabled the processor's hardware virtualization support in the BIOS?

C) Are you monitoring the virtual machines from the bare metal, i.e. Dom0, using specialized commands such as xentop on Oracle VM Server?

3) If using an Intel i7, disable hyper-threading: hyper-threads are not real physical cores (arithmetic and logic units), and VirtualBox performance slows down because of them. An Intel i5 may perform as well as an i7.


Tune GUEST :

- To save resources, run Linux at run-level 3, i.e. without X Window.

- Remove unnecessary packages.

- Use virtual disks with fixed (preallocated) storage unless space is an issue.

- Disable the services you do not require.

- Instead of creating many disks under many controllers, create a few virtual disks under a few controllers, preferably on separate physical hard disks. If using VirtualBox, you can change the controller type to AHCI.

- Tune your file system (e.g. ext4, or mounting with noatime) and use the one offering the best performance with the least overhead (but be aware of its crash and recovery behaviour as well).

- When not required, put the virtual machine's GUI in the background (not possible with Oracle VirtualBox).

- If using VirtualBox, make sure you are using the appropriate disk controller. A SATA controller has the advantage of less overhead, while an IDE controller can leverage disk caching. Use SCSI only when you need to match the production system you are simulating.

- If you have enabled disk caching, there is a high chance of DB corruption if the machine terminates abruptly, so be ready with recovery solutions.

- Make sure there is little paging and no swapping at all in the guest. The anticipatory I/O scheduler does not go well with swapping; make sure you are not using it (though this applies only to older kernels, say 2.4.x). Swapping at the host level may be better than swapping in the guest, so overcommitting memory may be better in a few cases.

- If the Oracle SGA is large (say 4 GB) with many concurrent sessions (say 100), seriously consider using huge pages, but do not pre-page the SGA. Make sure the whole SGA fits into the huge pages pool, otherwise it will actually slow things down. Monitor the system for a good duration after setting it up. Disable transparent huge pages on RHEL 6; they are not helpful. Please see the post: Configure Huge Pages on Linux.
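As a rough sizing aid, the hugepage pool must cover the whole SGA. A minimal sketch of the arithmetic for a 4 GB SGA with 2 MB hugepages (the few pages of headroom are my own habit, not an Oracle formula):

```shell
# Rough vm.nr_hugepages sizing: SGA size divided by hugepage size,
# plus a little headroom so the SGA definitely fits in the pool.
sga_mb=4096        # SGA size in MB (example value)
hugepage_mb=2      # hugepage size on x86-64 Linux is typically 2 MB
pages=$(( sga_mb / hugepage_mb + 8 ))
echo "vm.nr_hugepages = $pages"     # prints: vm.nr_hugepages = 2056
```

The resulting value goes into /etc/sysctl.conf; verify the pool with `grep Huge /proc/meminfo` after a restart.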

- If using Oracle VM, increase the metal-ware (Dom0) memory from the default 512 MB to 1024 MB.

- If using a paravirtualized machine on RHEL, change the I/O scheduler from CFQ to DEADLINE. You can also change the write-timeout attribute of the deadline scheduler.
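Setting the scheduler at boot is a one-word addition to the kernel line in /etc/grub.conf (a sketch; the kernel version is a placeholder for your own entry):

```
# /etc/grub.conf kernel line: append elevator=deadline
kernel /vmlinuz-<version> ro root=LABEL=/ rhgb quiet elevator=deadline
```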

- If you originally installed Linux with one CPU and are adding CPUs later, you also need to install the kernel-smp package and use the apic and smp switches in /etc/grub.conf. In most cases one or two processors (or cores) outperform four, because of minimized context switching.

- Snapshots slow down performance because of the extra maintenance of machine state, so make sure they are disabled while performance testing.

- Try to use fewer colours (say 16) and minimize the video memory (say from 16 MB to 8 MB).

- Use large pages (not the OS ones) in Oracle VirtualBox. They can only be enabled from the command line (e.g. with VBoxManage), not from the GUI; please read the manual.

- Use appropriate kernel network parameters, such as the default and maximum send and receive buffer sizes.

- Decrease the timer-tick frequency by adding divider=10 to the corresponding kernel line in /etc/grub.conf. RHEL 6 is tickless, so this is not required there.


- The OEL RHEL-compatible kernel offers slightly better processor speed than OEL UEK (about 3% in single-task and 1% in multitasking workloads; I am not sure why, perhaps the process scheduler), but slightly lower I/O performance than the UEK kernel (even with both I/O schedulers set to deadline for the test). So test which one suits your CPU/I-O profile.

- For performance testing, use CPU affinity to assign CPUs to each node according to capacity, so that when you compare the performance of nodes or the cluster, the CPU usage figures are correct. Avoid CPU overcommit as well.

- Do not add many virtual hard disks on the same physical box unless essential for the test, as doing so increases virtualization overhead. If a virtual machine freezes under heavy I/O, consider increasing host and guest memory; in such cases you will get warnings like 'ATA disk waits exceed 61 secs'.

- A little paging can be tolerated, but swapping cannot. You can increase the Linux VM parameter that controls the number of pages swapped in a single call (its value is logarithmic).
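The swapping-related knobs live in /etc/sysctl.conf (illustrative values, a hedged sketch; vm.page-cluster is the base-2 logarithm of the number of pages swapped per I/O, so 4 means 16 pages):

```
# /etc/sysctl.conf (illustrative values; apply with sysctl -p)
vm.swappiness = 10    # prefer reclaiming page cache over swapping
vm.page-cluster = 4   # 2^4 = 16 pages per swap read/write
```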

- If your disks freeze (too-slow I/O), confirmed by the virtual machine log reporting too-long disk-time warnings (e.g. in VirtualBox): detach the disks, re-add them under a different controller with changed caching, and then add them back to the original controller. Otherwise the machine will continue to freeze even after a reboot.

- Some Oracle versions have mutex bugs (mainly due to process-scheduling issues) that cause significant CPU waits; make sure you are not being hit by one.



--- Read the manuals of your virtual machine provider and your OS tuning guide.

-- Just for manageability: UEFI Linux does not dual-boot well with Windows 7, for want of a signed boot loader, so when installing Linux make sure you do not boot from a UEFI disk.

Creating service in Oracle RAC

To reduce Oracle cluster overhead you may want your application, or some modules of it, to connect only to a specific set of nodes, while maintaining the connection fail-over high-availability feature at the same time. This can be achieved by creating a RAC service. Say you have a 4-instance RAC database, but you want your sales module to connect only to instance 1 and instance 2. I have made a video on creating a service in Oracle RAC that connects applications to a specific node (or set of nodes), called the preferred instances, with fail-over to the other, available instances. It is very simple without dbca (service creation is not in dbca from 11g onwards).
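For reference, the same result can be sketched with srvctl (a hedged example; the database name ORCL, the service name sales_srv and the instance names are placeholders for your environment):

```shell
# Instances ORCL1/ORCL2 are preferred; ORCL3/ORCL4 are available for fail-over.
srvctl add service -d ORCL -s sales_srv -r "ORCL1,ORCL2" -a "ORCL3,ORCL4"
srvctl start service -d ORCL -s sales_srv
srvctl status service -d ORCL -s sales_srv
```

Clients then connect with SERVICE_NAME=sales_srv in their TNS entry, so connections land only on the preferred instances while the service can still fail over.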






Tuning Inserts in RAC



If an Oracle database is not I/O-bound and scales by adding more CPUs to a single-instance database, then it is likely to scale very well in RAC. However, this is not always true, and RAC scalability can be limited by many well-known factors such as a slow or overly busy interconnect. There are also a few cases where RAC does not scale well in spite of a very good interconnect. One such case is batch jobs with massive inserts into tables that have right-hand (right-growing) indexes, fed by ordered sequences, unordered sequences with a low cache, or ordered values coming from any source other than a sequence, which cause major cluster overhead. In case 1), unordered sequences, the GC waits can be minimized by increasing the sequence cache to a large size (in the order of thousands). In case 2), where values must be ordered, reverse key indexes or hash-partitioned indexes can minimize GC waits. Case 1) has no CPU or I/O overhead, while case 2) has major CPU and I/O overheads. So it must be evaluated whether the benefit of reduced GC waits outweighs the additional CPU and I/O overhead required by reverse key or hash-partitioned indexes.
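A hedged sketch of the alternatives discussed above (the table, index and sequence names, and the credentials, are hypothetical; adjust to your schema):

```shell
# Run the DDL through sqlplus; each statement is one of the options,
# not a sequence to execute wholesale.
sqlplus -s scott/tiger <<'SQL'
-- Case 1: unordered sequence, just enlarge the cache
ALTER SEQUENCE orders_seq CACHE 5000;

-- Case 2, option a: rebuild the right-hand index as a reverse key index
ALTER INDEX orders_dt_idx REBUILD REVERSE;

-- Case 2, option b: or recreate it hash partitioned to spread hot blocks
DROP INDEX orders_dt_idx;
CREATE INDEX orders_dt_idx ON orders (order_date)
  GLOBAL PARTITION BY HASH (order_date) PARTITIONS 8;
SQL
```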


TOTAL CLUSTER LATENCY WAITS (Prepare time and transfer time)


In the graph: 7:46 to 7:56 reverse key index, 8:00 to 8:07 hash-partitioned index, 8:19 to 8:29 normal index.

Prepare time: 
  • Negligible in case of Reverse Indexes
  • Modest in Hash Partitioned indexes
  • Very High in Normal Index
Transfer time is highest in the case of reverse key indexes, because keys are spread almost as randomly as the rows of the table [the clustering factor (CF) is so high for reverse key indexes that the optimizer may tend to ignore the index], so a large number of blocks is transferred. This is also why reverse key indexes require a large number of I/Os, possibly ten-fold or more than a normal index. But provided there is no scarcity of buffer cache, these requests can be satisfied by normal GC waits (negligible GC busy waits).

Normal indexes suffer severe buffer busy contention (gc buffer busy release and gc buffer busy acquire), and due to long buffer pinning and log flushing they have by far the highest CR block creation time (preparation time).
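To check whether your own workload shows these symptoms, the relevant gc wait events can be pulled cluster-wide from GV$SYSTEM_EVENT (a sketch; extend the event list as needed):

```shell
sqlplus -s / as sysdba <<'SQL'
-- Cluster-wide wait events related to gc buffer busy contention
SELECT inst_id, event, total_waits,
       ROUND(time_waited_micro / 1e6, 1) AS seconds_waited
FROM   gv$system_event
WHERE  event IN ('gc buffer busy acquire',
                 'gc buffer busy release',
                 'gc current block busy')
ORDER  BY inst_id, seconds_waited DESC;
SQL
```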

This becomes clearer from the graph below, which depicts preparation-time latency:


The graph below depicts transfer latency for all cases; it is lowest in the hash-partitioned index case:





WORKLOAD on both instances: highest with the reverse key index, lowest with the normal index, moderate with the hash-partitioned index.

Latency of both RAC instances separately:

Cluster Latency :



Input - Output (I/O) summary Graphs:

Throughput (wait event types):


You can notice the very tall spike for single-block reads, coming from the many db file sequential read calls issued for reverse key indexes, while these are not high in the hash-partitioned and normal index cases. The latter two instead require very large redo log writes for gcs log flush.


IO Latency: 


Throughput requests (File Type) :



Operating System statistics:

 CPU usage ( at node1[node2 has similar statistics] ):


Disk I/O ( at node1[node2 has similar statistics] ):





Load Average Run Queue ( at node1[node2 has similar statistics] ):


Network packet transfer rate ( at node1[node2 has similar statistics] ):



Conclusion: 

  • Contrary to common belief, reverse key indexes caused more logical I/O (and potentially physical I/O) and more interconnect traffic. Their I/O time is much greater than their CPU time. They only save the time spent pinning buffers and flushing, and thus relieve pressure on log file I/O, but this gain may not be enough to outweigh the extra I/O and CPU they may incur.
  • Hash-partitioned indexes: the best choice when there are heavy gc buffer busy events.
  • Normal index: may still be better than the above cases if there are not many right-hand indexes, or with lower concurrency or fewer nodes.