An Oracle DBA Blog

My experiences with different Oracle products

Thursday, July 10, 2014

GoldenGate HANDLECOLLISIONS parameter


We should be extra cautious when using HANDLECOLLISIONS. Without HANDLECOLLISIONS, the replicat abends if a row updated at the source is missing at the target database.

If you plan to use HANDLECOLLISIONS to work around this, please make a note of how HANDLECOLLISIONS behaves in different scenarios -

With HANDLECOLLISIONS -
  - If an update is performed on non-key columns of a row that exists at the source but is missing at the target, the change record is simply discarded by the replicat at the target.

  - The missing row is inserted at the target only if the update is performed on key columns. Another point to note: if supplemental logging is enabled on the primary key only, the missing row is inserted at the target with values for the primary key columns only; the remaining columns are left NULL. If supplemental logging is enabled for ALL columns, the row is inserted at the target with values for all columns.
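For reference, here is a minimal sketch of a replicat parameter file with HANDLECOLLISIONS enabled. The group, schema and credential names here are illustrative assumptions, not from the test setup below:

```
-- Sketch only: group/user/schema names are assumptions
REPLICAT rtest
USERID ggadmin, PASSWORD *****
ASSUMETARGETDEFS
-- Resolve missing-row and duplicate-row errors instead of abending.
-- Typically enabled only during initial load / resync:
HANDLECOLLISIONS
MAP qa.test, TARGET pt.test;
```

Once source and target are verified to be in sync, the parameter is normally removed (or NOHANDLECOLLISIONS set), so that real data divergence is not silently masked.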

Below is the test I performed.

OBJECT_ID is the primary key of the TEST table.


(SOURCE commands below are shown at the QA_SQL> prompt; TARGET commands at the PT_SQL> prompt.)
QA_SQL> alter table test add supplemental log data (primary key) columns;

Table altered.


QA_SQL> select object_id, object_name from test;

OBJECT_ID OBJECT_NAME
---------- --------------------
     35525 TEST
     35526 TEST_PK

PT_SQL> select object_id, object_name from test;
OBJECT_ID OBJECT_NAME
---------- --------------------
     35525 TEST

The row with OBJECT_ID 35526 is missing at the target.
    
QA_SQL> update test set OBJECT_NAME='TEST_PK_2' where object_id=35526;

1 row updated.

QA_SQL> commit;

Commit complete.

PT_SQL> select object_id, object_name from test;      
OBJECT_ID OBJECT_NAME
---------- --------------------
     35525 TEST

THE CHANGE IS DISCARDED @TARGET BECAUSE THE UPDATE WAS NOT PERFORMED ON A KEY COLUMN.

QA_SQL> update test set OBJECT_ID=35527 where object_id=35526;

1 row updated.

QA_SQL> commit;

Commit complete.

PT_SQL> select object_id, object_name from test;                            

OBJECT_ID OBJECT_NAME
---------- --------------------
     35527
     35525 TEST

PT_SQL> /

OBJECT_ID OBJECT_NAME
---------- --------------------
     35527
     35525 TEST

@TARGET THE ROW IS INSERTED ONLY WITH VALUES OF KEY COLUMNS BECAUSE THE UPDATE WAS PERFORMED ON A KEY COLUMN.

QA_SQL> alter table test add supplemental log data (all) columns;

Table altered.

QA_SQL> update test set OBJECT_ID=35527 where object_id=35527;

1 row updated.

QA_SQL> commit;

Commit complete.

PT_SQL> select object_id, object_name from test;                                                           

OBJECT_ID OBJECT_NAME
---------- --------------------
     35527 TEST_PK_2
     35525 TEST


@TARGET THE ROW IS INSERTED WITH VALUES OF ALL COLUMNS BECAUSE SUPPLEMENTAL LOGGING WAS ENABLED FOR ALL COLUMNS.

Wednesday, April 14, 2010

Step-by-step processing of SQL statements in the Oracle architecture

If it's a query (SELECT)
-------------------------------------------------------------------------------
1. The server process checks the Library Cache to see if the same statement has been issued previously. If so, it acquires the existing execution plan.
2. If not, the server process stores the submitted SELECT in the Library Cache, parses it, and develops an execution plan, storing the final execution plan with the SQL statement in the Library Cache.
3. The server process visits the Buffer Cache and checks whether the required data is already in the cache. If so, it reads the existing buffers to obtain the data.
4. If not, the server process fetches the necessary blocks from disk and loads them into the Buffer Cache. Once they are in the cache, the server process can obtain the necessary row data.
5. The server process returns the data to the user that requested it.

If it's DML (INSERT/UPDATE/DELETE/MERGE)
--------------------------------------------------------------------------------
1. As above
2. As above
3. As above
4. Once the necessary data is in the Buffer Cache, the server process takes locks on each of the rows it is interested in.
5. The server process obtains a reservation of space in an available undo segment, and buffers in the cache to act as the in-memory store for that space. It writes the current version of the data into those in-memory buffers.
6. The server process obtains permission to access the Log Buffer, and writes the before and after versions of the data about to be changed into that log buffer. If you're changing someone's age from 44 to 45, for example, only the ROWID and that age column are written, as a pair of "change vectors" - sometimes called the 'before and after images' of the data.
7. Once the log buffer data has been written, the server process can now modify the data in the Buffer Cache (changing the relevant columns to their new values).
8. Once all changes to the data have been made, the server process returns an 'X rows updated/deleted/inserted' message to the user who submitted the original SQL.
Note that the changed data remains entirely in memory at this point, so a power failure would cause you to lose this update.

If it's a COMMIT
--------------------------------------------------------------------------------
1. The server process sends a message to the LGWR process to flush all outstanding redo in the log buffer to the online redo logs.
2. The LGWR process sends a message back to the server process saying 'flush complete'.
3. The server process sends a message to the client saying 'commit complete'.
4. The server process removes the locks from the rows stored in the buffer cache.
Note that the changed data still remains entirely in memory at this point, so a power failure would lose those in-memory buffers. However, the before and after images of the data have been saved to the online redo logs, so the nature of the change you've just committed is safely captured and recoverable. That's why, after a commit, you can be sure your data changes are recoverable (as long as your redo logs are safe, of course!)
And that's just a very generalised explanation, because a heck of a lot of internals take place at each and every one of those steps (latches, mutexes and fun stuff like that). But hopefully that gives you the gist of what goes on.
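If you want to watch these stages from the outside, the cumulative instance statistics give a rough view. A sketch, using statistic names as they appear in 10g's V$SYSSTAT:

```
-- Instance-wide counters roughly matching the stages above:
-- parsing, cache reads, disk reads, redo generation, commits.
select name, value
  from v$sysstat
 where name in ('parse count (total)',
                'session logical reads',
                'physical reads',
                'redo entries',
                'user commits');
```

Sampling this query twice and diffing the values shows how much of each kind of work the instance did in between.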

Troubleshooting Locks


1. This script finds the sessions that are in a WAITING state (it filters out all idle sessions):
select sid, username, event, blocking_session,
       seconds_in_wait, wait_time
  from v$session
 where state in ('WAITING')
   and wait_class != 'Idle';
Looking at this information, we can immediately see which sessions are waiting for a lock on a table and which sessions hold the locks on the same resource (BLOCKING_SESSION).

2. If we want to know which SQL statements are causing this lock contention, we can find out by joining the V$SESSION and V$SQL views -
select sid, sql_text
  from v$session s, v$sql q
 where sid in (254, 270)
   and (q.sql_id = s.sql_id or
        q.sql_id = s.prev_sql_id);
Note - In this example we assume session 254 is blocking session 270. Unless session 254 commits or rolls back its transaction, session 270 continues to wait. If session 270 cannot be left waiting, we have to kill the blocking session.
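If the decision is made to kill the blocker, you need its SERIAL# as well as its SID. A sketch (the serial# shown is a placeholder, not a real value):

```
-- Look up the blocker's serial#
select sid, serial# from v$session where sid = 254;

-- Then kill it with 'sid,serial#', for example:
-- alter system kill session '254,1234';
```

Killing the blocker rolls back its uncommitted transaction, so this should be a last resort after confirming the session is not doing useful work.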

3. In Oracle Database 10g, wait events are divided into wait classes, based on their type. The grouping of events lets you focus on specific classes and exclude nonessential ones such as idle events. We use the V$SESSION_WAIT_CLASS view for this purpose.
select wait_class_id, wait_class,
       total_waits, time_waited
  from v$session_wait_class
 where sid = 270;
The output shows the wait classes and how many times the session (in our case, session 270) has waited for events in each wait class. For row lock contention, we should look at the Application wait class: how many times it occurred and how long the session waited on it (time is in centiseconds - hundredths of a second - since the instance started). If that TIME_WAITED value looks high for the session, we should explore the cause of these waits in the Application wait class. The times for individual waits are available in the V$SYSTEM_EVENT view.

4. Use the following query to identify the individual waits in the Application wait class (class id 4217450380):
select event, total_waits, time_waited
  from v$system_event e, v$event_name n
 where n.event_id = e.event_id
   and wait_class_id = 4217450380;
The output shows lock contention (indicated by the event enq: TX - row lock contention) and the other wait events that constitute the Application wait class. If the time waited on TX - row lock contention is high, that should concern you. It may be that a badly written application made its way into the production database, causing these lock contention problems. However, we should not immediately draw that conclusion: the output merely gives the total number of lock-contention-related waits and the total time waited in centiseconds. It is possible that mostly 1- or 2-cs waits plus a single large wait account for the total wait time, in which case the application isn't faulty; a single large wait may be a freak occurrence skewing the data, not representative of the workload on the system.

5. Oracle 10g provides a new view, V$EVENT_HISTOGRAM, that shows the wait-time buckets and how often sessions have waited for a time in each bucket. Use the following query against V$EVENT_HISTOGRAM:
select wait_time_milli bucket, wait_count
  from v$event_histogram
 where event = 'enq: TX - row lock contention';
The output looks like this:
    BUCKET WAIT_COUNT
---------- ----------
         1        252
         2          0
         4          0
         8          0
        16          1
        32          0
        64          4
       128         52
       256        706
       512        392
      1024         18
      2048          7
      4096        843
The V$EVENT_HISTOGRAM view shows the buckets of wait times and how many times sessions waited for a particular event - in this case, row lock contention - for that duration. For example, sessions waited 252 times for less than 1 millisecond (ms), once for between 8 and 16 ms, and so on. The sum of the WAIT_COUNT column matches the TOTAL_WAITS shown for the event enq: TX - row lock contention in the query in step 4. The histogram shows that most waits occurred in the 256 ms, 512 ms, and 4,096 ms buckets, which is good evidence that the applications are experiencing locking issues and that this locking is the cause of the slowness users are reporting. Had the view shown numerous waits in the 1-ms range, we wouldn't have been as concerned, because those waits would have looked normal.
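To make the skew easier to see, each bucket's share of the total wait count can be computed in the same query. A sketch using the RATIO_TO_REPORT analytic function:

```
-- Percentage of waits falling into each histogram bucket
select wait_time_milli bucket, wait_count,
       round(100 * ratio_to_report(wait_count) over (), 1) pct
  from v$event_histogram
 where event = 'enq: TX - row lock contention';
```

A distribution where the high-millisecond buckets carry a large percentage points at genuine lock contention rather than occasional stray waits.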

Reference doc for applying the 10.2.0.3 patch in ACL.
Patch to be applied - 10g Release 2 (10.2.0.3) Patch Set 2 for AIX 5L Based Systems (64-Bit)
Operating System - IBM AIX 5.3
Servers to be patched - oradb1.applin.com, oradb2.applin.com, oradbdr.applin.com
Notes -
The Oracle Clusterware software must be at the same or a newer level than the Oracle software in the Oracle RAC Oracle home. Therefore, we must always upgrade Oracle Clusterware before we upgrade Oracle RAC.
We must apply this patch in an existing Oracle Database 10g Oracle home. To identify Oracle home directories, view the /etc/oratab file.
A. Steps to make systems ready to apply the patch -
1. Copy the p5337014_10203_AIX64-5L.zip patch set installation archive to a directory that is not the Oracle home directory or under the Oracle home directory.
2. Enter the following command to extract the installation files:
$ unzip p5337014_10203_AIX64-5L.zip

3. Set the ORACLE_HOME and ORACLE_SID environment variables:
$ ORACLE_HOME=/oracle/product/10.2/db_1
$ ORACLE_SID=GACL1 -- GACL2 in case of oradb2
$ export ORACLE_HOME ORACLE_SID

4. Lock all database users that end users use to connect to the database.

5. Take a full database export using the following command:
exp system/***** file=/backup/full_export/afternoon/export_`date +%F`.dmp log=/backup/full_export/afternoon/explog_`date +%F`.log statistics=none full=y grants=y compress=y constraints=y triggers=y indexes=y consistent=y recordlength=65535 feedback=10000

6. Take a cold backup of the database and copy that backup to Oradbtest.

7. Shut down all processes in the Oracle home on each node that might be accessing a database, for example Oracle Enterprise Manager Database Control or iSQL*Plus:
$ emctl stop dbconsole
$ emctl stop agent
$ isqlplusctl stop

8. Shut down all services in the Oracle home on each node that might be accessing a database:
$ srvctl stop service -d GACL --Not required in Ambuja Cements Ltd.

9. Shut down all Oracle RAC instances on all cluster nodes by entering the following command:
$ srvctl stop database -d GACL

10. Shut down the Automatic Storage Management instances on all cluster nodes by entering the following commands:
$ srvctl stop asm -n oradb1 -i ASM1
$ srvctl stop asm -n oradb2 -i ASM2

11. Stop all node applications on all cluster nodes by entering the following commands as the root user:
# srvctl stop nodeapps -n oradb1
# srvctl stop nodeapps -n oradb2

12. Log in as the root user and shut down the Oracle Clusterware processes by entering the following command on all nodes:
# CRS_home/bin/crsctl stop crs
Wait for a while, as this takes some time to stop.

13. Check the output of $ crs_stat -t
It must show everything OFFLINE.

B. Steps to apply patch on Clusterware.

1. Log in as the oracle user on oradb1 and execute:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add -- provide the pass phrase at this command
2. Log in as root on oradb1 and repeat step 1 -- provide the pass phrase when prompted
3. Log in as the oracle user on oradb2 and execute:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add -- provide the pass phrase at this command
4. Log in as root on oradb2 and repeat step 3 -- provide the pass phrase when prompted
5. Verify passwordless SSH:
On oradb1 -> $ ssh oradb2 date -- this should not ask for a password
On oradb2 -> $ ssh oradb1 date -- this should not ask for a password
6. On oradb1, enter the following commands to start Oracle Universal Installer, where patchset_directory is the directory where we unzipped the patch set software in step A.2:
$ cd patchset_directory/Disk1
$ ./runInstaller
7. Run /usr/sbin/slibclean when prompted -- run this as root.
8. On the Welcome screen, click Next.
9. On the Specify Home Details screen, select the name of the CRS home to update, then click Next.
10. Click Next when the Selected Nodes screen appears.
11. On the Summary screen, click Install. This screen lists all of the patches available for installation.
12. When prompted, run the $CRS_HOME/install/root102.sh script as the root user. Since we are applying the patch set to an Oracle RAC installation, we run the root102.sh script on each node of the cluster. root102.sh will start CRS, nodeapps, ASM, the listeners, the instances and the database.
13. At the End of Installation screen, click Exit, then click Yes to exit Oracle Universal Installer.
14. Execute $ srvctl stop database -d GACL
15. Check the output of $ crs_stat -t ; it should show the two instances and the database down.

C. Steps to apply patch on Oracle RDBMS Home.

1. Check the output of $ crs_stat -t ; it should show the following output:
{node1:root}/oracle -> crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....B1.inst application OFFLINE OFFLINE
ora....B2.inst application OFFLINE OFFLINE
ora.ASMDB.db application OFFLINE OFFLINE
ora....SM1.asm application ONLINE ONLINE node1
ora....E1.lsnr application ONLINE ONLINE node1
ora.node1.gsd application ONLINE ONLINE node1
ora.node1.ons application ONLINE ONLINE node1
ora.node1.vip application ONLINE ONLINE node1
ora....SM2.asm application ONLINE ONLINE node2
ora....E2.lsnr application ONLINE ONLINE node2
ora.node2.gsd application ONLINE ONLINE node2
ora.node2.ons application ONLINE ONLINE node2
ora.node2.vip application ONLINE ONLINE node2

2. Log in as the oracle user on oradb1 and execute:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add -- provide the pass phrase at this command
3. Log in as root on oradb1 and repeat step 2 -- provide the pass phrase when prompted
4. Log in as the oracle user on oradb2 and execute:
$ exec /usr/bin/ssh-agent $SHELL
$ /usr/bin/ssh-add -- provide the pass phrase at this command
5. Log in as root on oradb2 and repeat step 4 -- provide the pass phrase when prompted
6. Verify passwordless SSH:
On oradb1 -> $ ssh oradb2 date -- this should not ask for a password
On oradb2 -> $ ssh oradb1 date -- this should not ask for a password
7. On oradb1, enter the following commands to start Oracle Universal Installer, where patchset_directory is the directory where we unzipped the patch set software in step A.2:
$ cd patchset_directory/Disk1
$ ./runInstaller
8. Run /usr/sbin/slibclean when prompted -- run this as root.
9. On the Welcome screen, click Next.
10. On the Specify Home Details screen, select the name of the Oracle home to update, then click Next.
11. Click Next when the Selected Nodes screen appears.
12. On the Summary screen, click Install. This screen lists all of the patches available for installation.
13. When prompted, run the $ORACLE_HOME/root.sh script as the root user. Since we are applying the patch set to an Oracle RAC installation, we run the root.sh script on each node of the cluster.
14. At the End of Installation screen, click Exit, then click Yes to exit Oracle Universal Installer.

D. Troubleshooting steps -
After applying the patch to the RDBMS home, the CRSD daemon would not come up on node 1, because of which all nodeapps and the database were down. This happened because we ran into bug 5384061, and the OCR had inconsistent entries as a result of this bug. We followed the steps below to restore the OCR from a backup taken before the inconsistency occurred.
On oradb2 the crsd.bin process was core dumping; when the core files were removed ($CRS_HOME/log/), we were able to bring up CRS, but it kept core dumping.

1. We tried to shut down CRS with crsctl stop crs, but failed, because CRSD was not communicating. So we disabled CRSD with -
# /etc/init.crs disable (on both nodes)
# crsctl stop crs (on oradb2)

We were still unable to stop CRS on oradb1, so we rebooted it.

2. At this stage CRSD, EVMD and CSSD were down on both nodes.
3. We took an export of the OCR (from oradb1) using the following command:
# ocrconfig -export backup09092007.ocr -s online
4. Then we imported the same file (on oradb1):
# ocrconfig -import backup09092007.ocr
5. We enabled CRS and then started it on both nodes:
# crsctl enable crs -- oradb1
# crsctl enable crs -- oradb2
# crsctl start crs -- oradb1
# crsctl start crs -- oradb2
6. At this point, the crs_stat -t output showed ONLINE for all nodeapps, and the CRSD, CSSD and EVMD status was healthy.

E. Upgrading the database from 10.2.0.2 to 10.2.0.3
1. Start the listener on both nodes using the following commands:
$ srvctl start listener -n oradb1
$ srvctl start listener -n oradb2
2. Start the database in NOMOUNT stage and set the cluster_database parameter to false (it is a static parameter, so the change goes to the spfile):
SQL> startup nomount
SQL> alter system set cluster_database=false scope=spfile;
SQL> alter system set log_archive_dest_state_2=defer;
3. Then shut down and start the database in upgrade mode, and run the upgrade script:
SQL> shutdown immediate
SQL> startup upgrade
SQL> spool /home/oracle/patch.log
SQL> @?/rdbms/admin/catupgrd.sql
SQL> spool off
4. Shut down and start the database, then run the utlrp.sql script to compile the invalid objects.
5. Take a cold backup of the database and archive logs using RMAN.
6. Set the cluster_database parameter back to true (scope=spfile), then shut down and start both nodes.


F. Steps to apply the patch on the physical standby database

1. Cancel the media recovery process on the standby and shut it down.
2. Shut down all processes in the Oracle home that might be accessing the database, for example Oracle Enterprise Manager Database Control or iSQL*Plus:
$ emctl stop dbconsole
$ emctl stop agent
$ isqlplusctl stop
3. On oradbdr, enter the following commands to start Oracle Universal Installer, where patchset_directory is the directory where we unzipped the patch set software in step A.2:
$ cd patchset_directory/Disk1
$ ./runInstaller
4. Run /usr/sbin/slibclean when prompted -- run this as root.
5. On the Welcome screen, click Next.
6. On the Specify Home Details screen, select the name of the Oracle home to update, then click Next.
7. Click Next when the Selected Nodes screen appears.
8. On the Summary screen, click Install. This screen lists all of the patches available for installation.
9. When prompted, run the $ORACLE_HOME/root.sh script as the root user.
10. At the End of Installation screen, click Exit, then click Yes to exit Oracle Universal Installer.

G. Creating the standby database
1. Take a backup of the current controlfile on the primary for the standby -- do this step on oradb1:
RMAN> backup current controlfile for standby format '/home/oracle/standby_%p';
2. Copy all the backup pieces generated in steps E-5 and G-1 to the standby database server.
3. Start the database in NOMOUNT stage.
4. Restore the standby controlfile using RMAN:
$ rman target /
RMAN> restore standby controlfile from '/home/oracle/standby_1215466';
RMAN> alter database mount;
RMAN> catalog backuppiece 'path of the backuppiece'; -- copied in step 2
RMAN> restore archivelog all;
RMAN> recover standby database;
5. Start the managed recovery process on the standby:
SQL> alter database recover managed standby database disconnect from session;
6. On the primary database, enable the log_archive_dest_state_2 parameter -- do this step on oradb1:
SQL> alter system set log_archive_dest_state_2=enable;

Dedicated and shared server

If you are running 9i or above, there are two parameters (not one) that govern whether or not you are capable of running in shared server mode.

With dispatchers and shared server processes configured, in other words, an instance can "support shared server connection requests". That's rather different from "running in shared server mode". The distinction is important because privileged actions (STARTUP, SHUTDOWN, BACKUP and RECOVER commands) cannot be processed by a shared server process, so an instance that is configured for normal users to use shared server processes must still support connections to dedicated server processes by suitably credentialled users.

If a user does end up connected to a shared server process, there is usually a performance penalty to pay compared to using a dedicated server process. A user submits a query and, instead of it being immediately processed by a server process, it gets submitted to a dispatcher... which promptly sticks it on a job queue! You then have to wait for a shared server process to become free and pick your job off the queue. That's inevitably slower than doing it the dedicated way.

People use shared server as the first line of scaling up their databases, and you're right that it primarily depends on the number of users connected to the server concurrently. In dedicated server mode, a new connection means a new process gets spawned (or a new thread on Windows) and a new connection socket is opened. Servers can only handle so many connection sockets, processes or threads before they start to keel over under the strain. Shared server, as the name suggests, means that new connections do not cause new server processes to be spawned. So 300 users can be serviced by, maybe, 30 or 40 processes in total. If your box would normally keel over handling 300 dedicated connections, then clearly, with that sort of sharing ratio, you'd be able to scale to nearer 3,000 users before it starts wilting.
But it's also a bit subtler than that: a data warehouse would be daft to implement shared server, even if it did have 300+ concurrent users. That's because the users of such systems typically run queries that run for hours... and a shared process that is nabbed to perform one job for hours on end isn't really a shared process any more, is it?! So the rule of thumb as to when to implement shared server is yes, (a) when your concurrent user count starts reaching levels that your server just doesn't seem able to sustain any more AND (b) when you can be sure that the users tend to issue short, sharp queries -say, about 3 seconds or so to process, max.

Again, there are mixed states to get through, too. You might have lots of OLTP-type sub-3-second transactions on the same database on which one or two users regularly run big reports. In that case, you make sure the reporters have a tnsnames.ora that says SERVER=DEDICATED and the OLTP-type people use one that has SERVER=SHARED in it; configure the DISPATCHERS and SHARED_SERVERS parameters for the instance, and then those that can benefit from shared servers do so, and those that wouldn't won't be stealing shared processes from those that can!
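As a sketch of that mixed setup (the service name, hostname and counts below are illustrative, not from a real configuration):

```
-- Instance side: enable shared server alongside dedicated
alter system set shared_servers = 30;
alter system set dispatchers = '(PROTOCOL=TCP)(DISPATCHERS=3)';
```

```
# Client side (tnsnames.ora): reporters force a dedicated connection
REPORTS =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oradb1)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = GACL)(SERVER = DEDICATED)))
```

OLTP users would use an otherwise identical entry with (SERVER = SHARED), or simply omit the SERVER clause and take whatever the instance hands out.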

The alternative approach for those with more cash is to go and buy better server hardware that can cope with the user community numbers! Shared Server configuration, however, comes free. You pays your money and you takes your choices!

-- Doc by HJR
