Tuesday, November 26, 2019

Dataguard and LOG_ARCHIVE_MAX_PROCESSES


Any DBA who has set up Dataguard has seen this diagram of how the sending of logs to standby database works.

What I wanted to go through is the use of LOG_ARCHIVE_MAX_PROCESSES and how how it affects the sending of redo logs.

If you have set this parameter in your environment it is typically because you are seeing issues with the database cleaning out redo logs.
The primary reason this gets set from the default (4) is to add more processes capable of clearing out redo logs and making them archive logs.  The more processes running in parallel to clean out redo logs, the faster  the database can move on to start a new redo log.

But did you know the LOG_ARCHIVE_MAX_PROCESSES parameter also affects how many processes are used on your standby database to fill any gaps ?

In my example I have a primary database (BSG18) which on 18.3.0.  I also have a standby database (BSG18d).

In my primary database I set the parameter to 10.

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
log_archive_max_processes            integer     10


I then shutdown my dataguard database, and performed many log switches. I needed at least 10 log switches to demonstrate what happens.

Now once I start up my standby database, the LOG_ARCHIVE_MAX_PROCESSES is used to determine how many processes on my standby database to fill the gap.

Here is my query from V$DATAGUARD_PROCESS

select name,
           pid "standby PID",
           role, client_pid "Primary DB PID",
           client_role
    from V$DATAGUARD_PROCESS
    where type='NET'
order by 2;


NAME  standby PID              ROLE                     Primary DB PID CLIENT_ROLE     
----- ------------------------ ------------------------ -------------- ----------------
rfs   18116                    RFS async                          4165 async ORL multi 
rfs   26337                    RFS ping                           4153 gap manager     
rfs   26401                    RFS archive                        4159 archive gap     
rfs   26404                    RFS archive                        8752 archive gap     
rfs   26408                    RFS archive                        8738 archive gap     
rfs   26411                    RFS archive                        8750 archive gap     
rfs   26413                    RFS archive                        8740 archive gap     
rfs   26415                    RFS archive                        8748 archive gap     
rfs   26417                    RFS archive                        8742 archive gap     
rfs   26419                    RFS archive                        8744 archive gap     
rfs   26421                    RFS archive                        8746 archive gap     


If you look at the output, you can see all the pieces fit together.


Real Time redo process


The first process sends the redo information to the standby database in async mode, and is written to the standby redo logs.


oracle    4165     1  0 Nov05 ?        00:05:57 ora_tt02_bsg18


We can see on the primary database it is a "tt" process and starts up with the database.

oracle   18116     1  0 Nov12 ?        00:07:58 oraclebsg18d (LOCAL=NO)


and for the standby database , it is a sqlnet process.

Gap Manager process


Now let's look at the second process.

oracle    4153     1  0 Nov05 ?        00:00:26 ora_tt00_bsg18


We can see on the primary database it is also a "tt" process and also starts up with the database.

oracle   26337     1  0 Nov12 ?        00:00:18 oraclebsg18d (LOCAL=NO)

and for the standby database , it is a sqlnet process.


Gap fill processes


Now let's look at the remaining processes.


oracle    4159     1  0 Nov05 ?        00:00:14 ora_arc1_bsg18
oracle    8738     1  0 Nov05 ?        00:00:13 ora_arc2_bsg18
oracle    8740     1  0 Nov05 ?        00:00:14 ora_arc3_bsg18
oracle    8742     1  0 Nov05 ?        00:00:13 ora_arc4_bsg18
oracle    8744     1  0 Nov05 ?        00:00:13 ora_arc5_bsg18
oracle    8746     1  0 Nov05 ?        00:00:13 ora_arc6_bsg18
oracle    8748     1  0 Nov05 ?        00:00:13 ora_arc7_bsg18
oracle    8750     1  0 Nov05 ?        00:00:13 ora_arc8_bsg18
oracle    8752     1  0 Nov05 ?        00:00:13 ora_arc9_bsg18


When I look at the primary database, I can see 9 "arc" processes.
Why 9 processes? because LOG_ARCHIVE_MAX_PROCESSES of 10 tells the primary database that it can use 10 processes to fill the gap on the standby.
 The 1st process is the gap manager "tt', and the remaining 9 processes "arc" are used to send over the archive logs.

Increasing the LOG_ARCHIVE_MAX_PROCESSES can be used to increase the parallelization of sending archive logs to fill any gaps.



1 comment:

  1. If this is a real problem on the database, I would look carefully at a few things before I adjusted the log_archive_max_processes setting. Look at the IO speed on the DASD, look at the network speed to understand what is taking it so long, and consider increasing the number of redo log groups and standby redo log groups. In any reasonable environment, the log_archive_max_processes setting is not a value that should need tuing.

    ReplyDelete