Oracle Dataguard TAF (Transparent Application Failover) Issues


We have configured Oracle TAF (Transparent Application Failover) for a Data Guard database so that the application can keep using the same service name and switch to the standby database if the primary has a problem. However, we are seeing an unusual problem: application servers in the same datacenter as the database can connect, but servers in a different datacenter fail to connect through the TAF service. After the 90-second timeout interval, the connection attempt moves on to the standby host and fails.

Connections using the direct hostname and SID work perfectly fine, even across datacenters.

Error :

Caused by: java.io.IOException: Socket read timed out, socket connect lapse 3 ms. plx9852.xyz.com/135.167.30.103 1524 3 1 true 
at oracle.net.nt.TcpNTAdapter.connect(TcpNTAdapter.java:209) 
at oracle.net.nt.ConnOption.connect(ConnOption.java:161) 
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:470) 
... 54 more

TNS entry:

pcdrest_taf.db.xyz.com=
  (description=
    (connect_timeout=90)(retry_count=30)(retry_delay=3)(transport_connect_timeout=3)
    (load_balance=off)(failover=on)
    (address_list=
      (address=(protocol=tcp)(host=plx9843.xyz.com)(port=1524))
      (address=(protocol=tcp)(host=plx9852.xyz.com)(port=1524)))
    (connect_data=
      (service_name=pcdrest_taf.db.xyz.com)
      (failover_mode=(type=select)(method=basic))))

Connection string on the application, using LDAP:

spring.datasource.jdbcUrl=jdbc:oracle:thin:@ldap://polarx.xyz.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx1.xyz.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx2.sbc.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx3.sbc.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx4.sbc.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com ldap://polarx5.sbc.com:3060/pcdrest_taf,cn=OracleContext,dc=db,dc=xyz,dc=com 

Answer by ibre5041:

Beware that Oracle changed the meaning of transport_connect_timeout from seconds to milliseconds, without any warning, in release 12.1. So if you use that version, there is no way to tell whether 3 means seconds or milliseconds.

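From version 12.2 onward, your value of 3 is read as milliseconds, which is too low. The "socket connect lapse 3 ms" in your stack trace is consistent with this: a TCP connect inside one datacenter may complete in under 3 ms, while one across datacenters probably will not.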
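To check whether a 3 ms budget is realistic from the other datacenter, one option is to time a plain TCP connect to the listener from one of the failing application servers. The following is only a rough sketch using java.net; the class name ConnectProbe and the method connectMillis are made up for illustration, the default host and port are taken from the tnsnames entry in the question, and this is not what the JDBC driver does internally.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the Oracle JDBC driver.
public class ConnectProbe {

    /**
     * Times a raw TCP connect to host:port with the given timeout.
     * Returns the elapsed time in milliseconds, or -1 if the connect
     * failed or timed out.
     */
    static long connectMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers timeouts, refused connections and unknown hosts.
            return -1;
        }
    }

    public static void main(String[] args) {
        // Defaults are the standby host and port from the question (assumption).
        String host = args.length > 0 ? args[0] : "plx9852.xyz.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1524;

        // Compare a 3 ms budget with a 3 second budget.
        for (int timeoutMs : new int[] {3, 3000}) {
            long elapsed = connectMillis(host, port, timeoutMs);
            System.out.println(host + ":" + port + " timeout=" + timeoutMs
                    + " ms -> " + (elapsed < 0 ? "failed" : elapsed + " ms"));
        }
    }
}
```

If the 3 ms attempt fails while the 3000 ms attempt succeeds from the remote datacenter, that would support the idea that the timeout value, rather than the service itself, is the problem.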

Moreover there were several bugs in Oracle JDBC driver related to TAF. For example:

  • Bug 12998506 RETRY_COUNT connection parameter is total number of connection attempts when using JDBC thin

Description:

The RETRY_COUNT connection parameter is the number of additional times a connection attempt should be made after the initial attempt has failed. Therefore if RETRY_COUNT is 2 a maximum of 3 connection attempts will be made for each address in the ADDRESS_LIST. However JDBC thin takes RETRY_COUNT to mean the total number of connection attempts so, in the above example, JDBC thin will make a maximum of 2 attempts for each address instead of the expected 3.

This is a follow-on from bug 12760352, where addresses in the ADDRESS_LIST were being retried in the wrong order when using JDBC thin (e.g. if the address list contained A and B, JDBC thin would attempt connections as A A ... B B ... instead of A B A B ...).
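The counting and ordering described in these two bug notes can be illustrated with a small sketch. It only models the arithmetic and the attempt order from the descriptions above; it is not driver code, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: models the behaviour described in the bug notes.
public class RetryOrderSketch {

    /** Expected order: walk the whole address list on every pass (A B A B ...). */
    static List<String> roundRobin(List<String> addresses, int attemptsPerAddress) {
        List<String> attempts = new ArrayList<>();
        for (int pass = 0; pass < attemptsPerAddress; pass++) {
            attempts.addAll(addresses);
        }
        return attempts;
    }

    /** Order described in bug 12760352: exhaust one address before the next (A A ... B B ...). */
    static List<String> addressByAddress(List<String> addresses, int attemptsPerAddress) {
        List<String> attempts = new ArrayList<>();
        for (String address : addresses) {
            for (int i = 0; i < attemptsPerAddress; i++) {
                attempts.add(address);
            }
        }
        return attempts;
    }

    /** Documented meaning: the initial attempt plus RETRY_COUNT additional attempts. */
    static int expectedAttemptsPerAddress(int retryCount) {
        return retryCount + 1;
    }

    /** Meaning described in bug 12998506: RETRY_COUNT is the total number of attempts. */
    static int thinBugAttemptsPerAddress(int retryCount) {
        return retryCount;
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList("A", "B");
        int retryCount = 2;

        // Expected: 3 attempts per address, interleaved.
        System.out.println(roundRobin(addresses, expectedAttemptsPerAddress(retryCount)));
        // prints [A, B, A, B, A, B]

        // Bug 12998506: only 2 attempts per address.
        System.out.println(roundRobin(addresses, thinBugAttemptsPerAddress(retryCount)));
        // prints [A, B, A, B]

        // Bug 12760352: wrong order, each address exhausted before the next.
        System.out.println(addressByAddress(addresses, 2));
        // prints [A, A, B, B]
    }
}
```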

PS: The retry_delay parameter seems to be ignored by JDBC drivers from version 12c onward.