1. Oracle SOA 10g General FAQ Clarifications
This paper covers the following topics:
General FAQs
Oracle BPEL Process Manager
Enterprise Service Bus
Adapters
Troubleshooting
2. General
2.1
What are the Basic Steps for a Clustered Installation of BPEL, ESB and Web
Services Manager Components?
The BPEL, ESB and Web
Services Manager components have different clustering
characteristics and each
requires specific installation and configuration steps to set up
their HA cluster. The Enterprise Deployment Guide 10.1.3.3 describes these steps in
great detail; see Figure 1-1 there for a representative architecture diagram.
The basic installation
steps for each node in a 10.1.3.3 cluster are as follows:
1. Install Application Server 10.1.3.1
2. [Optional] Install Oracle HTTP Server in a separate Oracle Home
3. Install BPEL Process Manager 10.1.3.1
4. Install Enterprise Service Bus runtime 10.1.3.1
5. Install Web Services Manager 10.1.3.1
6. Install the Application Server 10.1.3.3 / 10.1.3.4 patchset
7. Install Enterprise Service Bus 10.1.3.3 / 10.1.3.4 designtime
8. [Optional] Apply the latest 10.1.3.3 / 10.1.3.4 MLR patchset
Note the Enterprise
Deployment Guide 10.1.3.3 is currently the latest version. The steps
are identical for a new
install of 10.1.3.4.
2.2
Can I Use the Application Server "J2EE, Web Server and SOA Suite"
Advanced
Installation Type for a Clustered Setup?
No, the "J2EE, Web
Server and SOA Suite" installation type should never be chosen for
SOA Suite clustered
installations. Instead it's important to start with the advanced
installation option of
"J2EE and Web Server" or “J2EE” for each SOA Suite node in the
cluster, then separately
install BPEL, ESB and Web Services Manager into the
appropriate application
server Oracle Homes. See the Enterprise Deployment Guide for
details.
2.3
Do I Need to Install Oracle HTTP Servers in Separate Oracle Homes?
If you intend to deploy
Oracle HTTP Server (OHS) outside a firewall, then you must
install another Oracle
Home with OHS in that network zone. If not, it’s generally easier
to install OHS and the SOA
Suite components into the same Oracle Homes. For that
scenario, you can start
with the application server advanced installation for “J2EE and
Web Server”, and then
install the SOA components into the same Oracle Home.
2.4
Can BPEL, OWSM and ESB-Runtime Components be Deployed to the
Same
OC4J Container?
Yes, they can be deployed together, although your performance goals might require
separating these components into different OC4J containers to expand available memory
resources.

Only the ESB designtime component must be deployed into a separate OC4J container
to operate in an active-passive failover configuration.
2.5
What is an ONS Topology?
An ONS topology or
OPMN/ONS Cluster is a group of Oracle Application Server
instances that communicate with each other through ONS. ONS topologies share
availability notifications across the participants in the topology; this enables dynamic
routing from OHS to OC4J (i.e. routing behavior that changes dynamically without
configuration changes) and dynamic discovery of application server and OHS instances.
2.6
What are the Differences between Application Server Clustering and BPEL or ESB
Clustering?
Application server clustering is independent of BPEL clustering, but in practice both are
almost always used together. Setting up an application server cluster does not, however,
enable nodes to share BPEL-specific data.
Similarly, application
server clustering is independent of ESB runtime clustering. ESB
runtime (ESBRT) nodes and
ESB designtime (ESBDT) nodes are clustered by
configuring each ESBRT /
ESBDT to share the same Metadata repository database. This
establishes the common
persistence layer for all ESB servers to share the same AQ
messaging system used for
group communication.
Application server
clustering is used for active-passive failover of the ESBDT servers.
The use of OC4J groups and
Application Server clustering allows OPMN to ensure only
one ESBDT instance is
active at any time and also controls the startup of the standby
ESBDT when the active
ESBDT goes down.
2.7
Is Time Synchronization Important between Nodes in a SOA Cluster?
Yes, each node in the SOA
cluster must have accurate time synchronization. BPEL
process activities can have time-sensitive behavior, such as wait activities and
time-dependent process scheduling. Two nodes without time synchronization may not
properly execute BPEL processes.
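A minimal sanity check, assuming all nodes share the same dehydration store database, is to compare the database clock against each node's operating system clock:

-- Run from each cluster node against the shared dehydration database,
-- then compare db_time with the local OS clock on that node.
select systimestamp as db_time from dual;

Any per-node offset larger than your tolerance indicates the node clocks need correction (e.g. via NTP).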
2.8
What Load Balancer Configurations are Required?
The Enterprise Deployment
Guide 10.1.3.3 covers load balancer configurations in
sections 2.4, 3.1, 3.9, 3.10, 3.12, 4.1, 4.3.1, and 4.3.3.
3 Oracle BPEL Process Manager
3.1
Does BPEL Clustering Require an Application Server Cluster?
BPEL clustering operates
whether or not it runs on top of an application server cluster.
The BPEL cluster nodes
communicate via jGroups technology and share the same
dehydration store.
Note that in BPEL 10.1.3.5.1, jGroups was replaced with an internal database-based
mechanism to provide cluster communication between BPEL nodes. See the section
"Replacement of JGroups for BPEL Clustering…" in this paper for more details.

In practice, an application server cluster is almost always used, since it handles the
routing of OHS requests to the available BPEL instances. Without an application server
cluster, manual routing rules must be configured from OHS to the OC4J instances
running BPEL.
Thus with application
server clustering configured, when an OC4J container running
BPEL goes down, SOAP/HTTP
requests coming into an OHS instance in the cluster will
be automatically routed to
another available BPEL node.
3.2
Are there Other Advantages to Running an Application Server Cluster
with
a BPEL Cluster?
When using the BPEL Java
API to invoke BPEL processes, you can make use of the
oracle.j2ee.rmi.loadBalance
JNDI property and the OPMN provider URL to achieve
load balancing.
3.3
Why is it Necessary to Configure jGroups for BPEL Clustering Since the
BPEL
Engine is Stateless?
The BPEL jGroups cluster
allows one node to notify other cluster nodes of specific
process and domain
changes:
- Process deployment or undeployment
- Creation or deletion of BPEL domains
- BPEL domain property settings
- bpel.xml property changes for a BPEL project

These process and domain notifications are passed using jGroups communication
between nodes.
Note in BPEL 10.1.3.5.1,
jGroups was replaced with an internal database-based
mechanism to provide
cluster communication between BPEL nodes. See the section
“Replacement of JGroups
for BPEL Clustering…” in this paper for more details.
3.4
How is a BPEL Cluster Defined?
BPEL clustering is achieved with the appropriate configuration in the
jgroups-protocol.xml and collaxa-config.xml files for each node, and by pointing all
nodes to a common dehydration store database. These clustering configuration steps are
covered in the Enterprise Deployment Guide.
Note in BPEL 10.1.3.5.1,
jGroups was replaced with an internal database-based
mechanism to provide
cluster communication between BPEL nodes. See the section
“Replacement of JGroups
for BPEL Clustering…” in this paper for more details.
3.5
What Happens When a New BPEL Process is Deployed While Another
Node
is Down?
When the down node is
started, it will synchronize with the dehydration store database
and deploy the new process
onto itself.
3.6
Can I Cluster a BPEL Process by Deploying it into Two Domains of the
Same
BPEL Server?
No. Domains represent a
grouping of BPEL processes; creation of a new domain does
not add another physical
node to a cluster for high-availability.
In addition, the full path of a BPEL process includes the domain name, process name,
and version extension. Processes that vary in any one of these three elements are
considered different processes and therefore cannot be clustered across different domains.
3.7
What Artifacts are Included in the Dehydration Store Database?
The following artifacts
are stored in the BPEL dehydration database:
- BPEL deployment suitcases, including bpel.xml
- Domain definitions (domains table)
- BPEL instance metadata (cube_instance table)
- BPEL instance audit trails (audit_trail table)
- Asynchronous invoke messages (invoke_message table)
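For illustration, the following queries are a minimal sketch of how these tables can be inspected, assuming access to the standard orabpel dehydration schema; the state column and its codes are release-specific:

-- Instance counts by state in the dehydration store
select state, count(*) from cube_instance group by state;

-- Volume of pending asynchronous invoke messages
select count(*) from invoke_message;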
3.8
Can I Add New Nodes to the BPEL Cluster without Shutting Down the
Dehydration
Database or Existing BPEL Nodes?
In general, yes. You need to configure the new node to point to the existing dehydration
database and properly configure the jgroups-protocol.xml and collaxa-config.xml files.

jGroups supports two different network protocols for cluster communication, TCP and
UDP. If TCP is used, the node list in the jgroups-protocol.xml file must be updated on
all existing nodes to include the new node. In this case, the existing BPEL nodes must be
restarted to reload the jGroups configuration. Since UDP uses multicast, no such restart
is necessary.

In BPEL 10.1.2, you must copy the BPEL suitcase file (the jar file under
orabpel/domains/domain_name/deploy) from the existing nodes to the new node. In all
versions, you must ensure that the process design does not include hard-coded values,
such as a host name, that are only suitable for a specific node.
3.9
When a BPEL Node Goes Down, How do Other Nodes Get the Instance
and
Continue Execution?
If the BPEL process
instance is waiting at a mid-process receive activity when the BPEL
node goes down, the
process instance is not in an active JTA transaction. The process
instance remains at the
dehydrated state until the message for which it is waiting
arrives. Upon arrival of
the message, an available BPEL node retrieves the instance from
the dehydration store and
continues processing.
If the BPEL process is in
the middle of an active JTA transaction when the server goes
down, the transaction is
rolled back. If this is an asynchronous invocation, the instance
is listed in the manual
recovery page of the BPEL console. From this page an
administrator can resubmit
the process and an available BPEL server in the cluster can
continue execution.
Recovery begins from the latest dehydration point or, if no
dehydration point was
reached before failure, the process starts at the beginning. The
recovery process can be
automated through custom programs utilizing the BPEL API.
If this is a synchronous
invocation, the client receives an error and is responsible for
resubmitting the message.
If the process instance is waiting at a wait activity, then you must update the timer
table of one of the nodes so that the wait activity awakens on that node. Otherwise,
even after the timer expires, none of the nodes will complete the wait activity until a
manual recovery is performed.
3.10
How Does an Administrator Perform Manual Recovery of a BPEL
Process?
Go to the BPEL Console,
navigate to the Process tab and then click the Perform Manual
Recovery link in the lower
left. The pending instances on this page are categorized by
invoke messages, callback,
and activity.
Exercise caution when performing manual recovery. The BPEL console recovery page
shows all messages for which the corresponding instance is not completed. This
includes instances that are still in flight and have not yet completed a transaction. Use
appropriate criteria to recover the right messages (e.g. inspect the message age or
payload via the BPEL Java API) to determine whether the instance has truly rolled back
and is in need of recovery.
3.11
Does TCP have Advantages Over UDP as the jGroups Protocol?
UDP multicast does not
work when clustered nodes are located on different network
subnets (although special
network configuration can make this possible).
Adapter clustering also
uses jGroups to enable active-passive (singleton) adapter
endpoints. If there are
more than 2 (two) singleton adapters defined in an adapter
cluster, it is recommended
to use UDP instead of TCP. See the Adapters section in this
paper for more information
on adapter clustering.
3.12
How is BPEL Configured to Work with a Real Application Cluster (RAC)
Database?
This configuration is covered in the Enterprise Deployment Guide 10.1.3.3 section 3.29
and the Oracle SOA Suite XA and RAC Guide.
3.13
Are Modifications to the Process Deployment Descriptor Propagated to
all
Nodes of the BPEL Cluster?
Yes. Changes to the bpel.xml deployment descriptor file, such as the number of
activation agents, are synchronized across all nodes in the BPEL cluster automatically.
3.14
Are Modifications to collaxa-config.xml or the BPELAdmin Console
Propagated
to all Nodes in the BPEL Cluster?
No. Changes to the
collaxa-config.xml or BPELAdmin console properties must be
applied to each node
manually.
3.15
Replacement of JGroups for BPEL Clustering and BPEL Singleton
Adapters
Communication in 10.1.3.5.1
Starting in version
10.1.3.5.1 BPEL clustering and BPEL singleton adapters no longer use
jGroups for cluster
communication and singleton adapter coordination between the
nodes of a BPEL cluster.
Note that ESB singleton
adapters still use jGroups; this remains unchanged from
previous versions.
Version 10.1.3.5.1 is defined as either the application of the 10.1.3 Oracle Application
Server patch set 10.1.3.5.0 (patch ID 8626084) plus Integration patch MLR #1 (patch ID
9034573), or the 10.1.3.5.1 WebLogic SOA installation. This change is realized by the
fix for bug 8608385, which is included in 10.1.3.5.1.

Implementation of this feature does not require any configuration on the user's part.
Once a system has version 10.1.3.5.1 applied, the functionality is fully in effect.
jGroups is replaced as the
means of propagating cluster messages across BPEL nodes
with a database-based
solution where cluster messages are inserted into a database
table (a BPEL cluster is
by definition nodes that share the same dehydration store
database). Each BPEL node
has a specific thread polling the database table for new
cluster messages. The same
database-based infrastructure is also used by cluster nodes
to nominate a master node
for the cluster. The master node will be the node where
adapter active-passive
(singleton) activations will be enabled.
Three new tables are
introduced into the orabpel schema to provide this functionality:
CLUSTER_MASTER – Contains
the NODE_ID of the BPEL cluster node that has the
active polling endpoint
for inbound singleton adapters. Adapter clusters are still
defined as before using a
clusterGroupId for the activation agent of the adapter in
bpel.xml. The master node
will change if the current master node is taken out of the
cluster and will be
replaced by one of the other active cluster nodes.
CLUSTER_MESSAGE – Keeps
track of changes made to the BPEL cluster at the BPEL
system and domain level
that need to be distributed to all cluster nodes via the posting
of the cluster messages.
Columns include DOMAIN_ID, NODE_ID, MSG_TYPE,
MSG_TEXT, and MSG_DATE.
CLUSTER_NODE – Maintains a
list of active nodes in the cluster by NODE_ID and
IP_ADDRESS. Nodes are
added and removed from this table as nodes become active or
inactive. One of the
NODE_IDs in this table will also be designated as the master node
in the CLUSTER_MASTER
table.
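For illustration, the following queries are a sketch, assuming access to the orabpel schema, of how to check cluster membership and the master node using the columns described above:

-- Active nodes currently registered in the BPEL cluster
select node_id, ip_address from cluster_node;

-- Node currently holding the singleton adapter activations
select node_id from cluster_master;

-- Recent cluster messages posted for distribution
select domain_id, node_id, msg_type, msg_date
from cluster_message
order by msg_date desc;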
Cluster messages added to
the CLUSTER_MESSAGE table include the following:
BPEL domain
creation/deletion from /BPELAdmin
BPEL system logger level
changes from /BPELAdmin
Process
deployment/undeployment via ANT, JDeveloper, or /BPELConsole
Process state on/off and
process Lifecycle Active/Retired from /BPELConsole
Clearing of WSDL cache
from /BPELConsole
Process Descriptor changes
from /BPELConsole
BPEL domain logger level
changes from /BPELConsole
BPEL domain configuration
(domain.xml) changes from /BPELConsole
Rows from the
CLUSTER_MESSAGE table are removed automatically at regular
intervals once their
distribution requirements have been satisfied.
Formerly, when debugging a jGroups problem, you would set the logger
"org.collaxa.thirdparty.jgroups" to Debug level in the BPELAdmin console. For the
new database-based solution, set the logger "collaxa.cube.cluster" to Debug in
BPELAdmin. These messages are logged to the BPEL container logfile, e.g.
$ORACLE_HOME/opmn/logs/default_group~oc4j_soa~default_group~1.log.
4 Enterprise Service Bus
4.1
What is the ESB Multi-Tier Deployment Architecture?
Oracle ESB has a
multi-tiered architecture which makes it flexible for managing service
metadata separate from
runtime instances. The ESB designtime server (ESBDT)
provides interfaces for
all metadata changes from JDeveloper, ant-based import/export
or browser-based ESB
Control changes. The ESBDT communicates with the ESB
runtime server (ESBRT)
using JMS and provides to the ESB management console for
instance tracking, error
management and failed message resubmission.
The ESBRT loads an
in-memory cache at startup that contains all service metadata and
artifacts such that
services run straight from memory. Subsequently all interactions with
the Metadata server goes
through JMS, which offloads instance tracking and error
handling from the runtime
server. Additionally, the runtime server is highly available
and scalable and
guarantees message delivery across a distributed cluster topology.
In addition, there is a data tier that consists of a database and a WebDAV server. The
database stores the ESB metadata and hosts the AQ messaging system used in a
clustered configuration. The WebDAV server publishes ESB service-related artifacts
such as WSDLs and XSDs.
4.2
What are the Characteristics of an ESB Cluster?
An ESB deployment has two
types of components:
ESB Runtime Server that
supports active-active and active-passive
configurations.
ESB Designtime Server
that supports active-passive configuration. This
component is sometimes
referred to as the ESB Repository Server.
The ESB Designtime Server (ESBDT) and ESB Runtime Server (ESBRT) must be run
in different OC4J containers. In version 10.1.3.1, the ESBDT and ESBRT must also be
installed into different Oracle Homes.
Starting in 10.1.3.3, the
ESBRT and ESBDT components can run in the same Oracle
Home, but in different
OC4J containers.
All ESBDTs and ESBRTs in a
cluster share the same Metadata repository and the same
AQ-JMS as the underlying
messaging system.
4.3
The ESB Standalone Installer Prompts for 3 Different Installation
Options.
Which Option Should I Choose?
The graphical Oracle Universal Installer for ESB standalone prompts for 3 different
installation options: Repository, Runtime, and Repository and Runtime.
If you are following the
Enterprise Deployment Guide, the Runtime option is used to
install ESBRT into the
same OC4J container as BPEL PM within an existing Oracle
Home. After upgrading to
the 10.1.3.3 patchset, the ESBRT can be installed via ant by
following section 3.19 of
the Enterprise Deployment Guide 10.1.3.3.
The Repository option
should only be chosen when installing the ESBDT into its own
Oracle Home. Starting in
10.1.3.3, that is no longer required. Note the 10.1.3.1 graphical
installer will not allow
an ESBDT installation into an existing Oracle Home with ESBRT
already deployed. For this
reason you must install ESBDT via ant deployment after the
10.1.3.3 patchset is
applied, as described in the Enterprise Deployment Guide 10.1.3.3
section 3.19.
The Repository and Runtime
option should never be selected for high-availability
installations. This
installs ESBRT and ESBDT into the same OC4J Container, which is
not a valid HA
configuration.
4.4
Why Does ESB Clustering Require AQ JMS Messaging?
Before AQ JMS (i.e. database persistence) is configured, the individual ESB nodes use
their own local JMS artifacts with file-based persistence. This is not a shared repository
and does not offer messaging consistency or shared data. In order to configure a highly
available and shared system, database persistence must be set up for all ESBRTs and
ESBDTs. This ensures the ESB JMS artifacts point to the same place in the database for
use in the cluster. When the database itself is set up in a highly available fashion using
RAC, this provides all ESB components a common place to access JMS artifacts and a
redundant database service with continuity after a RAC node failure.
4.5
Why is it Necessary to Add “service-failover="1"” and Remove the
“numprocs="1"”
in opmn.xml to Configure the ESBDT Cluster?
Adding service-failover="1"
configures OPMN to allow only one active ESBDT at any
time in the cluster. The
numprocs property defines the number of JVMs for that OC4J
container. Active-passive
topologies cannot be enforced if a container is running
multiple JVMs. Removing
the numprocs="1" is required to ensure the value is not
changed in the future.
If the numprocs property
is not removed in conjunction with adding the service-failover
attribute, OPMN will not
start the container. Use the following command to verify if
you have properly
configured your opmn.xml file.
> opmnctl validate
Note the active-passive
setup of ESBDT is not achieved until the nodes are clustered via
OPMN by either multicast
or static discovery.
4.6
Within the orion-application.xml File, What Does the primary_oc4j Setting
Indicate?
This only applies to
version 10.1.3.1 where the primary_oc4j flag setting of "true" means
the application should
function as an ESBDT instance, and a setting of "false" means the
application should
function as an ESBRT instance. Starting in 10.1.3.3, this distinction is
no longer manually
configured and the primary_oc4j parameter is not relevant and can
be ignored.
4.7
Where do I Find the create_esb_topics.sql File? How do I Verify if the
ESB
Topics Have been Created?
The Enterprise Deployment
Guide section 3.22 references the create_esb_topics.sql file.
This file is located at
$ORACLE_HOME/integration/esb/sql/oracle. To verify if the ESB
topics exist, login to the
database as user oraesb and execute the following SQL
statement:
select object_name, object_type
from dba_objects
where object_type like 'QUEUE'
and object_name like '%ESB_%';

In the resulting output, verify that the following queues exist: ESB_JAVA_DEFERRED,
ESB_CONTROL, ESB_ERROR, ESB_ERROR_RETRY, ESB_MONITOR.
4.8
What is the Advantage to Deploying a Load-Balancer in Front of the
ESBDT
Nodes?
The ESBRT nodes poll the ESBDT node for certain metadata information during
runtime (e.g. service WSDLs and XSDs for BPEL processes). By incorporating a load
balancer in front of the ESBDT active and passive nodes, you can virtualize the IP
address of the ESBDT node, which enables a more seamless ESBDT failover. Without
the virtual IP address, when a failover occurs an administrator will need to manually
reconfigure the ESBRT nodes to poll the new ESBDT active node.
4.9
How Can I Manually Configure the ESBRT Nodes to Point to a New
ESBDT?
Run the following SQL script within the oraesb schema, replacing the param_value
parameters with your appropriate values:

update esb_parameter set param_value = '<ESBDT hostname>'
where param_name = 'DT_OC4J_HOST';

update esb_parameter set param_value = '<ESBDT port>'
where param_name = 'DT_OC4J_HTTP_PORT';

Restart all ESB instances after updating these parameters.
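Before restarting, a quick query against the same table (within the oraesb schema) confirms the new values:

select param_name, param_value
from esb_parameter
where param_name in ('DT_OC4J_HOST', 'DT_OC4J_HTTP_PORT');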
4.10
What is the Runtime Interaction between ESBRT and ESBDT?
The primary mechanism of
interaction between ESBDT and the ESBRT nodes is through
JMS topics.
The ESBDT subscribes to
both a monitor and error topic to which all ESBRT instances
publish their monitoring
and error messages. ESBDT then persists that information to
the database.
The ESBRT subscribes to
both an administration and retry topic to which the ESBDT
publishes control and
retry messages.
As mentioned above, the
ESBRT nodes also poll the ESBDT node for certain metadata
information during runtime
(e.g. service WSDLs and XSDs for BPEL processes).
4.11
Is it Necessary to Install ESBRT and BPEL in the Same Oracle Home?
No. However, depending on
the version, ESBRT clustered adapters will attempt to use
the jgroups-protocol.xml
file in the normal location of a BPEL installation
($OH/bpel/system/config/jgroups-protocol.xml).
If not present, the adapters will
default to an internal
configuration that uses UDP.
4.12
Is it Necessary to Install ESBRT and BPEL in the Same OC4J Container?
No, but when ESBRT and
BPEL are installed into the same container, the following
benefits apply:
Native Java calls between
ESB and BPEL
Transaction propagation
between ESB and BPEL
XREF and DVM functions are
available in BPEL
For interactions between
BPEL and ESB instances, drill-through from the ESB
console flow trace to BPEL
console flow trace and vice-versa
4.13
What are the Options for Updating ESB Metadata, i.e. the
oraesb.esb_parameter
Table?
Option 1: use the provided ant utility as described in the Enterprise Deployment Guide
section 3.24.

Option 2: use the following SQL script. Note you must replace the hostname and port
parameter values with the hostname and port used to access your ESBDT instance.

delete esb_parameter where param_name = 'PROP_NAME_DEFERRED_TOPIC_JNDI';
delete esb_parameter where param_name = 'PROP_NAME_INITIAL_CONTEXT_FACTORY';
delete esb_parameter where param_name = 'ACT_ID_RANGE';

insert into esb_parameter
values ('PROP_NAME_DEFERRED_TOPIC_JNDI', 'ESBTopics/Topics/ESB_JAVA_DEFERRED');
insert into esb_parameter
values ('PROP_NAME_INITIAL_CONTEXT_FACTORY', 'com.evermind.server.rmi.RMIInitialContextFactory');
insert into esb_parameter values ('ACT_ID_RANGE', '400');

update esb_parameter set param_value = '<hostname of the load balancer serving the ESBDT>'
where param_name = 'DT_OC4J_HOST';
update esb_parameter set param_value = '<load balancer port serving ESBDT>'
where param_name = 'DT_OC4J_HTTP_PORT';
update esb_parameter set param_value = 'OracleOJMS/TCF'
where param_name = 'PROP_NAME_DEFERRED_TCF_JNDI';
update esb_parameter set param_value = 'OracleOJMS/XATCF'
where param_name = 'PROP_NAME_DEFERRED_XATCF_JNDI';
update esb_parameter set param_value = 'ESBTopics/Topics/ESB_CONTROL'
where param_name = 'PROP_NAME_CONTROL_TOPIC_JNDI';
update esb_parameter set param_value = 'OracleOJMS/XATCF'
where param_name = 'PROP_NAME_CONTROL_TCF_JNDI';
update esb_parameter set param_value = 'ESBTopics/Topics/ESB_ERROR'
where param_name = 'PROP_NAME_ERROR_TOPIC_JNDI';
update esb_parameter set param_value = 'OracleOJMS/TCF'
where param_name = 'PROP_NAME_ERROR_TCF_JNDI';
update esb_parameter set param_value = 'OracleOJMS/XATCF'
where param_name = 'PROP_NAME_ERROR_XATCF_JNDI';
update esb_parameter set param_value = 'ESBTopics/Topics/ESB_ERROR_RETRY'
where param_name = 'PROP_NAME_ERROR_RETRY_JNDI';
update esb_parameter set param_value = 'OracleOJMS/XATCF'
where param_name = 'PROP_NAME_ERROR_RETRY_TCF_JNDI';
update esb_parameter set param_value = 'ESBTopics/Topics/ESB_MONITOR'
where param_name = 'PROP_NAME_MONITOR_TOPIC_JNDI';
update esb_parameter set param_value = 'OracleOJMS/TCF'
where param_name = 'PROP_NAME_MONITOR_TCF_JNDI';

update wf_agents set tcf_jndi = 'OracleOJMS/XATCF' where queue_type = 'DEFERRED';
update wf_agents set name = 'ESBTopics/Topics/ESB_JAVA_DEFERRED'
where queue_type = 'DEFERRED';
update wf_agents set queue_name = 'ESBTopics/Topics/ESB_JAVA_DEFERRED'
where queue_type = 'DEFERRED';

commit;
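After running either option, a verification query along these lines (a sketch against the oraesb schema, using the parameter names above) can confirm the entries:

select param_name, param_value
from esb_parameter
where param_name like 'PROP_NAME_%'
or param_name like 'DT_OC4J_%'
or param_name = 'ACT_ID_RANGE'
order by param_name;

select name, queue_name, tcf_jndi
from wf_agents
where queue_type = 'DEFERRED';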
4.14
What is the Relationship between the ESB Configuration Files
esb_config.ini
and orion-application.xml?
The settings in
$ORACLE_HOME/integration/esb/config/esb_config.ini file apply to all
ESB instances in the
Oracle Home.
If specified, the $ORACLE_HOME/j2ee/<OC4J-container>/applications/<esb-rt or
esb-dt>/META-INF/orion-application.xml file overrides the esb_config.ini settings for
that ESB instance.
4.15
Where Should I Specify the cluster_name Parameter for ESBRT
Instances?
Does ESBDT Require a cluster_name Setting?
In practice, it’s best to
set the cluster_name for all ESBRTs in a single Oracle Home
through the cluster_name
property in esb_config.ini. If you have requirements for
separate ESBRT clusters in
a single Oracle Home then override the esb_config.ini
property in the
appropriate orion-application.xml files.
Starting in 10.1.3.3,
cluster_name for an ESBDT instance is not relevant; you do not need
to set this property in
the ESBDT orion-application.xml.
4.16
What are the Correct ESB Console Settings for an ESB System in a
Cluster?
Cluster Name: This should match the cluster_name provided in the ESBRT's
esb_config.ini or orion-application.xml file for the ESBRT(s) where you want the
deployed services to run.

Virtual Host: Hostname of the ESBRT load balancer.

Port: HTTP port of the ESBRT load balancer.

Topic Location: ESBTopics/Topics/ESB_JAVA_DEFERRED (as defined in the
PROP_NAME_DEFERRED_TOPIC_JNDI of the ESB metadata).

Connection Factory Location: OracleOJMS/XATCF (as defined in the
PROP_NAME_CONTROL_TCF_JNDI, PROP_NAME_DEFERRED_XATCF_JNDI,
PROP_NAME_ERROR_RETRY_TCF_JNDI, and PROP_NAME_ERROR_XATCF_JNDI parameters).
4.17
What is the Relationship Between ESB Systems and ESB Runtime
Clusters?
ESB systems are groupings
of ESB services defined in either JDeveloper or through the
ESB console, and visible
in the ESB console. Each system is configured for deployment
to an ESB runtime cluster
on the system definition page (multiple systems can be
assigned to the same
runtime cluster). The services of that ESB system are then
available on nodes running
that runtime ESB cluster, i.e. onto nodes where the
esb_config.ini or
orion-application.xml cluster_name property is a match.
4.18
What are the Advantages to Using Multiple ESB Systems in a Clustered
Runtime
Installation?
The services in an ESB
system can be scaled as needed across many runtime servers.
Each ESBRT server is
assigned to a single cluster that configures it to load services into
its memory cache only from
that cluster (group of ESB systems).
If all ESB Systems are
configured to the same ESB cluster, this is referred to as a
symmetric cluster. This
provides high availability such that if one ESBRT server goes
down other ESBRT servers
can process requests.
If ESB systems are configured to different ESB clusters, that is called an asymmetric
cluster. In this configuration, different ESBRT servers will provide a different set of
ESB services.
A hybrid approach can
utilize both symmetric and asymmetric ESB runtime clusters to
both scale up services in
an ESB system as well as make them highly-available by
running more than one
ESBRT server against each cluster.
4.19
How do you Apply a Patch When ESBDT and ESBRT are in Separate
Containers?
Note: This only applies to 10.1.3.3 installs up to MLR#8. Starting with 10.1.3.3 MLR#9,
the patch is automatically applied to all containers in the same Oracle Home.

The scenario: you have installed ESBRT and BPEL into one OC4J container, e.g.
OC4J_SOA. ESBDT is installed in the same application server but in a different
container, e.g. OC4J_ESBDT. You now want to apply a patch on top of this installation,
in both containers.
See MetaLink Note
549995.1: "How To Patch ESB RT And ESB DT Running In Separate
Container But In Single
Oracle Home".
4.20
Are There Other Recommendations for ESBDT Active-Passive
Configuration?
OPMN manages the active-passive failover for each ESBDT instance by pinging the
active instance to check its state. In order to reduce the risk of a false positive (i.e.
declaring the active ESBDT down when it is actually still up), you should increase the
ping timeout setting in the ESBDT <process-type> element. The default timeout value
is 20 seconds, and by adding the <ping> element you can increase it. See the OPMN
documentation for more details.

<ping timeout="timeout" retry="num" interval="interval"/>
4.21
Why Doesn’t ESBDT Support an Active-Active Topology?
ESBDT subscribes to both a monitor and an error topic to which all ESBRT instances
publish their monitoring and error messages. ESBDT then persists that information to
the database. Two ESBDT instances active at the same time would duplicate instance
and error data in the database, as each instance inserts the same data. ESBDT also
maintains an in-memory cache of metadata; that information could get out of sync with
two ESBDTs.
5 Adapters
5.1
Which Technology Adapters Use Active-Passive Clustering?
Oracle SOA Suite includes many technology adapters, of which File and FTP are most
commonly configured in an active-passive configuration (aka a "singleton" setup). This
is due to file system restrictions that make it impossible to avoid race conditions when
two nodes attempt to retrieve the same file. Note that circumstances sometimes also
require an active-passive configuration for AQ or JMS adapters that subscribe to the
same messaging topic, since each adapter instance would otherwise consume the same
message.
5.2
How Do I Setup a Singleton Adapter for BPEL?
Within the bpel.xml of the BPEL process, add the "clusterGroupId" property in the JCA
Activation Agent for the adapter:

<activationAgents>
  <activationAgent className="..." partnerLink="MyInAdapterPL">
    <property name="clusterGroupId">myBpelAdapterCluster</property>
  </activationAgent>
</activationAgents>

If the BPEL PM servers in the cluster are located across TCP/IP subnet boundaries, then
it is also necessary to add the property "clusterAcrossSubnet=true".

This configuration uses the default jGroups properties. To override those properties,
use the relevant customizations below. Notice the convention of prefixing each
customized property name with the name of the cluster group id:

<activationAgents>
  <activationAgent ... partnerLink="MyInboundAdapterPL">
    <property name="clusterGroupId">myAdapterCluster</property>
    <property name="myAdapterCluster_mcast_addr">224.0.0.35</property>
    <property name="myAdapterCluster_mcast_port">45566</property>
    <property name="myAdapterCluster_ip_ttl">32</property>
    <property name="myAdapterCluster_verify_suspect_timeout">1500</property>
    <property name="myAdapterCluster_join_timeout">5000</property>
    <property name="myAdapterCluster_join_retry_timeout">2000</property>
    <property name="clusterAcrossSubnet">true</property>
  </activationAgent>
</activationAgents>
For more information, see
MetaLink Note 730515.1 How to Enable a Singleton Adapter
in BPEL Cluster
Environment.
5.3
How Do I Setup a Singleton Adapter for ESB?
In an ESB cluster the adapters are deployed to all nodes of the cluster and by default
are active on all of the nodes. To avoid race conditions with adapters such as File or
FTP, it's necessary to configure the inbound adapter to be a singleton.

To configure an ESB adapter as a singleton, open the ESB project in JDeveloper, edit
the *.esbsvc file for the adapter, and add the following property:

<endpointProperties>
  <property name="clusterGroupId" value="ESB-Adapter-Cluster"/>
</endpointProperties>

Alternatively, add the same endpoint property via the Properties tab of the inbound
adapter service in the ESB console.

For more information, see [E10294-02] – Configuring Oracle Enterprise Service Bus for
Singleton Adapters, and MetaLink Note 746108.1 How to Enable a Singleton Adapter in
an ESB Cluster Environment.

5.4
How Does the Adapter Cluster Work?

The adapter cluster uses jGroups technology to communicate between adapter
instances and maintain an active-passive configuration. There are differences in the
specific jGroups configuration used depending on the BPEL or ESB version.

a) In SOA 10.1.3.3, both BPEL and ESB use the jgroups-protocol.xml file for the
configuration properties.

b) Starting in SOA 10.1.3.3 MLR#11, BPEL adapters use their own internal jGroups
configuration (separate from jgroups-protocol.xml and not exposed) with UDP. To
force adapter clustering to use the jgroups-protocol.xml, specify the following
parameter in the bpel.xml of the BPEL process:

useJgroupConfigFile=true

c) Starting in 10.1.3.4 MLR#5, ESB also uses its own internal jGroups configuration,
based on UDP. Similarly, to configure an ESB adapter to use the jgroups-protocol.xml
configuration, add the same parameter as an endpoint property as in (b) above.

In a clustered configuration, only one adapter instance will be allowed to start reading
or publishing messages. The adapter framework initially chooses one instance at
random to assume the primary activation. If a primary activation becomes
unresponsive, any of the remaining members of the cluster group will immediately
detect this and reassign the primary activation responsibility to one of the activation
agents standing by.
5.5
What if ESB and BPEL are Not Installed Together? Does that Affect my Adapter
Cluster Configuration?

If ESB-RT and BPEL are installed into different containers and ESB-RT needs the
jgroups-protocol.xml file for its jGroups properties (see above for when this is true),
edit the file ORACLE_HOME/opmn/conf/opmn.xml and look for the ESB-RT container
definition. There, add "-Dorabpel.home" to the JVM startup parameters. This tells
ESB-RT where to find the jgroups-protocol.xml file. For example:

<process-type id="oc4j_esbrt" module-id="OC4J" status="enabled">
  <module-data>
    <category id="start-parameters">
      <data id="java-options" value="-server …
        -Dorabpel.home=/home/oracle/product/10.1.3.1/OracleAS_1/bpel"/>
    </category>
  </module-data>
  ...
</process-type>

5.6
Can I Have More Than One Primary Activation Agent to Process High Volume
Requests?

No, within one logical cluster group, only one activation agent can be the primary. If
you need multiple primary activations, you need as many separate cluster groups. If
there is only one cluster group defined, only the (currently) primary activation will
process messages from the endpoint.

5.7
What is the Relationship between the Adapter Cluster and the BPEL or ESB
Cluster?

The "clusterGroupId" property creates an adapter cluster that is independent of the
BPEL or ESB cluster. Depending on the version and your configuration, they can share
the same jGroups properties.

Always set an adapter's "clusterGroupId" to a different value than the names of your
BPEL and ESB clusters.

5.8
Should I Use TCP or UDP Communication in an Adapter Cluster?

If there are more than 2 (two) singleton adapters defined in an adapter cluster, use UDP
instead of TCP. If the cluster spans multiple subnets you can still use UDP with the
"clusterAcrossSubnet=true" property on each adapter endpoint.

5.9
What Loggers are Useful for Adapters?

The essential BPEL loggers for adapter troubleshooting are:

<domain_name>.collaxa.cube.activation
<domain_name>.collaxa.cube.ws

To change logger levels, go to the BPEL Console under domain configuration in the
Logging tab.
5.10
How Can I Improve Throughput with Active-Passive Adapters in BPEL?

Within BPEL the adapter framework supports a server distribution fan-out pattern for
inbound messages. This allows a single active endpoint to evenly spread (round robin)
the inbound messages it receives across the BPEL servers in the cluster to create
process instances.

This load distribution feature helps improve throughput in an active-passive cluster
and can be configured in the bpel.xml JCA Activation Agent:

<activationAgent
    className="oracle.tip.adapter.fw.agent.jca.JCAActivationAgent"
    partnerLink="AdapterInboundPL">
  <property name="bpelServers">
    bpel.host1.net:23791,
    bpel.host2.net:23791,
    bpel.host3.net:23791
  </property>
  <property name="clusterGroupId">myBpelAdapterCluster</property>
</activationAgent>

The port numbers are the ORMI (request) port of the OC4J instance where BPEL is
deployed. All BPEL servers mentioned in the "bpelServers" property must have the
same server credentials.

5.11
How Do I Check if an Activation Agent is Setup Correctly for BPEL?

When starting the container you should see the following lines in the OPMN log,
showing that the JCA Activation Agent for each PartnerLink in every process loaded
successfully:

<2009-02-19 11:25:47,643> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> Loading JCAActivationAgent for
{blockingStartStop=false, clusterGroupId=kclugage-adapter-cluster, portType=Get_ptt}
<2009-02-19 11:25:47,643> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> JCAActivationAgent::load - Locating Adapter
Framework instance: OraBPEL
<2009-02-19 11:25:47,658> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> Instantiating inbound part of Adapter Framework
instance: OraBPEL
<2009-02-19 11:25:47,689> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> JCAActivationAgent::load - Done loading
JCAActivationAgent for processId='bpel://localhost/default/pool_FTP~1.0/

Next the JCA Activation Agent is initialized and joins the adapter cluster, using the
UDP protocol in this example:

<2009-02-19 11:25:54,533> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> JCAActivationAgent::init - Initializing the JCA
activation agent, processId='bpel://localhost/default/pool_FTP~1.0/
<2009-02-19 11:25:54,533> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> JCAActivationAgent joining cluster group
kclugage-adapter-cluster1.0
Feb 19, 2009 11:25:55 AM org.collaxa.thirdparty.jgroups.protocols.UDP createSockets
INFO: sockets will use interface 144.25.146.161

5.12
How Do I Monitor Activation Agent Failover in BPEL?

Check the OPMN logs of a BPEL container that is still running. You should see that a
new node is now the primary activation in the cluster group:

<2009-02-19 17:07:15,590> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> JCAActivationAgent appointed primary activation in
cluster group kclugage-adapter-cluster1.0
<2009-02-19 17:07:15,590> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> Adapter Framework instance: OraBPEL -
endpointActivation for portType=Get_ptt, operation=Get
<2009-02-19 17:07:15,606> <INFO> <default.collaxa.cube.activation>
<FTP Adapter::Inbound> ENDPOINT ACTIVATION CALLED IN FTP ADAPTER
<2009-02-19 17:07:15,606> <INFO> <default.collaxa.cube.activation>
<AdapterFramework::Inbound> Adapter Framework instance: OraBPEL - successfully
completed endpointActivation for portType=Get_ptt, operation=Get

5.13
How Do I Monitor Adapter Activation in ESB?

If clustering is setup correctly, a properly configured singleton adapter in ESB will only
be active in one ESBRT instance. The container log.xml file of that instance (at
$ORACLE_HOME/j2ee/<container_name_ESBRT>/log/default_group/oc4j/log.xml)
will have the following output.

Starting initialization:

<MESSAGE>
  <HEADER>
    <TSTZ_ORIGINATING>2009-02-20T09:35:41.408-08:00</TSTZ_ORIGINATING>
    <COMPONENT_ID>tip</COMPONENT_ID>
    <MSG_TYPE TYPE="NOTIFICATION"></MSG_TYPE>
    <MSG_LEVEL>1</MSG_LEVEL>
    <HOST_ID>kclugage-pc2</HOST_ID>
    <HOST_NWADDR>144.25.146.161</HOST_NWADDR>
    <MODULE_ID>esb.server.service.impl.inadapter</MODULE_ID>
    <THREAD_ID>25</THREAD_ID>
    <USER_ID>SYSTEM</USER_ID>
  </HEADER>
  <CORRELATION_DATA>
    <EXEC_CONTEXT_ID><UNIQUE_ID>144.25.146.161:66483:1235151341408:6</UNIQUE_ID><SEQ>0</SEQ></EXEC_CONTEXT_ID>
  </CORRELATION_DATA>
  <PAYLOAD>
    <MSG_TEXT>Creating and initializing inbound JCA endpoint for service: "Get"
WSDL location: "esb:///ESB_Projects/SOACloneTests_Pool_FTP_ESB/pool_ftp_in.wsdl"
portType: "Get_ptt" activation properties:"Get"</MSG_TEXT>
  </PAYLOAD>
</MESSAGE>

Initialization complete:

<MESSAGE>
  <HEADER>
    <TSTZ_ORIGINATING>2009-02-20T09:35:41.455-08:00</TSTZ_ORIGINATING>
    <COMPONENT_ID>tip</COMPONENT_ID>
    <MSG_TYPE TYPE="NOTIFICATION"></MSG_TYPE>
    <MSG_LEVEL>1</MSG_LEVEL>
    <HOST_ID>kclugage-pc2</HOST_ID>
    <HOST_NWADDR>144.25.146.161</HOST_NWADDR>
    <MODULE_ID>esb.server.service.impl.inadapter</MODULE_ID>
    <THREAD_ID>25</THREAD_ID>
    <USER_ID>SYSTEM</USER_ID>
  </HEADER>
  <CORRELATION_DATA>
    <EXEC_CONTEXT_ID><UNIQUE_ID>144.25.146.161:66483:1235151341408:6</UNIQUE_ID><SEQ>14</SEQ></EXEC_CONTEXT_ID>
  </CORRELATION_DATA>
  <PAYLOAD>
    <MSG_TEXT>Successfully finished endpoint activation for operation
"DefaultSystem.pool_ftp_in_RS.Get".</MSG_TEXT>
  </PAYLOAD>
</MESSAGE>
5.14
What Additional Configuration is Necessary for File and FTP Adapters in
a
Cluster?
The File and FTP adapters
use a control file to coordinate information between
instances, such as the
sequence numbering of output files. In a clustered environment,
this control file must be
shared between all instances.
1. Create a folder on a
shared file system. This folder should have write
permission and be
accessible from all the systems that are running File and FTP
adapters. This is the
shared folder that will store the control files.
2. Edit the pc.properties
file available in the
ORACLE_HOME\bpel\system\service\config
directory on each of the nodes,
and set
oracle.tip.adapter.file.controldirpath to the shared folder name.
3. Restart the servers.
Please note that in the
case of ESB, you will need to rename
ORACLE_HOME\integration\esb\config\pc.properties.esb
to pc.properties and make
the relevant changes
there.
5.15
How Do I Enable Distributed Polling for an Inbound Database Adapter?
Inbound database adapters
in an active-active cluster configuration must be
coordinated to ensure two
instances polling a database table do not retrieve the same
rows (thereby creating
duplicate instances). Reservation Distributed Polling is the two-step polling strategy
used to achieve this coordination. The first step reserves the rows; the second step
processes them. There are two types available:
MarkReservedValue (Not Recommended for Clusters)
Each adapter instance is
configured with a unique MarkReservedValue that is
documented in the BPEL
tutorial:
OH/bpel/samples/tutorials/122.DBAdapter/advanced/polling/DistributedPolling
Note: The
MarkReservedValue of an adapter is unique only to that node (each adapter
thread is assigned its own
id), not across the cluster. Thus this type of Reservation
Distributed Polling can
only be used in non-clustered environments; otherwise
duplicate row processing
will occur.
Select for Update (Recommended)

With "select for update no wait", the rows are reserved and processed in one
transaction, so no processing of the same rows can occur in parallel.

To use this feature, do not enter a reserved value in the database adapter and check the
"Distributed Polling" checkbox in the database adapter wizard; Select For Update No
Wait will then be used.
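Conceptually, the reservation step behaves like the following SQL sketch; the table and column names here are hypothetical, and the exact statement the adapter generates is version-dependent:

-- Step 1: reserve unprocessed rows, failing immediately if another node holds the lock
select order_id, payload
from inbound_orders
where status = 'UNPROCESSED'
for update nowait;

-- Step 2: process and logically delete the rows in the same transaction
update inbound_orders set status = 'PROCESSED'
where status = 'UNPROCESSED';
commit;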
Note: If you are on 10.1.3.3.x
version you need to apply Patch 6690511. This patch is
already included in
10.1.3.4.
For more information, see
MetaLink Note 556375.1 How to Configure Distributed
Polling To Accomplish
Logical Delete Strategy for DB Adapter In HA Environment.
5.16
Can MQ and JMS Adapters Operate in an Active-Active Cluster?
JMS adapters can operate as active-active only with database-persisted queues (AQ).

MQ adapters in 10.1.3.1 do not support an active-active cluster; duplicate message
reads will occur. The 10.1.3.3 MQ adapter does support an active-active cluster.
For more information, see
MetaLink Note 452176.1 - Q: MQ-Adapter and Jms-Adapter:
Can they Operate In An
Active-Active Cluster?
6 Troubleshooting
6.1
How Can I Verify my ESB Cluster is Setup Correctly?
Reference MetaLink Note 470267.1 How To Verify ESB Cluster Configuration? This
provides instructions to verify the following:
Register ESB demo project using Ant
Verify ESB cluster
configuration
Verify Asynchronous
routing rule
Verify Error Handling and
Resubmit function
Delete ESB services
6.2
Invoking ESB Services Registered In ESB Cluster Fails With Error
"Cannot
find service on current cluster”
Full error message:

oracle.tip.esb.server.common.exceptions.BusinessEventRetriableException:
Cannot find service "" on current cluster
Reference MetaLink Note
430348.1: Invoking ESB Services Registered In ESB Cluster
Fails With Error
"Cannot find service on current cluster".
This is caused by
incorrect configuration of the Slide Repository and JNDIs for the
Topic and Topic Connection
Factory in both the ESBDT and ESBRT servers. If they are
not configured correctly,
ESB web service providers cannot be generated after
registering ESB services.
The solution is provided by following the steps in the
Enterprise Deployment
Guide as outlined in the Note.
6.3
Error in log.xml File During ESB Startup: "Invalid Destination
Topics/ESB_MONITOR"
and "No resource named
'ojmsRP/Topics/ESB_MONITOR'
found"
The following functional
symptoms are also present:
SOAP Endpoint URI shown
in a successfully registered ESB service is not
accessible in a browser.
ESB service does not run
successfully.
This is caused by
incorrect configuration of JNDIs for the Topic and Topic Connection
Factories for one or more
ESBRTs and ESBDTs.
This can be fixed by
following section 3.23 Configuring JNDIs for the Topic and Topic
Connection Factory of the
Enterprise Deployment Guide 10.1.3.3, making sure that
these steps are done on
all of the ESBRTs and ESBDTs.
6.4
After Cluster Node Fails, ESB Returns Connection Refused Error
When an ESBRT cluster node fails, the remaining node throws the following exception:

ORABPEL-12600
Generic error.
oracle.tip.esb.server.common.exceptions.BusinessEventRetriableException: An
unhandled exception has been thrown in the ESB system. The exception reported is:
"org.collaxa.thirdparty.apache.wsif.WSIFException: exception on JaxRpc invoke:
HTTP transport error: javax.xml.soap.SOAPException:
java.security.PrivilegedActionException: javax.xml.soap.SOAPException:
Message send failed: Connection refused
This is caused by
connections being cached during failover and can be solved by editing
the opmn.xml file to add
-DHTTPClient.disableKeepAlives=true to the ESBRT JVM
startup parameters.
Reference MetaLink Note 560228.1: “Upon Cluster Failover, ESB
returns Connection refused
error.”
6.5
When Deploying a BPEL Process to a Node in the Cluster, it Does Not
Propagate
Across the Cluster
Prior to 10.1.3.3 MLR#11,
in order to propagate across the cluster, the BPEL process
must be deployed with an
incremental version number. In 10.1.3.3 MLR#11 or greater,
process re-deployments
with the same version number are automatically propagated.
If the process does not
propagate across the cluster, there is most likely an issue with
the BPEL jGroups
configuration.
A simple test for
verifying the clustering configuration is to create a new BPEL domain,
then verify the domain
folder is created on the file system for each node under
$OH/bpel/domains. If the
new domain sub-folder is created only on one node the
clustering configuration
is not correct. See the jGroups related troubleshooting tips in
this section.
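A complementary check, assuming the dehydration store tables listed in section 3.7 (shown there as the domains table; the exact name can vary by release), is to confirm the new domain was recorded in the shared database that all nodes poll:

-- Each BPEL domain created via /BPELAdmin should appear here
select domain_id from domain;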
Note in BPEL 10.1.3.5.1,
jGroups was replaced with an internal database-based
mechanism to provide
cluster communication between BPEL nodes. See the section
“Replacement of JGroups
for BPEL Clustering…” in this paper for more details.
6.6
Verify jGroups Initializes Correctly
Check the opmn log for the BPEL container on all nodes at
$ORACLE_HOME/opmn/logs/<group name>~<container name>~<group name>~1.log.
This logfile will contain jGroups-related information during startup and steady-state
operation. Soon after startup you should find log entries for UDP or TCP.
Example jGroups Log Entries for UDP
Apr 3, 2008 6:30:37 PM org.collaxa.thirdparty.jgroups.protocols.UDP createSockets
INFO: sockets will use interface 144.25.142.172
Apr 3, 2008 6:30:37 PM org.collaxa.thirdparty.jgroups.protocols.UDP createSockets
INFO: socket information:
local_addr=144.25.142.172:1127, mcast_addr=228.8.15.75:45788,
bind_addr=/144.25.142.172, ttl=32
sock: bound to 144.25.142.172:1127, receive buffer size=64000, send buffer size=32000
mcast_recv_sock: bound to 144.25.142.172:45788, send buffer size=32000, receive
buffer size=64000
mcast_send_sock: bound to 144.25.142.172:1128, send buffer size=32000, receive
buffer size=64000
Apr 3, 2008 6:30:37 PM
org.collaxa.thirdparty.jgroups.protocols.TP$DiagnosticsHandler bindToInterfaces
-------------------------------------------------------
GMS: address is 144.25.142.172:1127
-------------------------------------------------------
Example jGroups Log Entries for TCP
Apr 3, 2008 6:23:39 PM org.collaxa.thirdparty.jgroups.blocks.ConnectionTable start
INFO: server socket created on 144.25.142.172:7900
Apr 3, 2008 6:23:39 PM
org.collaxa.thirdparty.jgroups.protocols.TP$DiagnosticsHandler bindToInterfaces
-------------------------------------------------------
GMS: address is 144.25.142.172:7900
-------------------------------------------------------
In the log below, "socket created on" indicates that the TCP socket is established on the
node's own IP address and port, and "created socket to" shows that the second node has
connected to the first node, matching the logfile above by IP address and port.
Example jGroups Log Entries for TCP, with Connection to 2nd Node
Apr 3, 2008 6:25:40 PM org.collaxa.thirdparty.jgroups.blocks.ConnectionTable start
INFO: server socket created on 144.25.142.173:7901
Apr 3, 2008 6:25:40 PM
org.collaxa.thirdparty.jgroups.protocols.TP$DiagnosticsHandler bindToInterfaces
-------------------------------------------------------
GMS: address is 144.25.142.173:7901
-------------------------------------------------------
Apr 3, 2008 6:25:41 PM org.collaxa.thirdparty.jgroups.blocks.ConnectionTable
getConnection
INFO: created socket to 144.25.142.172:7900
Verify that the IP addresses in the lines "INFO: sockets will use interface" (for UDP) or
"INFO: server socket created on" and "INFO: created socket to" (for TCP) are correct.
If not, refer to the next troubleshooting section for instructions on setting the bind_addr
property.
6.7
Verify jGroups is Binding to the Correct Network Interface for UDP or
TCP
Machines often have
multiple network interfaces, virtual or physical, that can confuse
jGroups communication if
jGroups binds to the wrong one.
In the UDP or TCP section
of jgroups-protocol.xml, specify the machine IP address that
jGroups should use for
binding to that node's network interface by adding the
bind_addr property. This
will prevent jGroups from binding to an incorrect NIC
interface/IP address on
its own node.
UDP Settings with bind_addr

<config>
  <UDP
    mcast_send_buf_size="32000"
    ...
    ucast_recv_buf_size="64000"
    mcast_addr="228.8.15.75"
    bind_addr="<xxx.xxx.xxx.xxx>"
    receive_on_all_interfaces="false"
    loopback="true"
    ...
</config>

TCP Settings with bind_addr and receive_on_all_interfaces

<config>
  <TCP
    bind_addr="<xxx.xxx.xxx.xxx>"
    receive_on_all_interfaces="false"
    ...
Alternatively, add the following startup parameter to the JVM settings in opmn.xml.
This overrides any value set in the jgroups-protocol.xml file.

-Dbind.address=<xxx.xxx.xxx.xxx>

Save the opmn.xml file, then execute opmnctl reload and opmnctl restartproc
process-type=<container> from the command line.
Note the value of
bind_addr will vary from node to node depending on the IP address
that the particular node
will use to communicate with other members of the cluster.
6.8
Test Connectivity between BPEL Nodes
Test connections between
different cluster nodes using ping, telnet, and traceroute. The
presence of firewalls and
number of hops between cluster nodes can affect performance
as they have a tendency to
take down connections after some time or simply block
them.
Also reference MetaLink
Note 413783.1: “How to Test Whether Multicast is Enabled on
the Network.”
6.9
Verify UDP jGroups Settings for mcast_addr and mcast_port
mcast_addr: set this to a value in the multicast address range of 224.0.0.0 to
239.255.255.255; this value must be the same on all nodes of the cluster. The default is
228.8.15.75; it is not necessary to use this value. Ping the IP address and choose a
different address if you receive any responses, as this means that address is already in
use. It's a good idea to check with your network admin as well.

mcast_port: set this to a free port on all nodes of the cluster; this value must be the same
on each. The default in the file is 45788; it is not necessary to use this value.
6.10
Verify TCP jGroups Settings for start_port, initial_hosts and port_range
start_port: when BPEL initializes, this is the first port number where jGroups will
attempt to establish a TCP socket. If the listed port is not available, jGroups will
continue to increment by one until a free port is found and bound successfully. This
value can be set to anything and can vary from node to node.

initial_hosts and port_range: initial_hosts is a list of all hosts in the cluster that this
node will attempt to connect to via jGroups using TCP. The port number enclosed in []
is the first port that this node will attempt to reach on the listed host. A TCP binding
will be attempted with each port, starting from the first port (enclosed by []) and
incrementing by one, until the number of ports tried equals the value of port_range.
6.11
jGroups Nodes May Drop Out of the BPEL Cluster

Check the jGroups logging within the OPMN log file to ensure the intended nodes have
not been dropped from the jGroups cluster. This can happen due to network outages or
high CPU utilization on the affected nodes, which disrupt the heartbeat messages
required to maintain active cluster membership.

To reduce the risk of nodes dropping from the cluster, you can increase the fault
detection (FD) timeout setting. The FD timeout specifies the maximum time in
milliseconds for a cluster node to respond to a message. A node will be dropped from a
cluster after the max number of tries is reached without a response. Both the UDP and
TCP sections use the FD timeout and max_tries settings.
Patch 6354719 changed
jGroups behavior to automatically rejoin its own node to the
cluster, if it drops out
for some reason. This patch is available as either a standalone
patch in releases 10.1.3.1
and 10.1.3.3 or in 10.1.3.3.1 MLR #2 and above.
Alternatively, restart the
BPEL server.
6.12
Enable Detailed jGroups Logging
1. Stop the BPEL OC4J container.

2. Open the $OH/j2ee/<BPEL-OC4J-container>/config/j2ee-logging.xml file and add
the following log_handler for jGroups. This will create a log.xml file in the path
directory with jGroups debug messages.
jGroups Log Handler Configuration, Part I

<log_handler name="jgroups-handler"
    class="oracle.core.ojdl.logging.ODLHandlerFactory">
  <property name="path" value="%ORACLE_HOME%/bpel/system/logs/jgroups"/>
  <property name="maxFileSize" value="10485760"/>
  <property name="maxLogSize" value="104857600"/>
  <property name="encoding" value="UTF-8"/>
  <property name="supplementalAttributes" value="J2EE_APP.name,J2EE_MODULE.name"/>
</log_handler>
3. In the same file add the following logger, just below the org.quartz logger.

jGroups Log Handler Configuration, Part II

<logger name="org.collaxa.thirdparty.jgroups" level="FINEST"
    useParentHandlers="false">
  <handler name="jgroups-handler"/>
</logger>
4. Start the BPEL OC4J container and you should start to see jGroups trace-level
messages in the log.xml output file.

The above sets the jGroups logger to FINEST, which is the log4j TRACE level. If you
want to see only ERROR messages, you can use the EM console to change the level for
"org.collaxa.thirdparty.jgroups" to SEVERE, or you can directly edit this file and
change the level to "SEVERE".
6.13
A New BPEL Domain is Created When a Cluster Node is Down; That Node
Fails to Pick Up the New Domain on Restart
This is addressed by patch
6086453, available in the 10.1.3.3.1 initial patchset and above.
6.14
BPEL Processes Missing from Console When BPEL Cluster Uses a BigIP Load
Balancer with Persistence Set to On

When BPEL PM restarts, it first checks the connectivity of the soap server URL defined
in collaxa-config.xml. When the soap server URL is set to a BigIP load balancer virtual
hostname, BPEL PM doesn't receive an HTTP 200 response because a GUI window is
opened to confirm user cookie settings. Since this is a BigIP product feature, the only
way to get around this is to set the persistence settings of BigIP to "simple", where
there will be no GUI popup windows.
References

The information above is gathered from the references below; consult them for more
detail on SOA Suite high-availability installation, configuration, architecture, and best
practices.

Oracle Application Server Enterprise Deployment Guide 10g Release 3 (10.1.3.3.0)
Oracle SOA Suite XA and RAC Database Configuration Guide
Architecting BPEL Systems for High Availability, October 2007
Oracle SOA Suite Best Practices Guide 10g Release 3 (10.1.3.3.0)
Oracle Application Server Adapters for Files, FTP, Databases, and Enterprise Messaging User's Guide 10g Release 3 (10.1.3.1.0)
Oracle Application Server High Availability Guide 10g Release 3 (10.1.3.1.0)
Setting up the Oracle Web Services Manager High-Availability (HA) Topology
Configuring Oracle Web Services Manager for High Availability (HA)
Oracle Business Activity Monitoring Installation Guide 10g (10.1.3.1.0)
Oracle Application Server Adapter Concepts Guide
Oracle Application Server Enterprise Deployment Guide