After we restarted it, every attempt to access the grid console returned the error "Backend WLS or EM application seems to be down".
Agents failed to upload XML files to OMS, and "emctl pingOMS" returned the error "EMD pingOMS error: No response header from OMS".
We checked the WebLogic and OMS .trc, .log and .out files, but no error was recorded, either before or after the crash.
To correct this issue:
1. Stop OMS
emctl stop oms -all
2. Kill (kill -9) any WebLogic and OMS processes still running after the stop. You can find these processes with ps:
ps -ef | grep EMGC_ADMINSERVER
ps -ef | grep EMGC_OMS1
ps -ef | grep oms
3. Delete every .lok file found under the WebLogic domain directory. To list them:
find . -name "*.lok"
In our case, these files were:
../gc_inst/user_projects/domains/GCDomain/config/config.lok
../gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/data/ldap/ldapfiles/EmbeddedLDAP.lok
../gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/tmp/EMGC_OMS1.lok
../gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/data/ldap/ldapfiles/EmbeddedLDAP.lok
../gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/tmp/EMGC_ADMINSERVER.lok
4. Start OMS
emctl start oms
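Steps 2 and 3 could be wrapped in two small shell helpers, roughly as sketched below. This is only an illustration, not part of the original fix: the function names list_leftover_processes and remove_lok_files are hypothetical, the process names are the ones from our installation, and the domain directory must be passed in by the caller. Run it only after "emctl stop oms -all", and only once no WebLogic or OMS process is left, because a .lok file held by a live server should not be removed.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and the process names
# (EMGC_ADMINSERVER, EMGC_OMS1) are taken from our installation.

# List WebLogic/OMS processes that survived "emctl stop oms -all".
# The [E] bracket keeps grep from matching its own command line.
list_leftover_processes() {
    ps -ef | grep -E '[E]MGC_ADMINSERVER|[E]MGC_OMS1' || true
}

# Print and delete every .lok file under the given domain directory.
remove_lok_files() {
    domain_home="$1"
    if [ ! -d "$domain_home" ]; then
        echo "not a directory: $domain_home" >&2
        return 1
    fi
    # -print shows each file before rm removes it.
    find "$domain_home" -type f -name '*.lok' -print -exec rm -f {} +
}
```

A typical call would be: remove_lok_files ../gc_inst/user_projects/domains/GCDomain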
The Oracle Support documents that best match this incident are:
- ID 943790.1: What are the .lok Files Used For in a WebLogic Server (WLS) Domain? In general, these files are a mechanism to ensure file and server locks and to prevent a server from being booted twice.
- ID 957377.1: Weblogic Fails To Start With Error "Unable To Obtain Lock"
- ID 1235753.1: 11g Grid Control: OMS Startup Shows "AdminServer Could Not Be Started" but OMS is able to Startup
Your 127.0.0.1 entry in /etc/hosts should read exactly:
127.0.0.1 localhost.localdomain localhost
You should also remove any IPv6 entry, so delete or comment out:
::1 localhost6.localdomain6 localhost6
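A quick way to verify both conditions is a small check like the one below. It is a sketch under assumptions: the function name check_hosts_file is hypothetical, the entries are assumed to live in /etc/hosts (the default argument), and only the IPv6 loopback line (::1) is tested, not other IPv6 addresses.

```shell
#!/bin/sh
# Sketch only: check_hosts_file is a hypothetical helper; it reads the
# file and never modifies it.

# Return 0 if the hosts file has the expected 127.0.0.1 line and no
# active (uncommented) ::1 line, 1 otherwise.
check_hosts_file() {
    hosts_file="${1:-/etc/hosts}"
    # IPv4 loopback must be: 127.0.0.1 localhost.localdomain localhost
    grep -Eq '^127\.0\.0\.1[[:space:]]+localhost\.localdomain[[:space:]]+localhost[[:space:]]*$' "$hosts_file" || return 1
    # A line starting with ::1 means the IPv6 loopback entry is still active.
    if grep -Eq '^[[:space:]]*::1([[:space:]]|$)' "$hosts_file"; then
        return 1
    fi
    return 0
}
```

Calling check_hosts_file with no argument inspects /etc/hosts; pass another path to test a copy first.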