Thursday, February 2, 2012

Backend WLS or EM application seems to be down

A few days ago, the server where OMS and its DB reside crashed.
After we restarted it, whenever we tried to access the grid console, we got the error "Backend WLS or EM application seems to be down".
Agents failed to upload XML files to OMS, and "emctl pingOMS" returned the error "EMD pingOMS error: No response header from OMS".
We checked the WebLogic and OMS .trc, .log and .out files, but no error was recorded, either before or after the crash.

To correct this issue:

    1. Stop OMS
     emctl stop oms -all

    2. Kill (kill -9) any WebLogic and OMS processes still running after the stop. You can find these processes using ps.
     ps -ef | grep EMGC_ADMINSERVER
     ps -ef | grep EMGC_OMS1
     ps -ef | grep oms

    3. Delete every .lok file you find under the WebLogic domain. You can list them with find:
     find . -name "*.lok"

    In our case, these files were:
    ../gc_inst/user_projects/domains/GCDomain/config/config.lok
    ../gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/data/ldap/ldapfiles/EmbeddedLDAP.lok
    ../gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/tmp/EMGC_OMS1.lok
    ../gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/data/ldap/ldapfiles/EmbeddedLDAP.lok
    ../gc_inst/user_projects/domains/GCDomain/servers/EMGC_ADMINSERVER/tmp/EMGC_ADMINSERVER.lok

    4. Start OMS
     emctl start oms
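
Step 3 above only lists the .lok files; the deletion itself is left to you. A minimal sketch of how listing and deleting could be wrapped in one shell function follows. The function name remove_lok_files and its "list"/"delete" argument are made up for illustration, and the domain directory is whatever path your own GCDomain lives under. Run it only after OMS is stopped and no WebLogic or OMS processes remain.

```shell
#!/bin/sh
# Sketch only: remove_lok_files is a hypothetical helper, not an Oracle tool.
# Usage: remove_lok_files DOMAIN_DIR [list|delete]

remove_lok_files() {
    domain_dir=$1
    mode=${2:-list}    # "list" (default) only prints; "delete" also removes

    if [ ! -d "$domain_dir" ]; then
        echo "not a directory: $domain_dir" >&2
        return 1
    fi

    if [ "$mode" = "delete" ]; then
        # Print each .lok file, then remove it.
        find "$domain_dir" -type f -name '*.lok' -print -exec rm -f {} \;
    else
        # Dry run: show what would be removed.
        find "$domain_dir" -type f -name '*.lok' -print
    fi
}
```

Calling it first without the second argument shows what would be removed; passing "delete" as the second argument removes those same files.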

The best matching Oracle Documents about this incident are:
  • ID 943790.1: What are the .lok Files Used For in a WebLogic Server (WLS) Domain? In general, these files are a mechanism to ensure file and server locks and to prevent a server from being booted twice.
  • ID 957377.1: Weblogic Fails To Start With Error "Unable To Obtain Lock"
  • ID 1235753.1: 11g Grid Control: OMS Startup Shows "AdminServer Could Not Be Started" but OMS is able to Startup

Another cause for this error could be incorrect entries in your /etc/hosts file.
Your 127.0.0.1 entry should look exactly like this:
127.0.0.1    localhost.localdomain    localhost

And you should remove any IPv6 entry, so delete or comment out:
::1    localhost6.localdomain6    localhost6
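
The two /etc/hosts conditions above can be checked quickly before touching the file. This is a rough sketch: check_hosts is a hypothetical helper name, and the patterns assume the exact entry formats shown in this post.

```shell
#!/bin/sh
# Sketch only: check_hosts is a hypothetical helper.
# Usage: check_hosts [HOSTS_FILE]   (defaults to /etc/hosts)

check_hosts() {
    hosts_file=${1:-/etc/hosts}
    status=0

    # Expect: 127.0.0.1  localhost.localdomain  localhost
    if ! grep -Eq '^[[:space:]]*127\.0\.0\.1[[:space:]]+localhost\.localdomain[[:space:]]+localhost[[:space:]]*$' "$hosts_file"; then
        echo "127.0.0.1 entry is missing or differs from the expected form"
        status=1
    fi

    # Flag an IPv6 loopback entry that is not commented out.
    if grep -Eq '^[[:space:]]*::1[[:space:]]' "$hosts_file"; then
        echo "active ::1 entry found"
        status=1
    fi

    return $status
}
```

The function prints nothing and returns 0 when both conditions hold, so it can be used as a quick pre-check: check_hosts && echo "hosts file looks fine".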

10 comments:

  1. what was the version of the grid control where you faced this issue, 11g or 12c?

  2. Thanks! You saved a ton of my Time Today! Thanks! BTW this solution works for 12c also!

    Varun

  3. Am grateful!! This helped me today.

  4. Thanks man, this procedure saved my bacon today.

  5. Great!! Thanks!!

  6. Great & Thanks !!

    It helped me in resolving the issue.