
Reconnect engines to engine server

We have an engineserver managing four engines to process various requests that come in from the web.
Works fantastic, 99% of the time.
But sometimes the engines crash and we hear about it only after users start reporting that particular things don't seem to be happening anymore. That could be immediately but it also could be days later.

This morning I had such an example. I logged onto the server and the engineserver was still running and I could see the requests/commands in the engineserver but there were no corresponding engines.
What I've always done and what I did this time was exit the engineserver and restart it and just let the customer know it will be all good moving forward.

However, before today, I don't think I've ever looked at the engineserver itself after hiding it. That means I've never noticed the requests in the editbox. Either that or they remain in the editbox only till they are processed by a waiting engine?

But it led me to thinking, if the requests I was seeing were still to be processed because the engineserver was waiting for an idle engine, is there a way I can start an engine or four and have them reconnect to the waiting engineserver and therefore catch up on the idle requests? Or has the approach I've always taken, of accepting those things are just lost, the reality?

Comments

  • Sorry, those requests are lost.

    Are these engines full-on crashing, or are they hanging/getting stuck?
  • They were non-existent.
    It’s not the first time but it’s infrequent.
  • I'll look into the possibility of detecting that an engine has crashed and replacing it. Hung engines are a different matter. There isn't a way of distinguishing between a hung engine and a busy one.
  • So I said infrequent but it's happened again today.
    This time all four engines were still visible but each had the Windows message "oengine.exe has crashed..."
  • You might want to enable logging to see if it's a particular command triggering these crashes.
  • Here's a link to download version 1.3.5. Whenever the RevCAPI returns an error, it attempts to restart the engine. I was able to manually kill engines and see them restart. Some caveats:

    1. I was only able to test this by closing engines myself. Since I don't have crashing engines, I can't say for certain that this new feature will detect such a thing, but it's got a better chance than the last version.

    2. The request an engine was working on when it crashed is not recovered, but the failure is logged if you've turned on logging. So, at least you can see what it was working on when it crashed.
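For anyone wondering what a restart-on-failure loop looks like in general, here is a minimal Python sketch. It is not how SRP Engine Server does it (per the post above, that reacts to RevCAPI errors); this swaps in plain polling of process exit instead. `EnginePool` and `spawn_sleeper` are hypothetical names, and the "engine" is just a sleeping Python process standing in for oengine.exe.

```python
import subprocess
import sys
from typing import Callable, List


def spawn_sleeper() -> subprocess.Popen:
    # Hypothetical stand-in for launching an engine: a child process that idles.
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


class EnginePool:
    """Keeps a fixed number of worker processes alive (illustrative only)."""

    def __init__(self, size: int, spawn: Callable[[], subprocess.Popen]) -> None:
        self._spawn = spawn
        self.engines: List[subprocess.Popen] = [spawn() for _ in range(size)]

    def replace_dead(self) -> int:
        """Start a new process for every one that has exited; return the count."""
        replaced = 0
        for index, proc in enumerate(self.engines):
            # poll() returns None while the process is running,
            # and its exit code once it has terminated.
            if proc.poll() is not None:
                self.engines[index] = self._spawn()
                replaced += 1
        return replaced

    def shutdown(self) -> None:
        """Stop every process still running."""
        for proc in self.engines:
            if proc.poll() is None:
                proc.kill()
            proc.wait()
```

Calling `replace_dead()` on a timer is the whole idea. Note the limit: a hung process still counts as running, so polling like this cannot tell a hung engine from a busy one, and the request that was in flight is gone either way.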
  • Excellent.
    I'll put it in and we'll see how it goes.
    And I've turned on logging now just to see what we get.
  • I went back and had a look and the engineserver and all four engines were gone.
    Do the logs get written away somewhere obvious?
  • From Windows Event Viewer...

    Faulting application name: SRPEngineServer.exe, version: 1.3.5.0, time stamp: 0x63e1967e
    Faulting module name: ntdll.dll, version: 10.0.14393.5427, time stamp: 0x63368a30
    Exception code: 0xc0000006
    Fault offset: 0x00060319
    Faulting process id: 0x1bd0
    Faulting application start time: 0x01d93a9539141bd3
    Faulting application path: P:\oi94\SRPEngineServer.exe
    Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
    Report Id: f555d06c-7637-4508-819a-d87c766df65c
    Faulting package full name:
    Faulting package-relative application ID:
  • and in a separate event at the same time

    Windows cannot access the file for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program SRP EngineServer MFC Application because of this error.

    Program: SRP EngineServer MFC Application
    File:

    The error value is listed in the Additional Data section.
    User Action
    1. Open the file again. This situation might be a temporary problem that corrects itself when the program runs again.
    2. If the file still cannot be accessed and
    - It is on the network, your network administrator should verify that there is not a problem with the network and that the server can be contacted.
    - It is on a removable disk, for example, a floppy disk or CD-ROM, verify that the disk is fully inserted into the computer.
    3. Check and repair the file system by running CHKDSK. To run CHKDSK, click Start, click Run, type CMD, and then click OK. At the command prompt, type CHKDSK /F, and then press ENTER.
    4. If the problem persists, restore the file from a backup copy.
    5. Determine whether other files on the same disk can be opened. If not, the disk might be damaged. If it is a hard disk, contact your administrator or computer hardware vendor for further assistance.

    Additional Data
    Error value: C00000C4
    Disk type: 0
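The two hex values in these events (Exception code 0xc0000006 and Error value C00000C4) are NTSTATUS codes. A small Python sketch for splitting them into their fields follows. The two names in the lookup table are the ones I know from Microsoft's NTSTATUS list; `decode_ntstatus` is a hypothetical helper and anything not in the table comes back as "unknown".

```python
# NTSTATUS layout: bits 31-30 severity, bits 27-16 facility, bits 15-0 code.
SEVERITY = ["success", "informational", "warning", "error"]

# Only the two values from the event log entries above are listed.
KNOWN_NTSTATUS = {
    0xC0000006: "STATUS_IN_PAGE_ERROR",
    0xC00000C4: "STATUS_UNEXPECTED_NETWORK_ERROR",
}


def decode_ntstatus(text: str) -> dict:
    """Split a hex NTSTATUS string (with or without a 0x prefix) into fields."""
    value = int(text, 16)
    return {
        "value": value,
        "severity": SEVERITY[(value >> 30) & 0x3],
        "facility": (value >> 16) & 0x0FFF,
        "code": value & 0xFFFF,
        "name": KNOWN_NTSTATUS.get(value, "unknown"),
    }


if __name__ == "__main__":
    for raw in ("0xc0000006", "C00000C4"):
        print(raw, decode_ntstatus(raw))
```

An in-page error alongside an unexpected network error would fit an executable being run from a network location that briefly became unreachable. Whether P: is a mapped network drive here is an assumption on my part, but it may be worth asking the IT team.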
  • The logs would be in whatever subfolder you specified in the INI's LogDirectory property. Given the above error, it's possible the engine server was denied write access to the log folder. I wonder if it's related to why your engines are crashing.

    The other event log is something I'd expect to see if the SRP Engine Server crashed. Was it running still when you discovered all the engines were gone?
  • No Engine server. No Engines
  • This is different than before you tried logging, correct? Before the Engine Server was fine but the engines were gone.

    Crashing on writing to the disk smacks of an environmental issue. Disk getting full perhaps?
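Both environmental suspects raised so far, a log folder that cannot be written to and a disk running out of space, can be checked with a few lines of Python. This is only a rough sketch: `can_write_to` and `free_megabytes` are hypothetical helper names, and the result only means something if the script runs under the same Windows account as the engine server.

```python
import shutil
import tempfile


def can_write_to(directory: str) -> bool:
    """Return True if a temporary file can be created and written in directory."""
    try:
        # The probe file is deleted automatically when the with-block exits.
        with tempfile.NamedTemporaryFile(
            dir=directory, prefix="probe_", suffix=".tmp"
        ) as handle:
            handle.write(b"probe")
            handle.flush()
        return True
    except OSError:
        # Covers a missing folder, denied permission and I/O failures alike.
        return False


def free_megabytes(directory: str) -> int:
    """Return the free space, in whole megabytes, on the disk holding directory."""
    return shutil.disk_usage(directory).free // (1024 * 1024)
```

Point both at the folder named in LogDirectory. A one-off pass does not rule out an intermittent fault, so running it on a schedule and logging the results would tell you more.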
  • Have had both experiences in the past.
    Sometimes not all four engines.
    Sometimes everything gone, like this time.
    Sometimes all four engines gone but engineserver still running.
    Sometimes all four engines gone and engineserver still appearing in taskbar until you try to do something with it and then it disappears.

    Sufficient disk space available.

    Found the logs but they contain nothing bar the first line stating
    Start OpenEngine log - 2/7/2023 12:42:15


    I agree it's likely environmental but I've run out of ideas of where to point the IT team.

    Have come to accept that engines crashing is a given. Note, the engines related to the engineserver crash less often than other engines we have running. What other engines? Up to ten, to service the web and another ten or so running other permanent server processes.

    We can go weeks, possibly months problem free and then have a few days of panic and stress as things crash on the regular and then seem to sort themselves out again. As yet, no-one has identified anything that is out of the ordinary during the stress filled days.
  • edited February 2023
    Just attempted to start the log on one of the engineserver engines to see if it would create more informative results this time, and the engine crashed as soon as I released the button.
    A blank log file was created.

    Just the one engine crashed, all three others and the engineserver still functioning.

    Nothing in the Windows event viewer that corresponds with the crash

  • A new engine has now started to replace the one that crashed.

    Well done Kev.
    I'll get some mileage out of that.
  • Cool. Glad it's helping. Keep me posted if anything goes awry.