Reconnect engines to engine server
We have an engineserver managing four engines to process various requests that come in from the web.
Works fantastic, 99% of the time.
But sometimes the engines crash and we hear about it only after users start reporting that particular things don't seem to be happening anymore. That could be immediately but it also could be days later.
This morning I had such an example. I logged onto the server and the engineserver was still running and I could see the requests/commands in the engineserver but there were no corresponding engines.
What I've always done and what I did this time was exit the engineserver and restart it and just let the customer know it will be all good moving forward.
However, before today, I don't think I've ever looked at the engineserver itself after hiding it. That means I've never noticed the requests in the editbox. Either that or they remain in the editbox only till they are processed by a waiting engine?
But it led me to thinking, if the requests I was seeing were still to be processed because the engineserver was waiting for an idle engine, is there a way I can start an engine or four and have them reconnect to the waiting engineserver and therefore catch up on the idle requests? Or has the approach I've always taken, of accepting those things are just lost, the reality?
Comments
Are these engines full-on crashing, or are they hanging/getting stuck?
It’s not the first time but it’s infrequent.
This time all four engines were still visible, but each showed the Windows message "oengine.exe has crashed..."
1. I was only able to test this by closing engines myself. Since I don't have crashing engines, I can't say for certain that this new feature will detect such a thing, but it's got a better chance than the last version.
2. The request an engine was working on when it crashed is not recovered, but the failure is logged if you've turned on logging. So at least you can see what it was working on when it crashed.
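To illustrate the behaviour described in the two points above (and the original question of whether queued requests are lost), here is a minimal Python simulation of a generic supervisor pattern. It is not the SRP Engine Server's code. `Engine`, `run_server`, and the rule that any request containing "bad" crashes its engine are all hypothetical stand-ins. The point it shows: requests still waiting in the queue survive an engine crash and are picked up by a replacement engine, while only the in-flight request is lost and logged.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Engine:
    """Hypothetical stand-in for one engine process."""
    engine_id: int

    def process(self, request: str) -> str:
        # Simulated crash: any request containing "bad" kills this engine.
        if "bad" in request:
            raise RuntimeError(f"engine {self.engine_id} crashed")
        return f"{request} handled by engine {self.engine_id}"


def run_server(requests, pool_size=4):
    """Dispatch queued requests to a pool of engines, replacing any engine that dies.

    Returns (results, failed_log). Pending requests stay in the queue across a
    crash; only the request the crashed engine was working on is lost (and logged).
    """
    pending = deque(requests)
    idle = deque(Engine(i) for i in range(pool_size))
    next_id = pool_size
    results, failed_log = [], []

    while pending:
        engine = idle.popleft()
        request = pending.popleft()
        try:
            results.append(engine.process(request))
            idle.append(engine)                      # healthy engine returns to the pool
        except RuntimeError as exc:
            failed_log.append((request, str(exc)))   # log what it was working on
            idle.append(Engine(next_id))             # start a replacement engine
            next_id += 1
    return results, failed_log
```

For example, `run_server(["a", "bad", "b", "c"], pool_size=2)` completes "a", "b" and "c", and logs only "bad" as failed, with a new engine 2 taking over from the crashed engine 1.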
I'll put it in and we'll see how it goes.
And I've turned on logging now just to see what we get.
Do the logs get written somewhere obvious?
Faulting application name: SRPEngineServer.exe, version: 1.3.5.0, time stamp: 0x63e1967e
Faulting module name: ntdll.dll, version: 10.0.14393.5427, time stamp: 0x63368a30
Exception code: 0xc0000006
Fault offset: 0x00060319
Faulting process id: 0x1bd0
Faulting application start time: 0x01d93a9539141bd3
Faulting application path: P:\oi94\SRPEngineServer.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: f555d06c-7637-4508-819a-d87c766df65c
Faulting package full name:
Faulting package-relative application ID:
Program: SRP EngineServer MFC Application
File:
The error value is listed in the Additional Data section.
User Action
1. Open the file again. This situation might be a temporary problem that corrects itself when the program runs again.
2. If the file still cannot be accessed and
- It is on the network, your network administrator should verify that there is not a problem with the network and that the server can be contacted.
- It is on a removable disk, for example, a floppy disk or CD-ROM, verify that the disk is fully inserted into the computer.
3. Check and repair the file system by running CHKDSK. To run CHKDSK, click Start, click Run, type CMD, and then click OK. At the command prompt, type CHKDSK /F, and then press ENTER.
4. If the problem persists, restore the file from a backup copy.
5. Determine whether other files on the same disk can be opened. If not, the disk might be damaged. If it is a hard disk, contact your administrator or computer hardware vendor for further assistance.
Additional Data
Error value: C00000C4
Disk type: 0
The other event log is something I'd expect to see if the SRP Engine Server crashed. Was it running still when you discovered all the engines were gone?
Crashing on writing to the disk smacks of an environmental issue. Disk getting full perhaps?
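If you want to rule out a filling disk without logging on each time, a small scheduled check can flag a low-space volume before it becomes a problem. This is a minimal Python sketch using the standard library's `shutil.disk_usage`; the function name `low_on_space` and the 10% threshold are assumptions for illustration, not anything the SRP tools provide.

```python
import shutil


def low_on_space(path: str, min_free_fraction: float = 0.10) -> bool:
    """Return True if the volume holding `path` has less than min_free_fraction free.

    The 10% default is an arbitrary, hypothetical threshold.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total < min_free_fraction
```

On the server you would point it at the drive the engines run from, such as the P: drive shown in the event log (`low_on_space("P:\\")`), and raise an alert when it returns True.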
Sometimes not all four engines.
Sometimes everything gone, like this time.
Sometimes all four engines gone but engineserver still running.
Sometimes all four engines gone and engineserver still appearing in taskbar until you try to do something with it and then it disappears.
Sufficient disk space available.
Found the logs but they contain nothing bar the first line stating
I agree it's likely environmental but I've run out of ideas of where to point the IT team.
Have come to accept that engines crashing is a given. Note, the engines related to the engineserver crash less often than other engines we have running. What other engines? Up to ten, to service the web and another ten or so running other permanent server processes.
We can go weeks, possibly months, problem-free and then have a few days of panic and stress as things crash regularly, and then they seem to sort themselves out again. As yet, no one has identified anything out of the ordinary during the stress-filled days.
A blank log file was created.
Just the one engine crashed, all three others and the engineserver still functioning.
Nothing in the Windows event viewer that corresponds with the crash
A new engine has now started to replace the one that crashed.
Well done Kev.
I'll get some mileage out of that.