openshowvar periodic timeout issue in KRC4 UL version

  • Hi guys,

    I have been reading this forum for quite some time and usually I was able to find my answers in the past. But now I stumbled upon a new problem.

    We are working on a KR 6 R900-2 Robot with a KRC 4 Compact controller.

    We are using openshowvar to control the robot externally. So no motion logic inside the KRC4 itself.


    Due to a requirement for UL certification, we are using the UL version of the KRC in all future robotic cells.

    With these UL versions we are seeing communication timeouts between the KRC and the external PC (over TCP/IP).


    We found out that

    * These communication timeouts are around 3-6 sec.

    * The timeouts happen at the same time every day

    * The non-UL version of the KRC 4 doesn't encounter this problem


    We replaced the external PC and Ethernet switch...which seemed to help a bit in the beginning but the problem reoccurred.


    Has anyone encountered this problem before?


    Thank you all very much!

    Edited 2 times, last by Eli Dejo ().

  • i would say you are looking at it the wrong way.


    KSS is developed, debugged and released...

    Some of releases (but not all) are selected to go through UL certification process and as a result there are fewer UL releases compared to number of all KSS releases.


    To check if there is any difference I did binary comparison on UL/nonUL versions and they are exactly the same. Of course for comparison, both product versions need to be exactly the same - so first pick UL version, then find matching non-UL version and compare. Comparing version A.B.C.D with version A.B.C.E will say they are different - which is obvious from different version number.
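A binary comparison like the one described above can be sketched in a few lines. This is only an illustration, assuming both installations (or unpacked installers) are available as two directory trees; the function names are made up for the example and say nothing about how the original comparison was done.

```python
import hashlib
from pathlib import Path


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(a: Path, b: Path) -> list[str]:
    """Return relative paths that are missing on one side or differ in content."""
    da, db = tree_digests(a), tree_digests(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))
```

An empty list from `diff_trees` means the two trees are byte-for-byte identical, which is what you would expect for a UL and non-UL release with the same full version number.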


    if you need a UL system, you need to use UL hardware as well as UL certified KSS. if you have problem with one KSS release you can try different one (higher or lower) as long as major version matches - because that is what your software license covers. The downside is that there are fewer UL releases to choose from and if this does not solve it, it may take a while until next UL version release. Even then it is questionable if your issue will be resolved.


    Normal approach if there are problems is to contact KUKA for support but i doubt this will help since product you have problem with is not a KUKA product. if you are developing something custom, you are on your own. or in this case you will likely be told to talk to developers of openshowvar about support.




    Btw.


    I think your problem sounds somewhat similar to one very weird experience i encountered a while ago. It was with a PC app talking to robot via EKI. in this case external PC was acting as a TCP server while robot was acting as a TCP client. This was nothing special and was working solid, but eventually that PC was added to the client's network, to be managed by their IT so they can push updates.


    And sure enough, not long after, connection was getting lost from time to time. We could not quite put finger on it. So i was adding more and more workarounds and diagnostics code to capture anything suspicious but no dice... Then the issue would occur much more frequently, almost daily...


    So, after a really lengthy search, correlation was found with the help of Windows and EventViewer... It appeared that problem was triggered when IT maintenance script was run - usually in the middle of the night. This used to be run once a month then it was changed to every night because "it does not hurt...". And it did not cause problem every time, it was more like 50/50.


    With this clue we were able to narrow the issue down to one specific command and we were finally able to replicate the issue on our end... It was not over yet, this was still one of the ugliest battles i can remember but I was thrilled that we finally had something in sight. And i could replicate problem by simply typing a command in command box.


    Then the TCP server code was modified and tested until problem was resolved. Never had the issue again.

    1) read pinned topic: READ FIRST...

    2) if you have an issue with robot, post question in the correct forum section... do NOT contact me directly

    3) read 1 and 2

  • Thank you for the detailed answer panic mode.

    I was thinking of comparing KSS version. Happy to know that you did the same thing and found no difference.

    On the HW side the only addition we have is an LED indicator connected through X53, and that's about it. I thought that maybe something is different in KSS 8.6 vs KSS 8.6 UL, but I understand that's not the case.


    Maybe the problem is really 8.6 vs 8.5, because we have the same robots on the same network and the UL one has these timeout issues.


    Our external PC is running on linux. Do you think running Eventviewer on the KRC4 PC will help us find the problem?

  • Our external PC is running on linux. Do you think running Eventviewer on the KRC4 PC will help us find the problem?

    Maybe? IIRC, OpenShowVar runs in the Windows side of the robot's brain, but it still has to access the KUKA system hooks to get access to the VxWorks variables. If the "breakage" is in between Windows and VxWorks inside the KRC, I'm not sure there's a good diagnostic tool for that.


    Maybe Wireshark running in Windows on the KRC, monitoring the virtual network connection between Windows and VxWorks? I'd be very careful about this, though. Definitely take a KSR Image backup of the robot before installing WS, just in case the pcap drivers cause issues.


    I would suggest running Wireshark on the Linux PC as well. This might at least narrow things down a bit -- if the issue is on the network between the KRC and PC, WS should give some indication of where the chokepoint is.
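Alongside a packet capture, a small probe on the Linux PC can timestamp stalls for later comparison. The sketch below only measures how long a TCP connect takes. Because the handshake is completed by the controller's network stack, a fast connect during an OpenShowVar timeout would point away from the network and towards the application side. The host and port are placeholders you would fill in yourself (the proxy port is not stated in this thread), and the thresholds are arbitrary.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host: str, port: int, timeout: float = 2.0):
    """Return the TCP connect time in seconds, or None if it failed or timed out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections and socket timeouts
        return None


def watch(host: str, port: int, interval: float = 1.0, slow: float = 0.5, rounds: int = 60):
    """Probe repeatedly and print a UTC-stamped line for every slow or failed attempt."""
    for _ in range(rounds):
        elapsed = probe(host, port)
        if elapsed is None or elapsed > slow:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            print(f"{stamp} stall: {elapsed}")
        time.sleep(interval)
```

Running `watch(...)` across the time of day when the timeouts usually occur gives a list of timestamps to line up with the capture. Keep in mind that every probe opens an extra connection to the controller, so a gentle interval is sensible.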

  • Hi guys,

    So after some research, I found that the clocks on the KRC4 and the external PC are not synced.

    That is why searching for events in Event Viewer or scheduling pings didn't find anything.

    After shifting the timestamps I found that events occur periodically, exactly when I am getting the timeouts.
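The timestamp shift can be done mechanically once the clock offset is known. A minimal sketch, assuming the timeouts were logged on the PC, the events were exported from the controller as (timestamp, source) pairs, and the offset was measured by hand; all names and the 10-second window are illustrative.

```python
from datetime import datetime, timedelta


def match_events(timeouts, events, offset, window=timedelta(seconds=10)):
    """Pair each PC-side timeout with controller events that fall within `window`
    once the controller timestamps are shifted by `offset` (PC clock minus KRC clock)."""
    matches = {}
    for t in timeouts:
        matches[t] = [name for when, name in events if abs((when + offset) - t) <= window]
    return matches
```

For example, with an offset of seven minutes, a controller event stamped 01:53:00 is compared against PC-side timeouts around 02:00:00.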


    The thing is that it's the same process id which corresponds to the "service control manager"

    These are the messages I am getting:

    and then a following similar event with one change :" The start type......from demand start to auto start"


    I saw that the thread ID differs between the events, so I tried to understand which process it's talking about. It seems that the thread IDs point to several KUKA processes:

    OPC_UA, Kuka scheduler, cross3, smart_hmi and more.


    I am planning to run the thread logger at the exact times of the timeouts and pinpoint the exact processes that open these threads.


    Has anyone encountered this kind of problem? Any tips on how to counter this once I find the processes/services causing it? I can't just close them. Should I change the service startup type from manual to automatic?


    Appreciate any help

  • you need a plan but you don't seem to have one and you are letting important things slide... things that could point you in the right direction.


    your screenshot only showed the crumbs, an equivalent of a timestamp but there is also Details tab which may offer more clues. when there is a difficult problem, you need all the help you can get and ignoring pieces of information marked as Details is not a winning strategy.


    this also means not looking only at one event but see the sequence of events - before and after. maybe installer just did something important and then went back to sleep (aka "start on demand"). what did it do just before this event?



    for example:

    you say this causes problem at specific time of the day. that could mean something is probably scheduled so you can start monitoring things just before next event is about to occur. then you can try to catch which processes are running, what resources are consumed etc. (RAM, CPU...).


    also, did you try searching the internet to see what others have dealt with when encountering the same event?


    something called "Windows Modules Installer" is obviously a standard part of windows but question is what it does.... and it should not be finding and installing anything new while the robot is in the middle of operation.


    to me that sounds like some driver that may not work correctly, possibly something you have installed. note that "installed" could be result of something being merely "connected" - a mouse, USB stick, keyboard, monitor - ANYTHING. because anything connected can cause all kind of unanticipated issues. and when connected hardware has also hardware issues (loose connection, low battery or what not) things can really get funky.


    i saw cases where robot would even fail to boot if certain thing was connected to the KPC. someone also mentioned case where the issue turned out to be the operator trying to charge his phone on KRC4. so try to remove anything that was not part of original robot system supply and see if the problem (whatever it may be) still happens. or try different device or try using different USB port etc.


    did you try checking device manager for anything appearing as "new hardware"?


    i just did quick search for "Windows Modules Installer service was changed". several results suggest issue can be because of something added (hardware or software) and it could be causing high CPU usage.


    obviously such thing would have an effect on anything that is not real time. and Windows is not RT ... which is why KSS uses vxWorks for all time critical processes. so anything running on Windows will be of low priority and subject to delays and timeouts. That includes anything you may have added there for the purpose of communication with the robot.


  • panic mode, thanks for the answer. For some reason, I didn't get an email update about your post. But it seems I have solved the problem (at least in the lab). I used the Process Monitor program and found out that Trusted Installer was trying to check for Windows updates, even though it was configured to run at night.


    Using the program "Process monitor" enabled pinpointing (using the thread ID) what’s happening in detail. Looking at the log it seems that this is what happens:

    1. Trusted Installer starts and tries to run the wuauserv service (Windows Update)
    2. Windows Update tries to check for new updates (?)
    3. Windows Update fails
    4. The WerSvc service (Windows Error Reporting) is activated and tries to report the error
    5. WerSvc fails and tries again
    6. Trusted Installer closes the thread



    I tried disabling several services:

    1. Windows Error Reporting - still got the errors in the Event Viewer
    2. Windows Update service - still got the errors in the Event Viewer
    3. Windows Modules Installer - corresponds to Trusted Installer. After disabling this service, the issue seems to have stopped.
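For repeating this on several cells, the three services tried above can also be set to the disabled start type with `sc config` from an elevated prompt. The sketch below only builds and prints the commands and does not run anything. The internal names `WerSvc`, `wuauserv` and `TrustedInstaller` are the usual Windows ones, but that is an assumption to verify on the controller itself, and Windows may refuse the change for protected services.

```python
# Display name -> internal service name (assumed; verify with `sc query` on the KRC).
SERVICES = {
    "Windows Error Reporting": "WerSvc",
    "Windows Update": "wuauserv",
    "Windows Modules Installer": "TrustedInstaller",
}


def disable_command(service: str) -> list[str]:
    """Build the sc.exe argument list that sets a service's start type to disabled."""
    # sc.exe expects "start=" and its value as two separate tokens.
    return ["sc", "config", service, "start=", "disabled"]


for label, name in SERVICES.items():
    print(f"{label}: {' '.join(disable_command(name))}")
```

Printing first and running by hand keeps the change reviewable, which matters on a controller where a backup image should exist before touching Windows services.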


    So thank you for your support.

  • The thing is that now, in the production robotic cells, I am getting the same error, but the "smoking gun" isn't a Windows service; it is the cross3 KUKA app.


    I am going to try and use the process monitor to get more details about what's happening there.

    If you guys have any tips on how to solve issues with cross3, it will be highly appreciated.

    Edited once, last by Eli Dejo ().

  • Hi guys,


    MOM, I couldn't understand what you said.


    An update:

    I tried to access the robot remotely (it's on a remote site) in order to set up logging of the processes running when the timeouts happen. I noticed that just by connecting over RDP, I caused a timeout again.

    After some additional checks, it seems that every time the CPU usage is high (when opening the event log viewer, connecting using RDP or even switching applications in Windows) a timeout happens.

    Hence I needed to check what the CPU usage is when these events happen without triggering them manually. For that, I used the program "Performance Monitor" included with Windows, which can log CPU usage counters over time.



    A clear relation can be seen between CPU usage and the timeouts. On top of that, we can trigger timeouts by increasing CPU usage. So high CPU usage looks like a very likely cause of the timeouts.
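The relation can also be checked numerically instead of by eye. A minimal sketch, assuming the CPU log has been reduced to (timestamp, percent) pairs on the same clock as the timeout list; the 30-second window and the function name are arbitrary choices for the example.

```python
from datetime import datetime, timedelta
from statistics import mean


def cpu_near_timeouts(samples, timeouts, window=timedelta(seconds=30)):
    """Split CPU samples into those within `window` of any timeout and the rest,
    and return the mean CPU percentage of each group as (near, far)."""
    near, far = [], []
    for when, cpu in samples:
        bucket = near if any(abs(when - t) <= window for t in timeouts) else far
        bucket.append(cpu)
    return (mean(near) if near else None, mean(far) if far else None)
```

A large gap between the two means supports the correlation; similar values would suggest looking elsewhere.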


    After analyzing the Process Monitor output, it seems that 90% of the activity logged in that 3-minute time slot came from TiWorker.exe, which is another application that has to do with Windows updates.


    I have again disabled the same services I disabled in the lab. Waiting for results.


    But I can't get past the fact that CPU usage can get so high so quickly on the KRC4. I want to be able to at least connect using RDP.


    In the previous cells, which had KSS 8.5, non-UL, and Windows 7 Embedded, these issues didn't happen.






  • The CPU on the KRC controllers is a very low-end Celeron; the one I'm working on right now is a dual-core 2 GHz Celeron with 4 GB of RAM. It's very easy to overwhelm such a system.


    No idea if it's possible to upgrade the CPU (it's socketed, but no idea how KSS will react to that), but going to any 4-core CPU might fix all your issues.

  • what software exactly was added to the KRC?

    does KRC perform as expected? is the robot affected by changes?

    are you sure that the bottleneck is KRC and not the external PC or application running on that external PC?


  • One thing to keep in mind here is the split-brain nature of KRCs. VxWorks/KSS is running hard realtime, zero-jitter, for all the critical robot operations (Motion, I/O, safety, etc). Windows gets the short end of the stick, with a lower priority for system resources, and OpenShowVar is a Windows application that talks to the VxWorks/KSS side of the KRC through the virtual network connection (via Cross3). Even the KUKA-HMI user interface gets shorted when there's competition for system resources.


    I can only theorize here, but I'd guess that Windows gives RDP connections higher priority by default, and OpenShowVar probably isn't very high priority. And while other Windows-side apps simply get "laggy" (slow refresh on the KUKA HMI, for example), OpenShowVar appears to depend on regular updates every X ms. So OSV is trying to use the non-realtime side of the controller to perform realtime tasks. This works as long as there's plenty of system resources to go around, but when things get tight....
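If the Windows side can stall for a few seconds, one option is to make the external client tolerate it rather than treat the first late reply as fatal. A hedged sketch of that idea: `read` is a hypothetical stand-in for whatever function your client uses to fetch a variable, not an OpenShowVar API, and the attempt count and delays are placeholders to tune against the 3-6 sec stalls reported here.

```python
import time


def read_with_retry(read, name, attempts=3, delay=0.5):
    """Call read(name), retrying on timeout-like errors with a growing pause.

    `read` is a hypothetical callable standing in for the client's variable
    read; it is expected to raise OSError (TimeoutError is a subclass) on failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return read(name)
        except OSError:
            if attempt == attempts:
                raise  # give up and let the caller decide what to do
            time.sleep(delay * attempt)
```

This only hides short stalls from the application logic. It does not make the link real-time, so anything that must react within a fixed time still belongs on the controller side.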

  • OpenShowVar is a client. it ships with KVP which is a server. KVP is the only piece of software that needs to be added on the robot side. i was playing with KVP couple of years ago. now i use C3B - have not done any benchmarking but i just did a crude comparison since KVP and C3B serve the same purpose and C3B is backwards compatible with KVP.


    tried polling same list of variables using same setup but using different server and client combo.

    for data i picked:

    $AXIS_ACT $POS_ACT $TOOL $BASE $LOAD $IN[] $OUT[] tool_data[] base_data[] $PROG_INFO[]

    which is plenty to look at (fills the 4k screen using 8pt font)


    Contender #1

    Server = KVP (compiled, VB6)

    Client = client based on KVP client example, compiled i think it was in VB.NET or C# don't recall.


    Contender #2

    Server = C3B (compiled, C++)

    Client = custom client written from scratch in VB.NET



    in both cases server was running inside VMware OL8.5 and client was running on the Win10 host (same laptop). Then task manager in OL was used to monitor server CPU load.


    KVP sits around 3-4% CPU usage and occasionally peaks at 8-9%

    C3B sits around 0% and occasionally peaks at 1-2%


    again, the numbers are purely from observing CPU load using task manager. values don't really sit in one place and they are rounded to an integer.


    so just to get the second opinion, i tried checking what resource monitor would come up with.

    here these values are averaged and therefore much more stable, also they include decimal places allowing better idea how they really compare. and it turns out that KVP was averaging 2.6% of the CPU, while C3B was averaging 0.1%.


    side note, KVP uses ASCII messages which are shorter (strings are 8bit). the C3B client was favoring the newer message format that C3B uses, where most strings in the payload take twice as much room (16bit). as expected, network load in resource monitor showed 860kbps for KVP and was switching between 1 and 2Mbps when running C3B.
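The factor of two for strings is easy to see with a quick count. This sketch assumes the 16-bit format behaves like UTF-16 without a byte order mark, which is only a stand-in since the actual C3B message layout is not described here; message headers and framing are ignored.

```python
def payload_sizes(names):
    """Return total bytes needed for the names as 8-bit ASCII and as 16-bit UTF-16."""
    ascii_bytes = sum(len(n.encode("ascii")) for n in names)
    utf16_bytes = sum(len(n.encode("utf-16-le")) for n in names)
    return ascii_bytes, utf16_bytes


print(payload_sizes(["$AXIS_ACT", "$POS_ACT"]))  # (17, 34)
```

The measured network load is not exactly double because only the string parts of each message grow, not the fixed-size fields around them.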

