
BUG: scheduling while atomic: EtherCAT_Master

Anonymous
2019-03-11
2019-03-29
  • Anonymous - 2019-03-11

    Originally created by: michele_sponchiado

    Hi, we are currently using CODESYS Control V3 version 3.5.8.10 (Mar 3 2016) on a custom board, and we are experiencing random kernel oops BUG messages ("BUG: scheduling while atomic"), typically once every 8-10 hours, while the EtherCAT_Master module is executing. See the following kernel log, taken with kernel debug messages enabled:

    192.168.0.196 login: BUG: scheduling while atomic: EtherCAT_Master/0x00000001/625, CPU#0
    [<c003bb7c>] (dump_stack+0x0/0x14) from [<c02fd3cc>] (__schedule+0x558/0x7cc)
    [<c02fce74>] (__schedule+0x0/0x7cc) from [<c02fd7a0>] (schedule+0x48/0x108)
    [<c02fd758>] (schedule+0x0/0x108) from [<c02fedd8>] (rt_spin_lock_slowlock+0xf8/0x1f4)
     r5 = A0000013  r4 = C18FA000
    [<c02fece0>] (rt_spin_lock_slowlock+0x0/0x1f4) from [<c02ff17c>] (rt_spin_lock+0x40/0x44)
    [<c02ff13c>] (rt_spin_lock+0x0/0x44) from [<c007134c>] (futex_lock_pi+0x1a4/0x978)
    [<c00711a8>] (futex_lock_pi+0x0/0x978) from [<c0071fdc>] (do_futex+0x4bc/0xf80)
    [<c0071b20>] (do_futex+0x0/0xf80) from [<c0072b08>] (sys_futex+0x68/0xfc)
    [<c0072aa0>] (sys_futex+0x0/0xfc) from [<c0037018>] (__sys_trace_return+0x0/0x28)
     r8 = C0037048  r7 = 000000F0  r6 = 406B1490  r5 = 0031E5B8
     r4 = 00000000
    
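    For reference, the message itself comes from schedule(): the kernel prints it when a task enters the scheduler while the CPU is still in atomic context (preempt_count != 0), and the fields after the task name are the preempt_count value (0x00000001 here, i.e. one outstanding preempt_disable level) and the PID. A paraphrased sketch of that check, assuming it matches what the 2.6.18 kernel does (not the literal kernel source):

        /* Paraphrased sketch of the "scheduling while atomic" check in
         * kernel/sched.c; not the literal 2.6.18 source. */
        if (unlikely(in_atomic() && !current->exit_state)) {
                printk(KERN_ERR "BUG: scheduling while atomic: %s/0x%08x/%d\n",
                       current->comm, preempt_count(), current->pid);
                dump_stack();
        }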

    In our application the EtherCAT period is set to 4 ms. The EtherCAT stack runs stably (checked over a period of a week), but when the BUG occurs while the axes are moving, their movement becomes rough for a few milliseconds and then returns to smooth.

    We set a trigger on the axis drives (we are using Sanyo RS3) to check whether an EtherCAT error is generated; the trigger is set on the "" driver-internal variable, but no errors are generated on the EtherCAT stack, so I can state that the EtherCAT communication runs fine: if even a single frame is lost, the driver immediately generates the trigger.
    We triple-checked the code in our EtherCAT_Master PLC task but found nothing suspicious; the code is quite simple, it just copies the currently calculated axis positions (these are generated by a different processor and are always available to the ARM) into the EtherCAT variables that hold the positions. We also set the positions to a fixed value and the oops still appears, so the BUG seems to be independent from the code written in our module.

    The Linux kernel version is 2.6.18 and the CPU is an ARM9 @ 444 MHz.
    We monitored the jitter and found that it peaks at about 8 ms when the BUG is generated, while normally the maximum jitter stays well below 1 ms.
    Looking into the reasons for the BUG, it seems the problem is that the EtherCAT_Master task goes to sleep while holding a spinlock or something similar.
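    As an illustration of that failure mode, here is a minimal hypothetical kernel-code sketch (not code from this system): any call into a sleeping function while preemption is disabled is enough to produce exactly this BUG message.

        /* Hypothetical illustration only, not code from this system:
         * msleep() ends up in schedule(), and calling it while the CPU
         * is in atomic context (preempt_count > 0) triggers
         * "BUG: scheduling while atomic". */
        #include <linux/preempt.h>
        #include <linux/delay.h>

        static void bad_wait_example(void)
        {
                preempt_disable();      /* enter atomic context          */
                msleep(10);             /* sleeps -> schedule() -> BUG   */
                preempt_enable();
        }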

    Can you help us with this issue?

    If you need some more information, please do not hesitate to contact us!

    BR

    Michele Sponchiado

     
  • Anonymous - 2019-03-12

    Originally created by: michele_sponchiado

    I just wanted to add that after a week of testing we have the following situation:

    The maximum jitter value has risen to 12 milliseconds; the EtherCAT communication is still OK, the inverters are OK, and there is no trigger from the EtherCAT error frame rate.

    Do you have any news about this issue?

    BR
    Michele Sponchiado

     
  • Anonymous - 2019-03-14

    Originally created by: michele_sponchiado

    Let me add some more information:

    Could it be that the EtherCAT_Master task sleeps while waiting for a resource used by the other PLC tasks?
    Apparently the EtherCAT_Master PLC code does not read from or write to shared resources, so I wouldn't expect it to sleep...
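    One hint from the backtrace above: the futex_lock_pi frame is the kernel side of a contended userspace mutex with priority inheritance, which is what blocking on a resource shared with other tasks would look like. A minimal userspace sketch of such a mutex (illustrative only, not the CODESYS runtime's actual code; all names are made up):

        /* Illustrative only: a contended PTHREAD_PRIO_INHERIT mutex
         * reaches the kernel through sys_futex(FUTEX_LOCK_PI), the path
         * seen in the backtrace.  Not the CODESYS runtime's code. */
        #include <pthread.h>
        #include <unistd.h>

        static pthread_mutex_t shared_resource_lock;    /* made-up name */

        static void *plc_task(void *arg)                /* made-up name */
        {
                pthread_mutex_lock(&shared_resource_lock);   /* contended -> FUTEX_LOCK_PI */
                usleep(1000);                                /* pretend to use the resource */
                pthread_mutex_unlock(&shared_resource_lock);
                return NULL;
        }

        int main(void)
        {
                pthread_mutexattr_t attr;
                pthread_t t1, t2;

                pthread_mutexattr_init(&attr);
                pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
                pthread_mutex_init(&shared_resource_lock, &attr);

                pthread_create(&t1, NULL, plc_task, NULL);
                pthread_create(&t2, NULL, plc_task, NULL);
                pthread_join(t1, NULL);
                pthread_join(t2, NULL);
                return 0;
        }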

     
  • Anonymous - 2019-03-29

    Originally created by: michele_sponchiado

    Hello, dear CODESYS forum!
    The problem disappeared after we removed a patch in the Linux kernel SD card driver that forced an msleep() while waiting for a long SD card status-query reply.
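    For anyone who hits the same thing, here is a hypothetical sketch of the kind of patch described, i.e. a status-polling helper that forces an msleep() while waiting for the card; the function and type names are invented for illustration, this is not the actual patch:

        /* Hypothetical reconstruction for illustration only; not the
         * actual SD card driver patch.  sd_dev and card_ready() are
         * made-up names.  Forcing msleep() between status polls puts
         * the caller to sleep, and if that path is ever reached from an
         * atomic section the kernel reports "scheduling while atomic". */
        static int wait_until_card_ready(struct sd_dev *dev)
        {
                while (!card_ready(dev))        /* poll card status     */
                        msleep(10);             /* the forced sleep     */
                return 0;
        }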
    Best regards and thanks for your help!
    Michele

     
