naasking

DSM 6.0 network drops? SMB particularly bad


I recently upgraded from a rock solid DSM 5 to DSM 6 for some much needed features. The upgrade went pretty smoothly, but the network and file services on DSM 6 update 10 seem to periodically drop. A 1GB file transfer over SMB or NFS will be chugging along at 100MB/s and then suddenly drop to zero. It sometimes resumes maybe 30 seconds later.

 

Drops aren't predictable that I can see. Sometimes I won't see one for quite some time, and sometimes it happens repeatedly in the span of 10 minutes. I don't see anything in the SMB logs, even if I log in via ssh and inspect them manually. SMB seems to get hit the worst. Right now I'm on the NAS via ssh, but can't load it in Windows Explorer over SMB.

 

Usually changing the supported SMB version fixes the issue relatively quickly, because that resets the network configuration thus restarting some services. But not always.

 

About the only error I see in /var/log/messages is the following:

 

2017-04-20T17:33:14-04:00 MagiSAN synow3: net_get_mac.c:165 ioctl mac failed
2017-04-20T17:33:14-04:00 MagiSAN synow3: net_get_mac.c:63 Failed to get local original mac

 

At this point I'm thinking it might be a network driver issue, but I don't see any issues being logged. Unless I'm missing some log somewhere? Any suggestions would be much appreciated.


One question: are you using the MAC from the loader or the real MAC from the NIC?


The DSM install instructions suggested changing the MAC in the grub file, which I did slightly change from the default (changed 2-3 chars), but I didn't think to use the actual NIC MAC.

 

I just tried changing the MAC to a randomly generated one and the problems persist (might actually be worse as transfer speeds are dramatically slower, but that's possibly due to cold caches).

 

The usual methods of checking the MAC address just report the random number I input... Ah, just found

ethtool -P eth0

which returned something different, so I will try that.
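A minimal sketch of that comparison, assuming a Linux shell on the NAS and that the interface is `eth0` as above. `ethtool -P` prints the burned-in address as `Permanent address: <mac>`, while `/sys/class/net/eth0/address` holds the address currently in use, which the loader may have overridden. The helper names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the NIC's burned-in MAC with the one in use.

# Extract the MAC from `ethtool -P <iface>` output read on stdin,
# e.g. "Permanent address: 00:11:22:33:44:55".
perm_mac() {
    awk '/^Permanent address:/ { print tolower($3) }'
}

# Succeed if two MAC strings are equal, ignoring letter case.
same_mac() {
    [ "$(printf '%s' "$1" | tr 'A-F' 'a-f')" = "$(printf '%s' "$2" | tr 'A-F' 'a-f')" ]
}

# On the NAS (not run here):
#   burned=$(ethtool -P eth0 | perm_mac)
#   active=$(cat /sys/class/net/eth0/address)
#   same_mac "$burned" "$active" && echo "using real MAC" \
#       || echo "MAC overridden: $active (real: $burned)"
```

If the two differ, the address DSM is using came from the loader configuration rather than from the card.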


Try starting XPEnology with the option

set mac1=

This will use the actual MAC from the NIC.

 

PS: I just checked transfers on my setup: bare-metal DSM 5.2 transferring a file to DSM 6.1 on ESXi 6.0. File size 4GB, transfer speed about 75MB/s over a 1Gb network, with the MAC set up as above.


Same problem with the MAC from "ethtool -P eth0" too. NAS is an AMD machine, so dmesg at the beginning reports:

[    0.000000] CPU: vendor_id 'AuthenticAMD' unknown, using generic init.
              CPU: Your system may be unstable.

Pretty sure I specified the correct grub line to boot though, or is that message an indication that I got it wrong?

 

I just saw your other message about the empty line, so I'll try that next.


Transfer from a CIFS-mounted remote folder is about 75 MB/s, which is not the maximum; it could be higher, but my DSM 6.1 shows that the volume is 100% occupied.

With the remote folder mounted as NFS, transfers can be as high as 125MB/s.


Leaving the mac1 setting blank seems worse. Full network disconnects seem to happen more frequently. The eth0 MAC actually changes on each boot, so I think it's random if left empty and doesn't actually use the device MAC.


That's weird; in my case, leaving mac1 etc. blank takes the MAC addresses from the ESXi settings.


Still getting network drops. I can reproduce it pretty consistently now: I just have to boot and then start a large SMB transfer. SMB quickly fails and I can't ping the server for a minute, then IP connectivity is restored. I can reach SMB by IP, but not by name. Lookup by name still hasn't come back after 5 minutes. I usually restart the Samba process and lookup by name works again.

 

No messages in /var/log/messages, no errors in dmesg that I haven't already mentioned. I'm now using the MAC I found and inputted the proper MB serial number, but no dice. I'm out of ideas.

 

Let me know if you have any suggestions for where to look in the logs to diagnose the problem, and thanks for your help!


This is the last thing I can think of: maybe the issue is not related to Ethernet. Did you use Resource Monitor in the web interface to check what happened to the system during the file copy?


It has to be network related. I can't even ping the NAS when the SMB copy halts. Resource monitor and ssh connections also temporarily fail. ssh reconnects when IP comes back up, but the web interface logs me out, so I can't view the resource monitor throughout.

 

ifconfig eth0 reports 0 errors. SMB transfer speed is brutally slow when it reconnects (max 21MB/s vs. typical 100+MB/s), even after I restart smbd, but seems to be a little more reliable. If I switch from SMB 2 with jumbo frames to just SMB 2, transfer speed goes back up to 100+MB/s. This whole thing is just weird.
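Because dropping jumbo frames restores the speed, one thing that may be worth checking is whether every hop between the client and the NAS really passes jumbo-sized frames. A rough sketch, assuming an MTU of 9000 (the post does not state the value) and a Linux client with iputils `ping`; the host name `nas` is a placeholder. The ICMP payload that exactly fills one packet is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet
# for a given MTU: MTU - 20 (IPv4 header) - 8 (ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# From a Linux client (not run here; "nas" is a placeholder host):
#   ping -M do -c 4 -s "$(icmp_payload_for_mtu 9000)" nas
# -M do forbids fragmentation, so timeouts or "message too long" errors
# suggest some device on the path is not passing jumbo frames.
# On a Windows client the equivalent would be: ping -f -l 8972 nas
```

If the large ping fails while a 1472-byte one succeeds, the switch or the client NIC is a more likely suspect than SMB itself.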


Hmm, I just noticed that synology is only recognizing 2.5GB of the 4GB of RAM installed in the NAS. dmesg reports the following error:

 

[Thu Apr 20 18:51:27 2017] SMBIOS 2.6 present.
[Thu Apr 20 18:51:27 2017] DMI: System manufacturer System Product Name/E35M1-I DELUXE, BIOS 1501 04/25/2013
[Thu Apr 20 18:51:27 2017] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[Thu Apr 20 18:51:27 2017] e820: remove [mem 0x000a0000-0x000fffff] usable
[Thu Apr 20 18:51:27 2017] e820: last_pfn = 0x13f000 max_arch_pfn = 0x400000000
[Thu Apr 20 18:51:27 2017] MTRR default type: uncachable
[Thu Apr 20 18:51:27 2017] MTRR fixed ranges enabled:
[Thu Apr 20 18:51:27 2017]   00000-9FFFF write-back
[Thu Apr 20 18:51:27 2017]   A0000-BFFFF write-through
[Thu Apr 20 18:51:27 2017]   C0000-D1FFF write-protect
[Thu Apr 20 18:51:27 2017]   D2000-E7FFF uncachable
[Thu Apr 20 18:51:27 2017]   E8000-FFFFF write-protect
[Thu Apr 20 18:51:27 2017] MTRR variable ranges enabled:
[Thu Apr 20 18:51:27 2017]   0 base 000000000 mask F00000000 write-back
[Thu Apr 20 18:51:27 2017]   1 base 0A7F00000 mask FFFF00000 uncachable
[Thu Apr 20 18:51:27 2017]   2 base 0A8000000 mask FF8000000 uncachable
[Thu Apr 20 18:51:27 2017]   3 base 0B0000000 mask FF0000000 uncachable
[Thu Apr 20 18:51:27 2017]   4 base 0C0000000 mask FC0000000 uncachable
[Thu Apr 20 18:51:27 2017]   5 disabled
[Thu Apr 20 18:51:27 2017]   6 disabled
[Thu Apr 20 18:51:27 2017]   7 disabled
[Thu Apr 20 18:51:27 2017] e820: update [mem 0xa7f00000-0x13effffff] usable ==> reserved
[Thu Apr 20 18:51:27 2017] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 1007MB of RAM.

I don't know if this was the case under DSM 5, but it still seems strange that such an old mobo would still have this problem.
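The numbers in that log can be checked by hand. Reading it as-is: MTRR 0 marks the first 4 GiB as write-back, MTRRs 1 to 4 mark everything from 0xA7F00000 up to 4 GiB as uncachable, and `last_pfn = 0x13f000` puts the end of RAM at 0x13F000000. A quick sketch of the arithmetic (the function name is just for illustration):

```shell
#!/bin/sh
# Size in MiB of the address range [start, end), both given as hex or decimal.
range_mib() {
    echo $(( ($2 - $1) >> 20 ))
}

# RAM above 4 GiB that no MTRR covers (last_pfn 0x13f000 -> 0x13f000000):
range_mib 0x100000000 0x13f000000    # 1008, in line with the ~1007MB in the warning
# Write-back RAM left below the uncachable window starting at 0xa7f00000:
range_mib 0x0 0xa7f00000             # 2687, close to the 2.5GB that DSM reported
```

So the missing memory is consistent with the kernel discarding everything the MTRRs do not mark as cacheable, rather than with a faulty DIMM.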


I confirmed that the default boot selection was not AMD for some reason, which is why I received that AuthenticAMD error in dmesg. When I boot with the correct option I see 3.6 GB of memory instead of the previous 2.5 GB. 500MB would be reserved for onboard devices, like built-in graphics and network, which seems reasonable, so that's solved.

 

I also reset my BIOS memory timings to auto, and the network seems pretty stable so far. So fingers crossed...


Strange, though, that you were able to boot without selecting the AMD boot line in the GRUB menu.

Anyhow, use your NIC's real MAC address. Not that this was the problem, but this is what I recommend to everyone. No need to generate anything. If your problem is solved, please add [SOLVED] to the title.


Unfortunately it's not solved. Network drops are still fairly common over SMB at least. I noticed this sequence of errors in /var/log/synoservice.log right around the time of the network drop:

2017-04-21T09:20:33-04:00 MagiSAN synoservice: service_type_action.c:130 synoservice: Type [LINK_SENS] restart finished
2017-04-21T09:21:07-04:00 MagiSAN synoservice: service_resume_by_reason.c:12 synoservice: resume [avahi] by reason [ipv4_change] ...
2017-04-21T09:21:07-04:00 MagiSAN synoservice: service_restart.c:21 synoservice: restart [synotunnel] ...
2017-04-21T09:21:07-04:00 MagiSAN synoservice: service_restart.c:34 synoservice: [synotunnel] is not enabled, skip restart action ...
2017-04-21T09:21:07-04:00 MagiSAN synoservice: service_restart.c:52 synoservice: finish restart [synotunnel].
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:69 synoservice: Type [IP_SENS] restarting
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [nmbd] restart
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [ftpd-ssl] restart
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [snmp] restart
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [pppoerelay] restart
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [avahi] restart
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [iscsitrg] restart
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [ssdp] restart
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:69 synoservice: Type [LINK_SENS] restarting
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [nmbd] restart
2017-04-21T09:21:12-04:00 MagiSAN synoservice: service_type_action.c:82 synoservice: service [ssdp] restart
2017-04-21T09:21:14-04:00 MagiSAN synoservice: service_reload.c:20 synoservice: reload [nginx].
2017-04-21T09:21:14-04:00 MagiSAN synoservice: service_restart.c:21 synoservice: restart [nmbd] ...
2017-04-21T09:21:15-04:00 MagiSAN synoservice: service_restart.c:52 synoservice: finish restart [nmbd].
2017-04-21T09:21:15-04:00 MagiSAN synoservice: service_restart.c:21 synoservice: restart [avahi] ...
2017-04-21T09:21:15-04:00 MagiSAN synoservice: service_restart.c:52 synoservice: finish restart [avahi].
2017-04-21T09:21:15-04:00 MagiSAN synoservice: service_reload.c:46 synoservice: finish reload [nginx].
2017-04-21T09:21:15-04:00 MagiSAN synoservice: service_type_action.c:130 synoservice: Type [LINK_SENS] restart finished
2017-04-21T09:21:20-04:00 MagiSAN synoservice: service_reload.c:20 synoservice: reload [nginx].
2017-04-21T09:21:20-04:00 MagiSAN synoservice: service_restart.c:21 synoservice: restart [nmbd] ...
2017-04-21T09:21:21-04:00 MagiSAN synoservice: service_restart.c:52 synoservice: finish restart [nmbd].
2017-04-21T09:21:21-04:00 MagiSAN synoservice: service_restart.c:21 synoservice: restart [avahi] ...
2017-04-21T09:21:22-04:00 MagiSAN synoservice: service_restart.c:52 synoservice: finish restart [avahi].
2017-04-21T09:21:22-04:00 MagiSAN synoservice: service_reload.c:46 synoservice: finish reload [nginx].
2017-04-21T09:21:23-04:00 MagiSAN synoservice: service_type_action.c:130 synoservice: Type [IP_SENS] restart finished

This repeats many times, and the timing seems roughly correlated with network drops. You can see all the network services restarting (including OpenVPN which shows up on synosys.log), but I don't see any reasons logged. Any thoughts?
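One way to test that correlation rather than eyeball it is to pull out just the timestamps at which a service group starts restarting and compare them with the times of the drops. A small sketch, assuming the log format shown above; the function name is made up:

```shell
#!/bin/sh
# Print "timestamp group" for every "Type [GROUP] restarting" line read on stdin.
restart_events() {
    awk '/synoservice: Type \[[A-Za-z_]+\] restarting/ {
        group = $(NF - 1)          # e.g. "[LINK_SENS]"
        gsub(/\[/, "", group)      # strip the opening bracket
        gsub(/\]/, "", group)      # strip the closing bracket
        print $1, group
    }'
}

# On the NAS (not run here):
#   restart_events < /var/log/synoservice.log | tail -n 20
```

Lining this output up against timestamps from a continuous ping running on a client would show whether the restarts precede the drops or follow them.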

 

Some other observations:

* Some threads on the Synology forums[1] report the same log entries as above, but I'm not using any of PPTP, IPSEC/L2TP or AFP.

* Now that I've been up and running for a good half hour, network drops seem less frequent (could be a coincidence).

* The above restart messages occur every time the OpenVPN connection on the NAS connects/disconnects, presumably because the IP changes so all IP services are restarted. Perhaps the problem is ultimately with the OpenVPN connection.

 

[1] https://forum.synology.com/enu/viewtopic.php?t=118970


hi

 

Same issues here on my ASRock and HP Gen8 systems.

Look here, from page 3 on, and watch the videos. Is it the same issue you have? :smile:

With the ASRock system, I worked around it by lowering the memory from 4GB to 2GB.

With the HP Gen8, I switched over to ESXi entirely, with another SATA controller. Read that topic; everything is there.


@naasking

Have you managed to solve this?

I have almost exactly the same board.

I am using an Asus E35M1-M Pro with Realtek 8111E Gigabit.

SMB stalls and comes back every 30 seconds.

 

You could try the solution below: swapping the driver loading sequence.

But I solved mine by getting a refurbished Intel PRO/1000 PT.

On 2/5/2017 at 9:15 PM, Bear said:

I got tired of waiting for the other loader, and gave this a try.

I have an ASRock C2550D4I, and there was only one snag when installing this.

 

The network drivers didn't work (nothing after the "Booting the kernel."), but a change in the file like this post https://xpenology.com/forum/topic/6253-dsm-6xx-loader/?do=findComment&comment=57513 suggested worked!

 

Basically, in the grub.cfg file, this section should look like this:

 


function loadinitrd {
       if [ -s $img/$info ]; then
               if [ -n "$has_serial" ]; then
                       terminal_output --remove serial
               fi
               cat $img/$info
               if [ -n "$has_serial" ]; then
                       terminal_output --append serial
               fi
       fi
       if [ -s $img/$extra_initrd ]; then
               initrd $img/$extra_initrd $img/ramdisk.lzma
       else
               initrd $img/ramdisk.lzma
       fi
}
 

 

And it's this line you change:


initrd $img/$extra_initrd $img/ramdisk.lzma
 

 

Updated to .9 with no problem.

 


I have not managed to solve this. Flipping the image load order as you suggested makes my NAS inaccessible over the network. It appears to still boot except for the network drivers.

 

Another person suggested restricting the accessible RAM to 2GB which solved it for their AMD box, so I may try that next since I'm currently running with 4GB.


Just want to confirm that the latest boot loader changed the name to:

 

if [ -s $img/$extra_initrd ]; then
               initrd $img/$extra_initrd $img/rd.gz


Yes, I just swapped the order of the parameters that were already there. NAS no longer booted with network connectivity.

Edited by naasking

On 9/12/2017 at 3:29 AM, IG-88 said:

hi,

 

Continuing from here:

I have created a new v3 with the source of DSM 6.1.3 (15152).

There are drivers that would be useful but do not load after compiling (SATA/PATA). I marked them and commented the reason. Maybe I will find time to look into this and try to find the point in the kernel source where the function is and why it might happen (I'm no coder, so I don't expect to find out much); if someone else is able to find out, comment here.

All modules are tested with 6.1.3 (successfully loaded with insmod).

As before, it contains all the modules and firmware Jun has used, so in theory what worked OOTB with 1.02b should also work with this extra.lzma.

 

v.3: http://s000.tinyupload.com/?file_id=71323561438971251178

Modules log

 


net/ethernet
Atheros L2 Fast Ethernet support						atl2.ko

---temp remove - Broadcom 440x/47xx ethernet support				b44.ko
->
b44: Unknown symbol ssb_device_is_enabled (err 0)
b44: Unknown symbol ssb_pcicore_dev_irqvecs_enable (err 0)
b44: Unknown symbol ssb_bus_may_powerdown (err 0)
b44: Unknown symbol ssb_pcihost_register (err 0)
b44: Unknown symbol ssb_device_disable (err 0)
b44: Unknown symbol ssb_device_enable (err 0)
b44: Unknown symbol ssb_driver_unregister (err 0)
b44: Unknown symbol __ssb_driver_register (err 0)
b44: Unknown symbol ssb_bus_powerup (err 0)
b44: Unknown symbol ssb_clockspeed (err 0)
b44: Unknown symbol ssb_dma_translation (err 0)

Intel(R) PRO/100+ support							e100.ko
Intel(R) 82576 Virtual Function Ethernet support				igbvf.ko
Intel(R) PRO/10GbE support							ixgb.ko
Intel(R) 82599 Virtual Function Ethernet support				ixgbevf.ko
nForce Ethernet support								forcedeth.ko
Marvell MDIO interface support							mvmdio.ko


net/usb
USB RTL8150 based ethernet device support					rtl8150.ko
Realtek RTL8152 Based USB 2.0 Ethernet Adapters					r8152.ko
Conexant CX82310 USB ethernet port						cx82310_eth.ko
ASIX AX88xxx Based USB 2.0 Ethernet Adapters					asix.ko
Prolific PL-2301/2302/25A1 based cables						plusb.ko

	
block
Promise SATA SX8 support							sx8.ko

scsi
Adaptec AACRAID Support								aacraid.ko
Adaptec AIC94xx SAS/SATA Support						aic94xx.ko
3ware 9xxx SATA-RAID support							3w-9xxx.ko 3w-sas.ko
HP Smart Array SAS								hpsa.ko
Marvell 88SE64XX/88SE94XX SAS/SATA support					mvsas.ko
ARECA (ARC11xx/12xx/13xx/16xx) SATA/SAS RAID Host Adapter			arcmsr.ko
HighPoint RocketRAID 3xxx/4xxx Controller support				hptiop.ko
Intel(R) C600 Series Chipset SAS Controller					isci.ko
Marvell UMI driver								mvumi.ko


ata
---temp remove - NVIDIA SATA support						sata_nv.ko
---temp remove - Silicon Image SATA support					sata_sil.ko
---temp remove - VIA SATA support						sata_via.ko
---temp remove - Promise SATA TX2/TX4 support					sata_promise.ko
---temp remove - Promise SATA SX4 support					sata_sx4.ko
-> Unknown symbol syno_libata_index_get (err 0)


---temp remove - JMicron PATA support								pata_jmicron.ko
---temp remove - Marvell PATA support via legacy mode						pata_marevell.ko
---temp remove - VIA PATA support								pata_via.ko
---temp remove - CMD / Silicon Image 680 PATA support						pata_sil680
---temp remove - Intel PATA old PIIX support							pata_oldpiix.ko
---temp remove - Intel SCH PATA support								pata_sch.ko
---temp remove - Intel PATA MPIIX support							pata_mpiix.ko
---temp remove - SERVERWORKS OSB4/CSB5/CSB6/HT1000 PATA support					pata_serverworks.ko
-> Unknown symbol syno_libata_index_get (err 0)


firmware
bnx2/bnx2-mips-06-6.2.3.fw
bnx2/bnx2-mips-09-6.2.1b.fw
e100/d101m_ucode.bin
e100/d101s_ucode.bin
e100/d102e_ucode.bin
tigon/tg357766.bin

 


 

The newest version for testing is further down in the thread, as it's still a work in progress and for testing (no "stable" version).

 

How about loading the driver from IG-88?


I am having the same problem with my AMD system. I thought it was my network cable, but after reading this I am not so sure. I will update my loader to the latest version and see what happens.


The problem seems well-known in every Linux distro. My board has the Realtek 8111E chipset, and along with the 8168, these load the driver for the Realtek 8169. Unfortunately, this driver is known to produce unreliable Ethernet connections on these chipsets. I'm not sure what I can do to remedy this, as all the recommended solutions I can find suggest building the drivers Realtek provides from source, and I'm not set up to do that with the Synology images.

 

Anyone have any ideas?


Hmm, a module r8168 is installed and running. The version says 8.044.02, which seems to only be one minor version behind the up to date one on the Realtek site which is 8.045.08.

 

If someone could point me in the right direction to update this module, I could play around with this and hopefully get it working.
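Before swapping modules, it may help to confirm which driver is actually bound to the interface, since having r8168 loaded does not by itself rule out r8169 driving the NIC. A sketch, assuming `ethtool` is available on the box (it was used earlier in the thread) and that the interface is `eth0`; the helper name is illustrative. `ethtool -i` reports `driver:` and `version:` lines for the interface.

```shell
#!/bin/sh
# Read `ethtool -i <iface>` output on stdin and print "driver version".
bound_driver() {
    awk -F': *' '
        $1 == "driver"  { drv = $2 }
        $1 == "version" { ver = $2 }
        END { print drv, ver }
    '
}

# On the NAS (not run here):
#   ethtool -i eth0 | bound_driver
#   lsmod | grep -E '^r816[89]'     # which Realtek modules are loaded at all
```

If this reports r8168 with the 8.044.02 version, the vendor driver is already in use and the r8169 explanation becomes less likely.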
