Connection loss / instability?



Hi :)

 

I am currently setting up an xpen rig with the following specs:

 

 - z77-based MB

 - i7 3770k

 - DDR3 1066 RAM

 - 2x 120GB SSDs (RAID1 array -> volume1)

 - 4x 3TB HDDs (RAID5 array -> volume2)

 - HDDs are connected to a JMB585-based 5-port SATA controller (PCIe 3.0 x4), while SSDs are connected to the onboard SATA3 (6Gbps) ports.

 - quad-port GbE NIC AIC (PCIe 2.0 x4) -> bonded connection, 4x GbE using LACP (the network infrastructure is compatible with LACP)

 - 550W PSU

 

I used DS3617xs (loader 1.03b), DSM 6.2.3-25426 Update 3

 

Unfortunately, this setup is unstable. Occasionally, the connection is lost and the rig is no longer detected by Synology Assistant either.

All I can do is a hard reset and then it comes back online.

 

I wonder if it could be a problem with the network connection or if it could be resulting from an issue with sleep management. I already tried to disable memory compression and HDD hibernation.

 

What should I do next to troubleshoot this problem?

 

Thank you very much in advance for your help.

 

Best,

-a-

 

PS: after the last hard reset, DSM started data scrubbing on Volume2 upon reboot. Is this useful to locate the origin of the issue?
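
For anyone trying to pin down when such dropouts happen, here is a minimal sketch of a reachability logger that could run on another machine on the same LAN. It is only an illustration: `tcp_probe` and `watch` are hypothetical helper names, the address in the usage comment is a placeholder, and port 5000 assumes DSM's default HTTP port has not been changed.

```python
import socket
import time
from datetime import datetime
from typing import Callable, Iterator, Tuple


def tcp_probe(host: str, port: int = 5000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(probe: Callable[[], bool], checks: int,
          interval: float = 0.0) -> Iterator[Tuple[datetime, str]]:
    """Call `probe` `checks` times and yield a timestamped event on each
    change of state (the first result always counts as a change)."""
    last = None
    for _ in range(checks):
        up = probe()
        if up != last:
            yield datetime.now(), "UP" if up else "DOWN"
            last = up
        time.sleep(interval)


# Example use (placeholder address, one check per minute for 24 hours):
# for stamp, state in watch(lambda: tcp_probe("192.168.1.50"), 1440, 60):
#     print(stamp.isoformat(timespec="seconds"), state)
```

The timestamps of the DOWN events can then be compared with scheduled tasks or periods of heavy network load to see whether the dropouts follow a pattern.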

 

Edited by asheenlevrai

I tried to use only 1 Ethernet cable (normal GbE, no longer using LACP aggregation) but the problem remains.

 

I am now testing with a different (single port) GbE NIC.

Edited by asheenlevrai
3 hours ago, nemesis122 said:

Hi 

I had the same issue with 3617 and 1.03b

try 3615 with 1.03b and all is fine.

 

Thanks @nemesis122 :s

 

Well, it would be quite inconvenient to set the whole system up again, right?

I mean, there is no way to go straight from ds3617xs to ds3615xs without erasing the disks, right?

 

Note: so far, the test with the other NIC (single-port GbE) hasn't led to any connection failure. This seems to indicate that the problem is caused by the quad-port NIC:

 - driver?

 - configuration/setting?

 - hw? This quad-port NIC is brand new.

 

What is different between ds3617xs and ds3615xs?

 

Thanks again.

-a-

Edited by asheenlevrai
2 hours ago, asheenlevrai said:

Well, it would be quite inconvenient to set the whole system up again, right?

I mean, there is no way to go straight from ds3617xs to ds3615xs without erasing the disks, right?

You can switch platforms, it's called a migration install.

 

2 hours ago, asheenlevrai said:

What is different between ds3617xs and ds3615xs?

https://xpenology.com/forum/topic/13333-tutorialreference-6x-loaders-and-platforms/

 

9 hours ago, flyride said:

You can switch platforms, it's called a migration install.

Thanks :)

For some reason I keep on forgetting that migration does not necessarily have to go from one model to a newer one.

🥴

 

9 hours ago, flyride said:

The only difference in this table (AFAICT) is the max number of CPU threads. Since my CPU is 4c/8t, this won't be limiting.

 

Now, I wonder why the NIC would be problematic on a ds3617xs rig and work fine on a ds3615xs rig, but I guess it would be easier to migrate and see what happens than to actually troubleshoot the issue with the ds3617xs.

 

note: so far the test with the alternative NIC (single port GbE) still didn't lead to any connection failure.

10 hours ago, asheenlevrai said:

The only difference in this table (AFAICT) is the max number of CPU threads. Since my CPU is 4c/8t, this won't be limiting.

there are some differences in the default drivers from synology, like newer lsi sas drivers and newer mellanox 10G nic drivers, but the kernel with dsm 6.2 is the same for both

with 7.0 there will be more differences: 3617 gets the 4.4 kernel like 918+ has and 3615 stays on the 3.10 kernel; whether that is of importance depends on what loader(s) we might see for 7.0 (atm 6.2.4 and 7.0 are off limits with loader 1.03b/1.04b)

 

 

10 hours ago, asheenlevrai said:

Now, I wonder why the NIC would be problematic on a ds3617xs rig and work fine on a ds3615xs rig, but I guess it would be easier to migrate and see what happens than to actually troubleshoot the issue with the ds3617xs.

 

note: so far the test with the alternative NIC (single port GbE) still didn't lead to any connection failure.

did you check that the hardware of the 4port nic is reliable? maybe boot up a live linux and copy some data (that way anything hardware related is the same as with dsm)

 

3 hours ago, IG-88 said:

did you check that the hardware of the 4port nic is reliable? maybe boot up a live linux and copy some data (that way anything hardware related is the same as with dsm)

 

hw? This quad-port NIC is brand new.

 

But I'll check that too. Thanks

On 6/23/2021 at 11:36 PM, flyride said:

You can switch platforms, it's called a migration install.

 

https://xpenology.com/forum/topic/13333-tutorialreference-6x-loaders-and-platforms/

 

Hi @flyride :)

 

I'm currently trying to migrate from 3617xs to 3615xs in order to see if this solves my NIC problems on this rig (as suggested by @nemesis122 in the 3rd post of this thread).

 

I made a new USB loader using 1.03b for 3615xs

(I assumed it would be better than using 1.02b and DSM 6.1.x, right?)

I rebooted using this USB dongle and Synology Assistant detects the rig as migratable.

 

Then I start the migration process by providing the .pat file I could find here:

https://archive.synology.com/download/Os/DSM/6.2.3-25426-3

-> synology_bromolow_3615xs.pat

 

I get an error 13 (file corrupted).

 

I guess that's a noob mistake but I cannot figure out what I'm doing wrong...

 

Please help :)

Best,

-a-

 

Edited by asheenlevrai

Thanks @flyride

I took care to use the right vid & pid when making the USB loader for 3615xs, but I guess I must have made a mistake somewhere.

I'll try fixing that by pressing C at boot and re-entering vid and pid.

If it still fails, I'll re-make the dongle.

8 hours ago, asheenlevrai said:

Thanks @flyride

I took care to use the right vid & pid when making the USB loader for 3615xs, but I guess I must have made a mistake somewhere.

I'll try fixing that by pressing C at boot and re-entering vid and pid.

If it still fails, I'll re-make the dongle.

I did all that.

I still get Error 13 even after re-making the USB dongle


After reading this, I wondered if the USB medium could be the source of the error 13, so I tried burning the loader image onto a USB dongle that I know previously worked in another Xpen rig. I still got the error 13, though.

 

Now, for my 3617xs rigs I used the serial generator from here. It worked OK. I'm wondering if the version for 3615xs might return invalid serials or something and maybe this leads to error 13?

 

Tx

-a-

 

On 7/5/2021 at 4:00 PM, asheenlevrai said:

Then I start the migration process by providing the .pat file I could find here:

 

https://archive.synology.com/download/Os/DSM/6.2.3-25426-3

-> synology_bromolow_3615xs.pat

 

I get an error 13 (file corrupted).

I figured out that this was the origin of the problem with the migration install.

I shouldn't use the file for DSM 6.2.3-25426-3, but rather the install file for DSM 6.2.3-25426 (without Update 3):

https://archive.synology.com/download/Os/DSM/6.2.3-25426

-> DSM_DS3615xs_25426.pat

 

🤪

 

Then -> no error 13 -> migration OK
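
For anyone running into the same error 13, the mix-up above can be caught with a quick file-name check before starting the migration. This is only a rough sketch: `looks_like_full_install` is a hypothetical helper, and the naming rule is an assumption drawn solely from the two file names mentioned in this thread, not a documented Synology convention.

```python
def looks_like_full_install(pat_name: str) -> bool:
    """Guess from the file name whether a .pat is a full DSM install image.

    Assumption based only on the two names seen in this thread:
      DSM_DS3615xs_25426.pat        -> full install image (migration worked)
      synology_bromolow_3615xs.pat  -> Update 3 file (error 13)
    """
    return pat_name.startswith("DSM_") and pat_name.endswith(".pat")


for name in ("DSM_DS3615xs_25426.pat", "synology_bromolow_3615xs.pat"):
    verdict = "full install" if looks_like_full_install(name) else "probably not a full install"
    print(f"{name}: {verdict}")
```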

  • 1 month later...

By reading a bit more, I just realized that grub.cfg contains the following argument:

 

set netif_num=1

 

I wonder if this could be the source of all my problems with my NICs.

I never touched it (I left "=1" while I have multiple LAN ports).

 

Where can I find more information about what it does and how I am supposed to set it up?

Especially when the onboard LAN is disabled. Does it still count as one or not?

 

Thanks a lot for your help.

best,

-a-

 


OK...

 

AFAICU from this post, set netif_num= should not matter too much since it should automagically be corrected according to what is declared as set mac1=, set mac2= etc...

 

(I still wonder what would happen if grub.cfg sets more mac addresses than there are actual physical LAN ports, though. For instance if 4 macs are set in grub.cfg while only one LAN port is present. In this case netif_num would be 4)
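
To see at a glance whether the two settings agree, a small sketch like the following can read the text of grub.cfg and compare the declared `netif_num` with the number of active `set macN=` lines. The helper name `nic_settings` and the MAC values in the sample are made up for illustration, and the sketch says nothing about how the loader itself treats a mismatch.

```python
import re
from typing import Optional, Tuple


def nic_settings(grub_cfg: str) -> Tuple[Optional[int], int]:
    """Return (netif_num, number of active `set macN=` lines) from grub.cfg text.

    Lines that are commented out with '#' are ignored.
    """
    netif = re.search(r"^\s*set\s+netif_num=(\d+)", grub_cfg, flags=re.MULTILINE)
    macs = re.findall(r"^\s*set\s+mac\d+=\S+", grub_cfg, flags=re.MULTILINE)
    return (int(netif.group(1)) if netif else None, len(macs))


# Placeholder excerpt; the MAC addresses are invented for this example.
sample = """\
set netif_num=1
set mac1=001132AAAAAA
set mac2=001132AAAAAB
#set mac3=001132AAAAAC
"""

declared, mac_count = nic_settings(sample)
if declared != mac_count:
    print(f"netif_num={declared} but {mac_count} MAC entries are set")
```

In practice the text would come from the grub.cfg on the loader's USB stick, e.g. read with `open(path).read()`.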

 

Anyways. I decided to migrate the rig to 1.04b (918+) using supported hardware:

 - i7 4770k

 - z87-express based MB

 - Same PSU, same disks and same RAM

 - no more PCIe SATA controller required

 - Same PCIe quad-port NIC (based on RTL8111G chipset)

 - the onboard LAN is disabled in BIOS

 

After migration, DSM only detects 2 LAN ports out of the 4. This is new. A new problem...

 

I'm currently testing if LACP over 2 ports is stable or not (single-port Ethernet was stable for 24h, which still might be luck)

 

I tried another unit of the same quad-port NIC -> same thing (only 2 LAN ports detected)

I tried another quad-port NIC (based on i350) -> same thing (only 2 LAN ports detected)

 

Any ideas?

 

Thank you very much in advance for your help.

best,

-a-

 

 

Edited by asheenlevrai

Thanks

 

Yes, I migrated from 1.03b 3617xs to 1.04b 918+.

 

I don't know what maxlanport is, so I didn't change anything.

 

I'll google that and see if I can find information.

Or maybe you can point to a link if you have time.

 

Thanks a lot

-a-
