Hey there,

Today was an interesting day. I finally got all the parts together and decided to upgrade my server to support more drives. Five hours and two heart attacks later, I finally got everything running. But let's start from the beginning.

(I am not going into my self-made problems with updating my BIOS, which I shared here, 3 posts total.)

 

So I currently have 10 disks plus two SSDs in my setup and want to expand my storage. The case has 24 hot-swap ports, so all fine. I already have 6 internal ports (4 usable via SAS cable atm) and am using an LSI SAS 9220-8i for 8 more ports. Of course my synoinfo file was adjusted to handle 12/14 drives. Now I bought another controller, an LSI SAS 9300-8i, which can also handle 8 ports. Good news is, DSM supports SAS2 and SAS3 natively (but obviously I am using SATA 6Gb/s).

Now I had to adjust my synoinfo to allow myself the 22 ports I need. So I thought I would be smart. Obviously that did not work out well :D

I am using the 3615xs with 1.03b

 

Default Synology configuration:
esataportcfg="0xff000"   
internalportcfg="0xfff"
usbportcfg="0x300000"

maxdisks="12"

 

Config I had working with my one SAS controller:

internalportcfg="0x3fff"
usbportcfg="0x300000"
esataportcfg="0xfc000"

maxdisks="14"

 

Now I thought, well, let's add the 22 drives I can use in total (8+8+6), but I still wanted to be able to mount USB drives, you never know. I found online that one guy was able to use 48 hard drives, so I assumed this would work. I used a hex-to-binary converter and wanted to have the following:

 

22 disks, 0 esata, 2 usb
esataportcfg="0x0"
internalportcfg="0xfffff"
usbportcfg="0xc00000"
maxdisks="22"

WARNING: That did not work!

 

So what happened? I think either I messed up the configuration, or the system/DSM is not able to handle so many disks. Either way, because I thought I could not harm anything, since I had only added support for more drives, I saved and rebooted.

 

MEEEEP

 

So the system came online and just saw four disks, those on the internal controller. Those were both SSDs (in one pool; healthy) and two disks from my second Storage Pool (10 disks, SHR2). System state CRITICAL, Volume crashed. All apps - gone! Oh well.

So I hoped that the system had been unable to mount the volume and was at most performing read-only operations on it. The other part of me hoped, as 2 out of 10 disks were there, that in the worst case I could repair the array once I brought the rest online. Needless to say I had made a full backup before, but my pulse was over 120 anyhow xD

One other thing that worried me was that Synology might use internal "system change numbers/sequences", like databases do, to identify the status of a volume. More on that later.

 

I did some research online and found this page: https://zective.com/code/xpenology/

(Note I do not own that site or have anything to do with it - and it is in Chinese!)

Google was my friend, so I inserted the defaults on top - and it showed me that my maximum seems to be 22 disks. So I inserted what I needed, dropped eSATA and USB, and went full internal ports.

 

Configuration as follows:

22 disks, 0 esata, 0 usb
esataportcfg="0x0"
internalportcfg="0x3fffff"
usbportcfg="0x0"
maxdisks="22"
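
For anyone who wants to double-check hex values like these without an online calculator, here is a minimal Python sketch. It assumes (based only on the configs shown in this thread) that the three masks are laid out back to back, internal ports first, then eSATA, then USB, without overlaps or gaps. `build_masks` is a made-up helper name, not anything from DSM.

```python
def build_masks(internal: int, esata: int = 0, usb: int = 0) -> dict:
    """Build synoinfo-style port masks for the given port counts.

    Assumption: bits are assigned contiguously in the order
    internal -> esata -> usb, as in the configs quoted in this thread.
    """
    def block(count: int, offset: int) -> int:
        # 'count' one-bits, shifted left by 'offset' positions
        return ((1 << count) - 1) << offset

    return {
        "maxdisks": str(internal),
        "internalportcfg": hex(block(internal, 0)),
        "esataportcfg": hex(block(esata, internal)),
        "usbportcfg": hex(block(usb, internal + esata)),
    }


# 22 internal ports, no eSATA, no USB (the config that worked here)
for key, value in build_masks(22).items():
    print(f'{key}="{value}"')
```

With `build_masks(14, esata=6, usb=2)` the same helper reproduces the earlier 14-disk config, and `build_masks(12, esata=8, usb=2)` reproduces the defaults listed at the top. Whether any other combination actually boots cleanly is not something this sketch can tell you.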

 

Aaaand I rebooted.

System came back online, now showing 22 available disks. The SSD pool was still healthy and my SHR2 pool was back in the GAME! Man, I felt happy. So I assume the following happened:

- every disk has a small piece for the DSM system on it

- there is a sequence number stored to save the "latest status"

- when trying to mount the volume/bring the array online with just 2 of the 10 disks, it failed, and the disks were not touched

- once the needed number of drives came back online (8+), the array got mounted, and the data was there and untouched

- System partitions on all these 10 drives were out of sync and showed amber. Had to click "repair" in storage overview, and DSM applied the increments from the intact SSDs (latest status/sequence) to all the drives

- Synology services were all offline, had to restart them all manually

 

So far so good!

Not seeing any issues with the services in general. Snapshots are there, but the config for the next snapshots was gone. Never mind, easy to recreate.

 

I am now running a full scrub on the pool, just checking for errors. I will also verify the file integrity of my super critical files with a 1:1 check, but that's more for me to sleep better.

 

TL;DR: doing "server stuff" can be exciting and sometimes replaces a good old coffee. Thanks to everyone who made it to the end!

 

-IceBoosteR

Edited by IceBoosteR
On 11/21/2020 at 1:40 AM, IceBoosteR said:

22 disks, 0 esata, 2 usb
esataportcfg="0x0"
internalportcfg="0xfffff"
usbportcfg="0xc00000"
maxdisks="22"

WARNING: That did not work!

 

at least internalportcfg is two disks off, it's 20 disks (every "f" is 4 bits in binary, aka 4 disks)

usb does look ok for 22 disks
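
A quick way to see this is to count the bits that are set in each mask. The following is just a small Python sketch using the values quoted above; `slots` is a made-up helper name.

```python
def slots(mask_hex: str) -> list:
    """Return the 0-based bit positions set in a hex mask such as "0xfffff"."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Values from the quoted configuration that did not work.
cfg = {
    "internalportcfg": "0xfffff",
    "usbportcfg": "0xc00000",
    "esataportcfg": "0x0",
}

for name, value in cfg.items():
    positions = slots(value)
    print(f"{name}={value}: {len(positions)} port(s), bits {positions}")
```

internalportcfg comes out as 20 ports on bits 0 to 19 and usb as bits 22 and 23, so bits 20 and 21 (disks 21 and 22) are not claimed by any mask. With 0x3fffff, `slots` returns 22 positions.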

 

also keep in mind that when dsm does bigger updates (200-300 MB *.pat file, like 6.2.2 to 6.2.3) it will overwrite your synoinfo.conf in the update process, so you will be back on the 12 drive default afterwards and will need to redo your extension
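
A small check like the following could be run after an update to see whether the values were reset. This is only a sketch: `read_settings` and `drifted` are made-up helper names, the parser only understands simple key="value" lines, and the file location in the closing comment is an assumption that should be verified on your own box.

```python
import re


def read_settings(conf_text: str) -> dict:
    """Parse key="value" lines (the format shown in this thread) into a dict."""
    settings = {}
    for line in conf_text.splitlines():
        match = re.match(r'\s*([A-Za-z0-9_]+)="([^"]*)"\s*$', line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def drifted(conf_text: str, wanted: dict) -> dict:
    """Return {key: (found, wanted)} for every key that no longer matches."""
    current = read_settings(conf_text)
    return {
        key: (current.get(key), value)
        for key, value in wanted.items()
        if current.get(key) != value
    }


# The 22-disk values from the first post.
WANTED = {
    "maxdisks": "22",
    "internalportcfg": "0x3fffff",
    "esataportcfg": "0x0",
    "usbportcfg": "0x0",
}

# Stand-in for a file that was reset to the defaults by an update.
AFTER_UPDATE = '''
esataportcfg="0xff000"
internalportcfg="0xfff"
usbportcfg="0x300000"
maxdisks="12"
'''

for key, (found, wanted) in drifted(AFTER_UPDATE, WANTED).items():
    print(f"{key}: found {found}, wanted {wanted}")
```

On a real system you would pass in the content of the config file instead of `AFTER_UPDATE`, for example `open("/etc/synoinfo.conf").read()` (path assumed, not taken from this thread).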

 

On 11/21/2020 at 1:40 AM, IceBoosteR said:

- every disk has a small piece for the DSM system on it

try "cat /proc/mdstat" to see the mdadm (software) raid

dsm (system) is a raid1 over all disks (excluding cache drives), 2.4GB, the 2nd partition is swap, 2GB, also raid1, and the space after this goes into the raid for pools/volumes, with the raid type depending on your configuration (SHR is using LVM2 to create volumes from different raid types)
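
If the raw /proc/mdstat output is hard to read, a few lines of Python can summarise it. This is a rough sketch that only looks at the "mdX : ..." lines; the sample text, device names and block counts below are invented for illustration and are not output from the system in this thread.

```python
def parse_mdstat(text: str) -> dict:
    """Summarise the 'mdX : state level members...' lines of /proc/mdstat."""
    arrays = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(" : ")
        if not sep or not name.startswith("md"):
            continue  # skips "Personalities", block-count lines, etc.
        tokens = rest.split()
        if not tokens:
            continue
        arrays[name] = {
            "state": tokens[0],
            # inactive arrays may not list a level at all
            "level": next((t for t in tokens if t.startswith("raid")), None),
            # members look like "sda1[0]"; keep only the device name
            "members": [t.split("[")[0] for t in tokens if "[" in t],
        }
    return arrays


# Hypothetical sample, shaped like the layout described above.
MDSTAT_SAMPLE = """\
Personalities : [raid1] [raid6]
md2 : active raid6 sda5[0] sdb5[1] sdc5[2] sdd5[3]
      1000000 blocks super 1.2 level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
md1 : active raid1 sda2[0] sdb2[1] sdc2[2] sdd2[3]
      2000000 blocks [12/4] [UUUU________]
md0 : active raid1 sda1[0] sdb1[1] sdc1[2] sdd1[3]
      2400000 blocks [12/4] [UUUU________]
unused devices: <none>
"""

for name, info in sorted(parse_mdstat(MDSTAT_SAMPLE).items()):
    print(name, info["state"], info["level"], len(info["members"]), "members")
```

On the box itself the input would be `open("/proc/mdstat").read()`.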

 

On 11/21/2020 at 1:40 AM, IceBoosteR said:

- there is a sequence number stored to save the "latest status"

it's a normal mdadm raid set and mdadm handles the sequence numbers, dsm just uses it as every other linux distribution does

when there are not enough disks available to assemble the raid, it simply fails, and the lvm and mount steps that run later will not work because the devices (/dev/mdX) are not present - the dsm system is usually unaffected as it's running from a raid1, and as long as at least one disk is working it will come up
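
To illustrate the sequence number idea: `mdadm --examine <partition>` prints an "Events" line per member, and members with a lower count than the newest one are the stale ones. The Python sketch below only compares such counts. It is not how mdadm decides internally, it assumes the counter is printed as a plain integer, and the sample texts are invented.

```python
import re


def event_count(examine_text: str) -> int:
    """Extract the number from an 'Events : N' line of mdadm --examine output."""
    match = re.search(r"^\s*Events\s*:\s*(\d+)", examine_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Events line found")
    return int(match.group(1))


def stale_members(examine_by_device: dict) -> list:
    """Return the devices whose event count is behind the newest member."""
    counts = {dev: event_count(text) for dev, text in examine_by_device.items()}
    newest = max(counts.values())
    return sorted(dev for dev, count in counts.items() if count < newest)


# Hypothetical excerpts: two members kept running, two were left behind.
SAMPLE = {
    "sda1": "          State : clean\n         Events : 1520\n",
    "sdb1": "          State : clean\n         Events : 1520\n",
    "sdc1": "          State : clean\n         Events : 1498\n",
    "sdd1": "          State : clean\n         Events : 1498\n",
}

print(stale_members(SAMPLE))
```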

 

On 11/21/2020 at 1:40 AM, IceBoosteR said:

- System partitions on all these 10 drives were out of sync and showed amber. Had to click "repair" in storage overview, and DSM applied the increments from the intact SSDs (latest status/sequence) to all the drives

 

you used the raid1 with just 2 disks (4 disks minus 2 cache drives that don't have a system partition) and increased the sequence number by "using" that raid set, so the other disks cannot join on the next boot as they are out of sync, and the repair adds them to the raid again

imho it did not apply increments as there is no log for this, it simply overwrote the partitions with copies of the working ones, and as it's only 2.4 + 2 GB it will not take long. if you do the same with a 4TB disk that dropped out of a raid set, you will see that it takes hours because the 3.xTB data partition of your volume is completely recreated (a log based resync would only take seconds or minutes)
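
As a rough back-of-the-envelope check of that difference, here is a tiny Python calculation. The 150 MB/s sustained rate is purely an assumed figure and the data partition is rounded to 4000 GB, so treat the results as orders of magnitude only.

```python
def full_copy_seconds(size_gb: float, mb_per_s: float) -> float:
    """Time to rewrite a partition completely at a constant rate (1 GB = 1000 MB)."""
    return size_gb * 1000 / mb_per_s


ASSUMED_MB_PER_S = 150.0  # assumption, not measured on any system in this thread

system_and_swap = full_copy_seconds(2.4 + 2.0, ASSUMED_MB_PER_S)
data_partition = full_copy_seconds(4000.0, ASSUMED_MB_PER_S)

print(f"system + swap: about {system_and_swap:.0f} seconds")
print(f"4 TB data partition: about {data_partition / 3600:.1f} hours")
```

So the system and swap partitions are rewritten in well under a minute, while a full rebuild of a multi-TB data partition lands in the range of hours.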

 

 
