flyride Posted January 23, 2020 #126

Your drives have reordered yet again. I know IG-88 said your controller deliberately presents them contiguously (which is problematic in itself), but if all drives are up and stable, I cannot see why that behavior would cause a reorder on reboot. I remain very wary of your hardware consistency.

Look through dmesg and see if you have any hardware problems since your power cycle boot.
Run another hotswap query and see if any drives have changed state since your power cycle boot.
Run another mdstat - is it still slow?
C-Fu (Author) Posted January 23, 2020 #127

37 minutes ago, flyride said: Run another mdstat - is it still slow?

Yeah, it is. Slow, but working.

# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md4 : active raid5 sdl6[0] sdn6[2] sdm6[1]
      11720987648 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/3] [UUU__]

md2 : active raid5 sdb5[0] sdk5[12] sdo5[11] sdq5[9] sdp5[8] sdn5[7] sdm5[6] sdl5[5] sdf5[4] sde5[3] sdd5[2] sdc5[1]
      35105225472 blocks super 1.2 level 5, 64k chunk, algorithm 2 [13/12] [UUUUUUUUUU_UU]

md5 : active raid1 sdo7[3]
      3905898432 blocks super 1.2 [2/0] [__]

md1 : active raid1 sdb2[0] sdc2[1] sdd2[2] sde2[3] sdf2[4] sdk2[5] sdl2[6] sdm2[7] sdn2[11] sdo2[8] sdp2[9] sdq2[10]
      2097088 blocks [24/12] [UUUUUUUUUUUU____________]

md0 : active raid1 sdb1[1] sdc1[2] sdd1[3] sdf1[5]
      2490176 blocks [12/4] [_UUU_U______]

unused devices: <none>

37 minutes ago, flyride said: I remain very wary of your hardware consistency.

Just out of curiosity, does this mean that replacing the current SAS card with another one would fix it? An IBM SAS expander has just arrived; would this plus my current 2-port SAS card help somehow? Or is it because of something else, like the motherboard?

I just did a Notepad++ compare of the fdisk output against post #113, and it seems like nothing has changed.

# fdisk -l /dev/sd?
Disk /dev/sda: 223.6 GiB, 240057409536 bytes, 468862128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x696935dc
Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 468857024 468854977 223.6G fd Linux raid autodetect

Disk /dev/sdb: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 43C8C355-AE0A-42DC-97CC-508B0FB4EF37
Device Start End Sectors Size Type
/dev/sdb1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdb2 4982528 9176831 4194304 2G Linux RAID
/dev/sdb5 9453280 5860326239 5850872960 2.7T Linux RAID

Disk /dev/sdc: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 0600DFFC-A576-4242-976A-3ACAE5284C4C
Device Start End Sectors Size Type
/dev/sdc1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdc2 4982528 9176831 4194304 2G Linux RAID
/dev/sdc5 9453280 5860326239 5850872960 2.7T Linux RAID

Disk /dev/sdd: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 58B43CB1-1F03-41D3-A734-014F59DE34E8
Device Start End Sectors Size Type
/dev/sdd1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdd2 4982528 9176831 4194304 2G Linux RAID
/dev/sdd5 9453280 5860326239 5850872960 2.7T Linux RAID

Disk /dev/sde: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: E5FD9CDA-FE14-4F95-B776-B176E7130DEA
Device Start End Sectors Size Type
/dev/sde1 2048 4982527 4980480 2.4G Linux RAID
/dev/sde2 4982528 9176831 4194304 2G Linux RAID
/dev/sde5 9453280 5860326239 5850872960 2.7T Linux RAID

Disk /dev/sdf: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 48A13430-10A1-4050-BA78-723DB398CE87
Device Start End Sectors Size Type
/dev/sdf1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdf2 4982528 9176831 4194304 2G Linux RAID
/dev/sdf5 9453280 5860326239 5850872960 2.7T Linux RAID

Disk /dev/sdk: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: A3E39D34-4297-4BE9-B4FD-3A21EFC38071
Device Start End Sectors Size Type
/dev/sdk1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdk2 4982528 9176831 4194304 2G Linux RAID
/dev/sdk5 9453280 5860326239 5850872960 2.7T Linux RAID

Disk /dev/sdl: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 849E02B2-2734-496B-AB52-A572DF8FE63F
Device Start End Sectors Size Type
/dev/sdl1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdl2 4982528 9176831 4194304 2G Linux RAID
/dev/sdl5 9453280 5860326239 5850872960 2.7T Linux RAID
/dev/sdl6 5860342336 11720838239 5860495904 2.7T Linux RAID

Disk /dev/sdm: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 423D33B4-90CE-4E34-9C40-6E06D1F50C0C
Device Start End Sectors Size Type
/dev/sdm1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdm2 4982528 9176831 4194304 2G Linux RAID
/dev/sdm5 9453280 5860326239 5850872960 2.7T Linux RAID
/dev/sdm6 5860342336 11720838239 5860495904 2.7T Linux RAID

Disk /dev/sdn: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 09CB7303-C2E7-46F8-ADA0-D4853F25CB00
Device Start End Sectors Size Type
/dev/sdn1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdn2 4982528 9176831 4194304 2G Linux RAID
/dev/sdn5 9453280 5860326239 5850872960 2.7T Linux RAID
/dev/sdn6 5860342336 11720838239 5860495904 2.7T Linux RAID

Disk /dev/sdo: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 1713E819-3B9A-4CE3-94E8-5A3DBF1D5983
Device Start End Sectors Size Type
/dev/sdo1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdo2 4982528 9176831 4194304 2G Linux RAID
/dev/sdo5 9453280 5860326239 5850872960 2.7T Linux RAID
/dev/sdo6 5860342336 11720838239 5860495904 2.7T Linux RAID
/dev/sdo7 11720854336 19532653311 7811798976 3.7T Linux RAID

Disk /dev/sdp: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 1D5B8B09-8D4A-4729-B089-442620D3D507
Device Start End Sectors Size Type
/dev/sdp1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdp2 4982528 9176831 4194304 2G Linux RAID
/dev/sdp5 9453280 5860326239 5850872960 2.7T Linux RAID

Disk /dev/sdq: 2.7 TiB, 3000592982016 bytes, 5860533168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 54D81C51-AB85-4DE2-AA16-263DF1C6BB8A
Device Start End Sectors Size Type
/dev/sdq1 2048 4982527 4980480 2.4G Linux RAID
/dev/sdq2 4982528 9176831 4194304 2G Linux RAID
/dev/sdq5 9453280 5860326239 5850872960 2.7T Linux RAID

# dmesg | tail
[38440.957234] --- wd:0 rd:2
[38440.957237] RAID1 conf printout:
[38440.957238] --- wd:0 rd:2
[38440.957239] disk 0, wo:1, o:1, dev:sdo7
[38440.957258] md: md5: set sdo7 to auto_remap [1]
[38440.957260] md: recovery of RAID array md5
[38440.957262] md: minimum _guaranteed_ speed: 600000 KB/sec/disk.
[38440.957263] md: using maximum available idle IO bandwidth (but not more than 800000 KB/sec) for recovery.
[38440.957264] md: using 128k window, over a total of 3905898432k.
[38440.957535] md: md5: set sdo7 to auto_remap [0]

This looks OK, right? Oddly enough, fgrep hotswap doesn't return anything. But the last few hundred lines of cat /var/log/disk.log are:

2020-01-24T02:23:21+08:00 homelab kernel: [38410.974657] md: md5: set sdo7 to auto_remap [1]
2020-01-24T02:23:21+08:00 homelab kernel: [38410.974659] md: recovery of RAID array md5
2020-01-24T02:23:21+08:00 homelab kernel: [38410.974945] md: md5: set sdo7 to auto_remap [0]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.005717] md: md5: set sdo7 to auto_remap [1]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.005718] md: recovery of RAID array md5
2020-01-24T02:23:21+08:00 homelab kernel: [38411.005961] md: md5: set sdo7 to auto_remap [0]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.038632] md: md5: set sdo7 to auto_remap [1]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.038634] md: recovery of RAID array md5
2020-01-24T02:23:21+08:00 homelab kernel: [38411.038873] md: md5: set sdo7 to auto_remap [0]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.074782] md: md5: set sdo7 to auto_remap [1]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.074784] md: recovery of RAID array md5
2020-01-24T02:23:21+08:00 homelab kernel: [38411.074973] md: md5: set sdo7 to auto_remap [0]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.106766] md: md5: set sdo7 to auto_remap [1]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.106767] md: recovery of RAID array md5
2020-01-24T02:23:21+08:00 homelab kernel: [38411.106956] md: md5: set sdo7 to auto_remap [0]

And that's the current time.

Edited January 23, 2020 by C-Fu
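The disk.log excerpt above shows md5 starting recovery again every few tens of milliseconds. As a rough way to quantify that from a saved log, here is a minimal Python sketch that counts "recovery of RAID array" messages per array. The function name `count_recovery_starts` is made up for illustration, and the sample text is just the first five log lines quoted in this post; it is not a Synology or mdadm tool.

```python
import re
from collections import Counter

# Matches kernel messages such as "md: recovery of RAID array md5".
RECOVERY_RE = re.compile(r"md: recovery of RAID array (md\d+)")


def count_recovery_starts(log_text):
    """Return a Counter mapping array name -> number of recovery-start messages."""
    return Counter(RECOVERY_RE.findall(log_text))


# First five disk.log lines quoted in the post above.
sample_log = """\
2020-01-24T02:23:21+08:00 homelab kernel: [38410.974657] md: md5: set sdo7 to auto_remap [1]
2020-01-24T02:23:21+08:00 homelab kernel: [38410.974659] md: recovery of RAID array md5
2020-01-24T02:23:21+08:00 homelab kernel: [38410.974945] md: md5: set sdo7 to auto_remap [0]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.005717] md: md5: set sdo7 to auto_remap [1]
2020-01-24T02:23:21+08:00 homelab kernel: [38411.005718] md: recovery of RAID array md5
"""

starts = count_recovery_starts(sample_log)
for array, n in starts.items():
    print(f"{array}: {n} recovery starts")
```

A healthy rebuild should log one recovery start and then run to completion, so a count that keeps climbing within the same second suggests the recovery is being aborted and restarted rather than progressing.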
flyride Posted January 23, 2020 #128

Everything is going up and down right now. You can see the changed drive assignments between the last two posted mdstats. We can't do anything with this until it's stable.
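Comparing two mdstat dumps by eye is error-prone, so here is a minimal Python sketch of how the comparison could be automated. It assumes the usual `/proc/mdstat` layout where each array line looks like `md0 : active raid1 sdb1[1] ...`. The helper names `parse_members` and `diff_members` are hypothetical, and the "before" snapshot is invented for illustration; only the "after" line is md0 as posted in #127.

```python
import re

# An array line looks like: "md0 : active raid1 sdb1[1] sdc1[2] ..."
ARRAY_RE = re.compile(r"^(md\d+) : (.*)$")
# A member looks like "sdb1[1]"; the bracketed number is the device index md shows.
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\]")


def parse_members(mdstat_text):
    """Map array name -> {bracketed index: device name} for one mdstat snapshot."""
    arrays = {}
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line.strip())
        if m:
            arrays[m.group(1)] = {
                int(idx): dev for dev, idx in MEMBER_RE.findall(m.group(2))
            }
    return arrays


def diff_members(before, after):
    """Return {array: {index: (old_dev, new_dev)}} for every index that changed."""
    changes = {}
    for array in sorted(set(before) | set(after)):
        old, new = before.get(array, {}), after.get(array, {})
        moved = {
            idx: (old.get(idx), new.get(idx))
            for idx in sorted(set(old) | set(new))
            if old.get(idx) != new.get(idx)
        }
        if moved:
            changes[array] = moved
    return changes


# Hypothetical earlier snapshot (made up for this example).
before_text = "md0 : active raid1 sdb1[1] sdc1[2] sdd1[3] sde1[5]"
# md0 as posted in #127.
after_text = "md0 : active raid1 sdb1[1] sdc1[2] sdd1[3] sdf1[5]"

changes = diff_members(parse_members(before_text), parse_members(after_text))
print(changes)
```

Saving `cat /proc/mdstat` to a file after each boot and diffing the files this way makes a reorder obvious without scanning long member lists by hand.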
C-Fu (Author) Posted January 23, 2020 #129

8 minutes ago, flyride said: You can see the changed drive assignments between the two last posted mdstats.

Damn. You're right. When something like this happens, is there usually a way to prevent the SAS card from doing it? Like a setting or a BIOS update or something. Or does this mean that the card is dying?

If I take out, say, sda (the SSD) and put it back in, will the assignments change and then revert? Or the same with any other drive connected to the SAS card.

Sorry, I'm just frustrated, but I still want to understand.

Edited January 23, 2020 by C-Fu
flyride Posted January 23, 2020 #130

I can't really answer your question. Drives are going up and down. That can happen because the interface is unreliable, or the power is unreliable. A logic problem in the SAS card is way more likely to be a total failure, not an intermittent one.

If it were me, I would completely replace all your SATA cables and the power supply.
C-Fu (Author) Posted January 24, 2020 #131

16 hours ago, flyride said: If it were me, I would completely replace all your SATA cables and the power supply.

I just changed from a 750W PSU to a 1600W PSU that's fairly new (only a few days' use at most), so I don't believe the PSU is the problem.

When I get back on Monday, I'll see if I can replace the whole system (I have a few unused motherboards), cables and whatnot, reuse the SAS card if that's not likely the issue, and maybe reinstall Xpenology. Would that be a good idea?
flyride Posted January 24, 2020 #132

3 hours ago, C-Fu said: I just changed from 750W psu to a 1600W psu that's fairly new (only a few days' use max), so I don't believe the PSU is the problem. When I get back on monday, I'll see if I can replace the whole system (I have a few motherboards unused) and cables and whatnot and reuse the SAS card if that's not likely the issue, and maybe reinstall Xpenology. Would that be a good idea?

If all your problems started after that power supply replacement, this further reinforces the idea that unstable power is the cause. You seem reluctant to believe that a new power supply can be a problem (it can). For what it's worth, 13 drives x 5W equals 65W, so total capacity shouldn't be a factor.

In any debugging and recovery operation, the objective should be to manage the change rate and therefore risk. Replacing the whole system would violate that strategy.

Do the drive connectivity failures implicate a SAS card problem? Maybe, but a much more plausible explanation is physical connectivity or power. If you have an identical SAS card, and it is passive (no intrinsic configuration required), replacing it is a low-risk troubleshooting strategy.

Do failures implicate the motherboard? Maybe, if you are using on-board SATA ports, but the same plausibility test applies. However, there is more variability and risk (mobo model, BIOS settings, etc.).

Do failures implicate DSM or loader stability? Not at all; DSM boots fine and is not crashing. And if you reinstall DSM, it's very likely your arrays will be destructively reconfigured. Please don't do this.

So I'll stand by (and extend) my previous statement: if this were my system, I would change your power and cables first. If that doesn't solve things, maybe the SAS card, and lastly the motherboard.
IG-88 Posted January 24, 2020 #133

On 1/24/2020 at 5:18 PM, flyride said: So I'll stand by (and extend) my previous statement - if this were my system, I would change your power and cables first. If that doesn't solve things, maybe the SAS card, and lastly the motherboard.

If the changes in drive order are related to the SAS controller or its driver, it's possible to use an 8-port SATA/AHCI controller instead:
https://xpenology.com/forum/topic/19854-sata-controllers-not-recognized/?do=findComment&comment=122709

Edited January 25, 2020 by IG-88
flyride Posted January 24, 2020 #134

That looks like a pretty nice low-cost card, and with an onboard PCIe switch too.
IG-88 Posted January 25, 2020 #135

On 1/24/2020 at 8:59 PM, flyride said: That looks like a pretty nice low cost card, and with an onboard PCIe switch too.

It might not get the most out of an SSD-only system, but people with that in mind would not use SATA drives anyway; M.2 NVMe would be the solution for that. Also, there are usually still onboard SATA connectors for running SSDs at the "full" 6Gb/s.
IG-88 Posted February 2, 2020 #136

@flyride

Synology seems not to use a write-intent bitmap with its RAID arrays. In theory it could be a great help when a drive drops out of the array, as the drive could be re-synced in just a few minutes or seconds. We know from the event count that most of the data on the dropped drive is still good, so we would not have to write multiple TB of data to get it back into the array.

Is there any reason (besides performance) not to do a

mdadm --grow --bitmap=internal /dev/md2

"It uses space that the alignment requirements of the metadata assure us is otherwise unused. For v0.90, that is limited to 60K. For 1.x it is 3K. As this is unused disk space, bitmaps can be added to an existing md device without the risk to take away space from an existing filesystem on that device."

And if there were a performance problem, the bitmap could be removed at any time without impact:

mdadm --grow --bitmap=none /dev/md2

https://raid.wiki.kernel.org/index.php/Write-intent_bitmap
https://raid.wiki.kernel.org/index.php/Mdstat#bitmap_line

It does not help to resolve hardware problems, but by shortening the window of reduced redundancy to a minimum it might decrease the odds of another drive failing during that window.
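To check which arrays currently lack a bitmap, one could look for the `bitmap:` line that `/proc/mdstat` prints under an array when a write-intent bitmap is active (see the Mdstat wiki link above). The following is a minimal Python sketch under that assumption. The function name `arrays_without_bitmap` is made up, md4 is copied from the mdstat in post #127, and md9 with its bitmap line is a hypothetical entry added only to show the positive case.

```python
def arrays_without_bitmap(mdstat_text):
    """Return names of md arrays whose mdstat entry has no 'bitmap:' line."""
    has_bitmap = {}
    current = None
    for raw in mdstat_text.splitlines():
        line = raw.strip()
        if line.startswith("md") and " : " in line:
            # Start of a new array entry, e.g. "md4 : active raid5 ..."
            current = line.split(" : ", 1)[0]
            has_bitmap[current] = False
        elif line.startswith("bitmap:") and current is not None:
            has_bitmap[current] = True
    return [name for name, flag in has_bitmap.items() if not flag]


# md4 is taken from post #127; md9 is a hypothetical array with a bitmap.
sample_mdstat = """\
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [raidF1]
md4 : active raid5 sdl6[0] sdn6[2] sdm6[1]
      11720987648 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/3] [UUU__]

md9 : active raid1 sdx1[0] sdy1[1]
      1048576 blocks [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
"""

missing = arrays_without_bitmap(sample_mdstat)
print(missing)
```

This only reads text, so it is safe to run against a saved copy of mdstat while the arrays are unstable; whether to actually add a bitmap is a separate decision.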