• 0

Volume degraded: what's next?


Question

 

Hi there,

 

I got myself a nice DX920 with the worst drives ever: Toshiba N300. I have 3 of them and 2 have already failed. That has never happened to me before. DO NOT BUY THOSE DRIVES!

 

I ended up with a degraded Volume2 and I need your guidance on how to fix it.

It is a volume with four 10 TB drives.

Toshiba drives 3 and 4 are failing.

I pulled drive 4 and put in a new 10 TB Seagate, then got worried and put the Toshiba drive 4 back in.

 

I have a working 10 TB replacement drive, but I am unsure how to proceed from here. The new drive is not in the volume... scary...

 

The screenshots below show how it looked BEFORE I replaced drive 4.

Should I repair first, before swapping the drive?

Or insert the new drive 4 and then repair?

 

Thanks

 

[Three screenshots attached]

22 answers to this question

  • 0

"System Partition Failed" does not mean that a drive is failing. It means that the copy of the DSM OS on that particular drive is inconsistent with the others, so it is not being used.  You have three (maybe two) other copies of it, so it is not a big deal.

 

Unfortunately replacing drive #4 was the wrong thing for your data.  Did you try to boot up with the replacement drive in place?

 

DSM aggressively reports a drive as "failing" whenever there is a SMART failure.  It may or may not be critical.

 

Find out what is actually happening before doing anything else.  First, post the SMART status of drive #3.  Then go to the command line, run cat /proc/mdstat (which shows the actual status of your array), and post the result.
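
For anyone following along, here is a rough sketch of what to look for in that output. It assumes the usual /proc/mdstat layout, where each array description ends with a member map such as [UUUU] and an underscore marks a missing member. The helper name degraded_arrays and the sample text are made up for illustration; they are not this poster's actual output.

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every md array whose member map contains "_" (a missing member).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember which array is being described
        match($0, /\[[U_]+\]/) {               # member map such as [UU_U]
            if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
        }
    '
}

# Illustrative sample only (not taken from this thread).
degraded_arrays <<'EOF'
Personalities : [raid1] [raid5]
md3 : active raid5 sata1p3[0] sata4p3[3] sata3p3[2]
      29284844544 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [U_UU]
md1 : active raid1 sata1p2[0] sata2p2[1] sata3p2[2] sata4p2[3]
      2097088 blocks [4/4] [UUUU]
EOF
# prints: md3
```

On the NAS itself this would presumably be run as degraded_arrays < /proc/mdstat. It only reads the file and changes nothing on the array.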

Edited by flyride
  • 0
Hey... Flyride! Thanks again for helping others. Appreciated! Again :-)

 

Here it is!

 

[Two screenshots attached]

  • 0

There is a SMART detail page in the UI that might be helpful.  Failing that, post the output of:

 

# smartctl -x -d sat /dev/sata2p

 

Also, I'm not clear whether you ran the system with the replacement drive #4 in place, or whether the array is still safe to use.  Post the results of this command:

 

# mdadm --examine /dev/sata[1234]p3 | egrep 'Event|/dev/sata'

 

You may need to elevate to root before running at least the smartctl command.

 

Note that the drive sequence 1,2,3,4 is actually 4,1,2,3 and logical drive #2 is the one with the actual issue.

Edited by flyride
  • 0
[Screenshot attached]

I also switched to root:

root@Syno_Main:~# smartctl -x -d sat /dev/sata2p
smartctl 6.5 (build date May  7 2020) [x86_64-linux-4.4.59+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/sata2p [SAT] failed: No such device
root@Syno_Main:~# mdadm --examine /dev/sata[1234]p3 | egrep 'Event|/dev/sata'
mdadm: cannot open /dev/sata[1234]p3 : No such file or directory


 

 

  • 0
3 minutes ago, flyride said:

The enclosure causes the devices to be named (and maybe classified) differently, which changes the results.  Try:

 

# find /dev -name sata1p

 

and

 

# fdisk -l

root@Syno_Main:~# find /dev -name sata1pfind /dev -name sata1p
find: paths must precede expression: /dev
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
root@Syno_Main:~# fdisk -l
Disk /dev/ram0: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram1: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram2: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram3: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram4: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram5: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram6: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram7: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram8: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram9: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram10: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram11: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram12: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram13: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram14: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/ram15: 640 MiB, 671088640 bytes, 1310720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/sata2: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2F384465-FFE1-11EA-9C11-0015177343D0

Device         Start         End     Sectors  Size Type
/dev/sata2p1    2048     4982527     4980480  2.4G Linux RAID
/dev/sata2p2 4982528     9176831     4194304    2G Linux RAID
/dev/sata2p3 9437184 19532668927 19523231744  9.1T Linux RAID


Disk /dev/sata1: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2EABC2F3-FFE1-11EA-9C11-0015177343D0

Device         Start         End     Sectors  Size Type
/dev/sata1p1    2048     4982527     4980480  2.4G Linux RAID
/dev/sata1p2 4982528     9176831     4194304    2G Linux RAID
/dev/sata1p3 9437184 19532668927 19523231744  9.1T Linux RAID


Disk /dev/sata3: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2BC1525F-FFE1-11EA-9C11-0015177343D0

Device         Start         End     Sectors  Size Type
/dev/sata3p1    2048     4982527     4980480  2.4G Linux RAID
/dev/sata3p2 4982528     9176831     4194304    2G Linux RAID
/dev/sata3p3 9437184 19532668927 19523231744  9.1T Linux RAID


Disk /dev/sata4: 9.1 TiB, 10000831348736 bytes, 19532873728 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2D0D263F-FFE1-11EA-9C11-0015177343D0

Device         Start         End     Sectors  Size Type
/dev/sata4p1    2048     4982527     4980480  2.4G Linux RAID
/dev/sata4p2 4982528     9176831     4194304    2G Linux RAID
/dev/sata4p3 9437184 19532668927 19523231744  9.1T Linux RAID


Disk /dev/md0: 2.4 GiB, 2549940224 bytes, 4980352 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/md1: 2 GiB, 2147418112 bytes, 4194176 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zram0: 565 MiB, 592445440 bytes, 144640 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zram1: 565 MiB, 592445440 bytes, 144640 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zram2: 565 MiB, 592445440 bytes, 144640 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/zram3: 565 MiB, 592445440 bytes, 144640 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/md3: 27.3 TiB, 29987680813056 bytes, 58569689088 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 65536 bytes / 196608 bytes


GPT PMBR size mismatch (239649 != 245759) will be corrected by w(rite).
Disk /dev/synoboot: 120 MiB, 125829120 bytes, 245760 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 0361D5BB-196D-4EB0-AC6B-A8D9B7DE9DCB

Device         Start    End Sectors Size Type
/dev/synoboot1  2048  67583   65536  32M EFI System
/dev/synoboot2 67584 239615  172032  84M Linux filesystem

 

  • 0

OK, I get it now.  Partitions are labeled p1, p2, p3 instead of a plain numeric suffix, so the whole-disk device is /dev/sata2 (not /dev/sata2p) and its data partition is /dev/sata2p3.

 

# smartctl -x -d sat /dev/sata2

 

and

 

# mdadm --examine /dev/sata[1234]p3 | egrep 'Event|/dev/sata'
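
As a side note, a small sketch of the naming pattern at work here. It assumes the usual Linux convention that a "p" separator is inserted when the disk name itself ends in a digit (as with /dev/sata2 in the fdisk listing); the helper name part_name is hypothetical.

```shell
# Hypothetical helper: build a partition device name from a disk name and a
# partition number, adding the "p" separator when the disk name ends in a digit.
part_name() {
    case "$1" in
        *[0-9]) printf '%sp%s\n' "$1" "$2" ;;   # /dev/sata2 + 3 -> /dev/sata2p3
        *)      printf '%s%s\n'  "$1" "$2" ;;   # /dev/sda   + 3 -> /dev/sda3
    esac
}

part_name /dev/sata2 3   # prints: /dev/sata2p3
part_name /dev/sda 3     # prints: /dev/sda3
```

That is why smartctl wants /dev/sata2 (the whole disk) while mdadm --examine wants /dev/sata2p3 (the RAID member partition).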

  • 0

 

root@Syno_Main:~# smartctl -x -d sat /dev/sata2
smartctl 6.5 (build date May  7 2020) [x86_64-linux-4.4.59+] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA HDWG11A
Serial Number:    Z9F0A004FBDG
LU WWN Device Id: 5 000039 9d8cb009a
Firmware Version: 0603
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA >3.2 (0x1ff), 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jan 29 19:02:51 2021 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  73) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1003) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME                                                   FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate                                              PO-R--   100   100   050    -    0
  2 Throughput_Performance                                           P-S---   100   100   050    -    0
  3 Spin_Up_Time                                                     POS--K   100   100   001    -    9417
  4 Start_Stop_Count                                                 -O--CK   100   100   000    -    154
  5 Reallocated_Sector_Ct                                            PO--CK   100   100   050    -    0
  7 Seek_Error_Rate                                                  PO-R--   100   001   050    Past 0
  8 Seek_Time_Performance                                            P-S---   100   100   050    -    0
  9 Power_On_Hours                                                   -O--CK   096   096   000    -    1873
 10 Spin_Retry_Count                                                 PO--CK   100   100   030    -    0
 12 Power_Cycle_Count                                                -O--CK   100   100   000    -    103
191 G-Sense_Error_Rate                                               -O--CK   100   100   000    -    1
192 Power-Off_Retract_Count                                          -O--CK   100   100   000    -    37
193 Load_Cycle_Count                                                 -O--CK   100   100   000    -    154
194 Temperature_Celsius                                              -O---K   100   100   000    -    36 (Min/Max 17/52)
196 Reallocated_Event_Count                                          -O--CK   100   100   000    -    0
197 Current_Pending_Sector                                           -O--CK   100   100   000    -    0
198 Offline_Uncorrectable                                            ----CK   100   100   000    -    0
199 UDMA_CRC_Error_Count                                             -O--CK   200   200   000    -    0
220 Disk_Shift                                                       -O----   100   001   000    -    219283467
222 Loaded_Hours                                                     -O--CK   096   096   000    -    1872
223 Load_Retry_Count                                                 -O--CK   100   100   000    -    0
224 Load_Friction                                                    -O---K   100   100   000    -    0
226 Load-in_Time                                                     -OS--K   100   100   000    -    529
240 Head_Flying_Hours                                                P-----   100   100   001    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O     51  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O    513  Pending Defects log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x24       GPL     R/O  49152  Current Device Internal Status Data log
0x25       GPL     R/O  49152  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    90%      1806         0

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       1 (0x0001)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    36 Celsius
Power Cycle Min/Max Temperature:     21/37 Celsius
Lifetime    Min/Max Temperature:     17/52 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      5/55 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    478 (224)

Index    Estimated Time   Temperature Celsius
 225    2021-01-29 11:05    35  ****************
 ...    ..(  8 skipped).    ..  ****************
 234    2021-01-29 11:14    35  ****************
 235    2021-01-29 11:15    36  *****************
 236    2021-01-29 11:16    35  ****************
 ...    ..( 11 skipped).    ..  ****************
 248    2021-01-29 11:28    35  ****************
 249    2021-01-29 11:29    36  *****************
 250    2021-01-29 11:30    35  ****************
 ...    ..(  9 skipped).    ..  ****************
 260    2021-01-29 11:40    35  ****************
 261    2021-01-29 11:41    36  *****************
 262    2021-01-29 11:42    35  ****************
 ...    ..( 22 skipped).    ..  ****************
 285    2021-01-29 12:05    35  ****************
 286    2021-01-29 12:06    36  *****************
 287    2021-01-29 12:07    35  ****************
 ...    ..( 49 skipped).    ..  ****************
 337    2021-01-29 12:57    35  ****************
 338    2021-01-29 12:58    36  *****************
 339    2021-01-29 12:59    36  *****************
 340    2021-01-29 13:00    35  ****************
 341    2021-01-29 13:01    35  ****************
 342    2021-01-29 13:02    36  *****************
 343    2021-01-29 13:03    35  ****************
 344    2021-01-29 13:04    35  ****************
 345    2021-01-29 13:05    36  *****************
 346    2021-01-29 13:06    35  ****************
 347    2021-01-29 13:07    36  *****************
 348    2021-01-29 13:08    35  ****************
 349    2021-01-29 13:09    35  ****************
 350    2021-01-29 13:10    36  *****************
 351    2021-01-29 13:11    36  *****************
 352    2021-01-29 13:12    36  *****************
 353    2021-01-29 13:13    35  ****************
 354    2021-01-29 13:14    36  *****************
 355    2021-01-29 13:15    35  ****************
 ...    ..(  2 skipped).    ..  ****************
 358    2021-01-29 13:18    35  ****************
 359    2021-01-29 13:19    36  *****************
 360    2021-01-29 13:20    36  *****************
 361    2021-01-29 13:21    35  ****************
 362    2021-01-29 13:22    36  *****************
 363    2021-01-29 13:23    36  *****************
 364    2021-01-29 13:24    35  ****************
 365    2021-01-29 13:25    36  *****************
 ...    ..(  5 skipped).    ..  *****************
 371    2021-01-29 13:31    36  *****************
 372    2021-01-29 13:32    35  ****************
 373    2021-01-29 13:33    35  ****************
 374    2021-01-29 13:34    36  *****************
 ...    ..( 12 skipped).    ..  *****************
 387    2021-01-29 13:47    36  *****************
 388    2021-01-29 13:48    35  ****************
 389    2021-01-29 13:49    36  *****************
 ...    ..( 67 skipped).    ..  *****************
 457    2021-01-29 14:57    36  *****************
 458    2021-01-29 14:58    37  ******************
 459    2021-01-29 14:59    36  *****************
 ...    ..(  8 skipped).    ..  *****************
 468    2021-01-29 15:08    36  *****************
 469    2021-01-29 15:09    37  ******************
 470    2021-01-29 15:10    36  *****************
 ...    ..(  6 skipped).    ..  *****************
 477    2021-01-29 15:17    36  *****************
   0    2021-01-29 15:18    37  ******************
   1    2021-01-29 15:19    36  *****************
   2    2021-01-29 15:20    37  ******************
   3    2021-01-29 15:21    37  ******************
   4    2021-01-29 15:22    36  *****************
 ...    ..( 15 skipped).    ..  *****************
  20    2021-01-29 15:38    36  *****************
  21    2021-01-29 15:39    37  ******************
  22    2021-01-29 15:40    36  *****************
 ...    ..( 61 skipped).    ..  *****************
  84    2021-01-29 16:42    36  *****************
  85    2021-01-29 16:43    37  ******************
  86    2021-01-29 16:44    36  *****************
  87    2021-01-29 16:45    36  *****************
  88    2021-01-29 16:46    37  ******************
  89    2021-01-29 16:47    36  *****************
  90    2021-01-29 16:48    37  ******************
  91    2021-01-29 16:49    37  ******************
  92    2021-01-29 16:50    36  *****************
  93    2021-01-29 16:51    37  ******************
 ...    ..(  3 skipped).    ..  ******************
  97    2021-01-29 16:55    37  ******************
  98    2021-01-29 16:56    36  *****************
  99    2021-01-29 16:57    37  ******************
 ...    ..( 27 skipped).    ..  ******************
 127    2021-01-29 17:25    37  ******************
 128    2021-01-29 17:26     ?  -
 129    2021-01-29 17:27    21  **
 130    2021-01-29 17:28    21  **
 131    2021-01-29 17:29    21  **
 132    2021-01-29 17:30    22  ***
 133    2021-01-29 17:31    22  ***
 134    2021-01-29 17:32    23  ****
 135    2021-01-29 17:33     ?  -
 136    2021-01-29 17:34    22  ***
 ...    ..(  2 skipped).    ..  ***
 139    2021-01-29 17:37    22  ***
 140    2021-01-29 17:38    23  ****
 141    2021-01-29 17:39    23  ****
 142    2021-01-29 17:40    24  *****
 143    2021-01-29 17:41    25  ******
 144    2021-01-29 17:42    25  ******
 145    2021-01-29 17:43    26  *******
 146    2021-01-29 17:44    27  ********
 147    2021-01-29 17:45    27  ********
 148    2021-01-29 17:46    28  *********
 149    2021-01-29 17:47    28  *********
 150    2021-01-29 17:48    29  **********
 151    2021-01-29 17:49    29  **********
 152    2021-01-29 17:50    29  **********
 153    2021-01-29 17:51    30  ***********
 154    2021-01-29 17:52    30  ***********
 155    2021-01-29 17:53    31  ************
 ...    ..(  2 skipped).    ..  ************
 158    2021-01-29 17:56    31  ************
 159    2021-01-29 17:57    32  *************
 ...    ..(  2 skipped).    ..  *************
 162    2021-01-29 18:00    32  *************
 163    2021-01-29 18:01    33  **************
 ...    ..(  2 skipped).    ..  **************
 166    2021-01-29 18:04    33  **************
 167    2021-01-29 18:05    34  ***************
 168    2021-01-29 18:06    33  **************
 169    2021-01-29 18:07    34  ***************
 ...    ..(  4 skipped).    ..  ***************
 174    2021-01-29 18:12    34  ***************
 175    2021-01-29 18:13    35  ****************
 176    2021-01-29 18:14    34  ***************
 177    2021-01-29 18:15    35  ****************
 ...    ..(  5 skipped).    ..  ****************
 183    2021-01-29 18:21    35  ****************
 184    2021-01-29 18:22    36  *****************
 185    2021-01-29 18:23    35  ****************
 186    2021-01-29 18:24    35  ****************
 187    2021-01-29 18:25    35  ****************
 188    2021-01-29 18:26    36  *****************
 189    2021-01-29 18:27    35  ****************
 190    2021-01-29 18:28    35  ****************
 191    2021-01-29 18:29    35  ****************
 192    2021-01-29 18:30    36  *****************
 ...    ..( 24 skipped).    ..  *****************
 217    2021-01-29 18:55    36  *****************
 218    2021-01-29 18:56    37  ******************
 219    2021-01-29 18:57    36  *****************
 220    2021-01-29 18:58    36  *****************
 221    2021-01-29 18:59    36  *****************
 222    2021-01-29 19:00    37  ******************
 223    2021-01-29 19:01    37  ******************
 224    2021-01-29 19:02    36  *****************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 3) ==
0x01  0x008  4             103  ---  Lifetime Power-On Resets
0x01  0x010  4            1873  ---  Power-on Hours
0x01  0x018  6     39014373458  ---  Logical Sectors Written
0x01  0x020  6       106537169  ---  Number of Write Commands
0x01  0x028  6     52563865324  ---  Logical Sectors Read
0x01  0x030  6       137322282  ---  Number of Read Commands
0x01  0x038  6      6742800000  ---  Date and Time TimeStamp
0x02  =====  =               =  ===  == Free-Fall Statistics (rev 1) ==
0x02  0x010  4               1  ---  Overlimit Shock Events
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4             679  ---  Spindle Motor Power-on Hours
0x03  0x010  4             679  ---  Head Flying Hours
0x03  0x018  4             154  ---  Head Load Events
0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
0x03  0x028  4               0  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x03  0x038  4               0  ---  Number of Realloc. Candidate Logical Sectors
0x03  0x040  4              37  ---  Number of High Priority Unload Events
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               0  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              36  ---  Current Temperature
0x05  0x010  1              34  N--  Average Short Term Temperature
0x05  0x018  1              39  N--  Average Long Term Temperature
0x05  0x020  1              52  ---  Highest Temperature
0x05  0x028  1              17  ---  Lowest Temperature
0x05  0x030  1              50  N--  Highest Average Short Term Temperature
0x05  0x038  1              31  N--  Lowest Average Short Term Temperature
0x05  0x040  1              45  N--  Highest Average Long Term Temperature
0x05  0x048  1              39  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              55  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               5  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              53  ---  Number of Hardware Resets
0x06  0x010  4              34  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  4            0  Command failed due to ICRC error
0x0002  4            0  R_ERR response for data FIS
0x0003  4            0  R_ERR response for device-to-host data FIS
0x0004  4            0  R_ERR response for host-to-device data FIS
0x0005  4            0  R_ERR response for non-data FIS
0x0006  4            0  R_ERR response for device-to-host non-data FIS
0x0007  4            0  R_ERR response for host-to-device non-data FIS
0x0008  4            0  Device-to-host non-data FIS retries
0x0009  4            2  Transition from drive PhyRdy to drive PhyNRdy
0x000a  4            0  Device-to-host register FISes sent due to a COMRESET
0x000b  4            0  CRC errors within host-to-device FIS
0x000d  4            0  Non-CRC errors within host-to-device FIS
0x000f  4            0  R_ERR response for host-to-device data FIS, CRC
0x0010  4            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  4            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  4            0  R_ERR response for host-to-device non-data FIS, non-CRC

root@Syno_Main:~# mdadm --examine /dev/sata[1234]p3 | egrep 'Event|/dev/sata'
/dev/sata1p3:
         Events : 17517
/dev/sata2p3:
         Events : 17272
/dev/sata3p3:
         Events : 17517
/dev/sata4p3:
         Events : 17517

That is crazy long!!!

  • 0

Ok.  This tells us a few things, mostly positive.

 

Your sata2 device (which is physical disk #3 of 4) has a SMART status indicating seek failures at some point, but that is not flagging the drive as SMART failed.  DSM has marked the drive as failed because there was a problem completing a SMART Extended test in the past.  The drive may be fine, but it requires further testing before DSM will unflag it.

 

Whatever happened to the array caused sata2 (physical disk #3) to drop out of it, but only very recently: its event counter (17272) is only slightly behind the 17517 shown by the other three members.  Whatever you did with disk #4 happened when the array was offline, so no harm done (good news).

 

You have two options to clear this up:

  1. Attempt to run a SMART Extended test on physical disk #3 to see if it will clear the flag. If it does, just resync the array with disk #3
  2. Replace disk #3 with your spare, and resync the array

After you restore your array redundancy, correct disk #4's System Partition error by going to the Storage Pool and clicking "Fix System Partition".
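
If you want to check the event counters yourself rather than eyeball them, here is a minimal sketch. The function name `stale_members` is made up for illustration; it only reads text in the format your `mdadm --examine ... | egrep` command printed and changes nothing on the disks.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --examine` output on stdin and compare
# each member's event counter with the highest counter seen.
# A member that is behind has missed writes and is stale.
stale_members() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # header such as "/dev/sata2p3:"
        $1 == "Events" {                               # line such as "Events : 17272"
            ev[dev] = $3 + 0
            order[++n] = dev
            if (ev[dev] > max) max = ev[dev]
        }
        END {
            for (i = 1; i <= n; i++) {
                d = order[i]
                if (ev[d] < max) printf "%s STALE behind by %d\n", d, max - ev[d]
                else             printf "%s current\n", d
            }
        }
    '
}
```

Usage would be something like `mdadm --examine /dev/sata[1234]p3 | stale_members`, run as root. A small gap means the member dropped out recently; it does not by itself say why.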

  • 0
Hi Flyride. 
 

My preferred solution is option 2, but I need to be sure I understand.

 

You are telling me to remove disk 3 and put a new drive in its place. Once this is done, I need to resync the array. How would I do that? Is it simple? Can it be done within the DSM GUI?

 

Thanks

  • 0

Everything can be done from the UI.

 

Replace the drive, then go to Storage Manager and ensure that the Storage Pool for the enclosure is still in a Degraded state.

If it shows Crashed, don't do anything else, take screenshots and report back.

 

The replacement drive should be visible in the HDD list in Storage Manager as Not Initialized.

Then, from the Storage Pool window, select Action, then Repair, and select your replacement drive.

 

Wait several hours for the array to resync.  Monitor progress from Storage Manager or with cat /proc/mdstat.

 

When everything is done and the array is Healthy, run Fix System Partition from Storage Manager.
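
For the command-line monitoring, a rough sketch that pulls just the rebuild progress out of /proc/mdstat. `md_progress` is a name I made up; it assumes the usual kernel progress line of the form `[==>....]  recovery = 12.6% (...) finish=...min speed=...` and only reads text.

```shell
#!/bin/sh
# Hypothetical helper: print rebuild progress for one md device.
# Usage: md_progress md3 < /proc/mdstat
md_progress() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { inblk = 1; next }    # start of this device block
        inblk && NF == 0      { inblk = 0 }          # a blank line ends the block
        inblk && ($2 == "recovery" || $2 == "resync") {
            pct = "unknown"; fin = "unknown"
            for (i = 3; i <= NF; i++) {
                if ($i ~ /%$/)       { pct = $i; sub(/^=/, "", pct) }
                if ($i ~ /^finish=/) fin = substr($i, 8)
            }
            print md, $2, pct, "remaining", fin
            found = 1
        }
        END { if (!found) print md, "no rebuild in progress" }
    '
}
```

Re-running `md_progress md3 < /proc/mdstat` every few minutes is enough; the percentage should only go up. If it reports no rebuild in progress before the array shows Healthy, stop and report back.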

  • 0
root@Syno_Main:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sata4p3[0] sata3p3[3]
      29284844544 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [U__U]

md1 : active raid1 sata3p2[2] sata4p2[3] sata2p2[1]
      2097088 blocks [4/3] [_UUU]

md0 : active raid1 sata3p1[0] sata4p1[1]
      2490176 blocks [4/2] [UU__]

unused devices: <none>


  • 0

Ugh. I got confused between the screenshot and the description of what happened with Drive #4. Unfortunately, you did create an array failure by replacing #4 and booting the NAS. I did ask whether you had booted it and you did not answer. In any case, this means we need the "failing" drive (which hopefully is not actually failing) to be functional in order to restore redundancy.
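
For anyone reading along, this is how the `[4/2] [U__U]` in that mdstat output decodes: 4 members expected, 2 active, and each `_` is a missing member. RAID 5 can lose only one member and still reconstruct the data, so two missing is an array failure. A small sketch that spells this out (`md_slots` is a made-up name; positions are counted from 1, left to right, and nothing is modified):

```shell
#!/bin/sh
# Hypothetical helper: decode the "[total/active] [U_..]" status of every
# md device found in mdstat text on stdin.
md_slots() {
    awk '
        $2 == ":" && $1 ~ /^md[0-9]+$/ { name = $1; next }
        name != "" && match($0, /\[[0-9]+\/[0-9]+\] \[[U_]+\]/) {
            s = substr($0, RSTART, RLENGTH)            # e.g. "[4/2] [U__U]"
            split(s, part, " ")
            counts = part[1]; gsub(/\[|\]/, "", counts)
            split(counts, c, "/")
            map = part[2];    gsub(/\[|\]/, "", map)
            missing = ""
            for (i = 1; i <= length(map); i++)
                if (substr(map, i, 1) == "_")
                    missing = missing (missing == "" ? "" : ",") i
            printf "%s total=%d active=%d missing=%s\n",
                   name, c[1], c[2], (missing == "" ? "none" : missing)
            name = ""
        }
    '
}
```

Run as `md_slots < /proc/mdstat`. On the output posted above it reports positions 2 and 3 missing for md3, which is why the original disk #3 has to come back before anything else.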

 

Option 2 is now invalid, except as a last-ditch emergency method of recovering your data.

 

Power off your NAS, remove your spare drive, restore the original disk #3 and remove disk #4.  Set #4 aside as insurance for your data.  Then install the spare into the #4 slot.

 

Then boot up the NAS and you should again see the array as Degraded and drive #4 as Not Initialized.  If that isn't the case, stop and report back.

Otherwise, repair the array per instructions.  Don't bother with the SMART Extended test for now.

  • 0

So far so good.  Ignore the disk Failing status for now, hopefully that is a red herring.

 

Either Drive 3 is going to perform and be fully functional to recreate a full parity set for Drive 4, or it will completely fail and we will still have a broken array.  Then we will go on to the Insurance drive (old #4) which has a mostly-intact parity set of your data.

 

This is going to take a long time (hours).  Don't interrupt it.  It might report errors on #3 but it will retry, let it do that.  You can monitor with cat /proc/mdstat or watch the parity consistency % increase in Storage Manager.
