flyride

Suppress virtual disk SMART errors from /var/log/messages


One annoyance when running DSM under ESXi is that virtual disks can't properly answer its SMART queries. Synology embedded a custom version of the smartctl binary into its own libraries and utilities, so the standard config files that could generate compatible queries or suppress them are ignored. The result is spurious error messages written to /var/log/messages every few seconds, which wastes disk space and SSD write cycles and makes it hard to see what's actually happening. If you use virtual disks and are not familiar with this, monitor the messages logfile with the command below to see how frequently DSM tries (and fails) to query the drives.

 

# tail -f /var/log/messages

 

The problem has been around for a long time and is well-documented here. An indirect fix was discovered: attaching the virtual disks to the LSI Logic SAS dialect of the ESXi virtual SCSI controller. However, this worked reliably only under DSM 6.1.x. On 6.2.x, the virtual SCSI controller tends to result in corrupted (but recoverable) arrays.

 

I recently migrated my 6.1.7 system to 6.2.3, so I had to convert my virtual SCSI controller to SATA, and of course the logfile chatter came back. I don't really care about SMART on virtual disks (and you probably don't either), so I decided to get rid of the log messages once and for all. Syslog-ng has plenty of options for managing the log message stream, so I knew it was possible. The results follow:

 

We need to install two files. The first is a syslog-ng filter:

# ESXiSmart.conf
# edit the [bracket values] with drive slots where SMART should be suppressed
# in this example /dev/sda through /dev/sdl are suppressed

filter fs_disks { match("/sd[a-l]" value("MESSAGE")); };

filter fs_badsec { match("/exc_bad_sec_ct$" value("MESSAGE")); };
filter fs_errcnt { match("disk_monitor\.c:.*Failed\ to\ check" value("MESSAGE")); };
filter fs_tmpget { match("disk/disk_temperature_get\.c:" value("MESSAGE")); };
filter fs_health { match("disk/disk_current_health_get\.c:" value("MESSAGE")); };
filter fs_sdread { match("SmartDataRead.*read\ value\ /dev/.*fail$" value("MESSAGE")); };
filter fs_stests { match("SmartSelfTestExecutionStatusGet.*read\ value\ /dev/.*fail$" value("MESSAGE")); };
filter fs_tstget { match("smartctl/smartctl_test_status_get\.c:" value("MESSAGE")); };

filter fs_allmsgs { filter(fs_badsec) or filter(fs_errcnt) or filter(fs_tmpget) or filter(fs_health) or filter(fs_sdread) or filter(fs_stests) or filter(fs_tstget); };
filter f_smart { filter(fs_disks) and filter(fs_allmsgs); };

log { source(src); filter(f_smart); };

Save this to /usr/local/etc/syslog-ng/patterndb.d/ESXiSmart.conf

 

You will need to edit the string inside the brackets on the first "fs_disks" line so that it refers to the disks whose SMART messages should be suppressed. If you want all SMART errors suppressed, just leave it as is. In my system, I have both virtual and passthrough disks, and the passthrough disks respond to SMART correctly. So as an example, I have [ab] selected for the virtual disks /dev/sda and /dev/sdb, leaving SMART log messages intact for the passthrough disks.
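To see which device names a given bracket value will catch before touching the real config, the same character class can be tried with grep. This is only a sketch: the function name matches_disks and the sample message text are made up for illustration, and grep -E is standing in for syslog-ng's own regex engine (a simple character class like this behaves the same in both).

```shell
#!/bin/sh
# matches_disks is a hypothetical helper, not part of DSM or syslog-ng.
# $1 = the text inside the brackets (e.g. "ab" or "a-l")
# $2 = a message to test
matches_disks() {
    printf '%s\n' "$2" | grep -Eq "/sd[$1]"
}

# Sample messages are invented; only the /dev/sdX part matters here.
for dev in /dev/sda /dev/sdb /dev/sdc; do
    if matches_disks "ab" "sample: read value $dev fail"; then
        echo "$dev: suppressed"
    else
        echo "$dev: kept"
    fi
done
```

With "ab" as the bracket value, /dev/sda and /dev/sdb are reported as suppressed and /dev/sdc as kept, which is the split described above for virtual versus passthrough disks.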

 

Please note that the file is extremely sensitive to syntax. A missing semicolon, a slash or backslash error, or an extra space will cause syslog-ng to fail completely, and you will have no logging at all. To make sure it doesn't suppress valid log messages, this filter matches only SMART-related error messages that reference the selected disks. On its own, however, it cannot remove them from the log file, because there is a superseding match command embedded in DSM's syslog-ng configuration.
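Because a syntax slip takes logging down entirely, it may be worth asking syslog-ng to parse the configuration before restarting the service. The sketch below assumes the syslog-ng binary is on PATH and that DSM's build accepts the standard --syntax-only option; I have not confirmed either on DSM, and check_syntax is just a name picked for this example.

```shell
#!/bin/sh
# check_syntax is a hypothetical wrapper. The optional argument lets you
# name a different binary; by default it looks for "syslog-ng" on PATH.
check_syntax() {
    bin="${1:-syslog-ng}"
    if ! command -v "$bin" >/dev/null 2>&1; then
        echo "$bin not found, cannot check syntax" >&2
        return 2
    fi
    # --syntax-only parses the configuration and exits without starting
    # the daemon (assumed to be supported by DSM's syslog-ng build).
    if "$bin" --syntax-only; then
        echo "configuration parses cleanly"
    else
        echo "syntax error: fix the filter file before restarting syslog-ng" >&2
        return 1
    fi
}

check_syntax || true
```

If the check reports an error, fix or remove the new file before restarting, so you are not left without logging.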

 

The second file adds our filter to a dynamic exclusion list that DSM's syslog-ng configuration compiles from a special folder. There is only one line:

and not filter(f_smart)

Save it to /usr/local/etc/syslog-ng/patterndb.d/include/not2msg/ESXiSmart
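If you would rather create the one-line file from the shell than from an editor, something like the following should do it. This is a sketch: write_exclusion and its root argument are hypothetical, added only so the commands can be tried in a scratch directory first.

```shell
#!/bin/sh
# write_exclusion is a hypothetical helper.
# $1 = root directory to write under; use "" on the DSM box for the real path.
write_exclusion() {
    root="$1"
    dir="$root/usr/local/etc/syslog-ng/patterndb.d/include/not2msg"
    mkdir -p "$dir" || return 1
    # The file holds exactly one line, as given in the post.
    printf '%s\n' 'and not filter(f_smart)' > "$dir/ESXiSmart"
}

# Dry run in a scratch directory, then show what was written.
scratch=$(mktemp -d)
write_exclusion "$scratch"
cat "$scratch/usr/local/etc/syslog-ng/patterndb.d/include/not2msg/ESXiSmart"
rm -rf "$scratch"
```

On the DSM system itself, running write_exclusion "" as root would write to the real location given above.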

 

Reboot to activate the new configuration, or just restart syslog-ng with this command:

 

# synoservice --restart syslog-ng

 

If you want to make sure that your syslog-ng service is working correctly, generate a test log:

 

# logger -t "test" -p error "test"

 

And then check /var/log/messages as above. If you have made no mistakes in the filter files, you should see the test log entry and the bogus SMART messages should stop. As this solution only modifies extensible structures under /usr/local, it should survive an upgrade as long as there is no major change to message syntax.
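Instead of watching the log scroll by, you can also count the SMART-related lines and compare the number a few minutes apart. The sketch below greps for a handful of the strings used in the filter above; count_smart_lines is a made-up name, and the demo lines are invented placeholders, not real DSM output.

```shell
#!/bin/sh
# count_smart_lines is a hypothetical helper: it counts lines in the given
# file that contain some of the strings the syslog-ng filter matches on.
count_smart_lines() {
    grep -E -c 'disk_temperature_get\.c:|disk_current_health_get\.c:|smartctl_test_status_get\.c:|SmartDataRead' "$1"
}

# Demo on a scratch file with invented lines (NOT real DSM log entries).
sample=$(mktemp)
cat > "$sample" <<'EOF'
sample disk/disk_temperature_get.c: hypothetical failure on /dev/sda
sample test: test
sample SmartDataRead: hypothetical read value /dev/sdb fail
EOF
count_smart_lines "$sample"   # prints 2
rm -f "$sample"
```

On the real system, run count_smart_lines /var/log/messages twice with a pause in between. Note that it counts these messages for all disks, so if you left SMART logging intact for passthrough disks, their entries are included in the total.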

Posted (edited)

Thank you very much! Your instructions are working very well! :-)

 

I applied it the same way you did (suppressing SMART errors only on /dev/sda and /dev/sdb) and /var/log/messages is a lot quieter now!

 

EDIT:

After enabling your fix for suppressing the SMART error messages, I noticed a new error message that is logged in /var/log/messages every minute:

2020-05-19T06:17:38+02:00 diskstation ovs-appctl: ovs|00001|daemon_unix|WARN|/var/run/openvswitch/ovs-vswitchd.pid: open: No such file or directory

 

But Open vSwitch is not in use or even activated/configured! I never touched it, and I have never used Docker on this DSM.

 

Workaround:

mkdir -p /var/run/openvswitch
touch /var/run/openvswitch/ovs-vswitchd.pid

After that, the error messages stop instantly. :-)

 

...just for others who might also get this error every minute...

 

Edited by Balrog
