Valk Posted January 11, 2023 #1

Hey there guys! I really need some ideas on how to recover my 3617xs VM on Proxmox ZFS.

My setup: two SSDs in a ZFS mirror for Proxmox and the VMs, and five 4 TB HDDs in a raidz1 pool. The 3617xs VM gets 50 GiB from the mirror pool as volume1 and the whole 8.8 TB from the HDD pool (what a scam from ZFS, by the way, to get only 8.8 TB usable out of five 4 TB disks). I can't perform a VM backup from Proxmox, and sometimes the volumes crash inside the VM; so far I've been able to convert them back to read-write, but that never lasts long.

Spoiler (backup log):

    INFO: Starting Backup of VM 3617 (qemu)
    INFO: Backup started at 2023-01-08 08:06:15
    INFO: status = running
    INFO: VM Name: 3617
    INFO: include disk 'sata0' 'local-zfs:vm-3617-disk-0' 50G
    INFO: exclude disk 'sata1' 'hddzfs:vm-3617-disk-0' (backup=no)
    INFO: backup mode: snapshot
    INFO: ionice priority: 7
    INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-3617-2023_01_08-08_06_15.vma.zst'
    INFO: started backup task 'cb27f70b-9ab0-47e5-a510-38d13c8a254a'
    INFO: resuming VM again
    INFO: 0% (348.0 MiB of 50.0 GiB) in 3s, read: 116.0 MiB/s, write: 73.6 MiB/s
    INFO: 2% (1.1 GiB of 50.0 GiB) in 6s, read: 245.3 MiB/s, write: 75.4 MiB/s
    INFO: 3% (1.9 GiB of 50.0 GiB) in 10s, read: 225.7 MiB/s, write: 68.9 MiB/s
    INFO: 8% (4.0 GiB of 50.0 GiB) in 13s, read: 708.1 MiB/s, write: 76.3 MiB/s
    INFO: 8% (4.2 GiB of 50.0 GiB) in 14s, read: 184.0 MiB/s, write: 65.0 MiB/s
    ERROR: job failed with err -5 - Input/output error
    INFO: aborting backup job
    INFO: resuming VM again
    ERROR: Backup of VM 3617 failed - job failed with err -5 - Input/output error
    INFO: Failed at 2023-01-08 08:06:29
    INFO: Backup job finished with errors
    TASK ERROR: job errors

Here is zpool status -v:

Spoiler (zpool status -v):

      pool: hddzfs
     state: ONLINE
    status: One or more devices has experienced an error resulting in data corruption.
            Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
      scan: scrub repaired 16K in 05:09:34 with 3 errors on Sun Jan 8 05:33:35 2023
    config:

        NAME                                            STATE     READ WRITE CKSUM
        hddzfs                                          ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            ata-HGST_HUS724040ALA640_PN2334PCKVV28B     ONLINE       0     0     0
            ata-HGST_HMS5C4040BLE640_PL2331LAH496HJ     ONLINE       0     0     0
            ata-WDC_WD40PURZ-85TTDY0_WD-WCC7K4NYNNP4    ONLINE       0     0     0
            ata-WDC_WD40PURZ-85TTDY0_WD-WCC7K6EKUF02    ONLINE       0     0     0
            ata-WDC_WD4000FYYZ-01UL1B2_WD-WMC130F0YDNE  ONLINE       0     0     0

    errors: Permanent errors have been detected in the following files:

            hddzfs/vm-3617-disk-0:<0x1>

      pool: rpool
     state: ONLINE
    status: One or more devices has experienced an error resulting in data corruption.
            Applications may be affected.
    action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
       see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
      scan: scrub repaired 1.62M in 00:03:17 with 2 errors on Wed Jan 11 08:16:30 2023
    config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          mirror-0                                            ONLINE       0     0     0
            ata-Micron_5100_MTFDDAK480TBY_172117438175-part3  ONLINE       0     0    10
            ata-Micron_5100_MTFDDAK480TBY_17211743822A-part3  ONLINE       0     0    15

    errors: Permanent errors have been detected in the following files:

            rpool/data/vm-3617-disk-0:<0x1>

What can I do from inside the 3617 VM to find the bad apples and maybe remove the files that are giving ZFS those errors? Or is it a lost cause, and am I doomed to rebuild the whole NAS VM from scratch? That would be such a pain in my circumstances...
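For context, here is a minimal host-side sketch of the usual ZFS error-handling loop, run from the Proxmox shell. The pool and dataset names come from the output above; everything else is an assumption, not a verified fix for this setup:

    # re-check which objects are still flagged after the last scrub
    zpool status -v hddzfs
    zpool status -v rpool

    # the <0x1> entries point at the zvols themselves, so confirm the datasets
    zfs list -t volume -o name,used,volsize hddzfs/vm-3617-disk-0 rpool/data/vm-3617-disk-0

    # after the damaged data has been rewritten from inside the guest (or restored),
    # clear the old error counters and scrub again so the error list is regenerated
    zpool clear hddzfs
    zpool clear rpool
    zpool scrub hddzfs
    zpool scrub rpool

Because the permanent errors are reported against the zvols (the VM disks) rather than ordinary files, there is nothing to delete on the host side; the bad blocks have to be overwritten from inside DSM (or the disks restored from a backup) before clean scrubs will make the <0x1> entries go away.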
Valk Posted January 11, 2023 (Author) #2

My thinking was: SSH into the VM, unmount the volumes, run e2fsck or badblocks, and go from there. But I don't actually know how to unmount volume1 and volume2 without messing up DSM. Any guide or similar thread, maybe?
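A rough sketch of what that could look like inside the DSM guest over SSH as root. This is a sketch only: the device names /dev/md2 and /dev/vg1000/lv are assumptions (check /proc/mdstat and the mount output first), and syno_poweroff_task is the DSM 6.x way of stopping services and unmounting volumes; other DSM versions may need a different procedure:

    # find out which block device actually backs each volume
    mount | grep volume
    cat /proc/mdstat

    # stop DSM services and unmount the data volumes (DSM 6.x)
    syno_poweroff_task -d

    # read-only filesystem check first (ext4 volumes only; -n makes no changes)
    e2fsck -nvf /dev/md2           # plain volume layout, assumed device
    # e2fsck -nvf /dev/vg1000/lv   # LVM/SHR layout, assumed device

    # only if the read-only pass looks sane, repair:
    # e2fsck -pvf /dev/md2

    # read-only surface scan of one member disk (slow, makes no writes)
    badblocks -sv /dev/sdb

    reboot

Two caveats: e2fsck only applies if the volume is ext4, a btrfs volume would need btrfs check instead; and since the corruption reported by zpool status lives in the underlying zvols, a guest-side fsck can relocate or drop damaged files but cannot repair the ZFS-level checksum errors themselves.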