#1 25/09/2023, 13:39
- had83
RAID 5 with mdadm, corrupted superblock
About a year ago I set up a home NAS server running Ubuntu. It has 3 storage disks, and the RAID 5 array was created with mdadm.
I have been stuck on a problem for a few months, and I have run out of ideas.
I can no longer access the filesystem. mdadm reports that the superblocks are persistent, yet when I check each disk I am told that the superblocks are bad and that I need to find a backup superblock.
Of course, it happened the very week I was planning to set up regular automated backups of the data…
If anyone knows how to get me out of this, or can suggest other things to try, I'm all ears. Thanks a lot!
Here is some information:
➜ sudo mdadm --detail /dev/md127
Version : 1.2
Raid Level : raid5
Total Devices : 3
Persistence : Superblock is persistent
State : inactive
Working Devices : 3
Name : nas-server:127 (local to host nas-server)
UUID : d72e47b6:c65a6bf7:618698a3:32a8dbcb
Events : 224
Number Major Minor RaidDevice
- 8 1 - /dev/sda1
- 8 33 - /dev/sdc1
- 8 17 - /dev/sdb1
➜ cat /proc/mdstat*
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdb1[0](S) sda1[3](S) sdc1[1](S)
11720656342 blocks super 1.2
unused devices: <none>
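The two outputs above agree that md127 is inactive, with all three members marked (S). As long as the array is not running, tools that read /dev/md127 have no filesystem data to look at, so it may be worth confirming the array state before interpreting any fsck error. A minimal, read-only sketch; the function name inactive_arrays is made up for illustration, and it only parses text in the /proc/mdstat format:

```shell
# List md arrays that mdstat-formatted text reports as inactive.
# Reads from stdin, so it can also be tried on saved output.
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }'
}

# Usage on a live system (nothing is written):
#   inactive_arrays < /proc/mdstat
# The member superblocks can then be inspected, also read-only, with
#   sudo mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1
# and compared on their "Events" and "Array State" fields.
```

On the mdstat text quoted above this would print md127.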
➜ sudo smartctl -s on -a /dev/sda1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.2.0-33-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke,
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Serial Number: ZTT4KNE7
LU WWN Device Id: 5 000c50 0e4485d83
Firmware Version: 0001
User Capacity: 4000787030016 bytes [4,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Sep 25 13:37:53 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART Enabled.
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 483) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30a5) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
1 Raw_Read_Error_Rate 0x000f 100 100 006 Pre-fail Always - 9657
3 Spin_Up_Time 0x0003 099 099 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 5
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 100 253 045 Pre-fail Always - 598
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1139 (220 231 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 236
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 253 000 Old_age Always - 0 0 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 066 066 040 Old_age Always - 34 (Min/Max 29/34)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 1
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 5
194 Temperature_Celsius 0x0022 034 040 000 Old_age Always - 34 (0 29 0 0 0)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 9657
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 0h+27m+56.905s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 0
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 9657
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
➜ sudo smartctl -s on -a /dev/sdb1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.2.0-33-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke,
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Serial Number: ZTT4KPPZ
LU WWN Device Id: 5 000c50 0e448366a
Firmware Version: 0001
User Capacity: 4000787030016 bytes [4,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Sep 25 13:38:10 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART Enabled.
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 484) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30a5) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
1 Raw_Read_Error_Rate 0x000f 100 064 006 Pre-fail Always - 33775
3 Spin_Up_Time 0x0003 096 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 221
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 076 060 045 Pre-fail Always - 44446422
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1140 (163 65 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 221
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 1 1 1
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 067 058 040 Old_age Always - 33 (Min/Max 28/33)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 34
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 692
194 Temperature_Celsius 0x0022 033 042 000 Old_age Always - 33 (0 21 0 0 0)
195 Hardware_ECC_Recovered 0x001a 100 064 000 Old_age Always - 33775
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 587h+23m+24.989s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 1236796324
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 118836248678
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
➜ sudo smartctl -s on -a /dev/sdc1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.2.0-33-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke,
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Serial Number: ZTT4K2CT
LU WWN Device Id: 5 000c50 0e4481263
Firmware Version: 0001
User Capacity: 4000787030016 bytes [4,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Sep 25 13:38:37 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
SMART Enabled.
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 495) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x30a5) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
1 Raw_Read_Error_Rate 0x000f 082 064 006 Pre-fail Always - 177528038
3 Spin_Up_Time 0x0003 097 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 232
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 077 060 045 Pre-fail Always - 45671672
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1138 (15 82 0)
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 232
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 066 056 040 Old_age Always - 34 (Min/Max 29/34)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 47
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 657
194 Temperature_Celsius 0x0022 034 044 000 Old_age Always - 34 (0 22 0 0 0)
195 Hardware_ECC_Recovered 0x001a 082 064 000 Old_age Always - 177528038
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 562h+13m+18.989s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 25777854319
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 93954151338
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
#2 26/09/2023, 18:18
- geole
Re: RAID 5 with mdadm, corrupted superblock
Can you post the output of
sudo fsck /dev/md127
and, if that fails, the output of this one
sudo fsck -y -b 32768 /dev/md127
Last edited by geole (28/09/2023, 11:07)
#3 26/09/2023, 18:54
- had83
Re: RAID 5 with mdadm, corrupted superblock
Hello geole!
Thanks for the reply, here is the requested information:
➜ sudo fsck /dev/md127
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/md127
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
e2fsck -b 32768 <device>
➜ sudo fsck -y -b 32768 /dev/md127
fsck from util-linux 2.37.2
e2fsck 1.46.5 (30-Dec-2021)
fsck.ext2: Bad magic number in super-block while trying to open /dev/md127
The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem. If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
e2fsck -b 32768 <device>
Hoping this means something to you...
#4 27/09/2023, 15:44
- geole
Re: RAID 5 with mdadm, corrupted superblock
Let's try other backup superblocks. Here is the start of the list.
Backup superblocks stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 102400000
We know the first one has already been tried without success.
If none of them works, I don't see much point in trying further ones unless you ask. You would then need to look at testdisk, or even photorec.
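The block numbers in this list follow a regular pattern. Assuming the usual ext4 sparse_super layout with 4 KiB blocks (32768 blocks per group, first data block 0), backup superblocks sit at the start of group 1 and of every group whose number is a power of 3, 5 or 7; for example 3 × 32768 = 98304. A minimal sketch that regenerates such a list and prints read-only check commands for it. The function names are made up for illustration, and the group count is something you would have to know or estimate for your own filesystem:

```shell
# Print backup superblock locations under the assumptions stated above.
#   $1 = blocks per group (32768 for 4 KiB blocks)
#   $2 = total number of block groups in the filesystem
backup_superblocks() {
    awk -v bpg="$1" -v groups="$2" 'BEGIN {
        for (g = 1; g < groups; g++)
            if (is_power(g, 3) || is_power(g, 5) || is_power(g, 7))
                printf "%.0f\n", g * bpg
    }
    # True when n is base^k for some k >= 0 (so group 1 always qualifies).
    function is_power(n, base) {
        while (n % base == 0) n = n / base
        return n == 1
    }'
}

# Dry run: print, without executing, one read-only check per candidate.
# e2fsck -n opens the filesystem read-only and answers "no" to every prompt.
#   $1 = device to check; candidate block numbers are read from stdin
print_checks() {
    while read -r blk; do
        echo "e2fsck -n -b $blk -B 4096 $1"
    done
}
```

For instance, `backup_superblocks 32768 10 | print_checks /dev/md127` prints five command lines, which can be reviewed before any of them is run by hand.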
Let's still check whether the log says anything:
journalctl --no-pager -b -g md127
#5 27/09/2023, 16:43
- MicP
Re: RAID 5 with mdadm, corrupted superblock
/dev/sda
7 Seek_Error_Rate 0x000f 100 253 045 Pre-fail Always - 598
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1139 (220 231 0)
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 236
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 9657
/dev/sdb
7 Seek_Error_Rate 0x000f 076 060 045 Pre-fail Always - 44446422
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1140 (163 65 0)
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 221
195 Hardware_ECC_Recovered 0x001a 100 064 000 Old_age Always - 33775
/dev/sdc
7 Seek_Error_Rate 0x000f 077 060 045 Pre-fail Always - 45671672
9 Power_On_Hours 0x0032 099 099 000 Old_age Always - 1138 (15 82 0)
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 232
195 Hardware_ECC_Recovered 0x001a 082 064 000 Old_age Always - 177528038
I'm no specialist in SMART data analysis,
but I still find there are a lot of errors, which could have been caused,
among other possibilities, by a faulty power supply and/or faulty connectors.
I'll leave it to those who know to interpret this output.
Last edited by MicP (27/09/2023, 17:03)
#6 28/09/2023, 11:17
- geole
Re: RAID 5 with mdadm, corrupted superblock
Hello MicP.
These anomalies do not prevent an fsck from running; at most they slow things down.
For example:
The normalized Seek_Error_Rate values are 100, 076 and 077 respectively, while the alert threshold is set to 045.
The other counters are not monitored for alerts.
The Hardware_ECC_Recovered values are 100, 100 and 082 respectively, while the threshold is set to 000.
Last edited by geole (28/09/2023, 11:19)
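The comparison made in this post can be applied to a whole attribute table. smartctl treats an attribute as failed when its normalized VALUE is less than or equal to THRESH. A minimal sketch, assuming the standard `smartctl -A` column order (ID, name, flag, VALUE, WORST, THRESH, ...) and treating a threshold of 000 as "no alert defined"; the function name smart_margins is made up for illustration:

```shell
# Print "name value threshold status" for each SMART attribute line on stdin.
# Lines that do not look like attribute rows (no 0x.... flag column) are skipped.
smart_margins() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ {
        value = $4 + 0      # normalized VALUE column
        thresh = $6 + 0     # THRESH column
        status = (thresh > 0 && value <= thresh) ? "FAILING" : "ok"
        printf "%s %d %d %s\n", $2, value, thresh, status
    }'
}

# Usage on a live system (read-only):
#   sudo smartctl -A /dev/sdb | smart_margins
```

Fed the /dev/sdb Seek_Error_Rate line quoted earlier in the thread, it prints `Seek_Error_Rate 76 45 ok`.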
#7 28/09/2023, 14:34
- MicP
Re: RAID 5 with mdadm, corrupted superblock
Hello geole
On every disk I have owned, the raw value of attribute 195 has always stayed at zero.
But I went through several posts about attribute 195 on ST4000DM004 disks,
and they show the same rapid increase of the attribute's RAW value.
So this may be "normal" behavior for this type of disk.
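In the outputs posted at the start of the thread, the raw value of attribute 195 is identical to that of attribute 1 on each of the three disks. A commonly reported reading of Seagate raw values for attributes 1 and 7, which nothing in this thread confirms and which is purely an assumption here, is that the 48-bit number packs an error count in the upper 16 bits and an operation count in the lower 32 bits. Under that assumption, a large raw value is mostly a count of operations, not of errors. A minimal sketch of the split; the function name split_seagate_raw is made up for illustration:

```shell
# Split a raw SMART value into its upper 16 bits and lower 32 bits.
# ASSUMPTION: upper part = error count, lower part = operation count.
#   $1 = raw value as a decimal number
split_seagate_raw() {
    awk -v raw="$1" 'BEGIN {
        ops = raw % 4294967296            # lower 32 bits (2^32 = 4294967296)
        errs = (raw - ops) / 4294967296   # remaining upper bits
        printf "errors=%.0f operations=%.0f\n", errs, ops
    }'
}
```

For the /dev/sdb Seek_Error_Rate raw value, `split_seagate_raw 44446422` prints `errors=0 operations=44446422`.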
#8 29/09/2023, 08:12
- had83
Re: RAID 5 with mdadm, corrupted superblock
Thanks for all this information.
My NAS sits behind a UPS to protect it from power cuts. The NAS is not powered on all the time, though.
I tried the suggested commands without success, so I am now using PhotoRec and am currently recovering part of the data. I run the script several times, along with a homemade sorting script that copies everything to a cloud. I still hope to recover a good part of it. I had already tried foremost a few weeks ago, without much success: many files were corrupted. With PhotoRec it is going pretty well!
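The homemade sorting script is not shown in the thread. As a purely hypothetical illustration of what such a step might look like, here is a minimal sketch that copies files from PhotoRec's recup_dir.N output folders into one folder per file extension. The function name and the noext folder are assumptions, and the cloud upload is left out:

```shell
# Copy recovered files into one sub-folder per extension.
#   $1 = directory containing the recup_dir.* folders
#   $2 = destination directory (created if missing)
sort_recovered() (
    src=$1; dest=$2
    find "$src" -type f -path '*/recup_dir.*' | while IFS= read -r f; do
        name=${f##*/}                      # file name without its path
        case $name in
            *.*) ext=${name##*.} ;;        # text after the last dot
            *)   ext=noext ;;              # files without any extension
        esac
        mkdir -p "$dest/$ext"
        # Skip files already copied, so the function can be re-run safely.
        [ -e "$dest/$ext/$name" ] || cp -p -- "$f" "$dest/$ext/"
    done
)

# Usage: sort_recovered /path/to/photorec_output /path/to/sorted
```

Because existing files are skipped, the function can be run again after each new PhotoRec pass without recopying what is already sorted.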
Thanks again to you all!