#1 On 31/10/2022, at 16:26

valenthildette

[Solved] Broken RAID 5 on an OMV NAS

Hello,
I ran into a problem yesterday: my RAID 5 (5 × 2 TB disks), which had been running for years, suddenly stopped working in the middle of a file copy.
As I sometimes do when this happens, I restarted the VM, but after reconnecting, OMV no longer showed any file system.
md0 had changed UUID (it was no longer the one OMV expected). That alone is strange!

Even if I can't get it running again "as before", with the shares and everything, I would really like to at least mount it read-only so I can recover as many files as possible (especially the photos).
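Once the array is mounted read-only, a copy that skips unreadable files instead of aborting on the first I/O error recovers the most data. A minimal Python sketch, assuming the array is already mounted read-only somewhere; the function name `copy_photos`, the mount paths and the set of photo extensions are hypothetical choices, not part of OMV or mdadm:

```python
import os
import shutil

# Extensions treated as photos: an assumption, adjust to your own collection.
PHOTO_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic", ".raw", ".cr2", ".nef"}


def copy_photos(src_root: str, dst_root: str) -> tuple[int, list[str]]:
    """Copy every photo under src_root into dst_root, keeping the folder layout.

    Files that cannot be read (e.g. an I/O error on a damaged array) are
    recorded and skipped instead of stopping the whole copy.
    """
    copied = 0
    failed: list[str] = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in PHOTO_EXTENSIONS:
                continue  # not a photo, skip it
            src = os.path.join(dirpath, name)
            dst_dir = os.path.join(dst_root, rel_dir)
            try:
                os.makedirs(dst_dir, exist_ok=True)
                shutil.copy2(src, os.path.join(dst_dir, name))  # copy2 keeps timestamps
                copied += 1
            except OSError:
                failed.append(src)  # remember the file, carry on with the rest
    return copied, failed
```

For example, `copy_photos("/mnt/raid_ro", "/mnt/backup/photos")` (both paths hypothetical) returns the number of photos copied and the list of files that failed, which can be retried later. The destination should be on a healthy disk, never on the damaged array itself.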

I tried to rebuild it with mdadm, but:
- sometimes it tells me there aren't enough disks to rebuild
- sometimes it creates a new md127 with only 3 disks (unusable)
- sometimes it tells me the disks are part of the RAID, sometimes not
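These outcomes follow from RAID 5's redundancy rule: an array of n members stores one parity block per stripe, so it can run with n − 1 members but not with fewer, and its usable capacity is (n − 1) × member size. A small Python sketch of that rule applied to the 5 × 2 TB array described here (`raid5_status` is a hypothetical helper for illustration, not an mdadm function):

```python
def raid5_status(total_members: int, available_members: int, member_size_tb: float) -> str:
    """Describe what a RAID 5 array can do with the members that are still readable.

    One parity block per stripe means the array survives the loss of exactly
    one member; usable capacity is (n - 1) * member size.
    """
    if total_members < 3:
        raise ValueError("RAID 5 needs at least 3 members")
    if not 0 <= available_members <= total_members:
        raise ValueError("available_members must be between 0 and total_members")
    usable_tb = (total_members - 1) * member_size_tb
    missing = total_members - available_members
    if missing == 0:
        return f"clean: {usable_tb:g} TB usable"
    if missing == 1:
        return f"degraded but readable: {usable_tb:g} TB usable, no redundancy left"
    return f"failed: {missing} members missing, the array cannot start"


# The 5 x 2 TB array from this thread, with 5, 4 and 3 readable members.
for present in (5, 4, 3):
    print(present, "of 5 ->", raid5_status(5, present, 2.0))
```

With 3 of the 5 members it reports the array as failed, which is consistent with the unusable 3-disk md127.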

I tinker a bit, but I'm not very good at this :)

Could you please help me try to recover the data?

Thanks in advance for your patience and understanding :)

Last edited by valenthildette (05/11/2022, 09:28)


AMD Athlon 64X2, 8 GiB RAM, 60 GiB SSD + 500 GiB HDD, LM 16 64-bit, GeForce 7800 GTX
...and plenty of old dino-computers lying around... ;)

#2 On 31/10/2022, at 17:42

geole

Re: [Solved] Broken RAID 5 on an OMV NAS

Hello.
I don't really know what you can do through this OMV or which applications are available to you.
I see that you can use mdadm.
My question is whether you can install/use smartmontools and post a smartctl report for each of your five disks.
That will show their condition and probably explain why the array won't start again.
Sometimes you have to disconnect the disks from the NAS and plug them directly into an Ubuntu machine.
If that turns out to be necessary, start with just one.
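When reading those reports, the overall "PASSED" verdict is based on the normalized values compared with their thresholds, so it can stay PASSED while the raw sector counters grow. A minimal Python sketch that pulls the sector-related raw values out of the attribute table printed by `smartctl --all`; which attributes to watch is an assumption (common practice, not a smartmontools rule), and `sector_warnings` is a hypothetical helper:

```python
import re

# Attributes counting damaged or suspect sectors: a non-zero raw value deserves attention.
WATCHED = {"Reallocated_Sector_Ct", "Reallocated_Event_Count",
           "Current_Pending_Sector", "Offline_Uncorrectable"}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)")


def sector_warnings(report: str) -> dict[str, int]:
    """Return the watched attributes whose raw value is non-zero."""
    found: dict[str, int] = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m and m.group(2) in WATCHED and int(m.group(3)) > 0:
            found[m.group(2)] = int(m.group(3))
    return found


# Rows copied from the sdd report posted in this thread.
sdd_excerpt = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       68
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       68
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""
print(sector_warnings(sdd_excerpt))
# {'Reallocated_Sector_Ct': 68, 'Reallocated_Event_Count': 68, 'Current_Pending_Sector': 3}
```

The whole output of `smartctl --all` can be passed as `report`: header, log and self-test lines do not match the row pattern and are ignored.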

Be aware that a disk with a few unreadable sectors gets ejected from the array on restart.
I gather you already know that with two disks ejected, the NAS is unusable.
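Why a second ejected disk is fatal comes down to how RAID 5 parity works: the parity block of each stripe is the XOR of its data blocks, so any single missing block equals the XOR of all the others. A toy Python sketch with one hypothetical 5-member stripe; real RAID 5 rotates the parity block across members, and keeping it on the last member here is a simplification:

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a hypothetical 5-disk RAID 5: four data blocks + one parity block.
data = [b"PHOT", b"O_20", b"19.J", b"PG.."]
parity = xor_blocks(data)
stripe = data + [parity]

# One member lost (index 2): XOR of the four survivors rebuilds it exactly.
survivors = [blk for i, blk in enumerate(stripe) if i != 2]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'19.J'

# Two members lost (indexes 1 and 2): the survivors only give block1 XOR block2,
# which is not enough to recover either block on its own.
survivors = [blk for i, blk in enumerate(stripe) if i not in (1, 2)]
print(xor_blocks(survivors) == xor_blocks([data[1], data[2]]))  # True
```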

Last edited by geole (31/10/2022, 17:51)

#3 On 01/11/2022, at 08:22

valenthildette

Re: [Solved] Broken RAID 5 on an OMV NAS

Hello geole, thanks for looking into the problem :)
OMV is OpenMediaVault, a Debian-based distro for DIY NAS boxes.
I couldn't run smartctl from inside the VM, so I ran it from the host.
Actually, that gives me an idea: maybe I could try assembling the RAID from the host. What do you think?
On the host, sdb corresponds to vda in the VM, ..., sdf to vde.

Here are the reports:

root@pve:~# smartctl --all /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.60-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1171YAGRM7ZS
LU WWN Device Id: 5 000cca 221ca4874
Firmware Version: JKAOA20N
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Nov  1 08:05:42 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (22477) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 375) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       108
  3 Spin_Up_Time            0x0007   204   204   024    Pre-fail  Always       -       478 (Average 231)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       349
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
  9 Power_On_Hours          0x0012   091   091   000    Old_age   Always       -       64409
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       334
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1724
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1724
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       38 (Min/Max 19/66)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@pve:~#
root@pve:~# smartctl --all /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.60-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1171YAGRN1KS
LU WWN Device Id: 5 000cca 221ca4b6e
Firmware Version: JKAOA20N
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Nov  1 08:07:09 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (21889) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 365) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       1
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       98
  3 Spin_Up_Time            0x0007   167   167   024    Pre-fail  Always       -       518 (Average 350)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       336
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   110   110   020    Pre-fail  Offline      -       40
  9 Power_On_Hours          0x0012   091   091   000    Old_age   Always       -       64428
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       324
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1749
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1749
194 Temperature_Celsius     0x0002   133   133   000    Old_age   Always       -       45 (Min/Max 19/74)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@pve:~#
root@pve:~# smartctl --all /dev/sdd
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.60-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1171YAGRJ0TS
LU WWN Device Id: 5 000cca 221ca3c52
Firmware Version: JKAOA20N
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Nov  1 08:08:06 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (21595) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 360) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   098   098   016    Pre-fail  Always       -       5
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       97
  3 Spin_Up_Time            0x0007   163   163   024    Pre-fail  Always       -       519 (Average 369)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       201
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       68
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       22220
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       189
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       749
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       749
194 Temperature_Celsius     0x0002   136   136   000    Old_age   Always       -       44 (Min/Max 19/66)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       68
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 15313 hours (638 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 16 ea 24 a2 00  Error: UNC at LBA = 0x00a224ea = 10626282

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 40 e8 48 a5 a6 40 08  16d+22:27:01.879  READ FPDMA QUEUED
  60 40 a8 68 95 a6 40 08  16d+22:27:01.877  READ FPDMA QUEUED
  60 b0 10 e8 7a a6 40 08  16d+22:27:01.822  READ FPDMA QUEUED
  60 40 b0 a8 75 a6 40 08  16d+22:27:01.822  READ FPDMA QUEUED
  60 40 40 b8 6d a6 40 08  16d+22:27:01.767  READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 15313 hours (638 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 66 ea 24 a2 00  Error: UNC at LBA = 0x00a224ea = 10626282

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 40 48 a8 96 a2 40 08  16d+22:26:34.278  READ FPDMA QUEUED
  60 10 40 98 93 a2 40 08  16d+22:26:34.167  READ FPDMA QUEUED
  60 c0 80 d8 90 a2 40 08  16d+22:26:34.166  READ FPDMA QUEUED
  60 40 78 98 8b a2 40 08  16d+22:26:34.102  READ FPDMA QUEUED
  60 40 70 a8 83 a2 40 08  16d+22:26:33.991  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14928         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@pve:~#
root@pve:~# smartctl --all /dev/sde
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.60-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1171YAGRMSSS
LU WWN Device Id: 5 000cca 221ca4a5d
Firmware Version: JKAOA20N
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Nov  1 08:09:23 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (22477) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 375) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       106
  3 Spin_Up_Time            0x0007   172   172   024    Pre-fail  Always       -       416 (Average 426)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       305
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       104
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   110   110   020    Pre-fail  Offline      -       40
  9 Power_On_Hours          0x0012   089   089   000    Old_age   Always       -       81910
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       293
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1472
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1472
194 Temperature_Celsius     0x0002   136   136   000    Old_age   Always       -       44 (Min/Max 17/63)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       104
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@pve:~#
root@pve:~# smartctl --all /dev/sdf
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.60-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1171YAGRM94S
LU WWN Device Id: 5 000cca 221ca4898
Firmware Version: JKAOA20N
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Nov  1 08:10:00 2022 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (21448) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 358) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       65537
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       99
  3 Spin_Up_Time            0x0007   196   196   024    Pre-fail  Always       -       498 (Average 239)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       334
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   110   110   020    Pre-fail  Offline      -       40
  9 Power_On_Hours          0x0012   092   092   000    Old_age   Always       -       61738
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       310
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1750
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1750
194 Temperature_Celsius     0x0002   133   133   000    Old_age   Always       -       45 (Min/Max 20/75)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       1
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 6 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6 occurred at disk power-on lifetime: 11446 hours (476 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0f 21 08 64 00  Error: UNC at LBA = 0x00640821 = 6555681

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 58 30 06 64 40 08  27d+16:43:08.683  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 08  27d+16:43:08.681  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08  27d+16:43:08.680  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  27d+16:43:08.672  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  27d+16:43:08.671  SET FEATURES [Set transfer mode]

Error 5 occurred at disk power-on lifetime: 11446 hours (476 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0f 21 08 64 00  Error: UNC at LBA = 0x00640821 = 6555681

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 48 30 06 64 40 08  27d+16:42:51.694  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 08  27d+16:42:51.693  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08  27d+16:42:51.692  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  27d+16:42:51.684  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  27d+16:42:51.683  SET FEATURES [Set transfer mode]

Error 4 occurred at disk power-on lifetime: 11446 hours (476 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0f 21 08 64 00  Error: UNC at LBA = 0x00640821 = 6555681

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 30 06 64 40 08  27d+16:42:34.707  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 08  27d+16:42:34.706  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08  27d+16:42:34.705  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  27d+16:42:34.696  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  27d+16:42:34.695  SET FEATURES [Set transfer mode]

Error 3 occurred at disk power-on lifetime: 11446 hours (476 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0f 21 08 64 00  Error: UNC at LBA = 0x00640821 = 6555681

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 38 30 06 64 40 08  27d+16:42:17.719  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 08  27d+16:42:17.718  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08  27d+16:42:17.717  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  27d+16:42:17.708  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  27d+16:42:17.707  SET FEATURES [Set transfer mode]

Error 2 occurred at disk power-on lifetime: 11446 hours (476 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 0f 21 08 64 00  Error: UNC at LBA = 0x00640821 = 6555681

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 30 30 06 64 40 08  27d+16:42:00.731  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 08  27d+16:42:00.730  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08  27d+16:42:00.729  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08  27d+16:42:00.720  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08  27d+16:42:00.720  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     28451         -
# 2  Short offline       Completed without error       00%     28427         -
# 3  Short offline       Completed without error       00%     28412         -
# 4  Short offline       Completed without error       00%     28331         -
# 5  Short offline       Completed without error       00%     28307         -
# 6  Short offline       Completed without error       00%     28283         -
# 7  Short offline       Completed without error       00%     28259         -
# 8  Short offline       Completed without error       00%     28235         -
# 9  Short offline       Completed without error       00%     28211         -
#10  Short offline       Completed without error       00%     28187         -
#11  Short offline       Completed without error       00%     28163         -
#12  Short offline       Completed without error       00%     28139         -
#13  Short offline       Completed without error       00%     28115         -
#14  Short offline       Completed without error       00%     28091         -
#15  Short offline       Completed without error       00%     28067         -
#16  Short offline       Completed without error       00%     28043         -
#17  Short offline       Completed without error       00%     28019         -
#18  Short offline       Completed without error       00%     27995         -
#19  Short offline       Completed without error       00%     27971         -
#20  Short offline       Completed without error       00%     27947         -
#21  Short offline       Completed without error       00%     27923         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@pve:~#

Sorry, it's a bit long ;)


#4 On 01/11/2022, at 09:35

geole

Hello.
This first disk is in very good shape.
The number of lines returned is normal.
You can post the others...

Reassembling from Ubuntu is possible.
But first,
the smartctl reports have to be finished.
Second, we'll need to list the state of the RAID to find out which metadata version it uses, because if it is not the latest one, it will have to be specified when reassembling.
If you know the commands well, perfect.
Otherwise, I'll open my cheat sheet.

#5 On 01/11/2022, at 09:38

geole

Addition:
I just saw that you put all the outputs in a single post.
I'm going back to look at the other four disks.

#6 On 01/11/2022, at 09:45

geole

You can get ready to buy two disks and to read the ddrescue documentation
https://doc.ubuntu-fr.org/ddrescue
in order to duplicate them.
They are sdd and sde.
I'll post the excerpts.

 smartctl --all /dev/sdd
=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1171YAGRJ0TS
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   098   098   016    Pre-fail  Always       -       5
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       68
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       22220
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       68
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
root@pve:~# smartctl --all /dev/sde
=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 7K2000
Device Model:     Hitachi HDS722020ALA330
Serial Number:    JK1171YAGRMSSS
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       104
  9 Power_On_Hours          0x0012   089   089   000    Old_age   Always       -       81910
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       104
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2

Note: don't be reassured by the very small number of unreadable sectors (3 and 2). It can increase extremely fast when the whole disk is duplicated.
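The triage above comes down to a few SMART attributes. Here is a minimal sketch that assumes `smartctl -A` (or `--all`) text on stdin. The hypothetical `smart_suspect` function prints the raw values of attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable), then prints SUSPECT if any of them is non-zero. As the note says, a low count guarantees nothing, so treat this as a first filter and not a verdict on the disk.

```shell
#!/bin/sh
# Hypothetical helper: triage one disk from `smartctl -A` / `smartctl --all` text on stdin.
# Prints "NAME=RAW" for attributes 5, 197 and 198, then "SUSPECT" if any raw value
# is non-zero, otherwise "OK". Exit status: 1 when suspect, 0 otherwise.
smart_suspect() {
  awk '
    # Attribute rows have 10 fields: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW.
    # NF >= 10 skips other tables of --all output that also start with a small number.
    NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) {
      printf "%s=%s\n", $2, $10
      if ($10 + 0 > 0) bad = 1
    }
    END {
      print (bad ? "SUSPECT" : "OK")
      exit bad
    }
  '
}

# Usage (as root): smartctl -A /dev/sdd | smart_suspect
```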

Last edited by geole (01/11/2022, at 10:12)

#7 On 01/11/2022, at 10:47

valenthildette

Thanks geole :)
Can I use another machine for the ddrescue?
E.g. I take out sdd, put it in another machine together with a new disk, "track down" the new disk names (/dev/sd<x> /dev/sd<y>) and run
ddrescue   /dev/sd<x>    /dev/sd<y>    /home/documents/log

Questions:
- how do I make sure which disk is which, so I don't stupidly copy the new one onto the old one?
- can I do it "disk to disk" (the doc talks about fichier.img)?
- which options should I use (-n seems essential to me)?
The examples use options -d -R -f -c1 that I haven't seen anywhere else... maybe in the man page?


#8 On 01/11/2022, at 11:54

geole

With lots of disks being plugged in, the names can change every time. In this situation you have to use the unique identifier:

ls -als /dev/disk/by-id

should give the values,
and I've already posted the serial numbers of the two failing disks: JK1171YAGRJ0TS and JK1171YAGRMSSS
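Following that advice, here is a small sketch that turns a serial number into the current kernel name through `/dev/disk/by-id`. The function `disk_by_serial` is hypothetical. It only matches names ending in `_<serial>`, so `-partN` links are ignored. It refuses to answer when no disk or more than one disk matches, because in those cases a bare glob such as `/dev/disk/by-id/*_JK1171YAGRJ0TS` could silently point at the wrong device or expand to several arguments. The optional second argument (another directory) exists only so the function can be tested.

```shell
#!/bin/sh
# Hypothetical helper: print the device (e.g. /dev/sdd) behind the by-id link(s)
# whose name ends in "_<serial>". Fails if no disk, or more than one distinct disk, matches.
disk_by_serial() {
  serial=$1
  dir=${2:-/dev/disk/by-id}        # second argument: only useful for testing
  target=""
  for link in "$dir"/*_"$serial"; do
    [ -e "$link" ] || continue     # an unmatched glob stays literal: skip it
    dev=$(readlink -f "$link")     # resolve the symlink to the real device
    if [ -n "$target" ] && [ "$dev" != "$target" ]; then
      echo "disk_by_serial: several disks match $serial" >&2
      return 1
    fi
    target=$dev                    # several links to the SAME disk are fine
  done
  if [ -z "$target" ]; then
    echo "disk_by_serial: no disk matches $serial" >&2
    return 1
  fi
  printf '%s\n' "$target"
}

# Usage: disk_by_serial JK1171YAGRJ0TS
```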

I suggest a disk-to-disk copy if you replace them with disks of the same size.
You can do it on whichever machines you like. You can even run both copies in parallel.
So, to start:

ddrescue -n  /dev/disk/by-id/*_JK1171YAGRJ0TS    /dev/disk/by-id/*<id>    /home/documents/log-SDD
ddrescue  -n  /dev/disk/by-id/*_JK1171YAGRMSSS   /dev/disk/by-id/*<id>    /home/documents/log-SDE

If you replace them with larger disks and the disks have partitions, it is better to duplicate partition by partition after creating the new partitions.
If you can, get disks with Rotation Rate: 7200 rpm,
because otherwise the slowest disks set the pace.

Copying temporarily to 2 TB image files is out of the question.

The man page explains the options.
The examples split the job in two:
first, copy what is easy to read;
then copy what is difficult, showing a few parameter values.

If you specify nothing, the command uses default parameters, including a sector size of 512 bytes, which matches your disks.
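Here is a hedged sketch of that two-phase approach. A first pass with `-n` (`--no-scrape`) grabs everything that reads easily. A second pass on the same mapfile uses `-r` (`--retry-passes`) so that only the areas still marked bad are retried. `-f` (`--force`) is needed when the output is a whole disk rather than a regular file. The function `ddrescue_plan` is hypothetical and only prints the commands (a dry run), after checking that both paths exist and do not resolve to the same device. The default of 3 retry passes is an assumption; later in this thread, geole suggests `-r 99`.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the two ddrescue passes for one disk.
# $1 = source, $2 = destination, $3 = mapfile, $4 = retry passes (default 3).
# Paths are printed unquoted: review before pasting them into a root shell.
ddrescue_plan() {
  src=$1 dst=$2 map=$3 retries=${4:-3}
  if [ -z "$src" ] || [ -z "$dst" ] || [ -z "$map" ]; then
    echo "usage: ddrescue_plan SRC DST MAPFILE [RETRIES]" >&2
    return 2
  fi
  for p in "$src" "$dst"; do
    [ -e "$p" ] || { echo "ddrescue_plan: $p does not exist" >&2; return 1; }
  done
  # Guard against the "copy the new disk onto the old one" mistake.
  if [ "$(readlink -f "$src")" = "$(readlink -f "$dst")" ]; then
    echo "ddrescue_plan: source and destination are the same device" >&2
    return 1
  fi
  # Pass 1: copy what reads easily, skipping the slow scraping phase (-n).
  printf 'ddrescue -f -n %s %s %s\n' "$src" "$dst" "$map"
  # Pass 2: same mapfile, so only the areas still marked bad are retried.
  printf 'ddrescue -f -r %s %s %s %s\n' "$retries" "$src" "$dst" "$map"
}
```

Usage: `ddrescue_plan /dev/disk/by-id/<source-id> /dev/disk/by-id/<destination-id> /home/documents/log-SDD 99`. Reusing the same mapfile between the two passes lets ddrescue resume instead of starting over.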

Last edited by geole (01/11/2022, at 12:04)

#9 On 01/11/2022, at 14:17

valenthildette

Thanks again for your patience, really appreciated ;)
Now I just have to get the disks and start the cloning.
I'll be back in a few days with the result.
Do you want me to post the logs before going on?

By the way, I just learned something else thanks to you: the disk ID under /dev/disk/by-id contains the disk's S/N. I had no idea :P


#10 On 01/11/2022, at 14:42

geole

Come back when you start the duplication,
because it's always full of surprises.

#11 On 01/11/2022, at 15:32

valenthildette

It's running for the first disk.
I had to use the --force option so that it would accept writing to the whole disk.
Another 4+ hours of patience, for now...

Can't we post screenshots here?


#12 On 01/11/2022, at 16:06

geole

To follow progress, open another terminal and type:

ddrescuelog  -tvv /home/documents/log-SDD

#13 On 01/11/2022, at 17:37

valenthildette

The main terminal is more verbose than the log :)


#14 On 01/11/2022, at 17:49

geole

In the running instance the display is refreshed constantly, so you can't capture it.
The command above takes a snapshot that you can share by copy/paste:

a@p:~$ ddrescuelog  -tvv trace

trace
   current pos:        0 B,  current status: finished
mapfile extent:    15929 B,  in      1 area(s)

     non-tried:        0 B,  in      0 area(s)  (  0%)
       rescued:    15929 B,  in      1 area(s)  (100%)
   non-trimmed:        0 B,  in      0 area(s)  (  0%)
   non-scraped:        0 B,  in      0 area(s)  (  0%)
    bad-sector:        0 B,  in      0 area(s)  (  0%)
a@p:~$ 
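For the curious, here is a rough sketch of what `ddrescuelog -t` reports, computed directly from the mapfile. It assumes the GNU ddrescue mapfile layout: comment lines start with `#`, the first data line holds the current position and status, and every following line is `pos size status` in hexadecimal. The status is `+` for rescued, `?` for non-tried, `*` for non-trimmed, `/` for non-scraped and `-` for bad sector. `mapfile_summary` is a hypothetical name, and the real `ddrescuelog` remains the tool to trust.

```shell
#!/bin/sh
# Hypothetical sketch: total the bytes per block status in a GNU ddrescue mapfile.
# Prints "<status> <bytes>" per status present, in order of first appearance.
mapfile_summary() {
  header_seen=0
  grep -v '^#' "$1" | while read -r pos size status _rest; do
    [ -n "$pos" ] || continue                # skip blank lines
    if [ "$header_seen" -eq 0 ]; then        # "current_pos current_status [pass]" line
      header_seen=1
      continue
    fi
    printf '%s %d\n' "$status" "$(($size))"  # shell arithmetic reads 0x... hex
  done | awk '
    { if (!($1 in total)) order[++n] = $1; total[$1] += $2 }
    END { for (i = 1; i <= n; i++) printf "%s %.0f\n", order[i], total[order[i]] }
  '
}

# Usage: mapfile_summary /home/documents/log-SDD
```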

Last edited by geole (01/11/2022, at 21:23)

#15 On 01/11/2022, at 20:54

valenthildette

First disk finished successfully :)

/home/eric/Documents/log-SSS
   current pos:    2000 GB,  current status: finished
mapfile extent:    2000 GB,  in      1 area(s)

     non-tried:        0 B,  in      0 area(s)  (  0%)
       rescued:    2000 GB,  in      1 area(s)  (100%)
   non-trimmed:        0 B,  in      0 area(s)  (  0%)
   non-scraped:        0 B,  in      0 area(s)  (  0%)
    bad-sector:        0 B,  in      0 area(s)  (  0%)

Tomorrow I'm off to work, so I'll try to start the 2nd one before leaving...


#16 On 01/11/2022, at 20:56

valenthildette

And the main screen:

root@eric-MS-7597:/home/eric# ddrescue -n --force /dev/disk/by-id/*GRMSSS /dev/disk/by-id/*51P4 /home/eric/Documents/log-SSS
GNU ddrescue 1.23
Press Ctrl-C to interrupt
     ipos:  289689 MB, non-trimmed:        0 B,  current rate:    123 MB/s
     ipos:    2000 GB, non-trimmed:        0 B,  current rate:  38625 kB/s
     opos:    2000 GB, non-scraped:        0 B,  average rate:  96066 kB/s
non-tried:        0 B,  bad-sector:        0 B,    error rate:       0 B/s
  rescued:    2000 GB,   bad areas:        0,        run time:  5h 47m  2s
pct rescued:  100.00%, read errors:        0,  remaining time:         n/a
                              time since last successful read:         n/a
Finished

#17 On 01/11/2022, at 21:31

geole

For once, it was ultra-fast.
Can you run another smartctl report on the source disk?

#18 On 02/11/2022, at 07:24

valenthildette

Hello,
Argh! I saw your message too late, I already have the other 2 cooking :)
More this evening... or tomorrow morning :)


#19 On 02/11/2022, at 20:57

valenthildette

Here is the report of the 2nd copy:

root@eric-MS-7597:/home/eric# ddrescue -n --force /dev/disk/by-id/*0TS /dev/disk/by-id/*331 /home/eric/Documents/log-0TS
GNU ddrescue 1.23
Press Ctrl-C to interrupt
     ipos:  915934 MB, non-trimmed:        0 B,  current rate:   17749 B/s
     opos:  915934 MB, non-scraped:     3072 B,  average rate:  96381 kB/s
non-tried:        0 B,  bad-sector:     1024 B,    error rate:     170 B/s
  rescued:    2000 GB,   bad areas:        2,        run time:  5h 45m 55s
pct rescued:   99.99%, read errors:        3,  remaining time:          1s
                              time since last successful read:          0s
Finished                                      

And also the smartctl output for the source disk:

root@eric-MS-7597:/home/eric# smartctl -A /dev/disk/by-id/*0TS
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       1
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       98
  3 Spin_Up_Time            0x0007   137   137   024    Pre-fail  Always       -       533 (Average 519)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       205
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       68
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   114   114   020    Pre-fail  Offline      -       38
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       22256
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       192
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       753
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       753
194 Temperature_Celsius     0x0002   136   136   000    Old_age   Always       -       44 (Min/Max 19/66)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       68
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

And the one for the destination disk:

root@eric-MS-7597:/home/eric# smartctl -A /dev/disk/by-id/*331
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-41-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   199   194   051    Pre-fail  Always       -       133
  3 Spin_Up_Time            0x0027   174   169   021    Pre-fail  Always       -       6266
  4 Start_Stop_Count        0x0032   075   075   000    Old_age   Always       -       25382
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       22834
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
 16 Total_LBAs_Read         0x0022   002   198   000    Old_age   Always       -       117410437019
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       25373
194 Temperature_Celsius     0x0022   111   106   000    Old_age   Always       -       39
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       30

Do you still want the smartctl output for the 1st disk?


#20 On 02/11/2022, at 22:21

geole

Good evening.
It was just out of curiosity, to see how it had held up and what it could still be used for.
The first duplication showed that everything had been recovered.
That doesn't seem to be the case for the second duplication:
2 bad areas, with 4 current pending sectors.
It would be worth rerunning it without the -n parameter and with -r 99.

Last edited by geole (02/11/2022, at 22:33)

#21 On 03/11/2022, at 07:09

valenthildette

Hello,

Indeed, it was worth it!

root@eric-MS-7597:/home/eric# ddrescue -r 99 --force /dev/disk/by-id/*0TS /dev/disk/by-id/*331 /home/eric/Documents/log-0TS
GNU ddrescue 1.23
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 2000 GB, tried: 4096 B, bad-sector: 1024 B, bad areas: 2

Current status
     ipos:  915934 MB, non-trimmed:        0 B,  current rate:    3584 B/s
     opos:  915934 MB, non-scraped:        0 B,  average rate:       7 B/s
non-tried:        0 B,  bad-sector:        0 B,    error rate:       0 B/s
  rescued:    2000 GB,   bad areas:        0,        run time:      9m 12s
pct rescued:  100.00%, read errors:      195,  remaining time:         n/a
                              time since last successful read:         n/a
Finished                                    

And the log:

root@eric-MS-7597:/home/eric# ddrescuelog  -tvv /home/eric/Documents/log-0TS 

/home/eric/Documents/log-0TS
   current pos:  915934 MB,  current status: finished
mapfile extent:    2000 GB,  in      1 area(s)

     non-tried:        0 B,  in      0 area(s)  (  0%)
       rescued:    2000 GB,  in      1 area(s)  (100%)
   non-trimmed:        0 B,  in      0 area(s)  (  0%)
   non-scraped:        0 B,  in      0 area(s)  (  0%)
    bad-sector:        0 B,  in      0 area(s)  (  0%)

What's the next step, please?


#22 On 03/11/2022, at 13:44

geole

Hello
9 minutes to copy 2 sectors: it must have retried quite a few times.
I didn't understand what went wrong with your NAS.
In my opinion it's worth trying to get the NAS running again,
but I don't know at all how it is managed.
Otherwise, with MDADM we can try something, starting with a status check:

sudo parted -l
sudo lsblk -o SIZE,NAME,FSTYPE,LABEL,MOUNTPOINT

and this command, to adapt to your device names:

sudo mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1  | egrep -i 'Update Time|Device Role' 

Last edited by geole (03/11/2022, at 14:06)

#23 On 04/11/2022, at 10:24

valenthildette

Hello geole,

sorry for the slow reply.

Here is the output of:
parted -l

root@vm-omv:~# parted -l
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sda: 34,4GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type      File system     Flags
 1      1049kB  33,3GB  33,3GB  primary   ext4            boot
 2      33,3GB  34,4GB  1022MB  extended
 5      33,3GB  34,4GB  1022MB  logical   linux-swap(v1)


Model: Virtio Block Device (virtblk)
Disk /dev/vdd: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  2000GB  2000GB  primary  ntfs         raid


Error: /dev/vdb: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Warning: Unable to open /dev/sr0 read-write (Read-only file system).
/dev/sr0 has been opened read-only.
Model: QEMU QEMU DVD-ROM (scsi)
Disk /dev/sr0: 775MB
Sector size (logical/physical): 2048B/2048B
Partition Table: mac
Disk Flags:

Number  Start   End     Size    File system  Name   Flags
 1      2048B   6143B   4096B                Apple
 2      3652kB  6371kB  2720kB               EFI


Error: /dev/vde: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vde: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Error: /dev/vdc: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vdc: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Error: /dev/vda: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

lsblk -o SIZE,NAME,FSTYPE,LABEL,MOUNTPOINT

root@vm-omv:~# lsblk -o SIZE,NAME,FSTYPE,LABEL,MOUNTPOINT
 SIZE NAME   FSTYPE            LABEL                         MOUNTPOINT
  32G sda
  31G ├─sda1 ext4                                            /
   1K ├─sda2
 975M └─sda5 swap                                            [SWAP]
 739M sr0    iso9660           openmediavault 20210709-09:01
 1,8T vda    linux_raid_member nas:md0
 1,8T vdb    linux_raid_member nas:md0
 1,8T vdc    linux_raid_member nas:md0
 1,8T vdd    linux_raid_member nas:md0
 1,8T └─vdd1
 1,8T vde    linux_raid_member nas:md0
root@vm-omv:~#

And finally mdadm --examine /dev/vda /dev/vdb /dev/vdc /dev/vdd /dev/vde  | egrep -i 'Update Time|Device Role'

root@vm-omv:~# mdadm --examine /dev/vda /dev/vdb /dev/vdc /dev/vdd /dev/vde  | egrep -i 'Update Time|Device Role'
mdadm: Unknown keyword INACTIVE-ARRAY
    Update Time : Sun Oct 30 14:14:45 2022
   Device Role : Active device 0
    Update Time : Sun Oct 30 14:14:45 2022
   Device Role : Active device 1
    Update Time : Sun Oct 30 14:14:45 2022
   Device Role : Active device 4
    Update Time : Fri Jan 14 02:23:48 2022
   Device Role : Active device 2
    Update Time : Sun Oct 30 14:14:45 2022
   Device Role : Active device 3
root@vm-omv:~#

That doesn't look too bright ;)
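The output above already points at something: one member (Device Role 2) was last updated on Fri Jan 14 2022, while the four others stopped on Sun Oct 30 2022. This suggests the array may have been running degraded since January. Here is a hedged sketch that makes such a stale member stand out in full `mdadm --examine` output (without the `egrep` filter). It compares each member's `Events` counter with the highest one. The function name `md_events` is hypothetical, and it assumes the superblock output format shown later in this thread (`/dev/vdX:` headers, `Events : N` and `Device Role : ...` lines).

```shell
#!/bin/sh
# Hypothetical sketch: read `mdadm --examine DEV...` output on stdin and print
# "<device> <events> <role>" per member, adding "STALE" when the member's Events
# counter is below the highest one seen (i.e. it stopped being updated earlier).
md_events() {
  awk '
    /^\/dev\/.*:$/ { dev = substr($1, 1, length($1) - 1); devs[++n] = dev; next }
    $1 == "Events" { ev[dev] = $3; if ($3 + 0 > max) max = $3 + 0; next }
    /Device Role/  { role[dev] = $NF; next }
    END {
      for (i = 1; i <= n; i++) {
        d = devs[i]
        printf "%s %s %s%s\n", d, ev[d], role[d], (ev[d] + 0 < max ? " STALE" : "")
      }
    }
  '
}

# Usage (as root): mdadm --examine /dev/vda /dev/vdb /dev/vdc /dev/vdd /dev/vde | md_events
```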


#24 On 04/11/2022, at 16:07

valenthildette

Well, in the end I went for it.

By running:

root@vm-omv:~# mdadm --assemble /dev/md0 /dev/vda /dev/vdb /dev/vdc /dev/vdd /dev/vde
mdadm: /dev/md0 assembled from 4 drives - not enough to start the array while not clean - consider --force.

it wouldn't reassemble the array because a disk was missing, and it created an md127 with 3 disks as spares (a, b, e).
I ran:

root@vm-omv:~# mdadm --examine /dev/vda
/dev/vda:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ed532c11:43752f86:8d7cca6a:24be687a
           Name : nas:md0
  Creation Time : Mon Aug 22 20:51:54 2016
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
     Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
  Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=1200 sectors
          State : active
    Device UUID : 05b37241:84158fd1:522f7008:4361e7a6

    Update Time : Sun Oct 30 14:14:45 2022
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : afa27232 - correct
         Events : 142388

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA.AA ('A' == active, '.' == missing, 'R' == replacing)
root@vm-omv:~#
root@vm-omv:~#
root@vm-omv:~# mdadm --examine /dev/vdb
/dev/vdb:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ed532c11:43752f86:8d7cca6a:24be687a
           Name : nas:md0
  Creation Time : Mon Aug 22 20:51:54 2016
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
     Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
  Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1200 sectors
          State : active
    Device UUID : a64ae8a2:cc186976:d34f9171:1ea37232

    Update Time : Sun Oct 30 14:14:45 2022
       Checksum : aa966f28 - correct
         Events : 142388

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA.AA ('A' == active, '.' == missing, 'R' == replacing)
root@vm-omv:~#
root@vm-omv:~#
root@vm-omv:~# mdadm --examine /dev/vdc
/dev/vdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ed532c11:43752f86:8d7cca6a:24be687a
           Name : nas:md0
  Creation Time : Mon Aug 22 20:51:54 2016
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
     Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
  Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=261864 sectors, after=1200 sectors
          State : clean
    Device UUID : 1ff2747f:c6f27bef:bf24c6c4:8577b988

    Update Time : Fri Jan 14 02:23:48 2022
  Bad Block Log : 512 entries available at offset 264 sectors
       Checksum : a83a27fb - correct
         Events : 89585

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAAA ('A' == active, '.' == missing, 'R' == replacing)
root@vm-omv:~#
root@vm-omv:~#
root@vm-omv:~#
root@vm-omv:~# mdadm --examine /dev/vdd
/dev/vdd:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ed532c11:43752f86:8d7cca6a:24be687a
           Name : nas:md0
  Creation Time : Mon Aug 22 20:51:54 2016
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
     Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
  Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1200 sectors
          State : active
    Device UUID : d077e55d:e388e41a:8cc43f75:6496aa19

    Update Time : Sun Oct 30 14:14:45 2022
       Checksum : f4f5746b - correct
         Events : 142388

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AA.AA ('A' == active, '.' == missing, 'R' == replacing)
root@vm-omv:~#
root@vm-omv:~# mdadm --examine /dev/vde
/dev/vde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : ed532c11:43752f86:8d7cca6a:24be687a
           Name : nas:md0
  Creation Time : Mon Aug 22 20:51:54 2016
     Raid Level : raid5
   Raid Devices : 5

 Avail Dev Size : 3906767024 (1862.89 GiB 2000.26 GB)
     Array Size : 7813531648 (7451.56 GiB 8001.06 GB)
  Used Dev Size : 3906765824 (1862.89 GiB 2000.26 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=1200 sectors
          State : active
    Device UUID : 49e44b12:7059c9cb:7837f5f8:70ef0dc0

    Update Time : Sun Oct 30 14:14:45 2022
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 84617db5 - correct
         Events : 142388

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : AA.AA ('A' == active, '.' == missing, 'R' == replacing)
root@vm-omv:~#

All the disks are "active" except /dev/vdc, which is "clean".
I ran:

root@vm-omv:~# mdadm --assemble --force /dev/md0 /dev/vda /dev/vdb /dev/vdc /dev/vdd /dev/vde
mdadm: Marking array /dev/md0 as 'clean'
mdadm: /dev/md0 has been started with 4 drives (out of 5).
root@vm-omv:~#
root@vm-omv:~# mdadm --manage /dev/md0 --add /dev/vdc
mdadm: added /dev/vdc
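Besides the State field, the Events counter and Update Time show which member is stale: /dev/vdc reports Events 89585 (Jan 14 2022), while the other members shown here report 142388 (Oct 30 2022). mdadm relies largely on this counter to decide which members are up to date, which is why the forced assembly started with 4 drives and /dev/vdc had to be re-added. A minimal sketch, assuming the mdadm --examine output format shown above; the function name md_member_summary is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read "mdadm --examine" output on stdin and print
# "<Events> <Device Role>" so that a stale member (lower Events) stands out.
md_member_summary() {
  awk -F' : ' '
    # Field names are padded with leading spaces; strip them before matching.
    { key = $1; sub(/^[ \t]+/, "", key) }
    key == "Events"      { events = $2 }
    key == "Device Role" { role = $2 }
    END { print events, role }
  '
}
```

Usage example, one line per member, sorted by event count:

for d in /dev/vda /dev/vdb /dev/vdc /dev/vdd /dev/vde; do echo "$d $(mdadm --examine "$d" | md_member_summary)"; done | sort -k2 -n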

And now it is rebuilding the RAID:

root@vm-omv:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 vdc[6] vda[8] vde[7] vdd[5] vdb[1]
      7813531648 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [UU_UU]
      [>....................]  recovery =  1.9% (38332080/1953382912) finish=429.2min speed=74358K/sec

unused devices: <none>
root@vm-omv:~#
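The numbers in this output line up with the mdadm --examine output: "Used Dev Size" is given in 512-byte sectors (3906765824), i.e. 1953382912 KiB per member, which is exactly the total on the recovery line; RAID 5 spends one member's worth of space on parity, so 5 members give 4 × 1953382912 = 7813531648 KiB, the array size. A quick check in shell arithmetic, with the values copied from the output above:

```shell
#!/bin/sh
# Values copied from the mdadm --examine output above.
used_dev_sectors=3906765824   # "Used Dev Size", in 512-byte sectors
raid_devices=5                # "Raid Devices"

# One member's data area in KiB (2 sectors of 512 bytes = 1 KiB).
member_kib=$((used_dev_sectors / 2))
# RAID 5 uses one member's worth of space for parity.
array_kib=$((member_kib * (raid_devices - 1)))

echo "per-member KiB: $member_kib"
echo "array KiB:      $array_kib"
```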

This will take all night; we'll see the result tomorrow :)


#25 On 04/11/2022 at 17:10

geole

Re: [Solved] RAID 5 broken on OMV NAS

Hello,
I think this is going to work.
I assume you noticed this line:
Update Time : Fri Jan 14 02:23:48 2022

Since January 14, one of your disks (VDD in the first command, VDC in the second) had been dropped from the RAID.
Normally you should be notified of this kind of incident. You had more than nine months to replace it.

In my opinion, when the NAS saw that a second disk was failing, it stopped the array immediately. That would explain why it was fairly easy to restart.
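A RAID 5 survives only one missing member: "AA.AA" in Array State, or [UU_UU] in /proc/mdstat, means the redundancy is already used up, so any further failure stops the array. mdadm has a built-in monitor mode (mdadm --monitor, with MAILADDR set in mdadm.conf) meant to send mail on such events. As a minimal complementary sketch, assuming the /proc/mdstat format shown above, a script run from cron could flag degraded arrays; the function name md_degraded is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every array whose member map (e.g. [UU_UU]) contains a "_",
# meaning at least one member is missing or failed.
md_degraded() {
  awk '
    # Array header lines look like "md0 : active raid5 vdc[6] ...".
    /^md[0-9]+ :/ { name = $1; next }
    # The next status line ends with a member map such as [UU_UU].
    name != "" && match($0, /\[[U_]+\]/) {
      if (index(substr($0, RSTART, RLENGTH), "_") > 0) print name
      name = ""
    }
  '
}
```

Usage example (prints nothing when every array is complete):

md_degraded < /proc/mdstat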


And for more convenient monitoring:

watch -n 60 cat /proc/mdstat
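If you would rather log the progress than keep a terminal open, a minimal sketch (assuming the recovery line format shown earlier in this thread; the function name md_recovery is hypothetical) can extract the percentage and the "finish=" estimate from /proc/mdstat:

```shell
#!/bin/sh
# Hypothetical helper: print "<array> <percent> <finish estimate>" for each
# array in recovery or resync, reading /proc/mdstat-style text on stdin.
md_recovery() {
  awk '
    /^md[0-9]+ :/ { name = $1 }
    /(recovery|resync) *=/ {
      pct = ""; eta = ""
      # First "<number>%" on the line, e.g. "1.9%".
      if (match($0, /[0-9.]+%/)) pct = substr($0, RSTART, RLENGTH)
      # Value after "finish=", e.g. "429.2min".
      if (match($0, /finish=[^ ]+/)) eta = substr($0, RSTART + 7, RLENGTH - 7)
      print name, pct, eta
    }
  '
}
```

Usage example, appending one line per minute to a log file:

while sleep 60; do echo "$(date) $(md_recovery < /proc/mdstat)"; done >> rebuild.log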

Last edited by geole (04/11/2022, 17:29)