[Tools] Smartctl Installation and Usage Guide - Euler's Blog

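Purpose

smartctl is the core component of the smartmontools package and is used to monitor and diagnose drive health. Its main functions are:

SMART monitoring: reads the drive's SMART (Self-Monitoring, Analysis, and Reporting Technology) data
Health assessment: evaluates the drive's overall health and reliability
Failure warning: detects potential failures early to avoid data loss
Performance analysis: tracks drive usage and performance indicators
Temperature monitoring: reads the drive's current operating temperature
Error statistics: counts key indicators such as read/write errors and bad sectors

Important: checking drive health regularly with smartctl is an important safeguard for your data. Check at least once a month.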

Installation

macOS installation

brew install smartmontools

Ubuntu/Debian installation

sudo apt update
sudo apt install smartmontools

CentOS/RHEL installation

# CentOS 7
sudo yum install smartmontools

# CentOS 8/RHEL 8+
sudo dnf install smartmontools

Arch Linux installation

sudo pacman -S smartmontools
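
After installing, it can help to confirm that the binary is actually on your PATH before going further. The following is a minimal sketch; require_cmd is a hypothetical helper name, not something shipped with smartmontools.

```shell
# Succeed if the named command is on PATH; otherwise print a hint and fail.
# require_cmd is a hypothetical helper name used only for this example.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}

# Usage (prints the version banner only when smartctl is installed):
#   require_cmd smartctl && smartctl --version
```

If the check fails right after installation, opening a new terminal session so that PATH is re-read is usually the first thing to try.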

Usage

Basic command format

smartctl [options] device

Using smartctl on macOS

1. List the disk devices

diskutil list

Sample output:

/dev/disk0 (internal, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *251.0 GB   disk0
   1:             Apple_APFS_ISC Container disk2         524.3 MB   disk0s1
   2:                 Apple_APFS Container disk3         245.1 GB   disk0s2
   3:        Apple_APFS_Recovery Container disk1         5.4 GB     disk0s3

2. View the disk's SMART information

smartctl -a disk0

Sample output:

smartctl 7.5 2025-04-30 r5714 [Darwin 24.5.0 arm64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       APPLE SSD AP0256Z
Serial Number:                      0ba0291160042e23
Firmware Version:                   2077.120
PCI Vendor/Subsystem ID:            0x106b
IEEE OUI Identifier:                0x000000
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               3
Local Time is:                      Mon Jul 28 09:54:43 2025 CST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0004):   Frmw_DL
Optional NVM Commands (0x0004):     DS_Mngmt
Maximum Data Transfer Size:         256 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     0.00W       -        -    0  0  0  0        0       0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        35 Celsius
Available Spare:                    100%
Available Spare Threshold:          99%
Percentage Used:                    0%
Data Units Read:                    17,880,873 [9.15 TB]
Data Units Written:                 15,708,975 [8.04 TB]
Host Read Commands:                 333,287,929
Host Write Commands:                160,714,913
Controller Busy Time:               0
Power Cycles:                       131
Power On Hours:                     146
Unsafe Shutdowns:                   3
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Using smartctl on Linux

1. List the disk devices

sudo fdisk -l

Sample output:

Disk /dev/nvme1n1: 476.94 GiB, 512110190592 bytes, 1000215216 sectors
Disk model: HS-SSD-C2000Pro 512G                  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 86B6DC24-13EC-4386-98B9-3575FC356A21

Device         Start        End    Sectors   Size Type
/dev/nvme1n1p1  2048      34815      32768    16M Microsoft reserved
/dev/nvme1n1p2 34816 1000214527 1000179712 476.9G Microsoft basic data

2. View SMART information

sudo smartctl -a /dev/nvme1n1

Sample output:

smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.8.0-64-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       HS-SSD-C2000Pro 512G
Serial Number:                      30022636448
Firmware Version:                   HBAF28FT
PCI Vendor/Subsystem ID:            0x126f
IEEE OUI Identifier:                0x000000
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          512,110,190,592 [512 GB]
Namespace 1 Utilization:            265,949,810,688 [265 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Mon Jul 28 10:10:05 2025 CST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0b):         S/H_per_NS Cmd_Eff_Lg Telmtry_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4    15000   15000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        45 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    17%
Data Units Read:                    63,410,903 [32.4 TB]
Data Units Written:                 65,780,159 [33.6 TB]
Host Read Commands:                 1,554,661,478
Host Write Commands:                1,353,534,964
Controller Busy Time:               41,937
Power Cycles:                       1,157
Power On Hours:                     28,672
Unsafe Shutdowns:                   331
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   117
Thermal Temp. 1 Total Time:         474

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
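
The report is long, and in scripts you usually want a single field from it. The sketch below pulls one value out by its label. smart_field is a hypothetical helper name, and the code assumes the NVMe-style "Label:   value" layout shown in the sample output; ATA attribute tables are laid out differently and would need other parsing.

```shell
# Print the value of one "Label:   value" line from smartctl output on stdin.
# smart_field is a hypothetical helper; it assumes the NVMe-style layout.
smart_field() {
    awk -v label="$1" '
        index($0, label ":") == 1 {                # line starts with "Label:"
            value = substr($0, length(label) + 2)  # text after the colon
            sub(/^[ \t]+/, "", value)              # drop the alignment padding
            sub(/[ \t]+$/, "", value)              # drop trailing whitespace
            print value
            exit
        }'
}

# Usage:
#   sudo smartctl -a /dev/nvme1n1 | smart_field "Percentage Used"
```

Matching only at the start of the line keeps "Temperature" from also matching lines such as "Warning  Comp. Temperature Time".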

Common command options

Quick health check

# Check the drive's health status
smartctl -H /dev/sda

# Show basic device information
smartctl -i /dev/sda

# Show SMART attributes
smartctl -A /dev/sda

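Self-test commands

# Start a short self-test (typically about 2 minutes)
smartctl -t short /dev/sda

# Start a long self-test (often around 2 hours; the time scales with drive capacity)
smartctl -t long /dev/sda

# Start a conveyance self-test (ATA drives only; checks for damage sustained in transport)
smartctl -t conveyance /dev/sda

# View self-test results
smartctl -l selftest /dev/sda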

Viewing error logs

# View the error log
smartctl -l error /dev/sda

# Show all available information, including every supported log
smartctl -x /dev/sda

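Interpreting key parameters

Health indicators

Parameter	Meaning	Normal value	Warning threshold
`SMART overall-health`	Overall health status	PASSED	FAILED
`Critical Warning`	Critical warning flags	0x00	Any nonzero value
`Temperature`	Operating temperature	<60°C	>70°C
`Available Spare`	Remaining spare capacity	>10%	<5%
`Percentage Used`	Estimated share of rated endurance used	<80%	>90%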
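
A nonzero Critical Warning is a bit field, so the individual bits say what is wrong. The sketch below decodes the lower five bits as they are defined for the NVMe SMART/Health log. decode_critical_warning is a hypothetical helper name, and the wording of each message is a paraphrase, not smartctl output.

```shell
# Decode the NVMe "Critical Warning" byte (for example 0x00 or 0x04).
# decode_critical_warning is a hypothetical helper; messages are paraphrased.
decode_critical_warning() {
    value=$(( $1 ))    # accepts hex such as 0x04 as well as decimal
    if [ "$value" -eq 0 ]; then
        echo "no critical warnings"
        return 0
    fi
    [ $((value & 1)) -ne 0 ]  && echo "available spare is below its threshold"
    [ $((value & 2)) -ne 0 ]  && echo "temperature is outside its threshold"
    [ $((value & 4)) -ne 0 ]  && echo "device reliability is degraded"
    [ $((value & 8)) -ne 0 ]  && echo "media is in read-only mode"
    [ $((value & 16)) -ne 0 ] && echo "volatile memory backup device has failed"
    return 0
}

# Usage:
#   decode_critical_warning 0x04
```

Several bits can be set at once, in which case one line is printed per bit.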

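Usage statistics

Parameter	Meaning	Notes
`Data Units Read/Written`	Amount of data read and written	Indicates how heavily the drive is used
`Host Read/Write Commands`	Number of read and write commands	Reflects I/O load
`Power Cycles`	Number of power cycles	How many times the drive has been powered on and off
`Power On Hours`	Powered-on time	Cumulative hours of operation
`Unsafe Shutdowns`	Unsafe shutdowns	Number of unexpected power losses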
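
The Data Units counters are not bytes. On NVMe drives one data unit is 1000 blocks of 512 bytes, or 512,000 bytes, which is how smartctl arrives at the bracketed TB figure. A small sketch of that arithmetic follows; data_units_to_tb is a hypothetical helper name.

```shell
# Convert an NVMe "Data Units" counter to terabytes (10^12 bytes).
# One data unit is 1000 blocks of 512 bytes, i.e. 512,000 bytes.
# data_units_to_tb is a hypothetical helper name used only for this example.
data_units_to_tb() {
    units=$(printf '%s' "$1" | tr -d ',')    # strip thousands separators
    awk -v u="$units" 'BEGIN { printf "%.2f\n", u * 512000 / 1e12 }'
}

# Usage:
#   data_units_to_tb "15,708,975"
```

The function rounds to two decimals, so its result can differ in the last digit from the figure smartctl prints in brackets.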

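Error statistics

Parameter	Meaning	Normal value	Notes
`Media and Data Integrity Errors`	Media and data integrity errors	0	Errors involving corrupted data
`Error Information Log Entries`	Error log entries	0	Number of errors recorded
`Controller Busy Time`	Controller busy time	Low	Controller load (reported in minutes)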
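
Since both error counters are expected to be 0, a script can simply flag any that are not. The sketch below reads a smartctl report on standard input. nonzero_error_counters is a hypothetical helper name, and the code assumes the NVMe field names listed in the table above.

```shell
# Read smartctl output on stdin and print every error counter that is not 0.
# Exits with status 1 if at least one counter is nonzero, 0 otherwise.
# nonzero_error_counters is a hypothetical helper name.
nonzero_error_counters() {
    awk -F ':' '
        /^Media and Data Integrity Errors:/ || /^Error Information Log Entries:/ {
            value = $2
            gsub(/[ \t,]/, "", value)    # remove padding and thousands separators
            if (value + 0 != 0) {
                print $1 ": " value
                found = 1
            }
        }
        END { exit found ? 1 : 0 }'
}

# Usage:
#   sudo smartctl -a /dev/nvme0n1 | nonzero_error_counters || echo "check this drive"
```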

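Troubleshooting

Common problems

1. Health status is FAILED

# Show detailed error information
smartctl -l error /dev/sda

# Show SMART attribute details
smartctl -A /dev/sda

Possible causes:

Too many bad sectors
Overheating
Mechanical failure
Firmware problems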

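2. Abnormal temperature

# Watch the temperature, refreshing every 5 seconds
watch -n 5 'smartctl -A /dev/sda | grep Temperature'

Solutions:

Check the cooling system
Clean out dust
Improve ventilation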
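
For unattended checks, the temperature can be compared against the thresholds from the health indicator table (below 60°C normal, above 70°C warning). This is a rough sketch: nvme_temp and classify_temp are hypothetical helper names, the "elevated" label for the range in between is an assumption of this example, and the parsing assumes the NVMe "Temperature:" line.

```shell
# Extract the temperature in °C from NVMe-style smartctl output on stdin.
# nvme_temp and classify_temp are hypothetical helper names.
nvme_temp() {
    awk '/^Temperature:/ { print $2; exit }'
}

# Classify a temperature using this guide's thresholds:
# above 70 is a warning, below 60 is normal, anything between is "elevated".
classify_temp() {
    if [ "$1" -gt 70 ]; then
        echo "warning"
    elif [ "$1" -ge 60 ]; then
        echo "elevated"
    else
        echo "normal"
    fi
}

# Usage:
#   classify_temp "$(sudo smartctl -A /dev/nvme0n1 | nvme_temp)"
```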

3. Read/write errors

# View the error log
smartctl -l error /dev/sda

# Run a long self-test
smartctl -t long /dev/sda

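Alerting

Consider setting up a script that checks your drives periodically:

#!/bin/bash
# Drive health check script (run as root so smartctl can open the devices)

DEVICES=("/dev/sda" "/dev/sdb" "/dev/nvme0n1")

for device in "${DEVICES[@]}"; do
    if [ -e "$device" ]; then
        health=$(smartctl -H "$device" | grep "SMART overall-health")
        if [[ $health == *"FAILED"* ]]; then
            echo "WARNING: $device reports a failing health status!" >&2
            # Add email notification or logging here
        fi
    fi
done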
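
The script above matches on the report text. smartctl also sets its exit status as a bit mask, which is often easier to act on in scripts. The sketch below decodes that mask; explain_smartctl_status is a hypothetical helper name, and the messages paraphrase the EXIT STATUS section of the smartctl man page.

```shell
# Decode the smartctl exit status bit mask into one line per set bit.
# explain_smartctl_status is a hypothetical helper; messages are paraphrased
# from the EXIT STATUS section of the smartctl man page.
explain_smartctl_status() {
    status=$1
    bit=0
    while [ "$bit" -le 7 ]; do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            case $bit in
                0) msg="command line did not parse" ;;
                1) msg="device open failed or device is in a low-power mode" ;;
                2) msg="a SMART or other ATA command failed, or a checksum error was found" ;;
                3) msg="SMART status check returned DISK FAILING" ;;
                4) msg="prefail attributes at or below threshold" ;;
                5) msg="some attributes were at or below threshold in the past" ;;
                6) msg="device error log contains errors" ;;
                7) msg="device self-test log contains errors" ;;
            esac
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Usage:
#   sudo smartctl -H /dev/sda >/dev/null; explain_smartctl_status $?
```

An exit status of 0 prints nothing, so any output at all is a reason to look closer.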

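Best practices

1. Check regularly

Routine check: check the health status once a week
Deep check: run a long self-test once a month
Temperature monitoring: keep drive temperature under continuous watch

2. Back up data

Back up important data immediately when a drive shows anomalies
Set up a regular backup routine
Use redundancy such as RAID

3. Maintain the environment

Keep the operating temperature in a suitable range (20-35°C)
Ensure good ventilation
Avoid frequent power cycling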

4. Monitoring tools

Use the smartd daemon for automatic monitoring
Configure email or SMS alerts
Integrate with your monitoring system
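
As a starting point for smartd, the sketch below prints a single smartd.conf line that enables the default checks, schedules self-tests, and sends mail on problems. smartd_conf_line is a hypothetical helper name, and the schedule is this example's own choice, picked to match the weekly and monthly cadence suggested above. The location of smartd.conf varies by distribution, so the path in the usage comment is an assumption.

```shell
# Print one smartd.conf line for a device.
# smartd_conf_line is a hypothetical helper; the directives it emits are:
#   -a           enable the default set of checks
#   -s REGEXP    self-test schedule in T/MM/DD/d/HH form
#   -m ADDRESS   send warning mail to ADDRESS
smartd_conf_line() {
    device=$1
    mail_to=$2
    # Short test every Sunday at 02:00, long test on the 1st of each month at 03:00.
    schedule='(S/../../7/02|L/../01/./03)'
    printf '%s -a -s %s -m %s\n' "$device" "$schedule" "$mail_to"
}

# Usage (the config path is an assumption; check where your distribution keeps it):
#   smartd_conf_line /dev/sda root | sudo tee -a /etc/smartd.conf
```

Mail delivery only works if the machine has a working local mail setup, so it is worth testing the alert path once before relying on it.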

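Summary

smartctl is an important tool for monitoring drive health. Used regularly, it helps you catch a failing drive before data is lost. Key points:

Regular checks: set a schedule so problems surface early
Parameter interpretation: understand what the key indicators and thresholds mean
Fault handling: know how to diagnose and resolve common problems
Prevention first: use monitoring and alerts to avoid data loss

Important: a drive failure can cause data loss. When you find an anomaly, back up your data promptly and consider replacing the drive.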
