Notes on assorted day-to-day problems

Configuration

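  1. Motherboard: H11SSL-i
  2. CPU: EPYC 7282
  3. Passthrough card (HBA): LSI 9207-8i, originally bought for an E5 build. The board currently uses an SFF-8643 to 4x SATA cable, so the LSI card is gathering dust for now
  4. Disks: 4 x 4T in RAID5, 1 x 4T hot spare; one 8T Seagate Exos as a cold spare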

Hook script that runs before the VMs start

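At first I added a pre parameter to pve-guests.service, but it turned out that this service file is reset on every upgrade, so it is better to add a separate service of my own.
First, the contents of the two files: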

root@pve00:~# cat /etc/systemd/system/pve-hook.service
[Unit]
Description=PVE Hooks (From Phenix93)
ConditionPathExists=/usr/local/bin/pve-pre-hook.sh
After=networking.service
Before=pve-manager.service
Before=pve-guests.service

[Service]
Type=oneshot
ExecStart=-/usr/local/bin/pve-pre-hook.sh

[Install]
WantedBy=multi-user.target

root@pve00:~# cat /usr/local/bin/pve-pre-hook.sh
#!/bin/bash

# SATA controller reset_method
#echo "bus" > /sys/bus/pci/devices/0000:44:00/reset_method
lspci -n|grep 1022:7901 |awk '{print $1}' | while read _id; do echo "set reset_method for PCI ${_id}"; echo "bus" > /sys/bus/pci/devices/0000:$_id/reset_method;done

# SR-IOV
echo "reset SR-IOV for enp65s0"
echo 0 > /sys/class/net/enp65s0f0/device/sriov_numvfs
echo 0 > /sys/class/net/enp65s0f1/device/sriov_numvfs
echo "set numvfs for enp65s0"
echo 2 > /sys/class/net/enp65s0f0/device/sriov_numvfs
echo 2 > /sys/class/net/enp65s0f1/device/sriov_numvfs

# time wait

sleep 20
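The SR-IOV part of the script writes 0 and then 2 to sriov_numvfs for each port. As far as I know, the kernel refuses to change a non-zero VF count directly to another non-zero value, which is why the reset comes first. Below is a minimal sketch of the same step as a reusable function that skips interfaces that do not exist. The names set_numvfs and SYSFS_ROOT are hypothetical and not part of the original script; SYSFS_ROOT only exists so the function can be tried against a fake directory tree.

```shell
#!/bin/bash
# Sketch only: SYSFS_ROOT and set_numvfs are hypothetical helper names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Reset the VF count of one interface to 0, then set it to the requested value.
set_numvfs() {
    local iface="$1" count="$2"
    local node="${SYSFS_ROOT}/class/net/${iface}/device/sriov_numvfs"
    if [ ! -e "$node" ]; then
        echo "skip ${iface}: ${node} not found" >&2
        return 1
    fi
    # Disable existing VFs first, then enable the new count
    echo 0 > "$node" || return 1
    echo "$count" > "$node" || return 1
    echo "set numvfs=${count} for ${iface}"
}

# Example calls for the two ports used in the script above:
#   set_numvfs enp65s0f0 2
#   set_numvfs enp65s0f1 2
```

With this shape, a renamed or missing interface produces a message in the journal instead of a bare "No such file or directory" from the redirect.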

Enable the new service

root@pve00:~# systemctl daemon-reload
root@pve00:~# systemctl enable pve-hook.service
Created symlink /etc/systemd/system/multi-user.target.wants/pve-hook.service → /etc/systemd/system/pve-hook.service.

After rebooting, run systemd-analyze blame or systemd-analyze plot to check how the boot went.

Common commands

Passthrough

Disks

There is a Stack Overflow answer for reference.

# Detach a SATA device
echo 1 > /sys/class/scsi_device/13:0:0:0/device/delete
# or
echo 1 > /sys/block/sda/device/delete

# Rescan the SATA host
echo "- - -" > /sys/class/scsi_host/host13/scan
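The rescan needs the host number that the disk sat on, which has to be looked up before the disk is deleted. A small sketch of that lookup follows, assuming the usual sysfs layout in which the resolved path of /sys/block/<disk> contains a hostN component. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: both function names are hypothetical.

# Extract the SCSI host name (e.g. host13) from a resolved sysfs device path.
scsi_host_of_path() {
    local host
    host=$(printf '%s\n' "$1" | sed -n 's|.*/\(host[0-9][0-9]*\)/.*|\1|p')
    # Fail when the path has no hostN component
    [ -n "$host" ] && echo "$host"
}

# Resolve the SCSI host behind a block device name such as sda.
scsi_host_of_disk() {
    scsi_host_of_path "$(readlink -f "/sys/block/$1")"
}

# Example: remember the host, detach the disk, then rescan that host
#   host=$(scsi_host_of_disk sda)
#   echo 1 > /sys/block/sda/device/delete
#   echo "- - -" > "/sys/class/scsi_host/${host}/scan"
```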

PCI

Kernel documentation for the sysfs interface

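  • Show the driver a PCI device is using: lspci -k, lspci -k -s 44:00
  • Show PCI devices as a tree: lspci -t
  • Show device IDs: lspci -nv -s 44:00
  • Unbind a PCI driver: echo "0000:44:00.0" > /sys/bus/pci/drivers/ahci/unbind (unbinds the ahci driver from 44:00.0)
  • Manually bind a PCI driver: echo "0000:44:00.0" > /sys/bus/pci/drivers/vfio-pci/bind (binds the vfio-pci driver to 44:00.0)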
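A plain write to a driver's bind file only works if that driver already claims the device ID. One way around this is the driver_override attribute followed by a re-probe. The sketch below follows that route; pci_rebind and SYSFS_ROOT are hypothetical names, the target driver module (for example vfio-pci) is assumed to be loaded already, and SYSFS_ROOT is only there so the function can be exercised on a fake tree.

```shell
#!/bin/bash
# Sketch only: SYSFS_ROOT and pci_rebind are hypothetical helper names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Rebind one PCI function (full address, e.g. 0000:44:00.0) to the named driver.
pci_rebind() {
    local addr="$1" driver="$2"
    local dev="${SYSFS_ROOT}/bus/pci/devices/${addr}"
    [ -d "$dev" ] || { echo "no such device: ${addr}" >&2; return 1; }
    # Pin the driver choice so the probe below picks it over the default
    echo "$driver" > "${dev}/driver_override" || return 1
    # Release the device from its current driver, if it has one
    if [ -e "${dev}/driver/unbind" ]; then
        echo "$addr" > "${dev}/driver/unbind" || return 1
    fi
    # Ask the kernel to probe the device again
    echo "$addr" > "${SYSFS_ROOT}/bus/pci/drivers_probe"
}

# Example: hand the second SATA controller to vfio-pci
#   pci_rebind 0000:44:00.0 vfio-pci
```

Afterwards lspci -k -s 44:00 should show the new driver under "Kernel driver in use".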

Motherboard notes

SATA Controller

Usage (as seen from PVE)
root@pve00:~# lspci|grep SATA
43:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
44:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
86:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)
87:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 51)

root@pve00:~# dmesg |grep -i SATA
[ 2.706895] ahci 0000:86:00.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
[ 2.707379] ata1: SATA max UDMA/133 abar m2048@0xf0100000 port 0xf0100100 irq 57
[ 2.707686] ahci 0000:87:00.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
[ 2.708011] ata2: SATA max UDMA/133 abar m2048@0xf0000000 port 0xf0000100 irq 149
[ 2.708424] ahci 0000:43:00.0: AHCI 0001.0301 32 slots 8 ports 6 Gbps 0xff impl SATA mode
[ 2.709872] ata3: SATA max UDMA/133 abar m2048@0xb0400000 port 0xb0400100 irq 151
[ 2.709875] ata4: SATA max UDMA/133 abar m2048@0xb0400000 port 0xb0400180 irq 152
[ 2.709877] ata5: SATA max UDMA/133 abar m2048@0xb0400000 port 0xb0400200 irq 153
[ 2.709879] ata6: SATA max UDMA/133 abar m2048@0xb0400000 port 0xb0400280 irq 154
[ 2.709881] ata7: SATA max UDMA/133 abar m2048@0xb0400000 port 0xb0400300 irq 155
[ 2.709882] ata8: SATA max UDMA/133 abar m2048@0xb0400000 port 0xb0400380 irq 156
[ 2.709884] ata9: SATA max UDMA/133 abar m2048@0xb0400000 port 0xb0400400 irq 157
[ 2.709886] ata10: SATA max UDMA/133 abar m2048@0xb0400000 port 0xb0400480 irq 158
[ 2.710287] ahci 0000:44:00.0: AHCI 0001.0301 32 slots 8 ports 6 Gbps 0xff impl SATA mode
[ 2.711658] ata11: SATA max UDMA/133 abar m2048@0xb0300000 port 0xb0300100 irq 168
[ 2.711660] ata12: SATA max UDMA/133 abar m2048@0xb0300000 port 0xb0300180 irq 169
[ 2.711662] ata13: SATA max UDMA/133 abar m2048@0xb0300000 port 0xb0300200 irq 170
[ 2.711664] ata14: SATA max UDMA/133 abar m2048@0xb0300000 port 0xb0300280 irq 171
[ 2.711666] ata15: SATA max UDMA/133 abar m2048@0xb0300000 port 0xb0300300 irq 172
[ 2.711668] ata16: SATA max UDMA/133 abar m2048@0xb0300000 port 0xb0300380 irq 173
[ 2.711670] ata17: SATA max UDMA/133 abar m2048@0xb0300000 port 0xb0300400 irq 174
[ 2.711672] ata18: SATA max UDMA/133 abar m2048@0xb0300000 port 0xb0300480 irq 175
[ 3.020421] ata2: SATA link down (SStatus 0 SControl 300)
[ 3.020422] ata1: SATA link down (SStatus 0 SControl 300)
[ 3.025484] ata7: SATA link down (SStatus 0 SControl 300)
[ 3.025485] ata18: SATA link down (SStatus 0 SControl 300)
[ 3.025524] ata9: SATA link down (SStatus 0 SControl 300)
[ 3.025529] ata15: SATA link down (SStatus 0 SControl 300)
[ 3.025560] ata3: SATA link down (SStatus 0 SControl 300)
[ 3.025569] ata13: SATA link down (SStatus 0 SControl 300)
[ 3.025598] ata5: SATA link down (SStatus 0 SControl 300)
[ 3.025607] ata14: SATA link down (SStatus 0 SControl 300)
[ 3.025636] ata6: SATA link down (SStatus 0 SControl 300)
[ 3.025643] ata16: SATA link down (SStatus 0 SControl 300)
[ 3.025676] ata8: SATA link down (SStatus 0 SControl 300)
[ 3.025684] ata17: SATA link down (SStatus 0 SControl 300)
[ 3.025716] ata11: SATA link down (SStatus 0 SControl 300)
[ 3.025738] ata10: SATA link down (SStatus 0 SControl 300)
[ 3.025762] ata12: SATA link down (SStatus 0 SControl 300)
[ 3.025777] ata4: SATA link down (SStatus 0 SControl 300)

Testing shows that the H11SSL-i actually uses only 2 of the controllers; my guess is that the others are meant for the -nc boards.

0000:43:00.0
I-SATA0-3 => ata3-6(host2-5)
I-SATA4-7 => ata7-10(host6-9)

0000:44:00.0
SATA8-11 => ata11-14(/sys/bus/scsi/devices/host10-13)
SATA12-15 => ata15-18(host14-17)
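The mapping above is regular enough to turn into a lookup: on this board the connector index is the ata number minus 3, and the SCSI host number is the ata number minus 1. The sketch below simply encodes that table. The function name ata_to_port is hypothetical, and the offsets are taken from this one machine, so they should not be treated as a general rule.

```shell
#!/bin/bash
# Sketch only: encodes the table recorded for this H11SSL-i, nothing more.
# Prints "<controller> <board connector> <scsi host>" for an ataN number from dmesg.
ata_to_port() {
    local n="$1"
    if [ "$n" -ge 3 ] && [ "$n" -le 10 ]; then
        echo "0000:43:00.0 I-SATA$((n - 3)) host$((n - 1))"
    elif [ "$n" -ge 11 ] && [ "$n" -le 18 ]; then
        echo "0000:44:00.0 SATA$((n - 3)) host$((n - 1))"
    else
        echo "ata${n}: not on the two 8-port controllers" >&2
        return 1
    fi
}

# Example: ata_to_port 14   prints "0000:44:00.0 SATA11 host13"
```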
