I first noticed that one of the containers kept restarting, so I checked its logs for errors.

shitou@aishitou:~$ docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED                                  STATUS          PORTS                                         NAMES
ab357e46450f   grafana/grafana:latest   "/run.sh"                5 months ago                             Up 17 seconds   0.0.0.0:3000->3000/tcp, [::]:3000->3000/tcp   grafana
553e54dcef0d   prom/prometheus:latest   "/bin/prometheus --c…"   5 months ago                             Up 17 seconds   0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp   prometheus
eec3a45791bc   google/cadvisor:latest   "/usr/bin/cadvisor -…"   5 months ago                             Up 3 seconds    0.0.0.0:8080->8080/tcp, [::]:8080->8080/tcp   cadvisor

Checking again:
shitou@aishitou:~$ docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED                                  STATUS                           PORTS                                                                   NAMES
ab357e46450f   grafana/grafana:latest   "/run.sh"                5 months ago                             Up 50 seconds                    0.0.0.0:3000->3000/tcp, [::]:3000->3000/tcp                             grafana
553e54dcef0d   prom/prometheus:latest   "/bin/prometheus --c…"   5 months ago                             Up 51 seconds                    0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp                             prometheus
eec3a45791bc   google/cadvisor:latest   "/usr/bin/cadvisor -…"   5 months ago                             Restarting (255) 4 seconds ago                                                                           cadvisor
And a few more times:
shitou@aishitou:~$ docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED                                  STATUS                           PORTS                                                                   NAMES
ab357e46450f   grafana/grafana:latest   "/run.sh"                5 months ago                             Up 53 seconds                    0.0.0.0:3000->3000/tcp, [::]:3000->3000/tcp                             grafana
553e54dcef0d   prom/prometheus:latest   "/bin/prometheus --c…"   5 months ago                             Up 54 seconds                    0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp                             prometheus
eec3a45791bc   google/cadvisor:latest   "/usr/bin/cadvisor -…"   5 months ago                             Restarting (255) 7 seconds ago                                                                           cadvisor
shitou@aishitou:~$ docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED                                  STATUS                                    PORTS                                                                   NAMES
ab357e46450f   grafana/grafana:latest   "/run.sh"                5 months ago                             Up 56 seconds                             0.0.0.0:3000->3000/tcp, [::]:3000->3000/tcp                             grafana
553e54dcef0d   prom/prometheus:latest   "/bin/prometheus --c…"   5 months ago                             Up 56 seconds                             0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp                             prometheus
eec3a45791bc   google/cadvisor:latest   "/usr/bin/cadvisor -…"   5 months ago                             Restarting (255) Less than a second ago                                                                           cadvisor
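
Instead of reading the STATUS column by eye on every run, the exit code can be pulled out of the status text. This is only a minimal sketch: the helper name restart_exit_code is my own, and it only understands the "Restarting (N) ..." wording shown in the listings above.

```shell
# Print the exit code embedded in a docker STATUS string such as
# "Restarting (255) 4 seconds ago"; print nothing for any other state.
restart_exit_code() {
    printf '%s\n' "$1" | sed -n 's/^Restarting (\([0-9][0-9]*\)).*/\1/p'
}

# On a live host (container name "cadvisor" taken from the listing above):
#   restart_exit_code "$(docker ps --filter name=cadvisor --format '{{.Status}}')"
```

An empty result means the container was not in the Restarting state at that moment, so in a restart loop it may take a few tries to catch it.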

The error output is as follows:

shitou@aishitou:~$ sudo docker logs eec3a45791bc
[sudo] password for shitou:
F0226 15:29:53.822922       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
F0226 15:29:58.289714       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
F0226 15:30:02.550374       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
F0226 15:30:06.154412       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
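
Because the container restarts in a loop, the log is the same fatal line over and over. A small filter can collapse the repeats so that a second, different error would stand out. This is a sketch only; unique_messages is a name I made up, and it assumes glog-style lines where the message follows the first "] ".

```shell
# Collapse repeated glog-style lines read from stdin: strip everything up to
# the first "] ", then count identical messages.
unique_messages() {
    sed 's/^[^]]*\] //' | sort | uniq -c | sed 's/^ *//'
}

# On a live host:
#   sudo docker logs eec3a45791bc 2>&1 | unique_messages
```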

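Analysis

Judging from the error log, cadvisor fails to start because it cannot find a CPU-related mount point (mountpoint for cpu not found). cadvisor needs access to the host's cgroup directories in order to monitor containers and system resources, and my first assumption was that these directories were not being mounted into the container correctly.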

shitou@aishitou:~$ mount | grep cgroup | grep cpu
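# Purpose: check whether the host has the CPU-related cgroup subsystems mounted (cpu and cpuacct are the cgroup subsystems for CPU resource control and CPU accounting).
# The output is empty, so no CPU-related cgroup subsystem was mounted at that point, which matches the cadvisor error.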

shitou@aishitou:~$ ls -ld /sys/fs/cgroup 
dr-xr-xr-x 12 root root 0 Aug 14 07:30 /sys/fs/cgroup
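# Check that the cgroup root directory exists (/sys/fs/cgroup is the default cgroup mount point).
# The directory exists (dr-xr-xr-x) with normal permissions, which rules out a missing directory.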

shitou@aishitou:~$ sudo mkdir -p /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct
[sudo] password for shitou:
# Manually create the CPU-related cgroup subdirectories (the earlier check showed nothing mounted, so I suspected the directories were missing).

shitou@aishitou:~$ sudo mount -t cgroup -o cpu cpu /sys/fs/cgroup/cpu # something was left out here
mount: /sys/fs/cgroup/cpu: cpu already mounted or mount point busy.
       dmesg(1) may have more information after failed mount system call.
# Attempt to mount the cpu subsystem at /sys/fs/cgroup/cpu. It fails.

shitou@aishitou:~$ sudo mount -t cgroup -o cpuacct cpuacct /sys/fs/cgroup/cpuacct

shitou@aishitou:~$ mount | grep cgroup | grep cpu
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)

--------------------------------------------------------------------------------------------------
shitou@aishitou:~$ sudo fuser -m /sys/fs/cgroup/cp
Specified filename /sys/fs/cgroup/cp does not exist.
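Regarding the earlier mount error: cpu already mounted or mount point busy means that the mount point may already be in use, or that the system has already mounted the cpu subsystem some other way (on a cgroup v2 host the cpu controller is normally already bound to the unified hierarchy).
--------------------------------------------------------------------------------------------------
The "does not exist" error above is just my typo: I left the "u" off cpu, haha.
shitou@aishitou:~$ sudo fuser -m /sys/fs/cgroup/cpu
# No output at all, so no process is currently using the directory (fuser lists the IDs of processes using a given file or directory; no output means none).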
=========================================================================================
shitou@aishitou:~$ mount | grep cgroup | grep cpu
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
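# Only the cpuacct mount (/sys/fs/cgroup/cpuacct) is listed; the cpu subsystem is not mounted separately. That is consistent with the system running cgroup v2: v2 uses a single unified hierarchy in which all controllers, CPU control included, live under the /sys/fs/cgroup root, so a separate cpu mount is neither needed nor supported.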
=========================================================================================
Keep this in mind, because later on I switched to cgroup v1.
=========================================================================================
shitou@aishitou:~$ sudo mount -t cgroup -o cpu cpu /sys/fs/cgroup/cpu
mount: /sys/fs/cgroup/cpu: cpu already mounted or mount point busy.
       dmesg(1) may have more information after failed mount system call.
Only the cpu mount fails; the cpuacct subsystem was mounted successfully (the mount output below lists it).

shitou@aishitou:~$ mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
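Another attempt to mount the cpu subsystem still fails. mount | grep cgroup finally shows that the system is actually running cgroup v2 (cgroup2 on /sys/fs/cgroup), whereas the cpuacct I mounted by hand is a cgroup v1 subsystem, and the two may not be compatible.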
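
A more direct way to tell which cgroup version the host runs is the filesystem type of /sys/fs/cgroup, which stat can print. The helper below is a small sketch (the name cgroup_version is mine) that maps that type to a label. As far as I know, cgroup2fs means the unified v2 hierarchy and tmpfs means v1 or hybrid mode; anything else is reported as unknown.

```shell
# Map the filesystem type of /sys/fs/cgroup to a cgroup version label.
# cgroup2fs -> unified v2 hierarchy
# tmpfs     -> v1 (or hybrid), with per-controller mounts below it
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# On a live host:
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
```

Running this first would have saved the manual mount attempts above, since on a v2 host they cannot help.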

shitou@aishitou:~$ docker start eec3a45791bc
eec3a45791bc
shitou@aishitou:~$ docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED                                  STATUS          PORTS                                         NAMES
ab357e46450f   grafana/grafana:latest   "/run.sh"                5 months ago                             Up 12 minutes   0.0.0.0:3000->3000/tcp, [::]:3000->3000/tcp   grafana
553e54dcef0d   prom/prometheus:latest   "/bin/prometheus --c…"   5 months ago                             Up 12 minutes   0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp   prometheus
eec3a45791bc   google/cadvisor:latest   "/usr/bin/cadvisor -…"   5 months ago                             Up 2 seconds    0.0.0.0:8080->8080/tcp, [::]:8080->8080/tcp   cadvisor
shitou@aishitou:~$ docker ps
CONTAINER ID   IMAGE                    COMMAND                  CREATED                                  STATUS                                    PORTS                                                                   NAMES
ab357e46450f   grafana/grafana:latest   "/run.sh"                5 months ago                             Up 13 minutes                             0.0.0.0:3000->3000/tcp, [::]:3000->3000/tcp                             grafana
553e54dcef0d   prom/prometheus:latest   "/bin/prometheus --c…"   5 months ago                             Up 13 minutes                             0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp                             prometheus
eec3a45791bc   google/cadvisor:latest   "/usr/bin/cadvisor -…"   5 months ago                             Restarting (255) Less than a second ago                                                                           cadvisor
shitou@aishitou:~$ docker stop eec3a45791bc
eec3a45791bc
shitou@aishitou:~$ sudo docker logs eec3a45791bc
F0226 15:29:53.822922       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
F0226 15:29:58.289714       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
F0226 15:30:02.550374       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
F0226 15:30:06.154412       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
F0226 15:30:09.577217       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found
F0226 15:30:13.126774       1 cadvisor.go:146] Failed to create a Container Manager: mountpoint for cpu not found

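The next step is to keep working on the cgroup mount problem (in particular the failed cpu subsystem mount) and to try changing the cgroup version through boot parameters (switching from v2 to v1 for better compatibility with cadvisor).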

shitou@aishitou:~$ sudo vim /etc/fstab
shitou@aishitou:~$ sudo ls /etc/fstab
/etc/fstab
shitou@aishitou:~$ cp /etc/fstab /etc/fstab.bf
cp: cannot create regular file '/etc/fstab.bf': Permission denied
shitou@aishitou:~$ sudo cp /etc/fstab /etc/fstab.bf
shitou@aishitou:~$ sudo vim /etc/fstab
shitou@aishitou:~$ udo mount -a
Command 'udo' not found, but can be installed with:
sudo apt install udo

In /etc/fstab I added an automatic mount entry for the cpu subsystem, so that it would be mounted at boot instead of by hand each time. The line I added:
none  /sys/fs/cgroup/cpu  cgroup  defaults,cpu,cpuacct  0  0
=========================================================================================================================

shitou@aishitou:~/monitor$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda2 during curtin installation
/dev/disk/by-uuid/975f86b3-b8ea-4365-902d-5c61c0ad749c / ext4 defaults 0 1
# /boot was on /dev/sda3 during curtin installation
/dev/disk/by-uuid/069c7f52-682b-45f4-8c2e-2f2f38f9c60a /boot ext4 defaults 0 1
/dev/disk/by-uuid/8b865cd9-1d8f-48e5-bfe6-6a649909b853 none swap sw 0 0
# /boot/efi was on /dev/sda1 during curtin installation
/dev/disk/by-uuid/2D20-0B2E /boot/efi vfat defaults 0 1
none  /sys/fs/cgroup/cpu  cgroup  defaults,cpu,cpuacct  0  0
-------------------------------------------------------------------------------------------
shitou@aishitou:~/monitor$ cat /etc/fstab.bf
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/sda2 during curtin installation
/dev/disk/by-uuid/975f86b3-b8ea-4365-902d-5c61c0ad749c / ext4 defaults 0 1
# /boot was on /dev/sda3 during curtin installation
/dev/disk/by-uuid/069c7f52-682b-45f4-8c2e-2f2f38f9c60a /boot ext4 defaults 0 1
/dev/disk/by-uuid/8b865cd9-1d8f-48e5-bfe6-6a649909b853 none swap sw 0 0
# /boot/efi was on /dev/sda1 during curtin installation
/dev/disk/by-uuid/2D20-0B2E /boot/efi vfat defaults 0 1
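
Rather than comparing the two cat outputs by eye, the lines that were added since the backup can be printed directly. A minimal sketch; added_lines is my own helper name, and it does a literal whole-line comparison, so a line that merely changed its spacing also shows up as added.

```shell
# Print the lines that are present in file $2 but absent from file $1
# (fixed-string, whole-line match).
added_lines() {
    grep -Fxv -f "$1" "$2"
}

# On a live host:
#   added_lines /etc/fstab.bf /etc/fstab
```

For the two files above this should print only the none /sys/fs/cgroup/cpu ... line.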

===========================================================================================================

-------------------------------------------------------------------------------
shitou@aishitou:~$ sudo  mount -a
mount: /sys/fs/cgroup/cpu: none already mounted or mount point busy.
       dmesg(1) may have more information after failed mount system call.
mount: (hint) your fstab has been modified, but systemd still uses
       the old version; use 'systemctl daemon-reload' to reload.
shitou@aishitou:~$ mount | grep cgroup | grep cpu

cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
--------------------------------------------------------------------------------

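Running sudo mount -a applies the fstab entries immediately (no reboot needed), but it fails with mount: /sys/fs/cgroup/cpu: none already mounted or mount point busy. That means one of two things:
either the cpu subsystem has already been mounted some other way (but mount | grep cgroup | grep cpu shows that it has not, so a conflict is more likely);
or the entry conflicts with the system's current cgroup v2 setup (cgroup v2 is a single unified mount, and a controller it already manages cannot also be mounted as a separate v1 subsystem).
This is the point at which I started switching to cgroup v1.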
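
The kernel also reports which hierarchy each controller is attached to in /proc/cgroups, which helps tell the two cases above apart. The sketch below (controller_hierarchy is a hypothetical helper name) prints the hierarchy ID for one controller. According to cgroups(7), an ID of 0 means the controller is not on any v1 hierarchy (for example because it is bound to the v2 unified hierarchy), and a non-zero ID means it is mounted as v1.

```shell
# Read /proc/cgroups-formatted text on stdin and print the hierarchy ID
# (second column) of the controller named in $1.
controller_hierarchy() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# On a live host:
#   controller_hierarchy cpu     < /proc/cgroups
#   controller_hierarchy cpuacct < /proc/cgroups
```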
---------------------------------------------------------------------------------------------------



========================================================================================================
shitou@aishitou:~$ sudo nano /etc/fstab

shitou@aishitou:~$ sudo umount /sys/fs/cgroup/cpu
umount: /sys/fs/cgroup/cpu: not mounted.
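This tries to unmount /sys/fs/cgroup/cpu (detaching the directory from the cgroup cpu subsystem).
The not mounted message shows that the directory is not currently mounted, so there is nothing to unmount.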

shitou@aishitou:~$ sudo systemctl daemon-reload
Reload the systemd configuration so that it picks up the modified fstab.
shitou@aishitou:~$ sudo umount /sys/fs/cgroup/cpu
umount: /sys/fs/cgroup/cpu: not mounted.
shitou@aishitou:~$ sudo systemctl daemon-reload
shitou@aishitou:~$ sudo mount -a
mount: /sys/fs/cgroup/cpu: none already mounted or mount point busy.
       dmesg(1) may have more information after failed mount system call.

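mount -a mounts every filesystem listed in /etc/fstab that is not already mounted.
The error /sys/fs/cgroup/cpu: none already mounted or mount point busy shows that the fstab entry for /sys/fs/cgroup/cpu conflicts with the running system (still on cgroup v2) and cannot be mounted.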
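
Since the entry cannot be mounted under cgroup v2, one option is to take it back out of fstab before rebooting. I have not verified on this machine what happens at boot if it stays, so treat this as a cautious sketch. The helper name drop_cgroup_entries is mine; it only filters text and writes nothing to /etc/fstab itself.

```shell
# Copy fstab-formatted text from stdin to stdout, dropping every
# non-comment entry whose third field (filesystem type) is "cgroup".
drop_cgroup_entries() {
    awk '!($1 !~ /^#/ && $3 == "cgroup")'
}

# On a live host, write to a scratch file first and review it:
#   drop_cgroup_entries < /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```

Restoring the backup with sudo cp /etc/fstab.bf /etc/fstab would have the same effect here, as long as nothing else was changed in between.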

shitou@aishitou:~$ mount | grep cgroup | grep cpu
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)

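This lists the mount records that mention both cgroup and cpu.
The output cpuacct on /sys/fs/cgroup/cpuacct... shows that the cpuacct subsystem is mounted but the cpu subsystem still is not (which is the core reason for the cadvisor error).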
---------------------------------------------------------------------------------------------------
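cpuacct: a cgroup (control group) subsystem used for CPU usage accounting, i.e. tracking the CPU time (user mode, kernel mode, and so on) consumed by a process or group of processes.
on /sys/fs/cgroup/cpuacct: the cpuacct subsystem is mounted at /sys/fs/cgroup/cpuacct. Through this directory you can configure and read the subsystem's parameters (such as the CPU usage statistics of a process group).
type cgroup: this is a filesystem of type cgroup (cgroups are a resource-management mechanism exposed through a special filesystem).
(rw,relatime,cpuacct): the mount options, where:
rw means the filesystem is mounted read-write;
relatime means a file's access time is only updated when it is older than the last modification or change time (a performance optimization);
cpuacct identifies the cgroup subsystem attached to this mount.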
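
The same breakdown can be done mechanically. The sketch below splits one line of mount output into labeled fields; describe_mount_line is a name I made up, and it assumes the usual "SOURCE on TARGET type FSTYPE (OPTIONS)" layout.

```shell
# Split one line of `mount` output read from stdin into labeled fields.
describe_mount_line() {
    sed -E 's/^(.+) on (.+) type ([^ ]+) \((.*)\)$/source=\1 target=\2 fstype=\3 options=\4/'
}

# On a live host:
#   mount | grep cgroup | describe_mount_line
# findmnt can print similar columns directly:
#   findmnt -t cgroup,cgroup2 -o TARGET,FSTYPE,OPTIONS
```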
---------------------------------------------------------------------------------------------
shitou@aishitou:~$ mount | grep cgroup2 
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
Check whether the system is using cgroup v2 (the newer cgroup mechanism).
The output cgroup2 on /sys/fs/cgroup... confirms that it is, and cadvisor is poorly compatible with it.

shitou@aishitou:~$ ls /sys/fs/cgroup/cgroup.controllers
/sys/fs/cgroup/cgroup.controllers
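Look for the cgroup v2 controllers file (cgroup.controllers is a signature file of cgroup v2 and lists the available resource controllers).
The file exists, which indicates that cgroup v2 is active.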
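
ls only proves that the file exists; its contents list the controllers that cgroup v2 manages, as a single space-separated line. The sketch below checks whether a given controller is in such a list. has_controller is a hypothetical helper name, and the padding with spaces keeps cpu from matching inside cpuset.

```shell
# Succeed if controller $1 appears in the space-separated list $2
# (the format of /sys/fs/cgroup/cgroup.controllers on a cgroup v2 host).
has_controller() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# On a live host:
#   has_controller cpu "$(cat /sys/fs/cgroup/cgroup.controllers)" && echo "cpu is managed by cgroup v2"
```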
-------------------------------------------------------------------------------------

shitou@aishitou:~$ sudo nano /etc/default/grub


shitou@aishitou:~$ sudo cp /etc/default/grub /etc/default/grub.bf

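shitou@aishitou:~$ sudo nano /etc/default/grub
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
I added the line above. Its purpose is to disable cgroup v2 (cadvisor is more compatible with v1).
This raises another question: is there something wrong with my yaml file, and is it set up sensibly?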
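
On the yaml question: the compose file is not shown in these notes, so I can only sketch what I would compare it against. As far as I know, the google/cadvisor image on Docker Hub is no longer updated and newer releases are published as gcr.io/cadvisor/cadvisor, and the upstream README runs the container with the bind mounts below. The helper name cadvisor_run_cmd is my own, and the image reference is left to the caller because I have not tested any particular tag here. The function only prints the command; it does not run it.

```shell
# Print (do not execute) a `docker run` command for cAdvisor with the bind
# mounts listed in the upstream README. $1 is the image reference to use.
cadvisor_run_cmd() {
    printf '%s\n' "docker run --detach --name=cadvisor --publish=8080:8080 \
--volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro \
--privileged --device=/dev/kmsg $1"
}

# Example (the tag is a placeholder to fill in, not a recommendation):
#   cadvisor_run_cmd gcr.io/cadvisor/cadvisor:<tag>
```

If the compose file already has equivalent volumes, the image version is the more likely thing to look at before changing the host's cgroup mode.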
-----------------------------------------------------------------------------------
shitou@aishitou:~$ sudo update-grub
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.8.0-71-generic
Found initrd image: /boot/initrd.img-6.8.0-71-generic
Found linux image: /boot/vmlinuz-6.8.0-64-generic
Found initrd image: /boot/initrd.img-6.8.0-64-generic
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
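update-grub regenerates the GRUB boot menu from the latest /etc/default/grub (so that the kernel parameter can take effect).
The output shows that the installed kernels were found and the configuration was updated, but the new parameter only takes effect after a reboot.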
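
After the reboot it is worth confirming that the parameter actually reached the kernel. A minimal sketch, with unified_hierarchy_param as a helper name of my own, that extracts the value from a kernel command line string:

```shell
# Print the value of systemd.unified_cgroup_hierarchy found in the kernel
# command line string $1; print nothing if the parameter is absent.
unified_hierarchy_param() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^systemd\.unified_cgroup_hierarchy=//p'
}

# On a live host, after rebooting:
#   unified_hierarchy_param "$(cat /proc/cmdline)"
#   stat -fc %T /sys/fs/cgroup
```

A result of 0 together with a tmpfs filesystem type on /sys/fs/cgroup would suggest the switch to cgroup v1 took effect.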