Dao's BLOG

NVidia-smi shows API mismatch

By: leizhou | Post date: April 28, 2025 | Comments: No Comments

Posted in categories: Computer Tips, Work related

Sometimes, after updating the NVIDIA driver and CUDA on a Rocky Linux system, running nvidia-smi reports a kernel driver version mismatch. If you run

dmesg

it will show:

NVRM: API mismatch: the client has the version aaa.bbb, but
NVRM: this kernel module has the version ccc.ddd. Please
NVRM: make sure that this kernel module and all NVIDIA driver
NVRM: components have the same version.

Here aaa.bbb is not the same as ccc.ddd.

This happens when the corresponding NVIDIA driver was not properly registered with dkms.

Other suggested solutions are to reboot the server, reinstall the drivers, recreate the initramfs, or rmmod the corresponding nvidia modules. These methods sometimes work. When none of them works, you can try

dkms install -m nvidia -v 570.144

replacing 570.144 with the version of your most recently installed NVIDIA driver. Then reboot the server. This should work.
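
Before running dkms, it can help to confirm which two versions actually disagree. The following is a minimal sketch, not part of the original fix: the function name nvrm_mismatch_versions is made up for illustration, and it assumes the kernel log uses exactly the NVRM wording shown above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print the two
# versions named in an NVRM "API mismatch" message.
nvrm_mismatch_versions() {
    awk '
        /NVRM: API mismatch: the client has the version/ {
            for (i = 1; i < NF; i++)
                if ($i == "version") { client = $(i + 1); sub(/,$/, "", client) }
        }
        /NVRM: this kernel module has the version/ {
            for (i = 1; i < NF; i++)
                if ($i == "version") { module = $(i + 1); sub(/\.$/, "", module) }
        }
        END {
            # Print nothing unless both halves of the message were seen.
            if (client != "" && module != "")
                print "client=" client, "module=" module
        }
    '
}
```

A typical use would be "dmesg | nvrm_mismatch_versions". Running "dkms status" afterwards shows which nvidia versions dkms has registered for each kernel, which tells you the version to pass to dkms install.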

When dnf update kernel failed to generate initramfs

By: leizhou | Post date: April 24, 2025 | Comments: No Comments

Posted in categories: Computer Tips, Work related

This sometimes happens when the automatic NVIDIA kernel module installation fails.

The workaround is:

Boot into an older kernel.
Run ls /boot/ and find the vmlinuz file that has no corresponding initramfs. For example, it could be vmlinuz-6.14.3-300.fc42.x86_64. Your kernel version is then 6.14.3-300.fc42.x86_64, in the form major.mid.minor-nnn.osver.arch. Let's call it $KVER.
Check that /lib/modules/$KVER/ exists by running ls on it. If it exists, run "depmod -v $KVER". This creates modules.dep in the folder /lib/modules/$KVER/.
Run "dracut --force --kver $KVER". If that does not work, use a lower version of gcc, like "CC=gcc-14 dracut --force --kver $KVER".
Reboot into the newest kernel. Usually it will not contain the NVIDIA driver kernel module.
Run the NVIDIA driver installer downloaded from the NVIDIA driver site, like NVIDIA-Linux-x86_64-570.144.run. This installs the NVIDIA driver kernel module into the kernel. If that does not work, use a lower version of gcc, like "CC=gcc-14 ./NVIDIA-Linux-x86_64-570.144.run".
Now you have a good kernel with the proper graphics driver kernel module.
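
The "find the vmlinuz without an initramfs" step above can be scripted. This is only a sketch under assumptions: the function names kernels_missing_initramfs and rebuild_initramfs are invented here, and the initramfs files are assumed to follow the initramfs-$KVER.img naming used on Fedora and Rocky.

```shell
# Hypothetical helper: list kernel versions that have a vmlinuz-* image
# but no matching initramfs-*.img. The directory defaults to /boot;
# passing another directory is only useful for dry runs.
kernels_missing_initramfs() {
    bootdir="${1:-/boot}"
    for img in "$bootdir"/vmlinuz-*; do
        [ -e "$img" ] || continue              # no vmlinuz files at all
        kver="${img##*/vmlinuz-}"              # strip path and "vmlinuz-" prefix
        [ -e "$bootdir/initramfs-$kver.img" ] || printf '%s\n' "$kver"
    done
}

# Hypothetical helper: rebuild module dependencies and the initramfs
# for one kernel version. Needs root; run it from the older kernel.
rebuild_initramfs() {
    depmod "$1" && dracut --force --kver "$1"
}
```

Booted into the older kernel, something like "for k in $(kernels_missing_initramfs); do rebuild_initramfs "$k"; done" would then cover the depmod and dracut steps for every affected kernel.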

Setting firewalld to allow nodes on intranet to access internet

By: leizhou | Post date: April 4, 2025 | Comments: No Comments

Posted in categories: Computer Tips, Work related

# Put the internet-facing and intranet-facing interfaces into their zones
firewall-cmd --zone=public --add-interface=<internet interface> --permanent
firewall-cmd --zone=internal --add-interface=<intranet interface as gateway> --permanent
# --set-default-zone is already a runtime and permanent change, so --permanent is not needed
firewall-cmd --set-default-zone=public
firewall-cmd --reload
firewall-cmd --get-default-zone
# Create a policy for traffic going from the internal zone to the public zone
firewall-cmd --new-policy internal-public --permanent
firewall-cmd --reload
firewall-cmd --policy internal-public --add-ingress-zone=internal --permanent
firewall-cmd --policy internal-public --add-egress-zone=public --permanent
firewall-cmd --policy internal-public --set-target=ACCEPT --permanent
firewall-cmd --reload
# Show the resulting policy
firewall-cmd --info-policy internal-public

When upgrading OS, dnf and rpm fails on SHA1 packages

By: leizhou | Post date: April 2, 2025 | Comments: No Comments

Posted in categories: Computer Tips, Work related

First, use

rpm -q gpg-pubkey --qf '%{NAME}-%{VERSION}-%{RELEASE}\t%{SUMMARY}\n'

to identify keys from obsolete repositories, then use

rpm -e gpg-pubkey-xxxxxxxx-yyyyyyyy

to remove the keys that were imported from the SHA1 era.

Then identify the offending packages, so that they can be removed, with

rpm -q --nosignature --querybynumber xxxx

where xxxx is the number given in the stderr messages from

rpm -qa >/dev/null
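
The key-removal step above can be previewed before anything is erased. This is a rough sketch, not the author's procedure: the function name obsolete_key_removals is made up, and the pattern you pass in (part of a repository name, for instance) is your own judgment about which keys are obsolete. It only prints rpm -e commands; it does not run them.

```shell
# Hypothetical helper: read "<key package><TAB><summary>" lines on stdin,
# as produced by the rpm -q gpg-pubkey --qf query, and print (not run)
# an "rpm -e" command for every key whose summary contains the given text.
obsolete_key_removals() {
    pattern="$1"
    awk -F '\t' -v pat="$pattern" '
        # Case-insensitive plain substring match on the summary column.
        index(tolower($2), tolower(pat)) { print "rpm -e " $1 }
    '
}
```

For example, piping the output of the gpg-pubkey query into "obsolete_key_removals <text>" lists the removal commands for matching keys, which can be reviewed and then run by hand.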

Grub force install to boot sector

By: leizhou | Post date: April 2, 2025 | Comments: No Comments

Posted in categories: Computer Tips, Uncategorized, Work related

When upgrading the OS or replacing drives, the UEFI boot loader may not be installed automatically on the boot drive, so the boot menu cannot be updated to include the new kernels.

When you run grub2-install <your boot device>, it fails with the message:

Installing for x86_64-efi platform.
grub2-install: error: This utility should not be used for EFI platforms because it does not support UEFI Secure Boot. If you really wish to proceed, invoke the --force option.
Make sure Secure Boot is disabled before proceeding.

Do not worry. Just force it,

grub2-install --force <your boot device>

then update the grub menu,

grub2-mkconfig -o /boot/grub2/grub.cfg

it will work.

Linux ssh log in super slow

By: leizhou | Post date: February 5, 2025 | Comments: No Comments

Posted in categories: Computer Tips, Work related

It turned out that systemd-logind had malfunctioned. Restarting it with

systemctl restart systemd-logind

resolved the problem.

tcsh script behavior change

By: leizhou | Post date: January 31, 2025 | Comments: No Comments

Posted in categories: Computer Tips, Work related

Recently I noticed a tcsh behavior change.

On CentOS 7, Rocky 8, Ubuntu 20.04, Fedora 41, and Linux Mint, if you have a string variable in tcsh,

set a="mystring"

and attempt to get its path, treating it as a filename:

set b="$a:h"

it will return $a itself.

Note that the expected behavior when a="mypath/mystring" is that

set b="$a:h"

sets $b to mypath.

However, on Rocky 9, $b will be the empty string "" when $a does not contain any slash.

This behavior caused some of our tcsh scripts to malfunction.

The reason is still under investigation.

Windows 11 cannot install Asian Keyboard

By: leizhou | Post date: December 13, 2024 | Comments: No Comments

Posted in categories: Computer Tips, Work related

When I attempted to install a Chinese/Japanese/Korean input method on my Windows 11 box, installing “Basic typing” failed after about 30 seconds of attempted download, with the error

“Sorry, we’re having trouble installing this feature. You can try again later. Error code 0x0”.

All other components, such as Handwriting, Text-to-speech, and Speech recognition, also fail, and the installed Asian keyboard does not work, reporting that the feature is not ready.

I tried the tricks in this thread: https://answers.microsoft.com/en-us/windows/forum/all/windows-11-unable-to-download-language-packs/b78b04da-2c75-45d8-a828-f553441b220f but none of them worked.

The workaround I found is to download

26100.1.240331-1435.ge_release_amd64fre_CLIENT_LOF_PACKAGES_OEM.iso

from https://files.rg-adguard.net/file/025cfc5d-f5fa-7d00-246e-76c04a40e210

and extract the corresponding language pack .cab files like

Microsoft-Windows-Client-Language-Pack_x64_zh-cn.cab

Microsoft-Windows-LanguageFeatures-Basic-zh-cn-Package~31bf3856ad364e35~amd64~~.cab

Microsoft-Windows-LanguageFeatures-Handwriting-zh-cn-Package~31bf3856ad364e35~amd64~~.cab

Microsoft-Windows-LanguageFeatures-Speech-zh-cn-Package~31bf3856ad364e35~amd64~~.cab

Microsoft-Windows-LanguageFeatures-TextToSpeech-zh-cn-Package~31bf3856ad364e35~amd64~~.cab

and install them one by one in PowerShell with Administrator privileges, like:

Add-WindowsPackage -Online -PackagePath ".\Microsoft-Windows-LanguageFeatures-Basic-zh-cn-Package~31bf3856ad364e35~amd64~~.cab"

After all of this, “Basic typing” is still shown as not available, but the input method works.

When dnf/yum update stuck on cleaning up…

By: leizhou | Post date: October 21, 2024 | Comments: No Comments

Posted in categories: Uncategorized

Sometimes when you run dnf/yum update, it may hang for hours on the last step, cleaning up packages, if you have a very large data drive. This may be caused by an install script that mistakenly attempts to scan through millions of files on a data drive that is not mounted in a regular location. If this is the case, you can do the following:

Open another terminal and use "top" to find out which process keeps working; something like texlua will show up at the top.

Then run "lsof | grep <process_name>" to find out which drive this process is scanning through.

When you find it, for example "/data/home", run "umount -l <volume_name>" (here "umount -l /data/home"), wait 10 seconds, then run "mount /data/home" to remount it. The process that was scanning the drive will then think there are no more files and quit.

This allows dnf/yum to finish without any error.
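
When the lsof output is long, it can be summarised to see which directory tree the process is walking. The sketch below is an illustration only: the function name open_paths_by_prefix is made up, and it assumes the path is the last column of the lsof output and contains no spaces.

```shell
# Hypothetical helper: read lsof output on stdin and count open paths,
# grouped by their first two path components (for example /data/home).
open_paths_by_prefix() {
    awk '
        NR > 1 && $NF ~ /^\// {
            n = split($NF, part, "/")          # part[1] is empty for absolute paths
            if (n >= 3)
                prefix = "/" part[2] "/" part[3]
            else
                prefix = $NF                   # "/" or a single-level path
            count[prefix]++
        }
        END { for (p in count) print count[p], p }
    ' | sort -rn                               # busiest prefix first
}
```

With the PID taken from top, "lsof -p <pid> | open_paths_by_prefix" puts the most heavily used tree on the first line, which is the candidate for the umount -l and mount steps above.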

Restart docker failed on docker0 network interface

By: leizhou | Post date: October 8, 2024 | Comments: No Comments

Posted in categories: Computer Tips, Work related

Sometimes when you attempt to restart docker.service, it fails because it cannot restart the docker0 network interface.

In this case, you can simply run

ifdown docker0

Then you can start docker.service again.

My work, my view, and my imagination
