Llama.cpp on Fedora 40 with CUDA support

Not long ago, I bought a laptop with an Nvidia card. Partly because I wanted to check out some newer, fancy games (though in reality, when I finally do have some free time, I usually end up playing DOS games anyway). The second, more important reason for choosing Nvidia instead of going down the more trodden AMD/Radeon path (my understanding is that Nvidia is still more of a hassle on Fedora) was that I wanted to give CUDA a try for educational purposes.

I have to say that using a laptop that combines an integrated Intel GPU with an Nvidia GPU can be pretty daunting sometimes, especially when HDMI comes into play (because I use an external monitor when at home). But so far, it has always been possible to sort things out somehow, and I'm using the Nvidia drivers from RPM Fusion without any bigger issues.

When I bought this laptop, I put Fedora 39 on it and managed to get CUDA working while experimenting with some LLMs using llama.cpp. However, when I upgraded to Fedora 40 the other day, I had to remove some previously installed stuff to get the Nvidia drivers working again. And now that I wanted to fiddle with a local LLM again, I had to go through the whole process once more, so I'm writing it down here in case I need the steps in the future (it may be futile, though, as everything in this area moves very fast).

Anyway, first I installed the CUDA development packages. They had to come from Nvidia's repo intended for Fedora 39 (at the time of writing, there was nothing available for Fedora 40 yet):

# dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora39/x86_64/cuda-fedora39.repo

# dnf install cuda
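A quick sanity check after this step can't hurt (the path below is an assumption based on the default package layout, where the toolkit lands under /usr/local/cuda):

```shell
# Sanity check after installing the cuda package: the toolkit is assumed
# to land under /usr/local/cuda (the default RPM layout).
CUDA_HOME=/usr/local/cuda
if [ -x "$CUDA_HOME/bin/nvcc" ]; then
    "$CUDA_HOME/bin/nvcc" --version
else
    echo "nvcc not found under $CUDA_HOME - check the cuda install"
fi
```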

I already had the llama.cpp repository checked out locally, so I just updated it to the latest version. However, when I tried to build the binaries with CUDA support, the build complained about gcc being too new: Fedora 40 ships version 14, and llama.cpp said it needed version 12 at most. Now, it's possible to build an older version of gcc, but it seemed easier to simply use one from an older Fedora release. At the time of writing, the Fedora 37 repositories, which contained gcc version 12, were still available, so I just installed the needed packages into a temporary location using installroot:

# dnf install --releasever=37 --installroot=/tmp/just_a_temp_dir gcc gcc-c++
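To double-check that the installroot really contains the older compiler, something like this should do (the path assumes the installroot directory used above; inside it, /bin is just a symlink to usr/bin):

```shell
# Verify the Fedora 37 installroot contains gcc 12; the path assumes the
# installroot directory used above.
SYSROOT=/tmp/just_a_temp_dir
if [ -x "$SYSROOT/bin/gcc" ]; then
    "$SYSROOT/bin/gcc" --version | head -n 1   # expect a 12.x version
else
    echo "gcc not found under $SYSROOT/bin"
fi
```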

Once that was done, I went to where the CUDA binaries were located and created symlinks in this directory so that the older version of gcc would be used:

  # cd /usr/local/cuda/bin/
  # ln -s /tmp/just_a_temp_dir/bin/gcc gcc
  # ln -s /tmp/just_a_temp_dir/bin/c++ c++
  # ln -s /tmp/just_a_temp_dir/bin/g++ g++
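The same three links can also be written as a loop with `ln -sfn`, which additionally replaces any stale links left over from a previous attempt (directories assumed from the steps above):

```shell
# Recreate the compiler symlinks idempotently; -sfn overwrites any
# existing links. Paths assume the installroot and CUDA dirs from above.
TOOLROOT=/tmp/just_a_temp_dir/bin
CUDA_BIN=/usr/local/cuda/bin
if [ -d "$CUDA_BIN" ]; then
    for tool in gcc c++ g++; do
        ln -sfn "$TOOLROOT/$tool" "$CUDA_BIN/$tool"
    done
fi
```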

Having done this, I was finally able to build llama.cpp with CUDA support:

cd llama.cpp/
make -B GGML_CUDA=1
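As an aside, nvcc has a documented `-ccbin` flag for selecting the host compiler explicitly, so the symlink trick might be avoidable by overriding the compiler invocation instead. This is only a sketch: whether a `make` command-line override of `NVCC` actually takes effect depends on your llama.cpp version's Makefile.

```shell
# Sketch of an alternative to the symlinks: point nvcc at the older host
# compiler via -ccbin. Whether the Makefile honors an NVCC override
# depends on the llama.cpp version, so treat this as an assumption.
HOST_GXX=/tmp/just_a_temp_dir/bin/g++
if [ -x "$HOST_GXX" ] && [ -f Makefile ]; then
    make -B GGML_CUDA=1 NVCC="nvcc -ccbin $HOST_GXX"
fi
```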

This is what simple use from the CLI can look like:
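For example (the binary name, model file, and flags here are my assumptions based on current llama.cpp builds; `-ngl 99` offloads all model layers to the GPU):

```shell
# Hypothetical invocation: run a one-off prompt against a local GGUF
# model, offloading all layers to the GPU with -ngl 99. The model path
# is a placeholder. Guarded so it only runs where the binary exists.
MODEL="$HOME/models/some-model.Q4_K_M.gguf"
if [ -x ./llama-cli ]; then
    ./llama-cli -m "$MODEL" -ngl 99 \
        -p "Can you summarize this text in bullets?"
fi
```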

But of course, it's more useful to tap into the llama.cpp API using some other tool, be it the web UI that comes with llama-server, VS Code, or something else entirely. In any case, I went ahead and asked the local Llama model to summarize this very post:

Can you summarize this text in bullets?

and this is what I got:

I’d be happy to help you summarize the text in bullets!

Here’s a summary of the text:

• The user bought an Nvidia laptop for educational purposes (CUDA) and wanted to try it out on Fedora 39.
• They installed CUDA development packages from Nvidia’s repo intended for Fedora 39, as there was no version available for Fedora 40 at that time.
• To build llama.cpp with CUDA support, the user needed a specific older version of GCC (12 or less), which is not compatible with Fedora 40’s default GCC 14.
• The solution was to install an older version of GCC from Fedora 37 in a temporary directory using installroot and create symlinks to use this older version instead of the system’s default GCC.

Let me know if you have any further requests!

Well, there’s still some room for improvement, but it’s pretty neat for a local model running on my laptop.