Are you exposing any ports on your home server?

ffhein@lemmy.world · 4 days ago

I just wanted to test if it was viable to run larger MoE LLMs on CPU, e.g. Qwen3-next-80B-A3B… Even if I got acceptable generation speeds I’d probably get bored with it after a few hours, as with other local models. Even at €700 it would have been pretty low value for money, since my current RAM is enough for everything else I use the computer for. On the positive side, I can put that money towards a Steam Frame instead.

ffhein@lemmy.world · 6 days ago

… I was thinking about buying a 96GB DDR5 kit from the local computer store a few weeks ago, but wasn’t sure it was actually worth €700. Checked again now and the exact same product costs €1500. I guess that settles it, 32GB will have to be enough for the next couple of years then.

ffhein@lemmy.world · 2 months ago

Still are, but I guess a lot of people don’t know much about them

ffhein@lemmy.world · 2 months ago

So far that has never happened because I’m not using that much storage :) But I shut it down when I need to turn off the mains electricity, and for powering it on afterwards the fake wall can be lifted off. It’s just the area underneath the desk, so the panel is probably smaller than it sounds, and it hangs on some hooks so it’s fairly easy to remove if you know what you’re doing. Painted in the same colour as the wall, and with some random junk on the floor in front, it blends in quite well though. I think the risk of burglary is fairly low, so it’s primarily to soothe my own paranoia.

ffhein@lemmy.world · 2 months ago

I mounted mine on the wall under a desk in a room with no other electronics, and then put up a fake wall in front of the server. It can draw in air from the sides, and exhaust upwards behind the desk. But the only real solution is offsite backup, which will also protect against fire and other disasters.

ffhein@lemmy.world · edit-2 4 months ago

Can’t help but think about this old XKCD from 2010.

ffhein@lemmy.world · 5 months ago

Products targeted towards businesses have always been unreasonably more expensive than those targeted towards consumers. It sucks for us AI hobbyists that Nvidia are stingy with VRAM on consumer cards, but I don’t find it surprising.

Personally I only have a single RTX 3090, but I know a lot of people online who are stacking multiple consumer cards to run AI. Buying used 3090s and putting them in a mining rig is probably still the best value for money if you need a large amount of VRAM.

How much VRAM do you actually need btw?

ffhein@lemmy.world · 10 months ago

Intel NUC running Linux. Not the cheapest solution but can play anything and I have full control over it. At first I tried to find some kind of programmable remote but now we have a wireless keyboard with built-in touchpad.

Biggest downside is that the hardware quality is kind of questionable and the first two broke after 3 years + a few months, so we’re on our third now.

ffhein@lemmy.world · 10 months ago

This is my wireguard docker setup:

version: "3.6"
services:
  wireguard:
    image: linuxserver/wireguard
    container_name: wireguard
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    environment:
      - PUID=116
      - PGID=122
      - TZ=Europe/Stockholm
      - ALLOWEDIPS=192.168.1.0/24
    volumes:
      - /data/torrent/wireguard/config:/config
      - /lib/modules:/lib/modules
    ports:
      - 192.168.1.111:8122:8122  # Deluge webui
      - 192.168.1.111:9127:9127  # jackett webui
      - 192.168.1.111:9666:9666  # prowlarr webui
      - 51820:51820/udp           # wireguard
      - 192.168.1.111:58426:58426  # Deluge RPC
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
      - net.ipv6.conf.all.disable_ipv6=1
      - net.ipv6.conf.default.disable_ipv6=1
    restart: unless-stopped

Can reach the webuis from LAN, no other network configuration was necessary. 192.168.1.111 is the server’s LAN address. The other services are configured very similarly to your qbittorrent, and don’t expose any ports. Can’t promise it’s 100% correct but it’s working for me.

ffhein@lemmy.world · 1 year ago

Add “site:reddit.com” to your google query.

ffhein@lemmy.world · 1 year ago

Sad thing is that search engines have got so bad, and usually return so much garbage blog spam that searching directly on reddit is more likely to give useful results. I hope a similar amount of knowledge will build up on Lemmy over time.

ffhein@lemmy.world · 1 year ago

Assuming they already own a PC, if someone buys two 3090s for it they’ll probably also have to upgrade their PSU, so that might be worth including in the budget. But it’s definitely a relatively low cost way to get more VRAM, and there are people who run 3 or 4 RTX 3090s too.

ffhein@lemmy.world · edit-2 1 year ago

For LLMs it entirely depends on what size models you want to use and how fast you want them to run. Since there are diminishing returns to increasing model sizes, i.e. a 14B model isn’t twice as good as a 7B model, the best bang for the buck will be achieved with the smallest model you think has acceptable quality. And if you think generation speeds of around 1 token/second are acceptable, you’ll probably get more value for money using partial offloading.

If your answer is “I don’t know what models I want to run” then a second-hand RTX3090 is probably your best bet. If you want to run larger models, building a rig with multiple (used) RTX3090 is probably still the cheapest way to do it.

ffhein@lemmy.world · 1 year ago

Is max tokens different from context size?

Might be worth keeping in mind that the generated tokens go into the context, so if you set it to 1k with 4k context you only get 3k left for character card and chat history. I think I usually have it set to 400 tokens or something, and use TGW’s continue button in case a long response gets cut off.
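To make the arithmetic concrete, here is a minimal sketch in Python. The helper name `prompt_budget` is made up for illustration, and it assumes the backend reserves the full max-tokens value out of the context window, which may not hold for every frontend.

```python
def prompt_budget(context_size: int, max_new_tokens: int) -> int:
    """Tokens left for character card + chat history (hypothetical helper).

    Assumes the generated tokens share the same context window as the prompt.
    """
    if max_new_tokens >= context_size:
        raise ValueError("max_new_tokens must be smaller than context_size")
    return context_size - max_new_tokens


# 1k max tokens in a 4k context leaves 3k for the prompt.
print(prompt_budget(4096, 1024))  # 3072
# A smaller limit such as 400 leaves noticeably more room.
print(prompt_budget(4096, 400))   # 3696
```

A lower max tokens setting plus the continue button trades a bit of convenience for more room for chat history.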

ffhein@lemmy.world · 1 year ago

llama.cpp uses the gpu if you compile it with gpu support and you tell it to use the gpu…

Never used koboldcpp, so I don’t know why it would give you shorter responses if both the model and the prompt are the same (also assuming you’ve generated multiple times, and it’s always the same). If you don’t want to use Discord to visit the official koboldcpp server, you might get more answers from a more LLM-focused community such as !localllama@sh.itjust.works

ffhein@lemmy.world · 1 year ago

I use https://www.criticker.com/ for movies because it has a really nice recommendation algorithm based on your personal scores. They also have a section for rating games but I haven’t tried that part

ffhein@lemmy.world · 1 year ago

A static website and Immich

ffhein@lemmy.world · 2 years ago

There are tons of options for running LLMs locally nowadays, though none come close to GPT4 or Claude 2 etc. One place to start is /c/localllama@sh.itjust.works

ffhein@lemmy.world · 2 years ago

Static html+css page generated with this: https://github.com/maximtrp/tab

ffhein@lemmy.world · 2 years ago

Do you mean that you want to build the docker image on one computer, export it to a different computer where it’s going to run, and there shouldn’t be any traces of the build process on the first computer? Perhaps it’s possible with the --output option… Otherwise you could write a small script which combines the commands for docker build, export to file, delete local image, and clean up the system.
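A rough sketch of what such a script could look like, written in Python around the regular docker CLI. The image tag, build directory and tar file name are placeholders, and I'm assuming `docker save` is the kind of export you want (it writes the image to a tar file; `docker export` is for containers). It defaults to a dry run that only prints the commands.

```python
import subprocess


def build_export_commands(tag: str, context_dir: str, tar_path: str) -> list[list[str]]:
    """Return the docker commands for build -> save -> remove -> prune.

    All arguments are placeholders chosen by the caller.
    """
    return [
        ["docker", "build", "-t", tag, context_dir],  # build the image
        ["docker", "save", "-o", tar_path, tag],      # write the image to a tar file
        ["docker", "rmi", tag],                       # delete the local image
        ["docker", "builder", "prune", "-f"],         # clear unused build cache
    ]


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the sequence as soon as one step fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values; set dry_run=False to actually run docker.
    run_all(build_export_commands("myapp:latest", ".", "myapp.tar"), dry_run=True)
```

On the other computer the tar file can then be loaded with `docker load -i myapp.tar`. Note that `docker builder prune` without extra flags only removes cache that is no longer in use, so it may not erase every trace.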

ffhein@lemmy.world · 2 years ago

Are you exposing any ports on your home server?
