• 0 Posts
  • 59 Comments
Joined 1 year ago
Cake day: June 1st, 2023




  • Never ask a man his pay, a woman her weight, or a data hoarder the contents of their stash.

    Jk. Mostly.

    I have a similar-ish setup to @Davel23, and I have a couple of cool use cases.

    • I seed the last 5 Arch and openSUSE (a few different flavors) ISOs at all times

    • I run an ArchiveBot for archive.org

    • I scan nontrivial mail (the paper kind) and store it in docspell for later OCR searches, tax purposes etc.

    • I help keep Sci-Hub healthy

    • I host several services for de-googling, including Nextcloud, Blocky, Immich, and SearXNG

    • I run Navidrome, which has mostly (and hopefully soon will completely) replaced Spotify for my family.

    • I run Plex (hoping to move to Jellyfin sometime, but there’s inertial resistance to that), which has completely replaced Disney streaming, Netflix streaming, etc. for me and my extended family.

    • I host backups for my family and close friends with an S3 and WebDAV backup target

    I run 4x14TB, 2x8TB, 2x4TB, all from serverpartsdeals, in a ZFS RAID10 with two 1TB cache drives, so half of the spinning rust is usable at ~35TB, and right now I’m at 62% utilization. I usually expand at about 85%.
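
    For anyone checking the math, here is a rough sketch of where "half of the spinning rust" comes from. It assumes the drives are paired into two-way mirrors of equal size (my guess at the layout, since "RAID10" in ZFS means striped mirrors) and it ignores ZFS metadata and reserved space, so the real figure lands a bit lower. The function name is made up for illustration.

```python
def mirrored_usable_tb(drive_sizes_tb):
    """Usable capacity of striped two-way mirrors, before ZFS overhead.

    Assumes drives are paired smallest-with-smallest; each mirror
    contributes the size of its smaller member.
    """
    sizes = sorted(drive_sizes_tb)
    pairs = zip(sizes[0::2], sizes[1::2])
    return sum(min(a, b) for a, b in pairs)


# Drive list from the comment above: 4x14TB, 2x8TB, 2x4TB.
drives = [14, 14, 14, 14, 8, 8, 4, 4]

raw_tb = sum(drives)                      # 80 TB of raw disk
usable_tb = mirrored_usable_tb(drives)    # 40 TB after mirroring
usable_tib = usable_tb * 1e12 / 2**40     # same figure in TiB, as most tools report it

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB (~{usable_tib:.1f} TiB)")
```

    The gap between decimal TB and binary TiB, plus filesystem overhead, is plausibly why the pool reports closer to ~35TB than a flat 40.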


  • My favorite city builder in decades. A few notes.

    Pros:

    • Easy mode is relaxing and quite easy.
    • Medium mode is a fun challenge at first, eventually becoming fairly chill as you advance in skill and confidence.
    • Hard mode is always fairly hard, especially on harder maps.
    • There are many resources to manage, but none that feel burdensome.
    • The game is extremely thematic; it feels alive with charm.
    • Graphics are excellent, though sometimes graphical glitches can still be encountered.
    • The water. It’s so hard to explain to someone who hasn’t encountered this system before, but water is life in this game, and it’s both beautiful graphically, and extremely well simulated by physics. Learning to control the water, and see the shortest paths to end water scarcity with beaver engineering is an amazingly fun and unique aspect of the game.
    • Mods are well supported and the community is vibrant.

    Cons:

    • Not a ton of content. They’ve been very good about adding new mechanics (badwater, extract, etc) but there’s still just 2 races of beaver and a dozen or so maps.
    • No directed experience. In similar games I’ve enjoyed a campaign, challenge maps/scenarios, weekly challenges, a deeper progression system, just… Something to optionally set your goals. There’s nothing of the sort in the vanilla game. It’s fully open-ended and there’s only one unlock outside of your progress through the resource tree in a map.

    All in all, I highly recommend it, especially at the modest asking price. If you love city builders, charming and beautiful art, thematic settings, dynamic challenge, and solution engineering, this is a fantastic game for you.

    Other games I’ve enjoyed that scratch similar itches:

    • KSP
    • Cities: Skylines (but Timberborn has been far more compelling)
    • Factorio
    • Mindustry
    • Planet Zoo (Timberborn has less of a directed experience, but is otherwise completely superior)
    • Gnomoria
    • Banished
    • Tropico series (though I view this as more casual)

    Get it and have fun is my recommendation.








  • Yeah, you should be scrubbing weekly or monthly, depending on how often you are using the data. A scrub reads every allocated block in the pool, verifies it against its checksum, and proactively repairs any errors it finds from the redundant copies. It’s preventative maintenance.
    https://manpages.ubuntu.com/manpages/jammy/man8/zpool-scrub.8.html

    Set that up in a cron job and check zpool status periodically.
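
    If you want the periodic status check automated too, something like this could run from the same cron setup. It is only a sketch: it relies on `zpool status -x` printing "all pools are healthy" when nothing is wrong, and the function names are my own. What you do on failure (email, push notification) is up to you.

```python
import subprocess


def all_pools_healthy(status_output: str) -> bool:
    """Return True if `zpool status -x` reported no problems."""
    return status_output.strip() == "all pools are healthy"


def check_pools() -> bool:
    """Run `zpool status -x` and report whether every pool is healthy.

    Needs the zpool binary on PATH; call this from a cron job or timer.
    """
    result = subprocess.run(
        ["zpool", "status", "-x"],
        capture_output=True,
        text=True,
        check=False,
    )
    return all_pools_healthy(result.stdout)


# Example use in a script run by cron:
#     if not check_pools():
#         ...send yourself an alert...
```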

    No dedup is good. LZ4 compression is good. RAM to disk ratio is generous.

    Check your disks’ sector size and the vdev ashift. Modern multi-TB HDDs generally have a 4K physical sector size, so you want ashift=12 (2^12 = 4096 bytes). If ashift is set too low for the disk, you can get massive write amplification, which will hurt throughput.
    https://www.high-availability.com/docs/ZFS-Tuning-Guide/
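
    The relationship is just a base-2 logarithm: ashift is the power of two that gives the sector size in bytes. A tiny sketch (the helper name is hypothetical, not a ZFS tool):

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """Return the ashift whose block size (2**ashift) equals the sector size."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


print(ashift_for_sector_size(512))   # 9  (legacy 512-byte sectors)
print(ashift_for_sector_size(4096))  # 12 (4K sectors)
```

    Compare the result against what the pool was actually created with, since ashift is fixed per vdev at creation time.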

    How about snapshots? Do you have a bunch of old ones? I highly recommend setting up a snapshot manager to prune snapshots to just a working set (monthly keep 1-2, weekly keep 4, daily keep 6 etc) https://github.com/jimsalterjrs/sanoid
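
    To show what a "working set" like that looks like, here is a simplified illustration of the retention idea. This is not how sanoid implements it (sanoid takes separately labeled daily/weekly/monthly snapshots and prunes each kind); it just assumes one snapshot per day and keeps the newest one in each recent day, ISO week, and month. All names here are made up for the example.

```python
from datetime import date, timedelta


def keep_newest_per_bucket(days, bucket_key, keep):
    """Keep the newest snapshot in each of the `keep` most recent buckets."""
    newest = {}
    for d in sorted(days):
        newest[bucket_key(d)] = d  # later dates overwrite earlier ones
    return set(sorted(newest.values())[-keep:])


def working_set(days, daily=6, weekly=4, monthly=2):
    """Union of the daily, weekly, and monthly keepers."""
    return (
        keep_newest_per_bucket(days, lambda d: d, daily)
        | keep_newest_per_bucket(days, lambda d: tuple(d.isocalendar())[:2], weekly)
        | keep_newest_per_bucket(days, lambda d: (d.year, d.month), monthly)
    )


# 90 hypothetical daily snapshots ending 2024-03-31.
last = date(2024, 3, 31)
snapshots = [last - timedelta(days=n) for n in range(90)]

kept = working_set(snapshots)
print(f"{len(snapshots)} snapshots, {len(kept)} kept, "
      f"{len(snapshots) - len(kept)} prunable")
```

    The point is the ratio: a few months of daily snapshots collapse to a handful once a policy is applied.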

    And to parrot another insightful comment, I also recommend checking the disk health with SMART tests. In ZFS, as a drive begins to fail, the pool will get much slower as it constantly repairs the errors.


  • ZFS is a very robust choice for a NAS. Many people, myself included, as well as hundreds of businesses across the globe, have used ZFS at scale for over a decade.

    Attack the problem. Check your system logs, htop, zpool status.

    When was the last time you ran a zpool scrub? Is there a scrub, or other zfs operation in progress? How many snapshots do you have? How much RAM vs disk space? Are you using ZFS deduplication? Compression?




  • Hard disagree on them being the same thing. LLMs are an entirely different beast from traditional machine learning models. The architecture and logic are worlds apart.

    Machine Learning models are “just” statistics. Powerful, yes, and with tons of useful applications, but really just statistics, generally using just 1 to 10 variables in useful models to predict a handful of other variables.

    LLMs are an entirely different thing, built using word vector matrices with hundreds or even thousands of variables, which are then fed into dozens or hundreds of layers of algorithms that each modify the matrix slightly, adding context and nudging the word vectors towards new outcomes.

    Think of it like this: a word is given a massive chain of numbers to represent both the word and the “thoughts” associated with it, like the subject, tense, location, etc. This lets the model do math like: Budapest + Rome = Constantinople.
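
    A toy version of that kind of word math, using the textbook "king - man + woman" example rather than the cities above. The three-number vectors are invented purely for illustration (real models learn hundreds or thousands of dimensions), and the nearest word is picked by cosine similarity, which is one common choice and an assumption on my part.

```python
import math

# Hypothetical hand-made "embeddings": (royalty, gender, other).
EMBEDDINGS = {
    "king":  [1.0, 1.0, 0.2],
    "queen": [1.0, -1.0, 0.2],
    "man":   [0.0, 1.0, 0.1],
    "woman": [0.0, -1.0, 0.1],
    "apple": [-1.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(positive, negative):
    """Add the positive word vectors, subtract the negative ones,
    and return the closest remaining word."""
    dims = len(next(iter(EMBEDDINGS.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, EMBEDDINGS[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, EMBEDDINGS[word])]
    candidates = [w for w in EMBEDDINGS if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, EMBEDDINGS[w]))


result = analogy(positive=["king", "woman"], negative=["man"])
print(result)  # queen
```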

    The only thing they share in common is that the computer gives you new insights.


  • You’re talking about two very different technologies, though both are confusingly called “AI” by overzealous marketing departments. The basic language recognition and regression models they ship today are “Machine Learning”, and fairly simple machine learning at that. This is generally the kind of thing we’re running on simple CPUs in realtime, so long as the model is optimized and pre-trained. What we’re talking about here is a Large Language Model, a form of neural network, the kind of thing that generally brings datacenter GPUs to their knees and generally has billions of parameters, with thousands of neurons in each of dozens or hundreds of sequential layers.

    It sounds like they’ve managed to simplify the network’s complexity and have done some tricks with caching while still keeping fair performance and accuracy. Not earth shaking, but a good trick.