

First of all, he should drop Python for anything as resource-intensive as such a simulation. And then think about how to optimize the algorithm.
The compressing and renumbering seems to be more common with embedded Chinese fonts. Space-wise it makes a lot of sense. But yes, mark and copy text, paste it into Word or Writer, and you get gibberish. Can’t verify the search, though. And, of course, Google Translate can’t do anything with it, either.
If you ever need to edit a PDF that way, just use Inkscape. It is way better than LO draw for that.
It is not a curse. It does exactly what it is intended to do: Create an archive of a document that is universally reproducible.
It is a very well designed cul-de-sac for exactly this purpose. Using it for anything else is calling for trouble.
The problem lies in the PDFs themselves. They contain objects that represent lines of glyphs, if you are lucky. A conversion tool can guess which of those lines belong together and produce the text.
It cannot know any intentions behind it, though. Take a numbered list. The first line is two line objects: the number plus the . or the ), and the first line of text. The conversion tool can now guess. As the line blocks with the numbers are all left of the line blocks with text, this could be a numbered list. Or it could be a table with two columns. Nothing in the PDF is giving any hints.
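To make the guessing concrete, here is a minimal sketch of such a heuristic. It is not how any particular converter works; `LineObject`, `guess_structure` and the `y_tolerance` value are made up for illustration, and the coordinates are assumed to be already extracted from the page.

```python
import re
from dataclasses import dataclass


@dataclass
class LineObject:
    x: float   # left edge of the run of glyphs (hypothetical field)
    y: float   # baseline position on the page (hypothetical field)
    text: str


# Matches labels such as "1." or "2)".
NUMBER_LABEL = re.compile(r"^\d+[.)]$")


def guess_structure(objects, y_tolerance=1.0):
    """Guess whether two-column line objects form a numbered list or a table."""
    # Group objects that share (roughly) the same baseline into rows.
    rows = []
    for obj in sorted(objects, key=lambda o: (-o.y, o.x)):
        if rows and abs(rows[-1][0].y - obj.y) <= y_tolerance:
            rows[-1].append(obj)
        else:
            rows.append([obj])
    # Order each row left to right.
    for row in rows:
        row.sort(key=lambda o: o.x)
    # Only rows with exactly two objects are candidates for "label + text".
    pairs = [row for row in rows if len(row) == 2]
    if not pairs:
        return "plain text"
    # If every left-hand object looks like a number label, call it a list.
    if all(NUMBER_LABEL.match(left.text) for left, _ in pairs):
        return "numbered list (guess)"
    return "two-column table (guess)"
```

Note that a real two-column table whose first column happens to contain "1.", "2.", "3." would be classified as a list by this sketch. That is exactly the ambiguity: the PDF carries positions, not intent.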
And that is the easy part. This assumes that the document either uses default fonts, or keeps its embedded fonts untouched. If they use embedded fonts and a PDF optimizer that only embeds the used characters and renumbers them, any copy or conversion tool is bound to fail.
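A toy model of why that fails, assuming an optimizer that numbers the used characters in order of first appearance and does not ship a reverse mapping. The function names and the `base` offset are invented for this sketch; real PDFs can carry such a mapping (the ToUnicode table), and the sketch only covers the case where it is missing.

```python
def subset_and_renumber(text):
    """Keep only the characters actually used and renumber them from 1."""
    code_for_char = {}
    for ch in text:
        if ch not in code_for_char:
            code_for_char[ch] = len(code_for_char) + 1
    codes = [code_for_char[ch] for ch in text]
    return codes, code_for_char


def naive_copy(codes, base=0x40):
    """What a copy tool gets without a reverse mapping: the new codes
    read as if they were ordinary character codes (base is arbitrary)."""
    return "".join(chr(base + c) for c in codes)


def copy_with_mapping(codes, code_for_char):
    """With the reverse mapping available, the text can be recovered."""
    char_for_code = {code: ch for ch, code in code_for_char.items()}
    return "".join(char_for_code[c] for c in codes)


codes, mapping = subset_and_renumber("hello")
print(codes)                               # [1, 2, 3, 3, 4]
print(naive_copy(codes))                   # ABCCD
print(copy_with_mapping(codes, mapping))   # hello
```

The page still renders correctly, because the embedded font draws the right shape for each code. Only the way back from code to character is gone.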
Same with protected PDFs where you simply cannot copy the text from the start.
And then there are PDFs that just consist of scanned pages. Here you would need OCR software to get something readable out of them.
PDF is an archival output format, the end of a process. Not something to work from.
Always preserve the original file. Keep it safe. If you change tools, make sure you have a conversion path into something editable. The PDF is for giving away, nothing else.
Have you ever seen how long it takes for a tree to grow?
Just a normal US politician. They lost all connection to reality ages ago. Take his ID papers away and let him survive on minimum wage jobs somewhere where nobody knows his sorry face.
Move away from port 22, and 90% vanishes. Move it up to a port in the five digit range, and you will rarely see them.
Those Dell fans were never built to be quiet. And they are also not built to be replaced by any quiet fans.
But yes, there is a reason why I retired the Dell server I had and replaced it with a normal desktop PC. The server was so loud, I could hear it two flights of stairs and two closed doors away.
I did my largest file transfer via USB disk. You simply don’t transfer multiple terabytes over the net.
I use my former PC as the home server. It is probably 10+ years old and has no M.2 slot or anything like that, but it has an SSD for the OS. More than big and fast enough for all my needs: File service (Samba), Web service (apache2), Wiki service (MediaWiki), Database (MySQL), Calendar service (Radicale), Project service (Subversion), and probably some others I forgot. All of it running on Ubuntu Server, administered with Webmin.
The only investment I made when I turned this into a server was that I put 2x8TB in it as a RAID for bulk storage - I dump the family PCs' backups on that machine, too.
No docker. Plain executable.
You need this for your family, and not hundreds of people? No crazy, outlandish usage requirements?
Then basically any PC will do.
I have regularly had issues with Radicale, for years now. One is that it does not work properly after boot. I have to SSH in, kill the Radicale process, and restart it.
What the heck are you self-hosting that anything beyond 64G is even taken into account?
Well, performance-wise, you cannot beat Intel and AMD at the moment. Then there is ARM, which is strong, especially if power consumption is an issue. And it is closing the gap to the top. Bonus: ARM has a range from simple M0 cores to GHz multiprocessor chips. Where is RISC-V on that scale?
Compare ARM performance: https://browser.geekbench.com/search?utf8=✓&q=ARM with RISC-V performance: https://browser.geekbench.com/search?q=RISC-V and you’ll see that RISC-V has a looooong way to go before it can be considered relevant.
RISC-V has its place, no question, but don’t expect servers or workstations anytime soon. At least outside China and Russia…
FTFY: by referring to the single source of truth opinion.
My home server runs on an old desktop PC, bought at a discounter. But as we have bought several identical ones, we have both parts to upgrade them (RAM!) as well as organ donors for everything else.
This depends on what you are actually looking for, and how you are looking for it.
Do you really need pattern matching, or do you only look for fixed strings? Then other tools may be faster.
If you need a case-insensitive search on a data set with mixed upper- and lowercase, make a copy that is all upper or all lower, and search there.
If you only search in certain columns, make a copy that only includes these.
Or import the data into a database.
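A rough sketch of those options, assuming the data is CSV-like and fits in memory. The sample data, the table name `data` and all function names are invented for illustration; whether any of this is actually faster depends on the data and the tool being replaced.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the real file.
SAMPLE = "name,city,note\nAlice,Berlin,Likes PDF\nBob,BERLIN,none\nCarol,Paris,berlin fan\n"


def build_search_copy(rows, columns):
    """Keep only the searched columns, lowercased once up front."""
    return [tuple(row[c].lower() for c in columns) for row in rows]


def find_fixed(search_copy, needle):
    """Fixed-string search without regex; returns matching row indexes."""
    needle = needle.lower()
    return [i for i, fields in enumerate(search_copy)
            if any(needle in field for field in fields)]


def load_into_sqlite(rows, header):
    """Import the rows into an in-memory SQLite table named 'data'."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f"CREATE TABLE data ({cols})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO data VALUES ({placeholders})", rows)
    return conn


reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)
rows = list(reader)

# Search only the "city" column (index 1) in the lowercased copy.
city_copy = build_search_copy(rows, [1])
print(find_fixed(city_copy, "Berlin"))   # [0, 1]

# Same question asked of the database, case-insensitive for ASCII.
conn = load_into_sqlite(rows, header)
query = "SELECT name FROM data WHERE city = ? COLLATE NOCASE ORDER BY name"
print(conn.execute(query, ("berlin",)).fetchall())   # [('Alice',), ('Bob',)]
```

The lowercased copy is built once and reused for every search, which is the point: the case folding is paid for up front instead of on every query.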