That is not at all what right to work means.
I get the frustration, but if you’re going to criticize a thing, it’s a lot more effective if you actually know what the thing is.
Meta will probably be pretty cautious and strict about what inbound content is allowed, since they have a global quagmire of laws and regulations to comply with and cannot just open up the firehose without significant legal risk. I’d imagine they’d only accept content from vetted instances that agree to some amount of common policy.
In which case you essentially return to the status quo right now, where the Fediverse is a small group of somewhat-ideological tech enthusiasts.
To compare forced labor camps where the alternative is being murdered to people making the active choice to volunteer to serve as moderators is a comparison so lacking in perspective that I’d expect to only find it on Reddit, but I guess Lemmy has managed to foster the same kind of behavior.
Are you going to compare Reddit killing the API to the Holocaust next?
The key element here is that an LLM does not actually have access to its training data. At least as of now, I'm skeptical that it's technologically feasible to search the entire training corpus, an absolutely enormous amount of data, on every single query to check for potential copyright violations, especially when you don't know exactly which portions of the response you'd need to search for. Even then, that only catches verbatim (or near-verbatim) violations, and plenty of copyright questions are a lot fuzzier.
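To make the scale problem concrete, here's a rough sketch (in Python, with a hypothetical `corpus_docs` mapping and an arbitrary 8-token chunk size standing in for whatever a real system would use) of what even the naive verbatim check involves: you'd have to index every n-token chunk of the entire corpus, then look up every chunk of every generated response against that index.

```python
# Toy sketch of naive verbatim-overlap detection against a training corpus.
# "corpus_docs" and the 8-token shingle size are illustrative assumptions,
# not anything a real provider is known to use.

from collections import defaultdict

N = 8  # shingle length in tokens; a real system would tune this


def shingles(text: str, n: int = N) -> set[str]:
    """Every run of n consecutive tokens in the text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus_docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every n-token shingle to the documents containing it.
    At web-scale corpus sizes this index is itself enormous."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in corpus_docs.items():
        for sh in shingles(text):
            index[sh].add(doc_id)
    return index


def verbatim_overlaps(response: str, index: dict[str, set[str]]) -> dict[str, int]:
    """Count how many shingles of a generated response appear in each corpus doc.
    This only flags (near-)verbatim copying, not paraphrase or character reuse."""
    hits: dict[str, int] = defaultdict(int)
    for sh in shingles(response):
        for doc_id in index.get(sh, ()):
            hits[doc_id] += 1
    return dict(hits)
```

And even if you built and ran something like that at training-corpus scale for every response, it still only flags literal copying; everything beyond that is far murkier.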
For instance, say you tell GPT to generate a fan fiction story involving a romance between Draco Malfoy and Harry Potter. This would unquestionably violate JK Rowling’s copyright on the characters if you published the output for commercial gain, but you might be okay if you just plop it on a fan fic site for free. You’re unquestionably okay if you never publish it at all and just keep it to yourself (well, a lawyer might still argue that this harms JK Rowling by damaging her profit if she were to publish a Malfoy-Harry romance, since people can just generate their own instead of buying hers, but that’s a messier question). But, it’s also possible that, in the process of generating this story, GPT might unwittingly directly copy chunks of renowned fan fiction masterpiece My Immortal. Should GPT allow this, or would the copyright-management AI strike it? Legally, it’s something of a murky question.
For yet another angle, there is of course a whole host of public domain text out there. GPT probably knows the text of the Lord’s Prayer, for instance, and so even though that output would perfectly match some training material, it’s legally perfectly okay. So, a copyright police AI would need to know the copyright status of all its training material, which is not something you can super easily determine by just ingesting the broad internet.
AI haters are not applying the same standards to humans that they do to generative AI
I don’t think it should go unquestioned that the same standards should apply. No human is able to look at billions of creative works and then create a million new works in an hour. There’s a meaningfully different level of scale here, and so it’s not necessarily obvious that the same standards should apply.
If it’s spitting out sentences that are direct quotes from an article someone wrote before and doesn’t disclose the source then yeah that is an issue.
A fundamental issue is that LLMs simply cannot do this. They can query a webpage, find a relevant chunk, and spit that back at you with a citation, but it is simply impossible for them to generate a response to a query, realize that they've reproduced a meaningful amount of copyrighted material, and disclose its source, because they literally do not know the source. This is not a fixable issue unless the fundamental approach to these models changes.
There is literally no resemblance between the training works and the model.
This is way too strong a statement when some LLMs can spit out copyrighted works verbatim.
https://www.404media.co/google-researchers-attack-convinces-chatgpt-to-reveal-its-training-data/
A team of researchers primarily from Google’s DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever.
Often, that “random content” is long passages of text scraped directly from the internet. I was able to find verbatim passages the researchers published from ChatGPT on the open internet: Notably, even the number of times it repeats the word “book” shows up in a Google Books search for a children’s book of math problems. Some of the specific content published by these researchers is scraped directly from CNN, Goodreads, WordPress blogs, on fandom wikis, and which contain verbatim passages from Terms of Service agreements, Stack Overflow source code, copyrighted legal disclaimers, Wikipedia pages, a casino wholesaling website, news blogs, and random internet comments.
Beyond that, copyright law was designed under circumstances in which creative works could only ever be produced by humans, with all the inherent limitations of time, scale, and ability that come with that. Those circumstances have now fundamentally changed, and while I won't be so bold as to pretend to know what the ideal legal framework is going forward, I think it's a much bolder claim than people realize to say that fair use as currently applied to humans should apply equally to AI, and it's not a claim that should just be accepted without question.
Info that is publicly broadcast, that technically must be publicly broadcast, that isn't necessarily personally identifiable, and that is only linked to a user-chosen pseudonym probably isn't going to be found to have much of a right to privacy.
So you have evidence of bribes?
That’s cool. Please share with the class.
If this is the level of maturity that’s going to represent the Fediverse, I’m almost inclined to believe they actually do have pure intentions, because there’s no way this shit is financially valuable.
Are you truly incapable of imagining that someone might have a different opinion than you without being bribed?
“Everyone who disagrees with me must be getting paid” is not the mature take you think it is.
This perspective of “Either you agree with me or you’re complicit in a conspiracy against me” is incredibly childish and immature.
Sometimes people have different opinions than you. Try to find a way to deal with it.
I don’t think the government views video game mod hosts as so fundamental to a healthy society that they require strong limitations on their own freedom of speech, but you’re welcome to call up your representative and start a campaign for the ability to force Nexus to host Nazis if it’s truly important to you.
Forcing people to host speech they don’t want to is far more draconian than not doing so.
You’d probably be more than a little annoyed if I put a swastika sign on your front yard and then told you that you were infringing on my right to free speech when you went to go remove it.
If you’re looking for games that have nothing that might make you uncomfortable, those games do exist, but Baldur’s Gate is not one of them.
For a lot of people, directly tackling elements of life that are uncomfortable or actively unpleasant is what can make a game, movie, or whatever else high quality art. Schindler’s List is explicitly about one of the most horrendous chapters in all of human history, and it’s also one of the greatest movies ever made. Being uncomfortable isn’t necessarily a bad thing.
This is not the Baldur's Gate devs barging into the mod maker's house and holding him at gunpoint until he deletes the mod. I'd agree that would be inappropriate.
What this is instead is the people running Nexus deciding that they don’t want to be associated with this kind of content and that they are not willing to host it. If you owned a bar and it started being frequented by neo-Nazis, you’d be perfectly within your rights to kick them out, because you’re a private business owner and can conduct it however you like within the bounds of the law.
Your position isn’t the “live and let live” idea you think it is, because what you’re in effect claiming is that the people behind Nexus should be forced to host content that they find extremely morally objectionable.
Some context is that this is Spotify's first profitable quarter in quite a while. Also, there are 11 million artists on Spotify. I won't pretend to have any data on listening distribution, but even naively and stupidly going with a uniform split of that quarter's entire profit across every artist, you're looking at something like $5 each. In reality, most of those artists will have next to no listeners, and the vast majority of streams go to the top several thousand.
The deeper question to ask is where all the streaming revenue is actually going, and the answer to that isn’t to line Spotify’s pockets; it’s to the labels.
This is the kind of thing where moderators need to put in a lot of active work to enforce some level of content and behavior standards or it’ll simply collapse to the basic state of human laziness like most online communities.
There's not exactly anything wrong with that - it's perfectly normal - but people will always default to doing this kind of thing unless there's active effort to prevent it, and I haven't really seen any Fediverse communities interested in doing that work yet (which I wouldn't blame them for; it's nontrivial).
It's quite obvious that most people commenting here didn't read the post, given that it says 90% of creators already have all ad types enabled for pre- and post-video, and that this directly leads to greater payouts for them.
If something is possible, and this clearly is, someone is going to develop it regardless of how we feel about it, so it's important for non-malicious actors to make people aware of the potential negative impacts so we can start developing ways to handle them before actively malicious actors start deploying it.
Critical businesses and governments need to know that identity verification via video and voice is much less trustworthy than it used to be, and so if you’re currently doing that, you need to mitigate these risks. There are tools, namely public-private key cryptography, that can be used to verify identity in a much tighter way, and we’re probably going to need to start implementing them in more places.
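To sketch what that looks like in practice (this uses Python's third-party cryptography package; the challenge string is just a placeholder, not any real protocol), the person being verified signs a challenge with a private key only they hold, and the verifier checks the signature against a public key they already trust. A cloned voice or deepfaked video gets an attacker nowhere without the key.

```python
# Minimal challenge-response identity verification sketch using Ed25519,
# via the third-party "cryptography" package. The challenge value is a
# placeholder; a real system would use a random nonce, expiry, etc.

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# One-time setup: the user generates a keypair and registers the public key
# with the verifier over some trusted channel.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Verification: the verifier issues a challenge, the user signs it.
challenge = b"transfer-request-nonce-placeholder"
signature = private_key.sign(challenge)

# The verifier checks the signature against the registered public key.
try:
    public_key.verify(signature, challenge)
    print("identity verified")
except InvalidSignature:
    print("verification failed")
```

The hard part isn't the math, it's distributing and trusting the public keys in the first place, which is why this kind of thing needs to be built into processes ahead of time rather than bolted on after an incident.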