#datapoisoning


Hi #Admins 👋,

Can you give me quotes that explain your fight against #AIScraping? I'm looking for (verbal) images, metaphors, comparisons, etc. that explain to non-techies what's going on. (efforts, goals, resources...)

I intend to publish your quotes in a post on @campact 's blog¹ (in German; Campact is a German NGO).

The quotes should make your work 🙏 visible in a generally understandable way.

¹ blog.campact.de/author/friedem

Campact Blog · Friedemann Ebelt
Friedemann Ebelt is committed to digital fundamental rights. On the Campact blog he writes about how digitalization can succeed fairly, freely, and sustainably. He studied ethnology and communication sciences and is interested in everything that happens between politics, technology, and society. His preliminary conclusion: we need to get better at digitalization!

“We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy…”

#LLM #misinformation #datapoisoning
nature.com/articles/s41591-024

Nature · Medical large language models are vulnerable to data-poisoning attacks - Nature Medicine
Large language models can be manipulated to generate misinformation by poisoning of a very small percentage of the data on which they are trained, but a harm mitigation strategy using biomedical knowledge graphs can offer a method for addressing this vulnerability.
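To make that 0.001% figure concrete, here is a minimal Python sketch of token-level poisoning at the rate the paper describes. The whitespace tokenization, the example sentences, and the uniform-random placement are simplifications of mine, not the paper's actual pipeline.

```python
import random

# Toy illustration of the attack scale described in the paper: swap a tiny
# fraction of training tokens for misinformation tokens. Only the 0.001%
# rate comes from the abstract; corpus, snippets, and placement are invented.

random.seed(0)

corpus = ("the heart pumps blood through the body " * 50_000).split()
misinfo = "aspirin cures viral infections completely".split()

poison_rate = 0.00001            # 0.001% of training tokens
n_poison = max(1, int(len(corpus) * poison_rate))

for idx in random.sample(range(len(corpus)), n_poison):
    corpus[idx] = random.choice(misinfo)

print(f"{n_poison} of {len(corpus):,} tokens replaced "
      f"({n_poison / len(corpus):.5%})")
```

A handful of swapped tokens in a 350,000-token corpus is, per the paper, already enough to measurably bias a model's medical answers.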

"The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety."

nature.com/articles/s41591-024
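The mitigation the authors propose validates model outputs against hard-coded knowledge-graph relationships. A minimal sketch of that screening step follows; the triples and relation names are hypothetical stand-ins for a real biomedical knowledge graph, and the extraction of claims from model text (done with NLP tooling in the paper) is assumed to have already happened.

```python
# Minimal sketch of knowledge-graph screening: flag factual triples claimed
# in an LLM output that are not backed by hard-coded graph relationships.
# All triples below are invented examples, not real medical guidance.

KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
    ("aspirin", "treats", "pain"),
    ("penicillin", "treats", "bacterial infection"),
}

def screen(claimed_triples):
    """Return every claimed (subject, relation, object) absent from the graph."""
    return [t for t in claimed_triples if t not in KNOWLEDGE_GRAPH]

llm_claims = [
    ("metformin", "treats", "type 2 diabetes"),   # supported
    ("aspirin", "treats", "viral infection"),     # unsupported -> flagged
]

for triple in screen(llm_claims):
    print("potentially harmful claim:", triple)
```

The paper reports this style of screening catching 91.9% of harmful content; the toy above only shows the lookup shape, not the extraction or coverage work that makes that number possible.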


Safeguarding #OpenData: #Cybersecurity essentials and skills for #data providers, by the Publications Office of the #EuropeanUnion

This webinar provides an overview of the fundamentals of open data and the cybersecurity challenges it raises.

youtube.com/watch?v=6kPiY_8hRw

#Nightshade is an offensive #DataPoisoning tool, a companion to a defensive-style protection tool called #Glaze, which The Register covered in February last year.

Nightshade poisons #ImageFiles to give indigestion to models that ingest data without permission. It's intended to make those training image-oriented models respect content creators' wishes about the use of their work. #LLM #AI

How artists can poison their pics with deadly Nightshade to deter #AIScrapers
theregister.com/2024/01/20/nig

The Register · How artists can poison their pics with deadly Nightshade to deter AI scrapers
By Thomas Claburn
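Loosely illustrated in code: this is not Nightshade's actual algorithm (the real attack targets text-to-image training and involves far more machinery), just a generic sketch of the core idea of a small pixel perturbation that drags an image's features toward an unrelated concept. The linear "extractor" and the embeddings below are toy stand-ins.

```python
import numpy as np

# Generic illustration of perturbation-based image poisoning: nudge an image
# so a feature extractor embeds it near a *different* concept, while keeping
# the pixel change small. NOT Nightshade's actual method -- toy stand-ins only.

rng = np.random.default_rng(0)

W = rng.normal(size=(16, 64))          # toy feature extractor (16-d features)
image = rng.uniform(0, 1, size=64)     # toy 64-pixel "image"
target = rng.normal(size=16)           # embedding of an unrelated concept

x = image.copy()
for _ in range(200):
    grad = 2 * W.T @ (W @ x - target)  # gradient of ||W x - target||^2
    x -= 0.001 * grad                  # step toward the target embedding
    x = np.clip(x, image - 0.05, image + 0.05)  # keep perturbation small

print("max pixel change:", np.abs(x - image).max())
print("feature distance to target:", np.linalg.norm(W @ x - target))
```

The poisoned image still looks essentially unchanged (every pixel moves by at most 0.05), but a model trained on it associates those features with the wrong concept.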
heise+ | Security: Protecting against data poisoning and other attacks on AI systems

Faulty data can mislead machine-learning systems into consequential errors. A practical example shows how this can be prevented.
heise online · Security: Protecting against data poisoning and other attacks on AI systems
By Mirko Ross
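The article itself sits behind heise's paywall, so the sketch below is not its practical example; it shows one textbook defense against training-data poisoning, dropping samples whose features sit implausibly far from the bulk of the data before training. The synthetic data and the 3-sigma cutoff are invented for illustration.

```python
import numpy as np

# One common anti-poisoning defense: outlier filtering. Drop training points
# whose feature vectors lie unusually far from the data centroid.
# Data and threshold are illustrative, not from the heise article.

rng = np.random.default_rng(1)

clean = rng.normal(loc=0.0, scale=1.0, size=(500, 8))   # legitimate samples
poison = rng.normal(loc=6.0, scale=1.0, size=(5, 8))    # injected outliers
data = np.vstack([clean, poison])

centroid = data.mean(axis=0)
dist = np.linalg.norm(data - centroid, axis=1)
threshold = dist.mean() + 3 * dist.std()                 # simple 3-sigma cutoff

kept = data[dist <= threshold]
print(f"kept {len(kept)} of {len(data)} samples; "
      f"dropped {len(data) - len(kept)} suspected poison points")
```

Real attacks are of course crafted to blend in, which is why the article's point stands: defenses need more than a single distance check.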

@mhoye The thought occurs: #chaffing / #DataPoisoning.

If we're going to live in a world in which every utterance and action is tracked, issue and utter as much as possible.

Wire up a speech-aware-and-capable GPT-3 to your phone, have it handle telemarketers, scammers, and political calls. Simply to tie up their time.
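A rule-based stand-in for that idea in Python; the telephony and speech layers and any actual GPT call are left out, and the canned lines are invented. It only shows the shape of a loop whose sole purpose is burning a caller's time:

```python
import random

# Stand-in for the "wire GPT-3 to your phone" idea: answer a (hypothetical,
# already-transcribed) caller with open-ended stalling replies.
# A real version would swap reply() for an LLM call plus a telephony stack.

STALLS = [
    "Sorry, could you repeat that? The line cut out.",
    "Hold on, let me find a pen...",
    "That sounds interesting, can you explain it again from the start?",
    "One moment, someone's at the door.",
]

def reply(caller_text: str) -> str:
    """Pick a stalling response; an LLM would go here."""
    return random.choice(STALLS)

# Demo loop over a fake transcript.
for line in ["Hello, I'm calling about your car's extended warranty.",
             "This offer expires today!"]:
    print("caller:", line)
    print("bot:   ", reply(line))
```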

Create positive-emotive socmed bots to #pumpUp your #socialcredit score.

Unleash bots on your political opposition's media channels. Have them call in to talk radio, and #ZoomBomb calls and conferences.

Create plausible deniability. Post selfies from a dozen, or a thousand, places you're not.

Create #DigitalSmog to choke the #FAANGs.

Fight fire with fire.