Google Books reportedly indexing bad AI-written works https://www.theverge.com/2024/4/5/24122077/google-books-ai-indexing-ngram
#HowToThing #030 — Procedural, rule-based & stochastic text generation using a custom DSL, parse grammar (via https://thi.ng/parse) and abstract syntax tree transformation (via https://thi.ng/defmulti).
Since it's #NaNoWriMo & #NaNoGenMo [1], I'm closing out this first season of 30 #HowToThing's with a related topic & maybe someone even finds it useful/interesting...
This example is loosely inspired by @galaxykate's oldie & goodie #Tracery, but uses a super simple custom text format instead of JSON to define variables and template text. Variables are expanded recursively, and I've also added features like dynamic, indirect, pointer-like variable lookups to derive variables based on current values (useful for conditionals & context-specific expansions), hidden assignments, chainable modifiers... I've included 5 different "story" templates (incl. comments) showing various features. Just press "regenerate" to create new random variations...
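To give a rough idea of the core mechanic, here's a minimal TypeScript sketch of recursive, rule-based expansion with random choice — not the actual example code (which parses a proper grammar via thi.ng/parse and transforms the AST via thi.ng/defmulti); all names & rules below are made up:

```typescript
// Minimal sketch of recursive, rule-based & stochastic text expansion
// (Tracery-style). Illustrative only — the real example uses a parsed
// grammar + AST transformation; the rule set here is invented.

type Rules = Record<string, string[]>;

const pick = <T>(xs: T[]): T => xs[Math.floor(Math.random() * xs.length)];

// expand all `<var>` references in a template, recursively
const expand = (rules: Rules, template: string): string =>
    template.replace(/<([a-z]+)>/g, (_, id: string) =>
        rules[id] ? expand(rules, pick(rules[id])) : `<${id}>`
    );

const rules: Rules = {
    story: ["The <animal> <verb> over the <thing>."],
    animal: ["fox", "heron", "badger"],
    verb: ["jumps", "tiptoes", "stumbles"],
    thing: ["fence", "river", "old piano"],
};

console.log(expand(rules, "<story>"));
// e.g. "The heron tiptoes over the old piano."
```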
Similar to the previous #HowToThing, I'm hoping this example also shows that approaching use cases like this via small domain-specific languages with proper grammar rules doesn't require much ceremony and is often more amenable to change during prototyping (and later also more maintainable!) than regex-bashing approaches...
The parser grammar itself is explained in the https://thi.ng/parse readme. As usual, the grammar was created/prototyped with the Parser Playground[2], which we developed from scratch during the first thi.ng livestream[3] (2.5h video)...
Demo (example project #145):
https://demo.thi.ng/umbrella/procedural-text/
Source code:
https://github.com/thi-ng/umbrella/tree/develop/examples/procedural-text/src
If you have any questions about this topic or the packages used here, please reply in thread or use the discussion forum (or issue tracker):
https://github.com/thi-ng/umbrella/discussions
[1] https://github.com/NaNoGenMo/2023/
[2] https://demo.thi.ng/umbrella/parse-playground/
[3] https://www.youtube.com/watch?v=mXp92s_VP40
Catching up on the #SPP2023 #preconference on #memory:
Felipe De Brigard introduced us to the topic and some recent trends before the series of talks began.
Find Felipe's work on gScholar: https://scholar.google.com/citations?user=l9gS2joAAAAJ&hl=en&oi=ao
Google #ngram Viewer seems like a great tool for #writers who need to research what people were calling things during a specific era or span of years. https://books.google.com/ngrams/
One of the basic questions we tackle when working towards statistical language models is "Can we predict a word?"
This was also one of the intro questions we posed to the students last Wednesday in our #ise2023 lecture no. 4, when we introduced simple n-gram language models.
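To make the prediction question concrete, here's a toy bigram sketch in TypeScript (my own illustration, not the lecture material): count word-pair frequencies in a tiny corpus, then predict the most frequent continuation.

```typescript
// Toy bigram "language model": count word pairs in a tiny corpus and
// predict the most likely next word. Purely illustrative.

const corpus = "the cat sat on the mat the cat ate the fish";
const words = corpus.split(/\s+/);

// count bigram frequencies
const bigrams = new Map<string, Map<string, number>>();
for (let i = 0; i < words.length - 1; i++) {
    const [w1, w2] = [words[i], words[i + 1]];
    const next = bigrams.get(w1) ?? new Map<string, number>();
    next.set(w2, (next.get(w2) ?? 0) + 1);
    bigrams.set(w1, next);
}

// predict: argmax over observed continuations of `w`
const predict = (w: string): string | undefined => {
    const next = bigrams.get(w);
    if (!next) return undefined;
    return [...next.entries()].sort((a, b) => b[1] - a[1])[0][0];
};

console.log(predict("the")); // "cat" (seen 2x) beats "mat" / "fish" (1x each)
```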
#nlp #lecture #ngram #languagemodels #language #aiart #stablediffusion #creativeAI @fizise @KIT_Karlsruhe @nfdi4ds @nfdi4culture
A fun thing to do is enter your name in the amazing Ngram Viewer. Mine looks like this. The peak is explained, I presume, by the misfortunes of a gentleman who ran into a spot of bother in Spain. My other namesakes have failed in much less heroic ways. Don’t like the look of the graph, though. Seems we’re dying off as fast as butterflies in the UK
(Google Ngram charts word frequencies from a large corpus of books that were printed between 1500 and 2019) #history #language #Ngram
In the next episode we'll be building out our Hashtag grain in #MicrosoftOrleans.
It'll be responsible for taking in raw input, breaking it apart into many n-gram combinations, and then returning possible solutions, ranked by some metrics associated with the #Google #ngram #dataset.
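Roughly the idea, sketched in TypeScript — not the actual Orleans grain, and the tiny frequency table is just a stand-in for the real n-gram-backed lookup:

```typescript
// Sketch: split a raw hashtag into every possible word segmentation and
// rank candidates by word frequency. `freq` is a placeholder for the real
// n-gram dataset lookup; the ranking metric is deliberately naive.

const freq: Record<string, number> = {
    dog: 900, dogs: 700, so: 3000, of: 5000, war: 800,
};

// toy metric: only segmentations made entirely of known words score > 0
const score = (parts: string[]): number =>
    parts.every((w) => w in freq)
        ? parts.reduce((acc, w) => acc + freq[w], 0)
        : 0;

// enumerate all ways to split `s` into contiguous chunks
const segmentations = (s: string): string[][] => {
    if (!s.length) return [[]];
    const out: string[][] = [];
    for (let i = 1; i <= s.length; i++) {
        for (const rest of segmentations(s.slice(i))) {
            out.push([s.slice(0, i), ...rest]);
        }
    }
    return out;
};

const ranked = segmentations("dogsofwar")
    .map((parts) => ({ parts, score: score(parts) }))
    .sort((a, b) => b.score - a.score);

console.log(ranked[0].parts); // ["dogs", "of", "war"]
```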
Importing the #Google #ngram #dataset into #PostgreSQL.
I'm almost done with the bi-grams.
I've got about 900 GB more to import; then it's on to the tri-grams.
This is the entire, unfiltered set, which I'm going to back up first and put in cold storage.
Then I'm going to filter out rows that have characters that aren't allowed in #HashTags. This is the dataset that will power #FediMod's hashtag #accessibility service.
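For illustration only, the filtering step could look roughly like this with node-postgres — the table/column names are made up, not the actual FediMod schema:

```typescript
// Sketch of the hashtag-filtering step using node-postgres ("pg").
// The table/column names (ngrams, gram) are invented for illustration.

import { Client } from "pg";

const filterNgrams = async () => {
    const client = new Client({ connectionString: process.env.DATABASE_URL });
    await client.connect();
    try {
        // drop rows whose n-gram contains characters that can't appear in a
        // hashtag (keep letters, digits, underscores + the spaces separating
        // the words of a bi-/tri-gram)
        const res = await client.query(
            "DELETE FROM ngrams WHERE gram !~ '^[[:alnum:]_ ]+$'"
        );
        console.log(`removed ${res.rowCount} rows`);
    } finally {
        await client.end();
    }
};

filterNgrams().catch(console.error);
```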
The Google corpus of edited text shows a big pre-COVID spike in the one-word form starting in 2012. But by 2019, "hand washing" and "handwashing" were equally likely.
My son told me today that he loves the word "peckish", so we talked about #word frequency and tried to find a common (non-specialized) word that is less frequent than "peckish". Mostly failed.
But what drew my attention is how all of the words we thought of rose in popularity after 2000 on Google #ngram. Why? What happened in 1960-1980 that drove them down? Or is the GN corpus skewed (towards patents and academic papers?) during this era?