nerdculture.de is one of the many independent Mastodon servers you can use to participate in the fediverse.
Be excellent to each other, live humanism, no nazis, no hate speech. Not only for nerds, but the domain is somewhat cool. ;) No bots in general. Languages: DE, EN, FR, NL, ES, IT

#gemini

48 posts · 41 participants · 8 posts today

"To test this out, the Carnegie Mellon researchers instructed artificial intelligence models from Google, OpenAI, Anthropic, and Meta to complete tasks a real employee might carry out in fields such as finance, administration, and software engineering. In one, the AI had to navigate through several files to analyze a coffee shop chain's databases. In another, it was asked to collect feedback on a 36-year-old engineer and write a performance review. Some tasks challenged the models' visual capabilities: One required the models to watch video tours of prospective new office spaces and pick the one with the best health facilities.

The results weren't great: The top-performing model, Anthropic's Claude 3.5 Sonnet, finished a little less than one-quarter of all tasks. The rest, including Google's Gemini 2.0 Flash and the one that powers ChatGPT, completed about 10% of the assignments. There wasn't a single category in which the AI agents accomplished the majority of the tasks, says Graham Neubig, a computer science professor at CMU and one of the study's authors. The findings, along with other emerging research about AI agents, complicate the idea that an AI agent workforce is just around the corner — there's a lot of work they simply aren't good at. But the research does offer a glimpse into the specific ways AI agents could revolutionize the workplace."

tech.yahoo.com/ai/articles/nex

Yahoo Tech · Carnegie Mellon staffed a fake company with AI agents. It was a total disaster. By Shubham Agarwal

"For the past two and a half years the feature I’ve most wanted from LLMs is the ability to take on search-based research tasks on my behalf. We saw the first glimpses of this back in early 2023, with Perplexity (first launched December 2022, first prompt leak in January 2023) and then the GPT-4 powered Microsoft Bing (which launched/cratered spectacularly in February 2023). Since then a whole bunch of people have taken a swing at this problem, most notably Google Gemini and ChatGPT Search.

Those 2023-era versions were promising but very disappointing. They had a strong tendency to hallucinate details that weren’t present in the search results, to the point that you couldn’t trust anything they told you.

In this first half of 2025 I think these systems have finally crossed the line into being genuinely useful."

simonwillison.net/2025/Apr/21/

Simon Willison’s Weblog · AI assisted search-based research actually works now

As I said it’s snappy. That’s the internet we had decades ago. The connection speed improved massively but now it’s full of heavy crap that would kill this machine (if it even loaded).

Note that’s on an unsupported 1.5 GHz G4 Mini, so pretty much one of the fastest possible OS 9 machines.

I’d love to test this on my clamshell G3 iBook but it’s in a box on another continent …

Strangely (or not) there's a #Gemini client for the old Amiga (AmiGemini) but not for Mac OS Classic (?)

Thankfully the #smolnet proxy at portal.mozz.us also works over plain, unencrypted HTTP. So here I am in Geminispace from a Mac Mini running Mac OS 9 in Internet Explorer 5. I tried Netscape, but there's an issue with newlines on Antenna and other capsules.

Using custom fonts & colors, it's visually pleasant on a 1280x1024 screen and quite snappy.

Just a reminder that #Google knows everything about you at all times.
I just asked #Gemini about something general, and in its reply it tried to be extra helpful so it included something specific to my location. I didn't provide it with that information.
I really should get back on track with my all too long and arduous de-googling process. Thanks for reminding me, Google.