
#deepseek

30 posts · 28 participants · 6 posts today
Continued thread

So, apparently they are somewhat alike, but not the same.
Trying to fill the 6 numbers into a 2x3 and a 3x2 matrix and multiplying them doesn't yield the result FontLab does.
I thought this might be a nice test for an #AI, so I fed the question into #DeepSeek. It kept “thinking” until my computer’s fans got loud, then I stopped it without getting a result.
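For what it's worth, here is a minimal NumPy sketch of that attempted computation; the six values below are hypothetical placeholders, not the actual FontLab numbers:

```python
import numpy as np

# Hypothetical placeholder values, not the actual FontLab numbers.
values = [1.0, 0.5, 0.0, 0.25, 1.0, 0.0]

a = np.array(values).reshape(2, 3)  # the six numbers as a 2x3 matrix
b = np.array(values).reshape(3, 2)  # the same six numbers as a 3x2 matrix

# A (2x3) @ (3x2) product is 2x2 -- only four numbers -- so this arrangement
# alone cannot reproduce a six-number result like the one FontLab shows.
print(a @ b)
```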

🤖🇨🇳 #DeepSeek, together with Tsinghua University, is working on a new approach to training AI models that should lower the cost of the process.

Under the new approach, the AI will reinforce its acquired knowledge on its own. The method is meant to help models align better with human preferences.

The strategy outperformed existing methods and models in various tests, and the results showed better performance with fewer computational resources.

I find it fascinating and disturbing that the reasoning behaviors which make #AI models like #o3 and #deepseek so powerful -- like thinking ahead, checking their work, and backtracking -- are purely the result of how the models are *trained*.

Nothing to do with their "neural architecture" (other than the sheer number of parameters) ... !

Surely the network structure can somehow be optimized so that training converges on these behaviors more robustly.

arxiv.org/pdf/2503.01307

Interesting preprint on Inference-Time Scaling for Generalist Reward Modeling; I think it's the basis for the soon-to-be-released new DeepSeek models.
#ai #deepseek #llm #grm #SPCT
arxiv.org/abs/2504.02495

arXiv.org · Inference-Time Scaling for Generalist Reward Modeling

Reinforcement learning (RL) has been widely adopted in post-training for large language models (LLMs) at scale. Recently, the incentivization of reasoning capabilities in LLMs from RL indicates that $\textit{proper learning methods could enable effective inference-time scalability}$. A key challenge of RL is to obtain accurate reward signals for LLMs in various domains beyond verifiable questions or artificial rules. In this work, we investigate how to improve reward modeling (RM) with more inference compute for general queries, i.e. the $\textbf{inference-time scalability of generalist RM}$, and further, how to improve the effectiveness of performance-compute scaling with proper learning methods. For the RM approach, we adopt pointwise generative reward modeling (GRM) to enable flexibility for different input types and potential for inference-time scaling. For the learning method, we propose Self-Principled Critique Tuning (SPCT) to foster scalable reward generation behaviors in GRMs through online RL, to generate principles adaptively and critiques accurately, resulting in $\textbf{DeepSeek-GRM}$ models. Furthermore, for effective inference-time scaling, we use parallel sampling to expand compute usage, and introduce a meta RM to guide voting process for better scaling performance. Empirically, we show that SPCT significantly improves the quality and scalability of GRMs, outperforming existing methods and models in various RM benchmarks without severe biases, and could achieve better performance compared to training-time scaling. DeepSeek-GRM still meets challenges in some tasks, which we believe can be addressed by future efforts in generalist reward systems. The models will be released and open-sourced.
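To make the inference-time scaling idea concrete, here is a rough Python sketch of the sample-and-vote loop the abstract describes. The function names (grm_judge, meta_rm_score, scaled_reward) and the parameters k and top_m are hypothetical stand-ins for illustration, not DeepSeek's actual interfaces:

```python
from statistics import fmean

def grm_judge(query: str, response: str) -> tuple[str, float]:
    """Hypothetical GRM call: returns (critique text, scalar reward)."""
    raise NotImplementedError

def meta_rm_score(query: str, response: str, critique: str) -> float:
    """Hypothetical meta-RM call: scores how trustworthy a critique is."""
    raise NotImplementedError

def scaled_reward(query: str, response: str, k: int = 8, top_m: int = 4) -> float:
    # Spend more inference compute: draw k independent judgments
    # for the same (query, response) pair.
    judgments = [grm_judge(query, response) for _ in range(k)]
    # Meta-RM-guided voting: keep the top_m most trusted critiques
    # and average their rewards.
    ranked = sorted(judgments,
                    key=lambda j: meta_rm_score(query, response, j[0]),
                    reverse=True)
    return fmean(reward for _, reward in ranked[:top_m])
```

The idea is that increasing k trades extra inference compute for a more reliable reward signal, which is the scaling axis the paper studies.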
Replied in thread

@fradie_new The real danger here is rather the knowledge Google has about us... combined with the integration into the devices in our pockets. That said, the whole LLM story will get expensive for Google... if they don't manage to pull off a cartel. They have to replace their search entirely and offer it free of charge.

#Google #Gemini2 #AI

DeepSeek, in collaboration with Tsinghua University, has unveiled a new method for improving the reasoning capabilities of large language models.

The technique combines generative reward modeling and self-principled critique tuning to guide AI models towards human preferences, delivering better and faster results.

scmp.com/tech/tech-trends/arti

South China Morning Post · DeepSeek unveils new AI reasoning method amid anticipation for R2 model

In collaboration with Tsinghua University, DeepSeek developed a technique combining reasoning methods to guide AI models towards human preferences.
#ai #technology #llm