norden.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
Moin! This is the Mastodon instance for northern folk, chatterers, and everything in between. Follow the lighthouse.


#deepresearch


"For the past two and a half years the feature I’ve most wanted from LLMs is the ability to take on search-based research tasks on my behalf. We saw the first glimpses of this back in early 2023, with Perplexity (first launched December 2022, first prompt leak in January 2023) and then the GPT-4 powered Microsoft Bing (which launched/cratered spectacularly in February 2023). Since then a whole bunch of people have taken a swing at this problem, most notably Google Gemini and ChatGPT Search.

Those 2023-era versions were promising but very disappointing. They had a strong tendency to hallucinate details that weren’t present in the search results, to the point that you couldn’t trust anything they told you.

In this first half of 2025 I think these systems have finally crossed the line into being genuinely useful."

simonwillison.net/2025/Apr/21/

Simon Willison’s Weblog: "AI assisted search-based research actually works now"

I'm looking at various AI deep research tools, and I'm finding that features like those in Genspark, which let you see how the tool reasoned and which articles it "read" while producing its final report, are often just as valuable as the report itself, and sometimes far more so. These reasoning breadcrumbs are great for exploratory search and lateral thinking.

"DeepSeek-R1 Thoughtology:
Let’s <think> about LLM reasoning"

Interesting, very long paper about how reasoning works in DeepSeek-R1. One finding was that enhanced reasoning creates a dual-use risk: better capabilities but worse safety. Future models might move away from this model's single chain of reasoning toward diverse reasoning strategies that enhance problem-solving flexibility.

mcgill-nlp.github.io/thoughtol

McGill NLP: "DeepSeek-R1 Thoughtology: Let’s think about LLM reasoning"

Large Reasoning Models like DeepSeek-R1 mark a fundamental shift in how LLMs approach complex problems. Instead of directly producing an answer for a given input, DeepSeek-R1 creates detailed multi-step reasoning chains, seemingly “thinking” about a problem before providing an answer. This reasoning process is publicly available to the user, creating endless opportunities for studying the reasoning behaviour of the model and opening up the field of Thoughtology. Starting from a taxonomy of DeepSeek-R1’s basic building blocks of reasoning, our analyses on DeepSeek-R1 investigate the impact and controllability of thought length, management of long or confusing contexts, cultural and safety concerns, and the status of DeepSeek-R1 vis-à-vis cognitive phenomena, such as human-like language processing and world modelling. Our findings paint a nuanced picture. Notably, we show DeepSeek-R1 has a ‘sweet spot’ of reasoning, where extra inference time can impair model performance. Furthermore, we find a tendency for DeepSeek-R1 to persistently ruminate on previously explored problem formulations, obstructing further exploration. We also note strong safety vulnerabilities of DeepSeek-R1 compared to its non-reasoning counterpart, which can also compromise safety-aligned LLMs.

Prediction for Q2 2026: the next Nobel prize is awarded for realizing you may as well stop treating report generation as the core aspect of "deep research" (it obviously makes no sense, but hey, time traveler's spoilers, sorry!), stop at the "stuff search results into a database" step instead, and let users "chat with the search results", making research an interactive process.

We'll call this huge scientific breakthrough "DeepRAG With Human Feedback", or "DRHF".
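The "chat with the search results" loop can be sketched without any LLM machinery at all. The snippet below is purely illustrative: `chat_turn`, the corpus contents, and the keyword-overlap scorer are all invented stand-ins (a real system would use neural embeddings and pass the retrieved snippets to an LLM as grounding context), but it shows the interactive shape of the idea, where each question triggers retrieval over already-collected search results instead of regenerating a monolithic report.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(question, snippet):
    """Relevance = number of shared tokens (toy stand-in for cosine similarity)."""
    return len(tokens(question) & tokens(snippet))

def chat_turn(question, corpus, k=2):
    """One interactive turn: return the k most relevant stored snippets,
    which would then be handed to an LLM as grounding context."""
    ranked = sorted(corpus, key=lambda s: score(question, s), reverse=True)
    return ranked[:k]

# Hypothetical snippets previously collected by a deep-search run.
corpus = [
    "DeepSeek-R1 shows a sweet spot of reasoning length.",
    "Perplexity first launched in December 2022.",
    "Reasoning traces are useful for exploratory search.",
]

print(chat_turn("what did the paper say about reasoning length?", corpus, k=1))
# → ['DeepSeek-R1 shows a sweet spot of reasoning length.']
```

Each turn re-ranks the same stored corpus, so follow-up questions cost a lookup rather than a fresh multi-minute research run.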

On #DeepResearch - some predictions. Context: news.ycombinator.com/item?id=4

Prediction for Next Hot Thing in Q4 2025 / Q1 2026: someone will make the Nobel prize-worthy discovery that you can stuff the results of your deep search into a database (vector or otherwise) and then use it to compile a higher-quality report from a much larger number of sources.

We'll call it #DeepRAG or Retrieval Augmented Deep Research or something.
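As a rough illustration of the "stuff deep-search results into a database" idea, here is a minimal in-memory sketch. The `SnippetStore` class and its bag-of-words "embedding" are invented for illustration only; a real pipeline would use a neural embedding model and an actual vector database, then retrieve per-section context while compiling the report.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': sparse counts of lowercased whitespace tokens.
    Stand-in for a neural embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SnippetStore:
    """Minimal stand-in for a vector database of search-result snippets."""

    def __init__(self):
        self._items = []  # list of (embedding, snippet) pairs

    def add(self, snippet):
        self._items.append((embed(snippet), snippet))

    def retrieve(self, query, k=3):
        """Top-k snippets by cosine similarity to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]),
                        reverse=True)
        return [snippet for _, snippet in ranked[:k]]

store = SnippetStore()
store.add("Perplexity launched search-backed answers in December 2022.")
store.add("Vector databases store embeddings for similarity retrieval.")
print(store.retrieve("when did perplexity launch its answers", k=1))
# → ['Perplexity launched search-backed answers in December 2022.']
```

Because the store persists across queries, the report writer can pull targeted context for each section from the full crawl instead of being limited to whatever fits in one context window.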

news.ycombinator.com: "deep search is the new RAG" | Hacker News