YouTube Accelerates LLM Recommendation Validation by 948x with New STATIC Framework

YouTube and Google DeepMind have released a new framework called STATIC that accelerates recommendation validation in large language models (LLMs) by 948 times. The breakthrough solves a common problem where...

OpenAI Releases GPT-5.3 Instant with Improved Accuracy and Communication

OpenAI has launched GPT-5.3 Instant, a major update to its most widely used model, focusing on enhanced communication quality. The model now declines safe requests less often and avoids overly...

Google Releases Gemini 3.1 Flash-Lite: Ultra-Fast and Cost-Efficient AI Model

Google has introduced Gemini 3.1 Flash-Lite, the fastest and most affordable model in the Gemini 3 series. Priced at just $0.25 per million input tokens and $1.50 per million...

Microsoft Research and Salesforce Reveal Dialogue Reduces LLM Reliability

Microsoft Research and Salesforce have highlighted a rarely discussed issue: dialogue significantly lowers the reliability of large language models (LLMs). Testing 15 top models, including GPT-4.1, Gemini 2.5 Pro, and...

Chinese Robotaxi Firms Suspend Dubai Services Amid Regional Tensions

Chinese autonomous driving firms Baidu’s Apollo Go and WeRide have halted robotaxi operations in Dubai following Iran’s missile strikes that heightened regional tensions. While WeRide continues services in Abu Dhabi...

Sakana AI Introduces Text-to-LoRA and Doc-to-LoRA for Faster LLM Customization

Sakana AI has unveiled two new research advancements, Text-to-LoRA and Doc-to-LoRA, which significantly simplify and speed up the customization of large language models (LLMs). These methods allow models to instantly...

OpenAI to Launch Smart Speaker with Camera in 2027

OpenAI plans to release a smart speaker with a built-in camera and facial recognition capabilities in February 2027. The device, priced between $200 and $300, will analyze the surroundings and...

OpenAI Freezes Stargate Project Amid Challenges

OpenAI has halted its ambitious Stargate project, initially planned in partnership with SoftBank and Oracle. The suspension is due to internal corporate disagreements, a shortage of engineering talent, and investor...

Microsoft Sovereign Cloud Adds Governance, Productivity, and Support for Large AI Models That Run Securely Even When Fully Disconnected

Microsoft has expanded its Sovereign Cloud offerings to enhance governance, productivity, and support for large AI models that can run securely even when fully disconnected from the cloud. The updates...

Claude Opus 4.6 Linked to $1.8M Moonwell Exploit

A recent exploit in the DeFi lending protocol Moonwell led to a loss of $1.78 million due to a smart contract vulnerability. The issue arose from an incorrect price setting...

Anthropic Proposes Persona Selection Model to Explain AI Assistant Behavior

Anthropic’s alignment team has introduced the Persona Selection Model (PSM) to explain why AI assistants behave like distinct personalities rather than mere algorithms. The model suggests that during pretraining, language...

AI Plugins Become New Vector for Attacks

A recent surge in attacks via AI plugins has raised significant security concerns. Over 1,100 malicious skills were found on the OpenClaw marketplace, with one attacker uploading 677 packages disguised...

Anthropic Launches Claude Code Security

Anthropic has launched Claude Code Security, a new tool that scans codebases and suggests patches for detected security issues. Currently available in limited preview for Enterprise and Team clients, repository...

SkillsBench Research Shows Real Impact of Skills on LLM Agents

SkillsBench, a new benchmark and research project, tested the impact of Skills on Large Language Model (LLM) agents across 84 tasks in 11 domains with 7 model configurations including Claude,...

VulnLLM-R-7B: New Reasoning Model for Code Security

A new reasoning language model, VulnLLM-R-7B, has been released for code security, designed to detect vulnerabilities like a pentester. Unlike traditional models that search for suspicious patterns, VulnLLM-R-7B analyzes data...

Microsoft Says Bug Causes Copilot to Summarize Confidential Emails

Microsoft has acknowledged a bug in its Microsoft 365 Copilot AI assistant that caused it to summarize confidential emails since late January, bypassing data loss prevention (DLP) policies. The issue...

Google Releases Gemini 3.1 Pro with Advanced Abstract Reasoning

Google has officially launched Gemini 3.1 Pro, showcasing a significant leap in AI intelligence with a 77.1% score on the challenging ARC-AGI-2 abstract reasoning test, nearly double the previous version’s result....

Anthropic Measures AI Agent Autonomy in Real-World Use

Anthropic has released an analysis of millions of interactions with its AI agent, Claude Code, revealing how agent autonomy evolves in practical settings. The study shows autonomous task durations nearly...

Strand-Rust-Coder-14B: Specialized AI Model for Rust Code Generation

The new AI model Strand-Rust-Coder-14B is specifically trained to generate Rust code with the expertise of an experienced developer. Unlike general coding assistants, this model focuses on idiomatic Rust, safe...

LLM Accuracy Significantly Improves by Repeating Prompt Twice

A recent study has revealed that simply repeating the same prompt twice can dramatically boost the accuracy of large language models (LLMs). In one test involving searching for an element...

Context Graphs, One Month In

A month after publishing their perspective on context graphs, Ashu Garg and Jaya Gupta have seen the concept become a major topic in AI. Context graphs serve as institutional memory,...

Pentagon Negotiates with OpenAI and Anthropic to Deploy AI in Classified Networks

The Pentagon is in talks with OpenAI and Anthropic to deploy their AI models, including GPT and Claude, across classified military networks. The U.S. Department of Defense seeks to lift...

MiniMax Releases M2.5 Model

MiniMax has launched the M2.5 model, built on the MoE architecture with 10 billion active parameters out of 229 billion total. The model was trained in hundreds of thousands of...

Google Standardizes AI Agent Interaction with Websites via WebMCP

Google has introduced WebMCP, a protocol implementation designed to standardize how AI agents interact with websites. WebMCP provides a predictable interface that replaces fragile page parsing, enabling faster and more...

Anthropic Publishes Complete Guide to Building Skills for Claude

Anthropic has released a comprehensive guide titled The Complete Guide to Building Skills for Claude, designed to help developers, power users, and teams customize Claude AI workflows effectively. The guide...

Microsoft Reduces Dependence on OpenAI and Promises AGI by Year-End

Microsoft is aggressively pursuing self-sufficiency in AI development, aiming to reduce its reliance on OpenAI. Mustafa Suleyman, head of Microsoft AI, announced a massive investment of $140 billion in capital...

OpenAI Updates GPT-5.2 Instant with Clearer Responses

OpenAI has released an update for the GPT-5.2 Instant model, focusing on improving the quality of generated responses. The update makes answers more balanced and places the most important information...

OpenAI Releases GPT-5.3-Codex-Spark with High Speed

OpenAI has introduced GPT-5.3-Codex-Spark, a smaller version of GPT-5.3 designed for high-speed processing with a target speed of 1,000 tokens per second. This means most tasks can complete in under...

Gemini 3 Deep Think: AI Excels in Complex Problem Solving

Google’s Gemini 3 Deep Think AI model demonstrates remarkable capabilities in solving high-level problems typically tackled by top programmers and Olympiad participants. The model achieves state-of-the-art results on the ARC-AGI-2...

Harness Engineering: Leveraging Codex in an Agent-First World

OpenAI’s engineering team has built an internal beta software product with zero lines of manually written code, relying entirely on Codex agents to generate about a million lines of code...
