SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via CI

https://arxiv.org/static/browse/0.3.4/images/arxiv-logo-fb.png
SWE-CI is a repository-level benchmark that assesses how well LLM-powered agents maintain code quality over long-term evolution. It comprises 100 tasks, each spanning on average 233 days of evolution history and 71 commits.

Cloud VM benchmarks 2026

https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihxwrtvu3iezsrccaih2.png
A comprehensive cloud compute VM comparison was conducted, testing 44 instance types across 7 providers, including AWS, GCP, Azure, Oracle, Linode, DigitalOcean, and Hetzner, with a focus on CPU performance and price. The results show AMD EPYC Turin as the top performer, followed by Intel Granite Rapids and Google Axion, with significant performance differences between providers and instance types.

"Warn about PyPy being unmaintained"

https://opengraph.githubassets.com/a7c3e32ce5c10a983fd362a0d6820b5b2ffddd24c0280dcca36daed2f90310be/astral-sh/uv/pull/17643
A commit was pushed to the tmeijn/dotfiles repository referencing a pull request that updated the astral-sh/uv package to version 0.9.27.

From RGB to L*a*b* color space (2024)

https://kaizoudou.com/wp-content/uploads/2024/07/image-2.png
To assess color accuracy between images, convert them to the Lab color space using the XYZ intermediate space and calculate Delta E (ΔE) for objective comparison. The Lab color space separates lightness (L*) from color information (a* and b*), making it ideal for precise color manipulation and comparison.
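
The conversion chain described above can be sketched as follows. This is a minimal illustration, assuming 8-bit sRGB input, a D65 reference white, and the simple CIE76 form of ΔE (Euclidean distance in Lab); the linked article may use a different white point or a later ΔE formula. The function names are chosen here for illustration.

```python
import math

# Assumptions: sRGB primaries and D65 reference white (standard published values).
WHITE_D65 = (0.95047, 1.00000, 1.08883)
SRGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)


def srgb_to_linear(c8):
    """Undo the sRGB transfer curve for one 8-bit channel."""
    c = c8 / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_xyz(rgb):
    """Convert an (R, G, B) triple in 0..255 to CIE XYZ."""
    lin = [srgb_to_linear(c) for c in rgb]
    return tuple(sum(m * v for m, v in zip(row, lin)) for row in SRGB_TO_XYZ)


def xyz_to_lab(xyz):
    """Convert CIE XYZ to L*a*b* relative to the D65 white point."""
    delta = 6 / 29

    def f(t):
        # Cube root above the threshold, linear segment below it.
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip(xyz, WHITE_D65))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def rgb_to_lab(rgb):
    return xyz_to_lab(rgb_to_xyz(rgb))


def delta_e76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two Lab triples."""
    return math.dist(lab1, lab2)
```

Comparing two images then amounts to converting corresponding pixels with `rgb_to_lab` and aggregating `delta_e76` over them, for example as a mean or a maximum.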

CasNum

https://repository-images.githubusercontent.com/1155292460/c3a16c6c-63b3-4762-9a3b-9c86b22e748b
CasNum is a library implementing arbitrary precision arithmetic using compass and straightedge constructions, integrating with a modified Game Boy emulator. It features a viewer showing geometric constructions and allows running games like Pokémon Red using only compass and straightedge operations.

Show HN: Curiosity – DIY 6" Newtonian Reflector Telescope

https://curiosity-telescope.vercel.app/Telescope/Design/04-Optical%20Layout%20cal.png
A Newtonian reflector telescope was built with a 190mm inner diameter PVC tube and a 6" primary mirror, following a design inspired by Stellafane's Guide. The telescope's aperture allows it to gather 625 times more light than the human eye, making it suitable for observing faint stars.
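
A figure like 625 is consistent with the usual area-ratio estimate of light-gathering power, (aperture / pupil)². The sketch below assumes the mirror is rounded to 150 mm and the dark-adapted pupil is taken as 6 mm; both numbers are assumptions made here, not values from the project page.

```python
def light_gathering_ratio(aperture_mm, pupil_mm):
    """Ratio of collecting areas; the pi/4 factors cancel, leaving (D/d)^2."""
    return (aperture_mm / pupil_mm) ** 2


# Assumed inputs: 150 mm aperture (nominal 6"), 6 mm dark-adapted pupil.
ratio = light_gathering_ratio(150.0, 6.0)
print(ratio)
```

Using the exact 152.4 mm of a 6" mirror, or a different pupil diameter, shifts the result somewhat, so the number is best read as an order-of-magnitude guide.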

MonoGame: A .NET framework for making cross-platform games

https://raw.githubusercontent.com/MonoGame/MonoGame.Logo/refs/heads/master/FullColorOnLight/LogoOnly_128px.png
MonoGame is a .NET framework for creating games across desktop, mobile, and console platforms using C#. It supports various platforms and has a growing list of features including Vulkan and DirectX12 graphics support.

How to run Qwen 3.5 locally

https://unsloth.ai/docs/~gitbook/image?url=https%3A%2F%2F3215535692-files.gitbook.io%2F~%2Ffiles%2Fv0%2Fb%2Fgitbook-x-prod.appspot.com%2Fo%2Fspaces%252FxhOjnexMCB3dmuQFQ2Zq%252Fuploads%252F7H0N7guLeBxQJzMTQeJ4%252FScreenshot%25202026-03-05%2520at%25203.59.09%25E2%2580%25AFAM.png%3Falt%3Dmedia%26token%3D3f3c6c7d-e249-409c-b95b-106e430205ee&width=768&dpr=3&quality=100&sign=7ca8413e&sv=2
Qwen3.5 is a new model family from Alibaba with various sizes and capabilities, including multimodal hybrid reasoning and support for 201 languages. It can be used for chat, coding, and long-context tasks.

A decade of Docker containers


Emacs internals: Deconstructing Lisp_Object in C (Part 2)

The author discusses how they approach reading source code, starting from general computation and data representation, and applies this to the GNU Emacs source code, which uses a tagged pointer technique to represent Lisp values in C. This technique is a universal pattern in systems programming, allowing metadata to be stored in unused bits of pointers, and is used in Emacs to implement ...
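
The general tagged-pointer idea can be sketched with plain integers standing in for machine words. This is an illustration only: it assumes 8-byte-aligned addresses (so the low 3 bits are free), and the tag names and values are hypothetical, not the actual Emacs `Lisp_Object` layout.

```python
TAG_BITS = 3                      # assumption: objects are 8-byte aligned
TAG_MASK = (1 << TAG_BITS) - 1    # 0b111

# Hypothetical type tags, for illustration only.
TAG_INT = 0b001
TAG_CONS = 0b011


def make_tagged(address, tag):
    """Pack a type tag into the unused low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address must be aligned to 8 bytes")
    if tag & ~TAG_MASK:
        raise ValueError("tag does not fit in the low bits")
    return address | tag


def tag_of(word):
    """Read the type tag without touching memory."""
    return word & TAG_MASK


def address_of(word):
    """Clear the tag bits to recover the original address."""
    return word & ~TAG_MASK


word = make_tagged(0x7F001000, TAG_CONS)
print(hex(word), tag_of(word) == TAG_CONS, hex(address_of(word)))
```

The appeal of the technique is that a type check is a single mask-and-compare on the word itself, with no extra memory access.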

Dumping Lego NXT firmware off of an existing brick (2025)

The author, a contributor to the Pybricks project, obtained a used Lego NXT running the original 2006 firmware, version 1.01, and wanted to archive it. In the process they discovered an arbitrary code execution vulnerability and exploited it to run native ARM code, which let them read out and dump the firmware and could potentially enable an NXT worm.

Yoghurt delivery women combatting loneliness in Japan

https://ichef.bbci.co.uk/images/ic/480xn/p0n451bl.jpg.webp
In Japan, a network of women delivering probiotic milk drinks, known as Yakult Ladies, has become a vital source of connection and care for the elderly. These women, who are often self-employed, offer a lifeline of human connection and help reduce loneliness in a rapidly ageing population.

Rijksmuseum researchers discover new painting by Rembrandt van Rijn

https://www.rijksmuseum.nl/assets/b90d19e5-efd0-4fca-b717-95be8c403a6f?w=1920&h=1080&fx=2845&fy=5186&c=5bc09d9ebe91012b4eb13928b446a6f9fafaa0114685014b3a0e04fd1abba974
Researchers at the Rijksmuseum have demonstrated that the painting Vision of Zacharias in the Temple (1633) was made by Rembrandt. They examined the work with the same advanced techniques used in Operation Night Watch, and closely compared it with other paintings by Rembrandt from the same period. Materials analysis, stylistic and thematic similarities, alterations made by Rembrandt, and the ...

Show HN: A weird thing that detects your pulse from the browser video

https://pulsefeedback.io/og-image.png
This page responds to your pulse through your camera. No one can see you. Only your heart rate is shared.

Autoresearch: Agents researching on single-GPU nanochat training automatically

https://raw.githubusercontent.com/karpathy/autoresearch/master/progress.png
A researcher, @karpathy, created a project to let AI agents experiment autonomously with a simplified LLM training setup, modifying code and training for 5 minutes at a time. The project uses a single file, train.py, which the agent edits, and a program.md file that humans edit to set up the research org.

The surprising whimsy of the Time Zone Database

https://muddy.jprs.me/media/20260306-203048.png
The author learned to handle time zones by using the IANA Time Zone Database, a resource built by others, rather than writing custom code. The database contains a rich history of time zone changes and whimsical comments.

Best performance of a C++ singleton

https://andreasfertig.com/img/sherlock.png
The author discusses implementing a singleton with performance in mind, comparing two approaches: a block-local static variable and a private static data member. The private static data member is recommended for better performance when a constructor is needed.

In 1985 Maxell built a bunch of life-size robots for its bad floppy ad

https://assets.buttondown.email/images/d74314eb-fcaf-42b2-9927-c01dbf847780.png?w=960&fit=max
Maxell's 1980s ads featured robots eating floppy disks, but the company also created life-size robot props that were displayed in a museum exhibit. The robots were part of a Smart Machines exhibit at The Computer Museum in Boston, which opened in 1987.

Ten years of deploying to production

In 2018 the author worked at a company where the operations team was responsible for production deployments, which happened only every two weeks and delayed fixes. The author introduced a DevOps workflow, creating an internal PyPI repository and establishing a pattern of versioning and code review, which improved the developer experience and reduced friction in production changes.

FLASH radiotherapy's bold approach to cancer treatment

https://spectrum.ieee.org/media-library/photo-of-a-man-in-a-lab-coat-adjusting-a-large-piece-of-medical-equipment-thats-pointed-at-the-head-of-a-partial-mannequin.jpg?id=65111419&width=1200&height=913
Physicists at CERN and other labs are developing FLASH radiotherapy, a new cancer treatment that delivers high doses of radiation in a short burst, reducing damage to healthy tissue. Researchers are refining the technology and expect it to become a routine clinical option in about 10 years, potentially transforming cancer care worldwide.

macOS code injection for fun and no profit (2024)

The author discusses Live++ by Molecular Matters, a C/C++ hot-reload and live-coding solution, and shares a project that injects code into a running process on macOS using Mach APIs. The project modifies a test program's memory, allocates executable memory, and sets up a trampoline to replace a function with new code.

I'm Not Consulting an LLM

https://lr0.org/images/art/2025-05-02_23-26-02_screenshot.png
The author argues that relying on LLMs for information can be intellectually corrosive because it bypasses the experience and critical thinking that come from researching a topic and encountering diverse perspectives. The result can be "intellect-rot", in which one's understanding rests on plausibility rather than deep knowledge of the subject.

Lisp-style C++ template meta programming

https://opengraph.githubassets.com/8deb14cd2cf854c003125cdb4d7bf1ed2c4a972054877790b2b7d24fe0d61621/mistivia/lmp
The code implements a prime number sieve using infinite integers and lazy evaluation. It generates a list of prime numbers starting from 2.
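
The lazy-sieve idea can be illustrated outside of templates with a generator. The sketch below is a Python stand-in for the concept, not a port of the repository's C++ code; it uses an incremental sieve over an unbounded counter, and the function name is chosen here for illustration.

```python
from itertools import count, islice


def primes():
    """Lazily yield primes, starting from 2, from an unbounded integer stream."""
    composites = {}  # maps an upcoming composite to the primes that divide it
    for n in count(2):
        if n not in composites:
            # n was never marked, so it is prime; its first unmarked multiple is n*n.
            yield n
            composites[n * n] = [n]
        else:
            # Slide each witness prime forward to its next multiple.
            for p in composites.pop(n):
                composites.setdefault(n + p, []).append(p)


first_ten = list(islice(primes(), 10))
print(first_ten)
```

Because the generator only advances when a consumer asks for the next value, the "infinite" list is never materialized; `islice` takes as many primes as needed.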

Digital Iris [video]

Files are the interface humans and agents interact with

https://avatars.githubusercontent.com/u/25641936?v=4
The author, a former vector database company employee, notes a shift in the AI ecosystem towards using filesystems for context and memory, citing various companies and researchers adopting this approach.

To the Polypropylene Makers

https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/HQTueNS4mLaGy3BBL/p7iaiua4zcd1zfeyrxqd
During the COVID-19 pandemic, Braskem America workers volunteered to live in their factories for 28 days to produce polypropylene for N95 masks, turning out 40 million pounds. They received full wages and a week off afterward, an example of how creative thinking can fill vital gaps in emergencies.

SigNoz (YC W21) is hiring for engineering, growth and product roles

https://app.ashbyhq.com/api/images/org-theme-logo/2f1f7a19-9719-437c-902f-861cf9096134/fcf65159-ffdc-40fa-9a2f-4b400b3d1493/d51eab50-af4c-4a66-808c-efee755b61e9.png
SigNoz Jobs

Ask HN: Why there are no actual studies that show AI is more productive?

I know there are companies that are highly productive with AI including ours. However, AI skeptics ask for real studies and all of them available now show no real gains. So my question is, are there any actual studies about the companies that actually make it work with AI?

LLM Writing Tropes.md

A single file containing AI writing tropes was created to help AI assistants avoid common patterns in writing, such as overused adverbs, grandiose nouns, and false suspense transitions. The file lists various tropes to avoid, including negative parallelism, superficial analyses, and invented concept labels, to help AI writers produce more human-like and engaging content.

How important was the Battle of Hastings?
