gpu – Hackaday
https://hackaday.com
Fresh hacks every day

Import GPU: Python Programming with CUDA
https://hackaday.com/2025/02/25/import-gpu-python-programming-with-cuda/ (Wed, 26 Feb 2025)

Every few years or so, a development in computing results in a sea change and a need for specialized workers to take advantage of the new technology. Whether that’s COBOL in the 60s and 70s, HTML in the 90s, or SQL in the past decade or so, there’s always something new to learn in the computing world. The introduction of graphics processing units (GPUs) for general-purpose computing is perhaps the most important recent development of the bunch, and if you want to pick up some new Python skills to take advantage of it, take a look at this introduction to CUDA, the platform that lets developers use Nvidia GPUs for general-purpose computing.

Of course CUDA is a proprietary platform and requires one of Nvidia’s supported graphics cards to run, but assuming that barrier to entry is met, it’s not too much more effort to use it for non-graphics tasks. The guide takes a closer look at the open-source library PyTorch, which lets a Python developer quickly get up to speed with the features of CUDA that make it so appealing to researchers and developers in artificial intelligence, machine learning, big data, and other frontiers of computer science. The guide covers how threads are created, how they move through the GPU and cooperate with other threads, how memory is managed on both the CPU and the GPU, how CUDA kernels are written, and how everything else fits together, largely through the lens of Python.
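As a taste of what that looks like in practice, here’s a minimal PyTorch sketch (our own, not from the guide) that moves work onto an Nvidia GPU when one is available; the matrix sizes and the matrix-multiply workload are just placeholders:

```python
import torch

# Pick the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")

# Allocate two matrices directly on the chosen device.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# On a GPU this launches CUDA kernels behind the scenes; on a CPU it runs normally.
c = a @ b

# Copy the result back to host memory when the CPU side needs it.
print(c.cpu().shape)
```

The nice thing about this level of abstraction is that the same script runs unmodified on machines with or without a supported GPU, which makes it a gentle on-ramp before dropping down to raw kernels.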

Getting started with something like this is almost a requirement for staying relevant in the fast-paced realm of computer science, as machine learning has taken center stage in almost everything related to computers these days. It’s worth noting that, strictly speaking, an Nvidia GPU is not required for GPU programming like this; AMD has a GPU computing platform called ROCm, but despite being open source it still trails Nvidia in adoption and arguably in performance as well. Some other learning tools for GPU programming we’ve seen in the past include this puzzle-based tool, which illustrates some of the specific problems GPUs excel at.

Asahi Linux Brings Better Gaming to Apple Silicon
https://hackaday.com/2024/10/29/asahi-linux-brings-better-gaming-to-apple-silicon/ (Tue, 29 Oct 2024)

[Image: Portal 2 running under Asahi Linux, with the Asahi Linux logo watermarked in the bottom right corner.]

For those of you longing for better gaming on an Apple Silicon device, Asahi Linux is here to help.

While Apple’s own CPUs are relatively new kids on the block, they’ve still been around for four years now, giving hackers ample time to dissect their innards. The team behind Asahi Linux has now brought us “the only conformant OpenGL®, OpenCL™, and Vulkan® drivers” for Apple’s M1 and M2.

The emulation overhead of the system means that most games will need at least 16 GB of RAM to run. Many games are playable, but newer titles can’t yet hit 60 frames per second. The developers are currently focused on “correctness” and hope to improve performance in future updates, though many indie titles are reported to already run at full speed.

You can hear more about some of the fiddly bits of how to “tessellate with arcane compute shaders” in the video below. Don’t worry, it’s only 40 minutes of the nine-hour video, and it should start right at the presentation by GPU dev [Alyssa Rosenzweig].

If you want to see some of how Linux on Apple Silicon started or some of the previous work on hacking the M1 GPU, we have you covered.

C64 Gets a Graphics Upgrade Courtesy Of Your Favorite Piano Manufacturer
https://hackaday.com/2024/10/11/c64-gets-a-graphics-upgrade-courtesy-of-your-favorite-piano-manufacturer/ (Fri, 11 Oct 2024)

The Commodore 64 was quite a machine in its time, though a modern assessment would say that it’s severely lacking in the graphical department. [Vossi] has whipped up a bit of an upgrade for the C64 and C128, in the form of a graphics expansion card running Yamaha hardware.

As you might expect, the expansion is designed to fit neatly into a C64 cartridge slot. The card is built around the Yamaha V9958, the video display processor known for its appearance in the MSX2+ computers. In this case, it’s paired with a healthy 128 kB of video RAM so it can really do its thing. The V9958 has an analog RGB output that can be set for PAL or NTSC operation, and it can manage resolutions up to 512×212, or even 512×424 interlaced. Naturally, it needs to be hooked up directly to a compatible screen, like a 1084 or anything with a SCART input. [Vossi] took the time to create some demos of the chip’s capabilities, drawing various graphics that the C64 couldn’t readily achieve on its own.
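As a rough back-of-the-envelope check (our arithmetic, and the bits-per-pixel figures are assumptions for illustration rather than V9958 mode specifications), here’s how those resolutions stack up against the card’s 128 kB of video RAM:

```python
# Frame buffer size vs. the 128 kB of VRAM on the card.
VRAM_BYTES = 128 * 1024

def framebuffer_bytes(width, height, bits_per_pixel):
    """Raw frame buffer size in bytes for a given mode."""
    return width * height * bits_per_pixel // 8

for w, h, bpp in [(512, 212, 4), (512, 212, 8), (512, 424, 4)]:
    size = framebuffer_bytes(w, h, bpp)
    verdict = "fits" if size <= VRAM_BYTES else "does not fit"
    print(f"{w}x{h} @ {bpp} bpp: {size:,} bytes ({verdict})")
```

At the assumed depths, even the interlaced mode fits comfortably within the 128 kB, so the chip has headroom beyond a single bare frame buffer.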

It’s a build that almost feels like it’s from an alternate universe where Yamaha decided to whip up a third-party graphics upgrade for the C64. That didn’t happen, but stranger team-ups have occurred over the years.

[Thanks to Stephen Walters for the tip!]

Learn GPU Programming With Simple Puzzles
https://hackaday.com/2024/09/25/learn-gpu-programming-with-simple-puzzles/ (Wed, 25 Sep 2024)

Have you wanted to get into GPU programming with CUDA but found the usual textbooks and guides a bit too intense? Well, help is at hand in the form of a series of increasingly difficult programming ‘puzzles’ created by [Sasha Rush]. The first part of the simplification is to use the excellent Numba Python JIT compiler, which lets easy-to-understand Python code be compiled down to GPU machine code. Working through the puzzles is even easier if you use the linked Google Colab as your programming environment, which drops you straight into a Jupyter notebook with the puzzles laid out. You can use your own GPU if you have one, but that path isn’t detailed.
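For a flavor of what the early puzzles ask for, here’s a minimal Numba CUDA kernel in the same spirit (our own illustrative sketch, not one of the puzzle solutions) that adds a constant to every element of an array, one thread per element:

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_ten(inp, out):
    i = cuda.grid(1)          # Which element this thread is responsible for.
    if i < inp.size:          # Guard threads that fall past the end of the array.
        out[i] = inp[i] + 10

a = np.arange(32, dtype=np.float32)
result = np.zeros_like(a)

# Launch one block of 32 threads: enough to cover every element exactly once.
add_ten[1, 32](a, result)
print(result)
```

The launch configuration in square brackets (blocks, then threads per block) is exactly the kind of detail the puzzles build up to, starting from examples this small.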

The puzzles start by assuming you know nothing at all about GPU programming, which is totally the case for some of us! What’s really nice is the way the result of each program is displayed, showing graphically how data are read from and written to the input and output arrays you’re working with. Each essential concept of CUDA programming is introduced one at a time with a real programming example, making it a breeze to follow along. Just make sure you don’t watch the video below all the way through on your first pass, because in it [Sasha] explains all the solutions!

Confused about why you’d want to do this? Then perhaps check out our guide to CUDA first. We know what you’re thinking: how do we use non-Nvidia hardware? Well, there’s SCALE for that! Finally, once you understand CUDA, why not have a play with WebGPU?

Hackaday Links: September 15, 2024
https://hackaday.com/2024/09/15/hackaday-links-september-15-2024/ (Sun, 15 Sep 2024)

A quick look around at any coffee shop, city sidewalk, or sadly, even at a traffic light will tell you that people are on their phones a lot. But exactly how much is that? For Americans in 2023, it was a mind-boggling 100 trillion megabytes, according to the wireless industry lobbying association CTIA. The group doesn’t discuss its methodology in the press release, so it’s a little hard to judge that number’s veracity, or the other figures it bandies about, such as the 80% increase in data usage since 2021, or the claim that 40% of data now goes over 5G connections. Some of the numbers are more than a little questionable, too, such as the claim that 330 million Americans (out of a current estimate of 345.8 million people) are covered by one or more 5G networks. Even if you figure that most 5G installations are in densely populated urban areas, 95% coverage seems implausible given that in 2020, 57.5 million people lived in rural areas of the USA. Regardless of the details, the fact remains that our networks are positively humming with data, and keeping things running is no mean feat.
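To put those figures in perspective, a quick back-of-the-envelope calculation (ours, using only the numbers quoted above) works out to a bit under a gigabyte of mobile data per person per day, and confirms that the coverage claim amounts to roughly 95% of the population:

```python
# Sanity-checking the CTIA figures quoted above (their numbers, our arithmetic).
total_mb = 100e12          # 100 trillion megabytes of US wireless data in 2023
population = 345.8e6       # current US population estimate cited above
covered_by_5g = 330e6      # people CTIA says one or more 5G networks cover

gb_per_person_per_day = total_mb / 1e3 / population / 365
coverage_fraction = covered_by_5g / population

print(f"~{gb_per_person_per_day:.2f} GB of mobile data per person per day")
print(f"claimed 5G coverage: {coverage_fraction:.0%} of the population")
```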

If you’ve ever wondered what one does with a degree in wildlife biology, look no further than a study that looks into “avian-caused ignitions” of wildfires. The study was led by Taylor Barnes, a wildlife biologist and GIS specialist who works for a civil engineering firm, and concludes that some utility poles are 5 to 8 times more likely to spark a wildfire than the average pole due to “thermal events” following electrocution of a bird, squirrel, bear, or idiot. Unfortunately, the paper is paywalled, so there’s no information on methodology, but we’re guessing a grad student or intern spent a summer collecting animal carcasses from beneath power poles. It’s actually very valuable work since it informs decisions on where to direct wildlife mitigation efforts that potentially reduce the number of service outages and wildfires, but it’s still kinda funny.

From the “How to get rid of a lot of money in a hurry” files comes a story of a bad GPU made into an incredibly unattractive purse. About the only thing good about the offering, which consists of a GeForce GT 730 video card stuffed into a clear plastic box with a gold(ish) chain attached, is the price of $1,024. The completely un-dodgy GPUStore Shopify site also lists a purse fashioned from an NVIDIA H100 Tensor Core GPU for a cool $65,536. At least somebody knows about base two.

And finally, if you’ve struggled with the question of what humanoid robots bring to the table, chances are pretty good that adding the ability to fly with four jet engines isn’t going to make things much clearer. But for some reason, a group from the Italian Institute of Technology is working on the problem of “aerial humanoid robotics” with a cherub-faced bot dubbed iRonCub. The diminutive robot weighs only about 70 kilograms, which includes the four jet engines generating a total of 1,000 newtons of thrust. Applications for the flying baby robot are mostly left to the imagination, although there is a vague reference to “search and rescue” applications; we’re not sure about you, but if we’re lost in the woods and half-crazed from hunger and exposure, a baby descending from the sky on a 600° plume of exhaust might not be the most comforting sight.

Startup Claims it Can Boost CPU Performance by 2-100X
https://hackaday.com/2024/06/12/startup-claims-it-can-boost-cpu-performance-by-2-100x/ (Thu, 13 Jun 2024)

Although Moore’s Law has slowed a bit as chip makers reach the physical limits of transistor size, researchers are having to look to approaches other than cramming more transistors onto a chip to increase CPU performance. ARM is having a bit of a moment by improving the performance-per-watt of many computing platforms, but other ideas need to come to the forefront to make any big pushes in this area. A startup called Flow Computing claims it can improve modern CPUs by a significant amount with a slight change to their standard architecture.

It hopes to make these improvements by adding a parallel processing unit, which it calls the “back end”, to a more-or-less standard CPU, the “front end”. The two units would sit on the same chip, with a shared bus allowing them to communicate extremely quickly, so the front end can rapidly offload tasks that are better suited to parallel processing onto the back end. Since the front end retains essentially the same components as a modern CPU, the startup hopes to maintain backwards compatibility with existing software while allowing developers to optimize for the new parallel computing unit when needed.

While we’ll take a step back and refrain from claiming this is the future of computing until we see some results and maybe a prototype or two, the idea does show some promise. It’s similar to ARM systems that have multiple cores optimized for different tasks, or to machines that offload non-graphics work to a GPU, which is better suited to parallel processing. Even the Raspberry Pi is starting to take advantage of external GPUs for tasks like these.

Retrogadgets: The Ageia PhysX Card
https://hackaday.com/2024/05/06/retrogadgets-the-ageia-physx-card/ (Mon, 06 May 2024)

Old computers meant for big jobs often had an external unit to crunch data in specific ways. A computer doing weather prediction, for example, might have a SIMD (single instruction, multiple data) vector unit that could multiply a bunch of numbers by a constant in one swoop. These days, there are many computers crunching physics equations so you can play your favorite high-end computer game. Instead of vector processors, we have video cards. These cards have many processing units that can execute “kernels”, or small programs, on large groups of data at once.
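That “multiply a bunch of numbers by a constant in one swoop” idea is easy to see from Python, where a library like NumPy hands the whole array to vectorized routines instead of looping over elements one at a time. This little comparison is only an illustration of the programming model, not of any particular hardware:

```python
import numpy as np

values = np.arange(1_000_000, dtype=np.float64)

# Scalar style: one multiply at a time, the way a naive CPU loop would do it.
scaled_slow = np.empty_like(values)
for i in range(values.size):
    scaled_slow[i] = values[i] * 9.81

# Vector style: one expression, and the whole array goes through optimized
# routines under the hood -- the same "one instruction, many data" idea.
scaled_fast = values * 9.81

assert np.allclose(scaled_slow, scaled_fast)
```

GPU kernels take the same idea further by running thousands of those per-element operations concurrently across their many processing units.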

Awkward Years

However, there was that awkward in-between stage when personal computers needed fast physics simulation, but it wasn’t feasible to put array processing and video graphics on the same board. Around 2006, a company called Ageia produced the PhysX card, which promised to give PCs the ability to do sophisticated physics simulations without relying on a video card.

Keep in mind that when this card was built, multi-core CPUs were an expensive oddity and games were struggling to manage everything they needed to with limited memory and compute resources. The PhysX card was a “PPU”, or Physics Processing Unit, and used the PCI bus. Like many companies, Ageia made the chips and expected other companies, notably Asus, to make the actual board you’d plug into your computer.

The Technology

The chip had 125 million transistors on a 0.13 micron process. With 128 megabytes of 733 MHz GDDR3 memory on board, the card needed an extra power connector and could draw 20 watts. The price was around $300, which was quite a bit for a card that did absolutely nothing without specialized software.

There was a physics engine, NovodeX, that could handle game physics for developers using either the chip or a software stack, so we presume that’s what most game makers would use.

Of course, today a 20 watt GPU with an extra power connector isn’t enough to make you look up from your screen. But times were different then. According to contemporary reports, the chip had a two terabit per second memory bandwidth. Watch the demo video below. It won’t knock your socks off, but for a computer system from nearly twenty years ago, it was pretty good.

Aftermath

So what happened? Well, the company caused quite a stir, although it isn’t clear how many people ponied up to get better performance in a handful of games. The boards were only a thing for about two years. Ultimately, Nvidia bought Ageia and adapted its technology to run on Nvidia hardware, so some part of it lives on today as software, and you might still find games that boast extra PhysX features.

If you want to see a direct comparison of before and after hardware acceleration, check out the video below. Don’t forget to note the frame rates in the bottom right corner.

These days, you are more likely to get heavy processing done via CUDA or OpenCL. While GPU architectures vary, any of them will outperform this early entry into hardware acceleration.
