CUDA – Hackaday
https://hackaday.com

Import GPU: Python Programming with CUDA
https://hackaday.com/2025/02/25/import-gpu-python-programming-with-cuda/ (Wed, 26 Feb 2025)

Every few years or so, a development in computing results in a sea change and a need for specialized workers to take advantage of the new technology. Whether that was COBOL in the 60s and 70s, HTML in the 90s, or SQL in the past decade or so, there's always something new to learn in the computing world. The introduction of graphics processing units (GPUs) for general-purpose computing is perhaps the most important recent development in the field, and if you want to develop some new Python skills to take advantage of it, take a look at this introduction to CUDA, the platform that lets developers use Nvidia GPUs for general-purpose computing.

Of course, CUDA is a proprietary platform and requires one of Nvidia's supported graphics cards to run, but assuming that barrier to entry is met, it's not much more effort to use it for non-graphics tasks. The guide takes a closer look at the open-source library PyTorch, which lets a Python developer get up to speed quickly with the features of CUDA that make it so appealing to researchers and developers in artificial intelligence, machine learning, big data, and other frontiers of computer science. It describes how threads are created, how they are organized on the GPU and cooperate with other threads, how memory is managed on both the CPU and the GPU, how CUDA kernels are written, and how everything else involved is managed, largely through the lens of Python.
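As a taste of what that looks like in practice, here's a minimal PyTorch sketch (our own, not taken from the guide) of the explicit CPU-to-GPU memory management the guide walks through: tensors start out in host RAM and have to be moved to the device before any CUDA kernels can operate on them.

```python
import torch

# Use the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.randn(4096, 4096)                  # allocated in host (CPU) memory
b = torch.randn(4096, 4096, device=device)   # allocated directly on the GPU
a = a.to(device)                             # explicit copy from CPU to GPU memory

c = a @ b        # the matrix multiply runs as CUDA kernels on the device
c = c.cpu()      # copy the result back to the host when you actually need it
print(c.shape, device)
```

Each of those one-liners maps onto the thread, kernel, and memory concepts the guide covers in far more depth.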

Getting started with something like this is almost a requirement to stay relevant in the fast-paced realm of computer science, as machine learning has taken center stage in almost everything related to computers these days. It's worth noting that, strictly speaking, an Nvidia GPU is not required for GPU programming like this; AMD has a competing GPU computing platform called ROCm, but despite being open source it still trails Nvidia in adoption and arguably in performance as well. Some other learning tools for GPU programming we've seen in the past include this puzzle-based tool, which illustrates some of the specific problems GPUs excel at.

Learn GPU Programming With Simple Puzzles
https://hackaday.com/2024/09/25/learn-gpu-programming-with-simple-puzzles/ (Wed, 25 Sep 2024)

Have you wanted to get into GPU programming with CUDA but found the usual textbooks and guides a bit too intense? Well, help is at hand in the form of a series of increasingly difficult programming ‘puzzles’ created by [Sasha Rush]. The first part of the simplification is to use the excellent Numba Python JIT compiler, which lets easy-to-understand Python code be deployed as GPU machine code. Working on these puzzles is even easier if you use the linked Google Colab as your programming environment, which drops you straight into a Jupyter notebook with the puzzles laid out. You can use your own GPU if you have one, but that setup isn't covered.

The puzzles start by assuming you know nothing at all about GPU programming, which is totally the case for some of us! What's really nice is the way the result of each program is displayed, showing graphically how data are read from the input arrays and written to the output arrays you're working with. Each essential concept of CUDA programming is introduced one at a time with a real programming example, making it a breeze to follow along. Just make sure you don't watch the video below all the way through the first time, as in it [Sasha] explains all the solutions!
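To give a flavor of where the puzzles lead (this sketch is ours, not one of the actual puzzle solutions), the early exercises boil down to writing small Numba CUDA kernels like this one, with one GPU thread per array element:

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_ten(inp, out):
    i = cuda.grid(1)          # this thread's global index in the launch grid
    if i < inp.size:          # guard threads that fall past the end of the array
        out[i] = inp[i] + 10

inp = np.arange(32, dtype=np.float32)
out = np.zeros_like(inp)

threads_per_block = 16
blocks = (inp.size + threads_per_block - 1) // threads_per_block
add_ten[blocks, threads_per_block](inp, out)   # Numba handles the host/device copies

print(out[:4])   # [10. 11. 12. 13.]
```

The later puzzles layer on shared memory, 2D grids, and reductions, but they all follow this same pattern of mapping threads onto data.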

Confused about why you'd want to do this? Then perhaps check out our guide to CUDA first. We know what you're thinking: how do we use non-NVIDIA hardware? Well, there's SCALE for that! Finally, once you understand CUDA, why not have a play with WebGPU?

CUDA, But Make It AMD
https://hackaday.com/2024/07/16/cuda-but-make-it-amd/ (Tue, 16 Jul 2024)

Compute Unified Device Architecture, or CUDA, is a software platform for doing big parallel calculation tasks on NVIDIA GPUs. It's been a big part of the push to use GPUs for general-purpose computing, and in some ways competitor AMD has been left out in the cold as a result. However, with more demand for GPU computation than ever, there's been a breakthrough: SCALE from [Spectral Compute] will let you compile CUDA applications for AMD GPUs.

SCALE allows CUDA programs to run as-is on AMD GPUs, without modification. The SCALE compiler is also intended as a drop-in replacement for nvcc, right down to the command-line options. For maximum ease of use, it behaves as if you had installed the NVIDIA CUDA Toolkit, so you can build with CMake just as you would for a normal NVIDIA setup. Currently, Navi 21 and Navi 31 (RDNA 2 and RDNA 3) targets are supported, while a number of other GPUs are undergoing testing and development.

The basic aim is to allow developers to use AMD hardware without having to maintain an entirely separate codebase. It’s still a work in progress, but it’s a promising tool that could help break NVIDIA’s stranglehold on parts of the GPGPU market.

NeRF: Shoot Photos, Not Foam Darts, to See Around Corners
https://hackaday.com/2022/06/22/nerf-shoot-photos-not-foam-darts-to-see-around-corners/ (Thu, 23 Jun 2022)

Readers are likely familiar with photogrammetry, a method of creating 3D geometry from a series of 2D photos taken of an object or scene. To pull it off you need a lot of pictures, hundreds or even thousands, all taken from slightly different perspectives. Unfortunately, the technique struggles where significant occlusions are caused by overlapping elements, and shiny or reflective surfaces that appear to be different colors in each photo can also cause problems.

But new research from NVIDIA marries photogrammetry with artificial intelligence to create what the developers are calling an Instant Neural Radiance Field (NeRF). Not only does their method require far fewer images, as few as a few dozen according to NVIDIA, but the AI is better able to cope with the pain points of traditional photogrammetry: filling in the gaps in occluded areas and leveraging reflections to create more realistic 3D scenes that reconstruct how shiny materials looked in their original environment.


If you've got a CUDA-compatible NVIDIA graphics card in your machine, you can give the technique a shot right now. The tutorial video after the break will walk you through setup and some of the basics, showing how the 3D reconstruction is progressively refined over just a couple of minutes and can then be explored like a scene in a game engine. The Instant-NeRF tools include camera-path keyframing for exporting animations with higher-quality results than the real-time previews. The technique seems better suited to outputting views and animations than models for 3D printing, though both are possible.

Don’t have the latest and greatest NVIDIA silicon? Don’t worry, you can still create some impressive 3D scans using “old school” photogrammetry — all you really need is a camera and a motorized turntable.

What Kind of GPU Are You?
https://hackaday.com/2021/07/22/what-kind-of-gpu-are-you/ (Fri, 23 Jul 2021)

In the old days, big computers often had some form of external array processor. The idea was that you could load a bunch of numbers into the processor and then do math operations on all of them in parallel. These days, you are more likely to turn to your graphics card for number-crunching support. You'll usually use some library to help you do that, but things are always better when you understand what's going on under the hood. That's why we enjoyed [RasterGrid's] post on GPU architecture types.

If you can already tell the difference between IMR (immediate mode rendering) and TBR (tile-based rendering) GPUs, this might not be the post for you. But while we knew the terms, we found a lot of interesting detail, including some graphics and pseudocode that clarify the key differences.

Which architecture is better? As the post points out, that depends on how you define better. Each can boast that it is better at something, but the flip side is that each is also worse at something else. In general, IMR GPUs wind up in desktop computers, while mobile devices tend towards TBR. It also depends on the specific task you ask of the GPU.

Granted, you normally don't need to know any of this. For graphics, you are probably not directly controlling the device, and for computation you will likely use CUDA or OpenCL. But while you don't need to understand an engine to drive a car, the best-performing drivers do know how one works.

Deep Learning Enables Intuitive Prosthetic Control
https://hackaday.com/2021/05/27/deep-learning-enables-intuitive-prosthetic-control/ (Fri, 28 May 2021)

Prosthetic limbs have been slow to evolve from simple motionless replicas of human body parts to moving, active devices. A major part of this is that controlling the many joints of a prosthetic is no easy task. However, researchers have worked to simplify the problem by capturing nerve signals and letting deep learning routines figure out the rest.

The prosthetic arm under test carries an NVIDIA Jetson Nano onboard to run the AI nerve-signal decoder.

Reported in a preprint paper, the work used implanted electrodes to capture signals from the median and ulnar nerves in the forearm of Shawn Findley, who had lost a hand in a machine shop accident 17 years prior. An AI decoder was then trained on an NVIDIA Titan X GPU to decipher the signals from those electrodes.

With this done, the decoder model could be run on a significantly more lightweight system: an NVIDIA Jetson Nano, which is small enough to mount on the prosthetic itself. This allowed Findley to control a prosthetic hand by thought, without needing to be tethered to any external equipment. The system also allowed for intuitive control of Far Cry 5, which sounds like a fun time as well.

The research is exciting, and yet another step towards full-function prosthetics becoming a reality. The key to the technology is that models can be trained on powerful hardware but then run on much lower-end single-board computers, sparing prosthetic users from carrying around bulky hardware just to make the nerve interface work. If it can be combined with a non-invasive nerve interface, expect this technology to explode in use around the world.
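The paper doesn't include its code, but the train-big, run-small workflow looks roughly like this PyTorch sketch, where the Decoder network, its dimensions, and the file name are placeholders rather than the authors' actual model:

```python
import torch
import torch.nn as nn

# Placeholder decoder standing in for the paper's nerve-signal model.
class Decoder(nn.Module):
    def __init__(self, channels=16, classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(channels, 64),
            nn.ReLU(),
            nn.Linear(64, classes),
        )

    def forward(self, x):
        return self.net(x)

# On the big GPU (e.g. a Titan X): train the model, then freeze it to TorchScript.
model = Decoder()
# ... training loop omitted ...
scripted = torch.jit.script(model.eval())
scripted.save("decoder.pt")

# On the Jetson Nano (CUDA required): load the frozen model and run inference only.
deployed = torch.jit.load("decoder.pt").to("cuda")
with torch.no_grad():
    gesture = deployed(torch.randn(1, 16, device="cuda")).argmax(dim=1)
print(gesture)
```

The heavy lifting happens once on the workstation; the prosthetic only ever has to run the lightweight forward pass.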

[Thanks to Brian Caulfield for the tip!]

NVIDIA Announces $59 Jetson Nano 2GB, a Single Board Computer with Makers in Mind
https://hackaday.com/2020/10/05/nvidia-announces-59-jetson-nano-2gb-a-single-board-computer-with-makers-in-mind/ (Mon, 05 Oct 2020)

NVIDIA kicked off their line of GPU-accelerated single board computers back in 2014 with the Jetson TK1, a $200 USD development system for those looking to get involved with the burgeoning world of so-called “edge computing”. It was designed to put high-performance computing in a package small and energy-efficient enough that it could be integrated directly into products, rather than relying on a data center halfway across the world.

The TK1 was an impressive piece of hardware, but not something the hacker and maker community was necessarily interested in. For one thing, it was fairly expensive. But perhaps more importantly, it was clearly geared more towards industry types than consumers. We did see the occasional project using the TK1 and the subsequent TX1 and TX2 boards, but they were few and far between.

Then came the Jetson Nano. Its 128-core Maxwell GPU still packed plenty of power and was fully compatible with NVIDIA's CUDA architecture, but its smaller size and $99 price tag made it far more attractive for hobbyists. According to the company's own figures, the number of active Jetson developers has more than tripled since the Nano's introduction in March of 2019. With the platform accessible to a larger and more diverse group of users, new and innovative applications for machine learning started pouring in.

Cutting the price of the entry level Jetson hardware in half was clearly a step in the right direction, but NVIDIA wanted to bring even more developers into the fray. So why not see if lightning can strike twice? Today they’ve officially announced that the new Jetson Nano 2GB will go on sale later this month for just $59. Let’s take a close look at this new iteration of the Nano to see what’s changed (and what hasn’t) from last year’s model.

Trimming the Fat

To be clear, the new Jetson Nano 2GB is not a new device; it's essentially a cost-optimized version of the hardware that was released back in 2019. It's still the same size, draws the same amount of power, and has the exact same Maxwell GPU. In broad terms, it's a drop-in replacement for the more expensive Nano. In fact, it's so similar that you might not even be able to tell the difference between the two models at first, especially since the biggest change isn't visible at all: as the name implies, the new model has only two gigabytes of RAM compared to four in the original Nano.

The board has lost a few ports as part of the effort to get it down to half the original price, however. The Nano 2GB drops the DisplayPort connector and keeps only HDMI (the previous version had both), deletes the second CSI camera connector, does away with the M.2 slot, and reduces the number of USB ports from four to three. Losing a USB port probably isn't a deal breaker for most applications, but if you need high-speed data, it's worth noting that only one of the remaining ports is USB 3.0. Overall, it seems clear that NVIDIA took a close look at the sort of devices folks were connecting to their Nanos and adjusted the type and number of ports accordingly.

Of course, the 40 pin header on the side remains unchanged so the new board should remain pin-compatible with anything you’ve already built. The Gigabit Ethernet port is still there, but unfortunately wireless still didn’t make the cut this time around. So if you need WiFi for your project, count on one of those USB ports being permanently taken up with a dongle.

It's not just slimmed down, but updated as well. The 2GB removes the old-school DC barrel jack and replaces it with a USB-C port. On the original Nano you could run off the micro USB port for most tasks, but a laptop-style power supply was recommended if you were going to be pushing the hardware. Now you can just use a 15 watt USB-C power supply and be covered in all situations.

A Tight Squeeze

Since the hardware is nearly identical between the two versions of the Nano, there's really no point in running new benchmarks. If your software worked on the $99 Nano, it will run just as well on the $59 one. Or at least, that's the idea. In reality, having only half the RAM available might be a problem for some applications.

NVIDIA sent me a review unit, so as a simple test I ran the detectnet.py script that makes up part of NVIDIA's AI training course on live video from a Logitech C270 camera. While the Nano maintained a respectable 22 to 24 frames per second, the system ran out of RAM almost immediately and had to dip into swap to keep up. Naturally that's pretty hard on an SD card, and certainly not something you'd want to do for any extended period of time unless you happen to own SanDisk stock.
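For context, the heart of that demo is a loop along these lines (a sketch based on the jetson-inference Python bindings, not the exact script NVIDIA ships):

```python
import jetson.inference
import jetson.utils

# Load a pretrained SSD-MobileNet detector; the network weights share the Nano's RAM too.
net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)

camera = jetson.utils.videoSource("/dev/video0")   # USB webcam, e.g. the Logitech C270
display = jetson.utils.videoOutput("display://0")  # render to the attached screen

while display.IsStreaming():
    img = camera.Capture()           # frames land in CUDA-mapped memory
    detections = net.Detect(img)     # run inference and overlay bounding boxes
    display.Render(img)
    display.SetStatus("{:d} objects | {:.0f} FPS".format(len(detections), net.GetNetworkFPS()))
```

There's nothing exotic in that loop; it's the model, the camera buffers, and the desktop environment all competing for the same 2 GB that pushes the board into swap.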

To help combat this memory pressure, NVIDIA recommends disabling the GUI on the Nano 2GB and running headless if you're planning on doing any computationally intensive tasks. That should save you 200 to 300 MB of memory, but obviously isn't going to work in every situation. It's also a bit counter-intuitive, considering the default Ubuntu 18.04 system image boots directly into a graphical environment. It'll be interesting to see if some lightweight operating system choices are offered down the line to help address this issue.

Rise of the Machines

It’s probably not fair to call the Jetson Nano 2GB a direct competitor to the Raspberry Pi, but clearly NVIDIA wants to close the gap. While the lack of built-in WiFi and Bluetooth will likely give many makers pause, there’s no question that the Nano will run circles around the Pi 4 if you’re looking to experiment with things like computer vision. At $99 that might not have mattered for budget-conscious hardware hackers, but now that the Nano is essentially the same price as the mid-range Pi 4, it’s going to be a harder decision to make.

Especially since NVIDIA is using the release of the new board to help kick off the Jetson AI Certification Program. This free, self-paced course comprises tutorials and video walkthroughs that cover everything from the fundamentals of training up to practical applications like collision avoidance and object following. To complete the Jetson AI Specialist course and be granted the certification, applicants have to submit an open source project to NVIDIA's Community Projects forum for review and approval.

If you wanted to get your feet wet with AI and machine learning, picking up a Jetson Nano at $99 was already a great choice. Now that there's a $59 version that includes access to a training and certification program, there's barely even a choice left to make. Which, in the end, is exactly what NVIDIA wants.
