MAGIC (Magic Accelerates via General Intercept-based Caching)
This Python module MAGICally mitigates performance issues on BWUni caused by a faulty OS cache configuration.
The term "HOCUSPOCUS" (Hocuspocus Overcomes Configuration Upsies for Superior Performance: Optimized Caching via Intercepting Userspace Syscalls) finds its origins in a misinterpretation from the 17th century, rooted in the Latin phrase "Hoc est corpus" used during the Catholic Mass to signify the transformation of bread into the Body of Christ. To those unfamiliar with Latin, this sacred invocation sounded like mystical jargon, which they mockingly or mistakenly transformed into "hocus pocus."
❗ This project is still very WIP and leads to crashes in many situations. At this point it is still completely unusable.
Function
hocuspocus(inform_cache_hit=False, inform_cache_miss=False, inform_cache_write=True, filetype_whitelist=".py,.pyc,.so,.dll", file_blacklist="")
Configures the caching mechanism, which uses RAM and the local SSD. By default, it caches specific file types (.py, .pyc, .so, .dll) and reports cache writes. Call hocuspocus() at the beginning of your script to initialize the caching mechanism. It needs to be called before importing other packages. (Or more accurately: any code before magic.hocuspocus() will be run twice, so it must not have any side effects!) If you want to customize the caching behavior, use the parameters provided.
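The "run twice" caveat is consistent with a re-exec pattern: LD_PRELOAD only takes effect when a process starts, so a library that wants to preload itself has to restart the interpreter with the variable set. The following is a minimal sketch of that pattern, not the package's actual code. The marker variable HOCUSPOCUS_ACTIVE, the helper names, and the library path are all hypothetical.

```python
import os
import sys

# Hypothetical marker variable; the real package may use a different mechanism.
MARKER = "HOCUSPOCUS_ACTIVE"


def preload_env(env, lib_path):
    """Return a copy of env with lib_path prepended to LD_PRELOAD and the marker set."""
    new_env = dict(env)
    existing = new_env.get("LD_PRELOAD", "")
    new_env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    new_env[MARKER] = "1"
    return new_env


def needs_reexec(env):
    """True if the process was started without the marker, i.e. without the preload."""
    return env.get(MARKER) != "1"


def hocuspocus_sketch(lib_path):
    """Restart the interpreter once with LD_PRELOAD set (Linux only)."""
    if needs_reexec(os.environ):
        # execve replaces the current process, so the script starts over from
        # its first line and everything above this call runs a second time.
        os.execve(
            sys.executable,
            [sys.executable] + sys.argv,
            preload_env(os.environ, lib_path),
        )
```

On the second run the marker is already set, so the call returns immediately and the rest of the script executes with the library loaded. Any side effect placed before the call (writing a file, starting a job) would therefore happen twice.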
Parameters:
inform_cache_hit: Boolean flag to print cache hits (default: False).
inform_cache_miss: Boolean flag to print cache misses (default: False).
inform_cache_write: Boolean flag to print cache writes (default: True).
filetype_whitelist: Comma-separated string of file extensions to cache (default: ".py,.pyc,.so,.dll").
file_blacklist: Comma-separated string of file paths to exclude from caching (default: "").
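To make the two list parameters concrete, here is a small sketch of how such comma-separated strings could be turned into a caching decision. The semantics are assumed (exact path match for the blacklist, final extension match for the whitelist), and parse_list and should_cache are hypothetical helpers, not functions exported by the package.

```python
import os


def parse_list(spec):
    """Split a comma-separated string into a set, ignoring empty entries."""
    return {item.strip() for item in spec.split(",") if item.strip()}


def should_cache(path, filetype_whitelist=".py,.pyc,.so,.dll", file_blacklist=""):
    """Decide whether a path qualifies for caching (assumed semantics)."""
    # Assumption: blacklist entries are compared as exact paths.
    if path in parse_list(file_blacklist):
        return False
    # Assumption: only the final extension is compared against the whitelist.
    extension = os.path.splitext(path)[1]
    return extension in parse_list(filetype_whitelist)
```

Under these assumed rules a versioned shared library such as libfoo.so.6 would not match ".so", because its final extension is ".6". Whether the real implementation handles that case is not stated here.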
Example Usage
import magic
magic.hocuspocus()
import numpy as np
import torch as th
# Your code here
How It Works
It's actually not magic... Python relies on the OS cache to keep actively used modules in RAM for quick access. However, an issue at BWUni causes these modules to be evicted repeatedly, leading to frequent and unnecessary reloads that need to pass through the internal network backbone. ATIS has confirmed this as the underlying issue but has yet to find a solution.
We make use of the LD_PRELOAD trick to inject a shared library (hocuspocus.so) into the Python process. This library overrides some standard file-related library functions (the userspace wrappers around system calls) with our custom implementations. These intercept all file operations, allowing us to listen in and, when a file passes our whitelist and blacklist, forge the returned file descriptor so that it points to a copy in a local cache, which we populate automatically, instead of the original file. (OS-level VFS caching also works on these cached copies, so we get a two-level RAM/SSD cache overall.) This explicit caching prevents the erroneous evictions caused by the misconfiguration: once a module has been loaded from the cache, it remains quickly accessible, which reduces the overhead of repeated file loading and significantly improves performance.
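The redirect-to-cache idea can be illustrated in plain Python. This is only a sketch of the concept: the real interception happens inside hocuspocus.so at the library-call level, and the function names, the hashing scheme, and the cache layout below are assumptions made for illustration.

```python
import hashlib
import os
import shutil


def cached_copy(original, cache_dir):
    """Return (path, hit) for a copy of original inside cache_dir.

    hit is False when the file had to be copied into the cache first.
    """
    os.makedirs(cache_dir, exist_ok=True)
    # Key the entry on the absolute path so equal basenames do not collide.
    key = hashlib.sha256(os.path.abspath(original).encode()).hexdigest()
    target = os.path.join(cache_dir, key + os.path.splitext(original)[1])
    if os.path.exists(target):
        return target, True  # cache hit
    shutil.copyfile(original, target)  # cache miss followed by a cache write
    return target, False


def open_via_cache(original, cache_dir, mode="rb"):
    """Open the cached copy instead of the original file."""
    path, _hit = cached_copy(original, cache_dir)
    return open(path, mode)
```

The caller asks for the original path but receives a handle to the local copy, which mirrors the forged file descriptor described above. The sketch never invalidates entries, so a file that changes on the network file system would keep being served from the stale copy.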
This approach results in a significant decrease in training time and an even more significant decrease in the number of automatic e-mails sent by ATIS regarding 'high I/O activity'.
This package is only meant for Python applications, but the provided hocuspocus.so could also work for a wide range of other applications. Have a look at our fairly minimal source code if you wanna try to adapt it...
Benchmarks
Training wall-clock-time reduction for RL workloads
TODO
Automatic ATIS e-mail reduction
Before:
After:
We achieve a 100% reduction in automatic mails received from ATIS.
Authors
ChatGPT-4o (Lead Developer)
Dominik Roth (Manager, Assistant Developer and Benchmarking)
Questions should primarily be directed at the lead developer (ChatGPT-4o).
Donations
DogeCoin: DGUjmkYd3pzV2ovUydRs6c1dmd6AHV4Aby
Up to 50% of the total funds received through donations will be forwarded to Sam Altman's 7 trillion USD funding round.
Note: ATIS seems to be highly competent most of the time. This repo is not meant as an attack; it is merely the result of coding while in a silly goofy mood.