
<div align="center">
  <br>
  <img src='./icon.svg' width="250px">
</div>

# MAGIC (**M**agic **A**ccelerates via **G**eneral **I**ntercept-based **C**aching)

This Python module **MAGIC**ally mitigates performance issues on BWUni caused by a faulty OS cache configuration.

The term "**HOCUSPOCUS**" (**H**ocuspocus **O**vercomes **C**onfiguration **U**psies for **S**uperior **P**erformance: **O**ptimized **C**aching via Intercepting **U**serspace **S**yscalls) finds its origins in a misinterpretation from the 17th century, rooted in the Latin phrase "Hoc est corpus" used during the Catholic Mass to signify the transformation of bread into the Body of Christ. To those unfamiliar with Latin, this sacred invocation sounded like mystical jargon, which they mockingly or mistakenly transformed into "hocus pocus."

## Function

`hocuspocus(inform_cache_hit=False, inform_cache_miss=False, inform_cache_write=True, filetype_whitelist=".py,.pyc,.so,.dll", file_blacklist="")`

Configures the caching mechanism, which uses RAM and the local SSD. By default, it caches files with the extensions `.py`, `.pyc`, `.so`, and `.dll` and prints a message on every cache write.

Call `hocuspocus()` at the very beginning of your script, before importing any other packages. (More accurately: any code that runs before `magic.hocuspocus()` is executed twice, so it must not have side effects!) To customize the caching behavior, use the parameters below.

### Parameters:
- `inform_cache_hit`: Boolean flag to print cache hits (default: False).
- `inform_cache_miss`: Boolean flag to print cache misses (default: False).
- `inform_cache_write`: Boolean flag to print cache writes (default: True).
- `filetype_whitelist`: Comma-separated string of file extensions to cache (default: ".py,.pyc,.so,.dll").
- `file_blacklist`: Comma-separated string of file paths to exclude from caching (default: "").
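
To illustrate how the two list parameters could interact, here is a minimal sketch of the filtering decision. The helper names `parse_list` and `should_cache` are hypothetical and not part of this package. The sketch assumes that blacklist entries are matched as exact paths and that the whitelist is compared against the final file extension; the actual matching inside `hocuspocus.so` may differ.

```python
import os


def parse_list(csv):
    """Split a comma-separated option string, dropping empty entries."""
    return [item.strip() for item in csv.split(",") if item.strip()]


def should_cache(path,
                 filetype_whitelist=".py,.pyc,.so,.dll",
                 file_blacklist=""):
    """Decide whether a file would be redirected to the cache.

    Assumption: the blacklist wins over the whitelist and holds exact paths.
    """
    if path in parse_list(file_blacklist):
        return False
    # Assumption: only the final extension counts, so a versioned shared
    # object such as "libfoo.so.1" would not match ".so" in this sketch.
    _, ext = os.path.splitext(path)
    return ext in parse_list(filetype_whitelist)
```

Under these assumptions, `should_cache("/opt/env/numpy/__init__.py")` is true, while a data file such as `/data/run.csv` is passed through to the original file system untouched.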

## Example Usage

```python
import magic
magic.hocuspocus()

import numpy as np
import torch as th
# Your code here
```
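
The rule that `hocuspocus()` must come first follows from how `LD_PRELOAD` works: the dynamic linker only reads it when a process starts, so a running interpreter has to restart itself to pick up the library. The sketch below shows one way this could be done. It is an assumption about the mechanism, not the package's actual code; `ensure_preloaded` and its parameters are hypothetical names.

```python
import os
import sys


def ensure_preloaded(lib_path, environ=None, execve=os.execve):
    """Restart the interpreter with lib_path in LD_PRELOAD if it is missing.

    Returns True when the library is already listed. Otherwise it calls
    execve, which replaces the process and does not return on success.
    The environ and execve parameters exist only to make the sketch testable.
    """
    env = dict(os.environ if environ is None else environ)
    # LD_PRELOAD entries may be separated by colons or spaces.
    entries = [e for e in env.get("LD_PRELOAD", "").replace(" ", ":").split(":") if e]
    if lib_path in entries:
        return True
    env["LD_PRELOAD"] = ":".join([lib_path] + entries)
    # Simplification: sys.argv does not reproduce interpreter flags
    # such as -m or -c, so this only covers "python script.py args".
    execve(sys.executable, [sys.executable] + sys.argv, env)
    return False
```

With a restart like this, everything above the call runs once in the original process and once more after the restart, which is why code placed before `magic.hocuspocus()` must not have side effects.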

## How It Works

It's actually not magic... Python relies on the OS cache to keep actively used modules in RAM for quick access. However, an issue at BWUni causes these modules to be evicted repeatedly, leading to frequent and unnecessary reloads that need to pass through the internal network backbone. ATIS has confirmed this as the underlying issue but has yet to find a solution.

We make use of the `LD_PRELOAD` trick to inject a shared library (`hocuspocus.so`) into the Python process. This library overrides some standard file-related system call wrappers with our custom implementations. These intercept all file operations, allowing us to listen in and, whenever a file passes our whitelist and blacklist, forge the returned file descriptor so that it points to a copy in a local cache, which we populate automatically, instead of the original file. (OS-level VFS caching also works on these cached copies, so we get a two-level RAM/SSD cache overall.) This explicit caching prevents the erroneous evictions caused by the misconfiguration: once a module has been loaded from the cache, it remains quickly accessible, which reduces the overhead of repeated file loading and significantly improves performance.
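
The redirect-on-open idea can be mirrored in a few lines of Python. This is only a sketch of the concept: the real interception happens in native code inside `hocuspocus.so`, and the names `cached_path` and `open_cached`, the hashed cache layout, and the missing invalidation logic are all assumptions made for illustration.

```python
import hashlib
import os
import shutil


def cached_path(original, cache_dir):
    """Return (path_to_local_copy, status), where status is "hit" or "miss"."""
    # Hypothetical layout: one cache file per absolute source path.
    key = hashlib.sha256(os.path.abspath(original).encode()).hexdigest()
    target = os.path.join(cache_dir, key + os.path.splitext(original)[1])
    if os.path.exists(target):
        return target, "hit"
    os.makedirs(cache_dir, exist_ok=True)
    # Cache write: copy under a temporary name, then rename, so that
    # concurrent readers never see a partially written file.
    tmp = f"{target}.{os.getpid()}.tmp"
    shutil.copyfile(original, tmp)
    os.replace(tmp, target)
    return target, "miss"


def open_cached(original, cache_dir, mode="rb"):
    """Open the local copy instead of the original file."""
    path, _status = cached_path(original, cache_dir)
    return open(path, mode)
```

The first `open_cached` call for a file copies it to the cache directory (a miss followed by a write); later calls are served from the local copy. The sketch never refreshes a cached file when the original changes.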

This approach results in a significant decrease in training time and an even more significant decrease in the number of automatic e-mails sent by ATIS regarding 'high I/O activity'.

This package is only meant for Python applications, but the provided `hocuspocus.so` could also work for a wide range of other applications. Have a look at our fairly minimal source code if you want to try adapting it...
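
As a starting point for such experiments, the sketch below launches an arbitrary command with a library preloaded. The helper names are hypothetical, the library path is whatever location your copy of `hocuspocus.so` lives at, and whether the library needs further configuration outside of `magic` is not covered here.

```python
import os
import subprocess


def preload_env(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # Keep any libraries that were already being preloaded.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def run_with_preload(cmd, lib_path):
    """Run cmd (a list of arguments) with the library injected."""
    return subprocess.run(cmd, env=preload_env(lib_path), check=False)
```

Only dynamically linked programs are affected by `LD_PRELOAD`, so statically linked binaries would bypass the cache entirely.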

## Benchmarks

### Training wall-clock-time reduction for RL workloads

TODO

### Automatic ATIS e-mail reduction

#### Before:

![mail_before](benchmarks/mail_before.png)

#### After:

![mail_after](benchmarks/mail_after.png)

We achieve a 100% reduction in automatic mails received from ATIS.

## Authors

- ChatGPT-4o (Lead Developer)
- Dominik Roth (Manager, Assistant Developer and Benchmarking)

Questions should primarily be directed at the lead developer (ChatGPT-4o).

## Donations

DogeCoin: DGUjmkYd3pzV2ovUydRs6c1dmd6AHV4Aby

Up to 50% of the total funds received through donations will be forwarded to Sam Altman's 7 trillion USD funding round.


*Note: ATIS seems to be highly competent most of the time. This repo is not meant as an attack; it is merely the result of coding while in a silly goofy mood.*