upd README

Dominik Moritz Roth 2024-05-27 16:17:05 +02:00
parent ab9a09f548
commit 14fd017eb9


@@ -42,7 +42,8 @@ If we were to give up on lossless compression, one could expand MiddleOut to for
Based on an expected distribution of deltas that have to be transmitted, an efficient Huffman-like binary format is used for encoding the data.
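The delta + Huffman idea can be sketched like this. A minimal illustration only, not the repo's actual binary format; the reading sequence and frequencies are made up:

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a prefix code from symbol frequencies (textbook Huffman)."""
    # Heap entries: [weight, tiebreak, [symbol, code], [symbol, code], ...]
    heap = [[w, i, [sym, ""]] for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]  # left branch
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]  # right branch
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak] + lo[2:] + hi[2:])
        tiebreak += 1
    return {sym: code for sym, code in heap[0][2:]}

# Encode a toy reading sequence as deltas, then Huffman-code the deltas:
# frequent small deltas get short codewords, rare large ones get long codewords.
readings = [512, 513, 513, 512, 514, 513]
deltas = [b - a for a, b in zip(readings, readings[1:])]
code = huffman_code(Counter(deltas))
bitstream = "".join(code[d] for d in deltas)
```

In practice the code table would be fixed in advance from the expected delta distribution, so it never has to be transmitted alongside the data.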
-## On Lossless 200x Compression
+## Discussion
+### On Lossless 200x Compression
Expecting a 200x compression ratio is ludicrous, as it would mean transmitting only 1 bit per 20 data points. Given the high entropy of the readings, this is an absurd goal. Anyone who thinks lossless 200x compression is remotely feasible has a woefully inadequate grasp of information theory. Please, do yourself a favor and read Shannon's paper.
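The arithmetic behind the "1 bit per 20 data points" figure, together with the Shannon lower bound it collides with, can be sketched as follows (toy numbers, not the actual dataset):

```python
import math
from collections import Counter

# 200x on 10-bit readings leaves 0.05 bits per reading,
# i.e. one transmitted bit has to cover 20 data points.
bits_per_reading = 10
target_ratio = 200
budget = bits_per_reading / target_ratio  # 0.05 bits per reading
readings_per_bit = 1 / budget             # 20.0

def empirical_entropy(samples):
    """Plug-in Shannon entropy in bits per sample. By the source coding
    theorem, no lossless code can average fewer bits per sample than this."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())
```

Even an extremely skewed toy source, e.g. `[0] * 95 + [1] * 5`, has an entropy of about 0.29 bits per sample, nearly six times the 0.05-bit budget; high-entropy sensor readings sit far above that still.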
@@ -52,7 +53,7 @@ Let's see how far we can get with the approach presented here...
On another note: why is the provided dataset 16-bit if the readings are only 10-bit? The last 6 bits are not all zeros. We know they can't encode sensible information when the readings are only 10-bit, but we also can't just throw them away, since they do contain something. We also observe that all possible values the data points take on are separated by 64 or 63 (64 would make sense; 63 very much does not). (See `fucked_up_wavs.py`.)
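The spacing observation can be reproduced with a check along these lines (synthetic samples here; on the real files one would load the sample array first, e.g. with `scipy.io.wavfile.read`, and there the gaps come out as {63, 64} instead of a clean {64}):

```python
import numpy as np

def level_gaps(samples):
    """Gaps between consecutive distinct values the samples take on.
    True 10-bit readings stored left-aligned in 16 bits should give
    gaps of exactly 64 (i.e. the low 6 bits always zero)."""
    levels = np.unique(np.asarray(samples))
    return set(np.diff(levels).tolist())

# Synthetic samples exhibiting the spacing seen in the dataset:
gaps = level_gaps([0, 64, 128, 191, 255])  # -> {64, 63}
```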
-## On Evaluation
+### On Evaluation
The provided `eval.sh` script is also flawed (as in: not aligned with what should be optimized for), since it counts the size of the compressor and decompressor as part of the transmitted data. Counting the decompressor in particular makes no sense. It also makes it impossible to compress data from multiple threads together, which is required for the free lunch we can get from topological reconstruction.
@@ -75,6 +76,10 @@ pip install -r requirements.txt
## Usage
### CLI
TODO
### Training
Requires Slate, which is not currently publicly available. Install via (requires repo access):