upd README
parent ab9a09f548
commit 14fd017eb9
@@ -42,7 +42,8 @@ If we were to give up on lossless compression, one could expand MiddleOut to for
Based on an expected distribution of deltas that have to be transmitted, an efficient Huffman-like binary format is used for encoding the data.
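As an illustration of the idea (not the binary format this repository actually uses), here is a minimal sketch of building a plain Huffman code from observed delta counts. The function names `build_huffman_code` and `encode_deltas` and the sample values are hypothetical, chosen only for this example.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(freqs):
    """Return a prefix-free {symbol: bitstring} code for the given symbol counts."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit so that it can be transmitted.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # keeps heap entries comparable when counts are equal
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        n0, _, codes0 = heapq.heappop(heap)
        n1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]


def encode_deltas(samples):
    """Delta-encode the samples, then Huffman-code the deltas (hypothetical helper)."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    code = build_huffman_code(Counter(deltas))
    bits = "".join(code[d] for d in deltas)
    return code, bits


# Made-up samples: the most common delta (+64) ends up with the shortest code.
code, bits = encode_deltas([0, 64, 128, 128, 192, 128, 192, 256])
```

A real encoder would use a fixed code derived from the expected distribution, so that the table itself does not have to be transmitted; this sketch derives the code from the data only to stay self-contained.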
-## On Lossless 200x Compression
+## Discussion
+### On Lossless 200x Compression
Expecting a 200x compression ratio is ludicrous, as it would mean transmitting only 1 bit per 20 data points. Given the high entropy of the readings, this is an absurd goal. Anyone who thinks lossless 200x compression is remotely feasible has a woefully inadequate grasp of information theory. Please, do yourself a favor and read Shannon’s paper.
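The arithmetic behind that figure can be checked with a short sketch. It assumes 10-bit readings, as stated elsewhere in this README; the helper names are hypothetical. The entropy function is a zeroth-order estimate: it bounds only coders that treat samples as independent, so it is a rough sanity check rather than a hard limit for this dataset.

```python
import math
from collections import Counter


def bits_per_sample(raw_bits, ratio):
    """Average bits available per sample at a given compression ratio."""
    return raw_bits / ratio


def empirical_entropy(samples):
    """Zeroth-order entropy in bits per sample (ignores correlations between samples)."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def max_ratio_for_entropy(raw_bits, entropy_bits):
    """Best lossless ratio for a memoryless source with the given entropy per sample."""
    return raw_bits / entropy_bits


budget = bits_per_sample(10, 200)   # bits per sample at 200x for 10-bit readings
samples_per_bit = 1 / budget        # how many samples share a single bit
```

For 200x to be reachable, the per-sample entropy of the readings would have to be at or below `budget`, which is 0.05 bits.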
@@ -52,7 +53,7 @@ Let's see how far we can get with the approach presented here...
On another note: Why is the dataset provided not 10-bit if the readings are? They are all 16-bit. And the last 6 bits are not all zeros. We know they can't encode sensible information when the readings are only 10-bit, but we also can't just throw them away since they do contain something. We also observe that all possible values the data points can take on are separated by 64 or 63 (64 would make sense; 63 very much does not). (See `fucked_up_wavs.py`)
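A check of this kind can be sketched as follows. This is not the content of `fucked_up_wavs.py`; the helper names and sample values are made up for illustration, and the samples are assumed to be already decoded to Python integers.

```python
from collections import Counter


def value_gaps(samples):
    """Count the gaps between neighbouring distinct values, after sorting."""
    distinct = sorted(set(samples))
    return Counter(b - a for a, b in zip(distinct, distinct[1:]))


def low_bit_patterns(samples, n_bits=6):
    """Collect the distinct patterns found in the lowest n_bits of each sample."""
    mask = (1 << n_bits) - 1
    return {s & mask for s in samples}


# Hypothetical values that mimic the observation: spacing alternates between 64 and 63.
example = [0, 64, 127, 191, 254, 64]
gaps = value_gaps(example)
patterns = low_bit_patterns(example)
```

If the readings were 10-bit values shifted left by 6, `gaps` would contain only multiples of 64 and `patterns` would be `{0}`.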
-## On Evaluation
+### On Evaluation
The provided eval.sh script is also flawed (as in: not aligned with what should be optimized for), since it counts the size of the compressor and decompressor as part of the transmitted data. Especially the decompressor part makes no sense. It also makes it impossible to compress data from multiple threads together, which is required for the free lunch we can get from topological reconstruction.
@@ -75,6 +76,10 @@ pip install -r requirements.txt
## Usage
+### CLI
+
+TODO
+
### Training
Requires Slate, which is not currently publicly available. Install via (requires repo access):