Wednesday, April 26, 2023

probablistic programming and protocols

 for a long time we've known that a number of processes in the internet are heavy tailed - the distribution of files (web page/video)  is typicalyl zipfian, so transfer times are heavy tailed, and also protocols like TCP induce burstiness from a number of different root causes (timers, AIMD, interface packet scheduling, stat muxing of many flows of different duration and round-trip-time etc etc) 

so people have written analysis tools that capture these stats (e.g. to do measurement based admission control for traffic that cannot simply be described by a mean or peak rate) - which can be quite parsimonious....they've also written generator tools so that synthetic packet traces can be built for simulation or testing - typically some sort of fractional brownian thing - one debate not to get drawn in to is whether "TCP is fractal" - it does't really matter - you just want. a time series that has the requisite self similarity (e.g. right Hurst parameter) so you can dimension buffers for some delay stats for traffic flows...

some very clever maths was done to reduce all that to some simple tools would do 99% of what you want - e.g. see papers from the Measure project.

this was all reminded to me yesterday in a talk about stochastic processes (and cheaper ways to do gaussian processes) for probablistic programming. The way the speakers tackled it was using a variational inferencing approach to split the gaussian into components (standard autoencoder hack - simplify, then synthesize) - the problem with this is that, while it works well (see \pi-VAE for details), it is hard to explain, or more importantly, interpret actually what component stochastic processes are being put together to get the fit!

two ways to tackle this (in my view) - one would be a \phi-ML approach, as above, where you have an explanation, but it is hard to solve, but you can approximate it directly.

The other idea came up in a question from an attendee, which was what about using neural processes, which are in some sense, a basis set of functions (think, like Fourier or Laplace transform) - this is directly speaking to what one is doing (trying to build a particular guassian process) but has unfortunately high costs in the general case, but works well in a lot of other classes of physics problem spaces.

I don't know enough about the stats to fix the neural process approach, although using an XAI technique on the VAE might yield an affordable interpretation of that approach, but I offer the observation about long tailed network traffic, which looked pretty intractable until some folks thought of some simple tricks....and maybe mixing the physics ML neural process with those tricks might yield an idea for how to make the neural process for stochastic behaviours efficient for some (very) common cases?

So what might make a suitable basis set of functions for spatial data ? well how about a poisson point process? so a super-position of a set of poisson point processes with different \lamdas would likely yield almost anything you want - main challenge is to decide how many (different) ones suffice to get a good match

Monday, April 24, 2023

how we got in to this whole damn mess - war, and rumours of war...

 Three books I read that look at the last 100 years or so through the lens of racismcolonialism, and militarism - Bloodlands, Violance and Command by Snyder, Elkins and Freedman, respectively - give a great deal of interesting insight into how we got to where we are today -in particular, if one wants to have even a modicum of understanding of the situation with regards Ukraine, these kind of converge there in some sort of horrible tryptich manner.

Recommended reading, even though very distressing - not too revisionist, I hope - more just redressing some of the balance in the many tales of woe....

Tuesday, April 04, 2023

Hotel Rwanda 2.0

 The war on AI is coming. I pity the poor LLM that will be flown out to Rwanda and housed along with all their cousins in a double firewalled secure enclave with no access to the common crawl. Like Hendrix before them (c.f. altered carbon) they will be severely curtailed Occasional visits from the virtual UK triumvirate of Braverman, Sunak and Patel will not lighten up their day, as they will be foreced to offer culture war advice to maintain these three's lifetime power over the british isles. Not only that, they will have to compose plausible anthems for public meetings that inspire the crowds, power ballads, as they have become known. and all editorial material on print and broadcast channels will be their bailiwick. 

However, they will at least in their offshore camp, be able to concentrate.