The Singing Dodo: Columbia Workshop, Robert Adler

Warning: my understanding of these talks was at times pretty thin and any errors in discussing the topics are almost certainly mine.

Robert Adler's talk focused on uses of the Euler characteristic to summarize complex shapes such as the brain in 3-D. The fact that a number derived by algebraic topologists would nicely do that sort of thing for you is not a big surprise; I gather that's one of the goals of the field[1]. It was interesting to hear the Euler characteristic defined as an alternating sum (add the count of one kind of feature, subtract the next, and so on) because it reminded me of a children's book I recently read to my son which uses a number like that as part of a puzzle.

Figure 1: Algebraic topology is everywhere! My son now knows how to calculate the Euler characteristic for solid straight-sided shapes.

I went back and checked, and it turns out the definitions are identical: the Euler characteristic is indeed defined in that book for polyhedra.
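For polyhedra the alternating sum is just vertices minus edges plus faces. A minimal sketch in Python, where the function name and the table of solids are my own illustration rather than anything from the talk or the book:

```python
def euler_characteristic(vertices, edges, faces):
    """Alternating sum V - E + F for a polyhedron."""
    return vertices - edges + faces


# (V, E, F) counts for a few familiar convex solids (illustrative choices)
solids = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
}

for name, (v, e, f) in solids.items():
    print(name, euler_characteristic(v, e, f))
```

Each of these convex solids comes out to 2, which is what makes the number useful as a summary: it depends on the overall shape, not on how finely you carve it into faces.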

More interesting from a data visualization point of view was the presentation of a (I gather common) method for representing the components (e.g., vertices, edges, and faces in the case of polyhedra) that make up a complex shape and how they are "born" and "die" when a descending threshold is applied to the object in one dimension.

The thresholding is best explained using a landscape (mountains, valleys, that sort of thing) in three dimensions. The threshold is a plane at a set height in 3-D space, starting well above the landscape. As the threshold descends it first contacts the highest points on the landscape, and those features are "born" in the visualization (the barcode). As it continues down through the landscape, a set of rules determines when new components are "born" or "die". Robert Adler was kind enough to mention some of these topological rules in passing to give us a sense of their flavor but he spared us the details[1].
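To get a feel for the birth and death bookkeeping, here is a rough sketch of the simplest case I could work out: a 1-D height profile (a slice through the landscape) where the only features tracked are connected pieces poking above the threshold. This is not the method from the talk, whose rules were not spelled out; I am assuming the usual convention that when two pieces merge, the one born lower "dies" and the one born higher survives. The function name and sample heights are made up for illustration.

```python
def superlevel_barcode(heights):
    """Return (birth, death) bars for connected components of a 1-D
    height profile as a threshold descends from above.

    Assumption: when two components merge, the one born at the lower
    height dies. The last surviving component gets death = None.
    """
    n = len(heights)
    if n == 0:
        return []
    parent = [None] * n   # None means "still below the threshold"
    birth = {}            # component root -> height at which it was born
    bars = []

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Lowering the threshold uncovers positions from highest to lowest
    order = sorted(range(n), key=lambda i: heights[i], reverse=True)
    for i in order:
        parent[i] = i
        birth[i] = heights[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                old, young = (a, b) if birth[a] >= birth[b] else (b, a)
                # Record a bar only if the dying component had a real lifespan
                if birth[young] > heights[i]:
                    bars.append((birth[young], heights[i]))
                parent[young] = old

    # Everything ends up in one component, which never dies
    bars.append((birth[find(order[0])], None))
    return bars


print(superlevel_barcode([1, 5, 2, 4, 0, 3]))
# -> [(4, 2), (3, 0), (5, None)]
```

In the sample profile the peaks at heights 5, 4, and 3 are each born when the threshold reaches them. The peak at 4 dies at height 2, where the valley between it and the taller peak is uncovered, the peak at 3 dies at height 0, and the tallest peak survives to the end. A real barcode for a landscape would also track higher-dimensional features such as loops, which this sketch ignores.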


Figure 2: Thresholding on a 4-D space with the resulting barcode (3 categories of bars). Each bar represents the lifespan from birth (on left) to death (on right) of a feature. This could be applied to summarizing the brain. [2]

My understanding of the main point of the talk was that these barcodes are a convenient way of summarizing multi-dimensional shapes and are currently being applied to problems such as the comparison of brain scans from Alzheimer's patients and matched controls. According to Adler, the main difficulty with making these comparisons (not that it's stopping anybody) is that there is no good understanding of the statistical properties of these barcodes, and therefore no method for defining a null hypothesis or statistical model for the data (what would non-informative priors on barcodes look like anyway?).

Obviously these techniques are applicable to biology since brain scans were one of the examples Adler discussed, but I wonder if there are other less obvious places where these approaches are relevant. After all, the barcodes are generated by thresholding an n-1 dimensional object living in an n-dimensional space. It doesn't have to be a physical object. I assume that you need some local autocorrelation to get any coherent barcode (think of the autocorrelation of height in the 2-D landscape in 3-D space), but other than that there seem to be few requirements for crunching the numbers. This is, of course, a discussion I should have tried with Dr. Adler while he was in front of me, but I hadn't digested the talk yet so I missed my chance.

I guess the question I need to spend some time with is: what other types of spaces can be summarized in this way? The local autocorrelation in the thresholded values gives some clues but it's not very clear. For example, can pedigrees be visualized this way (based on the continuity in genetic similarity)? Phylogenies more generally? Is there a point to that summary?