The first session of presentations this morning was a kaleidoscope session, so as was to be expected the presentations were highly diverse. Three presentations really stood out to me, ranging from online collaboration on R packages using R-Forge, to visualizing categorical data, to a dynamic representation of the results of principal components analysis.
46% of R packages are developed and maintained by more than one author, which at times leads to difficulties in cooperation. Building upon statements made earlier by Kurt Hornik, a challenge for the future development of the R Project might just be in this area. How do we keep people motivated to keep working on complex packages? Well, Stefan Theussl and the other people working on R-Forge must have thought: “by keeping them facilitated”. R-Forge is an open source, online collaboration facility, specifically tied to the R Project. If you’re the developer of an R package, you might find it to be the right way of sharing your code with others while it is still under development.
The presentation by David Meyer was something completely different: visualizing categorical data and the corresponding vcd package. What I loved about this presentation is that it contained some ideas on how one should properly visualize data. We still run into some weird graphics in practice, as was illustrated during the presentation (for instance 3D bar plots, in which we cannot compare the heights of the bars). One of the possibly interesting strategies was to use color-shaded mosaic plots, in which cells that deviate more strongly from what would be expected by chance are drawn in stronger colors. The direction of the relationship determines the hue of the color, and the strength of the relationship determines its intensity.
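To make the shading idea a bit more concrete, here is a minimal sketch in base R. It is not taken from the presentation: I am assuming that the shading is driven by Pearson residuals under independence, and the cutoffs of 2 and 4 are the ones base R's `mosaicplot()` uses for `shade = TRUE`. The built-in `Titanic` data is just a convenient example table.

```r
# Two-way table from the built-in Titanic data: Sex by Survived.
tab <- margin.table(Titanic, c(2, 4))

# Expected counts if Sex and Survived were independent.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals: how far each cell is from its expected count.
res <- (tab - expected) / sqrt(expected)

# The sign of a residual gives the direction (which would pick the hue),
# its absolute size gives the strength (which would pick the intensity).
# The cutoffs 2 and 4 are an assumption borrowed from mosaicplot().
direction <- ifelse(res > 0, "more than expected", "fewer than expected")
strength <- cut(abs(res), breaks = c(0, 2, 4, Inf),
                labels = c("unshaded", "light", "strong"),
                include.lowest = TRUE)

# Base R draws a residual-shaded mosaic plot directly.
mosaicplot(tab, shade = TRUE, main = "Sex by survival")
```

With the vcd package installed, `vcd::mosaic(tab, shade = TRUE)` should give the more polished version of the same kind of plot.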
At the end of the session, a very nice piece of software was demonstrated, aimed at the dynamic interpretation and representation of principal components analyses. What the package basically does is take an object containing the results of a PCA and represent it in a Java-based application. That way, we can easily select the variables or factors to represent. It is even possible to show only those factors that have an impact above a specified threshold. Very nice visualization, and in the case of enormous numbers of variables, a real lifesaver.
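The threshold idea can be imitated in plain R, without the interactive part. The sketch below is not the software that was demonstrated. It uses `prcomp()` on the built-in `USArrests` data, and the threshold of 0.5 and the choice of the first two components are arbitrary assumptions for the sake of illustration.

```r
# PCA on standardized variables; USArrests is only an example data set.
pca <- prcomp(USArrests, scale. = TRUE)

# Loadings of each variable on the first two components.
loadings <- pca$rotation[, 1:2]

# Hypothetical threshold: keep a variable if its absolute loading
# on at least one of the two components reaches this value.
threshold <- 0.5
keep <- apply(abs(loadings), 1, max) >= threshold

# Only the variables that pass the threshold are left to display.
shown <- loadings[keep, , drop = FALSE]
print(shown)
```

With only four variables this hardly matters, but with hundreds of them a filter like this is what keeps a loadings plot readable.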