Mike's SOS: January 2014

Week 1

Studying Software Evolution Using Topic Models
http://sail.cs.queensu.ca/publications/pubs/Thomas-2012-SCP.pdf

Before reading this paper, it had not occurred to me that a statistical analysis of a computer program, as done by the authors, would be useful. I am still not convinced it would be very useful to me, particularly during a project. I think its best use is in a post-mortem, to learn what was done incorrectly (and, of course, correctly) and to apply those lessons to make the next project more efficient.

An example that reinforces my argument is Figure 1 on page 6. The authors show that, in general, as the two pieces of software aged and grew, more usable terms could be extracted from them. This seems like common sense to me: more code means more words to extract.

My impression is that the work of a programmer, or a team of programmers, that might be analyzed with this technique is driven by something other than the kind of data the method produces. If I were writing a word processing program by myself, for example, where I focused my work each day might be driven by the latest bug discovered. If I analyzed my work the next day, I would most likely discover that I had worked on things related to that bug.

If I asked my new team of programmers to add a feature to my word processor, I would likely discover that most of the subsequent work was done on that feature. Therein lies what I see as the limited value of the technique during a project: I could, the next day, see that the team, or perhaps an individual, had instead been working on the last bug, and I could then try to redirect them.

All this, of course, is coming from someone who hasn't worked in a programming production environment. Perhaps those who have would see this differently.

I also wonder about the authors' seemingly arbitrary choices for some of the data. It is not that their methods are inaccurate, but that they might not be looking at the right data. For example, they state that they pruned the words that occur in fewer than 2% of documents or in more than 80% of documents. I have no idea whether this is the best selection. As sometimes happens in studies of various types, though, I wonder if these thresholds, among other things, might have been picked because they produced an outcome that favored the authors' point of view.
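To make those thresholds concrete for myself, here is a minimal sketch of pruning words by document frequency. It is not the authors' code. The function name `prune_vocabulary`, the toy corpus, and the choice to keep words that sit exactly on a threshold are all my own assumptions; only the 2% and 80% figures come from the paper.

```python
from collections import Counter

def prune_vocabulary(documents, min_df=0.02, max_df=0.80):
    """Keep terms whose document frequency lies in [min_df, max_df].

    `documents` is a list of token lists. Treating the bounds as
    inclusive is an assumption, not something taken from the paper.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term
        for term, count in doc_freq.items()
        if min_df <= count / n_docs <= max_df
    }

# Hypothetical corpus of 100 tiny "documents" (invented for illustration).
documents = [["file", "buffer"] for _ in range(90)]
documents += [["file", "cursor"] for _ in range(9)]
documents += [["file", "undo"]]

kept_default = prune_vocabulary(documents)            # 2% / 80%
kept_loose = prune_vocabulary(documents, 0.005, 0.95)  # looser bounds
print(sorted(kept_default))
print(sorted(kept_loose))
```

In this made-up corpus the default thresholds keep only "cursor", while the looser ones also keep "buffer" and "undo". That is the kind of sensitivity I am wondering about: a different cutoff changes which words are available for analysis at all.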

Monday, January 27, 2014

Thursday, January 9, 2014