Academic Software Is Junk

Either that, or it’s stellar. I don’t know, it’s just a guess. The point is that almost no one else knows, either.

Software is a mainstay of modern research. In the computing fields in particular, but also in others, custom software is frequently rolled for research-oriented projects. This software is typically used to gather or process the data on which the results of the research are based. For reference, these phases are highlighted in the (oversimplified) research process shown below.

Whereas the results of using such software beget many articles in The Literature, the source of the software itself is rarely published. This suggests an easy avenue of attack when attempting to dispute research results: Simply ask for the source. An antagonist may use the strategy shown below to attack the results.

It goes something like this: The published results of your research contradict my worldview. I don’t care if your research is actually valid. At this point I don’t even care what data you have. I am going to ask for the source of any software that you used to gather or process the data on which the results are based. If you can’t/won’t give me the source, I am going to claim that the results are made up. In order to disprove me, you now have to release the source.

If you do release the source, I am going to subject it to analysis tools like FindBugs or Clang, or tools such as those produced by companies like Coverity and Parasoft who make a good business out of finding software defects. I am likely to find hundreds, possibly thousands of defects in your source (because any non-trivial software system has bugs), at which point I am going to claim that the results are invalid because they were produced using untrustworthy software. In order to disprove me, you now have to demonstrate that no defect (or combination of defects) affected the results, a tedious if not impossible task.

If I don’t find any defects, then I am probably using the tools incorrectly. This is not a statement of the inability of researchers to produce quality software, but rather of the complexity inherent to software development. Regardless, I am now going to ask for your data.

If you can’t/won’t give me the data, again, I am going to claim that the results are made up. In order to disprove me, you now have to release the data. If you do release the data, I am going to attempt to use your software.

If your software was used to gather the data in some way that cannot be automated, such as may be the case with a user study, I’ll probably make the standard complaints about methods and sample size, but my grumbling would be effectively neutralized by correct sampling and good statistical techniques. If your software has a UI that determines or influences the data gathered, I am going to subject it to analysis techniques such as GOMS, which may enable me to make some weak argument about bias in the results.

On the other hand, suppose your software was used to gather the data in some way that could be automated, or to subsequently process the data (something that should be automated), but you have provided no automation with either the source or the data. In that case I will claim that the results are unverifiable. (Software developers may recognize this as a variant of automated testing.) As the complexity of processing the data increases, so does the likelihood that a mistake of process produces incorrect results, even supposing perfect software. In order to disprove me, you now have to explicate the process and provide some assurance that it was followed precisely. As any QA engineer can attest, this is no easy task.

This leaves the scenario that you have provided me with both the source and the data, that the source (and possibly the user interface) passes rigorous analysis, and that processing the data is automated. I can’t claim anything negative about the results, so I will compare you to Hitler.

I was picking on academia in the title, but this problem is not specific to academia. The credibility of public research is at stake, in particular research in fields that are politically charged. The havoc wreaked by “Climategate” was due to the leaking of emails and other ancillary documents that had little to do with the actual research. Despite the media coverage, the criticism was poorly received except by like-minded opponents of climate science. What happens when these opponents obtain the relevant software and start tearing it apart, finding thousands of defects? What happens if that receives the same media coverage?

As an opponent of some field of research, my goal would not be to negate the research but to defund it. I wouldn’t need to negate the research. I wouldn’t need to persuade supporters to become opponents like myself. I would need only to sway public opinion enough to increase the political capital necessary to fund the research, such that the funding remains below a critical mass. I could do that, at least in part, with a credible argument that receives adequate media coverage. I submit that attacking research by means of its supporting software is such an argument.

This is the reality of the environment in which public research must be conducted. So what to do?

The more obvious solution is to publish (under an appropriate license) the source of any supporting software alongside the corresponding research results, be they positive or negative. Publishing source should be a push activity, not a pull activity. In other words, source should always be published alongside results, regardless of whether anyone asks for it. This allows the software to undergo continuous scrutiny, evaluation, and improvement. Coupled with automated processing of the data, results can be easily reproduced in response to changes to the software.

The less obvious solution is to use only software for which the source is already published. In this regard the importance of OSS to public research cannot be overstated. Though I have been focusing on custom software, ostensibly written by the researcher, the arguments above apply to any closed-source software. Many fine companies sell proprietary packages to institutions for research purposes. However, use of these packages is inconsistent with one of the core tenets of public research:

Using closed-source software to gather or otherwise process the data upon which the results of research are based is a violation of the principle of full disclosure.

In other words, the software used to support public research should be open-source software. If the software is written specifically to support the research, it should be released as open-source software. Using software whose internal processes cannot themselves be observed and analyzed is tantamount to hiding some portion of a physical experiment and then declaring “trust me, I’m getting correct numbers from this thing”.

Not only is this kind of openness essential to the integrity of public research, but alongside additional effort (such as the automated processing of data) it is likely to blunt a latent tactic that otherwise has the potential to erode support for such research.


How You Close Java Resources Is Probably Wrong

The standard boilerplate for using a stream looks like the following:

InputStream in = ...
try {
    doSomethingWith(in);
} finally {
    in.close();
}

The problem with this is, if doSomethingWith(in) throws an instance of IOException (or some exception that specifically indicates a failure of an operation on the stream), the close() operation is also likely to throw an instance of IOException. The exception thrown from within the finally clause then discards the exception thrown from the try block, hiding the original cause of the failure. This is the standard issue with a finally clause that completes abruptly.
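The masking can be seen directly in a minimal sketch (the class and exception messages here are illustrative, not from any real stream):

```java
public class MaskingDemo {
    // Simulates a body failure followed by a close() failure.
    static void doWork() throws Exception {
        try {
            throw new Exception("original failure"); // the real root cause
        } finally {
            throw new Exception("close failure");    // discards the original
        }
    }

    public static void main(String[] args) {
        try {
            doWork();
        } catch (Exception e) {
            // Only the finally clause's exception survives.
            System.out.println(e.getMessage());
        }
    }
}
```

Running this prints only "close failure"; all record of the original failure is gone.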

Instead, the original exception should be allowed to propagate, though, arguably, an attempt to close the stream should still be made. In other words, if the finally clause is entered because an operation on the stream threw an exception, the attempt to close the stream should be made quietly:

InputStream in = ...
IOException streamFailure = null;
try {
    doSomethingWith(in);
} catch (IOException e) {
    streamFailure = e;
    throw e;
} finally {
    try {
        in.close();
    } catch (IOException e) {
        if (streamFailure == null) {
            throw e;
        } else {
            throw streamFailure;
        }
    }
}

This has the desired effect (though some subtle problems remain), but it is a lot of boilerplate for each new stream. A wrapper class could be written to encapsulate this behavior:

public class InputStreamWrapper extends InputStream {
    private InputStream source;
    private IOException streamFailure;
    public InputStreamWrapper(InputStream source) {
        this.source = source;
        this.streamFailure = null;
    }
    public int read() throws IOException {
        try {
            return this.source.read();
        } catch (IOException e) {
            this.streamFailure = e;
            throw e;
        }
    }
    public void close() throws IOException {
        try {
            this.source.close();
        } catch (IOException e) {
            if (this.streamFailure == null) {
                throw e;
            } else {
                // close quietly
            }
        }
    }
}

The above wrapper allows the larger boilerplate to be reduced back to the original, smaller boilerplate. However, it has several disadvantages, foremost among them that such a wrapper would need to be written for each stream class that provides specialized behavior. Until JDK 7 ships with the Automatic Resource Management proposal from Project Coin, the larger boilerplate (or a similar solution) must be used.
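For reference, under that proposal the whole interaction collapses into a single construct. As eventually standardized as try-with-resources, a close()-time exception no longer masks the body's exception; it is attached to it as a suppressed exception instead. A minimal sketch:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ArmDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = {42};
        // The resource is declared in the try header and closed
        // automatically; if both the body and close() throw, the body's
        // exception propagates and close()'s is recorded as suppressed.
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(in.read());
        }
    }
}
```

This makes both the small boilerplate and the wrapper class unnecessary.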


Developer Intentionality

Intentionality is how a developer intends for a software system to perform, based on its specification. A specification of a software system is consistent with the intentionality of the developer if every possible faithful execution of the system performs as the developer intended it to perform. A specification of a software system is inconsistent with the intentionality of the developer if any possible faithful execution of the system performs contrary to the way in which the developer intended it to perform.

An execution of a system is not a faithful execution if:

  1. A translation from the source specification to the execution specification does not retain the semantics of the source specification.
  2. A requirement of the execution environment is not met.
  3. An unspecified failure of the execution environment occurs during execution.

An example of the first condition might be a semantically non-equivalent executable caused by an overly aggressive optimization. An example of the second condition might be insufficient processor resources due to a CPU overloaded by other processes. An example of the third condition might be a defect of the execution environment, or an amplitude distortion occurring in the hardware. Any of these conditions can cause an executing system to exhibit behavior that appears to indicate a defect in the specification of the system, when in fact there may be no such contributing defects.

Intentionality can be inclusive or exclusive. Inclusive intentionality is intentionality that defines the behavior an execution should exhibit. Exclusive intentionality is intentionality that defines the behavior an execution should not exhibit. Programming languages tend toward inclusive intentionality.
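As a rough illustration (the methods and assertions here are hypothetical, not part of any formal framework), inclusive intentionality states what behavior an execution should exhibit, while exclusive intentionality rules behavior out:

```java
public class IntentDemo {
    // Inclusive intentionality: the result SHOULD be the absolute value.
    static int abs(int x) {
        int result = x < 0 ? -x : x;
        assert result >= 0 || x == Integer.MIN_VALUE
                : "inclusive: result should be non-negative";
        return result;
    }

    private int balance = 100;

    // Exclusive intentionality: the balance should NEVER become negative.
    void withdraw(int amount) {
        if (amount > balance) {
            throw new IllegalStateException("would overdraw");
        }
        balance -= amount;
        assert balance >= 0 : "exclusive: balance must never be negative";
    }

    public static void main(String[] args) {
        System.out.println(abs(-7));
        IntentDemo account = new IntentDemo();
        account.withdraw(30);
    }
}
```

A language's type system and control flow mostly express the inclusive kind; the exclusive kind tends to surface as assertions, invariants, and guard clauses.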

Intentionality can be ambiguous. An execution of a system may exhibit behavior that is more precise than required by the intentionality of the developer. For example, intentionality may require that every execution of the system completes within a certain time period, even though each execution may complete within a narrower time period.

Intentionality can be incomplete. An execution of a software system may exhibit behavior that is not captured by the intentionality of the developer, whether intentionally or unintentionally. For example, intentionality may not require that any execution of the system completes within a certain period of time, even though this is implicitly desirable.

Facilitating the alignment of intentionality with the actual behavior of a system is one of the primary goals for the variety of tools and languages available. Intentionality plays an important role in the specification of a software system, since it defines both defects and equivalent implementations.


Broadening the Definition of Formal Specification

A formal specification is, classically, a description of a (software) system from which an implementation can be derived, or against which an implementation can be verified. Basic examples include interfaces like those found in Java or C#, and assertions or invariants like those found in Eiffel. More sophisticated examples include specifications written in Z notation. The “formal” part comes from the idea that a formal specification is subject to formal analysis, which is to say formal methods (aka math).
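As a basic illustration, a Java interface already fits the classical definition: it is a description against which an implementation can be verified. The Stack names below are hypothetical, chosen only for the sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NoSuchElementException;

// A minimal formal specification in the classical sense: a description
// against which any implementation can be verified.
interface Stack<T> {
    void push(T item);
    T pop();               // removes and returns the most recent item
    boolean isEmpty();
}

// One implementation derived from (and verifiable against) the specification.
class DequeStack<T> implements Stack<T> {
    private final Deque<T> items = new ArrayDeque<>();
    public void push(T item) { items.push(item); }
    public T pop() {
        if (items.isEmpty()) throw new NoSuchElementException("empty stack");
        return items.pop();
    }
    public boolean isEmpty() { return items.isEmpty(); }
}

public class SpecDemo {
    public static void main(String[] args) {
        Stack<String> s = new DequeStack<>();
        s.push("a");
        s.push("b");
        System.out.println(s.pop());    // last in, first out
        System.out.println(s.isEmpty());
    }
}
```

The interface says nothing about how the stack is stored, which is exactly what makes it a specification rather than an implementation.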

A formal specification is generally considered to be a declarative specification rather than an imperative specification. Speaking outside the strict context of programming language paradigms, the dichotomy of declarative versus imperative is a false one. A system must be executed consistently with respect to the intended outcome of its implementation, regardless of whether the control flow described by any imperative specification of the implementation is followed literally. Any intermediate representation (GIMPLE, CIL, bytecode, etc.) is free to be manipulated so long as the resultant execution is consistent. Software optimization is the typical example.
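A concrete sketch of this point: the two methods below describe very different control flow, yet belong to the same equivalence class because every observable outcome is identical, so a translator is free to rewrite one into the other. (The names and the closed-form rewrite are illustrative.)

```java
public class EquivDemo {
    // Literal, loop-based specification of summing 1..n.
    static int sumLoop(int n) {
        int total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // A transformed member of the same equivalence class: the closed form,
    // with no loop at all. The imperative control flow above need not be
    // followed literally for the execution to remain consistent.
    static int sumClosed(int n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(sumLoop(100));
        System.out.println(sumClosed(100));
    }
}
```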

In this sense, an implementation is an exemplar member of the equivalence class of implementations that express the same intended outcome, regardless of whether the implementation is specified imperatively. An implementation then is itself a description of the system from which an implementation can be derived, or against which an implementation can be verified. This erases the distinction between a formal specification and an implementation.

A formal specification is thus any specification of a system. Even more broadly, because a system is specified using languages that need not be programming languages (XML, properties files, etc.), a formal specification is any specification that conforms to a Turing-recognizable language. The result is a definition that is significantly broadened but, coincidentally, is still consistent with alternate terminology: A formal specification is a specification that conforms to a formal language.
