In this poster, we discuss unifying the regular expressions that WebTA uses to find antipatterns. WebTA is a multi-language code critic designed to detect, report, and explain novice antipatterns to beginner programmers across many engineering and computing disciplines. Novice antipatterns are pieces of code that seem correct but contain logical or structural flaws. WebTA finds these antipatterns, displays them to the student, and offers immediate, meaningful, novice-targeted feedback on how to fix the problem. WebTA currently supports Java, MATLAB, and Python, with more languages in development.
Many of the antipatterns in WebTA are specified using regular expressions. Similar antipatterns appear across the different languages, with subtle differences based on each language’s representation of logical structures such as if statements, while loops, and operators. Although these differences are syntactic, the antipatterns themselves are semantically identical. For each language we add to WebTA, many antipatterns must be rewritten to account for these syntactic differences. This increases development time and makes WebTA less effective for newly added languages, which start with a smaller corpus of antipattern definitions.
Unified Regular Expression Antipattern Language (UREAL) seeks to unify regular expression antipatterns that differ only in syntax. UREAL captures the syntactic differences between languages by tokenizing regular expressions. Instead of writing a language-specific regular expression for each code structure, we specify a UREAL token that is usable across languages. We then build the regular expression antipatterns from these UREAL tokens. When a UREAL expression is used to parse a given language, the language-specific regular expressions are automatically substituted for its tokens. In effect, if the "shape" of a piece of source code is similar in two different languages, we can write one UREAL expression to match both. This design-based research is evaluated by the reduction in the number of regular expressions that must be written when specifying similar antipatterns across separate languages.
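The token substitution described above can be illustrated with a minimal Python sketch. It is not UREAL's actual syntax or implementation: the angle-bracket token notation, the token names (IF_OPEN, IF_CLOSE, EQ, TRUE), the per-language regular expressions, and the `expand` helper are all assumptions made for illustration. The example antipattern is a redundant comparison of a Boolean variable to true inside an if condition.

```python
import re

# Hypothetical token tables. The token names and the per-language regular
# expressions are illustrative assumptions, not UREAL's real definitions.
TOKENS = {
    "java": {
        "IF_OPEN": r"\bif\s*\(",
        "IF_CLOSE": r"\)",
        "EQ": r"==",
        "TRUE": r"true",
    },
    "python": {
        "IF_OPEN": r"\bif\s+",
        "IF_CLOSE": r":",
        "EQ": r"==",
        "TRUE": r"True",
    },
    "matlab": {
        "IF_OPEN": r"\bif\s*\(?",  # parentheses are optional in MATLAB
        "IF_CLOSE": r"\)?",
        "EQ": r"==",
        "TRUE": r"true",
    },
}

# A token reference looks like <NAME> in this sketch (assumed notation).
TOKEN_REF = re.compile(r"<([A-Z_]+)>")


def expand(unified, language):
    """Substitute each <TOKEN> with the regex fragment for one language."""
    table = TOKENS[language]
    # A function replacement avoids backslash processing in the fragments.
    expanded = TOKEN_REF.sub(lambda m: table[m.group(1)], unified)
    return re.compile(expanded)


# One unified expression for "if <variable> == true", written once.
BOOL_COMPARE = r"<IF_OPEN>\s*(\w+)\s*<EQ>\s*<TRUE>\s*<IF_CLOSE>"

SAMPLES = {
    "java": "if (done == true) {",
    "python": "if done == True:",
    "matlab": "if done == true",
}

for language, line in SAMPLES.items():
    match = expand(BOOL_COMPARE, language).search(line)
    # Each language reports the same captured variable name.
    print(language, match.group(1) if match else None)
```

In this sketch, supporting an additional language means adding one token table rather than rewriting every antipattern, which is the kind of reduction the evaluation measures.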
By unifying the regular expressions in this way, we reduce the development time needed for each new language, freeing time for encoding new antipatterns and providing quality feedback. Increasing the effectiveness and language diversity of WebTA will help students improve their programming skills regardless of their chosen language and will let instructors draw upon a deeper antipattern library.