Automate Migration Assessment With XML Linter

When people think of linting, the first thing that comes to mind is usually static code analysis for programming languages, but rarely for markup languages.

In this article, I would like to share how our team developed ZK Client MVVM Linter, an XML linter that automates migration assessment for our new Client MVVM feature in the upcoming ZK 10 release. The basic idea is to compile a catalog of known compatibility issues as lint rules, so that users can assess the potential issues flagged by the linter before committing to the migration.

For those unfamiliar with ZK, ZK is a Java framework for building enterprise applications; ZUL (ZK User Interface Markup Language) is its XML-based language for simplifying user interface creation. By sharing our experience developing ZK Client MVVM Linter, we hope XML linters can find broader applications.

File Parsing

The Problem

Like other popular linters, our ZUL linter starts by parsing source code into an AST (abstract syntax tree). Although Java provides several libraries for XML parsing, they lose the original line and column numbers of elements in the parsing process. Because the subsequent analysis stage needs this positional information to report compatibility issues precisely, our first task is to find a way to obtain and store the original line and column numbers in the AST.

How We Tackle This

After exploring different online sources, we found a Stack Overflow solution that leverages the event-driven nature of the SAX parser to store the end position of each start tag in the AST. Its key observation was that the parser invokes the startElement method whenever it encounters the closing '>' character of a start tag. Therefore, the parser position returned by the locator must be equal to the end position of the start tag, making the startElement method the perfect opportunity for creating new AST nodes and storing their end positions.

import java.io.File;
import java.util.Stack;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public static Document parse(File file) throws Exception {
  Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
  SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
  parser.parse(file, new DefaultHandler() {
    private Locator _locator;
    private final Stack<Node> _stack = new Stack<>();

    @Override
    public void setDocumentLocator(Locator locator) {
      _locator = locator;
      _stack.push(doc); // the document is the root of the AST
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
      // Create a new AST node
      Element element = doc.createElement(qName);
      for (int i = 0; i < attributes.getLength(); i++)
        element.setAttribute(attributes.getQName(i), attributes.getValue(i));
      // Store its end position (the locator points at the end of the start tag)
      int lineNumber = _locator.getLineNumber(), columnNumber = _locator.getColumnNumber();
      element.setUserData("position", lineNumber + ":" + columnNumber, null);
      _stack.push(element);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      // Attach the finished element to its parent
      Node element = _stack.pop();
      _stack.peek().appendChild(element);
    }
  });
  return doc;
}

Building on the solution above, we implemented a more sophisticated parser capable of storing the position of each attribute. Our parser uses the end positions returned by the locator as reference points, which reduces the task to finding attribute positions relative to the end position. Initially, we started with the simple idea of iteratively finding and removing the last occurrence of each attribute-value pair from the buffer. For example, if <elem attr1="value" attr2="value"> ends at 3:34 (line 3, column 34), our parser will perform the following steps:

1. Initialize buffer = <elem attr1="value" attr2="value">
2. Find buffer.lastIndexOf("value") = 28 → Update buffer = <elem attr1="value" attr2="
3. Find buffer.lastIndexOf("attr2") = 21 → Update buffer = <elem attr1="value"
4. Find buffer.lastIndexOf("value") = 14 → Update buffer = <elem attr1="
5. Find buffer.lastIndexOf("attr1") = 7 → Update buffer = <elem
From steps 3 and 5, with indices counted from 1 like column numbers, we can conclude that attr1 and attr2 start at 3:7 and 3:21, respectively.
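
The peeling steps above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the linter's actual code: the class name AttributePeeler and the method locate are hypothetical, the start tag is assumed to sit on a single line beginning at column 1, and a space is prepended to the buffer so that buffer indices coincide with 1-based column numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the "find and remove the last occurrence" idea
class AttributePeeler {
  // names and values list the attributes in document order;
  // the result maps each attribute name to its 1-based start column
  static Map<String, Integer> locate(String startTag, String[] names, String[] values) {
    // The leading space shifts every index by one, so index == column
    StringBuilder buffer = new StringBuilder(" ").append(startTag);
    Map<String, Integer> columns = new LinkedHashMap<>();
    for (int i = names.length - 1; i >= 0; i--) {
      // Drop everything from the last occurrence of the value onwards
      buffer.setLength(buffer.lastIndexOf(values[i]));
      // The last remaining occurrence of the name is where the attribute starts
      int column = buffer.lastIndexOf(names[i]);
      buffer.setLength(column);
      columns.put(names[i], column);
    }
    return columns;
  }

  public static void main(String[] args) {
    String tag = "<elem attr1=\"value\" attr2=\"value\">";
    // prints {attr2=21, attr1=7}
    System.out.println(locate(tag, new String[] {"attr1", "attr2"}, new String[] {"value", "value"}));
  }
}
```

Working backwards from the end of the tag is what keeps the search unambiguous: once attr2 and its value have been cut off, the remaining "value" can only belong to attr1.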

Then, we further improved the mechanism to handle other formatting variations, such as a single start tag spanning multiple lines and multiple start tags on a single line, by introducing the start index and leading space stacks, which store the buffer indices where new lines start and the number of leading spaces of each line. For example, if there is a start tag that starts on line 1 and ends at 3:20 (line 3, column 20):

<elem attr1="value
    across 2 lines"
    attr2 = "value">

Our parser will perform the following steps:

1. Initialize buffer = <elem attr1="value across 2 lines" attr2 = "value">
2. Initialize startIndexes = [0, 19, 35] and leadingSpaces = [0, 4, 4]
3. Find buffer.lastIndexOf("value") = 45
4. Find buffer.lastIndexOf("attr2") = 36
 → lineNumber = 3, startIndexes = [0, 19, 35], and leadingSpaces = [0, 4, 4]
 → columnNumber = 36 - startIndexes.peek() + leadingSpaces.peek() = 5
5. Find buffer.lastIndexOf("value across 2 lines") = 14
6. Find buffer.lastIndexOf("attr1") = 7
 → Update lineNumber = 1, startIndexes = [0], and leadingSpaces = [0]
 → columnNumber = 7 - startIndexes.peek() + leadingSpaces.peek() = 7
From steps 4 and 6, we can conclude that attr1 and attr2 start at 1:7 and 3:5, respectively.
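
The index-to-position arithmetic in these steps can be isolated in a small sketch. The class PositionMapper and its method toPosition are hypothetical names introduced for illustration, and the countLeadingSpaces shown here is only one plausible way to write such a helper. It assumes the raw lines of one start tag are already available, with the last line cut just before the closing '>', and it uses lists instead of stacks so that the example does not mutate its state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of mapping a buffer index back to "line:column"
class PositionMapper {
  final StringBuilder buffer = new StringBuilder();
  final List<Integer> startIndexes = new ArrayList<>();
  final List<Integer> leadingSpaces = new ArrayList<>();

  PositionMapper(List<String> rawLines) {
    for (String line : rawLines) {
      startIndexes.add(buffer.length());            // where this line starts in the buffer
      leadingSpaces.add(countLeadingSpaces(line));  // indentation removed by stripLeading()
      buffer.append(' ').append(line.stripLeading());
    }
  }

  // Counts the whitespace characters that stripLeading() would remove
  static int countLeadingSpaces(String line) {
    int n = 0;
    while (n < line.length() && Character.isWhitespace(line.charAt(n))) n++;
    return n;
  }

  // firstLine is the line number of rawLines.get(0) in the source file
  String toPosition(int bufferIndex, int firstLine) {
    int line = startIndexes.size() - 1;
    // Walk back to the line whose buffer range contains the index
    while (bufferIndex < startIndexes.get(line)) line--;
    int column = leadingSpaces.get(line) + bufferIndex - startIndexes.get(line);
    return (firstLine + line) + ":" + column;
  }
}
```

For the three-line tag above, new PositionMapper(List.of("<elem attr1=\"value", "    across 2 lines\"", "    attr2 = \"value\"")) yields startIndexes [0, 19, 35] and leadingSpaces [0, 4, 4], and toPosition(36, 1) and toPosition(7, 1) give 3:5 and 1:7.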

The resulting code is shown below:

public void startElement(String uri, String localName, String qName, Attributes attributes) {
  // Initialize buffer, startIndexes, and leadingSpaces. _reader, _readerLineNumber, and
  // _readerCurrentLine are handler fields that track how far the source file has been read.
  StringBuilder buffer = new StringBuilder();
  Stack<Integer> startIndexes = new Stack<>();
  Stack<Integer> leadingSpaces = new Stack<>();
  int endLineNumber = _locator.getLineNumber(), endColNumber = _locator.getColumnNumber();
  for (int i = 0; _readerLineNumber <= endLineNumber; i++, _readerLineNumber++) {
    startIndexes.push(buffer.length());
    if (i > 0) _readerCurrentLine = _reader.readLine();
    buffer.append(' ').append((_readerLineNumber < endLineNumber ? _readerCurrentLine :
            _readerCurrentLine.substring(0, endColNumber - 1)).stripLeading());
    leadingSpaces.push(countLeadingSpaces(_readerCurrentLine));
  }
  _readerLineNumber--;
  // Recover attribute positions, working from the last attribute to the first
  int lineNumber = endLineNumber, columnNumber;
  Element element = doc.createElement(qName);
  for (int i = attributes.getLength() - 1; i >= 0; i--) {
    // Values may span lines, so remove them word by word
    String[] words = attributes.getValue(i).split("\\s+");
    for (int j = words.length - 1; j >= 0; j--)
      buffer.delete(buffer.lastIndexOf(words[j]), buffer.length());
    buffer.delete(buffer.lastIndexOf(attributes.getQName(i)), buffer.length());
    // Step back to the line that now contains the end of the buffer
    while (buffer.length() < startIndexes.peek()) {
      lineNumber--; leadingSpaces.pop(); startIndexes.pop();
    }
    columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek();
    Attr attr = doc.createAttribute(attributes.getQName(i));
    attr.setValue(attributes.getValue(i));
    attr.setUserData("position", lineNumber + ":" + columnNumber, null);
    element.setAttributeNode(attr);
  }
  // Recover element position
  buffer.delete(buffer.lastIndexOf(element.getTagName()), buffer.length());
  while (buffer.length() < startIndexes.peek()) {
    lineNumber--; leadingSpaces.pop(); startIndexes.pop();
  }
  columnNumber = leadingSpaces.peek() + buffer.length() - startIndexes.peek();
  element.setUserData("position", lineNumber + ":" + columnNumber, null);
  _stack.push(element);
}

File Analysis

Now that we have a parser that converts ZUL files into ASTs, we are ready to move on to the file analysis stage. Our ZulFileVisitor class encapsulates the AST traversal logic and delegates the responsibility of implementing specific checking mechanisms to its subclasses. This design allows lint rules to be easily created by extending the ZulFileVisitor class and overriding the visit method for the node type the lint rule needs to inspect.

import java.util.Stack;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ZulFileVisitor {
  private final Stack<Element> _currentPath = new Stack<>();

  // Elements from the root down to the element currently being visited
  protected Stack<Element> getCurrentPath() {
    return _currentPath;
  }

  protected void report(Node node, String message) {
    System.err.println(node.getUserData("position") + " " + message);
  }

  protected void visit(Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      Element element = (Element) node;
      _currentPath.push(element);
      visitElement(element);
      NamedNodeMap attributes = element.getAttributes();
      for (int i = 0; i < attributes.getLength(); i++)
        visitAttribute((Attr) attributes.item(i));
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++)
      visit(children.item(i));
    if (node.getNodeType() == Node.ELEMENT_NODE) _currentPath.pop();
  }

  // Hooks for lint rules; the defaults do nothing
  protected void visitAttribute(Attr node) {}

  protected void visitElement(Element node) {}
}

Conclusion

The Advantages

For simple lint rules such as "row elements not supported," developing an XML linter may seem like overkill when manual checks would suffice. However, as the codebase expands or the number of lint rules increases over time, the advantages of linting quickly become noticeable compared to manual checks, which are both time-consuming and prone to human error.

class SimpleRule extends ZulFileVisitor {
  @Override
  protected void visitElement(Element node) {
    if ("row".equals(node.getTagName()))
      report(node, "`row` not supported");
  }
}

On the other hand, complicated rules involving ancestor elements are where XML linters truly shine. Consider a lint rule that only applies to elements inside certain ancestor elements, such as "row elements not supported outside rows elements." Our linter can efficiently identify the infinite number of variations that match the rule, which cannot be done manually or with a simple file search.

class ComplexRule extends ZulFileVisitor {
  @Override
  protected void visitElement(Element node) {
    if ("row".equals(node.getTagName())) {
      // The current path holds the node itself and all of its ancestors
      boolean outsideRows = getCurrentPath().stream()
        .noneMatch(element -> "rows".equals(element.getTagName()));
      if (outsideRows) report(node, "`row` not supported outside `rows`");
    }
  }
}
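
Rules can hook into attributes in the same way. The sketch below is a self-contained illustration rather than part of the linter: RuleDemo, its nested Visitor (a minimal stand-in for ZulFileVisitor that collects messages instead of printing them), and the rule about a `legacy` attribute inside `grid` are all hypothetical. It also parses with a plain DocumentBuilder, so no positions are attached to the nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RuleDemo {
  // Minimal stand-in for ZulFileVisitor: same traversal, messages go into a list
  static class Visitor {
    final Deque<Element> path = new ArrayDeque<>(); // head = element being visited
    final List<String> reports = new ArrayList<>();

    void visit(Node node) {
      boolean isElement = node.getNodeType() == Node.ELEMENT_NODE;
      if (isElement) {
        Element element = (Element) node;
        path.push(element);
        visitElement(element);
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++)
          visitAttribute((Attr) attributes.item(i));
      }
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++)
        visit(children.item(i));
      if (isElement) path.pop();
    }

    void visitElement(Element element) {}

    void visitAttribute(Attr attr) {}
  }

  // Hypothetical rule: flag `legacy` attributes on descendants of a `grid` element
  static class AttributeRule extends Visitor {
    @Override
    void visitAttribute(Attr attr) {
      // skip(1) leaves out the owning element so that only true ancestors are checked
      boolean insideGrid = path.stream().skip(1)
          .anyMatch(ancestor -> "grid".equals(ancestor.getTagName()));
      if ("legacy".equals(attr.getName()) && insideGrid)
        reports.add("`legacy` not supported inside `grid` (on `" + path.peek().getTagName() + "`)");
    }
  }

  static List<String> lint(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    AttributeRule rule = new AttributeRule();
    rule.visit(doc);
    return rule.reports;
  }
}
```

Collecting messages in a list rather than writing to System.err is a design choice made here so that a rule can be exercised from a unit test.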

Now It's Your Turn

Although XML linting is not widely adopted in the software industry, we hope that our ZK Client MVVM Linter, which helps us automate migration assessment, demonstrates the benefits of XML linting and perhaps even helps you develop your own XML linter.