IMPALA-12045: Strip ANSI escape sequences for JUnitXML

ANSI escape sequences perform a variety of actions in the
terminal, such as adding color to compilation warnings.
generate_junitxml.py currently hits an error when trying
to generate JUnitXML for compilation output that contains
ANSI escape sequences.

This changes generate_junitxml.py to strip ANSI
escape sequences from the strings incorporated into
JUnitXML (e.g. the error output of a compiler).
The solution is based on the discussion at:
https://stackoverflow.com/questions/14693701
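
As a rough illustration of the stripping described above, here is a
minimal standalone sketch that uses the same regular expression as the
patch. The sample compiler line (foo.cc and its warning text) is a
made-up assumption for demonstration, not output from a real build,
and the function is shown at module level rather than as the
JunitReport static method.

```python
import re

# Same pattern as in the patch: ESC followed by either a single-character
# escape or a CSI sequence (ESC '[' parameters, intermediates, final byte).
ANSI_ESCAPE = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')


def remove_ansi_escape_sequences(string):
    """Return the string with ANSI escape sequences removed."""
    return ANSI_ESCAPE.sub('', string)


# Hypothetical colored compiler warning (bold location, magenta "warning:").
colored = ("\x1b[1mfoo.cc:12:5: \x1b[0m\x1b[0;1;35mwarning: "
           "\x1b[0munused variable 'x'")
plain = remove_ansi_escape_sequences(colored)
print(plain)
```

Only the escape sequences are removed, so the warning text itself
should come through unchanged.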

Testing:
 - A case where generate_junitxml.py was failing to
   generate JUnitXML now generates valid JUnitXML.
   The output still contains all the compiler warnings
   and information needed to diagnose the issue.

Change-Id: I9654a6b13350cb9582ec908b8807b630636a1ed0
Reviewed-on: http://gerrit.cloudera.org:8080/19708
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Wenzhe Zhou <wzhou@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Joe McDonnell
2023-04-05 16:55:03 -07:00
parent 3e0a422c2e
commit 2f73239607

@@ -26,6 +26,7 @@ import argparse
import codecs
import errno
import os
+import re
import textwrap
from xml.dom import minidom
from xml.etree import ElementTree as ET
@@ -170,6 +171,22 @@ class JunitReport(object):
return junit_log_file
+@staticmethod
+def remove_ansi_escape_sequences(string):
+"""
+Remove ANSI escape sequences from this string.
+ANSI escape sequences customize terminal output by adding colors, etc.
+Compilers use them to add color to error messages. ANSI escape
+sequences interfere with producing the JUnitXML (and do not add any
+value for JUnitXML), so this function strips them.
+See https://stackoverflow.com/questions/14693701 for more information
+on this solution.
+"""
+ansi_escape = re.compile(r'\x1B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])')
+return ansi_escape.sub('', string)
@staticmethod
def get_xml_content(file_or_string=None):
"""
@@ -196,7 +213,7 @@ class JunitReport(object):
# This is a string passed in on the command line. Make sure to return it as
# a unicode string.
content = unicode(file_or_string, encoding="UTF-8")
-return content
+return JunitReport.remove_ansi_escape_sequences(content)
def __unicode__(self):
"""