Stripping Strings – A Python Utility

I do a lot of work with various external devices that communicate via a command-response protocol using ASCII strings.  Unfortunately some of these gadgets don’t have symmetrical responses, and some even include characters that, while they are valid ASCII values, don’t really make any sense (unless it was someone just trying to be clever without thinking things through completely).

So I needed a utility in Python that I could use to strip out the “junk” in a response string and just preserve the important data. The result is Stripper, which is shown below.

def Stripper(instr, pack = True, sense_etx = False):
  """ Removes non-printable characters and whitespace from a string.
      Feb 2009 - JMH
  """
  st = 0          # character slice start position index
  en = st + 1     # character slice end position index
  char_list = []  # result array

  # ignore non-string input
  if type(instr) == str:
    # get the input string length
    cnt = len(instr)

    # don't do null strings
    if cnt > 0:
      while True:
        char = instr[st:en]
        # check for ETX and exit if found
        if sense_etx and (ord(char) == 3):
          break
        if not pack:
          # convert tab to space
          if ord(char) == 9:
            char = chr(32)
          # if a valid character, save it
          if ord(char) >= 32 and ord(char) <= 126:
            char_list.append(char)
        else:
          # save only characters (toss spaces, too)
          if ord(char) >= 33 and ord(char) <= 126:
            char_list.append(char)
        # set up for next char in the string
        st += 1
        en += 1
        # see if we've walked off the end of the string
        if en > cnt:
          break
      #end while
  return ''.join(char_list)

The following description was originally in the docstring (I use epydoc, and I like verbose descriptions) but, for whatever reason, the ‘python’ sourcecode tag was mangling it. So I put it here:

This utility function is intended for situations where an external device connected to a serial port might send back additional “stuff” in a response string that either isn’t useful or is problematic (like \xff at the start of a message string).

The parameter instr is normally a non-empty string whose elements may contain any value between 0 and 255. If an empty string is passed in then the function will return an empty string. If instr is not a string then the function will also return an empty string.

Stripper removes all non-printable characters from a string by simply discarding them. If pack is False then everything with an ASCII value between 32 (space) and 126 (‘~’) is preserved, and tabs (value = 9) are converted to a single space (one space per tab character). If the input parameter ‘pack’ is True, then spaces and tabs are also tossed out. The default is to pack the output string.
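The per-character rule described above can be sketched in isolation. This is only an illustration: keep_char is a hypothetical helper name that does not appear in Stripper, and the bounds (32 or 33 through 126) are the ones used in the function's code.

```python
def keep_char(char, pack=True):
    """Return the character to keep, or '' if it should be discarded.

    keep_char is a hypothetical helper used only for illustration.
    """
    if not pack and char == '\t':
        char = ' '               # a tab becomes one space when not packing
    lowest = 33 if pack else 32  # packing also discards the space (32)
    return char if lowest <= ord(char) <= 126 else ''

# Show how a few representative characters are treated in each mode.
for ch in ['\t', ' ', 'A', '~', '\x7f', '\xff']:
    print(repr(ch), repr(keep_char(ch, pack=False)), repr(keep_char(ch, pack=True)))
```

Note that DEL (\x7f, value 127) falls outside the preserved range in both modes, so it is discarded along with the other control characters.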

If the parameter ‘sense_etx’ is True then an ETX character (\x03) in the input string is treated as the end of the transmission: the function stops scanning and returns whatever it has collected up to that point. Otherwise the input string is scanned until the entire string has been examined (the default behavior).
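The effect of stopping at ETX can be shown on its own with a minimal sketch. The name cut_at_etx is hypothetical, and the use of str.partition is a stand-in for the character-by-character scan in Stripper, not the way Stripper itself does it.

```python
ETX = '\x03'

def cut_at_etx(instr):
    """Return the part of instr before the first ETX (all of it if none).

    cut_at_etx is a hypothetical helper used only for illustration.
    """
    head, _sep, _tail = instr.partition(ETX)
    return head

# Printable characters after the ETX are dropped along with the ETX itself.
print(repr(cut_at_etx('OK\x03junk')))
# A string without an ETX comes back unchanged.
print(repr(cut_at_etx('OK')))
```

The flag only changes the result when printable characters follow the ETX. If nothing but CR and LF trail it, the output is the same with or without sense_etx.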


Input : '\xff\t/\t0 @\x00\x03\r\n'
pack = False, sense_etx = True
Output: ' / 0 @'
pack = False, sense_etx = False
Output: ' / 0 @'
pack = True, sense_etx = False
Output: '/0@'
pack = True, sense_etx = True
Output: '/0@'

Note that in every case the \xff, \x00, ETX, CR and LF (\r and \n) characters are absent from the output string.

This may not be the most Pythonic way of doing this but I wasn’t aiming to be clever. I wanted the code to be easily comprehensible and easily verified. There is almost always a trade-off between compact cleverness and transparency, and I will almost always opt for transparency.
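For comparison only, here is a rough sketch of what a more compact variant might look like. It assumes Python 3, and stripper_compact is a hypothetical name rather than part of the original utility. It also uses isinstance, so unlike the type() check in Stripper it accepts subclasses of str.

```python
def stripper_compact(instr, pack=True, sense_etx=False):
    """Compact variant of the same filtering rules (hypothetical name)."""
    if not isinstance(instr, str):
        return ''                            # non-string input gives ''
    if sense_etx:
        instr = instr.partition('\x03')[0]   # keep only what precedes ETX
    if not pack:
        instr = instr.replace('\t', ' ')     # one space per tab
    lowest = 33 if pack else 32              # packing also drops spaces
    return ''.join(c for c in instr if lowest <= ord(c) <= 126)

sample = '\xff\t/\t0 @\x00\x03\r\n'
print(repr(stripper_compact(sample, pack=False, sense_etx=True)))  # ' / 0 @'
print(repr(stripper_compact(sample, pack=True)))                   # '/0@'
```

The shorter version hides the scanning loop inside partition, replace and a generator expression, which is exactly the kind of trade against step-by-step transparency described above.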

