Today: string in, if/elif, string .find(), slices and drawing

See chapters in Guide: String - If

String in Test

>>> 'Dog' in 'CatDogBird'
True
>>> 'dog' in 'CatDogBird'   # upper vs. lower case
False
>>> 'd' in 'CatDogBird'     # finds d at the end
True
>>> 'atD' in 'CatDogBird'   # not picky about words
True
>>> 
>>> 'x' in 'CatDogBird'
False

Variant: not in

There's also a not in form, which is True if the element is not in the string, similar to !=. Use this form in an if-statement where you want to take an action if something is not in a string.

>>> s = 'CatDogBird'
>>> if 'Fish' not in s:     # YES this way
        print('no Fish')
no Fish
>>>
>>> if not 'Fish' in s:     # NO: works, but not PEP8 style
        print('no Fish')
no Fish
>>>

In Python style, the first form above - not in - is officially preferred to the second.

Example: has_pi()

> has_pi()

    '3.14' -> True
'a 14 b 3' -> True
     '315' -> False

has_pi(s): Given a string s, return True if it contains the substrings '3' and '14' somewhere within it, but not necessarily together. Use "in".

Note: these functions are in the string-3 section on the Experimental server.

has_pi() Expression - Does This Work?

This looks reasonable.

if '3' and '14' in s:
    ... 

Q: Does the above work?

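A: No, it does not work. Python reads it as '3' and ('14' in s). The non-empty string '3' counts as True, so the whole test comes down to '14' in s, and the '3' is never actually checked.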

This example looks sensible if we think of it as a phrase of English. So it is a bit tragic that it does not work in Python or most computer languages.

Unlike English text, the and must be between two boolean values, with each boolean value produced by an expression like < or == or in. The correct form is shown below. Notice how each side of the and is a free-standing expression that produces a boolean.

if '3' in s and '14' in s:  # Works correctly
    ...
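A minimal sketch of has_pi() using the working form, plus a side-by-side of the broken form on an input that exposes it. Writing it as a single return of the boolean is one choice; an if-statement that returns True or False would work the same way.

```python
def has_pi(s):
    """Return True if s contains both '3' and '14', not necessarily together."""
    # Each side of "and" is a complete "in" test that produces a boolean
    return '3' in s and '14' in s


print(has_pi('3.14'))      # True
print(has_pi('a 14 b 3'))  # True
print(has_pi('315'))       # False

# The broken form: Python reads it as '3' and ('14' in s)
s = 'x14'
print('3' and '14' in s)       # True, even though s has no '3'
print('3' in s and '14' in s)  # False
```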

Strategy: Prefer Built In Functions

We prefer using Python built-in functions such as "in" vs. writing the code yourself. It would be a mistake to manually write code to look through a string to see if another string appears in there. Just use "in" for that. The built-in works correctly, and it makes readable code, since Python programmers are already familiar with "in" and know what it does at a glance.

The "in" operator works for several data structures to see if a value is in there, and its use with strings is our first example of it. Also, in some cases, the built-in can run faster than what you could code yourself.

Later Practice: has_first()

> has_first()


Example: catty()

> catty()

'xaCtxyzAx' -> 'aCtA'

Return a string made of the chars from the original string whenever the chars are one of 'c', 'a', 't' (not case sensitive).

Catty Solution V1

Not case sensitive: convert each char to lowercase form with s[i].lower(), then do testing with the lowercase form.

This works correctly, but the if-test is quite long, so long that it is awkward to fit on screen. Can we do better?

Aside: see the style guide section on breaking up long lines for ways to handle long lines like this.

def catty(s):
    result = ''
    for i in range(len(s)):
        if s[i].lower() == 'c' or s[i].lower() == 'a' or s[i].lower() == 't':
            result += s[i]
    return result
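One way to break up the long V1 test, sketched under PEP 8's advice to prefer implicit line continuation inside parentheses (with the operator starting each continued line), is shown below. The logic is unchanged from V1; only the layout differs.

```python
def catty(s):
    result = ''
    for i in range(len(s)):
        # Inside parentheses, Python lets the expression continue on the next line
        if (s[i].lower() == 'c'
                or s[i].lower() == 'a'
                or s[i].lower() == 't'):
            result += s[i]
    return result


print(catty('xaCtxyzAx'))  # aCtA
```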

Strategy Idea: Add Var

Catty Solution V2 - Better

Start with the V1 code. Add a variable to hold the repeated computation. This shortens the code, and it "reads" better with the new variable.

    low = s[i].lower()

Solution with "low" variable, better

def catty(s):
    result = ''
    for i in range(len(s)):
        low = s[i].lower()   # Add var
        if low == 'c' or low == 'a' or low == 't':
            result += s[i]
    return result

We will make frequent use of this strategy in CS106A. If the solution is getting a little lengthy, add a variable to hold some sub-part of the computation for use on later lines.

Style: Variable Names

We'll talk about Style more later. For today, the name of a variable should label that value in the code, helping the programmer to keep their ideas straight. Other than that, being short is a great quality in a variable name. Typing and reading long variable names feels like a drag on the coding effort. The name does not need to repeat every true thing about the value. Just enough to distinguish it from other values in this algorithm.

1. Good names for this example, short but with key facts

# Good names
low
low_char

2. Names that are too long or too short:

# Too long
low_char_i
low_char_in_s

# Too short, cryptic
a
c

3. Avoid this name: lower - the name would work, but we avoid choosing a name for a variable that is already the name of a function, to avoid confusion. Here .lower() is the name of a string function.

# Avoid names of functions
lower
len

The V1 code above is acceptable, but V2 is shorter and nicer. The V2 code also runs slightly faster, as it does not needlessly re-compute the lowercase form three times per char.

Optional Aside: "in" Trick Form of "or"

This is just a coding trick, not something we would ever require or look for students to do. Because of the way "in" works for strings, it can do the "or" logic for us, like this:

# This works
if low == 'c' or low == 'a' or low == 't':
    ...

# Trick with "in", equivalent
if low in 'cat':
    ...
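For comparison, here is a sketch of catty() V2 rewritten with the "in" trick. It relies on low always being exactly one char.

```python
def catty(s):
    result = ''
    for i in range(len(s)):
        low = s[i].lower()
        # low is a single char, so "low in 'cat'" is True exactly when
        # low is 'c', 'a', or 't'
        if low in 'cat':
            result += s[i]
    return result


print(catty('xaCtxyzAx'))  # aCtA
```

The equivalence depends on low being one char: 'ca' in 'cat' and '' in 'cat' are also True, so the trick would not match the "or" version for longer or empty strings.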

Recall: if and if/else:

N Tests - if/elif

Use the if/elif structure to look through a series of tests, stopping at the first True test. This is much more rarely used than the plain if-statement.

The sequence is akin to looking through a series of drawers for a pen: you look in each drawer in turn and stop as soon as you find the pen.

The structure has n if-tests.

if test1:
  action1
elif test2:
  action2
elif test3:
  action3
else:
  action4

Python goes through the tests from top to bottom, stopping at the first True test. Python runs the corresponding action, and then exits the if/elif structure. The result is that at most 1 of the n actions runs. An optional "else" at the end runs if none of the tests succeed. Mnemonic: the words "else" and "elif" are the same length.
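A small sketch of the "at most one action runs" rule. The function size_label() is a hypothetical example, not from the course code. For n = 150, both n > 100 and n > 10 are True, but only the first True test's action runs.

```python
def size_label(n):
    """Hypothetical example: label a number by size using if/elif."""
    if n > 100:
        label = 'huge'
    elif n > 10:      # only checked if n > 100 was False
        label = 'big'
    elif n > 0:
        label = 'small'
    else:             # runs only if none of the tests succeed
        label = 'not positive'
    return label


print(size_label(150))  # huge
print(size_label(50))   # big
print(size_label(5))    # small
print(size_label(0))    # not positive
```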

Example: vowel_swap()

> vowel_swap()

The need for an if/elif structure is a little rare, but this problem is dialed in to show what if/elif solves.

The most common letters used in English text are: e, t, a, i, o, n

Here we process string s, swapping around the 3 most common vowels like this:

e -> a
a -> i
i -> e

This changes an English word in a way that looks like a word and is kind of funny.

'table' -> 'tibla'
'kitten' -> 'kettan'
'radio' -> 'rideo'

vowel_swap(s): Given string s. We'll swap around the three most common vowels in English, which are 'e', 'a', and 'i'. Return a form of s where each lowercase 'e' is changed to 'a', each 'a' is changed to 'i', and each 'i' is changed to 'e'. Other chars are left unchanged. So the word 'kitten' returns 'kettan'. The provided loop sets a variable ch to hold each char in turn, appending ch to the result. Add code to change ch.

vowel_swap() v1 Code

The provided loop sets a variable ch to be each char in turn. This solution is written with plain "if" to check and change each char. Plain if works fine in many circumstances, but for this algorithm, it runs into a subtle problem.

def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        ch = s[i]
        # Make changes to ch
        if ch == 'e':
            ch = 'a'
        if ch == 'a':
            ch = 'i'
        if ch == 'i':
            ch = 'e'
        
        result += ch
    return result

Run this code. Here is some incorrect output it produces:

'aaaa' -> 'eeee'

Why is 'a' replaced by 'e' instead of the expected 'i' here?

Problem Trace - Multiple If Interference

The problem is not obvious glancing at the code. Trace through the v1 code carefully for the input 'aaaa'. The ch == 'a' if-test succeeds, which is fine. But then the ch == 'i' test also succeeds, which is a problem. We have multiple if-tests, and they are interfering with each other.
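One way to see the interference is to run the v1 tests on a single char and print ch after each test. The helper vowel_swap_trace() is hypothetical and exists only for this trace.

```python
def vowel_swap_trace(ch):
    """Run the v1 plain-if tests on one char, printing ch after each test."""
    if ch == 'e':
        ch = 'a'
    print('after e-test:', ch)
    if ch == 'a':
        ch = 'i'
    print('after a-test:', ch)
    if ch == 'i':
        ch = 'e'
    print('after i-test:', ch)
    return ch


vowel_swap_trace('a')
# after e-test: a
# after a-test: i
# after i-test: e
```

The same trace shows that an 'e' is also broken: it becomes 'a', then 'i', then 'e' again, so it comes back unchanged.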

vowel_swap() Solution if/elif

With if/elif, only one if-test succeeds, which is what we want for this 'e' 'a' 'i' detection:

def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        ch = s[i]
        # Make changes to ch
        if ch == 'e':
            ch = 'a'
        elif ch == 'a':
            ch = 'i'
        elif ch == 'i':
            ch = 'e'
        
        result += ch
    return result

if/elif vs. if/return

A return can accomplish something similar to the if/elif structure, which is why we have not really needed if/elif up until now. Suppose we are doing the vowel-swap algorithm, but in a function that processes a single char. This is our pick-off strategy, exiting the function once a solution is known.

def swap_ch(ch):
    """Vowel-swap on one char."""
    if ch == 'e':
        return 'a'
    if ch == 'a':
        return 'i'
    if ch == 'i':
        return 'e'
    return ch

Since the return exits the function, we get in effect the if/elif behavior. Once an if-test succeeds, the later ones are skipped.

However, the full-string vowel_swap() above cannot use return like this, as it needs to keep running the loop to do the other characters. We need to handle each char in the loop but without leaving the function, and for that, the if/elif is perfect.
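A related option, sketched here as an alternative rather than the approach above: keep the loop in vowel_swap() and call swap_ch() for each char. The return inside swap_ch() exits only the helper, so the loop in vowel_swap() keeps going.

```python
def swap_ch(ch):
    """Vowel-swap on one char."""
    if ch == 'e':
        return 'a'
    if ch == 'a':
        return 'i'
    if ch == 'i':
        return 'e'
    return ch


def vowel_swap(s):
    result = ''
    for i in range(len(s)):
        # The return inside swap_ch() leaves the helper only, not this loop
        result += swap_ch(s[i])
    return result


print(vowel_swap('kitten'))  # kettan
```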

Later Practice: str_adx()

> str_adx()

str_adx(s): Given string s. Return a string of the same length. For every alphabetic char in s, the result has an 'a', for every digit a 'd', and for every other type of char the result has an 'x'. So 'Hi4!x3' returns 'aadxad'. Use an if/elif structure.


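String .find()

s.find(target) searches s for target and returns the index where it first appears, or -1 if it does not appear. Like "in", it is case sensitive.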

[Figure: string 'Python' shown with index numbers 0..5]

>>> s = 'Python'
>>> 
>>> s.find('th')
2
>>> s.find('o')
4
>>> s.find('y')
1
>>> s.find('x')
-1
>>> s.find('N')
-1
>>> s.find('P')
0
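A short sketch of two details: .find() reports only the first match, and since -1 means "not found", code usually checks for -1 before using the index.

```python
s = 'banana'
print(s.find('a'))     # 1  (only the first 'a' is reported)
print(s.find('nan'))   # 2
print(s.find('A'))     # -1 (case sensitive, like "in")

# A common pattern: check for -1 before using the index
i = s.find('n')
if i != -1:
    print('found n at', i)   # found n at 2
```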

Strategy: Dense = Slow Down

Python String Slices 1

[Figure: string 'Python' shown with index numbers 0..5]

>>> s = 'Python'
>>> s[1:3]    # 1 up to but not including 3 (UBNI)
'yt'
>>> s[1:5]
'ytho'
>>> s[4:5]
'o'
>>> s[4:4]    # Empty string
''
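Because the end index is not included, a slice s[start:end] (with both in range) holds exactly end - start chars. A quick sketch to check this:

```python
s = 'Python'
start = 1
end = 5
sub = s[start:end]      # chars at index 1, 2, 3, 4 (not 5)
print(sub)              # ytho
print(len(sub))         # 4, which is end - start
print(s[4:4] == '')     # True: start == end gives the empty string
```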

Slices 2 - Can Omit Start/End Numbers

[Figure: string 'Python' shown with index numbers 0..5]

>>> s[:3]     # Omit num = from/to end
'Pyt'
>>> s[:4]
'Pyth'
>>> s[4:]     # Split str at 4
'on'
>>> s[4:999]  # Too big = through the end
'on'
>>> s[:]      # The whole thing
'Python'
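The "split at 4" comment points to a handy property: s[:i] and s[i:] share the same index i, so gluing them back together always gives the original string. A sketch that checks every split point:

```python
s = 'Python'
for i in range(len(s) + 1):
    # Splitting at any index i and gluing the halves back gives the original
    before = s[:i]
    after = s[i:]
    assert before + after == s
print(s[:4], s[4:])   # Pyth on
```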

brackets() Strategy - Drawing vs. OBO Errors

This is a nice example. The code is dense, but the details can be managed with careful use of variables and a drawing. This code can easily fail with Off-By-One (OBO) errors, so we proceed carefully and get each line exactly right. Or more simply: don't try to do it in your head.

Example: brackets()

> brackets()

'cat[dog]bird' -> 'dog'

brackets(s): Look for a pair of brackets '[...]' within s, and return the text between the brackets, so the string 'cat[dog]bird' returns 'dog'. If there are no brackets, return None. If the brackets are present, there will be only one of each, and the right bracket will come after the left bracket.

A first venture into using index numbers and slices. Many problems work in this domain - e.g. extracting all the hashtags from your text messages.

Brackets Drawing

[Figure: drawing of 'cat[dog]bird' showing left and right, before arrows are added]

Brackets Observations

Brackets Drawing After

[Figure: drawing of 'cat[dog]bird' showing left and right, with arrows added pointing into the string]

Brackets Solution - Readable

def brackets(s):
    left = s.find('[')
    if left == -1:
        return None
    right = s.find(']')
    return s[left + 1:right]
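To see why the +1 matters, here is a sketch that prints the index values for 'cat[dog]bird' and compares the slice with and without it. Since the end index is not included, right needs no adjustment.

```python
def brackets(s):
    left = s.find('[')
    if left == -1:
        return None
    right = s.find(']')
    return s[left + 1:right]


s = 'cat[dog]bird'
left = s.find('[')    # 3
right = s.find(']')   # 7
print(s[left:right])      # [dog   (off by one: includes the '[')
print(s[left + 1:right])  # dog
print(brackets('no brackets'))  # None
```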

For programming style, we prefer "readable" code: when the eye sweeps over the code, what the code does is apparent. This code is quite dense, but the variable names do help. Look at the last line. You can see how it uses the index numbers of the left and right brackets, even if the exact OBO details of those numbers take some puzzling over.

(optional) Brackets Without Add-Var

The variables left and right are a natural example of the Add Var strategy. It pulls important parts of the algorithm into variables, so the names of the variables sort of narrate the lines as you look at them.

Just for comparison, here's the code with no variables. It works fine, and it's one line shorter, but the readability is clearly worse. It also likely runs a little slower, as it computes the left-bracket index twice.

def brackets(s):
    if s.find('[') == -1:
        return None
    return s[s.find('[') + 1:s.find(']')]

Code that is unreadable is more likely to have bugs, so we prefer the readable version, with its variables labeling the parts of the computation.

Exercise: inside3x()

> inside3x()

'hi((yo))bye' -> 'yo,yo,yo'

inside3x(s): Given a string that may contain a pair of double parentheses, like 'aa((bbb))cc'. There is some text inside the parentheses and some before and after. Return a string like 'bbb,bbb,bbb', made of three copies of the inside text separated by commas. The string is guaranteed to either contain the double parentheses in the correct order or contain no parentheses at all. The starting code includes the two s.find() calls.

Hint: make a drawing. Pull out the text inside and store it in a variable "text". Use + to put together the result string. Add an if-statement to pick off the case where there are no parentheses. We cannot use "in" as a variable name, since it is a reserved Python keyword.

inside3x() Solution

def inside3x(s):
    left = s.find('((')
    right = s.find('))')
    if left == -1:
        return None
    text = s[left + 2:right]  # Add var strategy
    return text + ',' + text + ',' + text
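A sketch of the index values behind the +2 for 'hi((yo))bye'. Since '((' is two chars, skipping only one of them leaves a stray '(' in the result.

```python
s = 'hi((yo))bye'
left = s.find('((')    # 2: index of the first '('
right = s.find('))')   # 6: index of the first ')'
print(s[left + 1:right])   # (yo    off by one: +1 skips only one '('
print(s[left + 2:right])   # yo     +2 skips both chars of '(('
```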

Later Practice: at_3()

> at_3()

Here is a more difficult problem, similar to brackets(), for you to try. A drawing really helps with the OBO on this one.

Milestone-1 - get the 'abc' output below, not worrying about whether the input is too short

Milestone-2 - add logic for the too-short case. Note the i < len(s) valid-index idea below.

at_3(s): Given string s. Find the first '@' within s. Return the len-3 substring immediately following the '@'. Except, if there is no '@' or there are not 3 chars after the '@', return None.

'xx@abcd' -> 'abc'
'xxabcd' -> None
'x@x' -> None

at_3() Hint: Valid Index i < len(s)
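A sketch of the valid-index idea on the short input 'x@x', without solving at_3() itself. An index i is valid for s when 0 <= i < len(s), so len(s) itself is one past the end. Indexing past the end is an error, while a slice past the end simply comes back shorter.

```python
s = 'x@x'
# Index i is valid for s when 0 <= i < len(s)
for i in [0, 2, 3, 5]:
    print(i, i < len(s))
# 0 True
# 2 True
# 3 False
# 5 False

# Indexing with an invalid index raises IndexError...
try:
    s[3]
except IndexError:
    print('s[3] is an IndexError')

# ...but a slice past the end is not an error, it just gets shorter
print(s[2:5])     # x
```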


More s.find() if we have time...

s.find() 2 Param Form

s.find() variant with 2 params: s.find(target, start_index) starts the search at start_index instead of at index 0. It returns -1 if not found, as usual. Use it to search the string starting from a particular index.

Suppose we have the string '[xyz['. How to find the second '[' which is at 4? Start the search at 1, just after the first bracket:

>>> s = '[xyz['
>>> s.find('[')      # find first [
0
>>> s.find('[', 1)   # start search at 1
4
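A common use of the 2-param form is to keep searching just past the previous match. Here is a minimal sketch that counts matches this way. The function name count_occurrences() is hypothetical.

```python
def count_occurrences(s, target):
    """Hypothetical helper: count how many times target is found in s,
    using the 2-param form of s.find()."""
    count = 0
    i = s.find(target)
    while i != -1:
        count += 1
        # Start the next search just past the start of this match
        i = s.find(target, i + 1)
    return count


print(count_occurrences('[xyz[', '['))   # 2
print(count_occurrences('banana', 'a'))  # 3
```

Starting the next search at i + 1 means overlapping matches are counted: 'ana' is found twice in 'banana', at 1 and at 3.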

Exercise: parens()

> parens()

'x)x(abc)xxx' -> 'abc'

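This is a nice, realistic string problem with a little logic in it.

Think about this input: '))(abc)'. A plain s.find(')') finds the ')' at index 0, which comes before the left paren, so the search for the right paren needs to start at the left paren.

Hint: here is some starting code to find the right paren after the left paren: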

    left = s.find('(')
    ...
    right = s.find(')', left)

1. This is fine:
right = s.find(')', left)

Is there a right parenthesis at index left? No, it is not possible for a right parenthesis to be at that exact index. We already know that index holds a left parenthesis.

2. Therefore, could write it this way, moving the search for the right parenthesis 1 index farther along:
right = s.find(')', left + 1)

We can appreciate having the sort of analytical mind that can work out that (2) will work. That said, keeping things as simple as possible (KISS) is a great strategy for code, so simply writing (1) is probably best.
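A sketch of the hint code on the tricky input '))(abc)', showing why the plain search goes wrong and that forms (1) and (2) give the same answer here.

```python
s = '))(abc)'
left = s.find('(')            # 2
print(s.find(')'))            # 0: finds a ')' before the left paren
right = s.find(')', left)     # 6: search starts at the left paren
print(s[left + 1:right])      # abc
print(s.find(')', left + 1))  # 6: same answer as form (1)
```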

Optional: Negative Slice

[Figure: negative index numbers into a string]

>>> s = 'Python'
>>> s[len(s) - 1]
'n'
>>> s[-1]  # -1 is the last char
'n'
>>> s[-2]
'o'
>>> s[-3]
'h'
>>> s[1:-3]  # works in slices too
'yt'
>>> s[-3:]
'hon'
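A sketch of the rule behind negative indexes, s[-i] is shorthand for s[len(s) - i], plus two common slice idioms built on it.

```python
s = 'Python'
# s[-i] is shorthand for s[len(s) - i]
for i in range(1, len(s) + 1):
    assert s[-i] == s[len(s) - i]
print(s[-2:])    # on   (the last 2 chars)
print(s[:-2])    # Pyth (all but the last 2 chars)
```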