Strings

A string is a data type: a series of letters, numbers, and even other special characters from the keyboard. Strings are very similar to lists.

phrase = "This is a string example"
example_of_string = "this is another string example"
type(phrase) # <class 'str'>
phrase + example_of_string # 'This is a string examplethis is another string example'

As you can see, by using the + sign, we merged the values of the two string variables. This process is called concatenation of strings or simply concatenation. + is the concatenation operator.
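
Notice that + glues the two strings together exactly as they are, without adding a space. Below is a minimal sketch of adding the separator yourself; the variable names (first, second, count) are made up for this illustration.

```python
first = "This is a string example"
second = "this is another string example"

# + joins the strings exactly as they are, so the space is added explicitly
combined = first + " " + second
print(combined)

# A number has to be converted with str() before it can be concatenated
count = 2
message = "Number of strings: " + str(count)
print(message)
```

Without the str() call, the last concatenation would raise a TypeError, because + cannot mix a string and a number.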

phrase - example_of_string
# TypeError: unsupported operand type(s) for -: 'str' and 'str'

The - operator is not valid for strings, but it can be used for numbers. Try using + and - on two numbers.

Working with strings

split() function

The split() function can be used to split a string into several smaller strings and put all of them in a single list.

address = "1600 Pennsylvania Ave NW Washington, DC"

# .split() is a string method (a function that works only on strings) that splits a string into a list based on some delimiter.
# In this example, we're splitting address into a list at every space.
address = address.split(" ")

# Address is now a list equal to:
# ['1600', 'Pennsylvania', 'Ave', 'NW', 'Washington,', 'DC']
# Note that the list created is a list of strings.

# And since it's a list, you can loop over it!

# .split() is commonly used to split text files into a list (at each newline)
# .split() is also commonly used to split spreadsheet files in comma separated value (CSV) format into a list (at each comma)

# Any time you need to split a string into multiple parts, you can use .split()
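
The comments above mention splitting at commas and at newlines. Here is a small sketch of both cases; the sample data (csv_line, text) is invented for this example and is not taken from a real file.

```python
# Hypothetical sample data, made up for this sketch
csv_line = "Alabama,Montgomery,1819"
fields = csv_line.split(",")  # split at every comma
print(fields)  # ['Alabama', 'Montgomery', '1819']

text = "first line\nsecond line\nthird line"
lines = text.split("\n")  # split at every newline
print(lines)  # ['first line', 'second line', 'third line']

# With no argument, split() splits on any run of whitespace
words = "  several   spaces  here ".split()
print(words)  # ['several', 'spaces', 'here']
```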


join() function

The join() function can be used to join the strings from a list into a single string.

states = ["Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", "Delaware", "District Of Columbia", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon", "PALAU", "Pennsylvania", "PUERTO RICO", "Rhode Island", "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming"]

# .join() is a string method (a function that works only on strings) that glues together a list back into a string.

# .join() has two main parts to it: the glue and the list

# The glue is the string that you'd like to glue in between each piece of the list as you're putting it back together as a string.
# The glue is the string that goes just before the dot.

# The list is the list you'd like to glue back together.

# So if we ran the following command:
print ("glue".join(states))

# We'd get:
#AlabamaglueAlaskaglueArizonaglueArkansasglueCaliforniaglueColoradoglueConnecticutglueDelawareglueDistrict Of ColumbiaglueFloridaglueGeorgiaglueHawaiiglueIdahoglueIllinoisglueIndianaglueIowaglueKansasglueKentuckyglueLouisianaglueMaineglueMarylandglueMassachusettsglueMichiganglueMinnesotaglueMississippiglueMissouriglueMontanaglueNebraskaglueNevadaglueNew HampshireglueNew JerseyglueNew MexicoglueNew YorkglueNorth CarolinaglueNorth DakotaglueOhioglueOklahomaglueOregongluePALAUgluePennsylvaniagluePUERTO RICOglueRhode IslandglueSouth CarolinaglueSouth DakotaglueTennesseeglueTexasglueUtahglueVermontglueVirginiaglueWashingtonglueWest VirginiaglueWisconsinglueWyoming

# Funny (and helpful for remembering), but that doesn't look very good.

# Instead, let's use a useful piece of glue.  Let's glue it back together with a newline between each state.
print ("\n".join(states))

# Now we get:
# Alabama
# Alaska
# Arizona
# Arkansas
# California
# Colorado
# Connecticut
# Delaware
# District Of Columbia
# Florida
# Georgia
# Hawaii
# Idaho
# Illinois
# Indiana
# Iowa
# Kansas
# Kentucky
# Louisiana
# Maine
# Maryland
# Massachusetts
# Michigan
# Minnesota
# Mississippi
# Missouri
# Montana
# Nebraska
# Nevada
# New Hampshire
# New Jersey
# New Mexico
# New York
# North Carolina
# North Dakota
# Ohio
# Oklahoma
# Oregon
# PALAU
# Pennsylvania
# PUERTO RICO
# Rhode Island
# South Carolina
# South Dakota
# Tennessee
# Texas
# Utah
# Vermont
# Virginia
# Washington
# West Virginia
# Wisconsin
# Wyoming
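
Since .join() is the opposite of .split(), the two can undo each other. The sketch below shows that round trip, and one thing to watch for: .join() only accepts strings. The lists used here (words, numbers) are made up for illustration.

```python
words = ["1600", "Pennsylvania", "Ave", "NW"]

# Glue the pieces back together with a single space as the glue
street = " ".join(words)
print(street)  # 1600 Pennsylvania Ave NW

# Splitting on the same glue gives back the original list
print(street.split(" ") == words)  # True

# join() only works on strings, so numbers are converted with str() first
numbers = [1, 2, 3]
csv_row = ",".join(str(n) for n in numbers)
print(csv_row)  # 1,2,3
```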


Substrings

To find a substring in a string, you can use the find function.

a="find my word here"
b="word"
a.find(b) # this will return 8

In the example above, the 3rd line returns the index of the searched substring, which is 8 (‘w’ is the 9th character of a, and because indexing starts at 0, its index is 9-1).

To check this we can:

a="find my word here"
b="word"
b_index = a.find(b) 
print(b_index) # this will print 8
print(a[b_index:b_index+len(b)]) # this prints word

The last line prints the substring of a that starts at the found index (b_index) and has the same length as b. This means that word will be printed.

A more detailed example:

# String methods: string.find()

# string.find() tells you where you can find a part of one string in a larger string.
# string.find() will return a number:
# 		if string.find() returns -1, it could not find the string inside the larger string.
#		otherwise, string.find() will return the slicing number/index of where it found that string

email_address = "hoorayforpython@notarealwebsite.com"

print ("I found the snail at: {0}".format(email_address.find("@"))) # the slicing number/index of where the at symbol appears

# string.find() + slicing = awesome!

# Everything before the @ is part of the email_handle; everything after the @ is part of the domain where they have their email registered.
# Let's use string.find() and slicing together to split those apart.

at_symbol_index = email_address.find("@")

print ("I found the snail at: {0}".format(at_symbol_index)) # Notice how this print and the earlier one give the same result, but take a different approach

email_handle = email_address[0:at_symbol_index]

print ("The email_handle is: {0}".format(email_handle))

email_domain = email_address[at_symbol_index + 1:] # without the +1, the at symbol would be included. Notice that there is no number after the colon, so Python assumes you want everything to the end.

print ("The email_domain is: {0}".format(email_domain))

print ("When string.find() can't find a string, it'll give a -1.  So since there's no 'QQQ' in email_address, this will return a -1: {0}".format(email_address.find("QQQ")))
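
Because string.find() gives back -1 when nothing is found, it is worth checking for that before slicing; otherwise -1 would quietly be used as a slicing index. The following is only a sketch of that idea, and split_email is a hypothetical helper name, not something built into Python.

```python
def split_email(address):
    """Return (handle, domain), or None if there is no @ in address."""
    at_index = address.find("@")
    if at_index == -1:
        # Without this check, -1 would be used as a slicing index below
        return None
    # Everything before the @, and everything after it
    return address[:at_index], address[at_index + 1:]

print(split_email("hoorayforpython@notarealwebsite.com"))
# ('hoorayforpython', 'notarealwebsite.com')
print(split_email("no-at-symbol-here"))  # None
```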


String formatting

# String Formatting

# String formatting is how we can use variables (which store information including numbers, strings, and other types of data) inside of strings
# We can do this by using the .format() string method.

# Here's how it works:

# First, we'll need a variable:
name = "Shannon"

# Now, let's insert it into the print statement:
print ("My name is {0}".format(name)) # This will print "My name is Shannon"

# We'll analyze each part of the syntax in a moment.  For now, why is this preferable to doing a print("My name is Shannon")?

# Using .format() is more flexible and allows your strings to change as your variables change.

# So let's give the name variable a new value.
name = "Pumpkin"

# Now, let's print it again
print ("My name is {0}".format(name)) # This will print "My name is Pumpkin"

# Remember that Python runs commands from top to bottom, left to right.

# The two new parts of this print statement are the {0} and the .format(name)

# The {0} is a placeholder for the 0th variable in the list that appears inside the parentheses of .format() -- remember Python starts counting at 0, not 1
# So it really just keeps the spot warm.

# To see why it's {0}, let's define a few more variables.

age = 100
location = "The Pumpkin Patch"

# Now if we want to include those variables, we'll need to put placeholders in the string as well.
print ("My name is {0} and my age is {1} and I live in {2}".format(name, age, location))

# Note how we put the placeholders exactly in the string where we want them; and the variables go inside the parentheses of the .format()

# Remember how Python counts.
# So {0} is a placeholder for name;
# {1} is a placeholder for age;
# and {2} is a placeholder for location

# If we had more variables to include, we'd continue in the same way.

# But there's more than one way to do this:
print ("My name is {name} and my age is {age} and I live in {location}".format(name=name, age=age, location=location)) # This way feels more explicit
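
Two more variations you may run into are sketched here: empty braces, which .format() fills in order, and f-strings, which put the variable names right inside the string. The f-string part assumes Python 3.6 or newer; the variable names by_position and f_version are made up for this example.

```python
name = "Pumpkin"
age = 100
location = "The Pumpkin Patch"

# Empty braces are filled in order: first {} gets name, second {} gets age
by_position = "My name is {} and my age is {}".format(name, age)
print(by_position)  # My name is Pumpkin and my age is 100

# An f-string (note the f before the opening quote) reads the variables directly
f_version = f"My name is {name} and I live in {location}"
print(f_version)  # My name is Pumpkin and I live in The Pumpkin Patch
```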


Lower and upper

# String methods: string.lower()

# string.lower() is used for turning all characters in your string lowercase.
# There are some related string methods too, like string.upper()

name = "SHANNON!!"

print (name.lower()) # shannon!!
print (name) # SHANNON!! -- the original is unchanged

# To make the changes stick:
name = name.lower()

print (name) # shannon!!


# string.upper() will turn all characters in your string uppercase but otherwise works in the same manner as string.lower()

greeting = "hello, hi" # not very exuberant ...

print (greeting.upper()) # MUCH BETTER!

# Making the changes stick:
greeting = greeting.upper()

print (greeting) # HELLO, HI


# string.lower() and .upper() are primarily used for testing strings in a case-insensitive manner

gender = 'F'

if gender.lower() == 'f':
    print ("Hi lady!")

# To accomplish the same thing without string.lower(), you would have to do:
if gender == 'F' or gender == 'f':
    print ("Hi lady!")
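
The same trick works for comparing any two strings without caring about case. Here is a minimal sketch; same_text is a hypothetical helper name chosen for this example.

```python
def same_text(first, second):
    """Compare two strings while ignoring upper/lower case."""
    return first.lower() == second.lower()

print(same_text("Shannon", "SHANNON"))  # True
print(same_text("Shannon", "Pumpkin"))  # False

# Lowercasing once also makes it easy to accept several spellings
answer = "YES"
accepted = answer.lower() in ("y", "yes")
print(accepted)  # True
```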


Replacing strings

# String methods: string.replace()

# string.replace() is similar to the find -> replace feature in Word, Excel, or other office-y type programs

song = "eat, eat, eat, apples and bananas"

# Let's start here:
print ("I like to ... {0}".format(song))


# string.replace() lets us replace all instances of one string with another.
print ("I like to ... {0}".format(song.replace("a","o"))) # We're replacing all of the lowercase *a*s in song with *o*s

# Let's take a look at the syntax.
# We've seen the {0} syntax; that's the placeholder that string.format() uses to insert a variable into the string that comes before the dot in .format()
# The 0 corresponds to the first variable in the list inside the parentheses (remember that Python starts counting at zero)
# What's the variable we're going to insert at {0}? It's song.replace("a", "o")
# Python will evaluate song.replace("a", "o") and place the result inside of the {0}
# How song.replace("a", "o") works is: .replace() will replace every "a" it finds in song with an "o"
# The way I remember it is .replace() will perform its action on what comes before the dot (which in song.replace("a", "o"), is song)

print ("But note that the original song itself is unchanged: {0}".format(song))

print ("string.replace() is case-sensitive.")
print (song.replace("Eat", "chop")) # This won't replace anything!

print (song)
print (song.replace("eat", "chop"))
print (song) # the original is unchanged

# If you want your changes to stick, you'll need to assign your variable song a new value
song = song.replace("eat", "chop")
# What we're saying here is essentially:
# song is now equal to the new value of song.replace("eat", "chop")

# If you have lots of replaces to do on a string, you *could* do it like this:
song = song.replace("apples", "mangos")
song = song.replace(" and", ", pears, and")
song = song.replace("bananas", "kiwis")

print (song)

# Or, you could chain lots of replaces together -- remember that what gets replaced is what comes before the dot!
# In other words, replaces will occur in left-to-right order
song = "eat, eat, eat, apples and bananas" # setting it back to the original
song = song.replace("eat", "chop").replace("apples", "mangos").replace(" and", ", pears, and").replace("bananas", "kiwis")

print (song)
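
Two further options are sketched below. string.replace() accepts an optional third argument that limits how many replacements are made, and a dictionary plus a loop can stand in for a long chain of replaces. The names first_only, replacements, and new_song are made up for this example.

```python
song = "eat, eat, eat, apples and bananas"

# The optional third argument limits the number of replacements
first_only = song.replace("eat", "chop", 1)
print(first_only)  # chop, eat, eat, apples and bananas

# A dictionary of old -> new pairs, applied one after another
replacements = {"eat": "chop", "apples": "mangos", "bananas": "kiwis"}
new_song = song
for old, new in replacements.items():
    new_song = new_song.replace(old, new)
print(new_song)  # chop, chop, chop, mangos and kiwis
```

As with chaining, the replacements happen one at a time, so an earlier replacement can affect what a later one finds.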


Counting

# String methods: string.count()

# string.count() tells you how many times one string appears in a larger string

gettysburg_address = """
Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. 
We are met on a great battlefield of that war. 
We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. 
It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate, we can not consecrate, we can not hallow this ground. 
The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. 
The world will little note, nor long remember what we say here, but it can never forget what they did here. 
It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. 
It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause 
for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, 
shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.
"""

# Now that we have a fairly long string to search through, let's see how many times the word "people" appears in the text
print(gettysburg_address.count("people")) # appears 3 times

# What goes inside the parentheses is the string that you're looking for; the larger string to look inside is the string that comes before the dot.

print(gettysburg_address.count("here, ")) # appears 2 times
print(gettysburg_address.count("e")) # appears 165 times
print(gettysburg_address.count("!!!!!!")) # doesn't appear at all
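
Two details of string.count() are easy to miss: it is case-sensitive, and it counts matches that do not overlap. A short sketch, using a made-up sentence rather than the full speech:

```python
sentence = "The people, by the people, for The People"

# count() is case-sensitive, so "People" is not counted here
exact = sentence.count("people")
print(exact)  # 2

# Lowercasing first gives a case-insensitive count
any_case = sentence.lower().count("people")
print(any_case)  # 3

# Matches are not allowed to overlap: "aaaa" holds "aa" twice, not three times
overlapping = "aaaa".count("aa")
print(overlapping)  # 2
```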


Slicing

# Slicing examples
# Slicing allows us to see one piece or 'slice' of an item, like a single character (or set of characters) within a string


# Let's start by creating a variable called github_handle; it will hold a string with my GitHub handle in it
github_handle = '@shannonturner'


# You can use a comma to separate different items that you want to print (as shown below)
print("My github handle is ", github_handle)


# This is our first slicing example.  Notice the square brackets attached directly to the variable name with no spaces in between.
# The two numbers in the middle, separated by a colon, are called the slicing indexes
print("My first name is ", github_handle[1:8])


# Here's how you can visualize the print statement above.

#       @shannonturner
#       0123456789....

# A note about the above: Python starts counting at zero, and the last few letters (r, n, e, r) are tied to 10, 11, 12, 13

# Or, shown vertically, it looks like this:

##      0		@
##      1		s
##      2		h
##      3		a
##      4		n
##      5		n
##      6		o
##      7		n
##      8		t
##      9		u
##      10		r
##      11		n
##      12		e
##      13		r

# So in the example of github_handle[1:8], notice that the t (at slice #8) is not included, but the s (at slice #1), is.
# That's because the first slice value is inclusive, but the second slice value is exclusive.
# I think of it as: Python starts at 1 and walks UNTIL it gets to 8 and then stops, gathering up everything in between.


print("My last name is ", github_handle[8:14])

# Notice that there is no index 14.  If the second index is higher than what exists, Python will assume you mean "until the very end"

# You can omit the second index; Python understands this as "go to the end"
print("My last name is ", github_handle[8:])

# And if you omit the first index, Python understands this as "start from the beginning"
print("My twitter handle is NOT ", github_handle[:8])

# What happens if you use a negative slicing index?

# You can use negative slicing indexes to count backwards from the end, like this:

##      -14		@
##      -13		s
##      -12		h
##      -11		a
##      -10		n
##      -9		n
##      -8		o
##      -7		n
##      -6		t
##      -5		u
##      -4		r
##      -3		n
##      -2		e
##      -1		r

print("My last name is ", github_handle[-6:])

# You can also mix and match positive and negative slicing indexes as needed

print("My first name is ", github_handle[1:-6])

# In these examples, we're relying on knowing the exact slicing indexes.  But what if our string changes in size or content?
# With short strings, it's fairly easy (especially if you write it out as above) to figure out which slices you need.

# But a more common and practical way to slice, rather than using numbers directly, is to create a variable that holds the number you need (but can change as needed)

# If this part is confusing, you may want to revisit this section when you're comfortable with string methods like str.find()

print("### Part Two ###")

text = "My GitHub handle is @shannonturner and my Twitter handle is @svt827"

# Let's extract the GitHub handle using str.find() and slicing.

snail_index = text.find('@')

print(text[snail_index:snail_index + 14]) # So the first slicing index is given by the variable, but we're still relying on knowing the exact number of characters (14).  We can improve this.

space_after_first_snail_index = text[snail_index:].find(' ') # Note that we're using slicing here to say start the .find() after the first snail is found.

print(text[snail_index:snail_index + space_after_first_snail_index]) # Why do we need to add snail_index to the second slicing index? Take a look:

print("snail_index is: ", snail_index)
print("space_after_first_snail_index is: ", space_after_first_snail_index)

print("So this is essentially saying text[20:34], see? --> ", text[20:34])

# Instead of creating a separate variable, you can just add the str.find() that gives the number you want right into the slice, like this:

print(text[text.find('@'):text.find('@')+text[text.find('@'):].find(' ')]) # But as you can see, it's not the most readable, especially compared to above.

# Still, it's a fairly common syntax / notation, so it's worth being familiar with it and knowing what it looks like in case you run into it.

print("Can you use slicing and string methods like str.find() to extract the Twitter handle from text?")
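
If you'd like to compare notes after trying it yourself, here is one possible approach (certainly not the only one). It relies on the optional second argument of str.find(), which tells Python where to start searching. The variable names are made up for this sketch.

```python
text = "My GitHub handle is @shannonturner and my Twitter handle is @svt827"

first_snail = text.find("@")
# Start searching just after the first snail to find the second one
second_snail = text.find("@", first_snail + 1)

# Look for the space that ends the handle; -1 means the handle runs to the end
space_after = text.find(" ", second_snail)
if space_after == -1:
    twitter_handle = text[second_snail:]
else:
    twitter_handle = text[second_snail:space_after]

print(twitter_handle)  # @svt827
```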


Alphaspace function (example)

Let’s create a function that tells us whether a string is made up only of letters and spaces.

def is_alphaspace(string):

    """
    Returns True if all characters in the string are spaces or letters; otherwise returns False.

    using str.isalpha() returns a bool on whether ALL of the characters in a string are letters
    using str.isspace() returns a bool on whether ALL of the characters in a string are whitespace;

    Although it's not a string method, this function combines the functionality of the string methods above        
    """
    
    return all([any([char.isspace(), char.isalpha()]) for char in string])

# This custom function will behave similarly to the str.isalpha() and str.isspace() combined together.

test_string = "This string will return false for each of isalpha and isspace but it will return true for the custom function"

print ("test_string.isalpha() gives us: ", test_string.isalpha())
print ("test_string.isspace() gives us: ", test_string.isspace())

# Note how the syntax differs.  That's because is_alphaspace() isn't a string method, it's a custom function.
print ("But is_alphaspace(test_string) gives us: ", is_alphaspace(test_string))
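
The function above builds two lists for every check. A slightly shorter variant, sketched below under the hypothetical name is_alphaspace_short, uses a generator expression and the or operator instead. It should behave the same way, including for the empty string, where all() has nothing to reject and so gives True.

```python
def is_alphaspace_short(string):
    """Return True if every character is a letter or whitespace."""
    # all() stops at the first character that is neither a letter nor a space
    return all(char.isalpha() or char.isspace() for char in string)

print(is_alphaspace_short("Hello world"))   # True
print(is_alphaspace_short("Hello, world"))  # False (the comma is not allowed)
print(is_alphaspace_short("R2D2"))          # False (digits are not letters)
print(is_alphaspace_short(""))              # True (nothing to reject)
```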
