Welcome

Introduction to Python


Matt Shirley

This course requires no prior knowledge of Python, and this first lecture requires no Unix experience either!

Getting Started

  • For the first class, we will be exclusively using the IPython notebook interface.
  • Thanks to Amazon, we can run a large private server in the "cloud" that hosts a separate IPython notebook for each student in the class.
  • Open your web browser and navigate to http://54.225.180.126

Q: What am I looking at right now?

A: You are looking at the IPython notebook interface. There are many different ways to interact with Python:

  • Advanced: By typing python at a command prompt (Unix/Mac)
  • Easy: Using IPython "notebooks" (Unix/Mac/Win)

Not only easy, though - it's powerful:

  • Supports inline display of graphics
  • Powerful MathJax and Markdown formatting for text
  • JSON-based notebook format is both human- and machine-readable
  • NBViewer service allows sharing notebooks easily

Hello World

In [1]:
print 'hello world'
hello world

  • Print will display a string on your screen (in UNIX we call this standard output, or stdout).
  • A string must be surrounded by ' (single) or " (double) quotes.
In [2]:
what = 'hello'
# This is a comment
print what
hello

  • Variables may hold a string of text
  • Variables are not surrounded in quotes
  • Comments start with # (hash) and are ignored by the Python interpreter
In [3]:
who = 'world'
print what + who
print what + ' ' + who
print what + ' ' + who + '!'
helloworld
hello world
hello world!

In [4]:
print 'this is correct indentation'
  print 'this is NOT correct indentation'
  File "<ipython-input-4-2045dd26b583>", line 2
    print 'this is NOT correct indentation'
    ^
IndentationError: unexpected indent
  • The Python interpreter is strict about indentation: a line may only be indented when it belongs to a block of code
  • The number of spaces used for indentation is up to you, but must remain the same within a block of code
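
A minimal sketch of a correctly indented block, assuming a made-up variable named temperature (the if statement is covered properly under Flow control). It uses print(...) with parentheses so that it should run the same way under Python 2 and Python 3:

```python
# Hypothetical value, chosen only for illustration.
temperature = 37
status = 'unknown'

if temperature > 30:
    # These two lines form one block: both are indented by four spaces.
    status = 'warm'
    print('incubator is warm')

# Back at the left margin, so this line is outside the block.
print('status: ' + status)
```

Both indented lines run only when the condition is True; the last line always runs.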

Variables

In [5]:
x = 1
x
Out[5]:
1
In [6]:
y = 2
y
Out[6]:
2
In [7]:
x = y
x
Out[7]:
2
  • Variables can be assigned any value
In [8]:
x, y = 1, 2
print x
print y
1
2

  • Multiple assignment is possible when the number of variables equals the number of values
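
One handy use of multiple assignment is swapping two variables without a temporary one. The sketch below also shows what happens when the counts do not match; the try/except construct has not been introduced in this lecture and is used here only to keep the example running:

```python
x, y = 1, 2

# The right-hand side is evaluated first, then unpacked into x and y.
x, y = y, x
print(x)
print(y)

# Three values cannot be unpacked into two variables.
mismatch_detected = False
try:
    a, b = 1, 2, 3
except ValueError:
    mismatch_detected = True
print(mismatch_detected)
```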
In [9]:
y = y + 1
y
Out[9]:
3
In [10]:
x += 1
x
Out[10]:
2
  • x += 1 is shorthand for x = x + 1
  • += is an augmented assignment operator (Python has no ++ autoincrement operator)
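
The same shorthand exists for the other arithmetic operators, and += also works on strings. A short sketch with made-up variable names:

```python
count = 10
count += 5    # same as count = count + 5
count -= 3    # same as count = count - 3
count *= 2    # same as count = count * 2
print(count)

label = 'AT'
label += 'GC'  # += concatenates strings
print(label)
```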

Python as a calculator

In [11]:
1 + 1
Out[11]:
2

Addition

In [12]:
4 - 1
Out[12]:
3

Subtraction

In [13]:
5 * 2
Out[13]:
10

Multiplication

In [14]:
1 / 5
Out[14]:
0

Divis... WHAT? 1 / 5 = 0.2, right?

Object types

Python (like any other programming language) follows your instructions literally. When you type 1 / 5 into the interpreter, Python sees that both 1 and 5 are integers and, in Python 2 (the version used here), returns an integer result, discarding the fractional part.

In [15]:
int(1) / int(5)
Out[15]:
0

We actually expect that the answer will be a floating point number and not an integer, so we have to give Python better instructions.

In [16]:
float(1) / 5
Out[16]:
0.2

We can also use a shortcut.

In [17]:
1. / 5
Out[17]:
0.2

Long floating point numbers can be rounded using round:

In [18]:
round(1. / 3, 2)
Out[18]:
0.33
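
If you want to be explicit about which kind of division you mean, the // (floor division) and % (remainder) operators behave the same in Python 2 and Python 3, whereas in Python 3 a plain / between two integers returns a float. A small sketch, with variable names chosen only for illustration:

```python
floor_quotient = 7 // 2       # integer part of the division
remainder = 7 % 2             # what is left over
true_quotient = 7 / 2.0       # at least one float gives a float result
negative_floor = -7 // 2      # floor division rounds down, not toward zero

print(floor_quotient)
print(remainder)
print(true_quotient)
print(negative_floor)
```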

Python has many object types built-in, including:

  • int()
  • float()
  • str()
  • bool()
  • tuple()
  • list()
  • dict()

We can determine the type of an object by using the type function.

In [19]:
type(1)
Out[19]:
int
In [20]:
type(1.)
Out[20]:
float
In [21]:
type('hello')
Out[21]:
str
In [22]:
type(True)
Out[22]:
bool
In [23]:
4 + 'hello world'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-1b7e5cde89f1> in <module>()
----> 1 4 + 'hello world'

TypeError: unsupported operand type(s) for +: 'int' and 'str'
  • Most operators expect operands of compatible types
  • The TypeError is telling us that the + operator does not know how to combine an int with a str
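
One way around this error is to convert one operand explicitly so that both sides have the same type. A minimal sketch using the built-in str and int functions:

```python
count = 4

# Convert the number to text before concatenating.
message = str(count) + ' hello worlds'
print(message)

# Convert text to a number before adding.
total = count + int('6')
print(total)
```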

Function application

  • Function calls in Python take the form: f()
  • The function name f precedes parentheses (), which hold the arguments
  • Typing the function name without parentheses just returns the function object
In [24]:
sum
Out[24]:
<function sum>
In [25]:
sum((1,5))
Out[25]:
6
  • Applying the function to an object returns a result
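
A few more built-in functions applied to the same object, as a quick sketch. The tuple of fragment lengths is made up for illustration:

```python
# Hypothetical fragment lengths, in base pairs.
lengths = (120, 85, 240)

total = sum(lengths)
longest = max(lengths)
how_many = len(lengths)

# float() avoids integer division under Python 2.
mean_length = float(total) / how_many

print(total)
print(longest)
print(round(mean_length, 1))
```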

Boolean comparisons

In [26]:
1 < 2
Out[26]:
True
In [27]:
1 > 2
Out[27]:
False
In [28]:
1 == 2
Out[28]:
False
In [29]:
1 != 2
Out[29]:
True
In [30]:
2 >= 2
Out[30]:
True
In [31]:
x = 1
y = "a"
In [32]:
x > y
Out[32]:
False
In [33]:
x < y
Out[33]:
True
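  • Again, be careful about comparing different types of data: Python 2 gives a consistent but arbitrary answer here instead of an error (Python 3 raises a TypeError)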
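
A safer habit is to convert both values to the same type before comparing them. A minimal sketch, assuming the string actually holds a number:

```python
x = 1
y = "2"

# Compare as numbers: convert the string first.
smaller_as_number = x < int(y)

# Compare as text: convert the number first.
equal_as_text = str(x) == y

print(smaller_as_number)
print(equal_as_text)
```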

Python data structures

Strings

In [34]:
primer1 = "AGGGTCA"
primer2 = "AGGTTAC"
In [35]:
primer1 == primer2
Out[35]:
False
In [36]:
print primer1[0]
print primer2[0]
A
A

  • Strings can be compared
  • Individual elements can be accessed using a 0-based index
In [37]:
primer1[0] == primer2[0]
Out[37]:
True
   0   1   2   3   4   5   6
 +---+---+---+---+---+---+---+
 | A | G | G | G | T | C | A |
 +---+---+---+---+---+---+---+
   -7  -6  -5  -4  -3  -2  -1
In [38]:
print primer1
print primer1[0]
print primer1[1]
print primer1[2]
print primer1[-1]
print primer1[-2]
AGGGTCA
A
G
G
A
C

Slicing

 0   1   2   3   4   5   6   7
 +---+---+---+---+---+---+---+
 | A | G | G | G | T | C | A |
 +---+---+---+---+---+---+---+
-7  -6  -5  -4  -3  -2  -1
In [39]:
print primer1[:]
print primer1[0:]
print primer1[:-1]
print primer1[0:-1]
print primer1[0:5]
print primer1[3:-1]
AGGGTCA
AGGGTCA
AGGGTC
AGGGTC
AGGGT
GTC

  • Slicing takes the form [start:end]
  • Indexes for slicing start at 0 and are end-exclusive: [start:end)
In [40]:
print primer1[::]
print primer1[0:8:2]
print primer1[::-1] ## stride = -1
AGGGTCA
AGTA
ACTGGGA

  • Slicing can take the form [start:end:stride], where stride is the step between selected elements (a stride of 2 takes every second element)
  • primer1[::-1] is a simple way to reverse a string
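
As a sketch of how slicing might be used on sequence data, the example below pulls codons out of a short made-up sequence (the sequence and variable names are assumptions, not part of the lecture):

```python
# Hypothetical 9-base sequence: three codons.
seq = 'ATGGCCTAA'

first_codon = seq[0:3]    # bases 0, 1, 2
second_codon = seq[3:6]   # bases 3, 4, 5
last_codon = seq[-3:]     # the final three bases
codon_starts = seq[::3]   # first base of each codon
backwards = seq[::-1]     # the whole sequence reversed

print(first_codon)
print(second_codon)
print(last_codon)
print(codon_starts)
print(backwards)
```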

String methods

In [41]:
len(primer1)
Out[41]:
7
In [42]:
'A' in primer1
Out[42]:
True
In [43]:
primer1.find('A')
Out[43]:
0
In [44]:
primer1.count('A')
Out[44]:
2
In [45]:
primer3 = primer1.replace('A', 'T')
primer3
Out[45]:
'TGGGTCT'
In [46]:
primer1.lower()
Out[46]:
'agggtca'

Exercise 1: Count the GC content of a DNA string

Q: How would you calculate the GC content (percent) for the sequence 'ATGCATGATACATAGATACC'?

In [47]:
dna = 'ATGCATGATACATAGATACC'
c_count = dna.count('C')
g_count = dna.count('G')
dna_len = len(dna)
gc_cont = float(c_count + g_count) / dna_len * 100
print gc_cont
35.0
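
Real sequence files often mix upper and lower case, which would make count('G') miss some bases. One possible variation, sketched here with the same sequence written in mixed case, normalizes the case first:

```python
# Same bases as the exercise, but in mixed case (an assumption for illustration).
dna = 'atgcATGatacATAGatacc'

# Normalize to upper case so that 'g' and 'G' are counted together.
dna_upper = dna.upper()
gc_count = dna_upper.count('G') + dna_upper.count('C')

# 100.0 is a float, so the division is not truncated under Python 2.
gc_percent = 100.0 * gc_count / len(dna_upper)
print(gc_percent)
```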

Exercise 2: Palindromic sequences

Q: How would you test whether the sequence 'ATGCATGATTAGTACGTA' is palindromic (reads the same forward and backward)?

In [48]:
seq = 'ATGCATGATTAGTACGTA'
seq == seq[::-1]
Out[48]:
True

Tuples

In [49]:
names = ('Fred', 'Ted', 'Ned')
names
Out[49]:
('Fred', 'Ted', 'Ned')
  • Tuples are constructed using () and ,
  • Commas separate each element, and the entire tuple is enclosed in parentheses.

You can access the first name in names by specifying the 0-based index position of that element.

In [50]:
names[0]
Out[50]:
'Fred'

To access the last name in names, you can use either the 0-based index of that element [2], or use a negative index [-1].

In [51]:
print names
('Fred', 'Ted', 'Ned')

In [52]:
names[2]
Out[52]:
'Ned'
In [53]:
names[-1]
Out[53]:
'Ned'

Tuples can be "sliced" using [:].

In [54]:
names[0:3]
Out[54]:
('Fred', 'Ted', 'Ned')
In [55]:
names[1:3]
Out[55]:
('Ted', 'Ned')
In [56]:
names[:2]
Out[56]:
('Fred', 'Ted')
In [57]:
names[0] = 'Zed' ## This will result in a TypeError
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-57-c7309bd65052> in <module>()
----> 1 names[0] = 'Zed' ## This will result in a TypeError

TypeError: 'tuple' object does not support item assignment
  • Tuples are immutable.
  • Sometimes you want to modify the contents of a collection
  • Sounds like a job for lists!
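
Before moving on to lists, note that an immutable tuple can still be used to build a new tuple. A short sketch that combines slicing and + to get a changed copy while leaving the original alone:

```python
names = ('Fred', 'Ted', 'Ned')

# ('Zed',) is a one-element tuple; the trailing comma is required.
renamed = ('Zed',) + names[1:]

print(renamed)
print(names)  # the original tuple is unchanged
```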

Lists

In [58]:
names = ['Fred', 'Ted', 'Ned']
names
Out[58]:
['Fred', 'Ted', 'Ned']
  • Notice that the list construction uses [] and not ()
  • Lists are mutable, meaning that we can change the contents after creation
In [59]:
names[2] = 'Zed'
names
Out[59]:
['Fred', 'Ted', 'Zed']
In [60]:
names.append('Mike')
names
Out[60]:
['Fred', 'Ted', 'Zed', 'Mike']
In [61]:
names += ['Obama']
names
Out[61]:
['Fred', 'Ted', 'Zed', 'Mike', 'Obama']
In [62]:
names.extend(['Craig'])
names
Out[62]:
['Fred', 'Ted', 'Zed', 'Mike', 'Obama', 'Craig']

There are several ways to add objects to a list.

In [63]:
names.pop()
Out[63]:
'Craig'
In [64]:
names.insert(3, 'Ned')
names
Out[64]:
['Fred', 'Ted', 'Zed', 'Ned', 'Mike', 'Obama']
In [65]:
names.index('Obama')
Out[65]:
5

We can remove and insert elements, as well as find the numeric index by element name.

In [66]:
sorted(names)
Out[66]:
['Fred', 'Mike', 'Ned', 'Obama', 'Ted', 'Zed']
In [67]:
names.sort()
names
Out[67]:
['Fred', 'Mike', 'Ned', 'Obama', 'Ted', 'Zed']
In [68]:
names.reverse()
names
Out[68]:
['Zed', 'Ted', 'Obama', 'Ned', 'Mike', 'Fred']

Here are a few methods to manipulate the order of lists.

  • Note that names.sort() and names.reverse() actually modified the list instead of returning a new list
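
The difference between sorted(names) and names.sort() is easy to trip over, so here is a small sketch of it on a fresh list:

```python
names = ['Ted', 'Fred', 'Zed', 'Ned']

# sorted() returns a new list and leaves the original untouched.
alphabetical = sorted(names)
before_sort = list(names)  # copy of the still-unsorted list

# .sort() reorders the list in place and returns None.
result = names.sort()

print(alphabetical)
print(before_sort)
print(result)
print(names)
```

A common mistake is writing names = names.sort(), which replaces the list with None.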
In [69]:
gtca = list("GATACA")
gtca
Out[69]:
['G', 'A', 'T', 'A', 'C', 'A']
In [70]:
"".join(gtca)
Out[70]:
'GATACA'
In [71]:
"-".join(gtca)
Out[71]:
'G-A-T-A-C-A'
  • Lists can be constructed from strings
  • Lists can be joined to form a string

List constructions

In [72]:
'G-A-T-A-C-A'.split('-')
Out[72]:
['G', 'A', 'T', 'A', 'C', 'A']
In [73]:
range(0,10)
Out[73]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [74]:
range(0,10,2)
Out[74]:
[0, 2, 4, 6, 8]
  • The split method for strings creates a list from elements separated by a character
  • The range function creates a list from start, end, step
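
As a sketch of split and range working together, the example below takes apart a made-up genomic region string (the 'chr1:1000-2000' format is an assumption for illustration). In Python 3, range returns a lazy object rather than a list, so list(...) is used to get the same result under both versions:

```python
# Hypothetical region label: chromosome, start, and end.
region = 'chr1:1000-2000'

chrom, coords = region.split(':')
start_text, end_text = coords.split('-')

# split() returns strings, so convert before doing arithmetic.
start = int(start_text)
end = int(end_text)

# Window start positions every 250 bases; end is exclusive.
window_starts = list(range(start, end, 250))

print(chrom)
print(window_starts)
```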

List comprehensions

In [75]:
r = range(0,10)
r
Out[75]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [76]:
[i + 1 for i in r]
Out[76]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

List comprehensions apply an expression (i + 1) to each element of an iterable (r = range(0,10)). We'll discuss iterables later.

In [77]:
[i + 1 for i in r if i > 5]
Out[77]:
[7, 8, 9, 10]

List comprehensions can also filter the resulting list on a condition (i > 5)
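
The same two ideas (transform and filter) carry over to sequence data. A sketch using the DNA string from Exercise 1, with variable names chosen for illustration:

```python
dna = 'ATGCATGATACATAGATACC'

# Transform: lower-case every base.
lower_bases = [base.lower() for base in dna]

# Filter: keep the 0-based positions that hold a G or a C.
gc_positions = [i for i in range(len(dna)) if dna[i] in 'GC']

print(''.join(lower_bases))
print(gc_positions)
```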

In [78]:
names['Fred']
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-78-43f399010876> in <module>()
----> 1 names['Fred']

TypeError: list indices must be integers, not str
  • Lists must be indexed using integers
  • Wouldn't it be easier to access list elements by name?

Dictionaries

In [79]:
person = {'name':'matt', 'height':71, 'weight':170}
person
Out[79]:
{'height': 71, 'name': 'matt', 'weight': 170}
  • Dictionaries store key:value pairs
  • You can think about dictionaries like lists with named indexes
In [80]:
person['name']
Out[80]:
'matt'
In [81]:
person.keys()
Out[81]:
['name', 'weight', 'height']
In [82]:
person.values()
Out[82]:
['matt', 170, 71]
In [83]:
person.items()
Out[83]:
[('name', 'matt'), ('weight', 170), ('height', 71)]
In [84]:
person['location'] = 'Baltimore'

Assign keys to values like this.

In [85]:
person['pounds'] = person['weight']
del person['weight']
person['name'] = 'james'
person
Out[85]:
{'height': 71, 'location': 'Baltimore', 'name': 'james', 'pounds': 170}

Delete and update keys and values like this

  • Dictionaries have methods for retrieving keys, values, and tuples of (key, value) pairs
  • Some operations that work on lists, such as len() and in, also work on dictionaries
  • Methods are functions associated with an object
  • What other methods does our dictionary have?
In [86]:
print dir(person)
['__class__', '__cmp__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'has_key', 'items', 'iteritems', 'iterkeys', 'itervalues', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values', 'viewitems', 'viewkeys', 'viewvalues']

The dir function lists all of the methods and attributes of an object.

In [87]:
help(person.pop)
Help on built-in function pop:

pop(...)
    D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
    If key is not found, d is returned if given, otherwise KeyError is raised


The help function displays documentation built from the __doc__ attribute of the object it is applied to.

In [88]:
print person.pop.__doc__
D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised
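
The docstring above mentions an optional default d. A small sketch of what that looks like in practice, alongside the related get method, using a cut-down version of the person dictionary:

```python
person = {'name': 'matt', 'height': 71}

# 'weight' is missing, so the default (None) is returned instead of a KeyError.
weight = person.pop('weight', None)

# 'height' exists, so it is removed from the dictionary and returned.
height = person.pop('height')

# get() looks a key up without removing it, with an optional default.
nickname = person.get('nickname', 'unknown')

print(weight)
print(height)
print(nickname)
print(person)
```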

Flow control

If/Else statements

In [89]:
x = 1
if x > 1:
    print "x is greater than 1"
else:
    print "x is less than or equal to 1"
x is less than or equal to 1

  • The expression after if is evaluated, and the following code block is executed if the expression is True
  • If the expression evaluates to False, the code block following the optional else statement is executed
In [90]:
x = False
if x:
    print "x is True"
else:
    print "x is False"
x is False

In [91]:
x = False
if not x:
    print "x is False"
else:
    print "x is True"
x is False

elif

In [92]:
x = 200
if x / 2 == 1:
    print "x is 2"
elif x / 20 == 1:
    print "x is 20"
elif x / 200 == 1:
    print "x is 200"
else:
    print "x is something else"
x is 200

  • elif tests an expression just like if, but only when every earlier test was False
  • It is usually a good idea to add a final else to catch unexpected conditions
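
A sketch of an if/elif/else chain on the GC content from Exercise 1. The 40 and 60 percent cutoffs and the labels are arbitrary values picked for illustration, not established thresholds:

```python
gc_percent = 35.0

# Tests are tried from top to bottom; only the first True branch runs.
if gc_percent < 40:
    label = 'AT-rich'
elif gc_percent <= 60:
    label = 'balanced'
else:
    label = 'GC-rich'

print(label)
```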

For loops

In [93]:
for i in range(0,10):
    print i
0
1
2
3
4
5
6
7
8
9

for loops repeat a block of code for each element in an iterable

What is an iterable?

Short answer:

  • strings
  • tuples
  • lists
  • dictionaries
In [94]:
iter('abc')
Out[94]:
<iterator at 0x243ea90>
In [95]:
iter(1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-95-eb9b6f09d0b6> in <module>()
----> 1 iter(1)

TypeError: 'int' object is not iterable

Long answer: anything with an __iter__ or __getitem__ method.

In [96]:
'__iter__' in dir([1, 2, 3])
Out[96]:
True
In [97]:
'__getitem__' in dir('123')
Out[97]:
True
In [98]:
'__iter__' in dir({'name':'matt', 'fingers':9})
Out[98]:
True
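
Putting for loops and iterables together, here is a sketch that loops over a string to count each base in a dictionary, then loops over the sorted keys to print the counts:

```python
dna = 'GATACA'
counts = {}

# A string is iterable: the loop visits one character at a time.
for base in dna:
    # get() returns 0 the first time a base is seen.
    counts[base] = counts.get(base, 0) + 1

# Iterating over a dictionary yields its keys; sorted() fixes the order.
for base in sorted(counts):
    print(base + ': ' + str(counts[base]))
```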

While loops

In [99]:
x = 10
while x > 0:
    x = x - 1
    print x
9
8
7
6
5
4
3
2
1
0

  • while loops repeat a block of code while the condition is True
  • Danger! - while 1 or while True will result in an infinite loop unless the block contains a break
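
A minimal sketch of how a while True loop can still end, using the break statement (not otherwise covered in this lecture) to leave the loop once a made-up attempt limit is reached:

```python
attempts = 0

while True:
    attempts += 1
    print('attempt ' + str(attempts))
    # Without this test and break, the loop would never stop.
    if attempts >= 3:
        break

print('finished after ' + str(attempts) + ' attempts')
```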

Questions?

Let's work on some examples that are relevant to your research.