 |
Snyppets - Python snippets |
 |
|
This page contains a bunch of miscellaneous Python code
snippets, recipes, mini-guides, links, examples, tutorials and
ideas, ranging from very (very) basic things to
advanced.
I hope they will be usefull to you.
(Don't forget to read my main Python page ( http://sebsauvage.net/python/
): there is handful of other programs and a guide to creating your
own Python extensions in C/C++ under Windows.)
Send
a file using FTP
Piece of cake.
import ftplib
# We import
the FTP module
session = ftplib.FTP('myserver.com','login','passord') # Connect to
the FTP server
myfile = open('toto.txt','rb')
# Open the file to
send
session.storbinary('STOR toto.txt', myfile)
# Send the file
myfile.close()
# Close the file
session.quit()
# Close FTP session
Queues
(FIFO) and
stacks (LIFO)
Python makes using queues and stacks a piece of cake (Did I
already say "piece of cake" ?).
No use creating a specific class: simply use list
objects.
For a stack
(LIFO),
stack with append()
and
destack with
pop():
>>> a = [5,8,9]
>>> a.append(11)
>>> a
[5, 8, 9, 11]
>>> a.pop()
11
>>> a.pop()
9
>>> a
[5, 8]
>>>
For a queue
(FIFO), enqueue
with append()
and dequeue
with pop(0):
>>> a = [5,8,9]
>>> a.append(11)
>>> a
[5, 8, 9, 11]
>>> a.pop(0)
5
>>> a.pop(0)
8
>>> a
[9, 11]
As lists can contain any type of object, you an create queues and
stacks of any type of objects !
(Note that there is also a Queue
module, but it is
mainly usefull with threads.)
A
function which returns
several values
When you're not accustomed with Python, it's easy to forget
that
a function can return just any type of object, including
tuples.
This a great to create functions which return several values. This
is typically the kind of thing that cannot be done in other
languages without some code overhead.
>>> def
myfunction(a):
return (a+1,a*2,a*a)
>>> print myfunction(3)
(4, 6, 9)
You can also use mutiple assignment:
>>> (a,b,c) =
myfunction(3)
>>> print b
6
>>> print c
9
And of course your functions can return any
combination/composition of objects (strings, integer, lists,
tuples, dictionnaries, list of tuples, etc.).
Exchanging
the content of
2 variables
In most languages, exchanging the content of two variable
involves using a temporary variable.
In Python, this can be done with multiple assignment.
>>> a=3
>>> b=7
>>> (a,b)=(b,a)
>>> print a
7
>>> print b
3
In Python, tuples, lists and dictionnaries are your friends,
really !
Highly recommended reading: Dive
into Python
(http://diveintopython.org/).
The
first chapter contains a nice tutorial on tuples, lists and
dictionnaries. And don't forget to read the rest of the book (You
can download the entire book for free).
Getting
rid of duplicate items in a list
The trick is to temporarly convert the list in into a
dictionnary:
>>> mylist =
[3,5,8,5,3,12]
>>> print dict().fromkeys(mylist).keys()
[8, 3, 12, 5]
>>>
Since Python 2.5, you can also use sets:
>>> mylist =
[3,5,8,5,3,12]
>>> print list(set(mylist))
[8, 3, 12, 5]
>>>
Get
all links in a web
page (1)
... or regular expression marvels.
import re, urllib
htmlSource =
urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
linksList = re.findall('<a
href=(.*?)>.*?</a>',htmlSource)
for link in linksList:
print link
Get
all links in a web
page (2)
You can also use the HTMLParser module.
import HTMLParser, urllib
class linkParser(HTMLParser.HTMLParser):
def __init__(self):
HTMLParser.HTMLParser.__init__(self)
self.links = []
def handle_starttag(self, tag, attrs):
if tag=='a':
self.links.append(dict(attrs)['href'])
htmlSource =
urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
p = linkParser()
p.feed(htmlSource)
for link in p.links:
print link
For each HTML start tag encountered, the
handle_starttag() method will be called.
For example <a href="http://google.com>
will trigger the method
handle_starttag(self,'A',[('href','http://google.com')]).
See also all others handle_*()
methods in Pyhon
manual.
(Note that HTMLParser is not
bullet-proof: it will
choke on ill-formed HTML. In this case, use the
sgmllib module, go back to regular expressions
or use
BeautifulSoup.)
Get
all links in a web
page (3)
Still hungry ?
Beautiful
Soup is a Python module which is quite good at extracting
data from HTML.
Beautiful Soup's main advantages are its ability to handle very bad
HTML code and its simplicity. Its drawback is its speed (it's
slow).
You can get it from http://www.crummy.com/software/BeautifulSoup/
import urllib
import BeautifulSoup
htmlSource =
urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
soup = BeautifulSoup.BeautifulSoup(htmlSource)
for item in soup.fetch('a'):
print item['href']
Get
all links in a web
page (4)
Still there ?
Ok, here's another one:
Look ma ! No parser nor regex.
import urllib
htmlSource =
urllib.urlopen("http://sebsauvage.net/index.html").read(200000)
for chunk in htmlSource.lower().split('href=')[1:]:
indexes = [i for i in
[chunk.find('"',1),chunk.find('>'),chunk.find(' ')] if
i>-1]
print chunk[:min(indexes)]
Granted, this is a crude hack.
But it works !
Zipping/unzipping
files
Zipping a file:
import zipfile
f = zipfile.ZipFile('archive.zip','w',zipfile.ZIP_DEFLATED)
f.write('file_to_add.py')
f.close()
Replace 'w' with 'a'
to add files to
the zip archive.
Unzipping all files from a zip archive:
import zipfile
zfile = zipfile.ZipFile('archive.zip','r')
for filename in zfile.namelist():
data = zfile.read(filename)
file = open(filename, 'w+b')
file.write(data)
file.close()
If you want to zip all file in a directory recursively (all subdirectories):
import zipfile
f = zipfile.ZipFile('archive.zip','w',zipfile.ZIP_DEFLATED)
startdir = "c:\\mydirectory"
for dirpath, dirnames, filenames in os.walk(startdir):
for filename in filenames:
f.write(os.path.join(dirpath,filename))
f.close()
Listing
the content of a
directory
You have 4 ways of doing this, depending on your need.
The listdir() method returns the
list of all files
in a directory:
import os
for filename in os.listdir(r'c:\windows'):
print filename
Note that you can use the fnmatch()
module to
filter file names.
The glob module wraps listdir() and
fnmatch() into a single method:
import glob
for filename in glob.glob(r'c:\windows\*.exe'):
print filename
And if you need to collect subdirectories, use
os.path.walk():
import os.path
def processDirectory ( args, dirname, filenames ):
print 'Directory',dirname
for filename in filenames:
print '
File',filename
os.path.walk(r'c:\windows', processDirectory, None )
os.path.walk() works with a callback:
processDirectory() will be called for each
directory
encountered.
dirname will contain the path of the directory.
filenames will contain a list of filenames in
this
directory.
You can also use os.walk(), which works in
a
non-recursive way and is somewhat easier to understand.
import os
for dirpath, dirnames, filenames in os.walk('c:\\winnt'):
print 'Directory', dirpath
for filename in filenames:
print ' File',
filename
A
webserver in 3 lines
of code
import BaseHTTPServer,
SimpleHTTPServer
server =
BaseHTTPServer.HTTPServer(('',80),SimpleHTTPServer.SimpleHTTPRequestHandler)
server.serve_forever()
This webserver will serve files in the current directory. You
can use os.chdir() to change the directory.
This trick is handy to serve or transfer files between computers on
a local network.
Note that this webserver is pretty fast, but can only serve
one
HTTP request at time. It's not recommended for high-traffic
servers.
If you want better performance, have a look at asynchronous sockets
(asyncore, Medusa...) or multi-thread webservers.
Creating
and raising
your own exceptions
Do not consider exception as nasty things
which want
to break you programs. Exceptions are you friend. Exceptions are a
Good Thing. Exceptions are messengers which tell you that
something's wrong, and what is wrong. And try/except blocks will
give you the chance to handle the problem.
In your programs, you should also try/catch
all calls
that may fall into error (file access, network connections...).
It's often usefull to define your own
exceptions to
signal errors specific to your class/module.
Here's an example of defining an exception
and a class
(say in myclass.py):
class myexception(Exception):
pass
class myclass:
def __init__(self):
pass
def dosomething(self,i):
if i<0:
raise myexception, 'You made a mistake !'
(myexception is a
no-brainer exception:
it contains nothing. Yet, it is usefull because the exception
itself is a message.)
If you use the class, you could do:
import myclass
myobject = myclass.myclass()
myobject.dosomething(-2)
If you execute this program, you will get:
Traceback (most recent call last):
File "a.py", line 3, in ?
myobject.dosomething(-2)
File "myclass.py", line 9, in dosomething
raise myexception, 'You made a mistake !'
myclass.myexception: You made a mistake !
myclass tells you
you did something
wrong. So you'd better try/catch, just in case there's a
problem:
import myclass
myobject = myclass.myclass()
try:
myobject.dosomething(-2)
except myclass.myexception:
print 'oops ! myclass tells me I did
something
wrong.'
This is better ! You have a chance to do something if
there's a problem.
Scripting
Microsoft SQL Server
with Python
If you have Microsoft SQL Server, you must have encountered
this
situation where you tell yourself «If only I was able to
script all those clicks in Enterprise Manager (aka the
MMC) !».
You can ! It's possible to script in Python whatever you can
do
in the MMC.
You just need the win32all python module
to access COM
objects from within Python (see http://starship.python.net/crew/mhammond/win32/)
(The win32all module is also provided with ActiveState's Python
distribution: http://www.activestate.com/Products/ActivePython/)
Once installed, just use the SQL-DMO objects.
For example, get the list of databases in a server:
from win32com.client import gencache
s = gencache.EnsureDispatch('SQLDMO.SQLServer')
s.Connect('servername','login','password')
for i in range(1,s.Databases.Count):
print s.Databases.Item(i).Name
Or get the script of a table:
database = s.Databases('COMMERCE')
script = database.Tables('CLIENTS').Script()
print script
You will find the SQL-DMO documentation in MSDN:
Accessing
a database with
ODBC
Under Windows, ODBC provides an easy way to access almost any
database. It's not very fast, but it's ok.
You need the win32all
python module.
First, create a DSN (for example: 'mydsn'), then:
import dbi, odbc
conn = odbc.odbc('mydsn/login/password')
c = conn.cursor()
c.execute('select clientid, name, city from client')
print c.fetchall()
Nice and easy !
You can also use fetchone() or
fetchmany(n) to fetch - respectively - one or
n rows at once.
Note : On big datasets, I have quite bizarre and
unregular
data truncations on tables with a high number of columns. Is that a
bug in ODBC, or in the SQL Server ODBC driver ? I will have to
investigate...
Accessing
a database with ADO
Under Windows, you can also use ADO (Microsoft
ActiveX Data
Objects) instead of ODBC to access databases. The following
code uses ADO COM objects to connect to a Microsoft SQL Server
database, retreive and display a table.
import win32com.client
connexion =
win32com.client.gencache.EnsureDispatch('ADODB.Connection')
connexion.Open("Provider='SQLOLEDB';Data Source='myserver';Initial Catalog='mydatabase';User ID='mylogin';Password='mypassword';")
recordset = connexion.Execute('SELECT clientid, clientName FROM
clients')[0]
while not recordset.EOF:
print
'clientid=',recordset.Fields(0).Value,'
client name=',recordset.Fields(1).Value
recordset.MoveNext()
connexion.Close()
For ADO documentation, see MSDN:
http://msdn.microsoft.com/library/en-us/ado270/htm/mdmscadoobjects.asp
CGI
under Windows with
TinyWeb
TinyWeb is a one-file webserver for Windows (the exe is only
53
kb). It's fantastic for making instant webservers and share files.
TinyWeb is also capable of serving CGI.
Let's have some fun and create some CGI with Python !
First, let's get and install TinyWeb:
- Get TinyWeb from http://www.ritlabs.com/tinyweb/
(it's free, even for commercial use !) and unzip it to
c:\somedirectory (or any directory you'd
like).
- Create the "
www" subdirectory in
this
directory
- Create
index.html in the www
directory:
<html><body>Hello,
world
!</body></html>
- Run the server:
tiny.exe c:\somedirectory\www
(make sure you use an absolute path)
- Point your browser at
http://localhost
If you see "Hello, world !", it means that TinyWeb is up and
running.
Let's start making some CGI:
- In the
www directory, create the
"cgi-bin" subdirectory.
- Create
hello.py containing:
print "Content-type: text/html"
print
print "Hello, this is Python talking !"
- Make sure Windows always uses
python.exe
when you
double-clic .py files.
(SHIFT+rightclick on a .py file, "Open with...",
choose
python.exe,
check the box "Always use this program...", click Ok)
- Point your browser at
http://localhost/cgi-bin/hello.py
You should see "Hello, this is Python talking !"
(and
not the source code).
If it's ok, you're done !
Now you can make some nice CGI.
(If this does not work, make sure the path to python.exe is ok
and that you used an absolute path in tinyweb's command line.)
Note that this will never be as fast as mod_python under
Apache
(because TinyWeb will spawn a new instance of the Python
interpreter for each request on a Python CGI). Thus it's not
appropriate for high-traffic production servers, but for a small
LAN, it can be quite handy to serve CGI like this.
Refer to Python documentation for CGI tutorials and
reference.
Creating
.exe files from
Python programs
Like Sun's Java or Microsoft's .Net,
if you want to
distribute your Python programs, you need to bundle the virtual
machine too.
You have several options: py2exe,
cx_Freeze or
pyInstaller.
py2exe
py2exe provides an easy way to gather all necessary files to
distribute your Python program on computers where Python is not
installed.
For example, under Windows, if you want to transform
myprogram.py into myprogram.exe,
create
the file setup.py as follows:
from distutils.core import setup
import py2exe
setup(name="myprogram",scripts=["myprogram.py"],)
Then run:
python setup.py py2exe
py2exe will get all dependant files and write them in the
\dist subdirectory. You will typically find
your
program as .exe, pythonXX.dll
and
complementary .pyd files. Your program will
run on any
computer even if Python is not installed. This also works for
CGI.
(Note that if your program uses tkinter, there is a trick.)
Hint : Use UPX to compress
all
dll/exe/pyd
files. This will
greatly reduce file size. Use:
upx --best *.dll *.exe *.pyd
(Typically, python22.dll shrinks from 848 kb
to 324
kb.)
Note that since version 0.6.1, py2exe is capable of creating a single EXE
(pythonXX.dll and
other files are integrated into the EXE).
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
from distutils.core import setup
import py2exe
setup(
options = {"py2exe": {"compressed": 1,
"optimize": 0,
"bundle_files": 1, } },
zipfile = None,
console=["myprogram.py"]
)
cx_Freeze
You can also use cx_Freeze, which is an
alternative to
py2exe (This is what I used in webGobbler).
cx_Freeze\FreezePython.exe
--install-dir bin
--target-name=myprogram.exe myprogram.py
or even create a console-less version:
cx_Freeze\FreezePython.exe
--install-dir bin
--target-name=myprogram.exe
--base-binary=Win32GUI.exe myprogram.py
Tip for the console-less
version: If you try to print
anything, you will
get a nasty error window, because stdout and stderr do not exist
(and the cx_freeze Win32gui.exe stub will display an error
Window).
This is a pain when you want your program to be able to run in GUI
mode and
command-line
mode.
To safely disable console output, do as follows at the beginning of
your program:
try:
sys.stdout.write("\n")
sys.stdout.flush()
except IOError:
class dummyStream:
''' dummyStream behaves
like a stream but does nothing. '''
def __init__(self):
pass
def write(self,data):
pass
def read(self,data):
pass
def flush(self):
pass
def close(self):
pass
# and now redirect all default streams
to this
dummyStream:
sys.stdout = dummyStream()
sys.stderr = dummyStream()
sys.stdin = dummyStream()
sys.__stdout__ = dummyStream()
sys.__stderr__ = dummyStream()
sys.__stdin__ = dummyStream()
This way, if the program starts in console-less mode, it will work
even if the code contains print statements.
And if run in command-line mode, it will print out as usual. (This
is basically what I did in webGobbler, too.)
pyInstaller
pyInstaller
is the
reincarnation of McMillan
Installer. It can also create one-file executables.
You can get it from http://pyinstaller.hpcf.upr.edu/cgi-bin/trac.cgi/wiki
Unzip pyInstaller in the pyinstaller_1.1 subdirectory, then do:
python pyinstaller_1.1\Configure.py
(You only have to do this once.)
Then create the .spec file for your program:
python pyinstaller_1.1\Makespec.py
myprogram.py myprogram.spec
Then pack your program:
python pyinstaller_1.1\Build.py
myprogram.spec
You program will be available in the \distmyprogram
subdirectory. (myprogram.exe, pythonXX.dll, MSVCR71.dll, etc.)
You have several options, such as:
Reading
Windows
registry
import _winreg
key = _winreg.OpenKey(_winreg.HKEY_CURRENT_USER,
'Software\\Microsoft\\Internet Explorer',
0, _winreg.KEY_READ)
(value, valuetype) = _winreg.QueryValueEx(key, 'Download
Directory')
print value
print valuetype
valuetype is the type of the registry key. See
http://docs.python.org/lib/module--winreg.html
Measuring
the performance of
Python programs
Python is provided with a code profiling module:
profile. It's rather easy
to use.
For example, if you want to profile myfunction(), instead of
calling it with:
myfunction()
you just have to do:
import profile
profile.run('myfunction()','myfunction.profile')
import pstats
pstats.Stats('myfunction.profile').sort_stats('time').print_stats()
This will display a report like this:
Thu Jul 03 15:20:26
2003
myfunction.profile
1822 function
calls (1792 primitive calls) in 0.737 CPU seconds
Ordered by: internal time
ncalls tottime
percall cumtime
percall filename:lineno(function)
1
0.224 0.224
0.279 0.279 myprogram.py:512(compute)
10
0.078 0.008
0.078 0.008 myprogram.py:234(first)
1
0.077 0.077
0.502 0.502 myprogram.py:249(give_first)
1
0.051 0.051
0.051 0.051 myprogram.py:1315(give_last)
3
0.043 0.014
0.205 0.068 myprogram.py:107(sort)
1
0.039 0.039
0.039 0.039 myprogram.py:55(display)
139
0.034 0.000
0.106 0.001 myprogram.py:239(save)
139
0.030 0.000
0.072 0.001 myprogram.py:314(load)
...
This report tells you, for each function/method:
- how many times it was called (
ncalls).
- total time spent in function (minus time spent in
sub-functions) (
tottime)
- total time spent in function (including time spent in
sub-functions) (
cumtime)
- average time per call (
percall)
As you can see, the profile module displays the precise
filename, line and function name. This is precious information and
will help you to spot the slowest parts of your programs.
But don't try to optimize too early in development stage. This
is evil ! :-)
Note that Python is also provided with a similar module named
hotspot, which
is more
accurate but does not work well with threads.
Speed
up your Python
programs
To speedup your Python program, there's nothing like
optimizing
or redesigning your algorithms.
In case you think you can't do better, you can always use
Psyco:
Psyco is a Just-In-Time-like compiler for Python for Intel
80x86-compatible processors. It's very easy to use and provides x2
to x100 instant speed-up.
- Download psyco for your Python version (http://psyco.sourceforge.net)
- unzip and copy the
\psyco
directory to your Python
site-packages directory (should be something like
c:\pythonXX\Lib\site-packages\psyco\ under
Windows)
Then, put this at the beginning of your programs:
import psyco
psyco.full()
Or even better:
try:
import psyco
psyco.full()
except:
pass
This way, if psyco is installed, your program will run
faster.
If psyco is not available, your program will run as usual.
(And if psyco is still not enough, you can rewrite the code
which is too slow in C or C++ and wrap it with SWIG (http://swig.org).)
Note: Do not use Psyco when debugging, profiling or tracing
your
code. You may get innacurate results and strange
behaviours.
Regular
expressions are
sometimes overkill
I helped someone on a forum who wanted process a text file: He
wanted to extract the text following "Two words" in all lines
starting whith these 2 word. He had started writing a regular
expression for this:
r = re.compile("Two\sword\s(.*?)").
His problem was better solved with:
[...]
for line in file:
if line.startswith("Two words "):
print line[10:]
Regular expression are sometime overkill. They are not always
the best choice, because:
- They involve some overhead:
- You have to compile the regular expression
(
re.compile()). This means parsing the regular
expression and transforming it into a state machine. This consumes
CPU time.
- When using the regular expression, you run the state
machine
against the text, which make the state machine change state
according to many rules. This is also eats CPU time.
- Regular expression are not failsafe: they can fail
sometimes on
specific input. You may get a "maximum recusion limit
exceeded" exception. This means that you should also enclose
all
match(), search()
and
findall() methods in try/except
blocks.
- The Zen of Python (
import this :-)
says
«Readability counts». That's a good thing. And
regular expression quickly become difficult to read, debug and
change.
Besides, string methods like find(),
rfind() or startwith()
are very
fast, much faster than regular expressions.
Do not try to use regular expressions everywhere.
Often a
bunch of string operations will do the job faster.
Executing
another Python
program
exec("anotherprogram.py")
Bayesian
filtering
Bayesian filtering is the last buzz-word of spam fighting. And
it works very well indeed !
Reverend is a free Bayesian module for
Python. You can
download it from http://divmod.org/trac/wiki/DivmodReverend
Here's an example: Recognizing the language of a text.
First, train it on a few sentences:
from reverend.thomas import Bayes
guesser = Bayes()
guesser.train('french','La souris est rentrée dans son
trou.')
guesser.train('english','my tailor is rich.')
guesser.train('french','Je ne sais pas si je viendrai demain.')
guesser.train('english','I do not plan to update my website
soon.')
And now let it guess the language:
>>> print
guesser.guess('Jumping
out of cliffs it not a good idea.')
[('english', 0.99990000000000001), ('french',
9.9999999999988987e-005)]
The bayesian filter says: "It's english, with a
99,99%
probability."
Let's try another one:
>>> print
guesser.guess('Demain il
fera très probablement chaud.')
[('french', 0.99990000000000001), ('english',
9.9999999999988987e-005)]
It says: "It's french, with a 99,99% probability."
Not bad, isn't it ?
You can train it on even more languages at the same time. You
can also train it to classify any kind of text.
Tkinter
and cx_Freeze
(This trick also works with py2exe).
Say you want to package a Tkinter Python program with cx_Freeze
in order to distribute it.
You create your program:
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import Tkinter
class myApplication:
def __init__(self,root):
self.root = root
self.initializeGui()
def initializeGui(self):
Tkinter.Label(self.root,text="Hello,
world").grid(column=0,row=0)
def main():
root = Tkinter.Tk()
root.title('My application')
app = myApplication(root)
root.mainloop()
if __name__ == "__main__":
main()
This program works on your computer. Now let's package it
with cx_Freeeze:
FreezePython.exe --install-dir bin
--target-name=test.exe test.py
If you run your program (test.exe), you will get
this error:
The dynamic link library tk84.dll
could not
be found in the specified path [...]
In fact, you need to copy the TKinter DLLs. Your builing batch
becomes:
FreezePython.exe --install-dir bin
--target-name=test.exe test.py
copy C:\Python24\DLLs\tcl84.dll .\bin\
copy C:\Python24\DLLs\tk84.dll .\bin\
Ok, john, build it again.
Run the EXE: it works !
Run the EXE on another computer (which does not have Python
installed): Error !
Traceback (most recent call last):
File "cx_Freeze\initscripts\console.py", line 26, in ?
exec code in m.__dict__
File "test.py", line 20, in ?
File "test.py", line 14, in main
File "C:\Python24\Lib\lib-tk\Tkinter.py", line 1569, in
__init__
_tkinter.TclError: Can't find a usable init.tcl in the following
directories:
[...]
Nasty, isn't it ?
The reason it fails is that Tkinter needs the runtime tcl scripts
which are located in C:\Python24\tcl\tcl8.4 and
C:\Python24\tcl\tk8.4.
So let's copy these scripts in the same directory as you
application.
You building batch becomes:
cx_Freeze\FreezePython.exe
--install-dir bin
--target-name=test.exe test.py
copy C:\Python24\DLLs\tcl84.dll .\bin\
copy C:\Python24\DLLs\tk84.dll .\bin\
xcopy /S /I /Y "C:\Python24\tcl\tcl8.4\*.*"
"bin\libtcltk84\tcl8.4"
xcopy /S /I /Y "C:\Python24\tcl\tk8.4\*.*"
"bin\libtcltk84\tk8.4"
But you also need to tell your program where to get the tcl/tk
runtime scripts (in bold below):
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import os, os.path
# Take the tcl/tk library from local subdirectory if available.
if os.path.isdir('libtcltk84'):
os.environ['TCL_LIBRARY'] =
'libtcltk84\\tcl8.4'
os.environ['TK_LIBRARY'] =
'libtcltk84\\tk8.4'
import Tkinter
class myApplication:
def __init__(self,root):
self.root = root
self.initializeGui()
def initializeGui(self):
Tkinter.Label(self.root,text="Hello,
world").grid(column=0,row=0)
def main():
root = Tkinter.Tk()
root.title('My application')
app = myApplication(root)
root.mainloop()
if __name__ == "__main__":
main()
Now you can properly package and distribute Tkinter-enabled
applications. (I used this trick in webGobbler.)
Possible improvement:
You surely could get rid of some tcl/tk script you don't need.
Example: bin\libtcltk84\tk8.4\demos (around 500 kb) are only
tk demonstrations. They are not necessary.
This depends on which features of Tkinter your program will
use.
(cx_Freeze and - AFAIK - all other packagers are not capable
of resolving tcl/tk dependencies.)
A
few Tkinter
tips
Tkinter is the basic GUI toolkit provided with Python.
Here's a simple example:
import Tkinter
class myApplication:
#1
def __init__(self,root):
self.root = root
#2
self.initialisation()
#3
def initialisation(self):
#3
Tkinter.Label(self.root,text="Hello,
world !").grid(column=0,row=0) #4
def main():
#5
root = Tkinter.Tk()
root.title('My application')
app = myApplication(root)
root.mainloop()
if __name__ == "__main__":
main()
#1 : It's always better to code a GUI in the form of
a
class. It will be easier to reuse your GUI components.
#2 : Always keep a reference to your ancestor. You
will need
it when adding widgets.
#3 : Keep the code which creates all the widgets
clearly
separated from the rest of the code. It will be easier to
maintain.
#4 : Do not use the .pack().
It's usually
messy, and painfull when you want to extend your GUI.
grid() lets you place and move your widgets
elements
easily. Never ever mix .pack() and
.grid(), or your application will hang without
warning, with 100% CPU usage.
#5 : It's always a good idea to have a main()
defined. This way, you can test the GUI elements by directly by
running the module.
I lack time, so this list of recommendations could be much larger
after my experience with webGobbler.
Tkinter
file
dialogs
Tkinter is provided with several basic dialogs for file or
directory handling. There's pretty easy to use, but it's good to
have some examples:
Select a directory:
import Tkinter
import tkFileDialog
root = Tkinter.Tk()
directory =
tkFileDialog.askdirectory(parent=root,initialdir="/",title='Please
select a directory')
if len(directory) > 0:
print "You chose directory %s"
% directory
Select a file for open (askopenfile
will open
the file for you. file will behave like a
normal file
object):
import Tkinter
import tkFileDialog
root = Tkinter.Tk()
file = tkFileDialog.askopenfile(parent=root,mode='rb',title='Please
select a file')
if file != None:
data = file.read()
file.close()
print "I got %d bytes from the file." %
len(data)
Save as... dialog:
import Tkinter
import tkFileDialog
myFormats = [
('Windows Bitmap','*.bmp'),
('Portable Network Graphics','*.png'),
('JPEG / JFIF','*.jpg'),
('CompuServer GIF','*.gif'),
]
root = Tkinter.Tk()
filename =
tkFileDialog.asksaveasfilename(parent=root,filetypes=myFormats,title="Save
image as...")
if len(filename) > 0:
print "Now saving as %s"
% (filename)
Including
binaries in your
sources
Sometime it's handy to include small files in your sources (icons,
test files, etc.)
Let's take a file (myimage.gif) and convert it in base64
(optionnaly compressing it with zlib):
import base64,zlib
data = open('myimage.gif','rb').read()
print base64.encodestring(zlib.compress(data))
Get the text created by this program and use it in your source:
import base64,zlib
myFile = zlib.decompress(base64.decodestring("""
eJxz93SzsExUZlBn2MzA8P///zNnzvz79+/IgUMTJ05cu2aNaBmDzhIGHj7u58+fO11ksLO3Kyou
ikqIEvLkcYyxV/zJwsgABDogAmQGA8t/gROejlpLMuau+j+1QdQxk20xwzqhslmHH5/xC94Q58ST
72nRllBw7cUDHZYbL8VtLOYbP/b6LhXB7tAcfPCpHA/fSvcJb1jZWB9c2/3XLmQ+03mZBBP+GOak
/AAZGXPL1BJe39jqjoqEAhFr1fBi1dao9g4Ovjo+lh6GFDVWJqbisLKoCq5p1X5s/Jw9IenrFvUz
+mRXTeviY+4p2sKUflA1cjkX37TKWYwFzRpFYeqTs2fOqEuwXsfgOeGCfmZ57MP4WSpaZ0vSJy97
WPeY5ca8F1sYI5f5r2bjec+67nmaTcarm7+Z0hgY2Z7++fpCzHmBQCrPF94dAi/jj1oZt8R4qxsy
6liJX/UVyLjwoHFxFK/VMWbN90rNrLKMGQ7iQSc7mXgTkpwPXVp0mlWz/JVC4NK0s0zcDWkcFxxF
mrvdlBdOnBySvtNvq8SBFZo8rF2MvAIMoZoPmZrZPj2buEDr2isXi0V8egpelyUvbXNc7yVQkKgS
sM7g0KOr7kq3WRIkitSuRj1VXbSk8v4zh8fljqtOhyobP91izvh0c2hwqKz3jPaHhvMMXVQspYq8
aiV9ivkmHri5u2NH8fvPpVWuK65I3OMUX+f4Lee+3Hmfux96Vq5RVqxTN38YeK3wRbVz5v06FSYG
awWFgMzkktKiVIXkotTEktQUhaRKheDUpMTikszUPIVgx9AwR3dXBZvi1KTixNKyxPRUhcQSBSRe
Sn6JQl5qiZ2CrkJGSUmBlb4+QlIPKKGgAADBbgMp"""))
print "I have a file of %d bytes." % len(myFile)
For example, if you use PIL (Python Imaging Library), you can
directly open this image:
import Image,StringIO
myimage = Image.open(StringIO.StringIO(myFile))
myimage.show()
Good
practice: try/except non-standard import statements
If your program uses modules which are not part of the standard
Python distribution, it can be a pain for your users to identify
which module are required and where to get them.
Ease their pain with a simple try/except statement which tells
the module name (which is not always the same name as stated in the
import statement) and where to get it.
Example:
try:
import win32com.client
except ImportError:
raise ImportError, 'This program requires the
win32all extensions for Python. See
http://starship.python.net/crew/mhammond/win32/'
Good
practice: Readable objects
Let's define a "client" class. Each client has a name and a
number.
class client:
def __init__(self,number,name):
self.number
= number
self.name = name
Now if we create an instance of this class and if we display
it:
my_client = client(5,"Smith")
print my_client
You get:
<__main__.client instance at
0x007D0E40>
Quite exact, but not very explicit.
Let's improve that and add a __repr__ method:
class client:
def __init__(self,number,name):
self.number
= number
self.name = name
def
__repr__(self):
return '<client id="%s" name="%s">' % (self.number,
self.name)
Let's do it again:
my_client = client(5,"Smith")
print my_client
We get:
<client id="5"
nom="Dupont">
Ah !
Much better. Now this object has a meaning to you.
It's much better for debugging or logging.
You can even apply this to compound objects, such as a client
directory:
class directory:
def __init__(self):
self.clients = []
def addClient(self, client):
self.clients.append(client)
def __repr__(self):
lines = []
lines.append("<directory>")
for client in
self.clients:
lines.append(" "+repr(client))
lines.append("</directory>")
return
"\n".join(lignes)
Then create a directory, and add clients to this directory:
my_directory = directory()
my_directory.addClient( client(5,"Smith") )
my_directory.addClient( client(12,"Doe") )
print my_directory
You'll get:
<directory>
<client id="5" name="Smith">
<client id="12" name="Doe">
</directory>
Much better, isn't it ?
This trick - which is not exclusive to Python - is handy for
debugging or logging.
For example, if your program goes tits ups, you can log the objects
states in a file for debugging purposes in the except
clause of a try/except block.
Good
practice: No
blank-check read()
When you read a file or a socket, you often use simply
.read(), such as:
# Read from a file:
file = open("a_file.dat","rb")
data = file.read()
file.close()
# Read from an URL:
import urllib
url = urllib.urlopen("http://sebsauvage.net")
html = url.read()
url.close()
But what happens if the file is 40 Gb, or the website sends data
non-stop ?
You program will eat all the system's memory, slow down to a crawl
and probably crash the system too.
You should always bound
your
read().
For example, I do not expect to process files larger than 10 Mb,
nor read HTML pages larger than 200 kb, so I would write:
# Read from a file:
file = open("a_file.dat","rb")
data = file.read(10000000)
file.close()
# Read from an URL:
import urllib
url = urllib.urlopen("http://sebsauvage.net")
html = url.read(200000)
url.close()
This way, I'm safe from buggy or malicious external data
sources.
Always be cautious when manipulating data you have no control over
!
...er, finally, be also
cautious with your own data, too.
Shit happens.
1.7
is different
than 1.7 ?
This is a common pitfall amongst novice programmers:
Never confuse data and it's representation on
screen.
When you see a floating number 1.7, you only see a textual representation
of the
binary data stored in
computer's
memory.
When you use a date, such as :
>>> import
datetime
>>> print datetime.datetime.now()
2006-03-21 15:23:20.904000
>>>
"2006-03-21 15:23:20.904000" is NOT the
date. It's a textual
representation of the date (The real date is
binary
data in the computer's memory).
The print statement seems to be trivial, but
it's not.
It involves complex work in order to create a human-readable
representation of various binary data formats. This is not trivial,
even for a simple integer.
This leads to pitfalls, such as:
a = 1.7
b = 0.9 + 0.8 # This should be 1.7
print a
print b
if a == b:
print "a and b are equal."
else:
print "a and b are different !"
What do you expect this code to print ? "a and b are equal ?".
You're wrong !
1.7
1.7
a and b are different !
How can this be ?
How can 1.7 be different than 1.7 ?
Remember the two "1.7" are just textual
representation of numbers,
which are almost
equal to
1.7.
The program says they are different because a and b are different at
the binary level.
Only their textual representation is the same.
Thus for comparing floating numbers, use the following tricks:
if abs(a-b)
< 0.00001:
print "a and b are equal."
else:
print "a and b are different !"
or even:
if str(a) == str(b):
print "a and b are equal."
else:
print "a and b are different !"
Why is 0.9+0.8 different than 1.7 ?
Because the computer can only handle bits, and you cannot precisely
represent all numbers in binary.
The computer is good a storing values such as 0.5 (which is 0.1 in
binary), or 0.125 (which is 0.001 in binary).
But it's not capable of storing the exact value 0.3 (because there
is no exact representation of 0.3 in binary).
Thus, as soon as you do a=1.7, a
does
not contain 1.7,
but only a
binary approximation of
the
decimal number 1.7.
Get
user's home directory
path
It's handy to store or retreive configuration files for your
programs.
import os.path
print os.path.expanduser('~')
Note that this also works under Windows. Nice !
(It points to the "Document and settings" user's folder, or even
the network folder if the user has one.)
Python's
virtual machine
Python - like Java or Microsoft .Net - has a virtual machine.
Python has a specific bytecode.
It's an machine language
like Intel 80386 or
Pentium machine language, but there is no physical microprocessor
capable of executing it.
The bytecode runs in a program which simulates a microprocessor:
a virtual
machine.
This is the same for Java and .Net. Java's virtual machine is
named JVM (Java Virtual Machine), and .Net's virtual machine is the
CLR (Common Language Runtime)
Let's have an example: mymodule.py
def myfunction(a):
print "I have ",a
b = a * 3
if b<50:
b = b + 77
return b
This no-nonsense program takes a number, displays it, multiplies it
by 3, adds 77 if the result is less than 50 and returns it.
(Granted, this is weird.)
Let's try it:
C:\>python
Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more
information.
>>> import mymodule
>>> print mymodule.myfunction(5)
I have 5
92
>>>
Ok, easy.
See the mymodule.pyc
file
which appeared ? This is the "compiled" version of our
module, the Python bytecode. This file contains instructions for
the Python virtual machine.
The .pyc files are automatically generated by Python whenever a
module is imported.
Python can directly run the .pyc files if you want. You could even
run the .pyc without the .py.
If you delete the .pyc file, it will be recreated from the .py.
If you update the .py source, Python will detect this change and
automatically update the corresponding .pyc.
Want to have a peek in the .pyc bytecode to see what it looks like
?
It's easy:
>>> import dis
>>> dis.dis(mymodule.myfunction)
2
0
LOAD_CONST
1 ('I have')
3 PRINT_ITEM
4
LOAD_FAST
0 (a)
7 PRINT_ITEM
8 PRINT_NEWLINE
3
9
LOAD_FAST
0 (a)
12
LOAD_CONST
2 (3)
15 BINARY_MULTIPLY
16
STORE_FAST
1 (b)
4
19
LOAD_FAST
1 (b)
22
LOAD_CONST
3 (50)
25
COMPARE_OP
0 (<)
28
JUMP_IF_FALSE
14 (to 45)
31 POP_TOP
5
32
LOAD_FAST
1 (b)
35
LOAD_CONST
4 (77)
38 BINARY_ADD
39
STORE_FAST
1 (b)
42
JUMP_FORWARD
1 (to 46)
>> 45
POP_TOP
6
>> 46
LOAD_FAST
1 (b)
49 RETURN_VALUE
>>>
You can see the virtual machine instructions (LOAD_CONST,
PRINT_ITEM, COMPARE_OP...) and their operands
(0 which is the reference of the
variable a,
1 which is the reference of variable
b...)
For example, line 3 of the source code is: b = a * 3
In Python bytecode, this translates to:
3
9
LOAD_FAST
0 (a) #
Load
variable a on the stack.
12
LOAD_CONST
2 (3) #
Load the
value 3 on the stack
15 BINARY_MULTIPLY
#
Multiply
them
16
STORE_FAST
1 (b) #
Store result
in variable b
Python also tries to optimise the code.
For example, the string "I have" will not be reused after line 2.
So Python decides to reuse the adresse of the string (1) for
variable b.
The list of instructions supported by the Python virtual machine is
at http://www.python.org/doc/current/lib/bytecodes.html
SQLite
- databases made
simple
SQLite
is a
tremendous
database
engine. I mean
it.
It has some drawbacks:
- Not designed for concurrent access (database-wide lock on
writing).
- Only works locally (no network service, although you can
use
things like sqlrelay).
- Does not handle foreign keys.
- No rights management (grant/revoke).
Advantages:
- very
fast (faster than
mySQL on most operations).
- Respects almost the whole SQL-92 standard.
- Does not require installation of a service.
- No database administration to perform.
- Does not eat computer memory and CPU when not in use.
- SQLite databases are compact
- 1 database = 1 file (easy to
move/deploy/backup/transfer/email).
- SQLite databases are portable across platforms (Windows,
MacOS,
Linux, PDA...)
- SQLite is ACID
(data consistency is assured even on computer failure or
crash)
- Supports transactions
- Fields can store Nulls, integers, reals (floats), text or
blob
(binary data).
- Can handle up to 2 Tera-bytes of data (although
going over
12 Gb is not recommended).
- Can work as a in-memory database (blazing performances !)
SQLite is very fast, very compact, easy to use. It's god gift for
local data processing (websites, data crunching, etc.).
Oh... and it's not only free,
it's also public domain
(no GPL license
headaches).
I love it.
SQLite engine can be accessed from a wide variety of languages.
(Thus SQLite databases are also a great way to exchange complex
data sets between programs written in different languages, even
with mixed numerical/text/binary data. No use to invent a special
file format or a complex XML schema with base64-encoded data.)
SQLite is embeded in Python 2.5.
For Python 2.4 and ealier, it must be installed
separately: http://initd.org/tracker/pysqlite
Here's the basics:
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
from sqlite3 import dbapi2 as sqlite
# Create a database:
con = sqlite.connect('mydatabase.db3')
cur = con.cursor()
# Create a table:
cur.execute('create table clients (id INT PRIMARY KEY, name
CHAR(60))')
# Insert a single line:
client = (5,"John Smith")
cur.execute("insert into clients (id, name) values (?, ?)", client
)
con.commit()
# Insert several lines at
once:
clients = [ (7,"Ella Fitzgerald"),
(8,"Louis Armstrong"),
(9,"Miles Davis")
]
cur.executemany("insert into clients (id, name) values (?, ?)",
clients )
con.commit()
cur.close()
con.close()
Now let's use the database:
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
from sqlite3 import dbapi2 as sqlite
# Connect to an existing
database
con = sqlite.connect('mydatabase.db3')
cur = con.cursor()
# Get row by row
print "Row by row:"
cur.execute('select id, name from clients order by name;')
row = cur.fetchone()
while row:
print row
row = cur.fetchone()
# Get all rows at once:
print "All rows at once:"
cur.execute('select id, name from clients order by name;')
print cur.fetchall()
cur.close()
con.close()
This outputs:
Row by row:
(7, u'Ella Fitzgerald')
(5, u'John Smith')
(8, u'Louis Armstrong')
(9, u'Miles Davis')
All rows at once:
[(7, u'Ella Fitzgerald'), (5, u'John Smith'), (8, u'Louis
Armstrong'), (9, u'Miles Davis')]
Note that creating a database and connecting to an existing one is
the same instruction (sqlite.connect()).
To manage your SQLite database, there is a nice freeware under
Windows: SQLiteSpy
(http://www.zeitungsjunge.de/delphi/sqlitespy/)
Hint 1: If
you use
sqlite.connect(':memory:'),
this creates an in-memory database. As there is no disk access,
this is a very very
fast
database.
(But make sure you have enough memory to handle your data.)
Hint 2: To
make your
program compatible with Python 2.5 and
Python 2.4+pySqlLite, do the
following:
try:
from sqlite3 import dbapi2 as sqlite
# For Python 2.5
except ImportError:
pass
if not sqlite:
try:
from pysqlite2
import dbapi2 as
sqlite # For
Python 2.4 and
pySqlLite
except ImportError:
pass
if not sqlite: # If module not imported successfully, raise
an error.
raise ImportError, "This module requires
either:
Python 2.5 or Python 2.4 with the pySqlLite module
(http://initd.org/tracker/pysqlite)"
# Then use it
con = sqlite.connect("mydatabase.db3")
...
This way, sqlite wil be properly imported whenever it's running
under Python 2.5 or Python 2.4.
Links:
Dive
into
Python
You're programming in Python ?
Then you should be reading Dive into
Pyhon.
The book is free.
Go read it.
No really.
Read it.
I can't imagine decent Python programing without reading this
book.
At least download it...
...now !
This is a must-read.
This book is available for free in different formats (HTML, PDF,
Word 97...).
Plenty of information, good practices, ideas, gotchas and snippets
about classes, datatypes, introspection, exceptions, HTML/XML
processing, unit testing, webservices, refactoring, whatever.
You'll thank yourself one day for having read this book.
Trust me.
Creating
a mutex under
Windows
I use a mutex in webGobbler
so that the
InnoSetup
uninstaller
knows webGobbler is still running (and that it shouldn't be
uninstalled while the program is still running).
That's a handy feature of InnoSetup.
CTYPES_AVAILABLE = True
try:
import ctypes
except ImportError:
CTYPES_AVAILABLE = False
WEBGOBBLER_MUTEX = None
if CTYPES_AVAILABLE and sys.platform=="win32":
try:
WEBGOBBLER_MUTEX=ctypes.windll.kernel32.CreateMutexA(None,False,"sebsauvage_net_webGobbler_running")
except:
pass
I perform an except:pass, because if the
mutex can't
be created, it's not a big deal for my program (It's only an
uninstaller issue).
Your mileage may vary.
This mutex will be automatically destroyed when the Python program
exits.
urllib2
and
proxies
With urllib2,
you can use
proxies.
#
The proxy
address and port:
proxy_info = { 'host' : 'proxy.myisp.com',
'port' : 3128
}
# We create a handler for
the
proxy
proxy_support = urllib2.ProxyHandler({"http" :
"http://%(host)s:%(port)d" % proxy_info})
# We create an opener
which uses
this handler:
opener = urllib2.build_opener(proxy_support)
# Then we install this
opener as
the default opener for urllib2:
urllib2.install_opener(opener)
# Now we can send our
HTTP
request:
htmlpage =
urllib2.urlopen("http://sebsauvage.net/").read(200000)
What is nice about this trick is that this will set the proxy
parameters for your whole
program.
If your proxy requires authentication, you can do it too !
proxy_info = { 'host' :
'proxy.myisp.com',
'port' : 3128,
'user' : 'John Doe',
'pass' : 'mysecret007'
}
proxy_support = urllib2.ProxyHandler({"http" :
"http://%(user)s:%(pass)s@%(host)s:%(port)d" % proxy_info})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
htmlpage =
urllib2.urlopen("http://sebsauvage.net/").read(200000)
(Code in this snippet was heavily inspired from
http://groups.google.com/groups?selm=mailman.983901970.11969.python-list%40python.org
)
Note that as of version 2.4.2 of Python, urllib2 only supports the
following proxy authentication methods: Basic and Digest.
If your proxy uses NTLM
(Windows/IE-specific), you're out of luck.
Beside
this trick,
there is a simplier way to set the proxy:
import os
os.environ['HTTP_PROXY'] = 'http://proxy.myisp.com:3128'
You can also do the same with
os.environ['FTP_PROXY'].
A
proper User-agent in
your HTTP requests
If you have a Python program which sends HTTP requests, the
netiquette says it should properly identify itself.
By default, Python uses a User-Agent such as: Python-urllib/1.16
You should change this.
Here's how to do it with urllib2:
request_headers = { 'User-Agent':
'PeekABoo/1.3.7' }
request = urllib2.Request('http://sebsauvage.net', None,
request_headers)
urlfile = urllib2.urlopen(request)
As a rule of thumb:
Error
handling with urllib2
You are using urllib/urllib2 and want to check for 404 and other
HTTP errors ?
Here's the trick:
try:
urlfile =
urllib2.urlopen('http://sebsauvage.net/nonexistingpage.html')
except urllib2.HTTPError,
exc:
if exc.code == 404:
print "Not found !"
else:
print "HTTP request
failed with error %d (%s)" % (exc.code, exc.msg)
except urllib2.URLError,
exc:
print "Failed because:", exc.reason
This way, you can check for 404 and other HTTP error codes.
Note that urllib2 will not raise an exception on 2xx and 3xx codes.
The exception urllib2.HTTPError will be raised with
4xx
and 5xx codes (which is the expected behaviour).
(Note also that HTTP 30x redirections will be automatically and
transparently handled by urllib2.)
urllib2: What am I
getting ?
When you send a HTTP request, this may return html, images, videos,
whatever.
In some cases you should check that the type of data you're
receiving is what
you expected.
To check the type of document you're receiving, look at the MIME
type (Content-type)
header:
urlfile =
urllib2.urlopen('http://www.commentcamarche.net/')
print "Document type is", urlfile.info().getheader("Content-Type","")
This will output:
Document type is text/html
Warning: You
may find other
info after
a semi-colon,
such as:
Document type
is text/html;
charset=iso-8859-1
So what you should always do is:
print "Document type is",
urlfile.info().getheader("Content-Type","").split(';')[0].strip()
to get only the "text/html" part.
Note that .info() will also give you other
HTTP response headers:
print "HTTP Response headers:"
print urlfile.info()
This would print things like:
Document type is Date: Thu, 23 Mar
2006
15:13:29 GMT
Content-Type: text/html; charset=iso-8859-1
Server: Apache
X-Powered-By: PHP/5.1.2-1.dotdeb.2
Connection: close
Reading
(and
writing) large XLS (Excel) files
In one of my projects, I had to read large XLS files.
Of course you can access all cells content through COM calls, but
it's painfully slow.
There's a simple trick: Simply ask Excel to open the XLS file and
save it in CSV, then use Python's CSV module to read the file !
This is the fastest way to read large XLS data files.
import os
import win32com.client
filename = 'myfile.xls'
filepath = os.path.abspath(filename) # Always make sure you use an
absolute path !
# Start Excel and open
the XLS
file:
excel = win32com.client.Dispatch('Excel.Application')
excel.Visible = True
workbook = excel.Workbooks.Open(filepath)
# Save as CSV:
xlCSVWindows
=0x17 #
from enum
XlFileFormat
workbook.SaveAs(Filename=filepath+".csv",FileFormat=xlCSVWindows)
# Close workbook and
Excel
workbook.Close(SaveChanges=False)
excel.Quit()
Hint: You can use this trick the other way round (generate
a
CSV in Python, open with Excel) to import a large quantity of data
into Excel. This is much
faster than
filling data cell by cell through COM calls.
Hint: When
using
excel.Workbooks.Open(), always make sure you
use an
asbolute path with os.path.abspath().
Hint: You
can also ask
excel to save as HTML, then parse the HTML with htmllib, sgmllib or
BeautifulSoup. You will be able to get more information, including
formatting, colors, cells span, document author or even formulas
!
Hint: For
Excel VBA
documentation, search *.chm in C:\Program Files\Microsoft
Office\
Example: For Excel 2000, it's C:\Program Files\Microsoft
Office\Office\1036\VBAXL9.CHM
Hint: If you
want to find
the corresponding VBA code for an action without hunting through
the VBA Help file, just record a macro of the action and open it
!
This will automatically generate the VBA code (which can be
easily translated into Python).
I created an example video of this trick (in French, sorry):
http://sebsauvage.net/temp/wink/excel_vbarecord.html
Hint:
Sometimes, you'll
need Excel constants. To get the list of constants:
- Run makepy.py
(eg.
C:\Python24\Lib\site-packages\win32com\client\makepy.py)
- In the list, choose "Microsoft
Excel 9.0 Object Library
(1.3)" (or similar) and click ok.
- Have a look in
C:\Python24\Lib\site-packages\win32com\gen_py\
directory.
You will find the wrapper (such as
00020813-0000-0000-C000-000000000046x0x1x3.py)
- Open this file: it contains Excel constants and their
values
(You can copy/paste them in your code.)
For example:
xlCSVMSDOS
=0x18 #
from enum
XlFileFormat
xlCSVWindows
=0x17 #
from enum
XlFileFormat
Hint: If you want to
import data
into Excel,
you can also
generate an HTML document in Python and ask Excel to open it.
You'll be able to set cell font colors, spanning, etc.
Saving
the stack
trace
Sometimes when you create an application, it's handy to have the
stack trace dumped in a log file for debugging purposes.
Here's how to do it:
import
traceback
def fifths(a):
return 5/a
def myfunction(value):
b = fifths(value) * 100
try:
print myfunction(0)
except Exception, ex:
logfile = open('mylog.log','a')
traceback.print_exc(file=logfile)
logfile.close()
print "Oops ! Something went wrong.
Please look
in the log file."
After running this program, mylog.log
contains:
Traceback (most recent call last):
File "a.py", line 10, in ?
print myfunction(0)
File "a.py", line 7, in myfunction
b = fifths(value) * 100
File "a.py", line 4, in fifths
return 5/a
ZeroDivisionError: integer division or modulo by zero
Hint: You can also simply use
traceback.print_exc(file=sys.stdout) to print
the
stacktrace on screen.
Hint: Mixing
this trick
with this one can
save your day.
Detailed error messages = bugs more easily spotted.
Filtering
out warnings
Sometimes, Python displays warning.
While they are usefull
and
should be taken care of,
you sometimes want to disable them.
Here's how to filter them:
import warnings
warnings.filterwarnings(action = 'ignore',message='.*?no locals\(\)
in functions bound by Psyco')
(I use to filter this specific Psyco warning.)
message is a
regular
expression.
Make sure you do not filter too
much, so that important information is not thrown away.
Saving
an
image as progressive JPEG with PIL
PIL
(Python
Imaging Library) is very good graphics library for image
manipulation (This is the library I used in webGobbler).
Here's how to save an Image object in
progressive
JPEG.
This may seem obvious, but hey...
myimage.save('myimage.jpg',option={'progression':True,'quality':60,'optimize':True})
(Assuming that myimage is an Image
PIL
object.)
Charsets
and encoding
( There is a french translation of this article: http://sebsauvage.net/python/charsets_et_encoding.html
)
If you think text = ASCII = 8 bits = 1 byte per character, you're
wrong.
That's short-sighted.
There is something every developer should know about, otherwise
this will bite you one day if you don't know better:
Charsets
and
encoding
Ok. Let me put this:
You know the computer is a big stupid machine. It knows
nothing
about alphabets or
even decimal numbers. A computer is a bit cruncher.
So when we have symbols such as the letter 'a' or the question mark
'?', we have to create binary
representation of these symbols for the computer.
That's the only way to store them in the computer's memory.
The
character set
First, we
have to choose
which number to use for each symbol. That's a simple table.
Symbol →
number
The usual suspect is ASCII.
In ASCII, the letter 'a' is the number 97. The question mark '?' is
the number 67.
But ASCII is far from a universal standard.
There are plenty of other character sets, such as EBCDIC, KOI8-R
for Russian characters, ISO-8852-1 for latin characters (accent
characters, for example), Big5 for traditional chinese, Shift_JIS
for Japanese, etc. Every country, culture, language has
developed its own character set. This is a big mess, really.
An international effort tries to standardise all this: UNICODE.
Unicode is a huge table which tells which number to use for
each symbol.
Some examples:
 |
 |
 |
 |
Unicode
table
0000 to 007F (0 to 127)
(Latin characters) |
Unicode
table
0080 to 00FF (128 to 255)
(Latin characters,
including accented characters) |
Unicode
table
0900 to 097F (2304 to 2431)
(devanagari) |
Unicode
table
1100 to 117F (4352 to 4479)
(hangul jamo) |
So the word "bébé"
(baby in
French) will translate to
these numbers: 98 233 98
233 (or 0062 00E9 0062 00E9 in 16 bits
hexadecimal).
The
encoding
Now we have all those numbers, we have to find a binary
representation for them.
Number →
Bits
ASCII uses the simple mapping: 1 ASCII code (0...127) = 1 byte
(8 or 7 bits). It's ok for ASCII, because ASCII uses only numbers
from 0 to 127. It fits in a byte.
But for Unicode and other charsets, that's a problem: 8 bits are not enough.
These charsets
require other encodings.
Most of them use a multi-byte encoding (a character is represented
by several bytes).
For Unicode,
there are
several encodings. The first one is the raw 16 bits Unicode. 16
bits (2 bytes) per character.
But as most texts only use the lower part of the Unicode table
(codes 0 to 127), that's huge waste of space.
That's why UTF-8
was
invented.
That's brilliant: For codes 0 to 127, simply use 1 byte per
character. Just like ASCII.
If you need special, less common characters (128 to 2047), use two
bytes.
If you need more specific characters (2048 to 65535), use three
bytes.
etc.
Unicode value
(in hexadecimal) |
Bits to output |
| 00000000 to 0000007F |
0xxxxxxx |
| 00000080 to 000007FF |
110xxxxx 10xxxxxx |
| 00000800 to 0000FFFF |
1110xxxx 10xxxxxx
10xxxxxx |
| 00010000 to 001FFFFF |
11110xxx 10xxxxxx
10xxxxxx
10xxxxxx |
| 00200000 to 03FFFFFF |
111110xx 10xxxxxx
10xxxxxx 10xxxxxx
10xxxxxx |
| 04000000 to 7FFFFFFF |
1111110x 10xxxxxx
10xxxxxx 10xxxxxx
10xxxxxx 10xxxxxx |
Thus for most latin texts, this will be as space-savvy as ASCII,
but you have the ability to use any
special Unicode character if you
want.
How's that ?
Let's
sum up all
this
| Symbol |
→ |
Number |
→ |
Bits |
|
charset |
|
encoding |
|
The charset
will tell you
which number to use for each symbol,
the encoding
will tell you
how to encode these numbers into bits.
One simple example is:
| é |
→ |
233 |
→ |
C3 A9 |
|
|
in
Unicode |
|
in UTF-8 |
For example the word "bébé" (baby
in French):
| bébé |
→ |
98 233 98 233 |
→ |
62 C3 A9 62 C3 A9 |
|
|
in
Unicode |
|
in UTF-8 |
If I receive the bits 62 C3 A9 62 C3 A9
without the
knowledge of the encoding
and the charset,
this will
be useless to me.
Clueless programers will display these bits as is:

then will ask "Why am I
getting
those strange characters ?".
You're not clueless, because you've just read this article.
Transmitting a text alone
is useless.
If
you
transmit a text, you must always also tell which
charset/encoding was
used.
That's also why many webpages are broken: They
do
not tell their charset/encoding.
Do you know that in this case all browsers try to guess the charset ?
That's bad.
Every webpage should have its encoding specified in HTTP headers or
in the HTML header itself, such as:
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
This is the same for
emails: Any
good email client will indicate which charset/encoding the text is
encoded in.
Hint: Some encodings are specific to some charsets. For example,
UTF-8 is only used for Unicode. So if I receive UTF-8 encoded data,
I know its charset is Unicode.
Python
and
Unicode
Python supports directly Unicode and UTF-8.
Use them as much as
possible.
Your programs will smoothly support international characters.
First, you should always indicate which charset/encoding your
Python source uses, such as:
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
Next, use Unicode strings in your programs (use the 'u' prefix):
badString = "Bad string !"
bestString = u"Good unicode string."
anotherGoodString = u"Ma vie, mon \u0153uvre."
( \u0153 is the unicode character "œ". (0153 is the code for
"œ"). The "œ" character is in the latin-1 section of
the charts:
http://www.unicode.org/charts/
)
To convert a standard string to Unicode, do:
myUnicodeString =
unicode(mystring)
or
myUnicodeString = mystring.decode('iso-8859-1')
To convert a Unicode string to a specific charset:
myString =
myUnicodeString.encode('iso-8859-1')
The list of charsets/encodings supported by Python are available at
http://docs.python.org/lib/standard-encodings.html
Don't
forget than
when you print, you use the charset of the
console
(stdout). So sometimes printing a Unicode string can fail, because
the string may
contain Unicode characters which are not available in the charset
of your operating system console.
Let me put it again: A
simple
print instruction can fail.
Example, with the french word "œuvre":
>>> a =
u'\u0153uvre'
>>> print a
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "c:\python24\lib\encodings\cp437.py", line 18, in
encode
return
codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character
u'\u0153' in position 0: character maps to <undefined>
Python is telling you that the Unicode character 153 (œ)
has no equivalent in the charset your operating system console
uses.
To see which charset your console supports, you can do:
>>> import sys
>>> print sys.stdout.encoding
cp437
So to make sure you print without error, you can do:
>>> import sys
>>> a = u'\u0153uvre'
>>> print a.encode(sys.stdout.encoding,'replace')
?uvre
>>>
Unicode characters which cannot be displayed by the console will be
converted to '?'.
Special note:
When dealing
with external sources (files, databases, stdint/stdout/stderr, API
such as Windows COM or registry, etc.) be carefull: Some of these
will not communicate in Unicode, but in some special charset. You
should properly convert to and from Unicode accordingly.
For
example, to
write Unicode strings to an UTF-8 encoded file, you can do:
>>> a =
u'\u0153uvre'
>>> file = open('myfile.txt','w')
>>> file.write( a.encode('utf-8') )
>>> file.close()
Reading the same file:
>>> file =
open('myfile.txt','r')
>>> print file.read()
œuvre
>>>
Oops...
you see
there's a problem here. We opened the file but we didn't specify
the encoding when reading. That's why we get this "œ"
garbage (which is UTF-8 codes).
Let's decode the UTF-8:
>>>
file=open('myfile.txt','r')
>>> print repr( file.read().decode('utf-8') )
u'\u0153uvre'
>>>
There, we got it right. That's our "œuvre" word.
Remember
our console does not support
the \u0153 character ? (That's why we used
repr().)
So let's encode the string in a charset supported by our
console:
>>> import sys
>>> file=open('myfile.txt','r')
>>> print
file.read().decode('utf-8').encode(sys.stdout.encoding,'replace')
?uvre
>>>
Yes, this looks cumbersome.
But don't forget we are translating between 3 modes: UTF-8 (the
input file),
Unicode (the Python object) and cp437 (the output console
charset).
| UTF-8 |
→ |
Unicode |
→ |
cp437 |
|
The input file. |
.decode('utf-8')
|
The Python unicode string. |
.encode('cp437') |
The console. |
That's why we have to explicitely
convert between
encodings.
Explicit is better than implicit.
Links:
Iterating
A shorter syntax
When you come from other languages, you are tempted to use these
other languages' constructs.
For example, when iterating over the elements of a table, you would
probably iterate using an index:
countries =
['France','Germany','Belgium','Spain']
for i in range(0,len(countries)):
print countries[i]
or
countries =
['France','Germany','Belgium','Spain']
i = 0
while i<len(countries):
print countries[i]
i = i+1
It's better to use iterators:
countries =
['France','Germany','Belgium','Spain']
for country in countries:
print country
It does the same thing, but:
- You've spared a variable (i).
- The code is more compact.
- It's more readable.
"for country in countries" is almost plain
English.
The same is true for other things, like reading lines from a text
file. So instead for doing:
file = open('file.txt','r')
for line in
file.readlines():
print line
file.close()
Simply do:
file = open('file.txt','r')
for line in file:
print line
file.close()
These kind of constructs can help to keep code shorter and more readable.
Iterating with
multiple items
It's also easy to iterate over multiple items at once.
data = [ ('France',523,'Jean
Dupont'),
('Germany',114,'Wolf Spietzer'),
('Belgium',227,'Serge Ressant')
]
for
(country,nbclients,manager) in
data:
print
manager,'manages',nbclients,'clients
in',country
This also applies to dictionnaries (hashtables). For example, you
could iterate over a dictionnary like this:
data = { 'France':523,
'Germany':114,
'Belgium':227 }
for country in data: # This is the same as for
country
in data.keys()
print 'We have',data[country],'clients
in',country
But it's better to do it this way:
data = { 'France':523,
'Germany':114,
'Belgium':227 }
for (country,nbclients)
in
data.items():
print 'We have',nbclients,'clients
in',country
because you spare a hash for each entry.
Creating
iterators
It's easy to create your own iterators.
For example, let's say we have a clients file:
COUNTRY
NBCLIENTS
France 523
Germany 114
Spain
127
Belgium 227
and we want a class capable of reading this file format. It
must return the country and the number of clients.
We create a clientFileReader
class:
class clientFileReader:
def __init__(self,filename):
self.file=open(filename,'r')
self.file.readline()
# We discard the first line.
def close(self):
self.file.close()
def
__iter__(self):
return self
def
next(self):
line =
self.file.readline()
if not line:
raise StopIteration()
return ( line[:13],
int(line[13:]) )
To create an iterator:
- Create a
__iter__()
method which returns the iterator (which happen to be ourselves
!)
- The iterator must have a
next() method which
returns the next
item.
- The
next() method must raise the StopIteration()
exception when no more
data is available.
It's as simple as this !
Then we can simply use our file reader as:
clientFile =
clientFileReader('file.txt')
for (country,nbclients)
in
clientFile:
print 'We have',nbclients,'clients
in',country
clientFile.close()
See ?
"for (country,nbclients)
in
clientFile:" is a higher level construct which makes the
code much more readable and hides the complexity of the underlying
file format.
This is much better than chopping file lines in the main loop.
Parsing
the
command-line
It's not recommended to try to parse the command-line
(sys.argv) yourself. Parsing the command-line is not
as trivial as
it seems to be.
Python has two good modules dedicated to command-line parsing:
getopt ant optparse.
They do their job very well (They take care of mundane tasks such
as parameters quoting, for example).
optparse is
the new, more
Pythonic and OO module. Yet I often prefer getopt.
We'll see both.
Ok, let's create a program which is supposed to reverses all lines in a text file.
Our program has:
- a mandatory argument:
file,
the file to process.
- an optional parameters with value:
-o to specify an
output file (such as
-o myoutputfile.txt)
- an optional parameter without
value:
-c
to capitalize all letters.
- an optional parameters:
-h
to display program help.
getopt
Let's do it with getopt
first:
import sys
import getopt
if __name__ == "__main__":
opts, args = None, None
try:
opts, args =
getopt.getopt(sys.argv[1:], "hco:",["help",
"capitalize","output="])
except getopt.GetoptError, e:
raise 'Unknown argument
"%s" in command-line.' % e.opt
for option, value in opts:
if option in
('-h','--help'):
print 'You asked for the program help.'
sys.exit(0)
if option in
('-c','--capitalize'):
print "You used the --capitalize option !"
elif option in
('-o','--output'):
print "You used the --output option with value",value