Brief Notes
Noteworthy links, summaries and concise prose on stuff.

Python development

All about developing in Python.

Use Python to touch a file and get its mtime

Timing a Python function/module/process/program/script with the time.time() function

operator.attrgetter

PyYAML safe_dump

Python 3: List all the keys in a dictionary
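
As a quick reminder of what the notes listed above cover, here is a minimal sketch using only the standard library. The file name, the Note tuple and the settings dictionary are made up for illustration, and PyYAML is left out because it is a third-party package.

```python
import tempfile
import time
from collections import namedtuple
from operator import attrgetter
from pathlib import Path

# Touch a file and read its modification time (seconds since the epoch).
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.txt"  # hypothetical file name
    path.touch()
    mtime = path.stat().st_mtime

# Time a piece of work with time.time().
start = time.time()
total = sum(range(100000))
elapsed = time.time() - start

# Sort objects by an attribute with operator.attrgetter.
Note = namedtuple("Note", ["title", "words"])  # hypothetical record type
notes = [Note("b", 300), Note("a", 120)]
by_title = sorted(notes, key=attrgetter("title"))

# Python 3: list all the keys in a dictionary.
settings = {"host": "localhost", "port": 8080}  # hypothetical data
keys = list(settings)

print(mtime, elapsed, by_title, keys)
```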

Noteworthy links

python script analyzing millions of lines - how to improve its performance? - Code Review Stack Exchange - covers timeit, cProfile, defaultdict, an alternative to strptime and strftime, and a string formatting trick, which together reduce the time taken for a data analysis operation from 3 hours to 20 minutes.
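
To give a feel for the kind of measurement involved, here is a rough sketch that uses timeit to compare datetime.strptime with manual slicing of a fixed-width timestamp. The slicing approach and the sample timestamp are assumptions made for illustration; they are not necessarily the alternative proposed in the linked answer.

```python
import timeit
from datetime import datetime

STAMP = "2015-03-14 09:26:53"  # hypothetical log timestamp


def parse_with_strptime(text):
    """Parse the timestamp with the general-purpose format parser."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def parse_with_slicing(text):
    """Parse by slicing; only valid for fixed-width 'YYYY-MM-DD HH:MM:SS' input."""
    return datetime(
        int(text[0:4]), int(text[5:7]), int(text[8:10]),
        int(text[11:13]), int(text[14:16]), int(text[17:19]),
    )


# Time 2000 calls of each parser; absolute numbers depend on the machine.
strptime_seconds = timeit.timeit(lambda: parse_with_strptime(STAMP), number=2000)
slicing_seconds = timeit.timeit(lambda: parse_with_slicing(STAMP), number=2000)

print("strptime: %.4fs, slicing: %.4fs" % (strptime_seconds, slicing_seconds))
```

For a whole script rather than a single function, running it under python -m cProfile shows where the time actually goes before anything is rewritten.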


More Pythonic way of counting things in a heavily nested defaultdict - Stack Overflow

Merging nested defaultdicts - Stack Overflow

Autovivification in Python: nested defaultdicts with a specific final type

[Nested dictionaries in python](http://ohuiginn.net/mt/2010/07/nested_dictionaries_in_python.html)

Autovivification - Wikipedia
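
The idea behind the defaultdict links above can be sketched in a few lines: a nested defaultdict creates intermediate levels on first access and ends in a specific final type, here int. The function name and the sample keys are hypothetical and not taken from any of the linked pages.

```python
from collections import defaultdict


def nested_counter(depth):
    """Return a defaultdict nested `depth` levels deep whose leaves are ints."""
    if depth == 1:
        return defaultdict(int)
    # Each missing key creates a new defaultdict that is one level shallower.
    return defaultdict(lambda: nested_counter(depth - 1))


counts = nested_counter(3)

# No need to create counts["2015"] or counts["2015"]["03"] first.
counts["2015"]["03"]["GET"] += 1
counts["2015"]["03"]["GET"] += 1
counts["2015"]["04"]["POST"] += 1

print(counts["2015"]["03"]["GET"])
```

Note that merely reading a missing key also creates it, which is convenient for counting but can surprise code that later iterates over the structure.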


spotify/luigi · GitHub:

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Luigi’s visualizer gives a nice visual overview of the dependency graph of a workflow for a pipeline of batch jobs:

Copyright: spotify/luigi


Which Python framework - Pyramid or Django? [Flask vs. Django vs. Pyramid vs. Plone] is a great article that delves into this question; be sure to read the comments too.


Bunch - The simple but handy “collector of a bunch of named stuff” class:

Often we want to just collect a bunch of stuff together, naming each item of the bunch; a dictionary’s OK for that, but a small do-nothing class is even handier, and prettier to use.

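class Bunch:
    """Collect a bunch of named values as attributes."""
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

# that's it!  Now, you can create a Bunch
# whenever you want to group a few variables
# (x, y and threshold below are illustrative values,
# added here so that the snippet runs on its own):
x, y, threshold = 1.0, 3.0, 5.0

point = Bunch(datum=y, squared=y*y, coord=x)

# and of course you can read/write the named
# attributes you just created, add others, del
# some of them, etc, etc:
if point.squared > threshold:
    point.isok = 1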

Dictionaries are fine for collecting a small bunch of stuff, each item with a name; however, when names are constants and to be used just like variables, the dictionary-access syntax (“if bunch['squared'] > threshold”, etc) is not maximally clear; it takes VERY little effort to build a little class, as in the ‘Bunch’ example above, that will both ease the initialization task and provide elegant attribute-access syntax (“if bunch.squared > threshold”, etc).
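
The difference in access syntax is easy to see side by side. The sketch below repeats the Bunch class so that it runs on its own, uses made-up values for x, y and threshold, and also shows types.SimpleNamespace, a standard-library class (Python 3.3+) that offers the same attribute-style grouping; it is mentioned here as an alternative, not as part of the original recipe.

```python
from types import SimpleNamespace


class Bunch:
    def __init__(self, **kwds):
        self.__dict__.update(kwds)


x, y, threshold = 1.0, 3.0, 5.0  # illustrative values only

as_dict = {"datum": y, "squared": y * y, "coord": x}
as_bunch = Bunch(**as_dict)
as_namespace = SimpleNamespace(**as_dict)

# Dictionary access: the name is a string key.
dict_ok = as_dict["squared"] > threshold

# Attribute access: the name reads like an ordinary variable.
bunch_ok = as_bunch.squared > threshold
namespace_ok = as_namespace.squared > threshold

print(dict_ok, bunch_ok, namespace_ok)
print(as_namespace)  # SimpleNamespace also provides a readable repr
```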