Wednesday, March 19, 2014

Flatten a list of lists with a list comprehension

By Vasudev Ram

Recently, I saw this comment thread on Hacker News:

Python Language Features and Tricks (sahandsaba.com), which was about this post:

30 Python Language Features and Tricks You May Not Know About

One of the comments on the HN thread was this:

Except from it:

"I don't understand how this one to flatten lists works:
a = [[1, 2], [3, 4], [5, 6]]
[x for l in a for x in l]
Can somebody explain what the order of operations is here and what the variables refer to in the various stages of evaluation?"

Another HN user answered the question, and I also commented on it.

But the interesting part was where another user posted below, in the same thread, this comment:

"So that flattening can also be written as:
x = [[1,2], [3,4], [5,6]]
    [x for x in x for x in x]
"
The interesting part here (to me) was not just the use of a list comprehension to flatten the list, but the fact that the same variable is used throughout the list comprehension; in fact it is used 5 times in the expression.

I tried it out, and it works.

Then I thought of checking whether the same kind of code would work for a triply-nested list, and found that it did work:

# Flatten a list that is nested to three levels with a list comprehension.
x = [[[1,2], [3,4]], [[5,6], [7,8]]]
print "before flatten, x =", x
y = [x for x in x for x in x for x in x]
print "after flatten, y =", y

That raises the question of how the same variable name (x) can be used multiple times in the same scope to mean different things, as in the above two code fragments (for the doubly- and triply-nested lists). The explanation I can think of is that the different uses of x in the list comprehension are actually in different scopes, even though part of the same expression. Not sure if that is the right answer or not.

Update: I tried to check my explanation more by writing this - a normal for loop, not a list comprehension, which also uses the name x multiple times:
x = range(5)
for x in x:
    print x
And it worked. So it looks like my explanation (that there are different scopes) may be right, or at least close.

The HN user (jamesdutc) who posted the code with "[ x for x in x for x in x ]" has a blog here with more Python posts that look interesting:

seriously.dontusethiscode.com

Read other Python posts on my blog.

- Vasudev Ram - Dancing Bison Enterprises

Contact Page

4 comments:

Toby Sargeant said...

I think this should explain what's happening:

Importantly the LOAD_FAST of x occurs before the loop starts, and the STORE_FAST affects the same x. So in summary, there's no scope, but the value of x is computed before the loop starts assigning to x.

(sorry about the formatting, code and pre tags aren't allowed)

>>> def f():
... x = [1,2,3]
... for x in x:
... pass
...
>>> import dis
>>> dis.disassemble(f.__code__)
2 0 LOAD_CONST 1 (1)
3 LOAD_CONST 2 (2)
6 LOAD_CONST 3 (3)
9 BUILD_LIST 3
12 STORE_FAST 0 (x)

3 15 SETUP_LOOP 14 (to 32)
18 LOAD_FAST 0 (x)
21 GET_ITER
>> 22 FOR_ITER 6 (to 31)
25 STORE_FAST 0 (x)

4 28 JUMP_ABSOLUTE 22
>> 31 POP_BLOCK
>> 32 LOAD_CONST 0 (None)
35 RETURN_VALUE

Daniel Pope said...

Great quirk! There is no scoping here; I think the reason it works is because each assignment to x is read immediately and doesn't need to be re-read.

Roger said...

each for loop doesn't get it's own
scope. Just the first line of the
for loop crates it own iterable of the x (eg creates a hidden version of iter(x) (let call it y).

Then each time round the loop at start of the loop this assign is made:-
x = y.next()


The following demonstrates:
-although blogger has probably mangled the indentation

x = [ [1,2], [3,4],[5,6]]

for x in x:
print "L1i",x
for x in x:
print "l2:",x
print "L1o",x
print x

Notice the x after the loops is still set to 6.

Vasudev Ram said...

Thanks, guys, for the comments.