Python Notes

The os.walk function

The walk function of the os module takes one required argument and several optional ones. The required argument is the address (path) of the directory to traverse.

The walk() function returns a generator object that yields tuples. Each tuple describes one directory in the tree rooted at the directory passed to the function, including that directory itself.

Each tuple has three items:

  1. Address of the current directory as a string.
  2. List of the names of the subdirectories located directly in this directory. If there are no subdirectories, the list is empty.
  3. List of the names of the files located directly in this directory. If there are no files, the list is empty.

Let's say we have a directory tree like this:

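Directory tree example:

test
├── cgi-bin
│   ├── another
│   │   └── data.txt
│   ├── backup
│   └── hello.py
├── index.html
└── dgs.png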

Pass the test directory to the os.walk() function:

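import os

tree = os.walk('test')
print(tree)  # prints the generator object itself, not its contents

for i in tree:
    print(i)  # one (address, subdirectories, files) tuple per directory

Output:
<generator object walk at 0x7fa36d013740>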
('test', ['cgi-bin'], ['index.html', 'dgs.png'])
('test/cgi-bin', ['another', 'backup'], ['hello.py'])
('test/cgi-bin/another', [], ['data.txt'])
('test/cgi-bin/backup', [], [])
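
To try this out without an existing test directory, the tree shown in the output above can be recreated in a temporary location. The following is only a minimal sketch: the helper name build_example_tree is hypothetical, and the names inside each list are sorted purely to make the result predictable, since the order in which names are listed depends on the operating system and file system.

```python
import os
import tempfile


def build_example_tree(root):
    """Create the example layout under root (hypothetical helper)."""
    for sub in ("another", "backup"):
        os.makedirs(os.path.join(root, "cgi-bin", sub))
    for parts in (
        ("index.html",),
        ("dgs.png",),
        ("cgi-bin", "hello.py"),
        ("cgi-bin", "another", "data.txt"),
    ):
        # Create an empty file at each location.
        open(os.path.join(root, *parts), "w").close()


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "test")
    build_example_tree(root)
    # Each item yielded by os.walk() is an (address, subdirectories, files) tuple.
    entries = [
        (os.path.relpath(address, tmp), sorted(dirs), sorted(files))
        for address, dirs, files in os.walk(root)
    ]

for entry in entries:
    print(entry)
```

The temporary directory is removed when the with block ends, so nothing is left behind on disk.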

If you pass an absolute address, the directory addresses in the tuples will also be absolute:

import os

for i in os.walk('/home/pl/test'):
    print(i)

Output:
('/home/pl/test', ['cgi-bin'], ['index.html', 'dgs.png'])
('/home/pl/test/cgi-bin', ['another', 'backup'], ['hello.py'])
('/home/pl/test/cgi-bin/another', [], ['data.txt'])
('/home/pl/test/cgi-bin/backup', [], [])
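
If only a relative address is at hand, one possible way to get absolute addresses is to convert it with os.path.abspath before walking. The sketch below assumes a throwaway temporary directory instead of the test directory; the variable names are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "cgi-bin"))

    # The same directory expressed relative to the current working directory.
    relative_top = os.path.relpath(tmp)
    # abspath() resolves the relative address against the working directory.
    absolute_top = os.path.abspath(relative_top)

    relative_addresses = [address for address, dirs, files in os.walk(relative_top)]
    absolute_addresses = [address for address, dirs, files in os.walk(absolute_top)]

print(all(os.path.isabs(a) for a in absolute_addresses))
print(any(os.path.isabs(a) for a in relative_addresses))
```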

Because walk() returns a generator, it can be iterated over only once: after the loop finishes, the generator is exhausted and yields nothing more. Therefore, if the tuples need to be kept, the generator can be converted into a list of tuples:

import os

tree = list(os.walk('test'))

for i in tree:
    print(i)

Output:
('test', ['cgi-bin'], ['index.html', 'dgs.png'])
('test/cgi-bin', ['another', 'backup'], ['hello.py'])
('test/cgi-bin/another', [], ['data.txt'])
('test/cgi-bin/backup', [], [])
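
The difference between the generator and the list can be checked directly. This is a small sketch on a temporary directory (an assumption made so the example runs anywhere): the second pass over the generator produces nothing, while the list can be read as many times as needed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "cgi-bin"))

    tree = os.walk(tmp)
    first_pass = list(tree)   # consumes every tuple from the generator
    second_pass = list(tree)  # the generator is already exhausted

    stored = list(os.walk(tmp))           # a list keeps the tuples
    read_twice = [len(stored), len(stored)]

print(len(first_pass), len(second_pass))
print(read_twice)
```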

To get the full address of a file (absolute or relative, depending on what was passed to walk()), join the directory address and the file name with the os.path.join function:

import os

for address, dirs, files in os.walk('test'):
    for name in files:
        # Join the directory address and the file name into one address.
        print(os.path.join(address, name))

Output:
test/index.html
test/dgs.png
test/cgi-bin/hello.py
test/cgi-bin/another/data.txt

On each iteration, the variable address is bound to the first item of the current tuple (the string containing the address of the directory), dirs to the second item (the list of subdirectories), and files to the third item (the list of files in this directory). The nested loop retrieves the name of each file from the list of files. The backup directory produces no output because its list of files is empty.
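
The same unpacking pattern can be used to select files rather than print all of them. The sketch below collects the addresses of files with a given extension; the function name find_by_extension is hypothetical, and the small temporary tree is an assumption made to keep the example self-contained.

```python
import os
import tempfile


def find_by_extension(root, ext):
    """Return the sorted addresses of files under root whose names end with ext."""
    matches = []
    for address, dirs, files in os.walk(root):
        for name in files:
            if name.endswith(ext):
                matches.append(os.path.join(address, name))
    return sorted(matches)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "cgi-bin"))
    for rel in ("index.html", os.path.join("cgi-bin", "hello.py")):
        open(os.path.join(tmp, rel), "w").close()

    found = find_by_extension(tmp, ".py")
    # Strip the temporary prefix so the result is easy to read.
    relative = [os.path.relpath(p, tmp) for p in found]

print(relative)
```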

The walk function has a topdown argument that defaults to True. If you pass False, the directory tree is traversed not "top-down" (from the root to nested directories) but "bottom-up": the tuple for a directory is yielded only after the tuples for all of its subdirectories.

import os

tree = os.walk('test', topdown=False)

for i in tree:
    print(i)

Output:
('test/cgi-bin/another', [], ['data.txt'])
('test/cgi-bin/backup', [], [])
('test/cgi-bin', ['another', 'backup'], ['hello.py'])
('test', ['cgi-bin'], ['index.html', 'dgs.png'])
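
Bottom-up order is convenient when the result for a directory depends on the results for its subdirectories. As an illustration only, the sketch below counts the files in each directory together with everything beneath it; the function name count_files_recursive is hypothetical, and the temporary tree is an assumption made so the example can run anywhere.

```python
import os
import tempfile


def count_files_recursive(root):
    """Map each directory under root to the number of files in it and below it."""
    totals = {}
    for address, dirs, files in os.walk(root, topdown=False):
        # Bottom-up order: subdirectories were visited before this directory.
        # get() covers subdirectories that were not walked (for example,
        # symbolic links to directories, which are not followed by default).
        below = sum(totals.get(os.path.join(address, d), 0) for d in dirs)
        totals[address] = len(files) + below
    return totals


with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "cgi-bin")
    os.mkdir(sub)
    for path in (
        os.path.join(tmp, "index.html"),
        os.path.join(sub, "hello.py"),
        os.path.join(sub, "data.txt"),
    ):
        open(path, "w").close()

    totals = count_files_recursive(tmp)
    root_total = totals[tmp]
    sub_total = totals[sub]

print(root_total, sub_total)
```

With the default top-down order, the totals for subdirectories would not yet exist when their parent is processed, so the same loop would undercount.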