The os.walk function
The walk() function of the os module takes one required argument and three optional ones (topdown, onerror, and followlinks). The required argument is the path to a directory.
The walk() function returns a generator object that yields tuples. Each tuple "describes" one directory in the tree rooted at the directory passed to the function, starting with that directory itself.
Each tuple has three items:
- The path of the current directory, as a string.
- A list of the names of the subdirectories located directly in this directory. If there are no subdirectories, the list is empty.
- A list of the names of the files located directly in this directory. If there are no files, the list is empty.
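The three items can be inspected by unpacking the first tuple that the generator yields. This is a minimal sketch, assuming a throwaway directory created with tempfile; the names sub and a.txt are made up for illustration and are not part of the example tree used in this article:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree: <root>/sub/ and <root>/a.txt
    os.mkdir(os.path.join(root, 'sub'))
    with open(os.path.join(root, 'a.txt'), 'w') as f:
        f.write('hello')

    # next() pulls the first tuple, which describes the root directory itself
    path, dirs, files = next(os.walk(root))

print(path == root)  # True
print(dirs)          # ['sub']
print(files)         # ['a.txt']
```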
Let's say we have a directory tree like this:
test/
    cgi-bin/
        another/
            data.txt
        backup/
        hello.py
    index.html
    dgs.png
Pass the test directory to the os.walk() function:
import os
tree = os.walk('test')
print(tree)
for i in tree:
    print(i)
<generator object walk at 0x7fa36d013740>
('test', ['cgi-bin'], ['index.html', 'dgs.png'])
('test/cgi-bin', ['another', 'backup'], ['hello.py'])
('test/cgi-bin/another', [], ['data.txt'])
('test/cgi-bin/backup', [], [])
If you pass an absolute path, the directory paths in the tuples will also be absolute:
import os
for i in os.walk('/home/pl/test'):
    print(i)
('/home/pl/test', ['cgi-bin'], ['index.html', 'dgs.png'])
('/home/pl/test/cgi-bin', ['another', 'backup'], ['hello.py'])
('/home/pl/test/cgi-bin/another', [], ['data.txt'])
('/home/pl/test/cgi-bin/backup', [], [])
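If only a relative path is at hand, one possible approach is to convert it with os.path.abspath() before passing it to walk(). The sketch below assumes a temporary directory with a made-up subdirectory sub, and it changes the working directory only to make the relative path '.' meaningful:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'sub'))

    old_cwd = os.getcwd()
    os.chdir(root)  # so that '.' refers to the temporary directory
    try:
        # A relative start path gives relative directory paths
        relative = [path for path, dirs, files in os.walk('.')]
        # An absolute start path gives absolute directory paths
        absolute = [path for path, dirs, files in os.walk(os.path.abspath('.'))]
    finally:
        os.chdir(old_cwd)  # restore the working directory

print(relative)
print(all(os.path.isabs(p) for p in absolute))  # True
```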
Because walk() returns a generator, it can be iterated over only once; after that it is exhausted. If you need to keep the tuples or go through them several times, convert the generator to a list of tuples:
import os
tree = list(os.walk('test'))
for i in tree:
    print(i)
('test', ['cgi-bin'], ['index.html', 'dgs.png'])
('test/cgi-bin', ['another', 'backup'], ['hello.py'])
('test/cgi-bin/another', [], ['data.txt'])
('test/cgi-bin/backup', [], [])
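The one-pass behaviour of the generator is easy to check. In this small sketch (the temporary directory and the subdirectory name sub are assumptions made for the demonstration), the second attempt to read from the same generator gives nothing:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'sub'))

    tree = os.walk(root)
    first_pass = list(tree)   # consumes the generator completely
    second_pass = list(tree)  # the generator is already exhausted

print(len(first_pass))   # 2: one tuple for root, one for root/sub
print(len(second_pass))  # 0
```

The list first_pass, unlike the generator, can be looped over as many times as needed.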
To get the full path of a file (absolute or relative), join the directory path and the file name with the os.path.join() function:
import os
for address, dirs, files in os.walk('test'):
    for name in files:
        print(os.path.join(address, name))
test/index.html
test/dgs.png
test/cgi-bin/hello.py
test/cgi-bin/another/data.txt
On each iteration, the variable address is bound to the first item of the current tuple (the string with the directory path), dirs to the second item (the list of subdirectories), and files to the third item (the list of files in this directory). The nested loop takes each file name from the list of files in turn.
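The same pair of loops can be wrapped in a function that collects only certain files. This is a sketch rather than a definitive implementation: the helper name find_files and its suffix parameter are hypothetical, and the tree is recreated in a temporary directory with empty files:

```python
import os
import tempfile

def find_files(top, suffix):
    """Return a sorted list of paths under top whose names end with suffix."""
    found = []
    for address, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(suffix):
                found.append(os.path.join(address, name))
    return sorted(found)

with tempfile.TemporaryDirectory() as root:
    # Recreate part of the example tree with empty files
    os.makedirs(os.path.join(root, 'cgi-bin', 'another'))
    for rel in ('index.html',
                os.path.join('cgi-bin', 'hello.py'),
                os.path.join('cgi-bin', 'another', 'data.txt')):
        with open(os.path.join(root, rel), 'w') as f:
            f.write('')

    # Show the paths relative to the temporary root for readability
    result = [os.path.relpath(p, root) for p in find_files(root, '.py')]

print(result)  # one entry: the path cgi-bin/hello.py
```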
The walk() function has a topdown argument that defaults to True. If you set it to False, the directory tree is traversed not "top-down" (from the root to the nested directories) but "bottom-up": the tuple for a directory is produced only after the tuples for all of its subdirectories.
import os
tree = os.walk('test', topdown=False)
for i in tree:
    print(i)
('test/cgi-bin/another', [], ['data.txt'])
('test/cgi-bin/backup', [], [])
('test/cgi-bin', ['another', 'backup'], ['hello.py'])
('test', ['cgi-bin'], ['index.html', 'dgs.png'])
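Bottom-up order is convenient when a directory has to be processed after its contents. As one possible illustration (not taken from the example above), the sketch below removes empty directories. The helper name remove_empty_dirs and the directory names a, b, c are made up. The check uses os.listdir() instead of the dirs and files lists, because those lists are built before the subdirectories are visited and may be out of date by the time the tuple is yielded:

```python
import os
import tempfile

def remove_empty_dirs(top):
    """Delete every empty directory below top (top itself is kept)."""
    for address, dirs, files in os.walk(top, topdown=False):
        # Children were handled earlier, so a directory that contained
        # only empty subdirectories is itself empty by now.
        if address != top and not os.listdir(address):
            os.rmdir(address)

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b'))  # a/b is empty
    os.mkdir(os.path.join(root, 'c'))
    with open(os.path.join(root, 'c', 'file.txt'), 'w') as f:
        f.write('data')

    remove_empty_dirs(root)
    remaining = sorted(os.listdir(root))

print(remaining)  # ['c']
```

With the default top-down order, a would be visited while it still contains b, so it would survive a single pass.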