The os.walk function
The walk function of the os module takes one required argument, the address of the directory to traverse, and several optional ones.
The walk() function returns a generator that yields tuples. Each tuple describes one directory in the tree rooted at the directory passed to the function, including that directory itself.
Each tuple has three items:
- Address of the current directory as a string.
- List of the names of the subdirectories located directly in this directory. If there are no subdirectories, the list is empty.
- List of the names of the files located directly in this directory. If there are no files, the list is empty.
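The three items can be inspected on a small throwaway tree. This is a minimal sketch: the names sub and a.txt are hypothetical, and the tree is created in a temporary directory so that the example does not depend on any existing files.

```python
import os
import tempfile

# Build a temporary tree: <tmp>/sub/ and <tmp>/a.txt (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("demo")

    # Take only the first tuple, which describes the top directory itself.
    path, dirnames, filenames = next(os.walk(tmp))
    is_root = (path == tmp)

print(is_root)    # True: the first item is the address of the directory
print(dirnames)   # ['sub']
print(filenames)  # ['a.txt']
```

The second and third items contain bare names, not full addresses, which is why they are usually combined with the first item before use.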
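Let's say we have a directory tree like this:
    test
    ├── cgi-bin
    │   ├── another
    │   │   └── data.txt
    │   ├── backup
    │   └── hello.py
    ├── dgs.png
    └── index.html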
Pass the test directory to the os.walk() function:
    import os

    tree = os.walk('test')
    print(tree)
    for i in tree:
        print(i)

    <generator object walk at 0x7fa36d013740>
    ('test', ['cgi-bin'], ['index.html', 'dgs.png'])
    ('test/cgi-bin', ['another', 'backup'], ['hello.py'])
    ('test/cgi-bin/another', [], ['data.txt'])
    ('test/cgi-bin/backup', [], [])
If you pass an absolute address, the directory addresses will also be absolute:
    import os

    for i in os.walk('/home/pl/test'):
        print(i)

    ('/home/pl/test', ['cgi-bin'], ['index.html', 'dgs.png'])
    ('/home/pl/test/cgi-bin', ['another', 'backup'], ['hello.py'])
    ('/home/pl/test/cgi-bin/another', [], ['data.txt'])
    ('/home/pl/test/cgi-bin/backup', [], [])
Because walk() returns a generator, it can be iterated over only once: after the loop finishes, the generator is exhausted and yields nothing more. If the tuples need to be kept, convert the generator to a list of tuples:
    import os

    tree = list(os.walk('test'))
    for i in tree:
        print(i)

    ('test', ['cgi-bin'], ['index.html', 'dgs.png'])
    ('test/cgi-bin', ['another', 'backup'], ['hello.py'])
    ('test/cgi-bin/another', [], ['data.txt'])
    ('test/cgi-bin/backup', [], [])
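The difference between the generator and the list can be checked directly. The sketch below assumes a hypothetical temporary tree with a single subdirectory named sub. It counts the tuples obtained on a first and a second pass over each object.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))  # hypothetical subdirectory

    # A generator can be consumed only once.
    tree = os.walk(tmp)
    first_pass = len(list(tree))    # tuples for <tmp> and <tmp>/sub
    second_pass = len(list(tree))   # generator is already exhausted

    # A list keeps the tuples and can be traversed repeatedly.
    saved = list(os.walk(tmp))
    first_saved = sum(1 for _ in saved)
    second_saved = sum(1 for _ in saved)

print(first_pass, second_pass)     # 2 0
print(first_saved, second_saved)   # 2 2
```

For a very large tree the list holds every tuple in memory at once, so calling os.walk() again may be preferable to storing the result.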
To get the full address of a file (absolute or relative), use the os.path.join function:
    import os

    for address, dirs, files in os.walk('test'):
        for name in files:
            print(os.path.join(address, name))

    test/index.html
    test/dgs.png
    test/cgi-bin/hello.py
    test/cgi-bin/another/data.txt
On each iteration, the variable address is bound to the first item of the current tuple (the string containing the address of the directory), dirs to the second item (the list of subdirectories), and files to the third item (the list of files in this directory). The nested loop takes each file name from the list of files in turn.
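The same nested loop can be wrapped in a function that returns the addresses instead of printing them. This is only a sketch: list_files is a hypothetical helper name, and the demonstration tree is created in a temporary directory with names borrowed from the example above.

```python
import os
import tempfile


def list_files(root):
    """Return the addresses of all files found under root (hypothetical helper)."""
    paths = []
    for address, dirs, files in os.walk(root):
        for name in files:
            # Combine the directory address with the bare file name.
            paths.append(os.path.join(address, name))
    return paths


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "cgi-bin"))
    for rel in ("index.html", os.path.join("cgi-bin", "hello.py")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("demo")

    # Show the addresses relative to the temporary root, in sorted order.
    found = sorted(os.path.relpath(p, tmp) for p in list_files(tmp))

print(found)
```

The result is sorted here only to make the output predictable; the order of names within each list returned by os.walk() depends on the file system.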
The walk function has a topdown argument that defaults to True. If it is set to False, the directory tree is traversed not "top-down" (from the root to nested directories) but "bottom-up": the tuple for each directory is produced after the tuples for all of its subdirectories.
    import os

    tree = os.walk('test', topdown=False)
    for i in tree:
        print(i)

    ('test/cgi-bin/another', [], ['data.txt'])
    ('test/cgi-bin/backup', [], [])
    ('test/cgi-bin', ['another', 'backup'], ['hello.py'])
    ('test', ['cgi-bin'], ['index.html', 'dgs.png'])
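Bottom-up traversal is convenient when the result for a directory depends on the results for its subdirectories. One possible illustration is computing the total size of the files under every directory. In the sketch below, tree_sizes is a hypothetical helper, and the file names and contents are made up for the demonstration.

```python
import os
import tempfile


def tree_sizes(root):
    """Map each directory address to the total size in bytes of the files under it."""
    sizes = {}
    for address, dirs, files in os.walk(root, topdown=False):
        # Sizes of the files located directly in this directory.
        total = sum(os.path.getsize(os.path.join(address, n)) for n in files)
        # Subdirectories were visited earlier, so their totals are already known.
        # Symbolic links to directories are not followed, so they count as 0.
        total += sum(sizes.get(os.path.join(address, d), 0) for d in dirs)
        sizes[address] = total
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "sub")
    os.mkdir(sub)
    with open(os.path.join(tmp, "a.txt"), "wb") as f:
        f.write(b"abc")       # 3 bytes
    with open(os.path.join(sub, "b.txt"), "wb") as f:
        f.write(b"hello")     # 5 bytes

    sizes = tree_sizes(tmp)
    root_size = sizes[tmp]
    sub_size = sizes[sub]

print(sub_size)    # 5
print(root_size)   # 8
```

With topdown=True the same calculation would need a second pass, because the tuple for a directory arrives before the tuples for its subdirectories.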