At a client we have a huge directory of files. I wanted to list the first few of them. ls -l | head took ages, as it first lists all the files and only then cuts the output down. After my first attempts in Python failed, I wrote a Perl one-liner to list the first elements of the huge directory. However, I wanted to see if I could do it in Python some other way.
using iterdir of pathlib
The original attempt in Python was using the iterdir method of pathlib.
examples/python/list_dir_using_iterdir.py
import pathlib
path = pathlib.Path("/home/gabor/work/code-maven.com/sites/en/pages/")
count = 0
for thing in path.iterdir():
    count += 1
    print(thing)
    if count > 3:
        break
On the real data it took 47 minutes to run.
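My guess (an assumption, not something I measured here) is that this version of pathlib builds iterdir on top of os.listdir, which collects the complete list of names before the loop can print anything. At the os level that would look something like this:

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

# os.listdir returns the complete list of names up front, so on a huge
# directory the program waits here for a long time before printing
# even the first entry.
names = os.listdir(path)
for name in names[:4]:
    print(name)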
using walk of os
The second attempt was to use the walk function of os.
examples/python/list_dir_using_walk.py
import os
path = "/home/gabor/work/code-maven.com/sites/en/pages/"
count = 0
for dirname, dirs, files in os.walk(path):
    for filename in files:
        print(os.path.join(dirname, filename))
        count += 1
        if count > 3:
            exit()
I don't know how long this would have taken. I stopped it after a minute.
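In hindsight this is not surprising: os.walk yields one (dirname, dirs, files) tuple per directory, and files is a complete list, so the whole top-level directory has to be read before the first tuple arrives. A small sketch, using the same path, that shows where the time goes:

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

# The very first tuple from os.walk already contains the complete file
# list of the top directory, so this single line takes about as long as
# listing everything.
dirname, dirs, files = next(os.walk(path))
print(len(files))
print(files[:4])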
using scandir of os
Finally I found the scandir function of os. That did the trick:
examples/python/list_dir_using_scandir.py
import os
path = "/home/gabor/work/code-maven.com/sites/en/pages/"
count = 0
with os.scandir(path) as it:
    for entry in it:
        print(entry.name)
        count += 1
        if count > 3:
            exit()
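The entries returned by scandir are DirEntry objects, so besides the name they also carry the full path and cheap is_file / is_dir checks. A small variation (not part of the original run) that prints only regular files:

import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

count = 0
with os.scandir(path) as it:
    for entry in it:
        # DirEntry exposes the full path and type checks that usually
        # avoid an extra stat call.
        if not entry.is_file():
            continue
        print(entry.path)
        count += 1
        if count > 3:
            break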
using scandir and a range
After getting an improvement suggestion for my Perl solution, I thought I could use the same idea here too. I assume that there are at least 3 elements in this folder, or I'll get a StopIteration exception when calling next; apart from that, this works.
examples/python/list_dir_using_scandir_range.py
import os
path = "/home/gabor/work/code-maven.com/sites/en/pages/"
with os.scandir(path) as it:
    for _ in range(3):
        print(it.__next__().name)
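If the folder might hold fewer than 3 entries, itertools.islice is one way around the StopIteration concern: it simply stops early instead of raising. A sketch with the same path:

import itertools
import os

path = "/home/gabor/work/code-maven.com/sites/en/pages/"

with os.scandir(path) as it:
    # islice stops after 3 entries, or earlier if the directory runs
    # out, so there is no StopIteration to handle.
    for entry in itertools.islice(it, 3):
        print(entry.name)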