Python 3: filtering directories by names that match a specific pattern

I'm currently developing a script that will clean up specific directories.

For example, the directory /app/test/log contains many sub-directories with names following the patterns testYYYYMMDD and logYYYYMMDD.

What I need is to select only the directories named like testYYYYMMDD.

To get the absolute paths of all folders in a given directory, I use:

folders_in_given_folder = [name for name in os.listdir(Directory) if os.path.isdir(os.path.join(Directory, name))]
folder_list = []
for folder in folders_in_given_folder:
    folder_list.append([os.path.join(Directory, folder)])
print(folder_list)

Gives output:

[['/app/test/log/test20150615'], ['/app/test/log/test20150616'], ['/app/test/log/b'], ['/app/test/log/a'], ['/app/test/log/New folder'], ['/app/test/log/rem'], ['/app/test/log/test']]

So now I need to keep only the sub-directories whose names fit a pattern. The pattern can be something like: *test*, test*, test2015*

I've tried using glob.glob(), but it seems to work only with files, not directories.

Could someone please be so kind as to explain how I could achieve the desired outcome?


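import os
import re

myrootdir = "/app/test/log"  # directory to scan (path taken from the question)
result = []
reg_compile = re.compile(r"test\d{8}")  # raw string, so \d reaches the regex engine unchanged
for dirpath, dirnames, filenames in os.walk(myrootdir):
    result = result + [dirname for dirname in dirnames if reg_compile.match(dirname)]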

As advised I will explain (thanks for the -1 btw :D)

The pattern test\d{8} compiles to a regex that matches any folder name starting with test followed by eight digits, i.e. a date in YYYYMMDD format.

Then I take advantage of os.walk, which yields the sub-directory names (dirnames) separately from the file names for each directory it visits, so there is no need to call os.path.isdir. Note that it also descends into nested directories.

With the expression [dirname for dirname in dirnames if reg_compile.match(dirname)] I keep only the folders whose names match the regular expression explained above.
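
A possible variation on the approach above, as a minimal sketch: os.walk visits every nesting level, while the question only concerns the immediate sub-directories of one folder. The helper name matching_subdirs and the demo folder names are assumptions made for illustration. fullmatch is used instead of match so that a name such as test20150615_old is rejected, which match alone would accept.

```python
import os
import re
import tempfile


def matching_subdirs(root, pattern=r"test\d{8}"):
    """Return sorted absolute paths of root's immediate sub-directories
    whose names match the pattern in full (hypothetical helper)."""
    regex = re.compile(pattern)
    with os.scandir(root) as entries:
        return sorted(
            os.path.abspath(entry.path)
            for entry in entries
            if entry.is_dir() and regex.fullmatch(entry.name)
        )


# Demo in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    for name in ("test20150615", "test20150616", "log20150615", "test"):
        os.mkdir(os.path.join(root, name))
    for path in matching_subdirs(root):
        print(path)
```

For the cleanup script, the returned list could be passed on to whatever removal step is used; nothing is deleted by this sketch.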

For a first answer (yes, it was the first) that works (tested on my computer with python2 and python3), I find it harsh to be downvoted. Also, the accepted answer contains the same kind of regular expression I used. That said, I agree that I should have explained it earlier.

Would you be kind enough to remove that downvote?

You need to use the re module, which is Python's regular expression module. re.compile creates a compiled pattern object, and you can use its match method to filter the list.

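import re
R = re.compile(pattern)
# folder_list must hold plain strings (names or paths), not the one-element lists built in the question
filtered = [folder for folder in folder_list if R.match(folder)]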

As a pattern you can use something like this:

>>> R = re.compile(".*test.*")
>>> R.match("1test")
<_sre.SRE_Match object at 0x024ED800>
>>> R.match("1test")
<_sre.SRE_Match object at 0x024ED598>
>>> R.match("test2015")
<_sre.SRE_Match object at 0x024ED800>
>>> R.match("1test2")
<_sre.SRE_Match object at 0x024ED598>
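
The patterns listed in the question (*test*, test*, test2015*) are shell-style wildcards rather than regular expressions. As an alternative to rewriting them as regexes, the standard-library fnmatch module accepts them as they are. The sketch below is only an illustration: filter_names is a hypothetical helper, and the folder names are the ones from the output shown in the question.

```python
import fnmatch


def filter_names(names, wildcard):
    """Keep the names that match a shell-style wildcard (hypothetical helper)."""
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch.
    return [name for name in names if fnmatch.fnmatchcase(name, wildcard)]


# Folder names taken from the output shown in the question.
names = ["test20150615", "test20150616", "b", "a", "New folder", "rem", "test"]

for wildcard in ("*test*", "test*", "test2015*"):
    print(wildcard, filter_names(names, wildcard))
```

Note that this works on bare folder names; if the list holds full paths, apply os.path.basename to each entry first.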

Python 3.4.2 (default, Oct 8 2014, 13:08:17)
>>> import re
>>> re.match(r'.*/[^/]*test[^/]*$', '/app/test/log/test20150616')
<_sre.SRE_Match object; span=(0, 26), match='/app/test/log/test20150616'>

The regular expression r'.*/[^/]*test[^/]*$' matches any path whose last component has the form *test*, where * stands for any run of characters other than /.
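
Another option, instead of matching the full path with a regex: glob.glob() returns directories as well as files, so its results can be narrowed down with os.path.isdir(). This is a minimal sketch under that assumption; glob_dirs is a hypothetical helper and the demo names are made up.

```python
import glob
import os
import tempfile


def glob_dirs(root, wildcard):
    """Return sorted paths under root that match wildcard and are directories
    (hypothetical helper)."""
    # glob.escape keeps special characters in root from being read as wildcards.
    candidates = glob.glob(os.path.join(glob.escape(root), wildcard))
    return sorted(path for path in candidates if os.path.isdir(path))


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name in ("test20150615", "test20150616", "rem"):
        os.mkdir(os.path.join(root, name))
    # A plain file that also matches the wildcard, to show it is skipped.
    open(os.path.join(root, "test.txt"), "w").close()
    print(glob_dirs(root, "test*"))
```

The returned paths include the root prefix, so they are absolute whenever root is.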

Category:python Time:2018-12-20 Views:0
