Find all files in a directory with the extension .txt in Python

1043

How can I find all the files in a directory that have the extension .txt in Python?

python file-io

2357

You can use glob:

import glob, os
os.chdir("/mydir")
for file in glob.glob("*.txt"):
    print(file)

or simply os.listdir:

import os
for file in os.listdir("/mydir"):
    if file.endswith(".txt"):
        print(os.path.join("/mydir", file))

or, if you want to traverse the directory tree, use os.walk:

import os
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".txt"):
            print(os.path.join(root, file))

— ghostdog74

11

Using solution #2, how would you create a file or list with that information?

— Merlin
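
One possible way to do what this comment asks, as a minimal sketch rather than the answer author's own method: collect the matching paths from the os.listdir approach into a list, then write them to a text file. The function names list_txt_files and save_listing, and the output file name, are hypothetical.

```python
import os


def list_txt_files(directory, extension=".txt"):
    """Return full paths of regular files in `directory` ending with `extension`."""
    return [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.endswith(extension)
        and os.path.isfile(os.path.join(directory, name))
    ]


def save_listing(paths, output_file):
    """Write one path per line to `output_file` (hypothetical helper)."""
    with open(output_file, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
```

For example, save_listing(list_txt_files("/mydir"), "txt_files.lst") would store the list on disk, while the return value of list_txt_files alone is already the list.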

72

@ghostdog74: In my opinion it would be more appropriate to write for file in f than for files in f, since what is in the variable is a single filename. Even better would be to change f to files, and then the for loops could become for file in files.

— martineau

45

@computermacgyver: No, file is not a reserved word, just the name of a predefined function, so it's quite possible to use it as a variable name in your own code. Although it's true that one should generally avoid collisions like that, file is a special case because there's hardly ever any need to use it, so it is often considered an exception to the guideline. If you don't want to do that, PEP8 recommends appending a single underscore to such names, i.e. file_, which you'd have to agree is still quite readable.

— martineau

9

Thanks, martineau, you're absolutely right. I jumped to conclusions too quickly.

— computermacgyver

40

A more Pythonic way for #2 could be for file in [f for f in os.listdir('/mydir') if f.endswith('.txt')]:

— ozgur

247

Use glob.

>>> import glob
>>> glob.glob('./*.txt')
['./outline.txt', './pip-log.txt', './test.txt', './testingvim.txt']

— Muhammad Alkarouri

Not only is this easy, it is also case insensitive. (At least, it is on Windows, as it should be. I'm not sure about other OSes.)

— Jon Coombs

35

Beware that glob can't find files recursively if your Python is older than 3.5.

— qun

The best part is that you can use a pattern such as test*.txt (strictly speaking a shell-style wildcard rather than a regular expression).

— Alex Punnen

@JonCoombs No. At least not on Linux.

— Karuhanga

157

Something like this should do the job:

import os

directory = "."  # hypothetical starting directory; set this to the folder to search
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith('.txt'):
            print(file)

— Adam Byrtek

73

+1 for naming your variables root, dirs, files instead of r, d, f. Much more readable.

— Clément

27

Note that this is case sensitive (it won't match .TXT or .Txt), so you'll probably want to write if file.lower().endswith('.txt'):.

— Jon Coombs
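
A minimal sketch of the case-insensitive check suggested in this comment, combined with the os.walk loop from the answer. The function name find_files_ignore_case is hypothetical, and lowering both the file name and the extension is one of several ways to do the comparison.

```python
import os


def find_files_ignore_case(top, extension=".txt"):
    """Yield paths under `top` whose names end with `extension`, ignoring case."""
    wanted = extension.lower()
    for root, _dirs, files in os.walk(top):
        for name in files:
            # Lower-case the name so .TXT, .Txt and .txt all match.
            if name.lower().endswith(wanted):
                yield os.path.join(root, name)
```

Because it is a generator, wrap the call in list(...) if you need all results at once.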

1

Your answer also deals with subdirectories.

— Sam Liao

117

Something like this will work:

>>> import os
>>> path = '/usr/share/cups/charmaps'
>>> text_files = [f for f in os.listdir(path) if f.endswith('.txt')]
>>> text_files
['euc-cn.txt', 'euc-jp.txt', 'euc-kr.txt', 'euc-tw.txt', ... 'windows-950.txt']

— Seth

How can I save the path along with text_files? ['path/euc-cn.txt', ... 'path/windows-950.txt']

— IceQueeny

5

You could use os.path.join on each element of text_files. It could be something like text_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.txt')].

— Seth

55

You can simply use pathlib's glob ¹:

import pathlib

list(pathlib.Path('your_directory').glob('*.txt'))

or in a loop:

for txt_file in pathlib.Path('your_directory').glob('*.txt'):
    # do something with "txt_file"
    print(txt_file)

If you want it to be recursive, you can use .glob('**/*.txt')

¹ The pathlib module was added to the standard library in Python 3.4. But you can install back-ports of that module even on older Python versions (e.g. using conda or pip): pathlib and pathlib2.

— MSeifert

**/*.txt is not supported by older Python versions. So I solved this with: foundfiles = subprocess.check_output("ls **/*.txt", shell=True) for foundfile in foundfiles.splitlines(): print foundfile

— Roman

1

@Roman Yes, it was just a showcase of what pathlib can do, and I already included the Python version requirements. :) But if your approach hasn't been posted already, why not add it as another answer?

— MSeifert

1

Yes, posting an answer would definitely have given me better formatting options. I posted it here because I think this is a more appropriate place for it.

— Roman

5

Note that you can also use rglob if you want to find items recursively, e.g. .rglob('*.txt')

— Bram Vanroy
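
A short sketch of the rglob variant mentioned in this comment. The function name txt_files_recursive is hypothetical; the is_file() filter is an extra assumption added so that directories whose names happen to end in .txt are skipped.

```python
from pathlib import Path


def txt_files_recursive(directory):
    """Return a sorted list of Path objects for every *.txt file under `directory`."""
    # rglob("*.txt") behaves like glob("**/*.txt"): it descends into subdirectories.
    return sorted(p for p in Path(directory).rglob("*.txt") if p.is_file())
```

Each result is a Path, so str(p) gives the path as text and p.name gives just the file name.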

40

import os

path = 'mypath/path' 
files = os.listdir(path)

files_txt = [i for i in files if i.endswith('.txt')]

— user3281344

29

I like os.walk():

import os

for root, dirs, files in os.walk(dir):
    for f in files:
        if os.path.splitext(f)[1] == '.txt':
            fullpath = os.path.join(root, f)
            print(fullpath)

Or with generators:

import os

fileiter = (os.path.join(root, f)
    for root, _, files in os.walk(dir)
    for f in files)
txtfileiter = (f for f in fileiter if os.path.splitext(f)[1] == '.txt')
for txt in txtfileiter:
    print(txt)

— hughdbrown

28

Here are more versions of the same approach that produce slightly different results:

glob.iglob()

import glob
for f in glob.iglob("/mydir/*/*.txt"):  # generator, search immediate subdirectories
    print(f)

glob.glob1()

print(glob.glob1("/mydir", "*.tx?"))  # literal_directory, basename_pattern

fnmatch.filter()

import fnmatch, os
print(fnmatch.filter(os.listdir("/mydir"), "*.tx?"))  # include dot-files

— jfs

3

For the curious, glob1() is a helper function in the glob module that isn't listed in the Python documentation. There are some inline comments describing what it does in the source file, see .../Lib/glob.py.

— martineau
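
If you would rather not depend on an undocumented helper, something similar can be assembled from public functions. This is only a sketch under the assumption that the wanted behaviour is "match base names in one directory and skip dot-files unless the pattern starts with a dot"; it is not the glob module's own code, and the name match_names is hypothetical.

```python
import fnmatch
import os


def match_names(directory, pattern):
    """Return base names in `directory` that match the shell-style `pattern`."""
    names = os.listdir(directory)
    # Hide dot-files unless the pattern explicitly asks for them.
    if not pattern.startswith("."):
        names = [n for n in names if not n.startswith(".")]
    return sorted(fnmatch.filter(names, pattern))
```

For example, match_names("/mydir", "*.tx?") returns names only, without the directory prefix.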

1

@martineau: glob.glob1() is not public, but it is available on Python 2.4-2.7, 3.0-3.2, pypy, and jython: github.com/ened/test_glob1

— jfs

1

Thanks, that's good additional information to have when deciding whether to use an undocumented private function in a module. ;-) Here's a little more: the Python 2.7 version is only 12 lines long and looks like it could easily be extracted from the glob module.

— martineau

21

path.py is another alternative: https://github.com/jaraco/path.py

from path import path
p = path('/path/to/the/directory')
for f in p.files(pattern='*.txt'):
    print(f)

— Anuvrat Parashar

Cool, it also accepts regular expressions in the pattern. I'm using for f in p.walk(pattern='*.txt') to go through all subfolders.

— Kostanos

1

Yes, there's also pathlib. You can do something like: list(p.glob('**/*.py'))

— user2233949

15

Python v3.5+

A fast method using os.scandir in a recursive function. It searches for all files with a specified extension in a folder and its sub-folders.

import os

def findFilesInFolder(path, pathList, extension, subFolders = True):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:        Base directory to find files
    pathList:    A list that stores all paths
    extension:   File extension to find
    subFolders:  Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    """

    try:   # Trapping a OSError:  File permissions problem I believe
        for entry in os.scandir(path):
            if entry.is_file() and entry.path.endswith(extension):
                pathList.append(entry.path)
            elif entry.is_dir() and subFolders:   # if its a directory, then repeat process as a nested function
                pathList = findFilesInFolder(entry.path, pathList, extension, subFolders)
    except OSError:
        print('Cannot access ' + path +'. Probably a permissions error')

    return pathList

dir_name = r'J:\myDirectory'
extension = ".txt"

pathList = []
pathList = findFilesInFolder(dir_name, pathList, extension, True)

Update April 2019

If you are searching over directories that contain tens of thousands of files, appending to a list becomes inefficient. Yielding the results is a better solution. I have also included a function to convert the output to a Pandas DataFrame.

import os
import re
import pandas as pd
import numpy as np


def findFilesInFolderYield(path,  extension, containsTxt='', subFolders = True, excludeText = ''):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """
    if type(containsTxt) == str: # if a string and not in a list
        containsTxt = [containsTxt]

    myregexobj = re.compile(r'\.' + extension + '$')    # Makes sure the file extension is at the end and is preceded by a .

    try:   # Trapping a OSError or FileNotFoundError:  File permissions problem I believe
        for entry in os.scandir(path):
            if entry.is_file() and myregexobj.search(entry.path): # 

                bools = [True for txt in containsTxt if txt in entry.path and (excludeText == '' or excludeText not in entry.path)]

                if len(bools)== len(containsTxt):
                    yield entry.stat().st_size, entry.stat().st_atime_ns, entry.stat().st_mtime_ns, entry.stat().st_ctime_ns, entry.path

            elif entry.is_dir() and subFolders:   # if its a directory, then repeat process as a nested function
                yield from findFilesInFolderYield(entry.path,  extension, containsTxt, subFolders, excludeText)
    except FileNotFoundError as fnf:   # a subclass of OSError, so it must be caught first
        print(path +' not found ', fnf)
    except OSError as ose:
        print('Cannot access ' + path +'. Probably a permissions error ', ose)

def findFilesInFolderYieldandGetDf(path,  extension, containsTxt, subFolders = True, excludeText = ''):
    """  Converts returned data from findFilesInFolderYield and creates and Pandas Dataframe.
    Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """

    fileSizes, accessTimes, modificationTimes, creationTimes , paths  = zip(*findFilesInFolderYield(path,  extension, containsTxt, subFolders, excludeText))
    df = pd.DataFrame({
            'FLS_File_Size':fileSizes,
            'FLS_File_Access_Date':accessTimes,
            'FLS_File_Modification_Date':np.array(modificationTimes).astype('timedelta64[ns]'),
            'FLS_File_Creation_Date':creationTimes,
            'FLS_File_PathName':paths,
                  })

    df['FLS_File_Modification_Date'] = pd.to_datetime(df['FLS_File_Modification_Date'],infer_datetime_format=True)
    df['FLS_File_Creation_Date'] = pd.to_datetime(df['FLS_File_Creation_Date'],infer_datetime_format=True)
    df['FLS_File_Access_Date'] = pd.to_datetime(df['FLS_File_Access_Date'],infer_datetime_format=True)

    return df

ext =   'txt'  # regular expression 
containsTxt=[]
path = r'C:\myFolder'
df = findFilesInFolderYieldandGetDf(path,  ext, containsTxt, subFolders = True)

— DougR

14

Python has all the tools to do this:

import os

the_dir = 'the_dir_that_want_to_search_in'
all_txt_files = filter(lambda x: x.endswith('.txt'), os.listdir(the_dir))

— Xxxo

1

If you want all_txt_files to be a list: all_txt_files = list(filter(lambda x: x.endswith('.txt'), os.listdir(the_dir)))

— Ena

12

To get all '.txt' file names inside the 'dataPath' folder as a list in a Pythonic way:

from os import listdir
from os.path import isfile, join
path = "/dataPath/"
onlyTxtFiles = [f for f in listdir(path) if isfile(join(path, f)) and  f.endswith(".txt")]
print(onlyTxtFiles)

— ewalel

12

Try this; it will find all your files recursively:

import glob, os
os.chdir("H:\\wallpaper")# use whatever directory you want

# use a double backslash (\\) in the path, not a single one (\)

for file in glob.glob("**/*.txt", recursive = True):
    print(file)

— mayank

Not quite: the recursive version (double star: **) is only available in Python 3. What I don't like is the chdir part. There's no need for that.

— Jean-François Fabre
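
As the comment points out, the chdir call can be avoided by putting the directory into the pattern itself. A minimal sketch, assuming Python 3.5+ for recursive=True; the function name glob_recursive is hypothetical.

```python
import glob
import os


def glob_recursive(base_dir, pattern="*.txt"):
    """Return sorted paths under `base_dir`, at any depth, matching `pattern`."""
    # glob.escape protects special characters such as [ or * in the directory name.
    # "**" only spans nested directories when recursive=True is passed.
    full_pattern = os.path.join(glob.escape(base_dir), "**", pattern)
    return sorted(glob.glob(full_pattern, recursive=True))
```

The results include the base_dir prefix, and the current working directory is left untouched.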

2

Well, you can use the os library to build the path, e.g. filepath = os.path.join('wallpaper'), and then use it as glob.glob(os.path.join(filepath, "**", "*.psd"), recursive=True), which would yield the same result.

— Mitalee Rao

8

import os
import sys 

if len(sys.argv) < 3:  # two arguments are required: a directory and an extension mask
    print('missing params: expected <dir> <mask>')
    sys.exit(1)

dir = sys.argv[1]
mask = sys.argv[2]

files = os.listdir(dir)

res = filter(lambda x: x.endswith(mask), files)

print(list(res))

— mrgloom

8

I ran a test (Python 3.6.4, W7x64) to see which solution is the fastest for one folder, with no subdirectories, to get a list of complete file paths for files with a specific extension.

To make it short, for this task os.listdir() is the fastest: it is 1.7x as fast as the next best, os.walk() (with a break!), 2.7x as fast as pathlib, 3.2x faster than os.scandir() and 3.3x faster than glob.
Please keep in mind that those results will change when you need recursive results. If you copy/paste one of the methods below, please add a .lower(), otherwise .EXT would not be found when searching for .ext.

import os
import pathlib
import timeit
import glob

def a():
    path = pathlib.Path().cwd()
    list_sqlite_files = [str(f) for f in path.glob("*.sqlite")]

def b(): 
    path = os.getcwd()
    list_sqlite_files = [f.path for f in os.scandir(path) if os.path.splitext(f)[1] == ".sqlite"]

def c():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".sqlite")]

def d():
    path = os.getcwd()
    os.chdir(path)
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob("*.sqlite")]

def e():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob1(str(path), "*.sqlite")]

def f():
    path = os.getcwd()
    list_sqlite_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".sqlite"):
                list_sqlite_files.append( os.path.join(root, file) )
        break



print(timeit.timeit(a, number=1000))
print(timeit.timeit(b, number=1000))
print(timeit.timeit(c, number=1000))
print(timeit.timeit(d, number=1000))
print(timeit.timeit(e, number=1000))
print(timeit.timeit(f, number=1000))

Results:

# Python 3.6.4
0.431
0.515
0.161
0.548
0.537
0.274

— user136036

The Python 3.6.5 documentation states: The os.scandir() function returns directory entries along with file attribute information, giving better performance [than os.listdir()] for many common use cases.

— Bill Oldroyd

I'm missing the scale of this test: how many files did you use? How do the methods compare if you scale the number up or down?

— N4ppeL
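
To explore the scaling question yourself, one option is to generate a temporary folder with a chosen number of files and time two of the approaches against it. This is a rough sketch, not the benchmark used in the answer: the helper names, the default of 20 repeats, and the choice of comparing only os.listdir and os.scandir are all assumptions.

```python
import os
import tempfile
import timeit


def with_listdir(path, ext):
    """List matching files via os.listdir, returning full paths."""
    return [os.path.join(path, n) for n in os.listdir(path) if n.endswith(ext)]


def with_scandir(path, ext):
    """List matching files via os.scandir, returning full paths."""
    with os.scandir(path) as entries:
        return [e.path for e in entries if e.name.endswith(ext)]


def time_methods(file_count, ext=".sqlite", repeats=20):
    """Create `file_count` empty files in a temp dir and time both methods."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(file_count):
            open(os.path.join(tmp, f"f{i}{ext}"), "w").close()
        return {
            fn.__name__: timeit.timeit(lambda: fn(tmp, ext), number=repeats)
            for fn in (with_listdir, with_scandir)
        }
```

Calling time_methods(100) and time_methods(10000) and comparing the returned dictionaries gives a feel for how each method behaves as the folder grows; the absolute numbers depend on the machine and file system.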

5

This code makes my life simpler.

import os
fnames = ([file for root, dirs, files in os.walk(dir)
    for file in files
    if file.endswith('.txt') #or file.endswith('.png') or file.endswith('.pdf')
    ])
for fname in fnames: print(fname)

— praba230890

5

Use fnmatch: https://docs.python.org/2/library/fnmatch.html

import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print(file)

— yucer

5

To get an array of ".txt" file names from a folder called "data" in the same directory, I usually use this simple line of code:

import os
fileNames = [fileName for fileName in os.listdir("data") if fileName.endswith(".txt")]

— Kamen Tsvetkov

3

I suggest you use fnmatch together with the upper() method. This way you can find any of the following:

Name.txt;
Name.TXT;
Name.Txt.

import fnmatch
import os

for file in os.listdir("/Users/Johnny/Desktop/MyTXTfolder"):
    if fnmatch.fnmatch(file.upper(), '*.TXT'):
        print(file)

— Nicolalie

3

Here's one with extend():

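import glob
import os

path = '.'  # hypothetical directory to search; not defined in the original snippet
types = ('*.jpg', '*.png')
images_list = []
for files in types:
    images_list.extend(glob.glob(os.path.join(path, files)))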

— Efreeto

Not for use with .txt :)

— Efreeto

2

A functional solution that handles sub-directories:

from fnmatch import filter
from functools import partial
from itertools import chain
from os import path, walk

print(*chain(*(map(partial(path.join, root), filter(filenames, "*.txt")) for root, _, filenames in walk("mydir"))))

— Adam Chrapkowski

15

Is this code something you want to maintain in the long term?

— Simeon Visser

2

If the folder contains a lot of files or memory is a constraint, consider using generators:

import os

def yield_files_with_extensions(folder_path, file_extension):
    for _, _, files in os.walk(folder_path):
        for file in files:
            if file.endswith(file_extension):
                yield file

Option A: Iterate

for f in yield_files_with_extensions('.', '.txt'): 
    print(f)

Option B: Get all

files = [f for f in yield_files_with_extensions('.', '.txt')]

— tashuhka

2

A copy-pastable solution similar to ghostdog's:

def get_all_filepaths(root_path, ext):
    """
    Search all files which have a given extension within root_path.

    This ignores the case of the extension and searches subdirectories, too.

    Parameters
    ----------
    root_path : str
    ext : str

    Returns
    -------
    list of str

    Examples
    --------
    >>> get_all_filepaths('/run', '.lock')
    ['/run/unattended-upgrades.lock',
     '/run/mlocate.daily.lock',
     '/run/xtables.lock',
     '/run/mysqld/mysqld.sock.lock',
     '/run/postgresql/.s.PGSQL.5432.lock',
     '/run/network/.ifstate.lock',
     '/run/lock/asound.state.lock']
    """
    import os
    all_files = []
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                all_files.append(os.path.join(root, filename))
    return all_files

— Martin Thoma

1

Use the Python os module to find files with a specific extension.

Here is a simple example:

import os

# This is the path where you want to search
path = r'd:'  

# this is extension you want to detect
extension = '.txt'   # this can be : .jpg  .png  .xls  .log .....

for root, dirs_list, files_list in os.walk(path):
    for file_name in files_list:
        if os.path.splitext(file_name)[-1] == extension:
            file_name_path = os.path.join(root, file_name)
            print(file_name)
            print(file_name_path)   # This is the full path of the matching file

— Rajiv Sharma

0

Many users have replied with os.walk answers, which include all files but also all directories and subdirectories and their files.

import os


def files_in_dir(path, extension=''):
    """
       Generator: yields all of the files in <path> ending with
       <extension>

       \param   path       Absolute or relative path to inspect,
       \param   extension  [optional] Only yield files matching this,

       \yield              [filenames]
    """


    for _, dirs, files in os.walk(path):
        dirs[:] = []  # do not recurse directories.
        yield from [f for f in files if f.endswith(extension)]

# Example: print all the .py files in './python'
for filename in files_in_dir('./python', '.py'):
    print("-", filename)

Or for a one-off where you don't need a generator:

path, ext = "./python", ".py"
for _, _, dirfiles in os.walk(path):
    matches = (f for f in dirfiles if f.endswith(ext))
    break

for filename in matches:
    print("-", filename)

If you are going to use matches for something else, you may want to make it a list rather than a generator expression:

    matches = [f for f in dirfiles if f.endswith(ext)]

— kfsone

0

A simple method using a for loop:

import os

dir = ["e","x","e"]

p = os.listdir('E:')  #path

for n in range(len(p)):
   name = p[n]
   myfile = [name[-3],name[-2],name[-1]]  # last three characters of the name, compared against dir
   if myfile == dir :
      print(name)
   else:
      print("nops")

Though this can be made more general.

— BoRRis

A very unpythonic way to test for an extension. It's unsafe too: what if the name is too short? And why use a list of characters rather than a string?

— Jean-François Fabre
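
One way to address the concerns raised in this comment is to let os.path.splitext extract the extension, which works for names of any length and compares plain strings. A minimal sketch; the helper names has_extension and names_with_extension are hypothetical, and the case-insensitive comparison is an added assumption.

```python
import os


def has_extension(name, extension):
    """Return True if `name` has the given extension, ignoring case."""
    # splitext never raises on short or empty names; it just returns "".
    return os.path.splitext(name)[1].lower() == extension.lower()


def names_with_extension(directory, extension=".exe"):
    """Return the names in `directory` that carry `extension`."""
    return [n for n in os.listdir(directory) if has_extension(n, extension)]
```

Note that splitext treats a leading dot as part of the name, so a file called just ".exe" is considered to have no extension.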