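Zero out the size of every file in a large directory tree (delete file contents, keep the files)

How can I delete the contents (zero the file size) of a large directory tree (10 GB, 1K files) while keeping the whole tree structure, the file names, and the extensions? (If I can also keep the original last write time [last content modification time], that is a bonus.)

I have seen several suggestions for individual files, but I can't figure out how to make this work for the whole CWD.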

    def deleteContent(fName):
        # Opening in "w" mode truncates the file to zero bytes.
        with open(fName, "w"):
            pass

Run as administrator, the following should reset everything to empty files and keep each file's last write time:

    gci c:\temp\test\*.* -recurse | % {
        # Remember the timestamp, empty the file, then restore the timestamp.
        $LastWriteTime = $PSItem.LastWriteTime
        clear-content $PSItem
        $PSItem.LastWriteTime = $LastWriteTime
    }

os.walk() yields one tuple for every directory in the tree, in the following form:

 (directory, list of folders in the directory, list of files in the directory) 

When we combine your code with os.walk():

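    import os

    # os.walk() yields (dirpath, dirnames, filenames) for every directory in the tree.
    # Descriptive names are used here so the built-ins tuple, dir and file are not shadowed.
    for dirpath, dirnames, filenames in os.walk("top_directory"):
        for filename in filenames:
            # Opening in "w" mode truncates the file to zero bytes.
            with open(os.path.join(dirpath, filename), "w"):
                pass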

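All good answers, but I can see two more challenges with the answers provided:

When traversing the directory tree you may want to limit the depth, which protects you from very large directory trees. Second, Windows limits full paths to 260 characters by default (MAX_PATH), and Explorer enforces this. The limit produces all kinds of OS errors, but there is a workaround.

Let's start with the workaround for the maximum path length. You can do something like this: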

    import os
    import platform

    def full_path_windows(filepath):
        """
        Paths have a default limitation of 260 characters (MAX_PATH) in Windows.
        By inserting '\\\\?\\' at the start of the path it removes this limitation.

        This function inserts '\\\\?\\' at the start of the path, on Windows only,
        and only if the path starts with '<driveletter>:\\', e.g. 'C:\\'.

        It will also normalise the characters/case of the path.
        """
        if platform.system() == 'Windows':
            if filepath[1:3] == ':\\':
                return u'\\\\?\\' + os.path.normcase(filepath)
        return os.path.normcase(filepath)

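Write protection, files in use, and any other condition that can prevent writing to a file have been mentioned. This can be checked (without actually writing) as follows: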

    import os

    def write_access(filepath):
        """
        Usage:

        write_access(filepath)

        This function returns True if write access is obtained.
        This function returns False if write access is not obtained.
        This function returns False if the filepath does not exist.

        filepath = must be an existing file
        """
        if os.path.isfile(filepath):
            return os.access(filepath, os.W_OK)
        return False
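One caveat with a check like the one above: os.access() reports what the permissions look like at the moment of the call, so a file can still fail to open afterwards (for example if another process holds it, or if its state changes between the check and the write). As an alternative, and as an assumption on my part rather than something from the answer, you can simply attempt the truncation and catch OSError. The helper name try_clear is hypothetical.

```python
import os
import tempfile


def try_clear(filepath):
    """Truncate filepath to zero bytes.

    Returns None on success, or the OSError that was raised on failure.
    """
    try:
        # "r+b" opens an existing file for update and never creates a missing one.
        with open(filepath, "r+b") as fh:
            fh.truncate(0)
    except OSError as exc:
        return exc
    return None


# Demo on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "data.bin")
    with open(path, "wb") as fh:
        fh.write(b"payload")

    print(try_clear(path), os.path.getsize(path))    # None 0
    missing = try_clear(os.path.join(root, "missing.bin"))
    print(type(missing).__name__)                    # FileNotFoundError
```

The returned exception can be collected in a list in the same way as the paths that fail the access check.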

To set a minimum or maximum depth, you can do this:

    import os

    def get_all_files(rootdir, mindepth = 1, maxdepth = float('inf')):
        """
        Usage:

        get_all_files(rootdir, mindepth = 1, maxdepth = float('inf'))

        This returns a list of all files of a directory, including all files in
        subdirectories. Full paths are returned.

        WARNING: this may create a very large list if many files exist in the
        directory and subdirectories. Make sure you set the maxdepth appropriately.

        rootdir  = existing directory to start
        mindepth = int: the level to start, 1 is start at root dir, 2 is start
                   at the sub directories of the root dir, and so on.
        maxdepth = int: the level which to report to. Example, if you only want
                   the files of the sub directories of the root dir,
                   set mindepth = 2 and maxdepth = 2. If you only want the files
                   of the root dir itself, set mindepth = 1 and maxdepth = 1
        """
        file_paths = []
        # Offset chosen so that rootdir itself counts as depth 1.
        root_depth = rootdir.rstrip(os.path.sep).count(os.path.sep) - 1
        for dirpath, dirs, files in os.walk(rootdir):
            depth = dirpath.count(os.path.sep) - root_depth
            if mindepth <= depth <= maxdepth:
                for filename in files:
                    file_paths.append(os.path.join(dirpath, filename))
            elif depth > maxdepth:
                # Emptying dirs in place stops os.walk from descending further.
                del dirs[:]
        return file_paths
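To see depth limiting in action on a small throwaway tree, here is a minimal sketch. It is a variant of the idea rather than the function from the answer: it derives the depth from os.path.relpath() and prunes as soon as maxdepth is reached. The name walk_with_depth and the file names are hypothetical.

```python
import os
import tempfile


def walk_with_depth(rootdir, maxdepth):
    """Yield (depth, dirpath, files) for rootdir, where rootdir itself is depth 1.

    Directories deeper than maxdepth are never visited.
    """
    rootdir = os.path.abspath(rootdir)
    for dirpath, dirs, files in os.walk(rootdir):
        rel = os.path.relpath(dirpath, rootdir)
        depth = 1 if rel == os.curdir else rel.count(os.path.sep) + 2
        if depth >= maxdepth:
            del dirs[:]   # prune in place so os.walk does not descend further
        yield depth, dirpath, files


# Demo: files at depth 1, 2 and 3; only the first two levels are reported.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in ("top.txt",
                os.path.join("a", "mid.txt"),
                os.path.join("a", "b", "deep.txt")):
        with open(os.path.join(root, rel), "w") as fh:
            fh.write("x")

    found = sorted(name
                   for _depth, _dirpath, files in walk_with_depth(root, maxdepth=2)
                   for name in files)
    print(found)   # ['mid.txt', 'top.txt']
```

Because os.walk() is top-down by default, modifying the dirs list in place is what keeps it from entering the deeper directories.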

Now put the code above together in one function; this should give you the idea:

    import os

    def clear_all_files_content(rootdir, mindepth = 1, maxdepth = float('inf')):
        not_cleared = []
        # Offset chosen so that rootdir itself counts as depth 1.
        root_depth = rootdir.rstrip(os.path.sep).count(os.path.sep) - 1
        for dirpath, dirs, files in os.walk(rootdir):
            depth = dirpath.count(os.path.sep) - root_depth
            if mindepth <= depth <= maxdepth:
                for filename in files:
                    filename = os.path.join(dirpath, filename)
                    # Long-path workaround for '<driveletter>:\' paths on Windows.
                    if filename[1:3] == ':\\':
                        filename = u'\\\\?\\' + os.path.normcase(filename)
                    if (os.path.isfile(filename) and os.access(filename, os.W_OK)):
                        # Opening in 'w' mode truncates the file to zero bytes.
                        with open(filename, 'w'):
                            pass
                    else:
                        not_cleared.append(filename)
            elif depth > maxdepth:
                # Emptying dirs in place stops os.walk from descending further.
                del dirs[:]
        return not_cleared

This does not preserve the "last write time".

It returns the list not_cleared, which you can check for the files that ran into write-access problems.
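The Python function above does not keep the last write time. One way to add that, sketched here as an assumption and not as part of the answer, is to read the timestamps with os.stat() before truncating and write them back with os.utime() afterwards. The names clear_keep_times and clear_tree_keep_times are hypothetical.

```python
import os
import tempfile


def clear_keep_times(filepath):
    """Truncate filepath to zero bytes, then restore its original timestamps."""
    st = os.stat(filepath)        # read the timestamps before touching the file
    with open(filepath, "w"):     # "w" mode truncates to zero bytes
        pass
    # Put the original access and modification times back (nanosecond precision).
    os.utime(filepath, ns=(st.st_atime_ns, st.st_mtime_ns))


def clear_tree_keep_times(rootdir):
    """Apply clear_keep_times to every file under rootdir; return the paths that failed."""
    not_cleared = []
    for dirpath, _dirs, files in os.walk(rootdir):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                clear_keep_times(path)
            except OSError:
                not_cleared.append(path)
    return not_cleared


# Demo on a throwaway tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    path = os.path.join(root, "sub", "report.txt")
    with open(path, "w") as fh:
        fh.write("some content")
    os.utime(path, (1_000_000_000, 1_000_000_000))   # pretend the file is from 2001

    failed = clear_tree_keep_times(root)
    print(failed, os.path.getsize(path), int(os.stat(path).st_mtime))
    # [] 0 1000000000
```

The depth limit and the long-path prefix from the function above could be combined with this in the same loop.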