Date and Time: Jun 10, 2026
Link: https://leetcode.com/problems/find-duplicate-file-in-system/
Given a list of directory info including the directory path, and all the files with contents in this directory, find out all the groups of duplicate files in the file system in terms of their paths.
A group of duplicate files consists of at least two files that have the same content.
A single directory info string in the input list has the following format:
"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fk.txt(fk_content)"
It means there are k files in the directory "root/d1/d2/.../dm".
Output a list of groups of duplicate file paths. For a group of duplicate files, the full path of a file is "root/d1/d2/.../dm/f_name.txt".
Example 1:
Input: paths = ["root/a 1.txt(abcd) 2.txt(efgh)","root/c 3.txt(abcd)","root/c/d 4.txt(efgh)","root 4.txt(efgh)"]
Output: [["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
Example 2:
Input: paths = ["root/a 1.txt(abcd)","root/c 3.txt(efgh)"]
Output: []
-
1 <= paths.length <= 2 * 10^4 -
1 <= paths[i].length <= 3000 -
1 <= sum(paths[i].length) <= 5 * 10^5 -
paths[i]consist of English letters, digits,'/','.','(',')', and' '. -
You may assume no files or directories share the same name in the same directory.
-
You may assume each given directory info represents a unique directory. A single blank space separates the directory path and file info.
-
For each path string, split by space: first token is the directory, remaining tokens are
"filename(content)"entries. -
For each file token, parse out the filename (before
() and content (between(and)). Map{content: ["directory/filename", ...]}in a hashmap. -
Return all hashmap values where the list has more than one entry.
class Solution:
def findDuplicate(self, paths: List[str]) -> List[List[str]]:
# Q: File path structure: [root/d file(content)...], [root/e file(content)...]
# S: 1. Hashmap to save a path with [files]
# 2. Create hashmap to save every {file: [path...]}
# 3. Construct ans for each duplicated file from 2. hashmap
# TC: O(n*k), n is total files, k is average length of file, SC: O(n)
hashmap = collections.defaultdict(list) # {content: ["directory/file"]}
res = [] # [[content's list of files], ..., ]
# 1. Process data
# 2. Create hashmap{content: ["directory/fileName", ... ,]]
for path in paths:
# Process the string path
path = path.split()
direct = path[0]
# Process file from fileName
for file in path[1:]:
content = ""
fileName = ""
l, r = 0, len(file)-1
while l < r:
# Find content
while file[l] != '(':
fileName += file[l]
l += 1
l += 1
while file[l] != ')':
# Get content
content += file[l]
l += 1
# Save file into hashmap
hashmap[content].append(direct + "/" + fileName)
# 3. Loop over hashmap and find if len(lists) > 1, append into answer
for content, path in hashmap.items():
if len(path) > 1:
res.append(path)
return resTime Complexity: n = total number of files across all paths; each file's content is parsed once.
Space Complexity: