Leetcode/questions/609.Find_Duplicate_File_in_System(Medium).md at main · JC01111/Leetcode

609. Find Duplicate File in System (Medium)

Date and Time: Jun 10, 2026

Link: https://leetcode.com/problems/find-duplicate-file-in-system/

Question:

Given a list of directory info including the directory path, and all the files with contents in this directory, find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have the same content.

A single directory info string in the input list has the following format: "root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fk.txt(fk_content)"

It means there are k files in the directory "root/d1/d2/.../dm".

Output a list of groups of duplicate file paths. For a group of duplicate files, the full path of a file is "root/d1/d2/.../dm/f_name.txt".

Example 1:

Input: paths = ["root/a 1.txt(abcd) 2.txt(efgh)","root/c 3.txt(abcd)","root/c/d 4.txt(efgh)","root 4.txt(efgh)"]
Output: [["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Example 2:

Input: paths = ["root/a 1.txt(abcd)","root/c 3.txt(efgh)"]
Output: []

Constraints:

1 <= paths.length <= 2 * 10^4
1 <= paths[i].length <= 3000
1 <= sum(paths[i].length) <= 5 * 10^5
paths[i] consist of English letters, digits, '/', '.', '(', ')', and ' '.
You may assume no files or directories share the same name in the same directory.
You may assume each given directory info represents a unique directory. A single blank space separates the directory path and file info.

Walk-through:

For each path string, split by space: first token is the directory, remaining tokens are "filename(content)" entries.
For each file token, parse out the filename (before () and content (between ( and )). Map {content: ["directory/filename", ...]} in a hashmap.
Return all hashmap values where the list has more than one entry.

Python Solution:

class Solution:
    def findDuplicate(self, paths: List[str]) -> List[List[str]]:
        # Q: File path structure: [root/d file(content)...], [root/e file(content)...]
        # S: 1. Hashmap to save a path with [files]
        # 2. Create hashmap to save every {file: [path...]}
        # 3. Construct ans for each duplicated file from 2. hashmap

        # TC: O(n*k), n is total files, k is average length of file, SC: O(n)
        hashmap = collections.defaultdict(list)     # {content: ["directory/file"]}
        res = []    # [[content's list of files], ..., ]
        # 1. Process data
        # 2. Create hashmap{content: ["directory/fileName", ... ,]]
        for path in paths:
            # Process the string path
            path = path.split()
            direct = path[0]
            # Process file from fileName
            for file in path[1:]:
                content = ""
                fileName = ""
                l, r = 0, len(file)-1
                while l < r:
                    # Find content
                    while file[l] != '(':
                        fileName += file[l]
                        l += 1
                    l += 1
                    while file[l] != ')':
                        # Get content
                        content += file[l]
                        l += 1
                # Save file into hashmap
                hashmap[content].append(direct + "/" + fileName)

        # 3. Loop over hashmap and find if len(lists) > 1, append into answer
        for content, path in hashmap.items():
            if len(path) > 1:
                res.append(path)
        return res

Time Complexity: $O(n * k)$, n = total number of files across all paths; each file's content is parsed once.
Space Complexity: $O(n)$, for the hashmap storing all file paths grouped by content.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

609. Find Duplicate File in System (Medium)

Question:

Constraints:

Walk-through:

Python Solution:

FilesExpand file tree

609.Find_Duplicate_File_in_System(Medium).md

Latest commit

History

609.Find_Duplicate_File_in_System(Medium).md

File metadata and controls

609. Find Duplicate File in System (Medium)

Question:

Constraints:

Walk-through:

Python Solution: