Skip to content

Latest commit

 

History

History
103 lines (76 loc) · 4.78 KB

File metadata and controls

103 lines (76 loc) · 4.78 KB

609. Find Duplicate File in System (Medium)

Date and Time: Jun 10, 2026

Link: https://leetcode.com/problems/find-duplicate-file-in-system/


Question:

Given a list of directory info including the directory path, and all the files with contents in this directory, find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have the same content.

A single directory info string in the input list has the following format: "root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fk.txt(fk_content)"

It means there are k files in the directory "root/d1/d2/.../dm".

Output a list of groups of duplicate file paths. For a group of duplicate files, the full path of a file is "root/d1/d2/.../dm/f_name.txt".


Example 1:

Input: paths = ["root/a 1.txt(abcd) 2.txt(efgh)","root/c 3.txt(abcd)","root/c/d 4.txt(efgh)","root 4.txt(efgh)"]
Output: [["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Example 2:

Input: paths = ["root/a 1.txt(abcd)","root/c 3.txt(efgh)"]
Output: []


Constraints:

  • 1 <= paths.length <= 2 * 10^4

  • 1 <= paths[i].length <= 3000

  • 1 <= sum(paths[i].length) <= 5 * 10^5

  • paths[i] consist of English letters, digits, '/', '.', '(', ')', and ' '.

  • You may assume no files or directories share the same name in the same directory.

  • You may assume each given directory info represents a unique directory. A single blank space separates the directory path and file info.


Walk-through:

  1. For each path string, split by space: first token is the directory, remaining tokens are "filename(content)" entries.

  2. For each file token, parse out the filename (before () and content (between ( and )). Map {content: ["directory/filename", ...]} in a hashmap.

  3. Return all hashmap values where the list has more than one entry.


Python Solution:

class Solution:
    def findDuplicate(self, paths: List[str]) -> List[List[str]]:
        # Q: File path structure: [root/d file(content)...], [root/e file(content)...]
        # S: 1. Hashmap to save a path with [files]
        # 2. Create hashmap to save every {file: [path...]}
        # 3. Construct ans for each duplicated file from 2. hashmap

        # TC: O(n*k), n is total files, k is average length of file, SC: O(n)
        hashmap = collections.defaultdict(list)     # {content: ["directory/file"]}
        res = []    # [[content's list of files], ..., ]
        # 1. Process data
        # 2. Create hashmap{content: ["directory/fileName", ... ,]]
        for path in paths:
            # Process the string path
            path = path.split()
            direct = path[0]
            # Process file from fileName
            for file in path[1:]:
                content = ""
                fileName = ""
                l, r = 0, len(file)-1
                while l < r:
                    # Find content
                    while file[l] != '(':
                        fileName += file[l]
                        l += 1
                    l += 1
                    while file[l] != ')':
                        # Get content
                        content += file[l]
                        l += 1
                # Save file into hashmap
                hashmap[content].append(direct + "/" + fileName)

        # 3. Loop over hashmap and find if len(lists) > 1, append into answer
        for content, path in hashmap.items():
            if len(path) > 1:
                res.append(path)
        return res

Time Complexity: $O(n * k)$, n = total number of files across all paths; each file's content is parsed once.
Space Complexity: $O(n)$, for the hashmap storing all file paths grouped by content.


CC BY-NC-SABY: credit must be given to the creatorNC: Only noncommercial uses of the work are permittedSA: Adaptations must be shared under the same terms