Skip to content

Latest commit

Β 

History

History
111 lines (90 loc) Β· 3.5 KB

File metadata and controls

111 lines (90 loc) Β· 3.5 KB

How I optimised path searching operations

Our whole system is dependent on searching paths and directories of whether we need to watch the path or to ignore it.

We are not setting up watchers for individual files rather for a whole directory at each levels. But individual files can be ignored

Where we are actually using path

  1. Adding paths to watch
  2. ignoring paths during watching Isn't that simple storing lists and sets?
core_path = WatcherPath(
    "/home/arnab/Desktop/system-implementations/py-mon/src/core", ignore=["cache.txt","__pycache__",".venv"]
)
watcher = Watcher().add_path(core_path, test_path).add_event(RELOAD_EVENTS)
  • We can definitely do this. For each paths we can search for all the ignore paths and then determine whether that is ignored or not. A typical O(n*m) solution

Solving this with trie data structure

Current Check (O(N * M))

Currently, for each path in os.walk(), we iterate over all ignore paths and check if it starts with an ignored directory.

for dirpath, dirnames, _ in os.walk(base_path):
    for ignore_path in ignore_list:  # Loop over all ignore paths
        if dirpath.startswith(ignore_path):  # Check if it's ignored
            print(f"Ignored: {dirpath}")

Issue

For N directories and M ignore paths, we check every directory against every ignore path β†’ O(N * M) time complexity.


πŸš€ Trie-Based Approach (O(N * L))

How Trie Improves This?

A Trie organizes ignored paths hierarchically, allowing fast prefix lookups instead of scanning all ignore paths.

  • Instead of checking each path against all ignored paths,
  • We traverse the Trie once for each path, reducing comparisons.

πŸ“Œ Example

πŸ“‚ Given Directory Structure

/root/
 β”œβ”€β”€ logs/
 β”‚   β”œβ”€β”€ error.log
 β”‚   └── access.log
 β”œβ”€β”€ src/
 β”‚   β”œβ”€β”€ test/
 β”‚   β”‚   β”œβ”€β”€ test1.py
 β”‚   β”‚   └── test2.py
 β”‚   └── temp/
 β”‚       └── temp.txt
 └── data/

Ignore Paths

/root/logs
/root/src/test
/root/src/temp

Current Approach (O(N * M))

For every directory, we compare it against each ignore path:

Checking /root/logs  -> Compare with logs (Ignored βœ…)
Checking /root/src/test -> Compare with logs ❌, test βœ… (Ignored βœ…)
Checking /root/src/temp -> Compare with logs ❌, test ❌, temp βœ… (Ignored βœ…)
Checking /root/data  -> Compare with logs ❌, test ❌, temp ❌ (Not Ignored βœ…)

Every directory is checked against all ignored paths!


Trie-Based Approach (O(N * L))

πŸ“‚ Trie Structure

(root)
 β”œβ”€β”€ logs (*)
 └── src
     β”œβ”€β”€ test (*)
     └── temp (*)

πŸ” Traversal Using Trie

Instead of checking each directory against all ignore paths, we traverse:

Checking /root/logs -> logs is in Trie (Ignored βœ…)
Checking /root/src/test -> src β†’ test is in Trie (Ignored βœ…)
Checking /root/src/temp -> src β†’ temp is in Trie (Ignored βœ…)
Checking /root/data -> Not in Trie (Not Ignored βœ…)

βœ… Each directory is checked in O(L) time (L = path depth), avoiding multiple comparisons.


Complexity Comparison

Approach Comparisons Per Directory Worst-Case Complexity
Brute Force M (All ignore paths) O(N * M)
Trie Lookup L (Path depth) O(N * L)

πŸš€ Trie reduces comparisons drastically, making lookups much faster!