75 changes: 38 additions & 37 deletions README.md

## Introduction

`pagodo` automates Google searching for potentially vulnerable web pages and applications on the Internet. It replaces
manually performing Google dork searches with a web GUI browser.

There are two parts. The first is `ghdb_scraper.py`, which retrieves the latest Google dorks; the second is
`pagodo.py`, which leverages the information gathered by `ghdb_scraper.py`.

The core Google search library now uses the more flexible [yagooglesearch](https://github.com/opsdisk/yagooglesearch)
instead of [googlesearch](https://github.com/MarioVilas/googlesearch). Check out the [yagooglesearch
README](https://github.com/opsdisk/yagooglesearch/blob/master/README.md) for a more in-depth explanation of the library
differences and capabilities.

This version of `pagodo` also has native HTTP(S) and SOCKS5 proxy support, so there is no need to wrap it in a tool
like `proxychains4`. You can specify multiple proxies to use in a round-robin fashion by providing a comma-separated
string of proxies with the `-p` switch.

## What are Google dorks?

Offensive Security maintains the Google Hacking Database (GHDB) found here:
<https://www.exploit-db.com/google-hacking-database>. It is a collection of Google searches, called dorks, that can be
used to find potentially vulnerable boxes or other juicy info that is picked up by Google's search bots.
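As an illustration, a classic dork pattern (shown here generically, not necessarily verbatim from the GHDB) surfaces exposed directory listings:

```
intitle:"index of" "parent directory"
```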

## Terms and Conditions

The terms and conditions for `pagodo` are the same terms and conditions found in
[yagooglesearch](https://github.com/opsdisk/yagooglesearch#terms-and-conditions).

This code is supplied as-is and you are fully responsible for how it is used. Scraping Google Search results may
violate their [Terms of Service](https://policies.google.com/terms). Another Python Google search library had some
interesting information/discussion on it:
interesting information/discussion on it:

- [Original issue](https://github.com/aviaryan/python-gsearch/issues/1)
- [A response](https://github.com/aviaryan/python-gsearch/issues/1#issuecomment-365581431)
- Author created a separate [Terms and Conditions](https://github.com/aviaryan/python-gsearch/blob/master/T_AND_C.md)
- ...that contained a link to this [blog](https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/)

Google's preferred method is to use their [API](https://developers.google.com/custom-search/v1/overview).

## Installation

Scripts are written for Python 3.6+. Clone the git repository and install the requirements.

```bash
git clone https://github.com/opsdisk/pagodo.git
cd pagodo
python3 -m venv .venv # If using a virtual environment.
source .venv/bin/activate # If using a virtual environment.
pip install --upgrade pip setuptools
pip install -r requirements.txt
```

## ghdb_scraper.py

To start off, `pagodo.py` needs a list of all the current Google dorks. The repo contains a `dorks/` directory with the
current dorks from when `ghdb_scraper.py` was last run. It's advised to run `ghdb_scraper.py` to get the freshest data
before running `pagodo.py`. The `dorks/` directory contains:

- the `all_google_dorks.txt` file, which contains all the Google dorks, one per line
- the `all_google_dorks.json` file, which is the JSON response from GHDB
- individual category dork files

Dork categories:

```python
dorks["category_dict"].keys()
dorks["category_dict"][1]["category_name"]
```
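The snippet above pulls individual fields out of the parsed JSON. A rough sketch of walking every category follows; the JSON layout is inferred only from the fields referenced above, so treat the structure as an assumption:

```python
import json

# Hypothetical minimal payload mirroring only the fields used above (assumption).
sample = """
{
  "category_dict": {
    "1": {"category_name": "Footholds"},
    "2": {"category_name": "Files Containing Usernames"}
  }
}
"""

dorks = json.loads(sample)

# Collect every category name, whatever the category IDs are.
category_names = [c["category_name"] for c in dorks["category_dict"].values()]
```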

## pagodo.py

### Using pagodo.py as a script

```bash
python pagodo.py -d example.com -g dorks.txt
```

### Using pagodo as a module

### Wait time between Google dork searches

- `-i` - Specify the **minimum** delay between dork searches, in seconds. Don't make this too small, or your IP will
  get HTTP 429'd quickly.
- `-x` - Specify the **maximum** delay between dork searches, in seconds. Don't make this too big or the searches will
  take a long time.

The values provided by `-i` and `-x` are used to generate a list of 20 random wait times; one is randomly selected
between each Google dork search.
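The pool-of-delays behavior described above can be sketched roughly like this (function and variable names are illustrative, not pagodo's actual internals):

```python
import random

def build_delay_pool(minimum, maximum, count=20):
    """Pre-compute a pool of random delays bounded by the -i and -x values."""
    return [random.uniform(minimum, maximum) for _ in range(count)]

# Example: -i 38 -x 60
delay_pool = build_delay_pool(38.0, 60.0)

# Before each dork search, one delay is picked at random from the pool.
wait_seconds = random.choice(delay_pool)
```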

### Number of results to return

`-m` - The total max search results to return per Google dork. Each Google search request can pull back at most 100
results at a time, so if you pick `-m 500`, 5 separate search queries will have to be made for each Google dork search,
which will increase the amount of time to complete.
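Since each request is capped at 100 results, the number of paged queries per dork works out to a simple ceiling division (a quick illustration, not pagodo code):

```python
import math

def queries_per_dork(max_results, results_per_request=100):
    # Each Google request returns at most 100 results, so -m 500 costs 5 requests.
    return math.ceil(max_results / results_per_request)

print(queries_per_dork(500))  # 5 paged queries for -m 500
```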

### Save Output

`-o [optional/path/to/results.json]` - Save output to a JSON file. If you do not specify a filename, a datetimestamped
one will be generated.

`-s [optional/path/to/results.txt]` - Save URLs to a text file. If you do not specify a filename, a datetimestamped one
will be generated.

### Save logs

`--log [optional/path/to/file.log]` - Save logs to the specified file. If you do not specify a filename, the default
file `pagodo.py.log` at the root of the pagodo directory will be used.

## Google is blocking me!

Performing 7300+ search requests to Google as fast as possible will simply not work. Google will rightfully detect it
as a bot and block your IP for a set period of time. One solution is to use a bank of HTTP(S)/SOCKS proxies and pass
them to `pagodo`.

### Native proxy support
Pass a comma-separated string of proxies to `pagodo` using the `-p` switch.

```bash
python pagodo.py -g dorks.txt -p http://myproxy:8080,socks5h://127.0.0.1:9050,socks5h://127.0.0.1:9051
```

You could even decrease the `-i` and `-x` values because you will be leveraging different proxy IPs. The proxies passed
to `pagodo` are selected by round robin.
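Round-robin selection over the `-p` proxy list can be pictured as a simple cycle (a sketch of the behavior, not pagodo's implementation):

```python
import itertools

proxies = [
    "http://myproxy:8080",
    "socks5h://127.0.0.1:9050",
    "socks5h://127.0.0.1:9051",
]

# Round-robin: each successive dork search takes the next proxy,
# wrapping back to the first after the last.
proxy_cycle = itertools.cycle(proxies)
first_four = [next(proxy_cycle) for _ in range(4)]
```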

### proxychains4 support
Install `proxychains4`:

```bash
apt install proxychains4 -y
```

Edit the `/etc/proxychains4.conf` configuration file to round-robin the lookups through different proxy servers. In
the example below, 2 different dynamic socks proxies have been set up with different local listening ports (9050 and
9051).

```
socks4 127.0.0.1 9050
socks4 127.0.0.1 9051
```

Throw `proxychains4` in front of the `pagodo.py` script and each _request_ lookup will go through a different proxy (and
thus source from a different IP).

```bash
proxychains4 python pagodo.py -g dorks/all_google_dorks.txt -o [optional/path/to/results.json]
```

Note that this may not appear natural to Google if you:

1. Simulate "browsing" to `google.com` from IP #1
2. Make the first search query from IP #2
3. Simulate clicking "Next" to make the second search query from IP #3
4. Simulate clicking "Next" to make the third search query from IP #1

For that reason, using the built-in `-p` proxy support is preferred: as stated in the `yagooglesearch`
documentation, the provided proxy is used for the entire life cycle of the search to make it look more human, instead
of rotating proxies in the middle of a search.
2 changes: 1 addition & 1 deletion ghdb_scraper.py

```diff
-__version__ = "1.2.1"
+__version__ = "1.3.0"
```
2 changes: 1 addition & 1 deletion pagodo.py

```diff
-__version__ = "2.6.4"
+__version__ = "2.7.0"
```
4 changes: 2 additions & 2 deletions requirements.txt

```diff
-beautifulsoup4==4.13.4
-requests==2.32.3
+beautifulsoup4==4.13.5
+requests==2.32.5
 yagooglesearch==1.10.0
```