Commit ff62277

ChinmayHegde24 authored and mneethiraj committed
RANGER-5532: Ranger Performance Analyser: Python 3.11 Support and Enhancements (#900)
* RANGER-5532:Python 3.11 support to performance analyser tool
* RANGER-5532: Updated Doc Readme.md

(cherry picked from commit fd5ec7a)
1 parent 902964c commit ff62277

4 files changed

Lines changed: 42 additions & 16 deletions

ranger-tools/src/main/python/README.md

Lines changed: 14 additions & 8 deletions
@@ -22,7 +22,7 @@ under the License.
 Run the below command to generate pydocs for the package. Code base has doc strings describing the methods and classes from which the document is generated.

 ```bash
-> python -m pydoc -b
+> python3 -m pydoc -b
 ```

 Other README files can be found in the following directory:
@@ -41,11 +41,7 @@ or

 ## Client side Installation

-Use the package manager [pip](https://pip.pypa.io/en/stable/) to install requirements for running the performance tests.
-Ensure right path to requirements.txt is given.
-
 ```bash
-> pip install -r requirements.txt

 > apt-get install sshpass
 or
@@ -54,11 +50,21 @@ or


 ## Usage
+
 ```cd``` into ```python``` directory before executing below commands

+Ensure you have ```Python 3.10 or Python 3.11``` installed.
+It is recommended to create a virtual environment using this version and work inside it.
+
+Use the package manager [pip](https://pip.pypa.io/en/stable/) to install requirements for running the performance tests.
+
+```bash
+> pip install -r requirements.txt
+```
+
 First time usage or to reset the config files:
 ```bash
-> python setup_performance_analyzer.py
+> python3 setup_performance_analyzer.py
 ```

 Subsequent usage:
@@ -77,7 +83,7 @@ For single api testing (Command line arguments override config file values)
 usage:

 ```bash
-> python performance_analyzer.py --ranger_url <ranger_url> --calls <number of times to call api> --api <name of function of python client corresponding to api> --username <Auth username> --password <Auth password> --client_ip <client ip address> --ssh_host <ranger host to connect for ssh> --ssh_user <Server user e.g. root> --ssh_password <Server password>
+> python3 performance_analyzer.py --ranger_url <ranger_url> --calls <number of times to call api> --api <name of function of python client corresponding to api> --username <Auth username> --password <Auth password> --client_ip <client ip address> --ssh_host <ranger host to connect for ssh> --ssh_user <Server user e.g. root> --ssh_password <Server password>
 ```

 Example command:
@@ -92,4 +98,4 @@ System metrics on server side are collected using [vmstat](https://phoenixnap.co
 ## Warnings
 Ensure sudo/root privileges for the user on the server side for vmstat command.

-Ensure VPN is enables and client can communicate with the server. Else, in some cases stale values from previous successful run of the tool may be presented
+Ensure VPN is enabled and client can communicate with the server. Else, in some cases stale values from previous successful run of the tool may be presented.
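The README change above asks for Python 3.10 or 3.11 but the diff does not show any runtime check. A minimal sketch of such a guard follows; the names `SUPPORTED_MINORS` and `is_supported` are hypothetical and not part of the commit, and treating the two versions as a strict allow-list is an assumption.

```python
import sys

# Assumption: only the two versions named in the README are accepted.
SUPPORTED_MINORS = {(3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """Return True when the (major, minor) pair is in SUPPORTED_MINORS."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


# Report on the interpreter that is running this snippet.
print(f"Python {sys.version_info[0]}.{sys.version_info[1]} supported: {is_supported()}")
```

A script could call `is_supported()` before importing pandas or seaborn and exit with a clear message instead of failing later on an incompatible wheel.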

ranger-tools/src/main/python/performance_analyzer.py

Lines changed: 22 additions & 2 deletions
@@ -144,6 +144,16 @@ def performance_analyzer_main(argv_dict):

 with open("performance_report.csv", "w") as f:
     aligned_df.to_csv(f, index=False)
+    access_df['latency'] = pd.to_numeric(access_df['latency'], errors='coerce')
+    avg_latency_df = access_df.groupby('type')['latency'].agg(
+        count='count',
+        avg_latency_ms='mean',
+        min_latency_ms='min',
+        max_latency_ms='max',
+        median_latency_ms='median'
+    ).reset_index().round(2)
+    f.write("\n\nAverage Latency by API Type\n")
+    avg_latency_df.to_csv(f, index=False)

 with open("performance_report.html", "w") as f:
     cm = sns.light_palette("red", as_cmap=True)
@@ -154,8 +164,8 @@ def performance_analyzer_main(argv_dict):
 access_logs_timestamp_col_name='time', merge=False)
 print(aligned_df.to_string())

-statistics_df_access = aligned_df.describe()
-statistics_df_system = system_df.describe()
+statistics_df_access = aligned_df.describe().add_prefix("access__")
+statistics_df_system = system_df.describe().add_prefix("system__")

 statistics_df = pd.concat([statistics_df_access, statistics_df_system], axis=1)
 df_utils.rename_rows(statistics_df, {"25%": "25th_percentile", "50%": "median", "75%": "75th_percentile"})
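The commit does not say why the `access__` and `system__` prefixes were added. One plausible reason, assumed here, is that both frames can carry a column with the same name, which leaves duplicate column labels after `pd.concat(..., axis=1)`. The sketch below uses two made-up frames that share a `cpu` column to show the difference.

```python
import pandas as pd

# Hypothetical stand-ins for aligned_df and system_df; both contain "cpu".
aligned_df = pd.DataFrame({"latency": [10.0, 20.0, 30.0], "cpu": [1.0, 2.0, 3.0]})
system_df = pd.DataFrame({"cpu": [5.0, 6.0, 7.0], "mem": [50.0, 60.0, 70.0]})

# Without prefixes, the concatenated summary has two columns named "cpu".
plain = pd.concat([aligned_df.describe(), system_df.describe()], axis=1)
duplicated = plain.columns[plain.columns.duplicated()].tolist()

# With prefixes, every column label is unique and records where it came from.
prefixed = pd.concat(
    [
        aligned_df.describe().add_prefix("access__"),
        system_df.describe().add_prefix("system__"),
    ],
    axis=1,
)

print(duplicated)
print(list(prefixed.columns))
```

Unique labels also make later column selection unambiguous, since `prefixed["system__cpu"]` returns a single Series rather than a two-column frame.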
@@ -179,6 +189,16 @@ def performance_analyzer_main(argv_dict):

 with open(perf_globals.OUTPUT_DIR+"performance_report.csv", "w") as f:
     aligned_df.to_csv(f, index=False)
+    access_df['latency'] = pd.to_numeric(access_df['latency'], errors='coerce')
+    avg_latency_df = access_df.groupby('type')['latency'].agg(
+        count='count',
+        avg_latency_ms='mean',
+        min_latency_ms='min',
+        max_latency_ms='max',
+        median_latency_ms='median'
+    ).reset_index().round(2)
+    f.write("\n\nAverage Latency by API Type\n")
+    avg_latency_df.to_csv(f, index=False)

 with open(perf_globals.OUTPUT_DIR+"performance_report.html", "w") as f:
     cm = sns.light_palette("red", as_cmap=True)
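The latency summary added in both CSV-writing hunks can be tried in isolation. The sketch below runs the same `to_numeric` and named-aggregation calls on a small made-up `access_df` (the rows and API names are illustrative, not taken from the tool) and writes to an in-memory buffer instead of `performance_report.csv`.

```python
import io

import pandas as pd

# Hypothetical access-log rows; latency arrives as strings and one is malformed.
access_df = pd.DataFrame({
    "type": ["get_policy", "get_policy", "create_policy", "create_policy"],
    "latency": ["10", "20", "30", "n/a"],
})

# Non-numeric entries become NaN and are skipped by the aggregations below.
access_df["latency"] = pd.to_numeric(access_df["latency"], errors="coerce")

summary = access_df.groupby("type")["latency"].agg(
    count="count",
    avg_latency_ms="mean",
    min_latency_ms="min",
    max_latency_ms="max",
    median_latency_ms="median",
).reset_index().round(2)

# Append the summary as a second table in the same stream, as the diff does.
buf = io.StringIO()
access_df.to_csv(buf, index=False)
buf.write("\n\nAverage Latency by API Type\n")
summary.to_csv(buf, index=False)
print(buf.getvalue())
```

Note that a file written this way holds two tables with different headers, so a consumer that reads it back with a single `pd.read_csv` call would need to split it at the "Average Latency by API Type" marker first.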

ranger-tools/src/main/python/ranger_performance_tool/ranger_perf_utils/dataframe_utils.py

Lines changed: 2 additions & 2 deletions
@@ -50,8 +50,8 @@ def truncate_dataframe(self, data, start_time, end_time, timestamp_col_name='tim
     :param timestamp_col_name: Column name of the timestamp column
     :return: Pandas dataframe with truncated data
     """
-    mask = (pd.to_datetime(data[timestamp_col_name]) >= pd.to_datetime(start_time, infer_datetime_format=True)) & (
-        pd.to_datetime(data[timestamp_col_name]) <= pd.to_datetime(end_time, infer_datetime_format=True))
+    mask = (pd.to_datetime(data[timestamp_col_name]) >= pd.to_datetime(start_time)) & (
+        pd.to_datetime(data[timestamp_col_name]) <= pd.to_datetime(end_time))
     return data.loc[mask]

 def align_dataframes(self, system_logs_df, access_logs_df, system_logs_timestamp_col_name = 'time',
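Dropping `infer_datetime_format=True` matches pandas 2.x, where that argument is deprecated because a strict form of format inference is the default. A standalone sketch of the same inclusive time-window mask is shown below; it is written as a free function rather than the method in the diff, and the sample rows are made up.

```python
import pandas as pd


def truncate_dataframe(data, start_time, end_time, timestamp_col_name="time"):
    """Keep rows whose timestamp lies in [start_time, end_time], inclusive."""
    # Parse the column once instead of twice; the result is the same mask.
    timestamps = pd.to_datetime(data[timestamp_col_name])
    mask = (timestamps >= pd.to_datetime(start_time)) & (
        timestamps <= pd.to_datetime(end_time))
    return data.loc[mask]


# Hypothetical log rows for illustration.
logs = pd.DataFrame({
    "time": ["2024-01-01 10:00:00", "2024-01-01 10:05:00", "2024-01-01 10:10:00"],
    "cpu": [10, 20, 30],
})
window = truncate_dataframe(logs, "2024-01-01 10:05:00", "2024-01-01 10:10:00")
print(window["cpu"].tolist())
```

Both bounds are inclusive, so a row stamped exactly at `start_time` or `end_time` is kept.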

ranger-tools/src/main/python/requirements.txt

Lines changed: 4 additions & 4 deletions
@@ -7,16 +7,16 @@ idna==3.7
 Jinja2==3.1.6
 kiwisolver==1.4.4
 MarkupSafe==2.1.1
-matplotlib==3.5.3
-numpy==1.23.1
+matplotlib==3.10.8
+numpy==1.26.0
 packaging==21.3
-pandas==1.4.3
+pandas==2.1.0
 Pillow==10.4.0
 pyparsing==3.0.9
 python-dateutil==2.8.2
 pytz==2022.2
 requests==2.32.5
 scipy==1.13.0
-seaborn==0.11.2
+seaborn==0.13.2
 six==1.16.0
 urllib3==2.6.3
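Because four pins change at once, an existing virtual environment may still hold the old versions. The sketch below is one way to compare `name==version` pins against an environment; `parse_pins` and `find_mismatches` are hypothetical helpers, not part of the commit, and the environment is faked with a dictionary so the example runs without the packages installed.

```python
def parse_pins(requirements_text):
    """Map package name to pinned version for lines of the form name==version."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version):
    """Return {name: (pinned, installed)} for packages whose version differs.

    installed_version is a callable mapping a name to a version string or None.
    """
    mismatches = {}
    for name, pinned in pins.items():
        installed = installed_version(name)
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# The four pins updated by this commit.
pins = parse_pins("matplotlib==3.10.8\nnumpy==1.26.0\npandas==2.1.0\nseaborn==0.13.2\n")

# Fake environment: numpy is stale and seaborn is missing.
fake_env = {"matplotlib": "3.10.8", "numpy": "1.23.1", "pandas": "2.1.0"}
print(find_mismatches(pins, fake_env.get))
```

In a real environment the lookup could wrap `importlib.metadata.version` and return `None` when it raises `PackageNotFoundError`.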
