Commit f4de946

v1.2.5a24: Fix datatables local file links and enhance documentation
- Fix relative path calculation for local file links in datatables
- Update style guide with climate examples, proper command flags, and root directory protection
- Add comprehensive datatables enhancements documentation
- Remove old pygetpapers.py from root directory (moved to pygetpapers/pygetpapers.py)
- Increment version to 1.2.5a24

The datatables now correctly link to local PDF, XML, and HTML files using proper relative paths (../../paper_id/filename).
1 parent 5655a76 commit f4de946

4 files changed

Lines changed: 441 additions & 301 deletions

File tree

docs/datatables-enhancements.md

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
1+
# Datatables Output Enhancements
2+
3+
**Date**: 2024-12-19
4+
**Version**: pygetpapers v2.0 (1.2.5a23)
5+
**Original Author**: petermr (Peter Murray-Rust) - July 5, 2025
6+
**Enhancements**: Assistant - December 19, 2024
7+
8+
## Overview
9+
10+
This document summarizes the enhancements made to the pygetpapers datatables output functionality to improve user experience.
11+
12+
## Issues Addressed
13+
14+
### 1. JATS4R Directory Problem
15+
**Problem**: The datatables code was treating the `jats4r` utility directory as if it were an article/corpus directory, causing confusion in table generation.
16+
17+
**Status**: **REDESIGN NEEDED** - Corpus filtering requires architectural redesign. Current implementation treats all directories as potential papers.
18+
19+
**Note**: The jats4r directory issue will be addressed in a future redesign of the corpus concept.
20+
21+
### 2. Missing File Links
22+
**Problem**: Users couldn't directly access downloaded files from the datatables interface.
23+
24+
**Solution**:
25+
- **PDF Files**: Added clickable links with 📄 icon and tooltips
26+
- **XML Files**: Added clickable links with 📋 icon and tooltips
27+
- **HTML Files**: Added clickable links with 🌐 icon and tooltips (prioritizes enhanced HTML)
28+
- **Supplementary Files**: Added clickable links with 📁 icon showing file count
29+
- **External Links**: Enhanced DOI, PMID, and PMCID links with tooltips
30+
31+
### 3. Missing Column Tooltips
32+
**Problem**: Column headers lacked explanations of what they represent.
33+
34+
**Solution**:
35+
- Created `_create_datatable_with_tooltips()` method
36+
- Added comprehensive tooltips for all columns:
37+
  - **Select**: "Select paper for bulk operations"
38+
  - **ID**: "Unique paper identifier"
39+
  - **Title**: "Paper title (truncated if >100 characters)"
40+
  - **Authors**: "Author names (truncated if >50 characters)"
41+
  - **Journal**: "Journal or repository name"
42+
  - **DOI**: "Digital Object Identifier - click to open"
43+
  - **PMID**: "PubMed ID - click to open in PubMed"
44+
  - **PMCID**: "PubMed Central ID - click to open in PMC"
45+
  - **Date**: "Publication date"
46+
  - **XML**: "XML fulltext file - click to download/view"
47+
  - **PDF**: "PDF fulltext file - click to download/view"
48+
  - **Suppl**: "Supplementary files - click to browse"
49+
  - **HTML**: "HTML version of fulltext - click to view"
50+
  - **Enhanced**: "Enhanced HTML with semantic markup"
51+
  - **Files**: "Total number of files in paper directory"
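
As a rough illustration of the tooltip idea described above, the sketch below renders table header cells with a `title` attribute and shortens long cell text. It is not the implementation of `_create_datatable_with_tooltips()`; the names `COLUMN_TOOLTIPS`, `header_row`, and `truncate` are hypothetical, and only a few of the tooltip texts listed above are reproduced.

```python
from html import escape

# Hypothetical mapping; the texts are copied from the tooltip list above.
COLUMN_TOOLTIPS = {
    "ID": "Unique paper identifier",
    "Title": "Paper title (truncated if >100 characters)",
    "DOI": "Digital Object Identifier - click to open",
}


def header_row(columns, tooltips=COLUMN_TOOLTIPS):
    """Return a <tr> of <th> cells; a cell gets a title attribute if a tooltip is known."""
    cells = []
    for name in columns:
        tip = tooltips.get(name)
        # Escape the tooltip so characters such as ">" or quotes cannot break the attribute.
        attr = f' title="{escape(tip, quote=True)}"' if tip else ""
        cells.append(f"<th{attr}>{escape(name)}</th>")
    return "<tr>" + "".join(cells) + "</tr>"


def truncate(text, limit):
    """Shorten text to at most `limit` characters, marking the cut with an ellipsis."""
    return text if len(text) <= limit else text[: limit - 1] + "…"
```

Browsers show the `title` attribute on hover without any extra JavaScript, which is one simple way to get the header tooltips described here.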
52+
53+
## Technical Implementation
54+
55+
### File Link Generation
56+
```python
57+
# Comprehensive file link creation with tooltips
58+
pdf_link = f'<a href="{paper["directory"]}/{pdf_file}" target="_blank" title="Open PDF file">📄 PDF</a>'
59+
xml_link = f'<a href="{paper["directory"]}/{xml_file}" target="_blank" title="Open XML file">📋 XML</a>'
60+
html_link = f'<a href="{paper["directory"]}/{html_file}" target="_blank" title="Open enhanced HTML file">🌐 Enhanced</a>'
61+
supp_link = f'<a href="{paper["directory"]}/supplementary/" target="_blank" title="Open supplementary files directory">📁 Suppl ({len(supp_files)})</a>'
62+
```
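
The snippet above builds each `href` from `paper["directory"]`. The commit message mentions links of the form `../../paper_id/filename`, so the following is a minimal sketch of how a path relative to the generated table could be computed using only the standard library. The function name `file_link`, its parameters, and the directory layout in the usage note are assumptions for illustration, not the project's actual code.

```python
import os
from html import escape
from urllib.parse import quote


def file_link(table_dir, file_path, label, title):
    """Build an <a> tag whose href points from table_dir to file_path.

    table_dir: directory that will contain the generated HTML table (assumed).
    file_path: path to the downloaded PDF, XML, or HTML file.
    """
    # Express the file location relative to the directory holding the table.
    rel = os.path.relpath(file_path, start=table_dir)
    # Use forward slashes in the URL and percent-encode spaces and similar characters.
    href = quote(rel.replace(os.sep, "/"))
    return (
        f'<a href="{href}" target="_blank" '
        f'title="{escape(title, quote=True)}">{escape(label)}</a>'
    )
```

For example, on a POSIX system `file_link("corpus/out/tables", "corpus/PMC123/fulltext.pdf", "📄 PDF", "Open PDF file")` produces an `href` of `../../PMC123/fulltext.pdf`. Escaping the tooltip and encoding the path guards against titles or filenames that contain quotes or spaces.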
63+
64+
## User Experience Improvements
65+
66+
### 1. Direct File Access
67+
- Users can now click on file icons to directly open/download files
68+
- Visual indicators (📄, 📋, 🌐, 📁) make file types easily identifiable
69+
- Tooltips provide context for each file type
70+
71+
### 2. Better Navigation
72+
- External links (DOI, PMID, PMCID) open in new tabs
73+
- File links open in new tabs to preserve table context
74+
- Supplementary file links show file count for quick assessment
75+
76+
### 3. Enhanced Usability
77+
- Column tooltips appear on hover to explain data meaning
78+
- File availability is clearly indicated with clickable links or status icons
79+
- Responsive design maintains usability on different screen sizes
80+
81+
## File Structure Changes
82+
83+
### Modified Files
84+
- `pygetpapers/tools/datatables_integration.py`:
85+
  - Updated `create_papers_table()` with file links and tooltips
86+
  - Added `_create_datatable_with_tooltips()` method
87+
  - Enhanced file link generation with visual indicators
88+
89+
### New Features
90+
- **File Link Generation**: Automatic creation of clickable file links
91+
- **Tooltip System**: Comprehensive column header explanations
92+
- **Enhanced Metadata Support**: Support for additional repository types
93+
94+
## Backward Compatibility
95+
96+
- All existing functionality remains unchanged
97+
- New features are additive and don't break existing workflows
98+
- Fallback to simple HTML tables if datatables are unavailable
99+
- Maintains support for all existing repository types
100+
101+
## Future Considerations
102+
103+
### Corpus Redesign Needed
104+
1. **Corpus Definition**: Redefine what constitutes a valid paper corpus
105+
2. **Directory Filtering**: Implement proper filtering of utility directories
106+
3. **Validation Logic**: Add robust validation for paper directories
107+
4. **Configuration**: Allow user-defined corpus rules
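
One possible starting point for the directory filtering and configuration items above is a simple name-based exclusion list, sketched below. This is only an illustration under stated assumptions: `UTILITY_DIRS` and `candidate_paper_dirs` are hypothetical names, and the redesign described in this document may well settle on a different rule for what counts as a paper directory.

```python
from pathlib import Path

# Hypothetical default exclusion list; jats4r is the utility directory named in this document.
UTILITY_DIRS = frozenset({"jats4r"})


def candidate_paper_dirs(corpus_dir, excluded=UTILITY_DIRS):
    """Return subdirectories of corpus_dir whose names are not in the excluded set."""
    return sorted(
        p
        for p in Path(corpus_dir).iterdir()
        if p.is_dir() and p.name not in excluded
    )
```

Passing a different `excluded` set is a crude form of user-defined corpus rules; a fuller design would probably validate directory contents rather than rely on names alone.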
108+
109+
### Potential Enhancements
110+
1. **Bulk Operations**: Select multiple papers for batch processing
111+
2. **File Preview**: Inline preview of small files
112+
3. **Advanced Filtering**: Filter by file type, date range, etc.
113+
4. **Export Options**: Export filtered results to various formats
114+
5. **Custom Columns**: User-defined column configurations
115+
116+
### Performance Optimizations
117+
1. **Lazy Loading**: Load file information on demand
118+
2. **Caching**: Cache file metadata for faster rendering
119+
3. **Pagination**: Handle very large corpora efficiently
120+
121+
## Testing Recommendations
122+
123+
1. **File Links**: Verify all file types generate correct links
124+
2. **Tooltips**: Test tooltip functionality across browsers
125+
3. **Large Corpora**: Test performance with 1000+ papers
126+
4. **Different Repositories**: Test with bioRxiv, Europe PMC, Crossref, etc.
127+
5. **Corpus Issues**: Monitor for jats4r and other utility directory problems
128+
129+
## Version Attribution
130+
131+
**Original Development**:
132+
- **Author**: petermr (Peter Murray-Rust) <peter.murray.rust@googlemail.com>
133+
- **First Commit**: July 5, 2025 - "added datatables and corpus FIRST PASS MAY HAV BUGS" (f92cfab)
134+
- **Major Enhancement**: July 11, 2025 - "Add datatables HTML export functionality with external CSS support" (84c12de)
135+
- **Final Tidy**: July 15, 2025 - "tidying and testing" (2e9f96d)
136+
137+
**Enhancements**:
138+
- **Editor**: Assistant
139+
- **Date**: December 19, 2024
140+
- **Purpose**: Add file links and implement tooltips for pygetpapers v2.0 (1.2.5a23)
141+
- **Changes**: Enhanced file accessibility, improved user experience
142+
- **Note**: Corpus filtering reverted - requires architectural redesign

docs/styleguide.md

Lines changed: 23 additions & 0 deletions
@@ -2,6 +2,29 @@
22

33
This document records coding and naming conventions for the pygetpapers project.
44

5+
## Query Examples
6+
7+
### STYLE: Use climate change examples for demonstrations and testing
8+
9+
- **Good**: `"climate change AND adaptation"`, `"global warming AND mitigation"`, `"carbon sequestration"`
10+
- **Bad**: `"cancer AND immunotherapy"`, `"artificial intelligence"`, `"machine learning"`
11+
12+
**Rationale**: Climate change is a universally relevant topic that demonstrates the tool's capabilities while being accessible to all users. It avoids medical or technical jargon that might not be familiar to all audiences.
13+
14+
### STYLE: Include full download flags in examples
15+
16+
- **Good**: `pygetpapers --query "climate change AND adaptation" --api europe_pmc --limit 5 -x -p --fulltext_html --datatables`
17+
- **Bad**: `pygetpapers --query "climate change AND adaptation" --limit 5`
18+
19+
**Rationale**: Examples should demonstrate the full capabilities by downloading XML (-x), PDF (-p), and generating HTML (--fulltext_html) to show local file links in datatables.
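
When examples are generated from scripts or tests, the full flag set can be assembled in one place so that no example silently drops a flag. The sketch below only builds the command string from the flags shown in the **Good** example above; the helper name `example_command` and its defaults are assumptions, and it does not run pygetpapers.

```python
import shlex


def example_command(query, api="europe_pmc", limit=5):
    """Return the full example command line as a shell-safe string."""
    args = [
        "pygetpapers",
        "--query", query,
        "--api", api,
        "--limit", str(limit),
        "-x",               # download XML
        "-p",               # download PDF
        "--fulltext_html",  # generate HTML
        "--datatables",     # build the datatables output
    ]
    # shlex.join quotes arguments that contain spaces, such as the query.
    return shlex.join(args)
```

Calling `example_command("climate change AND adaptation")` returns the same command as the **Good** example, with the query wrapped in single quotes by `shlex.join`.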
20+
21+
### STYLE: Never write any files to root directory of pygetpapers project without asking
22+
23+
- **Good**: Write files to appropriate subdirectories (examples/, temp/, docs/, etc.)
24+
- **Bad**: Creating files directly in the project root without explicit permission
25+
26+
**Rationale**: The root directory should remain clean and organized. All output files, examples, and temporary files should go to designated directories. This prevents clutter and maintains project structure integrity.
27+
528
## File Naming Conventions
629

730
### STYLE: All filenames should only have alphanumeric and underscores
