| 1 | +# Datatables Output Enhancements |
| 2 | + |
| 3 | +**Date**: 2024-12-19 |
| 4 | +**Version**: pygetpapers v2.0 (1.2.5a23) |
| 5 | +**Original Author**: petermr (Peter Murray-Rust) - July 5, 2025 |
| 6 | +**Enhancements**: Assistant - December 19, 2024 |
| 7 | + |
| 8 | +## Overview |
| 9 | + |
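| 10 | +This document summarizes two enhancements to the pygetpapers datatables output: clickable links to downloaded files and tooltips on column headers. A third issue, the handling of the `jats4r` utility directory, remains open pending a corpus redesign. |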
| 11 | + |
| 12 | +## Issues Addressed |
| 13 | + |
| 14 | +### 1. JATS4R Directory Problem |
| 15 | +**Problem**: The datatables code was treating the `jats4r` utility directory as if it were an article/corpus directory, causing confusion in table generation. |
| 16 | + |
| 17 | +**Status**: **REDESIGN NEEDED** - Corpus filtering requires an architectural redesign. The current implementation still treats every directory as a potential paper. |
| 18 | + |
| 19 | +**Note**: The jats4r directory issue will be addressed in a future redesign of the corpus concept. |
| 20 | + |
| 21 | +### 2. Missing File Links |
| 22 | +**Problem**: Users couldn't directly access downloaded files from the datatables interface. |
| 23 | + |
| 24 | +**Solution**: |
| 25 | +- **PDF Files**: Added clickable links with 📄 icon and tooltips |
| 26 | +- **XML Files**: Added clickable links with 📋 icon and tooltips |
| 27 | +- **HTML Files**: Added clickable links with 🌐 icon and tooltips (prioritizes enhanced HTML) |
| 28 | +- **Supplementary Files**: Added clickable links with 📁 icon showing file count |
| 29 | +- **External Links**: Enhanced DOI, PMID, and PMCID links with tooltips |
| 30 | + |
| 31 | +### 3. Missing Column Tooltips |
| 32 | +**Problem**: Column headers lacked explanations of what they represent. |
| 33 | + |
| 34 | +**Solution**: |
| 35 | +- Created `_create_datatable_with_tooltips()` method |
| 36 | +- Added comprehensive tooltips for all columns: |
| 37 | + - **Select**: "Select paper for bulk operations" |
| 38 | + - **ID**: "Unique paper identifier" |
| 39 | + - **Title**: "Paper title (truncated if >100 characters)" |
| 40 | + - **Authors**: "Author names (truncated if >50 characters)" |
| 41 | + - **Journal**: "Journal or repository name" |
| 42 | + - **DOI**: "Digital Object Identifier - click to open" |
| 43 | + - **PMID**: "PubMed ID - click to open in PubMed" |
| 44 | + - **PMCID**: "PubMed Central ID - click to open in PMC" |
| 45 | + - **Date**: "Publication date" |
| 46 | + - **XML**: "XML fulltext file - click to download/view" |
| 47 | + - **PDF**: "PDF fulltext file - click to download/view" |
| 48 | + - **Suppl**: "Supplementary files - click to browse" |
| 49 | + - **HTML**: "HTML version of fulltext - click to view" |
| 50 | + - **Enhanced**: "Enhanced HTML with semantic markup" |
| 51 | + - **Files**: "Total number of files in paper directory" |
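
The tooltip mapping above can be pictured as a dictionary from column name to text that ends up in the `title` attribute of each header cell. The following is a minimal sketch under that assumption; `COLUMN_TOOLTIPS` and `header_row` are hypothetical names for illustration and are not taken from `_create_datatable_with_tooltips()`, whose body is not shown in this document.

```python
from html import escape

# Subset of the tooltips listed above; the dict name is illustrative only.
COLUMN_TOOLTIPS = {
    "Title": "Paper title (truncated if >100 characters)",
    "DOI": "Digital Object Identifier - click to open",
    "PDF": "PDF fulltext file - click to download/view",
}


def header_row(columns, tooltips):
    """Build a <tr> of <th> cells, adding a title attribute where a tooltip exists."""
    cells = []
    for name in columns:
        tip = tooltips.get(name)
        # escape(..., quote=True) keeps '>' and '"' from breaking the attribute.
        attr = f' title="{escape(tip, quote=True)}"' if tip else ""
        cells.append(f"<th{attr}>{escape(name)}</th>")
    return "<tr>" + "".join(cells) + "</tr>"


print(header_row(["Title", "DOI", "Notes"], COLUMN_TOOLTIPS))
```

Columns without an entry (here `Notes`) simply render without a tooltip, so adding a column does not require touching the mapping first.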
| 52 | + |
| 53 | +## Technical Implementation |
| 54 | + |
| 55 | +### File Link Generation |
| 56 | +```python |
| 57 | +# Comprehensive file link creation with tooltips |
| 58 | +pdf_link = f'<a href="{paper["directory"]}/{pdf_file}" target="_blank" title="Open PDF file">📄 PDF</a>' |
| 59 | +xml_link = f'<a href="{paper["directory"]}/{xml_file}" target="_blank" title="Open XML file">📋 XML</a>' |
| 60 | +html_link = f'<a href="{paper["directory"]}/{html_file}" target="_blank" title="Open enhanced HTML file">🌐 Enhanced</a>' |
| 61 | +supp_link = f'<a href="{paper["directory"]}/supplementary/" target="_blank" title="Open supplementary files directory">📁 Suppl ({len(supp_files)})</a>' |
| 62 | +``` |
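
The f-strings above interpolate directory and file names directly into HTML, which can break if a name contains spaces, `#`, `&` or quotes. One possible hardening, shown here only as a sketch, is to percent-encode the path and escape the attribute values. The helper name `file_link` and the sample `paper` record are assumptions for illustration, not part of `datatables_integration.py`.

```python
from html import escape
from urllib.parse import quote


def file_link(directory: str, filename: str, label: str, tooltip: str) -> str:
    """Return an <a> element pointing at a file inside a paper directory."""
    # Percent-encode the path so spaces and '#' do not break the URL;
    # '/' is left alone so nested directories still resolve.
    href = quote(f"{directory}/{filename}", safe="/")
    return (
        f'<a href="{escape(href, quote=True)}" target="_blank" '
        f'title="{escape(tooltip, quote=True)}">{escape(label)}</a>'
    )


paper = {"directory": "PMC1234567"}  # hypothetical paper record
print(file_link(paper["directory"], "fulltext.pdf", "📄 PDF", "Open PDF file"))
```

A single helper also keeps the four link types consistent, since each differs only in file name, label and tooltip.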
| 63 | + |
| 64 | +## User Experience Improvements |
| 65 | + |
| 66 | +### 1. Direct File Access |
| 67 | +- Users can now click on file icons to directly open/download files |
| 68 | +- Visual indicators (📄, 📋, 🌐, 📁) make file types easily identifiable |
| 69 | +- Tooltips provide context for each file type |
| 70 | + |
| 71 | +### 2. Better Navigation |
| 72 | +- External links (DOI, PMID, PMCID) open in new tabs |
| 73 | +- File links open in new tabs to preserve table context |
| 74 | +- Supplementary file links show file count for quick assessment |
| 75 | + |
| 76 | +### 3. Enhanced Usability |
| 77 | +- Column tooltips appear on hover to explain data meaning |
| 78 | +- File availability is clearly indicated with clickable links or status icons |
| 79 | +- Responsive design maintains usability on different screen sizes |
| 80 | + |
| 81 | +## File Structure Changes |
| 82 | + |
| 83 | +### Modified Files |
| 84 | +- `pygetpapers/tools/datatables_integration.py`: |
| 85 | + - Updated `create_papers_table()` with file links and tooltips |
| 86 | + - Added `_create_datatable_with_tooltips()` method |
| 87 | + - Enhanced file link generation with visual indicators |
| 88 | + |
| 89 | +### New Features |
| 90 | +- **File Link Generation**: Automatic creation of clickable file links |
| 91 | +- **Tooltip System**: Comprehensive column header explanations |
| 92 | +- **Enhanced Metadata Support**: Support for additional repository types |
| 93 | + |
| 94 | +## Backward Compatibility |
| 95 | + |
| 96 | +- Existing functionality is unchanged |
| 97 | +- New features are additive and do not break existing workflows |
| 98 | +- Fallback to simple HTML tables if datatables are unavailable |
| 99 | +- Maintains support for all existing repository types |
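
As an illustration of the fallback mentioned above, a plain table can be produced with the standard library alone. This is a sketch, not the project's fallback code: the function name `simple_html_table` and the row format (a list of dicts) are assumptions, and how pygetpapers detects that datatables are unavailable is not described here.

```python
from html import escape


def simple_html_table(rows, columns):
    """Render a list of dicts as a plain <table>, with no JavaScript dependency."""
    head = "".join(f"<th>{escape(col)}</th>" for col in columns)
    body = []
    for row in rows:
        # Missing keys become empty cells; every value is escaped as text.
        cells = "".join(f"<td>{escape(str(row.get(col, '')))}</td>" for col in columns)
        body.append(f"<tr>{cells}</tr>")
    return f"<table><thead><tr>{head}</tr></thead><tbody>{''.join(body)}</tbody></table>"


rows = [{"ID": "PMC1234567", "Title": "Example & test"}]  # hypothetical data
print(simple_html_table(rows, ["ID", "Title", "DOI"]))
```

Because every cell is escaped, pre-built link markup would be shown as text; a real fallback would need to mark link cells as trusted HTML.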
| 100 | + |
| 101 | +## Future Considerations |
| 102 | + |
| 103 | +### Corpus Redesign Needed |
| 104 | +1. **Corpus Definition**: Redefine what constitutes a valid paper corpus |
| 105 | +2. **Directory Filtering**: Implement proper filtering of utility directories |
| 106 | +3. **Validation Logic**: Add robust validation for paper directories |
| 107 | +4. **Configuration**: Allow user-defined corpus rules |
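
One way the directory filtering and validation items above might look is sketched below. Everything here is an assumption made for illustration: the names `UTILITY_DIRS`, `PAPER_SUFFIXES`, `is_paper_directory` and `paper_directories` are hypothetical, and the rule "contains at least one file with a recognised suffix" is only one candidate definition of a paper directory, not the redesign itself.

```python
from pathlib import Path

# Assumed rule set; the source only names `jats4r` as a known utility directory.
UTILITY_DIRS = frozenset({"jats4r"})
PAPER_SUFFIXES = frozenset({".xml", ".pdf", ".html", ".json"})  # assumption


def is_paper_directory(path: Path, utility_dirs=UTILITY_DIRS) -> bool:
    """Accept a directory only if it is not a known utility directory
    and holds at least one file with a recognised suffix."""
    if not path.is_dir() or path.name.lower() in utility_dirs:
        return False
    return any(
        child.is_file() and child.suffix.lower() in PAPER_SUFFIXES
        for child in path.iterdir()
    )


def paper_directories(corpus_root: Path):
    """Return the subdirectories of corpus_root that pass the check, sorted by name."""
    return sorted(p for p in corpus_root.iterdir() if is_paper_directory(p))
```

Passing `utility_dirs` as a parameter leaves room for the user-defined corpus rules listed under Configuration.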
| 108 | + |
| 109 | +### Potential Enhancements |
| 110 | +1. **Bulk Operations**: Select multiple papers for batch processing |
| 111 | +2. **File Preview**: Inline preview of small files |
| 112 | +3. **Advanced Filtering**: Filter by file type, date range, etc. |
| 113 | +4. **Export Options**: Export filtered results to various formats |
| 114 | +5. **Custom Columns**: User-defined column configurations |
| 115 | + |
| 116 | +### Performance Optimizations |
| 117 | +1. **Lazy Loading**: Load file information on demand |
| 118 | +2. **Caching**: Cache file metadata for faster rendering |
| 119 | +3. **Pagination**: Handle very large corpora efficiently |
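
The caching idea could be prototyped with `functools.lru_cache` from the standard library, as in the sketch below. The function `file_summary` and its return shape are hypothetical, chosen only to illustrate the technique. The directory is scanned once per path and later calls reuse the stored result.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=4096)
def file_summary(directory: str):
    """Count files by suffix in one paper directory; the result is cached per path."""
    counts = {}
    for child in Path(directory).iterdir():
        if child.is_file():
            suffix = child.suffix.lower()
            counts[suffix] = counts.get(suffix, 0) + 1
    # Return an immutable tuple so callers cannot mutate the cached value.
    return tuple(sorted(counts.items()))
```

A cache like this goes stale when new files are downloaded, so `file_summary.cache_clear()` would need to be called after each download run.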
| 120 | + |
| 121 | +## Testing Recommendations |
| 122 | + |
| 123 | +1. **File Links**: Verify all file types generate correct links |
| 124 | +2. **Tooltips**: Test tooltip functionality across browsers |
| 125 | +3. **Large Corpora**: Test performance with 1000+ papers |
| 126 | +4. **Different Repositories**: Test with bioRxiv, Europe PMC, Crossref, etc. |
| 127 | +5. **Corpus Issues**: Monitor for jats4r and other utility directory problems |
| 128 | + |
| 129 | +## Version Attribution |
| 130 | + |
| 131 | +**Original Development**: |
| 132 | +- **Author**: petermr (Peter Murray-Rust) <peter.murray.rust@googlemail.com> |
| 133 | +- **First Commit**: July 5, 2025 - "added datatables and corpus FIRST PASS MAY HAV BUGS" (f92cfab) |
| 134 | +- **Major Enhancement**: July 11, 2025 - "Add datatables HTML export functionality with external CSS support" (84c12de) |
| 135 | +- **Final Tidy**: July 15, 2025 - "tidying and testing" (2e9f96d) |
| 136 | + |
| 137 | +**Enhancements**: |
| 138 | +- **Editor**: Assistant |
| 139 | +- **Date**: December 19, 2024 |
| 140 | +- **Purpose**: Add file links and implement tooltips for pygetpapers v2.0 (1.2.5a23) |
| 141 | +- **Changes**: Enhanced file accessibility, improved user experience |
| 142 | +- **Note**: Corpus filtering reverted - requires architectural redesign |