Skip to content

Escaping ':' in search queries, except if advanced=true#4207

Open
ineiti wants to merge 6 commits intoDSpace:mainfrom
c4dt:add_advanced_search
Open

Escaping ':' in search queries, except if advanced=true#4207
ineiti wants to merge 6 commits intoDSpace:mainfrom
c4dt:add_advanced_search

Conversation

@ineiti
Copy link
Copy Markdown

@ineiti ineiti commented Apr 16, 2025

References

DSpace/DSpace#9670

Description

Fixes DSpace/DSpace#9670 by escaping ':' in search queries, except if 'advanced == true'.
It does the following:

  • adds an 'advanced' field to the search-form.component, search-options.model, search.component
  • changes the search-options.model to escape ':' characters, unless 'advanced == true'
  • adds a checkbox to the search-form.component
  • adds one entry to the src/assets/i18n/en.json5

TODO:

  • add tests
  • make sure this looks good
  • add other translations

Instructions for Reviewers

Try to search for something which includes a colon at the end of the word.
Then click the 'advanced' checkbox and try to search using advanced search features.

Checklist

  • My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch).
  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes ESLint validation using npm run lint
  • My PR doesn't introduce circular dependencies (verified via npm run check-circ-deps)
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • My PR aligns with Accessibility guidelines if it makes changes to the user interface.
  • My PR uses i18n (internationalization) keys instead of hardcoded English text, to allow for translations.
  • My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
  • If my PR includes new libraries/dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR includes new features or configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

Comment thread src/app/shared/search/models/search-options.model.ts Fixed
@ineiti ineiti force-pushed the add_advanced_search branch from a8e8917 to ecb9e2f Compare April 16, 2025 16:04
Comment thread src/app/shared/search/models/search-options.model.ts Fixed
@kshepherd
Copy link
Copy Markdown
Member

@ineiti i like the idea of this a lot, just one tip:

  • in search-form.component.ts, you dont need the {{ }} interpolation when setting [attr.name] -- these angular attributes already expect a reference, so you can simplify to [attr.name]="('search.form.advanced' | translate )"

If you fix that, the other parse errors in the template should go away.

Regarding the advanced form control itself... probably worth discussing the best way to make this available without being intrusive. An extra tab up the top? If you go with the checkbox, where to put it, how to style it, etc

In some quick testing, I don't see a difference between checked=true and checked=false, but that part still seems WIP so I'll hold off for now!

@ineiti
Copy link
Copy Markdown
Author

ineiti commented Apr 17, 2025

@kshepherd OK, thanks. I added some more propagation of the advanced field. IIRC Angular, the [(ngModel)]="advanced" should automatically take care of setting the value when the checkbox gets activated. But I have to admit I'm not completely sure that I understood all the voyage of the parameters to the search engine :)

Lazy question: is there a place in the documentation which shows how to easily test the frontend with an example backend?

@jsicot
Copy link
Copy Markdown

jsicot commented Apr 17, 2025

Thanks a lot for this contribution, @ineiti . This issue has been quite disruptive for end users across DSpace-based repositories.

From the perspective of repository managers, this bug has a significant impact, as searching by known title is one of the most common behaviors among users. Unfortunately, many publication titles include colons (:), making this issue especially frustrating for non-technical users who often have no idea why their search returns no results unless they use quotes or escape characters,a workaround very few are aware of.

At the same time, I completely agree with preserving the possibility to search within specific metadata fields, as it remains an important feature for curators and advanced users. So finding a balance between simplicity for the general public and power for expert users is key here.

Regarding the UI proposal in this PR: rather than a simple checkbox, I would suggest introducing a toggle button below the search bar that would allow users to switch between a basic (or simple) search mode where colons would be escaped automatically and an expert mode that retains the full indexing syntax. I would avoid the term advanced, since some repositories already offer an "advanced search" with multiple fields and boolean operators, and we don't want to create confusion.

Design-wise, this toggle should ideally be compatible with both the main search bar on the homepage and the slightly different one displayed after an initial search (which includes the collection scoping dropdown). The toggle should remain unobtrusive, with a clear visual indicator when expert mode is active : something akin to the way ChatGPT activates "Search Mode" (web access) with a small switch and indicator could serve as inspiration.

search_expert_mode_enabled

Really looking forward to seeing how this evolves and happy to collaborate further or test the changes if helpful!

@kshepherd
Copy link
Copy Markdown
Member

@ineiti the default configuration should actually point you to the sandbox server, I think. Does it not work like that for you out of the box? You can also take a look at the instructions here: https://wiki.lyrasis.org/display/DSPACE/Try+out+DSpace+9#TryoutDSpace9-InstalltheUserInterfaceonly

If you run docker, the docker-compose files provided in the angular src also allow this

@kshepherd
Copy link
Copy Markdown
Member

@ineiti if this is still a work-in-progress you can give it draft status or add the work-in-progress label, then toggle back once you're ready for review.

@ineiti
Copy link
Copy Markdown
Author

ineiti commented Apr 17, 2025

@ineiti the default configuration should actually point you to the sandbox server, I think. Does it not work like that for you out of the box? You can also take a look at the instructions here: https://wiki.lyrasis.org/display/DSPACE/Try+out+DSpace+9#TryoutDSpace9-InstalltheUserInterfaceonly

If you run docker, the docker-compose files provided in the angular src also allow this

That's exactly what I was looking for! Thanks a lot :) As I consider having been Nerd Sniped by our admins, I'm trying to spend as little time on this as possible :)

@ineiti ineiti marked this pull request as draft April 17, 2025 11:53
@ineiti ineiti force-pushed the add_advanced_search branch from 60c0132 to f5fcc30 Compare April 17, 2025 12:39
@ineiti
Copy link
Copy Markdown
Author

ineiti commented Apr 17, 2025

OK, this is getting to a place where it starts to be useful... First manual tests seem to show this works. If I search on "vaccination impact:" (without the quotes) with 'Simple' search, it does find it. If I switch to 'Expert' search, it complains.

I'm a software engineer, so this is the best I can come up with:
image
And for expert mode:
image

Shamelessly copy/pasted from the internet.

TODO:

  • write a test or two
  • fix the e2e tests

@ineiti ineiti marked this pull request as ready for review April 19, 2025 19:26
@ineiti
Copy link
Copy Markdown
Author

ineiti commented Apr 19, 2025

Here goes a version which works locally. Unfortunately I was not able to get the e2e tests running locally. I followed the README and the github/workflow, but neither worked for me :( So I added some lines to one of the e2e tests, and a least on github it works.

Please let me know what you think of this PR, and if it also works in your tests.

@ineiti
Copy link
Copy Markdown
Author

ineiti commented May 2, 2025

@jsicot is there something I can do to advance this?

@tdonohue tdonohue added the component: Discovery related to discovery search or browse system label May 2, 2025
@tdonohue tdonohue moved this to 🙋 Needs Reviewers Assigned in DSpace 10.0 Release May 2, 2025
@tdonohue
Copy link
Copy Markdown
Member

tdonohue commented May 2, 2025

@ineiti : Unfortunately, this PR missed our deadline for new features in 9.0, which was back on March 28. So, I've had to move this to our 10.0 board (10.0 is due in May 2026).

That said, others are still welcome to review this, test it & give feedback, and we could always merge it immediately after 9.0 goes out the door, assuming it has positive feedback.

@kshepherd
Copy link
Copy Markdown
Member

I think this is a great prototype for an improvement that could use some extra discussion regarding the UX, to help us make the functionality really clear and to help plan for future improvements to the search forms

@ineiti
Copy link
Copy Markdown
Author

ineiti commented May 5, 2025

OK, thanks for the update - I'll keep an eye on my github notifications in case something comes up :)

@tdonohue
Copy link
Copy Markdown
Member

@ineiti : We discussed this briefly as a team in our weekly Developers Meeting today.

Overall, we agree this is a bug. But, there's some concerns about the design of adding a checkbox next to the searchbox, as that doesn't seem very user friendly. In order to help others to visualize the design, could you add a screenshot or two of the new look to this PR?

If this PR moves forward, we'd need to have more automated tests added. For instance, the new checkbox in the search-form.component.html doesn't have automated tests added into search-form.component.spec.ts. But, this could wait until we can find a design that people like better.

@tdonohue tdonohue moved this from 🙋 Needs Reviewers Assigned to 👀 Under Review in DSpace 10.0 Release Jul 24, 2025
@ineiti
Copy link
Copy Markdown
Author

ineiti commented Aug 11, 2025

@tdonohue - Sorry, been on holidays. Should catch up faster now :)

The screenshots are shown above in my comment from 17th of April:

Default value is this:

image

When you click in the "Simple" rectangle, it changes to:

image

Clicking in the "Expert" rectangle, it changes back to the previous.

This is based on 5 minutes Google search how to do such a thing, I'm sure that with Claude, or an actual UI engineer, you'll have a much better display. The CSS is here:

https://github.com/DSpace/dspace-angular/pull/4207/files#diff-6a79fca90fc96c74a79c3e7f17db7c418267fbae6c034d42ba6569e61ccb0b20R10

And I'd be happy to add some more tests, if we agree on a good UI for it.

@github-actions
Copy link
Copy Markdown

Hi @ineiti,
Conflicts have been detected against the base branch.
Please resolve these conflicts as soon as you can. Thanks!

@ineiti
Copy link
Copy Markdown
Author

ineiti commented Nov 12, 2025

Rebased on latest main - let me know if you need anything else.

@tdonohue
Copy link
Copy Markdown
Member

@ineiti : Apologies that we have not gotten a reviewer on this PR yet. However, we are just beginning a more detailed review process for 10.0 for all new feature PRs. If you can rebase this on latest main that would allow us to consider this for 10.0. Thanks!

@tdonohue tdonohue moved this from 👀 Under Review to 🙋 Needs Reviewers Assigned in DSpace 10.0 Release Feb 13, 2026
@ineiti ineiti force-pushed the add_advanced_search branch from d546337 to ce5309e Compare February 16, 2026 16:16
@ineiti
Copy link
Copy Markdown
Author

ineiti commented Feb 16, 2026

OK - that was easier than I thought... Let's see if it passes the tests!

@tdonohue tdonohue requested a review from atarix83 February 19, 2026 15:32
@tdonohue tdonohue moved this from 🙋 Needs Reviewers Assigned to 👀 Under Review in DSpace 10.0 Release Feb 19, 2026
@jlipka
Copy link
Copy Markdown
Contributor

jlipka commented Mar 12, 2026

In parallel with the review process carried out by the other contributors, I tested this feature locally against the DSpace Sandbox backend.
The behaviour mentioned works as expected for me 👍 : a colon in a search query no longer results in an empty search result when using "simple" mode.

Allow me to make two remarks:

  1. As this pull request specifically focuses on the colon character, it would be good to also encode other special characters, such as '?', and there may be others. Example item in the sandbox: 'Studies of the pathogenesis and immunology of attenuated mutants of Salmonella enterica var. Typhimurium: Lessons for human typhoid fever?"). The search only returns a result if the question mark is removed from the search query. Perhaps this case could be resolved in the same way as a quick win.

  2. Bootstrap 5 offers various UI elements. To maintain consistency, we could use elements such as these: https://getbootstrap.com/docs/5.1/forms/input-group/.

@saschaszott
Copy link
Copy Markdown
Contributor

saschaszott commented Mar 12, 2026

@ineiti , thanks for your contribution. Should the other Lucene special characters (like ?) in addition to the colon also be considered and automatically escaped in the “simple” search mode? See also the DSpace backend issue #9026.

@ineiti
Copy link
Copy Markdown
Author

ineiti commented Mar 13, 2026

@ineiti , thanks for your contribution. Should the other Lucene special characters (like ?) in addition to the colon also be considered and automatically escaped in the “simple” search mode? See also the DSpace backend issue #9026.

Yes, sounds reasonable. Quick question: the full list seems to be

+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /

Any idea how I have to escape the &&? Is it \&\& or is it \&&? Just to save me some testing time :)

ineiti added 5 commits March 13, 2026 10:28
Following up on the issue DSpace/DSpace#9670 - here is a first
proposal.

Per default, this sets the 'advanced' flag to false and will escape ':' characters in the search string.
Only with 'advanced = true' is the search string passed as-is.

TODO:
- add tests
- make sure this looks good
@ineiti ineiti force-pushed the add_advanced_search branch from ce5309e to d8d704b Compare March 13, 2026 09:44
@ineiti ineiti force-pushed the add_advanced_search branch from 59f810f to d170a8c Compare March 13, 2026 10:24
@ineiti
Copy link
Copy Markdown
Author

ineiti commented Mar 13, 2026

OK, here comes an implementation from Claude to escape all Lucene special characters - I looked through it, and it looks good, and the new tests make sense to me and pass locally and on CI/CD.

Claude decided to escape && as \&\& - let me know if this is OK.

@saschaszott
Copy link
Copy Markdown
Contributor

@ineiti , this is the official reference that lists all special characters in Lucene queries: https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/search/package-summary.html

@saschaszott
Copy link
Copy Markdown
Contributor

@ineiti , regarding escaping of two-character operators: I think both options are valid - you can use \&& or \&\&.

@saschaszott
Copy link
Copy Markdown
Contributor

I just noticed in the IndexAnalyzer configuration of the field type text, which is used for many search fields, that it also uses WordDelimiterGraphFilter. This filter removes certain characters from the token stream. For example, the search input foo && bar is turned into the token sequence foo bar when searching on an index field with the field type text.

As a result, it is not possible to perform an exact search here, even with escaping. Should we therefore use special index fields for the advanced search that use a field type which is not based on WordDelimiterGraphFilter?

@ineiti
Copy link
Copy Markdown
Author

ineiti commented Mar 13, 2026

@ineiti , this is the official reference that lists all special characters in Lucene queries: https://lucene.apache.org/core/10_0_0/core/org/apache/lucene/search/package-summary.html

I couldn't find the list in your URL - I used the following one:

https://lucene.apache.org/core/8_11_2/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters

@saschaszott
Copy link
Copy Markdown
Contributor

@ineiti
Copy link
Copy Markdown
Author

ineiti commented Mar 14, 2026

@saschaszott great - the list didn't change between version 8.11.2 and 10.0.0, so all should be well now.

Let me know if there is something else to do.

@tdonohue
Copy link
Copy Markdown
Member

tdonohue commented Apr 2, 2026

Unfortunately, this Feature PR missed our DSpace 10.0 "Feature PR Merge Deadline". As DSpace 10.0 is now under a "Feature Freeze", this ticket will be moved to our 11.0 release board in the hopes of completing this feature in the next release. If there are any questions, let us know.

@tdonohue tdonohue moved this to 🙋 Needs Reviewers Assigned in DSpace 11.0 Release Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: Discovery related to discovery search or browse system new feature

Projects

Status: 🙋 Needs Reviewers Assigned

Development

Successfully merging this pull request may close these issues.

Search Solr (DS 7.X) with title containing colon doesn't return results except with quotes or escaping or having a stop word after the colon

7 participants