Fix image extraction newline bug in ImageExtractorNew by anshuman9468 · Pull Request #837 · dbpedia/extraction-framework

anshuman9468 · 2026-03-04T16:05:11Z

Description

The issue in ImageExtractorNew.scala caused image names extracted from sections to include a leading newline (\n). This happened because when iterator.hasNext() returned false at the last character, the code appended the character directly without checking if it was a newline.

Example input:

\nLRO_WAC_South_Pole_Mosaic.jpg|

Previously extracted result:
\nLRO_WAC_South_Pole_Mosaic.jpg

The fix adds a condition to skip newline characters before appending them to the builder. This ensures the correct extraction:

LRO_WAC_South_Pole_Mosaic.jpg

Testing

Due to Scala 2.11 compatibility issues with modern JDK (object java.lang.Object in compiler mirror not found), the full Maven test suite could not run.

Instead, a standalone test program replicating the extraction loop was used with the problematic input. The output correctly extracted:

Extracted Image: [LRO_WAC_South_Pole_Mosaic.jpg]
confirming the fix works as expected.

Issue Link-: #748

Summary by CodeRabbit

Bug Fixes
- Refined image file name extraction to properly handle delimiter boundaries during the parsing process. The system now stops accumulating characters upon encountering colons, equal signs, pipes, and line breaks. This prevents unwanted trailing delimiters from being included in extracted file names, ensuring more accurate and consistent data extraction results overall.

coderabbitai · 2026-03-04T16:05:53Z

📝 Walkthrough

Walkthrough

A minor logic change in image filename extraction that adds delimiter-based early termination. When accumulating image filename characters, the parser now stops appending upon encountering delimiters (':', '=', '|', or newline) instead of including them in the final substring.

Changes

Cohort / File(s)	Summary
Image Extraction Logic `core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala`	Modified character accumulation in `totallyNotImageLinkRegex` to stop appending when delimiter characters are encountered, reducing unwanted trailing delimiters in extracted image filenames.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: fixing a newline handling bug in image extraction within ImageExtractorNew.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sonarqubecloud · 2026-03-04T16:06:31Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

🧹 Nitpick comments (1)

core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala (1)

280-294: Consider extracting a shared delimiter predicate.

The delimiter check is duplicated in two branches. Centralizing it reduces the chance of future branch drift regressions.

♻️ Proposed refactor

 def totallyNotImageLinkRegex(section: String): ListBuffer[String] = {
   val charArray = section.reverse.toCharArray
   val iterator = charArray.iterator
   val sb = new mutable.StringBuilder()
   val images = ListBuffer[String]()
   var imageFound = false
+  `@inline` def isDelimiter(ch: Char): Boolean =
+    ch == ':' || ch == '=' || ch == '|' || ch == '\n'
   while (iterator.hasNext) {
     var c = iterator.next()
@@
-      } else if (iterator.hasNext) {
-        if (c != ':' && c != '=' && c != '|' && c != '\n') {
+      } else if (iterator.hasNext) {
+        if (!isDelimiter(c)) {
           sb.append(c)
         }
@@
       } else {
-        if (c != ':' && c != '=' && c != '|' && c != '\n') {
+        if (!isDelimiter(c)) {
           sb.append(c)
         }
         imageFound = false
         images += sb.reverse.toString()
         sb.clear()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala`
around lines 280 - 294, The duplicated delimiter check (c != ':' && c != '=' &&
c != '|' && c != '\n') in ImageExtractorNew.scala should be extracted into a
single helper predicate to avoid drift; add a private def or local function like
isDelimiter(c: Char): Boolean (or isNotDelimiter if you prefer) near the
surrounding method in the ImageExtractorNew object/class and replace both
occurrences in the branch that appends to sb and the final-else branch with
calls to that predicate (e.g., if (isNotDelimiter(c)) sb.append(c) else { ...
}); update usages in the same method to use the helper and keep behavior
identical.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala`:
- Around line 280-294: The duplicated delimiter check (c != ':' && c != '=' && c
!= '|' && c != '\n') in ImageExtractorNew.scala should be extracted into a
single helper predicate to avoid drift; add a private def or local function like
isDelimiter(c: Char): Boolean (or isNotDelimiter if you prefer) near the
surrounding method in the ImageExtractorNew object/class and replace both
occurrences in the branch that appends to sb and the final-else branch with
calls to that predicate (e.g., if (isNotDelimiter(c)) sb.append(c) else { ...
}); update usages in the same method to use the helper and keep behavior
identical.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7f7f8c3f-cd52-4883-aa50-d1bba5c1da2e

📥 Commits

Reviewing files that changed from the base of the PR and between 892ddfe and 4ce6f1d.

📒 Files selected for processing (1)

core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala

jimkont

thanks @anshuman9468

Fix image extraction newline bug in ImageExtractorNew

4ce6f1d

coderabbitai Bot reviewed Mar 4, 2026

View reviewed changes

jimkont approved these changes Mar 5, 2026

View reviewed changes

jimkont merged commit 907552e into dbpedia:master Mar 5, 2026
5 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix image extraction newline bug in ImageExtractorNew#837

Fix image extraction newline bug in ImageExtractorNew#837
jimkont merged 1 commit into
dbpedia:masterfrom
anshuman9468:fix-image-extraction-newline

anshuman9468 commented Mar 4, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 4, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Uh oh!

sonarqubecloud Bot commented Mar 4, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

jimkont left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

anshuman9468 commented Mar 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Uh oh!

sonarqubecloud Bot commented Mar 4, 2026

Quality Gate passed

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

jimkont left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

anshuman9468 commented Mar 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 4, 2026 •

edited

Loading