Fix image extraction newline bug in ImageExtractorNew#837
Conversation
📝 WalkthroughWalkthroughA minor logic change in image filename extraction that adds delimiter-based early termination. When accumulating image filename characters, the parser now stops appending upon encountering delimiters (':', '=', '|', or newline) instead of including them in the final substring. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala (1)
280-294: Consider extracting a shared delimiter predicate.The delimiter check is duplicated in two branches. Centralizing it reduces the chance of future branch drift regressions.
♻️ Proposed refactor
def totallyNotImageLinkRegex(section: String): ListBuffer[String] = { val charArray = section.reverse.toCharArray val iterator = charArray.iterator val sb = new mutable.StringBuilder() val images = ListBuffer[String]() var imageFound = false + `@inline` def isDelimiter(ch: Char): Boolean = + ch == ':' || ch == '=' || ch == '|' || ch == '\n' while (iterator.hasNext) { var c = iterator.next() @@ - } else if (iterator.hasNext) { - if (c != ':' && c != '=' && c != '|' && c != '\n') { + } else if (iterator.hasNext) { + if (!isDelimiter(c)) { sb.append(c) } @@ } else { - if (c != ':' && c != '=' && c != '|' && c != '\n') { + if (!isDelimiter(c)) { sb.append(c) } imageFound = false images += sb.reverse.toString() sb.clear()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala` around lines 280 - 294, The duplicated delimiter check (c != ':' && c != '=' && c != '|' && c != '\n') in ImageExtractorNew.scala should be extracted into a single helper predicate to avoid drift; add a private def or local function like isDelimiter(c: Char): Boolean (or isNotDelimiter if you prefer) near the surrounding method in the ImageExtractorNew object/class and replace both occurrences in the branch that appends to sb and the final-else branch with calls to that predicate (e.g., if (isNotDelimiter(c)) sb.append(c) else { ... }); update usages in the same method to use the helper and keep behavior identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala`:
- Around line 280-294: The duplicated delimiter check (c != ':' && c != '=' && c
!= '|' && c != '\n') in ImageExtractorNew.scala should be extracted into a
single helper predicate to avoid drift; add a private def or local function like
isDelimiter(c: Char): Boolean (or isNotDelimiter if you prefer) near the
surrounding method in the ImageExtractorNew object/class and replace both
occurrences in the branch that appends to sb and the final-else branch with
calls to that predicate (e.g., if (isNotDelimiter(c)) sb.append(c) else { ...
}); update usages in the same method to use the helper and keep behavior
identical.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7f7f8c3f-cd52-4883-aa50-d1bba5c1da2e
📒 Files selected for processing (1)
core/src/main/scala/org/dbpedia/extraction/mappings/ImageExtractorNew.scala



Description
The issue in ImageExtractorNew.scala caused image names extracted from sections to include a leading newline (\n). This happened because when iterator.hasNext() returned false at the last character, the code appended the character directly without checking if it was a newline.
Example input:
\nLRO_WAC_South_Pole_Mosaic.jpg|Previously extracted result:
\nLRO_WAC_South_Pole_Mosaic.jpgThe fix adds a condition to skip newline characters before appending them to the builder. This ensures the correct extraction:
LRO_WAC_South_Pole_Mosaic.jpgTesting
Due to Scala 2.11 compatibility issues with modern JDK (object java.lang.Object in compiler mirror not found), the full Maven test suite could not run.
Instead, a standalone test program replicating the extraction loop was used with the problematic input. The output correctly extracted:
Extracted Image: [LRO_WAC_South_Pole_Mosaic.jpg]confirming the fix works as expected.
Issue Link-: #748
Summary by CodeRabbit