Commit 52ceaed

task(34647) ContentletFactory migration to OS (#34691)
### Proposed Changes

---

#### 🔧 Refactor: Separate Index Operations and Add OpenSearch Migration Foundation

#### Summary

This PR introduces a major architectural refactoring to separate index operations from the main content factory implementation and establishes the foundation for OpenSearch migration. This represents the beginning of a clean separation of concerns between content management and search engine operations.

#### 🚀 Key Changes

1. Index Operations Extraction & Abstraction
   - Extracted index operations from ESContentFactoryImpl into dedicated, specialized classes
   - Created ContentFactoryIndexOperations interface - pure contract for search engine agnostic operations
   - Implemented ContentFactoryIndexOperationsES - Elasticsearch-specific implementation
   - Implemented ContentFactoryIndexOperationsOS - OpenSearch-specific implementation (partial)
2. Search Engine Agnostic Domain Objects
   - Created immutable domain objects to eliminate direct library dependencies:
     - SearchHit - unified representation of individual search results
     - SearchHits - collection of search results with metadata
     - TotalHits - search count information with relation metadata
     - Relation - enum for count precision (exact vs estimate)
   - Factory methods for seamless conversion between Elasticsearch/OpenSearch types and domain objects
   - JSON serialization support for REST APIs and caching
3. Migration Documentation & Analysis Annotations
   - @IndexMetadata - Documents current migration state and index interaction patterns
   - @IndexLibraryIndependent - Enforces architectural purity for search engine agnostic APIs
   - ArchUnit tests to prevent library-specific dependencies in abstraction layers
4. Improved Code Quality
   - Comprehensive JavaDoc documentation with usage examples and architectural explanations
   - @inheritdoc annotations for implementation classes to maintain documentation consistency
   - Type-safe immutable objects using Immutables library

#### 🔄 Architecture Benefits

Before:

    ESContentFactoryImpl ──> Elasticsearch APIs (tightly coupled)

After:

    ContentFactory ──> ContentFactoryIndexOperations (interface)
                       ├── ContentFactoryIndexOperationsES
                       └── ContentFactoryIndexOperationsOS

#### ⚠️ Current Limitations (OpenSearch Implementation)

Incomplete Features:

- Scroll API implementation - Currently throws UnsupportedOperationException
- Missing test coverage for OpenSearch implementation
- Basic indexSearchScroll() - Uses simple pagination instead of true scroll functionality

Planned Next Steps:

1. Complete OpenSearch Scroll API implementation
2. Add comprehensive integration tests for OpenSearch operations
3. Implement dynamic selection mechanism between ES/OS implementations
4. Performance benchmarking and optimization

#### 🏗️ Architectural Impact

Separation of Concerns:

- Content operations remain in content factory classes
- Search operations isolated in specialized index operation classes
- Domain objects provide clean abstraction without vendor lock-in

Future Migration Benefits:

- Zero-downtime migration capability between search engines
- A/B testing different search engine implementations
- Unified API surface regardless of underlying search technology

#### 📋 Testing Status

- ✅ Elasticsearch operations - Fully functional with existing test coverage
- ⚠️ OpenSearch operations - Basic functionality implemented, tests needed
- ✅ Domain object conversions - Factory methods tested
- ✅ ArchUnit rules - Architectural constraints enforced

---

⚠️ Note: This is a foundational PR. OpenSearch implementation requires additional work for production readiness, particularly Scroll API functionality and comprehensive testing.

This PR fixes: #34647
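The interface/implementation split described in the commit message can be pictured with a small, self-contained sketch. It is not the PR's code: the type names ending in `Sketch` and the methods `count`, `scroll`, `indexCount`, and `firstScrollBatch` are hypothetical, and the in-memory document list stands in for a real index. Only the overall shape (one contract, an ES implementation, a partial OS implementation whose scroll path throws `UnsupportedOperationException`) is taken from the description.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the ContentFactoryIndexOperations contract.
interface IndexOperationsSketch {
    long count(String term);
    List<String> scroll(String term, int batchSize);
}

// Stand-in for the Elasticsearch-backed implementation: both operations are supported.
final class EsOperationsSketch implements IndexOperationsSketch {
    private final List<String> documents;

    EsOperationsSketch(List<String> documents) {
        this.documents = new ArrayList<>(documents);
    }

    @Override
    public long count(String term) {
        return documents.stream().filter(d -> d.contains(term)).count();
    }

    @Override
    public List<String> scroll(String term, int batchSize) {
        // Return at most batchSize matching documents.
        List<String> batch = new ArrayList<>();
        for (String d : documents) {
            if (d.contains(term) && batch.size() < batchSize) {
                batch.add(d);
            }
        }
        return batch;
    }
}

// Stand-in for the partial OpenSearch implementation: counting works, scroll does not yet.
final class OsOperationsSketch implements IndexOperationsSketch {
    private final List<String> documents;

    OsOperationsSketch(List<String> documents) {
        this.documents = new ArrayList<>(documents);
    }

    @Override
    public long count(String term) {
        return documents.stream().filter(d -> d.contains(term)).count();
    }

    @Override
    public List<String> scroll(String term, int batchSize) {
        throw new UnsupportedOperationException("Scroll is not implemented in this sketch");
    }
}

// The content factory depends only on the contract, never on a concrete engine.
final class ContentFactorySketch {
    private final IndexOperationsSketch indexOperations;

    ContentFactorySketch(IndexOperationsSketch indexOperations) {
        this.indexOperations = indexOperations;
    }

    long indexCount(String term) {
        return indexOperations.count(term);
    }

    List<String> firstScrollBatch(String term, int batchSize) {
        return indexOperations.scroll(term, batchSize);
    }
}
```

Because the factory only sees the interface, swapping engines is a constructor argument in this sketch; how the real code selects between ES and OS is listed as a planned next step and is not shown here.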
1 parent ec7e3f7 commit 52ceaed

26 files changed

Lines changed: 2658 additions & 926 deletions

dotCMS/src/enterprise/java/com/dotcms/enterprise/priv/ESSearchAPIImpl.java

Lines changed: 2 additions & 1 deletion
@@ -42,6 +42,7 @@
 import java.util.List;
 import java.util.Map;
 
+import static com.dotcms.content.elasticsearch.business.ContentFactoryIndexOperationsES.addBuilderSort;
 import static com.dotcms.content.elasticsearch.business.ESIndexAPI.INDEX_OPERATIONS_TIMEOUT_IN_MS;
 
 /**
@@ -339,7 +340,7 @@ private SearchResponse esSearchRaw(JSONObject jsonObject, boolean live, User use
         searchSourceBuilder.from(offset);
 
         if(UtilMethods.isSet(sortBy) ) {
-            ESContentFactoryImpl.addBuilderSort(sortBy, searchSourceBuilder);
+            addBuilderSort(sortBy, searchSourceBuilder);
         }
 
         request.source(searchSourceBuilder);

dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ContentFactoryIndexOperationsES.java

Lines changed: 474 additions & 0 deletions
Large diffs are not rendered by default.

dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentFactoryImpl.java

Lines changed: 110 additions & 778 deletions
Large diffs are not rendered by default.

dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentletAPIImpl.java

Lines changed: 9 additions & 10 deletions
@@ -48,6 +48,7 @@
 import com.dotcms.contenttype.transform.contenttype.ContentTypeTransformer;
 import com.dotcms.contenttype.transform.contenttype.StructureTransformer;
 import com.dotcms.contenttype.transform.field.LegacyFieldTransformer;
+import com.dotcms.contenttype.util.StoryBlockUtil;
 import com.dotcms.cost.RequestCost;
 import com.dotcms.cost.RequestPrices.Price;
 import com.dotcms.exception.ExceptionUtil;
@@ -58,8 +59,6 @@
 import com.dotcms.rendering.velocity.services.ContentletLoader;
 import com.dotcms.rendering.velocity.services.PageLoader;
 import com.dotcms.rest.AnonymousAccess;
-import com.dotcms.contenttype.util.StoryBlockUtil;
-import com.dotcms.util.JsonUtil;
 import com.dotcms.rest.api.v1.temp.DotTempFile;
 import com.dotcms.rest.api.v1.temp.TempFileAPI;
 import com.dotcms.storage.FileMetadataAPI;
@@ -70,6 +69,7 @@
 import com.dotcms.util.ConversionUtils;
 import com.dotcms.util.DotPreconditions;
 import com.dotcms.util.FunctionUtils;
+import com.dotcms.util.JsonUtil;
 import com.dotcms.util.ThreadContextUtil;
 import com.dotcms.util.xstream.XStreamHandler;
 import com.dotcms.variant.VariantAPI;
@@ -236,7 +236,6 @@
 import org.elasticsearch.action.search.SearchPhaseExecutionException;
 import org.elasticsearch.action.search.SearchResponse;
 import org.elasticsearch.search.SearchHit;
-import org.elasticsearch.search.SearchHits;
 
 
 /**
@@ -1677,20 +1676,20 @@ public List<ContentletSearch> searchIndex(String luceneQuery, int limit, int off
         }
 
         if (limit <= MAX_LIMIT) {
-            final SearchHits searchHits = contentFactory.indexSearch(queryWithPermissions, limit,
+            final com.dotcms.content.index.domain.SearchHits searchHits = contentFactory.indexSearch(queryWithPermissions, limit,
                     offset, sortBy);
             final PaginatedArrayList<ContentletSearch> list = new PaginatedArrayList<>();
-            list.setTotalResults(searchHits.getTotalHits().value);
+            list.setTotalResults(searchHits.totalHits().value());
 
-            for (final SearchHit searchHit : searchHits.getHits()) {
+            for (final com.dotcms.content.index.domain.SearchHit searchHit : searchHits.hits()) {
                 try {
-                    final Map<String, Object> sourceMap = searchHit.getSourceAsMap();
+                    final Map<String, Object> sourceMap = searchHit.sourceAsMap();
                     final ContentletSearch conWrapper = new ContentletSearch();
-                    conWrapper.setId(searchHit.getId());
-                    conWrapper.setIndex(searchHit.getIndex());
+                    conWrapper.setId(searchHit.id());
+                    conWrapper.setIndex(searchHit.index());
                     conWrapper.setIdentifier(sourceMap.get("identifier").toString());
                     conWrapper.setInode(sourceMap.get("inode").toString());
-                    conWrapper.setScore(searchHit.getScore());
+                    conWrapper.setScore(searchHit.score());
 
                     list.add(conWrapper);
                 } catch (Exception e) {
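
The hunk above swaps the Elasticsearch `SearchHits`/`SearchHit` types for the domain types in `com.dotcms.content.index.domain`, whose accessors drop the `get` prefix (`totalHits().value()`, `hits()`, `id()`, `index()`, `score()`, `sourceAsMap()`). The domain classes themselves are not shown in this diff, so the following is only a rough sketch of what immutable objects with that accessor surface could look like. It uses hand-written final classes rather than the Immutables library the PR mentions, and every type name ending in `Sketch` as well as the `EXACT`/`ESTIMATE` constant names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical constant names; the PR only says "exact vs estimate".
enum RelationSketch { EXACT, ESTIMATE }

final class TotalHitsSketch {
    private final long value;
    private final RelationSketch relation;

    TotalHitsSketch(long value, RelationSketch relation) {
        this.value = value;
        this.relation = relation;
    }

    long value() { return value; }
    RelationSketch relation() { return relation; }
}

final class SearchHitSketch {
    private final String id;
    private final String index;
    private final float score;
    private final Map<String, Object> source;

    SearchHitSketch(String id, String index, float score, Map<String, Object> source) {
        this.id = id;
        this.index = index;
        this.score = score;
        // Defensive copy so later changes to the caller's map do not leak in.
        this.source = Collections.unmodifiableMap(new LinkedHashMap<>(source));
    }

    String id() { return id; }
    String index() { return index; }
    float score() { return score; }
    Map<String, Object> sourceAsMap() { return source; }
}

final class SearchHitsSketch {
    private final List<SearchHitSketch> hits;
    private final TotalHitsSketch totalHits;

    SearchHitsSketch(List<SearchHitSketch> hits, TotalHitsSketch totalHits) {
        this.hits = Collections.unmodifiableList(new ArrayList<>(hits));
        this.totalHits = totalHits;
    }

    List<SearchHitSketch> hits() { return hits; }
    TotalHitsSketch totalHits() { return totalHits; }
}
```

With objects shaped like this, calling code such as `searchIndex` no longer needs any `org.elasticsearch` type to read hit data, which is the decoupling the commit message describes.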

dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ESContentletScroll.java

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@
  * }
  * </pre>
  *
- * @see ContentletFactory#createScrollQuery(String, com.liferay.portal.model.User, boolean, int)
+ * @see com.dotmarketing.portlets.contentlet.business.ContentletFactory#createScrollQuery(String, com.liferay.portal.model.User, boolean, int)
  */
 public interface ESContentletScroll extends AutoCloseable {
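
Since `ESContentletScroll` extends `AutoCloseable`, callers can drain it in a try-with-resources block so the scroll context is released even if a batch fails. The sketch below illustrates that consumption pattern only. `ScrollSketch`, `InMemoryScroll`, and `ScrollConsumer` are hypothetical stand-ins; the method names `nextBatch()`, `hasMoreResults()`, `getTotalHits()`, and `close()` mirror the implementation added in this commit, but the in-memory class pages over a list instead of calling Elasticsearch and uses a simpler "more results" rule than the real class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ESContentletScroll contract, using String instead of ContentletSearch.
interface ScrollSketch extends AutoCloseable {
    List<String> nextBatch();
    boolean hasMoreResults();
    long getTotalHits();

    @Override
    void close(); // narrowed so callers need not handle a checked exception
}

// In-memory implementation: pages through a fixed list in batches.
final class InMemoryScroll implements ScrollSketch {
    private final List<String> all;
    private final int batchSize;
    private int position = 0;
    private boolean closed = false;

    InMemoryScroll(List<String> all, int batchSize) {
        this.all = new ArrayList<>(all);
        this.batchSize = batchSize;
    }

    @Override
    public List<String> nextBatch() {
        int end = Math.min(position + batchSize, all.size());
        List<String> batch = new ArrayList<>(all.subList(position, end));
        position = end;
        return batch;
    }

    @Override
    public boolean hasMoreResults() {
        return position < all.size();
    }

    @Override
    public long getTotalHits() {
        return all.size();
    }

    @Override
    public void close() {
        closed = true; // the real class clears the server-side scroll context here
    }

    boolean isClosed() {
        return closed;
    }
}

final class ScrollConsumer {
    // Collects every batch and always closes the scroll, even on failure.
    static List<String> drain(ScrollSketch scroll) {
        List<String> collected = new ArrayList<>();
        try (ScrollSketch s = scroll) {
            while (s.hasMoreResults()) {
                collected.addAll(s.nextBatch());
            }
        }
        return collected;
    }
}
```

The loop condition relies on `hasMoreResults()` only; a caller of the real interface may also want to stop when `nextBatch()` returns an empty list.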

Lines changed: 248 additions & 0 deletions
@@ -0,0 +1,248 @@
+package com.dotcms.content.elasticsearch.business;
+
+import static com.dotcms.content.elasticsearch.business.ContentFactoryIndexOperationsES.addBuilderSort;
+import static com.dotcms.content.elasticsearch.business.ESIndexAPI.INDEX_OPERATIONS_TIMEOUT_IN_MS;
+
+import com.dotcms.content.elasticsearch.util.RestHighLevelClientProvider;
+import com.dotcms.content.index.domain.SearchHit;
+import com.dotcms.content.index.domain.SearchHits;
+import com.dotmarketing.common.model.ContentletSearch;
+import com.dotmarketing.exception.DotDataException;
+import com.dotmarketing.exception.DotRuntimeException;
+import com.dotmarketing.util.Config;
+import com.dotmarketing.util.Logger;
+import com.dotmarketing.util.PaginatedArrayList;
+import com.dotmarketing.util.UtilMethods;
+import com.liferay.portal.model.User;
+import io.vavr.Lazy;
+import io.vavr.control.Try;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import org.elasticsearch.action.search.ClearScrollRequest;
+import org.elasticsearch.action.search.SearchRequest;
+import org.elasticsearch.action.search.SearchResponse;
+import org.elasticsearch.action.search.SearchScrollRequest;
+import org.elasticsearch.client.RequestOptions;
+import org.elasticsearch.client.RestHighLevelClient;
+import org.elasticsearch.common.unit.TimeValue;
+import org.elasticsearch.search.builder.SearchSourceBuilder;
+import org.elasticsearch.search.sort.SortOrder;
+
+/**
+ * Private implementation of ESContentletScroll that encapsulates all ElasticSearch
+ * Scroll API logic in one place.
+ */
+class ESContentletScrollImpl implements ESContentletScroll {
+
+    private static final Lazy<Integer> SCROLL_KEEP_ALIVE_MINUTES = Lazy.of(() ->
+            Config.getIntProperty("ES_SCROLL_KEEP_ALIVE_MINUTES", 5));
+    private static final Lazy<Integer> SCROLL_BATCH_SIZE = Lazy.of(() ->
+            Config.getIntProperty("ES_SCROLL_BATCH_SIZE", 1000));
+
+    // State fields
+    private String scrollId;
+    private long totalHits = 0;
+    private boolean hasMoreResults = false;
+    private boolean firstBatchReturned = false;
+    private List<ContentletSearch> firstBatch;
+    private RestHighLevelClient esClient;
+    private ContentFactoryIndexOperationsES indexOperations;
+
+    /**
+     * Creates a new scroll query instance and initializes the scroll context.
+     * The first batch is fetched immediately and cached for the first {@link #nextBatch()} call.
+     *
+     * @param luceneQuery Lucene query string
+     * @param user User for permission checking (only used during initialization)
+     * @param respectFrontendRoles Whether to respect frontend roles (only used during initialization)
+     * @param batchSize Number of results to retrieve per batch
+     * @param sortBy Sort criteria (e.g., "title asc", "moddate desc")
+     * @throws DotRuntimeException if scroll initialization fails
+     */
+    ESContentletScrollImpl(final String luceneQuery, final User user, final boolean respectFrontendRoles,
+            final int batchSize, final String sortBy) {
+        this.esClient = RestHighLevelClientProvider.getInstance().getClient();
+        this.indexOperations = new ContentFactoryIndexOperationsES();
+
+        // Initialize scroll and fetch first batch
+        this.firstBatch = Try.of(() -> {
+            // Translate a query to ES format using the service
+            final TranslatedQuery translatedQuery = TranslatedQuery.translateQuery(luceneQuery, sortBy);
+            final String formattedQuery = LuceneQueryDateTimeFormatter
+                    .findAndReplaceQueryDates(translatedQuery.getQuery());
+
+            // Determine which index to query using the service
+            final String indexToHit = indexOperations.inferIndexToHit(luceneQuery);
+
+            // Build search request using the service
+            final SearchSourceBuilder sourceBuilder = indexOperations.createSearchSourceBuilder(formattedQuery, sortBy);
+            sourceBuilder.timeout(TimeValue.timeValueMillis(INDEX_OPERATIONS_TIMEOUT_IN_MS));
+            sourceBuilder.size(batchSize);
+
+            // Apply sorting using the service
+            applySorting(sortBy, sourceBuilder);
+
+            final SearchRequest searchRequest = new SearchRequest()
+                    .indices(indexToHit)
+                    .source(sourceBuilder)
+                    .scroll(TimeValue.timeValueMinutes(SCROLL_KEEP_ALIVE_MINUTES.get()));
+
+            // Execute initial search
+            final SearchResponse response = esClient.search(searchRequest, RequestOptions.DEFAULT);
+            this.scrollId = response.getScrollId();
+            final org.elasticsearch.search.SearchHits esSearchHits = response.getHits();
+
+            // Convert to domain SearchHits
+            final SearchHits searchHits = SearchHits.from(esSearchHits);
+            this.totalHits = Objects.requireNonNull(searchHits.totalHits()).value();
+
+            // Convert hits to ContentletSearch
+            final List<ContentletSearch> results = getContentletSearchFromSearchHits(searchHits);
+            this.hasMoreResults = (searchHits.hits() != null && !searchHits.hits().isEmpty());
+
+            Logger.debug(this.getClass(),
+                    () -> String.format("Scroll initialized: scrollId=%s, totalHits=%d, firstBatchSize=%d",
+                            scrollId, totalHits, results.size()));
+
+            return results;
+
+        }).getOrElseThrow(e -> {
+            if (e instanceof DotRuntimeException) {
+                return (DotRuntimeException) e;
+            }
+            return new DotRuntimeException("Error initializing scroll API: " + e.getMessage(), e);
+        });
+    }
+
+    @Override
+    public List<ContentletSearch> nextBatch() throws DotDataException {
+        // On the first call, return the cached first batch
+        if (!firstBatchReturned) {
+            firstBatchReturned = true;
+            Logger.debug(this.getClass(),
+                    () -> String.format("Returning first batch: size=%d", firstBatch.size()));
+            return firstBatch;
+        }
+
+        // No more results
+        if (!hasMoreResults) {
+            return new ArrayList<>();
+        }
+
+        // Fetch the next batch from the scroll
+        return Try.of(() -> {
+            final SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId)
+                    .scroll(TimeValue.timeValueMinutes(SCROLL_KEEP_ALIVE_MINUTES.get()));
+
+            final SearchResponse response = esClient.scroll(scrollRequest, RequestOptions.DEFAULT);
+            final org.elasticsearch.search.SearchHits esSearchHits = response.getHits();
+
+            // Convert to domain SearchHits
+            final SearchHits searchHits = SearchHits.from(esSearchHits);
+            final List<ContentletSearch> results = getContentletSearchFromSearchHits(searchHits);
+            this.hasMoreResults = (searchHits.hits() != null && !searchHits.hits().isEmpty());
+
+            Logger.debug(this.getClass(),
+                    () -> String.format("Scroll next batch: batchSize=%d, hasMore=%b",
+                            results.size(), hasMoreResults));
+
+            return results;
+
+        }).getOrElseThrow(e -> {
+            if (e instanceof DotDataException) {
+                return (DotDataException) e;
+            }
+            return new DotDataException("Error continuing scroll API: " + e.getMessage(), e);
+        });
+    }
+
+    @Override
+    public long getTotalHits() {
+        return totalHits;
+    }
+
+    @Override
+    public boolean hasMoreResults() {
+        // If we haven't returned the first batch yet and it has results, there are more
+        if (!firstBatchReturned && firstBatch != null && !firstBatch.isEmpty()) {
+            return true;
+        }
+        return hasMoreResults;
+    }
+
+    @Override
+    public void close() {
+        if (scrollId != null && esClient != null) {
+            Try.run(() -> {
+                final ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
+                clearScrollRequest.addScrollId(scrollId);
+                esClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
+                Logger.debug(this.getClass(), () -> "Cleared scroll context: " + scrollId);
+            }).onFailure(e ->
+                    Logger.error(this.getClass(), "Error clearing scroll context: " + e.getMessage(), e)
+            );
+            scrollId = null;
+        }
+    }
+
+
+    private List<ContentletSearch> getContentletSearchFromSearchHits(final SearchHits searchHits) {
+        PaginatedArrayList<ContentletSearch> list=new PaginatedArrayList<>();
+        list.setTotalResults(searchHits.totalHits().value());
+
+        for (SearchHit sh : searchHits.hits()) {
+            try{
+                Map<String, Object> sourceMap = sh.sourceAsMap();
+                ContentletSearch conwrapper= new ContentletSearch();
+                conwrapper.setId(sh.id());
+                conwrapper.setIndex(sh.index());
+                conwrapper.setIdentifier(sourceMap.get("identifier").toString());
+                conwrapper.setInode(sourceMap.get("inode").toString());
+                conwrapper.setScore(sh.score());
+
+                list.add(conwrapper);
+            }
+            catch(Exception e){
+                Logger.error(this,e.getMessage(),e);
+                throw e;
+            }
+
+        }
+        return list;
+    }
+
+    /**
+     * Applies sorting to the search source builder based on sortBy parameter.
+     */
+    private void applySorting(String sortBy, SearchSourceBuilder searchSourceBuilder) {
+        if (UtilMethods.isSet(sortBy)) {
+            sortBy = sortBy.toLowerCase();
+
+            if (sortBy.startsWith("score")) {
+                String[] sortByCriteria = sortBy.split("[,|\\s+]");
+                String defaultSecondarySort = "moddate";
+                SortOrder defaultSecondaryOrder = SortOrder.DESC;
+
+                if (sortByCriteria.length > 2) {
+                    defaultSecondaryOrder = sortByCriteria[2].equalsIgnoreCase("desc")
+                            ? SortOrder.DESC : SortOrder.ASC;
+                }
+                if (sortByCriteria.length > 1) {
+                    defaultSecondarySort = sortByCriteria[1];
+                }
+
+                searchSourceBuilder.sort("_score", SortOrder.DESC);
+                searchSourceBuilder.sort(defaultSecondarySort, defaultSecondaryOrder);
+            } else if (!sortBy.startsWith("undefined") && !sortBy.startsWith("undefined_dotraw")
+                    && !sortBy.equals("random")) {
+                addBuilderSort(sortBy, searchSourceBuilder);
+            }
+        } else {
+            searchSourceBuilder.sort("moddate", SortOrder.DESC);
+        }
+    }
+
+}
+
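
In `applySorting`, a `sortBy` value that starts with `score` is tokenized with the regex `[,|\\s+]`. Because the brackets make this a character class, the string is split at every single comma, pipe, whitespace character, or plus sign. The sketch below reproduces just that tokenizing and the secondary-sort selection in isolation so the behavior can be checked without Elasticsearch. `SortSpecSketch` and its methods are hypothetical helpers, and plain strings stand in for `SortOrder`.

```java
import java.util.Arrays;
import java.util.List;

final class SortSpecSketch {

    // Same regex as applySorting: splits at each comma, pipe, whitespace char, or plus.
    static List<String> tokens(String sortBy) {
        return Arrays.asList(sortBy.toLowerCase().split("[,|\\s+]"));
    }

    // Mirrors the secondary-sort selection; returns {field, order} as strings.
    static String[] secondarySort(String sortBy) {
        String[] criteria = sortBy.toLowerCase().split("[,|\\s+]");
        String field = "moddate"; // default secondary sort field
        String order = "desc";    // default secondary sort order

        if (criteria.length > 2) {
            order = criteria[2].equalsIgnoreCase("desc") ? "desc" : "asc";
        }
        if (criteria.length > 1) {
            field = criteria[1];
        }
        return new String[] {field, order};
    }
}
```

For example, `secondarySort("score,title asc")` yields the field `title` with order `asc`, and `secondarySort("score")` falls back to `moddate` with `desc`. Two adjacent separators, as in `"score, title"`, produce an empty token between them, so the token positions shift; whether callers ever pass such values is not something this diff shows.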
