Skip to content

Add support for special characters#1479

Open
agusaldasoro wants to merge 9 commits intomasterfrom
fix/more-types
Open

Add support for special characters#1479
agusaldasoro wants to merge 9 commits intomasterfrom
fix/more-types

Conversation

@agusaldasoro
Copy link
Copy Markdown
Collaborator

@agusaldasoro agusaldasoro commented Mar 19, 2026

This pull request refactors and improves the handling of table and column names in the SMT-LIB generation logic, ensuring consistent ASCII conversion and proper mapping between original database schema names and their SMT-LIB representations. The changes introduce a new SmtTable abstraction to encapsulate SMT-specific naming, update all relevant logic to use this abstraction, and fix issues related to non-ASCII schema names. The most important changes are as follows:

SMT-LIB Name Handling and Abstraction:

  • Introduced the SmtTable abstraction in SmtLibGenerator to encapsulate SMT-LIB-specific names and maintain a mapping between original schema names and their ASCII-converted SMT-LIB counterparts, ensuring consistent handling of table and column names throughout the SMT generation process. [1] [2]

Consistent ASCII Conversion:

  • Updated all relevant code paths to consistently use ASCII-converted names for SMT-LIB symbols, including in table/column references, constraints, and value retrieval, to avoid issues with non-ASCII schema names. [1] [2] [3]

Constraint Generation Refactor:

  • Refactored the logic for generating table, key, and column constraints (unique, check, primary key, foreign key, boolean, and timestamp) to use the new SmtTable abstraction and ASCII-converted names, replacing direct access to the original schema DTOs. [1] [2] [3] [4] [5] [6] [7] [8]

Query Condition Parsing:

  • Modified the logic for parsing query conditions to use the correct SMT-LIB table name, falling back to ASCII conversion if the table is not found in the mapping.

Gene Mapping Fixes:

  • Fixed gene mapping in SMTLibZ3DbConstraintSolver to resolve SMT-LIB column names back to their original database column names, ensuring correct gene creation and type handling for database operations.

These changes collectively improve the robustness and correctness of SMT-LIB constraint generation, especially for schemas with non-ASCII identifiers.

@agusaldasoro agusaldasoro marked this pull request as ready for review March 19, 2026 21:04
@arcuri82 arcuri82 removed their request for review March 19, 2026 21:08
@arcuri82
Copy link
Copy Markdown
Collaborator

hi @agusaldasoro ,

thx. first ask @jgaleotti for a review. once he has given it, and you address it, then I can be asked for my review :)

* Characters that decompose under NFD (Å→A, and other accented letters like é, ü, ñ)
* are handled by normalizing to NFD form and stripping the remaining non-ASCII combining marks.
*/
fun sanitizeSmtIdentifier(name: String): String {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this belongs to org.evomaster.core.utils.StringUtils, perhapss a more suitable method name could be transliterateIdentifier()

val genes = mutableListOf<Gene>()
for (columnName in columns.fields) {
var gene: Gene = IntegerGene(columnName, 0)
// columnName is the sanitized (ASCII) version from Z3; resolve back to original DB column name
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the right term for this should be sanitized

// Only add GetValueSMTNode for the mentioned tables
for (table in schema.tables) {
val tableNameLower = table.id.name.lowercase()
val sanitizedTableNameLower = sanitizeSmtIdentifier(tableNameLower)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the right term should be "sanitized"

val originalColumn = table.columns.firstOrNull {
SmtLibGenerator.sanitizeSmtIdentifier(it.name).equals(columnName, ignoreCase = true)
}
val actualColumnName = originalColumn?.name ?: columnName
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actualColumnName -> columnName?

@@ -153,36 +153,41 @@ class SMTLibZ3DbConstraintSolver() : DbConstraintSolver {
// Create the list of genes with the values
val genes = mutableListOf<Gene>()
for (columnName in columns.fields) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columnName -> smtColumnName?

private fun appendTableDefinitions(smt: SMTLib) {
for (table in schema.tables) {
val dataTypeName = "${StringUtils.capitalization(table.id.name)}Row"
val sanitizedTableName = convertToAscii(table.id.name)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizedTableName -> smtTableName?

*/
private fun appendUniqueConstraints(smt: SMTLib, table: TableDto) {
val tableName = table.id.name.lowercase()
val tableName = convertToAscii(table.id.name).lowercase()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tableName -> smtTableName?

*/
private fun appendPrimaryKeyConstraints(smt: SMTLib, table: TableDto) {
val tableName = table.id.name.lowercase()
val tableName = convertToAscii(table.id.name).lowercase()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tableName -> smtTableName ?

@jgaleotti jgaleotti requested a review from arcuri82 March 19, 2026 22:22
*/
private fun getColumnReference(tableName: String, columnName: String): String {
return "(${columnName.uppercase()} ${tableName.lowercase()}$rowIndex)"
return "(${convertToAscii(columnName).uppercase()} ${convertToAscii(tableName).lowercase()}$rowIndex)"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment to explain why this needs to be converted to ASCII

@@ -131,7 +132,7 @@ class SmtLibGenerator(private val schema: DbInfoDto, private val numberOfRows: I
* @return The corresponding SMT node.
*/
private fun parseCheckExpression(table: TableDto, condition: SqlCondition, index: Int): SMTNode {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not for this PR, but something to discuss as well with @jgaleotti , why is most of the code here relaying on TableDto and not org.evomaster.core.sql.schema.Table. DTOs are only meant as simple POJOs for transfer of data between processes, and not for business logic computation

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a draft PR with that change, I'm not sure if the interface for fun solve(schemaDto: DbInfoDto, sqlQuery: String, numberOfRows: Int): List<SqlAction> should remain the same or we should change it too.

private fun appendTimestampConstraints(smt: SMTLib) {
for (table in schema.tables) {
val tableName = table.id.name.lowercase()
val smtTableName = convertToAscii(table.id.name).lowercase()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are so many calls to this convertToAscii, which makes it more difficult to read, and possibly easy to make mistake in future if code needs changing and missing a needed call. maybe create a new SmtTable class with that that you need, and pass that around the code instead of doing the conversions each time?

fun convertToAscii(name: String): String {
val replaced = name
.replace('Ø', 'O').replace('ø', 'o')
.replace("Æ", "AE").replace("æ", "ae")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about Norwegian å and Å? or Swedish characters like Ä, ä, Ö and ö ? or other special characters in all other languages?
we cannot have ad-hoc solutions like this, only specific to our test data. Need a general solution.

@agusaldasoro agusaldasoro requested a review from arcuri82 April 7, 2026 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants