Skip to content

Add support for special characters#1479

Open
agusaldasoro wants to merge 5 commits intomasterfrom
fix/more-types
Open

Add support for special characters#1479
agusaldasoro wants to merge 5 commits intomasterfrom
fix/more-types

Conversation

@agusaldasoro
Copy link
Collaborator

@agusaldasoro agusaldasoro commented Mar 19, 2026

This pull request focuses on improving the handling of database table and column names containing non-ASCII characters (such as Norwegian letters Æ, Ø, Å) in the SMT-LIB generation and parsing logic. The main change is the introduction and consistent application of a sanitization function to ensure all identifiers are valid SMT-LIB symbols. There are also minor improvements to type mapping and debugging output.

Sanitization and identifier handling:

  • Added a new sanitizeSmtIdentifier function in SmtLibGenerator to convert non-ASCII table and column names into valid SMT-LIB identifiers, including explicit replacements for Norwegian characters and normalization for accented letters. This function is now used throughout SMT-LIB generation and parsing. (Fba25d2eL539R539)
  • Updated all relevant code in SmtLibGenerator and SMTConditionVisitor to use sanitized identifiers for table and column names in SMT-LIB datatypes, constants, constraints, and queries. This ensures consistent and correct SMT-LIB output and parsing for schemas with non-ASCII names. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]

SMT-LIB parsing and gene mapping:

  • When parsing Z3 output, updated logic to correctly map sanitized SMT-LIB column names back to the original database column names, ensuring genes are created with the correct identifiers and types.

Type mapping improvements:

  • Extended the SMT-LIB type mapping to support additional SQL types (INT, SERIAL, SMALLSERIAL, BIGSERIAL). (Fba25d2eL539R539)

Debugging and output:

  • Added a debug println to output the generated SMT-LIB string before invoking Z3, aiding in troubleshooting.

@agusaldasoro agusaldasoro marked this pull request as ready for review March 19, 2026 21:04
@arcuri82 arcuri82 removed their request for review March 19, 2026 21:08
@arcuri82
Copy link
Collaborator

hi @agusaldasoro ,

thx. first ask @jgaleotti for a review. once he has given it, and you address it, then I can be asked for my review :)

* Characters that decompose under NFD (Å→A, and other accented letters like é, ü, ñ)
* are handled by normalizing to NFD form and stripping the remaining non-ASCII combining marks.
*/
fun sanitizeSmtIdentifier(name: String): String {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this belongs to org.evomaster.core.utils.StringUtils, perhapss a more suitable method name could be transliterateIdentifier()

val genes = mutableListOf<Gene>()
for (columnName in columns.fields) {
var gene: Gene = IntegerGene(columnName, 0)
// columnName is the sanitized (ASCII) version from Z3; resolve back to original DB column name
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the right term for this should be sanitized

// Only add GetValueSMTNode for the mentioned tables
for (table in schema.tables) {
val tableNameLower = table.id.name.lowercase()
val sanitizedTableNameLower = sanitizeSmtIdentifier(tableNameLower)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure the right term should be "sanitized"

val originalColumn = table.columns.firstOrNull {
SmtLibGenerator.sanitizeSmtIdentifier(it.name).equals(columnName, ignoreCase = true)
}
val actualColumnName = originalColumn?.name ?: columnName
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actualColumnName -> columnName?

@@ -153,36 +153,41 @@ class SMTLibZ3DbConstraintSolver() : DbConstraintSolver {
// Create the list of genes with the values
val genes = mutableListOf<Gene>()
for (columnName in columns.fields) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columnName -> smtColumnName?

private fun appendTableDefinitions(smt: SMTLib) {
for (table in schema.tables) {
val dataTypeName = "${StringUtils.capitalization(table.id.name)}Row"
val sanitizedTableName = convertToAscii(table.id.name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizedTableName -> smtTableName?

*/
private fun appendUniqueConstraints(smt: SMTLib, table: TableDto) {
val tableName = table.id.name.lowercase()
val tableName = convertToAscii(table.id.name).lowercase()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tableName -> smtTableName?

*/
private fun appendPrimaryKeyConstraints(smt: SMTLib, table: TableDto) {
val tableName = table.id.name.lowercase()
val tableName = convertToAscii(table.id.name).lowercase()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tableName -> smtTableName ?

@jgaleotti jgaleotti requested a review from arcuri82 March 19, 2026 22:22
*/
private fun getColumnReference(tableName: String, columnName: String): String {
return "(${columnName.uppercase()} ${tableName.lowercase()}$rowIndex)"
return "(${convertToAscii(columnName).uppercase()} ${convertToAscii(tableName).lowercase()}$rowIndex)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment to explain why this needs to be converted to ASCII

@@ -131,7 +132,7 @@ class SmtLibGenerator(private val schema: DbInfoDto, private val numberOfRows: I
* @return The corresponding SMT node.
*/
private fun parseCheckExpression(table: TableDto, condition: SqlCondition, index: Int): SMTNode {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe not for this PR, but something to discuss as well with @jgaleotti , why is most of the code here relaying on TableDto and not org.evomaster.core.sql.schema.Table. DTOs are only meant as simple POJOs for transfer of data between processes, and not for business logic computation

private fun appendTimestampConstraints(smt: SMTLib) {
for (table in schema.tables) {
val tableName = table.id.name.lowercase()
val smtTableName = convertToAscii(table.id.name).lowercase()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are so many calls to this convertToAscii, which makes it more difficult to read, and possibly easy to make mistake in future if code needs changing and missing a needed call. maybe create a new SmtTable class with that that you need, and pass that around the code instead of doing the conversions each time?

fun convertToAscii(name: String): String {
val replaced = name
.replace('Ø', 'O').replace('ø', 'o')
.replace("Æ", "AE").replace("æ", "ae")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about Norwegian å and Å? or Swedish characters like Ä, ä, Ö and ö ? or other special characters in all other languages?
we cannot have ad-hoc solutions like this, only specific to our test data. Need a general solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants