A grammar module to respect grammar rules when objects have a gender
Posted: Tue Apr 09, 2024 11:10 pm
This follows a question asked on the Lysandus' Tomb Discord: how can Daggerfall be made to adapt its texts to the grammatical gender of objects, in languages where objects have a gender? English grammar is very simple in this respect: objects have no gender, and adjectives change neither with gender nor with number, so English speakers who never learned another language may have no idea what I'm talking about. Let's take a very simple example.
In English, the phrase "the green %s" is always valid, whatever replaces %s: a table, gauntlets... it always works.
Now take my language, French, and translate this string as "le %s vert". This is valid only if %s is a masculine singular object: "le chapeau vert" is fine. But if the object is feminine, it must be written "la table verte"; "le table vert" contains two errors. And if %s is feminine plural, "le bottes vert" must become "les bottes vertes". Other languages such as German or Polish are even harder, because they have more than two genders and a more complex grammar.
Because English grammar is so simple, games designed in English that use placeholders such as %s, where the word that will replace %s is unknown at translation time, rarely run into trouble: few sentences can be broken by gender. The exceptions concern humans, with paired words such as "his/her" or "king/queen", and for these few cases developers invent a trick to pick one word or the other depending on the gender of the character concerned. But nothing is designed to account for the fact that objects have a gender in other languages, and so we translators are left defenseless against this problem.
The only game I know of whose developers decided to handle this, probably because they are not native English speakers but Turkish, is Mount&Blade: Warband. They implemented special expressions that change the output depending on gender, at least for humans. For example, the expression {xxx/yyy} means "write xxx if the main character is male and yyy if she is female". So an English sentence such as "Hello, you are beautiful today", said to the main character, can be translated as "Bonjour, vous êtes {beau/belle} aujourd'hui", and the game's text parser will write "vous êtes beau" or "vous êtes belle" depending on the gender of the main character. This is the simplest of these expressions; TaleWorlds implemented others in Warband for other grammatical cases, but not enough of them.
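To make the {xxx/yyy} idea concrete, here is a minimal sketch of how a text parser could resolve it. This is not TaleWorlds' code: the function name resolve_gender_choices and the player_is_female flag are hypothetical, and the sketch only covers the simple male/female choice described above.

```python
import re

# Matches a Warband-style choice such as {beau/belle}: the text before the
# slash is used for a male main character, the text after it for a female one.
GENDER_CHOICE = re.compile(r"\{([^{}/]*)/([^{}/]*)\}")


def resolve_gender_choices(text: str, player_is_female: bool) -> str:
    """Replace every {male/female} pair with the form matching the player."""
    return GENDER_CHOICE.sub(
        lambda m: m.group(2) if player_is_female else m.group(1), text
    )


line = "Bonjour, vous êtes {beau/belle} aujourd'hui"
print(resolve_gender_choices(line, player_is_female=False))  # Bonjour, vous êtes beau aujourd'hui
print(resolve_gender_choices(line, player_is_female=True))   # Bonjour, vous êtes belle aujourd'hui
```

The limit is visible in the signature: the only input is the player's gender, so this kind of expression cannot help with the gender of an object inserted through %s.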
Their masterpiece in terms of grammar is M&B II: Bannerlord. In this game, they developed the concept of so-called "tokens", expressions written as {.xxx}, which are processed by a routine called ProcessToken. There is one text-processing class per language implemented in the game, each with its own ProcessToken routine, and each language has its own set of tokens and its own code to process them. The advantage of this design is that the tokens are written only in the translated files, and that for languages Bannerlord does not support natively, the ProcessToken routine can be moved into an external DLL by using the Harmony library.
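The shape of that design (one processor per language, tokens present only in the translated text) can be sketched as follows. This is a hypothetical illustration, not Bannerlord's API: the TextProcessor class, the per-token handlers, the context dictionary and the "{.e}" token are all assumptions chosen for the example.

```python
import re
from typing import Callable, Dict, Mapping

# A token is written {.name} in the translated files.
TOKEN = re.compile(r"\{\.([^{}]+)\}")
Handler = Callable[[Mapping[str, object]], str]


class TextProcessor:
    """Resolves {.xxx} tokens for one language (hypothetical design)."""

    def __init__(self, handlers: Dict[str, Handler]):
        self.handlers = handlers

    def process_tokens(self, text: str, context: Mapping[str, object]) -> str:
        def replace(match: re.Match) -> str:
            handler = self.handlers.get(match.group(1))
            # Tokens this language does not know are dropped, never displayed.
            return handler(context) if handler else ""

        return TOKEN.sub(replace, text)


# One processor per language. English needs no tokens, so it has no handlers.
PROCESSORS: Dict[str, TextProcessor] = {
    "en": TextProcessor({}),
    "fr": TextProcessor({
        # Hypothetical token: feminine ending when the player character is female.
        "e": lambda ctx: "e" if ctx.get("player_female") else "",
    }),
}


def process_tokens(text: str, language: str, context: Mapping[str, object]) -> str:
    """Route a translated string to its language's processor (English as fallback)."""
    return PROCESSORS.get(language, PROCESSORS["en"]).process_tokens(text, context)


print(process_tokens("Vous êtes prêt{.e} ?", "fr", {"player_female": True}))  # Vous êtes prête ?
```

Because unknown tokens are stripped by the fallback processor, a translation containing tokens still displays cleanly in a build that has no grammar module for that language.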
Before TaleWorlds officially released the French language in release e1.70, the FrenchTextProcessor class already existed with an empty ProcessToken routine, so French translators like me, who had been translating Bannerlord since its very first early access release, wrote our own ProcessToken routine with tokens we invented. This led to a DLL called GrammaireFR, able to apply the French grammar rules needed to generate correct sentences in the French translation. And because this library, of which I am the most recent author, relies on tokens different from those used by TaleWorlds and contains entirely original code, it can be used in other games. In other words, I have in hand the code of a French grammar library that could be used for DFU and for any other game willing to call it.
The principle would be for the game to read a translated string from the translation files, store it in a string named, for example, StringToDisplay, and call ProcessToken(StringToDisplay) before displaying it. The grammar library would interpret the grammar tokens and return in StringToDisplay a new string in which the tokens have been replaced by correct text.
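A minimal sketch of where that single call could sit in a game's display path. It assumes the game fills %s before the call, so the library sees the finished sentence; display_translated, the translations dictionary and the show callback are hypothetical, and process_token here is only a placeholder that strips {.xxx} tokens, not the real GrammaireFR interface.

```python
import re
from typing import Callable, Dict, List

_TOKEN = re.compile(r"\{\.[^{}]*\}")


def process_token(string_to_display: str) -> str:
    """Placeholder for the grammar library's entry point (hypothetical).

    It only strips {.xxx} tokens; a real grammar library would resolve them."""
    return _TOKEN.sub("", string_to_display)


def display_translated(key: str, arg: str, translations: Dict[str, str],
                       show: Callable[[str], None]) -> None:
    # 1. Read the translated string and let the game fill in %s as usual.
    string_to_display = translations[key].replace("%s", arg, 1)
    # 2. The single line added to the game: hand the string to the grammar library.
    string_to_display = process_token(string_to_display)
    # 3. Display the result.
    show(string_to_display)


shown: List[str] = []
display_translated("want_green", "hat", {"want_green": "I want the green %s."}, shown.append)
print(shown)  # ['I want the green hat.']
```

English strings contain no tokens, so they pass through unchanged; only the translated files need to know that tokens exist.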
To be clearer, let's take an example.
English string: "I want the green %s."
Direct French translation: "Je veux le %s vert." -> very bad if %s is feminine and/or plural: it displays "Je veux le bottes vert." instead of "Je veux les bottes vertes."
French with tokens in the translation files: "Je veux {.le}%s vert{.es}."
French processed by GrammaireFR: "Je veux le chapeau vert." or "Je veux les bottes vertes.", OK!
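For {.le} and {.es} to work, the library has to know the gender and number of the word that replaces %s. The sketch below assumes one possible convention, which may not be GrammaireFR's: each translated noun carries a hypothetical marker such as {.m}, {.f} or {.fp} (feminine plural), and the game substitutes it for %s like any other word. The article agrees with the next noun, the adjective ending with the previous one; these are toy rules for illustration only.

```python
import re

# Hypothetical convention (not necessarily GrammaireFR's): each translated noun
# carries a marker for its gender and number, e.g. "{.m}chapeau", "{.f}table",
# "{.fp}bottes". The game substitutes it for %s like any other word.
MARKER = re.compile(r"\{\.([mf])(p?)\}")
TOKEN = re.compile(r"\{\.(\w+)\}")
VOWELS = set("aeiouàâéèêëîïôû")


def _agreement(marker):
    """Return (feminine, plural) for a noun marker; masculine singular if none."""
    if marker is None:
        return False, False
    return marker.group(1) == "f", marker.group(2) == "p"


def process_token(text: str) -> str:
    """Resolve the toy tokens {.le} and {.es} against the nearest noun marker."""
    def resolve(match):
        if MARKER.fullmatch(match.group(0)):
            return ""  # noun markers are consumed silently
        name = match.group(1)
        if name == "le":
            # The article agrees with the next noun in the sentence.
            marker = MARKER.search(text, match.end())
            feminine, plural = _agreement(marker)
            if plural:
                return "les "
            first = text[marker.end():marker.end() + 1].lower() if marker else ""
            if first in VOWELS:
                return "l'"  # elision: l'épée, l'arc
            return "la " if feminine else "le "
        if name == "es":
            # The adjective ending agrees with the last noun before it.
            previous = list(MARKER.finditer(text, 0, match.start()))
            feminine, plural = _agreement(previous[-1] if previous else None)
            return ("e" if feminine else "") + ("s" if plural else "")
        return match.group(0)  # leave unknown tokens visible for debugging

    return TOKEN.sub(resolve, text)


template = "Je veux {.le}%s vert{.es}."
for noun in ("{.m}chapeau", "{.f}table", "{.fp}bottes", "{.f}épée"):
    print(process_token(template.replace("%s", noun)))
# Je veux le chapeau vert.
# Je veux la table verte.
# Je veux les bottes vertes.
# Je veux l'épée verte.
```

Writing {.le} directly against %s, with no space, leaves room for elision ("l'épée"). Real French agreement is much richer (mute h, irregular adjectives such as beau/belle, plurals in -x), which a complete grammar library has to handle.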
The advantage is that the game needs no code specific to the grammar rules of any language other than English: you just add one call to ProcessToken() with the string read from the translation file, and the grammar library returns the correct string to display. And if you place that call in the game routine dedicated to displaying strings, you only add one line of code to the game and let the grammar library do the job before display.
GrammaireFR.dll in its current form must be modified to make it less dependent on Bannerlord content, but turning it into a routine usable by several different games is feasible without too much effort. In fact, when I developed GrammaireFR for Bannerlord, I was already thinking of a module that several games could use; for that, however, the developers of games that read translation files must agree to add one call before displaying a translated string containing grammar tokens. DFU could be the first.
You can watch a short video showing the result of GrammaireFR on the example above at this link:
https://drive.google.com/file/d/1PmZ6yu ... drive_link
I will later provide a short document listing all the tokens that exist in GrammaireFR and what they do.
Needless to say, grammar libraries could be written for other languages, with their own tokens and rules, as is done in Bannerlord, which remains a very good example of this. My code will be public, and translators of other languages may start from it to build their own grammar module.
If you're interested, tell me what you think.