A grammar module to respect grammar rules when objects have a gender

Post by Daneel53 »

Following a question on the Lysandus' Tomb Discord: how can Daggerfall be made to adapt its texts to the gender of objects, for languages where objects have a gender? English has a very simple grammar here: objects have no gender, and adjectives do not change with gender or number, so English speakers who have never learned another language may have no idea what I'm talking about. Let's take a very simple example.

In English, the sentence "the green %s" is always valid, no matter what replaces %s: a table, gauntlets... it always works.
Now take my language, French. If I translate this string as "le %s vert", it is valid only if %s is a masculine singular noun: "le chapeau vert" is fine. But if the noun is feminine, it should be "la table verte"; "le table vert" is incorrect, with two errors. And if %s is feminine plural, "le bottes vert" should be "les bottes vertes". Other languages such as German or Polish are even harder because they have more than two genders and a more complex grammar.

Because English grammar is so simple, games designed in English that use variable expressions such as %s, where the word that will replace %s is unknown at translation time, have no problem: few English sentences raise gender issues. The only cases involve humans and word pairs such as "his/her" or "king/queen", and for those few cases developers add a trick that picks one word or the other depending on the gender of the person concerned. But they design nothing to account for the fact that objects have a gender in other languages. And so we translators are left helpless before this problem.

The only game I know of whose developers decided to take this into account, probably because they are not native English speakers but Turkish, is Mount&Blade: Warband. They implemented special expressions that change the result depending on gender, at least for humans. For example, the expression {xxx/yyy} means "write xxx if the main character is male and yyy if she is female". So an English sentence such as "Hello, you are beautiful today" said to the main character may be translated as "Bonjour, vous êtes {beau/belle} aujourd'hui", and the game's text parser will write "vous êtes beau" or "vous êtes belle" depending on the gender of the main character. This expression is the simplest one; TaleWorlds implemented others in Warband that handle other grammar cases in a similar way, but not enough of them.
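
A minimal sketch of how a {xxx/yyy} expression of this kind could be resolved. This is not Warband's actual parser: the function name, the regular expression and the boolean flag are assumptions made for illustration only.

```python
import re

# Matches {male_form/female_form}. The exact syntax accepted by Warband's real
# parser is not described in this thread, so this pattern is an assumption.
_CHOICE = re.compile(r"\{([^{}/]*)/([^{}/]*)\}")


def resolve_gender_choices(text, player_is_female):
    """Replace every {xxx/yyy} with xxx (male) or yyy (female)."""
    def pick(match):
        return match.group(2) if player_is_female else match.group(1)

    return _CHOICE.sub(pick, text)


line = "Bonjour, vous êtes {beau/belle} aujourd'hui"
print(resolve_gender_choices(line, player_is_female=False))
print(resolve_gender_choices(line, player_is_female=True))
```

The translator only writes both forms in the translated string; the choice is made at display time, once the gender of the main character is known.
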
Their masterpiece in terms of respecting grammar is M&B II: Bannerlord. In this game they developed the concept of so-called "tokens", expressions written as {.xxx} that are processed by a routine called ProcessToken. There are as many ProcessToken classes as languages implemented in the game, and each language has its own set of tokens and its own code to process them. The advantage of this design is that the tokens only have to be written in the translated files, and that, for languages Bannerlord does not translate natively, the ProcessToken routine can be moved out into a dll through the use of the Harmony library.

Before TaleWorlds officially launched French in release e1.70, the class FrenchTextProcessor, containing a void ProcessToken routine, already existed, so French translators like me, who had been translating Bannerlord since its very first early access release, created our own ProcessToken routine with tokens we invented. This led to a dll called GrammaireFR that implements the French grammar rules needed to generate correct sentences in the French translation. And because this library, of which I am the latest author, relies on different tokens than those used by TaleWorlds and has entirely original code, it may be used for other games. In other words, I have in hand the code of a French grammar library that could be used for DFU and any other game that would agree to call it.

The principle would be, in the game, to read a translated string from the translation files, store it in a string named, for example, StringToDisplay, and add a call to ProcessToken(StringToDisplay) before displaying it. The grammar library would interpret the grammar tokens and return in StringToDisplay a new string in which the grammar tokens have been replaced by correct text.

To make this clearer, let's take an example.

English string: "I want the green %s."
Direct French translation: "Je veux le %s vert." -> very bad if %s is feminine and/or plural: it displays "Je veux le bottes vert." instead of "Je veux les bottes vertes."
French with tokens in translation files: Je veux {.le}%s vert{.es}
French processed by GrammaireFR: "Je veux le chapeau vert." or "Je veux les bottes vertes.", OK!
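
The example above can be sketched as follows. This is not the GrammaireFR code: it assumes that %s has already been replaced by the game, that {.le} agrees with the noun that immediately follows it, that {.es} agrees with the last noun seen, and that genders come from a small hypothetical dictionary. Elision ("l'") and every other French rule are left out.

```python
import re

# Hypothetical mini-lexicon: noun -> (gender, number). How a real library
# learns the gender of a noun is not described here; this is an assumption.
LEXICON = {
    "chapeau": ("m", "s"),
    "table": ("f", "s"),
    "bottes": ("f", "p"),
}

ARTICLES = {("m", "s"): "le ", ("f", "s"): "la ",
            ("m", "p"): "les ", ("f", "p"): "les "}
ENDINGS = {("m", "s"): "", ("f", "s"): "e",
           ("m", "p"): "s", ("f", "p"): "es"}

# A token, followed by the word characters directly after it (possibly none).
_TOKEN = re.compile(r"\{\.(le|es)\}(\w*)")


def process_token(text):
    """Expand {.le} and {.es}; tokens and rules are illustrative only."""
    agreement = ("m", "s")  # default when no known noun has been seen yet

    def expand(match):
        nonlocal agreement
        token, next_word = match.group(1), match.group(2)
        if token == "le":
            # The article agrees with the noun that immediately follows it.
            agreement = LEXICON.get(next_word, ("m", "s"))
            return ARTICLES[agreement] + next_word
        # {.es}: the adjective ending agrees with the last noun seen.
        return ENDINGS[agreement] + next_word

    return _TOKEN.sub(expand, text)


template = "Je veux {.le}%s vert{.es}."
for noun in ("chapeau", "table", "bottes"):
    print(process_token(template % noun))
# Je veux le chapeau vert.
# Je veux la table verte.
# Je veux les bottes vertes.
```
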

The advantage is that you don't have to implement in the game any code dedicated to the grammar rules of any language other than English: you just add one call to ProcessToken() with the string read from the translation file, and the grammar library returns the correct string to display. And if you put that call in the game routine dedicated to displaying strings, you have just one line of code to add to the game, and the grammar library does the job before display.
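
On the game side, the single added call could look like the sketch below. The class TextDisplay, the function no_grammar and the stand-in processor are hypothetical names, not DFU code; the only point is that the display routine calls one pluggable function and knows nothing about grammar.

```python
from typing import Callable, List


def no_grammar(text: str) -> str:
    """Identity processor, for languages without a grammar library."""
    return text


class TextDisplay:
    """Stand-in for the game's single string-display routine (hypothetical)."""

    def __init__(self, process_token: Callable[[str], str] = no_grammar):
        self.process_token = process_token
        self.shown: List[str] = []  # kept only so the result can be inspected

    def show(self, translated: str) -> None:
        # The one added line: let the grammar library rewrite the string.
        final = self.process_token(translated)
        self.shown.append(final)
        print(final)


def fake_french(text: str) -> str:
    """Trivial stand-in for a real grammar library, for demonstration only."""
    return text.replace("{.le}", "les ").replace("{.es}", "es")


display = TextDisplay(process_token=fake_french)
display.show("Je veux {.le}bottes vert{.es}.")  # prints "Je veux les bottes vertes."
```

Swapping the language then means swapping the function passed to the display routine, without touching the rest of the game.
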

GrammaireFR.dll as it is today must be modified to become less dependent on Bannerlord content, but turning it into a routine that several different games can use is feasible without too much effort. In fact, when I developed GrammaireFR for Bannerlord, I was already thinking of a module that could be used by several games. For that, though, developers of games that read translation files must agree to add one call to their game before displaying a translated string containing grammar tokens. DFU may be the first.

You can watch a short video showing the result of GrammaireFR on the example above at this link:
https://drive.google.com/file/d/1PmZ6yu ... drive_link

I will later provide a short document that lists all the tokens that exist in GrammaireFR and what they do.

Needless to say, grammar libraries could be coded for other languages, with their own tokens and rules, as is done in Bannerlord, which remains a very good example of this. My code will be public, and translators of other languages may start from it to build their own grammar module.

If you're interested, tell me what you think. :)
In charge of Project French Daggerfall and DaggerfallSetup, dev. of DFTools in English.
French translator for many Warband mods and Bannerlord.

Post by pango »

Hi Daneel53,
We already discussed this topic elsewhere, so I'm glad you brought it here. Fascinating topic!
Of course, as a developer, it raises a lot of questions...

How can the library know the gender of %s, and whether it's plural? Genders (and numbers) are language-specific (for things, at least), so they cannot be injected by the game. I assume the library has to contain some dictionary to infer them, one that needs to handle all the nouns the game can throw at it.
And then, if there is more than one word parameter, how can you tell which word each token should agree with? Are there proximity heuristics, or a syntax to state it explicitly (say {s1.es} to agree with the gender of s1, etc.)? What if you have to agree with several nouns (s1=pantalon s2=veste "Un{.e} {s1} et un{.e} {s2} vert{s1,s2.e}")? Etc.
I'd be really curious to see what the API looks like.
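
To make the several-nouns case concrete, here is a small sketch of the French rule that any such syntax would have to apply: an adjective qualifying several nouns is plural, and masculine as soon as one of the nouns is masculine. The lexicon and function names are hypothetical, and nothing here describes how GrammaireFR handles the case.

```python
# Hypothetical lexicon: noun -> (gender, number).
LEXICON = {
    "pantalon": ("m", "s"),
    "veste": ("f", "s"),
    "bottes": ("f", "p"),
}

INDEFINITE = {("m", "s"): "un", ("f", "s"): "une",
              ("m", "p"): "des", ("f", "p"): "des"}
ENDINGS = {("m", "s"): "", ("f", "s"): "e",
           ("m", "p"): "s", ("f", "p"): "es"}


def combined_agreement(nouns):
    """Gender and number an adjective takes when it qualifies all the nouns."""
    genders = [LEXICON[n][0] for n in nouns]
    numbers = [LEXICON[n][1] for n in nouns]
    # Feminine only if every noun is feminine, masculine otherwise.
    gender = "f" if all(g == "f" for g in genders) else "m"
    # Plural if there are several nouns or if any of them is plural.
    number = "p" if len(nouns) > 1 or "p" in numbers else "s"
    return gender, number


def adjective(base, nouns):
    """Add the agreement ending to a regular adjective such as 'vert'."""
    return base + ENDINGS[combined_agreement(nouns)]


s1, s2 = "pantalon", "veste"
phrase = "{} {} et {} {} {}".format(
    INDEFINITE[LEXICON[s1]], s1,
    INDEFINITE[LEXICON[s2]], s2,
    adjective("vert", [s1, s2]),
)
print(phrase)  # un pantalon et une veste verts
```
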

On a related topic, I recently read in the AMA thread created by Julian Lefay on Reddit that this is a pet peeve of his, that he learned several languages in different countries, and that he worked on a Daggerfall translation subsystem that was sadly never released (here). Ah!
Mastodon: @pango@fosstodon.org
When a measure becomes a target, it ceases to be a good measure.
-- Charles Goodhart

Post by Daneel53 »

Hello pango,

Here is a link to a document that explains what GrammaireFR can do, what it cannot do and what it could do in the future:
https://bit.ly/3VRdAt2

This answers some of your questions, but not all. This conversation is to be continued... :)

Note: Sorry to non-French speakers: this file mainly comes from the Readme file of GrammaireFR.dll v2.0 as it was released three years ago. It was written for French translators and talks about French grammar, so it is in French. If I have time, I will try to produce an English translation of the document, but until then, DeepL is your friend. ;)

Post by Daneel53 »

I realized that GrammaireFR, as it was developed three years ago, is much more tightly linked to Bannerlord than I thought. Obviously I will have to change the logic of how it works and create additional tokens.

I am currently rewriting it so that it becomes a French grammar library that any game can use easily, and that existing games such as DFU can use with few modifications. In the process, I will switch the code comments to English so that others can modify the code to build their own language grammar library, if there are other fools ready for that.

So even if I don't give regular news here, I'm working on it. I should have something in hand in two or three weeks.

See you later, alligator... ;)

Post by pango »

I was still digesting the documentation, but one interesting point I read was that at times the library could need hints from the game to do a perfect job, such as whether some character is male or female, something that cannot always be guessed correctly from the name alone.
Given that we have control over the game engine, that's certainly something that could be done. Since the list of hints localization libraries may need is open-ended, I was thinking about sending some kind of "context" dictionary, something similar to a JSON object (even if the type is not JSON-related):

Code: Select all

{"Bedastyr Woodhouse":{"gender":"male", "count":1},"Vannayne Mooring":{"gender":"female","count":1}}
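
A rough sketch of how a grammar library might consume such a context, assuming it is passed alongside the string. The function names and the fallback heuristic are hypothetical; the idea shown is only that a hint from the game, when present, takes priority over the library's own guess.

```python
import json

# The context from the example above, as the game might send it.
context = json.loads(
    '{"Bedastyr Woodhouse":{"gender":"male", "count":1},'
    '"Vannayne Mooring":{"gender":"female","count":1}}'
)


def guess_gender_from_name(name):
    """Placeholder for the library's own heuristic (always answers 'male')."""
    return "male"


def resolve_gender(name, context):
    """Prefer the hint sent by the game; otherwise fall back to guessing."""
    hint = context.get(name, {})
    return hint.get("gender") or guess_gender_from_name(name)


print(resolve_gender("Vannayne Mooring", context))  # female (hint from the game)
print(resolve_gender("Unknown Traveler", context))  # male (fallback guess)
```

Because the context is an open dictionary, a library can ignore keys it does not understand, and the game can add new hints later without breaking existing libraries.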