Sicilian Learner's Dictionary -- Development Proposal

fissatu · 12 November 2017

Karl Farrugia said: ↑

May I suggest having a look at the Norwegian official dictionary online for inspiration on what can be done?

http://ordbok.uib.no/perl/ordbok.cgi?OPP=leie&ant_bokmaal=5&ant_nynorsk=5&nynorsk=+&ordbok=begge

If you take a look at this word, for example, and you go down to the verb and click on verb or v1, it opens a table with the full conjugation, which also shows the different forms of the verb in the infinitive and the different conjugated forms that are allowed, taken from different dialects. For each different meaning or use, there are a number of example phrases, especially useful for a learner to see what prepositions may be used after the verb, for example.

I understand that the situation in Sicilian is very different; however, it might give you some useful ideas that may be borrowed. For the benefit of those who may not be familiar with the linguistic situation in Norway, there are 2 written standards and one of them, Nynorsk, was devised in the 19th century with the aim of providing a written form that is closer to the dialectal varieties in the country, as opposed to the Danish influence seen in the other standard, Bokmål. There is no such thing as a standard spoken Norwegian, and everyone just uses their dialect in all situations. This is why you can have a situation like the example I gave where one verb may have 3 (in some cases even 6) different conjugation patterns and 2 (in some cases 4) different infinitive forms, and all of them are considered correct.

Good post, Karl. I am familiar with the language situation in Norway, and personally, I think there are good lessons there for us too.

Tim · 13 November 2017

I'm also quite fascinated by some of the attempts to unify the Gallo-Italic languages, particularly Michael Dallera's Lombard Restruiturad Stàndard (LoReS)... he's in the Facebook group but I'm not sure if he's here in the forum. I don't want to speak for his work and risk being incorrect on something, but it seems really cool.

paul · 13 November 2017

Tim said: ↑

I'm also quite fascinated by some of the attempts to unify the Gallo-Italic languages, particularly Michael Dallera's Lombard Restruiturad Stàndard (LoReS)... he's in the Facebook group but I'm not sure if he's here in the forum. I don't want to speak for his work and risk being incorrect on something, but it seems really cool.

We have a subforum for Gallo-Sicilian. If you're a speaker, you're welcome to open conversations there and introduce the relevant people to that area.

Eryk · 16 November 2017

Karl Farrugia said: ↑

May I suggest having a look at the Norwegian official dictionary online for inspiration on what can be done?

You certainly may! That's an excellent example.

Karl Farrugia said: ↑

http://ordbok.uib.no/perl/ordbok.cgi

Hmm ... They used Perl for their project. That's what I'm using. I wonder if we could borrow some of their libraries? ... According to their about page:

The Unit for Digital Documentation (EDD) at the University of Oslo designed the database format and adapted the dictionaries for web publication (from 1994). This responsibility now rests with the IT-department at the University of Bergen.

I clicked through the EDD's "Tjenester og verktøy", but I could not find any information about the database. But I did find a link to some more EDD dictionaries.

Cool stuff! Thank you for sharing this.

Eryk · 17 November 2017

Eryk said: ↑

Are all Sicilian verbs stem-changing verbs?

Yes. They are.

Here's how Dr. Cipolla defines the rule (Mparamu, p. 64):

Gaetano Cipolla said:

If the stem vowel is i, it changes to e as in: aspittari-aspettu,
If the stem vowel is u, it changes to o as in: allungari-allongu,
If the stem vowel is e as in sèntiri, it changes to i: sintèmu, sintìti,
If the stem vowel is o as in mòriri, it changes to u: murèmu-murìti.

I would define the rule differently. I would say:

All Sicilian verbs have an unstressed "stem" and a stressed "boot."

The infinitive reflects either the stem or the boot:

    stem + ari
    stem + iri (sc)
    boot + iri

The stem appears in all of the conjugations except:

    the present indicative -- 1st S., 2nd S., 3rd S., 3rd P.
    the imperative -- 2nd S.

The boot replaces the stem in those locations.

finiri -- stem: fin, boot: finìsc

finisciu __ finemu
finisci __ finiti
finisci __ finìscinu

sèntiri -- stem: sint, boot: sènt

sentu __ sintemu
senti __ sintiti
senti __ sèntinu

aspittari -- stem: aspitt, boot: aspètt

aspettu __ aspittamu
aspetti __ aspittati
aspetta __ aspèttanu

mòriri -- stem: mur, boot: mòr

moru __ muremu
mori __ muriti
mori __ mòrinu

allungari -- stem: allung, boot: allòng

allongu __ allungamu
allonghi __ allungati
allonga __ allònganu

parrari -- stem: parr, boot: pàrr

parru __ parramu
parri __ parrati
parra __ pàrranu

rispùnniri -- stem: rispunn, boot: rispùnn

rispunnu __ rispunnemu
rispunni __ rispunniti
rispunni __ rispùnninu

crìdiri -- stem: crid, boot: crìd

cridu __ cridemu
cridi __ criditi
cridi __ crìdinu
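A rough Python sketch of how the stem/boot pairs listed above relate to each other. For most of these verbs the stem can be recovered from the boot by dropping the stress mark and raising è to i and ò to u, which is Cipolla's rule read in the other direction. The helper name `stem_from_boot` and the mapping table are assumptions made for illustration; they are not taken from the scripts discussed in this thread. Finiri is the counterexample: its boot adds -isc-, so its stem cannot be derived this way.

```python
# Hypothetical helper: derive the unstressed stem from the stressed boot.
# Assumption: the boot carries one grave-accented vowel, and unstressed
# vowels reduce to a, i, u (so è -> i and ò -> u).
UNSTRESS = {"à": "a", "è": "i", "ì": "i", "ò": "u", "ù": "u"}


def stem_from_boot(boot: str) -> str:
    """Replace each accented vowel in the boot with its unstressed counterpart."""
    return "".join(UNSTRESS.get(ch, ch) for ch in boot)


# Boot -> stem pairs exactly as listed in the post above.
PAIRS = {
    "sènt": "sint",
    "aspètt": "aspitt",
    "mòr": "mur",
    "allòng": "allung",
    "pàrr": "parr",
    "rispùnn": "rispunn",
    "crìd": "crid",
}

for boot, stem in PAIRS.items():
    assert stem_from_boot(boot) == stem, (boot, stem)

# finiri does not fit the pattern: the listed stem is "fin", not this result.
print(stem_from_boot("finìsc"))
```

If this holds up on a larger verb list, only the boot would need to be stored for these verbs, with the stem kept explicitly for the -isc- type.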

Eryk · 17 November 2017

Eryk said: ↑

All Sicilian verbs have an unstressed "stem" and a stressed "boot."

The infinitive reflects either the stem or the boot:

    stem + ari
    stem + iri (sc)
    boot + iri

The stem appears in all of the conjugations except:

    the present indicative -- 1st S., 2nd S., 3rd S., 3rd P.
    the imperative -- 2nd S.

The boot replaces the stem in those locations.


I just rewrote my scripts to implement that rule and to implement Fissatu's corrections. It should be a big improvement. Attached is a ZIP file containing an XLSX spreadsheet and PDF printout.

There are 32 conjugated verbs -- allungari, arrispùnniri, aspittari, aviri, capiri, crìdiri, dari, diri, èssiri, fari, finiri, jiri, manciari, mèttiri, mòriri, ntènniri, pàriri, parrari, pèrdiri, pòniri, purtari, putiri, ripètiri, rispùnniri, sapiri, sèntiri, stari, studiari, tèniri, vèniri, vìdiri, vuliri

Does anyone have fresh eyes for me? Thanks in advance!

fissatu · 17 November 2017

Eryk said: ↑

I just rewrote my scripts to implement that rule and to implement Fissatu's corrections. It should be a big improvement. Attached is a ZIP file containing an XLSX spreadsheet and PDF printout.

There are 32 conjugated verbs -- allungari, arrispùnniri, aspittari, aviri, capiri, crìdiri, dari, diri, èssiri, fari, finiri, jiri, manciari, mèttiri, mòriri, ntènniri, pàriri, parrari, pèrdiri, pòniri, purtari, putiri, ripètiri, rispùnniri, sapiri, sèntiri, stari, studiari, tèniri, vèniri, vìdiri, vuliri

Does anyone have fresh eyes for me? Thanks in advance!

Hi Eryk
just quickly, I think it's pariri (normal stress)

Eryk · 17 November 2017

fissatu said: ↑

Hi Eryk
just quickly, I think it's pariri (normal stress)

Hi Fissatu, Thanks for the quick response. Bonner and Cipolla both put the stress on the "a" -- pàriri.

Importantly, if the stress fell on the penultimate, then it would be an exception to the rule laid out above ... which might require me to rethink the rule. So if you notice any more cases like that, please tell me. Thanks!

fissatu · 18 November 2017

I was just double-checking Piccitto. It states that one locality in Enna pronounces pàriri; otherwise it's pariri.

Camilleri shows both parìri and pàriri - so I guess both must be acceptable.

Eryk · 18 November 2017

fissatu said: ↑

I was just double-checking Piccitto. It states that one locality in Enna pronounces pàriri; otherwise it's pariri.

Camilleri shows both parìri and pàriri - so I guess both must be acceptable.

Good work!

Also note that Dieli lists both mòriri and muriri. So we now have two examples of a caveat to the "boot + iri" part of the rule.

Good eye! Thanks!

Eryk · 19 November 2017

This project is going to fail and that's a good thing. It's a good thing because the result is going to be better.

Dr. Dieli compiled a great vocabulary list, but to take his work forward, we need to completely rewrite it. The result is going to be a better dictionary. And it's going to be a better dictionary because the community will rewrite it.

Specifically, Dr. Dieli's list needs a deeper level of definition. Part of speech and English/Italian translation is a great start. Now we need to add detail: usage notes, preferred forms, examples, verb conjugations, etc. Adding detail requires a new dictionary.

For example: "vìviri" and "vìviri." Both verbs have the same infinitive, but they're two different verbs ("Iu vissi" and "Iu vippi"). Dr. Dieli's dictionary does not contain enough detail to handle this situation.

So we either need to make an endless series of small edits to Dr. Dieli's work or we need a complete rewrite. Bite the bullet and rewrite it. At the very least, a complete rewrite is probably the best way to implement the new orthographic standard. And at best, a complete rewrite gives the community an opportunity to become involved.

The only question is: "How to rewrite it?" I propose a flow "dû Sicilianu Spirimentali ô Sicilianu Cadèmicu" (from Experimental Sicilian to Academic Sicilian).

Let's use these Perl hashes to create a bunch of (experimental) spreadsheets -- a sheet of verbs, a sheet of nouns, a sheet of adjectives, etc. Then let's ask the community to correct the spreadsheets. Then we can load the approved version ntô Dizziunariu Ufficiali dâ Cadèmia Siciliana (into the Official Dictionary of the Cadèmia Siciliana).

For the official dictionary, I'm going to recommend SQL -- not my Perl hashes. Perl is great for running experiments, but the community needs to store its work in a proper database.

And then the community will have a dictionary that truly belongs to them.
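To make the SQL suggestion a bit more concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`entry`, `verb_form`, and so on) are invented for illustration, and labelling "vissi" and "vippi" as "past" forms is an assumption. The point is only that keying entries by a numeric id, rather than by the headword, lets the two verbs spelled "vìviri" sit side by side.

```python
import sqlite3

# Assumed schema: one row per lexical entry, keyed by an integer id rather
# than by the headword, so that homographs can coexist.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entry (
        id             INTEGER PRIMARY KEY,
        headword       TEXT NOT NULL,
        part_of_speech TEXT NOT NULL
    );
    CREATE TABLE verb_form (
        entry_id INTEGER NOT NULL REFERENCES entry(id),
        tense    TEXT NOT NULL,
        person   TEXT NOT NULL,
        form     TEXT NOT NULL
    );
    """
)

# Two separate entries share the infinitive "vìviri"; only the forms
# quoted in the post ("Iu vissi", "Iu vippi") tell them apart.
for past_form in ("vissi", "vippi"):
    cur = conn.execute(
        "INSERT INTO entry (headword, part_of_speech) VALUES (?, ?)",
        ("vìviri", "verb"),
    )
    conn.execute(
        "INSERT INTO verb_form (entry_id, tense, person, form) VALUES (?, ?, ?, ?)",
        (cur.lastrowid, "past", "1st singular", past_form),
    )

rows = conn.execute(
    """
    SELECT e.id, e.headword, f.form
    FROM entry AS e JOIN verb_form AS f ON f.entry_id = e.id
    ORDER BY e.id
    """
).fetchall()
print(rows)
```

A flat word list keyed by spelling cannot represent this case, which is one argument for moving to a relational store.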

paul · 20 November 2017

Eryk said: ↑

This project is going to fail and that's a good thing. It's a good thing because the result is going to be better.

Well put, @Eryk. I think perhaps what we should then focus on is:

1) How to best facilitate information collection/compilation (spreadsheets are a neat idea)
2) How to best present this information in a flexible interface that deals with the specific issues of Sicilian
3) How to manage this project.

We made an official GitHub for the Cadèmia, maybe we can start working out of there? github.com/cademia

dapal · 20 November 2017

I'm very happy that @Eryk has come up with this decision!
I agree SQL is the way to go.

As for the correction approach: we could show a random word/verb/... on page load, and let users check it (✔/❌ plus a correction proposal). Should we allow anonymous users? If we do, we should think of a Commission checking whether these crowdsourced entries are OK. We could also do a mix: auto-approve authenticated users (we should decide whom to give this power to), and let anonymous users post suggestions.

I could make a Python web frontend for this kind of thing, if needed
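The mixed approach described above could be sketched roughly like this. Everything here is hypothetical: the `Review` record, the `submit` function, and the `TRUSTED` set are placeholders for whatever the Cadèmia decides, and no web framework is involved yet.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical set of authenticated users whose reviews are auto-approved.
TRUSTED: Set[str] = {"trusted_reviewer"}


@dataclass
class Review:
    entry: str                  # the word or form that was shown
    ok: bool                    # True for a check mark, False for a cross
    correction: Optional[str]   # proposed correction, if any
    reviewer: Optional[str]     # None means an anonymous visitor
    status: str = "pending"     # "pending" reviews wait for the Commission


def submit(entry: str, ok: bool, correction: Optional[str] = None,
           reviewer: Optional[str] = None) -> Review:
    """Record a review; auto-approve it only for trusted, logged-in users."""
    review = Review(entry, ok, correction, reviewer)
    if reviewer is not None and reviewer in TRUSTED:
        review.status = "approved"
    return review


anonymous = submit("pàriri", ok=False, correction="pariri")
trusted = submit("pàriri", ok=True, reviewer="trusted_reviewer")
print(anonymous.status, trusted.status)
```

Anonymous suggestions stay in a queue for the Commission, while trusted reviews go straight through. Who belongs in the trusted set is the policy question raised above.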

Eryk · 20 November 2017

Thank you for the encouragement, @Paul and @dapal.

I wonder if we can "grow a seed into a flower." The "seed" is the information already available to us, in this case: Dr. Dieli's dictionary. The "flower" is an SQL dictionary database with Python frontend.

To "grow" the seed into the flower, we need a set of functions. The arguments to the functions will be the information already available plus the information that we add. The values of the functions will be the spreadsheets that we will examine, correct, and load into the SQL database.

The functions that I have in mind are no different from the ones you learned in math class. Just with words instead of numbers.

For example, suppose that f is a function of the variables: x and y.

f(x,y) = x^2 + y^2

When x=2 and y=3, the value of the function is 13:

f(x=2,y=3) = 2^2 + 3^2
f(x=2,y=3) = 13

Now, suppose that conjugate is a function of the variables: stem, boot, conjugation, tense and person. One value that it might return is "finìscinu":

conjugate( stem="fin", boot="finìsc", conjugation="iri", tense="present", person="3rd plural" ) = finìscinu
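As a sketch only, the call above could look like this in Python. The endings are read off the present-tense tables earlier in the thread. The accent handling is an assumption: the written accent is kept only on the 3rd plural, where the tables show it. Only the present indicative is covered, and spelling adjustments such as "allonghi" or "finisciu" are deliberately left out.

```python
# Present indicative endings, read off the tables earlier in the thread.
ENDINGS = {
    "ari": {"1st singular": "u", "2nd singular": "i", "3rd singular": "a",
            "1st plural": "amu", "2nd plural": "ati", "3rd plural": "anu"},
    "iri": {"1st singular": "u", "2nd singular": "i", "3rd singular": "i",
            "1st plural": "emu", "2nd plural": "iti", "3rd plural": "inu"},
}

# Persons where the stressed boot replaces the unstressed stem.
BOOT_PERSONS = {"1st singular", "2nd singular", "3rd singular", "3rd plural"}

# Assumption: the accent is written only when the ending has two syllables.
STRIP_ACCENT = str.maketrans("àèìòù", "aeiou")


def conjugate(stem: str, boot: str, conjugation: str, tense: str, person: str) -> str:
    """Return one present-indicative form; other tenses are not sketched here."""
    if tense != "present":
        raise NotImplementedError("only the present indicative is sketched")
    ending = ENDINGS[conjugation][person]
    if person not in BOOT_PERSONS:
        return stem + ending
    if person == "3rd plural":
        return boot + ending  # accent stays written, as in "sèntinu"
    return boot.translate(STRIP_ACCENT) + ending


print(conjugate(stem="fin", boot="finìsc", conjugation="iri",
                tense="present", person="3rd plural"))
```

The same function reproduces the sèntiri, mòriri and parrari tables from the earlier post. Verbs that need spelling changes would require an extra orthography step, which is not attempted here.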

With that in mind, let's consider what information is available to us and what information we want to add. From Dr. Dieli's dictionary, we already have:

the word itself -- finiri

the part of speech -- verb

English translations -- to finish, to end

Italian translations -- finire, smettere

Based on the spelling, we can infer that the verb's conjugation is "iri." The information that we must add is that the boot is "finìsc".

And there's lots of information that we may want to add: example usage, regional variations, preferred forms, synonyms, antonyms, .... I think it would be really cool if we provided a Sicilian language definition for each word.

So our first task is to specify exactly what information we want to collect. For example:

the word itself

part of speech

etymology, regional variations, preferred forms

Sicilian language definition

synonyms, antonyms

English/Italian translation

examples of usage

other notes

For verbs, we also need:

stem, boot

conjugation

irregular forms

For nouns and adjectives, we also need:

irregular singular and plural forms

Let's begin making that list. Once we have a formal list, this project will take a life of its own.
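One way to pin the list down is to write it as a record type, so that the spreadsheet columns fall out of the definition. This is only a sketch: the class names (`Entry`, `VerbEntry`, `NominalEntry`) and field names are my own rendering of the list above, not an agreed schema.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class Entry:
    """Fields wanted for every word (names are illustrative)."""
    word: str
    part_of_speech: str
    etymology: str = ""
    regional_variations: List[str] = field(default_factory=list)
    preferred_forms: List[str] = field(default_factory=list)
    sicilian_definition: str = ""
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    english: List[str] = field(default_factory=list)
    italian: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    notes: str = ""


@dataclass
class VerbEntry(Entry):
    """Extra fields for verbs."""
    stem: str = ""
    boot: str = ""
    conjugation: str = ""
    irregular_forms: Dict[str, str] = field(default_factory=dict)


@dataclass
class NominalEntry(Entry):
    """Extra fields for nouns and adjectives."""
    irregular_singular: List[str] = field(default_factory=list)
    irregular_plural: List[str] = field(default_factory=list)


def columns(entry_type) -> List[str]:
    """Spreadsheet column headers for one kind of entry."""
    return [f.name for f in fields(entry_type)]


finiri = VerbEntry(
    word="finiri", part_of_speech="verb",
    english=["to finish", "to end"], italian=["finire", "smettere"],
    stem="fin", boot="finìsc", conjugation="iri",
)
print(columns(VerbEntry))
```

Adding or renaming a field then changes the verb sheet, the noun sheet and any later database schema in one place.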

paul said: ↑

We made an official GitHub for the Cadèmia, maybe we can start working out of there? github.com/cademia

That would be awesome. I will organize my work and write a README for it.

paul · 21 November 2017

@Eryk, I notice you mention Dr. Dieli's dictionary. Have you considered the Wiktionary? I personally consider it a better source: even if it's less organised, it has many more words and many variants as well.

Eryk · 21 November 2017

paul said: ↑

@Eryk, I notice you mention Dr. Dieli's dictionary. Have you considered the Wiktionary? I personally consider it a better source: even if it's less organised, it has many more words and many variants as well.

It's a great source. To be exact, Sicilian Wiktionary has 21,841 words while Dr. Dieli has 12,060.

You can download the whole Sicilian Wiktionary from dumps.wikimedia.org/scnwiktionary. The one to focus on is the one marked: "Articles, templates, media/file descriptions, and primary meta-pages." Each page is rolled up into a gigantic XML file, from which we could extract a lot of information.
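A minimal sketch of reading such a dump with Python's standard library, assuming the usual MediaWiki export layout (`page` elements containing `title`, `ns` and `revision/text`). The sample XML below is a made-up stand-in for the real file, and `iter_pages` is a hypothetical helper name. The real dumps are compressed, so the file would need to be decompressed or opened with something like `bz2.open` first.

```python
import io
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the XML namespace prefix, which varies between export versions."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for pages in the main namespace (ns 0)."""
    title = ns = text = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if ns == "0":
                yield title, text
            title = ns = text = None
            elem.clear()  # release memory; the real file is large


# Made-up sample standing in for the real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>finiri</title>
    <ns>0</ns>
    <revision><text>placeholder wikitext</text></revision>
  </page>
  <page>
    <title>Template:example</title>
    <ns>10</ns>
    <revision><text>placeholder</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

Streaming with `iterparse` avoids loading the whole XML file into memory. Extracting part of speech or translations from the wikitext is a separate step that depends on the templates Sicilian Wiktionary uses.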

The difference is that Dr. Dieli's lists are simple HTML tables, so it's super easy to work with them. And frankly, I admire Dr. Dieli's work because he poured so much of his heart into developing the language.

Ultimately, we will collect information from a lot of different lists. Dr. Dieli's list is just a great place to start.

paul · 21 November 2017

Wow, I had no idea Dieli's was that large. Thank you for clarifying that.

fissatu · 22 November 2017

Eryk's reasoning is sound: a lot of the info is there and, given Dr Dieli's knowledge, I'd agree it would be quality data.

However, the effort to fill in all the extra info being sought could get tied down for years. Even if one person were allocated a single item of data, say the etymology of 11,000 words, that alone could take one person years; with two people you might get that down to a couple of years, and so on.

This probably needs some sort of group decision as to how ambitious we want to be with this, there being a trade-off between completeness (a very good thing) and having something useful up and running as quickly as possible (also very useful).

Not an easy decision.

Eryk · 23 November 2017

If we write the dictionary to write itself, we could have a complete work in a relatively short amount of time.

Think of it like "mail merge." You put the addresses into a spreadsheet and your word processor prints out hundreds of letters. We do the same thing here, but with the added twist that we automatically collect the information too. The spreadsheet writes itself. The letters write themselves. And pretty soon you have enough mail to fill a whole post office.

In this specific case, we program our computer to collect information from Dieli's dictionary, from Wiktionary, etc. The computer then populates a spreadsheet with information that it collected. A human being then compares the information in the spreadsheet with their own knowledge of the language, with textbooks, etc. and makes corrections to the spreadsheet. We then load the corrected spreadsheet into an "official dictionary."
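The collect-then-review flow just described might be sketched as follows. The two record lists are tiny hand-made samples, not real extracts from Dieli or Wiktionary, and `merge_sources` / `write_review_sheet` are hypothetical names. Merging by spelling is a simplification, since homographs would need a finer key.

```python
import csv
import io


def merge_sources(*sources):
    """Combine (source_name, records) pairs into one row per word."""
    merged = {}
    for source_name, records in sources:
        for rec in records:
            row = merged.setdefault(
                rec["word"], {"word": rec["word"], "english": "", "sources": []}
            )
            # Keep the first non-empty translation we come across.
            if rec.get("english") and not row["english"]:
                row["english"] = rec["english"]
            row["sources"].append(source_name)
    return list(merged.values())


def write_review_sheet(rows, handle):
    """Write a CSV with an empty column for the human reviewer to fill in."""
    writer = csv.DictWriter(
        handle, fieldnames=["word", "english", "sources", "reviewer_correction"]
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "word": row["word"],
            "english": row["english"],
            "sources": ";".join(row["sources"]),
            "reviewer_correction": "",
        })


# Hand-made sample records for illustration only.
DIELI = [{"word": "finiri", "english": "to finish, to end"}]
WIKTIONARY = [{"word": "finiri", "english": ""}, {"word": "parrari", "english": ""}]

rows = merge_sources(("dieli", DIELI), ("wiktionary", WIKTIONARY))
buffer = io.StringIO()
write_review_sheet(rows, buffer)
sheet = buffer.getvalue()
print(sheet)
```

The reviewer only ever touches the last column, so the corrected sheet can be compared with the machine-generated one before anything is loaded into the official dictionary.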

The important step is to rigorously define exactly what information we want to collect.

For example, below is the information that I collected on four verbs. That little bit of information on diri, vistiri and vistirisi correctly produces whole conjugations. We only need a little bit of information because the information that we are collecting is so well-defined. (Èssiri requires more information because it is almost entirely irregular.)

I just finished defining what information to collect on verbs. My next steps are to define what information to collect on other parts of speech and to define what information to collect on words in general.

Once we know exactly what information to collect, we will know exactly what assistance to ask for and this dictionary will grow rapidly.

In the meantime, one thing that I will ask for is examples. For example, in Salvatore's video about taliari, he gives the examples:

"Talìu i picciriddi ca jòcanu."

"Taliamu a partita ô stàdiu."

Those are excellent examples. They really help you understand how to use the verb taliari. Good examples like that will help people learn the language quickly.
Code:
%{ $vnotes{"diri"} } = (
    verb => {
        conj => "xxiri",
        stem => "dic",
        boot => "dìc",
        irrg => {
            inf => "diri",
            pai => { quad => "dìss" },
            pap => "dittu",
            adj => "dittu",
        },
    },
);
Code:
%{ $vnotes{"vistiri"} } = (
    verb => {
        conj => "xxiri",
        stem => "vist",
        boot => "vèst",
        irrg => {
            inf => "vistiri",
        },
    },
);
%{ $vnotes{"vistirisi"} } = (
    reflex => "vistiri",
);
Code:
%{ $vnotes{"èssiri"} } = (
    dieli => ["essiri"],
    verb => {
        conj => "xxiri",
        stem => "ess",
        boot => "èss",
        irrg => {
            pri => { us => "sugnu", ds => "sì", ts => "è", up => "semu", dp => "siti", tp => "sunnu" },
            pim => { ds => "sia", ts => "fussi", up => "semu", dp => "siti", tp => "fùssiru" },
            pai => { us => "fui", ds => "fusti", ts => "fu", up => "fomu", dp => "fùstivu", tp => "foru" },
            imi => { us => "era", ds => "eri", ts => "era", up => "eramu", dp => "eravu", tp => "eranu" },
            ims => { us => "fussi", ds => "fussi", ts => "fussi", up => "fùssimu", dp => "fùssivu", tp => "fùssiru" },
            fti => { stem => "sa" },
            coi => { stem => "sa" },
            pap => "statu",
            adj => "statu",
        },
    },
);
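For readers who don't use Perl, here is a rough Python rendering of how hashes like these could be consulted. It is a guess at the intent, not a port of the actual scripts: the reading of "pri" as present indicative and "us" as first person singular is my assumption, and `verb_notes` / `irregular_form` are hypothetical helper names. Only a subset of the data above is copied in.

```python
# A subset of the notes above, copied into Python dictionaries.
VNOTES = {
    "vistiri": {
        "verb": {"conj": "xxiri", "stem": "vist", "boot": "vèst",
                 "irrg": {"inf": "vistiri"}},
    },
    "vistirisi": {"reflex": "vistiri"},
    "èssiri": {
        "verb": {
            "conj": "xxiri", "stem": "ess", "boot": "èss",
            "irrg": {
                "pri": {"us": "sugnu", "ds": "sì", "ts": "è",
                        "up": "semu", "dp": "siti", "tp": "sunnu"},
                "pap": "statu",
            },
        },
    },
}


def verb_notes(word: str) -> dict:
    """Return a verb's notes, following a 'reflex' pointer to the base verb."""
    notes = VNOTES[word]
    if "reflex" in notes:
        notes = VNOTES[notes["reflex"]]
    return notes["verb"]


def irregular_form(word: str, tense: str, person: str):
    """Return a stored irregular form, or None when the regular rule applies."""
    table = verb_notes(word).get("irrg", {}).get(tense)
    if isinstance(table, dict):
        return table.get(person)
    return None


print(irregular_form("èssiri", "pri", "us"))
print(verb_notes("vistirisi")["boot"])
```

The reflexive entry stores nothing but a pointer, so vistirisi inherits the stem and boot of vistiri. How the reflexive pronouns are attached is not shown here.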

Eryk · 30 November 2017

I'm still working on the dictionary. I added some words and created a few "ricoti di palori." (They seem to me more like "ricoti" than "cullizzioni.")

But my most important work is the part you can't see: I added another class of verbs to the Perl hashes. Now I have to work on the nouns, adjectives, adverbs, ...

How do you say in Sicilian: "Just keep truckin' on"??

Attached Files:

sicilian-conjugs_n3.zip
