Sicilian Learner's Dictionary -- Development Proposal

Discussion in 'Materials for Learners' started by Eryk, 2 November 2017.

  1. fissatu

    fissatu Member Staff Member

    Good post Karl, I am familiar with the language situation in Norway, and personally, I think there are good lessons there for us too.
  2. Tim

    Tim New Member

    I'm also quite fascinated by some of the attempts to unify the Gallo-Italic languages, particularly Michael Dallera's Lombard Restruiturad Stàndard (LoReS)... he's in the Facebook group but I'm not sure if he's here in the forum. I don't want to speak for his work and risk being incorrect on something, but it seems really cool.
  3. paul

    paul Member Staff Member Standardisation Committee

    We have a subforum for Gallo-Sicilian, if you're a speaker you're welcome to open conversations there and introduce the relevant people to that area.
  4. Eryk

    Eryk New Member Academic Member

    You certainly may! That's an excellent example.

    Hmm ... They used Perl for their project. That's what I'm using. I wonder if we could borrow some of their libraries? ... According to their about page:
    I clicked through the EDD's "Tjenester og verktøy", but I could not find any information about the database. But I did find a link to some more EDD dictionaries.

    Cool stuff! Thank you for sharing this. :)
  5. Eryk

    Eryk New Member Academic Member

    Yes. They are.

    Here's how Dr. Cipolla defines the rule (Mparamu, p. 64):
    I would define the rule differently. I would say:
    • all Sicilian verbs have an unstressed "stem" and a stressed "boot."
    • the infinitive reflects either the stem or the boot.
      • stem + ari
      • stem + iri (sc)
      • boot + iri
    • the stem appears in all of the conjugations except:
      • the present indicative -- 1st S., 2nd S., 3rd S., 3rd P.
      • the imperative -- 2nd S.,
    • the boot replaces the stem in those locations

    finiri -- stem: fin, boot: finìsc

    finisciu __ finemu
    finisci __ finiti
    finisci __ finìscinu

    sèntiri -- stem: sint, boot: sènt

    sentu __ sintemu
    senti __ sintiti
    senti __ sèntinu

    aspittari -- stem: aspitt, boot: aspètt

    aspettu __ aspittamu
    aspetti __ aspittati
    aspetta __ aspèttanu

    mòriri -- stem: mur, boot: mòr

    moru __ muremu
    mori __ muriti
    mori __ mòrinu

    allungari -- stem: allung, boot: allòng

    allongu __ allungamu
    allonghi __ allungati
    allonga __ allònganu

    parrari -- stem: parr, boot: pàrr

    parru __ parramu
    parri __ parrati
    parra __ pàrranu

    rispùnniri -- stem: rispunn, boot: rispùnn

    rispunnu __ rispunnemu
    rispunni __ rispunniti
    rispunni __ rispùnninu

    crìdiri -- stem: crid, boot: crìd

    cridu __ cridemu
    cridi __ criditi
    cridi __ crìdinu
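
    As a rough illustration in Python (not the Perl scripts mentioned in this thread), the rule above can be sketched as two small helpers. The names `BOOT_SLOTS`, `root_for` and `infinitive`, and the pattern labels, are hypothetical. The sketch only picks the root and builds the infinitive; it does not add endings.

```python
import unicodedata

# Slots where the stressed "boot" replaces the unstressed "stem",
# copied from the rule as stated in this post.
BOOT_SLOTS = {
    ("present indicative", "1st S."),
    ("present indicative", "2nd S."),
    ("present indicative", "3rd S."),
    ("present indicative", "3rd P."),
    ("imperative", "2nd S."),
}


def root_for(stem, boot, tense, person):
    """Return the boot in the five boot slots, otherwise the stem."""
    return boot if (tense, person) in BOOT_SLOTS else stem


def infinitive(stem, boot, pattern):
    """Build the infinitive from one of the three patterns in the rule.

    `pattern` is a hypothetical label: "stem+ari", "stem+iri" or "boot+iri".
    """
    if pattern == "stem+ari":
        word = stem + "ari"
    elif pattern == "stem+iri":
        word = stem + "iri"
    elif pattern == "boot+iri":
        word = boot + "iri"
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    # Normalise so accented letters compare equal however they were typed.
    return unicodedata.normalize("NFC", word)


print(infinitive("parr", "pàrr", "stem+ari"))   # parrari
print(infinitive("fin", "finìsc", "stem+iri"))  # finiri
print(infinitive("sint", "sènt", "boot+iri"))   # sèntiri
```

    Verbs listed with two infinitives would need two pattern labels for the same stem and boot, which this sketch does not model.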
    Last edited: 17 November 2017 at 18:14
  6. Eryk

    Eryk New Member Academic Member

    I just rewrote my scripts to implement that rule and to implement Fissatu's corrections. It should be a big improvement. Attached is a ZIP file containing an XLSX spreadsheet and PDF printout.

    There are 32 conjugated verbs -- allungari, arrispùnniri, aspittari, aviri, capiri, crìdiri, dari, diri, èssiri, fari, finiri, jiri, manciari, mèttiri, mòriri, ntènniri, pàriri, parrari, pèrdiri, pòniri, purtari, putiri, ripètiri, rispùnniri, sapiri, sèntiri, stari, studiari, tèniri, vèniri, vìdiri, vuliri

    Does anyone have fresh eyes for me? Thanks in advance!

  7. fissatu

    fissatu Member Staff Member

    Hi Eryk
    just quickly, I think it's pariri (normal stress)
  8. Eryk

    Eryk New Member Academic Member

    Hi Fissatu, Thanks for the quick response. Bonner and Cipolla both put the stress on the "a" -- pàriri.

    Importantly, if the stress fell on the penultimate, then it would be an exception to the rule laid out above ... which might require me to rethink the rule. So if you notice any more cases like that, please tell me. Thanks!
  9. fissatu

    fissatu Member Staff Member

    I was just double checking Piccitto, it states that one locality in Enna pronounces pàriri, otherwise it's pariri.

    Camilleri shows both parìri and pàriri - so I guess both must be acceptable.
  10. Eryk

    Eryk New Member Academic Member

    Good work!

    Also note that Dieli lists both mòriri and muriri. So we now have two examples of a caveat to the "boot + iri" part of the rule.

    Good eye! Thanks!
  11. Eryk

    Eryk New Member Academic Member

    This project is going to fail and that's a good thing. It's a good thing because the result is going to be better.

    Dr. Dieli compiled a great vocabulary list, but to take his work forward, we need to completely rewrite it. The result is going to be a better dictionary. And it's going to be a better dictionary because the community will rewrite it.

    Specifically, Dr. Dieli's list needs a deeper level of definition. Part of speech and English/Italian translation is a great start. Now we need to add detail: usage notes, preferred forms, examples, verb conjugations, etc. Adding detail requires a new dictionary.

    For example: "vìviri" (to live) and "vìviri" (to drink). Both verbs have the same infinitive, but they're two different verbs with different past forms ("Iu vissi" and "Iu vippi"). Dr. Dieli's dictionary does not contain enough detail to handle this situation.

    So we either need to make an endless series of small edits to Dr. Dieli's work or we need a complete rewrite. Bite the bullet and rewrite it. At the very least, a complete rewrite is probably the best way to implement the new orthographic standard. And at best, a complete rewrite gives the community an opportunity to become involved.

    The only question is: "How to rewrite it?" I propose a flow "dû Sicilianu Spirimentali ô Sicilianu Cadèmicu."

    Let's use these Perl hashes to create a bunch of (experimental) spreadsheets -- a sheet of verbs, a sheet of nouns, a sheet of adjectives, etc. Then let's ask the community to correct the spreadsheets. Then we can load the approved version ntô Dizziunariu Ufficiali dâ Cadèmia Siciliana.
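
    To make the hash-to-spreadsheet step concrete, here is a minimal sketch in Python rather than Perl. The `verbs` dictionary stands in for one of the hashes, and `to_csv` and the column names are hypothetical; a CSV file is assumed as the spreadsheet format because any spreadsheet program can open it.

```python
import csv
import io

# Stand-in for one hash: word -> fields. The two entries use the stems and
# boots given earlier in this thread.
verbs = {
    "finiri": {"stem": "fin", "boot": "finìsc", "conjugation": "iri"},
    "parrari": {"stem": "parr", "boot": "pàrr", "conjugation": "ari"},
}


def to_csv(entries, columns):
    """Render a dict of entries as CSV text, one row per word, sorted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word"] + columns)
    for word in sorted(entries):
        row = [entries[word].get(col, "") for col in columns]
        writer.writerow([word] + row)
    return buffer.getvalue()


sheet = to_csv(verbs, ["stem", "boot", "conjugation"])
print(sheet)
```

    Missing fields come out as empty cells, so reviewers can see at a glance what still needs to be filled in.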

    For the official dictionary, I'm going to recommend SQL -- not my Perl hashes. Perl is great for running experiments, but the community needs to store its work in a proper database.
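
    As a small illustration of why a database helps, the sketch below stores the two "vìviri" verbs as separate rows that share a spelling. It uses Python's built-in `sqlite3` module purely as an assumption; the table name, the column names and the English glosses are hypothetical and not part of any agreed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    """
    CREATE TABLE entry (
        id             INTEGER PRIMARY KEY,
        word           TEXT NOT NULL,
        part_of_speech TEXT NOT NULL,
        english        TEXT,
        past_1s        TEXT      -- 1st singular past, e.g. "vissi"
    )
    """
)

# Two distinct verbs with the same infinitive: the row id tells them apart.
conn.executemany(
    "INSERT INTO entry (word, part_of_speech, english, past_1s) "
    "VALUES (?, ?, ?, ?)",
    [
        ("vìviri", "verb", "to live", "vissi"),
        ("vìviri", "verb", "to drink", "vippi"),
    ],
)

rows = conn.execute(
    "SELECT id, english, past_1s FROM entry WHERE word = ? ORDER BY id",
    ("vìviri",),
).fetchall()
for row in rows:
    print(row)
```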

    And then the community will have a dictionary that truly belongs to them.
  12. paul

    paul Member Staff Member Standardisation Committee

    Well put @Eryk , I think perhaps what we should then focus on is:

    1) How to best facilitate information collection/compilation (spreadsheets are a neat idea)
    2) How to best present this information in a flexible interface that deals with the specific issues of Sicilian
    3) How to manage this project.

    We made an official GitHub for the Cadèmia, maybe we can start working out of there?
  13. dapal

    dapal New Member

    I'm very happy that @Eryk has come to this decision!
    I agree SQL is the way to go.

    As for the correcting approach: we could show a random word/verb/... on page load, and let users check it (✔/❌ and a correction proposal). Should we allow anonymous users? If we do, we should think of a Commission checking whether these crowdsourced entries are OK. We could also do a mix: auto-approve authenticated users (we should decide whom to give this power to), and let anonymous users post suggestions.
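
    The mixed approach could look roughly like the Python sketch below. Everything in it is an assumption for discussion: the `Review` record, the `pick_word` and `status` helpers, and the two status labels are hypothetical names, and no web framework is involved yet.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    word: str                  # the entry that was shown to the user
    ok: bool                   # True for a check mark, False for a cross
    correction: Optional[str]  # proposed correction, if any
    authenticated: bool        # False for anonymous users


def pick_word(words, rng=random):
    """Pick the random entry to show on page load."""
    return rng.choice(words)


def status(review):
    """Auto-approve authenticated users; queue anonymous input for the Commission."""
    return "approved" if review.authenticated else "pending"


shown = pick_word(["finiri", "parrari", "aspittari"])
anonymous = Review(word=shown, ok=True, correction=None, authenticated=False)
print(shown, status(anonymous))
```

    Who counts as "authenticated" for auto-approval is left open here, as it is in the proposal.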

    I could make a Python web frontend for this kind of thing, if needed
  14. Eryk

    Eryk New Member Academic Member

    Thank you for the encouragement, @Paul and @dapal.

    I wonder if we can "grow a seed into a flower." The "seed" is the information already available to us, in this case: Dr. Dieli's dictionary. The "flower" is an SQL dictionary database with a Python frontend.

    To "grow" the seed into the flower, we need a set of functions. The arguments to the functions will be the already available information plus the information that we add. The values of the functions will be the spreadsheets that we will examine, correct and load into the SQL database.

    The functions that I have in mind are no different from the ones you learned in math class. Just with words instead of numbers.

    For example, suppose that f is a function of the variables: x and y.

    f(x,y) = x^2 + y^2

    When x=2 and y=3, the value of the function is 13:

    f(x=2,y=3) = 2^2 + 3^2
    f(x=2,y=3) = 13

    Now, suppose that conjugate is a function of the variables: stem, boot, conjugation, tense and person. One value that it might return is "finìscinu":

    conjugate( stem="fin", boot="finìsc", conjugation="iri", tense="present", person="3rd plural" ) = finìscinu
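
    A minimal Python sketch of such a function, covering only the present indicative, might look like this. The endings are read off the parrari and sèntiri paradigms earlier in this thread; `ENDINGS`, `BOOT_PERSONS` and `strip_accents` are hypothetical names. Spelling adjustments such as "finisciu" or "allonghi" are not handled, and the accent handling simply follows how the examples in this thread are written.

```python
import unicodedata

# Present-indicative endings for the two conjugations.
ENDINGS = {
    "ari": {"1st singular": "u", "2nd singular": "i", "3rd singular": "a",
            "1st plural": "amu", "2nd plural": "ati", "3rd plural": "anu"},
    "iri": {"1st singular": "u", "2nd singular": "i", "3rd singular": "i",
            "1st plural": "emu", "2nd plural": "iti", "3rd plural": "inu"},
}
# Persons that take the boot instead of the stem in the present indicative.
BOOT_PERSONS = {"1st singular", "2nd singular", "3rd singular", "3rd plural"}


def strip_accents(text):
    """Remove accent marks, e.g. 'sènt' -> 'sent'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def conjugate(stem, boot, conjugation, tense, person):
    if tense != "present":
        raise NotImplementedError("only the present indicative is sketched")
    ending = ENDINGS[conjugation][person]
    if person in BOOT_PERSONS:
        # In the thread's examples the written accent appears only in the
        # 3rd plural (sèntinu), not in the singular forms (sentu).
        root = boot if person == "3rd plural" else strip_accents(boot)
    else:
        root = stem
    return unicodedata.normalize("NFC", root + ending)


print(conjugate(stem="fin", boot="finìsc", conjugation="iri",
                tense="present", person="3rd plural"))  # finìscinu
print(conjugate(stem="sint", boot="sènt", conjugation="iri",
                tense="present", person="1st plural"))  # sintemu
```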

    With that in mind, let's consider what information is available to us and what information we want to add. From Dr. Dieli's dictionary, we already have:
    • the word itself -- finiri
    • the part of speech -- verb
    • English translations -- to finish, to end
    • Italian translations -- finire, smettere
    Based on the spelling, we can infer that the verb's conjugation is "iri." The information that we must add is that the boot is "finìsc".

    And there's lots of information that we may want to add: example usage, regional variations, preferred forms, synonyms, antonyms, .... I think it would be really cool if we provided a Sicilian language definition for each word.

    So our first task is to specify exactly what information we want to collect. For example:
    • the word itself
    • part of speech
    • etymology, regional variations, preferred forms
    • Sicilian language definition
    • synonyms, antonyms
    • English/Italian translation
    • examples of usage
    • other notes
    For verbs, we also need:
    • stem, boot
    • conjugation
    • irregular forms
    For nouns and adjectives, we also need:
    • irregular singular and plural forms
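
    As a starting point for that list, the fields above could be written down as a record type. This is only a sketch in Python: the class name `Entry` and every field name are hypothetical, and anything not yet collected is simply left empty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    # Already available from Dr. Dieli's lists
    word: str
    part_of_speech: str
    english: List[str] = field(default_factory=list)
    italian: List[str] = field(default_factory=list)
    # Information to be added by the community
    etymology: Optional[str] = None
    regional_variations: List[str] = field(default_factory=list)
    preferred_form: Optional[str] = None
    sicilian_definition: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    notes: Optional[str] = None
    # Verbs only
    stem: Optional[str] = None
    boot: Optional[str] = None
    conjugation: Optional[str] = None
    # Verbs, nouns and adjectives: label -> irregular form
    irregular_forms: Dict[str, str] = field(default_factory=dict)


finiri = Entry(
    word="finiri",
    part_of_speech="verb",
    english=["to finish", "to end"],
    italian=["finire", "smettere"],
    stem="fin",
    boot="finìsc",
    conjugation="iri",
)
print(finiri.word, finiri.stem, finiri.boot)
```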

    Let's begin making that list. Once we have a formal list, this project will take on a life of its own.

    That would be awesome. I will organize my work and write a README for it.
    Last edited: 20 November 2017 at 20:19
  15. paul

    paul Member Staff Member Standardisation Committee

    @Eryk , I notice you mention Dr. Dieli's dictionary. Have you considered Wiktionary? I personally consider it a better source: even if it's less organised, it has many more words and many variants as well.
  16. Eryk

    Eryk New Member Academic Member

    It's a great source. To be exact, Sicilian Wiktionary has 21,841 words while Dr. Dieli has 12,060.

    You can download the whole Sicilian Wiktionary as a database dump. The file to focus on is the one marked: "Articles, templates, media/file descriptions, and primary meta-pages." All the pages are rolled up into one gigantic XML file, from which we could extract a lot of information.
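
    Extracting pages from such an XML file could be sketched in Python as below. It assumes the usual MediaWiki export layout of `<page>` elements that each contain a `<title>` and a `<text>`; the helper names and the tiny sample document are made up for illustration and are not real Wiktionary content.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page>, streaming through the file."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # free memory; the real file is very large


# Made-up sample in the assumed layout.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>finiri</title>
    <revision><text>placeholder wikitext</text></revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
print(pages)
```

    The wikitext of each page would still need its own parsing to pull out parts of speech and translations.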

    The difference is that Dr. Dieli's lists are simple HTML tables, so it's super easy to work with them. And frankly, I admire Dr. Dieli's work because he poured so much of his heart into developing the language.

    Ultimately, we will collect information from a lot of different lists. Dr. Dieli's list is just a great place to start.
  17. paul

    paul Member Staff Member Standardisation Committee

    Wow, I had no idea Dieli's was that large. Thank you for clarifying that.
  18. fissatu

    fissatu Member Staff Member

    Eryk's reasoning is sound, a lot of info is there, and I'd agree given Dr Dieli's knowledge, it would be quality data.

    However, collecting all the extra info being sought could tie the project down for years. Suppose one person were given a single item of data, say the etymology of 11,000 words: that alone could take them years. With two people on it, you might get that down to a couple of years, and so on.

    This probably needs some sort of group decision about how ambitious we want to be with this, since there is a trade-off between completeness (a very good thing) and having something useful up and running as quickly as possible (also very useful).

    Not an easy decision.
