Sicilian Learner's Dictionary -- Development Proposal

Discussion in 'General' started by Eryk, 2 November 2017.

  1. Eryk

    Eryk (Verified Member, New Member, Academic Member)

    I am planning to develop a Sicilian Learner's Dictionary. I am writing to ask for your suggestions.

    As many of you know, Dr. Arthur Dieli and Arba Sicula compiled an extensive vocabulary list. My plan is to supplement that list with information about the individual words -- verb conjugations, notes on noun-adjective agreement, usage examples, regional variations, etc.

    For example, a learner might search the dictionary for a translation of the verb "to come" and find "veniri," which would then link to a page full of information about "veniri" -- a complete set of conjugations, notes on the irregular forms, usage notes and examples ("Û vegnu a pigghiu").

    With a little programming, computers can generate the conjugations almost automatically. And once the computer is aware of the irregular forms, it could generate the conjugations fully automatically.

    What a computer cannot provide is usage examples, regional variations, etc. Only human beings can do that. So my plan here is to collect examples, notes and variations from textbooks, this forum and the Facebook page.

    I will keep you posted on my progress. In the meantime, I created a search tool for Dr. Dieli's dictionary. It's available at: http://www.wdowiak.me/cgi-bin/sicilian.pl I hope it's helpful to you. Just please remember that the dictionary is Dr. Dieli's work, not mine!

    Thanks in advance for your help and suggestions,
    - eryk
     
  2. Eryk

    Just crossed my mind ... The dictionary should also include the etymology of the word. Did the word come from Latin, Greek, Arabic, Norman, French, Catalan, Spanish or some other language?
     
  3. paul

    paul (Verified Member, Member, Staff Member, Standardisation Committee)

    I would suggest the feature set of scn.wiktionary.org as a good baseline, with the major difference being that we will use verified information. I suspect we can even work with that team to integrate the two projects. As you may have read in other conversations within the group, there is a tendency, for a variety of reasons, to over-attribute Sicilian words and culture to foreign influence without properly understanding the particular linguistic situation of Sicily over the past 3000 years. The Sicilian linguistic soup is quite interesting in that way, and the Mediterranean sprachbund equally so.

    This can be tied into our process of simultaneously documenting words and selecting the most widely used form as a "preferred literary form". We can list the "preferred literary form" as the word listing and provide known variants with additional information regarding location etc. I fear, though, that a project of this size will require a wiki-style platform, or will in fact be a giant endeavour. What do you think we can reasonably accomplish? Perhaps we should simply start with Google Sheets, and then once the data becomes sufficient we transfer it to an SQL database. I do however fear synchronisation issues, etc.
     
  4. Eryk

    I'm going to keep that in mind because it's an important point.

    Thank you! I had forgotten about that one.

    And yes. I want to provide good, verified information about each word. Examples, usage notes and irregular forms are completely lacking from that dictionary.

    Yes!

    I don't think it will be too difficult. It will be time-consuming, but not difficult. But learning any language is time-consuming and this will be my way of learning the language, so let the project consume a little of my time.

    Perl is my programming language, so my plan is to create a bunch of hashes and save them into a Perl storable. We could also load it into an SQL database, but I prefer the flexibility of Perl's data structures.
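
    A minimal sketch of the "hashes saved into a Perl storable" idea, using the core Storable module. The hash layout borrows the field names from the script posted later in this thread; the temporary file is only an assumption made for illustration, and a real project would choose a fixed path.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

##  a small annotation hash (one verb only, for illustration)
my %vnotes = (
    parrari => { verb => { conj => "xxari", type => "reglr", stem => "parr" } },
);

##  write the hash to disk; a temporary file stands in for the real storable
my ( $fh, $storable_file ) = tempfile( SUFFIX => ".storable", UNLINK => 1 );
close $fh;
nstore( \%vnotes, $storable_file );

##  read it back; retrieve() returns a reference to a copy of the hash
my $loaded = retrieve($storable_file);
print $loaded->{parrari}{verb}{stem}, "\n";
```

    nstore() writes in network byte order, which should make the file portable between machines; plain store() would also work on a single machine.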

    For example, suppose we have a hash of Sicilian verbs and another hash of verb endings. If the verb is a regular verb (like parrari), we just mark it regular (in the verb hash) and let the Perl script pair the stem of the verb (parr-) with the appropriate verb endings (-u, -i, -a, -amu, -ati, -anu). If the verb is irregular (like essiri), then we supply the irregular forms (sugnu, sì, è, semu, siti, sunnu) to the verb hash.
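
    The pairing of stems and endings described above can be sketched roughly as follows. This is only an illustration for the present indicative: the hash layout and the name present_indicative are assumptions, and accents on the regular forms are left out.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
binmode( STDOUT, ":encoding(UTF-8)" );

##  persons: 1st/2nd/3rd singular, then 1st/2nd/3rd plural
my @people  = qw(us ds ts up dp tp);
my %endings = ( us => "u",   ds => "i",   ts => "a",
                up => "amu", dp => "ati", tp => "anu" );

my %verbs = (
    ##  regular verb: only the stem is stored
    parrari => { type => "regular", stem => "parr" },
    ##  irregular verb: the forms are supplied by hand
    essiri  => { type  => "irregular",
                 forms => [ "sugnu", "sì", "è", "semu", "siti", "sunnu" ] },
);

sub present_indicative {
    my ($infinitive) = @_;
    my $verb = $verbs{$infinitive} or die "unknown verb: $infinitive";
    return @{ $verb->{forms} } if $verb->{type} eq "irregular";
    return map { $verb->{stem} . $endings{$_} } @people;
}

print join( ", ", present_indicative($_) ), "\n" for qw(parrari essiri);
```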

    This approach should save time because we only have to supply the irregular forms. And because a lot of irregular verbs follow a regular pattern (e.g. stem-changing verbs), we might only have to supply a small fraction of the irregular forms.

    And with our free time, we can focus on adding information to each Sicilian word in the hash -- examples, etymology, usage notes, preferred literary form, variants, etc.

    In regard to synchronization ... Let me get something working. Once it's working, we could post it to GitHub and collaborate there.
     
  5. paul

    You make another good point though, Eryk: there's a lot of 'low hanging fruit'. Verb conjugations, for example, are an area of very high demand. Although they vary across the island, the major forms are well known and fairly predictable. There are probably another 1000 or so 'easy consensus words' as well.

    Did you see that we published the orthography today? It's available on the website.

    What data do you require to begin our project?
     
  6. Eryk

    True, but learners (like me) have to look up the forms a few times before they remember them. I could be wrong, but I think that a readily accessible list will help people learn the conjugations quickly.

    Yes. I wanted to ask you about that. I have been skimming the document, but I have not had a chance to sit down and read it in full yet.

    Standardization is always a good thing. And since we're at the very beginning of a dictionary project, the cost of implementing the new standard is low, so we should make every effort to use the new standard.

    The difficulty (for me) is knowing which entries need to be changed to conform to the new standard. Is it possible to create a list of the most important rules? Or would the list be so long that you just have to read the document?

    Dr. Dieli's work contains 12,060 unique entries of Sicilian words, for which he provides 19,091 translations. He has given us plenty of data.

    What we need is an easy way to annotate that data. So first, we need to list the information that we will collect for each word -- part of speech, etymology, preferred literary form, regional variations, usage notes and examples, etc. (That task is more or less done). Then, we need an easy way for you, me (and others) to supply the information.
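
    As a rough sketch of what one annotated entry might look like, here is a hash with the fields listed above and a helper that reports what is still missing. The field names and the helper missing_fields are assumptions, and the values are placeholders rather than verified data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

##  the information to collect for each word (hypothetical field names)
my @fields = qw(part_of_speech etymology preferred_form variants usage_notes examples);

##  a partly annotated entry; empty lists mean "nothing supplied yet"
my %entry = (
    part_of_speech => "verb",
    preferred_form => "parrari",
    variants       => [],
    examples       => [],
);

##  list the fields that are undefined, empty strings or empty lists
sub missing_fields {
    my ($entry) = @_;
    return grep {
        my $value = $entry->{$_};
        !defined($value)
            || ( ref($value) eq "ARRAY" ? !@{$value} : $value eq "" )
    } @fields;
}

print "still needed: ", join( ", ", missing_fields( \%entry ) ), "\n";
```

    A report like this could tell annotators where their help is needed most.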

    Once we have an easy way for everyone to annotate, we can begin collaborating.

    I will try to put something together soon, but please be patient. (I have some economics classes to teach). Hopefully, we can have a system in place and begin working together before Thanksgiving.
     
  7. fissatu

    fissatu (Member, Staff Member)

    There are a lot of regular verbs, but one complication, even for the regular verbs, is the vowel shifts, which happen a lot, e.g. Purtari - portu, porti, porta, purtamu, purtati, pòrtanu.
    On the plus side, there are some tenses which have very few irregularities.
     
  8. Eryk

    Ciau Fissatu! Thanks for your response.

    Yes, but notice that the vowel shifts follow a "boot" pattern:

    portu purtamu
    porti purtati
    porta pòrtanu

    From a computer programming perspective, this irregularity is regular enough that all we have to do is tag the verb as a "stem changing verb" and write a rule that tells the computer to follow the "boot" pattern.
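
    A minimal sketch of such a rule, assuming each stem-changing verb stores a second stem (here called boot_stem, a hypothetical name) that is used for the four persons inside the "boot". Accents are left out to keep the example short.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @people  = qw(us ds ts up dp tp);
my %endings = ( us => "u",   ds => "i",   ts => "a",
                up => "amu", dp => "ati", tp => "anu" );

##  persons inside the "boot": all singular forms plus the 3rd person plural
my %in_boot = map { $_ => 1 } qw(us ds ts tp);

my %verbs = (
    purtari => { stem => "purt", boot_stem => "port" },   ##  tagged as stem-changing
    parrari => { stem => "parr" },                        ##  no stem change
);

sub present_indicative {
    my ($infinitive) = @_;
    my $verb = $verbs{$infinitive} or die "unknown verb: $infinitive";
    return map {
        my $stem = ( $in_boot{$_} && $verb->{boot_stem} )
                 ? $verb->{boot_stem}
                 : $verb->{stem};
        $stem . $endings{$_}
    } @people;
}

print join( " ", present_indicative($_) ), "\n" for qw(purtari parrari);
```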
     
  9. fissatu

    True enough, and also the i/e versions as well.
    It would certainly be worthwhile having a resource that distinguishes such verbs, noting that not all verbs fall into that group; for example, mustrari retains the "u" throughout. It's very much a case-by-case basis: checking each verb out, and then triggering the relevant flag.
     
  10. Eryk

    Below is a Perl script that conjugates three regular verbs (parrari, rispunniri, finiri) and directs the output to HTML files.

    Nothing is more confusing than a table of raw verb endings. I am sure that there are several mistakes, but I need a fresh mind to find them.

    Code:
    #!/usr/bin/env perl
    
    ##  Eryk Wdowiak
    ##  04 Nov 2017
    
    ##  perl script to conjugate Sicilian verbs
    
    ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##
    
    use strict;
    use warnings;
    
    my %vnotes = mk_vnotes() ;
    my %hashfn = mk_hashfn() ;
    
    mk_html("parrari") ;
    mk_html("rispunniri") ;
    mk_html("finiri") ;
    
    ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ## 
    
    sub mk_html {
        my $palora = $_[0] ;
        my $otfile = $palora . ".html" ;
      
        ##  three-argument open with a lexical filehandle; report the system error
        open( my $ot_fh , ">" , $otfile ) || die "could not overwrite:  $otfile: $!";
        print $ot_fh '<!DOCTYPE html>' . "\n" ;
        print $ot_fh '<html>' . "\n" ;
        print $ot_fh '<head>' . "\n" ;
        print $ot_fh '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . "\n" ;
        print $ot_fh '</head>' . "\n" ;
        print $ot_fh '<body>' . "\n" ;
        print $ot_fh $hashfn{conjugate}->(\%{$vnotes{$palora}}) . "\n" ;
        print $ot_fh '</body>' . "\n" ;
        print $ot_fh '</html>' . "\n" ;
        close $ot_fh ;
    }
    
    ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ## 
    
    sub mk_vnotes {
        my %vnotes = (
       parrari => {
           verb => {
           conj => "xxari",
           type => "reglr",
           stem => "parr" ,
           },
       },
       rispunniri => {
           verb => {
           conj => "xxiri",
           type => "reglr",
           stem => "rispunn",
           },
       },
       finiri => {
           verb => {
           conj => "sciri",
           type => "reglr",
           stem => "fin",
           },
       } ,
       );
        return %vnotes ;
    }
    
    ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ##  ## 
    
    sub mk_hashfn {
        my %hashfn = (
       conjugate => sub {
           my %palora = %{ $_[0] } ;
          
           my @tenses = ("pri","prs","pai","imi","ims","fti","coi","ger","pap") ;
           my %tnhash = ( "pri" => "present ind." ,
                  "prs" => "imperative" ,
                  "pai" => "past ind. (preterite)" ,
                  "imi" => "imperfect ind." ,
                  "ims" => "imperfect subj." ,
                  "fti" => "future ind." ,
                  "coi" => "conditional ind." ,
                  "ger" => "gerund" ,
                  "pap" => "past participle",
           ) ;
           my @people = ("us","ds","ts","up","dp","tp") ;
          
           my %vbconj = (
           xxari => {
               pri => {
               us => "u"   , ds => "i"   , ts => "a"   ,
               up => "àmu" , dp => "àti" , tp => "anu" ,
               },
               prs => { ##  us => "u"   ,
               ds => "a"   , ts => "assi"   ,
               up => "amu" , dp => "ati" , tp => "àssiru" ,
               },
               pai => {
               us => "ai"   , ds => "asti"   , ts => "au"   ,
               up => "ammu" , dp => "astivu" , tp => "aru" ,
               },
               imi => {
               us => "ava"   , ds => "avi"   , ts => "ava"   ,
               up => "àvamu" , dp => "àvavu" , tp => "àvanu" ,
               },
               ims => {
               us => "assi"   , ds => "assi"   , ts => "assi"   ,
               up => "àssimu" , dp => "àssivu" , tp => "àssiru" ,
               },
               fti => {
               us => "irò"   , ds => "irai"  , ts => "irà"    ,
               up => "iremu" , dp => "iriti" , tp => "irannu" ,
               },
               coi => {
               us => "iria"   , ds => "irissi" , ts => "iria" ,
               up => "iriamu" , dp => "iriavu" , tp => "irianu" ,
               },
               ger => "annu",
               pap => "atu",
           },
           xxiri => {
               pri => {
               us => "u"   , ds => "i"   , ts => "i"   ,
               up => "èmu" , dp => "ìti" , tp => "unu" ,
               },
               prs => { ##  us => "u"   ,
               ds => "i"   , ts => "issi"   ,
               up => "emu" , dp => "iti" , tp => "ìssiru" ,
               },
               pai => {
               us => "ivi"  , ds => "isti"   , ts => "iu"  ,
               up => "emmu" , dp => "istivu" , tp => "eru" ,
               },
               imi => {
               us => "ìa"   , ds => "evi"   , ts => "ìa"     ,
               up => "ìamu" , dp => "èvavu" , tp => "ìavanu" ,
               },
               ims => {
               us => "issi"   , ds => "issi"   , ts => "issi"   ,
               up => "ìssimu" , dp => "ìssivu" , tp => "ìssiru" ,
               },
               fti => {
               us => "irò"   , ds => "irai"  , ts => "irà"    ,
               up => "iremu" , dp => "iriti" , tp => "irannu" ,
               },
               coi => {
               us => "iria"   , ds => "irissi" , ts => "iria" ,
               up => "iriamu" , dp => "iriavu" , tp => "irianu" ,
               },
               ger => "ennu",
               pap => "utu",  
           },
           sciri => {
               pri => {
               us => "isciu" , ds => "isci" , ts => "isci"    ,
               up => "èmu"   , dp => "ìti"  , tp => "isciunu" ,
               },
               prs => { ##  us => "u"   ,
               ds => "isci" , ts => "issi"   ,
               up => "emu" , dp => "iti"  , tp => "ìssiru" ,
               },
               pai => {
               us => "ivi"  , ds => "isti"   , ts => "iu"  ,
               up => "emmu" , dp => "istivu" , tp => "eru" ,
               },
               imi => {
               us => "ìa"   , ds => "evi"   , ts => "ìa"     ,
               up => "ìamu" , dp => "èvavu" , tp => "ìavanu" ,
               },
               ims => {
               us => "issi"   , ds => "issi"   , ts => "issi"   ,
               up => "ìssimu" , dp => "ìssivu" , tp => "ìssiru" ,
               },
               fti => {
               us => "irò"   , ds => "irai"  , ts => "irà"    ,
               up => "iremu" , dp => "iriti" , tp => "irannu" ,
               },
               coi => {
               us => "iria"   , ds => "irissi" , ts => "iria" ,
               up => "iriamu" , dp => "iriavu" , tp => "irianu" ,
               },
               ger => "ennu" ,
               pap => "utu"  ,
           },
           ) ;
          
           ##  prepare output
           my $ot ;
          
           ##  PRI -- present indicative
           $ot .= '<div class="row">' . "\n" ;
           $ot .= '<p style="margin-bottom: 0.5em"><u>' . $tnhash{pri} . '</u></p>' . "\n" ;
           $ot .= '<div class="col-m-6 col-6">' . "\n" ;
           foreach my $person (@people[0..2]) {
           $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' ;
           $ot .= $palora{verb}{stem} . $vbconj{ $palora{verb}{conj} }{pri}{$person} ;
           $ot .= '</p>' . "\n" ;
           }
           $ot .= '</div>' . "\n" ;
           $ot .= '<div class="col-m-6 col-6">' . "\n" ;
           foreach my $person (@people[3..5]) {
           $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' ;
           $ot .= $palora{verb}{stem} . $vbconj{ $palora{verb}{conj} }{pri}{$person} ;
           $ot .= '</p>' . "\n" ;
           }
           $ot .= '</div>' . "\n" ;   
           $ot .= '</div>' . "\n" ;
          
           ##  PRS -- imperative (no first person singular)
           $ot .= '<div class="row">' . "\n" ;
           $ot .= '<p style="margin-bottom: 0.5em"><u>' . $tnhash{prs} . '</u></p>' . "\n" ;
           $ot .= '<div class="col-m-6 col-6">' . "\n" ;
           $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' . '--' . '</p>' . "\n" ;
           foreach my $person (@people[1..2]) {
           $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' ;
           $ot .= $palora{verb}{stem} . $vbconj{ $palora{verb}{conj} }{prs}{$person} ;
           $ot .= '</p>' . "\n" ;
           }
           $ot .= '</div>' . "\n" ;
           $ot .= '<div class="col-m-6 col-6">' . "\n" ;
           foreach my $person (@people[3..5]) {
           $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' ;
           $ot .= $palora{verb}{stem} . $vbconj{ $palora{verb}{conj} }{prs}{$person} ;
           $ot .= '</p>' . "\n" ;
           }
           $ot .= '</div>' . "\n" ;   
           $ot .= '</div>' . "\n" ;
          
           ##  PAI -- past ind. (preterite)
           ##  IMI -- imperfect ind.
           ##  IMS -- imperfect subjunctive
           ##  FTI -- future indicative
           ##  COI -- conditional indicative
           foreach my $tense (@tenses[2..6]) {
           $ot .= '<div class="row">' . "\n" ;
           $ot .= '<p style="margin-bottom: 0.5em"><u>' . $tnhash{$tense} . '</u></p>' . "\n" ;
           $ot .= '<div class="col-m-6 col-6">' . "\n" ;
           foreach my $person (@people[0..2]) {
               $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' ;
               $ot .= $palora{verb}{stem} . $vbconj{ $palora{verb}{conj} }{$tense}{$person} ;
               $ot .= '</p>' . "\n" ;
           }
           $ot .= '</div>' . "\n" ;
           $ot .= '<div class="col-m-6 col-6">' . "\n" ;
           foreach my $person (@people[3..5]) {
               $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' ;
               $ot .= $palora{verb}{stem} . $vbconj{ $palora{verb}{conj} }{$tense}{$person} ;
               $ot .= '</p>' . "\n" ;
           }
           $ot .= '</div>' . "\n" ;   
           $ot .= '</div>' . "\n" ;
           }
          
           ##  GER -- gerund
           ##  PAP -- past participle
           $ot .= '<div class="row">' . "\n" ;
           $ot .= '<div class="col-m-6 col-6">' . "\n" ;
           $ot .= '<p style="margin-bottom: 0.5em"><u>' . $tnhash{ger} . '</u></p>' . "\n" ;
           $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' ;
           $ot .= $palora{verb}{stem} . $vbconj{ $palora{verb}{conj} }{ger} ;
           $ot .= '</p>' . "\n" ;
           $ot .= '</div>' . "\n" ;
           $ot .= '<div class="col-m-6 col-6">' . "\n" ;
           $ot .= '<p style="margin-bottom: 0.5em"><u>' . $tnhash{pap} . '</u></p>' . "\n" ;
           $ot .= '<p style="margin-top: 0em; margin-bottom: 0em">' ;
           $ot .= $palora{verb}{stem} . $vbconj{ $palora{verb}{conj} }{pap} ;
           $ot .= '</p>' . "\n" ;
           $ot .= '</div>' . "\n" ;   
           $ot .= '</div>' . "\n" ;
          
           return $ot ;
       }
       ) ;
        ##  return the hash explicitly, as mk_vnotes does
        return %hashfn ;
    }
    
    
     
  11. paul

    @Eryk , when you have some time and you're ready to implement this, feel free to contact me and i'll collaborate. @fissatu is on the working group so he understands the standard quite thoroughly. @Salvatore is more busy but he's the primary author as well. If you let me know I can also get you admin access to the website, or at least a FTP account if you want to put it on a subdomain/subdirectory.
     
  12. fissatu

    I've been meaning to do a proof-read, but just haven't got round to it yet.
     
  13. Eryk

    Thanks! I was working on it today. Your help would be greatly appreciated.

    Attached are two ZIP files. Inside one of those ZIP files is an MS Excel spreadsheet with 25 conjugated verbs:
    • 2 everywhere irregular -- aviri, essiri
    • 6 regular -- finiri, parrari, purtari, ripetiri, rispunniri, sèntiri
    • 17 irregular -- dari, diri, cridiri, fari, iri, mettiri, ntènniri, pariri, perdiri, poniri, putiri, sapiri, stari, teniri, veniri, vidiri, vuliri
    Each of the verbs was automatically conjugated with the Perl script inside of the other ZIP file. I need to know where the errors are. And if possible, what pattern is producing the errors? To help find those errors, I have posted the hash of raw verb endings below.

    Thanks in advance!

    It would be nice to get there soon, but something tells me that I will need a bit more time. I'll keep working on it. Thanks!

    Code:
    "pri" => "present ind." ,
    "pim" => "present imperative" ,
    "pai" => "past ind. (preterite)" ,
    "imi" => "imperfect ind." ,
    "ims" => "imperfect subj." ,
    "fti" => "future ind." ,
    "coi" => "conditional ind." ,
    "ger" => "gerund" ,
    "pap" => "past participle"
    
    Code:
    sub mk_conj {
    
        ##  hash to create
        my %vbconj ;
     
        ##  same endings for both -ARI and -IRI
        my %xfti = ( us => "irò"    , ds => "irai"   , ts => "irà"    ,
            up => "iremu"  , dp => "iriti"  , tp => "irannu" ) ;
        my %xcoi = ( us => "iria"   , ds => "irissi" , ts => "iria"   ,
            up => "iriamu" , dp => "iriavu" , tp => "irianu" ) ;
    
        ##  the -ARI endings
        my %ari = (
       ##  same throughout -ARI
       pai => { us => "ai"     , ds => "asti"   , ts => "au"      ,
            up => "ammu"   , dp => "astivu" , tp => "aru"    },
       imi => { us => "ava"    , ds => "avi"    , ts => "ava"     ,
            up => "àvamu"  , dp => "àvavu"  , tp => "àvanu"  },
       ims => { us => "assi"   , ds => "assi"   , ts => "assi"    ,
            up => "àssimu" , dp => "àssivu" , tp => "àssiru" },
       fti => { %xfti } ,
       coi => { %xcoi } ,
       ger => "annu" ,
       pap => "atu"  ,
       ##   -xxARI   -CIARI   -GIARI   -xIARI
       pri => {
           xxari => {
           us => "u"   , ds => "i"   , ts => "a"   ,
           up => "àmu" , dp => "àti" , tp => "anu" },
           ciari => {
           us => "ciu"   , ds => "ci"    , ts => "cia"   ,
           up => "ciamu" , dp => "ciati" , tp => "cianu" },
           giari => {
           us => "giu"   , ds => "gi"    , ts => "gia"   ,
           up => "giamu" , dp => "giati" , tp => "gianu" },
           xiari => {
           us => "ìu"   , ds => "ìi"   , ts => "ìa"   ,
           up => "iàmu" , dp => "iàti" , tp => "ìanu" }
       },
       pim => {
           xxari => { ##  us => "u"   ,
                           ds => "a"     , ts => "assi"   ,
           up => "amu"   , dp => "ati"   , tp => "àssiru" },
           ciari => { ##  us => "ciu"   ,
                           ds => "cia"   , ts => "ciassi"   ,
           up => "ciamu" , dp => "ciati" , tp => "ciàssiru" },
           giari => { ##  us => "giu"   ,
                           ds => "gia"   , ts => "giassi"   ,
           up => "giamu" , dp => "giati" , tp => "giàssiru" },
           xiari => { ##  us => "u"   ,
                           ds => "a"     , ts => "assi"   ,
           up => "amu"   , dp => "ati"   , tp => "àssiru" }
       }
       ) ;
     
        ##  the -IRI endings
        my %iri = (
       ##  same throughout -IRI
       pai => { us => "ivi"    , ds => "isti"   , ts => "iu"      ,
            up => "emmu"   , dp => "istivu" , tp => "eru"    },
       imi => { us => "ìa"     , ds => "evi"    , ts => "ìa"      ,
            up => "ìamu"   , dp => "èvavu"  , tp => "ìavanu" },
       ims => { us => "issi"   , ds => "issi"   , ts => "issi"    ,
            up => "ìssimu" , dp => "ìssivu" , tp => "ìssiru" },
       fti => { %xfti } ,
       coi => { %xcoi } ,
       ger => "ennu" ,
       pap => "utu"  ,
       ##  -xxIRI   -SCIRI
       pri => {
           xxiri => {
           us => "u"   , ds => "i"   , ts => "i"   ,
           up => "èmu" , dp => "ìti" , tp => "unu" },
           sciri => {
           us => "isciu" , ds => "isci" , ts => "isci"    ,
           up => "èmu"   , dp => "ìti"  , tp => "isciunu" }
       },
       pim => {
           xxiri => { ##  us => "u"   ,
                         ds => "i"   , ts => "issi"   ,
           up => "emu" , dp => "iti" , tp => "ìssiru" },
           sciri => { ##  us => "u"   ,
                         ds => "isci" , ts => "issi"   ,
           up => "emu" , dp => "iti"  , tp => "ìssiru" }
       }
       ) ;
    
        %{ $vbconj{ari} } = %ari ;
        %{ $vbconj{iri} } = %iri ;
        return %vbconj;
    }
    
     

    Last edited: 11 November 2017
  14. paul

    @Eryk , Do you think we should have the code run live on the site or just have it create a bunch of static output files? I'm downloading your outputs now. We should get David in here too. He admins the organisation's GitHub.
     
  15. Eryk

    Live. Definitely. Live. ;)

    The project is smaller than one might expect. My plan is to create two Perl scripts that create an interface to a "database" of words. I put "database" in quotes because it's not really a database. Instead it's a set of four Perl storables -- one is the storable of annotations that generated the conjugations, the other three are the storables of Sicilian, English and Italian words collected by Arthur Dieli.

    One of the Perl scripts uses Dr. Dieli's dictionary to translate Sicilian-English and Sicilian-Italian. The other Perl script will use the storable of annotations to generate conjugations and list whatever detailed information we choose to provide about a given Sicilian word.

    Today, my goal is to integrate the two Perl scripts together, so that (for example) when a learner looks up the word "speak," they see the translated word "parrari," which links to the annotations and conjugations.
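
    Roughly, the integration could look like the sketch below: look up the English word, and turn each Sicilian translation into a link whenever annotations exist for it. The two small hashes stand in for the real storables, and the "?word=" query string and the name lookup_html are assumptions, not the actual interface of sicilian.pl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

##  stand-ins for the dictionary storable and the annotation storable
my %english_to_sicilian = ( speak => ["parrari"] );
my %vnotes = ( parrari => { verb => { conj => "xxari", stem => "parr" } } );

sub lookup_html {
    my ($english) = @_;
    my $words = $english_to_sicilian{ lc $english }
        or return "<p>no entry</p>";
    my @items;
    for my $word ( @{$words} ) {
        ##  link only the words that have annotations and conjugations
        push @items, ( exists $vnotes{$word} )
            ? qq{<a href="?word=$word">$word</a>}
            : $word;
    }
    return "<p>" . join( ", ", @items ) . "</p>";
}

print lookup_html("speak"), "\n";
```

    Words with accented letters would need to be URL-escaped before being placed in the link; that step is omitted here.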

    Once the two Perl scripts are integrated, all we have left to do is annotate annotate annotate. ... At least in theory anyway. In practice, we will probably spend a good amount of time adding features to the Perl scripts.


    Excellent! :)


    Thank you! To make it easier for you to find the errors in the automatic conjugations, I uploaded a revised version in which the automatically generated conjugations are marked in bold. Thanks in advance for your help.
     
  16. Eryk

    That task was much easier than I thought it would be. It's already done.

    Check it out: http://www.wdowiak.me/cgi-bin/sicilian.pl If you search for any of the 25 verbs, there will be a link to the conjugations.

    The 25 verbs are: aviri, cridiri, dari, diri, essiri, fari, finiri, iri, mettiri, ntènniri, pariri, parrari, purtari, perdiri, poniri, putiri, ripetiri, rispunniri, sapiri, sèntiri, stari, teniri, veniri, vidiri, vuliri
     
  17. fissatu

    I'll undertake to do a proof-read of those two files over the next few hours.
    In the meantime, not sure how these appear in your script, but looking at the above list:
    - according to the newly approved orthography, we would favour writing iri as jiri (including all the conjugations where relevant)
    - essiri should be èssiri
    - mettiri should be mèttiri (while I'm here - do we acknowledge the form mintiri?)
    - perdiri should be pèrdiri
    - poniri should be pòniri
    - ripetiri should be ripètiri
    - rispunniri should be rispúnniri, if we are using the two accents (while I'm here - do we try and tackle longer forms such as arrispúnniri?)
    - teniri should be tèniri
    - veniri should be vèniri
    cheers
    will return once proof-read is complete - I won't look for the accents again because they need to be reflected in every case where the stress doesn't fall on the penultimate, which means the 3rd person plural for most (if not all) present indicative conjugations will have an accent
     
  18. fissatu

    Looking through the spreadsheet now. Most of my comments will be more questions about what the group views as being the closest to the "standard":
    - present indicative of aviri: do we shun the forms haiu, hai, havi and hannu? (Pitrè's grammatica shows the "h" forms)
    - present indicative of èssiri: question on whether sì and è should be si' and e' (as shown by Pitrè, but also I think favoured by Salva)
    - present indicative of finiri: question of whether 3rd pers plur should be finìscinu; also finemu does NOT need an accent
    - imperfect for finiri: should 2nd pers sing be finivi; 2nd pers plur finìavu; 3rd pers plur finìanu (this goes for all -iri verbs except for those overly irregular)
    - present indicative of parrari - parramu and parrati do NOT need an accent (but pàrranu does)
    - same for purtari
    - present indicative of ripètiri - ripitemu and ripititi do not need accents; should be ripètinu for 3rd pers plur
    - preterite for ripètiri - all of those unstressed "e"s become "i"s
    - same for imperfect
    - general observation for present indicative 3rd pers plur: should end in -anu or -inu when more or less regular, and needs accent
    - question about whether first person sing for preterite takes -ivi or ìi
    - unclear why sentiri has been given all those accents, I think the vowel should change to "i" when stress moves from the "e"
    - preterite for dari - detti vs desi for 1st and 3rd pers sing
    - back to jiri: it should take the "j" under our orthography. For the preterite, I'm not really familiar with those versions starting with "v"; I would have thought: jìi, jisti, jìu, jemu, jìstivu, and I can't remember the 3rd pers plur, to be honest (jeru?)
    - also not familiar with those imperfect subj forms for jiri starting with "v"
    - with present ind mèttiri, vowel must change to "i" for mittemu and mittiti, and indeed for the other tenses
    - same with ntènniri
    - same with pèrdiri
    - pòniri vowel must change to "u" when stress moves off the "o"
    - stari - for imperfect ind and subj, are there really double "a"s like that?
    - tèniri and vèniri - vowel must shift to "i" when stress moves off the "e"
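
    The vowel changes noted in this list ("e" to "i" and "o" to "u" when the stress moves off the stem) could be sketched as a single transliteration. The function name unstressed_stem is hypothetical, and the rule is only a first approximation that would still need per-verb checking.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
binmode( STDOUT, ":encoding(UTF-8)" );

##  derive the unstressed stem from the stressed one:
##  every "e" becomes "i" and every "o" becomes "u"; accents are dropped
sub unstressed_stem {
    my ($stem) = @_;
    ( my $unstressed = $stem ) =~ tr/eèéoòó/iiiuuu/;
    return $unstressed;
}

for my $stem (qw(mètt pèrd pòn tèn vèn ripèt)) {
    printf "%s -> %s\n", $stem, unstressed_stem($stem);
}
```

    With this, the stem of mèttiri gives "mitt" (as in mittemu, mittiti) and the stem of pòniri gives "pun". A stem without "e" or "o", such as that of mustrari, passes through unchanged.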
     
  19. Eryk

    Grazii! Multu gintili!

    Thank you for reading through the spreadsheet. This is a big help. I will do my best to implement all of your suggestions. If all goes well, by the end of the week.

    In the meantime, let's classify some of your suggestions into two groups. One is a question of accents -- where and when to place them. The other is a question of alternative forms ("mèttiri vs. mintiri" and "rispúnniri vs. arrispúnniri").

    In regard to accents ... Unlike the human ear, a computer cannot divide a word into syllables, so a computer cannot know where to place an accent. Therefore, we need another way of arriving at the same result.

    I think we can get the right result if we "accent everything, then remove accents." Specifically, each conjugation has two parts: a stem and an ending. So let's put accents on both the stems and the endings, then let's remove the accent from the stem, the ending or both.

    For example, the stem of parrari is "pàrr" and the present indicative endings are: "-u, -i, -a, -àmu, -àti, -anu." In the singular forms, the accent falls on the penultimate, so we drop it from the stem:

    pàrr + u --> pàrru --> parru
    pàrr + i --> pàrri --> parri
    pàrr + a --> pàrra --> parra

    In the first and second person plural forms, the result is a (wrong) double accent, so we remove the first. Then we drop the remaining accents because the accent falls on the penultimate:

    pàrr + àmu --> p(à)rràmu --> parràmu --> parramu
    pàrr + àti --> p(à)rràti --> parràti --> parrati

    In the third person plural, there is no change because the accent does not fall on the penultimate:

    pàrr + anu --> pàrranu

    In practice, this means the third person plural will always have an accent in the present indicative, but none of the others will have an accent. That's an easy rule to write: "Drop all accents except in the third person plural."
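
    The "accent everything, then remove accents" procedure can be sketched as below. Instead of true syllable division, it counts the vowels after the accented one: exactly one following vowel is taken to mean that the stress is on the penultimate. That shortcut is an assumption that treats every vowel as its own syllable, so it ignores diphthongs. The function names are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
binmode( STDOUT, ":encoding(UTF-8)" );

my $ACCENTED = qr/[àèìòùáéíóú]/;
my $VOWEL    = qr/[aeiouàèìòùáéíóú]/;

sub strip_accents {
    my ($text) = @_;
    $text =~ tr/àèìòùáéíóú/aeiouaeiou/;
    return $text;
}

sub join_with_accent {
    my ( $stem, $ending ) = @_;
    ##  step 1: a double accent is wrong, so the accent on the ending wins
    $stem = strip_accents($stem) if $ending =~ $ACCENTED;
    my $word = $stem . $ending;
    ##  step 2: drop the accent if exactly one vowel follows it
    if ( $word =~ /$ACCENTED(.*)\z/s ) {
        my $tail        = $1;
        my $vowels_left = () = $tail =~ /$VOWEL/g;
        $word = strip_accents($word) if $vowels_left == 1;
    }
    return $word;
}

print join( " ", map { join_with_accent( "pàrr", $_ ) } qw(u i a àmu àti anu) ), "\n";
```

    For the stem "pàrr" this should reproduce the worked example: parru, parri, parra, parramu, parrati, pàrranu.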

    Stepping away from computers for a moment ... Are all Sicilian verbs stem-changing verbs? It seems to me that the "boot" pattern appears in placement of the accent:

    pàrru parràmu
    pàrri parràti
    pàrra pàrranu

    The verb "parrari" is not classified as a stem-changing verb, but the accent shifts follow the same pattern as stem-changing verbs.


    Finally, in regard to alternative forms like "mèttiri vs. mintiri" and "rispúnniri vs. arrispúnniri" ... There is no sense in doing the same work twice, especially in cases where one form is more popular than the alternatives. But the alternatives should also be included.

    In the next version, let's include some sort of "switch," so that regardless of which word you click, you are taken to the "preferred literary form." And in that space, we can list the alternatives, explain which is preferred, describe regional differences, etc.
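
    One way to sketch that "switch" is a hash that maps each alternative form to its headword. Which form counts as preferred is for the group to decide; the choices below are assumptions for illustration, and the names preferred_for, variants_of and resolve are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;
binmode( STDOUT, ":encoding(UTF-8)" );

##  alternative form => preferred literary form (assumed, not decided)
my %preferred_for = (
    "mintiri"      => "mèttiri",
    "arrispúnniri" => "rispúnniri",
);

##  reverse index, so the headword page can list its alternatives
my %variants_of;
push @{ $variants_of{ $preferred_for{$_} } }, $_ for sort keys %preferred_for;

##  whichever word is clicked, return the headword to display
sub resolve {
    my ($word) = @_;
    return $preferred_for{$word} // $word;
}

for my $clicked (qw(mintiri mèttiri arrispúnniri)) {
    my $headword = resolve($clicked);
    my @variants = @{ $variants_of{$headword} || [] };
    print "$clicked -> $headword (variants: @variants)\n";
}
```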


    Thanks again for your help. I will try to put out a new version that includes your suggestions by the end of the week.
     
    Last edited: 16 November 2017
  20. Karl Farrugia

    Karl Farrugia (Verified Member, New Member)

    May I suggest having a look at the Norwegian official dictionary online for inspiration on what can be done?

    http://ordbok.uib.no/perl/ordbok.cgi?OPP=leie&ant_bokmaal=5&ant_nynorsk=5&nynorsk=+&ordbok=begge

    If you take a look at this word, for example, and you go down to the verb and click on verb or v1, it opens a table with the full conjugation, which also shows the different forms of the verb in the infinitive and the different conjugated forms that are allowed, taken from different dialects. For each different meaning or use, there are a number of example phrases, especially useful for a learner to see what prepositions may be used after the verb, for example.

    I understand that the situation in Sicilian is very different; however, it might give you some useful ideas that may be borrowed. For the benefit of those who may not be familiar with the linguistic situation in Norway, there are 2 written standards and one of them, Nynorsk, was devised in the 19th century with the aim of providing a written form that is closer to the dialectal varieties in the country, as opposed to the Danish influence seen in the other standard, Bokmål. There is no such thing as a standard spoken Norwegian, and everyone just uses their dialect in all situations. This is why you can have a situation like the example I gave where one verb may have 3 (in some cases even 6) different conjugation patterns and 2 (in some cases 4) different infinitive forms, and all of them are considered correct.
     
