HTML pages updated
michnov committed Nov 6, 2024
1 parent ed2292d commit 15ab2c7
Showing 6 changed files with 1,284 additions and 23 deletions.
59 changes: 45 additions & 14 deletions data_preparation/70.releasing/html/db_residency-cs.html
@@ -149,20 +149,51 @@
<div style='margin-top: 0px; margin-bottom: 20px;'><span class=langon>EN</span> | <span class=langoff><a href='/services/teitok-live/evaldio/cs/index.php?action=db_residency-cs'>CS</a></span></div><p class='header'><a href='index.php'>Evaldio</a></p><ul style='text-align: left'><li><a href='index.php?action=databases'>Databases</a><li><a href='index.php?action=browser'>Browse</a><li><a href='index.php?action=cqp'>Search</a><li><a target="repository" href='http://hdl.handle.net/11234/1-5731'>Download</a></ul><ul style='text-align: left'><li><a href='index.php?action=login' >Login</a></ul><hr style='opacity: 0.5; margin-top: 40px;'><p id=powby style='opacity: 0.5; font-size: smaller;'><span onClick="window.open('http://www.teitok.org/index.php', 'teitok');">Powered by <span style='font-family: Courier;'>&lt;TEI:TOK&gt;</span></span><br><span onClick="window.open('http://www.teitok.org/index.php?action=credits', 'teitok');">Maarten Janssen, 2014-</span></p>
</div>
<div id="main">
<h1>Datab&aacute;ze mluven&yacute;ch projevů v če&scaron;tině jako ciz&iacute;m jazyce (trval&yacute; pobyt v ČR)</h1>
<p dir="auto">Jazykov&yacute; korpus byl vytvořen v &Uacute;stavu form&aacute;ln&iacute; a aplikovan&eacute; lingvistiky Matematicko-fyzik&aacute;ln&iacute; fakulty Univerzity Karlovy za &uacute;čelem podpory v&yacute;uky, v&yacute;zkumu a hodnocen&iacute; jazykov&eacute; kompetence nerodil&yacute;ch mluvč&iacute;ch če&scaron;tiny. C&iacute;lem je poskytnout strukturovan&yacute; a snadno př&iacute;stupn&yacute; zdroj autentick&yacute;ch mluven&yacute;ch dat pro lingvisty, pedagogy, studenty, veřejnost a vědeckou komunitu. Korpus se zaměřuje na jazykovou &uacute;roveň A2, kter&aacute; je potřebn&aacute; pro udělen&iacute; trval&eacute;ho pobytu v Česk&eacute; republice. Audionahr&aacute;vky pro datab&aacute;zi poskytl &Uacute;stav jazykov&eacute; a odborn&eacute; př&iacute;pravy Univerzity Karlovy (ujop.cuni.cz).</p>
<h3><a href="index.php?action=browser&amp;class=database&amp;val=Datab&aacute;ze+mluven&yacute;ch+projevů+v+če&scaron;tině+jako+ciz&iacute;m+jazyce+%28trval&yacute;+pobyt+v+ČR%29">Vstup do korpusu &ndash; prohl&iacute;žen&iacute;</a></h3>
<h3><a href="https://lindat.mff.cuni.cz/services/teitok-live/evaldio/cs/index.php?action=cqp">Hled&aacute;n&iacute; v korpusu</a></h3>
<h3>Popis korpusu</h3>
<h3>Technick&aacute; dokumentace</h3>
<h3>Uživatelsk&aacute; př&iacute;ručka</h3>
<h3 dir="auto"><a href="https://ufal.mff.cuni.cz/automated-speech-scoring-czech">Str&aacute;nky projektu</a></h3>
<h3 dir="auto">Financov&aacute;n&iacute;</h3>
<p dir="auto">Vznik datab&aacute;ze byl financov&aacute;n z prostředků Programu na podporu aplikovan&eacute;ho v&yacute;zkumu v oblasti n&aacute;rodn&iacute; a kulturn&iacute; identity na l&eacute;ta 2023 až 2030 (NAKI III) Ministerstva kultury ČR v r&aacute;mci projektu <em>Automatick&eacute; hodnocen&iacute; mluven&eacute;ho projevu v če&scaron;tině</em> (DH23P03OVV037).</p>
<h3 dir="auto">Jak citovat</h3>
<p>Rysov&aacute; Kateřina, Nov&aacute;k Michal, Rysov&aacute; Magdal&eacute;na, Pol&aacute;k Peter, Bojar Ondřej: <em>Datab&aacute;ze mluven&yacute;ch projevů v če&scaron;tině jako ciz&iacute;m jazyce (trval&yacute; pobyt v ČR)</em>. &Uacute;stav form&aacute;ln&iacute; a aplikovan&eacute; lingvistiky MFF UK, Praha 2024. Dostupn&aacute; z WWW&nbsp;<a href="https://lindat.mff.cuni.cz/services/teitok-live/evaldio/cs/index.php?action=db_residency" rel="nofollow">https://lindat.mff.cuni.cz/services/teitok-live/evaldio/cs/index.php?action=db_residency</a>.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h1 id="datab&aacute;ze-mluven&yacute;ch-projevů-v-če&scaron;tině-jako-ciz&iacute;m-jazyce-trval&yacute;-pobyt-v-čr">Datab&aacute;ze mluven&yacute;ch projevů v če&scaron;tině jako ciz&iacute;m jazyce (trval&yacute; pobyt v ČR)</h1>
<p>Datab&aacute;ze mluven&yacute;ch projevů v če&scaron;tině jako ciz&iacute;m jazyce (trval&yacute; pobyt v ČR) je jazykov&yacute; korpus mluven&yacute;ch projevů nerodil&yacute;ch mluvč&iacute;ch če&scaron;tiny zaměřen&yacute; na jazykovou &uacute;roveň A2 (podle SERR), požadovanou pro udělen&iacute; trval&eacute;ho pobytu v Česk&eacute; republice. Obsahuje nahr&aacute;vky zaznamen&aacute;vaj&iacute;c&iacute; &uacute;stn&iacute; č&aacute;st <a href="http://ujop.cuni.cz/cce">Certifikovan&eacute; zkou&scaron;ky z če&scaron;tiny pro cizince</a>. Nahr&aacute;vky zahrnuj&iacute; dialogy mezi zkou&scaron;ej&iacute;c&iacute;m (rodil&yacute;m mluvč&iacute;m) a kandid&aacute;tem zkou&scaron;ky (nerodil&yacute;m mluvč&iacute;m). Kromě nahr&aacute;vek korpus obsahuje tak&eacute; jejich přepisy, kter&eacute; jsou opatřeny bohatou lingvistickou anotac&iacute;. K někter&yacute;m nahr&aacute;vk&aacute;m je připojeno v&iacute;ce přepisů od různ&yacute;ch anot&aacute;torů, což umožňuje srovn&aacute;n&iacute; různ&yacute;ch přepisů t&eacute;že nahr&aacute;vky a vyhodnocen&iacute; m&iacute;ry shody při převodu mluven&eacute; řeči do psan&eacute;ho textu.</p>
<p>Korpus je zveřejněn jako specializovan&aacute; veřejn&aacute; datab&aacute;ze s c&iacute;lem poskytnout strukturovan&yacute; a snadno př&iacute;stupn&yacute; zdroj autentick&yacute;ch mluven&yacute;ch dat pro lingvisty, pedagogy, studenty, vědeckou komunitu a &scaron;irokou veřejnost.</p>
<p>Jazykov&yacute; korpus byl vytvořen v <a href="https://ufal.mff.cuni.cz/">&Uacute;stavu form&aacute;ln&iacute; a aplikovan&eacute; lingvistiky Matematicko-fyzik&aacute;ln&iacute; fakulty Univerzity Karlovy</a> za &uacute;čelem podpory v&yacute;uky, v&yacute;zkumu a hodnocen&iacute; jazykov&eacute; kompetence nerodil&yacute;ch mluvč&iacute;ch če&scaron;tiny v r&aacute;mci projektu <a href="https://ufal.mff.cuni.cz/automated-speech-scoring-czech"><em>Automatick&eacute; hodnocen&iacute; mluven&eacute;ho projevu v če&scaron;tině</em></a>. Audionahr&aacute;vky poskytl <a href="https://ujop.cuni.cz/">&Uacute;stav jazykov&eacute; a odborn&eacute; př&iacute;pravy Univerzity Karlovy</a> (ujop.cuni.cz).</p>
<h2 id="statistiky">Statistiky</h2>
<p>Datab&aacute;ze obsahuje 63 nahr&aacute;vek zachycuj&iacute;c&iacute;ch stejn&yacute; počet zkou&scaron;ek a stejn&yacute; počet nerodil&yacute;ch mluvč&iacute;ch. Celkov&aacute; d&eacute;lka v&scaron;ech nahr&aacute;vek je 3h 15min 40s. Tabulka n&iacute;že ukazuje statistiky přepisů, přičemž pro každou nahr&aacute;vku byl vybr&aacute;n pr&aacute;vě jeden kanonick&yacute; přepis.</p>
<table>
<thead>
<tr class="header">
<th>&nbsp;</th>
<th style="text-align: right;">V&scaron;echny</th>
<th style="text-align: right;">Kanonick&eacute;</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Soubory</td>
<td style="text-align: right;">106</td>
<td style="text-align: right;">63</td>
</tr>
<tr class="even">
<td>Repliky</td>
<td style="text-align: right;">4 773</td>
<td style="text-align: right;">2 888</td>
</tr>
<tr class="odd">
<td>Tokeny</td>
<td style="text-align: right;">33 267</td>
<td style="text-align: right;">20 035</td>
</tr>
</tbody>
</table>
<h2 id="dokumentace">Dokumentace</h2>
<ul>
<li><a href="index.php?action=db_residency_manual">Uživatelsk&aacute; př&iacute;ručka</a></li>
<li><a href="index.php?action=db_residency_techdoc">Technick&aacute; dokumentace</a></li>
</ul>
<h2 id="licence">Licence</h2>
<p>Korpus je zveřejněn pod licenc&iacute; CC BY-NC-SA 4.0.</p>
<h2 id="financov&aacute;n&iacute;">Financov&aacute;n&iacute;</h2>
<p>Vznik datab&aacute;ze byl financov&aacute;n z prostředků Programu na podporu aplikovan&eacute;ho v&yacute;zkumu v oblasti n&aacute;rodn&iacute; a kulturn&iacute; identity na l&eacute;ta 2023 až 2030 (NAKI III) Ministerstva kultury ČR v r&aacute;mci projektu <em>Automatick&eacute; hodnocen&iacute; mluven&eacute;ho projevu v če&scaron;tině</em> (DH23P03OVV037).</p>
<h2 id="poděkov&aacute;n&iacute;">Poděkov&aacute;n&iacute;</h2>
<p>Autoři datab&aacute;ze srdečně děkuj&iacute; PhDr. Pavlovi Pečen&eacute;mu, Ph.D., z &Uacute;stavu jazykov&eacute; a odborn&eacute; př&iacute;pravy Univerzity Karlovy za poskytnut&iacute; audiodat.</p>
<h2 id="jak-citovat">Jak citovat</h2>
<p>Rysov&aacute; Kateřina, Nov&aacute;k Michal, Rysov&aacute; Magdal&eacute;na, Pol&aacute;k Peter, Bojar Ondřej: <em>Datab&aacute;ze mluven&yacute;ch projevů v če&scaron;tině jako ciz&iacute;m jazyce (trval&yacute; pobyt v ČR)</em>. &Uacute;stav form&aacute;ln&iacute; a aplikovan&eacute; lingvistiky MFF UK, Praha 2024. Dostupn&aacute; z WWW <a href="https://lindat.mff.cuni.cz/services/teitok-live/evaldio/cs/index.php?action=db_residency">https://lindat.mff.cuni.cz/services/teitok-live/evaldio/cs/index.php?action=db_residency</a>.</p>
</div>
</div>

49 changes: 45 additions & 4 deletions data_preparation/70.releasing/html/db_residency.html
@@ -149,10 +149,51 @@
<div style='margin-top: 0px; margin-bottom: 20px;'><span class=langon>EN</span> | <span class=langoff><a href='/services/teitok-live/evaldio/cs/index.php?action=db_residency'>CS</a></span></div><p class='header'><a href='index.php'>Evaldio</a></p><ul style='text-align: left'><li><a href='index.php?action=databases'>Databases</a><li><a href='index.php?action=browser'>Browse</a><li><a href='index.php?action=cqp'>Search</a><li><a target="repository" href='http://hdl.handle.net/11234/1-5731'>Download</a></ul><ul style='text-align: left'><li><a href='index.php?action=login' >Login</a></ul><hr style='opacity: 0.5; margin-top: 40px;'><p id=powby style='opacity: 0.5; font-size: smaller;'><span onClick="window.open('http://www.teitok.org/index.php', 'teitok');">Powered by <span style='font-family: Courier;'>&lt;TEI:TOK&gt;</span></span><br><span onClick="window.open('http://www.teitok.org/index.php?action=credits', 'teitok');">Maarten Janssen, 2014-</span></p>
</div>
<div id="main">
<h1>Database of Spoken Czech as a Foreign Language (Permanent Residency in the Czech Republic)</h1>
<p><span style="font-size: 11pt; font-family: 'arial', sans-serif; color: #000000; background-color: transparent; font-weight: 400; font-style: normal; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap;">The database was funded by the Programme to Support Applied Research in the Area of the National and Cultural Identity for the Years 2023 to 2030 (NAKI III) of the Ministry of Culture of the Czech Republic within the project <em>Automated Speech Scoring in Czech</em> (DH23P03OVV037).</span></p>
<p>Rysov&aacute; Kateřina, Nov&aacute;k Michal, Rysov&aacute; Magdal&eacute;na, Pol&aacute;k Peter, Bojar Ondřej: <em>Database of Spoken Czech as a Foreign Language (Permanent Residency in the Czech Republic)</em>. Institute of Formal and Applied Linguistics MFF UK, Prague 2024. Available from WWW <a href="https://lindat.mff.cuni.cz/services/teitok-live/evaldio/en/index.php?action=db_residency" rel="nofollow">https://lindat.mff.cuni.cz/services/teitok-live/evaldio/en/index.php?action=db_residency</a>.</p>
<p><a href="index.php?action=browser&amp;class=database&amp;val=Datab&aacute;ze+mluven&yacute;ch+projevů+v+če&scaron;tině+jako+ciz&iacute;m+jazyce+%28trval&yacute;+pobyt+v+ČR%29">Enter Corpus</a></p>
<h1 id="database-of-spoken-czech-as-a-foreign-language-permanent-residency-in-the-czech-republic">Database of Spoken Czech as a Foreign Language (Permanent Residency in the Czech Republic)</h1>
<p>The Database of Spoken Czech as a Foreign Language (Permanent Residency in the Czech Republic) is a language corpus of spoken performances by non-native speakers of Czech, focused on the A2 level (according to the CEFR), which is required for the granting of permanent residency in the Czech Republic. It includes recordings of the oral part of the <a href="https://ujop.cuni.cz/UJOPEN-70.html?ujopcmsid=12:czech-language-certificate-exam-cce">Czech Language Certificate Exam</a>. The recordings consist of dialogues between the examiner (a native speaker) and the candidate (a non-native speaker). In addition to the recordings, the corpus contains their transcriptions, which carry rich linguistic annotation. Some recordings are accompanied by multiple transcriptions from different annotators, which allows different transcripts of the same recording to be compared and the degree of agreement in converting speech into written text to be evaluated.</p>
<p>The corpus is published as a specialized public database aimed at providing a structured and easily accessible source of authentic spoken data for linguists, educators, students, the scientific community, and the general public.</p>
<p>The corpus was created at the <a href="https://ufal.mff.cuni.cz/">Institute of Formal and Applied Linguistics at the Faculty of Mathematics and Physics, Charles University</a> to support teaching, research, and assessment of language competence among non-native speakers of Czech as part of the project <a href="https://ufal.mff.cuni.cz/automated-speech-scoring-czech"><em>Automated Speech Scoring in Czech</em></a>. Audio recordings were provided by the <a href="https://ujop.cuni.cz/UJOPEN-1.html">Institute for Language and Preparatory Studies, Charles University</a> (ujop.cuni.cz).</p>
<h2 id="statistics">Statistics</h2>
<p>The database contains 63 recordings, covering the same number of exams and the same number of non-native speakers. The total length of all recordings is 3 h 15 min 40 s. The table below shows the transcription statistics for all transcriptions and for the canonical subset, in which exactly one transcription was selected for each recording.</p>
<table>
<thead>
<tr class="header">
<th>&nbsp;</th>
<th style="text-align: right;">All</th>
<th style="text-align: right;">Canonical</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Files</td>
<td style="text-align: right;">106</td>
<td style="text-align: right;">63</td>
</tr>
<tr class="even">
<td>Utterances</td>
<td style="text-align: right;">4,773</td>
<td style="text-align: right;">2,888</td>
</tr>
<tr class="odd">
<td>Tokens</td>
<td style="text-align: right;">33,267</td>
<td style="text-align: right;">20,035</td>
</tr>
</tbody>
</table>
<h2 id="documentation">Documentation</h2>
<ul>
<li><a href="index.php?action=db_residency_manual">User Manual</a></li>
<li><a href="index.php?action=db_residency_techdoc">Technical Documentation</a></li>
</ul>
<h2 id="license">License</h2>
<p>The corpus is published under the CC BY-NC-SA 4.0 license.</p>
<h2 id="acknowledgment">Acknowledgment</h2>
<p>The database was funded by the Programme to Support Applied Research in the Area of the National and Cultural Identity for the Years 2023 to 2030 (NAKI III) of the Ministry of Culture of the Czech Republic within the project <em>Automated Speech Scoring in Czech</em> (DH23P03OVV037).</p>
<h2 id="special-thanks">Special Thanks</h2>
<p>The authors of the database sincerely thank PhDr. Pavel Pečen&yacute;, Ph.D., of the Institute for Language and Preparatory Studies, Charles University, for providing the audio data.</p>
<h2 id="how-to-cite">How to Cite</h2>
<p>Rysov&aacute; Kateřina, Nov&aacute;k Michal, Rysov&aacute; Magdal&eacute;na, Pol&aacute;k Peter, Bojar Ondřej: <em>Database of Spoken Czech as a Foreign Language (Permanent Residency in the Czech Republic)</em>. Institute of Formal and Applied Linguistics MFF UK, Prague 2024. Available from WWW https://lindat.mff.cuni.cz/services/teitok-live/evaldio/en/index.php?action=db_residency.</p>
</div>
</div>
