<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Zooniverse Startwords</title>
<meta name="description" content="Supplementary web content for the Zooniverse article published on Startwords in 2021/2022. Contains HTML + JavaScript examples for the multi-language transcription keyboard built for the Scribes of the Cairo Geniza website.">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#00979D">
<link href="./main.css" rel="stylesheet">
</head>
<body>
<header>
  <h1>Zooniverse Startwords</h1>
</header>
<main>
  <h2>Introduction</h2>
  <section>
    <p>
      Welcome! This website contains supplementary content for the <a href="https://www.zooniverse.org/" target="_blank">Zooniverse</a>
      article published on <a href="https://startwords.cdh.princeton.edu/" target="_blank">Startwords</a>
      in 2021/2022. It contains HTML + JavaScript examples for the multi-language
      transcription keyboard built for the <a href="https://www.scribesofthecairogeniza.org/" target="_blank">Scribes of the Cairo Geniza</a>
      website.
    </p>
    <p>
      The following sections describe <b>how to build your own basic on-screen
      multi-language transcription keyboard</b> from scratch, and are designed
      for <i>researchers and programmers with some understanding of HTML and
      JavaScript.</i>
    </p>
    <p>
      If you're a more experienced web developer who's very familiar with React,
      you may want to jump straight into the <a href="https://github.com/zooniverse/scribes-of-the-cairo-geniza" target="_blank">Scribes of the Cairo Geniza repo</a>,
      possibly starting with the <a href="https://github.com/zooniverse/scribes-of-the-cairo-geniza/blob/master/src/components/AnnotationKeyboard.jsx" target="_blank">AnnotationKeyboard</a>
      component.
    </p>
  </section>
  <h2>An HTML Form</h2>
  <section>
    <a name="section-01"></a>
    <h3>01. The Basics: A Form With Some Text Input</h3>
    <p>
      Let's start by setting up a very basic web form. It has one text input
      field, one submit button, and one output panel.
    </p>
    <iframe src="./section-01.html" width="640" height="240"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-01.html" target="_blank">(view source code)</a>
    <p>
      This example breaks down the idea of a "text transcription website" to its
      simplest form. The text input field is what the user types into, and the
      output panel is (let's pretend) the server they're submitting their
      transcriptions to.
    </p>
    <p>
      Everything we build from this point onwards is meant to solve one very
      simple problem: <b>how do we allow users to type, into that text input
      field, in a language their keyboard isn't set up for?</b> For example, how do we help
      a user type in the text "ごはんを食べる" when they only have a
      US-International QWERTY keyboard, and we don't want to ask them to futz
      about in their computer settings to install a Japanese language pack?
    </p>
  </section>
  <section>
    <a name="section-02"></a>
    <h3>02. A Simple On-screen Keyboard</h3>
    <p>
      A straightforward solution is to create an on-screen keyboard for the
      user. In this example, we create a Japanese keyboard with 5 characters.
      Clicking on each button/"keyboard key" adds the corresponding character
      to the end of the text input field.
    </p>
    <p class="note">
      Note: we're using the Japanese hiragana characters あいうえお here because
      they map easily to the English characters AIUEO, and are written
      left-to-right. We'll build up to more complex alphabets, such as Hebrew
      and its right-to-left layout, in later sections.
    </p>
    <iframe src="./section-02.html" width="640" height="320"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-02.html" target="_blank">(view source code)</a>
    <p>
      The code here is simple, but we already come across a problem: what if the
      user wants to add a Japanese character in the middle (instead of at the
      end) of the text box? This is, after all, a very basic function for a
      normal text box - you can place the text cursor/caret at any part of
      the existing text and then start typing.
    </p>
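    <p>
      As a rough illustration of the "append to the end" behaviour described
      above, here is a minimal sketch. The element IDs <code>text-input</code>
      and <code>keyboard</code>, and the function names, are assumptions made
      for this example and may not match the linked source code.
    </p>
    <pre><code>
```javascript
// The five hiragana characters used in this section.
const JAPANESE_KEYS = ['あ', 'い', 'う', 'え', 'お'];

// Returns the new text after a key is clicked: the character goes at the end.
function appendCharacter(text, character) {
  return text + character;
}

// Builds one button per character and wires it to the text input.
function buildKeyboard(doc, input, container) {
  for (const character of JAPANESE_KEYS) {
    const button = doc.createElement('button');
    button.type = 'button'; // a plain button, so clicking never submits the form
    button.textContent = character;
    button.addEventListener('click', () => {
      input.value = appendCharacter(input.value, character);
    });
    container.appendChild(button);
  }
}

// Only runs in a browser. The IDs are hypothetical; change them to match your form.
if (typeof document !== 'undefined') {
  const input = document.getElementById('text-input');
  const container = document.getElementById('keyboard');
  if (input !== null) {
    if (container !== null) buildKeyboard(document, input, container);
  }
}
```
    </code></pre>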
  </section>
  <section>
    <a name="section-03"></a>
    <h3>03. Text Selection</h3>
    <p>
      This is actually a solved problem: we use the standard <code>HTMLInputElement</code>'s
      <code>selectionStart</code>, <code>selectionEnd</code>, and <code>setSelectionRange</code>
      to interact with the "text cursor" on the text input field.
    </p>
    <iframe src="./section-03.html" width="640" height="320"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-03.html" target="_blank">(view source code)</a>
    <p>
      In the example above, we've done two things in the code: 1. we ensure the
      Japanese characters are inserted at the position of the text cursor/caret,
      and 2. we ensure the text input maintains focus after the insertion. These
      may seem like minor coding considerations, but they're important to
      <b>ensure a consistent UX (User Experience), since users often have
      pre-set expectations on how UI (User Interface) elements should
      behave</b>.
    </p>
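    <p>
      The two steps described above (insert at the caret, then keep focus) can
      be sketched roughly as follows. The function names <code>insertAt</code>
      and <code>insertAtCaret</code> are made up for this example; only
      <code>selectionStart</code>, <code>selectionEnd</code>,
      <code>setSelectionRange</code>, and <code>focus</code> are standard.
    </p>
    <pre><code>
```javascript
// Pure helper: replaces the selected range [start, end) with `character`
// and reports where the caret should sit afterwards.
function insertAt(value, start, end, character) {
  return {
    value: value.slice(0, start) + character + value.slice(end),
    caret: start + character.length,
  };
}

// Applies the helper to a text input (or any object with the same properties).
function insertAtCaret(input, character) {
  const result = insertAt(
    input.value,
    input.selectionStart,
    input.selectionEnd,
    character
  );
  input.value = result.value;
  input.focus(); // clicking a button moves focus away, so hand it back
  input.setSelectionRange(result.caret, result.caret); // caret sits after the new character
}
```
    </code></pre>
    <p class="note">
      If the user has highlighted some text, this sketch replaces the
      highlighted text with the new character, which is how a normal text box
      behaves when you type over a selection.
    </p>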
  </section>
  <section>
    <a name="section-04"></a>
    <h3>04. Physical Keyboard Key Capture</h3>
    <p>
      Alright, so we now have an on-screen keyboard. But what about the user's
      physical keyboard? A user might find it easier to use their physical
      keyboard to do text transcription, compared to clicking each on-screen
      keyboard button individually. With that in mind, let's try to translate
      those physical key presses into our custom character input.
    </p>
    <p>
      In this example, when the user presses the "A" key on their keyboard,
      the Japanese character あ is inserted into the text field instead. Same
      for the other characters: A -> あ, I -> い, U -> う, E -> え, O -> お.
    </p>
    <iframe src="./section-04.html" width="640" height="320"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-04.html" target="_blank">(view source code)</a>
    <p class="note">
      If you have an on-screen keyboard AND you're capturing physical key input,
      it's a good idea to label those on-screen keyboard buttons with the
      corresponding physical keys.
    </p>
    <p>
      One of the biggest considerations here is <b>what kind of physical
      keyboard does your user have?</b> In our examples, we're making the
      strong assumption that all our users have US-International QWERTY keyboards,
      and we choose to map <i>physical keyboard keys</i> to their replacement
      characters.
    </p>
    <p class="note">
      Note: there are different ways to get what the user typed into a text
      field. <code>keyboardEvent.code</code> corresponds to the PHYSICAL key on
      the keyboard. <code>keyboardEvent.key</code> corresponds to the TEXT VALUE
      of the key. If a user presses the "A" key on a US-International QWERTY
      keyboard, we get <code>code='KeyA'</code>, and <code>key='a'</code> (if
      shift/capslock is off) or <code>key='A'</code> (if shift/capslock is on).
    </p>
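    <p>
      A minimal sketch of key capture based on <code>keyboardEvent.code</code>
      might look like this. The names <code>CODE_TO_CHARACTER</code>,
      <code>replacementFor</code>, and <code>onKeyDown</code>, and the
      <code>text-input</code> ID, are assumptions for this example. Skipping
      keys pressed together with Ctrl, Alt, or Cmd is an extra precaution we
      added here so that shortcuts such as "select all" keep working; it is not
      something the text above prescribes.
    </p>
    <pre><code>
```javascript
// Maps PHYSICAL keys (event.code) to their replacement characters.
const CODE_TO_CHARACTER = new Map([
  ['KeyA', 'あ'],
  ['KeyI', 'い'],
  ['KeyU', 'う'],
  ['KeyE', 'え'],
  ['KeyO', 'お'],
]);

// Returns the replacement for a keydown event, or null to let the
// browser handle the key as usual.
function replacementFor(event) {
  if (event.ctrlKey || event.metaKey || event.altKey) return null;
  return CODE_TO_CHARACTER.get(event.code) ?? null;
}

function onKeyDown(event) {
  const character = replacementFor(event);
  if (character === null) return;
  event.preventDefault(); // stop the original letter from being typed
  const input = event.target;
  const start = input.selectionStart;
  const caret = start + character.length;
  input.value =
    input.value.slice(0, start) + character + input.value.slice(input.selectionEnd);
  input.setSelectionRange(caret, caret);
}

// Only runs in a browser. The ID is hypothetical.
if (typeof document !== 'undefined') {
  const input = document.getElementById('text-input');
  if (input !== null) input.addEventListener('keydown', onKeyDown);
}
```
    </code></pre>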
    <p>
      <b>WARNING:</b> now that we know <i>how</i> to capture and replace
      keyboard input, we also need to learn <i>when not to do so.</i> Sometimes,
      when a user presses the "A" key, they just want to type in the character
      "A", not "あ"! <b>Always allow your users the option to disable your
      on-screen keyboard.</b> The example above has no such option, but we'll
      explore how we can do this once we jump into the "multi-language"
      functionality of our on-screen keyboard.
    </p>
  </section>
  <h2>Multi-Language Keyboards</h2>
  <section>
    <a name="section-05"></a>
    <h3>05. Code Cleanup</h3>
    <p>
      Before we proceed with the advanced considerations of creating an
      on-screen keyboard with multiple languages, let's clean up our code.
    </p>
    <p>
      In the example below, you won't see many changes in terms of UI
      functionality, but a lot of the source code was altered. Notably:
    </p>
    <ul>
      <li>
        The Japanese characters have now been compiled into a "Japanese
        keyboard" data object, setting the stage for <b>dynamically-generated
        keyboards</b> for different languages.
      </li>
      <li>
        Similarly, we now have "English keyboard" and "QWERTY layout" data
        objects that help ensure <b>the visual layout of the on-screen keyboard
        matches the user's physical keyboard.</b>
      </li>
    </ul>
    <iframe src="./section-05.html" width="640" height="420"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-05.html" target="_blank">(view source code)</a>
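    <p>
      To give a feel for the data objects mentioned in the list above, here is
      one possible shape for them. This is a sketch under our own assumptions:
      the object names, the letters-only layout, and the
      <code>buildRows</code> helper are illustrative and may differ from the
      linked source code.
    </p>
    <pre><code>
```javascript
// Physical layout: rows of event.code values in QWERTY order (letters only).
const QWERTY_LAYOUT = [
  ['KeyQ', 'KeyW', 'KeyE', 'KeyR', 'KeyT', 'KeyY', 'KeyU', 'KeyI', 'KeyO', 'KeyP'],
  ['KeyA', 'KeyS', 'KeyD', 'KeyF', 'KeyG', 'KeyH', 'KeyJ', 'KeyK', 'KeyL'],
  ['KeyZ', 'KeyX', 'KeyC', 'KeyV', 'KeyB', 'KeyN', 'KeyM'],
];

// Each keyboard maps a physical key code to the character it produces.
// 'KeyQ'.slice(3) is 'Q', so the English keyboard can be derived from the layout.
const ENGLISH_KEYBOARD = Object.fromEntries(
  QWERTY_LAYOUT.flat().map((code) => [code, code.slice(3).toLowerCase()])
);
const JAPANESE_KEYBOARD = { KeyA: 'あ', KeyI: 'い', KeyU: 'う', KeyE: 'え', KeyO: 'お' };

// Produces rows of { code, label, character } ready to be rendered as buttons.
// `label` is the physical key to print on the button; `character` is null
// when the chosen keyboard has nothing assigned to that key.
function buildRows(keyboard) {
  return QWERTY_LAYOUT.map((row) =>
    row.map((code) => ({
      code,
      label: ENGLISH_KEYBOARD[code].toUpperCase(),
      character: keyboard[code] ?? null,
    }))
  );
}
```
    </code></pre>
    <p class="note">
      Because the rows come from the layout rather than from the language, the
      on-screen あ button lands where the "A" key sits on the physical keyboard.
    </p>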
  </section>
  <section>
    <a name="section-06"></a>
    <h3>06. Language Selection</h3>
    <p>
      Now that we have cleaned up the code so that the English and Japanese
      keyboards are stored as data objects, we see that it's very simple to add
      new languages/keyboards to the system, and to allow the user to switch
      between those languages/keyboards.
    </p>
    <p>
      To illustrate this point, we've added a joke "Emoji keyboard" that maps
      QWERTY keys to arbitrary emoji characters. Typing "Hello world" into the
      text input field results in the emoji "text" "🐟🤣🦋🦋😍 😅😍🥰🦋🐒".
    </p>
    <iframe src="./section-06.html" width="640" height="420"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-06.html" target="_blank">(view source code)</a>
    <p class="note">
      Note: there is an option to select "(No keyboard)" here, which disables
      the on-screen keyboard as well as key capture. As mentioned earlier,
      <b>always allow your users the option to disable your on-screen
      keyboard.</b>
    </p>
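    <p>
      Switching keyboards can be as simple as changing which data object is
      "active". The sketch below is an illustration, not the linked source
      code: the names are invented, the emoji keyboard lists only the keys
      that can be read off the "Hello world" example above, and
      <code>simulateTyping</code> exists purely to demonstrate the mapping
      without a browser.
    </p>
    <pre><code>
```javascript
const KEYBOARDS = {
  none: null, // "(No keyboard)": no on-screen keys and no key capture
  japanese: { KeyA: 'あ', KeyI: 'い', KeyU: 'う', KeyE: 'え', KeyO: 'お' },
  // Partial mapping, limited to the letters in "Hello world".
  emoji: {
    KeyH: '🐟', KeyE: '🤣', KeyL: '🦋', KeyO: '😍',
    KeyW: '😅', KeyR: '🥰', KeyD: '🐒',
  },
};

let activeKeyboard = KEYBOARDS.none;

// Called when the user picks an option from the language selector.
function selectKeyboard(name) {
  activeKeyboard = KEYBOARDS[name] ?? null;
}

// Returns the replacement for a physical key, or null to leave the key alone.
function characterFor(code) {
  if (activeKeyboard === null) return null;
  return activeKeyboard[code] ?? null;
}

// Demonstration helper: pretends each letter was typed on a QWERTY keyboard.
function simulateTyping(text) {
  return Array.from(text, (letter) => {
    const code = 'Key' + letter.toUpperCase();
    return characterFor(code) ?? letter;
  }).join('');
}
```
    </code></pre>
    <p class="note">
      With the emoji keyboard selected, <code>simulateTyping('Hello world')</code>
      reproduces the emoji "text" quoted above. With "none" selected, the
      input passes through unchanged.
    </p>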
    <p>
      At this point, you might realise one limitation to our solution: our code
      simply re-maps the QWERTY keyboard, so we can only have one character for
      one key.
    </p>
    <p>
      While we started our examples with a very simple 5-character Japanese
      keyboard, we unfortunately have to discard it since a proper,
      fully-functional Japanese keyboard is beyond the scope of this work. The
      Japanese <i>hiragana</i> writing system alone has 46 base characters in
      modern use, which can be further modified with diacritics, character size,
      etc.
    </p>
    <p>
      In the next section, we'll start adding a Hebrew keyboard. The Hebrew
      alphabet has 22 characters, which will map very easily to English/QWERTY's
      26 characters. However, the Hebrew alphabet will introduce a new wrinkle:
      <b>right-to-left text,</b> which we'll need to solve.
    </p>
  </section>
  <section>
    <a name="section-07"></a>
    <h3>07. Hebrew and RTL languages</h3>
    <p>
      With the given assumption that English is the "default" language of web
      code (yes, we know, that discussion is a can of worms), it's unsurprising
      that the layout of most web pages defaults to left-to-right,
      top-to-bottom.
    </p>
    <p>
      As a result, we must be conscientious when we create on-screen keyboards
      for languages that read right-to-left, such as Hebrew and Arabic. In the
      example below, we've done two things:
    </p>
    <ul>
      <li>
        We've upgraded the keyboard data objects so each language, in addition
        to having characters, also has an <b>explicit "direction" value</b>
        (either "ltr" or "rtl").
      </li>
      <li>
        The text input field has an explicit CSS <code>direction</code> value
        that changes depending on the active keyboard.
      </li>
    </ul>
    <iframe src="./section-07.html" width="640" height="420"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-07.html" target="_blank">(view source code)</a>
    <p>
      Since we're only interested in creating a functional on-screen keyboard,
      we only modified the CSS direction of the text input field. On the other
      hand, if you're creating, e.g. <i>a whole website</i> that supports both
      LTR and RTL languages, then you need to be conscientious about the
      <i>layout of your entire website,</i> and whether that layout needs to be
      flipped along the horizontal axis to make sense to RTL readers.
    </p>
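    <p>
      A rough sketch of the two changes listed above follows. The object shape
      and the <code>applyDirection</code> name are assumptions for this
      example, and the single-key Hebrew mapping is a placeholder rather than
      the project's actual layout.
    </p>
    <pre><code>
```javascript
// Each keyboard now carries an explicit text direction alongside its characters.
const KEYBOARDS = {
  english: { direction: 'ltr', characters: { KeyA: 'a' } },
  hebrew: { direction: 'rtl', characters: { KeyA: 'א' } }, // placeholder mapping
};

// Sets the CSS `direction` of the text input to match the active keyboard.
// Passing null (no keyboard selected) falls back to left-to-right.
function applyDirection(input, keyboard) {
  input.style.direction = keyboard === null ? 'ltr' : keyboard.direction;
}

// In a browser this would be called whenever the user switches keyboards, e.g.
// applyDirection(document.getElementById('text-input'), KEYBOARDS.hebrew);
```
    </code></pre>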
    <p class="note">
      Fun(?) Note: mixing LTR text with RTL text can lead to extremely confusing
      UI interactions. For example, in the text input field below, using your
      mouse, try to highlight the word APPLE plus one character before it and
      one character after it, i.e. <code>"הAPPLEן"</code>. Good luck!
    </p>
    <p>
      <input type="text" value="הגדלAPPLEהקטןBANANAסובב" style="font-size: 1.5em; color: #666; width:80%; margin: 0 auto; display: block;">
    </p>
  </section>
  <h2>Visual Script References</h2>
  <section>
    <a name="section-08"></a>
    <h3>08. Keys with Visual Script References</h3>
    <p>
      Now that we've proven that it's possible to map different key inputs to
      characters from different languages, we need to solve another problem.
      Our users will be looking at <b>handwritten manuscripts</b> from different
      regions and different eras, so it'll be very useful if they can have
      a <b>visual reference</b> for the different kinds of <b>scripts
      (handwritten text)</b> available.
    </p>
    <p>
      Fortunately, this is a fairly straightforward matter of adding images -
      for each character, from various scripts - to our visual keyboard.
    </p>
    <p>
      In our example below, we've added the "Yemenite Square" visual script
      reference for the Hebrew keyboard.
    </p>
    <iframe src="./section-08.html" width="720" height="520"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-08.html" target="_blank">(view source code)</a>
    <p>
      The actual hard work comes in two parts. First, it requires a human hand
      to create the reference image JPEG for each style of script, and to ensure
      it has a consistent layout. Second, there's a one-off upfront development
      cost to map the visuals to the data. We found that this early investment
      is well worth it when we get into the next section.
    </p>
    <p class="note">
      For our project, we decided to put every character of the "Yemenite
      Square" Hebrew script into a single image file (i.e. as opposed to having
      dozens of image files, one for each character) and used a CSS technique
      called "image sprites" to separate each character when needed. For
      example, when we want to show the 'Alef' א character (top row, right-most
      column) we tell the code to "crop" the image at x=440px y=0px width=50px
      height=50px.
    </p>
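    <p>
      The "crop" described above is commonly done by showing the image as a CSS
      background and shifting it by negative offsets. The helper below is a
      sketch of that idea, not the project's code: <code>spriteStyle</code> is
      a made-up name, and it assumes the image is displayed at its natural
      size (no scaling).
    </p>
    <pre><code>
```javascript
// Returns inline styles that reveal one rectangular region of a sprite sheet.
function spriteStyle(imageUrl, region) {
  return {
    display: 'inline-block',
    width: `${region.width}px`,
    height: `${region.height}px`,
    backgroundImage: `url("${imageUrl}")`,
    backgroundRepeat: 'no-repeat',
    // Shift the image left and up so the region's top-left corner sits at 0,0.
    backgroundPosition: `${-region.x}px ${-region.y}px`,
  };
}

// The Alef region quoted in the note above.
const ALEF_REGION = { x: 440, y: 0, width: 50, height: 50 };

const alefStyle = spriteStyle('images/yemenite-square.jpg', ALEF_REGION);
// In a browser: Object.assign(someElement.style, alefStyle);
```
    </code></pre>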
    <img
      alt="Visual Hebrew script reference for Yemenite Square"
      title="Visual Hebrew script reference for Yemenite Square"
      class="illustration"
      src="images/yemenite-square.jpg"
    >
    <p class="caption">
      Visual Hebrew script reference for Yemenite Square
    </p>
  </section>
  <section>
    <a name="section-09"></a>
    <h3>09. Multiple Visual Script References</h3>
    <p>
      There are several advantages to organising our "Yemenite Square" Hebrew
      script into a single image file. Smaller downloads for our users is one,
      but more importantly, its consistent visual layout allows us to use it as
      a template to quickly deploy <b>multiple visual scripts.</b>
    </p>
    <p>
      In the example below, you'll see that we've added <b>six new Hebrew
      scripts,</b> and if you check the code, it only required <i>six additional
      lines of code.</i>
    </p>
    <iframe src="./section-09.html" width="720" height="560"></iframe>
    <a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-09.html" target="_blank">(view source code)</a>
    <p>
      While it's now trivial to add new scripts from a code perspective, please
      remember that it still takes a considerable amount of effort to create
      each individual script's JPEG. (So developers, please remember to thank
      the people who've been scanning the manuscripts, manually identifying the
      handwritten characters, and putting them into a nice image file for us.)
    </p>
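    <p>
      To show why a shared layout makes new scripts cheap to add, here is a
      sketch in which every script reuses one table of character regions. The
      names <code>CHARACTER_REGIONS</code>, <code>SCRIPTS</code>, and
      <code>referenceFor</code> are invented for this example, and the region
      table holds only the one Alef entry quoted in the previous section; a
      real table would list every character.
    </p>
    <pre><code>
```javascript
// One shared table of regions, because every image uses the same layout.
const CHARACTER_REGIONS = {
  'א': { x: 440, y: 0, width: 50, height: 50 },
};

// Adding a script is one line: a name plus the path to its image.
const SCRIPTS = [
  { name: 'Yemenite Square', image: 'images/yemenite-square.jpg' },
  { name: 'Byzantine Minuscule', image: 'images/byzantine-minuscule.jpg' },
  { name: 'Maghrebi Cursive', image: 'images/maghrebi-cursive.jpg' },
];

// Looks up which image and which region to show for a character in a script.
// Returns null when either the script or the character is unknown.
function referenceFor(scriptName, character) {
  const script = SCRIPTS.find((candidate) => candidate.name === scriptName);
  const region = CHARACTER_REGIONS[character];
  if (script === undefined || region === undefined) return null;
  return { image: script.image, region };
}
```
    </code></pre>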
    <p class="note">
      Below, you can see three different Hebrew scripts that we used. You'll
      note that while we made an effort to keep the visual layout, character
      position, and character size consistent across every style of script,
      some scripts are missing certain characters. For example, both Maghrebi
      Cursive and Byzantine Minuscule don't have a visual reference for the
      "elongated Kaf" ך character. In these cases, we simply didn't have a
      visual reference from the source.
    </p>
    <img
      alt="Visual Hebrew script reference for Yemenite Square"
      title="Visual Hebrew script reference for Yemenite Square"
      class="illustration"
      src="images/yemenite-square.jpg"
    >
    <p class="caption">
      Visual Hebrew script reference for Yemenite Square
    </p>

    <img
      alt="Visual Hebrew script reference for Byzantine Minuscule"
      title="Visual Hebrew script reference for Byzantine Minuscule"
      class="illustration"
      src="images/byzantine-minuscule.jpg"
    >
    <p class="caption">
      Visual Hebrew script reference for Byzantine Minuscule
    </p>

    <img
      alt="Visual Hebrew script reference for Maghrebi Cursive"
      title="Visual Hebrew script reference for Maghrebi Cursive"
      class="illustration"
      src="images/maghrebi-cursive.jpg"
    >
    <p class="caption">
      Visual Hebrew script reference for Maghrebi Cursive
    </p>
  </section>
</main>
<footer>
  This <a target="_blank" href="https://github.com/shaunanoordin/zooniverse-startwords">repo</a> was authored by
  <a target="_blank" href="https://github.com/snblickhan">Sam Blickhan</a>,
  <a target="_blank" href="https://github.com/wgranger">Will Granger</a>,
  <a target="_blank" href="https://github.com/shaunanoordin">Shaun A. Noordin</a>, and
  <a target="_blank" href="https://github.com/beckyrother">Becky Rother</a>.
</footer>
</body>
</html>