-
Notifications
You must be signed in to change notification settings - Fork 1
/
index.html
388 lines (386 loc) · 18.5 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Zooniverse Startwords</title>
<meta name="description" content="Supplementary web content for the Zooniverse article published on Startwords in 2021/2022. Contains HTML + JavaScript examples for the multi-language transcription keyboard built for the Scribes of the Cairo Geniza website.">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="theme-color" content="#00979D">
<link href="./main.css" rel="stylesheet">
</head>
<body>
<header>
<h1>Zooniverse Startwords</h1>
</header>
<main>
<h2>Introduction</h2>
<section>
<p>
Welcome! This website contains supplementary content for the <a href="https://www.zooniverse.org/" target="_blank">Zooniverse</a>
article published on <a href="https://startwords.cdh.princeton.edu/" target="_blank">Startwords</a>
in 2021/2022. Contains HTML + JavaScript examples for the multi-language
transcription keyboard built for the <a href="https://www.scribesofthecairogeniza.org/" target="_blank">Scribes of the Cairo Geniza</a>
website.
</p>
<p>
The following sections describe <b>how to build your own basic on-screen
multi-language transcription keyboard</b> from scratch, and is designed
for <i>researchers and programmers with some understanding of HTML and
JavaScript.</i>
</p>
<p>
If you're a more experienced web developer who's very familiar with React,
you may want to jump straight into the <a href="https://github.com/zooniverse/scribes-of-the-cairo-geniza" target="_blank">Scribes of the Cairo Geniza repo</a>,
possibly starting with the <a href="https://github.com/zooniverse/scribes-of-the-cairo-geniza/blob/master/src/components/AnnotationKeyboard.jsx" target="_blank">AnnotationKeyboard</a>
component.
</p>
</section>
<h2>A HTML Form</h2>
<section>
<a name="section-01"></a>
<h3>01. The Basics: A Form With Some Text Input</h3>
<p>
Let's start by setting up a very basic web form. It has one text input
field, one submit button, and one output panel.
</p>
<iframe src="./section-01.html" width="640" height="240"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-01.html" target="_blank">(view source code)</a>
<p>
This example breaks down the idea of a "text transcription website" to its
simplest form. The text input field is what the user types into, and the
output panel is (let's pretend) the server they're submitting their
transcriptions to.
</p>
<p>
Everything we build from this point onwards is meant to solve one very
simple problem: <b>how do we allow users to type, into that text input
field, in a language that's not native to theirs?</b> e.g. how do we help
a user type in the text "ごはんを食べる" when they only have a
US-International QWERTY keyboard, and we don't want to ask them to futz
about in their computer settings to install a Japanese language pack?
</p>
</section>
<section>
<a name="section-02"></a>
<h3>02. A Simple On-screen Keyboard</h3>
<p>
A straightforward solution is to create an on-screen keyboard for the
user. In this example, we create a Japanese keyboard with 5 characters.
Clicking on each button/"keyboard key" adds the corresponding character
to the end of the text input field.
</p>
<p class="note">
Note: we're using the Japanese hiragana characters あいうえお here because
they map easily to the English characters AIUEO, and are written
left-to-right. We'll build up to more complex alphabets, such as Hebrew
and its right-to-left layout, in later sections.
</p>
<iframe src="./section-02.html" width="640" height="320"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-02.html" target="_blank">(view source code)</a>
<p>
The code here is simple, but we already come across a problem: what if the
user wants to add a Japanese character in the middle (instead of at the
end) of the text box? This is, after all, a very basic function for a
normal text box - you can place the text cursor/caret at any part of
the existing text and then start typing.
</p>
</section>
<section>
<a name="section-03"></a>
<h3>03. Text Selection</h3>
<p>
This is actually a solved problem: we use the standard <code>HTMLInputElement</code>'s
<code>selectionStart</code>, <code>selectionEnd</code>, and <code>setSelectionRange</code>
to interact with the "text cursor" on the text input field.
</p>
<iframe src="./section-03.html" width="640" height="320"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-03.html" target="_blank">(view source code)</a>
<p>
In the example above, we've done two things in the code: 1. we ensure the
Japanese characters are inserted at the position of the text cursor/caret,
and 2. we ensure the text input maintains focus after the insertion. These
may seem like minor coding considerations, but they're important to
<b>ensure a consistent UX (User Experience), since users often have
pre-set expectations on how UI (User Interface) elements should
behave</b>.
</p>
</section>
<section>
<a name="section-04"></a>
<h3>04. Physical Keyboard Key Capture</h3>
<p>
Alright, so we now have an on-screen keyboard. But what about the user's
physical keyboard? A user might find it easier to use their physical
keyboard to do text transcription, compared to clicking each on-screen
keyboard button individually. With that in mind, let's try to translate
those physical key presses into our custom character input.
</p>
<p>
In this example, when the user presses the "A" key on their keyboard,
the Japanese character あ is inserted into the text field instead. Same
for the other characters: A -> あ , I -> い, U -> う, E -> え, O -> お
</p>
<iframe src="./section-04.html" width="640" height="320"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-04.html" target="_blank">(view source code)</a>
<p class="note">
If you have an on-screen keyboard AND you're capturing physical key input,
it's a good idea to label those on-screen keyboard buttons with the
corresponding physical keys.
</p>
<p>
One of the biggest considerations here is <b>what kind of physical
keyboard does your user have?</b> In our examples, we're making a very
hard assumption that all our users have US-International QWERTY keyboards,
and we choose to map <i>physical keyboard keys</i> to their replacement
characters.
</p>
<p class="note">
Note: there are different ways to get what the user typed into a text
field. <code>keyboardEvent.code</code> corresponds to the PHYSICAL key on
the keyboard. <code>keyboardEvent.key</code> corresponds to the TEXT VALUE
of the key. If a user presses the "A" key on a US-International QWERTY
keyboard, we get <code>code='KeyA'</code>, and <code>key='a'</code> (if
shift/capslock is off) or <code>key='A'</code> (if shift/capslock is on).
</p>
<p>
<b>WARNING:</b> now that we know <i>how</i> to capture and replace
keyboard input, we also need to learn <i>when not to do so.</i> Sometimes,
when a user presses the "A" key, they just want to type in the character
"A", not "あ"! <b>Always allow your users the option to disable your
on-screen keyboard.</b> The example above has no such option, but we'll
explore how we can do this once we jump into the "multi-language"
functionality of our onscreen keyboard.
</p>
</section>
<h2>Multi-Language Keyboards</h2>
<section>
<a name="section-05"></a>
<h3>05. Code Cleanup</h3>
<p>
Before we proceed with the advanced considerations of creating an
on-screen keyboard with multiple languages, let's clean up our code.
</p>
<p>
In the example below, you won't see many changes in terms of UI
functionality, but a lot of the source code was altered. Notably:
</p>
<ul>
<li>
The Japanese characters have now been compiled into a "Japanese
keyboard" data object, setting the stage for <b>dynamically-generated
keyboards</b> for different languages.
</li>
<li>
Similarly, we now have "English keyboard" and "QWERTY layout" data
objects that help ensure <b>the visual layout of the on-screen keyboard
matches the user's physical keyboard.</b>
</li>
</ul>
<iframe src="./section-05.html" width="640" height="420"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-05.html" target="_blank">(view source code)</a>
</section>
<section>
<a name="section-06"></a>
<h3>06. Language Selection</h3>
<p>
Now that we have cleaned up the code so that the English and Japanese
keyboards are stored data objects, we see that it's very simple to add
new languages/keyboards to the system, and to allow the user to switch
between those languages/keyboards.
</p>
<p>
To illustrate this point, we've added a joke "Emoji keyboard" that maps
QWERTY keys to arbitrary emoji characters. Typing in "Hello world" into
input text field will result in the emoji "text" of "🐟🤣🦋🦋😍 😅😍🥰🦋🐒"
</p>
<iframe src="./section-06.html" width="640" height="420"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-06.html" target="_blank">(view source code)</a>
<p class="note">
Note: there is an option to select "(No keyboard)" here, which disables
the on-screen keyboard as well as key capture. As mentioned earlier,
<b>always allow your users the option to disable your on-screen
keyboard.</b>
</p>
<p>
At this point, you might realise one limitation to our solution: our code
simply re-maps the QWERTY keyboard, so we can only have one character for
one key.
</p>
<p>
While we started our examples with a very simple 5-character Japanese
keyboard, we unfortunately have to discard it since a proper,
fully-functional Japanese keyboard is beyond the scope of this work. The
Japanese <i>hiragana</i> writing system alone has 48 common base
characters, which can be further modified with diacritics, character size,
etc.
</p>
<p>
In the next section, we'll start adding a Hebrew keyboard. The Hebrew
alphabet has 22 characters, which will map very easily to English/QWERTY's
26 characters. However, the Hebrew alphabet will introduce a new wrinkle:
<b>right-to-left text,</b> which we'll need to solve.
</p>
</section>
<section>
<a name="section-07"></a>
<h3>07. Hebrew and RTL languages</h3>
<p>
With the given assumption that English is the "default" language of web
code (yes, we know, that discussion is a can of worms), it's unsurprising
that that layout of most web pages default to left-to-right,
top-to-bottom.
</p>
<p>
As a result, we must be conscientious when we create on-screen keyboards
for languages to read right-to-left, such as Hebrew and Arabic. In the
example below, we've done two things:
</p>
<ul>
<li>
We've upgraded the keyboard data objects so each language, in addition
to having characters, also has an <b>explicit "direction" value.</b>
(Either "ltr" or "rtl")
</li>
<li>
The text input field has an explicit CSS <code>direction</code> value
that changes depending on the active keyboard.
</li>
</ul>
<iframe src="./section-07.html" width="640" height="420"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-07.html" target="_blank">(view source code)</a>
<p>
Since we're only interested in creating a functional on-screen keyboard,
we only modified the CSS direction of the text input field. On the other
hand, if you're creating, e.g. <i>a whole website</i> that supports both
LTR and RTL languages, then you need to be conscientous about the
<i>layout of your entire website,</i> and whether that layout needs to be
flipped along the horizontal axis to make sense to RTL readers.
</p>
<p class="note">
Fun(?) Note: mixing LTR text with RTL text can lead to extremely confusing
UI interactions. For example, in the text input field below, using your
mouse, try to highlight the word APPLE plus one character before it and
one character after it, i.e. <code>"הAPPLEן"</code>. Good luck!
</p>
<p>
<input type="text" value="הגדלAPPLEהקטןBANANAסובב" style="font-size: 1.5em; color: #666; width:80%; margin: 0 auto; display: block;">
</p>
</section>
<h2>Visual Script References</h2>
<section>
<a name="section-08"></a>
<h3>08. Keys with Visual Script References</h3>
<p>
Now that we've proven that it's possible to map different key input to
characters from different languages, we need to solve another problem.
Our users will be looking at <b>handwritten manuscripts</b> from different
regions and different eras, so it'll be very useful if they can have
a <b>visual reference</b> for the different kind of <b>scripts
(handwritten text)</b> available.
</p>
<p>
Fortunately, this is a fairly straightforward matter of adding images -
for each character, from various scripts - to our visual keyboard.
</p>
<p>
In our example below, we've added the "Yemenite Square" visual script
reference for the Hebrew keyboard.
</p>
<iframe src="./section-08.html" width="720" height="520"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-08.html" target="_blank">(view source code)</a>
<p>
The actual hard work comes in two parts. First, it requires a human hand
to create the reference image JPEG for each style of script, and to ensure
it has a consistent layout. Second, there's a one-off upfront development
cost to map the visuals to the data. We found that this early investment
is well worth it when we get into the next section.
</p>
<p class="note">
For our project, we decided to put every character of the "Yemenite
Square" Hebrew script into a single image file (i.e. as opposed to having
dozens of image files, one for each character) and used a CSS technique
called "image sprites" to separate each character when needed. For
example, when we want to show the 'Alef' א character (top row, right-most
column) we tell the code to "crop" the image at x=440px y=0px width=50px
height=50px.
</p>
<img
alt="Visual Hebrew script reference for Yemenite Square"
title="Visual Hebrew script reference for Yemenite Square"
class="illustration"
src="images/yemenite-square.jpg"
>
<p class="caption">
Visual Hebrew script reference for Yemenite Square
</p>
</section>
<section>
<a name="section-09"></a>
<h3>09. Multiple Visual Script References</h3>
<p>
There are several advantages to organising our "Yemenite Square" Hebrew
script into a single image file. Smaller downloads for our users is one,
but more importantly, its consistent visual layout allows us to use it as
a template to quickly deploy <b>multiple visual scripts.</b>
</p>
<p>
In the example below, you'll see that we've added <b>six new Hebrew
scripts,</b> and if you check the code, it only required <i>six additional
lines of code.</i>
</p>
<iframe src="./section-09.html" width="720" height="560"></iframe>
<a class="link-to-code" href="https://github.com/shaunanoordin/zooniverse-startwords/blob/master/section-09.html" target="_blank">(view source code)</a>
<p>
While it's now trivial to add new scripts from a code perspective, please
remember that it still takes a considerable amount of effort to create
each individual script's JPEG. (So developers, please remember to thank
the people who've been scanning the manuscripts, manually identifying the
handwritten characters, and putting them into a nice image file for us.)
</p>
<p class="note">
Below, you can see three different Hebrew scripts that we used. You'll
note that while we made an effort to keep the visual layout, character
position, and character size consistent across every style of script,
some scripts are missing certain characters. For example, both Maghrebi
Cursive and Byzantine Miniscule don't have a visual reference for the
"elongated Kaf" ך character. In these cases, we simply didn't have a
visual reference from the source.
</p>
<img
alt="Visual Hebrew script reference for Yemenite Square"
title="Visual Hebrew script reference for Yemenite Square"
class="illustration"
src="images/yemenite-square.jpg"
>
<p class="caption">
Visual Hebrew script reference for Yemenite Square
</p>
<img
alt="Visual Hebrew script reference for Byzantine Minuscule"
title="Visual Hebrew script reference for Byzantine Minuscule"
class="illustration"
src="images/byzantine-minuscule.jpg"
>
<p class="caption">
Visual Hebrew script reference for Byzantine Minuscule
</p>
<img
alt="Visual Hebrew script reference for Maghrebi Cursive"
title="Visual Hebrew script reference for Maghrebi Cursive"
class="illustration"
src="images/maghrebi-cursive.jpg"
>
<p class="caption">
Visual Hebrew script reference for Maghrebi Cursive
</p>
</section>
</main>
<footer>
This <a target="_blank" href="https://github.com/shaunanoordin/zooniverse-startwords">repo</a> was authored by
<a target="_blank" href="https://github.com/snblickhan">Sam Blickhan</a>,
<a target="_blank" href="https://github.com/wgranger">Will Granger</a>,
<a target="_blank" href="https://github.com/shaunanoordin">Shaun A. Noordin</a>, and
<a target="_blank" href="https://github.com/beckyrother">Becky Rother</a>.
</footer>
</body>
</html>