-
Notifications
You must be signed in to change notification settings - Fork 18
/
Searchability.html
420 lines (389 loc) · 15.8 KB
/
Searchability.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8"/>
<style>
/*
This demo uses the method devised by Roel Nieskens for accessing
OpenType features. For further explanation, see
https://pixelambacht.nl/2019/fixing-variable-font-inheritance/.
*/
/*
font-family names this family and associates this declaration with
other possible declarations for Junicode 2.
src:url() points to the font file.
font-weight is either normal (400) or bold (700).
font-style: normal indicates that this declaration is for an upright
or roman font.
These fonts have been subsetted, so they're a fraction of their
original size.
*/
@font-face {
font-family: 'JuniusX';
src:url("./webfiles/JuniusX-Regular.woff2");
font-weight: normal;
font-style:normal;
}
@font-face {
font-family: 'JuniusX';
src:url("./webfiles/JuniusX-Bold.woff2");
font-weight: bold;
font-style:normal;
}
/*
In the :root declaration we define variablesa for the OpenType
tags we'll be using, together with their default values. We define
them here because :root is as far up the document tree as you can
go.
To use this model for your document, identify all the feature tags
you'll be using in your text and set the defaults here.
*/
:root {
--jvf-ss03: "ss03" off;
--jvf-ss17: "ss17" off;
--jvf-pcap: "pcap" off;
--jvf-ss16: "ss16" off;
--jvf-cv05: "cv05" off;
--jvf-cv09: "cv09" off;
--jvf-cv13: "cv13" off;
}
/*
Style the body. Make the font big so everybody can see what's
going on.
*/
body {
font-family: 'JuniusX';
font-size: 28px;
line-height: 150%;
margin-left: 10%;
margin-right: 10%;
}
/*
An asterisk selects any element. Every time we hit an element, we'll
call font-feature-settings to explicitly set all the OpenType features
we're using in this text. This block is called after the more specific
ones, so the variables we use in this rule will have been set in a
class definition.
*/
* {
font-feature-settings: var(--jvf-ss03),
var(--jvf-ss17),
var(--jvf-pcap),
var(--jvf-ss16),
var(--jvf-cv05),
var(--jvf-cv09),
var(--jvf-cv13);
}
/*
Define the elements we'll be using. Notice how we change the style
by setting the value of the variables in :root.
*/
h1 {
font-size: 115%;
font-weight: normal;
font-style: normal;
}
/*
pre will automatically use a fixed-width font. We won't mess with
whatever the standard is, except to make the type a bit smaller.
*/
pre {
font-size: 85%;
}
/*
Define classes. We only need to change the variable that's relevant
to each class: the others will remain at their current values.
*/
.mufitext {
margin-left: 40px;
}
/*
Note: the cvNN values are indexes into the list of values they
contain. However, "off" will also work to turn the features off.
*/
.medtext {
margin-left: 40px;
--jvf-ss03: "ss03" on;
--jvf-ss16: "ss16" on;
--jvf-cv05: "cv05" 1;
--jvf-cv09: "cv09" 1;
}
.dgrph {
--jvf-ss17: "ss17" on;
}
.longs {
--jvf-ss03: "ss03" on;
}
.pcap {
--jvf-pcap: "pcap" on;
}
.norrotunda {
--jvf-ss16: "ss16" off;
}
.dotlessi {
--jvf-cv13: "cv13" 1;
}
/* No OpenType features for this one, but just a big red letter. */
.initcap {
color: red;
font-size: 150%;
}
/*
The restoration and deletion classes provide an alternative way of
displaying MENOTA's brackets for editorial restorations and scribal
deletions, using the CSS content property. Using this method, the
brackets are not in the underlying text, but only in the displayed
text. This may or may not be desirable.
*/
.restoration:before { content: "["; }
.restoration:after { content: "]"; }
.deletion:before { content: "⸠"; }
.deletion:after { content: "⸡"; }
</style>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-DGTTVQHWG1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-DGTTVQHWG1');
</script>
</head>
<body lang="en">
<h1>Searchable web pages with MUFI and Junicode 2</h1>
<p>
Junicode 2 fully implements the recommendations of the
<a href="https://mufi.info/m.php?p=mufi">Medieval Unicode
Font Initiative</a> (MUFI), which offers an encoding scheme for medieval
texts based on <a href="https://home.unicode.org/">Unicode</a>,
the worldwide standard for text
exchange. You probably know that a computer text is a string of
numbers, each representing a character: using Unicode, you employ a
standard mapping of numbers to characters, ensuring that a text sent to you
by (say) your friend in Thailand will
look the same when it reaches you as it did when it left your friend’s
computer. MUFI does a similar job for medieval characters. For
example, if you need a Dutch libra sign, you won’t find it in Unicode;
but MUFI has it at code point U+F2EA (), and it will be recognizable
to anyone with a MUFI-compliant font (like Junicode 2).
</p>
<p>
To work its magic, MUFI selects characters of interest to medievalists
from the Unicode standard, and when a medieval character is missing
from Unicode it assigns a code point from a block of numbers set aside
for individuals and groups to use in any way they like: the
<b>Private Use Area</b>. MUFI-compliant fonts like Junicode 2 must all
agree on the mappings of code point to character defined by MUFI so
that texts can be exchanged reliably.
</p>
<p>
For example, here is the beginning of a diplomatic text of the Old
Norse <i>Vǫluspá</i>, copied and pasted from the
Medieval Nordic Text Archive (MENOTA—because Junicode 2 Italic is not yet
ready, the text is presented only in roman type):
</p>
<p class="mufitext" lang="is">
<span class="initcap">H</span>lıoðſ bið ec allar kinꝺir meiri oc miɴi maugo<br/>
heimꝺalar uilðo at ec ualꝼꜹþr uel ꝼyr telia ꝼoꝛn<br/>
ſpioll ꝼíra þꜹ er ꝼremſt um man. Ec mán iǫtna<br/>
ar um boꝛna þ⸠ꜹ̣⸡a er ꝼoꝛꝺom mic ꝼǫꝺꝺa hoꝼꝺo. nio man ec heima<br/>
nío iviþi[vr] miot uið mran ꝼyr molꝺ neðan.<br/>
</p>
<p>
Looks great, right? And it will read the same way whether it’s set in
Junicode 2, Andron Scriptor, Cardo, or Palemonas MUFI. There’s just one
problem—but it’s a significant one. To illustrate, look at the last
word in the second line:
<b>ꝼoꝛn</b>. That’s an <b>insular f</b> in the first position and an
<b>r rotunda</b> in the third—different flavors of <b>f</b> and <b>r</b>.
Now try searching
this text for the word “forn” using your browser’s search function
(Ctrl-F or Cmd-F). What happened? Your search skipped right over
<b>ꝼoꝛn</b> in the medieval text and landed on “forn” in this
paragraph. The fact that <b>f</b> and <b>r</b> are encoded differently
in this text from the way they’re encoded in plain text means that
this text can’t be searched the way most webpages can. (To be a little
more precise, some browsers will find the <b>f</b>, but none of them
will find the <b>r</b>).
</p>
<p>
MENOTA has its own powerful search tools, so that the inability to search
in the browsers is not a great loss; but this inability may also signal
problems with accessibility, which is increasingly
important to university administrations, funding agencies, and (of course)
end users. Is it possible, then, to create a MUFI-compliant webpage, rich
with medieval characters, that is both searchable and accessible?
</p>
<p>
Turns out that it is both possible and easy to create such a page
using Junicode 2. We’ll take it in two steps. The first will be to create
as plain an etext as we can get away with, and the second will be to
style this text using
<a href="https://www.w3.org/Style/CSS/Overview.en.html">CSS</a>
(“Cascading Style Sheets,” the standard for
styling web pages) and the features of Junicode 2.
</p>
<p>
Here’s my attempt to convert the passage to plain text:
</p>
<p class="mufitext" lang="is">
<span class="initcap">H</span>lioðs bið ec allar kindir meiri oc mini maugo<br/>
heimdalar uilðo at ec ualfavþr uel fyr telia forn<br/>
spioll fíra þav er fremst um man. Ec mán iǫtna<br/>
ar um borna þ⸠aṿ⸡a er fordom mic fǫdda hofdo. nio man ec heima<br/>
nío iviþi[vr] miot uið mę́ran fyr mold neðan.
</p>
<p>
If “plain text” is what you can type on a U.S. keyboard, we haven’t
quite made it: we still have
several accented characters and the Icelandic thorn and eth, not to
mention the strange brackets in line 4 (U+2E20 and U+2E21—we’ll deal
with these later). The Icelandic
characters are familiar enough to medievalists that we needn’t worry
about them. The dot under the <b>v</b> in what was an
<b class="dgrph">av</b> digraph (which we took
apart to make it searchable as <b>av</b>, but we’ll
put it back together later) is a <b>combining
diacritical mark</b>: you type it right after the base character, and
your software takes care of positioning it correctly. The accented
<b>ę́</b> was a character in the Private Use Area, but I
have changed it to <b>ę</b> (U+0119, common in modern Polish) plus
another combining mark, U+0301, the acute accent. That makes it
standard Unicode, like the other accented characters. The <b>ę</b>
is searchable as <b>e</b>, and browsers will
ignore the marks when searching.
</p>
<p>
The next step is to select what we need from among the OpenType
features of Junicode 2.
<a href="https://docs.microsoft.com/en-us/typography/opentype/spec/">OpenType</a>
is a standard for font construction that
allows fonts to do all kinds of clever things. To cite one common
example, when you type the letters “find” and the <b>f</b> and the <b>i</b> come out
joined in a ligature (“find”), an OpenType feature did that. But
Junicode 2 has more than eighty OpenType features that do a lot of useful
things.
</p>
<p>
Here are the features we’ll want to apply to the whole text:
</p>
<ul>
<li>All instances of <b>s</b> in this text are long
<span class="longs">(s)</span>: use the “Long
s” feature (hist or ss03) for that.</li>
<li>All instances of <b>d</b> are round: use
Character Variant 5 (cv05).</li>
<li>All instances of <b>f</b> are insular (ꝼ): use Character
Variant 9 (cv09).</li>
<li>Instances of <b>r</b> are round after rounded characters like <b>o</b> and <b>ꝺ</b>;
use “Contextual r rotunda” (ss16).</li>
</ul>
<p>
The features whose four-letter tags begin with <b>ss</b> are Stylistic
Sets, and they are either on or off. The Character Variant features
contain a list of variants of a particular character; we select from
this list with a numerical index (starting with 1).
To turn on these OpenType features, then, we need the following CSS rule:
</p>
<pre><code>font-feature-settings: "ss03" on, "cv05" 1, "cv09" 1, "ss16" on;</code></pre>
<p>
When we apply that to our fragment of <i>Vǫluspá</i>, here’s what we get:
</p>
<p class="medtext" lang="is">
<span class="initcap">H</span>lioðs bið ec allar
kindir meiri oc mini maugo<br/>
heimdalar uilðo at ec ualfavþr uel
fyr telia forn<br/>
spioll fíra þav er fremst um man. Ec mán iǫtna<br/>
ar um borna þ⸠aṿ⸡a er fordom mic fǫdda hofdo. nio man ec heima<br/>
nío iviþi[vr] miot uið mę́ran fyr mold neðan.
</p>
<p>
One line of CSS, and already we’re almost home! But there are several
special cases to attend to. There’s the dotless <b>i</b> in <b>Hlioðs</b>, the small
capital <b><span class="pcap">n</span></b> in <b>mi<span
class="pcap">n</span>i</b>, the <b class="dgrph">av</b> digraph that has to be
reassembled, and the <b>r rotunda</b> that should be a regular
<b>r</b> at the end of <b lang="is">ualfꜹþr</b>. For those, we devise CSS classes that turn
OpenType features on or off, and we apply these classes to individual
words or letters:
</p>
<ul>
<li>For dotless <b>i</b>, the style “dotlessi” selects the first variant
of <b>i</b> in Character Variant 13 (cv13 1).</li>
<li>For <b><span class="pcap">n</span></b>,
we need the “pcap” style, which turns on
the “petite capitals”
feature (pcap). (Petite capitals are different from regular small
caps in that they’re sized to be mixed with lowercase
letters.)</li>
<li>For ꜹ, a ligature with a phonetic value (like æ), distinct from other
ligatures, we use the “dgrph” style, which turns on “Rare Digraphs” (ss17).
</li>
<li>For the <b>r</b> that shouldn’t be rotunda, we use the “norrotunda”
style, which temporarily turns off the “Contextual r rotunda” feature (ss16).</li>
</ul>
<p>
As an example of what the HTML will look like, here’s the first line
of our text:
</p>
<pre><code><span class="initcap">H</span>l<span class="dotlessi">i</span>oðs bið
ec allar kindir meiri oc mi<span class="pcap">n</span>i maugo</code></pre>
<p>
Of course it looks illegible and hard to write, but for most projects it will
be generated automatically from an underlying text; and of course your readers
will not see the ugly HTML code or the CSS, but rather this:
</p>
<p class="medtext" lang="is">
<span class="initcap">H</span>l<span class="dotlessi">i</span>oðs bið ec allar
kindir meiri oc mi<span class="pcap">n</span>i maugo<br/>
heimdalar uilðo at ec ualf<span class="dgrph">av</span>þ<span class="norrotunda">r</span> uel
fyr telia forn<br/>
spioll fíra þ<span class="dgrph">av</span> er fremst um man. Ec mán iǫtna<br/>
ar um borna þ⸠<span class="dgrph">aṿ</span>⸡a er fordom mic fǫdda hofdo. nio man ec heima<br/>
nío iviþi[vr] miot uið mę́ran fyr mold neðan.
</p>
<p>
and if any of them should happen to hit Ctrl-F and perform a
quick-and-dirty search for “kindir” or “forn”
or “mini” or “<span lang="is">ualfavþr</span>” or “meran,” they’ll find what they’re looking for.
</p>
<p>
For one final refinement, we change the brackets in the text to CSS classes:
“deletion” and “restoration.” The brackets themselves are not inserted in
the text, but rather displayed via the CSS content property. The effect of this change
is that searches of the text will ignore the brackets: thus a user can search
for (and find) <b lang="is">iviþivr</b>. When displayed by this method, editorial interventions
can also be changed programmatically—e.g. highlighted or hidden.
The final text:
</p>
<p class="medtext" lang="is">
<span class="initcap">H</span>l<span class="dotlessi">i</span>oðs bið ec allar
kindir meiri oc mi<span class="pcap">n</span>i maugo<br/>
heimdalar uilðo at ec ualf<span class="dgrph">av</span>þ<span class="norrotunda">r</span> uel
fyr telia forn<br/>
spioll fíra þ<span class="dgrph">av</span> er fremst um man. Ec mán iǫtna<br/>
ar um borna þ<span class="dgrph deletion">aṿ</span>a er fordom mic fǫdda hofdo. nio man ec heima<br/>
nío iviþi<span class="restoration">vr</span> miot uið mę́ran fyr mold neðan.
</p>
<p>
This document is not only a demonstration, but also a how-to for
producing a searchable online document. For detailed instructions,
view the source for this page. The CSS is thoroughly commented so you
can tell exactly what each bit does.
</p>
<div style="clear: both; padding: 20px; background-color:
lightgray">
<p>
Junicode/Junicode 2 font copyright © 1998–2022 by Peter
S. Baker.
</p>
<p>
<a href="https://github.com/psb1558/Junicode-font">Development site</a> · <a href="https://psb1558.github.io/Junicode-New/">Specimen Page</a>
</p>
<p>
Licensed under the <a href="https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=OFL">Open Font License, v. 1.1</a>.
</p>
</div>
</body>
</html>