-
Notifications
You must be signed in to change notification settings - Fork 487
/
index.html
564 lines (564 loc) · 28.6 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>PDR: Laboratory 6: Hashing</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
.display.math{display: block; text-align: center; margin: 0.5rem auto;}
</style>
<link rel="stylesheet" href="../../markdown.css" />
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<h1 id="pdr-laboratory-6-hashing">PDR: Laboratory 6: Hashing</h1>
<p><a href="../index.html">Go up to the Labs table of contents
page</a></p>
<h3 id="objective">Objective</h3>
<p>The purpose of this lab is to gain experience using hash tables by
building an interesting application.</p>
<h3 id="background">Background</h3>
<p>A <a href="https://en.wikipedia.org/wiki/Hash_table">hash table</a>
is a dictionary in which keys are mapped to array positions by a
<em>hash function</em>. Having more than one key map to the same
position is called a <em>collision</em>. There are many ways to resolve
collisions, but they may be divided into two primary strategies:
separate chaining and open addressing.</p>
<h3 id="tutorial">Tutorial</h3>
<p>Look over the first 6 sections (through and including ‘exit status’)
of the <a
href="https://en.wikibooks.org/wiki/Bash_Shell_Scripting">Wikibooks
article on Bash Shell Scripting</a>. You will be writing a shell script
during the in-lab, but you should probably start reading it before then.
The remaining sections of that article will be the tutorial for the next
lab, so feel free to read on, if you are interested.</p>
<h3 id="recommended-readings">Recommended Readings</h3>
<ol type="1">
<li>Learn how to use C++ file streams to read from data files. See the
<a href="http://www.cplusplus.com/doc/tutorial/files/">Input/output with
files article</a> on <a
href="http://www.cplusplus.com/">cplusplus.com</a>. You can also see the
code in the provided <a
href="code/getWordInGrid.cpp.html">getWordInGrid.cpp</a> (<a
href="code/getWordInGrid.cpp">src</a>) file which uses streams as well.
You may want to use the <a
href="http://www.cplusplus.com/reference/fstream/ifstream/">good()</a>
method in the ifstream class – you can put it in a <code>while</code>
loop: <code>while (foo.good())</code>.</li>
</ol>
<h2 id="procedure">Procedure</h2>
<h3 id="pre-lab">Pre-lab</h3>
<ol type="1">
<li>Create a word puzzle solver</li>
<li>Files to download:
<ul>
<li>The 5 provided code files: <a
href="code/getWordInGrid.cpp.html">getWordInGrid.cpp</a> (<a
href="code/getWordInGrid.cpp">src</a>), <a
href="code/primenumber.cpp.html">primenumber.cpp</a> (<a
href="code/primenumber.cpp">src</a>), <a
href="code/timer.cpp.html">timer.cpp</a> (<a
href="code/timer.cpp">src</a>), <a href="code/timer.h.html">timer.h</a>
(<a href="code/timer.h">src</a>), <a
href="code/timer_test.cpp.html">timer_test.cpp</a> (<a
href="code/timer_test.cpp">src</a>). These can also be downloaded all at
once via the <a href="code.zip">code.zip</a> file.</li>
<li>The data files in the labs/lab06/data/ directory. These can also be
downloaded all at once via the <a href="data.zip">data.zip</a>
file.</li>
</ul></li>
<li>Files to submit: Makefile, wordPuzzle.cpp, timer.h/cpp,
hashTable.h/cpp</li>
</ol>
<h3 id="in-lab">In-lab</h3>
<ol type="1">
<li>Learn about the <code>-O2</code> clang++ optimization flag</li>
<li>Compare your output with the expected output using output
redirection, pipes, and Unix commands</li>
<li>Test your program with the 250x250 grid using words.txt, and the
300x300 grid using words2.txt</li>
<li>Create a shell script to calculate the average running time of your
program</li>
<li>Files to download: none</li>
<li>Files to submit: averagetime.sh</li>
</ol>
<h3 id="post-lab">Post-lab</h3>
<ol type="1">
<li>Optimize your implementation of the word puzzle solver</li>
<li>Files to download: none</li>
<li>Files to submit: Makefile, wordPuzzle.cpp, timer.h/cpp,
hashTable.h/cpp</li>
</ol>
<hr />
<h2 id="word-puzzle-problem">Word Puzzle Problem</h2>
<p>Consider the problem called <a
href="https://en.wikipedia.org/wiki/Word_search">word search</a>,
illustrated in the diagram below.</p>
<p><a href="https://en.wikipedia.org/wiki/File:Wordsearch.svg"><img
src="500px-Wordsearch.svg.png" alt="word search puzzle" /></a></p>
<p>In this 10x10 grid of letters, the goal is to find words in the
puzzle in any of the eight directions. The word circled in red is in the
south-east direction. Words can go backward (although none do in this
example), but they cannot “wrap around” from one side to the other (or
from top to bottom, etc.).</p>
<p>A string of letters from the grid depends on four values:</p>
<ul>
<li>The <em>x</em> value of the starting letter</li>
<li>The <em>y</em> value of the starting letter</li>
<li>The direction, <em>d</em>, of the word</li>
<li>The length, <em>l</em>, of the string</li>
</ul>
<hr />
<h2 id="pre-lab-1">Pre-lab</h2>
<h3 id="overview">Overview</h3>
<p>For this lab you need to implement a solution to the word puzzle
problem described above: given a dictionary and word search grid, your
program should output all the words in the grid that are in the
dictionary.</p>
<p>There are two main parts to this lab: creating a hash table and
writing a word search solver. A high-level specification is provided for
the word search solver and the hash table it must use; more details are
available in the rest of the pre-lab section.</p>
<p>Your word search solver, implemented in <code>wordPuzzle.cpp</code>,
should:</p>
<ul>
<li>Take in a dictionary file and grid file, in that order, using
command line parameters</li>
<li>Read in a dictionary and store its words in a hash table
<ul>
<li>Which is implemented without using a vector of vectors or any STL
hash table</li>
</ul></li>
<li>Read in a word search grid no larger than 500x500 and store it in an
appropriate data structure</li>
<li>Output every word of length three or greater and its location in the
grid
<ul>
<li>The required output format is described below</li>
</ul></li>
</ul>
<p>We are not as interested in how fast your program runs for the
pre-lab; leave any optimizations for the post-lab.</p>
<h3 id="storing-the-dictionary">Storing the dictionary</h3>
<p>As there may be a large number of words in the dictionary, you will
have to store those words in a hash table to facilitate efficient
lookup.</p>
<p>You <strong>must</strong> write your own hash table for this lab. We
leave the implementation up to you, as long as you do not use a vector
of vectors or any STL hash table.</p>
<h3 id="manipulating-the-grid">Manipulating the grid</h3>
<p><a href="code/getWordInGrid.cpp.html">getWordInGrid.cpp</a> (<a
href="code/getWordInGrid.cpp">src</a>) provides two useful functions:
<code>readInGrid()</code> and <code>getWordInGrid()</code>. The former
reads in a grid file using C++ streams; the latter returns a word in a
2D grid of letters in a given direction.</p>
<p>You should make sure you understand how these functions work, as you
will most likely be using them in your final program.</p>
<h3 id="timing">Timing</h3>
<p>We are interested in timing how long it takes to find all the valid
words in a grid, not including the time it took to read in the
dictionary and grid files or perform any other sort of pre-processing.
You can use the code provided in <a
href="code/timer.cpp.html">timer.cpp</a> (<a
href="code/timer.cpp">src</a>), <a href="code/timer.h.html">timer.h</a>
(<a href="code/timer.h">src</a>), and <a
href="code/timer_test.cpp.html">timer_test.cpp</a> (<a
href="code/timer_test.cpp">src</a>) to time the relevant portion of your
program. Although you don’t need to print out the timing for the pre and
post-labs, it is still helpful to time your solver to help measure the
effectiveness of any optimizations you may implement.</p>
<h3 id="other-details">Other details</h3>
<p><strong>Input grids:</strong> Each input grid file has three lines.
The first line is the number of rows, and the second is the number of
columns, both representable using integers. The third line is the grid
data, represented by strictly alphabetical characters with no spaces
(i.e. it will contain <code>rows * cols</code> characters). Example
grids are available in the labs/lab06/data/ directory.</p>
<p><strong>Dictionary files:</strong> Dictionaries contain one word per
line. The longest word in our data files is 22 letters. Words which
contain non-alphabetical characters may occur in the dictionary, but
would never appear in a valid grid. Your program should be able to
handle dictionaries with such words, although you are not required to
put them into the hash table.</p>
<p><strong>Valid words:</strong> Your program should <em>only</em>
report words with three or more letters – there are simply too many hits
if one and two letter words are allowed, and it’s difficult to judge
correctness, even on very small word searches.</p>
<p><strong>Case sensitivity:</strong> Searches are
<em>case-sensitive</em>. Thus, if the word in the dictionary is ‘Foo’,
it should not register a match to the text ‘foo’ in the grid.</p>
<p><strong>Duplicates:</strong> If a word occurs more than once in a
grid, then each instance should be treated as a separate word.</p>
<h3 id="sample-execution-run">Sample Execution Run</h3>
<p>Below is a sample execution run to show you the input and output
format we are looking for. The output shown is a result of running
<code>./a.out words.txt 4x7.grid.txt</code></p>
<pre><code>N (3, 2): text
E (0, 3): sod
N (2, 5): fad
E (1, 0): pax
NW(3, 6): eft
E (2, 0): ace
W (2, 4): tee
NE(2, 4): tat
SW(0, 6): tat
9 words found</code></pre>
<p>The exact spacing and order that the words are found in does not
matter, as we can easily (and automatically) ignore that when comparing
your output to the desired output. Everything else should all be the
same. The in-lab discusses how to directly compare your output to the
expected output (while ignoring spaces and word order) - see the “Input
and Output” section of the in-lab.</p>
<h3 id="submission">Submission</h3>
<p>You should submit any files required for your word puzzle solver to
run as well as a Makefile that produces an <code>a.out</code>
executable.</p>
<h3 id="hints">Hints</h3>
<h4 id="separation-of-concerns">Separation of concerns</h4>
<p>Given the multi-faceted nature of this lab, it is helpful to only
focus on one portion at the time. Start out by using the STL hashtable
<a
href="https://en.cppreference.com/w/cpp/container/unordered_set">unordered_set</a>
to verify that your word search works, and then work on writing your own
hash table implementation to replace it.</p>
<p>The last thing you want is to be staring at incorrect output and not
knowing whether that’s because of your solver or your hashtable!</p>
<h4 id="that-hint-didnt-help-i-still-have-no-clue-how-to-start">That
hint didn’t help, I still have no clue how to start</h4>
<p>Take it one step at a time.</p>
<p>Start off by reading in the dictionary and printing out all of its
words. Then store those words in a hashtable (probably the STL one for
now). Great, you have a dictionary now.</p>
<p>Next, read in the grid (use the functions in
<code>getWordInGrid.cpp</code> to help you!). Plan out how to use the
dictionary to find all the valid words in the grid. There are four
variables that control what word is returned – what does that imply in
terms of loops?</p>
<p>At this point, hopefully you have enough to get you going.</p>
<h4 id="storing-the-grid">Storing the grid</h4>
<p>Creating a dynamic 2D array in C++ is more difficult than it should
be – one solution is to create a vector of vectors, but that is not the
most efficient means to do it. For this lab, you can just create a
500x500 static array, and can assume that you will not have your program
run on larger input grids. This is not very elegant, but it will work
until we go over dynamic array creation in lecture.</p>
<h4 id="command-line-parameters">Command-line parameters</h4>
<p>Your program must be able to run successfully if started with
<code>./a.out <dictionary_file> <grid_file></code> and no
input.</p>
<p>See the <a href="../../slides/04-arrays-bigoh.html">slide set on
arrays</a> if you need a refresher on command-line parameters; you can
also see the <a
href="../../slides/code/04-arrays-bigoh/cmdlineparams.cpp.html">cmdlineparams.cpp</a>
(<a href="../../slides/code/04-arrays-bigoh/cmdlineparams.cpp">src</a>)
file.</p>
<h4 id="hash-table-sizes">Hash table sizes</h4>
<p>As discussed in lecture, a hash table needs to be a prime number in
size in order to work. You can adapt the code in the <a
href="code/primenumber.cpp.html">primenumber.cpp</a> (<a
href="code/primenumber.cpp">src</a>) file to determine the next highest
prime number (of course, the next highest prime number is determined
after you double the size of your original hash table).</p>
<p>Your program will need to handle input dictionaries of various sizes
and create hash tables of the appropriate size. To get the program
working the first time, you can just hard code a prime number table
size. But at some point, you will have to handle different size hash
tables.</p>
<ul>
<li>One option is to implement a <code>rehash()</code> function that
will double the size of the array, hash all the old values into the new
table, and then remove the old table.</li>
<li>A second option is to do two passes through the dictionary file. The
first time you count the number of elements, use that to create an
appropriate sized hash table, and the second time through the dictionary
file you insert all the words into the table itself.</li>
</ul>
<hr />
<h2 id="in-lab-1">In-lab</h2>
<h3 id="the--o2-flag">The <code>-O2</code> flag</h3>
<p>There are many times when we want our executable program to run as
fast as possible. An example of this is the current lab – as we are
testing our programs for speed (and we want them to run fast), we want
to tell the compiler to write an optimized executable. To do this, we
specify the ‘-O2’ flag, short for “optimization level 2” (capital O).
There are multiple optimization levels that clang++ provides, each with
benefits and drawbacks, but we will be using ‘-O2’ for this lab. To
compile a program with this optimization level, we would enter:</p>
<pre><code>clang++ -O2 -Wall wordPuzzle.cpp timer.cpp hashTable.cpp</code></pre>
<p>If we wanted to use the -c flag (such as through a Makefile), then
the commands would be:</p>
<pre><code>clang++ -O2 -c wordPuzzle.cpp
clang++ -O2 -c timer.cpp
clang++ -O2 -c hashTable.cpp
clang++ -O2 wordPuzzle.o timer.o hashTable.o</code></pre>
<p>Note that on this last one, that the -O2 flag must be used on each of
the individual compilations as well as the final link. Thus, when put
into a Makefile, you should make your CXXFLAGS variable include ‘-O2’ so
that it applies to all compilations.</p>
<p>We could tell the compiler to always use the -O2 flag, but there are
some disadvantages to doing so. Debugging the program is much harder
with an optimized executable – much of the debugging information is
removed to improve the speed. The compiler also takes longer to compile
the program when optimizing. Thus, programmers usually don’t use the -O2
flag until they know the program works and want to start focusing on
performance.</p>
<p>Try running your program both with and without the -O2 flag. Do you
notice a difference in the timed speed of the program? Our test program
decreased in execution time from about 15 seconds to 12 seconds with the
-O2 flag.</p>
<h3 id="input-and-output">Input and Output</h3>
<p>When we are running the programs, we will want to make sure that they
are reporting the same results. To do so, we will need to save the
output into a file, and then make sure that file matches the sample
input file.</p>
<p>To redirect output to a file, we enter the following command (we’ve
seen output redirection before – in <a
href="../../tutorials/03-04-more-unix/unix3.html">section 3 of the first
Unix tutorial</a> – so this should be familiar material).</p>
<pre><code>./a.out words2.txt 300x300.grid.txt > output.txt</code></pre>
<p>Recall that this command will <strong>OVERWRITE</strong> output.txt
(if it exists), and save all the output from the executable into that
file. This program does not ask for input; if it did, we would not see
the prompt, but would still have to provide the input.</p>
<p>Once the program is completed, we will need to see if that file is
the same as the sample input. To do so, enter the following command.
We’ll assume, for this example, that we are using the 300x300 grid, and
words2.txt.</p>
<pre><code># Use `man diff` to learn more about the diff command
diff output.txt 300x300.words2.out.txt</code></pre>
<p>This will examine the two files byte-by-byte, and look for
differences. If there are no differences, then <code>diff</code> will
not print out ANY output. If there are differences, then they will be
printed to the screen. This means that if <code>diff</code> prints
anything, then your program is NOT producing the same results as we are
expecting. All hope is not lost, though! If you look at the two files
(you can load them up both in Emacs), it may very well be that the
difference is in formatting – you use two spaces, we use one space, for
example.</p>
<p>Since we don’t care about spacing, you can provide the
<code>-w</code> flag to <code>diff</code> to have it ignore all
whitespace.</p>
<p>If the files are not sorted the same – meaning that the words listed
are the same, but in a different order – then diff will report
differences between the files. You can use the Unix <code>sort</code>
command to sort BOTH files before comparison to fix this:</p>
<pre><code># Pipe the result of your program execution to `sort`, and then redirect that to output.txt
./a.out words2.txt 300x300.grid.txt | sort > output.txt
sort 300x300.words2.out.txt > words2.sorted.out.txt
diff output.txt words2.sorted.out.txt</code></pre>
<p>If there are a lot of differences, then you should run your program
on a small grid with the smaller word file (words2.txt). Once you solve
the differences with the smaller grid file and word file, most of the
differences with the larger grid/word files should disappear.</p>
<h3 id="shell-scripting">Shell Scripting</h3>
<p>Through the various programming languages we’ve seen (Java, C++,
etc.), we can make a computer do many things. Yet there are still
limitations to what we can do in these languages. How could you get a
directory listing in Java, for example? Or easily invoke Unix commands
from C++? While these are certainly possible, they are often difficult
to accomplish.</p>
<p>Programmers often have a need to do these sorts of things. For
example, a script that we wrote takes all the submissions made to
gradescope, compiles and executes them, saves the execution output to
various files and checks the results for correctness. This allows the
compilation, execution, and grading of your submissions to be
automated.</p>
<p>The <em>shell</em> is the command-line interface that you use in the
terminal. There are many shells out there, some of the more popular
being bash, zsh, and csh. Bash is the standard shell that the vast
majority of shell scripts are written in, and is the shell we’ll be
using for this tutorial.</p>
<p>The original Unix shell was written by Stephen Bourne and released in
1978. It was just called ‘sh’, which is short for ‘shell’. In 1987,
Brian Fox updated the shell, and decided to name it ‘Bourne-again
shell’, a play on the original author’s name. Hence the name of
‘bash’.</p>
<p>A <em>shell script</em>, then, is a series of commands for the shell
to execute. Shell scripts started off as just a way to have a series of
simple commands in one place. Over time, additional functionality was
added to these scripts, mostly the things we are used to in programming
languages, such as variables and control structures (<code>for</code>
and <code>while</code> loops, <code>if</code> and <code>case</code>
conditionals, etc.). Today, shell scripts are a very powerful
programming language that one can use to do many things.</p>
<p>Shell scripts are useful when one needs to call a large number of
Unix commands (such as the aforementioned compilation/execution
example). Significant computation, such as finding words in a word
puzzle, or a lot of arithmetic, are not well suited to shell
scripts.</p>
<h3 id="average-running-time">Average Running Time</h3>
<p>Write a shell script <code>averagetime.sh</code> that will prompt the
user for the dictionary and grid file names used by your word puzzle
executable, which must be called <code>a.out</code>. It will then run
the program five times using those parameters. It will record the time
of each execution run, and, once the runs are completed, print out the
average run time. Note that you have not yet seen conditions
(<code>if</code> or <code>case</code>) or loops (<code>for</code> or
<code>while</code>) in shell scripts, so we do not expect your script to
have either of these – you should just have 5 separate commands without
a loop.</p>
<p>Modify your word puzzle program to print out the total time taken as
the last line of output. Floating point arithmetic in bash is quite
complicated, so try to use integers when possible – you may want to
multiply the result from <code>Timer::getTime()</code> by 1000 to obtain
the number of milliseconds taken.</p>
<p>You can then capture that line by piping it through the <code>tail
-1</code> command – the following line would run the program, only keep
the last line, and save that output to a variable.</p>
<pre><code>RUNNING_TIME=`./a.out | tail -1`</code></pre>
<p>The back quote <code>`</code> tells bash to execute whatever is in
between the quotes rather than treating it as a string. It is usually
located right below the Escape key on a keyboard.</p>
<p>Armed with this, the rest of the required concepts for the shell
script are in first six sections of the article.</p>
<p>As you are going through the tutorial, if there is a Unix command
that you do not know (or you once knew and have since forgotten), you
can find out more information about that command by entering <code>man
command</code> at the Unix prompt. This brings up the manual for that
command, including all of the command-line parameters.</p>
<p>Your script should have comments (anything on a line after a ‘#’ is a
comment).</p>
<p>Below are a few notes to keep in mind when writing your shell script.
Bash is a very powerful language, but it can be rather finicky and
unforgiving with syntax.</p>
<ul>
<li>The shell script takes two inputs (dictionary file and grid file),
in that order; no command-line parameters.</li>
<li>Your program should be called <code>averagetime.sh</code>, and
should have <code>#!/bin/bash</code> as the very first line of the
script
<ul>
<li>If you don’t put the <code>#!/bin/bash</code>, it will use the wrong
shell, and your program won’t work properly!</li>
</ul></li>
<li>When setting variables, do not have spaces around the equals
sign</li>
<li>Keep in mind that to grab program output (such as the output of the
binary program), you use back quotes (`)</li>
<li>To execute your script, you can just enter
<code>./averagetime.sh</code>. If your computer denies you permission,
remind it who’s boss and put it in its place with <code>chmod +x
averagetime.sh</code>. This tells your Unix system that averagetime.sh
is a program that can be executed (remember chmod?).</li>
</ul>
<h3 id="hints-1">Hints</h3>
<h4 id="debugging-shell-scripts">Debugging shell scripts</h4>
<p>Shell scripts don’t have debuggers, but we can use old-style printing
instead: you can always print out the value of the variable through the
<code>echo</code> command, such as <code>echo ${variable}</code>.</p>
<h4 id="integer-division">Integer division</h4>
<p>Bash’s arithmetic expansion <code>$(( ... ))</code> performs integer
division, so as long as you are using integers in the expression itself,
you shouldn’t run into any issues.</p>
<hr />
<h2 id="post-lab-1">Post-lab</h2>
<p>You will need to optimize your code for the solution to the word
puzzle. The auto-grader for this assignment will be running your code on
larger dictionaries / word grids. In addition to this, your code will be
given less time to execute.</p>
<p>To standardize timings, you should run all of your tests on
words2.txt and 300x300.grid.txt.</p>
<h3 id="code-optimization">Code optimization</h3>
<p>Your coding task for the post-lab is to optimize your program. Try to
get your solver running as fast as possible – 3 seconds is a good
starting point, though you should still try to optimize it further
regardless of how fast your program already runs.</p>
<p>First, record how long it takes to run your program with the
<code>-O2</code> flag but no other optimizations. We will need this
value later.</p>
<p>There are plenty of optimizations you can try to implement, some of
which may work better with your code than others. Here are some
ideas:</p>
<ul>
<li>Experiment with different hash functions and different
implementations of hash tables
<ul>
<li>Choose a good load factor, λ, for your hash table</li>
<li>Implement a reasonable collision resolution strategy</li>
<li>Try different collision resolution strategies to find the fastest
one</li>
<li>Try different data structures for the buckets if you are using
separate chaining</li>
<li>Look up better hash functions for strings</li>
</ul></li>
<li>Buffer the output (keep it in some data structure) and print it out
once after the timing code has finished, rather than every time a word
is found</li>
<li>Store <em>prefixes</em> of each word in the dictionary in addition
to the word itself
<ul>
<li>If the word is “amazing”, you would store “ama”, “amaz”, “amazi”,
“amazin”, and “amazing” in the hash table. There would need to be some
way to differentiate between prefixes (“amaz”) and the actual words
(“amazing”). This way if you are working in a given direction, and the
particular string you are generating is not a prefix, then you know
there are no further words in the dictionary in the given
direction.</li>
</ul></li>
<li>Keep track of a previous hash to help compute the next one faster.
<ul>
<li>If you have just computed the hash for “foo”, then you can keep that
hash value on hand to compute the hash for “food” faster.</li>
</ul></li>
<li>Any others that you can think of?</li>
</ul>
<p>Keep track of what you tried – if you try each of the collision
resolution strategies, and find that the original one you used is
faster, make note of your findings and try to figure out why that
is.</p>
<p>There are two restrictions on what optimizations you may attempt. You
may not:</p>
<ul>
<li>Write assembly code or any other non-C++ code
<ul>
<li>The point of this lab is to exercise your knowledge of C++</li>
</ul></li>
<li>Use any optimization level other than <code>-O2</code>
<ul>
<li><code>-O2</code> provides a baseline for all the solutions, and your
program is not guaranteed to function the same when using higher
optimization levels</li>
</ul></li>
</ul>
<p>Additionally, although we are trying to get you to think about how to
make your program faster, we still want clean, correct, and readable
code. Your solutions will be judged on correctness and style.</p>
<h3 id="post-lab-reflection">Post-lab Reflection</h3>
<p>As you finish up the lab, consider the following questions:</p>
<ul>
<li>What is the big-theta running time of the word-search portion of
your program?
<ul>
<li>Your running time will be in terms of <em>r</em> (rows), <em>c</em>
(columns), and <em>w</em> (words) – you can assume that the maximum word
size is some small constant</li>
</ul></li>
<li>What kind of optimizations did you attempt?
<ul>
<li>Can you provide a runtime for each of those optimizations?</li>
<li>Why did you try each optimization? Did it work (why/why not)?</li>
<li>Did you encounter problems with any of the optimizations?</li>
</ul></li>
<li>What was the overall speedup due to your optimizations?
<ul>
<li>If your original running time was <em>x</em> and your optimized
running time was <em>y</em>, then the speedup is
<em>x</em>/<em>y</em></li>
<li>For example, if your original program took 10 seconds and you
optimized it down to 5 seconds, your speedup is 2.0</li>
</ul></li>
</ul>
<p>You do not need to submit anything for these questions, but they are
a good way to look back on the assignment and test your knowledge of
hashtables and runtime analysis.</p>
</body>
</html>