<map version="0.9.0">
<!-- To view this file, download free mind mapping software FreeMind from http://freemind.sourceforge.net -->
<node CREATED="1368622273645" ID="ID_1310544137" MODIFIED="1368622285052" TEXT="ElasticSearch Server">
<node CREATED="1368622314755" ID="ID_1936492349" MODIFIED="1384773280783" POSITION="right" TEXT="2. Searching your data">
<node CREATED="1368651454967" ID="ID_794450864" MODIFIED="1368653371075" TEXT="Understanding the querying and indexing process">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<ul>
<li>
<b>Indexing</b>: process of preparing the document sent to ElasticSearch and storing it in the index
</li>
<li>
<b>Searching</b>: process of matching the documents that satisfy search query requirements
</li>
<li>
<b>Analysis</b>: the process of preparing the field content and converting it to terms so it can be written into the Lucene index. During indexing, the data in each field is divided into a stream of tokens (words), which are written into the index as terms (tokens with additional information, such as the position in the input text). Analysis can consist of the following steps:
<ul>
<li>
<b>Tokenization</b>: the tokenizer turns the input text into a token stream during this stage
</li>
<li>
<b>Filtering</b>: zero or more filters can process the tokens in the token stream. E.g. a stopwords filter can remove irrelevant tokens, a synonyms filter can add new tokens or change existing ones, and the lowercase filter makes all tokens lowercase.
</li>
</ul>
</li>
<li>
<b>Analyzer</b>: a single tokenizer with zero or more filters. We can specify analyzers when working with fields, types and queries.
</li>
</ul>
<p>
The analysis process just discussed is used during both indexing and searching, and index-time analysis and query-time analysis can be configured differently. It's important that the terms produced at index time and at query time match, because otherwise the query won't find the documents. E.g. if stemming is used during indexing but not while searching, we'd have to pass already stemmed words in the search query in order to find the documents.
</p>
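<p>
A minimal sketch of how an analyzer (one tokenizer plus filters) could be declared in the index settings. The analyzer name <b>en</b> and the chosen filters are assumptions made for illustration; they are not part of the mappings used in this chapter:
</p>
<pre>{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "en" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : [ "lowercase", "stop" ]
          }
        }
      }
    }
  }
}</pre>
<p>
Using the same analyzer at index time and at query time is the simplest way to keep the produced terms consistent.
</p>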
</body>
</html></richcontent>
</node>
<node CREATED="1368651475407" ID="ID_753416788" MODIFIED="1368689205642" TEXT="Mappings">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
These mappings will be used for the rest of the 2nd chapter:
</p>
<pre>{
  "book" : {
    "_index" : {
      "enabled" : true
    },
    "_id" : {
      "index": "not_analyzed",
      "store" : "yes"
    },
    "properties" : {
      "author" : {
        "type" : "string"
      },
      "characters" : {
        "type" : "string"
      },
      "copies" : {
        "type" : "long",
        "ignore_malformed" : false
      },
      "otitle" : {
        "type" : "string"
      },
      "tags" : {
        "type" : "string"
      },
      "title" : {
        "type" : "string"
      },
      "year" : {
        "type" : "long",
        "ignore_malformed" : false,
        "index" : "analyzed"
      },
      "available" : {
        "type" : "boolean",
        "index" : "analyzed"
      }
    }
  }
}</pre>
</body>
</html></richcontent>
<node CREATED="1368651482311" ID="ID_1250393072" MODIFIED="1389212178293" TEXT="Data">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
The sample documents used for the rest of the 2nd chapter are indexed with the following mapping:
</p>
<pre>{
"book" : {
"_index" : {
"enabled" : true
},
"_id" : {
"index": "not_analyzed",
"store" : "yes"
},
"properties" : {
"author" : {
"type" : "string"
},
"characters" : {
"type" : "string"
},
"copies" : {
"type" : "long",
"ignore_malformed" : false
},
"otitle" : {
"type" : "string"
},
"tags" : {
"type" : "string"
},
"title" : {
"type" : "string"
},
"year" : {
"type" : "long",
"ignore_malformed" : false,
"index" : "analyzed"
},
"available" : {
"type" : "boolean",
"index" : "analyzed"
}
}
}
}</pre>
</body>
</html></richcontent>
</node>
</node>
<node COLOR="#006633" CREATED="1368651490663" FOLDED="true" ID="ID_923186511" MODIFIED="1393876748318" TEXT="Querying Elastic Search">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
We talk to <i>ES</i> through its REST API, using HTTP requests that carry JSON-structured data.
</p>
<p>
</p>
<p>
When we want to send more than a simple query we do it the same way: we structure it as a JSON object and send it to <i>ES</i>. This is called the <b>Query DSL</b>.
</p>
<p>
</p>
<p>
<i>ES</i> supports two kinds of queries:
</p>
<ul>
<li>
<b>basic</b>: used just for querying (such as the term query)
</li>
<li>
<b>compound</b>: combine multiple queries (such as the bool query)
</li>
</ul>
<p>
In addition to these two kinds, our queries can contain <b>filter queries</b>, which narrow the results by certain criteria.
</p>
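<p>
As a rough sketch of a compound query, a bool query can wrap basic term queries. The field values below are illustrative assumptions and are not taken from the chapter's data:
</p>
<pre>{
  "query" : {
    "bool" : {
      "must" : { "term" : { "title" : "crime" } },
      "should" : { "term" : { "tags" : "novel" } }
    }
  }
}</pre>
<p>
Here the <b>must</b> clause has to match, while the <b>should</b> clause only raises the score of documents that also match it.
</p>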
</body>
</html></richcontent>
<font NAME="SansSerif" SIZE="10"/>
<node CREATED="1368651517783" ID="ID_918815408" MODIFIED="1381570820849" TEXT="Simple query">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
The simplest way to query <i>ES</i> is to use a URI request query.
</p>
<p>
</p>
<p>
E.g.: we want to search for the word "crime" in the title field:
</p>
<p>
</p>
<p>
<b>curl -XGET 'localhost:9200/library/book/_search?q=title:crime&pretty=true' </b>
</p>
<p>
</p>
<p>
From the Query DSL point of view, the simplest query is the <b>term</b> query, which searches for a given term in a given field. The term query is not analyzed, so we have to provide the exact term we're searching for:
</p>
<p>
</p>
<pre>{
"query" : {
"term" : { "title" : "crime" }
}
}</pre>
<p>
Querying for data means sending a GET HTTP request to the <b>_search</b> REST endpoint of the index and type we want to search (both can be omitted). So to search our example <i><b>library</b></i> index we'd use something like:
</p>
<pre>curl -XGET 'localhost:9200/library/book/_search?pretty=true' -d '{
"query" : {
"term" : { "title" : "crime" }
}
}'</pre>
<p>
The <b>-d</b> switch tells curl to send the string that follows (enclosed in apostrophes) as the request body. The <b>pretty=true</b> query parameter tells ElasticSearch to pretty-print the output.
</p>
<p>
In response we get:
</p>
<pre>{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.19178301,
"hits" : [ {
"_index" : "library",
"_type" : "book",
"_id" : "4",
"_score" : 0.19178301, "_source" : { "title": "Crime and Punishment","otitle": "Преступлéние и наказáние","author": "Fyodor Dostoevsky","year": 1886,"characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],"tags": [],"copies": 0, "available" : true}
} ]
}
}</pre>
<p>
As said earlier, a query can be directed to a particular index and/or type, and there are more possibilities: we can query several indices at once, or query one index regardless of type. The possible ways of addressing a request are:
</p>
<ol>
<li>
Request to index and type:
<pre>curl -XGET 'localhost:9200/library/book/_search' -d @query.json</pre>
</li>
<li>
Request to index and all types in it:
<pre>curl -XGET 'localhost:9200/library/_search' -d @query.json</pre>
</li>
<li>
Request to all indices:
<pre>curl -XGET 'localhost:9200/_search' -d @query.json</pre>
</li>
<li>
Request to several indices:
<pre>curl -XGET 'localhost:9200/library,bookstore/_search' -d @query.json</pre>
</li>
<li>
Request to multiple indices and multiple types in them:
<pre>curl -XGET 'localhost:9200/library,bookstore/book,recipes/_search' -d @query.json</pre>
</li>
</ol>
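<p>
The <b>query.json</b> file referenced in the commands above is not shown in these notes. A minimal sketch, assuming it simply holds the term query used earlier:
</p>
<pre>{
  "query" : {
    "term" : { "title" : "crime" }
  }
}</pre>
<p>
The <b>@</b> prefix makes curl read the request body from that file instead of taking it from the command line.
</p>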
</body>
</html></richcontent>
</node>
<node CREATED="1368651526071" ID="ID_1854474302" MODIFIED="1381572057332" TEXT="Paging and results size">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
ElasticSearch provides standard paging facilities through two additional properties that can be set in the request body:
</p>
<ul>
<li>
<b>from</b>: the offset of the first document we want in the result; defaults to 0 and is equivalent to e.g. the MySQL <b>offset</b>
</li>
<li>
<b>size</b>: the maximum number of documents we want to retrieve; defaults to 10 and is equivalent to the MySQL <b>limit</b>
</li>
</ul>
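<p>
A sketch of a paged request, assuming a page size of 20 (an arbitrary value chosen for illustration). The body below skips the first 20 matches and returns at most the next 20, i.e. the second page:
</p>
<pre>{
  "from" : 20,
  "size" : 20,
  "query" : {
    "term" : { "title" : "crime" }
  }
}</pre>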
</body>
</html></richcontent>
</node>
<node CREATED="1368651537567" ID="ID_1800598811" MODIFIED="1381572818088" TEXT="Returning the version">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
In addition to the other returned information, we may also want the version of each document. To get it we set the top-level <b>version</b> property to true in the JSON object, so it looks like this:
</p>
<pre>{
"version" : true,
"query" : {
"term" : { "title" : "crime" }
}
}</pre>
<p>
After running this we'll get:
</p>
<pre>{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "library",
      "_type" : "book",
      "_id" : "4",
      <b>"_version" : 1,</b>
      "_score" : 0.19178301, "_source" : { "title": "Crime and Punishment","otitle": "Преступлéние и наказáние","author": "Fyodor Dostoevsky","year": 1886,"characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],"tags": [],"copies": 0, "available" : true}
    } ]
  }
}</pre>
</body>
</html></richcontent>
</node>
<node CREATED="1368651546575" ID="ID_1521105849" MODIFIED="1381574396419" TEXT="Limiting the score">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
For not-so-standard use cases we can set the minimum score a document must reach to be considered a match. To use it we provide the <b>min_score</b> property at the top level of the JSON object. So:
</p>
<pre>{
"min_score" : 0.75,
"query" : {
"term" : { "title" : "crime" }
}
}</pre>
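<p>
This returns only documents with a score of at least 0.75. The only match for our term scored 0.19178301 in the earlier examples, so the result is empty:
</p>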
<pre>{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}</pre>
</body>
</html></richcontent>
</node>
<node CREATED="1368651558335" ID="ID_170337058" MODIFIED="1389213103994" TEXT="Choosing the fields we want to return">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
With the <b>fields</b> array in the request body we can define which fields should be included in the response. Note that only fields marked as <b>stored</b> in the mappings can be returned, unless the <b>_source</b> field is used. So to return only the <b>title</b> and <b>year</b> fields (for all documents in the result) we'd use the following query:
</p>
<pre>{
"fields" : [ "title", "year" ],
"query" : {
"term" : { "title" : "crime" }
}
}</pre>
<p>
We'd get the following result:
</p>
<pre>{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.19178301,
"hits" : [ {
"_index" : "library",
"_type" : "book",
"_id" : "4",
"_score" : 0.19178301,
"fields" : {
"title" : "Crime and Punishment",
"year" : 1886
}
} ]
}
}</pre>
<p>
There're three things to note:
</p>
<ul>
<li>
if we don't define the fields array, the default is used and the <b>_source</b> field is returned, if available
</li>
<li>
if we use the <b>_source</b> field and request a field that is not stored, that field is extracted from the <b>_source</b> field (which requires additional processing)
</li>
<li>
when all fields should be returned, we use <b>*</b> as the field name
</li>
</ul>
<p>
<b>Note:</b> if the <b>_source</b> field is used, from a performance point of view it's better to return the <b>_source</b> field than multiple stored fields.
</p>
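<p>
A sketch of the wildcard form listed among the points above: the request asks for all fields by using <b>*</b> as the only field name. What actually comes back still depends on which fields are stored or available through <b>_source</b>:
</p>
<pre>{
  "fields" : [ "*" ],
  "query" : {
    "term" : { "title" : "crime" }
  }
}</pre>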
</body>
</html></richcontent>
<node CREATED="1368651570423" ID="ID_1659794144" MODIFIED="1389213574333" TEXT="Partial fields">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
Partial fields allow us to control how fields are loaded from the <b>_source</b> field. ES exposes the <b>include</b> and <b>exclude</b> properties of the <b>partial_fields</b> object, so we can include or exclude fields depending on the values of these properties. E.g. to include fields starting with "<i>titl</i>" and exclude those starting with "<i>chara</i>" we use the following query:
</p>
<pre>{
"partial_fields" : {
"partial1" : {
"include" : [ "titl*" ],
"exclude" : [ "chara*" ]
}
},
"query" : {
"term" : { "title" : "crime" }
}
}</pre>
</body>
</html></richcontent>
</node>
</node>
<node CREATED="1368651582463" ID="ID_1712230196" MODIFIED="1389214835707" TEXT="Using script fields">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
ES allows us to return script-evaluated values with the result documents. To do so we add a <b>script_fields</b> section to the JSON query object, with a named object for each value we want. E.g. to return a value <b>correctYear</b> (the value of the <b>year</b> field minus 1800) we'd run:
</p>
<p>
</p>
<pre>{
  "script_fields" : {
    "correctYear" : {
      "script" : "doc['year'].value - 1800"
    }
  },
  "query" : {
    "term" : { "title" : "crime" }
  }
}</pre>
<p>
When the preceding query is run against the test data it will throw an exception, as we don't store the <b>year</b> field. Only stored fields (or those available in the <b>_source</b> field) can be used. So after modification it should look like the following:
</p>
<pre>{
  "script_fields" : {
    "correctYear" : {
      "script" : "_source.year - 1800"
    }
  },
  "query" : {
    "term" : { "title" : "crime" }
  }
}</pre>
<p>
And it should return something like:
</p>
<pre>{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.19178301,
"hits" : [ {
"_index" : "library",
"_type" : "book",
"_id" : "4",
"_score" : 0.19178301,
"fields" : {
"correctYear" : 86
}
} ]
}
}</pre>
</body>
</html></richcontent>
<font NAME="SansSerif" SIZE="12"/>
<node CREATED="1368651594159" ID="ID_659767064" LINK="#ID_263972596" MODIFIED="1389215244978" TEXT="Passing parameters to script fields">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
We can use a <u>variable name</u> in the script and pass its value in the <u>parameters section</u> instead of hardcoding the value (<i>1800</i>) in the expression.
</p>
<pre>{
  "script_fields" : {
    "correctYear" : {
      "script" : "_source.year - paramYear",
      "params" : {              // parameters section
        "paramYear" : 1800      // the <b>paramYear</b> variable
      }
    }
  },
  "query" : {
    "term" : { "title" : "crime" }
  }
}</pre>
<p>
See the Using scripts section for more details.
</p>
</body>
</html></richcontent>
</node>
</node>
<node CREATED="1368651656247" ID="ID_1850581341" MODIFIED="1372922121649" TEXT="Choosing the right search type (advanced)">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
It's possible to choose how our query is processed internally (when appropriate).
</p>
<p>
</p>
<p>
To control query execution we pass the <b>search_type</b> request parameter, set to one of the following values:
</p>
<ul>
<li>
<b>query_and_fetch</b>: usually the fastest and simplest implementation. The query is executed against all needed shards in parallel and every shard returns up to <b>size</b> results, so the whole result set can contain up to <b>size * noOfShards</b> documents
</li>
<li>
<b>query_then_fetch</b>: in the first step the query is executed to sort and rank the documents, and then only the relevant shards are asked for the actual documents. The maximum number of results equals the <b>size</b> parameter
</li>
<li>
<b>dfs_query_and_fetch</b>: similar to query_and_fetch, but with an additional initial phase that computes distributed term frequencies to allow more precise scoring of the returned documents
</li>
<li>
<b>dfs_query_then_fetch</b>: like query_then_fetch, with the same additional distributed term frequency phase that dfs_query_and_fetch adds to query_and_fetch
</li>
<li>
<b>count</b>: a special search type that only returns the number of documents that matched the query
</li>
<li>
<b>scan</b>: another special search type, meant for queries expected to return a large number of results. It differs from the usual queries: after the first request ES responds with a <b>scroll</b> identifier, and all following requests must be run against the <b>_search/scroll</b> REST endpoint with the returned scroll identifier in the request body. More about this functionality in the "Why is the result on the later pages slow" section of Chapter 8, Dealing with problems
</li>
</ul>
<p>
So if we want to use the simplest search type, we would run the following command:
</p>
<p>
</p>
<pre>curl -XGET 'localhost:9200/library/book/_search?pretty=true&search_type=query_and_fetch' -d '{
"query" : {
"term" : { "title" : "crime" }
}
}'</pre>
</body>
</html></richcontent>
</node>
<node CREATED="1368651677063" ID="ID_1952524766" MODIFIED="1389301661227" TEXT="Search execution preference (advanced)">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
There's one additional way to control the search process: choosing which shards the search is executed on.
</p>
<p>
By default ES uses the shards and replicas available on the node that received the request and on all other nodes in the cluster, which is usually the right behavior.
</p>
<p>
There might be situations where other settings are preferred. To change the behavior we set the <b>preference</b> request parameter to one of the following values:
</p>
<ul>
<li>
<b>_primary</b>: the operation is executed on primary shards only
</li>
<li>
<b>_primary_first</b>: the operation is executed on primary shards if they are available; if not, it is executed on other shards
</li>
<li>
<b>_local</b>: the operation is executed only on the node that received the request (if possible)
</li>
<li>
<b>_only_node:node_id</b>: the operation is executed on the specified node
</li>
<li>
A custom value: any custom value may be passed; requests with the same value are executed on the same shards
</li>
</ul>
<p>
E.g. to execute a query on local shards only we use:
</p>
<pre>curl -XGET 'localhost:9200/library/_search?preference=_local' -d '{
"query" : {
"term" : { "title" : "crime" }
}
}'</pre>
</body>
</html></richcontent>
</node>
</node>
<node CREATED="1368651704695" FOLDED="true" ID="ID_1882653736" MODIFIED="1393876752612" TEXT="Basic queries">
<node CREATED="1368651712183" ID="ID_973026064" MODIFIED="1368689777686" TEXT="term">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
One of the simplest queries. It matches any document that has the given term in a given field.
</p>
<p>
</p>
<p>
<b>Term query is not analyzed.</b>
</p>
<p>
</p>
<pre><b>{
"query" : {
"term" : {
"title" : "crime"
}
}
}</b>
</pre>
<p>
We can include the <b>boost</b> attribute to affect the importance of the given term (in that case we have to change the query a bit):
</p>
<p>
</p>
<pre><b>{
"query" : {
"term" : {
"title" : {
"value" : "crime",
"boost" : 10.0
}
}
}
}</b>
</pre>
</body>
</html></richcontent>
</node>
<node CREATED="1368651748215" ID="ID_611682208" MODIFIED="1368690307036" TEXT="terms">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
Allows us to match documents that have certain terms in their contents.
</p>
<p>
</p>
<p>
If we want documents that have the terms "novel" or "book" in the <b>tags</b> field, we should use something like this:
</p>
<p>
</p>
<pre><b>{
"query" : {
"terms" : {
"tags" : [ "novel", "book" ],
"minimum_match" : 1
}
}
}</b></pre>
<p>
Such a query returns documents that have one or both of the searched terms in the <b>tags</b> field, because the <b>minimum_match</b> attribute is set to 1. If we wanted to match only documents with both provided terms, we would set its value to 2.
</p>
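<p>
A sketch of the stricter variant of this terms query, with <b>minimum_match</b> raised to 2 so that both terms must be present in the <b>tags</b> field:
</p>
<pre>{
  "query" : {
    "terms" : {
      "tags" : [ "novel", "book" ],
      "minimum_match" : 2
    }
  }
}</pre>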
</body>
</html></richcontent>
</node>
<node CREATED="1368651754751" ID="ID_247819405" MODIFIED="1372929530004" TEXT="match">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
The match query takes the given search parameters, analyzes them and constructs an appropriate query, automatically selecting the proper analyzer (the same one used during indexing).
</p>
<p>
</p>
<p>
Match (and multi match) query does not support Lucene query syntax.
</p>
<p>
</p>
<p>
It fits perfectly as a query handler for a search box. The <u>simplest</u> match query can look like this:
</p>
<p>
</p>
<pre>{
"query" : {
"match" : {
"title" : "crime and punishment"
}
}
}</pre>
<p>
</p>
<p>
which would match all documents that have the terms "crime" or "and" or "punishment" in the title
</p>
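<p>
Conceptually, this match query behaves much like a bool query with one <b>should</b> clause per analyzed term. A hand-written sketch of that equivalent, assuming the analyzer lowercases the text and keeps every token (an analyzer with a stopwords filter would drop "and"):
</p>
<pre>{
  "query" : {
    "bool" : {
      "should" : [
        { "term" : { "title" : "crime" } },
        { "term" : { "title" : "and" } },
        { "term" : { "title" : "punishment" } }
      ]
    }
  }
}</pre>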
<p>
</p>
</body>
</html></richcontent>
<node CREATED="1368651760055" ID="ID_470677832" MODIFIED="1376338148318" TEXT="Boolean match">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
The Boolean match query analyzes the provided text and makes a boolean query out of it.
</p>
<p>
</p>
<p>
A few parameters allow us to control the behavior of Boolean match queries:
</p>
<ul>
<li>
<b>operator</b>: can be <b>or</b> or <b>and</b>; it tells which operator is used to connect the created boolean clauses. The default is <b>or</b>
</li>
<li>
<b>analyzer</b>: the name of the analyzer used to analyze the query text; the default is the <b>default</b> analyzer
</li>
<li>
<b>fuzziness</b>: providing a value for this parameter allows us to construct fuzzy queries. The value is in the range <b>from 0.0 to 1.0</b> for a <b>string</b> object and is used to set the similarity while constructing fuzzy queries
</li>
<li>
<b>prefix_length</b>: controls the behavior of the fuzzy query; for more info see <u>The fuzzy like this query</u> section
</li>
<li>
<b>max_expansions</b>: also controls the behavior of the fuzzy query; for more info see the same section
</li>
</ul>
<p>
The parameters should be wrapped in the name of the field we are running the query against, so to run a Boolean match query against the <b>title</b> field we'd send a query like:
</p>
<p>
</p>
<pre>{
"query" : {
"match" : {
"title" : {
"query" : "crime and punishment",
"operator" : "and"
}
}
}
}</pre>
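<p>
A sketch of the fuzzy-related parameters on the same query. The values are arbitrary assumptions for illustration, and the 0.0 to 1.0 similarity form of <b>fuzziness</b> follows the description in this note (other ElasticSearch versions may expect a different form):
</p>
<pre>{
  "query" : {
    "match" : {
      "title" : {
        "query" : "crime and punishment",
        "fuzziness" : 0.7,
        "prefix_length" : 1,
        "max_expansions" : 10
      }
    }
  }
}</pre>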
</body>
</html></richcontent>
</node>
<node CREATED="1368651767007" ID="ID_1141974764" MODIFIED="1373313532028" TEXT="phrase match">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
The phrase match query is similar to the Boolean match query, but instead of constructing Boolean clauses from the analyzed text it constructs a phrase query.
</p>
<p>
Available parameters are:
</p>
<ul>
<li>
<b>slop</b>: an integer that defines how many unknown words can appear between the terms of the text query for a document to still match the phrase
</li>
<li>
<b>analyzer</b>: the name of the analyzer used to analyze the query text; the default is the default analyzer
</li>
</ul>
<p>
A sample phrase match query against the <b>title</b> field could look like the following:
</p>
<p>
</p>
<pre>{
"query" : {
"match_phrase" : {
"title" : {
"query" : "crime and punishment",
"slop" : 1
}
}
}
}</pre>
</body>
</html></richcontent>
</node>
<node CREATED="1368651775047" ID="ID_1112970012" MODIFIED="1373313890498" TEXT="match phrase prefix">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
The last type of match query is the match phrase prefix query. It is almost the same as the phrase match query, but in addition it allows prefix matches on the last term of the query text. Besides the parameters of the phrase match query it exposes the <b>max_expansions</b> parameter, which controls how many prefixes the last term will be rewritten to.
</p>
<p>
Sample query could look like this:
</p>
<p>
</p>
<pre>{
"query" : {
"match_phrase_prefix" : {
"title" : {
"query" : "crime and punishment",
"slop" : 1,
"max_expansions" : 20
}
}
}
}</pre>
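<p>
To make the prefix behavior visible, a sketch in which the last term is deliberately left incomplete (the shortened text "crime and punish" is an assumption for illustration). The last term is treated as a prefix, so titles that continue with "punishment" can still match:
</p>
<pre>{
  "query" : {
    "match_phrase_prefix" : {
      "title" : {
        "query" : "crime and punish",
        "max_expansions" : 20
      }
    }
  }
}</pre>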
</body>
</html></richcontent>
<font NAME="SansSerif" SIZE="12"/>
</node>
</node>
<node CREATED="1368651787599" ID="ID_1767391984" MODIFIED="1373314184838" TEXT="multi match">
<richcontent TYPE="NOTE"><html>
<head>
</head>
<body>
<p>
The multi match query is the same as the match query, but instead of running against a single field it can run against multiple fields, given in the <b>fields</b> parameter.
</p>
<p>
All parameters used with the match query can be used with the multi match query.
</p>
<p>
So to match against the <b>title</b> and <b>otitle</b> fields we can run the following query:
</p>
<p>
</p>
<pre>{
"query" : {
"multi_match" : {
"query" : "crime punishment",
"fields" : [ "title", "otitle" ]