Sanity Testing your Ranking Algorithm
Information Retrieval 2018
Suppose you have implemented your BM25 ranker and you’re not sure if it’s working as
expected. This guide will show a simple way to check that your algorithm seems to make
sense. Finally, a small number of test topics and sample outputs are provided from the Indri
search engine.
1 Testing the Ranker
1.1 Test Cases
The first way to check your ranker is performing as expected is to build a set of test cases.
For example, create a fixed set of values to test on, and then vary them to observe whether
the algorithm is doing what is expected. As a concrete example, let us assume that we have
a single term query, t, and a document d, where:
f_t = 10,
f_{d,t} = 2,
L_d = 256,
AL = 344,
N = 100,000
k_1 = 1.2 and b = 0.75
Now, plug these values into your BM25 implementation and observe the output. The
next step is to vary one of these values in a way whose effect on the score is known in
advance. For example, let us now assume that instead of having f_{d,t} = 2, we have
f_{d,t} = 5. Since we know that a higher f_{d,t} should result in a higher score, we can
compare the outputs to make sure this occurs. Similar tests can be conducted on f_t, L_d,
and so on.
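The checks described above can be scripted. The sketch below is a minimal example, not a
reference implementation: it assumes one common BM25 formulation (a Robertson/Sparck Jones
style IDF with the usual k_1 and b length normalisation), which may differ from the variant
you were asked to implement, and the function name bm25_term_score is hypothetical. It plugs
in the values listed above and asserts only the direction of each change, not particular
scores.

```python
import math


def bm25_term_score(f_t, f_dt, L_d, AL, N, k1=1.2, b=0.75):
    """Score one query term against one document.

    Assumed variant (swap in your own formula if it differs):
      idf = ln((N - f_t + 0.5) / (f_t + 0.5))
      K   = k1 * ((1 - b) + b * L_d / AL)
      tf  = ((k1 + 1) * f_dt) / (K + f_dt)
    """
    idf = math.log((N - f_t + 0.5) / (f_t + 0.5))
    K = k1 * ((1 - b) + b * L_d / AL)
    return idf * ((k1 + 1) * f_dt) / (K + f_dt)


# Baseline values from the test case above.
base = dict(f_t=10, f_dt=2, L_d=256, AL=344, N=100_000)
baseline = bm25_term_score(**base)

# Vary one value at a time and check the direction of the change.
higher_tf = bm25_term_score(**{**base, "f_dt": 5})       # term occurs more often in d
more_common = bm25_term_score(**{**base, "f_t": 1000})   # term occurs in more documents
longer_doc = bm25_term_score(**{**base, "L_d": 1024})    # document is longer

assert higher_tf > baseline
assert more_common < baseline
assert longer_doc < baseline
print(baseline, higher_tf, more_common, longer_doc)
```

If any assertion fails against your own implementation, the variable that was changed is a
good place to start looking for the bug.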
1.2 Ranked Retrieval
The following example assumes we ran the topic "ancient city ruins" across the latimes
collection, and we assigned a query id of 1001 to this topic. We retrieved the top-10 results,
which are as follows (each line gives the query id, document identifier, rank, and score):
1001 LA082690-0120 1 18.2249
1001 LA070190-0077 2 17.7167
1001 LA101589-0071 3 17.6664
1001 LA022290-0129 4 16.4931
1001 LA031890-0033 5 16.3479
1001 LA052090-0200 6 15.9897
1001 LA070989-0114 7 15.6409
1001 LA060390-0138 8 15.2404
1001 LA111990-0035 9 15.1859
1001 LA050690-0087 10 15.0315
Next, we can use the UNIX grep tool to find a few of these documents in the latimes file
to get a sense of whether we should be retrieving these documents (that is, if our algorithm
is working correctly). In particular, we can use the -A n flag, which will show us the first n
lines after matching the text we are interested in.
So returning to our example, running
grep "LA082690-0120" -A 100 /path/to/latimes
will print the line containing the document identifier, followed by the next 100 lines of the
latimes file. You may wish to show more or fewer than 100 lines depending on the output. In
this case, we search for the first document according to our ranking. Shown is a shortened
output:
LA082690-0120
ANCIENT INDIAN SITE FOUND IN COLORADO
Two college students have stumbled upon the virtually untouched ruins of a 1,100-year-old Anasazi Indian village in southwestern Colorado.
The six-acre Mountain Sheep Village, the name given the site, probably had about 200 structures and may have housed 150 to 200 Indians as early as AD 850, said Kristie Arrington, an archeologist for the Bureau of Land Management, the agency that controls the discovery site.
…
The two were tracking bighorn sheep with another student, Patricia Dahnke, when they discovered the ruins.
Wire
Clearly, this document would seem to be relevant to the query we ran. We can continue
this process with other documents and topics to get some idea of whether we think the ranker
works or not. The same idea can be applied to phrase search. Two other things to keep in
mind are that queries with more terms generally produce higher scores, and that terms that
are less frequent generally contribute more to the score than frequent terms. You can check
these by adding or removing terms. For example, try running three successive queries where
you add various terms to observe what happens to the scores of the documents.
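Both rules of thumb can be checked in a few lines. The following is a rough sketch under
stated assumptions: it uses one common BM25 formulation (your variant may differ), the helper
name bm25_query_score is hypothetical, and the document frequencies and term counts are
made-up illustrative values, not statistics measured on latimes. The rules hold here because
every term occurs in fewer than half of the documents, so each IDF is positive.

```python
import math


def bm25_query_score(query_terms, doc_tf, doc_freq, L_d, AL, N, k1=1.2, b=0.75):
    """Sum per-term BM25 contributions for the query terms found in the document."""
    K = k1 * ((1 - b) + b * L_d / AL)
    score = 0.0
    for term in query_terms:
        f_dt = doc_tf.get(term, 0)
        if f_dt == 0:
            continue  # a term absent from the document contributes nothing
        f_t = doc_freq[term]
        idf = math.log((N - f_t + 0.5) / (f_t + 0.5))
        score += idf * ((k1 + 1) * f_dt) / (K + f_dt)
    return score


# Made-up statistics for illustration only (NOT measured on latimes).
N, AL, L_d = 100_000, 344, 256
doc_freq = {"ancient": 900, "city": 20_000, "ruins": 300}
doc_tf = {"ancient": 2, "city": 2, "ruins": 2}

# Three successive queries, each adding one term.
queries = [["city"], ["city", "ancient"], ["city", "ancient", "ruins"]]
scores = [bm25_query_score(q, doc_tf, doc_freq, L_d, AL, N) for q in queries]
assert scores[0] < scores[1] < scores[2]

# With equal in-document counts, the rarer term contributes more.
rare = bm25_query_score(["ruins"], doc_tf, doc_freq, L_d, AL, N)
common = bm25_query_score(["city"], doc_tf, doc_freq, L_d, AL, N)
assert rare > common
print(scores, rare, common)
```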
2 Example topics and output
The following topics were run using the Indri search engine and are provided as a guide only.
Please note that your results will almost definitely not be the same as those presented in
the following examples. Differences such as stemming, normalization rules, stopping, and so
on will cause different document lengths, term frequencies and document frequencies, all of
which will impact the BM25 ranking. A ranking that differs from the one presented is not
necessarily incorrect. With that being said, at least some overlap between the presented
documents and the documents that you are retrieving could be expected.
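One way to quantify that overlap is to compare the top-k documents of your run with the top-k
of the sample run. The sketch below is an illustration under assumptions: it assumes each
result line holds a query id, a document identifier, a rank, and a score separated by
whitespace, the function names parse_run and overlap_at_k are hypothetical, and the "my_lines"
scores are invented placeholders for your own ranker's output.

```python
def parse_run(lines):
    """Map query id -> list of document ids in rank order."""
    rows = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # blank lines, page debris, etc.
        try:
            rank = int(parts[-2])
            float(parts[-1])
        except ValueError:
            continue  # not a result row (for example a topic heading)
        # Join the middle tokens so stray whitespace inside the id is tolerated.
        doc_id = "".join(parts[1:-2])
        rows.setdefault(parts[0], []).append((rank, doc_id))
    return {qid: [doc for _, doc in sorted(r)] for qid, r in rows.items()}


def overlap_at_k(mine, reference, k=10):
    """Fraction of the top-k documents that the two rankings share."""
    return len(set(mine[:k]) & set(reference[:k])) / k


reference_lines = [
    "1001 LA082690-0120 1 18.2249",
    "1001 LA070190-0077 2 17.7167",
    "1001 LA101589-0071 3 17.6664",
]
my_lines = [  # hypothetical output from your own ranker; scores are invented
    "1001 LA070190-0077 1 21.3",
    "1001 LA082690-0120 2 20.9",
    "1001 LA010790-0118 3 19.4",
]

reference = parse_run(reference_lines)
mine = parse_run(my_lines)
overlap = overlap_at_k(mine["1001"], reference["1001"], k=3)
print(overlap)  # two of the three documents are shared
```

An overlap of zero across several topics is a stronger warning sign than a low overlap on a
single topic, since indexing choices alone can reorder individual rankings considerably.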
1001: ancient city ruins
1001 LA082690-0120 1 18.2249
1001 LA070190-0077 2 17.7167
1001 LA101589-0071 3 17.6664
1001 LA022290-0129 4 16.4931
1001 LA031890-0033 5 16.3479
1001 LA052090-0200 6 15.9897
1001 LA070989-0114 7 15.6409
1001 LA060390-0138 8 15.2404
1001 LA111990-0035 9 15.1859
1001 LA050690-0087 10 15.0315
1001 LA100889-0143 11 14.5277
1001 LA092489-0137 12 14.501
1001 LA050789-0137 13 14.4906
1001 LA010790-0118 14 14.1099
1001 LA010790-0113 15 13.968
1001 LA042990-0029 16 13.3977
1001 LA110789-0024 17 13.3195
1001 LA043090-0023 18 13.2658
1001 LA121690-0056 19 12.8632
1001 LA062390-0052 20 12.6796
1001 LA050789-0127 21 12.4564
1001 LA100790-0073 22 12.213
1001 LA102890-0022 23 12.1304
1001 LA080389-0001 24 11.9818
1001 LA120990-0064 25 11.6796
1001 LA091290-0114 26 11.5391
1001 LA090990-0179 27 11.3245
1001 LA081690-0220 28 11.2892
1001 LA120890-0057 29 11.0876
1001 LA073090-0036 30 11.0488
1001 LA053190-0004 31 10.6839
1001 LA080589-0099 32 10.5028
1001 LA071190-0066 33 10.4416
1001 LA123190-0068 34 10.3244
1001 LA010489-0017 35 10.3125
1001 LA030490-0236 36 10.1415
1001 LA021989-0012 37 10.1207
1001 LA012890-0103 38 10.1084
1001 LA020490-0007 39 10.0123
1001 LA020289-0128 40 9.82722
1001 LA010189-0082 41 9.61907
1001 LA050189-0094 42 9.58602
1001 LA070190-0034 43 9.49652
1001 LA102989-0141 44 9.37461
1001 LA070289-0103 45 9.3584
1001 LA071890-0088 46 9.34189
1001 LA041790-0102 47 9.31119
1001 LA021890-0139 48 9.29745
1001 LA050690-0208 49 9.28234
1001 LA012190-0116 50 9.13307
1001 LA012289-0170 51 9.11053
1001 LA071089-0082 52 9.10524
1001 LA021789-0021 53 9.0905
1001 LA012790-0047 54 9.06578
1001 LA070289-0101 55 9.03123
1001 LA061389-0090 56 8.99522
1001 LA082690-0081 57 8.97519
1001 LA102689-0014 58 8.92272
1001 LA041990-0064 59 8.84785
1001 LA071190-0156 60 8.83788
1001 LA073090-0112 61 8.76649
1001 LA071590-0036 62 8.76234
1001 LA050490-0073 63 8.71695
1001 LA072189-0037 64 8.68262
1001 LA102890-0014 65 8.6675
1001 LA090389-0057 66 8.65396
1001 LA061290-0162 67 8.59106
1001 LA040890-0159 68 8.48068
1001 LA052689-0126 69 8.4554
1001 LA121089-0075 70 8.45049
1001 LA082690-0054 71 8.44387
1001 LA080789-0021 72 8.44296
1001 LA073190-0125 73 8.40903
1001 LA020890-0095 74 8.38305
1001 LA100990-0043 75 8.37012
1001 LA080489-0151 76 8.34337
1001 LA121289-0028 77 8.29423
1001 LA071590-0125 78 8.28264
1001 LA110690-0073 79 8.27817
1001 LA111489-0024 80 8.21671
1001 LA091890-0089 81 8.21598
1001 LA120290-0037 82 8.19076
1001 LA030589-0043 83 8.18282
1001 LA011590-0081 84 8.17532
1001 LA050490-0102 85 8.15058
1001 LA033090-0182 86 8.14557
1001 LA101090-0120 87 8.13969
1001 LA082889-0010 88 8.11667
1001 LA071889-0071 89 8.10088
1001 LA123189-0182 90 8.10081
1001 LA040789-0169 91 8.09826
1001 LA121789-0054 92 8.09101
1001 LA100790-0212 93 8.08555
1001 LA102190-0082 94 8.08528
1001 LA070790-0087 95 8.07389
1001 LA112090-0021 96 8.06253
1001 LA121690-0101 97 8.06019
1001 LA102290-0121 98 8.05119
1001 LA051989-0117 99 8.04553
1001 LA050690-0089 100 8.03952
1002: north korean army
1002 LA081690-0055 1 19.1494
1002 LA052990-0086 2 18.5347
1002 LA070989-0058 3 18.0936
1002 LA113089-0010 4 18.053
1002 LA073089-0072 5 17.6144
1002 LA061589-0084 6 17.5994
1002 LA041390-0080 7 17.5011
1002 LA102089-0006 8 17.4334
1002 LA060490-0047 9 17.3157
1002 LA092189-0136 10 17.2384
1002 LA081590-0089 11 16.911
1002 LA060690-0025 12 16.818
1002 LA100989-0044 13 16.6925
1002 LA072790-0010 14 16.0332
1002 LA100190-0040 15 15.7201
1002 LA070290-0041 16 15.4658
1002 LA101690-0039 17 15.0821
1002 LA061089-0093 18 14.7332
1002 LA090690-0236 19 14.7007
1002 LA080389-0058 20 14.6485
1002 LA021490-0046 21 14.6434
1002 LA021890-0105 22 14.5923
1002 LA100990-0173 23 14.577
1002 LA022789-0049 24 14.5138
1002 LA021690-0165 25 14.459
1002 LA090590-0142 26 14.4577
1002 LA070189-0068 27 14.414
1002 LA071490-0086 28 14.413
1002 LA080289-0075 29 14.4113
1002 LA081790-0066 30 14.3329
1002 LA011889-0063 31 14.2825
1002 LA032989-0070 32 14.2621
1002 LA072490-0043 33 14.2342
1002 LA032090-0178 34 14.2277
1002 LA050389-0082 35 14.2134
1002 LA102789-0127 36 14.1475
1002 LA100790-0074 37 14.1113
1002 LA041290-0221 38 14.0802
1002 LA032889-0108 39 14.0611
1002 LA022589-0073 40 14.0232
1002 LA071089-0054 41 14.0157
1002 LA062289-0145 42 13.9955
1002 LA041490-0074 43 13.9848
1002 LA092189-0135 44 13.926
1002 LA112889-0099 45 13.9134
1002 LA102290-0110 46 13.8814
1002 LA032989-0140 47 13.8734
1002 LA022289-0077 48 13.8528
1002 LA030789-0134 49 13.8422
1002 LA091989-0052 50 13.8368
1002 LA101690-0167 51 13.8282
1002 LA010989-0099 52 13.8263
1002 LA091190-0088 53 13.8253
1002 LA060190-0165 54 13.8146
1002 LA030389-0004 55 13.8029
1002 LA112289-0149 56 13.6526
1002 LA020589-0198 57 13.6393
1002 LA101590-0020 58 13.6315
1002 LA060189-0113 59 13.6037
1002 LA090990-0185 60 13.5386
1002 LA060690-0083 61 13.4654
1002 LA072889-0091 62 13.4098
1002 LA053190-0088 63 13.3736
1002 LA101889-0164 64 13.3707
1002 LA071289-0073 65 13.3322
1002 LA011490-0083 66 13.3056
1002 LA090889-0166 67 13.2568
1002 LA031190-0113 68 13.2478
1002 LA102789-0078 69 13.2387
1002 LA070790-0064 70 13.1881
1002 LA022490-0053 71 13.181
1002 LA061789-0060 72 13.0308
1002 LA101789-0151 73 12.9063
1002 LA121290-0145 74 12.8659
1002 LA050989-0012 75 12.8461
1002 LA062490-0040 76 12.8152
1002 LA010890-0039 77 12.7234
1002 LA053190-0097 78 12.6027
1002 LA113089-0008 79 12.539
1002 LA072690-0105 80 12.5135
1002 LA090390-0022 81 12.5101
1002 LA042089-0113 82 12.5101
1002 LA021489-0066 83 12.4705
1002 LA051990-0071 84 12.4653
1002 LA010390-0136 85 12.4061
1002 LA072490-0113 86 12.3331
1002 LA082390-0255 87 12.3279
1002 LA020389-0025 88 12.3093
1002 LA062090-0141 89 12.209
1002 LA080589-0119 90 12.0649
1002 LA062989-0106 91 12.0412
1002 LA020590-0041 92 12.0158
1002 LA071690-0055 93 12.0142
1002 LA012989-0121 94 11.9852
1002 LA070389-0053 95 11.9649
1002 LA072889-0050 96 11.9473
1002 LA052590-0174 97 11.9151
1002 LA082689-0053 98 11.8467
1002 LA081589-0030 99 11.7189
1002 LA022090-0112 100 11.7173
1003: stock market inflation
1003 LA091890-0170 1 18.4022
1003 LA031190-0184 2 18.1166
1003 LA090990-0196 3 18.0174
1003 LA072490-0125 4 17.9245
1003 LA050989-0162 5 17.8938
1003 LA112189-0159 6 17.8852
1003 LA051289-0181 7 17.7477
1003 LA011690-0127 8 17.6092
1003 LA101489-0111 9 17.5734
1003 LA030790-0158 10 17.5611
1003 LA051389-0075 11 17.5294
1003 LA031189-0085 12 17.4619
1003 LA042890-0108 13 17.3584
1003 LA031589-0115 14 17.2168
1003 LA082490-0148 15 17.1143
1003 LA051290-0069 16 17.0726
1003 LA080790-0165 17 16.9372
1003 LA082490-0057 18 16.7524
1003 LA032089-0108 19 16.6958
1003 LA041289-0145 20 16.6925
1003 LA120389-0224 21 16.6684
1003 LA052090-0030 22 16.662
1003 LA021189-0068 23 16.6549
1003 LA051690-0154 24 16.6147
1003 LA102289-0218 25 16.5102
1003 LA082090-0077 26 16.4341
1003 LA122689-0024 27 16.3631
1003 LA042389-0140 28 16.3407
1003 LA032389-0042 29 16.2976
1003 LA070390-0114 30 16.1985
1003 LA043089-0190 31 16.1222
1003 LA091990-0156 32 16.1195
1003 LA091989-0149 33 16.1097
1003 LA070789-0002 34 16.0847
1003 LA122689-0130 35 16.0476
1003 LA052189-0060 36 16.0238
1003 LA122390-0142 37 15.9884
1003 LA050290-0103 38 15.9348
1003 LA051090-0241 39 15.9295
1003 LA022790-0076 40 15.9183
1003 LA102089-0033 41 15.8606
1003 LA091989-0115 42 15.8533
1003 LA081790-0147 43 15.8475
1003 LA012590-0169 44 15.822
1003 LA072889-0171 45 15.8096
1003 LA051490-0104 46 15.7983
1003 LA020590-0104 47 15.7952
1003 LA052489-0097 48 15.7633
1003 LA092490-0113 49 15.7531
1003 LA010489-0065 50 15.7523
1003 LA042490-0030 51 15.7515
1003 LA012589-0064 52 15.7403
1003 LA120790-0144 53 15.7395
1003 LA090290-0188 54 15.712
1003 LA042389-0141 55 15.6881
1003 LA012789-0175 56 15.6672
1003 LA082790-0064 57 15.6512
1003 LA083090-0253 58 15.6174
1003 LA032090-0111 59 15.5466
1003 LA082689-0080 60 15.486
1003 LA080790-0098 61 15.4822
1003 LA101789-0152 62 15.4454
1003 LA100990-0165 63 15.3944
1003 LA090589-0051 64 15.3141
1003 LA022589-0034 65 15.2955
1003 LA063089-0170 66 15.1439
1003 LA022689-0064 67 15.1222
1003 LA013090-0156 68 15.1002
1003 LA021489-0161 69 15.0902
1003 LA110690-0072 70 14.9719
1003 LA042090-0041 71 14.9576
1003 LA032889-0044 72 14.9477
1003 LA022390-0066 73 14.9206
1003 LA062090-0149 74 14.9032
1003 LA080790-0097 75 14.8755
1003 LA051189-0173 76 14.8456
1003 LA010290-0045 77 14.8444
1003 LA123089-0087 78 14.786
1003 LA090889-0070 79 14.7116
1003 LA032790-0073 80 14.7093
1003 LA020189-0095 81 14.6673
1003 LA030389-0094 82 14.6662
1003 LA102589-0127 83 14.6612
1003 LA080490-0097 84 14.6589
1003 LA111090-0141 85 14.6584
1003 LA061189-0021 86 14.6374
1003 LA082190-0054 87 14.6314
1003 LA072590-0139 88 14.6172
1003 LA022889-0033 89 14.5907
1003 LA021589-0045 90 14.5689
1003 LA072990-0011 91 14.5517
1003 LA071990-0071 92 14.528
1003 LA081890-0112 93 14.5262
1003 LA102689-0168 94 14.5211
1003 LA031789-0136 95 14.5134
1003 LA022490-0132 96 14.4908
1003 LA011989-0170 97 14.4773
1003 LA083089-0107 98 14.4645
1003 LA052389-0070 99 14.4102
1003 LA032790-0137 100 14.3988