html/adminmanual-5.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
   2 <HTML>
   3 <HEAD>
   4  <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
   5  <TITLE>The DXSpider Installation and Administration Manual : Filtering</TITLE>
   6  <LINK HREF="adminmanual-6.html" REL=next>
   7  <LINK HREF="adminmanual-4.html" REL=previous>
   8  <LINK HREF="adminmanual.html#toc5" REL=contents>
   9 </HEAD>
  10 <BODY>
  11 <A HREF="adminmanual-6.html">Next</A>
  12 <A HREF="adminmanual-4.html">Previous</A>
  13 <A HREF="adminmanual.html#toc5">Contents</A>
  14 <HR>
  15 <H2><A NAME="s5">5. Filtering</A></H2>
  16
  17 <P>Filters can be set for spots, announcements and WWV.  You will find the directories for these under /spider/filter.  You will find some examples in the directories with the suffix <EM>.issue</EM>.  There are two types of filter, one for incoming information and one for outgoing information. Outgoing filters are in the form <EM>CALLSIGN.pl</EM> and incoming filters are in the form <EM>in_CALLSIGN.pl</EM>.  Filters can be set for both nodes and users.
  18 <P>
  19 <P>All filters work in basically the same way.  There are several elements delimited by commas.
  20 There can be many lines in the filter and they are read from the top by the program.
  21 When writing a filter you need to think carefully about just what you want to achieve.  You
  22 are either going to write a filter to <EM>accept</EM> or to <EM>reject</EM>.
  23 Think of a filter as having 2 main elements.  For a reject filter, you would have a line
  24 or multiple lines rejecting the things you do not wish to receive and then a default
  25 line accepting everything else that is not included in the filter.  Likewise, for an
  26 accept filter, you would have a line or multiple lines accepting the things you wish
  27 to receive and a default line rejecting everything else.
  28 <P>
  29 <P>In the example below, a user requires a filter that would only return SSB spots
  30 posted in Europe on the HF bands.  This is achieved by first rejecting the CW section
  31 of each HF band and rejecting all of VHF, UHF etc based on frequency.
  32 Secondly, a filter rule is set based on CQ zones to only accept spots posted in
  33 Europe.  Lastly, a default filter rule is set to reject anything outside the filter.
  34 <P>
  35 <BLOCKQUOTE><CODE>
  36 <PRE>
  37 $in = [
  38         [ 0, 0, 'r', # reject all CW spots
  39                 [
  40                 1800.0, 1850.0,
  41                 3500.0, 3600.0,
  42                 7000.0, 7040.0,
  43                 14000.0, 14100.0,
  44                 18068.0, 18110.0,
  45                 21000.0, 21150.0,
  46                 24890.0, 24930.0,
  47                 28000.0, 28180.0,
  48                 30000.0, 49000000000.0,
  49                 ] ,1 ],
  50         [ 1, 11, 'n', [ 14, 15, 16, 20, 33, ], 15 ], #accept EU
  51         [ 0, 0, 'd', 0, 1 ], # 0 = drop, 'd' = everything else
  52 ];
  53 </PRE>
  54 </CODE></BLOCKQUOTE>
  55 <P>
  56 <P>The actual elements of each filter are described more fully in the following sections.
  57 <P>
  58 <H2><A NAME="ss5.1">5.1 Spots</A>
  59 </H2>
  60
  61 <P>The elements of the Spot filter are ....
  62 <P>
  63 <BLOCKQUOTE><CODE>
  64 <PRE>
  65 [action, field_no, sort, possible_values, hops]
  66 </PRE>
  67 </CODE></BLOCKQUOTE>
  68 <P>
  69 <P>There are 5 elements here to look at.  Firstly, the action element.  This is very simple and only 2 possible states exist, accept (1) or drop (0).
  70 <P>
  71 <P>The second element is the field_no.  There are 13 possibilities to choose from here ....
  72 <P>
  73 <BLOCKQUOTE><CODE>
  74 <PRE>
  75       0 = frequency
  76       1 = call
  77       2 = date in unix format
  78       3 = comment
  79       4 = spotter
  80       5 = spotted dxcc country
  81       6 = spotter's dxcc country
  82       7 = origin
  83       8 = spotted itu
  84       9 = spotted cq
  85       10 = spotter's itu
  86       11 = spotter's cq
  87       12 = callsign of the channel on which the spot has appeared
  88 </PRE>
  89 </CODE></BLOCKQUOTE>
  90 <P>
  91 <P>The third element tells us what to expect in the fourth element.  There are 4 possibilities ....
  92 <P>
  93 <BLOCKQUOTE><CODE>
  94 <PRE>
  95      n - numeric list of numbers e.g. [ 1,2,3 ]
  96      r - ranges of pairs of numbers e.g. between 2 and 4 or 10 to 17 - [ 2,4, 10,17 ]
  97      a - an alphanumeric regex
  98      d - the default rule
  99 </PRE>
 100 </CODE></BLOCKQUOTE>
 101 <P>
 102 <P>The fifth element is simply the hops to set in this filter.  This would only be used if the filter was for a node of course and overrides the hop count in hop_table.pl.
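<P>As a rough sketch of how the five elements fit together, the lines below use each of the sort letters once.  The zones, frequencies, prefix and hop values are illustrative assumptions and are not taken from a working filter.
<P>
<BLOCKQUOTE><CODE>
<PRE>
[ 0, 9, 'n', [ 14, 15 ], 1 ],            # drop spots of stations in CQ zones 14 and 15
[ 1, 0, 'r', [ 50000.0, 52000.0 ], 5 ],  # accept spots between 50000 and 52000 kHz, hops set to 5
[ 1, 1, 'a', '^GB', 3 ],                 # accept spotted calls beginning with GB, hops set to 3
[ 0, 0, 'd', 0, 1 ],                     # default rule: drop everything else
</PRE>
</CODE></BLOCKQUOTE>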
 103 <P>
 104 <P>So, let's look at an example spot filter.  It does not matter in the example who the filter is to be used for.
 105 So, what do we need in the filter?  We need to filter the spots the user/node requires and also set a default rule for anything else outside the filter.  Below is a simple filter that stops spots posted by stations in the USA, Canada and Japan from arriving.
 106 <P>
 107 <BLOCKQUOTE><CODE>
 108 <PRE>
 109 $in = [
 110   [ 0, 4, 'a', '^(K|N|A|W|VE|VA|J)'],  # 0 = drop, 'a' = alphanumeric
 111   [ 1, 0, 'd', 0, 1 ],                 # 1 = want, 'd' = everything else
 112 ];
 113 </PRE>
 114 </CODE></BLOCKQUOTE>
 115 <P>
 116 <P>So the filter is wrapped in between a pair of square brackets.  This tells Spider to look in between these limits.  Then each line is contained within its own square brackets and ends with a comma.
 117 Let's look carefully at the first line.  The first element is 0 (drop).  Therefore anything matched by this line will not be accepted.  The next element is 4.  This means we are filtering by the spotter.  The third element is the letter "a" which tells the program to expect an alphanumeric regex in the fourth element.  The fourth element is a list of callsign prefixes separated by the pipe symbol and anchored to the start of the callsign by the ^ character.
 118 <P>
 119 <P>What this line does is tell the program to drop any spots posted by anyone in the USA, Canada or Japan.
 120 <P>
 121 <P>The second line is the default rule for anything else.  The "d" tells us this and the line simply reads... accept anything else.
 122 <P>
 123 <P>You can add as many lines as you need to complete the filter but if there are several lines of the same type it is neater to enclose them all as one line.  An example of this is where specific bands are set.  We could write this like this ....
 124 <P>
 125 <BLOCKQUOTE><CODE>
 126 <PRE>
 127 [ 0,0,'r',[1800.0, 2000.0], 1],
 128 [ 0,0,'r',[10100.0, 10150.0], 1],
 129 [ 0,0,'r',[14000.0, 14350.0], 1],
 130 [ 0,0,'r',[18000.0, 18200.0], 1],
 131 </PRE>
 132 </CODE></BLOCKQUOTE>
 133 <P>
 134 <P>But the line below achieves the same thing and is more efficient ....
 135 <P>
 136 <BLOCKQUOTE><CODE>
 137 <PRE>
 138   [ 0, 0, 'r',
 139     [
 140       1800.0, 2000.0,         # top band
 141       10100.0, 10150.0,       # WARC
 142       14000.0, 14350.0,       # 20m
 143       18000.0, 18200.0,       # WARC
 144     ] ,1 ],
 145 </PRE>
 146 </CODE></BLOCKQUOTE>
 147 <P>
 148 <P>
 149 <H2><A NAME="ss5.2">5.2 Announcements</A>
 150 </H2>
 151
 152 <P>
 153 <BLOCKQUOTE><CODE>
 154 <PRE>
 155
 156 # This is an example announce filter allowing only West EU announces
 157 #
 158 # The element list is:-
 159 # 0 - callsign of announcer
 160 # 1 - destination * = all, &lt;callsign> = routed to the node
 161 # 2 - text
 162 # 3 - * - sysop, &lt;some text> - special list eg 6MUK, ' ', normal announce
 163 # 4 - origin
 164 # 5 - 0 - announce, 1 - wx
 165 # 6 - channel callsign (the interface from which this announce came)
 166
 167 $in = [
 168         [ 1, 0, 'a', '^(P[ABCDE]|DK0WCY|G|M|2|EI|F|ON)' ],
 169         [ 0, 0, 'd', 0 ]
 170 ];
 171 </PRE>
 172 </CODE></BLOCKQUOTE>
 173 <P>In this example, only the prefixes listed will be allowed.  It is possible to be quite specific.  The Dutch prefix "P" is followed by several secondary identifiers which are allowed.  So, in the example, "PA" or "PE" would be ok but not "PG".  It is even possible to allow information from a single callsign.  In the example this is DK0WCY, to allow the posting of his Aurora Beacon.
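<P>A reject filter for announcements can be sketched the other way round, in the same manner as the spot filter in section 5.1.  The callsigns below are made up for illustration and the layout is assumed to follow the accept example above.
<P>
<BLOCKQUOTE><CODE>
<PRE>
$in = [
        [ 0, 0, 'a', '^(G0XYZ|M0XYZ)' ],   # drop announces from these hypothetical callsigns
        [ 1, 0, 'd', 0 ]                   # default rule: accept everything else
];
</PRE>
</CODE></BLOCKQUOTE>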
 174 <P>
 175 <H2><A NAME="ss5.3">5.3 WWV</A>
 176 </H2>
 177
 178 <P>
 179 <BLOCKQUOTE><CODE>
 180 <PRE>
 181
 182 # This is an example WWV filter
 183 #
 184 # The element list is:-
 185 # 0 - nominal unix date of spot (ie the day + hour:13)
 186 # 1 - the hour
 187 # 2 - SFI
 188 # 3 - K
 189 # 4 - I
 190 # 5 - text
 191 # 6 - spotter
 192 # 7 - origin
 193 # 8 - incoming interface callsign
 194
 195 # this one doesn't filter, it just sets the hop count to 6 and is
 196 # used mainly just to override any isolation from WWV coming from
 197 # the internet.
 198
 199 $in = [
 200         [ 1, 0, 'd', 0, 6 ]
 201 ];
 202 </PRE>
 203 </CODE></BLOCKQUOTE>
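<P>If some real filtering of WWV is wanted, a sketch along the following lines may serve as a starting point.  It assumes that the "a" sort and the default rule behave for WWV as they do for spots and announcements, and the GB7 prefix is only an illustration.
<P>
<BLOCKQUOTE><CODE>
<PRE>
$in = [
        [ 1, 7, 'a', '^GB7' ],   # accept WWV whose origin (field 7) begins with GB7
        [ 0, 0, 'd', 0 ]         # default rule: drop everything else
];
</PRE>
</CODE></BLOCKQUOTE>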
 204 <P>
 205 <P>It should be noted that the filter will start to be used only once a user/node has logged out and back in again.
 206 <P>I am not going to spend any more time on these filters now as they will become more "comprehensive" in the near future.
 207 <P>
 208 <H2><A NAME="ss5.4">5.4 Filtering Mail</A>
 209 </H2>
 210
 211 <P>In the /spider/msg directory you will find a file called badmsg.pl.issue.  Rename this to badmsg.pl and edit the file.  The original looks something like this ....
 212 <P>
 213 <BLOCKQUOTE><CODE>
 214 <PRE>
 215
 216 # the list of regexes for messages that we won't store having
 217 # received them (bear in mind that we must receive them fully before
 218 # we can bin them)
 219
 220
 221 # The format of each line is as follows
 222
 223 #     type      source             pattern
 224 #     P/B/F     T/F/O/S            regex
 225
 226 # type: P - private, B - bulletin (msg), F - file (ak1a bull)
 227 # source: T - to field, F - from field,  O - origin, S - subject
 228 # pattern: a perl regex on the field requested
 229
 230 # Currently only type B and P msgs are affected by this code.
 231 #
 232 # The list is read from the top down, the first pattern that matches
 233 # causes the action to be taken.
 234
 235 # The pattern can be undef or 0 in which case it will always be selected
 236 # for the action specified
 237
 238
 239
 240 package DXMsg;
 241
 242 @badmsg = (
 243 'B',    'T',    'SALE',
 244 'B',    'T',    'WANTED',
 245 'B',    'S',    'WANTED',
 246 'B',    'S',    'SALE',
 247 'B',    'S',    'WTB',
 248 'B',    'S',    'WTS',
 249 'B',    'T',    'FS',
 250 );
 251 </PRE>
 252 </CODE></BLOCKQUOTE>
 253 <P>
 254 <P>I think this is fairly self-explanatory.  It is simply a list of patterns, matched against the to field or the subject of a bulletin, for messages that we do not want to pass on to either the users of the cluster or the other cluster nodes that we are linked to.  This is usually because of rules and regulations pertaining to items for sale etc in a particular country.
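<P>Further entries can be added in the same three-column layout.  The sketch below is an assumption based on the comments at the top of the file.  The callsigns are made up and the extra patterns are hypothetical.
<P>
<BLOCKQUOTE><CODE>
<PRE>
@badmsg = (
'B',    'S',    'SALE',
'B',    'S',    'SWAP',            # hypothetical extra subject pattern
'P',    'F',    '^G0XYZ$',         # hypothetical: private mail from one made-up callsign
'B',    'O',    '^GB7XYZ$',        # hypothetical: bulletins originating at one made-up node
);
</PRE>
</CODE></BLOCKQUOTE>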
 255 <P>
 256 <H2><A NAME="ss5.5">5.5 Filtering DX callouts</A>
 257 </H2>
 258
 259 <P>In the same way as mail, there are some types of spot we do not wish to pass on to users or linked cluster nodes.  In the /spider/data directory you will find a file called baddx.pl.issue.  Rename this to baddx.pl and edit the file.  The original looks like this ....
 260 <P>
 261 <BLOCKQUOTE><CODE>
 262 <PRE>
 263
 264 # the list of dx spot addresses that we don't store and don't pass on
 265
 266
 267 package DXProt;
 268
 269 @baddx = qw(
 270
 271  FROG
 272  SALE
 273  FORSALE
 274  WANTED
 275  P1RATE
 276  PIRATE
 277  TEST
 278  DXTEST
 279  NIL
 280  NOCALL
 281 );
 282 </PRE>
 283 </CODE></BLOCKQUOTE>
 284 <P>
 285 <P>Again, this is simply a list of names we do not want to see in the spotted field of a DX callout.
 286 <P>
 287 <P>
 288 <HR>
 289 <A HREF="adminmanual-6.html">Next</A>
 290 <A HREF="adminmanual-4.html">Previous</A>
 291 <A HREF="adminmanual.html#toc5">Contents</A>
 292 </BODY>
 293 </HTML>