=head1 NAME

DXSpiderWeb Orthogonal Communications Protocol

=head1 SYNOPSIS

 <Origin>,<TimeSeq>,<Hop>,<FrmUser>,<To>,<ToUser>|<Tag>,<Data>...

=head1 ABSTRACT

For many years DX Clusters have used a protocol which was designed
for a non-looped tree of nodes. This has probably never been reliably
achieved in practice; certainly not recently. This document
describes a complete replacement for that protocol. It allows a
fully looped network, is inherently extensible and should be simple
to implement (especially in Perl).

All implementations of this protocol shall B<only> use this protocol
for inter-node communications.

=head1 DESCRIPTION

This protocol is encoded in UTF-8 with HTTP style escaping. It is
designed to be an extensible basis for any type of one to many
"instant" line-based communications task.

This protocol is designed to be flood routed in a meshed network in
as efficient a manner as possible.

Each message consists of a L</Routing Section> and a L</Command Section>.
The two sections are separated with the '|' character and the whole
message is terminated in the standard RFC/Internet manner with the
ASCII <carriage return><linefeed> characters. It follows that these
characters (as well as a small number of other reserved characters)
can only be sent escaped. This is described further in the
L</Command Section>.
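
As a minimal sketch of the framing described above (an illustration, not
part of the specification), the following Perl splits one received line
into its two sections. The name C<split_message> is hypothetical, and
treating a line without a '|' as invalid is an assumption.

 use strict;
 use warnings;

 # Split one raw line into (routing, command) sections.
 # Returns an empty list if there is no '|' separator (an assumption).
 sub split_message {
     my ($line) = @_;
     $line =~ s/\r?\n\z//;    # strip the CRLF (or bare LF) terminator
     # A literal '|' inside the command section is always escaped,
     # so the first '|' is the section separator.
     my ($routing, $command) = split /\|/, $line, 2;
     return unless defined $command;
     return ($routing, $command);
 }

 my ($routing, $command) = split_message("GB7TLH,3D02350001,0|HELLO\r\n");
 print "$routing / $command\n";

Splitting with a limit of 2 keeps the command section intact for later,
separate decoding of its fields.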

Most of this document is concerned with the L</Routing Section>; however,
some L</Standard Commands>, which all implementations should issue and
must accept, are described.

=head1 Routing Section

The application that implements this protocol is essentially a
line-oriented message router. One line equals one message. Each line is
effectively a datagram.

It is assumed that nodes are connected to
each other using a "reliable" streaming protocol such as TCP/IP or
AX.25. Having said that, messages in this protocol could also be
multicast or broadcast, either "as is" or wrapped in some other framing
protocol.

Because this is an unreliable, best effort, "please route my packets
through your node" protocol, there is no guarantee that a message
will get to the other side of a mesh of nodes. There may be a
discontinuity caused either by an outage or by deliberate filtering.

However, as it is envisaged that most messages will be flood routed or,
in the case of directed messages (those that have L</To> and/or
L</ToUser> fields), sent down all interfaces showing a route in that
direction, it is unlikely that messages will be lost in practice.

=head2 Field Description

Only the first three fields in the L</Routing Section> are compulsory.
On their own they indicate that this is a broadcast, coming from the
L</Origin>, to be sent to all nodes. If the message needs to be
identified as coming from a user on a node, then the L</FrmUser> field
is added.

Adding a L</To> and/or L</ToUser> field will restrict the destinations
or recipients that receive this message.

The L</Hop> field is incremented on receipt of a message on a node.

Fields are separated by the comma ',' character, with the last field
present followed by the vertical bar '|' character.

If trailing fields are omitted then their superfluous commas can also
be left out. If intervening fields are omitted then their separating
commas must remain, with nothing (not even a space) between them.

The characters allowed in the routing section are restricted. Any
invalid characters in any field will cause the whole message to be
silently dropped.

More detailed descriptions of the fields follow:

=over

=item Origin

This is a compulsory field. It is the name of the originating node.
The field can contain up to 12 characters in the set [-A-Z0-9_] in
any order. Higher layers may restrict this further.

The field must not be changed by any other node.

=item TimeSeq

This is a compulsory field. It is a 10 hexadecimal digit string which
consists of a date portion [6 hex digits], built from the day of the
month (1-31) and the seconds within that day (0-86399), concatenated
with a sequence number (0-65535) [4 hex digits], making the total of 10.

The date portion is constructed as:

  my $now  = time;    # sample the clock once so day and seconds agree
  my $date = ((gmtime($now))[3] << 18) | ($now % 86400);

The sequence number is simply an unsigned short (or 16 bit) number
starting at 0.

Each message originated at this node will increment the sequence
number.
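
Putting the two portions together might look like the sketch below. The
name C<next_timeseq> is hypothetical; the uppercase hex output follows
the examples in this document, and wrapping the sequence number back to
0 after 65535 is an assumption.

 use strict;
 use warnings;

 my $seq = 0;    # per-node sequence number, starts at 0

 # Build a 10 hex digit TimeSeq for the given epoch time (default: now).
 sub next_timeseq {
     my ($now) = @_;
     $now = time unless defined $now;
     my $date = ((gmtime($now))[3] << 18) | ($now % 86400);
     my $ts = sprintf "%06X%04X", $date, $seq;
     $seq = ($seq + 1) & 0xFFFF;    # assumed: wrap to 0 after 65535
     return $ts;
 }

 print next_timeseq(), "\n";

The largest date portion (day 31, second 86399) fits in 23 bits, so six
hex digits are always sufficient.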

=item Hop

This is a compulsory field. It is the number of hops from the
originating node. It is incremented immediately on receipt,
before its value is examined.

So the originating node sends a message with a L</Hop> of 0, and the
neighbouring nodes must increment this field before passing
it on to higher layers for onward processing.

Implementations may have an upper limit to this field and may
silently drop incoming messages with a L</Hop> count greater than the
limit.
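
A receiving node could apply these rules roughly as follows. This is
only a sketch: C<receive_hop> and the limit C<MAX_HOP> are hypothetical,
and a decimal encoding of the field is assumed from the examples in this
document.

 use strict;
 use warnings;

 use constant MAX_HOP => 30;    # hypothetical local limit, not from the spec

 # Increment the hop count on receipt. Returns the new count, or
 # undef if the message should be silently dropped.
 sub receive_hop {
     my ($hop) = @_;
     return undef unless defined $hop && $hop =~ /\A[0-9]+\z/;
     $hop++;                                # increment before examining it
     return $hop > MAX_HOP() ? undef : $hop;
 }

 my $hop = receive_hop(0);
 print defined $hop ? "hop is now $hop\n" : "dropped\n";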

=item FrmUser

This field is optional. It is the identifier of the originating
user. If it is missing then the message is
assumed to come from the originating node itself.

It can consist of up to 12 characters in the set [-A-Z0-9_]
in any order. Higher layers may restrict this further.

=item To

This field is optional. It is a string of up to 12 characters
in the set [-A-Z0-9_] in any order.

This field is used either to indicate a particular node destination
or to differentiate this broadcast in some way by marking this
message as a member of a L</Channel>. Any message can be sent
down any L</Channel>. The names of L</Channel>s and their usage
are entirely up to the implementor.

It is assumed that node names can be differentiated from user
names and L</Channel> names.

If the field is set to a particular node destination, it will
be routed (rather than broadcast) to that node. However, any
intervening nodes are free to duplicate the message and send
it down more than one likely-looking interface, depending on any
network policies that may pertain.

=item ToUser

This field is optional. It is a string of up to 12 characters
in the set [-A-Z0-9_] in any order. Higher layers may restrict
this further.

Conventionally this field is used to indicate the user to whom
this message is directed. In an ideal world the L</To> field
will be set, by the originating node, to the identifier of the node
on which this user resides.

If the L</To> field is not set then this message will be
broadcast. However, should a node become apparent (en route)
then nodes are free to fill in the L</To> field and proceed
with a more directed approach.

If it becomes apparent (en route) that there may be more than
one possible L</To> destination for a L</ToUser> then a node
may duplicate the message (keeping the same L</TimeSeq>) and
route it onwards. Because of the L</DeDuplication> inherent in
the system, it is indeterminate as to which destination will
receive the message. It is possible for all or just some
destinations to receive the message. The tuple (L</Origin>,
L</TimeSeq>) will determine uniqueness.

This field can, in the case where L</To>
is set to the name of a node, be set to a L</Channel>. If this
is the case then this will cause this message to be sent to
a L</Channel> on the L</To> node only.

=back
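
The field rules above can be collected into a small validating parser.
The following is a sketch under stated assumptions rather than a
reference implementation: C<parse_routing> is a hypothetical name,
L</TimeSeq> is assumed to be uppercase hex and L</Hop> to be decimal (as
in the examples in this document).

 use strict;
 use warnings;

 my $NAME = qr/[-A-Z0-9_]{1,12}/;    # character set and length from the text

 # Parse a routing section into a hash ref. Returns undef if any
 # field is invalid, in which case the message should be dropped.
 sub parse_routing {
     my ($routing) = @_;
     my ($origin, $timeseq, $hop, $frmuser, $to, $touser, @extra)
         = split /,/, $routing, -1;
     return undef if @extra;              # too many fields
     return undef unless defined $hop;    # first three are compulsory
     return undef unless $origin  =~ /\A$NAME\z/;
     return undef unless $timeseq =~ /\A[0-9A-F]{10}\z/;    # assumed uppercase
     return undef unless $hop     =~ /\A[0-9]+\z/;          # assumed decimal
     # Omitted optional fields (absent or empty) become undef.
     my @opt = map { (defined $_ && length $_) ? $_ : undef }
               $frmuser, $to, $touser;
     for my $field (grep { defined } @opt) {
         return undef unless $field =~ /\A$NAME\z/;
     }
     return {
         origin  => $origin, timeseq => $timeseq, hop    => $hop,
         frmuser => $opt[0], to      => $opt[1],  touser => $opt[2],
     };
 }

 my $msg = parse_routing('GB7TLH,1512346543,0,,,G7BRN');
 print "to user: $msg->{touser}\n" if $msg;

Passing -1 as the limit to C<split> preserves empty fields, so the
position of each field is kept even when intervening ones are omitted.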

=head2 Channel

Channels are a concept very similar to that on IRC. They are a
way of segregating data flows in a network. In principle, subject
to local policy or application requirements, any data (or
L</Command Section>) can be sent down any channel.

It is up to the implementation whether to use this feature or not.

=head2 Routing

It is assumed that nodes will be connected in a looped network with
more than one route available (in many cases) to another node.

In any case, most traffic is not directed, but broadcast to all users
on all nodes.

Each message is uniquely identified by the (L</Origin>,L</TimeSeq>)
tuple. The basic system will learn which interfaces can see which nodes
by looking at the tuple and merging that with the L</Hop> count.
Each interface remembers the latest L</TimeSeq> with the lowest L</Hop>
for each L</Origin> that arrives on that interface. It also remembers
the number of messages for that L</Origin> that have been received on
that interface.

Any message for onward broadcast is duplicated and sent out on all
interfaces that it did not come in on.

Any message that is directed to a particular node will be sent out on
the "best" interface based on routing information gathered so far. If there
is more than one possible route then, depending on network or local
policy, the message may be duplicated and sent on other interfaces
as well.
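
One possible shape for this bookkeeping is sketched below. All names
(C<learn_route>, C<flood_targets>, C<best_interface>) are hypothetical.
The text does not define "best", so choosing the interface with the
lowest L</Hop> seen is an assumption, as is keeping the newest
L</TimeSeq> and the lowest L</Hop> as separate values.

 use strict;
 use warnings;

 my %seen;    # $seen{$iface}{$origin} = { timeseq, hop, count }

 # Record what an interface has heard from an origin.
 sub learn_route {
     my ($iface, $origin, $timeseq, $hop) = @_;
     my $e = $seen{$iface}{$origin}
         ||= { timeseq => $timeseq, hop => $hop, count => 0 };
     $e->{count}++;
     $e->{timeseq} = $timeseq;                # assumed: newest message wins
     $e->{hop}     = $hop if $hop < $e->{hop};
     return $e;
 }

 # Interfaces for a broadcast: all except the one it arrived on.
 sub flood_targets {
     my ($in_iface, @ifaces) = @_;
     return grep { $_ ne $in_iface } @ifaces;
 }

 # "Best" interface for a node: here, the lowest hop count seen.
 sub best_interface {
     my ($node) = @_;
     my @cand = grep { exists $seen{$_}{$node} } keys %seen;
     my ($best) = sort {
         $seen{$a}{$node}{hop} <=> $seen{$b}{$node}{hop} or $a cmp $b
     } @cand;
     return $best;    # undef if no route is known
 }

 learn_route('if1', 'GB7DJK', '1512450534', 3);
 learn_route('if2', 'GB7DJK', '1512450534', 5);
 print "best route to GB7DJK: ", best_interface('GB7DJK'), "\n";

A node with no known route to a destination would have to fall back on
some local policy, such as flooding; that choice is outside this sketch.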

=head2 DeDuplication

On receipt of a message, its unique tuple (L</Origin>,L</TimeSeq>) is
checked against a hash table. If it exists, the message is silently
dropped. If it does not exist in the hash table then the tuple is
added.

The hash table is periodically cleaned, removing tuples that
have expired. The length of time a tuple remains in the hash table
is implementation dependent but could easily be several days, if
required.

This mechanism only ensures that a message broadcast around the network
travels the least distance and through the fewest nodes possible. It
is up to higher layers to make sure that the data carried is not, itself,
duplicated!
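
The mechanism above can be sketched in a few lines of Perl. The names
C<is_new_message> and C<expire_tuples> are hypothetical, and the three
day lifetime is an arbitrary value chosen for illustration.

 use strict;
 use warnings;

 my %dedup;                   # "origin,timeseq" => time first seen
 my $LIFETIME = 3 * 86400;    # hypothetical expiry: three days

 # True if the message is new (and remember it), false if a duplicate.
 sub is_new_message {
     my ($origin, $timeseq, $now) = @_;
     $now = time unless defined $now;
     my $key = "$origin,$timeseq";
     return 0 if exists $dedup{$key};
     $dedup{$key} = $now;
     return 1;
 }

 # Periodic clean-up: forget tuples older than $LIFETIME seconds.
 # Returns the number of tuples still remembered.
 sub expire_tuples {
     my ($now) = @_;
     $now = time unless defined $now;
     delete @dedup{ grep { $now - $dedup{$_} > $LIFETIME } keys %dedup };
     return scalar keys %dedup;
 }

 for my $n (1, 2) {
     print is_new_message('GB7TLH', '3D02350001') ? "new\n" : "duplicate\n";
 }

Using a comma to join the key is safe because neither field may contain
one.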

=head2 Examples

 # on link startup
 GB7TLH,3D02350001,0|HELLO

 # on user startup
 GB7TLH,3D042506F2,0,G1TLH|HELLO

 # on user disconnection
 GB7TLH,3D9534F32D,0,G1TLH|BYE

 # a talk (actually 'text') message to a user (some distance away
 # from the origin node)
 GB7TLH,3D03450019,3,G1TLH,GB7BAA,G8TIC|T,Hiya Mike what's happening?

 # a talk/chat/text message to a channel or group
 GB7TLH,0413525F23,2,G1TLH,VHF|T,2m is opening on MS

 # a ping to find the whereabouts and distance of a user from a node
 # the hex number on the end is the ping ID
 GB7TLH,1512346543,0,,,G7BRN|PING,9F4D

 # the same from a user on GB7TLH
 GB7TLH,1512346543,0,G1TLH,,G7BRN|PING,23

 # this effectively asks whether the user is on-line on a particular node
 GB7TLH,1512346543,0,G1TLH,GB7DJK,G7BRN|PING,35DE

 # A possible reply, same ID as ping followed by the no of hops on the
 # received ping
 GB7DJK,1512450534,3,G7BRN,GB7TLH,G1TLH|PONG,35DE,3

=head1 Command Section

The L</Command Section> of the message contains the actual data being
passed. It is called the Command Section because all commands
are identified with a L</Tag> which is implemented by
the software using this protocol.

The L</Tag> is separated from its data by a comma ','. All fields
in any subsequent data shall be separated by a comma ','.
All fields shall be HTTP encoded such that reserved characters
(comma ',', vertical bar '|', percent '%', equals '=' and
non-printable characters less than 127 (or %7F in hex),
including newline and carriage return) are translated to
their two hex digit equivalent preceded by the percent '%' character.

For example:

 "%0D%0A" is "<carriage return><linefeed>".
 "hello%2C there" is "hello, there"

This is not standard CSV: fields are not quoted (delimited with either
' or ").
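
A minimal sketch of this escaping in Perl follows. The function names
are hypothetical, and also escaping DEL (127 itself) is an assumption
made here because it is not printable either.

 use strict;
 use warnings;

 # Escape reserved and control characters as %XX (uppercase hex).
 sub encode_field {
     my ($s) = @_;
     $s =~ s/([,|%=\x00-\x1F\x7F])/sprintf "%%%02X", ord $1/ge;
     return $s;
 }

 # Reverse the escaping; accepts upper or lower case hex digits.
 sub decode_field {
     my ($s) = @_;
     $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
     return $s;
 }

 print encode_field("hello, there"), "\n";

Because '%' is itself escaped on the way out, decoding in a single pass
restores the original string exactly.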

All national characters above 127 are UTF-8 encoded in the
standard Perl 5.8.x way. It follows that all (Perl) programs that
are written according to this specification must say:

 use utf8;

A message (or line) is terminated with <carriage return><linefeed>
(0x0d 0x0a). Incoming messages must be accepted even when terminated
with just <linefeed>.

Care must be taken to make sure that fields have any reserved characters
encoded. In particular, it is perfectly permissible to have <linefeed>
characters in a field, so long as they are escaped.

Fields come in two styles: either simple fields (just containing
data) or B<key>=B<value> pairs. Each pair must be separated from
the next by a comma ','. The B<key> must consist of the set of
characters [a-z0-9_] (ie lowercase letters, digits and underscore),
with a leading letter. The B<value> must be HTTP encoded as
specified above and can otherwise contain any character.
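
Since a literal '=' is always escaped inside a field, an unescaped '='
after a valid key can be taken to mark a B<key>=B<value> pair. The
sketch below relies on that reading; C<parse_command> and
C<decode_field> are hypothetical names, and the keys used in the
example are invented for illustration.

 use strict;
 use warnings;

 # Undo the %XX escaping of one field.
 sub decode_field {
     my ($s) = @_;
     $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
     return $s;
 }

 # Split a command section into its tag and a list of fields.
 # Simple fields become strings; pairs become [key, value] array refs.
 sub parse_command {
     my ($command) = @_;
     my ($tag, @raw) = split /,/, $command, -1;
     my @fields;
     for my $f (@raw) {
         if ($f =~ /\A([a-z][a-z0-9_]*)=(.*)\z/s) {
             my ($key, $value) = ($1, $2);
             push @fields, [ $key, decode_field($value) ];
         }
         else {
             push @fields, decode_field($f);
         }
     }
     return ($tag, @fields);
 }

 my ($tag, @fields) = parse_command('T,text=hello%2C there,ttl=5');
 print "$tag: $_->[0] => $_->[1]\n" for @fields;

Fields are split on commas before being decoded, so an escaped comma
inside a value never ends the field early.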

There is no maximum size specified for a message. It is up to each
implementation to enforce one (if only for its own protection).

=head2 Tag

The L</Tag> consists of a string of uppercase letters and digits, starting
with a leading uppercase letter. Tags should be as short as is meaningful.

Valid tags would be:

 DX
 PC23
 ANN

Invalid tags include:

 1AAA
 dx
 Ann
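
The rule above reduces to a single regular expression. In this sketch
the name C<is_valid_tag> is hypothetical; the tags tested are the valid
and invalid examples just listed.

 use strict;
 use warnings;

 # True if the tag is an uppercase letter followed by any number
 # of uppercase letters or digits.
 sub is_valid_tag {
     my ($tag) = @_;
     return defined $tag && $tag =~ /\A[A-Z][A-Z0-9]*\z/ ? 1 : 0;
 }

 for my $tag (qw(DX PC23 ANN 1AAA dx Ann)) {
     printf "%-5s %s\n", $tag, is_valid_tag($tag) ? 'valid' : 'invalid';
 }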

=head2 Standard Commands

There are a number of L</Standard Commands> which must be accepted by
all implementations.

=head1 AUTHOR

Dirk Koopman, G1TLH, E<lt>djk@tobit.co.ukE<gt>

=head1 COPYRIGHT AND LICENSE

Copyright 2004 by Dirk Koopman, G1TLH

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself.

=cut