forked from multi-io/xml-mapping
= XML-MAPPING: XML-to-object (and back) Mapper for Ruby, including XPath Interpreter
Xml-mapping is an easy-to-use, extensible library that allows you to
semi-automatically map Ruby objects to XML trees and vice versa.
== Download
To download the latest version, access the CVS repository, etc., go to:
http://rubyforge.org/projects/xml-mapping/
== Contents of this Document
- {Example}[aref:example]
- {Single-attribute Nodes}[aref:sanodes]
- {Default Values}[aref:defaultvalues]
- {Single-attribute Nodes with Sub-objects}[aref:subobjnodes]
- {Attribute Handling Details, Augmenting Existing Classes}[aref:attrdefns]
- {Other Nodes}[aref:onodes]
- {choice_node}[aref:choice_node]
- {Readers/Writers}[aref:readerswriters]
- {Multiple Mappings per Class}[aref:mappings]
- {Defining your own Node Types}[aref:definingnodes]
- {XPath Interpreter}[aref:xpath]
== {Example}[a:example]
(example document stolen + extended from
http://www.castor.org/xml-mapping.html)
=== Input Document:
:include: order.xml
=== Mapping Class Declaration:
:include: order.rb
=== Usage:
:include: order_usage.intout
As shown in the example, you have to include XML::Mapping into a class
to turn it into a "mapping class". There are no other restrictions
imposed on mapping classes; you can add attributes and methods to
them, include additional modules in them, derive them from other
classes, derive other classes from them, and so on.
An instance of a mapping class can be created from/converted into an
XML node with methods like XML::Mapping::ClassMethods.load_from_xml,
XML::Mapping#save_to_xml, XML::Mapping::ClassMethods.load_from_file,
XML::Mapping#save_to_file. Special class methods like "text_node",
"array_node" etc., called *node* *factory* *methods*, may be called
from the body of the class definition to define instance attributes
that are automatically and bidirectionally mapped to subtrees of the
XML element an instance of the class is mapped to.
== {Single-attribute Nodes}[a:sanodes]
For example, in the definition
  class Address
    include XML::Mapping
    text_node :city, "City"
    text_node :state, "State"
    numeric_node :zip, "ZIP"
    text_node :street, "Street"
  end
the first call to #text_node creates an attribute named "city" which
is mapped to the text of the XML child element defined by the XPath
expression "City" (xml-mapping includes an XPath interpreter that can
also be used separately; see below[aref:xpath]). When you create an
instance of +Address+ from an XML element (using
Address.load_from_file(file_name) or
Address.load_from_xml(rexml_element)), that instance's "city"
attribute will be set to the text of the XML element's "City" child
element. When you convert an instance of +Address+ into an XML
element, a sub-element "City" is added and its text is set to the
current value of the +city+ attribute. The other node types
(numeric_node, array_node etc.) work analogously. Generally speaking, when
an instance of the above +Address+ class is created from or converted
to an XML tree, each of the four nodes in the class maps some parts of
that XML tree to a single, specific attribute of the +Address+
instance. The name of that attribute is given in the first argument to
the node factory method. Such a node is called a "single-attribute
node". All node types that come with xml-mapping except one
(+choice_node+, which I'll talk about below) are single-attribute
nodes.
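To make the two directions concrete, here is a minimal sketch of what a
single <tt>text_node :city, "City"</tt> conceptually does. It uses plain
REXML rather than xml-mapping itself; +AddressSketch+, +read_city+ and
+write_city+ are hypothetical names that exist only for this
illustration.

```ruby
require 'rexml/document'

# Hypothetical stand-in for a mapping class with one mapped attribute.
AddressSketch = Struct.new(:city)

# Reading: take the text of the "City" child element and store it in the attribute.
def read_city(address, xml)
  address.city = xml.elements['City'].text
  address
end

# Writing: add a "City" child element and set its text from the attribute.
def write_city(address, xml)
  xml.add_element('City').text = address.city
  xml
end

source  = REXML::Document.new('<Address><City>Springfield</City></Address>').root
address = read_city(AddressSketch.new, source)

target = write_city(address, REXML::Element.new('Address'))
puts target
```

With xml-mapping, both directions follow from the single +text_node+
declaration, so none of this needs to be written by hand.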
=== {Default Values}[a:defaultvalues]
For each single-attribute node you may define a <i>default value</i>
which will be set if there was no value defined for the attribute in
the XML source.
From the example:
  class Signature
    include XML::Mapping
    text_node :position, "Position", :default_value=>"Some Employee"
  end
The semantics of default values are as follows:
- when creating a new instance from scratch:
  - attributes with default values are set to their default values
  - attributes without default values are left unset
  (when defining your own initializer, you'll have to call the
  inherited _initialize_ method in order to get this behaviour)
- when loading an instance from an XML document:
  - attributes without default values that are not represented in the
    XML raise an error
  - attributes with default values that are not represented in the XML
    are set to their default values
  - all other attributes are set to their respective values as present
    in the XML
- when saving an instance to an XML document:
  - unset attributes without default values raise an error
  - attributes with default values that are set to their default
    values are not saved
  - all other attributes are saved
This implies that:
- attributes that are set to their respective default values are not
represented in the XML
- attributes without default values must be set explicitly before
saving
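The load and save rules above can be sketched for a single text-valued
attribute as follows. This is plain REXML, not xml-mapping's own code;
the +NO_DEFAULT+ sentinel and the helpers +load_text+ and +save_text+
are assumptions made up for this illustration.

```ruby
require 'rexml/document'

# Sentinel meaning "no default value was declared for this node".
NO_DEFAULT = Object.new

# Loading: a missing element falls back to the default, or fails without one.
def load_text(xml, path, default = NO_DEFAULT)
  element = xml.elements[path]
  return element.text if element
  raise ArgumentError, "no value for #{path} and no default" if default.equal?(NO_DEFAULT)
  default
end

# Saving: unset attributes without a default fail; values equal to the
# default are not written at all.
def save_text(xml, path, value, default = NO_DEFAULT)
  if value.nil? && default.equal?(NO_DEFAULT)
    raise ArgumentError, "attribute for #{path} is unset and has no default"
  end
  return xml if !default.equal?(NO_DEFAULT) && value == default
  xml.add_element(path).text = value
  xml
end

with_position    = REXML::Document.new('<Signature><Position>Boss</Position></Signature>').root
without_position = REXML::Document.new('<Signature/>').root

from_xml     = load_text(with_position, 'Position', 'Some Employee')    # taken from the XML
from_default = load_text(without_position, 'Position', 'Some Employee') # falls back to the default

# A value equal to its default leaves no trace in the saved XML.
saved = save_text(REXML::Element.new('Signature'), 'Position', 'Some Employee', 'Some Employee')
```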
=== {Single-attribute Nodes with Sub-objects}[a:subobjnodes]
Single-attribute nodes of type +array_node+, +hash_node+, and
+object_node+ recursively map one or more subtrees of their XML to
sub-objects (e.g. array elements or hash values) of their
attribute. For example, with the line
  array_node :signatures, "Signed-By", "Signature", :class=>Signature, :default_value=>[]
, an attribute named "signatures" is added to the surrounding class
(here: +Order+); the attribute will be an array whose elements
correspond to the XML sub-trees yielded by the XPath expression
"Signed-By/Signature" (relative to the tree corresponding to the
+Order+ instance). Each element will be of class +Signature+
(internally, each element is created from its corresponding XML
subtree by just calling
<tt>Signature.load_from_xml(the_subtree)</tt>). The reason why the
path "Signed-By/Signature" is provided in two arguments instead of
just one combined one becomes apparent when marshalling the array
(along with the surrounding +Order+ object) back into a sequence of
XML elements. When that happens, "Signed-By" names the common base
element for all those elements, and "Signature" is the path that will
be duplicated for each element. For example, when the +signatures+
attribute contains an array with 3 +Signature+ instances (let's call
them <tt>sig1</tt>, <tt>sig2</tt>, and <tt>sig3</tt>) in it, it will
be marshalled to an XML tree that looks like this:
  <Signed-By>
    <Signature>
      [marshalled object sig1]
    </Signature>
    <Signature>
      [marshalled object sig2]
    </Signature>
    <Signature>
      [marshalled object sig3]
    </Signature>
  </Signed-By>
Internally, each +Signature+ instance is stored into its
<tt><Signature></tt> sub-element by calling
<tt>the_signature_instance.fill_into_xml(the_sub_element)</tt>. The
input document in the example above shows how this ends up looking.
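The role of the two path arguments can be illustrated with a small
sketch in plain REXML (again, not xml-mapping's implementation). The
helpers +write_array+ and +read_array+ are hypothetical, and plain
strings stand in for the +Signature+ sub-objects.

```ruby
require 'rexml/document'

# base_path names the shared container element; element_path is repeated
# once per array entry.
def write_array(xml, base_path, element_path, values)
  base = xml.add_element(base_path)
  values.each do |value|
    # xml-mapping would call value.fill_into_xml(sub_element) here;
    # this sketch just stores a string as the element text.
    base.add_element(element_path).text = value
  end
  xml
end

# Reading uses the combined path to find all per-element subtrees.
def read_array(xml, base_path, element_path)
  REXML::XPath.match(xml, "#{base_path}/#{element_path}").map(&:text)
end

order = write_array(REXML::Element.new('Order'), 'Signed-By', 'Signature', %w[sig1 sig2 sig3])
puts order

names = read_array(order, 'Signed-By', 'Signature')
```

When reading, the combined path "Signed-By/Signature" is all that is
needed; the split only matters when writing, where the base element
must be created once and the element path once per entry.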
<tt>hash_node</tt>s work similarly, but they define hash-valued attributes
instead of array-valued ones.
<tt>object_node</tt>s are the simplest of the three types of
single-attribute nodes with sub-objects. They just map a single given
subtree directly to their attribute value. See the example for
examples :)
The mentioned methods +load_from_xml+ and +fill_into_xml+ are the only
methods classes must implement in order to be usable in the
<tt>:class=></tt> keyword arguments to node factory methods. Mapping
classes (i.e. classes that <tt>include XML::Mapping</tt>)
automatically inherit those functions and can thus be readily used in
<tt>:class=></tt> arguments, as shown for the +Signature+ class in the
+array_node+ call above. In addition to that, xml-mapping adds those
methods to some of Ruby's core classes, namely +String+ and +Numeric+
(and thus its subclasses, such as +Float+ and +Integer+). So you can also use
strings or numbers as sub-objects of attributes of +array_node+,
+hash_node+, or +object_node+ nodes. For example, say you have an XML
document like this one:
:include: stringarray.xml
, and you want to map all the names to a string array attribute
+names+, you could do it like this:
:include: stringarray.rb
usage:
:include: stringarray_usage.intout
As a side note, this feature actually makes +text_node+ and
+numeric_node+ special cases of +object_node+. For example,
<tt>text_node :attr, "path"</tt> is the same as <tt>object_node :attr,
"path", :class=>String</tt>.
==== Polymorphic Sub-objects, Marshallers/Unmarshallers
Besides the <tt>:class</tt> keyword argument, there are alternative
ways for a single-attribute node with sub-objects to specify the way
the sub-objects are created from/marshalled into their subtrees.
First, it's possible not to specify anything at all -- in that case,
the class of a sub-object will be automatically deduced from the root
element name of its subtree. This allows you to achieve a kind of
"polymorphic", late-bound way to decide about the sub-object's
class. The following example document contains a hierarchical,
recursive set of named "documents" and "folders", where folders hold a
set of entries, each of which may again be either a document or a
folder:
:include: documents_folders.xml
This can be mapped to Ruby like this:
:include: documents_folders.rb
Usage:
:include: documents_folders_usage.intout
As you see, the <tt>Folder#entries</tt> attribute is mapped via an
array_node that does not specify a <tt>:class</tt> or anything else to
govern the instantiation of the array's elements. This causes
xml-mapping to deduce the class of each array element from the root
element name of the corresponding XML tree. In this example, the root
element name is either "document" or "folder". The mapping between
root element names and class names is the one briefly described in
example[aref:example] at the beginning of this document -- the
unqualified class name is just converted to lower case and "dashed",
e.g. Foo::Bar::MyClass becomes "my-class"; and you may override this
on a per-class basis by calling <tt>root_element_name
"the-new-name"</tt> in the class body. In our example, the root
element name "document" leads to an instantiation of class +Document+,
and the root element name "folder" leads to an instantiation of class
+Folder+.
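The naming rule just described can be sketched as a small helper. The
name +default_root_element_name+ is hypothetical, and the exact
treatment of edge cases (for example runs of capital letters) is an
assumption; only the "MyClass" to "my-class" behaviour is taken from the
description above.

```ruby
# Hypothetical helper: derive a default root element name from a class name
# by dropping the namespace, inserting dashes at lower-to-upper case
# boundaries, and converting to lower case.
def default_root_element_name(class_name)
  class_name.split('::').last
            .gsub(/([a-z0-9])([A-Z])/, '\1-\2')
            .downcase
end

default_root_element_name('Foo::Bar::MyClass')  # => "my-class"
default_root_element_name('Document')           # => "document"
default_root_element_name('Folder')             # => "folder"
```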
Incidentally, the last example shows that you can readily derive
mapping classes from one another (as said before, you can also derive
mapping classes from other classes, include other modules into them
etc. at will). This works the way you would intuitively expect -- when
deriving one mapping class from another one, the list of nodes in
effect when loading/saving instances of the derived class will consist
of all nodes of that class and all superclasses, starting with the
topmost superclass that has nodes defined. There is one thing to take
care of though: When deriving mapping classes from one another, you
have to make sure to <tt>include XML::Mapping</tt> in each class. This
requirement exists purely due to ease-of-implementation
considerations; there are probably ways to do away with it, but the
inconvenience seemed not severe enough for me to bother (as
yet). Still, you might get "strange" errors if you forget to do it for
a class.
Besides the <tt>:class</tt> keyword argument and no argument, there is
a third way to specify the way the sub-objects are created
from/marshalled into their subtrees: <tt>:marshaller</tt> and/or
<tt>:unmarshaller</tt> keyword arguments. Here you pass procs in which
you just do all the work manually. So this is basically a "catch-all"
for cases where the other two alternatives are not appropriate for the
problem at hand. (*TODO*: Use other example?) Let's say we want to
extend the +Signature+ class from the initial example to include the
date on which the signature was created. We want the new XML
representation of such a signature to look like this:
:include: time_node_w_marshallers.xml
So, a new "signed-on" element was added that holds the day, month, and
year. In the +Signature+ instance in Ruby, we want the date to be
stored in an attribute named +signed_on+ of type +Time+ (that's Ruby's
built-in +Time+ class).
One could think of using +object_node+, but something like
<tt>object_node :signed_on, "signed-on", :class=>Time</tt> won't work
because +Time+ isn't a mapping class and doesn't define methods
+load_from_xml+ and +fill_into_xml+ (we could easily define those
though; we'll talk about that possibility here[aref:attrdefns] and
here[aref:definingnodes]). The fastest, most ad-hoc way to achieve
what we want is to pass :marshaller and :unmarshaller keyword arguments, like
this:
:include: time_node_w_marshallers.intout
The <tt>:unmarshaller</tt> proc will be called whenever a +Signature+
instance is being read in from an XML source. The +xml+ argument
passed to the proc contains (as a REXML::Element instance) the XML
subtree corresponding to the node's attribute's sub-object currently
being read. In the case of our +object_node+, the sub-object is just
the node's attribute (+signed_on+) itself, and the subtree is the one
rooted at the <signed-on> element (if this were e.g. an +array_node+,
the <tt>:unmarshaller</tt> proc would be called once for each array
element, and +xml+ would hold the subtree corresponding to the
"current" array element). The proc is expected to extract the
sub-object's data from +xml+ and return the sub-object. So we have to
read the "year", "month", and "day" elements, construct a +Time+
instance from them and return that. One could just use the REXML API
to do that, but I've decided here to use the XPath interpreter that
comes with xml-mapping (xml/xxpath), and specifically the
'xml/xxpath_methods' utility library that adds methods like +first+ to
REXML::Element. We call +first+ on +xml+ three times, passing XPath
expressions to extract the "year"/"month"/"day" sub-elements,
construct the +Time+ instance from that and return it. The XPath
library is explained in more detail below[aref:xpath].
The <tt>:marshaller</tt> proc will be called whenever a +Signature+
instance is being written into an XML tree. +xml+ is again the XML
subtree rooted at the <signed-on> element (it will still be empty when
this proc is called), and +value+ is the current value of the
sub-object (again, since this is an +object_node+, +value+ is the
node's attribute, i.e. the +Time+ instance). We have to fill +xml+
with the data from +value+ here. So we add three elements "year",
"month" and "day" and set their texts to the corresponding values from
+value+. The commented-out code shows an alternative implementation of
the same thing using the XPath interpreter.
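For readers who want to try the two procs in isolation, here is a
sketch that uses only the plain REXML API instead of xml/xxpath. The
argument order (<tt>xml</tt> for the unmarshaller, <tt>xml, value</tt>
for the marshaller) is an assumption based on the description above, and
the procs are called directly rather than passed to +object_node+.

```ruby
require 'rexml/document'

# Unmarshaller: build a Time from the <year>/<month>/<day> children of xml.
unmarshaller = proc do |xml|
  Time.local(xml.elements['year'].text.to_i,
             xml.elements['month'].text.to_i,
             xml.elements['day'].text.to_i)
end

# Marshaller: fill the (initially empty) xml element from the Time value.
marshaller = proc do |xml, value|
  xml.add_element('year').text  = value.year.to_s
  xml.add_element('month').text = value.month.to_s
  xml.add_element('day').text   = value.day.to_s
end

signed_on = REXML::Element.new('signed-on')
marshaller.call(signed_on, Time.local(2005, 2, 13))
puts signed_on

round_trip = unmarshaller.call(signed_on)
```

Since only year, month and day are stored, the round trip preserves the
date but not the time of day.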
It should be mentioned again that :marshaller/:unmarshaller procs are
possible with all single-attribute nodes with sub-objects, i.e. with
+object_node+, +array_node+, and +hash_node+. So, if you wanted to map
a whole array of date values, you could use +array_node+ with the same
:marshaller/:unmarshaller procs as above, for example:
  array_node :birthdays, "birthdays", "birthday",
             :unmarshaller=> <as above>,
             :marshaller=> <as above>
You can see that :marshaller/:unmarshaller procs give you more
flexibility, but they also impose more work because you essentially
have to do all the work of marshalling/unmarshalling the sub-objects
yourself. If you find yourself copying and pasting
marshaller/unmarshaller procs all over the place, you should instead
define your own node type or mix the marshalling/unmarshalling
capabilities into the +Time+ class itself. This is explained
here[aref:attrdefns] and here[aref:definingnodes], and you'll see that
it's not really much more work than writing :marshaller and
:unmarshaller procs (you essentially just move the code from those
procs into your own node type or into the +Time+ class), so you
should not hesitate to do this.
Another thing worth mentioning is that you don't have to specify
*both* a :marshaller and an :unmarshaller simultaneously. You can as
well give only one of them, and in addition to that pass a
<tt>:class</tt> argument or no argument. When you do that, the
specified marshaller (or unmarshaller) will be used when marshalling
(or unmarshalling, respectively) the sub-objects, and the other passed argument
(<tt>:class</tt> or none) will be employed when unmarshalling
(or marshalling, respectively) the sub-objects. So, in effect, you can deactivate
or "short-cut" some part of the marshalling/unmarshalling
functionality of a node type while retaining another part.
=== {Attribute Handling Details, Augmenting Existing Classes}[a:attrdefns]
I'll shed some more light on how single-attribute nodes add mapped
attributes to Ruby classes. An attribute declaration like
  text_node :city, "City"
maps some portion of the XML tree (here: the "City" sub-element) to an
attribute (here: "city") of the class whose body the declaration
appears in. When writing (marshalling) instances of the surrounding
class into an XML document, xml-mapping will read the attribute value
from the instance using the function named +city+; when reading
(unmarshalling) an instance from an XML document, xml-mapping will use
the one-parameter function <tt>city=</tt> to set the attribute in the
instance to the value read from the XML document.
If these functions don't exist at the time the node declaration is
executed, xml-mapping adds default implementations that simply
read/write the attribute value to instance variables that have the
same name as the attribute. For example, the +city+ attribute
declaration in the +Address+ class in the example added functions
+city+ and <tt>city=</tt> that read/write from/to the instance
variable <tt>@city</tt>.
If, however, these functions already exist prior to defining the
attributes, xml-mapping will leave them untouched, so your precious
self-written accessor methods that do whatever complicated internal
processing of the data won't be overwritten.
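The "only add accessors that are missing" behaviour can be sketched in
plain Ruby. The helper +add_default_accessors+ and the class +Place+ are
hypothetical and only mimic the behaviour described above; they are not
taken from xml-mapping's source.

```ruby
# Hypothetical helper: define default accessors backed by an instance
# variable, but only where no method of that name exists yet.
def add_default_accessors(klass, attr_name)
  unless klass.method_defined?(attr_name)
    klass.send(:define_method, attr_name) { instance_variable_get("@#{attr_name}") }
  end
  unless klass.method_defined?("#{attr_name}=")
    klass.send(:define_method, "#{attr_name}=") do |value|
      instance_variable_set("@#{attr_name}", value)
    end
  end
end

class Place
  # Hand-written getter with its own logic; it must survive untouched.
  def city
    (@city || 'unknown').upcase
  end
end

add_default_accessors(Place, :city)

place = Place.new
place.city = 'Springfield'   # the default setter was added
place.city                   # => "SPRINGFIELD" (the custom getter is still in place)
```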
This means that you can not only create new mapping classes from
scratch, you can also take existing classes that contain some
"business logic" and "augment" them with xml-mapping capabilities. As
a simple example, let's augment Ruby's "Time" class with node
declarations that declare XML mappings for the day, month etc. fields:
:include: time_augm.intout
Here XML mappings are defined for the existing fields +year+, +month+
etc. Xml-mapping noticed that the getter methods for those attributes
existed, so it didn't overwrite them. When calling +save_to_xml+ on a
+Time+ object, these methods are called and return the object's values
for those fields, which then get written to the output XML.
So you can convert +Time+ objects into XML trees. What about reading
them back in from XML? All XML reading operations go through
<tt><Class>.load_from_xml</tt>. The +load_from_xml+ class method
inherited from XML::Mapping (see
XML::Mapping::ClassMethods#load_from_xml) allocates a new instance of
the class (+Time+), then calls +fill_from_xml+
(i.e. XML::Mapping#fill_from_xml) on it. +fill_from_xml+ iterates over
all our nodes in the order of their definition. For each node, its
data (the <year>, or <month>, or <day> etc. element) is read from the
XML source and then written to the +Time+ instance via the respective
setter method (<tt>year=</tt>, <tt>month=</tt>, <tt>day=</tt>
etc.). These methods didn't exist in +Time+ before (+Time+ objects are
immutable), so xml-mapping defined its own, default setter methods
that just set <tt>@year</tt>, <tt>@month</tt> etc. This is of course
pretty useless because +Time+ objects don't hold their time in these
variables, so the setter methods don't really change the time of the
+Time+ object. So we have to redefine +load_from_xml+ for the +Time+
class:
:include: time_augm_loading.intout
== {Other Nodes}[a:onodes]
All nodes I've shown so far (node types text_node, numeric_node,
boolean_node, object_node, array_node, and hash_node) were
single-attribute nodes: The first parameter to the node factory method
of such a node is an attribute name, and the attribute of that name is
the only piece of the state of instances of the node's mapping class
that gets read/written by the node.
=== {choice_node}[a:choice_node]
There is one node type distributed with xml-mapping that is not a
single-attribute node: +choice_node+. A +choice_node+ allows you to
specify a sequence of pairs, each consisting of an XPath expression
and another node (any node is supported here, including other
choice_nodes). When reading in an XML source, the choice_node will
delegate the work to the first node in the sequence whose
corresponding XPath expression was matched in the XML. When writing an
object back to XML, the choice_node will delegate the work to the
first node whose data was "present" in the object (for
single-attribute nodes, the data is considered "present" if the node's
attribute is non-nil; for choice_nodes, the data is considered
"present" if at least one of the node's sub-nodes is "present").
As a (somewhat contrived) example, here's a mapping for +Publication+
objects that have either a single author (contained in an "author" XML
attribute) or several "contributors" (contained in a sequence of
"contr" XML elements):
:include: publication.intout
The symbols :if, :then, and :elsif (but not :else -- see below) in the
+choice_node+'s node factory method call are ignored; they may be
sprinkled across the argument list at will (preferably the way shown
above of course) to increase readability.
The rest of the arguments specify the mentioned sequence of XPath
expressions and corresponding nodes.
When reading a +Publication+ object from XML, the XPath expressions
from the +choice_node+ (<tt>@author</tt> and +contr+) will be matched
in sequence against the source XML tree until a match is found or the
end of the argument list is reached. If the end is reached, an
exception is raised. Otherwise, for the first XPath expression that
matched, the corresponding node will be invoked (i.e. used to read
actual data from the XML source into the +Publication+ object). If you
specify :else, :default, or :otherwise in place of an XPath
expression, this is treated as an XPath expression that always
matches. So you can use :else (or :default or :otherwise) for a
"fallback" node that will be used if none of the other XPath
expressions matched (an example for this follows).
When writing a +Publication+ object back to XML, the first node in the
sequence whose data is "present" in the source object will be invoked
to write data from the object into the target XML tree (and the
corresponding XPath expression will be created in the XML tree if it
doesn't exist already). If there is no such node in the sequence, an
exception is raised. As said above, for single-attribute nodes, the
node's data is considered "present" if the node's attribute is
non-nil. So, if you write a +Publication+ object to XML, and either
the +author+ or the +contributors+ attribute of the object is set, it
will be written; if both attributes are nil, an exception will be
raised.
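The delegation rules for reading and writing can be sketched
independently of xml-mapping, using plain REXML. Everything named here
(+PublicationSketch+, +ALTERNATIVES+, +read_choice+, +write_choice+) is
hypothetical; the sketch only mirrors the "first matching XPath" and
"first non-nil attribute" rules described above.

```ruby
require 'rexml/document'

PublicationSketch = Struct.new(:author, :contributors)

# Each alternative: the XPath to test, the attribute it maps, and lambdas
# that stand in for the sub-node's reading and writing behaviour.
ALTERNATIVES = [
  { path: '@author', attr: :author,
    read:  ->(xml) { xml.attributes['author'] },
    write: ->(xml, value) { xml.add_attribute('author', value) } },
  { path: 'contr', attr: :contributors,
    read:  ->(xml) { REXML::XPath.match(xml, 'contr').map(&:text) },
    write: ->(xml, value) { value.each { |name| xml.add_element('contr').text = name } } }
].freeze

# Reading: delegate to the first alternative whose XPath matches.
def read_choice(xml, obj)
  chosen = ALTERNATIVES.find { |alt| REXML::XPath.first(xml, alt[:path]) }
  raise 'no alternative matched the XML' unless chosen
  obj[chosen[:attr]] = chosen[:read].call(xml)
  obj
end

# Writing: delegate to the first alternative whose attribute is non-nil.
def write_choice(obj, xml)
  chosen = ALTERNATIVES.find { |alt| !obj[alt[:attr]].nil? }
  raise 'no alternative has data present' unless chosen
  chosen[:write].call(xml, obj[chosen[:attr]])
  xml
end

single = read_choice(REXML::Document.new('<publication author="Jim"/>').root,
                     PublicationSketch.new)
group  = read_choice(
  REXML::Document.new('<publication><contr>Jack</contr><contr>Tom</contr></publication>').root,
  PublicationSketch.new
)
written = write_choice(group, REXML::Element.new('publication'))
puts written
```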
A frequent use case for choice_nodes will probably be object
attributes that may be represented in multiple alternative ways in
XML. As an example, consider "Person" objects where the name of the
person should be stored alternatively in a sub-element named +name+,
or an attribute named +name+, or in the text of the +person+ element
itself. You can achieve this with +choice_node+ like this:
:include: person.intout
Here all sub-nodes of the choice_node are single-attribute nodes
(text_nodes) with the same attribute (+name+). As you see, when
writing persons to XML, the name is always stored in a <name>
sub-element. Of course, this is because that alternative appears first
in the choice_node.
=== {Readers/Writers}[a:readerswriters]
Finally, _all_ nodes support keyword arguments :reader and :writer
which allow you to extend or completely override the reading and/or
writing functionality of the node with your own code. The :reader as
well as the :writer argument must be a proc that takes as its
arguments the Ruby object to be read/written (instance of the mapping
class the node belongs to) and the XML tree to be written to/read
from. An optional third argument may be specified -- it will receive a
proc that wraps the default reader/writer functionality of the
node.
The :reader proc is for reading (from the XML into the object), the
:writer proc is for writing (from the object into the XML).
Here's a (really contrived) example:
:include: reader.intout
So there's a "Foo" class with a text_node that would by default
(without the :reader and :writer proc) map the Ruby attribute "name"
to the XML attribute "name". The :reader proc is invoked when reading
from XML into a +Foo+ object. The +xml+ argument is the XML tree,
+obj+ is the object. +default_reader+ is the proc that wraps the
default reading functionality of the node. We invoke it at the
beginning. For this text_node, the default reading functionality is to
take the text of the "name" attribute of +xml+ and put it into the
+name+ attribute of +obj+. After that, we take the text of the "more"
attribute of +xml+ and append it to the +name+ attribute of +obj+. So
the XML tree <tt><foo name="Jim" more="XYZ"/></tt> is converted to a
+Foo+ object with +name+="JimXYZ".
In our :writer proc, we only take +obj+ (the +Foo+ object to be
written to XML) and +xml+ (the XML tree the stuff is to be written
to). Analogously to the :reader, we could take a proc that wraps the
default writing functionality of the node, but we don't do that
here--we completely override the writing functionality with our own
code, which just takes the +name+ attribute of the object and writes
"hi <the name> ho" to a +bar+ XML attribute in the XML tree (stupid
example, I know).
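The way a :reader proc wraps the default behaviour can be tried out
without xml-mapping. In the sketch below the procs are called by hand on
plain REXML elements; +FooSketch+ and the hand-written +default_reader+
are stand-ins (assumptions) for the mapping class and for the proc that
xml-mapping would pass as the third argument.

```ruby
require 'rexml/document'

FooSketch = Struct.new(:name)

# Stand-in for the node's default reading behaviour: copy the "name"
# XML attribute into obj.name.
default_reader = proc { |obj, xml| obj.name = xml.attributes['name'] }

# Custom reader: run the default first, then append the "more" attribute.
reader = proc do |obj, xml, default|
  default.call(obj, xml)
  obj.name += xml.attributes['more']
end

# Custom writer: ignores the default behaviour and writes a "bar" attribute.
writer = proc { |obj, xml| xml.add_attribute('bar', "hi #{obj.name} ho") }

foo = FooSketch.new
reader.call(foo, REXML::Document.new('<foo name="Jim" more="XYZ"/>').root, default_reader)
foo.name  # => "JimXYZ"

out = REXML::Element.new('foo')
writer.call(foo, out)
out.attributes['bar']  # => "hi JimXYZ ho"
```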
As a special convention, if you specify both a :reader and a :writer
for a node, and in both cases you do _not_ call the default behaviour,
then you should use the generic node type +node+, e.g.:
  class SomeClass
    include XML::Mapping
    ....
    node :reader=>proc{|obj,xml| ...},
         :writer=>proc{|obj,xml| ...}
  end
(since you're completely replacing both the reading and the writing
functionality, you're effectively replacing all the functionality of
the node, so it would be pointless and confusing to use one of the
more "specific" node types)
As you see, the purpose of readers and writers is to make it possible
to augment or override a node's functionality arbitrarily, so there
shouldn't be anything that's absolutely impossible to achieve with
xml-mapping. However, if you use readers and writers without invoking
the default behaviour, you really do everything manually, so you're
not doing any less work than you would do if you weren't using
xml-mapping at all. So you'll probably use readers and/or writers for
those bits of your mapping semantics that can't be achieved with
xml-mapping's predefined node types (an alternative approach might be
to override the +post_load+ and/or +post_save+ instance methods on the
mapping class -- see the reference documentation).
Advice similar to that given above for marshallers/unmarshallers
applies here as well: If you find yourself writing lots of readers and
writers that only differ in some easily parameterizable aspects, you
should think about defining your own node types. We talk about that
below[aref:definingnodes], and it generally just means that you move
the (sensibly parameterized) code from your readers/writers to your
node types.
== {Multiple Mappings per Class}[a:mappings]
Sometimes you might want to represent the same Ruby object in multiple
alternative ways in XML. For example, the name of a "Person" object
could be represented either in a "name" element or a "name" attribute.
xml-mapping supports this by allowing you to define multiple disjoint
"mappings" for a mapping class. A mapping is by convention identified
with a symbol, e.g. <tt>:my_mapping</tt>, <tt>:other_mapping</tt>
etc., and each mapping comprises a root element name and a set of node
definitions. In the body of a mapping class definition, you switch to
another mapping with <tt>use_mapping :the_mapping</tt>. All following
node declarations will be added to that mapping *unless* you specify
the option :mapping=>:another_mapping for a node declaration (all node
types support that option). The default mapping (the mapping used if
there was no previous +use_mapping+ in the class body) is named
<tt>:_default</tt>.
All the worker methods like <tt>load_from_xml/file</tt>,
<tt>save_to_xml/file</tt>, <tt>load_object_from_xml/file</tt> support
a <tt>:mapping</tt> keyword argument to specify the mapping, which
again defaults to <tt>:_default</tt>.
In the following example, we define two mappings (the default one and
a mapping named <tt>:other</tt>) for +Person+ objects with a name, an
age and an address:
:include: examples/person_mm.intout
In this example, each of the two mappings contains nodes that map the
same set of Ruby attributes (name, age and address). This is probably
what you want most of the time (since you're normally defining
multiple XML mappings for the same Ruby data), but it's not a
necessity at all. When a mapping class is defined, xml-mapping will
add all Ruby attributes from all mappings to it.
You may have noticed that the <tt>object_node</tt>s in the +Person+
class apply the mapping they were themselves defined in to their
subordinate class (+Address+). This is the case for all
{Single-attribute Nodes with Sub-objects}[aref:subobjnodes]
(+object_node+, +array_node+ and +hash_node+) unless you explicitly
specify a different mapping for the sub-object(s) using the option
:sub_mapping, e.g.
  object_node :address, "address", :class=>Address, :sub_mapping=>:other
== {Defining your own Node Types}[a:definingnodes]
It's easy to write additional node types and register them with the
xml-mapping library (the following node types come with xml-mapping:
+node+, +text_node+, +numeric_node+, +boolean_node+, +object_node+,
+array_node+, +hash_node+, +choice_node+).
I'll first show an example, then offer some more theoretical insight.
=== Example
Let's say we want to extend the +Signature+ class from the example to
include the time at which the signature was created. We want the new
XML representation of such a signature to look like this:
:include: order_signature_enhanced.xml
(we only save year, month and day to make this example shorter), and
the mapping class declaration to look like this:
:include: order_signature_enhanced.rb
(i.e. a new "time_node" declaration was added).
We want this +time_node+ call to define an attribute named +signed_on+
which holds the date value from the XML document in an instance of
class +Time+.
This node type can be defined with this piece of code:
:include: time_node.rb
The last line registers the new node type with the xml-mapping
library. The name of the node factory method ("time_node") is
automatically derived from the class name of the node type
("TimeNode").
There will be one instance of the node type +TimeNode+ per +time_node+
declaration per mapping class (not per mapping class instance). That
instance (the "node" for short) will be created by the node factory
method (+time_node+); there's no need to instantiate the node type
directly. The +time_node+ method places the node into the mapping
class; the @owner attribute of the node is set to reference the
mapping class. The node factory method passes the mapping class the
node appears in (+Signature+), followed by its own arguments, to the
node's constructor. In the example, the +time_node+ method calls
<tt>TimeNode.new(Signature, :signed_on, "signed-on",
:default_value=>Time.now)</tt>. +new+ of course creates the node and
then passes the arguments on to our initializer, +initialize+. We first
call the superclass's initializer, which strips off from the argument
list those arguments it handles itself, and returns the remaining
ones. In this case, the superclass XML::Mapping::SingleAttributeNode
handles the +Signature+, <tt>:signed_on</tt> and
<tt>:default_value=>Time.now</tt> arguments -- +Signature+ is stored
into <tt>@owner</tt>, <tt>:signed_on</tt> is stored into
<tt>@attrname</tt>, and <tt>{:default_value=>Time.now}</tt> is stored
into <tt>@options</tt>. The remaining argument list
<tt>["signed-on"]</tt> is returned; we capture the
<tt>"signed-on"</tt> string in _path_ and the rest of the argument
list (an empty array) in _args_, which we return at the end of the
initializer. Returning it isn't strictly necessary; it's just a
convention that a node class initializer should always return those
arguments it didn't handle itself. We'll interpret _path_ as an XPath
expression
that locates the time value relative to the parent mapping object's
XML tree (in this case, this would be the XML tree rooted at the
<tt><Signature></tt> element, i.e. the tree the +Signature+ instance
was read from). We'll later have to read/store the year, month, and
day values from <tt>path+"/year"</tt>, <tt>path+"/month"</tt>, and
<tt>path+"/day"</tt>, respectively, so we create (and precompile)
three corresponding XPath expressions using XML::XXPath.new and store
them into member variables of the node. XML::XXPath is an XPath
implementation that is bundled with xml-mapping. It is very
incomplete, but it supports writing (not just reading) of XML nodes,
which is needed to support writing data back to XML. The XML::XXPath
library is explained in more detail below[aref:xpath].
The +extract_attr_value+ method is called whenever an instance of the
mapping class the node belongs to (+Signature+ in the example) is
being created from an XML tree. The parameter _xml_ is that tree
(again, this is the tree rooted at the <tt><Signature></tt> element in
this example). The method implementation is expected to extract the
single attribute's value from _xml_ and return it, or raise
XML::Mapping::SingleAttributeNode::NoAttrValueSet if the attribute was
"unset" in the XML (this exception tells the framework that the
default value should be put in place if it was defined), or raise any
other exception to signal an error and abort the whole process. Our
superclass XML::Mapping::SingleAttributeNode will store the returned
single attribute's value into the <tt>signed_on</tt> attribute of the
+Signature+ instance being read in. In our implementation, we apply
the xpath expressions created during initialization to _xml_
(e.g. <tt>@y_path.first(xml)</tt>). An expression
_xpath_expr_.first(_xml_) returns (as a REXML element) the first
sub-element of _xml_ that matches _xpath_expr_, or raises
XML::XXPathError if there was no such element. We apply REXML's _text_
method to the returned element to get out the element's text, convert
it to integer, and supply it to the constructor of the +Time+ object
to be returned. As a side note, if an XPath expression matches XML
attributes, XML::XXPath methods like _first_ will return
XML::XXPath::Accessors::Attribute nodes that behave similarly to
REXML::Element nodes, including support for messages like _name_ and
_text_, so this would've worked also if our XPath expressions had
referred to XML attributes, not elements. The +default_when_xpath_err+
method calls the supplied block and returns its value, but maps the
exception XML::XXPathError to the mentioned
XML::Mapping::SingleAttributeNode::NoAttrValueSet (any other
exceptions fall through unchanged). As said above,
XML::Mapping::SingleAttributeNode::NoAttrValueSet is caught by the
framework (more precisely, by our superclass
XML::Mapping::SingleAttributeNode), and the default value is set if it
was provided. So you should just wrap +default_when_xpath_err+ around
any applications of XPath expressions whose non-presence in the XML
you want to be considered a non-presence of the attribute you're
trying to extract. (XML::XXPath is designed to know nothing about
XML::Mapping, so it doesn't raise
XML::Mapping::SingleAttributeNode::NoAttrValueSet directly.)
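
The exception translation described above can be pictured with a
small, self-contained sketch. It does not use xml-mapping at all:
+ToyXPathError+ and +ToyNoAttrValueSet+ are hypothetical stand-ins for
XML::XXPathError and XML::Mapping::SingleAttributeNode::NoAttrValueSet,
and +toy_default_when_xpath_err+ and +toy_read_attr+ are made-up names
that only imitate the behaviour described in the text.

```ruby
# Hypothetical stand-ins for the two exception classes named in the text.
class ToyXPathError < RuntimeError; end
class ToyNoAttrValueSet < StandardError; end

# Imitates default_when_xpath_err: run the block, return its value,
# but translate "path not found" into "attribute unset".
# Any other exception falls through unchanged.
def toy_default_when_xpath_err
  yield
rescue ToyXPathError => e
  raise ToyNoAttrValueSet, "attribute unset (#{e.message})"
end

# Imitates the framework side: catch "attribute unset" and
# substitute the default value.
def toy_read_attr(default)
  toy_default_when_xpath_err { yield }
rescue ToyNoAttrValueSet
  default
end

year_present = toy_read_attr(1970) { 2005 }
year_missing = toy_read_attr(1970) { raise ToyXPathError, "no match for 'year'" }

puts year_present  # 2005
puts year_missing  # 1970
```

An unrelated error raised inside the block (an +ArgumentError+, say)
is not rescued by either method in this sketch, which mirrors the
"fall through unchanged" behaviour described above.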
The +set_attr_value+ method is called whenever an instance of the
mapping class the node belongs to (+Signature+ in the example) is
being stored into an XML tree. The _xml_ parameter is the XML tree (a
REXML element node; here this is again the tree rooted at the
<tt><Signature></tt> element); _value_ is the current value of the
single attribute (in this example, the <tt>signed_on</tt> attribute of
the +Signature+ instance being stored). _xml_ will most probably be
"half-populated" by the time this method is called -- the framework
calls the +set_attr_value+ methods of all nodes of a mapping class in
the order of their definition, letting each node fill its "bit" into
_xml_. The method implementation is expected to write _value_ into
(the correct sub-elements of) _xml_, or raise an exception to signal
an error and abort the whole process. No default value handling is
done here; +set_attr_value+ won't be called at all if the attribute
had been set to its default value. In our implementation we grab the
year, month and day values from _value_ (which must be a +Time+), and
store them into the sub-elements of _xml_ identified by the XPath
expressions <tt>@y_path</tt>, <tt>@m_path</tt> and <tt>@d_path</tt>,
respectively. We do this by calling XML::XXPath#first with an
additional parameter <tt>:ensure_created=>true</tt>. An expression
_xpath_expr_.first(_xml_,:ensure_created=>true) works just like
_xpath_expr_.first(_xml_) if a node matching _xpath_expr_ was already
present in _xml_. If there was none, it is created (preferably at the
end of _xml_'s list of sub-nodes) and returned. See below[aref:xpath]
for more detailed documentation of the XPath interpreter.
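
To make the "find it, or create it at the end" behaviour concrete,
here is a toy model that uses neither REXML nor XML::XXPath. The names
+ToyElement+ and +toy_first+ are invented for this sketch, it only
handles a single child name rather than a real XPath expression, and
it raises +KeyError+ where the real library would raise
XML::XXPathError.

```ruby
# A minimal element: a name, optional text, and an ordered child list.
ToyElement = Struct.new(:name, :text, :children)

# Return the first child called +name+. If there is none, either raise
# or (with ensure_created: true) append a new child and return it.
def toy_first(parent, name, ensure_created: false)
  found = parent.children.find { |child| child.name == name }
  return found if found
  raise KeyError, "no <#{name}> under <#{parent.name}>" unless ensure_created

  created = ToyElement.new(name, nil, [])
  parent.children << created   # new nodes go to the end of the list
  created
end

signed_on = ToyElement.new("signed-on", nil, [])
toy_first(signed_on, "year",  ensure_created: true).text = "2005"
toy_first(signed_on, "month", ensure_created: true).text = "2"
toy_first(signed_on, "year",  ensure_created: true).text = "2006"  # reuses <year>

puts signed_on.children.map(&:name).inspect  # ["year", "month"]
```

Because missing children are appended, the order of the calls
determines the order of the created elements.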
=== Element order in created XML documents
As just said, XML::XXPath, when used to create new XML nodes,
generally appends those nodes to the end of the list of subnodes of
the node the xpath expression was applied to. All xml-mapping nodes
that come with xml-mapping use XML::XXPath when writing data to XML,
and therefore also append their data to the XML data written by
preceding nodes (the nodes are invoked in the order of their
definition). This means that, generally, your output data will appear
in the XML document in the same order in which the corresponding
xml-mapping node definitions appeared in the mapping class (unless you
used XPath expressions like foo[number] which explicitly dictate a
fixed position in the sequence of XML nodes). For instance, in the
+Order+ class from the example at the beginning of this document, if
we put the <tt>:signatures</tt> node _before_ the <tt>:items</tt>
node, the <tt><Signed-By></tt> element will appear _before_ the
sequence of <tt><Item></tt> elements in the output XML.
The following is a more systematic overview of the basic node
types. The description is self-contained, so some information from the
previous section will be repeated.
=== Node Types Are Ruby Classes
A node type is implemented as a Ruby class derived from
XML::Mapping::Node or one of its subclasses.
The following node types (node classes) come with xml-mapping (they
all live in the XML::Mapping namespace, which I've left out here for
brevity):
  Node
   +-SingleAttributeNode
   |  +-SubObjectBaseNode
   |  |  +-ObjectNode
   |  |  +-ArrayNode
   |  |  +-HashNode
   |  +-TextNode
   |  +-NumericNode
   |  +-BooleanNode
   +-ChoiceNode
XML::Mapping::Node is the base class for all nodes,
XML::Mapping::SingleAttributeNode is the base class for
{single-attribute nodes}[aref:sanodes], and
XML::Mapping::SubObjectBaseNode is the base class for
{single-attribute nodes with
sub-objects}[aref:subobjnodes]. XML::Mapping::TextNode,
XML::Mapping::ArrayNode etc. are of course the +text_node+,
+array_node+ etc. we've talked about in this document. When you've
written a new node class, you register it with xml-mapping by calling
<tt>XML::Mapping.add_node_class MyNode</tt>. When you do that,
xml-mapping automatically defines the node factory method for your
class -- the method's name (e.g. +my_node+) is derived from the node's
class name (e.g. Foo::Bar::MyNode) by stripping all parent module
names, and then converting capital letters to lowercase and preceding
them with an underscore. In fact, this is just how all the predefined
node types are defined -- those node types are not "special"; they're
defined in the source file +xml/mapping/standard_nodes.rb+ and then
registered normally in +xml/mapping.rb+. The source code of the
built-in nodes is not very long or complicated; you may consider
reading it in addition to this text to gain a better understanding.
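
The name derivation just described can be approximated in a line of
plain Ruby. This is only a sketch of the rule as stated in the text,
not the library's own code; +toy_factory_name+ is a hypothetical
helper, and it assumes that the first letter of the class name gets no
leading underscore.

```ruby
# Strip parent module names, then put an underscore before every
# capital letter that is not at the start, and lowercase everything.
def toy_factory_name(class_name)
  class_name.split("::").last.gsub(/\B[A-Z]/) { |capital| "_#{capital}" }.downcase
end

puts toy_factory_name("Foo::Bar::MyNode")  # my_node
puts toy_factory_name("TimeNode")          # time_node
```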
=== How Node Types Work
The xml-mapping core "operates" node types as follows:
==== Node Initialization
As said above, when a node class is registered with xml-mapping by
calling <tt>XML::Mapping.add_node_class TheNodeClass</tt>, xml-mapping
automatically generates the node factory method for that type. The
node factory method will effectively be defined as a class method of
the XML::Mapping module, which is why one can call it from the body of
a mapping class definition. The generated method will create a new
instance of the node class (a *node* for short) by calling _new_ on
the node class. The list of parameters to _new_ will consist of <i>the
mapping class, followed by all arguments that were passed to the node
factory method</i>. For example, when you have this node declaration:
  class MyMappingClass
    include XML::Mapping
    my_node :foo, "bar", 42, :hi=>"ho"
  end
, then the node factory method (+my_node+) calls
<tt>MyNode.new(MyMappingClass, :foo, "bar", 42, :hi=>"ho")</tt>.
_new_ of course creates the instance and calls _initialize_ on it. The
_initialize_ implementation will generally store the parameters into
some instance variables for later usage. As a convention, _initialize_
should always extract from the parameter list those parameters it
processes itself, process them, and return an array containing the
remaining (still unprocessed) parameters. Thus, an implementation of
_initialize_ follows this pattern:
  def initialize(*args)
    myparam1,myparam2,...,myparamx,*args = super(*args)
    # .... process the myparam1,myparam2,...,myparamx ....
    # return still unprocessed args
    args
  end
(Since the called superclass initializer is written the same way, the
parameter array returned by it will already be stripped of all
parameters that the superclass initializer, or any of its
superclasses' initializers, processed.)
This technique is a simple way to "chain" the initializers of all
superclasses of a node class, starting with the topmost one (Node), so
that each initializer can easily find out and process the parameters
it is responsible for.
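
Here is a runnable toy version of that chaining pattern, written
without xml-mapping. The three +Sketch...+ classes are hypothetical
and only mimic which arguments each level takes according to the text
(mapping class and options at the top, attribute name in the middle,
everything else in the concrete node). It relies on the fact that
+super+ returns the return value of the superclass's _initialize_,
even though _new_ itself ignores it.

```ruby
# Topmost level: takes the owner (first argument) and the trailing
# options hash, returns whatever is left.
class SketchNode
  attr_reader :owner, :options

  def initialize(owner, *args)
    @owner = owner
    @options = args.last.is_a?(Hash) ? args.pop : {}
    args
  end
end

# Middle level: takes the attribute name from the front of the rest.
class SketchSingleAttributeNode < SketchNode
  attr_reader :attrname

  def initialize(*args)
    @attrname, *args = super(*args)
    args
  end
end

# Concrete node: takes its own two parameters, returns the remainder.
class SketchMyNode < SketchSingleAttributeNode
  attr_reader :path, :number

  def initialize(*args)
    @path, @number, *args = super(*args)
    args
  end
end

node = SketchMyNode.new(:MyMappingClass, :foo, "bar", 42, :hi => "ho")
puts node.attrname.inspect  # :foo
puts node.path.inspect      # "bar"
puts node.options.inspect
```

Each initializer only ever sees the arguments its superclasses left
over, so adding a parameter at one level does not disturb the others.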
The base node class XML::Mapping::Node provides an _initialize_
implementation that, among other things (described below), adds _self_
(i.e. the created node) to the internal list of nodes held by the
mapping class, and sets the @owner attribute of _self_ to reference
the mapping class.
So, effectively there will be one instance of a node class (a node)
per node definition, and that instance lives in the mapping class the
node was defined in.
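
The interplay of registration, factory method and per-class node list
can be sketched as follows. This is a deliberately simplified model
and not xml-mapping's source: +ToyMapping+, +toy_nodes+ and the
+ToyMapping::Node+ class are invented for the illustration, and the
real library keeps its node lists per mapping rather than in a single
array.

```ruby
module ToyMapping
  # Class-level methods that every mapping class gets.
  module ClassMethods
    # The list of nodes defined in this mapping class.
    def toy_nodes
      @toy_nodes ||= []
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  # Register a node class: define a factory method that calls
  # klass.new(<mapping class>, <factory arguments>...).
  def self.add_node_class(klass)
    name = klass.name.split("::").last.gsub(/\B[A-Z]/) { |c| "_#{c}" }.downcase
    ClassMethods.send(:define_method, name) do |*args|
      klass.new(self, *args)   # self is the mapping class here
    end
  end

  class Node
    attr_reader :owner, :args

    def initialize(owner, *args)
      @owner = owner
      @args = args
      owner.toy_nodes << self  # the node lives in the mapping class
    end
  end
end

class MyNode < ToyMapping::Node; end
ToyMapping.add_node_class(MyNode)

class MyMappingClass
  include ToyMapping
  my_node :foo, "bar", 42, :hi => "ho"
end

puts MyMappingClass.toy_nodes.size           # 1
puts MyMappingClass.toy_nodes.first.args.inspect
```

Creating instances of +MyMappingClass+ afterwards does not create any
further nodes; the single +MyNode+ instance belongs to the class.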
==== Node Operation during Marshalling and Unmarshalling
When an instance of a mapping class is created or filled from an XML
tree, xml-mapping will call +xml_to_obj+ on all nodes that were defined
in that mapping class for the {mapping}[aref:mappings] being used,
in the order of their definition. Two parameters will be passed: the
mapping class instance being created/filled, and the XML tree the
instance is being created/filled from. The implementation of
+xml_to_obj+ is expected to read whatever pieces of data it is
responsible for from the XML tree and put it into the appropriate
variables/attributes etc. of the instance.
When an instance of a mapping class is stored or filled into an XML
tree, xml-mapping will call +obj_to_xml+ on all nodes that were defined
in that mapping class for the {mapping}[aref:mappings] being used,
in the order of their definition, again passing as parameters the
mapping class instance being stored, and the XML tree the instance is
being stored/filled into. The implementation of +obj_to_xml+ is
expected to read whatever pieces of data it is responsible for from
the instance and put it into the appropriate XML elements/XML
attributes etc. of the XML tree.
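
The two loops described in this section can be modelled in a few
lines. To keep the sketch free of dependencies, a plain Hash stands in
for the XML tree and another Hash for the mapping class instance;
+ToyFieldNode+, +TOY_NODES+, +toy_load+ and +toy_save+ are hypothetical
names, and real nodes of course work on REXML elements instead.

```ruby
# A toy node that copies one value between "XML" (a Hash keyed by
# element name) and the object (a Hash keyed by attribute name).
class ToyFieldNode
  attr_reader :mapping

  def initialize(attrname, key, mapping = :_default)
    @attrname = attrname
    @key = key
    @mapping = mapping
  end

  def xml_to_obj(obj, xml)
    obj[@attrname] = xml.fetch(@key)
  end

  def obj_to_xml(obj, xml)
    xml[@key] = obj[@attrname]
  end
end

# Nodes in definition order; the last one belongs to another mapping.
TOY_NODES = [
  ToyFieldNode.new(:name, "name"),
  ToyFieldNode.new(:age, "age"),
  ToyFieldNode.new(:name, "full-name", :other)
].freeze

# Unmarshalling: call xml_to_obj on every node of the mapping in use.
def toy_load(xml, mapping = :_default)
  obj = {}
  TOY_NODES.select { |n| n.mapping == mapping }.each { |n| n.xml_to_obj(obj, xml) }
  obj
end

# Marshalling: call obj_to_xml on every node of the mapping in use.
def toy_save(obj, mapping = :_default)
  xml = {}
  TOY_NODES.select { |n| n.mapping == mapping }.each { |n| n.obj_to_xml(obj, xml) }
  xml
end

person = toy_load({ "age" => 30, "name" => "Jim" })
puts toy_save(person).keys.inspect          # ["name", "age"]
puts toy_save(person, :other).keys.inspect  # ["full-name"]
```

Since Ruby hashes keep insertion order, the saved keys come out in
node definition order, regardless of the order in the input.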
=== Basic Node Types Overview
The following is an overview of how initialization and
marshalling/unmarshalling is implemented in the node base classes
(Node, SingleAttributeNode, and SubObjectBaseNode).
TODO: summary table: member var name; introduced in class; meaning
==== Node
In _initialize_, the mapping class and the option arguments are
stripped from the argument list. The mapping class is stored in
@owner, the option arguments are stored (as a hash) in @options (the
hash will be empty if no options were given). The
{mapping}[aref:mappings] the node is defined in is determined
(:mapping option, last <tt>use_mapping</tt> or <tt>:_default</tt>) and
stored in @mapping. The node then stores itself in the list of nodes
of the mapping class belonging to the mapping
(<tt>@owner.xml_mapping_nodes(:mapping=>@mapping)</tt>; see
XML::Mapping::ClassMethods#xml_mapping_nodes). This list is the list
of nodes later used when marshalling/unmarshalling an instance of the
mapping class with respect to a given mapping. This means that node
implementors will not normally "see" anything of the mapping (they
don't need to access the @mapping variable) because the
marshalling/unmarshalling methods
(<tt>obj_to_xml</tt>/<tt>xml_to_obj</tt>) simply won't be called if
the node's mapping is not the same as the mapping the
marshalling/unmarshalling is happening with.
Furthermore, if :reader and/or :writer options were given,
<tt>xml_to_obj</tt> resp. <tt>obj_to_xml</tt> are transparently
overwritten on the node to delegate to the supplied :reader/:writer
procs.
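
One way to picture this "transparent overwriting" is a per-instance
method override. The following toy class is not xml-mapping code:
+ToyReaderNode+ is hypothetical, a Hash stands in for the XML tree, and
the sketch assumes that the :reader proc receives the object, the XML
tree and a callable for the default behaviour.

```ruby
class ToyReaderNode
  def initialize(options = {})
    reader = options[:reader]
    return unless reader

    # Keep a handle on the default implementation, then replace
    # xml_to_obj on this one node only.
    default_reader = method(:xml_to_obj)
    define_singleton_method(:xml_to_obj) do |obj, xml|
      reader.call(obj, xml, default_reader)
    end
  end

  # Default behaviour: copy the raw value.
  def xml_to_obj(obj, xml)
    obj[:value] = xml["value"]
  end
end

plain = ToyReaderNode.new
doubling = ToyReaderNode.new(:reader => proc { |obj, xml, default_reader|
  default_reader.call(obj, xml)            # run the default first
  obj[:value] = obj[:value].to_i * 2       # then post-process
})

a = {}
b = {}
plain.xml_to_obj(a, { "value" => "7" })
doubling.xml_to_obj(b, { "value" => "7" })
puts a[:value].inspect  # "7"
puts b[:value].inspect  # 14
```

Callers keep invoking +xml_to_obj+ as usual and never need to know
whether a given node carries a custom reader.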
The marshalling/unmarshalling methods
(<tt>obj_to_xml</tt>/<tt>xml_to_obj</tt>) are not implemented in
+Node+ (they just raise an exception).
==== SingleAttributeNode
In _initialize_, the attribute name is stripped from the argument list
and stored in @attrname, and an attribute of that name is added to the
mapping class the node belongs to.
During marshalling/unmarshalling of an object to/from XML,
single-attribute nodes only read/write a single piece of the object's