# =====================================================================
# xmlbreak.sed:
# break long [x]html input into lines, for better RCS versioning.
#
# Copyright (c) 2007,2008,2009 Carlo Strozzi
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; version 2 dated June, 1991.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# =====================================================================

# =====================================================================
# This program is meant to be used instead of tidy(1) on data produced
# by some buggy GUI editors, such as TinyMCE, whose broken markup will
# not pass through tidy. Browsers may still render such data correctly,
# so it can be acceptable HTML and yet broken XHTML. When XHTML mode is
# used, we therefore need to tweak the input data ourselves, since it
# is not going to be scanned by tidy(1). This filter also helps to keep
# RCS versioning files from bloating when an HTML editor sends all the
# data in one single line.
# =====================================================================
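
# A minimal usage sketch, assuming the script is saved as xmlbreak.sed;
# the input and output file names are hypothetical:
#
#   sed -f xmlbreak.sed page.xhtml > page.lines.xhtml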
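
# Turn CR+LF line-end convention into NL.
# (The CR is written as \r, which GNU sed understands; other seds may
# need a literal carriage-return character here.)
s/\r$//

# Same as above, but for Macintosh clients.
s/\r/\
/g

# This looks a bit elaborate at first glance, but we need to avoid
# inserting an additional newline at every repeated page editing.
# Requiring a character before the "<" leaves alone any tag that is
# already at the start of a line.

s/\(.\)<\([^\/]\)/\1\
<\2/g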
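
# A small worked example of the command above (the input line is made
# up for illustration). Given the single line
#
#   <p>one</p><p>two</p>
#
# the substitution yields
#
#   <p>one</p>
#   <p>two</p>
#
# The leading "<p>" has no character before it and each "</p>" starts
# with "</", so neither matches; only the second "<p>" gets a newline.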

# The TinyMCE WYSIWYG AJAX editor insists on turning hard line breaks
# into <br /> sequences, which will result in duplicated line breaks in
# <pre> element contents that end with real newlines. Although it is
# not up to TypeWriter to try and fix bugs of other programs, in this
# case it may be worth doing it since the solution is general enough.
# It sounds logical to me that if a <br /> follows a newline, then the
# newline can be safely removed, as the task of breaking the line in
# the rendered page is left to the <br /> itself.

s/\n<br \/>/<br \/>/g

# Make sure a blank is inserted before each newline, or XHTML 1.1 will
# swallow the spacing between the two words that are across the break.

s/ *$/ /

# EOF