Hyphenation in Docbook PDF Tables
Thursday, 17 December 2009 20:58
jho
Hyphenation is complicated. Better minds than mine have detailed why in a great swack of places, so I'm not going to repeat the reasons here. I mention it at all because one of the places where the complicated nature of hyphenation manifests is within tables (they sport short line lengths and therefore force more hyphenation decisions per word), and I recently came across a more elegant solution than the one I was previously using, which was simply to add zero-width spaces after every single character in every single table entry.
At PMC-Sierra, hyphenation within table cells is particularly problematic because many of the table cell entries are in formats that resist hyphenation: URLs, field names, formulas, and other types of content that offer no clues a normal hyphenation routine could use to figure out where the break should be. The result is that, with the unmodified Docbook XSL publishing scripts, these entries don't break at all and simply bleed into the next cell. Not a popular option with the reader community.
The template below can be inserted into your customization layer and addresses the vast majority of hyphenation corner cases. It does this by reading the contents of every table cell and inserting a potential hyphenation break after every character found in the tableentry.hyphenate.chars parameter.
In the event that Joomla messes up the following XSLT enough to make it unusable, just take a look at the hyphenate-url template in the stock Docbook XSL (I'm using 1.7.3) style sheets. It takes just a minor tweak to apply it to table entries. I think Bob Stayton pointed this one out to me. Use and enjoy.
<xsl:template name="hyphenate-tableentries">
  <xsl:param name="entry" select="''"/>
  <xsl:choose>
    <!-- No break content configured: pass the entry through unchanged -->
    <xsl:when test="$tableentry.hyphenate = ''">
      <xsl:value-of select="$entry"/>
    </xsl:when>
    <xsl:when test="string-length($entry) &gt; 1">
      <xsl:variable name="char" select="substring($entry, 1, 1)"/>
      <xsl:value-of select="$char"/>
      <xsl:if test="contains($tableentry.hyphenate.chars, $char)">
        <!-- Do not hyphen in-between // -->
        <xsl:if test="not($char = '/' and substring($entry,2,1) = '/')">
          <xsl:copy-of select="$tableentry.hyphenate"/>
        </xsl:if>
      </xsl:if>
      <!-- recurse to the next character -->
      <xsl:call-template name="hyphenate-tableentries">
        <xsl:with-param name="entry" select="substring($entry, 2)"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$entry"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- Two parameters used by the process: one to determine the list of
     characters to break words on, and the other to set the character you
     want to use to indicate hyphenation. In my case, I don't insert a
     character at all.
     Note: a truly empty tableentry.hyphenate makes the template above a
     pass-through (see the first xsl:when). If the value looks empty here,
     an invisible character such as a zero-width space may have been lost
     when this page was published. -->
<xsl:param name="tableentry.hyphenate.chars">,_.-/&amp;;:</xsl:param>
<xsl:param name="tableentry.hyphenate"></xsl:param>

<!-- call the template at every occurrence of the entry element -->
<xsl:template match="entry//text()">
  <xsl:call-template name="hyphenate-tableentries">
    <xsl:with-param name="entry" select="."/>
  </xsl:call-template>
</xsl:template>
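To see where the template above places its break opportunities, a small throwaway stylesheet can help. This is only a sketch under a few assumptions: the file name my-customization.xsl is a hypothetical stand-in for whatever customization layer holds the template, the sample string is invented, and the pipe character is used purely as a visible marker in place of an invisible break character.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical file name: the customization layer containing
       the hyphenate-tableentries template and its two parameters -->
  <xsl:import href="my-customization.xsl"/>

  <xsl:output method="text"/>

  <!-- Override the break content with a visible marker for debugging.
       Because this stylesheet imports the other one, this value wins. -->
  <xsl:param name="tableentry.hyphenate">|</xsl:param>

  <!-- Ignore the input document and run the template on a sample string -->
  <xsl:template match="/">
    <xsl:call-template name="hyphenate-tableentries">
      <xsl:with-param name="entry"
                      select="'http://example.com/REG_FIELD.name'"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

Run against any well-formed XML file (for example, xsltproc show-breaks.xsl any-file.xml, with both file names being placeholders), this should print http:|//|example.|com/|REG_|FIELD.|name. Each marker shows a spot where a line break would be allowed. Note that the double slash is kept together, and the final character is never followed by a break.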
Last Updated on Thursday, 04 March 2010 16:45
DeltaXML, changebars, and Docbook
Friday, 24 July 2009 21:28
jho
One of the most important things to understand as a Technical Communications manager, when you're transitioning your company's documentation and documentation process away from Microsoft Word and towards a centralized CMS, is that you're also transitioning away from being a departmental speedbump and towards being a potential barricade. This transition can be terrifying at certain moments, like the moments when a critical document needs to be published and your CMS is spewing error messages rather than neatly formatted PDFs.
This happened to me a while ago. When I attempted to publish a differenced version of a Docbook publication that had been created with the most excellent DeltaXML Core tool, the XEP processor began spewing "com.renderx.xep.lib.InternalException: No matching point found for change-bar" along with details about exactly which change-bar was lacking a companion change-bar-end. Digging into the FO code revealed that there was indeed a matching change-bar-end for the problematic opening tag, and we were off to the races: a missing closing tag could be a problem in my XSLT, but when the end is there and XEP can't find it, it looks more like a bug in XEP, which is roughly a thousand times more frightening. I can patch my XSLT. I can't patch XEP.
Two days' worth of troubleshooting later, it turned out that the problem was a known bug within XEP interacting with some extra spaces added to the output XSL-FO by the DeltaXML Docbook XSLT. It didn't happen everywhere in the document because it only crops up in conjunction with certain kinds of rollbacks that XEP performs to satisfy keep constraints on orphans, etc.
The solution was pretty simple. I had to modify the DeltaXML XSLT to add a zero-width character immediately before and after every change-bar-end. The document published, the screaming stopped, my ears healed, we all moved on.
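For anyone who would rather not edit the DeltaXML stylesheets directly, the same idea can be sketched as a separate pass over the generated FO file before it goes to XEP. This is not the change described above, just an illustration of it under stated assumptions: that the change-bar elements are the XSL 1.1 fo:change-bar-end in the standard FO namespace, and that the zero-width character is a zero-width space (U+200B).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Identity transform: copy everything through unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: U+200B is the zero-width character to use.
       Emit one immediately before and after every change-bar-end. -->
  <xsl:template match="fo:change-bar-end">
    <xsl:text>&#x200B;</xsl:text>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <xsl:text>&#x200B;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

One caveat: text is only allowed in certain FO contexts (inside fo:block or fo:inline, for example), so if change-bar-end elements also appear between blocks, the match pattern may need to be narrowed to the ones that sit inside text-bearing elements.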
Last Updated on Monday, 04 January 2010 15:35
Bluestream XDocs CMS 2.1
Tuesday, 21 April 2009 15:14
jho
There's a lot I could write about Bluestream and the XDocs product. The 1.1 version was...not well suited to the setting that we tried to deploy it into. The 2.1 version, however, is nearly as fast, stable, and pleasing as the 1.1 version was slow, buggy, and frustrating. It requires users to learn new ways of doing things. I don't apologize for that; change requires change, and this change is progress. It supports DITA beautifully and is adaptable to other schemas with some fairly minor development. The bundled XMLmind editor combined with the low licensing cost makes this system the undisputed value leader in its space. Any sysadmin who takes the time needed to learn how to leverage the combination of the two tools (XDocs and XMLmind) can't help but be impressed with what this system can do for such a stunningly low price. I don't work for Bluestream, have received no considerations from Bluestream, and have, in fact, directed a large volume of money towards Bluestream. If I have my way, I'll be directing more.
Last Updated on Tuesday, 03 November 2009 23:11
File under "My Shameful Secret"
Thursday, 12 November 2009 20:44
jho
One of the drums that I regularly beat when indoctrinating new users into an XML CMS is to stop thinking that layout and formatting matter so much; the content matters most, and the look of it is a detail. Authors who work in tools like Word tend to get lazy and start using formatting tricks to get around the necessity of clear language and clear thought. Writing clearly is difficult, and I'm firmly of the belief that most of the resistance to XML authoring comes from people discovering that they can no longer take the easy way out of writing their content. When one can't run back to funky formatting to make up for the shortcomings in one's writing, one needs to actually sit down, think, plan, and write like an adult. This is what I mean when I say that appearance is a detail and content is king. And I believe it, I really do.
However... However, the look of a long document that has been produced by the publishing scripts I've created at PMC thrills me. It's embarrassing, but true. A document that was produced in Word or FrameMaker or other tools that allow users to introduce tweaks to formatting may make the individual page that was tweaked look a little better, but when you're scanning a document hundreds or thousands of pages long, the reliable, mechanical, predictable layout and formatting of an XML-sourced document gives the document as a whole a solidity and air of professionalism that you can just never get otherwise. Everything in its place, everything flawlessly regular, every compromise a consistent compromise. It's gorgeous.
So the truth is that I'm just a layout junkie like the rest of them; I'm just a junkie on a different scale. What the heck; go big.
Last Updated on Monday, 01 February 2010 21:32
Current Employer: PMC-Sierra
Thursday, 23 April 2009 00:00
jho
My current contract is at PMC-Sierra, Inc. I've been contracted to manage the development and rollout of a Bluestream CMS while maintaining existing XML resources and systems. This contract will end December 31, 2010.
Addendum... It's a done deal. Bluestream XDocs 2.1 is up and running with users in Burnaby, Fremont, Shanghai, Bangalore, and misc. remote points. We are way out of beta and edging Microsoft Word out the door on a whole slew of fronts. We're getting one more significant upgrade to XMLmind, which should significantly improve performance when opening large Docbook files (600+ pages, 200+ images), and after that we're moving on to some more interesting problems like automating data mashups and Eclipse help generation. Good times.
Last Updated on Monday, 25 January 2010 16:07