Keith Denny
2014-09-05 02:49:52 UTC
Hello,
I am attempting to use POI to support a document/template tool and am
receiving unexpected results when parsing an XWPFDocument. Specifically,
when I review each line of text, the String returned from
XWPFRun.getText() is not the same text that is visible in the actual
document. Here are the details:
*Simple Use Case*
- Create an MS Word 2010 document, e.g. Test.docx. NOTE: Although I am
doing something similar to MS templates, I am not using a .dotx file;
my starting point is a .docx file.
- In the document, insert a *<<TAG>>* placeholder, such as 'Dear
<<CLIENT_NAME>>'. NOTE: In the Word document, the text 'Dear
<<CLIENT_NAME>>' sits entirely on a single line.
- The *<<TAG>>* is a placeholder that will be dynamically replaced by a
custom document management system. In this case, there is a system entity
tag with the identifier <<CLIENT_NAME>>. When the document is parsed, the
code checks whether an entity tag, such as <<CLIENT_NAME>>, exists in the
document and replaces it with a real runtime value.
*Simplified Code:*
    // mContent and req come from the surrounding application code.
    InputStream in = mContent.getBinaryStream();
    XWPFDocument _doc = new XWPFDocument(in);

    // Map of entity tag (e.g. "<<CLIENT_NAME>>") to its runtime value.
    @SuppressWarnings("unchecked")
    LinkedHashMap<String, String> _entityMap =
        (LinkedHashMap<String, String>) req.getSession().getAttribute("ENTITY_MAP");

    for (XWPFParagraph p : _doc.getParagraphs()) {
        for (XWPFRun r : p.getRuns()) {
            String text = r.getText(0);
            if (text == null) {
                continue;
            }
            for (String key : _entityMap.keySet()) {
                String tag = key.trim();
                if (text.contains(tag)) {
                    // Replace within this run's text only.
                    text = text.replace(tag, _entityMap.get(key));
                    r.setText(text, 0);
                }
            }
        }
    }
*Results:*
One call to r.getText(0) returns only '<<CLIENT_'; therefore, there is no
match when comparing against the entity tag <<CLIENT_NAME>>. The following
call to r.getText(0) returns only 'NAME>>'. Again, no match.
Sometimes r.getText(0) returns '<<CLIENT_NAME' and leaves the trailing '>>'
for the next call to r.getText(0). Again, no match.
Sometimes a tag is returned whole by XWPFRun.getText() and the
substitution occurs as planned.
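To make the symptom concrete without involving POI at all, here is a minimal sketch that models the text of each run as a plain String. The class name RunFragmentDemo and both helper methods are made up for illustration, and the separate 'Dear ' fragment is an assumption; only the '<<CLIENT_' and 'NAME>>' fragments come from the results above. It shows that a per-fragment check misses the tag, while the same fragments joined in order do contain it.

```java
import java.util.Arrays;
import java.util.List;

public class RunFragmentDemo {

    // Per-run check: true only if a single fragment holds the whole tag.
    public static boolean anyFragmentContains(List<String> fragments, String tag) {
        for (String fragment : fragments) {
            if (fragment != null && fragment.contains(tag)) {
                return true;
            }
        }
        return false;
    }

    // Joins the fragments in document order, skipping null entries.
    public static String joinFragments(List<String> fragments) {
        StringBuilder joined = new StringBuilder();
        for (String fragment : fragments) {
            if (fragment != null) {
                joined.append(fragment);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Assumed split of 'Dear <<CLIENT_NAME>>' into three fragments.
        List<String> fragments = Arrays.asList("Dear ", "<<CLIENT_", "NAME>>");
        String tag = "<<CLIENT_NAME>>";

        System.out.println(anyFragmentContains(fragments, tag));     // prints "false"
        System.out.println(joinFragments(fragments).contains(tag));  // prints "true"
    }
}
```

This only reproduces the matching problem; it does not write anything back to a document.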
*Questions*
1. If the literal string of characters in the actual MS Word document exists
on one single line of text, why does XWPFRun.getText() return the line as
multiple separate pieces of text?
2. How do I ensure that I get the actual line, as it exists in the MS Word
document, in POI so I can inspect and replace key text?
Any help would be greatly appreciated. Thank you in advance for your
feedback.
Sincerely,
Keith G. Denny