Discussion:
Reducing the size of POI
Jeff G
2011-05-26 21:44:57 UTC
I'm using POI strictly for *reading Excel xls & xlsx* documents. I'm using
this as part of a Java Web Start app with somewhat low bandwidth. POI is by
far my biggest size hog. Is there any way I can reduce the size of this?
Are all these libraries needed? This is what I have...

dom4j, poi, poi-contrib, poi-ooxml, poi-ooxml-schemas, poi-scratchpad,
xmlbeans

Is all this necessary? Over 10MB of stuff... yikes.

- Jeff
Nick Burch
2011-05-26 22:34:54 UTC
Post by Jeff G
I'm using POI strictly for *reading Excel xls & xlsx* documents. I'm
using this as part of a Java Web Start app with somewhat low bandwidth.
POI is by far my biggest size hog. Is there any way I can reduce the
size of this? Are all these libraries needed? This is what I have...
dom4j, poi, poi-contrib, poi-ooxml, poi-ooxml-schemas, poi-scratchpad,
xmlbeans
If you're just doing excel files, you can ditch poi-scratchpad and
poi-contrib. If you're happy to just work with .xls (not .xlsx), then you
can cut it back to only the main poi jar. If you need to work with .xlsx
files, then you need the xml related jars, the poi-ooxml jar, and the cut
down schemas (poi-ooxml-schemas). You might be able to shrink the
ooxml-schemas file by excluding the word and powerpoint related bits,
and likewise poi-ooxml by cutting out its xwpf and xslf parts, though I'm
not sure how much that'd save.

Nick
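
Nick's idea of excluding the Word and PowerPoint parts can be tried without a rebuild by copying a jar and skipping entries under chosen prefixes. What follows is only a sketch under stated assumptions: JarTrimmer is a hypothetical helper (not part of POI), and the prefixes in the usage line are assumptions about the jar layout that should be checked against the actual jar. Any remaining code that references a dropped class will fail at runtime, and a signed jar would need re-signing, so the reading paths you rely on need testing afterwards.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: copies a jar, leaving out entries under given path prefixes.
final class JarTrimmer {

    /** True if the entry name starts with any of the excluded prefixes. */
    static boolean isExcluded(String entryName, List<String> excludedPrefixes) {
        for (String prefix : excludedPrefixes) {
            if (entryName.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    /** Copies source to target without the excluded entries; returns how many were dropped. */
    static int copyWithout(Path source, Path target, List<String> excludedPrefixes)
            throws IOException {
        int dropped = 0;
        byte[] buffer = new byte[8192];
        try (ZipFile in = new ZipFile(source.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            Enumeration<? extends ZipEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (isExcluded(entry.getName(), excludedPrefixes)) {
                    dropped++;
                    continue;
                }
                // A fresh ZipEntry avoids carrying over the old compressed size.
                out.putNextEntry(new ZipEntry(entry.getName()));
                if (!entry.isDirectory()) {
                    try (InputStream data = in.getInputStream(entry)) {
                        int n;
                        while ((n = data.read(buffer)) != -1) {
                            out.write(buffer, 0, n);
                        }
                    }
                }
                out.closeEntry();
            }
        }
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        // args: <source jar> <target jar> <prefix> [<prefix> ...]
        List<String> prefixes = Arrays.asList(args).subList(2, args.length);
        int dropped = copyWithout(Paths.get(args[0]), Paths.get(args[1]), prefixes);
        System.out.println("Dropped " + dropped + " entries");
    }
}
```

A possible invocation, with placeholder jar file names: `java JarTrimmer poi-ooxml.jar poi-ooxml-trimmed.jar org/apache/poi/xwpf/ org/apache/poi/xslf/`.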
Jeff G
2011-05-26 23:58:02 UTC
Nick, great tips - thanks for the insight. The XML-related jars are the
largest, so I'm very interested in how to trim them. I opened them up, but I
can't tell by looking which folders are for word, powerpoint, xwpf, and xslf.

- Jeff
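
One way to see which folders are worth trimming is to total the compressed entry sizes per package directory. The sketch below uses only java.util.zip; the class name JarPackageSizes and the default depth of 4 are illustrative choices, not anything from POI. In poi-ooxml, the Word and PowerPoint code should appear under org/apache/poi/xwpf and org/apache/poi/xslf, with the Excel code under org/apache/poi/xssf; the layout of the schemas jar is best confirmed from the tool's own output rather than assumed.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: reports compressed bytes per package directory in a jar.
final class JarPackageSizes {

    /** Directory part of an entry name, cut to at most `depth` path segments. */
    static String prefixOf(String entryName, int depth) {
        String[] parts = entryName.split("/");
        int dirs = Math.min(depth, parts.length - 1); // last part is the file name
        if (dirs <= 0) {
            return "(root)";
        }
        return String.join("/", Arrays.copyOfRange(parts, 0, dirs));
    }

    /** Sums the compressed size of every file entry, grouped by directory prefix. */
    static Map<String, Long> sizesByPrefix(Path jar, int depth) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                long size = Math.max(0L, entry.getCompressedSize()); // -1 means unknown
                totals.merge(prefixOf(entry.getName(), depth), size, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // args: <jar> [<depth>]
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 4;
        sizesByPrefix(Paths.get(args[0]), depth)
                .forEach((prefix, bytes) -> System.out.printf("%10d  %s%n", bytes, prefix));
    }
}
```

Running it against each of the XML-related jars gives a rough upper bound on what dropping a given package could save, before any rebuild is attempted.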
Dave Fisher
2011-05-27 02:24:23 UTC
The poi-ooxml-schemas jar is built from the unit test coverage, so you can reduce it by giving up unit tests. You can delete them from the directory tree.

You'll need a source distro, and then you'll need to delete the parts of the directory tree you don't need. It should be clear what is what; you'll focus on keeping the XSSF, HSSF, SS, POIFS and OOXML base classes...

You'll then need to do your own build with ant.

http://poi.apache.org/howtobuild.html

Regards,
Dave
Mark Fortner
2011-05-27 05:05:43 UTC
This kinda begs the question "is POI modular enough". I've seen a number of
questions arising from people not having the right set of dependent
libraries. But having a lighter weight set of libraries would also be
useful. Perhaps as the original poster suggested, having a separate library
for each type of document would make things easier.

Since I don't tend to build POI I was wondering if it would be difficult to
modify the build to produce separate jars and to perhaps zip up the
dependencies that people keep neglecting to download?

Mark

Nick Burch
2011-05-27 11:12:21 UTC
Post by Mark Fortner
This kinda begs the question "is POI modular enough". I've seen a number of
questions arising from people not having the right set of dependent
libraries. But having a lighter weight set of libraries would also be
useful. Perhaps as the original poster suggested, having a separate library
for each type of document would make things easier.
Given the ratio of questions to the list for "I'm missing a bit of POI
because I've forgotten a jar" to "I don't want all of POI", I think the
push would possibly be towards a single monolithic jar!

There's quite a bit of code that's common between all the components, so
we'd end up with something like:
* poi-core
* poi-hssf
* poi-hslf
* poi-hwpf
* poi-all-other-scratchpad
* poi-ooxml-core
* poi-ooxml-xssf
* poi-ooxml-xwpf
* poi-ooxml-xslf
* poi-ooxml-schemas-core
* poi-ooxml-schemas-xssf
* poi-ooxml-schemas-xwpf
* poi-ooxml-schemas-xslf
and possibly something else... The risk of people missing something or
getting one from the wrong version seems much too high to me!

Also, people interested in getting a cut down version of POI are likely to
all have different requirements. If you want only excel, but also low
memory, then you can exclude much of the hssf usermodel and keep just the
low level parts. It all depends. I think it's probably better for people
with specific requirements to slice and dice it how they need.
Post by Mark Fortner
Since I don't tend to build POI I was wondering if it would be difficult
to modify the build to produce separate jars and to perhaps zip up the
dependencies that people keep neglecting to download?
If you download the binary release, then it has all the dependencies in
it, along with the POI jars and the documentation. If you use maven, it
handles fetching the dependencies for you. They're all already there...

Nick
Jeff G
2011-05-27 13:25:02 UTC
I'm sure there are probably technical reasons for the structure, but to
someone who's green to the Java world, fewer jars would make sense to me,
with a few options based on application, not file type: poi-common.jar,
poi-excel.jar, poi-word.jar, poi-powerpoint.jar. If you want all of Office
you have all four files; if you just need Excel, you have two.

What is more common - developer only wanting pre-2003 office support or
current support but for a particular application?

The current structure seems to break it up into core, xml core, and xml
schemas. Is the xml core used without the xml schema? If I were to only
need pre-2003 support, it would probably be simpler to remove the folder for
xml classes than what we'd have to do now to try and break up the
applications.

- Jeff
Nick Burch
2011-05-27 13:30:21 UTC
Post by Jeff G
I'm sure there are probably technical reasons for the structure, but to
someone who's green to the Java world, fewer jars would make sense to me.
You always need the main POI jar. If you just want excel .xls, stop there.
If you want the other binary file formats, add scratchpad.

If you want the ooxml formats, you add the poi-ooxml jar, a schemas jar,
and all the xml dependencies.
Post by Jeff G
What is more common - developer only wanting pre-2003 office support or
current support but for a particular application?
No idea, sorry. I think people tend to either want to write one format, or
read from all of them.
Post by Jeff G
The current structure seems to break it up into core, xml core, and xml
schemas. Is the xml core used without the xml schema?
No, but you have a choice of two schemas jars. You can either use the full
one, or the smaller "common parts" poi-ooxml-schemas one. That's one of
the main reasons for keeping it separate.
Post by Jeff G
If I were to only need pre-2003 support, it would probably be simpler to
remove the folder for xml classes than what we'd have to do now to try
and break up the applications.
If you want to only do binary formats, you need the main poi jar, and
scratchpad for the non excel formats. You don't need any of the xml jars
(POI or dependencies) if you want to only do the older formats.

Nick
Jochen Wiedmann
2011-05-27 06:35:42 UTC
Post by Jeff G
I'm using POI strictly for *reading Excel xls & xlsx* documents. I'm
using this as part of a Java Web Start app with somewhat low bandwidth. POI
is by far my biggest size hog. Is there any way I can reduce the size of
this? Are all these libraries needed? This is what I have...
A rather simple and, to me, very recommendable solution would be to
create a servlet that gets called by the applet and creates the excel
file. That way, you'd need absolutely no additional jar files in the
applet.

Jochen
--
I Am What I Am And That's All What I Yam (Popeye)
Jeff G
2011-06-03 21:31:25 UTC
So what about dom4j & xmlbeans? Are these required for xlsx?
Nick Burch
2011-06-03 21:40:35 UTC
Post by Jeff G
So what about dom4j & xmlbeans? Are these required for xlsx?
Yup, for the xml formats (xlsx, docx and pptx) you need:
* poi
* poi-scratchpad if working with .docx and .pptx
* poi-ooxml
* one of poi-ooxml-schemas or ooxml-schemas
* xmlbeans + its dependencies (e.g. dom4j + stax)

Nick
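
Since forgotten jars come up so often, a startup check along the lines of the list above may help after trimming. The sketch below is an assumption-laden illustration: PoiClasspathCheck is a hypothetical helper, and each marker class is simply one class believed to live in the named jar, which should be verified against the POI version in use. It covers only the .xlsx reading case and leaves stax out, on the assumption that the JDK in use already provides it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reports which jars appear to be missing from the classpath.
final class PoiClasspathCheck {

    /** Jar name -> one class expected inside it (assumed markers; verify for your version). */
    static Map<String, String> xlsxMarkers() {
        Map<String, String> markers = new LinkedHashMap<>();
        markers.put("poi", "org.apache.poi.ss.usermodel.Workbook");
        markers.put("poi-ooxml", "org.apache.poi.xssf.usermodel.XSSFWorkbook");
        markers.put("poi-ooxml-schemas or ooxml-schemas",
                "org.openxmlformats.schemas.spreadsheetml.x2006.main.CTWorkbook");
        markers.put("xmlbeans", "org.apache.xmlbeans.XmlObject");
        markers.put("dom4j", "org.dom4j.Document");
        return markers;
    }

    /** True if the class can be located, without running its static initializers. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Names of the jars whose marker class could not be found. */
    static List<String> missing(Map<String, String> markers) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> marker : markers.entrySet()) {
            if (!isPresent(marker.getValue())) {
                result.add(marker.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(xlsxMarkers());
        System.out.println(missing.isEmpty()
                ? "All marker classes found"
                : "Possibly missing jars: " + missing);
    }
}
```

A check like this only proves that one class per jar is reachable, so it complements rather than replaces actually opening a sample .xlsx file with the trimmed classpath.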
Jeff G
2011-06-08 14:41:26 UTC
Has anyone tried ProGuard on POI?
http://proguard.sourceforge.net/

- Jeff