Discussion:
POI 4.0.0 issues with new commons-compress library "InputStream of class [..] is not implementing InputStreamStatistics"
Jörn Franke
2018-09-29 21:54:16 UTC
Dear all,

as part of the HadoopOffice library (
https://github.com/zuinnote/hadoopoffice/wiki) we provide the functionality
to read office documents, such as MS Excel, on Big Data platforms, such as
Hadoop/Hive/Spark/Flink.

I want to release a new version supporting POI 4.0.0, but I have one
remaining blocking issue: The Big Data platforms use an old version of
commons-compress (between 1.4.x and 1.9.x). This means I am always running
into the exception in ZipArchiveThresholdInputStream "InputStream of class
[..] is not implementing InputStreamStatistics" (
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream.java?view=markup&pathrev=1832789
).

Unfortunately, updating these platforms to the latest commons-compress is
very intrusive and, for many organizations, not possible. I now need to find
a workaround. Alternative classpath settings do not work very well and
create another mess.

Do you have any idea how I can deal with this check? Can I somehow inject
InputStreamStatistics into my InputStream? Or can I somehow inject my own
ZipArchiveInputStream?
Alternatively, could Apache POI, instead of using ZipArchiveInputStream,
create another class POIZipArchiveInputStream that extends
ArchiveInputStream and implements InputStreamStatistics? This would remove
all my classpath issues with the Big Data platforms.


Thank you.

Best regards
Nick Burch
2018-09-29 22:43:39 UTC
Post by Jörn Franke
as part of the HadoopOffice library (
https://github.com/zuinnote/hadoopoffice/wiki) we provide the
functionality to read office documents, such as MS Excel, on Big Data
platforms, such as Hadoop/Hive/Spark/Flink.
We should probably list that on the website! Do you have a blurb of a few
paragraphs that we can use?
Post by Jörn Franke
I want to release a new version supporting POI 4.0.0, but I have one
remaining blocking issue: The Big Data platforms use an old version of
commons-compress (between 1.4.x and 1.9.x). This means I am always running
into the exception in ZipArchiveThresholdInputStream "InputStream of class
[..] is not implementing InputStreamStatistics" (
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream.java?view=markup&pathrev=1832789
).
We need that for security reasons - newer Java versions won't let us
protect against zip bomb attacks as they inconveniently hide the expansion
stats, so we had to switch to commons to guard against it.
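
To illustrate the kind of check being discussed, here is a minimal, self-contained sketch. It is not POI's actual code: `StreamStats` is a hypothetical stand-in for commons-compress' `InputStreamStatistics`, and `ThresholdStream`, `FakeInflatingStream` and the ratio parameter are names invented for this example. The idea is that the wrapper rejects any stream that cannot report compressed and uncompressed byte counts, because without those counts it cannot detect a suspicious expansion ratio.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ThresholdCheckSketch {

    /** Hypothetical stand-in for commons-compress' InputStreamStatistics. */
    public interface StreamStats {
        long getCompressedCount();
        long getUncompressedCount();
    }

    /** Hypothetical "decompressing" stream that reports fixed byte counts. */
    public static class FakeInflatingStream extends ByteArrayInputStream
            implements StreamStats {
        private final long compressedCount;
        private final long uncompressedCount;

        public FakeInflatingStream(byte[] data, long compressedCount,
                                   long uncompressedCount) {
            super(data);
            this.compressedCount = compressedCount;
            this.uncompressedCount = uncompressedCount;
        }

        @Override public long getCompressedCount() { return compressedCount; }
        @Override public long getUncompressedCount() { return uncompressedCount; }
    }

    /** Refuses streams without statistics and checks the ratio on each bulk read. */
    public static class ThresholdStream extends FilterInputStream {
        private final double minInflateRatio;

        public ThresholdStream(InputStream in, double minInflateRatio) {
            super(in);
            // Without byte counts there is nothing to guard with, so fail early.
            if (!(in instanceof StreamStats)) {
                throw new IllegalArgumentException("InputStream of class "
                        + in.getClass() + " is not implementing StreamStats");
            }
            this.minInflateRatio = minInflateRatio;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            int n = super.read(b, off, len);
            StreamStats stats = (StreamStats) in;
            long uncompressed = stats.getUncompressedCount();
            // A very small compressed/uncompressed ratio hints at a zip bomb.
            if (uncompressed > 0
                    && (double) stats.getCompressedCount() / uncompressed < minInflateRatio) {
                throw new IOException("Suspicious compression ratio, possible zip bomb");
            }
            return n;
        }
    }
}
```

In this sketch, a plain `ByteArrayInputStream` fails in the constructor, which mirrors the situation where an older stream class on the classpath does not provide the statistics.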
Post by Jörn Franke
Unfortunately, updating these platforms to the latest commons-compress is
very intrusive and for many organizations not possible.
Wave some CVEs at them and see if you can tempt an upgrade?

If not, you'd probably need to work with the commons folks to backport the
zip stats stuff to your old version, so you can keep the security stuff we
need? ***@commons is moderately quiet and fairly friendly :)

Nick
Jörn Franke
2018-09-29 23:31:37 UTC
Hi Nick,

thank you for the quick response. It is already on the POI web page. I
fully agree with you that we should always use the latest version with
security fixes (I already started to file bugs with some of the platforms).
With dependency shading this is possible in my case. The developers/users
will need to shade the dependencies for their application, but I provide
examples, so it is not such a big issue to change.

best regards
Jörn Franke
2018-09-29 23:14:36 UTC
Don't worry, I guess it was too late in the evening. I simply shade the
commons-compress dependency and everything seems to work (and I can still
keep the security mechanisms built into POI). Thanks, by the way, for 4.0.0.
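
For readers who have not used shading before, the fragment below sketches what relocating commons-compress could look like. The thread does not say which build tool is used, so this assumes Maven with the maven-shade-plugin. The `shaded.` prefix is an arbitrary, hypothetical choice. The application's POI and commons-compress dependencies must be bundled into the shaded jar for the relocation to have any effect.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <!-- pin a plugin version appropriate for your build -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Move the bundled commons-compress classes to a private package so
               they cannot clash with the older copy shipped by the platform. -->
          <relocation>
            <pattern>org.apache.commons.compress</pattern>
            <shadedPattern>shaded.org.apache.commons.compress</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Relocation rewrites both the bundled classes and the references to them in the bundled POI classes, so POI keeps using the newer library while the platform continues to see its own older copy under the original package name.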
pj.fanning
2018-09-29 23:50:15 UTC
I just logged https://issues.apache.org/jira/browse/HADOOP-15804 but it can
take a long time for upgrades in hadoop dependencies due to the large number
of projects and the complex relationships between them.


