POI 4.0.0 issues with new commons-compress library "InputStream of class [..] is not implementing InputStreamStatistics"

Discussion:

Jörn Franke

2018-09-29 21:54:16 UTC

Dear all,

as part of the HadoopOffice library (
https://github.com/zuinnote/hadoopoffice/wiki) we provide the functionality
to read office documents, such as MS Excel, on Big Data platforms, such as
Hadoop/Hive/Spark/Flink.

I want to release a new version supporting POI 4.0.0, but I have one
remaining blocking issue: The Big Data platforms use an old version of
commons-compress (between 1.4.x and 1.9.x). This means I am always running
into the exception in ZipArchiveThresholdInputStream "InputStream of class
[..] is not implementing InputStreamStatistics" (
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream.java?view=markup&pathrev=1832789
).

Unfortunately, updating these platforms to the latest commons-compress is
very intrusive and, for many organizations, not possible. I now need to find
a workaround for this. Alternative classpath settings do not work very
well and create another mess.
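
For readers who hit the same exception, a minimal diagnostic sketch follows. It only checks whether the InputStreamStatistics interface is visible to the class loader and reports which jar supplies ZipArchiveInputStream. The package names are the ones commons-compress is understood to use; the class name CompressClasspathCheck and its helper methods are hypothetical and not part of POI or commons-compress.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of POI or commons-compress.
class CompressClasspathCheck {

    /** Returns true if the named class can be loaded without initializing it. */
    public static boolean isPresent(String className) {
        try {
            Class.forName(className, false, CompressClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Describes the jar or directory a class was loaded from, if known. */
    public static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null ? "bootstrap or unknown location" : String.valueOf(source.getLocation());
    }

    public static void main(String[] args) {
        // Assumed fully qualified names of the commons-compress types discussed in the thread.
        String stats = "org.apache.commons.compress.utils.InputStreamStatistics";
        String zip = "org.apache.commons.compress.archivers.zip.ZipArchiveInputStream";

        System.out.println("InputStreamStatistics present: " + isPresent(stats));
        try {
            Class<?> zipClass = Class.forName(zip, false, CompressClasspathCheck.class.getClassLoader());
            System.out.println("ZipArchiveInputStream loaded from: " + locationOf(zipClass));
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println("commons-compress is not on the classpath: " + e);
        }
    }
}
```

Run inside the platform (for example from a Spark or Hadoop job), this should show whether the platform's older commons-compress jar is the one that wins on the classpath.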

Do you have any idea how I can deal with this check? Can I somehow
inject InputStreamStatistics into my InputStream? Or can I somehow inject my
own ZipArchiveInputStream?
Alternatively, could Apache POI, instead of using ZipArchiveInputStream,
create another class POIZipArchiveInputStream and let this custom class
extend ArchiveInputStream and implement InputStreamStatistics? This would
remove all my classpath issues with the Big Data platforms.

Thank you.

Best regards

Nick Burch

2018-09-29 22:43:39 UTC

Post by Jörn Franke
as part of the HadoopOffice library (
https://github.com/zuinnote/hadoopoffice/wiki) we provide the
functionality to read office documents, such as MS Excel, on Big Data
platforms, such as Hadoop/Hive/Spark/Flink.

We should probably list that on the website! Do you have a few-paragraph
blurb we can use?

Post by Jörn Franke
I want to release a new version supporting POI 4.0.0, but I have one
remaining blocking issue: The Big Data platforms use an old version of
commons-compress (between 1.4.x and 1.9.x). This means I am always running
into the exception in ZipArchiveThresholdInputStream "InputStream of class
[..] is not implementing InputStreamStatistics" (
https://svn.apache.org/viewvc/poi/trunk/src/ooxml/java/org/apache/poi/openxml4j/util/ZipArchiveThresholdInputStream.java?view=markup&pathrev=1832789
).

We need that for security reasons - newer Java versions won't let us
protect against zip bomb attacks, as they inconveniently hide the expansion
stats, so we had to switch to commons-compress to guard against them.
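
To illustrate why the expansion stats matter, here is a rough sketch of a ratio-based zip bomb guard. It is not POI's implementation: it uses only java.util.zip, reading the compressed and uncompressed byte counts from an Inflater, which is the kind of information InputStreamStatistics exposes for commons-compress streams. The class InflateRatioGuard, its methods, and the thresholds (a minimum ratio of 0.01 after 100,000 bytes of output) are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch of a compression-ratio guard; not POI's actual code.
class InflateRatioGuard {

    /** Compresses the given bytes with DEFLATE so the guard has something to read. */
    public static byte[] deflate(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(plain);
        }
        return out.toByteArray();
    }

    /**
     * Inflates the input and returns the number of uncompressed bytes read.
     * Throws IOException if, after graceBytes of output, the ratio of
     * compressed to uncompressed bytes falls below minRatio.
     */
    public static long inflateGuarded(byte[] compressed, double minRatio, long graceBytes)
            throws IOException {
        Inflater inflater = new Inflater();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed), inflater)) {
            byte[] buffer = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
                long compressedCount = inflater.getBytesRead();      // compressed bytes consumed so far
                long uncompressedCount = inflater.getBytesWritten(); // bytes produced so far
                if (uncompressedCount > graceBytes
                        && (double) compressedCount / uncompressedCount < minRatio) {
                    throw new IOException("Possible zip bomb: " + compressedCount
                            + " compressed bytes expanded to " + uncompressedCount);
                }
            }
            return total;
        } finally {
            inflater.end(); // a caller-supplied Inflater is not released by closing the stream
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zeros = new byte[1_000_000]; // highly compressible, so the guard should trip
        try {
            inflateGuarded(deflate(zeros), 0.01, 100_000);
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without access to both counters while reading, a check like this cannot be made, which matches the reason given above for requiring a stream that implements InputStreamStatistics.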

Post by Jörn Franke
Unfortunately, updating these platforms to the latest commons-compress is
very intrusive and for many organizations not possible.

Wave some CVEs at them and see if you can tempt an upgrade?

If not, you'd probably need to work with the commons folks to backport the
zip stats stuff to your old version, so you can keep the security stuff we
need? ***@commons is moderately quiet and fairly friendly :)

Nick

Jörn Franke

2018-09-29 23:31:37 UTC

Hi Nick,

thank you for the quick response. It is already on the POI web page. I
fully agree with you that we should always use the latest version with
security fixes (I already started to file bugs with some of the platforms).
With dependency shading this is possible in my case. The developers/users
will need to shade the dependencies for their application, but I provide
examples, so it is not such a big issue to change.

best regards

Jörn Franke

2018-09-29 23:14:36 UTC

Don't worry, I guess it was too late in the evening. I simply shade the
commons-compress dependency and everything seems to work (and I can still
keep POI's integrated security mechanisms). Thanks, by the way, for 4.0.0.

pj.fanning

2018-09-29 23:50:15 UTC

I just logged https://issues.apache.org/jira/browse/HADOOP-15804, but
upgrades to Hadoop dependencies can take a long time due to the large number
of projects and the complex relationships between them.

--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html
