Discussion:
Number Formatting in HSSF
Frank Prins
2009-03-09 08:14:01 UTC
Permalink
Dear all,



I would like to make sure if I am correct on the number formatting, when
reading from Excel with the HSSF libraries. I am reading an xls file,
created with Excel 2003, and wonder about the possibilities to extract
the numbers from a sheet.

Suppose a user enters a number like '3.1415' in a cell, which has a
number formatting with two decimals, there are several manners of state
of the number has within the file:

- the floating point number which Excel uses internally, could be
something like '3.14150000003'

- the formatted number, shown in Excel, '3.14'

- the original number, '3.1415'



Am I correct in saying, both the first two occurrences can be retrieved
using POI, with:

a) String value = cell.toString();

b) HSSFDataFormatter formatter = new HSSFDataFormatter();

String value = formatter.formatCellValue(cell);

c) HSSFDataFormatter formatter = new HSSFDataFormatter();

double tariffDoubleValue = cell.getNumericCellValue();

String value = formatter.formatRawCellContents(tariffDoubleValue, 0,
"#.####");



Using the last possibility, you need to make the assumption that all
numbers are entered using the format "#,####"



Am I correct in this approach, or is there a possibility to retrieve the
original entered number? Tried to find some an answers in the list
archive and api, but could not find it.



thanks & regards,

Frank
MSB
2009-03-09 15:14:09 UTC
Permalink
Part of your problem can be explained far better by a reply that Josh made to
a similar sort of question that was concerned with floats becoming doubles;

"For clarity, here is the code which seems to reproduce your problem:

HSSFWorkbook wb = new HSSFWorkbook();
wb.createSheet("Sheet1").createRow(0).createCell(0).setCellValue(34.44F); //
F for float literal
OutputStream os = new FileOutputStream("Test34.44.xls");
wb.write(os);
os.close();

In Excel(2007) the cell value displays as 34.4399986267089.

POI uses double (forcing a conversion from float) because that's what Excel
needs.

The root of the problem is that many seemingly simple decimal values cannot
be representable exactly with an IEEE floating point (for example 0.01).
In this example, the float->double conversion seems to be the main source of
the precision loss, as seen with the following code:
System.out.println((double)34.44F); // outputs "34.439998626708984"

It's not just in type conversion that these floating point rounding
limitations can be seen. The following code uses only doubles:
System.out.println(0.05+0.01); // outputs "0.060000000000000005"

Since, POI/Excel uses doubles, doubles have higher precision than floats and
conversions are especially prone to rounding errors, your best bet is to use
doubles as much as possible. This won't eliminate all rounding errors, but
should improve things a lot. "


The reason I quoted that is that you may not be able to get exactly the
number the user entered owing to this sort of problem; but you should be
able to get very close.

Firstly, can I ask why you are extracting the value from the cell as a
String? Typically, you would do something like this, an example I have
copied from the Quick Guide here
http://poi.apache.org/spreadsheet/quick-guide.html#CellContents
http://poi.apache.org/spreadsheet/quick-guide.html#CellContents ;

Sheet sheet1 = wb.getSheetAt(0);
for (Row row : sheet1) {
for (Cell cell : row) {
CellReference cellRef = new CellReference(row.getRowNum(),
cell.getCellNum());
System.out.print(cellRef.formatAsString());
System.out.print(" - ");

switch(cell.getCellType()) {
case Cell.CELL_TYPE_STRING:

System.out.println(cell.getRichStringCellValue().getString());
break;
case Cell.CELL_TYPE_NUMERIC:
if(DateUtil.isCellDateFormatted(cell)) {

System.out.println(cell.getDateCellValue());
} else {

System.out.println(cell.getNumericCellValue());
}
break;
case Cell.CELL_TYPE_BOOLEAN:
System.out.println(cell.getBooleanCellValue());
break;
case Cell.CELL_TYPE_FORMULA:
System.out.println(cell.getCellFormula());
break;
default:
System.out.println();
}
}
}

As you can see, the basic process is to test the cell's type and call the
appropriate method to retrieve the contents as a double, String, etc. Excel
stores - at least I believe it does - numeric values as doubles, so this is
likely to get you as close as possible to the value actually stored in the
cell. It could be though that this is not the exact value the user entered
owing to the issues Josh outlined. I would guess the only way to be certain
to get excatly the value the user entered would be to use numeric String
literal values throughout.

Turning now to seeing the formatted value of the cell, you should not have
to provide the formatting String. It should be possible to call the
formatCellValue() method of the DataFormater class directly, passing the
reference to the cell and receiving a String containing the cell's contents
correctly formatted. I am not at my own machine so cannot test this piece of
code but it should work;

// Assuming that you have opened a Workbook, got the reference to a sheet,
row and then
// a cell, storing the latter into a variable called cell you should be able
to do something like this;

org.apache.poi.ss.usermodel.DataFormatter formatter = new
org.apache.poi.ss.usermodel.DataFormatter();
System.out.println(formatter.formatCellValue(cell));

That method call should use the data format that was applied to the cell
when it was added to the sheet. It means you do not need to supply the data
format String.

If this code fails, please let me know. I will be home later and can try to
put something better together. We are an Office free zone at work now -
thankfully - and so I cannot test anything here with absolute confidence,
sorry.
Post by Frank Prins
Dear all,
I would like to make sure if I am correct on the number formatting, when
reading from Excel with the HSSF libraries. I am reading an xls file,
created with Excel 2003, and wonder about the possibilities to extract
the numbers from a sheet.
Suppose a user enters a number like '3.1415' in a cell, which has a
number formatting with two decimals, there are several manners of state
- the floating point number which Excel uses internally, could be
something like '3.14150000003'
- the formatted number, shown in Excel, '3.14'
- the original number, '3.1415'
Am I correct in saying, both the first two occurrences can be retrieved
a) String value = cell.toString();
b) HSSFDataFormatter formatter = new HSSFDataFormatter();
String value = formatter.formatCellValue(cell);
c) HSSFDataFormatter formatter = new HSSFDataFormatter();
double tariffDoubleValue = cell.getNumericCellValue();
String value = formatter.formatRawCellContents(tariffDoubleValue, 0,
"#.####");
Using the last possibility, you need to make the assumption that all
numbers are entered using the format "#,####"
Am I correct in this approach, or is there a possibility to retrieve the
original entered number? Tried to find some an answers in the list
archive and api, but could not find it.
thanks & regards,
Frank
--
View this message in context: http://www.nabble.com/Number-Formatting-in-HSSF-tp22408718p22413448.html
Sent from the POI - User mailing list archive at Nabble.com.
Frank Prins
2009-03-10 08:59:42 UTC
Permalink
Hello Mark,

Thank you for your extensive answer, it is getting a bit more clear for
me now, although I run into difficulties with not being able to
duplicate the problem.

It definitively has to do with the float to double conversion here.
Using getNumericCellValue() sometimes returns me a value not being
exactly the value entered by the user. For example, I am looking at a
field in which the user entered the value 7,8268
This looks like a number with four decimals, but somehow when using
getNumericCellValue() it shows up as being 7.8267999999999995. Strange
thing is I cannot duplicate this, when typing the number in myself, it
keeps showing up like 7.8268, so correct.
Anyhow, I experimented a bit more with the retrieval of the value, and
found out I can overcome the above issue when casting to a float, like:

double doubleValue = cell.getNumericCellValue();
// format the number as floating point:
value = String.valueOf((float) doubleValue);

In this case I get exactly what the user originally used as input.
BTW, in my code, I indeed do a check on celltype. I tried to copy just
the part of code necessary for the issue I had...
Oh, and getting the value as a string.... I am writing the values to an
XML file, so that's why I took it as a String, no need for further math
here.

Thanks,
Frank

PS I can only dream of an Office-less office :-)
MSB
2009-03-10 16:18:13 UTC
Permalink
Glad to know that all of that was of some help, must admit I worried after
reading it back.

If you do ultimately want the number as a String, have you tried getting the
cell value as a double and then formatting it using the formatCellValue()
method yet? You may be lucky and find that the format set for the cell
manages to circumvent some of your problems and of course the
formatCellValue() method returns a String ready for inclusion into the xml
file. The only catch that I can think of is if the user has set a format
that includes - for example - five digits after the decimal point and then
they enter a floating point value with just four digits following the
decimal point.

Getting rid of Office has been a double edged sword in many ways - you would
be surprised how 'exercised' users can be when an operation they performed
with Excel is simply not supported by OpenOffice Calc; just as an example,
you can select many different rows in Excel and delete the lot, Calc will
not alow you to delete a series of non contiguous rows. Lots of fun.
Post by Frank Prins
Hello Mark,
Thank you for your extensive answer, it is getting a bit more clear for
me now, although I run into difficulties with not being able to
duplicate the problem.
It definitively has to do with the float to double conversion here.
Using getNumericCellValue() sometimes returns me a value not being
exactly the value entered by the user. For example, I am looking at a
field in which the user entered the value 7,8268
This looks like a number with four decimals, but somehow when using
getNumericCellValue() it shows up as being 7.8267999999999995. Strange
thing is I cannot duplicate this, when typing the number in myself, it
keeps showing up like 7.8268, so correct.
Anyhow, I experimented a bit more with the retrieval of the value, and
double doubleValue = cell.getNumericCellValue();
value = String.valueOf((float) doubleValue);
In this case I get exactly what the user originally used as input.
BTW, in my code, I indeed do a check on celltype. I tried to copy just
the part of code necessary for the issue I had...
Oh, and getting the value as a string.... I am writing the values to an
XML file, so that's why I took it as a String, no need for further math
here.
Thanks,
Frank
PS I can only dream of an Office-less office :-)
---------------------------------------------------------------------
--
View this message in context: http://www.nabble.com/Number-Formatting-in-HSSF-tp22408718p22437871.html
Sent from the POI - User mailing list archive at Nabble.com.
Frank Prins
2009-03-11 07:17:29 UTC
Permalink
Hey Mark,

Don't be worried, it all seems to work, at least enough to go on... One
thing I do not understand, is what you mean with getting the value as a
double, and then formatting it?
The formatCellValue works on the cell itself, not on a double as far as
I know?
Anyay, the code as printed in my last comment seem to work.

Thanks again for your time, regards,
frank
-----Original Message-----
Sent: dinsdag 10 maart 2009 17:18
Subject: RE: Number Formatting in HSSF
Glad to know that all of that was of some help, must admit I worried after
reading it back.
If you do ultimately want the number as a String, have you tried getting the
cell value as a double and then formatting it using the
formatCellValue()
method yet? You may be lucky and find that the format set for the cell
manages to circumvent some of your problems and of course the
formatCellValue() method returns a String ready for inclusion into the xml
file. The only catch that I can think of is if the user has set a format
that includes - for example - five digits after the decimal point and then
they enter a floating point value with just four digits following the
decimal point.
Getting rid of Office has been a double edged sword in many ways - you would
be surprised how 'exercised' users can be when an operation they performed
with Excel is simply not supported by OpenOffice Calc; just as an example,
you can select many different rows in Excel and delete the lot, Calc will
not alow you to delete a series of non contiguous rows. Lots of fun.
Post by Frank Prins
Hello Mark,
Thank you for your extensive answer, it is getting a bit more clear
for
Post by Frank Prins
me now, although I run into difficulties with not being able to
duplicate the problem.
It definitively has to do with the float to double conversion here.
Using getNumericCellValue() sometimes returns me a value not being
exactly the value entered by the user. For example, I am looking at a
field in which the user entered the value 7,8268
This looks like a number with four decimals, but somehow when using
getNumericCellValue() it shows up as being 7.8267999999999995.
Strange
Post by Frank Prins
thing is I cannot duplicate this, when typing the number in myself,
it
Post by Frank Prins
keeps showing up like 7.8268, so correct.
Anyhow, I experimented a bit more with the retrieval of the value,
and
Post by Frank Prins
found out I can overcome the above issue when casting to a float,
double doubleValue = cell.getNumericCellValue();
value = String.valueOf((float) doubleValue);
In this case I get exactly what the user originally used as input.
BTW, in my code, I indeed do a check on celltype. I tried to copy
just
Post by Frank Prins
the part of code necessary for the issue I had...
Oh, and getting the value as a string.... I am writing the values to
an
Post by Frank Prins
XML file, so that's why I took it as a String, no need for further
math
Post by Frank Prins
here.
Thanks,
Frank
PS I can only dream of an Office-less office :-)
---------------------------------------------------------------------
--
View this message in context: http://www.nabble.com/Number-Formatting-
in-HSSF-tp22408718p22437871.html
Sent from the POI - User mailing list archive at Nabble.com.
---------------------------------------------------------------------
MSB
2009-03-11 07:43:06 UTC
Permalink
Sorry Frank, terminal brain fade. You are quite correct, it does work on the
cell and I have no idea what I was thinking about when I mentioned the
double value!!
Post by Frank Prins
Hey Mark,
Don't be worried, it all seems to work, at least enough to go on... One
thing I do not understand, is what you mean with getting the value as a
double, and then formatting it?
The formatCellValue works on the cell itself, not on a double as far as
I know?
Anyay, the code as printed in my last comment seem to work.
Thanks again for your time, regards,
frank
-----Original Message-----
Sent: dinsdag 10 maart 2009 17:18
Subject: RE: Number Formatting in HSSF
Glad to know that all of that was of some help, must admit I worried after
reading it back.
If you do ultimately want the number as a String, have you tried getting the
cell value as a double and then formatting it using the
formatCellValue()
method yet? You may be lucky and find that the format set for the cell
manages to circumvent some of your problems and of course the
formatCellValue() method returns a String ready for inclusion into the xml
file. The only catch that I can think of is if the user has set a format
that includes - for example - five digits after the decimal point and then
they enter a floating point value with just four digits following the
decimal point.
Getting rid of Office has been a double edged sword in many ways - you would
be surprised how 'exercised' users can be when an operation they performed
with Excel is simply not supported by OpenOffice Calc; just as an example,
you can select many different rows in Excel and delete the lot, Calc will
not alow you to delete a series of non contiguous rows. Lots of fun.
Post by Frank Prins
Hello Mark,
Thank you for your extensive answer, it is getting a bit more clear
for
Post by Frank Prins
me now, although I run into difficulties with not being able to
duplicate the problem.
It definitively has to do with the float to double conversion here.
Using getNumericCellValue() sometimes returns me a value not being
exactly the value entered by the user. For example, I am looking at
a
Post by Frank Prins
field in which the user entered the value 7,8268
This looks like a number with four decimals, but somehow when using
getNumericCellValue() it shows up as being 7.8267999999999995.
Strange
Post by Frank Prins
thing is I cannot duplicate this, when typing the number in myself,
it
Post by Frank Prins
keeps showing up like 7.8268, so correct.
Anyhow, I experimented a bit more with the retrieval of the value,
and
Post by Frank Prins
found out I can overcome the above issue when casting to a float,
double doubleValue = cell.getNumericCellValue();
value = String.valueOf((float) doubleValue);
In this case I get exactly what the user originally used as input.
BTW, in my code, I indeed do a check on celltype. I tried to copy
just
Post by Frank Prins
the part of code necessary for the issue I had...
Oh, and getting the value as a string.... I am writing the values to
an
Post by Frank Prins
XML file, so that's why I took it as a String, no need for further
math
Post by Frank Prins
here.
Thanks,
Frank
PS I can only dream of an Office-less office :-)
---------------------------------------------------------------------
--
View this message in context: http://www.nabble.com/Number-Formatting-
in-HSSF-tp22408718p22437871.html
Sent from the POI - User mailing list archive at Nabble.com.
---------------------------------------------------------------------
---------------------------------------------------------------------
--
View this message in context: http://www.nabble.com/Number-Formatting-in-HSSF-tp22408718p22450107.html
Sent from the POI - User mailing list archive at Nabble.com.
Loading...