Binary file IO: Converting imported sequences of chars to desired type
Jorgen Grahn  
Newsgroups: comp.lang.c++
From: Jorgen Grahn <grahn+n...@snipabacken.se>
Date: 4 Nov 2009 21:47:48 GMT
Local: Thurs, Nov 5 2009 9:17 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type

I disagree there, in two ways:

- I belong to the school that claims protocols should be human-readable,
  because, well, it opens them up.  They get so much easier to
  manipulate, and even talk about.  Take HTTP as an example, or SMTP.

- I doubt that programmers are that good with hex.  Even if I limit
  myself to unsigned int, I can't tell what 0xbabe is.  Probably 40000
  or so. Or 30000?  Who knows?  There is a reason decimal is the default
  base in pretty much every language I know of ... including assembly
  languages.
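
For anyone who wants to check the hex value rather than guess, a minimal C++ sketch follows. The helper name `hex_to_unsigned` is made up for illustration and is not from this thread.

```cpp
#include <string>

// Hypothetical helper: parse a hexadecimal string such as "babe" or "0xbabe".
unsigned long hex_to_unsigned(const std::string& text)
{
    // Base 16; in this base std::stoul accepts an optional "0x" prefix.
    return std::stoul(text, nullptr, 16);
}
```

Calling `hex_to_unsigned("0xbabe")` gives 47806, so neither guess above is very close.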

...

> Since what we're talking about is only relevant for huge amounts of
> data, doing anything more with that data than just a cursory look at
> some numbers (which IMO is fine in octal or hex) generally needs a
> program anyway.

But for the text version of the data, that "program" is often a Unix
pipeline involving tools like grep, sort and uniq, or a Perl one-liner
you make up as you go.  Or it can be fed directly into gnuplot or
Excel. If the data is binary, you probably simply won't bother.

I think we have been misled a bit here, too. I haven't read the whole
thread, but it started with something like "dump a huge array of
floats to disk, collect it later".  If you take the more common case
"take this huge complex data structure and dump it to disk in a
portable format", you have a completely different situation, where the
non-text format isn't that much smaller or faster.

/Jorgen

--
  // Jorgen Grahn <grahn@  Oo  o.   .  .
\X/     snipabacken.se>   O  o   .


Brian  
Newsgroups: comp.lang.c++
From: Brian <c...@mailvault.com>
Date: Thu, 5 Nov 2009 15:36:04 -0800 (PST)
Local: Fri, Nov 6 2009 11:06 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 4, 3:47 pm, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:

I guess you're saying that the results are closer in some
cases because there's a lot of non-numeric data involved
in those complex data structures.  But aren't you ignoring
scientific applications where the majority of the data is
numeric?

Much earlier in the thread, Allnor wrote, "Binary files
are usually about 20%-70% of the size of the text file,
depending on numbers of significant digits and other
formatting text glyphs."  I don't think anyone has
directly disagreed with that statement yet.

Brian Wood
Ebenezer Enterprises
www.webEbenezer.net

"How much better is it to get wisdom than gold! and to
get understanding rather to be chosen than silver!"
Proverbs 16:16


James Kanze  
Newsgroups: comp.lang.c++
From: James Kanze <james.ka...@gmail.com>
Date: Fri, 6 Nov 2009 01:03:33 -0800 (PST)
Local: Fri, Nov 6 2009 8:33 pm
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 5, 11:36 pm, Brian <c...@mailvault.com> wrote:

> On Nov 4, 3:47 pm, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:

    [...]

> > I think we have been misled a bit here, too. I haven't read
> > the whole thread, but it started with something like "dump a
> > huge array of floats to disk, collect it later".  If you
> > take the more common case "take this huge complex data
> > structure and dump it to disk in a portable format", you
> > have a completely different situation, where the non-text
> > format isn't that much smaller or faster.
> I guess you're saying that the results are closer in some
> cases because there's a lot of non-numeric data involved in
> those complex data structures.  But aren't you ignoring
> scientific applications where the majority of the data is
> numeric?

He spoke of the "more common case".  Certainly, most common
cases do include a lot of text data.  On the other hand, the
origin of this thread was dumping doubles: purely numeric data.
And while perhaps less common, they do exist, and aren't really
rare either.  (I've encountered them once or twice in my career,
and I'm not a numerics specialist.)

> Much earlier in the thread, Allnor wrote, "Binary files
> are usually about 20%-70% of the size of the text file,
> depending on numbers of significant digits and other
> formatting text glyphs."  I don't think anyone has
> directly disagreed with that statement yet.

The original requirement, if I remember correctly, included
rereading the data with no loss of precision.  This means 17
digits precision for an IEEE double, with an added sign, decimal
point and four or five characters for the exponent (using
scientific notation).  Add a separator, and that's 24 or 25
bytes, rather than 8.  So the 20% is off; 33% seems to be the
lower limit.  But in a lot of cases, that's a lot; it's
certainly something that has to be considered in some
applications.
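
The byte count above can be checked with a small C++ sketch. The function names `to_lossless_text` and `from_text` are illustrative only, and the sketch assumes an IEEE 754 double, for which `max_digits10` is 17.

```cpp
#include <cstdlib>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Format a double in scientific notation with enough digits to read it
// back without loss (max_digits10 is 17 for an IEEE 754 double).
std::string to_lossless_text(double value)
{
    std::ostringstream out;
    // In scientific mode, precision counts the digits after the decimal
    // point, so max_digits10 - 1 gives 17 significant digits in total.
    out << std::scientific
        << std::setprecision(std::numeric_limits<double>::max_digits10 - 1)
        << value;
    return out.str();
}

// Parse the text back into a double; std::strtod is used for simplicity.
double from_text(const std::string& text)
{
    return std::strtod(text.c_str(), nullptr);
}
```

On a typical implementation a negative value such as `-1.0 / 3.0` formats to 23 or 24 characters, depending on how many exponent digits the library prints. With one separator that is the 24 or 25 bytes per value mentioned above, against 8 bytes in binary.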

--
James Kanze


Rune Allnor  
Newsgroups: comp.lang.c++
From: Rune Allnor <all...@tele.ntnu.no>
Date: Fri, 6 Nov 2009 08:51:03 -0800 (PST)
Local: Sat, Nov 7 2009 4:21 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On 6 Nov, 10:03, James Kanze <james.ka...@gmail.com> wrote:

As I recall, I started by a purely technical question about
binary typecasts. Others started bringing in text formats.
I have only attempted to explain - in vain, it seems - why
text-based numerical formats are a no-go in technical
applications.

> Certainly, most common
> cases do include a lot of text data.

I am not talking about 'common' cases. I am talking about heavy-duty
work. Once you are talking about numeric data in the hundreds of
MBytes (regardless of the storage format), any amount of accompanying
text is irrelevant. One page of plain text takes about 2 kbytes.

There was, in fact, an 'improvement' to the ancient SEG-Y seismic
data format,

http://en.wikipedia.org/wiki/SEG_Y

the SEG-2,

http://diwww.epfl.ch/lami/detec/seg2.html

where a lot of the auxiliary (numeric) information was specified
to be stored in text format. I first saw the SEG-2 spec about ten
years ago, but I have never heard that it has actually been used.
The speed losses involved with converting data back and forth from
text to binary would fully explain why SEG-2 has not gained
widespread acceptance among the heavy-duty users.

Rune


Brian  
Newsgroups: comp.lang.c++
From: Brian <c...@mailvault.com>
Date: Fri, 6 Nov 2009 11:54:01 -0800 (PST)
Local: Sat, Nov 7 2009 7:24 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 6, 3:03 am, James Kanze <james.ka...@gmail.com> wrote:

I've worked on one scientific application for a little over
six months.  I hope to work with/on more scientific projects
in the future.

Yes.  I brought it up because I wasn't sure if Grahn was
agreeing with something Fiedler said about it being just a few
more bytes.  Even if it were 70% I wouldn't describe that as
a minor difference.

Brian Wood
http://www.webEbenezer.net


James Kanze  
Newsgroups: comp.lang.c++
From: James Kanze <james.ka...@gmail.com>
Date: Sun, 8 Nov 2009 06:27:41 -0800 (PST)
Local: Mon, Nov 9 2009 1:57 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 6, 5:51 pm, Rune Allnor <all...@tele.ntnu.no> wrote:

Which, of course, raises the question as to why.  They're not
very useful unless you're doing exceptionally low level work.

> Others started bringing in text formats.

The original comment was just that---a parenthetical comment.
Text formats have many advantages, WHEN you can use them.  It's
also obvious that they have additional overhead---not nearly as
much as you claimed in terms of CPU, but they aren't free
either, neither in CPU time nor in data size.

> I have only attempted to explain - in vain, it seems - why
> text-based numerical formats are a no-go in technical
> applications.

And you blew it by giving exaggerated figures:-).  Other than
that: they're not a no-go in technical applications.  They do
have too much overhead for some applications (not all), and in
such cases, you have to use a binary format.  Depending on other
requirements (portability, external requirements, etc.), you may
need a more or less complicated binary format.

> > Certainly, most common cases do include a lot of text data.
> I am not talking about 'common' cases. I am talking about
> heavy-duty work. Once you are talking about numeric data in
> the hundreds of MBytes (regardless of the storage format), any
> amount of accompanying text is irrelevant. One page of plain
> text takes about 2 kbytes.

Yes.  I understand that.

In fact, now that you've mentioned seismic data, I agree that a
text format is probably not going to cut it.  I've actually
worked on one project in the field, and I know just how much
floating point data they can generate.

--
James Kanze


Rune Allnor  
Newsgroups: comp.lang.c++
From: Rune Allnor <all...@tele.ntnu.no>
Date: Sun, 8 Nov 2009 09:11:23 -0800 (PST)
Local: Mon, Nov 9 2009 4:41 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On 8 Nov, 15:27, James Kanze <james.ka...@gmail.com> wrote:

I'm getting tired of reiterating this for people who
are not interested in actually evaluating the numbers.

Look for an upcoming post on comp.lang.c++.moderated, where
I distill the problem statement a bit, as well as present
a C++ test to see what kind of timing ratios I am talking about.

Rune


Brian Wood  
Newsgroups: comp.lang.c++
From: Brian Wood <woodbria...@gmail.com>
Date: Sun, 8 Nov 2009 14:15:48 -0800 (PST)
Local: Mon, Nov 9 2009 9:45 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 8, 11:11 am, Rune Allnor <all...@tele.ntnu.no> wrote:

> On 8 Nov, 15:27, James Kanze <james.ka...@gmail.com> wrote:

> I'm getting tired of reiterating this for people who
> are not interested in actually evaluating the numbers.

> Look for an upcoming post on comp.lang.c++.moderated, where
> I distill the problem statement a bit, as well as present
> a C++ test to see what kind of timing ratios I am talking about.

> Rune

I took the liberty of copying your post from clc++m to here
as this newsgroup is faster as far as getting the posts out
there.

Hi all.

A couple of weeks ago I posted a question on comp.lang.c++ about some
technicality about binary file IO. Over the course of the discussion,
I discovered to my amazement - and, quite frankly, horror - that there
seems to be a school of thought that text-based storage formats are
universally preferable to binary formats for reasons of portability
and human readability.

The people who presented such ideas appeared not to appreciate two
details that counter any benefits text-based numerical formats might
offer:

1) Binary files are about 20-70% of the file size of the text files,
   depending on the number of significant digits stored in the text
   files and other formatting text glyphs.
2) Text-formatted numerical data take significantly longer to read and
   write than binary formats.

Timings are difficult to compare, since the exact numbers depend on
buffering strategies, buffer sizes, disk speeds, network bandwidths
and so on.

I have therefore sketched a 'distilled' test (code below) to test what
overheads are involved with formatting numerical data back and forth
between text and binary formats. To eliminate the impact of peripheral
devices, I have used a std::stringstream to store the data. The binary
buffers are represented by vectors, and I have assumed that a memcpy
from the file buffer to the destination memory location is all that is
needed to import the binary format from the file buffer. (If there are
significant run-time overheads associated with moving NATIVE binary
formats to the destination, please let me know.)
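
The memcpy import assumed above could look roughly like the following sketch. The function name `import_native_doubles` is hypothetical, and the sketch only covers the native case: the bytes must have been written with the same floating-point format and byte order as the reading machine uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copy a raw byte buffer (as read from a file) into a vector of doubles.
// Assumes the bytes hold doubles in the machine's own native layout.
std::vector<double> import_native_doubles(const std::vector<char>& bytes)
{
    // The buffer must contain a whole number of doubles.
    assert(bytes.size() % sizeof(double) == 0);

    std::vector<double> values(bytes.size() / sizeof(double));
    if (!values.empty())
        std::memcpy(values.data(), bytes.data(), bytes.size());
    return values;
}
```

Copying with memcpy, rather than casting the char pointer to `double*`, avoids alignment and aliasing problems. A portable format would need an explicit byte-order and representation conversion instead.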

The output on my computer is (do note the _different_ numbers of IO
cycles in the two cases!):

Sun Nov 08 19:48:54 2009 : Binary IO cycles started
Sun Nov 08 19:49:00 2009 : 1000 Binary IO cycles completed
Sun Nov 08 19:49:00 2009 : Text-format IO cycles started
Sun Nov 08 19:49:16 2009 : 100 Text-format IO cycles completed

A little bit of math produces *average*, *crude* numbers for IO
cycles:

Binary:  6 seconds / (1000 * 1e6) read/write cycles =   6e-9 s per r/w cycle
Text:   16 seconds / (100  * 1e6) read/write cycles = 160e-9 s per r/w cycle

which in turn means there is an overhead on the order of
160e-9/6e-9 = 27x associated with the text formats.

Add a little bit of other overheads, e.g. caused by the significantly
larger text file sizes in combination with suboptimal buffering
strategies,
and the relative numbers easily hit the triple digits. Not at all
insignificant when one works with large amounts of data under tight
deadlines.

So please: Shoot this demo down! Give it your best, and prove me
and my numbers wrong.

And to the textbook authors who might be lurking: Please include a
chapter on relative binary and text-based IO speeds in your upcoming
editions. Binary file formats might not fit into your overall
philosophies about human readability and universal portability of C++
code, but some of your readers might appreciate being made aware of
such practical details.

Rune

/***************************************************************************/
#include <iostream>
#include <sstream>
#include <string>
#include <time.h>
#include <vector>

int main()
{
        const size_t  NumElements = 1000000;
        std::vector<double> SourceBuffer;
        std::vector<double> DestinationBuffer;

        for (size_t n=0;n<NumElements;++n)
        {
                SourceBuffer.push_back(n);
                DestinationBuffer.push_back(0);
        }

        time_t rawtime;
        struct tm * timeinfo;

        time( &rawtime );
        timeinfo = localtime( & rawtime );
        std::string message( asctime (timeinfo) );
        message.erase(message.size()-1);

        std::cout  << message.c_str() << " : Binary IO cycles started"
                    << std::endl;

        size_t NumBinaryIOCycles = 1000;
        for (size_t n = 0; n < NumBinaryIOCycles; ++n)
        {
                for (size_t m = 0; m<NumElements; ++m )
                {
                        DestinationBuffer[m] = SourceBuffer[m];
                }
        }

        time( &rawtime );
        timeinfo = localtime( & rawtime );
        message=std::string( asctime (timeinfo) );
        message.erase(message.size()-1);

        std::cout << message.c_str() << " : " << NumBinaryIOCycles
                << " Binary IO cycles completed " << std:: endl;

        std::stringstream ss;
        const size_t NumTextFormatIOCycles = 100;

        time( &rawtime );
        timeinfo = localtime( & rawtime );
        message=std::string( asctime (timeinfo) );
        message.erase(message.size()-1);

        std::cout  << message.c_str() << " : Text-format IO cycles started"
                   << std::endl;

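        for (size_t n = 0; n < NumTextFormatIOCycles; ++n)
        {
                // Start every cycle from an empty stream in a good state.
                ss.str("");
                ss.clear();

                // Write each value followed by a space so that it can be
                // parsed back as a separate number.
                size_t m;
                for (m = 0; m < NumElements; ++m)
                        ss << SourceBuffer[m] << ' ';

                // Read back at most NumElements values; stop on failure.
                m = 0;
                while (m < NumElements && ss >> DestinationBuffer[m])
                {
                        ++m;
                }
        }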

        time( &rawtime );
        timeinfo = localtime( & rawtime );
        message=std::string( asctime (timeinfo) );
        message.erase(message.size()-1);

        std::cout << message.c_str() << " : " << NumTextFormatIOCycles
                << " Text-format IO cycles completed " << std:: endl;

        return 0;

}
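
Since time() only resolves whole seconds, a finer-grained variant of the same measurement may be useful. The following is a sketch under stated assumptions, not the code that produced the figures above: the function names are invented, it uses C++11 `std::chrono::steady_clock`, and it writes 17 significant digits so that the text round trip is lossless.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <vector>

// Time `cycles` native binary copies of src into dst; returns seconds.
double time_binary_copy(const std::vector<double>& src,
                        std::vector<double>& dst, std::size_t cycles)
{
    dst.resize(src.size());
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t n = 0; n < cycles; ++n)
        if (!src.empty())
            std::memcpy(dst.data(), src.data(), src.size() * sizeof(double));
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Time `cycles` text round trips (format, then parse); returns seconds.
double time_text_round_trip(const std::vector<double>& src,
                            std::vector<double>& dst, std::size_t cycles)
{
    dst.resize(src.size());
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t n = 0; n < cycles; ++n)
    {
        std::stringstream ss;   // fresh, empty stream for every cycle
        ss.precision(17);       // 17 significant digits per value
        for (std::size_t m = 0; m < src.size(); ++m)
            ss << src[m] << ' ';    // separator keeps the values distinct

        // Parse back at most src.size() values; stop on the first failure.
        std::size_t m = 0;
        while (m < src.size() && (ss >> dst[m]))
            ++m;
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Dividing each result by `cycles * src.size()` gives a per-element cost that can be compared directly. An optimizing compiler may drop repeated identical copies in the binary loop, so the binary figure should be read as a lower bound.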

Brian Wood

Brian Wood  
Newsgroups: comp.lang.c++
From: Brian Wood <woodbria...@gmail.com>
Date: Sun, 8 Nov 2009 14:44:35 -0800 (PST)
Local: Mon, Nov 9 2009 10:14 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 8, 4:15 pm, Brian Wood <woodbria...@gmail.com> wrote:

That seems to me an inaccurate description of this thread.
Kanze has pointed out the strengths of text formats, but
has also noted that there are times when binary formats
are needed.  Who has been saying that text formats are
"universally preferable" to binary formats?

Brian Wood


James Kanze  
Newsgroups: comp.lang.c++
From: James Kanze <james.ka...@gmail.com>
Date: Sun, 8 Nov 2009 17:10:58 -0800 (PST)
Local: Mon, Nov 9 2009 12:40 pm
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 8, 6:11 pm, Rune Allnor <all...@tele.ntnu.no> wrote:

> On 8 Nov, 15:27, James Kanze <james.ka...@gmail.com> wrote:
> I'm getting tired of reiterating this for people who
> are not interested in actually evaluating the numbers.

I actually did some measures, to check the numbers.  Your
numbers were wrong.  More to the point, actual numbers will vary
enormously from one implementation to the next.

> Look for an upcoming post on comp.lang.c++.moderated,

Not everyone reads that group.  Not everyone agrees with its
moderation policy (as currently practiced).

--
James Kanze


James Kanze  
Newsgroups: comp.lang.c++
From: James Kanze <james.ka...@gmail.com>
Date: Sun, 8 Nov 2009 17:14:42 -0800 (PST)
Local: Mon, Nov 9 2009 12:44 pm
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 8, 11:44 pm, Brian Wood <woodbria...@gmail.com> wrote:

> On Nov 8, 4:15 pm, Brian Wood <woodbria...@gmail.com> wrote:
> > On Nov 8, 11:11 am, Rune Allnor <all...@tele.ntnu.no> wrote:

    [...]

> > A couple of weeks ago I posted a question on comp.lang.c++
> > about some technicality about binary file IO. Over the
> > course of the discussion, I discovered to my amazement -
> > and, quite frankly, horror - that there seems to be a school
> > of thought that text-based storage formats are universally
> > preferable to binary formats for reasons of portability
> > and human readability.
> That seems to me an inaccurate description of this thread.
> Kanze has pointed out the strengths of text formats, but
> has also noted that there are times when binary formats
> are needed.  Who has been saying that text formats are
> "universally preferable" to binary formats?

I think he missed a "when possible", or something similar.
Binary formats are an optimization: you sometimes need this
optimization (and you certainly should be aware of the
possibility of using it), but you don't use them unless timing
or data size constraints make it necessary.

--
James Kanze


Alf P. Steinbach  
Newsgroups: comp.lang.c++
From: "Alf P. Steinbach" <al...@start.no>
Date: Mon, 09 Nov 2009 06:06:09 +0100
Local: Mon, Nov 9 2009 4:36 pm
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
* James Kanze:

> On Nov 8, 6:11 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
>> On 8 Nov, 15:27, James Kanze <james.ka...@gmail.com> wrote:

>> I'm getting tired of reiterating this for people who
>> are not interested in actually evaluating the numbers.

> I actually did some measures, to check the numbers.  Your
> numbers were wrong.  More to the point, actual numbers will vary
> enormously from one implementation to the next.

>> Look for an upcoming post on comp.lang.c++.moderated,

> Not everyone reads that group.  Not everyone agrees with its
> moderation policy (as currently practiced).

Would you care to elaborate on that hint, please?

Cheers,

- Alf


Rune Allnor  
Newsgroups: comp.lang.c++
From: Rune Allnor <all...@tele.ntnu.no>
Date: Mon, 9 Nov 2009 02:57:48 -0800 (PST)
Local: Mon, Nov 9 2009 10:27 pm
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On 9 Nov, 02:14, James Kanze <james.ka...@gmail.com> wrote:

*You* are accusing *me* of missing the fine print??!!

Let's see what I have written. From my post

http://groups.google.no/group/comp.lang.c++/msg/1c4004bbac86a046

[RA] > > File I/O operations with text-formatted floating-point data
     > > take time. A *lot* of time.

[JK] > A lot of time compared to what?

[RA] Wall clock time. Relative time, compared to dumping
     binary data to disk. Any way you want.

     ...
[RA] > > The rule-of-thumb is 30-60 seconds per 100 MBytes of
     > > text-formatted FP numeric data, compared to fractions of a
     > > second for the same data (natively) binary encoded (just try
     > > it).

[JK] > Try it on what machine:-).

[RA] Any machine. The problem is to decode text-formatted numbers
     to binary.

     ...
     Here is a test I wrote in matlab a few years ago, to demonstrate
     the problem (WinXP, 2.4GHz, no idea about disk):

     [matlab  code snipped]

     Output:
     ------------------------------------
     Wrote ASCII data in 24.0469 seconds
     Read ASCII data in 42.2031 seconds
     Wrote binary data in 0.10938 seconds
     Read binary data in 0.32813 seconds
     ------------------------------------

     Binary writes are 24.05/0.109 = about 220x faster than text writes.
     Binary reads are 42.2/0.328 = about 130x faster than text reads.

     ...
     The timing numbers (both absolute and relative) would be of
     similar orders of magnitude if you repeated the test with C++.
     ...
     The application I'm working with would need to crunch through
     some 10 GBytes of numerical data per hour.

I think these excerpts should be sufficient to sketch what
kind of world I am living and working in.

Do note that I never - unlike some other participants in this
thread - claimed my numbers to be exact. I am fairly certain
my English is good enough that the above would reasonably be
expected to be interpreted by a reader as *representative*
numbers. If you look closely, I also commented that coding
up a program in C++ instead of matlab as I had done, would
result in *different* numbers, but not solve the fundamental
problem.

So I can't see any reason why you attack me for my numbers
being "wrong"; I never stated they were exact.

A few posts further out:

http://groups.google.no/group/comp.lang.c++/msg/0abdc440e78f98d6

  [RA] So what do text-based formats actually buy you?
  [JK] Shorter development times, less expensive development, greater
       reliability...

       In sum, lower cost.

  [RA] As long as you keep two factors in mind:
       1) The user's time is not yours (the programmer) to waste.
       2) The user's storage facilities (disk space, network
          bandwidth etc) are not yours (the programmer) to waste.

  [JK] The user pays for your time.  Spending it to do something
       which results in a less reliable program, and that he doesn't
       need, is irresponsible, and borders on fraud.

This one really pissed me off. Here I had explained to you
what application I am working with, made you aware of the user's
requirements in the operational situation, and you explicitly
state that paying attention to such concerns is 'borderline fraud'!

So I can not interpret this in any other way than that you will
use text-based formats, come hell or high water. Which essentially
invalidates any otherwise relevant arguments you might have presented
throughout the thread.

> Binary formats are an optimization:

No, it's not. The selection of file formats is a strategic design
decision on a par with using binary O(lgN) or linear O(N) search
engines; like choosing between an O(NlgN) quick sort or an O(N^2)
bubble sort algorithm.

Such factors govern what problems can be handled by the software
with reasonable effort and within reasonable time.

True, both binary and text-based numerical IO are O(N), but since
text-based numerical IO is orders of magnitude slower, the strategic
impact on design decisions is the same.

> you sometimes need this
> optimization (and you certainly should be aware of the
> possibility of using it), but you don't use them unless timing
> or data size constraints make it necessary.

Hypocrite!

This is exactly what I have been arguing for days and weeks already.
What changed?

Rune


James Kanze  
Newsgroups: comp.lang.c++
From: James Kanze <james.ka...@gmail.com>
Date: Mon, 9 Nov 2009 05:37:53 -0800 (PST)
Local: Tues, Nov 10 2009 1:07 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 9, 10:57 am, Rune Allnor <all...@tele.ntnu.no> wrote:

    [...]

> I think these excerpts should be sufficient to sketch what
> kind of world I am living and working in.

I fully understand what kind of world you're working in.  As a
consultant, I've worked on seismic applications too, albeit not
recently.

> Do note that I never - unlike some other participants in this
> thread - claimed my numbers to be exact.

Off by more than an order of magnitude is not just a question of
"exact".

> I am fairly certain my English is good enough that the above
> would reasonably be expected to be interpreted by a reader as
> *representative* numbers. If you look closely, I also
> commented that coding up a program in C++ instead of matlab as
> I had done, would result in *different* numbers, but not solve
> the fundamental problem.
> So I can't see any reason why you attack me for my numbers
> being "wrong"; I never stated they were exact.

First, I didn't "attack" you.  On the whole, I understand your
problem.  Stating that the difference is some 100 times is
misleading, however.

I didn't say that.  I said that ignoring issues of development
time and reliability is fraud.  You have to make a trade off; if
text based IO isn't sufficiently fast for the user's needs, or
requires too much additional space, then you use binary.  But
you consider the cost of doing so, and weigh it against the
other costs.

> So I can not interpret this in any other way than that you
> will use text-based formats, come hell or high water.

How do you read that into anything I've said?  I've simply
pointed out that using text does buy you something, or in other
words, using binary has a cost.  There's no doubt that using
text has other costs.  Engineering is about weighing the
different costs; if you don't know what text based formats buy
you, then you can't weigh the costs accurately.

> Which essentially invalidates any otherwise relevant arguments
> you might have presented throughout the thread.
> > Binary formats are an optimization:
> No, it's not. The selection of file formats is a strategic design
> decision on a par with using binary O(lgN) or linear O(N) search
> engines; like choosing between an O(NlgN) quick sort or an O(N^2)
> bubble sort algorithm.

Which are also optimizations:-).

There are optimizations and optimizations.  Sometimes you do
know up front that you'll need the optimization; if you know
that you'll have to deal with millions of elements, you know up
front that a quadratic algorithm won't do the trick.

In the case of choosing binary, the motivation for doing so up
front is a bit different---after all, the difference will never
be other than linear.  Partially, the motivation can be
calculated: if you know the number of elements, you can
calculate the disk space needed up front.  In many cases,
however, you know that you'll be locked into the format you
choose, so you have to consider performance issues earlier.
Once you start considering performance issues, however, you're
talking about optimization.

> Such factors govern what problems can be handled by the software
> with reasonable effort and within reasonable time.
> True, both binary and text-based numerical IO are O(N), but since
> text-based numerical IO is orders of magnitude slower, the strategic
> impact on design decisions is the same.

There you go exaggerating again.  It's not orders of magnitude
slower.  At the most, it's around 10 times slower, and often the
difference is even less.  That doesn't mean that it's irrelevant,
and sometimes you will have to use a binary format (and
sometimes, you'll have to adapt the binary format, to make it
quicker).
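
The ratio is easy enough to measure on a given implementation.  What
follows is only a rough sketch of how one might do it, not the code
behind any of the figures quoted in this thread.  The names
write_text, write_binary and seconds are made up for the example, and
the binary version just dumps the in-memory representation, so the
result can only be read back on a machine with the same floating
point format and byte order.

```cpp
#include <chrono>
#include <cstddef>
#include <iomanip>
#include <limits>
#include <ostream>
#include <vector>

// Text output: one value per line, with enough significant digits
// (max_digits10) for a double to survive the round trip.
void write_text(std::ostream& out, const std::vector<double>& data)
{
    out << std::setprecision(std::numeric_limits<double>::max_digits10);
    for (double d : data) {
        out << d << '\n';
    }
}

// Binary output: copies the in-memory bytes as they are.
// Assumption: reader and writer share the same double format and
// byte order; this is not a portable file format.
void write_binary(std::ostream& out, const std::vector<double>& data)
{
    if (!data.empty()) {
        out.write(reinterpret_cast<const char*>(data.data()),
                  static_cast<std::streamsize>(data.size() * sizeof(double)));
    }
}

// Returns the wall-clock time, in seconds, taken by one call of f().
template <typename F>
double seconds(F f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling seconds() once around each writer, with the same large vector
and a real file as the stream, gives the ratio for that compiler,
library and disk.  Expect it to change from one implementation to the
next.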

--
James Kanze


James Kanze  
Newsgroups: comp.lang.c++
From: James Kanze <james.ka...@gmail.com>
Date: Mon, 9 Nov 2009 05:41:43 -0800 (PST)
Local: Tues, Nov 10 2009 1:11 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 9, 5:06 am, "Alf P. Steinbach" <al...@start.no> wrote:

> * James Kanze:
> > On Nov 8, 6:11 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
> >> On 8 Nov, 15:27, James Kanze <james.ka...@gmail.com> wrote:
> >> I'm getting tired with re-iterating this for people who
> >> are not interested in actually evaluating the numbers.
> > I actually did some measurements, to check the numbers.  Your
> > numbers were wrong.  More to the point, actual numbers will
> > vary enormously from one implementation to the next.
> >> Look for an upcoming post on comp.lang.c++.moderated,
> > Not every one reads that group.  Not everyone agrees with
> > its moderation policy (as currently practiced).
> Would you care to elaborate on that hinting, please.

"Not everyone" means "at least me".  I stopped participating in
the group because I found the moderation was becoming too heavy
in some cases.  Others, I know, aren't bothered with it.  To
each his own.

--
James Kanze


Brian  
Newsgroups: comp.lang.c++
From: Brian <c...@mailvault.com>
Date: Mon, 9 Nov 2009 14:31:20 -0800 (PST)
Local: Tues, Nov 10 2009 10:01 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 9, 7:37 am, James Kanze <james.ka...@gmail.com> wrote:

This Gianni Mariani quote indicates he saw some
differences of more than 10x.

"However, reading and writing binary files can have HUGE
performance gains.  I once came across some numerical code
where it would read and write large datasets. These datasets
were 40-100MB.  The performance was horrendous.  Using mapped
files and binary data made the reading and writing virtually
zero cost and it improved the performance of the product by
nearly 10x times and in some tests over 1000x.  Be careful -
this is one application and the bottleneck was clearly
identified.  This may not be where your application spends
its time."

I hope to beef up the C++ Middleware Writer's support
for writing and reading data more generally.  To begin
with I'm going to focus on integral types and assume
8 bit bytes.  Currently we don't have support for uint8_t,
uint16_t, etc.  I guess those are the types I'll start with.
I'm going through the newsgroup archives to find snippets
that are helpful in this area.  If anyone has a link wrt
this, I'm interested.
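
As a starting point, here is a minimal sketch of the kind of thing I
have in mind.  It assumes 8 bit bytes and a little-endian order in
the buffer; the byte order is my choice for the example, not
something settled.  The names put_le and get_le are hypothetical and
are not part of the C++ Middleware Writer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends an unsigned integer to the buffer, least significant byte
// first, regardless of the byte order of the host.
template <typename UInt>
void put_le(std::vector<std::uint8_t>& out, UInt value)
{
    for (std::size_t i = 0; i < sizeof(UInt); ++i) {
        out.push_back(static_cast<std::uint8_t>(value & 0xFFu));
        value = static_cast<UInt>(value >> 8);
    }
}

// Reads an unsigned integer stored least significant byte first.
// The caller must make sure sizeof(UInt) bytes are available at p.
template <typename UInt>
UInt get_le(const std::uint8_t* p)
{
    UInt value = 0;
    for (std::size_t i = 0; i < sizeof(UInt); ++i) {
        value = static_cast<UInt>(
            value | (static_cast<UInt>(p[i]) << (8 * i)));
    }
    return value;
}
```

Because the bytes are built with shifts rather than by copying the
object's memory, the same buffer contents come out on big-endian and
little-endian hosts.  Signed types would need a separate decision
about representation, which this sketch leaves out.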

Brian Wood
http://www.webEbenezer.net


James Kanze  
Newsgroups: comp.lang.c++
From: James Kanze <james.ka...@gmail.com>
Date: Mon, 9 Nov 2009 14:57:46 -0800 (PST)
Local: Tues, Nov 10 2009 10:27 am
Subject: Re: Binary file IO: Converting imported sequences of chars to desired type
On Nov 9, 11:31 pm, Brian <c...@mailvault.com> wrote:

