NAME

Text::CSV::Munge

Munge (verb): to modify data in some way the speaker doesn't need to go into right now or cannot describe succinctly.


VERSION

Version 0.01


SYNOPSIS

        use Text::CSV::Munge;
        # Init object.
        my $munged = Text::CSV::Munge->new();
        # Verbosity: 0 = Quiet; 1 = Info; 2 = Diagnose.
        $munged->set_keys( 'Verbosity Level' => 2 );
        # Load selected columns from multiple CSV files.
        $munged->merge_csvs(
                'file_path_1.csv', [],  # Empty (or missing) aref defaults to all columns.
                'file_path_2.csv', [1],
        );
        # Round column 4 to 4 decimal places.
        $munged->round_cols( 4, 4 );
        # Pad column 8 to a width of 8 characters.
        $munged->align_cols( 8, 8 );
        # Write out a copy of the munged CSV data.
        $munged->write_csv( $munged->{file_path_base} . '.csv' );


DESCRIPTION

Read in selected columns (channels) from one or more *.csv files and combine them into an array of channel (column) names and an array-of-arrays holding the data of every channel (column). Primarily aimed at dealing with time-history recordings of data events previously saved in *.csv format.

Write out a single *.csv file from the array of channel names and the array-of-arrays containing channel data.

Features are deliberately minimalist, as routines for dealing with specific kinds of data are meant to be gathered in other, related Perl modules.

The expected format of all *.csv files is as follows.

First Line

The first line shall be a row of names, one for each column (aka channel), quoted and separated by commas, with or without a trailing comma, thus...

        'Time (S)','Scan (V)',

...or thus...

        'Time (S)', 'Spray (V)'

...or thus...

        'Time (S)', 'Spray (V)',  'Time (S)', 'Scan (V)'

Subsequent Lines

All lines (rows) save the first shall contain values separated by commas, again with or without a trailing comma, thus...

        -.0999752,.2,'987e12',1.234e-5,'Foo','Bar','Any old string',
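The rules above are simple enough to parse by hand. What follows is a minimal sketch, assuming no field contains an embedded comma or quote; the helper split_simple_csv_line() is hypothetical and is not part of Text::CSV::Munge. It drops the optional trailing comma, splits on commas and strips any surrounding single quotes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV::Munge): split one line of the
# simple format described above. Assumes no field contains an embedded
# comma or quote; general CSV needs a real parser such as Text::CSV.
sub split_simple_csv_line {
    my ($line) = @_;
    $line =~ s/\A\s+//;              # drop leading whitespace
    $line =~ s/\s+\z//;              # drop trailing newline and whitespace
    $line =~ s/,\z//;                # drop the optional trailing comma
    my @fields = split /\s*,\s*/, $line, -1;
    s/\A'(.*)'\z/$1/ for @fields;    # strip surrounding single quotes
    return @fields;
}

my @names = split_simple_csv_line("'Time (S)', 'Spray (V)',\n");
my @row   = split_simple_csv_line("-.0999752,.2,'987e12','Foo',\n");
print join('|', @names), "\n";       # prints: Time (S)|Spray (V)
```

For anything beyond this simple format, a full CSV parser such as Text::CSV is the safer choice.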


SUBROUTINES/METHODS

Most of these you will need to use. Some few you need only know about.

Initialize a new object

        my $munged = Text::CSV::Munge->new();

Load selected columns from multiple CSV files

        $munged->merge_csvs( $file_1, \@cols_1, $file_2, \@cols_2 );

Read in a succession of CSV files, keeping only the desired columns from each. Arguments must be given in successive pairs, each consisting of a file path followed by an array reference. The referenced array may contain only integers, each naming a column (0 through N) of that file which shall be retained.

If all columns of a file are to be retained, follow its file name with an empty array reference like so...

        $munged->merge_csvs( $file_1, [], $file_2, \@cols_2 );

...or else by no array like so...

        $munged->merge_csvs( $file_1, $file_2, \@cols_2 );
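Because the array reference is optional, the argument list is slightly irregular. One way to normalize it into file/column pairs is sketched below; pair_file_args() is a hypothetical helper, not the module's own code, and it represents "all columns" by an empty list.

```perl
use strict;
use warnings;

# Hypothetical sketch: turn merge_csvs()-style arguments into
# [ file, [cols] ] pairs. A file name not followed by an array reference,
# or followed by an empty one, means "all columns" (an empty list here).
sub pair_file_args {
    my @args = @_;
    my @pairs;
    while (@args) {
        my $file = shift @args;
        die "Expected a file path, got a reference\n" if ref $file;
        # Take the next argument as the column list only if it is an aref.
        my $cols = (@args && ref($args[0]) eq 'ARRAY') ? shift @args : [];
        push @pairs, [ $file, $cols ];
    }
    return @pairs;
}

my @pairs = pair_file_args('a.csv', 'b.csv', [1, 3]);
# @pairs is ( ['a.csv', []], ['b.csv', [1, 3]] )
```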

Store common column info.

        $munged->set_keys(
                'Channel Names' => ['Time', 'Amplitude', 'Current'],
                'Units' => ['Seconds', 'Volts', 'Amps'],
        );

By common is meant that, whatever sort of info this be, every channel stores a value for it; no channel may be without it. Common data are tidy data and may thus be written out to a CSV file.

The main object (a hash) is used to store info associated with the entire CSV data set. Info stored here will either be solitary scalars or else tidy, equal-length arrays. In either case, consider them as relating to the totality of read-in CSV data for that object.
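The tidiness rule can be checked mechanically. This sketch assumes a common key holds either a plain scalar or an array reference with exactly one element per channel; is_tidy_value() is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical check of the "tidy" rule: a solitary scalar is always fine;
# an array ref must have one element per channel; anything else is untidy.
sub is_tidy_value {
    my ($value, $n_channels) = @_;
    return 1 unless ref $value;                 # solitary scalar
    return 0 unless ref($value) eq 'ARRAY';     # only arrays are allowed
    return @$value == $n_channels ? 1 : 0;      # one entry per channel
}

my $n = 3;   # e.g. Time, Amplitude, Current
print is_tidy_value(['Seconds', 'Volts', 'Amps'], $n), "\n";   # prints 1
print is_tidy_value(['Volts', 'Amps'], $n), "\n";              # prints 0
```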

Return common column info.

        $munged->get_key('Verbosity Level');                  # What it says.

Above is the simplest case, a single scalar.

        @{ $munged->get_key('Column Names') };                # List of names for all channels.

The next simplest, a simple array, is shown above.

        @{ $munged->get_key('Column Data Arefs') };           # List of arefs for all channels.
        @{ $munged->get_key('Column Data Arefs')->[0] };      # Data array for channel zero.
        @{ $munged->get_key('Column Data Arefs')->[0] }[0 .. 5]; # List of 1st six points from channel zero.

Getting more complex, the three examples above deal with an array of arrays. Here each sub-array holds the data of its respective column, that is to say, that column's values from the 2nd row on down.
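The column-major layout is easiest to see with literal data. In this sketch a plain array of arrays stands in for what get_key('Column Data Arefs') returns, and the sample values are made up. Slicing an inner array needs the @{ ... }[ ... ] form, and a single row has to be gathered across all channels.

```perl
use strict;
use warnings;

# Illustrative data only: a column-major array of arrays,
# one inner array per channel (column).
my @col_data = (
    [ 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 ],   # channel 0, e.g. time
    [ 1.5, 1.6, 1.4, 1.7, 1.5, 1.6, 1.5 ],   # channel 1, e.g. volts
);
my $arefs = \@col_data;   # stands in for get_key('Column Data Arefs')

my $n_channels = scalar @$arefs;              # number of channels: 2
my $n_rows     = scalar @{ $arefs->[0] };     # rows in channel 0: 7
my @first_six  = @{ $arefs->[0] }[0 .. 5];    # slice of channel 0
my @row_three  = map { $_->[3] } @$arefs;     # row 3 across all channels
```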

Store uncommon column info.

        $munged->set_col_key($col_key, $value, @cols);

By uncommon is meant that, whatever sort of info this be, not all channels store a value for it. Not all channels need even have a hash key for it. Uncommon data are untidy data and may not be written out to a CSV file. Some other format might store this kind of untidy data in a header separate from the body of data. But untidy data are often needed for calculation.

So each column also has a hash of its own. Here you may store any key/value pair you wish so as to describe the various columns, those anyway which were (for reasons of untidiness) absent from the CSV file but are needful later.

Typically, here is where you would store information entered by the user. For example, a Tektronix O'scope will store into a CSV only two columns: time and volts. Other instruments, while storing more columns, will often be equally terse about what their own values represent. Thus the bare CSV file may contain none of the further information needed to interpret or make use of the data.

The usual case with CSV data is to leave interpretation of the data to outboard programs (the scripts you write). Those outboard programs will apply, ex-post-facto, whatever formulae are needed to understand the data.

These ex-post-facto formulae will often require variables specific to given columns, such as: "Volts" and "Full Scale" if the column values are "mV/V"; or "Gage Factor" and "Transverse Sensitivity" if the column values are in "microstrain". Column specific variables such as these are typically entered, ex-post-facto, by the user.

In your script, store them by means of the set_col_key() method if they are general, and if no specific sub-module exists to handle that kind of data.

A sub-module, however, may exist to handle that very kind of data. There is, for instance, the sub-module Text::CSV::Munge::Strain to handle microstrain readings and formulae. Refer to the section on sub-modules for more detail.

Return uncommon column info.

        my @values = $munged->get_col_key($col_key, @cols);

Info stored in column-specific hashes is returned as a list because the request can be made for multiple columns. It may be that no such key was set in one or more of the columns you query. In such case, its place in the list will be held by 'n/a', signifying 'not applicable' or 'not appropriate', as you prefer.

Because get_col_key() applies no constraints, it works for every column key regardless of how that key was set. It matters not whether you set the column key with set_col_key(), with one of the foo_set_col_key() methods here in the outer module, or (inappropriately) by a direct call to a sub-module method such as Text::CSV::Munge::Foo->set_col_key_foo(). All are stored alike, so there is only this one retrieval method, here in the outer module.
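The 'n/a' placeholder behaviour can be sketched with a plain array of hashes. The layout (one hash per column, held in an array) and the helper col_values() are assumptions for illustration only; the module's internal storage may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-column hashes: one hash per column,
# and only column 1 has had 'Gage Factor' set.
my @col_info = ( {}, { 'Gage Factor' => 2.01 }, {} );

# Sketch of the lookup described above (not the module's code): one value
# per requested column, with 'n/a' holding the place of any missing key.
sub col_values {
    my ($info, $key, @cols) = @_;
    my @values;
    for my $col (@cols) {
        my $h = $info->[$col] || {};    # tolerate columns with no hash
        push @values, exists($h->{$key}) ? $h->{$key} : 'n/a';
    }
    return @values;
}

my @gf = col_values(\@col_info, 'Gage Factor', 0 .. 2);   # ('n/a', 2.01, 'n/a')
```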

Describe individual columns

        print $munged->describe(@cols);

Return a multi-line string listing the contents of all key/value pairs which have been set using the set_col_key method.

If you don't like how it breaks on newlines, simply adjust to suit. For example...

        my $desc = $munged->describe(@cols);
        $desc =~ s{\n}{\t}g;   # Change newlines into tabs.
        $desc =~ s{\t\t}{\n}g; # Change former double-newlines to single newlines.

Reduce column resolution

        $munged->round_cols( $digits, @columns );

Reduce the resolution of the listed columns by rounding off to a fixed number of decimals. The first argument must be an integer, followed by the columns to which the method shall be applied. Examples follow...

        $munged->round_cols( 3, 1, 5, 7 ); # Round off columns 1, 5 and 7 to 3 decimals.
        $munged->round_cols( 1, 0 .. 9 );  # Round off columns 0 through 9 to 1 decimal.

Why round off digits? Because a final report should state values to no greater precision than their stated accuracy supports. It is not proper to report a value of 1.2459378549567 as accurate to +/- 1%, since its thirteen decimal places claim far more precision than that.
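Rounding to a fixed number of decimals is commonly done with sprintf. This is a sketch of that idea, not the module's round_cols(); round_values() is hypothetical, and it leaves non-numeric cells such as 'Foo' untouched by testing each value with Scalar::Util's looks_like_number(). Since values are binary floating point, sprintf may round some apparent halfway cases (such as 2.675 to two places) down rather than up.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical sketch, not the module's round_cols(): round each numeric
# value to $digits decimals with sprintf; pass anything else through.
sub round_values {
    my ($digits, @values) = @_;
    my @out;
    for my $v (@values) {
        push @out, looks_like_number($v) ? sprintf('%.*f', $digits, $v) : $v;
    }
    return @out;
}

my @r = round_values(3, 1.2459378549567, -.0999752, 'Foo');
# @r is ('1.246', '-0.100', 'Foo')
```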

Set column width in characters

        $munged->align_cols( $width, @columns );

Make columns pretty for text editors by padding each value on the left to a minimum width of N characters. The first argument must be an integer, followed by the columns to which the method shall be applied. Examples follow...

Align columns 1, 5 and 7 to a width of 12 characters.

        $munged->align_cols( 12, 1, 5, 7 );

Align columns 0 through 9 to a width of 8 characters.

        $munged->align_cols( 8, 0 .. 9 );

The result differs if any value in the given columns is already wider than the requested width: the column is then aligned to the width of its widest value, so that the columns still line up.

Why align columns? Because a final report should not require a particular viewer. Aligned data may be read in a spreadsheet, in a text editor, or even in a web browser. The only stipulation is that they be viewed in a monospace font.
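The alignment rule, including the fallback to the widest value, can be sketched with sprintf's '%*s' format, which right-justifies a string within a width supplied as an argument. align_values() is a hypothetical helper, not the module's own align_cols().

```perl
use strict;
use warnings;

# Hypothetical sketch: left-pad every value to the requested width, or to
# the widest value when that is wider, so the column still lines up.
sub align_values {
    my ($width, @values) = @_;
    for my $v (@values) {
        $width = length($v) if length($v) > $width;
    }
    return map { sprintf('%*s', $width, $_) } @values;
}

my @aligned = align_values(8, '-.0999752', '.2', '1.5');
# every element is now 9 characters wide, because '-.0999752' is 9 long
```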

Combinational file names

        print $munged->{file_path_base};

As you open files with the merge_csvs method, the module keeps track of their names and composes from them a composite name for use in naming output files. This hash key holds that composite.

Writing post-munging CSV files

        $munged->write_csv( '/some/dir/' . $munged->{file_path_base} . '.csv' );

Write an output file of all columns currently retained. The example builds the output file name from the composite name described above.
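Because the data are stored column by column, writing them out means walking across all columns one row at a time. The sketch below assumes the header and data layout described under DESCRIPTION; columns_to_csv_text() is hypothetical, not the module's write_csv(), and it fills any missing cell of a shorter column with an empty string.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's write_csv(): turn channel names
# plus column-major data arefs into CSV text, one row per line.
sub columns_to_csv_text {
    my ($names, $cols) = @_;
    my $n_rows = 0;
    for my $c (@$cols) {
        $n_rows = @$c if @$c > $n_rows;    # longest column sets row count
    }
    my @lines = join(',', map("'$_'", @$names));   # quoted header row
    for my $r (0 .. $n_rows - 1) {
        # A shorter column yields an empty cell rather than undef.
        push @lines, join(',', map { defined($_->[$r]) ? $_->[$r] : '' } @$cols);
    }
    return join("\n", @lines) . "\n";
}

print columns_to_csv_text(['Time (S)', 'Scan (V)'], [[0, 0.1], [1.5, 1.6]]);
# prints:
# 'Time (S)','Scan (V)'
# 0,1.5
# 0.1,1.6
```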

Enforced hash key naming rules

        $hash_key_name = constrain_key( $hash_key_name );

Mostly this happens where you can't see it. Just know that if, for whatever reason, you set a hash key (probably for some column-specific quality) named FooBar, it will come out as Foobar. Likewise, fOO bAR will be turned into Foo Bar without your asking for it.

This rule is enforced by set_key() and set_col_key() upon storing a value, and by get_key() and get_col_key() upon retrieving it.
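Both examples above are consistent with lower-casing each word and then capitalizing its first letter. The sketch below does exactly that; constrain_key_sketch() is a hypothetical stand-in, and how the real constrain_key() treats digits, underscores or punctuation is an assumption not covered here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for constrain_key(): each run of word characters
# is lower-cased (\L) and then given an initial capital (\u).
sub constrain_key_sketch {
    my ($key) = @_;
    $key =~ s/(\w+)/\u\L$1/g;
    return $key;
}

print constrain_key_sketch('fOO bAR'), "\n";   # prints: Foo Bar
```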


CONFIGURATION AND ENVIRONMENT

This module requires no configuration. It searches for its dependencies automatically by calling File::Find.

My goal, as always, is OS independence, but I have the resources to design and test on only these two platforms:

NetBSD 2.0.2 running Perl 5.8.7

My main tower unit and both my laptops run this OS.

WinXP SP2 running ActiveState Perl 5.8.0.

At work I am required to endure this laborious, tiresome and decrepit OS.


SUB MODULES

This outer-module is generalized. Its set_col_key() method places no constraint upon the name of any column key (column-specific hash key) you wish to set. Thus you may, quite literally, set a column key named "foo" to the value of "bar" for one or more columns. And that is fine. In your own Perl scripts, employ this general set_col_key() method however you like.

Sub-modules, however, are intended to apply to a specific (mostly engineering) field. They will contain formulae which rely upon particular column-specific hash variables having fixed keys.

As a user, I wanted to pay only minimal attention to the existence of sub-modules. I did not wish to bother knowing when to call them, or to keep track of which object belongs to which in the Perl scripts I would eventually write to use them.

I addressed that by writing into this outer, generalized "parent" module a mechanism for automatically require-ing the needed sub-module and passing a few arguments to it.

Thus there exist the variously named foo_set_col_key() methods. Each exists to be used in lieu of the more general set_col_key() method. How it works is exactly the same except for enforcing constraints upon the column key name.

Currently bundled with this module are two supporting sub-modules. More will hopefully follow soon; perhaps you might care to contribute one for your own field? Presently I am involved with strain readings, hence the first sub-module, described next.

Text::CSV::Munge::Strain

A sub-module specific to microstrain data used in mechanical engineering.

The following methods are built into this outer, generalized parent module so as to handle two-way communication with the specialized sub-module.

Apart from calling these, you need not concern yourself with the sub-module, except to refer to its POD, of course. Example use follows...

        # Provide info required to calculate rosettes from gages.
        $munged->sg_set_col_key( 'Ohms', 120, 1 .. 6 );
        $munged->sg_set_col_key( 'Gage Factor', 2.01, 1 .. 6 );
        $munged->sg_set_col_key( 'Transverse Sensitivity', 0.015, 1 .. 6 );
        # Perform some calculations.
        $munged->sg_rosette_rect( 1 .. 3 );  # Solve 3 gages as a rectangular rosette.
        $munged->sg_rosette_delta( 4 .. 6 ); # Solve 3 gages as a delta rosette.
        # Useful to validate results from the two methods above.
        $munged->sg_retro_rosette( 4 .. 6 ); # Unsolve a rosette as 3 gages.
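For orientation, the standard textbook principal-strain formulas for the two common rosette layouts are sketched below. They only illustrate the kind of calculation sg_rosette_rect() and sg_rosette_delta() perform; the sub-module's actual formulae (for instance any transverse sensitivity correction) may differ, and the helpers rect_principal() and delta_principal() are hypothetical. Gage a is assumed at 0 degrees, b at 45 (or 60) and c at 90 (or 120).

```perl
use strict;
use warnings;

# Hypothetical helpers using textbook principal-strain formulas.
# Inputs are the three gage readings, e.g. in microstrain.

# Rectangular rosette, gages at 0, 45 and 90 degrees:
#   e1,e2 = (ea + ec)/2 +/- (1/sqrt 2) * sqrt((ea-eb)^2 + (eb-ec)^2)
sub rect_principal {
    my ($ea, $eb, $ec) = @_;
    my $mean   = ($ea + $ec) / 2;
    my $radius = sqrt(($ea - $eb)**2 + ($eb - $ec)**2) / sqrt(2);
    return ($mean + $radius, $mean - $radius);
}

# Delta rosette, gages at 0, 60 and 120 degrees:
#   e1,e2 = (ea+eb+ec)/3 +/- (sqrt 2 / 3) * sqrt((ea-eb)^2 + (eb-ec)^2 + (ec-ea)^2)
sub delta_principal {
    my ($ea, $eb, $ec) = @_;
    my $mean   = ($ea + $eb + $ec) / 3;
    my $radius = sqrt(2) / 3
               * sqrt(($ea - $eb)**2 + ($eb - $ec)**2 + ($ec - $ea)**2);
    return ($mean + $radius, $mean - $radius);
}

# A strain of 100 along the 0-degree axis only, and none elsewhere, reads
# (100, 50, 0) on a rectangular rosette and (100, 25, 25) on a delta one.
my @rect  = rect_principal(100, 50, 0);     # (100, 0), up to rounding
my @delta = delta_principal(100, 25, 25);   # (100, 0), up to rounding
```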

Text::CSV::Munge::Strain::Test

This sub-module is simply a test of this module's functionality, used only when building or maintaining this suite of Text::CSV::Munge::* modules. It has its own POD document detailing its usage.


SEE ALSO

This module works hand-in-hand with Chart::EPS_graph in that both keep channel names and channel data in identical kinds of arrays. Thus data gathered by Text::CSV::Munge may be immediately graphed by Chart::EPS_graph without further manipulation. Example follows...

        my $eps = Chart::EPS_graph->new(600,400);
        $eps->set(
                label_top   => 'Main Title',
                label_y1    => 'Left Y Axis Dimension (Units)',
                label_y1_2  => 'Left Y Axis Label',
                label_y2    => 'Right Y Axis Dimension (Units)',
                label_y2_2  => 'Right Y Axis Label',
                label_x     => 'X Axis (Units)',
                label_x_2   => 'Bottom Main Label',
                names       => $munged->get_key( 'Column Names' ),
                data        => $munged->get_key( 'Column Data Arefs' ),
                y1          => [7, 8, 10, 11],
                y2          => [9, 12],
        );
        $eps->write_eps( 'some/dir/graph.eps' );
        $eps->display();

Refer to POD in module Chart::EPS_graph for more complete details.


AUTHOR

Gan Uesli Starling <gan@starling.us>


LICENSE AND COPYRIGHT

Copyright (c) 2006 Gan Uesli Starling. All rights reserved.

This program is free software; you may redistribute and/or modify it under the same terms as Perl itself.