Text::CSV::Munge.pm
Munge (verb) To modify data in some way the speaker doesn't need to go into right now or cannot describe succinctly.
Version 0.01
# Init object.
my $munged = Text::CSV::Munge->new();

# Verbosity: 0 = Quiet; 1 = Info; 2 = Diagnose.
$munged->set_keys( 'Verbosity Level' => 2 );

# Load select columns from plural CSVs.
$munged->merge_csvs(
    'file_path_1.csv', [],    # Empty (or missing) aref defaults to all columns.
    'file_path_2.csv', [1],
);

# Reduce data to a fixed number of digits in the listed columns.
$munged->round_cols( 4, 4 );

# Prettify column widths per channel.
$munged->align_cols( 8, 8 );

# Write out a copy of munged CSV data.
$munged->write_csv( $munged->{file_path_base} . '.csv' );
Read in selected columns (channels) from one or more *.csv files and combine them into an array of channel (column) names and an array-of-arrays containing all channel (column) data rows. Primarily aimed at dealing with time-history recordings of data events previously saved in *.csv format.
Write out a single *.csv file from the array of channel names and the array-of-arrays containing channel data.
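As a rough illustration of those two structures, the sketch below turns an array of channel names plus an array of per-column arrays into CSV lines. The helper name csv_lines is hypothetical; it is not this module's write_csv() and makes no claim about how that method works internally. It assumes each inner array holds the data for one column.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# $names : aref of channel names.
# $cols  : aref of arefs, one inner aref per column.
sub csv_lines {
    my ( $names, $cols ) = @_;

    # Header row: names quoted with single quotes, comma separated.
    my @lines = ( join ',', map { sprintf "'%s'", $_ } @$names );

    # Find the index of the last row among all columns.
    my $last_row = -1;
    for my $col (@$cols) {
        $last_row = $#$col if $#$col > $last_row;
    }

    # Transpose: one output line per row, taking that row from each column.
    for my $r ( 0 .. $last_row ) {
        push @lines, join ',', map { defined $_->[$r] ? $_->[$r] : '' } @$cols;
    }
    return @lines;
}

my @lines = csv_lines(
    [ 'Time (S)', 'Scan (V)' ],
    [ [ 0, 1, 2 ], [ 0.5, 0.7, 0.6 ] ],
);
print "$_\n" for @lines;
```

Short columns are padded with empty fields here; that is a choice made for the sketch, not documented behavior.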
Features are deliberately minimalist, as routines for dealing with specific kinds of data are meant to be gathered in other, related Perl modules.
The expected format of all *.csv files is as follows.
The first line shall be a row of names, one for each column (aka channel), quoted and separated by commas, with or without a trailing comma, thus...
'Time (S)','Scan (V)',
...or thus...
'Time (S)', 'Spray (V)'
...or thus...
'Time (S)', 'Spray (V)', 'Time (S)', 'Scan (V)'
All lines (rows) save the first shall contain values separated by commas, again with or without a trailing comma, thus...
-.0999752,.2,'987e12',1.234e-5,'Foo','Bar','Any old string',
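To make the format concrete, here is a minimal sketch of splitting one such line into fields. The function parse_line is a hypothetical name and is not part of this module; it assumes single-quoted fields contain no embedded quotes, and it tolerates one trailing comma and optional spaces after commas.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;    # Drop trailing whitespace and newline.
    $line =~ s/,\z//;      # Tolerate one trailing comma.

    my @fields;
    # Each pass matches either a 'quoted' field or a bare field,
    # followed by a comma or the end of the line.
    while ( $line =~ m{\G\s*(?:'([^']*)'|([^,]*?))\s*(?:,|\z)}g ) {
        push @fields, defined $1 ? $1 : $2;
        last if pos($line) >= length($line);
    }
    return @fields;
}

my @names = parse_line("'Time (S)', 'Spray (V)'\n");
my @row   = parse_line("-.0999752,.2,'987e12','Foo','Any old string',\n");
print join( ' | ', @names ), "\n";
print join( ' | ', @row ),   "\n";
```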
Most of these you will need to use. Some few you need only know about.
my $munged = Text::CSV::Munge->new();
$munged->merge_csvs( $file_1, \@cols_1, $file_2, \@cols_2 );
Read in a succession of CSV files, keeping only such columns from each as desired. Arguments must be given in successive pairs. Each pair must consist of a file path followed by an array reference. The referenced array may contain only integers, these integers representing which columns (0 thru N) from the CSV of the file path shall be retained.
If all columns are to be retained for any preceding filename, reference its columns by an empty array like so...
$munged->merge_csvs( $file_1, [], $file_2, \@cols_2 );
...or else by no array like so...
$munged->merge_csvs( $file_1, $file_2, \@cols_2 );
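One way such an argument list could be normalized into (path, columns) pairs is sketched below. The name pair_up_args is hypothetical, and this is only an assumption about the calling convention described above, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Accepts paths, each optionally followed by an aref of column indices.
# A missing aref is treated the same as an empty one (meaning all columns).
sub pair_up_args {
    my @args = @_;
    my @pairs;
    while (@args) {
        my $path = shift @args;
        my $cols = ( @args && ref( $args[0] ) eq 'ARRAY' ) ? shift @args : [];
        push @pairs, [ $path, $cols ];
    }
    return @pairs;
}

my @pairs = pair_up_args( 'file_1.csv', 'file_2.csv', [ 0, 2 ] );
for my $pair (@pairs) {
    my ( $path, $cols ) = @$pair;
    printf "%s => %s\n", $path, @$cols ? "columns @$cols" : 'all columns';
}
```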
$munged->set_keys(
    'Channel Names' => ['Time', 'Amplitude', 'Current'],
    'Units'         => ['Time', 'Volts', 'Amps'],
);
By common is meant that whatever sort of info this be, all channels store a value for it. No channel may be without it. Common data are tidy data and may thus be written out to a CSV file.
The main object (a hash) is used to store info associated with the entire CSV data set. Info stored here will either be solitary scalars or else tidy, equal-length arrays. In either case, consider them as relating to the totality of read-in CSV data for that object.
$munged->get_key('Verbosity Level'); # What it says.
Above is the simplest case, a single scalar.
@{ $munged->get_key('Column Names') } # List of names for all channels.
The next simplest, a simple array, is shown above.
@{ $munged->get_key('Column Data Arefs') }; # List of arefs for all channels.
@{ $munged->get_key('Column Data Arefs')->[0] }; # Data array for channel zero.
@{ $munged->get_key('Column Data Arefs')->[0] }[0 .. 5]; # List of 1st six points from channel zero.
Getting more complex, above are three examples of dealing with an array of arrays. In this case each sub-array contains the data from its respective column...that is to say, that column of data from the 2nd row on down.
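The shape of these structures can be tried out on a plain hash. In the sketch below, %store is only a stand-in (an assumption for illustration) for what get_key() returns; the key names are those used in this section.

```perl
use strict;
use warnings;

# Stand-in for the object's stored keys; a plain hash, assumed for illustration.
my %store = (
    'Column Names'      => [ 'Time (S)', 'Scan (V)' ],
    'Column Data Arefs' => [
        [ 0.0, 0.1, 0.2, 0.3 ],    # Channel zero: CSV rows 2 on down.
        [ 1.5, 1.7, 1.6, 1.4 ],    # Channel one.
    ],
);

my @names     = @{ $store{'Column Names'} };                     # All channel names.
my @chan_zero = @{ $store{'Column Data Arefs'}[0] };             # All data for channel zero.
my @first_two = @{ $store{'Column Data Arefs'}[0] }[ 0 .. 1 ];   # Array slice of channel zero.
my $one_point = $store{'Column Data Arefs'}[1][2];               # Third point of channel one.

print "Names: @names\n";
print "First two of channel zero: @first_two\n";
print "Third point of channel one: $one_point\n";
```

Note that a range of points is taken with an array slice on the dereferenced column, not with a range inside a single subscript.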
$munged->set_col_key($col_key, $value, @cols);
By uncommon is meant that whatever sort of info this be, not all channels store a value for it. Not all channels need even have a hash key for it. Uncommon data are untidy data and may not be written out to a CSV file. Some other format might store this kind of untidy data into a header separate from the body of data. But untidy data are often needed for calculation.
So each column also has a hash of its own. Here you may store any key/value pair you wish so as to describe the various columns, those anyway which were (for reasons of untidiness) absent from the CSV file but are needful later.
Typically, here is where you would store information entered by the user. For example, a Tektronix O'scope will store into a CSV only two columns: time and volts. Other instruments, while storing more columns, will often be equally terse about what their own values represent. Thus the bare CSV file may contain none of the further information needed to interpret or make use of the data.
The usual case with CSV data is to leave interpretation of data to outboard programs (the scripts you write). Those outboard programs will apply, ex-post-facto, all needed formulae required to understand the data.
These ex-post-facto formulae will often require variables specific to given columns, such as: "Volts" and "Full Scale" if the column values are "mV/V"; or "Gage Factor" and "Transverse Sensitivity" if the column values are in "microstrain". Column specific variables such as these are typically entered, ex-post-facto, by the user.
In your script, store them by means of the set_col_key() method if they are general, and if no specific sub-module exists to handle that kind of data.
A sub-module, however, may exist to handle that very kind of data. There is, for instance, the sub-module Text::CSV::Munge::Strain to handle microstrain readings and formulae. Refer to the section on sub-modules for more detail.
my @values = $munged->get_col_key($col_key, @cols);
Info stored in column-specific hashes is returned as a list because the request can be made for plural columns. It may be that no such key was set in one or more of the columns which you query. In such case, its place in the list will be held by 'n/a', signifying 'not applicable' or 'not appropriate', as you prefer.
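The placeholder behavior can be pictured with one hash per column. The sketch below is an assumption about the idea only; the names @col_hashes, set_col_key_sketch and get_col_key_sketch are hypothetical and do not appear in the module.

```perl
use strict;
use warnings;

# One hash per column; a stand-in for the module's internal storage.
my @col_hashes = ( {}, {}, {} );

# Store one key/value pair into each listed column's hash.
sub set_col_key_sketch {
    my ( $key, $value, @cols ) = @_;
    $col_hashes[$_]{$key} = $value for @cols;
    return;
}

# Return one value per listed column, with 'n/a' where the key was never set.
sub get_col_key_sketch {
    my ( $key, @cols ) = @_;
    return map { exists $col_hashes[$_]{$key} ? $col_hashes[$_]{$key} : 'n/a' } @cols;
}

set_col_key_sketch( 'Gage Factor', 2.01, 1, 2 );
my @values = get_col_key_sketch( 'Gage Factor', 0 .. 2 );    # Column 0 has no such key.
print join( ', ', @values ), "\n";
```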
There being no constraints applied in calling the get_col_key() method, its use is valid for all column keys regardless of how they were set. So it matters not whether you set the column key by set_col_key() or by foo_set_col_key() here in the outer module or (inappropriately) by a direct call to a method in one of the sub-modules, thus: Text::CSV::Munge::Foo->set_col_key_foo(). There being no difference in where or how they are stored, there is only this one retrieval method, here in the outer module.
print $munged->describe(@cols);
Return a multi-line string listing the contents of all key/value pairs which have been set using the set_col_key method.
If you don't like how it breaks on newlines, simply adjust to suit. For example...
my $desc = $munged->describe(@cols);
$desc =~ s{\n}{\t}g;      # Change newlines into tabs.
$desc =~ s{\t\t}{\n}g;    # Change former double-newlines to single newlines.
$munged->round_cols( digits, @columns );
Reduce the resolution of listed columns by rounding off to a fixed number of decimals. Argument must be an integer followed by those columns to which the method shall be applied. Examples follow...
$munged->round_cols( 3, 1, 5, 7 );    # Round off to 3 decimals columns 1, 5 and 7.
$munged->round_cols( 1, 0 .. 9 );     # Round off to 1 decimal columns 0 through 9.
Why round off digits? Because a final report should contain values of no greater precision than their stated accuracy. It is not proper to state that a value of 1.2459378549567 is accurate to +/- 1%, as the digits shown imply far more precision than that.
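Rounding a column to a fixed number of decimals can be sketched with sprintf. The function round_col_sketch is a hypothetical name; how round_cols() itself rounds, and how it treats non-numeric cells, is not stated here, so leaving strings untouched is an assumption of this sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper, for illustration only.
# Returns a new aref with numeric cells rounded to $digits decimals.
sub round_col_sketch {
    my ( $digits, $col ) = @_;
    return [
        map { looks_like_number($_) ? sprintf( '%.*f', $digits, $_ ) : $_ } @$col
    ];
}

my $rounded = round_col_sketch( 3, [ 1.2459378549567, -0.0999752, 'Foo' ] );
print join( ', ', @$rounded ), "\n";
```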
$munged->align_cols( digits, @columns );
Make columns pretty for text editors by left-padding each value to a minimum width of N characters. Arguments must be an integer followed by those columns to which the method shall be applied. Examples follow...
Align to a width of 12 characters columns 1, 5 and 7.
$munged->align_cols( 12, 1, 5, 7 )
Align to a width of 8 characters columns 0 through 9.
$munged->align_cols( 8, 0 .. 9 )
Results will differ if any of the data in the given columns are already wider than the integer width given. In such a case the column will be aligned to the maximum actual data width, so that the columns do indeed align.
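The widening rule just described can be sketched as follows. The function align_col_sketch is a hypothetical name, and left-padding with spaces via sprintf is an assumption about the approach rather than the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Pads every cell on the left to at least $width characters, widening to
# the longest cell if any cell is already wider than $width.
sub align_col_sketch {
    my ( $width, $col ) = @_;
    for my $value (@$col) {
        $width = length($value) if length($value) > $width;
    }
    return [ map { sprintf '%*s', $width, $_ } @$col ];
}

my $aligned = align_col_sketch( 8, [ '1.25', '-0.10', '1234567890' ] );
print "[$_]\n" for @$aligned;
```

In this run the third cell is ten characters long, so all three cells come out ten characters wide rather than eight.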
Why align columns? Because a final report should not require a particular viewer to be read. This way your data may be viewed in a spreadsheet, in a text editor or even in a web browser. The only stipulation is that viewing be done in a monospace font.
print $munged->{file_path_base};
As you open files with the merge_csvs method, a variable will have kept track of the file names and composed a composite name for output file use. This hash key holds that composite.
$munged->write_csv( '/some/dir/' . $munged->{file_path_base} . '.csv' );
Write an output file of all columns currently retained. Shown is the use of the prior hash key in auto-creating an output file name.
$hash_key_name = constrain_key( $hash_key_name );
Mostly this happens where you can't see it. Just know that if, for whatever reason, you set a hash key (probably for some column-specific quality) to FooBar it will come out Foobar. Likewise fOO bAR will be turned into Foo Bar without your asking for it.
This rule is enforced by both set_key() and set_col_key() upon storing a value, and also by both get_key() and get_col_key() upon retrieval of same.
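A normalization consistent with the two examples given (FooBar to Foobar, fOO bAR to Foo Bar) is to lowercase each word and then capitalize its first letter. The sketch below uses the hypothetical name constrain_key_sketch; the real constrain_key() may differ, for instance in how it treats runs of whitespace.

```perl
use strict;
use warnings;

# Hypothetical stand-in for constrain_key(), for illustration only.
# Lowercases each whitespace-separated word, then capitalizes its first letter.
sub constrain_key_sketch {
    my ($key) = @_;
    return join ' ', map { ucfirst lc $_ } split ' ', $key;
}

for my $key ( 'FooBar', 'fOO bAR', 'gage factor' ) {
    printf "%-12s => %s\n", $key, constrain_key_sketch($key);
}
```

Because both the setters and the getters apply the same rule, a key stored as 'gage factor' is found again under 'Gage Factor' or 'GAGE FACTOR'.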
This module requires no configuration. It auto-searches for its dependencies by calling File::Find.
My goal, as always, is OS-independence, but I have the resources to design and test on only these two platforms:
My main tower unit and both my laptops run this OS.
At work I am required to endure this laborious, tiresome and decrepit OS.
This outer-module is generalized. Its set_col_key() method places no constraint upon the name of any column key (column-specific hash key) you wish to set. Thus you may, quite literally, set a column key named "foo" to the value of "bar" for one or more columns. And that is fine. In your own Perl scripts, employ this general set_col_key() method however you like.
Sub-modules, however, are intended to apply to a specific (mostly engineering) field. They will contain formulae which rely upon particular column-specific hash variables having fixed keys.
As a user I wished to pay only minimal attention to the existence of sub-modules. I did not wish to bother knowing when to call them, or to keep track of which object belongs to which in the eventual Perl scripts which I shall write to use them.
I addressed that by writing into this outer and generalized "parent" module a mechanism for automatically require-ing the needed sub-module and passing some few arguments to it.
Thus there exist the variously named foo_set_col_key() methods. Each exists to be used in lieu of the more general set_col_key() method. Each works exactly the same, except for enforcing constraints upon the column key name.
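Loading a module on demand from its name is a common Perl idiom, sketched below. The function load_submodule_sketch is a hypothetical name, and this is not a description of how Text::CSV::Munge actually locates its sub-modules; the core module List::Util stands in for a sub-module so that the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only.
# Converts a package name to a file path and requires it at run time.
# Returns 1 on success, 0 if the module could not be loaded.
sub load_submodule_sketch {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar becomes Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ( 'List::Util', 'No::Such::Module::Here' ) {
    printf "%-24s %s\n", $module,
        load_submodule_sketch($module) ? 'loaded' : 'not available';
}
```

Deferring the require this way means a script pays for a sub-module only when one of its methods is first needed.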
Currently bundled with this module are two supporting sub-modules. More will hopefully soon follow. Perhaps you might care to contribute one for your own specific field? Presently I am involved with strain readings, hence the first sub-module, described next.
A sub-module specific to microstrain data used in mechanical engineering.
The following methods are built into this outer, generalized, parent module so as to handle two-way communication with the specialized sub-module.
Beyond deferring to these, and referring to its POD, of course, you need not concern yourself with its use. Example use follows...
# Provide info required to calculate rosettes from gages.
$munged->sg_set_col_key( 'Ohms', 120, 1 .. 6 );
$munged->sg_set_col_key( 'Gage Factor', 2.01, 1 .. 6 );
$munged->sg_set_col_key( 'Transverse Sensitivity', 0.015, 1 .. 6 );

# Perform some calculations.
$munged->sg_rosette_rect( 1 .. 3 );     # Solve 3 gages as a rosette.
$munged->sg_rosette_delta( 4 .. 6 );    # Solve 3 gages as a rosette.

# Useful to validate results from the two methods above.
$munged->sg_retro_rosette( 4 .. 6 );    # Unsolve a rosette as 3 gages.
This sub-module is simply a test of this module's functionality, used only when building and/or maintaining this suite of Text::CSV::Munge::* modules. It has its own POD document detailing its usage.
This module works hand-in-hand with Chart::EPS_graph in that both keep channel names and channel data in identical kinds of arrays. Thus data gathered by Text::CSV::Munge may be immediately graphed by Chart::EPS_graph without further manipulation. Example follows...
my $eps = Chart::EPS_graph->new(600,400);
$eps->set(
    label_top  => 'Main Title',
    label_y1   => 'Left Y Axis Dimension (Units)',
    label_y1_2 => 'Left Y Axis Label',
    label_y2   => 'Right Y Axis Dimension (Units)',
    label_y2_2 => 'Right Y Axis Label',
    label_x    => 'X Axis (Units)',
    label_x_2  => 'Bottom Main Label',
    names      => $munged->get_key( 'Column Names' ),
    data       => $munged->get_key( 'Column Data Arefs' ),
    y1         => [7, 8, 10, 11],
    y2         => [9, 12],
);
$eps->write_eps( 'some/dir/graph.eps' );
$eps->display();
Refer to POD in module Chart::EPS_graph for more complete details.
Gan Uesli Starling <gan@starling.us>
Copyright (c) 2006 Gan Uesli Starling. All rights reserved.
This program is free software; you may redistribute and/or modify it under the same terms as Perl itself.