Support for mixed signal CSV input data is desirable and should be
possible. The current implementation just does not fully cope with
arbitrary mixes of data types in columns yet. Add a quick workaround,
but also a TODO item to properly address the topic later.
Extend the CSV input module, which so far was strictly limited to logic
data. Add support for analog data types: implement the 'a' column
format, and feed analog data to the session bus.
This implementation feeds the data of individual analog channels to the
session bus in a separate packet each. This approach was found to work
most reliably, since not all recipients support the submission of
multiple samples for multiple channels in a single packet.
A fixed 'digits' value is used. This needs to be addressed later.
Local experiments suggest that the 'double' data type for analog data
can result in erroneous visual presentation (observed with sigrok-cli).
Use 'float' for now, until the issue is understood and fixed. Support
for double is prepared internally and can easily be enabled.
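A minimal sketch of the per-channel submission, using float data as
discussed above (the helper name and the 'digits' value are assumptions,
the datafeed calls are libsigrok's regular API):

    static void feed_analog_channel(const struct sr_input *in,
        struct sr_channel *ch, float *values, size_t count)
    {
        struct sr_datafeed_packet packet;
        struct sr_datafeed_analog analog;
        struct sr_analog_encoding encoding;
        struct sr_analog_meaning meaning;
        struct sr_analog_spec spec;

        sr_analog_init(&analog, &encoding, &meaning, &spec, 3);
        analog.meaning->channels = g_slist_append(NULL, ch);
        analog.num_samples = count;
        analog.data = values;   /* 'float' until 'double' is sorted out */

        packet.type = SR_DF_ANALOG;
        packet.payload = &analog;
        sr_session_send(in->sdi, &packet);

        g_slist_free(analog.meaning->channels);
    }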
Address several minor nits. Eliminate unneeded variables. Update the
text-to-number conversion comments, including wildcard handling. Remove
empty lines in init() which used to tear apart a set of lines that all
do the same thing (evaluate a set of options) and belong together.
Move the creation of logic channels to the location where format fields
get iterated and column processing details get derived. This reduces a
lot of redundancy, and simplifies the addition of more data formats.
Update the list of TODO items at the top of the CSV input module's
source. Text line handling (counting line numbers) got fixed. Adding
support for analog channels was prepared, as was support for timestamp
columns.
Rename the CSV input module's option keywords to better reflect their
purpose, and for consistency across the rather complex set of options
and how they interact. Rearrange the list of options (not that the order
matters from the outside, but it helps during maintenance).
Update builtin help texts which will show up in applications, as well as
the source code comments which discuss these options in greater detail.
Would be nice to have a "general" help text for input modules which is
not tied to one single option, to provide an overview or use examples.
Arrange the option keys and the short and long help texts such that the
source better reflects the applications' screen layout, again to better
support future maintenance.
Consistently separate multi-word keywords for improved readability.
Prefer underscores over dashes for consistency with common keys in
shared infrastructure in other project sources (device options, MQ
items, etc).
Extend the "column-formats" option support in the CSV input module to
also support wildcards and automatic channel count detection. Move the
format string interpretation to the location where the first data line
or the optional header line are seen. Map the simple options (single
column number and channel count, or first column number and optional
channel count) to a format string, to unify internal code paths. Remove
code paths for the previous specific yet limited scenarios.
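A sketch of that mapping (the routine name and the exact format string
syntax, like a leading "<n>-" to skip columns, "x8" for a hex cell of
eight bits, or "*l" as the wildcard, are assumptions for illustration):

    static char *derive_format_string(size_t single_column,
        size_t first_column, size_t channel_count, char base_format)
    {
        if (single_column && channel_count)
            /* One multi-bit cell, e.g. "2-,x8". */
            return g_strdup_printf("%zu-,%c%zu",
                single_column - 1, base_format, channel_count);
        if (channel_count)
            /* Adjacent single-bit cells, e.g. "1-,8l". */
            return g_strdup_printf("%zu-,%zul",
                first_column - 1, channel_count);
        /* Wildcard: one logic channel per remaining column. */
        return g_strdup_printf("%zu-,*l", first_column - 1);
    }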
Rephrase the condition which keeps executing the "initial receive"
phase. The text line termination sequence gets derived from the first
complete text line, but other essential information is only gathered
later, potentially after skipping a large (user specified) amount of
input data. Keep checking for this essential setup data until the
header or data actually was seen, and only then start regular
processing of input data.
Extend the CSV input module, introduce support for the "column-formats="
option. This syntax can express the previous single- and multi-column
semantics, as well as any arbitrary order of ignored, single-bit, and
multi-bit columns in several formats.
The previous "simple" keywords for single and multi column modes still
are in place; it's yet to be determined whether to axe them. That
depends on whether users can handle the format strings for these simple
cases.
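For illustration, a hypothetical sigrok-cli invocation (the exact
format string spelling follows the assumptions above):

    sigrok-cli -i data.csv -I csv:column-formats=-,x8,l -o data.sr

This would ignore the first column, take eight logic channels from one
hex encoded cell, and one more logic channel from a single-bit cell.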
The previous implementation allowed CSV input files to use any of the
CR only, LF only, or CR/LF line termination formats. The first EOL was
searched for and recorded, but then was not used. Instead, either CR or
LF was considered a line termination. "Raw data processing" still
was correct, but line numbers in diagnostics were way off, and optional
features like skipping first N lines were not effective. Fix that.
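A sketch of the fixed approach (context and field names are
assumptions); the termination gets derived once from the first complete
line, and is then used consistently for splitting:

    static void derive_termination(struct context *inc,
        const char *buf, size_t len)
    {
        if (g_strstr_len(buf, len, "\r\n"))
            inc->termination = g_strdup("\r\n");
        else if (memchr(buf, '\r', len))
            inc->termination = g_strdup("\r");
        else
            inc->termination = g_strdup("\n");
    }

    /* Splitting on the recorded termination keeps line numbers right. */
    lines = g_strsplit(in->buf->str, inc->termination, 0);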
Source code inspection suggests the "startline" feature did not work at
all. The user provided number was used in the initial optional search
for the header line (to get signal names) or auto-determination of the
number of columns. But the number then was not used when the file
actually got processed.
Reduce "state" in the CSV input module's context. Stick with variables
that are local to routines when knowledge of details need not be global.
Really base the processing of a column's input text on the column's
processing information which was gathered in the setup phase.
Rename a few identifiers, to explicitly refer to logic channels (the only
currently supported data type of the CSV input module). Cease feeding
logic data to the session bus when there are no logic channels at all
(currently not really an option). Prepare for simpler dispatching of
parse routines should more data types get added in a future version.
Reduce some "clutter" (overly fragmented stuff that should go together
since it forms logical groups and is not really standalone). Address a
few more minor style nits (sizeof() redundancy, "seemingly inverse"
string comparison phrases).
Improve the code paths which determine logic channels' names from an
optional CSV file header line. Strip optional quotes from the column's
input text (re-use a SCPI helper routine for that). Also use the channel
name for multi-bit fields, appending [0] etc. suffixes in that case. Comment
on the manipulation of input data, which is acceptable since that very
data won't get processed another time in another code path.
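A sketch of the name assignment for a multi-bit column; variable names
are assumptions, sr_channel_new() is the regular channel creation call,
and the quote stripping re-uses the SCPI helper mentioned above (here
assumed as sr_scpi_unquote_string()):

    char *name, *bit_name;
    size_t bit;

    name = g_strdup(column_text);
    sr_scpi_unquote_string(name);   /* strip optional quotes */
    for (bit = 0; bit < bit_count; bit++) {
        bit_name = g_strdup_printf("%s[%zu]", name, bit);
        sr_channel_new(in->sdi, ch_idx + bit,
            SR_CHANNEL_LOGIC, TRUE, bit_name);
        g_free(bit_name);
    }
    g_free(name);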
Rephrase the CSV input module's implementation such that generic support
to "process a column" becomes available. All columns of an input file's
text line get inspected, a column can either get ignored, or converted
to logic data. A future version can then remove the current limitations
of single- and multi-column modes (either one single multi-bit cell, or
multiple single-bit cells which must be adjacent).
Combine the bin/oct/hex parse routines into one routine which handles up
to four bits per input number digit with common logic. Availability of
more data than channels (according to user specs) is not fatal.
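A sketch of the combined routine (names are assumptions); one code path
handles all three bases, the number of bits per input digit tells them
apart (1 for binary, 3 for octal, 4 for hex):

    static int parse_number_text(const char *text, size_t bits_per_digit,
        uint8_t *sample, size_t bit_count)
    {
        const char *p;
        size_t bit_idx, b;
        int v;

        /* Scan digits right to left, LSB first. */
        bit_idx = 0;
        p = text + strlen(text);
        while (p > text) {
            v = g_ascii_xdigit_value(*--p);
            if (v < 0 || v >= (1 << bits_per_digit))
                return SR_ERR_DATA;
            for (b = 0; b < bits_per_digit; b++, bit_idx++) {
                if (bit_idx >= bit_count)
                    continue;   /* surplus data, not fatal */
                if (v & (1 << b))
                    sample[bit_idx / 8] |= 1 << (bit_idx % 8);
            }
        }

        return SR_OK;
    }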
Drop the counterintuitive "first-channel" option, use "first-column"
instead. Warn when comment leader and column separator are identical
(was silent before, may be unexpected). Extend diagnostics and address
minor readability nits, update comments. Rephrase logic channel name
assignment.
Use simple scalar options to derive generic processing details: Either
'single-column' and 'numchannels' are required, with an optional
'format' spec (resulting in single-column mode). Or 'first-column' with
an optional 'numchannels' (multi-column mode with fixed format, using
all available columns by default). The default is multi-column mode with
one logic channel per column and spanning all columns on a text line.
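In a summarizing sketch (option spellings as used at this stage):

    single-column=<c> numchannels=<n> [format=<f>]  ->  single-column mode
    first-column=<c> [numchannels=<n>]              ->  multi-column mode
    (neither)                                       ->  multi-column mode,
                                                        one channel per column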
Don't clobber the value of the user provided 'header' option. Use a
separate flag to track whether the header line was seen before, or
needs to get skipped when it passes by.
Move the communication of the samplerate meta packet to the very spot
where logic sample data gets sent. This allows the samplerate to
optionally be determined late, potentially from input data instead of
user specs.
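A sketch of the late submission; sr_config_new() and the SR_DF_META
packet are libsigrok's regular mechanisms, the helper name is an
assumption:

    static void send_samplerate(const struct sr_input *in, uint64_t rate)
    {
        struct sr_datafeed_packet packet;
        struct sr_datafeed_meta meta;
        struct sr_config *src;

        if (!rate)
            return; /* optional, may come from data or user specs */

        src = sr_config_new(SR_CONF_SAMPLERATE,
            g_variant_new_uint64(rate));
        meta.config = g_slist_append(NULL, src);
        packet.type = SR_DF_META;
        packet.payload = &meta;
        sr_session_send(in->sdi, &packet);

        g_slist_free(meta.config);
        sr_config_free(src);
    }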
Move the helper routines which arrange for the data feed to an earlier
spot, so that references resolve without forward declarations. Rename
routines to reflect that they deal with logic data.
Slightly unobfuscate the column text to logic data conversion, and
reduce redundancy. Move the sample data preset to a central location.
Rephrase error messages, provide stronger hints as to why the input text
for a conversion was considered invalid.
The previous implementation assumed that in multi-column mode each cell
communicates exactly one bit of input (a logic channel). But only the
first character got tested. Tighten the check, to cover the whole input
text. This rejects fully invalid input, as well as increases robustness
since multi-bit input like "100" was mistaken for a value of 1 before.
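The tightened check in a sketch:

    static int parse_logic_bit(const char *text, uint8_t *bit)
    {
        if (text[0] != '0' && text[0] != '1')
            return SR_ERR_DATA;
        if (text[1] != '\0')
            return SR_ERR_DATA; /* rejects "100", trailing junk */
        *bit = text[0] - '0';

        return SR_OK;
    }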
Add documentation to the bin/hex/oct text parse routines, and move the
bin/hex/oct dispatcher to the location where the routines it invokes
live. Stick with a TODO comment for parse_line() to reduce the diff
size.
The parse_line() routine is rather complex; it optionally accepts an
upper limit for the number of columns, but unconditionally assumes a
first column and drops preceding fields. The rather generic n and k
identifiers are not helpful.
Use the 'seen' and 'taken' names instead, which better reflect what's
actually happening. Remove empty lines which used to tear apart groups
of instructions which are strictly related. That organization was not
helpful during maintenance either.
Accept indented comments, and trim the whitespace from text lines
after stripping off the comment. This avoids the processing of lines
which actually are empty, and improves robustness (avoids errors for a
non-fatal situation). Also results in more appropriate diagnostics at
higher log levels.
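A sketch of the approach (variable and field names are assumptions):

    /* Strip the comment, then trim; skip lines which end up empty. */
    p = strstr(line, inc->comment);
    if (p)
        *p = '\0';
    g_strstrip(line);
    if (!*line)
        continue;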
The CSV input module has grown so many options that counting them by
hand became tedious and error-prone. Eliminate the magic numbers in the
associated code paths.
This also has the side effect that the set is easy to re-order just by
adjusting the enum; no other code is affected. Help texts and default
values are much easier to verify and adjust with the symbolic
references.
[ see 'git diff --word-diff' for the essence of the change ]
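A sketch of the pattern; the member names and help texts are
illustrative only:

    enum option_index {
        OPT_COL_FMTS,
        OPT_SAMPLERATE,
        OPT_HEADER,
        OPT_MAX,    /* implicit count, follows the enum */
    };

    static struct sr_option options[OPT_MAX + 1] = {
        [OPT_COL_FMTS] = { "column_formats", "Column format specs",
            "Number and types of input file columns", NULL, NULL, },
        /* ... the other options, by their symbolic index ... */
        [OPT_MAX] = ALL_ZERO,
    };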
Use size_t for things that get counted: column indices, channel numbers
(line numbers already used size_t). De-anonymize an enum to avoid 'int'
where it gets referenced. Adjust printf(3) format strings. Get unsigned
values from option lookups (stick with 32 bits, which should be acceptable for
spreadsheet columns and channel counts).
Address other minor nits while we are here: Also terminate the last item
in an enum declaration. Add a doxygen comment for parse_line(). Rename a
parameter to achieve tabular doc text layout.
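For example, counted quantities and their conversion specifiers now
agree:

    size_t in_column, line_number;

    sr_err("Invalid text in column %zu of line %zu.",
        in_column, line_number);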
Rephrase the #include statements in the CSV input module. "config" is
not a system header but is provided by the application source code.
Separate the config, system, and application groups (their order is
essential). Alpha-sort the files within each group for simplified
maintenance.
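The resulting layout, roughly (the system headers are illustrative):

    #include <config.h>

    #include <glib.h>
    #include <stdlib.h>
    #include <string.h>

    #include <libsigrok/libsigrok.h>
    #include "libsigrok-internal.h"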
Do for the CSV input module what commit 08f8421a9e did for VCD. Check
the channel list for consistency across re-imports of the same file.
This addresses the CSV part of bug #1241.
The cleanup() routine gets invoked upon shutdown, as well as before
re-importing another file. The cleanup() routine must not release
resources which get allocated in the init() routine, as the init()
routine won't run again in the module's lifetime. The cleanup() routine
must void those context fields which get evaluated in the next receive()
calls.
When the processing of a text line's columns detected an error, the
loop was aborted and the routine was left, but allocated resources were
freed. Fix the remaining memory leaks in the error code paths.
The constant at the top of the source file is the number of samples in a
datafeed submission chunk. The previous implementation erroneously made
it the size in bytes. There is no need to round down the buffer size
according to the unit size.
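In a sketch (the constant's value is illustrative):

    #define CHUNK_SIZE  (128 * 1024)    /* a sample count, not bytes */

    unit_size = (num_channels + 7) / 8; /* bytes per sample */
    buffer = g_malloc0(CHUNK_SIZE * unit_size);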
The previous implementation sent one sigrok session datafeed packet per
processed CSV line. This is rather inefficient for the CSV input module,
and triggers a dramatic performance loss in the srzip output format.
Communicate up to 128K samples within one datafeed packet. This fixes
bug #695.
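A sketch of the submission; struct sr_datafeed_logic is the regular
libsigrok payload, the helper name is an assumption:

    static void flush_logic_samples(const struct sr_input *in,
        const uint8_t *buffer, size_t unit_size, size_t count)
    {
        struct sr_datafeed_packet packet;
        struct sr_datafeed_logic logic;

        if (!count)
            return;

        logic.unitsize = unit_size;
        logic.length = count * unit_size;
        logic.data = (void *)buffer;
        packet.type = SR_DF_LOGIC;
        packet.payload = &logic;
        sr_session_send(in->sdi, &packet);
    }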
Factor out repeated calculation of the unit size which is derived from
the channel count. Fix a minor memory leak in an error path while we are
here. (Other memory leaks in rare error paths remain with this commit.)
Suggested-By: Elias Oenal <sigrok@eliasoenal.com>
On the Windows platform it appears to be popular to _not_ terminate the
very last line in a text file. Which results in an unmet constraint in
the CSV input module and an internal exception in PulseView which aborts
program execution.
Cope with the absence of the text line termination sequence at the very
end of the input stream. Keep all other checks in place, such that only
completely received text lines get processed.
This fixes bug #635.
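A sketch of the workaround in the module's end() path (context and
field names are assumptions):

    /* Have the last line processed even without a trailing EOL. */
    if (in->buf->len && !g_str_has_suffix(in->buf->str, inc->termination))
        g_string_append(in->buf, inc->termination);
    ret = process_buffer(in);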
"Document" the current state of the implementation in the CSV input
module's source code. Discuss how text handling is non-trivial, which
approaches are available and how they have drawbacks.
Mention the lack of support for the import of analog data as well.
The CSV input module supports variable length end-of-line encodings
(either CRLF, or CR, or LF). When a bunch of accumulated text lines got
processed, do skip the corresponding number of characters after the end
of the last processed line.
This fixes one of the issues discussed in bug #635.
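Roughly, where 'p' points behind the last processed line (names are
assumptions):

    /* Consume processed lines *and* their variable length termination. */
    processed = p - in->buf->str + strlen(inc->termination);
    g_string_erase(in->buf, 0, processed);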
The input module runs receive() and end() invocations which end up
calling process_buffer(). It's perfectly legal to call the process
routine with an empty accumulation buffer, especially when the process
routine was called from end().
This fixes a condition where PulseView raised a fatal error at the end
of a completed successful import.
Reported-By: Sergey Alirzaev <zl29ah@gmail.com>
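The essential guard in a sketch:

    static int process_buffer(struct sr_input *in)
    {
        if (!in->buf->len)
            return SR_OK;   /* legal, e.g. when invoked from end() */

        /* ... process accumulated, complete text lines ... */
        return SR_OK;
    }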
Move an independent test for single/multi column operation out of a code
path that checked for and then processed text lines. This commit does
not change behaviour, but prepares a subsequent commit.
Factor out a magic string literal which held a delimiter set yet could
be mistaken for an (assumed) fixed termination string. Concentrate the
determination of the end-of-line text encoding as well as the resulting
set of possible delimiters in one nearby location. The symbolic name
for the delimiter set eliminates any doubt about its purpose.
Some of the standard helper functions take a log prefix parameter that is
used when printing messages. This log prefix is almost always identical to
the name field in the driver's sr_dev_driver struct. The only
exceptions are drivers which register multiple sr_dev_driver structs.
Instead of passing the log prefix as a parameter simply use the driver's
name. This simplifies the API, gives consistent behaviour between different
drivers and also makes it easier to identify where the message
originates when a driver registers multiple sr_dev_driver structs.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
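A sketch of the simplified approach (the exact std_init() signature may
differ); di->name takes the place of the former log prefix parameter:

    SR_PRIV int std_init(struct sr_dev_driver *di, struct sr_context *sr_ctx)
    {
        struct drv_context *drvc;

        drvc = g_malloc0(sizeof(*drvc));
        drvc->sr_ctx = sr_ctx;
        di->context = drvc;
        sr_dbg("%s: Initialized.", di->name);

        return SR_OK;
    }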
This makes the code shorter, simpler, and more consistent, and also
ensures that the (same) debug messages are always emitted, that the
packet.payload field is consistently set to NULL, and so on.
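For example, the end-of-stream helper then reads roughly:

    SR_PRIV int std_session_send_df_end(const struct sr_dev_inst *sdi)
    {
        struct sr_datafeed_packet packet;

        packet.type = SR_DF_END;
        packet.payload = NULL;  /* consistently NULL */

        sr_dbg("%s: Sending SR_DF_END packet.", sdi->driver->name);

        return sr_session_send(sdi, &packet);
    }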
Since Autoconf places some important feature flags only into the
configuration header, it is necessary to include it globally to
guarantee a consistent build.
A few of these were pretty serious, like missing arguments,
passing integers where a string was expected, and so on.
In some places, change the types used by the code rather than
just the format strings.
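Illustrative examples of the kind of issue addressed (not verbatim from
the diff):

    sr_err("Unexpected sample size: %zu.");        /* argument missing */
    sr_err("Unexpected sample size: %zu.", size);  /* fixed */

    sr_dbg("Channel: %s.", ch->index);             /* int fed to %s */
    sr_dbg("Channel: %s.", ch->name);              /* fixed */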