Commit 26fe313

[metrics] support for space separator
1 parent ad9aac0 commit 26fe313

65 files changed

Lines changed: 1479 additions & 511 deletions

NEWS.md

Lines changed: 15 additions & 7 deletions
@@ -10,13 +10,6 @@ Features:
   The `z`/`Z` keys can also be used to increase/decrease the
   context by one in the LOG, TEXT, and TIMELINE views. Context
   lines are styled using the new `context-line` theme style.
-* Added a log format for the `fsck_apfs` and `fsck_hfs` tools on
-  macOS, covering both the `started`/`completed` lifecycle lines
-  and legacy `run` entries. This replaces the previous
-  `fsck_hfs_log` format, which only matched the start lines.
-  The new format exposes `device`, `tool`, and `action` fields,
-  groups messages by device in the TIMELINE view, and highlights
-  `error:` lines and `FILESYSTEM CLEAN` status messages.
 * Added a built-in `metrics_log` format that recognizes CSV
   files whose first column header is `Time`/`Timestamp`/`ts`/
   `Date...` and whose subsequent rows begin with a parseable
@@ -43,6 +36,21 @@ Features:
     target any table, including search-table columns.
   - `:clear-timeline-metric <label>` removes metrics.
   Up to four metrics can be added.
+* Added support for "tabular" formats (e.g. CSV, TSV).
+  The format definition for this type of file sets
+  `file-type` to `tabular` and then defines the known
+  columns. When opening a file of this type, the
+  separator will be automatically detected and the header
+  compared against the columns defined in the tabular
+  formats. If a good match is found, it will be used as
+  the format for the file.
+* Added a log format for the `fsck_apfs` and `fsck_hfs` tools on
+  macOS, covering both the `started`/`completed` lifecycle lines
+  and legacy `run` entries. This replaces the previous
+  `fsck_hfs_log` format, which only matched the start lines.
+  The new format exposes `device`, `tool`, and `action` fields,
+  groups messages by device in the TIMELINE view, and highlights
+  `error:` lines and `FILESYSTEM CLEAN` status messages.
 * Log format value definitions now accept a `unit` object
   with `suffix` and `divisor` properties. `suffix` specifies
   how numeric fields are humanized. `divisor` normalizes

docs/schemas/format-v1.schema.json

Lines changed: 1 addition & 1 deletion
@@ -943,7 +943,7 @@
     "enum": [
         "text",
         "json",
-        "csv"
+        "tabular"
     ]
 },
 "max-unrecognized-lines": {

docs/source/formats.rst

Lines changed: 30 additions & 1 deletion
@@ -55,6 +55,23 @@ See the following formats that are built into lnav as examples:
 * `cloudflare_log.json <https://github.com/tstack/lnav/blob/master/src/formats/cloudflare_log.json>`_
 * `github_events_log.json <https://github.com/tstack/lnav/blob/master/src/formats/github_events_log.json>`_
 
+.. _tabular_format:
+
+Tabular files
+-------------
+
+Delimited files (CSV, TSV, and similar) can be parsed by declaring
+a format with :code:`"file-type": "tabular"`. The first row of the
+file must be a header naming each column; the separator is
+auto-detected from the header and is one of comma, tab, semicolon,
+pipe (:code:`|`), or runs of two-or-more spaces.
+
+Each column is mapped to a :code:`value` definition by name. The
+standard field bindings work the same as for other types of formats.
+A row may use a single :code:`-` or :code:`--` to indicate that
+:code:`opid-field` or :code:`thread-id-field` is absent for that
+row.
+
 logfmt
 ------
 
@@ -194,7 +211,19 @@ object with the following fields:
       The `PCRE2 <http://www.pcre.org>`_ library is used by **lnav** to do all
       regular expression matching.
 
-  :json: True if each log line is JSON-encoded.
+  :file-type: The shape of the file. One of:
+
+    :text: Plain-text log files matched by one or more
+      :code:`regex` patterns. This is the default.
+    :json: Each line is a JSON object (JSON-lines). The
+      :code:`value` definitions name the JSON properties to
+      extract and :code:`line-format` controls how messages
+      are rendered.
+    :tabular: A delimited file whose first row is a header
+      naming each column. See :ref:`tabular_format`.
+
+  :json: (Deprecated, use :code:`"file-type": "json"` instead.) True if
+    each log line is JSON-encoded.
 
   :converter: An object that describes how an input file can be detected and
     then converted to a form that can be interpreted by **lnav**. For

src/base/intern_string.cc

Lines changed: 41 additions & 2 deletions
@@ -776,8 +776,8 @@ string_fragment::transform_codepoints(
         }
         auto cp = read_res.unwrap();
         auto new_cp = xform(cp);
-        ww898::utf::utf8::write(
-            new_cp, [&out](const char b) { out.push_back(b); });
+        ww898::utf::utf8::write(new_cp,
+                                [&out](const char b) { out.push_back(b); });
     }
     return out;
 }
@@ -817,6 +817,45 @@ string_fragment::column_width() const
     return retval;
 }
 
+std::optional<uint32_t>
+string_fragment::cursor_impl::lookahead() const
+{
+    if (this->ci_next_index >= this->ci_end) {
+        return std::nullopt;
+    }
+
+    int32_t index = this->ci_next_index;
+    auto read_res = ww898::utf::utf8::read(
+        [this, &index]() { return this->ci_string[index++]; });
+    if (read_res.isErr()) {
+        return this->ci_string[this->ci_next_index];
+    }
+    return read_res.unwrap();
+}
+
+std::optional<uint32_t>
+string_fragment::cursor_impl::next()
+{
+    this->ci_lookbehind = this->ci_next_lookbehind;
+    if (this->ci_next_index >= this->ci_end) {
+        return std::nullopt;
+    }
+
+    int32_t index = this->ci_next_index;
+    auto read_res = ww898::utf::utf8::read(
+        [this, &index]() { return this->ci_string[index++]; });
+    uint32_t retval;
+    if (read_res.isErr()) {
+        retval = this->ci_string[this->ci_next_index];
+        this->ci_next_index += 1;
+    } else {
+        retval = read_res.unwrap();
+        this->ci_next_index = index;
+    }
+    this->ci_next_lookbehind = retval;
+    return retval;
+}
+
 struct single_producer : string_fragment_producer {
     explicit single_producer(const string_fragment& sf) : sp_frag(sf) {}
 

src/base/intern_string.hh

Lines changed: 30 additions & 0 deletions
@@ -783,6 +783,36 @@ struct string_fragment {
 
     uint64_t bloom_bits() const;
 
+    class cursor_impl {
+    public:
+        std::optional<uint32_t> lookbehind() const
+        {
+            return this->ci_lookbehind;
+        }
+
+        std::optional<uint32_t> lookahead() const;
+
+        std::optional<uint32_t> next();
+
+    private:
+        friend string_fragment;
+
+        explicit cursor_impl(const string_fragment& parent)
+            : ci_string(parent.sf_string),
+              ci_end(parent.sf_end),
+              ci_next_index(parent.sf_begin)
+        {
+        }
+
+        const char* ci_string;
+        int32_t ci_end;
+        int32_t ci_next_index;
+        std::optional<uint32_t> ci_lookbehind;
+        std::optional<uint32_t> ci_next_lookbehind;
+    };
+
+    cursor_impl cursor() const { return cursor_impl(*this); }
+
     const char* sf_string;
     int32_t sf_begin;
     int32_t sf_end;

src/base/intern_string.tests.cc

Lines changed: 31 additions & 0 deletions
@@ -453,3 +453,34 @@ TEST_CASE("string_fragment::word helpers with wide chars")
         CHECK(sf.curr_word(4) == std::optional<int>(0));
     }
 }
+
+TEST_CASE("string_fragment::cursor")
+{
+    {
+        const auto input = ""_frag;
+        auto cursor = input.cursor();
+        CHECK_FALSE(cursor.lookbehind().has_value());
+        CHECK_FALSE(cursor.lookahead().has_value());
+        CHECK_FALSE(cursor.next().has_value());
+    }
+    {
+        const auto input = "hello"_frag;
+        auto cursor = input.cursor();
+        CHECK_FALSE(cursor.lookbehind().has_value());
+        CHECK('h' == cursor.lookahead());
+        CHECK('h' == cursor.lookahead());
+        CHECK('h' == cursor.next());
+        CHECK_FALSE(cursor.lookbehind().has_value());
+        CHECK('e' == cursor.lookahead());
+        CHECK('e' == cursor.next());
+        CHECK('h' == cursor.lookbehind());
+        CHECK('l' == cursor.next());
+        CHECK('l' == cursor.next());
+        CHECK('o' == cursor.next());
+        CHECK_FALSE(cursor.next().has_value());
+        CHECK('o' == cursor.lookbehind());
+        CHECK_FALSE(cursor.next().has_value());
+        CHECK('o' == cursor.lookbehind());
+        CHECK_FALSE(cursor.lookahead().has_value());
+    }
+}
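
Review note: the test above pins down a subtle point of the cursor contract. After `next()` returns a character, `lookbehind()` reports the character *before* it, not the one just returned. The following is a minimal byte-oriented sketch of that contract, not the committed implementation: `byte_cursor` and its `bc_` members are hypothetical names, and it assumes plain single-byte input rather than the UTF-8 decoding that `cursor_impl` performs through `ww898::utf::utf8::read`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical, byte-oriented stand-in for string_fragment::cursor_impl.
// Assumption: every character is one byte, so no UTF-8 decoding is done.
class byte_cursor {
public:
    explicit byte_cursor(std::string_view text) : bc_text(text) {}

    // The character that preceded the one most recently returned by next().
    std::optional<char> lookbehind() const { return this->bc_lookbehind; }

    // Peek at the upcoming character without consuming it.
    std::optional<char> lookahead() const
    {
        if (this->bc_next_index >= this->bc_text.size()) {
            return std::nullopt;
        }
        return this->bc_text[this->bc_next_index];
    }

    // Consume and return the upcoming character, or nullopt at the end.
    std::optional<char> next()
    {
        // Invariant: the index never runs past the end of the text.
        assert(this->bc_next_index <= this->bc_text.size());

        // Shift the lookbehind first, so it trails the returned value by one.
        this->bc_lookbehind = this->bc_next_lookbehind;
        if (this->bc_next_index >= this->bc_text.size()) {
            return std::nullopt;
        }
        const char retval = this->bc_text[this->bc_next_index];
        this->bc_next_index += 1;
        this->bc_next_lookbehind = retval;
        return retval;
    }

private:
    std::string_view bc_text;
    std::size_t bc_next_index{0};
    std::optional<char> bc_lookbehind;
    std::optional<char> bc_next_lookbehind;
};
```

The two-slot design (`bc_lookbehind` plus `bc_next_lookbehind`) is what lets a caller hold the current character from `next()` while still asking what came before and what comes after it, which is the access pattern the separator detection in this commit relies on.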

src/base/separated_string.cc

Lines changed: 109 additions & 2 deletions
@@ -40,6 +40,97 @@ is_suffix_char(char ch)
     return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch == '%';
 }
 
+std::optional<char>
+separated_string::detect_separator(const string_fragment& str)
+{
+    struct sep_state {
+        char ss_char;
+        size_t ss_count{0};
+    };
+
+    size_t comma = 0;
+    size_t tab = 0;
+    size_t semi = 0;
+    size_t vbar = 0;
+    size_t space = 0;
+
+    auto in_quote = false;
+    auto has_leading_spaces = false;
+
+    auto cur = str.cursor();
+    while (cur.lookahead() == ' ') {
+        (void) cur.next();
+        has_leading_spaces = true;
+    }
+    while (true) {
+        auto ch = cur.next();
+        if (!ch) {
+            break;
+        }
+
+        auto behind = cur.lookbehind();
+        auto ahead = cur.lookahead();
+        if (in_quote) {
+            if (ch == '"') {
+                in_quote = false;
+            }
+        } else if (ch == '"') {
+            in_quote = true;
+        } else if (ch == '\t') {
+            if (behind && behind != '\t') {
+                tab += 1;
+            }
+        } else if (ch == ',') {
+            if (behind && ahead && behind != ' ' && ahead != ' ') {
+                comma += 1;
+            }
+        } else if (ch == ';') {
+            if (behind && ahead && behind != ' ' && ahead != ' ') {
+                semi += 1;
+            }
+        } else if (ch == '|') {
+            if (behind && ahead && behind != ' ' && ahead != ' ') {
+                vbar += 1;
+            }
+        } else if (ch == ' ') {
+            if (behind && ahead && behind != ' ' && ahead == ' ') {
+                space += 1;
+            }
+        }
+    }
+
+    if (has_leading_spaces) {
+        if (space > 0) {
+            return ' ';
+        }
+        return std::nullopt;
+    }
+
+    if (in_quote) {
+        return std::nullopt;
+    }
+
+    std::array<sep_state, 5> states = {{
+        {',', comma},
+        {'\t', tab},
+        {';', semi},
+        {'|', vbar},
+        {' ', space},
+    }};
+
+    std::sort(states.begin(),
+              states.end(),
+              [](const sep_state& a, const sep_state& b) {
+                  return a.ss_count > b.ss_count;
+              });
+
+    if (states[0].ss_count == 0 || states[0].ss_count == states[1].ss_count) {
+        return std::nullopt;
+    }
+
+    return states[0].ss_char;
+}
+
 std::string
 separated_string::unescape_quoted(string_fragment sf)
 {
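
Review note: to make the counting rules in `detect_separator()` easier to try out in isolation, here is a standalone sketch of the same heuristic over a `std::string_view`. It is an approximation for illustration only: `guess_separator` and the `candidate` struct are hypothetical names, and it walks bytes rather than code points. The rules mirror the hunk above: quoted text is skipped, a comma/semicolon/pipe counts only when it has a non-space neighbor on both sides, a space counts only when it starts a run of two or more, and a tie or an all-zero tally yields no answer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical standalone approximation of separated_string::detect_separator.
// Assumption: the header is treated as bytes; no UTF-8 decoding is attempted.
std::optional<char> guess_separator(std::string_view header)
{
    struct candidate {
        char c_char;
        std::size_t c_count;
    };
    std::array<candidate, 5> cands
        = {{{',', 0}, {'\t', 0}, {';', 0}, {'|', 0}, {' ', 0}}};
    auto bump = [&cands](char ch) {
        for (auto& cand : cands) {
            if (cand.c_char == ch) {
                cand.c_count += 1;
            }
        }
    };

    // Leading spaces suggest a space-aligned table.
    std::size_t i = 0;
    while (i < header.size() && header[i] == ' ') {
        i += 1;
    }
    const bool has_leading_spaces = i > 0;

    bool in_quote = false;
    for (; i < header.size(); ++i) {
        const char ch = header[i];
        const bool has_behind = i > 0;
        const bool has_ahead = i + 1 < header.size();
        const char behind = has_behind ? header[i - 1] : '\0';
        const char ahead = has_ahead ? header[i + 1] : '\0';

        if (in_quote) {
            in_quote = (ch != '"');
        } else if (ch == '"') {
            in_quote = true;
        } else if (ch == '\t') {
            if (has_behind && behind != '\t') {
                bump(ch);
            }
        } else if (ch == ',' || ch == ';' || ch == '|') {
            if (has_behind && has_ahead && behind != ' ' && ahead != ' ') {
                bump(ch);
            }
        } else if (ch == ' ') {
            // Only the first space of a run of two or more is counted.
            if (has_behind && has_ahead && behind != ' ' && ahead == ' ') {
                bump(ch);
            }
        }
    }

    if (has_leading_spaces) {
        if (cands[4].c_count > 0) {
            return ' ';
        }
        return std::nullopt;
    }
    if (in_quote) {
        return std::nullopt;  // unterminated quote: refuse to guess
    }

    std::sort(cands.begin(), cands.end(),
              [](const candidate& a, const candidate& b) {
                  return a.c_count > b.c_count;
              });
    assert(cands[0].c_count >= cands[1].c_count);
    if (cands[0].c_count == 0 || cands[0].c_count == cands[1].c_count) {
        return std::nullopt;
    }
    return cands[0].c_char;
}
```

With these rules, a header such as `time,cpu,mem` resolves to a comma, while `a, b, c` resolves to nothing because every comma is followed by a space and no space starts a run.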
@@ -108,7 +199,22 @@ separated_string::iterator::update()
     const char* p = this->i_pos;
     while (p < data_end) {
         if (!in_quotes && *p == sep_ch) {
-            break;
+            if (sep_ch == ' ' && p + 1 < data_end) {
+                if ((!this->i_parent.ss_expected_count
+                     || this->i_index + 1
+                         < this->i_parent.ss_expected_count.value())
+                    && p + 1 < data_end && *(p + 1) == ' ')
+                {
+                    while (p + 1 < data_end && *(p + 1) == ' ') {
+                        p += 1;
+                    }
+                    break;
+                }
+                state = TRAIL_WS;
+                p += 1;
+            } else {
+                break;
+            }
         }
         const char c = *p;
 
@@ -206,7 +312,8 @@ separated_string::iterator::update()
     // end of input, convention says one more empty cell should be
     // emitted. Defer it to the next update() call via
     // i_pending_ghost so the user still sees the current cell first.
-    if (p < data_end && p + 1 == data_end) {
+    if (p < data_end && p + 1 == data_end && this->i_parent.ss_separator != ' ')
+    {
         this->i_pending_ghost = true;
     }
     this->i_next_pos = (p < data_end) ? p + 1 : data_end;
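
Review note: the two `update()` hunks above change how cells end when the separator is a space. A run of two or more spaces ends a cell, a single space stays inside the cell, and once the expected number of columns is reached the rest of the line belongs to the last cell. The sketch below illustrates that splitting rule in isolation. It is not the iterator from the commit: `split_on_space_runs` is a hypothetical helper, it ignores quoting, and it trims leading and trailing spaces up front, which is an assumption of this sketch rather than something the hunks show.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical helper: split a space-aligned row into cells.
// Assumptions: no quoting support; leading/trailing spaces are trimmed.
// The returned views point into `line`, so `line` must outlive them.
std::vector<std::string_view> split_on_space_runs(
    std::string_view line,
    std::optional<std::size_t> expected_count = std::nullopt)
{
    assert(!expected_count || *expected_count > 0);

    std::vector<std::string_view> cells;
    const auto first = line.find_first_not_of(' ');
    if (first == std::string_view::npos) {
        return cells;  // empty or all-space line
    }
    const auto last = line.find_last_not_of(' ');
    line = line.substr(first, last - first + 1);

    std::size_t pos = 0;
    while (pos < line.size()) {
        // Once the expected column count is reached, the remainder of the
        // line is kept as a single cell, even if it contains space runs.
        const bool last_column
            = expected_count && cells.size() + 1 >= *expected_count;

        std::size_t end = pos;
        while (end < line.size()) {
            const bool run_start = line[end] == ' ' && end + 1 < line.size()
                && line[end + 1] == ' ';
            if (!last_column && run_start) {
                break;  // two or more spaces: the cell ends here
            }
            end += 1;  // a lone space is part of the cell
        }
        cells.push_back(line.substr(pos, end - pos));

        // Skip the whole run of separator spaces.
        pos = end;
        while (pos < line.size() && line[pos] == ' ') {
            pos += 1;
        }
    }
    return cells;
}
```

For example, `split_on_space_runs("pid  name   command line")` yields the three cells `pid`, `name`, and `command line`, and passing an expected count of 2 for `1  sh  -c  ls` keeps `sh  -c  ls` together as the final cell. The same reasoning explains the second hunk: with a space separator, a trailing separator character does not imply a trailing empty ("ghost") cell.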

src/base/separated_string.hh

Lines changed: 4 additions & 1 deletion
@@ -55,9 +55,12 @@ struct separated_string {
         other,  // anything else — text, identifiers, JSON blobs, etc.
     };
 
+    static std::optional<char> detect_separator(const string_fragment& str);
+
     const char* ss_str;
     size_t ss_len;
     char ss_separator{','};
+    std::optional<size_t> ss_expected_count;
 
     separated_string(const char* str, size_t len) : ss_str(str), ss_len(len) {}
 

@@ -119,9 +122,9 @@ struct separated_string {
 
         iterator& operator++()
         {
+            this->i_index += 1;
             this->i_pos = this->i_next_pos;
             this->update();
-            this->i_index += 1;
 
             return *this;
         }
