Verb Edge

Verb Edge is the Edge type in a GEUL stream that represents predication and actions. It classifies 13,767 WordNet verbs into 10 Primitives and 68 Sub-primitives, then generates a 16-bit codebook via Sub-primitive-level Huffman coding.

Sub-documents

Document	Description
Semantic Role	16 Semantic Roles (4-bit encoding)
Qualifier	14 qualifiers including evidentiality, mood, tense, aspect

Verb Hierarchy

10 Primitive (top-level semantic categories)
 ├── BE          ├── PERCEIVE    ├── FEEL
 ├── THINK       ├── CHANGE      ├── CAUSE
 ├── MOVE        ├── COMMUNICATE ├── TRANSFER
 └── SOCIAL
  → 68 Sub-primitive (intermediate classification)
    → 559 Root Verb (root verbs)
      → 13,767 Leaf Verb (all WordNet verbs)

Primitives (major categories) are conceptual groupings only and receive no bits in the encoding.
The 68 Sub-primitives receive variable-length codes based on how many verbs each contains.
Sub-primitives with more verbs get shorter codes, ranging from 4 to 8 bits.

Verb Edge Packet Types

All three packet types – Tiny, Short, and Full – share the same 16-bit verb body in their last word.

Field	Tiny	Short	Full
Words	2 (32 bits)	3 (48 bits)	5 (80 bits)
Participants	16 patterns	512 patterns	19-bit flags
Qualifiers	7 patterns	3,640 patterns	27 bits
Verb body	16 bits	16 bits	16 bits
Expected ratio	90%	7%	3%

Average packet size: 0.90 × 2 + 0.07 × 3 + 0.03 × 5 = 2.16 words
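The expected size can be reproduced with a few lines of Python. This is a minimal sketch; the names `PACKET_MIX` and `average_words` are illustrative and not part of the GEUL specification.

```python
# Expected mix of packet types: name -> (words per packet, expected share).
PACKET_MIX = {"Tiny": (2, 0.90), "Short": (3, 0.07), "Full": (5, 0.03)}

def average_words(mix):
    """Return the expected packet size in 16-bit words."""
    return sum(words * share for words, share in mix.values())

avg = average_words(PACKET_MIX)
print(f"{avg:.2f} words = {avg * 16:.2f} bits")
```

With the shares from the table this gives 2.16 words, or about 34.56 bits per Verb Edge.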

Tiny Verb Edge (2 words)

1st WORD:  [Prefix 5bit] [Target×Pattern 11bit]
2nd WORD:  [Verb body 16bit]

Target×Pattern: 18 targets × 113 patterns = 2,034 combinations, which fit in 11 bits (2,048 values)
Patterns: 16 participant patterns × 7 qualifier patterns = 112, plus 1 reserved = 113
Coverage ~90%
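One way the 11-bit Target×Pattern field could be formed is as a mixed-radix index. The sketch below assumes the combined value is target × 113 + pattern and that the prefix occupies the high 5 bits of the word; this document does not state the actual ordering, and the function names are hypothetical.

```python
N_TARGETS = 18     # number of targets, from the text above
N_PATTERNS = 113   # 16 x 7 patterns + 1 reserved

def pack_tiny_first_word(prefix, target, pattern):
    """Pack [prefix 5 bits][Target x Pattern 11 bits] into one 16-bit word.

    Assumption: combined index = target * 113 + pattern.
    """
    if not 0 <= prefix < 32:
        raise ValueError("prefix must fit in 5 bits")
    if not 0 <= target < N_TARGETS:
        raise ValueError("target out of range")
    if not 0 <= pattern < N_PATTERNS:
        raise ValueError("pattern out of range")
    combined = target * N_PATTERNS + pattern   # 0..2033, fits in 11 bits
    return (prefix << 11) | combined

def unpack_tiny_first_word(word):
    """Inverse of pack_tiny_first_word: returns (prefix, target, pattern)."""
    prefix = word >> 11
    target, pattern = divmod(word & 0x7FF, N_PATTERNS)
    return prefix, target, pattern
```

Because 18 × 113 = 2,034 is below 2,048, every (target, pattern) pair maps to a distinct 11-bit value and 14 values remain unused.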

Short Verb Edge (3 words)

1st WORD:  [Prefix 6bit] [Type 1bit=0] [ParticipantPattern 9bit]
2nd WORD:  [Target×QualifierPattern 16bit]
3rd WORD:  [Verb body 16bit]
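The first two words of a Short Verb Edge could be packed as follows. This is a sketch under assumptions: fields are laid out most-significant-bit first in the order shown in the diagram, and the Target×QualifierPattern value is target × 3,640 + qualifier pattern, with 18 targets as in the Tiny layout. The function names are hypothetical.

```python
N_TARGETS = 18                # same target count as the Tiny layout
N_QUALIFIER_PATTERNS = 3640   # from the packet type table

def pack_short_header(prefix, participant_pattern, target, qualifier_pattern):
    """Return (word1, word2) of a Short Verb Edge.

    word1 = [prefix 6 bits][type bit = 0][participant pattern 9 bits]
    word2 = target * 3640 + qualifier_pattern (assumed ordering)
    """
    if not 0 <= prefix < 64:
        raise ValueError("prefix must fit in 6 bits")
    if not 0 <= participant_pattern < 512:
        raise ValueError("participant pattern must fit in 9 bits")
    if not 0 <= target < N_TARGETS:
        raise ValueError("target out of range")
    if not 0 <= qualifier_pattern < N_QUALIFIER_PATTERNS:
        raise ValueError("qualifier pattern out of range")
    word1 = (prefix << 10) | (0 << 9) | participant_pattern
    word2 = target * N_QUALIFIER_PATTERNS + qualifier_pattern
    return word1, word2

def unpack_short_header(word1, word2):
    """Inverse of pack_short_header."""
    target, qualifier_pattern = divmod(word2, N_QUALIFIER_PATTERNS)
    return word1 >> 10, word1 & 0x1FF, target, qualifier_pattern
```

Under this assumption the second word uses 18 × 3,640 = 65,520 of the 65,536 available values.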

Full Verb Edge (5 words)

1st WORD:  [Prefix 6bit] [Type 1bit=1] [TargetParticipant 5bit] [ParticipantFlags 4bit]
2nd+3rd:   [ParticipantFlags 15bit] [Qualifier 17bit]
4th WORD:  [Qualifier 10bit] [Reserved 6bit]
5th WORD:  [Verb body 16bit]
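The Full layout splits two fields across word boundaries: the 19 participant flags (4 + 15 bits) and the 27 qualifier bits (17 + 10 bits). The sketch below shows one way to pack the four header words, assuming the high-order part of each split field comes first and the reserved bits are zero. Those choices and the function names are assumptions, not part of this document.

```python
def pack_full_header(prefix, target_participant, participant_flags, qualifier):
    """Return the first four 16-bit words of a Full Verb Edge."""
    if not 0 <= prefix < (1 << 6):
        raise ValueError("prefix must fit in 6 bits")
    if not 0 <= target_participant < (1 << 5):
        raise ValueError("target participant must fit in 5 bits")
    if not 0 <= participant_flags < (1 << 19):
        raise ValueError("participant flags must fit in 19 bits")
    if not 0 <= qualifier < (1 << 27):
        raise ValueError("qualifier must fit in 27 bits")
    flags_hi, flags_lo = participant_flags >> 15, participant_flags & 0x7FFF
    qual_hi, qual_lo = qualifier >> 10, qualifier & 0x3FF
    # word1 = [prefix 6][type = 1][target participant 5][flags, top 4 bits]
    word1 = (prefix << 10) | (1 << 9) | (target_participant << 4) | flags_hi
    # words 2 and 3 together = [flags, low 15 bits][qualifier, top 17 bits]
    dword = (flags_lo << 17) | qual_hi
    word2, word3 = dword >> 16, dword & 0xFFFF
    # word4 = [qualifier, low 10 bits][reserved 6 bits = 0]
    word4 = qual_lo << 6
    return [word1, word2, word3, word4]

def unpack_full_header(words):
    """Inverse of pack_full_header."""
    word1, word2, word3, word4 = words
    dword = (word2 << 16) | word3
    flags = ((word1 & 0xF) << 15) | (dword >> 17)
    qualifier = ((dword & 0x1FFFF) << 10) | (word4 >> 6)
    return word1 >> 10, (word1 >> 4) & 0x1F, flags, qualifier
```

The fifth word, the 16-bit verb body, is appended unchanged after these four.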

16-bit Verb Body

┌─────────────────────────┬────────────────────────────┐
│   sub_primitive code    │     DFS index within tree  │
│   (4-8 bits, Huffman)   │     (8-12 bits)            │
└─────────────────────────┴────────────────────────────┘

sub_primitive code: 4-8 bits variable (Huffman code)
DFS index: Identifies individual verbs within the Sub-primitive
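A minimal sketch of packing and unpacking the verb body follows. It assumes the Sub-primitive code occupies the high-order bits and the DFS index fills all remaining low-order bits. The toy `CODEBOOK` holds only the four 4-bit codes listed in this document; the full 68-entry codebook and the function names are not given here and are placeholders.

```python
# Partial codebook: only the four 4-bit codes published in this document.
CODEBOOK = {
    "0000": "CHANGE-TRANSFORM",
    "0001": "CAUSE-USE",
    "0010": "MOVE-DISPLACE",
    "0011": "MOVE-GO",
}

def pack_verb_body(code, dfs_index):
    """Pack a prefix code (bit string, 4-8 bits) and a DFS index into 16 bits."""
    if not 4 <= len(code) <= 8 or set(code) - {"0", "1"}:
        raise ValueError("code must be a 4-8 character bit string")
    index_bits = 16 - len(code)
    if not 0 <= dfs_index < (1 << index_bits):
        raise ValueError("DFS index does not fit in the remaining bits")
    return (int(code, 2) << index_bits) | dfs_index

def unpack_verb_body(body, codebook=CODEBOOK):
    """Return (sub_primitive, dfs_index) by matching the leading bits."""
    bits = format(body, "016b")
    for length in range(4, 9):          # codes are 4 to 8 bits long
        name = codebook.get(bits[:length])
        if name is not None:            # prefix code: first match is the only match
            return name, int(bits[length:], 2)
    raise ValueError("no code in the codebook matches this verb body")

body = pack_verb_body("0011", 5)
print(hex(body), unpack_verb_body(body))
```

Because the codes form a prefix code, the decoder can find the boundary between code and index without a separate length field.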

Code Length Distribution

Code length	Sub-primitives	Total verbs	Ratio
4 bits	4	6,388	46.4%
5 bits	4	2,479	18.0%
6 bits	8	2,321	16.9%
7 bits	16	1,786	13.0%
8 bits	36	813	5.9%
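Whether a set of code lengths can form a prefix code at all can be checked with the Kraft inequality: the sum of 2^(-length) over all codewords must not exceed 1. The sketch below applies that check to the counts in the table; the names are illustrative.

```python
from fractions import Fraction

# (code length in bits, number of Sub-primitives with that length), from the table.
LENGTH_COUNTS = [(4, 4), (5, 4), (6, 8), (7, 16), (8, 36)]

def kraft_sum(length_counts):
    """Sum of 2^-length over all codewords; a prefix code exists iff this is <= 1."""
    return sum(Fraction(count, 2 ** length) for length, count in length_counts)

total_codes = sum(count for _, count in LENGTH_COUNTS)
print(total_codes, kraft_sum(LENGTH_COUNTS))   # 68 49/64
```

The sum is 49/64, so the 68 lengths are realizable as a prefix code with some code space left unused.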

DFS Index Bit Calculation

Sub-primitive verb count	Bits needed
1-256	8 bits
257-512	9 bits
513-1024	10 bits
1025-2048	11 bits
2049-4096	12 bits

Example: CHANGE-TRANSFORM contains 3,063 verbs, so its DFS index needs 12 bits; with its 4-bit code 0000 the verb body is 4 + 12 = 16 bits.
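The table can be expressed as a small function. This is a sketch: the 8-bit floor and the 4,096 ceiling follow the table rows, while the function name and the `TOP_FOUR` dictionary are illustrative. The verb counts are those listed for the four 4-bit Sub-primitives in this document.

```python
def dfs_index_bits(verb_count):
    """Bits needed to index verb_count verbs, with a minimum of 8 bits."""
    if not 1 <= verb_count <= 4096:
        raise ValueError("verb count must be between 1 and 4096")
    # (n - 1).bit_length() is the smallest b such that 2**b >= n
    return max(8, (verb_count - 1).bit_length())

TOP_FOUR = {
    "CHANGE-TRANSFORM": 3063,
    "CAUSE-USE": 1358,
    "MOVE-DISPLACE": 1025,
    "MOVE-GO": 942,
}
for name, count in TOP_FOUR.items():
    bits = dfs_index_bits(count)
    print(f"{name}: 4-bit code + {bits}-bit index = {4 + bits} bits")
```

Only CHANGE-TRANSFORM uses all 16 bits; the other three need 15 or 14 bits and leave the remainder of the verb body spare.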

Average Code Length

Average = Sum(code_length × verb_count) / total_verbs ≈ 5.14 bits

Method	Average bits
Fixed 7-bit (68 entries)	7.00
Huffman coding	5.14
Savings	1.86 bits (27%)
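The figures above can be recomputed from the Code Length Distribution table. In this sketch the denominator is the sum of the table's verb counts, and the names `DISTRIBUTION`, `FIXED_BITS`, and `average_code_length` are illustrative.

```python
# (code length in bits, total verbs at that length), from the distribution table.
DISTRIBUTION = [(4, 6388), (5, 2479), (6, 2321), (7, 1786), (8, 813)]
FIXED_BITS = 7   # 68 entries need 7 bits with a fixed-width code

def average_code_length(distribution):
    """Verb-weighted mean code length in bits."""
    weighted = sum(length * verbs for length, verbs in distribution)
    return weighted / sum(verbs for _, verbs in distribution)

avg = average_code_length(DISTRIBUTION)
saving = FIXED_BITS - avg
print(f"{avg:.2f} bits, saving {saving:.2f} bits ({saving / FIXED_BITS:.0%})")
```

This prints an average of 5.14 bits and a saving of 1.86 bits (27%), matching the table.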

Primitive Major Categories (10)

Primitive	Meaning	Sub-primitive count	Verb count
BE	State/existence	8	899
PERCEIVE	Perception/cognition	4	218
FEEL	Emotion	6	204
THINK	Thought	6	769
CHANGE	Change	8	3,358
CAUSE	Causation/action	14	3,739
MOVE	Movement	6	2,182
COMMUNICATE	Communication	6	586
TRANSFER	Transfer	4	530
SOCIAL	Social action	6	387

Highest-frequency Sub-primitives (4-bit codes)

Sub-primitive	Code	Verb count	Ratio	Examples
CHANGE-TRANSFORM	`0000`	3,063	22.2%	“change”, “become”
CAUSE-USE	`0001`	1,358	9.9%	“use”, “employ”
MOVE-DISPLACE	`0010`	1,025	7.4%	“move”, “shift”
MOVE-GO	`0011`	942	6.8%	“go”, “travel”

The top 4 Sub-primitives account for 46.4% of all verbs.

Design Philosophy

Why Huffman Coding

CHANGE-TRANSFORM alone contains 22.2% of all verbs, so giving it a short code pays off most.
Average code length drops by 27% compared to fixed 7-bit allocation.
The top 4 Sub-primitives account for 46.4% of all verbs, and all four receive 4-bit codes.

Why Remove Primitive Bits

Before: Primitive 3 bits + Sub_primitive 4 bits = 7 bits fixed
After: Sub_primitive direct coding = 4-8 bits variable
Up to 3-bit savings for high-frequency verbs (7 bits fixed versus a 4-bit code)

Maintaining Semantic Grouping

Primitive classification is retained for human readability and as a semantic clustering hint during LLM training.