Entity Node is a fixed-length 4-word (64-bit) packet in the GEUL stream that identifies entities (people, places, objects, organizations, concepts, etc.).

SIDX Essence

Property                     Description
Non-unique                   Multiple entities can share the same SIDX
Multi-SIDX                   A single entity can have multiple SIDXs (by time period/role)
Bits = Meaning               Bit positions themselves represent attributes
Abstract/Concrete continuum  Distinguished by Mode and Attributes fill level

Examples:

  • Trump (real estate businessman) → SIDX_A
  • Trump (president) → SIDX_B (different SIDX)
  • “Human + Male + Korea” → abstract “Korean man”
  • “Human + Male + Korea + 1946 + Business + …” → nearly a specific individual
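The first two properties make the relation between SIDXs and entities many-to-many. A minimal sketch of an index that respects this, assuming entities are keyed by Wikidata Q-ID (Q22686 is Donald Trump) and using placeholder SIDX values rather than real encodings; the class and method names are hypothetical, not part of GEUL:

```python
from __future__ import annotations

from collections import defaultdict


class SidxIndex:
    """Hypothetical many-to-many index between SIDXs and entity keys."""

    def __init__(self) -> None:
        # Non-unique: one SIDX can describe several entities.
        self.by_sidx = defaultdict(set)
        # Multi-SIDX: one entity can carry several SIDXs.
        self.by_entity = defaultdict(set)

    def link(self, sidx: int, entity: str) -> None:
        self.by_sidx[sidx].add(entity)
        self.by_entity[entity].add(sidx)

    def entities(self, sidx: int) -> set[str]:
        # .get() avoids inserting empty entries for unknown keys.
        return set(self.by_sidx.get(sidx, ()))

    def sidxs(self, entity: str) -> set[int]:
        return set(self.by_entity.get(entity, ()))


idx = SidxIndex()
SIDX_A, SIDX_B = 0xA, 0xB       # placeholder values, not real encodings
idx.link(SIDX_A, "Q22686")      # Trump as real estate businessman
idx.link(SIDX_B, "Q22686")      # Trump as president
idx.link(SIDX_A, "Q_OTHER")     # hypothetical second entity sharing SIDX_A
```

Looking up `"Q22686"` returns both SIDXs, while looking up `SIDX_A` returns both entities that share it.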

Design Principles

No embedded Q-IDs:

  • Invest all bits in pure semantic alignment
  • Maximize WMS SIMD filtering performance
  • Q-IDs are linked separately via Triple Edge: (Entity_SIDX, P-externalID, "Q12345")

No Serial bits needed:

  • WMS queries use a two-stage process: SIMD range narrowing → detail check within range
  • Serial numbers carry no semantic information, so they contribute nothing to SIMD filtering
  • Investing those bits in semantic alignment narrows results further in stage 1
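The two-stage query can be sketched as follows, under stated assumptions: SIDXs are held as 64-bit integers (word 1 in the top 16 bits), stage 1 is a masked equality test of the kind one SIMD lane performs, and stage 2 is an arbitrary caller-supplied check. The function name and sample values are hypothetical; a real WMS would evaluate stage 1 over many SIDXs per instruction rather than in a Python loop.

```python
from typing import Callable, Iterable, List


def two_stage_query(
    sidxs: Iterable[int],
    mask: int,
    value: int,
    detail_check: Callable[[int], bool],
) -> List[int]:
    """Hypothetical two-stage filter over 64-bit SIDX values."""
    # Stage 1: cheap masked comparison that narrows the range.
    candidates = [s for s in sidxs if (s & mask) == value]
    # Stage 2: the detailed (possibly expensive) check runs only on survivors.
    return [s for s in candidates if detail_check(s)]


pool = [
    0x1200_302A_4A0F_0000,  # Human, Mode 0 (registered)
    0x120C_0000_0000_0000,  # Star, Mode 0, no attributes set
    0x1300_000A_4008_0000,  # Human, Mode 4 (universal)
]
ENTITY_TYPE_MASK = 0x3F << 48  # EntityType bits of word 1
HUMAN = 0x00 << 48

# Stand-in for a detail check: keep only registered entities (Mode == 0).
result = two_stage_query(pool, ENTITY_TYPE_MASK, HUMAN, lambda s: ((s >> 54) & 0x7) == 0)
```

Every bit devoted to meaning can participate in the stage-1 mask, which is the argument above for not spending bits on serial numbers.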

Bit Layout (4 words = 64 bits)

1st WORD (16 bits)
┌─────────┬──────┬────────────┐
│ Prefix  │ Mode │ EntityType │
│  7bit   │ 3bit │   6bit     │
└─────────┴──────┴────────────┘

2nd WORD (16 bits)
┌─────────────────────────────┐
│    Attributes upper 16 bits │
└─────────────────────────────┘

3rd WORD (16 bits)
┌─────────────────────────────┐
│   Attributes middle 16 bits │
└─────────────────────────────┘

4th WORD (16 bits)
┌─────────────────────────────┐
│    Attributes lower 16 bits │
└─────────────────────────────┘
Field       Bits    Size  Description
Prefix      1-7     7     0001001 (Entity Node)
Mode        8-10    3     8 quantification/number modes
EntityType  11-16   6     64 top-level types
Attributes  17-64   48    Type-specific variable schema

Bits are numbered 1 to 64 starting from the most significant bit of the 1st word.
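If the four words are read as one 64-bit integer with word 1 most significant (the same word order that make_entity under Operations writes), each row of the field table becomes a shift and a mask. A small sketch; the `FIELDS` dictionary and `get_field` are illustrative names, not part of the spec:

```python
# Bit ranges from the field table, numbered 1..64 from the most
# significant bit of word 1 (inclusive on both ends).
FIELDS = {
    "prefix": (1, 7),
    "mode": (8, 10),
    "entity_type": (11, 16),
    "attributes": (17, 64),
}

# The field widths should add up to the full 64 bits.
assert sum(last - first + 1 for first, last in FIELDS.values()) == 64


def get_field(sidx: int, name: str) -> int:
    """Extract one field from a 64-bit SIDX value."""
    first, last = FIELDS[name]
    width = last - first + 1
    shift = 64 - last  # number of bits to the right of the field
    return (sidx >> shift) & ((1 << width) - 1)


sidx = 0x1300_000A_4008_0000  # sample value: Human, Mode 4
```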

Mode (3 bits)

Mode encodes both the quantification and the grammatical number of an entity in a single 3-bit field.

Code  Binary  Meaning            Example
0     000     Registered entity  Yi Sun-sin, Samsung, BTS
1     001     Definite singular  “that person”
2     010     Definite few       “those few”
3     011     Definite plural    “those people”
4     100     Universal          “every ~”
5     101     Existential        “some ~”
6     110     Indefinite         “any ~”
7     111     Generic            “~ in general”
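In code, the Mode table maps naturally onto an `IntEnum`. The member names below are illustrative, not taken from a GEUL library; constructing `Mode(x)` also rejects any value outside 0-7.

```python
from enum import IntEnum


class Mode(IntEnum):
    """The eight Mode codes from the table above (names are illustrative)."""
    REGISTERED = 0b000         # Yi Sun-sin, Samsung, BTS
    DEFINITE_SINGULAR = 0b001  # "that person"
    DEFINITE_FEW = 0b010       # "those few"
    DEFINITE_PLURAL = 0b011    # "those people"
    UNIVERSAL = 0b100          # "every ~"
    EXISTENTIAL = 0b101        # "some ~"
    INDEFINITE = 0b110         # "any ~"
    GENERIC = 0b111            # "~ in general"


def is_definite(mode: int) -> bool:
    """True for the three definite modes (codes 1-3)."""
    return Mode(mode) in (Mode.DEFINITE_SINGULAR, Mode.DEFINITE_FEW, Mode.DEFINITE_PLURAL)
```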

Registered Entity (Mode=0)

  • Entities mapped to external IDs such as Wikidata Q-IDs or WordNet Synsets
  • Q-IDs are linked via triples: (Entity_SIDX, P-externalID, "Q12345")
  • Independent of grammatical number: Samsung is “one entity” yet awkward to call singular, and BTS is a group yet a single entity

Pronoun/Abstract (Mode=1-7)

  • Semantic range is specified by EntityType + Attributes
  • More bits filled → more specific
  • Example: Human(Type) + Male(Attr) + Korea(Attr) = “Korean man”
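The spec does not define an exact "fill level" metric. One simple proxy, offered here only as an assumption, is the number of 1 bits in Attributes. It has a known blind spot: a field whose value happens to be 0 (such as Decade = 0x0 for the 1540s in the Yi Sun-sin example) contributes nothing even though it is specified.

```python
def fill_level(attrs: int) -> int:
    """Number of 1 bits in a 48-bit Attributes value (hypothetical metric)."""
    if not 0 <= attrs < (1 << 48):
        raise ValueError("attrs must fit in 48 bits")
    return bin(attrs).count("1")


# Attribute values taken from the Examples section (Human layout).
korean_man = (0x52 << 29) | (0x01 << 19)  # Nationality + Gender only
yi_sun_sin = (
    (0x06 << 43) | (0x01 << 37) | (0x52 << 29) | (0x5 << 25)
    | (0x0 << 21) | (0x01 << 19) | (0x7 << 16)
)
```

Under this proxy the abstract “Korean man” scores 4 and the registered Yi Sun-sin entity scores 12.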

EntityType (6 bits = 64 types)

64 top-level types are assigned based on Wikidata P31 (instance of) frequency statistics. Detailed subclassification is handled by subtype bits within Attributes.

Range      Category            Type count  Representative types
0x00-0x07  Living/Person       8           Human, Taxon, Gene, Protein
0x08-0x0B  Chemistry/Material  4           Chemical, Compound, Mineral, Drug
0x0C-0x13  Celestial           8           Star, Galaxy, Asteroid, Planet
0x14-0x1B  Terrain/Nature      8           Mountain, River, Lake, Island
0x1C-0x23  Place/Admin         8           Settlement, Village, Street, Park
0x24-0x2B  Architecture        8           Building, Church, School, Bridge
0x2C-0x2F  Organization        4           Organization, Business, PoliticalParty
0x30-0x3B  Creative work       12          Painting, Document, Film, Album
0x3C-0x3F  Event/Other         4           SportsSeason, Event, Election, Other

Code Table (all 64)

Code  Type                Q-ID        Entity count
0x00  Human               Q5          12.5M
0x01  Taxon               Q16521      3.8M
0x02  Gene                Q7187       1.2M
0x03  Protein             Q8054       1.0M
0x04  CellLine            Q21014462   154K
0x05  FamilyName          Q101352     662K
0x06  GivenName           Q202444     128K
0x07  FictionalCharacter  Q15632617   98K
0x08  Chemical            Q113145171  1.3M
0x09  Compound            Q11173      1.1M
0x0A  Mineral             Q7946       62K
0x0B  Drug                Q12140      45K
0x0C  Star                Q523        3.6M
0x0D  Galaxy              Q318        2.1M
0x0E  Asteroid            Q3863       249K
0x0F  Quasar              Q83373      178K
0x10  Planet              Q634        15K
0x11  Nebula              Q12057      8K
0x12  StarCluster         Q168845     5K
0x13  Moon                Q2537       3K
0x14  Mountain            Q8502       518K
0x15  Hill                Q54050      321K
0x16  River               Q4022       427K
0x17  Lake                Q23397      292K
0x18  Stream              Q47521      194K
0x19  Island              Q23442      153K
0x1A  Bay                 Q39594      25K
0x1B  Cave                Q35509      20K
0x1C  Settlement          Q486972     580K
0x1D  Village             Q532        245K
0x1E  Hamlet              Q5084       148K
0x1F  Street              Q79007      711K
0x20  Cemetery            Q39614      298K
0x21  AdminRegion         Q15284      100K
0x22  Park                Q22698      45K
0x23  ProtectedArea       Q473972     35K
0x24  Building            Q41176      292K
0x25  Church              Q16970      286K
0x26  School              Q9842       242K
0x27  House               Q3947       235K
0x28  Structure           Q811979     216K
0x29  SportsVenue         Q1076486    145K
0x2A  Castle              Q23413      42K
0x2B  Bridge              Q12280      38K
0x2C  Organization        Q43229      531K
0x2D  Business            Q4830453    242K
0x2E  PoliticalParty      Q7278       35K
0x2F  SportsTeam          Q847017     95K
0x30  Painting            Q3305213    1.1M
0x31  Document            Q49848      45M
0x32  LiteraryWork        Q7725634    395K
0x33  Film                Q11424      335K
0x34  Album               Q482994     303K
0x35  MusicalWork         Q105543609  195K
0x36  TVEpisode           Q21191270   177K
0x37  VideoGame           Q7889       172K
0x38  TVSeries            Q5398426    85K
0x39  Patent              Q43305660   289K
0x3A  Software            Q7397       13K
0x3B  Website             Q35127      12K
0x3C  SportsSeason        Q27020041   183K
0x3D  Event               Q1656682    10K
0x3E  Election            Q40231      11K
0x3F  Other               -           For extension

Attributes (48 bits)

A type-specific schema whose layout is interpreted differently for each EntityType. Higher-frequency attributes are given more bits, and the field is used directly for WMS SIMD filtering.

Human (0x00) Attributes

┌──────────┬────────┬────────┬──────┬────────┬────────┬─────────┬──────────┬────────────┬──────────┐
│ Subtype  │ Occup. │ Nation │ Era  │ Decade │ Gender │ Notab.  │ Language │ BirthArea  │  Field   │
│  5bit    │  6bit  │  8bit  │ 4bit │  4bit  │  2bit  │  3bit   │  6bit    │   6bit     │   4bit   │
└──────────┴────────┴────────┴──────┴────────┴────────┴─────────┴──────────┴────────────┴──────────┘
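offset:  0        5       11      19     23      27      29        32         38          44

Offsets are counted from the most significant bit of the 48-bit Attributes field, so a field at offset o with width w sits at right shift 48 - o - w (for example, Nationality: 48 - 11 - 8 = 29, the shift used in the Yi Sun-sin example).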
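Deriving the shifts from the widths, rather than writing them by hand, removes a common source of packing bugs. A sketch of a Human Attributes packer and unpacker; the snake_case field names are illustrative versions of the diagram labels, and omitted fields are assumed to be 0:

```python
# Human Attributes fields in MSB-first order: (name, width in bits).
HUMAN_FIELDS = [
    ("subtype", 5), ("occupation", 6), ("nationality", 8), ("era", 4),
    ("decade", 4), ("gender", 2), ("notability", 3), ("language", 6),
    ("birth_area", 6), ("field", 4),
]
ATTR_BITS = 48


def _layout() -> dict:
    """Map each field name to (shift, width), using shift = 48 - offset - width."""
    layout, offset = {}, 0
    for name, width in HUMAN_FIELDS:
        layout[name] = (ATTR_BITS - offset - width, width)
        offset += width
    if offset != ATTR_BITS:
        raise AssertionError(f"fields cover {offset} bits, expected {ATTR_BITS}")
    return layout


HUMAN_LAYOUT = _layout()


def pack_human(**values: int) -> int:
    """Pack named Human fields into a 48-bit Attributes value."""
    attrs = 0
    for name, value in values.items():
        shift, width = HUMAN_LAYOUT[name]  # KeyError for unknown field names
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        attrs |= value << shift
    return attrs


def unpack_human(attrs: int) -> dict:
    """Split a 48-bit Attributes value back into its Human fields."""
    return {
        name: (attrs >> shift) & ((1 << width) - 1)
        for name, (shift, width) in HUMAN_LAYOUT.items()
    }
```

With the Yi Sun-sin values from the Examples section, `pack_human` yields the same 48-bit value as the hand-written shifts there (0x302A4A0F0000).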

Star (0x0C) Attributes

┌────────────┬────────────┬──────────┬──────────┬────────┬────────┬──────────┬──────────┬────────┬────────┐
│ Constell.  │ SpectType  │ LumClass │ AppMag   │  RA    │  Dec   │  Flags   │ RadVel   │Redshift│Parallax│
│   7bit     │    4bit    │   3bit   │  4bit    │  4bit  │  4bit  │   6bit   │   5bit   │  5bit  │  4bit  │
└────────────┴────────────┴──────────┴──────────┴────────┴────────┴──────────┴──────────┴────────┴────────┘

These field widths total 46 bits, so 2 of the 48 Attributes bits are not assigned in this layout.

Flag bit definitions:

  • bit0: IR (infrared source)
  • bit1: Radio (radio source)
  • bit2: X-ray (X-ray source)
  • bit3: Binary (binary star)
  • bit4: Variable (variable star)
  • bit5: HighPM (high proper motion)
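The six flags combine freely, so they map onto Python's `IntFlag`. This sketch assumes bit0 is the least significant bit of the 6-bit Flags field; the class and helper names are hypothetical.

```python
from enum import IntFlag


class StarFlags(IntFlag):
    IR = 1 << 0        # infrared source
    RADIO = 1 << 1     # radio source
    XRAY = 1 << 2      # X-ray source
    BINARY = 1 << 3    # binary star
    VARIABLE = 1 << 4  # variable star
    HIGH_PM = 1 << 5   # high proper motion


def decode_star_flags(field: int) -> StarFlags:
    """Interpret a raw 6-bit Flags value as a set of StarFlags."""
    if not 0 <= field < (1 << 6):
        raise ValueError("Flags is a 6-bit field")
    return StarFlags(field)
```

For example, an infrared variable star has Flags `StarFlags.IR | StarFlags.VARIABLE`, which is the raw value 0b010001.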

Operations

Entity Creation

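def make_entity(
    mode: int,           # 3 bits
    entity_type: int,    # 6 bits
    attrs: int           # 48 bits
) -> bytes:
    PREFIX = 0b0001001   # 7 bits (Entity Node)

    # Reject values that would spill into neighboring fields.
    if not 0 <= mode < (1 << 3):
        raise ValueError(f"mode out of range: {mode}")
    if not 0 <= entity_type < (1 << 6):
        raise ValueError(f"entity_type out of range: {entity_type}")
    if not 0 <= attrs < (1 << 48):
        raise ValueError(f"attrs out of range: {attrs:#x}")

    # Word 1: Prefix in the top 7 bits, then Mode (3), then EntityType (6)
    word1 = (PREFIX << 9) | (mode << 6) | entity_type
    # Words 2-4: upper, middle, and lower 16 bits of Attributes
    word2 = (attrs >> 32) & 0xFFFF
    word3 = (attrs >> 16) & 0xFFFF
    word4 = attrs & 0xFFFF

    # Each word is serialized big-endian, in order.
    return (
        word1.to_bytes(2, 'big') +
        word2.to_bytes(2, 'big') +
        word3.to_bytes(2, 'big') +
        word4.to_bytes(2, 'big')
    )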

Entity Parsing

def parse_entity(data: bytes) -> dict:
    if len(data) != 8:
        raise ValueError(f"Entity Node must be 8 bytes, got {len(data)}")

    word1 = int.from_bytes(data[0:2], 'big')
    word2 = int.from_bytes(data[2:4], 'big')
    word3 = int.from_bytes(data[4:6], 'big')
    word4 = int.from_bytes(data[6:8], 'big')

    prefix = (word1 >> 9) & 0x7F     # 0b0001001 for an Entity Node
    mode = (word1 >> 6) & 0x7
    entity_type = word1 & 0x3F
    attrs = (word2 << 32) | (word3 << 16) | word4   # reassemble 48-bit Attributes

    return {
        'prefix': prefix,
        'mode': mode,
        'entity_type': entity_type,
        'attrs': attrs
    }
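Because make_entity writes the four 16-bit words big-endian and in order, the 8-byte packet is byte-for-byte the whole SIDX viewed as one big-endian 64-bit integer. The sketch below shows that equivalent formulation, which is convenient when SIDXs are stored as 64-bit integers for filtering. The function names are hypothetical.

```python
PREFIX = 0b0001001  # Entity Node prefix (7 bits)
ATTR_MASK = (1 << 48) - 1


def entity_to_int(mode: int, entity_type: int, attrs: int) -> int:
    """Pack the same fields as make_entity into one 64-bit integer."""
    if not (0 <= mode < 8 and 0 <= entity_type < 64 and 0 <= attrs <= ATTR_MASK):
        raise ValueError("field out of range")
    word1 = (PREFIX << 9) | (mode << 6) | entity_type
    return (word1 << 48) | attrs  # word 1 on top, Attributes in the low 48 bits


def entity_to_bytes(mode: int, entity_type: int, attrs: int) -> bytes:
    # Four big-endian 16-bit words in order == one big-endian 64-bit integer.
    return entity_to_int(mode, entity_type, attrs).to_bytes(8, "big")


def bytes_to_fields(data: bytes) -> dict:
    if len(data) != 8:
        raise ValueError("Entity Node must be 8 bytes")
    value = int.from_bytes(data, "big")
    return {
        "prefix": value >> 57,
        "mode": (value >> 54) & 0x7,
        "entity_type": (value >> 48) & 0x3F,
        "attrs": value & ATTR_MASK,
    }
```

With the Yi Sun-sin attributes from the Examples section (0x302A4A0F0000), both formulations produce the bytes 12 00 30 2A 4A 0F 00 00.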

Examples

Registered Entity: Yi Sun-sin

# Yi Sun-sin (Q211789)
yi_sun_sin = make_entity(
    mode=0,              # Registered entity
    entity_type=0x00,    # Human
    attrs=(
        (0x06 << 43) |   # Subtype: Military
        (0x01 << 37) |   # Occupation: Admiral
        (0x52 << 29) |   # Nationality: Korea
        (0x5 << 25) |    # Era: Early Modern
        (0x0 << 21) |    # Decade: 1540s
        (0x01 << 19) |   # Gender: Male
        (0x7 << 16)      # Notability: 1000+
    )
)
# Q-ID link: Triple(yi_sun_sin_SIDX, P-externalID, "Q211789")

Abstract: “every Korean man”

all_korean_men = make_entity(
    mode=4,              # Universal (every)
    entity_type=0x00,    # Human
    attrs=(
        (0x52 << 29) |   # Nationality: Korea
        (0x01 << 19)     # Gender: Male
    )
)
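An abstract entity like this can act as a filter for concrete ones. The sketch below assumes the caller states which fields the pattern specifies instead of treating 0 as "unspecified", because 0 is a real value (Decade = 0x0 means the 1540s in the Yi Sun-sin example). It ignores Mode, and the helper names and the second Gender code are hypothetical.

```python
# (shift, width) for each Human field, from shift = 48 - offset - width.
HUMAN_FIELDS = {
    "subtype": (43, 5), "occupation": (37, 6), "nationality": (29, 8),
    "era": (25, 4), "decade": (21, 4), "gender": (19, 2),
    "notability": (16, 3), "language": (10, 6), "birth_area": (4, 6),
    "field": (0, 4),
}


def field_mask(names) -> int:
    """OR together the bit masks of the named fields."""
    mask = 0
    for name in names:
        shift, width = HUMAN_FIELDS[name]
        mask |= ((1 << width) - 1) << shift
    return mask


def falls_under(attrs: int, pattern: int, specified) -> bool:
    """True if `attrs` agrees with `pattern` on every field in `specified`."""
    mask = field_mask(specified)
    return (attrs & mask) == (pattern & mask)


every_korean_man = (0x52 << 29) | (0x01 << 19)  # Nationality + Gender
SPECIFIED = ("nationality", "gender")
yi_sun_sin = (
    (0x06 << 43) | (0x01 << 37) | (0x52 << 29) | (0x5 << 25)
    | (0x0 << 21) | (0x01 << 19) | (0x7 << 16)
)
other_gender = (0x52 << 29) | (0x02 << 19)  # hypothetical: same Nationality, different Gender code
```

The resulting mask/value pair is exactly the kind of stage-1 test that WMS SIMD filtering can apply.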

Subtype Mapping

Many Wikidata classes are subclasses of one of the 64 EntityTypes rather than EntityTypes in their own right. The encoder inspects an item's P31 (instance of) value and routes the item to the appropriate parent type.

Subtype (P31)                  Parent type      Entity count
Q13442814 (scholarly article)  Document (0x31)  45.2M
Q67206691 (infrared source)    Star (0x0C)      2.6M
Q13100073 (village of China)   Village (0x1D)   592K
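The routing step can be sketched as a flat lookup. The text does not say how the full mapping is built (for example, whether the encoder walks P279 subclass chains), so the tables below contain only entries taken from this document. They also assume that unmapped classes fall back to Other (0x3F), which corresponds to the "Other fallback" row under Coverage. `route_p31` is a hypothetical name.

```python
# Direct P31 -> EntityType entries (a few rows from the code table).
DIRECT = {"Q5": 0x00, "Q523": 0x0C, "Q532": 0x1D, "Q49848": 0x31}

# Subtype P31 -> parent EntityType (the examples in the table above).
SUBTYPE_PARENT = {
    "Q13442814": 0x31,  # scholarly article -> Document
    "Q67206691": 0x0C,  # infrared source -> Star
    "Q13100073": 0x1D,  # village of China -> Village
}

OTHER = 0x3F  # assumed fallback for unmapped classes


def route_p31(p31: str) -> int:
    """Return the EntityType code for an item's P31 value."""
    if p31 in DIRECT:
        return DIRECT[p31]
    if p31 in SUBTYPE_PARENT:
        return SUBTYPE_PARENT[p31]
    return OTHER
```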

Coverage

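Item                           Value
Total Wikidata entities        117,419,925
Wikimedia internal (excluded)  8,565,353 (7.3%)
SIDX target                    108,854,572 (92.7%)
Direct coverage by 64 types    36,295,074 (33.3%)
Subtype absorption             71,842,429 (66.0%)
Other fallback                 717,069 (0.7%)
Final coverage                 100%
Collision rate                 < 0.01%

The excluded and SIDX target percentages are relative to all Wikidata entities. The direct, subtype, and fallback percentages are relative to the SIDX target.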
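The coverage figures can be checked directly. This short sketch is plain arithmetic on the numbers in the table. It confirms that the three routes partition the SIDX target exactly and that the breakdown percentages are shares of the target.

```python
total = 117_419_925
excluded = 8_565_353        # Wikimedia internal items
target = total - excluded   # SIDX target
routes = {
    "direct": 36_295_074,   # covered by the 64 types
    "subtype": 71_842_429,  # absorbed via subtype mapping
    "fallback": 717_069,    # Other fallback
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * part / whole, 1)


# The three routes together account for every SIDX target entity.
assert sum(routes.values()) == target == 108_854_572

shares = {name: pct(n, target) for name, n in routes.items()}
```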

Q-ID Linking

Entity Node does not embed Q-IDs internally. Instead, they are linked separately via Triple Edge.

Subject:  Entity_SIDX (64 bits)
Property: P-externalID (e.g., P-Wikidata)
Object:   "Q12345" (string or integer)
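A minimal sketch of this link as data, assuming a SIDX is held as a 64-bit integer. The `Triple` type and helpers are hypothetical, and `"P-Wikidata"` is the example property name from the text above. Since the object may be a string or an integer, `qid_number` normalizes both forms.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Triple:
    subject: int           # Entity_SIDX as a 64-bit integer
    prop: str              # e.g. "P-Wikidata"
    obj: Union[str, int]   # "Q12345" or 12345


def qid_link(sidx: int, qid: Union[str, int], prop: str = "P-Wikidata") -> Triple:
    """Build the Triple Edge that links a SIDX to an external Q-ID."""
    if not 0 <= sidx < (1 << 64):
        raise ValueError("SIDX must be a 64-bit value")
    return Triple(sidx, prop, qid)


def qid_number(obj: Union[str, int]) -> int:
    """Normalize either object form to the numeric part of the Q-ID."""
    if isinstance(obj, int):
        return obj
    if not (obj.startswith("Q") and obj[1:].isdigit()):
        raise ValueError(f"not a Q-ID: {obj!r}")
    return int(obj[1:])


link = qid_link(0x1200_302A_4A0F_0000, "Q12345")
```

Because the link lives outside the 64-bit node, several SIDXs (for example, one per role) can each carry a triple pointing to the same Q-ID.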