feat: integrate Ash framework #706

Closed
opened 2026-03-28 04:27:39 +00:00 by mfreeman451 · 1 comment
Owner

Imported from GitHub.

Original GitHub issue: #2205
Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2205
Original created: 2025-12-23T21:23:44Z


**Is your feature request related to a problem?**

We're considering rewriting our Golang-based poller, which is part of our microservices service-oriented architecture. ServiceRadar is a network management and observability platform. There is a "core" engine, written in Go, that used to also run our API; it does device identity and reconciliation, provides API services, and receives updates from pollers over gRPC. Pollers talk to agents, and agents make gRPC calls to additional checkers or services, so generally everything flows through the poller to the core and then to the database. Collectors like Netflow, BGP/BMP, SNMP traps, and syslog write messages to a NATS JetStream queue, get processed by consumers, and are written to the database.

We recently rewrote our web front end and are porting the Golang core API to Elixir/Phoenix LiveView. We also use a serviceradar-datasvc, which is a gRPC API and wrapper around NATS JetStream that provides a message broker, KV, and object store. Agents can also do TCP port sweeps/scans and ICMP sweeps; that is really their only built-in functionality. We also have a serviceradar-sync service that talks to integrations like Armis or NetBox (IPAM) for device inventory, and creates config files in the datasvc that get read by the sweep service so it knows which devices to ping.

We recently introduced the Oban scheduler in the serviceradar-web-ng Elixir app, and I'm wondering if that might be a good replacement for (or addition to) the poller. Right now, to schedule jobs you create a config for serviceradar-poller (on bare metal, via a config file, or through the KV), and it polls the services (through the agent you configure it to proxy requests through) and collects health checks plus service-check results. I think it might be better to use the Oban scheduler for all of this, with the core communicating down to the pollers and all jobs scheduled through Oban.

I want to be able to scale up to lots of different pollers and agents and have jobs coordinated, maybe round-robin through multiple agents. We use the concept of partitions in case you put agents/pollers on overlapping IP space; we know how to deal with these in the database. We also need to support multi-tenancy, and I think Elixir contexts/scopes are built for all of that. Here's some of our documentation; it's a bit outdated now, so we might be due for a refresh. The only thing I don't know is how to make a network of microservices (BEAM instances) that all talk to each other as one big brain. It would be nice to open the Elixir/ERTS debugger, or whatever, and look at every actor in ERTS.

**Describe the solution you'd like**

Rewriting the Go-based poller into Elixir while leveraging the **Oban** scheduler is a natural progression for ServiceRadar, especially as you move the Core API to Phoenix. Elixir's concurrency model (the BEAM) is practically purpose-built for the "one big brain" network observability pattern you are describing.

Here is a strategic look at how you can architect this rewrite to achieve a distributed, multi-tenant, and coordinated polling brain.

### 1. Oban as the "Orchestrator" vs. "Executor"
While Oban is fantastic, you should distinguish between **scheduling** the job and **executing** the network I/O. 

*   **The Orchestrator (Core):** The Phoenix app uses **Oban** to manage the state of the jobs. Since you already use TimescaleDB/Postgres, Oban is perfect here. You can use **Oban Queues** to segment work (e.g., a `high_priority_polling` queue and a `background_discovery` queue).
*   **The Executors (Distributed Pollers):** Instead of the pollers just being gRPC servers waiting for work, they become BEAM nodes in a cluster. 
*   **Coordination:** Use **Oban Pro's** "Smart Engine" or simple queue partitioning to ensure that only certain nodes pick up certain jobs. Or, more simply, use Oban to insert a job, and have the poller nodes subscribe to that specific queue.
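As a concrete sketch of that orchestrator/executor split, a worker on one of those queues might look like the following. Module names and the `ServiceRadar.Polling.run_check/2` entry point are hypothetical, not existing APIs:

```elixir
defmodule ServiceRadar.Workers.PollDevice do
  # Hypothetical worker on the high-priority queue; the executor node
  # draining this queue performs the actual network I/O.
  use Oban.Worker, queue: :high_priority_polling, max_attempts: 3

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"device_id" => device_id, "partition_id" => partition}}) do
    # Assumed business-logic entry point on the poller side.
    ServiceRadar.Polling.run_check(device_id, partition)
  end
end

# Enqueued from the Core (orchestrator) side:
# %{device_id: "dev-123", partition_id: "a"}
# |> ServiceRadar.Workers.PollDevice.new()
# |> Oban.insert()
```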

### 2. Making a "Big Brain" with Distributed Elixir
To make multiple BEAM instances act as one, you don't necessarily need gRPC between Elixir nodes. You can use **Distributed Erlang**.

*   **libcluster:** Use this library to automatically form a cluster between your Core and your Elixir Pollers (via DNS, Kubernetes labels, or Gossip).
*   **Horde:** This is the "secret sauce" for your "big brain." 
    *   **Horde.DynamicSupervisor:** Allows you to start a polling process on *any* node in the cluster. If a poller node dies, Horde automatically restarts that polling process on a different healthy node.
    *   **Horde.Registry:** A distributed, local-first registry so any node can find the process responsible for "Device A" without knowing which physical server it’s on.
*   **Partitioning:** You can create one `Horde.Supervisor` per **Partition**. This aligns with your database partitions for overlapping IP spaces, ensuring that the "Partition A" brain only talks to "Partition A" agents.
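A minimal sketch of that clustering setup, assuming a Gossip topology for development and Horde 0.8+ with automatic membership (all names are illustrative):

```elixir
# config/runtime.exs: let libcluster form the mesh automatically.
config :libcluster,
  topologies: [
    serviceradar: [
      strategy: Cluster.Strategy.Gossip
      # On Kubernetes, Cluster.Strategy.Kubernetes.DNS is a common alternative.
    ]
  ]

# In the poller application's supervision tree: one registry and one
# dynamic supervisor whose membership spans the whole cluster.
children = [
  {Cluster.Supervisor,
   [Application.get_env(:libcluster, :topologies), [name: ServiceRadar.ClusterSupervisor]]},
  {Horde.Registry, name: ServiceRadar.PollerRegistry, keys: :unique, members: :auto},
  {Horde.DynamicSupervisor,
   name: ServiceRadar.PollerSupervisor, strategy: :one_for_one, members: :auto}
]
```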

### 3. Multi-Tenancy and Contexts
Elixir's **Contexts** (e.g., `ServiceRadar.Inventory`, `ServiceRadar.Monitoring`) are perfect for this.
*   **Scopes:** In your `Inventory` context, every query should be scoped by `tenant_id` and `partition_id`.
*   **Ecto Multi-tenancy:** If you use Postgres schemas for multi-tenancy, Ecto's `prefix` support is first-class. If it's column-based, you can use a base query that always applies `where tenant_id = ^current_tenant_id`.
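For the column-based case, that base query can be a small helper that every context function goes through, so no query can be issued without a tenant/partition filter. A sketch, with schema and field names assumed:

```elixir
defmodule ServiceRadar.Inventory do
  import Ecto.Query

  # Every public context function funnels through this scope helper.
  defp scoped(queryable, %{tenant_id: tenant_id, partition_id: partition_id}) do
    from q in queryable,
      where: q.tenant_id == ^tenant_id and q.partition_id == ^partition_id
  end

  def list_devices(scope) do
    ServiceRadar.Inventory.Device
    |> scoped(scope)
    |> ServiceRadar.Repo.all()
  end
end
```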

### 4. Coordinated Job Distribution (Round-Robin)
If you want to round-robin jobs through multiple agents:
1.  **Oban** picks up a job for a subnet.
2.  The job code queries the **Service Registry** (the new tables proposed in your docs) to find available agents for that partition.
3.  The poller uses a **Partitioned Worker Pool** or a simple `Enum.at(agents, Integer.mod(job_id, length(agents)))` to pick an agent.
4.  The communication down to the **Go Agents** can remain gRPC (via `grpc-elixir`), as those agents are lightweight and likely need to stay in Go for low-level socket/packet access (ICMP/TCP sweeps).
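The round-robin pick in step 3 is deterministic, so retries of the same job land on the same agent. A tiny sketch (module name is hypothetical):

```elixir
defmodule ServiceRadar.Polling.AgentPicker do
  @doc "Deterministically round-robins a job across the partition's agents."
  def pick(agents, job_id) when is_list(agents) and agents != [] do
    Enum.at(agents, Integer.mod(job_id, length(agents)))
  end
end

# ServiceRadar.Polling.AgentPicker.pick(["agent-a", "agent-b", "agent-c"], 7)
# => "agent-b"
```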

### 5. Looking into the Brain (Observability)
You mentioned wanting to use a debugger to see every actor. This is where Elixir shines:
*   **Phoenix LiveDashboard:** Since you are using Phoenix, you get a web-based "Observer." You can see memory usage, process counts, and NATS connection status in real-time.
*   **Observer:** On your dev machine, you can run `:observer.start()` and connect it to a remote production node to see the supervision tree of the entire cluster visually.
*   **Visualizing Actors:** If you implement your polling jobs as **GenServers** managed by Horde, you can see every individual "device poller" as a process in the tree. You can even "kill" one manually from the UI and watch the BEAM immediately restart it.

### Proposed Revised Architecture
```mermaid
graph TD
    subgraph "Core (Phoenix + Oban)"
        WEB[Web UI / LiveView]
        OBAN[Oban Scheduler]
        REG[Service Registry]
    end

    subgraph "BEAM Cluster (Distributed Erlang)"
        P1[Elixir Poller Node 1]
        P2[Elixir Poller Node 2]
    end

    subgraph "Edge Site (Partition A)"
        A1[Go Agent 1]
        A2[Go Agent 2]
    end

    OBAN -- Assigns Job --> P1
    P1 -- Distributed Registry --> P2
    P1 -- gRPC --> A1
    P1 -- gRPC --> A2
    A1 -- Scans/Polls --> Devices[Network Devices]
    
    %% Collectors Flow
    Collectors[Netflow/Syslog] -- NATS --> DataSvc[DataSvc/NATS]
    DataSvc -- Stream --> P2

```

### Recommendation for the Rewrite

1.  **Port the Core Registry first:** Implement the `pollers_registry`, `agents_registry`, and `checkers_registry` tables in CNPG as outlined in your `service-registry-design.md`.
2.  **Poller as a BEAM Node:** Instead of a standalone Go binary, make the Poller a small Elixir application.
3.  **Oban for Scheduling:** Use Oban to handle the persistence of "When should we poll?"
4.  **Horde for Distribution:** Use Horde to decide "Which node is actually running the check right now?"
5.  **Keep Agents in Go:** Your Go Agents are already efficient at TCP/ICMP sweeps and proxying checkers. No need to rewrite those yet; just have the Elixir Poller talk to them over gRPC.

This approach gives you the **"One Big Brain"** feel (cluster-wide process management) with the **industrial-strength scheduling** of Oban and the **low-level performance** of your existing Go agents.


Using **Ash Framework** for ServiceRadar would be a significant architectural "force multiplier," especially given your requirements for **multi-tenancy, identity reconciliation, and a "distributed brain."**

Ash is a declarative, resource-based framework for Elixir. Instead of writing boilerplate CRUD and complex scoping logic, you define "Resources" (Devices, Pollers, Agents) and the rules that govern them.

Here is how Ash would specifically solve your ServiceRadar rewrite challenges:

### 1. First-Class Multi-Tenancy (The "Partition" Problem)

Your docs mention the need to support multi-tenancy and overlapping IP spaces via partitions. In a standard Elixir app, you'd have to remember to add `where tenant_id = ...` to every query.

*   **Ash Solution:** Ash has **native multi-tenancy** built-in. You define a resource as `multitenant`. When the Poller or Web UI requests data, Ash automatically enforces the tenant/partition scope at the engine level. It is nearly impossible to accidentally leak data between tenants.

### 2. AshOban: Declarative Scheduling

Since you are already considering Oban, **AshOban** is a game-changer.

*   **The Pattern:** You define an "Action" on your `Monitoring.Check` resource (e.g., `:run_check`).
*   **The Integration:** You tell AshOban to trigger that action on a schedule or when a certain event happens. Ash handles the transaction, the retries, and the state changes of the check (e.g., `pending` -> `active` -> `failed`).
*   **Benefit:** This solves your "coordinated jobs" requirement by letting Oban handle the queueing while Ash handles the business logic and database state.
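A sketch of what such a trigger could look like in the resource DSL. The resource, action, cron expression, and filter are assumptions, and the exact AshOban trigger options should be checked against its documentation:

```elixir
defmodule ServiceRadar.Monitoring.Check do
  use Ash.Resource,
    domain: ServiceRadar.Monitoring,
    data_layer: AshPostgres.DataLayer,
    extensions: [AshOban]

  oban do
    triggers do
      trigger :run_check do
        # Run the :run_check action for every check still pending,
        # scanning for work once a minute.
        action :run_check
        where expr(status == :pending)
        scheduler_cron "* * * * *"
      end
    end
  end
end
```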

### 3. Identity Reconciliation (The "Canonical ID" Problem)

Your Go core currently does complex "device identity and reconciliation."

*   **Ash Solution:** You can use **Ash Identities**. You can define that a `Device` is unique based on a combination of `(mac, partition_id)` or `(armis_id, tenant_id)`.
*   **Calculations:** Ash allows you to define "Calculations." You could have a calculation called `canonical_status` that runs the reconciliation logic you currently have in Go, but exposes it as a simple field that can be sorted and filtered in your Phoenix LiveView.
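Those uniqueness rules translate directly into the resource DSL, and upserting against an identity is where the reconciliation can happen. A sketch with assumed attribute and action names:

```elixir
identities do
  identity :mac_per_partition, [:mac, :partition_id]
  identity :armis_per_tenant, [:armis_id, :tenant_id]
end

actions do
  # Sweep results upsert against the MAC+partition identity, so a
  # re-discovered device updates the existing row instead of duplicating it.
  create :upsert_from_sweep do
    upsert? true
    upsert_identity :mac_per_partition
  end
end
```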

### 4. Aggregating the "Distributed Brain"

If you want multiple Elixir nodes to act as one, Ash's **Actions** and **Relationships** help model the cluster:

*   You could have a resource called `ServiceRadar.Topology.Agent`.
*   A relationship could be defined: `has_many :checkers, ServiceRadar.Topology.Checker`.
*   When you query an Agent on Poller Node A, Ash can be configured to fetch associated data or trigger actions across the BEAM cluster using its internal dispatching.

### 5. AshGraphql & AshJsonApi (The gRPC/API Bridge)

Your current system relies heavily on gRPC. While Elixir handles gRPC well, Ash can automatically generate a **GraphQL** or **JSON:API** layer from your resources.

*   This makes your new Phoenix Web UI extremely fast to build because the "Registry" logic you defined in Ash is automatically available to the frontend.

### 6. The "Outdated Docs" Problem

You mentioned your documentation is a bit outdated.

*   **Ash Solution:** Ash resources are **self-documenting**. Because you define the attributes, relationships, and actions in a DSL (Domain Specific Language), you can generate high-quality documentation (and even Mermaid diagrams) directly from the code. The "code is the spec."

### Comparison: Standard Elixir vs. Elixir + Ash

| Feature | Standard Elixir/Phoenix | Elixir + Ash Framework |
| :--- | :--- | :--- |
| **Multi-tenancy** | Manual `where` clauses / Ecto prefixes | Native, enforced at the resource level |
| **Job Scheduling** | Manual Oban worker modules | Declarative `AshOban` actions |
| **Identity Logic** | Hand-written reconciliation functions | Native `Identities` and `Changesets` |
| **API Layer** | Manual Controllers/Resolvers | Auto-generated GraphQL/JSON:API/Admin UI |
| **Registry Logic** | Proposed in `service-registry-design.md` | Defined as Ash Resources in days, not weeks |

### How to start the rewrite with Ash

1.  **Define your Domain:** Create an `Inventory` domain for Devices/Interfaces and a `Monitoring` domain for Pollers/Agents/Checkers.
2.  **Model the Registry:** Take your `service-registry-design.md` and turn those SQL tables into **Ash Resources**.
3.  **Plug in Oban:** Use `AshOban` to handle the recurring "Check Heartbeat" and "Run Sweep" tasks.
4.  **Use AshAdmin:** Ash comes with a "free" Admin UI (`AshAdmin`). You can immediately see and edit all your pollers/agents/tenants in a GUI while you are still building the main Phoenix LiveView frontend.

**Verdict:** If your goal is to build a "big brain" that handles complex identities, multi-tenancy, and high-scale polling coordination, **Ash is the correct tool.** It will reduce the amount of Go code you are porting by roughly 60-70% because the framework handles the "plumbing" of a Service Oriented Architecture for you.


## what about ocsf

Yes, absolutely. In fact, Ash is arguably the best tool for this because it allows you to separate your **business logic** (how ServiceRadar behaves) from your **data layout** (the OCSF schema).

You don't have to choose between "The Ash Way" and "The OCSF Way." Ash's `AshPostgres.DataLayer` is designed to map Elixir resources to any existing database schema, no matter how rigid or standardized it is.

Here is how you implement OCSF within Ash for ServiceRadar:

### 1. Mapping Elixir Attributes to OCSF Columns

OCSF uses specific field names (e.g., `type_id`, `uid`, `device_id`). If your database already has these columns, you use the `source` option in Ash to point your attributes to the correct OCSF field names.

```elixir
defmodule ServiceRadar.Inventory.Device do
  use Ash.Resource,
    data_layer: AshPostgres.DataLayer

  postgres do
    table "ocsf_device_inventory" # Your existing OCSF table
    repo ServiceRadar.Repo
  end

  attributes do
    uuid_primary_key :id, source: :uid # Map OCSF 'uid' to Elixir ':id'

    attribute :hostname, :string, source: :hostname
    attribute :ip, :string, source: :ip

    # OCSF-specific integer IDs
    attribute :type_id, :integer, source: :type_id
    attribute :risk_level_id, :integer, source: :risk_level_id
  end
end
```

### 2. Handling Nested OCSF Objects (JSONB)

OCSF relies heavily on nested structures like `os`, `hw_info`, and `network_interfaces`. Ash handles these perfectly using **Embedded Resources**. This allows you to have full validation and typing on the OCSF JSON blobs.

```elixir
# Define the nested OCSF OS structure
defmodule ServiceRadar.OCSF.OSInfo do
  use Ash.Resource, data_layer: :embedded

  attributes do
    attribute :name, :string
    attribute :version, :string
    attribute :type, :string # e.g. Linux, Windows
  end
end

# In your main Device resource:
attributes do
  attribute :os, ServiceRadar.OCSF.OSInfo, source: :os # Maps to a JSONB column
end
```

### 3. SRQL vs. Ash Query

You currently have a custom **SRQL** (ServiceRadar Query Language) engine. Ash effectively provides a standardized version of this.

*   **The Benefit:** You can map SRQL queries directly to Ash Filters.
*   If a user types `in:devices status:online`, you can translate that into an Ash query: `Device |> Ash.Query.filter(status == :online)`.
*   Because Ash understands the OCSF mapping, it will generate the correct SQL: `SELECT * FROM ocsf_device_inventory WHERE status_id = 1`.
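A sketch of that translation step, assuming the SRQL parser already yields a flat token map (the translator module itself is hypothetical):

```elixir
defmodule ServiceRadar.SRQL.Translator do
  require Ash.Query

  # Hypothetical: turn parsed SRQL tokens into an Ash query. Inside
  # filter/2, bare `status` refers to the resource attribute, while the
  # pinned value refers to the Elixir binding.
  def to_ash_query(%{"in" => "devices", "status" => status}) do
    ServiceRadar.Inventory.Device
    |> Ash.Query.filter(status == ^String.to_existing_atom(status))
  end
end
```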

### 4. TimescaleDB Integration (Hypertables)

Since your OCSF data (especially `flows` and `activity`) lives in TimescaleDB hypertables, Ash handles this transparently.

*   Ash doesn't care that a table is a hypertable; it treats it as a standard Postgres table.
*   You can define Ash **Calculations** to handle Timescale-specific functions like `time_bucket`.

```elixir
calculations do
  calculate :five_min_bucket, :utc_datetime, expr(
    fragment("time_bucket('5 minutes', ?)", timestamp)
  )
end
```

### 5. Multi-Tenancy & Partitions

OCSF doesn't have a native "Partition" concept for overlapping IP spaces, but ServiceRadar does. You can use Ash's **attribute-based multi-tenancy** to keep OCSF clean while enforcing ServiceRadar rules.

```elixir
multitenancy do
  strategy :attribute
  attribute :tenant_id
  # You can also scope by partition_id here
end
```

### 6. The "One Big Brain" Debugger

Because Ash resources are highly structured, when you open the **Elixir Observer** or a debugger, you won't just see raw maps or tuples. You will see formal `ServiceRadar.Inventory.Device` records.

If you use **AshGraphql**, you can also use **Apollo Studio** or **GraphQL Voyager** to visually see your entire OCSF-aligned "brain" and how every poller, agent, and device relates to one another.

### Summary: Why Ash + OCSF is a win

1.  **Strict Typing:** OCSF is strict; Ash is strict. They are a natural fit.
2.  **No Data Migration:** You keep your current TimescaleDB tables exactly as they are. Ash just sits on top of them as a "Logic Layer."
3.  **API Speed:** You get a GraphQL/JSON:API that speaks OCSF out of the box.
4.  **Identity:** Ash **Changesets** provide the perfect place to run your Go-style "Identity Reconciliation" logic before data hits the database.

**Does this align with how you were envisioning the OCSF integration, or are you worried about the complexity of mapping the more obscure OCSF classes?**


**Describe alternatives you've considered**

A clear and concise description of any alternative solutions or features you've considered.

**Additional context**

Add any other context or screenshots about the feature request here.
Imported from GitHub. Original GitHub issue: #2205 Original author: @mfreeman451 Original URL: https://github.com/carverauto/serviceradar/issues/2205 Original created: 2025-12-23T21:23:44Z --- **Is your feature request related to a problem?** ``` we're considering rewriting our golang based poller that is part of our microservices service oriented architecture. ServiceRadar is a network management and observability platform, there is a "core" engine that used to also run our API, written in go, that does device identity and reconciliation, used to provide API services, and receives updates from pollers over grpc. Pollers talk to agents and agents actually do grpc calls to additional checkers or services, so generally everything goes through the poller then to the core then to the database. collectors like netflow, bgp/bmp, snmp traps, and syslog, write messages to a nats jetstream queue and get processed by consumers and written to the database. we recently rewrote our web front end and are porting over the golang core API, to elixir/phoenix live view. we currently also use a serviceradar-datasvc, which is just a grpc API and wrapper around nats jetstream, and provides message broker, KV, and object store. Agents also have the ability to do TCP port sweep/scans and ICMP sweeps, that is really its only built in functionality. We also have a serviceradar-sync service that talks to integrations like armis or netbox (ipam) for device inventory and also creates config files in the datasvc which get read by the sweep service so it knows what devices to ping. We recently introduced the oban scheduler in the serviceradar-web-ng elixir app and i'm sort of wondering if that might be a good replacement/addition to the poller.. 
so right now to schedule jobs you create a config for serviceradar-poller, you can do it on bare metal/config file/through the KV, and it will poll the services (through the agent you configure it to basically proxy requests through), and collect healthchecks + results of service check. i think it might be better if we used the oban scheduler stuff to do all of this and maybe the core communicates down to the pollers and all the jobs are scheduled through oban? i want to be able to scale up to lots of different pollers and agents and have jobs sort of coordinated, maybe round-robin through multiple agents? we use teh concept of partitions in case you are putting agents/pollers on overlapping IP space, we know how to deal with these in the database. we also need to support multi tenancy, i think elixir contexts/scopes are built for all of that? heres some of our documentation, its a bit outdated now.. i think we might be due for a refresh here, only thing i dont know is how to basically make a network of microservices (beam instances) that are all talking to each other as one big brain? it would be nice to be able to like open the elixir/erts debugger or whatever, and look at every actor in ERTS ``` **Describe the solution you'd like** ``` Rewriting the Go-based poller into Elixir while leveraging the **Oban** scheduler is a natural progression for ServiceRadar, especially as you move the Core API to Phoenix. Elixir's concurrency model (the BEAM) is practically purpose-built for the "one big brain" network observability pattern you are describing. Here is a strategic look at how you can architect this rewrite to achieve a distributed, multi-tenant, and coordinated polling brain. ### 1. Oban as the "Orchestrator" vs. "Executor" While Oban is fantastic, you should distinguish between **scheduling** the job and **executing** the network I/O. * **The Orchestrator (Core):** The Phoenix app uses **Oban** to manage the state of the jobs. 
Since you already use TimescaleDB/Postgres, Oban is perfect here. You can use **Oban Queues** to segment work (e.g., a `high_priority_polling` queue and a `background_discovery` queue). * **The Executors (Distributed Pollers):** Instead of the pollers just being gRPC servers waiting for work, they become BEAM nodes in a cluster. * **Coordination:** Use **Oban Pro's** "Smart Engine" or simple queue partitioning to ensure that only certain nodes pick up certain jobs. Or, more simply, use Oban to insert a job, and have the poller nodes subscribe to that specific queue. ### 2. Making a "Big Brain" with Distributed Elixir To make multiple BEAM instances act as one, you don't necessarily need gRPC between Elixir nodes. You can use **Distributed Erlang**. * **libcluster:** Use this library to automatically form a cluster between your Core and your Elixir Pollers (via DNS, Kubernetes labels, or Gossip). * **Horde:** This is the "secret sauce" for your "big brain." * **Horde.DynamicSupervisor:** Allows you to start a polling process on *any* node in the cluster. If a poller node dies, Horde automatically restarts that polling process on a different healthy node. * **Horde.Registry:** A distributed, local-first registry so any node can find the process responsible for "Device A" without knowing which physical server it’s on. * **Partitioning:** You can create one `Horde.Supervisor` per **Partition**. This aligns with your database partitions for overlapping IP spaces, ensuring that the "Partition A" brain only talks to "Partition A" agents. ### 3. Multi-Tenancy and Contexts Elixir's **Contexts** (e.g., `ServiceRadar.Inventory`, `ServiceRadar.Monitoring`) are perfect for this. * **Scopes:** In your `Inventory` context, every query should be scoped by `tenant_id` and `partition_id`. * **Ecto Multi-tenancy:** If you use Postgres schemas for multi-tenancy, Ecto's `prefix` support is first-class. 
If it's column-based, you can use a base query that always applies `where tenant_id = ^current_tenant_id`. ### 4. Coordinated Job Distribution (Round-Robin) If you want to round-robin jobs through multiple agents: 1. **Oban** picks up a job for a subnet. 2. The job code queries the **Service Registry** (the new tables proposed in your docs) to find available agents for that partition. 3. The poller uses a **Partitioned Worker Pool** or a simple `Enum.at(agents, Integer.mod(job_id, length(agents)))` to pick an agent. 4. The communication down to the **Go Agents** can remain gRPC (via `grpc-elixir`), as those agents are lightweight and likely need to stay in Go for low-level socket/packet access (ICMP/TCP sweeps). ### 5. Looking into the Brain (Observability) You mentioned wanting to use a debugger to see every actor. This is where Elixir shines: * **Phoenix LiveDashboard:** Since you are using Phoenix, you get a web-based "Observer." You can see memory usage, process counts, and NATS connection status in real-time. * **Observer:** On your dev machine, you can run `:observer.start()` and connect it to a remote production node to see the supervision tree of the entire cluster visually. * **Visualizing Actors:** If you implement your polling jobs as **GenServers** managed by Horde, you can see every individual "device poller" as a process in the tree. You can even "kill" one manually from the UI and watch the BEAM immediately restart it. 
### Proposed Revised Architecture ```mermaid graph TD subgraph "Core (Phoenix + Oban)" WEB[Web UI / LiveView] OBAN[Oban Scheduler] REG[Service Registry] end subgraph "BEAM Cluster (Distributed Erlang)" P1[Elixir Poller Node 1] P2[Elixir Poller Node 2] end subgraph "Edge Site (Partition A)" A1[Go Agent 1] A2[Go Agent 2] end OBAN -- Assigns Job --> P1 P1 -- Distributed Registry --> P2 P1 -- gRPC --> A1 P1 -- gRPC --> A2 A1 -- Scans/Polls --> Devices[Network Devices] %% Collectors Flow Collectors[Netflow/Syslog] -- NATS --> DataSvc[DataSvc/NATS] DataSvc -- Stream --> P2 ``` ### Recommendation for the Rewrite 1. **Port the Core Registry first:** Implement the `pollers_registry`, `agents_registry`, and `checkers_registry` tables in CNPG as outlined in your `service-registry-design.md`. 2. **Poller as a BEAM Node:** Instead of a standalone Go binary, make the Poller a small Elixir application. 3. **Oban for Scheduling:** Use Oban to handle the persistence of "When should we poll?" 4. **Horde for Distribution:** Use Horde to decide "Which node is actually running the check right now?" 5. **Keep Agents in Go:** Your Go Agents are already efficient at TCP/ICMP sweeps and proxying checkers. No need to rewrite those yet—just have the Elixir Poller talk to them over gRPC. This approach gives you the **"One Big Brain"** feel (cluster-wide process management) with the **industrial-strength scheduling** of Oban and the **low-level performance** of your existing Go agents. ``` ``` Using **Ash Framework** for ServiceRadar would be a significant architectural "force multiplier," especially given your requirements for **multi-tenancy, identity reconciliation, and a "distributed brain."** Ash is a declarative, resource-based framework for Elixir. Instead of writing boilerplate CRUD and complex scoping logic, you define "Resources" (Devices, Pollers, Agents) and the rules that govern them. Here is how Ash would specifically solve your ServiceRadar rewrite challenges: ### 1. 
First-Class Multi-Tenancy (The "Partition" Problem) Your docs mention the need to support multi-tenancy and overlapping IP spaces via partitions. In a standard Elixir app, you'd have to remember to add `where tenant_id = ...` to every query. * **Ash Solution:** Ash has **native multi-tenancy** built-in. You define a resource as `multitenant`. When the Poller or Web UI requests data, Ash automatically enforces the tenant/partition scope at the engine level. It is nearly impossible to accidentally leak data between tenants. ### 2. AshOban: Declarative Scheduling Since you are already considering Oban, **AshOban** is a game-changer. * **The Pattern:** You define an "Action" on your `Monitoring.Check` resource (e.g., `:run_check`). * **The Integration:** You tell AshOban to trigger that action on a schedule or when a certain event happens. Ash handles the transaction, the retries, and the state changes of the check (e.g., `pending` -> `active` -> `failed`). * **Benefit:** This solves your "coordinated jobs" requirement by letting Oban handle the queueing while Ash handles the business logic and database state. ### 3. Identity Reconciliation (The "Canonical ID" Problem) Your Go core currently does complex "device identity and reconciliation." * **Ash Solution:** You can use **Ash Identities**. You can define that a `Device` is unique based on a combination of `(mac, partition_id)` or `(armis_id, tenant_id)`. * **Calculations:** Ash allows you to define "Calculations." You could have a calculation called `canonical_status` that runs the reconciliation logic you currently have in Go, but exposes it as a simple field that can be sorted and filtered in your Phoenix LiveView. ### 4. Aggregating the "Distributed Brain" If you want multiple Elixir nodes to act as one, Ash's **Actions** and **Relationships** help model the cluster: * You could have a resource called `ServiceRadar.Topology.Agent`. 
* A relationship could be defined: `has_many :checkers, ServiceRadar.Topology.Checker`. * When you query an Agent on Poller Node A, Ash can be configured to fetch associated data or trigger actions across the BEAM cluster using its internal dispatching. ### 5. AshGraphql & AshJsonApi (The gRPC/API Bridge) Your current system relies heavily on gRPC. While Elixir handles gRPC well, Ash can automatically generate a **GraphQL** or **JSON:API** layer from your resources. * This makes your new Phoenix Web UI extremely fast to build because the "Registry" logic you defined in Ash is automatically available to the frontend. ### 6. The "Outdated Docs" Problem You mentioned your documentation is a bit outdated. * **Ash Solution:** Ash resources are **self-documenting**. Because you define the attributes, relationships, and actions in a DSL (Domain Specific Language), you can generate high-quality documentation (and even Mermaid diagrams) directly from the code. The "code is the spec." --- ### Comparison: Standard Elixir vs. Elixir + Ash | Feature | Standard Elixir/Phoenix | Elixir + Ash Framework | | :--- | :--- | :--- | | **Multi-tenancy** | Manual `where` clauses / Ecto prefixes | Native, enforced at the resource level | | **Job Scheduling** | Manual Oban worker modules | Declarative `AshOban` actions | | **Identity Logic** | Hand-written reconciliation functions | Native `Identities` and `Changesets` | | **API Layer** | Manual Controllers/Resolvers | Auto-generated GraphQL/JSON:API/Admin UI | | **Registry Logic** | Proposed in `service-registry-design.md` | Defined as Ash Resources in days, not weeks | ### How to start the rewrite with Ash: 1. **Define your Domain:** Create an `Inventory` domain for Devices/Interfaces and a `Monitoring` domain for Pollers/Agents/Checkers. 2. **Model the Registry:** Take your `service-registry-design.md` and turn those SQL tables into **Ash Resources**. 3. 
**Plug in Oban:** Use `AshOban` to handle the recurring "Check Heartbeat" and "Run Sweep" tasks. 4. **Use AshAdmin:** Ash comes with a "free" Admin UI (`AshAdmin`). You can immediately see and edit all your pollers/agents/tenants in a GUI while you are still building the main Phoenix LiveView frontend. **Verdict:** If your goal is to build a "big brain" that handles complex identities, multi-tenancy, and high-scale polling coordination, **Ash is the correct tool.** It will reduce the amount of Go code you are porting by roughly 60-70% because the framework handles the "plumbing" of a Service Oriented Architecture for you. ``` ## what about ocsf ``` Yes, absolutely. In fact, Ash is arguably the best tool for this because it allows you to separate your **business logic** (how ServiceRadar behaves) from your **data layout** (the OCSF schema). You don't have to choose between "The Ash Way" and "The OCSF Way." Ash's `AshPostgres.DataLayer` is designed to map Elixir resources to any existing database schema, no matter how rigid or standardized it is. Here is how you implement OCSF within Ash for ServiceRadar: ### 1. Mapping Elixir Attributes to OCSF Columns OCSF uses specific field names (e.g., `type_id`, `uid`, `device_id`). If your database already has these columns, you use the `source` option in Ash to point your attributes to the correct OCSF field names. ```elixir defmodule ServiceRadar.Inventory.Device do use Ash.Resource, data_layer: AshPostgres.DataLayer postgres do table "ocsf_device_inventory" # Your existing OCSF table repo ServiceRadar.Repo end attributes do uuid_primary_key :id, source: :uid # Map OCSF 'uid' to Elixir ':id' attribute :hostname, :string, source: :hostname attribute :ip, :string, source: :ip # OCSF specific integer IDs attribute :type_id, :integer, source: :type_id attribute :risk_level_id, :integer, source: :risk_level_id end end ``` ### 2. 
### 2. Handling Nested OCSF Objects (JSONB)

OCSF relies heavily on nested structures like `os`, `hw_info`, and `network_interfaces`. Ash handles these using **Embedded Resources**, which give you full validation and typing on the OCSF JSON blobs.

```elixir
# Define the nested OCSF OS structure
defmodule ServiceRadar.OCSF.OSInfo do
  use Ash.Resource, data_layer: :embedded

  attributes do
    attribute :name, :string
    attribute :version, :string
    attribute :type, :string # e.g. Linux, Windows
  end
end

# In your main Device resource:
attributes do
  attribute :os, ServiceRadar.OCSF.OSInfo, source: :os # Maps to JSONB column
end
```

### 3. SRQL vs. Ash Query

You currently have a custom **SRQL** (ServiceRadar Query Language) engine. Ash effectively provides a standardized version of this.

* **The Benefit:** You can map SRQL queries directly to Ash Filters.
* If a user types `in:devices status:online`, you can translate that into an Ash query: `Device |> Ash.Query.filter(status == :online)`.
* Because Ash understands the OCSF mapping, it will generate the correct SQL: `SELECT * FROM ocsf_device_inventory WHERE status_id = 1`.

### 4. TimescaleDB Integration (Hypertables)

Since your OCSF data (especially `flows` and `activity`) lives in TimescaleDB hypertables, Ash handles this transparently.

* Ash doesn't care that a table is a hypertable; it treats it as a standard Postgres table.
* You can define Ash **Calculations** to handle Timescale-specific functions like `time_bucket`.

```elixir
calculations do
  calculate :five_min_bucket, :utc_datetime, expr(
    fragment("time_bucket('5 minutes', ?)", timestamp)
  )
end
```

### 5. Multi-Tenancy & Partitions

OCSF doesn't have a native "Partition" concept for overlapping IP spaces, but ServiceRadar does. You can use Ash's **attribute-based multi-tenancy** to keep OCSF clean while enforcing ServiceRadar rules.

```elixir
multitenancy do
  strategy :attribute
  attribute :tenant_id # You can also scope by partition_id here
end
```
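With attribute-based multi-tenancy declared as above, every query has to carry a tenant. A minimal usage sketch, assuming Ash 3.x and an illustrative tenant value (not taken from the codebase):

```elixir
# Hypothetical tenant-scoped read against the multitenant Device resource.
require Ash.Query

ServiceRadar.Inventory.Device
|> Ash.Query.set_tenant("tenant_abc123")
|> Ash.Query.filter(ip == "10.0.0.5")
|> Ash.read!()

# With `strategy :attribute`, Ash injects `WHERE tenant_id = 'tenant_abc123'`
# into the generated SQL automatically. Reads without a tenant set raise,
# unless the resource opts out with `global? true`, so cross-tenant leaks
# become a compile-the-query-time error rather than a forgotten WHERE clause.
```

This is the practical payoff versus hand-written Ecto queries: the tenant scoping lives in one declaration on the resource instead of in every call site.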
### 6. The "One Big Brain" Debugger

Because Ash resources are highly structured, when you open the **Elixir Observer** or a debugger, you won't just see raw maps or tuples. You will see formal `ServiceRadar.Inventory.Device` records.

If you use **AshGraphql**, you can also use **Apollo Studio** or **GraphQL Voyager** to visually see your entire OCSF-aligned "brain" and how every poller, agent, and device relates to one another.

### Summary: Why Ash + OCSF is a win

1. **Strict Typing:** OCSF is strict; Ash is strict. They are a natural fit.
2. **No Data Migration:** You keep your current TimescaleDB tables exactly as they are. Ash just sits on top of them as a "Logic Layer."
3. **API Speed:** You get a GraphQL/JSON:API that speaks OCSF out of the box.
4. **Identity:** Ash **Changesets** provide the perfect place to run your Go-style "Identity Reconciliation" logic before data hits the database.

**Does this align with how you were envisioning the OCSF integration, or are you worried about the complexity of mapping the more obscure OCSF classes?**

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.

Imported GitHub comment.

Original author: @mfreeman451
Original URL: https://github.com/carverauto/serviceradar/issues/2205#issuecomment-3713810315
Original created: 2026-01-06T09:07:34Z


closing as completed
