COLLECT_SET Function - Hive to Trino Migration

In Apache Hive, COLLECT_SET is an aggregate function that collects the unique values from multiple rows into an array. In Trino, you can use the ARRAY_AGG function with the DISTINCT keyword instead:

Apache Hive:

  -- To avoid error: IllegalArgumentException Size requested for unknown type: java.util.Collection
  SET hive.map.aggr = false;
 
  -- Duplicate value 'red' will be removed
  SELECT  COLLECT_SET(color)
  FROM
  (
   SELECT 'red'   AS color
   UNION ALL
   SELECT 'white' AS color
   UNION ALL
   SELECT 'black' AS color
   UNION ALL
   SELECT 'red'   AS color
  ) t;
  # ["red","white","black"]
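
COLLECT_SET does not guarantee the order of the elements in the returned array, which makes results harder to compare across engines during a migration. A minimal sketch, assuming the built-in SORT_ARRAY function is available in your Hive version, that returns the elements in ascending order:

  -- Sort the collected values so the result is deterministic
  SELECT  SORT_ARRAY(COLLECT_SET(color))
  FROM
  (
   SELECT 'red'   AS color
   UNION ALL
   SELECT 'white' AS color
   UNION ALL
   SELECT 'black' AS color
   UNION ALL
   SELECT 'red'   AS color
  ) t;
  # ["black","red","white"]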

Trino:

  -- Duplicate value 'red' will be removed
  SELECT  ARRAY_AGG(DISTINCT color)
  FROM
  (
   SELECT 'red'   AS color
   UNION ALL
   SELECT 'white' AS color
   UNION ALL
   SELECT 'black' AS color
   UNION ALL
   SELECT 'red'   AS color
  ) t;
  # [black, white, red]
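
Two differences are worth checking when you convert a query. ARRAY_AGG(DISTINCT) does not guarantee element order either, and unlike COLLECT_SET, which skips NULL values, ARRAY_AGG keeps them in the result. The following is a sketch rather than a required pattern: it uses ORDER BY inside the aggregate for a deterministic order and a FILTER clause to drop NULLs. The extra NULL row is added here only for illustration:

  -- Sort the elements and skip NULLs to mimic COLLECT_SET more closely
  SELECT  ARRAY_AGG(DISTINCT color ORDER BY color) FILTER (WHERE color IS NOT NULL)
  FROM
  (
   SELECT 'red'   AS color
   UNION ALL
   SELECT 'white' AS color
   UNION ALL
   SELECT CAST(NULL AS VARCHAR) AS color   -- illustrative NULL row
   UNION ALL
   SELECT 'black' AS color
   UNION ALL
   SELECT 'red'   AS color
  ) t;
  # [black, red, white]

If the source column can never contain NULL, the FILTER clause can be omitted.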

For more information, see Apache Hive to Trino Migration.