Q: How to convert a NiFi dataflow to Parquet

I am trying to convert NiFi dataflow records to Parquet, but I have not been able to get it working. I am using Solr DataImport to load the Parquet files. The dataflow records seem to have a hierarchical structure, which is not the case with my Parquet schema, so the two schemas do not match. How can I convert the dataflow records to Parquet records so that each field maps to the corresponding value?

A: You can get the input dataflow using the NiFi Metrics API, and then use the PySpark API to convert it to Parquet. Here is an example of what I have done in my blog:

    import os
    import time
    from datetime import datetime

    from pyspark.sql import SparkSession
    from pyspark.sql.types import *
    from pyspark.sql.functions import *

    # Create the Spark session and the context used to read the input file.
    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    # Define the base schema.
    schema = StructType([
        StructField("_id", IntegerType(), True),
        StructField("text", StringType(), True),
        StructField("author", StringType(), True),
        StructField("takenAt", StringType(), True)
    ])

    # Load the sample data. Note that textFile returns an RDD of lines, not a DataFrame.
    df = sc.textFile("your/path/to/data/schema.json")

    # Extend the schema with a "_value" column.
    schema = StructType(schema.fields + [StructField("_value", StringType(), True)])

    # Create the input stream.
    # NiFiMetrics is not defined or imported in this snippet; it is assumed to come from the author's own code.
    nifi = NiFiMetrics.getInstance()
    if not nifi.isEnabled():
        print("NiFi Metrics is not enabled. This will make NiFi fail the NiFi Metrics client")
    else:
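If the hierarchy has to be removed before writing, one option is to collapse nested fields into prefixed column names first. Parquet itself can store nested columns, so this step is only needed when the consumer expects a flat layout. Below is a minimal sketch in plain Python, assuming each record arrives as a nested dict. The `flatten_record` helper and the sample record are hypothetical and are not part of NiFi or PySpark.

```python
from typing import Any, Dict


def flatten_record(record: Dict[str, Any], parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Collapse nested dicts into one level, joining key paths with `sep`."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested record, loosely modelled on the fields in the schema above.
record = {
    "_id": 1,
    "text": "hello",
    "author": {"name": "alice", "location": {"city": "Paris"}},
    "takenAt": "2020-01-01T00:00:00Z",
}

flat = flatten_record(record)
print(flat)
```

The flattened dicts could then be turned into a DataFrame with `spark.createDataFrame(...)` and written out with `df.write.parquet(...)`, using a schema whose field names match the flattened keys.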




