[root@VLNX107011 hue]# hdfs fsck /
Connecting to namenode via http://VLNX107011:50070
FSCK started by root (auth:SIMPLE) from /VLNX107011 for path / at Fri Dec 04 19:08:09 CST 2015
...................
 Total size:    11837043630 B (Total open files size: 166 B)
 Total dirs:    3980
 Total files:   3254
 Total symlinks:                0 (Files currently being written: 2)
 Total blocks (validated):      2627 (avg. block size 4505916 B) (Total open file blocks (not validated): 2)
 Minimally replicated blocks:   2627 (100.0 %)
 Over-replicated blocks:        2253 (85.76323 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.9798248
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          10
 Number of racks:               1
FSCK ended at Fri Dec 04 19:08:09 CST 2015 in 100 milliseconds

The filesystem under path '/' is HEALTHY
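fsck counts a block as over-replicated when it has more replicas than the replication factor recorded on its file, so the 85.76% figure here suggests many files whose per-file factor is lower than the number of replicas actually present (which also fits the average of 2.98 against the default of 3). One way to reconcile them is to reset the per-file factor; a minimal sketch using Hadoop's FileSystem API (the subtree path and the target factor of 3 are assumptions for illustration, not part of the output above):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration()) // picks up core-site.xml / hdfs-site.xml
val target: Short = 3                        // assumed target factor; matches the default above

// Recursively reset the replication factor of every file under a subtree.
def walk(p: Path): Unit = fs.listStatus(p).foreach { st =>
  if (st.isDirectory) walk(st.getPath)
  else if (st.getReplication != target) fs.setReplication(st.getPath, target)
}

walk(new Path("/user/data")) // hypothetical over-replicated subtree
fs.close()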
File System Counters
        FILE: Number of bytes read=215
        FILE: Number of bytes written=505
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1200
        HDFS: Number of bytes written=274907
        HDFS: Number of read operations=57
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=11
Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Other local map tasks=2
        Total time spent by all maps in occupied slots (ms)=3664
        Total time spent by all reduces in occupied slots (ms)=2492
        TOTAL_LAUNCHED_UBERTASKS=3
        NUM_UBER_SUBMAPS=2
        NUM_UBER_SUBREDUCES=1
Map-Reduce Framework
        Map input records=2
        Map output records=8
        Map output bytes=82
        Map output materialized bytes=85
        Input split bytes=202
        Combine input records=8
        Combine output records=6
        Reduce input groups=5
        Reduce shuffle bytes=0
        Reduce input records=6
        Reduce output records=5
        Spilled Records=12
        Shuffled Maps=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=65
        CPU time spent (ms)=1610
        Physical memory (bytes) snapshot=1229729792
        Virtual memory (bytes) snapshot=5839392768
        Total committed heap usage (bytes)=3087532032
File Input Format Counters
        Bytes Read=50
File Output Format Counters
        Bytes Written=41
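The TOTAL_LAUNCHED_UBERTASKS, NUM_UBER_SUBMAPS and NUM_UBER_SUBREDUCES counters appear because this job ran in uber mode, where all tasks execute inside the ApplicationMaster's JVM rather than in separate containers. Uber mode is switched on per job via mapreduce.job.ubertask.enable; a minimal driver sketch (the job name is a placeholder, and the thresholds are shown with their usual defaults):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.Job

val conf = new Configuration()
// Let small jobs run "uberized" inside the MRAppMaster JVM.
conf.setBoolean("mapreduce.job.ubertask.enable", true)
// A job only qualifies if it stays under these thresholds (usual defaults shown).
conf.setInt("mapreduce.job.ubertask.maxmaps", 9)
conf.setInt("mapreduce.job.ubertask.maxreduces", 1)
val job = Job.getInstance(conf, "uber-demo") // placeholder job name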
select province, city,
  rank() over (partition by province order by people desc) rank,
  dense_rank() over (partition by province order by people desc) dense_rank,
  row_number() over (partition by province order by people desc) row_number
from pcp
group by province, city, people;
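The three ranking functions differ only in how ties on people are numbered within each province: rank() gives tied rows the same rank and skips the following ranks (1, 1, 3, ...), dense_rank() gives tied rows the same rank without gaps (1, 1, 2, ...), and row_number() numbers rows consecutively regardless of ties. A Spark sketch with made-up pcp data (the table contents are placeholders) shows the difference:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{rank, dense_rank, row_number, col}

val spark = SparkSession.builder().appName("ranking-demo").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the pcp table: (province, city, people).
val pcp = Seq(
  ("GD", "Guangzhou", 1400), ("GD", "Shenzhen", 1400), ("GD", "Dongguan", 830),
  ("ZJ", "Hangzhou", 980), ("ZJ", "Ningbo", 820)
).toDF("province", "city", "people")

val w = Window.partitionBy("province").orderBy(col("people").desc)

pcp.select(col("province"), col("city"),
    rank().over(w).as("rank"),             // ties share a rank, next rank is skipped: 1, 1, 3
    dense_rank().over(w).as("dense_rank"), // ties share a rank, no gap:               1, 1, 2
    row_number().over(w).as("row_number")  // unique sequence even among ties:         1, 2, 3
  ).show()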
To examine the internal structure and data of Parquet files, you can use the parquet-tools command that comes with CDH. Make sure this command is in your $PATH (typically it is symlinked from /usr/bin; depending on your installation, you might need to locate it under a CDH-specific bin directory). Its arguments let you perform operations such as the following (see the sketch after the list):
cat: print the full contents of the file to standard output
head: print the first few records of the file to standard output
schema: print the Parquet schema of the file
meta: print the file footer metadata, including key-value properties (like the Avro schema), compression ratios, encodings, compression used, and row group information
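For example, parquet-tools schema /path/to/file.parquet prints the schema and parquet-tools meta /path/to/file.parquet dumps the footer (the paths are placeholders). When a Spark shell is handier than the CLI, roughly the same information can be read out of the file directly; a minimal sketch, again with a placeholder path:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-inspect").master("local[*]").getOrCreate()

// Spark only reads the footer metadata/schema until an action is run.
val df = spark.read.parquet("/path/to/file.parquet") // placeholder path
df.printSchema() // roughly what `parquet-tools schema` prints
df.show(5)       // roughly what `parquet-tools head` prints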
case INT96 =>
  ParquetSchemaConverter.checkConversionRequirement(
    assumeInt96IsTimestamp,
    "INT96 is not supported unless it's interpreted as timestamp. " +
      s"Please try to set ${SQLConf.PARQUET_INT96_AS_TIMESTAMP.key} to true.")
  TimestampType
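This branch of Spark SQL's Parquet schema conversion rejects INT96 columns unless they are to be interpreted as timestamps, and the error message names the switch to flip. That flag, spark.sql.parquet.int96AsTimestamp, can be set when the session is built or at runtime; a minimal sketch (the app name is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("int96-demo") // placeholder
  // Interpret Parquet INT96 values as timestamps (the SQLConf key referenced above).
  .config("spark.sql.parquet.int96AsTimestamp", "true")
  .getOrCreate()

// The same switch can also be flipped on an existing session:
spark.conf.set("spark.sql.parquet.int96AsTimestamp", "true")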