Code for Pat's dissertation.



sorter.rb usage


sorter.rb takes the following options:

option            usage
-f, --file        the name of the input file
-b, --bin-file    the name of the bin CSV file
-t, --type        what type of splitting to do; can be "iat" or "pn"
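
As an illustration of how these three options could be parsed, here is a minimal sketch using Ruby's standard OptionParser. It is not taken from sorter.rb; the helper name parse_options and the hash keys are assumptions made for this example.

```ruby
require 'optparse'

# Hypothetical helper (not from sorter.rb): parses the three documented
# options into a Hash with the assumed keys :file, :bin_file and :type.
def parse_options(argv)
  options = {}
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: sorter.rb [options]'
    opts.on('-f', '--file FILE', 'the name of the input file') do |value|
      options[:file] = value
    end
    opts.on('-b', '--bin-file FILE', 'the name of the bin CSV file') do |value|
      options[:bin_file] = value
    end
    # Passing an Array restricts the accepted values; anything else
    # raises OptionParser::InvalidArgument.
    opts.on('-t', '--type TYPE', %w[iat pn], 'splitting type: "iat" or "pn"') do |value|
      options[:type] = value
    end
  end
  parser.parse!(argv.dup) # dup so the caller's array is left untouched
  options
end
```

Calling parse_options(ARGV) at the top of a script would give a hash such as { file: "tester.txt", bin_file: "bins.csv", type: "iat" }.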


sorter.rb will generate two files:

  • [filename]-out.json
  • [filename]-out.csv

Both files contain the same data, one in JSON format and the other in CSV format.
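
The naming scheme and the two output formats can be sketched as follows. This is an assumption-laden example, not the code in sorter.rb: the helpers output_paths and write_outputs are hypothetical, and the data is assumed to be an array of flat hashes that all share the same keys.

```ruby
require 'json'
require 'csv'

# Hypothetical helper: "tester.txt" -> "tester-out.json" and "tester-out.csv".
def output_paths(input_path)
  base = input_path.delete_suffix(File.extname(input_path))
  { json: "#{base}-out.json", csv: "#{base}-out.csv" }
end

# Hypothetical helper: writes the same rows to both files.
# ASSUMPTION: rows is an Array of Hashes with identical keys.
def write_outputs(rows, input_path)
  paths = output_paths(input_path)
  File.write(paths[:json], JSON.pretty_generate(rows))
  CSV.open(paths[:csv], 'w') do |csv|
    csv << rows.first.keys unless rows.empty? # header row
    rows.each { |row| csv << row.values }
  end
  paths
end
```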

Type options

The program has two filtering modes:


The first mode grabs all text in the input file between the markers PLOVEOPENING and PLOVECLOSING. It ignores everything before PLOVEOPENING and after PLOVECLOSING, and it supports only one such section per input file.
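
A minimal sketch of extracting the text between PLOVEOPENING and PLOVECLOSING is shown below. The function name extract_single_section is hypothetical, and two details are assumptions rather than documented behavior: the markers themselves are excluded from the result, and surrounding whitespace is stripped.

```ruby
# Hypothetical helper: returns the text between the first opening marker
# and the next closing marker, or nil if either marker is missing.
# ASSUMPTION: markers are excluded and the result is whitespace-stripped.
def extract_single_section(text, open_marker: 'PLOVEOPENING', close_marker: 'PLOVECLOSING')
  start = text.index(open_marker)
  return nil if start.nil?

  start += open_marker.length
  finish = text.index(close_marker, start)
  return nil if finish.nil?

  text[start...finish].strip
end
```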


The second mode grabs each section of text in the input file that lies between Narrative: and Signatures:. It supports multiple sections in a single input file.
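
Collecting every section between Narrative: and Signatures: could look roughly like this. The function name extract_sections is hypothetical, and excluding the two labels and stripping whitespace are assumptions for this sketch, not confirmed behavior of sorter.rb.

```ruby
# Hypothetical helper: returns one String per "Narrative: ... Signatures:"
# section, in the order the sections appear in the input.
# The /m flag lets "." match newlines; ".*?" is non-greedy, so each match
# stops at the nearest following "Signatures:".
def extract_sections(text)
  text.scan(/Narrative:(.*?)Signatures:/m).flatten.map(&:strip)
end
```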


./sorter.rb --file tester.txt --bin-file bins.csv --type iat

The above command will run against tester.txt, count strings according to bins.csv, and process the input text in iat mode. It will create tester-out.json and tester-out.csv containing the output data.
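
The counting step is not described in detail here, so the following is only a sketch under stated assumptions. It assumes each row of the bin file has the form "bin name, term1, term2, ..." and that matching is case-insensitive substring counting; the real layout of bins.csv and the real matching rules may differ. The names load_bins and count_bins are hypothetical.

```ruby
require 'csv'

# ASSUMPTION: each row is "bin name, term1, term2, ...".
# Returns a Hash mapping each bin name to its list of terms.
def load_bins(csv_text)
  CSV.parse(csv_text).each_with_object({}) do |row, bins|
    name, *terms = row.compact.map(&:strip)
    bins[name] = terms unless name.nil? || name.empty?
  end
end

# ASSUMPTION: case-insensitive, non-overlapping substring matches.
# Returns a Hash mapping each bin name to the total number of matches.
def count_bins(text, bins)
  haystack = text.downcase
  bins.transform_values do |terms|
    terms.sum { |term| haystack.scan(term.downcase).length }
  end
end
```

With this layout, load_bins(File.read("bins.csv")) followed by count_bins(section_text, bins) would yield one count per bin, which could then be written to the JSON and CSV output files.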