Updates

icon picker
FORMINDEX_10_13_2024

Daniel Friedman 0000-0001-6232-9096 October 13, 2024
This is published open source at:

Situation

In it was identified that the early build of could successfully read in & visualize the FORMIS export Bibtex format. Sanford sent an initial test set of citations, and then the full July 2024 FORMIS export.
Here is the sample of 316 citations
FORMIS 2024(July)-Bibtext_316_sample.txt
558.3 kB
Here is the full Bibtex export
FORMIS 2024(July)-Bibtex.txt
98.6 MB
Here is the reformat of Bibtex into JSON.
FORMIS_2024_July_Bibtex.json
99.7 MB
The goals of this phase of FORMINDEX were to:
Demonstrate the ability of analysis and visualization methods to scale to the full size of FORMIS (~80,000 records) as well as to arbitrary subsets of citations (targeted bibliographies).
Speculative and Realistically explore the use of synthetic intelligence and augmented multisensory methods for Myrmecology.

Methods & Results

Methods & Results are separated into two main sections, reflecting the two goals above.
The first section “FORMIS analysis” describes the initial intra-FORMIS analysis methods performed & their location in the open source code.
The second section “Generative AI methods” describes three applications of Generative AI methods.

FORMIS analysis

Ingress and Target Bibliographies

The script reads in the Bibtex and stores all records in JSON format.
The will generate subsets of all records, based upon the inclusion of key terms. Here there is an arbitrary indenting into 3 rows, so that the first row is a clade name, second row is Ant topics/terms, and third row is places.
TARGET_BIBLIOGRAPHIES = [
"Formica", "Camponotus", "Myrmica", "Pheidole", "Pogonomyrmex", "P. barbatus", "Monomorium", "Messor", "Cataglyphis",
"Foraging", "Bioenergetics", "Myrmecophiles", "Ant-plant", "Dopamine", "Serotonin", "Art", "Robotics",
"Hawaii", "India", "Mexico", "Brazil", "California", "Intelligence"
]

Visualization

is a set of visualization methods that apply to the full FORMIS JSON database, and any Target Bibliographies in the folder.
The visualization outputs are sub-folders of .
Distribution of record types
image.png
Histogram of record types
image.png
Top authors
image.png
Publications per year
image.png
Co-authorship heatmap
image.png
Top venues
image.png
Venues through time
image.png
Top locations
image.png
Title word cloud
image.png
Abstract word cloud
image.png

Here are the outputs for targeted bibliography.
Distribution of record types
image.png
Histogram of record types
image.png
Top publishing venues
image.png
Top authors
image.png
Author co-authorship heatmap
image.png

Publications per year (not assumed to be correct/total number, especially after 1996, for certain languages, etc).
image.png
Top locations of publication
image.png
Title word cloud
image.png
Abstracts word cloud
image.png
and relevant records from term analysis.

Generative AI methods

Three Generative AI methods were used: NotebookLM, OpenAI,and Perplexity.

NotebookLM

Google’s NotebookLM was used to produce conversational podcasts on target sub-FORMIS-scale bibliographies. The target bibliography in JSON format was uploaded to a new Notebook. Then unless errors arose, I clicked “Generate Podcast”. RevidAI was used on the audio files to add video & captions.
Several short (~10-20 minutes) podcasts for target sets of FORMIS citations, are hosted on my YouTube channel.
""

Red Harvester Ant ("")


OpenAI API

OpenAI LLM API was used for summarization and translation, all methods in
concatenates each of the target bibliographies with a general summarization prompt.
uses an LLM call via OpenAI API to send the results of script 1 (the pro-summary) and receive the summary output.
takes in literature summaries and translates into target non-English languages.
The total LLM cost, using GPT-4o-mini, per summary & translation depends on the length of input and output, probably on the order of one-tenth to several cents USD in October 2024 (they are cheaper/different methods as well)

For example, here is the Myrmecophile targeted literature summary, translated into Hindi.
image.png

Perplexity API

LLM API was used for internet-enabled Myrmecological augmented inquiries.
Prompts are listed here:
image.png
The short_name will be used in the output file naming
iterates over Prompts. For now, enter your Perplexity API in plaintext at line 17 in order to use the script.
image.png
There is customizable system prompt:
image.png
captures the inputs, outputs, and timings. It takes about 10-40 seconds per prompt, and the number of returned lines is ~20-200
image.png

All outputs from Perplexity are in
Just to give 3 examples:
Career meta-analysis of
image.png

Next steps

Communicate with FORMIS stakeholders and see if there is any feedback or suggestions for useful analyses/visualizations.
FORMIS analyses
Improve analysis and visualization on the FORMIS dataset.
Publish an open source report to describe the tools and analyses available at . We can continue development at for better long-term stewardship.
Beyond FORMIS July 2024 snapshot.
Add more recent literature using literature engine APIs.
Integrate bibliographic records with NCBI species ID: . Later this will be a key identifier for the broader integrative effort.
Perplexity searches for all species groups.
Integration with fabric and other coordination mechanisms ()

..-. --- .-. -- .. -. -.. . -..-

Want to print your doc?
This is not the way.
Try clicking the ⋯ next to your doc name or using a keyboard shortcut (
CtrlP
) instead.