Explore

Updates

FORMINDEX_10_13_2024

Daniel Friedman 0000-0001-6232-9096 October 13, 2024

This is published open source at:

https://zenodo.org/records/13927034⁠

Situation

FORMINDEX_10-08-2024⁠

it was identified that the early build of

FORMINDEX scripts⁠

could successfully read in & visualize the FORMIS export Bibtex format. Sanford sent an initial test set of citations, and then the full July 2024 FORMIS export.

Here is the sample of 316 citations

⁠

FORMIS 2024(July)-Bibtext_316_sample.txt

558.3 kB

⁠

Here is the full Bibtex export

⁠

FORMIS 2024(July)-Bibtex.txt

98.6 MB

⁠

Here is the reformat of Bibtex into JSON.

⁠

FORMIS_2024_July_Bibtex.json

99.7 MB

⁠

The goals of this phase of FORMINDEX were to:

Demonstrate the ability of analysis and visualization methods to scale to the full size of FORMIS (~80,000 records) as well as to arbitrary subsets of citations (targeted bibliographies).

Speculative and Realistically explore the use of synthetic intelligence and augmented multisensory methods for Myrmecology.

Methods & Results

Methods & Results are separated into two main sections, reflecting the two goals above.

The first section “FORMIS analysis” describes the initial intra-FORMIS analysis methods performed & their location in the open source code.

The second section “Generative AI methods” describes three applications of Generative AI methods.

FORMIS analysis

Ingress and Target Bibliographies

The

Read_in_FORMIS.py⁠

script reads in the Bibtex and stores all records in JSON format.

The

Generate_Target_Bibliographies.py⁠

will generate subsets of all records, based upon the inclusion of key terms. Here there is an arbitrary indenting into 3 rows, so that the first row is a clade name, second row is Ant topics/terms, and third row is places.

TARGET_BIBLIOGRAPHIES = [

"Formica", "Camponotus", "Myrmica", "Pheidole", "Pogonomyrmex", "P. barbatus", "Monomorium", "Messor", "Cataglyphis",

"Foraging", "Bioenergetics", "Myrmecophiles", "Ant-plant", "Dopamine", "Serotonin", "Art", "Robotics",

"Hawaii", "India", "Mexico", "Brazil", "California", "Intelligence"

]

Targeted bibliographies are output to:

https://github.com/docxology/FORMINDEX/tree/main/Targeted_Bibliographies⁠

⁠

Visualization

⁠

Visualize_FORMIS.py⁠

is a set of visualization methods that apply to the full FORMIS JSON database, and any Target Bibliographies in the folder.

The visualization outputs are sub-folders of

https://github.com/docxology/FORMINDEX/tree/main/Visualizations⁠

Here is the

folder for all-FORMIS visualizations⁠

⁠

Distribution of record types

⁠

Histogram of record types

⁠

Top authors

⁠

Publications per year

⁠

Co-authorship heatmap

⁠

Top venues

⁠

Venues through time

⁠

Top locations

⁠

Top topics⁠

Title word cloud

⁠

Abstract word cloud

⁠

Here are the outputs for

Foraging⁠

targeted bibliography.

Distribution of record types

⁠

Histogram of record types

⁠

Top publishing venues

⁠

Top authors

⁠

Author co-authorship heatmap

⁠

Publications per year (not assumed to be correct/total number, especially after 1996, for certain languages, etc).

⁠

Top locations of publication

⁠

Title word cloud

⁠

Abstracts word cloud

⁠

Top topics⁠

and relevant records from term analysis.

Generative AI methods

Three Generative AI methods were used: NotebookLM, OpenAI,and Perplexity.

NotebookLM

Google’s NotebookLM was used

https://notebooklm.google.com/⁠

to produce conversational podcasts on target sub-FORMIS-scale bibliographies. The target bibliography in JSON format was uploaded to a new Notebook. Then unless errors arose, I clicked “Generate Podcast”. RevidAI

https://revid.ai/⁠

was used on the audio files to add video & captions.

Several short (~10-20 minutes) podcasts for target sets of FORMIS citations, are hosted on my YouTube channel.

⁠

Dopamine⁠

Bioenergetics⁠

⁠

Red Harvester Ant ("

P. barbatus⁠

⁠

California⁠

⁠

Intelligence⁠

⁠

OpenAI API

OpenAI LLM API was used for summarization and translation, all methods in

https://github.com/docxology/FORMINDEX/tree/main/LLM_Methods⁠

⁠

Script 1⁠

concatenates each of the target bibliographies with a general summarization prompt.

⁠

Script 2⁠

uses an LLM call via OpenAI API to send the results of script 1 (the pro-summary) and receive the summary output.

⁠

Script 3⁠

takes in literature summaries and translates into target non-English languages.

Outputs are all in

https://github.com/docxology/FORMINDEX/tree/main/LLM_Methods/Inputs_and_Outputs⁠

The total LLM cost, using GPT-4o-mini, per summary & translation depends on the length of input and output, probably on the order of one-tenth to several cents USD in October 2024 (they are cheaper/different methods as well)

For example, here is the Myrmecophile targeted literature summary, translated into Hindi.

⁠

https://github.com/docxology/FORMINDEX/blob/main/LLM_Methods/Inputs_and_Outputs/Translated_Summaries/Hindi/translated_Hindi_Myrmecophiles_bibliography_summary.md⁠

⁠