Merge pull request #174 from eye-on-surveillance/main
release: citations fixes
marvinmarnold authored Dec 6, 2023
2 parents abd3ef1 + b9232d1 commit 36e9b65
Showing 104 changed files with 21,687 additions and 27 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -7,4 +7,5 @@ raw/
txt/
*.csv
package-lock.json
package.json
package.json
*.DS_Store
4 changes: 2 additions & 2 deletions packages/backend/src/cache/faiss_index_general.dvc
@@ -1,6 +1,6 @@
outs:
- md5: 3878f1ecec272cc887ded3697602a653.dir
size: 89147118
- md5: bfa4f1431d21d6972594243fae97f037.dir
size: 119112746
nfiles: 2
hash: md5
path: faiss_index_general
4 changes: 2 additions & 2 deletions packages/backend/src/cache/faiss_index_in_depth.dvc
@@ -1,6 +1,6 @@
outs:
- md5: f3f1f964db26296ca3f600c2c092bea8.dir
size: 89147118
- md5: 67c1f4c02de13c682a8136fe6f54beea.dir
size: 119112746
nfiles: 2
hash: md5
path: faiss_index_in_depth

Large diffs are not rendered by default.

Binary file not shown.
@@ -0,0 +1,29 @@
{
"messages": [
{
"page_content": "- Topic: Special City Council Meeting\n- Summary: The special city council meeting was held on Monday, January 24, 2022, with five members present and constituting a quorum.\n- Ordinance Number: N/A\n- Votes Summary:\n - Vote 1: N/A - (5 present, 2 absent)\n- Decision/Key Actions: Quorum established for the special meeting\n- Tags/Keywords: City Council, Quorum, Special Meeting\n- UID: f883c863-6bc8-4a13-9241-85f66d8d4111",
"uid": "f883c863-6bc8-4a13-9241-85f66d8d4111",
"publish_date": "1-24-2022"
},
{
"page_content": "- Topic: Presentation - A Working Dialogue with Criminal Justice System Stakeholders\n- Summary: The New Orleans City Council held a presentation where several stakeholders from the criminal justice system, including Gary Sells, Tanyaka B. Cline, Lisa Tennenbaum, and others, provided input and engaged in dialogue.\n- Ordinance Number: N/A\n- Votes Summary: N/A\n- Decision/Key Actions: The presentation was for informational purposes and did not involve a vote.\n- Tags/Keywords: Criminal Justice, Stakeholders, Dialogue\n- UID: 5c777703-d0c5-4ebc-87aa-3cca9b9c1cba\n\nPlease note that the provided link is not accessible and may need to be verified for accurate information.",
"uid": "5c777703-d0c5-4ebc-87aa-3cca9b9c1cba",
"publish_date": "1-24-2022"
},
{
"page_content": "- Topic: Motion to Suspend Rule 30\n- Summary: The motion to suspend Rule 30 was introduced by King and seconded by Harris. The motion to suspend the rules passed with 7 YEAS and 0 NAYS.\n- Ordinance Number: N/A\n- Votes Summary:\n Vote 1: Passed - (7 YEAS, 0 NAYS, 0 ABSTAIN, 0 ABSENT)\n- Decision/Key Actions: The motion to suspend Rule 30 was approved.\n- UID: b88bc882-aa3d-4479-8d02-1238120dfcac",
"uid": "b88bc882-aa3d-4479-8d02-1238120dfcac",
"publish_date": "1-24-2022"
},
{
"page_content": "- Topic: Conditional use permit for a neighborhood commercial establishment\n- Summary: The ordinance aims to establish a conditional use to permit a neighborhood commercial establishment in an HU-RM1 Historic Urban Multi-Family Residential District. The specific location is Square 486, Lot 5, in the First Municipal District, bounded by Thalia Street, South Roman Street, South Prieur Street, and Martin Luther King, Jr. Boulevard. \n- Ordinance Number: CAL. NO. 33,608\n- Votes Summary:\n - Motion to Suspend the Rules: Passed - (7 YEAS, 0 NAYS, 0 ABSTAIN, 0 ABSENT)\n- Decision/Key Actions: The motion to suspend the rules to introduce the ordinance on first reading passed. The ordinance was introduced and laid over as required by law, with a 90-day deadline of 4/6/22.\n- UID: ec959e9e-59b1-45ab-87ea-4d9d4957df2f",
"uid": "ec959e9e-59b1-45ab-87ea-4d9d4957df2f",
"publish_date": "1-24-2022"
},
{
"page_content": "- Topic: Adjournment Motion\n- Summary: Council member Harris seconded the motion to adjourn the meeting.\n- Ordinance Number: N/A\n- Votes Summary:\n Vote 1: Adjourn - 7 YEAS, 0 NAYS, 0 ABSTAIN, 0 ABSENT\n- Decision/Key Actions: The motion to adjourn the meeting passed unanimously.\n- Tags/Keywords: Adjournment, Motion, Meeting\n- UID: e4203df0-3dfb-4934-af9a-63de216225d0",
"uid": "e4203df0-3dfb-4934-af9a-63de216225d0",
"publish_date": "1-24-2022"
}
]
}
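
The summary file above stores each agenda item as a record with a bulleted `page_content` string, a `uid`, and a `publish_date` such as "1-24-2022". As a minimal sketch of how such a record could be read back, assuming the "- Key: value" layout seen in these records holds throughout (the helper names `parse_fields` and `parse_publish_date` are hypothetical and not part of this commit):

```python
from datetime import date, datetime

# One record copied from the summary file shown in this diff.
record = {
    "page_content": (
        "- Topic: Motion to Suspend Rule 30\n"
        "- Summary: The motion to suspend Rule 30 was introduced by King and "
        "seconded by Harris. The motion to suspend the rules passed with 7 YEAS "
        "and 0 NAYS.\n"
        "- Ordinance Number: N/A\n"
        "- Votes Summary:\n"
        " Vote 1: Passed - (7 YEAS, 0 NAYS, 0 ABSTAIN, 0 ABSENT)\n"
        "- Decision/Key Actions: The motion to suspend Rule 30 was approved.\n"
        "- UID: b88bc882-aa3d-4479-8d02-1238120dfcac"
    ),
    "uid": "b88bc882-aa3d-4479-8d02-1238120dfcac",
    "publish_date": "1-24-2022",
}


def parse_fields(page_content: str) -> dict[str, str]:
    """Collect top-level "- Key: value" bullets into a dict (hypothetical helper).

    Bullets with no inline value (e.g. "- Votes Summary:") are skipped, and so
    are continuation lines that do not start with "- ".
    """
    fields: dict[str, str] = {}
    for raw_line in page_content.splitlines():
        line = raw_line.strip()
        if line.startswith("- ") and ": " in line:
            key, value = line[2:].split(": ", 1)
            fields[key] = value
    return fields


def parse_publish_date(value: str) -> date:
    """Parse the month-day-year format used by `publish_date` (hypothetical helper)."""
    return datetime.strptime(value, "%m-%d-%Y").date()


fields = parse_fields(record["page_content"])
published = parse_publish_date(record["publish_date"])
```

Comparing the `UID` bullet against the record's `uid` key is one cheap consistency check, since the model-written `page_content` repeats it.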

Large diffs are not rendered by default.

Binary file not shown.
5,904 changes: 5,904 additions & 0 deletions packages/backend/src/minutes_agendas_directory/Minutes 2021.json

Large diffs are not rendered by default.

4,729 changes: 4,729 additions & 0 deletions packages/backend/src/minutes_agendas_directory/Minutes 2022.json

Large diffs are not rendered by default.

7 changes: 4 additions & 3 deletions packages/backend/src/preprocessor.py
@@ -8,7 +8,7 @@
from langchain.chains import LLMChain, HypotheticalDocumentEmbedder
from langchain.prompts import PromptTemplate
from langchain.vectorstores.faiss import FAISS
from langchain import OpenAI
from langchain.llms import OpenAI
from pathlib import Path
import shutil

@@ -43,6 +43,7 @@ def create_embeddings():
llm_chain=llm_chain_general,
base_embeddings=base_embeddings,
)

in_depth_embeddings = HypotheticalDocumentEmbedder(
llm_chain=llm_chain_in_depth, base_embeddings=base_embeddings
)
@@ -72,7 +73,7 @@ def create_db_from_minutes_and_agendas(doc_directory):

data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=15000, chunk_overlap=10000
chunk_size=4000, chunk_overlap=1000
)
docs = text_splitter.split_documents(data)
all_docs.extend(docs)
@@ -221,7 +222,7 @@ def create_db_from_youtube_urls_and_pdfs(
pc_docs = create_db_from_public_comments(pc_directory)
news_docs = create_db_from_news_transcripts(news_directory)

all_docs = fc_video_docs + cj_video_docs + news_docs + pc_docs
all_docs = fc_video_docs + cj_video_docs + news_docs + pc_docs + pdf_docs

db_general = FAISS.from_documents(all_docs, general_embeddings)
db_in_depth = FAISS.from_documents(all_docs, in_depth_embeddings)
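
The splitter hunk above shows two settings, `chunk_size=15000, chunk_overlap=10000` and `chunk_size=4000, chunk_overlap=1000`. To illustrate what those numbers imply, here is a rough sketch using a plain fixed-stride window. This is an assumption-laden simplification: `RecursiveCharacterTextSplitter` splits on separators rather than at fixed offsets, so real chunk counts will differ, and `sliding_chunks` is a hypothetical helper, not the library's algorithm.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters.

    Simplified stand-in for a separator-aware splitter: the window simply
    advances by (chunk_size - chunk_overlap) characters each step.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


# A synthetic 30,000-character document (made-up length for illustration).
document = "x" * 30_000

large_chunks = sliding_chunks(document, chunk_size=15_000, chunk_overlap=10_000)
small_chunks = sliding_chunks(document, chunk_size=4_000, chunk_overlap=1_000)

# Ratio of characters embedded to characters in the source document.
large_duplication = sum(len(c) for c in large_chunks) / len(document)
small_duplication = sum(len(c) for c in small_chunks) / len(document)
```

Under this simplified model, the 15000/10000 setting embeds every character about twice, while 4000/1000 produces more but smaller chunks with roughly 1.3x duplication.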
1 change: 0 additions & 1 deletion packages/googlecloud/functions/getanswer/__main__.py
@@ -4,7 +4,6 @@

from dotenv import find_dotenv, load_dotenv
from inquirer import answer_query
from langchain.embeddings.openai import OpenAIEmbeddings
from helper import get_dbs
from api import RESPONSE_TYPE_GENERAL, RESPONSE_TYPE_DEPTH

@@ -1,6 +1,6 @@
outs:
- md5: 3878f1ecec272cc887ded3697602a653.dir
size: 89147118
- md5: bfa4f1431d21d6972594243fae97f037.dir
size: 119112746
nfiles: 2
hash: md5
path: faiss_index_general
@@ -1,6 +1,6 @@
outs:
- md5: f3f1f964db26296ca3f600c2c092bea8.dir
size: 89147118
- md5: 67c1f4c02de13c682a8136fe6f54beea.dir
size: 119112746
nfiles: 2
hash: md5
path: faiss_index_in_depth
7 changes: 3 additions & 4 deletions packages/googlecloud/functions/getanswer/inquirer.py
@@ -171,13 +171,10 @@ def get_indepth_response_from_query(df, db, query, k):
query = transform_query_for_date(query)

doc_list = db.similarity_search_with_score(query, k=k)
print(doc_list)
docs = sort_retrived_documents(doc_list)
docs_page_content = append_metadata_to_content(doc_list)

template = """
Documents: {docs}
Question: {question}
@@ -188,9 +185,10 @@ def get_indepth_response_from_query(df, db, query, k):
elaborate on the implications and broader societal or community impacts of the identified issues relevant to {question};
investigate any underlying biases or assumptions present in the city council's discourse or actions relevant to {question}.
Summarize your answer from the analysis regarding {question} into one cohesive paragraph.
The final output should be in paragraph form without any formatting, such as prefixing your points with "a.", "b.", or "c."
The final output should not include any reference to the model's active sorting by date.
Documents: {docs}
"""

prompt = PromptTemplate(
@@ -200,6 +198,7 @@ def get_indepth_response_from_query(df, db, query, k):

chain_llm = LLMChain(llm=llm, prompt=prompt)
responses_llm = chain_llm.run(question=query, docs=docs_page_content, temperature=1)
print(responses_llm)

return process_responses_llm(responses_llm, docs)
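
In the hunks above, `Documents: {docs}` appears both ahead of the question and after the instructions. Because the +/- markers did not survive on this page, which placement is the new one is an inference. The sketch below illustrates the second layout (instructions first, documents last) using plain `str.format` instead of LangChain's `PromptTemplate`, so it runs without dependencies. The shortened template text and the `build_prompt` helper are illustrative assumptions, not the repository's code.

```python
# Shortened, hypothetical template: instructions first, retrieved documents last.
TEMPLATE = """
Question: {question}

Summarize your answer from the analysis regarding {question} into one cohesive paragraph.
The final output should be in paragraph form without any formatting.

Documents: {docs}
"""


def build_prompt(question: str, docs: list[str]) -> str:
    """Fill the template, joining the retrieved excerpts with blank lines."""
    return TEMPLATE.format(question=question, docs="\n\n".join(docs))


prompt = build_prompt(
    "What did the council decide about Rule 30?",
    ["first excerpt", "second excerpt"],
)
```

A named placeholder such as `{question}` may appear several times in the template; `str.format` fills every occurrence from the same keyword argument.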

4 changes: 2 additions & 2 deletions packages/googlecloud/functions/getanswer/requirements.txt
@@ -3,8 +3,8 @@ functions-framework
flask
google-cloud-error-reporting
python-dotenv
langchain==0.0.330
openai==0.28.1
langchain
openai
google-api-python-client # Google API
google-search-results # SerpAPI
youtube-transcript-api
10 changes: 10 additions & 0 deletions packages/web/components/NewQuery.tsx
@@ -168,6 +168,16 @@ export default function NewQuery() {
}}
/>
</div>
<button
className={`w-full rounded-lg md:w-1/2 ${
isProcessing ? "bg-primary cursor-wait" : "bg-primary"
} p-2 text-2xl text-blue`}
type="submit"
disabled={isProcessing}
>
Get answer from Sawt
</button>

</form>

<p className="text-left font-light">
16 changes: 8 additions & 8 deletions packages/web/yarn.lock
@@ -159,15 +159,10 @@
dependencies:
glob "7.1.7"

"@next/swc-linux-x64-gnu@13.4.4":
"@next/swc-darwin-arm64@13.4.4":
version "13.4.4"
resolved "https://registry.npmjs.org/@next/swc-linux-x64-gnu/-/swc-linux-x64-gnu-13.4.4.tgz"
integrity sha512-PX706XcCHr2FfkyhP2lpf+pX/tUvq6/ke7JYnnr0ykNdEMo+sb7cC/o91gnURh4sPYSiZJhsF2gbIqg9rciOHQ==

"@next/swc-linux-x64-musl@13.4.4":
version "13.4.4"
resolved "https://registry.npmjs.org/@next/swc-linux-x64-musl/-/swc-linux-x64-musl-13.4.4.tgz"
integrity sha512-TKUUx3Ftd95JlHV6XagEnqpT204Y+IsEa3awaYIjayn0MOGjgKZMZibqarK3B1FsMSPaieJf2FEAcu9z0yT5aA==
resolved "https://registry.npmjs.org/@next/swc-darwin-arm64/-/swc-darwin-arm64-13.4.4.tgz"
integrity sha512-xfjgXvp4KalNUKZMHmsFxr1Ug+aGmmO6NWP0uoh4G3WFqP/mJ1xxfww0gMOeMeSq/Jyr5k7DvoZ2Pv+XOITTtw==

"@nodelib/fs.scandir@2.1.5":
version "2.1.5"
@@ -1393,6 +1388,11 @@ fs.realpath@^1.0.0:
resolved "https://registry.npmjs.org/fs.realpath/-/fs.realpath-1.0.0.tgz"
integrity sha512-OO0pH2lK6a0hZnAdau5ItzHPI6pUlvI7jMVnxUQRtw4owF2wk8lOSabtGDCTP4Ggrg2MbGnWO9X8K1t4+fGMDw==

fsevents@~2.3.2:
version "2.3.2"
resolved "https://registry.npmjs.org/fsevents/-/fsevents-2.3.2.tgz"
integrity sha512-xiqMQR4xAeHTuB9uWm+fFRcIOgKBMiOBP+eXiyT7jsgVCq1bkVygt00oASowB7EdtpOHaaPgKt812P9ab+DDKA==

function-bind@^1.1.1:
version "1.1.1"
resolved "https://registry.npmjs.org/function-bind/-/function-bind-1.1.1.tgz"
File renamed without changes.
File renamed without changes.
52 changes: 52 additions & 0 deletions packages/wrangle/summaries/__main__.py
@@ -0,0 +1,52 @@
import os
from summary_model import (
pdf_to_images,
extract_text_from_image,
save_ocr_to_json,
load_and_split,
extract_date_from_filename,
summarize_text,
save_summaries_to_json,
concatenate_jsons,
)


def main():
documents_directory = "../../backend/src/minutes_agendas_directory/2021/pdfs"
output_json_dir = "../../backend/src/minutes_agendas_directory/2021/json"

os.makedirs(output_json_dir, exist_ok=True)

for pdf_filename in os.listdir(documents_directory):
if pdf_filename.endswith(".pdf"):
output_json_path = os.path.join(
output_json_dir, f"{os.path.splitext(pdf_filename)[0]}.json"
)

if os.path.exists(output_json_path):
print(f"Skipping {pdf_filename}, output already exists.")
continue

pdf_path = os.path.join(documents_directory, pdf_filename)
publish_date = extract_date_from_filename(pdf_filename)
ocr_json_path = (
"../../backend/src/minutes_agendas_directory/2022/json/ocr_text.json"
)

save_ocr_to_json(pdf_path, ocr_json_path, publish_date)
chunks = load_and_split(ocr_json_path)
summaries = summarize_text(chunks, publish_date)

save_summaries_to_json(summaries, output_json_dir, pdf_filename)
os.remove(ocr_json_path)

input_json_directory = "../../backend/src/minutes_agendas_directory/2021/json"
output_json_concat_path = (
"../../backend/src/minutes_agendas_directory/Minutes 2021.json"
)
concatenate_jsons(input_json_directory, output_json_concat_path)
print(f"Summaries saved in directory: {output_json_dir}")


if __name__ == "__main__":
main()
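
The script above ends by calling `concatenate_jsons`, whose implementation lives in `summary_model` and is not shown in this diff. As a hedged sketch of what such a merge step might look like, assuming each per-PDF file has the `{"messages": [...]}` shape seen earlier in this commit (the function name `concatenate_message_files` is hypothetical and is not the project's code):

```python
import json
import tempfile
from pathlib import Path


def concatenate_message_files(input_dir: Path, output_path: Path) -> int:
    """Merge the "messages" lists of every *.json file in input_dir into one file.

    Files are read in sorted name order so the output is deterministic.
    Returns the number of merged records.
    """
    merged: list[dict] = []
    for path in sorted(Path(input_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            merged.extend(json.load(handle).get("messages", []))
    with Path(output_path).open("w", encoding="utf-8") as handle:
        json.dump({"messages": merged}, handle, indent=2)
    return len(merged)


# Small demonstration with made-up records in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "json"
    source_dir.mkdir()
    (source_dir / "a.json").write_text(
        json.dumps({"messages": [{"uid": "1"}, {"uid": "2"}]}), encoding="utf-8"
    )
    (source_dir / "b.json").write_text(
        json.dumps({"messages": [{"uid": "3"}]}), encoding="utf-8"
    )
    combined_path = Path(tmp) / "Minutes.json"
    total = concatenate_message_files(source_dir, combined_path)
    combined = json.loads(combined_path.read_text(encoding="utf-8"))
```

Writing the combined file outside the input directory, as the script above does with `Minutes 2021.json`, keeps a rerun from reading its own earlier output back in.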
1 comment on commit 36e9b65

@vercel vercel bot commented on 36e9b65 Dec 6, 2023